Apache-2.0 Postgres-native read-only by construction

SQL is engineering,
not conversation.

PromptQuery (prq) turns plain English into read-only Postgres SQL — deterministically, on schemas with hundreds of tables, not toy demos. It introspects your schema, generates the query, shows it to you, and runs it read-only. No chat. No drift.

pip install promptquery

View on GitHub PyPI

PromptQuery CLI converting a plain-English question into a read-only Postgres SELECT, then asking for confirmation before running it. — One question, end to end: plain English in → tables picked from the schema → a single read-only `SELECT` → `Run? [y/N]` → rows out.

The numbers

The pitch isn't a demo. It's a benchmark — with the failures committed.

Retrieval accuracy on two real, recognizable production schemas. Not Spider, not BIRD, not a 10-table fixture. Every run — including the unfavourable ones — is a JSON receipt checked into the repo, reproducible with one command.

100% +16pp vs naive

Odoo 18 ERP

675 tables · 25 questions

A real, FK-linked ERP schema — the kind that breaks every tool that stuffs the whole thing into a prompt.

94% +12pp vs naive

EMBL-EBI RNAcentral

216 tables · 50 questions

An unrelated scientific database, public and read-only. Same shape of result on a completely different domain.

conditions Parsing-mode retrieval accuracy — does the generated SQL reference the tables the question must touch — measured with gpt-4o for generation and gpt-4o-mini as the table selector, on the PromptQuery v0.2 pipeline. Not execution-equality. Stated plainly because a number without its conditions is marketing, not evidence.

The naive "stuff the whole schema into one prompt" baseline scored 84% on Odoo and 82% on RNAcentral. The v0.1 pipeline actually did worse than naive on both — that receipt is committed too, on purpose.

Reproduce the benchmarks → eval/ in the repo

How it works

A retrieval problem, not a chat problem.

The hard part of text-to-SQL on a 675-table schema isn't writing the SQL — it's finding the six tables that matter, every time, without drifting. The pipeline is mechanical and you can read each stage:

01

Introspect

On connect, read the full schema from pg_catalog — tables, columns, types, and the foreign-key graph. No embeddings, no GPU, no model download. It works the instant you connect.
02

Rank — TF-IDF retrieval

Score every table against the question by stemmed TF-IDF over names, comments, columns and FK targets. Top 50 candidates surface. Pure Python; the cost is that it can't reason about synonyms — which is what the next step is for.
03

Select — a cheap LLM call

One small model reads the candidate list and the question, and returns the ~15 tables that are actually relevant — catching the semantic gaps TF-IDF misses ("invoice" → account_move). Fails closed: any error falls back to the TF-IDF prefix.
04

Expand — walk the FK graph

Walk one hop out and one hop in across the foreign-key graph to pick up join targets the question never named. Capped at 25 tables — that's all the generator ever sees, instead of all 675.
05

Generate — deterministically

The frontier model receives ~25 tables and writes a single SELECT. Temperature 0 (and a fixed seed on OpenAI chat models): the same question yields the same SQL. Deterministic by default — best-effort across model snapshots, and reasoning models are the explicit exception.
06

Validate, confirm, run

Two safety layers reject anything that isn't read-only, the query is shown to you, and only an explicit y runs it — read-only, with a 60-second timeout. Rows out as a table, JSON, or CSV.

defaults 50 candidates → 15 selected → 25 max tables. Every cap is a tunable flag.

Safety

Two independent layers. A write is designed to be impossible.

The audience's deepest fear is a tool writing to prod. So the read-only guarantee is belt-and-suspenders: two redundant layers, either of which alone would block a write. Even a jailbroken prompt can't mutate the database.

Layer A · database-enforced

The session opens read-only

Every connection runs SET default_transaction_read_only = on before any query — Postgres itself refuses non-SELECT operations — plus a statement_timeout of 60 seconds. The guarantee comes from the database, not the application.

Layer B · pre-execution

An AST guard rejects non-reads

Before anything runs, the generated SQL is parsed with sqlglot in Postgres dialect and rejected unless it's a single read-only statement. It catches CTEs that try to hide a DELETE — WITH x AS (DELETE … RETURNING …) SELECT * FROM x — via a recursive scan, and blocks 8 state-mutating Postgres functions by name.

plus A third practical guard: every query is shown to you and waits for an explicit y / yes before it runs. Runs on your machine, against your database — your data never leaves the box.

How it compares

Honest about where the others are good.

Vanna, WrenAI and IDE assistants are genuinely good tools — at the jobs they're built for. PromptQuery doesn't win on breadth. It wins on a narrow axis: CLI-first, deterministic, read-only by construction, and built for production scale.

	PromptQuery	Vanna	WrenAI	IDE assistants
Form factor	CLI / pipe-friendly	Python library & web	GenBI app & UI	Editor inline
Deterministic by default	Yes — temp 0*	Not by default	Not by default	No
Read-only by construction	Two independent layers	Depends on setup	Governance-gated	You run what it writes
Built for 100s of tables	Retrieval + FK graph	RAG, you train it	Semantic layer, you author it	Whatever's in context
Setup before first query	None — connect & ask	Train on your data	Model the semantic layer	None
Honest, committed benchmark	Failures included	Accuracy paper	—	—
Where they're better	Not for dashboards or BI	Rich RAG, charts, web UI	Dashboards, semantic governance	In-editor, multi-language

* Best-effort across model snapshots; reasoning models (GPT-5.x / o-series) sample internally and are the explicit exception.

Try it

Run it against a real 216-table database — no setup.

EMBL-EBI publishes RNAcentral as a public, read-only Postgres database. Install PromptQuery, set one API key, and point prq straight at it. Same query that's in the demo, on a real production schema, in under a minute.

1

Install

pip install "promptquery[openai]"
2

Set one key

export OPENAI_API_KEY=sk-...
3

Ask the public RNAcentral DB a question

prq postgresql://reader:[email protected]:5432/pfmegrnargs

Then, at the prompt, ask in plain English — e.g. “how many human sequences are there grouped by RNA type”. PromptQuery shows the SQL and waits for your y.

Public credentials are published by EMBL-EBI for tutorial use. The session is read-only; nothing you do can change it.

What it does NOT do (yet)

The limits, up front.

No writes. SELECT-only, by design.
Postgres only. MySQL / SQLite are on the roadmap.
One database at a time. No multi-DB sessions.
No charts. Rows out — pipe to csv / jq / your tool.
No query history between sessions yet.
No embeddings, no MCP, no local LLM yet — all on the roadmap.

Naming the weaknesses is the point. It's how you know the strong claims are real.

SQL is engineering,not conversation.

The pitch isn't a demo. It's a benchmark — with the failures committed.

Odoo 18 ERP

EMBL-EBI RNAcentral

A retrieval problem, not a chat problem.

Introspect

Rank — TF-IDF retrieval

Select — a cheap LLM call

Expand — walk the FK graph

Generate — deterministically

Validate, confirm, run