Apache-2.0 Postgres-native read-only by construction

SQL is engineering,
not conversation.

PromptQuery (prq) turns plain English into read-only Postgres SQL — deterministically, on schemas with hundreds of tables, not toy demos. It introspects your schema, generates the query, shows it to you, and runs it read-only. No chat. No drift.

pip install promptquery
One question, end to end: plain English in → tables picked from the schema → a single read-only SELECTRun? [y/N] → rows out.

The numbers

The pitch isn't a demo. It's a benchmark — with the failures committed.

Retrieval accuracy on two real, recognizable production schemas. Not Spider, not BIRD, not a 10-table fixture. Every run — including the unfavourable ones — is a JSON receipt checked into the repo, reproducible with one command.

100% +16pp vs naive

Odoo 18 ERP

675 tables · 25 questions

A real, FK-linked ERP schema — the kind that breaks every tool that stuffs the whole thing into a prompt.

94% +12pp vs naive

EMBL-EBI RNAcentral

216 tables · 50 questions

An unrelated scientific database, public and read-only. Same shape of result on a completely different domain.

conditions Parsing-mode retrieval accuracy — does the generated SQL reference the tables the question must touch — measured with gpt-4o for generation and gpt-4o-mini as the table selector, on the PromptQuery v0.2 pipeline. Not execution-equality. Stated plainly because a number without its conditions is marketing, not evidence.

The naive "stuff the whole schema into one prompt" baseline scored 84% on Odoo and 82% on RNAcentral. The v0.1 pipeline actually did worse than naive on both — that receipt is committed too, on purpose.

Reproduce the benchmarks → eval/ in the repo

How it works

A retrieval problem, not a chat problem.

The hard part of text-to-SQL on a 675-table schema isn't writing the SQL — it's finding the six tables that matter, every time, without drifting. The pipeline is mechanical and you can read each stage:

  1. 01

    Introspect

    On connect, read the full schema from pg_catalog — tables, columns, types, and the foreign-key graph. No embeddings, no GPU, no model download. It works the instant you connect.

  2. 02

    Rank — TF-IDF retrieval

    Score every table against the question by stemmed TF-IDF over names, comments, columns and FK targets. Top 50 candidates surface. Pure Python; the cost is that it can't reason about synonyms — which is what the next step is for.

  3. 03

    Select — a cheap LLM call

    One small model reads the candidate list and the question, and returns the ~15 tables that are actually relevant — catching the semantic gaps TF-IDF misses ("invoice"account_move). Fails closed: any error falls back to the TF-IDF prefix.

  4. 04

    Expand — walk the FK graph

    Walk one hop out and one hop in across the foreign-key graph to pick up join targets the question never named. Capped at 25 tables — that's all the generator ever sees, instead of all 675.

  5. 05

    Generate — deterministically

    The frontier model receives ~25 tables and writes a single SELECT. Temperature 0 (and a fixed seed on OpenAI chat models): the same question yields the same SQL. Deterministic by default — best-effort across model snapshots, and reasoning models are the explicit exception.

  6. 06

    Validate, confirm, run

    Two safety layers reject anything that isn't read-only, the query is shown to you, and only an explicit y runs it — read-only, with a 60-second timeout. Rows out as a table, JSON, or CSV.

defaults 50 candidates → 15 selected → 25 max tables. Every cap is a tunable flag.

Safety

Two independent layers. A write is designed to be impossible.

The audience's deepest fear is a tool writing to prod. So the read-only guarantee is belt-and-suspenders: two redundant layers, either of which alone would block a write. Even a jailbroken prompt can't mutate the database.

Layer A · database-enforced

The session opens read-only

Every connection runs SET default_transaction_read_only = on before any query — Postgres itself refuses non-SELECT operations — plus a statement_timeout of 60 seconds. The guarantee comes from the database, not the application.

Layer B · pre-execution

An AST guard rejects non-reads

Before anything runs, the generated SQL is parsed with sqlglot in Postgres dialect and rejected unless it's a single read-only statement. It catches CTEs that try to hide a DELETEWITH x AS (DELETE … RETURNING …) SELECT * FROM x — via a recursive scan, and blocks 8 state-mutating Postgres functions by name.

plus A third practical guard: every query is shown to you and waits for an explicit y / yes before it runs. Runs on your machine, against your database — your data never leaves the box.

How it compares

Honest about where the others are good.

Vanna, WrenAI and IDE assistants are genuinely good tools — at the jobs they're built for. PromptQuery doesn't win on breadth. It wins on a narrow axis: CLI-first, deterministic, read-only by construction, and built for production scale.

  PromptQuery Vanna WrenAI IDE assistants
Form factor CLI / pipe-friendly Python library & web GenBI app & UI Editor inline
Deterministic by default Yes — temp 0* Not by default Not by default No
Read-only by construction Two independent layers Depends on setup Governance-gated You run what it writes
Built for 100s of tables Retrieval + FK graph RAG, you train it Semantic layer, you author it Whatever's in context
Setup before first query None — connect & ask Train on your data Model the semantic layer None
Honest, committed benchmark Failures included Accuracy paper
Where they're better Not for dashboards or BI Rich RAG, charts, web UI Dashboards, semantic governance In-editor, multi-language

* Best-effort across model snapshots; reasoning models (GPT-5.x / o-series) sample internally and are the explicit exception.

Try it

Run it against a real 216-table database — no setup.

EMBL-EBI publishes RNAcentral as a public, read-only Postgres database. Install PromptQuery, set one API key, and point prq straight at it. Same query that's in the demo, on a real production schema, in under a minute.

  1. 1

    Install

    pip install "promptquery[openai]"
  2. 2

    Set one key

    export OPENAI_API_KEY=sk-...
  3. 3

    Ask the public RNAcentral DB a question

    prq postgresql://reader:[email protected]:5432/pfmegrnargs

    Then, at the prompt, ask in plain English — e.g. “how many human sequences are there grouped by RNA type”. PromptQuery shows the SQL and waits for your y.

Public credentials are published by EMBL-EBI for tutorial use. The session is read-only; nothing you do can change it.

What it does NOT do (yet)

The limits, up front.

  • No writes. SELECT-only, by design.
  • Postgres only. MySQL / SQLite are on the roadmap.
  • One database at a time. No multi-DB sessions.
  • No charts. Rows out — pipe to csv / jq / your tool.
  • No query history between sessions yet.
  • No embeddings, no MCP, no local LLM yet — all on the roadmap.

Naming the weaknesses is the point. It's how you know the strong claims are real.