Odoo 18 ERP
A real, FK-linked ERP schema — the kind that breaks every tool that stuffs the whole thing into a prompt.
Apache-2.0 Postgres-native read-only by construction
PromptQuery (prq) turns plain English into read-only Postgres SQL —
deterministically, on schemas with hundreds of tables, not toy demos.
It introspects your schema, generates the query, shows it to you, and runs it read-only.
No chat. No drift.
pip install promptquery
SELECT → Run? [y/N] → rows out.
The numbers
Retrieval accuracy on two real, recognizable production schemas. Not Spider, not BIRD, not a 10-table fixture. Every run — including the unfavourable ones — is a JSON receipt checked into the repo, reproducible with one command.
A real, FK-linked ERP schema — the kind that breaks every tool that stuffs the whole thing into a prompt.
An unrelated scientific database, public and read-only. Same shape of result on a completely different domain.
conditions
Parsing-mode retrieval accuracy — does the generated SQL reference the tables the question
must touch — measured with gpt-4o for generation and gpt-4o-mini as the
table selector, on the PromptQuery v0.2 pipeline. Not execution-equality. Stated plainly because
a number without its conditions is marketing, not evidence.
The naive "stuff the whole schema into one prompt" baseline scored 84% on Odoo and 82% on RNAcentral. The v0.1 pipeline actually did worse than naive on both — that receipt is committed too, on purpose.
Reproduce the benchmarks → eval/ in the repoHow it works
The hard part of text-to-SQL on a 675-table schema isn't writing the SQL — it's finding the six tables that matter, every time, without drifting. The pipeline is mechanical and you can read each stage:
On connect, read the full schema from pg_catalog — tables, columns, types, and the foreign-key graph. No embeddings, no GPU, no model download. It works the instant you connect.
Score every table against the question by stemmed TF-IDF over names, comments, columns and FK targets. Top 50 candidates surface. Pure Python; the cost is that it can't reason about synonyms — which is what the next step is for.
One small model reads the candidate list and the question, and returns the ~15 tables that are actually relevant — catching the semantic gaps TF-IDF misses ("invoice" → account_move). Fails closed: any error falls back to the TF-IDF prefix.
Walk one hop out and one hop in across the foreign-key graph to pick up join targets the question never named. Capped at 25 tables — that's all the generator ever sees, instead of all 675.
The frontier model receives ~25 tables and writes a single SELECT. Temperature 0 (and a fixed seed on OpenAI chat models): the same question yields the same SQL. Deterministic by default — best-effort across model snapshots, and reasoning models are the explicit exception.
Two safety layers reject anything that isn't read-only, the query is shown to you, and only an explicit y runs it — read-only, with a 60-second timeout. Rows out as a table, JSON, or CSV.
defaults 50 candidates → 15 selected → 25 max tables. Every cap is a tunable flag.
Safety
The audience's deepest fear is a tool writing to prod. So the read-only guarantee is belt-and-suspenders: two redundant layers, either of which alone would block a write. Even a jailbroken prompt can't mutate the database.
Every connection runs SET default_transaction_read_only = on before any
query — Postgres itself refuses non-SELECT operations — plus a
statement_timeout of 60 seconds. The guarantee comes from the
database, not the application.
Before anything runs, the generated SQL is parsed with sqlglot in Postgres dialect
and rejected unless it's a single read-only statement. It catches CTEs that try to hide a
DELETE — WITH x AS (DELETE … RETURNING …) SELECT * FROM x — via a recursive
scan, and blocks 8 state-mutating Postgres functions by name.
plus
A third practical guard: every query is shown to you and waits for an explicit
y / yes before it runs. Runs on your machine, against your database — your data
never leaves the box.
How it compares
Vanna, WrenAI and IDE assistants are genuinely good tools — at the jobs they're built for. PromptQuery doesn't win on breadth. It wins on a narrow axis: CLI-first, deterministic, read-only by construction, and built for production scale.
| PromptQuery | Vanna | WrenAI | IDE assistants | |
|---|---|---|---|---|
| Form factor | CLI / pipe-friendly | Python library & web | GenBI app & UI | Editor inline |
| Deterministic by default | Yes — temp 0* | Not by default | Not by default | No |
| Read-only by construction | Two independent layers | Depends on setup | Governance-gated | You run what it writes |
| Built for 100s of tables | Retrieval + FK graph | RAG, you train it | Semantic layer, you author it | Whatever's in context |
| Setup before first query | None — connect & ask | Train on your data | Model the semantic layer | None |
| Honest, committed benchmark | Failures included | Accuracy paper | — | — |
| Where they're better | Not for dashboards or BI | Rich RAG, charts, web UI | Dashboards, semantic governance | In-editor, multi-language |
* Best-effort across model snapshots; reasoning models (GPT-5.x / o-series) sample internally and are the explicit exception.
Try it
EMBL-EBI publishes RNAcentral as a public, read-only Postgres database. Install PromptQuery,
set one API key, and point prq straight at it. Same query that's in the demo, on a
real production schema, in under a minute.
Install
pip install "promptquery[openai]"
Set one key
export OPENAI_API_KEY=sk-...
Ask the public RNAcentral DB a question
prq postgresql://reader:[email protected]:5432/pfmegrnargs
Then, at the prompt, ask in plain English — e.g. “how many human sequences are there grouped by RNA type”. PromptQuery shows the SQL and waits for your y.
Public credentials are published by EMBL-EBI for tutorial use. The session is read-only; nothing you do can change it.
What it does NOT do (yet)
SELECT-only, by design.csv / jq / your tool.Naming the weaknesses is the point. It's how you know the strong claims are real.