
LLM_SQL_Report_Demo

  • October 30, 2025

fmelizard
FME Hub user antoine just uploaded a new template to the FME Hub.

FME-SQL-LLM Report Generator

Type: Template workspace (Form)

Purpose: Turn a plain-language question about a table into a planned set of SQL queries, execute them in DuckDB, and produce a Markdown → HTML report using three lightweight LLM roles (Planner, SQL Generator, Analyst).

What it does

Standardizes input: reads any tabular file (Excel/CSV/Parquet), normalizes headers, and writes a temporary Parquet dataset.

Profiles data: extracts schema, summary stats, and sample rows with DuckDB (compact “metadata” for prompts).

Plans the analysis (Planner LLM): converts the goal + metadata into formal sub-questions and SQL intents.

Drafts SQL (SQL Generator LLM): returns executable SQL per sub-question (minimal JSON: just sql).

Executes safely in DuckDB: runs each query with guardrails (e.g., COPY (SELECT ...), semicolon stripping, optional LIMIT cap).

Analyzes results (Analyst LLM): synthesizes named result sets into a Markdown narrative answering the original question.

Publishes: renders Markdown to HTML with a simple template and writes the final report.

Why it’s useful

Scales expertise: lets non-SQL users ask meaningful questions; the system plans and writes SQL for them.

Keeps LLMs in their lane: models see only metadata + small samples, while DuckDB does the heavy lifting.

Repeatable & portable: drop in any table, update the research question, and re-run—same pipeline, new insight.

Inputs & outputs

Inputs

Any tabular file (e.g., .xlsx, .csv, .parquet).

A natural-language question (e.g., “Which locations have the most cyclists, and when are the peaks?”).

Outputs

report.html — final HTML report (from generated Markdown).

Intermediates (optional to keep):

temp/*.parquet (standardized table)

schema.sql (DDL), summary.json (stats), sample.csv (few rows)

questions.json (planner output), queries.jsonl (SQL list)

results/*.csv (per-query results, capped)

Parameters (exposed)

Research question (string).

Result row cap (int; default 30) for safety/preview.

Keep intermediates (yes/no).

Model choices

Planner model (name / endpoint / temperature)

SQL Generator model (name / endpoint / temperature)

Analyst model (name / endpoint / temperature)

DuckDB work folder (temp path override).

Markdown → HTML template (file/path or inline).

How it works (pipeline)

1) Convert → Parquet

Normalize headers (e.g., replace non-alphanumerics), write a temp Parquet.
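
Roughly, in Python (a sketch only, assuming pandas + pyarrow; the workspace itself uses FME transformers, and its exact normalization rule and paths may differ):

import re
import pandas as pd

def standardize(src_path: str, out_parquet: str) -> None:
    # Read whichever tabular format was supplied (dispatch is illustrative).
    if src_path.endswith((".xlsx", ".xls")):
        df = pd.read_excel(src_path)
    elif src_path.endswith(".parquet"):
        df = pd.read_parquet(src_path)
    else:
        df = pd.read_csv(src_path)

    # Normalize headers: replace non-alphanumerics with underscores, lowercase.
    df.columns = [re.sub(r"[^0-9A-Za-z]+", "_", str(c)).strip("_").lower()
                  for c in df.columns]

    # Write the standardized table as a temporary Parquet dataset.
    df.to_parquet(out_parquet, index=False)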

2) Profile in DuckDB

Emit schema.sql, summary.json (via SUMMARIZE/aggregates), and sample.csv (tiny SELECT).
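
A sketch of the profiling queries with the duckdb Python module; file names follow the intermediates listed above, but the template's own SQL and DDL format may differ:

import duckdb

con = duckdb.connect()
src = "read_parquet('temp/standardized.parquet')"  # hypothetical temp path

# schema.sql: column names and types as simple DDL.
cols = con.execute(f"DESCRIBE SELECT * FROM {src}").fetchall()
with open("schema.sql", "w", encoding="utf-8") as f:
    ddl = ",\n  ".join(f"{name} {dtype}" for name, dtype, *rest in cols)
    f.write(f"CREATE TABLE data (\n  {ddl}\n);\n")

# summary.json: per-column statistics via DuckDB's SUMMARIZE.
summary = con.execute(f"SUMMARIZE SELECT * FROM {src}").fetchdf()
summary.to_json("summary.json", orient="records")

# sample.csv: a tiny SELECT for the prompts.
con.execute(f"COPY (SELECT * FROM {src} LIMIT 5) TO 'sample.csv' (HEADER, DELIMITER ',')")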

3) Plan (Planner LLM)

Input: research question + metadata → Output: sub-questions + SQL intents (strict JSON).
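
For the cyclist question above, the planner's strict JSON might look like this (key names and sub-questions are illustrative; the template's actual schema may differ):

{
  "sub_questions": [
    {
      "id": "q1_top_locations",
      "question": "Which locations record the highest total cyclist counts?",
      "sql_intent": "Group by location, sum counts, order descending, top N"
    },
    {
      "id": "q2_peak_hours",
      "question": "At which hours do counts peak?",
      "sql_intent": "Group by hour of the timestamp, sum counts, order descending"
    }
  ]
}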

4) Draft SQL (SQL Generator LLM)

For each sub-question → return only sql (strict), flatten to attributes.
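
Parsing and validating that minimal JSON before execution could look like this in Python (the helper name is hypothetical; in the workspace this maps to a JSON transformer or PythonCaller):

import json

def extract_sql(raw_response: str) -> str:
    # The generator must return exactly {"sql": "<single SQL statement>"}.
    obj = json.loads(raw_response)
    if set(obj) != {"sql"} or not isinstance(obj["sql"], str) or not obj["sql"].strip():
        raise ValueError(f"unexpected generator output: {raw_response!r}")
    return obj["sql"].strip()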

5) Execute in DuckDB

- Remove trailing ;

- Wrap non-SELECT so COPY (SELECT ...) TO ... always receives a SELECT

- Apply optional LIMIT {cap} to guard output size

- Save per-query result files
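
A minimal Python sketch of those guardrails, assuming a Python step drives DuckDB (the cap value and output path are placeholders):

import duckdb

def run_query(con: duckdb.DuckDBPyConnection, sql: str, out_csv: str, cap: int = 30) -> None:
    # Strip a trailing semicolon so the statement can be embedded in COPY (...).
    sql = sql.strip().rstrip(";")

    # Wrap the statement as a subquery so COPY always receives a SELECT,
    # and apply the optional LIMIT cap to guard output size.
    wrapped = f"SELECT * FROM ({sql})"
    if cap:
        wrapped += f" LIMIT {cap}"

    # Deterministic CSV output per query.
    con.execute(f"COPY ({wrapped}) TO '{out_csv}' (HEADER, DELIMITER ',')")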

6) Analyze (Analyst LLM)

Provide original task + metadata + named result sets → Markdown narrative with findings/limitations.
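
One way to assemble the Analyst's input, pairing each capped result file with its sub-question name (layout and wording are illustrative, not the template's exact prompt):

from pathlib import Path

def build_analyst_input(question: str, metadata: str, results_dir: str = "results") -> str:
    parts = [f"Original question:\n{question}", f"Dataset metadata:\n{metadata}"]
    # Attach each named result set so the narrative can reference tables by name.
    for csv_path in sorted(Path(results_dir).glob("*.csv")):
        parts.append(f"Result table '{csv_path.stem}':\n{csv_path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)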

7) Publish

Inject Markdown into HTML template and write report.html.
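
A sketch of the publishing step using the Python markdown package and a placeholder token in the HTML template (the {{content}} token is an assumption; the workspace's template mechanism may differ):

import markdown

def publish(md_text: str, template_path: str, out_html: str = "report.html") -> None:
    # Render the Markdown narrative to an HTML fragment.
    body = markdown.markdown(md_text, extensions=["tables"])
    template = open(template_path, encoding="utf-8").read()
    # Inject the rendered body into the HTML template and write the report.
    with open(out_html, "w", encoding="utf-8") as f:
        f.write(template.replace("{{content}}", body))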

Guardrails & conventions

No raw table → LLM: only schema, summary, sample rows.

Deterministic output shape: use COPY (SELECT ...) for stable CSV schemas.

Safety caps: configurable LIMIT on SELECT queries.

Strict JSON for planner/generator outputs; validate before execution.

Repro hints: log model name, temperature, and prompts with a run ID.
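
The repro convention can be as simple as tagging every LLM call with a run ID (field names below are illustrative):

import logging
import uuid

logging.basicConfig(level=logging.INFO)
run_id = uuid.uuid4().hex[:8]

def log_llm_call(role: str, model: str, temperature: float, prompt: str) -> None:
    # One log line per call makes runs easy to diff and reproduce.
    logging.info("run=%s role=%s model=%s temperature=%s prompt_chars=%d",
                 run_id, role, model, temperature, len(prompt))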

Example prompts (minimal)

Planner (system, abridged):

You are a data analysis planner. Given a goal and compact metadata (schema, summary, sample), return STRICT JSON with sub-questions and brief SQL intents. No SQL execution.

SQL Generator (system, abridged):

You return ONLY a JSON object: {"sql": "<single SQL statement>"} executable in DuckDB. No explanation, no comments, no code fences.

Analyst (system, abridged):

Synthesize named result tables into a concise Markdown report answering the original question. Reference table names; note assumptions and limitations. No code.

Requirements

FME Form 2025.1+

DuckDB (included/embedded, or accessible from the FME run environment)

LLM endpoints (OpenAI-compatible or local gateway) for the three roles

Extending

Add validation checks (row count, coverage, sum/avg bounds) before the Analyst step.

Swap Markdown → PDF or add small charts from result CSVs.

Cache metadata for repeated runs on the same dataset.

Localize labels / dictionary for non-English reports.

Version & compatibility

v1.0 — initial release (three-role pipeline, HTML reporting).

Tested with FME Form 2025.1 and DuckDB 1.x.

Tags

FME · DuckDB · LLM · OpenAI-compatible · SQL · Analytics · Automation · Parquet · Data Profiling · Markdown · HTML Report



Would you like to know more? Click here to find out more details!