FME Hub user antoine just uploaded a new template.

FME-SQL-LLM Report Generator

Type: Template workspace (Form)

Purpose: Turn a plain-language question about a table into a planned set of SQL queries, execute them in DuckDB, and produce a Markdown → HTML report using three lightweight LLM roles (Planner, SQL Generator, Analyst).

What it does

Standardizes input: reads any tabular file (Excel/CSV/Parquet), normalizes headers, and writes a temporary Parquet dataset.

Profiles data: extracts schema, summary stats, and sample rows with DuckDB (compact “metadata” for prompts).

Plans the analysis (Planner LLM): turns the goal plus metadata into focused sub-questions and SQL intents.

Drafts SQL (SQL Generator LLM): returns executable SQL per sub-question as minimal JSON with a single sql field.

Executes safely in DuckDB: runs each query with guardrails (e.g., COPY (SELECT ...), semicolon stripping, optional LIMIT cap).

Analyzes results (Analyst LLM): synthesizes named result sets into a Markdown narrative answering the original question.

Publishes: renders Markdown to HTML with a simple template and writes the final report.

Why it’s useful

Scales expertise: lets non-SQL users ask meaningful questions; the system plans and writes SQL for them.

Keeps LLMs in their lane: models see only metadata + small samples, while DuckDB does the heavy lifting.

Repeatable & portable: drop in any table, update the research question, and re-run—same pipeline, new insight.

Inputs & outputs

Inputs

Any tabular file (e.g., .xlsx, .csv, .parquet).

A natural-language question (e.g., “Which locations have the most cyclists, and when are the peaks?”).

Outputs

report.html — final HTML report (from generated Markdown).

Intermediates (optional to keep):

temp/*.parquet (standardized table)

schema.sql (DDL), summary.json (stats), sample.csv (few rows)

questions.json (planner output), queries.jsonl (SQL list)

results/*.csv (per-query results, capped)

Parameters (exposed)

Research question (string).

Result row cap (int; default 30) for safety/preview.

Keep intermediates (yes/no).

Model choices

Planner model (name / endpoint / temperature)

SQL Generator model (name / endpoint / temperature)

Analyst model (name / endpoint / temperature)

DuckDB work folder (temp path override).

Markdown → HTML template (file/path or inline).

How it works (pipeline)

1) Convert → Parquet

Normalize headers (e.g., replace non-alphanumerics), write a temp Parquet.
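
The header normalization in step 1 can be sketched as follows; normalize_header is a hypothetical helper for illustration, not the template's internal name:

```python
import re

def normalize_header(name: str) -> str:
    """Collapse runs of non-alphanumeric characters to a single underscore,
    trim leading/trailing underscores, and lowercase the result."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()

print(normalize_header("Count (bikes)"))    # count_bikes
print(normalize_header("Date/Time Stamp"))  # date_time_stamp
```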

2) Profile in DuckDB

Emit schema.sql, summary.json (via SUMMARIZE/aggregates), and sample.csv (tiny SELECT).
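
The three profiling outputs map naturally onto three DuckDB statements. A sketch of what they might look like; the table path and exact statements are assumptions, not the template's internals:

```python
# Hypothetical profiling statements; the Parquet path is a placeholder.
PARQUET = "temp/table.parquet"

profile_sql = {
    # schema.sql: column names and types
    "schema":  f"DESCRIBE SELECT * FROM '{PARQUET}'",
    # summary.json: per-column stats via DuckDB's SUMMARIZE
    "summary": f"SUMMARIZE SELECT * FROM '{PARQUET}'",
    # sample.csv: a handful of rows for the prompts
    "sample":  f"SELECT * FROM '{PARQUET}' LIMIT 5",
}

for name, sql in profile_sql.items():
    print(name, "->", sql)
```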

3) Plan (Planner LLM)

Input: research question + metadata → Output: sub-questions + SQL intents (strict JSON).
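
Validating the Planner's strict JSON before drafting any SQL might look like this; the exact field names (sub_questions, question, sql_intent) are assumptions for illustration:

```python
import json

def parse_plan(raw: str) -> list[dict]:
    """Validate the Planner's strict-JSON output before any SQL is drafted.
    Expected shape (an assumption, not the template's exact schema):
      {"sub_questions": [{"question": "...", "sql_intent": "..."}, ...]}
    """
    plan = json.loads(raw)  # raises ValueError on malformed JSON
    subs = plan["sub_questions"]
    for item in subs:
        if not {"question", "sql_intent"} <= item.keys():
            raise ValueError(f"incomplete sub-question: {item}")
    return subs

demo = ('{"sub_questions": [{"question": "Which locations rank highest?",'
        ' "sql_intent": "GROUP BY location, ORDER BY count"}]}')
print(parse_plan(demo))
```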

4) Draft SQL (SQL Generator LLM)

For each sub-question → return only sql (strict), flatten to attributes.
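
A defensive parse of the generator's one-key JSON reply could look like this sketch, including a fallback for models that ignore the "no code fences" instruction:

```python
import json

def extract_sql(raw: str) -> str:
    """Defensively parse the SQL Generator's reply: it should be exactly
    {"sql": "<one statement>"}; reject extra keys and multiple statements."""
    text = raw.strip()
    if text.startswith("```"):                  # model ignored the no-fences rule
        text = text.strip("`").lstrip("json").strip()
    obj = json.loads(text)
    if set(obj) != {"sql"}:
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    sql = obj["sql"].strip().rstrip(";")
    if ";" in sql:
        raise ValueError("multiple statements are not allowed")
    return sql

print(extract_sql('{"sql": "SELECT location, COUNT(*) AS n FROM t GROUP BY location;"}'))
```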

5) Execute in DuckDB

- Remove trailing ;

- Wrap non-SELECT so COPY (SELECT ...) TO ... always receives a SELECT

- Apply optional LIMIT {cap} to guard output size

- Save per-query result files

6) Analyze (Analyst LLM)

Provide original task + metadata + named result sets → Markdown narrative with findings/limitations.
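
Assembling the Analyst's input from named result sets might look like this sketch; the truncation limit and layout are assumptions:

```python
def analyst_prompt(question: str, results: dict[str, str], max_chars: int = 2000) -> str:
    """Build the Analyst's user message from named result sets (CSV text),
    truncating each to keep the prompt compact."""
    parts = [f"Original question: {question}", ""]
    for name, csv_text in results.items():
        parts.append(f"Result set {name}:\n{csv_text[:max_chars]}")
    parts.append("\nWrite a concise Markdown report; reference result-set names; "
                 "note assumptions and limitations.")
    return "\n".join(parts)

demo = analyst_prompt(
    "Which locations have the most cyclists?",
    {"by_location": "location,n\nBridge St,1204\nRiver Path,987"},
)
print(demo)
```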

7) Publish

Inject Markdown into HTML template and write report.html.
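
The final injection step can be sketched with a stdlib string template; the page template below is an assumption, not the one shipped with the workspace, and the body is assumed to be already rendered from Markdown to HTML:

```python
from string import Template

# Hypothetical page template; the real one is a workspace parameter.
PAGE = Template("""<!doctype html>
<html><head><meta charset="utf-8"><title>$title</title></head>
<body>$body</body></html>""")

def publish(title: str, body_html: str, path: str) -> str:
    """Substitute the rendered report body into the page and write report.html."""
    html = PAGE.substitute(title=title, body=body_html)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html

page = publish("Cyclist report",
               "<h1>Findings</h1><p>Peaks occur at 8am.</p>",
               "/tmp/report.html")
```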

Guardrails & conventions

No raw table → LLM: only schema, summary, sample rows.

Deterministic output shape: use COPY (SELECT ...) for stable CSV schemas.

Safety caps: configurable LIMIT on SELECT queries.

Strict JSON for planner/generator outputs; validate before execution.

Repro hints: log model name, temperature, and prompts with a run ID.
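
One reproducibility record per LLM call could be as simple as the sketch below; the field names and the model name in the example are illustrative:

```python
import datetime
import json
import uuid

def run_log_entry(model: str, temperature: float, prompt: str) -> dict:
    """One reproducibility record per LLM call: run ID, timestamp,
    model name, temperature, and the prompt that was sent."""
    return {
        "run_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,
        "temperature": temperature,
        "prompt": prompt,
    }

entry = run_log_entry("gpt-4o-mini", 0.2, "You are a data analysis planner...")
print(json.dumps(entry, indent=2))
```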

Example prompts (minimal)

Planner (system, abridged):

You are a data analysis planner. Given a goal and compact metadata (schema, summary, sample),

return STRICT JSON with sub-questions and brief SQL intents. No SQL execution.

SQL Generator (system, abridged):

You return ONLY a JSON object: {"sql": "<single SQL statement>"} executable in DuckDB.

No explanation, no comments, no code fences.

Analyst (system, abridged):

Synthesize named result tables into a concise Markdown report answering the original question.

Reference table names; note assumptions and limitations. No code.

Requirements

FME Form 2025.1+

DuckDB (included/embedded or accessible from FME run environment)

LLM endpoints (OpenAI-compatible or local gateway) for the three roles

Extending

Add validation checks (row count, coverage, sum/avg bounds) before the Analyst step.

Swap Markdown → PDF or add small charts from result CSVs.

Cache metadata for repeated runs on the same dataset.

Localize labels / dictionary for non-English reports.

Version & compatibility

v1.0 — initial release (three-role pipeline, HTML reporting).

Tested with FME Form 2025.1 and DuckDB 1.x.

Tags

FME · DuckDB · LLM · OpenAI-compatible · SQL · Analytics · Automation · Parquet · Data Profiling · Markdown · HTML Report


