Skip to main content
Solved

DuckDB Summarize command

  • November 2, 2024
  • 4 replies
  • 120 views

oliver.morris
Contributor
Forum|alt.badge.img+14

I am trying to use the duckdb summarize command

https://duckdb.org/docs/archive/0.9/guides/meta/summarize

in an in memory duckdb using the sqlexecutor:

SUMMARIZE SELECT * FROM read_parquet('/Users/x/Downloads/Address.parquet');

This doesn’t return any results/attributes but ‘runs’ successfully in FME. In DBeaver it runs no issues.

Any idea is there is a trick to get this working?

Many ThanksQ!

Best answer by arnold_bijlsma

@oliver.morris : There is a very simple workaround: wrap the SUMMARIZE or DESCRIBE query in another SELECT * FROM clause:

SELECT * FROM (SUMMARIZE SELECT * FROM read_csv('https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv') );

And in DuckDB’s SQL dialect you don’t explicitly need the “SELECT *” or the read_csv() function, so this could be abbreviated to:

FROM ( SUMMARIZE FROM 'https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv' );
This post is closed to further activity.
It may be an old question, an answered question, an implemented idea, or a notification-only post.
Please check post dates before relying on any information in a question or answer.
For follow-up or related questions, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

4 replies

bwn
Evangelist
Forum|alt.badge.img+26
  • Evangelist
  • November 3, 2024

Have you exposed the Attributes on the Workspace?

SELECT *” gives no clues to SQLExecutor as to what Workspace Attribute Names to create for generated Workspace Features.  This will need to be set in the Attributes to Expose Parameter, or in a downstream AttributeExposer

 


 

 

If you however view the results in Data Inspector you will see on any individual Feature in the Feature Information any unexposed Attribute values.

 

 


arnold_bijlsma
Enthusiast
Forum|alt.badge.img+15

@oliver.morris : I’m getting the same: no output when using SUMMARIZE, not even unexposed output.

And its sister statement DESCRIBE even gives an error:

 


arnold_bijlsma
Enthusiast
Forum|alt.badge.img+15
  • Enthusiast
  • Best Answer
  • December 6, 2024

@oliver.morris : There is a very simple workaround: wrap the SUMMARIZE or DESCRIBE query in another SELECT * FROM clause:

SELECT * FROM (SUMMARIZE SELECT * FROM read_csv('https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv') );

And in DuckDB’s SQL dialect you don’t explicitly need the “SELECT *” or the read_csv() function, so this could be abbreviated to:

FROM ( SUMMARIZE FROM 'https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv' );

oliver.morris
Contributor
Forum|alt.badge.img+14
  • Author
  • Contributor
  • December 8, 2024

@oliver.morris : There is a very simple workaround: wrap the SUMMARIZE or DESCRIBE query in another SELECT * FROM clause:

SELECT * FROM (SUMMARIZE SELECT * FROM read_csv('https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv') );

And in DuckDB’s SQL dialect you don’t explicitly need the “SELECT *” or the read_csv() function, so this could be abbreviated to:

FROM ( SUMMARIZE FROM 'https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv' );

@arnold_bijlsma nice solution, very cool - I think this means I can build a little hub transformer to summarise a whole set of file types super quick!