Skip to main content
Solved

DuckDB Summarize command

  • November 2, 2024
  • 4 replies
  • 76 views

oliver.morris
Contributor
Forum|alt.badge.img+12

I am trying to use the duckdb summarize command

https://duckdb.org/docs/archive/0.9/guides/meta/summarize

in an in memory duckdb using the sqlexecutor:

SUMMARIZE SELECT * FROM read_parquet('/Users/x/Downloads/Address.parquet');

This doesn’t return any results/attributes but ‘runs’ successfully in FME. In DBeaver it runs no issues.

Any idea is there is a trick to get this working?

Many ThanksQ!

Best answer by arnold_bijlsma

@oliver.morris : There is a very simple workaround: wrap the SUMMARIZE or DESCRIBE query in another SELECT * FROM clause:

SELECT * FROM (SUMMARIZE SELECT * FROM read_csv('https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv') );

And in DuckDB’s SQL dialect you don’t explicitly need the “SELECT *” or the read_csv() function, so this could be abbreviated to:

FROM ( SUMMARIZE FROM 'https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv' );
View original
Did this help you find an answer to your question?

4 replies

bwn
Evangelist
Forum|alt.badge.img+26
  • Evangelist
  • November 3, 2024

Have you exposed the Attributes on the Workspace?

SELECT *” gives no clues to SQLExecutor as to what Workspace Attribute Names to create for generated Workspace Features.  This will need to be set in the Attributes to Expose Parameter, or in a downstream AttributeExposer

 


 

 

If you however view the results in Data Inspector you will see on any individual Feature in the Feature Information any unexposed Attribute values.

 

 


arnold_bijlsma
Enthusiast
Forum|alt.badge.img+14

@oliver.morris : I’m getting the same: no output when using SUMMARIZE, not even unexposed output.

And its sister statement DESCRIBE even gives an error:

 


arnold_bijlsma
Enthusiast
Forum|alt.badge.img+14
  • Enthusiast
  • Best Answer
  • December 6, 2024

@oliver.morris : There is a very simple workaround: wrap the SUMMARIZE or DESCRIBE query in another SELECT * FROM clause:

SELECT * FROM (SUMMARIZE SELECT * FROM read_csv('https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv') );

And in DuckDB’s SQL dialect you don’t explicitly need the “SELECT *” or the read_csv() function, so this could be abbreviated to:

FROM ( SUMMARIZE FROM 'https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv' );

oliver.morris
Contributor
Forum|alt.badge.img+12
  • Author
  • Contributor
  • December 8, 2024
arnold_bijlsma wrote:

@oliver.morris : There is a very simple workaround: wrap the SUMMARIZE or DESCRIBE query in another SELECT * FROM clause:

SELECT * FROM (SUMMARIZE SELECT * FROM read_csv('https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv') );

And in DuckDB’s SQL dialect you don’t explicitly need the “SELECT *” or the read_csv() function, so this could be abbreviated to:

FROM ( SUMMARIZE FROM 'https://oedi-data-lake.s3.amazonaws.com/pvdaq/csv/systems.csv' );

@arnold_bijlsma nice solution, very cool - I think this means I can build a little hub transformer to summarise a whole set of file types super quick!


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings