
Hi community, I am using FME Workbench 2024.1 to create parquet files. Those files contain DateTime columns.

 

It seems that those DateTime columns are missing a codec flag, which prevents other software (Dremio, in my use case) from loading them.

 

Using parquet-tools we can see the problem:

java -jar parquet-tools-1.9.0.jar schema ~/Downloads/part0.parquet

 

can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT64, encodings:[RLE_DICTIONARY, PLAIN, RLE], path_in_schema:[_transaction_ts], codec:null, num_values:66, total_uncompressed_size:95, total_compressed_size:113, data_page_offset:35, dictionary_page_offset:4, statistics:Statistics(max:EC 90 D1 36 42 42 10 18, min:EC 90 D1 36 42 42 10 18, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:RLE_DICTIONARY, count:1)])

 

I have attached a small Parquet example with two DateTime columns (last_update and _transaction_ts).

Can you please help me?

Is it a bug? Do I have to define an extra setting in my FME workflow?

 

Thanks in advance

 

Which version of Parquet are you using when writing?

I looked up the error and found this thread: https://github.com/pola-rs/polars/issues/3929

There is a suggestion to make sure you're using version 2.0 and to try a different compression method (Snappy?). It seems a bit strange that a different compression method would help in this case, but it's worth a shot.

 


Posting the resolution publicly in case it helps another user.

In FME 2024.1, the Apache Parquet writer outputs timestamp (DateTime) columns with nanosecond precision (https://docs.safe.com/fme/2024.1/html/FME-Form-Documentation/FME-ReadersWriters/parquet/User-Attributes.htm).
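
The nanosecond unit can be cross-checked against the parquet-tools error in the question. Below is a minimal Python sketch. It assumes that the min/max statistics bytes printed there (EC 90 D1 36 42 42 10 18) are the PLAIN-encoded INT64 value (eight bytes, little-endian, signed), which is how the Parquet format specification describes INT64 statistics. The helper names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# min/max statistics bytes copied from the parquet-tools error message
STAT_HEX = "EC 90 D1 36 42 42 10 18"


def decode_int64_stat(hex_bytes: str) -> int:
    """Decode a PLAIN-encoded INT64 statistic: 8 bytes, little-endian, signed."""
    raw = bytes.fromhex(hex_bytes)  # bytes.fromhex ignores the spaces
    (value,) = struct.unpack("<q", raw)
    return value


def epoch_nanos_to_datetime(nanos: int) -> datetime:
    """Interpret an INT64 as nanoseconds since the Unix epoch (UTC).

    datetime only has microsecond resolution, so the last three
    (sub-microsecond) digits are dropped.
    """
    seconds, rem_ns = divmod(nanos, 1_000_000_000)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds, microseconds=rem_ns // 1000)


raw_value = decode_int64_stat(STAT_HEX)
print(raw_value)                                      # 1733958708692619500
print(epoch_nanos_to_datetime(raw_value).isoformat()) # 2024-12-11T23:11:48.692619+00:00
```

Read as nanoseconds since the epoch, the statistic lands on 2024-12-11 23:11:48 UTC, a plausible transaction time. Read as microseconds, it would point tens of thousands of years into the future. That is consistent with the column being written at nanosecond precision.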

FME 2024.2 added an option to write Parquet timestamps with nanosecond, microsecond, or millisecond precision.

The output files could be loaded into Dremio after specifying timestamp_microsecond as the data type for timestamp columns in the Parquet writer.
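
Choosing timestamp_microsecond changes the unit of the stored epoch integer. The minimal sketch below shows what that rescaling does to a value. It assumes plain integer truncation; whether FME truncates or rounds is not stated in this thread. `rescale_epoch_nanos` is a hypothetical helper, not an FME function.

```python
# Number of nanoseconds in one unit of each supported precision
NANOS_PER_UNIT = {"nanosecond": 1, "microsecond": 1_000, "millisecond": 1_000_000}


def rescale_epoch_nanos(nanos: int, unit: str) -> int:
    """Convert an epoch timestamp in nanoseconds to the given unit.

    Floor division discards the sub-unit digits (and rounds toward
    negative infinity for pre-1970 values). This is an assumed
    truncation rule, not documented FME behaviour.
    """
    return nanos // NANOS_PER_UNIT[unit]


# example nanosecond timestamp: 2024-12-11T23:11:48.6926195Z
ts_ns = 1_733_958_708_692_619_500
for unit in ("nanosecond", "microsecond", "millisecond"):
    print(unit, rescale_epoch_nanos(ts_ns, unit))
# nanosecond 1733958708692619500
# microsecond 1733958708692619
# millisecond 1733958708692
```

At microsecond precision only the trailing 500 ns are lost, which is usually acceptable for transaction timestamps. Millisecond precision also discards the sub-millisecond part.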

 



Awesome, thanks for sharing!

