Question

Apache Parquet Writer does not write DateTime columns properly

  • December 14, 2024
  • 3 replies
  • 32 views

ahuarte47
Contributor

Hi community, I am using FME Workbench 2024.1 to create parquet files. Those files contain DateTime columns.

 

It seems that those DateTime columns are missing a codec flag, which prevents the files from loading in other software (Dremio in my use case).

 

Using parquet-tools we can see the problem:

java -jar parquet-tools-1.9.0.jar schema ~/Downloads/part0.parquet

 

can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT64, encodings:[RLE_DICTIONARY, PLAIN, RLE], path_in_schema:[_transaction_ts], codec:null, num_values:66, total_uncompressed_size:95, total_compressed_size:113, data_page_offset:35, dictionary_page_offset:4, statistics:Statistics(max:EC 90 D1 36 42 42 10 18, min:EC 90 D1 36 42 42 10 18, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:RLE_DICTIONARY, count:1)])

 

I include here a small parquet example with two datetime columns (last_update and _transaction_ts).

Please, can you help me?

Is it a bug? Do I have to define an extra setting in my FME workflow?

 

Thanks in advance

 

3 replies

virtualcitymatt
Celebrity

Which version of Parquet are you using when writing?

I looked up the error and found this thread: https://github.com/pola-rs/polars/issues/3929

There is a suggestion to make sure you’re using version 2.0 and to try a different compression method (Snappy?). It seems a bit strange that a different compression method would help in this case, but it's worth a shot.

 


debbiatsafe
Safer
  • January 7, 2025

Posting the resolution publicly in case it helps another user.

In FME 2024.1, the Apache Parquet writer outputs timestamp (DateTime) columns with nanosecond precision (https://docs.safe.com/fme/2024.1/html/FME-Form-Documentation/FME-ReadersWriters/parquet/User-Attributes.htm).
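The nanosecond precision can be checked against the statistics bytes shown in the error output in the question. The following is a minimal sketch, not part of FME or parquet-tools. It assumes that the `min`/`max` statistics of the INT64 column are stored as little-endian signed integers (Parquet's plain encoding) and that the value counts nanoseconds since the Unix epoch. The variable names are only illustrative.

```python
from datetime import datetime, timezone

# Bytes copied from the min/max statistics of the _transaction_ts column
# in the parquet-tools error output.
raw = bytes.fromhex("EC 90 D1 36 42 42 10 18")

# Assumption: plain-encoded INT64, i.e. little-endian and signed.
ns = int.from_bytes(raw, byteorder="little", signed=True)

# Assumption: the integer counts nanoseconds since 1970-01-01 UTC.
seconds, nanos = divmod(ns, 1_000_000_000)
moment = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(moment.isoformat(), f"+ {nanos} ns")
```

Under these assumptions the value decodes to a moment in December 2024, which matches the date of the post, and the fractional part has digits below the microsecond level.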

FME 2024.2 added the option to write Parquet timestamps with nano-, micro-, or millisecond precision.

The output files could be loaded into Dremio after specifying timestamp_microsecond as the data type for timestamp columns in the Parquet writer.
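To illustrate what the precision choice means for the stored values, here is a small sketch of the arithmetic only. It is not how FME implements the setting. The helper `to_unit`, the `NS_PER_UNIT` table, and the sample value are hypothetical names and data chosen for illustration.

```python
# Number of nanoseconds in one tick of each target unit.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000}


def to_unit(ns: int, unit: str) -> int:
    """Convert a nanosecond epoch timestamp to a coarser unit (floor division)."""
    return ns // NS_PER_UNIT[unit]


# Illustrative sample: a nanosecond timestamp with sub-microsecond digits.
sample_ns = 1_733_958_708_692_619_500

for unit in ("ns", "us", "ms"):
    print(unit, to_unit(sample_ns, unit))
```

Writing with microsecond precision drops the last three digits of a nanosecond value, so any sub-microsecond detail is lost in exchange for compatibility with readers that do not accept nanosecond timestamps.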

 


virtualcitymatt
Celebrity
Awesome, thanks for sharing.

