Question

Apache Parquet Writer does not write DateTime columns properly

  • December 14, 2024

ahuarte47
Contributor

Hi community, I am using FME Workbench 2024.1 to create Parquet files that contain DateTime columns.

It seems that those DateTime columns are missing a codec flag, which prevents the files from being loaded in other software (Dremio in my use case).

Using parquet-tools, we can see the problem:

java -jar parquet-tools-1.9.0.jar schema ~/Downloads/part0.parquet

can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT64, encodings:[RLE_DICTIONARY, PLAIN, RLE], path_in_schema:[_transaction_ts], codec:null, num_values:66, total_uncompressed_size:95, total_compressed_size:113, data_page_offset:35, dictionary_page_offset:4, statistics:Statistics(max:EC 90 D1 36 42 42 10 18, min:EC 90 D1 36 42 42 10 18, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:RLE_DICTIONARY, count:1)])

I have included a small Parquet example with two DateTime columns (last_update and _transaction_ts).

Can you help me, please?

Is it a bug? Do I have to define an extra setting in my FME workflow?

Thanks in advance

3 replies

virtualcitymatt
Celebrity

Which version of Parquet are you using when writing?

I looked up the error and found this thread: https://github.com/pola-rs/polars/issues/3929

There is a suggestion to make sure you’re using version 2.0 and to try a different compression method (Snappy?). It seems a bit strange that a different compression method would help in this case, but it's worth a shot.


debbiatsafe
Safer
  • January 7, 2025

Posting the resolution publicly in case it helps another user.

The Apache Parquet writer in FME outputs timestamp (DateTime) columns with nanosecond precision (https://docs.safe.com/fme/2024.1/html/FME-Form-Documentation/FME-ReadersWriters/parquet/User-Attributes.htm).

FME 2024.2 added the option to write Parquet timestamps with nanosecond, microsecond, or millisecond precision.

The output files could be loaded into Dremio after specifying timestamp_microsecond as the data type for timestamp columns in the Parquet writer.
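
For anyone who wants to check the precision on their own file, here is a minimal Python sketch using only the standard library. It assumes the min/max statistic printed in the parquet-tools error in the original question is a plain little-endian INT64 holding nanoseconds since the Unix epoch, which would match the nanosecond precision described above. The helper names are hypothetical and are not part of FME or any Parquet tooling.

```python
import struct
from datetime import datetime, timezone

# Statistic bytes copied from the parquet-tools error for _transaction_ts.
STAT_HEX = "EC 90 D1 36 42 42 10 18"


def decode_int64_stat(hex_bytes: str) -> int:
    """Decode an 8-byte little-endian signed integer from a hex string."""
    raw = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces between bytes
    (value,) = struct.unpack("<q", raw)
    return value


def nanos_to_micros(nanos: int) -> int:
    """Truncate a nanosecond timestamp to microsecond precision."""
    return nanos // 1_000


def micros_to_datetime(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch to a UTC datetime."""
    seconds, remainder = divmod(micros, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=remainder
    )


# Assumption: the raw value is nanoseconds since 1970-01-01 UTC.
nanos = decode_int64_stat(STAT_HEX)
micros = nanos_to_micros(nanos)
print("nanoseconds: ", nanos)
print("microseconds:", micros)
print("as UTC time: ", micros_to_datetime(micros).isoformat())
```

If the assumption holds, the decoded value falls in December 2024, close to when the file was written. Reading the same raw integer as microseconds instead would give a date tens of thousands of years in the future, so a quick check like this can tell you which unit a column was actually written with.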

 


virtualcitymatt
Celebrity

Awesome thanks for sharing