
Hi community, I am using FME Workbench 2024.1 to create Parquet files that contain DateTime columns.

 

It seems that these DateTime columns are missing a codec flag, which prevents the files from being loaded in other software (Dremio in my use case).

 

Using parquet-tools, we can see the problem:

java -jar parquet-tools-1.9.0.jar schema ~/Downloads/part0.parquet

 

can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT64, encodings:[RLE_DICTIONARY, PLAIN, RLE], path_in_schema:[_transaction_ts], codec:null, num_values:66, total_uncompressed_size:95, total_compressed_size:113, data_page_offset:35, dictionary_page_offset:4, statistics:Statistics(max:EC 90 D1 36 42 42 10 18, min:EC 90 D1 36 42 42 10 18, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:RLE_DICTIONARY, count:1)])
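The min and max in the statistics above are raw bytes. As a quick sanity check, they can be decoded with a small Python sketch that uses only the standard library. It assumes the bytes are a little-endian signed INT64 (the physical type reported in the message) counting nanoseconds since the Unix epoch. The unit is an assumption, because the schema's logical type is not shown, although milliseconds or microseconds would give dates far in the future. The helper names are illustrative, not part of any library:

```python
import struct
from datetime import datetime, timedelta, timezone

# Hex bytes copied from the "statistics" field of the parquet-tools error.
STAT_HEX = "EC 90 D1 36 42 42 10 18"


def decode_int64_le(hex_bytes: str) -> int:
    """Decode space-separated hex bytes as a little-endian signed 64-bit int."""
    raw = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces
    return struct.unpack("<q", raw)[0]


def nanos_to_datetime(nanos: int) -> datetime:
    """Interpret an int as nanoseconds since the Unix epoch (assumed unit)."""
    seconds, rem_ns = divmod(nanos, 1_000_000_000)
    # datetime only has microsecond precision, so the last 3 digits are dropped
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds, microseconds=rem_ns // 1000)


value = decode_int64_le(STAT_HEX)
print(value)                                # 1733958708692619500
print(nanos_to_datetime(value).isoformat())  # 2024-12-11T23:11:48.692619+00:00
```

Under that assumption, min and max both decode to a plausible timestamp, which suggests the timestamp values themselves are intact and the reported problem is limited to the missing `codec` field in the column metadata.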

 

I have attached a small Parquet example file with two DateTime columns (last_update and _transaction_ts).

Could you please help me?

Is it a bug? Do I have to define an extra setting in my FME workflow?

 

Thanks in advance

 

Which version of Parquet are you using when writing?

I looked up the error and found this thread: https://github.com/pola-rs/polars/issues/3929

There is a suggestion to make sure you're using version 2.0 and to try a different compression method (Snappy?). It seems a bit strange that a different compression method would help in this case, but it's worth a shot.

 

