
Hi, what is the best way to avoid loading a single huge Parquet file (58 million records) completely into memory?

My memory keeps running out.

I have the option to split the data into smaller pieces with a Tester transformer that selects the data by date and then writes it out for further processing. I have done that with a child/parent workspace and a Tester plus a published parameter for the dates, but it still fails because of memory problems. Saving by date also takes very long, mainly because all 58 million records have to be loaded before they ever reach the Tester. I would love tips on splitting up the data before loading the complete 58 million records, or any other tips for dealing with Parquet files.
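For context, the kind of pre-splitting I have in mind would be something like the pyarrow sketch below, run outside the workspace (or from a PythonCaller). This is only a sketch: it assumes pyarrow is available, and the file path, output folder, and `date` column name are placeholders for my actual data.

```python
# Sketch only: paths and the "date" column name are placeholders.
import pyarrow.dataset as ds

# Open the 58-million-record file lazily; nothing is loaded into memory yet.
big = ds.dataset("huge_file.parquet", format="parquet")

# Stream the records into one folder per date value. write_dataset() scans
# the source in batches, so the whole file never has to fit in RAM at once.
ds.write_dataset(
    big,
    base_dir="split_by_date",
    format="parquet",
    partitioning=["date"],
    partitioning_flavor="hive",  # folders like date=2024-01-01/
)
```

Each date would then become its own small Parquet file, so a child workspace only has to read one slice at a time.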

 

Hello @wrijs,

 

I understand your problem with large data and a small amount of RAM.

 

Have you tried running your Parquet reader with the "Features to Read" parameters "Start Feature" and "Min Features to Read"?

 

I would try to run the workspace in pieces with these two parameters, like:

 

Workspace run 1:
Start Feature: 0
Min Features to Read: 1 000 000

Workspace run 2:
Start Feature: 1 000 000
Min Features to Read: 2 000 000

... and so on.
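If it turns into many runs, a small loop can generate the parameter values for each pass instead of typing them by hand. A minimal sketch of the arithmetic, assuming the second value is a running total as in the runs above (please verify how "Min Features to Read" behaves on a small test first):

```python
# Sketch of the run windows for 58 million records, 1 million per pass.
# "Min Features to Read" is treated here as a running total, matching the
# runs listed above; verify this on a small test before relying on it.
TOTAL_RECORDS = 58_000_000
CHUNK = 1_000_000

for run, start in enumerate(range(0, TOTAL_RECORDS, CHUNK), start=1):
    end = min(start + CHUNK, TOTAL_RECORDS)
    print(f"Workspace run {run}: Start Feature = {start}, "
          f"Min Features to Read = {end}")
```

The printed values could then be passed to the workspace as published parameters, for example from a batch script that runs the workspace once per window.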

 

I hope this helps!

 

Greetings, Michael

 

 

