Hi, what is the best way to avoid loading a single huge Parquet file (58 million records) completely into memory? I keep running out of memory.
I have the option to split the data into smaller pieces by filtering on date with a Tester transformer and then saving each subset for further processing. I have done this with a parent-child workspace, using a Tester plus a published parameter for the dates, but it still fails because of memory problems. Saving by date also takes very long, mainly because all 58 million records have to be loaded before they ever reach the Tester. I would love tips on splitting the data up before reading in the complete 58 million records (something like the sketch below is what I have in mind), or any other tips for dealing with large Parquet files.
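For context, this is a rough sketch of the kind of chunked / filtered read I am hoping to achieve, for example inside a PythonCaller, using pyarrow. The file path, the batch size, and the date column name ("date", stored as ISO strings) are placeholders, not my actual schema:

```python
import pyarrow.dataset as ds
import pyarrow.parquet as pq

PARQUET_PATH = "huge_file.parquet"   # placeholder path to the 58M-record file

# Option 1: stream the file in fixed-size record batches, so only one
# chunk of rows is ever held in memory instead of all 58 million.
pf = pq.ParquetFile(PARQUET_PATH)
for batch in pf.iter_batches(batch_size=1_000_000):
    # ... process or write out this chunk, then move on to the next one
    print(batch.num_rows)

# Option 2: push the date selection down into the Parquet scan, so the
# reader only materialises the rows for a single date.
# (Column name "date" and ISO-formatted string values are assumptions.)
dataset = ds.dataset(PARQUET_PATH, format="parquet")
one_day = dataset.to_table(filter=ds.field("date") == "2024-01-01")
print(one_day.num_rows)
```

Either reading by record batches or pushing the date filter down into the scan would mean only one slice of the file is in memory at a time, which is essentially what I am trying to get out of my current Tester setup.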