Suppose I have a dataset containing 10 million records, and I want to read it 1 million features at a time and run my processing in parallel as 10 batches. Can someone please suggest a method?

How can we read and process the data in batches of, say, 1 million records?

takashi
January 31, 2022

What is the source dataset format?


bhavyagandhi
Author
January 31, 2022

takashi wrote:

What is the source dataset format?

Shapefile. But regardless of the format, I want to send 1 million records (or rows, if it's in a database) through as one set, and run 10 batches like this in parallel through the same series of processes and transformers. I just want to know how this can be done in parallel batches.

takashi
January 31, 2022

bhavyagandhi wrote:

Shapefile. But regardless of the format, I want to send 1 million records (or rows, if it's in a database) through as one set, and run 10 batches like this in parallel through the same series of processes and transformers. I just want to know how this can be done in parallel batches.

One possible way is to wrap the transformers you need to run in parallel into a custom transformer, configure its parallel processing parameters, and run it for each group (i.e., each block of 1 million features).

The attached screenshots illustrate how to create a transformer parameter linked to the Group By parameter, and how to set the Parallel Processing parameter to a parallel mode (Minimal or above).

[Screenshot: custom-transformer-parameters-1]
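
The idea of running the same chain of transformers once per group, with the groups processed concurrently, can be sketched outside FME in Python. This is only an analogy under stated assumptions: `process_block` is a hypothetical stand-in for the transformers inside the custom transformer, records are plain dicts with a hypothetical `group_id` attribute, and a thread pool is used to keep the sketch self-contained (for CPU-bound Python work, `ProcessPoolExecutor` would be the usual choice).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_block(group_id, records):
    """Hypothetical stand-in for the transformers wrapped in the custom transformer.

    Here it just counts the records and sums a numeric 'value' attribute.
    """
    return group_id, len(records), sum(r["value"] for r in records)


def run_per_group(records, max_workers=4):
    """Split records by their 'group_id' attribute and process each group concurrently."""
    groups = defaultdict(list)
    for r in records:
        groups[r["group_id"]].append(r)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_block, gid, recs) for gid, recs in groups.items()]
        # Sort the results by group ID so the output order is deterministic.
        return sorted(f.result() for f in futures)


if __name__ == "__main__":
    data = [{"group_id": i // 3, "value": i} for i in range(9)]
    print(run_per_group(data))  # → [(0, 3, 3), (1, 3, 12), (2, 3, 21)]
```

Each group is handed to the pool independently, so how many groups run at once is bounded by `max_workers`, loosely comparable to choosing a parallelism level on the custom transformer.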


bhavyagandhi
Author
January 31, 2022

How can we process it in blocks, i.e., the first 1 million features, then the next million, and then the next? Is there a way to split the data into blocks and run them in parallel? I saw that all of the features going into the custom transformer through different streams were processed one after another, not in parallel.


hkingsbury
January 31, 2022

Set the Group By Mode to "Process At End" and use a transformer such as a modulo counter to split the features into X groups.
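
Grouping with a modulo counter deals the features out round-robin, so each group gets roughly the same number of features, but the groups are interleaved rather than contiguous. A minimal Python sketch of that assignment, assuming a zero-based sequence number per feature (the function name is hypothetical, not an FME API):

```python
def modulo_group_ids(n_features, n_groups):
    """Assign group IDs round-robin: feature i goes to group i % n_groups."""
    return [i % n_groups for i in range(n_features)]


if __name__ == "__main__":
    print(modulo_group_ids(10, 3))  # → [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```

Because consecutive features land in different groups, no group is complete until the last feature has arrived, which is why this approach is paired with "Process At End".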

takashi
January 31, 2022

In this case, I think it would be more efficient to keep the features in their original order. One possible way is to use a Counter to add a sequential number to each feature, then calculate an integer group ID with this expression:

@floor(@Value(_count) / 1000000)
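
The expression above maps each sequential number to the index of its 1-million-feature block. The same arithmetic in Python, assuming the count starts at 0 (if it started at 1, feature number 1,000,000 would fall into group 1 and the last group would hold a single feature):

```python
BLOCK_SIZE = 1_000_000


def block_group_id(count, block_size=BLOCK_SIZE):
    """Integer group ID for a zero-based count, like @floor(@Value(_count) / 1000000).

    For non-negative integers, floor division (//) equals floor(count / block_size).
    """
    return count // block_size


if __name__ == "__main__":
    total = 10_000_000
    print(block_group_id(0), block_group_id(999_999),
          block_group_id(1_000_000), block_group_id(total - 1))  # → 0 0 1 9
```

With 10 million features this yields exactly 10 contiguous groups, numbered 0 to 9.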

You can then set the Group By Mode parameter to "Process When Group Changes (Advanced)".
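
Because the Counter-based group IDs are contiguous, a group can be handed off as soon as the ID changes instead of waiting for the whole dataset. A Python analogy of that streaming behaviour, using `itertools.groupby` (which only groups consecutive items) over a feature iterator; `stream_blocks` and `handle_block` are hypothetical names, and `handle_block` stands in for the downstream processing:

```python
from itertools import groupby


def stream_blocks(features, block_size):
    """Yield (group_id, block) pairs; a block is emitted as soon as the group ID changes."""
    numbered = enumerate(features)  # sequential number starting at 0, like a Counter
    for gid, items in groupby(numbered, key=lambda pair: pair[0] // block_size):
        # Materialise only the current block before moving on to the next one.
        yield gid, [feature for _, feature in items]


def handle_block(gid, block):
    """Hypothetical downstream processing for one block."""
    return gid, len(block)


if __name__ == "__main__":
    features = (f"feature-{i}" for i in range(7))
    print([handle_block(gid, block) for gid, block in stream_blocks(features, 3)])
    # → [(0, 3), (1, 3), (2, 1)]
```

Only one block is held in memory at a time here; to run blocks in parallel, each yielded block could be submitted to a `concurrent.futures` executor instead of being handled inline.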

[Added] The attached screenshot illustrates what I mean.

[Screenshot: workflow-example]


bhavyagandhi
Author
February 2, 2022

Thank you for the suggestions, I appreciate it 😊
