How to feed a WorkspaceRunner parts of a 60-million-feature dataset

I would like to input only 1,000,000 features at a time from a single 60,000,000-feature dataset (Parquet) to a WorkspaceRunner. Can I use a FeatureReader for this? I could select features based on attributes (dates or names), but I can't process the data per tile or with a spatial filter.


That's not really how the WorkspaceRunner works. If you "sent" 1 million features to it, you would trigger the child workspace 1 million times.

In the workspace you're triggering, you would need filtering logic.

As a very simple example say you had the following data...

+-----------+-----------+-------+
|   name    |   type    | count |
+-----------+-----------+-------+
| banana    | fruit     |    10 |
| apple     | fruit     |     2 |
| carrot    | vegetable |   100 |
| chocolate | other     |    10 |
| potato    | vegetable |    53 |
+-----------+-----------+-------+

You'd set your child workspace (the one referenced in the WorkspaceRunner) to read data in based on the "type". This would be a published parameter.

In your parent workspace (the one that calls the WorkspaceRunner) you would have logic to work out which types you have. Then a single feature for each type would trigger the WorkspaceRunner, passing that type to the child's published parameter.
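The parent/child split described above can be sketched in plain Python. This is only an illustration of the logic, not FME code: `distinct_types` and `run_child` are hypothetical names, and `run_child` stands in for one run of the child workspace with its "type" published parameter set.

```python
# Sample rows mirroring the table above: (name, type, count)
rows = [
    ("banana", "fruit", 10),
    ("apple", "fruit", 2),
    ("carrot", "vegetable", 100),
    ("chocolate", "other", 10),
    ("potato", "vegetable", 53),
]


def distinct_types(rows):
    """Parent logic: return each type once, in first-seen order.

    Each returned value corresponds to one trigger feature.
    """
    seen = []
    for _name, type_, _count in rows:
        if type_ not in seen:
            seen.append(type_)
    return seen


def run_child(type_value, rows):
    """Hypothetical stand-in for the child workspace.

    It keeps only the rows whose type matches the parameter it was given.
    """
    return [row for row in rows if row[1] == type_value]


# One child run per distinct type, not one per row.
triggers = distinct_types(rows)
results = {t: run_child(t, rows) for t in triggers}

for type_value, matched in results.items():
    print(type_value, len(matched))
```

With the five sample rows this gives three child runs rather than five, which is the point of sending one feature per type.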

Thank you for your reply. This has helped me understand the WorkspaceRunner better. I have been trying things out in FME, but it is still very difficult to process the 60,000,000 records.

I now wonder whether the WorkspaceRunner can pass values to published parameters in the child workspace for Start Feature and Max Features to Read? I would love to avoid having a reader read in 60,000,000 records before it can do anything at all.

You could do that, using a FeatureReader and exposing the parameters, but that approach is sometimes a bit flaky. I'd recommend using a WHERE clause instead.
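As a rough sketch of what the parent would hand to each child run under either option, here is some plain Python (not FME code). `batch_windows` and `where_clause` are hypothetical helper names. Whether the reader counts the start feature from 0 or 1, and the exact WHERE syntax it accepts, are assumptions to check against your reader.

```python
def batch_windows(total, batch_size):
    """Yield (start, max_features) pairs that together cover `total` features.

    Assumes a 0-based start offset; shift by one if the reader is 1-based.
    """
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def where_clause(column, value):
    """Build a simple equality clause, doubling any single quotes in the value."""
    escaped = str(value).replace("'", "''")
    return f"\"{column}\" = '{escaped}'"


# Option 1: one (start, max) window per child run.
windows = list(batch_windows(60_000_000, 1_000_000))

# Option 2: one WHERE clause per attribute value (here the "type" example).
clauses = [where_clause("type", t) for t in ["fruit", "vegetable", "other"]]

print(len(windows), windows[0], windows[-1])
for clause in clauses:
    print(clause)
```

The WHERE clause route does not depend on the features coming back in the same order on every run, which is one reason it tends to be the more predictable of the two.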
