
Hi All,

Today’s million-dollar question: I have created a workflow to process some polygon and table data into a template GDB. The process isn’t too complicated, but I think the most time-consuming transformers in the process are the DateTimeConverter and the FeatureWriter. I have done some tests and assume that processing around 94 million records will take days.

Mainly, writing 94 million records into a GDB template will be a nightmare.

Here is a screenshot of the workflow; I have also attached a version of it:

How can I improve the performance of the workflow and process around 94 million records in a clever way?

Open to suggestions :)

Thanks

I notice in the screenshot you have Feature Caching enabled; turn that off and you should get a massive performance boost. Also make sure your FME temp directory is set to an SSD rather than an HDD.
 

https://support.safe.com/hc/en-us/articles/25407446479373-Setting-a-temporary-file-location-for-FME-to-use-via-the-FME-TEMP-environment-variable
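
If you end up running the workspace from the command line, one way to point FME at an SSD scratch folder is to set FME_TEMP in the environment before launching fme.exe. A minimal sketch, assuming hypothetical paths for the FME install, the workspace and the SSD folder:

```python
# Sketch: run fme.exe with FME_TEMP pointed at an SSD scratch folder.
# All paths below are assumptions - adjust to your own install and drives.
import os
import subprocess

env = os.environ.copy()
env["FME_TEMP"] = r"D:\fme_temp"  # hypothetical SSD scratch folder

subprocess.run(
    [
        r"C:\Program Files\FME\fme.exe",        # hypothetical FME install path
        r"C:\workspaces\process_polygons.fmw",  # hypothetical workspace
    ],
    env=env,
    check=True,
)
```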


Another thing you could look to do is split the data into smaller subsets and process them one at a time. This could be done manually or via a parent/child workspace setup using the WorkspaceRunner.

You could also look to split it out based on the output feature classes you’re writing (so each workbench only writes one output FC rather than five).

 

Whilst some of these might not necessarily speed up the entire process, they will ‘micro-service’ it into smaller processes that can be rerun separately, and if one fails, you’ve only lost the progress of that one process.


Thanks @hkingsbury. I was planning to run it via the Quick Translator to improve the performance.

Where can I find info about the parent/child workspace runner? I couldn't find much online.

Would the ‘Group By’ option also help when processing massive amounts of data?

Thanks 


Looking at the workspace, I don’t really see anything in there that should cause a huge performance hit.

94 million records is a lot for sure; however, I wouldn’t expect days. I took a look at the writer and noticed that the Transaction Type is set to Edit Session. Is there a specific reason for this? I think this could be a big part of your performance drain. Transactions is the better choice if it’s an option, and could indeed change the process time from days to hours. Here’s a similar question where changing the transaction type had exactly that effect:



That and the Feature Caching of course as @hkingsbury mentioned. 

Do you see any mention in the log file about features being split out of bulk mode? If so, you should focus your attention on those spots to see if you can maintain bulk mode processing.

 



A parent/child setup would involve one ‘parent’ workspace that has a WorkspaceRunner in it. The parent process would be responsible for telling the child process (through the use of WHERE clauses etc. on the reader, via published parameters) what data to read in. Essentially, it splits the data into smaller chunks.

It’s likely that this offers no performance gains, but what it does provide is a safety net: you can rerun subsets of the data without needing to wait for the whole process to run again should it fail on a specific feature.

There is a very slim chance that you may see a minor performance increase using this. A large dataset may fill up the RAM and have to write to disk-based temp files, which is slower. But on the flip side, it’s very possible the overhead of starting up multiple smaller processes negates any processing speed improvements.
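
The same parent/child idea can also be sketched with FME’s Python API instead of a parent workspace, if that is easier to schedule. A rough sketch, assuming an OBJECTID-based split and a published parameter named WHERE_CLAUSE on the child workspace (both hypothetical):

```python
# Sketch: a "parent" script that runs a child workspace once per chunk,
# passing a WHERE clause through a published parameter.
# Workspace path, parameter name and ID ranges are all assumptions.
import fmeobjects

runner = fmeobjects.FMEWorkspaceRunner()
chunks = [(1, 10_000_000), (10_000_001, 20_000_000)]  # example ID ranges

for low, high in chunks:
    params = {"WHERE_CLAUSE": f"OBJECTID BETWEEN {low} AND {high}"}
    try:
        runner.runWithParameters(r"C:\workspaces\child_process.fmw", params)
    except fmeobjects.FMEException as ex:
        # Only this chunk needs a rerun; earlier chunks are already written.
        print(f"Chunk {low}-{high} failed: {ex}")
```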


Thanks @virtualcitymatt and @hkingsbury for your suggestions. 

I’ve just changed the FeatureWriter to the following settings and the performance has improved:

 

The total time of the process is now 12 hours to process and write 94 million records. A very good improvement, but it will need some tweaks.

Would the performance improve by increasing the Features Per Transaction from 5,000 to 20,000… or maybe by leaving it blank?


Increasing the transaction size probably won’t make any noticeable difference, especially on a GDB.
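
A back-of-the-envelope check of why: the batch size only changes how many commits happen, and at either size the commit overhead is already amortised over thousands of features.

```python
# Back-of-the-envelope: number of transactions/commits for 94M records.
records = 94_000_000
for batch in (5_000, 20_000):
    print(f"{batch:>6} features per transaction -> {records // batch:,} commits")
# 5,000 -> 18,800 commits; 20,000 -> 4,700 commits
```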

