Hi everyone,


I’m working on an FME workspace where I download a file geodatabase containing the full dataset daily. I also have a CSV file that lists each day’s changes.

My goal is to remove the changed features from the main dataset based on the CSV and add the updated features from the file geodatabase back into the main dataset.

I’ve tried using the FeatureMerger and FeatureJoiner transformers to identify the changes, but they were too slow for this volume of data. Since the process needs to run daily, time efficiency is crucial.

Do you have any suggestions (Python scripts, SQL queries, or any other ideas) to speed up this process?

Thank you!

Have you looked into the ChangeDetector transformer?
What kind of runtimes are you seeing, and what are you hoping for?


Thank you, liamfez, for your reply.

The dataset has about 18 million records, so I do not want to process all of them daily just to find the changed data.
I just tested a small portion of the data, and even that takes about an hour!

The runtime needs to be as short as possible so the process can run daily.

I appreciate any assistance you can provide.

Thanks


@marjan_nikoukar, you might need to explain the overall goal in more detail, and share samples of your data and FME workflow.

If it is to apply changes back to an enterprise database, there are far better formats to process delta changes in than offline File Geodatabases.


Hi bwn, liamfez, danilo_fme,

I am working on an ETL process and need advice on the best approach to achieve the following:

  • Main Dataset: Stored in an SDE, where each feature has a unique ID.
  • Inputs:
    1. A daily CSV containing the unique IDs of changed features.
    2. A File Geodatabase (FGDB) containing the complete updated dataset.
  • Goal:
    1. Use the CSV to detect the changed features.
    2. Remove these features from the SDE.
    3. Add the corresponding updated features from the FGDB back to the SDE (rough pseudo-SQL sketch after the questions below).

Constraints:

  • The process will run daily, and the dataset is large, so it must be optimized for speed.

Questions:

  1. Should I use transformers like FeatureMerger or Tester to filter and process the data?
  2. Would it be better to use SQL or Python for performance?
  3. Is there a best practice for handling this kind of incremental update in FME?

Any guidance or recommendations would be greatly appreciated!
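To make the goal concrete, here is the rough pattern I have in mind, written as pseudo-SQL. I’m not confident in the exact syntax, and the names (main_dataset, fgdb_dataset, daily_changes, unique_id) are just placeholders; the CSV would of course need to be staged somewhere queryable first:

-- Step 2: remove the changed features from the main dataset
DELETE FROM main_dataset
WHERE unique_id IN (SELECT unique_id FROM daily_changes);

-- Step 3: re-insert the updated versions of those features from the FGDB
INSERT INTO main_dataset
SELECT * FROM fgdb_dataset
WHERE unique_id IN (SELECT unique_id FROM daily_changes);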


No, I don’t think you should use a FeatureMerger or Tester here. In broad terms: read the CSV, then use an SQLExecutor to read the matching features from the FGDB, and write those to SDE with the writer set to Update.

 

Each feature from the CSV will be one query though, so if you have a lot of changed features you may want to combine them into a single query.
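For example, if you first concatenate all the ids from the CSV into one comma-separated attribute (an Aggregator or ListConcatenator can do that), the combined query could be something like this. Just a sketch: table is a placeholder and id_list is the made-up name of the concatenated attribute:

SELECT * FROM table
WHERE id IN (@Value(id_list))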

 

Another thing to consider: if the FGDB contains *all* features anyway, why not simply skip the check and just write that to SDE ¯\_(ツ)_/¯ It's probably not going to be faster, but it's worth a try.


Hi redgeographics,
 
Thank you so much for your reply. 

I don’t have much experience with writing SQL queries.

Would you mind assisting me in creating the correct SQL query for this task? I’d greatly appreciate it!


I’m by no means an SQL expert myself, but the simplest (albeit potentially slower) way is to trigger an SQLExecutor for every CSV feature. Assuming there’s an id attribute in there, the query would be something like this, where FME’s @Value() function substitutes each feature’s attribute value into the statement:

SELECT * FROM table
WHERE id = @Value(your_id_attribute)
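And if you’d rather handle the removal step directly in the database instead of letting the writer’s Update mode take care of it, the delete follows the same per-feature pattern (again, table and your_id_attribute are placeholders):

DELETE FROM table
WHERE id = @Value(your_id_attribute)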

 

