Hi everyone,


I’m working on an FME workspace where I download a file geodatabase containing the full dataset daily. I also have a CSV file that lists each day’s changes.

My goal is to remove the changed features from the main dataset based on the CSV and add the updated features from the file geodatabase back into the main dataset.

I’ve tried using the FeatureMerger and FeatureJoiner transformers to identify the changes, but they were too slow for this volume of data. Since the process needs to run daily, time efficiency is crucial.

Do you have any suggestions (Python scripts, SQL queries, or any other ideas) to speed up this process?

Thank you!

Have you looked into the ChangeDetector transformer?
What kind of runtimes are you seeing, and what are you hoping for?


Thank you, liamfez, for your reply.

The dataset has about 18 million records, so I do not want to process all of them daily just to find the changed data.
I just tested a small portion of the data, and even that takes about an hour!

The runtime needs to be as short as possible so the process can run daily.

I appreciate any assistance you can provide.

Thanks


@marjan_nikoukar, you might need to explain the overall goal in more detail, and share samples of your data and FME workflow.

If it is to apply changes back to an enterprise database, there are far better formats to process delta changes in than offline File Geodatabases.


Hi bwn, liamfez, danilo_fme,

I am working on an ETL process and need advice on the best approach to achieve the following:

  • Main Dataset: Stored in an SDE, where each feature has a unique ID.
  • Inputs:
    1. A daily CSV containing the unique IDs of changed features.
    2. A File Geodatabase (FGDB) containing the complete updated dataset.
  • Goal:
    1. Use the CSV to detect the changed features.
    2. Remove these features from the SDE.
    3. Add the corresponding updated features from the FGDB back to the SDE (rough pseudo-SQL sketch after the questions below).

Constraints:

  • The process will run daily, and the dataset is large, so it must be optimized for speed.

Questions:

  1. Should I use transformers like FeatureMerger or Tester to filter and process the data?
  2. Would it be better to use SQL or Python for performance?
  3. Is there a best practice for handling this kind of incremental update in FME?

Any guidance or recommendations would be greatly appreciated!
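To make the goal concrete, here is the rough pattern I have in mind, written as pseudo-SQL. I’m not confident in the exact syntax, and the names (main_dataset, fgdb_dataset, daily_changes, unique_id) are just placeholders; the CSV would of course need to be staged somewhere queryable first:

-- Step 2: remove the changed features from the main dataset
DELETE FROM main_dataset
WHERE unique_id IN (SELECT unique_id FROM daily_changes);

-- Step 3: re-insert the updated versions of those features from the FGDB
INSERT INTO main_dataset
SELECT * FROM fgdb_dataset
WHERE unique_id IN (SELECT unique_id FROM daily_changes);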


No, I don’t think you should use a FeatureMerger or Tester here. In broad terms: read the CSV, then use an SQLExecutor to read the matching features from the FGDB, and write those to SDE with the writer set to Update.

 

Each feature from the CSV will be one query though, so if you have a lot of changed features you may want to combine them into a single query.
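For example, if you first concatenate all the ids from the CSV into one comma-separated attribute (an Aggregator or ListConcatenator can do that), the combined query could be something like this. Just a sketch: table is a placeholder and id_list is the made-up name of the concatenated attribute:

SELECT * FROM table
WHERE id IN (@Value(id_list))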

 

Another thing to consider: if the FGDB contains *all* features anyway, why not simply skip the check and just write that to SDE ¯\_(ツ)_/¯ It's probably not going to be faster, but it's worth a try.


Hi redgeographics,
 
Thank you so much for your reply. 

I don’t have much experience with writing SQL queries.

Would you mind assisting me in creating the correct SQL query for this task? I’d greatly appreciate it!


I’m by no means an SQL expert myself, but the simplest (albeit potentially slower) way is to trigger an SQLExecutor for every CSV feature. Assuming there’s an id attribute in there, the query would be something like this, where FME’s @Value() function substitutes each feature’s attribute value into the statement:

SELECT * FROM table
WHERE id = @Value(your_id_attribute)
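And if you’d rather handle the removal step directly in the database instead of letting the writer’s Update mode take care of it, the delete follows the same per-feature pattern (again, table and your_id_attribute are placeholders):

DELETE FROM table
WHERE id = @Value(your_id_attribute)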

 

