Question

Poor performance when updating data, but good performance when inserting new data.


I have a dataset of around 200,000 records (point geometry with approximately 10 attributes).

I am using the ChangeDetector transformer to detect which records were changed, so that only those are updated, and to add new records or mark records as deleted.

I am using the FeatureWriter transformer to write the data into an Esri Enterprise geodatabase on MS SQL Server in the Azure cloud. The source data resides in the same geodatabase in the Azure cloud.

!!! FME Server is installed on premise, not in Azure !!!

When initializing the ETL by truncating the destination table and performing the initial load of the data, the 200,000 records are written in about 7 minutes.

!!! There is an index on the attribute that is used as the Primary Key, and that attribute is indicated (see the image below) !!!

[image]

After the initialization, the time to perform the ETL is still low (5-8 minutes) if there are no updates to the data, or only a few updates.

However, if there are many updates (e.g. 100,000 updated records), the time increases dramatically to 4 hours.

Data is written to 3 tables:

- one table storing the attribute that was updated, its initial value, its updated value, and a timestamp. This is written quite fast, in a matter of minutes.

- one table storing the destination data. This is written slowly: in batches of fewer than 20 records, with 1,000 records committed at a time.

- one table storing the history of the records, so data is only ever added. The same record that is updated is also written here, so in this table it is possible to see how a certain record looked two months ago. This is written slowly: in batches of fewer than 20 records, with 1,000 records committed at a time.

 

1) I noticed that the timestamps of the updated records are almost identical. All 100,000 updated records have timestamps within a range of about 5 seconds, which can differ by up to two hours from the time of commit. As other systems use this timestamp to perform incremental refreshes, this "wrong time" written in the database causes problems for them: they are forced to grab the data only after the entire ETL process is finished and cannot grab data while the ETL process is still running. Is there any solution to write the time of commit into an attribute? I am calculating DateTimeNow (UTC) one step before, not here:

 

[image]

2) Are there any recommendations on how to speed up the process, while keeping the FME Server on premise for the time being?


11 replies

Userlevel 1
Badge +11

Hmmm, at what point in time is the index being created? If this is done at the same time the table is initially populated, the index can end up in poor shape. Maybe you should try to rebuild the index once the table is fully populated, and then run your update process again. Does this make any difference?

Hi, the index on the Primary Key was created in ArcGIS Pro after the Feature Classes were created (empty). When the ETL is executed for the first time, the Feature Classes are empty. Except for the first execution of the ETL, all the other runs just update the Feature Classes. The index is managed by ArcSDE, not by FME.

Userlevel 1
Badge +11

OK, thanks. I think it would be wise to first populate the Feature Classes (with a first run of the ETL) and then use ArcSDE to create the index before running any updates. In that case even the initial insert of all 200,000 records (currently taking approximately 7 minutes) might go faster. I can't prove it from here; you'll have to test it on your side.
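If you want to script that, a minimal sketch using arcpy is shown below. The connection file, feature class, field, and index names are placeholders; the idea is simply to drop the attribute index before the initial bulk load and recreate it once the table is populated:

```python
import arcpy

# Placeholder connection file and object names - adjust to your environment.
sde_conn = r"C:\connections\head_office.sde"
fc = sde_conn + r"\mydb.dbo.destination_points"
key_field = "ASSET_ID"
index_name = "IX_ASSET_ID"

# Drop the attribute index so the initial bulk insert is not slowed down
# by index maintenance (ignore the error if the index does not exist yet).
try:
    arcpy.management.RemoveIndex(fc, index_name)
except arcpy.ExecuteError:
    pass

# ... run the initial FME load here ...

# Recreate the index once the table is fully populated so that
# subsequent updates can find rows by the key quickly.
arcpy.management.AddIndex(fc, [key_field], index_name, "UNIQUE", "ASCENDING")
```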

Badge +2

@iuriemaxim Have a glance through the article Improving Performance when working with Esri Geodatabases, although I can't see anything in that article that you aren't already doing.

I think it's very likely your history table - is it configured as an archive feature class? We have heard from users who have had to cull or back up their archive tables as they grow (see here and here). You could test this by writing to a temporary but empty archive and see if FME performs any faster.

Hi

@egge OK, I will try to remove the index and recreate it. As you explained, I am sure that the initial loading will then take less than 7 minutes, but the problem is not related to the initial loading; it is related to updating records and logging all the records that were updated. The ETL takes hours only when tens of thousands of records change in a day, and in that case I would expect the run to take minutes, not hours.

 

@Mark Stoakes The "History Feature Class" is not configured as an archive feature class because nobody edits the data in ArcGIS Pro; data is gathered from the branch office databases into the head office database. The "history feature class" has quite a similar role to the "archive feature class", namely it is used to log all transactions performed on the Feature Class holding the actual data. Its size is not yet too big, and the ETL takes hours even when there are many updates on the second day after the initialization.

 

I think it could also be related to the TEMPORARY data stored by FME.

Badge +2

@iuriemaxim Regarding the TimeStamp - the time FME will give is the time when the feature passed through the FME transformer. If you want the actual database record insert time, you can probably add a trigger to the table.
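A rough sketch of what such a trigger could look like is below (created once, outside of FME). It assumes SQL Server, sufficient privileges, and a hypothetical last_modified column on a table keyed by OBJECTID; if the feature class is versioned, the appropriate table would need to be targeted instead:

```python
import pyodbc

# Hypothetical connection string and object names - adjust to your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.example.com;DATABASE=gisdb;UID=loader;PWD=***"
)

# SQL Server trigger that stamps a last_modified column with the database
# server's UTC time whenever rows are inserted or updated, so downstream
# systems see the actual write time rather than the FME transformer time.
create_trigger = """
CREATE OR ALTER TRIGGER trg_destination_last_modified
ON dbo.destination_points
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET last_modified = SYSUTCDATETIME()
    FROM dbo.destination_points AS t
    INNER JOIN inserted AS i ON i.OBJECTID = t.OBJECTID;
END
"""

cursor = conn.cursor()
cursor.execute(create_trigger)
conn.commit()
```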

@Mark Stoakes Indeed, adding a trigger to the table can be a non-FME workaround. Unfortunately, special rights would be needed on the database and these rights are not available. Only write rights exist, and I suppose this is quite common for most FME users, as administrative rights are held only by the DBA.

As I see that FME is able to display the time of the writing/commit in the log, I am thinking that it should/could have the ability to write this time through the FeatureWriter transformer. Is it possible to confirm that such functionality does not exist in the FeatureWriter, even though the time exists in the log, so that I do not investigate this direction any further?

[image: FME log screenshot]

 

[image: FME log screenshot]

Any hints on speeding up the writing of updates? As can be seen in the image above, it takes 5 seconds to update 20 records. So about 240 records are updated per minute, roughly 14,400 per hour, and more than 4 hours are needed to update 65,000 records. That's quite a lot. The timestamp put in a transformer right before the FeatureWriter is 4 hours before the last commit, so this is an indication that 4 hours are spent loading/updating data, while the extract and transform take only a few minutes.
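For reference, the throughput arithmetic behind those numbers, as a quick check (the batch size and timing are read from the log screenshot above):

```python
# Observed in the log: roughly 20 records updated per 5-second batch.
records_per_batch = 20
seconds_per_batch = 5

rate_per_minute = records_per_batch / seconds_per_batch * 60   # 240 records/min
rate_per_hour = rate_per_minute * 60                           # 14,400 records/hour
hours_for_65k = 65_000 / rate_per_hour                         # about 4.5 hours

print(rate_per_minute, rate_per_hour, round(hours_for_65k, 1))
```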

Badge +2

@iuriemaxim I think you need to have your local FME reseller look at this and walk through it with them. File a support case with them. Other suggestions:

  • test the different updates separately
  • test the updates on a local SDE (a rough timing sketch follows below)
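For the second point, one way to time such a test is to run the workspace from the command line against a local destination. A rough sketch is below, assuming the workspace exposes a published parameter for the destination dataset; the paths and the parameter name --DestDataset are hypothetical:

```python
import subprocess
import time

# Hypothetical FME install path and workspace path.
fme_exe = r"C:\Program Files\FME\fme.exe"
workspace = r"C:\fme\update_head_office.fmw"

# Point the writer at a local test geodatabase instead of the Azure SDE
# by overriding a published parameter on the command line.
args = [fme_exe, workspace, "--DestDataset", r"C:\temp\test.gdb"]

start = time.time()
result = subprocess.run(args, capture_output=True, text=True)
print("Return code:", result.returncode)
print("Elapsed: %.1f seconds" % (time.time() - start))
```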

@Mark Stoakes Thank you for the valuable feedback. We will test the updates on a local file geodatabase, as we cannot test on a SQL Azure enterprise geodatabase without moving FME into the cloud (and this is not yet an option).

I am not sure I correctly understood how to test the different updates separately. Does this mean updating only one table, to see whether it takes as much time when it is updated alone? Actually, there is only one table that is updated; all the other tables only have records added.

As the ETL was done by/with the local FME distributor, I suppose it will be quite easy to file a support case with them.

Badge +3

Hi, 

I'm wondering if you found the reason and a solution. We have a similar situation: an FME workspace that updates an Esri geodatabase (file geodatabase) in an Azure VM by truncating and inserting. It takes a huge amount of time to truncate and insert 1,000 records.

Once I stop the service that references the fgdb, the FME data update runs very fast.

We did another test: if we run FME against an fgdb on premise, it is much faster regardless of whether the service referencing the fgdb is stopped or running.
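In case it helps, stopping and restarting the locking service around the FME run can be scripted against the ArcGIS Server admin REST API. Below is only a rough sketch; the server URL, folder, service name and credentials are hypothetical, and verify=False is used only because admin endpoints often sit behind a self-signed certificate:

```python
import requests

# Hypothetical server, credentials and service - adjust to your site.
admin = "https://gisserver.example.com:6443/arcgis/admin"

token = requests.post(
    f"{admin}/generateToken",
    data={"username": "siteadmin", "password": "***",
          "client": "requestip", "f": "json"},
    verify=False,
).json()["token"]

def set_service_state(action):
    # action is "stop" or "start"; stopping releases the service's lock
    # on the file geodatabase so FME can write to it quickly.
    url = f"{admin}/services/MyFolder/MyMapService.MapServer/{action}"
    r = requests.post(url, data={"token": token, "f": "json"}, verify=False)
    return r.json()

set_service_state("stop")
# ... run the FME truncate/insert against the fgdb here ...
set_service_state("start")
```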

 

 
