Would like to hear the opinions of the collective on best practices for writing features back to a SQL database. Specifically Intergraph SQL in my case, but I guess this could apply to many formats.
I have a table with millions of linear features that need to go through a LineCombiner process before being written back to the same database. The big question: what is the best methodology for deleting the old geometry and inserting the new? Two methods I've used in the past, each with their own downsides.
• SQLExecutor/DatabaseDeleter based on the unique ID. Main concern with this is that if the translation fails after the delete but before the new features are written, those features are gone.
• Let the writer do a DELETE/INSERT (fme_db_operation; rough sketch below). To me this is safer, but it can take twice as long due to the number of transactions.
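For reference, per feature the second option boils down to roughly this on the database side (generic SQL sketch; linear_features, feature_id and geom are placeholder names, and the actual transaction boundaries depend on the writer's features-per-transaction setting):

```sql
-- Hypothetical sketch of the DELETE/INSERT pair the writer issues when
-- fme_db_operation drives the operation. If both statements land in the
-- same writer transaction, a failed insert rolls the delete back too.
BEGIN TRANSACTION;

-- fme_db_operation = DELETE: remove the old geometry by its unique ID
DELETE FROM linear_features
WHERE  feature_id = @feature_id;

-- fme_db_operation = INSERT: write the combined geometry back
INSERT INTO linear_features (feature_id, geom)
VALUES (@feature_id, @new_geom);

COMMIT;
```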
I can’t simply truncate the table because we are sometimes only reading a subset of the features in a specific AOI within the database. Truncating would also delete the features outside of our AOI.
Is there another method I’m missing?
I don't know Intergraph SQL, so I might be missing some specifics.
You know which feature needs to be updated based on an ID, right? Is there no option to just update the geometry instead of fully deleting the feature? (For PostGIS I know that is possible.)
Alternatively, I'm thinking about a procedural language SQL approach, again if applicable in your query language (again, for PostgreSQL it is possible). That way, if the delete works but the create fails, the whole statement is rolled back. You might need to commit the changes in batches of x features so you don't lose too much work when it fails.
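For example, something along these lines in PostgreSQL 11+ (just a sketch; staging_features, linear_features and the column names are made up). The delete and insert for each feature run in the same transaction, so a failed insert rolls the delete back, and a COMMIT every x rows (5,000 here) keeps you from losing too much work:

```sql
CREATE OR REPLACE PROCEDURE replace_features()
LANGUAGE plpgsql
AS $$
DECLARE
    rec     RECORD;
    counter integer := 0;
BEGIN
    FOR rec IN SELECT feature_id, new_geom FROM staging_features LOOP
        -- Delete and re-insert inside the current transaction
        DELETE FROM linear_features WHERE feature_id = rec.feature_id;

        INSERT INTO linear_features (feature_id, geom)
        VALUES (rec.feature_id, rec.new_geom);

        counter := counter + 1;
        IF counter % 5000 = 0 THEN
            COMMIT;  -- end the batch; a later failure only loses rows since this point
        END IF;
    END LOOP;
END;
$$;

CALL replace_features();  -- the final partial batch commits when the call completes
```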
Other alternative: write to a temp table and do the updates from there?
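Roughly like this (PostgreSQL-flavoured sketch, placeholder names), so the old geometry is never deleted up front, just overwritten in one atomic statement:

```sql
-- Load the LineCombiner output into a staging table first,
-- then swap the geometry in a single set-based UPDATE.
UPDATE linear_features AS f
SET    geom = s.new_geom
FROM   staging_features AS s
WHERE  f.feature_id = s.feature_id;
```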
Not exactly sure what the most optimal "best practice" is, since in my mind there are several parameters and variables that can make each design implementation vary quite a bit. Below is an example that has served me well on several client projects (mostly dealing with Enterprise geodatabases in SQL Server and Microsoft SQL Server databases, but the design theory should be relatively the same with other databases).
Basically, we had several disparate databases (databases coming from other sources or entities that fed us large linear data) that end up providing the edits to our master database. Before a feature can be written, each feature is assigned a UUID (if it doesn't already have one), then all the attributes we want to "replicate" and the geometry undergo validation, then geometry numbers are exposed and calculated, CRC values are assigned, and then the incoming features are compared against the master and routed accordingly (INSERT, UPDATE, DELETE, UNCHANGED). Features are written through transactions or, if applicable (versioned edit sessions), through fme_db_operation.
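For context, that compare-and-route step is essentially change detection on the UUID plus the CRC values. In the workspace it is done with transformers rather than SQL, but expressed in plain SQL the classification is roughly this (sketch only, placeholder table names):

```sql
-- Classify incoming features against the master by UUID + CRC
SELECT i.uuid,
       CASE
           WHEN m.uuid IS NULL THEN 'INSERT'   -- not in the master yet
           WHEN m.crc <> i.crc THEN 'UPDATE'   -- attributes/geometry changed
           ELSE 'UNCHANGED'
       END AS db_operation
FROM incoming AS i
LEFT JOIN master AS m ON m.uuid = i.uuid

UNION ALL

-- In the master but missing from the incoming data
SELECT m.uuid, 'DELETE' AS db_operation
FROM master AS m
LEFT JOIN incoming AS i ON i.uuid = m.uuid
WHERE i.uuid IS NULL;
```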
This workflow has provided us with several redundant QC methods within the FME workspace and also allows us to log each feature transaction along the way, since we have to deal with so much schema drift and so many geometry checks before we can consider submitting the edit up to our master.
So that is our workbench in a nutshell. To increase the speed enough to process millions of linear records, we put the whole workbench into a custom transformer and set it up as a parallel process that batches chunks of the linear features at a time to avoid .ffs locks and large memory pulls. Then, during the transactional writing, we set the writer to limit the number of features per transaction (I think to 5,000) to avoid database locks. When all is said and done, we are currently able to run about 1.6 million features in just under 10 minutes, and some of our large runs of about 40 million linear features take about 4-5 hours depending on the number of deltas we get (the number of database updates that need to be written back to the master).
In summary, I prefer the fme_db_operation method due to the stability and QC it provides during writing.