I work with time series data a lot, and to build in some slack against “race conditions” between my own flows and other upstream jobs, I tend to use fairly generous time windows and “full rerun at the end of the day” configurations so I don’t leave any time gaps in the data destination. That means many of the features being inserted at the end of a run will already have been inserted by a previous one, and will either raise a conflict error if there is a primary key constraint or create duplicate entries if there isn’t.
This sort of scenario is what UPSERT is for, but UPSERT updates existing rows by design. The problem is that several databases, such as Postgres and Snowflake, don’t handle large numbers of rows being updated frequently very well: under the hood, the “updated row” is really a new copy of the old row with its fields changed, while the old version is merely hidden and only deleted later by a separate cleanup process (VACUUM, in Postgres’ case).
Given that any rows already present in the DB would be identical to the ones I’m inserting, I would much rather skip the conflicting rows entirely than cause unneeded churn by “updating” them with the same data. Standard SQL handles this with MERGE INTO … WHEN NOT MATCHED THEN INSERT, and there are also non-standard forms like Postgres’ INSERT INTO … ON CONFLICT DO NOTHING (sketches of both are below). The only workaround I have at the moment is inserting all my features into an intermediate staging table and using an SQLExecutor to write that data to its final destination (something not helped by not being able to use actual temporary tables in FME). Being able to set the FeatureWriter to write new features and ignore conflicts from the get-go would be a lot more convenient for me.
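For illustration, this is roughly the SQL I end up running through the SQLExecutor today; the table and column names (staging_features, features, series_id, ts, value) are just placeholders for my actual schema:

```sql
-- Postgres: copy from the staging table, silently skipping any row
-- whose primary key (series_id, ts) already exists in the destination
INSERT INTO features (series_id, ts, value)
SELECT series_id, ts, value
FROM staging_features
ON CONFLICT (series_id, ts) DO NOTHING;

-- Standard SQL / Snowflake: same idea with MERGE, inserting only
-- the rows that have no match in the destination table
MERGE INTO features tgt
USING staging_features src
  ON tgt.series_id = src.series_id AND tgt.ts = src.ts
WHEN NOT MATCHED THEN
  INSERT (series_id, ts, value)
  VALUES (src.series_id, src.ts, src.value);
```

Either statement avoids touching rows that are already there, which is exactly the behaviour I’d like the FeatureWriter to offer directly.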
Related question/community post:


