I would like to see FeatureWriter or another transformer updated to allow data being written to a temporary table and an SQL statement (or several) to be run afterwards to manipulate that data, most likely to move it to its permanent location and/or create partitions for the range to be inserted.
Related to a community question I opened last year, and to a similar idea I submitted for temporary files.
One method I commonly use (at least in Postgres) to efficiently write large amounts of data to large tables is to create a temporary table that’s identical to my target table (minus indexes and constrainsts), use some bulk loading method like COPY to fill that table with the data I’m inserting, and then running whatever statement I need to insert/upsert the final table with that data. This allows the DBMS to use whatever method works best for filling that data and does the actual data insertion as a single transaction. For FME specifically, this allows rows to be upserted without opting out of bulk-mode insertion.
As you can see from my community question, this turns out to be very complicated with FME because transactions/sessions aren’t scoped across transformer nodes in a way that makes it possible to use temporary tables, so you need to:
- Create a schema for fake temporary tables
- Figure out some way to make a table name that is unique per run, which works reliably for both FME Form and FME Flow and fits within the table name length limits
CREATE UNLOGGED TABLE “fme_fake_temp”.temp_table_name LIKE source_table_name EXCLUDING CONSTRAINTS EXCLUDING INDEXES
- In a subsequent SQLExecutor, run your insert statement and drop the table
Dropping the table is often unreliable, insertion errors will often cause the temporary table to remain there and slowly inflate the fake temporary schema, requiring preriodical cleanup. Creating a unique table name is also inconveinient; $(FME_UUID)
(which is what TempPathnameCreator uses) would work reasonnably well, since it’s an underscore-separated UUID prefixed with the transformer name, but it doesn’t work with FeatureWriter and requires transformer names to be valid table identifiers and shorter than 25 characters long to fit under Postgres’ 63 character limit. Besides that, FME offers no identifier that is unique per run and available on both Form and Flow.
This is a lot of gymnastics for what I believe is a fairly common data insertion pattern, and FME would benefit from supporting it “natively”, so to speak, possibly using actual temporary tables. Other DBMS support similar private/temporary tables, like Oracle’s Private Temporary Tables, MySQL’s temporary tables, MS-SQL’s #TemporaryTables, or the (somewhat more convoluted) SQLite approach of mounting a separate database (possibly one that only exists in :memory:
) as a schema, and creating temporary tables there. For FeatureWriter proper, however, these mechanisms wouldn’t work, so having FME track temporary “real” tables and drop them at the end of the flow run would likely be the least disruptive option. Using the actual mechanisms for session-local tables would probably require something more akin to InlineQuerier.