I have a dataset of features that I'm trying to insert into a Postgres (pg14) database. That data has a number of text columns that together make up a "report name", plus a timestamp. The name and timestamp together make up the primary key of that dataset. There are also some more columns that contain the actual data. The table I'm inserting this into currently has 800 million rows, and I'm inserting batches of around 50k rows about a hundred times a day, so keeping indexes small is important.
Given the above situation, the text columns from the "report name" have been put into a separate "dataset_reports" table (~60k rows), with the timestamp and actual data stored separately (with a foreign key back to the report) in their own heavily partitioned "dataset_reports_data" table (~800M rows). To insert data into that table, I first need to upsert dataset_reports with all the unique report names in my feature set, then fetch the FK value for each of those reports, and only then can I insert the actual data into "dataset_reports_data".
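For reference, the two tables look roughly like this (a simplified sketch: the identity/sequence details, constraint names and partitioning scheme are my assumptions, not the exact production DDL):

CREATE TABLE dataset_reports (
    report_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    report_name_a text NOT NULL,
    report_name_b text NOT NULL,
    UNIQUE (report_name_a, report_name_b)
);

CREATE TABLE dataset_reports_data (
    report_id bigint NOT NULL REFERENCES dataset_reports (report_id),
    line_timestamp timestamptz NOT NULL,
    rest_of_the_data text NOT NULL,
    PRIMARY KEY (report_id, line_timestamp)
) PARTITION BY RANGE (line_timestamp);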
To try and reduce back and forth, what I used to do (before FME) was to bulk copy all the data I was going to insert in a temporary table, and then have the database handle all the deduplication and joining. Something like this:
-- STEP 1: Create a session-private and session-scoped temporary table
CREATE TEMPORARY TABLE _tmp_dataset_full_data (
    report_name_a text NOT NULL,
    report_name_b text NOT NULL,
    line_timestamp timestamptz NOT NULL,
    rest_of_the_data text NOT NULL
) ;
-- STEP 2: Copy everything to the DB for easy manipulation
COPY _tmp_dataset_full_data FROM STDIN ;
-- STEP 3: Fill the report names table
-- report_id is generated by a sequence column
INSERT INTO dataset_reports (report_name_a, report_name_b) (
SELECT DISTINCT report_name_a, report_name_b FROM _tmp_dataset_full_data
) ON CONFLICT (report_name_a, report_name_b)
DO NOTHING ;
-- STEP 4: Fill the data table by joining to get the possibly new report_id
INSERT INTO dataset_reports_data (report_id, line_timestamp, rest_of_the_data) (
SELECT report_id, line_timestamp, rest_of_the_data
FROM _tmp_dataset_full_data AS report_lines
INNER JOIN dataset_reports AS reports USING (report_name_a, report_name_b)
) ON CONFLICT (report_id, line_timestamp)
DO NOTHING ;
I can easily use the "Run before/after write" queries to take care of steps 1, 3 and 4, even though I'm sure that's not what they were intended for.
My problem is that FME's Postgres Writer needs a schema-qualified table to work, and will check that said table exists prior to inserting. That check is currently failing with "POSTGRES Writer: Table 'public._tmp_dataset_full_data' does not exist. Please check the Table Name specified, or modify the Table Handling value under Format Parameters".
The query FME uses to test for the table's existence (this is copied from the database's statement log) looks like this:
select tablename from pg_tables where schemaname = 'pg_temp' and tablename = '_tmp_dataset_full_data'
This seems to make step 2 impossible, because Postgres' temporary tables don't live in a fixed schema but in pg_temp, which is an alias for a per-session private schema that's actually named something like "pg_temp_102" and is unique to every session. Existence should instead be tested like this:
select tablename from pg_tables where schemaname::regnamespace = pg_my_temp_schema() and tablename = '_tmp_dataset_full_data';
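To illustrate the aliasing (the actual schema number varies per session; "pg_temp_102" below is just an example):

CREATE TEMPORARY TABLE _tmp_demo (x int);
SELECT pg_my_temp_schema()::regnamespace;
-- e.g. pg_temp_102
SELECT schemaname FROM pg_tables WHERE tablename = '_tmp_demo';
-- same pg_temp_N, never the literal 'pg_temp'

This is why FME's check against schemaname = 'pg_temp' can never match, even though referring to pg_temp._tmp_dataset_full_data in a query would resolve correctly within the same session.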
So is there a way with FME (I'm currently on 2022.1) to make the Writer understand that this is a temporary table that does exist, or perhaps to skip the check entirely and go straight to inserting?
I'm very new to FME, so I realize that this may not be the "idiomatic" way of doing things.
Best answer by vlroyrenn