
PostgresWatch: A Generic FME Flow Automation for Table Monitoring

  • June 12, 2025
  • 1 reply
  • 30 views

alexbiz
Enthusiast

 



I wanted to share a project I’ve been working on: a way to "watch" a PostgreSQL table using FME Flow Automation.


Since I began working with FME Server/Flow, Automation has always stood out as one of its most powerful and easy-to-demonstrate features. 
The ability to trigger workspaces from incoming emails or new files, and to automatically send notifications, consistently makes an impact during demos.

 

But one question kept coming back:
“Can I detect changes in a database table and trigger workspaces from that?”

 

For a long time, my answer was: “Not really”, or at least, not in a clean or recommended way. I used to think the best option was for the database to notify FME Flow, not the other way around. For instance, in PostgreSQL, you could use a trigger on insert/update/delete and call an FME Flow webhook using something like PL/Python or the pgsql-http extension.
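For anyone curious what that database-side pattern can look like, here’s a rough sketch of a related variant: instead of calling the webhook directly from the trigger with PL/Python or pgsql-http, the trigger fires a NOTIFY and a small external Python listener forwards it to the FME Flow Automation webhook. The DSN, channel name, and webhook URL are placeholders, not a real setup:

```python
# Sketch only: forward PostgreSQL NOTIFY events to an FME Flow Automation webhook.
# Assumes a trigger on the watched table runs something like: NOTIFY fme_changes, '<payload>';
import select

import psycopg2
import psycopg2.extensions
import requests

DSN = "dbname=mydb user=me password=secret host=localhost"          # placeholder
WEBHOOK_URL = "https://my-fme-flow.example.com/webhook-placeholder"  # placeholder
CHANNEL = "fme_changes"                                              # placeholder

conn = psycopg2.connect(DSN)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute(f"LISTEN {CHANNEL};")

print(f"Listening on channel '{CHANNEL}'...")
while True:
    # Wait until the connection has something to read (5 s timeout to stay responsive)
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        # Hand the trigger's payload over to FME Flow; the Automation takes it from there
        requests.post(WEBHOOK_URL, json={"payload": note.payload}, timeout=10)
```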

 

That method is covered here:
- 🔗 4 methods for tracking data changes in FME (blog post)
- ▶️ Triggering FME with your Database (YouTube video)  

 

So why not let FME Flow handle the detection itself? Mainly because that typically involves maintaining a duplicate of the table somewhere that FME Flow can compare against. That means double the storage, plus a possible security issue. Your duplicated table may not have the same access controls as your production one, and sensitive data could leak.

But then I had an idea:
What if the “duplicate” table didn’t store the actual data, just a CRC hash representing each row’s content?

 

A CRC (Cyclic Redundancy Check) hash is a lightweight checksum used to detect changes or errors in data.
It’s not cryptographically secure (unlike SHA-256), but it’s fast and perfect for spotting changes.

Thankfully, FME makes this easy with the CRCCalculator transformer.

This way, storage is minimal and no sensitive data gets duplicated.
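To illustrate the idea outside of FME: what the CRCCalculator produces for a row can be approximated in plain Python with zlib.crc32 (the column names here are made up):

```python
# Rough illustration of hashing a row's content with CRC-32.
# zlib.crc32 stands in for FME's CRCCalculator; the row is invented.
import zlib

def row_crc(row, id_column="id"):
    """Concatenate all values except the ID and return a CRC-32 checksum."""
    parts = [f"{key}={row[key]}" for key in sorted(row) if key != id_column]
    return zlib.crc32("|".join(parts).encode("utf-8"))

row = {"id": 42, "name": "Bridge 12", "status": "open", "length_m": 87.5}
print(row_crc(row))  # a 32-bit checksum that changes whenever any attribute changes
```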


I built an FME workspace that can monitor any PostgreSQL table and detect changes between two executions by storing the CRC hashes in a local SQLite database.

 

 

It outputs the detected changes using Automation Writers: one for inserts, one for updates, and one for deletions. 
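If you’re wondering what the compare step boils down to, here’s a minimal sketch outside of FME, assuming a small SQLite tracking table of (id, crc) pairs; the table and column names are illustrative, not the project’s actual schema:

```python
# Minimal sketch of the compare step, outside of FME. Previous hashes live in a
# small SQLite tracking table; "current" is a dict of {id: crc} for the watched
# table (e.g. built with row_crc above). Names are illustrative only.
import sqlite3

def detect_changes(db_path, current):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS tracked (id TEXT PRIMARY KEY, crc INTEGER)")
    previous = dict(conn.execute("SELECT id, crc FROM tracked"))

    inserts = [i for i in current if i not in previous]
    deletes = [i for i in previous if i not in current]
    updates = [i for i in current if i in previous and current[i] != previous[i]]

    # Refresh the tracking table so the next run compares against this state
    conn.execute("DELETE FROM tracked")
    conn.executemany("INSERT INTO tracked (id, crc) VALUES (?, ?)", current.items())
    conn.commit()
    conn.close()
    return inserts, updates, deletes
```

In the workspace itself, those three buckets are what end up on the three Automation Writers.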

 

 

You can pair it with a Schedule Trigger in FME Flow to define how often it runs.

 

 

The challenging part? Making it generic enough so that users can plug in their own table without modifying the workspace.

 

 

The workspace uses three user parameters:

Connection: A database connection parameter, so you can pick from the PostgreSQL connections available in FME Flow. That works well with the new FME Flow-based connection storage.

PostgreSQL Table to Watch: The fully qualified name of the table to monitor (e.g., public.my_table).

PostgreSQL Table ID Column: The primary key column used to identify each record.

Only the ID column is stored in plain text in the tracking table; it’s needed to detect updates and deletions. The Automation Writers output this ID so it can be used downstream.
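Just to illustrate how the two table-related parameters can drive a single generic query (a plain-Python sketch of the idea, not the workspace internals; the table, column, and connection string are placeholders):

```python
# Illustration only: turning the "table to watch" and "ID column" parameters into
# a generic query without editing anything per table. psycopg2.sql quotes the
# identifiers safely.
import psycopg2
from psycopg2 import sql

TABLE_TO_WATCH = "public.my_table"   # user parameter (assumes a schema-qualified name)
ID_COLUMN = "gid"                    # user parameter: primary key column

schema, table = TABLE_TO_WATCH.split(".", 1)
query = sql.SQL("SELECT * FROM {schema}.{table} ORDER BY {id_col}").format(
    schema=sql.Identifier(schema),
    table=sql.Identifier(table),
    id_col=sql.Identifier(ID_COLUMN),
)

with psycopg2.connect("dbname=mydb user=me host=localhost") as conn:
    with conn.cursor() as cur:
        cur.execute(query)
        for row in cur.fetchmany(5):
            print(row)
```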

 

You can find the PostgresWatch FME Flow project on FME Hub here:
https://hub.safe.com/publishers/abizien/projects/postgreswatch

There aren’t many FME Flow projects on the Hub yet (only 8 at the time of writing, 6 of which are from Safe themselves), so I’m happy to contribute!

 

A limitation of this approach is that it requires an FME Flow engine to execute the workspace and perform the change detection.  
Unlike most native Automation triggers, which can idle without consuming engine resources, this one actively runs at each interval.

While CRC collisions are extremely rare, they are theoretically possible.  
So this is a trade-off between performance and absolute certainty.

I haven’t tested it yet on large datasets.
Depending on the size of your table and how often you run the detection, performance could become an issue, especially if hashing and comparison involve thousands (or millions) of rows per run.

 

This project is still in early stages and hasn’t been tested exhaustively.  
If you give it a try and run into any edge cases or have ideas for improvement, I’d love to hear about them!

1 reply

david_r
Celebrity
  • June 12, 2025

Very interesting stuff, thanks for sharing! :-)

I did something similar some years ago. The requirement was to detect changed rows in all tables (feature classes) between the staging and the production schema, and update the production schema accordingly. This was using SQL Server with ArcSDE on top, but I suppose the concept could be reproduced in most similar database systems.

A major challenge was the data volume: we could not use FME to compare all the data, because simply reading everything out of the database would take far too long (many hours). The solution was to offload as much as possible to the database, using the following concept:

  1. All tables in both the staging and the production schema had a binary checksum column.
  2. The checksum column was populated using a SQL Server procedure that
    1. used the SQL Server metadata views to query all columns in all tables
    2. excluded certain tables and also certain columns, e.g. OBJECTID, SHAPE.*, etc. The exclusions were defined in separate tables, allowing regex definitions.
    3. geometry columns were converted to WKT using ST_AsText()
    4. all column values incl. the string representation of the geometry were concatenated and the checksum was calculated using HASHBYTES()
  3. All checksums were updated after each batch import in the staging schema. This was very fast, many orders of magnitude faster than using FME since everything happened inside the database, no data was ever read into FME.
  4. We compared the checksums in the staging and production schema using outer joins; this basically reproduced the ChangeDetector, again without having to read any data into FME (see the rough sketch below).
  5. Based on the results of the outer joins, the rows in the production schema were either inserted or archived (soft delete using a timestamp column).

All this was orchestrated using FME, and it was very fast even with huge data volumes.
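Roughly, the outer-join comparison in step 4 could look something like this (schema, table, and column names are invented for illustration, and pyodbc just stands in for whatever orchestrates the SQL):

```python
# Rough sketch of step 4: reproduce the ChangeDetector inside SQL Server with a
# FULL OUTER JOIN on the key and the checksum columns. Names are invented.
import pyodbc

SQL = """
SELECT COALESCE(s.feature_id, p.feature_id) AS feature_id,
       CASE
           WHEN p.feature_id IS NULL THEN 'insert'   -- only in staging
           WHEN s.feature_id IS NULL THEN 'archive'  -- only in production
           ELSE 'update'                             -- in both, checksum differs
       END AS change_type
FROM staging.roads AS s
FULL OUTER JOIN production.roads AS p
       ON s.feature_id = p.feature_id
WHERE p.feature_id IS NULL
   OR s.feature_id IS NULL
   OR s.row_checksum <> p.row_checksum
"""

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
                      "DATABASE=gis;Trusted_Connection=yes")  # placeholder connection
for feature_id, change_type in conn.execute(SQL):
    print(feature_id, change_type)
```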

A few learnings:

  • We had so much data that CRC-32, MD5 or similar “short” checksums had to be excluded, as we got observable collisions. After extensive testing, SHA-256 was used.
  • Geometries that were huge or made out of complex composites (e.g. some donuts) were problematic. In the end we had to switch to using hex-encoded WKB because of limitations in WKT.
  • Comparing geometries in this way makes it important to consider floating point precision and the Esri snapping grid.
  • When inserting rows in SDE feature classes using SQL, use the Next_RowID() function to populate the OBJECTID column. This ensures the underlying database sequence is always used, avoiding potential out-of-sync conflicts later.

I’d be curious to hear from others that have tackled similar challenges!

