Skip to main content

Hi, I have a huge dataset sitting in a table in PostGIS, there are loads of duplicate geometries that I would like to identify and hopefully remove. I am trying to see if there is a faster way than doing a geometry match. Is it possible to create a hash of a polygon so that I could compare that? I can see its possible for point but not polygon data in PostGIS. In there anything FME can bring to the party?

Thank you

Have a look at the CRCCalculator, there are several different algorithms.

However, a different take may be to cast the geometry column to WKT (text) or WKB (bytea) in PostGIS and calculate the hash based on the results. That will certainly be faster since everything can be done within the database, e.g.

update my_table
set my_hash = MD5(encode(ST_AsEWKB(my_geometry_column), 'base64'))

If they are exact duplicates you could convert the geometry to a WKT string (or similar) and then filter the strings using a duplicate filter. Might be quicker than spatial geometry matching.


Thanks for the super quick reply David, will go with the WKB and hopefully can then work something up in PostGIS, otherwise will go down the FME route then. Thank you again for your help.


If they are exact duplicates you could convert the geometry to a WKT string (or similar) and then filter the strings using a duplicate filter. Might be quicker than spatial geometry matching.

Thank you Marcus. Hope all is well with you.


Reply