Skip to main content
Solved

create a hash of a polygon


oliver.morris
Contributor
Forum|alt.badge.img+12

Hi, I have a huge dataset sitting in a table in PostGIS, there are loads of duplicate geometries that I would like to identify and hopefully remove. I am trying to see if there is a faster way than doing a geometry match. Is it possible to create a hash of a polygon so that I could compare that? I can see its possible for point but not polygon data in PostGIS. In there anything FME can bring to the party?

Thank you

Best answer by david_r

Have a look at the CRCCalculator, there are several different algorithms.

However, a different take may be to cast the geometry column to WKT (text) or WKB (bytea) in PostGIS and calculate the hash based on the results. That will certainly be faster since everything can be done within the database, e.g.

update my_table
set my_hash = MD5(encode(ST_AsEWKB(my_geometry_column), 'base64'))
View original
Did this help you find an answer to your question?

4 replies

david_r
Celebrity
  • Best Answer
  • May 19, 2020

Have a look at the CRCCalculator, there are several different algorithms.

However, a different take may be to cast the geometry column to WKT (text) or WKB (bytea) in PostGIS and calculate the hash based on the results. That will certainly be faster since everything can be done within the database, e.g.

update my_table
set my_hash = MD5(encode(ST_AsEWKB(my_geometry_column), 'base64'))

Forum|alt.badge.img

If they are exact duplicates you could convert the geometry to a WKT string (or similar) and then filter the strings using a duplicate filter. Might be quicker than spatial geometry matching.


oliver.morris
Contributor
Forum|alt.badge.img+12
  • Author
  • Contributor
  • May 19, 2020

Thanks for the super quick reply David, will go with the WKB and hopefully can then work something up in PostGIS, otherwise will go down the FME route then. Thank you again for your help.


oliver.morris
Contributor
Forum|alt.badge.img+12
  • Author
  • Contributor
  • May 19, 2020
mwiltshire wrote:

If they are exact duplicates you could convert the geometry to a WKT string (or similar) and then filter the strings using a duplicate filter. Might be quicker than spatial geometry matching.

Thank you Marcus. Hope all is well with you.


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings