Question

I have a database table with about 5 million objects. Each object has positions (lat, long in two fields). I want to read only the objects that lie within polygons (features read from a spatial source). Is it possible without reading all objects?

  • 31 March 2023
  • 8 replies
  • 3 views

I have a database table with about 5 million objects. Each object has positions (lat, long in two fields). I want to read only the objects that lie within polygons (features read from a spatial source). Is it possible without reading all objects?

8 replies

Userlevel 4
Badge +26

Just to clarify, which format are we talking about here. And when you say you have objects with Lat Long do they also have a geometry or just the columns with the Lat Long which represent their positions?

Userlevel 4

Depends on if the rows are in a spatially enabled database or not (e.g. PostGIS, SQL Server or Oracle Spatial), but also partially on the complexity of the polygons.

Alternative 1 - non-spatial database

In FME, get the bounding box of the polygon (BoundsExtractor) and use it to generate a WHERE-clause to read all rows where the lat/lon values lie within the rectangle. This should drastically reduce the number of rows read into FME. Then use e.g. a VertexCreator + Clipper in FME to only retain the rows where the point lies within the actual polygon.

Alternative 2 - spatial database

Convert the polygon, or its bounding box, to OGC WKT using the GeometryExtractor and use the spatial functions of the database in the WHERE-clause. This depends heavily on the database, but here's an example for SQL Server:

select *
from my_points_table as pt
where geometry::Point([pt.Latitude], [pt.Longitude], 4326).STIntersects(geometry::STGeomFromText('<Bounding box WKT here>))', 4326)) = 1

 

 

Hi! The first is a MicrosoftSQL table. It has (two) columns with Lat Long (respectively) which represent their positions. The other, that contains polygon is a geopackage with spatial information.

 

Just to visualise: The table contains trees and the geopackage contains stands. I have positions for each tree (around 5m). There is just around 1000 stands. I want to relate/join the trees to the stands.

 

If that makes sense... :)

Badge +2

@sveamarc​ If it's just two columns of Lat and Long then probably the BETWEEN operator is easiest. You could include this on the WHERE Clause of any of the FME database reader Feature Types, or build a ful SQL statement in a SQLExecutor.

Depends on if the rows are in a spatially enabled database or not (e.g. PostGIS, SQL Server or Oracle Spatial), but also partially on the complexity of the polygons.

Alternative 1 - non-spatial database

In FME, get the bounding box of the polygon (BoundsExtractor) and use it to generate a WHERE-clause to read all rows where the lat/lon values lie within the rectangle. This should drastically reduce the number of rows read into FME. Then use e.g. a VertexCreator + Clipper in FME to only retain the rows where the point lies within the actual polygon.

Alternative 2 - spatial database

Convert the polygon, or its bounding box, to OGC WKT using the GeometryExtractor and use the spatial functions of the database in the WHERE-clause. This depends heavily on the database, but here's an example for SQL Server:

select *
from my_points_table as pt
where geometry::Point([pt.Latitude], [pt.Longitude], 4326).STIntersects(geometry::STGeomFromText('<Bounding box WKT here>))', 4326)) = 1

 

 

Hi David! Thanks! The 2nd alternative looks right. But as a FME-beginner i have to ask: How do i refer to the WKT string? Lets say I used a GeometryExtractor and got a new column _geometry. Should I use that as an initiator to the FeatureReader with where-clause?

Userlevel 4

Hi David! Thanks! The 2nd alternative looks right. But as a FME-beginner i have to ask: How do i refer to the WKT string? Lets say I used a GeometryExtractor and got a new column _geometry. Should I use that as an initiator to the FeatureReader with where-clause?

You can use the SQLExecutor rather than the FeatureReader with the following SQL:

select *
from my_points_table as pt
where geometry::Point([pt.Latitude], [pt.Longitude], 4326).STIntersects(geometry::STGeomFromText('@Value(_geometry)', 4326)) = 1

This will insert the contents of "_geometry" into the SQL. However, note that if the WKT gets very long, it might exceed the maximum string literal length for SQL Server and throw an exception. In that case it might be better to just use the bounding box, as suggested by @Mark Stoakes​ below.

@sveamarc​ If it's just two columns of Lat and Long then probably the BETWEEN operator is easiest. You could include this on the WHERE Clause of any of the FME database reader Feature Types, or build a ful SQL statement in a SQLExecutor.

Hello Mark, and thank you for your answer. With a risk of being annoying... How could such clause look like? Would it be better (faster) to use FeatureReader instead of SQLExecutor? I saw its around 50m objects instead of 5 as i first thought...

Userlevel 4

Hello Mark, and thank you for your answer. With a risk of being annoying... How could such clause look like? Would it be better (faster) to use FeatureReader instead of SQLExecutor? I saw its around 50m objects instead of 5 as i first thought...

I'm not Mark, but I'd try the following WHERE-clause (you'll need the BoundsExtractor before the FeatureReader).:

(latitude BETWEEN @Value(_xmin) AND @Value(_xmax)) AND (longitude BETWEEN @Value(_ymin) AND @Value(_ymax))

imageThis will return all rows where the lat/lon values intersect the incoming polygon bounding box.

Make sure to replace "latitude" and "longitude" with the actual column names of the database table. I also highly recommend indexing those two fields if performance is important to you.

You can send the output of the FeatureReader to a VertexCreator to convert the lat/lon value to a point and then use a Clipper transformer to only retain the points actually inside the polygon.

Reply