Question

Spatial Query Performance


Badge +1

We are attempting to add a spatial query to our workflow. We need to derive a county from an input point. We tried reading the input point file (~2 million points) and then calling the SQLExecutor, inserting each point into the query, but the performance is poor. We then saw this article, which takes a different approach of extracting the polygon input first. Unfortunately, the county polygon definition is too big for that. So we are back to square one. Does anyone have ideas on how to better pull this off?


9 replies

Badge +2

What kind of database are you using?

If you run a spatial query directly in the database (ignore FME for the time being, and just run in pgAdmin / SQL Server Management Studio / whatever) - something like:

SELECT ST_Intersects(ST_GeometryFromText('POINT(-73 40)', 4326), geom) FROM county;

How's the performance then?

If it's slow, try running something like:

EXPLAIN SELECT ST_Intersects(ST_GeometryFromText('POINT(-73 40)', 4326), geom) FROM county;

And see whether changing the indexing on the database would help.
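If the plan shows a full scan of the county geometries, a spatial index usually helps. As a minimal sketch, assuming PostGIS and a county table with a geom column:

CREATE INDEX county_geom_idx ON county USING GIST (geom);
ANALYZE county;

With a GiST index in place, the planner can answer ST_Intersects with an index scan instead of testing every polygon.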

How does FME perform with only one point against your counties dataset?

If you're happy with the database performance but FME is slow, the next thing I would think about would be network latency.

Userlevel 4
Badge +13

So to be clear: you're reading points, and for each one you want to know which county it is in. From the sounds of it, you've got the counties in a spatial database of some kind. Without knowing more, I'd try setting up a workflow where the FME reader is the point data, then route those points into a FeatureReader configured to read from the spatial database. Turn on the spatial filter in the FeatureReader so that it does the query for you. That should give good results.
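If the spatial database is Oracle, for example, the FeatureReader's spatial filter conceptually issues something like this for each point (a sketch only; the exact SQL FME generates may differ, and :lon / :lat stand in for each point's coordinates):

SELECT st_cnty_fips_cd
FROM county
WHERE SDO_ANYINTERACT(geom,
        SDO_GEOMETRY(2001, 8307,
          SDO_POINT_TYPE(:lon, :lat, NULL), NULL, NULL)) = 'TRUE';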

Badge +1

I just ran the SQL directly and the queries are pretty fast, < 100 ms. However, when I connect an input reader (3,500 point rows) to the SQLExecutor and then write one attribute from the spatial query out to a text file, the logging seems odd.

First it logs the first 2 spatial queries like this:

ORACLE Reader: Executing SQL Statement `SELECT ST_CNTY_FIPS_CD
FROM COUNTY
WHERE SDO_RELATE( geom,
MDSYS.SDO_GEOMETRY(2001, 8307,
MDSYS.SDO_POINT_TYPE('-90.9','38.9',NULL),NULL,NULL),
'mask=CONTAINS querytype=WINDOW')='TRUE'' to obtain features for feature type `QueryFactory_SQLExecute'

Logging limit reached for the preceding message. Further instances of this message will not be logged

Then it goes into a brutally long process of reading around 50 rows at a time, logging it like this:

SQLExecutor(QueryFactory): 33 input features read

SQLExecutor(QueryFactory): 79 input features read

SQLExecutor(QueryFactory): 127 input features read

...

3,500 rows takes 6 minutes.

Just wondering if there is a transformer I should add before the SQLExecutor, or something.

Badge +1

So to be clear: you're reading points, and for each one you want to know which county it is in. From the sounds of it, you've got the counties in a spatial database of some kind. Without knowing more, I'd try setting up a workflow where the FME reader is the point data, then route those points into a FeatureReader configured to read from the spatial database. Turn on the spatial filter in the FeatureReader so that it does the query for you. That should give good results.

I just got a new workspace running as you suggested. It uses a file reader on the points, then a FeatureReader with a join on COUNTY, and then writes out to a CSV for grins. It took 7 minutes to run through 3,500 point rows. Ugh.

Badge +1

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min

None of these options will be adequate for a 2-million-row or a 7-million-row table.

Userlevel 4
Badge +13

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min

None of these options will be adequate for a 2-million-row or a 7-million-row table.

So you may have between 2 and 7 million input points, probably coming from CSV? And you want, for each of them, to find out which county it is inside? Can you advise how many counties there are in your database? And presumably you want to write this all out to CSV again afterwards? Definitely want to pursue making this really work well for you... @jbradfor

Badge +1

Use Case: We need to derive and update county info columns for two Oracle point tables (2 million and 7 million rows). We have an Oracle county boundary table for the nation.

I developed and did some light performance testing so we could select the best option:

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min
  4. Oracle trigger on the point tables (an insert or update of lat and/or long fires an Oracle Spatial query against COUNTY) – 403,013 rows = 29 min / 13,793 rows per min

Looks like the trigger is the best option.
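For reference, a minimal sketch of such a trigger, assuming hypothetical names (POINT_TABLE with LONGITUDE, LATITUDE, and CNTY_FIPS columns; COUNTY with GEOM and ST_CNTY_FIPS_CD, as in the log earlier in the thread):

CREATE OR REPLACE TRIGGER trg_point_county
BEFORE INSERT OR UPDATE OF longitude, latitude ON point_table
FOR EACH ROW
BEGIN
  -- Find the county polygon containing the new point (SRID 8307 = WGS84 lon/lat)
  SELECT c.st_cnty_fips_cd
    INTO :NEW.cnty_fips
    FROM county c
   WHERE SDO_CONTAINS(c.geom,
           SDO_GEOMETRY(2001, 8307,
             SDO_POINT_TYPE(:NEW.longitude, :NEW.latitude, NULL),
             NULL, NULL)) = 'TRUE';
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    :NEW.cnty_fips := NULL;  -- point falls outside every county
END;
/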

Userlevel 1
Badge +10

Is there a reason you cannot use an SQLCreator at the start of your workspace to do the Oracle Spatial query (see the sketch below)?

Or use a PointOnAreaOverlayer within the workspace after reading in all the counties?

Reading an Oracle point dataset containing 2.7 million records, reading an Oracle polygon dataset containing 75 counties, and using the PointOnAreaOverlayer takes me 14 minutes to assign a county to each record.
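For the SQLCreator route, the whole join can run server-side in one statement. A sketch, reusing the assumed names from above (POINT_TABLE with a hypothetical POINT_ID key, and COUNTY):

SELECT p.point_id, c.st_cnty_fips_cd
  FROM point_table p, county c
 WHERE SDO_CONTAINS(c.geom,
         SDO_GEOMETRY(2001, 8307,
           SDO_POINT_TYPE(p.longitude, p.latitude, NULL),
           NULL, NULL)) = 'TRUE';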

Userlevel 4
Badge +13

Use Case: We need to derive and update county info columns for two Oracle point tables (2 million and 7 million rows). We have an Oracle county boundary table for the nation.

I developed and did some light performance testing so we could select the best option:

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min
  4. Oracle trigger on the point tables (an insert or update of lat and/or long fires an Oracle Spatial query against COUNTY) – 403,013 rows = 29 min / 13,793 rows per min

Looks like the trigger is the best option.

If everything is already in Oracle, could you just use a single SQL statement to do the updates in one huge shot?
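Something like this sketch, with the same assumed table and column names as above:

UPDATE point_table p
   SET p.cnty_fips = (
         SELECT c.st_cnty_fips_cd
           FROM county c
          WHERE SDO_CONTAINS(c.geom,
                  SDO_GEOMETRY(2001, 8307,
                    SDO_POINT_TYPE(p.longitude, p.latitude, NULL),
                    NULL, NULL)) = 'TRUE'
       );

Run server-side like this, there is no per-feature round trip between FME and the database, which is likely where the 6-7 minute runs above were losing most of their time.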
