Question

Spatial Query Performance


Badge +1

We are attempting to add a spatial query to our workflow. We need to derive a county from an input point. We tried reading the input point file (~2 million points) and then calling the SQLExecutor, inserting each point into the query, but the performance is poor. We then saw this article, which takes a different approach of extracting the polygon input first. Unfortunately, the county polygon definition is too big for that. So we are back to square one. Does anyone have ideas on how to better pull this off?


9 replies

Badge +2

What kind of database are you using?

If you run a spatial query directly in the database (ignore FME for the time being, and just run in pgAdmin / SQL Server Management Studio / whatever) - something like:

SELECT ST_Intersects(ST_GeometryFromText('POINT(-73 40)', 4326), geom) FROM county;

How's the performance then?

If it's slow, try running something like:

EXPLAIN SELECT ST_Intersects(ST_GeometryFromText('POINT(-73 40)', 4326), geom) FROM county;

And see whether changing the indexing on the database would help.
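If the plan shows a full scan of the county geometries, a spatial index usually helps. As a minimal sketch, assuming PostGIS and a county table with a geom column:

CREATE INDEX county_geom_idx ON county USING GIST (geom);
ANALYZE county;

With a GiST index in place, the planner can answer ST_Intersects with an index scan instead of testing every polygon.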

How does FME perform with only one point against your counties dataset?

If you're happy with the database performance but FME is slow, the next thing I would think about would be network latency.

Userlevel 4
Badge +13

So to be clear: you're reading points, and for each one you want to know which county it is in. From the sounds of it, you've got the counties in a spatial database of some kind. Without knowing more, I'd try setting up a workflow where the FME reader is the point data, then route those points into a FeatureReader configured to read from the spatial database. Turn on the spatial filter in the FeatureReader so that it does the query for you. That should give good results.
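If the spatial database is Oracle, for example, the FeatureReader's spatial filter conceptually issues something like this for each point (a sketch only; the exact SQL FME generates may differ, and :lon / :lat stand in for each point's coordinates):

SELECT st_cnty_fips_cd
FROM county
WHERE SDO_ANYINTERACT(geom,
        SDO_GEOMETRY(2001, 8307,
          SDO_POINT_TYPE(:lon, :lat, NULL), NULL, NULL)) = 'TRUE';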

Badge +1

I just ran the SQL directly and the queries are pretty fast, < 100 ms. However, when I connect an input reader (3,500 point rows) to the SQLExecutor and then write one attribute from the spatial query out to a text file, the logging seems odd.

First it logs the first 2 spatial queries like this:

ORACLE Reader: Executing SQL Statement `SELECT ST_CNTY_FIPS_CD
FROM COUNTY
WHERE SDO_RELATE( geom,
MDSYS.SDO_GEOMETRY(2001, 8307,
MDSYS.SDO_POINT_TYPE('-90.9','38.9',NULL),NULL,NULL),
'mask=CONTAINS querytype=WINDOW')='TRUE'' to obtain features for feature type `QueryFactory_SQLExecute'

Logging limit reached for the preceding message. Further instances of this message will not be logged

Then it goes into a brutally long process of reading around 50 rows at a time, logging it like this:

SQLExecutor(QueryFactory): 33 input features read

SQLExecutor(QueryFactory): 79 input features read

SQLExecutor(QueryFactory): 127 input features read

...

3,500 rows takes 6 minutes.

Just wondering if there is a transformer I should add before the SQLExecutor, or something.

Badge +1

So to be clear: you're reading points, and for each one you want to know which county it is in. From the sounds of it, you've got the counties in a spatial database of some kind. Without knowing more, I'd try setting up a workflow where the FME reader is the point data, then route those points into a FeatureReader configured to read from the spatial database. Turn on the spatial filter in the FeatureReader so that it does the query for you. That should give good results.

I just got a new workspace running as you suggested. It uses a file reader on the points, then a FeatureReader with a join on COUNTY, and then writes out to a CSV for grins. It took 7 minutes to run through 3,500 point rows. Ugh.

Badge +1

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min

None of these options will be adequate for a 2-million-row or a 7-million-row table.

Userlevel 4
Badge +13

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min

None of these options will be adequate for a 2-million-row or a 7-million-row table.

So you may have between 2 and 7 million input points, probably coming from CSV? And you want, for each of them, to find out which county it is inside? Can you advise how many counties there are in your database? And presumably you want to write this all out to CSV again afterwards? Definitely want to pursue making this really work well for you... @jbradfor

Badge +1

Use Case: We need to derive and update county info columns for two Oracle point tables (2 million and 7 million rows). We have an Oracle county boundary table for the nation.

I developed and did some light performance testing so we could select the best option:

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min
  4. Oracle trigger on the point tables (an insert or update of lat and/or long fires an Oracle Spatial query against COUNTY) – 403,013 rows = 29 min / 13,793 rows per min

Looks like the trigger is the best option.
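For reference, a minimal sketch of such a trigger, assuming hypothetical names (POINT_TABLE with LONGITUDE, LATITUDE, and CNTY_FIPS columns; COUNTY with GEOM and ST_CNTY_FIPS_CD, as in the log earlier in the thread):

CREATE OR REPLACE TRIGGER trg_point_county
BEFORE INSERT OR UPDATE OF longitude, latitude ON point_table
FOR EACH ROW
BEGIN
  -- Find the county polygon containing the new point (SRID 8307 = WGS84 lon/lat)
  SELECT c.st_cnty_fips_cd
    INTO :NEW.cnty_fips
    FROM county c
   WHERE SDO_CONTAINS(c.geom,
           SDO_GEOMETRY(2001, 8307,
             SDO_POINT_TYPE(:NEW.longitude, :NEW.latitude, NULL),
             NULL, NULL)) = 'TRUE';
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    :NEW.cnty_fips := NULL;  -- point falls outside every county
END;
/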

Userlevel 1
Badge +10

Is there a reason you cannot use an SQLCreator at the start of your workspace to do the Oracle Spatial query (see the sketch below)?

Or use a PointOnAreaOverlayer within the workspace after reading in all the counties?

Reading an Oracle point dataset containing 2.7 million records, reading an Oracle polygon dataset containing 75 counties, and using the PointOnAreaOverlayer takes me 14 minutes to assign a county to each record.
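For the SQLCreator route, the whole join can run server-side in one statement. A sketch, reusing the assumed names from above (POINT_TABLE with a hypothetical POINT_ID key, and COUNTY):

SELECT p.point_id, c.st_cnty_fips_cd
  FROM point_table p, county c
 WHERE SDO_CONTAINS(c.geom,
         SDO_GEOMETRY(2001, 8307,
           SDO_POINT_TYPE(p.longitude, p.latitude, NULL),
           NULL, NULL)) = 'TRUE';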

Userlevel 4
Badge +13

Use Case: We need to derive and update county info columns for two Oracle point tables (2 million and 7 million rows). We have an Oracle county boundary table for the nation.

I developed and did some light performance testing so we could select the best option:

Summary of Test Results:

  1. SQLExecutor – 3,500 point rows = 6 min
  2. FeatureReader – 3,500 point rows = 7 min
  3. Web service (SOAPSender) – 3,500 point rows = 2 min
  4. Oracle trigger on the point tables (an insert or update of lat and/or long fires an Oracle Spatial query against COUNTY) – 403,013 rows = 29 min / 13,793 rows per min

Looks like the trigger is the best option.

If everything is already in Oracle, could you just use a single SQL statement to do the updates in one huge shot?
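Something like this sketch, with the same assumed table and column names as above:

UPDATE point_table p
   SET p.cnty_fips = (
         SELECT c.st_cnty_fips_cd
           FROM county c
          WHERE SDO_CONTAINS(c.geom,
                  SDO_GEOMETRY(2001, 8307,
                    SDO_POINT_TYPE(p.longitude, p.latitude, NULL),
                    NULL, NULL)) = 'TRUE'
       );

Run server-side like this, there is no per-feature round trip between FME and the database, which is likely where the 6-7 minute runs above were losing most of their time.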
