Question

FeatureReader returns duplicates when using Spatial Filter and Where clause on SQL Server


Badge +7

FME 2017.0 (17271).

I have a polygon layer which consists of an area with buffer zones around it. The polygons do not overlap. I also have point data of address which covers a larger area and includes postal and non-postal addresses. I want to select only the postal addresses that fall within the polygons.

The address data is stored in SQL Server using the Geometry data type to store the locations (WKB Geometry). So I've been comparing ways of doing the selection. In doing this I've found that using FeatureReader with both a WHERE Clause AND a Spatial Filter results in some address records being selected twice.

I read in the polygons then send these to the FeatureReader. The FeatureReader connects to Microsoft SQL Server Spatial to read the point data. The Spatial Filter is set to "Contains".

If I leave it like this and add a Tester to do "Postal NOT LIKE 'N'", I get 653,819 records. If, instead of the Tester, I add a WHERE Clause to the FeatureReader, I get 654,305 records. This is a difference of 486, and these are all duplicates i.e. some address points are being selected twice. It's pretty random which ones are selected and they are not on the boundary of a polygon.

I wondered if it was something to do with this message in the Translation Log:

Notice - reader format `MSSQL_SPATIAL' does not support querying by spatial predicate, but supports envelope queries. FME will create equivalent results by performing spatial predicate calculations locally. Performance should not be affected.

I only get this if I use both a WHERE Clause and a Spatial Filter in the FeatureReader. If I just use the Spatial Filter in the FeatureReader, that message does not appear in the log.

Incidentally, there's no difference in processing time between FeatureReader (Spatial Filter only) + Tester vs FeatureReader (Spatial Filter & WHERE Clause) - both are 1.5 hours. So I'm not desperate for an answer on this one.

By comparison, reading both datasets and using a PointOnAreaOverlayer + Tester takes 2 hours 20 mins.


9 replies

Badge +7
It's SQL Server 2012 and I'm using the SQL Server 2012 Native Client

 

 

Userlevel 4
Badge +13

Hi Tim, this is a very very curious one. It definitely sounds like the WHERE clause is not interacting well with the envelope query. It is true today that for all database formats other than Oracle Spatial, we only use a bounding box query against the database to select data, and then we do the spatial predicate filtering on the FME side (we are working towards changing this in the future. Starting with the log message).

If it would be possible to send us over the data and the workspace to support@safe.com I'd really like us to take a look at it here. I'd also love to use it during our "Futures" talk at FMEUC because this type of scenario is one that we're targeting for a big speedup over the FME 2018 timeframe and we've got some good initial / preliminary results. So please CC me on your support request.

Thanks for bringing this to our attention.

Badge +7

Hi @daleatsafe I'd be happy to share this. although the source point data is 4.7 million records. I can send it to you via cloud transfer - I'll email details.

Userlevel 4
Badge +13

Hi @daleatsafe I'd be happy to share this. although the source point data is 4.7 million records. I can send it to you via cloud transfer - I'll email details.

Excellent -- very interested in this. Should be "child's play" for the new approach we're working on. Well, that is a bit Elon Musk-ish of me (https://twitter.com/elonmusk/status/812708946225963008) to say but still it sounds like a great test set.

 

 

Badge +7

OK. I've been prepping my Workspaces to send and ran them again to verify the figures. This time both produced the same results. I'm pretty sure I deleted the output CSV files before running the translations before but there's a slight possibility that I aborted a run of the translation early and Overwrite Existing File was set to "No" in the CSV Writer. So it could be panic over as far as that's concerned! The other possibility is that I originally had both FeatureReaders in one Workspace and enabled/disabled them in turn whereas for this run I split them into 2 separate Workspaces. But I wouldn't expect the 2 FeatureReaders in the same Workspace to affect each other. More likely it's PICNIC (Problem In Chair, Not In Computer)!

On the positive side, the translations ran even quicker this time, the WHERE Clause and SpatialFilter in FeatureReader being the quickest at 1 hour 17 minutes vs 1 hour 22 minutes for SpatialFilter in FeatureReader + Tester!

This may be superseded by the fact that at some point I need to add the buffer attributes to the point data. If that can be done quickly enough, I can just use WHERE clauses to get this and various other outputs I need. I've been testing parallel processing in PointOnAreaOverlayer for this and got good time improvements but weirdly the attributes don't get added to the point data. If I don't use parallel processing they do. I'll post a separate question for this.

Badge

Funny, I just ran into the same issue. Duplicate features emitted from the FeatureReader when SpatialFilter is used (no where clause).

Reading SDE/PostGres FME 2019.0.2

Badge

Funny, I just ran into the same issue. Duplicate features emitted from the FeatureReader when SpatialFilter is used (no where clause).

Reading SDE/PostGres FME 2019.0.2

Ah in my case it is because there were some overlapping query features going into the SpatialQuery so I guess that may be what I should expect.

Badge +10

Ah in my case it is because there were some overlapping query features going into the SpatialQuery so I guess that may be what I should expect.

Same here. I realized there were some (polygon) features intersecting more than one (polygon) Initiator feature. So if a feature intersects 3 Initiator features I get 3 copies of the feature from the FeatureReader.

A spatial filter is not an intersection, so this is not what I would expect.

My only option is to use a DuplicatFilter right after the FeatureReader but when reading thousands of features this will slow down my translation.

Badge

Hi @dms2​ this looks like it was posted a few years back you might have more luck if you post a new question to the Community and see if anyone has found a solution recently.

Thanks, Daragh

Reply