Question

FeatureReader returns duplicates when using Spatial Filter and Where clause on SQL Server

6 years ago
9 May 2017
9 replies
19 views

tim_wood
Contributor
311 replies

FME 2017.0 (17271).

I have a polygon layer which consists of an area with buffer zones around it. The polygons do not overlap. I also have point data of address which covers a larger area and includes postal and non-postal addresses. I want to select only the postal addresses that fall within the polygons.

The address data is stored in SQL Server using the Geometry data type to store the locations (WKB Geometry). So I've been comparing ways of doing the selection. In doing this I've found that using FeatureReader with both a WHERE Clause AND a Spatial Filter results in some address records being selected twice.

I read in the polygons then send these to the FeatureReader. The FeatureReader connects to Microsoft SQL Server Spatial to read the point data. The Spatial Filter is set to "Contains".

If I leave it like this and add a Tester to do "Postal NOT LIKE 'N'", I get 653,819 records. If, instead of the Tester, I add a WHERE Clause to the FeatureReader, I get 654,305 records. This is a difference of 486, and these are all duplicates i.e. some address points are being selected twice. It's pretty random which ones are selected and they are not on the boundary of a polygon.

I wondered if it was something to do with this message in the Translation Log:

Notice - reader format `MSSQL_SPATIAL' does not support querying by spatial predicate, but supports envelope queries. FME will create equivalent results by performing spatial predicate calculations locally. Performance should not be affected.

I only get this if I use both a WHERE Clause and a Spatial Filter in the FeatureReader. If I just use the Spatial Filter in the FeatureReader, that message does not appear in the log.

Incidentally, there's no difference in processing time between FeatureReader (Spatial Filter only) + Tester vs FeatureReader (Spatial Filter & WHERE Clause) - both are 1.5 hours. So I'm not desperate for an answer on this one.

By comparison, reading both datasets and using a PointOnAreaOverlayer + Tester takes 2 hours 20 mins.

9 replies

tim_wood
Author
Contributor
311 replies
6 years ago
9 May 2017

It's SQL Server 2012 and I'm using the SQL Server 2012 Native Client

Userlevel 4

+13

fmelizard
Contributor
3725 replies
6 years ago
10 May 2017

Hi Tim, this is a very very curious one. It definitely sounds like the WHERE clause is not interacting well with the envelope query. It is true today that for all database formats other than Oracle Spatial, we only use a bounding box query against the database to select data, and then we do the spatial predicate filtering on the FME side (we are working towards changing this in the future. Starting with the log message).

If it would be possible to send us over the data and the workspace to support@safe.com I'd really like us to take a look at it here. I'd also love to use it during our "Futures" talk at FMEUC because this type of scenario is one that we're targeting for a big speedup over the FME 2018 timeframe and we've got some good initial / preliminary results. So please CC me on your support request.

Thanks for bringing this to our attention.

tim_wood
Author
Contributor
311 replies
6 years ago
10 May 2017

Hi @daleatsafe I'd be happy to share this. although the source point data is 4.7 million records. I can send it to you via cloud transfer - I'll email details.

Userlevel 4

+13

fmelizard
Contributor
3725 replies
6 years ago
10 May 2017

Hi @daleatsafe I'd be happy to share this. although the source point data is 4.7 million records. I can send it to you via cloud transfer - I'll email details.

Excellent -- very interested in this. Should be "child's play" for the new approach we're working on. Well, that is a bit Elon Musk-ish of me (https://twitter.com/elonmusk/status/812708946225963008) to say but still it sounds like a great test set.

tim_wood
Author
Contributor
311 replies
6 years ago
11 May 2017

OK. I've been prepping my Workspaces to send and ran them again to verify the figures. This time both produced the same results. I'm pretty sure I deleted the output CSV files before running the translations before but there's a slight possibility that I aborted a run of the translation early and Overwrite Existing File was set to "No" in the CSV Writer. So it could be panic over as far as that's concerned! The other possibility is that I originally had both FeatureReaders in one Workspace and enabled/disabled them in turn whereas for this run I split them into 2 separate Workspaces. But I wouldn't expect the 2 FeatureReaders in the same Workspace to affect each other. More likely it's PICNIC (Problem In Chair, Not In Computer)!

On the positive side, the translations ran even quicker this time, the WHERE Clause and SpatialFilter in FeatureReader being the quickest at 1 hour 17 minutes vs 1 hour 22 minutes for SpatialFilter in FeatureReader + Tester!

This may be superseded by the fact that at some point I need to add the buffer attributes to the point data. If that can be done quickly enough, I can just use WHERE clauses to get this and various other outputs I need. I've been testing parallel processing in PointOnAreaOverlayer for this and got good time improvements but weirdly the attributes don't get added to the point data. If I don't use parallel processing they do. I'll post a separate question for this.

braggken
Contributor
18 replies
3 years ago
29 May 2020

Funny, I just ran into the same issue. Duplicate features emitted from the FeatureReader when SpatialFilter is used (no where clause).

Reading SDE/PostGres FME 2019.0.2

braggken
Contributor
18 replies
3 years ago
29 May 2020

Funny, I just ran into the same issue. Duplicate features emitted from the FeatureReader when SpatialFilter is used (no where clause).

Reading SDE/PostGres FME 2019.0.2

Ah in my case it is because there were some overlapping query features going into the SpatialQuery so I guess that may be what I should expect.

+10

dms2
Contributor
44 replies
3 years ago
28 October 2020

Ah in my case it is because there were some overlapping query features going into the SpatialQuery so I guess that may be what I should expect.

Same here. I realized there were some (polygon) features intersecting more than one (polygon) Initiator feature. So if a feature intersects 3 Initiator features I get 3 copies of the feature from the FeatureReader.

A spatial filter is not an intersection, so this is not what I would expect.

My only option is to use a DuplicatFilter right after the FeatureReader but when reading thousands of features this will slow down my translation.

Hi @dms2 this looks like it was posted a few years back you might have more luck if you post a new question to the Community and see if anyone has found a solution recently.

Thanks, Daragh

FeatureReader returns duplicates when using Spatial Filter and Where clause on SQL Server

9 replies

Reply

Community Stats

Reply

Community Stats

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded