Solved

Looking for increasing performance in SpatialRelator flow with large datasets


lifalin2016
Contributor

Hi,

Faced with having to handle multiple large datasets (5M+ features), I'm looking into whether I can improve performance. I just had one of these datasets crash after 12 hours!

The task is to match each source dataset (only) spatially against a dataset of buffered municipalities, so there are no common attributes, i.e. no possible values for "Group By". Any source feature could possibly end up in any number of municipalities.

Is it possible to get a performance boost under these circumstances? And if so, how?

Cheers

Best answer by gio

A tiling strategy would help (tile all sets involved).

Tile by block or by municipality.

Or run per tile, via a WorkspaceRunner.

Post-process the border objects.

"Any source feature could possibly end up in any number of municipalities."

How? ;)


4 replies

gio
Contributor
  • Best Answer
  • November 1, 2017

A tiling strategy would help (tile all sets involved).

Tile by block or by municipality.

Or run per tile, via a WorkspaceRunner.

Post-process the border objects.

"Any source feature could possibly end up in any number of municipalities."

How? ;)
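Not FME itself, but a minimal Python sketch of the tiling idea (shapely 2.x; the feature/municipality dicts and tile size are hypothetical): each tile pass relates only the features and municipalities touching that tile, and the set union at the end is the post-processing step that de-duplicates border objects matched in more than one tile.

```python
# Sketch of the tiling strategy (shapely 2.x; data structures hypothetical).
# Each tile is processed independently; merging results into sets de-duplicates
# border objects that were matched in more than one tile.
from shapely.geometry import box
from shapely.strtree import STRtree

def tiles(bounds, size):
    """Yield square tiles of `size` covering bounds = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    x = minx
    while x < maxx:
        y = miny
        while y < maxy:
            yield box(x, y, min(x + size, maxx), min(y + size, maxy))
            y += size
        x += size

def relate_tiled(features, munis, bounds, tile_size):
    """features/munis: lists of {'id': ..., 'geom': shapely geometry}.
    Returns {feature_id: {municipality_id, ...}}."""
    result = {}
    for tile in tiles(bounds, tile_size):
        local_munis = [m for m in munis if m["geom"].intersects(tile)]
        if not local_munis:
            continue
        tree = STRtree([m["geom"] for m in local_munis])
        for f in features:
            if not f["geom"].intersects(tile):
                continue
            # indices of local municipalities that actually intersect the feature
            for i in tree.query(f["geom"], predicate="intersects"):
                result.setdefault(f["id"], set()).add(local_munis[i]["id"])
    return result
```

In FME terms, each tile pass would be one workspace run kicked off by a WorkspaceRunner, with the de-duplication done in a post-processing step (e.g. a DuplicateFilter keyed on feature/municipality ID pairs).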


lifalin2016
Contributor
  • Author
  • November 2, 2017
gio wrote:

A tiling strategy would help (tile all sets involved). Tile by block or by municipality. Or run per tile, via a WorkspaceRunner. Post-process the border objects. "Any source feature could possibly end up in any number of municipalities." How? ;)

I'm considering tiling, but one based on aggregating the municipalities into regions first. I really don't want to have to break my workspace up into multiple workspaces.

You ask "How?". What I meant was that there is no way (except by spatial comparison) to pre-determine which municipalities a given feature might end up in. Note that I've added a one-kilometre buffer around each municipality to account for border issues, so each feature may end up in 1-3 or even more municipalities. It's a requirement from the customer.
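To make that border effect concrete, a tiny shapely sketch with made-up coordinates: two adjacent municipalities, buffered by 1 km, overlap near their shared border, so one feature intersects both.

```python
# Why one feature can match several municipalities after buffering (shapely;
# made-up coordinates in metres).
from shapely.geometry import Point, box

muni_a = box(0, 0, 5_000, 5_000)        # 5 km x 5 km
muni_b = box(5_000, 0, 10_000, 5_000)   # shares a border with muni_a

buffered = {"A": muni_a.buffer(1_000), "B": muni_b.buffer(1_000)}
feature = Point(5_200, 2_500)           # 200 m inside muni_b, near the border

hits = [name for name, geom in buffered.items() if geom.intersects(feature)]
print(hits)  # ['A', 'B'] -- the feature ends up in both municipalities
```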


  • November 2, 2017

Hi, @lifalin2016

You can use the FeatureReader to link your input features to the municipalities, using the spatial filter option 'Intersects'. This transformer allows you to add the attributes of the initiator (your municipalities) to the output (your input dataset). It will read all input features that overlap an incoming municipality (initiator), regardless of whether they end up linked to one or more municipalities.

It will also be more memory-friendly, since it does not require you to read all the data before you do your analysis.
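For readers without FME at hand, the memory behaviour this pattern gives you looks roughly like the following geopandas analogue (not the FeatureReader itself; file and column names are made up): per municipality, only the candidate features inside its bounding box are ever loaded.

```python
# Rough geopandas analogue of the FeatureReader-per-initiator pattern
# (file and column names hypothetical). Per municipality, only the source
# features inside its bounding box are read; those that truly intersect are
# kept, with the initiator's attributes carried onto the output.
import geopandas as gpd

munis = gpd.read_file("municipalities_buffered.gpkg")

for muni in munis.itertuples():
    # bbox pre-filter: candidates only, never all 5M+ features at once
    candidates = gpd.read_file("source_features.gpkg", bbox=muni.geometry.bounds)
    hits = candidates[candidates.intersects(muni.geometry)].copy()
    hits["muni_id"] = muni.muni_id      # initiator attribute onto the output
    # ... append `hits` to the running output here
```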


lifalin2016
Contributor
  • Author
  • November 7, 2017
gio wrote:

A tiling strategy would help (tile all sets involved). Tile by block or by municipality. Or run per tile, via a WorkspaceRunner. Post-process the border objects. "Any source feature could possibly end up in any number of municipalities." How? ;)

Hi gio.

Well, I reconsidered your grid suggestion and found a way to use it. I first created a grid of 1 km cells and sorted out all cells lying uniquely within a single municipality. I then pre-emptively matched my features against these grid cells, performing the "expensive" matching against the municipality polygons only for the features not entirely within one of these cells. It seems about two-thirds of the features can be sorted this way.
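A Python/shapely sketch of roughly what that pre-filter does (data structures hypothetical): cells that intersect exactly one buffered municipality, and lie within it, resolve their features cheaply; everything else goes to the expensive polygon test.

```python
# Sketch of the 1 km grid pre-filter (shapely 2.x; names hypothetical).
from shapely.geometry import box
from shapely.strtree import STRtree

def grid_prefilter(features, munis, bounds, cell=1_000):
    """Split `features` into cheap matches (resolved via cells that sit inside
    exactly one buffered municipality) and a remainder for the expensive test."""
    muni_tree = STRtree([m["geom"] for m in munis])
    unique_cells, owners = [], []
    minx, miny, maxx, maxy = bounds
    x = minx
    while x < maxx:
        y = miny
        while y < maxy:
            c = box(x, y, x + cell, y + cell)
            hits = muni_tree.query(c, predicate="intersects")
            if len(hits) == 1 and c.within(munis[hits[0]]["geom"]):
                unique_cells.append(c)
                owners.append(munis[hits[0]]["id"])
            y += cell
        x += cell
    cell_tree = STRtree(unique_cells)
    resolved, remainder = {}, []
    for f in features:
        inside = cell_tree.query(f["geom"], predicate="within")
        if len(inside):                       # wholly inside one unique cell
            resolved[f["id"]] = {owners[inside[0]]}
        else:                                 # border case: expensive matching
            remainder.append(f)
    return resolved, remainder
```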
I'm now running my two 5-million-plus feature sets (which crashed before) through this improved workflow, hoping it'll speed things up. I'll let y'all know how it went.

Cheers
