Solved

Looking for increasing performance in SpatialRelator flow with large datasets


lifalin2016
Contributor

Hi,

Faced with having to handle multiple large datasets (5M+ features), I'm looking into whether I can improve performance. I just had one of these datasets crash after 12 hours!

The task is to match each source dataset (only) spatially against a dataset of buffered municipalities, so there are no common attributes, i.e. no possible values for "Group By". Any source feature could possibly end up in any number of municipalities.

Is it possible to get a performance boost under these circumstances? And if so, how?

Cheers

Best answer by gio

A tiling strategy would help (tile all sets involved).

Tile by block or by municipality.

Or run per tile, via a WorkspaceRunner.

Post-process the border objects.

"Any source feature could possibly end up in any number of municipalities."

How? ;)


4 replies

gio
Contributor
  • Best Answer
  • November 1, 2017

A tiling strategy would help (tile all sets involved).

Tile by block or by municipality.

Or run per tile, via a WorkspaceRunner.

Post-process the border objects.

"Any source feature could possibly end up in any number of municipalities."

How? ;)
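Not FME itself, but a minimal Python sketch of the tiling idea (shapely 2.x; the feature/municipality dicts and tile size are hypothetical): each tile pass relates only the features and municipalities touching that tile, and the set union at the end is the post-processing step that de-duplicates border objects matched in more than one tile.

```python
# Sketch of the tiling strategy (shapely 2.x; data structures hypothetical).
# Each tile is processed independently; merging results into sets de-duplicates
# border objects that were matched in more than one tile.
from shapely.geometry import box
from shapely.strtree import STRtree

def tiles(bounds, size):
    """Yield square tiles of `size` covering bounds = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    x = minx
    while x < maxx:
        y = miny
        while y < maxy:
            yield box(x, y, min(x + size, maxx), min(y + size, maxy))
            y += size
        x += size

def relate_tiled(features, munis, bounds, tile_size):
    """features/munis: lists of {'id': ..., 'geom': shapely geometry}.
    Returns {feature_id: {municipality_id, ...}}."""
    result = {}
    for tile in tiles(bounds, tile_size):
        local_munis = [m for m in munis if m["geom"].intersects(tile)]
        if not local_munis:
            continue
        tree = STRtree([m["geom"] for m in local_munis])
        for f in features:
            if not f["geom"].intersects(tile):
                continue
            # indices of local municipalities that actually intersect the feature
            for i in tree.query(f["geom"], predicate="intersects"):
                result.setdefault(f["id"], set()).add(local_munis[i]["id"])
    return result
```

In FME terms, each tile pass would be one workspace run kicked off by a WorkspaceRunner, with the de-duplication done in a post-processing step (e.g. a DuplicateFilter keyed on feature/municipality ID pairs).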


lifalin2016
Contributor
  • Author
  • November 2, 2017
gio wrote:

A tiling strategy would help (tile all sets involved). Tile by block or by municipality. Or run per tile, via a WorkspaceRunner. Post-process the border objects. "Any source feature could possibly end up in any number of municipalities." How? ;)

I'm considering tiling, but one based on aggregating the municipalities into regions first. I really don't want to have to break my workspace up into multiple workspaces.

You ask "How?". What I meant was that there is no way (except by spatial comparison) to pre-determine which municipalities a given feature might end up in. Note that I've added a one-kilometre buffer around each municipality to account for border issues, so each feature may end up in 1-3 or even more municipalities. It's a requirement from the customer.
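To make that border effect concrete, a tiny shapely sketch with made-up coordinates: two adjacent municipalities, buffered by 1 km, overlap near their shared border, so one feature intersects both.

```python
# Why one feature can match several municipalities after buffering (shapely;
# made-up coordinates in metres).
from shapely.geometry import Point, box

muni_a = box(0, 0, 5_000, 5_000)        # 5 km x 5 km
muni_b = box(5_000, 0, 10_000, 5_000)   # shares a border with muni_a

buffered = {"A": muni_a.buffer(1_000), "B": muni_b.buffer(1_000)}
feature = Point(5_200, 2_500)           # 200 m inside muni_b, near the border

hits = [name for name, geom in buffered.items() if geom.intersects(feature)]
print(hits)  # ['A', 'B'] -- the feature ends up in both municipalities
```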


  • November 2, 2017

Hi, @lifalin2016

You can use the FeatureReader to link your input features to the municipalities, using the spatial filter option 'Intersects'. This transformer allows you to add the attributes of the initiator (your municipalities) to the output (your input dataset). It will read all input features that overlap an incoming municipality (initiator), regardless of whether they end up linked to one or more municipalities.

It will also be more memory-friendly, since it does not require you to read all the data before you do your analysis.
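For readers without FME at hand, the memory behaviour this pattern gives you looks roughly like the following geopandas analogue (not the FeatureReader itself; file and column names are made up): per municipality, only the candidate features inside its bounding box are ever loaded.

```python
# Rough geopandas analogue of the FeatureReader-per-initiator pattern
# (file and column names hypothetical). Per municipality, only the source
# features inside its bounding box are read; those that truly intersect are
# kept, with the initiator's attributes carried onto the output.
import geopandas as gpd

munis = gpd.read_file("municipalities_buffered.gpkg")

for muni in munis.itertuples():
    # bbox pre-filter: candidates only, never all 5M+ features at once
    candidates = gpd.read_file("source_features.gpkg", bbox=muni.geometry.bounds)
    hits = candidates[candidates.intersects(muni.geometry)].copy()
    hits["muni_id"] = muni.muni_id      # initiator attribute onto the output
    # ... append `hits` to the running output here
```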


lifalin2016
Contributor
  • Author
  • November 7, 2017
gio wrote:

A tiling strategy would help (tile all sets involved). Tile by block or by municipality. Or run per tile, via a WorkspaceRunner. Post-process the border objects. "Any source feature could possibly end up in any number of municipalities." How? ;)

Hi gio.

Well, I reconsidered your grid suggestion and found a way to use it. I first created a grid of 1 km cells and sorted out all cells lying uniquely within a single municipality. I then pre-emptively matched my features against these grid cells, performing the "expensive" matching against the municipality polygons only for the features not entirely within one of these cells. It seems about two-thirds of the features can be sorted this way.
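A Python/shapely sketch of roughly what that pre-filter does (data structures hypothetical): cells that intersect exactly one buffered municipality, and lie within it, resolve their features cheaply; everything else goes to the expensive polygon test.

```python
# Sketch of the 1 km grid pre-filter (shapely 2.x; names hypothetical).
from shapely.geometry import box
from shapely.strtree import STRtree

def grid_prefilter(features, munis, bounds, cell=1_000):
    """Split `features` into cheap matches (resolved via cells that sit inside
    exactly one buffered municipality) and a remainder for the expensive test."""
    muni_tree = STRtree([m["geom"] for m in munis])
    unique_cells, owners = [], []
    minx, miny, maxx, maxy = bounds
    x = minx
    while x < maxx:
        y = miny
        while y < maxy:
            c = box(x, y, x + cell, y + cell)
            hits = muni_tree.query(c, predicate="intersects")
            if len(hits) == 1 and c.within(munis[hits[0]]["geom"]):
                unique_cells.append(c)
                owners.append(munis[hits[0]]["id"])
            y += cell
        x += cell
    cell_tree = STRtree(unique_cells)
    resolved, remainder = {}, []
    for f in features:
        inside = cell_tree.query(f["geom"], predicate="within")
        if len(inside):                       # wholly inside one unique cell
            resolved[f["id"]] = {owners[inside[0]]}
        else:                                 # border case: expensive matching
            remainder.append(f)
    return resolved, remainder
```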
I'm now running my two 5-million-plus feature sets (which crashed before) through this improved workflow, hoping it'll speed things up. I'll let y'all know how it went.

Cheers
