I work with world datasets in shapefile format. Spatial joins are taking a long time and a lot of resources. I won't know from one dataset to the next which countries will be present, but I do know that I will have a match for both components of the join, i.e. if I have South Korea lines I will also have a South Korea polygon for the spatial join. At the same time, I am lazy and I don't want a template that has every possible country as a separate input and output. For example, I know I could do the following for each country:

 

 

Reader: KOR (lines)-->spatial relation (KOR (Area))-->Writer Output: KOR (lines) with area info.

 

 

There are many more transformers in the middle so I would have to duplicate a lot more than the simplified version above.

 

 

At the moment I use multiple datasets in the reader, as follows:

 

Reader Asia (country datasets(lines)) -->spatial relation (Asia (country datasets Area))-->Writer Output: Fanout by Country (lines) with area info.

 

 

But that seems inefficient. Doesn't the spatial relator have to compare each record in the reader against every record in the collection of area datasets to find a match, rather than matching one country to its counterpart first and then looking for spatial matches?

 

 

Essentially, I'm asking for a way to load lots of datasets and tell FME to match the datasets by a portion of the name, then do the spatial relation based on that initial match.
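To make the "match by a portion of the name" idea concrete, here is a minimal Python sketch written outside FME. It assumes a hypothetical naming convention of `<three-letter country code>_lines.shp` and `<three-letter country code>_area.shp`; the actual file names in the thread are not given, so the pattern and the function name `pair_by_country` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (an assumption, not taken from the thread):
# <three-letter country code>_<lines|area>.shp
NAME_PATTERN = re.compile(r"^(?P<country>[A-Z]{3})_(?P<kind>lines|area)\.shp$")


def pair_by_country(filenames):
    """Return {country: {"lines": name, "area": name}} for countries that have both files."""
    found = defaultdict(dict)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        found[match.group("country")][match.group("kind")] = name
    # Keep only countries where both halves of the join are present.
    return {
        country: kinds
        for country, kinds in found.items()
        if set(kinds) >= {"lines", "area"}
    }


pairs = pair_by_country(
    ["KOR_lines.shp", "KOR_area.shp", "JPN_lines.shp", "JPN_area.shp", "readme.txt"]
)
```

Each resulting pair could then be processed on its own, or the country code could be written to an attribute on both sets of features so that it can serve as a grouping key.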

 

 

Is that possible?  Is my question understandable?

 

 

Thanks in advance for any help.

 

 

 

Hi,

 

 

If I understand you correctly, you should have a look at the Group By setting.

Hi,

 

 

As Itay mentioned, using the "Group By" parameter appropriately would be an effective way to increase efficiency. Alternatively, if the source datasets are divided by country, processing them one country at a time, i.e. batch processing, could also be effective. The WorkspaceRunner transformer provides a quick way to do batch processing.
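As a rough illustration of why grouping can help, the following Python sketch counts how many pairwise tests a naive comparison performs with and without a grouping key. It is not how FME implements the SpatialRelator (which may well use spatial indexing internally); the bounding boxes, feature ids, and the `country` attribute are made-up sample data, and the function `relate` is hypothetical.

```python
from collections import defaultdict


def boxes_overlap(a, b):
    """Test two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def relate(lines, areas, group_key=None):
    """Return (matches, tests). With group_key set, only same-group features are compared."""
    buckets = defaultdict(list)
    for area in areas:
        buckets[area[group_key] if group_key else None].append(area)

    matches, tests = [], 0
    for line in lines:
        candidates = buckets.get(line[group_key] if group_key else None, [])
        for area in candidates:
            tests += 1
            if boxes_overlap(line["bbox"], area["bbox"]):
                matches.append((line["id"], area["id"]))
    return matches, tests


# Made-up sample features; coordinates are only roughly plausible.
lines = [
    {"id": "KOR-L1", "country": "KOR", "bbox": (126, 35, 127, 36)},
    {"id": "JPN-L1", "country": "JPN", "bbox": (139, 35, 140, 36)},
]
areas = [
    {"id": "KOR-A", "country": "KOR", "bbox": (125, 34, 130, 38)},
    {"id": "JPN-A", "country": "JPN", "bbox": (130, 31, 146, 45)},
]

all_matches, all_tests = relate(lines, areas)
grouped_matches, grouped_tests = relate(lines, areas, group_key="country")
```

In this toy case both runs find the same matches, but the grouped run performs fewer tests because each line is only compared against areas from its own country. The gap widens as more countries are loaded together.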

 

 

Takashi

Thank you, Itay. I had thought of using the Group By setting but hadn't got it attributed correctly on both datasets. Your suggestion made me rethink it. It worked a bit better for the spatial relator, but adding it also allowed me to use parallel processing, which sped things up even more. Unfortunately, not all the kinks are worked out of parallel processing yet, and I had to limit the number of datasets it could handle at one time.

 

 

Next time I think I will also explore your suggestion, Takashi, just to see which works best.

 

 

Thanks again
