Question

Add country attribute by searching for words

  • 3 December 2013
  • 4 replies

Hello, 

 

 

I am fairly new to using FME but I wanted to solve a problem using this software.

 

 

I have a spreadsheet with all kinds of different news articles which I would like to assign spatially (and automatically) to a specific country. A country name is often mentioned in the articles. I was wondering if there is some kind of search transformer that could search the articles for country names and add a new attribute value (the name of the country the article was assigned to).

 

 

I have two readers: the spreadsheet and a shapefile (.shp) of all the countries. It would be useful to read all the search terms (the country names) from the shapefile.

 

 

Any suggestions? Thanks in advance.

 

 

 


4 replies

Hi,

 

 

Just an idea. Assume the spreadsheet features have an attribute named "article", the shape features have an attribute named "country", and an article can contain multiple country names.

 

(FME 2013 SP4)

Add a FeatureMerger transformer and send the spreadsheet features to its REQUESTOR port.

Send the shape features to a ListBuilder to create a single feature with a list attribute containing all country names (List Name: _list), and send the created feature to the SUPPLIER port of the FeatureMerger.

Specify the same constant value (e.g. "1") for both the Requestor and Supplier "Join On" parameters of the FeatureMerger. This performs an unconditional merge.

Connect a ListExploder to the MERGED port of the FeatureMerger (List Attribute: _list{}).

Connect a StringSearcher to the LIST_FOUND port of the ListExploder. Attribute: article (refer to the attribute). Regular Expression: country (refer to the attribute).

Features that have a country name and the associated article are output from the MATCHED port of the StringSearcher. You can then merge the associated article onto the original shape feature with a second FeatureMerger, using "country" as the key attribute.

I think this approach works, but it may be inefficient because it explodes the list attribute for every article. If possible, also consider replacing the ListExploder and the StringSearcher with a PythonCaller.
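For readers who find it easier to follow as code, the logic of the workflow above can be sketched in plain Python, outside FME. This is only an illustration: the function name match_countries and the sample data are made up, and re.escape is my addition so that country names are treated as literal text (the StringSearcher setup above uses the name directly as a regular expression).

```python
import re


def match_countries(articles, countries):
    """Pair each article with every country whose name occurs in it.

    Mirrors the unconditional merge + ListExploder + StringSearcher chain:
    every article is tested against every country name.
    """
    matches = []
    for article in articles:          # spreadsheet (REQUESTOR) features
        for country in countries:     # one exploded list element each
            # Assumption: escape the name so it is matched as literal text.
            if re.search(re.escape(country), article, re.IGNORECASE):
                matches.append((article, country))
    return matches


# Hypothetical sample data, for illustration only.
articles = [
    "Floods hit France and Germany this week.",
    "Local council approves new budget.",
]
countries = ["France", "Germany", "French Polynesia"]

result = match_countries(articles, countries)
for article, country in result:
    print(country, "<-", article)
```

The nested loop makes the cost visible: the number of searches is the number of articles times the number of countries, which is why exploding the list can be slow on large inputs.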

 

 

Takashi
Hi Takashi,

 

 

Thank you for the response. However, I think I already have a problem at the FeatureMerger transformer. When I inspect its output, all the spreadsheet articles are assigned to the same country (French Polynesia in my case).

 

 

As a result, the final output is entirely unmerged, but every feature does have a new attribute called "country" with the value French Polynesia.

 

 

So I suspect something is wrong in the FeatureMerger. Maybe it has something to do with the unconditional merging? (French Polynesia is also the first country in the list, so maybe the constant value of 1 refers to this?)

 

 

Db
Never mind, I got it to work. You were right, Takashi. Thank you so much for your help!
It's my pleasure. FYI, the ListExploder and the StringSearcher can be replaced with a PythonCaller with this script, and it will be a little more efficient.

 

-----

 

import fmeobjects
import re

class CountryFinder(object):
    def __init__(self):
        pass

    def input(self, feature):
        # Article text and the merged list of country names
        article = feature.getAttribute('article')
        countries = feature.getAttribute('_list{}.country')
        if not article or not countries:
            return
        # Drop the list so the output features do not carry it
        feature.removeAttrsWithPrefix('_list')
        # Output one feature per country name found in the article
        for country in countries:
            if re.search(country, str(article), re.IGNORECASE):
                newFeature = feature.cloneAttributes()
                newFeature.setAttribute('country', country)
                self.pyoutput(newFeature)

    def close(self):
        pass

-----

If an article can be tokenized appropriately, this could become more efficient. However, since a country name can consist of multiple words (e.g. French Polynesia), tokenizing will not be easy.
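One possible way around the multi-word problem is to compare word n-grams of the article against the country names, rather than single tokens. The following is a rough sketch in plain Python, not tested inside a PythonCaller; the names tokenize and find_countries and the sample sentence are made up for illustration, and splitting on runs of letters is an assumption that will not suit every country name.

```python
import re


def tokenize(text):
    """Split text into lowercase words (runs of letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def find_countries(article, countries):
    """Return the country names whose words appear consecutively in article."""
    # Map each name's word tuple to the original name for fast lookup.
    lookup = {tuple(tokenize(name)): name for name in countries}
    max_len = max((len(key) for key in lookup), default=0)
    tokens = tokenize(article)
    found = []
    # Try every n-gram up to the length of the longest country name.
    for size in range(1, max_len + 1):
        for start in range(len(tokens) - size + 1):
            name = lookup.get(tuple(tokens[start:start + size]))
            if name is not None and name not in found:
                found.append(name)
    return found


# Hypothetical sample data, for illustration only.
found = find_countries(
    "A cyclone passed French Polynesia, far from France.",
    ["France", "French Polynesia", "Germany"],
)
print(found)
```

With this approach each article is tokenized once and every n-gram is a dictionary lookup, so the work no longer grows with the number of countries. Note that it only matches whole words, which differs from the regular expression search in the script above.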
