
I have an attribute that contains location information (text-based) but it will say something like “Lake View, Chicago, USA”. I want to create a column that just tells me “USA”. Note, the data isn’t regular, sometimes there are many locality details, sometimes just a country, and sometimes no country. How do I just pull out the countries?

Is the country always the last item, separated by a comma? If so, you can use an AttributeSplitter to split the attribute into list elements, then use a ListIndexer to get the last list element (index = -1) and write it to an attribute.
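For anyone who wants to test the same idea outside of Workbench (or inside a PythonCaller), here is a minimal Python sketch of the split-and-take-last logic. The function name is made up for illustration, and it assumes the country really is the last comma-separated item.

```python
def last_comma_item(location: str) -> str:
    """Return the text after the last comma, without surrounding spaces."""
    # Same idea as AttributeSplitter + ListIndexer with index = -1.
    parts = [part.strip() for part in location.split(",")]
    return parts[-1]


print(last_comma_item("Lake View, Chicago, USA"))
```

A value with no comma at all comes back unchanged, so a continent-only record would still end up in the country column and would need a separate check.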



No, unfortunately it's not. Sometimes there is other information after the country, and sometimes there is no country at all, just a continent.



Ah, I see 🙂 One thing that comes to mind is to use a list of all possible countries and a StringSearcher to find a match. For this you have to merge the country list onto every feature and then test whether the country attribute occurs in the location attribute.



All possible countries and common alternative spellings... 😅 USA, US, U.S.A., United States, United States of America...

 

Or misspellings... Did you know it's "The Bahamas" and not "Bahama's"? I found out after a high profile cartography project was finished 😕
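One way to handle alternative spellings is an alias table that maps every known spelling to a single canonical name. The sketch below is only an illustration in Python: the table contents, the canonical names and the function name are assumptions, not a complete or authoritative list.

```python
import re

# Hypothetical alias table: every spelling maps to one canonical name.
ALIASES = {
    "USA": "United States",
    "US": "United States",
    "U.S.A.": "United States",
    "United States": "United States",
    "United States of America": "United States",
    "The Bahamas": "Bahamas",
    "Bahamas": "Bahamas",
}

# Try longer aliases first so "United States of America" wins over "United States".
_by_length = sorted(ALIASES, key=len, reverse=True)
_pattern = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(alias) for alias in _by_length) + r")(?!\w)"
)


def find_country(location):
    """Return the canonical country for the first alias found, else None."""
    match = _pattern.search(location)
    return ALIASES[match.group(0)] if match else None


print(find_country("Lake View, Chicago, U.S.A."))
```

The lookarounds keep short aliases such as "US" from matching inside longer words, and the search is case-sensitive on purpose for the same reason.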



Aye, this can get very nasty very quickly. It all depends on the data. Is it a one-time job or automated? A thousand records or many thousands? Etc., etc.



One time only, but 23,000 records.



Yep, and half of them are in German. I figured I can probably figure out a second language once I figure out how to deal with the first but I am concerned with different spellings. AND there are historical names.



If it was easy, you would not have got this question 🙂 Just get started with a base list, then iterate down the remnants.



That sounds like a challenge. You can try the method @nielsgerrits suggested. Off the top of my head, the Natural Earth dataset has both English and German country names (and a load of other languages too, for that matter). It will most likely not match everything, but hopefully what's left is a relatively small number of records (and also, hopefully, a relatively small number of unique countries). At that point you can refine your list and rerun the process, or fix the remainders manually.
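The match, inspect leftovers, refine loop can be sketched in Python roughly as follows. The function name and the sample data are made up; the only assumption is that you have a set of known country names and compare each comma-separated part against it.

```python
from collections import Counter


def split_matches(locations, known_countries):
    """Return (matched, leftovers): a location-to-country dict and a tally of misses."""
    matched = {}
    leftovers = Counter()
    for location in locations:
        parts = [part.strip() for part in location.split(",")]
        # Compare whole parts, so "Niger" is never found inside "Nigeria".
        hit = next((part for part in parts if part in known_countries), None)
        if hit is not None:
            matched[location] = hit
        else:
            leftovers[location] += 1
    return matched, leftovers


sample = ["Lake View, Chicago, USA", "Berlin, Deutschland", "Europe", "Europe", "Gold Coast"]
matched, leftovers = split_matches(sample, {"USA", "Deutschland"})
print(leftovers.most_common())
```

Sorting the leftovers by frequency shows which missing names or spellings are worth adding to the list first before rerunning.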

 


@caitlin.thorn What about using a geocoder? Try the FME Geocoder (the OpenStreetMap service is free).


@caitlin.thorn Luckily for you, a similar problem was addressed in a recent webinar about OpenAI... see what I did there 🙂 I'd imagine you could submit a prompt that says something like "What country is @Value(att) in?" If you do try this, I'd be interested in the results. The webinar is linked below; the relevant part starts at the 26:30 mark in the video.

 

Just out of curiosity, here is the response I got with your example, which was a bit of a softball question for ChatGPT. If you can get it to work, you could parse the response to get everything after "is located in".
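Parsing the response could look something like this Python sketch. The sample reply text below is invented for illustration (the real wording will vary between calls), so treat the marker phrase as an assumption to verify against actual responses.

```python
MARKER = "is located in"


def country_from_reply(reply):
    """Return the text after the marker phrase, or None if the phrase is absent."""
    _, found, tail = reply.partition(MARKER)
    if not found:
        return None
    # Drop surrounding whitespace and any trailing full stop.
    return tail.strip().rstrip(".").strip()


# Hypothetical reply text, not an actual model response.
print(country_from_reply("Lake View, Chicago is located in the United States."))
```

Replies that phrase the answer differently return None, which makes them easy to route to a manual check.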


Unleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows (safe.com)

 

EDIT: OK, I had to try this out since it just seems too cool not to. 🙂 Using the OpenAICompletionsConnector, I was able to elicit a correct response when I asked it where Lake View, Chicago is. Seems very promising.



I actually ended up doing this with a StringSearcher that matches a list of country names using a regular expression. I created the list using Notion AI, and wherever there was a match I wrote it to a new attribute. This only covers records that contain just the country name, which was actually more common than I initially thought, making it a bit easier.

 

Here is the country list if anyone needs it in the future. No alternative spellings but you could add those. Notion AI was a great tool here!

 

^(?:Afghanistan|Albania|Algeria|Andorra|Angola|Antigua and Barbuda|Argentina|Armenia|Australia|Austria|Azerbaijan|Bahamas|Bahrain|Bangladesh|Barbados|Belarus|Belgium|Belize|Benin|Bhutan|Bolivia|Bosnia and Herzegovina|Botswana|Brazil|Brunei|Bulgaria|Burkina Faso|Burundi|Cabo Verde|Cambodia|Cameroon|Canada|Central African Republic|Chad|Chile|China|Colombia|Comoros|Congo, Democratic Republic of the|Congo, Republic of the|Costa Rica|Cote d'Ivoire|Croatia|Cuba|Cyprus|Czech Republic|Denmark|Djibouti|Dominica|Dominican Republic|Ecuador|Egypt|El Salvador|Equatorial Guinea|Eritrea|Estonia|Ethiopia|Fiji|Finland|France|Gabon|Gambia|Georgia|Germany|Ghana|Greece|Grenada|Guatemala|Guinea|Guinea-Bissau|Guyana|Haiti|Honduras|Hungary|Iceland|India|Indonesia|Iran|Iraq|Ireland|Israel|Italy|Jamaica|Japan|Jordan|Kazakhstan|Kenya|Kiribati|North Korea|South Korea|Kosovo|Kuwait|Kyrgyzstan|Laos|Latvia|Lebanon|Lesotho|Liberia|Libya|Liechtenstein|Lithuania|Luxembourg|Macedonia|Madagascar|Malawi|Malaysia|Maldives|Mali|Malta|Marshall Islands|Mauritania|Mauritius|Mexico|Micronesia, Federated States of the|Moldova|Monaco|Mongolia|Montenegro|Morocco|Mozambique|Myanmar \(Burma\)|Namibia|Nauru|Nepal|Netherlands|New Zealand|Nicaragua|Niger|Nigeria|Norway|Oman|Pakistan|Palau|Palestine|Panama|Papua New Guinea|Paraguay|Peru|Philippines|Poland|Portugal|Qatar|Romania|Russia|Rwanda|Saint Kitts and Nevis|Saint Lucia|Saint Vincent and the Grenadines|Samoa|San Marino|Sao Tome and Principe|Saudi Arabia|Senegal|Serbia|Seychelles|Sierra Leone|Singapore|Slovakia|Slovenia|Solomon Islands|Somalia|South Africa|South Sudan|Spain|Sri Lanka|Sudan|Suriname|Swaziland|Sweden|Switzerland|Syria|Taiwan|Tajikistan|Tanzania|Thailand|Timor-Leste|Togo|Tonga|Trinidad and Tobago|Tunisia|Turkey|Turkmenistan|Tuvalu|Uganda|Ukraine|United Arab Emirates|United Kingdom|United States|Uruguay|Uzbekistan|Vanuatu|Vatican City|Venezuela|Vietnam|Yemen|Zambia|Zimbabwe|Afghanistan|Albanien|Algerien|Andorra|Angola|Antigua und Barbuda|Argentinien|Armenien|Australien|Österreich|Aserbaidschan|Bahamas|Bahrain|Bangladesch|Barbados|Belarus|Belgien|Belize|Benin|Bhutan|Bolivien|Bosnien und Herzegowina|Botswana|Brasilien|Brunei|Bulgarien|Burkina Faso|Burundi|Kap Verde|Kambodscha|Kamerun|Kanada|Zentralafrikanische Republik|Tschad|Chile|China|Kolumbien|Komoren|Demokratische Republik Kongo|Republik Kongo|Costa Rica|Côte d'Ivoire|Kroatien|Kuba|Zypern|Tschechische Republik|Dänemark|Dschibuti|Dominica|Dominikanische Republik|Ecuador|Ägypten|El Salvador|Äquatorialguinea|Eritrea|Estland|Äthiopien|Fidschi|Finnland|Frankreich|Gabun|Gambia|Georgien|Deutschland|Ghana|Griechenland|Grenada|Guatemala|Guinea|Guinea-Bissau|Guyana|Haiti|Honduras|Ungarn|Island|Indien|Indonesien|Iran|Irak|Irland|Israel|Italien|Jamaika|Japan|Jordanien|Kasachstan|Kenia|Kiribati|Nordkorea|Südkorea|Kosovo|Kuwait|Kirgisistan|Laos|Lettland|Libanon|Lesotho|Liberia|Libyen|Liechtenstein|Litauen|Luxemburg|Mazedonien|Madagaskar|Malawi|Malaysia|Malediven|Mali|Malta|Marshallinseln|Mauretanien|Mauritius|Mexiko|Mikronesien|Moldawien|Monaco|Mongolei|Montenegro|Marokko|Mosambik|Myanmar \(Burma\)|Namibia|Nauru|Nepal|Niederlande|Neuseeland|Nicaragua|Niger|Nigeria|Norwegen|Oman|Pakistan|Palau|Palästina|Panama|Papua-Neuguinea|Paraguay|Peru|Philippinen|Polen|Portugal|Katar|Rumänien|Russland|Ruanda|St\. Kitts und Nevis|St\. Lucia|St\. Vincent und die Grenadinen|Samoa|San Marino|São Tomé und Príncipe|Saudi-Arabien|Senegal|Serbien|Seychellen|Sierra Leone|Singapur|Slowakei|Slowenien|Salomonen|Somalia|Südafrika|Südsudan|Spanien|Sri Lanka|Sudan|Suriname|Swasiland|Schweden|Schweiz|Syrien|Taiwan|Tadschikistan|Tansania|Thailand|Timor-Leste|Togo|Tonga|Trinidad und Tobago|Tunesien|Türkei|Turkmenistan|Tuvalu|Uganda|Ukraine|Vereinigte Arabische Emirate|Vereinigtes Königreich|Vereinigte Staaten|Uruguay|Usbekistan|Vanuatu|Vatikanstadt|Venezuela|Vietnam|Jemen|Sambia|Simbabwe)$
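If you maintain the names as a plain list, a pattern like this can be generated rather than typed by hand. Two general regex points matter here: in `^A|B|C$` the anchors apply only to the first and last alternative unless the alternatives are grouped, and characters such as `(`, `)` and `.` are metacharacters that need escaping. A minimal Python sketch, with a hypothetical helper name and a shortened list:

```python
import re


def build_country_regex(names):
    """Build one anchored pattern that matches any of the names exactly."""
    # re.escape makes "(", ")" and "." match literally.
    alternatives = "|".join(re.escape(name) for name in names)
    # The non-capturing group makes ^ and $ apply to every alternative.
    return re.compile(rf"^(?:{alternatives})$")


pattern = build_country_regex(["Myanmar (Burma)", "St. Lucia", "Deutschland", "United States"])
print(bool(pattern.match("Myanmar (Burma)")))
print(bool(pattern.match("Lake View, Chicago, United States")))
```

The compiled pattern's `.pattern` attribute gives the string form, which can then be pasted into the StringSearcher, assuming its regex flavour treats these escapes the same way.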

