Solved

How do I pick country names out of an attribute?



I have a text attribute that contains location information, for example “Lake View, Chicago, USA”. I want to create a column that contains just “USA”. Note that the data isn’t regular: sometimes there are many locality details, sometimes just a country, and sometimes no country at all. How do I pull out just the countries?


Best answer by nielsgerrits 8 February 2023, 11:01


12 replies


Is the country always the last item, separated by a comma? Then you can use an AttributeSplitter to split the attribute into list elements. Then use a ListIndexer to get the last list element value (index = -1) and set it to an attribute.
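For anyone who wants to try the same idea in a PythonCaller or a standalone script, here is a minimal sketch of the split-and-take-last logic. The function name `last_comma_item` is made up for illustration, and it assumes the country really is the final comma-separated item.

```python
def last_comma_item(location: str) -> str:
    """Return the text after the last comma, with surrounding whitespace removed."""
    # Same idea as AttributeSplitter (split on ",") followed by
    # ListIndexer with index -1 (take the last list element).
    parts = [part.strip() for part in location.split(",")]
    return parts[-1]


print(last_comma_item("Lake View, Chicago, USA"))
```

If the value has no comma at all, the whole string comes back, so a value that is only a continent would be returned unchanged.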


No, unfortunately it's not. Sometimes there is other information after it, or there is no country and it just has a continent.


Ah, I see :) One thing that comes to mind is to use a list of all possible countries and a StringSearcher to find a match. For this you have to merge all countries onto all features and then check whether the country attribute occurs in the location attribute.
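Outside of FME, the "search a list of countries in the location text" idea could be sketched roughly like this. The `COUNTRIES` list is a shortened, hypothetical sample and `find_country` is an illustrative name, not an FME function.

```python
import re
from typing import Optional

# Hypothetical, shortened sample list; a real run would use a full country list.
COUNTRIES = ["Guinea", "Guinea-Bissau", "USA", "Germany", "Deutschland"]

# Longest names first, so "Guinea-Bissau" is tried before "Guinea".
_alternatives = "|".join(
    re.escape(name) for name in sorted(COUNTRIES, key=len, reverse=True)
)
# \b keeps the match on word boundaries; IGNORECASE tolerates "germany".
_pattern = re.compile(r"\b(?:" + _alternatives + r")\b", re.IGNORECASE)


def find_country(location: str) -> Optional[str]:
    """Return the first known country name found anywhere in the text, or None."""
    match = _pattern.search(location)
    return match.group(0) if match else None


print(find_country("Lake View, Chicago, USA, North America"))
```

This returns the text as it appears in the input, so the result may still need normalising if the list contains several spellings of the same country.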


All possible countries and common alternative spellings... 😅 USA, US, U.S.A., United States, United States of America...
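One way to cope with alternative spellings is a lookup table that maps every known variant to a single canonical name. This is only a sketch: the `ALIASES` table below is a tiny hand-written sample and `canonical_country` is a hypothetical helper.

```python
from typing import Optional

# Tiny illustrative alias table: lower-cased variant -> canonical name.
ALIASES = {
    "usa": "United States",
    "us": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "bahamas": "The Bahamas",
    "the bahamas": "The Bahamas",
}


def canonical_country(name: str) -> Optional[str]:
    """Map a spelling variant to its canonical name, or None if it is unknown."""
    return ALIASES.get(name.strip().lower())


print(canonical_country("U.S.A."))
```

Anything that comes back as None is a candidate for adding to the table on the next pass.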

 

Or misspellings... Did you know it's "The Bahamas" and not "Bahama's"? I found out after a high-profile cartography project was finished 😕


Aye, this can get very nasty very quickly. It all depends on the data. Is it one time only or automated? 1,000 records or many thousands? Etc., etc.


One time only, but 23,000 records.


Yep, and half of them are in German. I figure I can probably handle a second language once I work out how to deal with the first, but I am concerned about different spellings. And there are historical names.


If it was easy, you would not have got this question 🙂 Just get started with a base list, then iterate down the remnants.


That sounds like a challenge. You can try the method @nielsgerrits suggested. Off the top of my head, the Natural Earth dataset has both English and German country names (and a load of other languages too, for that matter). It will most likely not match everything, but hopefully what's left is a relatively small number of records (and hopefully a relatively small number of unique countries). At that point you can refine your list and rerun the process, or fix the remainders manually.
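The "match, then look at what's left" loop can be sketched as a count of the unmatched values, most frequent first, so the biggest gaps in the list show up at the top. The function name and sample data here are invented for illustration, and the plain substring test is deliberately naive (for example, "Niger" would also match "Nigeria").

```python
from collections import Counter


def unmatched_summary(locations, known_names):
    """Count location strings that contain none of the known country names."""
    known = [name.lower() for name in known_names]
    leftovers = Counter()
    for location in locations:
        text = location.lower()
        # Naive substring test; good enough to size up the remainder.
        if not any(name in text for name in known):
            leftovers[location.strip()] += 1
    # Most frequent unmatched values first.
    return leftovers.most_common()


sample = ["Paris, France", "Wien, Österreich", "Persia", "Persia", "Europe"]
print(unmatched_summary(sample, ["France", "Österreich"]))
```

Each pass, the top entries can be added to the list as new names or aliases, and whatever is still left at the end can be fixed by hand.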

 


@caitlin.thorn What about using a geocoder? Try the FME Geocoder (OpenStreetMap is free).


@caitlin.thorn Luckily for you, a similar problem was addressed in a recent webinar about OpenAI... see what I did there 🙂 I'd imagine you could submit a prompt that says something like "What country is @Value(att) in?" If you do try this, I'd be interested in the results. The webinar can be found at the link below; the relevant part starts at the 26:30 mark in the video.

 

Just out of curiosity, here is the response I got with your example... which was a bit of a softball question for ChatGPT. If you can get it to work, you could parse the return to get everything after "is located in".

[Image: ChatGPT response to the example]

Unleashing the Power of OpenAI GPT-3 in FME Data Integration Workflows (safe.com)
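Parsing the reply for everything after "is located in" could look something like the sketch below. The reply string is a made-up example, not the actual ChatGPT output, and real replies may be worded differently, so the function returns None when the phrase is missing.

```python
import re
from typing import Optional


def country_from_reply(reply: str) -> Optional[str]:
    """Return the text following 'is located in', without a trailing full stop."""
    match = re.search(r"is located in\s+(.+)", reply, flags=re.IGNORECASE)
    if not match:
        return None
    return match.group(1).strip().rstrip(".")


# Hypothetical reply text, for illustration only.
print(country_from_reply("Lake View, Chicago is located in the United States."))
```

The result still includes any article or extra wording from the reply ("the United States"), so it would need the same alias clean-up as any other free-text country name.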

 

EDIT: OK, I had to try this out since it just seems too cool not to. 🙂 Using the OpenAICompletionsConnector, I was able to elicit a correct response when I asked it where Lake View, Chicago is. Seems very promising.



So I actually ended up doing this by using a StringSearcher with a RegEx to match against a list of country names. I created the list using Notion AI, and where there was a match I created a new attribute with the matched value. This only handles values that consist of just the country name, which turned out to be more common than I initially thought, making it a bit easier.

 

Here is the country list if anyone needs it in the future. No alternative spellings but you could add those. Notion AI was a great tool here!

 

^(?:Afghanistan|Albania|Algeria|Andorra|Angola|Antigua and Barbuda|Argentina|Armenia|Australia|Austria|Azerbaijan|Bahamas|Bahrain|Bangladesh|Barbados|Belarus|Belgium|Belize|Benin|Bhutan|Bolivia|Bosnia and Herzegovina|Botswana|Brazil|Brunei|Bulgaria|Burkina Faso|Burundi|Cabo Verde|Cambodia|Cameroon|Canada|Central African Republic|Chad|Chile|China|Colombia|Comoros|Congo, Democratic Republic of the|Congo, Republic of the|Costa Rica|Cote d'Ivoire|Croatia|Cuba|Cyprus|Czech Republic|Denmark|Djibouti|Dominica|Dominican Republic|Ecuador|Egypt|El Salvador|Equatorial Guinea|Eritrea|Estonia|Ethiopia|Fiji|Finland|France|Gabon|Gambia|Georgia|Germany|Ghana|Greece|Grenada|Guatemala|Guinea|Guinea-Bissau|Guyana|Haiti|Honduras|Hungary|Iceland|India|Indonesia|Iran|Iraq|Ireland|Israel|Italy|Jamaica|Japan|Jordan|Kazakhstan|Kenya|Kiribati|North Korea|South Korea|Kosovo|Kuwait|Kyrgyzstan|Laos|Latvia|Lebanon|Lesotho|Liberia|Libya|Liechtenstein|Lithuania|Luxembourg|Macedonia|Madagascar|Malawi|Malaysia|Maldives|Mali|Malta|Marshall Islands|Mauritania|Mauritius|Mexico|Micronesia, Federated States of the|Moldova|Monaco|Mongolia|Montenegro|Morocco|Mozambique|Myanmar \(Burma\)|Namibia|Nauru|Nepal|Netherlands|New Zealand|Nicaragua|Niger|Nigeria|Norway|Oman|Pakistan|Palau|Palestine|Panama|Papua New Guinea|Paraguay|Peru|Philippines|Poland|Portugal|Qatar|Romania|Russia|Rwanda|Saint Kitts and Nevis|Saint Lucia|Saint Vincent and the Grenadines|Samoa|San Marino|Sao Tome and Principe|Saudi Arabia|Senegal|Serbia|Seychelles|Sierra Leone|Singapore|Slovakia|Slovenia|Solomon Islands|Somalia|South Africa|South Sudan|Spain|Sri Lanka|Sudan|Suriname|Swaziland|Sweden|Switzerland|Syria|Taiwan|Tajikistan|Tanzania|Thailand|Timor-Leste|Togo|Tonga|Trinidad and Tobago|Tunisia|Turkey|Turkmenistan|Tuvalu|Uganda|Ukraine|United Arab Emirates|United Kingdom|United States|Uruguay|Uzbekistan|Vanuatu|Vatican City|Venezuela|Vietnam|Yemen|Zambia|Zimbabwe|Afghanistan|Albanien|Algerien|Andorra|Angola|Antigua und Barbuda|Argentinien|Armenien|Australien|Österreich|Aserbaidschan|Bahamas|Bahrain|Bangladesch|Barbados|Belarus|Belgien|Belize|Benin|Bhutan|Bolivien|Bosnien und Herzegowina|Botswana|Brasilien|Brunei|Bulgarien|Burkina Faso|Burundi|Kap Verde|Kambodscha|Kamerun|Kanada|Zentralafrikanische Republik|Tschad|Chile|China|Kolumbien|Komoren|Demokratische Republik Kongo|Republik Kongo|Costa Rica|Côte d'Ivoire|Kroatien|Kuba|Zypern|Tschechische Republik|Dänemark|Dschibuti|Dominica|Dominikanische Republik|Ecuador|Ägypten|El Salvador|Äquatorialguinea|Eritrea|Estland|Äthiopien|Fidschi|Finnland|Frankreich|Gabun|Gambia|Georgien|Deutschland|Ghana|Griechenland|Grenada|Guatemala|Guinea|Guinea-Bissau|Guyana|Haiti|Honduras|Ungarn|Island|Indien|Indonesien|Iran|Irak|Irland|Israel|Italien|Jamaika|Japan|Jordanien|Kasachstan|Kenia|Kiribati|Nordkorea|Südkorea|Kosovo|Kuwait|Kirgisistan|Laos|Lettland|Libanon|Lesotho|Liberia|Libyen|Liechtenstein|Litauen|Luxemburg|Mazedonien|Madagaskar|Malawi|Malaysia|Malediven|Mali|Malta|Marshallinseln|Mauretanien|Mauritius|Mexiko|Mikronesien|Moldawien|Monaco|Mongolei|Montenegro|Marokko|Mosambik|Myanmar \(Burma\)|Namibia|Nauru|Nepal|Niederlande|Neuseeland|Nicaragua|Niger|Nigeria|Norwegen|Oman|Pakistan|Palau|Palästina|Panama|Papua-Neuguinea|Paraguay|Peru|Philippinen|Polen|Portugal|Katar|Rumänien|Russland|Ruanda|St. Kitts und Nevis|St. Lucia|St. Vincent und die Grenadinen|Samoa|San Marino|São Tomé und Príncipe|Saudi-Arabien|Senegal|Serbien|Seychellen|Sierra Leone|Singapur|Slowakei|Slowenien|Salomonen|Somalia|Südafrika|Südsudan|Spanien|Sri Lanka|Sudan|Suriname|Swasiland|Schweden|Schweiz|Syrien|Taiwan|Tadschikistan|Tansania|Thailand|Timor-Leste|Togo|Tonga|Trinidad und Tobago|Tunesien|Türkei|Turkmenistan|Tuvalu|Uganda|Ukraine|Vereinigte Arabische Emirate|Vereinigtes Königreich|Vereinigte Staaten|Uruguay|Usbekistan|Vanuatu|Vatikanstadt|Venezuela|Vietnam|Jemen|Sambia|Simbabwe)$
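If anyone wants to build a pattern like this from a plain list of names rather than by hand, here is a rough Python sketch. Two things are worth knowing: in a bare alternation such as `^A|B|C$` the anchors only apply to the first and last alternatives, so the names need to be grouped, and characters like `(`, `)` and `.` have special meaning unless escaped. The `names` list is a shortened sample and `exact_country` is an illustrative helper.

```python
import re
from typing import Optional

# Shortened sample; a real list would hold every English and German name.
names = ["Germany", "Deutschland", "Myanmar (Burma)", "St. Lucia"]

# re.escape makes "(", ")" and "." match literally; the (?:...) group keeps
# the alternatives together so the whole value has to be one of the names.
pattern = re.compile(
    "(?:" + "|".join(re.escape(name) for name in names) + ")",
    re.IGNORECASE,
)


def exact_country(value: str) -> Optional[str]:
    """Return the value if it is exactly one of the known names, else None."""
    match = pattern.fullmatch(value.strip())
    return match.group(0) if match else None


print(exact_country("Deutschland"))
print(exact_country("Berlin, Deutschland"))
```

`fullmatch` plays the role of the `^...$` anchors here, so only values that are just a country name are accepted.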
