Solved

How to map the values in 1 source to that in another manually

  • 15 February 2021
  • 5 replies
  • 1 view

I have 3 shapefiles and and 1 spreadsheet I need to import into 1 PostgreSQL/PostGIS table

 

ADM1_area (Shapefile) is linked to the spreadsheet via a name column however due to differences in spellings this may need manual intervention

How Do I bring up both the Shapefile name column and the spreadsheet name column and where necessary manually map the values across the two?

for instance

MyPlace = My Place (Or one source might have diacritics )

This is probably a manual process as REGEX would take to long

 

The preferred name spelling will then map to the name column in PG table and all other spelling concatenate to a list in alt_names column

 

The other 2 shapfiles are ADM2_area and ADM2_point

these form 1 database row that has both geometry and point_geometry columns and are linked to the ADM1_area row via ADM1_name column this needs to be the same name used before and may need intervention

 

When I created the PostGIS connection I could only see how to had 1 geometry column not two?

 

Thank You

icon

Best answer by markatsafe 18 February 2021, 17:59

View original

5 replies

Badge +2

@geo_geek​  Have a look at the FuzzyStringCompareFrom2Datasets custom transformer that is available on the FME HUB. That might make the join for you.

Thanx I'll look tomorrow

FYI:

I've been playing with StringPairReplacer today to replace accented characters that fixed 7 of the 15 fails

My sample has 48 records with 15 fails

Adding multiple filters to clean text in my mind is not a good option because every source is going to be different

it would be faster to be able to bring up the two sources in a combined table specify the joining columns and manually join the 15 fails

@Mark Stoakes​  I had a look at FuzzyStringCompareFrom2Datasets  after adding two counters to the inputs to form ID columns this transformer seemed to work

However I'm unclear how to use it has none of the other input attributes are carried through

 

Unless I'm missing something?

 

In order to move forward

One of the input is a spreadsheet I've found it faster to add a column to this and copy and paste the other source name column 90% of this was in the correct order

This works on a small datasets (All mine are) upscaling it might become an issue

 

Thanx for your help

 

 

Badge +2

@geo_geek​  That's correct, only the string and unique ID's are output. You then have to join back to the appropriate feature using FeatureJoiner. In the example below:

imagestr1 is the Input1 string to be compared.

String_2_matched max is the best match for Input2. String_2_matched_2th / 3th are the 2nd & 3rd best matches.

ratio_max is the ratio for the best match, and then ratio_2th / 3th for the next to matches

id1 is the Input1 unique ID

id2 & id2_ratio_max are the unique ID's (same value) for the best match. id_2_ratio_2th / 3th are the unique IDs of the 2nd & 3rd matches.

So you can use id1 to join back to the Input1 features and id2 to join back to the best matched Input2 features.

I've attached a small example workspace (FME 2020).

Thank you for your help

Reply