Skip to main content
Solved

Comparing Strings and returning unmatched characters?


Hi , I hope this is a simple query.

 

 

I have a dataset which has a field that contains the organisation name and address seperated by spaces (see example) 

 

I also have a gazetteer that should match the address in the first dataset without organisation name.

 

 

I want to be able to compare the two strings and extract the organisation name into a new field. 

 

 

Example:

 

 

Dataset 1: Kam's Made Up Org 15 Test Street Testingham TE23 2TE

 

 

Dataset 2: 15 Test Street Testingham TE23 2TE

 

 

Desirable new field: Kam's Made Up Org

 

 

p.s. Both datasets have a unique identifier also to match them together.

 

 

Regards,

 

 

 

17 replies

Userlevel 3
Badge +23
Use a string replacer

 

 

Join the two datasets then search the string in dataset 1for the string in dataset 2 and replace it with nothing
Badge
Is there another way without having to join them as my Dataset 2 has over 180k records whereas my dataset 1 has 4k. It would unecessarily look through tens of thousands of records? 

 

 

Also, if i was to do it that way, what do I enter inside the 'Text to find'?
Userlevel 3
Badge +23
You choose the attribute that contains the string from dataset 2 in the text to find, you search the attribute that contains the string from dataset 1

 

 

It won't be searching through thousands of records as what you are doing is joining the two datasets together into one 'table' and then comparing the value in one 'field' with the value in another 'field'
Userlevel 5
Hi,

 

 

a couple of questions:
  1. are you able to join the two datasets on something else than a partial string (address), or is the address part the only thing they have in common?
  2. are you 100% sure that the address part is always written exactly(!) the same in the two datasets?
David
Badge
Hi EGomm and David R,

 

 

EGomm: I have joined using a joiner so now my DATASET 1 includes both string fields, one with organisation names and one without. I have then pushed it into a string replacer with String from Dataset 1 as the Attribute and string from Dataset 2 as the Text to Find, then put a blank space as the replacement text. This does not seem to do anything to the string field from Dataset 1.... 

 

What am i doing wrong?

 

 

 

David R: Yes I am able to join the two datasets with a unique identifier that I have mentioned previously, and yes I am sure the latter part of the string contains the same address as the one from dataset 2 as the system uses the address from dataset 2 and just enters an organisation name before it in the same field.

 

 

 

Badge +2

You said "Both datasets have a unique identifier also to match them together."

 

 

Do you mean you could join on these ID that but it's a many to one relationship? You could use a Joiner, FeatureMerger or if you're famailliar if SQL the InlineQuerier. And I wouldn't worry about joining 180k records.

 

Userlevel 3
Badge +23
If nothing is happening it suggests that there is no match. The process works in theory with the sample I have just created.
Userlevel 5
Hi again,

 

 

sorry, just realized that you had answered question 1 in your original question :-)

 

 

Two alternative suggestions:

 

 

1: Classic FME solution
  • FeatureMerger to join the two datasets
  • StringReplacer, like EGomm suggested
2: Using an InlineQuerier with the two datasets as input (Dataset 1= "ORG" and Dataset 2 = "ADDR" in this example), you can use this SQL query that will give you a new attribute "MY_ORG_STRING" that contains only part of the string that is not matched:
  • select ORG.ID, ORG_STR, ADDR_STR, rtrim(substr(ORG_STR, 1, length(ORG_STR)-length(ADDR_STR))) as MY_ORG_STRING from ORG join ADDR on (ORG.ID = ADDR.ID)
The above assumes that the address part of the organization string has been written EXACTLY the same. If this is not the case you will have to do fuzzy comparisons, but that is another game entirely...

 

 

David
Badge +2

StingReplacer Parameters:

Attibute: Dataset 1 Value

 

Text to Find: Dataset 2 Value  (Use the Set to Attribute Value option)

 

Replace: 

 

Badge +2

Clearly that is StringReplacer 

(Edit function needed here! :) )

Badge
Haha what a pickle. Ok let's make this even simpler.

 

 

I now have 1 Dataset with the fields as follows:

 

 

Address : 15 Temp Street

 

OrgAddress : Organisation 15 Temp Street

 

 

 

Now what I want is a new field with the word 'Organisation' stripped from it

 

 

Also, if the OrgAddress was : Organisation 15 Temp Street Tempingham

 

 

 

Then the new field should strip: Organisation Tempingham..

 

 

 

 

 

 

 

Badge
I am not doubting that your methods are correct, but it is not working for me, maybe I am doing something wrong.

 

 

Also I assumed the StringReplacer would have done it, this was the first thing I tried... 

 

 

I can do this through python with a simple string function but I just wondered how easy it was in FME
Userlevel 5
I just tried with your specific example:

 

 

 

 

Before:

 

Attribute(encoded: utf-8): `Address' has value `15 Temp Street' Attribute(encoded: utf-8): `OrgAddress' has value `Organisation 15 Temp Street'

 

 

After:

 

Attribute(encoded: utf-8): `Address' has value `15 Temp Street' Attribute(encoded: utf-8): `OrgAddress' has value `Organisation '

 

 

Looks halfway right to me?

 

 

David

 

Badge +2
Here - try this Workbench - works with your second example.

 

 

Click Here
Badge
Thank you - works a treat
Userlevel 3
Badge +23

Did you figure out the problem then?

 

 

The example is just a stringreplacer?

Badge
Yes, for some reason the string replacer worked when I had merged the two attributes into one dataset rather than pull them both into one workspace, join them then push through a string replacer.

 

 

Odd.. but It works now

Reply