Skip to main content
Solved

Comparing Strings and returning unmatched characters?


Forum|alt.badge.img

Hi , I hope this is a simple query.

 

 

I have a dataset which has a field that contains the organisation name and address seperated by spaces (see example) 

 

I also have a gazetteer that should match the address in the first dataset without organisation name.

 

 

I want to be able to compare the two strings and extract the organisation name into a new field. 

 

 

Example:

 

 

Dataset 1: Kam's Made Up Org 15 Test Street Testingham TE23 2TE

 

 

Dataset 2: 15 Test Street Testingham TE23 2TE

 

 

Desirable new field: Kam's Made Up Org

 

 

p.s. Both datasets have a unique identifier also to match them together.

 

 

Regards,

 

 

 

Best answer by mark_f

Here - try this Workbench - works with your second example.

 

 

Click Here
View original
Did this help you find an answer to your question?
This post is closed to further activity.
It may be a question with a best answer, an implemented idea, or just a post needing no comment.
If you have a follow-up or related question, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

17 replies

ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • March 5, 2013
Use a string replacer

 

 

Join the two datasets then search the string in dataset 1for the string in dataset 2 and replace it with nothing

Forum|alt.badge.img
  • Author
  • March 5, 2013
Is there another way without having to join them as my Dataset 2 has over 180k records whereas my dataset 1 has 4k. It would unecessarily look through tens of thousands of records? 

 

 

Also, if i was to do it that way, what do I enter inside the 'Text to find'?

ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • March 5, 2013
You choose the attribute that contains the string from dataset 2 in the text to find, you search the attribute that contains the string from dataset 1

 

 

It won't be searching through thousands of records as what you are doing is joining the two datasets together into one 'table' and then comparing the value in one 'field' with the value in another 'field'

david_r
Evangelist
  • March 5, 2013
Hi,

 

 

a couple of questions:
  1. are you able to join the two datasets on something else than a partial string (address), or is the address part the only thing they have in common?
  2. are you 100% sure that the address part is always written exactly(!) the same in the two datasets?
David

Forum|alt.badge.img
  • Author
  • March 5, 2013
Hi EGomm and David R,

 

 

EGomm: I have joined using a joiner so now my DATASET 1 includes both string fields, one with organisation names and one without. I have then pushed it into a string replacer with String from Dataset 1 as the Attribute and string from Dataset 2 as the Text to Find, then put a blank space as the replacement text. This does not seem to do anything to the string field from Dataset 1.... 

 

What am i doing wrong?

 

 

 

David R: Yes I am able to join the two datasets with a unique identifier that I have mentioned previously, and yes I am sure the latter part of the string contains the same address as the one from dataset 2 as the system uses the address from dataset 2 and just enters an organisation name before it in the same field.

 

 

 


Forum|alt.badge.img+2
  • March 5, 2013

You said "Both datasets have a unique identifier also to match them together."

 

 

Do you mean you could join on these ID that but it's a many to one relationship? You could use a Joiner, FeatureMerger or if you're famailliar if SQL the InlineQuerier. And I wouldn't worry about joining 180k records.

 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • March 5, 2013
If nothing is happening it suggests that there is no match. The process works in theory with the sample I have just created.

david_r
Evangelist
  • March 5, 2013
Hi again,

 

 

sorry, just realized that you had answered question 1 in your original question :-)

 

 

Two alternative suggestions:

 

 

1: Classic FME solution
  • FeatureMerger to join the two datasets
  • StringReplacer, like EGomm suggested
2: Using an InlineQuerier with the two datasets as input (Dataset 1= "ORG" and Dataset 2 = "ADDR" in this example), you can use this SQL query that will give you a new attribute "MY_ORG_STRING" that contains only part of the string that is not matched:
  • select ORG.ID, ORG_STR, ADDR_STR, rtrim(substr(ORG_STR, 1, length(ORG_STR)-length(ADDR_STR))) as MY_ORG_STRING from ORG join ADDR on (ORG.ID = ADDR.ID)
The above assumes that the address part of the organization string has been written EXACTLY the same. If this is not the case you will have to do fuzzy comparisons, but that is another game entirely...

 

 

David

Forum|alt.badge.img+2
  • March 5, 2013

StingReplacer Parameters:

Attibute: Dataset 1 Value

 

Text to Find: Dataset 2 Value  (Use the Set to Attribute Value option)

 

Replace: 

 


Forum|alt.badge.img+2
  • March 5, 2013

Clearly that is StringReplacer 

(Edit function needed here! :) )


Forum|alt.badge.img
  • Author
  • March 5, 2013
Haha what a pickle. Ok let's make this even simpler.

 

 

I now have 1 Dataset with the fields as follows:

 

 

Address : 15 Temp Street

 

OrgAddress : Organisation 15 Temp Street

 

 

 

Now what I want is a new field with the word 'Organisation' stripped from it

 

 

Also, if the OrgAddress was : Organisation 15 Temp Street Tempingham

 

 

 

Then the new field should strip: Organisation Tempingham..

 

 

 

 

 

 

 


Forum|alt.badge.img
  • Author
  • March 5, 2013
I am not doubting that your methods are correct, but it is not working for me, maybe I am doing something wrong.

 

 

Also I assumed the StringReplacer would have done it, this was the first thing I tried... 

 

 

I can do this through python with a simple string function but I just wondered how easy it was in FME

david_r
Evangelist
  • March 5, 2013
I just tried with your specific example:

 

 

 

 

Before:

 

Attribute(encoded: utf-8): `Address' has value `15 Temp Street' Attribute(encoded: utf-8): `OrgAddress' has value `Organisation 15 Temp Street'

 

 

After:

 

Attribute(encoded: utf-8): `Address' has value `15 Temp Street' Attribute(encoded: utf-8): `OrgAddress' has value `Organisation '

 

 

Looks halfway right to me?

 

 

David

 


Forum|alt.badge.img+2
  • Best Answer
  • March 5, 2013
Here - try this Workbench - works with your second example.

 

 

Click Here

Forum|alt.badge.img
  • Author
  • March 5, 2013
Thank you - works a treat

ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • March 5, 2013

Did you figure out the problem then?

 

 

The example is just a stringreplacer?


Forum|alt.badge.img
  • Author
  • March 5, 2013
Yes, for some reason the string replacer worked when I had merged the two attributes into one dataset rather than pull them both into one workspace, join them then push through a string replacer.

 

 

Odd.. but It works now

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings