Skip to main content
Solved

String Manipulation with RegEX


Forum|alt.badge.img
Hi FME'ers 

 

 

I want to be able to extract the organisation name out of a very messy string. Currently i have applied around 20 rules but there is one rule/concept that I just can't seem to work out.

 

 

Example String: The Random Organisation Random Street Randomshire RR20 2RR

 

 

I have another dataset that contains a list of all streets, i.e. 'Random Street' will be a record in this 'Street' dataset

 

 

How do i get FME to look through the street dataset and find in the string the name of the street and strip it, including the rest following the street, i.e. Random Street Randomshire RR20 2RR

 

 

Thanks!

Best answer by david_r

Hi,

 

 

I would first try to make a match / relation between the organisations and their respective street names. I would try to do this in the database and not using FME, using something like (untested):

 

 

select *

 

from organisation, address

 

where address.street like '%' || organisation.name || '%'

 

 

You could do this with a SQLCreator, for instance. The result should be something one row for each organisation with the matching street name, like:

 

 

org_name

 

street_name

 

 

It would then be a simple matter of using a StringSearcher with a regexp that returns the part of org_name that preceeds street_name.

 

 

David
View original
Did this help you find an answer to your question?
This post is closed to further activity.
It may be a question with a best answer, an implemented idea, or just a post needing no comment.
If you have a follow-up or related question, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

4 replies

takashi
Influencer
  • May 1, 2013
Hi Kam,

 

 

If the string always consists of: organization name (one or more any characters) <space> street name (one or more any characters, except space) <space> 'Street' <space> shire name (one or more any characters, except space) <space> lot number (one or more any characters)   StringSearcher with the following expression would extract the elements: ^(.+)\\s([^\\s]+\\sStreet)\\s([^\\s]+)\\s(.+)$

 

Takashi

Forum|alt.badge.img
  • Author
  • May 1, 2013
Hi Takashi,

 

 

I guess that is one way of doing it, but I would need to repeat this process for Road, Drive, Way, Crescent etc etc... And then it will not be very accurate as Nottingham has quite a few double worded street names, i.e. Castle Bridge Road

 

 

Is there any way of doing it outside of RegEX too? I.e look into a DBF/CSV file that contains the street name in the string and then uses that street name to be the Regex calculation

 

 

Example:

 

Dataset 1 containing the full string

 

Dataset 2 containing street names

 

 

Both datasets enter one transformer

 

Transformer looks at first feature and checks to see if there is a street name within the string

 

IF So then the string searcher value = StreetName.*

 

 

Which would then strip out the rest of the string from the street name start... and do this for every feature it finds with a street name from the dataset 2 containing street names

 

 

 

I hope this makes sense.

 

 


david_r
Celebrity
  • Best Answer
  • May 1, 2013
Hi,

 

 

I would first try to make a match / relation between the organisations and their respective street names. I would try to do this in the database and not using FME, using something like (untested):

 

 

select *

 

from organisation, address

 

where address.street like '%' || organisation.name || '%'

 

 

You could do this with a SQLCreator, for instance. The result should be something one row for each organisation with the matching street name, like:

 

 

org_name

 

street_name

 

 

It would then be a simple matter of using a StringSearcher with a regexp that returns the part of org_name that preceeds street_name.

 

 

David

gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • July 25, 2013
if that pattern. like mr.Takashi suggests is consistent, i would strip every line from right to left.

 

Using a regexp for the postalcode and zip leaves city, street and org.

 

Stripping city usig a city dataset leaves street and org.

 

Then stripping streets using your adres dataset wold leave Org.

 

At least in your example string.

 

 

 

etc.

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings