Solved

String Manipulation with RegEX

12 years ago
May 1, 2013
4 replies
16 views

kam
18 replies

Hi FME'ers

I want to be able to extract the organisation name out of a very messy string. Currently i have applied around 20 rules but there is one rule/concept that I just can't seem to work out.

Example String: The Random Organisation Random Street Randomshire RR20 2RR

I have another dataset that contains a list of all streets, i.e. 'Random Street' will be a record in this 'Street' dataset

How do i get FME to look through the street dataset and find in the string the name of the street and strip it, including the rest following the street, i.e. Random Street Randomshire RR20 2RR

Thanks!

Best answer by david_r

Hi,

I would first try to make a match / relation between the organisations and their respective street names. I would try to do this in the database and not using FME, using something like (untested):

select *

from organisation, address

where address.street like '%' || organisation.name || '%'

You could do this with a SQLCreator, for instance. The result should be something one row for each organisation with the matching street name, like:

org_name

street_name

It would then be a simple matter of using a StringSearcher with a regexp that returns the part of org_name that preceeds street_name.

David

View original

Did this help you find an answer to your question?

This post is closed to further activity.
It may be a question with a best answer, an implemented idea, or just a post needing no comment.
If you have a follow-up or related question, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

takashi
7683 replies
12 years ago
May 1, 2013

Hi Kam,

If the string always consists of: organization name (one or more any characters) <space> street name (one or more any characters, except space) <space> 'Street' <space> shire name (one or more any characters, except space) <space> lot number (one or more any characters) StringSearcher with the following expression would extract the elements: ^(.+)\\s([^\\s]+\\sStreet)\\s([^\\s]+)\\s(.+)$

Takashi

kam
Author
18 replies
12 years ago
May 1, 2013

Hi Takashi,

I guess that is one way of doing it, but I would need to repeat this process for Road, Drive, Way, Crescent etc etc... And then it will not be very accurate as Nottingham has quite a few double worded street names, i.e. Castle Bridge Road

Is there any way of doing it outside of RegEX too? I.e look into a DBF/CSV file that contains the street name in the string and then uses that street name to be the Regex calculation

Example:

Dataset 1 containing the full string

Dataset 2 containing street names

Both datasets enter one transformer

Transformer looks at first feature and checks to see if there is a street name within the string

IF So then the string searcher value = StreetName.*

Which would then strip out the rest of the string from the street name start... and do this for every feature it finds with a street name from the dataset 2 containing street names

I hope this makes sense.

david_r
8350 replies
Best Answer
12 years ago
May 1, 2013

Hi,

I would first try to make a match / relation between the organisations and their respective street names. I would try to do this in the database and not using FME, using something like (untested):

select *

from organisation, address

where address.street like '%' || organisation.name || '%'

You could do this with a SQLCreator, for instance. The result should be something one row for each organisation with the matching street name, like:

org_name

street_name

It would then be a simple matter of using a StringSearcher with a regexp that returns the part of org_name that preceeds street_name.

David

+15

gio
Contributor
2252 replies
11 years ago
July 25, 2013

if that pattern. like mr.Takashi suggests is consistent, i would strip every line from right to left.

Using a regexp for the postalcode and zip leaves city, street and org.

Stripping city usig a city dataset leaves street and org.

Then stripping streets using your adres dataset wold leave Org.

At least in your example string.

etc.

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

String Manipulation with RegEX

4 replies

Helpful Members This Week

Recently Solved Questions

Running Mammoth .docx to HTML with SystemCaller

Group processing by objects count and parallelize treatment

Using a Flow app as the trigger for a Flow automation

Rows to Columns

SQL Executor Read Only on SQLite

Community Stats

Latest FME

Cookie policy

Cookie settings

Related Topics

Exporting Catalog item along with the associated termsicon

Catalog Item with associated Term assigned reportingicon

Helpful Members This Week

Recently Solved Questions

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings