Question

regular expression problem

9 years ago
July 27, 2015
7 replies
7 views

morten_aagaard
1 reply

Hi all

I am cleaning up a huge address dataset and I am now stuck, so I hope someone here has an idea.

I have a large number of addresses in one attribute column that are all composed of the following:

"road name" "road number" "junk I need to remove"

Example:

fictive street 123 ,23-65

My problem is that I want to remove everything after the road number (i.e. ,23-65).

The road names can contain any number of characters and any number of white spaces. I'm guessing I need to use regular expressions, but I can't figure out how to select and remove all the junk text. The junk always comes after the road number and a white space, and it can be any number of characters long.

Any ideas?

+11

pratap
Contributor
594 replies
9 years ago
July 27, 2015

Have you tried "AttributeSplitter" with space and StringConcatenator later?

+19

takashi
Contributor
7538 replies
9 years ago
July 27, 2015

Hi,

> the junk always comes after the road number and a white space

I would try using the StringReplacer with this setting.

Text to Match: ^(.*\d)\s.*$

Replacement Text: \1

Use Regular Expressions: yes

Takashi
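For anyone who wants to try the pattern above outside FME, here is a minimal sketch using Python's standard `re` module. The function name `strip_junk` is made up for illustration, and Python's regex engine is not necessarily the one the StringReplacer uses, so treat this as a way to test the idea rather than a description of the transformer.

```python
import re

# Pattern and replacement from the post above, written as Python raw strings.
PATTERN = re.compile(r"^(.*\d)\s.*$")


def strip_junk(address: str) -> str:
    """Keep everything up to the last digit that is followed by whitespace.

    Strings that do not match the pattern are returned unchanged.
    """
    return PATTERN.sub(r"\1", address)


print(strip_junk("fictive street 123 ,23-65"))  # fictive street 123
```

Because `.*` is greedy, the match runs to the last digit that is followed by whitespace. If the junk itself contains such a digit (for example "main road 5 apt 7 b"), more text is kept than intended ("main road 5 apt 7").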

david_r
8316 replies
9 years ago
July 27, 2015

For some reason the regular expressions behave a bit unexpectedly (to me) in the StringSearcher et al.

But try the RegularExpressionMatcher (that is Python-based) from the FME Store and set it up as follows:

It will search for all the characters up until the first whitespace after the first group of numbers from the beginning of the line.

For a roadname attribute that contains "fictive street 123 ,23-65" it will return the list attribute "REM_matched_parts{0}" with the value "fictive street 123"

David

+31

ebygomm
Influencer
3236 replies
9 years ago
July 27, 2015

A StringSearcher with the following regular expression should return everything you want, without the 'junk', in the matched result attribute:

^\D+[0-9]+
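As a rough illustration of what this expression matches, here is a sketch in Python's `re` module. The helper name `leading_address` is hypothetical, and the StringSearcher's regex engine may differ in details, so this only shows the logic of the pattern.

```python
import re
from typing import Optional

# One or more non-digits from the start of the string, then the first run of digits.
LEADING_ADDRESS = re.compile(r"^\D+[0-9]+")


def leading_address(address: str) -> Optional[str]:
    """Return the road name plus road number, or None if the pattern does not match."""
    match = LEADING_ADDRESS.match(address)
    return match.group(0) if match else None


print(leading_address("fictive street 123 ,23-65"))  # fictive street 123
```

Two things to keep in mind with this pattern: a road name that starts with a digit (e.g. "2nd avenue 14 ,9") does not match at all, and a letter suffix after the number is cut off ("fictive street 22a, 6340" gives "fictive street 22").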

+31

ebygomm
Influencer
3236 replies
9 years ago
July 27, 2015

Just a thought: do you need to allow for suffixes, e.g. fictive street 22a, 6340?

+31

ebygomm
Influencer
3236 replies
9 years ago
July 27, 2015

The following should allow for that eventuality (assuming single-letter suffixes):

^\D+[0-9]+[a-z]?
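A small sketch of how the suffix-aware expression could be applied to a whole list, again in Python rather than FME. The function name `clean_addresses` and the idea of collecting non-matching rows separately are assumptions for illustration, not something from the workspace discussed here.

```python
import re
from typing import List, Tuple

# Non-digits, the first run of digits, then an optional single lowercase letter.
WITH_SUFFIX = re.compile(r"^\D+[0-9]+[a-z]?")


def clean_addresses(addresses: List[str]) -> Tuple[List[str], List[str]]:
    """Split addresses into (cleaned matches, unmatched originals)."""
    cleaned: List[str] = []
    unmatched: List[str] = []
    for address in addresses:
        match = WITH_SUFFIX.match(address)
        if match:
            cleaned.append(match.group(0))
        else:
            # No road number found; keep the original for manual review.
            unmatched.append(address)
    return cleaned, unmatched


cleaned, unmatched = clean_addresses(
    ["fictive street 22a, 6340", "fictive street 123 ,23-65", "no number here"]
)
print(cleaned)    # ['fictive street 22a', 'fictive street 123']
print(unmatched)  # ['no number here']
```

Note that `[a-z]` only covers lowercase suffixes; an uppercase suffix such as "22A" would be dropped unless the class is widened or the match is made case-insensitive.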

morten_aagaard
Author
1 reply
9 years ago
July 29, 2015

Thank you all for your help. I was able to use your examples to clean up about 14500 addresses out of 14800. The rest are so screwed up that they will have to be cleaned more or less manually, but regular expressions sure are powerful :)
