Solved

How to extract numbers and characters from a string?


I have an attribute that contains a string representing a full street address (123 Somewhere St). I want to extract the number (123) and pass it to a different attribute (CustomerAddressNumber). I also need to extract all the characters (Somewhere St) and pass that to another attribute (CustomerAddressStreet). I'm sure this is simple, I just can't seem to get it straight at the moment.

Best answer by erik_jan 14 July 2016, 15:59

12 replies

Using regular expressions in the StringReplacer can do this for you. I would copy the original attribute into two new attributes and then replace the unwanted characters with nothing.

For the numeric values this would be [0-9]*, for the letters [a-zA-Z]*

I am sure other regular expressions are possible.

This works well, but it leads to one additional question. How would I remove the single space left at the beginning of the street name ( Somewhere St)? I tried adding |\s to the expression ([0-9]*|\s) and got (SomewhereSt), because it removes all spaces. I'm not very familiar with regular expressions.

Got it (^\s|[0-9]*). Thanks for the help.
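For anyone who wants to try the replace-and-trim idea outside FME, here is a minimal Python sketch. It is not the StringReplacer itself: the function name `split_by_removal` is hypothetical, and the sketch assumes the only digits in the address belong to the building number.

```python
import re

def split_by_removal(address):
    """Split an address by deleting unwanted characters (illustrative only)."""
    # Keep only the digits: delete every character that is not 0-9.
    number = re.sub(r"[^0-9]", "", address)
    # Delete the digits, then trim the whitespace left at the start.
    street = re.sub(r"[0-9]", "", address).strip()
    return number, street

print(split_by_removal("123 Somewhere St"))  # ('123', 'Somewhere St')
```

Note that a street name containing digits, such as "10th Av", would lose them with this approach.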

I'll also point you to this exercise in the FME Desktop training manual.

In short it takes an address like "3305 W 10th Av" and splits it up into "3305" "W 10th Av". It doesn't use regex, instead it uses an AttributeSplitter. It's not a perfect solution (it assumes a maximum of four elements to the address) but it's definitely along the lines of what you are asking for.

I would actually do it with a single StringSearcher with the expression:

^([0-9A-Z]*) ([0-9A-Z ]*)

(note the white space between the two parentheses)

The _match{0}.part would be the building number and the _match{1}.part would be the street.

An AttributeRenamer could rename them to simple attributes. (Note that the AttributeManager does not currently support renaming single elements of a list.)

That would allow for addresses like

350 5th Avenue or 221B Baker Street
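The same two-group expression can be tried in Python as a rough sketch. The function name `split_address` is hypothetical, and `re.IGNORECASE` is an assumption added here because the sample addresses contain lower-case letters; the two capture groups play the role of the two list elements described above.

```python
import re

# Same expression as above; case-insensitive matching is an assumption.
PATTERN = re.compile(r"^([0-9A-Z]*) ([0-9A-Z ]*)", re.IGNORECASE)

def split_address(address):
    """Return (building_number, street), or None if the pattern does not match."""
    m = PATTERN.match(address)
    if m is None:
        return None
    return m.group(1), m.group(2)

for sample in ("350 5th Avenue", "221B Baker Street"):
    print(split_address(sample))
```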

CustAddress contains "123 Somewhere St". With the StringSearcher configured as shown the result in field "_first_match" is "123 Somewhere". How do I get _match{0} and _match{1}?

Click on the advanced tab and enter a name (_match) in the subexpression matches list name field.

Got it, thanks.

It depends on the composition of your addresses.

Something like "strname number letter postalcode" would require

(.*)\s+(\d+\w{1})\s+(\d{4}\w{2})

Then expose the attributes, in my case that would be matched_part{0-3}
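As a quick way to check an expression like this before wiring it into a workspace, here is a small Python sketch. The sample address and the function name `split_street_first` are made up for illustration; the pattern is the one from this post.

```python
import re

# Street name, house number plus one trailing word character, then a postal
# code of four digits and two word characters.
PATTERN = re.compile(r"(.*)\s+(\d+\w{1})\s+(\d{4}\w{2})")

def split_street_first(address):
    """Return (street, number, postal_code), or None if there is no match."""
    m = PATTERN.match(address)
    return m.groups() if m else None

# Hypothetical sample address in "strname number letter postalcode" form.
print(split_street_first("Hoofdstraat 12a 1234AB"))
```

Because `\w{1}` is required after `\d+`, a single-digit house number with no letter suffix will not match; `\w?` would be one way to make the suffix optional.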

I recommend searching for sites on regular expressions.

There are good, complete tutorials on the net.

Check out the one by Jan Goyvaerts (RegexBuddy).

(It used to be free, but no longer is.)

Upon further reflection I would change the regex to

^([0-9A-Z]*) ([0-9A-Z -.]*)

For example:

124 Blvd. Saint-Germain
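One detail worth knowing about this expression: inside a character class, ` -.` is read as a range running from the space character to the period, which happens to include the hyphen along with other punctuation such as `#` and `'`. The Python sketch below compares it with a variant that puts the hyphen last so that it is taken literally. The names `ORIGINAL` and `EXPLICIT` and the `re.IGNORECASE` flag are assumptions made for this illustration.

```python
import re

# As posted: " -." is a range from space (0x20) to period (0x2E).
ORIGINAL = re.compile(r"^([0-9A-Z]*) ([0-9A-Z -.]*)", re.IGNORECASE)
# Hyphen moved to the end, so only space, period and hyphen are added.
EXPLICIT = re.compile(r"^([0-9A-Z]*) ([0-9A-Z .-]*)", re.IGNORECASE)

address = "124 Blvd. Saint-Germain"
print(ORIGINAL.match(address).groups())  # ('124', 'Blvd. Saint-Germain')
print(EXPLICIT.match(address).groups())  # ('124', 'Blvd. Saint-Germain')

# The range form also accepts '#', the explicit form stops before it.
print(ORIGINAL.match("12 Main #4").group(2))  # 'Main #4'
print(EXPLICIT.match("12 Main #4").group(2))  # 'Main '
```

Either form handles the example address; which one is preferable depends on how much punctuation the street names are expected to contain.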

This thread is very old, but if you want a single group of digits in a simple attribute, try [0-9]+. The plus makes a difference in the result.
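The difference between `*` and `+` can be seen in a short Python sketch. This only illustrates general regex semantics with a made-up input string; how a particular FME transformer reports an empty match may differ.

```python
import re

text = "Somewhere 123"

# '*' allows zero digits, so the search succeeds immediately with an
# empty match at position 0.
star = re.search(r"[0-9]*", text)
# '+' requires at least one digit, so the search moves on to "123".
plus = re.search(r"[0-9]+", text)

print(repr(star.group()))  # ''
print(repr(plus.group()))  # '123'
```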
