Solved

How to extract numbers and characters from a string?



I have an attribute that contains a string representing a full street address (123 Somewhere St). I want to extract the number (123) and pass it to a different attribute (CustomerAddressNumber). I also need to extract all the characters (Somewhere St) and pass them to another attribute (CustomerAddressStreet). I'm sure this is simple; I just can't seem to get it straight at the moment.


12 replies

erik_jan
Contributor
  • Best Answer
  • July 14, 2016

Using regular expressions in the StringReplacer can do this for you. I would copy the original attribute into two new attributes and then replace the unwanted characters with nothing.

To strip the numeric values (leaving the street) the expression would be [0-9]*; to strip the letters (leaving the number) it would be [a-zA-Z]*.

I am sure other regular expressions are possible.
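A minimal sketch of the two-copy approach, using Python's `re.sub` as a stand-in for the StringReplacer (an assumption: FME's regex engine is not Python's, though simple character classes like these should behave the same). It uses `+` instead of `*` so only non-empty runs are matched, which gives the same result here; the `.strip()` calls are an addition to trim the spaces each replacement leaves behind.

```python
import re

address = "123 Somewhere St"  # sample value from the question

# First copy: replace every run of letters with nothing, then trim -> the number.
number = re.sub(r"[a-zA-Z]+", "", address).strip()

# Second copy: replace every run of digits with nothing, then trim -> the street.
street = re.sub(r"[0-9]+", "", address).strip()

print(number)  # 123
print(street)  # Somewhere St
```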


jim
  • Author
  • July 14, 2016

This works well, but it leads to one additional question. How would I remove the single space left at the beginning of the street name (" Somewhere St")? I tried adding |\s to the expression ([0-9]*|\s), but then I get "SomewhereSt": it removes all spaces. I'm not very familiar with regular expressions.


jim
  • Author
  • July 14, 2016

Got it: (^\s|[0-9]*). Thanks for the help.
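The reason ([0-9]*|\s) removed every space is that \s, as a separate alternative, matches any whitespace anywhere in the string. Whether ^\s also catches the space after the number depends on how the engine resumes after each match; in Python's `re`, for example, `^` only matches at the very start of the string, so it would not. A more portable sketch (Python, as an assumption, not FME itself) removes the leading digits and the whitespace after them in a single match:

```python
import re

address = "123 Somewhere St"

# In Python, ^ matches only at position 0 (no MULTILINE flag), so ^\s never
# sees the space that follows "123" and that space survives.
print(repr(re.sub(r"^\s|[0-9]*", "", address)))  # ' Somewhere St'

# Remove the leading digits and any whitespace after them in one match.
street = re.sub(r"^[0-9]+\s*", "", address)
print(street)  # Somewhere St
```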



I'll also point you to this exercise in the FME Desktop training manual.

In short, it takes an address like "3305 W 10th Av" and splits it into "3305" and "W 10th Av". It doesn't use regex; instead it uses an AttributeSplitter. It's not a perfect solution (it assumes a maximum of four elements in the address), but it's definitely along the lines of what you are asking for.
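For comparison, a Python sketch of the split idea (not the AttributeSplitter itself, and not the training-manual exercise): splitting only at the first space with `str.split(" ", 1)` keeps the whole street name together however many words it has.

```python
address = "3305 W 10th Av"  # example address from the training exercise

# Split once, at the first space: [house number, rest of the address].
number, street = address.split(" ", 1)

print(number)  # 3305
print(street)  # W 10th Av
```

An address with no space at all would make the two-name unpacking raise a ValueError, so real data needs a guard.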


jdh
Contributor
  • July 14, 2016

I would actually do it with a single StringSearcher with the expression:

^([0-9A-Z]*) ([0-9A-Z ]*)

(Note the space between the two parenthesized groups.)

The _match{0}.part would be the building number and the _match{1}.part would be the street.

An AttributeRenamer could rename them to simple attributes. (Note that the AttributeManager does not currently support renaming single elements of a list.)

That would allow for addresses like

350 5th Avenue or 221B Baker Street
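The same pattern can be tried outside FME with Python's `re` module, as a sketch: `m.group(1)` and `m.group(2)` play the role of `_match{0}.part` and `_match{1}.part`. The character classes list only uppercase `A-Z`, so this sketch assumes case-insensitive matching (`re.IGNORECASE`); the output keys reuse the attribute names from the question, and `split_address` is a hypothetical helper.

```python
import re

# Same expression as above; IGNORECASE is an assumption (see lead-in).
PATTERN = re.compile(r"^([0-9A-Z]*) ([0-9A-Z ]*)", re.IGNORECASE)

def split_address(address):
    """Return the number and street parts, or None if the pattern fails."""
    m = PATTERN.match(address)
    if m is None:
        return None
    # group(1) ~ _match{0}.part, group(2) ~ _match{1}.part
    return {
        "CustomerAddressNumber": m.group(1),
        "CustomerAddressStreet": m.group(2),
    }

for a in ["123 Somewhere St", "350 5th Avenue", "221B Baker Street"]:
    print(split_address(a))
```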


jim
  • Author
  • July 15, 2016

CustAddress contains "123 Somewhere St". With the StringSearcher configured as shown, the result in the "_first_match" attribute is "123 Somewhere". How do I get _match{0} and _match{1}?

jdh
Contributor
  • July 15, 2016

Click the Advanced tab and enter a name (_match) as the subexpression matches list name.

jim
  • Author
  • July 15, 2016

Got it, thanks.


gio
Contributor
  • July 15, 2016

It depends on the composition of your addresses.

Something like "streetname number letter postalcode" would require

(.*)\s+(\d+\w{1})\s+(\d{4}\w{2})

Then expose the attributes; in my case that would be matched_part{0-3}.

I recommend searching for sites on regexp.

There are good, thorough tutorials out on the net.
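A Python sketch of that three-part pattern, run on made-up addresses in the "streetname number letter postalcode" layout (the sample values are hypothetical). Note that `\w` also matches digits, so `\d+\w{1}` accepts "12a" but also a plain "12".

```python
import re

PATTERN = re.compile(r"(.*)\s+(\d+\w{1})\s+(\d{4}\w{2})")

for address in ["Kerkstraat 12a 1234AB", "Van Goghlaan 7b 5611AZ"]:
    m = PATTERN.fullmatch(address)
    # groups(): street name, house number with letter, postal code
    print(m.groups())
```

Because `(.*)` is greedy but must leave room for the two whitespace-separated parts after it, multi-word street names such as "Van Goghlaan" stay in the first group.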


gio
Contributor
  • July 15, 2016

Check out the one by Jan Goyvaerts (RegexBuddy).

(It used to be free, now not so.)

jdh
Contributor
  • July 15, 2016

Upon further reflection, I would change the regex to

^([0-9A-Z]*) ([0-9A-Z .-]*)

to allow for addresses such as

124 Blvd. Saint-Germain
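A quick Python check of the extended pattern (case-insensitive matching is again assumed). The hyphen sits last inside the brackets so it stays a literal character; written in the middle, as in " -.", it forms a range from space to period, which also admits characters such as & or (.

```python
import re

# Hyphen last in the class, so it is a literal "-", not a range operator.
PATTERN = re.compile(r"^([0-9A-Z]*) ([0-9A-Z .-]*)", re.IGNORECASE)

m = PATTERN.match("124 Blvd. Saint-Germain")
print(m.group(1))  # 124
print(m.group(2))  # Blvd. Saint-Germain

# With " -." the class is the range space..period, so "&" slips through.
print(bool(re.fullmatch(r"[ -.]", "&")))  # True
print(bool(re.fullmatch(r"[ .-]", "&")))  # False
```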


This thread is very old, but if you want a single group of digits in a simple attribute, try [0-9]+. The plus makes a difference in the result: [0-9]* is also satisfied by an empty match, while [0-9]+ requires at least one digit.
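A small Python illustration of why the plus matters when searching for the first number (the address with the number at the end is a made-up example): `*` is satisfied by an empty match at position 0, while `+` keeps looking until it finds at least one digit.

```python
import re

address = "Somewhere St 123"  # hypothetical: the number is not at the start

print(repr(re.search(r"[0-9]*", address).group()))  # ''
print(repr(re.search(r"[0-9]+", address).group()))  # '123'
```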

