Solved

How to extract numbers and characters from a string?

  • July 14, 2016
  • 12 replies
  • 2516 views

I have an attribute that contains a string representing a full street address (123 Somewhere St). I want to extract the number (123) and pass it to a different attribute (CustomerAddressNumber). I also need to extract the remaining characters (Somewhere St) and pass them to another attribute (CustomerAddressStreet). I'm sure this is simple; I just can't seem to get it straight at the moment.



12 replies

erik_jan
  • Contributor
  • 2179 replies
  • Best Answer
  • July 14, 2016

Using regular expressions in the StringReplacer can do this for you. I would copy the original attribute into two new attributes and then replace the unwanted characters with nothing.

For the numeric values this would be [0-9]*, for the alphabetic values [a-zA-Z]*

I am sure other regular expressions are possible.
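
Outside FME, the same copy-and-strip idea can be sketched in Python. This is only an illustration under a few assumptions: it uses Python's `re.sub` in place of the StringReplacer, the variable names are made up, and the letter class is written as [a-zA-Z] (a comma inside the brackets would be matched literally). Note the spaces left behind in both results.

```python
import re

address = "123 Somewhere St"

# Copy 1: remove the digits, keeping the street part.
street = re.sub(r"[0-9]*", "", address)

# Copy 2: remove the letters, keeping the number part.
number = re.sub(r"[a-zA-Z]*", "", address)

print(repr(street))  # ' Somewhere St'
print(repr(number))  # '123  '
```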


  • Author
  • 15 replies
  • July 14, 2016

This works well, but it leads to one additional question. How would I remove the single space left at the beginning of the street name ( Somewhere St)? I tried adding |\s to the expression ([0-9]*|\s) and got this (SomewhereSt). It removes all spaces. I'm not real familiar with regular expressions.


  • Author
  • 15 replies
  • July 14, 2016

Got it (^\s|[0-9]*). Thanks for the help.
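
For comparison, here is a hedged Python sketch of why an unanchored `\s` alternative strips every space, plus two ways to drop only the leading one. It uses `re.sub` rather than the StringReplacer, whose regex engine may differ in details, and it uses `+` instead of `*` so that empty matches play no part. The variable names are made up for the example.

```python
import re

address = "123 Somewhere St"

# An unanchored \s alternative matches every whitespace character.
all_spaces_gone = re.sub(r"[0-9]+|\s", "", address)

# Option 1: remove the digits, then trim only the leading whitespace.
street_two_steps = re.sub(r"^\s+", "", re.sub(r"[0-9]+", "", address))

# Option 2: remove the leading digits and the whitespace after them at once.
street_one_step = re.sub(r"^[0-9]+\s*", "", address)

print(all_spaces_gone)   # SomewhereSt
print(street_two_steps)  # Somewhere St
print(street_one_step)   # Somewhere St
```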


  • 149 replies
  • July 14, 2016

I'll also point you to this exercise in the FME Desktop training manual.

In short it takes an address like "3305 W 10th Av" and splits it up into "3305" "W 10th Av". It doesn't use regex, instead it uses an AttributeSplitter. It's not a perfect solution (it assumes a maximum of four elements to the address) but it's definitely along the lines of what you are asking for.
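
The splitting idea can also be sketched outside FME. The Python below is an illustration only, not the training exercise itself: it assumes the number is always the first space-separated element, and it splits on the first space only, so the street part can contain any number of words.

```python
address = "3305 W 10th Av"

# Split once, at the first space: number before it, street after it.
number, _, street = address.partition(" ")

print(number)  # 3305
print(street)  # W 10th Av
```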


jdh
  • Contributor
  • 2002 replies
  • July 14, 2016

I would actually do it with a single StringSearcher with the expression:

^([0-9A-Z]*) ([0-9A-Z ]*)

(Note the white space between the two parenthesized groups.)

The _match{0}.part would be the building number and the _match{1}.part would be the street.

An AttributeRenamer could rename them to simple attributes. (Note that the AttributeManager does not currently support renaming single elements of a list.)

That would allow for addresses like 350 5th Avenue or 221B Baker Street.
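
To see what the two capture groups pick up, here is a rough Python equivalent. It is a sketch, not the StringSearcher itself: `split_address` is a made-up helper, case-insensitive matching is an assumption (A-Z alone would not cover lowercase letters), and Python numbers its groups from 1 where the list above starts at 0.

```python
import re

# Assumption: matching is case-insensitive, so A-Z also covers lowercase.
PATTERN = re.compile(r"^([0-9A-Z]*) ([0-9A-Z ]*)", re.IGNORECASE)


def split_address(address):
    """Return (number, street), or None if the pattern does not match."""
    match = PATTERN.match(address)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_address("350 5th Avenue"))     # ('350', '5th Avenue')
print(split_address("221B Baker Street"))  # ('221B', 'Baker Street')
```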


  • Author
  • 15 replies
  • July 15, 2016

CustAddress contains "123 Somewhere St". With the StringSearcher configured as shown the result in field "_first_match" is "123 Somewhere". How do I get _match{0} and _match{1}?

jdh
  • Contributor
  • 2002 replies
  • July 15, 2016

Click on the advanced tab and enter a name (_match) in the subexpression matches list name.

 


  • Author
  • 15 replies
  • July 15, 2016

Got it, thanks.


gio
  • Contributor
  • 2252 replies
  • July 15, 2016

It depends on the composition of your addresses.

Something like "street name, number, letter, postal code" would require

(.*)\s+(\d+\w{1})\s+(\d{4}\w{2})

Then expose the attributes; in my case that would be matched_part{0-3}.

I recommend searching sites on regexp. There are full and good tutorials out on the net.
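
As a quick check of how the three groups of that expression divide up a string, here is a small Python sketch. The sample address is made up for the example, and Python's `re` stands in for the FME regex engine.

```python
import re

pattern = re.compile(r"(.*)\s+(\d+\w{1})\s+(\d{4}\w{2})")

# Hypothetical address in the "street, number+letter, postal code" layout.
match = pattern.match("Hoofdstraat 12a 1234AB")
street, number, postal_code = match.groups()

print(street, number, postal_code, sep=" | ")  # Hoofdstraat | 12a | 1234AB
```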


gio
  • Contributor
  • 2252 replies
  • July 15, 2016

Check out the one by Jan Goyvaerts (RegexBuddy). It used to be free, now not so.

jdh
  • Contributor
  • 2002 replies
  • July 15, 2016

Upon further reflection I would change the regex to

^([0-9A-Z]*) ([0-9A-Z .-]*)

ex.

124 Blvd. Saint-Germain
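
One detail worth knowing when adding punctuation to a character class: a hyphen between two characters is read as a range, while a hyphen placed last is a literal. The Python sketch below illustrates this and then runs the example address through a variant with the hyphen last. Case-insensitive matching is an assumption, and the variable names are made up.

```python
import re

# In "[ -.]" the hyphen forms a range from space to period, which also
# covers characters such as "!" and "#"; placed last, it is a literal.
range_accepts_bang = re.fullmatch(r"[ -.]", "!") is not None
literal_accepts_bang = re.fullmatch(r"[ .-]", "!") is not None

# Assumption: case-insensitive matching, as A-Z alone would miss lowercase.
pattern = re.compile(r"^([0-9A-Z]*) ([0-9A-Z .-]*)", re.IGNORECASE)
number, street = pattern.match("124 Blvd. Saint-Germain").groups()

print(range_accepts_bang, literal_accepts_bang)  # True False
print(number, "|", street)  # 124 | Blvd. Saint-Germain
```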


  • 1 reply
  • January 19, 2018

This issue is very old, but if you want one group of numbers in a simple variable, try [0-9]+. The plus makes a difference in the result.
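
A small Python sketch of that difference, using a made-up input where the number is not at the start (Python's `re.search` stands in for the FME transformer here):

```python
import re

text = "Apt 5"

# "*" allows zero digits, so the first match is empty, at position 0.
star = re.search(r"[0-9]*", text).group()

# "+" needs at least one digit, so the search moves on to the number.
plus = re.search(r"[0-9]+", text).group()

print(repr(star))  # ''
print(repr(plus))  # '5'
```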