Skip to main content
Solved

Having Trouble with String Replacer


Forum|alt.badge.img

Hey all,

I am having trouble with the stringReplacer transormer. I am doing some data clean up in excel before I run the GoogleGeocoder and the stringReplacer is dropping almost everything I need it to, except for the word LIFT at the end of some of these addresses. It has dropped the word TEMP, the word GAR, but for some reason will not drop the word LIFT. I have even tried running the excel sheet through the attribute trimmer beforehand (not sure of the correct parameters) -- still nothing. I have attached some pictures above. Any clues would be much appreciated, thank you!

Best answer by david_r

If your addresses are limited to the 7-bit ASCII character sets, you can try:

LIFT\W*$

Will match the word "LIFT" as the last printable characters on a line. You can also try the following, which should be a bit more forgiving, but might not be as good for weird non-printable characters:

LIFT\s*$
View original
Did this help you find an answer to your question?

23 replies

ebygomm
Influencer
Forum|alt.badge.img+31
  • Influencer
  • October 18, 2016

Are you sure there are no non-printing characters in the field you are looking at?


Forum|alt.badge.img
  • Author
  • October 18, 2016
ebygomm wrote:

Are you sure there are no non-printing characters in the field you are looking at?

 

I feel like there has to be, but shouldnt the attribute trimmer get rid of them? I am not sure If I am running it correctly. When I click on show invisibles on the rubular website there looks to be a dash on the end of each address.

david_r
Evangelist
  • October 18, 2016

The problem is that your regex says that the word "LIFT" needs at least one space after for it to match:

LIFT {1,}

I suspect that there is no space after "LIFT" in your data.


ebygomm
Influencer
Forum|alt.badge.img+31
  • Influencer
  • October 18, 2016
heidtmer wrote:

 

I feel like there has to be, but shouldnt the attribute trimmer get rid of them? I am not sure If I am running it correctly. When I click on show invisibles on the rubular website there looks to be a dash on the end of each address.
The attributetrimmer doesn't appear to work with non printing characters, see https://knowledge.safe.com/questions/34433/removing-blank-spaces.html

 

 


Forum|alt.badge.img
  • Author
  • October 18, 2016
david_r wrote:

The problem is that your regex says that the word "LIFT" needs at least one space after for it to match:

LIFT {1,}

I suspect that there is no space after "LIFT" in your data.

 

but even when i enter the clause as /bLIFT/b it still will not remove it even though it does in the RUBULAR expression test.

Forum|alt.badge.img
  • Author
  • October 18, 2016
david_r wrote:

The problem is that your regex says that the word "LIFT" needs at least one space after for it to match:

LIFT {1,}

I suspect that there is no space after "LIFT" in your data.

 

also I cannot enter like that because I have to account for street names that might have 'LIFT' inside their name.

ebygomm
Influencer
Forum|alt.badge.img+31
  • Influencer
  • October 18, 2016

non-printing-char.zip

Above workbench demonstrates how non printing characters interfere with the string searcher. Data in excel looks identical, however the word LIFT remains in the first because the data contains a non printing character


Forum|alt.badge.img
  • Author
  • October 18, 2016
ebygomm wrote:

non-printing-char.zip

Above workbench demonstrates how non printing characters interfere with the string searcher. Data in excel looks identical, however the word LIFT remains in the first because the data contains a non printing character

 

How did you remove it? I am unable to run that fmw -- probably because I am running FME 2014

ebygomm
Influencer
Forum|alt.badge.img+31
  • Influencer
  • October 18, 2016
heidtmer wrote:

 

also I cannot enter like that because I have to account for street names that might have 'LIFT' inside their name.
Have you tried \\bLIFT\\b

 

 


Forum|alt.badge.img
  • Author
  • October 18, 2016
ebygomm wrote:
Have you tried \\bLIFT\\b

 

 

Yeah, that is what I meant ^^

 

 


david_r
Evangelist
  • October 18, 2016
heidtmer wrote:

 

but even when i enter the clause as /bLIFT/b it still will not remove it even though it does in the RUBULAR expression test.
Regex has some global settings, notably on how it treats newline. When you use Rubular, you're entering all your test records as one string with \n between the records. That is different to how things are handled in FME, where every record is treated separately. If you want to check for the exact same scenario in Rubular, make sure to ONLY type one test string at a time, WITHOUT a newline after e.g.

 

15173 PROMENADE PKY LIFT
Also, note that you should use "\bLIFT\b" (note the backslashes, not forward slashes)

 

 


Forum|alt.badge.img
  • Author
  • October 18, 2016
david_r wrote:
Regex has some global settings, notably on how it treats newline. When you use Rubular, you're entering all your test records as one string with \n between the records. That is different to how things are handled in FME, where every record is treated separately. If you want to check for the exact same scenario in Rubular, make sure to ONLY type one test string at a time, WITHOUT a newline after e.g.

 

15173 PROMENADE PKY LIFT
Also, note that you should use "\bLIFT\b" (note the backslashes, not forward slashes)

 

 

oh ok thanks I did not know that. 

 

 

and yes \bLIFT\b still returns the word LIFT in my output. This is strange because every other clause I have listed in the Text to Match statement is being removed i.e. UNIT, GAR, BLDG. 

 

 


david_r
Evangelist
  • Best Answer
  • October 18, 2016

If your addresses are limited to the 7-bit ASCII character sets, you can try:

LIFT\W*$

Will match the word "LIFT" as the last printable characters on a line. You can also try the following, which should be a bit more forgiving, but might not be as good for weird non-printable characters:

LIFT\s*$

david_r
Evangelist
  • October 18, 2016
heidtmer wrote:
oh ok thanks I did not know that.

 

 

and yes \\bLIFT\\b still returns the word LIFT in my output. This is strange because every other clause I have listed in the Text to Match statement is being removed i.e. UNIT, GAR, BLDG.

 

 

Versions of FME prior to 2016 can be a bit picky if you start doing more complicated regex-stuff. It's become a lot better lately. Upgrading is definitely worth it :-)

Forum|alt.badge.img
  • Author
  • October 18, 2016
david_r wrote:

If your addresses are limited to the 7-bit ASCII character sets, you can try:

LIFT\W*$

Will match the word "LIFT" as the last printable characters on a line. You can also try the following, which should be a bit more forgiving, but might not be as good for weird non-printable characters:

LIFT\s*$
is there something special I need to type when using '$' because now the transaction is failing.

ebygomm
Influencer
Forum|alt.badge.img+31
  • Influencer
  • October 18, 2016
heidtmer wrote:
is there something special I need to type when using '$' because now the transaction is failing.
If you are using FME2014 the regular expression handling isn't nearly as sophisticated

 

 


Forum|alt.badge.img
  • Author
  • October 18, 2016
heidtmer wrote:
is there something special I need to type when using '$' because now the transaction is failing.
*Translation

 

 


david_r
Evangelist
  • October 18, 2016
heidtmer wrote:
is there something special I need to type when using '$' because now the transaction is failing.
No, nothing special, it's just a regular dollar-sign. But as said, the regex engine in FME 2014 can be a bit particular.

 


Forum|alt.badge.img
  • Author
  • October 18, 2016
heidtmer wrote:
*Translation

 

 

Got it! I still had {1,} after LIFT\\W*$ -- will not run properly that way.

 

 

However, LIFT\\W*$ worked and I am only left with street names that have LIFT inside them such as '17 HWY LIFT STATION' or 'WOLPEN CLIFT RD'

 

 

thanks again! @david_r

david_r
Evangelist
  • October 18, 2016
heidtmer wrote:
Got it! I still had {1,} after LIFT\W*$         -- will not run properly that way. 

 

 

However, LIFT\W*$ worked and I am only left with street names that have LIFT inside them such as '17 HWY LIFT STATION' or 'WOLPEN CLIFT RD' 

 

 

thanks again! @david_r
I tested the following in FME 2014:

 

\s+LIFT($|\s)
Result:

 

17 HWY LIFT STATION -> 17 HWY STATION
WOLPEN CLIFT RD -> WOLPEN CLIFT RD
15173 PROMENADE PKY LIFT -> 15173 PROMENADE PKY
Not sure if that's what you want, but hopefully it can give you some ideas.

Forum|alt.badge.img
  • Author
  • October 18, 2016
heidtmer wrote:
Got it! I still had {1,} after LIFT\\W*$ -- will not run properly that way.

 

 

However, LIFT\\W*$ worked and I am only left with street names that have LIFT inside them such as '17 HWY LIFT STATION' or 'WOLPEN CLIFT RD'

 

 

thanks again! @david_r
Awesome! Thank you for taking the time out to help me with this. I just started really using FME after I went to a FME training last week that @erik_jan conducted and I have been using it everyday since. I just wish my company would upgrade to 2016!!!

 


gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • October 18, 2016

wordboundaries in fme (at least in 2015 and 2106 is \\mSOMEWORD\\M.

so it would be \\mLift\\M

Rubular has a better engine and very often something works easily in Rubular while you'll find yourself sweating it in FME.

And you can get the codes for non printables and filter em out as wel using regexp. Handy to create a character set.

Or just replace them with whatever or nothing

[regsub -all {[^[:print:]]} $YourAttribute ""]

lots of tuts out htere..

http://www.regular-expressions.info/nonprint.html


Forum|alt.badge.img
  • Author
  • October 18, 2016
gio wrote:

wordboundaries in fme (at least in 2015 and 2106 is \\mSOMEWORD\\M.

so it would be \\mLift\\M

Rubular has a better engine and very often something works easily in Rubular while you'll find yourself sweating it in FME.

And you can get the codes for non printables and filter em out as wel using regexp. Handy to create a character set.

Or just replace them with whatever or nothing

[regsub -all {[^[:print:]]} $YourAttribute ""]

lots of tuts out htere..

http://www.regular-expressions.info/nonprint.html

 

awesome, thanks!

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings