Solved

Having Trouble with String Replacer


Badge

Hey all,

I am having trouble with the stringReplacer transormer. I am doing some data clean up in excel before I run the GoogleGeocoder and the stringReplacer is dropping almost everything I need it to, except for the word LIFT at the end of some of these addresses. It has dropped the word TEMP, the word GAR, but for some reason will not drop the word LIFT. I have even tried running the excel sheet through the attribute trimmer beforehand (not sure of the correct parameters) -- still nothing. I have attached some pictures above. Any clues would be much appreciated, thank you!

icon

Best answer by david_r 18 October 2016, 16:41

View original

23 replies

Badge +10

Are you sure there are no non-printing characters in the field you are looking at?

Badge

Are you sure there are no non-printing characters in the field you are looking at?

 

I feel like there has to be, but shouldnt the attribute trimmer get rid of them? I am not sure If I am running it correctly. When I click on show invisibles on the rubular website there looks to be a dash on the end of each address.
Userlevel 4

The problem is that your regex says that the word "LIFT" needs at least one space after for it to match:

LIFT {1,}

I suspect that there is no space after "LIFT" in your data.

Badge +10

 

I feel like there has to be, but shouldnt the attribute trimmer get rid of them? I am not sure If I am running it correctly. When I click on show invisibles on the rubular website there looks to be a dash on the end of each address.
The attributetrimmer doesn't appear to work with non printing characters, see https://knowledge.safe.com/questions/34433/removing-blank-spaces.html

 

 

Badge

The problem is that your regex says that the word "LIFT" needs at least one space after for it to match:

LIFT {1,}

I suspect that there is no space after "LIFT" in your data.

 

but even when i enter the clause as /bLIFT/b it still will not remove it even though it does in the RUBULAR expression test.
Badge

The problem is that your regex says that the word "LIFT" needs at least one space after for it to match:

LIFT {1,}

I suspect that there is no space after "LIFT" in your data.

 

also I cannot enter like that because I have to account for street names that might have 'LIFT' inside their name.
Badge +10

non-printing-char.zip

Above workbench demonstrates how non printing characters interfere with the string searcher. Data in excel looks identical, however the word LIFT remains in the first because the data contains a non printing character

Badge

non-printing-char.zip

Above workbench demonstrates how non printing characters interfere with the string searcher. Data in excel looks identical, however the word LIFT remains in the first because the data contains a non printing character

 

How did you remove it? I am unable to run that fmw -- probably because I am running FME 2014
Badge +10

 

also I cannot enter like that because I have to account for street names that might have 'LIFT' inside their name.
Have you tried \\bLIFT\\b

 

 

Badge
Have you tried \\bLIFT\\b

 

 

Yeah, that is what I meant ^^

 

 

Userlevel 4

 

but even when i enter the clause as /bLIFT/b it still will not remove it even though it does in the RUBULAR expression test.
Regex has some global settings, notably on how it treats newline. When you use Rubular, you're entering all your test records as one string with \n between the records. That is different to how things are handled in FME, where every record is treated separately. If you want to check for the exact same scenario in Rubular, make sure to ONLY type one test string at a time, WITHOUT a newline after e.g.

 

15173 PROMENADE PKY LIFT
Also, note that you should use "\bLIFT\b" (note the backslashes, not forward slashes)

 

 

Badge
Regex has some global settings, notably on how it treats newline. When you use Rubular, you're entering all your test records as one string with \n between the records. That is different to how things are handled in FME, where every record is treated separately. If you want to check for the exact same scenario in Rubular, make sure to ONLY type one test string at a time, WITHOUT a newline after e.g.

 

15173 PROMENADE PKY LIFT
Also, note that you should use "\bLIFT\b" (note the backslashes, not forward slashes)

 

 

oh ok thanks I did not know that. 

 

 

and yes \bLIFT\b still returns the word LIFT in my output. This is strange because every other clause I have listed in the Text to Match statement is being removed i.e. UNIT, GAR, BLDG. 

 

 

Userlevel 4

If your addresses are limited to the 7-bit ASCII character sets, you can try:

LIFT\W*$

Will match the word "LIFT" as the last printable characters on a line. You can also try the following, which should be a bit more forgiving, but might not be as good for weird non-printable characters:

LIFT\s*$
Userlevel 4
oh ok thanks I did not know that.

 

 

and yes \\bLIFT\\b still returns the word LIFT in my output. This is strange because every other clause I have listed in the Text to Match statement is being removed i.e. UNIT, GAR, BLDG.

 

 

Versions of FME prior to 2016 can be a bit picky if you start doing more complicated regex-stuff. It's become a lot better lately. Upgrading is definitely worth it :-)
Badge

If your addresses are limited to the 7-bit ASCII character sets, you can try:

LIFT\W*$

Will match the word "LIFT" as the last printable characters on a line. You can also try the following, which should be a bit more forgiving, but might not be as good for weird non-printable characters:

LIFT\s*$
is there something special I need to type when using '$' because now the transaction is failing.
Badge +10
is there something special I need to type when using '$' because now the transaction is failing.
If you are using FME2014 the regular expression handling isn't nearly as sophisticated

 

 

Badge
is there something special I need to type when using '$' because now the transaction is failing.
*Translation

 

 

Userlevel 4
is there something special I need to type when using '$' because now the transaction is failing.
No, nothing special, it's just a regular dollar-sign. But as said, the regex engine in FME 2014 can be a bit particular.

 

Badge
*Translation

 

 

Got it! I still had {1,} after LIFT\\W*$ -- will not run properly that way.

 

 

However, LIFT\\W*$ worked and I am only left with street names that have LIFT inside them such as '17 HWY LIFT STATION' or 'WOLPEN CLIFT RD'

 

 

thanks again! @david_r
Userlevel 4
Got it! I still had {1,} after LIFT\W*$         -- will not run properly that way. 

 

 

However, LIFT\W*$ worked and I am only left with street names that have LIFT inside them such as '17 HWY LIFT STATION' or 'WOLPEN CLIFT RD' 

 

 

thanks again! @david_r
I tested the following in FME 2014:

 

\s+LIFT($|\s)
Result:

 

17 HWY LIFT STATION -> 17 HWY STATION
WOLPEN CLIFT RD -> WOLPEN CLIFT RD
15173 PROMENADE PKY LIFT -> 15173 PROMENADE PKY
Not sure if that's what you want, but hopefully it can give you some ideas.
Badge
Got it! I still had {1,} after LIFT\\W*$ -- will not run properly that way.

 

 

However, LIFT\\W*$ worked and I am only left with street names that have LIFT inside them such as '17 HWY LIFT STATION' or 'WOLPEN CLIFT RD'

 

 

thanks again! @david_r
Awesome! Thank you for taking the time out to help me with this. I just started really using FME after I went to a FME training last week that @erik_jan conducted and I have been using it everyday since. I just wish my company would upgrade to 2016!!!

 

Badge +3

wordboundaries in fme (at least in 2015 and 2106 is \\mSOMEWORD\\M.

so it would be \\mLift\\M

Rubular has a better engine and very often something works easily in Rubular while you'll find yourself sweating it in FME.

And you can get the codes for non printables and filter em out as wel using regexp. Handy to create a character set.

Or just replace them with whatever or nothing

[regsub -all {[^[:print:]]} $YourAttribute ""]

lots of tuts out htere..

http://www.regular-expressions.info/nonprint.html

Badge

wordboundaries in fme (at least in 2015 and 2106 is \\mSOMEWORD\\M.

so it would be \\mLift\\M

Rubular has a better engine and very often something works easily in Rubular while you'll find yourself sweating it in FME.

And you can get the codes for non printables and filter em out as wel using regexp. Handy to create a character set.

Or just replace them with whatever or nothing

[regsub -all {[^[:print:]]} $YourAttribute ""]

lots of tuts out htere..

http://www.regular-expressions.info/nonprint.html

 

awesome, thanks!

Reply