Solved

String Manipulation


Hi Everyone,

 

 

I have been using FME for some time now and am now getting into little details. My question revolves around parsing out filenames and eliminating portions of them to create a new name. This is probably elementary to many of you.

 

 

I have a name like this:   2_038_1558_0001.ecw

 

 

In this case, the number after the last underscore "_" is always an incremented value. What I want is to always eliminate the last underscore and following number and extension and have the following result:

 

 

2_038_1558

 

 

The sequential number portion is not always 4 digits. Sometimes it is 2 or 3 or 5

 

(2_038_1558_0001.ecw,  2_038_1558_023.ecw, 2_038_1558_00012.ecw)

 

 

Any ideas on which transformer to use and/or the code to place within it?

 

 

Thank you for any help .

 

 


15 replies

david_r
Celebrity
  • Best Answer
  • November 19, 2012

Here are two possible strategies. The first is using a StringSearcher with the following regexp:   (.+)_   It basically takes all the characters up until the last underscore and puts them into a result group, by default contained in a list called "_matched_parts". You can then use a ListIndexer to extract "_matched_parts{0}" which would contain the desired part, defined by the parentheses in the regexp.

 

 

 

Since I'm a big fan of Python, here is also a solution that does the same, but without the need for a ListIndexer:

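import fmeobjects
def get_filename_part(feature):
    # Read the source filename, e.g. "2_038_1558_0001.ecw"
    ecw_filename = feature.getAttribute("ecw_filename")
    # Find the right-most underscore; rfind returns -1 if there is none
    last_underscore_pos = ecw_filename.rfind("_")
    if last_underscore_pos == -1:
        return  # no underscore found, leave the feature as it is
    # Keep everything before that underscore, e.g. "2_038_1558"
    first_part = ecw_filename[:last_underscore_pos]
    feature.setAttribute("first_part", first_part)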

The code above creates a new attribute called "first_part" that contains all the characters before the last underscore in the attribute "ecw_filename". You can expose "first_part" in the PythonCaller's "Attributes to expose" to make it visible in your workspace.
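
For anyone who wants to try both strategies outside FME first, here is a minimal standalone sketch in plain Python. It assumes Python's "re" module treats this simple pattern the same way the StringSearcher does, which may not hold for every FME version. The helper names strip_with_regex and strip_with_rfind are made up for illustration and are not part of FME.

```python
import re


def strip_with_regex(filename):
    """Greedy (.+)_ : the capture group runs up to the LAST underscore."""
    match = re.match(r"(.+)_", filename)
    # Without an underscore there is nothing to strip, so return the name as is
    return match.group(1) if match else filename


def strip_with_rfind(filename):
    """Same idea with rfind: cut the string at the right-most underscore."""
    pos = filename.rfind("_")
    return filename[:pos] if pos != -1 else filename


if __name__ == "__main__":
    samples = ("2_038_1558_0001.ecw", "2_038_1558_023.ecw", "2_038_1558_00012.ecw")
    for name in samples:
        print(name, "->", strip_with_regex(name), "|", strip_with_rfind(name))
```

Both helpers give 2_038_1558 for the three sample names, because ".+" is greedy and only gives back characters until the final underscore can match.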

 

 

 

David

sigtill
Supporter
  • Supporter
  • November 19, 2012
Use a StringSearcher with this pattern:

 

([0-9]*_[0-9]*_[0-9]*)

 

 

To test the regexp online (with an easy editor), go here:

 

http://gskinner.com/RegExr/
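
As a rough way to check this pattern without FME, the sketch below runs it through Python's "re" module, assuming the regex behaves the same there. The function name first_three_groups is hypothetical. Note that the pattern expects three runs of digits separated by underscores, so it fits names like the ones in the question but not names containing letters or hyphens.

```python
import re

# The pattern from the post: three digit runs joined by underscores
PATTERN = re.compile(r"([0-9]*_[0-9]*_[0-9]*)")


def first_three_groups(filename):
    """Return the first digits_digits_digits run, or None if there is none."""
    match = PATTERN.search(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("2_038_1558_0001.ecw", "2_038_1558_023.ecw", "ABC_0123.ECW"):
        print(name, "->", first_three_groups(name))
```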

 

 


davideagle
Contributor
  • Contributor
  • November 19, 2012
An alternate approach lets you use only Transformers, but it assumes you always have a filename with the 4 parts you describe. Not as flexible as Regex and Python, but in this case it works.

Use an AttributeSplitter to split the filename on the underscore. Then concatenate the list elements that you want to keep, i.e. 0, 1 and 2, with underscores between them, and then add a constant to the end with the file extension.
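
The same split-and-concatenate logic can be sketched in plain Python to see what the transformers would produce. This is only an illustration under the assumption stated above (exactly four underscore-separated parts); the function name rebuild_name and the default extension argument are made up for the example.

```python
def rebuild_name(filename, extension=".ecw"):
    """Split on "_", keep list elements 0, 1 and 2, then append the extension."""
    parts = filename.split("_")      # e.g. ["2", "038", "1558", "0001.ecw"]
    kept = "_".join(parts[:3])       # elements 0, 1 and 2 joined by underscores
    return kept + extension          # constant file extension added at the end


if __name__ == "__main__":
    for name in ("2_038_1558_0001.ecw", "2_038_1558_023.ecw"):
        print(name, "->", rebuild_name(name))
```

Leave the extension argument empty if only the stem is wanted.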

 

 

 


  • Author
  • November 19, 2012

To all who have answered so far...

Wow. Sincerely. Thank you for such quick responses. However, I have made a classic "user mistake" by not being a bit more specific about the filenames I will be processing. They can be just about any format prior to the last underbar so you could see:

 

 

ABC_0123.ECW,  A-B_C_123.ECW,  001-ABC_DEF_012.ECW

 

 

What I want stripped off is the last underbar in the string along with the digits that follow it (but not the file extension .ECW), leaving me:

 

 

ABC.ECW,  A-B_C.ECW,  001-ABC_DEF.ECW

 

 

I was thinking about possibly reversing the string and then removing everything up to and including the first underbar, reversing it again and adding back the .ECW. I just am not sure how to do that? Does that simplify the solution?
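
For illustration, the reverse-and-strip idea could look like the following plain Python sketch. The function name strip_by_reversing is hypothetical, and this is not an FME transformer recipe. It gives the wanted names, although searching from the right-hand end of the string gets the same result without reversing anything.

```python
import os


def strip_by_reversing(filename):
    """Reverse the stem, drop everything up to the first "_", reverse back."""
    stem, ext = os.path.splitext(filename)        # "A-B_C_123", ".ECW"
    reversed_stem = stem[::-1]                    # "321_C_B-A"
    _digits, sep, rest = reversed_stem.partition("_")
    if not sep:
        return filename                           # no underbar: nothing to strip
    return rest[::-1] + ext                       # "A-B_C" + ".ECW"


if __name__ == "__main__":
    for name in ("ABC_0123.ECW", "A-B_C_123.ECW", "001-ABC_DEF_012.ECW"):
        print(name, "->", strip_by_reversing(name))
```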

 

 

Thanks again! You guys are great!

 

 

david_r
Celebrity
  • November 19, 2012
You're most welcome.

 

 

For what it's worth, both my suggested solutions would still work regardless of the differences in the filenames, as the solutions are both based on splitting the string at the right-most underscore, regardless of what comes before or after.
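
A quick way to convince yourself of this is the plain Python sketch below, which splits at the right-most underscore and then puts the extension back. It is an illustration only; the function name drop_trailing_counter is made up, and it uses rpartition rather than the rfind call from the earlier code.

```python
import os


def drop_trailing_counter(filename):
    """Remove the last "_" and what follows it, but keep the file extension."""
    stem, ext = os.path.splitext(filename)         # "ABC_0123", ".ECW"
    base, sep, _counter = stem.rpartition("_")     # split at right-most "_"
    # rpartition returns empty base and sep when there is no underscore
    return (base if sep else stem) + ext


if __name__ == "__main__":
    for name in ("ABC_0123.ECW", "A-B_C_123.ECW", "001-ABC_DEF_012.ECW"):
        print(name, "->", drop_trailing_counter(name))
```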

 

 

David

  • Author
  • November 19, 2012
Hi David R.

 

 

Yes David, I did notice that. Since then,  I have been attempting to integrate your StringSearcher solution into my .FMW.  I don't quite have it working yet, but I am hopeful.

 

 

Thank you for your help.

 

 

Larry

davideagle
Contributor
  • Contributor
  • November 19, 2012
Given the change, Regex is most likely the best way to go, and David's solution is very elegant. I find www.rubular.com is a useful website to test your Regex against. David's approach might look something like this, and it's pretty flexible as it allows you to keep the first part up to the last underscore and then throw away what you don't need. To complete the solution, just suffix with your file extension again.

 

 

 


takashi
Influencer
  • April 16, 2017

The StringReplacer with these settings could also be a solution in the current version of FME, which uses a better regex engine than before.

  • Mode: Replace Regular Expression
  • Case Sensitive: No
  • Text To Replace: _\d+(?=\.ecw$)
  • Replacement Text: <not set>
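
To see what these settings do, here is a minimal plain Python sketch of the same replacement. It assumes Python's "re" module handles the lookahead the same way as the regex engine in FME; the function name remove_counter is made up for the example. The pattern is written as a Python raw string, and re.IGNORECASE stands in for "Case Sensitive: No".

```python
import re

# "_" plus digits, but only when directly followed by ".ecw" at the end.
# The lookahead (?=...) checks for the extension without consuming it.
PATTERN = re.compile(r"_\d+(?=\.ecw$)", re.IGNORECASE)


def remove_counter(filename):
    """Replace the trailing "_<digits>" with nothing, keeping the extension."""
    return PATTERN.sub("", filename)


if __name__ == "__main__":
    for name in ("2_038_1558_0001.ecw", "ABC_0123.ECW", "001-ABC_DEF_012.ECW"):
        print(name, "->", remove_counter(name))
```

Because the extension is only looked at and never matched, it survives the replacement with its original upper or lower case.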

gio
Contributor
  • Contributor
  • April 19, 2017

lol Takashi, you reactivated an old one...2012!!

But, okay..here..

use stringreplacer

search for

(.+[-_]+.+)\_\d+.(ecw)

replace with

\1.\2


ok, now it's 2018, can I just do this in the AttributeManager with the text editor?


takashi
Influencer
  • June 6, 2018
tpatterson1996 wrote:

ok, now it's 2018,  can I just do this in the AttributeManager with the text editor?

This expression, set with the Text Editor, works not only in 2018 but also in older versions.

 

@ReplaceRegEx(@Value(ecw_filename),"^(.+)_\d+\.ecw$","\1")
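
The regex in that expression can be tried in plain Python as well. The sketch below is only an approximation of @ReplaceRegEx using re.sub, and the function name replace_regex and its ignore_case parameter are hypothetical. In this Python version the match is case-sensitive unless ignore_case is set, so an upper-case .ECW name is returned unchanged by default.

```python
import re


def replace_regex(filename, ignore_case=False):
    """Keep only capture group 1: everything before the final "_<digits>.ecw"."""
    flags = re.IGNORECASE if ignore_case else 0
    # ^ and $ anchor the pattern to the whole name; \1 is the captured stem
    return re.sub(r"^(.+)_\d+\.ecw$", r"\1", filename, flags=flags)


if __name__ == "__main__":
    print(replace_regex("2_038_1558_0001.ecw"))
    print(replace_regex("A-B_C_123.ECW", ignore_case=True))
```

Unlike the StringReplacer settings earlier in the thread, this pattern also consumes the extension, so the result is the bare stem.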

 

