Solved

String Manipulation

12 years ago
November 19, 2012
15 replies
198 views

giscos
8 replies

Hi Everyone,

I have been using FME for some time now and am now getting into little details. My question revolves around parsing out filenames and eliminating portions of them to create a new name. This is probably elementary to many of you.

I have a name like this: 2_038_1558_0001.ecw

In this case, the number after the last underscore "_" is always an incremented value. What I want is to always eliminate the last underscore and following number and extension and have the following result:

2_038_1558

The sequential number portion is not always 4 digits. Sometimes it is 2 or 3 or 5

(2_038_1558_0001.ecw, 2_038_1558_023.ecw, 2_038_1558_00012.ecw)

Any ideas on which transformer to use and/or the code to place within it?

Thank you for any help.

Best answer by david_r

Here are two possible strategies. The first is to use a StringSearcher with the following regexp: (.+)_ It takes all the characters up to the last underscore and puts them into a capture group, which by default is stored in a list called "_matched_parts". You can then use a ListIndexer to extract "_matched_parts{0}", which contains the desired part, as defined by the parentheses in the regexp.

Since I'm a big fan of Python, here is also a solution that does the same, but without the need for a ListIndexer:

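import fmeobjects

def get_filename_part(feature):
    # Read the filename from the incoming feature; treat a missing attribute as empty.
    ecw_filename = feature.getAttribute("ecw_filename") or ""
    # rfind returns -1 when there is no underscore; keep the whole name in that case.
    last_underscore_pos = ecw_filename.rfind("_")
    if last_underscore_pos == -1:
        last_underscore_pos = len(ecw_filename)
    # Keep everything before the last underscore.
    first_part = ecw_filename[:last_underscore_pos]
    feature.setAttribute("first_part", first_part)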

The code above creates a new attribute called "first_part" that contains all the characters before the last underscore in the attribute "ecw_filename". You can expose "first_part" in the PythonCaller's "Attributes to expose" to make it visible in your workspace.

David
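For anyone who wants to check the greedy behaviour of the (.+)_ pattern outside FME, here is a minimal sketch using Python's re module. The function name and the sample list are only illustrative, and it assumes Python's regex engine treats this simple pattern the same way the StringSearcher does.

```python
import re

def part_before_last_underscore(filename):
    """Return everything before the last underscore, or the input unchanged if there is none."""
    # ".+" is greedy, so it grabs as much as possible and backtracks
    # only as far as the LAST underscore in the string.
    match = re.match(r"(.+)_", filename)
    return match.group(1) if match else filename

for name in ["2_038_1558_0001.ecw", "2_038_1558_023.ecw", "2_038_1558_00012.ecw"]:
    print(name, "->", part_before_last_underscore(name))
```

All three sample names from the question should come out as 2_038_1558, regardless of how many digits the counter has.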

sigtill
Supporter
956 replies
12 years ago
November 19, 2012

Use a StringSearcher with this pattern:

([0-9]*_[0-9]*_[0-9]*)

To test the regexp online (with an easy-to-use editor), go here:

http://gskinner.com/RegExr/

The real SigTill


davideagle
Contributor
578 replies
12 years ago
November 19, 2012

An alternative approach lets you use just transformers, but it assumes your filenames always have the 4 parts you describe. It is not as flexible as regex or Python, but in this case it works.

Use an AttributeSplitter to split the filename on the underscore. Then concatenate the list elements that you want to keep, i.e. 0, 1 and 2, with underscores between them, and then add a constant to the end with the file extension.
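The split-and-concatenate idea can be sketched in Python as follows. This is only an illustration of the logic, not what the AttributeSplitter does internally; the function name, the keep count and the ".ecw" default are assumptions made for the example.

```python
def rebuild_from_parts(filename, keep=3, extension=".ecw"):
    """Split on underscores, keep the first `keep` parts, and append a fixed extension."""
    parts = filename.split("_")  # e.g. ["2", "038", "1558", "0001.ecw"]
    return "_".join(parts[:keep]) + extension

print(rebuild_from_parts("2_038_1558_0001.ecw"))  # 2_038_1558.ecw
```

As with the transformer version, this only works while every filename has the same number of underscore-separated parts.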

giscos
Author
8 replies
12 years ago
November 19, 2012

To all who have answered so far...

Wow. Sincerely. Thank you for such quick responses. However, I have made a classic "user mistake" by not being a bit more specific about the filenames I will be processing. They can be just about any format prior to the last underbar so you could see:

ABC_0123.ECW, A-B_C_123.ECW, 001-ABC_DEF_012.ECW

However, what I want stripped off is the last underbar in the string along with the digits that follow it (but not the file extension .ECW), leaving me:

ABC.ECW, A-B_C.ECW, 001-ABC_DEF.ECW

I was thinking about possibly reversing the string, removing everything up to and including the first underbar, reversing it again and adding back the .ECW. I am just not sure how to do that. Does that simplify the solution?

Thanks again! You guys are great!

david_r
8355 replies
12 years ago
November 19, 2012

You're most welcome.

For what it's worth, both my suggested solutions would still work regardless of the differences in the filenames, as the solutions are both based on splitting the string at the right-most underscore, regardless of what comes before or after.

David
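To illustrate the point about splitting at the right-most underscore, here is a small sketch that also puts the extension back, as requested in the follow-up. It is plain Python rather than a PythonCaller script, and the function name is made up for the example.

```python
import os

def drop_trailing_counter(filename):
    """Remove the last underscore and whatever follows it, keeping the file extension."""
    stem, extension = os.path.splitext(filename)   # "ABC_0123", ".ECW"
    head, separator, _counter = stem.rpartition("_")
    if not separator:
        # No underscore at all: leave the name untouched.
        return filename
    return head + extension

for name in ["ABC_0123.ECW", "A-B_C_123.ECW", "001-ABC_DEF_012.ECW"]:
    print(name, "->", drop_trailing_counter(name))
```

Because rpartition works from the right, there is no need to reverse the string first.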

giscos
Author
8 replies
12 years ago
November 19, 2012

Hi David R.

Yes David, I did notice that. Since then, I have been attempting to integrate your StringSearcher solution into my .FMW. I don't quite have it working yet, but I am hopeful.

Thank you for your help.

Larry

davideagle
Contributor
578 replies
12 years ago
November 19, 2012

Given the change, regex is most likely the best way to go, and David's solution is very elegant. I find www.rubular.com a useful website for testing your regex. David's approach might look something like this, and it's pretty flexible, as it allows you to keep the first part up to the last underscore and throw away what you don't need. To complete the solution, just suffix the result with your file extension again.

takashi
7725 replies
8 years ago
April 16, 2017

The StringReplacer with these settings could also be a solution in the current version of FME, which uses a better regex engine than before.

Mode: Replace Regular Expression
Case Sensitive: No
Text To Replace: _\d+(?=\.ecw$)
Replacement Text: <not set>
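
The same lookahead pattern can be tried outside FME with Python's re module. This is a sketch under the assumption that Python's engine handles the lookahead the same way the StringReplacer does; re.IGNORECASE stands in for "Case Sensitive: No", and the function name is hypothetical.

```python
import re

def strip_counter_keep_extension(filename):
    """Delete "_<digits>" only when it sits directly before a trailing .ecw extension."""
    # The lookahead (?=\.ecw$) checks for the extension without consuming it,
    # so the extension survives the replacement.
    return re.sub(r"_\d+(?=\.ecw$)", "", filename, flags=re.IGNORECASE)

print(strip_counter_keep_extension("2_038_1558_0001.ecw"))  # 2_038_1558.ecw
print(strip_counter_keep_extension("A-B_C_123.ECW"))        # A-B_C.ECW
```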

gio
Contributor
2252 replies
8 years ago
April 19, 2017

lol Takashi, you reactivated an old one... 2012!!

But, okay..here..

use stringreplacer

search for

(.+[-_]+.+)\_\d+.(ecw)

replace with

\1\2

tpatterson1996
1 reply
7 years ago
June 6, 2018

ok, now it's 2018, can I just do this in the AttributeManager with the text editor?

takashi
7725 replies
7 years ago
June 6, 2018

tpatterson1996 wrote:

ok, now it's 2018, can I just do this in the AttributeManager with the text editor?

This expression, set with the Text Editor, works not only in 2018 but also in older versions.

@ReplaceRegEx(@Value(ecw_filename),"^(.+)_\d+\.ecw$","\1")
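
For comparison, a rough Python equivalent of that expression is sketched below. It only mirrors the regex logic, not FME's function itself, and the function name is invented for the example. Note that Python's re is case sensitive unless told otherwise, so an upper-case .ECW name is left unchanged here.

```python
import re

def name_without_counter_and_extension(filename):
    """Keep only the part before the final "_<digits>.ecw"; return the input unchanged on no match."""
    # Group 1 captures everything up to the last underscore that is
    # followed by digits and the .ecw extension.
    return re.sub(r"^(.+)_\d+\.ecw$", r"\1", filename)

print(name_without_counter_and_extension("2_038_1558_0001.ecw"))  # 2_038_1558
```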
