
How to extract certain parts from a path name?


lifalin2016
Contributor

Hi,

I'm looking for the best practice for extracting parts of a path name.

If I have the path: S:\Miljø og teknik\Svendborg Vand\Anlæg vand\Microstation\Dokgraf dokumenter\VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf

I need to extract the part starting with "VandGraf", i.e. "VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf", into one attribute, and the file name into another.

What is the simplest way to accomplish this (without custom Python coding)?

Cheers.

Best answer by ebygomm


ebygomm
Influencer
  • Influencer
  • Best Answer
  • August 22, 2017

A StringSearcher with a regex, possibly, depending on the exact rules you want to follow. For example, do you always need everything starting from VandGraf, or everything at that level, which might sometimes be in a folder not called VandGraf?

If the latter, taking the substring after the 6th backslash might be more straightforward, e.g.
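Outside FME, the "everything after the 6th backslash" rule can be sketched in plain Python. This is a minimal sketch assuming Windows-style backslash separators and that the wanted folder always sits six levels deep; `after_nth_backslash` is a hypothetical helper, not an FME function.

```python
def after_nth_backslash(path: str, n: int = 6):
    """Return the part of a Windows path after the n-th backslash, or None."""
    # Split at most n times: the last element is everything after the n-th separator
    parts = path.split("\\", n)
    if len(parts) <= n:
        return None  # the path has fewer than n backslashes
    return parts[n]


path = r"S:\Miljø og teknik\Svendborg Vand\Anlæg vand\Microstation\Dokgraf dokumenter\VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf"
print(after_nth_backslash(path))  # VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf
```

This ignores folder names entirely, so it also works when the folder at that level isn't called VandGraf, but it breaks as soon as the depth changes.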


mark_f
  • August 22, 2017

Also have a look at the FilenamePartExtractor transformer.
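For the file-name half of the question, the "standard" parts (directory, file name, extension) are what the FilenamePartExtractor is about. A rough plain-Python analogue, assuming Windows paths, uses the standard-library `ntpath` module, which applies Windows path rules on any OS. The helper name and the returned tuple are illustrative only, not the transformer's actual output attributes.

```python
import ntpath  # Windows path rules, usable on any operating system


def file_parts(path: str):
    """Split a Windows path into (directory, file name, root name, extension)."""
    directory, filename = ntpath.split(path)
    rootname, extension = ntpath.splitext(filename)
    return directory, filename, rootname, extension


path = r"S:\Miljø og teknik\Svendborg Vand\Anlæg vand\Microstation\Dokgraf dokumenter\VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf"
print(file_parts(path)[1])  # Arbejdsrapport.pdf
```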


robert_punt
Contributor
  • Contributor
  • August 22, 2017

You can also use an AttributeSplitter, splitting on '\', but the part you need has to be at the same position in the resulting list.
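The AttributeSplitter idea corresponds to splitting on the backslash and picking list items by position. A minimal Python sketch, assuming (as the caveat says) that the wanted folder is always at the same zero-based list position, here 6 for the example path; `split_fixed_position` is a hypothetical name. The file name, at least, can always be taken as the last list item, whatever the depth.

```python
def split_fixed_position(path: str, start: int = 6):
    """Emulate splitting on a backslash and picking list items by position."""
    # ['S:', 'Miljø og teknik', ..., 'VandGraf', 'vand_knude', 'ULB00004', 'Arbejdsrapport.pdf']
    parts = path.split("\\")
    # Everything from list item `start` onwards, re-joined with backslashes;
    # an empty result (path too short) becomes None
    relative = "\\".join(parts[start:]) or None
    # The file name is always the last list item
    filename = parts[-1]
    return relative, filename


path = r"S:\Miljø og teknik\Svendborg Vand\Anlæg vand\Microstation\Dokgraf dokumenter\VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf"
rel, name = split_fixed_position(path)
print(rel)   # VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf
print(name)  # Arbejdsrapport.pdf
```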


lifalin2016
Contributor
  • Author
  • Contributor
  • September 1, 2017
ebygomm wrote:

A StringSearcher with a regex, possibly, depending on the exact rules you want to follow. For example, do you always need everything starting from VandGraf, or everything at that level, which might sometimes be in a folder not called VandGraf?

If the latter, taking the substring after the 6th backslash might be more straightforward, e.g.

Actually, I wanted the former: everything starting with "Vandgraf", or NULL if not found.


lifalin2016
Contributor
  • Author
  • Contributor
  • September 1, 2017
robert_punt wrote:

You can also use an AttributeSplitter, splitting on '\', but the part you need has to be at the same position in the resulting list.

Alas, I can't be sure that the paths are that well-structured.


lifalin2016
Contributor
  • Author
  • Contributor
  • September 1, 2017
mark_f wrote:

Also have a look at the FilenamePartExtractor transformer.

I did, but it just gives me the "standard" parts of a path name, not a custom and optional part.


lifalin2016
Contributor
  • Author
  • Contributor
  • September 1, 2017

I ended up solving it with a PythonCaller:

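def processFeature(ft):
    # Full document path, e.g. S:\...\VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf
    doc_name = ft.getAttribute("Documentname")
    if doc_name is not None:
        try:
            # Case-insensitive search; index() raises ValueError if not found
            p = doc_name.upper().index("VANDGRAF")
            # p >= 0 here, and a match at position 0 is valid too
            # (the old "if p > 0" check skipped it); str works in Python 3
            ft.setAttribute("RelDocName", doc_name[p:])
        except ValueError:
            # "VandGraf" is not in the path: leave RelDocName unset
            pass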


lifalin2016
Contributor
  • Author
  • Contributor
  • September 1, 2017

The try-except was necessary, as str.index() raises a ValueError if the substring isn't found.
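If you would rather avoid the try-except, `str.find()` returns -1 instead of raising. A minimal sketch of that variant as a plain function (the name `relative_doc_name` is hypothetical); it keeps a match at position 0 and returns None when "VandGraf" is absent, so inside a PythonCaller you would only set RelDocName when the result isn't None.

```python
def relative_doc_name(doc_name, marker="VandGraf"):
    """Return doc_name from the first case-insensitive match of marker, else None."""
    if doc_name is None:
        return None
    # str.find() returns -1 when not found, unlike str.index(), which raises ValueError
    p = doc_name.upper().find(marker.upper())
    return doc_name[p:] if p >= 0 else None


print(relative_doc_name(r"C:\Temp\report.pdf"))  # None
```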


ebygomm
Influencer
  • Influencer
  • September 1, 2017
lifalin2016 wrote:
Actually, I wanted the former: everything starting with "Vandgraf", or NULL if not found.

Even easier, then: use the regex VandGraf.+ in the StringSearcher.
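To try the pattern outside FME, Python's `re` module treats this simple regex the same way: "VandGraf" followed by at least one more character, up to the end of the string. A minimal sketch; the function name is hypothetical, and `re.IGNORECASE` is an assumption added so spellings like "Vandgraf" also match. Returning None stands in for the "NULL if not found" requirement.

```python
import re

# "VandGraf" plus everything after it; case-insensitive matching is an assumption
PATTERN = re.compile(r"VandGraf.+", re.IGNORECASE)


def match_from_vandgraf(path: str):
    """Return the substring starting at the first 'VandGraf', or None."""
    m = PATTERN.search(path)
    return m.group(0) if m else None


path = r"S:\Miljø og teknik\Svendborg Vand\Anlæg vand\Microstation\Dokgraf dokumenter\VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf"
print(match_from_vandgraf(path))  # VandGraf\vand_knude\ULB00004\Arbejdsrapport.pdf
```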
