Question

regex to find a part of a string



Hello, I would like to find the following part in a text file (originally an XML file):

<userData code="viPartListRailML" value="1">
            <partRailML s="0.0000000000000000e+00" id="0"/>
            <partRailML s="2.0000000000000000e+01" id="1"/>
            <partRailML s="1.2000000000000000e+02" id="2"/>
            <partRailML s="2.2000000000000000e+02" id="3"/>
            <partRailML s="3.2000000000000000e+02" id="4"/>
            <partRailML s="4.2000000000000000e+02" id="5"/>
            <partRailML s="5.2000000000000000e+02" id="6"/>
            <partRailML s="6.2000000000000000e+02" id="7"/>
            <partRailML s="7.2000000000000000e+02" id="8"/>
            <partRailML s="8.2000000000000000e+02" id="9"/>
            <partRailML s="9.2000000000000000e+02" id="10"/>
        </userData>

 

Can you suggest a regex that returns this?

Basically, anything between <userData and </userData>.

6 replies

david_r
Celebrity
  • March 11, 2020

Why use a regex, specifically, and not the XML transformers in FME?


bwn
Evangelist
  • March 11, 2020

^<userData\s+(.*)<\/userData>$

Depending on the transformer, return the middle string by referencing the first capture group, i.e. ask FME to return \1.

You didn't request it, but the \s+ strips out the whitespace that follows <userData. If you want it in the result, remove this part of the regex.
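To make the behaviour of this pattern concrete, here is a minimal sketch in Python rather than FME (an assumption for illustration; FME's regex flavour and options may differ). It assumes the string starts exactly at <userData and ends at </userData>, and it uses a shortened copy of the posted sample. Because the block spans several lines, the sketch turns on the "dot matches newline" option; without it, (.*) stops at the first line break and nothing matches.

```python
import re

# Shortened copy of the sample from the question.
text = '''<userData code="viPartListRailML" value="1">
            <partRailML s="0.0000000000000000e+00" id="0"/>
            <partRailML s="2.0000000000000000e+01" id="1"/>
        </userData>'''

# Same pattern as above. re.DOTALL lets "." also match newlines.
pattern = re.compile(r'^<userData\s+(.*)</userData>$', re.DOTALL)

match = pattern.search(text)
inner = match.group(1) if match else None  # equivalent of asking for \1

# Without DOTALL the same pattern finds nothing in a multi-line block.
no_dotall = re.search(r'^<userData\s+(.*)</userData>$', text)
```

Here `inner` holds everything after the whitespace following <userData, up to (not including) the closing tag, and `no_dotall` is `None`.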


arnold_bijlsma
Enthusiast

I agree with @david_r here: If you have an XML file, try tackling it with the XML Reader(s) and Transformers, before resorting to 'external' tools like RegEx and Python. RegEx is very powerful, but also known to take up a lot of memory, in particular when you're using a catch-all like .* as suggested by @bwn
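As a rough illustration of the parser-based route, here is a sketch using Python's standard-library XML parser. This is not FME's XML Reader or its transformers, only the same idea in a language that is easy to run. The <root> wrapper element is hypothetical, since the full file was not posted.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <root>; the inner block is a shortened copy of the sample.
xml_text = '''<root>
    <userData code="viPartListRailML" value="1">
        <partRailML s="0.0000000000000000e+00" id="0"/>
        <partRailML s="2.0000000000000000e+01" id="1"/>
    </userData>
</root>'''

root = ET.fromstring(xml_text)

# Collect (id, s) pairs from every userData element with the wanted code.
parts = [
    (part.get("id"), float(part.get("s")))
    for user_data in root.iter("userData")
    if user_data.get("code") == "viPartListRailML"
    for part in user_data.iter("partRailML")
]
```

A parser does not care about indentation, trailing spaces or neighbouring tags, which is the main reason to prefer it over a regex for XML input.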


david_r
Celebrity
  • March 11, 2020
bwn wrote:

^<userData\s+(.*)<\/userData>$

Depending on the transformer, return the middle string by referencing the first capture group, i.e. ask FME to return \1.

You didn't request it, but the \s+ strips out the whitespace that follows <userData. If you want it in the result, remove this part of the regex.

Will the regex still work if the "<userData>" block has been indented? Or if there are trailing spaces or other tags directly after the closing "</userData>" tag?



@gylona if you are able to include the full XML file then someone on the community might be able to illustrate how to use XMLFlattener or XMLFragmenter to extract what you need.


bwn
Evangelist
  • March 11, 2020
david_r wrote:

Will the regex still work if the "<userData>" block has been indented? Or if there are trailing spaces or other tags directly after the closing "</userData>" tag?

Nope, it only works for examples of strings following a similar template to the one posted and presumes the original XML values have been extracted into this form.
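The limitation can be seen in a small sketch, again in Python as an assumption for illustration (FME's regex engine may behave differently). The surrounding <root> and <other/> elements are hypothetical. The anchored pattern fails as soon as the block is indented or followed by other content, while an unanchored, non-greedy variant (a different pattern, not the one posted above) still finds it.

```python
import re

# Hypothetical document: the block is indented and followed by another tag.
document = '''<root>
    <userData code="viPartListRailML" value="1">
        <partRailML s="0.0000000000000000e+00" id="0"/>
    </userData>
    <other/>
</root>'''

# Pattern from the earlier reply: anchored to the start and end of the string.
anchored = re.compile(r'^<userData\s+(.*)</userData>$', re.DOTALL)

# Alternative: no anchors, and .*? stops at the first closing tag.
relaxed = re.compile(r'<userData\b.*?</userData>', re.DOTALL)

anchored_hit = anchored.search(document)  # None: string does not start with <userData
relaxed_hit = relaxed.search(document)
block = relaxed_hit.group(0) if relaxed_hit else None
```

Even the relaxed version would break on nested <userData> elements, which a regex cannot pair up reliably.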

