
Hello, I would like to find the following part in a text file (originally an XML file):

<userData code="viPartListRailML" value="1">
            <partRailML s="0.0000000000000000e+00" id="0"/>
            <partRailML s="2.0000000000000000e+01" id="1"/>
            <partRailML s="1.2000000000000000e+02" id="2"/>
            <partRailML s="2.2000000000000000e+02" id="3"/>
            <partRailML s="3.2000000000000000e+02" id="4"/>
            <partRailML s="4.2000000000000000e+02" id="5"/>
            <partRailML s="5.2000000000000000e+02" id="6"/>
            <partRailML s="6.2000000000000000e+02" id="7"/>
            <partRailML s="7.2000000000000000e+02" id="8"/>
            <partRailML s="8.2000000000000000e+02" id="9"/>
            <partRailML s="9.2000000000000000e+02" id="10"/>
        </userData>

 

Can you suggest a regex that returns this?

In other words, anything between <userData and </userData>.

Why use a regex, specifically, and not the XML transformers in FME?


^<userData\s+(.*)<\/userData>$

Depending on the transformer, return the middle string by referencing the first bracketed (capture) group, i.e. by asking FME to return \1.

You didn't request it, but the \s+ keeps the whitespace after <userData out of the captured string. If you want it in the result, remove this part of the regex.
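To see what the pattern actually captures, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: FME's own regex transformers are not shown here and their defaults may differ. In Python, `.` does not match a newline unless `re.DOTALL` is set, so for a multi-line block like the one in the question the pattern only matches with that flag; `group(1)` plays the role of `\1`. The `extract_user_data` helper is hypothetical.

```python
import re
from typing import Optional

# Sample block from the question, trimmed to two <partRailML> children.
SAMPLE = """<userData code="viPartListRailML" value="1">
            <partRailML s="0.0000000000000000e+00" id="0"/>
            <partRailML s="2.0000000000000000e+01" id="1"/>
        </userData>"""

# The pattern from the answer, with single backslashes. re.DOTALL lets "."
# match newlines so that (.*) can span the multi-line body; without the flag
# this pattern finds no match in SAMPLE at all.
PATTERN = re.compile(r"^<userData\s+(.*)<\/userData>$", re.DOTALL)


def extract_user_data(text: str) -> Optional[str]:
    """Return capture group 1 (the equivalent of \\1), or None if no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


inner = extract_user_data(SAMPLE)
print(inner)
```

Note that group 1 also contains the rest of the opening tag (`code="viPartListRailML" value="1">`), not just the child elements.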


I agree with @david_r here: if you have an XML file, try tackling it with the XML reader(s) and transformers before resorting to 'external' tools like regex and Python. Regex is very powerful, but it can also use a lot of memory, in particular when you're using a catch-all like .* as suggested by @bwn.
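For comparison, a parser-based sketch of the same extraction. FME's XML transformers can't be reproduced outside FME, so this uses Python's standard `xml.etree.ElementTree` as a stand-in. The `<root>` wrapper and the `part_list` helper are hypothetical, since the question only shows the `<userData>` fragment.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical wrapper: the question only shows the <userData> fragment,
# so the enclosing <root> element is an assumption.
DOC = """<root>
  <userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
    <partRailML s="1.2000000000000000e+02" id="2"/>
  </userData>
</root>"""


def part_list(xml_text: str, code: str = "viPartListRailML") -> List[Tuple[str, float]]:
    """Return (id, s) pairs from every <userData> element with the given code."""
    root = ET.fromstring(xml_text)
    parts = []
    # ElementTree's limited XPath: any descendant <userData> whose code attribute matches.
    for user_data in root.iterfind(f".//userData[@code='{code}']"):
        # Direct <partRailML> children; "s" is parsed from scientific notation.
        for part in user_data.findall("partRailML"):
            parts.append((part.get("id"), float(part.get("s"))))
    return parts


for part_id, s in part_list(DOC):
    print(part_id, s)
```

Indentation, line breaks and neighbouring tags don't matter here because the parser works on elements, not text. If the real file declares a default XML namespace, ElementTree reports tags as `{namespace-uri}userData`, so the path expressions would need the namespace added.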


Will the regex still work if the "<userData>" block has been indented? Or if there are trailing spaces or other tags directly after the closing "</userData>" tag?


@gylona, if you can include the full XML file, someone in the community might be able to illustrate how to use the XMLFlattener or XMLFragmenter to extract what you need.


Will the regex still work if the "<userData>" block has been indented? Or if there are trailing spaces or other tags directly after the closing "</userData>" tag?

Nope. It only works for strings that follow the same template as the one posted, and it presumes the original XML values have already been extracted into this form.
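A sketch, again in Python's `re`, of what the follow-up question is getting at: the `^...$` anchors make the pattern fail once the block is not the whole string, and a greedy `.*` with `re.DOTALL` would run from the first `<userData` to the last `</userData>`. The `RELAXED` pattern, the sample `TEXT` and the `user_data_bodies` helper are illustrative assumptions, not part of the original answer.

```python
import re
from typing import List

# Hypothetical input: an indented <userData> block followed by another tag on
# the same line, i.e. the situation asked about in the follow-up question.
TEXT = """<extension>
    <userData code="viPartListRailML" value="1">
        <partRailML s="0.0000000000000000e+00" id="0"/>
    </userData>   <userData code="other" value="2">
        <partRailML s="5.0000000000000000e+00" id="0"/>
    </userData>
</extension>"""

# Original anchored pattern: without re.MULTILINE, ^ and $ pin the match to
# the start and end of the whole string, so TEXT does not match.
STRICT = re.compile(r"^<userData\s+(.*)<\/userData>$", re.DOTALL)

# Relaxed variant (an assumption, not the answer's pattern):
#  - no ^/$ anchors, so indentation and surrounding tags are tolerated
#  - lazy (.*?) stops at the first closing tag instead of the last one
#  - ([^>]*) captures the opening tag's attributes separately from the body
RELAXED = re.compile(r"<userData\b([^>]*)>(.*?)</userData>", re.DOTALL)


def user_data_bodies(text: str) -> List[str]:
    """Return the body of every <userData> element found in text."""
    return [m.group(2) for m in RELAXED.finditer(text)]


for body in user_data_bodies(TEXT):
    print(body.strip())
# <partRailML s="0.0000000000000000e+00" id="0"/>
# <partRailML s="5.0000000000000000e+00" id="0"/>
```

This still assumes `<userData>` elements are never nested or self-closing; once the input needs that kind of handling, an XML parser is the safer tool.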

