Question

Capturing a code block with the new regular expression editor



I am using the 16.1 MacOS version and the regular expression editor. I would like to capture the following code block example:

type yellow {

yellow green

green yellow

sunshine yellow

}

In the regex editor I can capture "type yellow {" with the regex (type.*), but because everything else comes after a line break I cannot get any further. I can also use ({.*}), which works if everything is on a single line. However, \s (which I understood to enable multi-line or single-line mode in some regex flavors) only matches an extra whitespace character here. I have also tried "\s\S", but that does not work either. Is there another way to match everything between the brackets, including line break/newline characters?


3 replies

Userlevel 1
Badge +21

I'm not familiar with the FME version you are using, but this regex works for me in FME:

type yellow \{(\s*?.*?)*?\}

Edit: this works in the StringSearcher, with the code block contained in the matched characters attribute. I've assumed there is more than one set of curly brackets in the text you are searching.
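For anyone who wants to try the lazy-matching idea outside FME, here is a minimal sketch in Python's re module. This is not FME's regex engine, so treat it only as an illustration. It swaps in the common [\s\S] character class (note the square brackets, which a bare \s\S lacks) in place of the nested group above, and the second "type blue" block is made-up sample data.

```python
import re

# Hypothetical input: the block from the question plus a second, made-up block.
text = (
    "type yellow {\n"
    "yellow green\n"
    "green yellow\n"
    "sunshine yellow\n"
    "}\n"
    "type blue {\n"
    "sky blue\n"
    "}\n"
)

# [\s\S] matches any character, newlines included.
# The lazy quantifier *? stops at the first closing bracket.
lazy = re.compile(r"type yellow \{([\s\S]*?)\}")

# The greedy quantifier * runs on to the last closing bracket in the text.
greedy = re.compile(r"type yellow \{([\s\S]*)\}")

lazy_block = lazy.search(text).group(0)
greedy_block = greedy.search(text).group(0)

print(lazy_block)
```

With more than one set of brackets in the text, the lazy version returns only the "type yellow" block, while the greedy version also swallows the block that follows it.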

Userlevel 4

Hi

Try the following regex:

{((.|\n|\r)*)}

It will match all characters inside the brackets, including Carriage Return and Line Feed. The "\r" might not be necessary, but that depends on a few factors, so I've included it for the sake of completeness.

The "_first_match" attribute will contain the brackets and everything between them.

You can also supply a list name for "Subexpression matches list name" and you will get a list item that contains only what's between the brackets, without the brackets themselves.
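As a rough analogy outside FME, the same pattern can be tried in Python's re module, which is not the engine FME uses. In this sketch the whole match (group 0) plays the role of "_first_match" and the first capture group plays the role of the first subexpression list item. The braces are escaped here only to be explicit, and the sample text is the block from the question with assumed "\n" line endings.

```python
import re

# The block from the question, with "\n" line endings assumed.
text = "type yellow {\nyellow green\ngreen yellow\nsunshine yellow\n}\n"

# (.|\n|\r)* consumes any character, including line breaks.
pattern = re.compile(r"\{((.|\n|\r)*)\}")

m = pattern.search(text)
whole = m.group(0)   # brackets and everything between them
inner = m.group(1)   # only what is between the brackets

print(repr(whole))
print(repr(inner))
```

Because the quantifier is greedy, this runs to the last closing bracket in the text, which is fine when the text contains a single block.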

David

Badge +3

Without newline-sensitive matching (I believe this is the default), a dot also matches a newline (CR LF or \n or \r\n).

So without newline-sensitive matching:

^type.*$ OR ^type.*\r OR ^type.*\r\n$ OR ^type.*\n

will match everything, even the newline after the closing bracket.

This will match everything up to and including the last closing bracket:

^type.*\r (so it will not grab the ending CR LF)

(based on your input string)
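To see the difference newline sensitivity makes, here is a small sketch in Python's re module (again, not FME's engine). Python behaves the opposite way by default: a dot does not match "\n" unless the re.DOTALL flag is set, so the flag is used here to approximate the non-newline-sensitive behaviour described above. The sample text assumes "\n" line endings.

```python
import re

# The block from the question, with "\n" line endings assumed.
text = "type yellow {\nyellow green\ngreen yellow\nsunshine yellow\n}\n"

# Python default: the dot stops at the first newline.
first_line = re.search(r"^type.*", text).group(0)

# With DOTALL the dot also matches newlines, so .* runs to the end of the text.
everything = re.search(r"^type.*", text, re.DOTALL).group(0)

# Anchoring on the closing bracket stops the match at the last "}".
up_to_bracket = re.search(r"^type.*\}", text, re.DOTALL).group(0)

print(repr(first_line))
print(repr(everything))
print(repr(up_to_bracket))
```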
