Solved

Using regex to select the full single line of text before a match


sfb
Contributor

I have a block of text separated by newline characters, e.g.:

Some Text

Some More Text

Even More Text

PERIOD: 01/01/1990 TO 12/12/2020

What I'm attempting to do using regex is grab the entire line of text preceding the row beginning with PERIOD (i.e. "Even More Text"). In an online regex editor, the following expression successfully returns just the line containing "Even More Text":

^.*$(?=\nPERIOD)

However, when I attempt to do the same in FME, it returns all lines above PERIOD. It seems that in online editors the . matches any character except a newline, whereas in FME it matches newlines as well. Is there a way to adjust multiline regex flags (or some other workaround) in FME to get the desired output?
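To illustrate the suspected difference, here is a minimal sketch using Python's re module as a stand-in. It is not FME's regex engine, and whether FME behaves like the second call is an assumption. The same pattern is run twice: once with . excluding newlines, and once with . also matching newlines.

```python
import re

# Sample block from the question, one entry per line.
text = (
    "Some Text\n"
    "Some More Text\n"
    "Even More Text\n"
    "PERIOD: 01/01/1990 TO 12/12/2020"
)

pattern = r"^.*$(?=\nPERIOD)"

# "." excludes newlines; ^ and $ match at every line boundary.
line_only = re.search(pattern, text, re.MULTILINE)

# Same pattern, but "." also matches newlines (the behaviour suspected in FME).
dot_matches_newline = re.search(pattern, text, re.MULTILINE | re.DOTALL)

print(repr(line_only.group(0)))
print(repr(dot_matches_newline.group(0)))
```

In this sketch the first call yields only the line before PERIOD, while the second yields every line above it, which matches the symptom described.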


5 replies

arnold_bijlsma
Enthusiast

Quantifiers are greedy by default, matching as much as possible. Putting a ? after the asterisk makes the quantifier lazy, so it matches as little as possible.

I don't know if it'll work, but try

^.*?$(?=\nPERIOD)
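A quick way to check this suggestion is sketched below, again using Python's re module as a stand-in rather than FME itself. The DOTALL flag is an assumption that mimics an engine where . matches newlines. Because the engine takes the leftmost match and ^ already matches at the start of the text, the lazy quantifier simply keeps expanding until the lookahead succeeds.

```python
import re

text = (
    "Some Text\n"
    "Some More Text\n"
    "Even More Text\n"
    "PERIOD: 01/01/1990 TO 12/12/2020"
)

lazy_pattern = r"^.*?$(?=\nPERIOD)"

# "." excludes newlines: the lazy pattern finds the single line.
lazy_line_only = re.search(lazy_pattern, text, re.MULTILINE)

# "." matches newlines (assumed engine behaviour): the match still
# starts at the first line and grows until "\nPERIOD" follows.
lazy_dotall = re.search(lazy_pattern, text, re.MULTILINE | re.DOTALL)

print(repr(lazy_line_only.group(0)))
print(repr(lazy_dotall.group(0)))
```

Under that assumption, laziness alone does not narrow the result to one line; the character class approach suggested later in the thread avoids the problem by never letting the match cross a newline.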

david_r
Evangelist
  • Best Answer
  • June 4, 2020

Why not use the AttributeSplitter to split the block of text by line, then send it to the ListExploder to process one line at a time? You can then use e.g. the AttributeCreator and the Adjacent feature mode to retrieve the line preceding the one you're processing.
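The same idea can be sketched outside FME in plain Python. This is only an analogy for the transformer chain, not how the transformers are implemented: splitting the text stands in for the AttributeSplitter and ListExploder, and the previous variable plays the role of the adjacent-feature lookup. All variable names here are hypothetical.

```python
text = (
    "Some Text\n"
    "Some More Text\n"
    "Even More Text\n"
    "PERIOD: 01/01/1990 TO 12/12/2020"
)

previous = None            # line seen just before the current one
line_before_period = None  # result: the line preceding the PERIOD row

# Walk the lines in order, remembering the one before the current line.
for line in text.splitlines():
    if line.startswith("PERIOD"):
        line_before_period = previous
        break
    previous = line

print(line_before_period)
```

No regex flags are involved here, so the result does not depend on how the engine treats newlines.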


ebygomm
Influencer
  • June 4, 2020

You could use a regex that matches everything but a newline:

^[^\n]*(?=\nPERIOD)

But I would probably go with adjacent attribute mapping as mentioned by @david_r
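As a rough check of the negated character class, the sketch below uses Python's re module with . set to match newlines, which is an assumption standing in for the behaviour reported in FME. Since [^\n] can never consume a newline, the match stays on one line regardless of that setting. Note that in Python the ^ anchor only matches at line starts when MULTILINE is set; how FME treats ^ is not confirmed here.

```python
import re

text = (
    "Some Text\n"
    "Some More Text\n"
    "Even More Text\n"
    "PERIOD: 01/01/1990 TO 12/12/2020"
)

class_pattern = r"^[^\n]*(?=\nPERIOD)"

# Even with "." matching newlines, [^\n]* cannot cross a line break.
class_match = re.search(class_pattern, text, re.MULTILINE | re.DOTALL)

# Without MULTILINE, ^ matches only at the start of the text, so
# Python finds nothing for this input.
no_multiline = re.search(class_pattern, text)

print(repr(class_match.group(0)))
print(no_multiline)
```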


sfb
Contributor
  • Author
  • June 5, 2020
ebygomm wrote:

You could use a regex that matches everything but a newline:

^[^\n]*(?=\nPERIOD)

But I would probably go with adjacent attribute mapping as mentioned by @david_r

Thank you. This regex returns exactly the line I was after.


sfb
Contributor
  • Author
  • June 5, 2020
david_r wrote:

Why not use the AttributeSplitter to split the block of text by line, then send it to the ListExploder to process one line at a time? You can then use e.g. the AttributeCreator and the Adjacent feature mode to retrieve the line preceding the one you're processing.

Thanks David. This is an elegant alternative to using regex. Although it requires a few more transformers, it might be preferable because it makes the workbenches easier for my colleagues to use.

