Solved

Using regex select a full single line of text before a match

  • June 4, 2020
  • 5 replies
  • 1426 views

sfb
Contributor
  • 7 replies

I have a block of text separated by newline characters, e.g.:

Some Text

Some More Text

Even More Text

PERIOD: 01/01/1990 TO 12/12/2020

What I'm attempting to do using regex is grab the entire line of text preceding the row beginning with PERIOD (i.e. "Even More Text"). In an online regex editor, the following expression successfully returns just the line containing "Even More Text":

^.*$(?=\nPERIOD)

However, when I attempt to do the same in FME, it returns all lines above PERIOD. It seems that in online editors the . matches every character except newlines, whereas in FME it matches newlines too. Is there a way to adjust multiline regex flags (or some other workaround) in FME to get the desired output?
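The difference described above can be reproduced outside FME. The following is a minimal Python sketch, not FME code: it assumes the sample text has no blank lines between rows, and it uses Python's `re.MULTILINE` and `re.DOTALL` flags to stand in for the two behaviours. Whether FME's regex engine exposes equivalent flags is not something this sketch establishes.

```python
import re

# Sample text from the question (assumption: no blank lines between rows).
text = "Some Text\nSome More Text\nEven More Text\nPERIOD: 01/01/1990 TO 12/12/2020"

pattern = r"^.*$(?=\nPERIOD)"

# MULTILINE only: '.' stops at newlines, so just the preceding line matches.
line_only = re.search(pattern, text, re.MULTILINE).group(0)

# MULTILINE + DOTALL: '.' also matches newlines, so the match starts at the
# first line and runs through to the line before PERIOD.
all_lines = re.search(pattern, text, re.MULTILINE | re.DOTALL).group(0)

print(repr(line_only))
print(repr(all_lines))
```

The second result mirrors the "all lines above PERIOD" symptom, which is consistent with the dot matching newlines.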

arnold_bijlsma
Enthusiast
  • 125 replies
  • June 4, 2020

Quantifiers are greedy by default, matching as much as possible. Putting a ? after your asterisk makes the quantifier lazy, so it matches as little as possible.

I don't know if it'll work, but try

^.*?$(?=\nPERIOD)
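One way to check the lazy-quantifier idea outside FME is the Python sketch below. It is an illustration under assumptions: Python's `re` module is not FME's engine, the sample text is assumed to have no blank lines, and `re.DOTALL` is used to imitate a dot that matches newlines.

```python
import re

# Sample text from the question (assumption: no blank lines between rows).
text = "Some Text\nSome More Text\nEven More Text\nPERIOD: 01/01/1990 TO 12/12/2020"

lazy = r"^.*?$(?=\nPERIOD)"

# Dot matches newlines: the search still begins at the first '^' position,
# and the lazy '.*?' keeps expanding across lines until the lookahead succeeds.
with_dotall = re.search(lazy, text, re.MULTILINE | re.DOTALL).group(0)

# Dot stops at newlines: only the line directly before PERIOD can match.
without_dotall = re.search(lazy, text, re.MULTILINE).group(0)

print(repr(with_dotall))
print(repr(without_dotall))
```

In this sketch laziness alone does not shorten the match when the dot crosses newlines, because a regex search reports the leftmost match and the leftmost start is the first line.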

david_r
Celebrity
  • 8391 replies
  • Best Answer
  • June 4, 2020

Why not use the AttributeSplitter to split the block of text by line, then send it to the ListExploder to process one line at a time? You can then use e.g. the AttributeCreator and the Adjacent feature mode to retrieve the line before the one you're processing.
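The logic of this approach (split into lines, then look at each line together with its predecessor) can be sketched in plain Python. This is only an analogue of the transformer chain, not an FME workspace; the variable names are hypothetical and the sample text is assumed to have no blank lines.

```python
# Sample text from the question (assumption: no blank lines between rows).
text = "Some Text\nSome More Text\nEven More Text\nPERIOD: 01/01/1990 TO 12/12/2020"

# Step 1: split the block into one item per line (the splitting step).
lines = text.split("\n")

# Step 2: walk the lines pairwise so each line can see the one before it
# (the "adjacent feature" idea), keeping the predecessor of any PERIOD line.
previous_lines = [
    prev for prev, cur in zip(lines, lines[1:]) if cur.startswith("PERIOD")
]

print(previous_lines)
```

Pairing each line with its predecessor avoids any dependence on how the regex engine treats newlines.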


ebygomm
Influencer
  • 3422 replies
  • June 4, 2020

You could write the regex to match everything but a newline:

^[^\n]*(?=\nPERIOD)

But I would probably go with adjacent attribute mapping as mentioned by @david_r
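The negated character class can be tried in Python as a rough check. This sketch is not FME code and makes two assumptions: the sample text has no blank lines, and `re.DOTALL` imitates a dot that matches newlines. Note that Python needs `re.MULTILINE` for `^` to match at the start of each line; whether FME needs an equivalent setting is not tested here.

```python
import re

# Sample text from the question (assumption: no blank lines between rows).
text = "Some Text\nSome More Text\nEven More Text\nPERIOD: 01/01/1990 TO 12/12/2020"

pattern = r"^[^\n]*(?=\nPERIOD)"

# [^\n] never matches a newline, whatever the dot does, so the match
# cannot spill over onto earlier lines even with DOTALL switched on.
match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
preceding_line = match.group(0)

# Without MULTILINE, '^' only matches at the very start of the text in
# Python, so the pattern finds nothing in this sample.
no_multiline = re.search(pattern, text)

print(repr(preceding_line))
print(no_multiline)
```

Because the class excludes newlines explicitly, the result does not depend on the dot's behaviour, which is presumably why it sidesteps the original problem.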


sfb
Contributor
  • Author
  • 7 replies
  • June 5, 2020

@ebygomm wrote:

You could use the regex to match everything but a newline

^[^\n]*(?=\nPERIOD)

But I would probably go with adjacent attribute mapping as mentioned by @david_r

Thank you. This regex returns the correct line I was after.


sfb
Contributor
  • Author
  • 7 replies
  • June 5, 2020

@david_r wrote:

Why not use the AttributeSplitter to split the block of text by line, then send it to the ListExploder to process one line at a time? You can then use e.g. the AttributeCreator and the Adjacent feature mode to retrieve the line before the one you're processing.

Thanks David. This is an elegant alternative to using regex. Though it requires a few more transformers, doing it this way might be preferable because it makes the workbenches more usable for my colleagues.