Solved

associate subtext to the heading text

5 years ago
September 12, 2019
6 replies
6 views

tmtech
4 replies

So I have a text document that I need to parse into specific headings, and then extract each heading and all of its subtext. The issue is that the subtext is lettered. Is there a regex syntax that will continue to extract until it finds a new match? Any ideas?

I'm using the StringSearcher and have managed to get all this data as separate lines. I'm just not sure whether it's easier to find a way to put the data back together or whether there's a more efficient way to parse it.

Currently using syntax: chicken{1}\s|(.

Example: I need every line that starts with "Chicken", the chicken's name, and all the numbered lines below it.

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

Chicken: Rene

(1) lays three eggs a day

(2) escaped two weeks ago

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

Chicken: Stacey

(1) etc

and so on

ebygomm
Influencer
3241 replies
Best Answer
5 years ago
September 12, 2019

If you have a single text document, you could read the text file as one feature, then use an AttributeSplitter with "Chicken" as your delimiter, then explode the resulting list.
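
Outside FME, the same idea can be sketched in a few lines of Python: treat the whole document as one string, split it on the word "Chicken", and handle each piece as its own record. The sample text and the `split_on_heading` helper are made up for illustration, and this is only an analogy, not a description of how the AttributeSplitter works internally.

```python
def split_on_heading(document, delimiter="Chicken"):
    """Split one document string into blocks, one block per heading."""
    pieces = document.split(delimiter)
    # pieces[0] is whatever came before the first heading, so it is skipped.
    # str.split() consumes the delimiter, so it is re-attached to each block.
    return [delimiter + piece.strip() for piece in pieces[1:]]


# Hypothetical sample modelled on the example in the question.
sample = (
    "TEXT TEXT TEXT\n"
    "Chicken: Rene\n"
    "(1) lays three eggs a day\n"
    "(2) escaped two weeks ago\n"
    "TEXT TEXT TEXT\n"
    "Chicken: Stacey\n"
    "(1) etc\n"
)

blocks = split_on_heading(sample)
for block in blocks:
    print(block)
    print("---")
```

Note that any filler text sitting between one chicken's last numbered line and the next heading ends up at the tail of the earlier block, so a further filtering step may be needed.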

tmtech
Author
4 replies
5 years ago
September 12, 2019

ebygomm wrote:

If you have a single text document, you could read the text file as one feature, then use an AttributeSplitter with "Chicken" as your delimiter, then explode the resulting list.

I've never used this transformer. It doesn't seem to be doing anything different; I get the same list I was getting with the StringSearcher. Can you provide an example of how I would set this up? Please and thank you.

Edit: I see the issue. The original text is already broken up by line, so it's not a single document. I aggregated it into a single document, but the splitter still does not work the way I was expecting.

gio
Contributor
2252 replies
5 years ago
September 12, 2019

@tmtech

You can use the StringSearcher (possibly in combination with a VariableSetter/VariableRetriever).

Regexp for the chickens (if the data is one string containing newlines, so it basically has only one "end of line"):

Chicken: (\w+)(?=\n)

For all lines with "chicken properties"

(?=\()\((\d)\)+.*(?:(?=\n)|$)
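
As a rough illustration of how a heading pattern and a property pattern can work together, the Python sketch below walks the text line by line, remembers the most recent chicken name (roughly the role a VariableSetter/VariableRetriever pair would play in FME), and attaches each numbered line to that name. The two patterns are simplified, line-anchored variants written for this sketch rather than the exact expressions from the post, and `group_properties` is a hypothetical helper, not an FME function.

```python
import re

# Simplified, line-anchored variants of the heading and property patterns.
HEADING = re.compile(r"^Chicken: (\w+)$")
PROPERTY = re.compile(r"^\((\d)\)\s*(.*)$")


def group_properties(text):
    """Map each chicken name to the list of numbered lines that follow it."""
    groups = {}
    current = None  # most recently seen chicken name
    for line in text.splitlines():
        line = line.strip()
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            groups[current] = []
            continue
        prop = PROPERTY.match(line)
        if prop and current is not None:
            groups[current].append(prop.group(2))
        # Any other line (filler text, blank lines) is ignored.
    return groups


# Hypothetical sample modelled on the example in the question.
sample = """TEXT TEXT TEXT
Chicken: Rene
(1) lays three eggs a day
(2) escaped two weeks ago
TEXT TEXT TEXT
Chicken: Stacey
(1) etc
"""

result = group_properties(sample)
print(result)
```

Because filler lines match neither pattern, they are dropped instead of being attached to the preceding chicken. Like the pattern in the post, the property pattern accepts a single digit only, so items numbered (10) and above would need `\d+`.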

ebygomm
Influencer
3241 replies
5 years ago
September 12, 2019

tmtech wrote:

chicken.fmw

tmtech
Author
4 replies
5 years ago
September 12, 2019

ebygomm wrote:

chicken.fmw

I was doing it properly, I just wasn't exploding the list. Adding the ListExploder let me see the results. This worked, thank you!!

gio
Contributor
2252 replies
5 years ago
September 12, 2019

Concerning (?=\()\((\d)\)+.*(?:(?=\n)|$)

In the regexp I use an escape character on the bracket.

(?=\() The "\" is used as an escape character for the "(".

This does not seem to be processed in the StringSearcher.

In Rubular:

Perfect!

However...

When I enter it in the AttributeCreator using a Tcl regexp, it tells us this:

The regexp is OK, but escaping the opening bracket seems to confuse the FME engine(?)
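
One workaround sometimes used when a host application mangles a backslash-escaped bracket is to put the bracket in a character class instead, since [(] matches a literal "(" without needing a backslash. Whether this avoids the problem in the StringSearcher or AttributeCreator has not been verified here. The Python sketch below only shows that the two spellings match the same text, and the escaped form is an assumed reading of the pattern under discussion.

```python
import re

# Backslash-escaped brackets (assumed reading of the pattern discussed above).
escaped = re.compile(r"(?=\()\((\d)\)+.*")
# Same pattern with each literal bracket written as a one-character class.
bracketed = re.compile(r"(?=[(])[(](\d)[)]+.*")

line = "(1) lays three eggs a day"
escaped_match = escaped.search(line)
bracketed_match = bracketed.search(line)

print(escaped_match.group(0))
print(bracketed_match.group(0))
```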
