Hi

 

I have a text document that I need to parse into specific headings, and then extract each heading together with all of its sub-text. The issue is that the sub-text is lettered. Is there a regex syntax that will continue to extract until it finds a new match? Any ideas?

 

I'm using the StringSearcher and have managed to get all this data as separate lines. I'm just not sure whether it's easier to find a way to put the data back together, or whether there's a more efficient way to parse it.

Currently using this syntax: chicken{1}\s|(.

 

Example: I need every line that starts with "Chicken", the chicken's name, and all the numbered lines below it.

 

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

Chicken: Rene

(1) lays three eggs a day

(2) escaped two weeks ago

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

Chicken: Stacey

(1) etc

and so on

 

 

If you have a single text document, could you read the text file as one feature, then use an AttributeSplitter with "Chicken" as your delimiter, then explode the list?
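Outside FME, the same "split on the heading, then explode" idea can be sketched in Python. This is only an illustration under assumptions: the sample text and the helper name `split_on_heading` are made up for the example, and it does not reproduce how the FME transformers are configured.

```python
import re

# Hypothetical sample mirroring the example in the question.
text = """TEXT TEXT TEXT
Chicken: Rene
(1) lays three eggs a day
(2) escaped two weeks ago
TEXT TEXT TEXT
Chicken: Stacey
(1) etc
"""


def split_on_heading(document, heading="Chicken:"):
    """Split one document into parts, each starting at a heading line."""
    # A zero-width lookahead splits *before* the heading, so the heading
    # text stays at the start of each part (requires Python 3.7+).
    pattern = r"^(?=" + re.escape(heading) + r")"
    parts = re.split(pattern, document, flags=re.MULTILINE)
    # Discard whatever precedes the first heading.
    return [part for part in parts if part.startswith(heading)]


parts = split_on_heading(text)
for part in parts:
    print(repr(part))
```

Note that a plain split keeps everything up to the next heading, so each part still contains any unrelated text that follows the numbered lines. That text would have to be filtered out in a later step.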


I've never used this transformer. It doesn't seem to be doing anything different: I get the same list I was getting with the StringSearcher. Can you provide an example of how I would set this up? Please and thank you.

 

Edit: I see the issue. The original text is already broken up by line, so it is not a single document. I aggregated it into a single document, but the splitter still does not work the way I was expecting.

 


@tmtech

 

You can use the StringSearcher (possibly in combination with a VariableSetter/VariableRetriever).

 

Regexp for the chickens (if the data is a single string containing newlines, i.e. it basically has only one "end of line"):

Chicken: (\w+)(?=\n)

For all lines with "chicken properties":

(?=\()\((\d)\)+.*(?:(?=\n)|$)
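As a rough sketch of how the two kinds of lines could be tied back together, here is a small Python example that remembers the most recent chicken name while walking the lines, similar in spirit to setting and retrieving a variable. The patterns are simplified variants of the ones above (no lookaheads, `\d+` instead of `\d`), and the function name `group_chickens` and the sample lines are assumptions made for illustration.

```python
import re

# Simplified variants of the two patterns above (assumptions for this sketch).
HEADING = re.compile(r"Chicken: (\w+)")
PROPERTY = re.compile(r"\((\d+)\)\s*(.*)")


def group_chickens(lines):
    """Map each chicken name to the numbered lines that follow its heading."""
    groups = {}
    current = None  # name of the chicken whose block we are inside, if any
    for raw in lines:
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            groups[current] = []
            continue
        prop = PROPERTY.match(line)
        if prop and current is not None:
            groups[current].append(prop.group(2))
        elif line:
            # Any other non-empty line ends the current chicken's block.
            current = None
    return groups


sample = [
    "TEXT TEXT TEXT", "",
    "Chicken: Rene", "",
    "(1) lays three eggs a day", "",
    "(2) escaped two weeks ago", "",
    "TEXT TEXT TEXT", "",
    "Chicken: Stacey", "",
    "(1) etc",
]
result = group_chickens(sample)
print(result)
```

Blank lines are skipped rather than treated as the end of a block, since the example data has an empty line between every entry.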

 

 

 



chicken.fmw


I was doing it properly, I just wasn't exploding the list. Adding the exploder let me see the results. This worked, thank you!!


Concerning (?=\()\((\d)\)+.*(?:(?=\n)|$)

 

In the regexp I use an escape character (shown in bold) on the bracket.

(?=\() The "\" is used as an escape character for the "("

This does not seem to be processed in the StringSearcher.

In Rubular:

Perfect!

However...

When I enter it in the AttributeCreator using a Tcl regexp, it tells us this:

 

The regexp is OK, but escaping the opening bracket seems to confuse the FME engine(?)
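One general regex workaround when a tool mishandles backslashes is to write the literal bracket as a one-character class, `[(]`, instead of escaping it. Whether this avoids the FME behaviour described above has not been verified here. The Python sketch below only shows that the two spellings match the same text; the sample line and the helper name `property_number` are assumptions for illustration.

```python
import re

line = "(1) lays three eggs a day"

# Literal brackets written with backslash escapes.
escaped = re.compile(r"\((\d)\)+.*")
# The same literal brackets written as single-character classes.
bracketed = re.compile(r"[(](\d)[)]+.*")


def property_number(pattern, text):
    """Return the captured digit, or None if the line does not match."""
    match = pattern.match(text)
    return match.group(1) if match else None


print(property_number(escaped, line), property_number(bracketed, line))
```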

 

 

 

