Solved

Associate subtext with the heading text

  • September 12, 2019
  • 6 replies
  • 6 views

Hi

 

So I have a text document that I need to parse into specific headings, then extract each heading and all of its subtext. The issue is that the subtext is lettered. Is there a regex syntax that will continue to extract until it finds a new match? Any ideas?

 

I'm using StringSearcher and have managed to get all this data as separate lines. I'm just not sure whether it's easier to find a way to put the data back together or whether there's a more efficient way to parse it.

Currently using syntax: chicken{1}\s|(.

 

Example: I need every line that starts with "Chicken" (the chicken's name) and all the numbered lines below it.

 

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

Chicken: Rene

(1) lays three eggs a day

(2) escaped two weeks ago

TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT

Chicken: Stacey

(1) etc

and so on

 

 


ebygomm
Influencer
  • Best Answer
  • September 12, 2019

If you have a single text document, could you read the text file as one feature, then use an AttributeSplitter with "Chicken" as your delimiter, and then explode the resulting list?
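
Outside FME, the same split-then-explode idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample text is adapted from the question, `split_by_heading` is a hypothetical helper name, and the split is on the literal "Chicken:" rather than through the AttributeSplitter itself, so details may differ.

```python
# Sample text adapted from the question; the real file will differ.
SAMPLE = """TEXT TEXT TEXT
Chicken: Rene
(1) lays three eggs a day
(2) escaped two weeks ago
TEXT TEXT TEXT
Chicken: Stacey
(1) etc
"""


def split_by_heading(text, delimiter="Chicken:"):
    """Split one text blob on the delimiter and return one record per heading."""
    # Everything before the first delimiter is preamble, so drop the first part.
    parts = text.split(delimiter)[1:]
    records = []
    for part in parts:
        lines = [ln.strip() for ln in part.splitlines() if ln.strip()]
        if not lines:
            continue
        name = lines[0]
        # Keep only the numbered lines such as "(1) ..."; other text is ignored.
        items = [ln for ln in lines[1:] if ln.startswith("(")]
        records.append({"name": name, "items": items})
    return records


records = split_by_heading(SAMPLE)
for record in records:
    print(record["name"], record["items"])
```

Each record plays the role of one exploded list element: the heading's name plus the numbered lines that belong to it.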


  • Author
  • September 12, 2019

I've never used this transformer. It doesn't seem to be doing anything different; I get the same list I was getting with StringSearcher. Can you provide an example of how I would set this up, please? Thank you.

 

Edit: I see the issue. The original text is already broken up by line, so it is not a single document. I aggregated it into a single document, but the splitter still does not work the way I was expecting.

 


gio
Contributor
  • September 12, 2019

@tmtech

 

You can use a StringSearcher (possibly in combination with a VariableSetter/VariableRetriever).

 

Regexp for the chickens (if the data is one string containing newlines, so it basically has only one "end of line"):

Chicken: (\w+)(?=\n)

For all lines with "chicken properties":

(?=\()\((\d)\)+.*(?:(?=\n)|$)
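
To show how the two patterns can tie each property line back to its chicken, here is a rough Python sketch. It assumes the text has already been split into lines, so the `(?=\n)` lookaheads are dropped, and it carries the current heading in a plain variable, roughly the job a VariableSetter/VariableRetriever pair would do. The names `HEADING`, `PROPERTY` and `associate` are made up for the example.

```python
import re

# Heading pattern from the post, without the newline lookahead
# because each line is matched on its own here.
HEADING = re.compile(r"Chicken: (\w+)")
# Property pattern from the post, simplified; \d matches a single digit.
PROPERTY = re.compile(r"\((\d)\)+.*")

LINES = [
    "TEXT TEXT TEXT",
    "Chicken: Rene",
    "(1) lays three eggs a day",
    "(2) escaped two weeks ago",
    "TEXT TEXT TEXT",
    "Chicken: Stacey",
    "(1) etc",
]


def associate(lines):
    """Group numbered lines under the most recent 'Chicken:' heading."""
    current = None  # name of the heading seen most recently
    result = {}
    for line in lines:
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            result[current] = []
            continue
        # Only collect property lines once a heading has been seen.
        if current is not None and PROPERTY.match(line):
            result[current].append(line)
    return result


grouped = associate(LINES)
print(grouped)
```

Lines that match neither pattern, such as the filler text, are simply skipped, so each heading collects lines until the next heading appears.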

 

 

 


ebygomm
Influencer
  • September 12, 2019

 

chicken.fmw


  • Author
  • September 12, 2019
I was doing it properly; I just wasn't exploding the list. Adding the exploder let me see the results. This worked, thank you!


gio
Contributor
  • September 12, 2019

Concerning (?=\()\((\d)\)+.*(?:(?=\n)|$)

 

In the regexp I use an escape character on the bracket.

(?=\() The "\" is used as an escape character for the "(".

This does not seem to be processed in the StringSearcher.

In Rubular:

Perfect!

However...

When I enter it in the AttributeCreator using a Tcl regexp, it tells us this:

 

The regexp is OK, but escaping the opening bracket seems to confuse the FME engine(?)
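
If the backslash before the bracket is what gets mangled, one general regex workaround is to put the bracket in a character class, `[(]`, which needs no backslash. The Python sketch below only checks that the two spellings match the same sample lines; whether this sidesteps the FME behaviour described above is an assumption that has not been tested here.

```python
import re

# Backslash-escaped brackets, as in the pattern discussed above (simplified).
ESCAPED = re.compile(r"\((\d)\).*")
# Same pattern with the brackets written as character classes instead.
BRACKETED = re.compile(r"[(](\d)[)].*")

samples = [
    "(1) lays three eggs a day",
    "Chicken: Rene",
    "TEXT (2)",
]

# For each sample, record whether each spelling matches at the start of the line.
results = [(bool(ESCAPED.match(s)), bool(BRACKETED.match(s))) for s in samples]
print(results)
```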

 

 

 

