Question

Text line aggregation

  • 28 October 2016
  • 7 replies
  • 18 views


Hi All,

I have a text file containing groups of lines, each group a variable number of lines long, with the groups separated by an empty line. The lines adjacent to one another need to be grouped together and then concatenated. So for example:

Test1

Test2

Test3

---

testy1

testy2

---

test

---

(where the hyphens are the blank lines, I can't format this page with return lines it seems)

Would need to become:

Test1 Test2 Test3

testy1 testy2

test

I've tried a couple of approaches so far. Variables seemed to be the right choice initially, but now I'm not so sure. Each line also has a format attribute containing the text_line_length, so I was trying to use that as the delimiter whenever it equals 0. Interestingly, the logic is a bit like using the PointConnector to create line geometries, with text_line_length as the 'connection break attribute'. Anyway, that's a digression. Does anyone have any simple tricks I could deploy to do this?

Thanks, Dave


7 replies


Hi Dave

What I would try:

  1. Read the file as text line.
  2. Trim the lines.
  3. Get rid of empty lines: simply by Attribute has a value / Attribute Is Empty String.
  4. StringSearcher: extract the digits from the end using the regular expression \d+$ (\d: a digit, +: one or more of them, $: at the end of the string).
  5. Copy the original string to a new attribute.
  6. With a StringReplacer (or some other way), remove the extracted digits from the copied attribute.
  7. (Sort if you need to)
  8. Aggregator: group by this new attribute, concatenate the original string attrib.
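For readers who want to see the logic outside FME, the steps above can be sketched in Python. This is only an illustration, not FME functionality: the function name `aggregate_by_prefix` and the sample list are made up, and the approach assumes that every member of a group shares the same text once its trailing digits are removed, which the sample data happens to satisfy.

```python
import re
from collections import defaultdict


def aggregate_by_prefix(lines):
    """Group lines by their text minus trailing digits, then join each group.

    Hypothetical helper: mirrors trim -> drop empties -> strip digits -> aggregate.
    """
    groups = defaultdict(list)  # dicts preserve insertion order, so group order is kept
    for raw in lines:
        line = raw.strip()                # trim the line
        if not line:                      # discard empty lines
            continue
        key = re.sub(r"\d+$", "", line)   # the copy with trailing digits removed
        groups[key].append(line)          # group on the key, collect the original text
    return [" ".join(members) for members in groups.values()]


sample = ["Test1", "Test2", "Test3", "", "testy1", "testy2", "", "test", ""]
result = aggregate_by_prefix(sample)
print(result)
```

Note that this ignores the blank lines entirely, so two separate groups that happen to share a prefix would be merged.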

Is this what you mean?

Cheers,

András


Ok, I persisted with Variables and sorted it out.

Test for a blank line, send the blank lines to a Counter, and then set a variable to the count value. On the Failed port of the Tester, use a VariableRetriever to get the count value and attach it to the features that pass through. Every time a blank row is seen, the count ID increments. Then just use the Aggregator to group on the count ID and do the concatenation.
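The same counting idea can be sketched in a few lines of Python. This is a rough analogue rather than a description of the workspace itself: `group_id` stands in for the variable that holds the count, and the function name and sample data are invented for the example.

```python
def aggregate_by_blank_counter(lines):
    """Join runs of non-blank lines, using a counter bumped on each blank line."""
    group_id = 0   # stands in for the variable holding the count value
    groups = {}    # group_id -> list of lines (insertion order is preserved)
    for raw in lines:
        line = raw.strip()
        if not line:
            group_id += 1   # blank line seen: the count ID increments
            continue
        # non-blank line: tag it with the current count ID
        groups.setdefault(group_id, []).append(line)
    # aggregate: concatenate each group with a space
    return [" ".join(members) for members in groups.values()]


sample = ["Test1", "Test2", "Test3", "", "testy1", "testy2", "", "test", ""]
print(aggregate_by_blank_counter(sample))
```

Several blank lines in a row simply skip some ID values; no empty groups are produced.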

Thanks for listening, it helps to talk it through :0)


I had a different approach:

Replace the empty lines with a character that won't occur in your dataset, aggregate all the lines into one concatenated string, split that string on the character, explode the resulting list, clean up a bit, et voilà! Hope this helps.
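As a rough illustration of that sequence in Python (not a translation of the attached workspace): the sentinel below is the ASCII unit separator, chosen on the assumption that it never appears in the data, and the function name is hypothetical.

```python
SENTINEL = "\x1f"  # assumption: this character never occurs in the text


def aggregate_with_sentinel(lines):
    """Mark blank lines, join everything, split on the marker, then tidy up."""
    # replace each empty line with the sentinel character
    marked = [line.strip() or SENTINEL for line in lines]
    # aggregate all lines into a single concatenated string
    joined = " ".join(marked)
    # split on the sentinel to get one piece per group
    parts = joined.split(SENTINEL)
    # cleanup: trim stray spaces and drop empty pieces
    return [part.strip() for part in parts if part.strip()]


sample = ["Test1", "Test2", "Test3", "", "testy1", "testy2", "", "test", ""]
print(aggregate_with_sentinel(sample))
```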

 

concat-text.fmwt


I'd assign a value to the blank lines and then use the adjacent feature support in the AttributeCreator to increment a group number after every blank line, then aggregate/concatenate based on this value.
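A minimal Python sketch of the look-back idea, for comparison: each line checks whether the line before it was blank and, if so, starts a new group. The variable `previous` plays the part of the prior feature; all names here are invented for the example.

```python
def aggregate_with_lookback(lines):
    """Start a new group whenever the previous line was blank."""
    groups = {}
    group_id = 0
    previous = None   # the prior line; None before the first line is read
    for raw in lines:
        line = raw.strip()
        if previous == "":      # the prior line was blank: increment the group number
            group_id += 1
        if line:                # only non-blank lines are aggregated
            groups.setdefault(group_id, []).append(line)
        previous = line
    return [" ".join(members) for members in groups.values()]


sample = ["Test1", "Test2", "Test3", "", "testy1", "testy2", "", "test", ""]
print(aggregate_with_lookback(sample))
```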


Hi @1spatialdave, assuming that a newline is represented by two special characters, [CR] (\r) and [LF] (\n), this might also be a possible way.

  1. Text File reader: Read whole file at once.
  2. StringReplacer (1): Replace two consecutive newlines, i.e. [CR][LF][CR][LF], with a specific special character, e.g. [BEL] (\a).
  3. StringReplacer (2): Replace every remaining newline with a white space.
  4. StringReplacer (3): Replace the special character [BEL] with a single newline.
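
The three replacements above can be tried out in Python on a string held in memory. This is a sketch under the same assumption as the steps (newlines are [CR][LF] pairs and [BEL] does not occur in the text); the function name and sample string are made up.

```python
def aggregate_whole_text(text):
    """Apply the three replacements to the whole file content at once."""
    placeholder = "\a"  # [BEL], assumed not to occur in the text
    # (1) two consecutive newlines (a blank line) become the placeholder
    text = text.replace("\r\n\r\n", placeholder)
    # (2) every remaining newline becomes a single space
    text = text.replace("\r\n", " ")
    # (3) the placeholder becomes a single newline
    return text.replace(placeholder, "\r\n")


sample_text = "Test1\r\nTest2\r\nTest3\r\n\r\ntesty1\r\ntesty2\r\n\r\ntest\r\n"
result = aggregate_whole_text(sample_text)
print(repr(result))
```

One detail to watch: a single newline at the very end of the file turns into a trailing space in step (2), so a final trim may be needed.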

@1spatialdave

If spaces are the only criterion...

...I'm pretty sure this question, or a similar one, has been asked before.

My workspace collection has bunches of VariableRetriever/VariableSetter variants.


The moral to the story is that the FME User community is awesome and... there's 14 ways to do anything in FME. Thanks all very much, neat solutions.
