Question

StringSearcher regex \\r\\n

  • 26 September 2017
  • 12 replies
  • 21 views

Badge +1

I want to find each line of the string below with a regular expression in StringSearcher that looks like this .*\\r\\n, i.e., I expect each line to end up in the all matches list produced by the StringSearcher.

METER_NUM,PRODUCTION_DATE,FLOW_TIME_MINUTES,VOLUME,ENERGY,DIFF_PRESS,STATIC_PRESS,FLOW_TEMP,MOL_CO2,ENERGY_FACTOR

1006-2001307,20170829,132.283,13.8851,,9.45286000000001,1088.28,83.9825000000001,,

1006-2001283,20170829,1440,311.974,,27.9496,162.085,88.4964,,

It works in my regex testing tool Expresso, but StringSearcher doesn't seem to recognize \\r\\n.


12 replies

Badge

Hi @deanrother

Please try ^.*\\n as the StringSearcher regular expression. You can always experiment in the StringSearcher RegEx Editor (see Open RegEx Editor... for the Contains Regular Expression parameter).

Would you like to split your source value into parts? If yes, you might also want to take a look at AttributeSplitter.

Badge +1

Lena, this doesn't work for me. It matches everything (fig. 1). The output displayed by the Inspector bears this out (fig. 2).

I want each line as a list item, e.g.,

_list{0}.value = METER_NUM,PRODUCTION_DATE,FLOW_TIME_MINUTES,VOLUME,ENERGY,DIFF_PRESS,STATIC_PRESS,FLOW_TEMP,MOL_CO2,ENERGY_FACTOR

_list{1}.value = 1006-2001307,20170829,132.283,13.8851,,9.45286000000001,1088.28,83.9825000000001,,

_list{2}.value = 1006-2001283,20170829,1440,311.974,,27.9496,162.085,88.4964,,

 

Figure 1

Figure 2

Here are my thoughts:

Somewhere in version 2016 or 2017 the behaviour of .* changed (I struggled with it myself for a fair amount of time). It now matches newlines as well, while before it didn't (like your Expresso, I guess).

So a solution could be making the .* non-greedy, that is adding a question mark to it.

The complete regex then would be '.*?\\r\\n' (without the quotes)

Hope this helps.
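
To illustrate the greedy versus non-greedy difference described above, here is a minimal sketch using Python's re module. It is not the regex engine StringSearcher uses; re.DOTALL is set only to mimic a dot that also matches line breaks, and the CR/LF line endings are an assumption about the attribute value.

```python
import re

# Sample lines from the question; the "\r\n" terminators are an assumption.
text = (
    "METER_NUM,PRODUCTION_DATE,FLOW_TIME_MINUTES,VOLUME,ENERGY,"
    "DIFF_PRESS,STATIC_PRESS,FLOW_TEMP,MOL_CO2,ENERGY_FACTOR\r\n"
    "1006-2001307,20170829,132.283,13.8851,,9.45286000000001,"
    "1088.28,83.9825000000001,,\r\n"
    "1006-2001283,20170829,1440,311.974,,27.9496,162.085,88.4964,,\r\n"
)

# With re.DOTALL, "." also matches "\r" and "\n", so the greedy ".*" runs to
# the end of the text and backtracks only as far as the last "\r\n".
greedy = re.findall(r".*\r\n", text, flags=re.DOTALL)

# The lazy ".*?" stops at the first "\r\n" it can reach: one match per line.
lazy = re.findall(r".*?\r\n", text, flags=re.DOTALL)

print(len(greedy), len(lazy))
```

If the real attribute value contains no "\r\n" at all, neither pattern can match, so it is worth checking which line terminator the string actually carries.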

Badge +1

Thanks, but that didn't work; it didn't match anything. I'm on 2017.0. I'll have to try 2017.<newest> to see how that works.
Userlevel 2
Badge +16

Just an alternative thought:

If the source is a text file (and CSV is), why not use the Text File reader and read it line by line?

If you want the lines in a list, you can follow the reader with an Aggregator and aggregate the attributes into a list.

If the source is a text attribute, you can follow @LenaAtSafe's suggestion and use the AttributeSplitter to split on \\n.

In any case, if you want all lines to be separated, I would not use the StringSearcher and regex.

Userlevel 4
Badge +30
Hi @deanrother,

I'm using FME 2017 and I got this result:

Is this what you want?

StringSearcher: ([^ ]* +)

Your edited workspace is attached: workspace-stringsearcher.fmw

Thanks,

Danilo
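
For readers wondering what ([^ ]* +) actually matches, here is a small sketch in Python's re module (not the engine StringSearcher uses). The thread never shows the real separator inside the attribute, so the single spaces between records below are purely an assumption made for illustration.

```python
import re

# The three records from the question.
records = [
    "METER_NUM,PRODUCTION_DATE,FLOW_TIME_MINUTES,VOLUME,ENERGY,"
    "DIFF_PRESS,STATIC_PRESS,FLOW_TEMP,MOL_CO2,ENERGY_FACTOR",
    "1006-2001307,20170829,132.283,13.8851,,9.45286000000001,"
    "1088.28,83.9825000000001,,",
    "1006-2001283,20170829,1440,311.974,,27.9496,162.085,88.4964,,",
]

# ASSUMPTION: records are separated by single spaces, not by CR/LF.
text = " ".join(records)

pattern = r"([^ ]* +)"

# Each match is a run of non-space characters plus the spaces that follow it.
matches = re.findall(pattern, text)

# The last record has no trailing space, so it is only matched once a
# space is appended to the text.
matches_padded = re.findall(pattern, text + " ")

print(len(matches), len(matches_padded))
```

Under this assumption every match keeps its trailing spaces, so the list values would still need trimming afterwards.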
Badge +1

FTPCaller dumps the file into an attribute as a string. So, I need to parse that string and the first thing I want to do is break the lines apart.

 

 

Badge

. matches any character, including \\n and \\r. So your regex means: starting from the beginning of the string, any number of any characters, followed by a newline, which is exactly the whole source string.

 

Could you please try AttributeSplitter? This would be your #1 choice for the task.

 

For the StringSearcher, one regex option would be [a-zA-Z0-9_.,-]*\\n. If you need to split the sample string into three strings, your regex should not start with ^, as this would automatically exclude the second and third parts.
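
The two patterns discussed here can be compared in a short sketch with Python's re module. This is an illustration only, not StringSearcher's own engine: re.DOTALL stands in for a dot that matches line breaks, and the sample is assumed to use bare "\n" terminators.

```python
import re

# Sample lines from the question; the "\n" terminators are an assumption.
text = (
    "METER_NUM,PRODUCTION_DATE,FLOW_TIME_MINUTES,VOLUME,ENERGY,"
    "DIFF_PRESS,STATIC_PRESS,FLOW_TEMP,MOL_CO2,ENERGY_FACTOR\n"
    "1006-2001307,20170829,132.283,13.8851,,9.45286000000001,"
    "1088.28,83.9825000000001,,\n"
    "1006-2001283,20170829,1440,311.974,,27.9496,162.085,88.4964,,\n"
)

# "^" anchors at the start of the string and, with re.DOTALL, the greedy
# ".*" spans every line, so the single match is the whole text.
whole = re.findall(r"^.*\n", text, flags=re.DOTALL)

# The character class contains no line-break characters, so each match
# has to stop at the end of its own line.
lines = re.findall(r"[a-zA-Z0-9_.,-]*\n", text)

print(len(whole), len(lines))
```

If the string carries "\r\n" terminators instead, the class-based pattern would need something like \r?\n at the end, because "\r" is not in the class.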

 

 

Badge +1
That worked. Thanks!

 

 

Userlevel 4
Badge +30
Perfect @deanrother . I'm happy to help you. :)

 

Userlevel 2
Badge +17

Hi @deanrother, in this case I would use the AttributeSplitter (Delimiter: <newline character>), as @LenaAtSafe suggested at first. One reason is that the text might not end with a newline. Other advantages of the AttributeSplitter are that you can trim leading and/or trailing spaces in the split strings and optionally remove empty lines.
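
As a rough analogue of that splitting approach outside FME, the sketch below splits on line breaks in Python. The function name split_lines and its trim and drop_empty options are hypothetical, chosen to mirror the behaviour described above; the blank line and the missing final terminator in the sample are added for illustration.

```python
def split_lines(text, trim=True, drop_empty=True):
    """Split text into lines, optionally trimming spaces and dropping empties."""
    # splitlines() breaks on "\r\n", "\n" and "\r" (among other line
    # boundaries) and does not need a terminator after the last line.
    parts = text.splitlines()
    if trim:
        parts = [p.strip(" ") for p in parts]
    if drop_empty:
        parts = [p for p in parts if p]
    return parts


# Records from the question, with an extra blank line and no final "\r\n".
text = (
    "METER_NUM,PRODUCTION_DATE,FLOW_TIME_MINUTES,VOLUME,ENERGY,"
    "DIFF_PRESS,STATIC_PRESS,FLOW_TEMP,MOL_CO2,ENERGY_FACTOR\r\n"
    "\r\n"
    "1006-2001307,20170829,132.283,13.8851,,9.45286000000001,"
    "1088.28,83.9825000000001,,\r\n"
    "1006-2001283,20170829,1440,311.974,,27.9496,162.085,88.4964,,"
)

lines = split_lines(text)
print(len(lines))
```

A pattern that requires a trailing line terminator would miss the last record here, which is the case the splitter handles without extra work.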

I also like @erik_jan's suggestion - read the text with the Text File reader. You can download the text to a temporary file and then read it with the FeatureReader. The workflow would consist of the TempPathnameCreator, FTPCaller (Transfer Type: Download to a File), and FeatureReader (Format: Text File) connected in series.
