Question

How to use regex to turn a single line text file (with 27,000 bank records) into 27,000 line csv file?


Badge
I have a large text file produced on a Unix system containing bank records for around 27,000 entries. The sample file attached shows a small snippet of how the file is structured - which is all on a single line. Each entry contains a 31 digit number. In a text editor I simply do a regex search for [0-9]{31} and replace the matches with the matched text plus a line feed character in front of it. Thus the text file gets restructured as per the screen dump with one record per line.

 

Is there a way of achieving this in FME? The regex stuff seems a bit clunky. I worked out that the stringsearcher transformer will only pick up the first occurrence of a match. A transformer from the Store is available called RegularExpressionMatcher is available which finds all occurrences and sticks them into a list. Where I go from there is what I could do with some help with please.

 

Thanks

 

James

18 replies

Badge +2

Hi,

On first look, I'm able to do it with stringReplacer but the result I'm able to see in inspector but not in notepad (copy & paste the contents will do in notepad).

Badge +2

Hi,

On first look, I'm able to do it with stringReplacer but the result I'm able to see in inspector but not in notepad (copy & paste the contents will do in notepad).

Userlevel 4
Badge +13
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

Userlevel 4
Badge +13
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

I've uploaded my workspace as well.

stringreplace2016.fmw

Userlevel 2
Badge +17
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

Good to hear that the Regular Expression Editor will be introduced. I tried it with b16118, it's really convenient. Thanks for the enhancement!

Userlevel 2
Badge +17
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

by the way, why doesn't this title appear in the All Question page?

Badge
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

Dale, perfect thank you. I tried the 2016 beta this morning and that new regex engine is great. When can we get our hands on 2016 as this will save us a lot of time processing these bank files? Attached screen dump of my finished workflow. I also got an error generated in the Inspector - the schema was blank for each record and only when you click on the record did it's attributes appear in the details panel.

regards James

Badge
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

...just while I think about it....it would be useful if you had the option to populate the Test String box in the regex dialogue with a sample of the text from the attribute being searched. Otherwise you have to manually navigate to the file you're processing, open it and then do a cut and paste operation. Not a big deal, but would save a bit of time.

Userlevel 2
Badge +17
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

I also found the same error a few days ago and reported it to Safe support. Looks like it's a bug in build 16118. FME 2016 is in the beta stage, I believe that it will be resolved before the release.

Regarding the main subject, the same data flow should work in FME 2015, I think.

Badge

 

If you don't want to wait for 2016, or use the beta, it is possible to do this with any earlier version too.

 

 

The below shows the workspace. The key is to replace with a delimiter that's formed of a unique character (or set of them) that is guaranteed not to appear in the text file (I used "###"). Then you use the Attribute Splitter to split based on that. After that it's a simple List Exploder and there you have it.

 

 

 

Badge
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

@takashi Yes you are correct. I just tried it this morning in 2015 and it does indeed work. The trick I guess is knowing more about the regex syntax (would certainly be useful for some more worked examples in the FME documentation) and knowing that if you wrap ([0-9]{31}) in outer brackets you just need to to \\1 to replace the text the regex has found.

Userlevel 2
Badge +17
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

In this case, it's not essential to split the text. The StringReplacer in this screenshot replaces the 31 digits with "newline + 31 digits" and removes whitespaces leading the digits simultaneously. FYI.

Userlevel 4
Badge +13

Dale, perfect thank you. I tried the 2016 beta this morning and that new regex engine is great. When can we get our hands on 2016 as this will save us a lot of time processing these bank files? Attached screen dump of my finished workflow. I also got an error generated in the Inspector - the schema was blank for each record and only when you click on the record did it's attributes appear in the details panel.

regards James

FME 2016 will release in mid Jan 2016, but as Takashi found out -- our old Regular Expression processor did the trick anyway so you won't need it.

Userlevel 4
Badge +13

...just while I think about it....it would be useful if you had the option to populate the Test String box in the regex dialogue with a sample of the text from the attribute being searched. Otherwise you have to manually navigate to the file you're processing, open it and then do a cut and paste operation. Not a big deal, but would save a bit of time.

Yes, being able to seed that text box would be very helpful.

Badge
So this was an interesting one. Using FME 2016, which has a new Regular Expression engine and editor, I used exactly the original regular expression [0-9]{31} but in addition added ( ) around it.

 

That allowed me then in the Replacement Text to (using the advanced text editor) insert a Carriage Return and Line Feed before each occurrence of the match (which gets put back in the replacement text via \\1 ) See second screen grab below.

 

 

See http://docs.safe.com/fme/2016.0/html/FME_Desktop_D... for more details.

 

 

Note that I further piped this result into an AttributeSplitter and asked to Split on the same Carriage Return / Line Feed and got myself a list of all the records. I could then to a ListExploder to get a feature per record. And then further use an AttributeSplitter to grab the parts provided they were fixed format FORTRAN style.

 

 

Fun example, and good chance to show off the new Regular Expression stuff in FME 2016.

 

 

 

 

@takashi Unfortunately that doesn't work for me but useful info nonetheless thank you. I only posted a small example snippet of the file as it's sensitive data and the real one contains bank records. In the actual file, each record is 100 bytes long. Usually we have to issue some nasty 'dd' unix command to unblock the file into new lines, which was the point of this FME process to do it using regex. Some records take up the whole 100 bytes (characters) in length and therefore there is no whitespace between the end of one record and the beginning of another at which point your example breaks.

On another note (not connected with this thread at all), could you post a sample workspace which demonstrates how you move files from one directory to another as you describe in your blog post. I can't quite get my head around your description of how all the different bits of the file path and names etc get mapped to the filecopy writer. Thanks

Userlevel 2
Badge +17

OK, I uploaded a demo workspace (template including test files) here.

Hope this helps. If you have any other questions about filecopy, please post new question to the Community ;)

Userlevel 4
Hi

 

 

Just for the fun of it, here's a solution using a PythonCreator and the Python regular expression module:

 

 

import fmeobjects
import re
class SplitString(object):
    def __init__(self):
        self.split_expression = re.compile('[ ](?=[0-9]{31})')
    def close(self):
        input_file = FME_MacroValues['INPUT_FILE']
        with open(input_file) as f:
            lines = f.readlines()
            for line in lines:
                records = self.split_expression.split(line)
                for record in records:
                    feature = fmeobjects.FMEFeature()
                    feature.setAttribute('banking_record', record.strip())
                    self.pyoutput(feature)

 

It expects a published parameter "INPUT_FILE" that points to your banking records file. It will spit out one feature per record found in your file, with an attribute "banking_record" containing the contents of each record. It should work with FME 2013 and up.

 

 

0684Q00000ArLHJQA3.png

 

 

David
Badge
@takashi

 

Many thanks...got that working now.

Reply