Solved

Parsing file


itay
Supporter

Hi there,

I have a text file containing one line of roughly 40 billion characters. I need to parse that line into separate lines of 256 characters each.

I have tried multiple configurations, looping with the AttributeSplitter (to parse the line into a list and write it out) in conjunction with the StringReplacer (to erase the characters already parsed), but I keep hitting resource limitations, mainly memory.

I am running it on a 64-bit OS machine with 16 GB of RAM, but apparently that is not enough.

Has anybody done something similar and can share some insights on how to keep FME's memory consumption to a minimum?

Is there a better way to parse the file than the standard FME transformers?

Any help is appreciated.

Itay
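As an aside, the memory pressure can be avoided entirely by streaming: read the file in fixed-size blocks and write out 256-character lines as you go, so only one block is ever held in memory. Below is a minimal sketch in plain Python, assuming single-byte text and hypothetical file names; the same logic could also live inside a PythonCaller rather than standing in for any particular workspace.

# Sketch: split one enormous single-line text file into 256-character lines
# without loading the whole file into memory. File names are hypothetical;
# assumes single-byte characters.

CHUNK = 256 * 65536        # read a few MB of characters at a time; any multiple of 256 works
LINE_LEN = 256

with open("huge_input.txt", "r", encoding="ascii") as src, \
     open("split_output.txt", "w", encoding="ascii") as dst:
    carry = ""                              # characters left over from the previous block
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        data = carry + block
        full = (len(data) // LINE_LEN) * LINE_LEN
        for i in range(0, full, LINE_LEN):  # emit complete 256-character lines
            dst.write(data[i:i + LINE_LEN] + "\n")
        carry = data[full:]                 # keep the incomplete tail for the next block
    if carry:                               # flush any final partial line
        dst.write(carry + "\n")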

Best answer by erik_jan


4 replies

erik_jan
Contributor
  • Best Answer
  • November 4, 2019

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it parsed in 4.4 seconds.

And using 10 million characters and a text writer, it took 2.6 seconds.

Give it a try, hope it works.


itay
Supporter
  • Author
  • November 4, 2019
erik_jan wrote:

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it parsed in 4.4 seconds.

And using 10 million characters and a text writer, it took 2.6 seconds.

Give it a try, hope it works.

Hi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an error (insufficient memory).

I think my machine resources are not enough. Trying to hold all these features (150,000) in memory is apparently too much.


erik_jan
Contributor
  • November 4, 2019
itay wrote:

Hi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an error (insufficient memory).

I think my machine resources are not enough. Trying to hold all these features (150,000) in memory is apparently too much.

Have you tried using 65,536 (256 × 256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using the Directory and File Pathnames reader and a WorkspaceRunner)?
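The reason 65,536 is a convenient piece size: since 65,536 = 256 × 256, every intermediate piece later splits into exactly 256 lines of 256 characters, so no output line straddles a piece boundary. A rough sketch of that first stage in plain Python, with hypothetical file names, standing in for the FME workspace that would do the same cut:

# Stage 1 of the suggested two-step approach: cut the huge single-line
# file into pieces of 65,536 characters each. Because 65,536 = 256 * 256,
# every piece later splits evenly into 256-character lines.
# File names are hypothetical.

PIECE = 256 * 256   # 65,536 characters per intermediate file

with open("huge_input.txt", "r", encoding="ascii") as src:
    index = 0
    while True:
        piece = src.read(PIECE)
        if not piece:
            break
        with open(f"piece_{index:06d}.txt", "w", encoding="ascii") as out:
            out.write(piece)
        index += 1

# Each piece_*.txt can then be run through the 256-character split,
# e.g. with a WorkspaceRunner over the directory of pieces.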


itay
Supporter
  • Author
  • November 4, 2019
erik_jan wrote:

Have you tried using 65,536 (256 × 256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using the Directory and File Pathnames reader and a WorkspaceRunner)?

I did try something similar, not very successfully.

Now, thanks to your pointers, I did manage to parse the data in two steps:

First by calculating the delimiter needed (bookmark below) and using it to parse the data into smaller pieces, which are then read back and parsed into 256-character lines.

All in 15 seconds!
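For a sense of scale, a quick back-of-the-envelope check using the figures mentioned in this thread (roughly 40 billion characters, the 65,536-character piece size suggested above, and 256-character output lines):

# Rough feature counts for the two-step split, using the figures from this thread.
total_chars = 40_000_000_000        # approximate length of the single input line
piece_size  = 256 * 256             # 65,536 characters per intermediate piece
line_len    = 256

pieces      = -(-total_chars // piece_size)   # ceiling division: ~610,352 intermediate pieces
lines_each  = piece_size // line_len          # 256 lines per piece
total_lines = total_chars // line_len         # ~156,250,000 output lines

print(pieces, lines_each, total_lines)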

