Solved

Parsing file

  • November 4, 2019
  • 4 replies
  • 104 views

itay
Supporter

Hi there,

I have a text file containing a single line of roughly 40 billion characters. I need to parse that line into separate lines, each 256 characters long.

I have tried multiple configurations looping with the AttributeSplitter (to parse the line into a list and write it) in conjunction with the StringReplacer (to erase the characters already parsed), but I keep hitting resource limitations, mainly memory.

I am running it on a 64-bit OS machine with 16 GB RAM, but apparently that is not enough.

Has anybody done something similar and can share some insights on how to keep FME's memory consumption to a minimum?

Is there a better way to parse the file than the standard FME transformers?

Any help is appreciated.

Itay
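
For readers hitting the same wall outside FME: the memory pressure comes from holding the entire line as a single string, whereas reading the file in fixed-size chunks keeps usage constant no matter how large the input is. A minimal Python sketch of that idea (file names are illustrative, and plain single-byte text is assumed):

```python
# Stream a single-line file in fixed-size chunks so memory use stays
# constant regardless of input size, writing one 256-character line
# at a time. File names are illustrative.

def split_into_lines(src_path, dst_path, width=256):
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        while True:
            piece = src.read(width)   # reads at most `width` characters
            if not piece:             # empty string means end of file
                break
            dst.write(piece + "\n")

# Tiny demonstration with a 600-character input file:
with open("demo_input.txt", "w") as f:
    f.write("x" * 600)
split_into_lines("demo_input.txt", "demo_output.txt")
with open("demo_output.txt") as f:
    lines = f.read().splitlines()
print(len(lines), len(lines[0]), len(lines[-1]))  # 3 256 88
```

Because only 256 characters are ever in memory at once, this scales to arbitrarily large files; the same chunked-read principle is what the FME-based answers below rely on.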




erik_jan
Contributor
  • Best Answer
  • November 4, 2019

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it parsed successfully in 4.4 seconds.

Using 10 million characters and a text writer took 2.6 seconds.

Give it a try, hope it works.


itay
Supporter
  • Author
  • November 4, 2019

Hoi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an "insufficient memory" error.

I think my machine's resources are not enough; holding all these features (150,000) in memory is apparently too much.


erik_jan
Contributor
  • November 4, 2019

Have you tried using 65,536 characters (256 × 256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using a Directory and File Pathnames reader and a WorkspaceRunner)?
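
The two-stage idea (first cut the giant line into 65,536-character blocks, then cut each block into 256-character lines) can also be sketched outside FME. This is only an illustration of the principle, not the FME workspace itself; all file and directory names below are invented for the example:

```python
# Two-stage split: stage one breaks the single giant line into
# medium-sized block files (65,536 chars = 256 * 256); stage two
# splits each block file into 256-character lines. This mirrors the
# "split into smaller files, then run the same routine" pattern
# suggested above. All paths are illustrative.
import glob
import os

BLOCK = 256 * 256   # 65,536 characters per intermediate file
WIDTH = 256         # final line length

def stage_one(src_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path) as src:
        n = 0
        while True:
            block = src.read(BLOCK)
            if not block:
                break
            with open(os.path.join(out_dir, f"block_{n:06d}.txt"), "w") as f:
                f.write(block)
            n += 1

def stage_two(out_dir, dst_path):
    with open(dst_path, "w") as dst:
        # Sorted zero-padded names preserve the original character order.
        for name in sorted(glob.glob(os.path.join(out_dir, "block_*.txt"))):
            with open(name) as f:
                block = f.read()
            for i in range(0, len(block), WIDTH):
                dst.write(block[i:i + WIDTH] + "\n")

# Demonstration with a 70,000-character input:
with open("giant.txt", "w") as f:
    f.write("a" * 70000)
stage_one("giant.txt", "blocks")
stage_two("blocks", "final.txt")
with open("final.txt") as f:
    lines = f.read().splitlines()
print(len(lines), len(lines[-1]))  # 274 112
```

Since 65,536 is an exact multiple of 256, every intermediate block except the last splits into whole 256-character lines, so the two stages compose without any stitching at block boundaries.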


itay
Supporter
  • Author
  • November 4, 2019

I did try something similar, not very successfully.

Now, thanks to your pointers, I did manage to parse the data in two steps:

First by calculating the delimiter needed (bookmark below) and using it to split the data into smaller pieces, which are then read back and parsed into 256-character lines.

All in 15 seconds!