Solved

Parsing file


itay
Supporter

Hi there,

I have a text file containing one line of roughly 40 billion characters. I need to parse that line into separate lines of 256 characters each.

I have tried multiple configurations, looping with the AttributeSplitter (to parse the line into a list and write it out) in conjunction with the StringReplacer (to erase the characters already parsed), but I keep hitting resource limitations, mainly memory.

I am running it on a 64-bit OS machine with 16 GB of RAM, but apparently that is not enough.

Has anybody done something similar and can share some insights on how to keep FME's memory consumption to a minimum?

Is there a better way to parse the file than the standard FME transformers?

Any help is appreciated.

Itay
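As an aside, the memory pressure can be avoided entirely by streaming: read the file in fixed-size blocks and write out 256-character lines as you go, so only one block is ever held in memory. Below is a minimal sketch in plain Python, assuming single-byte text and hypothetical file names; the same logic could also live inside a PythonCaller rather than standing in for any particular workspace.

# Sketch: split one enormous single-line text file into 256-character lines
# without loading the whole file into memory. File names are hypothetical;
# assumes single-byte characters.

CHUNK = 256 * 65536        # read a few MB of characters at a time; any multiple of 256 works
LINE_LEN = 256

with open("huge_input.txt", "r", encoding="ascii") as src, \
     open("split_output.txt", "w", encoding="ascii") as dst:
    carry = ""                              # characters left over from the previous block
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        data = carry + block
        full = (len(data) // LINE_LEN) * LINE_LEN
        for i in range(0, full, LINE_LEN):  # emit complete 256-character lines
            dst.write(data[i:i + LINE_LEN] + "\n")
        carry = data[full:]                 # keep the incomplete tail for the next block
    if carry:                               # flush any final partial line
        dst.write(carry + "\n")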

Best answer by erik_jan


4 replies

erik_jan
Contributor
  • Best Answer
  • November 4, 2019

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it parsed in 4.4 seconds.

And using 10 million characters and a text writer, it took 2.6 seconds.

Give it a try, hope it works.


itay
Supporter
  • Author
  • November 4, 2019
erik_jan wrote:

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it parsed in 4.4 seconds.

And using 10 million characters and a text writer, it took 2.6 seconds.

Give it a try, hope it works.

Hi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an error (insufficient memory).

I think my machine resources are not enough. Trying to hold all these features (150,000) in memory is apparently too much.


erik_jan
Contributor
  • November 4, 2019
itay wrote:

Hi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an error (insufficient memory).

I think my machine resources are not enough. Trying to hold all these features (150,000) in memory is apparently too much.

Have you tried using 65,536 (256 × 256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using the Directory and File Pathnames reader and a WorkspaceRunner)?
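The reason 65,536 is a convenient piece size: since 65,536 = 256 × 256, every intermediate piece later splits into exactly 256 lines of 256 characters, so no output line straddles a piece boundary. A rough sketch of that first stage in plain Python, with hypothetical file names, standing in for the FME workspace that would do the same cut:

# Stage 1 of the suggested two-step approach: cut the huge single-line
# file into pieces of 65,536 characters each. Because 65,536 = 256 * 256,
# every piece later splits evenly into 256-character lines.
# File names are hypothetical.

PIECE = 256 * 256   # 65,536 characters per intermediate file

with open("huge_input.txt", "r", encoding="ascii") as src:
    index = 0
    while True:
        piece = src.read(PIECE)
        if not piece:
            break
        with open(f"piece_{index:06d}.txt", "w", encoding="ascii") as out:
            out.write(piece)
        index += 1

# Each piece_*.txt can then be run through the 256-character split,
# e.g. with a WorkspaceRunner over the directory of pieces.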


itay
Supporter
  • Author
  • November 4, 2019
erik_jan wrote:

Have you tried using 65,536 (256 × 256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using the Directory and File Pathnames reader and a WorkspaceRunner)?

I did try something similar, not very successfully.

Now, thanks to your pointers, I did manage to parse the data in two steps:

First by calculating the delimiter needed (bookmark below) and using it to parse the data into smaller pieces, which are then read back and parsed into 256-character lines.

All in 15 seconds!
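For a sense of scale, a quick back-of-the-envelope check using the figures mentioned in this thread (roughly 40 billion characters, the 65,536-character piece size suggested above, and 256-character output lines):

# Rough feature counts for the two-step split, using the figures from this thread.
total_chars = 40_000_000_000        # approximate length of the single input line
piece_size  = 256 * 256             # 65,536 characters per intermediate piece
line_len    = 256

pieces      = -(-total_chars // piece_size)   # ceiling division: ~610,352 intermediate pieces
lines_each  = piece_size // line_len          # 256 lines per piece
total_lines = total_chars // line_len         # ~156,250,000 output lines

print(pieces, lines_each, total_lines)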

