Solved

Parsing file

Forum|Forum|6 years ago
November 4, 2019
4 replies
104 views

+18

itay
Supporter
1442 replies

Hi there,

I have a text file containing a single line of ca. 40 billion characters. I need to parse that line into separate lines, each 256 characters long.

I have tried multiple configurations, looping with the AttributeSplitter (to parse the lines into a list and write it) in conjunction with the StringReplacer (to erase the characters already parsed), but I keep hitting resource limits, mainly memory.

I am running it on a 64-bit machine with 16 GB of RAM, but apparently that is not enough.

Has anybody done something similar who can share some insights on how to keep FME's memory consumption to a minimum?

Is there a better way to parse the file than with standard FME transformers?

Any help is appreciated.

Itay

+22

erik_jan
Contributor
2179 replies
Best Answer
November 4, 2019

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it parsed successfully in 4.4 seconds.

Using 10 million characters and a text writer, it took 2.6 seconds.

Give it a try, hope it works.

+18

itay
Author
Supporter
1442 replies
November 4, 2019

Hoi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an error (insufficient memory).

I think my machine's resources are not enough. Trying to hold all these features (150000) in memory is apparently too much.

+22

erik_jan
Contributor
2179 replies
November 4, 2019

Have you tried using 65,536 (256*256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using the Directory and File Pathnames reader and a WorkspaceRunner)?
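
For readers who want to try the first step of this idea outside FME, here is a minimal Python sketch. It is not the workspace discussed in this thread. The function name `split_into_parts`, the `part_000000.txt` naming scheme, and the ASCII encoding are all assumptions made for illustration. It reads the single long line in pieces of 65,536 characters, so only one piece is held in memory at a time.

```python
import os

RECORD_LEN = 256                      # target line length from the question
PART_CHARS = RECORD_LEN * RECORD_LEN  # 65,536 characters per smaller file


def split_into_parts(src_path, out_dir, part_chars=PART_CHARS):
    """Split a one-line text file into numbered part files (hypothetical helper).

    Returns the list of part file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    part_paths = []
    # Assumption: the data is plain ASCII, so one character is one byte.
    with open(src_path, "r", encoding="ascii", newline="") as src:
        while True:
            piece = src.read(part_chars)  # at most part_chars characters
            if not piece:                 # empty string means end of file
                break
            path = os.path.join(out_dir, f"part_{len(part_paths):06d}.txt")
            with open(path, "w", encoding="ascii", newline="") as dst:
                dst.write(piece)
            part_paths.append(path)
    return part_paths
```

Because 65,536 is a multiple of 256, no 256-character record is cut in half at a file boundary; only the last part can be shorter. A larger multiple of 256 can be passed as `part_chars` if that produces too many files.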

+18

itay
Author
Supporter
1442 replies
November 4, 2019

I did try something similar, without much success.

Now, thanks to your pointers, I have managed to parse the data in two steps:

First by calculating the delimiter needed (bookmark below) and using it to split the data into smaller pieces, which are then read back and parsed into 256-character lines.

All in 15 seconds!
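
The "parse into 256-character lines" step can also be sketched in plain Python, for comparison with the FME approach. This is a hedged illustration, not the workspace used above. The helper name `write_fixed_width_lines` and the ASCII encoding are assumptions. It streams the input one record at a time, so memory use stays flat regardless of file size.

```python
def write_fixed_width_lines(src_path, dst_path, record_len=256):
    """Copy src to dst, inserting a newline after every record_len characters.

    Hypothetical helper; returns the number of lines written.
    """
    count = 0
    # Assumption: plain ASCII input without embedded newlines.
    with open(src_path, "r", encoding="ascii", newline="") as src, \
         open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        while True:
            record = src.read(record_len)  # one record in memory at a time
            if not record:                 # empty string means end of file
                break
            dst.write(record + "\n")
            count += 1
    return count
```

If the input length is not an exact multiple of 256, the last line is simply shorter than the others.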
