Hi there,

I have a text file containing a single line of ca. 40 billion characters. I need to split that line into separate lines, each 256 characters long.

I have tried multiple looping configurations using the AttributeSplitter (to split the characters into a list and write it out) together with the StringReplacer (to erase the characters already parsed), but I keep hitting resource limitations, mainly memory.

I am running it on a 64-bit machine with 16 GB of RAM, but apparently that is not enough.

Has anybody done something similar and can share some insights on how to keep FME's memory consumption to a minimum?

Is there a better way to parse the file than with standard FME transformers?

Any help is appreciated.

Itay
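Outside FME's standard transformers, one option is to stream the file instead of holding the whole line (or all the resulting features) in memory. The Python sketch below is an editorial illustration, not a method from this thread: it reads the input in fixed-size chunks and ends a line after every 256 characters, so memory use stays roughly constant regardless of file size. It assumes the file is plain single-byte text (the `ascii` encoding is an assumption), and the function name `split_fixed_width` and the file names in the comment are hypothetical.

```python
import io


def split_fixed_width(src, dst, width=256, chunk_chars=1 << 20):
    """Copy text from src to dst, ending a line after every `width` characters.

    Only one chunk (plus a partial record) is held in memory at a time,
    so memory use does not grow with the size of the input.
    """
    carry = ""  # leftover characters that do not yet fill a whole record
    while True:
        chunk = src.read(chunk_chars)
        if not chunk:
            break
        # The input is a single line; drop any line breaks (e.g. a trailing one).
        buffer = carry + chunk.replace("\r", "").replace("\n", "")
        full = len(buffer) - len(buffer) % width  # length covered by whole records
        dst.write("".join(buffer[i:i + width] + "\n" for i in range(0, full, width)))
        carry = buffer[full:]
    if carry:
        # Whatever remains becomes a final, shorter record.
        dst.write(carry + "\n")


if __name__ == "__main__":
    # Small in-memory demo: 600 characters -> records of 256, 256 and 88.
    source = io.StringIO("a" * 600 + "\n")
    target = io.StringIO()
    split_fixed_width(source, target, width=256, chunk_chars=100)
    print([len(line) for line in target.getvalue().splitlines()])  # [256, 256, 88]

    # For real files (names are hypothetical):
    # with open("big_line.txt", encoding="ascii") as src, \
    #      open("lines_256.txt", "w", encoding="ascii") as dst:
    #     split_fixed_width(src, dst)
```

The small `chunk_chars=100` in the demo deliberately makes chunks straddle record boundaries, which exercises the carry-over logic; for a real file a chunk of a few megabytes is a reasonable starting point.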

Hi Itay,

I tried this with a file containing a string of over 1 million characters, and it was parsed in 4.4 seconds.

Using 10 million characters and a text writer, it took 2.6 seconds.

Give it a try; I hope it works.

Hi @erik_jan,

Thanks, I have tried it, but unfortunately I am getting an insufficient-memory error.

I think my machine's resources are not enough: holding all these features (150000) in memory is apparently too much.

Have you tried using 65,536 characters (256 × 256) to split the file into multiple smaller files first, and then running the same routine on the smaller files (using the Directory and File Pathnames reader and a WorkspaceRunner)?
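The two-stage idea can be sketched in plain Python as below. This is an illustration only: in FME the second stage would be the per-file WorkspaceRunner run, which the loop in the demo merely stands in for. Because 65,536 is a multiple of 256, every intermediate file except possibly the last ends exactly on a record boundary, so the files can be processed independently. The helper names `split_into_pieces` and `split_piece`, the `piece_000000.txt` naming scheme, and the ASCII encoding are assumptions.

```python
import os
import tempfile

PIECE_CHARS = 256 * 256  # 65,536 characters per intermediate file
LINE_LENGTH = 256        # target record length


def split_into_pieces(src_path, out_dir, piece_chars=PIECE_CHARS, encoding="ascii"):
    """Stage 1: write consecutive pieces of the single input line to numbered files."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(src_path, encoding=encoding) as src:
        while True:
            # Only the end of the file carries a line break; strip it.
            piece = src.read(piece_chars).rstrip("\r\n")
            if not piece:
                break
            # Zero-padded names keep the pieces in their original order.
            path = os.path.join(out_dir, f"piece_{len(paths):06d}.txt")
            with open(path, "w", encoding=encoding) as out:
                out.write(piece)
            paths.append(path)
    return paths


def split_piece(piece_path, dst, width=LINE_LENGTH, encoding="ascii"):
    """Stage 2: read one small piece file and write it out as fixed-width lines."""
    with open(piece_path, encoding=encoding) as f:
        text = f.read()  # at most 65,536 characters, so this stays cheap
    dst.write("".join(text[i:i + width] + "\n" for i in range(0, len(text), width)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "input.txt")
        with open(src_path, "w", encoding="ascii") as f:
            f.write("x" * 200_000 + "\n")  # stand-in for the real single-line file
        pieces = split_into_pieces(src_path, os.path.join(tmp, "pieces"))
        out_path = os.path.join(tmp, "output.txt")
        with open(out_path, "w", encoding="ascii") as dst:
            for piece_path in pieces:  # in FME: one WorkspaceRunner call per file
                split_piece(piece_path, dst)
        with open(out_path, encoding="ascii") as f:
            lengths = [len(line) for line in f.read().splitlines()]
        print(len(pieces), len(lengths), lengths[-1])  # 4 782 64
```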

I did try something similar, without much success.

Now, thanks to your pointers, I did manage to parse the data in two steps.

First, I calculated the required delimiter (see the bookmark below) and used it to split the line into smaller pieces, which are then read back and split into 256-character lines.

All in 15 seconds!