Question

How to divide an ASCII file into records with a fixed length of 256 characters

  • 16 November 2016
  • 6 replies
  • 1 view

Badge +2

Hello FME experts,

I have an ASCII file with a size of 75 MB. This file contains several records with a fixed record length of 256 characters.

There are no delimiters such as ';' or ','.

Every record has its own record identifier in the first 2 positions, followed by the city code (4 positions), like this. (Record 00 marks the beginning of the ASCII file and record 98 marks the end; I have manipulated the data.)

000413Albrondabeach 201604200000000000OBeaTax main branch 04 1020 20041300007572River Kopijnhoven 00034 8162PV 0000012210G00000000000000000000000000000020140101000N201601012016010119900EUR00 20041300000759River Denelk 00053 5461kB 0000000010G00000000000000000000000000000020140101000N201601012016010112303EUR00 20041300000248River Anfoony van Hobokenlns 00034 9161DR 0000000010G00000000000000000000000000000020140101000N201501012016010110574EUR00 20041300000703River Ty Breuk 00035 9171JH 0000000010G00000000000000000000000000000020140101000N201501012016010112272EUR00 20041301204086Boorkuhaal Pleedeern 00096 7191FN 0000044710G00000000000000000000000000000020140101000N201601012016010120616EUR00 20041301233026River Jadeflie 00016 2562lA 0000024010G00000000000000000000000000000020140101000N201601012016010115144EUR00

 

...

 

...

A friend gave me a solution for the program GeoKettle.

There, the ASCII file is divided into portions of 256 characters with the help of FOLD.EXE, using the following statement: ...GeoKettle\\coreutils-5.3.0-bin\\bin\\fold -w 256 (screenshot 1).

The result is written to a separate text file, fold_Ascii.org.

This text file is then split into several record types depending on the record identifier and written into an Oracle database (screenshot 2).

How can I do the same in FME?

 

Thank you for your response.

 

Perry

 


6 replies

Userlevel 4

You can use the TextFile reader (read whole file at once = Yes) followed by a PythonCaller:

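import fme
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        # Fixed record length in characters
        self.split_at = 256

    def input(self, feature):
        # With "read whole file at once", the file content is in 'text_line_data'
        text = feature.getAttribute('text_line_data')
        if text:
            # Slice the text into consecutive pieces of split_at characters
            chunks = [text[i:i+self.split_at] for i in range(0, len(text), self.split_at)]
            for chunk in chunks:
                # Emit one new feature per chunk
                f = fmeobjects.FMEFeature()
                f.setAttribute('chunk', chunk)
                self.pyoutput(f)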

This will output one feature with the attribute "chunk" for each chunk of 256 characters inside the attribute "text_line_data".
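To check the slicing logic outside FME, the same list comprehension can be tried in plain Python. This is only a minimal sketch: the function name `split_fixed` and the sample string are made up for illustration and are not part of FME. It assumes the text contains no line breaks between records; if it does, those characters count toward the 256 and shift the boundaries.

```python
def split_fixed(text, width=256):
    """Return consecutive slices of `text`, each `width` characters long.

    The final slice is shorter when len(text) is not a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [text[i:i + width] for i in range(0, len(text), width)]


if __name__ == "__main__":
    # Hypothetical sample: two 256-character records starting with 00 and 20
    sample = "00" + "A" * 254 + "20" + "B" * 254
    records = split_fixed(sample)
    print(len(records), [r[:2] for r in records])
```

A last chunk shorter than 256 characters usually points to a trailing line break or a truncated file, so it is worth checking the length of the final record.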

Badge +2

@david_r, thank you for your quick response. I'm a newbie and not familiar with Python. I have made a workbench, and in the inspector I can see a chunk (string) that indeed contains 256 characters of data. The workbench produced 155924 rows, so it looks OK. How can I grab these chunks and place them in a separate attribute, or place them in an ORA-file? I have tried substr, TestFilter and AttributeCreator, but that does not work. Can you help me?

Userlevel 4

Can you describe in more detail what you need to do with each chunk? Do you have to split it up further?

Badge +2

@david_r,

Hi David, depending on the record identifier (= the first 2 positions of the split records), I have to save the chunks in the corresponding tables in an Oracle database. I have attached a screenshot (records.jpg); it may be helpful to understand what I mean. So all the chunks beginning with e.g. 20 must be saved in the table WTM_STUFTAX20 as separate records and attributes.

Userlevel 4

You can use the SubstringExtractor to extract the first two positions, then a Tester or TestFilter to split the records accordingly.

Finally, create an Oracle writer and import the appropriate table definition and send your features to it.
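For readers who want to see the routing logic spelled out, here is a rough Python sketch of what the SubstringExtractor and TestFilter steps do conceptually. The helper names `parse_header` and `group_by_identifier` are hypothetical, and the layout (2-character record identifier followed by a 4-character city code) is taken from the question. Only the table for identifier 20 (WTM_STUFTAX20) is named in this thread, so the sketch groups records by identifier and leaves the table mapping open.

```python
from collections import defaultdict


def parse_header(record):
    """Return (record_identifier, city_code) from positions 1-2 and 3-6."""
    return record[:2], record[2:6]


def group_by_identifier(records):
    """Group fixed-length records by their 2-character record identifier."""
    groups = defaultdict(list)
    for record in records:
        identifier, _city_code = parse_header(record)
        groups[identifier].append(record)
    return dict(groups)


if __name__ == "__main__":
    # Shortened sample records based on the data in the question
    sample = ["000413Albrondabeach", "20041300007572River", "20041300000759River"]
    for identifier, rows in group_by_identifier(sample).items():
        print(identifier, len(rows))
```

In the workspace itself, each group corresponds to one TestFilter output port connected to the matching Oracle writer feature type.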

Badge +2

@david_r, it works! I had forgotten to expose the attribute 'chunk' in the PythonCaller parameters. Thank you very much for your time and solution.
