
Hello FME experts,

I have an ASCII file with a size of 75 MB. The file contains records with a fixed record length of 256 characters.

 

There are no delimiters such as ';' or ','.

 

Every record has its own record identifier in the first 2 positions, followed by the city code (4 positions), like this (record 00 marks the beginning of the ASCII file and record 98 the end; I have manipulated the data):

000413Albrondabeach 201604200000000000OBeaTax main branch 04 1020 20041300007572River Kopijnhoven 00034 8162PV 0000012210G00000000000000000000000000000020140101000N201601012016010119900EUR00 20041300000759River Denelk 00053 5461kB 0000000010G00000000000000000000000000000020140101000N201601012016010112303EUR00 20041300000248River Anfoony van Hobokenlns 00034 9161DR 0000000010G00000000000000000000000000000020140101000N201501012016010110574EUR00 20041300000703River Ty Breuk 00035 9171JH 0000000010G00000000000000000000000000000020140101000N201501012016010112272EUR00 20041301204086Boorkuhaal Pleedeern 00096 7191FN 0000044710G00000000000000000000000000000020140101000N201601012016010120616EUR00 20041301233026River Jadeflie 00016 2562lA 0000024010G00000000000000000000000000000020140101000N201601012016010115144EUR00

 

...

 

...

A friend gave me a solution for the program GeoKettle.

 

There, the ASCII file is divided into portions of 256 characters with the help of FOLD.EXE, using the following statement: ...GeoKettle\\coreutils-5.3.0-bin\\bin\\fold -w 256 (printscreen 1).

 

The result is put in a separate text file, fold_Ascii.org.

 

This text file is then split into several record types depending on the record identifier and written to an Oracle database (printscreen 2).

How can I do the same in FME?

 

Thank you for your response.

 

Perry

 

You can use the TextFile reader (read whole file at once = Yes) followed by a PythonCaller:

import fme
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        self.split_at = 256
        
    def input(self,feature):
        text = feature.getAttribute('text_line_data')
        if text:
            # Slice the text into consecutive pieces of split_at characters
            chunks = [text[i:i+self.split_at] for i in range(0, len(text), self.split_at)]
            for chunk in chunks:
                f = fmeobjects.FMEFeature()
                f.setAttribute('chunk', chunk)
                self.pyoutput(f)

This will output one feature with the attribute "chunk" for each chunk of 256 characters inside the attribute "text_line_data".
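
To try the slicing step outside FME, here is a minimal standalone sketch of the same logic. The function name `split_fixed_width` and the sample string are made up for illustration and are not part of FME. Note that the last piece is shorter when the text length is not a multiple of the width, and that any line breaks in the file would count as characters.

```python
def split_fixed_width(text, width=256):
    """Return consecutive slices of text, each width characters long.

    The last slice is shorter if len(text) is not a multiple of width.
    """
    return [text[i:i + width] for i in range(0, len(text), width)]


# Hypothetical sample: two full 256-character records plus a 2-character tail.
sample = "00" + "A" * 254 + "20" + "B" * 254 + "98"
chunks = split_fixed_width(sample)
print([len(c) for c in chunks])  # [256, 256, 2]
```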


@david_r, thank you for your quick response. I'm a newbie and not familiar with Python. I have made a workbench, and in the inspector I can see a chunk (string) that indeed holds 256 characters of data. The workbench produced 155924 rows, so it looks OK. How can I grab these chunks and place them in a separate attribute, or place them in an ORA-file? I have tried with substr, TestFilter and AttributeCreator, but that does not work. Can you help me?


Can you describe in more detail what you need to do with each chunk? Do you have to split it up further?


@david_r,

Hi David, depending on the record identifier (= the first 2 positions of the split records) I have to save the chunks in the corresponding tables in an Oracle database. In the enclosure (records.jpg) I made a print; it may be helpful to understand what I mean. So all the chunks beginning with, e.g., 20 must be saved in the table WTM_STUFTAX20 as separate records and attributes.

You can use the SubstringExtractor to extract the first two positions, then a Tester or TestFilter to split the records accordingly.

Finally, create an Oracle writer and import the appropriate table definition and send your features to it.
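
Outside FME, the routing step (take the first two characters, then group by them) could be sketched roughly as follows. This is only an illustration of the logic, not what the SubstringExtractor or TestFilter do internally. The function name `route_by_record_id` is made up, and building the table name from the prefix "WTM_STUFTAX" is an assumption: only WTM_STUFTAX20 is named in this thread.

```python
from collections import defaultdict

# Assumption: every table is named prefix + record identifier.
# Only WTM_STUFTAX20 is actually mentioned in the thread.
TABLE_PREFIX = "WTM_STUFTAX"


def route_by_record_id(chunks):
    """Group chunks by their first two characters (the record identifier)."""
    tables = defaultdict(list)
    for chunk in chunks:
        record_id = chunk[:2]  # positions 1-2 of the record
        tables[TABLE_PREFIX + record_id].append(chunk)
    return dict(tables)


# Hypothetical, shortened records for illustration only.
routed = route_by_record_id(["000413Header", "200413First", "200413Second"])
print(sorted(routed))  # ['WTM_STUFTAX00', 'WTM_STUFTAX20']
```

In the workspace itself, each group corresponds to one output port of the TestFilter connected to its own Oracle writer feature type.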


@david_r, it works! I had forgotten to expose the attribute 'chunk' in the PythonCaller parameters. Thank you very much for your time and solution.

