Skip to main content

Hi FME'er,

I have CSV data that is stored in a table in Oracle. The data is created using a stored procedure and is stored in a single field with one row per line. I would like to read and parse this data as a CSV file into FME for further processing. I figured I would start by aggregating the field (attribute) into a single row but don't know where to go from there. I can't use the AttributeSplitter because it uses " qualifiers around data that has commas.

 

Many Thanks,

David

I'm thinking that a PythonCaller with the CSV module is perfect for this, it already has all the necessary mechanisms for dealing with lots of edge cases like quotations, newlines, etc.

Try something like:

from fmeobjects import *import csv class ParseCSVString(object):    def input(self, feature):        csv_string =  feature.getAttribute('CSV_LINE') or '']        csv_parser = csv.reader(csv_string)        for record in csv_parser:            f = feature.clone()            for n, value in enumerate(record):                f.setAttribute('value{%s}' % n, value)            self.pyoutput(f)

Expose the attribute list "value{}" in the PythonCaller.

csv-line.fmwt


You can simple read this field. You would then get an attribute with the csv in it.

Separate/split by newline (texteditor->specialcharacters->newline). Explode list.

Connect a datainspector transformer.

(If fieldname row has different qualifers, separate it by choosing _element_index = 0, to treat is separately. Else, no need to.)

Export the fieldnames list to a txt list by copying it from the feature information window in the Data inspector. (select the list{} attribute names and the attribute values with "copy text with indentation" and save txt file.

If there is no attribute value containing a comma, split by comma, else first use stringreplacer to replace by for instance a backslash (again via texteditor or copying a backslash form another file). Then a stringreplacer to remove the qualifiers, use regexp so you can get the first and last (as to not remove non qualifying ') by regexp= ^'|'$

Now u can split by backslash (or whatever you chose).

Use a renamer to rename the _list{} attributes using the import function. (a addition in recent fme versions I am very happy about)


You can simple read this field. You would then get an attribute with the csv in it.

Separate/split by newline (texteditor->specialcharacters->newline). Explode list.

Connect a datainspector transformer.

(If fieldname row has different qualifers, separate it by choosing _element_index = 0, to treat is separately. Else, no need to.)

Export the fieldnames list to a txt list by copying it from the feature information window in the Data inspector. (select the list{} attribute names and the attribute values with "copy text with indentation" and save txt file.

If there is no attribute value containing a comma, split by comma, else first use stringreplacer to replace by for instance a backslash (again via texteditor or copying a backslash form another file). Then a stringreplacer to remove the qualifiers, use regexp so you can get the first and last (as to not remove non qualifying ') by regexp= ^'|'$

Now u can split by backslash (or whatever you chose).

Use a renamer to rename the _list{} attributes using the import function. (a addition in recent fme versions I am very happy about)

 

I'm assuming the csv is stored in a single record.

 

Else first part (splitting by newline) is not necessary.

 


I'm thinking that a PythonCaller with the CSV module is perfect for this, it already has all the necessary mechanisms for dealing with lots of edge cases like quotations, newlines, etc.

Try something like:

from fmeobjects import *import csv class ParseCSVString(object):    def input(self, feature):        csv_string =  feature.getAttribute('CSV_LINE') or '']        csv_parser = csv.reader(csv_string)        for record in csv_parser:            f = feature.clone()            for n, value in enumerate(record):                f.setAttribute('value{%s}' % n, value)            self.pyoutput(f)

Expose the attribute list "value{}" in the PythonCaller.

csv-line.fmwt

Works perfectly. Thanks. 

 

 


Reply