Skip to main content
Solved

Parse CSV data stored in database table

  • September 14, 2018
  • 4 replies
  • 117 views

djmcdermott
Contributor
Forum|alt.badge.img+6

Hi FME'er,

I have CSV data that is stored in a table in Oracle. The data is created using a stored procedure and is stored in a single field with one row per line. I would like to read and parse this data as a CSV file into FME for further processing. I figured I would start by aggregating the field (attribute) into a single row but don't know where to go from there. I can't use the AttributeSplitter because it uses " qualifiers around data that has commas.

 

Many Thanks,

David

Best answer by david_r

I'm thinking that a PythonCaller with the CSV module is perfect for this, it already has all the necessary mechanisms for dealing with lots of edge cases like quotations, newlines, etc.

Try something like:

from fmeobjects import *import csv class ParseCSVString(object):    def input(self, feature):        csv_string = [feature.getAttribute('CSV_LINE'or '']        csv_parser = csv.reader(csv_string)        for record in csv_parser:            f = feature.clone()            for n, value in enumerate(record):                f.setAttribute('value{%s}' % n, value)            self.pyoutput(f)

Expose the attribute list "value{}" in the PythonCaller.

csv-line.fmwt

View original
Did this help you find an answer to your question?

4 replies

david_r
Evangelist
  • Best Answer
  • September 14, 2018

I'm thinking that a PythonCaller with the CSV module is perfect for this, it already has all the necessary mechanisms for dealing with lots of edge cases like quotations, newlines, etc.

Try something like:

from fmeobjects import *import csv class ParseCSVString(object):    def input(self, feature):        csv_string = [feature.getAttribute('CSV_LINE'or '']        csv_parser = csv.reader(csv_string)        for record in csv_parser:            f = feature.clone()            for n, value in enumerate(record):                f.setAttribute('value{%s}' % n, value)            self.pyoutput(f)

Expose the attribute list "value{}" in the PythonCaller.

csv-line.fmwt


gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • September 14, 2018

You can simple read this field. You would then get an attribute with the csv in it.

Separate/split by newline (texteditor->specialcharacters->newline). Explode list.

Connect a datainspector transformer.

(If fieldname row has different qualifers, separate it by choosing _element_index = 0, to treat is separately. Else, no need to.)

Export the fieldnames list to a txt list by copying it from the feature information window in the Data inspector. (select the list{} attribute names and the attribute values with "copy text with indentation" and save txt file.

If there is no attribute value containing a comma, split by comma, else first use stringreplacer to replace by for instance a backslash (again via texteditor or copying a backslash form another file). Then a stringreplacer to remove the qualifiers, use regexp so you can get the first and last (as to not remove non qualifying ') by regexp= ^'|'$

Now u can split by backslash (or whatever you chose).

Use a renamer to rename the _list{} attributes using the import function. (a addition in recent fme versions I am very happy about)


gio
Contributor
Forum|alt.badge.img+15
  • Contributor
  • September 14, 2018
gio wrote:

You can simple read this field. You would then get an attribute with the csv in it.

Separate/split by newline (texteditor->specialcharacters->newline). Explode list.

Connect a datainspector transformer.

(If fieldname row has different qualifers, separate it by choosing _element_index = 0, to treat is separately. Else, no need to.)

Export the fieldnames list to a txt list by copying it from the feature information window in the Data inspector. (select the list{} attribute names and the attribute values with "copy text with indentation" and save txt file.

If there is no attribute value containing a comma, split by comma, else first use stringreplacer to replace by for instance a backslash (again via texteditor or copying a backslash form another file). Then a stringreplacer to remove the qualifiers, use regexp so you can get the first and last (as to not remove non qualifying ') by regexp= ^'|'$

Now u can split by backslash (or whatever you chose).

Use a renamer to rename the _list{} attributes using the import function. (a addition in recent fme versions I am very happy about)

 

I'm assuming the csv is stored in a single record.

 

Else first part (splitting by newline) is not necessary.

 


djmcdermott
Contributor
Forum|alt.badge.img+6
  • Author
  • Contributor
  • September 14, 2018
david_r wrote:

I'm thinking that a PythonCaller with the CSV module is perfect for this, it already has all the necessary mechanisms for dealing with lots of edge cases like quotations, newlines, etc.

Try something like:

from fmeobjects import *import csv class ParseCSVString(object):    def input(self, feature):        csv_string = [feature.getAttribute('CSV_LINE'or '']        csv_parser = csv.reader(csv_string)        for record in csv_parser:            f = feature.clone()            for n, value in enumerate(record):                f.setAttribute('value{%s}' % n, value)            self.pyoutput(f)

Expose the attribute list "value{}" in the PythonCaller.

csv-line.fmwt

Works perfectly. Thanks. 

 

 


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings