Skip to main content
Question

How to convert csv data in a single attribute to an attribute table?


Forum|alt.badge.img

I'm reading a CSV file from the web. The file is served encoded, so I first decode it, ending up with an attribute "_decoded", containing the csv data. I want to use this data to create a table (like the csv reader does), without writing the output to a file first. 

 

Sample format:

objid;xcoord;ycoord;name
21;123;456;house
22;125;458;tree

The solution suggested in the thread linked below doesn't work, throwing the error "Python Exception <Error>: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?", but I can't figure out where to stick the read mode (rU).

https://knowledge.safe.com/questions/78520/parse-csv-data-stored-in-database-table.html

 

10 replies

david_r
Celebrity
  • November 7, 2018

For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

david_r
Celebrity
  • November 7, 2018
david_r wrote:

For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

Of course you'll need to manually remove all the Â characters littered about, it's a bug in the forum... :-(


erik_jan
Contributor
Forum|alt.badge.img+17
  • Contributor
  • November 7, 2018

You could use the AttributeFileWriter to store it temporary as a CSV file,

Then use the FeatureReader to read back the CSV file in the regular reader way.

Easier to maintain if you are not too familiar with Python.


david_r
Celebrity
  • November 7, 2018
erik_jan wrote:

You could use the AttributeFileWriter to store it temporary as a CSV file,

Then use the FeatureReader to read back the CSV file in the regular reader way.

Easier to maintain if you are not too familiar with Python.

Agree, that's a very good alternative for those uneasy about maintaining Python code that they don't fully understand.


Forum|alt.badge.img
david_r wrote:

For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

Thanks, I had to use io.StringIO(csv_string) and import io (I don't even have python 3 installed?!) to get it to work. However this isn't exactly what I'm looking for. The first line should be the attribute and the subsequent features with values. I figure I could somehow expose all the value{}s, but then it wouldn't be dynamic, or?


mark2atsafe
Safer
Forum|alt.badge.img+44
  • Safer
  • November 7, 2018
erik_jan wrote:

You could use the AttributeFileWriter to store it temporary as a CSV file,

Then use the FeatureReader to read back the CSV file in the regular reader way.

Easier to maintain if you are not too familiar with Python.

And if you use the TempPathnameCreator transformer then it will clean up the temporary file automatically once the translation is complete.


mark2atsafe
Safer
Forum|alt.badge.img+44
  • Safer
  • November 7, 2018

I guess we could do with a CSVFlattener transformer (like XMLFlattener and JSONFlattener). Of course this isn't a common issue so I doubt we'd implement it.

The best workaround is what @erik_jan suggests about writing it to a temporary file and reading it back. Alternatively you could try the AttributeSplitter using the newline character as the delimiter to get separate records, and then again using , as the delimiter to get separate attributes from each record. But I still think the temporary file write is the best solution.


Forum|alt.badge.img
david_r wrote:

For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

Alright I got it together by simply writing the first line in a variable and iterating through it in setAttribute. Unfortunately, unless I expose the attribute names manually in the PythonCaller (logged from the first run), there doesn't seem to be a way to access them afterwards. They're visible in the Inspector, but not when the output is written to a file. What I'd basically need would be a wildcard that exposes everything new coming out of the PythonCaller..


matthew_banyard
Contributor
Forum|alt.badge.img+3
mark2atsafe wrote:

I guess we could do with a CSVFlattener transformer (like XMLFlattener and JSONFlattener). Of course this isn't a common issue so I doubt we'd implement it.

The best workaround is what @erik_jan suggests about writing it to a temporary file and reading it back. Alternatively you could try the AttributeSplitter using the newline character as the delimiter to get separate records, and then again using , as the delimiter to get separate attributes from each record. But I still think the temporary file write is the best solution.

just wanted to echo that I've got exactly the same situation here.

 

Working with an API that only outputs csv.

But due to GDPR reasons cannot (even temporarily) save that csv down to machine - so cannot use the suggested featurereader approach.

 

Agreed a csv flattener / extractor, acting almost identical to the json/xml equivalents would do the job nicely.

Good shout on attribute splitter on newline, followed by attribute splitter on comma - that'll probably work as a workaround in the meantime.


ebygomm
Influencer
Forum|alt.badge.img+33
  • Influencer
  • February 11, 2021
matthew_banyard wrote:

just wanted to echo that I've got exactly the same situation here.

 

Working with an API that only outputs csv.

But due to GDPR reasons cannot (even temporarily) save that csv down to machine - so cannot use the suggested featurereader approach.

 

Agreed a csv flattener / extractor, acting almost identical to the json/xml equivalents would do the job nicely.

Good shout on attribute splitter on newline, followed by attribute splitter on comma - that'll probably work as a workaround in the meantime.

You have to be careful regarding splitting on comma as csv fields can contain commas if quoted. Although i'd argue that writing it to a temp path that is deleted as soon as the workspace finishes no different to reading the csv directly from a GDPR point of view.


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings