Skip to main content

I'm reading a CSV file from the web. The file is served encoded, so I first decode it, ending up with an attribute "_decoded", containing the csv data. I want to use this data to create a table (like the csv reader does), without writing the output to a file first. 

 

Sample format:

objid;xcoord;ycoord;name
21;123;456;house
22;125;458;tree

The solution suggested in the thread linked below doesn't work, throwing the error "Python Exception <Error>: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?", but I can't figure out where to stick the read mode (rU).

https://knowledge.safe.com/questions/78520/parse-csv-data-stored-in-database-table.html

 

For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

Of course you'll need to manually remove all the Â characters littered about, it's a bug in the forum... :-(


You could use the AttributeFileWriter to store it temporary as a CSV file,

Then use the FeatureReader to read back the CSV file in the regular reader way.

Easier to maintain if you are not too familiar with Python.


You could use the AttributeFileWriter to store it temporary as a CSV file,

Then use the FeatureReader to read back the CSV file in the regular reader way.

Easier to maintain if you are not too familiar with Python.

Agree, that's a very good alternative for those uneasy about maintaining Python code that they don't fully understand.


For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

Thanks, I had to use io.StringIO(csv_string) and import io (I don't even have python 3 installed?!) to get it to work. However this isn't exactly what I'm looking for. The first line should be the attribute and the subsequent features with values. I figure I could somehow expose all the value{}s, but then it wouldn't be dynamic, or?


You could use the AttributeFileWriter to store it temporary as a CSV file,

Then use the FeatureReader to read back the CSV file in the regular reader way.

Easier to maintain if you are not too familiar with Python.

And if you use the TempPathnameCreator transformer then it will clean up the temporary file automatically once the translation is complete.


I guess we could do with a CSVFlattener transformer (like XMLFlattener and JSONFlattener). Of course this isn't a common issue so I doubt we'd implement it.

The best workaround is what @erik_jan suggests about writing it to a temporary file and reading it back. Alternatively you could try the AttributeSplitter using the newline character as the delimiter to get separate records, and then again using , as the delimiter to get separate attributes from each record. But I still think the temporary file write is the best solution.


For your use case you can try the following:

from fmeobjects import *
import csv
import StringIO

class ParseCSVString(object):
    def input(self, feature):
        csv_string = feature.getAttribute('csv')
        csv_parser = csv.reader(StringIO.StringIO(csv_string), delimiter=';')
        for record in csv_parser:
            f = feature.clone()
            for n, value in enumerate(record):
                f.setAttribute('value{%s}' % n, value)
            self.pyoutput(f)

Alright I got it together by simply writing the first line in a variable and iterating through it in setAttribute. Unfortunately, unless I expose the attribute names manually in the PythonCaller (logged from the first run), there doesn't seem to be a way to access them afterwards. They're visible in the Inspector, but not when the output is written to a file. What I'd basically need would be a wildcard that exposes everything new coming out of the PythonCaller..


I guess we could do with a CSVFlattener transformer (like XMLFlattener and JSONFlattener). Of course this isn't a common issue so I doubt we'd implement it.

The best workaround is what @erik_jan suggests about writing it to a temporary file and reading it back. Alternatively you could try the AttributeSplitter using the newline character as the delimiter to get separate records, and then again using , as the delimiter to get separate attributes from each record. But I still think the temporary file write is the best solution.

just wanted to echo that I've got exactly the same situation here.

 

Working with an API that only outputs csv.

But due to GDPR reasons cannot (even temporarily) save that csv down to machine - so cannot use the suggested featurereader approach.

 

Agreed a csv flattener / extractor, acting almost identical to the json/xml equivalents would do the job nicely.

Good shout on attribute splitter on newline, followed by attribute splitter on comma - that'll probably work as a workaround in the meantime.


just wanted to echo that I've got exactly the same situation here.

 

Working with an API that only outputs csv.

But due to GDPR reasons cannot (even temporarily) save that csv down to machine - so cannot use the suggested featurereader approach.

 

Agreed a csv flattener / extractor, acting almost identical to the json/xml equivalents would do the job nicely.

Good shout on attribute splitter on newline, followed by attribute splitter on comma - that'll probably work as a workaround in the meantime.

You have to be careful regarding splitting on comma as csv fields can contain commas if quoted. Although i'd argue that writing it to a temp path that is deleted as soon as the workspace finishes no different to reading the csv directly from a GDPR point of view.


Reply