Solved

Linebreaks in a csv file


I use a csv reader to read a csv file. Unfortunately this csv file contains line breaks in some records (CRLF). This causes a record to be split into more than one, which is undesirable.

 

 

Is there a way for the reader to ignore these linebreaks? 


10 replies

david_r
Evangelist
  • March 27, 2013
Hi,

 

 

Is the field value that contains the line break surrounded by quotation marks?

 

 

Example:

"this is a
multi-line text"

 

It would help if you could post a sample record.

 

 

David

  • Author
  • March 27, 2013
Hi David,

 

 

Thank you for the quick reply

 

 

Yes, the value is surrounded by quotation marks. I don't see a way to attach a CSV file here, but the record looks something like this:

 

 

"E","30-05-2007 22:00:00","16-07-2007 22:00:00","ACTIVE","BEMETERD","18062013XX259<crlf>

 

","18062013XX","SPZ REGIO 7 ZUIDWEST E"

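For illustration, the sample record above can be fed to Python's csv module to see the difference between splitting on line breaks and parsing quoted fields. This is only a sketch: it assumes the `<crlf>` marker in the post stands for a literal CR+LF inside the quoted sixth field, and it uses Python 3 syntax.

```python
import csv
import io

# Sample record from the post; "\r\n" stands in for the <crlf> marker
# inside the quoted field (an assumption about the real file).
sample = (
    '"E","30-05-2007 22:00:00","16-07-2007 22:00:00","ACTIVE","BEMETERD",'
    '"18062013XX259\r\n","18062013XX","SPZ REGIO 7 ZUIDWEST E"\r\n'
)

# Naive line splitting cuts the record at the embedded CRLF.
naive_lines = sample.splitlines()

# csv.reader keeps the quoted field together; newline="" stops Python from
# translating line endings before the csv module sees them.
rows = list(csv.reader(io.StringIO(sample, newline="")))

print(len(naive_lines))  # number of physical lines
print(len(rows))         # number of logical records
print(repr(rows[0][5]))  # the field that carries the embedded line break
```

The physical line count and the logical record count differ, which is the effect described in the question.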
 


david_r
Evangelist
  • Best Answer
  • March 27, 2013

Hi,

 

 

It does not seem like FME supports CSV files with newlines inside field values, even when the values are quoted.

 

 

Try inserting the following script into a PythonCreator. It uses the Python csv module, which supports newlines inside quoted fields:

 

import fmeobjects
import csv

class FeatureCreator(object):
    def __init__(self):
        # FME_MacroValues is provided by FME; INPUT_CSV_FILE is a user parameter
        self.inputfilename = FME_MacroValues['INPUT_CSV_FILE']
        self.csvdelimiter = ',' # Modify as needed
        self.csvquotechar = '"' # Modify as needed
        self.log = fmeobjects.FMELogFile()
        self.fieldnames = []

    def close(self):
        # Binary mode ('rb') is what the Python 2 csv module expects.
        # Under Python 3, use open(self.inputfilename, newline='') instead.
        with open(self.inputfilename, 'rb') as csvfile:
            csvreader = csv.reader(csvfile,
                                   delimiter=self.csvdelimiter,
                                   quotechar=self.csvquotechar)
            for n, row in enumerate(csvreader):
                if n == 0:
                    # First row: remember the header and log the names to expose
                    self.fieldnames = row
                    self.log.logMessageString("Attribute names to expose " + \
                        "in the PythonCreator:", fmeobjects.FME_WARN)
                    for field in row:
                        self.log.logMessageString("    "+field, fmeobjects.FME_WARN)
                else:
                    # Every other row becomes one feature, one attribute per field
                    feature = fmeobjects.FMEFeature()
                    for m, value in enumerate(row):
                        feature.setAttribute(self.fieldnames[m], value)
                    self.pyoutput(feature)

 

Notes: 
  •   The CSV filename must be defined in a User Parameter (public or private) called INPUT_CSV_FILE
  •   To make the attribute names of the CSV file visible in the Workbench, you will have to add the list of attribute names to the PythonCreator as "Attributes to expose". When you run the script, it writes this list to the FME log window (blue lines near the top).
  •   Tested with the Wikipedia CSV test data and FME 2013.
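For readers who want to try the same header-plus-rows logic outside FME, here is a minimal Python 3 sketch. It is not part of the original answer: `read_csv_records`, the UTF-8 default encoding, and the demo file contents are all assumptions made for illustration. It mirrors the script's approach of treating the first row as attribute names and every later row as one record.

```python
import csv
import os
import tempfile

def read_csv_records(path, delimiter=",", quotechar='"', encoding="utf-8"):
    """Return (fieldnames, records); each record is a dict keyed by header name.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    # newline="" lets the csv module handle line endings itself, so line
    # breaks inside quoted fields stay part of the field value.
    with open(path, newline="", encoding=encoding) as csvfile:
        reader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar)
        fieldnames = next(reader, [])
        # zip() pairs header names with values; surplus values are dropped.
        records = [dict(zip(fieldnames, row)) for row in reader]
    return fieldnames, records

# Demo with a made-up two-column file that has a CRLF inside a quoted field.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv",
                                 delete=False, encoding="utf-8") as tmp:
    tmp.write('id,comment\r\n1,"first line\r\nsecond line"\r\n')
    demo_path = tmp.name

fieldnames, records = read_csv_records(demo_path)
os.remove(demo_path)

print(fieldnames)
print(records)
```

Unlike the PythonCreator script, this version silently ignores values beyond the last header column rather than raising an error, which may or may not be what you want.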

 Hope this helps.

 

 

David

  • Author
  • March 27, 2013
Thank you David, this works great!


fixcsvpython.fmw: This little workspace will clean up CSV files that have embedded linefeeds, so that the CSV reader can then process the data OK.
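The contents of that workspace are not shown in this thread, so the following is not its actual logic. It is only a rough Python 3 sketch of one way such a clean-up step could work: copy the CSV and replace line breaks inside field values with a space, so that every record ends up on a single physical line. The function name `flatten_embedded_newlines`, the replacement character, the UTF-8 encoding, and the demo data are assumptions.

```python
import csv
import os
import re
import tempfile

def flatten_embedded_newlines(src_path, dst_path, replacement=" ", encoding="utf-8"):
    """Copy a CSV file, replacing line breaks inside field values.

    Hypothetical helper; not taken from the workspace mentioned above.
    """
    line_break = re.compile(r"\r\n|\r|\n")
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        # QUOTE_ALL quotes every field in the output file.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src):
            writer.writerow([line_break.sub(replacement, value) for value in row])

# Demo with a made-up file that has a CRLF inside a quoted field.
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "in.csv")
    dst = os.path.join(tmpdir, "out.csv")
    with open(src, "w", newline="", encoding="utf-8") as f:
        f.write('id,comment\r\n1,"first line\r\nsecond line"\r\n')
    flatten_embedded_newlines(src, dst)
    with open(dst, newline="", encoding="utf-8") as f:
        cleaned = f.read()

print(repr(cleaned))
```

Note that this changes the data: the original line breaks are lost, which is fine for a reader that cannot handle them but not if the breaks are meaningful.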


adamajm
Participant
  • Participant
  • May 9, 2016

@MarkAtSafe - can you share that .fmw file again? It seems to be coming up empty. Thanks!


tino
Contributor
  • Contributor
  • October 12, 2016

@david_r : Thank you very much, this is a great solution!

@MarkAtSafe : It would be nice if this could become an option for the default CSV reader and writer.


ebygomm
Influencer
  • Influencer
  • October 12, 2016
tino wrote:

@david_r : Thank you very much, this is a great solution!

@MarkAtSafe : It would be nice if this could become an option for the default CSV reader and writer.

 

Add ability to read csv with linebreaks

 

 

Coming in 2017, see link above

 


takashi
Influencer
  • October 12, 2016

Hi @tino and everyone, take a look at the CSV2 Reader/Writer in the latest FME 2017.0 beta!


  • October 12, 2016

The workspace to pre-process your CSV and remove embedded linefeeds or line breaks is available in the KnowledgeBase article. Thanks @takashi for pointing out that this has been addressed in the FME 2017 beta releases with the updated CSV reader. To take advantage of the new reader in an existing workspace, you need to add a new CSV reader and then remove or disable the original one.

