Solved

Linebreaks in a csv file


Badge
I use a CSV reader to read a CSV file. Unfortunately, some records in this file contain line breaks (CRLF). This causes a record to be split into more than one, which is undesirable.

Is there a way for the reader to ignore these line breaks?

Best answer by david_r 27 March 2013, 10:42


10 replies

Hi,

Is your field value containing the line break surrounded by quotation marks?

Example:

"this is a
multi-line text"

It would help if you could post a sample record.
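
To illustrate why the quoting matters, here is a minimal sketch using Python's standard csv module rather than FME. The sample data is made up. A line break inside a quoted field stays part of the value, while an unquoted one starts a new record.

```python
import csv
import io

# Made-up sample data: the same text with and without quotes around the field.
quoted = 'id,comment\r\n1,"this is a\r\nmulti-line text"\r\n'
unquoted = 'id,comment\r\n1,this is a\r\nmulti-line text\r\n'

# newline='' leaves the line breaks untouched so the csv module can interpret them.
quoted_rows = list(csv.reader(io.StringIO(quoted, newline='')))
unquoted_rows = list(csv.reader(io.StringIO(unquoted, newline='')))

print(len(quoted_rows))    # header plus one record
print(len(unquoted_rows))  # header plus two records: the break split the record
```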

 

 

David

Hi David,

 

 

Thank you for the quick reply.

Yes, the value is surrounded by quotation marks. I don't see a way to attach a CSV file here, but the record looks something like this:

"E","30-05-2007 22:00:00","16-07-2007 22:00:00","ACTIVE","BEMETERD","18062013XX259<crlf>
","18062013XX","SPZ REGIO 7 ZUIDWEST E"

 


Hi,

It does not seem like FME supports CSV files with newlines in field values, even when they are quoted.

Try inserting the following script into a PythonCreator. It uses the Python csv module, which supports newlines inside quoted fields:

 

import fmeobjects
import csv

class FeatureCreator(object):
    def __init__(self):
        self.inputfilename = FME_MacroValues['INPUT_CSV_FILE']
        self.csvdelimiter = ',' # Modify as needed
        self.csvquotechar = '"' # Modify as needed
        self.log = fmeobjects.FMELogFile()
        self.fieldnames = []
        
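    def close(self):
        # 'rb' suits the Python 2 interpreter this script was tested with;
        # under Python 3, use open(self.inputfilename, newline='') instead.
        with open(self.inputfilename, 'rb') as csvfile: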
            csvreader = csv.reader(csvfile, 
                                   delimiter=self.csvdelimiter, 
                                   quotechar=self.csvquotechar)
            for n, row in enumerate(csvreader):
                if n == 0:
                    self.fieldnames = row
                    self.log.logMessageString("Attribute names to expose " + \
                        "in the PythonCreator:", fmeobjects.FME_WARN)
                    for field in row:
                        self.log.logMessageString("    "+field, fmeobjects.FME_WARN)
                else:
                    feature = fmeobjects.FMEFeature()
                    for m, value in enumerate(row):
                        feature.setAttribute(self.fieldnames[m], value)
                    self.pyoutput(feature)

 

Notes: 
  •   The CSV filename must be defined in a User Parameter (public or private) called INPUT_CSV_FILE.
  •   To make the attribute names of the CSV file visible in Workbench, add them to the PythonCreator under "Attributes to expose". When you run the script, it writes this list to the FME log window (blue lines near the top).
  •   Tested with the Wikipedia CSV test data and FME 2013.
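
For anyone who wants to check the parsing outside FME first, here is a rough standalone sketch of the same reading logic, written for Python 3. The helper name read_csv_records and the header names c1 to c8 are invented for illustration; the data row is the sample record posted earlier in this thread.

```python
import csv
import os
import tempfile

def read_csv_records(path, delimiter=',', quotechar='"'):
    """Yield one dict per record, keyed by the header row (hypothetical helper)."""
    # newline='' hands line breaks to the csv module untouched, so a CRLF
    # inside a quoted field stays part of the value.
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar)
        fieldnames = next(reader, None)
        if fieldnames is None:
            return  # empty file: nothing to yield
        for row in reader:
            yield dict(zip(fieldnames, row))

# Demo: invented header names followed by the record from the thread.
sample = (
    'c1,c2,c3,c4,c5,c6,c7,c8\r\n'
    '"E","30-05-2007 22:00:00","16-07-2007 22:00:00","ACTIVE","BEMETERD",'
    '"18062013XX259\r\n","18062013XX","SPZ REGIO 7 ZUIDWEST E"\r\n'
)
with tempfile.NamedTemporaryFile('w', newline='', suffix='.csv',
                                 delete=False) as tmp:
    tmp.write(sample)

records = list(read_csv_records(tmp.name))
os.remove(tmp.name)

print(len(records))            # number of data records found
print(repr(records[0]['c6']))  # the field that contains the line break
```

If the record count matches the number of real records in the file, the same file should also come through the PythonCreator script as one feature per record.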

 Hope this helps.

 

 

David

Thank you David, this works great!

fixcsvpython.fmw: This little workspace will clean up CSV files that have embedded linefeeds, so that the CSV reader can then process the data OK.
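
The contents of that workspace are not shown in this thread, so the following is not its implementation. It is only a minimal Python 3 sketch of the same pre-processing idea: copy the CSV and replace line breaks inside field values, so that every record ends up on one line. The helper name flatten_linebreaks, the file names, and the sample rows are all made up.

```python
import csv
import os
import tempfile

def flatten_linebreaks(src_path, dst_path, replacement=' '):
    """Copy a CSV, replacing line breaks inside field values (hypothetical helper)."""
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src):
            # splitlines() drops a trailing break and splits on interior ones;
            # it also treats a few rarer characters (e.g. form feed) as breaks.
            writer.writerow([replacement.join(v.splitlines()) for v in row])

# Demo with made-up rows: one trailing break and one interior break.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, 'in.csv')
    dst = os.path.join(workdir, 'out.csv')
    with open(src, 'w', newline='') as f:
        f.write('"a","18062013XX259\r\n","b"\r\n"c","two\r\nlines","d"\r\n')
    flatten_linebreaks(src, dst)
    with open(dst, newline='') as f:
        cleaned_text = f.read()

print(cleaned_text)
```

Replacing the break with a space is a design choice; pass a different replacement string if the values must stay distinguishable from ones that originally contained spaces.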

@MarkAtSafe - can you share that fmw file again? It seems to be coming up empty. Thanks!


@david_r : Thank you very much, this is a great solution!

@MarkAtSafe : It would be nice if this could become an option for the default CSV reader and writer.


 

Add ability to read csv with linebreaks

 

 

Coming in 2017, see link above

 


Hi @tino and everyone, take a look at the CSV2 Reader/Writer in the latest FME 2017.0 beta!


The workspace to pre-process your CSV to remove embedded linefeeds / line breaks is available in the KnowledgeBase article. Thanks @takashi for pointing out that this has been addressed in the FME 2017 beta releases in the updated CSV reader. To take advantage of the new reader in an existing workspace, you need to add a new CSV reader and then remove or disable the original one.
