Solved

How to read a CSV file with inconsistent line endings.

  • February 26, 2021
  • 2 replies
  • 421 views

gazza
Contributor

I've been given a CSV file that has a mixture of line-ending types: some are LF and some are CR.

The first line is blank and ends with an LF, followed by a header line that ends with a CR, then all the data records, which are terminated with LFs. The CSV reader assumes the terminator is LF, so it treats lines 2 and 3 together as the header line, which messes up the read.

Other than reading the whole file in, changing the CRs to LFs, saving it and reading it in again, does anyone know how to read this in?
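
To confirm which terminators a file like this really contains, a quick byte count can help. This is a minimal sketch in plain Python, not an FME feature; `count_line_endings` and the sample bytes are hypothetical and only mimic the layout described above (blank line with LF, header with CR, records with LF).

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF pair contains one CR and one LF, so subtract the pairs.
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


# Made-up sample: blank first line (LF), header (CR), two records (LF).
sample = b"\nid,name\r1,alpha\n2,beta\n"
counts = count_line_endings(sample)
print(counts)
```

For a real file, the bytes would come from reading it in binary mode, for example `open(path, "rb").read()`.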

2 replies

tomf
Contributor
  • Best Answer
  • February 27, 2021

This would be my approach (a quick solution) that avoids resaving a corrected file. It still involves correcting/standardising the line-ending characters, but that's probably unavoidable. Here I've faked a CSV with a mix of line-ending characters based on your description, then written it out. Testing that file with a standard CSV reader produced a similar effect to yours.

The output does produce unexposed attributes, but if you have a fixed schema it would be trivial to set up an AttributeExposer to expose all the attributes to the rest of the workflow. I have attached the workspace below (FME 2020.2).

BadCSVworkspace
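
For anyone who can't open the workspace, the general idea of standardising the line endings in memory and then parsing can be sketched in plain Python. This is not a transcription of the attached workspace, whose contents aren't shown in this thread; the function names and sample data are hypothetical, and it assumes no quoted field contains an embedded line break (those would be altered by the replacement).

```python
import csv
import io


def normalise_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a CRLF pair does not become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def read_mixed_csv(raw: bytes, encoding: str = "utf-8") -> list:
    """Parse CSV bytes whose records end in a mix of CR, LF or CRLF."""
    text = normalise_newlines(raw.decode(encoding))
    # Drop blank lines (such as the empty first line) so the header comes first.
    lines = [line for line in text.split("\n") if line.strip()]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


# Made-up sample: blank first line (LF), header (CR), two records (LF).
sample = b"\nid,name\r1,alpha\n2,beta\n"
rows = read_mixed_csv(sample)
print(rows)
```

Nothing is written back to disk here: the corrected text only ever exists in memory, which is the same trade-off as the approach above.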


gazza
Contributor
  • Author
  • February 27, 2021

Thanks Tom, that's perfect and saves writing a temporary file. I just need to test whether it is as efficient as the temp-file method when the CSV has upwards of 1 million lines.
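
If memory becomes a concern at that size, one option outside the workspace would be to stream the file rather than hold it all at once. The sketch below is plain Python and untested against a file of that scale; `iter_records` and the demo file are hypothetical. It relies on Python's universal-newlines mode (`newline=None`), which translates CR, LF and CRLF to LF as the file is read, and again assumes no quoted field contains an embedded line break.

```python
import csv
import os
import tempfile


def iter_records(path, encoding="utf-8"):
    """Yield one dict per data record, reading the file lazily."""
    # newline=None enables universal newlines: CR, LF and CRLF all become "\n".
    with open(path, "r", encoding=encoding, newline=None) as handle:
        header = None
        for row in csv.reader(handle):
            if not row:  # blank line, e.g. the empty first line
                continue
            if header is None:
                header = row  # first non-blank line is the header
                continue
            yield dict(zip(header, row))


# Demo with a small made-up file: blank line (LF), header (CR), records (LF).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "mixed.csv")
    with open(demo_path, "wb") as out:
        out.write(b"\nid,name\r1,alpha\n2,beta\n")
    records = list(iter_records(demo_path))

print(records)
```

Because `iter_records` is a generator, only one record is held at a time unless the caller collects them into a list, as the demo does for printing.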

