Question

Remove commas within quotes

  • 27 November 2012
  • 9 replies
  • 40 views

Hi,

I have a csv file which I would like to split by comma into separate columns with Attributesplitter. However, there are instances where there are commas within quotes which I don't want to split.

 

 

Is there any way I can split by comma whilst ignoring the ones within quotes or how can I remove commas within quotes? I have tried with regular expression but no success.

An example data;

"BASEMENT, GROUND","Y","W8 6AH","LONDON","",,,""

TO:

"BASEMENT GROUND","Y","W8 6AH","LONDON","",,,""

 

 

"BASEMENT, GROUND, FIRST","Y","SW3 3AD"

TO:

"BASEMENT GROUND FIRST","Y","SW3 3AD"

Thank you


9 replies

Badge +21
Hah, just working with the exact same issue: Solution:

 

 

Make sure all attributes are surrounded by "" and Split by: 

 

 

","
SigTill thanks for your reply.

 

 

Not sure, but this is an exact example of the a record from the csv file, there are some areas with quotes and some without;

 

 

24,"U",470,217102840,"5600L000102488","ENG",1,2000-05-04,,2000-05-04,2012-03-05,,"",,"","GROUND FLOOR",17,"",,"","",21700001,"BASEMENT, GROUND","Y","W8 6AH","LONDON","",,,""

 

 

I would like to split by the commas ignoring those within the quotes.

 

 

I hope this make more sense.

 

Cheers.

 

 

Is the *.csv file being input to a "csv" reader or "text" reader? If it is a *.csv reader, then the reader would have already split the values based on the commas whilst ignoring the commas enclosed within quotes.
vindastra

 

the csv file is being input as a "text" reader.
How about a string replacer in front of the splitter with the regex:

 

[*,*]
 Thanks Vindastra, 

 

It doesn't work with my requirement. It changes the table structure.
Badge
If you don't want to use a csv reader, how about using a stringreplacer to change ,<space> to a | or similar character (I am assuming all your data is structured that way?). Then use an attributesplitter on the comma and then convert the | back to ,<space>

 

 

e.g. 21700001,"BASEMENT, GROUND","Y"

 

 

to 21700001,"BASEMENT|GROUND","Y"

 

 

then attribute split

 

 

then stringreplacer turning "BASEMENT|GROUND" back to "BASEMENT, GROUND"
Thanks TDavis, your solution works.
Badge
I have found that the CSV reader in FME handles this quite well, however if you're looking for a more programmatical way you could write your own reader using a PythonCreator and the Python csv module. See http://docs.python.org/2/library/csv.html for details. There are some good examples, but the doco is a bit long winded. I think though the reader embedded in FME willalready hande this quite well. Most parsers take quotes into consideration. However if you have your heart set on removing the inner commas from all text then there are two methods I can think of...

 

 

Either outside or inside FME you could write a python parser that loops over each character in a row, keeping track of when it is in quotes, and when it is not, copying to a buffer only when the character is not a comma, or when it is a comma and it's not within quotes.

 

 

A fast method though that wouldn't involve coding would be to read the file with a CSVReader into FME, send each attribute through a StringReplacer replacing commas, then outputing the file through a CSVWriter.

Reply