Skip to main content
Question

Remove commas within quotes

  • November 27, 2012
  • 9 replies
  • 150 views

Hi,

I have a csv file which I would like to split by comma into separate columns with Attributesplitter. However, there are instances where there are commas within quotes which I don't want to split.

 

 

Is there any way I can split by comma whilst ignoring the ones within quotes or how can I remove commas within quotes? I have tried with regular expression but no success.

An example data;

"BASEMENT, GROUND","Y","W8 6AH","LONDON","",,,""

TO:

"BASEMENT GROUND","Y","W8 6AH","LONDON","",,,""

 

 

"BASEMENT, GROUND, FIRST","Y","SW3 3AD"

TO:

"BASEMENT GROUND FIRST","Y","SW3 3AD"

Thank you

9 replies

sigtill
Contributor
Forum|alt.badge.img+24
  • Contributor
  • November 27, 2012
Hah, just working with the exact same issue: Solution:

 

 

Make sure all attributes are surrounded by "" and Split by: 

 

 

","

  • Author
  • November 27, 2012
SigTill thanks for your reply.

 

 

Not sure, but this is an exact example of the a record from the csv file, there are some areas with quotes and some without;

 

 

24,"U",470,217102840,"5600L000102488","ENG",1,2000-05-04,,2000-05-04,2012-03-05,,"",,"","GROUND FLOOR",17,"",,"","",21700001,"BASEMENT, GROUND","Y","W8 6AH","LONDON","",,,""

 

 

I would like to split by the commas ignoring those within the quotes.

 

 

I hope this make more sense.

 

Cheers.

 

 


  • November 27, 2012
Is the *.csv file being input to a "csv" reader or "text" reader? If it is a *.csv reader, then the reader would have already split the values based on the commas whilst ignoring the commas enclosed within quotes.

  • Author
  • November 27, 2012
vindastra

 

the csv file is being input as a "text" reader.

  • November 27, 2012
How about a string replacer in front of the splitter with the regex:

 

[*,*]

  • Author
  • November 27, 2012
 Thanks Vindastra, 

 

It doesn't work with my requirement. It changes the table structure.

Forum|alt.badge.img
  • November 28, 2012
If you don't want to use a csv reader, how about using a stringreplacer to change ,<space> to a | or similar character (I am assuming all your data is structured that way?). Then use an attributesplitter on the comma and then convert the | back to ,<space>

 

 

e.g. 21700001,"BASEMENT, GROUND","Y"

 

 

to 21700001,"BASEMENT|GROUND","Y"

 

 

then attribute split

 

 

then stringreplacer turning "BASEMENT|GROUND" back to "BASEMENT, GROUND"

  • Author
  • November 28, 2012
Thanks TDavis, your solution works.

Forum|alt.badge.img
  • November 28, 2012
I have found that the CSV reader in FME handles this quite well, however if you're looking for a more programmatical way you could write your own reader using a PythonCreator and the Python csv module. See http://docs.python.org/2/library/csv.html for details. There are some good examples, but the doco is a bit long winded. I think though the reader embedded in FME willalready hande this quite well. Most parsers take quotes into consideration. However if you have your heart set on removing the inner commas from all text then there are two methods I can think of...

 

 

Either outside or inside FME you could write a python parser that loops over each character in a row, keeping track of when it is in quotes, and when it is not, copying to a buffer only when the character is not a comma, or when it is a comma and it's not within quotes.

 

 

A fast method though that wouldn't involve coding would be to read the file with a CSVReader into FME, send each attribute through a StringReplacer replacing commas, then outputing the file through a CSVWriter.

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings