Hi,
How can you distinguish delimiter semi colons from others?
If there is a regularity - e.g. delimiter semi colons are always followed by a white space and non-delimiters are not, I think it's possible to create an FME workspace to clean up them.
Otherwise it's difficult to automate completely. Some manual operations might be necessary, for example: Read the data with the TEXT reader line by line. Split every line with semi colons, count the number of columns. Save correct records and failed records separately. Correct failed records manually. Read again and merge them, then save them into the final destination dataset.
Takashi
Hi,
it's difficult to tell without having seen the CSV file in question, but as has been said, it is not trivial to separate content from delimiters if there are no other distinguishing features between them.
You could, however, try and replace the CSV reader with a PythonCreator and the script
found here. It will use the Python "csv" module that is somewhat more robust than the CSV reader in FME.
David
Hi, sorry I haven't replied yet - have had a small vacation too.
I'll test your different approaches. For my (at the moment) limited data set of 500K post, there seems to be only one failing entity (of some posts), so I'll surely be able to fix that. However, this would be a returning and updated data set (AIS-shiptracks from satellit monitoring).
I believe I can do Takashi's approach resonable easily, thanks.
But I'll also try David's PythonCreator - I have a personal goal to get more into tools like Python.
Hi again,
I received a new similar data set recently and tried a different approach using regex (.*;){21} - to find lines with more than 20 semi colons - as a simple solution inside a StringSearcher.
Found at Stackoverflow
http://stackoverflow.com/questions/14774559/with-regex-finding-lines-containing-more-than-n-semi-colons
Karl