Solved

How to remove invalid records from a CSV with multiple fields/datatypes?

6 years ago
September 28, 2018
8 replies
71 views

dharmendrasharm
3 replies

I have a CSV file as the source for a Reader. It has multiple fields and datatypes, but some of its records are invalid. Is there a transformer that can remove the invalid records before they are written to the destination dataset?

By invalid records I mean records with a mismatch in datatype or data length.

Best answer by hollyatsafe

Hi @dharmendrasharm,

The AttributeValidator may also be of interest to you. You can specify which attributes to test, and the validation rules include Type and Minimum/Maximum Length. Only those features that meet the criteria will come out of the Passed port.
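For readers who want to see the same idea outside FME, here is a minimal Python sketch of a type and min/max length check that splits records into passed and failed groups. It is not how the AttributeValidator is implemented; the column names, the schema, and the function names `is_valid` and `split_records` are all made up for illustration.

```python
import csv
import io

# Hypothetical schema: column name -> (type converter, min length, max length).
SCHEMA = {
    "id": (int, 1, 6),
    "name": (str, 1, 20),
    "price": (float, 1, 10),
}

def is_valid(row, schema=SCHEMA):
    """Return True if every field has an allowed length and converts to its type."""
    for column, (convert, min_len, max_len) in schema.items():
        value = row.get(column)
        # A missing field (short record) or an out-of-range length fails.
        if value is None or not (min_len <= len(value) <= max_len):
            return False
        try:
            convert(value)  # type check: the text must parse as the expected type
        except ValueError:
            return False
    return True

def split_records(text, schema=SCHEMA):
    """Split CSV text into (passed, failed) lists of row dictionaries."""
    passed, failed = [], []
    for row in csv.DictReader(io.StringIO(text)):
        (passed if is_valid(row, schema) else failed).append(row)
    return passed, failed

sample = "id,name,price\n1,Widget,9.99\nabc,Gadget,4.50\n3,,2.00\n"
passed, failed = split_records(sample)
print(len(passed), "passed;", len(failed), "failed")  # prints: 1 passed; 2 failed
```

The second sample record fails the type test (`abc` is not an integer) and the third fails the minimum length test (empty name).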


lau
65 replies
6 years ago
September 28, 2018

You are looking for the TestFilter transformer?

hollyatsafe
719 replies
Best Answer
6 years ago
September 28, 2018

Hi @dharmendrasharm,

dharmendrasharm
Author
3 replies
6 years ago
October 1, 2018

hollyatsafe wrote:

Hi @dharmendrasharm,

@hollyatsafe, thank you so much for your response. However, I am using FME 2013, which does not seem to have the AttributeValidator. Is there any other way to achieve this? Thanks.

dharmendrasharm
Author
3 replies
6 years ago
October 1, 2018

lau wrote:

You are looking for the TestFilter transformer?

@lau, thanks for your suggestion. I don't think the TestFilter would let me validate the data length and datatype for all the columns in the CSV file before using it as a source. Is there anything else we can try? Thanks.

ashish_man
Contributor
9 replies
6 years ago
October 1, 2018

A Tester with a regex test could be a simple solution.
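To illustrate what such a regex test might look like, here is a small Python sketch in which one pattern per column encodes both the datatype and the allowed length. The column names and patterns are hypothetical examples, not taken from the original data, and `passes_regex` is just an illustrative name.

```python
import re

# Hypothetical per-column patterns; each one encodes both type and length.
PATTERNS = {
    "id": re.compile(r"\d{1,6}"),            # integer, 1 to 6 digits
    "name": re.compile(r"[^,]{1,20}"),       # any text without commas, 1 to 20 characters
    "price": re.compile(r"\d{1,7}\.\d{2}"),  # decimal number with exactly two places
}

def passes_regex(row):
    """Return True when every column's value fully matches its pattern."""
    return all(
        pattern.fullmatch(row.get(column) or "") is not None
        for column, pattern in PATTERNS.items()
    )
```

Using `fullmatch` rather than `search` matters here: the whole value has to fit the pattern, so `12x` is rejected for `id` even though it starts with digits.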

hollyatsafe
719 replies
6 years ago
October 1, 2018

dharmendrasharm wrote:

@hollyatsafe, thank you so much for your response. However, I am using FME 2013, which does not seem to have the AttributeValidator. Is there any other way to achieve this? Thanks.

Hi @dharmendrasharm

There is an AttributeClassifier in 2013 which works similarly to the AttributeValidator for some tests, e.g. type. For length, however, I think you would need to use the StringLengthCalculator; the resulting attribute will be a number you can test. Alternatively, you can use the @StringLength function within the Tester to calculate this.

In current Testers there is an In Range operator where you could set the Min/Max length values; if not, you could do a composite test with the <= / >= operators for this.

I would also consider upgrading your FME. There have been a lot of great changes: improved performance and workspace efficiency, and more transformers/readers/writers that I am sure you would find beneficial.

dharmendrasharm
Author
3 replies
6 years ago
October 10, 2018

hollyatsafe wrote:

Hi @dharmendrasharm

In current Testers there is an In Range operator where you could set the Min/Max length values; if not, you could do a composite test with the <= / >= operators for this.

@hollyatsafe Thanks for your help.


kimo
Contributor
96 replies
6 years ago
October 10, 2018

I used to have a little awk script that scanned a CSV file looking at field counts and widths. If there are unescaped commas, then the reader does get confused. Perhaps that is your problem. You could find these records by counting the number of commas in each record. Do this by reading in the CSV file using a TextLine reader and then counting the number of fields using a split. If the count is different, reject that record and write the other records out to a cleaned file.
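The original awk script is not shown, so the following is only a rough Python sketch of the field-count idea described above. It assumes the first line is a header that defines the expected number of fields; `split_by_field_count` is a made-up name.

```python
def split_by_field_count(lines, delimiter=","):
    """Separate lines whose field count matches the first line from those that differ.

    Assumption: the first line is a header and defines the expected field count.
    Returns (kept, rejected), with line endings stripped.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines:
        return [], []
    expected = len(lines[0].split(delimiter))
    kept, rejected = [lines[0]], []
    for line in lines[1:]:
        # A plain split counts every delimiter, so a stray comma adds a field.
        if len(line.split(delimiter)) == expected:
            kept.append(line)
        else:
            rejected.append(line)
    return kept, rejected
```

For example, passing the lines `id,name,price`, `1,Widget,9.99` and `2,Gadget, large,4.50` keeps the first two and rejects the third. Note that a plain split also rejects records where a comma sits inside a properly quoted field, so this only suits files that are not supposed to contain quoted commas.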
