Skip to main content
Solved

How to remove invalid records from a CSV with multiple fields/datatypes?

  • September 28, 2018
  • 8 replies
  • 59 views

I have a CSV file as a source to Reader with multiple fields/datatypes but there are some invalid records in it. Is there any translator to remove the invalid records before writing it to destination dataset.

By Invalid records i mean - Mismatch in datatype, datalength.

Best answer by hollyatsafe

Hi @dharmendrasharm,

The AttributeValidator may also be of interest to you. You can specify which attributes to test and the validation rules include Type and Minimum/Maximum Length and only those features that meet the criteria will come out of the Passed port.

View original
Did this help you find an answer to your question?

8 replies

lau
Forum|alt.badge.img+3
  • September 28, 2018

You are looking for the TestFilter transformer?


Forum|alt.badge.img+2
  • Best Answer
  • September 28, 2018

Hi @dharmendrasharm,

The AttributeValidator may also be of interest to you. You can specify which attributes to test and the validation rules include Type and Minimum/Maximum Length and only those features that meet the criteria will come out of the Passed port.


hollyatsafe wrote:

Hi @dharmendrasharm,

The AttributeValidator may also be of interest to you. You can specify which attributes to test and the validation rules include Type and Minimum/Maximum Length and only those features that meet the criteria will come out of the Passed port.

@hollyatsafe.. Thankyou so much for your response on this but i am using FME2013 version which does not have AttributeValidator it seems. Is there any other way to achieve this. Thanks.

 


lau wrote:

You are looking for the TestFilter transformer?

@lau.. Thanks for your suggestion. I dont think TestFilter would allow me to validate the attribute Data Length/DataType for all the coloums present in CSV file before using it as a source. IS there anything else we can try? Thanks.

 


ashish_man
Contributor
Forum|alt.badge.img+4
  • Contributor
  • October 1, 2018

tester with a regex test could be a simple solution


Forum|alt.badge.img+2
dharmendrasharm wrote:
@hollyatsafe.. Thankyou so much for your response on this but i am using FME2013 version which does not have AttributeValidator it seems. Is there any other way to achieve this. Thanks.

 

Hi @dharmendrasharm

 

 

There is an AttributeClassifier in 2013 which works similar to the AttributeValidator for some tests e.g type. However for length I think you would need to use the StringLengthCalculator and the resulting attribute will be a number you can Test. Or you can use the @StringLength function within the Tester to calculate this.

 

In current Testers there is an In Range operator where you could set the Min/Max length values, if not you could do a composite test with the <= / >= operators for this.

 

I would also consider upgrading your FME, there have been a lot of great changes - improved performance and workspace efficiency and more transformers/readers/writers that I am sure you would find beneficial.

hollyatsafe wrote:
Hi @dharmendrasharm

 

 

There is an AttributeClassifier in 2013 which works similar to the AttributeValidator for some tests e.g type. However for length I think you would need to use the StringLengthCalculator and the resulting attribute will be a number you can Test. Or you can use the @StringLength function within the Tester to calculate this.

 

In current Testers there is an In Range operator where you could set the Min/Max length values, if not you could do a composite test with the <= / >= operators for this.

 

I would also consider upgrading your FME, there have been a lot of great changes - improved performance and workspace efficiency and more transformers/readers/writers that I am sure you would find beneficial.
@hollyatsafe Thanks for your help.

 


kimo
Contributor
Forum|alt.badge.img+10
  • Contributor
  • October 10, 2018

I used to have a little awk script that scanned a csv file looking for fields counts and widths. If there are un-escaped commas then the reader does get confused. Perhaps that is your problem. You could find these records by counting the number of commas in each record. Do this by reading in the CSV file using a TextLine reader and then count the number of fields using a split. If the count is different reject that record and write the other records out to a cleaned file.


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings