I have a CSV file as a source for a Reader with multiple fields/data types, but there are some invalid records in it. Is there any transformer to remove the invalid records before writing to the destination dataset?
By invalid records I mean a mismatch in data type or data length.
Best answer by hollyatsafe
Hi @dharmendrasharm,
The AttributeValidator may also be of interest to you. You can specify which attributes to test; the validation rules include Type and Minimum/Maximum Length, and only the features that meet the criteria come out of the Passed port.
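For illustration, here is a minimal stand-alone Python sketch (not FME itself) of the kind of per-attribute check the AttributeValidator performs. The field names, rules, and file name are hypothetical examples:

```python
# Sketch of AttributeValidator-style checks: each attribute gets a type
# rule and a min/max length rule; only rows passing every rule are kept
# (the equivalent of features leaving the Passed port).
import csv

# Hypothetical per-field rules for illustration.
rules = {
    "id":   {"type": int,   "min_len": 1, "max_len": 10},
    "name": {"type": str,   "min_len": 1, "max_len": 50},
    "cost": {"type": float, "min_len": 1, "max_len": 12},
}

def is_valid(row):
    for field, rule in rules.items():
        value = row.get(field, "")
        if not (rule["min_len"] <= len(value) <= rule["max_len"]):
            return False            # fails the length rule
        try:
            rule["type"](value)     # fails the type rule if not castable
        except ValueError:
            return False
    return True

with open("source.csv", newline="") as f:
    passed = [row for row in csv.DictReader(f) if is_valid(row)]
```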
@hollyatsafe Thank you so much for your response, but I am using FME 2013, which does not seem to have the AttributeValidator. Is there any other way to achieve this? Thanks.
@lau Thanks for your suggestion. I don't think the TestFilter would allow me to validate the data length/data type for all the columns present in the CSV file before using it as a source. Is there anything else we can try? Thanks.
Hi @dharmendrasharm
There is an AttributeClassifier in 2013 which works similarly to the AttributeValidator for some tests, e.g. type. For length, however, I think you would need the StringLengthCalculator; the resulting attribute is a number you can test. Alternatively, you can use the @StringLength function within the Tester to calculate this.
In current Testers there is an In Range operator where you could set the min/max length values; if not, you could build a composite test with the <= / >= operators (a quick sketch of this logic follows below).
I would also consider upgrading your FME; there have been a lot of great changes - improved performance, workspace efficiency, and more transformers/readers/writers that I am sure you would find beneficial.
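As a hedged illustration of the length test described above (the bounds and value are assumed), the In Range operator and the composite test are logically equivalent:

```python
# Hypothetical illustration of the length test: StringLengthCalculator
# (or @StringLength) computes a length, then the Tester checks it.
MIN_LEN, MAX_LEN = 1, 50        # assumed bounds for illustration

value = "some attribute value"
length = len(value)             # what StringLengthCalculator would produce

# "In Range" style test (newer Testers):
passed = MIN_LEN <= length <= MAX_LEN

# Equivalent composite test using the <= / >= operators (FME 2013 Tester):
passed = (length >= MIN_LEN) and (length <= MAX_LEN)
```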
I used to have a little awk script that scanned a CSV file checking field counts and widths. If there are un-escaped commas, the reader does get confused; perhaps that is your problem. You could find these records by counting the number of commas in each record: read the CSV file with a TextLine reader, split each line, and count the fields. If the count is different, reject that record and write the other records out to a cleaned file.
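A rough Python equivalent of that approach, for illustration; the expected field count and file names are assumptions:

```python
# Read the file line by line (like a TextLine reader), count the fields
# after a naive comma split, and keep only lines with the expected count.
EXPECTED_FIELDS = 5  # assumed column count for this CSV

with open("source.csv") as src, \
     open("cleaned.csv", "w") as ok, \
     open("rejected.csv", "w") as bad:
    for line in src:
        # A record with un-escaped commas splits into too many fields.
        fields = line.rstrip("\n").split(",")
        target = ok if len(fields) == EXPECTED_FIELDS else bad
        target.write(line)
```

Note the split is deliberately naive: the goal here is to catch the malformed records, not to parse quoted fields correctly.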