Skip to main content

Is there a way of validating that the data contained in an attribute is valid based on a table of values?

I know I can use a CSV list in a tester with 'in'. But my table of preset values is 4-500 entries long, and has the possibility of change, is there a way of comparing directly to this table, or using the table to populate the CSV value (maybe through a parameter?) automatically?

There are lots of things you can do, but it will depend on what you are trying to achieve.

How many columns are you testing across in your reference table?

When you find a value matches, are you just wanting it to confirm it exists or are there other things you want to do?

 

If it's just one column, then I would just use a featuremerger and pass the validation table in as a supplier. If they match, then they will come out of the merged port.

 

If it two columns in the validation table and you just want to know if they match, then the same approach could apply.

 

If the scenario is different from that, then there are other options or extended options.

 

Cheers,

Todd


There are lots of things you can do, but it will depend on what you are trying to achieve.

How many columns are you testing across in your reference table?

When you find a value matches, are you just wanting it to confirm it exists or are there other things you want to do?

 

If it's just one column, then I would just use a featuremerger and pass the validation table in as a supplier. If they match, then they will come out of the merged port.

 

If it two columns in the validation table and you just want to know if they match, then the same approach could apply.

 

If the scenario is different from that, then there are other options or extended options.

 

Cheers,

Todd

What I am trying to do is test Excel spreadsheets for the correctness of their data before they are bulk-loaded into a database. I'm trying to prevent errors that can cause the failure of the bulk-load, or corrupt the database if they get through.

 

There are about 20 different columns to test in various ways (length, uniqueness, range, so on), and the sheet can have anywhere from 1 line to 1000's.

 

Some of these columns are restricted to codes that are already present in the database, such as a location code, or a group code (a different code type for each of these columns). My list of codes I can pull out of the database is only 2 columns - the code and a description. I only need to be able to confirm that the value in the spreadsheet exists in the list of codes (so won't cause errors) - I don't want to do further manipulation on it.


Hi,

 

Below is a very simplistic way that you can do it but hopefully it shows an option that is availableimage 

 

 


The FME attribute validator has a "value in list" check. I don't think it matters how long the list of values to check is. But i never tried to max it out. 🤔 But honestly, i would ask ChatGPT to make this in Python. There are many great, easy to use Python libraries for data validation. I like https://greatexpectations.io/, but there are so many...


Hi,

 

Below is a very simplistic way that you can do it but hopefully it shows an option that is availableimage 

 

 

Thanks - I'm assuming I would use the counter to get the information to relate back to the original feature in the spreadsheet? I'm trying to keep my output in the same form as the input, just with an extra attribute that will tell the user if the line passed, or what errors there were.


The FME attribute validator has a "value in list" check. I don't think it matters how long the list of values to check is. But i never tried to max it out. 🤔 But honestly, i would ask ChatGPT to make this in Python. There are many great, easy to use Python libraries for data validation. I like https://greatexpectations.io/, but there are so many...

I did wonder about using a python script user parameter. May have to look further into that. Thanks.


Thanks - I'm assuming I would use the counter to get the information to relate back to the original feature in the spreadsheet? I'm trying to keep my output in the same form as the input, just with an extra attribute that will tell the user if the line passed, or what errors there were.

Correct - I didn't show it in the picture, but you can have another branch from the counter and merge the Featuremerger's unmergedrequestor features back onto the original feature.

 

Also out of the unmergedrequestor, you could utilise a listbuider based on that unique id and then a listconcatenator, to create a attribute that holds the fields that aren't valid.


Thank you both. I ended up going with Todd's answer (although leaving out the exploder step as each column of data I needed to check was a different set of codes). I do like the Python idea, as it would have made my flow more elegant, but ease of creating won out.


Reply