Skip to main content
Solved

Validating data using a separate table of preset codes

  • September 20, 2023
  • 8 replies
  • 32 views

amandah
Contributor
Forum|alt.badge.img+2

Is there a way of validating that the data contained in an attribute is valid based on a table of values?

I know I can use a CSV list in a tester with 'in'. But my table of preset values is 4-500 entries long, and has the possibility of change, is there a way of comparing directly to this table, or using the table to populate the CSV value (maybe through a parameter?) automatically?

Best answer by todd_davis

Hi,

 

Below is a very simplistic way that you can do it but hopefully it shows an option that is availableimage 

 

 

View original
Did this help you find an answer to your question?

8 replies

todd_davis
Influencer
Forum|alt.badge.img+22
  • Influencer
  • September 20, 2023

There are lots of things you can do, but it will depend on what you are trying to achieve.

How many columns are you testing across in your reference table?

When you find a value matches, are you just wanting it to confirm it exists or are there other things you want to do?

 

If it's just one column, then I would just use a featuremerger and pass the validation table in as a supplier. If they match, then they will come out of the merged port.

 

If it two columns in the validation table and you just want to know if they match, then the same approach could apply.

 

If the scenario is different from that, then there are other options or extended options.

 

Cheers,

Todd


amandah
Contributor
Forum|alt.badge.img+2
  • Author
  • Contributor
  • September 20, 2023
todd_davis wrote:

There are lots of things you can do, but it will depend on what you are trying to achieve.

How many columns are you testing across in your reference table?

When you find a value matches, are you just wanting it to confirm it exists or are there other things you want to do?

 

If it's just one column, then I would just use a featuremerger and pass the validation table in as a supplier. If they match, then they will come out of the merged port.

 

If it two columns in the validation table and you just want to know if they match, then the same approach could apply.

 

If the scenario is different from that, then there are other options or extended options.

 

Cheers,

Todd

What I am trying to do is test Excel spreadsheets for the correctness of their data before they are bulk-loaded into a database. I'm trying to prevent errors that can cause the failure of the bulk-load, or corrupt the database if they get through.

 

There are about 20 different columns to test in various ways (length, uniqueness, range, so on), and the sheet can have anywhere from 1 line to 1000's.

 

Some of these columns are restricted to codes that are already present in the database, such as a location code, or a group code (a different code type for each of these columns). My list of codes I can pull out of the database is only 2 columns - the code and a description. I only need to be able to confirm that the value in the spreadsheet exists in the list of codes (so won't cause errors) - I don't want to do further manipulation on it.


todd_davis
Influencer
Forum|alt.badge.img+22
  • Influencer
  • Best Answer
  • September 21, 2023

Hi,

 

Below is a very simplistic way that you can do it but hopefully it shows an option that is availableimage 

 

 


davidrasner5
Contributor
Forum|alt.badge.img+4
  • Contributor
  • September 21, 2023

The FME attribute validator has a "value in list" check. I don't think it matters how long the list of values to check is. But i never tried to max it out. 🤔 But honestly, i would ask ChatGPT to make this in Python. There are many great, easy to use Python libraries for data validation. I like https://greatexpectations.io/, but there are so many...


amandah
Contributor
Forum|alt.badge.img+2
  • Author
  • Contributor
  • September 24, 2023
todd_davis wrote:

Hi,

 

Below is a very simplistic way that you can do it but hopefully it shows an option that is availableimage 

 

 

Thanks - I'm assuming I would use the counter to get the information to relate back to the original feature in the spreadsheet? I'm trying to keep my output in the same form as the input, just with an extra attribute that will tell the user if the line passed, or what errors there were.


amandah
Contributor
Forum|alt.badge.img+2
  • Author
  • Contributor
  • September 24, 2023
davidrasner5 wrote:

The FME attribute validator has a "value in list" check. I don't think it matters how long the list of values to check is. But i never tried to max it out. 🤔 But honestly, i would ask ChatGPT to make this in Python. There are many great, easy to use Python libraries for data validation. I like https://greatexpectations.io/, but there are so many...

I did wonder about using a python script user parameter. May have to look further into that. Thanks.


todd_davis
Influencer
Forum|alt.badge.img+22
  • Influencer
  • September 24, 2023
amandah wrote:

Thanks - I'm assuming I would use the counter to get the information to relate back to the original feature in the spreadsheet? I'm trying to keep my output in the same form as the input, just with an extra attribute that will tell the user if the line passed, or what errors there were.

Correct - I didn't show it in the picture, but you can have another branch from the counter and merge the Featuremerger's unmergedrequestor features back onto the original feature.

 

Also out of the unmergedrequestor, you could utilise a listbuider based on that unique id and then a listconcatenator, to create a attribute that holds the fields that aren't valid.


amandah
Contributor
Forum|alt.badge.img+2
  • Author
  • Contributor
  • November 8, 2023

Thank you both. I ended up going with Todd's answer (although leaving out the exploder step as each column of data I needed to check was a different set of codes). I do like the Python idea, as it would have made my flow more elegant, but ease of creating won out.


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings