Solved

Validating data using a separate table of preset codes

1 year ago
September 20, 2023
8 replies
33 views

amandah
Contributor
9 replies

Is there a way of validating that the data contained in an attribute is valid based on a table of values?

I know I can use a CSV list in a tester with 'in'. But my table of preset values is 4-500 entries long, and has the possibility of change, is there a way of comparing directly to this table, or using the table to populate the CSV value (maybe through a parameter?) automatically?

Best answer by todd_davis

Hi,

Below is a very simplistic way that you can do it but hopefully it shows an option that is available

View original

Did this help you find an answer to your question?

+22

todd_davis
Influencer
307 replies
1 year ago
September 20, 2023

There are lots of things you can do, but it will depend on what you are trying to achieve.

How many columns are you testing across in your reference table?

When you find a value matches, are you just wanting it to confirm it exists or are there other things you want to do?

If it's just one column, then I would just use a featuremerger and pass the validation table in as a supplier. If they match, then they will come out of the merged port.

If it two columns in the validation table and you just want to know if they match, then the same approach could apply.

If the scenario is different from that, then there are other options or extended options.

Cheers,

Todd

amandah
Author
Contributor
9 replies
1 year ago
September 20, 2023

todd_davis wrote:

There are lots of things you can do, but it will depend on what you are trying to achieve.

How many columns are you testing across in your reference table?

When you find a value matches, are you just wanting it to confirm it exists or are there other things you want to do?

If it's just one column, then I would just use a featuremerger and pass the validation table in as a supplier. If they match, then they will come out of the merged port.

If it two columns in the validation table and you just want to know if they match, then the same approach could apply.

If the scenario is different from that, then there are other options or extended options.

Cheers,

Todd

What I am trying to do is test Excel spreadsheets for the correctness of their data before they are bulk-loaded into a database. I'm trying to prevent errors that can cause the failure of the bulk-load, or corrupt the database if they get through.

There are about 20 different columns to test in various ways (length, uniqueness, range, so on), and the sheet can have anywhere from 1 line to 1000's.

Some of these columns are restricted to codes that are already present in the database, such as a location code, or a group code (a different code type for each of these columns). My list of codes I can pull out of the database is only 2 columns - the code and a description. I only need to be able to confirm that the value in the spreadsheet exists in the list of codes (so won't cause errors) - I don't want to do further manipulation on it.

+22

todd_davis
Influencer
307 replies
Best Answer
1 year ago
September 21, 2023

Hi,

Below is a very simplistic way that you can do it but hopefully it shows an option that is available

davidrasner5
Contributor
28 replies
1 year ago
September 21, 2023

The FME attribute validator has a "value in list" check. I don't think it matters how long the list of values to check is. But i never tried to max it out. 🤔 But honestly, i would ask ChatGPT to make this in Python. There are many great, easy to use Python libraries for data validation. I like https://greatexpectations.io/, but there are so many...

amandah
Author
Contributor
9 replies
1 year ago
September 24, 2023

todd_davis wrote:

Hi,

Below is a very simplistic way that you can do it but hopefully it shows an option that is available

Thanks - I'm assuming I would use the counter to get the information to relate back to the original feature in the spreadsheet? I'm trying to keep my output in the same form as the input, just with an extra attribute that will tell the user if the line passed, or what errors there were.

amandah
Author
Contributor
9 replies
1 year ago
September 24, 2023

davidrasner5 wrote:

I did wonder about using a python script user parameter. May have to look further into that. Thanks.

+22

todd_davis
Influencer
307 replies
1 year ago
September 24, 2023

amandah wrote:

Correct - I didn't show it in the picture, but you can have another branch from the counter and merge the Featuremerger's unmergedrequestor features back onto the original feature.

Also out of the unmergedrequestor, you could utilise a listbuider based on that unique id and then a listconcatenator, to create a attribute that holds the fields that aren't valid.

amandah
Author
Contributor
9 replies
1 year ago
November 8, 2023

Thank you both. I ended up going with Todd's answer (although leaving out the exploder step as each column of data I needed to check was a different set of codes). I do like the Python idea, as it would have made my flow more elegant, but ease of creating won out.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Validating data using a separate table of preset codes