Question

Validating attributes against existing schema

  • December 14, 2017
  • 8 replies
  • 110 views

I'm extracting data nightly from JSON services that have no validation. These datasets have over 100 columns. Is there a way to import a schema into the AttributeValidator so that it would test data type and max length for every column? Building tests for data type, structure, and max length for every column in every dataset by hand would take forever, but my FME processes keep failing without them.

8 replies

takashi
Evangelist
  • December 15, 2017

Hi @rclarknxf, I think the solution would be different depending on how the desired schema (column names, data type for each column, max length for each column, and other restrictions if needed) is given. Could you please elaborate on this? It would be ideal if you could post a sample JSON document and the schema definition (maybe an external table?).


  • Author
  • December 15, 2017

Example of JSON:

{"success":true,"message":null,"result":{"pageNum":1,"numPages":31,"records":[{"NAME":"VALUE","NAME":"VALUE","NAME":"VALUE","NAME":"VALUE",.....}]}}

Ideally I would like a transformer that works for anything; I'm not sure if the solution needs to be different for each dataset. I used the AttributeManager to make the names of the input columns the same as the output columns, so all it needs to do is ask: can value X be inserted into this column successfully? For a string: does the length of the value fit the column? For a numeric: do the precision and scale of the value fit the column? The writers already have the schema (as in the definition of each column).
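
Roughly, the checks I have in mind, sketched in plain Python (the column limits below are made up, just to show the idea):

from decimal import Decimal, InvalidOperation

def string_fits(value, max_length):
    # Can this string value be inserted into the column?
    return len(str(value)) <= max_length

def numeric_fits(value, precision, scale):
    # Do the value's integer and fractional digits fit the column?
    try:
        sign, digits, exponent = Decimal(str(value)).as_tuple()
        frac_digits = max(0, -exponent)
        int_digits = max(0, len(digits) + exponent)
    except (InvalidOperation, TypeError):
        return False  # not a finite number at all
    return int_digits <= precision - scale and frac_digits <= scale

print(string_fits("VALUE", 10))        # True
print(numeric_fits("123.45", 10, 2))   # True
print(numeric_fits("123.456", 10, 2))  # False: scale exceeded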

takashi
Evangelist
  • December 15, 2017

Are the attribute names (the actual names that appear as "NAME" in your example) known in advance?

takashi
Evangelist
  • December 16, 2017

If the JSON document looks like this:

{
    "success" : true,
    "message" : null,
    "result" : {
        "pageNum" : 1,
        "numPages" : 31,
        "records" : [
            {
                "NAME1" : "VALUE1",
                "NAME2" : "VALUE2",
                "NAME3" : "VALUE3",
                "NAME4" : "VALUE4"
            }
        ]
    }
}

you can extract each record containing the 4 attributes (NAME1, NAME2, NAME3, and NAME4) with the JSONFragmenter:

  • JSON Query: json["result"]["records"][*]

[screenshot: JSONFragmenter parameters]

You can then validate attribute values with the AttributeValidator.
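
For reference, the plain-Python equivalent of that extraction (a minimal sketch; the one-line document below is just a stand-in for your real JSON):

import json

# Stand-in for the JSON document above (same structure, dummy values).
raw = '{"success": true, "message": null, "result": {"pageNum": 1, "numPages": 31, "records": [{"NAME1": "VALUE1", "NAME2": "VALUE2"}]}}'

doc = json.loads(raw)

# Equivalent of the JSONFragmenter query json["result"]["records"][*]:
# iterate over every record object inside the "records" array.
for record in doc["result"]["records"]:
    for name, value in record.items():
        print(name, value)

Each record then carries its attributes as plain key/value pairs, which is what the AttributeValidator (or a script) would check.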

  • Author
  • December 16, 2017

Hey takashi, I appreciate the help, but that's not quite what I'm asking - the AttributeValidator is the part I want to populate automatically. Creating a validation rule in that transformer for every column is going to take ages.

takashi
Evangelist
  • December 16, 2017

Can you elaborate on how you want to set the validation rules?

  • Author
  • December 16, 2017

From the schema of the output dataset

takashi
Evangelist
  • December 16, 2017

It depends on how you can get the schema definition from the output dataset.

If the output dataset exists already, you can read the schema from the dataset with the Schema reader and validate the attributes of each record based on the schema definition.

If the output dataset doesn't exist yet, there is no way to get the schema definition at run-time unless you have a resource (e.g. an external lookup table) that defines the destination schema.

In any case, Python scripting may be necessary to validate the attributes for each record extracted from the JSON document.
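
As a rough sketch only (not a drop-in solution): a PythonCaller could hold the destination schema as a lookup and flag each feature that violates it. The schema dict, attribute names, and limits below are all hypothetical, and the class follows the general shape of FME's PythonCaller template:

import fmeobjects

# Hypothetical destination schema: attribute name -> (type, max length).
# In practice this could come from a Schema reader feature or an
# external lookup table instead of being hard-coded.
SCHEMA = {
    "NAME1": ("string", 50),
    "NAME2": ("string", 100),
}

class SchemaValidator(object):
    # input() is called once per feature; pyoutput() emits features.
    def input(self, feature):
        failed = []
        for name, (ftype, max_len) in SCHEMA.items():
            value = feature.getAttribute(name)
            if value is None:
                continue  # absent attribute; handle as its own rule if needed
            if ftype == "string" and len(str(value)) > max_len:
                failed.append(name)
            # Numeric precision/scale checks would plug in here,
            # using the same digit-counting idea sketched earlier.
        # Record failing columns so they can be inspected downstream.
        feature.setAttribute("_failed_columns", ",".join(failed))
        self.pyoutput(feature)

    def close(self):
        pass

Features that come out with a non-empty _failed_columns could then be routed to a failure port with a Tester, instead of letting the writer fail mid-run.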
