
I'm extracting data nightly from JSON services which have no validation. These datasets have over 100 columns. Is there a way to import a schema into the AttributeValidator so that for every column it would test data type and max length? It would take forever to set up tests for data type, structure, and max length for every column in every dataset, but my FME processes keep failing.

Hi @rclarknxf, I think the solution would be different depending on how the desired schema (column names, data type for each column, max length for each column, and other restrictions if needed) is given. Could you please elaborate on this? It would be ideal if you could post a sample JSON document and the schema definition (maybe an external table?).



Example of JSON:

{"success":true,"message":null,"result":{"pageNum":1,"numPages":31,"records":[{"NAME":"VALUE","NAME":"VALUE","NAME":"VALUE","NAME":"VALUE",.....}]}}

Ideally I would like to have a transformer that works for anything; I'm not sure if the solution needs to be different for each dataset. I used the AttributeManager to make the names of the input columns the same as the output columns, so all it needs to do is ask, for value X, can I be inserted into this column successfully? So for a string: does the length of the value fit into the column? For a numeric: does the precision/scale of the value fit into the column? The writers have the schema (i.e. the definition of each column).
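
To put it in code terms, the per-column check I have in mind is roughly this (the column definitions below are made-up examples, not my real schema):

from decimal import Decimal, InvalidOperation

# Hypothetical column definitions, in the shape a writer schema describes them:
# a type, plus max length for strings or precision/scale for numerics.
SCHEMA = {
    "NAME":   {"type": "string",  "max_length": 50},
    "AMOUNT": {"type": "numeric", "precision": 10, "scale": 2},
}

def fits_column(value, col):
    """Return True if the value could be inserted into the column."""
    if col["type"] == "string":
        # Does the length of the value fit into the column?
        return len(str(value)) <= col["max_length"]
    if col["type"] == "numeric":
        # Does the precision/scale of the value fit into the column?
        try:
            t = Decimal(str(value)).as_tuple()
        except InvalidOperation:
            return False
        scale = max(-t.exponent, 0)
        precision = max(len(t.digits), scale)
        return precision <= col["precision"] and scale <= col["scale"]
    return False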


Are the attribute names (the actual values that appear as "NAME" in your example) known in advance?


If the JSON document looks like this:

{
    "success" : true,
    "message" : null,
    "result" : {
        "pageNum" : 1,
        "numPages" : 31,
        "records" : [
            {
                "NAME1" : "VALUE1",
                "NAME2" : "VALUE2",
                "NAME3" : "VALUE3",
                "NAME4" : "VALUE4"
            }
        ]
    }
}
you can extract each record containing the four attributes NAME1, NAME2, NAME3, and NAME4 with the JSONFragmenter.


  • JSON Query: json["result"]["records"]


You can then validate attribute values with the AttributeValidator.


Hey takashi, appreciate the help, but that's not quite what I'm asking - the AttributeValidator is the bit I want to populate automatically. Creating a validation rule in that transformer for every column is going to take ages.


Can you elaborate on how you want to set the validation rules?


From the schema of the output dataset.


It depends on how you can get the schema definition from the output dataset.

If the output dataset exists already, you can read the schema from the dataset with the Schema reader and validate the attributes of each record based on the schema definition.
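
For example, if I remember the Schema reader's output correctly, each schema feature exposes its columns as list attributes (attribute{}.name, attribute{}.native_data_type - treat these names as an assumption and verify them in your FME version). A PythonCaller could collect them into a lookup dictionary, roughly like this:

import fmeobjects

# PythonCaller sketch: collect the destination schema from Schema reader
# features into a dictionary. The list attribute names used here are an
# assumption to verify against your FME version.
class SchemaCollector(object):
    def __init__(self):
        self.schema = {}  # e.g. {"NAME1": "varchar(50)", ...}

    def input(self, feature):
        i = 0
        while True:
            name = feature.getAttribute("attribute{%d}.name" % i)
            if name is None:
                break
            self.schema[name] = feature.getAttribute(
                "attribute{%d}.native_data_type" % i)
            i += 1
        self.pyoutput(feature)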


If the output dataset doesn't exist yet, there is no way to get the schema definition at run-time unless you have a resource (e.g. an external lookup table) that defines the destination schema.


In any case, Python scripting may be necessary to validate the attributes for each record extracted from the JSON document.
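
As a rough illustration, a PythonCaller along these lines could test each attribute against a schema dictionary and tag failures so a downstream Tester can route the bad records. This is a minimal sketch assuming the schema is already available as a Python dictionary (the contents below are placeholders):

import fmeobjects

# Placeholder schema: attribute name -> (data type, max length).
# In practice this would come from the Schema reader or a lookup table.
SCHEMA = {
    "NAME1": ("string", 50),
    "NAME2": ("string", 100),
}

class FeatureProcessor(object):
    def input(self, feature):
        failed = []
        for attr, (data_type, max_length) in SCHEMA.items():
            value = feature.getAttribute(attr)
            if value is None:
                continue  # decide separately whether a missing value is an error
            if data_type == "string" and len(str(value)) > max_length:
                failed.append(attr)
        # Record the failing columns so a downstream Tester can route the feature.
        feature.setAttribute("_validation_failed", ",".join(failed))
        self.pyoutput(feature)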
