
The idea is to create a generic workflow which will validate any dataset by checking all its attributes, calculating some statistics etc.

I split it into two (maybe even three) workspaces: the parent workspace reads the dataset and creates parameters for the child workspace, which analyzes the data. The only problem is how to set up the AttributeValidator using these parameters. I did this without any problems with the Attributes to Expose parameter of the FeatureReader (a space-separated list of attribute names).

I am using 2020.1.2.0 (20200902 - Build 20620 - WIN64)

You can use the Schema (Any Format) reader to dynamically read any incoming dataset. Attribute details will be stored in a list called attribute{}, so you can then use a ListExploder transformer to create one feature for every attribute in the dataset.

After this, you will be able to use these attribute names as user parameters for your second (analyse the data) workspace.
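
To make the data flow concrete, here is a minimal Python sketch of what the list explosion does, using plain dictionaries rather than FME objects. The `schema_feature` layout and the `name`/`type` keys are assumptions made for illustration only, not the reader's actual output.

```python
def explode_attributes(schema_feature):
    """Return one record per entry of the schema feature's attribute list."""
    return [dict(entry) for entry in schema_feature["attribute"]]


def names_as_parameter(records):
    """Join attribute names with spaces (the Attributes to Expose format)."""
    return " ".join(record["name"] for record in records)


# Hypothetical schema feature for a dataset with three attributes.
schema_feature = {
    "attribute": [
        {"name": "id", "type": "int"},
        {"name": "owner", "type": "string"},
        {"name": "area", "type": "float"},
    ]
}

records = explode_attributes(schema_feature)   # one record per attribute
parameter_value = names_as_parameter(records)  # single space-separated string
```

The exploded records correspond to the per-attribute features, while the joined string is the kind of value that was passed to the FeatureReader in the original question.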


Hi @germang​ 

Can you explain in detail what you mean by "you will be able to use these attribute names as user parameters for your second (analyse the data) workspace."? In particular, how would you use these individual features from the parent workspace, each carrying only one attribute name, in the AttributeValidator inside the child workspace?

I know how to extract a list of all attributes; that is not the problem. I am struggling with passing that list as a parameter to one specific transformer, the AttributeValidator.

 


Hi @zuzanna_sz​ 

The Schema (Any Format) reader outputs a single record. However, the information for each attribute is stored in a list which, once exploded (ListExploder), produces X records, where X is the number of attributes in the source dataset.

Then you can run the child workspace with the WorkspaceRunner transformer, passing the attribute named "name" and the source dataset. In the child workspace you can use a Generic (Any Format) reader and an AttributeValidator.

The child workspace will run once for each attribute present in the source dataset.

See the highlighted elements in the attached images and workspace (ParentChild).


Hi @germang

Your solution is OK if the workflow is designed to analyze attribute by attribute (the child workspace is triggered once for every attribute in the source feature type). That is not the solution I am looking for: in my first post I wrote "validate any dataset by checking all its attributes". I should probably add that there will be a lot of them every time I run the workspace.

The AttributeValidator can analyze multiple attributes at the same time (it is designed as a check list where you can tick as many boxes as you like); in your example, only one attribute is processed per run.

 

I am looking for a way to feed multiple names at the same time, so the parameter I want to use should act like a check list for the AttributeValidator.
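
As a rough illustration of the behaviour I am after, here is a minimal Python sketch that applies one rule to every attribute named in a single space-separated parameter, in one pass. It uses plain dictionaries as features; `parse_attribute_list`, `has_value` and `validate` are hypothetical helpers, and the sketch does not show how to configure the AttributeValidator itself.

```python
def parse_attribute_list(parameter_value):
    """Split a space-separated parameter into a list of attribute names."""
    return parameter_value.split()


def has_value(value):
    """Example rule: the attribute must be present and non-empty."""
    return value is not None and value != ""


def validate(features, attributes_to_check, rule=has_value):
    """Check all selected attributes on each feature in a single pass.

    Returns (passed, failed); each failed entry pairs the feature with
    the names of the attributes that broke the rule.
    """
    passed, failed = [], []
    for feature in features:
        bad = [name for name in attributes_to_check
               if not rule(feature.get(name))]
        if bad:
            failed.append((feature, bad))
        else:
            passed.append(feature)
    return passed, failed


# Hypothetical data: the second feature has an empty "owner".
features = [{"id": 1, "owner": "a"}, {"id": 2, "owner": ""}]
passed, failed = validate(features, parse_attribute_list("id owner"))
```

The point is that the list of names acts as the check list: every feature is tested against all selected attributes in one run, rather than one run per attribute.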


Hi @zuzanna_sz​ 

With the logic I proposed, you only need to run the parent workspace once. The validation applied is always the same; only the name of the attribute changes from run to run.

If you want to do different validations, you may need to hard-code the name of each attribute for each validation, and then the process won't accept datasets with different attribute names.

An alternative could be to group the attributes by data type and run a different child workspace for each data type.
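
A minimal Python sketch of that grouping step, assuming each exploded attribute record is a plain dictionary with hypothetical `name` and `type` keys, could look like this. It only builds one space-separated name list per data type; how each list is handed to a child workspace is left open.

```python
from collections import defaultdict


def group_names_by_type(attributes):
    """Map each data type to a space-separated list of attribute names."""
    groups = defaultdict(list)
    for attribute in attributes:
        groups[attribute["type"]].append(attribute["name"])
    return {data_type: " ".join(names) for data_type, names in groups.items()}


# Hypothetical attribute records, one per attribute in the source dataset.
attributes = [
    {"name": "id", "type": "int"},
    {"name": "owner", "type": "string"},
    {"name": "count", "type": "int"},
]

names_by_type = group_names_by_type(attributes)
```

With this, the number of child runs depends on the number of distinct data types rather than on the number of attributes.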

Sorry not to be more helpful.

