
Hi everybody,

 

I have written a very simple script that:

  • Reads ESRI-JSON files placed in a folder tree, recursively (**/*.json)
  • For all data in the same folder (= the same type of data), gathers every entry into a single dataset and creates a new feature class, named after the folder, in an Esri file geodatabase, having eliminated duplicates (see the sketch after this list)
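
For illustration, here is a minimal sketch in plain Python (outside FME) of that gathering step, assuming a hypothetical root folder named data: it groups the JSON files by their parent folder, which is what determines each output feature class name.

```python
# Group every .json file under the root by its parent folder,
# so each folder becomes one future feature class.
from pathlib import Path
from collections import defaultdict

root = Path("data")                      # hypothetical root folder
by_folder = defaultdict(list)
for json_file in root.rglob("*.json"):   # the **/*.json pattern
    by_folder[json_file.parent.name].append(json_file)

for folder_name, files in by_folder.items():
    print(folder_name, len(files), "file(s)")  # one feature class per folder
```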

 

Details:

  • All input JSON files in the same folder share the same schema, but the schemas differ between folders, and there may or may not be M values attached to the data.
  • However, the OBJECTID field is always present, and that key is used to remove duplicates (see the sketch below).
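
A minimal sketch of that de-duplication step, assuming the standard Esri JSON layout where each feature carries an "attributes" dictionary:

```python
# Keep the first entry seen for each OBJECTID in a folder's combined list.
def dedupe_by_objectid(features):
    seen = set()
    unique = []
    for feat in features:
        oid = feat["attributes"]["OBJECTID"]  # OBJECTID is always present
        if oid not in seen:
            seen.add(oid)
            unique.append(feat)
    return unique

feats = [{"attributes": {"OBJECTID": 1}},
         {"attributes": {"OBJECTID": 2}},
         {"attributes": {"OBJECTID": 1}}]   # duplicate of the first
print(len(dedupe_by_objectid(feats)))       # 2
```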

 

Issue:

I cannot find a way to set up the output port so that it retrieves the original schema, i.e. the schema shared by the JSON files that the output dataset aggregates. When I check "Dynamic Schema Definition", I have no choice but to use the fme_feature_type attribute (the script fails otherwise), but it represents the whole schema (since all files are ESRI-JSON), so there are lots of useless fields in the result...

 

What would be the proper setting in your opinion?

Below you can find a screenshot of the script as well as the current output parameters.

Thanks for your help!


I was going to suggest a FeatureReader to read the schema and datasets, but that only reads the first schema, since the feature type is always ESRIJSON.

If the schemas are identical within each folder, I'd look at using a WorkspaceRunner and running your process once per folder.
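
For reference, a hedged sketch of what that WorkspaceRunner pattern amounts to in the fmeobjects Python API: one child run per data folder. The child workspace name and its published parameter name below are hypothetical placeholders.

```python
# Run the child workspace once for each folder found under the root.
from pathlib import Path
import fmeobjects

runner = fmeobjects.FMEWorkspaceRunner()
root = Path("data")                               # hypothetical root folder
for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
    runner.runWithParameters(
        "child.fmw",                              # hypothetical child workspace
        {"SourceDataset_ESRIJSON": str(folder)})  # hypothetical parameter name
```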


Hi @ebygomm

Yes, I think that is the answer.

I'll give it a try and provide feedback.

Thanks a lot!


Hmm... getting closer, but still not right.

I made a parent workspace whose role is to call a child workspace for each data folder found.

The process runs smoothly, but at the end I get only one dataset in the geodatabase (the first one processed?).

 

To check what was going on (the drawback of this method is that there is no log from the child process), I added an ESRI-JSON output to inspect the output files. When testing with these new settings, it appears that the JSON files are all generated correctly, but there is still only one dataset in the geodatabase.

 

Parent workspace (screenshot)

Child workspace (screenshot)


I'd try something like this (screenshot).

How are you specifying the name for your dataset?


Hi!

Many thanks! I have finally succeeded in doing what I wanted.

The FeatureReader works well! It means the whole schema doesn't have to be specified beforehand, so it is completely dynamic. Thanks for the tip!

The problem did not come from this, though. It was in the Esri gdb writer. I had checked the "overwrite" option, which was fine until I split my main process with the WorkspaceRunner. Since the output then moved to the child process, the whole gdb content was deleted EACH TIME that process was run, that is, once per folder to be treated. That is why I ended up with only one dataset: the last one treated...

 

This thread can be marked as SOLVED.

Take care,

 

Jean

I unchecked "overwrite" and used the "drop table" option instead in the feature type parameters.
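
To make the failure mode concrete, here is a tiny Python simulation (illustrative only, not the FME API) of the two writer behaviours: "overwrite" recreates the whole geodatabase on every child run, while "drop table" only replaces the matching table.

```python
# Model the geodatabase as a dict of table name -> features.
def run_child_overwrite(gdb, folder, features):
    gdb.clear()                # whole geodatabase wiped on every run
    gdb[folder] = features

def run_child_drop_table(gdb, folder, features):
    gdb.pop(folder, None)      # only the matching table is dropped
    gdb[folder] = features

for run_child in (run_child_overwrite, run_child_drop_table):
    gdb = {}
    for folder in ["roads", "rivers", "parcels"]:
        run_child(gdb, folder, [folder + "_feature"])
    print(run_child.__name__, "->", list(gdb))
# run_child_overwrite -> ['parcels']           (only the last folder survives)
# run_child_drop_table -> ['roads', 'rivers', 'parcels']
```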

