Created my first ever workbench, but it seems very redundant, tried recreating it in a more streamlined way, but do not know what is best practice
Hi everyone,
I am brand new to FME and created my first workbench. In a nutshell, it does this:
Brings in 6 different enterprise geodatabase feature classes using 6 ESRI Geodatabase (ArcSDE Geodb) readers
Each reader is connected to a tester to only select ‘active’ values in a status field
Because I selected ‘resolve domains’ in the reader, an additional field was created for every field that had a domain. Each was suffixed with “_resolved”, so I used 6 bulk attribute renamer transformers to rename the fields without the “_resolved” suffix
I then did 6 attribute removers and removed any fields I did not need when it came to exporting (note, many of the fields between each feature class were the same, but there were some differences)
Finally, I used 6 different feature writers to export each feature class into a single geodatabase
As you can see, I repeated the process 6 different times, once for every reader/feature class that I had. I was thinking to myself while doing this, ‘this cannot be the right way to do this, what if I had 100 layers to work with?’
From there I tried recreating the workbench in a more streamlined way:
Bring in the feature classes in their own individual reader
Use a junction to connect each individual reader
Use a bulk attribute renamer to strip “_resolved” from any field that has it
Use attribute remover to remove any field that I did not need (it gave me a list of every single field in every single feature class, which I really liked cause I could remove them all at once)
Use a schema scanner to generate “fme_feature_type_name” with group processing selected, grouped by “fme_feature_type” to prepare for a dynamic file geodatabase writer
Use a dynamic file geodatabase writer to export each feature class to a new geodatabase
Unfortunately I was not able to get this to fully work, but I was close.
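For anyone reading with a Python background, the "_resolved" rename step can be pictured as plain dictionary handling. This is only a minimal sketch of the idea, not FME code: the function name strip_resolved_suffix and the sample field names are made up for illustration, and it assumes the resolved value should replace the coded value of the same base field.

```python
def strip_resolved_suffix(attributes, suffix="_resolved"):
    """Return a copy where each '<field>_resolved' value replaces '<field>'."""
    renamed = {}
    # First pass: keep every attribute that does not carry the suffix.
    for name, value in attributes.items():
        if not name.endswith(suffix):
            renamed[name] = value
    # Second pass: resolved values overwrite the coded value of the same base name.
    for name, value in attributes.items():
        if name.endswith(suffix):
            renamed[name[: -len(suffix)]] = value
    return renamed


# Hypothetical feature: 'status' holds the domain code, 'status_resolved' the description.
feature = {"status": 1, "status_resolved": "Active", "asset_id": "H-001"}
print(strip_resolved_suffix(feature))
# {'status': 'Active', 'asset_id': 'H-001'}
```

The same rule applies to every feature class at once, which is why a single renamer after the junction can stand in for six separate ones.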
With that said, generally when working with FME, is the best practice to do everything individually? It seems very redundant (I come from a python background), and I am always looking for efficiency. Again, I am very new to FME, so hearing your input/ideas would be greatly appreciated, maybe there is something that I don’t know and could have made this more logical?
Thank you in advance for your time and attention, I look forward to the discussion!
The second workflow is close to optimal. When multiple feature types all get processed in the same way, FME users are advised to learn about fanout in Writers using the fme_feature_type Attribute, which it appears you have attempted to do or are attempting to learn: https://docs.safe.com/fme/html/FME-Form-Documentation/FME-Form/Workbench/fanout_about.htm
A couple of pointers:
The fme_feature_type Attribute is, for most Readers, always read and attached to each feature no matter how the Reader parameters are set. You do not need a SchemaScanner to extract this Attribute; it will already be there. Most Readers will by default read it as an unexposed Attribute and not initially make it available in the Workspace, but this system Attribute can be exposed at any time, either by going into the Reader and choosing which metadata fields like this one you wish to expose to the Workspace, or by using an AttributeExposer Transformer and asking it to expose fme_feature_type. If you look at sample cached Feature Data through the Workspace Visual Preview → Feature Information window, or alternatively send the data to an Inspector Transformer to look at the Feature Information in Data Explorer, either interface will show you all Attributes, both Exposed and Unexposed, and you will virtually always see fme_feature_type as one of the Unexposed Attributes for a Feature that came from a Reader.
To make your workspace run faster and more efficiently, it is better to deselect the Attributes you do not want in the Reader(s) if possible, rather than read all Attributes only to drop them with AttributeRemover. Make sure to use this in conjunction with the Reader setting to only read “Exposed Attributes” rather than “All Attributes” (the second option will read all Attributes regardless of which Attributes were chosen to be exposed).
SchemaScanner I generally only recommend to use on Features with Attributes that were created within the Workspace, or for data sources read that are not strongly (data)typed and instead are weakly typed (Eg. Excel, CSV, SQLite, NoSQL type sources etc.). For reading a strongly typed database like a Geodatabase, SchemaScanner is unnecessary, slower and less accurate than simply using a Reader to read the Schema of the Source rather than try to guess it from the Workspace Features. To get the Source Schema, I generally recommend the easiest way is to use a separate FeatureReader set in Schema Only mode. This will give one Schema Feature per Feature Class/Table although from memory it labels them as fme_feature_type_name which needs a separate AttributeRenamer to name this back to fme_feature_type.
In dynamic writing, it is important that the Schema Feature(s) arrive at the Writer first. There are a few ways to ensure this, although I’ve uploaded a Custom Transformer, FeaturePrioritizer, to the Hub to make this a little easier for users when they want to make sure the Schema Features absolutely will arrive at the Dynamic Writer first.
@bwn has got it covered, but I just wanted to pop in and say good on you for
Identifying that something could be done better/more optimally
Asking about what you could do and for feedback
Trying something new and pushing yourself!
Welcome to FME!
Hey there, @bwn,
First and foremost, thank you very much for your suggestions and detailed inputs; I very much appreciate it.
Unfortunately, I am still not able to get this optimized workbench to execute successfully.
I did, however, remove the SchemaScanner from the flow and exposed the fme_feature_type through the reader; this was something new I learned.
It seems that the flow works properly/as intended right up until the writer, however, no matter what I do within the writer, I am getting this error:
Here are the configurations I have in the writer:
It is important to note that the enterprise geodatabase feature classes that are in the reader are a mix of points, lines, and polygons.
While many of the fields are the same, they still have fields necessary for their dataset, for example, say we have a master list of fields: A, B, C, D, E, F, G, H
One feature class might have: A, B, C, D
Another feature class might have: A, B, F, G, H
Another one might have: D, E, F
In the bulk attribute remover, it gave me a list of every field from every feature class. I simply checked off the ones I needed to be removed, however, I think the original schemas are getting lost in the writer (hence the feature definition error), since it seems that the error is stating that it cannot find the geometry (I think)?
I tried many different configurations in the writer; however, it all leads back to the same error.
I also tried the FeaturePrioritizer, and still getting the same error.
On another note, @hkingsbury thank you so much for the kind words and encouragement!
To break it down a little more, because the terminology in the Errors / Writer can be confusing.
When you see the term “Schema Feature” in Errors or Parameters, what it means is a Feature that is Attributed in a way that matches the definition of what FME will consider a “Schema” Feature. A Schema Feature must have:
A feature type name Attribute that contains the text name of the feature type, Eg. an Attribute called “TableName” with a value of “MyFirstTable”. Usually this will be fme_feature_type to match the main features’ feature type name attribute, but so long as it has the same name as the corresponding feature type name attribute on the related data features being written, it can be anything. In fact, because there are some annoying quirks with the fme_feature_type attribute, often I will just rename it to “TableName”.
A List of Attribute Names and Attribute FME Data Types. Unlike the Feature Type Name Attribute, which can be named anything, this List must use the format attribute{i}.name for the Attribute Names. Further, it must have either attribute{i}.fme_data_type or attribute{i}.native_data_type for the Attribute Data Type. If the Schema Feature uses attribute{i}.native_data_type, then the Writer will attempt to use this as an explicit data type definition. If going from Geodatabase to Geodatabase, this will work fine since they have common native data types. If it instead uses attribute{i}.fme_data_type, then the Writer will attempt to use this as an implicit data type definition: the Writer makes an educated guess as to which native data type is equivalent to the fme_data_type, based on the data format being written, and creates the table/feature class with that. This is also one of the reasons why SchemaScanner can be less accurate: it doesn’t know the source native data type, only the fme_data_type. Further, SchemaScanner only knows what Eg. character limit is needed for the particular features within the workspace. That sample may or may not include features that use the maximum number of characters ever going to be used in the geodatabase, so it will almost always lead to Geodatabases with fields that have smaller character widths than the original Geodatabase source.
Optionally, an extra List value that defines the Geometry Type called fme_geometry{0}. In dynamic writers there are a few different ways to specify the geometry type so this special Schema Feature attribute is optional. In the example below this has been determined to be “fme_no_geom” in the Schema Feature output by the Geodatabase FeatureReader. ie. It is a Geodatabase Table!
So see the “Schema Feature” as a Table Definition. It holds the Table Name, all the Table Field Names and their Data Types, and optionally what is the Geometry Type. Knowing this, we can manipulate the Schema Feature/Table Definition for when we need to say, remove, add or rename fields from the destination table to be written (Noting that for Rename we need to do the same on the Main Features)
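To make the “Table Definition” idea concrete, here is a minimal Python sketch that models a Schema Feature as a plain dictionary and removes fields from it, re-indexing the list afterwards. It is an illustration only, not FME's API: the helper name remove_schema_fields, the lowercase attribute{i}.name key spelling, and the sample fields and data types are assumptions made for the example.

```python
import re

# Matches flattened list keys such as 'attribute{2}.fme_data_type'.
LIST_KEY = re.compile(r"^attribute\{(\d+)\}\.(\w+)$")


def remove_schema_fields(schema_feature, fields_to_remove):
    """Return a copy of a schema feature without the given fields."""
    entries, result = {}, {}
    for key, value in schema_feature.items():
        match = LIST_KEY.match(key)
        if match:
            # Group list components (name, data type) by their list index.
            entries.setdefault(int(match.group(1)), {})[match.group(2)] = value
        else:
            # Non-list attributes, such as the feature type name, pass through.
            result[key] = value
    # Keep surviving entries in their original order and renumber from 0.
    kept = [entries[i] for i in sorted(entries)
            if entries[i].get("name") not in fields_to_remove]
    for new_index, entry in enumerate(kept):
        for component, value in entry.items():
            result[f"attribute{{{new_index}}}.{component}"] = value
    return result


# Hypothetical schema feature for a table with fields A, B and C.
schema = {
    "fme_feature_type_name": "Hydrants",
    "attribute{0}.name": "A", "attribute{0}.fme_data_type": "fme_varchar(50)",
    "attribute{1}.name": "B", "attribute{1}.fme_data_type": "fme_int32",
    "attribute{2}.name": "C", "attribute{2}.fme_data_type": "fme_real64",
}
print(remove_schema_fields(schema, {"B"}))
```

The point of the sketch is that dropping a field from the data features does not touch this definition; the matching entry has to be removed from the Schema Feature as well, otherwise the destination table is still created with the field.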
Now, above we said there were a couple more options for Geometries in Dynamic Writing. Whilst we can add/ensure there is an fme_geometry{0} Attribute in the Schema Feature, what I prefer to do instead is let the Writer figure out the geometry type itself. It can do this by looking at the geometry of the first data feature for that feature type and using its geometry type to set this for the Writer Schema.
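The “first data feature decides” rule can be pictured with a short Python sketch. This is not how the Writer is implemented internally; it only illustrates the rule as described, and the geometry_type key, its values and the feature class names are hypothetical.

```python
def infer_geometry_types(data_features, name_attr="fme_feature_type"):
    """Map each feature type to the geometry type of its first data feature."""
    geometry_by_type = {}
    for feature in data_features:
        # setdefault only stores a value the first time a feature type is seen.
        geometry_by_type.setdefault(feature[name_attr], feature["geometry_type"])
    return geometry_by_type


# Hypothetical mixed stream of point and line feature classes.
features = [
    {"fme_feature_type": "Hydrants", "geometry_type": "point"},
    {"fme_feature_type": "Mains", "geometry_type": "line"},
    {"fme_feature_type": "Hydrants", "geometry_type": "point"},
]
print(infer_geometry_types(features))
# {'Hydrants': 'point', 'Mains': 'line'}
```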
Now that we know how Schema Features work, the error message in the log perhaps makes a little more sense. What it is saying is that the Writer has not received a Schema Feature before it attempted to write the main data features, and it needs this because the user has specified that the Table Definition/Schema will be set by this separate Schema Feature for the Hydrants table.
This is usually from one of two things:
There was no separate Schema Feature sent to the Writer, with a feature type name attribute the same as the feature type name attribute in the main data features.
The Schema Feature arrived at the Writer after the main data features. See in my above sample I am using FeaturePrioritizer to make sure that this does not happen.
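Both causes can be checked mechanically. The Python sketch below is a diagnostic illustration under stated assumptions, not FME code: features are modelled as dictionaries, a Schema Feature is recognised by carrying an attribute{0}.name key, and the function names and the Valves and Mains feature types are made up.

```python
def is_schema_feature(feature):
    # Assumption for this sketch: only schema features carry the attribute list.
    return "attribute{0}.name" in feature


def diagnose(stream, name_attr="fme_feature_type"):
    """Return {feature type: 'missing' | 'late'} for problematic feature types."""
    seen_schema, problems = set(), {}
    for feature in stream:
        name = feature[name_attr]
        if is_schema_feature(feature):
            seen_schema.add(name)
        elif name not in seen_schema:
            # Data arrived before any schema feature for this feature type.
            problems.setdefault(name, "missing")
    # A schema feature that did show up, but after the data, is late, not missing.
    for name in problems:
        if name in seen_schema:
            problems[name] = "late"
    return problems


def schema_first(stream):
    """Reorder so schema features lead; sorted() is stable within each group."""
    return sorted(stream, key=lambda f: not is_schema_feature(f))


stream = [
    {"fme_feature_type": "Hydrants", "A": 1},                  # data before schema
    {"fme_feature_type": "Hydrants", "attribute{0}.name": "A"},
    {"fme_feature_type": "Valves", "D": 2},                    # no schema at all
    {"fme_feature_type": "Mains", "attribute{0}.name": "F"},
    {"fme_feature_type": "Mains", "F": 3},
]
print(diagnose(stream))                # {'Hydrants': 'late', 'Valves': 'missing'}
print(diagnose(schema_first(stream)))  # {'Valves': 'missing'}
```

Reordering fixes the late case only; a feature type with no matching Schema Feature still fails, which is why the feature type name values on the schema and data features have to match exactly.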
@bwn thank you for such detailed responses and explanations, it has definitely helped me understand the inner workings of FME, that’s for sure.
Unfortunately, I am still getting the same error. Since the attribute remover is modifying the schema, it seems that the dynamic writer does not know how to interpret the changes, and that is causing the error. If I change the schema source in the writer to the SDE reader, it executes successfully; however, this exports a feature class with all the original fields, including the fields/attributes I removed in the attribute remover.
No matter what I input into the ‘feature class or table name’ option or the ‘schema definition name’ option, I still get the same error. I also ensured the fme_feature_type values are the same as the feature class name.
I am really not sure what is going on or what I am doing wrong, but it seems it should not be as difficult as it actually is.