I'm not sure if this helps you or not, but you can use a Recorder transformer to create a temp file. This is essentially the tool that was used before feature caching came along - I'm pretty sure it's all the FeatureCache is anyway. It means you don't need to worry about the schema, but it really doesn't solve the problem.
The AttributeManager and AttributeValidator can only ever act on attributes which are exposed. The SchemaScanner is a way to define/extract the schema from the attributes based on their values where the attributes are not exposed (I guess in much the same way the CSV reader works).
Being able to set the type in the AttributeManager is really handy but is kind of pointless like you say when you don't know what the schema is anyway. Where it's helpful is when you do know what the schema is and you want to set the attribute type without having to manually change the output FeatureType schema (you can keep the schema definition as automatic) - the manual work gets old fast when you have several FeatureTypes, especially when you want to add more attributes to an existing workspace.
If you have a mixture of Defined Schema and that from a SchemaFeature it will prefer the defined definition over the one in the SchemaFeature (or at least it did for the few formats I tested).
Here I have only one attribute exposed and I've set the width to be 3 (in an AttributeManager earlier in the workflow). The SchemaScanner has it as "fme_data_type: fme_varchar(7)".
When I read the data back into FME I can see the defined type has kept the length of 3 and has indeed truncated my string.
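If you want to check exactly what the SchemaScanner produced, you can inspect its schema feature with a PythonCaller. Here's a minimal sketch, assuming the output follows the documented attribute{n}.name / attribute{n}.fme_data_type schema-feature representation (the class name here is just whatever you point the PythonCaller at):

import fmeobjects

class SchemaLogger(object):
    def input(self, feature):
        # Walk the attribute{} list on the schema feature and log each
        # attribute name together with its scanned fme_data_type.
        logger = fmeobjects.FMELogFile()
        i = 0
        while True:
            name = feature.getAttribute("attribute{%d}.name" % i)
            if name is None:
                break
            data_type = feature.getAttribute("attribute{%d}.fme_data_type" % i)
            logger.logMessageString("%s -> %s" % (name, data_type), fmeobjects.FME_INFORM)
            i += 1
        self.pyoutput(feature)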
I'm not sure if this helps or not for your case.
> Being able to set the type in the AttributeManager is really handy but is kind of pointless like you say when you don't know what the schema is anyway.
Well, this is for a dynamic custom transformer, so I wouldn't know what the schema is on the inside (since it may change between instances), but I would know what it is on the outside. I would expect an AttributeMapper upstream from (i.e. outside) my transformer to be enough to set up the types for columns that shouldn't default to text, and to have them be readable in some way from inside the transformer, but it doesn't seem like that's the case.
> I'm not sure if this helps you or not, but you can use a Recorder transformer to create a temp file. This is essentially the tool that was used before feature caching came along - I'm pretty sure it's all the FeatureCache is anyway. It means you don't need to worry about the schema, but it really doesn't solve the problem.
That would actually be massively helpful in my case if FME Feature Storage/FME Feature Table files were supported outside of FME, so that Python/C/Java/system tools could extract, process and save the feature data without having to deal with feature objects (which, in my experience, are very slow).
> If you have a mixture of Defined Schema and that from a SchemaFeature it will prefer the defined definition over the one in the SchemaFeature (or at least it did for the few formats I tested)
Unfortunately, you can't have "Feature Type definition" as a parameter type for custom transformers, and you can't have reader or writer nodes in them either, so that's not useful in my case.
> I'm not sure if this helps or not for your case
It's almost what I would need, but unfortunately not quite. Your input is much appreciated, though.
This video gives a good insight into the whole thing: https://www.youtube.com/watch?v=_MoalhW8zlA - it explains some of the decisions.
Ah, I see, that clears up a lot of things. Thank you, Matt!
What I would be looking for, then, is some hypothetical AutomaticSchemaExtractor transformer that derives the feature schema at design time, the same way writer nodes do, and outputs it on a separate port (with an attribute{} list) like a FeatureReader or a SchemaScanner. This is the missing piece for nodes that need a schema fed to them dynamically at runtime (like a generic custom transformer, possibly with an intermediate FeatureWriter, or maybe a schema-aware but semi-generic PythonCaller) while still working on features of known types. In my use case, I could probably just have a Creator + AttributeManager build the schema feature manually, but that would be very tedious to create (one translation I would be interested in using this on has 106 columns), and I would then need to be sure to keep it up to date with the data format from the upstream nodes if the schema evolves.
So I guess the best I could manage in current FME would be to get something like this working, which tries to extract the internal FME attribute types as exposed to Python and generate a dynamic schema feature at the beginning (like a FeatureReader does) that the FeatureWriter could then pick up and use to define the columns dynamically. Besides not working for a reason I don't quite understand (I'm getting a "feature does not contain schema information" error), I'm not sure how I would get rid of format attributes like fme_type or multi_reader_id.
import fme
import fmeobjects
from fmeobjects import FMEFeature
from fmegeneral.plugins import FMEEnhancedTransformer
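
# Maps the fmeobjects attribute type constants (as returned by
# FMEFeature.getAttributeType) to fme_data_type names for the schema feature.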
fme_attribute_type_map = {
    fmeobjects.FME_ATTR_BOOLEAN: "fme_boolean",
    fmeobjects.FME_ATTR_INT8: "fme_int8",
    fmeobjects.FME_ATTR_UINT8: "fme_uint8",
    fmeobjects.FME_ATTR_INT16: "fme_int16",
    fmeobjects.FME_ATTR_UINT16: "fme_uint16",
    fmeobjects.FME_ATTR_INT32: "fme_int32",
    fmeobjects.FME_ATTR_UINT32: "fme_uint32",
    fmeobjects.FME_ATTR_INT64: "fme_int64",
    fmeobjects.FME_ATTR_UINT64: "fme_uint64",
    fmeobjects.FME_ATTR_REAL32: "fme_real32",
    fmeobjects.FME_ATTR_REAL64: "fme_real64",
    fmeobjects.FME_ATTR_STRING: "fme_buffer",  # System-encoded strings, like windows-1252
    fmeobjects.FME_ATTR_ENCODED_STRING: "fme_buffer",  # UTF-8 and US-ASCII strings
    # No info on these
    fmeobjects.FME_ATTR_REAL80: "fme_real64",
    fmeobjects.FME_ATTR_UNDEFINED: "fme_buffer",
}


class FeatureProcessor(FMEEnhancedTransformer):
    def setup(self, first_feature: FMEFeature):
        """
        This method is only called for the first input feature.
        Implement this method to perform any necessary setup operations,
        such as getting constant parameters.
        """
        schema_feature = FMEFeature()
        schema_feature.setAttribute("fme_schema_handling", "schema_only")
        # https://docs.safe.com/fme/html/FME-Form-Documentation/FME-ReadersWriters/schema_from_table/Feature_Representation.htm
        # https://community.safe.com/s/article/dynamic-workflow-tutorial-destination-schema-is-de-2
        schema_feature.setAttribute("fme_feature_type_name", first_feature.getFeatureType())
        for count, attr in enumerate(first_feature.getAllAttributeNames()):
            attr_fme_type_int = first_feature.getAttributeType(attr)
            attr_fme_type = fme_attribute_type_map.get(attr_fme_type_int, "fme_buffer")
            schema_feature.setAttribute(f"attribute{{{count}}}.name", attr)
            schema_feature.setAttribute(f"attribute{{{count}}}.fme_data_type", attr_fme_type)
        self.pyoutput(schema_feature)

    def receive_feature(self, feature: FMEFeature):
        """
        Override this method instead of :meth:`input`.
        This method receives all input features, including the first one
        that's also passed to :meth:`setup`.
        """
        self.pyoutput(feature)

    def finish(self) -> None:
        """Override this instead of :meth:`close`."""
        pass
If that did work, it would be enough to resolve my issue.
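As for the format attributes, one possible (untested) tweak would be to filter the attribute names before building the schema feature. The prefix list below is an assumption based on the usual fme_* and multi_reader_*/multi_writer_* naming; it would need extending per format, and it doesn't explain the schema error above:

# Hypothetical filter: skip well-known format-attribute prefixes when
# building the schema feature (the prefix list is an assumption, not exhaustive).
FORMAT_ATTR_PREFIXES = ("fme_", "multi_reader_", "multi_writer_")


def is_user_attribute(name: str) -> bool:
    """Return True for attributes that should appear in the generated schema."""
    return not name.startswith(FORMAT_ATTR_PREFIXES)


# The loop in setup() would then enumerate only the filtered names:
#   user_attrs = filter(is_user_attribute, first_feature.getAllAttributeNames())
#   for count, attr in enumerate(user_attrs):
#       ...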