Open

PythonCreator to support bulk mode output

Related products: FME Form, Integrations, Transformers
  • November 7, 2025
  • 5 replies
  • 152 views

bruceharold
Supporter

With a library like DuckDB in your Python environment (which is the case if you have ArcGIS Pro or Enterprise on your FME machine, or you’re using ArcGIS Data Interoperability), it is easy to retrieve potentially millions of features from an S3-API compliant object store or other web source in seconds. Sending them on into the workspace one by one, however, is a bottleneck.

I would like to see performance like reading CSV files brought to PythonCreator.

There is a possibly related existing idea, “Introduce a Python Dataframe Creator/Transformer”, but I don’t want to conflate the use of dataframes with this idea, which is fundamentally about performance.

Dataframes might be how this idea is implemented, but my guess is that would be a heavy lift for Safe. Another option might be a way to output an aggregate feature.

5 replies

PierreAtSafe
Safer
New → Open

bruceharold
Supporter
  • Author
  • Supporter
  • November 10, 2025

I refactored a test workspace to use DuckDB in a PythonCaller, which has a class method to declare support for bulk mode. However, building a feature list (as in the PythonCaller help) from what DuckDB read from the source (16 million point features) filled memory, so it wasn’t really a solution. Backing off to a cursor worked. If we could go directly from a DuckDB relation to a feature table, we might avoid the memory issues and get features in 100K chunks.
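
For readers comparing the two approaches, the cursor pattern can be sketched roughly as below. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for DuckDB so the sketch runs anywhere, and iter_chunks, demo, the points table, and the chunk sizes are hypothetical names and values chosen for the example. DuckDB connections also offer a DB-API style fetchmany, so the same generator should carry over, but treat that as an assumption to verify in your own environment.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_chunks(cursor, chunk_size: int = 100_000) -> Iterator[List[Tuple]]:
    """Yield rows from a DB-API cursor in lists of at most chunk_size rows.

    Only one chunk is held in memory at a time, unlike building a single
    list of every row before emitting anything.
    """
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def demo(total_rows: int = 250, chunk_size: int = 100) -> List[int]:
    """Return the size of each chunk read from a small in-memory table."""
    con = sqlite3.connect(":memory:")  # stand-in for a DuckDB connection
    con.execute("CREATE TABLE points (x REAL, y REAL)")
    con.executemany(
        "INSERT INTO points VALUES (?, ?)",
        ((float(i), float(i) * 2.0) for i in range(total_rows)),
    )
    cur = con.execute("SELECT x, y FROM points")
    sizes = [len(chunk) for chunk in iter_chunks(cur, chunk_size)]
    con.close()
    return sizes


if __name__ == "__main__":
    print(demo())  # [100, 100, 50]
```

Inside a PythonCaller, each chunk would be turned into features and output before the next chunk is fetched, which keeps peak memory proportional to the chunk size rather than to the full result set.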


bruceharold
Supporter
  • Author
  • Supporter
  • November 10, 2025

Now I discover AttributeKeeper can create bulk mode features downstream, which may help...


vlroyrenn
Enthusiast
  • Enthusiast
  • December 11, 2025

There are hacks to force bulk mode output on Python transformers, but the current structure of the PythonCaller/PythonCreator underlying factory completely prevents direct interaction with bulk mode features inside Python.

Bulk mode, as far as I understand it, passes up to one hundred thousand features as a single feature that has pretty much no attributes and a geometry of type "fme_feature_table". That geometry type is not available through the fmeobjects Python bindings (nor Java or C#, by the looks of it), and there’s no documented way of interacting with it (the fmeobjects C++ bindings don’t have any headers for IFeatureTable, and there is no IFeatureTableIterator like there is for other geometry types, either), so there doesn’t seem to be any intended way for users to build their own bulk mode readers/writers/transformers.

Python code can only output feature tables if it returns features that have been marked as being part of a feature table. That is how hasSupportFor(FME_SUPPORT_FEATURE_TABLE_SHIM) can preserve the “bulk-ness” of incoming feature tables (so long as you only return features that were passed in), and how the aforementioned hack can abuse the AttributeKeeper trick to output brand new bulk mode features, even though there is no way to create bulk-mode FMEFeatures directly.
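
A rough sketch of the pass-through case described above might look like the following. It assumes the has_support_for method name and the fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM constant as they appear in the PythonCaller template of recent FME releases (the paragraph above spells the method hasSupportFor), and that FME supplies pyoutput at runtime. The SHIM fallback value is a placeholder so the sketch can be imported outside FME; it is not the real constant.

```python
try:
    import fmeobjects  # only importable inside an FME Python environment
    SHIM = fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM
except (ImportError, AttributeError):
    SHIM = -1  # placeholder for use outside FME, NOT the real constant value


class FeatureProcessor(object):
    """Pass-through class that opts in to the feature table shim."""

    def has_support_for(self, support_type):
        # FME asks which optional capabilities this class supports;
        # answer yes only for the feature table shim.
        return support_type == SHIM

    def input(self, feature):
        # Emit the same feature object that was passed in. Returning a
        # newly constructed feature here would not be part of a table.
        # pyoutput is assumed to be provided by FME at runtime.
        self.pyoutput(feature)

    def close(self):
        pass
```

Whether the output stays in bulk mode still depends on FME; the class only declares support and avoids creating features of its own.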

 

Dataframes might be how this idea is implemented, but my guess is that would be a heavy lift for Safe, another way might be a way to output an aggregate feature.

Honestly, given how they say “A feature table has a schema. The schema defines a list of attributes in the feature table. Each attribute has a name and type. Possible attribute types are Real64, Real32, UInt64, UInt32, UInt16, UInt8, Int64, Int32, Int16, Int8, Boolean, and String.”, I think a dataframe-like container would be by far the most practical interchange format for bulk mode features.
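
To make the "dataframe-like container" idea concrete, here is a minimal sketch of a columnar structure with a typed schema. Everything in it is hypothetical: TableSketch, ALLOWED_TYPES, append, and chunks are illustration names, not part of fmeobjects or any FME API. The type names are the ones from the quoted description, and the default chunk size follows the "up to one hundred thousand features" figure mentioned earlier in this reply.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple

# Attribute type names taken from the feature table description quoted above.
ALLOWED_TYPES = {
    "Real64", "Real32", "UInt64", "UInt32", "UInt16", "UInt8",
    "Int64", "Int32", "Int16", "Int8", "Boolean", "String",
}


@dataclass
class TableSketch:
    """Hypothetical columnar container: a schema plus one list per attribute."""

    schema: List[Tuple[str, str]]  # (attribute name, type name) pairs
    columns: Dict[str, List[Any]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name, type_name in self.schema:
            if type_name not in ALLOWED_TYPES:
                raise ValueError(f"unsupported attribute type: {type_name}")
            self.columns.setdefault(name, [])

    def append(self, row: Dict[str, Any]) -> None:
        # Missing attributes are stored as None so columns stay aligned.
        for name, _ in self.schema:
            self.columns[name].append(row.get(name))

    def __len__(self) -> int:
        if not self.schema:
            return 0
        return len(self.columns[self.schema[0][0]])

    def chunks(self, size: int = 100_000) -> Iterator["TableSketch"]:
        # Split into tables of at most `size` rows, all sharing one schema.
        for start in range(0, len(self), size):
            yield TableSketch(
                self.schema,
                {n: col[start:start + size] for n, col in self.columns.items()},
            )
```

A structure like this maps naturally onto a dataframe or an Arrow-style table, which is part of why a dataframe-like interchange format looks practical; how FME would actually consume it is an open question.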

 

EDIT: I just noticed IFeatureTable is supposed to have an fme_geometrytype of fme_aggregate (possibly with schema feature attributes), meaning feature tables would (probably?) have the same interface as other aggregate features for iteration and such.


bruceharold
Supporter
  • Author
  • Supporter
  • December 11, 2025

Thank you for your thoughtful reply!