
PythonCreator to support bulk mode output

Related products: FME Form, Integrations, Transformers
  • November 7, 2025
  • 7 replies
  • 216 views

bruceharold
Supporter

With libraries like DuckDB in your Python environment (as is the case when you have ArcGIS Pro or Enterprise on your FME machine, or you’re using ArcGIS Data Interoperability), it is easy to retrieve potentially millions of features from an S3-API-compliant object store or other web source in seconds, but sending them on into the workspace one by one is a bottleneck.

I would like to see performance like that of reading CSV files brought to PythonCreator.

There is a possibly related existing idea, Introduce a Python Dataframe Creator/Transformer, but I don’t want to conflate using dataframes with this idea, which is fundamentally about performance.

Dataframes might be how this idea is implemented, but my guess is that would be a heavy lift for Safe; another way might be to output an aggregate feature.

7 replies

PierreAtSafe
Safer
  • November 10, 2025

bruceharold
Supporter
  • Author
  • November 10, 2025

I refactored a test workspace to use DuckDB in a PythonCaller, which has a class method to declare support for bulk mode, but using a feature list (as in the PythonCaller help) to hold what DuckDB read from source filled memory (16 million point features), so it wasn’t really a solution. Backing off to a cursor worked. If we could go directly from a DuckDB relation to a feature table we might avoid the memory issues and get features in 100K chunks.
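The cursor approach described above is the standard DB-API `fetchmany` pattern, which DuckDB’s Python cursor also implements. A minimal stdlib sketch using sqlite3 instead of DuckDB (table name, column names, and chunk size are illustrative):

```python
import sqlite3

def feature_chunks(cursor, sql, chunk_size=100_000):
    """Yield rows in fixed-size batches instead of materializing them all."""
    cursor.execute(sql)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows

# Toy demonstration with an in-memory database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pts (x REAL, y REAL)")
con.executemany("INSERT INTO pts VALUES (?, ?)", [(i, i) for i in range(250)])

sizes = [len(chunk) for chunk in
         feature_chunks(con.cursor(), "SELECT * FROM pts", chunk_size=100)]
print(sizes)  # → [100, 100, 50]
```

The same loop shape works against a DuckDB cursor; only the connection setup differs.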


bruceharold
Supporter
  • Author
  • November 10, 2025

Now I discover AttributeKeeper can create bulk mode features downstream, which may help...


vlroyrenn
Enthusiast
  • December 11, 2025

There are hacks to force bulk mode output on Python transformers, but the current structure of the factory underlying PythonCaller/PythonCreator completely prevents direct interaction with bulk mode features inside Python.

Bulk mode, as far as I understand it, passes up to one hundred thousand features as a single feature that has almost no attributes and a geometry of type "fme_feature_table". That geometry type is not available through the fmeobjects Python bindings (nor Java or C#, by the looks of it), and there is no documented way of interacting with it (the fmeobjects C++ bindings have no headers for IFeatureTable, and there is no IFeatureTableIterator like there is for other geometry types, either), so there doesn’t seem to be any intended way for users to build their own bulk mode readers/writers/transformers.

Python code can only output feature tables if it returns features that have been marked as part of a feature table, which is how hasSupportFor(FME_SUPPORT_FEATURE_TABLE_SHIM) can preserve the “bulk-ness” of incoming feature tables (so long as you only return features that were passed in) and how the aforementioned hack can abuse the AttributeKeeper trick to output brand-new bulk mode features, even though there is no way to create bulk-mode FMEFeatures directly.
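A pass-through PythonCaller that opts into the shim looks roughly like the sketch below. FME_SUPPORT_FEATURE_TABLE_SHIM and self.pyoutput() are real parts of the PythonCaller API, but fmeobjects only exists inside FME, so the constant is stubbed here to keep the sketch standalone; treat the details as illustrative:

```python
# fmeobjects is only importable inside an FME Python environment;
# stub the support-type constant so the logic can be shown on its own.
try:
    from fmeobjects import FME_SUPPORT_FEATURE_TABLE_SHIM
except ImportError:  # running outside FME
    FME_SUPPORT_FEATURE_TABLE_SHIM = object()  # placeholder value

class FeatureProcessor:
    def has_support_for(self, support_type):
        # Tell the factory this transformer can accept bulk mode
        # (feature table) input via the shim.
        return support_type == FME_SUPPORT_FEATURE_TABLE_SHIM

    def input(self, feature):
        # Returning only features that came in, unmodified, is what
        # preserves their "bulk-ness" on the way out.
        self.pyoutput(feature)
```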

 

Dataframes might be how this idea is implemented, but my guess is that would be a heavy lift for Safe, another way might be a way to output an aggregate feature.

Honestly, given that the documentation says “A feature table has a schema. The schema defines a list of attributes in the feature table. Each attribute has a name and type. Possible attribute types are Real64, Real32, UInt64, UInt32, UInt16, UInt8, Int64, Int32, Int16, Int8, Boolean, and String.”, I think a dataframe-like container would be by far the most practical interchange format for bulk mode features.
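To make the idea concrete, here is a toy columnar container restricted to the attribute types quoted above. The ColumnTable class and its methods are entirely hypothetical, not part of fmeobjects; this only illustrates why a typed, column-oriented structure maps naturally onto the documented feature table schema:

```python
# Allowed attribute types, per the quoted feature table documentation.
ALLOWED_TYPES = {
    "Real64", "Real32", "UInt64", "UInt32", "UInt16", "UInt8",
    "Int64", "Int32", "Int16", "Int8", "Boolean", "String",
}

class ColumnTable:
    """Hypothetical dataframe-like container with a feature-table schema."""

    def __init__(self, schema):
        # schema is a list of (name, type) pairs, validated up front.
        for name, ftype in schema:
            if ftype not in ALLOWED_TYPES:
                raise ValueError(f"unsupported type {ftype!r} for {name!r}")
        self.schema = list(schema)
        self.columns = {name: [] for name, _ in schema}

    def append_row(self, row):
        # Store values column-wise, one list per attribute.
        for (name, _), value in zip(self.schema, row):
            self.columns[name].append(value)

t = ColumnTable([("id", "Int64"), ("label", "String")])
t.append_row((1, "a"))
t.append_row((2, "b"))
print(t.columns["id"])  # → [1, 2]
```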

 

EDIT: I just noticed IFeatureTable is supposed to have an fme_geometrytype of fme_aggregate (possibly with schema feature attributes), meaning it would (probably?) have the same interface as other aggregate geometries for iteration and such.


bruceharold
Supporter
  • Author
  • December 11, 2025


Thank you for your thoughtful reply!


vlroyrenn
Enthusiast
  • January 13, 2026

I checked with support and I’ve been told that feature tables having no documented interface is intentional, so that Safe doesn’t have to maintain compatibility with third-party code for a feature that’s still evolving. That rules out any community-maintained solution in the meantime that interfaces with FeatureTables directly instead of using shims, factory hacks, or temporary files.

Despite what I said in the “Python Dataframe Creator” idea, I don’t think splitting PythonCaller is the right approach, at least not anymore. I would actually propose the following:

  • The fmeobjects library adds an FMEFeatureTable class, which is opaque and primarily exposes two methods: FMEFeatureTable.__dataframe__(self, allow_copy=True) and FMEFeatureTable.from_dataframe(df):

    • FMEFeatureTable.__dataframe__() is the main entry point for the dataframe interchange protocol, which dataframe libraries would use to read or copy the feature table data for users to work with. It is the only method actually required on FMEFeatureTable, as all the others are expected to exist on the interchange object this function returns.

    • Using the dataframe interchange protocol instead of returning a Pandas dataframe object or similar means FME does not tie itself to any specific dataframe library and can use its native APIs under the hood in whichever way is most efficient.

    • FMEFeatureTable.from_dataframe(), then, would let users turn their own dataframe objects into feature tables that the PythonCaller can output.

  • In addition to that, PythonCaller.has_support_for() would receive a new support flag to test for, FME_SUPPORT_FEATURE_TABLES_ONLY, which would determine whether the transformer expects input()/input_from() (or a new third method if we don’t want to overload the current ones) to receive FMEFeature objects or FMEFeatureTable objects.

    • Because FME_SUPPORT_FEATURE_TABLE_SHIM has been the only supported flag for several years now (so there may be people who have that check written as def has_support_for(self, support_type): return True), FEATURE_TABLES_ONLY should be tested first, but if FEATURE_TABLE_SHIM also returns true, then FEATURE_TABLES_ONLY should be considered false. This lets transformers that need to support multiple versions of FME activate either the input shim or the table input mode depending on what’s available (by tracking whether FEATURE_TABLES_ONLY was checked for with an instance variable), without breaking compatibility with transformers that naïvely assume has_support_for() will never ask for anything other than FEATURE_TABLE_SHIM.

    • I’m proposing FME_SUPPORT_FEATURE_TABLES_ONLY because, as it currently exists in fmeobjects\cpp\fmesupporttype.h, it does not require has_support_for() to examine the incoming feature table (which it would have no means of doing, because it only has one parameter, whereas the C++ version of that function can receive the geometry object and decide whether to process it as a table or broken up into individual features).

  • The main issue this would leave unaddressed is geometry, as there is no type for it in the dataframe protocol. I’m not currently (January 2026) sure how that should be addressed.
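The backward-compatibility rule above (the new flag wins only when the transformer doesn’t blanket-answer yes to everything) could look roughly like this on the factory side. Both flag values, the negotiate_input_mode() helper, and the mode names are hypothetical, sketched purely to show the precedence logic:

```python
# Hypothetical support flags; real values would live in fmeobjects.
FME_SUPPORT_FEATURE_TABLE_SHIM = "feature_table_shim"
FME_SUPPORT_FEATURE_TABLES_ONLY = "feature_tables_only"

def negotiate_input_mode(transformer):
    """Decide how the factory should feed the transformer.

    A legacy implementation written as `return True` answers yes to both
    flags; that case is treated as shim mode so existing workspaces keep
    working unchanged.
    """
    wants_tables = transformer.has_support_for(FME_SUPPORT_FEATURE_TABLES_ONLY)
    wants_shim = transformer.has_support_for(FME_SUPPORT_FEATURE_TABLE_SHIM)
    if wants_tables and not wants_shim:
        return "feature_tables"   # hand over FMEFeatureTable objects
    if wants_shim:
        return "shim"             # shimmed bulk mode pass-through
    return "individual_features"  # classic one-feature-at-a-time input

class LegacyCaller:
    def has_support_for(self, support_type):
        return True  # naive blanket yes, as found in old transformers

class TableCaller:
    def has_support_for(self, support_type):
        return support_type == FME_SUPPORT_FEATURE_TABLES_ONLY

print(negotiate_input_mode(LegacyCaller()))  # → shim
print(negotiate_input_mode(TableCaller()))   # → feature_tables
```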

Now, I’m fully aware that there is likely a ton of context I’m missing as to why adding this sort of feature might be a lot more complicated than what I’m proposing here, but I think it strikes a good balance between maintenance burden, not exposing private interfaces, and ensuring compatibility with whatever users might want to do.


vlroyrenn
Enthusiast
  • January 13, 2026

Having looked into it some more, I stumbled upon this GitHub issue by the maintainer of the Narwhals dataframe compatibility layer (also a Pandas and Polars contributor), who points out that the dataframe interchange protocol didn’t take off as expected and that the Arrow PyCapsule interface looks like what the ecosystem is settling on (although the biggest non-GPU libraries, like Pandas, Polars and Narwhals, support both). Arrow does have standardized spatial extension types, though interop for these is apparently not quite there yet.

I didn’t think of it earlier, but you can serialize geometry to WKB/WKT, which I guess would make a good stopgap for passing FME feature geometry as a dataframe column.
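For a point geometry, the WKB stopgap is just a byte-order flag, a geometry-type code, and two doubles (per the OGC Simple Features well-known binary layout), so it packs cleanly into a binary dataframe column with nothing but the stdlib:

```python
import struct

def point_to_wkb(x, y):
    # WKB Point: byte-order marker (1 = little-endian),
    # geometry type code (1 = Point), then x and y as float64.
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(buf):
    _order, gtype, x, y = struct.unpack("<BIdd", buf)
    assert gtype == 1, "not a WKB Point"
    return x, y

wkb = point_to_wkb(-123.1, 49.3)
print(len(wkb), wkb_to_point(wkb))  # → 21 (-123.1, 49.3)
```

Real geometries would of course use a proper WKB library rather than hand-packing, but this shows why a bytes column is a workable carrier until Arrow’s spatial extension types mature.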