Open

Support for Apache Arrow for Python Caller

Related products: Transformers

1 year ago
May 23, 2024
4 replies
54 views

raean
Contributor
2 replies

Support for the Apache Arrow in-memory data exchange format as input to the Python Caller, with the option for the script to return an Apache Arrow object as well, would be great. There would be a number of advantages to this:

Removes the need to install fmeobjects into my Python environment.
Makes the Python interface cleaner and easier to use: just convert the Apache Arrow object into your data frame. This lowers the barrier to using Python in workflows.
Scripts could more easily be developed and tested in IDEs like VS Code and PyCharm, making script development far nicer.
Arrow is supported by many libraries, e.g. TensorFlow, pandas, Polars, DuckDB and PySpark.
Would make it easier to embed machine learning models into workflows.
Potentially opens up the integration of other languages into FME, since the Arrow format is language-agnostic.
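
To illustrate the point about developing and testing scripts outside FME, here is a minimal sketch of script logic written as a plain function over column-oriented data. A dict of column lists stands in for the Arrow table so that the example runs without pyarrow or fmeobjects. The function name and the sample columns are hypothetical and not part of any FME API.

```python
from statistics import mean


def fill_missing_with_mean(table, column):
    """Return a copy of `table` with None values in `column` replaced by the column mean.

    `table` is a dict mapping column names to equal-length lists. It is only a
    stand-in for an Arrow table (assumption made for this sketch).
    """
    # Copy every column so the caller's data is left untouched.
    result = {name: list(values) for name, values in table.items()}
    known = [value for value in table[column] if value is not None]
    if not known:
        # Nothing to compute a mean from; return the copy unchanged.
        return result
    fill = mean(known)
    result[column] = [fill if value is None else value for value in table[column]]
    return result


# Hypothetical sample data, testable in any IDE without an FME installation.
sample = {"id": [1, 2, 3], "height": [10.0, None, 20.0]}
filled = fill_missing_with_mean(sample, "height")
```

Because the function only sees columns, it can be exercised with ordinary unit tests, and the same logic would apply if the input were converted from an Arrow table first.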

+13

vlroyrenn
Enthusiast
63 replies
1 year ago
June 11, 2024

Related Idea:

Intoduce a Python Dataframe Creator/Transformer Open
8 Votes

There are many ways FME could go about this, and it’s unclear to me at this time what can be done without breaking compatibility with existing flows. An alternative to passing Apache Arrow file handles/paths would be for FME to create some hypothetical FMEPythonFeatureTable object that’s compatible with the Dataframe Interchange Protocol, and pass that instead. This protocol defines a Python-level interface to a dataframe object that is deliberately very similar to Apache Arrow, but without actually depending on that library or on any other specific one.
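
As a rough sketch of the idea, the snippet below shows how a consumer could detect and query such an object purely through the protocol's `__dataframe__` entry point. The FMEPythonFeatureTable class here is hypothetical and deliberately incomplete: it only implements the descriptive methods `num_columns()`, `num_rows()` and `column_names()`. A real implementation would also have to expose the column buffers (`get_column()` and friends), which is the part dataframe libraries need in order to actually read the data.

```python
class FMEPythonFeatureTable:
    """Hypothetical table object; nothing like this exists in fmeobjects today."""

    def __init__(self, columns):
        # Store columns as a dict of name -> list (assumption for this sketch).
        self._columns = {name: list(values) for name, values in columns.items()}

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # Entry point defined by the interchange protocol. This sketch returns
        # itself because it implements the (partial) interface directly.
        return self

    def num_columns(self):
        return len(self._columns)

    def num_rows(self):
        first = next(iter(self._columns.values()), [])
        return len(first)

    def column_names(self):
        return list(self._columns)


def describe(obj):
    """Summarise any object that advertises the interchange protocol."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError("object does not support the dataframe interchange protocol")
    frame = obj.__dataframe__()
    return {"columns": list(frame.column_names()), "rows": frame.num_rows()}


summary = describe(FMEPythonFeatureTable({"id": [1, 2], "name": ["a", "b"]}))
```

The consumer never imports a dataframe library, which is the appeal of the protocol: the producer and the consumer only agree on the interface.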

+1

raean
Author
Contributor
2 replies
1 year ago
June 12, 2024

I don't think changing the current PythonCaller would be a good idea. After thinking about this some more, I can imagine a new transformer modelled on the InlineQuerier, say a PythonInlineQuerier, that exposes the incoming data through an Apache Arrow interface. You would then select your virtual env. An advanced option could accept a conda-lock file and build the environment if it does not already exist, which would be useful for sharing and deployment.

One possible use: I have developed a machine learning algorithm that predicts missing values in a table, so I could feed the table into the model and get the results back. I know there are other ways to do this, such as writing the data out and using a SystemCaller to run a .bat or PowerShell script, or using a WorkspaceRunner with a startup script, but these are clunky. The transformer could possibly be language agnostic as well.
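
The "build the environment if it did not exist" step could look roughly like the sketch below. Note the swap: it uses the standard library `venv` module rather than conda-lock, purely so that the example runs anywhere, and the `ensure_venv` helper name is made up for illustration. Installing the locked dependencies into the new environment is left out.

```python
import venv
from pathlib import Path


def ensure_venv(env_dir):
    """Create a virtual environment at `env_dir` unless one is already there.

    Returns True if the environment was created, False if it already existed.
    """
    env_path = Path(env_dir)
    # A venv writes a pyvenv.cfg file at its root; use it as the existence check.
    if (env_path / "pyvenv.cfg").exists():
        return False
    # with_pip=False keeps creation fast; a real setup would install packages next.
    venv.EnvBuilder(with_pip=False).create(env_path)
    return True
```

Calling `ensure_venv` twice with the same path creates the environment the first time and does nothing the second time, which is the behaviour you would want when a workspace is shared or deployed to a new machine.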

+13

vlroyrenn
Enthusiast
63 replies
1 year ago
June 12, 2024

The Python execution environment would be mostly orthogonal to how feature input and output are handled. Virtual environment support has been requested for a while, but there is still no news:

Is there a way to use a python virtual environment?

Python VENV Open
14 Votes

Until then, you’ll have to deal with having a single Python environment (per user) for FME Form and a separate one for FME Flow (remember to keep them in sync, FME doesn’t keep track of execution dependencies) that you manually install Python libs into. For local development, the best you can do is use fme-packager to connect your venv with your local FME import path.

The Apache Arrow advantages I have in mind are more performance-oriented and focused on efficient, practical dataframe processing, because correctly loading features into a dataframe is currently an error-prone and convoluted process for what is probably not an uncommon use case.
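
To give an idea of why this is error-prone, here is a minimal sketch of the row-to-column step, assuming each feature has already been reduced to a plain dict of attributes (how you get those dicts out of FME is not shown). Features do not necessarily carry the same attributes, so missing ones have to be padded with None to keep the columns aligned. The `features_to_columns` helper is hypothetical.

```python
def features_to_columns(features):
    """Turn a list of per-feature attribute dicts into a dict of equal-length columns."""
    # Collect attribute names in first-seen order (dict keys preserve insertion order).
    names = {}
    for attrs in features:
        for name in attrs:
            names.setdefault(name, None)
    # Build one list per attribute, padding with None where a feature lacks it.
    return {name: [attrs.get(name) for attrs in features] for name in names}


# Hypothetical features with differing attribute sets.
features = [
    {"id": 1, "name": "a"},
    {"id": 2, "height": 3.5},
]
columns = features_to_columns(features)
```

This still says nothing about attribute types, nulls versus missing values, or geometry, which is where a native Arrow hand-off would save the most effort.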

+15

LizAtSafe
Safer
1507 replies
10 months ago
August 9, 2024

New→Open
