Solved

Python Caller running 10x slower in flow than FME Server

  • 16 January 2024
  • 10 replies
  • 18 views

Badge +5

Hi,

We recently upgraded from FME Server to FME Flow, and I've noticed that a few workbenches containing only a Creator and a PythonCaller are taking up to 10 times longer. The environments hosting FME Flow and FME Server are identical in specs.

Here are the workspace statistics from each environment:

FME Server:

Average Elapsed: 12 minutes 44 seconds

Average % CPU: 96%

Average CPU time: 12 minutes 20 seconds

Average Peak Memory Usage: 256 MB

FME Flow:

Average Elapsed: 53 minutes 42 seconds

Average % CPU: 19.4%

Average CPU time: 10 minutes 42 seconds

Average Peak Memory Usage: 264 MB

Does FME Flow utilize the CPU differently when running Python, or is there some setting I can change in the FME Form workbench? I can see from the logs that simple arcpy commands are taking 10x longer and that the CPU is underutilized in FME Flow. I've looked into the transformer and can't find any processing settings. Is there a setting in FME Flow that I need to configure? I can't figure out why it's so much slower there.

Many thanks,

Sam

Best answer by sam_appleton 18 January 2024, 17:29

Userlevel 5

Is it arcpy specifically that's slowing down, or Python in general?

If you're using the class interface in your PythonCallers, make sure you include the "has_support_for" method, and make it return True whenever possible:

    def has_support_for(self, support_type):
        """This method returns whether this PythonCaller supports a certain type.
        The only supported type is fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM.
        
        :param int support_type: The support type being queried.
        :returns: True if the passed in support type is supported.
        :rtype: bool
        """
        if support_type == fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM:
            # If this is set to return True, FME will pass features to the input() method that
            # come from a feature table object. This allows for significant performance gains
            # when processing large numbers of features.
            # To enable this, the following conditions must be met:
            #   1) features passed into the input() method cannot be copied or cached for later use
            #   2) features cannot be read or modified after being passed to self.pyoutput()
            #   3) Group Processing must not be enabled
            # Violations will cause undefined behavior.
            return False  # <-- You'll want to return True here, if the above conditions are met
 
        return False

This method will most probably not be present in older workspaces, where the Python code was modified before the support for feature tables was introduced. If this method is not present, or returns False, the PythonCaller will process one feature at a time, which is much slower than using feature table mode.

Userlevel 5
Badge +28

Interesting, @david_r, are you suggesting that in previous versions of FME feature table support was enabled automatically, without having to be specified?

The underutilization of the CPU is interesting and matches what FME is logging as CPU time versus elapsed time. Usually this means the process is waiting on something to happen, for example a network delay for a database or a sleep/wait function.
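The gap can be put into numbers using the statistics from the original post. This is only a minimal arithmetic sketch; the helper names `to_seconds` and `waiting_summary` are made up for illustration and are not part of FME:

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds


def waiting_summary(elapsed_s, cpu_s):
    """Return the share of elapsed time spent on the CPU and the time spent waiting."""
    return {
        "cpu_fraction": cpu_s / elapsed_s,
        "waiting_s": elapsed_s - cpu_s,
    }


# Figures copied from the job statistics in the original post.
server = waiting_summary(to_seconds(12, 44), to_seconds(12, 20))
flow = waiting_summary(to_seconds(53, 42), to_seconds(10, 42))

for name, stats in (("FME Server", server), ("FME Flow", flow)):
    print(f"{name}: {stats['cpu_fraction']:.1%} CPU, "
          f"{stats['waiting_s'] / 60:.1f} min waiting")
```

The CPU time is similar in both environments (about 12 and 11 minutes), so nearly all of the extra elapsed time on Flow is time in which the process is not computing anything.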

 

Could there be anything in the script which requires going over a network connection? I've seen when writing to an ESRI database that if the permissions are not correct, certain things get slowed down. Is the user running the FME Engine service the same on both instances of FME Server/Flow?

Userlevel 5

My understanding is that the PythonCaller didn't have support for feature tables until relatively late. As I read the template code supplied by Safe, it seems that feature table support must be specifically activated through the has_support_for method. If this method is not present (e.g. because the Python code comes from an older version of FME, before this method was supported), FME defaults to splitting up feature tables to emulate the behavior of previous versions.

I'd love to be corrected if someone knows more on this topic!

Badge +5

Hi @david_r,

I tried adding the code but that didn't seem to help. When I run my workbench on FME Form and FME Server, it takes the Python script about a second to truncate a table. I can see that each of these commands is taking about 20 seconds on FME Flow.

The dataset that it's truncating resides within the shared resources/Data folder within FME Flow itself, so I can't imagine that this is a network issue?
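One way to get comparable per-command timings out of the logs in both environments is to wrap each step in a small timer that records wall-clock time and CPU time separately. This is a sketch only: the `timed` helper is hypothetical, and `time.sleep` stands in for the real arcpy call so the example runs without arcpy installed.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Log wall-clock and CPU time for the wrapped block and expose both in a dict."""
    result = {}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield result
    finally:
        result["wall_s"] = time.perf_counter() - wall_start
        result["cpu_s"] = time.process_time() - cpu_start
        log(f"{label}: wall={result['wall_s']:.2f}s cpu={result['cpu_s']:.2f}s")


# Stand-in for a real command; in the PythonCaller this would be the arcpy
# call being investigated, e.g. arcpy.management.TruncateTable(table).
with timed("truncate") as t:
    time.sleep(0.2)
```

A step where the wall-clock time is much larger than the CPU time is waiting on something outside the Python process, such as disk, network, or a lock.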

Userlevel 5

Indeed, the feature table option will have no bearing on external operations where FME is waiting. It will mostly impact the throughput of the PythonCaller, in particular if there are many features.

Regarding the timing that you're observing, there are simply too many unknown factors for me to say much about it. However, I'm assuming that you have two different machines for comparison between the versions? If so, are you 100% sure that they are identical in performance and configuration?

Userlevel 5
Badge +28

Hi @sam_appleton, check this out: Why is FME slow to truncate my SDE/Geodatabase table? (safe.com)

I suspect that the issue might be the user running the FME Engine process. This kind of thing is often the cause of slowdowns when it comes to FME and ESRI. I've seen it a few times before, and when I read your post this was my initial thought. When you mentioned that you were performing truncations, this was a bit of a confirmation.

It could still be something else; however, I'd definitely start by checking the user/permissions here.
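A quick way to compare the account on both machines is to log it from inside the Python script itself. A minimal sketch using only the standard library; the `engine_identity` name is made up here, and `USERDOMAIN` is a Windows environment variable that may simply be empty elsewhere:

```python
import getpass
import os
import socket


def engine_identity():
    """Collect the account and host that the current Python process runs under."""
    try:
        user = getpass.getuser()
    except Exception:
        # Some restricted service environments cannot resolve a user name.
        user = "<unknown>"
    return {
        "user": user,
        "domain": os.environ.get("USERDOMAIN", ""),
        "host": socket.gethostname(),
    }


print(engine_identity())
```

Running this in both environments and comparing the output shows whether the engine runs under the same kind of account in each.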

Userlevel 5

I suspect this only applies to SDE geodatabases and not file geodatabases? Unless, for some strange reason, the FME service account has only MODIFY and not DELETE file permissions in its own resource folder?

Badge +5

I've just realised that the specs our cloud team supplied only cover disk write speeds. I'll need to wait to hear back on RAM and CPU size. Hopefully it's a simple case of increasing both.

Another Python script that is running is an append script. It combines the contents of 6 geodatabases, so I assume this isn't the issue? I'll check this as well to make sure that it has all of the permissions. Unfortunately I need to run all of this through my cloud team, but I will post an update when they get back to me.

Badge +5

They've gotten back to me with the specs, and they're exactly the same for each environment. I'm writing to geodatabases located in shared resources and not having any issues, so I think this might rule out any network issues? I'll keep testing with Python scripts to see if any other libraries are affected.

Badge +5

Hi All,

An update on this. I spoke to IT and we had a look through the config file. It turns out that SharedResources/Data isn't located on the system that hosts FME Flow. For reasons unknown, they placed the shared resources folder inside an Azure file share, meaning that all commands to it need to traverse the internet.

I ran the same workbench on Flow, pointing to a gdb on the C: drive of the machine hosting Flow (where shared resources would usually be located), and the run time dropped back to its normal speed. Thank you for all of your help.
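For anyone who hits the same symptom, a rough latency probe like the one below can show whether a folder is backed by local disk or by a remote share. It is a sketch under assumptions: `small_file_latency` is a made-up helper, it only measures small create/write/delete round trips (which is not the same workload as arcpy), and the temp directory is used as a placeholder for the folders you actually want to compare.

```python
import os
import tempfile
import time
import uuid


def small_file_latency(directory, rounds=20, payload=b"x" * 1024):
    """Return the mean seconds taken to create, flush and delete a small file."""
    timings = []
    for _ in range(rounds):
        path = os.path.join(directory, f"latency_probe_{uuid.uuid4().hex}.tmp")
        start = time.perf_counter()
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # force the write through to storage
        os.remove(path)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Placeholder list: add the shared resources folder and a local folder to compare.
folders = [tempfile.gettempdir()]
for folder in folders:
    print(f"{folder}: {small_file_latency(folder) * 1000:.2f} ms per round trip")
```

If the shared resources folder shows a much higher round-trip time than a local folder on the same machine, the storage location is a likely cause of low CPU usage and long elapsed times.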
