Kia-ora all,

I am trying to use FME to implement a pseudo regression analysis. I have a workbench that performs a core process which produces a single output value telling me how well various parameters are correlated with a known result. This depends on a bunch of parameters (8 in all), and I need to test every possible combination of them (2^8 = 256), turning each one on and off, to find the best combination.
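With 8 on/off parameters, each of the 256 combinations can be read straight off the bits of an integer from 0 to 255. Here is a minimal sketch of that mapping. The parameter names are placeholders, not the real parameters from the post:

```python
# Hypothetical parameter names standing in for the 8 real parameters.
PARAM_NAMES = ["param%d" % i for i in range(8)]


def combination(mask):
    """Return {parameter name: 0 or 1} for the on/off pattern encoded in `mask`."""
    # Bit i of the mask decides whether parameter i is switched on (1) or off (0).
    return {name: (mask >> i) & 1 for i, name in enumerate(PARAM_NAMES)}


# Every integer 0..255 is a distinct on/off combination.
all_combinations = [combination(mask) for mask in range(2 ** len(PARAM_NAMES))]
```

For example, `combination(5)` switches on `param0` and `param2`, because 5 is `0b101` in binary.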

I can create this scenario quite well with nested custom transformers. Each one sets a parameter to 0 and passes on to the next nest level, then sets it to 1, and so on. The innermost level is where the core processing actually happens. This effectively loops through all 256 possibilities without technically looping in FME.

But I am discovering that this is probably not the most efficient way to do it. Each extra nest adds a lot of time to simply working in Workbench, presumably because it always has to work out how anything I change affects the nests underneath it.

Then, when running the workbench (with nests 6 deep), FME has to rebuild the full script, which takes about 12 minutes. It then has to reconfigure memory, which takes another 7 minutes, before actually running the process, which takes 3 minutes.

Is there a way, using Python or otherwise, to control the looping without having to start FME and re-read the inputs each time the core process is called? I figure that the core process by itself takes about 4 seconds, whereas starting FME and reading the inputs takes about 20 seconds.
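Inside a single PythonCaller, the usual pattern is to let FME read the inputs once, keep what you need in memory, and then run the core calculation for every combination in a plain Python loop. Below is a minimal sketch of that control flow, assuming the core process can be expressed as a Python function. `load_inputs`, `core_process`, `best_combination`, the scoring rule and the three sample records are all made up for illustration. `itertools.product` is just one convenient way to enumerate the 256 switch settings.

```python
import itertools


def load_inputs():
    # Hypothetical stand-in for the slow part (FME reading the inputs).
    # Each record: (eight parameter values, known result). Made-up data.
    return [
        ((1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0), 3.0),
        ((0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0), 1.0),
        ((2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), 2.0),
    ]


def core_process(records, switches):
    # Hypothetical scoring: predict the known result as the sum of the
    # switched-on parameter values and return the sum of squared errors.
    sse = 0.0
    for values, known in records:
        predicted = sum(v for v, on in zip(values, switches) if on)
        sse += (predicted - known) ** 2
    return sse


def best_combination():
    records = load_inputs()  # read the inputs once...
    # ...then score all 2**8 = 256 on/off combinations against the cached data.
    scores = {
        switches: core_process(records, switches)
        for switches in itertools.product((0, 1), repeat=8)
    }
    best = min(scores, key=scores.get)  # lowest error wins
    return best, scores[best]
```

At the timings quoted above (about 4 s for the core process, about 20 s to start FME and read the inputs), and ignoring other overheads, this works out to roughly 256 × 4 s ≈ 17 minutes of core processing. Restarting FME for every combination would take 256 × 24 s ≈ 1.7 hours.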

Thanks

Keith

Hi Keith,

This actually sounds like an ideal scenario for Python. You cannot control the workbench flow (directly) from a PythonCaller, but you can easily do all your regression analysis from within the PythonCaller.

You can do your analysis either on each individual feature as it enters or on the full set of features (or even on groups, if you're creative).
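The difference between per-feature and whole-set (or grouped) analysis comes down to where the work happens. It can happen in `input()`, which is called once per feature, or in `close()`, which runs after every feature has arrived. Here is a plain-Python sketch of that shape, written so it runs outside FME. `StubFeature`, `GroupedAnalysis` and the `group`/`value` attribute names are hypothetical stand-ins, and `self.output` takes the place of the PythonCaller's `self.pyoutput(...)`.

```python
from collections import defaultdict
from statistics import mean


class StubFeature:
    """Hypothetical stand-in for an FME feature, holding attributes in a dict."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def getAttribute(self, name):
        return self._attrs.get(name)


class GroupedAnalysis:
    """Mimics the input()/close() shape of a PythonCaller class."""

    def __init__(self):
        self.groups = defaultdict(list)  # group key -> list of values
        self.output = []                 # stands in for self.pyoutput(...)

    def input(self, feature):
        # Per-feature work: just file the value under its group.
        key = feature.getAttribute("group")
        self.groups[key].append(float(feature.getAttribute("value")))

    def close(self):
        # Whole-set work: runs once, after every feature has been received.
        for key, values in sorted(self.groups.items()):
            self.output.append((key, len(values), mean(values)))
```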

Here are some good starting points, if you haven't read them already:

 

There are also some interesting examples here:

 

Good luck!

David

Did you already know about the Loop output port within a custom transformer? Perhaps that, together with a counter on the features before they enter the custom transformer, might help?

Thanks David,

 

I am slowly putting this together with Python, and it is taking shape nicely.

 

My biggest issue now is down to memory usage.

 

I am running a loop (nested set of loops) that runs 256 times, and within each loop I run another loop that runs through each of the ~5500 features from my input file. I do some calculations and statistics on each pass and write a new feature based on this. So I end up writing 256 new features.
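The post doesn't say which statistics are calculated on each pass. As an illustration only, suppose the per-pass statistic is a Pearson correlation; that is a swapped-in example, not necessarily the real calculation. It can then be accumulated one feature at a time from a handful of running sums, so each of the 256 passes keeps a fixed amount of state instead of per-feature intermediate lists. `RunningCorrelation` is a hypothetical helper:

```python
import math


class RunningCorrelation:
    """Pearson correlation accumulated one (x, y) pair at a time.

    Only a count and five running sums are kept, however many features pass through.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def value(self):
        # r = cov(x, y) / (sd(x) * sd(y)), expressed with the running sums
        # (the common factors of n cancel out).
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return float("nan")  # undefined for fewer than two points or a constant series
        return cov / math.sqrt(var_x * var_y)
```

In use, you would create one `RunningCorrelation` per combination, call `add()` for each of the ~5500 features, then write `value()` onto that combination's output feature. The simple sum-based formula is fine for illustration, but it can lose precision on large, nearly constant values.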

 

This all works very well (surprisingly), but I run into memory problems. If I break out of my feature loop at the 5000 mark, the run completes OK with a peak memory usage of 1,740,948 kB. But letting it run through the entire feature set crashes with an out-of-memory error.

 

I presume I should be tidying things up inside my Python loops, but I'm not exactly sure what to do.
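Without seeing the script this is only a guess. One thing that may help is to avoid holding on to thousands of full feature objects (each of which may carry all of its attributes and geometry) across all 256 passes. Instead, copy just the numbers the calculation needs into plain tuples as features arrive, and release them when you are done. Here is a sketch of that idea. `StubFeature`, `LeanCollector`, the `Length` attribute and the arithmetic inside `close()` are hypothetical placeholders for the real calculation.

```python
class StubFeature:
    """Hypothetical stand-in for an FME feature (only getAttribute is mimicked)."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def getAttribute(self, name):
        return self._attrs.get(name)


# Hypothetical attribute names: use whichever fields the calculation needs.
FIELDS = ("Depth_NoAction", "Length")


class LeanCollector:
    def __init__(self):
        self.rows = []  # plain tuples of floats, not feature objects

    def input(self, feature):
        # Copy out just the numbers needed; the feature itself is not kept.
        self.rows.append(tuple(float(feature.getAttribute(f)) for f in FIELDS))

    def close(self):
        results = []
        for mask in range(256):
            total = 0.0  # per-pass state is a few numbers, rebuilt every pass
            for depth, length in self.rows:
                # Placeholder calculation standing in for the real statistics.
                total += depth * length if mask & 1 else depth
            results.append((mask, total))  # one result per combination
        self.rows = []  # release the cached rows once all passes are done
        return results
```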

 

cheers

 

keith
As an extra note, I have found a couple of things that may be handy for beginners:

I have 3 separate inputs into my PythonCaller, and I can access them all separately using:

    class FeatureProcessor(object):

        def __init__(self):
            # One list per input, filled as the features arrive
            self.weightingList = []
            self.cctvList = []
            self.lutList = []

        def input(self, feature):
            # Sort each incoming feature into a list by its feature type
            featType = feature.getFeatureType()
            if featType == "weightings":
                self.weightingList.append(feature)
            elif featType == "CCTV":
                self.cctvList.append(feature)
            elif featType == "lut_ProbabilityByLength":
                self.lutList.append(feature)

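Then, in the close(self) method, where you do pretty much all of the coding, you can loop through each list individually:

        def close(self):
            for weightingFeat in self.weightingList:
                pass  # work with each weighting feature here
            for cctvFeat in self.cctvList:
                pass  # work with each CCTV feature here

This works pretty nicely.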
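Here is a small variation on the same routing idea, which may be easier to extend if more inputs are added later. It keeps one list per feature type in a dictionary and collects any unexpected feature type in a separate list rather than silently dropping it, as the if/elif chain does. `StubFeature` and `RoutedInputs` are hypothetical names, and the stub only mimics `getFeatureType` so the sketch runs outside FME.

```python
class StubFeature:
    """Hypothetical stand-in for an FME feature (only getFeatureType is mimicked)."""

    def __init__(self, feature_type):
        self._type = feature_type

    def getFeatureType(self):
        return self._type


class RoutedInputs:
    # The three feature types from the post; anything else goes to self.other.
    TYPES = ("weightings", "CCTV", "lut_ProbabilityByLength")

    def __init__(self):
        self.by_type = {t: [] for t in self.TYPES}
        self.other = []

    def input(self, feature):
        # One dictionary lookup replaces the if/elif chain; unknown types
        # fall back to the self.other list.
        self.by_type.get(feature.getFeatureType(), self.other).append(feature)
```

In `close()`, the lists are then available as `self.by_type["CCTV"]` and so on.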

 

I have also found that I sometimes need to explicitly convert values to float when using getAttribute, i.e.

    depthNA = float(cctvFeat.getAttribute("Depth_NoAction"))

when I really need a float for use in an equation afterwards. This stumped me for a while, as I had assumed the getAttribute method returned whatever the field type was.
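The experience above suggests that getAttribute can return something that is not already a float, such as a string. As far as I know, a missing attribute comes back as None, which `float()` would reject. Below is a sketch of a small helper that wraps the conversion in one place. `attr_float` is a hypothetical helper, not part of the FME API, and `StubFeature` only mimics `getAttribute` so the example runs outside FME.

```python
class StubFeature:
    """Hypothetical stand-in for an FME feature (only getAttribute is mimicked)."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def getAttribute(self, name):
        return self._attrs.get(name)


def attr_float(feature, name, default=None):
    """Return attribute `name` as a float, or `default` if it is missing or blank."""
    value = feature.getAttribute(name)
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return default
    return float(value)  # still raises ValueError for non-numeric text
```

For example, `depthNA = attr_float(cctvFeat, "Depth_NoAction", 0.0)` would give 0.0 for a blank depth instead of raising an error. Whether 0.0 is a sensible default depends on the analysis.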

cheers

 

Keith
