I have a somewhat large set of features (~180k rows, 100 cols) coming out of a database reader that I'm loading into a PythonCaller to run some heavy processing on. I've been doing some profiling, and that node is a major contributor to the slowdown in my workspace. To my surprise, though, the number crunching only accounts for about 10-20 seconds of the node's 90-second runtime. Most of it is actually spent in the hot loops calling feature.getAttribute (another 25-30s) and feature.setAttribute (about 50s).
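Roughly, the pattern inside my PythonCaller looks like the sketch below. The `FakeFeature` class is just a runnable stand-in for `fmeobjects.FMEFeature` (mimicking only the two methods involved), the column names are made up, and the "crunching" is a placeholder; the point is the per-attribute call volume: 100 `getAttribute` and 100 `setAttribute` calls per feature, times 180k features.

```python
# Illustrative sketch of the slow per-attribute pattern in a PythonCaller.
# FakeFeature is a stand-in so this runs outside FME; the real object is
# fmeobjects.FMEFeature, which exposes the same two methods.

class FakeFeature:
    """Minimal stand-in for fmeobjects.FMEFeature."""
    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def getAttribute(self, name):
        return self._attrs.get(name)

    def setAttribute(self, name, value):
        self._attrs[name] = value


COLS = [f"col{i}" for i in range(100)]  # hypothetical column names


class FeatureProcessor:
    """Shape of the standard PythonCaller class template."""

    def input(self, feature):
        # One getAttribute call per column per feature:
        # ~18M calls across 180k features x 100 columns.
        row = [feature.getAttribute(c) for c in COLS]
        # ... heavy number crunching stands in here ...
        processed = [v * 2 if isinstance(v, (int, float)) else v for v in row]
        # And one setAttribute call per column on the way back out.
        for c, v in zip(COLS, processed):
            feature.setAttribute(c, v)


# Standalone usage of the stand-in:
f = FakeFeature({c: i for i, c in enumerate(COLS)})
FeatureProcessor().input(f)
```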
Meanwhile, using a FeatureWriter and a FeatureReader, all these features can be written to and read from Parquet files in seemingly less than a second. It's not exactly convenient, since FeatureWriters don't output the path of the file they've written and PythonCallers only have one input port, which makes passing a temporary output file name awkward, but it cuts my translation time from 90 seconds down to 9.
Are these the only two ways of doing things? Do FME's Python bindings have a proper mechanism for bulk-loading a large number of features like this, or am I stuck doing this by hand (plus handling temporary files)?
Best answer by vlroyrenn