Open

FeatureWriter: Option to write to temporary file without giving an explicit path

Related products: Transformers

vlroyrenn
Supporter

A number of operations are much easier or more efficient when data is passed around as a file on disk rather than as FME features. When a large number of features goes into a PythonCaller, for instance, it can be preferable to dump them all into a Parquet or Arrow file and decode that file with Pandas or some other dataframe library, rather than extracting attribute data feature by feature in Python. Other use cases, such as Emailer, also often rely on temporary files.

The issue is that FeatureWriter cannot simply receive both a path from TempPathnameCreator and the features it needs to write: the path has to be joined onto each feature using one of several clumsy methods. Given that most uses of TempPathnameCreator are intended to feed into FeatureWriter, I would suggest that “Temporary local file” be a valid dataset option for FeatureWriter. It would simply generate a temporary path in FME_TEMP, write all features to it, and allow the file name to be retrieved as the _dataset attribute under the Summary tag, as is already the case. It would otherwise work in exactly the same way as the combination of TempPathnameCreator and FeatureWriter, but would be simpler to set up.

Any scenario where it is more practical to pass features between transformers as single files rather than large collections of features might benefit from this.
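
To illustrate the file-based hand-off, here is a minimal sketch of the consuming side, written as plain Python so it can run outside FME. It assumes the writer produced a single file whose path arrives on one feature (for example via _dataset). CSV and the standard csv module stand in for Parquet and a dataframe library purely to avoid third-party dependencies, and load_rows and write_demo_file are hypothetical helper names. With pandas installed, the reading step would likely reduce to pandas.read_parquet(path).

```python
import csv
import os
import tempfile


def load_rows(dataset_path):
    """Read every row of the file in one pass instead of feature by feature."""
    with open(dataset_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_demo_file():
    """Stand-in for the writer: dump a few rows to a temporary CSV file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
    return path
```

The point is only that the downstream code receives one path and loads the whole dataset at once, rather than handling one feature per call.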

 

2 replies

vlroyrenn
  • July 3, 2024

As a workaround in the meantime, it’s possible to create temporary files in scripted user parameters.

# Parameter Identifier: IN_TEMP_FILE_PATH
import tempfile

# Creates file in FME_TEMP
return tempfile.NamedTemporaryFile(suffix=".parquet", delete=False).name

I’ve not seen this documented anywhere (although I’ve been told that it is intended behavior), but the stock tempfile module has its configuration overridden so that temporary files are created in the run-specific directory under FME_TEMP. This means that, even if the delete argument is set to False as above, the created file and its parent directory will be deleted when the flow finishes running (or, if caching is enabled in FME Form, whenever the cache is cleared, such as when the next run starts).

This makes such temporary files safe to use between transformers and across your flows, without the risk of them not being deleted properly and without the issues mentioned above that TempPathnameCreator usually brings.
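
One way to check this behavior in your own environment is to log where tempfile actually places its files. A minimal sketch using only the standard library; describe_temp_location is a hypothetical helper name, and outside FME it will simply report the system default temporary directory.

```python
import os
import tempfile


def describe_temp_location(suffix=".parquet"):
    """Create an empty temp file and report where tempfile placed it."""
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        path = handle.name
    return {
        "tempdir": tempfile.gettempdir(),
        "path": path,
        "exists": os.path.exists(path),
    }
```

Calling this from a scripted parameter or a PythonCaller and logging the result should show whether tempdir sits under FME_TEMP for that run.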

If you need to break such paths down without using a FilenamePartExtractor, you can also do so in scripted parameters.

# Parameter Identifier: IN_TEMP_FILE_DIR
import os.path

return os.path.dirname(FME_MacroValues['IN_TEMP_FILE_PATH'])

# Parameter Identifier: IN_TEMP_FILE_STEM
import os.path
FILE_SUFFIX = ".parquet"

basename = os.path.basename(FME_MacroValues['IN_TEMP_FILE_PATH'])
assert basename.endswith(FILE_SUFFIX), "{} does not end with {}".format(basename, FILE_SUFFIX)
return basename[:-len(FILE_SUFFIX)]
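
For testing outside FME, the two parameters above can be folded into one plain function. This is only a sketch: split_temp_path is a hypothetical name, and the path is passed in as an argument rather than read from FME_MacroValues.

```python
import os.path


def split_temp_path(path, suffix=".parquet"):
    """Return (directory, stem) for a path that must end with `suffix`."""
    directory, basename = os.path.split(path)
    if not basename.endswith(suffix):
        raise ValueError("{} does not end with {}".format(basename, suffix))
    # Drop the suffix to get the stem, as in IN_TEMP_FILE_STEM.
    return directory, basename[: -len(suffix)]
```

It raises ValueError instead of using assert, since assertions are skipped when Python runs with the -O flag.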

 


LizAtSafe
Safer
  • October 7, 2024
Status changed: New → Open
