Open

Fence and/or Semaphore transformer to control the flow of input data

Related products:Transformers

1 year ago
August 17, 2023
1 reply
70 views

vlroyrenn
Enthusiast
63 replies

The problem

The documentation for FeatureMerger mentions a "Suppliers First" mode that can reportedly be very beneficial to performance (and, I would imagine, crucial for Streams), but comes with the constraint that all suppliers must have arrived before the first requestor comes in. To my knowledge, there is currently no way in FME of upholding that guarantee in a reliable manner. The timing between Readers, Creators, FeatureReaders and SQLExecutors is not something I can claim to understand, and the completion order can change based on whether caching is turned on or not. This is already troublesome when editing workbenches, but it can be expecially problematic inside custom transformers, where you don't have control over the delivery order of features in your input ports.

This isn't merely an issue with FeatureMerger; it's going to be a problem at any time where you rely on input ordering for a transformer to work properly, with specific attention given to Streams and Feature tables.

If I have a PythonCaller that configures itself based on data coming from TransformerA before it's ready to accept data from TransformerB, there isn't a lot of options for reliably dealing with out-of-order input. Holding onto features is illegal when bulk mode support is advertized, so the PythonCaller must either opt out of it (which hurts downstream performance) to accumulate any features it's not ready to process until the configuration features have come in. Even then, it can't know when TransformerA has closed, so unless it only expects one configuration feature, it's dangerous to start processing before close() has been called.

This might be clearer when considering the attached screenshot: Python_MapAttributes needs the output line from JsonTemplater in order to work with the data coming from MappedInputLines. If MappedInputLines starts sending features first, the PythonCaller can't do anything yet, and can only crash (undesirable) or start buffering features (illegal in Bulk mode).

The solution

The idea would be to have some sort of Semaphore transformer, something like a FeatureHolder, but with (at least?) two input ports: a "Priority" port, which lets features through normally, and a "Held" port which buffers features until the other port has closed and no more features can go through it. This would ensure that no feature from the "Priority" side can ever arrive before a "Held" feature, thus allowing workflow and transformer designers to guarantee feature ordering downstream without breaking bulk mode.

Other relevent use cases

One might also consider having a Terminator node which should stop the translation when an assertion or a join fail in some unexpected way, but should wait until every faulty feature has arrived to give proper context instead of immediately stopping at the first one. Bad features could be sent through the priority port and then the priority port routed to a terminator, so that no feature can be passed to the next step until it has been verified that none exist that would trip the Terminator. This would also allow the Terminator to be changed to wait for all features to have arrived before stopping the translation, instead of aborting at the first one (which currently makes sense, as the more you wait, the more you risk that downstream writers will have already started to write incomplete data).

+13

vlroyrenn
Author
Enthusiast
63 replies
1 year ago
June 27, 2024

Note that this scenario also applies to FeatureWriters in dynamic mode, which may get their schema from a fme_schema_handling → schema_only feature that needs to be recieved before any data. There is a way of doing this by using NULL writers and FeatureHolder transformers to gate the non-priority data on the priority port being closed, but the FeatureHolder also causes non-priority data to not be allowed through until that input port is closed as well, which leaves the streaming issue unsolved.

Multi-port PythonCaller transformers may prove helpful to solve this in the future, but are not an option at the moment due to how it’s not possible for a PythonCaller to dell when individual ports have been closed, only when all ports are closed. See related issue below.

https://community.safe.com/ideas/add%2Dper%2Dport%2Dclose%2Dand%2Dhas%2Dsupport%2Dfor%2Dmethod%2Dto%2Dpython%2Dtransformers%2D32682

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Fence and/or Semaphore transformer to control the flow of input data