Question

Sending count/total of files written by a FeatureWriter as they are emitted?

  • 26 June 2017
  • 7 replies
  • 14 views

I currently have a custom transformer that takes a parameter MaxTileCount (the value of _num_tiles) from a WebMapTiler, counts the features passing through, and sends an HTTP message every 10 features to show progress.

That is all working fine. However, I now have an issue. The PNG writer is no longer a PNGRaster writer; it has changed to a FeatureWriter that does a fanout, and that takes a lot of time. So I want to move the notifier to later in the pipeline, after the "pngWriter" (FeatureWriter). The problem is that I no longer know how to count the files written by the FeatureWriter as they are emitted, nor do I have access to _num_tiles, which is the total number of tiles coming from the WebMapTiler.

In simple words, I want to send an HTTP POST message with count/total as the files are written by the FeatureWriter. Any idea how to do this?
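For reference, the count-and-notify logic described above amounts to roughly the following. This is only a minimal Python sketch, not the actual custom transformer: the class name `ProgressNotifier`, the endpoint `http://localhost:3000/progress`, and the JSON payload shape are all assumptions made for illustration.

```python
import json
import urllib.request


class ProgressNotifier:
    """Counts features and reports count/total every `every` features."""

    def __init__(self, total, every=10, send=None):
        self.total = total
        self.every = every
        self.count = 0
        # `send` is injectable so the logic can be tried without a server.
        self.send = send or self._post

    def _post(self, payload):
        # Hypothetical endpoint; replace with the real listener URL.
        req = urllib.request.Request(
            "http://localhost:3000/progress",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            resp.read()

    def feature(self):
        """Call once per feature; sends on every Nth one and on the last."""
        self.count += 1
        if self.count % self.every == 0 or self.count == self.total:
            self.send({"count": self.count, "total": self.total})
```

The open question is where `feature()` can be called once the writing happens inside a FeatureWriter, and where `total` comes from at that point.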

The second notifier fails. So I disabled it.


7 replies

Userlevel 2
Badge +12

You can use the VariableSetter to create a variable from the _num_tiles attribute from the WebMapTiler. After the FeatureWriter you can retrieve the value from the variable using the VariableRetriever transformer and use that for your count.

Userlevel 2
Badge +17

Hi @ygutfreund, the FeatureWriter outputs summary features after completion of writing and each of them has an attribute called "_total_features_count", which stores the number of features (PNG files in this case) that have been written into a destination dataset (folder in this case). I think you can get the total number of written PNG files from this attribute and then send a message using an HTTPCaller.

@takashi I like your answer, but I just got an email from SAFE support and am testing it. They tell me the summary is only sent on a per-folder basis (in my case, with fanout, that means that for 121 files I get 29 folder messages with the totals), and this happens only once all the WebMapTiler tiles have reached the FeatureWriter, so it is not a good way to do progress monitoring. (SAFE has a feature request to do immediate writes, but that is not there yet.) I am going back to my prior method: not using the FeatureWriter but the PNG writer, and putting all the tiles in a single folder (yuck) as zoom-col-row.PNG

 

 

@erik_jan that is a great tip. I am sure I am going to use that Transformer in some cases. Thanks.
Userlevel 4
Badge +25

Well... that's a very interesting requirement. Since the FeatureWriter is not outputting the summary as one feature at a time, we have to figure out another way. I can think of a few...

1) Put the writing (FeatureWriter or a plain writer) inside a second workspace and use the WorkspaceRunner to run it. That will return a feature as soon as the job is completed, and you can do your count to ten on those.

2) Somehow scan for files as they are written so you have that live count. Perhaps each feature is also sent to a custom transformer that loops around, waiting for the file's existence, before the feature is output and counted. Another way would be to pass the feature into a Python script that checks for the file and loops until it exists.

To keep things simpler, these too could go into a separate workspace. It could store the count in a text file readable by the master, or just pass features back (maybe try a Sender/Receiver pair?)

3) The simplest, though least accurate, solution might be to just put a Decelerator transformer into the workspace. Say you know each tile takes about 30 minutes to write (for example): just put a 30-minute delay in before the Counter. It won't be as accurate, but it will give you an approximate idea of how the process is going. Or the delay could be related to an attribute, e.g. if you have an idea of file size or number of rows/columns, the delay could be proportional to that, making it more accurate.

In the first two cases, I'm not quite sure how this might interact with your fanout, or how the features relate to the files, but I think both are capable of being implemented without too much of a problem.
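As a rough illustration of the "Python script that checks for the file and loops until it exists" idea in option 2, here is a minimal sketch. The function name `wait_for_file` and the timeout and polling values are assumptions, and it is plain Python rather than anything FME-specific, so how it would be wired into a workspace is left open.

```python
import os
import time


def _ready(path):
    """True if `path` is an existing, non-empty regular file."""
    try:
        return os.path.isfile(path) and os.path.getsize(path) > 0
    except OSError:
        # The file can vanish between the two checks; treat as not ready.
        return False


def wait_for_file(path, timeout=60.0, poll=0.25):
    """Poll until `path` exists and is non-empty; False if `timeout` passes."""
    deadline = time.monotonic() + timeout
    while True:
        if _ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A non-empty file is not proof that the writer has finished with it, so treat a `True` result as "probably written". The loop also blocks whatever calls it, which is one more reason to keep it in a separate workspace or process.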

I hope one of these helps.

Mark

@Mark2AtSafe well, since I am spawning FME.EXE from Node.js (command line interface), I was thinking of putting a Node (fs) file watcher in the Node code, watching the file hierarchy for new files and counting them. But it seemed cleaner to just let the WebMapTiler create the 10K+ tiles and use the FeatureWriter to actually write the files. Then I could put the notifications inside FME and post them to the Node process that spawned FME. Web tiles are only 256x256, so they are pretty small, and the timing varies a lot depending on the size of the input etc. I have a lot of different stuff going in, so it is hard to estimate the time. Better to get the real values from FME. (In fact, I use FME to get an estimate as a first pass, then as a second pass I do the actual build.)
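The "watch the file hierarchy and count" alternative could look roughly like this, sketched in Python rather than Node. The function names and the `.png` filter are assumptions, and `total` would have to come from the first estimating pass or from _num_tiles.

```python
import os


def count_tiles(root, ext=".png"):
    """Count files under `root` (recursively) whose name ends with `ext`."""
    n = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        n += sum(1 for name in filenames if name.lower().endswith(ext))
    return n


def progress(root, total):
    """Return (count, total, percent) for the tiles written so far."""
    count = count_tiles(root)
    percent = 100.0 * count / total if total else 0.0
    return count, total, percent
```

Calling `progress()` on a timer from the spawning process gives count/total without touching the workspace, at the cost of rescanning the output tree on every call.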

 

 

 

Userlevel 2
Badge +17

The FeatureWriter itself doesn't have your desired functionality (progress monitoring), but if the major issue was that Dataset Fanout takes a long time, possibly this approach could reduce the time.

  1. FeatureWriter: Write all rasters into the destination root folder, setting each file name to "<zoom level>_<x index>_<y index>", without using Dataset Fanout.
  2. Explode the summary feature from the FeatureWriter on the "_feature_type" list, [Edit] and then create the final destination folder path "<root>/<z>/<x>" and file name "<y>.png". [/Edit]
  3. File Copy Writer: Move (rename) every file: "<root>/<z>_<x>_<y>.png" to "<root>/<z>/<x>/<y>.png".
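The renaming in step 3 is just a path mapping. Outside FME it could be sketched in Python as below; this is an illustration of the mapping only, not a replacement for the File Copy writer, and the function names and the digits-only file name pattern are assumptions.

```python
import os
import re
import shutil

# Matches flat names such as "12_2045_1362.png" (zoom, x index, y index).
_FLAT = re.compile(r"^(\d+)_(\d+)_(\d+)\.png$", re.IGNORECASE)


def nested_path(root, flat_name):
    """Map "<z>_<x>_<y>.png" to "<root>/<z>/<x>/<y>.png"; None if no match."""
    m = _FLAT.match(flat_name)
    if m is None:
        return None
    z, x, y = m.groups()
    return os.path.join(root, z, x, y + ".png")


def move_tiles(root):
    """Move every flat tile in `root` into its z/x folder; return the count."""
    moved = 0
    for name in os.listdir(root):
        source = os.path.join(root, name)
        target = nested_path(root, name)
        if target is None or not os.path.isfile(source):
            continue  # skip folders and files that are not flat tiles
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(source, target)
        moved += 1
    return moved
```

Moving a file within the same volume is a rename, so this step should be cheap compared with writing the rasters.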
