
Hi folks,

I developed an FME workspace that connects to an API service. The workspace is quite elaborate, but in its basics it sends (calculation) jobs to an API, periodically requests a status update from the API to see which jobs completed, and downloads the results for all jobs that are completed.
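As a sketch of that submit / poll / download pattern in plain Python (not the actual workspace logic; the API calls are hypothetical placeholders passed in as functions):

```python
import time

def submit_jobs(jobs, api_submit):
    """Send each calculation job to the API; return the job ids issued."""
    return [api_submit(job) for job in jobs]

def poll_until_done(job_ids, api_status, api_download, interval=600, max_time=86400):
    """Poll the status endpoint until every job finished or max_time elapsed.

    Returns the set of job ids whose results were downloaded.
    """
    pending, downloaded = set(job_ids), set()
    deadline = time.monotonic() + max_time
    while pending and time.monotonic() < deadline:
        finished = {j for j in pending if api_status(j) == "completed"}
        for j in finished:
            api_download(j)          # fetch the result file for this job
            downloaded.add(j)
        pending -= finished
        if pending:
            time.sleep(interval)     # e.g. 600 s = a 10-minute cadence
    return downloaded
```

The key property (and the failure mode discussed below) is that the loop only makes progress as long as the status calls keep returning; if the process hangs between polls, jobs that complete afterwards are never picked up.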

 

If I look at the FME workspace I see that in total 5469 (calculation) jobs were initialized, and the monitoring process has identified 590 jobs as completed. If I however manually request a new status from the API, it seems that (nearly) all jobs in fact completed, but many did so after the latest time instant that was logged during the translation (2020-09-18 17:41:26). Also, if I look in my resultFiles folder for the jobs that were downloaded, I see that the latest resultFile was downloaded at 18-09-2020 17:41...

 

I notice that the workspace indicates that it is still running, but there doesn't seem to be any progress anymore. If I look at the translation log within Workbench, I see that it has reached exactly 1000000 lines, and the translation seems to have stalled from there. However, the first line that is shown in the translation log within Workbench

(which is '2020-09-18 05:13:28|32496.7| 0.0|STATS |aeConnectCalculate_2::aeConnectCalculate_mainCalculations_Logger_Information LOGGED Transformer Output Nuker (TeeFactory): Cloned 0 input feature(s) into 0 output feature(s)'),

is actually not the first line of the physical log file

(latter is '2020-09-17 15:23:57|  0.6| 0.6|INFORM|FME 2020.0.2.1 (20200511 - Build 20238 - WIN64)'),

so it seems that FME only shows the last 1 million lines in the translation log window of workbench.

Just curious, is this true, and is this configurable in some way?

 

If I look at the physical log file, it seems that it has grown into quite a sizeable file of 532944 kB, with a total of 2687610 lines written, where the last line is the same one as the one shown in the translation log window of FME Workbench. Given the size it's quite close to half a GB. Could it be that FME imposes a limit on the size of the physical log file?
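For what it's worth, a file this size is easy to sanity-check without opening it in an editor. A small Python sketch (the path is a placeholder) that counts lines and size without loading the file into memory:

```python
import os

def log_stats(path):
    """Return (size in kB, line count) of a large log file, streamed in chunks."""
    size_kb = os.path.getsize(path) // 1024
    with open(path, "rb") as f:
        # Count newlines chunk by chunk so a ~0.5 GB file never sits in RAM.
        lines = sum(chunk.count(b"\n")
                    for chunk in iter(lambda: f.read(1 << 20), b""))
    return size_kb, lines
```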

I now also realize that it might be wise for me to turn off informational log elements to be written to the logfile (i.e. uncheck 'Log Information' under 'Log Message Filter' of the 'Translation' tab of the 'FME Options'). In this way the issue I ran into might be prevented.

For now I'm mainly curious to know if it could be possible that the logging methodology of FME has anything to do with the stalling of the workspace that I experienced.

 

I already checked for some settings that might be related to the logging in FME, but I can't seem to find one that points to a restriction of 1000000 lines in the translation log window of FME Workbench, or a restriction on the size that the physical log file of an FME translation can hold.

 

What I found was:

https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_Workbench/Workbench/Max_Features_to_Log.htm

--> This to me seems like a setting that is targeted at Logger transformers in the workspace (and/or maybe automatic error/warning logging of FME). I didn't change these settings, and in the workspace parameter defaults it is configured to Max Features To Log = 200 and Max Features To Record = 200.

 

Furthermore I saw some items pointing towards 32- vs 64-bit versions and available RAM, but I don't think that is the issue here. The translation doesn't require that much RAM, especially not for the monitoring part. Also, I have feature caching disabled. Some lines from the log that might be interesting on this front:

2020-09-17 15:23:57|  0.6| 0.6|INFORM|FME 2020.0.2.1 (20200511 - Build 20238 - WIN64)

2020-09-17 15:23:57|  0.6| 0.0|INFORM|FME Desktop Professional Edition (floating)

2020-09-17 15:23:57|  0.6| 0.0|INFORM|START - ProcessID: 6756, peak process memory usage: 39200 kB, current process memory usage: 39200 kB

2020-09-17 15:23:58|  1.0| 0.0|INFORM|FME Configuration: FME_PRODUCT_NAME is 'FME(R) 2020.0.2.1'

2020-09-17 15:23:58|  1.0| 0.0|INFORM|System Status: 336.57 GB of disk space available in the FME temporary folder (D:\\TempAppData)

2020-09-17 15:23:58|  1.0| 0.0|INFORM|System Status: 64.00 GB of physical memory available

2020-09-17 15:23:58|  1.0| 0.0|INFORM|System Status: 68.00 GB of virtual memory available

2020-09-17 15:23:58|  1.0| 0.0|INFORM|Operating System: Microsoft Windows Server 2012 R2 Server 4.0 64-bit (Build 9600)

2020-09-17 15:23:58|  1.0| 0.0|INFORM|FME Platform: WIN64

2020-09-17 15:23:58|  1.0| 0.0|INFORM|Locale: nl_NL

2020-09-17 15:23:58|  1.0| 0.0|INFORM|System Encoding: windows-1252

2020-09-17 15:23:58|  1.0| 0.0|INFORM|FME Configuration: Start freeing memory when the process exceeds 192.00 GB

2020-09-17 15:23:58|  1.0| 0.0|INFORM|FME Configuration: Stop freeing memory when the process is below 144.00 GB

 

Ideas or suggestions on whether the logging methodology of FME might be linked to the stalling of my workspace are greatly appreciated.

 

Kind regards,

 

Thijs

 

Golly, that's quite a big log file! - I doubt there is a limit to the size of the log file. I suspect that the limit of the lines you see in Workbench is just a number chosen by a developer when setting it up, but who knows.

I have seen issues with a lot of logging when both writing to a network drive and using a lot of subprocesses, but I think this is a different issue.

Hanging workspaces are annoying to debug for sure!


@thijsknapen​  I agree with @virtualcitymatt​  - it's unlikely to be the logging that is causing the hang. If you're worried about that, just close the log window in FME workbench and monitor the 'tail' of the .log file instead.
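Outside Workbench, "monitoring the tail" of a growing log can be done with any follow loop. A minimal Python sketch (the log path is a placeholder; this only shows the idea, not an FME feature):

```python
import time

def follow(path, poll=2.0):
    """Yield new lines appended to a log file, tail -f style."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        f.seek(0, 2)                 # jump to the current end of file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll)     # wait for the translation to append more
```

If nothing new arrives for a long time while the process is supposedly running, that is itself a useful signal that the workspace has stalled.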

I think I'd check the logic you have around "it sends (calculation) jobs to an API, periodically requests a status update from the API to see which jobs completed, and downloads the results for all jobs that are completed." For example, what happens if one of the API calculation calls fails and FME is waiting to download the result? How does FME know when everything has finished? If possible, consider making fewer calculation requests per FME job.


Hi @virtualcitymatt and @Mark Stoakes

Thank you for your replies. So it seems that there are no restrictions on the size of the log file. Looking at the tail of the .log file doesn't really help, because nothing is added anymore since the workspace seems to be hanging.

 

The last line in the log is

2020-09-18 17:41:26|52263.3| 0.0|INFORM|aeConnectGetResult_mainCalculations_WorkspaceRunner_create_metaData_JSON_calculationJob: Running FME with command line - `{C:/Program Files/FME2020.0.2.1/fme} COMMAND_FILE D:/TempAppData/wsr_cm_1600443686229_6756 2>@1'

 

If I look at the workspace itself, I can see that at this WorkspaceRunner 6059 features entered, and 6058 features succeeded. So it seems that FME was not able to start up the FME process, and stalled from there. No clue why (it can't be that the maximum number of 8 FME.exe processes was exceeded, because there was just one workspace running, and together with the WorkspaceRunner that brings the total to two). Maybe, just maybe, the issue is that when starting up this FME (sub)process, FME tried to check whether a licence was available (or whether the product was licenced at all), couldn't connect to the licence server at that specific instant, and hung from there? There have been some network hiccups on the environment...

 

I don't think the issue lies in the logic of the process itself. In general the error handling is pretty well implemented. To go into your example: if one of the API calculation calls fails, the workspace is configured to log an error and go to the next call. In that case the job won't initialize, and FME also won't monitor the job or try to download the result (the job also wouldn't come up in the status call). For the monitoring process I have a maxTime parameter configured, so if an initiated job doesn't finish within this preconfigured maximum time, the monitoring process is aborted. However, during this process it would still incrementally perform a status call to the API (currently configured every 10 minutes), so while the monitoring process is taking place (i.e. when the maximum time is not yet exceeded), the log should still grow with additional status requests every X (currently 10) minutes.

 

On making fewer calculation requests; that's actually already what I did. Originally there were 22007 jobs to initiate (times 2 for two different configurations) ;)

 

 

 



What are the settings in the WS Runner? Are you waiting until complete or just submitting them?

Consider also that the sub workspace being run will also be writing its own logfile. Are you creating a separate file for each subprocess?

If you are waiting for FME to complete the process then it's possible that there is a loss in communication between the parent and child.

If either of the workspaces are hosted on a network drive then logging will also be done over the network, and if a lot of logging and overwriting of files is occurring then maybe this could cause issues. An interesting test would be to see if turning off logging altogether results in a successful process. Another option could be to try and set up logging on your C: drive. This would rule out network issues for logging at least.



Hi @virtualcitymatt​ 

Thx again for your help. The WorkspaceRunner is configured to wait for the 'job' to complete (i.e. to wait for the writing of the metadata file to complete). The thing is that I want to write out a metadata file per job on the fly, whereas when using a normal writer, data can only be written out at the end of the translation (or am I missing something there?). It seems to be related to this idea: https://community.safe.com/s/idea/0874Q000000TkcQQAS/detail

 

The workspace and data is being stored and run on a local machine (with local drives). The licence server for FME is probably on a different machine though...

All things considered I now am quite confident that the issue for the stalling of the workspace could very well be caused by the workspacerunner, but I'm not exactly sure on what front. Since everything else runs locally (to my recollection), the only thing I can imagine is the communication with the licence server halted.

 

Maybe I could change the 'wait for job to complete' setting of the WorkspaceRunner to 'No' to prevent this issue in the future. I'm not exactly sure whether it was a must for me to have it on 'Yes' (I vaguely remember it was preferable though).



Ahh, take a look at the FeatureWriter. This can be added in the middle of the workflow for sure! This might sort out your troubles. 'Wait for job to complete' is important if you need to know success or fail. If you don't need to know success or fail then you can set it to 'No' and run a bunch of jobs in parallel (up to 7).

But take a look at the FeatureWriter to see if it will help



I know about the FeatureWriter, but thx for the tip. Besides the great ability to perform an action after features are written, the FeatureWriter generally doesn't behave much differently from a regular writer.

Specifically, a FeatureWriter will also only start writing features once the workspace detects that no new features will enter the input port (whereas I want features to be written out on the fly).

 

With regards to the WorkspaceRunner settings: in my workspace I do prefer to know whether the metadata files are successfully written out or not, because if they aren't, I would end up with result zips that lack the required administration (I guess that is why I chose to wait for the WorkspaceRunner to complete). Furthermore there is little benefit in parallelization by the WorkspaceRunner, because creating a metadata file only takes a second or so. I also don't want too many concurrent FME.exe processes, so that I avoid (licencing) issues from starting up more than 8 FME.exe processes and can peacefully work on other FME workspaces while the large workspace is running (though that could also be managed by allowing a maximum of 1 concurrent job instead of e.g. 7).



Hmm, what format are you outputting to?

If you sort the data and then stick it into a CustomTransformer you should be able to process by group that way if the Writer doesn't do it. In the end though that may result in the same situation as the WorkspaceRunner and you'll probably end up with just as many headaches.



Hi Matt (I suppose that's your name ;))

Using that methodology you can indeed make sure that the featurewriter will handle features in groups. Although in most use cases I guess it would already suffice to use a fanout at the writer by dynamically configuring the data output path (in the case of a featurewriter, for a regular writer you need to use the 'regular' fanout option). Nevertheless I guess your methodology is a bit more secure/brute force in enforcing features to be handled in groups.

However, in order to write out features on the fly like I wanted here, it is not enough to have the features being handled in groups, as you also need to have the ability to use a setting 'Process When Group Changes', similar to e.g. the Aggregator transformer. That is what the previous idea I mentioned is about. Unfortunately it hasn't been picked up/implemented in the past 4 years 😞
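The 'Process When Group Changes' idea can be sketched in plain Python (this is an illustration of the concept, not FME's implementation): stream features sorted by a group key and flush each group the moment the key changes, instead of waiting for the whole stream to end.

```python
def flush_on_group_change(features, key, flush):
    """features: iterable of dicts, pre-sorted by key; flush(group_key, feats)
    is called as soon as a group is complete."""
    group, current = [], None
    for feat in features:
        k = key(feat)
        if current is not None and k != current:
            flush(current, group)   # write this group out immediately
            group = []
        current = k
        group.append(feat)
    if group:
        flush(current, group)       # final group after the stream ends
```

Note the pre-sorting requirement: without it, a group could reappear later in the stream after it has already been flushed.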

 

To get back to your initial question: I'm using a text writer to create a (nested) JSON file. This is because I believe the JSON writer of FME doesn't support nested JSON files.
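The text writer + templating approach amounts to serializing one nested document per feature. In plain Python the same idea looks like this (the job/metadata field names are hypothetical placeholders, not the actual schema):

```python
import json

def metadata_json(job):
    """Build one nested metadata document per calculation job."""
    doc = {
        "jobId": job["id"],
        "status": job["status"],
        "result": {                      # nested object, which a flat
            "file": job["result_file"],  # per-attribute writer cannot
            "sizeKb": job["size_kb"],    # produce directly
        },
    }
    return json.dumps(doc, indent=2)
```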



If you are writing a whole text file at once, the text file writer should process this per feature, no? Perhaps not, I haven't tested that out. I assume not, given you've gone to all this trouble. Are you using the JSONTemplater to structure it?

 

In a CustomTransformer you can process by group and when group changes. So you can sort of create your own grouped writer.



Hmm... I see the 'Process when group changes' option of the custom transformer. That might indeed work. I didn't think (or know) of that when I created the workspace, tnx for the suggestion!

 

If you write out a whole text file at once, and set a data fanout (or dynamic path) specific to single features, you will indeed get a file per feature, but still the features are only written out as soon (or late) as no new features enter the (feature)writer.

I'm indeed using a JSONTemplator to structure the JSON.

Update, it seems that safe is also recommending to use a text writer in case of writing nested JSON files; https://community.safe.com/s/article/json-writing-with-jsontemplater



That's too bad. It really seems like this should work, at least for the text file writer. It would also really help for situations where recording live data is important. Presumably there is a pretty big blocker on this functionality. For now the CustomTransformer solution works, although the stability might be questionable...



Maybe a small addendum on this front. There is a setting in FME that sort of controls the order in which features are written. Specifically, under the 'Workspace Parameters', in the 'Translation' section, you have the 'Order Writers By' parameter. That parameter can be used to control which writer has the highest priority for writing out features. However, this setting only applies when you have multiple writers. It doesn't change the fact that data will only be written once no new features enter the writer(s); it only controls which writer initiates first. Furthermore I believe this setting only works for regular writers (not FeatureWriters).

 

Also, it is possible to use a Recorder if you want to store features mid-translation. So if you are happy with the FFS format, there is a workaround for writing out features on the fly. For all other formats not so much.

(Or at least not so easily; your methodology of using a FeatureWriter within a custom transformer, configuring the custom transformer to handle features in groups (i.e. using the 'Group By' transformer parameter) and choosing the groups to be processed when the group changes (i.e. setting the 'Group By Mode' transformer parameter to 'Process When Group Changes (Advanced)') might be a good workaround.)


@thijsknapen Feature counts on the connectors can sometimes be off by one depending on where FME stalled - so it might be the transformer downstream of your WorkspaceRunner that is holding things up. If you have already reduced the number of HTTP calls, then it looks like it is the last call that is causing the hang-up. Would the workspace hang if you requested 10 jobs? You need to inspect the jobs and step through them. You can do this using Feature Inspection & Breakpoints (remember breakpoints can have conditions so you don't have to step through every single job).

If WorkspaceRunner is hanging, then you'd probably still see an fmeworker.exe in windows task manager

