Question

Translation successful, but process keeps running


Badge

Hi

Situation

We have a fairly simple FME Workbench that loads CSV data from a URL and writes it to an MSSQL database. The workbench works fine, writes all the data, and finishes with "Translation was successful".

 

Problem

Sometimes (actually mostly) the workbench doesn't stop right after it logs "Translation was successful", as it's supposed to. Everything is done and all records have been inserted, but the workbench keeps running. Sometimes it takes 5 or 10 minutes, sometimes days, until the workbench stops.

The screenshot shows the situation. In this case the lag was 5 minutes.

 

Facts

  • FME 2018.1
  • MSSQL 2016, running on the same machine
  • With PostgreSQL or Oracle we have never experienced this issue

 

Any ideas or hints?

Regards, Tobias


17 replies

Userlevel 5
Badge +25

It could be on the database side. Once FME has sent the last feature to the writer, it considers itself done (and thus says "Translation Successful"), but the database might still be doing something, like writing data or creating indexes.

Badge


Thanks for this very quick answer and the hint. These tables are not really big, and they have no indexes at all, so honestly I cannot imagine what the database could be working on during this time lag.

Is there a way on the FME side to force FME not to wait?

Userlevel 5
Badge +25


Not really, I think; perhaps if you use a FeatureWriter.

I do agree that "minutes" of extra time seems rather excessive for the amount of data you're processing. How much RAM does that machine have?

Badge


Windows Server 2012 R2 Standard 64bit, 120 GB RAM
Badge


We use only one writer for 3 feature types. But as far as I know this should not be a problem.

We use the MS SQL Server Non-Spatial Writer. It triggers a TRUNCATE, followed by Feature Operation INSERT.
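For reference, the effect on the database is roughly this sequence (a pyodbc sketch with placeholder connection details, table and columns; not the writer's actual internals):

```python
# Rough sketch of the writer's effective behaviour: TRUNCATE, then
# plain INSERTs. Connection string, table, and rows are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
)
cur = conn.cursor()
cur.execute("TRUNCATE TABLE dbo.addresses;")  # Truncate Existing
cur.executemany(
    "INSERT INTO dbo.addresses (house_key, house_number) VALUES (?, ?);",
    [("K1", "12a"), ("K2", "7")],  # sample rows only
)
conn.commit()
conn.close()
```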

Userlevel 5
Badge +25


I'm stumped... The only things I can think of to try are doing a DROP instead of a TRUNCATE, or experimenting with the Features Per Transaction setting, but I doubt that will make a whole lot of difference. The hardware specs are fine too, so that can't be the issue.

 

Badge


After testing back and forth, I found that if I run a TRUNCATE on the tables before starting the workbench, it seems to work more often (but still not always).

But if I use the Truncate Existing option of the MSSQL writer, or add a TRUNCATE as "SQL to Run Before Write", it doesn't work. I've checked again: this is only the case for MSSQL, not for the other DBMSs.

Is it possible that MSSQL blocks the inserts, or gets into something like a deadlock, if the time between the TRUNCATE and the INSERTs is too short?
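If it helps with diagnosis, SQL Server can be asked directly whether any session is blocked while the workspace appears to hang. A minimal pyodbc sketch using the standard sys.dm_exec_requests DMV; the connection string is a placeholder:

```python
# Minimal sketch: list blocked sessions while the workspace hangs.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
)
rows = conn.cursor().execute(
    "SELECT session_id, blocking_session_id, wait_type, wait_time "
    "FROM sys.dm_exec_requests WHERE blocking_session_id <> 0;"
).fetchall()
for row in rows:
    print(row)  # any output means one session is waiting on another
conn.close()
```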

Badge

It seems that I've found the problem, but I don't really understand it. MSSQL seems to be part of the problem, but the API data downloaded from a URL seems to cause trouble as well.

The original workbench uses several CSV readers. The Source CSV parameter links directly to an Open Data URL (https://swisspost.opendatasoft.com/api/v2/catalog/datasets/hausnummer-und-hauskey_v2/exports/csv). The workbench loads the CSV, processes it, and inserts the data into the DBMS. This works fine for PostgreSQL and Oracle, but obviously not for MSSQL. That confuses me, because I cannot see why there should be a difference.

Now I have changed the process: I download the data with an HTTPCaller, store it locally with a FeatureWriter, and read it again with a FeatureReader, all in one workbench. This works fine, but it's much slower.
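For illustration, the same download-first pattern outside FME would look roughly like this (a sketch; the local filename and the semicolon delimiter are assumptions):

```python
# Sketch of the workaround: finish the HTTP download completely,
# then read the local copy, so HTTP and database work never overlap.
import csv
import urllib.request

URL = ("https://swisspost.opendatasoft.com/api/v2/catalog/datasets/"
       "hausnummer-und-hauskey_v2/exports/csv")
LOCAL = "hausnummer.csv"  # placeholder local path

# Step 1: the HTTP transfer completes and the connection closes ...
urllib.request.urlretrieve(URL, LOCAL)

# Step 2: ... before any further processing starts on the local file.
with open(LOCAL, newline="", encoding="utf-8") as f:
    for record in csv.reader(f, delimiter=";"):  # delimiter assumed
        pass  # hand each record on to the database writer here
```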

Does anyone have an idea or an explanation for this behaviour?

Badge +11


Hi @tobibruehlmeier,

I'm just making sure that you are not running any "SQL to Run After Write". I don't think you are. This parameter, much like the *Before* parameter, can sometimes be forgotten.

I'm wondering if this issue might be related to the Bulk Insert setting. Could you run some tests with bulk insert turned off on the SQL Server writer? The confusing part for me is that you've said all the data has been written to the database tables, so I don't believe this is the cause of the delay.

My last suggestion for understanding the problem is to isolate the readers from the writers. Do you see any delays if you disable the writer and only read the data? Does the same behavior exist? Further, what if you save the incoming data to an FFS file (using the Recorder transformer) and then alter the workspace to read the data from the FFS file (using the Player transformer), disabling the CSV readers; can you reproduce the same delay while writing?

Cheers,

Steve

Userlevel 4
Badge +13

In addition to Steve's answer, a few other things to consider.

If I understand your screenshot and problem description correctly, you're saying that the log shown in the Workbench window says the translation was successful, and all the statistics etc. are shown, BUT Workbench itself thinks FME is still running and so you have the red STOP button lit up. And Workbench stays like this for a long time.

If that is what is happening, then what I can say for sure is that FME itself has done all the work it is going to do, and all the transactions have been committed. The data is all where it should be. BUT the "fme.exe" translation process itself is refusing to exit.

When we print "Translation was successful" we really are done; it's the last thing. All that remains is to exit.

So this must be some interaction between our HTTP fetching and the MSSQL APIs that gets into a sad state when it's time to exit the process. I suspect the OS you're on is part of the interaction.

A few things to try that might jiggle stuff a bit.

1) Put a FeatureHolder transformer in each stream. This causes the data to pile up there until all the readers are done, closed, and terminated; only then does the data flow out from that point. If there is some odd interaction when both the HTTP and the MS SQL APIs are active at the same time, the FeatureHolders will stop it (see the sketch after this list).

 

2) Try FME 2019 -- not sure if that is an option, but many library upgrades have taken place, so an odd interaction like this might have been solved.

 

3) If these don't help, try using FeatureReaders (one for each CSV file, using the URL as their input dataset), and FeatureWriters (for the output MSSQL).
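To make the buffering in suggestion 1 concrete, here is roughly what FeatureHolder does, written as an FME PythonCaller-style class purely for illustration (not the transformer's actual implementation):

```python
# Illustration only: FeatureHolder-like buffering as a PythonCaller.
class FeatureHolderSketch(object):
    def __init__(self):
        self.held = []

    def input(self, feature):
        # Buffer every feature instead of passing it downstream, so
        # nothing reaches the writer while the readers are still open.
        self.held.append(feature)

    def close(self):
        # close() runs once all input (i.e. all readers) has finished;
        # only now do the features flow on towards the writer.
        for feature in self.held:
            self.pyoutput(feature)
```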

 

Between that and Steve's suggestions, do let us know what you learn. Sorry about this -- strange one indeed.

 

 

Badge


Thank you for the hints. The solution described above, using HTTPCaller, FeatureReader and FeatureWriter, works quite well for now.

The script is running on Windows Server 2012 R2 Standard, with Intel Xeon CPU (2 processors, 2.4 GHz) and 120 GB RAM.

I'm not sure whether FeatureHolder would help or not. The screenshot just shows a simplified version of the original workbench; actually we are loading around 10 files with the same workbench, producing quite a lot of data. RAM might not be very happy if we used a FeatureHolder.

FME 2019 is not an option, since this workbench is just part of a bigger process consisting of many workbenches coupled with WorkspaceRunners. If we moved to FME 2019 now, we would have to test everything again. The risk is too high at this stage of our project (although I'm really looking forward to the new release...).

Badge


Thanks, I've checked. There is no SQL to run after write. Changing the bulk insert setting didn't help either. So far I have only experienced this behaviour with the combination of reading CSV from a URL and writing to MSSQL, not when writing to Oracle or PostgreSQL (on another computer). So this computer could be part of the problem, but I actually don't think it is. I will set up a new MSSQL instance in another environment later this week. We will see...

Badge

We have the same problem writing to an Esri File Geodatabase 10.6.1 (with FME build 18310, 64-bit).

The same workbench ran well every night for more than 2 years with Esri 10.5, same FME build, same OS (Windows Server 2016).

But our problem is: the fme.exe process NEVER stops (it hung the whole weekend, more than 48 hours; "Translation was successful" was reported after 1h 39min).

Tried it with Windows 10 on a desktop PC: the same problem!

Tried it with FME 2019 (build 19260): the same problem! (Writing just 100 features: the process doesn't stop!)

 

@SAFE: Making a sample "workspace" doesn't make sense, because the cause of the error (multiple different readers from the same connection) cannot be simulated within ONE workspace! (Attached: 01_get_all_features_ns_ms_lwl_18310_Erweiterung.fmw)

 

Badge


 

Unbelievable, but true:

The same problem occurred when writing to really simple Esri Shapefiles: the process doesn't stop.

I changed the readers, from multiple different Esri SDE readers (SDE30, ESRI ArcGIS Layer, ESRI Geodatabase ArcSDE) to ONE single reader (luckily this was possible, because they all came from one source connection) -----> IT WORKS.

 

Maybe you should try to change your readers.

Multiple readers from different sources did NOT cause problems, only when the same reader connection is used by different readers. (I know that using multiple readers like this is nonsense, but it grew historically.)

Badge +4

Hi @tobibruehlmeier​ ,

 

I have the same issue with a workspace that runs many other workspaces (one for each feature, in sequence, using a WorkspaceRunner), which write into a GeoPackage table. I'm using FME(R) 2020.2.1.0 (20201130 - Build 20806 - WIN64).

 

This is an old thread... Have you come up with a solution in the meantime?

 

Thanks!

Badge


Hi @davtorgh​ 

Sorry to hear you are running into this issue, could you test on a newer build of FME? If the problem persists I would recommend submitting a case.

Thanks

Badge +1

We have also run into the same problem. We read a bunch of CSV files with an HTTPCaller to local file storage and read them with a FeatureReader, then write to an Oracle database. The translation is successful and the features get written, but our scheduling software (Neuron) doesn't receive the completion signal and the process never ends.

I will submit a support case. FME version 2021.2.4, 64-bit. Upgrading to a newer version is not really an option at the moment.
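In case it helps others stuck on the scheduling side: one possible workaround is to wrap fme.exe in a watchdog that enforces a hard timeout, so a process hanging after "Translation was successful" cannot block the scheduler forever. A rough Python sketch; the paths, timeout, and function name are placeholders, and the FME log should be checked before trusting a killed run:

```python
# Hypothetical watchdog sketch: run fme.exe with a hard timeout and
# kill it if it hangs after logging success. Paths are placeholders.
import subprocess

def run_workspace(fme_exe, workspace, timeout_s=3600):
    proc = subprocess.Popen([fme_exe, workspace])
    try:
        return proc.wait(timeout=timeout_s)  # normal case: exit code
    except subprocess.TimeoutExpired:
        proc.kill()   # process hung; terminate it ourselves
        proc.wait()   # reap the killed process
        # Caller should verify "Translation was successful" in the
        # FME log before treating this run as good.
        return None

# Example (placeholder paths):
# run_workspace(r"C:\apps\FME\fme.exe", r"D:\jobs\load_csv.fmw")
```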
