Skip to main content

We have a workbench which is driven by a where clause and an SQL Executor which picks up data driven by the where clause and process this. Occasionally we are seeing a bunch of consecutive jobs select zero rows in this transformer. We've tested the SQL and it returns the expected, we've re run the same predicate and it works ok.

My surmise is that some sort of failure occurs between SQL Server and FME but the SQL executor is not handling this well and not reporting any failure. We do see failures in the connectivity between our FME Server and the Database and usually these fail the translation but in this case all we are getting is a _matched_records of 0.

Has anyone else seen this behaviour, in our case now i've found it the task can be terminated and the data will be picked up by the system and resubmitted, however I now start to wonder if we are missing this anywhere else!

@Rob McPherson​ If there was a failure between FME & your SQL Server database (i.e. timeout) you'd most likely see an error or warning in your FME log file. to me, it sounds like a previous job has not yet completed - for example a commit has not occured before this query is run


Thanks Mark, bigger picture. We do get comms failures between the fme server and the db in both SQL executor updates and Geodatabase writer updates and these fail the model. we run over 2k models a day on 8 engines and when this happens we tend to see a grouping of failed jobs. Since they fail our process just puts them back on the queue so apart from a slight delay no harm is done. Our manager process selects a number of chunks of json data (keyed by an identity on the table) and submits jobs with a where clause parameter to run the actual loads. We know that the key range was there when this job was submitted and if we extract the SQL and run it after the data is still there. The select in the SQL executor however reads no rows so the job succeeds with no rows read, no rows written therefore throwing away this job. When we dug further these jobs also appear in small groups at the same time implying the same failure.

We can reset these jobs and they run without problem. As a workaround we now test the initiator port for rows returned and if it is 0 we terminate the job. This failure then causes the jobs to be reselected. Sure enough we can now see groups of failed jobs where formerly we couldn't see them. This happens maybe once a day and a handful of jobs fail at the same time - or can go a couple of days without failure.

It therefore appears that the query results are failing to return but the transformer isn't reporting this as an error. Since the data is in an embedded json column the query will be returning a couple of MB but this works 99% of the time without issue.

Thanks

 

Rob


Reply