I have a workspace that filters a folder for the newest .zip file contained within and then uses a FeatureReader to extract shapefiles out of it for further processing.

This works perfectly in Form, but in a Flow automation that runs off a folder watch it fails with: '<filename>.dbf' is not a valid Shapefile and cannot be read

So apparently, when run from Flow, the FeatureReader is trying to read the sidecar files. Why? And how can I tell it to stick to .shp files?

Thank you

@cstewartfortsas this probably needs more detail about how the workspace is set up, including which parameters have been set, particularly how the File Reader is passing file paths on to the FeatureReader.

The error suggests one possibility: somehow the Flow workspace processing is unzipping the associated *.shp and *.dbf files into the same folder as the one being watched. The .dbf then becomes the “newest” file, and the File Reader is literally sending its path as the file path for the SHP FeatureReader to try to read.

This would explain why the FeatureReader gives the error “'<filename>.dbf' is not a valid Shapefile and cannot be read”: the .dbf path is literally the dataset path it has been told to read by the File Reader, instead of a zipped SHP container.

Note that when Readers read a zipped file container like this, they temporarily extract the files to a nominated temporary folder, read them, and then, in theory, delete them afterwards. However, the Flow settings for where to put these temporary files can differ from the settings in Form. If Flow is putting these temporary files into the watched folder, or a subfolder of it, that could be what causes them to be picked up by the File Reader and sent as a path to the FeatureReader.
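To make the point about where temporary files land concrete, here is a minimal plain-Python sketch (not FME's own extraction mechanism, which isn't shown in this thread) that extracts a ZIP into a throwaway folder created outside the watched folder, so nothing new ever appears in the watched folder itself. The function name `list_shp_in_zip` and its `temp_root` parameter are hypothetical.

```python
import os
import tempfile
import zipfile


def list_shp_in_zip(zip_path, temp_root=None):
    """Extract zip_path into a throwaway folder, list its .shp files, then delete the folder.

    temp_root is where the throwaway folder is created (None = the system temp
    directory). Pointing it at, or inside, the watched folder is exactly the
    situation described above.
    """
    # TemporaryDirectory is removed automatically when the with-block exits
    with tempfile.TemporaryDirectory(dir=temp_root) as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Compute the result before the temporary folder is cleaned up
        return sorted(f for f in os.listdir(tmp) if f.lower().endswith(".shp"))
```

After a call, the watched folder still contains only the original .zip, because the .shp/.dbf files were written to, and removed from, a separate location.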


Thanks @bwn 

So, as part of checking out these ideas, I converted the shapefiles into a zipped geodatabase, and it works flawlessly. If that works, then it seems to be able to handle folders correctly.

What does that leave?


You would really need to post a more detailed view of the workflow and the parameter settings for the File Reader and FeatureReader.

The GDB change still leaves open the possibility that one of the unzipped dBase files is being passed on by the File Reader as the “newest” file in the folder to read as a SHP file. This is because an FGDB unzips all of its files into a subfolder. If Flow is temporarily unzipping the FGDB into the watched folder, no new files appear in the root of the watched folder during unzipping, only inside the FGDB subfolder. By contrast, if a zipped SHP container is temporarily unzipped into that folder, the SHP and DBF files land directly in the root folder rather than in a subfolder.

Otherwise, if you don’t want to or can’t show the workflow here, I would suggest putting a Logger transformer on the File Reader output to record exactly which file path the File Reader is sending to the FeatureReader, writing either to the log or to a separate log file on an accessible network drive. If the log shows a .dbf file path being sent to the FeatureReader as the input to read, then that is the problem.
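Inside FME this check is what the Logger transformer is for. As a rough plain-Python analogue of the same idea (the function name `log_reader_path` is hypothetical, and this is not how the Logger works internally), each path can be logged and anything that is not a ZIP flagged:

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("path_check")


def log_reader_path(path):
    """Log the path about to be handed to the reader; warn and return False if it is not a .zip."""
    ext = os.path.splitext(path)[1].lower()
    if ext != ".zip":
        log.warning("Non-ZIP path sent to reader: %s", path)
        return False
    log.info("ZIP path sent to reader: %s", path)
    return True
```

Passing `filename=...` to `logging.basicConfig` would send the same messages to a separate log file instead of the console.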


Hi @bwn 

I’m limited in what I can post, but I’ve attached the relevant part of the workspace. It still works on the desktop and fails in Flow.


Looking at the workspace tends to confirm some suspicions about what might be happening.

So it is set to read all file paths from the target folder. The PATH Reader then passes these paths on to the SHP FeatureReader, with a test for whichever is the newest file but no filtering on file type across all the files in the folder at that time. That will include any .dbf files unzipped into this folder by Flow if it is using the folder as, e.g., a temporary location to unzip the SHP ZIP file container. The workspace then takes whichever file is the “newest”, whatever its type, which can lead to a file that is not a zipped SHP being passed as the path to the FeatureReader.
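A minimal Python sketch of the difference, using a hypothetical `newest_file` helper as a stand-in for the PATH Reader plus the "newest file" test (an assumption; the actual workspace logic isn't shown): without a suffix filter, a freshly unzipped .dbf wins; with a .zip filter, the ZIP is chosen. Subfolders, such as an unzipped .gdb, are skipped because only files are considered.

```python
import os


def newest_file(folder, suffix=None):
    """Return the most recently modified file in folder, or None if nothing matches.

    If suffix is given, only file names ending in it (case-insensitive) are considered.
    """
    candidates = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # subfolders (e.g. an unzipped .gdb) are ignored
        if suffix and not name.lower().endswith(suffix.lower()):
            continue  # wrong file type, e.g. a .dbf when looking for .zip
        candidates.append(path)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

With `data.zip` written first and `data.dbf` unzipped after it, `newest_file(folder)` returns the .dbf path, while `newest_file(folder, ".zip")` returns the ZIP.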

This lines up with why FGDB “works” and SHP does not: an unzipped FGDB creates no files in the root of the temporary unzip path, just a subfolder.

If it were me, I would look to:

  • Potentially set the FME temp folder to something else in FME Flow. This can likely also be specified locally in the workspace via the FME_SHAREDRESOURCE_TEMP parameter, pointing FME Flow to temporarily unzip the SHP ZIP containers into a folder that is not the one being watched.
  • Potentially also set the FME Flow folder-monitoring trigger to fire only on new *.ZIP files, not on *.*.
  • Change the PATH Reader's Path Filter to *.ZIP instead of *.
  • Optionally, from FME 2024 onwards, I believe you can also use an additional FeatureReader to read the list of files inside ZIP files. The PATH Reader can then send ZIP files to this FeatureReader, and a Tester on its output can keep only ZIP files that contain a *.SHP file and ignore any that do not.
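The "only keep ZIPs that contain a *.SHP" test from the last point can be sketched in plain Python by reading each ZIP's member list without extracting anything. `zips_with_shp` is a hypothetical name, and this is only an analogue of the FeatureReader-plus-Tester approach, not how FME implements it.

```python
import zipfile


def zips_with_shp(zip_paths):
    """Keep only the ZIP files whose member list includes at least one .shp file."""
    keep = []
    for path in zip_paths:
        with zipfile.ZipFile(path) as zf:
            # namelist() reads the ZIP directory only; nothing is extracted to disk
            if any(name.lower().endswith(".shp") for name in zf.namelist()):
                keep.append(path)
    return keep
```

A non-ZIP path would raise `zipfile.BadZipFile`, so this assumes the *.ZIP path filter has already been applied upstream.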

Thanks @bwn 

So I went through these solutions one at a time and the one that did it was changing the Path Filter.

Thank you for your help.


Well, if that solution works, it strongly suggests that .dbf files are also being written/unzipped into the watched folder, and that this was the source of the FeatureReader error: it was being passed a .dbf file path to read rather than a zipped SHP file path.

The Path Filter fix doesn’t stop the workspace from being triggered when non-ZIP files appear in the watched folder. It just prevents, within the workspace, .dbf file paths from being output by the Directory and File Reader and passed through to the FeatureReader, where they were stopping the workspace from executing any further. Ideally, though, the workspace should be limited to activating only when a new ZIP file appears in the watched folder.
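Limiting activation to new ZIP files amounts to a name filter applied to each new-file event. A minimal sketch with a hypothetical `should_trigger` helper; in practice this is configured on the FME Flow automation trigger itself, not in Python.

```python
import fnmatch
import os


def should_trigger(path, pattern="*.zip"):
    """Return True only if the new file's name matches pattern, ignoring case."""
    name = os.path.basename(path)
    # fnmatchcase never normalises case, so lower-case both sides
    # to get the same case-insensitive behaviour on every OS
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())
```

Under this filter, an unzipped `data.dbf` or a partial `data.zip.part` would not start the workspace, but `data.ZIP` would.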

If it works, it works... but the purist in me would also be looking at how to stop .dbf files from being created in the watched folder in the first place. As above, the most likely source seems to be FME Flow temporarily unzipping the SHP ZIP file container into the watched folder.

