Hi! I have a workspace where I mainly use a PDF reader to read around 5000 PDFs in folders and subfolders (defined by the dataset “…\\**\\*.pdf”).
The workspace is large and quite complex, but the idea is to take the PDF file name, which is an object ID, and combine it with other files that contain the ID’s coordinates, etc. The PDFs contain other information on the objects, which gets extracted using predetermined local coordinates.
The workspace has worked excellently, but recently it has stopped completing. The only change I recall is that the number of PDFs keeps increasing as time goes by. I’d say that the last time it worked was when it contained around 3000-4000 files.
The errors also seem to be sporadic. Usually the workspace runs through a few hundred PDFs, but eventually I get the message that a PDF cannot be opened “because the file is not in PDF format, or because it is corrupted”. Sometimes the program just crashes. The PDF in question can be read individually with no issues. And if I run only the part of the workspace with the PDF reader and the subsequent transformers, it runs through all the files with no issues.
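To rule out files that really are damaged, a check outside the workspace could look roughly like the sketch below. It is plain Python (standard library only) and only tests whether the `%PDF-` signature appears near the start of each file, so it says nothing about deeper corruption. The function names and the 1024-byte probe size are my own assumptions, not anything from the PDF reader.

```python
from pathlib import Path

# Signature expected near the start of a PDF file.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path, probe_bytes=1024):
    """Return True if the PDF signature appears within the first probe_bytes.

    Unreadable or missing files count as failures. probe_bytes is an
    assumed tolerance for leading junk before the header.
    """
    try:
        with open(path, "rb") as fh:
            return PDF_MAGIC in fh.read(probe_bytes)
    except OSError:
        return False


def find_suspect_pdfs(root):
    """Walk root recursively and return the .pdf files that fail the check."""
    return sorted(p for p in Path(root).rglob("*.pdf") if not looks_like_pdf(p))
```

If `find_suspect_pdfs` returns an empty list for the top folder, the files themselves are probably fine and the problem is more likely in how they are read in bulk.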
I have tried to read the PDFs through a Directory and File Pathnames reader followed by a FeatureReader (PDF), but the problem persists. I have also tried a WorkspaceRunner, but honestly I don’t understand how to use it when it is connected to a workspace with multiple readers and writers. All the WorkspaceRunner examples I’ve found are quite simple.
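If I understand the WorkspaceRunner idea correctly, the point would be to split the file list into smaller batches and run the main workspace once per batch. The batching part on its own would be something like this sketch in plain Python (standard library only). `batch_pdfs` and the batch size of 500 are hypothetical, and how each batch would be passed to the child workspace is exactly the part I’m unsure about.

```python
from pathlib import Path


def batch_pdfs(root, batch_size=500):
    """Yield lists of at most batch_size PDF paths found under root.

    Paths are sorted so that repeated runs produce the same batches.
    batch_size is an assumed value, chosen only for illustration.
    """
    paths = sorted(str(p) for p in Path(root).rglob("*.pdf"))
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]
```

With around 5000 files and a batch size of 500, this would give about ten runs, each well below the 3000-4000 files where the workspace last completed.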
I’m hoping that someone out there recognizes this issue and can give me some pointers on what to do.
Thanks,
Victor