Hi @westdakota,
I haven't tested Tesseract on Mac, but I am aware of a couple of things that need attention when a workspace with a SystemCaller is ported to a different platform.
1) Are you using the TesseractCaller transformer? It creates temp files using the TempPathnameCreator transformer, which used to generate really long file names. These work correctly with most modern software, but some packages have limits on path length. For example, PotraceCaller, a transformer I made at about the same time as the TesseractCaller, does not work on Mac due to this limitation. We updated the TempPathnameCreator in recent FME builds, so this problem should go away. Meanwhile, you can generate the temp path names yourself within the transformer (maybe something like /temp/current_datetime/) or use hard-coded path values.
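As a rough illustration of the "generate the temp path names yourself" workaround, here is a minimal Python sketch that builds a short, timestamp-named temp folder. The function name and the "fme_" prefix are made up for this example, and running it from a PythonCaller is an assumption, not something tested on Mac.

```python
import os
import tempfile
from datetime import datetime


def make_short_temp_dir(base=None):
    """Create and return a short temp directory named after the current date and time."""
    # Fall back to the system temp folder when no base folder is given.
    base = base or tempfile.gettempdir()
    # A compact timestamp keeps the path short, e.g. fme_20170915_101500.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(base, "fme_" + stamp)
    os.makedirs(path, exist_ok=True)
    return path
```

The returned folder could then be used in place of the TempPathnameCreator output for tools that choke on long paths.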
2) The syntax for the SystemCaller is a bit different on Windows and Unix/macOS. In my recent workspace, which runs the FFmpeg video tool, I use a Tester to figure out the platform:
If @Left($(FME_HOME),1) = "/" (that is, if the FME_HOME variable begins with a forward slash), we are on Unix/macOS; otherwise, it is Windows. (Soon, in FME 2018, we will have a separate FME_PLATFORM system parameter.) Then I create two different command lines. The main difference besides path syntax is in the quoting. Compare the two command lines:
UNIX (for FME on Linux in the cloud):
ffmpeg -framerate 10 -i @Value(_dataset)/frame_%5d.tif -y -codec:v libx264 -pix_fmt yuv420p -codec:a aac @Value(_video_file_name)
Windows:
""C:\\Program Files\\ffmpeg\\bin\\ffmpeg.exe" -framerate 10 -i "@Value(_dataset)/frame_%5d.tif" -y -codec:v libx264 -pix_fmt yuv420p -codec:a aac "@Value(_video_file_name)""
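The same branching can be sketched in Python, for example if you prefer to build the command string in a PythonCaller instead of a Tester. This is only a sketch: the function names are hypothetical, and the strings simply mirror the two command lines above, including the doubled backslashes and the outer quotes of the Windows version.

```python
def is_unix(fme_home):
    """Mirror the Tester check: a path starting with '/' means Unix/macOS."""
    return fme_home.startswith("/")


def build_ffmpeg_command(fme_home, dataset, video_file):
    """Return the FFmpeg command line for the platform implied by fme_home."""
    options = "-y -codec:v libx264 -pix_fmt yuv420p -codec:a aac"
    if is_unix(fme_home):
        # Unix/macOS: no quotes around the executable or the paths.
        return "ffmpeg -framerate 10 -i {}/frame_%5d.tif {} {}".format(
            dataset, options, video_file
        )
    # Windows: each path is quoted and the whole line is wrapped in an
    # extra pair of quotes, exactly as in the command line from the post.
    exe = r"C:\\Program Files\\ffmpeg\\bin\\ffmpeg.exe"
    return '""{}" -framerate 10 -i "{}/frame_%5d.tif" {} "{}""'.format(
        exe, dataset, options, video_file
    )
```

Calling build_ffmpeg_command with the value of FME_HOME and the two attribute values gives the string to pass to the SystemCaller.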
I hope this helps you figure out what is going on. If not, please let me know, and we will try to work out the correct syntax for Mac here.
Dmitri
I tried TesseractCaller, but it was failing for me. Specifically, it wasn't seeing any text at all. Since I didn't need all of the features of your caller (ultimately, I just want a tally of the frequency of words in a PDF document), I decided to make a simplified workspace that takes in a PNG, converts it to RGB24, and then uses SystemCaller to run Tesseract on the converted file. It got as far as the conversion before failing on the SystemCaller. From Terminal, Tesseract is able to OCR the text just fine (although I couldn't get it to read beyond page 1).
I did notice the quotes/no-quotes distinction between Windows and Mac, and my paths were without quotes. I can literally copy/paste what I have in the SystemCaller parameter and drop it into the Terminal command line, and it works just fine.
As for the not-reading-beyond-page-1 problem, when I converted my PDF to TIFF rather than PNG, I was able to read all pages. Not sure why it won't work with the PNG.
Hi @westdakota,
Unless you also start FME Workbench from the terminal, you'll need to specify the full path to tesseract in the SystemCaller, e.g.
/usr/local/bin/tesseract /Users/westdakota/Documents/Coding_Projects/LSAT_Vocab/Output.png /Users/westdakota/Documents/Coding_Projects/LSAT_Vocab/words
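If you are not sure where tesseract is installed, a small Python sketch like the one below can look up the full path. The function name is hypothetical, and the extra folders (/usr/local/bin and /opt/homebrew/bin) are only assumptions about typical install locations, so adjust them to your machine.

```python
import os
import shutil


def resolve_executable(name, extra_dirs=("/usr/local/bin", "/opt/homebrew/bin")):
    """Return the absolute path to `name`, or None if it cannot be found."""
    # Start from the PATH this process sees, which may be shorter than the
    # Terminal's PATH when the application was not started from a shell.
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    # Also search a few extra folders (assumed locations, not guaranteed).
    dirs += list(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(dirs))
```

For example, resolve_executable("tesseract") returns the full path if it finds the program, and that path can then be used at the start of the SystemCaller command.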