Solved

Cannot get Tesseract to give an output; features go to the Rejected port

  • September 27, 2019
  • 4 replies
  • 64 views


I have been trying to get the TesseractCaller to work so that I can get text out of images and into GIS. The TesseractCaller always sends features to the <Rejected> port. I found this thread, https://knowledge.safe.com/questions/81922/the-tesseractcaller-only-returns-75-of-the-image-a.html, where the TesseractCaller was working, so I downloaded the FME workspace and the PDF from that thread and ran them on my end to see an output, but got nothing. For the first couple of runs I was using Tesseract version 5. I then installed version 4, but it also fails. I'm trying to get the text of an image into GIS; if that is not easy, Excel will work. I have attached the log file I get after running the attached workspace.

@dmitribagh - I have created this thread following our conversation in the other thread.

@danilo_fme - I saw that you had helped someone in another thread related to this topic.

 

Best answer by dmitribagh

Hi @gerardor,

 

Please try the attached version of the workspace. It works well for me with Tesseract 4.

The change I made was, as I said earlier, removing the -psm flag, which is not supported in Tesseract 4.

 

If you need to do this in your own workspaces, right-click the TesseractCaller and select 'Edit'; this opens the transformer in a separate tab. Go to the AttributeCreator transformer (the one with the annotation "Assemble a command line") and remove the -psm flag and its parameter from there. That is, change from this:

 

""$(TESSERACT)" "@Value(_dataset)\tesseract.jpg" "@Value(_dirpath)@Value(_rootname)" -psm $(PAGESEGMODE) -l @Value(_language) @Value(_format) $(DIGITSONLY)"

 

to this:

""$(TESSERACT)" "@Value(_dataset)\tesseract.jpg" "@Value(_dirpath)@Value(_rootname)" -l @Value(_language) @Value(_format) $(DIGITSONLY)"

 

I hope this helps. Feel free to contact me directly if you need more assistance.

 

Dmitri

30358-22879-mp3.fmw
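
For readers who would rather keep control over page segmentation than drop the option entirely: Tesseract 4 and later write the option as `--psm` (two dashes), while 3.x used `-psm`. The sketch below is only an illustration of that difference outside FME. The function names `parse_major_version` and `build_command` are made up for this example and are not part of the TesseractCaller, and the assumed shape of the `tesseract --version` output (a first line such as `tesseract 4.1.1` or `tesseract v5.0.0-alpha.20190708`) should be checked against your own installation.

```python
import re
from typing import List, Optional, Sequence


def parse_major_version(version_output: str) -> Optional[int]:
    """Pull the major version out of `tesseract --version` text.

    Assumption: the output contains "tesseract 4.1.1" or
    "tesseract v5.0.0-alpha.20190708". Returns None if nothing matches.
    """
    match = re.search(r"tesseract\s+v?(\d+)", version_output)
    return int(match.group(1)) if match else None


def build_command(
    exe: str,
    image: str,
    out_base: str,
    language: str,
    psm: Optional[int] = None,
    major_version: int = 4,
    extra: Sequence[str] = (),
) -> List[str]:
    """Assemble a Tesseract argument list (hypothetical helper).

    The page segmentation option is spelled "--psm" for version 4 and
    later and "-psm" for 3.x. With psm=None it is omitted, which matches
    the fix described in the post above.
    """
    cmd = [exe, image, out_base]
    if psm is not None:
        flag = "--psm" if major_version >= 4 else "-psm"
        cmd += [flag, str(psm)]
    cmd += ["-l", language]
    cmd += list(extra)  # e.g. output format or config names
    return cmd
```

Passing the result to `subprocess.run` as a list avoids the nested quoting that the single command-line string needs.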


4 replies

dmitribagh
  • Safer
  • September 30, 2019

Hi, @gerardor,

 

Try an updated version of the transformer - I uploaded it a few minutes ago.

 

The command-line syntax changed slightly from Tesseract v3 to v4: there is no "-psm" option anymore, and it is usually enough to remove it from the command line, which is assembled right before the SystemCaller inside the TesseractCaller.

 

Dmitri

 


  • Author
  • October 1, 2019

Hi, @dmitribagh

Where can I find the version you updated? I just ran the workspace I attached to this thread with the updated TesseractCaller (tesseract-ocr-w64-setup-v5.0.0-alpha.20190708.exe). I have attached the log file, and I see it still mentions something related to "-psm". Also, I'm not sure whether it shows in the log file, but after the TesseractCaller rejects the features I get a table in which one attribute, named "_rejection_string", has the value "Tesseract didn't find any text on this raster". I'm not sure why it is doing that.

 

Gerardo Rodriguez


dmitribagh
  • Safer
  • Best Answer
  • October 1, 2019

Hi @gerardor,

 

Please try the attached version of the workspace. It works well for me with Tesseract 4.

The change I made was, as I said earlier, removing the -psm flag, which is not supported in Tesseract 4.

 

If you need to do this in your own workspaces, right-click the TesseractCaller and select 'Edit'; this opens the transformer in a separate tab. Go to the AttributeCreator transformer (the one with the annotation "Assemble a command line") and remove the -psm flag and its parameter from there. That is, change from this:

 

""$(TESSERACT)" "@Value(_dataset)\tesseract.jpg" "@Value(_dirpath)@Value(_rootname)" -psm $(PAGESEGMODE) -l @Value(_language) @Value(_format) $(DIGITSONLY)"

 

to this:

""$(TESSERACT)" "@Value(_dataset)\tesseract.jpg" "@Value(_dirpath)@Value(_rootname)" -l @Value(_language) @Value(_format) $(DIGITSONLY)"

 

I hope this helps. Feel free to contact me directly if you need more assistance.

 

Dmitri

30358-22879-mp3.fmw


  • Author
  • October 1, 2019
This worked; the TesseractCaller produced output. I will need to work on how I want the output to look, because I'm not getting two columns (lot no. and address). I'm just getting one column, which doesn't look right. How did you manage to get the two columns you showed in the thread https://knowledge.safe.com/questions/81922/the-tesseractcaller-only-returns-75-of-the-image-a.html?

Thank you @dmitribagh ,

Gerardo
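
The thread does not show how the two columns were produced, so the following is only one possible approach, sketched outside FME in Python. It assumes, purely for illustration, that each recognized line starts with a numeric lot number followed by whitespace and then the address. The pattern, the column names `lot_no` and `address`, and the helper names are all hypothetical and would need adjusting to the real OCR text.

```python
import csv
import io
import re
from typing import List, Tuple

# Assumed line shape: "<digits><whitespace><rest of line>".
LINE_PATTERN = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def split_lot_and_address(ocr_text: str) -> List[Tuple[str, str]]:
    """Split each OCR line into (lot number, address).

    Lines that do not match the assumed shape (blank lines, noise)
    are skipped rather than guessed at.
    """
    rows = []
    for line in ocr_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            rows.append((match.group(1), match.group(2)))
    return rows


def rows_to_csv(rows: List[Tuple[str, str]]) -> str:
    """Render the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["lot_no", "address"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Skipping non-matching lines keeps OCR noise out of the table, at the cost of silently dropping lines where the lot number was misread, so it is worth comparing the row count against the source image.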

