Question

Tesseract extract text from Image

  • November 30, 2018
  • 9 replies
  • 214 views

simonhume
Contributor

Good morning,

I'm trying to extract text from a simple .jpg image using the TesseractCaller transformer.

When I run Tesseract-OCR (version 3.05.02, 64-bit) on my PC (Windows 10) from Windows PowerShell to extract the text from the .jpg, it works fine.
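
To compare what FME receives with what the command line produces, a minimal Python sketch (not from this thread) can call the same `tesseract` executable and return its text. It assumes `tesseract` is on PATH and that your build accepts `stdout` as the output base; if it doesn't, pass a base name instead and read the resulting `<base>.txt` file.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str) -> list[str]:
    """Build the CLI call; "stdout" as the output base prints the text instead of writing a .txt file."""
    return ["tesseract", image_path, "stdout"]


def ocr_image(image_path: str) -> str:
    """Run the tesseract CLI on one image and return everything it recognised."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="utf-8",     # decode tesseract's output as UTF-8 text
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `ocr_image("hello.jpg").splitlines()` lists every recognised line, which you can hold up against what reaches the CSV; `hello.jpg` is a hypothetical file name.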


If I run the attached workspace in FME, the text exported to .csv contains only the first line (i.e. "hello there") but not the second line. I get no output at all from the Line port.
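
This thread doesn't confirm how the TesseractCaller fills its Line port, so the following is only a hypothetical post-processing step, sketched in Python: it splits the single Text value into one record per non-empty line, which is one way to get every recognised line onto its own CSV row. The field names `line_number` and `line` are made up for illustration.

```python
def text_to_line_records(text: str) -> list[dict[str, object]]:
    """Split one OCR text value into one record per non-empty line (hypothetical helper)."""
    records = []
    # str.splitlines() also breaks on form feeds, the page separator tesseract typically appends
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines and the empty piece left by a trailing form feed
        records.append({"line_number": len(records) + 1, "line": stripped})
    return records
```

For example, `text_to_line_records("hello there\nhgshdgsdfhg\n\n\f")` gives two records, one for each line of text.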


I found another post about the TesseractCaller transformer, https://knowledge.safe.com/questions/80141/how-can-i-get-the-tesseractcaller-to-output-to-the.html, and made the suggested change, but it didn't seem to work.

Any suggestions as to how I can resolve this?

Simon Hume

9 replies

danilo_fme
Evangelist
  • Evangelist
  • November 30, 2018

Hi @simonhume

I ran it without problems on my machine. Please check the output.

What kind of information do you want to extract from the Line output port?

Thanks,

Danilo



simonhume
Contributor
  • Author
  • Contributor
  • November 30, 2018

Hi @danilo_fme

I've been trying to extract all the information, but was only getting the first line ('hello there').

Can I ask what you added to AttributeExposer_2 to make this happen?

Thanks,

Simon


danilo_fme
Evangelist
  • Evangelist
  • November 30, 2018

Hi @simonhume

The results are in the Text output port.

There are no results in the Line port.

Thanks,

Danilo


simonhume
Contributor
  • Author
  • Contributor
  • November 30, 2018

Hi @danilo_fme

Thank you for your response.

I've now done exactly the same as you and can see all the text in the Inspector, but in the output .csv file only the first line ('hello there') has been written.

If you write to .csv as well, do you get all the text, or only the first line?

Thanks


danilo_fme
Evangelist
  • Evangelist
  • November 30, 2018

Hi @simonhume

I've attached the result: text_extracted.csv.

Is it right?

Thanks,

Danilo


  • September 6, 2019

Hello @simonhume and @danilo_fme,

I am currently working with the workspace from this thread, but when I run it the TesseractCaller doesn't output anything to the Text or Line ports, only to the <Rejected> port. What does your workspace look like after you have run it? Any idea why I cannot get any output?

Regards,

Gerardo Rodriguez


simonhume
Contributor
  • Author
  • Contributor
  • September 10, 2019

Hi Gerardo

I wasn't able to output the two lines of text from my .jpg file into the .csv file using the workspace I had set up. I'm not sure how Danilo managed it, as I set the workspace up the same way.

I did eventually give up after wasting quite a few hours on it. It transpires that colleagues had manually typed whatever information they needed from the source files. I doubt those files could have been run through FME anyway, as the majority were of very poor quality.

Regards,

Simon Hume


  • October 3, 2019

Hello @simonhume,

I got the issue I posted in this forum resolved. I ran the workspace you attached and got the same output as @danilo_fme. At first I thought I hadn't, but the text was there all along: you have to expand the row in Excel to see the rest of the text ("hgshdgsdfhg"). That is what I ran into.
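
This matches how CSV treats multi-line values in general: a value containing a line break is written as one quoted field, so the whole OCR text sits in a single cell, and a spreadsheet may show only its first line until the row is expanded. A minimal sketch with Python's standard `csv` module (not FME's CSV writer, whose exact quoting isn't shown in this thread) illustrates the round trip:

```python
import csv
import io


def write_text_csv(text: str) -> str:
    """Write a header plus one row whose single field is the (possibly multi-line) text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # default QUOTE_MINIMAL quotes fields containing line breaks
    writer.writerow(["text"])
    writer.writerow([text])
    return buffer.getvalue()


def read_text_column(data: str) -> list[str]:
    """Read the CSV back and return the values of the first column, skipping the header."""
    rows = list(csv.reader(io.StringIO(data, newline="")))
    return [row[0] for row in rows[1:]]
```

Writing `"hello there\nhgshdgsdfhg"` produces a quoted field spanning two physical lines, and reading it back returns the full two-line string, so nothing is lost even if Excel initially displays only "hello there".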

Regards,

Gerardo Rodriguez


jovitaatsafe
Safer

My apologies for a new post on an old question! I saw some new activity here and it looks like it's been resolved elsewhere.

If you are seeking an answer to a similar question, please check out Dmitri's answer to this Q&A, as newer versions of Tesseract use different syntax from previous versions.

If you have any other questions on this, please post a new question and just link it to this or any relevant old posts. This makes it easy for everyone to read, and for us to track that you guys have gotten a useful answer! Thanks all!

