
Good morning

 

I'm trying to extract text from a simple .jpg image using the TesseractCaller.

 

 

When I run Tesseract-OCR (version 3.05.02, 64-bit) on my PC (Windows 10) from Windows PowerShell to extract the text from the .jpg, it works fine.

 

 

If I run the attached workspace in FME, the text exported to .csv contains only the first line, i.e. 'hello there', and not the second line. I also get no output from the Line option.

 

 

I found another post about the Tesseract transformer, https://knowledge.safe.com/questions/80141/how-can-i-get-the-tesseractcaller-to-output-to-the.html, and made the change suggested there, but it didn't seem to work.

 

 

Any suggestions as to how I can resolve this?

 

 

Simon Hume

 

Hi @simonhume

 

I ran it without problems on my machine. Please check the resulting output:

 

What kind of information do you want to extract from the Line output port?

 

Thanks,

Danilo

 


Hi @danilo_fme

I've been trying to extract all the information, but was only getting the first line ('hello there').

Can I ask what you added to the AttributeExposer_2 to enable this to happen?

thanks

Simon


Hi @simonhume

The results are in the Text output port:

 

There are no results in the Line port.

 

 

Thanks,

Danilo


Hi @danilo_fme

Thank you for your response.

I've now done exactly the same as you and can see that all the text is visible in the Inspector, but in the output to the .csv file only the first line ('hello there') has been written.

If you output to .csv as well, do you get all the text, or only the first line?

 

thanks


Hi @simonhume

 

Attached is the result: text_extracted.csv

Is it right?

 

Thanks,

Danilo


Hello @simonhume and @danilo_fme,

I am currently working with the workspace from this thread, but when I run it the TesseractCaller doesn't output anything on the Text or Line ports, only on the <Rejected> port. What does your workspace look like after you have run it? Any help as to why I can't get any output?

Regards,

Gerardo Rodriguez


Hi Gerardo

I wasn't able to output the two lines of text from my .jpg file to the .csv file using the workspace I had set up. I'm not sure how Danilo was able to, as I set the workspace up the same way.

I did eventually give up after spending quite a few hours on it. It transpires that colleagues have manually typed whatever information they need from the source files. I doubt the files could have been run through FME anyway, as the majority were of such poor quality.

regards

Simon Hume

 


Hello @simonhume,

I got the issue I posted in this thread resolved. I ran the workspace you attached and was able to get the same output that @danilo_fme got. At first I thought I hadn't, but it was there all along: you have to expand the row in Excel to see the "hgshdgsdfhg". That is what I encountered.

Regards,

Gerardo Rodriguez
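
Gerardo's finding suggests the second line was in the .csv all along, stored in the same cell as the first. Below is a minimal Python sketch of why that can happen. It is not part of the FME workspace: the function names are hypothetical, and the two-line string simply stands in for the OCR result. A field containing a line break is written in quotes and spans two physical lines of the file, yet it is still a single cell, so a spreadsheet shows only the first line until the row is expanded. Writing one row per line of text is one possible way to make every line visible.

```python
import csv
import io

# Stand-in for the OCR result (assumption): two lines of text in one value.
ocr_text = "hello there\nhgshdgsdfhg"


def write_single_cell(text: str) -> str:
    """Write the whole text as one CSV field; the embedded newline gets quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["text"])
    writer.writerow([text])
    return buf.getvalue()


def write_row_per_line(text: str) -> str:
    """Write one CSV row per line of text, so no cell contains a line break."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["line_number", "text"])
    for number, line in enumerate(text.splitlines(), start=1):
        writer.writerow([number, line])
    return buf.getvalue()


def read_rows(csv_text: str) -> list:
    """Parse CSV text back into rows, keeping quoted line breaks inside fields."""
    return list(csv.reader(io.StringIO(csv_text, newline="")))


if __name__ == "__main__":
    print(write_single_cell(ocr_text))
    print(write_row_per_line(ocr_text))
```

With the first function the file has one data record holding both lines; with the second, each line of text gets its own row and is visible without resizing anything.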


My apologies for a new post on an old question! I saw some new activity here and it looks like it's been resolved elsewhere.

If you are seeking an answer to a similar question, please check out Dmitri's answer to this Q&A, as different versions of Tesseract use different syntax.

If you have any other questions on this, please post a new question and just link it to this or any relevant old posts. This makes it easy for everyone to read, and for us to track that you guys have gotten a useful answer! Thanks all!

