Question

PDF words are misspelled


mohamedalsobh
Contributor

I was extracting tables from a PDF file by following the steps in https://support.safe.com/hc/en-us/articles/25407564475277-Extracting-Text-and-Tabular-Data-from-PDF#h_01HW3Z9Z37Q33NQQ7R0XBGEVB0. However, I ran into a problem: some words come out misspelled or broken when extracted, even though they appear correctly in the PDF viewer. I suspect this is because the PDF was exported from a graphic design app such as Adobe Illustrator. I noticed that when I copied text from the PDF and pasted it into a Word document, the same issue occurred (correct me if I'm wrong). Is there a transformer that can help fix or reconstruct the text?

 

2 replies

raghavendrans
Enthusiast

@mohamedalsobh If the text that comes out wrong is limited to a set of values for which you can build a lookup table, you could use the AttributeValueMapper to replace each one with the correct text.

https://docs.safe.com/fme/html/FME-Form-Documentation/FME-Transformers/Transformers/attributevaluemapper.htm

For each mangled text string, enter it as the source value in the AttributeValueMapper and the correct string as the destination value.
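
To illustrate the lookup-table idea outside of FME, here is a minimal Python sketch. It is not how the AttributeValueMapper is implemented; it only shows exact-match replacement with a fallback to the original value. The mangled strings in `CORRECTIONS` and the function name `map_value` are hypothetical examples, not values from the actual PDF.

```python
# Hypothetical lookup table: mangled extracted value -> corrected value.
# Replace these entries with the broken strings found in your own data.
CORRECTIONS = {
    "Pro ject": "Project",
    "Offi ce": "Office",
}


def map_value(value, table=CORRECTIONS):
    """Return the corrected value on an exact match, else the input unchanged."""
    return table.get(value, value)


if __name__ == "__main__":
    for raw in ["Pro ject", "Budget"]:
        print(raw, "->", map_value(raw))
```

Because the match is exact, this only works when the whole attribute value is a known mangled string. A broken word inside a longer sentence needs substring replacement instead.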

Happy FME:-) ing

Cheers

SRG

 

 


mohamedalsobh
Contributor
Thanks for your assistance. I was trying to correct a word within a sentence, so I used the StringReplacer, and it worked well.
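
For anyone wanting to see the same idea outside of FME, below is a rough Python sketch of replacing a broken word inside a longer sentence. It is only an analogy to what the StringReplacer does, not its implementation. The broken words in `WORD_FIXES` and the function name `fix_sentence` are made-up examples.

```python
import re

# Hypothetical fixes: broken word as extracted -> corrected word.
WORD_FIXES = {
    "diff erent": "different",
    "offi ce": "office",
}


def fix_sentence(text, fixes=WORD_FIXES):
    """Replace every occurrence of each broken word inside the text."""
    for broken, fixed in fixes.items():
        # re.escape treats the broken word as literal text, not as a pattern.
        text = re.sub(re.escape(broken), fixed, text)
    return text


if __name__ == "__main__":
    print(fix_sentence("The offi ce is in a diff erent city"))
```

Unlike a whole-value lookup, this leaves the rest of the sentence untouched and only swaps the listed fragments.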

