I was extracting tables from a PDF file by following the steps in https://support.safe.com/hc/en-us/articles/25407564475277-Extracting-Text-and-Tabular-Data-from-PDF#h_01HW3Z9Z37Q33NQQ7R0XBGEVB0. However, I ran into a problem: some words come out misspelled or broken when extracted, even though they appear correctly in the PDF viewer. I suspect this is because the PDF was exported from a graphic design app such as Adobe Illustrator. I noticed the same issue when I copied text from the PDF and pasted it into a Word document (correct me if I’m wrong). Is there a transformer that can help fix or reconstruct the text?

 

@mohamedalsobh If the extracted text that has issues is something for which you can build a lookup table, you could try using the AttributeValueMapper to replace it with the correct text.

https://docs.safe.com/fme/html/FME-Form-Documentation/FME-Transformers/Transformers/attributevaluemapper.htm

For each mangled text string, map it to the correct string using the AttributeValueMapper.
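To make the lookup-table idea concrete, here is a minimal sketch in plain Python. It is only an analogy for mapping a whole attribute value to a corrected one, not how the transformer is implemented; the `CORRECTIONS` entries and the `map_value` name are made up for illustration.

```python
# Hypothetical lookup table: mangled extracted value -> corrected value.
# These entries are invented examples, not taken from any real PDF.
CORRECTIONS = {
    "Pro ject": "Project",
    "Tab le": "Table",
}


def map_value(value):
    """Return the corrected string when the whole value is in the table.

    Values that are not in the table are passed through unchanged.
    """
    return CORRECTIONS.get(value, value)
```

Note that this only matches when the entire value equals a key in the table; a broken word inside a longer sentence would not be caught this way.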

Happy FME:-) ing

Cheers

SRG

 

 


Thanks for your assistance. I was trying to correct a word in a sentence, so I used the StringReplacer, and it worked well.
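For anyone reading later, replacing a broken word inside a longer sentence can be sketched in plain Python as below. This is an illustration of the idea only, not the StringReplacer itself; the `replace_word` function and the sample strings are hypothetical.

```python
import re


def replace_word(sentence, broken, fixed):
    """Replace each standalone occurrence of `broken` in `sentence` with `fixed`.

    The lookarounds stop the pattern from matching inside a longer word,
    and re.escape treats `broken` as literal text rather than a regex.
    """
    pattern = r"(?<!\w)" + re.escape(broken) + r"(?!\w)"
    # A function replacement avoids backslash interpretation in `fixed`.
    return re.sub(pattern, lambda match: fixed, sentence)
```

For example, `replace_word("The pro ject area", "pro ject", "project")` would fix the split word while leaving the rest of the sentence alone.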