Question

PDF to Text - Extracting text from PDFs and creating table

  • December 11, 2019
  • 2 replies
  • 193 views

rich90599
Contributor

Hello,

I was given a PDF with survey benchmark information in it. Each page in the PDF has information on one benchmark (see attached "Benchmarks_sample.pdf"). I want to get this information into a tabular format so I have a list with the columns: BenchmarkID, Easting, Northing, Description.

All of the info for the columns is in the PDF, but I'm having a hard time extracting it into the columns.

I'm using the PDF2TextReader from the FME Hub (https://hub.safe.com/publishers/timalbertvictor/transformers/pdf2textreader), which works pretty well for extracting the text. However, I am having a hard time assigning each piece of information to the appropriate column.

Any tips/help on this matter would be helpful.

Thanks!

 

 

2 replies

becchr
Influencer
  • December 12, 2019

Hi @rich90599, I have never used the Hub transformer you mentioned, but as a coordinates lover I often use coordinates to select text based on its position on the page, and regular expressions for specific text formats (like your N/E values) to fine-tune the selection.

For the example PDF you attached, these settings in my workbench seem to work; you can tweak them yourself if other PDFs are slightly different.

PDFtoTable.fmwt
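For readers who cannot open the workspace, the general idea (pick text by where it sits on the page, then use a regular expression to pull out the N/E values) can be sketched outside FME in plain Python. This is only an illustration and not the contents of PDFtoTable.fmwt. The `Fragment` type, the `REGIONS` boxes, the sample values, and the assumed "N 1234.5" / "E 1234.5" coordinate format are all hypothetical; a real PDF would need its own boxes and pattern, and the positioned text would come from whatever PDF reader is in use.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of extracted text and its position on the page (hypothetical type)."""
    text: str
    x: float
    y: float  # assumption: PDF-style origin at bottom-left, so larger y is higher up


# Hypothetical page regions as (x_min, y_min, x_max, y_max); tune per PDF layout.
REGIONS = {
    "BenchmarkID": (50, 700, 300, 760),
    "Coordinates": (50, 600, 550, 690),
    "Description": (50, 100, 550, 590),
}

# Assumed coordinate format: a letter N or E followed by a number.
COORD_RE = re.compile(r"\b([NE])\s*[:=]?\s*(\d+(?:\.\d+)?)")


def in_region(frag, box):
    """Return True if the fragment's anchor point lies inside the box."""
    x_min, y_min, x_max, y_max = box
    return x_min <= frag.x <= x_max and y_min <= frag.y <= y_max


def text_in_region(fragments, box):
    """Join the fragments inside the box in reading order (top to bottom, left to right)."""
    hits = [f for f in fragments if in_region(f, box)]
    hits.sort(key=lambda f: (-f.y, f.x))
    return " ".join(f.text.strip() for f in hits)


def page_to_row(fragments):
    """Build one table row (one benchmark) from the fragments of one page."""
    coord_text = text_in_region(fragments, REGIONS["Coordinates"])
    coords = dict(COORD_RE.findall(coord_text))
    return {
        "BenchmarkID": text_in_region(fragments, REGIONS["BenchmarkID"]),
        "Easting": float(coords["E"]) if "E" in coords else None,
        "Northing": float(coords["N"]) if "N" in coords else None,
        "Description": text_in_region(fragments, REGIONS["Description"]),
    }


# Made-up sample page, for illustration only.
sample_page = [
    Fragment("BM-101", 60, 720),
    Fragment("N 5012345.67", 60, 650),
    Fragment("E 432109.88", 300, 650),
    Fragment("Brass disk set in", 60, 500),
    Fragment("concrete curb.", 60, 485),
]
row = page_to_row(sample_page)
print(row)
```

Running `page_to_row` once per page and collecting the rows gives the table. Selecting by position first keeps the regular expression simple, because it only has to match text that is already known to come from the coordinate area of the page.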

Hope this helps!


rich90599
Contributor
  • Author
  • December 13, 2019
Wow, this works amazingly well. I was working my way through it using the StringSearcher with conditional statements, but was having a hard time with the description part of the PDF. Your solution is a lot more efficient. Thank you so much, @becchr! Your help is very much appreciated.

