Question

PDF to Text - Extracting text from PDFs and creating table

5 years ago
December 11, 2019
2 replies
193 views

rich90599
Contributor
18 replies

Hello,

I was given a PDF with survey benchmark information on it. Each page in the PDF has information on one benchmark (see attached "Benchmarks_sample.pdf"). I want to get this information into a tabular format so that I have a list with the columns: BenchmarkID, Easting, Northing, Description.

All of the info for the columns is within the PDF, but I'm having a hard time extracting it into the columns.

I'm using the PDF2TextReader from the FME Hub (https://hub.safe.com/publishers/timalbertvictor/transformers/pdf2textreader) which works pretty well for extracting the data. However, I am having a hard time isolating the pertinent information to the appropriate column.

Any tips on this matter would be appreciated.

Thanks!


becchr
Influencer
106 replies
5 years ago
December 12, 2019

Hi @rich90599, I have never used the Hub transformer you mentioned, but as a coordinates lover I often use coordinates to select text based on its position on the page, with regular expressions for specific text formats (like your N/E values) to fine-tune the selection.

For the example PDF you attached, these settings in my workbench seem to work; you can tweak them yourself if other PDFs are slightly different.

PDFtoTable.fmwt

Hope this helps!
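For readers without the attached workspace, the approach described above (select text by its position on the page, then fine-tune with a regular expression for the N/E values) can be sketched in Python. This is only an illustration under stated assumptions and not the contents of PDFtoTable.fmwt: the `Fragment` class, the `REGIONS` bounding boxes, the coordinate format matched by `COORD_RE`, and the sample page are all hypothetical and would need to be adapted to the real PDF layout.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of extracted text and its position on the page (assumed units)."""
    text: str
    x: float
    y: float


# Hypothetical page regions as (x_min, y_min, x_max, y_max); tune per PDF layout.
REGIONS = {
    "BenchmarkID": (0, 700, 300, 800),
    "Coordinates": (0, 500, 600, 700),
    "Description": (0, 0, 600, 500),
}

# Assumed coordinate format, e.g. "N: 5432100.25" or "E 432100.5".
COORD_RE = re.compile(r"\b([NE])\s*[:=]?\s*(\d+(?:\.\d+)?)")


def in_region(frag, box):
    """True if the fragment's position falls inside the bounding box."""
    x_min, y_min, x_max, y_max = box
    return x_min <= frag.x < x_max and y_min <= frag.y < y_max


def region_text(fragments, box):
    """Join the fragments inside a box, reading top to bottom, then left to right."""
    picked = sorted(
        (f for f in fragments if in_region(f, box)),
        key=lambda f: (-f.y, f.x),
    )
    return " ".join(f.text.strip() for f in picked)


def page_to_row(fragments):
    """Build one table row (one benchmark) from the text fragments of one page."""
    coords = dict(COORD_RE.findall(region_text(fragments, REGIONS["Coordinates"])))
    return {
        "BenchmarkID": region_text(fragments, REGIONS["BenchmarkID"]),
        "Easting": float(coords["E"]) if "E" in coords else None,
        "Northing": float(coords["N"]) if "N" in coords else None,
        "Description": region_text(fragments, REGIONS["Description"]),
    }


# Made-up sample page, for illustration only.
sample_page = [
    Fragment("BM-101", 50, 750),
    Fragment("N: 5432100.25", 50, 650),
    Fragment("E: 432100.50", 300, 650),
    Fragment("Brass cap set in", 50, 400),
    Fragment("concrete curb", 50, 380),
]
row = page_to_row(sample_page)
print(row)
```

Selecting by region first keeps the regular expression simple, because it only has to recognize the N/E number format and not the surrounding page text. A multi-line description is handled by the reading-order sort inside its region.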

rich90599
Author
Contributor
18 replies
5 years ago
December 13, 2019

becchr wrote:

For the example PDF you attached, these settings in my workbench seem to work; you can tweak them yourself if other PDFs are slightly different.

PDFtoTable.fmwt

Hope this helps!

Wow, this works amazingly well. I was working my way through it using the StringSearcher with conditional statements, but was having a hard time with the description part of the PDF. Your solution is a lot more efficient. Thank you so much, @becchr! Your help is very much appreciated.
