Hello,

I was given a PDF with survey benchmark information in it. Each page of the PDF has information on one benchmark (see the attached "Benchmarks_sample.pdf"). I want to get this information into a tabular format so I have a list with the columns: BenchmarkID, Easting, Northing, Description.

All of the info for the columns is in the PDF, but I'm having a hard time extracting it into the columns.

I'm using the PDF2TextReader from the FME Hub (https://hub.safe.com/publishers/timalbertvictor/transformers/pdf2textreader), which works pretty well for extracting the data. However, I am having a hard time isolating the pertinent information and getting it into the appropriate column.

Any tips or help on this would be appreciated.

Thanks!

 

 

Hi @rich90599, I have never used the hub transformer you mentioned, but as a coordinates lover I often use coordinates to select text based on its position on the page, with regular expressions for specific text formats (like your N/E values) to fine-tune the selection.

For the example PDF you attached, the settings in my workbench seem to work. You can tweak them yourself if other PDFs are slightly different.

PDFtoTable.fmwt
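
For anyone reading along without FME open, here is a rough Python sketch of the same idea: pick text by where it sits on the page, then use a regex to pull out the N/E values. It is not the contents of the workbench above. The `Fragment` type, the `REGIONS` boxes, the "N:"/"E:" label format and the sample values are all made up for illustration, and it assumes your PDF text extractor gives you each piece of text with an x/y position where y grows upward.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of extracted text with its position on the page (hypothetical)."""
    text: str
    x: float
    y: float


# Hypothetical page regions as (x_min, y_min, x_max, y_max); tune per PDF layout.
REGIONS = {
    "BenchmarkID": (0, 700, 300, 800),
    "Coordinates": (0, 600, 600, 700),
    "Description": (0, 100, 600, 600),
}

# Assumed label format such as "N: 5432100.25" or "Easting = 432100.5".
NUMBER = r"(\d+(?:\.\d+)?)"
NORTHING_RE = re.compile(r"\bN(?:orthing)?\s*[:=]?\s*" + NUMBER, re.IGNORECASE)
EASTING_RE = re.compile(r"\bE(?:asting)?\s*[:=]?\s*" + NUMBER, re.IGNORECASE)


def text_in_region(fragments, box):
    """Join the text of all fragments whose position falls inside the box."""
    x0, y0, x1, y1 = box
    hits = [f for f in fragments if x0 <= f.x < x1 and y0 <= f.y < y1]
    # Reading order: top to bottom, then left to right (y grows upward).
    hits.sort(key=lambda f: (-f.y, f.x))
    return " ".join(f.text for f in hits).strip()


def page_to_row(fragments):
    """Turn the fragments of one page into one table row."""
    coords = text_in_region(fragments, REGIONS["Coordinates"])
    northing = NORTHING_RE.search(coords)
    easting = EASTING_RE.search(coords)
    return {
        "BenchmarkID": text_in_region(fragments, REGIONS["BenchmarkID"]),
        "Easting": float(easting.group(1)) if easting else None,
        "Northing": float(northing.group(1)) if northing else None,
        "Description": text_in_region(fragments, REGIONS["Description"]),
    }


# Made-up sample page, only to show the call.
sample_page = [
    Fragment("BM-101", 50, 750),
    Fragment("N: 5432100.25", 50, 650),
    Fragment("E: 432100.50", 300, 650),
    Fragment("Brass cap set in concrete curb,", 50, 500),
    Fragment("north side of bridge", 50, 480),
]
row = page_to_row(sample_page)
print(row)
```

The position boxes do the coarse selection and the regex only has to clean up the coordinate text, which is why a multi-line description is easy to grab this way: everything inside its box is joined in reading order.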

Hope this helps!


Wow, this works amazingly well. I was working my way through it using the StringSearcher with conditional statements, but was having a hard time with the description part of the PDF. Your solution is a lot more efficient. Thank you so much, @becchr! Your help is very much appreciated.

