Question

How to organize table text from PDF to write to excel?

  • January 10, 2020
  • 1 reply
  • 17 views

mcgregrr
Contributor

I have been using a workflow that was previously posted here to try to extract text representing a table from a PDF and write out a worksheet to Excel for each table. I have been able to extract and isolate the desired text, but I cannot figure out how to organize the records so that they are written to the correct Excel cells.

1 reply

debbiatsafe
Safer
  • January 14, 2020

Hi @mcgregrr

Unfortunately, extracting information from PDFs (and tables in PDFs in particular) can be quite difficult, as the information is often not grouped in a logical manner. A workflow that works for one portion of your file may not work for another, so some manual editing is likely to be required.

For the table information you are trying to extract, I would recommend organizing it by the rows and columns of each table. For rows, you could group the text by its _y position using a transformer such as the Aggregator.
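As a rough illustration of the row step outside of FME's transformers (for example in a PythonCaller or a standalone script), here is a minimal Python sketch. It assumes each text fragment is a dict with hypothetical keys `text`, `_x` and `_y`. Unlike a plain group-by on exact values, it treats fragments whose _y values fall within a small tolerance as the same row, in case text on one printed row does not share an identical _y.

```python
from typing import Dict, List

# Hypothetical fragment shape: {"text": str, "_x": float, "_y": float}.
# This mirrors the _x/_y values discussed above; it is not an FME API.
Fragment = Dict[str, object]


def group_rows(fragments: List[Fragment], tol: float = 1.0) -> List[List[Fragment]]:
    """Group text fragments into rows by their _y position.

    A fragment joins the current row if its _y is within `tol` of the _y of
    that row's first fragment. Rows are returned top to bottom (assuming
    PDF-style coordinates where a larger _y is higher on the page), and the
    fragments in each row are sorted left to right by _x.
    """
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: -float(f["_y"])):
        if rows and abs(float(rows[-1][0]["_y"]) - float(frag["_y"])) <= tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [sorted(row, key=lambda f: float(f["_x"])) for row in rows]


# Made-up sample fragments for illustration only.
fragments = [
    {"text": "Name", "_x": 50.0, "_y": 700.0},
    {"text": "Count", "_x": 200.0, "_y": 700.2},
    {"text": "Alpha", "_x": 52.0, "_y": 680.0},
    {"text": "12", "_x": 210.0, "_y": 679.8},
]
for row in group_rows(fragments):
    print([f["text"] for f in row])
# ['Name', 'Count']
# ['Alpha', '12']
```

The tolerance value is an assumption; it would need tuning to the font size and line spacing of the actual PDF.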

Columns are slightly trickier, as the data within a column will not necessarily share the same _x position. You could use the _x positions of the column headers to work out a range of _x values that contains the data for each column.
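One way to turn header positions into column ranges is to split the page halfway between adjacent headers. The sketch below does that; the midpoint rule is an assumption, and strongly right-aligned or wide columns may need hand-tuned boundaries instead.

```python
from bisect import bisect_right
from typing import List


def column_bounds(header_xs: List[float]) -> List[float]:
    """Return boundaries halfway between adjacent header _x positions.

    With headers at [50, 200, 350] the boundaries are [125.0, 275.0]:
    column 0 is _x < 125, column 1 is 125 <= _x < 275, column 2 is _x >= 275.
    """
    xs = sorted(header_xs)
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def column_index(x: float, bounds: List[float]) -> int:
    """Return the 0-based column whose _x range contains x."""
    return bisect_right(bounds, x)


bounds = column_bounds([50.0, 200.0, 350.0])
print(bounds)                                      # [125.0, 275.0]
print([column_index(x, bounds) for x in (52.0, 130.0, 210.0, 340.0)])  # [0, 1, 1, 2]
```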

With this row and column information, it may be possible to organize the data in a logical manner, which will make it easier to write out. I have attached an example workspace (pdftableextracttoexcel.fmw) which attempts to do this for one of the tables in your PDF. I hope this helps.
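Combining the two ideas, a hedged Python sketch of the "organize" step could build a grid of strings with one inner list per table row, ready to be written out. The fragment keys `text`, `_x` and `_y` are hypothetical, as are the tolerance and the midpoint column boundaries; at least one header position is required. Fragments that land in the same cell (for example wrapped text) are joined with a space.

```python
from bisect import bisect_right
from typing import Dict, List

Fragment = Dict[str, object]  # hypothetical: {"text": str, "_x": float, "_y": float}


def to_grid(fragments: List[Fragment], header_xs: List[float], tol: float = 1.0) -> List[List[str]]:
    """Arrange text fragments into a row-by-column grid of strings.

    Rows: fragments whose _y is within `tol` of the row's first _y, top to bottom.
    Columns: ranges split halfway between adjacent header _x positions.
    Cells holding several fragments are joined left to right with a space.
    `header_xs` must contain at least one position.
    """
    xs = sorted(header_xs)
    bounds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    # Row grouping by _y (larger _y assumed to be higher on the page).
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: -float(f["_y"])):
        if rows and abs(float(rows[-1][0]["_y"]) - float(frag["_y"])) <= tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])

    # Column assignment by _x range, one list of text parts per column.
    grid: List[List[str]] = []
    for row in rows:
        cells: List[List[str]] = [[] for _ in xs]
        for frag in sorted(row, key=lambda f: float(f["_x"])):
            cells[bisect_right(bounds, float(frag["_x"]))].append(str(frag["text"]))
        grid.append([" ".join(parts) for parts in cells])
    return grid


# Made-up sample fragments for illustration only.
fragments = [
    {"text": "Site", "_x": 50.0, "_y": 700.0},
    {"text": "Depth", "_x": 200.0, "_y": 700.0},
    {"text": "North", "_x": 50.0, "_y": 680.0},
    {"text": "Well", "_x": 75.0, "_y": 680.0},
    {"text": "12.5", "_x": 205.0, "_y": 680.1},
]
print(to_grid(fragments, header_xs=[50.0, 200.0]))
# [['Site', 'Depth'], ['North Well', '12.5']]
```

Within FME the writing itself would be handled by the Excel writer; if you were working in a standalone script instead, openpyxl's `Workbook` and `Worksheet.append` could write each inner list as one spreadsheet row.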

