Question

How to organize table text from PDF to write to excel?

January 10, 2020

mcgregrr
Contributor

I have been using a workflow that was previously posted here to try to extract text representing a table from a PDF and write each table out to its own worksheet in Excel. I have been able to extract and isolate the desired text, but I cannot figure out how to organize the records so that they are written to the correct Excel cells.


debbiatsafe
Safer
January 14, 2020

Hi @mcgregrr

Unfortunately, extracting information from PDFs (tables in particular) can be quite difficult, as the text is often not grouped in a logical manner. A workflow that works for one part of your file may not work for another, so some manual editing is likely to be required.

For the table information you are trying to extract, I would recommend organizing the text by the rows and columns of each table. For rows, you could group the text fragments by their _y position using a transformer such as the Aggregator.
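
Outside of FME, the row-grouping idea can be sketched in Python. This is a minimal illustration, not the attached workspace: it assumes each text fragment is a dict with hypothetical "text", "x" and "y" keys standing in for the text and its _x/_y attributes, and that y increases upward as in PDF user space. A tolerance of 0 groups on the exact y value, much like grouping on _y; a small positive tolerance also merges fragments whose baselines differ slightly.

```python
def group_rows(fragments, tolerance=0.0):
    """Group text fragments into table rows by their y position.

    fragments: dicts with hypothetical "text", "x" and "y" keys.
    Returns a list of (row_y, fragments) from top to bottom, with each
    row's fragments ordered left to right by x.
    """
    rows = []
    # Assumption: y increases upward (PDF user space), so the top row has
    # the largest y. Reverse the sort if your reader reports y downward.
    for frag in sorted(fragments, key=lambda f: -f["y"]):
        if rows and abs(rows[-1][0] - frag["y"]) <= tolerance:
            # Within tolerance of the current row's first (topmost) y.
            rows[-1][1].append(frag)
        else:
            # Too far from the current row: start a new row.
            rows.append((frag["y"], [frag]))
    return [(y, sorted(frags, key=lambda f: f["x"])) for y, frags in rows]
```

With a tolerance of, say, 1 unit, a header fragment at y=100 and another at y=100.5 end up in the same row, whereas exact grouping would split them.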

Columns are slightly trickier, as data within a column will not necessarily share the same _x position. You could use the _x positions of the column headers to work out a range of _x values that contains the data for each column.
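
One way to turn header positions into column ranges is sketched below in Python. It assumes the header fragments are dicts with hypothetical "text" and "x" keys, and it places each boundary halfway between neighbouring header _x values, leaving the first and last columns open-ended. The midpoint rule is only one choice: it tolerates values that start a little to the left of their header (for example right-aligned numbers), but it may need adjusting if header widths vary a lot.

```python
import bisect
import math


def column_ranges(headers):
    """Derive an x range for each column from the header fragments.

    headers: dicts with hypothetical "text" and "x" keys for the header row.
    Returns a list of (column_name, x_min, x_max), ordered left to right.
    """
    hs = sorted(headers, key=lambda h: h["x"])
    xs = [h["x"] for h in hs]
    # Boundaries halfway between neighbouring headers; outer edges unbounded.
    bounds = [-math.inf] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [math.inf]
    return [(h["text"], bounds[i], bounds[i + 1]) for i, h in enumerate(hs)]


def assign_column(x, ranges):
    """Return the index of the column whose [x_min, x_max) range contains x."""
    starts = [x_min for _, x_min, _ in ranges]
    return bisect.bisect_right(starts, x) - 1
```

For headers at x=10, 80 and 140, the boundaries fall at 45 and 110, so a value at x=85 is assigned to the second column.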

With this row and column information, it may be possible to organize the data in a logical manner, which will make it easier to write out. I have attached an example workspace, pdftableextracttoexcel.fmw, which attempts to do this for one of the tables in your PDF. I hope this helps.
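
Once every fragment has a row and column index, organizing the data amounts to pivoting it into a grid. The Python sketch below assumes hypothetical "row", "col", "x" and "text" keys on each fragment (for example, produced by grouping on _y and matching _x against column ranges). Fragments that land in the same cell, such as a phrase the PDF split into separate text runs, are joined left to right with a space. It writes CSV with the standard library, which Excel can open; producing a real .xlsx with one worksheet per table would need FME's Excel writer or a third-party library such as openpyxl, which is not used here.

```python
import csv
import io


def build_grid(cells, n_cols):
    """Pivot fragments with row/col indices into a list of rows.

    cells: dicts with hypothetical "row", "col", "x" and "text" keys.
    Empty cells become "". Row indices with no fragments are skipped.
    """
    grid = {}
    # Sort so fragments sharing a cell are joined in left-to-right order.
    for c in sorted(cells, key=lambda c: (c["row"], c["col"], c["x"])):
        row = grid.setdefault(c["row"], [[] for _ in range(n_cols)])
        row[c["col"]].append(c["text"])
    return [[" ".join(parts) for parts in grid[r]] for r in sorted(grid)]


def to_csv(rows):
    """Serialize the grid as CSV text (Excel dialect, CRLF line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

To save one table per file, open it with `open("table1.csv", "w", newline="")` and write the result of `to_csv`; the file name here is only an example.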
