PDF Reader: Improve reading of non-tagged tables

PDF tables are not detected in FME unless they have been tagged as tables by the software that produced the file. However, most PDF tables are not tagged correctly, so the PDF reader does not read them.

In the discussion here, @jakemolnar and @krisvewsp had some ideas for improving this reader: https://knowledge.safe.com/questions/90534/reading-pdf-table.html?childToView=90894#comment-90894

Maybe something to look at?

Would be very useful!

Thanks,

Claire

There are definitely some complex things to consider if FME wants to tackle this problem. For instance, imagine we have a table like this:

A1   B1   C1   D1   E1
A2   B2   C2   D2   E2
...  ...  ...  ...  ...
AN   BN   CN   DN   EN

It should hopefully be fairly easy to figure out the spatial relationships here, but what if the table is split across multiple pages? The extra rows might be on the next page with no header row, or the next page might repeat the header row. Or maybe a page only fits columns A, B, and C, so columns D and E end up on the next page.
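To make the "spatial relationships" part concrete, here is a minimal Python sketch of how positioned text could be grouped into a grid. It assumes some extraction step has already produced (x, y, text) tuples for each word on a single page. The function name words_to_grid and the two tolerance parameters are made up for illustration and are not part of FME.

```python
def words_to_grid(words, row_tol=3.0, col_tol=10.0):
    """Group (x, y, text) tuples into a row-major grid of strings.

    Hypothetical helper: x and y are word positions in page units,
    row_tol and col_tol are assumed clustering tolerances.
    """
    # Cluster words into rows: a word joins the current row when its y
    # is within row_tol of the first y seen in that row.
    rows = []  # list of (y_reference, [(x, text), ...])
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(y - rows[-1][0]) <= row_tol:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))

    # Cluster all x positions into column anchors.
    anchors = []
    for x in sorted(x for _, cells in rows for x, _ in cells):
        if not anchors or x - anchors[-1] > col_tol:
            anchors.append(x)

    # Place each word in the column whose anchor is nearest.
    grid = []
    for _, cells in rows:
        line = [""] * len(anchors)
        for x, text in sorted(cells):
            idx = min(range(len(anchors)), key=lambda i: abs(anchors[i] - x))
            line[idx] = (line[idx] + " " + text).strip()
        grid.append(line)
    return grid


words = [
    (10, 100, "A1"), (60, 100.5, "B1"), (110, 100, "C1"),
    (10, 120, "A2"), (110, 121, "C2"),  # B2 is missing on purpose
]
for row in words_to_grid(words):
    print(row)
```

This only handles a single, simple table on one page. Merged cells, wrapped text inside a cell, and tables that continue on another page would all need extra rules.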

FME would probably need something similar to the Excel reader settings box, with a variety of parameters for answering questions such as:

Is there a header row?
- Does it span multiple pages?
- Is it repeated on each page?
Is there more than one table?
- What is the bounding box of each table?
- Is it the same for each page?
What is the page range that contains the table(s)?
What should FME do if the table cells contain vector linework and/or images instead of text (the linework may even look like text)?
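As a rough illustration of how a few of these questions could turn into settings, here is a Python sketch. Everything in it is hypothetical: TableReadSettings, its field names, and stitch_pages are invented for this example and do not correspond to actual FME reader parameters. It assumes each page has already been turned into a list of rows.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = List[str]


@dataclass
class TableReadSettings:
    # Hypothetical settings, loosely mirroring the questions above.
    has_header: bool = True
    header_repeated_on_each_page: bool = False
    # 1-based inclusive page range; None means all pages.
    page_range: Optional[Tuple[int, int]] = None
    # (x_min, y_min, x_max, y_max); carried for an extraction step,
    # not used by stitch_pages below.
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    same_bounding_box_on_each_page: bool = True


def stitch_pages(
    pages: List[List[Row]], settings: TableReadSettings
) -> Tuple[Optional[Row], List[Row]]:
    """Merge per-page rows into one header (or None) and one list of data rows."""
    if settings.page_range is not None:
        first, last = settings.page_range
        pages = pages[first - 1:last]

    header: Optional[Row] = None
    data: List[Row] = []
    for rows in pages:
        if not rows:
            continue
        if settings.has_header and header is None:
            # The first row of the first non-empty page is the header.
            header, rows = rows[0], rows[1:]
        elif settings.has_header and settings.header_repeated_on_each_page:
            # Drop the repeated header row on later pages.
            rows = rows[1:]
        data.extend(rows)
    return header, data


pages = [
    [["A", "B"], ["1", "2"]],
    [["A", "B"], ["3", "4"]],
]
settings = TableReadSettings(header_repeated_on_each_page=True)
print(stitch_pages(pages, settings))
```

The sketch only covers rows that continue onto later pages. The case where columns D and E spill onto the next page, or where a page holds more than one table, would need further settings.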

It would probably be helpful for FME users to think about what kind of settings they'd like in order to read the tables they work with in their own dataflows. Hopefully they comment here and make some suggestions!

Any development on this topic? Reading untagged PDF tables using Excel is relatively easy. Could something similar be developed to read large volumes of untagged PDF tables using FME?
