
Hello everyone,

I've created a workflow that analyzes text within a PDF file using the Adobe Geospatial PDF reader. To limit the amount of data to what I'm interested in, I've tried to use the "Page Ranges" field to specify which pages to read. The problem is that the PDF indexes the first page in each file as 0, the second as 1, and so forth, and the Page Ranges field rejects 0 as an invalid value. If I enter page range 1-2, the reader reads pages 2-3 in the data. Is there some way to use this field with my dataset?

It seems like the reader reindexes the pages, which makes it a headache but manageable with a little bit of time. If someone can confirm this, that's all I need.

Cheers

Joel


Hi @edmjoe

Currently, FME considers the first page of a PDF document as 1 and not 0.

For example, when viewing your datasets in Data Inspector, you will find that the format attribute pdf_page_number = 1 for features from the first page.


If I'm not misremembering, the PDF reader reads the page metadata from index 0 onwards. I'll double-check later when I'm not AFK.


A little late, but I've now looked this up. The PDF reader indexes the first page as 0 when you read it as "Non-Spatial" without using "Page Ranges". When you use the "Page Ranges" function, it reads (converts?) the first page as index 1. I don't know if it's a feature or a bug, but that's what's happening.
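
In case it helps anyone with the same mismatch, here is a minimal sketch of how the 0-based page indices seen in a non-spatial read could be shifted into a 1-based string for the "Page Ranges" field. The function name `to_page_ranges` is hypothetical and is not part of FME. The comma-separated output for non-adjacent pages is an assumption on my part; only the "1-2" form appears in this thread, so check the reader documentation before relying on it.

```python
def to_page_ranges(zero_based_pages):
    """Turn 0-based page indices into a 1-based range string.

    Hypothetical helper, not an FME function. Joining separate
    ranges with commas is an assumption about the field's syntax.
    """
    if any(p < 0 for p in zero_based_pages):
        raise ValueError("page indices must be 0 or greater")

    # Shift every index by one, drop duplicates, and sort.
    pages = sorted({p + 1 for p in zero_based_pages})
    if not pages:
        return ""

    # Collapse consecutive page numbers into (start, end) pairs.
    ranges = []
    start = prev = pages[0]
    for page in pages[1:]:
        if page != prev + 1:
            ranges.append((start, prev))
            start = page
        prev = page
    ranges.append((start, prev))

    return ",".join(
        str(a) if a == b else f"{a}-{b}" for a, b in ranges
    )
```

For example, `to_page_ranges([0, 1])` gives `"1-2"`, which would correspond to the first two pages of the file under the behavior described above.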

