How to extract one PDF page and write it to another PDF file


Hi,

I need to extract each page from a PDF file and write each one to its own separate PDF file. The issue I have is that the content is not carried over to the newly generated PDF file.

Second, I want to combine some specific PDF pages into one PDF file.

Thanks


Hi @madi,

I'm attaching a workspace with 2 examples: 1) extracting a single page of a PDF to a new single-page PDF, and 2) inserting a page from PDF A (Desktop Authoring PDF) into page 2 of PDF B (FME Server Authoring).

The idea is the same for both: read in the PDF and reset the pdf_page_number attribute. In the first example (top bookmark), I'm taking page 8 from the existing PDF and creating a new PDF containing only that page. To do that, I need to set the pdf_page_number attribute value to 1. You'll be able to accomplish your first challenge using a Fanout and this method.
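To make the attribute logic concrete, here is a minimal plain-Python sketch of the renumbering. It is not FME code and not how the attached workspace is built: each feature is modelled as a dict carrying a `pdf_page_number` key, and the output name `page_<n>.pdf` used as the fanout key is a hypothetical choice. Assuming one page may be read as several features (for example text and raster), the sketch groups features by their original page before resetting every group to page 1.

```python
def fan_out_single_pages(features: list[dict]) -> dict[str, list[dict]]:
    """Group features by original page and renumber each group to page 1.

    Features are plain dicts with a "pdf_page_number" key, a hypothetical
    stand-in for FME features (this is not FME's API).
    """
    outputs: dict[str, list[dict]] = {}
    for feature in features:
        original_page = feature["pdf_page_number"]
        # Hypothetical fanout key: one output file per original page number.
        out_name = f"page_{original_page}.pdf"
        # Shallow copy so the input features are left unchanged.
        renumbered = dict(feature)
        # Every extracted page becomes page 1 of its own single-page PDF.
        renumbered["pdf_page_number"] = 1
        outputs.setdefault(out_name, []).append(renumbered)
    return outputs


if __name__ == "__main__":
    pages = [
        {"pdf_page_number": 1, "text": "Title"},
        {"pdf_page_number": 8, "text": "Table A"},
        {"pdf_page_number": 8, "text": "Table B"},
    ]
    for name, feats in fan_out_single_pages(pages).items():
        print(name, [f["pdf_page_number"] for f in feats])
    # prints:
    # page_1.pdf [1]
    # page_8.pdf [1, 1]
```

Grouping by the original page number is what lets every feature of page 8 land in the same single-page output, which is the role the Fanout plays in the workspace.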

In the second example (bottom bookmark), I'm taking that same page 8 and inserting it between pages 1 and 2 of another PDF. To do this, I need to set pdf_page_number to 2 for the page I wish to insert, and add 1 to the pdf_page_number value of the existing page 2 and every page after it.
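The insert arithmetic can be sketched the same way. Again this is only an illustration under assumptions, not FME code: features are dicts with a `pdf_page_number` key, and `insert_page` is a hypothetical helper. Pages of the target at or after the insert position are shifted up by one, and every feature of the inserted page is set to that position.

```python
def insert_page(target: list[dict], inserted: list[dict], position: int) -> list[dict]:
    """Insert the features of one page into `target` at page `position`.

    Features are plain dicts with a "pdf_page_number" key (hypothetical
    stand-in for FME features). Inputs are not modified.
    """
    if position < 1:
        raise ValueError("pdf_page_number starts at 1")
    result = []
    for feature in target:
        shifted = dict(feature)
        # Existing page `position` and everything after it move up by one.
        if shifted["pdf_page_number"] >= position:
            shifted["pdf_page_number"] += 1
        result.append(shifted)
    for feature in inserted:
        placed = dict(feature)
        # The inserted page takes over the freed position.
        placed["pdf_page_number"] = position
        result.append(placed)
    # Sorting is only for readability; the stable sort keeps same-page order.
    result.sort(key=lambda f: f["pdf_page_number"])
    return result


if __name__ == "__main__":
    doc_b = [{"pdf_page_number": n, "name": f"B{n}"} for n in (1, 2, 3)]
    page_a8 = [{"pdf_page_number": 8, "name": "A8"}]
    for f in insert_page(doc_b, page_a8, 2):
        print(f["pdf_page_number"], f["name"])
    # prints: 1 B1 / 2 A8 / 3 B2 / 4 B3 (one pair per line)
```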

Thanks a lot @chrisatsafe for your support.😉

I'm going to test on my side right away.

@chrisatsafe It seems that you are using a newer FME Desktop version. Would you mind saving the InsertPDFPage.fmw workspace so that it opens in FME 2021.0.3.0?

Thanks!

Here you go

Hi @chrisatsafe,

I have an issue: the contents (text) of my PDF tables are not transferred to the destination PDF file. As an example, see my attached original PDF.

Is there something I'm missing?

Thanks,

Madi

This is the final result I have at the end of my transformation:

Hmm, hard to say what might be happening here. Do you have a sample PDF with a similar table that can be shared here? Not all PDFs are created the same, so the reader parameters might need to be tweaked a bit to accommodate the differences.

If you can't share a sample, try adjusting the reader parameters to see if that improves the result.

See the original file, in case it helps.

Hi @madi,

Have a look at the following workspace. It seems you're working with a combination of rasterized pages and text pages. This should do the trick: it shows how to extract each page to its own PDF. If you want to insert a page at position x instead, use the logic from the workspace above. ;P

Also note, I added the PDFPageFormatter, as I just noticed your PDF uses the Legal paper size (8.5 x 14 in).

Sample output from text page:

[image]

Sample output from rasterized page:

[image]

Reader parameters used:

[image]

Hi @chrisatsafe,

I completed the first task, extracting the pages of my PDF files, based on the workspace above.

The second task was then easy: I combined all the rasterized pages into one PDF.
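Combining selected pages into one PDF (the second question in this thread) comes down to the same renumbering idea. The sketch below is an illustration under assumptions, not FME code: features are dicts with a `pdf_page_number` key, `combine_pages` is a hypothetical helper, and `wanted` is a hypothetical ordered list of the original page numbers to keep, however those pages were chosen. The kept pages are renumbered 1..n in the order listed.

```python
def combine_pages(features: list[dict], wanted: list[int]) -> list[dict]:
    """Keep only the pages listed in `wanted`, renumbered 1..n in that order.

    Features are plain dicts with a "pdf_page_number" key (hypothetical
    stand-in for FME features). Pages not in `wanted` are dropped.
    """
    if len(set(wanted)) != len(wanted):
        raise ValueError("each page may be listed only once")
    # Map each original page number to its position in the combined PDF.
    new_number = {old: i for i, old in enumerate(wanted, start=1)}
    combined = []
    for feature in features:
        old = feature["pdf_page_number"]
        if old in new_number:
            renumbered = dict(feature)
            renumbered["pdf_page_number"] = new_number[old]
            combined.append(renumbered)
    # Sorting is only for readability; the stable sort keeps same-page order.
    combined.sort(key=lambda f: f["pdf_page_number"])
    return combined


if __name__ == "__main__":
    source = [{"pdf_page_number": n, "name": f"p{n}"} for n in range(1, 6)]
    for f in combine_pages(source, [5, 2, 4]):
        print(f["pdf_page_number"], f["name"])
    # prints: 1 p5 / 2 p2 / 3 p4 (one pair per line)
```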

Thanks a lot,

Madi.