Solved

PDF writer: Document properties


fdw
Contributor
  • Contributor

Hi,

 

We update a number of PDFs daily with FME for publication on our website, but these files need to have their Document Properties set; without them, our webcare team won't allow the PDFs to be published. So far I have not been able to achieve this.

There seems to be an option with the format attribute 'pdf_document_info_metadata' but how do I format the properties in my writer?

When reading a PDF that has these properties filled in, pdf_document_info_metadata is a feature type. I have tried adding this feature type to my PDF writer with the exact same schema, but with no result: the properties are still empty.

Does anyone know if writing the properties is supported and how to achieve it?


8 replies

debbiatsafe
Safer

Hi @freekdw

I've asked our development team and unfortunately, FME does not natively support writing PDF document properties/metadata. While the PDF reader is able to read document properties, the PDF writer uses a different library that does not support document properties as it was implemented long before the reader.

If you wish to use FME to update PDFs but still carry through the metadata from PDF input files, you will need to post-process the PDF output to add metadata.

I have not tested this, but a quick search suggests it may be possible using PDF utility applications (e.g. this page here) or Python libraries (e.g. as mentioned here). You would want to combine these with either the SystemCaller or the PythonCaller, depending on which method you use.


fdw
Contributor
  • Author
  • Contributor
  • April 29, 2020

Hi @debbiatsafe,

Thanks for asking. Too bad it is not natively supported but good to hear there are options. I'm not that familiar with this kind of coding but I'll have a look into it.


warrendev
Enthusiast
  • Enthusiast
  • Best Answer
  • April 29, 2020

Hi @freekdw,

 

write_pdf_metadata.fmwt

 

I've had success using the Python library pdfrw for updating the metadata. I've attached an example workspace showing how it can take the metadata read in from one PDF and generate a new PDF from it. You may be able to adapt it to your specific workflow.


 

 


fdw
Contributor
  • Author
  • Contributor
  • April 30, 2020

Hey @warrengis,

 

Wonderful! Thank you so much for sharing this elegant code! This will help me get the files published.


fdw
Contributor
  • Author
  • Contributor
  • April 30, 2020

I am encountering some 'NoneType' objects in my files, which cause the Python code to fail. It works fine on some random PDFs I grabbed from the web.

 

2020-04-30 15:53:41| 1.6| 0.5|ERROR |Python Exception <AttributeError>: 'NoneType' object has no attribute 'update'

2020-04-30 15:53:41| 1.6| 0.0|ERROR |Error encountered while calling function `add_metadata'

2020-04-30 15:53:41| 1.6| 0.0|FATAL |PythonCaller (PythonFactory): PythonFactory failed to process feature

2020-04-30 15:53:41| 1.6| 0.0|ERROR |A fatal error has occurred. Check the logfile above for details

 

Any ideas?

4_JorisIvensplein.pdf


warrendev
Enthusiast
  • Enthusiast
  • April 30, 2020

It looks like the structure of the PDF is causing the error. I don't know exactly how to fix that with pdfrw. After I opened the file in my PDF program and saved it back out, it worked.
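
One possible explanation, offered as an assumption and not as a diagnosis of the attached file: pdfrw returns None for a key that is missing, so a PDF without an Info dictionary would make a call like `trailer.Info.update(...)` fail with this AttributeError. A minimal sketch of a guard follows. `ensure_info` and `write_info` are hypothetical helper names, and the pdfrw part has not been run against the file from this thread.

```python
def ensure_info(trailer, make_dict):
    """Return trailer.Info, creating an empty dictionary first if it is missing."""
    info = trailer.Info
    if info is None:
        # Assumption: the failing PDF simply has no Info dictionary.
        info = make_dict()
        trailer.Info = info
    return info


def write_info(input_path, output_path, fields):
    """Copy a PDF and set document info entries such as Title or Author."""
    # Imported here so that ensure_info can be tried without pdfrw installed.
    from pdfrw import IndirectPdfDict, PdfReader, PdfString, PdfWriter

    trailer = PdfReader(input_path)
    info = ensure_info(trailer, IndirectPdfDict)
    for key, value in fields.items():
        # Attribute access maps 'Title' to the PDF name /Title.
        setattr(info, key, PdfString.encode(value))
    PdfWriter(output_path, trailer=trailer).write()
```

A call would look like `write_info('in.pdf', 'out.pdf', {'Title': 'My title'})`, with both file names being placeholders.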

 

 

I did get it to work with another Python library, pikepdf. Here is the documentation: https://pikepdf.readthedocs.io/en/latest/tutorial.html

 

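from pikepdf import Pdf
pdf = Pdf.open(r'37028-4-jorisivensplein.pdf')
# Edit the XMP metadata; the change is applied when the with-block exits
with pdf.open_metadata() as meta:
    meta['dc:title'] = "Let's change the title"
# Write the result to a new file and leave the input untouched
pdf.save('output.pdf')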

output 4-jorisivensplein.pdf


fdw
Contributor
  • Author
  • Contributor
  • April 30, 2020

Hi @warrengis,

 

I made it work as well. Thanks for your guidance. After installing the pikepdf package, the PythonCaller contains the following code:

import fme
import fmeobjects
from pikepdf import Pdf

def add_metadata(feature):
    # Input and output paths come from the workspace parameters
    pdf = Pdf.open(FME_MacroValues['input_pdf'])
    # Metadata edits are applied when the with-block exits
    with pdf.open_metadata() as meta:
        meta['dc:title'] = feature.getAttribute('title')
        meta['dc:subject'] = feature.getAttribute('subject')
        meta['dc:creator'] = feature.getAttribute('creator')
        meta['dc:description'] = feature.getAttribute('description')
        # Both dates are filled from the same creation_date attribute
        meta['xmp:CreateDate'] = feature.getAttribute('creation_date')
        meta['xmp:ModifyDate'] = feature.getAttribute('creation_date')
        meta['xmp:CreatorTool'] = feature.getAttribute('producer')
    pdf.save(FME_MacroValues['output_pdf'])

creation_date is formatted like '%Y-%m-%dT%H:%M:%S%Ez'. The workspace reports errors on the xmp: metadata but still writes it to the output PDF. I also find the naming of dc:subject and dc:description not very intuitive. Ah well... We made it.
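
For reference, a timestamp in that layout can also be produced in plain Python. This is a minimal sketch, assuming %Ez renders the UTC offset as ±hh:mm; `xmp_timestamp` is a hypothetical helper name, and the sample date and +02:00 offset are only illustrative.

```python
from datetime import datetime, timedelta, timezone


def xmp_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS+hh:mm."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # timespec="seconds" drops any microseconds from the output.
    return moment.isoformat(timespec="seconds")


sample = datetime(2020, 4, 30, 15, 53, 41, tzinfo=timezone(timedelta(hours=2)))
print(xmp_timestamp(sample))  # 2020-04-30T15:53:41+02:00
```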

 

Only one feature from the PDF has to enter the PythonCaller, so a Sampler or maxFeatures can be set to speed up the process.
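
A variation on the code above, as a hedged sketch and not a tested replacement: it skips attributes that are missing (getAttribute returns None for those) and passes dc:creator as a list, since XMP defines that field as an ordered list of names. `build_xmp_fields` and `apply_fields` are hypothetical helper names. As far as I understand the pikepdf documentation, leaving the `open_metadata()` block also copies values into the classic document info dictionary, with dc:description ending up in Subject and dc:creator in Author, which may explain why the naming feels unintuitive.

```python
def build_xmp_fields(attrs):
    """Map attribute values to XMP keys, dropping attributes that are missing."""
    key_for = {
        "title": "dc:title",
        "description": "dc:description",
        "creator": "dc:creator",
    }
    fields = {}
    for name, key in key_for.items():
        value = attrs.get(name)
        if value is None or value == "":
            continue  # nothing to write for this attribute
        # XMP defines dc:creator as an ordered list of names.
        fields[key] = [value] if key == "dc:creator" else value
    return fields


def apply_fields(input_path, output_path, fields):
    """Write the given XMP fields to a copy of the input PDF."""
    # Imported here so build_xmp_fields can be tried without pikepdf installed.
    from pikepdf import Pdf

    with Pdf.open(input_path) as pdf:
        with pdf.open_metadata() as meta:
            for key, value in fields.items():
                meta[key] = value
        pdf.save(output_path)
```

Inside the PythonCaller, the `attrs` dictionary could be filled with `feature.getAttribute(...)` for each of the three names before calling `apply_fields`.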


warrendev
Enthusiast
  • Enthusiast
  • April 30, 2020

Glad you got it working!

