Hi,

 

We update a number of PDFs daily with FME for publication on our website, but these files need to have their Document Properties set; otherwise our webcare team won't allow the PDFs to be published. So far I haven't been able to achieve this.

There seems to be an option using the format attribute 'pdf_document_info_metadata', but how do I format the properties in my writer?

When reading a PDF that has these properties filled in, pdf_document_info_metadata is a feature type. I have tried adding this feature type to my PDF writer with the exact same schema, but without result: the properties are still empty.

Does anyone know if writing the properties is supported and how to achieve it?

Hi @freekdw

I've asked our development team and unfortunately, FME does not natively support writing PDF document properties/metadata. While the PDF reader is able to read document properties, the PDF writer uses a different library that does not support document properties as it was implemented long before the reader.

If you wish to use FME to update PDFs but still carry through the metadata from PDF input files, you will need to post-process the PDF output to add metadata.

I have not tested this, but a quick search suggests it may be possible using PDF utility applications (e.g. this page here) or Python libraries (e.g. as mentioned here). You would combine these with either the SystemCaller or the PythonCaller, depending on which method you use.
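For the Python route, a minimal sketch of a standalone script that a SystemCaller could run might look like the following. It is untested against FME output and assumes the third-party pikepdf library (which comes up later in this thread) is installed. The function names, the argument names and the choice of properties are illustrative only, not anything FME provides.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; the argument names are illustrative."""
    parser = argparse.ArgumentParser(description="Set PDF document properties")
    parser.add_argument("input_pdf")
    parser.add_argument("output_pdf")
    parser.add_argument("--title")
    parser.add_argument("--author")
    return parser.parse_args(argv)


def set_properties(args):
    """Write the given properties into a copy of the input PDF."""
    # Imported here so argument parsing works even without pikepdf installed.
    import pikepdf

    with pikepdf.open(args.input_pdf) as pdf:
        with pdf.open_metadata() as meta:
            if args.title:
                meta["dc:title"] = args.title
            if args.author:
                # XMP stores authors as a list of names.
                meta["dc:creator"] = [args.author]
        pdf.save(args.output_pdf)


def main(argv=None):
    set_properties(parse_args(argv))
```

Saved as a script that ends with a call to main(), this could be invoked from a SystemCaller with something like: python set_pdf_properties.py in.pdf out.pdf --title "My title" (the script name is hypothetical).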


Hi @debbiatsafe,

Thanks for asking. Too bad it is not natively supported but good to hear there are options. I'm not that familiar with this kind of coding but I'll have a look into it.


Hi @freekdw,

 

write_pdf_metadata.fmwt

 

I've had success using the Python library pdfrw to update the metadata. I've attached an example workspace showing how it can take the metadata read in from one PDF and generate a new PDF from it. Maybe you can adapt it to your specific workflow.

[Screenshots: input PDF, output PDF, workspace]

 

 


Hey @warrengis,

 

Wonderful! Thank you so much for sharing this elegant code! This will help me get the files published.


I am encountering some 'NoneType' objects in my files, which cause the Python code to fail. It works fine on some random PDFs I grabbed from the web.

 

2020-04-30 15:53:41| 1.6| 0.5|ERROR |Python Exception <AttributeError>: 'NoneType' object has no attribute 'update'

2020-04-30 15:53:41| 1.6| 0.0|ERROR |Error encountered while calling function `add_metadata'

2020-04-30 15:53:41| 1.6| 0.0|FATAL |PythonCaller (PythonFactory): PythonFactory failed to process feature

2020-04-30 15:53:41| 1.6| 0.0|ERROR |A fatal error has occurred. Check the logfile above for details

 

Any ideas?

4_JorisIvensplein.pdf


It looks like the structure of the PDF is what is causing the error. I don't know exactly how to fix that with pdfrw. After I opened the file in my PDF program and saved it back out, it worked.
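One possible guard on the pdfrw side, as an untested sketch: the error message would be consistent with the PDF having no Info dictionary at all, in which case pdfrw returns None when you ask for it. That cause is an assumption (the code inside the attached workspace is not shown in this thread), and the function names below are made up for illustration.

```python
def drop_missing(values):
    """Skip None values, which is what FME's getAttribute returns
    for attributes that are absent from the feature."""
    return {key: value for key, value in values.items() if value is not None}


def add_metadata_pdfrw(src, dst, values):
    """Copy src to dst, setting Info entries such as 'Title' or 'Author'."""
    # Imported here so drop_missing stays usable without pdfrw installed.
    from pdfrw import IndirectPdfDict, PdfReader, PdfString, PdfWriter

    trailer = PdfReader(src)
    if trailer.Info is None:
        # Assumed cause of the AttributeError: no Info dictionary in the file.
        trailer.Info = IndirectPdfDict()
    for key, value in drop_missing(values).items():
        # Attribute assignment on a PdfDict sets the matching /Key entry.
        setattr(trailer.Info, key, PdfString.encode(value))
    PdfWriter(dst, trailer=trailer).write()
```

If the file's structure is broken in some other way, this guard will not help, and re-saving the PDF or switching libraries remains the safer option.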

 

 

I did get it to work with another Python library, pikepdf. Here is the documentation: https://pikepdf.readthedocs.io/en/latest/tutorial.html

 

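from pikepdf import Pdf

# Open the PDF that failed with pdfrw
pdf = Pdf.open(r'37028-4-jorisivensplein.pdf')
# Edit the XMP metadata; changes are applied when the block exits
with pdf.open_metadata() as meta:
    meta['dc:title'] = "Let's change the title"
pdf.save('output.pdf')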

output 4-jorisivensplein.pdf


Hi @warrengis,

 

I made it work as well. Thanks for your guidance. After installing the pikepdf package, the PythonCaller contains the following code:

import fme
import fmeobjects
from pikepdf import Pdf

def add_metadata(feature):
    # Open the PDF named by the 'input_pdf' workspace parameter
    pdf = Pdf.open(FME_MacroValues['input_pdf'])
    # Copy the feature's attributes into the XMP metadata
    with pdf.open_metadata() as meta:
        meta['dc:title'] = feature.getAttribute('title')
        meta['dc:subject'] = feature.getAttribute('subject')
        meta['dc:creator'] = feature.getAttribute('creator')
        meta['dc:description'] = feature.getAttribute('description')
        meta['xmp:CreateDate'] = feature.getAttribute('creation_date')
        meta['xmp:ModifyDate'] = feature.getAttribute('creation_date')
        meta['xmp:CreatorTool'] = feature.getAttribute('producer')
    # Save to the path given by the 'output_pdf' workspace parameter
    pdf.save(FME_MacroValues['output_pdf'])

creation_date is formatted like '%Y-%m-%dT%H:%M:%S%Ez'. The workspace logs errors on the xmp: metadata but still writes it to the output PDF. I also find the naming of dc:subject and dc:description not very intuitive. Ah well... We made it.
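On the naming: in XMP, dc:description is the field that corresponds to "Subject" in the Document Properties dialog, while dc:subject holds keywords. A small helper can hide that mapping. The sketch below is plain Python with hypothetical names (FIELD_TO_XMP, to_xmp). The key mapping follows pikepdf's docinfo-to-XMP mapping as I understand it, so it is worth checking against the pikepdf documentation. It also wraps the author in a list and formats datetimes as ISO 8601 with a UTC offset, which should match the creation_date format mentioned above.

```python
from datetime import datetime, timedelta, timezone

# Document Properties field -> XMP key (assumed mapping, see the note above).
FIELD_TO_XMP = {
    "title": "dc:title",
    "author": "dc:creator",        # list-valued in XMP
    "subject": "dc:description",   # the "Subject" field in the dialog
    "keywords": "pdf:Keywords",
    "creator_tool": "xmp:CreatorTool",
    "created": "xmp:CreateDate",
    "modified": "xmp:ModifyDate",
}


def to_xmp(fields):
    """Translate friendly field names to XMP keys, skipping None values."""
    xmp = {}
    for name, value in fields.items():
        if value is None:
            continue
        key = FIELD_TO_XMP[name]
        if key == "dc:creator" and isinstance(value, str):
            value = [value]
        elif isinstance(value, datetime):
            value = value.isoformat(timespec="seconds")
        xmp[key] = value
    return xmp


# Example values only; the timestamp is made up for illustration.
stamp = datetime(2020, 4, 30, 15, 53, 41, tzinfo=timezone(timedelta(hours=2)))
example = to_xmp({"title": "T", "author": "A", "subject": None, "created": stamp})
```

Inside the PythonCaller, the result could then be applied within the open_metadata() block by looping over to_xmp(fields).items() and assigning meta[key] = value for each pair.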

 

Only one feature from the PDF has to enter the PythonCaller, so a Sampler or maxFeatures can be set to speed up the process.


Glad you got it working!

