Hi,

We update a number of PDFs daily with FME for publication on our website, but these files need to have their Document Properties set; otherwise our webcare team won't allow the PDFs to be published. So far I have not been able to achieve this.

There seems to be an option with the format attribute 'pdf_document_info_metadata', but how do I format the properties in my writer? When reading a PDF which has these properties filled in, pdf_document_info_metadata is a feature type. I have tried adding this feature type to my PDF writer with the exact same schema, but with no result: the properties are still empty.

Does anyone know if writing the properties is supported and how to achieve it?

PDF writer: Document properties

Hi @freekdw

I've asked our development team and unfortunately, FME does not natively support writing PDF document properties/metadata. While the PDF reader is able to read document properties, the PDF writer uses a different library that does not support document properties as it was implemented long before the reader.

If you wish to use FME to update PDFs but still carry through the metadata from PDF input files, you will need to post-process the PDF output to add metadata.

I have not tested this, but a quick search suggests it may be possible using PDF utility applications or Python libraries. You would combine these with either the SystemCaller or the PythonCaller, depending on which method you use.

fdw
Author
13 replies
3 years ago
29 April 2020

Hi @debbiatsafe,

Thanks for asking. Too bad it is not natively supported but good to hear there are options. I'm not that familiar with this kind of coding but I'll have a look into it.

warrendev
Enthusiast
123 replies
3 years ago
30 April 2020
Best Answer

Hi @freekdw,

write_pdf_metadata.fmwt

I've had success using the python library pdfrw for updating the metadata. I've attached an example workspace showing how it can take the metadata read in from one PDF and generate a new PDF from it. Maybe you can work with this to adapt it to your specific workflow.

fdw
Author
13 replies
3 years ago
30 April 2020

Hey @warrengis,

Wonderful! Thank you so much for sharing this elegant code! This will help me get the files published.

fdw
Author
13 replies
3 years ago
30 April 2020

I am encountering some 'NoneType' objects with my own files, which makes the Python code fail. It works fine on some random PDFs I grabbed from the web.

2020-04-30 15:53:41| 1.6| 0.5|ERROR |Python Exception <AttributeError>: 'NoneType' object has no attribute 'update'

2020-04-30 15:53:41| 1.6| 0.0|ERROR |Error encountered while calling function `add_metadata'

2020-04-30 15:53:41| 1.6| 0.0|FATAL |PythonCaller (PythonFactory): PythonFactory failed to process feature

2020-04-30 15:53:41| 1.6| 0.0|ERROR |A fatal error has occurred. Check the logfile above for details

Any ideas?

4_JorisIvensplein.pdf

warrendev
Enthusiast
123 replies
3 years ago
30 April 2020

It looks like the structure of the PDF is what is causing the error. I don't know exactly how to fix that with pdfrw. After I opened the file in my PDF program and saved it back out, it worked.
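
One possible explanation, not confirmed in this thread: pdfrw returns None for `trailer.Info` when a PDF has no Info dictionary, so a call such as `trailer.Info.update(...)` would fail with exactly this AttributeError. Below is a minimal sketch of a guard, assuming the workspace edits `trailer.Info`; the helper names `ensure_info` and `add_metadata_pdfrw` are made up for illustration.

```python
def ensure_info(trailer, new_dict):
    """Return trailer.Info, creating it with new_dict() if the PDF has none."""
    if trailer.Info is None:
        trailer.Info = new_dict()
    return trailer.Info


def add_metadata_pdfrw(input_path, output_path, fields):
    """Write fields such as {'Title': 'My title'} into the PDF Info dictionary."""
    # Imported here so the helper above can be tried without pdfrw installed.
    from pdfrw import IndirectPdfDict, PdfReader, PdfWriter

    trailer = PdfReader(input_path)
    info = ensure_info(trailer, IndirectPdfDict)
    for name, value in fields.items():
        # pdfrw maps attribute names to PDF names, e.g. Title -> /Title
        setattr(info, name, value)
    PdfWriter(output_path, trailer=trailer).write()
```

If the file is damaged in some other way, this guard will not help, and re-saving the PDF or switching libraries remains the fallback.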

I did get it to work with another python library pikepdf. Here is the documentation: https://pikepdf.readthedocs.io/en/latest/tutorial.html

from pikepdf import Pdf
pdf = Pdf.open(r'37028-4-jorisivensplein.pdf')
with pdf.open_metadata() as meta:
    meta['dc:title'] = "Let's change the title"
pdf.save('output.pdf')

output 4-jorisivensplein.pdf

fdw
Author
13 replies
3 years ago
30 April 2020

Hi @warrengis,

I made it work as well. Thanks for your guidance. After installing the pikepdf package, the PythonCaller contains the following code:

import fme
import fmeobjects
from pikepdf import Pdf

def add_metadata(feature):
    # Input and output paths come from the workspace parameters
    pdf = Pdf.open(FME_MacroValues['input_pdf'])
    # Copy the feature's attributes into the XMP metadata
    with pdf.open_metadata() as meta:
        meta['dc:title'] = feature.getAttribute('title')
        meta['dc:subject'] = feature.getAttribute('subject')
        meta['dc:creator'] = feature.getAttribute('creator')
        meta['dc:description'] = feature.getAttribute('description')
        meta['xmp:CreateDate'] = feature.getAttribute('creation_date')
        meta['xmp:ModifyDate'] = feature.getAttribute('creation_date')
        meta['xmp:CreatorTool'] = feature.getAttribute('producer')
    # Save a copy of the PDF with the updated metadata
    pdf.save(FME_MacroValues['output_pdf'])

creation_date is formatted as '%Y-%m-%dT%H:%M:%S%Ez'. The workspace reports an error on the xmp: metadata but still writes it to the output PDF. I also find the naming of dc:subject and dc:description not very intuitive. Ah well... we made it.
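
For anyone building the date in Python instead of with an FME date formatter, here is a small sketch that produces the same shape of string (ISO 8601 with a UTC offset such as +02:00). The function name `xmp_timestamp` is just an illustration, and whether this has anything to do with the xmp: error above is not something I have verified.

```python
from datetime import datetime, timedelta, timezone


def xmp_timestamp(moment):
    """Format a timezone-aware datetime like 2020-04-30T15:53:41+02:00."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # timespec="seconds" drops microseconds; the offset is written as +hh:mm
    return moment.isoformat(timespec="seconds")


example = datetime(2020, 4, 30, 15, 53, 41, tzinfo=timezone(timedelta(hours=2)))
print(xmp_timestamp(example))  # 2020-04-30T15:53:41+02:00
```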

Only one feature from the PDF needs to enter the PythonCaller, so a Sampler or maxFeatures can be set to speed up the process.
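
A possible refinement of the PythonCaller code above, as a sketch only: `feature.getAttribute` returns None when an attribute is missing, so it may be safer to collect the values first and skip the empty ones before assigning them inside `with pdf.open_metadata() as meta:`. The names `XMP_FIELDS`, `collect_metadata` and `FakeFeature` are hypothetical; `FakeFeature` only stands in for an FME feature so the snippet can be tried outside FME.

```python
# XMP key -> feature attribute name, copied from the PythonCaller code above
XMP_FIELDS = {
    "dc:title": "title",
    "dc:subject": "subject",
    "dc:creator": "creator",
    "dc:description": "description",
    "xmp:CreateDate": "creation_date",
    "xmp:ModifyDate": "creation_date",
    "xmp:CreatorTool": "producer",
}


def collect_metadata(feature):
    """Return {xmp_key: value} for every attribute that is actually set."""
    values = {}
    for xmp_key, attr_name in XMP_FIELDS.items():
        value = feature.getAttribute(attr_name)
        if value is None or value == "":
            continue  # skip missing or empty attributes
        values[xmp_key] = str(value)
    return values


class FakeFeature:
    """Stand-in for an FME feature, for testing outside FME."""

    def __init__(self, attributes):
        self._attributes = attributes

    def getAttribute(self, name):
        return self._attributes.get(name)


print(collect_metadata(FakeFeature({"title": "Map", "creator": "fdw"})))
```

Inside the PythonCaller, the returned dictionary could then be written with a simple loop over its items.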

warrendev
Enthusiast
123 replies
3 years ago
1 May 2020

Glad you got it working!
