Solved

PDF writer: Document properties

5 years ago
April 24, 2020
8 replies
111 views

fdw
Contributor
15 replies

Hi,

We update a number of PDFs daily with FME for publication on our website, but these files need to have their Document Properties set; otherwise our webcare team won't allow them to be published. So far I haven't been able to achieve this.

There seems to be an option with the format attribute 'pdf_document_info_metadata' but how do I format the properties in my writer?

When I read a PDF that has these properties filled in, pdf_document_info_metadata comes through as a feature type. I have tried adding this feature type to my PDF writer with exactly the same schema, but without result: the properties are still empty.

Does anyone know if writing the properties is supported and how to achieve it?

Best answer by warrendev

Hi @freekdw,

write_pdf_metadata.fmwt

I've had success using the Python library pdfrw for updating the metadata. I've attached an example workspace showing how it can take the metadata read in from one PDF and generate a new PDF from it. Maybe you can adapt this to your specific workflow.

input-pdf output-pdf workspace


debbiatsafe
Safer
647 replies
5 years ago
April 24, 2020

Hi @freekdw

I've asked our development team and unfortunately, FME does not natively support writing PDF document properties/metadata. While the PDF reader is able to read document properties, the PDF writer, which was implemented long before the reader, uses a different library that does not support document properties.

If you wish to use FME to update PDFs but still carry through the metadata from PDF input files, you will need to post-process the PDF output to add metadata.

I have not tested this, but a quick search suggests it may be possible using PDF utility applications (e.g. this page here) or Python libraries (e.g. as mentioned here). You would want to combine these with either the SystemCaller or the PythonCaller, depending on which method you use.

fdw
Author
Contributor
15 replies
5 years ago
April 29, 2020

Hi @debbiatsafe,

Thanks for asking. Too bad it is not natively supported but good to hear there are options. I'm not that familiar with this kind of coding but I'll have a look into it.


warrendev
Enthusiast
119 replies
Best Answer
5 years ago
April 29, 2020

Hi @freekdw,

write_pdf_metadata.fmwt

input-pdf output-pdf workspace

fdw
Author
Contributor
15 replies
5 years ago
April 30, 2020

Hey @warrengis,

Wonderful! Thank you so much for sharing this elegant code! This will help me get the files published.

fdw
Author
Contributor
15 replies
5 years ago
April 30, 2020

I am running into 'NoneType' errors with some of my files, although the workspace works fine on some random PDFs I grabbed from the web. The error makes the Python code fail:

2020-04-30 15:53:41| 1.6| 0.5|ERROR |Python Exception <AttributeError>: 'NoneType' object has no attribute 'update'

2020-04-30 15:53:41| 1.6| 0.0|ERROR |Error encountered while calling function `add_metadata'

2020-04-30 15:53:41| 1.6| 0.0|FATAL |PythonCaller (PythonFactory): PythonFactory failed to process feature

2020-04-30 15:53:41| 1.6| 0.0|ERROR |A fatal error has occurred. Check the logfile above for details

Any ideas?

4_JorisIvensplein.pdf


warrendev
Enthusiast
119 replies
5 years ago
April 30, 2020

It looks like the structure of the PDF is causing the error. I don't know exactly how to fix that with pdfrw. After I opened the file in my PDF program and saved it back out, it worked.

I did get it to work with another Python library, pikepdf. Here is the documentation: https://pikepdf.readthedocs.io/en/latest/tutorial.html

from pikepdf import Pdf
pdf = Pdf.open(r'37028-4-jorisivensplein.pdf')
with pdf.open_metadata() as meta:
    meta['dc:title'] = "Let's change the title"
pdf.save('output.pdf')

output 4-jorisivensplein.pdf
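
A note on the pdfrw error `'NoneType' object has no attribute 'update'`: one plausible cause is that the input PDF has no Info dictionary at all, so `trailer.Info` is `None`. This is an assumption, because the code inside the attached workspace is not shown in this thread. Under that assumption, a minimal sketch would create the dictionary before writing to it. `clean_updates` and `write_info` are hypothetical helper names, not part of pdfrw or FME.

```python
def clean_updates(updates):
    """Drop entries whose value is None (e.g. attributes missing on a feature)."""
    return {key: value for key, value in updates.items() if value is not None}


def write_info(src_path, dst_path, updates):
    """Copy src_path to dst_path, setting Info entries such as Title or Author."""
    # Third-party import kept local so clean_updates works without pdfrw installed.
    from pdfrw import IndirectPdfDict, PdfReader, PdfWriter

    trailer = PdfReader(src_path)
    if trailer.Info is None:
        # Assumption: this is the situation that raised the AttributeError.
        trailer.Info = IndirectPdfDict()
    for key, value in clean_updates(updates).items():
        # Attribute assignment on a pdfrw dictionary sets the matching /Key entry.
        setattr(trailer.Info, key, value)
    PdfWriter(dst_path, trailer=trailer).write()
```

It could be called as `write_info('input.pdf', 'output.pdf', {'Title': 'My title', 'Author': None})`, where the file names are placeholders; the `None` author is skipped rather than written.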

fdw
Author
Contributor
15 replies
5 years ago
April 30, 2020

Hi @warrengis,

I got it working as well. Thanks for your guidance. After installing the pikepdf package, the PythonCaller contains the following code:

import fme
import fmeobjects
from pikepdf import Pdf

def add_metadata(feature):
    # Input and output paths come from the published parameters
    # 'input_pdf' and 'output_pdf'.
    with Pdf.open(fme.macroValues['input_pdf']) as pdf:
        # Values assigned inside this block are stored as XMP metadata
        # when the block exits.
        with pdf.open_metadata() as meta:
            meta['dc:title'] = feature.getAttribute('title')
            meta['dc:subject'] = feature.getAttribute('subject')
            meta['dc:creator'] = feature.getAttribute('creator')
            meta['dc:description'] = feature.getAttribute('description')
            # The same attribute is used for both the creation and modification date.
            meta['xmp:CreateDate'] = feature.getAttribute('creation_date')
            meta['xmp:ModifyDate'] = feature.getAttribute('creation_date')
            meta['xmp:CreatorTool'] = feature.getAttribute('producer')
        pdf.save(fme.macroValues['output_pdf'])

creation_date is formatted as '%Y-%m-%dT%H:%M:%S%Ez'. The workspace logs errors on the xmp: metadata but still writes it to the output PDF. I also find the naming of dc:subject and dc:description not very intuitive. Ah well... We made it.
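
For anyone building the date value in Python instead of in FME, here is a small standard-library sketch that produces a timestamp of the same shape as the pattern above (ISO 8601 with a colon in the UTC offset, as I understand `%Ez`). `xmp_timestamp` is a hypothetical helper name. Whether the date format has anything to do with the errors logged for the xmp: keys is not known.

```python
from datetime import datetime, timedelta, timezone


def xmp_timestamp(moment):
    """Format a timezone-aware datetime as e.g. 2020-04-30T15:53:41+02:00."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    # timespec="seconds" drops microseconds so the value stays compact.
    return moment.isoformat(timespec="seconds")


# Example with a fixed UTC+02:00 offset.
cest = timezone(timedelta(hours=2))
print(xmp_timestamp(datetime(2020, 4, 30, 15, 53, 41, tzinfo=cest)))
# prints 2020-04-30T15:53:41+02:00
```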

Only one feature from the PDF has to enter the PythonCaller, so a Sampler or a maxFeatures setting can be used to speed up the process.
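
If some features lack one of the attributes, `feature.getAttribute` returns `None`, and it may be safer not to write that key at all. The sketch below is a variation on the code above and not a tested FME workspace: `XMP_FIELDS`, `build_xmp` and `write_xmp` are hypothetical names, and wrapping dc:creator in a list reflects my understanding that pikepdf expects a list of names for that key, which should be checked against the pikepdf documentation.

```python
# (XMP key, FME attribute name) pairs, mirroring the mapping used in this thread.
XMP_FIELDS = [
    ("dc:title", "title"),
    ("dc:subject", "subject"),
    ("dc:creator", "creator"),
    ("dc:description", "description"),
    ("xmp:CreateDate", "creation_date"),
    ("xmp:ModifyDate", "creation_date"),
    ("xmp:CreatorTool", "producer"),
]


def build_xmp(get_attribute):
    """Collect XMP values, skipping attributes that are missing or empty."""
    values = {}
    for key, attribute in XMP_FIELDS:
        value = get_attribute(attribute)
        if value is None or value == "":
            continue
        # Assumption: dc:creator should be a list of names rather than a string.
        values[key] = [value] if key == "dc:creator" else value
    return values


def write_xmp(input_pdf, output_pdf, values):
    """Write the collected values into a copy of input_pdf."""
    # Third-party import kept local so build_xmp works without pikepdf installed.
    from pikepdf import Pdf

    with Pdf.open(input_pdf) as pdf:
        with pdf.open_metadata() as meta:
            for key, value in values.items():
                meta[key] = value
        pdf.save(output_pdf)
```

Inside the PythonCaller this could be used as `write_xmp(fme.macroValues['input_pdf'], fme.macroValues['output_pdf'], build_xmp(feature.getAttribute))`.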


warrendev
Enthusiast
119 replies
5 years ago
April 30, 2020

Glad you got it working!
