+1 from me, following a request from a user who wants to drive PDF production from Excel, creating a single page for each Excel row with a map plus a table of information. The table would also contain a free-form text field, so variable-length text that wraps line after line into a paragraph.
Maps and static text are easy, but tables and variable-length text aren't.
+1 from me as well. I was able to create a simple PDF report using the HTMLReportGenerator and then using the custom transformer that takes HTML and goes to PDF. It worked alright for a very simple report, but it wouldn't scale for more advanced PDF reporting needs.
As a "right now" solution: multiple one-page .PDF outputs can be combined via PDF Toolkit (via .BAT). Not a great long-term fix.
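For anyone who would rather script that merge from Python than from a .BAT file, here is a minimal sketch. It assumes the `pdftk` executable is installed and on the PATH; the function names and file names are made up for illustration.

```python
import subprocess


def build_pdftk_merge_command(inputs, output):
    """Return the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *[str(p) for p in inputs], "cat", "output", str(output)]


def merge_pdfs(inputs, output):
    """Run pdftk (assumed to be on the PATH); raises CalledProcessError on failure."""
    subprocess.run(build_pdftk_merge_command(inputs, output), check=True)


# Example call with hypothetical one-page outputs, merged in the order given:
# merge_pdfs(["page_001.pdf", "page_002.pdf"], "combined.pdf")
```

The pages end up in the order the inputs are listed, so sort the file names first if the page order matters.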
I work in design and construction. We use PDFs all the time. They:
- Are very large. 100-500+MB is common.
- Contain a variety of data formats on one page, e.g. raster, vector, text, image...
- Contain a variety of page sizes, e.g. the same PDF may have 8.5x11, 11x17 and 24x36 pages
- Have bookmarks
- Often authored in BlueBeam.
These PDFs need to be manipulated for different applications and workflows. Actions such as:
- Fan out by Bookmark or other criteria
- Extract text/metadata and write to CSV/Excel/?
The PDF Reader can read more than the Writer can write.
What if you just want to break up a large PDF by text or bookmark criteria, leaving the format intact?
I am trying to do that right now and it has been a real struggle. Once the PDF is read into FME, there doesn't seem to be a way to put it back together in the same way.
If you are going to have a PDF Writer, it needs to handle all the types of data found in one. There is a real need for this functionality because PDFs are the file format of choice in the A/E/C sector.
Hello!
The Safe Software Team is actively investigating improvements to PDF writing, potentially as a new additional writer with a focus on text-based documents.
We're in the early stages and we'd love to hear from you on your needs for PDF writing if you're able to share any insight on the questions below:
- What does your organization need to do with PDF writing?
- What do your PDFs need to contain (for example: text, hyperlinks, images, charts...)?
Any use cases you're able to share will help inform our product development team as this project shapes.
If you're interested in chatting further, or in providing a PDF document sample, please Submit a Ticket and we will follow up with you.
Thank you!
Output multipage reports with multiple maps, tables, images, graphs and accompanying text
Hi @jovitaatsafe
Recently, we needed to translate our HTML validation reports to PDFs. We initially chose HTML because the current PDFPageFormatter would have been too challenging to use given our dynamic output.
Our biggest challenge going from HTML to PDF was getting the tables to split properly around page breaks.
So I would like to add these to the list:
- proper splitting of tables around page breaks
- the option to write more complex tables (rowspan/colspan)
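To make the first request concrete, here is a rough sketch of the splitting behaviour I mean: rows are cut into page-sized chunks and the header row is repeated on every page. It is plain Python, not anything FME does today, and it assumes every row has the same height; `split_table_rows` and `rows_per_page` are names invented for this example. Rowspan/colspan is not handled.

```python
def split_table_rows(header, rows, rows_per_page):
    """Split rows into page-sized chunks, repeating the header on each page.

    rows_per_page counts the header row, so each page holds
    rows_per_page - 1 data rows. Assumes a fixed row height.
    """
    if rows_per_page < 2:
        raise ValueError("rows_per_page must fit the header plus one data row")
    body = rows_per_page - 1
    return [[header] + rows[i:i + body] for i in range(0, len(rows), body)]


# Example: three data rows, room for three table rows (header included) per page.
pages = split_table_rows(["id", "name"], [[1, "a"], [2, "b"], [3, "c"]], 3)
```

With variable row heights (wrapped text in cells), the same idea works with a running height total instead of a row count.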
I use FME to create extensive multi-page PDF reports, combining map info, text, photos, logos, contents pages, etc… There is definitely room for improvement though as I have had to develop a lot of workarounds.
Multi-page support is currently fine, but initially hard to figure out - you need to embed the PDFPageFormatters within custom transformers and use a group-by on them while specifying the page number you want it to go to (so you have to manually figure out page numbers first and re-assign them after this transformer).
Better support for text (paragraphs) would be nice. At the moment you have to guess where line breaks should be by counting the number of characters for a specific font size and page size. Working out how and when to insert line breaks when adding paragraphs would be helpful. It uses XHTML too, which is okay but limited - custom hyperlinks don’t work unless you specify the whole URL (which is often too long and will spill off a page).
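For reference, the character-counting workaround looks roughly like the sketch below, written in plain Python with the standard `textwrap` module. The `avg_char_width_ratio` value is an assumption (an average glyph width of about half the font size) and has to be tuned per font, which is exactly why the approach is fragile.

```python
import textwrap


def wrap_paragraph(text, box_width_pt, font_size_pt, avg_char_width_ratio=0.5):
    """Wrap text to fit a box by estimating how many characters fit on a line.

    avg_char_width_ratio is an assumed average glyph width as a fraction of
    the font size; it is an estimate, not a measured font metric.
    """
    chars_per_line = max(1, int(box_width_pt / (font_size_pt * avg_char_width_ratio)))
    return textwrap.wrap(text, width=chars_per_line)


# Example: a 40 pt wide box at 10 pt font size allows an estimated 8 characters per line.
lines = wrap_paragraph("aaa bbb ccc ddd", box_width_pt=40, font_size_pt=10)
```

The resulting lines can then be joined with whatever line-break markup the output needs.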
Figuring out the actual size that text is going to be once written to the PDF is imperfect, as you need to use a TextStroker to estimate the size in FME to figure out positioning, but then write to the PDF as a point with XHTML specifying font and size (and these do not match). Technically that could be a niche improvement for the TextStroker to interpret the XHTML in the same way as the PDF...
Writing page navigation doesn’t seem possible. I’d like to be able to link users to specific pages within the PDF when they click on a text link in the contents page.
The PDF tooltip is currently buggy, and will often not work when there is more than 1 tooltip provided as it can link to the wrong info (which is a shame as that could have solved the hyperlinking issue). Use-case for me was producing a floor plan map, with hyperlink boxes per room/asset to external documents (but they’d end up linking to the same info).
Better chart support would be nice so these can be inserted into reports, such as being able to use HTML straight out of a report generator. It’s very hard to make charts look professional as it stands.
The same goes for table support - it’s very tricky to get a table to look good in a PDF report straight out of FME (current workaround is to write data using an Excel template, then use an Excel to PDF print tool (not supported on FME Flow Hosted), then to read in the PDF and re-write it using FME...).
Providing some basic out-of-the-box symbols to use for PDF maps would be a bonus.
That’s what comes to mind for now, hope it helps!
Hi
You should be able to create a PDF the same way as with the HTML report generator:
header, logos, tables, figures, text and so on.
I usually go to the HTML report generator and convert the results to PDF, but that mostly fails.
Br
Felipe
We still need the opportunity to write PDF/A as this idea explains:
We use this writer to generate daily automatic security Site Reports (Maps and Tables) to ensure the safety of our teams.
This is a very important writer for us, but as it stands we are on the verge of switching over to the Esri REST API for PDF maps/reports because making nice-looking PDF maps with FME is just too painful. In a way this is understandable because FME is not really a cartographic tool, but if you are going to invest time in this writer, cartographic support is where I would value it most.
IMHO focus on the current writer or the transformers which support it rather than a new writer.
Not sure I would ever use FME to write text based PDFs. The current writer could be super useful if it was improved
- Tables and text boxes! There is that "TableAdder" custom transformer which does this but it really does not work very well, I don't think anyone has really ever updated it.
- Point symbols are also really painful - people suggest using MapnikRasterizer which seems like a very hacky work-around. Add icon support to the PDFStyler
- Page layout. The current PDFPageFormatter is just kind of weird, not sure how to fix it
Some of the issues we run into:
- Text in a table or box, we need to do word-wrap ourselves and it’s really annoying
- Aligning various parts of a layout, especially if they’re in different boxes in the PDFPageFormatter
- Creating multiple pages with map content is a pain, we need to offset every page’s content to the same origin (technically the PDFPageFormatter needs a hard-coded page number, but by changing the pdf_page_number attribute later we can actually have it create multiple pages)
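The arithmetic behind that last workaround, sketched in plain Python rather than as FME transformers: every feature is shifted so that its page's lower-left corner becomes (0, 0), and a page number is attached afterwards. Only `pdf_page_number` comes from the writer; the function name and data layout are made up for this example.

```python
def place_on_pages(features, page_origins):
    """Shift each feature to a shared (0, 0) origin and tag it with a page number.

    features: list of (page_key, [(x, y), ...]) tuples.
    page_origins: dict mapping page_key -> (min_x, min_y) of that page's extent.
    Page numbers are assigned 1..n in the order of page_origins.
    """
    page_numbers = {key: n for n, key in enumerate(page_origins, start=1)}
    placed = []
    for page_key, coords in features:
        ox, oy = page_origins[page_key]
        shifted = [(x - ox, y - oy) for x, y in coords]
        placed.append({"pdf_page_number": page_numbers[page_key], "coords": shifted})
    return placed


# Example with two hypothetical map pages.
origins = {"north": (100.0, 200.0), "south": (500.0, 50.0)}
placed = place_on_pages(
    [("north", [(110.0, 210.0)]), ("south", [(500.0, 60.0)])], origins
)
```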
The PDF Reader can read more than the Writer can write.
Adding to this, the PDF reader has issues with not correctly grouping individual characters into words, especially for narrow fonts. This was a major hassle when trying to parse text from building plans, as FME would only return individual characters and not words, and we finally had to resort to using https://pdfminersix.readthedocs.io/en/latest/ where it’s possible to tweak these settings. I would love for FME to have similar functionality!
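To illustrate the kind of setting I mean, here is a simplified sketch of gap-based word grouping in plain Python. It is not FME's or pdfminer.six's actual algorithm; pdfminer.six exposes thresholds of this kind through its `LAParams` layout settings. The function name, the `(text, x0, x1)` input layout and the `max_gap_ratio` parameter are assumptions made for this example.

```python
def group_chars_into_words(chars, max_gap_ratio=0.3):
    """Group characters on one text line into words by horizontal gap.

    chars: list of (text, x0, x1) tuples, in any order.
    A character joins the current word when the gap to the previous character
    is at most max_gap_ratio times the previous character's width.
    """
    words, current, prev = [], "", None
    for text, x0, x1 in sorted(chars, key=lambda c: c[1]):
        if prev is not None and x0 - prev[2] > max_gap_ratio * (prev[2] - prev[1]):
            words.append(current)
            current = ""
        current += text
        prev = (text, x0, x1)
    if current:
        words.append(current)
    return words


# Hypothetical character boxes for the text "Hi to".
boxes = [("H", 0.0, 5.0), ("i", 5.5, 7.0), ("t", 12.0, 15.0), ("o", 15.2, 18.0)]
words = group_chars_into_words(boxes)
```

With a threshold that is too strict for the font (for example `max_gap_ratio=0.0`), every character comes back as its own "word", which matches the symptom described above.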
For the PDF writer, a big challenge is to ensure page flow, meaning objects that are positioned relative to other objects that may be dynamic in size, over multiple pages. That, together with support for tables and more intelligent paragraph handling (including line breaks) would be fantastic.
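A minimal sketch of what I mean by page flow, in plain Python: blocks of varying height are stacked top to bottom, and a block that does not fit in the remaining space moves to the next page. The names are invented for this example, and splitting a single block (such as a long table) across pages is deliberately left out.

```python
def flow_blocks(block_heights, page_height, gap=0.0):
    """Return a (page_number, y_offset_from_top) placement for each block.

    Blocks are stacked in order; a block that does not fit in the space left
    on the current page starts a new page.
    """
    placements = []
    page, cursor = 1, 0.0
    for height in block_heights:
        if height > page_height:
            raise ValueError("block taller than a page; it would need splitting")
        if cursor + height > page_height:
            page += 1
            cursor = 0.0
        placements.append((page, cursor))
        cursor += height + gap
    return placements


# Example: four blocks on 800-unit pages with a 10-unit gap between blocks.
layout = flow_blocks([300, 400, 200, 700], page_height=800, gap=10)
```

Because each position depends on the heights of everything before it, a change in one dynamic block automatically shifts everything after it.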
We use Publish from Autodesk products, as it reproduces our DWG layout exactly in PDF.
We could also benefit from enhanced PDF writing in that we need to add authorization stamps to PDFs and have the output exactly like the input with the added stamp. This is difficult to do without spending lots of time formatting and styling the PDF again.
Thank you everyone for your helpful feedback, feel free to continue to join the discussion, I’ll add some responses below.
So far, I’m hearing a lot of support for potentially making use of HTML such as the HTMLReportGenerator, improving support around tables, text, point symbology, and making it easier to set page breaks and handle general layout.
Forgive me for not replying to each individual message, I really appreciate you taking the time to share and it’s a relief to hear we’ve identified some of the same things, as well as heard some new ones (like point symbology!).
@spatialexjames I appreciate how hard some of those tasks have been, shoutout for a very comprehensive answer, really helped me fill out my chart! I expect the team will have more questions after a first pass through.
@gabriel_hirsch thanks for bringing up PDF/A. I’m not sure this one will be in scope for this writer, but it’d be great if we could learn more about the use cases for future considerations. From the idea, I see AEC customers and some municipalities being represented, can you share more about how many organizations in which industries might benefit from having this supported, and what they currently do to output PDF/A? Feel free to shoot me an email to talk further at jovita.chan AT safe.com
@braggken Hopefully with some better text management we’ll be able to make the writer a bit more accessible to FME users for future reports! Good to know about the difficulty around point symbols, thanks!
@gisbradokla Are the authorization stamps like a watermark or annotation? How does your organization currently go about adding in the authorization stamp?
Thanks folks, great discussion (:
Support for PDF/A writing!
@gisbradokla Are the authorization stamps like a watermark or annotation? How does your organization currently go about adding in the authorization stamp?
We insert an AutoCAD block at specific coordinates/scale onto the layout.
We use the Autodesk Publish command (export PDF).
Then we remove the block from the DWG.
Access to the stamp is controlled per user.
I would like to be able to set the PDF metadata directly in the writer. I currently use pikePDF to update the metadata after writing.
And yes, tables please!
Thanks for taking initiative on this front, @jovitaatsafe! PDF is usually a dead-end node for the data but is intrinsically used by public sector customers. In most cases a .pdf takes form through conversion from editable formats like .xlsx, .docx or HTML. PdfTk and LibreOffice converter libraries work magic combined with SystemCaller, however there are hindrances and unnecessary installation steps that may not work in certain environments. It would be grand to have it all packaged in one native transformer.
And yes, PDF/A would open a door where a long queue is patiently building up.
Think of PDF as a dog tag and a lightweight viewer combined. Everyone can read it, it states what we know of map data at a point in time in our (project) workflow.
We need greater support for PDF writing to display the map data at the correct scale and symbology, to avoid the bottleneck of producing maps in report tools like Word or CAD drawing programs. Ideally we want to ship the map data as a database together with a PDF that shows you the content. PDF/A support would greatly help us achieve a five-star delivery here.
If you want to get a peek at our workflows (AEC industry, 3000+ employees), think of it as a huge Git tree: we are constantly working on different pipelines and alternatives, and this is as true for a single project we deliver to our clients every day as it is for our company as a whole. So a PDF would fit in this workflow like the code viewer in Git: you can always see visually what our delivered data was about at a precise time.
Yes, we will need to use templates, so that we can focus only on updating the symbology and scale together with the metadata.
Would improving an HTML writer be enough in place of a better PDF writer? No, because you would not have the “snapshot” aspect of the PDF, which you can attach to a report or an email, archive and view so easily.
Update: The project is in development!
I just wanted to thank everyone for taking the time to share your use cases and needs! Our development team has scoped out a plan taking into consideration the information you’ve all provided, and they are now actively working on the first phase of it.
Keep in mind that the first phase when it comes out to beta will contain some of the things that we really want to add in, while other features may be added later or may be unplanned. As always we’re open to feedback as we go through that process!
Metadata:
Some basic metadata support is being targeted. While I can’t promise that it’ll make it into a future plan, I’d love to hear more about what fields are important to you in PDF metadata when you’ve got post processes like pikePDF @fdw.