Question

Multiple PDF Fanout and pdf_name

  • 19 January 2023


[Screenshot: NB Aufschlussdarstellungen - Nord_Erkundung 2018_Teil 1, opened in PDF-XChange Editor]

I have PDFs with multiple pages, and I have to split them up by a certain text string (e.g. SS 66,16).

This text string is always at nearly the same coordinates on the page, so I can isolate it and write the text to a new attribute.

But how can I fan out the PDFs so that each output file contains all the content from one page (e.g. pdf_page_number 2) and is named after that text string (SS 66,16.pdf)?

 

Greetz and cheers

Franco




After isolating the desired text string (Clipper), expose the fme_text_string attribute and rename it to something else. Send the text features to the Supplier port of a FeatureMerger and all the original features to the Requestor port. Do a 1-to-1 merge with Group By enabled and set to pdf_page_number (you have to expose this attribute for all features at the very beginning). Send the output from the Merged port to an AttributeRemover, remove pdf_page_number, and send everything to a PDF writer with fanout set to the renamed fme_text_string attribute.

I attached a demo workspace.
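To make the FeatureMerger data flow easier to picture, here is a plain-Python sketch of the same idea outside FME. It is only an illustration under assumptions: features are modeled as dicts, the text features are assumed to be already isolated (one per page), and `sheet_name` is a hypothetical name for the renamed fme_text_string attribute. It is not how FME works internally.

```python
from collections import defaultdict


def merge_and_fanout(all_features, text_features):
    """Sketch of FeatureMerger (join on pdf_page_number) -> AttributeRemover -> fanout.

    all_features:  every feature read from the PDF (Requestor side).
    text_features: the isolated text features (Supplier side), carrying the
                   renamed text attribute 'sheet_name' (hypothetical name).
    """
    # Supplier lookup: page number -> text string used as the output file name.
    # A 1-to-1 merge assumes exactly one text feature per page.
    supplier = {tf["pdf_page_number"]: tf["sheet_name"] for tf in text_features}

    outputs = defaultdict(list)  # fanout file name -> features written to it
    unmerged = []                # requestors with no matching supplier
    for feat in all_features:
        page = feat["pdf_page_number"]
        if page not in supplier:
            unmerged.append(feat)
            continue
        merged = dict(feat, sheet_name=supplier[page])  # attach the supplier attribute
        del merged["pdf_page_number"]                    # the AttributeRemover step
        outputs[merged["sheet_name"] + ".pdf"].append(merged)
    return dict(outputs), unmerged


# Hypothetical example: page 2 carries the string "SS 66,16", page 1 has none.
features = [
    {"pdf_page_number": 1, "content": "line A"},
    {"pdf_page_number": 2, "content": "line B"},
    {"pdf_page_number": 2, "content": "text SS 66,16"},
]
texts = [{"pdf_page_number": 2, "sheet_name": "SS 66,16"}]
out, rest = merge_and_fanout(features, texts)
print(sorted(out))  # ['SS 66,16.pdf']
print(rest)
```

Every feature on page 2 ends up in the same fanout group because they share the page number used as the join key; page 1 has no supplier and is left unmerged.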


Hi... something doesn't work. The output doesn't look like the original, and I don't know what I am doing wrong here. The desired text string is also missing from the output after the FeatureMerger (I didn't clip the text; I isolated it with coordinates, but I don't think that makes a difference).

And why the AttributeRemover?

Is there a way to send you data?

Greetz and cheers Franco


The font and formatting of strings aren't preserved by FME. There are even a few builds that read PDF strings without whitespace.

For aesthetic purposes I would actually do this differently: I would split the PDFs into single pages with a dedicated PDF splitter/merger (PDFsam?), then read the desired text string with FME and use the File Copy writer to rename them.
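The rename step in this approach amounts to building a copy plan: each single-page file gets a target name from its extracted text string. A minimal plain-Python sketch, assuming the single-page files and their extracted strings are already available as pairs (the paths and strings below are hypothetical). It also flags target names claimed by more than one page, since those pages would collide on the same output file name.

```python
from collections import Counter


def build_copy_plan(pages):
    """pages: list of (source_path, extracted_text) for single-page PDFs.

    Returns (plan, duplicates): plan maps source path -> target file name,
    duplicates lists target names claimed by more than one page.
    """
    plan = {}
    for source, text in pages:
        # Assumption: the extracted string is usable as a file name after trimming.
        plan[source] = text.strip() + ".pdf"
    counts = Counter(plan.values())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return plan, duplicates


# Hypothetical split output with the string read from each page.
pages = [
    ("split/page_001.pdf", "SS 66,16"),
    ("split/page_002.pdf", "SS 66,17 "),
    ("split/page_003.pdf", "SS 66,16"),
]
plan, dups = build_copy_plan(pages)
print(plan)
print(dups)  # ['SS 66,16.pdf']
```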

You can upload your workspace and a 2-3 page PDF file here. Or there is an email address on my profile page: caracadrian (safe.com)


I have solved it... thank you all very much!


Now I have a new question about this issue...

I have now renamed the PDFs (with an extracted text string) and copied them with this new fanout,

but for that I had to split the multi-page PDFs into single sheets (as caracadrian mentioned).

Unfortunately, there were sheets with the same name, e.g. pages 1 to 6.

Is there a way to combine those into one multi-page PDF (but only with the File Copy writer)?
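Whatever tool does the actual combining, the first step is grouping: collect the single-page files by their extracted name and keep them in page order. A minimal plain-Python sketch, assuming each split file is known as a (page_number, sheet_name, path) triple (hypothetical names and paths). The page merge itself is left to a PDF tool and is not shown; whether the File Copy writer alone can do it is not established here.

```python
from collections import defaultdict


def group_pages_by_name(pages):
    """pages: iterable of (page_number, sheet_name, path) for single-page PDFs.

    Returns {target_file_name: [paths in page order]}, i.e. one entry per
    multi-page PDF that should be produced.
    """
    groups = defaultdict(list)
    # Sort by page number so each group keeps the original page order.
    for page_number, sheet_name, path in sorted(pages, key=lambda p: p[0]):
        groups[sheet_name + ".pdf"].append(path)
    return dict(groups)


# Hypothetical split output: pages 1 and 2 carry the same string.
pages = [
    (3, "SS 66,17", "split/p3.pdf"),
    (1, "SS 66,16", "split/p1.pdf"),
    (2, "SS 66,16", "split/p2.pdf"),
]
groups = group_pages_by_name(pages)
print(groups)
```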

 

Greetz and cheers

Franco
