I have PDFs with multiple pages, and I have to split them up by a certain text string (e.g. SS 66,16). This text string is always at nearly the same coordinates on the page, so I can isolate it and write the text into a new attribute. But how can I fan the pages out so that each output PDF contains all the content from one page (e.g. pdf_page_number 2) and is named after that text string (SS 66,16.pdf)?

Greetz and cheers
Franco

Multiple PDF Fanout and pdf_name

caracadrian
Contributor
564 replies
1 year ago
19 January 2023

After isolating the desired text string (Clipper), expose the fme_text_string attribute and rename it to something else. Send the text features to the Supplier port of a FeatureMerger and all the original features to the Requestor port. Do a 1-to-1 merge with Group By enabled and set to pdf_page_number (you have to expose this attribute for all features at the very beginning). Send the output from the Merged port to an AttributeRemover, remove pdf_page_number, and send everything to a PDF writer with the fanout set to the renamed fme_text_string attribute.

I attached a demo workspace.
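
For readers who prefer to see the join logic spelled out, here is a minimal Python sketch of what the FeatureMerger and fanout steps above amount to. It is not FME code and not the attached workspace: features are modelled as plain dictionaries, and the names `fanout_by_page_label` and `page_label` (standing in for the renamed fme_text_string attribute) are made up for illustration.

```python
from collections import defaultdict


def fanout_by_page_label(features, labels):
    """Attach each page's label to its features and group them per output file."""
    # "Supplier" side: one isolated text string per page number.
    label_by_page = {lab["pdf_page_number"]: lab["page_label"] for lab in labels}

    datasets = defaultdict(list)
    for feat in features:  # "Requestor" side: every feature read from the PDF
        page = feat["pdf_page_number"]
        if page not in label_by_page:
            continue  # no label found for this page, so the feature stays unmerged
        # Copy the feature without pdf_page_number (the AttributeRemover step)
        merged = {k: v for k, v in feat.items() if k != "pdf_page_number"}
        merged["page_label"] = label_by_page[page]
        # Fanout: one output dataset per label value
        datasets[merged["page_label"] + ".pdf"].append(merged)
    return dict(datasets)


features = [
    {"pdf_page_number": 1, "content": "line A"},
    {"pdf_page_number": 1, "content": "SS 66,16"},
    {"pdf_page_number": 2, "content": "line B"},
    {"pdf_page_number": 2, "content": "SS 66,17"},
]
labels = [
    {"pdf_page_number": 1, "page_label": "SS 66,16"},
    {"pdf_page_number": 2, "page_label": "SS 66,17"},
]
result = fanout_by_page_label(features, labels)
```

With this sample data, `result` has one entry per label, each holding all the features of the page that label was found on.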

franco69
Author
Contributor
164 replies
1 year ago
23 January 2023

Hi... something doesn't work. The output doesn't look like the original and I don't know what I am doing wrong here. The desired text string is also missing from the output after the FeatureMerger. (I didn't clip the text, I isolated it by coordinates, but I don't think that makes a difference.)

And why the AttributeRemover?

Is there a way to send you data?

Greetz and cheers Franco

caracadrian
Contributor
564 replies
1 year ago
23 January 2023

The font and formatting of strings aren't preserved by FME. There are even a few builds that read PDF strings without whitespace.

For aesthetic purposes I would actually do this differently. I would split the PDFs into single pages with a dedicated PDF splitter/merger (PDFsam?), then come in with FME, read the desired text string and use the File Copy writer to rename the files.
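
As a rough illustration of the rename-by-copy step only, here is a small Python sketch. It is not what the File Copy writer does internally; it assumes the pages have already been split into single-page files and the text string for each has already been extracted. The helper names `safe_pdf_name` and `copy_with_labels` are hypothetical.

```python
import re
import shutil
from pathlib import Path


def safe_pdf_name(label):
    """Turn an extracted text string into a file name ending in .pdf."""
    # Replace characters that Windows does not allow in file names
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", label).strip()
    return cleaned + ".pdf"


def copy_with_labels(page_files, out_dir):
    """Copy each single-page PDF to out_dir under its label-based name.

    page_files maps the path of a single-page PDF to its extracted text string.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src, label in page_files.items():
        dst = out_dir / safe_pdf_name(label)
        # Note: a later page with the same label overwrites the earlier copy
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

Because the pages are copied as files rather than re-written, fonts and layout stay exactly as in the original.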

You can upload your workspace and a 2-3 page PDF file here. Alternatively, there is an email address on my profile page: caracadrian (safe.com)

franco69
Author
Contributor
164 replies
1 year ago
1 February 2023

I have solved it. Thank you all very much!

franco69
Author
Contributor
164 replies
1 year ago
3 February 2023

Now I have a new question on this issue.

I have renamed the PDFs (using the extracted text string) and copied them with this new fanout, but for that I had to split the multi-page PDF into single sheets (as caracadrian suggested).

Unfortunately, some sheets share the same name, e.g. pages 1 to 6.

Is there a way to combine those into one multi-page PDF (but only with the File Copy writer)?

Greetz and cheers

Franco
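
Whatever tool ends up doing the merge, the first step is working out which single-page files belong together. A minimal Python sketch of that grouping step, assuming the label and original page number are known for each single-page file; the function name `group_pages_by_name` and the sample file names are hypothetical, and the actual merging of PDF content is left out.

```python
from collections import defaultdict


def group_pages_by_name(pages):
    """Group single-page PDFs that share a name, keeping original page order.

    pages is an iterable of (label, page_number, path) tuples.
    """
    groups = defaultdict(list)
    for label, page_number, path in pages:
        groups[label].append((page_number, path))
    # Sort each group by page number so a merged file would keep the source order
    return {label: [path for _, path in sorted(items)]
            for label, items in groups.items()}


pages = [
    ("SS 66,16", 2, "page_2.pdf"),
    ("SS 66,16", 1, "page_1.pdf"),
    ("SS 66,17", 3, "page_3.pdf"),
]
merge_plan = group_pages_by_name(pages)
```

Each entry in `merge_plan` lists the files that would go into one multi-page PDF, in page order.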
