Question

How can I extract a number from a pdf and change the file name of the pdf accordingly with that number?

  • September 13, 2023

sc1
Contributor

Using a string searcher with the search text LB, I identified the number in the red box on the left. The number always contains the text LB followed by the same number of characters. There are a lot of PDFs whose file names need to be changed to the LB number. Below I attached a screenshot in which I used a tester and a string searcher. I can find the LB number, but how do I copy it to the writer name as output, and how can I easily process approximately 1000 PDFs in one go without manually changing parameters?

2 replies

joepk
Influencer
  • September 15, 2023

If I read this correctly, you already have (a part of) the filename you want extracted into an attribute value. Correct? You can use attribute values and/or the Text Editor in the dataset field of the FeatureWriter transformer. Could this solve your issue?


chrisatsafe
Contributor
  • September 18, 2023

Hi @sc1,

Note that because you are using the FileCopy writer, you only need a single feature per PDF to rename the files. Please see the attached example.

The workspace is annotated with tips for each step. Essentially, you'll need to read in the PDF, use a Tester to find the page of interest, perform the string search, set the filecopy_source_dataset and filecopy_dest_filename attributes in an AttributeManager, and connect it to a FileCopy writer. You'll also likely want to expose the fme_basename, fme_dataset, pdf_page_number, and pdf_page_text format attributes in the PDF reader (which should be set to read non-spatial text only).
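
For anyone who wants to see the logic outside the workspace, here is a rough Python sketch of the same idea: search each page's text for the LB number, keep one result per PDF, and derive the destination file name from it. This is not taken from the attached workspace. The suffix length of 6, the assumption that the characters after LB are letters or digits, the function names, and the sample paths and page text are all hypothetical, so adjust them to your data.

```python
import re

# Assumption: the LB number is "LB" followed by a fixed number of
# letters/digits. The length 6 is hypothetical; use your real length.
SUFFIX_LEN = 6
LB_PATTERN = re.compile(rf"LB[A-Za-z0-9]{{{SUFFIX_LEN}}}")


def extract_lb_number(page_text):
    """Return the first LB number found in the page text, or None."""
    match = LB_PATTERN.search(page_text)
    return match.group(0) if match else None


def plan_renames(pages):
    """Map each source PDF path to a destination file name.

    `pages` is an iterable of (source_path, page_text) pairs, roughly
    what fme_dataset and pdf_page_text hold per page feature.
    Only the first page with a match is used, so there is one entry
    per PDF, mirroring the single feature per PDF sent to the writer.
    """
    plan = {}
    for source_path, page_text in pages:
        if source_path in plan:
            continue  # this PDF already has its LB number
        lb_number = extract_lb_number(page_text)
        if lb_number is not None:
            plan[source_path] = f"{lb_number}.pdf"
    return plan


# Made-up sample data standing in for text read from the PDFs.
sample_pages = [
    ("in/scan_001.pdf", "Cover page without identifier"),
    ("in/scan_001.pdf", "Order LB123456 shipped"),
    ("in/scan_002.pdf", "Ref: LB654321"),
]
renames = plan_renames(sample_pages)
for source_path, dest_filename in renames.items():
    print(source_path, "->", dest_filename)
```

In the workspace, the keys of this mapping correspond to filecopy_source_dataset and the values to filecopy_dest_filename. Because the plan is built from whatever pages are read, the same logic covers one PDF or a thousand without changing any parameters.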

