Solved

Combining the output of PATH and PDF readers

  • July 12, 2023
  • 5 replies
  • 69 views

What I want: I want to read a list of PDF files and write metadata about these files to a PostgreSQL database. The attributes I want to export are number of pages, file name, file path and file id (which I create with the UUIDGenerator).

I start with a PATH reader to find all PDF files in a folder and all of its subfolders. The problem is that I also want to add the number of pages of each PDF file to the output table, and I don't think I can get this from the PATH reader.

Therefore I added a PDF reader, which gives me the number of pages of each PDF. Now the problem is: how do I combine the output of these two readers? I have tried the FeatureJoiner and the FeatureMerger, but both of these require a common attribute to join on, and I can't find a format attribute in the two readers that is unique. Is there a way to index the attribute tables of each reader? Or is there some much easier way to do this?

I have included a screenshot of my workspace so far.

image

5 replies

nielsgerrits
VIP
  • 2938 replies
  • July 12, 2023

There are several ways to do this. One way is to read the file paths with the PATH reader first, then use a FeatureReader that reads from the path_windows attribute, and choose "merge attributes".
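
To illustrate why this approach needs no common join attribute, the pattern can be sketched in plain Python. This is only a conceptual illustration, not FME code: `read_and_merge`, `fake_pdf_reader`, `FAKE_PAGES`, `file_name` and `page_count` are hypothetical names invented for the sketch, and only `path_windows` comes from the thread. Each path feature triggers its own read, and its attributes are copied onto every feature that read returns.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Feature = Dict[str, object]

# Hypothetical page counts standing in for what a real PDF read would return.
FAKE_PAGES = {r"C:\docs\a.pdf": 12, r"C:\docs\sub\b.pdf": 3}


def fake_pdf_reader(dataset: str) -> List[Feature]:
    """Stand-in for a PDF read: returns one feature carrying a page count."""
    return [{"page_count": FAKE_PAGES[dataset]}]


def read_and_merge(
    initiators: Iterable[Feature],
    reader: Callable[[str], List[Feature]],
) -> Iterator[Feature]:
    """For each initiating feature, read its dataset and merge attributes.

    The initiator's attributes are copied onto every feature the reader
    returns, so the two sources never need a shared key to join on.
    """
    for initiator in initiators:
        dataset = str(initiator["path_windows"])
        for result in reader(dataset):
            # On a name conflict the read result wins; that precedence is
            # an assumption of this sketch, not documented FME behaviour.
            yield {**initiator, **result}


paths = [
    {"path_windows": r"C:\docs\a.pdf", "file_name": "a.pdf"},
    {"path_windows": r"C:\docs\sub\b.pdf", "file_name": "b.pdf"},
]
merged = list(read_and_merge(paths, fake_pdf_reader))
```

Because the read is driven by the path feature itself, the page count and the path attributes end up on the same feature without any join step.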


  • Author
  • 3 replies
  • July 12, 2023

This makes sense, thanks a lot. But this means not including the PDF reader, right? I am unsure how to get the number of pages, though. I set up the FeatureReader like this, but I do not see an option to merge attributes.

image


nielsgerrits
VIP
  • 2938 replies
  • July 12, 2023

Yes: first use the directory (PATH) reader to read the files, then a FeatureReader set to the PDF reader to get the number of pages. "Merge attributes" is a fold-out option in the lower portion of the FeatureReader's parameters.

edit: typo 8-|
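
For readers who want to check the expected result outside of FME, the two-step flow (list the files, then look up the page count per file) can be approximated in Python. This is a rough sketch under stated assumptions, not the attached workspace: `collect_pdf_rows` and the row keys are hypothetical names, and `count_pages` is a placeholder the caller must supply, for example a wrapper around a PDF library of their choice.

```python
import os
import uuid
from typing import Callable, Dict, List


def collect_pdf_rows(
    root: str,
    count_pages: Callable[[str], int],
) -> List[Dict[str, object]]:
    """Walk root and all of its subfolders, returning one row per PDF file.

    count_pages is supplied by the caller; this sketch does not parse PDFs.
    """
    rows: List[Dict[str, object]] = []
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            # Match the extension case-insensitively (.pdf, .PDF, ...).
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(folder, name)
            rows.append(
                {
                    "file_id": str(uuid.uuid4()),  # random id per file
                    "file_name": name,
                    "file_path": path,
                    "page_count": count_pages(path),
                }
            )
    return rows
```

Each returned row carries the four attributes from the question (file id, file name, file path, number of pages) and could then be written to a database table.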


nielsgerrits
VIP
  • 2938 replies
  • Best Answer
  • July 12, 2023

Attached workspace demonstrating this.


  • Author
  • 3 replies
  • July 12, 2023

Thanks a lot!