Solved

Combining the output of PATH and PDF readers


What I want: to read a list of PDF files and write their metadata to a PostgreSQL database. The attributes I want to export are the number of pages, file name, file path and a file ID (which I create with the UUIDGenerator).

 

I start with a PATH reader to find all PDF files in a folder and all of its subfolders. The problem is that I also want to add the number of pages of each PDF file to the output table, and I don't think I can get this from the PATH reader.

 

Therefore I added a PDF reader, which gives me the number of pages of each PDF. Now the problem is: how do I combine the output of these two readers? I have tried the FeatureJoiner and the FeatureMerger, but both require a common attribute to join on, and I can't find a format attribute in the two readers that is unique. Is there a way to index the attribute tables of each reader? Or is there a much easier way to do this?

 

I have included a screenshot of my workspace so far.

image


5 replies

nielsgerrits
VIP

There are several ways to do this. One way is to read the paths first with the PATH reader, then use a FeatureReader to read from path_windows and choose "merge attributes".


  • Author
  • July 12, 2023

This makes sense, thanks a lot. But this means not including the PDF reader, right? I am unsure how to get the number of pages, though. I set up the FeatureReader like this, but I do not see an option to merge attributes.

image


nielsgerrits
VIP

Yes: first a directory (PATH) reader to read the files, then a FeatureReader set to the PDF reader to get the number of pages. "Merge attributes" is a fold-out option in the lower portion of the FeatureReader's parameters.

 

edit: typo 8-|
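For readers who want to see the same two-step pattern outside FME, here is a minimal Python sketch. It is not the workspace from this thread. It lists the PDF files first, then looks up the page count per file and merges both into one record, so no join key is needed. The function names (`count_pages_naive`, `collect_pdf_metadata`) and the record keys are hypothetical, and the page counter is only a rough heuristic that assumes uncompressed page objects.

```python
import re
import uuid
from pathlib import Path

# Matches "/Type /Page" objects but not the "/Type /Pages" tree node.
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pages_naive(pdf_path):
    """Estimate the page count by counting page objects in the raw bytes.

    Assumption: page objects are stored uncompressed. PDFs that use
    compressed object streams will be undercounted, so pass a function
    backed by a real PDF library for anything beyond a demo.
    """
    return len(_PAGE_OBJECT.findall(Path(pdf_path).read_bytes()))


def collect_pdf_metadata(root, count_pages=count_pages_naive):
    """Walk root recursively and return one merged record per PDF file."""
    records = []
    # Step 1: list the files (the role of the PATH reader).
    for path in sorted(Path(root).rglob("*.pdf")):
        if not path.is_file():
            continue
        # Step 2: read each file and merge the result onto the same
        # record (the role of the FeatureReader with merged attributes).
        records.append({
            "file_id": str(uuid.uuid4()),
            "file_name": path.name,
            "file_path": str(path),
            "page_count": count_pages(path),
        })
    return records
```

Because each file is read from inside the loop that found it, the path and the page count end up on the same record without a separate join. The `count_pages` parameter is there so the heuristic can be swapped for a proper PDF parser.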


nielsgerrits
VIP
  • Best Answer
  • July 12, 2023

I have attached a workspace demonstrating this.


  • Author
  • July 12, 2023

Thanks a lot!

