
What I want: to read a list of PDF files and write their metadata to a PostgreSQL database. The attributes I want to export are the number of pages, file name, file path and a file ID (which I create with the UUIDGenerator).

 

I start with a PATH reader to find all PDF files in a folder and all of its subfolders. The problem is that I also want to add the number of pages of each PDF file to the output table, and I don't think I can get this from the PATH reader.

 

Therefore I added a PDF reader, which gives me the number of pages of each PDF. Now the problem is: how do I combine the output of these two readers? I have tried the FeatureJoiner and the FeatureMerger, but both of these require a common attribute to join on, and I can't find a format attribute that is unique and shared by the two readers. Is there a way to index the attribute tables of each reader? Or is there a much easier way to do this?

 

I have included a screenshot of my workspace so far.

image

There are several ways to do this. One way is to first read the paths, then use a FeatureReader to read from path_windows and choose "merge attributes".



This makes sense, thanks a lot. But this means not including the PDF reader, right? I am unsure how to get the page numbers, though. I set up the FeatureReader like this, but I do not see an option to merge attributes.

image



Yes: first a directory (PATH) reader to list the files, then a FeatureReader using the PDF reader to get the number of pages. "Merge attributes" is a fold-out option in the lower portion of the FeatureReader's parameters.

 

edit: typo 8-|
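
For anyone who wants to see the logic outside of FME: the idea is that each file path found by the directory scan drives one PDF read, and the page count is attached to the same record, so no join key is needed. Below is a rough Python sketch of that idea, not the workspace itself. The function names `pdf_metadata` and `count_pages_pypdf` and the record keys are made up for illustration, and the default page counter assumes the third-party `pypdf` package is installed.

```python
import os
import uuid
from typing import Callable, Dict, Iterator


def count_pages_pypdf(path: str) -> int:
    """Count pages with pypdf (assumption: the pypdf package is installed)."""
    from pypdf import PdfReader  # imported lazily so the rest works without it

    return len(PdfReader(path).pages)


def pdf_metadata(
    root: str,
    count_pages: Callable[[str], int] = count_pages_pypdf,
) -> Iterator[Dict[str, object]]:
    """Yield one record per PDF found under root, including subfolders.

    Each path is read exactly once and the page count is merged into the
    record that triggered the read, which mirrors the 'merge attributes' idea.
    """
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if not name.lower().endswith(".pdf"):
                continue  # skip anything that is not a PDF
            path = os.path.join(folder, name)
            record: Dict[str, object] = {
                "file_id": str(uuid.uuid4()),  # hypothetical stand-in for the UUIDGenerator
                "file_name": name,
                "file_path": path,
            }
            record["page_count"] = count_pages(path)
            yield record
```

Something like `rows = list(pdf_metadata(some_folder))` would then give one dictionary per PDF, which could be written to PostgreSQL with a database driver of your choice. The page counter is passed in as a parameter so it can be swapped for another PDF library.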



Attached workspace demonstrating this.



Thanks a lot!

