I'm looking to extract text from a specific area of a PDF. The box contains IDs, and I would like those written to a simple Excel file. My source data is a PDF with roughly 100 pages, and the area I want scanned is in the same location on each page. I believe the best tool for this would be the new Adobe Geospatial PDF reader, but I am unsure how to proceed from there.
Please share the PDF, or a part of it, and describe the info you would like to get.
If you open the PDF in the Data Inspector, you will find the min and max extents of the part you want.
Use a Creator to create a box with these extents, then use a SpatialFilter to keep only the features that fall inside it.
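To illustrate what the box filter is doing, here is a minimal Python sketch of the same idea outside FME. It is not FME's implementation: the `TextFeature` class, the coordinates, and the sample IDs are all made up for illustration, and it assumes each text feature can be reduced to a single x/y position.

```python
from dataclasses import dataclass


@dataclass
class TextFeature:
    """Hypothetical stand-in for one text feature read from a PDF page."""
    page: int
    x: float
    y: float
    text: str


def inside_box(feature, min_x, min_y, max_x, max_y):
    """Return True if the feature's position lies within the box extents."""
    return min_x <= feature.x <= max_x and min_y <= feature.y <= max_y


def filter_by_box(features, min_x, min_y, max_x, max_y):
    """Keep only features inside the box; the same box is applied to every page."""
    return [f for f in features if inside_box(f, min_x, min_y, max_x, max_y)]


# Made-up sample data: the ID sits in roughly the same spot on each page.
features = [
    TextFeature(1, 50.0, 700.0, "ID-001"),
    TextFeature(1, 300.0, 400.0, "body text"),
    TextFeature(2, 52.0, 702.0, "ID-002"),
]

# Extents as they might be read off in the Data Inspector (assumed values).
kept = filter_by_box(features, 40.0, 690.0, 120.0, 720.0)
```

Because the box is in the same location on every page, one set of extents is enough for all 100 pages.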
Thank you for your quick response. This method has gotten me closer to my goal, but I'm now struggling to take the filtered result and narrow it down to an attribute of the contained text that I can write to an Excel or CSV file. Since the PDF contains over 100 pages, the filtered result is difficult to read in the Data Inspector because the data from all pages is drawn on top of itself.
Just add an Excel or CSV writer, and voilà.
The resulting table does not have an attribute with the text contained in the filtered area and I'm not sure how to produce that.
I was able to pull these out by using an AttributeCreator and mapping a new attribute to the value of "fme_text_string". Thanks again for your help!
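For anyone reading later, the attribute mapping plus the writer step can be sketched in plain Python roughly as follows. This is only an illustration of the logic, not FME code: features are modelled as plain dictionaries, and the `page` key and the `id_text` column name are assumptions. Only `fme_text_string` comes from the thread.

```python
import csv
import io


def rows_from_features(features, source_attr="fme_text_string", target_attr="id_text"):
    """Copy the text held in source_attr into a named column, one row per feature."""
    rows = []
    for feature in features:
        value = feature.get(source_attr)
        if value is None:
            # Skip features that carry no text at all.
            continue
        rows.append({"page": feature.get("page"), target_attr: value.strip()})
    return rows


def write_csv(rows, handle, fieldnames):
    """Write the rows to any file-like object as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Made-up filtered features; the third one has no text attribute.
filtered = [
    {"page": 1, "fme_text_string": "ID-001"},
    {"page": 2, "fme_text_string": " ID-002 "},
    {"page": 3},
]

rows = rows_from_features(filtered)
buffer = io.StringIO()  # swap for open("ids.csv", "w", newline="") to write a file
write_csv(rows, buffer, ["page", "id_text"])
csv_text = buffer.getvalue()
```

The point is the same as in the workspace: the text is already on the feature, it just needs to be exposed under an attribute name that the writer includes as a column.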