Question

Read from PDF and rename file

March 27, 2020

jmhomza

I need to take stacks of scanned documents, upload them to SharePoint Online, and organize them into their respective libraries and document sets.

My idea is to read the contents of the first page of each scanned document (saved as a PDF) and then rename the file based on the text found on that page. Uploading the renamed files to a Content Organizer, with rules based on the file name, would then solve the SharePoint side of the problem.

Any ideas on how to read the content of a file and then rename the file based on that content?

thijsknapen
March 29, 2020

I'm not sure whether it is possible in FME to read text from scanned PDF documents (i.e. find text in rasters).

If the PDF is not a raster document (i.e., it contains real text rather than only a scanned image), you can use the Adobe Geospatial PDF reader to obtain text features, either per block/item or per page. For this purpose, I think obtaining text features per page is most useful. You could then use general test/search conditions to find the text you want, and use a FileCopier transformer to copy the document to a new location with a new name based on the content found.
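
For anyone scripting this outside FME, the search-and-copy step can be sketched in Python roughly as below (the logic could also be adapted for a PythonCaller). This is a minimal sketch under stated assumptions: the page text has already been extracted (for example by the PDF reader mentioned above or by a PDF library) and is passed in as a list of strings, one per page; the function names, the regex, and the "Invoice No." example are hypothetical. Like the FileCopier approach, it copies rather than renames, and it replaces characters that Windows rejects in file names (which SharePoint Online also disallows).

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional

# Characters Windows rejects in file names; SharePoint Online rejects them too.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def find_name_in_text(page_text: str, pattern: str) -> Optional[str]:
    """Return capture group 1 of the first match (or the whole match), else None."""
    match = re.search(pattern, page_text)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


def sanitize(name: str) -> str:
    """Replace forbidden characters, collapse whitespace, trim spaces/dots at the ends."""
    cleaned = INVALID_CHARS.sub("_", name)
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned.strip(" .")


def copy_with_new_name(src: Path, dest_dir: Path, new_stem: str) -> Path:
    """Copy src into dest_dir as <new_stem><original extension>; never overwrite."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{new_stem}{src.suffix}"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.copy2(src, dest)
    return dest


def rename_from_first_page(
    src: Path, page_texts: List[str], pattern: str, dest_dir: Path
) -> Optional[Path]:
    """Search only the first page's text; return the new file path, or None if no match."""
    if not page_texts:
        return None
    found = find_name_in_text(page_texts[0], pattern)
    stem = sanitize(found) if found else ""
    if not stem:
        return None
    return copy_with_new_name(src, dest_dir, stem)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "scan0001.pdf"
        scan.write_bytes(b"%PDF-1.4 placeholder")  # stand-in file, not a real PDF
        pages = ["ACME Corp\nInvoice No. 20200327\nTotal: 100", "Page 2 text"]
        new_file = rename_from_first_page(
            scan, pages, r"Invoice No\.\s*(\d+)", Path(tmp) / "renamed"
        )
        print(new_file.name if new_file else "no match")  # prints 20200327.pdf
```

Refusing to overwrite an existing file is a deliberate choice: if two scans yield the same text, the second copy fails loudly instead of silently replacing the first.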

jakemolnar
March 30, 2020

If @jmhomza needs to get text from raster images (i.e., PDFs that are only scans, with no text), they could try the TesseractCaller. It takes a bit of setup, since you need to download and install Tesseract OCR separately (FME can't ship it due to licensing), but it can recognize text in images.
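
To show what the OCR step involves, here is a minimal Python sketch that calls the Tesseract command-line tool directly rather than going through the TesseractCaller. It assumes Tesseract is installed and on the PATH, and that each scanned page has already been exported as an image file (Tesseract reads images such as PNG or TIFF, not PDFs). The function names and the default `eng` language are illustrative choices, not part of any FME API.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, language: str = "eng") -> List[str]:
    # Passing "stdout" as the output base makes the tesseract CLI print the
    # recognized text instead of writing it to <outputbase>.txt.
    return ["tesseract", image_path, "stdout", "-l", language]


def ocr_image(image_path: str, language: str = "eng") -> str:
    """Run Tesseract on one page image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

The returned text could then be searched with the same kind of test/search conditions described for non-raster PDFs. OCR output is rarely perfect, so a search pattern that tolerates stray spaces or misread characters is usually safer.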

Related Topics

How can I extract a number from a pdf and change the file name of the pdf accordingly with that number?

renaming .dgn (Microstation) file levels with FME

With FME, Is it possible to extract a barcode from a PDF file and convert to text file?

PDF writer: Document properties

Extract the Data from PDF