I want to read PDF files as images with OpenAI's Vision API and extract information from them. I noticed that FME's PDF reader has trouble with PDFs whose text was added by digital signatures: it cannot read them either as text or as images. To make sure no data is lost, I decided to use a PythonCaller to create image files from the PDFs. Below is my code:
import os

import fme
import fmeobjects
import pdf2image


class FeatureProcessor:
    def __init__(self):
        pass

    def input(self, feature):
        pdf_path = feature.getAttribute('path_windows')

        # Render every page of the PDF to a PIL image at 300 dpi.
        images = pdf2image.convert_from_path(pdf_path, dpi=300)

        temp_dir = "D:/temp"
        if not os.path.exists(temp_dir):
            os.makedirs(temp_dir)

        for index, image in enumerate(images):
            # Write each page to a temporary PNG file.
            image_name = f"{feature.getAttribute('path_rootname')}_{index + 1}.png"
            temp_image_path = os.path.join(temp_dir, image_name)
            image.save(temp_image_path, "PNG")

            # Output one feature per page, carrying the page number,
            # the PNG path and the raw pixel bytes as attributes.
            feature.setAttribute("page_number", index + 1)
            feature.setAttribute("image_path", temp_image_path)
            feature.setAttribute("image_data", image.tobytes())
            feature.setAttribute("image_name", image_name)
            self.pyoutput(feature)

    def close(self):
        pass
The code above runs fine; however, the features at the output port of the PythonCaller carry only attribute values, and no raster spatial data is included.
How can I retrieve both the raster spatial data and the input attributes at the output of PythonCaller?
By the way, writing PNG files to a temporary folder and then reading them back in feels wasteful and slow. Is there a way to optimize the export-and-read-back step so that the PythonCaller output is produced more efficiently and faster?
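For the second question, here is roughly what I have in mind (untested; the attribute names image_png and image_base64 are just placeholders I made up): render each page into an in-memory io.BytesIO buffer with Pillow so that no temporary PNG file is written to disk, and pass the PNG bytes (or their base64 encoding, which is what the Vision API expects) along as attributes. I am not sure whether this is a good approach, and it still does not give me the raster spatial data at the output port.

import base64
import io

import fmeobjects
import pdf2image


class InMemoryProcessor(object):
    """Sketch: render PDF pages to PNG bytes in memory instead of temp files."""

    def input(self, feature):
        pdf_path = feature.getAttribute('path_windows')
        images = pdf2image.convert_from_path(pdf_path, dpi=300)

        for index, image in enumerate(images):
            # Encode the page as PNG into an in-memory buffer (no disk I/O).
            buffer = io.BytesIO()
            image.save(buffer, format="PNG")
            png_bytes = buffer.getvalue()

            # Placeholder attribute names; base64 is what the Vision API expects.
            feature.setAttribute("page_number", index + 1)
            feature.setAttribute("image_png", png_bytes)
            feature.setAttribute("image_base64", base64.b64encode(png_bytes).decode("ascii"))
            self.pyoutput(feature)

    def close(self):
        pass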