
Hi, I am new to FME and need some help. I would like to convert a PDF file to JPEG.

Hi @tereziastredna

Just a question: does your PDF file contain images or text?

You can use the Geospatial PDF reader or a transformer to extract information from the PDF.

Could you share your example data with us?

Thanks,

Danilo



There is an interesting article about reading PDFs: https://knowledge.safe.com/articles/67717/getting-started-with-pdf-reading.html

 

 


Hi @tereziastredna

If you just want to make a JPEG image that looks exactly like your PDF, then you can use the "Non Spatial > Read Rasterized Pages" mode.

This will generate a feature type called "pdf_rasterized_pages", which will produce one raster per page. You can then send these features to a JPEG writer.

One interesting option is the "Raster Size > Mode". You can either use "Scale", like I did above, or you can make the rasters a custom size in pixels. I recommend trying a few different values for "Pixels per Point:", converting to JPEG, and seeing how the result looks. The bigger your "Pixels per Point" value, the larger your image will be.

PS. If you instead want to read JPEG images out of a PDF, rather than rendering the entire page, you want the "Spatial > Read Images" mode. I can tell you more about that mode if that is indeed what you're looking for.
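For a rough feel of how "Pixels per Point" translates into image size, here is a small sketch. It assumes the reader uses the standard PDF point of 1/72 inch and scales width and height equally; `rasterized_size` is a made-up helper for illustration, not part of FME.

```python
# Assumption: one PDF point is 1/72 inch, and the same scale is applied
# to both the width and the height of the page.
POINTS_PER_INCH = 72


def rasterized_size(width_pt, height_pt, pixels_per_point):
    """Return (width_px, height_px, dpi) for a page rendered at the given scale."""
    width_px = round(width_pt * pixels_per_point)
    height_px = round(height_pt * pixels_per_point)
    dpi = POINTS_PER_INCH * pixels_per_point
    return width_px, height_px, dpi


# US Letter is 612 x 792 points (8.5 x 11 inches).
for scale in (1, 2, 4):
    print(scale, rasterized_size(612, 792, scale))
```

So under these assumptions, doubling the value doubles both pixel dimensions, which is worth keeping in mind for file size.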


Hello,

thanks to both of you.

 

My PDF file is a little bit of everything. Text, and images... see attached. test-pdf-to-jpeg.pdf

 

I tried to do it as you recommended, jakemolnar, but I get an error. I am not sure what to change now (to remove the extra band). The error screenshot is also attached: error-fme.png

 

Thanks

Hi @tereziastredna,

 

The problem is that the PDF reader is generating an ALPHA band (which is used to control transparency). To remove it you can use a RasterInterpretationCoercer. Set the 'Destination Interpretation Type' to 'RGB24'. Leave everything else at its default. You should then be able to write the image to JPEG.

 

 



 

 

Hi, I tried the transformer you recommended. Unfortunately, the result has a black background (black-background.png), and because of the black color some information (data) is missing.

Doh! That's disappointing! I'm having trouble downloading your PDF example. It's giving me problems.

 



If you add a RasterExpressionEvaluator and set it up as described below, it will set all the no-data pixels to white instead of black. Do this before you use the RasterInterpretationCoercer.

What we do here is test where the Alpha band is 0 and then set the RGB values to white. This should do the trick.
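To make the logic concrete, here is a plain Python sketch of the same test on individual pixels. It is not RasterExpressionEvaluator syntax, the function names are made up, and it assumes fully transparent pixels carry RGB values of 0 (which would explain the black background once the alpha band is dropped).

```python
WHITE = (255, 255, 255)


def transparent_to_white(pixel):
    """Where alpha is 0, replace the RGB values with white; keep alpha as is."""
    r, g, b, a = pixel
    if a == 0:
        return (*WHITE, a)
    return pixel


def drop_alpha(pixel):
    """Mimic coercing RGBA to RGB24: keep only the first three bands."""
    return pixel[:3]


# Hypothetical page: one fully transparent pixel and one opaque red-ish pixel.
page = [(0, 0, 0, 0), (200, 30, 30, 255)]

naive = [drop_alpha(p) for p in page]                        # background turns black
fixed = [drop_alpha(transparent_to_white(p)) for p in page]  # background turns white

print(naive)
print(fixed)
```

The order matters: once the alpha band is gone there is nothing left to test, which is why the evaluator has to come before the coercer.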

 

Thanks a lot! It is working! My PDF is now converted to JPEG and looks fantastic!

Great news!

 

 



 

 

Hi, I just noticed that the colors are different. For example, blue in the original PDF is beige in the JPEG, and red is blue. Everything else is perfect except this. Do you have any tips?

 

 



 

 

Hi @tereziastredna,

 

Oh dear. This is probably because the color bands are in a mixed-up order. I think the writer should work properly even if the bands are in the wrong order, but the RasterExpressionEvaluator expression assumes the order is R, G, B, A. You may need to reorder the expression in the Evaluator to match what comes into FME from the PDF reader. Alternatively, you can use a RasterBandOrderer before the RasterExpressionEvaluator to put the bands in the standard order (RGBA). Perhaps @jakemolnar can see why the order is getting muddled by the PDF reader?
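As an illustration of what reordering the bands means, here is a small Python sketch. The thread does not say what order the reader actually delivers, so the "BGRA" source order below is purely hypothetical, and `reorder_bands` is a made-up helper, not an FME function.

```python
def reorder_bands(pixel, source_order, target_order="RGBA"):
    """Rearrange a pixel's band values from source_order into target_order."""
    by_name = dict(zip(source_order, pixel))
    return tuple(by_name[name] for name in target_order)


# Hypothetical example: the bands arrive as B, G, R, A.
incoming = (10, 20, 30, 255)
print(reorder_bands(incoming, "BGRA"))
```

Either fix amounts to the same thing: make the expression and the data agree on which band is which.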

 

 



 

 

All done! I swapped the order of the three colors in the expression. There was no need to add a RasterBandOrderer. Thanks for your time and help.

 

 



 

 

Excellent - *High Five Emoji*

 

 


I have another problem 😞. I have PDF files which consist of a few pages (2-3). When I run the translation as you recommended, the result is 2-3 separate files. I need them as a single JPEG file. I have no page numbers with which to set a pdf_page_number attribute. What's worse, I have hundreds of these files. Some have only one page, but some have a few. So I need something that will also recognize which files have more than one page. Do you have any idea how to join them together?



Oh no! You may want to post a new question about this one. I would say that you need a RasterMosaicker to join the rasters into one. You will likely also need an Offsetter to offset the images before you can stitch them together. A BoundsExtractor would help you get the amount you need to offset each image by. I think a Counter would help you get the page number, and the 'fme_basename' (file name) attribute is what you can count by.

It's a fairly complex problem, but if you give it a go and then ask a separate question when you get stuck, I think you'll get some good help.
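To show the bookkeeping behind that idea, here is a rough Python sketch of counting pages per file and working out how far to shift each page so they stack without overlapping. It is not an FME workspace: `plan_page_offsets` and the dictionary keys are made up, pages are assumed to arrive in reading order, and stacking downward (negative y) is an arbitrary choice.

```python
from collections import defaultdict


def plan_page_offsets(pages):
    """pages: iterable of (basename, page_height) in reading order.

    Returns one dict per page with a per-file page number (the Counter step)
    and a vertical offset (the Offsetter step) based on the heights of the
    pages already placed for that file (the BoundsExtractor step).
    """
    page_count = defaultdict(int)
    used_height = defaultdict(int)
    plan = []
    for basename, height in pages:
        page_count[basename] += 1
        plan.append({
            "basename": basename,
            "page_number": page_count[basename],
            "y_offset": -used_height[basename],  # stack each page below the previous one
        })
        used_height[basename] += height
    return plan


# Hypothetical input: a two-page file and a one-page file, heights in pixels.
pages = [("report", 792), ("report", 792), ("letter", 792)]
plan = plan_page_offsets(pages)
for entry in plan:
    print(entry)
```

Grouping by file name means single-page files simply get one page at offset 0, so the same flow should handle both cases.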

 

 



@tereziastredna

I have a similar problem. I am converting PDFs to TIFFs and am getting both color changes and rotation changes. I was hoping this non-spatial option would solve things, but nothing comes out of the reader.

I have also tried the suggestions in the previous answer with the RasterInterpretationCoercer, but when I do that, the only change is that the image in the inspector changes to the same incorrect color (bluish) as the output. If I output to a JPEG, the color is wrong but the orientation is OK.

So I need to figure out how to maintain both the color and the orientation of the original document.

 



 

Hi @justinv, I think it might be a good idea to post your issue as a new question. This thread is quite long already, so your new comment might not get noticed as easily. It would also help to add your sample data, workspace and log file to give some more background, so the community can provide suggestions based on your specific problem. Thank you!

