Question

PDF Writer and Latin2 ( cp1250 ) codepage

  • 24 January 2018
  • 7 replies
  • 12 views

Hello,

Has anybody resolved the problem of writing the characters ???ŠŽ (part of the Latin2 / cp1250 codepage) with the Geospatial PDF Writer?

Regards Mišo


7 replies


Hi @glagolicmiso

 

Do you get these characters corrupted, or are they not displayed at all?

Could you please visualize your data right before the PDF Writer and check whether the values are displayed properly and what their encoding is? If something is wrong with the values or their encoding before they reach the Writer, that is one problem.

If the values look right, it is purely a PDF Writer problem. In that case, could you please try encoding the values in UTF-8 using AttributeEncoder?
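To make the "corrupt versus correct encoding" check concrete, here is a minimal Python sketch, outside of FME, of how the same text looks as cp1250 bytes, as UTF-8 bytes, and when cp1250 bytes are misread as cp1252. The sample string is an assumption chosen for illustration; it is not necessarily the set of characters from the original post.

```python
# Sample text is an assumption for illustration only.
sample = "ČĆĐŠŽ"

# The same characters as raw bytes in two encodings.
cp1250_bytes = sample.encode("cp1250")  # one byte per character
utf8_bytes = sample.encode("utf-8")     # two bytes per character here


def misread_as_cp1252(text: str) -> str:
    """Encode text as cp1250, then decode the bytes as if they were cp1252.

    This mimics what a reader sees when Latin2 data is interpreted
    with the Western European codepage.
    """
    return text.encode("cp1250").decode("cp1252")


print(cp1250_bytes.hex(" "))      # c8 c6 d0 8a 8e
print(utf8_bytes.hex(" "))        # c4 8c c4 86 c4 90 c5 a0 c5 bd
print(misread_as_cp1252(sample))  # ÈÆÐŠŽ
```

Note that Š and Ž sit at the same byte positions in both codepages, so they survive the misreading, while the other three characters turn into different letters.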

If the problem persists, we will probably need to open a support case and investigate. We will need a repro (your workspace and a small source data sample) then.

I am sorry you are struggling with this problem.

Apparently, characters ?, ?, and ? cause the problem...

 

 


Hi @glagolicmiso

@NathanAtSafe filed a problem report regarding these disappearing characters. Apparently, PDF natively supports only characters that belong to WinAnsiEncoding (which is essentially the Windows-1252 codepage). This explains why Š and Ž are supported while ? and ? are not. However, it doesn't explain why ? is causing a problem. We have PR #81775 open and will investigate further.
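As a rough way to see which characters fall outside that codepage, the following Python sketch lists the characters of a string that cannot be encoded in a given codec. It assumes Python's `cp1252` codec is a close enough stand-in for WinAnsiEncoding; the helper name `unsupported_chars` and the sample string are hypothetical, and this is not how the PDF Writer itself performs the check.

```python
from typing import List


def unsupported_chars(text: str, codec: str = "cp1252") -> List[str]:
    """Return the distinct characters of text that codec cannot encode.

    Order of first appearance is preserved.
    """
    missing: List[str] = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Sample Latin2 text, chosen for illustration only.
sample = "ČĆĐŠŽ"

print(unsupported_chars(sample, "cp1252"))  # ['Č', 'Ć', 'Đ']
print(unsupported_chars(sample, "cp1250"))  # []
```

Running a check like this on the attribute values before they reach the writer shows which ones would need a workaround.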

Meanwhile, could you please try converting your texts into polygons before writing them into PDF using TextStroker? This seems to be the easiest workaround - if it does solve the problem for you.

Hi @LenaAtSafe

It's been a while and I do not see any progress in solving this problem with Latin2 characters in the PDF Writer. After some more investigation we found that the problem is in the PDF Writer: it simply does not have the problematic characters in its translation tables.

Is it really such a big problem to include more than one code page in the PDF Writer?

Regards, Mišo

Hi @LenaAtSafe, @daleatsafe

 

One more note about the problem with printing Latin2 characters with the PDF Writer:

 

I'm almost positive that the problem is in the PDF Writer's re-coding tables, because when I use one of the characters "??????", I get the following message:

 

2018-05-04 10:26:23| 1.3| 0.0|FATAL |PDF2D writer: An error occurred. FME will attempt to provide more information on the error, but this may cause the translation to be terminated

 

2018-05-04 10:26:23| 1.3| 0.0|FATAL |PDF2D writer: invalid map<K, T> key

 

2018-05-04 10:26:23| 1.3| 0.0|ERROR |A fatal error has occurred. Check the logfile above for details

The message "invalid map<K, T> key" comes from pdf.dll.

regards,

 

Mišo


Hi, this seems related.

We use Windows-hosted machines, and many special characters are not printed.

I added a workspace with example text converted to PDF.

From the log file:

WARN |PDFWriter: Code point '257' not available in the current font encoding 'MacRomanEncoding'. Ignoring character

 

From Adobe's PDF Reference, version 1.4, Appendix D:

Host encoding is a platform-dependent encoding for the host machine. For non-UNIX Roman systems, it is WinAnsiEncoding on Windows and MacRomanEncoding on Mac OS. 
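To identify the character behind a warning like the one above, a small Python sketch can turn the code point into its character and Unicode name and test it against candidate encodings. It assumes Python's `mac_roman` and `cp1252` codecs approximate PDF's MacRomanEncoding and WinAnsiEncoding; the helper name `describe` is hypothetical.

```python
import unicodedata
from typing import Dict, Sequence, Tuple


def describe(
    code_point: int,
    codecs: Sequence[str] = ("mac_roman", "cp1252"),
) -> Tuple[str, str, Dict[str, bool]]:
    """Return the character, its Unicode name, and per-codec encodability."""
    ch = chr(code_point)
    support: Dict[str, bool] = {}
    for codec in codecs:
        try:
            ch.encode(codec)
            support[codec] = True
        except UnicodeEncodeError:
            support[codec] = False
    return ch, unicodedata.name(ch), support


char, name, support = describe(257)
print(char, name)  # ā LATIN SMALL LETTER A WITH MACRON
print(support)     # {'mac_roman': False, 'cp1252': False}
```

Under these assumptions, code point 257 is missing from both encodings, so switching between the two host encodings alone would not make this character printable.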

 

 
