Solved

encoding and language driver ID

  • December 4, 2020

alc33
Contributor

Hi!

I'm trying to define the encoding for my Shapefile directly in the .dbf rather than in a .cpg file. I chose UTF-8 in the writer parameters, but I always get a .cpg file.

I looked at the Shapefile writer help and saw this sentence:

"The output Shapefile .dbf will contain the language driver ID for the selected or detected encoding, if no language driver ID is available for the encoding, a .cpg file may be generated instead."

 

I don't understand what a "language driver ID" is. Can you explain?

 

Thanks!

2 replies

debbiatsafe
Safer
  • Best Answer
  • December 16, 2020

Hi @alc33,

I asked our development team for clarification.

Language drivers determine how characters in a table are sorted and displayed; in practice, the language driver ID identifies the code page used for the text in the table. In a DBF file, the language driver ID is stored as a single byte in the file header, at byte offset 29.
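
To check what a given .dbf actually contains, the byte at offset 29 can be read directly. The following is only a minimal Python sketch of that idea; the function name `read_ldid` is made up for illustration and is not part of FME or any library.

```python
def read_ldid(dbf_path):
    """Return the language driver ID byte from a .dbf header.

    Hypothetical helper for illustration; it only inspects the
    fixed 32-byte part of the header.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a .dbf header")
    # Offset 29 (counting from 0) holds the language driver ID.
    return header[29]
```

For example, `hex(read_ldid("roads.dbf"))` (with `roads.dbf` standing in for your own file) shows the ID as a hexadecimal value; a result of `0x0` means no language driver ID is set.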

 

When FME writes a .dbf file together with a .cpg file, the language driver ID in the .dbf header is set to 0. If the encoding does have a language driver ID, FME writes that ID to the header and no .cpg file is created.

FME creates a .cpg file for UTF-8 encoded files because there is no language driver ID for UTF-8. The dBASE version that Shapefile uses predates the introduction of UTF-8, so support for it was probably never added later on. Modern applications know to check for the .cpg file and take the encoding from there.
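
A reader application could follow roughly the same logic: use the language driver ID when it is set, and otherwise fall back to the .cpg file. The Python sketch below is only an illustration under stated assumptions. The function name `guess_dbf_encoding`, the `latin-1` default, and the small ID table are all assumptions made for this example; the table is a partial list of commonly cited values, not an official or complete mapping, and this is not how FME itself is implemented.

```python
import codecs
from pathlib import Path

# Partial, illustrative mapping of language driver IDs to Python codecs.
# Assumption: commonly cited values only; verify against your own tools.
LDID_TO_CODEC = {
    0x01: "cp437",
    0x02: "cp850",
    0x03: "cp1252",
    0x57: "cp1252",
    0xC8: "cp1250",
    0xC9: "cp1251",
}


def guess_dbf_encoding(dbf_path, default="latin-1"):
    """Guess the text encoding of a .dbf (hypothetical helper)."""
    dbf_path = Path(dbf_path)
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    # Byte 29 is the language driver ID; 0 means "not set".
    ldid = header[29] if len(header) >= 30 else 0
    if ldid in LDID_TO_CODEC:
        return LDID_TO_CODEC[ldid]

    # No usable ID: look for a sidecar .cpg file next to the .dbf.
    cpg_path = dbf_path.with_suffix(".cpg")
    if cpg_path.exists():
        name = cpg_path.read_text(encoding="ascii", errors="ignore").strip()
        if name.isdigit():
            name = "cp" + name  # e.g. "1252" becomes "cp1252"
        if name:
            try:
                return codecs.lookup(name).name
            except LookupError:
                pass  # unknown name, fall through to the default
    return default
```

With a UTF-8 Shapefile written as described above (ID 0 plus a .cpg containing `UTF-8`), this sketch takes the .cpg branch. Note that it only looks for a lowercase `.cpg` extension, which matters on case-sensitive file systems.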

 

I hope this information helps!


alc33
Contributor
  • Author
  • December 16, 2020

Hi!

Thank you very much. It's perfect!