Solved

ESRI Shapefiles with UTF-8-encoded field names?

  • 17 October 2017
  • 9 replies
  • 37 views

Userlevel 1
Badge +22

Hi,

Apparently (this is new to me too) the ESRI Shapefile definition has been extended with a "code page file" (*.cpg), which contains the code page of the table structure, i.e. the field names.
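A .cpg file is typically a one-line plain-text file naming the code page, e.g. UTF-8 or 1252. The sketch below shows how a reader could map that text to a Python codec name. The function name codec_from_cpg, the accepted spellings ("UTF-8", "1252", "ANSI 1252") and the cp1252 fallback are all assumptions made for illustration. They do not describe how FME or ArcGIS interpret the file.

```python
import codecs


def codec_from_cpg(cpg_text: str, fallback: str = "cp1252") -> str:
    """Map the text of a .cpg file to a Python codec name.

    Assumes the file holds a single identifier such as "UTF-8", "1252"
    or "ANSI 1252". Unrecognised values fall back to `fallback`; cp1252
    is only a hypothetical default chosen for this sketch.
    """
    token = cpg_text.strip().upper()
    if token.startswith("ANSI "):
        token = token[len("ANSI "):]
    if token.isdigit():
        # Bare Windows code page numbers, e.g. "1252" -> "CP1252"
        token = "CP" + token
    try:
        # codecs.lookup() is case-insensitive and reports the canonical codec name
        return codecs.lookup(token).name
    except LookupError:
        return fallback


print(codec_from_cpg("UTF-8\n"))    # utf-8
print(codec_from_cpg("ANSI 1252"))  # cp1252
```

Falling back instead of raising is a design choice for the sketch. A stricter reader might prefer to report the unknown code page to the user.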

I've just received such a dataset, but FME doesn't seem to recognize this extension. My field names end up as undecoded UTF-8 strings, e.g. "ByggeÃ¥r" instead of "Byggeår".
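The garbled name is what you get when the UTF-8 bytes of "Byggeår" are decoded with a Western European single-byte code page. The letter å is stored as the two bytes C3 A5, which cp1252 shows as "Ã" and "¥". A minimal sketch, assuming the reading machine's default is cp1252; the helper names garble and repair are hypothetical:

```python
def garble(name: str, wrong_codec: str = "cp1252") -> str:
    """Simulate a reader decoding UTF-8 bytes with the wrong code page."""
    return name.encode("utf-8").decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Undo garble(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


print(garble("Byggeår"))    # ByggeÃ¥r
print(repair("ByggeÃ¥r"))   # Byggeår
```

This repair is a workaround, not a fix. cp1252 leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so some UTF-8 sequences (for example the one for "Á", C3 81) cannot be decoded with it at all.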

Is it possible to have FME (2017.1) recognize this extension, or if not, are there any plans to do so ?

Cheers

Best answer by takashi 17 October 2017, 11:38

Userlevel 1
Badge +22
Ah, it seems there is some uncertainty about whether the "code page file" applies only to the data content and not to the field names. I hope Safe are better at sorting this out.

Userlevel 2
Badge +17

Hi @lifalin2016, your observation is correct. The cause is that the current Shapefile Reader/Writer always handles attribute names in the system default encoding, regardless of the character encoding setting for the dataset. It must be a known issue.
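To make the effect concrete: in the .dbf part of a Shapefile, each field name is stored as raw bytes in an 11-byte slot, and the encoding a reader applies to those bytes decides what the user sees. The sketch below parses a dBase III-style header and decodes the names with an explicit encoding. It is not FME's implementation; dbf_field_names and build_demo_header are hypothetical helpers, and only the plain dBase III header layout is assumed.

```python
from __future__ import annotations

import struct


def dbf_field_names(dbf: bytes, encoding: str) -> list[str]:
    """Decode the field names in a dBase III-style .dbf header.

    Layout assumed: a 32-byte file header (total header length as a
    little-endian uint16 at offset 8), then one 32-byte descriptor per
    field, terminated by 0x0D. The first 11 bytes of each descriptor are
    the field name, padded with NUL bytes.
    """
    (header_len,) = struct.unpack_from("<H", dbf, 8)
    names = []
    offset = 32
    while offset + 32 <= header_len and dbf[offset] != 0x0D:
        raw = dbf[offset:offset + 11].split(b"\x00", 1)[0]
        names.append(raw.decode(encoding))
        offset += 32
    return names


def build_demo_header(raw_names: list[bytes]) -> bytes:
    """Hypothetical test helper: a minimal header with character fields only."""
    header_len = 32 + 32 * len(raw_names) + 1
    record_len = 1 + 10 * len(raw_names)  # deletion flag + 10 bytes per field
    buf = bytearray(32)
    buf[0] = 0x03  # dBase III without memo
    struct.pack_into("<IHH", buf, 4, 0, header_len, record_len)
    for raw in raw_names:
        if len(raw) > 10:
            raise ValueError("name longer than 10 bytes")
        desc = bytearray(32)
        desc[:len(raw)] = raw
        desc[11] = ord("C")  # field type: character
        desc[16] = 10        # field length in bytes
        buf += desc
    buf += b"\x0d"  # end of field descriptors
    return bytes(buf)


dbf = build_demo_header(["Byggeår".encode("utf-8")])
print(dbf_field_names(dbf, "utf-8"))   # ['Byggeår']
print(dbf_field_names(dbf, "cp1252"))  # ['ByggeÃ¥r']
```

Because the name slot is measured in bytes, a UTF-8 name containing non-ASCII letters fits fewer characters than a pure ASCII one.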

I have reported this issue twice in the past 2-3 years, as C95287 (PR#54930) and C101289 (PR#58644), but unfortunately there has been no progress yet.

Hope this "cold case" will be resolved as soon as possible.

Badge

Hi @lifalin2016

Could you please share a sample of your data? I wonder whether the attribute names are in the same encoding as the .cpg file suggests. My main question is whether there are Shapefiles with attribute names and attribute values stored in different encodings.

Userlevel 1
Badge +22

Hi Lena.

In this case, they're definitely both UTF-8, which is also what the CPG file contains.

As this is a dataset we've received from a state-owned energy distributor, I'm trying to find out whether we're allowed to pass it on. I can't really send you a subset or an anonymized derivative, as that would defeat the purpose.

I have looked at its metadata, however, and it seems to have been created by ArcGIS 10.2.

Cheers

Userlevel 4
Badge +13

Thanks for re-raising this. @takashi has also been in touch, and we spent time today discussing ways forward. It is true that until now we have interpreted the encoding as applying only to the data values, which we now have evidence to believe is incorrect.

Userlevel 1
Badge +22

Hi Lena. Did you get the dataset?

Userlevel 4
Badge +13

Yes, we got it. Thanks!

Badge +21

This issue still doesn't seem to be solved? A fix would be great :)

Badge

Hi @sigtill

At the moment, FME supports attribute names and feature type names in the system default encoding only (which is why @lifalin2016's workflow would work on Linux, where the system default is typically UTF-8). However, we are working on the UTF8Names project, which will add support for attribute and feature type names in any encoding. This is one of the projects that involve revamping infrastructure, so it is not a quick fix. Some changes are already in FME 2019, but they are not yet visible to users. So far, the plan is to make FME 2020 UTF8Names-enabled.
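To illustrate why "system default encoding only" makes the result platform-dependent, here is a small sketch that decodes a name with a declared encoding when one is given and otherwise falls back to the locale's preferred encoding. The function decode_name is hypothetical and only mimics the described behaviour; it says nothing about FME internals.

```python
from __future__ import annotations

import locale


def decode_name(raw: bytes, declared: str | None = None) -> str:
    """Decode a field name with the declared encoding, if there is one.

    With no declared encoding, fall back to the locale's preferred
    encoding, i.e. the "system default only" behaviour. errors="replace"
    keeps the sketch from raising on bytes the default cannot decode.
    """
    encoding = declared or locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")


raw = "Byggeår".encode("utf-8")
print(locale.getpreferredencoding(False))  # varies by platform and locale
print(decode_name(raw))           # correct only if the default is UTF-8
print(decode_name(raw, "utf-8"))  # Byggeår
```

On a machine whose default is UTF-8, the first decode_name call already returns Byggeår; on a Windows machine using code page 1252 it returns ByggeÃ¥r, matching the original report.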
