
The shapefile writer (2015 vintage) used to have a character encoding option called ANSI.

What did this actually mean? ISO-8859-1?

UTF-8 replaced it.

 

UTF-8 is a multibyte encoding that can represent any Unicode character.

ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters.

 

Both encode ASCII exactly the same way.
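To make the difference concrete, here is a minimal Python sketch that uses only the standard codecs. The sample strings and the helper name `fits_latin1` are made up for illustration and are not part of any shapefile tooling.

```python
# Sample strings chosen for illustration only.
ascii_text = "Road"
accented = "café"

# Pure ASCII text produces identical bytes in both encodings.
ascii_same = ascii_text.encode("iso-8859-1") == ascii_text.encode("utf-8")

# A character above U+007F takes one byte in ISO 8859-1
# and two bytes in UTF-8.
latin1_bytes = accented.encode("iso-8859-1")
utf8_bytes = accented.encode("utf-8")


def fits_latin1(text):
    """Return True if every character is within the first 256 code points."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False
```

So a file of plain ASCII attributes looks the same either way; the difference only shows up once accented or non-Latin characters appear.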



The question is which option I have to use in the latest Shapefile writer to ensure that the data comes out exactly the same as data written by the 2015 Shapefile writer with character encoding set to ANSI. It isn't UTF-8.

'ANSI' isn't used consistently, so I wanted to know exactly what it meant in the shapefile writer.


Is anyone from Safe able to comment? If you were upgrading a 2015 shapefile writer that previously had the Character Encoding set to ANSI, which encoding would you now use if aiming for a like-for-like replacement?

@markatsafe @mark2atsafe



Geez! Sorry I missed your reply on the 27th! I didn't get an email.

 

Shapefiles have an optional .cpg (code page) file associated with them. It's a simple text file that tells software what encoding to use.

 

Create a {shape file name}.cpg with the following for ISO-8859-1:

ISO 88591
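If you'd rather script it than create the file by hand, something like the following Python sketch would work. The function name `write_cpg` and the example path are hypothetical; the only value taken from this thread is the `ISO 88591` string.

```python
from pathlib import Path


def write_cpg(shapefile_path, code_page="ISO 88591"):
    """Write a .cpg sidecar next to the given .shp file.

    The sidecar shares the shapefile's base name and holds the code page
    identifier as plain text.
    """
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    cpg_path.write_text(code_page, encoding="ascii")
    return cpg_path
```

For example, `write_cpg("data/roads.shp")` would create `data/roads.cpg` containing `ISO 88591`. Note this only labels the encoding; it does not re-encode the attribute data already in the .dbf.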


I know how to set the encoding; that's not the issue. The issue is that the shapefile writer previously had an encoding option that no longer exists.


I looked up some info in our developer database and found this useful set of Q&As:

"Is there a way to set the output encoding the same as the input?"
  • The writer detects the encoding from source if you set the Writer encoding to <not set>.
"What does ANSI here mean? Does it mean system default encoding?"
  • Yes, ANSI is a term used to indicate system default encoding, so it will vary depending on the locale of the system you are running Workbench on.
"If no encoding is set on the Writer, what will happen to the attributes? Will they be written as is? Will they be system default encoding encoded?"
  • As stated above, the default encoding will be system default. If you change the encoding to <not set>, the attributes will be encoded as is from the source.

So it seems that ANSI means "system encoding". You would set it to the system encoding you want (like cp-1252) or leave it unset if you want the same encoding as the source.
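To illustrate why "system default" matters, here is a small Python sketch showing the same byte decoding to different characters under two Windows code pages, plus one way to ask Python for the locale's preferred encoding on the current machine. Whether that value matches what the 2015 writer treated as "ANSI" on a given machine is an assumption you would need to verify; `system_default_encoding` is just an illustrative helper name.

```python
import locale


def system_default_encoding():
    """Return the locale's preferred encoding as reported by Python.

    On a Western European Windows machine this is typically cp1252,
    but it depends on the system locale and Python's configuration.
    """
    return locale.getpreferredencoding(False)


# One byte, two interpretations, depending on the code page in effect.
raw = b"\xe9"
as_western = raw.decode("cp1252")   # Latin small letter e with acute
as_cyrillic = raw.decode("cp1251")  # Cyrillic small letter short i
```

Running `system_default_encoding()` on the machine that produced the original 2015 output would be one way to find out which explicit encoding to pick for a like-for-like replacement.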



Thanks Mark, knowing that ANSI here meant system encoding is really useful, and makes sense from my investigations.

