Solved

Character encoding - ANSI = iso-8859-1?

  • May 27, 2020
  • 7 replies
  • 1426 views

ebygomm
Influencer

The Shapefile writer (2015 vintage) used to have a character encoding option called ANSI.

What did this actually mean? ISO-8859-1?



  • 194 replies
  • May 27, 2020

UTF-8 replaced it.

UTF-8 is a multibyte encoding that can represent any Unicode character.

ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters.

Both encode ASCII exactly the same way.
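
To make the comparison concrete, here is a minimal Python sketch (not FME code) that encodes a few strings both ways. The sample strings and the helper name `encode_or_none` are made up for illustration.

```python
def encode_or_none(text, encoding):
    """Return the encoded bytes, or None if the text cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Illustrative samples: pure ASCII, a Latin-1 character, a non-Latin-1 character.
samples = ["Main Street", "Café", "€5"]

for text in samples:
    utf8 = encode_or_none(text, "utf-8")
    latin1 = encode_or_none(text, "iso-8859-1")
    print(f"{text!r}: utf-8={utf8!r} iso-8859-1={latin1!r}")
```

ASCII text comes out byte-for-byte identical, "é" takes two bytes in UTF-8 but one in ISO 8859-1, and the euro sign cannot be written in ISO 8859-1 at all.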


ebygomm
  • Author
  • Influencer
  • 3427 replies
  • May 27, 2020

The question is which option I have to use in the latest Shapefile writer to ensure that the data appears exactly the same as the data written by the 2015 Shapefile writer with character encoding set to ANSI. It isn't UTF-8.

'ANSI' isn't used consistently, so I wanted to know exactly what it meant in the Shapefile writer.


ebygomm
  • Author
  • Influencer
  • 3427 replies
  • June 5, 2020

Is anyone from Safe able to comment? If you were upgrading a 2015 Shapefile writer that previously had the Character Encoding set to ANSI, which encoding would you now use if aiming for a like-for-like replacement?

@markatsafe @mark2atsafe


  • 194 replies
  • June 5, 2020

Geez! Sorry I missed your reply on the 27th! I didn't get an email.

Shapefiles have an optional .cpg (Code Page File) associated with them. It's a simple text file that tells software what encoding to use.

Create a {shape file name}.cpg with the following for ISO-8859-1:

ISO 88591
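
As a rough sketch of that step, the sidecar file could be written with a few lines of Python. The function name `write_cpg` and the file name `roads.shp` are hypothetical, and the label string is simply the one quoted above; which labels a given application accepts is an assumption to check against that application's documentation.

```python
import tempfile
from pathlib import Path


def write_cpg(shapefile_path, encoding_label):
    """Write a .cpg sidecar next to the shapefile and return its path."""
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    # The file holds nothing but the encoding label as plain text.
    cpg_path.write_text(encoding_label, encoding="ascii")
    return cpg_path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so no real data is touched.
    with tempfile.TemporaryDirectory() as tmp:
        shp = Path(tmp) / "roads.shp"  # hypothetical shapefile name
        cpg = write_cpg(shp, "ISO 88591")  # label as given in the post above
        print(cpg.name, "->", cpg.read_text(encoding="ascii"))
```

Note that the .cpg file only declares the encoding. It does not convert the attribute bytes already stored in the .dbf.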

ebygomm
  • Author
  • Influencer
  • 3427 replies
  • June 5, 2020

I know how to set the encoding; that's not the issue. The issue is that the Shapefile writer previously had an encoding option that no longer exists.


mark2atsafe
  • Safer
  • 2554 replies
  • Best Answer
  • June 5, 2020

I looked up some info in our developer database and found this useful set of Q&As:

"Is there a way to set the output encoding the same as the input?"
  • The writer detects the encoding from source if you set the Writer encoding to <not set>.
"What does ANSI here mean? Does it mean system default encoding?"
  • Yes, ANSI is a term used to indicate system default encoding, so it will vary depending on the locale of the system you are running Workbench on.
"If no encoding is set on the Writer, what will happen to the attributes? Will they be written as is? Will they be system default encoding encoded?"
  • As stated above, the default encoding will be system default. If you change the encoding to <not set>, the attributes will be encoded as is from the source.

So it seems that ANSI means "system encoding". You would set it to the system encoding you want (like cp-1252), or leave it unset if you want the same encoding as the source.
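
Since "system encoding" depends on the machine, a small Python sketch can help check what a given system reports and how cp1252 differs from ISO-8859-1. This runs outside FME; treating Python's `locale.getpreferredencoding` as equivalent to what the writer calls the system default is an assumption, and both function names are made up for illustration.

```python
import locale


def system_default_encoding():
    """Return the encoding name the current locale reports as its default."""
    return locale.getpreferredencoding(False)


def differing_bytes(enc_a, enc_b):
    """Return the byte values that two single-byte encodings decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" maps undefined bytes to U+FFFD instead of raising.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            diffs.append(value)
    return diffs


print("System default:", system_default_encoding())
diffs = differing_bytes("cp1252", "iso-8859-1")
print("cp1252 and iso-8859-1 differ for bytes",
      hex(diffs[0]), "to", hex(diffs[-1]))
```

The two encodings agree everywhere except the range 0x80 to 0x9F, where cp1252 places printable characters such as the euro sign and curly quotes. So on a Western European Windows machine, data written as "ANSI" would match ISO-8859-1 only if it contains none of those characters.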


ebygomm
  • Author
  • Influencer
  • 3427 replies
  • June 5, 2020

Thanks Mark, knowing that ANSI here meant system encoding is really useful, and makes sense from my investigations.