This is probably more of a Windows 10 problem, but maybe someone can help. If I use the StringLength function, I get wrong counts for words with German special characters (e.g. "Ä" or "ß") because each of them gets a length of 2 instead of 1. When I ran the same workspace on a colleague's machine, the length was calculated correctly, so I guess I have to change some region or language settings, but so far I could not find the culprit. Any idea what the issue is here? Thank you for any help.

Edit: For some reason the StringLengthCalculator transformer calculates the right length.

Problem with string length

david_r
8355 replies
5 years ago
June 3, 2020

Which version of FME is this? I get the same length for both @StringLength() and the StringLengthCalculator in FME 2020.

It could be linked to the usage of Unicode, where the number of bytes doesn't necessarily correspond to the number of characters. Notably, the letter "Ä" is represented in UTF-8 by the two bytes c3 + 84 (hex). If an algorithm doesn't account for this, or if it doesn't know that it's a multibyte string, it might return the byte length (=2) rather than the character length (=1) for the string "Ä".
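To illustrate the difference outside FME, here is a minimal Python sketch. It is not how FME computes lengths internally, and the helper name `lengths` is made up for this example. It simply compares the character count with the UTF-8 byte count:

```python
def lengths(text: str):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


for word in ("Ä", "ß", "Straße"):
    chars, utf8_bytes = lengths(word)
    print(f"{word!r}: {chars} character(s), {utf8_bytes} byte(s) in UTF-8")
```

A function that reports 2 for "Ä" is most likely counting the second number rather than the first.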

If you send the string to e.g. the Logger in FME, you should be able to see the encoding associated with the attribute.

chrisatsafe
Contributor
606 replies
5 years ago
June 3, 2020

Hi @kasparlov,

Sorry to hear you are running into this issue.

This has been reported in the past and is currently being tracked in our system as FMEENGINE-48508 (also posted on this idea). For the time being, please continue using the StringLengthCalculator transformer.

If you'd like to be added as a contact for the tracked issue, please submit a case and reference FMEENGINE-48508 or let me know and I can create a case on your behalf. Additionally, please be sure to upvote and comment on the linked idea!

chrisatsafe
Contributor
606 replies
5 years ago
June 3, 2020

@david_r, this is spot on with the developer comments on the tracked issue. The long-winded explanation can be found here: http://unicode.org/faq/char_combmark.html#7

It seems the expectation would be to count each grapheme (what is rendered on the screen) as a single logical character, rather than counting bytes or code units.
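As a rough illustration of how these units differ, the following Python sketch counts the same text in several ways. The function name `count_units` is hypothetical, and the grapheme count is only an approximation that skips combining marks; it is not the full Unicode text segmentation algorithm and says nothing about what FME does internally.

```python
import unicodedata


def count_units(text: str) -> dict:
    """Count a string in bytes, code units, code points and (roughly) graphemes."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
        "code_points_nfc": len(unicodedata.normalize("NFC", text)),
        # Approximation: treat every non-combining code point as one grapheme.
        "approx_graphemes": sum(
            1 for ch in text if unicodedata.combining(ch) == 0
        ),
    }


precomposed = "\u00c4"  # "Ä" as a single code point
decomposed = "A\u0308"  # "A" followed by COMBINING DIAERESIS

print(count_units(precomposed))
print(count_units(decomposed))
```

Both strings render as "Ä", yet only the grapheme-style count gives 1 for both of them.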

david_r
8355 replies
5 years ago
June 3, 2020

chrisatsafe wrote:

This is spot on with the developer comments on the tracked issue. The long-winded explanation can be found here: http://unicode.org/faq/char_combmark.html#7

It seems the expectation would be to count each grapheme (what is rendered on the screen) as a single logical character, rather than counting bytes or code units.

Yeah, it's a fairly common challenge that is far from unique to FME.

pflegpet
Author
Contributor
62 replies
5 years ago
June 3, 2020

Hi @chrisatsafe,

I guess StringLengthCalculator would be a workaround but I still don't understand why the same workspace runs fine on other machines on the same FME version.

pflegpet
Author
Contributor
62 replies
5 years ago
June 3, 2020

@david_r I'm on FME 2019.2.1 but get the same result on 2020 RC.

david_r
8355 replies
5 years ago
June 3, 2020

pflegpet wrote:

Hi @chrisatsafe,

I guess StringLengthCalculator would be a workaround but I still don't understand why the same workspace runs fine on other machines on the same FME version.

The answer may be found in the source dataset. Which format is it? Are there any encoding options on the reader? If the source is the HTTPCaller, make sure to specify the result encoding rather than letting FME guess. Encoding guesses may take the OS settings into account, depending on the circumstances, and that could potentially explain the issue.
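To show why a wrong encoding guess changes the length, here is a small Python sketch. It assumes cp1252 as an example of a Western European Windows code page, and the helper `decoded_length` is made up for illustration. It does not reproduce FME's own guessing logic.

```python
def decoded_length(raw: bytes, encoding: str) -> int:
    """Decode raw bytes with the given encoding and return the character count."""
    return len(raw.decode(encoding))


raw = "Ä".encode("utf-8")  # the two bytes c3 84

# Decoded with the encoding it was written in: one character.
print(decoded_length(raw, "utf-8"))

# Decoded with a single-byte code page (an assumed wrong guess): every byte
# becomes its own character, so the same data now counts as two characters.
print(decoded_length(raw, "cp1252"))
```

Stating the encoding explicitly removes the dependency on whatever default the machine happens to have.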

pflegpet
Author
Contributor
62 replies
5 years ago
June 3, 2020

david_r wrote:

The answer may be found in the source dataset. Which format is it? Are there any encoding options on the reader? If the source is the HTTPCaller, make sure to specify the result encoding rather than letting FME guess. Encoding guesses may take the OS settings into account, depending on the circumstances, and that could potentially explain the issue.

Encoding is set to UTF-8 in the reader

david_r
8355 replies
5 years ago
June 3, 2020

pflegpet wrote:

Encoding is set to UTF-8 in the reader

Then try the tip about sending the features to the Logger directly before the StringLength calculation, to check that it still says UTF-8 on those attributes. If not, then some transformer did something with the attribute encoding along the way.

pflegpet
Author
Contributor
62 replies
5 years ago
June 3, 2020

david_r wrote:

Then try the tip about sending the features to the Logger directly before the StringLength calculation, to check that it still says UTF-8 on those attributes. If not, then some transformer did something with the attribute encoding along the way.

I just checked: it's UTF-8 both before and after the string length calculation.

pflegpet
Author
Contributor
62 replies
5 years ago
June 3, 2020

I found the problem. In the Region Settings on Windows 10 there is a checkbox "Beta: Use Unicode UTF-8 for worldwide language support" which was checked. When I unchecked the option the string length was calculated correctly.
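For anyone who wants to check whether a machine has this setting active, the sketch below may help. It assumes the checkbox works by switching the Windows ANSI code page to 65001 (UTF-8), and the function names are made up for this example. Why that affects the string length calculation in FME is not shown here.

```python
import ctypes
import locale
import sys

UTF8_CODE_PAGE = 65001  # Windows code page identifier for UTF-8


def active_code_page():
    """Return the Windows ANSI code page, or None on other platforms."""
    if sys.platform != "win32":
        return None
    return ctypes.windll.kernel32.GetACP()


def describe_environment() -> str:
    """Summarise the code page and the encoding Python would use by default."""
    acp = active_code_page()
    preferred = locale.getpreferredencoding(False)
    if acp is None:
        return f"Not Windows; preferred encoding is {preferred}"
    state = "enabled" if acp == UTF8_CODE_PAGE else "not enabled"
    return (
        f"ANSI code page {acp} (system-wide UTF-8 appears {state}); "
        f"preferred encoding is {preferred}"
    )


print(describe_environment())
```

Running this on both machines should show whether they differ in this setting.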

david_r
8355 replies
5 years ago
June 4, 2020

pflegpet wrote:

I found the problem. In the Region Settings on Windows 10 there is a checkbox "Beta: Use Unicode UTF-8 for worldwide language support" which was checked. When I unchecked the option the string length was calculated correctly.

Good find, that's really interesting. @chrisatsafe this may be of relevance for the developers...

chrisatsafe
Contributor
606 replies
5 years ago
June 4, 2020

david_r wrote:

Good find, that's really interesting. @chrisatsafe this may be of relevance for the developers...

Noted, I'll add that as a comment on the tracked issue.
