
MbStringByteSplitter

  • February 24, 2026

fmelizard
Safer
FME Hub user takashi just uploaded a new transformer to the FME Hub.

Splits a character string into tokens whose lengths are specified as numbers of bytes in a given encoding; the resulting tokens are stored in a list named "col{}".

This transformer may be useful if you need to split a string that includes multi-byte characters according to byte counts in a specific encoding.

Note: This transformer has been tested on Japanese Windows only. If you plan to use it in another locale or encoding, such as ksc5601 (Korean), gb2312 (Simplified Chinese), or big5 (Traditional Chinese), please test it thoroughly (and modify it if necessary) before embedding it in your workspaces.

Example

Source string: "abあいうえお1234"

# The string contains Japanese characters. I hope they are displayed correctly on your system!

This transformer (Character Encoding: cp932, Byte Numbers: 2,4,4,4,2) splits the source into:

col{0} = 'ab'

col{1} = 'あい'

col{2} = 'うえ'

col{3} = 'お12'

col{4} = '34'

Here, "cp932" is the default encoding of Japanese Windows; it is almost equivalent to Shift JIS (a Japanese standard character encoding).

In this encoding, each Japanese character (あ, い, う, え, お) occupies two bytes, while each ASCII character (a, b, 1, 2, 3, 4) occupies one byte, just as in standard ASCII.
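
The byte-based split in the example can be reproduced outside FME with a few lines of Python. This is only a rough sketch of the idea, not the transformer's actual implementation: the function name split_by_bytes is hypothetical, and decoding with errors="replace" is an assumption made here so that a boundary falling in the middle of a multi-byte character does not raise an error. How the transformer itself handles that case is not stated in this post.

```python
def split_by_bytes(text, encoding, byte_counts):
    """Split text into tokens whose lengths are given in bytes of `encoding`.

    Hypothetical helper for illustration only.
    """
    data = text.encode(encoding)  # work on the encoded byte sequence
    tokens = []
    pos = 0
    for n in byte_counts:
        chunk = data[pos:pos + n]
        # Assumption: a chunk that cuts a multi-byte character in half is
        # decoded with replacement characters instead of raising an error.
        tokens.append(chunk.decode(encoding, errors="replace"))
        pos += n
    return tokens


source = "abあいうえお1234"

# 6 ASCII characters (1 byte each) + 5 Japanese characters (2 bytes each)
print(len(source.encode("cp932")))  # 16

col = split_by_bytes(source, "cp932", [2, 4, 4, 4, 2])
print(col)  # ['ab', 'あい', 'うえ', 'お12', '34']
```

The byte counts 2, 4, 4, 4, 2 add up to 16, which is exactly the cp932 length of the source string, so every token boundary falls between characters.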

Comparison: The AttributeSplitter (Delimiter or Format String: 2s4s4s4s2s), which counts characters rather than bytes, splits the same source into:

_list{0} = 'ab'

_list{1} = 'あいうえ'

_list{2} = 'お123'

_list{3} = '4'

_list{4} = ''
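
For contrast, the comparison result is what you get when the same widths are applied to characters instead of bytes. The sketch below mimics that outcome with plain Python string slicing; split_by_chars is a hypothetical name, and this is not how the AttributeSplitter is implemented internally.

```python
def split_by_chars(text, widths):
    """Split text into tokens whose lengths are given in characters.

    Hypothetical helper for illustration only.
    """
    tokens = []
    pos = 0
    for w in widths:
        # Slicing past the end of the string yields a shorter or empty token.
        tokens.append(text[pos:pos + w])
        pos += w
    return tokens


source = "abあいうえお1234"
print(len(source))  # 11 characters
print(split_by_chars(source, [2, 4, 4, 4, 2]))
# ['ab', 'あいうえ', 'お123', '4', '']
```

The source has only 11 characters, so the widths 2, 4, 4, 4, 2 (16 in total) run out of input: the fourth token is cut short and the fifth is empty.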


