Splits a character string into tokens whose lengths are given as byte counts in a specific encoding; the resulting tokens are stored in a list named "col{}".
This transformer is useful when you need to split a string containing multi-byte characters by the number of bytes in a specific encoding rather than by the number of characters.
Note: This transformer has been tested on Japanese Windows only. If you use it with the encoding of another locale, such as ksc5601 (Korean), gb2312 (Simplified Chinese), or big5 (Traditional Chinese), test it thoroughly (and modify it if necessary) before embedding it in your workspaces.
Example
Source string: "abあいうえお1234"
# The string contains Japanese characters, which may not display correctly on every system.
This transformer (Character Encoding: cp932, Byte Numbers: 2,4,4,4,2) splits the source into:
col{0} = 'ab'
col{1} = 'あい'
col{2} = 'うえ'
col{3} = 'お12'
col{4} = '34'
Here, "cp932" is the default encoding on Japanese Windows; it is nearly equivalent to Shift JIS (a Japanese standard character encoding).
In this encoding, each Japanese character (あ, い, う, え, お) occupies two bytes, while each ASCII character (a, b, 1, 2, 3, 4) occupies one byte, just as in ASCII itself.
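The byte-based split in this example can be approximated in Python as follows. This is only a minimal sketch, not the transformer's actual implementation: the function name split_by_bytes is hypothetical, and the handling of a cut that falls inside a multi-byte character (decoding with errors="replace") is an assumption.

```python
def split_by_bytes(text, encoding, byte_counts):
    """Split text into tokens whose widths are byte counts in `encoding`."""
    data = text.encode(encoding)  # work on the encoded bytes, not the characters
    tokens = []
    pos = 0
    for n in byte_counts:
        chunk = data[pos:pos + n]
        # Assumption: a cut that lands inside a multi-byte character is decoded
        # with errors="replace"; the transformer may handle this case differently.
        tokens.append(chunk.decode(encoding, errors="replace"))
        pos += n
    return tokens


col = split_by_bytes("abあいうえお1234", "cp932", [2, 4, 4, 4, 2])
# col == ['ab', 'あい', 'うえ', 'お12', '34']
```

The source string is 16 bytes long in cp932 (5 two-byte characters plus 6 one-byte characters), which matches the sum of the byte numbers 2+4+4+4+2.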
Comparison: The AttributeSplitter (Delimiter or Format String: 2s4s4s4s2s) counts characters rather than bytes, so it splits the same source into:
_list{0} = 'ab'
_list{1} = 'あいうえ'
_list{2} = 'お123'
_list{3} = '4'
_list{4} = ''