Multi-Language CJK (Chinese-Japanese-Korean) characters support in FIX Antenna/FIX-Protocol
Overview
FIX Antenna products fully support UTF-8 encoding, including all CJK (Chinese-Japanese-Korean) characters.
FIX-Protocol and multibyte encoding.
Support for non-ASCII characters was introduced into the FIX protocol in FIX 4.2 (https://www.fixtrading.org/standards/fix-4-2/).
The FIX protocol covers the use of multibyte encodings as follows:
Special Encoded* fields were added for working with non-ASCII symbols.
The field MessageEncoding(347) should specify the encoding used in the other Encoded* fields of the message.
The length fields (Encoded*Len) should contain the number of BYTES (important: not the number of symbols) in the corresponding Encoded* field.
UTF-8 can be used in any text field. To stay compatible at the FIX protocol level, the counterparty must also expect UTF-8 encoding in those fields.
If UTF-16 or UTF-32 is used in this way, the protocol is violated, because the byte 0x01, which FIX uses as the field delimiter (SOH), can appear inside the encoded text.
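The byte-versus-symbol distinction for the Encoded*Len fields can be illustrated with a short Python sketch. The helper name `encoded_pair` is hypothetical and is not part of any FIX Antenna API; it only shows how the length value would be derived from the encoded bytes.

```python
from typing import Tuple


def encoded_pair(text: str, encoding: str = "utf-8") -> Tuple[int, bytes]:
    """Return (byte length, encoded bytes) for an Encoded*Len / Encoded* pair.

    Hypothetical helper for illustration only.
    """
    data = text.encode(encoding)
    # The length field carries the number of bytes, not the number of characters.
    return len(data), data


greeting = "こんにちは"
length, data = encoded_pair(greeting)
print(len(greeting))  # 5 characters
print(length)         # 15 bytes in UTF-8
```

Each of the five characters occupies three bytes in UTF-8, so the length field must carry 15, not 5.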
Work with Encoded fields
FIX Antenna and FIXEdge support and correctly process Encoded fields, as well as UTF-8 in non-encoded fields.
For FIX Antenna, it is the user's responsibility to convert Unicode strings to UTF-8 encoded strings and vice versa.
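As a rough illustration of that conversion step, the following Python sketch encodes a Unicode string to UTF-8 bytes before it is placed into a field and decodes received bytes back. The function names `to_wire` and `from_wire` are assumptions made for this example and do not correspond to FIX Antenna functions.

```python
def to_wire(text: str) -> bytes:
    """Convert an application-level Unicode string to UTF-8 bytes."""
    return text.encode("utf-8")


def from_wire(raw: bytes) -> str:
    """Convert UTF-8 bytes taken from a field back to a Unicode string.

    Strict decoding raises UnicodeDecodeError on malformed input, so a
    counterparty that sends another encoding is detected instead of being
    silently misread.
    """
    return raw.decode("utf-8", errors="strict")


original = "こんにちは"
round_tripped = from_wire(to_wire(original))
print(round_tripped == original)
```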
The list of encoded tags
Example
The example shows how to use the tags EncodedText (355) and EncodedTextLen (354), whose encoding is specified by MessageEncoding (347).
| Field name | Field number | Field value |
|---|---|---|
| MessageEncoding | 347 | UTF-8 |
| EncodedTextLen | 354 | 15 |
| EncodedText | 355 | こんにちは |
Message example:
Note: FIX Client Simulator doesn't fully support multibyte encoded characters in the Send Message text box.
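For illustration, a message carrying the three fields from the table can be assembled programmatically. The Python sketch below is deliberately incomplete: the begin string `FIX.4.4` and message type `B` are assumptions, required header fields (sender, target, sequence number, sending time) and other required body fields are omitted, and `build_message` is a hypothetical helper, not a FIX Antenna call. It only shows that the byte length feeds EncodedTextLen (354), BodyLength (9), and CheckSum (10).

```python
SOH = b"\x01"  # FIX field delimiter


def field(tag: int, value: bytes) -> bytes:
    """Serialize one tag=value pair followed by the SOH delimiter."""
    return str(tag).encode("ascii") + b"=" + value + SOH


def build_message(text: str) -> bytes:
    """Assemble a minimal, not session-valid message with Encoded fields."""
    encoded = text.encode("utf-8")
    body = b"".join([
        field(35, b"B"),                                # MsgType (assumed)
        field(347, b"UTF-8"),                           # MessageEncoding
        field(354, str(len(encoded)).encode("ascii")),  # length in bytes
        field(355, encoded),                            # EncodedText
    ])
    # BodyLength counts the bytes after the BodyLength field up to CheckSum.
    head = field(8, b"FIX.4.4") + field(9, str(len(body)).encode("ascii"))
    # CheckSum is the sum of all preceding bytes modulo 256, as three digits.
    checksum = sum(head + body) % 256
    return head + body + field(10, b"%03d" % checksum)


message = build_message("こんにちは")
print(message.replace(SOH, b"|").decode("utf-8"))
```

The printed form replaces the SOH delimiter with `|` for readability only; the bytes sent on the wire keep 0x01.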
Work with User-defined Encoded Fields
To create a new user-defined field that carries encoded symbols, also create an extra field that holds the length of the encoded text in bytes.
Example
| Field name | Field number | Field value |
|---|---|---|
| MessageEncoding | 347 | UTF-8 |
| EncodedUserFieldLen | 50354 | 15 |
| EncodedUserField | 50355 | こんにちは |
The counterparty must also expect the encoding in these fields.
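One way to see why the extra length field is needed is to look at how a receiver can read such a pair: it takes the announced number of bytes for the data field instead of scanning for the next delimiter. The Python sketch below is an assumption-based illustration, not the FIX Antenna parser; `parse_fields` and the `length_tags` mapping are hypothetical names, and the tag numbers are the ones used in the table above.

```python
from typing import Dict, List, Optional, Tuple

SOH = 0x01  # FIX field delimiter as an integer byte value


def parse_fields(raw: bytes, length_tags: Dict[int, int]) -> List[Tuple[int, bytes]]:
    """Split raw FIX bytes into (tag, value) pairs.

    length_tags maps a length tag to the data tag expected right after it.
    """
    fields: List[Tuple[int, bytes]] = []
    pending: Optional[Tuple[int, int]] = None  # (data tag, announced byte count)
    pos = 0
    while pos < len(raw):
        eq = raw.index(b"=", pos)
        tag = int(raw[pos:eq])
        start = eq + 1
        if pending is not None and pending[0] == tag:
            # Length-prefixed field: take exactly the announced number of bytes.
            end = start + pending[1]
            if end >= len(raw) or raw[end] != SOH:
                raise ValueError("length of tag %d does not match its content" % tag)
        else:
            # Ordinary field: the value runs up to the next delimiter.
            end = raw.index(b"\x01", start)
        value = raw[start:end]
        fields.append((tag, value))
        pending = (length_tags[tag], int(value)) if tag in length_tags else None
        pos = end + 1
    return fields


raw = b"50354=15\x0150355=" + "こんにちは".encode("utf-8") + b"\x01"
parsed = parse_fields(raw, {50354: 50355})
print(parsed[1][1].decode("utf-8"))
```

A wrong length, for example a character count instead of a byte count, makes the check after the data field fail, which is why both counterparties must agree on the encoding.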
Dictionary configuration example:
additional.xml
<fielddic>
<!-- ... -->
<fielddef tag="50354" name="EncodedUserFieldLen" type="int"/>
<fielddef tag="50355" name="EncodedUserField" type="String"/>
<!-- ... -->
</fielddic>
<msgdic>
<!-- ... -->
<msgdef msgtype="B" name="NEWS">
<!-- ... -->
<field tag="50354" name="EncodedUserFieldLen" condreq="existtags(T$50355)"/>
<field tag="50355" name="EncodedUserField" condreq="existtags(T$50354)"/>
</msgdef>
<!-- ... -->
</msgdic>
Message example:
FIX Protocol and UTF-16/UTF-32 Encoding
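In UTF-16 and UTF-32, the byte 0x01 can occur as part of a character's encoding and can therefore appear inside field content. Because FIX uses 0x01 (SOH) as the field delimiter, this makes UTF-16 and UTF-32 incompatible with the FIX protocol.
There is no need to use UTF-16 or UTF-32, since either can be replaced with UTF-8.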
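The conflict is easy to reproduce. The Python sketch below encodes the CJK character 丁 (U+4E01) in several encodings and checks whether the delimiter byte 0x01 appears in the result; `contains_soh` is a hypothetical helper written for this example.

```python
SOH = 0x01  # FIX field delimiter


def contains_soh(text: str, encoding: str) -> bool:
    """Report whether encoding the text produces the FIX delimiter byte."""
    return SOH in text.encode(encoding)


sample = "丁"  # U+4E01
for encoding in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be"):
    data = sample.encode(encoding)
    print(encoding, data.hex(), contains_soh(sample, encoding))
# utf-8 yields e4b881 (no 0x01); utf-16-be yields 4e01 (contains 0x01)
```

In UTF-8, every byte of a multibyte sequence is 0x80 or higher, so 0x01 can only appear if the text contains the control character U+0001 itself.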