Overview
FIX Antenna (Java?) products fully support UTF-8 encoding.
UTF-8 covers all CJK (Chinese-Japanese-Korean) characters, and the byte 0x01 never occurs in UTF-8 text with any meaning other than SOH.
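The SOH property can be checked with a short Python sketch. The helper name `contains_soh` and the sample strings are illustrative assumptions, not part of any FIX Antenna API.

```python
SOH = b"\x01"  # FIX field delimiter

def contains_soh(text: str, encoding: str = "utf-8") -> bool:
    """Return True if the encoded form of `text` contains the SOH byte."""
    return SOH in text.encode(encoding)

# Illustrative Chinese, Japanese and Korean samples (assumed, not from the product docs).
samples = ["你好", "こんにちは", "안녕하세요"]

for sample in samples:
    # Every byte of a multibyte UTF-8 sequence is >= 0x80, so SOH cannot appear.
    print(sample, contains_soh(sample))
```

Because every byte of a multibyte UTF-8 sequence has its high bit set, a parser that splits a message on 0x01 should not cut a UTF-8 character in half.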
FIX-Protocol and non-ASCII characters
Support for non-ASCII characters was introduced in the FIX protocol in FIX 4.2 (https://www.fixtrading.org/standards/fix-4-2/).
The FIX protocol covers the use of multibyte encodings with the following rules:
- Special Encoded* fields are added for working with non-ASCII characters.
- The field MessageEncoding(347) should specify the encoding that is used in the other Encoded* fields of the message.
- The length fields (Encoded*Len) should contain the number of BYTES (important: not the number of characters) contained in the corresponding Encoded* field.
However, UTF-8 can be used in any text field. To keep the session compatible with the FIX protocol, the counterparty must also expect UTF-8 encoding in those fields.
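The Encoded* rules listed above can be sketched in Python as follows. This is a minimal illustration, not FIX Antenna code: the function name `encoded_text_fields` is hypothetical, and only the three tags discussed here are produced (no header, body length, or checksum).

```python
SOH = b"\x01"  # FIX field delimiter

def encoded_text_fields(text: str, encoding: str = "UTF-8") -> bytes:
    """Build the 347/354/355 fragment of a FIX message for `text`."""
    payload = text.encode(encoding)
    fields = [
        b"347=" + encoding.encode("ascii"),           # MessageEncoding
        b"354=" + str(len(payload)).encode("ascii"),  # EncodedTextLen: BYTES, not characters
        b"355=" + payload,                            # EncodedText
    ]
    return SOH.join(fields) + SOH

fragment = encoded_text_fields("こんにちは")
print(fragment)
```

The length tag is placed immediately before the data tag so that a parser can read exactly that many bytes for the value.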
Work with Encoded fields
FIX Antenna and FIXEdge support and correctly process Encoded fields, as well as UTF-8 in non-encoded fields.
For FIX Antenna, it is the user's responsibility to convert an ASCII string with UTF-8 content to a UTF-8 string and vice versa.
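One possible reading of this conversion, sketched in Python: the "ASCII string with UTF-8 content" is assumed to be a string in which each character holds one raw byte of the UTF-8 data. The function names below are hypothetical, and the actual FIX Antenna API may expose field values differently.

```python
def bytes_as_chars_to_text(raw: str) -> str:
    """Reinterpret a one-char-per-byte string as UTF-8 text."""
    # latin-1 maps code points 0..255 to the same byte values, so this recovers the raw bytes.
    return raw.encode("latin-1").decode("utf-8")

def text_to_bytes_as_chars(text: str) -> str:
    """Turn UTF-8 text into a one-char-per-byte string."""
    return text.encode("utf-8").decode("latin-1")

wire_value = text_to_bytes_as_chars("こんにちは")
print(len(wire_value))                     # one character per UTF-8 byte
print(bytes_as_chars_to_text(wire_value))  # round-trips to the original text
```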
The list of encoded tags
Example
The example shows how to work with the tags EncodedText (355) and EncodedTextLen (354), encoded as specified by MessageEncoding (347).
MessageEncoding | 347 | UTF-8 |
EncodedText | 355 | こんにちは |
EncodedTextLen | 354 | 15 |
Message example: encoding testing.txt
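As a quick check of the length value, the following Python sketch compares the character count of こんにちは with its byte count in several encodings. The value 15 corresponds to the UTF-8 byte count; the variable names are illustrative.

```python
text = "こんにちは"

# Number of characters, which is NOT what EncodedTextLen (354) carries.
char_count = len(text)

# Number of bytes per encoding, which is what EncodedTextLen (354) carries.
byte_lengths = {enc: len(text.encode(enc)) for enc in ("utf-8", "shift_jis", "euc_jp")}

print(char_count)    # 5
print(byte_lengths)  # {'utf-8': 15, 'shift_jis': 10, 'euc_jp': 10}
```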
For custom tags, if a text tag has a paired tag that specifies the text length, UTF-8, Unicode, or UTF-16 can be used there, provided that the length is specified in bytes. If the length is specified in characters, only UTF-8 can be used. In either case, the counterparty must expect that encoding in that field.
FIX Protocol and UTF-16 Encoding
In UTF-16 or Unicode encodings, 0x01 is a page code and can appear as a byte inside the field content, which makes UTF-16 incompatible with the FIX protocol.
There is no need to use UTF-16, since it can be replaced with UTF-8.
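The conflict with SOH can be seen on a single character. The Python sketch below encodes U+3001 (IDEOGRAPHIC COMMA), chosen here only as an illustrative sample, in UTF-16 and in UTF-8.

```python
SOH = b"\x01"  # FIX field delimiter

sample = "\u3001"  # IDEOGRAPHIC COMMA

utf16_bytes = sample.encode("utf-16-be")  # big-endian, no byte order mark
utf8_bytes = sample.encode("utf-8")

print(utf16_bytes.hex())   # 3001 -> second byte is 0x01
print(utf8_bytes.hex())    # e38081 -> no 0x01 byte
print(SOH in utf16_bytes)  # True
print(SOH in utf8_bytes)   # False
```

A parser that splits on 0x01 would treat the second UTF-16 byte of this character as a field delimiter, while the UTF-8 form passes through intact.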