There is no need to use UTF-16 at all: UTF-8 can be used instead with no risk.
Overview
FIX Antenna products fully support UTF-8 encoding, including all CJK (Chinese-Japanese-Korean) symbols. In UTF-8, the byte 0x01 has no meaning other than SOH, the FIX field delimiter. In UTF-16 or UTF-32, 0x01 can be part of a character's encoding and can therefore be contained in the field content.
Usage
FIX protocol and multibyte encoding
Support for non-ASCII characters was introduced in the FIX protocol in FIX 4.2 (https://www.fixtrading.org/standards/fix-4-2/).
Since FIX 4.2, the protocol covers the use of multibyte encodings with the following rules:
- If a field has no Encoded analogue, non-ASCII symbols cannot be used in that field while remaining compliant with the FIX specification.
- If a field has an Encoded analogue, the special Encoded* fields are added to carry the non-ASCII symbols.
- The MessageEncoding (347) field must be present and contain the name of the encoding used in the Encoded* fields of the message.
- The length fields (Encoded*Len) must contain the number of BYTES (important: not the number of symbols) contained in the corresponding Encoded* field.
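The rules above can be sketched in a few lines of Python. This is only an illustration of the byte-count rule, not FIX Antenna API code: the helper name `encoded_text_fields` is hypothetical, and it builds just the three fields in question rather than a complete FIX message.

```python
SOH = b"\x01"  # FIX field delimiter


def encoded_text_fields(text: str, encoding: str = "UTF-8") -> bytes:
    """Build MessageEncoding (347), EncodedTextLen (354) and EncodedText (355).

    Hypothetical helper for illustration; a real message also needs the
    standard header and trailer.
    """
    payload = text.encode(encoding)
    fields = [
        b"347=" + encoding.encode("ascii"),
        # The length is the number of BYTES in the payload, not len(text).
        b"354=" + str(len(payload)).encode("ascii"),
        b"355=" + payload,
    ]
    return SOH.join(fields) + SOH


raw = encoded_text_fields("こんにちは")
print(len("こんにちは"))  # number of symbols
print(raw)               # the 354 field carries the number of bytes
```

Taking the length from the encoded bytes rather than from the string is what keeps EncodedTextLen correct for multibyte text.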
However, UTF-8 can be used in any text field. This is not a FIX-compliant way, but the only requirement for such a trick is that the counterparty must also expect UTF-8 in that field; the structure of the FIX message is not violated in this case.
Info |
---|
If UTF-16 or UTF-32 is used, the described approach leads to a protocol violation, because the 0x01 symbol would be used inappropriately in these encodings: it can occur inside the field content instead of only as the SOH delimiter. |
Work with Encoded fields
FIX Antenna and FIXEdge support and correctly process Encoded fields, and also support and correctly process UTF-8 in non-encoded fields.
For FIX Antenna, it is the user's responsibility to convert a Unicode string to a UTF-8 string and vice versa.
Tag support tables for base TP ICAP FIX dictionaries can be found in the attachment.
For custom tags, if the text tag has a paired tag that specifies the text length, UTF-8, UTF-16, or UTF-32 can be used there, provided that the length is specified in bytes. If the length is specified in symbols, only UTF-8 can be used. In either scenario, the counterparty must expect that encoding in that field.
The list of encoded tags
Example
The example shows how to work with the EncodedText (355) and EncodedTextLen (354) tags, whose encoding is given by MessageEncoding (347).
Field name | Field number | Field value |
---|---|---|
MessageEncoding | 347 | UTF-8 |
EncodedTextLen | 354 | 15 |
EncodedText | 355 | こんにちは |
Message example: encoding testing.txt
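For the receiving side, the sketch below shows why the length field matters: the value of EncodedText (355) is cut out by byte count taken from EncodedTextLen (354), not by searching for the next SOH. It is a simplified illustration, not the FIX Antenna parser. The function name `read_encoded_text`, the sample message, and the UTF-8 fallback when MessageEncoding (347) is absent are all assumptions made for the example.

```python
SOH = b"\x01"  # FIX field delimiter


def read_encoded_text(raw: bytes) -> str:
    """Return EncodedText (355), sized by EncodedTextLen (354)."""
    encoding = "UTF-8"  # assumption: fallback if MessageEncoding (347) is absent
    pos = 0
    while pos < len(raw):
        eq = raw.index(b"=", pos)
        tag = int(raw[pos:eq])
        end = raw.index(SOH, eq + 1)
        value = raw[eq + 1:end]
        if tag == 347:
            encoding = value.decode("ascii")
        elif tag == 354:
            length = int(value)
            start = end + 1 + len(b"355=")
            if raw[end + 1:start] != b"355=":
                raise ValueError("354 must be followed by 355")
            # Take exactly `length` bytes; do not scan for SOH here.
            data = raw[start:start + length]
            if raw[start + length:start + length + 1] != SOH:
                raise ValueError("354 does not match the data length")
            return data.decode(encoding)
        pos = end + 1
    raise ValueError("no EncodedText (355) found")


# Sample fragment with the values from the table above (checksum is a dummy).
message = (b"35=B\x01347=UTF-8\x01354=15\x01355="
           + "こんにちは".encode("utf-8") + b"\x0110=000\x01")
print(read_encoded_text(message))
```

A length that was counted in symbols (5 instead of 15) would fail the SOH check after the data, which is the kind of error the byte-count rule is meant to prevent.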
Info |
---|
FIX Client Simulator doesn't fully support multibyte encoded characters in the Send Message textbox. |
Work with User-defined Encoded Fields
To create a new user-defined field that carries encoded symbols, also create an extra field for the length of the encoded text in bytes.
Example
Field name | Field number | Field value |
---|---|---|
MessageEncoding | 347 | UTF-8 |
EncodedUserFieldLen | 50354 | 15 |
EncodedUserField | 50355 | こんにちは |
The counterparty must also expect the encoding in these fields.
Dictionary configuration example:
Code Block | ||||
---|---|---|---|---|
| ||||
<fielddic>
<!-- ... -->
<fielddef tag="50354" name="EncodedUserFieldLen" type="int"/>
<fielddef tag="50355" name="EncodedUserField" type="String"/>
<!-- ... -->
</fielddic>
<msgdic>
<!-- ... -->
<msgdef msgtype="B" name="NEWS">
<!-- ... -->
<field tag="50354" name="EncodedUserFieldLen" condreq="existtags(T$50355)"/>
<field tag="50355" name="EncodedUserField" condreq="existtags(T$50354)"/>
</msgdef>
<!-- ... -->
</msgdic> |
Message example: encoding testing custom.txt
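As a rough illustration of what the two `condreq` conditions appear to express (each tag is required when the other is present), together with the byte-length rule, the following Python sketch checks a parsed message. It is not part of FIX Antenna or FIXEdge: the function `check_user_encoded_pair` and the tag-to-bytes dictionary layout are hypothetical.

```python
def check_user_encoded_pair(fields: dict, len_tag: int = 50354,
                            data_tag: int = 50355) -> list:
    """Return a list of problems found for a length/data tag pair.

    `fields` maps tag numbers to raw byte values (hypothetical layout).
    """
    errors = []
    has_len = len_tag in fields
    has_data = data_tag in fields
    if has_len != has_data:
        # Mirrors the mutual requirement between the two tags.
        errors.append(f"tags {len_tag} and {data_tag} must appear together")
    elif has_data:
        declared = int(fields[len_tag])
        actual = len(fields[data_tag])  # length in bytes
        if declared != actual:
            errors.append(f"tag {len_tag} says {declared} bytes, found {actual}")
    return errors


good = {347: b"UTF-8", 50354: b"15", 50355: "こんにちは".encode("utf-8")}
print(check_user_encoded_pair(good))
```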
FIX Protocol and UTF-16/UTF-32 Encoding
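In UTF-16 and UTF-32, a 0x01 byte can be part of a character's encoding (for example, "Ł", U+0141, is the byte pair 0x01 0x41 in big-endian UTF-16) and can therefore be contained in the field content, where it would be read as the SOH delimiter. This makes UTF-16 and UTF-32 incompatible with the FIX protocol.
There is no need to use UTF-16 or UTF-32, since either can be replaced with UTF-8.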
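The difference can be checked directly. The Python sketch below encodes the same text in three encodings and counts how many SOH bytes a delimiter-based parser would see in a single field. The helper name `apparent_field_count`, the sample text, and the use of tag 58 are chosen only for illustration.

```python
SOH = b"\x01"  # FIX field delimiter


def apparent_field_count(text: str, encoding: str) -> int:
    """Count the SOH bytes in one field, i.e. how many fields a parser sees."""
    raw = b"58=" + text.encode(encoding) + SOH
    return raw.count(SOH)


sample = "Łódź"  # "Ł" is U+0141 and "ź" is U+017A
for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    print(enc, apparent_field_count(sample, enc))
```

With UTF-8, the only SOH is the real delimiter, because every byte of a multibyte UTF-8 sequence is 0x80 or higher. With UTF-16 and UTF-32, the extra 0x01 bytes inside the characters split the field.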