UTF-8 support
First of all, regarding UTF-16: there is no need to use it at all, since UTF-8 can be used instead with no risk.
Overview
FIX Antenna products fully support UTF-8 encoding, including all CJK (Chinese-Japanese-Korean) characters. In UTF-8, the byte 0x01 has no meaning other than SOH, the FIX field delimiter, because every byte of a multibyte UTF-8 character is 0x80 or higher. In UTF-16 or UTF-32, by contrast, 0x01 can occur as one byte of an encoded character and therefore inside field content.
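The difference can be checked directly. The Python sketch below is illustrative only and is not part of FIX Antenna: it encodes the CJK character 丁 (U+4E01) in several encodings and reports whether a 0x01 byte appears, then scans every Unicode scalar value to confirm that UTF-8 produces 0x01 only for U+0001 itself.

```python
SOH_BYTE = 0x01  # FIX field delimiter


def contains_soh(data: bytes) -> bool:
    """Return True if the byte string contains the SOH byte (0x01)."""
    return SOH_BYTE in data


def utf8_soh_code_points() -> list[int]:
    """Return every Unicode scalar value whose UTF-8 form contains 0x01."""
    found = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable scalar values
        if contains_soh(chr(cp).encode("utf-8")):
            found.append(cp)
    return found


if __name__ == "__main__":
    char = "\u4e01"  # 丁, a CJK character
    for encoding in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be"):
        encoded = char.encode(encoding)
        print(f"{encoding:10} {encoded.hex(' ')}  contains 0x01: {contains_soh(encoded)}")
    print("UTF-8 code points producing 0x01:", [hex(cp) for cp in utf8_soh_code_points()])
```

For 丁 the UTF-8 bytes are `e4 b8 81` (no 0x01), while UTF-16-BE (`4e 01`), UTF-16-LE (`01 4e`) and UTF-32-BE (`00 00 4e 01`) all contain 0x01; the scan returns only U+0001.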
Usage
FIX Protocol and multibyte encoding
Support for non-ASCII characters was introduced in the FIX protocol in FIX 4.2 (https://www.fixtrading.org/standards/fix-4-2/). Since FIX 4.2, the use of multibyte encodings follows these rules:
- If a field has no Encoded* counterpart, non-ASCII characters cannot be used in it without breaking compliance with the FIX specification.
- If a field has an Encoded* counterpart (for example, Text (58) and EncodedText (355)), the special Encoded* fields are used to carry non-ASCII characters.
- The MessageEncoding (347) field must be present and must contain the name of the encoding used in the Encoded* fields of the message.
- The length fields (Encoded*Len) must contain the number of BYTES (important: not the number of characters) in the corresponding Encoded* field.
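The byte-count rule is the one most easily broken, because character count and byte count differ for non-ASCII text. The Python sketch below builds MessageEncoding (347) and an Encoded*Len/Encoded* pair as raw tag=value bytes. The helper name `encoded_fields` is hypothetical, the fields are simply placed side by side (their real positions in header and body are ignored), and a FIX engine such as FIX Antenna does this through its own API.

```python
SOH = b"\x01"  # FIX field delimiter


def encoded_fields(len_tag: int, data_tag: int, text: str, encoding: str = "UTF-8") -> bytes:
    """Serialize MessageEncoding(347) plus an Encoded*Len / Encoded* pair.

    The length field carries the number of BYTES of the encoded value,
    not the number of characters.
    """
    value = text.encode(encoding)
    parts = [
        b"347=" + encoding.encode("ascii"),
        f"{len_tag}={len(value)}".encode("ascii"),
        f"{data_tag}=".encode("ascii") + value,
    ]
    return SOH.join(parts) + SOH


if __name__ == "__main__":
    text = "こんにちは"
    print(len(text), "characters,", len(text.encode("utf-8")), "bytes")
    # Show SOH as "|" for readability.
    print(encoded_fields(354, 355, text).replace(SOH, b"|").decode("utf-8"))
```

This prints `5 characters, 15 bytes` and `347=UTF-8|354=15|355=こんにちは|`, so EncodedTextLen (354) is 15, not 5.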
However, in practice UTF-8 can be used in any text field. This is not FIX-compliant, but the only requirement for such a trick is that the counterparty must also expect UTF-8 in that field. The SOH-delimited message structure is not broken in this case, because UTF-8 never produces the 0x01 byte except for the U+0001 character itself.
With UTF-16 or UTF-32, the same trick leads to protocol violations.
Info |
---|
If UTF-16 or UTF-32 is used, the approach described above violates the protocol, because 0x01 bytes inside field values in these encodings would be taken for the SOH field delimiter. |
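To see why UTF-8 survives in a plain (non-encoded) text field while UTF-16 does not, consider a receiver that splits the message on SOH. The Python sketch below uses a deliberately naive splitter, not the way FIX Antenna parses messages, and puts the text 丁目 into Text (58) encoded as UTF-8 and as UTF-16-BE, followed by a dummy CheckSum (10) field.

```python
SOH = b"\x01"  # FIX field delimiter


def split_fields(raw: bytes) -> list[bytes]:
    """Naively split a tag=value byte stream on SOH, as done for non-encoded fields."""
    return [field for field in raw.split(SOH) if field]


def text_field(text: str, encoding: str) -> bytes:
    """Build Text(58) in the given encoding, followed by a dummy 10=000 trailer field."""
    return b"58=" + text.encode(encoding) + SOH + b"10=000" + SOH


if __name__ == "__main__":
    text = "丁目"  # U+4E01 U+76EE; the first character has a 0x01 byte in UTF-16
    for encoding in ("utf-8", "utf-16-be"):
        fields = split_fields(text_field(text, encoding))
        print(encoding, len(fields), "fields:", fields)
```

The UTF-8 variant yields the expected two fields. The UTF-16-BE variant yields three: Text (58) is cut after its first byte, and the leftover bytes `v\xee` form a bogus "field" that is not a tag=value pair at all.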
Work with Encoded fields
FIX Antenna and FIXEdge support and correctly process Encoded fields, and also support and correctly process UTF-8 in non-encoded fields.
For FIX Antenna, it is the user's responsibility to convert Unicode strings to UTF-8 strings and vice versa.
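In Python terms, the conversion described above amounts to `str.encode` before a value is set on the message and `bytes.decode` after it is read. The sketch below assumes the field value is available as raw bytes and that the MessageEncoding (347) value is a name Python's codec registry recognizes (true for `UTF-8`). The function names are hypothetical; this is not the FIX Antenna API.

```python
def to_wire(text: str, message_encoding: str = "UTF-8") -> bytes:
    """Unicode string -> bytes stored in the field.

    Raises LookupError if the encoding name is unknown to Python.
    """
    return text.encode(message_encoding)


def from_wire(raw: bytes, message_encoding: str = "UTF-8") -> str:
    """Field bytes -> Unicode string.

    Malformed byte sequences raise UnicodeDecodeError (strict decoding).
    """
    return raw.decode(message_encoding)


if __name__ == "__main__":
    original = "こんにちは"
    raw = to_wire(original)
    print(raw.hex(" "))
    print(from_wire(raw) == original)  # the round trip is lossless
```

Keeping the conversion at the application boundary means the message layer only ever handles bytes, which is what the byte-count rule for Encoded*Len fields assumes.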
The following fields, which support different encodings, must be present in the dictionary:
For custom tags: if a text tag has a related tag that specifies its length, UTF-8, UTF-16 or UTF-32 can be used in the text tag, provided the length is specified in bytes. If the length is specified in characters, only UTF-8 can be used. In either case, the counterparty must expect that encoding in the field.
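A byte length makes UTF-16 or UTF-32 usable because a parser that already knows the length of the next value can copy that many raw bytes without searching for SOH inside them. The Python sketch below illustrates the idea for one hypothetical pair of custom tags, 50354 (length in bytes) and 50355 (data). It is a simplified model, not FIX Antenna's parser, and it assumes the length tag precedes the data tag; `10=000` is a dummy trailer field.

```python
SOH = b"\x01"  # FIX field delimiter


def parse(raw: bytes, length_tag: int = 50354, data_tag: int = 50355) -> dict[int, bytes]:
    """Parse tag=value fields from raw bytes.

    Ordinary values end at the next SOH. The value of data_tag is instead read
    by byte count, using the value of length_tag that precedes it, so it may
    contain 0x01 bytes (for example UTF-16 text).
    """
    fields: dict[int, bytes] = {}
    pos = 0
    pending_len = None  # byte count announced by length_tag, if any
    while pos < len(raw):
        eq = raw.index(b"=", pos)
        tag = int(raw[pos:eq])
        start = eq + 1
        if tag == data_tag and pending_len is not None:
            end = start + pending_len  # take exactly N bytes, ignoring any 0x01 inside
            pending_len = None
        else:
            end = raw.index(SOH, start)
        if raw[end:end + 1] != SOH:
            raise ValueError(f"tag {tag}: value is not followed by SOH")
        fields[tag] = raw[start:end]
        if tag == length_tag:
            pending_len = int(fields[tag])
        pos = end + 1
    return fields


if __name__ == "__main__":
    value = "丁目".encode("utf-16-be")  # 4 bytes, one of them 0x01
    raw = b"50354=" + str(len(value)).encode("ascii") + SOH + b"50355=" + value + SOH + b"10=000" + SOH
    fields = parse(raw)
    print(fields[50355].decode("utf-16-be"))
```

A length expressed in characters would not help here: the parser would still have to find where each character ends, which only UTF-8 guarantees without confusing a character byte with SOH.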
The list of encoded tags
Example
The example below shows how to use the EncodedText (355) and EncodedTextLen (354) tags, whose encoding is specified by MessageEncoding (347):
Field name | Field number | Field value |
---|---|---|
MessageEncoding | 347 | UTF-8 |
EncodedTextLen | 354 | 15 |
EncodedText | 355 | こんにちは |
Message example: encoding testing.txt
Info |
---|
FIX Client Simulator doesn't fully support multibyte encoded characters in the Send Message text box. |
Work with User-defined Encoded Fields
To create a new user-defined field that carries encoded characters, also create an extra field that holds the length of the encoded text in bytes.
Example
Field name | Field number | Field value |
---|---|---|
MessageEncoding | 347 | UTF-8 |
EncodedUserFieldLen | 50354 | 15 |
EncodedUserField | 50355 | こんにちは |
The counterparty must also expect the encoding in these fields.
Dictionary configuration example:
```xml
<fielddic>
    <!-- ... -->
    <!-- Length of EncodedUserField in bytes (not characters) -->
    <fielddef tag="50354" name="EncodedUserFieldLen" type="int"/>
    <!-- Encoded text; its encoding is given by MessageEncoding (347) -->
    <fielddef tag="50355" name="EncodedUserField" type="String"/>
    <!-- ... -->
</fielddic>
<msgdic>
    <!-- ... -->
    <msgdef msgtype="B" name="NEWS">
        <!-- ... -->
        <!-- Each tag of the pair is required whenever the other one is present -->
        <field tag="50354" name="EncodedUserFieldLen" condreq="existtags(T$50355)"/>
        <field tag="50355" name="EncodedUserField" condreq="existtags(T$50354)"/>
    </msgdef>
    <!-- ... -->
</msgdic>
```
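As the two `condreq` conditions read, tags 50354 and 50355 depend on each other: if one is present, the other must be too. A receiving application may also want to check that the declared length matches the actual byte count. The Python sketch below performs both checks on already-parsed fields; `check_user_field` is a hypothetical application-level helper, not something FIX Antenna generates from the dictionary.

```python
def check_user_field(fields: dict[int, bytes], length_tag: int = 50354, data_tag: int = 50355) -> list[str]:
    """Return a list of problems with the length/data pair; an empty list means it is valid."""
    problems = []
    has_len, has_data = length_tag in fields, data_tag in fields
    if has_len != has_data:
        # Mirrors the mutual condreq rule: one tag without the other is an error.
        missing = data_tag if has_len else length_tag
        problems.append(f"tag {missing} is required when the other tag of the pair is present")
    elif has_len:
        declared = int(fields[length_tag])
        actual = len(fields[data_tag])  # byte count, not character count
        if declared != actual:
            problems.append(f"tag {length_tag}={declared} but tag {data_tag} holds {actual} bytes")
    return problems


if __name__ == "__main__":
    value = "こんにちは".encode("utf-8")
    print(check_user_field({50354: b"15", 50355: value}))  # []
    print(check_user_field({50354: b"5", 50355: value}))   # character count used by mistake
    print(check_user_field({50355: value}))                # length tag missing
```

The second call reports the typical mistake of sending the character count (5) instead of the byte count (15).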
Message example: encoding testing custom.txt
FIX Protocol and UTF-16/UTF-32 Encoding
In UTF-16 or UTF-32 encodings, the byte 0x01 can occur as part of an encoded character and therefore inside field content, where it would be read as the SOH delimiter. This makes UTF-16 and UTF-32 incompatible with the FIX protocol.
There is no need to use UTF-16 or UTF-32, since UTF-8 can be used instead.