Recovery procedure for a session with corrupted storages

This article applies to FIXEdge 5.13.0 and later versions.

Overview

FIXEdge 5.13.0 introduced a new mechanism for detecting broken storage:

  • FIXEdge detects broken storage on startup and, depending on the configured recovery strategy, recovers it.

Operation of a session with broken storage is restricted because its data may be inconsistent. Incorrect data in messages may have a negative impact on business.

Problem Detection

Pay attention to the following conditions:

  • The session is in the Planned state (black status) or in the Undefined state (grey status) in FIXICC.
  • The FIX engine log contains the error "Error load storage" on startup.

    Example:
    2016-12-16 17:52:00,752 UTC ERROR [Engine] 139950496225088 Error load storage /log/FIXEdge1/logs//FIXEDGE1-FIXCLIENT1:Error parsing last message. Invalid MsgType. Parsing stopped at column: 25 [RefSeqNum: 17837, RefTagID: 35, RefMsgType: 6]
    2016-12-16 17:52:00,752 UTC WARN [EngineAdaptor] 139950496225088 Session <FIXEDGE1,FIXCLIENT1> cannot be started now. Reason: Error parsing last message. Invalid MsgType. Parsing stopped at column: 25 [RefSeqNum: 17837, RefTagID: 35, RefMsgType: 6]
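When many sessions are configured, the log check described above can be scripted. The following Python sketch scans engine log lines for "Error load storage" and extracts the storage path and the RefSeqNum. The regular expression is an assumption derived from the single sample line shown above, so adjust it if your log layout differs; the names find_storage_errors and StorageError are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumption: the line layout follows the sample log, i.e.
# "... Error load storage <storage path>:<reason>".
# \S+ is greedy, so the path ends at the last ':' of the first
# whitespace-free token (a Windows drive letter such as "C:" stays in the path).
LOAD_ERROR = re.compile(r"Error load storage (?P<storage>\S+):(?P<reason>.*)$")
REF_SEQ_NUM = re.compile(r"RefSeqNum: (?P<seq>\d+)")


class StorageError(NamedTuple):
    storage: str                 # storage base path printed by the engine
    reason: str                  # error text that follows the path
    ref_seq_num: Optional[int]   # RefSeqNum, if the engine reported one


def find_storage_errors(lines: Iterable[str]) -> List[StorageError]:
    """Return one StorageError for every "Error load storage" log line."""
    errors = []
    for line in lines:
        match = LOAD_ERROR.search(line.rstrip("\r\n"))
        if match is None:
            continue  # not a storage-load error
        reason = match.group("reason").strip()
        seq = REF_SEQ_NUM.search(reason)
        errors.append(StorageError(
            storage=match.group("storage"),
            reason=reason,
            ref_seq_num=int(seq.group("seq")) if seq else None,
        ))
    return errors
```

Passing an open log file object to find_storage_errors returns one entry per affected storage; the WARN line from EngineAdaptor is intentionally ignored because it repeats the same reason.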

Note: Check the conditions above during the Start of Day procedure, after each FIXEdge failure or restart, and whenever storage is restored.

Automatic Recovery Scenario

FIXEdge provides the StorageRecoveryStrategy property, which defines how a broken storage is handled.

The property is set in the FIXEdge.properties file. If StorageRecoveryStrategy = CREATE_NEW_ON_ERROR, FIXEdge automatically creates a new storage on any error related to broken storage.

If StorageRecoveryStrategy = NONE, the session is not started and manual recovery is required.
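To confirm before a restart which scenario applies, a script can read the StorageRecoveryStrategy line from FIXEdge.properties. The Python sketch below is a simplified illustration: it assumes one "key = value" pair per line and does not handle line continuations or escapes. Because this article does not state the default value, an unset or unknown value is conservatively treated as requiring manual recovery; that choice is an assumption, not documented FIXEdge behavior. The function names are hypothetical.

```python
from typing import Optional

PROPERTY_NAME = "StorageRecoveryStrategy"


def read_recovery_strategy(properties_text: str) -> Optional[str]:
    """Return the StorageRecoveryStrategy value, or None if it is not set.

    Simplified parsing (assumption): '#' or '!' starts a comment line,
    and the last occurrence of the property wins.
    """
    strategy = None
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROPERTY_NAME:
            strategy = value.strip()
    return strategy


def needs_manual_recovery(strategy: Optional[str]) -> bool:
    """Only CREATE_NEW_ON_ERROR makes FIXEdge recreate broken storage by itself.

    Assumption: an unset or unknown value is treated like NONE.
    """
    return strategy != "CREATE_NEW_ON_ERROR"
```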

Manual Recovery Scenario

Perform the following steps for manual recovery (that is, when StorageRecoveryStrategy = NONE):

  1. Move the corrupted logs (the session's storage files) to another directory.
    We recommend using the archive directory (typically located at ..\FIXEdge\FIXEdge1\log\archive).
  2. Start the session via FIXICC.
    Right-click the session and select Start Session.
  3. Ensure that the session is in the Connecting state (yellow status).
  4. If necessary, notify the counterparty about the session's new state (ready to accept connections).


Sequence numbers are reset to 1 in this case.

Recovery of messages in disputed situations must be done manually.
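Step 1 of the manual procedure can be scripted. The Python sketch below moves the five storage files of one session (listed in the Additional Information section) into a new timestamped subdirectory of the archive directory, so that earlier archives are never overwritten. It assumes that each file is named <storage name><extension>, for example FIXEDGE1-FIXCLIENT1.out; check the actual file names in your log directory first. The function name archive_storage is hypothetical, and the script should only be run while the session is stopped.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List

# The five files that make up a session's persistent (MM) storage.
STORAGE_EXTENSIONS = (".conf", ".in", ".out", ".in.ndx", ".out.ndx")


def archive_storage(log_dir: Path, storage_name: str, archive_dir: Path) -> List[Path]:
    """Move the storage files of one session into a new archive subdirectory.

    Assumption: each file is named <storage_name><extension>.
    Returns the new paths of the moved files (empty if nothing was found).
    """
    candidates = [Path(log_dir) / f"{storage_name}{ext}" for ext in STORAGE_EXTENSIONS]
    existing = [path for path in candidates if path.is_file()]
    if not existing:
        return []  # nothing to archive
    # One timestamped subdirectory per run avoids overwriting earlier archives.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(archive_dir) / f"{storage_name}-{stamp}"
    target.mkdir(parents=True, exist_ok=False)
    return [Path(shutil.move(str(src), str(target / src.name))) for src in existing]
```

Matching exact file names rather than a name prefix keeps a similarly named session (for example FIXEDGE1-FIXCLIENT10) from being archived by mistake.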


Additional Information

Reasons why storage can be considered corrupted

  1. A FIX session's persistent (MM) storage consists of five files (*.conf, *.in, *.out, *.in.ndx, *.out.ndx). If any of them is missing, the storage is considered corrupted.
    If none of the files exists, a new storage is created for the session.
  2. The storage configuration file (*.conf) is broken (e.g., incorrect StorageCreationTime, mismatched LastSentSeqNum, IncomingSeqNum, etc.).
  3. The index file (*.out.ndx) contains a wrong checksum for the stored data.
  4. The last FIX message in the outgoing (*.out) file is invalid.
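The file-presence rule in item 1 above can be checked without starting FIXEdge. The Python sketch below classifies a session storage as "new" (none of the five files exists), "corrupted" (only some exist) or "complete" (all exist), and reports which files are missing. It covers only item 1; the content checks in items 2 to 4 are performed by FIXEdge itself and are not reproduced here. The file naming <storage name><extension> and the function name check_storage_files are assumptions for illustration.

```python
from pathlib import Path
from typing import List, Tuple

# The five files that make up a session's persistent (MM) storage.
STORAGE_EXTENSIONS = (".conf", ".in", ".out", ".in.ndx", ".out.ndx")


def check_storage_files(log_dir: Path, storage_name: str) -> Tuple[str, List[str]]:
    """Return (status, missing_extensions) according to the file-presence rule.

    Assumption: files are named <storage_name><extension>.
    """
    missing = [ext for ext in STORAGE_EXTENSIONS
               if not (Path(log_dir) / f"{storage_name}{ext}").is_file()]
    if len(missing) == len(STORAGE_EXTENSIONS):
        return "new", missing        # no files: a new storage will be created
    if missing:
        return "corrupted", missing  # partial set: storage is considered corrupted
    return "complete", missing       # all files present (contents not checked)
```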

Limitations

  • FIXEdge versions prior to 5.13.0 start with broken storage, which may cause FIXEdge to crash or to send wrong or missing data in response to a resend request.
  • Messages lost because of a FIXEdge failure are requested from the counterparty using the FIX resend/gap fill mechanism (ResendRequest, MsgType = 2, answered with resent messages or SequenceReset-GapFill, MsgType = 4).
  • Lost messages that cannot be recovered by means of this mechanism are considered completely lost.
  • When messages are routed from one session to multiple sessions, messages not delivered to some of these sessions because of a FIXEdge failure are considered completely lost.