Cluster overview
In a cluster, several instances of the FIXEdge Java server work together: one instance is the primary, and the others serve as standby nodes.
The cluster service provides high availability (HA) of the FIXEdge connectivity service.
The Cluster Manager service automatically discovers FIXEdge Java (FEJ) servers, registers them as cluster nodes, and monitors their state. It selects the primary node automatically at the cluster's initial start.
If the primary node fails, the remaining nodes can automatically choose the next leader, or a new leader can be assigned manually, depending on the working mode of the cluster.
The cluster service lets applications running on different hosts communicate with each other. Each application that uses the cluster service is a cluster node. A node runs in either leader or backup mode. In leader mode, the node is active: it accepts network connections and does the actual work. Backup nodes stay ready to replace the leader quickly. If the leader becomes unavailable, a backup node can be switched to leader mode and provide the same service.
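The role switch described above can be illustrated with a minimal sketch. It is not the FIXEdge Java implementation: every class and method name here is hypothetical, and the selection rule (the first live node in registration order becomes leader) is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model for illustration; none of these names come from the FIXEdge Java API.
class FailoverSketch {
    enum Role { LEADER, BACKUP }

    static final class Node {
        final String id;
        Role role = Role.BACKUP;
        boolean alive = true;

        Node(String id) { this.id = id; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final boolean automaticFailover;

    FailoverSketch(boolean automaticFailover) {
        this.automaticFailover = automaticFailover;
    }

    // Assumption: the first registered node becomes the leader at initial start.
    Node register(String id) {
        Node node = new Node(id);
        nodes.add(node);
        if (nodes.size() == 1) {
            node.role = Role.LEADER;
        }
        return node;
    }

    Optional<Node> leader() {
        return nodes.stream()
                .filter(n -> n.alive && n.role == Role.LEADER)
                .findFirst();
    }

    // In automatic mode, a failed leader is replaced by the first live node.
    // In manual mode, the cluster stays without a leader until promote() is called.
    void reportFailure(Node failed) {
        boolean wasLeader = failed.role == Role.LEADER;
        failed.alive = false;
        failed.role = Role.BACKUP;
        if (wasLeader && automaticFailover) {
            nodes.stream()
                    .filter(n -> n.alive)
                    .findFirst()
                    .ifPresent(n -> n.role = Role.LEADER);
        }
    }

    // Manual switch: succeeds only if there is no live leader and the node is alive.
    boolean promote(String id) {
        if (leader().isPresent()) {
            return false;
        }
        for (Node n : nodes) {
            if (n.alive && n.id.equals(id)) {
                n.role = Role.LEADER;
                return true;
            }
        }
        return false;
    }
}
```

With `automaticFailover` set to `true`, reporting the leader's failure promotes a backup immediately. With `false`, the sketch waits for an explicit `promote` call, which mirrors the manual working mode.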
The Replication service maintains backup copies of the primary's Persistence API storage, so that standby servers in the cluster always hold an up-to-date, ready-to-use copy of the current data. Replication supports both synchronous and asynchronous modes for certain FIX sessions.
- Synchronous replication is recommended when data loss is unacceptable (for example, for order processing).
- Asynchronous replication is recommended when performance matters more than a complete copy of the data, or when the data can be restored from other sources (for example, for market data processing).
The Replication service uses the Aeron transport to transmit data to backup instances. In synchronous mode, the storage sends a notification about every operation to the backups and waits for acknowledgment. It blocks the calling thread until it receives the acknowledgment or the predefined timeout expires.
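The blocking behavior of synchronous mode can be sketched as follows. This is an illustration under stated assumptions, not the product's code: `BackupTransport`, `appendSync`, and `appendAsync` are hypothetical names, the transport is a stand-in interface rather than the Aeron API, and the way a timeout is reported (a `false` return value) is chosen only for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names are not taken from the FIXEdge Java API.
class SyncReplicationSketch {
    /** Stand-in transport: sends an operation to the backups; the future completes on acknowledgment. */
    interface BackupTransport {
        CompletableFuture<Void> send(byte[] operation);
    }

    private final BackupTransport transport;
    private final long ackTimeoutMillis;

    SyncReplicationSketch(BackupTransport transport, long ackTimeoutMillis) {
        this.transport = transport;
        this.ackTimeoutMillis = ackTimeoutMillis;
    }

    /**
     * Synchronous mode: blocks the calling thread until the backups acknowledge
     * the operation or the timeout expires. Returns true only if acknowledged in time.
     */
    boolean appendSync(byte[] operation) {
        CompletableFuture<Void> ack = transport.send(operation);
        try {
            ack.get(ackTimeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no acknowledgment within the predefined timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("replication failed", e.getCause());
        }
    }

    /** Asynchronous mode: hands the operation to the transport and returns immediately. */
    void appendAsync(byte[] operation) {
        transport.send(operation);
    }
}
```

The trade-off between the two modes is visible in the return path: `appendSync` costs the caller up to `ackTimeoutMillis` per operation but reports whether the backup confirmed it, while `appendAsync` never waits and gives no such confirmation.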
The Replication service consists of two parts: a leader and a backup. Depending on the role of the instance, one or the other is initialized and started.
The leader instance is responsible for:
- sending notifications about new and existing storages
- delivering notifications about storage operations
- handling synchronization requests
The backup instance is responsible for:
- creating the required storages on the backup
- synchronizing storages
- updating storage states (processing notifications about storage operations)
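The division of responsibilities between the two parts can be modeled with a small in-memory sketch. All names are hypothetical and storages are reduced to lists of strings. The real service exchanges these notifications over a network transport, whereas this single-threaded example uses direct method calls purely to show which side does what.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are not taken from the FIXEdge Java API.
class ReplicationRolesSketch {
    /** Backup side: creates storages, synchronizes them, and applies operations. */
    static final class Backup {
        final Map<String, List<String>> storages = new HashMap<>();

        // Creates the storage if needed, then requests any records it is missing.
        void onStorageAnnounced(String name, Leader source) {
            List<String> local = storages.computeIfAbsent(name, k -> new ArrayList<>());
            local.addAll(source.handleSyncRequest(name, local.size()));
        }

        // Updates the storage state from a notification about a storage operation.
        void onOperation(String name, String record) {
            storages.computeIfAbsent(name, k -> new ArrayList<>()).add(record);
        }
    }

    /** Leader side: announces storages, delivers operations, serves sync requests. */
    static final class Leader {
        private final Map<String, List<String>> storages = new HashMap<>();
        private final List<Backup> backups = new ArrayList<>();

        // A newly attached backup is notified about all existing storages.
        void attach(Backup backup) {
            backups.add(backup);
            for (String name : storages.keySet()) {
                backup.onStorageAnnounced(name, this);
            }
        }

        // Every attached backup is notified about a new storage.
        void createStorage(String name) {
            storages.put(name, new ArrayList<>());
            for (Backup b : backups) {
                b.onStorageAnnounced(name, this);
            }
        }

        // Appends locally, then delivers the operation to every backup.
        void append(String name, String record) {
            storages.get(name).add(record);
            for (Backup b : backups) {
                b.onOperation(name, record);
            }
        }

        // Returns a copy of the records starting at the index the backup already holds.
        List<String> handleSyncRequest(String name, int fromIndex) {
            List<String> records = storages.get(name);
            return new ArrayList<>(records.subList(fromIndex, records.size()));
        }
    }
}
```

A backup attached after the leader has already written data first catches up through `handleSyncRequest` and then follows live operations through `onOperation`.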