
Linux cluster

To service multiple FIX sessions reliably, FIXEdge can be deployed in a Linux HA cluster (Red Hat High-Availability Add-on Overview) with 2 or 3 nodes and shared storage for keeping a session's state.

The configurable options for the shared state are:

...

This approach yields a working, highly available solution and suits active-passive clusters with few nodes and a static FIX session configuration.

Health checks

Health checks determine whether a node, or an application running on it, is operating properly.

The simplest node health check is to monitor the FIXEdge PID file.
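
The PID-file check can be sketched as a small shell function; the PID file path below is an assumption for illustration, not FIXEdge's documented default:

```shell
#!/bin/sh
# Minimal health-check sketch: verify that the FIXEdge PID file exists
# and that the process it names is still alive. The default path below
# is an assumption, not FIXEdge's actual default.
PIDFILE="${PIDFILE:-/var/run/fixedge/FIXEdge.pid}"

check_fixedge() {
    # The PID file must exist...
    [ -f "$PIDFILE" ] || { echo "FAIL: PID file $PIDFILE not found"; return 1; }
    pid=$(cat "$PIDFILE")
    # ...and signal 0 probes the process without actually signalling it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "OK: FIXEdge is running (PID $pid)"
    else
        echo "FAIL: stale PID file (no process with PID $pid)"
        return 1
    fi
}
```

A cluster resource agent would run such a check from its monitor action, at the interval configured on the resource.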

More precise checks:

Shared physical device

Shared storage might be a SAN-attached device, Fibre channel attached device or TCP/IP attached device. The device might be attached to all nodes simultaneously, or the cluster resource manager can attach it to the active node only. The device might, in turn, be a resilient one, presenting the distributed file system with software or hardware replication between filesystem nodes. In the case of a geographically distributed cluster, the shared storage also can be distributed geographically in the same way as cluster nodes, and the cluster resource manager can attach the storage instance to the node in the same geo-location.
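
When the cluster resource manager attaches the device to the active node only, this is typically modeled as a Pacemaker Filesystem resource that is mounted on whichever node is active. A hedged sketch, where the device path, mount point, and filesystem type are assumptions:

```shell
# Hypothetical example: let Pacemaker mount the shared device on the
# active node only. Device, directory, and fstype are placeholders.
pcs resource create fixedge_fs ocf:heartbeat:Filesystem \
    device="/dev/mapper/shared_lun" directory="/data/FixEdge1" \
    fstype="xfs" op monitor interval=20s
```

Colocation and ordering constraints (as shown for the virtual IP later on this page) would then keep the mount on the same node as FIXEdge and mount it first.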

...

  1. Install packages from a repository:

    Code Block
    languagebash
    yum install corosync pcs pacemaker 
  2. Set the password for the hacluster user:

    Code Block
    languagebash
    passwd hacluster 
  3. Open ports on the firewall:

    Code Block
    languagebash
    firewall-cmd --add-service=high-availability 
    firewall-cmd --runtime-to-permanent 
  4. Enable cluster services to run at system start-up:

    Code Block
    languagebash
    systemctl enable pcsd corosync pacemaker 

...

Perform these steps on both servers, NODE_1_NAME and NODE_2_NAME:

  1. Download and install:

    Code Block
    languagebash
    $ sudo wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo
    $ sudo yum install glusterfs
    $ sudo yum install glusterfs-fuse
    $ sudo yum install glusterfs-server
  2. Check installed version:

    Code Block
    languagebash
    $ glusterfsd --version
    
    glusterfs 3.6.2 built on Jan 22 2015 12:58:10
    Repository revision: git://git.gluster.com/glusterfs.git
    Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
    GlusterFS comes with ABSOLUTELY NO WARRANTY.
    It is licensed to you under your choice of the GNU Lesser
    General Public License, version 3 or any later version (LGPLv3
    or later), or the GNU General Public License, version 2 (GPLv2),
    in all cases as published by the Free Software Foundation.
    
  3. Start the glusterfs services on all servers and enable them to start automatically at boot:

    Code Block
    languagebash
    $ sudo /etc/init.d/glusterd start
    $ sudo chkconfig glusterfsd on
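
The elided steps that follow typically peer the nodes and create a replicated volume. A hypothetical sketch — the volume name and brick paths are assumptions, not values from this setup:

```shell
# Run on NODE_1_NAME only: peer the second node, then create and start
# a 2-way replicated volume. Names and paths are placeholders.
sudo gluster peer probe NODE_2_NAME
sudo gluster volume create fixedge_vol replica 2 \
    NODE_1_NAME:/export/fixedge NODE_2_NAME:/export/fixedge
sudo gluster volume start fixedge_vol

# Run on both nodes: mount the volume where FIXEdge keeps session state.
sudo mount -t glusterfs localhost:/fixedge_vol /data
```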

...

  1. On both nodes, install the needed software:

    Code Block
    languagebash
    $ sudo yum install corosync pcs pacemaker 
  2. On both nodes, set the password for the hacluster user ('epmc-cmcc' was used):

    Code Block
    languagebash
    $ sudo passwd hacluster
  3. Configure Firewall on both nodes to allow cluster traffic:

    Code Block
    languagebash
    $ sudo iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
    $ sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT
    $ sudo iptables -I INPUT -p igmp -j ACCEPT
    $ sudo iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
    $ sudo service iptables save
  4. Start the pcsd service on both nodes:

    Code Block
    languagebash
    $ sudo systemctl start pcsd
  5. From now on, all commands need to be executed on one node only; the cluster can be controlled with PCS from any one of the nodes.
    Since we will configure all nodes from one point, we must authenticate on all nodes before we are allowed to change the configuration. Use the previously configured hacluster user and password to do this:

    Code Block
    languagebash
    $ sudo pcs cluster auth NODE_1_NAME NODE_2_NAME 
    Username: hacluster
    Password:
    NODE_1_NAME: Authorized
    NODE_2_NAME: Authorized
  6. Create the cluster and add nodes. This command creates the cluster node configuration in /etc/corosync/corosync.conf.

    Code Block
    languagebash
    $ sudo pcs cluster setup --name fixedge_cluster NODE_1_NAME NODE_2_NAME 
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop  pacemaker.service
    Redirecting to /bin/systemctl stop  corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    NODE_1_NAME: Succeeded
    NODE_2_NAME: Succeeded
  7. Now start the cluster:

    Code Block
    languagebash
    $ sudo pcs cluster start --all
    NODE_1_NAME: Starting Cluster...
    NODE_2_NAME: Starting Cluster...
  8. Check the cluster status:

    Code Block
    languagebash
    $ sudo pcs status cluster
    Cluster Status:
     Last updated: Tue Jan 27 22:11:15 2015
     Last change: Tue Jan 27 22:10:48 2015 via crmd on NODE_1_NAME
     Stack: corosync
     Current DC: NODE_1_NAME (1) - partition with quorum
     Version: 1.1.10-32.el7_0.1-368c726
     2 Nodes configured
     0 Resources configured
    
    
    $ sudo pcs status nodes
    Pacemaker Nodes:
     Online: NODE_1_NAME NODE_2_NAME
     Standby:
     Offline:
     
    $ sudo corosync-cmapctl | grep members
    runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.17.131.127)
    runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.1.status (str) = joined
    runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.17.131.128)
    runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.2.status (str) = joined
    
    
    $ sudo pcs status corosync
    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 NODE_1_NAME (local)
             2          1 NODE_2_NAME
  9. Disable the STONITH option as we don't have STONITH devices in our demo virtual environment:

    Code Block
    languagebash
    $ sudo pcs property set stonith-enabled=false
  10. For a two-node cluster, quorum enforcement must be disabled:

    Code Block
    languagebash
    $ sudo pcs property set no-quorum-policy=ignore
    $ sudo pcs property
    Cluster Properties:
     cluster-infrastructure: corosync
     dc-version: 1.1.10-32.el7_0.1-368c726
     no-quorum-policy: ignore
     stonith-enabled: false
  11. Add Virtual IP as a resource to the cluster:

    Code Block
    languagebash
    $ sudo pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=10.17.135.17 cidr_netmask=32 op monitor interval=30s
    $ sudo pcs status resources
     virtual_ip (ocf::heartbeat:IPaddr2): Started
  12. Add FIXEdge as a resource to the cluster:

    Code Block
    languagebash
    $ sudo pcs resource create FIXEdge ocf:heartbeat:anything params binfile="/home/user/FixEdge/bin/FIXEdge" cmdline_options="/data/FixEdge1/conf/FIXEdge.properties" user="user" logfile="/home/user/FIXEdge_resource.log" errlogfile="/home/user/FIXEdge_resource_error.log"
    Note

    For some reason, many agents, including ocf:heartbeat:anything, are missing from /usr/lib/ocf/resource.d/ in the installed cluster. You need to modify the original version (which you can download here: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/anything) to make it work. The working version of the agent is attached.

    This file should be copied to /usr/lib/ocf/resource.d/heartbeat/ and made executable:

    Code Block
    languagebash
    $ sudo cp anything /usr/lib/ocf/resource.d/heartbeat/
    $ sudo chmod a+rx /usr/lib/ocf/resource.d/heartbeat/anything

    Also, to make this agent work, the following lines must be added to the sudoers file:

    Code Block
    languagebash
    $ sudo visudo
    Defaults    !requiretty
    user    ALL=(user)      NOPASSWD: ALL
    root    ALL=(user)      NOPASSWD: ALL
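
    Before adding the resource, the modified agent can be sanity-checked outside the cluster with ocf-tester from the resource-agents package; the parameters below mirror the FIXEdge resource definition above:

```shell
# Dry-run the agent's start/stop/monitor actions outside Pacemaker.
ocf-tester -n FIXEdge \
    -o binfile="/home/user/FixEdge/bin/FIXEdge" \
    -o user="user" \
    /usr/lib/ocf/resource.d/heartbeat/anything
```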
  13. To make sure that the virtual IP and FIXEdge always stay together, add a colocation constraint:

    Code Block
    languagebash
    $ sudo pcs constraint colocation add FIXEdge virtual_ip INFINITY
  14. To prevent FIXEdge from starting before the virtual IP is up on the owning node, add another constraint that sets the start order of the two resources:

    Code Block
    languagebash
    $ sudo pcs constraint order virtual_ip then FIXEdge
    Adding virtual_ip FIXEdge (kind: Mandatory) (Options: first-action=start then-action=start)
  15. After configuring the cluster with the correct constraints, restart it and check the status:

    Code Block
    languagebash
    $ sudo pcs cluster stop --all && sudo pcs cluster start --all
    NODE_1_NAME: Stopping Cluster...
    NODE_2_NAME: Stopping Cluster...
    NODE_2_NAME: Starting Cluster...
    NODE_1_NAME: Starting Cluster...
  16. The cluster configuration is now complete.
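
A simple way to verify failover is to put the active node into standby and watch the resources move (pcs cluster standby is the syntax for the pcs version used on this page; newer releases use pcs node standby):

```shell
# Force a failover by putting the active node into standby...
pcs cluster standby NODE_1_NAME
pcs status resources    # virtual_ip and FIXEdge should now run on NODE_2_NAME
# ...then bring the node back into the cluster.
pcs cluster unstandby NODE_1_NAME
```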

...

The session recovery procedure happens automatically: missing messages are recovered via the resend request procedure.

Info

Session recovery requires a sequence reset in case of damaged storage. Messages from the beginning of the day will be lost when the sequence is reset; see Recovery procedure for a session with corrupted storages. It is possible to manually request missing messages starting from sequence number one, using a Logon (A) message with the desired sequence number followed by Resend Request (2) messages (see the FIX specification: https://www.fixtrading.org/standards/).

The FIX standard recommends requesting missing sequences after logon using the Message recovery procedure, or using the Extended features for FIX session and FIX connection initiation.

Achievable RTO Values: "Greater than 0 seconds, up to and including 2 minutes"

...