Linux cluster
To service multiple FIX sessions reliably, FIXEdge can be deployed in a Linux HA cluster (see the Red Hat High-Availability Add-on Overview) with 2 or 3 nodes and shared storage for keeping session state.
The configurable options for the shared state are:
...
This approach provides a working, highly available solution and is well suited to active-passive clusters with a few nodes and a static FIX session configuration.
Health checks
Health checks are used to determine whether a node, or an application running on it, is operating properly.
The simplest node health check is to monitor the FIXEdge PID file.
More precise checks:
- Check the system status via the FIXEdge Admin REST API by sending a dedicated GET request
- Establish a FIX Admin monitoring session.
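Below is a minimal health-check sketch combining the PID file check and the REST API check. The PID file location, host, port, and endpoint path are assumptions and must be adjusted to match your installation:
Code Block language bash
#!/bin/bash
# Sketch only: PID_FILE and ADMIN_URL are assumed values, not product defaults.
PID_FILE=/data/FIXEdge1/log/FIXEdge.pid    # assumed PID file location
ADMIN_URL=http://localhost:8905/status     # assumed Admin REST API endpoint

# 1. Node-level check: the process referenced by the PID file is alive
if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "FIXEdge process is running"
else
    echo "FIXEdge process is NOT running" >&2
    exit 1
fi

# 2. Application-level check: the Admin REST API answers a GET request
if curl -fsS --max-time 5 "$ADMIN_URL" >/dev/null; then
    echo "FIXEdge Admin REST API is responding"
else
    echo "FIXEdge Admin REST API is NOT responding" >&2
    exit 1
fi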
Shared physical device
Shared storage might be a SAN-attached, Fibre Channel-attached, or TCP/IP-attached device. The device can be attached to all nodes simultaneously, or the cluster resource manager can attach it to the active node only. The device itself can be resilient, presenting a distributed file system with software or hardware replication between file system nodes. In the case of a geographically distributed cluster, the shared storage can also be distributed across the same geographic locations as the cluster nodes, so that the cluster resource manager attaches to each node the storage instance from its own geo-location.
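As an illustration of the "attach to the active node only" option, the shared device can be managed as a Pacemaker Filesystem resource, so it is mounted only on whichever node currently runs FIXEdge. This is a sketch only; the device path, mount point, and file system type below are assumptions:
Code Block language bash
# Sketch only: device, directory and fstype are placeholder assumptions.
# The ocf:heartbeat:Filesystem agent mounts the device on the active node
# and unmounts it on failover.
sudo pcs resource create shared_fs ocf:heartbeat:Filesystem \
    device="/dev/mapper/shared_lun" directory="/data" fstype="xfs" \
    op monitor interval=20s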
...
Install packages from a repository:
Code Block language bash yum install corosync pcs pacemaker
Set the password for the hacluster user:
Code Block language bash passwd hacluster
Open ports on the firewall:
Code Block language bash
firewall-cmd --add-service=high-availability
firewall-cmd --runtime-to-permanent
Enable cluster services to run at the system start-up:
Code Block language bash systemctl enable pcsd corosync pacemaker
...
Do these steps on both servers: NODE_1_NAME and NODE_2_NAME
Download and install:
Code Block language bash
$ sudo wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo
$ sudo yum install glusterfs
$ sudo yum install glusterfs-fuse
$ sudo yum install glusterfs-server
Check installed version:
Code Block language bash
$ glusterfsd --version
glusterfs 3.6.2 built on Jan 22 2015 12:58:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
Start the GlusterFS services on all servers and enable them to start automatically at boot:
Code Block language bash
$ sudo /etc/init.d/glusterd start
$ sudo chkconfig glusterfsd on
...
These commands shall be run on both FIXEdge nodes: NODE_1_NAME and NODE_2_NAME.
Install client components:
Code Block language bash
$ sudo wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo
$ sudo yum install glusterfs-client
Mount remote storage:
Code Block language bash
$ sudo mkdir /data
$ sudo chmod a+rwx /data
$ sudo mount.glusterfs EVUAKYISD105D.kyiv.epam.com:/clusterdata /data
$ mount
/dev/mapper/VolGroup00-LogVol00 on / type ext4 (rw,noatime,nodiratime,noacl,commit=60,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
EVUAKYISD105D.kyiv.epam.com:/clusterdata on /data type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
Add the following line to /etc/fstab to ensure the volume is mounted after a reboot:
Code Block EVUAKYISD105A.kyiv.epam.com:/clusterdata /data glusterfs defaults,_netdev 0 0
Setting up FIXEdge instances
Copy the FixEdge-5.8.2.68334-Linux-2.6.32-gcc447-x86_64.tar.gz Linux package to /home/user on both nodes: NODE_1_NAME and NODE_2_NAME.
On Node1
Unpack the FIXEdge-x.x.x.tar.gz archive to /home/user/:
Code Block language bash
$ gunzip FixEdge-5.8.2.68334-Linux-2.6.32-gcc447-x86_64.tar.gz
$ tar xf FixEdge-5.8.2.68334-Linux-2.6.32-gcc447-x86_64.tar -C /home/user/
The installation directory will be /home/user/FIXEdge
- Copy the license files (engine.license and fixaj2-license.bin) to the /home/user/FIXEdge folder.
Move the FIXEdge instance folder (configuration and logs) to the shared storage mounted at /data:
Code Block language bash mv FIXEdge1 /data/
- Edit the scripts in /home/user/FIXEdge/bin to point them at the new FIXEdge instance location:
- replace ".." with "/data" (see the sed sketch after this list).
- Edit FIXICC Agent configuration:
- /home/user/FIXEdge/fixicc-agent/conf/wrapper.conf
- wrapper.daemon.pid.dir = ${wrapper_home}
- /home/user/FIXEdge/fixicc-agent/conf/agent.properties
- EngineProperty = /data/FIXEdge1/conf/engine.properties
- FIXEdgeFileSettings = /data/FIXEdge1/conf/FIXEdge.properties
- LogUrl = /data/FIXEdge1/log
- Edit FIXEdge and engine configuration:
- /data/FIXEdge1/conf/engine.properties
- EngineRoot = /data
- LicenseFile = /home/user/FIXEdge/engine.license
- /data/FIXEdge1/conf/FIXEdge.properties
- FIXEdge.RootDir = /data/FIXEdge1
- Log.File.RootDir = /data/FIXEdge1
- TransportLayer.SmtpTA.DllName = /home/user/FIXEdge/bin/libSMTPTA.so
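The script edit mentioned above can be done with sed. This is a sketch only, assuming the scripts are the *.sh files in the bin directory and reference the instance location as the literal string "..":
Code Block language bash
# Sketch only: assumes the *.sh scripts reference the instance directory
# as the literal string "..". sed keeps a .bak copy of each original file;
# review the result before starting FIXEdge.
cd /home/user/FIXEdge/bin
sed -i.bak 's|\.\.|/data|g' *.sh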
Install and Start FIXICC Agent daemon:
Code Block language bash
$ cd /home/user/FIXEdge/fixicc-agent/bin
$ ./installDaemon.sh
$ ./startDaemon.sh
- Now everything is ready to run FIXEdge on Node 1.
Prepare to copy the installation to Node 2:
Code Block language bash
$ cd /home/user
$ tar cvf FIXEdge.tar FIXEdge
$ gzip FIXEdge.tar
- Copy the file FIXEdge.tar.gz to Node 2:/home/user
On Node2
Unpack the FIXEdge.tar.gz archive to /home/user/:
Code Block language bash
$ cd /home/user
$ gunzip FIXEdge.tar.gz
$ tar xf FIXEdge.tar
Install and Start FIXICC Agent daemon:
Code Block language bash
$ cd /home/user/FIXEdge/fixicc-agent/bin
$ ./installDaemon.sh
$ ./startDaemon.sh
...
On both nodes, install the required software:
Code Block language bash $ sudo yum install corosync pcs pacemaker
On both nodes, set the password for the hacluster user ('epmc-cmcc' was used in this example):
Code Block language bash $ sudo passwd hacluster
Configure the firewall on both nodes to allow cluster traffic:
Code Block language bash
$ sudo iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
$ sudo iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT
$ sudo iptables -I INPUT -p igmp -j ACCEPT
$ sudo iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
$ sudo service iptables save
Start the pcsd service on both nodes:
Code Block language bash $ sudo systemctl start pcsd
From now on, all commands need to be executed on one node only; the cluster can be controlled with PCS from any of the nodes.
Since we will configure all nodes from one point, we need to authenticate on all nodes before we are allowed to change the configuration. Use the previously configured hacluster user and password to do this:
Code Block language bash
$ sudo pcs cluster auth NODE_1_NAME NODE_2_NAME
Username: hacluster
Password:
NODE_1_NAME: Authorized
NODE_2_NAME: Authorized
Create the cluster and add the nodes. This command creates the cluster node configuration in /etc/corosync/corosync.conf.
Code Block language bash
$ sudo pcs cluster setup --name fixedge_cluster NODE_1_NAME NODE_2_NAME
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
NODE_1_NAME: Succeeded
NODE_2_NAME: Succeeded
We can start the cluster now:
Code Block language bash
$ sudo pcs cluster start --all
NODE_1_NAME: Starting Cluster...
NODE_2_NAME: Starting Cluster...
We can check the cluster status:
Code Block language bash
$ sudo pcs status cluster
Cluster Status:
 Last updated: Tue Jan 27 22:11:15 2015
 Last change: Tue Jan 27 22:10:48 2015 via crmd on NODE_1_NAME
 Stack: corosync
 Current DC: NODE_1_NAME (1) - partition with quorum
 Version: 1.1.10-32.el7_0.1-368c726
 2 Nodes configured
 0 Resources configured
$ sudo pcs status nodes
Pacemaker Nodes:
 Online: NODE_1_NAME NODE_2_NAME
 Standby:
 Offline:
$ sudo corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.17.131.127)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.17.131.128)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
$ sudo pcs status corosync
Membership information
----------------------
    Nodeid      Votes    Name
         1          1    NODE_1_NAME (local)
         2          1    NODE_2_NAME
Disable the STONITH option as we don't have STONITH devices in our demo virtual environment:
Code Block language bash $ sudo pcs property set stonith-enabled=false
For a two-node cluster, we must disable the quorum policy:
Code Block language bash
$ sudo pcs property set no-quorum-policy=ignore
$ sudo pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-32.el7_0.1-368c726
 no-quorum-policy: ignore
 stonith-enabled: false
Add Virtual IP as a resource to the cluster:
Code Block language bash
$ sudo pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=10.17.135.17 cidr_netmask=32 op monitor interval=30s
$ sudo pcs status resources
 virtual_ip     (ocf::heartbeat:IPaddr2):       Started
Add FIXEdge as a resource to the cluster:
Code Block language bash
$ sudo pcs resource create FIXEdge ocf:heartbeat:anything \
    params binfile="/home/user/FIXEdge/bin/FIXEdge" \
    cmdline_options="/data/FIXEdge1/conf/FIXEdge.properties" \
    user="user" \
    logfile="/home/user/FIXEdge_resource.log" \
    errlogfile="/home/user/FIXEdge_resource_error.log"
Note: For some reason, many agents, including ocf:heartbeat:anything, are missing from /usr/lib/ocf/resource.d/ of the installed cluster. You need to modify the original version (which you can download here: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/anything) to make it work. The working version of the agent is attached.
This file should be copied to /usr/lib/ocf/resource.d/heartbeat/ and made executable:
Code Block language bash
$ sudo cp anything /usr/lib/ocf/resource.d/heartbeat/
$ sudo chmod a+rwx /usr/lib/ocf/resource.d/heartbeat/anything
Also, to make this agent work, the following lines shall be added to the sudoers file:
Code Block language bash
$ sudo visudo
Defaults !requiretty
user ALL=(user) NOPASSWD: ALL
root ALL=(user) NOPASSWD: ALL
In order to make sure that the Virtual IP and FIXEdge always stay together, we can add a constraint:
Code Block language bash $ sudo pcs constraint colocation add FIXEdge virtual_ip INFINITY
To avoid a situation where FIXEdge starts before the virtual IP is up on a given node, we need to add another constraint that determines the start order of the two resources:
Code Block language bash
$ sudo pcs constraint order virtual_ip then FIXEdge
Adding virtual_ip FIXEdge (kind: Mandatory) (Options: first-action=start then-action=start)
After configuring the cluster with the correct constraints, restart it and check the status:
Code Block language bash
$ sudo pcs cluster stop --all && sudo pcs cluster start --all
NODE_1_NAME: Stopping Cluster...
NODE_2_NAME: Stopping Cluster...
NODE_2_NAME: Starting Cluster...
NODE_1_NAME: Starting Cluster...
- The cluster configuration is now completed.
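After the restart, the state of the cluster and of both resources can be checked from either node; both virtual_ip and FIXEdge should be reported as Started on the same (active) node:
Code Block language bash
# Verify overall cluster, node and resource state
$ sudo pcs status
# Or list only the resources
$ sudo pcs status resources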
Attachments
File | Size | Creator | Created | Comment
---|---|---|---|---
 | 587268 | Anton Abramov | Mar 06, 2015 21:49 | Modified Sender source code (Sending orders with incremented ClOrdID)
 | 9475 | Anton Abramov | Jan 28, 2015 09:16 | ocf:heartbeat:anything agent for FIXEdge resource monitoring
 | 4976201 | Anton Abramov | Mar 08, 2015 01:06 | Clients simulator (binary)
 | 34 KB | Maxim Vasin | Jan 20, 2020 |
FIX Logs Replicator
The FIX Antenna Replication Tool (RT) is typically used for FIX session recovery: it replicates FIX logs in real time to a backup host running the FIX engine.
It is deployed on the primary FIX engine (the server, acting as publisher) and on the backup destination(s) (the clients, acting as subscribers). RT uses a custom transfer protocol to synchronize FIX logs between the primary and backup hosts in real time. It can be started at any time to resynchronize the primary and backup FIX log files. It is highly optimized for high throughput, transferring data from multiple log files while efficiently utilizing bandwidth.
RT comes with an Admin interface to operate and monitor the state of the replication process.
FIXEdge Recovery Time Objective and Recovery Point Objective
...
- Some functions are more important than others and may need a quicker recovery.
- The period of time within which systems, applications, or functions must be recovered after an outage is referred to as the Recovery Time Objective (RTO).
- All business functions, applications, or vendors that require a business continuity plan must have an RTO.
...
Failure of software or hardware within a node (e.g., application failure, node unavailability, resource exhaustion). The failover cluster consists of several application nodes. If one node fails, the application (or DB) is moved to another node and starts there within approximately 2 minutes.
The session recovery procedure happens automatically; missing messages are recovered via the automatic resend request procedure.
Info |
---|
Session recovery requires a sequence reset in case of damaged storage. The messages from the beginning of the day will be lost when resetting the sequence. See Recovery procedure for a session with corrupted storages. It is possible to manually request missing messages starting from sequence number one using a Logon (A) message with the desired sequence number and a subsequent Resend Request (2) message (see the FIX specification: https://www.fixtrading.org/standards/) |
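For illustration, the key fields of a Resend Request (2) asking the counterparty to resend everything from sequence number 1 onwards are shown below (tag semantics are per the FIX 4.x specification; the surrounding session-level fields are omitted):
Code Block
35=2   (MsgType = Resend Request)
7=1    (BeginSeqNo = first message sequence number to resend)
16=0   (EndSeqNo = 0, meaning "up to the most recently sent message")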
...
Failover scenario #2. Whole cluster failure
A disaster recovery environment should expect the connection. In case of disaster, the production environment is moved to DR.
...
Info |
---|
Better RPO values can be achieved with additional configuration or with tools like the log replication tool. This procedure requires additional configuration for each client individually. |
...