FIX Antenna HFT Benchmarks

Overview

There are two main patterns that can be used when processing incoming messages in FIX Antenna HFT:

  1. Processing messages in the context of the FIX Antenna's receiver thread.
  2. Processing messages in the user's thread.

The first pattern is used when message processing or routing does not take a significant amount of time. In this case, messages can be routed directly from the process() callback method, which ensures the lowest possible latency.

If the number of incoming sessions is large, it is advised to set TCPDispatcher.NumberOfWorkers = 3, i.e. three dedicated threads running epoll_wait calls. It is also recommended to either set TCPDispatcher.IncomingConnectionOnloadStackAffinity = false or specify several listen ports; this ensures an even load distribution across all acceptors.
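In the engine properties file this could look as follows. The port values are illustrative, and the exact ListenPort multi-port syntax is an assumption; only the two TCPDispatcher properties come from the recommendation above:

```properties
# Three dedicated threads running epoll_wait calls
TCPDispatcher.NumberOfWorkers = 3

# Either disable Onload stack affinity for incoming connections...
TCPDispatcher.IncomingConnectionOnloadStackAffinity = false

# ...or spread incoming sessions across several listen ports
# (port numbers are illustrative)
ListenPort = 9101, 9102, 9103
```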

The second pattern (the thread pool) is used when message processing or routing takes a significant amount of time. In this scenario, the client code should create a thread pool (using the Utils::ThreadsPool class) inside the application and, in the process() callback method, pass the message pointer to an internal queue from which the thread pool picks up messages for processing.
The above considerations are illustrated in the diagram below:

Note that the 'With user thread pool' diagram shows a single session; in practice there are many sessions, each working as shown.

Test scenario

The test scenario is the following:

  • Two test servers are connected via LAN.
  • The Sender application is launched on the first server, the Router application is launched on the second server.
  • The Sender establishes several FIX sessions to the Router and sends messages to the Receiver in all sessions simultaneously.
  • When the Receiver receives a message on any of the established sessions, it sends the message back to the Sender through a randomly picked session.

The following parameters are measured during the test:

    • Round-trip time (RTT);
    • Socket to the process() callback latency on the simple router;
    • The put() callback to socket latency on the simple router.

The test configuration scheme (70 Sender (Initiator) sessions to 20 Receiver (Acceptor) sessions) is shown below:

  

Environment

Machine 1:

CPU: Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (2 CPUs, Hyper-Threading enabled, 20 cores)
RAM: 128 GB, 2133 MHz
NIC: Solarflare Communications SFC9120
HDD

CentOS 7, 3.10.0-123
SolarFlare driver version: 4.1.0.6734a
firmware-version: 4.2.2.1003 rx1 tx1


Machine 2:

CPU: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz (2 CPUs, Hyper-Threading enabled, 24 cores)
RAM: 128 GB, 2133 MHz
NIC: Solarflare Communications SFC9120
HDD

CentOS 7, 3.10.0-123
SolarFlare driver version: 4.1.0.6734a
firmware-version: 4.2.2.1003 rx1 tx1

Results

Test results for different numbers of Sender sessions and Receiver threads (the 50th percentile values) are presented in the table below.

Number of   Rate per    Number of    Total rate,  | Without thread pool               | With thread pool
senders     sender,     receivers    msg/s        | RTT/2,  Socket to   put() to     | RTT/2,  Socket to   put() to
(sources)   msg/s       (UserApps)                | ns      process()   socket       | ns      process()   socket
                                                  |         latency, ns latency, ns  |         latency, ns latency, ns
---------------------------------------------------------------------------------------------------------------------
1           1,000       10           1,000        | 4,788   385         591          | 5,147   780         678
1           5,000       10           5,000        | 4,688   372         556          | 5,210   -           -
1           10,000      10           10,000       | 4,644   391         574          | 5,196   -           -
5           1,000       10           5,000        | 5,035   401         579          | 5,346   788         662
5           5,000       10           25,000       | 5,024   419         552          | 5,508   -           -
5           10,000      10           50,000       | 4,935   410         558          | 5,301   -           -
20          1,000       20           20,000       | 5,517   -           -            | 5,920   843         684
20          5,000       20           100,000      | 5,597   -           -            | 5,865   -           -
20          10,000      20           200,000      | 5,504   -           -            | 5,799   -           -
70          1,000       20           70,000       | 6,479   473         620          | 6,712   882         680
70          5,000       20           350,000      | 6,550   436         550          | 6,835   -           -
70          10,000      20           700,000      | 7,213   -           -            | 7,243   -           -

Receivers (UserApps) are the acceptor sessions available to route back to; a dash marks a value not reported.

The histograms below show the distribution of RTT/2 at 1,000, 5,000, and 10,000 msg/s message rates for the configuration with 70 senders and 20 receivers, both without and with a thread pool.

The following histogram compares the distribution of RTT/2 at a 10,000 msg/s message rate for the configuration with 70 senders and 20 receivers, without and with a thread pool.