Monitoring and Management of EPAM B2BITS products

This document describes the FIXEdge C++ monitoring and management features.

Email alerting and notifications

Several application events can be handled on the Business Layer (BL) via rules and scripts. These events can be passed to the SMTP Transport Adapter, which converts them into notification e-mails.

See configuring SMTP adaptor for details.


Event Description

Details

Severity/Demand

1

A session has been connected

Detect and react to session connection attempts.

Minor

2

A session has been established

Detect if the session passed all BL rules (like a credentials check) and FIXEdge established the connection.

The reaction to this event can be scripted.

Major

3

A session has been disconnected

Detect if a session connection has been closed or dropped.

The reaction to this event can be scripted.

A dropped connection during working hours should be treated as a critical alert.

Critical


4

Business and routing rule failure

Detect a failure that causes message loss, so that the user can react to it and save the data.

The reaction to this event can be scripted.

This is useful when the business logic and scripts are complex. However, log monitoring can be more convenient.

Major


5

Can't deliver a message to the counterparty

Detect if FIXEdge can't deliver the message to the counterparty, for example, when the counterparty session doesn't exist.

The reaction to this event can be scripted.

Major

6

Session Reject (35=3) message has been received

Detect and react to rejects from the counterparty.

Happens when FIXEdge sends messages that the counterparty considers invalid.

When FIXEdge routes messages to exchanges, any event related to a Session Reject means that the solution is not working as expected.

Minor

The severity depends on the business case.


7

Session Reject (35=3) message has been sent (outgoing session-level reject)

Detect and react to rejects sent from FIXEdge to counterparty.

Happens when the counterparty sends FIXEdge a message that is invalid from the FIX standard or FIXEdge dictionary perspective.

When FIXEdge routes messages to exchanges, any event related to a Session Reject means that the solution is not working as expected.

Processing of such an event can be scripted in the following way:

 Business layer event and JavaScript

BL_Config.xml:

<OnOutgoingSessionLevelRejectEvent>
  <Source>
    <FixSession TargetCompID=".*" SenderCompID=".*"/>
  </Source>
  <Action>
    <Script Language="JavaScript" FileName="FIXEdge1/conf/email.js" />
    <Send  Name="TestSMTPClient"/>
  </Action>
</OnOutgoingSessionLevelRejectEvent>


email.js:

// Read the details of the outgoing Reject (35=3) before a new message is parsed.
seqNo = getStringField(45);      // RefSeqNum: sequence number of the rejected message
tagNo = getStringField(371);     // RefTagID: tag that caused the reject
msgType = getStringField(372);   // RefMsgType: type of the rejected message
rejCode = getStringField(373);   // SessionRejectReason
rejDetails = getStringField(58); // Text: free-form explanation

// Parse an Email (35=C) template with five text lines (33=5) to carry the notification.
parseMessage("8=FIX.4.4\x019=56\x0135=C\x0149=TEST\x0156=TEST\x0134=14\x0152=20030204-09:25:43\x01164=1\x0194=0\x01147=Session-level Reject notification\x0133=5\x0158=Text1\x0158=Text2\x0158=Text3\x0158=Text4\x0158=Text5\x0110=139\x01");

// Get the repeating group of text lines (tag 33); each entry holds one Text (58) field.
lines = getGroup(33);
isGroupValid(lines);

// Fill the five lines; optional tags that are absent are reported as UNKNOWN.
setStringField(lines, 0, 58, "Message with sequence number " + seqNo + " was rejected");

if (msgType != undefined) {setStringField(lines, 1, 58, "Type of message is " + msgType);}
else {setStringField(lines, 1, 58, "Type of message is UNKNOWN");}

if (tagNo != undefined) {setStringField(lines, 2, 58, "Invalid tag number is " + tagNo);}
else {setStringField(lines, 2, 58, "Invalid tag number is UNKNOWN");}

if (rejCode != undefined) {setStringField(lines, 3, 58, "Reject reason code is " + rejCode + ". For details please refer to https://btobits.com/fixopaedia/fixdic44/index.html?tag_373_SessionRejectReason.html");}
else {setStringField(lines, 3, 58, "Reject reason code is UNKNOWN");}

if (rejDetails != undefined) {setStringField(lines, 4, 58, "Explanation of reject: " + rejDetails);}
else {setStringField(lines, 4, 58, "Explanation of reject: UNKNOWN");}


The script processes the outgoing session-level reject, formats the important data, and the result is sent by e-mail, for example:

Message with sequence number 2 was rejected
Type of message is UNKNOWN
Invalid tag number is 35
Reject reason code is 11. For details please refer to https://btobits.com/fixopaedia/fixdic44/index.html?tag_373_SessionRejectReason.html
Explanation of reject: Invalid MsgType. Parsing stopped at column: 20 at tag MsgType (35) in message  with sequence number 2.
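
The formatting logic of email.js can be checked outside FIXEdge before it is deployed. The following is a minimal sketch in plain JavaScript (for example, run with Node.js). The function buildRejectLines and its input object are hypothetical names used only for illustration and are not part of the FIXEdge scripting API. It produces the same five text lines from the values of tags 45, 371, 372, 373 and 58; unlike email.js, it also falls back to UNKNOWN when tag 45 is absent.

```javascript
// Hypothetical helper, not part of the FIXEdge scripting API.
// `fields` maps tag numbers of a Reject (35=3) message to their values.
function buildRejectLines(fields) {
  // Report absent tags as "UNKNOWN", as email.js does for optional tags.
  function valueOrUnknown(tag) {
    var value = fields[tag];
    return value === undefined || value === null ? "UNKNOWN" : String(value);
  }

  var rejCode = valueOrUnknown(373);
  var reasonLine = "Reject reason code is " + rejCode;
  if (rejCode !== "UNKNOWN") {
    // The reference link is added only when a reason code is present.
    reasonLine += ". For details please refer to " +
      "https://btobits.com/fixopaedia/fixdic44/index.html?tag_373_SessionRejectReason.html";
  }

  return [
    "Message with sequence number " + valueOrUnknown(45) + " was rejected",
    "Type of message is " + valueOrUnknown(372),
    "Invalid tag number is " + valueOrUnknown(371),
    reasonLine,
    "Explanation of reject: " + valueOrUnknown(58)
  ];
}

// Example input modeled on the notification shown above (tag 372 is absent).
var sample = { 45: "2", 371: "35", 373: "11", 58: "Invalid MsgType." };
console.log(buildRejectLines(sample).join("\n"));
```

Keeping the line-building logic in one function makes it easy to adjust the wording and verify it before copying the changes into the script that FIXEdge runs.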

Minor

The severity depends on the business case.


System protection events



8

Memory has been exhausted.

Exhausting memory causes application failure.

Works on Windows.

Can be covered by some other system monitoring tools.

Critical


9

Counterparty sent too many messages per workday

Configurable with the IncomingMessagesLimit session parameter.

Relevant for server-type applications with many sessions.

Minor

10

The counterparty is sending messages too fast.

Configurable with the IncomingThroughputLimit session parameter.

Relevant for server-type applications with many sessions.

Not recommended for a solution with only one session.

Minor

11

FIXEdge is sending messages to a slow consumer

Slow consumption by the counterparty makes the internal in-memory queue grow, which can exhaust system resources.

Configurable with the OutgoingQueueSize session parameter.

Major


Other internal events

12

Received Logon


Minor

13

Received Logout

Minor

14

Detected Sequence Gap

Minor

15

Detected Fatal error


Critical

Log monitoring and integration with event monitoring systems

FIXEdge writes log records about its activity while it is running.

Logs can be stored in files on disk or sent directly over TCP, so FIXEdge can be integrated with log monitoring systems such as Splunk (see How to integrate FIXEdge with Splunk).

Log records have different severities, which allows monitoring systems to filter important events (see FIXEdge logs format).

FIXEdge can also send lifecycle events in CEF (Common Event Format) over TCP (see How to configure forwarding FIXEdge lifecycle events to ArcSight). This lets the user get information about FIXEdge start and stop.
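
A receiver of CEF events has to split each record into its header fields before it can filter on them. The following is a minimal sketch of such a parser in JavaScript, based on the general CEF header layout (CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension). The function name parseCefHeader and the sample record are hypothetical; the actual field values that FIXEdge emits, and the field that carries the event name, are assumptions not stated in this document.

```javascript
// Hypothetical helper: parses the seven pipe-separated CEF header fields.
// Returns null when the line does not contain a complete CEF header.
function parseCefHeader(line) {
  var start = line.indexOf("CEF:"); // a syslog prefix may precede the record
  if (start < 0) { return null; }
  var body = line.slice(start + 4);
  var fields = [];
  var current = "";
  var i = 0;
  for (; i < body.length && fields.length < 7; i++) {
    var ch = body[i];
    var next = body[i + 1];
    if (ch === "\\" && (next === "|" || next === "\\")) {
      current += next; // unescape \| and \\ inside a header field
      i++;
    } else if (ch === "|") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (fields.length === 6) { fields.push(current); } // record without extension
  if (fields.length !== 7) { return null; }
  return {
    version: fields[0],
    deviceVendor: fields[1],
    deviceProduct: fields[2],
    deviceVersion: fields[3],
    signatureId: fields[4],
    name: fields[5],
    severity: fields[6],
    extension: body.slice(i) // raw key=value pairs, left unparsed here
  };
}

// Made-up sample record; only the event name AppReady comes from this document.
var record = "CEF:0|ExampleVendor|ExampleProduct|1.0|0|AppReady|5|msg=ready";
console.log(parseCefHeader(record).name);
```

The extension part is left as a raw string because its keys depend on the sender.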


Event Description

Details

Severity/Demand


FIXEdge lifecycle events


1

The application is starting and the log system is initialized

FIXEdge generates AppStarting event

Minor

2

Properties are loaded and sessions are started

FIXEdge generates AppStarted event

Minor

3

All components of the application were started

Detect application start moment

FIXEdge generates AppReady event

Major

4

The application has crashed

Some crashes can't be captured

FIXEdge generates AppFailed event

Critical 

5

Signal SIGINT or SIGTERM is detected

Detect manual termination of the process with signals SIGINT and SIGTERM 

FIXEdge generates AppSigTermDetected event

Major

6

The application is shutting down (stopping sessions / destroying objects)

FIXEdge generates AppShutdown event

Minor

7

All work is done, stopped as planned

Detect graceful termination of FIXEdge

FIXEdge generates AppComplete event

Major


Other important events that can be found in the logs and are worth monitoring


8

A gap in sequence numbers and a Resend Request from FIXEdge


Major

9

Validation errors


Minor

10

Session level rejects


Minor

11

Mass disconnections

Simultaneous disconnection of multiple sessions.

Critical

12

Any other errors and warnings


Minor

13

Any FATAL record.


Critical

14

License has expired


Critical
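
Mass disconnections can be detected by a monitoring script with a sliding time window over disconnect events. The following is a minimal sketch in JavaScript, assuming the disconnect events have already been extracted from the logs into objects with a session name and a timestamp in milliseconds. The function name hasMassDisconnect, the event shape, and the thresholds are hypothetical and should be tuned for the actual deployment.

```javascript
// Hypothetical helper: returns true when at least `minSessions` distinct
// sessions disconnected within any window of `windowMs` milliseconds.
// Each event is { session: string, time: number (ms) }.
function hasMassDisconnect(events, minSessions, windowMs) {
  var sorted = events.slice().sort(function (a, b) { return a.time - b.time; });
  var perSession = new Map(); // session -> number of events inside the window
  var start = 0;              // index of the oldest event still in the window

  for (var end = 0; end < sorted.length; end++) {
    var name = sorted[end].session;
    perSession.set(name, (perSession.get(name) || 0) + 1);

    // Drop events that are older than the window relative to the newest one.
    while (sorted[end].time - sorted[start].time > windowMs) {
      var old = sorted[start].session;
      var left = perSession.get(old) - 1;
      if (left === 0) { perSession.delete(old); } else { perSession.set(old, left); }
      start++;
    }

    if (perSession.size >= minSessions) { return true; }
  }
  return false;
}

// Example: three different sessions drop within one second.
var drops = [
  { session: "A", time: 0 },
  { session: "B", time: 500 },
  { session: "C", time: 900 }
];
console.log(hasMassDisconnect(drops, 3, 1000));
```

Counting distinct sessions rather than raw events keeps one flapping session from raising a mass-disconnect alert.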

Integration with Windows Event Log and Linux Syslog

FIXEdge raises events in the Windows Event Log or writes information about critical failures to Syslog on Linux if the other ways of logging are not available.

For example:

  1. FIXEdge initialization failed.
  2. Logging is not permitted.
  3. Logging failed because of insufficient disk space.

FIXICC Monitoring and management

FIX Integrated Control Center (FIXICC) is an application that provides out-of-the-box monitoring and administration capabilities for FIXEdge and any application embedding FIX Antenna C++, FIX Antenna Java, or FIX Antenna .Net. FIXICC is a Java-based standalone application that runs on any platform.

See FIXICC User Guide.

The most commonly used features are:

  1. FIXEdge current state
  2. The list of configured sessions and their state
  3. Session sequence number
  4. Session parameters
  5. Start FIXEdge / Stop FIXEdge
  6. Start session / Stop session
  7. Reset / set session sequence
  8. Send a message to the session

REST admin API for FIXEdge monitoring and management

The REST admin API is available since FIXEdge version 6.4.0. Users can get session states and push them to an external monitoring system.

List of REST API commands:

  1. Reload the BL configuration
  2. Set the sequence number for the session
  3. Remote FIXEdge shutdown
  4. Get a list of sessions
  5. Get session parameters
  6. Get the last 1000 messages for the session
  7. Send a message to the session
  8. Stop/start a session
  9. Add a session
  10. Remove a session

If you want to integrate FIXEdge with your monitoring system via the REST API, let us know.

Message monitoring with FIXEye

The FIXEye log analyzer is a user-friendly tool for analyzing FIX messages.

List of useful features:

  1. Get all messages from the logs (FIX-message logs or application logs)
  2. Show all trades for a specific instrument/ID
  3. Search / filter for the message type
  4. Search / filter for the message content
  5. Send e-mail notifications on session state changes with FIX Event viewer
  6. Send e-mail notifications if the order acknowledgment time (the delay between receiving a message and sending a response) is exceeded

Recommendations for checks that should be implemented by 3rd party monitoring software

It is critical to monitor other system statuses like disk space utilization.

Popular items to be checked via other software:

  • Free disk space
  • CPU usage
  • Memory usage
  • fixicc-agent is running
  • Number of file descriptors in use
  • SSL certificate is up to date
  • FIXEdge license file exists
  • FIXEdge configuration file exists
  • FIXEdge pid file exists
  • Session logs exist
  • FIXEdge process is running
  • FIXEdge is accepting connections on its port

Popular monitoring systems support these metrics out of the box. Usually, clients add these checks to their own monitoring systems.
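
Where a ready-made check is not available, some of the items above can be scripted. The following is a minimal sketch for Node.js that covers two of them: required files exist, and a TCP port accepts connections. The function names missingFiles and isPortOpen, the file paths, and the port number are hypothetical placeholders; replace them with the values of the actual installation.

```javascript
var fs = require("fs");
var net = require("net");

// Returns the subset of `paths` that do not exist on disk.
function missingFiles(paths) {
  return paths.filter(function (p) { return !fs.existsSync(p); });
}

// Resolves to true if a TCP connection to host:port succeeds within
// timeoutMs, and to false on refusal, error, or timeout. Never rejects.
function isPortOpen(host, port, timeoutMs) {
  return new Promise(function (resolve) {
    var socket = net.connect({ host: host, port: port });
    function done(result) {
      socket.destroy(); // close the probe connection in every case
      resolve(result);
    }
    socket.setTimeout(timeoutMs);
    socket.once("connect", function () { done(true); });
    socket.once("timeout", function () { done(false); });
    socket.once("error", function () { done(false); });
  });
}

// Example run with placeholder values (assumptions, not FIXEdge defaults).
var required = ["/path/to/FIXEdge.pid", "/path/to/FIXEdge.properties"];
console.log("Missing files:", missingFiles(required));
isPortOpen("127.0.0.1", 8901, 1000).then(function (open) {
  console.log("Port is accepting connections:", open);
});
```

A probe like this only shows that the port is reachable; it does not perform a FIX Logon, so it says nothing about whether a session can actually be established.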

Examples of custom monitoring metrics implemented for other clients as solution customizations

EPAM provides services for implementing customized monitoring metrics and KPIs.

For example:

  • The total count of messages passed through the system
  • The total number of business messages processed per FIX session
  • The total number of business rejects received from or sent to a FIX session
  • The total number of rejects received from or sent to a FIX session
  • The total number of messages with certain content arriving in the system
  • Average message burst size
  • System latency
  • Session availability %
  • Data consistency in the DB: comparison of messages in the logs with messages stored in the DB
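
Several of these counters can be derived directly from raw FIX messages. The following is a minimal sketch in JavaScript that counts total messages, business messages, session-level rejects (35=3) and business rejects (35=j) per SenderCompID/TargetCompID pair. The function names parseFix and countBySession and the sample messages are hypothetical; how the raw messages are read from the FIXEdge logs is not covered here.

```javascript
var SOH = "\x01"; // standard FIX field delimiter

// Session-level (administrative) message types defined by the FIX standard.
var ADMIN_TYPES = ["0", "1", "2", "3", "4", "5", "A"];

// Parses a raw FIX message into an object keyed by tag number.
// For repeated tags only the first value is kept (enough for header fields).
function parseFix(raw) {
  var fields = Object.create(null);
  raw.split(SOH).forEach(function (pair) {
    var eq = pair.indexOf("=");
    if (eq > 0) {
      var tag = pair.slice(0, eq);
      if (!(tag in fields)) { fields[tag] = pair.slice(eq + 1); }
    }
  });
  return fields;
}

// Returns a Map from "SenderCompID->TargetCompID" to its counters.
function countBySession(rawMessages) {
  var stats = new Map();
  rawMessages.forEach(function (raw) {
    var f = parseFix(raw);
    var key = f["49"] + "->" + f["56"];
    var s = stats.get(key);
    if (!s) {
      s = { total: 0, business: 0, sessionRejects: 0, businessRejects: 0 };
      stats.set(key, s);
    }
    s.total++;
    if (ADMIN_TYPES.indexOf(f["35"]) < 0) { s.business++; }
    if (f["35"] === "3") { s.sessionRejects++; }
    if (f["35"] === "j") { s.businessRejects++; }
  });
  return stats;
}

// Made-up sample messages (shortened, without BodyLength and CheckSum).
var sampleMessages = [
  "8=FIX.4.4\x0135=D\x0149=A\x0156=B\x01",
  "8=FIX.4.4\x0135=3\x0149=A\x0156=B\x01",
  "8=FIX.4.4\x0135=j\x0149=B\x0156=A\x01"
];
countBySession(sampleMessages).forEach(function (s, key) {
  console.log(key, JSON.stringify(s));
});
```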

Please contact sales@btobits.com to request customization of monitoring metrics and integration with 3rd party monitoring software.