Monitoring and Management of EPAM B2BITS products
- 1 Email alerting and notifications
- 2 Log monitoring and integration with event monitoring systems
- 3 Integration with Windows Events Log and Linux Syslog
- 4 FIXICC Monitoring and management
- 5 REST admin API for FIXEdge monitoring and management
- 6 Message monitoring with FIXEYE
- 7 Recommendations for checks that should be implemented by 3rd party monitoring software
- 8 Examples of custom monitoring metrics implemented for other clients as solution customization
This document describes the FIXEdge C++ monitoring and management features.
Email alerting and notifications
Several application events can be handled in the Business Layer (BL) via rules and scripts. These events can be passed to the SMTP Transport Adapter, which converts them into notification e-mails.
See configuring SMTP adaptor for details.
| # | Event Description | Details | Severity/Demand |
|---|---|---|---|
| 1 | A session has been connected | Detect and react to session connection attempts. | Minor |
| 2 | A session has been established | Detect whether the session passed all BL rules (such as a credentials check) and FIXEdge established the connection. The reaction can be scripted. | Major |
| 3 | A session has been disconnected | Detect whether a session connection has been closed or dropped. The reaction can be scripted. A dropped connection during working hours should be treated as a critical alert. | Critical |
| 4 | Business and routing rule failure | Detect a failure that causes message loss so that a user can react to it and save the data. The reaction can be scripted. This is useful when there are complex business logic and scripts. However, log monitoring can be more convenient. | Major |
| 5 | Can't deliver a message to the counterparty | Detect that FIXEdge can't deliver a message to the counterparty, e.g. because the counterparty session doesn't exist. The reaction can be scripted. | Major |
| 6 | Session Reject (35=3) message has been received | Detect and react to rejects from the counterparty. This happens when FIXEdge sends messages that the counterparty considers invalid. When FIXEdge routes messages to exchanges, any Session Reject event means that the solution is not working as expected. | Minor. The severity depends on the business case. |
| 7 | Session Reject (35=3) message has been sent (outgoing session-level reject) | Detect and react to rejects sent from FIXEdge to the counterparty. This happens when the counterparty sends a message that is invalid from the FIX standard or FIXEdge dictionary perspective. When FIXEdge routes messages to exchanges, any Session Reject event means that the solution is not working as expected. Processing of such an event can be scripted; see the example below the table. | Minor. The severity depends on the business case. |
| 8 | System protection events | | |
| 9 | Memory has been exhausted | Memory exhaustion causes application failure. Works on Windows. Can be covered by other system monitoring tools. | Critical |
| 10 | Counterparty sent too many messages per workday | Configurable with the IncomingMessagesLimit session parameter. Relevant for server-type applications with many sessions. | Minor |
| 11 | The counterparty is sending messages too fast | Configurable with the IncomingThroughputLimit session parameter. Relevant for server-type applications with many sessions. Not recommended for a solution with only 1 session. | Minor |
| 12 | FIXEdge is sending messages to a slow consumer | Slow consumption by the counterparty makes the internal memory queue grow, which can exhaust system resources. Configurable with the OutgoingQueueSize session parameter. | Major |
| 13 | Other internal events | | |
| 14 | Received Logon | | Minor |
| 15 | Received Logout | | Minor |
| 16 | Detected Sequence Gap | | Minor |
| 17 | Detected Fatal error | | Critical |

Example of scripted processing for event 7 (outgoing session-level reject).

BL_Config.xml:

    <OnOutgoingSessionLevelRejectEvent>
        <Source>
            <FixSession TargetCompID=".*" SenderCompID=".*"/>
        </Source>
        <Action>
            <Script Language="JavaScript" FileName="FIXEdge1/conf/email.js" />
            <Send Name="TestSMTPClient"/>
        </Action>
    </OnOutgoingSessionLevelRejectEvent>

email.js:

    // Read the details of the outgoing Reject (35=3) message.
    seqNo = getStringField(45);       // RefSeqNum: sequence number of the rejected message
    tagNo = getStringField(371);      // RefTagID: tag that caused the reject
    msgType = getStringField(372);    // RefMsgType: type of the rejected message
    rejCode = getStringField(373);    // SessionRejectReason
    rejDetails = getStringField(58);  // Text: free-form explanation
    // Replace the current message with an Email (35=C) template that has five text lines.
    parseMessage("8=FIX.4.4\x019=56\x0135=C\x0149=TEST\x0156=TEST\x0134=14\x0152=20030204-09:25:43\x01164=1\x0194=0\x01147=Session-level Reject notification\x0133=5\x0158=Text1\x0158=Text2\x0158=Text3\x0158=Text4\x0158=Text5\x0110=139\x01");
    // Get the LinesOfText (33) repeating group and fill in one Text (58) entry per line.
    lines = getGroup(33);
    isGroupValid(lines);
    setStringField(lines, 0, 58, "Message with sequence number " + seqNo + " was rejected");
    if (msgType != undefined) {setStringField(lines, 1, 58, "Type of message is " + msgType);}
    else {setStringField(lines, 1, 58, "Type of message is UNKNOWN");}
    if (tagNo != undefined) {setStringField(lines, 2, 58, "Invalid tag number is " + tagNo);}
    else {setStringField(lines, 2, 58, "Invalid tag number is UNKNOWN");}
    if (rejCode != undefined) {setStringField(lines, 3, 58, "Reject reason code is " + rejCode + ". For details please refer to https://btobits.com/fixopaedia/fixdic44/index.html?tag_373_SessionRejectReason.html");}
    else {setStringField(lines, 3, 58, "Reject reason code is UNKNOWN");}
    if (rejDetails != undefined) {setStringField(lines, 4, 58, "Explanation of reject: " + rejDetails);}
    else {setStringField(lines, 4, 58, "Explanation of reject: UNKNOWN");}

The outgoing session-level reject is processed by the JavaScript, and the important data is formatted and sent by e-mail:

    Message with sequence number 2 was rejected
    Type of message is UNKNOWN
    Invalid tag number is 35
    Reject reason code is 11. For details please refer to https://btobits.com/fixopaedia/fixdic44/index.html?tag_373_SessionRejectReason.html
    Explanation of reject: Invalid MsgType. Parsing stopped at column: 20 at tag MsgType (35) in message with sequence number 2.
Log monitoring and integration with event monitoring systems
FIXEdge writes log records about its activities while it is running.
Logs can be stored in files on disk or sent directly over TCP, so they can be integrated with log monitoring systems like Splunk (see How to integrate FIXEdge with Splunk).
Log records have different severities, which allows monitoring systems to filter important events (see FIXEdge logs format).
FIXEdge can also send lifecycle events in CEF (Common Event Format) via TCP (see How to configure forwarding FIXEdge lifecycle events to ArcSight). This way, the user can get information about FIXEdge start and stop.
| # | Event Description | Details | Severity/Demand |
|---|---|---|---|
| 1 | FIXEdge lifecycle events | | |
| 2 | The application is starting and the log system is initialized | FIXEdge generates the AppStarting event. | Minor |
| 3 | Properties are loaded and sessions are running | FIXEdge generates the AppStarted event. | Minor |
| 4 | All components of the application have started | Detect the moment the application has started. FIXEdge generates the AppReady event. | Major |
| 5 | The application has crashed | Some crashes can't be captured. FIXEdge generates the AppFailed event. | Critical |
| 6 | Signal SIGINT or SIGTERM is detected | Detect manual termination of the process with the SIGINT and SIGTERM signals. FIXEdge generates the AppSigTermDetected event. | Major |
| 7 | The application is shutting down (stopping sessions, destroying objects) | FIXEdge generates the AppShutdown event. | Minor |
| 8 | All work is done, stopped as planned | Detect graceful termination of FIXEdge. FIXEdge generates the AppComplete event. | Major |
| 9 | Other important events that can be found in the logs and are worth monitoring | | |
| 10 | A gap in sequence numbers and a Resend Request from FIXEdge | | Major |
| 11 | Validation errors | | Minor |
| 12 | Session-level rejects | | Minor |
| 13 | Mass disconnections | Simultaneous disconnection of multiple sessions. | Critical |
| 14 | Any other errors and warnings | | Minor |
| 15 | Any FATAL record | | Critical |
| 16 | The license has expired | | Critical |
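As an illustration of how a monitoring script might consume the lifecycle events, the sketch below parses the standard CEF header (seven pipe-separated fields after the `CEF:` prefix) and looks up the severity from the table above. It assumes that the event name (for example `AppFailed`) arrives in the CEF Name field; that placement, the function names, and the sample line are assumptions for illustration and are not taken from the FIXEdge documentation.

```javascript
// Severities of the FIXEdge lifecycle events, copied from the table above.
const LIFECYCLE_SEVERITY = new Map([
  ["AppStarting", "Minor"],
  ["AppStarted", "Minor"],
  ["AppReady", "Major"],
  ["AppFailed", "Critical"],
  ["AppSigTermDetected", "Major"],
  ["AppShutdown", "Minor"],
  ["AppComplete", "Major"],
]);

// Parse the CEF header: CEF:Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|Extension
// Inside header fields, "\|" and "\\" are escape sequences for "|" and "\".
function parseCefHeader(line) {
  const start = line.indexOf("CEF:");
  if (start === -1) return null;
  const body = line.slice(start + 4);
  const fields = [];
  let current = "";
  let i = 0;
  for (; i < body.length && fields.length < 7; i++) {
    const ch = body[i];
    if (ch === "\\" && (body[i + 1] === "|" || body[i + 1] === "\\")) {
      current += body[i + 1]; // keep the escaped character, drop the backslash
      i++;
    } else if (ch === "|") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  // Tolerate a header that ends right after the Severity field.
  if (fields.length === 6) fields.push(current);
  if (fields.length < 7) return null;
  const [version, vendor, product, deviceVersion, signatureId, name, cefSeverity] = fields;
  return { version, vendor, product, deviceVersion, signatureId, name, cefSeverity, extension: body.slice(i) };
}

// Assumption: the lifecycle event name is carried in the CEF Name field.
function classifyLifecycleEvent(line) {
  const cef = parseCefHeader(line);
  if (cef === null) return null;
  return { event: cef.name, severity: LIFECYCLE_SEVERITY.get(cef.name) ?? "Unknown" };
}
```

With a hypothetical input such as `CEF:0|Vendor|Product|1.0|1|AppFailed|10|msg=example`, `classifyLifecycleEvent` would report the `AppFailed` event with the severity `Critical`. If the real events carry the name in another field, only `classifyLifecycleEvent` needs to change.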
Integration with Windows Events Log and Linux Syslog
FIXEdge raises events in the Windows Event Log or writes information about critical failures to Syslog on Linux if the other ways of logging are not available.
For example:
- FIXEdge initialization failed.
- Logging is not permitted.
- Logging failed because there is not enough disk space.
FIXICC Monitoring and management
FIX Integrated Control Center (FIXICC) is an application that provides out-of-the-box monitoring and administration capabilities for FIXEdge and for any application embedding FIX Antenna C++, FIX Antenna Java, or FIX Antenna .NET. FIXICC is a Java-based standalone application that runs on any platform.
The most commonly used features are:
- FIXEdge current state
- The list of configured sessions and their states
- Session sequence numbers
- Session parameters
- Start FIXEdge / Stop FIXEdge
- Start session / Stop session
- Reset / set session sequence numbers
- Send a message to the session
REST admin API for FIXEdge monitoring and management
The REST admin API is available since FIXEdge version 6.4.0. Users can get session states and push them to an external monitoring system.
REST API commands:
- Reload the BL configuration
- Set the sequence number for a session
- Shut down FIXEdge remotely
- Get a list of sessions
- Get session parameters
- Get the last 1000 messages for a session
- Send a message to a session
- Stop/start a session
- Add a session
- Remove a session
If you want to integrate FIXEdge with your monitoring system via the REST API, let us know.
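Pushing session states to an external monitoring system usually means converting the session list into numeric metrics. The sketch below shows one possible conversion. The payload shape (an array of objects with `sessionId` and `state`), the state name `"Established"`, and the metric name are all hypothetical; the actual REST admin API response format should be taken from the FIXEdge REST API reference.

```javascript
// Convert a list of sessions into "up/down" gauge metrics.
// Assumption: each session object has the hypothetical fields `sessionId` and `state`.
// `upStates` lists the state names that count as "session is up".
function sessionStatesToMetrics(sessions, upStates = ["Established"]) {
  return sessions.map((session) => ({
    metric: "fixedge_session_up", // hypothetical metric name
    labels: { session: session.sessionId },
    value: upStates.includes(session.state) ? 1 : 0,
  }));
}

// Return the identifiers of the sessions that are currently not up,
// e.g. to raise an alert in the monitoring system.
function sessionsDown(sessions, upStates = ["Established"]) {
  return sessions
    .filter((session) => !upStates.includes(session.state))
    .map((session) => session.sessionId);
}
```

Keeping the conversion separate from the HTTP call makes it easy to adjust when the real field names differ from the assumed ones.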
Message monitoring with FIXEYE
FIXEye is a user-friendly log analyzer for FIX message analysis.
Useful features:
- Get all messages from the logs (FIX message logs or application logs)
- Show all trades for a specific instrument/ID
- Search / filter by message type
- Search / filter by message content
- Send e-mail notifications on session state changes with FIX Event viewer
- Send e-mail notifications if the order acknowledgment time (the delay between receiving a message and sending a response) has expired
Recommendations for checks that should be implemented by 3rd party monitoring software
It is critical to monitor other system statuses, such as disk space utilization.
Popular items to check with other software:
- Free disk space
- CPU usage
- Memory usage
- fixicc-agent is running
- Number of file descriptors in use
- SSL certificate is up to date
- FIXEdge license file exists
- FIXEdge configuration file exists
- FIXEdge pid file exists
- Session logs exist
- FIXEdge process is running
- FIXEdge is accepting connections on its port
Popular monitoring systems support these metrics out of the box. Usually, clients add these checks to their own monitoring systems.
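Where a monitoring system does not provide such checks, a few of them are simple to script. The following Node.js sketch covers three items from the list: required files exist, the pid file is readable, and the process is running. It assumes that the pid file contains only the numeric process ID; that assumption and the function names are illustrative, and the file paths have to come from the actual installation.

```javascript
const fs = require("node:fs");

// Return the paths from `paths` that do not exist (license, configuration, pid file, ...).
function missingFiles(paths) {
  return paths.filter((path) => !fs.existsSync(path));
}

// Read a process ID from a pid file.
// Assumption: the file contains just the numeric PID, optionally followed by whitespace.
function readPid(pidFile) {
  try {
    const pid = Number.parseInt(fs.readFileSync(pidFile, "utf8").trim(), 10);
    return Number.isInteger(pid) && pid > 0 ? pid : null;
  } catch {
    return null; // the file is missing or unreadable
  }
}

// Check whether a process with the given PID exists.
// Signal 0 does not terminate anything; it only tests for the existence of the process.
function isProcessRunning(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return err.code === "EPERM";
  }
}

// Combine the checks into one result object for a monitoring agent.
function checkInstallation(requiredFiles, pidFile) {
  const pid = readPid(pidFile);
  return {
    missingFiles: missingFiles(requiredFiles),
    pidFileOk: pid !== null,
    processRunning: pid !== null && isProcessRunning(pid),
  };
}
```

A scheduler such as cron could call `checkInstallation` with the installation's own paths and raise an alert when `missingFiles` is not empty or `processRunning` is false.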
Examples of custom monitoring metrics implemented for other clients as solution customization
EPAM provides services for implementing customized monitoring metrics and KPIs.
For example:
- The total number of messages passed through the system
- The total number of business messages processed per FIX session
- The total number of business rejects received from or sent to a FIX session
- The total number of rejects received from or sent to a FIX session
- The total number of messages with certain content entering the system
- Average message burst size
- System latency
- Session availability %
- Data consistency in the DB: comparison of messages in the logs with messages stored in the DB
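To make one of these metrics concrete, the sketch below computes session availability % as the share of a time window during which a session was established. The event format (objects with `time` in milliseconds and `type` equal to `"established"` or `"disconnected"`, sorted by time) is a hypothetical one chosen for illustration; in practice the events would be derived from logs or notifications.

```javascript
// Compute the percentage of the window [windowStart, windowEnd] during which
// the session was established.
// Assumption: `events` is sorted by time and uses the hypothetical shape
// { time: <milliseconds>, type: "established" | "disconnected" }.
function sessionAvailabilityPercent(events, windowStart, windowEnd, initiallyUp = false) {
  if (windowEnd <= windowStart) {
    throw new RangeError("windowEnd must be greater than windowStart");
  }
  let up = initiallyUp;
  let upSince = windowStart;
  let upMs = 0;
  for (const { time, type } of events) {
    if (time > windowEnd) break; // events after the window do not count
    const t = Math.max(time, windowStart); // clamp earlier events to the window start
    if (type === "established" && !up) {
      up = true;
      upSince = t;
    } else if (type === "disconnected" && up) {
      up = false;
      upMs += t - upSince;
    }
  }
  if (up) upMs += windowEnd - upSince; // still connected at the end of the window
  return (upMs * 100) / (windowEnd - windowStart);
}
```

For a window from 0 to 1000 ms with the session established at 100, disconnected at 600, and established again at 800, the function returns 70, because the session was up for 500 + 200 ms out of 1000 ms.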
Please contact sales@btobits.com to request customization of monitoring metrics and integration with 3rd party monitoring software.