Monitoring and Management of EPAM B2BITS products
This document describes the monitoring and management features of FIXEdge C++.
Email alerting and notifications
Several application events can be handled on the Business Layer (BL) via rules and scripts. These events can be passed to the SMTP Transport Adapter, which converts them to notification e-mails.
See configuring SMTP adaptor for details.
# | Event Description | Details | Severity/Demand |
---|---|---|---|
1 | A session has been connected | Detect and react to session connection attempts. | Minor |
2 | A session has been established | Detect if the session passed all BL rules (like a credentials check) and FIXEdge established the connection. The reaction to it can be scripted. | Major |
3 | A session has been disconnected | Detect if a session connection has been closed or dropped. The reaction to it can be scripted. A dropped connection during working hours should be treated as a critical alert. | Critical |
4 | Business and routing rule failure | Detect a failure that causes message loss, so that a user can react to it and save the data. The reaction to it can be scripted. It is useful if there are complex business logic and scripts. However, using log monitoring can be more convenient. | Major |
5 | Can't deliver a message to the counterparty | Detect if FIXEdge can't deliver the message to the counterparty, e.g. because the counterparty session doesn't exist. The reaction to it can be scripted. | Major |
6 | Session Reject (35=3) message has been received | Detect and react to rejects from the counterparty. Happens when FIXEdge sends messages that the counterparty considers invalid. When FIXEdge routes messages to exchanges, any event related to a Session Reject means that the solution is working unexpectedly. | Minor. The severity depends on the business case. |
7 | Session Reject (35=3) message has been sent (outgoing session-level reject) | Detect and react to rejects sent from FIXEdge to the counterparty. Happens when the counterparty sends a message to FIXEdge that is invalid from the FIX standard or FIXEdge dictionary perspective. When FIXEdge routes messages to exchanges, any event related to a Session Reject means that the solution is working unexpectedly. Processing of such an event can be scripted. | Minor. The severity depends on the business case. |
System protection events | |||
8 | Memory has been exhausted | Memory exhaustion causes application failure. Works on Windows. Can be covered by other system monitoring tools. | Critical |
9 | Counterparty sent too many messages per workday | Configurable with the IncomingMessagesLimit session parameter. Relevant for server-type applications with many sessions. | Minor |
10 | The counterparty is sending messages too fast | Configurable with the IncomingThroughputLimit session parameter. Relevant for server-type applications with many sessions. It is not recommended for a solution with only 1 session. | Minor |
11 | FIXEdge is sending messages to a slow consumer | Slow consumption by the counterparty makes the internal memory queue grow, which can lead to exhaustion of system resources. Configurable with the OutgoingQueueSize session parameter. | Major |
Other internal events | |||
12 | Received Logon | | Minor |
13 | Received Logout | | Minor |
14 | Detected Sequence Gap | | Minor |
15 | Detected Fatal error | | Critical |
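Events 6 and 7 in the table hinge on recognizing a Session Reject, that is, a FIX message with MsgType 35=3. The following is a minimal Python sketch of that check for an external tool that receives raw tag=value messages. It is not the FIXEdge BL scripting API; the function names `parse_fix` and `describe_session_reject` are hypothetical. The tags used are standard FIX: 49 (SenderCompID), 56 (TargetCompID), 45 (RefSeqNum), 373 (SessionRejectReason), and 58 (Text).

```python
from typing import Dict, Optional

SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw: str) -> Dict[str, str]:
    """Split a raw tag=value FIX message into a {tag: value} dict.

    Simplification: repeated tags (repeating groups) overwrite each other,
    which is acceptable for the header-level tags read below.
    """
    fields: Dict[str, str] = {}
    for part in raw.strip(SOH).split(SOH):
        if not part:
            continue
        tag, _, value = part.partition("=")
        fields[tag] = value
    return fields


def describe_session_reject(raw: str) -> Optional[str]:
    """Return a short alert text for a Session Reject (35=3), else None."""
    fields = parse_fix(raw)
    if fields.get("35") != "3":
        return None
    return (
        f"Session Reject from {fields.get('49', '?')} to {fields.get('56', '?')}: "
        f"RefSeqNum={fields.get('45', '?')}, "
        f"reason={fields.get('373', '?')}, text={fields.get('58', '')}"
    )


if __name__ == "__main__":
    # Hypothetical sample message; the CompIDs are made up for illustration.
    sample = SOH.join(
        ["8=FIX.4.4", "35=3", "49=CPTY", "56=FIXEDGE", "45=7", "373=1",
         "58=Required tag missing"]
    ) + SOH
    print(describe_session_reject(sample))
```

The returned text could serve as the body of a notification e-mail or as a log line for a monitoring system.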
Log monitoring and integration with event monitoring systems
FIXEdge writes log records about its activity while it runs.
Logs can be stored in files on disk or sent directly over TCP, so FIXEdge can be integrated with log monitoring systems such as Splunk (see How to integrate FIXEdge with Splunk).
Log records have different severities, allowing monitoring systems to filter important events (see FIXEdge logs format).
FIXEdge can also send lifecycle events in CEF (Common Event Format) via TCP (see How to configure forwarding FIXEdge lifecycle events to ArcSight). This lets the user get information about FIXEdge start and stop.
# | Event Description | Details | Severity/Demand |
---|---|---|---|
FIXEdge lifecycle events | |||
1 | Application is starting and the log system is initialized | FIXEdge generates the AppStarting event. | Minor |
2 | Properties are loaded and sessions are run | FIXEdge generates the AppStarted event. | Minor |
3 | All components of the application were started | Detect the moment the application has started. FIXEdge generates the AppReady event. | Major |
4 | Application has crashed | Some crashes can't be captured. FIXEdge generates the AppFailed event. | Critical |
5 | Signal SIGINT or SIGTERM is detected | Detect manual termination of the process with signals SIGINT and SIGTERM. FIXEdge generates the AppSigTermDetected event. | Major |
6 | Application is shutting down (stop sessions / destroy objects) | FIXEdge generates the AppShutdown event. | Minor |
7 | All work is done, stopped as planned | Detect graceful termination of FIXEdge. FIXEdge generates the AppComplete event. | Major |
Other important events that can be found in the logs and are worth monitoring | |||
8 | A gap in sequence numbers and a Resend Request from FIXEdge | | Major |
9 | Validation errors | | Minor |
10 | Session level rejects | | Minor |
11 | Mass disconnections | Simultaneous disconnection of multiple sessions. | Critical |
12 | Any other errors and warnings | | Minor |
13 | Any FATAL record | | Critical |
14 | License is expired | | Critical |
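A monitoring system that receives the lifecycle events has to pick the event name out of each CEF record and decide how urgent it is. The sketch below, in Python, parses the standard CEF header (seven pipe-separated fields after the `CEF:` prefix, followed by an extension) and maps the event names from the table above to their listed severities. Which header field FIXEdge uses for the event name is not stated here, so the code checks both Signature ID and Name; that choice, the function names, and the vendor, product, and version values in the sample record are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# Severity per lifecycle event, copied from the table above.
LIFECYCLE_SEVERITY = {
    "AppStarting": "Minor",
    "AppStarted": "Minor",
    "AppReady": "Major",
    "AppFailed": "Critical",
    "AppSigTermDetected": "Major",
    "AppShutdown": "Minor",
    "AppComplete": "Major",
}

HEADER_KEYS = ["cef_version", "vendor", "product", "device_version",
               "signature_id", "name", "severity"]


def parse_cef_header(line: str) -> Dict[str, str]:
    """Parse the CEF header fields; any syslog prefix before 'CEF:' is ignored."""
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("not a CEF record")
    # Split on unescaped pipes only. Simplification: an escaped backslash
    # directly before a pipe is not handled.
    parts = re.split(r"(?<!\\)\|", line[start + 4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    header = dict(zip(HEADER_KEYS, parts))
    header["extension"] = parts[7] if len(parts) > 7 else ""
    return header


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (event name, severity) for a known lifecycle event, else None."""
    header = parse_cef_header(line)
    # Assumption: the event name is carried in Signature ID or in Name.
    for candidate in (header["signature_id"], header["name"]):
        if candidate in LIFECYCLE_SEVERITY:
            return candidate, LIFECYCLE_SEVERITY[candidate]
    return None


if __name__ == "__main__":
    # Hypothetical record; the real field values may differ.
    record = "CEF:0|EPAM|FIXEdge|6.4.0|AppFailed|Application failed|10|msg=crash"
    print(classify(record))
```

An alerting rule could then page on "Critical" results and only record the "Minor" ones.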
Integration with Windows Events Log and Linux Syslog
FIXEdge raises events in the Windows Events Log, or writes information about critical failures to Syslog on Linux, if other ways of logging are not available.
For example:
- FIXEdge initialization failed.
- Logging is not permitted.
- Logging failed due to insufficient disk space.
FIXICC Monitoring and management
FIX Integrated Control Center (FIXICC) is an application providing out-of-the-box monitoring and administration capabilities for FIXEdge and any applications embedding FIX Antenna C++, FIX Antenna Java, or FIX Antenna .Net. FIXICC is a Java-based standalone application that runs on any platform.
The most commonly used features are:
- FIXEdge current state
- The list of configured sessions and their state
- Session sequence number
- Session parameters
- Start FIXEdge / Stop FIXEdge
- Start session / Stop session
- Reset / set session sequence
- Send a message to the session
REST admin API for FIXEdge monitoring and management
The REST admin API is available since FIXEdge version 6.4.0. Users can get session states and push them to an external monitoring system.
List of REST API commands:
- Reload BL configuration
- Set the sequence number for a session
- Remote FIXEdge shutdown
- Get a list of sessions
- Get session parameters
- Get the last 1000 messages for the session
- Send a message to the session
- Stop/start a session
- Add a session
- Remove a session
If you want to integrate FIXEdge with your monitoring system via the REST API, let us know.
Message monitoring with FIXEye
The FIXEye log analyzer is a user-friendly tool for analyzing FIX messages.
List of useful features:
- Get all messages from the logs (FIX-message logs or application logs)
- Show all trades for a specific instrument/ID
- Search / filter for the message type
- Search / filter for the message content
- Send e-mail notifications on session state changes with FIX Event viewer
- Send e-mail notifications if the order acknowledgment time (the delay between receiving an order and sending a response) is exceeded
Recommendations for checks that should be implemented by 3rd party monitoring software
It is also critical to monitor system-level indicators such as disk space utilization.
Popular items to be checked via other software:
- Free disk space
- CPU usage
- Memory usage
- fixicc-agent is running
- Number of file descriptors in use
- SSL certificate is up-to-date
- FIXEdge license file exists
- FIXEdge configuration file exists
- FIXEdge pid file exists
- Sessions logs exist
- FIXEdge process is running
- FIXEdge is accepting connections on its port
Popular monitoring systems support these metrics out of the box. Clients usually add these checks to their own monitoring systems.
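For environments without such a monitoring system, a few of the checks listed above can be approximated with a short script. The following Python sketch uses only the standard library and covers free disk space, the pid file with its process, and the listening port. The function names are hypothetical, the pid file is assumed to contain a single process ID, and the paths, host, and port have to be taken from the actual installation.

```python
import os
import shutil
import socket


def free_disk_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of free space on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def pid_file_process_alive(pid_file: str) -> bool:
    """True if the pid file exists and names a live process (POSIX only)."""
    if os.name != "posix":
        # On Windows os.kill() would terminate the process, so refuse to run.
        raise NotImplementedError("this check is written for POSIX systems")
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # file missing, unreadable, or not a number
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("free disk fraction:", round(free_disk_fraction("."), 3))
```

Note that a successful TCP connect only shows that the port is open; it does not prove that a FIX session can log on.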
Examples of custom monitoring metrics implemented for other clients as solution customizations
EPAM provides services for implementing customized monitoring metrics and KPIs.
For example:
- The total count of messages passed through the system
- The total number of business messages processed per FIX session
- The total number of business rejects received from or sent to a FIX session
- The total number of rejects received from or sent to a FIX session
- The total number of messages with certain content coming into the system
- Average message burst size
- System latency
- Session availability %
- Data consistency in the DB: comparison of messages in the logs with messages stored in the DB
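As an illustration of one metric from this list, session availability % can be derived from session connect and disconnect events. The Python sketch below is not the delivered implementation; it assumes a hypothetical input of `(timestamp, is_up)` state changes, for example collected from the session established and session disconnected events, and reports the share of a time window during which the session was up.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def session_availability(
    events: Iterable[Tuple[datetime, bool]],
    window_start: datetime,
    window_end: datetime,
    initially_up: bool = False,
) -> float:
    """Percentage of [window_start, window_end) during which the session was up.

    `events` are (timestamp, is_up) state changes; the order does not matter.
    `initially_up` is the state assumed before the first event.
    """
    total = window_end - window_start
    if total <= timedelta(0):
        raise ValueError("window_end must be after window_start")

    up = initially_up
    cursor = window_start
    up_time = timedelta(0)
    for ts, is_up in sorted(events):
        if ts <= window_start:
            up = is_up  # only establishes the state at the window start
            continue
        if ts >= window_end:
            break
        if up:
            up_time += ts - cursor
        cursor = ts
        up = is_up
    if up:
        up_time += window_end - cursor
    return 100.0 * (up_time / total)


if __name__ == "__main__":
    day = datetime(2024, 1, 2)  # arbitrary example date
    changes = [
        (day.replace(hour=9), True),    # session established
        (day.replace(hour=12), False),  # session disconnected
        (day.replace(hour=13), True),   # session re-established
    ]
    print(session_availability(changes, day.replace(hour=9), day.replace(hour=17)))
```

Restricting the window to working hours matches the guidance above that a dropped connection during working hours is the critical case.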
Please contact sales@btobits.com to request monitoring metrics customization and integration with 3rd party monitoring software.