How to handle the "Too many open files" error

While FIXEdge is running, you may encounter a "Too many open files" error such as:

FixEdge.log
ERROR   [Engine]  139902219888384  The dispatcher catch error during select() sockets: ::socket() failed. Too many open files. (Error code = 24).

The cause is that the limit on open files has been reached on the server, so FIXEdge cannot open any more. The limit applies to file descriptors, which are used for both files and sockets.

As a result, FIXEdge cannot accept new connections, and existing connections may misbehave.

Possible reasons:

  1. Other processes have consumed all the available resources;
  2. Files opened by FIXEdge (for example, logs) were removed or moved while it was running. The process still holds descriptors for them;
  3. Unreleased or unclosed sockets.

Immediate workaround:

  1. Check the current limits for the user (`ulimit -Sn; ulimit -Hn`);
  2. Increase the limit.
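
The two workaround steps can be sketched as follows. This is a minimal example that assumes a Linux host with the `/proc` filesystem; `fd_report` is a hypothetical helper name, not a FIXEdge tool, and it defaults to the current shell's PID so that it can be tried safely.

```shell
#!/bin/sh
# Sketch only: fd_report is a hypothetical helper, not part of FIXEdge.
# It relies on the Linux /proc filesystem.
fd_report() {
    pid="${1:-$$}"   # defaults to the current shell; pass the FIXEdge PID in practice
    # Limits that apply to the already running process (soft and hard columns).
    grep 'Max open files' "/proc/$pid/limits"
    # Number of descriptors (files and sockets) the process holds right now.
    echo "open descriptors: $(ls "/proc/$pid/fd" | wc -l)"
}

# Limits that new processes started from this shell will inherit.
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# Raise the soft limit up to the hard limit for this shell and its children.
# Going beyond the hard limit requires root or a system configuration change.
ulimit -Sn "$(ulimit -Hn)"
```

A limit changed with `ulimit` affects only the current shell and the processes started from it, so FIXEdge has to be restarted from that shell for the new value to apply. Calling `fd_report <FIXEdge process ID>` before and after the restart shows whether the running process actually received the new limit.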

The FIXEdge log does not list the file descriptors open on the system. To find the root cause, you will need the output of the `lsof` and `netstat -nap` commands captured while the issue is happening.
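
One way to capture both outputs at the same moment is a small wrapper like the one below. It is a sketch, not a FIXEdge utility: the `snapshot` function name, the output file names, and the timestamp format are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: snapshot is a hypothetical helper, not part of FIXEdge.
# Usage: snapshot <pid> [output directory]
snapshot() {
    pid="$1"
    out_dir="${2:-.}"
    stamp="$(date +%Y%m%d-%H%M%S)"
    # Run both commands back to back so the outputs describe the same moment.
    lsof -p "$pid" > "$out_dir/lsof-$stamp.txt" 2>&1 \
        || echo "warning: lsof returned an error" >&2
    netstat -nap > "$out_dir/netstat-$stamp.txt" 2>&1 \
        || echo "warning: netstat returned an error" >&2
    # Print the timestamp so the caller can locate the two files.
    echo "$stamp"
}
```

For example, `snapshot <FIXEdge process ID> /tmp` writes two files that share one timestamp. On most Linux systems `netstat -nap` shows process names only for sockets owned by the calling user unless it is run as root.
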
For the initial investigation, please:

  1. While FIXEdge is running, run `lsof -p <FIXEdge process ID>`:
    a) check for records marked "(deleted)". These files were removed while FIXEdge was running. A FIXEdge restart is required to release these descriptors;
    b) check for records with "CLOSE_WAIT" or "CLOSING". These indicate problems with closing sockets.
  2. Contact your Unix Team for assistance;
  3. Collect all the information (command output, FIXEdge log, and configuration files) and send it to Support FIX Products <SupportFIXAntenna@epam.com> for further investigation.
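
The two `lsof` checks from step 1 can be automated with a short filter. The sketch below assumes the usual `lsof` text output, where removed files carry a "(deleted)" suffix and TCP sockets show their state in the last column; `summarize_lsof` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: summarize_lsof is a hypothetical helper, not part of FIXEdge.
# It reads lsof output from standard input and counts suspicious records.
summarize_lsof() {
    awk '
        /\(deleted\)/        { deleted++ }   # files removed while still open
        /CLOSE_WAIT|CLOSING/ { closing++ }   # sockets that were not fully closed
        END { printf "deleted=%d closing=%d\n", deleted, closing }
    '
}
```

It can be used as `lsof -p <FIXEdge process ID> | summarize_lsof`. Non-zero counts point to reasons 2 and 3 from the list of possible reasons, respectively.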