Overview
Cluster service is a distributed service that allows finding its nodes within a network automatically, and controlling the presence of no more than one leader node.
Cluster implementation is scalable and customizable and provides several endpoints for integration.
The ClusterManager
interface manages interaction between an application and a cluster. In addition, it allows subscription for the current node state and for the cluster state.
With ClusterManager,
it is possible to subscribe for two groups of events:
LocalNodeLeaderListener
– notifications about leader eventsLocalNodeBackupListener
– notifications about backup events
An application can either use both listeners at the same time or only one that best fits a certain goal. For example, if an active node should have an active server, you only have to implement LocalNodeLeaderListener
and put start and stop server calls into its methods.
To make it easier, we also provide LocalNodeLeaderListenerAdaptor
and LocalNodeBackupListenerAdaptor.
With their help you should only override necessary methods.
The default implementation of the Cluster Manager is based on Hazelcast. We use Hazelcast as a distributed cache with a cluster state. Hazelcast also helps resolve nodes within a network.
Hazelcast supports several different transports including multicast and TCP. The default configuration uses multicast so you must have multicast enabled on your network for this to work or update the cluster.xml
configuration file.
Cluster implementation can also configure the quorum for automatic leader election.
Current implementation allows you to automatically select the leader at the cluster init only (when there was no leader at all). If, for some reason, the leader disappears from the cluster, a new one will need to be selected in manual mode.
Sample of Cluster Manager API use
The Cluster Manager is provided with built-in default settings. This allows you to start the Cluster service with minimum effort.
public class ClusterSample { public static void main(String[] args) { ClusterManager clusterManager = new HazelcastClusterManager(); //(1) clusterManager.addLocalNodeLeaderListener(new LocalNodeLeaderListenerAdaptor() { @Override public void onGranted() { //(2) System.out.println("This node was elected as a leader"); } @Override public void onRevoked() { //(3) System.out.println("This node isn't leader any more"); } }); clusterManager.join(); //(4) clusterManager.electLeader(clusterManager.localNode().id()); //(5) //pause 1 sec try { Thread.sleep(1000); } catch (InterruptedException e) { } //print nodes clusterManager.nodes().stream().forEach(System.out::println); //(6) clusterManager.leave(); //(7) } }
- Create an instance of the Cluster Manager with default configuration.
- This method will be called when the node becomes a leader.
- This method will be called when the leader node services should be stopped.
- Join a node to the cluster.
- Assign this node as a cluster leader.
- Print all nodes in this cluster.
- Leave the cluster.
In this case, the cluster will use the 224.2.2.3:54327 UDP multicast address for communication. The size of the quorum for the default cluster is equal to 2 (the leader will be selected automatically if there are two or more nodes in the cluster).
Cluster service lifecycle
The cluster service uses the exchange of events between nodes to manage the cluster and notify nodes about state changes.
Joining a new node to a cluster
After calling the join()
method, the node subscribes to cluster events. The rest of the cluster nodes will also receive a notification about the new node in the cluster. On the node that is a leader, the LocalNodeLeaderListener.backupAdded()
method will be called.
If a leader is present in the cluster, then the current node will be started in backup mode. If the leader is absent, then it may be initiated by the leader election procedure.
Automatic election of a new leader
The procedure of automatic leader selection is started at the cluster init.
Each node at start receives information about available nodes in the cluster and decides whether or not it should be elected a leader in that moment. This decision is based on the presence or absence of a quorum (see Hazelcast configuration). It is considered that the cluster has a quorum (and the leader can be chosen) if
S/2+1 >= Q
where S is the number of nodes in the cluster at the given moment and Q is the configured quorum size.
If the quorum size is not specified, it is considered that even one node already composes a quorum.
Current implementation only automatically chooses a leader in the cluster when the cluster did not have a leader before.
The first time leader selection algorithm is:
If the new node has decided that it is necessary to choose a leader, it sends the
LEADER_EVENT
event to the cluster and offers a new leader.Once the proposed new leader gets the
LEADER_EVENT
event with its ID, it notifies the application about its new status by callingLocalNodeLeaderListener.onGranted()
and sends theLEADER_STARTED_EVENT
event with its ID back to the cluster.When the rest of the cluster nodes receive the
LEADER_STARTED_EVENT
event , they set their state to BACKUP (mark themselves as backup), and notify their applications about the new status by calling LocalNodeBackupListener.onBackup().
You can check the presence of the leader in the cluster by using HazelcastClusterManager.hasLeader()
method.
Appointment of a new leader
A new leader can be assigned by calling the HazelcastClusterManager.electLeader()
method and passing the node ID. A new leader can be assigned whether or not a cluster has a leader. If a cluster already has a leader, it will be deactivated.
The algorithm of assigning a new leader is:
The node that calls the
HazelcastClusterManager.electLeader()
method sendsthe clusterLEADER_EVENT
event a new leader ID.When the backup node receives the
LEADER EVENT
event, it calls theLocalNodeBackupListener.offBackup()
method.The node that was elected as the new leader launches a timer and waits for the old leader to be finished. This mechanism was introduced to minimize the possibility of two leaders in the cluster working simultaneously. The old leader needs some time to complete its processes. A timer is used to protect the cluster from endless waiting when the old leader becomes inaccessible or crashes while leaders are being switched.
The old leader receives the
LEADER_EVENT
event and notifies its application about the status change by callingLocalNodeLeaderListener.onRevoked()
. After this call is completed successfully, it sends aLEADER_STOPPED_EVENT
event to the cluster.When the new leader node receives a
LEADER_STOPPED_EVENT
event, (or if it doesn’t receive this event duringtimeoutLeaderShutdown
period), it notifies the application about the new status by callingLocalNodeLeaderListener.onGranted()
and sends to the clusterLEADER_STARTED_EVENT
event with its ID.When the rest of the cluster nodes receive the
LEADER_STARTED_EVENT
event, they change their state to BACKUP (mark themselves as backup) and notify their applications about the new status by callingLocalNodeBackupListener.onBackup()
.
Recall the leader
There is a way to recall the leader. After recalling, all nodes will be in the backup state.
The algorithm of recalling the leader is the following:
The node that calls the
HazelcastClusterManager.recallLeader()
method sends theLEADER_RECALL_EVENT
event with the leader ID to the cluster.The current leader receives the
LEADER_RECALL_EVENT
event and notifies its application about the status change by callingLocalNodeLeaderListener.onRevoked()
. After the successful completion of this call, it sends theLEADER_STOPPED_EVENT
event to the cluster.
Automatic leader re-election
Since version 1.4.0, by default, the new leader can be automatically elected when the existing leader is gone. Configuration options are described here.