Elastic Scaling¶
The core Clearwater nodes have the ability to elastically scale; in other words, you can grow and shrink your deployment on demand, without disrupting calls or losing data.
This page explains how to use this elastic scaling function when using a deployment created through the automated or manual install processes. Note that, although the instructions differ between the automated and manual processes, the underlying operations that will be performed on your deployment are the same. The automated process simply uses Chef to drive them rather than requiring you to issue the commands manually.
Before scaling your deployment¶
Before scaling up or down, decide how many Bono, Sprout, Homestead, Homer and Ralf nodes you need (i.e. your target size). Base this on your call load profile and measurements of current systems. As a rule of thumb from experience, we recommend scaling up a tier of a given type (Sprout, Bono, etc.) when the average CPU utilization within that tier reaches ~60%. The Deployment Sizing Spreadsheet may also provide useful input.
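The ~60% guideline can be expressed as a small check. The following is a minimal sketch, not part of Clearwater: the function name tier_needs_scale_up is hypothetical, and it assumes you have already collected one CPU utilization percentage per node in the tier by whatever means you normally use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Clearwater tool): succeeds when the average
# CPU utilization across a tier has reached the given threshold.
# Usage: tier_needs_scale_up <threshold-percent> <cpu-percent>...
tier_needs_scale_up() {
    local threshold="$1"; shift
    # With no samples there is nothing to average, so report "cannot decide".
    [ "$#" -gt 0 ] || return 2
    # Average the per-node figures and compare against the threshold.
    printf '%s\n' "$@" | awk -v t="$threshold" '
        { sum += $1; n++ }
        END { exit ((sum / n >= t) ? 0 : 1) }'
}

# Example with made-up figures for three Sprout nodes.
if tier_needs_scale_up 60 55 62 70; then
    echo "scale up"
else
    echo "no action"
fi
```

The check only looks at the average for one tier, so run it once per node type.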
Performing the resize¶
If you did an Automated Install¶
To resize your automated deployment, run:
knife deployment resize -E <env> --sprout-count <n> --bono-count <n> --homer-count <n> --homestead-count <n> --ralf-count <n>
Here each <n> is the number of nodes of that type you need. Once this command has completed, the resize operation is finished and any nodes that are no longer needed will have been terminated.
More detailed documentation on the available Chef commands is available here.
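As an illustration, the command line above can be assembled from a target size before it is run. This is a sketch only: resize_command is a hypothetical wrapper, the environment name and counts are made up, and it prints the command instead of executing it so that it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print (do not run) the resize command.
# Usage: resize_command <env> <sprout> <bono> <homer> <homestead> <ralf>
resize_command() {
    if [ "$#" -ne 6 ]; then
        echo "usage: resize_command <env> <sprout> <bono> <homer> <homestead> <ralf>" >&2
        return 1
    fi
    local count
    # Reject any node count that is not a non-negative whole number.
    for count in "${@:2}"; do
        case "$count" in
            ''|*[!0-9]*) echo "invalid node count: $count" >&2; return 1 ;;
        esac
    done
    printf 'knife deployment resize -E %s --sprout-count %s --bono-count %s --homer-count %s --homestead-count %s --ralf-count %s\n' "$@"
}

# Example with a made-up environment name and target size.
resize_command staging 3 2 1 2 2
```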
If you did a Manual Install¶
Follow these instructions if you manually installed your deployment and are using Clearwater’s automatic clustering and configuration sharing functionality.
If you’re scaling up your deployment, follow this process:
- Spin up new nodes, following the standard install process, but with the following modifications:
  - Set the etcd_cluster so that it only includes the nodes that are already in the deployment (so it does not include the nodes being added).
  - Stop when you get to the “Provide Shared Configuration” step. The nodes will learn their configuration from the existing nodes.
- Wait until the new nodes have fully joined the existing deployment. To check if a node has joined the deployment:
  - Run /usr/share/clearwater/clearwater-cluster-manager/scripts/check_cluster_state. This should report that the local node is in all of its clusters and that the cluster is stable.
  - Run sudo /usr/share/clearwater/clearwater-config-manager/scripts/check_config_sync. This reports when the node has learned its configuration.
- Update DNS to contain the new nodes.
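The "wait until the new nodes have joined" step amounts to repeating a check until it passes. Below is a minimal, generic polling sketch. The function wait_until is hypothetical, and because this page does not specify the exact output or exit codes of the two check scripts, the pass condition is left for you to supply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it exits 0.
# Usage: wait_until <max-attempts> <delay-seconds> <command> [args...]
wait_until() {
    local attempts="$1" delay="$2"; shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use (assumption: you know what text the script prints once the
# cluster is stable; the placeholder below must be replaced before use):
# wait_until 30 10 sh -c \
#   '/usr/share/clearwater/clearwater-cluster-manager/scripts/check_cluster_state | grep -q "<text indicating a stable cluster>"'
```

Bounding the number of attempts means a node that never joins results in a failure rather than an indefinite wait.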
If you’re scaling down your deployment, follow this process:
- Update DNS to contain the nodes that will remain after the scale-down.
- On each node that is about to be turned down:
  - Run monit unmonitor -g <node-type>. For example, for a sprout node: monit unmonitor -g sprout. On a homestead node also run monit unmonitor -g homestead-prov.
  - Start the main process quiescing.
    - Sprout - sudo service sprout quiesce
    - Bono - sudo service bono quiesce
    - Homestead - sudo service homestead stop && sudo service homestead-prov stop
    - Homer - sudo service homer stop
    - Ralf - sudo service ralf stop
    - Ellis - sudo service ellis stop
    - Memento - sudo service memento stop
  - Unmonitor the Clearwater management processes:
    - sudo monit unmonitor clearwater_cluster_manager
    - sudo monit unmonitor clearwater_config_manager
    - sudo monit unmonitor -g etcd
  - Run sudo service clearwater-etcd decommission. This will cause the nodes to leave their existing clusters.
- Once the above steps have completed, turn down the nodes.
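The per-node part of this procedure can be laid out as a dry run. The sketch below only prints the commands listed above, in order, for a given node type, and executes nothing. The function names quiesce_command and turn_down_plan are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the quiesce/stop command for a node type,
# as listed in the steps above.
quiesce_command() {
    case "$1" in
        sprout|bono)
            echo "sudo service $1 quiesce" ;;
        homestead)
            echo "sudo service homestead stop && sudo service homestead-prov stop" ;;
        homer|ralf|ellis|memento)
            echo "sudo service $1 stop" ;;
        *)
            echo "unknown node type: $1" >&2
            return 1 ;;
    esac
}

# Dry run: print the full turn-down sequence for one node, in order.
turn_down_plan() {
    local type="$1" cmd
    cmd="$(quiesce_command "$type")" || return 1
    echo "monit unmonitor -g $type"
    if [ "$type" = homestead ]; then
        echo "monit unmonitor -g homestead-prov"
    fi
    echo "$cmd"
    echo "sudo monit unmonitor clearwater_cluster_manager"
    echo "sudo monit unmonitor clearwater_config_manager"
    echo "sudo monit unmonitor -g etcd"
    echo "sudo service clearwater-etcd decommission"
}

turn_down_plan sprout
```

Printing the plan first makes it easy to review the sequence for each node type before running the commands by hand.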
If you did a Manual Install without Automatic Clustering¶
Follow these instructions if you manually installed your deployment but are not using Clearwater’s automatic clustering and configuration sharing functionality.
If you’re scaling up your deployment, follow this process:
- Spin up new nodes, following the standard install process.
- On Sprout and Ralf nodes, update /etc/clearwater/cluster_settings to contain both a list of the old nodes (servers=...) and a (longer) list of the new nodes (new_servers=...) and then run service <process> reload to re-read this file. Do the same on Memento nodes, but use /etc/clearwater/memento_cluster_settings as the file.
- On new Memento, Homestead and Homer nodes, follow the instructions on the Cassandra website to join the new nodes to the existing cluster.
- On Sprout and Ralf nodes, update /etc/chronos/chronos_cluster.conf to contain a list of all the nodes (see here for details of how to do this) and then run service chronos reload to re-read this file.
- On Sprout, Memento and Ralf nodes, run service astaire reload to start resynchronization.
- On Sprout and Ralf nodes, run service chronos resync to start resynchronization of Chronos timers.
- Update DNS to contain the new nodes.
- On Sprout, Memento and Ralf nodes, wait until Astaire has resynchronized, either by running service astaire wait-sync or by polling over SNMP.
- On Sprout and Ralf nodes, wait until Chronos has resynchronized, either by running service chronos wait-sync or by polling over SNMP.
- On all nodes, update /etc/clearwater/cluster_settings and /etc/clearwater/memento_cluster_settings to just contain the new list of nodes (servers=...) and then run service <process> reload to re-read this file.
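The cluster settings file therefore passes through two states: a transitional one listing both the old and new node sets, and a final one listing only the new set. The sketch below prints both states to standard output without touching /etc. The function names are hypothetical, and the comma-separated address:port entries are an assumption about the list format, using made-up addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the contents of the cluster settings file at
# each stage of a resize. Nothing is written to disk.

# During the resize: both the old list and the new list are present.
# Usage: cluster_settings_during_resize <old-list> <new-list>
cluster_settings_during_resize() {
    printf 'servers=%s\nnew_servers=%s\n' "$1" "$2"
}

# After resynchronization has finished: only the new list remains.
# Usage: cluster_settings_after_resize <new-list>
cluster_settings_after_resize() {
    printf 'servers=%s\n' "$1"
}

# Made-up example: growing from two nodes to three.
# ASSUMPTION: entries are written as address:port, separated by commas.
old="10.0.0.1:11211,10.0.0.2:11211"
new="10.0.0.1:11211,10.0.0.2:11211,10.0.0.3:11211"

cluster_settings_during_resize "$old" "$new"
cluster_settings_after_resize "$new"
```

The same two-stage pattern applies to a scale-down, with the new list shorter than the old one.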
If you’re scaling down your deployment, follow this process:
- Update DNS to contain the nodes that will remain after the scale-down.
- On Sprout and Ralf nodes, update /etc/clearwater/cluster_settings to contain both a list of the old nodes (servers=...) and a (shorter) list of the new nodes (new_servers=...) and then run service <process> reload to re-read this file. Do the same on Memento nodes, but use /etc/clearwater/memento_cluster_settings as the file.
- On leaving Memento, Homestead and Homer nodes, follow the instructions on the Cassandra website to remove the leaving nodes from the cluster.
- On Sprout and Ralf nodes, update /etc/chronos/chronos_cluster.conf to mark the nodes that are being scaled down as leaving (see here for details of how to do this) and then run service chronos reload to re-read this file.
- On Sprout, Memento and Ralf nodes, run service astaire reload to start resynchronization.
- On the Sprout and Ralf nodes that are staying in the Chronos cluster, run service chronos resync to start resynchronization of Chronos timers.
- On Sprout, Memento and Ralf nodes, wait until Astaire has resynchronized, either by running service astaire wait-sync or by polling over SNMP.
- On Sprout and Ralf nodes, wait until Chronos has resynchronized, either by running service chronos wait-sync or by polling over SNMP.
- On Sprout, Memento and Ralf nodes, update /etc/clearwater/cluster_settings and /etc/clearwater/memento_cluster_settings to just contain the new list of nodes (servers=...) and then run service <process> reload to re-read this file.
- On the Sprout and Ralf nodes that are staying in the cluster, update /etc/chronos/chronos_cluster.conf so that it only contains entries for the staying nodes in the cluster and then run service chronos reload to re-read this file.
- On each node that is about to be turned down:
  - Run monit unmonitor -g <node-type>. For example, for a sprout node: monit unmonitor -g sprout. On a homestead node also run monit unmonitor -g homestead-prov.
  - Start the main process quiescing.
    - Sprout - sudo service sprout quiesce
    - Bono - sudo service bono quiesce
    - Homestead - sudo service homestead stop
    - Homer - sudo service homer stop
    - Ralf - sudo service ralf stop
    - Ellis - sudo service ellis stop
    - Memento - sudo service memento stop
- Turn down each of these nodes once the process has terminated.