SNMP Statistics¶
Clearwater provides a set of statistics about the performance of each Clearwater node over SNMP. Currently, this is available on Bono, Sprout, Ralf and Homestead nodes, and the MMTel, Call Diversion, Memento and Gemini Application Server nodes. A number of the statistics we offer detail the current state of the system in use, and as such are termed ‘Stateful’ statistics. These statistics will be marked as ‘(stateful)’ below, and more information on them can be found here.
Configuration¶
These SNMP statistics require:
- the clearwater-snmpd package to be installed for all node types
- the clearwater-snmp-handler-astaire package to be installed for Sprout and Ralf nodes
These packages will be automatically installed when installing through the Chef automation system; for a manual install, you will need to install the packages with sudo apt-get install.
Usage¶
Clearwater nodes provide SNMP statistics over port 161 using SNMP v2c and community clearwater. The MIB definition file can be downloaded from here.
Our SNMP statistics are provided through plugins or subagents to the standard SNMPd packaged with Ubuntu, so querying port 161 (the standard SNMP port) on a Clearwater node will provide system-level stats like CPU% as well as any available Clearwater stats.
To load the MIB file, allowing you to refer to MIB objects by name, first place it in the ~/.snmp/mibs directory. To load the MIB file for just the current session, run export MIBS=+PROJECT-CLEARWATER-MIB. To load the MIB file every time, add the line mibs +PROJECT-CLEARWATER-MIB to a snmp.conf file in the ~/.snmp directory.
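If you set up several workstations, the snmp.conf step above can be scripted. The following is a minimal sketch, not part of Clearwater: the helper names with_mibs_line and update_snmp_conf are hypothetical, and it assumes the default ~/.snmp/snmp.conf location described above. It appends the mibs line only when it is not already present, so running it twice is harmless.

```python
from pathlib import Path
from typing import Optional

# The line the text above asks you to add to ~/.snmp/snmp.conf.
MIBS_LINE = "mibs +PROJECT-CLEARWATER-MIB"


def with_mibs_line(conf_text: str, line: str = MIBS_LINE) -> str:
    """Return conf_text with `line` appended, unless it is already present."""
    existing = [entry.strip() for entry in conf_text.splitlines()]
    if line in existing:
        return conf_text
    # Make sure the previous content ends with a newline before appending.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"


def update_snmp_conf(conf_path: Optional[Path] = None) -> None:
    """Create or update snmp.conf so the Clearwater MIB is loaded every time.

    Hypothetical helper: defaults to ~/.snmp/snmp.conf when no path is given.
    """
    if conf_path is None:
        conf_path = Path.home() / ".snmp" / "snmp.conf"
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    current = conf_path.read_text() if conf_path.exists() else ""
    conf_path.write_text(with_mibs_line(current))
```

Calling update_snmp_conf() with no arguments edits the file in your home directory; pass a different path to try it out on a scratch file first.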
If a statistic is indexed by time period, then it displays the relevant statistics over:
- the previous five-second period
- the current five-minute period
- the previous five-minute period
For example, a stat queried at 12:01:33 would display the stats covering:
- 12:01:25 - 12:01:30 (the previous five-second period)
- 12:00:00 - 12:01:33 (the current five-minute period)
- 11:55:00 - 12:00:00 (the previous five-minute period)
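The period boundaries in the example above can be worked out mechanically. The sketch below is illustrative only: the function name stat_periods is hypothetical, and it assumes the periods are aligned to wall-clock five-second and five-minute boundaries, as they are in the example. Check against a live node before relying on that alignment.

```python
from datetime import datetime, timedelta


def stat_periods(query_time: datetime) -> dict:
    """Return (start, end) pairs for the three periods covered by a query.

    Assumes periods are aligned to wall-clock boundaries, as in the
    12:01:33 example in the text.
    """
    # Start of the five-second slot containing the query time.
    five_sec_start = query_time.replace(
        second=query_time.second - query_time.second % 5, microsecond=0
    )
    # Start of the five-minute slot containing the query time.
    five_min_start = query_time.replace(
        minute=query_time.minute - query_time.minute % 5, second=0, microsecond=0
    )
    return {
        "previous_5_seconds": (five_sec_start - timedelta(seconds=5), five_sec_start),
        "current_5_minutes": (five_min_start, query_time),
        "previous_5_minutes": (five_min_start - timedelta(minutes=5), five_min_start),
    }


if __name__ == "__main__":
    # Reproduce the example from the text (the date itself is arbitrary).
    for name, (start, end) in stat_periods(datetime(2024, 1, 1, 12, 1, 33)).items():
        print(name, start.time(), "-", end.time())
```

Note that the current five-minute period is still in progress when queried, so its values cover less than five minutes of data.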
All latency values are in microseconds.
Many of the statistics are stored in SNMP tables (although the MIB file should be examined to determine exactly which ones). The full table can be retrieved by using the snmptable command. For example, the Initial Registrations table for Sprout can be retrieved by running:
snmptable -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::sproutInitialRegistrationTable
The individual table elements can be accessed using:
snmpget -v2c -c clearwater <ip> <table OID>.1.<column>.<row>
For example, the Initial Registrations table has the OID .1.2.826.0.1.1578918.9.3.9, so the number of initial registration attempts in the current five-minute period can be retrieved by:
snmpget -v2c -c clearwater <ip> .1.2.826.0.1.1578918.9.3.9.1.2.2
or by:
snmpget -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::sproutInitialRegistrationAttempts.scopeCurrent5MinutePeriod
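When polling from a script, it can help to build the element OID and the snmpget arguments in one place. The following sketch follows the <table OID>.1.<column>.<row> pattern and the example OID given above. The helper names (element_oid, snmpget_args, run_snmpget) are hypothetical, and run_snmpget assumes the net-snmp command-line tools are installed on the machine running the script.

```python
import subprocess

# OID of the Sprout Initial Registrations table, as quoted in the text.
INITIAL_REG_TABLE_OID = ".1.2.826.0.1.1578918.9.3.9"


def element_oid(table_oid: str, column: int, row: int) -> str:
    """Build <table OID>.1.<column>.<row> for a single table element."""
    return f"{table_oid.rstrip('.')}.1.{column}.{row}"


def snmpget_args(host: str, oid: str, community: str = "clearwater") -> list:
    """Argument list for an SNMP v2c snmpget, suitable for subprocess.run."""
    return ["snmpget", "-v2c", "-c", community, host, oid]


def run_snmpget(host: str, oid: str) -> str:
    """Run snmpget and return its raw output (requires net-snmp tools)."""
    result = subprocess.run(
        snmpget_args(host, oid), capture_output=True, text=True, check=True
    )
    return result.stdout


# Column 2, row 2 is the number of initial registration attempts in the
# current five-minute period, matching the example in the text.
ATTEMPTS_CURRENT_5_MIN = element_oid(INITIAL_REG_TABLE_OID, 2, 2)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the host or community ever contains unusual characters.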
The snmpwalk command can be used to discover the list of queryable OIDs beneath a certain point in the MIB tree. For example, you can retrieve all of the entries in the Sprout Initial Registrations table using:
snmpwalk -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::sproutInitialRegistrationTable
Running
snmpwalk -v2c -c clearwater <ip> PROJECT-CLEARWATER-MIB::projectClearwater
will output a very long list of all available Clearwater stats.
Note that running an ‘snmpget’ on a table OID will result in a “No Such Object available on this agent at this OID” message.
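If you collect snmpwalk or snmpget output in a script, you will usually want to turn it into name/value pairs and to spot the “No Such Object” case described above. The sketch below assumes the usual net-snmp output layout of one "OID = TYPE: VALUE" entry per line; the function names are hypothetical, and the exact type shown for each Clearwater statistic should be checked against real output.

```python
# Message text returned when the queried OID is not a readable object,
# e.g. when snmpget is run against a table OID.
NO_SUCH_OBJECT = "No Such Object available on this agent at this OID"


def parse_snmp_line(line: str):
    """Split one 'OID = TYPE: VALUE' output line into (oid, type, value).

    Returns (oid, None, None) when the agent reports no such object.
    """
    oid, _, rest = line.partition(" = ")
    rest = rest.strip()
    if rest == NO_SUCH_OBJECT:
        return oid.strip(), None, None
    value_type, _, value = rest.partition(": ")
    return oid.strip(), value_type, value


def parse_snmpwalk(output: str) -> dict:
    """Map each OID in snmpwalk output to its value, skipping missing objects."""
    values = {}
    for line in output.splitlines():
        if " = " not in line:
            continue  # ignore blank lines and anything that is not an entry
        oid, value_type, value = parse_snmp_line(line)
        if value_type is not None:
            values[oid] = value
    return values
```

Values are returned as strings; convert them with int() once you have confirmed from the MIB file that the object is numeric.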
Bono statistics¶
Bono nodes provide the following statistics:
- The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
- The average latency, variance, highest latency and lowest latency for SIP requests, indexed by time period.
- The number of parallel TCP connections to each Sprout node.
- The number of incoming requests, indexed by time period.
- The number of requests rejected due to overload, indexed by time period.
- The average request queue size, variance, highest queue size and lowest queue size, indexed by time period.
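The latency statistics are easier to read on a dashboard when converted to milliseconds, with the variance shown as a standard deviation. The following is a small sketch under two assumptions: the function name summarize_latency is hypothetical, and the variance is taken to be in microseconds squared, which follows if it is computed over the microsecond latency values.

```python
import math


def summarize_latency(average_us, variance_us2, highest_us, lowest_us) -> dict:
    """Convert one row of latency statistics from microseconds to milliseconds.

    The variance is assumed to be in microseconds squared, so its square
    root (the standard deviation) is in microseconds.
    """
    return {
        "average_ms": average_us / 1000.0,
        "stddev_ms": math.sqrt(variance_us2) / 1000.0,
        "highest_ms": highest_us / 1000.0,
        "lowest_ms": lowest_us / 1000.0,
    }
```

The same conversion applies to any of the latency tables described on this page, since all latency values are reported in microseconds.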
Sprout statistics¶
Sprout nodes provide the following statistics:
- The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
- The average latency, variance, highest latency and lowest latency for SIP requests, indexed by time period.
- The average latency, variance, highest latency and lowest latency for requests to Homestead, indexed by time period.
- The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impi/<private ID>/av endpoint, indexed by time period.
- The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impi/<private ID>/registration-status endpoint, indexed by time period.
- The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impu/<public ID>/reg-data endpoint, indexed by time period.
- The average latency, variance, highest latency and lowest latency for requests to Homestead’s /impu/<public ID>/location endpoint, indexed by time period.
- The average latency, variance, highest latency and lowest latency for requests to Homer, indexed by time period.
- The number of attempts, successes and failures for AKA authentications on register requests, indexed by time period (AKA authentication attempts with a correct response but that fail due to the sequence number in the nonce being out of sync are counted as successes).
- The number of attempts, successes and failures for SIP digest authentications on register requests, indexed by time period (authentication attempts with a correct response but that fail due to being stale are counted as failures).
- The number of attempts, successes and failures for authentications on non-register requests, indexed by time period.
- The number of attempts, successes and failures for initial registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
- The number of attempts, successes and failures for re-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
- The number of attempts, successes and failures for de-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
- The number of attempts, successes and failures for third-party initial registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
- The number of attempts, successes and failures for third-party re-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
- The number of attempts, successes and failures for third-party de-registrations, indexed by time period (registrations that fail due to failed authentication are counted in the authentication stats and not here).
- The number of requests routed by the S-CSCF according to a route pre-loaded by an app server, indexed by time period.
- The number of parallel TCP connections to each Homestead node.
- The number of parallel TCP connections to each Homer node.
- The number of incoming SIP requests, indexed by time period.
- The number of requests rejected due to overload, indexed by time period.
- The average request queue size, variance, highest queue size and lowest queue size, indexed by time period.
- The number of Memcached buckets needing to be synchronized and buckets already resynchronized during the current Astaire resynchronization operation (overall, and for each peer).
- The number of Memcached entries, and amount of data (in bytes) already resynchronized during the current Astaire resynchronization operation.
- The transfer rate (in bytes/second) of data during this resynchronization, over the last 5 seconds (overall, and per bucket).
- The number of remaining nodes to query during the current Chronos scaling operation.
- The number of timers, and number of invalid timers, processed over the last 5 seconds.
- The total number of timers being managed by a Chronos node at the current time.
- The weighted average of total timer count, variance, highest timer count, lowest timer count, indexed by time period.
- The number of attempts, successes and failures for incoming SIP transactions for the ICSCF, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions for the ICSCF, indexed by time period and request type.
- The number of attempts, successes and failures for incoming SIP transactions for the SCSCF, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions for the SCSCF, indexed by time period and request type.
- The number of attempts, successes and failures for incoming SIP transactions for the BGCF, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions for the BGCF, indexed by time period and request type.
- The permitted request rate (PRR) is an estimate of the request rate that can be sustained without causing large latency. Sprout provides a weighted average permitted request rate, variance, highest PRR, and lowest PRR, indexed by time period.
- The value of the smoothed latency at the last permitted request rate update.
- The value of the target (maximum permissible) latency at the last permitted request rate update.
- The number of penalties experienced at the last permitted request rate update.
- The current permitted request rate.
- The number of incoming INVITE transactions for the S-CSCF that were cancelled before a 1xx response was seen, indexed by time period.
- The number of incoming INVITE transactions for the S-CSCF that were cancelled after a 1xx response was seen, indexed by time period (these INVITE cancellation statistics can be used to distinguish between the case where an INVITE was cancelled because the call rang but wasn’t answered and the case where it failed due to network issues and never got through in the first place).
- The average count, variance, and high and low watermarks for the number of registrations, indexed by time period. (stateful)
- The average count, variance, and high and low watermarks for the number of bindings, indexed by time period. (stateful)
- The average count, variance, and high and low watermarks for the number of subscriptions, indexed by time period. (stateful)
- The number of registrations active at the time queried. (stateful)
- The number of bindings active at the time queried. (stateful)
- The number of subscriptions active at the time queried. (stateful)
Ralf statistics¶
Ralf nodes provide the following statistics:
- The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details).
- The number of Memcached buckets needing to be synchronized and buckets already resynchronized during the current Astaire resynchronization operation (overall, and for each peer).
- The number of Memcached entries, and amount of data (in bytes) already resynchronized during the current Astaire resynchronization operation.
- The transfer rate (in bytes/second) of data during this resynchronization, over the last 5 seconds (overall, and per bucket).
- The number of remaining nodes to query during the current Chronos scaling operation.
- The number of timers, and number of invalid timers, processed over the last 5 seconds.
- The average count, variance, and high and low watermarks for the number of calls, indexed by time period. (stateful)
- The number of calls active at the time queried. (stateful)
Homestead Statistics¶
Homestead nodes provide the following statistics:
- The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
- The average latency, variance, highest call latency and lowest latency on HTTP requests, indexed by time period.
- The average latency, variance, highest latency and lowest latency on the Cx interface, indexed by time period.
- The average latency, variance, highest latency and lowest latency on Multimedia-Auth Requests on the Cx interface, indexed by time period.
- The average latency, variance, highest latency and lowest latency on Server-Assignment, User-Authorization and Location-Information Requests on the Cx interface, indexed by time period.
- The number of incoming requests, indexed by time period.
- The number of requests rejected due to overload, indexed by time period.
- The total number of Diameter requests with an invalid Destination-Realm or invalid Destination-Host, indexed by time period.
- The number of Multimedia-Auth-Answers with a given result-code received over the Cx interface, indexed by time period.
- The number of Server-Assignment-Answers with a given result-code received over the Cx interface, indexed by time period.
- The number of User-Authorization-Answers with a given result-code received over the Cx interface, indexed by time period.
- The number of Location-Information-Answers with a given result-code received over the Cx interface, indexed by time period.
- The number of Push-Profile-Answers with a given result-code sent over the Cx interface, indexed by time period.
- The number of Registration-Termination-Answers with a given result-code sent over the Cx interface, indexed by time period.
Call Diversion App Server Statistics¶
Call Diversion App Server nodes provide the following statistics:
- The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.
Memento App Server Statistics¶
Memento App Server nodes provide the following statistics:
- The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.
MMTel App Server Statistics¶
MMTel App Server nodes provide the following statistics:
- The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.
Gemini App Server Statistics¶
Gemini App Server nodes provide the following statistics:
- The number of attempts, successes and failures for incoming SIP transactions, indexed by time period and request type.
- The number of attempts, successes and failures for outgoing SIP transactions, indexed by time period and request type.