Highlight Alerting Mechanism
The mechanism works like a fuel gauge: each time an error condition is encountered in any element, a counter is decreased; at all other times it is increased. When the counter falls below pre-set thresholds, Highlight signals first an amber condition and then a red one. The conditions that decrease each counter are listed below, grouped under Stability, Load and Health, which we refer to as level 3 metrics.
Some thresholds are hard coded in Highlight; Admin users can define others. Targets for performance tests are defined during setup. Thresholds and targets that can be changed are labelled "configurable" below.
How quickly a heat tile changes colour is governed by the sensitivity settings. See the sensitivity settings documentation for details.
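The fuel gauge principle described above can be illustrated with a minimal sketch. This is not Highlight's implementation: the class name `FuelGauge`, the starting level, the step sizes, the amber and red thresholds, and the name "green" for the normal state are all assumptions chosen for illustration. The step sizes stand in for the sensitivity settings, which determine how quickly a tile changes colour.

```python
from dataclasses import dataclass


@dataclass
class FuelGauge:
    """Hypothetical counter for one level 3 metric (Stability, Load or Health)."""

    level: int = 100        # assumed starting value (full gauge)
    maximum: int = 100      # assumed upper bound
    amber_below: int = 60   # assumed amber threshold
    red_below: int = 30     # assumed red threshold
    step_down: int = 10     # assumed decrease per sample with an error
    step_up: int = 5        # assumed increase per clean sample

    def update(self, error_seen: bool) -> str:
        """Apply one sample to the gauge and return the resulting colour."""
        if error_seen:
            self.level = max(0, self.level - self.step_down)
        else:
            self.level = min(self.maximum, self.level + self.step_up)
        return self.colour()

    def colour(self) -> str:
        """Map the current level to a colour using the two thresholds."""
        if self.level < self.red_below:
            return "red"
        if self.level < self.amber_below:
            return "amber"
        return "green"


gauge = FuelGauge()
# Eight consecutive error samples drain the gauge through amber into red.
history = [gauge.update(error_seen=True) for _ in range(8)]
```

With these assumed values a single error does not change the colour; only a sustained run of errors drains the gauge far enough to cross a threshold, and clean samples refill it gradually.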
Dormant interfaces are not expected to pass traffic and only send stability alerts if the device is uncontactable.
Decrease the counter if any of the following conditions are met, which we refer to as level 2 metrics affecting stability:
The device loses connection with either of the Highlight pollers
The monitored interface is down, is indicating a brief outage, has been taken out of service, or no longer exists
There has been a device restart
A switch port designated as critical is down
Performance tests: ICMP Ping, UDP Echo and TCP Open: 100% packet loss of all tests in a sample
Performance test: MOS: Application failure, MOS is less than 1.0
Performance test: Precision Delay: 100% packet loss of all tests in a sample. The health index is also affected.
Decrease the counter if any of these conditions are met, which we refer to as level 2 metrics affecting load:
Link utilisation (traffic in or out) exceeds threshold (default is 80% - configurable)
Tunnel utilisation (traffic in or out) exceeds threshold (default is 82%)
Traffic in or out on a dormant watch exceeds a threshold (default is 1000 Kbps - configurable)
CPU for a router exceeds 60%
WiFi Client Count exceeds a threshold (default is 30 client devices - configurable)
Wireless Utilisation exceeds a threshold (default is 50% - configurable)
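As a rough illustration of the load conditions above, the sketch below compares a set of readings against the default thresholds quoted in the list. The dictionary keys and the function name `load_breaches` are invented for this example and are not Highlight identifiers. "Exceeds" is taken to mean strictly greater than the threshold.

```python
# Default thresholds quoted in the list above.
# The key names are made up for this sketch.
LOAD_DEFAULTS = {
    "link_utilisation_pct": 80,      # configurable
    "tunnel_utilisation_pct": 82,
    "dormant_traffic_kbps": 1000,    # configurable
    "router_cpu_pct": 60,
    "wifi_client_count": 30,         # configurable
    "wireless_utilisation_pct": 50,  # configurable
}


def load_breaches(readings, thresholds=LOAD_DEFAULTS):
    """Return the names of readings that exceed their threshold."""
    return [
        name
        for name, value in readings.items()
        if name in thresholds and value > thresholds[name]
    ]


# CPU at exactly 60% does not exceed the threshold, so it is not reported.
breaches = load_breaches(
    {"link_utilisation_pct": 85, "router_cpu_pct": 60, "wifi_client_count": 31}
)
# A configurable threshold can be overridden without touching the defaults.
custom = {**LOAD_DEFAULTS, "link_utilisation_pct": 90}
breaches_custom = load_breaches({"link_utilisation_pct": 85}, custom)
```

For link and tunnel utilisation, the caller would pass whichever of the inbound or outbound figures is higher, since either direction exceeding the threshold counts.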
Decrease the counter if any of these conditions are met, which we refer to as level 2 metrics affecting health:
Link errors exceed a threshold (default is 1% or 10,000 packets per million - configurable)
Link congestion occurs:
Queue length exceeds 0
Discards exceed a threshold (default is 1% or 10,000 packets per million - configurable)
Class drops exceed 0 (configurable)
Broadband Clarity:
The connection speed of the broadband service drops below the speed threshold, which is auto-learned or manually set - configurable
Cellular Clarity:
The signal strength score of the cellular service drops below a threshold (default is 0 - configurable)
WiFi:
Congestion (discards) exceeds a threshold (default is 1% or 10,000 packets per million - configurable) or
Signal Problems exceed a threshold (default is 25% - configurable)
Performance tests - ICMP Ping, UDP Echo and TCP Open:
Any one of the tests in a sample shows response exceeds the target (configurable)
At least one test in a sample fails to respond (lost packet)
Note: One sample can contain up to six test results; if all tests in a sample fail it affects stability and health
Performance tests - Precision Delay and MOS (configurable):
| Condition | Precision Delay | MOS |
| --- | --- | --- |
| Average response from the burst exceeds response target | Yes | Yes |
| Percentage of lost packets exceeds packet loss target | Yes | Yes |
| Jitter measured over the burst exceeds jitter target | Yes | Yes |
| MOS Score is less than target | N/A | Yes |
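The rules for ICMP Ping, UDP Echo and TCP Open samples listed earlier interact with the stability rules: a sample in which every test is lost affects both stability and health, while a partial loss or a slow response affects health only. The sketch below expresses that reading of the rules. The function name `classify_sample` and the use of `None` to represent a lost packet are assumptions made for the example.

```python
from typing import Optional, Sequence, Set


def classify_sample(
    responses_ms: Sequence[Optional[float]], target_ms: float
) -> Set[str]:
    """Return the level 3 metrics whose counter a sample would decrease.

    Each entry is a response time in milliseconds, or None for a lost packet.
    """
    if not responses_ms:
        return set()

    affected = set()
    lost = [r is None for r in responses_ms]

    if all(lost):
        # 100% packet loss of all tests in a sample affects stability.
        affected.add("stability")
    if any(lost):
        # At least one test failing to respond affects health.
        affected.add("health")
    if any(r is not None and r > target_ms for r in responses_ms):
        # Any one response exceeding the target affects health.
        affected.add("health")
    return affected


all_lost = classify_sample([None] * 6, target_ms=100)
one_lost = classify_sample([20.0, None, 30.0], target_ms=100)
one_slow = classify_sample([20.0, 150.0], target_ms=100)
clean = classify_sample([20.0, 30.0], target_ms=100)
```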
Note that each level 2 metric above can trigger its own alert. For example, you may get an alert when a heat tile goes red because of Inbound Link utilisation, then another alert caused by Outbound Link utilisation, even though the tile is already red. The tile colour represents the worst case of all level 2 metrics associated with it.
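The worst-case rule and the per-metric alerting can be sketched as follows. The metric names, the "green" state and the helper functions `tile_colour` and `new_alerts` are hypothetical. Treating any change to amber or red on a single metric as a new alert is an assumption made for this example, not a statement of how Highlight decides when to send one.

```python
# Assumed ordering of states, from best to worst.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


def tile_colour(metric_states):
    """Return the worst colour among the level 2 metric states on a tile."""
    return max(metric_states.values(), key=SEVERITY.__getitem__, default="green")


def new_alerts(previous, current):
    """Return metrics that changed to amber or red since the last check."""
    return [
        name
        for name, colour in current.items()
        if colour != "green" and previous.get(name, "green") != colour
    ]


before = {"inbound_link_utilisation": "red", "outbound_link_utilisation": "green"}
after = {"inbound_link_utilisation": "red", "outbound_link_utilisation": "red"}

# The tile was already red, yet the outbound metric still raises its own alert.
tile_before = tile_colour(before)
tile_after = tile_colour(after)
alerts = new_alerts(before, after)
```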