How to use BSM to Prioritize Important Issues
We all want our monitoring systems to alert us when things go wrong. While it’s important to get alerts in the event of a failure or latency problem on something specific such as a SQL database, it’s actually just as important to not receive alerts from too many specific sources in the same alerting channel. If our monitoring system starts to fatigue us, we will ignore alerts until the phone calls and Emails from end users start letting us know a service is impaired or unavailable. Our monitoring solution should notify us both about specific failures in general and major issues, so we can differentiate and prioritize.
A single event, such as max processes in use on a database may not in itself be a problem that needs to be addressed on an emergency basis. A combination of events, though, such as a high value of max processes, a large amount of network discards, and slow response time for an http request can indicate a more general problem that is currently impacting the end users. We can easily monitor all of these conditions individually.
Read More