GroundWork Monitor offers Parent Child configurations for distributed monitoring, enabling the monitoring of a subset of an infrastructure where Child servers report the state and performance metrics to a central, or “Parent” GroundWork server.
This blog post is not about that Parent Child architecture. It covers the other kind of Parent Child: the relationships and dependencies that can be configured to control the behavior of hosts and services based on the status of one or more other hosts and services.
Alarm Storms
Receiving too many alerts can desensitize system administrators and cause issues to go unnoticed, even though alerts were sent for the event. Too many alerts can also increase your time to resolve issues, because they do not clearly identify the point of failure.
Two of the most common examples of causes for alarm storms are:
Loss of communication to a network device. Whether due to a firewall rule change or similar network failure, this type of event can cause every monitored host and service to lose network communication and in turn generate alerts for the faux failed state.
Host or application failure. A database is a great example of this scenario. Basic monitoring for an Oracle database will include checks for overall database availability, size of tablespace, locked objects, and the percentage of max processes in use. The underlying Operating System will generally also have basic checks in place for disk, cpu, memory, uptime, and ping. In this scenario, if the server were to power off, we could receive nine alerts for a single system event, with no clear indication of what exactly the problem is.
Alarm storms such as these can lead system administrators to become complacent about ticket floods, and can result in delayed response times.
Cutting Through the Noise
The scenarios which cause alarm storms can be easily mitigated by implementing GroundWork Monitor features which are easy to configure and maintain:
Parent/Child: For physical dependencies such as a network device to a group of hosts.
Host Dependencies: For logical dependencies such as an authentication server which relies on an external database on a different host to properly function.
Service Dependencies: For services whose failure causes other services to fail as well.
Using Parent/Child relationships, you can suppress alarms for all child hosts of a parent. Should the parent system fail, its child hosts are set to an UNREACHABLE state and their services to an UNKNOWN state.
With host and service dependencies configured, you can choose to disable notifications and/or service checks should a master service or host fail.
Oracle Database Scenario
For example, in our Oracle database scenario we can define the ping check as the master service and the rest of the Operating System and application checks as dependents. Should the host go down, we will then receive a single, actionable alert that the host failed to ping, and can begin working the issue rather than the tickets.
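The effect of this configuration can be illustrated with a short Python sketch. It is a simplified model of the behavior described above, not GroundWork's implementation; the check names and the `alerts_to_send` function are hypothetical.

```python
# Hypothetical check names for the Oracle host described above.
MASTER = "ping"
DEPENDENTS = [
    "db_availability", "tablespace_size", "locked_objects", "max_processes",
    "disk", "cpu", "memory", "uptime",
]


def alerts_to_send(failed, use_dependencies):
    """Return the checks that would raise an alert, sorted by name.

    failed: set of check names currently in a failed state.
    use_dependencies: when True, a failed master suppresses dependent alerts.
    """
    if use_dependencies and MASTER in failed:
        return [MASTER]
    return sorted(failed)


# The server powers off, so every check fails at once.
all_failed = set(DEPENDENTS) | {MASTER}
without_deps = alerts_to_send(all_failed, use_dependencies=False)
with_deps = alerts_to_send(all_failed, use_dependencies=True)

print(len(without_deps), "alerts without dependencies")
print(len(with_deps), "alert with dependencies:", with_deps)
```

When only a dependent check fails (for example, disk), the master is still healthy and the alert is sent as usual.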
Parent/Child Relationships Configuration
In GroundWork Monitor, navigate to Configuration > Nagios Monitoring > Hosts.
Expand the Parent Child option and click New.
Select the Parent host from the drop-down list.
Select the Child hosts from the right side, and click the Add button.
Click Save, then Commit the configuration change (Configuration > Nagios Monitoring > Control > Commit).
For example, in this image we are configuring the database servers as children, and the network device which provides connectivity as the parent. Should the network device (Parent) fail, the database servers (Children) will go into an UNREACHABLE state, and the services to an UNKNOWN state, suppressing pointless alarms and checks until the parent device has recovered from failure.
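The distinction between a failed host and an unreachable one can be sketched in Python. This is a minimal model that assumes the reachability rule used by Nagios-style engines (a failed host is DOWN if it has no parents or at least one parent is UP, and UNREACHABLE otherwise); the host names and the `host_state` function are hypothetical.

```python
def host_state(host, parents, check_ok):
    """Classify a host as UP, DOWN, or UNREACHABLE.

    parents: dict mapping a host name to the list of its parent hosts.
    check_ok: dict mapping a host name to True if its own check succeeds.
    """
    if check_ok[host]:
        return "UP"
    host_parents = parents.get(host, [])
    # A failed host with no parents, or with a parent that is UP,
    # is itself the point of failure.
    if not host_parents or any(
        host_state(p, parents, check_ok) == "UP" for p in host_parents
    ):
        return "DOWN"
    # Every parent has failed, so the host cannot be reached at all.
    return "UNREACHABLE"


# Hypothetical topology: two database servers behind one network device.
parents = {"db01": ["core-switch"], "db02": ["core-switch"]}

# The network device fails, so nothing behind it answers either.
outage = {"core-switch": False, "db01": False, "db02": False}
states = {h: host_state(h, parents, outage) for h in outage}

# In this sketch only DOWN hosts raise an alert.
alerting = [h for h, s in states.items() if s == "DOWN"]
print(states)
print("alerting:", alerting)
```

If the network device is healthy and only db01 fails its check, the same function reports db01 as DOWN, so a genuine database server failure is still alerted.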
Host Dependency Configuration
Navigate to Configuration > Nagios Monitoring > Hosts.
Expand the Host Dependencies option and click New.
On this page we can configure:
The Dependent host.
The Master host.
Whether or not to Inherit dependencies from the master. This allows you to create dependencies upon dependencies.
Execution failure criteria (Up, Down, Unreachable, Pending, None). For each item selected, if the master host is in the selected state, checks will not execute at all for the dependent hosts until the master host is in a state which is not selected.
Notification failure criteria (Up, Down, Unreachable, Pending, None). For each item selected, if the master host is in the selected state, notifications will not be processed until the master host is in a state which is not selected.
Once you are satisfied with your selections, click Add, then Commit your changes.
The image below is an example within the GroundWork UI of a monitored SAP HANA instance that is connected via a site-to-site VPN. If the VPN goes down, we also lose connectivity to the SAP HANA instance. With this configuration, we will not get false-positive alerts for connectivity issues caused by the VPN connection, and can more quickly identify the VPN connection as the point of failure.
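How the two failure criteria gate checks and notifications can be sketched as follows. This is an illustrative Python model of the rules listed above, not GroundWork code; the `HostDependency` class, the host names, and the selected criteria are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostDependency:
    """Hypothetical model of one host dependency."""
    master: str
    dependent: str
    # Master states in which checks of the dependent host are skipped.
    execution_failure_criteria: frozenset = frozenset()
    # Master states in which notifications for the dependent host are held.
    notification_failure_criteria: frozenset = frozenset()

    def checks_allowed(self, master_state):
        return master_state not in self.execution_failure_criteria

    def notifications_allowed(self, master_state):
        return master_state not in self.notification_failure_criteria


# Keep checking the SAP HANA host, but hold its notifications
# while the VPN host is DOWN or UNREACHABLE.
vpn_dep = HostDependency(
    master="site-vpn",
    dependent="sap-hana",
    notification_failure_criteria=frozenset({"DOWN", "UNREACHABLE"}),
)

for state in ("UP", "DOWN", "UNREACHABLE"):
    print(
        state,
        "checks:", vpn_dep.checks_allowed(state),
        "notifications:", vpn_dep.notifications_allowed(state),
    )
```

Leaving the execution criteria empty corresponds to selecting None: checks keep running, so the dependent host's recovery is seen as soon as the master recovers.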
Service Dependency Configuration
Navigate to Configuration > Nagios Monitoring > Services.
Expand the Service Dependencies option and click New.
On this page we can configure:
The Service dependency template name.
The master Service name.
Execution failure criteria (OK, Warning, Critical, Unknown, None). For each item selected, if the master service is in the selected state, checks will not execute at all for the dependent services until the master service is in a state which is not selected.
Notification failure criteria (OK, Warning, Critical, Unknown, None). For each item selected, if the master service is in the selected state, notifications will not be processed until the master service is in a state which is not selected.
Once you are satisfied with your selections, click Add, then Commit your changes. This will create a template that can be applied to dependent services.
Below is an example of a configuration for a master service of hdbdaemon, a service check which monitors the SAP HANA system to ensure that the process named hdbdaemon is running correctly on the host. This process controls all of the other processes the database needs in order to function, so when it is not running, the processes it is responsible for are likely to fail along with it.
Now that the template is created for the master service, it can be applied to an existing service to make it dependent on the master:
Navigate to Configuration > Nagios Monitoring > Services.
Expand Services and click the service that you want to make a dependent.
Click the Service Dependencies tab.
In the Dependency drop-down, select your new dependency template name.
For Master service host, this will usually be set to the same host – but you can also set the master service to be on a different host if you need to (usually for distributed or HA applications).
Click Add Dependency, and Commit your changes.
For this SAP HANA system specifically, we created a dependency on the master service hdbdaemon for the dependent services hdbindexserver, hdbnameserver, hdbpreprocessor, hdbwebdispatcher, and hdbxsengine. This way, when the daemon goes down, we get a single alert for the actual issue, instead of six alerts from a single event.
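For readers who want to see what such a dependency looks like as text, the sketch below renders one object per dependent service using standard Nagios `servicedependency` directives. It assumes that the GroundWork configuration corresponds to ordinary Nagios object definitions; the host name `sap-hana-01`, the `render_service_dependency` helper, and the chosen criteria values are hypothetical, and the UI remains the supported way to make the change.

```python
def render_service_dependency(host, master, dependent, master_host=None,
                              execution_criteria="n",
                              notification_criteria="c,u"):
    """Render one Nagios-style servicedependency object as text.

    master_host defaults to the dependent's host; pass another host name
    when the master service runs elsewhere (distributed or HA applications).
    Criteria use Nagios letters: o, w, u, c, p, or n for none.
    """
    fields = [
        ("host_name", master_host or host),
        ("service_description", master),
        ("dependent_host_name", host),
        ("dependent_service_description", dependent),
        ("execution_failure_criteria", execution_criteria),
        ("notification_failure_criteria", notification_criteria),
    ]
    body = "\n".join(f"    {key:<32}{value}" for key, value in fields)
    return "define servicedependency {\n" + body + "\n}"


# Dependent services named in the text; the host name is an assumption.
DEPENDENTS = [
    "hdbindexserver", "hdbnameserver", "hdbpreprocessor",
    "hdbwebdispatcher", "hdbxsengine",
]

config = "\n\n".join(
    render_service_dependency("sap-hana-01", "hdbdaemon", name)
    for name in DEPENDENTS
)
print(config)
```

Here notifications for each dependent are held while hdbdaemon is Critical or Unknown, and checks continue to run. These values are only an example, not the selections used in the original configuration.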
In Summary
Parent/Child relationships and Dependencies help IT administrators identify the actual cause of a failure and respond in a more timely manner, and they are very easy to configure.
For more information on these features, visit GroundWork Support.