Agent health checks


NOTE:
The agent health checks described here check the health of the HPOM agent and of all its subagents.

Self management monitors the health of the agents on each managed node using the following mechanisms:

The management server reports the health of the agents either to the active message browser, or to the Windows event log. An event log policy that is deployed to the management server evaluates events in the event log and forwards them to the message browser. Message correlation acknowledges Node down messages automatically when a Node up message arrives.

Some sample messages generated by the server are:

Node down-messages:

Node up-messages:

NOTE:
The management server does not check the health of nodes that have an empty package inventory. Nodes can have an empty package inventory if, for example, you install the agent manually, or if you upload the node configuration from another management server. If you want the management server to start checking the health of these nodes, synchronize the package inventory.

To configure advanced agent health check options

  1. In the console tree, right-click Operations Manager, and then click Configure → Server.... The Server Configuration dialog opens.
  2. Click Namespaces, and then click Agent Health Check. A list of values appears.

  3. Change any of the values in the following table:

Health check ping protocol
  Value type: List (DISABLED, AGENTONLY, ICMPONLY, ENABLED)
  Default value: ENABLED
  Description: This value configures the default ping protocol. You can change the default for each node in the Node Properties dialog.

  • DISABLED means that the management server performs no agent health check at all.
  • AGENTONLY means that the server does not actively contact the node with ICMP pings, but still contacts the agent on the node. This is useful for nodes behind a firewall.
  • ICMPONLY means that the server does not contact the agent, but only uses ICMP pings. This is useful for managed nodes like SNMP devices that do not have an agent installed.
  • ENABLED means that all aspects of agent health check are used.

Enable health check
  Value type: Boolean (True, False)
  Default value: True
  Description: Enables or disables all aspects of the health check.

Time interval to check agent health
  Value type: Integer
  Unit: Number of seconds
  Default value: 300
  Description: The default interval at which the management server checks the health of each agent. You can change the default for each node in the Node Properties dialog.

Maximum number of parallel checks
  Value type: Integer
  Unit: Number of threads
  Default value: 100
  Description: The maximum number of parallel threads that are used to do the active check (server pings the node). After you have changed this value, restart the OvEpMessageActionServer service for the change to take effect.

Health check retries
  Value type: Integer
  Unit: 0 to 3 retries
  Default value: 0
  Description: This value configures the number of health check ping retries to do immediately if an agent could not be reached. The node is considered down when all retries have been unsuccessful. Increase this value if you have an unreliable network infrastructure.

Target for agent health problem messages
  Value type: List (SERVER, EVENTLOG, SERVER_EVENTLOG)
  Default value: SERVER
  Description: The target for messages that indicate problems with agent health checking.

  • SERVER means that these messages are directly written to the active message browser on the management server, without passing any policy-based message filter.
  • EVENTLOG means that these messages are written to the application event log so that they can be picked up by a Windows Event Log policy. The VP_SM-Server_EventLogEntries policy already contains two rules for these health messages named "forwards all health check...". These rules can be easily adapted or used as templates for your own health checking rules.
  • SERVER_EVENTLOG combines SERVER and EVENTLOG.

Severity of agent health problem messages
  Value type: List (Normal, Warning, Minor, Major, Critical)
  Default value: Critical
  Description: The severity for messages that indicate problems with agent health checking. For example, "Node xxx may be down. Failed to contact it using ping."

  If you configure the Target for agent health problem messages to include the event log, this value sets the event types as follows:

  • Normal results in information events.
  • Warning, minor, and major result in warning events.
  • Critical results in error events.

Health check report buffering
  Value type: Boolean (True, False)
  Default value: True
  Description: This value configures whether to report that an agent is buffering messages.

Severity of buffering for this management server
  Value type: List (Normal, Warning, Minor, Major, Critical)
  Default value: Major
  Description: This value configures the severity of messages that indicate that the agent is buffering messages for this management server.

Severity of buffering for other management servers
  Value type: List (Normal, Warning, Minor, Major, Critical)
  Default value: Warning
  Description: This value configures the severity of messages that indicate that the agent is buffering messages for a management server other than this one.

Enable access denied warning for raw socket creation
  Value type: Boolean (True, False)
  Default value: True
  Description: This value configures whether to write a warning to the system event log if the management server cannot accept alive packets from agents. (See Accepting alive packets below.)
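
The rule that maps the Severity of agent health problem messages value to an event type, when the target includes the event log, can be sketched in a few lines of Python. This is only an illustration of the mapping described above. The names EVENT_TYPE_BY_SEVERITY and event_type_for are hypothetical and are not part of HPOM.

```python
# Illustrative only: these names are assumptions, not HPOM identifiers.
# The mapping itself follows the rule stated in the value description:
# Normal -> information, Warning/Minor/Major -> warning, Critical -> error.
EVENT_TYPE_BY_SEVERITY = {
    "Normal": "Information",
    "Warning": "Warning",
    "Minor": "Warning",
    "Major": "Warning",
    "Critical": "Error",
}


def event_type_for(severity: str) -> str:
    """Return the event type that a configured severity results in."""
    key = severity.strip().capitalize()  # accept "minor", "MINOR", " Minor "
    if key not in EVENT_TYPE_BY_SEVERITY:
        raise ValueError(f"unknown severity: {severity!r}")
    return EVENT_TYPE_BY_SEVERITY[key]
```

For example, event_type_for("Major") returns "Warning", so a rule in an event log policy that filters on error events would only match health messages whose configured severity is Critical.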

Accepting alive packets

On nodes that have the DCE agent, the message agent sends an alive packet to the management server at a configurable interval. However, in HPOM 8.10, the management server is no longer able to receive these alive packets by default. The management server runs under the HP-OVE-User account, which no longer has administrative rights. Without administrative rights, the management server cannot open the raw socket that it needs to receive alive packets.

To continue receiving alive packets, you must add the HP-OVE-User to the local administrators group on the management server. Before you give the HP-OVE-User administrative rights, check the security requirements of your organization.

If the management server can accept alive packets, it checks whether it has received a packet from a node before it contacts that node by ICMP ping or by a call to the control agent. If the management server has received an alive packet, it does not attempt to contact the node.
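
The decision logic described so far (ping protocol setting, alive packets, and immediate retries) can be summarized as a minimal Python sketch. It is not the management server's implementation. The names PingProtocol, checks_to_run, and node_is_down are hypothetical, and the order in which the ICMP ping and the agent call are attempted is an assumption.

```python
from enum import Enum
from typing import Callable, List


class PingProtocol(Enum):
    """The four values of the Health check ping protocol setting."""
    DISABLED = "DISABLED"
    AGENTONLY = "AGENTONLY"
    ICMPONLY = "ICMPONLY"
    ENABLED = "ENABLED"


def checks_to_run(protocol: PingProtocol, alive_packet_received: bool) -> List[str]:
    """Return the active checks the server would perform for one node."""
    if protocol is PingProtocol.DISABLED:
        return []  # no agent health check at all
    if alive_packet_received:
        return []  # an alive packet arrived, so the node is not contacted
    checks = []
    if protocol in (PingProtocol.ICMPONLY, PingProtocol.ENABLED):
        checks.append("icmp_ping")
    if protocol in (PingProtocol.AGENTONLY, PingProtocol.ENABLED):
        checks.append("agent_call")  # assumed to follow the ICMP ping
    return checks


def node_is_down(check: Callable[[], bool], retries: int = 0) -> bool:
    """Run one check plus up to `retries` immediate retries.

    The node counts as down only when every attempt has failed.
    """
    if not 0 <= retries <= 3:
        raise ValueError("Health check retries must be between 0 and 3")
    for _ in range(1 + retries):
        if check():
            return False
    return True
```

With the default settings (ENABLED, 0 retries) and no alive packet, checks_to_run returns both checks, and a single failed attempt is enough for node_is_down to report the node as down.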

You can change the frequency with which each node sends alive packets. You do this by configuring the value for OPC_HBP_INTERVAL_ON_AGENT in a nodeinfo policy, which you deploy to the agent. The agent sends an alive packet at an interval equal to two-thirds of the configured value. On nodes that have the DCE agent, the default value of OPC_HBP_INTERVAL_ON_AGENT is 280, so the agent sends an alive packet approximately every 187 seconds.
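
The two-thirds rule is easy to check with a short calculation. The following Python sketch only reproduces that arithmetic. The function name alive_packet_interval is hypothetical, and whether the agent rounds or truncates the result is not stated here, so the function returns the exact value.

```python
from fractions import Fraction
from typing import Optional

# The agent sends alive packets at two-thirds of OPC_HBP_INTERVAL_ON_AGENT.
ALIVE_PACKET_FACTOR = Fraction(2, 3)


def alive_packet_interval(configured_value: int) -> Optional[float]:
    """Return the send interval in seconds, or None if sending is disabled.

    A configured value of 0 means that the agent sends no alive packets.
    """
    if configured_value < 0:
        raise ValueError("OPC_HBP_INTERVAL_ON_AGENT must not be negative")
    if configured_value == 0:
        return None
    return float(configured_value * ALIVE_PACKET_FACTOR)


if __name__ == "__main__":
    for value in (180, 280, 600):
        print(value, "->", round(alive_packet_interval(value), 1), "seconds")
```

A configured value of 280 gives an interval of about 186.7 seconds, and a value of 180 would give exactly 120 seconds.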

If the management server cannot accept alive packets, change the default value of OPC_HBP_INTERVAL_ON_AGENT to 0 on nodes with DCE agents. The agent stops sending alive packets, which prevents unnecessary network load. On nodes that have the HTTPS agent, the value is not set by default, so the HTTPS agent sends no alive packets by default.

Related Topics:

Changing agent health check behavior