Manage cluster-aware applications


The easiest way to manage cluster-aware applications is to purchase a Smart Plug-in. You can, however, create your own policies for cluster-aware applications in supported cluster environments. (For details of supported cluster environments, see the support matrix at HP Software Support Online.)

The configuration described below creates a scenario in which the policies that you designate are enabled on the cluster node where the application is currently running, and disabled on all other nodes in the cluster.

NOTE:

To monitor a cluster-aware application, the cluster group that contains the monitored resource must also contain both a "Network Name" and an "IP Address" resource.

If the monitored cluster group does not contain a "Network Name" and an "IP Address" resource, the following errors are logged in the opcerror file on the cluster node and in the active message browser of the management server for the cluster node:

"Could not perform cluster API function, error code 1008 returned System Error Number: -1 (ffffffff) - (OpC30-3223)"

"Could not read cluster information System Error Number: -1 (ffffffff) - (OpC30-3221)"

"Application Package Monitor of subagent 0 aborted; process got signal 1 (OpC30-1041)."

To monitor a cluster-aware application

  1. Create an XML file that describes the cluster instances, or if this file already exists, extend the existing one. The example below shows two instances. The file must be named apminfo.xml. A DTD is available here. A sample apminfo.xml file is shown below. This example can be used for all four supported cluster environments.
  2. The illustration below shows the MS Cluster Administrator window with the SQL-Server resource group shown in detail. This group contains the required Network Name (CLUSTER04) and IP Address resources.

    NOTE: The illustration and accompanying text apply only to the Windows cluster.

    In this illustration, the Package (which Microsoft calls a resource group) is called "SQL-Server" and the instance name is the Network Name "CLUSTER04". For the other supported environments, Package is named as follows:

    Example configuration for this illustration:

    <?xml version="1.0" ?>
    <APMClusterConfiguration xmlns="http://www.hp.com/OV/opcapm/cluster">
    		<Application>
    				<Name>dbspi_mssqlserver</Name>
    				<Instance>
    						<Name>CLUSTER04</Name>
    						<Package>SQL-Server</Package>
    				</Instance>
    		</Application>
    </APMClusterConfiguration>
    
    In general, the apminfo.xml file has the following format:
    NOTE: No line break is allowed between the <Package> tags in the apminfo.xml file.
    <?xml version="1.0" ?>
    <APMClusterConfiguration>
        <Application>
            <Name>Name of the cluster-aware application</Name>
            <Instance>
                <Name>Instance name used by the application.
                      This name is used for start and stop commands
                      and should usually correspond to the name used
                      to designate this instance in messages.
                </Name>
                <Package>Instance name of the Resource Group used by the cluster software</Package>
            </Instance>
            <Instance>
                <Name>Application's name for the second instance.</Name>
                <Package>Name of the second Resource Group instance used by the cluster software</Package>
            </Instance>
        </Application>
    </APMClusterConfiguration>
    
  3. On nodes with an HTTPS agent, save the completed apminfo.xml file on each node in the cluster in the following directory:

    <data_dir>/conf/conf/

    On Windows nodes with a DCE agent, save the file in the following directory:

    <install_dir>/Installed Packages/{790C06B4-844E-11D2-972B-080009EF8C2A}/conf/OpC/

    On UNIX nodes with a DCE agent, save the file in the following directory:

    /var/opt/OV/conf/OpC

  4. Write policies to monitor the application on the cluster. Assign the policies a category. You will use this category later.
  5. Create an XML file that describes the policies that should be cluster-aware. The file name must have the format Name of the cluster-aware application.apm.xml, where "Name of the cluster-aware application" is identical to the content of the <Name> tag in the apminfo.xml file.

    The following example file called dbspi_mssqlserver.apm.xml shows how the Database SPI configures the policies for the MS-SQL server.

    <?xml version="1.0"?>
    <APMApplicationConfiguration xmlns="http://www.hp.com/OV/opcapm/app">
    		<Application>
    				<Name>dbspi_mssqlserver</Name>
    				<Template>DBSPI-MSS-05min-Reporter</Template>
    				<Template>DBSPI-MSS-1d-Reporter</Template>
    				<Template>DBSPI-MSS-05min</Template>
    				<Template>DBSPI-MSS-15min</Template>
    				<Template>DBSPI-MSS-1h</Template>
    				<Template>DBSPI-MSS6-05min</Template>
    				<Template>DBSPI-MSS6-15min</Template>
    				<Template>DBSPI-MSS6-1h</Template>
    				<Template>DBSPI Microsoft SQL Server</Template>
    				<StartCommand>dbspicol ON $instanceName</StartCommand>
    				<StopCommand>dbspicol OFF $instanceName</StopCommand>
    		</Application>
    </APMApplicationConfiguration>
    
    In general, the application configuration file has the following format:
    <?xml version="1.0" ?>
    <APMApplicationConfiguration>
        <Application>
            <Name>Name of the cluster-aware application (must
                  match the content of the Name element in the apminfo.xml file)
            </Name>
            <Template>First policy that should be cluster-aware</Template>
            <Template>Second policy that should be cluster-aware</Template>
            <StartCommand>An optional command that the agent runs whenever an
                          instance of the application starts
            </StartCommand>
            <StopCommand>An optional command that the agent runs whenever an
                         instance of the application stops
            </StopCommand>
        </Application>
    </APMApplicationConfiguration>

    The stop and start commands can use the following variables:

  6. Create a category directory for the category you defined in step 4, and copy the XML files created in step 5 to this directory. (A category directory is a user-defined instrumentation directory on the management server.) The management server automatically deploys the XML files to the node whenever it deploys policies of this category.
  7. Ensure that the physical nodes where the Resource Groups reside are all managed nodes.
  8. Deploy the policies listed in Name of the cluster-aware application.apm.xml and the monitors to all the physical nodes in the cluster.
  9. Stop and restart the agent after copying the apminfo.xml file. Use the command opcagt -kill to stop the agent and opcagt -start to restart it.
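
The naming rules in the steps above can be checked before the files are deployed. The following is a minimal sketch, not part of the product: the function check_apm_files and its helpers are hypothetical names, and it checks only three rules stated in this section (the application name in the .apm.xml file matches a <Name> in apminfo.xml, the file name is that name followed by .apm.xml, and no <Package> value contains a line break). It assumes Python 3 with only the standard library.

```python
import xml.etree.ElementTree as ET

# Sample content taken from the examples in this section.
APMINFO = """<?xml version="1.0" ?>
<APMClusterConfiguration xmlns="http://www.hp.com/OV/opcapm/cluster">
  <Application>
    <Name>dbspi_mssqlserver</Name>
    <Instance>
      <Name>CLUSTER04</Name>
      <Package>SQL-Server</Package>
    </Instance>
  </Application>
</APMClusterConfiguration>"""

APM = """<?xml version="1.0"?>
<APMApplicationConfiguration xmlns="http://www.hp.com/OV/opcapm/app">
  <Application>
    <Name>dbspi_mssqlserver</Name>
    <Template>DBSPI-MSS-05min</Template>
  </Application>
</APMApplicationConfiguration>"""


def local(tag):
    """Drop a namespace prefix such as '{http://...}Name' from a tag."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Return the direct children of elem whose local tag name is name."""
    return [child for child in elem if local(child.tag) == name]


def text_of(elem, name):
    """Return the text of the first child called name, or '' if absent."""
    found = children(elem, name)
    return (found[0].text or "") if found else ""


def check_apm_files(apminfo_xml, apm_xml, apm_file_name):
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    known_apps = set()
    for app in children(ET.fromstring(apminfo_xml), "Application"):
        known_apps.add(text_of(app, "Name").strip())
        for instance in children(app, "Instance"):
            package = text_of(instance, "Package")
            if "\n" in package or "\r" in package:
                problems.append("line break inside <Package>")
    for app in children(ET.fromstring(apm_xml), "Application"):
        name = text_of(app, "Name").strip()
        if name not in known_apps:
            problems.append("application %r is not in apminfo.xml" % name)
        if apm_file_name != name + ".apm.xml":
            problems.append("file should be named %s.apm.xml" % name)
    return problems


print(check_apm_files(APMINFO, APM, "dbspi_mssqlserver.apm.xml"))
```

An empty list means that none of the three checks failed. The sketch does not validate the files against the DTD.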

Microsoft Cluster Service (MSCS) specific configuration

Users often have large clusters with many resources in a single resource group. In that case, a failed (or merely offline) resource in the resource group does not necessarily cause the application to fail over, so it should not cause APM to disable all monitoring of the application. You can therefore configure which states of the cluster group or cluster node APM treats as online.

Cluster Group State

In Microsoft Cluster Services, a cluster group can have the following states:

Cluster Group State        Description
ClusterGroupFailed         At least one resource in the group has failed.
ClusterGroupPending        At least one resource in the group is in a pending state. There are no failed resources.
ClusterGroupOnline         All of the resources in the group are online.
ClusterGroupPartialOnline  At least one resource in the group is online. No resources are pending or failed.
ClusterGroupOffline        All of the resources in the group are offline or there are no resources in the group.
ClusterGroupStateUnknown   The operation was not successful. For more information about the error, call the Win32 function GetLastError.

By default, APM treats only the state ClusterGroupOnline as online. However, you can configure the behavior of APM for MSCS using the following nodeinfo parameters:

OPC_APM_HANDLE_GROUP_AS_ONLINE

This nodeinfo parameter allows you to define which cluster group states are treated as online by APM.

Syntax:

OPC_APM_HANDLE_GROUP_AS_ONLINE <state1>, <state2>, ...

Example:

OPC_APM_HANDLE_GROUP_AS_ONLINE ClusterGroupOnline, ClusterGroupPartialOnline, ClusterGroupFailed

OPC_APM_HANDLE_PARTIAL_AS_ONLINE

This nodeinfo parameter allows you to define that the cluster group state ClusterGroupPartialOnline is treated as online by APM.

Syntax:

OPC_APM_HANDLE_PARTIAL_AS_ONLINE TRUE

This nodeinfo variable is still available for backward compatibility but should not be used in the future. Instead, use:

OPC_APM_HANDLE_GROUP_AS_ONLINE ClusterGroupOnline, ClusterGroupPartialOnline

If both variables, OPC_APM_HANDLE_GROUP_AS_ONLINE and OPC_APM_HANDLE_PARTIAL_AS_ONLINE, are used, the variable OPC_APM_HANDLE_GROUP_AS_ONLINE has the higher priority and OPC_APM_HANDLE_PARTIAL_AS_ONLINE is ignored.
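
The precedence between the two parameters can be summarized in a short sketch. This illustrates the rules described above and is not the agent's actual implementation: the function online_group_states is a hypothetical name, the nodeinfo settings are modeled as a plain dictionary, and comparing TRUE case-insensitively is an assumption.

```python
DEFAULT_ONLINE = {"ClusterGroupOnline"}


def online_group_states(nodeinfo):
    """Hypothetical helper: return the set of group states treated as online.

    nodeinfo is a dict of parameter name -> value string, used here only
    to model the settings described in this section.
    """
    group_value = nodeinfo.get("OPC_APM_HANDLE_GROUP_AS_ONLINE")
    if group_value is not None:
        # The explicit list wins; OPC_APM_HANDLE_PARTIAL_AS_ONLINE is ignored.
        return {s.strip() for s in group_value.split(",") if s.strip()}
    partial = nodeinfo.get("OPC_APM_HANDLE_PARTIAL_AS_ONLINE", "")
    if partial.strip().upper() == "TRUE":  # case-insensitivity is an assumption
        return DEFAULT_ONLINE | {"ClusterGroupPartialOnline"}
    # Neither parameter is set: only ClusterGroupOnline counts as online.
    return set(DEFAULT_ONLINE)
```

With both parameters set, the result contains only the states listed in OPC_APM_HANDLE_GROUP_AS_ONLINE, which mirrors the priority rule stated above.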

Cluster Node State

In Microsoft Cluster Services, a cluster node can have the following states:

Cluster Node State       Description
ClusterNodeUp            The node is active.
ClusterNodeDown          The node is inactive.
ClusterNodeJoining       The node is in the process of joining a cluster.
ClusterNodePaused        The node has temporarily suspended activity.
ClusterNodeStateUnknown  The operation was not successful. For more information about the error, call the Win32 function GetLastError.

By default, APM treats only the state ClusterNodeUp as online. However, you can configure the behavior of APM for MSCS using the following nodeinfo parameter:

OPC_APM_HANDLE_NODE_AS_ONLINE

This nodeinfo parameter allows you to define which cluster node states are treated as online by APM.

Syntax:

OPC_APM_HANDLE_NODE_AS_ONLINE <state1>, <state2>, ...

Example:

OPC_APM_HANDLE_NODE_AS_ONLINE ClusterNodeUp, ClusterNodeJoining
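
Because the parameter values are free-form lists of state names, a misspelled state is easy to miss. The following minimal sketch checks a configured value against the state names listed in this section. The function unknown_states is a hypothetical name, and treating the names as case-sensitive is an assumption.

```python
# State names as listed in the tables in this section.
GROUP_STATES = {
    "ClusterGroupFailed", "ClusterGroupPending", "ClusterGroupOnline",
    "ClusterGroupPartialOnline", "ClusterGroupOffline",
    "ClusterGroupStateUnknown",
}
NODE_STATES = {
    "ClusterNodeUp", "ClusterNodeDown", "ClusterNodeJoining",
    "ClusterNodePaused", "ClusterNodeStateUnknown",
}


def unknown_states(value, valid_states):
    """Hypothetical helper: return configured names that are not valid states.

    value is the comma-separated list given to a parameter such as
    OPC_APM_HANDLE_NODE_AS_ONLINE; names are compared case-sensitively,
    which is an assumption.
    """
    names = [name.strip() for name in value.split(",") if name.strip()]
    return [name for name in names if name not in valid_states]


print(unknown_states("ClusterNodeUp, ClusterNodeJoining", NODE_STATES))
```

An empty result means that every configured name appears in the corresponding table.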