To detect nodes that contribute to message delays, you can measure the time it takes to resolve node names and the time it takes to route a message from the message queue to all consumers.
To measure the node name resolution, you define a threshold value (in milliseconds) that the node name resolution is allowed to take. If the measured value exceeds the threshold value, the logging mechanism stores the resolution information (node name, IP address, time to resolve, and so on) in a user-configurable log file. Recommended threshold values range from 50 to 200 milliseconds.
To measure the time for message routing, you define a threshold value (in milliseconds) it can take to route a message to all consumers. (This includes the time used for the node name resolution.) If the measured value exceeds the threshold value, the logging mechanism stores the routing information (node name, IP address, proxy information, message ID, first 40 characters of the message, discard information, time to validate the sending node, and so on) in a user-configurable log file. Recommended threshold values range from 200 to 1000 milliseconds.
The following lines show an extract of a log file for measuring node name resolution (OvNameResMsgFilter):
Measure: OvNameResMsgFilter Component: OvEpMessageFilter Time Started: 2007-05-19 14:20:18.413 Node Name: node_x Resolved: no Time Finished: 2007-05-19 14:20:25.319 Elapsed time: 6906 State: Finished with threshold exceeded *-*-*-*-*-*-*-*-*-* Measure: OvNameResMsgFilter Component: OvEpMessageFilter Time Started: 2007-05-19 17:33:18.406 Agent Id: 25da9f90-fe5c-71d7-0422-0f887c4c0000 Node Name: node_y Ip Addr: 15.136.124.76 Resolved: yes Time Finished: 2007-05-19 17:33:18.609 Elapsed time: 203 State: Finished with threshold exceeded *-*-*-*-*-*-*-*-*-*
The first log entry shows that node "node_x" could not be resolved. The elapsed time for this measure interval was 6906 milliseconds. The message did not contain an AgentId or IP address, but a node name only and the node does not exist as managed node.
The second log entry shows that node "node_y" with ip 15.136.124.76 could be resolved as a managed node but the resolution time exceeded the configured threshold value.
The following lines show an extract of a log file for measuring message routing (OvMessageFlow):
Measure: OvMessageFlow Component: OvEpMessageFilter Time Started: 2007-05-19 11:07:25.155 Node Name: node_y Ip Addr: 15.136.124.76 Time Created: May 19 11:07:25 Time Received: May 19 11:07:25 OvEpMessage: Id="f1da7a30-a973-71d8-1afb-0f887c4c0000", Text="This is a good message..." Proxied: no NodeResTime: 0 Discarded: no Time Finished: 2007-05-19 11:07:25.217 Elapsed time: 63 State: Finished with threshold exceeded *-*-*-*-*-*-*-*-*-* Measure: OvMessageFlow Component: OvEpMessageFilter Time Started: 2007-05-19 11:16:32.337 Node Name: node_x Time Created: May 19 11:16:25 Time Received: May 19 11:16:32 OvEpMessage: Id="37ffd310-a975-71d8-1afb-0f887c4c0000", Text="Wrong node ..." Proxied: yes NodeResTime: 6938 Discarded: yes Time Finished: 2007-05-19 11:16:39.274 Elapsed time: 6938 State: Finished with threshold exceeded *-*-*-*-*-*-*-*-*-*
The first log entry shows that a message has been routed to all consumers in 63 milliseconds, but it exceeded the configured threshold. The time for the node resolution (field NodeResTime) was below 1 millisecond. (The name resolution worked well.) The message came from a managed node and was not discarded.
The second log entry shows that a message was discarded because it was proxied and the node name of the originating node was not known on the management server. The resolution time for the node was 6938 milliseconds and this exceeded the threshold.
Related Topics: