Hardware failures occur most frequently in moving mechanical parts, such as fans and disk drives. A hardware failure does not always cause the system to fail immediately: a single, small failure often goes undetected until it leads to more serious failures. For instance, a burned-out cooling fan can cause a processor to overheat, resulting in increased errors and reduced performance for a long time before the overheated component fails completely.
With common hardware fault-tolerance techniques, you can often isolate and minimize the effects of small hardware failures:
You can also configure Simple Network Management Protocol (SNMP) to notify you of any detectable abnormal conditions in your hardware.
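The idea behind such notifications can be illustrated with a minimal Python sketch. It is not an SNMP implementation and does not reflect any particular agent's configuration; it only shows the kind of threshold check a monitoring agent might apply before raising an alert. The names SensorReading and find_abnormal, the sensor names, and all limits are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SensorReading:
    """One hardware measurement with its acceptable range (all values hypothetical)."""
    name: str
    value: float
    low: float   # lowest value still considered normal
    high: float  # highest value still considered normal


def find_abnormal(readings: List[SensorReading]) -> List[str]:
    """Return one alert message for each reading outside its normal range."""
    alerts = []
    for r in readings:
        if r.value < r.low:
            alerts.append(f"{r.name}: {r.value} is below the minimum of {r.low}")
        elif r.value > r.high:
            alerts.append(f"{r.name}: {r.value} is above the maximum of {r.high}")
    return alerts


if __name__ == "__main__":
    # Assumed sample data: a stopped fan and a processor that is still in range.
    sample = [
        SensorReading("fan1_rpm", 0.0, 1000.0, 6000.0),
        SensorReading("cpu0_temp_c", 78.0, 5.0, 85.0),
    ]
    for alert in find_abnormal(sample):
        print(alert)
```

In this sample the stopped fan is reported while the processor temperature is still within limits, which matches the scenario described earlier: the small failure is detectable well before the component it affects fails. In a real deployment, the SNMP agent would typically send such an alert as a trap or notification to a management station.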