The Simple Network Management Protocol (SNMP) has been an integral part of monitoring network environments since its introduction in 1988. It has established itself as the de facto standard in network monitoring. Many manufacturers support the protocol and have implemented an SNMP agent on their network devices. These agents allow monitoring solutions to query data such as bandwidth, CPU load, and network interface status without installing an additional agent on network equipment.
As the number of devices on a network grows, a simple and established method such as SNMP looks like an easy way to bring components into monitoring quickly. Unfortunately, SNMP has a few flaws. The first part of this article will explain how SNMP works, while the second part will drill deeper into the issues with SNMP and how to deal with them.
The protocol offers two methods to retrieve data from devices: polling and traps. With SNMP polling, a monitoring solution queries the data at user-specified time intervals from the SNMP agent. This active polling is used for status-based monitoring and is generally the recommended method. However, the disadvantage of SNMP polling is that the administrator does not notice if an event occurs between two queries, such as a brief change in the network interface status.
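The blind spot between two queries can be illustrated with a small simulation. This is only a sketch with made-up numbers: the timeline in `STATUS_CHANGES`, the helper names, and the intervals are assumptions chosen for illustration, and no real SNMP request is involved.

```python
from bisect import bisect_right

# Hypothetical timeline of interface state changes: (time in seconds, status).
# The interface is down only between second 65 and second 80.
STATUS_CHANGES = [(0, "up"), (65, "down"), (80, "up")]


def status_at(t, changes):
    """Return the status that is in effect at time t."""
    times = [ts for ts, _ in changes]
    idx = bisect_right(times, t) - 1
    return changes[idx][1]


def poll(changes, interval, duration):
    """Sample the status every `interval` seconds, as a poller would."""
    return [(t, status_at(t, changes)) for t in range(0, duration + 1, interval)]


# Polling once a minute samples at 0, 60, 120 and 180 seconds.
slow_samples = poll(STATUS_CHANGES, interval=60, duration=180)
slow_seen = {status for _, status in slow_samples}

# Polling every 10 seconds also samples at second 70, inside the outage.
fast_samples = poll(STATUS_CHANGES, interval=10, duration=180)
fast_seen = {status for _, status in fast_samples}

print("60 s interval saw:", sorted(slow_seen))
print("10 s interval saw:", sorted(fast_seen))
```

With the 60-second interval every sample falls outside the 15-second outage, so the poller only ever records "up". Shortening the interval narrows the gap but increases the query load on the device and the monitoring server.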
The alternative to SNMP polling is an event-based variant called SNMP traps. If a certain event occurs on the monitored device, the device sends an unsolicited notification to the monitoring instance. One of the disadvantages of SNMP traps is that the data packets transmitted via UDP can be lost. Since UDP does not acknowledge receipt of network packets, neither the sending device nor the administrator learns that an alert went missing if the packets containing the trap data are dropped. Thus, ironically, a problem on the network prevents the detection of another issue with a network device.
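The lack of acknowledgement can be seen with plain UDP sockets from the Python standard library. The sketch below is an assumption-laden illustration, not a trap implementation: the payload is placeholder bytes rather than a real SNMP PDU, and the helpers `unused_udp_port` and `send_notification` are hypothetical names. It sends a datagram to a local port where nothing is listening.

```python
import socket


def unused_udp_port(host="127.0.0.1"):
    """Find a local UDP port with no listener (for this demo only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        probe.bind((host, 0))  # port 0 lets the OS pick a free port
        return probe.getsockname()[1]
    # The probe socket is closed here, so nothing listens on that port.


def send_notification(payload, host, port):
    """Send one datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        return sender.sendto(payload, (host, port))


payload = b"placeholder, not a real SNMP PDU"
port = unused_udp_port()
sent = send_notification(payload, "127.0.0.1", port)

# The call succeeds even though no receiver exists.
print(f"handed {sent} bytes to the OS for port {port}")
```

From the sender's point of view the call looks the same whether a receiver processed the datagram or not, which is why a dropped trap goes unnoticed on both ends.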
Another disadvantage of SNMP traps can be the flood of triggered messages. For example, if a core switch becomes unavailable in a large network environment, thousands of switches may start sending traps. Especially if it has no upstream filter mechanism, the trap receiver can collapse under such a load of error messages. Monitoring is then unavailable in an emergency. In addition, the administrator must reconfigure all components in the network if the IP address of the trap receiver changes.
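One possible form of the upstream filter mentioned above is a sliding-window rate limiter placed in front of the trap receiver. The sketch below is an assumption, not a description of any particular product: the class name `TrapRateLimiter`, the limit of 100 traps per second, and the simulated burst of 5,000 traps are all made up for illustration.

```python
from collections import deque


class TrapRateLimiter:
    """Forward at most `limit` traps within any sliding window of `window_ms`."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self._forwarded = deque()  # arrival times of recently forwarded traps
        self.dropped = 0

    def allow(self, now_ms):
        """Return True if a trap arriving at `now_ms` may be forwarded."""
        # Forget forwarded traps that have left the window.
        while self._forwarded and now_ms - self._forwarded[0] >= self.window_ms:
            self._forwarded.popleft()
        if len(self._forwarded) < self.limit:
            self._forwarded.append(now_ms)
            return True
        self.dropped += 1
        return False


# Simulated trap storm: 5,000 traps, one arriving every millisecond.
limiter = TrapRateLimiter(limit=100, window_ms=1000)
forwarded = sum(limiter.allow(now_ms) for now_ms in range(5000))

print(f"forwarded={forwarded} dropped={limiter.dropped}")
```

In this simulation the receiver sees 500 of the 5,000 traps and the remaining 4,500 are counted as dropped. A filter like this keeps the receiver alive, but it discards information, so a real setup would likely need to log or aggregate what it drops.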