In 1935, Austrian physicist Erwin Schrödinger, still flying high after his Nobel Prize win two years earlier, created a simple thought experiment.
It ran something like this:
If you have a file server, you cannot know if that server is up or down...until you check on it. Thus, until you use it, a file server is—in a sense—both up and down. At the same time.
This little brain teaser became known as Schrödinger's File Server, and it's regarded as the first known critical research on the intersection of Systems Administration and Quantum Superposition. (Though, why Erwin chose, specifically, to use a "file server" as an example remains a bit of a mystery—as the experiment works equally well with any type of server. It's like, we get it, Erwin. You have a nice NAS. Get over it.)
Okay, perhaps it didn't go exactly like that. But I'm confident it would have...you know...had good old Erwin had a nice Network Attached Storage server instead of a cat.
Regardless, the lessons from that experiment certainly hold true for servers. If you haven't checked on your server recently, how can you be truly sure it's running properly? Heck, it might not even be running at all!
Monitoring a server—to be notified when problems occur or, even better, when problems look like they are about to occur—seems, at first blush, to be a simple task. Write a script to ping a server, then email me when the ping times out. Run that script every few minutes and, shazam, we've got a server monitoring solution! Easy-peasy, time for lunch!
Whoa, there! Not so fast!
That server monitoring solution right there? It stinks. It's fragile. It gives you very little information (other than the results of a ping). Even for administering your own home server, that's barely enough information and monitoring to keep things running smoothly.
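For the curious, the ping-and-email approach described above might look something like the following minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`ping_once`, `build_alert`, `check`), the example.com addresses and the Linux-style `ping -c 1 -W` flags. The `notify` hook defaults to `print`; in practice you might swap in something that hands the message to a local mail server.

```python
import subprocess
from email.message import EmailMessage


def ping_once(host, timeout_s=2):
    """Return True if one ICMP ping succeeds (assumes Linux iputils ping flags)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or no ping binary available: treat as down.
        return False
    return result.returncode == 0


def build_alert(host, sender, recipient):
    """Build the alert email; the addresses are hypothetical placeholders."""
    msg = EmailMessage()
    msg["Subject"] = f"Server down: {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Ping to {host} timed out.")
    return msg


def check(host, ping=ping_once, notify=print):
    """Ping the host once; call notify(alert) if it does not answer."""
    if ping(host):
        return True
    notify(build_alert(host, "monitor@example.com", "admin@example.com"))
    return False
```

Run `check("nas.example")` from cron every few minutes and you have exactly the solution described: one bit of information per run. It says nothing about disks, services or load, which is rather the point.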
Even if you have a more robust solution in place, odds are there are significant shortcomings and problems with it. Luckily, Linux Journal has your back—this issue is chock full of advice, tips and tricks for how to keep your servers effectively monitored.
You know, so you're not just guessing if the cat is still alive in there.
Mike Julian (author of O'Reilly's Practical Monitoring) goes into detail on a bunch of the ways your monitoring solution needs serious work in his adorably titled "Why Your Server Monitoring (Still) Sucks" article.
We continue "telling it like it is" with Corey Quinn's treatise on Amazon's CloudWatch, "CloudWatch Is of the Devil, but I Must Use It." Seriously, Corey, tell us how you really feel.