Let's talk about Amazon CloudWatch.
For those fortunate enough to not be stuck in the weeds of Amazon Web Services (AWS), CloudWatch is, and I quote from the official AWS description, "a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers." This is all well and good, except for the part where there isn't a single named constituency who enjoys working with the product. Allow me to dispense some monitoring heresy.
Better, let me describe this in the context of the 14 Amazon Leadership Principles that reportedly guide every decision Amazon makes. When you take a hard look at CloudWatch's complete failure across all 14 Leadership Principles, you wonder how this product ever made it out the door in its current state.
I'll start with billing. It's normally left for the tail end of articles like this, but the CloudWatch billing paradigm is so terrible that I'm leading with it instead. You get billed per metric, per month. You get billed per thousand metrics you request to view via the API. You get billed per dashboard per month. You get billed per alarm per month. You get charged for logs based upon data volume ingested, data volume stored, and "vended logs" that get published natively by AWS services on behalf of the customer. And you get billed per custom event. All of this can be summed up best as "nobody on the planet understands how your CloudWatch metrics and logs get billed," and it leads to scenarios where monitoring vendors can inadvertently cost you thousands of dollars by polling CloudWatch too frequently. When the AWS charges are larger than what you're paying your monitoring vendor, it's not a wonderful feeling.
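To put rough numbers on the polling problem, here is a minimal back-of-the-envelope sketch in Python. The function name, the metric count, and above all the price per thousand metrics requested are illustrative assumptions, not quoted AWS rates; check the current pricing page for your region before trusting any figure.

```python
def monthly_polling_cost(metric_count, poll_interval_seconds,
                         price_per_thousand_metrics, days_in_month=30):
    """Estimate the monthly charge for a tool that re-requests every metric
    on a fixed interval, billed per thousand metrics requested via the API."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    polls_per_month = seconds_in_month // poll_interval_seconds
    metrics_requested = metric_count * polls_per_month
    return metrics_requested / 1000 * price_per_thousand_metrics


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 metrics at an assumed $0.01 per thousand requested.
    for interval in (60, 300):
        cost = monthly_polling_cost(5000, interval, 0.01)
        print(f"poll every {interval:>3}s: ${cost:,.2f} per month")
```

Under those assumed inputs, polling once a minute comes to $2,160.00 a month and polling every five minutes to $432.00, and that is for the API requests alone, before any per-metric, per-alarm, per-dashboard, or log charges.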
"Invent and Simplify"
CloudWatch Logs, CloudWatch Events, Custom Metrics, Vended Logs and Custom Dashboards all mean something different inside CloudWatch from what you'd expect if you're used to metrics solutions that actually make some fathomable level of sense. There are, thus, multiple services that do very different things, all operating under the "CloudWatch" moniker. For example, it's not particularly intuitive to most people that scheduling a Lambda function to invoke once an hour requires a custom CloudWatch Event. It feels overly complicated and incredibly confusing, and very quickly you find yourself having to build complex relationships to monitor things that are themselves far simpler.
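To make the hourly Lambda example concrete, here is a rough sketch of the three separate calls involved, written against the boto3 client methods put_rule, add_permission, and put_targets. The helper name, the default rule name, and the target Id are hypothetical choices for illustration; the clients are passed in as arguments so the sketch has no hard dependency on AWS credentials.

```python
def schedule_hourly_invocation(events_client, lambda_client,
                               function_name, function_arn,
                               rule_name="hourly-invoke"):
    """Wire up an hourly schedule for a Lambda function.

    events_client and lambda_client are expected to behave like
    boto3.client("events") and boto3.client("lambda").
    """
    # 1. Create (or update) a scheduled rule that fires once an hour.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 hour)",
        State="ENABLED",
    )

    # 2. Grant the events service permission to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement id
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # 3. Point the rule at the function.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

With boto3 installed and credentials configured, this would presumably be called as schedule_hourly_invocation(boto3.client("events"), boto3.client("lambda"), name, arn). Three API calls across two services for "run this once an hour" is more or less the point being made above.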