Observability Intro
“Observability” gives us the ability to fully understand our systems. In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals. (Source: https://en.wikipedia.org/wiki/Observability)
Observability is a trending word in Cloud Native. The word originates from control theory, and it stands for the idea that in order to control a system, you must first be able to observe it.
There are different definitions of observability in our industry, and you may get a different definition depending on who you ask. But in the Cloud Native space, if you have a stack that is able to record metrics, logs, and traces, you are in a good place to control your applications.
Logs, metrics, and traces - why three different formats?
The short answer is cost, specifically the cost of storing them at scale.
Each of them looks at our systems from a different perspective:
- Logs are mostly unstructured. They are very expressive, but also very expensive to store.
- Metrics are cheap to store, and as long as your data has low cardinality, they give you a great overview of the known characteristics of your system.
- Traces are made to cross service boundaries. This view of your system has become very useful since the advent of microservices.
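To make the cardinality point concrete, here is a minimal Python sketch. It assumes that every distinct combination of label values becomes its own stored time series, which is how label-based metric stores generally behave. The event fields (`method`, `status`, `user_id`) and the `series_count` helper are hypothetical and only serve the illustration.

```python
# Hypothetical request events; the field names are illustrative only.
requests = [
    {"method": "GET", "status": 200, "user_id": "u1"},
    {"method": "GET", "status": 200, "user_id": "u2"},
    {"method": "POST", "status": 500, "user_id": "u3"},
    {"method": "GET", "status": 200, "user_id": "u4"},
]


def series_count(events, label_names):
    """Count distinct label combinations, i.e. how many series would be stored."""
    return len({tuple(event[name] for name in label_names) for event in events})


# Low-cardinality labels: the number of series stays small as traffic grows.
low = series_count(requests, ["method", "status"])

# Adding a per-user label creates one series per user.
high = series_count(requests, ["method", "status", "user_id"])

print(low, high)  # prints "2 4"
```

With only `method` and `status` as labels, the four requests collapse into two series. Adding `user_id` produces one series per request, which is why unbounded identifiers tend to make metrics expensive.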
We use Prometheus, Loki, and Tempo to cover the three aspects of observability.
note
There are other definitions of observability in the industry.
A SaaS vendor, Honeycomb.io, focuses on storing wide data records, meaning structured entries with hundreds of properties. It argues that you can't prepare for the unknown, and that with metrics you are trying to do just that. You aggregate your data at write time, which may make it cheap to store, but also deprives you of interactive discovery of your system. Honeycomb aggregates data on read, which perhaps makes you better equipped to face the unknown.
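The difference between aggregating at write time and aggregating on read can be sketched in a few lines of Python. This is an illustration of the idea only, not Honeycomb's API or implementation. The event fields and the `count_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical wide events; a real record might carry hundreds of properties.
events = [
    {"endpoint": "/cart", "status": 200, "duration_ms": 12, "customer": "a"},
    {"endpoint": "/cart", "status": 500, "duration_ms": 950, "customer": "b"},
    {"endpoint": "/pay", "status": 200, "duration_ms": 40, "customer": "a"},
    {"endpoint": "/cart", "status": 500, "duration_ms": 900, "customer": "b"},
]

# Write-time aggregation: the dimension (endpoint) is chosen up front.
# Only this counter is kept, so every other field is lost.
errors_by_endpoint = defaultdict(int)
for event in events:
    if event["status"] >= 500:
        errors_by_endpoint[event["endpoint"]] += 1


# Read-time aggregation: the raw events are kept, so the grouping
# field can be decided later, when the question comes up.
def count_by(raw_events, field, predicate):
    counts = defaultdict(int)
    for event in raw_events:
        if predicate(event):
            counts[event[field]] += 1
    return dict(counts)


errors_by_customer = count_by(events, "customer", lambda e: e["status"] >= 500)

print(dict(errors_by_endpoint))  # prints "{'/cart': 2}"
print(errors_by_customer)        # prints "{'b': 2}"
```

The pre-aggregated counter can say that `/cart` is failing, but not for whom. Because the raw events are still available, the read-time query can group by `customer` after the fact and show that all errors come from customer `b`. The trade-off is the storage cost of keeping every event.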