
Service Mesh with Linkerd

Service meshes are a hot topic in the cloud native world. As microservice architectures grew, so did the number of point-to-point connections between services. That growth demanded smarter capabilities from the network, and service meshes emerged to provide them. They offer a feature set meant to make operators' lives easier: traffic routing, observability, and security features for the network.

Even though service meshes offer solutions to many problems, you should know why you are adopting them.

Know why you are adopting a service mesh

Monzo's microservice architecture is at a scale at which making sense of it without tooling is probably impossible. But not everyone is Monzo, and service meshes are not silver bullets.

On some client engagements, I benefited from the quick overview of the network topology that Linkerd provided. As an operator, I had a hard time finding out which service called which. Linkerd's deployment-level overview of inbound and outbound traffic helped me understand the platform.

The traffic and top-line metrics that come for free were also good starting points for troubleshooting.

Tip: it is a common misconception that traces come from service meshes.

Request traces that span multiple services are often demoed together with service meshes. It is important to highlight that traces do not come by default with a service mesh. Your services still need to propagate request IDs across service boundaries so that a trace can be assembled as the request travels through multiple services. See the blog post "Distributed tracing in the service mesh: four myths".
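To make the propagation step concrete, here is a minimal sketch of what a service has to do on every outgoing call. The header names (`x-request-id` and the B3 headers) are assumptions for illustration, so use whichever headers your tracing setup expects. The helper `outgoing_headers` is hypothetical and not part of Linkerd.

```python
import uuid

# Assumption: illustrative header names; adjust them to your tracing setup.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
)


def outgoing_headers(incoming: dict) -> dict:
    """Copy trace-related headers from an incoming request to an outgoing one."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    out = {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
    # If the caller sent no request ID, this service starts a new one.
    out.setdefault("x-request-id", uuid.uuid4().hex)
    return out
```

A service would call `outgoing_headers(request_headers)` and attach the result to each downstream HTTP call it makes while handling that request. The mesh proxy cannot do this on its own, because it cannot know which outgoing call belongs to which incoming request.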

Day-two operations

It's always DNS service mesh

But adopting a service mesh has its costs. Service meshes proxy all traffic between your services, and sometimes they cause network issues you would seldom see otherwise.

While I was adopting a service mesh, and to some extent even today, if a service produces a hard-to-explain network-related issue, chances are that turning the service mesh off for that service will solve the issue.

You can exclude services from the mesh on a per-service basis with the linkerd.io/inject: "disabled" pod annotation. Since this is a low-cost fix attempt, it is worth keeping this annotation in mind, should you see odd network behavior.
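As a rough illustration of where the annotation lives, the sketch below patches a Deployment manifest that has been loaded into a Python dictionary. Because it is a pod annotation, the sketch sets it on the pod template rather than on the Deployment's own metadata. The function name `disable_linkerd_injection` is hypothetical, and the manifest layout is assumed to follow the standard Kubernetes Deployment shape.

```python
import copy


def disable_linkerd_injection(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest whose pods opt out of the mesh."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    # Pod annotations live under spec.template.metadata.annotations.
    template_meta = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
    )
    annotations = template_meta.setdefault("annotations", {})
    annotations["linkerd.io/inject"] = "disabled"
    return patched
```

The annotation takes effect when pods are created, so existing pods have to be recreated (for example by rolling out the patched Deployment) before they leave the mesh.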