Going to Production



This page describes some things you might find useful if you’re preparing to run (or running!) linkerd in production.

Metrics to monitor

We recommend setting an alert that checks that these metrics are 0:

rt/*/bindcache/bound/oneshots
rt/*/bindcache/client/oneshots
rt/*/bindcache/path/oneshots
rt/*/bindcache/tree/oneshots

When any of these is non-zero, a cache has been exhausted and parts of the client stack are being built in the request-serving path.
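
As a rough illustration of such a check, here is a minimal Python sketch. It assumes the metrics are available as a flat JSON object mapping metric names to numbers, for example from linkerd's admin metrics endpoint. The helper names and the URL passed to `fetch_metrics` are hypothetical and depend on your setup; in practice you would more likely express the same rule in your monitoring system.

```python
import fnmatch
import json
import urllib.request

# The four metrics listed above; "*" stands for the router label.
ONESHOT_PATTERNS = [
    "rt/*/bindcache/bound/oneshots",
    "rt/*/bindcache/client/oneshots",
    "rt/*/bindcache/path/oneshots",
    "rt/*/bindcache/tree/oneshots",
]


def nonzero_oneshots(metrics):
    """Return {name: value} for every bindcache oneshots metric that is not 0."""
    return {
        name: value
        for name, value in metrics.items()
        if value != 0
        and any(fnmatch.fnmatchcase(name, pattern) for pattern in ONESHOT_PATTERNS)
    }


def fetch_metrics(url):
    """Fetch a metrics snapshot (hypothetical helper; assumes a flat JSON object)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Usage with a hand-written sample instead of a live linkerd instance.
sample = {
    "rt/outgoing/bindcache/path/oneshots": 0,
    "rt/outgoing/bindcache/client/oneshots": 3,
    "rt/outgoing/bindcache/path/misses": 12,
}
for name, value in nonzero_oneshots(sample).items():
    print(f"ALERT: {name} = {value}")
```

An alert should fire whenever `nonzero_oneshots` returns a non-empty result.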

Other things our users have found helpful to keep an eye on:

failure_accrual:removals - tracks the number of times a host has been removed due to failure accrual.

loadbalancer:available - gauge of how many nodes the load balancer thinks are ready to receive traffic.

For more information on the metrics linkerd makes available, see the linkerd docs as well as the Finagle docs.

Configuration considerations

failFast

If true, connection failures are punished aggressively. This should be set to false on clients that talk to small clusters with fewer than ~3 nodes.

