Note: this post is about Linkerd 1.x. For Linkerd 2.x performance benchmarks, please see the blog post: https://linkerd.io/2019/05/18/linkerd-benchmarks/
The goal of this post is to document various tuning strategies to achieve optimal Linkerd performance.
An optimally-tuned Linkerd with 1GB RAM and ample CPU should provide the following performance:
These numbers are highly dependent on the environment, configuration, and latency characteristics of the cluster Linkerd is running in.
In production, set the `JVM_HEAP_MIN` and `JVM_HEAP_MAX` environment variables to the same value, to avoid memory fragmentation. For high qps applications, we recommend at least 1024MB. By default, we set `JVM_HEAP_MAX` to 1024M. This is intended for a development environment where, for example, you'd want to easily boot Linkerd on your personal computer.
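In a containerized deployment this amounts to setting both variables before Linkerd boots; a minimal sketch (the 1024M value is illustrative, not a sizing recommendation):

```shell
# Sketch: production heap settings for Linkerd 1.x.
# Pinning min and max to the same value avoids memory fragmentation.
export JVM_HEAP_MIN=1024M
export JVM_HEAP_MAX=1024M
echo "$JVM_HEAP_MIN $JVM_HEAP_MAX"   # 1024M 1024M
```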
If Linkerd is running in a virtualized environment, be sure to give 33% headroom above the JVM heap. For example, if your Docker container or Kubernetes config provides 1GB to Linkerd, give the JVM 768MB.
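The headroom rule works out to giving the heap roughly three quarters of the container's memory (heap × 4/3 ≈ container limit); a quick sketch of the arithmetic:

```shell
# Sketch: derive the JVM heap from the container memory limit, leaving
# ~33% headroom above the heap.
CONTAINER_MB=1024
HEAP_MB=$(( CONTAINER_MB * 3 / 4 ))
echo "$HEAP_MB"   # 768
```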
Note that these environment variables are for convenience; they map to the standard JVM heap flags `-Xms` and `-Xmx`.
Avoid setting CPU limits in virtualized/containerized environments. Linkerd tunes its thread count based on the number of CPUs it detects on the host. If your environment provides a smaller share of CPUs than the host exposes, Linkerd will allocate more threads than it should, causing performance degradation. To verify this, check the `jvm/num_cpus` metric in Linkerd's `/admin/metrics.json` endpoint.
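As a sketch, checking the gauge looks like this. Against a live Linkerd you would fetch the JSON from the admin port (port 9990 in this post's later examples); the sample JSON below is fabricated for illustration:

```shell
# Live deployment: curl -s http://localhost:9990/admin/metrics.json
# The sample below stands in for that output.
sample_metrics='{"jvm/num_cpus": 4.0, "jvm/uptime": 123456.0}'
echo "$sample_metrics" | grep -o '"jvm/num_cpus": [0-9.]*'   # "jvm/num_cpus": 4.0
```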
If you need to limit the CPUs available to Linkerd, use `taskset` to pin Linkerd to a specific set of CPUs. For example, to limit Linkerd to 4 CPUs in Kubernetes:
```yaml
command: ["taskset"]
args:
- "--cpu-list"
- "0-3"
- "/io.buoyant/linkerd/1.3.3-SNAPSHOT/bundle-exec"
- "/io.buoyant/linkerd/config/config.yaml"
```
Avoid setting high `sampleRate` values for Linkerd telemeters, specifically:
If you encounter performance issues, and have one or more of these telemeters enabled, try disabling them and re-running your benchmarks.
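For reference, a telemeter's `sampleRate` lives in the `telemetry` section of the Linkerd config. A minimal sketch using the `io.l5d.zipkin` telemeter, where the host, port, and rate are illustrative placeholders:

```yaml
telemetry:
- kind: io.l5d.zipkin
  host: zipkin-collector.default.svc.cluster.local  # placeholder
  port: 9410
  sampleRate: 0.02  # keep this low; high rates are expensive
```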
Linkerd provides powerful performance tuning via Failure Accrual, Load-Balancing, and Retries. By default, these parameters are tuned for a high traffic production environment. We recommend you start with defaults, and carefully tune to your environment, re-running benchmarks with each change.
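To show where these knobs live, here is a sketch of a per-client tuning block, assuming the `io.l5d.consecutiveFailures` failure-accrual kind and the `ewma` load balancer; the values are placeholders to illustrate the shape, not recommendations:

```yaml
routers:
- protocol: http
  client:
    failureAccrual:
      kind: io.l5d.consecutiveFailures
      failures: 5        # placeholder: mark a node dead after 5 consecutive failures
    loadBalancer:
      kind: ewma         # latency-aware balancing
```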
linkerd-viz provides a quick overview of top-line service and Linkerd performance. It builds on metrics available from Linkerd's admin endpoint.
linkerd-viz includes two dashboards: a default dashboard for service-level performance, and a linkerd-health dashboard:
The default dashboard shows service-level performance, including success rate, latency, and request volume.
The linkerd-health dashboard provides Linkerd performance, including memory usage, GC, and uptime.
If Linkerd appears to be CPU bound, you can generate Flame Graphs from a running Linkerd to easily identify hot spots. The instructions below will generate an interactive SVG file, allowing you to browse code-paths by CPU usage.
```shell
# be sure to install the old gperftools, we explicitly need pprof from that package,
# not the one from https://github.com/google/pprof
brew install gperftools

$ pprof --version
pprof (part of gperftools 2.0)
Copyright 1998-2007 Google Inc.
This is BSD licensed software; see the source for copying conditions
and license information.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

git clone https://github.com/brendangregg/FlameGraph

# build flame graph
curl -s "http://localhost:9990/admin/pprof/profile?seconds=30&hz=100" > l5d.pprof
pprof --collapsed l5d.pprof > l5d.collapsed
FlameGraph/flamegraph.pl --color=java --hash l5d.collapsed > l5d.svg
open l5d.svg
```