Confusion about benchmarking Linkerd

Hi all. Several days ago, I tried to benchmark Linkerd with load-testing tools like wrk and webbench. The results showed that Linkerd performed poorly when fronting a highly concurrent application (e.g. Nginx).

I’m not sure whether I deployed or configured Linkerd properly, or whether I misused the test tools. Since Discourse is a better place to open a topic for discussion, I’d like to post my test here. I hope someone can help me out.

Topology

I set up a three-node Kubernetes cluster, with the nodes labeled membership=master, membership=slave1, and membership=slave2 respectively.

Then I deployed an nginx-server service on master, a wrk service on slave1, and a wrk-l5d service (with http_proxy configured) on slave2.
What I wanted to compare is the throughput (requests per second) of two different routes, non-Linkerd vs. Linkerd (i.e. wrk -> nginx vs. wrk -> l5d -> l5d -> nginx).

Deployment

It seems new users cannot upload files, so I have to share them via Google Drive. Sorry.

  1. Linkerd deployment: https://drive.google.com/open?id=0B2FyytRtzjMKNEt5Ri0tWldndFU
  2. Wrk-l5d deployment: https://drive.google.com/open?id=0B2FyytRtzjMKcDRJODAtb2hMaEE

Test

  1. First, open a shell in the wrk pod and start the baseline test with this script:
#!/bin/sh
# Ramp connections from 200 up to 2000 in steps of 200, running wrk
# directly against the nginx-server service each time.
base=200
for i in `seq 10`; do
    conn=`expr $base \* $i`
    wrk -t8 -c$conn -d100s --latency http://nginx-server
    echo ""
    sleep 20
done
  2. Next, open a shell in the wrk-l5d pod and start testing with another script, which sends the same load through the Linkerd proxy:
#!/bin/sh
# Same ramp-up as above, but requests go through the Linkerd HTTP proxy
# whose address is exported in $http_proxy, using proxy.lua below.
base=200
for i in `seq 10`; do
    conn=`expr $base \* $i`
    wrk -t8 -c$conn -d100s --latency -s proxy.lua http://$http_proxy
    echo ""
    sleep 20
done

For proxy.lua (a quick curl check of the same route is sketched after this list):

-- wrk script for benchmarking through an HTTP proxy: the first request on
-- each connection is a CONNECT to the proxy; subsequent requests are plain
-- GETs for nginx-server with the Host header set.
local connected = false
local host = "nginx-server"
local path = "/"
local url  = "http://" .. host .. path

wrk.headers["Host"] = host

request = function()
   if not connected then
      connected = true
      return wrk.format("CONNECT", host)
   end

   return wrk.format("GET", url)
end
  3. Finally, the results showed that wrk -> nginx reached a throughput of almost 30000 rps, while wrk -> l5d -> l5d -> nginx managed only about 2300 rps.
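(As a sanity check of the proxy route in step 2, a single request through the same proxy can confirm the wiring before benchmarking. This is only a sketch; it assumes curl is available in the pod and relies on curl honoring the http_proxy variable that is already set there:)

# one GET through the l5d proxy route; the verbose output shows whether the
# request actually traverses the proxy and what status nginx returns
curl -sv -o /dev/null http://nginx-server/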

Some details

  • Kubernetes version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:33:17Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

If anything in my post is confusing, or if I’ve missed any key information, please feel free to point it out. Thanks.

Hi @ihac. Thanks for posting all the detail, it’s helpful.

2300 qps is well below what we expect to see with linkerd. Let’s try a few config changes:

  1. Telemeters increase CPU usage and latency, particularly with io.l5d.zipkin set to sampleRate: 1.0. For now, let’s remove the zipkin and recentRequests telemeters from the config:
- kind: io.l5d.zipkin
  host: zipkin-collector.default.svc.cluster.local
  port: 9410
  sampleRate: 1.0
- kind: io.l5d.recentRequests
  sampleRate: 0.25
  2. Try using wrk2 instead of wrk. wrk2 takes a --rate parameter that allows you to target a specific throughput, providing a more realistic benchmark (an example invocation follows this list). More details at https://github.com/giltene/wrk2.

  3. Try using the buoyantio/linkerd:1.2.1 Docker image rather than 1.2.0.fix4-SNAPSHOT.

  4. Set JVM_HEAP_MIN and JVM_HEAP_MAX to 512M, ensuring linkerd has enough memory headroom for high-throughput testing. Setting JVM_HEAP_MIN in particular avoids memory fragmentation and GC pressure (one way to set these is sketched after the list).
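For reference, a wrk2 run along these lines targets a fixed request rate; the thread/connection counts and the 10000 rps target here are placeholders, not recommendations:

# wrk2's -R/--rate flag holds a constant request rate while measuring latency
# (a wrk2 build typically still produces a binary named wrk)
wrk -t8 -c200 -d100s -R10000 --latency http://nginx-server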
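And one way to apply the heap settings, assuming linkerd runs as a DaemonSet named l5d (adjust to your actual resource) and your kubectl is new enough to have set env; otherwise add the variables to the container spec in the deployment YAML:

# matching min and max heap avoids heap resizing under load
kubectl set env ds/l5d JVM_HEAP_MIN=512M JVM_HEAP_MAX=512M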

Let us know how this affects your tests, thanks!

Hi, @siggy. Many thanks for your reply and suggestion.

Yesterday, I tested it again following your suggestions: removing all telemeters from the config, setting JVM_HEAP_MIN to 1024M and JVM_HEAP_MAX to 2048M (actually, linkerd only used about 600MB), using linkerd:1.2.1, and removing all unnecessary dtab rules.

And the good news was that Linkerd’s maximum QPS came to about 13000 instead of 2300, no matter which tool I used (wrk, wrk2 or slow_cooker).
But there is still a huge gap between the Linkerd proxy (13000 qps) and native kube-proxy (30000 qps).

I’ve read this post (Linkerd performance), and it seems Linkerd should be able to do much better than 13000 qps. But I’m not sure what to optimize next.

BTW, how well would Linkerd be expected to perform compared with direct access? I saw that Linkerd was capable of supporting almost 40k qps in your AWS cluster, but what was the baseline throughput there without Linkerd?

Good to hear it’s working a bit better @ihac, but I agree we can do better. What type of nodes are you running on? What platform (AWS/GCloud)? How many CPUs are available to linkerd? Providing up to 16 CPUs to linkerd can help a lot. It won’t peg your CPU, but its multithreaded design works best with more cores.
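For example, if linkerd runs as a DaemonSet named l5d (a placeholder name here) and your kubectl has the set resources subcommand, the CPU available to it can be raised along these lines:

# let the linkerd container use up to 4 CPUs; match this to what the node can spare
kubectl set resources ds/l5d --limits=cpu=4 --requests=cpu=2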

I deployed k8s on three physical machines (4 CPUs, 8 GB mem each), and unfortunately those are all I’m permitted to use. It’s a pity.
