Linkerd performance


#1

Hi guys,

When I ran a Linkerd performance test with slow_cooker, one Linkerd instance forwarded requests to two backend services. Linkerd and both backend services ran on VMs with the same configuration: 8-core CPU, 8 GB RAM.
Here is the result:

# slow_cooker -host demo.svc.consul -qps 2000 -concurrency 10 http://10.xx.xx.104:8080
# sending 20000 GET req/s with concurrency=10 to http://10.xx.xx.104:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-03T14:30:25Z  60759/0/0 200000  30% 10s   0 [  1   2   3   39 ]   39      0 +
2017-07-03T14:30:35Z  62339/0/0 200000  31% 10s   0 [  1   2   3   12 ]   12      0 +
2017-07-03T14:30:45Z  62901/0/0 200000  31% 10s   0 [  1   2   3   12 ]   12      0
2017-07-03T14:30:55Z  62078/0/0 200000  31% 10s   0 [  1   2   3   13 ]   13      0
2017-07-03T14:31:05Z  62625/0/0 200000  31% 10s   0 [  1   2   3   11 ]   11      0
2017-07-03T14:31:15Z  62223/0/0 200000  31% 10s   0 [  1   2   3   10 ]   10      0
2017-07-03T14:31:25Z  62274/0/0 200000  31% 10s   0 [  1   2   3   10 ]   10      0
2017-07-03T14:31:35Z  61830/0/0 200000  30% 10s   0 [  1   2   3   13 ]   13      0
2017-07-03T14:31:45Z  62628/0/0 200000  31% 10s   0 [  1   2   3   11 ]   11      0
2017-07-03T14:31:55Z  62640/0/0 200000  31% 10s   0 [  1   2   3   17 ]   17      0
2017-07-03T14:32:05Z  62412/0/0 200000  31% 10s   0 [  1   2   3   10 ]   10      0
2017-07-03T14:32:15Z  62786/0/0 200000  31% 10s   0 [  1   2   3   12 ]   12      0
2017-07-03T14:32:25Z  63131/0/0 200000  31% 10s   0 [  1   2   3   17 ]   17      0
2017-07-03T14:32:35Z  63574/0/0 200000  31% 10s   0 [  1   2   3   10 ]   10      0
2017-07-03T14:32:45Z  63307/0/0 200000  31% 10s   0 [  1   2   3   10 ]   10      0
2017-07-03T14:32:55Z  63372/0/0 200000  31% 10s   0 [  1   2   3   12 ]   12      0
^CFROM    TO #REQUESTS
   0     2 942822
   2     8 102141
   8    32 397
  32    64 12
  64   128 0
 128   256 0
 256   512 0
 512  1024 0
1024  4096 0
4096 16384 0

But if I sent requests directly to one of the two backend services, the results were:

# slow_cooker  -qps 2000 -concurrency 10 http://10.xx.xx.105:8080
# sending 20000 GET req/s with concurrency=10 to http://10.xx.xx.105:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-03T14:40:10Z 186348/0/0 200000  93% 10s   0 [  0   0   0    8 ]    8      0
2017-07-03T14:40:20Z 186380/0/0 200000  93% 10s   0 [  0   0   1    6 ]    6      0 +
2017-07-03T14:40:30Z 186308/0/0 200000  93% 10s   0 [  0   0   1    6 ]    6      0 +
2017-07-03T14:40:40Z 186206/0/0 200000  93% 10s   0 [  0   0   1    7 ]    7      0 +
2017-07-03T14:40:50Z 186529/0/0 200000  93% 10s   0 [  0   0   0    6 ]    6      0
2017-07-03T14:41:00Z 186753/0/0 200000  93% 10s   0 [  0   0   0    7 ]    7      0
2017-07-03T14:41:10Z 186260/0/0 200000  93% 10s   0 [  0   0   1    6 ]    6      0 +
2017-07-03T14:41:20Z 186077/0/0 200000  93% 10s   0 [  0   0   1    6 ]    6      0 +
2017-07-03T14:41:30Z 185820/0/0 200000  92% 10s   0 [  0   0   0   10 ]   10      0
2017-07-03T14:41:40Z 186696/0/0 200000  93% 10s   0 [  0   0   0    6 ]    6      0
2017-07-03T14:41:50Z 186415/0/0 200000  93% 10s   0 [  0   0   0    6 ]    6      0
2017-07-03T14:42:00Z 186114/0/0 200000  93% 10s   0 [  0   0   0    6 ]    6      0
2017-07-03T14:42:10Z 186194/0/0 200000  93% 10s   0 [  0   0   1   11 ]   11      0 +
2017-07-03T14:42:20Z 187130/0/0 200000  93% 10s   0 [  0   0   0    7 ]    7      0
^CFROM    TO #REQUESTS
   0     2 2690656
   2     8 10800
   8    32 33
  32    64 0
  64   128 0
 128   256 0
 256   512 0
 512  1024 0
1024  4096 0
4096 16384 0

There seems to be a big gap between direct access and forwarding requests via linkerd. Why is this? Could you share linkerd's baseline performance numbers? And how can linkerd be tuned for maximum performance?

Thanks in advance.


#2

Hi @yangzhares,

This is a bit higher than our goal of p95=1ms, p99=5ms, but it’s not drastically higher. At 20k RPS and above (which is where you’re testing), we can often eke out a little better performance by explicitly setting -Xms and -Xmx to 1 GB or 2 GB, so you could try that.

In general, Linkerd is a userspace proxy that does a lot of stuff, so it’s not unexpected that there is a cost to doing this. Under real-world conditions where instances are overloaded, slow, or unhealthy, Linkerd can often reduce tail latencies over raw connections by virtue of its load balancing, circuit breaking, etc. This tradeoff of improved tail latencies at the expense of a minor increase in best-case latencies is often worth it, but it depends on the situation. We explore this a little in our blog post Making Things Faster by Adding More Steps.

Finally, we have an ongoing line of work (starting with linkerd-tcp) that will really improve latency and resource costs. This is scheduled for later this year.

Hope that helps!


#3

With slow_cooker, if you increase the number of connections (via -concurrency), you should find the throughput gap decreases pretty dramatically.
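One detail worth spelling out: slow_cooker's `-qps` flag is per connection, so the aggregate target rate is qps × concurrency (the "sending N GET req/s" headers above show this). A quick sketch of the arithmetic, with illustrative numbers:

```shell
# slow_cooker's aggregate target rate is per-connection qps times concurrency.
qps_per_conn=100      # -qps
concurrency=200       # -concurrency
aggregate=$((qps_per_conn * concurrency))
echo "$aggregate"     # 20000 req/s total, same aggregate as -qps 2000 -concurrency 10
```

So raising `-concurrency` while lowering `-qps` keeps the aggregate rate constant but spreads it over more connections.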


#4

Thanks @william and @stevej. Here are the results after changing -Xms to 2G and -Xmx to 4G, and increasing -concurrency from 100 to 400.

QPS: 10K

# slow_cooker -host demo.svc.consul -qps 100 -concurrency 100 http://10.xx.xx.104:8080
# sending 10000 GET req/s with concurrency=100 to http://10.xx.xx.104:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-04T04:54:07Z  99745/0/0 100000  99% 10s   0 [  3   5   9  204 ]  204      0 +
2017-07-04T04:54:17Z 100000/0/0 100000 100% 10s   0 [  3   5   7   17 ]   17      0
2017-07-04T04:54:27Z 100000/0/0 100000 100% 10s   0 [  3   5   8   16 ]   16      0
2017-07-04T04:54:37Z  99999/0/0 100000  99% 10s   0 [  3   5   7   20 ]   20      0
2017-07-04T04:54:47Z 100000/0/0 100000 100% 10s   0 [  3   5   8   17 ]   17      0
2017-07-04T04:54:57Z 100000/0/0 100000 100% 10s   0 [  3   5   7   16 ]   16      0
2017-07-04T04:55:07Z 100001/0/0 100000 100% 10s   0 [  3   5   8   15 ]   15      0
2017-07-04T04:55:17Z 100005/0/0 100000 100% 10s   0 [  3   5   8   17 ]   17      0
2017-07-04T04:55:27Z  99998/0/0 100000  99% 10s   0 [  3   5   8   19 ]   19      0
2017-07-04T04:55:37Z 100001/0/0 100000 100% 10s   0 [  3   5   7   18 ]   18      0
2017-07-04T04:55:47Z 100006/0/0 100000 100% 10s   0 [  3   5   8   18 ]   18      0
2017-07-04T04:55:57Z 100002/0/0 100000 100% 10s   0 [  3   5   8   18 ]   18      0
2017-07-04T04:56:07Z 100012/0/0 100000 100% 10s   0 [  3   5   8   18 ]   18      0
2017-07-04T04:56:17Z  99997/0/0 100000  99% 10s   0 [  3   5   8   18 ]   18      0

QPS: 20K

# slow_cooker -host demo.svc.consul -qps 100 -concurrency 200 http://10.xx.xx.104:8080
# sending 20000 GET req/s with concurrency=200 to http://10.xx.xx.104:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-04T04:57:59Z 190483/0/0 200000  95% 10s   0 [  6  16  23  226 ]  226      0 +
2017-07-04T04:58:09Z 193920/0/0 200000  96% 10s   0 [  6  15  22   51 ]   51      0
2017-07-04T04:58:19Z 194751/0/0 200000  97% 10s   0 [  5  15  21   46 ]   46      0
2017-07-04T04:58:29Z 193186/0/0 200000  96% 10s   0 [  6  16  23   56 ]   56      0
2017-07-04T04:58:39Z 193649/0/0 200000  96% 10s   0 [  6  15  22   44 ]   44      0
2017-07-04T04:58:49Z 194632/0/0 200000  97% 10s   0 [  5  15  21   52 ]   52      0
2017-07-04T04:58:59Z 193912/0/0 200000  96% 10s   0 [  6  16  22   49 ]   49      0
2017-07-04T04:59:09Z 195248/0/0 200000  97% 10s   0 [  5  15  21   43 ]   43      0
2017-07-04T04:59:19Z 194786/0/0 200000  97% 10s   0 [  5  15  21   56 ]   56      0
2017-07-04T04:59:29Z 192650/0/0 200000  96% 10s   0 [  6  16  22   77 ]   77      0
2017-07-04T04:59:39Z 193651/0/0 200000  96% 10s   0 [  6  15  22   50 ]   50      0

QPS: 30K

# slow_cooker -host demo.svc.consul -qps 100 -concurrency 300 http://10.xx.xx.104:8080
# sending 30000 GET req/s with concurrency=300 to http://10.xx.xx.104:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-04T05:01:03Z 190405/0/0 300000  63% 10s   0 [ 13  31  42  251 ]  251      0 +
2017-07-04T05:01:13Z 192537/0/0 300000  64% 10s   0 [ 13  31  41   79 ]   79      0
2017-07-04T05:01:23Z 193887/0/0 300000  64% 10s   0 [ 13  31  40   76 ]   76      0
2017-07-04T05:01:33Z 192727/0/0 300000  64% 10s   0 [ 13  31  41   79 ]   79      0
2017-07-04T05:01:43Z 191531/0/0 300000  63% 10s   0 [ 13  31  41  115 ]  115      0
2017-07-04T05:01:53Z 194032/0/0 300000  64% 10s   0 [ 13  31  41   72 ]   72      0
2017-07-04T05:02:03Z 192772/0/0 300000  64% 10s   0 [ 13  31  41   93 ]   93      0
2017-07-04T05:02:13Z 190728/0/0 300000  63% 10s   0 [ 13  31  41   76 ]   76      0
2017-07-04T05:02:23Z 193884/0/0 300000  64% 10s   0 [ 13  31  40   73 ]   73      0
2017-07-04T05:02:33Z 194238/0/0 300000  64% 10s   0 [ 13  30  40   79 ]   79      0
2017-07-04T05:02:43Z 191512/0/0 300000  63% 10s   0 [ 13  31  41   78 ]   78      0
2017-07-04T05:02:53Z 193270/0/0 300000  64% 10s   0 [ 13  31  41   90 ]   90      0

QPS: 40K

# slow_cooker -host demo.svc.consul -qps 100 -concurrency 400 http://10.xx.xx.104:8080
# sending 40000 GET req/s with concurrency=400 to http://10.xx.xx.104:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-04T05:04:34Z 189383/0/0 400000  47% 10s   0 [ 19  40  52  277 ]  277      0 +
2017-07-04T05:04:44Z 191132/0/0 400000  47% 10s   0 [ 18  40  52  105 ]  105      0
2017-07-04T05:04:54Z 194632/0/0 400000  48% 10s   0 [ 18  40  51   99 ]   99      0
2017-07-04T05:05:04Z 195722/0/0 400000  48% 10s   0 [ 18  39  51   88 ]   88      0
2017-07-04T05:05:14Z 196457/0/0 400000  49% 10s   0 [ 18  39  50   84 ]   84      0
2017-07-04T05:05:24Z 195858/0/0 400000  48% 10s   0 [ 18  39  50   99 ]   99      0
2017-07-04T05:05:34Z 197671/0/0 400000  49% 10s   0 [ 18  39  49   89 ]   89      0
2017-07-04T05:05:44Z 195805/0/0 400000  48% 10s   0 [ 18  40  51  103 ]  103      0
2017-07-04T05:05:54Z 194825/0/0 400000  48% 10s   0 [ 18  39  50  111 ]  111      0
2017-07-04T05:06:04Z 194826/0/0 400000  48% 10s   0 [ 18  40  51   99 ]   99      0

From the output, linkerd’s maximum QPS is about 20K on my VM (8-core CPU, 8 GB RAM). Beyond 20K, QPS does not increase even though there is plenty of CPU and RAM left; in my testing, RAM usage stays around 800M. So, can I conclude that linkerd’s maximum QPS is 20K in my environment? Are there any other ways to optimize further?

Another issue is that latency grows too long as QPS increases.

Thanks,
Clare


#5

Hi Clare,

We typically expect to see better results than this. For example, on our own test cluster, we see 30k qps p50 at 2ms vs. 13ms. Here are a few things to check on your end:

  1. Set both -Xms and -Xmx to the same value. We use 1024M.
  2. Are you warming up linkerd prior to running your tests? We typically run at least 20K qps for at least 1 minute prior to our tests. The JVM performs much better after warmup.
  3. What kind of responses do your backends return? How large? Are they chunked? Mind sharing your response headers? (curl -v -o /dev/null ...)
  4. Are you running slow_cooker on the same host as linkerd? If so, try running it on a separate machine.
  5. Have you tried on 16-core machines? Linkerd (and its underlying technology, Finagle) does best with a lot of cores. It won’t saturate your machine, but having that many cores available really helps.
  6. Have you confirmed how the backends perform above 20k without linkerd, for comparison?
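For item 2, the warmup run can be sized with the same slow_cooker flags; the numbers below are illustrative, matching the 20k req/s runs elsewhere in this thread:

```shell
# Size -totalRequests for a warmup of at least 60s at ~20k req/s
# (slow_cooker's aggregate rate is per-connection qps times concurrency).
qps_per_conn=100
concurrency=200
warmup_seconds=60
total_requests=$((qps_per_conn * concurrency * warmup_seconds))
echo "$total_requests"   # 1200000
```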

Test results

10K qps: p50: 1ms, p99: 4ms
20K qps: p50: 1ms, p99: 6ms
30K qps: p50: 2ms, p99: 10ms
40K qps: p50: 2ms, p99: 26ms

Test setup

Raw results

QPS: 10K

# slow_cooker -totalRequests 600000 -concurrency 100 -qps 100 -host h0 http://linkerd:8080/
# sending 10000 GET req/s with concurrency=100 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-05T16:43:46Z  99808/0/0 100000  99% 10s   0 [  1   3   4   20 ]   20 +
2017-07-05T16:43:56Z 100000/0/0 100000 100% 10s   0 [  1   2   4    6 ]    6 +
2017-07-05T16:44:06Z 100000/0/0 100000 100% 10s   0 [  1   2   3    7 ]    7
2017-07-05T16:44:16Z 100000/0/0 100000 100% 10s   0 [  1   3   4    7 ]    7
2017-07-05T16:44:26Z 100003/0/0 100000 100% 10s   0 [  1   3   4    7 ]    7
2017-07-05T16:44:36Z 100004/0/0 100000 100% 10s   0 [  1   3   4    7 ]    7
2017-07-05T16:44:46Z 100000/0/0 100000 100% 10s   0 [  1   2   3    7 ]    7

QPS: 20K

# slow_cooker -totalRequests 1200000 -concurrency 200 -qps 100 -host h0 http://linkerd:8080/
# sending 20000 GET req/s with concurrency=200 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-05T16:50:28Z 199134/0/0 200000  99% 10s   0 [  1   4   6   55 ]   55 +
2017-07-05T16:50:38Z 200000/0/0 200000 100% 10s   0 [  1   4   6   14 ]   14
2017-07-05T16:50:48Z 200000/0/0 200000 100% 10s   0 [  1   4   6   16 ]   16
2017-07-05T16:50:58Z 200000/0/0 200000 100% 10s   0 [  1   4   6   15 ]   15
2017-07-05T16:51:08Z 200001/0/0 200000 100% 10s   0 [  1   4   6   15 ]   15
2017-07-05T16:51:18Z 200001/0/0 200000 100% 10s   0 [  1   4   6   15 ]   15
2017-07-05T16:51:28Z 200006/0/0 200000 100% 10s   0 [  1   4   6   16 ]   16

QPS: 30K

# slow_cooker -totalRequests 1800000 -concurrency 300 -qps 100 -host h0 http://linkerd:8080/
# sending 30000 GET req/s with concurrency=300 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-05T16:52:24Z 297736/0/0 300000  99% 10s   0 [  2   7  12  211 ]  211 +
2017-07-05T16:52:34Z 299827/0/0 300000  99% 10s   0 [  2   6  10   38 ]   38
2017-07-05T16:52:44Z 299955/0/0 300000  99% 10s   0 [  2   6   9   30 ]   30
2017-07-05T16:52:54Z 299953/0/0 300000  99% 10s   0 [  2   6  10   31 ]   31
2017-07-05T16:53:04Z 299935/0/0 300000  99% 10s   0 [  2   6   9   28 ]   28
2017-07-05T16:53:14Z 299933/0/0 300000  99% 10s   0 [  2   6  10   30 ]   30
2017-07-05T16:53:24Z 299911/0/0 300000  99% 10s   0 [  2   6  10   31 ]   31

QPS: 40K

# slow_cooker -totalRequests 2400000 -concurrency 400 -qps 100 -host h0 http://linkerd:8080/
# sending 40000 GET req/s with concurrency=400 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-05T16:53:53Z 380666/0/0 400000  95% 10s   0 [  2  16  27  232 ]  232 +
2017-07-05T16:54:03Z 385368/0/0 400000  96% 10s   0 [  2  16  26   97 ]   97
2017-07-05T16:54:13Z 386989/0/0 400000  96% 10s   0 [  2  15  25   71 ]   71
2017-07-05T16:54:23Z 384388/0/0 400000  96% 10s   0 [  2  16  26   70 ]   70
2017-07-05T16:54:33Z 386542/0/0 400000  96% 10s   0 [  2  15  25  101 ]  101
2017-07-05T16:54:43Z 385404/0/0 400000  96% 10s   0 [  2  15  26   76 ]   76
2017-07-05T16:54:53Z 370183/0/0 400000  92% 10s   0 [  2  17  31  549 ]  549

#6

Hi @siggy,

Thanks a lot for providing valuable suggestion and information.

  1. Set both -Xms and -Xmx to the same value. We use 1024M.
    Set same as you said
  2. Are you warming up linkerd prior to running your tests? We typically run at least 20K qps for at least 1 minute prior to our tests. The JVM performs much better after warmup.
    Follow your suggestion to do warmup JVM
  3. What kind of responses do your backends return? How large? Are they chunked? Mind sharing your response headers? (curl -v -o /dev/null …)
* About to connect() to 10.xx.xx.105 port 8080 (#0)
*   Trying 10.xx.xx.105...
* Connected to 10.xx.xx.105 (10.xx.xx.105) port 8080 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.xx.xx.105:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Length: 12
< Content-Type: text/plain
< Server: linkerd.performance
< Date: Fri, 07 Jul 2017 06:24:27 GMT
<
{ [data not shown]
* Connection #0 to host 10.xx.xx.105 left intact
  4. Are you running slow_cooker on the same host as linkerd? If so, try running it on a separate machine.
    Yes
  5. Have you tried on 16-core machines? Linkerd (and its underlying technology, Finagle) does best with a lot of cores. It won’t saturate your machine, but having that many cores available really helps.
    I have only an 8-core VM
  6. Have you confirmed how the backends perform above 20k without linkerd, for comparison?

# slow_cooker -qps 100 -concurrency 500 http://10.xx.xx.105:8080
# sending 50000 GET req/s with concurrency=500 to http://10.xx.xx.105:8080 ...
#                      good/b/f t   goal%   min [p50 p95 p99  p999]  max bhash change
2017-07-07T06:11:10Z 495368/0/0 500000  99% 10s   0 [  1   5   9   51 ]   51      0 +
2017-07-07T06:11:20Z 499709/0/0 500000  99% 10s   0 [  1   5   8   22 ]   22      0
2017-07-07T06:11:30Z 499845/0/0 500000  99% 10s   0 [  1   5   7   19 ]   19      0
2017-07-07T06:11:40Z 499837/0/0 500000  99% 10s   0 [  1   5   8   21 ]   21      0
2017-07-07T06:11:50Z 499745/0/0 500000  99% 10s   0 [  1   5   8   20 ]   20      0
2017-07-07T06:12:00Z 499668/0/0 500000  99% 10s   0 [  1   5   7   22 ]   22      0
2017-07-07T06:12:10Z 500067/0/0 500000 100% 10s   0 [  1   5   7   19 ]   19      0
2017-07-07T06:12:20Z 499780/0/0 500000  99% 10s   0 [  1   5   7   20 ]   20      0
2017-07-07T06:12:30Z 499607/0/0 500000  99% 10s   0 [  1   5   8   20 ]   20      0
2017-07-07T06:12:40Z 500094/0/0 500000 100% 10s   0 [  1   5   7   19 ]   19      0

Following your suggestions, I re-ran the test in my environment and still only get 20K RPS; it seems my 8-core CPU limits any further increase. Is there a way to tune the JVM to get higher performance with a fixed number of CPU cores?


#7

Hi @yangzhares, would you be willing to share your configuration file (e.g. linkerd.yml) with us? If you wish to do it privately, a DM in the linkerd slack or an email to me (stevej -at- buoyant.io) would be fine. I’d like to double-check it. I think linkerd should be able to do more than 20k on an 8-core host. Thanks!


#8

Hi @yangzhares,

Thanks for testing all the variables. We’ve re-run our own tests using taskset -c 0-7 ... to simulate 8-core hosts. The results are slightly slower than with 16 cores, but still faster than what you are getting. As Steve suggested, please share your configs with us so we can dig a little further.

Test results

10K qps: p50: 2ms, p99: 5ms
20K qps: p50: 2ms, p99: 8ms
30K qps: p50: 2ms, p99: 28ms
40K qps: p50: 2ms, p99: 45ms

linkerd config

For reference, here is the config file we’re testing with:

admin:
  port: 9990
namers:
- kind: io.l5d.k8s
  experimental: true
  host: 127.0.0.1
  port: 8001
routers:
- protocol: http
  servers:
  - port: 8080
    ip: 0.0.0.0
  dtab: |
    /k8s      => /#/io.l5d.k8s/default/http;
    /svc/h0 => /k8s/hello;

Raw results

QPS: 10K

# slow_cooker -totalRequests 600000 -concurrency 100 -qps 100 -host h0 http://linkerd:8080/
# sending 10000 GET req/s with concurrency=100 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-07T21:53:02Z  99785/0/0 100000  99% 10s   0 [  2   3   5   34 ]   34 +
2017-07-07T21:53:12Z 100000/0/0 100000 100% 10s   0 [  2   3   5   13 ]   13
2017-07-07T21:53:22Z 100000/0/0 100000 100% 10s   0 [  2   3   5    8 ]    8
2017-07-07T21:53:32Z 100000/0/0 100000 100% 10s   0 [  2   3   5    7 ]    7
2017-07-07T21:53:42Z 100000/0/0 100000 100% 10s   0 [  2   3   5    7 ]    7
2017-07-07T21:53:52Z 100002/0/0 100000 100% 10s   0 [  2   3   5    7 ]    7
2017-07-07T21:54:02Z 100005/0/0 100000 100% 10s   0 [  2   3   5    9 ]    9

QPS: 20K

# slow_cooker -totalRequests 1200000 -concurrency 200 -qps 100 -host h0 http://linkerd:8080/
# sending 20000 GET req/s with concurrency=200 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-07T21:54:52Z 198462/0/0 200000  99% 10s   0 [  2   6  10   73 ]   73 +
2017-07-07T21:55:02Z 199962/0/0 200000  99% 10s   0 [  2   6   8   26 ]   26
2017-07-07T21:55:12Z 199998/0/0 200000  99% 10s   0 [  2   6   8   23 ]   23
2017-07-07T21:55:22Z 199975/0/0 200000  99% 10s   0 [  2   6   8   27 ]   27
2017-07-07T21:55:32Z 199995/0/0 200000  99% 10s   0 [  2   6   8   20 ]   20
2017-07-07T21:55:42Z 199974/0/0 200000  99% 10s   0 [  2   6   8   29 ]   29
2017-07-07T21:55:52Z 199946/0/0 200000  99% 10s   0 [  2   6   8   27 ]   27

QPS: 30K

# slow_cooker -totalRequests 1800000 -concurrency 300 -qps 100 -host h0 http://linkerd:8080/
# sending 30000 GET req/s with concurrency=300 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-07T21:56:17Z 280185/0/0 300000  93% 10s   0 [  2  19  29  233 ]  233 +
2017-07-07T21:56:27Z 284121/0/0 300000  94% 10s   0 [  2  18  29   67 ]   67
2017-07-07T21:56:37Z 285210/0/0 300000  95% 10s   0 [  2  18  28   66 ]   66
2017-07-07T21:56:47Z 283849/0/0 300000  94% 10s   0 [  2  18  28   75 ]   75
2017-07-07T21:56:57Z 283307/0/0 300000  94% 10s   0 [  2  18  29   71 ]   71
2017-07-07T21:57:07Z 283060/0/0 300000  94% 10s   0 [  2  18  28   69 ]   69
2017-07-07T21:57:17Z 283594/0/0 300000  94% 10s   0 [  2  18  28   75 ]   75

QPS: 40K

# slow_cooker -totalRequests 2400000 -concurrency 400 -qps 100 -host h0 http://linkerd:8080/
# sending 40000 GET req/s with concurrency=400 to http://linkerd:8080/ ...
#                      good/b/f t   good%   min [p50 p95 p99  p999]  max change
2017-07-07T21:57:43Z 326624/0/0 400000  81% 10s   0 [  2  29  48  313 ]  313 +
2017-07-07T21:57:53Z 333044/0/0 400000  83% 10s   0 [  2  29  45  108 ]  108
2017-07-07T21:58:03Z 332689/0/0 400000  83% 10s   0 [  2  29  45   99 ]   99
2017-07-07T21:58:13Z 337300/0/0 400000  84% 10s   0 [  2  28  43  129 ]  129
2017-07-07T21:58:23Z 333103/0/0 400000  83% 10s   0 [  2  28  45  120 ]  120
2017-07-07T21:58:33Z 332642/0/0 400000  83% 10s   0 [  2  28  45  100 ]  100
2017-07-07T21:58:43Z 333594/0/0 400000  83% 10s   0 [  2  28  45  109 ]  109
2017-07-07T21:58:53Z 333486/0/0 400000  83% 10s   0 [  2  29  45  109 ]  109

#9

Thanks a lot @siggy and @stevej. Here is my linkerd config:

admin:
  port: 9990
namers:
- kind: io.l5d.consul
  host: 127.0.0.1
  port: 8500
  includeTag: false
  setHost: false

routers:
- protocol: http
  label: outgoing
  dtab: |
    /consul => /#/io.l5d.consul/ocp;
    /svc    => /$/io.buoyant.http.subdomainOfPfx/svc.consul/consul;
  httpAccessLog: /var/log/linkerd.log
  servers:
  - port: 8080
    ip: 0.0.0.0

telemetry:
- kind: io.l5d.recentRequests
  sampleRate: 1.0
- kind: io.l5d.prometheus

usage:
  enabled: false

I’m running linkerd with Docker on CentOS 7.2, and I have also changed some kernel parameters:

sysctl -w fs.file-max="9999999"
sysctl -w fs.nr_open="9999999"
sysctl -w net.core.netdev_max_backlog="4096"
sysctl -w net.core.rmem_max="16777216"
sysctl -w net.core.somaxconn="65535"
sysctl -w net.core.wmem_max="16777216"
sysctl -w net.ipv4.ip_local_port_range="1025       65535"
sysctl -w net.ipv4.tcp_fin_timeout="30"
sysctl -w net.ipv4.tcp_keepalive_time="30"
sysctl -w net.ipv4.tcp_max_syn_backlog="20480"
sysctl -w net.ipv4.tcp_max_tw_buckets="400000"
sysctl -w net.ipv4.tcp_no_metrics_save="1"
sysctl -w net.ipv4.tcp_syn_retries="2"
sysctl -w net.ipv4.tcp_synack_retries="2"
sysctl -w net.ipv4.tcp_tw_recycle="1"
sysctl -w net.ipv4.tcp_tw_reuse="1"
sysctl -w vm.min_free_kbytes="65536"
sysctl -w vm.overcommit_memory="1"
ulimit -n 9999999

Launch linkerd Docker instance with this command:

docker run -d -v /etc/linkerd/config.yml:/config.yml --net="host" --name=linkerd --env "JVM_HEAP_MIN=1024M" --env "JVM_HEAP_MAX=1024M" docker.io/buoyantio/linkerd:1.1.0 /config.yml 

That’s all, thanks again.

Thanks,
Clare


#10

Thanks for all the detail @yangzhares.

In your linkerd config, this block may cause performance issues:

- kind: io.l5d.recentRequests
  sampleRate: 1.0

A sampleRate of 1.0 means recording 100% of requests. Try setting that number to something like 0.02 or, even better, removing the recentRequests telemeter completely.
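For example, the telemetry block could be trimmed to leave only the Prometheus telemeter; the 0.02 shown commented out is an illustrative value, not a recommendation from the linkerd docs:

```yaml
telemetry:
# Either remove io.l5d.recentRequests entirely, or sample a small fraction:
# - kind: io.l5d.recentRequests
#   sampleRate: 0.02
- kind: io.l5d.prometheus
```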


#11

@siggy, thanks for your help.

I have found the root cause of the low performance in my env, even with the same CPU count as yours: my linkerd config logged every request to an access log file. I verified that this significantly hurts both p99 latency and RPS; with it disabled, I get almost the same performance as you reported. But as you know, access logs are very useful for troubleshooting, so disabling them doesn’t seem reasonable. Is there another way to get high performance while still logging every request?


#12

Writing 30,000 log lines a second will slow anyone down. At this scale we usually switch to heavily-sampled Zipkin traces.


#13

@william, Thanks.

Yup, you’re right. But if I switch to heavily-sampled Zipkin traces, will that have an impact on performance? How do I set the sample rate? Do you have any recommendations?


#14

There’s a fair amount of tuning involved, but typically you would accomplish this by setting the sampling rate low enough not to have a performance impact. E.g. at 30k RPS and a sampling rate of 0.1%, you’d be sending 30 traces a second, which is probably sustainable if it’s all within the datacenter and your Zipkin collector is in healthy shape.
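As a sketch, linkerd 1.x configures Zipkin tracing via the io.l5d.zipkin telemeter; the host and port below are assumptions to adapt to your own collector, and the rate is the 0.1% example from above:

```yaml
telemetry:
- kind: io.l5d.zipkin
  host: zipkin-collector.svc.consul   # assumed collector address
  port: 9410                          # scribe port the collector listens on
  sampleRate: 0.001                   # 0.1% => ~30 traces/s at 30k RPS
```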

(In the future I’d like to have some kind of fancy adaptive sampling, but for now it’s just a straight percentage.)


#15

Thanks a lot. Sounds good.


#16

Do you have more details on this? Any publicly sharable roadmap?

Thanks,
Leo.


#17

Some details are here: https://docs.google.com/document/d/1odTet2UcUmfvh0rZMmi_bppNxIJhJmmK7ViHkznnkqw/, which we went over in the community meeting (https://www.youtube.com/watch?v=Pj_78cnKKas).