I’m trying to learn more about an issue we are having that we suspect is due to a high number of incoming connections to Linkerd.
We run Linkerd as an ingress controller in Kubernetes, and basic load testing from a small number of clients has us comfortably handling 7-8k RPS, which is ample headroom for expected load. The real-world traffic coming into the cluster comes from approximately 150 hosts, each opening 80 concurrent connections. We see throughput start to suffer once l5d instances reach about 1,000-1,200 server connections. CPU and memory are generally stable and well within limits.
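For context, a back-of-envelope calculation of the connection fan-in implied by the numbers above (assuming inbound connections spread evenly across the five daemonset instances, which is an assumption, not something we have measured):

```python
# Back-of-envelope connection math from the figures in this post.
hosts = 150            # client hosts
conns_per_host = 80    # concurrent connections opened by each host
instances = 5          # linkerd daemonset pods, one per node

total_conns = hosts * conns_per_host        # total inbound connections
per_instance = total_conns // instances     # per-linkerd share, if evenly spread

print(total_conns, per_instance)  # 12000 2400
```

So each instance would see roughly 2,400 server connections, about twice the 1,000-1,200 level at which we observe throughput degrading.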
We can (and will) improve connection pooling in the client, but meanwhile would like to understand more about the constraint we’re hitting.
- Kubernetes 1.9.6 on GKE
- Linkerd configured as a daemonset (during load tests, instances peak at around 2 CPUs on each host)
- 5 hosts: 8 vCPU, 30GB RAM each.
- JVM heap: -Xms and -Xmx both set to 1024MB
Kubernetes resource settings:

```yaml
resources:
  limits:
    memory: 1.5Gi
  requests:
    memory: 1.5Gi
```
Linkerd configuration:

```yaml
admin:
  ip: 0.0.0.0
  port: 9990
namers:
- kind: io.l5d.k8s
telemetry:
- kind: io.l5d.prometheus
routers:
- protocol: http
  identifier:
    kind: io.l5d.ingress
  servers:
  - port: 80
    ip: 0.0.0.0
  dtab: /svc => /#/io.l5d.k8s
  client:
    loadBalancer:
      kind: ewma
      maxEffort: 5
      decayTimeMs: 5000
```
We can give more detail about our configuration if necessary, but in the meantime we're wondering:
- How many incoming connections would you expect Linkerd to handle with the above configuration? (We previously fronted this service with nginx on the same Kubernetes infrastructure/config.)
- Any best practices around connection reuse/pooling that we should be thinking about?
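On the client side, the change we have in mind is bounding the pool and reusing keep-alive connections rather than opening 80 ad-hoc connections per host. A minimal sketch of what that might look like, using Python's `requests` library (the hostname and pool sizes are hypothetical, purely for illustration):

```python
import requests
from requests.adapters import HTTPAdapter

# A session with a bounded keep-alive pool: requests made through it
# reuse existing TCP connections instead of opening new ones each time.
session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=1,   # number of distinct hosts to keep pools for
    pool_maxsize=10,      # cap on concurrent keep-alive connections per host
)
session.mount("http://", adapter)

# All requests via this session share the pooled connections, e.g.:
# session.get("http://ingress.example.com/health")
```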