Linkerd cluster running slowly, and CPU usage is high!


#1

The following is my environment:

  1. Kubernetes 1.10 with Calico
  2. ingress-nginx as the load balancer; it sets a custom header as an identifier and forwards it to Linkerd
  3. linkerd-cni-1.4.2 as the internal Linkerd, with JVM_HEAP_MIN and JVM_HEAP_MAX set to 1024M
  4. namerd-1.4.2 running as a DaemonSet
  5. 3 nodes; the service uses nginx-test pods to serve a static page

However, when I test performance with the ab command, the response time is 300ms-900ms every time.
If I don't go through Linkerd, ab is very fast!
ab -r -k -n 10000 -c 1000 http://yh.com/index.html

Every node is an ESXi VM with 4 CPUs and 4 GB of memory!

The following image is the Zipkin trace:

My configuration:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config
  namespace: linkerd
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9990
    telemetry:
    - kind: io.l5d.prometheus
    - kind: io.l5d.zipkin
      host: 10.200.0.10
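    # recentRequests keeps an in-memory sample of traffic for the admin UI;
    # a sampleRate of 0.25 records one in four requests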
    - kind: io.l5d.recentRequests
      sampleRate: 0.25
    usage:
      enabled: false
    routers:
    - protocol: http
      httpAccessLog: access.log
      label: outgoing
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/namerd.linkerd.svc.cluster.local/4100
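        # the /$/inet namer resolves namerd.linkerd.svc.cluster.local:4100 via DNS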
        namespace: external
      servers:
      - port: 4140
        ip: 0.0.0.0
      service:
        responseClassifier:
          kind: io.l5d.http.retryableRead5XX

    - protocol: http
      label: incoming
      httpAccessLog: access.log
      identifier:
        kind: io.l5d.header.token
        header: my-header
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/namerd.linkerd.svc.cluster.local/4100
        namespace: internal
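        # localnode restricts routing to pod endpoints on the same node as this Linkerd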
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true
      servers:
      - port: 4141
        ip: 0.0.0.0

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d
  name: l5d
  namespace: linkerd
spec:
  template:
    metadata:
      labels:
        app: l5d
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      - name: l5d-config
        configMap:
          name: "l5d-config"
      - name: certificates
        secret:
          secretName: certificates
      #nodeSelector:
      #  func: linkerd
      containers:
      - name: l5d
        image: office.registry.cn/buoyantio/linkerd:1.4.2
        env:
        - name: JVM_HEAP_MIN
          value: 1024M
        - name: JVM_HEAP_MAX
          value: 1024M
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: outgoing
          containerPort: 4140
          hostPort: 4140
        - name: incoming
          containerPort: 4141
          hostPort: 4141
        - name: admin
          containerPort: 9990
          hostPort: 9990
        volumeMounts:
        - name: "l5d-config"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true
        - name: "certificates"
          mountPath: "/io.buoyant/linkerd/certs"
          readOnly: true

      - name: kubectl
        image: office.registry.cn/buoyantio/kubectl:v1.8.5
        args:
        - "proxy"
        - "-p"
        - "8001"
---
apiVersion: v1
kind: Service
metadata:
  name: l5d
  namespace: linkerd
spec:
  selector:
    app: l5d
  type: LoadBalancer
  ports:
  - name: outgoing
    port: 4140
  - name: incoming
    port: 4141
  - name: admin
    port: 9990


My Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      set $svc nginx-test;
  name: yh.com
  namespace: linkerd
spec:
  rules:
  - host: yh.com
    http:
      paths:
      - backend:
          serviceName: l5d
          servicePort: 4141
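
For the incoming router's io.l5d.header.token identifier to pick a destination, the my-header value has to carry the service name. The server-snippet above only sets the $svc variable; as a sketch (assuming the snippet is also allowed to add proxy headers, which is not shown in the original config), the annotation could be extended like this:

    nginx.ingress.kubernetes.io/server-snippet: |
      set $svc nginx-test;
      proxy_set_header my-header $svc;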

My default namespace Service:
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
  namespace: default
spec:
  clusterIP: None
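  # headless service: no cluster VIP; endpoints resolve directly to pod IPs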
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    nginx: test
  sessionAffinity: None
  type: ClusterIP

#2

Hi @caduke!

One of the problems with ab is that it uses HTTP/1.0 and re-establishes a connection for each request, which is very expensive for Linkerd. Linkerd performs much better when it can use persistent connections. Consider trying another load-testing tool, such as Slow Cooker.
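
For example, a minimal slow_cooker run against the same URL (the -qps and -concurrency values here are only illustrative):

slow_cooker -qps 100 -concurrency 10 http://yh.com/index.html

slow_cooker holds its connections open and reports latency percentiles at regular intervals, so it exercises the persistent-connection path that ab does not.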

I also see that you have the recentRequests sample rate set to 0.25, which can also have an impact on performance.
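
If you want to keep the recent-requests view, you could lower the sample rate instead of removing the telemeter entirely; a sketch (the 0.02 value is only an example, tune it to your traffic):

    telemetry:
    - kind: io.l5d.recentRequests
      sampleRate: 0.02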