Routes in admin UI are blinking on and off (v1.3.3)

Issue Type:

  • [X] Bug report
  • [ ] Feature request

What happened:
After deploying the hello world example (https://github.com/linkerd/linkerd-examples/blob/master/k8s-daemonset/k8s/hello-world.yml) and curling the hello service from within a hello pod, I noticed the attached behavior on the admin monitor screen. The hello and world routes blink on and off every time the UI polls the metrics.json endpoint. In addition, the metrics displayed are often negative.
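
For reference, the request was along these lines (a sketch; the NODE_IP variable is a stand-in for however the pod discovers its node's address, and 4140 is the outgoing router port from the servicemesh example):

```
# From inside a hello pod, proxy the request through the node-local
# linkerd; 4140 is the outgoing HTTP router port in the example config.
http_proxy=$NODE_IP:4140 curl -s http://hello
```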

What you expected to happen:

The routes should display continuously, and the metrics should never be negative.

How to reproduce it (as minimally and precisely as possible):

Configmap, daemonset, and service are almost identical to https://github.com/linkerd/linkerd-examples/blob/master/k8s-daemonset/k8s/servicemesh.yml with the following exceptions:

  • CNI/hostNetwork=true is enabled in both the configmap and the daemonset
  • only the admin, http, and grpc ports are exposed from the service
  • we’ve added an ingress for the admin port
  • we have a Prometheus operator scraping /admin/metrics/prometheus (see Environment for more details)

Anything else we need to know?:

Environment:

  • linkerd/namerd version, config files: linkerd 1.3.3
  • Platform, version, and config files (Kubernetes, DC/OS, etc): Kubernetes 1.7.10
  • Cloud provider or hardware configuration: AWS EC2
  • config based on https://github.com/linkerd/linkerd-examples/blob/master/k8s-daemonset/k8s/servicemesh.yml with the following differences:
  • CNI enabled in the daemonset with dnsPolicy: ClusterFirstWithHostNet and hostNetwork: true (see the pod-spec excerpt below)
  • ingress enabled to the admin port
  • NodePort service exposing only certain ports
  • Prometheus operator scraping /admin/metrics/prometheus
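
The daemonset networking change amounts to these pod-spec fields (an excerpt; everything else follows the servicemesh example):

```
# Daemonset pod spec excerpt: run on the host network so linkerd is
# reachable on each node's IP, and keep cluster DNS resolution working.
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
```
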
```
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: linkerd
  namespace: linkerd
  annotations:
    kubernetes.io/ingress.class: "linkerd"
spec:
  rules:
  - host: linkerd.uacf.io
    http:
      paths:
      - backend:
          serviceName: linkerd
          servicePort: 9990
```


```
kind: Service
apiVersion: v1
metadata:
  name: linkerd
  namespace: linkerd
  labels:
    app: l5d
spec:
  selector:
    app: l5d
  ports:
  - name: admin
    port: 9990
    nodePort: 31090
  - name: grpc
    port: 8080
    nodePort: 31080
  - name: http
    port: 80
    nodePort: 31081
  type: NodePort
```

```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: linkerd
  namespace: monitoring
  labels:
    k8s-app: linkerd
    app: linkerd
spec:
  selector:
    matchLabels:
      app: linkerd
  namespaceSelector:
    matchNames:
    - linkerd
  endpoints:
    - targetPort: 9990
      path: /admin/metrics/prometheus
```

Are you perhaps load balancing over linkerd admin endpoints?

When you access a linkerd’s admin UI on HOST:9990, you see metrics for a single linkerd instance. If HOST happens to be a load-balanced set of linkerd instances, you may be seeing stats from a different linkerd instance on each poll.

If this is the case, you may want to use something like linkerd-viz to aggregate metrics across linkerd instances.
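
One quick way to check (a sketch; assumes the ingress host above and the admin's metrics.json endpoint that the UI polls):

```
# Poll the load-balanced admin endpoint twice in quick succession; large
# diffs, or counters going backwards, suggest different instances answered.
curl -s http://linkerd.uacf.io/admin/metrics.json > /tmp/poll1.json
curl -s http://linkerd.uacf.io/admin/metrics.json > /tmp/poll2.json
diff /tmp/poll1.json /tmp/poll2.json | head
```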

Thanks Oliver, this sounds like it may well be the case.

Do I understand correctly that linkerd-viz contains its own Prometheus? Could or should it be integrated with an existing cluster’s Prometheus instance, or is it better kept separate?

In a production setting, what role does linkerd-viz play versus the linkerd admin UI?

linkerd-viz is basically a grafana/prometheus configuration that Just Works with clustered Linkerds out-of-the-box to give you top-line metrics for all services Linkerd handles. If you have an existing prometheus cluster, it should be pretty easy to crib from linkerd-viz’s configurations to get the same behavior.
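
For example, a scrape job along these lines (a sketch modeled on linkerd-viz’s per-node discovery; the relabeling details are assumptions, while the port and path match the configs above):

```
# Discover nodes and scrape the daemonset linkerd on each node at :9990;
# this works here because the daemonset runs with hostNetwork: true.
- job_name: linkerd
  metrics_path: /admin/metrics/prometheus
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.+):\d+'
    replacement: '${1}:9990'
    target_label: __address__
```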

The admin UI is really just intended for spot-checking individual instances.

Thanks for the clarification, Oliver! I’ll give this a shot.
