Linkerd not stable

Hello, we are struggling with linkerd & namerd, which from time to time stop working. We run the linkerd:1.3.2 and namerd:1.3.2 images on Kubernetes v1.8.3.

Today, when we tried to access our application through an ingress resource, linkerd returned:

Unknown destination: Request("GET /", from /192.168.190.99:50022) / no ingress rule matches

When I looked into the logs, there was a repeated exception in the namerd pod:

W 1207 07:19:45.082 UTC THREAD190 TraceId:1543e6dc18c28875: k8s ns k2-development service k2-development endpoints resource does not exist, assuming it has yet to be created
E 1207 07:19:46.032 UTC THREAD23 TraceId:85a2fb6e02d91256: resolving addr /#/io.l5d.k8s.http/k2-development/http/orchestration
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources
        at com.twitter.finagle.mux.ServerDispatcher.process(Server.scala:275)
        at com.twitter.finagle.mux.ServerDispatcher.$anonfun$loop$2(Server.scala:292)
        at com.twitter.finagle.mux.ServerDispatcher.$anonfun$loop$2$adapted(Server.scala:290)
        at com.twitter.util.Future$.$anonfun$each$1(Future.scala:1343)
        at com.twitter.util.Future.$anonfun$flatMap$1(Future.scala:1740)
        at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:228)
        at com.twitter.util.Promise$Transformer.k(Promise.scala:228)
        at com.twitter.util.Promise$Transformer.apply(Promise.scala:239)
        at com.twitter.util.Promise$Transformer.apply(Promise.scala:220)
        at com.twitter.util.Promise$$anon$7.run(Promise.scala:547)
        at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:198)
        at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:157)
        at com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:274)
        at com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:109)
        at com.twitter.util.Promise.runq(Promise.scala:522)
        at com.twitter.util.Promise.updateIfEmpty(Promise.scala:887)
        at com.twitter.util.Promise.update(Promise.scala:859)
        at com.twitter.util.Promise.setValue(Promise.scala:835)
        at com.twitter.concurrent.AsyncQueue.offer(AsyncQueue.scala:122)
        at com.twitter.finagle.netty4.transport.ChannelTransport$$anon$1.channelRead(ChannelTransport.scala:183)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at com.twitter.finagle.netty4.channel.ChannelRequestStatsHandler.channelRead(ChannelRequestStatsHandler.scala:41)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at com.twitter.finagle.netty4.codec.BufCodec$.channelRead(BufCodec.scala:69)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at com.twitter.finagle.netty4.channel.ChannelStatsHandler.channelRead(ChannelStatsHandler.scala:106)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1342)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:934)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
        at java.lang.Thread.run(Thread.java:748)

And the linkerd log showed only this:

I 1207 07:20:52.920 UTC THREAD47 TraceId:b771302ec4b24339: no ingress rule found for request k2 /
I 1207 07:20:52.920 UTC THREAD47 TraceId:b771302ec4b24339: no ingress rule found for request k2 /
W 1207 07:20:52.923 UTC THREAD47: Exception propagated to the default monitor (upstream address: /192.168.190.99:50022, downstream address: n/a, label: 0.0.0.0/80).
io.buoyant.router.RoutingFactory$UnknownDst: Unknown destination: Request("GET /", from /192.168.190.99:50022) / no ingress rule matches

All our pods are running correctly, and deleting and recreating all linkerd & namerd pods solved the issue for now. But this is not the first time it has occurred…

Can you please advise how to debug this behavior? Should I turn on some debug logging, or something else?
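The only knob I have found so far is the global log level flag. If that is the right approach, I could add it to the linkerd container args in the DaemonSet below (just a sketch; I have not verified this is the correct place for it):

        args:
        - -log.level=DEBUG
        - /io.buoyant/linkerd/config/config.yaml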

Thank you.

Hi @zsojma,

Based on that error message, it sounds like you don’t have any ingress rules that match that request. Could you share the contents of your ingress resource?

Hi @Alex, thanks for the reply. I would like to point out that a restart of linkerd solved the issue (for now), so I think we have everything set up correctly… We have several ingress resources, none of which was working at that time. Here is an example of one:

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: frontend
  namespace: k2-production
  labels:
    app: frontend
  annotations:
    kubernetes.io/ingress.class: "linkerd"
spec:
  rules:
  - host: k2
    http:
      paths:
      - backend:
          serviceName: frontend
          servicePort: http
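
(For reference, when routing is healthy, a request like the one below, sent to the ingress router on port 80 of any node running the l5d DaemonSet, matches this rule and is forwarded to the frontend service; <node-ip> is a placeholder:)

curl -v -H "Host: k2" http://<node-ip>:80/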

Here is linkerd.yaml:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config
  namespace: l5d-system
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 9990

    telemetry:
    - kind: io.l5d.prometheus
    - kind: io.l5d.recentRequests
      sampleRate: 0.25
      
    usage:
      enabled: false

    routers:
    - label: http-outgoing
      protocol: http
      servers:
      - port: 4140
        ip: 0.0.0.0
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/localhost/30150
        namespace: http-outgoing

    - label: http-incoming
      protocol: http
      servers:
      - port: 4141
        ip: 0.0.0.0
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/localhost/30150
        namespace: http-incoming
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true

    - label: grpc-outgoing
      protocol: h2
      experimental: true
      servers:
      - port: 4340
        ip: 0.0.0.0
      identifier:
        kind: io.l5d.header.token
        header: host-path
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/localhost/30150
        namespace: grpc-outgoing

    - label: grpc-incoming
      protocol: h2
      experimental: true
      servers:
      - port: 4341
        ip: 0.0.0.0
      identifier:
        kind: io.l5d.header.token
        header: host-path
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/localhost/30150
        namespace: grpc-incoming
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true

    - label: http-ingress
      protocol: http
      servers:
      - port: 80
        ip: 0.0.0.0
      identifier:
        kind: io.l5d.ingress
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/localhost/30150
        namespace: http-ingress

    - label: grpc-ingress
      protocol: h2
      experimental: true
      servers:
      - port: 8180
        ip: 0.0.0.0
      identifier:
        kind: io.l5d.ingress
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/localhost/30150
        namespace: grpc-ingress
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d
  name: l5d
  namespace: l5d-system
spec:
  template:
    metadata:
      labels:
        app: l5d
    spec:
      hostNetwork: true
      volumes:
      - name: l5d-config
        configMap:
          name: "l5d-config"
      containers:
      - name: l5d
        image: buoyantio/linkerd:1.3.2
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: http-outgoing
          containerPort: 4140
          hostPort: 4140
        - name: http-incoming
          containerPort: 4141
        - name: grpc-outgoing
          containerPort: 4340
          hostPort: 4340
        - name: grpc-incoming
          containerPort: 4341
        - name: http-ingress
          containerPort: 80
        - name: grpc-ingress
          containerPort: 8180
        volumeMounts:
        - name: "l5d-config"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true

      - name: kubectl
        image: buoyantio/kubectl:v1.6.2
        args:
        - "proxy"
        - "-p"
        - "8001"
---
apiVersion: v1
kind: Service
metadata:
  name: l5d
  namespace: l5d-system
spec:
  selector:
    app: l5d
  type: LoadBalancer
  ports:
  - name: http-outgoing
    port: 4140
  - name: http-incoming
    port: 4141
  - name: grpc-outgoing
    port: 4340
  - name: grpc-incoming
    port: 4341
  - name: http-ingress
    port: 80
  - name: grpc-ingress
    port: 8180

And here is namerd.yaml:

---
kind: CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1beta1
metadata:
  name: dtabs.l5d.io
  namespace: l5d-system
spec:
  scope: Namespaced
  group: l5d.io
  version: v1alpha1
  names:
    kind: DTab
    plural: dtabs
    singular: dtab
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: namerd-config
  namespace: l5d-system
data:
  config.yml: |-
    admin:
      ip: 0.0.0.0
      port: 9991

    namers:
    - kind: io.l5d.k8s
    - kind: io.l5d.k8s
      prefix: /io.l5d.k8s.http
      transformers:
      - kind: io.l5d.k8s.daemonset
        namespace: l5d-system
        port: http-incoming
        service: l5d
        hostNetwork: true
    - kind: io.l5d.k8s
      prefix: /io.l5d.k8s.grpc
      experimental: true
      transformers:
      - kind: io.l5d.k8s.daemonset
        namespace: l5d-system
        port: grpc-incoming
        service: l5d
        hostNetwork: true
    - kind: io.l5d.rewrite
      prefix: /portNsSvcToK8s
      pattern: "/{port}/{ns}/{svc}"
      name: "/k8s/{ns}/{port}/{svc}"
    - kind: io.l5d.rewrite
      prefix: /ingRmvPort
      pattern: "/{ns}/{port}/{svc}"
      name: "/host/{ns}/{svc}"

    storage:
      kind: io.l5d.k8s
      host: localhost
      port: 8001
      namespace: l5d-system

    interfaces:
    - kind: io.l5d.thriftNameInterpreter
      ip: 0.0.0.0
      port: 4100
    - kind: io.l5d.httpController
      ip: 0.0.0.0
      port: 4180
---
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  name: namerd
  namespace: l5d-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: namerd
  template:
    metadata:
      labels:
        app: namerd
    spec:
      dnsPolicy: ClusterFirst
      volumes:
      - name: namerd-config
        configMap:
          name: namerd-config
      containers:
      - name: namerd
        image: buoyantio/namerd:1.3.2
        args:
        - /io.buoyant/namerd/config/config.yml
        ports:
        - name: thrift
          containerPort: 4100
        - name: http
          containerPort: 4180
        - name: admin
          containerPort: 9991
        volumeMounts:
        - name: "namerd-config"
          mountPath: "/io.buoyant/namerd/config"
          readOnly: true
      - name: kubectl
        image: buoyantio/kubectl:v1.6.2
        args:
        - "proxy"
        - "-p"
        - "8001"
---
apiVersion: v1
kind: Service
metadata:
  name: namerd
  namespace: l5d-system
spec:
  selector:
    app: namerd
  type: LoadBalancer
  ports:
  - name: thrift
    port: 4100
    nodePort: 30150
  - name: http
    port: 4180
    nodePort: 30151
  - name: admin
    port: 9991
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: namerctl-script
  namespace: l5d-system
data:
  createNs.sh: |-
    #!/bin/sh

    set -e

    if namerctl dtab get http-outgoing > /dev/null 2>&1; then
      echo "http-outgoing namespace already exists"
    else
      echo "
      /ph  => /$/io.buoyant.rinet ;
      /svc => /ph/80 ;
      /svc => /$/io.buoyant.porthostPfx/ph ;
      /k8s => /#/io.l5d.k8s.http ;
      /srv => /#/portNsSvcToK8s/http ;
      /host => /srv/k2-development ;
      /host => /srv ;
      /tmp => /srv ;
      /svc => /$/io.buoyant.http.domainToPathPfx/host ;
      " | namerctl dtab create http-outgoing -
    fi

    if namerctl dtab get http-incoming > /dev/null 2>&1; then
      echo "http-incoming namespace already exists"
    else
      echo "
      /k8s => /#/io.l5d.k8s ;
      /srv => /#/portNsSvcToK8s/http ;
      /host => /srv/k2-development ;
      /host => /srv ;
      /tmp => /srv ;
      /svc => /$/io.buoyant.http.domainToPathPfx/host ;
      " | namerctl dtab create http-incoming -
    fi

    if namerctl dtab get grpc-outgoing > /dev/null 2>&1; then
      echo "grpc-outgoing namespace already exists"
    else
      echo "
      /k8s => /#/io.l5d.k8s.grpc ;
      /srv => /#/portNsSvcToK8s/grpc ;
      /host => /srv/k2-development ;
      /host => /srv ;
      /tmp => /srv ;
      /svc => /$/io.buoyant.http.domainToPathPfx/host ;
      " | namerctl dtab create grpc-outgoing -
    fi

    if namerctl dtab get grpc-incoming > /dev/null 2>&1; then
      echo "grpc-incoming namespace already exists"
    else
      echo "
      /k8s => /#/io.l5d.k8s ;
      /srv => /#/portNsSvcToK8s/grpc ;
      /host => /srv/k2-development ;
      /host => /srv ;
      /tmp => /srv ;
      /svc => /$/io.buoyant.http.domainToPathPfx/host ;
      " | namerctl dtab create grpc-incoming -
    fi

    if namerctl dtab get http-ingress > /dev/null 2>&1; then
      echo "http-ingress namespace already exists"
    else
      echo "
      /k8s => /#/io.l5d.k8s ;
      /srv => /#/portNsSvcToK8s/http ;
      /host => /srv ;
      /tmp => /srv ;
      /svc => /#/ingRmvPort ;
      " | namerctl dtab create http-ingress -
    fi

    if namerctl dtab get grpc-ingress > /dev/null 2>&1; then
      echo "grpc-ingress namespace already exists"
    else
      echo "
      /k8s => /#/io.l5d.k8s ;
      /srv => /#/portNsSvcToK8s/grpc ;
      /host => /srv ;
      /tmp => /srv ;
      /svc => /#/ingRmvPort ;
      " | namerctl dtab create grpc-ingress -
    fi
---
kind: Job
apiVersion: batch/v1
metadata:
  name: namerctl
  namespace: l5d-system
spec:
  template:
    metadata:
      name: namerctl
    spec:
      volumes:
      - name: namerctl-script
        configMap:
          name: namerctl-script
          defaultMode: 0755
      containers:
      - name: namerctl
        image: linkerd/namerctl:0.8.6
        env:
        - name: NAMERCTL_BASE_URL
          value: http://namerd.l5d-system.svc.cluster.local:4180
        command:
        - "/namerctl/createNs.sh"
        volumeMounts:
        - name: "namerctl-script"
          mountPath: "/namerctl"
          readOnly: true
      restartPolicy: OnFailure

It is configured for canary deployments.

Do you have any recommendations on how to debug this when the issue occurs again? Unfortunately it happens irregularly, and I don’t know how to reproduce it right now.
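
For what it's worth, when things are healthy I can inspect the dtabs from any node through namerd's httpController NodePort (30151 in namerd.yaml above), assuming I am reading the io.l5d.httpController API correctly:

curl http://localhost:30151/api/1/dtabs
curl http://localhost:30151/api/1/dtabs/http-ingress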

Thank you.

Ah, interesting. This sounds similar to https://github.com/linkerd/linkerd/issues/1730 in that it seems like Linkerd is not getting the most up-to-date data from Kubernetes. We are actively attempting to reproduce this issue so that we can debug it, but so far we haven’t been able to trigger it.

We’re continuing our attempts to reproduce, but in the meantime, please let us know if you discover a way to deliberately trigger the issue or have any clues about the conditions where it occurs.
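
In the meantime, one thing that may help narrow it down next time is to ask Namerd directly how it binds the failing name and compare that with what the API server reports. Roughly something like this (a sketch; substitute the dtab namespace and the /svc path from the failing request, and I believe the io.l5d.httpController exposes a bind endpoint of this shape):

curl "http://namerd.l5d-system.svc.cluster.local:4180/api/1/bind/http-ingress?path=/svc/k2-production/http/frontend"
kubectl -n k2-production get endpoints frontend -o yaml

If Namerd's bound addresses and the endpoints object disagree, that would point at a stale watch rather than a dtab problem.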

Thank you @Alex. If I come up with anything, I will let you know.

One more question: our linkerd log is filled with messages like the following. Is that normal behavior?

I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.044 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: no ingress rule found for request auditmanagement.k2.alz.lcl /api/analytics
I 1207 15:02:35.045 UTC THREAD66 TraceId:caecbf8556596726: k8s found rule matching auditmanagement.k2.alz.lcl /api/analytics: IngressPath(Some(auditmanagement.k2.alz.lcl),None,k2-development,auditmanagement,http)
I 1207 15:02:35.055 UTC THREAD68: Response with a status code of 204 must not have a Content-Length header field thus the field has been removed. Content-Length: 0

Thanks,
Zdenek

Hi @Alex, a new error has occurred. While we were trying to access our application, linkerd returned the error: exceeded 10.seconds to unspecified while dyn binding /svc/k2-development/http/frontend. Remote Info: Not Available. From what I observed, linkerd was not able to reach namerd (not even from the linkerd admin page), and the namerd log showed the following, ending in OutOfMemoryErrors:

E 1208 14:45:30.438 UTC THREAD66 TraceId:2cf69a311aa2a41a: binding name /svc/rightsmanagement.k2-development
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources
        at com.twitter.finagle.mux.ServerDispatcher.process(Server.scala:275)
        at com.twitter.finagle.mux.ServerDispatcher.$anonfun$loop$2(Server.scala:292)
        at com.twitter.finagle.mux.ServerDispatcher.$anonfun$loop$2$adapted(Server.scala:290)
        at com.twitter.util.Future$.$anonfun$each$1(Future.scala:1343)
        at com.twitter.util.Future.$anonfun$flatMap$1(Future.scala:1740)
        at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:228)
        at com.twitter.util.Promise$Transformer.k(Promise.scala:228)
        at com.twitter.util.Promise$Transformer.apply(Promise.scala:239)
        at com.twitter.util.Promise$Transformer.apply(Promise.scala:220)
        at com.twitter.util.Promise$$anon$7.run(Promise.scala:547)
        at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:198)
        at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:157)
        at com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:274)
        at com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:109)
        at com.twitter.util.Promise.runq(Promise.scala:522)
        at com.twitter.util.Promise.updateIfEmpty(Promise.scala:887)
        at com.twitter.util.Promise.update(Promise.scala:859)
        at com.twitter.util.Promise.setValue(Promise.scala:835)
        at com.twitter.concurrent.AsyncQueue.offer(AsyncQueue.scala:122)
        at com.twitter.finagle.netty4.transport.ChannelTransport$$anon$1.channelRead(ChannelTransport.scala:183)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at com.twitter.finagle.netty4.channel.ChannelRequestStatsHandler.channelRead(ChannelRequestStatsHandler.scala:41)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at com.twitter.finagle.netty4.codec.BufCodec$.channelRead(BufCodec.scala:69)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at com.twitter.finagle.netty4.channel.ChannelStatsHandler.channelRead(ChannelStatsHandler.scala:106)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1342)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:934)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at com.twitter.finagle.util.BlockingTimeTrackingThreadFactory$$anon$1.run(BlockingTimeTrackingThreadFactory.scala:23)
        at java.lang.Thread.run(Thread.java:748)

Exception in thread "Netty 4 Timer-1" java.lang.OutOfMemoryError: Java heap space
WARN 1208 14:58:08.879 UTC finagle/netty4-41: An exception 'java.lang.NoClassDefFoundError: Could not initialize class com.twitter.util.RootMonitor$' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
WARN 1208 22:30:28.975 UTC finagle/netty4-13: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
WARN 1208 22:30:28.975 UTC finagle/netty4-8: An exception 'java.lang.NoClassDefFoundError: Could not initialize class com.twitter.util.RootMonitor$' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
WARN 1210 15:15:10.223 UTC finagle/netty4/boss-1: An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.OutOfMemoryError: Java heap space
SLF4J: Failed toString() invocation on an object of type [java.lang.OutOfMemoryError]
java.lang.OutOfMemoryError: Java heap space
Exception in thread "finagle/netty4-28" java.lang.OutOfMemoryError: Java heap space
WARN 1210 17:01:25.887 UTC finagle/netty4-7: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
WARN 1210 17:02:53.582 UTC finagle/netty4-9: An exception '[FAILED toString()]' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
WARN 1210 17:02:53.582 UTC finagle/netty4-10: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
WARN 1210 17:02:53.582 UTC finagle/netty4-27: An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
ERROR 1210 17:02:31.825 UTC finagle/netty4/boss-1: Failed to submit a listener notification task. Event loop shut down?
java.lang.OutOfMemoryError: Java heap space
Exception in thread "finagle/netty4/boss-1" java.lang.OutOfMemoryError: Java heap space
WARN 1211 00:57:29.565 UTC finagle/netty4-24: An exception 'java.lang.NoClassDefFoundError: Could not initialize class com.twitter.util.RootMonitor$' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.NoClassDefFoundError: Could not initialize class com.twitter.util.RootMonitor$

After a restart of namerd, it started working again. Can you please help? Thank you!

Sorry for the late response, @zsojma. We’ve fixed a number of Namerd memory leaks recently. Can you confirm which version of Namerd you are running and, if you aren’t already, try the latest version? If you still experience memory leaks or out-of-memory errors, the most useful information you can provide is a heap dump of Namerd taken shortly before the OOM; we can attempt to diagnose the issue from there.
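
For example, something along these lines should work, assuming a JDK with jmap is available inside the namerd container and the JVM is PID 1 (the pod name is a placeholder):

kubectl -n l5d-system exec <namerd-pod> -c namerd -- jmap -dump:live,format=b,file=/tmp/namerd.hprof 1
kubectl -n l5d-system cp <namerd-pod>:/tmp/namerd.hprof ./namerd.hprof -c namerd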

+1, suffering from the same problem.

@zsojma have you found a solution? Thanks.

The binding errors show up every 10 minutes:

E 0225 07:10:24.586 UTC THREAD53 TraceId:72566a8976d2162d: binding name /svc/app-1.csdn.net
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:10:24.595 UTC THREAD53 TraceId:efbbdff55b22b39e: binding name /svc/app-1
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:10:24.603 UTC THREAD53 TraceId:6a5e2dc481aafe4c: binding name /svc/app-1.csdn.net
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:10:24.832 UTC THREAD53 TraceId:342318f8311e970b: resolving addr /#/io.l5d.k8s/default/http/app-1-v0-0-6
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:10:39.469 UTC THREAD45 TraceId:ae19dd73ec6e6641: binding name /svc/app-1
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:10:39.474 UTC THREAD45 TraceId:8a9cbd78ffd20749: binding name /svc/app-1.csdn.net
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:20:24.605 UTC THREAD53 TraceId:54c2d2317584c3a5: binding name /svc/app-1.csdn.net
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:20:39.488 UTC THREAD45 TraceId:5100dc22d065523f: binding name /svc/app-1.csdn.net
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:20:39.493 UTC THREAD45 TraceId:9415e1dba317845e: binding name /svc/app-1.csdn.net
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

E 0225 07:20:39.720 UTC THREAD45 TraceId:892689438f547f7e: resolving addr /#/io.l5d.k8s/default/http/app-1-v0-0-6
com.twitter.finagle.mux.ClientDiscardedRequestException: Failure(Released, flags=0x02) with NoSources

@zhangpin04 what version of Linkerd and Namerd are you running?

Linkerd and Namerd version: 1.3.4, k8s version: 1.8.3. @william

@zhangpin04 Unfortunately, I don’t have a solution. We are now running linkerd & namerd 1.3.5 and it has been stable so far. In the past, I noticed that one of our master nodes (we have a multi-master Kubernetes cluster) returned stale information about k8s resources (old pods with old IP addresses, etc.). I think that was the main reason why linkerd didn’t work correctly from time to time, but I have no idea whether that is the case here.
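
In case it helps, comparing what each API server returns directly should expose that kind of discrepancy (the master addresses are placeholders for our setup; auth flags omitted):

kubectl --server=https://<master-1>:6443 -n k2-development get endpoints frontend -o yaml
kubectl --server=https://<master-2>:6443 -n k2-development get endpoints frontend -o yaml

A stale master shows old pod IPs in the endpoints object.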

OK… thanks anyway.