io.l5d.mesh issue

I am using Linkerd/Namerd 1.1.3 in Kubernetes 1.7.2. I have a very simple setup with the following configuration:

Namerd:

namers:
- kind: io.l5d.k8s
  prefix: /http-kube-local-namer
  host: localhost
  port: 8001
  transformers:
  - kind: io.l5d.k8s.daemonset
    namespace: system
    port: http-mesh
    service: linkerd
    hostNetwork: true

- kind: io.l5d.k8s
  prefix: /http-kube-mesh-namer
  host: localhost
  port: 8001

Linkerd:

routers:
- protocol: http
  label: http
  dstPrefix: /http-kube-local
  interpreter:
    kind: io.l5d.mesh
    experimental: true
    dst: /#/io.l5d.k8s/system/mesh/namerd
    tls:
      commonName: linkerd.com
      trustCerts:
      - /com/linkerd/certs/cacertificate.pem
    root: /httproutes
  servers:
  - port: 4140
    ip: 0.0.0.0
  service:
    responseClassifier:
      kind: io.l5d.http.retryableRead5XX
  client:
    hostConnectionPool:
      idleTimeMs: 1800000
    tls:
      commonName: linkerd.com
      trustCerts:
      - /com/linkerd/certs/cacertificate.pem

- protocol: http
  label: http-mesh
  dstPrefix: /http-kube-mesh
  interpreter:
    kind: io.l5d.mesh
    experimental: true
    dst: /#/io.l5d.k8s/system/mesh/namerd
    tls:
      commonName: linkerd.com
      trustCerts:
      - /com/linkerd/certs/cacertificate.pem
    root: /httproutes
    transformers:
    - kind: io.l5d.k8s.localnode
      hostNetwork: true
  servers:
  - port: 4141
    ip: 0.0.0.0
    tls:
      certPath: /com/linkerd/certs/certificate.pem
      keyPath: /com/linkerd/certs/key.pem
  client:
    hostConnectionPool:
      idleTimeMs: 1800000

When I try to curl a service running in the cluster, Namerd appears to find the service without trouble:

D 1004 15:11:53.041 UTC THREAD44: k8s lookup: /service/api/simple-httpd /service/api/simple-httpd
D 1004 15:11:53.041 UTC THREAD44: k8s ns service service simple-httpd found
D 1004 15:11:53.041 UTC THREAD44: k8s ns service service simple-httpd port api found + /

But Linkerd reports that no hosts are available:

I 1004 15:11:53.195 UTC THREAD45 TraceId:37f7724e91d30e24: %/io.l5d.k8s.daemonset/system/http-mesh/linkerd/#/http-kube-local-namer/service/api/simple-httpd: name resolution is negative (local dtab: Dtab())
E 1004 15:11:53.210 UTC THREAD45 TraceId:37f7724e91d30e24: service failure: com.twitter.finagle.NoBrokersAvailableException: No hosts are available for /http-kube-local/simple-httpd, Dtab.base=[], Dtab.local=[]. Remote Info: Not Available

When I replace io.l5d.mesh, mesh, and root with io.l5d.namerd.http, http, and namespace, respectively (and remove the leading / from the namespace name), the curl request succeeds.
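
For reference, the substitution described above would give an interpreter block roughly like the following. This is a sketch reconstructed from that description, not a copy of the working file. It assumes the Namerd HTTP port is named http in the Kubernetes service and that the experimental flag and tls block carry over unchanged from the io.l5d.mesh version.

```yaml
interpreter:
  kind: io.l5d.namerd.http
  experimental: true
  # Assumption: the namerd service exposes a port named "http"
  dst: /#/io.l5d.k8s/system/http/namerd
  # Assumption: tls settings carried over as-is from the mesh config
  tls:
    commonName: linkerd.com
    trustCerts:
    - /com/linkerd/certs/cacertificate.pem
  # "namespace" takes the place of "root", without the leading slash
  namespace: httproutes
```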

My dtab looks like the following:

/http-kube-local/simple-httpd  => /#/http-kube-local-namer/service/api/simple-httpd ;
/http-kube-mesh/simple-httpd   => /#/http-kube-mesh-namer/service/api/simple-httpd ;

Naturally, I suspected a misnamed port or something similar, which is why I tried the io.l5d.namerd.http configuration. Is there something special about the io.l5d.mesh configuration that I am missing?

Hi @robbfoster!

Very interesting. Could you look at the admin UIs for both Linkerd and Namerd (while using io.l5d.mesh) and try resolving /http-kube-local/simple-httpd in both? Does it work in either of them? Do the trees look the same? I’m wondering if it’s similar to the thread "Linkerd + Namerd CNI setup help".

Yeah, I should have added that. With io.l5d.mesh configured, everything resolves fine in Namerd:

/http-kube-local/simple-httpd
Namer Match /#/http-kube-local-namer/service/api/simple-httpd /http-kube-local/simple-httpd => /#/http-kube-local-namer/service/api/simple-httpd
Bound Path 192.168.57.17:4141 [/%/io.l5d.k8s.daemonset/ck-system/http-mesh/linkerd/#/http-kube-local-namer/service/api/simple-httpd] (residual: /)

/http-kube-mesh/simple-httpd
/#/http-kube-mesh-namer/service/api/simple-httpd /http-kube-mesh/simple-httpd => /#/http-kube-mesh-namer/service/api/simple-httpd
172.16.198.22:80 [/#/http-kube-mesh-namer/service/api/simple-httpd] (residual: /)

With Linkerd, everything looks similar, but not quite the same:

/http-kube-local/simple-httpd
/#/http-kube-local-namer/service/api/simple-httpd /http-kube-local/simple-httpd => /#/http-kube-local-namer/service/api/simple-httpd
[/%/io.l5d.k8s.daemonset/system/http-mesh/linkerd/#/http-kube-local-namer/service/api/simple-httpd] (residual: /)

Yeah, the Linkerd + CNI issue looked very similar.

Any update on this issue? It is a blocker for our upgrade plans.

Thanks for the additional information. I’ve linked this Discourse thread in the tracking ticket: https://github.com/linkerd/linkerd/issues/1660

The issue has been assigned and is prioritized after some other k8s API bugs. In general, CNI issues are difficult for us to track down, as the team doesn’t currently have access to a CNI setup. We will update the ticket with anything we find. Thanks.