Linkerd with traffic shifting fails intermittently when requesting via POST

Hey there - I'm currently evaluating Linkerd.

I have a small demo setup with two services (sample-prod and sample-canary). I've set up a 50/50 traffic shifter for both.

What works great so far:

  • calling the services directly via their names (http://service-prod for example) with POST and GET
  • calling the 50/50 via GET

What doesn't work so well:

  • calling the 50/50 via POST

In the last case, I get this error from time to time:

    No hosts are available for /svc/frontend,
    Dtab.base=[/srv=>/#/io.l5d.k8s/default/http;/host=>/srv;/svc=>/host;/host/frontend=>50.00/srv/go-sample-frontend-canary & 50.00/srv/go-sample-frontend-prod],
    Dtab.local=[]. Remote Info: Not Available

It happens on roughly every third to tenth request, and it's not limited to canary or prod; both tracks fail from time to time.
When I set the 50/50 to just one track (either all prod or all canary), everything works fine again.
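
For reference, the dtab behind the 50/50 split looks roughly like this (reconstructed from the error above; the weight * path form is how linkerd dtabs express weighted unions, so the exact formatting may differ slightly from what I actually have in namerd):

    /srv  => /#/io.l5d.k8s/default/http;
    /host => /srv;
    /svc  => /host;
    /host/frontend => 50 * /srv/go-sample-frontend-canary & 50 * /srv/go-sample-frontend-prod;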

I'm running on Google Cloud (GCE) with 4 nodes.

my linkerd.yml:

Would be awesome if someone could help me a little bit :slight_smile:

I first thought the problem was solved, but then I randomly killed pods, so that some
instances came back up on other nodes (in the working scenario, prod and canary were on the same node).

Once they were separated onto different nodes, the problem happened again.

best regards

This looks like it’s probably the same issue as here: Failures when there is a rule with a weighted union in namerd dtab

We definitely want to fix this so that traffic shifting works out of the box, but in the meantime there are some workarounds in the thread I linked.

It works perfectly now. The mistake I made in my config was that I did not set up an external router like this:

    # extra router that resolves names through the "external" dtab namespace in namerd
    - protocol: http
      label: external
      interpreter:
        kind: io.l5d.namerd
        dst: /$/inet/namerd.default.svc.cluster.local/4100
        namespace: external
      servers:
      # this router listens on 0.0.0.0:4142; that is the port which gets exposed externally
      - port: 4142
        ip: 0.0.0.0
        engine:
          kind: netty4
      client:
        engine:
          kind: netty4

Adding this, exposing it on a port (80 in my case), and configuring the traffic-shift dtab in that namespace worked perfectly.
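
In case it helps anyone else, here is a minimal sketch of how the external router port can be exposed, assuming a standard Kubernetes Service in front of the linkerd pods (the Service name, type, and the app: l5d selector are illustrative, not my exact manifest):

    apiVersion: v1
    kind: Service
    metadata:
      name: l5d-external
    spec:
      type: LoadBalancer          # or NodePort, depending on the cluster
      selector:
        app: l5d                  # must match the labels on the linkerd pods
      ports:
      - name: external
        port: 80                  # externally exposed port
        targetPort: 4142          # the "external" router's server port from the config above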