Traffic Shifting on Kubernetes not working as expected


#1

Hi,

I have set up Linkerd 1.4.5 on a Kubernetes cluster. Linkerd (l5d) is configured with an HTTP outgoing router (port 4140) and an HTTP incoming router (port 4141). I use namerd to configure the namers and the dtab. The outgoing router uses the “io.l5d.k8s.daemonset” transformer and the incoming router uses the “io.l5d.k8s.localnode” transformer:

- protocol: http
  label: outgoing
  interpreter:
    kind: io.l5d.namerd
    dst: /$/inet/namerd.default.svc.cluster.local/4100
    namespace: bluemarble-internal
    transformers:
    - kind: io.l5d.k8s.daemonset
      namespace: default
      port: incoming-intern
      service: l5d
  servers:
  - port: 4140
    ip: 0.0.0.0
- protocol: http
  label: incoming-internal
  interpreter:
    kind: io.l5d.namerd
    dst: /$/inet/namerd.default.svc.cluster.local/4100
    namespace: bluemarble-internal
    transformers:
    - kind: io.l5d.k8s.localnode
  servers:
  - port: 4141
    ip: 0.0.0.0

The Kubernetes cluster has 1 master and 5 minions.

I am trying to test a traffic shifting scenario. To start, I configured the dtab for version 1 of service-a (2 replicas) as follows:

/srv=>/#/io.l5d.k8s;
/domain=>/srv;
/domain/l5d-poc-dev/http/service-a-v1=>/srv/l5d-poc-dev/http/service-a-v1;
/host=>/$/io.buoyant.http.domainToPathPfx/domain;
/svc=>/host;

I then updated the dtab to shift 10% of the traffic to version 2 of service-a:

/srv=>/#/io.l5d.k8s;
/domain=>/srv;
/domain/l5d-poc-dev/http/service-a-v1=>9*/srv/l5d-poc-dev/http/service-a-v1 & 1*/srv/l5d-poc-dev/http/service-a-v2;
/host=>/$/io.buoyant.http.domainToPathPfx/domain;
/svc=>/host;
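
As a rough sanity check on the weights, here is a minimal sketch that assumes requests are split in proportion to the branch weights. `branch_shares` is a hypothetical helper written for illustration, not a Linkerd API; the weights and the request count of about 20 come from this post.

```python
from fractions import Fraction


def branch_shares(weights):
    """Return each branch's expected share of requests for a weighted union."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


# Weights from the updated dtab entry: 9 parts v1, 1 part v2.
shares = branch_shares({"service-a-v1": 9, "service-a-v2": 1})

# Expected number of requests reaching v2 out of about 20 test requests.
expected_v2 = shares["service-a-v2"] * 20

print(shares["service-a-v2"], expected_v2)
```

With so few requests the observed split can differ noticeably from the expected one, so a short test run mainly shows whether v2 receives any traffic at all.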

To test the scenario, I send about 20 requests to service-a via Linkerd. While the tests run, some of the requests fail with “Unable to route request!”.

After a bit of debugging, I found that the error occurs on l5d daemonset instances whose node does not host both versions of service-a. That is, if the weighted algorithm expects the request to be routed to service-a-v2 but only a service-a-v1 instance is running on that node, it returns this error, and vice versa.

For example, in the output below, the outgoing router selects the daemonset instance at “192.168.251.73:4141” to route the request to. But when the request reaches that instance, the incoming router applies the “io.l5d.k8s.localnode” transformer and tries to route the request to another IP / service instance that is not local to the identified daemonset instance.
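
The mismatch described above can be sketched roughly as follows. This is a toy model, not Linkerd code: the node names and pod placement are made up, and it assumes the incoming router makes its own weighted choice independently of the outgoing router.

```python
# Hypothetical pod placement, made up for illustration (not the real cluster).
PODS_BY_NODE = {
    "node-1": {"service-a-v1"},
    "node-2": {"service-a-v1"},
    "node-3": {"service-a-v2"},
}


def daemonset_targets(service):
    """Outgoing side: l5d instances on nodes that host a pod of `service`."""
    return sorted(n for n, pods in PODS_BY_NODE.items() if service in pods)


def localnode_resolve(node, service):
    """Incoming side: keep the service only if a pod of it runs on this node."""
    return service if service in PODS_BY_NODE[node] else None


def deliver(outgoing_choice, incoming_choice):
    """Map each candidate l5d node to the incoming result (None = unroutable)."""
    return {
        node: localnode_resolve(node, incoming_choice)
        for node in daemonset_targets(outgoing_choice)
    }


# Both routers pick v1: every candidate node can serve the request.
print(deliver("service-a-v1", "service-a-v1"))
# Outgoing picks v1 but incoming picks v2: no local v2 pod, so nothing routes.
print(deliver("service-a-v1", "service-a-v2"))
```

In this model a request only succeeds when both routers happen to pick the same version, which would match the intermittent errors I am seeing.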

Any help resolving this issue would be really appreciated.

--- Router: outgoing ---
request duration: 818 ms
service name: /svc/service-a-v1.http.l5d-poc-dev
client name: /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1
addresses: [192.168.251.73:4141, 192.168.39.109:4141]
selected address: 192.168.251.73:4141
dtab resolution:
  /svc/service-a-v1.http.l5d-poc-dev
  /host/service-a-v1.http.l5d-poc-dev (/svc=>/host)
  /domain/l5d-poc-dev/http/service-a-v1
  /domain/l5d-poc-dev/http/service-a-v1
  /srv/l5d-poc-dev/http/service-a-v1 (/domain/l5d-poc-dev/http/service-a-v1=>/srv/l5d-poc-dev/http/service-a-v2 & 9.00*/srv/l5d-poc-dev/http/service-a-v1)
  /#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (/srv=>/#/io.l5d.k8s)
  /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (DelegatingNameTreeTransformer$)

Unable to route request!

service name: /svc/service-a-v1.http.l5d-poc-dev
resolutions considered:
  /%/io.l5d.k8s.localnode/192.168.24.21/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v2 (pending)
  /%/io.l5d.k8s.localnode/192.168.24.21/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (neg)
  /%/io.l5d.k8s.localnode/192.168.24.21/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (neg)
dtab:
  /srv => /#/io.l5d.k8s
  /domain => /srv
  /domain/l5d-poc-dev/http/service-a-v1 => /srv/l5d-poc-dev/http/service-a-v2 & 9.00*/srv/l5d-poc-dev/http/service-a-v1
  /host => /$/io.buoyant.http.domainToPathPfx/domain
  /svc => /host
base dtab:

override dtab:


#2

Hi :wave: Thanks for all the detail in this thread. It looks like the incoming router is applying the dtab again after the outgoing router has already determined which service the request should go to. You can fix this by taking advantage of the l5d-dst-client header that the outgoing router adds to all requests. It carries the client name, which, based on what you posted above, is e.g. /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1

You would need to change your incoming router to use a header identifier, and then add a dtab that effectively strips the daemonset info from the client name.

Your incoming router should look like this:

- protocol: http
  label: incoming
  identifier:
    kind: io.l5d.header
    header: l5d-dst-client
  dtab: /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d => /

Hopefully that fixes the issue you are running into.


#3

@Dennis thanks for the reply.
After making the changes, I am facing a different issue when routing a normal request. I feel I might be missing something.

I forgot to mention that I use “namerd” for the dtab configuration (namer: io.l5d.k8s).
As you suggested, I added the dtab entry “/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d => /” to the dtab configuration in “namerd”.
Here is the dtab configuration for the incoming router after the changes:

/srv=>/#/io.l5d.k8s;
/domain=>/srv;
/domain/l5d-poc-dev/http/bm-greeting-svc-v1=>/srv/l5d-poc-dev/http/bm-greeting-svc-v1;
/host=>/$/io.buoyant.http.domainToPathPfx/domain;
/svc=>/host;
/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d=>/

And the incoming router configuration looks like this:

- protocol: http
  label: incoming-internal
  identifier:
    kind: io.l5d.header
    header: l5d-dst-client
  interpreter:
    kind: io.l5d.namerd
    dst: /$/inet/namerd.default.svc.cluster.local/4100
    namespace: incoming
    transformers:
    - kind: io.l5d.k8s.localnode

After the changes, I see the error below when running this command:

http_proxy=$HOST_IP:31616 curl -s -X TRACE -H "l5d-add-context: true" http://service-a-v1.http.l5d-poc-dev/bsi/greeting/v1

Can you take a look and let me know what I am missing?

Unable to route request!

service name: /svc/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1
resolutions considered:
  /$/io.buoyant.http.domainToPathPfx/domain/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (neg)
dtab:
  /srv => /#/io.l5d.k8s
  /domain => /srv
  /domain/l5d-poc-dev/http/service-a-v1 => /srv/l5d-poc-dev/http/service-a-v1
  /host => /$/io.buoyant.http.domainToPathPfx/domain
  /svc => /host
  /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d => /
base dtab:

override dtab:

--- Router: outgoing ---
request duration: 96 ms
service name: /svc/service-a-v1.http.l5d-poc-dev
client name: /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1
addresses: [192.168.251.69:4141, 192.168.39.117:4141]
selected address: 192.168.39.117:4141
dtab resolution:
  /svc/service-a-v1.http.l5d-poc-dev
  /host/service-a-v1.http.l5d-poc-dev (/svc=>/host)
  /domain/l5d-poc-dev/http/service-a-v1
  /srv/l5d-poc-dev/http/service-a-v1 (/domain/l5d-poc-dev/http/service-a-v1=>/srv/l5d-poc-dev/http/service-a-v1)
  /#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (/srv=>/#/io.l5d.k8s)
  /%/io.l5d.k8s.daemonset/default/incoming-intern/l5d/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1 (DelegatingNameTreeTransformer$)

Regards,
Prasad


#4

Hi Prasad, it looks like you need to add the leading /svc prefix for the dtab to take effect. So the dtab entry would look like this:

 dtab: /svc/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d => /
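
To show why the prefix matters, here is a minimal sketch of dtab-style prefix rewriting. It is a simplification that ignores wildcards and weighted alternatives, and `apply_rule` is a hypothetical helper, not part of Linkerd. The service name is the one from your error output.

```python
def apply_rule(path, prefix, dst):
    """Rewrite `path` if it starts with `prefix` (whole segments), else None."""
    segments = [s for s in path.split("/") if s]
    prefix_segments = [s for s in prefix.split("/") if s]
    if segments[:len(prefix_segments)] != prefix_segments:
        return None  # the rule does not apply to this name
    dst_segments = [s for s in dst.split("/") if s]
    return "/" + "/".join(dst_segments + segments[len(prefix_segments):])


# Service name seen by the incoming router, taken from the error output above.
NAME = (
    "/svc/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d"
    "/#/io.l5d.k8s/l5d-poc-dev/http/service-a-v1"
)
DAEMONSET = "/%/io.l5d.k8s.daemonset/default/incoming-intern/l5d"

without_svc = apply_rule(NAME, DAEMONSET, "/")
with_svc = apply_rule(NAME, "/svc" + DAEMONSET, "/")

print(without_svc)
print(with_svc)
```

The rule without /svc never matches because the name produced by the header identifier starts with /svc, while the prefixed rule leaves just the /#/io.l5d.k8s/... path for the namer to resolve.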

#5

Hi Denis,

Thanks it worked :slight_smile:

One question, is this the expected behavior when the number of Linkerd instances is greater than the number of instances of a service?

Regards,
Prasad


#6

If you are performing traffic shifting in a daemonset environment on Kubernetes, then yes, this is something you need to add to your configs to get that feature working. It’s not necessarily tied to the number of Linkerd instances in a setup.