L5d routing to service fails after re-deploying


#1

Hi,

Some info:

K8s version - 1.7.5
L5d version - 1.3.2
OS - centos 7.3
CNI Network - Flannel
Cloud provider - AWS
L5d kubectl sidecar - v1.6.2

We have been using L5d for a few months now. We recently upgraded from L5d 1.2 to 1.3.2 in order to solve the “too old versions” error, but since moving to 1.3.2 we are seeing a different issue in our clusters when redeploying an existing service (upgrade), and sometimes the issue happens on a live, existing service with no change at all.

The issue itself is that L5d tries to route a request to an old, incorrect endpoint IP, which ends in a timeout and the following error in the L5d logs:

[l5d-8qcdg l5d] E 1127 09:37:35.318 UTC THREAD28: service failure: Failure(connection timed out: /100.72.7.21:3000 at remote address: /100.72.7.21:3000. Remote Info: Not Available, flags=0x09) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: /100.72.7.21:3000, Downstream label: #/io.l5d.k8s/default/http/frontend-platform-web, Trace Id: 486c517410f137f3.486c517410f137f3<:486c517410f137f3

As can be seen above, L5d is trying to reach the service named frontend-platform-web through IP 100.72.7.21 on port 3000. At the same time, the actual endpoint IP for this service is as follows:

# kubectl get pods -o wide | grep frontend

frontend-platform-2428193161-8dbnx   1/1   Running   0   1h   100.72.2.8

The actual endpoint IP for the frontend-platform-web service is 100.72.2.8, not 100.72.7.21; for some reason L5d caches the wrong IP for the endpoint and thus cannot reach it. The end user receives a 502 error. The issue can only be fixed by deleting the L5d pods :frowning:
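
One way to confirm the mismatch is to compare the endpoints object Kubernetes currently holds (which is what the io.l5d.k8s namer watches) against the IP in the linkerd error; a minimal sketch, using the service name from this thread:

    # Endpoint IPs Kubernetes currently reports for the service
    kubectl get endpoints frontend-platform-web -o wide

    # Pod IPs actually backing the service
    kubectl get pods -o wide | grep frontend

    # If the IP from the linkerd error appears in neither output,
    # linkerd is resolving against stale endpoint data.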

Any help would be much appreciated.

Thanks,

Roiy


#2

Hello - we are experiencing exactly the same issue.
Did you resolve it?
Our l5d kubectl sidecar is v1.6.4.


#3

@roiyn, @amirg: can you please provide your linkerd configs?


#4

Hi @siggy, and thank you for your response!

Unfortunately I cannot upload files (new user), so I am pasting the two configuration files below: the daemonset and the config-map.

linkerd-daemonset.yml

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: 2017-10-22T12:08:57Z
  generation: 2
  labels:
    app: l5d
  name: l5d
  namespace: default
  resourceVersion: "31512733"
  selfLink: /apis/extensions/v1beta1/namespaces/default/daemonsets/l5d
  uid: c7ba3e3c-b721-11e7-894e-024a4d316152
spec:
  selector:
    matchLabels:
      app: l5d
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: l5d
    spec:
      containers:
      - args:
        - /io.buoyant/linkerd/config/config.yaml
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: docker-dev-local.artifactory-dev.traiana.com/linkerd:1.3.2
        imagePullPolicy: Always
        name: l5d
        ports:
        - containerPort: 4140
          hostPort: 4140
          name: http-outgoing
          protocol: TCP
        - containerPort: 4141
          hostPort: 4141
          name: http-incoming
          protocol: TCP
        - containerPort: 4142
          hostPort: 4142
          name: http-ingress
          protocol: TCP
        - containerPort: 5150
          hostPort: 5150
          name: h2-outgoing
          protocol: TCP
        - containerPort: 5151
          hostPort: 5151
          name: h2-incoming
          protocol: TCP
        - containerPort: 5152
          hostPort: 5152
          name: h2-ingress
          protocol: TCP
        - containerPort: 9990
          hostPort: 9990
          name: admin
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /io.buoyant/linkerd/config
          name: l5d-config
          readOnly: true
      - args:
        - proxy
        - -p
        - "8001"
        image: buoyantio/kubectl:v1.6.2
        imagePullPolicy: IfNotPresent
        name: kubectl
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: l5d-config
        name: l5d-config
  templateGeneration: 2
  updateStrategy:
    type: OnDelete
status:
  currentNumberScheduled: 6
  desiredNumberScheduled: 6
  numberAvailable: 6
  numberMisscheduled: 0
  numberReady: 6
  observedGeneration: 2
  updatedNumberScheduled: 6

linkerd-cm.yml

apiVersion: v1
data:
  config.yaml: |-
    namers:
    - kind: io.l5d.k8s
      experimental: true
      host: 127.0.0.1
      port: 8001
    telemetry:
    - kind: io.l5d.prometheus
    - kind: io.l5d.recentRequests
      sampleRate: 1.0
    - kind: io.l5d.zipkin
      host: localhost
      port: 30941
      sampleRate: 1.0
    admin:
      port: 9990
      ip: 0.0.0.0
    routers:
    - protocol: http
      label: http-outgoing
      dtab: |
        /svc => /#/io.l5d.k8s/default/http;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: http-incoming
          service: l5d
          hostNetwork: true
      servers:
      - port: 4140
        ip: 0.0.0.0
      service:
        responseClassifier:
          kind: io.l5d.http.retryableRead5XX
    - protocol: http
      label: http-incoming
      dtab: |
        /svc => /#/io.l5d.k8s/default/http;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true
      servers:
      - port: 4141
        ip: 0.0.0.0
    - protocol: http
      label: http-ingress
      dtab: |
        /svc => /#/io.l5d.k8s/default/http;
      identifier:
        kind: io.l5d.path
        segments: 1
        consume: true
      servers:
      - port: 4142
        ip: 0.0.0.0
        clearContext: true
    - protocol: h2
      label: h2-outgoing
      experimental: true
      dtab: |
        /grpc => /#/io.l5d.k8s/default/grpc;
        /svc => /$/io.buoyant.http.domainToPathPfx/grpc;
      identifier:
        kind: io.l5d.header.path
        segments: 1
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: h2-incoming
          service: l5d
          hostNetwork: true
      servers:
      - port: 5150
        ip: 0.0.0.0
    - protocol: h2
      label: h2-incoming
      experimental: true
      dtab: |
        /grpc => /#/io.l5d.k8s/default/grpc;
        /svc => /$/io.buoyant.http.domainToPathPfx/grpc;
      identifier:
        kind: io.l5d.header.path
        segments: 1
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true
      servers:
      - port: 5151
        ip: 0.0.0.0
    - protocol: h2
      label: h2-ingress
      experimental: true
      dtab: |
        /svc => /$/io.buoyant.http.domainToPathPfx/grpc;
        /grpc => /#/io.l5d.k8s/default/grpc;
      identifier:
        kind: io.l5d.header.path
        segments: 1
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: h2-incoming
          hostNetwork: true
          service: l5d
      servers:
      - port: 5152
        ip: 0.0.0.0
        clearContext: true
    usage:
      enabled: false
kind: ConfigMap
metadata:
  creationTimestamp: 2017-10-22T12:08:56Z
  name: l5d-config
  namespace: default
  resourceVersion: "22558961"
  selfLink: /api/v1/namespaces/default/configmaps/l5d-config
  uid: c7a899c2-b721-11e7-894e-024a4d31615
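
For reference, a request to frontend-platform-web resolves through the /svc dtab above as follows; this matches the downstream label in the error log, so the name-resolution path itself looks correct (a worked example, not admin UI output):

    /svc/frontend-platform-web
      => /#/io.l5d.k8s/default/http/frontend-platform-web    (via /svc => /#/io.l5d.k8s/default/http)
      => the endpoint IPs of port http on service frontend-platform-web in namespace default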


#5

Thanks for the detail @roiyn. This is definitely unexpected behavior. A few more things we’d like to see:

  1. Can you reformat your previous message with the config files, either using github gists or inline formatting, for example:

       foo: bar

  2. Take screenshots of the linkerd admin UI dtab playground, both before a redeploy and after, using /svc/app or whatever your app name is.
  3. Enable the io.l5d.tracelog telemeter in the telemetry section of your linkerd config.
  4. Add -log.level=DEBUG to your linkerd startup command (a sketch of where items 3 and 4 land follows this list).
  5. Following 3 and 4, please run a successful request, redeploy, run a failed request, and then provide the logging output from linkerd.
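
For reference, a minimal sketch of where items 3 and 4 would land in the files posted above (flag placement assumed from the linkerd 1.x docs, which pass flags before the config-file argument):

    # ConfigMap (config.yaml): add the tracelog telemeter to the telemetry list
    telemetry:
    - kind: io.l5d.tracelog
    - kind: io.l5d.prometheus    # existing entries unchanged

    # DaemonSet: add the flag ahead of the config path in the linkerd container args
    containers:
    - args:
      - -log.level=DEBUG
      - /io.buoyant/linkerd/config/config.yaml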

#6

apiVersion: v1
data:
  config.yaml: |-
    namers:
    - kind: io.l5d.k8s
      experimental: true
      host: 127.0.0.1
      port: 8001
    telemetry:
    - kind: io.l5d.tracelog
    - kind: io.l5d.prometheus
    - kind: io.l5d.recentRequests
      sampleRate: 1.0
    - kind: io.l5d.zipkin
      host: localhost
      port: 30941
      sampleRate: 1.0
    admin:
      port: 9990
      ip: 0.0.0.0
    routers:
    - protocol: http
      label: http-outgoing
      dtab: |
        /svc => /#/io.l5d.k8s/default/http;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: http-incoming
          service: l5d
          hostNetwork: true
      servers:
      - port: 4140
        ip: 0.0.0.0
      service:
        responseClassifier:
          kind: io.l5d.http.retryableRead5XX
    - protocol: http
      label: http-incoming
      dtab: |
        /svc => /#/io.l5d.k8s/default/http;
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true
      servers:
      - port: 4141
        ip: 0.0.0.0
    - protocol: http
      label: http-ingress
      dtab: |
        /svc => /#/io.l5d.k8s/default/http;
      identifier:
        kind: io.l5d.path
        segments: 1
        consume: true
      servers:
      - port: 4142
        ip: 0.0.0.0
        clearContext: true
    - protocol: h2
      label: h2-outgoing
      experimental: true
      dtab: |
        /grpc => /#/io.l5d.k8s/default/grpc;
        /svc => /$/io.buoyant.http.domainToPathPfx/grpc;
      identifier:
        kind: io.l5d.header.path
        segments: 1
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: h2-incoming
          service: l5d
          hostNetwork: true
      servers:
      - port: 5150
        ip: 0.0.0.0
    - protocol: h2
      label: h2-incoming
      experimental: true
      dtab: |
        /grpc => /#/io.l5d.k8s/default/grpc;
        /svc => /$/io.buoyant.http.domainToPathPfx/grpc;
      identifier:
        kind: io.l5d.header.path
        segments: 1
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.localnode
          hostNetwork: true
      servers:
      - port: 5151
        ip: 0.0.0.0
    - protocol: h2
      label: h2-ingress
      experimental: true
      dtab: |
        /svc => /$/io.buoyant.http.domainToPathPfx/grpc;
        /grpc => /#/io.l5d.k8s/default/grpc;
      identifier:
        kind: io.l5d.header.path
        segments: 1
      interpreter:
        kind: default
        transformers:
        - kind: io.l5d.k8s.daemonset
          namespace: default
          port: h2-incoming
          hostNetwork: true
          service: l5d
      servers:
      - port: 5152
        ip: 0.0.0.0
        clearContext: true
    usage:
      enabled: false

  2. I’ll do it ASAP and update again.

  3. Enabled.

  4. Added.

  5. I’ll do it ASAP and will update again.