When the control plane is down, meshed pods stop working

When we scale the control plane to 0:
kubectl get deploy -n linkerd | grep -v NAME | awk '{print $1}' | xargs -I ARG kubectl scale deploy ARG -n linkerd --replicas=0

Using the booksapp demo app, a request from the authors pod to the books service failed:
kubectl -n booksapp exec -it authors-5fdcc84c89-66srd -c linkerd-debug -- curl http://books.booksapp:7002/books.json
connection reset by peer

We can see the proxy is still there, but the pods cannot reach each other. Our guess is that some connection tracking is involved: when the control plane is down, connections going through the iptables rules in the pods are broken.

When we bring the control plane back up, it works again, so this does not make sense to us.

Hi @munger I tried to reproduce this with Linkerd 2.11.1 using the emojivoto application and didn’t see the behavior that you described.

When the control plane is down, any pods already injected with the Linkerd proxy should continue to work. If any new pods are scheduled, they won’t be injected with the Linkerd proxy. Were the pods restarted after scaling the control plane down?

You can check with the community on Linkerd Slack to see if anyone there has seen similar behavior.

Hi, thanks. Please stop the bot that generates the traffic and stop visiting the app for at least 20 minutes, then test again. You will find that authors can no longer call the books API. It looked fine in your test because you kept sending traffic: once the proxy has an open connection, it waits for that connection to finish. Test after that, and it fails.

There is no need to restart anything.

Please retry and let me know if there is a workaround.

Because emojivoto uses gRPC, it relies on long-lived connections, and the proxy waits for them to finish. Once the control plane is down, you need to stop the bot's traffic; after some more time with no traffic, calls fail. The pod has not restarted, and the proxy is still in the pod.

@munger I ran the same test with booksapp and was able to access the webapp and view traffic from the traffic container after scaling the control plane down to zero.

Have a look at the logs of the traffic and web containers to see if there are any helpful error messages. You can also look at the proxy logs to see if there is information there.
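As a rough sketch of how those logs could be pulled: the helper below wraps `kubectl logs` for a single container. The function name `container_logs` and the `KUBECTL` override variable are illustrative only, and it assumes the sidecar container uses Linkerd's usual name, `linkerd-proxy`.

```shell
# KUBECTL is an illustrative override hook (not a kubectl feature); it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# container_logs NAMESPACE POD CONTAINER
# Print the last 100 log lines of one container in a pod.
container_logs() {
  "$KUBECTL" logs "$2" -n "$1" -c "$3" --tail=100
}

# Usage (needs cluster access; the pod name is the one from the original post):
#   container_logs booksapp authors-5fdcc84c89-66srd linkerd-proxy
```

Running it once for `linkerd-proxy` and once for the application container gives both sides of the failing request.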

Charles

I have made a video on YouTube. Please check it out for the issue and see if you have a better workaround.

Oh, the video is: linkerd issue when control plane is down - YouTube

Any advice? Did you see my video?

@munger can you share the proxy logs from pod p when you attempt to run the curl command after the control plane is down?

Assuming that the pod was created before the control plane was scaled down, the proxies should be able to communicate with each other. You should be able to verify this by looking at the logs of the traffic service that generates traffic to the books app.

If we can figure out what error 56 means, then we will have a better understanding of what is happening.
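Assuming the 56 is curl's own exit code (which would fit the "connection reset by peer" message above), a small lookup like the one below may help when reading the result. The function name `curl_exit_meaning` is hypothetical, and the descriptions are paraphrased from the EXIT CODES section of the curl man page rather than quoted.

```shell
# curl_exit_meaning CODE
# Print a paraphrased meaning for a few common curl exit codes.
curl_exit_meaning() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    52) echo "server returned an empty reply" ;;
    56) echo "failure receiving network data" ;;
    *)  echo "see the curl man page" ;;
  esac
}

# Usage (needs cluster access). kubectl exec is expected to exit with the
# remote command's exit code, but treat that as an assumption to verify:
#   kubectl -n booksapp exec authors-5fdcc84c89-66srd -c linkerd-debug -- \
#     curl -sS -o /dev/null http://books.booksapp:7002/books.json
#   curl_exit_meaning "$?"
```

If that assumption holds, 56 means the connection was established and then dropped while receiving data, which is a different failure from being unable to connect at all (7).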

From the video, you can see the pod was created and injected before the control plane went down. Let me bring the environment up again; it takes some time, though.