Test Linkerd load balancer and circuit breaking on Kubernetes cluster

I have a small use case where I need to test Linkerd’s load balancing and circuit breaking capabilities; Linkerd is deployed as part of my Kubernetes cluster. I have three replicas of a microservice and want to test how Linkerd’s load balancing and circuit breaking actually work. I have exposed a Kubernetes Service endpoint, which already load balances requests across all the replicas and stops sending traffic to any replica that is unhealthy. If that is the case, where are the load balancing and circuit breaking capabilities of Linkerd used? How are they different from those of Kubernetes?

Hey @prasaanth, it’s great to hear that you are trying out Linkerd’s load balancing and circuit breaking capabilities! Linkerd supports both with a number of useful configuration options. For load balancing, Linkerd offers several algorithms, such as Power of Two Choices (P2C): Least Loaded, Aperture: Least Loaded, and Heap: Least Loaded. To learn more about these options, check out the docs on this feature here. For circuit breaking, you can configure Linkerd to fail fast and remove a replica from service discovery when it becomes unhealthy, or to use failure accrual based on a set number of failed requests. More details on this feature can be found here in our docs.
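To make the shape of the configuration concrete, here is a minimal sketch of a linkerd 1.x router client section with both features enabled; the kinds and numbers below are illustrative, not recommendations:

routers:
- protocol: http
  client:
    loadBalancer:
      kind: ewma                          # P2C Peak EWMA; p2c, aperture, and heap are also available
    failureAccrual:
      kind: io.l5d.consecutiveFailures    # remove an instance after N consecutive failures
      failures: 5
      backoff:                            # how long to wait before probing the instance again
        kind: constant
        ms: 10000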

There are a number of blog posts that walk you through how to set up and test these features.

Here is a blog post about various ways of doing load balancing with Linkerd.

This blog post shows you how to get started with Linkerd’s circuit breaking features.

Feel free to ask any questions you may have and I hope you find the information provided above useful!

Hi @deebo91, thanks for your quick response. As per the documentation, Linkerd’s circuit breaker configuration removes any unhealthy instances from the load balancing pool so that requests stop reaching them, which increases the success rate. Since my microservices are deployed on Kubernetes, Kubernetes itself maintains a load balancing pool and distributes the load evenly, and I have configured Linkerd to access the Kubernetes service endpoint. Is this the right way, or is there a way I could configure a load balancing pool along with Linkerd?

Hey @prasaanth! If you’ve configured linkerd to use the Kubernetes API for service discovery, linkerd will use the API to discover services (and their corresponding pods) and will handle the load balancing over the relevant pods itself. In that case you wouldn’t be using the Kubernetes Service’s load balancing over its pods at all; linkerd maintains its own pool. (It might be helpful to upload your linkerd config so we can give you feedback.) What are you looking to configure? A minimal sketch of the discovery setup follows the links below.
To configure linkerd’s load balancing, see https://linkerd.io/config/1.3.1/linkerd/index.html#load-balancer.
See also this slack thread that could be helpful: https://linkerd.slack.com/archives/C0JV5E7BR/p1504190778000624
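For reference, Kubernetes service discovery is enabled with the io.l5d.k8s namer plus a dtab that routes through it. A minimal sketch, assuming a service in the default namespace that exposes a port named http (both are placeholders for your own values):

namers:
- kind: io.l5d.k8s        # watch the Kubernetes API for service endpoints

routers:
- protocol: http
  dtab: |
    /svc => /#/io.l5d.k8s/default/http ;   # /svc/<name> -> pods behind service <name>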

Hi @marzipan
I think I am missing the configuration that makes linkerd use the Kubernetes API for service discovery.
My linkerd config is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: linkerd-config
  namespace: linkerd
data:
  config.yaml: |-
    admin:
      ip: 0.0.0.0
      port: 51029

    namers:
    - kind: io.l5d.k8s
    - kind: io.l5d.k8s
      prefix: /io.l5d.k8s.http
      transformers:
      - kind: io.l5d.k8s.daemonset
        namespace: linkerd
        port: http-incoming
        service: l5d
        hostNetwork: true # required when using host networking (eg for CNI)
    - kind: io.l5d.rewrite
      prefix: /portNsSvcToK8s
      pattern: "/{port}/{ns}/{svc}"
      name: "/k8s/{ns}/{port}/{svc}"

    telemetry:
    - kind: io.l5d.prometheus # Expose Prometheus-style metrics on the admin port at /admin/metrics/prometheus
    - kind: io.l5d.recentRequests
      sampleRate: 0.25 # Tune this sample rate before going to production

    usage:
      orgId: linkerd-examples-servicemesh

    routers:
    - label: ping
      protocol: http
      servers:
      - port: 51021
        ip: 0.0.0.0
      dtab: |
        /svc => /#/io.l5d.k8s/<namespace>/<kubernetes service port>/<kubernetes service name>;
      client:
        loadBalancer:
          kind: ewma
          maxEffort: 10
          decayTimeMs: 15000
        failureAccrual:
          kind: io.l5d.successRate
          successRate: 0.9
          requests: 1000
          backoff:
            kind: jittered
            minMs: 5000
            maxMs: 300000

I have exposed the Kubernetes Service as type LoadBalancer and configured the service endpoint’s port and name in the dtab. Linkerd still does not seem to stop sending requests to the unhealthy pods.

Hey @prasaanth! Yeah, your service shouldn’t need to be configured as type LoadBalancer; since linkerd discovers and balances over the pods itself, a plain headless service is enough.
See our hello world services in linkerd-examples for what this could look like.
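From memory, the hello world service manifest looks roughly like the following; this is an illustrative sketch (name and port are placeholders), not the exact linkerd-examples file:

apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  clusterIP: None   # headless: kube-proxy does no load balancing; linkerd balances over the pods
  ports:
  - name: http
    port: 7777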

Also, @klingerf brought up an important thing to note: please ensure you’re running linkerd 1.3.1 or later, as there were some k8s issues in 1.3.0 that have since been fixed.

Hi @marzipan
Thank you for pointing this out. I am running linkerd 1.3.1 now, and I have changed my service as follows:

apiVersion: v1
kind: Service
metadata:
  name: ping
  labels:
    run: ping
spec:
  clusterIP: None
  ports:
  - name: ping
    port: 51030
    targetPort: 51030
    protocol: TCP
  selector:
    run: ping

My linkerd config is the same as the one above. I am running one of the replicas of the pod with a different version, which makes all requests to that pod fail, but I still don’t see the circuit being tripped.

Hi @prasaanth, thanks for all the detail, it’s helpful. A few more things to check:

  1. What kind of error is the failing pod returning? For example, if it’s returning a 400, linkerd will not consider that an error. Have you tried configuring the failing pod to simply not respond?
  2. What QPS (requests per second) are you sending to the pods?
  3. Since you are on k8s, you may gain some visibility by installing linkerd-viz; it will show you request-level information on a per-instance basis.

Thank you so much @siggy. I had configured my pods to return a 400 response, and that was why the circuit was not tripping. Is there a reason why Linkerd does not consider a 400 status code an error?

Great to hear @prasaanth.

Linkerd does not consider 400s to be errors because they are generally failures of the caller, not of your server. For example, a user hitting an incorrect URL multiple times should not drag down the success rate that determines circuit breaking.

For more information on response classification, have a look at:
https://linkerd.io/config/1.3.1/linkerd/index.html#http-response-classifiers
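None of the built-in classifiers treat 4XXs as failures, so for circuit breaker testing the simplest approach is to have the failing pod return 5XXs. If you want to change how responses are classified for retries, the classifier is configurable; a sketch, assuming linkerd 1.3.x where responseClassifier sits under the router’s service section:

routers:
- protocol: http
  service:
    responseClassifier:
      kind: io.l5d.http.retryableRead5XX   # classify 5XXs on read requests as retryable failures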
