EWMA routing more traffic to high-latency endpoints (external/internal configuration)

I am testing the EWMA load-balancer configuration against internal and external setups that include high-latency endpoints. In both cases I observe that more traffic goes to the high-latency endpoints. Is this the expected behaviour of EWMA, or is there something wrong with my configuration/setup?

Below are the setup, config, and observations.

Setup1: Internal config

me -> Linkerd (k8s) -> service1 (k8s service deployment) -> e1, e2, e3

  • e1 is quick
  • e2 and e3 respond with 10 sec latency

Config

client:
  loadBalancer:
    kind: ewma
    maxEffort: 5
    decayTimeMs: 5000

Number of requests reaching endpoints for 100 requests:

e-1, e-2, e-3
54, 23, 23
09, 41, 50
18, 45, 37
11, 38, 51
10, 41, 49
12, 45, 43
69, 17, 14
09, 45, 46
09, 43, 48


Setup2: External config

me -> Linkerd -> DNS -> e1, e2, e3 (as DNS A records)

  • e1 is quick
  • e2 and e3 respond with 10 sec latency

Config

 routers:
    - protocol: http
      label: outgoing
      dtab: |
        /ph        => /$/io.buoyant.rinet ; # Lookup the name in DNS
        /svc       => /ph/80 ; # Use port 80 if unspecified
        /srv       => /$/io.buoyant.porthostPfx/ph ; # Attempt to extract the port from the hostname
      servers:
      - port: 4140
        ip: 0.0.0.0
      service: 
        responseClassifier:
          kind: io.l5d.http.retryableRead5XX
      client:
        kind: io.l5d.static
        configs:
        - prefix: /$/io.buoyant.rinet/80/*
          loadBalancer:
            kind: ewma
            maxEffort: 5
            decayTimeMs: 5000

Number of requests reaching endpoints for 100 requests:

e-1, e-2, e-3
37, 33, 30
15, 40, 45
65, 19, 16
12, 41, 27
18, 44, 38
51, 23, 26
16, 41, 43
24, 37, 39


Edit Summary:
I initially posted this with an ELB-based endpoint as the subject and corrected it to the actual setup following Alex’s comments. I also included the internal configuration to confirm that the behavior is the same there.

Hi @zshaik! If linkerd is talking to an ELB, it can’t load balance over the endpoints behind it. From linkerd’s point of view, there’s only one endpoint: the ELB itself.

Hmm… but I was able to see the 3 endpoints in the admin console. As for load balancing, I was able to set the algorithm: with roundRobin, requests were shared equally between the endpoints; with p2c, more requests went to the non-latency endpoint. We are using a classic load balancer from AWS.

Based on your config file, it looks like you’re using DNS. Linkerd will pull all the entries from that DNS record and load balance over them. I would have expected the DNS entry to just contain a single IP (the IP of the ELB) but maybe I’m wrong. You may want to look at what’s in the DNS entry in order to figure out what’s going on.

Hey @Alex, sorry for the confusion :scream: Our setup actually uses a DNS name with the 3 instances as A records (DNS -> e1, e2, e3), not an ELB. I have edited my question.

Hey @Alex, I found what I was missing :innocent: I needed to set decayTimeMs higher than the endpoint latency. With that change EWMA works fine and routes most of the traffic to the non-latency endpoint. Thank you very much for raising these important points :slight_smile:
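
For anyone who finds this later, here is a rough sketch of what the adjusted client block for the internal setup might look like. The 15000 value is only an illustrative assumption (any decay window longer than the 10-second endpoint latency); the thread does not state the exact value that was used.

```yaml
client:
  loadBalancer:
    kind: ewma
    maxEffort: 5
    # Assumed value: longer than the 10 s latency of e2/e3 (was 5000)
    decayTimeMs: 15000
```

The same decayTimeMs change would presumably go under the /$/io.buoyant.rinet/80/* prefix in the external config.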
