Finagle: "No Hosts available"

Hey all, we are using linkerd 1.2.1 with consul for service discovery. Our config looks like:

admin:
  ip: 0.0.0.0
  port: 9990

telemetry:
- kind: io.l5d.prometheus


namers:
 - kind: io.l5d.consul
   host: consul.host.name
   useHealthCheck: true
   prefix: /default
 - kind: io.l5d.consul
   host: consul.host.name
   useHealthCheck: true
   healthStatuses:
   - warning
   prefix: /fallback

routers:
- protocol: thrift
  label: service1
  thriftProtocol: binary
  thriftMethodInDst: true
  dtab: |
    /svc => /#/fallback/.local/service1;
    /svc => /#/default/.local/service1;
  client:
    thriftFramed: false
    failureAccrual:
      kind: io.l5d.consecutiveFailures
      failures: 5
      backoff:
        kind: jittered
        minMs: 5000
        maxMs: 300000
  servers:
  - ip: 0.0.0.0
    port: 11010
    thriftFramed: false
....
<few more routers here - thrift Python>

We are getting a “No hosts available” error for one router (not for all requests). However, the rt:client:loadbalancer:available metric for this router doesn’t show any corresponding drop to 0. The behavior goes away after a linkerd restart.
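One way to double-check the metric observation is to look at the raw Prometheus output per client rather than at a graph. Below is a minimal sketch that parses Prometheus text exposition and lists the clients of one router whose rt:client:loadbalancer:available gauge is 0. The label names `rt` and `client`, the client label values, and the sample text are assumptions for illustration, so compare them against what your linkerd instance actually exports.

```python
import re

# Matches one sample line of Prometheus text exposition, e.g.
#   rt:client:loadbalancer:available{rt="service1", client="..."} 3.0
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

METRIC = "rt:client:loadbalancer:available"


def available_by_client(exposition_text, router_label):
    """Return {client_label: gauge_value} for one router (label names are assumed)."""
    result = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        m = SAMPLE_LINE.match(line)
        if m is None or m.group("name") != METRIC:
            continue
        labels = dict(LABEL.findall(m.group("labels") or ""))
        if labels.get("rt") == router_label:
            result[labels.get("client", "")] = float(m.group("value"))
    return result


def clients_without_hosts(exposition_text, router_label):
    """Clients of the router whose load balancer reports 0 available hosts."""
    gauges = available_by_client(exposition_text, router_label)
    return sorted(client for client, value in gauges.items() if value == 0)


# Hypothetical scrape output, for illustration only.
sample = '''
# TYPE rt:client:loadbalancer:available gauge
rt:client:loadbalancer:available{rt="service1", client="#/default/.local/service1"} 3.0
rt:client:loadbalancer:available{rt="service1", client="#/fallback/.local/service1"} 0.0
rt:client:loadbalancer:available{rt="service2", client="#/default/.local/service2"} 0.0
'''

print(clients_without_hosts(sample, "service1"))
```

Keep in mind that a gauge is only sampled at scrape time, so a brief drop to 0 between two scrapes would not show up in the stored series.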

Any idea what may be happening? Thanks for any insights.

Hi @amitsaha. Can you provide a bit more detail on when you are seeing this error? Does the same route fail every time? Is it more than one route? Screenshots of the linkerd admin dtab page are helpful.

Yes, it was the same route. However, I noticed that we also see these error messages when the downstream services are having issues. This also correlated with the failure count of the individual thrift calls for which we were getting the error messages.

Our failure accrual configuration is currently:

client:
    thriftFramed: false
    failureAccrual:
      kind: io.l5d.consecutiveFailures
      failures: 5
      backoff:
        kind: jittered
        minMs: 5000
        maxMs: 300000

Could the failures cause Finagle to report this log message? (It certainly seems so.)
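To reason about what the settings above do to a single host, here is a toy model of a consecutive-failures policy with jittered backoff, using the same numbers (5 failures, 5000 ms minimum, 300000 ms maximum). It is a simplified sketch, not Finagle's implementation: the class name is hypothetical, and the jitter rule (a random value between the minimum and three times the previous backoff, capped at the maximum) is an assumption. Whether hosts marked dead this way are what produces the “No hosts available” message is not confirmed in this thread.

```python
import random


class ConsecutiveFailuresModel:
    """Toy model of one host under a consecutive-failures policy (hypothetical)."""

    def __init__(self, failures=5, min_ms=5000, max_ms=300000, rng=None):
        self.threshold = failures
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.rng = rng or random.Random()
        self.consecutive = 0
        self.dead_until_ms = None    # None while the host is considered live
        self.last_backoff_ms = None  # length of the most recent dead period

    def is_available(self, now_ms):
        """A host is usable if it was never marked dead or its backoff has elapsed."""
        return self.dead_until_ms is None or now_ms >= self.dead_until_ms

    def record_success(self):
        """Any success clears the failure streak and the backoff history."""
        self.consecutive = 0
        self.dead_until_ms = None
        self.last_backoff_ms = None

    def record_failure(self, now_ms):
        """Count a failure; at the threshold, mark the host dead for one backoff."""
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            self.last_backoff_ms = self._next_backoff()
            self.dead_until_ms = now_ms + self.last_backoff_ms

    def _next_backoff(self):
        # Assumed jitter rule: first backoff is min_ms, later ones are drawn
        # from [min_ms, min(max_ms, 3 * previous backoff)].
        if self.last_backoff_ms is None:
            return self.min_ms
        upper = min(self.max_ms, self.last_backoff_ms * 3)
        return self.rng.randint(self.min_ms, upper)


host = ConsecutiveFailuresModel(rng=random.Random(0))
for _ in range(5):
    host.record_failure(now_ms=0)
print(host.is_available(now_ms=0), host.is_available(now_ms=5000))
```

In this model the streak is not reset when the host is marked dead, so a single failed request after the backoff expires marks it dead again for a longer, jittered period. With a 300000 ms ceiling, a host that keeps failing can stay out of rotation for up to five minutes at a time.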

Can you confirm the route is failing via the linkerd admin dtab page? A screenshot helps if you have one.

Sure, I will try that next time. I have a feeling it’s related to the topic “Question regarding cached pool size with Python thrift servers”.
