Hey all, we are using linkerd 1.2.1 with Consul for service discovery. Our config looks like this:
admin:
  ip: 0.0.0.0
  port: 9990
telemetry:
- kind: io.l5d.prometheus
namers:
- kind: io.l5d.consul
  host: consul.host.name
  useHealthCheck: true
  prefix: /default
- kind: io.l5d.consul
  host: consul.host.name
  useHealthCheck: true
  healthStatuses:
  - warning
  prefix: /fallback
routers:
- protocol: thrift
  label: service1
  thriftProtocol: binary
  thriftMethodInDst: true
  dtab: |
    /svc => /#/fallback/.local/service1;
    /svc => /#/default/.local/service1;
  client:
    thriftFramed: false
    failureAccrual:
      kind: io.l5d.consecutiveFailures
      failures: 5
      backoff:
        kind: jittered
        minMs: 5000
        maxMs: 300000
  servers:
  - ip: 0.0.0.0
    port: 11010
    thriftFramed: false
....
<few more routers here - thrift Python>
We are getting “No hosts available” errors for one router (on some requests, not all). However, the rt:client:loadbalancer:available
metric for this router doesn’t show any corresponding drop to 0. The behavior goes away after a linkerd
restart.
Any idea what may be happening? Thanks for any insights.
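
For cross-checking the Prometheus series against the raw gauges, here is a minimal Python sketch that reads linkerd's admin metrics and lists the clients of one router whose load balancer reports zero available hosts. It rests on assumptions not confirmed above: that the admin interface (port 9990 in the config) serves a flat JSON map at /admin/metrics.json, and that the gauge keys follow the pattern rt/<router label>/client/<client id>/loadbalancer/available. The helper names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: the admin interface serves a flat {metric_name: value} JSON map here.
ADMIN_METRICS_URL = "http://localhost:9990/admin/metrics.json"


def fetch_metrics(url=ADMIN_METRICS_URL):
    """Download and decode the metrics map from the admin endpoint."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def available_by_client(metrics, router_label):
    """Return the loadbalancer 'available' gauges for one router, keyed by metric name."""
    # Assumption: key layout is rt/<label>/client/<client id>/loadbalancer/available.
    prefix = f"rt/{router_label}/client/"
    suffix = "/loadbalancer/available"
    return {
        name: value
        for name, value in metrics.items()
        if name.startswith(prefix) and name.endswith(suffix)
    }


def empty_clients(metrics, router_label):
    """Return the sorted metric names whose 'available' gauge is currently 0."""
    gauges = available_by_client(metrics, router_label)
    return sorted(name for name, value in gauges.items() if value == 0)
```

Running empty_clients(fetch_metrics(), "service1") in a loop while the errors occur would show whether any single client (for example the /#/default or /#/fallback path) briefly reports 0 even when the graphed series does not.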