FailureAccrualFactory marking connection as dead just after initialization

Hello, we have been observing this log (linkerd 1.3.5):


...
Jan 30 22:21:39 ip-10-168-184-121 linkerd[9073]: I 0131 03:21:39.633 UTC THREAD30 TraceId:1d456389ef3d92e9: FailureAccrualFactory marking connection to "#/default/.local/users" as dead. Remote Address: Inet(/10.51.167.196)

The log line above appears for every host of a service right after a linkerd instance first comes up, shortly after this startup output:

Jan 30 22:20:18 ip-10-168-184-121 linkerd[9073]: -XX:+AggressiveOpts -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:InitialHeapSize=33554432 -XX:Ma
Jan 30 22:20:22 ip-10-168-184-121 linkerd[9073]: Jan 31, 2018 3:20:22 AM com.twitter.finagle.http.HttpMuxer$ $anonfun$new$1
Jan 30 22:20:22 ip-10-168-184-121 linkerd[9073]: INFO: HttpMuxer[/admin/metrics.json] = com.twitter.finagle.stats.MetricsExporter(<function1>)
Jan 30 22:20:22 ip-10-168-184-121 linkerd[9073]: Jan 31, 2018 3:20:22 AM com.twitter.finagle.http.HttpMuxer$ $anonfun$new$1
Jan 30 22:20:22 ip-10-168-184-121 linkerd[9073]: INFO: HttpMuxer[/admin/per_host_metrics.json] = com.twitter.finagle.stats.HostMetricsExporter(<function1>)
...

...
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.132 UTC THREAD1: Tracer: com.twitter.finagle.zipkin.thrift.ScribeZipkinTracer
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.220 UTC THREAD1: connecting to usageData proxy at Set(Inet(stats.buoyant.io/104.28.23.233:443,Map()))
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.517 UTC THREAD1: Resolver[inet] = com.twitter.finagle.InetResolver(com.twitter.finagle.InetResolver@71926a36)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.519 UTC THREAD1: Resolver[fixedinet] = com.twitter.finagle.FixedInetResolver(com.twitter.finagle.FixedInetResolver@216e9ca3)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.520 UTC THREAD1: Resolver[neg] = com.twitter.finagle.NegResolver$(com.twitter.finagle.NegResolver$@75120e58)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.521 UTC THREAD1: Resolver[nil] = com.twitter.finagle.NilResolver$(com.twitter.finagle.NilResolver$@48976e6d)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.522 UTC THREAD1: Resolver[fail] = com.twitter.finagle.FailResolver$(com.twitter.finagle.FailResolver$@2a367e93)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.523 UTC THREAD1: Resolver[flag] = com.twitter.server.FlagResolver(com.twitter.server.FlagResolver@7f6874f2)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.524 UTC THREAD1: Resolver[zk] = com.twitter.finagle.zookeeper.ZkResolver(com.twitter.finagle.zookeeper.ZkResolver@1a6dc589)
Jan 30 22:20:30 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:30.525 UTC THREAD1: Resolver[zk2] = com.twitter.finagle.serverset2.Zk2Resolver(com.twitter.finagle.serverset2.Zk2Resolver@697a34af)
Jan 30 22:20:31 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:31.276 UTC THREAD1: serving http admin on /0.0.0.0:9990
...
Jan 30 22:20:31 ip-10-168-184-121 linkerd[9073]: I 0131 03:20:31.545 UTC THREAD1: initialized

This is the current failureAccrual configuration:

 failureAccrual:
      kind: io.l5d.consecutiveFailures
      failures: 5
      backoff:
        kind: jittered
        minMs: 5000
        maxMs: 300000

Any ideas what might be happening here? I can confirm that the downstream services are not actually having any issues.

Is there a way to find out more about the state the connection was in before it was marked as dead? For example, was it one of the open pooled connections, and did something unexpected happen on the remote end?

Thanks
Amit.

Hi, @amitsaha!

If I understand correctly, you are fairly sure there are no obvious issues with the downstream service (failures, timeouts, or slow responses). Is that right?

In that case, what happens if you disable circuit breaking (failure accrual) as a test?
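A minimal sketch of that test, assuming your linkerd 1.x build supports the `none` failure-accrual kind (worth double-checking against the failure accrual section of the config reference for 1.3.5). It would replace the `failureAccrual` block above, at the same place in your config:

```yaml
# Test-only sketch: disables failure accrual (circuit breaking) for this client.
# Assumption: the `none` kind exists in your linkerd 1.x version.
failureAccrual:
  kind: none
```

With `io.l5d.consecutiveFailures` and `failures: 5`, a host is only marked dead after five failed requests in a row, and it then stays dead for a jittered backoff between `minMs` and `maxMs` before linkerd tries it again. So the log line suggests linkerd did see failures to those hosts shortly after startup; with accrual disabled, checking whether those early requests still fail should tell you whether the problem is the policy or the connections themselves.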
