Linkerd returns empty/null response with retryable response classifier

  service:
    responseClassifier:
      kind: io.l5d.http.retryableRead5XX

I am using a retryable configuration, after the retires upon status 501 failures (in my case), Linkerd returns empty response instead of error response.
We need linkerd to return the actual error response, How can we control this?

Config

  • version: 1.0.0
  • linkerd config able to route externally

Without nonRetryable5xx

Request

http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -vL http://foo.bar.com?wsdl

Response

User-Agent: curl/7.52.1
Accept: */*
Proxy-Connection: Keep-Alive

HTTP/1.1 501 Not Implemented
Content-Type: None
Content-Length: 0
Set-Cookie: BIGipServeriat-chrtr_cte_pool=1012995594.32784.0000; path=/; Httponly; Secure
l5d-success-class: 0.0
Via: 1.1 linkerd

Curl_http_done: called premature == 0
Connection #0 to host (nil) left intact

With RetryableRead5xx

Request

http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -vL http://foo.bar.com?wsdl

Response

*   Trying 34.211.160.197...
* TCP_NODELAY set
* Connected to (nil) (34.211.160.197) port 4140 (#0)
> GET http://foo.bar.com?wsdl HTTP/1.1
> Host: foo.bar.com
> User-Agent: curl/7.52.1
> Accept: */*
> Proxy-Connection: Keep-Alive
> 
* Curl_http_done: called premature == 0
* Empty reply from server
* Connection #0 to host (nil) left intact
curl: (52) Empty reply from server

Debugging

Logs

kubectl logs -f <l5d-pod> -c l5d

I 0808 14:23:27.260 UTC THREAD21 TraceId:e79783b6daa3cf51: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/4222/foo.bar.com" as dead. Remote Address: Inet(foo.bar.com/206.221.59.190:4222,Map())
I 0808 14:23:32.467 UTC THREAD21 TraceId:e79783b6daa3cf51: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/4222/foo.bar.com" as dead. Remote Address: Inet(foo.bar.com/206.221.59.190:4222,Map())
I 0808 14:23:42.344 UTC THREAD21 TraceId:e79783b6daa3cf51: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/4222/foo.bar.com" as dead. Remote Address: Inet(foo.bar.com/206.221.59.190:4222,Map())
I 0808 14:23:57.144 UTC THREAD21 TraceId:e79783b6daa3cf51: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/4222/foo.bar.com" as dead. Remote Address: Inet(foo.bar.com/206.221.59.190:4222,Map())
E 0808 14:24:24.680 UTC THREAD26 TraceId:e79783b6daa3cf51: service failure
com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Upstream Address: /10.2.104.168:17651, Upstream Client Id: Not Available, Downstream Address: foo.bar.com/206.221.59.190:4222, Downstream Client Id: $/io.buoyant.rinet/4222/foo.bar.com, Trace Id: e79783b6daa3cf51.bd2dbbf2910abcca<:e79783b6daa3cf51

Metrics

“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/available” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/cancelled_connects” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/closes” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.count” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.max” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.min” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.p50” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.p90” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.p95” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.p99” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.p9990” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.p9999” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.sum” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connect_latency_ms.avg” : 708.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connection_duration.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connection_received_bytes.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connection_requests.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connection_sent_bytes.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connections” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/connects” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/dispatcher/serial/queue_size” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/dtab/size.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failed_connect_latency_ms.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failure_accrual/probes” : 3,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failure_accrual/removals” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failure_accrual/removed_for_ms” : 69179,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failure_accrual/revivals” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failures” : 358,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failures/com.twitter.finagle.CancelledRequestException” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException” : 357,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/adds” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/available” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/busy” : 1.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/closed” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/load” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/max_effort_exhausted” : 350,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/meanweight” : 1.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/p2c” : 1.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/rebuilds” : 364,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/removes” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/size” : 1.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/loadbalancer/updates” : 14,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/pending” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/pool_cached” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/pool_num_too_many_waiters” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/pool_num_waited” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/pool_size” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/pool_waiters” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/received_bytes” : 57834,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.count” : 106,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.max” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.min” : 158,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.p50” : 159,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.p90” : 168,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.p95” : 176,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.p99” : 185,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.p9990” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.p9999” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.sum” : 17102,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/request_latency_ms.avg” : 161.33962264150944,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/requests” : 358,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/budget” : 103.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/budget_exhausted” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/cannot_retry” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/not_open” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/request_limit” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.count” : 106,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.max” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.min” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.p50” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.p90” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.p95” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.p99” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.p9990” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.p9999” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.sum” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/retries/requeues_per_request.avg” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/sent_bytes” : 161174,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/failures” : 358,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/failures/com.twitter.finagle.CancelledRequestException” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException” : 357,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/pending” : 0.0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.count” : 106,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.max” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.min” : 158,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.p50” : 159,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.p90” : 168,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.p95” : 176,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.p99” : 185,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.p9990” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.p9999” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.sum” : 17127,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/request_latency_ms.avg” : 161.5754716981132,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/requests” : 358,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service/svc/foo.bar.com/success” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.count” : 107,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.max” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.min” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.p50” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.p90” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.p95” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.p99” : 3,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.p9990” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.p9999” : 708,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.sum” : 714,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/service_creation/service_acquisition_latency_ms.avg” : 6.672897196261682,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/socket_unwritable_ms” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/socket_writable_ms” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/1XX” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/2XX” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/3XX” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/4XX” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/501” : 357,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/5XX” : 357,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/status/error” : 1,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/success” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/1XX.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/2XX.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/3XX.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/4XX.count” : 0,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.count” : 106,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.max” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.min” : 158,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.p50” : 159,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.p90” : 169,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.p95” : 176,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.p99” : 185,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.p9990” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.p9999” : 203,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.sum” : 17157,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/5XX.avg” : 161.85849056603774,
“rt/outgoing/client/$/io.buoyant.rinet/foo.bar.com/time/error.count” : 0,

Hi @zshaik, can you provide some more information please? What response is your service returning? What HTTP status code does linkerd return in it’s empty response? Do you know if Linkerd stops retrying because it runs out of retry budget? Or because it eventually receives a successful response? Do you have specific steps to reproduce this?

1 Like

@alex Hey, Sure! I thought that was expected behavior :paw_prints: updated my question.
Thank you.

What it looks like is happening is that the request is being continually retried until curl cancels the request (either because it times out or you ctrl-c it). You could confirm this by looking at the /rt/outgoing/service/.../retires/... metrics.

Hmm… I increased the timeout & still get the empty response, something else should be disconnecting this?
http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -vL --connect-timeout 100000 http://foo.bar.com?wsdl

I am doing with a default retry budget and without totalTimeoutMs

  service:
    responseClassifier:
      kind: io.l5d.http.retryableRead5XX

:thinking:

Also tried httpie instead of curl and didn’t help
curl version: curl 7.52.1
httpie: 0.9.8

The service level stats will help with debugging (since retries happen at the service layer):

/rt/outgoing/service/...

Another thing to try is to set the retry budget to 0 so that no retries occur (or low enough that only a finite number of retries will occur).

1 Like

Hey that’s right! @alex, Setting the retry budget to a low number resolved !

  service:
    retries:
      budget:
        minRetriesPerSec: 1
        percentCanRetry: 0.1
        ttlSecs: 30
    responseClassifier:
      kind: io.l5d.http.retryableRead5XX

and the metrics then,

 "rt/outgoing/service/svc/foo.bar.com:4222/failures" : 1,
  "rt/outgoing/service/svc/foo.bar.com:4222/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 1,
  "rt/outgoing/service/svc/foo.bar.com:4222/pending" : 0.0,
  "rt/outgoing/service/svc/foo.bar.com:4222/request_latency_ms.count" : 0,
  "rt/outgoing/service/svc/foo.bar.com:4222/requests" : 1,
  "rt/outgoing/service/svc/foo.bar.com:4222/retries/budget" : 30.0,
  "rt/outgoing/service/svc/foo.bar.com:4222/retries/budget_exhausted" : 1,
  "rt/outgoing/service/svc/foo.bar.com:4222/retries/per_request.count" : 0,
  "rt/outgoing/service/svc/foo.bar.com:4222/retries/total" : 30,
  "rt/outgoing/service/svc/foo.bar.com:4222/success" : 0,

Metrics with default retry budget below,

  "rt/outgoing/service/svc/foo.bar.com/failures" : 1,
  "rt/outgoing/service/svc/foo.bar.com/failures/com.twitter.finagle.CancelledRequestException" : 1,
  "rt/outgoing/service/svc/foo.bar.com/pending" : 0.0,
  "rt/outgoing/service/svc/foo.bar.com/request_latency_ms.count" : 0,
  "rt/outgoing/service/svc/foo.bar.com/requests" : 1,
  "rt/outgoing/service/svc/foo.bar.com/retries/budget" : 100.0,
  "rt/outgoing/service/svc/foo.bar.com/retries/budget_exhausted" : 0,
  "rt/outgoing/service/svc/foo.bar.com/retries/per_request.count" : 0,
  "rt/outgoing/service/svc/foo.bar.com/retries/total" : 352,
  "rt/outgoing/service/svc/foo.bar.com/sourcedfailures/$/io.buoyant.rinet/4222/foo.bar.com" : 1,
  "rt/outgoing/service/svc/foo.bar.com/sourcedfailures/$/io.buoyant.rinet/4222/foo.bar.com/com.twitter.finagle.CancelledRequestException" : 1,
  "rt/outgoing/service/svc/foo.bar.com/success" : 0,

One more observation: Sending Parallel Requests!

#!/bin/sh
    set -m # Enable Job Control
    for i in `seq 5`; do # start 5 jobs in parallel
     http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -v http://foo.bar.com?wsdl &
    done

Sending requests in parallel (>5 only) with default retry budget is returning the error response back with no issues after the retires!

What’s the math behind this? ¯_(ツ)_/¯

https://linkerd.io/config/1.1.2/linkerd/index.html#retry-budget-parameters

Hey I just found this, going through, thanks!

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.