Higher sampling rate for error responses


Is there any way to alter the telemetry sampling rate as a function of the response?

It can be hard to strike the right balance: when the error rate is very low and the sampling rate is low too, you rarely get a trace for an error.
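To put rough numbers on that, here is a small back-of-the-envelope sketch. The traffic volume and rates are made up for illustration, and it assumes sampling is decided independently of whether the request ends up failing.

```python
def expected_error_traces(requests, error_rate, sample_rate):
    """Expected number of requests that both fail and happen to be sampled,
    assuming the sampling decision is independent of the outcome."""
    return requests * error_rate * sample_rate


# Hypothetical figures: one million requests, 0.1% errors, 0.1% sampling.
# That works out to roughly one traced error per million requests.
print(expected_error_traces(1_000_000, 0.001, 0.001))
```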

There isn’t an automatic way to modify sample rate as a function of the response. In fact, once a response has been received, it’s too late to enable tracing on that request.

However, you can set the l5d-sample header, which will enable tracing for that request. This is useful if you know how to manually reproduce errors and want to force tracing on for those requests.
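As a minimal sketch of what that could look like from a client, the snippet below builds a request carrying the header. The proxy address (localhost:4140), the "web" Host value, and the helper name are assumptions for illustration, and it assumes the header takes a sample rate between 0.0 and 1.0, so check the docs for your Linkerd version before relying on it.

```python
import urllib.request


def build_traced_request(url, host):
    """Build a request that asks Linkerd to trace it (hypothetical helper).

    Assumes the l5d-sample header accepts a rate from 0.0 to 1.0, where 1.0
    means this request is always sampled.
    """
    req = urllib.request.Request(url)
    req.add_header("Host", host)         # name Linkerd routes on (assumed setup)
    req.add_header("l5d-sample", "1.0")  # force tracing for this one request
    return req


# Assumed local Linkerd HTTP router address and service name.
req = build_traced_request("http://localhost:4140/", "web")
# urllib.request.urlopen(req) would actually send it through the proxy.
```

The request is only constructed here, not sent, so the snippet runs without a proxy. Note that urllib stores the header name as "L5d-sample", which is fine because HTTP header names are case-insensitive.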

Thanks for the info, Alex! It’s what I feared the answer would be, as I was going through the code.

To achieve what I was describing, I suppose Linkerd would have to gather tracing data on every request, but only send it to Zipkin after evaluating the response and applying the appropriate sampling probability.
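Roughly what I have in mind, as a toy sketch for a single process only. None of this is Linkerd code: the class, its methods, and the "status >= 500 means error" rule are all hypothetical, and the export list stands in for sending to Zipkin.

```python
import random


class BufferedTracer:
    """Buffer annotations per request and decide on export after the response."""

    def __init__(self, ok_rate, error_rate, rng=random.random):
        self.ok_rate = ok_rate        # sampling probability for successful responses
        self.error_rate = error_rate  # sampling probability for error responses
        self.rng = rng                # injectable so the decision can be tested
        self.pending = {}             # request id -> buffered annotations
        self.exported = []            # stand-in for spans sent to the trace server

    def record(self, request_id, annotation):
        # Every request pays this buffering cost, sampled or not.
        self.pending.setdefault(request_id, []).append(annotation)

    def finish(self, request_id, status):
        # The sampling decision is made only once the response status is known.
        spans = self.pending.pop(request_id, [])
        rate = self.error_rate if status >= 500 else self.ok_rate
        if self.rng() < rate:
            self.exported.append((request_id, spans))
            return True
        return False


tracer = BufferedTracer(ok_rate=0.0, error_rate=1.0)
tracer.record("a", "client send")
tracer.record("b", "client send")
tracer.finish("a", 200)  # dropped: success with ok_rate 0.0
tracer.finish("b", 503)  # kept: error with error_rate 1.0
```

The cost in this sketch is that annotations for every request stay in memory until the response comes back, whether or not they are ever exported.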

Does anyone have a rough idea of how much overhead this would add?

I don’t think what you describe is possible with distributed tracing. Different parts of the trace are emitted from different instances of Linkerd. Once a Linkerd instance gets a response, it’s too late to tell downstream instances to emit traces for that request.

Ah! Because the downstream instance communicates separately with the trace server.

I’m wiring up my understanding. Thanks :slight_smile:
