Server labels in metrics?

Hey all,

I have a setup exporting StatsD metrics, which are sent to the statsd exporter and then scraped by Prometheus. We set it up this way because our Prometheus server expects every target to export metrics at <ip>/metrics, which isn’t currently possible via linkerd’s Prometheus telemetry configuration.

A particular metric I am looking at is linkerd_rt_service_server_0_0_0_0_11500_failures_counter (where service is my router label). It doesn’t carry any distinguishing information about which of the server nodes (currently discovered via Consul) the failures occurred on. In Prometheus terms, I was expecting a label identifying the server node the failures were on. Perhaps it’s a limitation of the StatsD metrics? If so, we should probably be using “Tags”?

Any suggestions/insights would be helpful.

Thanks,
Amit.

Hi Amit,

linkerd_rt_service_server_0_0_0_0_11500_failures_counter simply counts the number of failures that linkerd serves back (on port 11500). I think the metric linkerd_rt_service_client_<client name>_failures_counter is closer to what you want. However, this will give you failures at the granularity of a Consul service, i.e. a failure count per Consul service, not per host. We currently do not report per-host stats because the number of instances of a service may be very large and would generate an overwhelming number of counters.

Does this answer your question?

Hi @amitsaha,

You may have explored this already, but modifying your metrics_path in your Prometheus config and switching to the Prometheus Telemeter should give you more flexibility (and eliminate the need for the statsd exporter). Example here:

Thanks Alex for the clarification, and you are right, I should be looking at your suggested metric. My main interest in this metric was to see the circuit breaking in action. Is there another, more relevant metric? Effectively, I am trying to see that if one of my hosts is failing, linkerd automatically reroutes the request to another good host.

Thanks. Yeah, unfortunately, our central Prometheus server expects all exporters to be served on /metrics.

Take a look at

linkerd_rt_service_client_<client name>_loadbalancer_size_gauge
linkerd_rt_service_client_<client name>_loadbalancer_available_gauge

Those indicate the total number of endpoints in the load balancer and how many are currently available (i.e. do not have their circuit breakers triggered). This won’t tell you which hosts have flipped their breakers, but it will tell you how many.
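
If it helps, here is a rough sketch of how the two gauges could be combined to get the number of endpoints with tripped breakers (size minus available). It assumes the metrics arrive as plain, unlabeled lines in the Prometheus text format, as the statsd exporter would emit them; the client name "myclient", the sample values, and the helper name unavailable_endpoints are made up for illustration.

```python
# Hypothetical sample of scraped output; "myclient" and the values are placeholders.
SAMPLE = """\
# TYPE linkerd_rt_service_client_myclient_loadbalancer_size_gauge gauge
linkerd_rt_service_client_myclient_loadbalancer_size_gauge 5
# TYPE linkerd_rt_service_client_myclient_loadbalancer_available_gauge gauge
linkerd_rt_service_client_myclient_loadbalancer_available_gauge 3
"""


def unavailable_endpoints(metrics_text: str, client_prefix: str) -> int:
    """Return load balancer size minus available endpoints for one client.

    Assumes unlabeled "name value" lines; comment and blank lines are skipped.
    """
    values = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        values[parts[0]] = float(parts[1])

    size = values[f"{client_prefix}_loadbalancer_size_gauge"]
    available = values[f"{client_prefix}_loadbalancer_available_gauge"]
    return int(size - available)


if __name__ == "__main__":
    prefix = "linkerd_rt_service_client_myclient"
    print(unavailable_endpoints(SAMPLE, prefix))
```

A non-zero result means at least one endpoint is currently marked unavailable, though, as noted, not which one. The same subtraction could be done directly in a Prometheus query instead of in code.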

Thank you! That helps.