Horizontal Pod Autoscaling based on the number of unique gRPC requests

Dear all,

In the current deployment, a few pods are running and waiting for gRPC calls from various clients. Linkerd is used as a service mesh to load-balance the gRPC requests coming from the clients, forwarding new requests to idle pods, which has worked great so far.

The problem now is that when the number of requests exceeds the number of pods in the deployment, for some reason the new requests get distributed across the already-busy pods, and all the pods end up processing the last request.

So my question is: how can I scale up the number of pods based on new incoming gRPC requests? For example, two pods are currently processing two unique requests and a third request has just arrived, so I would like to spawn a new pod to process it. Is that possible?

@edpell There’s a good example of autoscaling on latency here and it should be possible to modify the autoscaling rules with a custom query to get the request rate from Linkerd metrics.
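As a rough sketch of what such a rule might look like, here is a hypothetical HorizontalPodAutoscaler manifest that scales a deployment named web on a per-pod request-rate metric. The metric name response_total_rps is an assumption — it would only exist if you have an adapter such as prometheus-adapter configured to expose the Linkerd metric through the Kubernetes custom metrics API, and the exact name depends on your adapter rules:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        # Hypothetical name; depends on how your metrics adapter
        # exposes Linkerd's response_total through the custom metrics API.
        name: response_total_rps
      target:
        type: AverageValue
        # Roughly one in-flight request per pod, matching the
        # "one pod per unique request" goal described above.
        averageValue: "1"

With a target like this, the HPA adds replicas whenever the average request rate per pod exceeds the target, which approximates spawning a new pod as each additional request arrives.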

Using the linkerd viz stat command, you get output similar to the following, which includes the request rate for traffic:

NAME   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
web       1/1    91.55%   2.4rps           4ms           9ms          10ms          4

The RPS value is derived from the counts of successful and failed responses over a given time window.

So, you could write a Prometheus query using the response_total metric and the classification label to get the RPS for your custom rule. All of the Prometheus metrics for a given pod can be viewed with the linkerd dg proxy-metrics <podname> command.
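For illustration, a query along these lines could return the inbound request rate for a deployment. The deployment and namespace label names here assume the default relabeling applied by the linkerd-viz Prometheus; adjust them to match your setup:

sum(rate(response_total{direction="inbound", deployment="web", namespace="default"}[30s]))

Leaving out the classification label counts all responses; adding classification="success" would restrict the rate to successful responses only.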

Hope this gets you pointed in the right direction!