UPDATE: After understanding this better, updated my question.
Trying to understand more about how the caching pool works with linkerd. According to the finagle docs, the pool size is a dynamically changing value based on load. Looking at https://linkerd.io/config/head/linkerd/#client-parameters, there is a maxSize parameter described as "The minimum number of connections to maintain to each host". My reading of this was that if the number of concurrent requests exceeds maxSize, linkerd will just open a new connection (one for each request beyond maxSize). But that doesn't seem to be the case? Is maxSize actually also the maximum number of concurrent requests we can process?
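For context, this is roughly where the parameter sits in a linkerd 1.x config; the field names below follow the client-parameters page linked above, but the surrounding values (protocol, sizes) are made up for illustration:

```yaml
routers:
- protocol: http
  client:
    hostConnectionPool:
      minSize: 0    # connections kept open to each host even when idle
      maxSize: 10   # upper bound on connections opened to each host
```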
You are correct that maxSize is the maximum number of connections that linkerd will open to a single destination host. If more than that many concurrent requests are attempted to that host, the excess requests are queued until a connection becomes available. This means the maximum number of concurrent requests for a service is (# of hosts) * maxSize; concurrent requests beyond that number will be queued.
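The queueing behavior described above can be sketched as follows. This is a hypothetical model, not Linkerd code: each host's pool is represented by a bounded semaphore of maxSize slots, so a request beyond the cap blocks (i.e. queues) until another request releases a connection, and the service-wide concurrency ceiling works out to len(hosts) * maxSize:

```python
import threading

MAX_SIZE = 2                     # per-host connection cap (like maxSize)
hosts = ["host-a", "host-b"]     # hypothetical destination hosts

# One bounded pool of MAX_SIZE "connections" per host.
pools = {h: threading.BoundedSemaphore(MAX_SIZE) for h in hosts}

def send_request(host):
    pool = pools[host]
    pool.acquire()               # queues here if all MAX_SIZE connections are busy
    try:
        # ... issue the request over the acquired connection ...
        return f"sent to {host}"
    finally:
        pool.release()           # connection returned to the pool

# Service-wide ceiling on in-flight requests before queueing starts.
max_concurrency = len(hosts) * MAX_SIZE
print(max_concurrency)           # 4
```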
Based on the behavior of the connection pool size (metric: rt:client:pool_size), we observe an increase in this value whenever there is increased latency in the response time from the downstream service. Can you please explain this behavior a little more? In addition, what is the metric rt:client:pool_cached, and what is its relation to pool_size? The cached size seems to usually be about 4 times the pool size.