We have a linkerd setup which, as per the docs, is a per-host deployment. We are configuring the hostConnectionPool object for each client so that the maxSize parameter is set to a value that is safe for the capacity of the corresponding downstream service. In our case, the capacity is the number of worker processes we run. However, the calculation is done by hand, using rough estimates based on the fixed number of worker processes.
Setting the maxSize value feels manual, and it will need to be recalculated whenever we add a new host that talks to the downstream service. Is there a better way to do this?
Thanks for any insights.
Hi @amitsaha. Personally, I’ve never run into a situation where manually limiting the number of connections has been necessary. Can you tell me a bit more about the behavior that arises if you don’t set a maxSize? Do your downstream services queue connections if they don’t have a worker available to accept them?
We started off by not having any maxSize for the connection pool. What we
learned the hard way was this:
Our setup has linkerd running per host on a number of instances being used
to proxy requests to multiple downstream services. When the request latency
to a downstream service would increase, each linkerd instance’s pool size
would grow by a certain amount. However each of these small amounts would
sum to a large enough number such that they would be close to or more than
the total number of workers in those downstream services across hosts. Hence,
those services would be tied up to these linkerd connections even though
they may not be serving active requests. As a result, those services are
not able to process any new connections.
Does that make sense?
Yes, the downstream services queue connections when there are no workers.
Each worker is tied to a single persistent connection and can only serve
requests for that connection till the remote end closes the connection.
Please let me know if I can furnish more details.
Ah, I see. So essentially you have a hard cap on the number of connections each downstream can accept. I don’t have any recommendation other than setting maxSize to a reasonable number and ensuring that your upstream and downstream clusters are sized appropriately. If I’m understanding correctly, the relationship that you want to maintain is:
upstreams * maxSize <= downstreams * workers
Sorry that I don’t have anything more substantial to recommend. If you have a hard limit on the number of connections the downstream can accept, there’s not much that Linkerd can do except for limiting its own connection use.
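For anyone who would rather script the relationship above than redo it by hand each time a host is added, here is a minimal Python sketch. It is not part of Linkerd; the function name safe_max_size and the example numbers are hypothetical, and it assumes every upstream host runs one linkerd instance with the same maxSize and every downstream host runs the same number of workers.

```python
def safe_max_size(upstreams: int, downstreams: int, workers: int) -> int:
    """Return the largest per-host maxSize that satisfies
    upstreams * maxSize <= downstreams * workers.

    Hypothetical helper: assumes identical linkerd hosts and
    identical downstream hosts.
    """
    if upstreams <= 0:
        raise ValueError("upstreams must be a positive integer")
    if downstreams < 0 or workers < 0:
        raise ValueError("downstreams and workers must be non-negative")
    # Integer division rounds down, so the inequality always holds.
    return (downstreams * workers) // upstreams


if __name__ == "__main__":
    # Made-up example: 12 linkerd hosts, 4 downstream hosts, 8 workers each.
    print(safe_max_size(upstreams=12, downstreams=4, workers=8))
```

A result of 0 means the inequality cannot be met with even one connection per upstream host, so the downstream cluster would need more workers or the upstream cluster fewer hosts. Re-running this whenever the host count changes, for example from a deployment script, at least removes the hand calculation.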
Thanks. No problems.
Would you be able to shed further light on the behavior of opening up more
connections when we see a latency increase? Is it just purely due to the
buildup or is the default load balancing heuristic playing a role here?