Here’s a use-case for which I’m having no success with Linkerd.
I have a read-only cache service sitting beside a read-write server, and I want to reach the cache only if the server fails or takes too long to respond. In other words, I want requests to be routed to the primary service replica set and to fall back to a secondary replica set.
I tried to encode this rule with dtab precedence:
/svc => /#/namer/cache
/svc => /#/namer/server
That’s equivalent to alternates:
/svc => /#/namer/server | /#/namer/cache
I test it by taking down the server. Sure enough, it gets removed from the service registry, and the interpreter binds /svc to the result of /#/namer/cache.
Next, I replace the server with one that responds with failures. The name resolution succeeds with /#/namer/server, because it is still in the service registry. Requests get routed to the server address, and with enough of them, it eventually gets marked unavailable by FailFastFactory, or dead by FailureAccrualFactory.
That does not prevent binding the service name to the server, and the cache never gets hit. When the request is retryable, retries still get routed to the failed server internally, and the retry budget is depleted as fast as the backoff allows.
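To make the two test results concrete, here is a toy Python model of how I understand alternates to be evaluated. It is not Linkerd or Finagle code: bind_alt, the registry dictionaries and the addresses are all made up for illustration, and the only behavior it models is that a branch is skipped when its name fails to resolve.

```python
from typing import Dict, List, Optional, Set

# Hypothetical registry contents: name -> set of registered addresses.
Registry = Dict[str, Set[str]]


def bind_alt(alternates: List[str], registry: Registry) -> Optional[str]:
    """Return the first alternate that resolves to at least one address.

    Models `a | b`: the next branch is tried only when the current one
    does not resolve. Endpoint health is never consulted.
    """
    for name in alternates:
        if registry.get(name):
            return name
    return None


alternates = ["/#/namer/server", "/#/namer/cache"]

# Case 1: the server is deregistered, so the cache gets bound.
down = {"/#/namer/cache": {"10.0.0.2:8080"}}
print(bind_alt(alternates, down))

# Case 2: the server is registered but answers with errors.
# It still resolves, so it still wins and the cache is never tried.
failing = {
    "/#/namer/server": {"10.0.0.1:8080"},
    "/#/namer/cache": {"10.0.0.2:8080"},
}
print(bind_alt(alternates, failing))
```

The point of the sketch is that fallback is decided from the registry alone, so nothing that FailFastFactory or FailureAccrualFactory learns about the endpoint can influence it.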
Is there a way to support this use-case with Linkerd?
One solution to this use-case is to register the server and the cache with the same name in the service registry, with different weights. The primary service gets assigned a very high weight relative to the secondary.
Since I use Consul, I added support for setting weights on addresses based on what tags the service has.
Unfortunately, the address of the primary service still gets picked, even when the endpoint is marked dead.
Given a retryable response, if I lower the primary's weight enough that the retry budget statistically allows a retry to reach the secondary, I see many retries followed by a success. This is cool, but it uses up the whole retry budget and wastes resources, because the primary server sends many error responses for each request hitting Linkerd.
Going through the code, I believe the problem lies in the TrafficDistributor picking one weighted partition without consideration for the health of the partition. The bad address does not get removed from the Event[Activity.State[Set[Address]]], and the request gets to a load balancer that doesn’t even have the secondary in its address set, because it handles the high-weight partition.
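Here is a rough Python simulation of what I think is happening. It is a toy model, not the Finagle implementation: the Endpoint class, the 99-to-1 weights and the addresses are invented, and I am assuming that a partition is chosen with probability proportional to its weight times its size, and that each partition's balancer can only pick from its own addresses.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Endpoint:
    addr: str
    weight: float
    healthy: bool  # stands in for the endpoint's status


def partition_by_weight(endpoints: List[Endpoint]) -> Dict[float, List[Endpoint]]:
    """Group endpoints into one partition per distinct weight."""
    classes: Dict[float, List[Endpoint]] = defaultdict(list)
    for ep in endpoints:
        classes[ep.weight].append(ep)
    return dict(classes)


def pick_partition(classes: Dict[float, List[Endpoint]], rng: random.Random) -> List[Endpoint]:
    """Pick a partition by weight only (assumed: weight * size). Health is ignored."""
    weights = list(classes)
    scores = [w * len(classes[w]) for w in weights]
    chosen = rng.choices(weights, weights=scores, k=1)[0]
    return classes[chosen]


def route(endpoints: List[Endpoint], rng: random.Random) -> Endpoint:
    partition = pick_partition(partition_by_weight(endpoints), rng)
    healthy = [ep for ep in partition if ep.healthy]
    # The per-partition balancer prefers healthy endpoints, but it cannot
    # see other partitions, so with none healthy it still picks a dead one.
    return rng.choice(healthy or partition)


endpoints = [
    Endpoint("server:8080", weight=99.0, healthy=False),  # dead primary
    Endpoint("cache:8080", weight=1.0, healthy=True),     # healthy secondary
]
rng = random.Random(0)
hits = [route(endpoints, rng).addr for _ in range(10_000)]
share = hits.count("server:8080") / len(hits)
print(f"share of requests sent to the dead primary: {share:.2%}")
```

With these made-up weights, nearly all requests still land on the dead primary, which matches the long run of retries I see before a request finally reaches the secondary.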
@elecnix this is fascinating and goes deeper into the inner workings of the load balancer than I’ve been before. Do you know if this partitioning behavior is used by all load balancing algorithms? Do you get different behavior if you use the heap or aperture load balancers, for example?
Yes, the LoadBalancerFactory module always defers to a TrafficDistributor to interpret the weighted addresses. Normally, all addresses have the same weight, so you get a single load balancer. And it doesn’t matter which balancer you use; the factory delegates to the LoadBalancerFactory Stack Param.
I see two options:
turn a successful resolution into a negative one when the balancer is not in Open status, or
implement a new LoadBalancerFactory that gets all endpoints with equal weights, but favors addresses that have a certain key in the Address metadata and whose endpoint status is Open.
The first option would allow dtab precedence to work as expected above. The second one is not likely to get accepted into the project, because it would require collaboration of a Namer to set the metadata, and the exact behavior of the balancer may not be generic enough.
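For the second option, the selection rule I have in mind looks roughly like the Python sketch below. It only illustrates the rule and is not a Finagle balancer: Node, the "primary" metadata key and the status strings are placeholders I made up, and a real implementation would be a LoadBalancerFactory working on Finagle's own address and status types.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical metadata key that a Namer would set on primary addresses.
PREFERRED_KEY = "primary"


@dataclass
class Node:
    addr: str
    status: str  # "Open", "Busy" or "Closed" (placeholder strings)
    metadata: Dict[str, str] = field(default_factory=dict)


def pick(nodes: List[Node], rng: random.Random) -> Optional[Node]:
    """Prefer Open nodes carrying the key; otherwise use any Open node."""
    open_nodes = [n for n in nodes if n.status == "Open"]
    preferred = [n for n in open_nodes if PREFERRED_KEY in n.metadata]
    candidates = preferred or open_nodes
    return rng.choice(candidates) if candidates else None


rng = random.Random(0)
server = Node("server:8080", "Open", {PREFERRED_KEY: "true"})
cache = Node("cache:8080", "Open")

# Primary is healthy: it is always chosen.
print(pick([server, cache], rng).addr)

# Primary is marked dead: traffic falls back to the cache.
server.status = "Closed"
print(pick([server, cache], rng).addr)
```

All endpoints sit in one address set here, which is why the fallback works, and it also shows the coupling to a Namer-provided key that makes this option hard to generalize.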
I’m open to other ideas, or help in implementing the first option.
Today’s finding: NameTreeFactory doesn’t support Alt nodes, but would have to if namerd evaluated all alternate client names. NameTreeFactory would check the status of the underlying factory for picking which one to use. It could also check the health of unions to avoid Drv’ing to unhealthy names.
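As a sketch of the idea, here is how I picture the selection over a tree with Alt nodes, written as a toy Python model rather than Scala. Leaf, Alt and select are illustrative names only and do not match the Finagle classes; the status strings are placeholders, and unions are left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    name: str
    status: str  # status of the underlying factory: "Open", "Busy" or "Closed"


@dataclass
class Alt:
    branches: List["Tree"]  # alternates in order of preference


Tree = Union[Leaf, Alt]


def select(tree: Tree) -> Optional[Leaf]:
    """Return the first leaf, in preference order, whose factory is Open.

    Returns None when no branch is Open; what to do in that case
    (for example, use the first branch anyway) is left undecided here.
    """
    if isinstance(tree, Leaf):
        return tree if tree.status == "Open" else None
    for branch in tree.branches:
        chosen = select(branch)
        if chosen is not None:
            return chosen
    return None


tree = Alt([Leaf("/#/namer/server", "Closed"), Leaf("/#/namer/cache", "Open")])
print(select(tree).name)
```

This requires every alternate to be bound up front so that its status is known, which is where the extra cost would come from.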
After being away in training for most of this week, I was finally able to test a proof of concept:
The change is entirely in Finagle, and will bring some overhead because of the preemptive binding of all alternate paths. We may want to add a feature toggle in the form of a Stack Param. If you support this approach, I’ll create a Finagle pull request to get a second opinion.