No hosts are available for /svc/qa5/tags, Dtab.base=[], Dtab.local=[]. Remote Info: Not Available
The error log from linkerd is:
0614 21:44:05.072 4243c69c8667b009.4243c69c8667b009<:4243c69c8667b009] Message(namer.success)
0614 21:44:05.073 4243c69c8667b009.4b47ee18e32febf2<:4243c69c8667b009] ClientAddr(/10.97.26.244:59612)
0614 21:44:05.074 4243c69c8667b009.40891eea44fc43e0<:4b47ee18e32febf2] Message(namer.success)
0614 21:44:05.076 4243c69c8667b009.40891eea44fc43e0<:4b47ee18e32febf2] BinaryAnnotation(io.buoyant.router.Failure,ClientAcquisition)
E 0614 21:44:05.076 UTC THREAD29 TraceId:4243c69c8667b009: service failure
com.twitter.finagle.NoBrokersAvailableException: No hosts are available for /svc/qa5/tags, Dtab.base=[], Dtab.local=[]. Remote Info: Not Available
What port are you sending to when you get the “No hosts are available” error? Based on the metrics, it looks like you may be sending to 4141, the incoming router. These requests will fail unless there happens to be an instance of the target service running on the same node as the linkerd you’re sending to. You should instead send your request to port 4140, the outgoing router. This will forward the request to a node where the target service is running.
Yes - there is a service instance running on the same node as linkerd.
We send requests only to 4140. The traffic into 4141 that you see should just be the requests being forwarded by linkerd to itself (meant for the local service instance).
Note that if I set the dtab to direct 100% of the traffic to the local service instance (or the remote service instance), things work just fine. The problem occurs ONLY when the dtab rule is a weighted union.
Looking at the dtab resolution screenshot from the outgoing router, it looks like linkerd should be sending traffic to a weighted union of 70% to 10.97.26.244:4141 and 30% to 10.97.25.51:4141 (I assume; the last line of the delegation is cut off in the screenshot).
The next step in debugging would be to go to the dtab playground of the linkerd on each of those nodes and look at the delegation for /svc/qa5/tags on the “incoming” router. (The router can be selected from a dropdown in the top right).
Hopefully that can shed some light as to what is going wrong.
Oh, looks like the linkerd incoming dtabs don’t resolve well. I suppose the rules need to be specified differently. A different prefix on the incoming router, that doesn’t have the weighted union?
(edit: just remembered that you’re not using k8s. I edited the dtab in the config below to strip the port transformer prefix instead of the k8s daemonset transformer prefix)
What I believe is happening is that at the outgoing router, linkerd is evaluating the dtab, encountering a weighted union, and picking one of the branches to route to. When the request reaches the incoming router on the destination node, it’s evaluating the dtab again and can potentially pick a different branch of the union. If it does, there’s no guarantee that an instance of that service will be running on that node.
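To make the double evaluation concrete, here is a rough toy simulation in Python (not linkerd code). It assumes each node hosts only the branch that the outgoing router forwards to it, and that both routers pick a branch independently with the 70/30 weights from the screenshot. The function names are made up for illustration.

```python
import random

# Weights of the union as seen in the outgoing router's delegation.
# Assumption: each address hosts only its own branch of the union.
WEIGHTS = {"10.97.26.244": 0.7, "10.97.25.51": 0.3}


def pick_branch(rng):
    """Pick one branch of the weighted union at random."""
    nodes = list(WEIGHTS)
    return rng.choices(nodes, weights=[WEIGHTS[n] for n in nodes])[0]


def route_once(rng):
    """Return True if the incoming router finds a local instance."""
    destination = pick_branch(rng)      # outgoing router evaluates the dtab
    second_choice = pick_branch(rng)    # incoming router evaluates it again
    # A different second pick points at a branch with no instance on this node.
    return second_choice == destination


def failure_rate(trials=100_000, seed=0):
    """Estimate the fraction of requests that hit the mismatch."""
    rng = random.Random(seed)
    failures = sum(not route_once(rng) for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    print(f"estimated failure rate: {failure_rate():.3f}")
```

Under these assumptions the two picks disagree with probability 2 × 0.7 × 0.3 = 0.42, and never disagree when one branch has 100% of the weight, which matches the observation that only the weighted union fails.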
To get around this problem, you can configure the incoming router to use the same client name that the outgoing router used (instead of evaluating the dtab all over again). You can do this by using the io.l5d.header identifier to read the l5d-dst-client header and changing the dtab to strip off the transformer prefix. You’ll also want to move path consumption to the outgoing router, since the path identifier is no longer used on the incoming router. All together, your config would look something like this:
admin:
  port: 9990
telemetry:
- kind: io.l5d.prometheus
routers:
- protocol: http
  label: outgoing
  #streamingEnabled: false
  #dtab: |
  #  /svc=>/#/io.l5d.marathon;
  identifier:
    kind: io.l5d.path
    consume: true # <-- notice consumption has been moved here
    segments: 2
  interpreter:
    kind: io.l5d.namerd
    experimental: true
    dst: /$/inet/namerd/4100
    namespace: sessionm
    transformers:
    - kind: io.l5d.port
      port: 4141
  servers:
  - port: 4140
    ip: 0.0.0.0
- protocol: http
  label: incoming
  #streamingEnabled: false
  dtab: | # <-- this dtab strips off the transformer prefix
    /svc/%/io.l5d.port/4141 => /;
  identifier:
    kind: io.l5d.header
    header: l5d-dst-client # <-- use the client name picked by the outgoing router
  interpreter:
    kind: default # <-- We're just using the client name from the outgoing router so no need to talk to namerd
    transformers:
    - kind: io.l5d.localhost
  servers:
  - port: 4141
    ip: 0.0.0.0
client:
The concrete client id. In your case it looks something like: /%/io.l5d.port/4141/#/io.l5d.marathon/qa5/tags.blue
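As a rough illustration of what the incoming router's dtab rule does with that header value, here is a small Python sketch of the prefix rewrite. It is a toy model, not linkerd's implementation: `apply_rule` is a made-up helper, and it assumes the incoming router prepends /svc to the header value, as the /svc/%/io.l5d.port/4141 prefix in the dtab implies.

```python
def apply_rule(name, prefix, replacement):
    """Rewrite `name` if it starts with `prefix` on a path-segment boundary.

    Returns the rewritten path, or None if the prefix does not match.
    """
    name_segs = [s for s in name.split("/") if s]
    prefix_segs = [s for s in prefix.split("/") if s]
    if name_segs[:len(prefix_segs)] != prefix_segs:
        return None
    repl_segs = [s for s in replacement.split("/") if s]
    return "/" + "/".join(repl_segs + name_segs[len(prefix_segs):])


# Value of the l5d-dst-client header set by the outgoing router.
client_id = "/%/io.l5d.port/4141/#/io.l5d.marathon/qa5/tags.blue"

# Assumption: the header value is placed under the /svc prefix.
logical_name = "/svc" + client_id

# The rule from the incoming router: /svc/%/io.l5d.port/4141 => /
bound = apply_rule(logical_name, "/svc/%/io.l5d.port/4141", "/")
print(bound)  # /#/io.l5d.marathon/qa5/tags.blue
```

What is left after stripping is the name the outgoing router already resolved, so the incoming router never re-evaluates the weighted union.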
It depends on if the target app expects the path to be stripped or not. The destination linkerd doesn’t need the path to be stripped but the destination app might.
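To show the difference the consume setting makes for the destination app, here is a small Python sketch of a path identifier with segments: 2. It is a simplified model of the behavior rather than linkerd's code; `identify` and the example request path /qa5/tags/v1/items are made up for illustration.

```python
def identify(path, segments=2, consume=True, dst_prefix="/svc"):
    """Return (logical name, path forwarded downstream) for a request path.

    The first `segments` path segments form the logical name. With
    consume=True those segments are stripped from the forwarded path.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < segments:
        raise ValueError(f"path {path!r} has fewer than {segments} segments")
    name = dst_prefix + "/" + "/".join(parts[:segments])
    forwarded = "/" + "/".join(parts[segments:]) if consume else path
    return name, forwarded


# consume: true -> the app sees the path without the routing segments
print(identify("/qa5/tags/v1/items"))
# ('/svc/qa5/tags', '/v1/items')

# consume: false -> the app sees the original path
print(identify("/qa5/tags/v1/items", consume=False))
# ('/svc/qa5/tags', '/qa5/tags/v1/items')
```

If the app expects the full original path, leave consumption off; if it expects the routing segments to be gone, turn it on at the outgoing router.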
That’s correct.
I can’t really say how much blue-green deploy is used. It’s definitely one of the more advanced of linkerd’s capabilities so it makes sense that it’s not as widely used.