What port are you sending to when you get the “No hosts are available” error? Based on the metrics it looks like you may be sending to 4141, the incoming router. These requests will fail unless there happens to be an instance of the target service running on the same node as the linkerd you’re sending to. You should instead send your request to port 4140, the outgoing router, which will forward the request to a node where the target service is running.
Yes - there is a service instance running on the same node as linkerd.
We send requests only to 4140. The traffic into 4141 that you see should just be the requests being forwarded by linkerd to itself (meant for the local service instance).
Note that if I set the dtab to direct 100% of the traffic to the local service instance (or the remote service instance), things work just fine. The problem occurs ONLY when the dtab rule is a weighted union.
Looking at the dtab resolution screenshot from the outgoing router, it looks like linkerd should be sending traffic to a weighted union: 70% to 10.97.26.244:4141 and 30% to 10.97.25.51:4141 (I assume; the last line of the delegation is cut off in the screenshot).
The next step in debugging would be to go to the dtab playground of the linkerd on each of those nodes and look at the delegation for /svc/qa5/tags on the “incoming” router. (The router can be selected from a dropdown in the top right.)
Hopefully that can shed some light as to what is going wrong.
Oh, it looks like the linkerd incoming dtabs don’t resolve well. I suppose the rules need to be specified differently: a different prefix on the incoming router, one that doesn’t have the weighted union?
(screenshot: linkerd 01 incoming)
I think I understand what’s going on with this. But my root problem is still not solved. I’ll make another post about it.
(edit: just remembered that you’re not using k8s. I edited the dtab in the config below to strip the port transformer prefix instead of the k8s daemonset transformer prefix)
What I believe is happening is that at the outgoing router, linkerd is evaluating the dtab, encountering a weighted union, and picking one of the branches to route to. When the request reaches the incoming router on the destination node, it evaluates the dtab again and can potentially pick a different branch of the union. If it does, there’s no guarantee that an instance of that service will be running on that node.
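The failure mode described above can be sketched in a few lines of Python. This is a toy model, not linkerd code: the two weighted branches stand in for the dtab union, and the weights here (70/30) are assumptions for illustration.

```python
import random

# Toy stand-in for the dtab's weighted union (hypothetical weights).
WEIGHTS = {"tags.blue": 0.7, "tags.green": 0.3}

def pick_branch(rng):
    """One router's independent weighted pick over the union's branches."""
    return rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]

rng = random.Random(0)
trials = 10_000
# Outgoing pick vs. incoming re-pick, made independently per request.
mismatches = sum(pick_branch(rng) != pick_branch(rng) for _ in range(trials))

# With 70/30 weights, two independent picks disagree about
# 2 * 0.7 * 0.3 = 42% of the time.
print(f"mismatch rate: {mismatches / trials:.2f}")
```

Whenever the two picks disagree, the incoming router may route to a service instance that isn’t on its node, producing the “No hosts are available” failures.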
To get around this problem you can configure the incoming router to use the same client name that the outgoing router used (instead of evaluating the dtab all over again). You can do this by using the io.l5d.header identifier to read the l5d-dst-client header and changing the dtab to strip off the transformer prefix. You’ll also want to move path consumption to the outgoing router, since the path identifier is no longer used on the incoming router. All together, your config would look something like this:
```yaml
admin:
  port: 9990
telemetry:
- kind: io.l5d.prometheus
routers:
- protocol: http
  label: outgoing
  #streamingEnabled: false
  #dtab: |
  #  /svc=>/#/io.l5d.marathon;
  identifier:
    kind: io.l5d.path
    consume: true # <-- notice consumption has been moved here
    segments: 2
  interpreter:
    kind: io.l5d.namerd
    experimental: true
    dst: /$/inet/namerd/4100
    namespace: sessionm
    transformers:
    - kind: io.l5d.port
      port: 4141
  servers:
  - port: 4140
    ip: 0.0.0.0
- protocol: http
  label: incoming
  #streamingEnabled: false
  dtab: | # <-- this dtab strips off the transformer prefix
    /svc/%/io.l5d.port/4141 => /;
  identifier:
    kind: io.l5d.header
    header: l5d-dst-client # <-- use the client name picked by the outgoing router
  interpreter:
    kind: default # <-- we're just using the client name from the outgoing router, so no need to talk to namerd
    transformers:
    - kind: io.l5d.localhost
  servers:
  - port: 4141
    ip: 0.0.0.0
  client:
```
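As a rough illustration of the name flow in that config (a sketch, not linkerd’s actual code; the exact shape of the transformer prefix and the way the identifier places the header value under the router’s /svc prefix are my assumptions here):

```python
# Hypothetical sketch: how the incoming router recovers the client name
# the outgoing router already chose, under the config above.

DST_PREFIX = "/svc"
TRANSFORMER_PREFIX = "/%/io.l5d.port/4141"  # assumed prefix added by io.l5d.port

def identify_from_header(headers):
    """io.l5d.header identifier: the name is the header value, under dstPrefix."""
    return DST_PREFIX + headers["l5d-dst-client"]

def apply_incoming_dtab(name):
    """The dtab rule `/svc/%/io.l5d.port/4141 => /;` strips the transformer prefix."""
    prefix = DST_PREFIX + TRANSFORMER_PREFIX
    return name[len(prefix):] if name.startswith(prefix) else name

headers = {"l5d-dst-client": TRANSFORMER_PREFIX + "/#/io.l5d.marathon/qa5/tags.blue"}
print(apply_incoming_dtab(identify_from_header(headers)))
# /#/io.l5d.marathon/qa5/tags.blue
```

The net effect is that the incoming router ends up with the exact concrete client name the outgoing router picked, so the weighted union is only ever evaluated once, on the outgoing side.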
Let me know if this helps!
Yes, I figured out what was going on over the weekend too. Thanks.
I’ll try out your solution. I have a few questions on it though.
Where are some of these things documented? E.g. the part about how linkerd puts the client in the header, and what that header looks like. Finagle?
What exactly goes into the client header? What does it look like?
Why does the path need to be consumed on the outgoing router? The incoming router constructs the identifier from a header (not path), right?
One side effect of this solution is that you can’t hit the incoming router directly with a request. Right?
Is the recipe of blue green deploy not used much?
- The concrete client id. In your case it looks something like:
- It depends on if the target app expects the path to be stripped or not. The destination linkerd doesn’t need the path to be stripped but the destination app might.
- That’s correct.
- I can’t really say how much blue-green deploy is used. It’s definitely one of the more advanced of linkerd’s capabilities so it makes sense that it’s not as widely used.
Thanks. That makes sense. A couple of follow-ups.
/#/io.l5d.marathon/qa5/tags.blue gets hit both in the incoming and outgoing router, right?
The way I am reading what you are saying - “consume” actually modifies the path in the URL forwarded to the next hop. Is that right? And are there other primitives that operate on the URL directly?
Can you think of a way to do this so that the ability to directly hit the incoming router can be retained?
- Both the incoming and outgoing router use /#/io.l5d.marathon/qa5/tags.blue as the client name. Only the incoming router actually sends the request directly to the service.
- Yes, that’s right. This is the only place (currently) that linkerd modifies the URL.
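The `consume` behavior described above can be sketched like this (an illustrative model of `io.l5d.path` with `segments: 2`, not linkerd’s implementation; the example URL is made up):

```python
def identify_path(url_path, segments=2, consume=True, dst_prefix="/svc"):
    """Sketch of io.l5d.path: take `segments` path segments as the name."""
    parts = url_path.lstrip("/").split("/")
    name = dst_prefix + "/" + "/".join(parts[:segments])
    # With consume: true, the identified segments are stripped from the URL
    # before the request is forwarded to the next hop.
    forwarded = "/" + "/".join(parts[segments:]) if consume else url_path
    return name, forwarded

name, forwarded = identify_path("/qa5/tags/123")
print(name)       # /svc/qa5/tags
print(forwarded)  # /123
```

With `consume: false`, the forwarded URL would be the original `/qa5/tags/123`.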
- It depends on what you want. You can still hit the incoming router directly, but you’ll need to manually set the l5d-dst-client header. Or you can add a third router that has the same config as your old incoming router and hit that directly. But beware: this has the same issue where, if you hit that router directly and the service it resolves to is not on that node, the request will fail.
> Both the incoming and outgoing router use /#/io.l5d.marathon/qa5/tags.blue as the client name. Only the incoming router actually sends the request directly to the service.
I meant, both the routers resolve the client name through marathon to get a list of physical addresses. The outgoing router load balances among them, while the incoming router keeps only the local address.
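That split can be sketched as follows (a toy model, not linkerd code; the addresses and port are hypothetical stand-ins for what marathon would return):

```python
import random

# Hypothetical address set resolved from the concrete name via marathon.
resolved = ["10.97.26.244:8080", "10.97.25.51:8080"]
LOCAL_IP = "10.97.26.244"  # the node this (incoming) linkerd runs on

def outgoing_pick(addrs, rng=random):
    """Outgoing router: load balance across every resolved instance."""
    return rng.choice(addrs)

def incoming_pick(addrs):
    """Incoming router with io.l5d.localhost: keep only local instances."""
    local = [a for a in addrs if a.split(":")[0] == LOCAL_IP]
    return local[0] if local else None  # None -> "No hosts are available"

print(incoming_pick(resolved))  # 10.97.26.244:8080
```

Note how `incoming_pick` returning `None` corresponds exactly to the original failure: the incoming router resolved a name whose only instances live on other nodes.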
Yes. Makes sense.
One last question: what’s the best way to look at the request (with headers, etc.), delegation steps, resolution, etc., as the request propagates through the system?
A couple ways to get a more granular view of requests:
- Add -com.twitter.finagle.tracing.debugTrace=true -log.level=DEBUG to your linkerd+namerd startup commands (if you haven’t already)
Wanted to run this by you. The following setup is just a slight variation from what you had originally recommended. The one advantage I see is that there is more dynamic control over traffic via namerd configuration.
But can you see if there are any inefficiencies (small or big) in doing it this way?
```yaml
admin:
  port: 9990
telemetry:
- kind: io.l5d.prometheus
routers:
- protocol: http
  label: svc
  dstPrefix: /svc
  #streamingEnabled: false
  #dtab: |
  #  /svc=>/#/io.l5d.marathon;
  identifier:
    kind: io.l5d.path
    consume: false
    segments: 2
  interpreter:
    #kind: io.l5d.namerd
    #dst: /$/inet/namerd/4100
    #namespace: sessionm
    kind: io.l5d.mesh
    dst: /$/inet/namerd/4101
    root: /sessionm
    experimental: true
    transformers:
    - kind: io.l5d.port
      port: 4141
  servers:
  - port: 4140
    ip: 0.0.0.0
- protocol: http
  label: fwd
  dstPrefix: /fwd
  #streamingEnabled: false
  #dtab: |
  #  /svc=>/#/io.l5d.marathon;
  identifier:
    kind: io.l5d.header
    header: l5d-dst-client
  interpreter:
    #kind: io.l5d.namerd
    #dst: /$/inet/namerd/4100
    #namespace: sessionm
    kind: io.l5d.mesh
    dst: /$/inet/namerd/4101
    root: /sessionm
    experimental: true
    transformers:
    - kind: io.l5d.localhost
  servers:
  - port: 4141
    ip: 0.0.0.0
- protocol: http
  label: loc
  dstPrefix: /loc
  #streamingEnabled: false
  #dtab: |
  #  /svc=>/#/io.l5d.marathon;
  identifier:
    kind: io.l5d.path
    consume: false
    segments: 2
  interpreter:
    #kind: io.l5d.namerd
    #dst: /$/inet/namerd/4100
    #namespace: sessionm
    kind: io.l5d.mesh
    dst: /$/inet/namerd/4101
    root: /sessionm
    experimental: true
    transformers:
    - kind: io.l5d.localhost
  servers:
  - port: 4142
    ip: 0.0.0.0
```
```
/res=>/#/io.l5d.marathon;
/svc/qa5/tags=>5.00*/res/qa5/tags.blue & 5.00*/res/qa5/tags.green;
/fwd/%/io.l5d.port/4141=>/;
/loc=>/svc
```
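To make the delegation concrete, here is a toy Python walk-through of how that dtab rewrites /svc/qa5/tags (a sketch only: it does plain string-prefix rewriting rather than real per-segment dtab matching, and it collapses the 50/50 union to its blue branch):

```python
# Simplified dtab: (prefix, replacement) pairs, most-specific rules first.
DTAB = [
    ("/fwd/%/io.l5d.port/4141", "/"),
    ("/loc", "/svc"),
    ("/svc/qa5/tags", "/res/qa5/tags.blue"),  # one branch of the 50/50 union
    ("/res", "/#/io.l5d.marathon"),
]

def delegate(name):
    """Repeatedly rewrite the name until no dtab rule applies."""
    steps = [name]
    changed = True
    while changed:
        changed = False
        for prefix, target in DTAB:
            if steps[-1].startswith(prefix):
                rest = steps[-1][len(prefix):]
                steps.append((target.rstrip("/") + rest) or "/")
                changed = True
                break
    return steps

for step in delegate("/svc/qa5/tags"):
    print(step)
# /svc/qa5/tags
# /res/qa5/tags.blue
# /#/io.l5d.marathon/qa5/tags.blue
```

The final /#/io.l5d.marathon/... name is the concrete client name that both the svc and fwd routers end up resolving through marathon.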
That looks great to me!
Thanks for all your help!