Namerd got high CPU usage and lots of IO for writing logs

Hi team and @siggy

Recently i got an weird issue that as title, Namerd continuously wrote logs, 10M about a min, after restarting Namerd, it’s gone.

Namerd logs(almost all log entries like below) :

E 1205 09:10:30.732 UTC THREAD23 TraceId:288653d33dadb5cb: consul ns %2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd dtab observation error UnexpectedRespons
e(301: <a href="/etc/passwd?raw=true&amp;dc=ocp">Moved Permanently</a>.

)
E 1205 09:10:30.732 UTC THREAD23 TraceId:288653d33dadb5cb: consul ns %2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd dtab observation error UnexpectedRespons
e(301: <a href="/etc/passwd?raw=true&amp;dc=ocp">Moved Permanently</a>.

)

Namerd config:

admin:
  ip: 0.0.0.0
  port: 9991

storage:
  kind: io.l5d.consul
  host: 127.0.0.1
  port: 8500
  pathPrefix: /namerd/dtabs
  datacenter: ocp
namers:
- kind: io.l5d.consul
  prefix: /io.l5d.consul
  host: 127.0.0.1
  port: 8500
  includeTag: false
  setHost: false
  useHealthCheck: true
- kind: io.l5d.consul
  prefix: /io.l5d.mbs.consul
  host: 127.0.0.1
  port: 8500
  includeTag: true
  setHost: false
  useHealthCheck: true
interfaces:
- kind: io.l5d.thriftNameInterpreter
  ip: 0.0.0.0
  port: 4100
- kind: io.l5d.httpController
  ip: 0.0.0.0
  port: 4180

When got this issue, no linkerd error info, so no related linkerd log.

Please help check, thanks in advance.

Thanks,
Clare

That error is coming from Namerd’s Consul Dtab store:

The namespace it is trying to lookup, when URL decoded, is ../../../../../../../../../../../../etc/passwd. I’m not totally sure what’s going on here, but could this be some kind of malicious activity? I am trying to craft a request that would generate this sort of failure.

Is it possible to share the following:

  1. linkerd config
  2. linkerd + namerd versions
  3. log output from linkerd
  4. dtabs for all namespaces (confirm in namerd’s dtab ui)

Thanks.

  1. linkerd config:
admin:
  port: 9990
  ip: 0.0.0.0

routers:
- protocol: http
  identifier:
    kind: customer.Identifier
  label: customer_outgoing
  interpreter:
    kind: io.l5d.namerd
    dst: /$/inet/namerd.service.consul/4100
    namespace: customer
  httpAccessLog: /alloc/logs/access_customer.log
  servers:
  - port: xxx
    ip: 0.0.0.0
- protocol: http
  label: outgoing
  interpreter:
    kind: io.l5d.namerd
    dst: /$/inet/namerd.service.consul/4100
    namespace: default
  httpAccessLog: /alloc/logs/access_outgoing.log
  servers:
  - port: xxx
    ip: 0.0.0.0

telemetry:
- kind: io.l5d.recentRequests
  sampleRate: 1.0
- kind: io.l5d.prometheus

usage:
  enabled: false
  1. linkerd + namerd version: 1.3.0
  2. No linkerd log when got the error
  3. only two namespaces: default and customer.

i got one log entry like this, hope can help.

I 1128 07:25:16.589 UTC THREAD10: Reaping /svc/active/%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd

also give you dtab rule:

/active      => /#/io.l5d.mbs.consul/ocp/active;
/baking      => /#/io.l5d.mbs.consul/ocp/baking;
/svc/active  => /active;
/svc/baking  => /baking;

Looks like there are two issues here.

  1. We have confirmed that providing a malformed namespace to namerd’s dtab API causes the large amount of log messages, for example:

    curl -v "192.168.99.100:4180/api/1/dtabs/%2e%2e/foo"
    

    We will track this fix here:
    https://github.com/linkerd/linkerd/issues/1733

  2. This does mean that in your datacenter, something made a request that looked like this:

    curl -v "192.168.99.100:4180/api/1/dtabs/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd"
    

When #1 is addressed, a bad request like #2 should only generate a single log message, but recommend you investigate the source of that request.

thanks @siggy, our environments were scanned by security team with different request URLs on 8080 and 8081 port, /../../../../../etc/passwd is one of them.

But about #2, i think there isn’t an external request directly going to Namerd, it should be forwarded by other way, but i don’t know who did that.

why i think it should be forwarded by other way? Because we used a customized Identifier that will transform all requests from URL like http://$linkerd_ip:8080/segment1/segment2/…/ to concrete name like /svc/active/segment1/segment2/…/ or /svc/backing/segment1/segment2/…/, segment1 as service name, this is done by datab rule that i listed above.
from linkerd log, there is a log entry:

I 1202 17:20:09.233 UTC THREAD10: Reaping /svc/active/%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd

it means customized Identifier had transformed request to concrete name, you know, normally, if concrete name can’t be blinded to real IP/port pair, it will print log in linkerd’s log file like:

E 1202 17:27:30.364 UTC THREAD35 TraceId:b2f75a265fa9c548: service failure: com.twitter.finagle.NoBrokersAvailableException: No hosts are available for /svc/active/images, Dtab.base=[], Dtab.local=[]. Remote Info: Not Available

But i don’t find the failure log about /svc/active/%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc%2fpasswd, so why Named thinks it’s a Namerd namespace, not a service name, instead goes to query namespace with that, it works two different way.

By the way, we have another namespace default on same Namerd, another linkerd router port 8081 maps to it, default namespace dtab rule as:

/consul => /#/io.l5d.consul/ocp;
/host   => /consul;
/svc    => /$/io.buoyant.http.subdomainOfPfx/svc.consul/host;

i think the follow log entry is generated when request went through linkerd router 8081 port.

W 1202 16:30:30.864 UTC THREAD35: Exception propagated to the default monitor (upstream address: /xx.xx.xx.xx:60867, downstream address: n/a, label: 0.0.0.0/8081).
io.buoyant.router.RoutingFactory$UnknownDst: Unknown destination: Request("GET /../../../../../etc/passwd", from /xx.xx.xx.xx:60867) / Host header is absent

Hope this can help troubleshooting.

Thanks,
Clare

@siggy, could you help give more explanation about this?

Hi Clare,

It sounds like linkerd port 8081 may be the entrypoint for the /etc/passwd logging in namerd, though it’s difficult for us to confirm without seeing what customer.Identifier does on port 8080.

I recommend you experiment with queries in linkerd’s dtab admin ui. Start with something like /host/%2e%2e/etc/passwd/foo, be sure to try it on both customer_outgoing and outgoing routers, selectable in the upper-right corner.

@siggy, what’s format input request that linkerd will generate this log? how did linkerd generate it?

 W 1202 16:30:30.864 UTC THREAD35: Exception propagated to the default monitor (upstream address: /xx.xx.xx.xx:60867, downstream address: n/a, label: 0.0.0.0/8081).
io.buoyant.router.RoutingFactory$UnknownDst: Unknown destination: Request("GET /../../../../../etc/passwd", from /xx.xx.xx.xx:60867) / Host header is absent

follow your recommendation, i can’t get that.

Thanks,
Clare

Hi @yangzhares. I was able to reproduce this with a ruby script. It looks like curl and wget remove ../ before sending the request.

Modified http.yaml example from the linkerd repo

# HTTP protocol
namers:
- kind: io.l5d.fs
  rootDir: linkerd/examples/io.l5d.fs

routers:
- protocol: http
  dtab: |
    /host => /#/io.l5d.fs;
    /svc    => /$/io.buoyant.http.subdomainOfPfx/host;
  servers:
  - port: 4140
    ip: 0.0.0.0

Ruby script

uri = URI("http://localhost:4140/../../../etc/passwd")
req = Net::HTTP::Get.new(uri)
req['Host'] = ""

res = Net::HTTP.start(uri.hostname, uri.port) {|http|
  http.request(req)
}

linkerd log output

W 1214 18:38:50.403 UTC THREAD36: Exception propagated to the default monitor (upstream address: /127.0.0.1:59921, downstream address: n/a, label: 0.0.0.0/4140).
io.buoyant.router.RoutingFactory$UnknownDst: Unknown destination: Request("GET /../../../etc/passwd", from /127.0.0.1:59921) / Host header is absent

Discussion moved to GitHub: https://github.com/linkerd/linkerd/issues/1766