502 Bad Gateway when resolving URLs with a public well-known CA with the default TLS client

I have linkerd installed in DC/OS and am trying to get my routers configured correctly. Our microservices may make RPC calls to microservices deployed via Marathon and external / legacy systems where DNS resolution will be used.

Given the following configuration:

```
admin:
  port: 9990
telemetry:
- kind: io.l5d.prometheus
namers:
- kind: io.l5d.marathon
  host: leader.mesos
  port: 8080
  prefix: "/io.l5d.marathon/incoming"
  uriPrefix: ''
- kind: io.l5d.marathon
  host: leader.mesos
  port: 8080
  prefix: "/io.l5d.marathon/outgoing"
  uriPrefix: ''
  transformers:
  - kind: io.l5d.port
    port: 4141
routers:
- protocol: http
  servers:
  - port: 4140
    ip: 0.0.0.0
  client:
    kind: io.l5d.static
    configs:
    - prefix: "/$/io.buoyant.rinet/443/{service}"
      tls:
  dtab: |
    /ph        => /$/io.buoyant.rinet ; # Lookup the name in DNS
    /svc       => /ph/80 ; # Use port 80 if unspecified
    /srv       => /$/io.buoyant.porthostPfx/ph ; # Attempt to extract the port from the hostname
    /srv       => /#/io.l5d.marathon/outgoing ; # Lookup the name in Marathon
    /svc       => /srv ;
  label: outgoing
- protocol: http
  servers:
  - port: 4141
    ip: 0.0.0.0
  dtab: |
    /marathonId =>  /#/io.l5d.marathon/incoming ;
    /svc        =>  /$/io.buoyant.http.domainToPathPfx/marathonId ;
  label: incoming
  interpreter:
    kind: default
    transformers:
    - kind: io.l5d.localhost
```

When I run `curl -x localhost:4140 -X GET http://www.google.com:443 -sI | head -n 1` on a host where linkerd is installed, I receive `HTTP/1.1 502 Bad Gateway`. If I add the `disableValidation: true` setting to the tls client, the same curl command succeeds.
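To be concrete, this is the only change I made under the static client config above (it skips certificate validation entirely, so I consider it a workaround rather than a fix):

```
      tls:
        disableValidation: true  # skip cert validation; the curl through :4140 then succeeds
```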

When @Alex runs the same curl command against his linkerd instance (without setting `disableValidation`), he receives a success: `HTTP/1.1 200 OK`.

Hoping someone can help me understand what I’m doing wrong here. FYI, I’m running v1.1.0 in Docker, deployed via Marathon.

It looks like the `tls:` block is missing the `commonName` property. Is that a typo, or is it actually missing from your config?

I had it in there but it didn’t appear to make a difference. Here’s what it was:

    kind: io.l5d.static
    configs:
    - prefix: "/$/io.buoyant.rinet/443/{service}"
      tls:
        commonName: "{service}"

I just added the `commonName` back in, and although I still get the same 502 response, the error log now shows a different, more helpful error:

```
java.lang.IllegalArgumentException: File does not contain valid certificates: /tmp/certCollection4434947254200694809.tmp
...
Caused by: java.security.cert.CertificateException: found no certificates in input stream
```

Interesting! Is there a full stack-trace there that you can share?

I may have narrowed it down. When I deploy v1.0.2 of the Docker image, the same YAML config works great.

Is there something missing in the v1.1.0 Docker image?

FYI, here’s the stack trace when using the v1.1.0 Docker image:

```
E 0621 20:04:51.016 UTC THREAD21 TraceId:d186a04959f5bc12: service failure
java.lang.IllegalArgumentException: File does not contain valid certificates: /tmp/certCollection1848457118942697108.tmp
	at io.netty.handler.ssl.SslContextBuilder.trustManager(SslContextBuilder.java:164)
	at com.twitter.finagle.netty4.ssl.Netty4SslConfigurations$.configureTrust(Netty4SslConfigurations.scala:40)
	at com.twitter.finagle.netty4.ssl.client.Netty4ClientEngineFactory.apply(Netty4ClientEngineFactory.scala:63)
	at com.twitter.finagle.netty3.Netty3Transporter.$anonfun$addFirstTlsHandlers$1(Netty3Transporter.scala:307)
	at com.twitter.finagle.netty3.Netty3Transporter.$anonfun$addFirstTlsHandlers$1$adapted(Netty3Transporter.scala:306)
	at scala.Option.foreach(Option.scala:257)
	at com.twitter.finagle.netty3.Netty3Transporter.addFirstTlsHandlers(Netty3Transporter.scala:306)
	at com.twitter.finagle.netty3.Netty3Transporter.newPipeline(Netty3Transporter.scala:382)
	at com.twitter.finagle.netty3.Netty3Transporter.newConfiguredChannel(Netty3Transporter.scala:391)
	at com.twitter.finagle.netty3.Netty3Transporter.$anonfun$apply$2(Netty3Transporter.scala:398)
	at com.twitter.finagle.netty3.ChannelConnector.apply(Netty3Transporter.scala:48)
	at com.twitter.finagle.netty3.Netty3Transporter.apply(Netty3Transporter.scala:400)
	at com.twitter.finagle.netty3.Netty3Transporter$$anon$3.apply(Netty3Transporter.scala:149)
	at com.twitter.finagle.Http$Client$$anon$2.$anonfun$apply$1(Http.scala:209)
	at com.twitter.util.Local.letClear(Local.scala:151)
	at com.twitter.finagle.context.MarshalledContext.letClearAll(MarshalledContext.scala:112)
	at com.twitter.finagle.context.Contexts$.$anonfun$letClearAll$1(Contexts.scala:38)
	at com.twitter.util.Local.letClear(Local.scala:151)
	at com.twitter.finagle.context.LocalContext.letClearAll(LocalContext.scala:43)
	at com.twitter.finagle.context.Contexts$.letClearAll(Contexts.scala:37)
	at com.twitter.finagle.Http$Client$$anon$2.apply(Http.scala:209)
	at com.twitter.finagle.Filter$AndThen$$anon$3.apply(Filter.scala:149)
	at com.twitter.finagle.pool.CachingPool.apply(CachingPool.scala:58)
	at com.twitter.finagle.pool.WatermarkPool.apply(WatermarkPool.scala:144)
	at com.twitter.finagle.liveness.FailureAccrualFactory.apply(FailureAccrualFactory.scala:370)
	at com.twitter.finagle.service.ExceptionRemoteInfoFactory.apply(ExceptionRemoteInfoFactory.scala:71)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.loadbalancer.LoadBalancerFactory$StackModule$$anon$2.apply(LoadBalancerFactory.scala:208)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.loadbalancer.LeastLoaded$Node.apply(LeastLoaded.scala:30)
	at com.twitter.finagle.loadbalancer.Balancer.apply(Balancer.scala:250)
	at com.twitter.finagle.loadbalancer.Balancer.apply$(Balancer.scala:240)
	at com.twitter.finagle.loadbalancer.p2c.P2CLeastLoaded.apply(P2CLeastLoaded.scala:29)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.factory.TrafficDistributor$Distributor.apply(TrafficDistributor.scala:99)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.factory.TrafficDistributor.apply(TrafficDistributor.scala:303)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.factory.StatsFactoryWrapper.apply(StatsFactoryWrapper.scala:44)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.factory.RefcountedFactory.apply(RefcountedFactory.scala:24)
	at com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:227)
	at com.twitter.finagle.factory.TimeoutFactory.apply(TimeoutFactory.scala:61)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.service.Retries$$anon$1.applySelf(Retries.scala:225)
	at com.twitter.finagle.service.Retries$$anon$1.apply(Retries.scala:262)
	at com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:99)
	at com.twitter.finagle.service.DelayedFactory.$anonfun$apply$1(DelayedFactory.scala:46)
	at com.twitter.util.Future.$anonfun$flatMap$1(Future.scala:1089)
	at com.twitter.util.Promise$Transformer.liftedTree1$1(Promise.scala:107)
	at com.twitter.util.Promise$Transformer.k(Promise.scala:107)
	at com.twitter.util.Promise$Transformer.apply(Promise.scala:117)
	at com.twitter.util.Promise$Transformer.apply(Promise.scala:98)
	at com.twitter.util.Promise$$anon$1.run(Promise.scala:421)
	at com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:200)
	at com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:158)
	at com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:272)
	at com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:108)
	at com.twitter.util.Promise.runq(Promise.scala:406)
	at com.twitter.util.Promise.updateIfEmpty(Promise.scala:801)
	at com.twitter.util.ExecutorServiceFuturePool$$anon$4.run(FuturePool.scala:141)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.security.cert.CertificateException: found no certificates in input stream
	at io.netty.handler.ssl.PemReader.readCertificates(PemReader.java:98)
	at io.netty.handler.ssl.PemReader.readCertificates(PemReader.java:64)
	at io.netty.handler.ssl.SslContext.toX509Certificates(SslContext.java:1026)
	at io.netty.handler.ssl.SslContextBuilder.trustManager(SslContextBuilder.java:162)
	... 72 more
```

Aha! I had been testing against 1.0.2. I’ve reproduced this on 1.1.0 (and on latest master), and it doesn’t seem to be specific to Docker. It appears to be a bug, introduced in the 1.1.0 release, that is triggered when no CA cert is specified. Would you mind filing a GitHub issue for it?

Thanks for finding this! In the meantime, you should be able to work around the issue by disabling validation, explicitly specifying a CA cert, or using linkerd 1.0.2 or older.
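For example, here’s a sketch of the first two options against the static client config above (the CA bundle path is just a placeholder for wherever your trust certs actually live):

```
      tls:
        commonName: "{service}"
        # Option 1: disable validation entirely (fine for testing, not for production)
        # disableValidation: true
        # Option 2: explicitly specify the CA cert(s) to trust
        trustCerts:
        - /path/to/ca-bundle.pem  # placeholder path; point at your CA bundle
```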

Yes, I’ve reverted to v1.0.2 until this has been patched in a future release.

I’ve submitted the issue: *“File does not contain valid certificates” when trustCerts not specified in TLS client (v1.1.0 only)*. Thanks for verifying!
