K8s - Pods aren't created with hostPort config in linkerd-grpc.yml


#1

This might be an issue because I am using OpenShift, or rather Minishift.

I am following the linkerd k8s daemonset example. When I create the services in OpenShift, the l5d daemonset doesn't create any pods unless I remove the hostPort entry under spec.template.spec.containers.ports in linkerd-grpc.yml:

...
ports:
- name: outgoing
  containerPort: 4140
  hostPort: 4140 #this guy
...

I wanted to bring this up to see if it’s a known thing, or maybe just an issue with my configuration. I don’t know what the hostPort config does, but the service seems to work without it.

Seems I spoke too soon. The admin panel is loading fine, but the logs of the l5d pod show an error:

-XX:+AggressiveOpts -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:InitialHeapSize=33554432 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=174485504 -XX:MaxTenuringThreshold=6 -XX:OldPLABSize=16 -XX:+PrintCommandLineFlags -XX:+ScavengeBeforeFullGC -XX:-TieredCompilation -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseStringDeduplication 
Jul 05, 2017 5:06:00 PM com.twitter.finagle.http.HttpMuxer$ $anonfun$new$1
INFO: HttpMuxer[/admin/metrics.json] = com.twitter.finagle.stats.MetricsExporter(<function1>)
Jul 05, 2017 5:06:00 PM com.twitter.finagle.http.HttpMuxer$ $anonfun$new$1
INFO: HttpMuxer[/admin/per_host_metrics.json] = com.twitter.finagle.stats.HostMetricsExporter(<function1>)
I 0705 17:06:01.042 UTC THREAD1: linkerd 1.0.2 (rev=a6fbb1cb7f2779159b432324d364afee4e6462af) built at 20170512-210300
I 0705 17:06:01.298 UTC THREAD1: Finagle version 6.44.0 (rev=ef94604c6db76959610eeb8fb2bb06810022061f) built at 20170421-130151
I 0705 17:06:03.629 UTC THREAD1: Tracer: com.twitter.finagle.zipkin.thrift.ScribeZipkinTracer
I 0705 17:06:03.670 UTC THREAD1: connecting to usageData proxy at Set(Inet(stats.buoyant.io/104.28.22.233:443,Map()))
I 0705 17:06:03.769 UTC THREAD1: tracer: io.buoyant.telemetry.recentRequests.RecentRequetsTracer@3e60355b
I 0705 17:06:03.904 UTC THREAD1: Resolver[inet] = com.twitter.finagle.InetResolver(com.twitter.finagle.InetResolver@58434b19)
I 0705 17:06:03.905 UTC THREAD1: Resolver[fixedinet] = com.twitter.finagle.FixedInetResolver(com.twitter.finagle.FixedInetResolver@7d3fb0ef)
I 0705 17:06:03.906 UTC THREAD1: Resolver[fail] = com.twitter.finagle.FailResolver$(com.twitter.finagle.FailResolver$@4fa9ab6)
I 0705 17:06:03.905 UTC THREAD1: Resolver[neg] = com.twitter.finagle.NegResolver$(com.twitter.finagle.NegResolver$@7dbe2ebf)
I 0705 17:06:03.907 UTC THREAD1: Resolver[zk] = com.twitter.finagle.zookeeper.ZkResolver(com.twitter.finagle.zookeeper.ZkResolver@6e4c0d8c)
I 0705 17:06:03.912 UTC THREAD1: Resolver[zk2] = com.twitter.finagle.serverset2.Zk2Resolver(com.twitter.finagle.serverset2.Zk2Resolver@64db4967)
I 0705 17:06:03.906 UTC THREAD1: Resolver[nil] = com.twitter.finagle.NilResolver$(com.twitter.finagle.NilResolver$@885e7ff)
I 0705 17:06:03.906 UTC THREAD1: Resolver[flag] = com.twitter.server.FlagResolver(com.twitter.server.FlagResolver@2d3ef181)
I 0705 17:06:03.990 UTC THREAD1: k8s initializing default
E 0705 17:06:04.401 UTC THREAD21: k8s failed to list endpoints
io.buoyant.k8s.Api$UnexpectedResponse

This error repeats indefinitely for the life of the pod.

The hello service also has a similar repeating error:

starting gRPC server on :7777
2017/07/05 15:19:44 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 192.168.99.100:4140: getsockopt: connection refused"; Reconnecting to {192.168.99.100:4140 <nil>}

But this is probably because something with linkerd isn’t working, right?


#2

Hey @tfmertz – I recommend checking out the Flavors of Kubernetes guide. The Linkerd example configs from the k8s-daemonset directory are set up to work with GKE. Kubernetes environments vary widely, and the same config won't work across all of them, unfortunately.

In your case, it looks like the linkerd daemonset pods can't be deployed, since they can't reserve hostPort: 4140. It's possible your environment doesn't support hostPort, or port 4140 may already be taken. I'd recommend setting hostNetwork: true in the linkerd daemonset spec to see if that fixes it. There's more info about Kubernetes host networking here.
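If it helps, that setting goes in the daemonset's pod spec, as a sibling of containers. A sketch of the relevant linkerd-grpc.yml fragment (not a full manifest):

...
spec:
  template:
    spec:
      hostNetwork: true # pod uses the node's network namespace directly,
                        # so linkerd's ports bind on the node IP
...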


#3

Thanks for the pointers! I was able to figure out that OpenShift by default starts all pods with restricted privileges that don't include permission to open hostPorts.

I created a service account, linked it to the privileged security context constraint (SCC), and referenced it in the daemonset's configuration under spec.template.spec.serviceAccount in linkerd-grpc.yml.
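For anyone following along, the commands were roughly as follows (the service account name l5d-sa is my own choice; use whatever you named yours):

$ oc create serviceaccount l5d-sa
$ oc adm policy add-scc-to-user privileged -z l5d-sa

and then in linkerd-grpc.yml:

...
spec:
  template:
    spec:
      serviceAccount: l5d-sa
...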

So now, with the original linkerd-grpc.yml config plus the service account addition to give OpenShift the needed permissions, the l5d daemonset is coming up with a pod successfully!

However, I'm still getting the k8s failed to list endpoints error, and I'm wondering where I can find more information about it. Could this be my linkerd configuration at fault, or maybe OpenShift permissions?


#4

We've done pretty minimal testing with OpenShift specifically, so we don't have formal documentation for how to get this working, unfortunately. From some experiments I did a while back, I found that I needed to do the following to get it to work:

  • Ensure the user has daemonset permissions
  • Ensure the service account has list/get/watch permissions for the namespaces, namespace/endpoints, and namespace/services APIs
  • Ensure the SCC has hostPort permissions
  • Don’t drop SETUID or SETGID
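For the last two items, the relevant SecurityContextConstraints fields look something like this (a sketch from memory, not a complete SCC; the name is illustrative):

apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: linkerd
allowHostPorts: true          # allows the daemonset's hostPort: 4140
requiredDropCapabilities: []  # must not include SETUID or SETGID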

Hope this is helpful.


#5

That's good to know, thanks. Are these permissions just for the linkerd (l5d) service, or do I need to add them to the hello and world services as well?

Also, another issue might be the fact that I'm using Minishift, which brings up an OpenShift cluster in a VirtualBox VM for local development.

My developer user has daemonset permissions, and my service account is added to the privileged SCC, which is set to drop no capabilities, so it should have full access.
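For reference, the SCC settings can be checked directly (assuming a user privileged enough to read SCCs):

$ oc get scc privileged -o yaml # check allowHostPorts and requiredDropCapabilities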

I'm not quite sure what you mean by the service account having permissions for the namespaces, namespace/endpoints, and namespace/services APIs.

Thanks for the help!


#6

You should just need to add the permissions to linkerd.

It's been a while since I looked at this stuff, but I think the API permissions are captured in cluster policy: https://docs.openshift.org/latest/admin_guide/manage_authorization_policy.html#viewing-cluster-policy
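If I remember right, something like this shows it (per the page above):

$ oc describe clusterPolicy default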


#7

It initialized without the error! I had that error for so long I almost couldn’t believe it. THANK YOU!!!

I had the issue come back when I tried to make a request, but I figured out that my linkerd config was set to look at the default namespace, whereas my project is part of the linker namespace. Updating the dtab:

-        /srv        => /#/io.l5d.k8s/default/grpc;
+        /srv        => /#/io.l5d.k8s/linker/grpc;

and the transformer

         transformers:
         - kind: io.l5d.k8s.daemonset
-          namespace: default
+          namespace: linker

seemed to fix that issue.
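A quick way to sanity-check that the namer is pointed at the right namespace (oc accepts the usual kubectl verbs) is something like:

$ oc get endpoints -n linker # these are the endpoints the io.l5d.k8s namer lists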

For documentation purposes, the ClusterRole file that fixed the permissions is:

apiVersion: v1
kind: ClusterRole
metadata:
  name: namespaces
rules:
  - resources:
    - namespaces
    - endpoints
    - services
    verbs:
    - get
    - list
    - watch

Then create the role and add it to the service account:

$ oc create -f namespace-permissions.yml
$ oc adm policy add-role-to-user namespaces -z yourserviceaccount

Also, for documentation purposes, the ClusterRole file that gives daemonset access:

apiVersion: v1
kind: ClusterRole
metadata:
  name: daemonset-admin
rules:
  - resources:
    - daemonsets
    apiGroups:
    - extensions
    verbs:
    - create
    - get
    - list
    - watch
    - delete
    - update
    - patch

Same process as for the namespaces role, but instead of adding it to a service account with the -z flag, you add it to your user.
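Concretely, that was something like (the filename is my own; the username placeholder works like the service account one above):

$ oc create -f daemonset-permissions.yml
$ oc adm policy add-role-to-user daemonset-admin yourusername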

Then I created a route in OpenShift that points at port 4140 and used the helloworld client in /docker/helloworld of the linkerd-examples project:

go run main.go <openshift_dns>:4140

and got Hello (172.17.0.4) world (172.17.0.5)!!

Thank you so much for your help, guys! I've been struggling with this for the better part of two days, and it's a great way to finish the week!


#8

@tfmertz Great, glad to hear you got it working!


#9

A post was split to a new topic: Help with vagrant and calico