Deploying Kubelet on CoreOS in a consistent and maintainable manner
If you are running the CoreOS beta channel you should already have kubelet installed, but if you are running the stable channel like me and wish to play with the latest and greatest kubernetes and deploy a cluster of non-trivial size, read on.
How to fail at running Kubelet in a container
The right way to deploy Kubelet on a large cluster of machines is to run kube-proxy and kubelet on all of your workers as containers, but as of this writing I could not get Kubelet running inside a container. The kubernetes source tree has an example of running Kubelet inside a container in the local-up-cluster.sh file, however I was not able to get kubelet running on CoreOS using the example given. There is definitely work being done to get kubelet to run in containers, but it does not appear to be complete at this time. See Here
The first hurdle I attempted to overcome was getting the Kubelet container to run in the same pid namespace as the host. When kubelet runs it attempts to move processes like the docker daemon into separate namespaces. If it's running inside a container it will complain that it can't find these processes, and rightly so, because the container is running within its own pid namespace. So we need a way to run a container inside the host pid namespace.
Unfortunately Docker 1.7.1 does not support running containers in the pid namespace of the host, in what might be termed a super privileged container. But spawnd does allow running containers in the host pid namespace! So, inspired by toolbox, I wrote a script that would download the docker image, export the container, untar the container somewhere on CoreOS and then run the Kubelet within the super privileged container.
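The attempt looked roughly like this. It's a simplified sketch rather than my exact script: the image name and install path are illustrative, and systemd-nspawn with its --share-system option (which keeps the container in the host pid namespace) stands in for the spawnd invocation.
#!/bin/bash
# Sketch: pull an image, export its filesystem onto the host,
# then run kubelet from that tree inside the host pid namespace
IMAGE=gcr.io/google_containers/hyperkube:v1.1.1
TARGET=/var/lib/kubelet-rootfs
docker pull "$IMAGE"
CONTAINER=$(docker create "$IMAGE")
mkdir -p "$TARGET"
docker export "$CONTAINER" | tar -x -C "$TARGET"
docker rm "$CONTAINER"
# --share-system keeps the spawned container in the host's pid namespace
sudo systemd-nspawn --directory="$TARGET" --share-system /hyperkube kubelet --containerized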
This didn't work for several reasons. First, spawnd mounts /proc/sys as read only (which I overcame with a simple remount script that runs before Kubelet). Second, there were some kubelet mount issues which I never figured out (even when using --containerized). Third, and here is the big one: now that kubelet was running in the host namespace it could see all the processes running on the host, but it was still unable to move processes into different namespaces. I'm not sure if this is a kernel security limitation or something in the way spawnd set up the container. In the end I decided to move on and just use docker as a software delivery platform to install the kubelet binaries, config files and necessary TLS certificates.
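As an aside, the /proc/sys workaround mentioned above was nothing fancier than remounting it read-write before kubelet started, something along these lines:
# spawnd mounts /proc/sys read only, so remount it before kubelet starts
mount -o remount,rw /proc/sys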
How to install Software on CoreOS via Docker containers
What I'm going to show you now is how you can use containers and systemd to deploy a kubernetes worker on CoreOS in a reproducible and maintainable manner.
At first you might think all you need is cloud-config, until you realize it can do everything but install binaries. Well, technically it can, but you have to base64-encode the binary and place it in the cloud-config.yaml file, which is not something I want to do with a 55 megabyte binary. My second thought was to run the cloud-config as a script and have the script download the binary from some http site I would have to set up, but I really don't want to manage an out-of-band http server with Kubelet and SSL certs rolled up in a tar ball. So I decided to create a docker image that would perform the installation for me! Now all the software I want to run on the cluster and the software to install kubelet on CoreOS live in my local docker repo!
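For reference, here is roughly what the cloud-config route I passed on would look like. coreos-cloudinit's write_files does accept base64-encoded content, but the payload for kubelet would be tens of megabytes of text (truncated here, and the install path is just an example):
#cloud-config
write_files:
  - path: /opt/bin/kubelet
    permissions: "0755"
    encoding: base64
    content: |
      f0VMRgIBAQAAAAAA...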
For my kubernetes setup I created two docker images, one for the kubernetes master and one for the workers. To make installation simple I have systemd run the install image at startup, which ensures my workers always have the latest and greatest installed.
First let's start with the Dockerfile where we create our image
FROM gcr.io/google_containers/hyperkube:v1.1.1
# Copy everything in our worker root to the image
COPY root /worker-root
# Copy the worker installation script
COPY worker-install.sh /
RUN chmod +x /worker-install.sh
All of the TLS certs, config files and binaries are located in the relative ./root tree structure and are copied to /worker-root inside the Docker image. We use the hyperkube image because it contains everything we will need to perform the installation, including nsenter.
The ./root directory structure looks like this
$ find root
root
root/etc
root/etc/flannel
root/etc/flannel/options.env
root/etc/kubernetes
root/etc/kubernetes/ssl
root/etc/kubernetes/ssl/apiserver-key.pem
root/etc/kubernetes/ssl/apiserver.csr
root/etc/kubernetes/ssl/apiserver.pem
root/etc/kubernetes/ssl/ca.pem
root/etc/kubernetes/ssl/worker-key.pem
root/etc/kubernetes/ssl/worker.csr
root/etc/kubernetes/ssl/worker.pem
root/etc/kubernetes/worker-kubeconfig.yaml
--- SNIP ---
Now let's look at worker-install.sh
#!/bin/bash
set -e
WORKER_ROOT=/worker-root
COREOS_ROOT=/rootfs
# nsenter lets us run commands inside the host's mount namespace from this container
NSENTER="./nsenter --mount=${COREOS_ROOT}/proc/1/ns/mnt -- "
SYSTEM_CTRL="$NSENTER /usr/bin/systemctl"
# Poor man's template renderer: expand any bash variables found in the given file
function render_template () {
    eval "echo \"$(cat $1)\""
}
# Return the IP address assigned to the given network interface
function get_ip () {
    echo $($NSENTER ifconfig $1 | awk '/inet /{print $2}')
}
# These variables are used when render_template is called
PUBLIC_IP=$(get_ip eth0)
SERVICE_IP=$(get_ip eth1)
# Install /etc
cp -r $WORKER_ROOT/* $COREOS_ROOT
# Expand the variables in some of our files
render_template $WORKER_ROOT/etc/flannel/options.env > $COREOS_ROOT/etc/flannel/options.env
render_template $WORKER_ROOT/etc/systemd/system/kubelet.service > $COREOS_ROOT/etc/systemd/system/kubelet.service
render_template $WORKER_ROOT/etc/systemd/system/kube-proxy.service > $COREOS_ROOT/etc/systemd/system/kube-proxy.service
# Tell systemd it has new unit files
$SYSTEM_CTRL daemon-reload
# Enable the services to start at boot
$SYSTEM_CTRL enable worker-install
$SYSTEM_CTRL enable kubelet
$SYSTEM_CTRL enable kube-proxy
# Start the services
$SYSTEM_CTRL start kubelet
$SYSTEM_CTRL start kube-proxy
Starting from the top we define a helper variable called $NSENTER that allows us to execute commands in the host mount namespace from inside the container. See https://github.com/jpetazzo/nsenter
Next we create a function called render_template() which performs a little bash magic and gives us a poor man's template renderer. It takes a single file as input and expands any bash variables it finds within the target file. I mostly use this for expanding $PUBLIC_IP inside kubelet.service and kube-proxy.service.
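To make that concrete, a fragment of my kubelet.service template might look something like this before rendering. The flags shown are illustrative rather than my full unit file; the important part is the bare bash variable sitting in the file:
[Service]
ExecStart=/opt/bin/kubelet \
  --hostname-override=${PUBLIC_IP} \
  --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
render_template echoes this file back with ${PUBLIC_IP} replaced by whatever get_ip returned, and the install script writes the result to /etc/systemd/system/kubelet.service on the host.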
Next we figure out what public IP we have been assigned by our cloud provider or physical machine. Finally we recursively copy all the binaries, TLS certs, config files, and systemd unit files from /worker-root into the host root file system we have mounted on /rootfs. Once these files are on disk we tell systemd to reload and start the appropriate services.
Now let's create our image and upload it to our docker repo
$ export DOCKER_REPO=<your-repo-here>
$ docker build -t ${DOCKER_REPO}/worker:latest .
$ docker push ${DOCKER_REPO}/worker:latest
All that is left is to have systemd pull and run the docker container on server startup by creating /etc/systemd/system/worker-install.service
[Unit]
Description=This downloads the docker image and runs the installation \
script from within a docker container
[Service]
Type=oneshot
ExecStart=/usr/bin/docker pull <your-docker-repo>/worker:latest
ExecStart=/usr/bin/docker run --privileged --net=host --ipc=host --uts=host -v /:/rootfs -v /dev:/dev -v /run:/run <your-docker-repo>/worker:latest /worker-install.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
This oneshot systemd service unit performs a docker pull to ensure it has the latest version of the image, then runs the install script from within the docker image. I give the container access to all the namespaces docker can provide just in case one day my script may need them.
To avoid any chicken-and-egg problems, both the kube-proxy and kubelet unit files depend on worker-install.service having run before they are allowed to start. You can set this up by adding the following to any dependent systemd units.
[Unit]
Requires=worker-install.service
After=worker-install.service
Now you can place worker-install.service into a cloud-config and have it create the service when your provider installs CoreOS on the server. Once the server is booted, the rest of the kubernetes worker software should install automatically.
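A minimal cloud-config for a worker might look something like this (a sketch; everything provider-specific is omitted, and the unit body is the same one shown above):
#cloud-config
coreos:
  units:
    - name: worker-install.service
      command: start
      enable: true
      content: |
        [Unit]
        Description=Downloads the docker image and runs the installation script
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/usr/bin/docker pull <your-docker-repo>/worker:latest
        ExecStart=/usr/bin/docker run --privileged --net=host --ipc=host --uts=host -v /:/rootfs -v /dev:/dev -v /run:/run <your-docker-repo>/worker:latest /worker-install.sh
        [Install]
        WantedBy=multi-user.target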
Final Thoughts
At first glance this seems like a bit of a hack, but the more I ponder containers' role in an infrastructure ecosystem, the clearer it becomes that containers are not only a great software delivery method but also a great fit for system configuration.
In a nutshell, anything you want to do to a server, be it firmware updates, hardware configuration, or software installation, should be performed or installed by a container.
You could conceivably run salt or ansible from a container without ever having to install python on the operating system, making updates to those systems a snap: just update the salt image, restart a systemd service on the server, and bingo, you are running the latest version of salt/ansible/kubernetes.
Also, I think super privileged containers are a great idea. At Rackspace Cloud Block Storage we have tons of servers that we cannot restart on a whim but that do occasionally need security upgrades. Because CoreOS requires a reboot to update, it is not optimal for those kinds of servers. However, if CoreOS ran sshd from a container that could be upgraded independently of the operating system, no reboot would be required! The more software running in containers the better! In the future I would love to see CoreOS ship their sshd server in this manner.
On second thought, don't even ship CoreOS with sshd; let the users pick what servers to run.