When you work with bleeding-edge technology, you can expect the unexpected.
As most of us know, software is never without bugs, and given the sheer diversity of technology involved, most of us won’t be able to fix these bugs ourselves. Instead we can develop and deploy workarounds while we wait for specialists to release a fix.
In this post I’d like to share how we resolved a connectivity issue between pods in our Kubernetes cluster caused by a tunnelling issue in Calico.
Public Cloud
At Fuga Cloud we run a public cloud based on the free and open-source cloud computing platform OpenStack. We originally built this public cloud so our users could set up and manage their own infrastructure, but there has also been internal company demand for a similar service. We eat our own dog food.
Currently we’re working on a continuous deployment pipeline to run OpenStack in containers. For fast iterations we deploy to virtual hardware on our own public cloud. The containers are orchestrated by Kubernetes and inter-container connectivity is handled by Calico. Because we run on virtual hardware we use Calico’s IP in IP tunnelling.
IP in IP is an IP tunnelling protocol that encapsulates one IP packet in another IP packet. (source: wikipedia.org)
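With Calico’s IP in IP mode enabled, the tunnel typically shows up on each node as a tunl0 interface; a quick way to inspect it on a node is:

# Show the IP in IP tunnel device Calico creates on the node.
$ ip -d link show tunl0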
Problem
The problem we faced using Calico IP in IP tunnels in a virtual environment was that Kubernetes pods sometimes couldn’t connect to one another during the initialisation phase. Somehow these IP in IP tunnels between pods weren’t properly initialised, causing the pods to get stuck in a crash loop. During troubleshooting, over many deployment runs, we discovered that sending ICMP packets between pods across the Kubernetes cluster resolved the IP in IP network issues we were having.
The success rate of our continuous deployments went from 60% to 100%.
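As a quick manual check (not part of the final workaround) you can trigger the same effect by hand; the pod name and target IP below are placeholders for a pod and a pod IP in your own cluster:

# Hypothetical manual check: send a single ICMP packet from one pod to another pod's IP.
$ kubectl exec <some-pod> -- ping -c 1 -w 1 <target-pod-ip>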
Workaround
Our current workaround is to deploy a pod on each Kubernetes node which sends a single ICMP packet to every pod in the cluster. To deploy these pods we used some core features of Kubernetes, narrowing the workaround down to a single configuration file of no more than 30 lines.
Our resulting configuration after some iterations:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: pokepods
  namespace: kube-system
  labels:
    app: pokepods
spec:
  template:
    metadata:
      labels:
        app: pokepods
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sh"]
        args: ["-c", "PATH=$PATH:/host/usr/bin; while true; do kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if (and (.status.podIP) (ne .metadata.namespace \"kube-system\"))}}ping -c 1 -w 1 {{.status.podIP}} || true;{{end}}{{end}}' |sh; sleep 5; done"]
        volumeMounts:
        - mountPath: /host/usr/bin
          name: kubectl-path
          readOnly: true
      volumes:
      - name: kubectl-path
        hostPath:
          path: /usr/bin
To deploy the pods in the Kubernetes cluster:
$ kubectl create -f ./manifest.yml
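To verify that the DaemonSet actually placed a pod on every node, the pods can be listed by the label defined in the manifest:

# One pokepods pod should be running on each node.
$ kubectl -n kube-system get pods -l app=pokepods -o wide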
Breaking it down
Kubernetes configuration can be written in manifest files in YAML or JSON format. The first three fields in the following excerpt, apiVersion, kind and metadata, are required for all Kubernetes configurations. They declare which API version the manifest complies with and state that we want to deploy a DaemonSet resource named pokepods in the kube-system namespace.
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod (source: kubernetes.io)
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: pokepods
  namespace: kube-system
  labels:
    app: pokepods
The image we want to use for our container is busybox. This image contains all the basic Unix tools we need to execute a shell script and ping other pods. The busybox image is readily available from the public registry, which frees us from building and registering our own container image.
spec:
  template:
    metadata:
      labels:
        app: pokepods
    spec:
      containers:
      - name: busybox
        image: busybox
When all you have is a hammer
The busybox container runs a single process, configured in the command field. This process loops through all the running pods in the Kubernetes cluster and sends each of them an ICMP packet to jumpstart the IP in IP tunnel configured through Calico.
command: ["/bin/sh"]
args: ["-c", "PATH=$PATH:/host/usr/bin; while true; do kubectl get pods --all-namespaces -o go-template='...' |sh; sleep 5; done"]
To know which pods to send ICMP packets to, we need to query the Kubernetes API with the Kubernetes client kubectl. We could install this client in our container, but because it is pre-installed on all our Kubernetes nodes we simply mount the host directory containing the binary into our container. An additional benefit is that the Kubernetes client will now always match the Kubernetes API version.
  volumeMounts:
  - mountPath: /host/usr/bin
    name: kubectl-path
    readOnly: true
volumes:
- name: kubectl-path
  hostPath:
    path: /usr/bin
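A quick sanity check, assuming the DaemonSet from above is running, is to execute the mounted binary inside one of its containers; the pod name is a placeholder:

# Hypothetical check that the host's kubectl binary is usable inside the container.
$ kubectl -n kube-system exec <pokepods-pod> -- /host/usr/bin/kubectl get pods --all-namespaces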
By default the Kubernetes client kubectl returns output in a human-readable format, and it also supports machine-readable formats like YAML and JSON. Another option is to use the built-in templating system, with which you can do insane things like building shell scripts. The following excerpt loops through the pods and selects those which have an IP address and are not in the kube-system namespace.
{{range .items}}
{{if (and (.status.podIP) (ne .metadata.namespace "kube-system"))}}
ping -c 1 -w 1 {{.status.podIP}} || true;
{{end}}
{{end}}
The output of the template is then piped to the sh command, which executes the ping commands.
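To give an impression of the generated script: for two pods with the made-up IP addresses 10.100.1.4 and 10.100.2.7, the one-liner template would emit something along these lines, which sh then runs command by command:

ping -c 1 -w 1 10.100.1.4 || true;ping -c 1 -w 1 10.100.2.7 || true;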
Improvements
There are always improvements to be made, but since this is a workaround I’ve left them unimplemented:
- Filter out pods which are already in ready status
- Log failed pings (a rough sketch follows below)
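For the logging improvement, a minimal sketch (assuming the same go-template as in the manifest above) would replace the || true; part of the generated command, so a failed ping leaves a timestamped trace in the container log:

ping -c 1 -w 1 {{.status.podIP}} || echo "$(date) ping failed: {{.status.podIP}}";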
Read more
To read more about the subject please consider the following links:
- IP in IP en.wikipedia.org
- Calico IP-in-IP docs.projectcalico.org
- Kubernetes DaemonSet kubernetes.io
- OpenStack openstack.org
- Busybox busybox.net
- Golang template package golang.org
- Alternative workaround developed by GiantSwarm github.com/giantswarm
- Fuga Cloud fuga.cloud
FAQ
A quick way to hunt down the available fields you can use within templates is to view the accompanying resource in JSON format using the Kubernetes client:
$ kubectl -n kube-system get pod <pod id> -o json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "status": {
    "podIP": "10.100.0.3",
    ...
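Once a field has been found this way, it can be tested in isolation with a small go-template before wiring it into the DaemonSet; the pod id is again a placeholder:

# Print just the pod IP of a single pod using the field discovered above.
$ kubectl -n kube-system get pod <pod id> -o go-template='{{.status.podIP}}'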