HA Kubernetes RKE2 with Kube-VIP and Rancher
Batur Orkun

First, I want to explain a few things that can be confusing. If you are new to Rancher, it can be difficult to understand the difference between, and the purpose of, each of these concepts. If you already know the differences between K3S, RKE, RKE2, and Rancher, you can skip the next paragraph.

Rancher is a piece of software that creates and manages Kubernetes clusters, while itself running on a Kubernetes cluster. In my opinion, that is where the confusion starts: if you are not comfortable with K8S concepts, you fall into a chicken-and-egg situation. Managing Kubernetes is Rancher's main duty. You can manage multiple Kubernetes clusters from one Rancher installation, and if you don't have any Kubernetes cluster yet, you can easily create one from the Rancher UI.
If you only need a single-node K8S, you can use this method: first run Rancher as a Docker container, then follow the installation wizards for K3S or RKE. RKE stands for Rancher Kubernetes Engine and is Rancher's command-line utility for creating, managing, and upgrading Kubernetes clusters. In other words, RKE is the name of a Kubernetes distribution, like OpenShift, MicroK8s, Mirantis, Tanzu, or EKS (AWS). But you will usually just hear "Rancher", because Rancher is the name of both the frontend product and the company. Rancher (the company) launched K3S, RKE, and RKE2 alongside the Rancher product. If you need lightweight Kubernetes, especially on IoT devices, use K3S. If you need a standard Kubernetes running on Docker, use RKE. If you need a more secure and powerful Kubernetes on containerd, use RKE2. My suggestion: if you need a highly available (HA) Rancher in production, use RKE2.

Now I will describe the installation of a basic 3-node HA Rancher Kubernetes cluster with RKE2. As I said, there are many installation methods available; I tried nearly all of them and settled on this one.

I used Ubuntu 20.04 as the OS. Your nodes can be bare metal or VMs; I installed Ubuntu 20.04 on Proxmox virtualisation.

Do not forget to set static IPs on your nodes. The range "192.168.10.71" to "192.168.10.73" covers my nodes' IPs, and you need an extra floating IP for Kube-VIP; I allocated "192.168.10.74".
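On Ubuntu 20.04 the usual way to pin a static IP is netplan. Below is a minimal sketch for the first node; the file name, interface name (ens160), gateway, and DNS server are assumptions, so adjust them for your own network.

# /etc/netplan/01-static-ip.yaml (hypothetical file name)
network:
  version: 2
  ethernets:
    ens160:
      dhcp4: false
      addresses:
        - 192.168.10.71/24
      gateway4: 192.168.10.1
      nameservers:
        addresses: [192.168.10.1]

Apply it with "netplan apply".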

Now set the domain names on your DNS. I set one domain per node (rke-node-X.rke.domain.com) and an extra domain for the Rancher UI (rancher.rke.domain.com). The "rke.domain.com" domain is the base domain for my Rancher cluster, so I need a wildcard SSL certificate for it (*.rke.domain.com). If you don't have an SSL certificate, you can get one from ZeroSSL or Let's Encrypt.

Suggestion:
I like the "acme.sh" project for getting certificates from ZeroSSL easily.
https://github.com/acmesh-official/acme.sh
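For reference, here is a hedged sketch of issuing the wildcard certificate with acme.sh over a DNS-01 challenge. The Cloudflare plugin (dns_cf) and the CF_Token variable are just one example; use the plugin and credentials for your own DNS provider. The CERTDIR path matches the one used for the Rancher TLS secret later in this article.

export CF_Token="..."                      # assumption: a Cloudflare API token
acme.sh --issue --dns dns_cf -d 'rke.domain.com' -d '*.rke.domain.com'

# copy the issued files to where the TLS secret will be created from
export CERTDIR=/root/rke/certificates
mkdir -p ${CERTDIR}
acme.sh --install-cert -d 'rke.domain.com' \
  --fullchain-file ${CERTDIR}/fullchain.pem \
  --key-file ${CERTDIR}/key.pem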


192.168.10.71  rke-node-1.rke.domain.com
192.168.10.72  rke-node-2.rke.domain.com
192.168.10.73  rke-node-3.rke.domain.com
192.168.10.74  rancher.rke.domain.com
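Before going further, it is worth checking that all four records resolve to the right IPs, for example with a small loop like this (it assumes dig is installed, e.g. from the dnsutils package):

for h in rke-node-1 rke-node-2 rke-node-3 rancher; do
  echo -n "$h.rke.domain.com -> "
  dig +short $h.rke.domain.com
done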

Well! Your domains, IPs, and certificates are ready. Now connect to Node1 over SSH and run the commands below.


mkdir -p /etc/rancher/rke2
vi /etc/rancher/rke2/config.yaml


This is the config file for Node1. Add the lines below to it.


tls-san:
- rke-node-1
- rke-node-1.rke.domain.com
- rancher.rke.domain.com
- 192.168.10.71

Now, set some environment variables for Kube-VIP.


export VIP=192.168.10.74
export TAG=v0.4.2
export INTERFACE=ens160
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
export PATH=/var/lib/rancher/rke2/bin:$PATH
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
alias k=kubectl
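Keep in mind that these exports only live in the current shell. If you want the kubectl-related ones to survive new SSH sessions, an optional sketch is to append them to root's ~/.bashrc:

cat >> ~/.bashrc <<'EOF'
export PATH=/var/lib/rancher/rke2/bin:$PATH
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
alias k=kubectl
EOF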

We are ready for installation now.


curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server
systemctl start rke2-server

If you like, follow the logs by running:
journalctl -u rke2-server -f


Wait patiently for the installation to finish; it may take some time. Then your first node will be ready. You can check it like this:

The kubeconfig file, named "rke2.yaml", can be found in the "/etc/rancher/rke2" directory.


export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
k get nodes

Output:
NAME   STATUS   ROLES                       AGE   VERSION
rke2   Ready    control-plane,etcd,master   12d   v1.22.7+rke2r2

If your output resembles the lines above, everything is going well: you have a single-node K8S. Now you can install Kube-VIP.


curl -s https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/rke2/server/manifests/kube-vip-rbac.yaml
crictl pull docker.io/plndr/kube-vip:$TAG
alias kube-vip="ctr --namespace k8s.io run --rm --net-host docker.io/plndr/kube-vip:$TAG vip /kube-vip"

kube-vip manifest daemonset \
--arp \
--interface $INTERFACE \
--address $VIP \
--controlplane \
--leaderElection \
--taint \
--services \
--inCluster | tee /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
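RKE2 automatically applies any manifest placed in "/var/lib/rancher/rke2/server/manifests", which is why the command above writes the generated DaemonSet there. If you want to confirm it was picked up (the exact resource name can differ between kube-vip versions), you can run:

k get ds -n kube-system | grep kube-vip
k get pods -n kube-system -o wide | grep kube-vip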


Wait approximately 15 seconds, then check the status.


k logs $(k get po -n kube-system | grep kube-vip | awk '{print $1}') -n kube-system --tail 1
Output:
time="2022-04-11T20:33:48Z" level=info msg="Broadcasting ARP update for 192.168.10.74 (00:50:56:9b:3a:cb) via ens160"

If your output resembles the line above, everything is probably going well: Kube-VIP is installed on your K8S, and you should now see the floating IP on your node.


ip a list $INTERFACE
Output:
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:9b:3a:cb brd ff:ff:ff:ff:ff:ff
inet 192.168.10.71/24 brd 192.168.10.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.10.74/32 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe9b:3acb/64 scope link
valid_lft forever preferred_lft forever

Our floating IP “192.168.10.74” is there. Our node has 2 IPs.
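If you want to double-check the VIP from outside the node, here is a quick sketch to run from your workstation. Depending on the cluster's anonymous-auth settings, the curl call returns either version JSON or a 401/403; either response proves the VIP is up and serving the Kubernetes API.

ping -c 3 192.168.10.74
curl -ks https://192.168.10.74:6443/version    # 401/403 without credentials is fine here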
I could continue with the installation of the second node, but I choose to install Rancher first.


kubectl create namespace cattle-system

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.4/cert-manager.yaml

helm install rancher rancher-stable/rancher \
--namespace cattle-system \
--version 2.6.3 \
--set hostname=rancher.rke.domain.com \
--set replicas=1

If you want to install Rancher with your own HTTPS certificate, you must create a TLS secret and pass extra parameters to the Helm command. Forget the commands above and run the ones below instead.


kubectl create namespace cattle-system
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update
export CERTDIR=/root/rke/certificates
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=${CERTDIR}/fullchain.pem --key=${CERTDIR}/key.pem

helm install  rancher rancher-stable/rancher \
--namespace cattle-system  \
--set hostname=rancher.rke.domain.com \
--set replicas=1 \
--set ingress.tls.source=secret

To check the Rancher installation status, run the command below.


root@rke-node-1:~/certificates# kubectl -n cattle-system rollout status deploy/rancher
Output:
Waiting for deployment "rancher" rollout to finish: 0 of 1 updated replicas are available...


At first you will see output like the above. In that case, please wait patiently. After a few more progress messages, you should see the sentence below at the end of the output.

deployment “rancher” successfully rolled out



To be sure everything is fine, check the Rancher pods:


kubectl get  pods -n cattle-system
Output:
NAME                       READY   STATUS    RESTARTS   AGE
rancher-78f6794ccb-wh54w   1/1     Running   0          7m42s

If your output resembles the lines above, you can smile. :) Then open the URL in your browser. On the Welcome Page you will also see the command for retrieving the bootstrap password. For a Helm installation, run:


kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}{{"\n"}}'

Use that password to set the admin password, and you are in the Rancher UI. At this point you have a single-node RKE2 cluster running Rancher, so you can continue with the installation of Node2.

You need the RKE2 node token to join the other nodes. To get it, run:


cat /var/lib/rancher/rke2/server/token

Output:
K10e53bdb27060ebc74cd2c25184fd8b14a94934898a7a91a48a613fb33ec310032::server:1c18fe2068bc4561260b2764858d7402

Now connect to the second node over SSH and run:


mkdir -p /etc/rancher/rke2
vi /etc/rancher/rke2/config.yaml

This is the config file for Node2. Add the lines below to it.


token: K10e53bdb27060ebc74cd2c25184fd8b14a94934898a7a91a48a613fb33ec310032::server:1c18fe2068bc4561260b2764858d7402
server: https://rancher.rke.domain.com:9345
tls-san:
- rke-node-2
- rke-node-2.rke.domain.com
- rancher.rke.domain.com
- 192.168.10.72

We are ready to install now.


curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server
systemctl start rke2-server

Wait patiently for the installation to finish; it may take some time. Then your second node will be ready. You can check it like this:


kubectl get nodes

You can also see the second node in the Rancher UI. Now repeat the same steps on the third node as you did on the second.

Now, I hope you have 3 nodes, but Rancher is still running only on the first node, so you should increase the number of Rancher replicas. To do that, run:


kubectl scale --replicas 3 deployment/rancher -n cattle-system

About 3 minutes later, you can see your new pods with the command below.


kubectl get  pods -n cattle-system
Output:
NAME                       READY   STATUS    RESTARTS   AGE
rancher-78f6794ccb-jm5tm   1/1     Running   0          4m10s
rancher-78f6794ccb-pqzbd   1/1     Running   0          4m47s
rancher-78f6794ccb-wh54w   1/1     Running   0          64m

Kubernetes will dispatch each Rancher pod to a different node, so you now have a 3-node Rancher cluster and you can test its high availability. Shut down the first node! Within seconds, the VIP will move to one of the other nodes and you can still use the Rancher UI and your cluster. But you cannot stop another node: the cluster would collapse, because etcd needs a minimum of 2 of the 3 servers to keep quorum.
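If you want to watch the failover by hand, here is a rough sketch to run on node 2 or node 3 while node 1 is down; it assumes the interface name ens160 and the addresses used in this article.

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes                                     # rke-node-1 should turn NotReady
ip a list ens160 | grep 192.168.10.74                 # did the VIP land on this node?
curl -kIs https://rancher.rke.domain.com | head -n 1  # Rancher UI should still answer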

Adding an Agent (Worker) Node

If you just want to add worker nodes to your cluster, you should run the "rke2-agent" service instead of "rke2-server".

You must create the config file again.


mkdir -p /etc/rancher/rke2
vi /etc/rancher/rke2/config.yaml

This time, your config file can look like the one below.


token: K10e53bdb27060ebc74cd2c25184fd8b14a94934898a7a91a48a613fb33ec310032::server:1c18fe2068bc4561260b2764858d7402
server: https://rancher.rke.domain.com:9345

The "token" and "server" items are enough for a worker config file. Install and start "rke2-agent":


curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
systemctl enable rke2-agent
systemctl start rke2-agent

Check your nodes again (from one of the server nodes, since the kubeconfig lives only there). Run:


export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes

If you like, follow the agent logs by running:


journalctl -u rke2-agent -f
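One last hedged tip: freshly joined agents usually show up with ROLES "<none>" in "kubectl get nodes". If you want them to display a worker role, you can add the conventional label yourself (the node name below is a placeholder):

kubectl label node <agent-node-name> node-role.kubernetes.io/worker=worker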