Migrating from k3s to RKE2

A comprehensive guide for migrating from a 3-node k3s cluster to a 4-node RKE2 Kubernetes cluster with zero downtime, using Rocky Linux 10, Canal CNI, and Hetzner infrastructure.

Welcome to my guide on migrating from k3s to RKE2 while keeping downtime to a minimum. Follow along as I walk through the complete process of transitioning a 3-node k3s setup to a 4-node RKE2 cluster with high availability, using enterprise-grade tools and practices.

Please read this

I originally planned to offer this guide as a paid online course, but as a strong believer in free open source resources, I made it available for free instead.

Please, if my guides helped you, I would be very grateful if you could support my work by becoming a GitHub Sponsor and by sharing the guides in your network. 🙏

Eventually I might offer additional guides as paid online courses, but for now, I want to focus on providing free guides.

Thank you ❤️

I started our original cluster using k3s due to its ease of setup and lightweight nature for our CI/CD workloads, which were not considered mission-critical at the time. As our migration from bare-metal GitHub Actions runners to the Kubernetes-based Actions Runner Controller (ARC) continued, we noticed a significant increase in our resource demand. I decided to add two additional Hetzner dedicated servers as worker nodes to our cluster and looked into getting them production-ready, using inter-node communication via vSwitch (see my existing blog post New K3s agent node for our cluster if you want to learn more).

This enabled us to move additional development and proof-of-concept workloads from a comparably expensive Elastic Kubernetes Service (EKS) in Amazon Web Services (AWS) to our self-managed infrastructure, saving us a fortune. However, as we continued to grow, we started to feel the limitations of k3s: it was not designed for larger, complex clusters with high availability requirements, but instead focused on simplicity and ease of use for edge and IoT environments.

This made me look into alternatives, and RKE2 stood out as a robust and enterprise-grade Kubernetes distribution also maintained by SUSE/Rancher, the same company behind k3s. While k3s offers a lot of built-in features and convenience tools, I wanted to be closer to enterprise-level Kubernetes behavior and have greater control over the components as our environment grows. RKE2 provides exactly that.

On top of that, the etcd component in k3s was showing stability issues, especially since I had not yet migrated to high availability (HA), spreading the control plane across multiple nodes. With the migration to RKE2, I can set up a proper HA control plane with multiple etcd nodes, significantly improving reliability.

The final push to migrate came when we decided to add another bare-metal dedicated server. This allowed me to bootstrap RKE2 on the new node without touching the existing k3s nodes, enabling a zero-downtime migration.
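Bootstrapping RKE2 on a fresh node mostly comes down to installing the server package and pointing it at a configuration file before the first start. As a rough sketch of what that looks like for the setup described in this guide (Canal CNI, private vSwitch networking), the first server node's `/etc/rancher/rke2/config.yaml` might resemble the following; the hostname and addresses are placeholder values, not the ones used in this series.

```yaml
# /etc/rancher/rke2/config.yaml on the first RKE2 server node
# (illustrative values only; adjust tls-san and node-ip for your environment)
cni: canal                    # Canal is RKE2's default CNI, stated explicitly here
tls-san:
  - kubernetes.example.com    # hypothetical API endpoint to include in the server certificate
node-ip: 10.0.1.4             # example private vSwitch address of this node
write-kubeconfig-mode: "0600" # restrict the generated kubeconfig to root
```

With the file in place, RKE2 is installed and started with `curl -sfL https://get.rke2.io | sh -` followed by `systemctl enable --now rke2-server`. Additional server nodes can later join the HA control plane by also setting `server: https://<first-node>:9345` (the RKE2 supervisor port) and the shared `token` in their own config files, which is exactly what makes the zero-downtime, node-by-node migration possible.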

To follow this guide, you must have:

As this series was written as an online course, the guide is structured as a curriculum with sections and lessons. Each lesson is a separate post in this series, and the terms guide, series, and course, as well as lesson, article, and post, are used interchangeably.

Getting Started

To get started with the series, head over to the first section to learn about the migration strategy, prepare the new node with Rocky Linux, and configure networking and firewall rules.

History

As guides evolve over time to incorporate new technologies, best practices, and community feedback, you can track the various revisions and updates that have been made to this guide here.

Help improve this content
I strive to create helpful and accurate content, but there's always room for improvement! Whether you notice a typo, have ideas to make this clearer, or want to share your thoughts, I warmly welcome your feedback. Together, we can make this content even better for everyone.