I am working on setting up a home server but I want it to be reproducible if I need to make large changes, switch out hardware, or restore from a failure. What do you use to handle this?

  • fruitycoder@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    2
    ·
    12 hours ago

    Fleet from Rancher to deploy everything to k8s. Baremetal management with Tinkerbell and Metal3 to management my OS deployments to baremetal in k8s. Harvester is the OS/K8S platform and all of its configs can be delivered in install or as cloudinit k8s objects. Ansible for the switches (as KubeOVN gets better in Harvester default separate hardware might be removed), I’m not brave enough for cross planning that yet. For backups I use velero, and shoot that into the cloud encrypted plus some nodes that I leave offline most of the time except to do backups and updating them. I user hauler manifests and a kube cronjob to grab images, helm charts, rpms, and ISO to local store. I use SOPS to store the secrets I need to boot strap in git. OpenTofu for application configs that are painful in helm. Ansible for everything else.

    For total rebuilds I take all of that config and load it into a cloudinit script that I stick on a Rocky or sles iso that, assuming the network is up enough to configure, rebuilds from scratch, then I have a manual step to restore lost data.

    That covers everything infra but physical layout in a git repo. Just got a pikvm v4 on order along with a pikvm switch, so hopefully I can get more of the junk on Metal3 for proper power control too and less IPXE shenanigans.

    Next steps for me are CI/CD pipelines for deploying a mock version of the lab into Harvester as VMs, running integrations tests, and if it passes merge the staged branch into prod. I do that manually a little already but would really like to automate it. One I do that I start running Renovate to grab the latest stable for stuff for me.

    • fruitycoder@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      1
      ·
      11 hours ago

      Definitely overkill lol. But I like it. Haven’t found a more complete solutions that doesn’t feel like a comp sci dissertation yet.

      The goal is pretty simple. Make as much as possible, helm values, k8s manifests, tofu, ansible, cloud init as possible and in that order of preference because as you go up the stack you get more state management for “free”. Stick that in git and test and deploy from that source as much as possible. Everything else is just about getting to there as fast as possible, and keeping the 3-2-1 rule alive and well for it all (3 backups, 2 different media, 1 off-site).