title: Mira attempts to lock the fuck in and figure out how to k8s
date: 2025-04-07

Current Setup

Auxin (HTPC):

  • OS: NixOS
  • Service Runtime: Docker Compose
  • Services:
    • Syncthing
    • Jellyfin
    • Caddy (as reverse proxy only)
    • Kodi
  • Service Storage:
    • Bind Mounts to NFS on NAS

Lipotropin (NAS):

  • OS: Proxmox
  • Service Runtime: K3s
  • Services:
    • adminer
    • caddy
    • copyparty
    • forgejo
    • jackett
    • mariadb
    • ntfy
    • paperless+gotenberg+tika
    • qbittorrent+gluetun
    • radarr
    • redis
    • slskd
  • Service Storage:
    • NFS to spinning rust on same system
  • Baremetal storage:
    • 6 HDDs in BTRFS RAID5 (50TB raw, 38TB usable)

Motilin (working hostname, unused)

Other Client Devices:

  • Access files over NFS/SMB
  • Access services via hostname (local), or URL (external)

Endn't Point

Auxin

  • Exclusive Services:
    • Kodi
  • Distributed Services via K8S
  • Service Storage: ????
  • Baremetal storage:
    • Various HDDs as available

Motilin + Lipotropin

  • Distributed Services via K8S

Client Devices

  • Still need to be able to access files over NFS/SMB
  • Still need to access services via hostname (local), or URL (external)

Still need to figure out:

  • How to handle jellyfin requiring GPU access
    • nodes are not guaranteed to have a GPU, or may have heterogeneous GPUs; need to pin jellyfin to nodes with GPUs
  • How do I handle failover
    • current setup has SSH and HTTP traffic go to auxin, which proxies to lipotropin as needed
    • how would I handle auxin going down
    • where the fuck would ssh go in general, it's not something reverse proxyable
      • Move SSH to router, use proxyjump from there
        • see if Auto proxy and fallback are options
        • can we do crowdsec and fail2ban on openwrt
    • if I fucked up a config, how could I recover without incurring downtime
  • How the fuck do I handle storage
    • how do I do concurrent access, which services even allow that
      • how the ever loving fuck do I make syncthing work
      • the intent is graceful failover, but can I do load balancing?
        • I do not have a good reason to run multiple instances of any service, so cap replicas at 1
        • at best, jellyfin might benefit from multiple GPUs but also nobody is using your instance
      • some services use SQLite, is there a way to mitigate concurrency issues with that
        • on that note SQLite shits itself when FS access is too slow (e.g. NFS or iSCSI), how do I deal with that
    • How to gradually migrate to new setup from 6 BTRFS drives on one machine
      • The end goal is at least 3 nodes, but at some point there's only going to be 1 or 2
      • how would I minimize buying more storage/spending more money, while maintaining some redundancy
      • is the best option just btrfs device remove -> move drive -> format drive -> copy files from btrfs to new drive?
      • new storage setup should be easy to add to (at a minimum, it must not require homogeneous drives)
    • Is tiered storage something I can make use of, could I grab an SSD and use it as cache somehow
    • how to give client devices and services access to the same filesystems
      • e.g. accessing /downloads from qbittorrent, sonarr and my phone via NFS/SMB
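
For the jellyfin GPU question and the "cap replicas at 1" note above, one possible shape is a Deployment that only schedules onto labelled nodes. This is a rough sketch, not a tested config: the `gpu=true` label is a made-up convention (nodes would have to be labelled by hand), the `jellyfin/jellyfin` image name is assumed, and it only covers scheduling. Actually exposing the GPU to the container (device plugin, device mounts) is still an open question. The script emits JSON, which kubectl accepts in place of YAML.

```python
import json

# Hypothetical label, not a Kubernetes built-in. Nodes with a GPU would be
# labelled manually, e.g. `kubectl label node auxin gpu=true`.
GPU_NODE_LABEL = {"gpu": "true"}


def jellyfin_deployment(image: str = "jellyfin/jellyfin") -> dict:
    """Build a single-replica Deployment pinned to GPU-labelled nodes."""
    labels = {"app": "jellyfin"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "jellyfin"},
        "spec": {
            # One instance only: the goal is failover, not load balancing.
            "replicas": 1,
            # Recreate stops the old pod before starting a new one, so two
            # pods never write to the same on-disk state at the same time.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes carrying the label are eligible.
                    "nodeSelector": GPU_NODE_LABEL,
                    "containers": [{"name": "jellyfin", "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(jellyfin_deployment(), indent=2))
```

Assuming the script is saved as `jellyfin.py` (hypothetical filename), `python jellyfin.py | kubectl apply -f -` would submit it. If the only labelled node goes down the pod stays Pending, so this gives failover only when at least two nodes carry the label.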

Rook: does it solve my storage issues?

  • Needs raw devices/partitions
    • requires migration
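
To get a feel for how much migration "needs raw devices/partitions" implies, a small helper could list which block devices on a node currently have no filesystem. This is a sketch under assumptions: it treats "no filesystem, not mounted, no child partitions" as "raw", which is my reading of the requirement rather than Rook's exact check, and it relies on the JSON layout of `lsblk --json`.

```python
import json
import subprocess


def raw_candidates(lsblk: dict) -> list[str]:
    """Return names of disks/partitions with no filesystem and no children."""
    found = []

    def walk(dev: dict) -> None:
        children = dev.get("children", [])
        if (
            dev.get("type") in ("disk", "part")
            and not dev.get("fstype")
            and not dev.get("mountpoint")
            and not children
        ):
            found.append(dev["name"])
        # Partitions are nested under their parent disk.
        for child in children:
            walk(child)

    for dev in lsblk.get("blockdevices", []):
        walk(dev)
    return found


def read_lsblk() -> dict:
    """Run lsblk on the local machine and parse its JSON output."""
    out = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,TYPE,FSTYPE,MOUNTPOINT"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Running `raw_candidates(read_lsblk())` on each node would show what is usable as-is. On the current NAS, every drive in the BTRFS array would presumably be excluded, which is the "requires migration" part.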

Longhorn: will it do shit?