---
title: Mira attempts to lock the fuck in and figure out how to k8s
date: 2025-04-07
---
# Current Setup
Auxin (HTPC):
- OS: NixOS
- Service Runtime: Docker Compose
- Services:
  - Syncthing
  - Jellyfin
  - Caddy (as reverse proxy only)
  - Kodi
- Service Storage:
  - Bind mounts to NFS on NAS
Lipotropin (NAS):
- OS: Proxmox
- Service Runtime: K3s
- Services:
  - adminer
  - caddy
  - copyparty
  - forgejo
  - jackett
  - mariadb
  - ntfy
  - paperless + gotenberg + tika
  - qbittorrent + gluetun
  - radarr
  - redis
  - slskd
- Service Storage:
  - NFS to spinning rust on the same system
- Baremetal storage:
  - 6 HDDs in BTRFS RAID5 (50 TB raw, 38 TB usable)
Motilin (working hostname, unused)
Other Client Devices:
- Access files over NFS/SMB
- Access services via hostname (local), or URL (external)
---
# Endn't Point
Auxin
- Exclusive Services:
  - Kodi
- Distributed Services via K8S
- Service Storage: ????
- Baremetal storage:
  - Various HDDs as available
Motilin + Lipotropin
- Distributed Services via K8S
Client Devices
- Still need to be able to access files over NFS/SMB
- Still need to access services via hostname (local), or URL (external)
---
# Still need to figure out:
- How to handle Jellyfin requiring GPU access
  - nodes aren't guaranteed to have a GPU, and GPUs may be heterogeneous; need to pin to nodes that have one
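One way to do the pinning: label the GPU nodes and combine a `nodeSelector` with a device-plugin resource request. A minimal sketch, assuming the NVIDIA device plugin is installed — the `gpu: "true"` label is my own placeholder, applied with `kubectl label node`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jellyfin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jellyfin
  template:
    metadata:
      labels:
        app: jellyfin
    spec:
      nodeSelector:
        gpu: "true"             # hypothetical label on GPU-bearing nodes
      containers:
        - name: jellyfin
          image: jellyfin/jellyfin
          resources:
            limits:
              nvidia.com/gpu: 1 # scheduler only places this on nodes advertising a GPU
```

Heterogeneous GPUs would need either per-vendor labels or node affinity rules instead of a single flat label.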
- How do I handle failover
  - current setup has SSH and HTTP traffic go to auxin, which proxies to lipotropin as needed
  - how would I handle auxin going down
  - ~~where the fuck would ssh go in general, it's not something reverse proxyable~~
    - move SSH to the router, use ProxyJump from there
    - see if auto-proxy and fallback are options
    - can we do CrowdSec and fail2ban on OpenWrt
  - if I fucked up a config, how could I recover without incurring downtime
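The ProxyJump idea from the client side would look roughly like this `~/.ssh/config` fragment — hostnames are placeholders for my actual ones:

```
Host router
    HostName router.lan

Host auxin lipotropin
    # reach internal hosts by hopping through the router,
    # so nothing behind it needs to be exposed directly
    ProxyJump router
```

Then `ssh auxin` works the same whether auxin or lipotropin is the one actually serving, since only the router needs a stable public-facing SSH endpoint.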
- How the *fuck* do I handle storage
  - how do I do concurrent access, and which services even allow that
  - how the ever-loving fuck do I make Syncthing work
  - ~~the intent is graceful failover, but can I do load balancing?~~
    - I do not have a good reason to run multiple instances of anything, so cap replicas at 1
    - at best, Jellyfin might benefit from multiple GPUs, but also nobody is using your instance
  - some services use SQLite, is there a way to mitigate concurrency issues with that
    - on that note, SQLite shits itself when FS access is too slow (e.g. NFS or iSCSI), how to deal with that
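For the SQLite concurrency part, the standard mitigation is WAL mode plus a busy timeout — readers stop blocking the writer and contention turns into a wait instead of an error. Note this does *not* fix the NFS problem: WAL relies on shared memory on a local filesystem, which is exactly why network mounts stay painful. A quick sketch:

```python
import os
import sqlite3
import tempfile

# Hypothetical database path, just for demonstration.
path = os.path.join(tempfile.mkdtemp(), "app.db")

conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5s on a locked db instead of erroring
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('mode', 'wal')")
conn.commit()

# A second connection can read while the first still holds the database open.
reader = sqlite3.connect(path)
print(reader.execute("SELECT v FROM kv WHERE k='mode'").fetchone()[0])  # wal
```

Whether a given service exposes these pragmas is per-service, though — some bake their own settings in.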
- How to gradually migrate to the new setup from 6 BTRFS drives on one machine
  - the end goal is at least 3 nodes, but at some point there are only going to be 1 or 2
  - how would I minimize buying more storage/spending more money while maintaining some redundancy
  - is the best option just `btrfs device remove` -> move drive -> format drive -> copy files from btrfs to the new drive?
  - the new storage setup should be easy to add to (at a minimum, no requirement for homogeneous drives)
  - is tiered storage something I can make use of, could I grab an SSD and use it as a cache somehow
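The drain-one-drive-at-a-time option would go roughly like this — device names and mountpoints are examples, check `btrfs filesystem show` for the real ones, and it only works while the remaining drives can still hold all the data:

```shell
# 1. Confirm there's enough free space to absorb one drive's worth of data
btrfs filesystem usage /mnt/pool

# 2. Drain and detach one drive; btrfs rebalances onto the others (slow)
btrfs device remove /dev/sdf /mnt/pool

# 3. Reformat the freed drive for the new node and copy a share over
mkfs.btrfs /dev/sdf
mount /dev/sdf /mnt/new
rsync -aHAX /mnt/pool/downloads/ /mnt/new/downloads/
```

The catch with RAID5 is that redundancy assumes the full drive count, so each remove shrinks headroom before it shrinks risk.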
- How to give client devices and services access to the same filesystems
  - e.g. accessing `/downloads` from `qbittorrent`, `sonarr`, and my phone via NFS/SMB
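One pattern that keeps both worlds working: leave the data as an NFS export (so phones and laptops keep mounting it directly) and hand pods the *same* export via a `ReadWriteMany` PersistentVolume. A sketch with placeholder server/paths:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: downloads
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany      # NFS allows many pods to mount it concurrently
  nfs:
    server: lipotropin.lan   # placeholder NAS address
    path: /downloads
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: downloads
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""   # bind to the static PV above, not a provisioner
  resources:
    requests:
      storage: 1Ti
```

`qbittorrent` and `sonarr` then mount the claim at the same path, and the SMB/NFS clients never notice Kubernetes exists.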
---
# Rook: does it solve my storage issues?
- Needs raw devices/partitions
  - requires migration
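For reference, the relevant bit of a Rook `CephCluster` spec — it consumes raw, unformatted devices, which is exactly why the existing BTRFS disks would have to be drained first. Device name is a placeholder:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    useAllNodes: true
    useAllDevices: false   # don't grab every disk it can see
    devices:
      - name: "sdb"        # must be empty: no filesystem, no partition table
```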
# Longhorn: will it do shit?