• 4 Posts
  • 13 Comments
Joined 1Y ago
Cake day: Jun 12, 2023


Knowing what to abstract, and when, can be hard to define precisely. Over-abstraction has a cost, and so does under-abstraction. I have seen, written and refactored terrible examples of both. Anecdotally, flattening an over-abstracted hierarchy feels like less work than abstracting under-abstracted code (spaghetti code, linear code, brand it how you will), and it usually has better test coverage to validate correctness after the refactor. Be aware of both extremes and try to find the balance.


My homelab is a 2-node Kubernetes cluster (k3s on Raspberry Pis). I'm going to scale it up to 4 nodes some day when I want a weekend project.

I built it to learn Kubernetes while studying for the CKA/CKAD certifications for work, where I design, implement and maintain service architectures running in Kubernetes/OpenShift environments every day. It’s relatively easy for me to manage Kubernetes for my homelab, but it’s a bit heavy and has a steep learning curve if you are new to it, which (understandably) puts people off, I think, especially for homelab/self-hosting use cases. It’s a very valuable (literally $$$) skill if you are in that enterprise space, though.


Yeah, I know, that’s a huge advantage in this situation, but not one I can take advantage of 🙂


I switched to a qBittorrent + gluetun sidecar recently, and it’s been pretty good compared to the poorly maintained combo torrent+OpenVPN images I was using. Being able to update my torrent client image/config independently of the VPN client is great. Unfortunately, most of the docs are Docker-focused, so it takes a bit of trial and error to get it set up in a non-Docker environment like Kubernetes. Here’s my deployment in case it’s useful for anyone. Be careful to configure qBittorrent to use “tun0” as its network interface, or you will be exposed (I got pinged by AT&T before I realized that one). I’m sure there’s a more robust way to make use of gluetun’s DNS over TLS and iptables kill switch that doesn’t require changing the qBittorrent config to stay secure, but that’s what I have so far, and it works well enough for now.
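A minimal sketch of the sidecar layout, assuming the public `qmcgaw/gluetun` and `lscr.io/linuxserver/qbittorrent` images. It builds the Deployment as a Python dict and prints JSON, which `kubectl apply -f -` accepts. The `vpn-credentials` Secret, the `qbittorrent-config` PVC and the provider placeholder are hypothetical; check gluetun's docs for the variables your VPN provider needs. The point it illustrates is that containers in one pod share a network namespace, so qBittorrent can see gluetun's `tun0`.

```python
import json


def qbittorrent_gluetun_deployment(
    name: str = "qbittorrent",
    vpn_provider: str = "REPLACE_WITH_PROVIDER",  # placeholder, see gluetun docs
) -> dict:
    """Build a Deployment with gluetun and qBittorrent sharing one pod network."""
    labels = {"app": name}
    gluetun = {
        "name": "gluetun",
        "image": "qmcgaw/gluetun:latest",
        # gluetun manages routes and iptables rules (its kill switch) in the
        # pod's shared network namespace, which needs NET_ADMIN.
        "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}},
        "env": [{"name": "VPN_SERVICE_PROVIDER", "value": vpn_provider}],
        # Hypothetical Secret holding the provider credentials as env vars.
        "envFrom": [{"secretRef": {"name": "vpn-credentials"}}],
    }
    qbittorrent = {
        "name": "qbittorrent",
        "image": "lscr.io/linuxserver/qbittorrent:latest",
        "env": [{"name": "WEBUI_PORT", "value": "8080"}],
        "ports": [{"name": "webui", "containerPort": 8080}],
        # qBittorrent's config lives here; binding to tun0 is still done
        # in qBittorrent's own settings, not by this manifest.
        "volumeMounts": [{"name": "config", "mountPath": "/config"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Both containers share one network namespace, so
                    # qBittorrent sees the tun0 interface gluetun creates.
                    "containers": [gluetun, qbittorrent],
                    "volumes": [
                        {
                            "name": "config",
                            # Hypothetical PVC name.
                            "persistentVolumeClaim": {"claimName": "qbittorrent-config"},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(qbittorrent_gluetun_deployment(), indent=2))
```

Usage would be something like `python deployment.py | kubectl apply -f -`. Keeping the two containers separate is what lets you bump the qBittorrent image without touching the VPN side.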


Look for refurbished units; you can get enterprise-grade units for about half the retail price. I recently got a refurbished APC from refurbups.com. They come with brand-new batteries, and it’s mostly rack-mountable stuff. With shipping, it ended up being a little over half the price of a brand-new one. I can’t tell at a glance whether they ship to Canada, but if not, I’d be surprised if you couldn’t find a similar Canada-based site.


Got a refurbished APC coming in today. Looking forward to not having to worry about my NAS drives or losing internet because of a split-second power blip.


Not really, it’s mostly a hobby/nerdy/because-I-can thing. I am a software engineer with a decade of experience. The job sometimes requires virtual sysadmin work (VMs/containers, cloud networking, etc.). Setting up my own baremetal cluster has given me more insight into how things work, especially on the network side. Most of my peers take for granted that traffic gets in or out of a cluster, but I can actually troubleshoot it or design with it in mind.


Thoughts on server/network racks?
Every couple of months I get the urge to organize my lab/home office equipment into a rack/cabinet, but I never follow through on it. I occasionally look on Craigslist for deals, but everything is either too far away or too big. I'd rather pay more for a smaller rack that doesn't go all the way to the ceiling and will just show up on my doorstep. A 6U would fulfill my current requirements, and 12U is probably more than enough in reality, but as an engineer I find myself eyeing 15-18U to be conservative. This iteration of the search has me eyeing these options:

* [sysracks 18U server rack](https://www.amazon.com/dp/B082YJVBTV/?coliid=I3NT2EN7YX0XES&colid=3E8TPEGQ105CM&psc=1&ref_=list_c_wl_lv_ov_lig_dp_it) - slightly bigger than I want, but still reasonable. Some questionable reviews on manufacturing/shipping quality, but this seems like a solid cost/value ratio: fully enclosed, room to grow, wheels, and accessories like shelves that I'd want anyway. Feels like maybe overkill, but for the price...
* [NavePoint 15U Portable Rolling Network Rack](https://www.amazon.com/dp/B08HWGKPWF/?coliid=I3TO1OGGRHCC1D&colid=3E8TPEGQ105CM&psc=1&ref_=list_c_wl_lv_ov_lig_dp_it) - closer to the size I want (12/15U options) and cheaper, but it lacks accessories I'd need, like shelves, and adding those brings it closer in price to the sysracks. Similar reviews with manufacturing/shipping concerns. I like this one, but it's hard not to feel like it's a worse deal than the sysracks.
* some StarTech variant - these seem to have generally higher build quality (sturdier) but cost more and look more "bare bones". They also often have adjustable depth, making them potentially more future-proof, but I'm not sure either of these makes up for the increased cost.

What do you think? Any advice or wisdom you can share? I feel like I'll finally follow through this time because my office is a tiny mess. Leaning toward the NavePoint currently.
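For comparing the sizes above: a rack unit is standardized (EIA-310) at 1U = 1.75 in (44.45 mm) of vertical mounting space. This small sketch converts U counts to rail space; a cabinet's external height is larger and model-specific, so the listing's dimensions are still what decides whether it fits.

```python
# EIA-310 rack unit: 1U = 1.75 in (44.45 mm) of vertical mounting space.
U_INCHES = 1.75
MM_PER_INCH = 25.4


def rack_space(units: int) -> tuple[float, float]:
    """Return (inches, millimetres) of rail space for a rack of `units` U.

    This is mounting space only; a cabinet's external height adds the frame,
    top/bottom panels and casters, and varies by model.
    """
    inches = units * U_INCHES
    return inches, inches * MM_PER_INCH


for u in (6, 12, 15, 18):
    inches, mm = rack_space(u)
    print(f"{u:>2}U: {inches:5.2f} in ({mm:6.1f} mm)")
```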

I considered it, but RAM is very limited on the NAS and the cluster nodes; it’s my primary bottleneck. It would also be more volatile. The two SSDs are RAID 1 redundant, just like the underlying HDDs, in addition to the built-in power loss protection on the drives. RAM disks are great if you can spare the memory and have a UPS, though.


Kubernetes and SSD Read Cache - Beautiful Silence
So I run a small Kubernetes cluster (k3s) backed by MariaDB hosted on a Synology NAS with only HDDs, rather than etcd colocated on the control nodes. For resiliency it's been great: nodes are basically pure compute resources I can wipe out and recreate with ease without worrying about data loss. However, for over a year now I've lived with the constant chatter of active hard drives in my office. The Kube DB workload is extremely read-heavy and very active: many thousands of selects per minute with only a handful of writes. Clickclickclickclickclickclick. Seems like a good case for caching, and luckily my NAS has 2 NVMe slots for an SSD cache. I bought a couple of data center drives with PLP (power loss protection; Kingston DC1000B, probably overkill, but not crazy expensive), popped them in, set up a read/write cache for the database and Kube NFS volumes and...silence, wonderful silence. It's almost constantly at 100% cache hits. Bonus points if things are faster as well. I'm very happy. Never optimized an application for noise levels before 😁.
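To check that a datastore workload really is read-heavy before buying cache drives, MariaDB's cumulative `Com_*` counters (from `SHOW GLOBAL STATUS LIKE 'Com_%';`) can be sampled twice and diffed. A minimal sketch, assuming you fetch the two snapshots yourself and convert the values to integers; counting `Com_select` as reads and insert/update/delete/replace as writes is a simplification.

```python
# Real MariaDB/MySQL status counters; grouping them into "reads" and
# "writes" this way is a simplifying assumption for a quick estimate.
READS = ("Com_select",)
WRITES = ("Com_insert", "Com_update", "Com_delete", "Com_replace")


def rates_per_minute(before: dict[str, int], after: dict[str, int],
                     seconds: float) -> dict[str, float]:
    """Turn two cumulative counter snapshots into per-minute rates."""
    return {k: (after[k] - before.get(k, 0)) * 60.0 / seconds for k in after}


def read_fraction(rates: dict[str, float]) -> float:
    """Share of counted statements that are reads (0.0 when nothing ran)."""
    reads = sum(rates.get(k, 0.0) for k in READS)
    writes = sum(rates.get(k, 0.0) for k in WRITES)
    total = reads + writes
    return reads / total if total else 0.0
```

The counters only ever grow from server start, which is why the sketch diffs two samples instead of reading a single snapshot.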

FYI, you will not be able to do live video transcoding with a Raspberry Pi. I overclocked my Pi 4’s CPU and GPU and it just can’t handle anything but direct play and maybe audio stream transcoding, though I’ve never had luck with any transcoding, period. I either download a format I know can direct play or, more recently, use Tdarr (server on the Pi, a node running on my desktop when I need it) to transcode into a direct-play format before it hits my Jellyfin library. Even just using my AMD Ryzen 5 (no GPU), it transcodes like 100x faster than a Tdarr node given 2 of the RPi’s CPU cores. You could probably live transcode with a decent CPU (newer Intel CPUs are apparently very good at it) if you run Jellyfin on the NAS, but then you’re at odds with your low power consumption goals. Otherwise, Jellyfin on an RPi is great.

Good luck, I’d like to build a NAS myself at some point to replace or supplement my Synology.


It’s for the chance that I need to administer my cluster when I am not on my LAN. I can set up a port forward to the externally accessible port and everything works as normal, like I’m on my LAN. Non-default port, password auth disabled, root login over SSH disabled (so you have to have my user and SSH key), and limited SSH connection attempts before a ban. I can toggle it on or off with a checkbox on my router. Yes, I understand there are other ways that are even more secure, and yes, I understand the risks, but for my circumstances this was a good balance of convenience and security. I’ve also never had an issue :).
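The hardening choices above map onto `sshd_config` keywords (`Port`, `PasswordAuthentication`, `PermitRootLogin`, `MaxAuthTries`). A minimal, hypothetical audit sketch: it parses top-level settings and flags anything left at OpenSSH's defaults. Feeding it the output of `sshd -T` (which prints the effective config) is more reliable than parsing the file. The threshold of 3 tries is an assumption, and `MaxAuthTries` only limits attempts per connection; banning repeat offenders needs a separate tool (fail2ban is one common choice) and isn't checked here.

```python
import re

# OpenSSH defaults used when a keyword is absent (per sshd_config(5)).
DEFAULTS = {
    "port": "22",
    "passwordauthentication": "yes",
    "permitrootlogin": "prohibit-password",
    "maxauthtries": "6",
}


def parse_sshd_config(text: str) -> dict[str, str]:
    """Parse top-level keyword/value pairs; the first occurrence wins."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        key = parts[0].lower()
        if key == "match":
            break  # conditional Match blocks are out of scope for this sketch
        if len(parts) == 2:
            settings.setdefault(key, parts[1].strip())
    return settings


def audit(settings: dict[str, str], max_tries: int = 3) -> list[str]:
    """Return findings for settings that don't match the hardening above."""

    def get(key: str) -> str:
        return settings.get(key, DEFAULTS[key]).lower()

    findings = []
    if get("port") == "22":
        findings.append("Port is still the default 22")
    if get("passwordauthentication") != "no":
        findings.append("PasswordAuthentication is not 'no'")
    if get("permitrootlogin") != "no":
        findings.append("PermitRootLogin is not 'no'")
    if int(get("maxauthtries")) > max_tries:
        findings.append(f"MaxAuthTries is above {max_tries}")
    return findings
```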


I do as well, on a non-standard port, although that doesn’t really provide any extra security. I personally found key-only SSH login acceptably secure, but it’s definitely less secure than Tailscale, which can operate with 0 open ports. The risk would come from OS/sshd vulnerabilities that can be exploited. As long as you keep the router up to date, it should be safe enough.


It’s a great tool for knowledge sharing, ramp-up and debugging. Definitely not something that needs to happen on every story. Stuck on something or working on a weird bug? Get someone on a call and walk them through it. New team member, or an old subsystem not many people understand? Pair the less knowledgeable person up with an SME for the first couple of tasks so they can pick the SME’s brain while they work and get valuable context that might be lost in the code or the story description.

It also doesn’t need to drag on. I find 30 minutes is best, because as you approach an hour or more, attention is hard to maintain. Get on the same page, learn a few things, and once you’re making progress, move to async communication.

Pair programming is a tool and only valuable if you know how and when to use it.


Get enough experience and you just have a brief moment of stage 3 as you dive straight to stage 4.

Unless it’s a customer/that-one-guy-at-work (it’s a title, but there’s usually a handful of them) and then there’s this vast stage 0 of back and forth of “are you sure that’s happening, run these commands and paste the entire output to me” to be sure of what they are saying then you jump to stage 3/4.


Baremetal Kubernetes - LF host level metrics/monitoring/reporting solution
I run a baremetal Kubernetes cluster on a couple of Raspberry Pis (though that detail isn't super important to this question). I am familiar with Kubernetes metrics/alerting tools such as Grafana, Prometheus, Loki, the ELK stack, etc. I am also familiar with the node metrics exporter for gathering node-level resource metrics like CPU, memory, file system, temps, etc. All that's great and gets me like 99% of the way there. The last 1% I am looking for is things like available updates (e.g. 56 packages with available updates), reboot required, system component status, etc., and for whatever reason I struggle to find good search results for this specific problem area. I can and do use things like dnf-automatic/unattended-upgrades and systemd to maintain minimal system-level health (so 99% -> 99.8%), but I haven't been able to find a solution that provides more depth of insight into underlying system health, probably because that's usually handled by cloud providers/hypervisors. I am sure I could come up with some custom, not-too-hacky solution for myself (off the top of my head: a pod/job with access to the underlying system that runs whatever commands I want to gather state and makes it available to the general Kube-space monitoring solution; it feels dirty, though), but this feels like an obvious hole and I'm just missing the right Google incantation to find it. Any ideas or experience you can provide? Please don't suggest kube metrics node-exporter; unless I am missing something, it doesn't provide what I am asking about.
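The "pod/job with host access" idea could look something like this minimal sketch. Assumptions: a Debian/Ubuntu host (the `/var/run/reboot-required` flag file and `apt list --upgradable` output format), made-up metric names like `host_reboot_required` and `host_upgradable_packages`, and some way to expose host files to the pod, e.g. a read-only hostPath mount. It only gathers two values and renders them in Prometheus' text exposition format; how that output gets scraped is left open. On Fedora/RHEL hosts the rough equivalents would be `dnf needs-restarting -r` and `dnf check-update`.

```python
import os


def reboot_required(flag_path: str = "/var/run/reboot-required") -> bool:
    """Debian/Ubuntu create this file when an installed update needs a reboot.

    Inside a pod, point flag_path at the host mount, e.g. a hypothetical
    /host/var/run/reboot-required.
    """
    return os.path.exists(flag_path)


def count_apt_upgradable(apt_output: str) -> int:
    """Count package lines in `apt list --upgradable` output (skips the header)."""
    return sum(1 for line in apt_output.splitlines() if "[upgradable from:" in line)


def render_gauges(metrics: dict[str, tuple[str, float]]) -> str:
    """Render name -> (help, value) pairs in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Example: `render_gauges({"host_reboot_required": ("1 if a reboot is pending", int(reboot_required()))})` produces three lines a Prometheus scraper can parse.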

Looking for 64-bit RPi4 Server OS Distro suggestions
So I've been running a little 2-node RPi Kubernetes cluster for over a year now, bootstrapped with Ansible and Helm ([source](https://github.com/macgregor/homelab)). I picked Ubuntu Server at the time because I think the official 64-bit Raspbian OS was still young, or maybe not even out yet (can't quite remember), but I've found myself fighting with Ubuntu an awful lot, culminating in a major version upgrade to "jammy" last night that wrecked one of my nodes. It even tried to delete the running kernel during the upgrade but caught itself and asked me to confirm, wtf. I've never experienced a Linux upgrade this bad. Yeah, "jammy" is right. Luckily I use a separate NAS for persistence. So I'm breaking up with Ubuntu, which I think is the cool thing to do these days anyway, and using this as an opportunity to rebuild and clean up my IaC. I am most familiar with Red Hat distros (Fedora/CentOS daily drivers for years now, RHEL servers at work), though I'm not familiar with the ARM ecosystem there. I've also been wanting to try NixOS for a while, but looking at some of the RPi config last night had me a little concerned because it felt unfamiliar. Then of course there's the official Raspbian OS; 64-bit support should be solid by now. What OS are you using for your Raspberry Pi servers? Any I should definitely avoid?