DevOps as a profession and software development for fun. Admin of lemmy.nrd.li and akkoma.nrd.li.

Filibuster vigilantly.

  • 0 Posts
  • 74 Comments
Joined 1Y ago
Cake day: Jun 10, 2023


Laptops/desktops: no real naming scheme, they use non-static DHCP leases anyway.

Physical servers: NATO phonetic alphabet. If I run out of letters, something has gone terribly wrong, right?

VMs: I don’t have many of these left, but they are named according to their function and then a digit in case I need more, e.g. docker1, k3s1. This does mean that I have some potential oddities like a k3s cluster with foxtrot, alpha, and k3s1 as members, but IMO that’s fine and lets me easily tell if something is physical or virtual. I am considering including the physical machine name in the VM name for new things as I no longer have things set up such that machines can migrate… though I haven’t made a new VM in some time.

Network equipment: Named according to location and function, e.g. rack-router, rack-10g, rack-back-1g, rack-ap, upstairs-10g, upstairs-ap. If something moves or is repurposed it is likely getting reconfigured, so renaming at that point makes sense.


I switched to Forgejo just by swapping out the image. So far Gitea hasn’t done anything malicious with its trademarks now that they’re owned by a private company, but I feel better using software that is more closely tied to a nonprofit. I see no reason to switch back.


I have owned and otherwise dealt with a few different Startech 4-post open racks and have been very happy with them. I currently use one of their 25U racks for my lab, but am running out of space…


I started on Gitlab, which was a monster to run. I moved to Gitea, until the developers started doing some questionable things. Now I’m on Forgejo (a fork of Gitea).


I believe Pictrs is a hard dependency and Lemmy just won’t work without it, and there is no way to disable the caching. You can move all of the actual images to object storage as of v0.4.0 of Pictrs if that helps.
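
The object storage bit is just configuration on the pict-rs container. Going from memory (so treat the variable names below as assumptions and double-check them against the pict-rs 0.4 docs), the environment section of the compose service looks roughly like this, with a made-up endpoint and bucket:

environment:
  PICTRS__STORE__TYPE: object_storage
  PICTRS__STORE__ENDPOINT: https://us-east-1.exampleobjectstorage.com   # placeholder endpoint
  PICTRS__STORE__BUCKET_NAME: my-lemmy-media
  PICTRS__STORE__REGION: us-east-1
  PICTRS__STORE__ACCESS_KEY: xxxx
  PICTRS__STORE__SECRET_KEY: xxxx

There is also a migration command for moving existing files out of the filesystem store, but check the pict-rs docs for the exact invocation.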

Other fediverse servers like Mastodon actually (can be configured to) proxy all remote media (for both privacy and caching reasons), so I imagine Lemmy will move that way and probably depend even more on Pictrs.


IIRC Lemmy preloads all thumbnails for posts in communities you subscribe to into pictrs to be cached for like a month or something. So, yeah…


I switched from Plex to Jellyfin several years ago and haven’t really looked back. Overall I just didn’t like the direction Plex kept going (pushing shit streaming services, central auth, paywalling features), and dropped it even though I grabbed a lifetime Plex Pass back in the day. The only thing I miss about Plex was the ease of developing a custom plugin for it, since you could pretty much just drop Python scripts in there and have it work, though their documentation for plugin development was terrible (and I think removed from their site entirely).


I love tinc, it’s so simple. I wish there were something just as easy that leveraged WireGuard instead of whatever custom VPN/tunneling stuff tinc uses, since how seemingly little maintenance tinc gets makes me nervous about relying on it. Like if tailscale/headscale and tinc had a baby, haha.

Is there a way to run tinc on your phone or similar? To me that’s another bonus of tailscale at least.


Having a “source of truth” makes many things easier but less resilient. One place to go get the latest version of something mutable. The fediverse/ActivityPub needs to get on board with some form of DID or something similar before worrying about improving the ID system (and the ID system is inherently tied to JSON-LD, so AP would need to stop using that or there would need to be a new version of it) IMO.


Basically, no:

It can cause some wackiness… basically you will need to maintain that old domain forever and everything will still refer to that old domain.

For example, your post looks like this from an ActivityPub/federation perspective:

{
    [...]
    "id": "https://atosoul.zapto.org/post/24325",
    "attributedTo": "https://atosoul.zapto.org/u/Soullioness",
    [...]
    "content": "<p>I'm curious if I can migrate my instance (a single user) to a different domain? Right now I'm on a free DNS from no-ip but I might get a prettier paid domain name sometime.</p>\n",
}

The post itself has an ID that references your domain, and the attributedTo points to your user, which also references your domain. AFAIK there is no reasonable way to update/change this. IDs are forever.

It would also break all of the subscriptions for an existing instance, as the subscriptions are all set to deliver to that old domain.

IMO your best bet would be to start a new instance on the new domain, update your profile on the old one saying that your user is now @Soullioness@newinstance.whatever and maintain that old server in a read-only manner for as long as you can bear.


Lemmy and Akkoma, both in docker with Traefik in front.


Ext4 because it is rock solid and a reasonable foundation for Gluster. I am moving off of ZFS to scale beyond what a single server can handle. I would still run ZFS for single-server many-drive situations, though mdadm is actually pretty decent honestly.
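
For anyone curious, standing up a replicated Gluster volume on top of ext4 bricks is roughly this (hostnames, devices, and paths are placeholders):

# on each node: format the brick and mount it (this is where ext4 comes in)
mkfs.ext4 /dev/sdb
mkdir -p /data/brick1
mount /dev/sdb /data/brick1

# from one node: join the peers and create a 3-way replicated volume
gluster peer probe storage2
gluster peer probe storage3
gluster volume create gv0 replica 3 storage1:/data/brick1/gv0 storage2:/data/brick1/gv0 storage3:/data/brick1/gv0
gluster volume start gv0

# then mount it like any other filesystem (needs the glusterfs client)
mount -t glusterfs storage1:/gv0 /mnt/gv0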


A few of these servers were stacked on top of each other (and a monitor box to get the stack off the ground) in a basement for several years; it’s a journey.


No. - sent from my iNstance


Business in the front:

  • Mikrotik CCR2004-1G-12S+2XS, acting as a router. The 10g core switch plugs into it, as does the connection to upstairs.
  • 2u cable management thing
  • Mikrotik CRS326-24S+2Q+, most 10g capable things hook into this, it uses its QSFP+ ports to uplink to the router and downlink to the (rear) 1g switch.
  • 4u with a shelf, there are 4x mini-pcs here, most of them have a super janky 10g connection via an M.2 to PCIe riser.
  • “echo”, Dell R710. I am working on migrating off of/decommissioning this host.
  • “alpha”, Dell R720. Recently brought back from the dead. Recently put a new (to me) external SAS card into it, and it acts as the “head” unit for the disk shelf I recently bought.
  • “foxtrot”, Dell R720xd. I love modern-ish servers with >= 12 disks per 2U. I would consider running a rack full of these if I could… forgive the lack of a label, my label maker broke at some point before I acquired this machine.
  • “delta”, “Quantum” something or other, which is really just a whitelabeled Supermicro 3u server.
  • Unnamed disk shelf, “NFS04-JBOD1” to its previous owner. Some Supermicro JBOD that does 45 drives in 4u, hooked up to alpha.

Party in the back:

  • You can see the cheap monitor I use for console access.
  • TP-Link EAP650, sitting on top of the rack. Downstairs WAP.
  • Mikrotik CRS328-24P-4S+, rear-facing 1g PoE/access switch. The downstairs WAP hooks into it, as does the one mini-PC I didn’t put a 10g card in. It also provides power (but not connectivity) to the upstairs switch. It used to get a lot more use before I went to 10g basically everywhere. Bonds 4x SFP+ to uplink via the 10g switch in front.
  • You can see my cable management, which I would describe as “adequate”.
  • You can see my (lack of) power distribution and power backup strategy, which I would describe as “I seriously need to buy some PDUs and UPSs”.

I opted for a smaller rack as my basement is pretty short.

As far as workloads:

  • alpha and foxtrot (and eventually delta) are the storage hosts running Ubuntu and using gluster. All spinning disks. ~160TiB raw
  • delta currently runs TrueNAS; working on moving all of its storage into gluster and adding this host in as well. ~78TiB raw, with some bays used for SSDs (l2arc/zil) and 3 used in a mirror for “important” data.
  • echo, currently running 1 (Ubuntu) VM in Proxmox. This is where the “important” (frp, Traefik, DNS, etc) workloads run right now.
  • mini-PCs, running Ubuntu, all sorts of random stuff (dockerized), including this Lemmy instance. They mount the gluster storage where necessary. They also have a gluster volume amongst themselves for highly redundant SSD-backed storage.

The gaps in the naming scheme:

  • I don’t remember what happened to bravo, it was another R710, pretty sure it died, or I may have given it away, or it may be sitting in a disused corner of my basement.
  • We don’t talk about charlie, charlie died long ago. It was a C2100. Terrible hardware. Delta was bought because charlie died.

Networking:

  • The servers are all connected over bonded 2x10g SFP+ DACs to the 10g switch (see the netplan sketch after this list).
  • The 1g switch is connected to the 10g switch with QSFP+ breakout to bonded 4x SFP+ DAC
  • The 10g switch is connected to the router with QSFP+ breakout to bonded 4x SFP+ DAC
  • The router connects to my ISP router (which I sadly can’t bypass…) using a 10GBASE-T SFP+.
  • The router connects to an upstairs 10g switch (Mikrotik CRS305-1G-4S+) via a SFP28 AOC (for future upgrade possibilities)
  • I used to do a lot of fancy stuff with VLANs and L3 routing and stuff… now it’s just a flat L2 network. Sue me.
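
The bonds on the Ubuntu hosts are just regular Linux bonds defined in netplan; here is a minimal LACP-style sketch (interface names, addresses, and even the bond mode are assumptions, adjust to taste):

network:
  version: 2
  ethernets:
    enp3s0f0: {}
    enp3s0f1: {}
  bonds:
    bond0:
      interfaces: [enp3s0f0, enp3s0f1]
      addresses: [10.0.0.21/24]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        transmit-hash-policy: layer3+4

If you go the 802.3ad route the switch side needs a matching LACP bond configured as well.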

If you find a decent alternative let me know. I have been looking for a while and not found anything that supports the full feature set I want (including Twilio).



There is an “Actions” feature coming that is very similar to GitHub Actions, for CI and similar use-cases. It’s still behind a feature flag as it’s not quite ready for prime time, but you can enable it on a self-hosted instance if you want. I believe this is in Gitea as well, so you don’t have to use the Forgejo fork, but I have moved my instance over due to the whole situation leading to the fork.
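
If I remember right, enabling it is just a section in app.ini (plus deploying a runner, which is a separate component):

[actions]
ENABLED = true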



I am pretty sure what I described only happens when --log.level=DEBUG is set, or:

[log]
  level = "DEBUG"

The syntax errors are weird/concerning if it says there are errors but it still seems to load the config anyway (based on you seeing them in the dashboard).

Back when I used the file provider I pointed it at a directory and put every router/service in its own file, with that directory volume’d in to e.g. /traefik-conf. That’s probably more general advice than a fix for your problem, though.
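
For reference, that setup is just --providers.file.directory=/traefik-conf (and ideally --providers.file.watch=true) on the static side, plus one small dynamic config file per service, something like this (filename, router name, and backend URL are made up):

# /traefik-conf/whoami.toml
[http.routers.whoami]
  rule = "Host(`whoami.example.com`)"
  service = "whoami"
  [http.routers.whoami.tls]
    certResolver = "le"

[http.services.whoami.loadBalancer]
  [[http.services.whoami.loadBalancer.servers]]
    url = "http://192.168.1.50:8080"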


Yeah, as someone who used Mastodon back in the day this wasn’t surprising, since it sorta highlights the home vs local vs federated timelines, but I can totally see how it could be confusing if you expect Lemmy to just be a “Reddit clone”. And TBF it is a Reddit clone of sorts if you disable federation; “All” is everything your instance can possibly access, but then you lose out on what IMO is the killer feature.

There is probably a way you could spider instances and scrape content to get an “All” of sorts…


Your logs (at debug level at least, which is where I keep my server, haha) should have entries something along the lines of:

  • Receiving configuration from the file provider
  • What routers and services it sets up based on the configuration
  • Whether certificate generation is needed for the routers
  • What happens when LEGO tries to generate the certificate (created account, got challenge, passed/failed challenge, got cert, etc)

Use a site like browse.feddit.de to find communities you want to join and join them. Every instance only “has” its local communities plus whatever remote communities the users of the instance join. With more users it is more likely someone else has subscribed to something you are interested in, but someone on e.g. lemmy.world had to be the first user there to search for and subscribe to any community that isn’t based on that instance.


Yeah, you could also set up some sort of caching proxy in the cloud just for images and host those on a different domain (e.g. cdn.lemmyinstance.com) if you still want to host large images and be as self-hosted as possible given the constraints.
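
The nginx version of that idea looks roughly like this (domain, origin, cache sizing, and cert paths are all placeholders):

proxy_cache_path /var/cache/nginx/lemmy-media levels=1:2 keys_zone=lemmy_media:10m max_size=20g inactive=7d use_temp_path=off;

server {
    listen 443 ssl;
    server_name cdn.lemmyinstance.com;
    ssl_certificate     /etc/letsencrypt/live/cdn.lemmyinstance.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/cdn.lemmyinstance.com/privkey.pem;

    location / {
        proxy_pass https://lemmyinstance.com;              # the instance actually serving the images
        proxy_set_header Host lemmyinstance.com;
        proxy_cache lemmy_media;
        proxy_cache_valid 200 7d;                          # keep successful responses for a week
        add_header X-Cache-Status $upstream_cache_status;
    }
}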


Is traefik successfully getting the cert via LE? It sounds like for one reason or another it is still using the built-in/default cert for those services. You can check the traefik log’s LEGO lines, and/or look at your /letsencrypt/acme.json.

In my example I specified entrypoints.https.http.tls.domains, but I think that is only necessary when you’re doing wildcard domains with a DNS solver.

edit: You may need to use the file provider rather than trying to specify stuff in the main config toml… traefik differentiates between “static” config that it has to know at boot time and can’t change, and “dynamic” config like routers and stuff.


Most of your traffic will be incoming, not outgoing. Unless you are posting to a community hosted on your instance the only time you send stuff will be when you post or comment, and even then you only send that to the instance hosting the community.

edit: Also if you post an image in a post/comment that would get loaded from your instance.


Traefik. It has a GUI that I can use to see things, and (depending on your setup) you define the routes and stuff as part of your container definitions, minimal extra work required, makes setup and teardown a breeze. It is also nice that you can use it in all sorts of places, I have used it as Kubernetes ingress and as the thing that routed traffic to a Nomad cluster.

I went from Apache to Nginx (manually configured, including ACME) to Traefik over the course of the past ~10 years. I tried Caddy when I was making the switch to Traefik and found it very annoying to use, too much magic in the wrong places. I have never actually used NPM, as it doesn’t seem useful for what I want…

Anyway, with traefik you can write your services in docker compose like this, and traefik will just pick them up and do the right thing:

version: "3"
services:
  foo-example-com:
    image: nginx:1.24-alpine
    volumes: ['./html:/usr/share/nginx/html:ro']
    labels:
      'traefik.http.routers.foo-example-com.rule': Host(`foo.example.com`)
    restart: unless-stopped
    networks:
      - traefik
networks:
  traefik:
    name: traefik-expose-network
    external: true

It will just work most of the time, though sometimes you’ll have to specify 'traefik.http.services.foo-example-com.loadbalancer.server.port': whatever or other labels according to the traefik docs if you want specific behaviors or middleware or whatever.

And your deployment of traefik would look something like this:

version: '3'
services:
  traefik:
    image: traefik:v2
    command: >-
      --accesslog=true
      --api=true
      --api.dashboard=true
      --api.debug=true
      --certificatesresolvers.le.acme.dnschallenge.provider=provider
      --certificatesresolvers.le.acme.storage=acme.json
      [ ... other ACME stuff ... ]
      --entrypoints.http.address=:80
      --entrypoints.http.http.redirections.entrypoint.to=https
      --entrypoints.http.http.redirections.entrypoint.scheme=https
      --entrypoints.https.address=:443
      --entrypoints.https.http.tls.certresolver=le
      --entrypoints.https.http.tls.domains[0].main=example.com
      --entrypoints.https.http.tls.domains[0].sans=*.example.com
      --entrypoints.https.http.tls=true
      --global.checknewversion=false
      --global.sendanonymoususage=false
      --hub=false
      --log.level=DEBUG
      --pilot.dashboard=false
      --providers.docker=true
    environment:
      [ ... stuff for your ACME provider ... ]
    ports:
      # this assumes you just want to do simple port forwarding or something
      - 80:80
      - 443:443
      # - 8080:8080 uncomment if you want to hit port 8080 of this machine for the traefik gui
    working_dir: /data
    volumes:
      - ./persist:/data
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - traefik
    restart: unless-stopped
networks:
  traefik:
    name: traefik-expose-network
    external: true

Note that you’d have to create the traefik-expose-network manually for this to work, as that is how traefik will talk to your different services. You can get even fancier and set it up to expose your sites by default and auto-detect what to call them based on container name and stuff, but that is beyond the scope of a comment like this.
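
Creating it is a one-time command, assuming the network name from the examples above:

docker network create traefik-expose-network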

Technically my setup is a little more complex to allow for services on many different machines (so I don’t use the built-in docker provider), and to route everything from the internet through frp with proxy protocol so I don’t expose my home IP… I think this illustrates the point well regardless.


“Initial sync” isn’t a thing. Things only federate from communities after you subscribe to them. Old posts will make their way over if someone interacts with them (comments/votes on them). I think old comments may make their way over under the same conditions. Old votes will not make their way over, so your vote count on old posts will never be right.

You can search for a post or comment to force your instance to load it (copy the federation link, the rainbow-web-looking icon) just like you would do for communities. I think there are scripts out there that may automate this process to force your instance to load old content, but you’re putting more load on an already strained system.

And yes, lemmy.world is probably overloaded. Usually this just means that federation from it isn’t instant and may take a little time.


I actually just migrated things to a setup that is pretty neat using FRP: I run frps on 2 Linodes in the same datacenter and have set up IP sharing for failover between them (which is a neat feature Linode, Vultr, and probably a few others offer), and then I run 4 frpc instances, two for each frps, in case one of them breaks somehow. Lots of redundancy without all that much effort.
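
For the curious, the frpc side of that is a tiny config. A rough sketch of one proxy entry in the classic ini-style config (addresses, ports, and the token are made up; double-check the proxy protocol option name against the frp docs):

# frpc.ini
[common]
server_addr = 203.0.113.10    # the shared failover IP in front of the frps pair
server_port = 7000
token = some-long-shared-secret

[https-in]
type = tcp
local_ip = 127.0.0.1
local_port = 443
remote_port = 443
proxy_protocol_version = v2   # so the reverse proxy behind it still sees real client IPs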


Pictrs is a mix of both cache and long-lived images for anything you post, your profile image, etc. So you may want to back it up… Apparently there may also be some sort of internal database that pictrs uses, but I haven’t looked into that…

I would suggest taking postgres dumps and backing those up rather than trying to back up the data directory, as according to the pg docs when taking a disk-level backup “[t]he database server must be shut down in order to get a usable backup.”
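
The dump itself is a one-liner you can cron; this assumes a compose service named postgres and the default lemmy database user:

# logical backup of every database in the container; restore later with psql
docker compose exec -T postgres pg_dumpall -U lemmy > lemmy-db-$(date +%F).sql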


To answer what I think you are getting at, Lemmy scales based on two things:

  1. Database size (and write volume) scales mostly with what communities are being federated to you. Unless you are .world, the volume of remote content is going to massively outweigh local content. On my (mostly) single-user instance I have found this to be the same with Pictrs as well, as it is mostly eating storage to store federated thumbnails.
  2. Database read load scales mostly with the number of users you have. For a single-user instance this is pretty minimal. For an instance like .world (with thousands of users) I imagine it is significant, and that scaling Postgres with read-only replicas is how you’d handle that load.

~18 hours ago I wrote:

My instance has been running for 23 days, and I am pretty much the only active local user:

7.3G    pictrs
5.3G    postgres

I may have a slight Reddit Lemmy problem

As of right now:

7.5G    pictrs
5.7G    postgres

So my storage is currently growing at around 1G per day, though pictrs is mostly cached thumbnails so that should mostly level out at some point as the cache expires.

To answer your stated question: I run an instance on a mini PC with 32G of RAM (using <2G including all lemmy things such as pg, pictrs, etc and any OS overhead) and a quad core i5-6500T (CPU load usually around 0.3). You could probably easily run Lemmy on a Pi so long as you use an external drive for storage.


There are many ways to update dns automatically, I have used this container in the past. You could probably even write a bash script/cron job that checks your IP and updates it with curl depending on your DNS provider.
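
The cron version really is just a couple of curl calls; the update URL and parameters below are made up, since every provider’s API is different:

#!/bin/sh
# look up the current public IP, then push it to the DNS provider's update endpoint
IP=$(curl -fsS https://ifconfig.me)
curl -fsS "https://api.example-dns-provider.com/update?hostname=home.example.com&ip=${IP}&token=YOUR_TOKEN"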

If you are already running tailscale you may be interested in using a funnel, which lets you accept and route internet traffic to your tailnet. I don’t use tailscale so can’t comment on how good/bad/useful this is.

You could also run some sort of service like frp from some remote box (like a VPS in DO/Linode/etc). This or the funnel lets you not expose/advertise your home IP address if that is a consideration.


I’m talking purely from an ActivityPub/Activity Streams/Activity Vocabulary/JSON-LD perspective. There are some other local identifiers for things in Lemmy, but those do not matter for the purposes of federation. Any Object that is federated is expected to have an ID that is a URL at which you can make a GET request with the proper Accept header and you will get the latest version of that Object. AFAIK there is no provision for IDs to change.
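
For example, you can fetch the canonical copy of any federated object yourself (made-up comment URL here):

curl -H 'Accept: application/activity+json' https://lemmy.example.com/comment/123456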


Migration of ActivityPub stuff is pretty rough… Everything has an ID, and that ID is the URL, so the ID of the post you replied to is literally https://lemmy.nrd.li/comment/227095… AFAIK there are some (non-standard, at least not in core AP) ways you can mark things to be like “yeah, this moved to over here”, but that isn’t built in to the spec so whether those mechanisms actually work is a crapshoot.


Yeah, I think the problem comes if you don’t want to manually configure “Add-ons”. Using that feature is only supported on their OS or with a “Supervised” install. “Supervised” can’t itself be in a container AFAIK, only supports Debian 12, requires the use of NetworkManager, “The operating system is dedicated to running Home Assistant Supervised”, etc., etc.

My point is they heavily push you to use a dedicated machine for HASS.


Which is exactly why you should self-host. No one to blame but yourself when your instance goes down/away.

Sadly this idea doesn’t mesh well with how communities work given those are inherently tied to an instance, unlike e.g. hashtags on Mastodon. It would suck if some community goes away just because the instance admin got tired of running it.


Communities are inherently tied to the instance on which they are created and cannot be moved. If the instance is overloaded then that community will not federate properly. If the instance goes down nobody can post to the community. If the instance goes away that community goes away (except for the “cache” that other instances have).


Yeah… it is kinda hypocritical for this community to be based on .world, haha. There are plenty of people here running instances; who wants to volunteer as tribute and sign up to be on call?


If everything you want to run makes sense to do within k8s it is perfectly reasonable to run k8s on some bare-metal OS. Some things lend themselves to certain ways of running them better than others. E.g. Home Assistant really does not like to run anywhere but a dedicated machine/VM (at least last time I looked into it).

Regardless of k8s it may make sense to run some sort of virtualization layer just to make management easier. One panel you can use to access all of the machines in your k8s cluster at a console level can be pretty nice, and a Proxmox cluster would give you this. You can make a VM on a host that takes up basically all of the available RAM/CPU on it. Proxmox specifically has some built-in niceties with gluster (which I’ve never used, I manage gluster myself on bare metal), which could even be useful inside a k8s cluster for PVCs and the like.

If you are willing to get weird (and experimental), look into Rancher’s Harvester. It’s an HCI platform (similar to Proxmox or vSphere) that uses k8s as its base layer and even manages VMs through k8s APIs… I played with it a bit and it was really neat, but I opted for bare metal Ubuntu for my lab install (and actually moved from rke2 to k3s to Nomad to docker compose with some custom management/clustering over the course of a few years).


Yeah, it’s not automated or anything, I just pop an incognito window and use it when there is a community I think is worth seeing sometimes in “All” (or just for archiving purposes) but don’t want to clutter “Subscribed”. I may make something to auto-subscribe to communities meeting some criteria or something at some point in the future…