Never ask a man his pay, a woman her weight, or a data hoarder the contents of their stash.
Jk. Mostly.
I have a similar-ish setup to @Davel23, with a couple of cool use cases:
I seed the last 5 Arch and openSUSE (a few different flavors) ISOs at all times
I run an ArchiveBot for archive.org
I scan nontrivial mail (the paper kind) and store it in docspell for later OCR searches, tax purposes etc.
I help keep Sci-Hub healthy
I host several services for de-googling, including Nextcloud, Blocky, Immich, and Searxng
I run Navidrome, which has mostly (and hopefully will soon completely) replaced Spotify for my family.
I run Plex (hoping to move to Jellyfin sometime, but there’s inertial resistance to that), which has completely replaced Disney streaming, Netflix streaming, etc. for me and my extended family.
I host backups for my family and close friends with an S3 and WebDAV backup target
I run 4x14TB, 2x8TB, 2x4TB, all from serverpartdeals, in a ZFS RAID10 (striped mirrors) with two 1TB cache drives, so half of the spinning rust is usable, ~35TB, and right now I’m at 62% utilization. I usually expand at about 85%.
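For anyone curious what that layout looks like, here’s a rough sketch of the pool creation. The device names and the pool name tank are placeholders (in practice /dev/disk/by-id paths are the safer choice):

```
# Striped mirrors ("RAID10"): four two-disk mirror vdevs, striped together.
zpool create -o ashift=12 tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd \
  mirror /dev/sde /dev/sdf \
  mirror /dev/sdg /dev/sdh

# Add the two 1TB SSDs as L2ARC read cache.
zpool add tank cache /dev/sdi /dev/sdj
```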
My favorite city builder in decades. A few notes.
Pros:
Cons:
All in all, I highly recommend it, especially at the modest asking price. If you love city builders, charming and beautiful art, thematic settings, dynamic challenge, and solution engineering, this is a fantastic game for you.
Other games I’ve enjoyed that scratch similar itches:
Get it and have fun is my recommendation.
I’ve had great experiences with exactly one vendor of second-hand disks.
Currently running 8x14TB in a striped & mirrored zpool.
Yeah, you should be scrubbing weekly or monthly, depending on how often you are using the data. A scrub reads every block in the pool, verifies the checksums, and proactively repairs any errors it finds. Preventative maintenance, basically.
https://manpages.ubuntu.com/manpages/jammy/man8/zpool-scrub.8.html
Set that up in a cron job and check zpool status periodically.
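A minimal sketch of that cron entry, assuming a pool named tank (some distros already ship a monthly scrub job, so check before doubling up):

```
# /etc/cron.d/zfs-scrub: scrub on the 1st of every month at 02:00
0 2 1 * * root /usr/sbin/zpool scrub tank

# Then glance at progress/results now and then:
# zpool status tank
```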
No dedup is good. LZ4 compression is good. RAM to disk ratio is generous.
Check your disks’ sector size and vdev ashift. Modern multi-TB HDDs generally have a 4k physical block size, so you want ashift=12. If this is set improperly it can lead to massive write amplification, which will hurt throughput.
https://www.high-availability.com/docs/ZFS-Tuning-Guide/
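A quick way to check both, as a sketch (pool and device names are placeholders; zdb reads from the cache file, so treat it as a starting point):

```
# Physical vs logical sector size as reported by the drives
lsblk -o NAME,MODEL,PHY-SEC,LOG-SEC

# ashift actually in use, per vdev; look for "ashift: 12"
zdb -C tank | grep ashift

# Recent OpenZFS also exposes it as a pool property
zpool get ashift tank
```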
How about snapshots? Do you have a bunch of old ones? I highly recommend setting up a snapshot manager to prune snapshots down to just a working set (monthly keep 1-2, weekly keep 4, daily keep 6, etc.): https://github.com/jimsalterjrs/sanoid
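For reference, a minimal sanoid.conf sketch along those lines; the dataset name tank/data is a placeholder and the retention numbers are just the working set mentioned above:

```
[tank/data]
        use_template = production
        recursive = yes

[template_production]
        daily = 6
        weekly = 4
        monthly = 2
        yearly = 0
        autosnap = yes
        autoprune = yes
```

Run sanoid from cron or its systemd timer and it takes the snapshots and prunes anything outside that policy.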
And to parrot another insightful comment, I also recommend checking the disk health with SMART tests. In ZFS, as a drive begins to fail, the pool will get much slower as it constantly repairs the errors.
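smartctl (from smartmontools) covers this; a quick sketch with placeholder device names:

```
# Kick off a long self-test; it runs in the background on the drive itself
smartctl -t long /dev/sda

# Later, review the results and attributes like Reallocated_Sector_Ct,
# Current_Pending_Sector, and Offline_Uncorrectable
smartctl -a /dev/sda
```

smartd can run the tests on a schedule and email you when attributes start degrading.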
ZFS is a very robust choice for a NAS. Many people, myself included, as well as hundreds of businesses across the globe, have used ZFS at scale for over a decade.
Attack the problem. Check your system logs, htop, zpool status.
When was the last time you ran a zpool scrub? Is there a scrub, or other zfs operation in progress? How many snapshots do you have? How much RAM vs disk space? Are you using ZFS deduplication? Compression?
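If it helps, these are the quick checks I’d run to answer those questions; tank is a placeholder pool name and arc_summary may be packaged separately on your distro:

```
zpool status tank               # scrub/resilver in progress? degraded vdevs?
zfs list -t snapshot | wc -l    # how many snapshots are piling up
zfs get compression,dedup tank  # compression on, dedup (hopefully) off
free -h                         # RAM headroom
arc_summary | head -n 40        # ARC size and hit rates, if installed
```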
It can. Most people just use the filesystem watcher, but this looks nice. https://github.com/deathbybandaid/tdarr_inform
Hard disagree on them being the same thing. LLMs are an entirely different beast from traditional machine learning models. The architecture and logic are worlds apart.
Machine Learning models are "just" statistics. Powerful, yes, and with tons of useful applications, but really just statistics, generally using 1 to 10 variables to predict a handful of others.
LLMs are an entirely different thing, built using word vector matrices with hundreds or even thousands of variables, which are then fed into dozens or hundreds of layers of algorithms that each modify the matrix slightly, adding context and nudging the word vectors towards new outcomes.
Think of it like this: a word is given a massive chain of numbers that represents both the word and the “thoughts” associated with it, like the subject, tense, location, etc. This lets the model do math like: Budapest + Rome = Constantinople.
The only thing they share in common is that the computer gives you new insights.
You’re talking about two very different technologies, though, both confusingly called “AI” by overzealous marketing departments. The basic language recognition and regression model algorithms they ship today are “Machine Learning”, and fairly simple machine learning at that. This is generally the kind of thing we’re running on simple CPUs in realtime, so long as the model is optimized and pre-trained. What we’re talking about here is a Large Language Model, a form of neural network, the kind of thing that brings datacenter GPUs to their knees, with billions of parameters spread across tens of thousands of neurons per layer and dozens to hundreds of sequential layers.
It sounds like they’ve managed to simplify the network’s complexity and have done some tricks with caching while still keeping fair performance and accuracy. Not earth shaking, but a good trick.
I’ve had excellent luck with Docspell. https://github.com/eikek/docspell
For second-hand drives, I highly recommend https://serverpartdeals.com/
Pros:
Bulletproof
Simple
FOSS
Selfhosted
Cons:
Password/secrets manager nearly required to set up new devices
Fails to make my morning coffee
Yeah, no idea why. It reads like a basic character substitution algorithm using a one-time pad scheme.
I’m not super deep into cryptography, because it’s a whole field unto itself, with experts who can make your head spin in seconds. But this “novel” approach (going by the description in this article, which might be flawed) reads as neither novel nor secure.
I can’t access the DOI linked though, so I guess I’ll wait for more reliable coverage.
We need to research it to know more. That’s what this funding is for.
The reason green energy is usually brought into the conversation is that while some sequestration strategies require nearly zero energy input, many need a lot of it. What’s the point of cutting into the effectiveness of the solutions by emitting more greenhouse gases? At least in my case the sentiment here is genuine, no ulterior motives; it just makes sense. Can’t say the same for everyone, but big projects often make for strange bedfellows.
Green energy has had steady funding and advances for 30 years. Sequestration is largely still relegated to lifecycle studies and truly needs testing.
A lifecycle evaluation of a popular solution, with calls for more study: https://nap.nationalacademies.org/catalog/26278/a-research-strategy-for-ocean-based-carbon-dioxide-removal-and-sequestration (news blurb with summary here: https://www.nationalacademies.org/news/2021/12/new-report-assesses-the-feasibility-cost-and-potential-impacts-of-ocean-based-carbon-dioxide-removal-approaches-recommends-u-s-research-program)
Report to US Congress with worthy citations and feasibility findings. https://crsreports.congress.gov/product/pdf/R/R44902
An article from Yale featuring a good interview with a researcher, with lots of solid citations: https://e360.yale.edu/features/negative-emissions-is-it-feasible-to-remove-co2-from-the-air
There are more, but you get the gist. There’s a familiar pattern in these studies and interviews with scientists and academics: we need negative emissions, and every day we don’t have them we have even more work to do in the same time span. At the same time, we need to study this further, because geoengineering will likely have far-reaching impacts beyond the primary effect we’re after.
Some of these projects are as simple as reforestation and/or biochar sequestration into rich soils. Some are moonshots like molecular pumps and nanoparticle lattices (charmingly nicknamed the giant vacuum solution by the MSM today). But over and over, those studying it seem to agree we need more research and investment. That’s literally what is being announced in this article, yet everyone is acting like this money was ripped away from someone building a huge green energy plant. Realistically, that isn’t how funding for projects and research works.
We need to stop fighting over "green energy OR sequestration". It NEEDS to be AND. Trust the scientists who are asking for this.
Copying this from an earlier comment thread on the same topic.
Actually, this solves a very important problem. Even if we stop all pollution and carbon emissions today, the Earth will still be significantly hotter for the next thousand years or so. Enough that life will be more than uncomfortable: we’ll have massive water shortages, widespread desertification, and wholesale extinctions of many plants and animals.
We need carbon sequestration if we want to control the damage already done.
Dishonor on you! Dishonor on your cow!