Hi, I’m looking for some recommendations, mostly pointers on where to look and what to research, as I have no idea what is good and what is just well advertised.
Intro: I have finally entered the world of (almost) Gigabit internet, which is opening up options for what I can host.
I currently have:
I will probably also be upgrading my gaming PC in the next few months, so my current rig will probably be put behind the TV to use as a server and for couch gaming.
Info/recommendations I would like:
What scanner do you have? My biggest hurdle in making real use of paperless is the annoyance of using a flatbed that’s not within arm’s reach of my desk lol
ScanSnap iX1600. I bought mine from B&H: https://www.bhphotovideo.com/c/product/1615326-REG/fujitsu_pa03770_b635_scansnap_ix1600_document_scanner.html. There are two scanners that usually get recommended for paperless: this one, and a cheaper (but not as nice) Brother one.
It’s a really compact unit - smaller than I thought it’d be! You can put up to 50 sheets in the feeder and it scans them all, on both sides (no need to manually flip the pages). Can scan 40 pages per minute.
I’ve combined it with ASN (archive serial number) QR code stickers for documents that I need to keep a physical copy of. I’m using Avery 5267 stickers + Avery’s online designer site to design and print them. If I need to keep a physical copy of the document, I stick a sticker on the document, scan it, and Paperless automatically detects the QR code and sets the ASN. Then I keep all the physical copies in a binder, ordered by ASN. If I need to locate a physical document, I find it in Paperless, check the ASN, then go to the right document in the binder (easy to find the right place since they’re all in order).
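If you want to generate the label text yourself before feeding it into a label designer, here is a minimal Python sketch. It assumes the stickers encode the text `ASN` followed by a zero-padded number, which as far as I know is the default prefix Paperless-ngx looks for. The function name `asn_labels`, the 5-digit padding and the count of 80 labels per sheet are my own assumptions, not something from the setup described above.

```python
def asn_labels(start: int, count: int, prefix: str = "ASN", width: int = 5) -> list:
    """Return `count` consecutive ASN label strings, beginning at `start`.

    The prefix and padding width are assumptions; adjust them to match
    whatever your Paperless instance is configured to recognise.
    """
    if start < 1:
        raise ValueError("ASNs are expected to start at 1 or higher")
    if count < 0:
        raise ValueError("count must not be negative")
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + count)]


if __name__ == "__main__":
    # One line per sticker; 80 is an assumed number of labels per sheet.
    for label in asn_labels(start=1, count=80):
        print(label)
```

Each printed line is the text to encode as a QR code and to print next to it in human-readable form. For the next sheet, continue from the last number used (for example `start=81`) so no ASN is issued twice.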
There’s just a few minor issues with the scanner, but otherwise it’s perfect:
For everybody who doesn’t have that much paperwork: I’m doing more or less the same, but without barcode stickers. Just scan the document into Paperless and then stick it in a box or a folder. If you need the physical document at some point in the future (which you won’t), Paperless of course has the date of the scan / date of the document available. It’s then quite easy to go through your chronologically sorted documents and find the one that came in on 2023-04-14.
Interesting approach with the ASN — haven’t started using that feature yet. If I understand correctly, you add a QR ASN to each document you need to keep a physical copy of? And that sticker also has the ASN in human readable form? So you would then add many documents at once to the feeder, and Paperless will read the QR and also split documents whenever a new code appears?
What about documents you don’t want to keep physically? Is there a way to get Paperless to split them automatically as well if you add many to the feeder?
Yes! They look like this:
Paperless supports two different splitting methods:
so all you need to do is have a “Patch T” page between each document and it’ll split them automatically.
Docs: https://docs.paperless-ngx.com/advanced_usage/#document-splitting
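To make the separator-page idea concrete, here is a rough Python sketch of the splitting logic. This is only an illustration, not how Paperless implements it: it assumes each scanned page has already been reduced to the barcode text found on it (or `None`), that a separator page carries the text `PATCHT`, and that separator pages are dropped from the output. The function name `split_on_separators` is made up for this example.

```python
from typing import List, Optional


def split_on_separators(
    pages: List[Optional[str]], separator: str = "PATCHT"
) -> List[List[int]]:
    """Group page indices into documents, cutting at every separator page.

    pages[i] is the barcode text detected on page i, or None if the page
    has no barcode. Separator pages themselves are not kept.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, barcode in enumerate(pages):
        if barcode == separator:
            # A separator closes the current document, if it has any pages.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    # Eight scanned pages with separator sheets at positions 2 and 4.
    stack = [None, None, "PATCHT", None, "PATCHT", None, None, None]
    print(split_on_separators(stack))  # [[0, 1], [3], [5, 6, 7]]
```

So a stack of three documents needs only two separator sheets between them, and the same sheets can be reused for the next batch.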
I’m also using paperless-ai to automatically tag and set a title for scanned documents. Very useful. I’d love to run my own AI locally using ollama, but I don’t have good enough hardware so for now I’m using Google’s Gemini 2.0 Flash. I trust Google’s privacy policy far more than OpenAI’s, Google Gemini is very cheap, and if you use the paid version they don’t retain any of your data nor use it for training.
Thanks, this sounds really useful. Patch T sounds like some manual sorting work, but I guess with the option to reuse those separator pages it is still better than manual splitting or - worse - single scanning.
I haven’t looked into paperless-ai yet, but I hope my machine would be beefy enough for this task — worst case I guess it might take a little longer to process all docs.
Now I just need to decide on a good archiving method. I read an article a long time ago about the pros and cons of different document archiving methods used by professional archivists. Some prefer horizontal stacking in boxes, while others prefer vertical stacks in vertical boxes. Pretty interesting nerdy topic 😀
You need a GPU with a decent amount of VRAM to get LLMs working well locally. I don’t have a new enough GPU to be useful - my server just has the Intel iGPU, and my desktop PC only has a GTX 1080, which is from before Nvidia added Tensor cores for AI.
Thanks, I’ll look into it. For completionists: This is the article about how to properly archive paper: https://peelarchivesblog.com/2024/09/10/how-do-archivists-package-things-the-battle-of-the-boxes/