New to self-hosting

@lucid@lemmy.dbzer0.com

What scanner do you have? My biggest hurdle in making real use of paperless revolves around the annoyance of using a flatbed that’s not within arm’s reach of my desk lol

@dan@upvote.au

ScanSnap iX1600. I bought mine from B&H: https://www.bhphotovideo.com/c/product/1615326-REG/fujitsu_pa03770_b635_scansnap_ix1600_document_scanner.html. There’s two scanners that usually get recommended for paperless: this one, and a cheaper (but not as nice) Brother one.

It’s a really compact unit - smaller than I thought it’d be! You can put up to 50 sheets in the feeder and it scans them all, on both sides (no need to manually flip the pages). Can scan 40 pages per minute.

I’ve combined it with ASN (archive serial number) QR code stickers for documents that I need to keep a physical copy of. I’m using Avery 5267 stickers + Avery’s online designer site to design and print them. If I need to keep a physical copy of the document, I stick a sticker on the document, scan it, and Paperless automatically detects the QR code and sets the ASN. Then I keep all the physical copies in a binder, ordered by ASN. If I need to locate a physical document, I find it in Paperless, check the ASN, then go to the right document in the binder (easy to find the right place since they’re all in order).
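A minimal sketch of how the text for a sheet of these stickers could be generated before it goes into a label designer. It assumes the default `ASN` barcode prefix, a sheet of 80 labels in 4 columns (check your own pack), and a zero-padded five-digit number, which is only a cosmetic choice. The function names are hypothetical, and turning the strings into actual QR images is left to the label tool.

```python
LABELS_PER_SHEET = 80  # assumption: Avery 5267 sheet, 4 columns x 20 rows
COLUMNS = 4
ASN_PREFIX = "ASN"     # assumption: default ASN barcode prefix is unchanged


def asn_payloads(start, count, width=5):
    """Return the strings to encode as QR codes, e.g. 'ASN00001'."""
    return [f"{ASN_PREFIX}{n:0{width}d}" for n in range(start, start + count)]


def sheet_rows(payloads, columns=COLUMNS):
    """Group payloads into rows, in print order for one label sheet."""
    return [payloads[i:i + columns] for i in range(0, len(payloads), columns)]


sheet = asn_payloads(start=1, count=LABELS_PER_SHEET)
rows = sheet_rows(sheet)

# Preview the first two rows of the sheet.
for row in rows[:2]:
    print("  ".join(row))
```

For the next sheet, call `asn_payloads(start=81, count=LABELS_PER_SHEET)` so the numbers never repeat; the binder order only works if every ASN is unique.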

There’s just a few minor issues with the scanner, but otherwise it’s perfect:

It was a bit expensive, at $400 in the USA.
You need a Windows or macOS system to do the initial setup. Setting it up is done through a desktop app rather than through the touchscreen on the device.
Some of the options need a computer connected to the scanner via USB, or signing up to their cloud service. However, it does support scanning to an SMB share without a computer connected, which is all I needed. I have my paperless-ngx “consume” directory shared via Samba. You just need to delete the default scanning profiles and add a network scan (SMB) one.

@Obelix@feddit.org

For everybody who doesn’t have that much paperwork: I’m kind of doing the same, but without barcode stickers. Just scan the document into paperless and then stick it in a box or a folder. If you need the physical document sometime in the future (which you won’t), paperless of course has the date of the scan / date of the document available. It is then quite easy to take your chronologically sorted documents and find the one that came in on 2023-04-14.

@BennyInc@feddit.org

Interesting approach with the ASN — haven’t started using that feature yet. If I understand correctly, you add a QR ASN to each document you need to keep a physical copy of? And that sticker also has the ASN in human readable form? So you would then add many documents at once to the feeder, and Paperless will read the QR and also split documents whenever a new code appears?

What about documents you don’t want to keep physically? Is there a way to get Paperless to split them automatically as well if you add many to the feeder?

@dan@upvote.au

And that sticker also has the ASN in human readable form?

Yes! They look like this:

So you would then add many documents at once to the feeder, and Paperless will read the QR and also split documents whenever a new code appears? What about documents you don’t want to keep physically? Is there a way to get Paperless to split them automatically as well if you add many to the feeder?

Paperless supports two different splitting methods:

If it encounters an ASN QR code, it’ll split at that point and keep the page with the barcode
If it encounters a special barcode that’s used as a separator sheet, it’ll split at that point and delete the page with the barcode. By default it looks for a “Patch T” barcode, and you can get a page with a Patch T barcode from https://www.alliancegroup.co.uk/patch-codes.htm

So all you need to do is have a “Patch T” page between each document and it’ll split them automatically.

Docs: https://docs.paperless-ngx.com/advanced_usage/#document-splitting
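To make the two splitting rules concrete, here is a toy model of a scanned batch. This is not Paperless code, just a sketch of the behaviour described above: it assumes the barcode on each page has already been read, that ASN codes start with `ASN`, and that the separator sheet decodes to `PATCHT`. The page names and the `split_batch` function are made up for illustration.

```python
ASN_PREFIX = "ASN"    # assumption: default ASN barcode prefix
SEPARATOR = "PATCHT"  # assumption: default separator barcode value


def split_batch(pages):
    """pages: list of (page_name, barcode_or_None). Returns a list of documents.

    Each document is a dict with 'asn' (int or None) and 'pages' (list of names).
    """
    documents = []
    current = None
    for name, barcode in pages:
        if barcode == SEPARATOR:
            # Separator sheet: end the current document and drop this page.
            current = None
            continue
        if barcode and barcode.startswith(ASN_PREFIX):
            # ASN sticker: start a new document and keep this page.
            current = {"asn": int(barcode[len(ASN_PREFIX):]), "pages": []}
            documents.append(current)
        elif current is None:
            # First page after a separator (or first page of the batch).
            current = {"asn": None, "pages": []}
            documents.append(current)
        current["pages"].append(name)
    return documents


batch = [
    ("letter-1", "ASN00012"),
    ("letter-2", None),
    ("separator", "PATCHT"),
    ("bill-1", None),
    ("bill-2", None),
    ("contract-1", "ASN00013"),
    ("contract-2", None),
]
docs = split_batch(batch)
for doc in docs:
    print(doc["asn"], doc["pages"])
```

In this batch the letter and the contract are split off by their ASN stickers and keep every page, while the bill (no sticker, nothing to keep physically) is split off by the separator sheet, which itself is discarded.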

I’m also using paperless-ai to automatically tag and set a title for scanned documents. Very useful. I’d love to run my own AI locally using ollama, but I don’t have good enough hardware so for now I’m using Google’s Gemini 2.0 Flash. I trust Google’s privacy policy far more than OpenAI’s, Google Gemini is very cheap, and if you use the paid version they don’t retain any of your data nor use it for training.

@BennyInc@feddit.org

Thanks, this sounds really useful. Patch T sounds like some manual sorting work, but I guess with the option to reuse those separator pages it is still better than manual splitting or - worse - single scanning.

I haven’t looked into paperless-ai yet, but I hope my machine would be beefy enough for this task — worst case I guess it might take a little longer to process all docs.

Now I only still need to decide on a good archiving method. I read some article a long time ago about the pros and cons of different document archiving methods used by professional archivers. Some prefer horizontal stacking in boxes, while others prefer vertical stacks in vertical boxes. Pretty interesting nerdy topic 😀

@dan@upvote.au

I haven’t looked into paperless-ai yet, but I hope my machine would be beefy enough for this task

You need a GPU with a decent amount of VRAM to get LLMs working well locally. I don’t have a new enough GPU to be useful - my server just has the Intel iGPU, and my desktop PC only has a GTX1080, which is from before Nvidia added Tensor cores for AI.
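As a rough back-of-the-envelope check on what “a decent amount of VRAM” means, the memory for the model weights alone is roughly parameter count times bytes per weight. The sketch below only does that arithmetic; the model sizes and quantisation levels are example values, and real usage is higher because of context (KV cache) and runtime overhead.

```python
def weights_gib(params_billion, bits_per_weight):
    """Memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Example sizes only: (parameters in billions, bits per weight).
for params, bits in [(7, 16), (7, 4), (13, 4)]:
    print(f"{params}B at {bits}-bit: ~{weights_gib(params, bits):.1f} GiB for weights")
```

Compare the result against the VRAM on your card, leaving a few GiB of headroom; anything that doesn’t fit spills to system RAM and gets much slower.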

@BennyInc@feddit.org

Thanks, I’ll look into it. For completionists: This is the article about how to properly archive paper: https://peelarchivesblog.com/2024/09/10/how-do-archivists-package-things-the-battle-of-the-boxes/
