This post was submitted on 18 Dec 2023

Selfhosted

I'm looking for a service I could install to archive a huge pile of letters, preferably as PDFs, in a database. I live in a country where paper is still king and digital services are either nonexistent or loathed (Germany). Right now I have a mailbox with PDFs scattered all over the place, plus many folders of paper letters going back to 2007 that I have to keep, and that I also have to dig out every five years or so.

So what I'd like is a service in my homelab where I can put these scans and copied files, and that would index, clean up, and OCR them and all that good stuff. It should have really good metadata features, because my files are usually named quite randomly; being able to copy them in and quickly categorize them would be really awesome.

There is one service, Papermerge, that kind of fits my use case. I spent an afternoon with it and ran into a few issues:

  • it crashes quite often
  • when I send it a large folder of PDFs, it maxes out the CPU and crashes again
  • the categorization features are weak; it takes a long time to get everything organized and cleaned up

This might not be very interesting if your country has digital services for everything, but for those of us suffering through this paper madness, a service like this would be great.

zzzz@lemmy.world, 10 months ago (2 replies)

Nomad64@lemmy.world, 10 months ago

I second Paperless NGX. I have been using it for a few years, and it has been working great!

angelsomething@lemmy.one, 10 months ago

This is the correct and only answer :)

tofubl@discuss.tchncs.de, 10 months ago (1 reply)

Paperless-NGX

This ticks all your boxes. It's really good.

tofubl@discuss.tchncs.de, 10 months ago, last edited 10 months ago (1 reply)

The killer feature for me is my networked scanner scanning directly to the Paperless consume Samba share, and the documents just popping up in the inbox fully OCR'd and pre-categorised. Pretty magical.

NB: the docs make it sound like a proper DB is optional, but it really isn't. Performance was iffy for me with SQLite but is rock solid with Postgres.

pimeys@lemmy.nauk.io, 10 months ago (1 reply)

This was it for me: I installed paperless-ngx, set it up to scan my email folders, copied all the random PDFs from my "organized" tax folder, and scanned the rest.

Too bad I happen to have a Brother printer/scanner without SMB or FTP support, so I have to scan on my computer first and then upload.
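The scan-then-upload step can at least be scripted if the Paperless consume folder is mounted on the computer (for example as the Samba share mentioned above). Below is a minimal Python sketch, not something Paperless ships: `push_scans` and both directory paths are hypothetical. It moves finished PDFs from a local scan folder into the consume folder, copying each one under a temporary `.part` name first and renaming it afterwards. That relies on the assumption that the consumer skips files without a document extension, so it never sees a half-copied PDF; check this against your own consumer settings.

```python
import shutil
from pathlib import Path


def push_scans(scan_dir: Path, consume_dir: Path) -> list[Path]:
    """Move finished PDF scans from scan_dir into the Paperless consume folder.

    Returns the paths created in consume_dir. Files whose name already exists
    in consume_dir are left in scan_dir untouched.
    """
    pushed: list[Path] = []
    # Scanners often write ".PDF", so match the extension case-insensitively.
    pdfs = sorted(
        p for p in scan_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    for pdf in pdfs:
        target = consume_dir / pdf.name
        if target.exists():
            # Don't overwrite a document the consumer hasn't picked up yet.
            continue
        tmp = consume_dir / (pdf.name + ".part")
        shutil.copyfile(pdf, tmp)  # copy content only, under a non-PDF name
        tmp.replace(target)        # then rename it into place in one step
        pdf.unlink()               # delete the local scan only after the copy landed
        pushed.append(target)
    return pushed
```

Run it by hand after a scanning session or from a cron job, e.g. `push_scans(Path.home() / "Scans", Path("/mnt/paperless-consume"))` (both paths are placeholders). The rename is a single step on a local POSIX filesystem; on SMB or NFS mounts that guarantee may be weaker.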

timbuck2themoon@sh.itjust.works, 10 months ago (1 reply)

If it can send email, you can send the scans to an address that Paperless checks, and it will grab and archive them automatically.
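For scans that end up on a computer rather than being sent from the printer, the same mail route can be scripted with Python's standard library. This is a sketch under assumptions: Paperless already has a mail account and a mail rule that consumes PDF attachments from the target mailbox, and the addresses, SMTP host, and credentials are placeholders. `build_scan_email` and `send_scan` are hypothetical helper names.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_scan_email(pdf_path: Path, sender: str, recipient: str) -> EmailMessage:
    """Wrap one scanned PDF in an email that a Paperless mail rule can pick up."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Scan: {pdf_path.stem}"
    msg.set_content("Scanned document attached.")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(
        pdf_path.read_bytes(),
        maintype="application",
        subtype="pdf",
        filename=pdf_path.name,
    )
    return msg


def send_scan(msg: EmailMessage, host: str, user: str, password: str, port: int = 587) -> None:
    """Send the message over SMTP with STARTTLS (host and credentials are placeholders)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Usage, with placeholder values: `send_scan(build_scan_email(Path("scan.pdf"), "me@example.org", "paperless-inbox@example.org"), "smtp.example.org", "me@example.org", "app-password")`. Building and sending are split so the message can be inspected or tested without a mail server.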

pimeys@lemmy.nauk.io, 10 months ago

I've been digging into this printer's settings and, sadly, the only way it can send anything is as a fax... It's the entry-level model and has served us very nicely for years. It even connects to the internet, but it lacks features such as email, SMB, or FTP. To me this looks like something open source firmware could fix. It may have enough processing power to run a lightweight Linux distribution, so installing one that enables modern protocols doesn't seem impossible.

Kata1yst@kbin.social, 10 months ago

I've had excellent luck with Docspell. https://github.com/eikek/docspell

DarkDarkHouse@lemmy.sdf.org, 10 months ago

rentar42@kbin.social, 10 months ago (1 reply)

Note that even if everything is digital, something like this is still necessary: if you depend on your service provider to keep all your records, you will be out of luck once they stop liking you, go out of business, have a technical malfunction, decide they no longer want to keep records older than X years, ...

So even in an all-digital world I'd still keep all the PDF artifacts in something like that.

I also second the suggestion of paperless-ngx (I haven't been using it for very long, but it's working great so far).

pimeys@lemmy.nauk.io, 10 months ago

Of course. My current setup is a Proxmox server plus a NAS. I'm planning to install a service for this on Proxmox and have the files synced over NFS to the NAS, which backs them up to Backblaze every night. And of course I still need to keep the paper copies, but being able to search, tag, and archive the documents is great when you need to recall some thing X that was mentioned in a letter I got back in 2014.

padook@feddit.nl, 10 months ago

I've been pretty happy with paperless-ngx; it should tick all your boxes.

nutshell7827@lemmy.world, 10 months ago