this post was submitted on 03 Oct 2023
Technology
I mean, I'd really have to disagree, but that's fine.
The effort involved in deconstructing a book, batching it through a document scanner, and compiling it with OCR into an e-book-compatible format is not trivial. Most consumer-grade OCR software isn't even that great at recognizing words, new lines, symbols, and hyphenated or line-broken words, let alone chapters, indexes, footnotes, etc. It's just not worthwhile for what it produces in the end, and there are millions more print titles than there are movie and show titles.
On the other hand, with A/V there's almost always a way to pass playback through a virtual media capture device. Worst-case you have to wait the real run-time in order to capture it, but at the end you at least have a near-original quality file.
If tomorrow all e-books got locked down without a means to strip DRM, I don't think anyone outside of historical archivists would start spending their time manually cataloguing copyrighted hard-copy books to distribute freely. Best case, only the most in-demand books would justify that amount of effort, and certainly not enough books to sustain a digital library worth frequenting.
Historically speaking, people have gone to the trouble of manually digitizing hard copy books to distribute freely. There were digital copies of print books available online (if you knew where to look) before e-books were officially available for sale in any form. That includes mass-market novels as well as items of interest to historians. Ergo, your scepticism seems entirely unjustified.
OCR is far from perfect (though editing OCR output is generally faster than retyping), but even without it we have the storage and bandwidth these days to distribute full books as stacks of images if needed, without converting them to text. The same way people distribute scans of comics/manga.
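To give a rough idea of why editing OCR output can beat retyping: some of the cleanup (words hyphenated across line breaks, hard-wrapped lines) can be scripted. Below is a minimal sketch in Python, assuming the OCR tool hands you plain text with one newline per scanned line and a blank line between paragraphs. The function names are made up for illustration and don't come from any OCR package.

```python
import re

# Hypothetical helpers for illustration; not part of any OCR library.
# Matches a word character, a hyphen at end of line, then a lowercase
# letter starting the next line.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\n[ \t]*([a-z])")


def rejoin_hyphenated(text: str) -> str:
    """Merge words split by an end-of-line hyphen, e.g. 'recog-\\nnizing'.

    Caveat: a genuine compound that happens to break at its hyphen
    ('line-\\nbroken') is merged too, so a human pass is still needed.
    """
    return _HYPHEN_BREAK.sub(r"\1\2", text)


def unwrap_lines(text: str) -> str:
    """Replace single newlines with spaces; keep blank-line paragraph breaks."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


def clean_ocr_page(text: str) -> str:
    """Run both cleanup steps on one page of OCR text."""
    return unwrap_lines(rejoin_hyphenated(text))
```

Something like `clean_ocr_page(page_text)` would run once per scanned page before proofreading. It does nothing for misread characters, chapters, or footnotes, which still need manual work.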