this post was submitted on 11 Mar 2025
Technology
The accuracy rate of even the best OCR software is far, far too low for a wide array of potential use cases.
Let's say I have an archive of a few thousand scientific papers. These are neatly formatted digital documents, not even scanned images (though "scanned images" would be within scope of this task and should not be ignored). Even for that, there's nothing out there that can produce reliably accurate results. Everything requires painstaking validation and correction if you really care about accuracy.
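To put a number on "accuracy" here, one common yardstick is character error rate (CER): the edit distance between a hand-verified reference and the OCR output, divided by the reference length. Below is a minimal sketch in plain Python, not tied to any particular OCR tool. The function names and the sample strings are made up for illustration (the "m" read as "rn" confusion is a typical kind of OCR slip).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalised by the length of the trusted reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Hypothetical example: the OCR engine reads "m" as "rn" in two places.
reference = "E = mc^2 holds in the rest frame."
ocr_output = "E = rnc^2 holds in the rest frarne."
cer = character_error_rate(reference, ocr_output)
```

The catch is that this needs a trusted reference for every page, which is exactly the painstaking part. And small rates add up: a 1% CER on a paper of 50,000 characters still means roughly 500 wrong characters, any of which could sit inside a formula or a table cell.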
Even arXiv can't do a perfect job of this. They launched their "beta" HTML converter a couple of years ago. Improving accuracy and reliability is an ongoing challenge. And that's with the help of LaTeX source material! It would naturally be much, much harder if they had to rely solely on the PDFs generated from that LaTeX. See: https://info.arxiv.org/about/accessible_HTML.html
As for solving this problem with "AI"...uh...well, it's not like "OCR" and "AI" are mutually exclusive terms. OCR tools have been using neural networks for a very long time already; it just wasn't a buzzword back then, so nobody called it "AI". However, in the current landscape of "AI" in 2025, "accuracy" is usually just a happy accident. It doesn't need to be that way, and I'm sure the folks behind commercial and open-source OCR tools are hard at work implementing new technology in a way that Doesn't Suck.
I've played around with various vision-language (VL) models and they still seem to be in the "proof of concept" phase.