[sdiy] Can anyone OCR the AN23.PDF File Here?

rsdio at audiobanshee.com rsdio at audiobanshee.com
Thu Jul 6 21:47:34 CEST 2017


Thanks for the thought, Joel, but a highly flawed OCR is actually worse than none at all. Bernie has already given a specific example of the disastrous results. I'm not saying that all of the text in the diagrams has to be converted, but the places in the main body of the text that refer to the schematics must be accurate, or else the circuits won't make any sense at all. I'm also not saying that the OCR has to be 100% perfect - nothing we humans create is ever totally perfect - but it absolutely has to be a lot better than "come-what-may" in quality.

By the way, I misused the term in my previous reply. It's not called "cloud sourcing" but is supposed to be "crowd sourcing." The idea is that getting the OCR right takes a lot of work, but that work could be spread out over several people instead of requiring one person to do thousands of pages. Think about the way a wiki works, or any other distributed system. Granted, it would be difficult to do this given the smaller number of electronics-savvy volunteers and the commercial nature of the project, but I present it as an example of how to think outside the box for solutions to a difficult problem.

Brian


On Jul 5, 2017, at 8:41 PM, Joel B wrote:
> Why not just scan, and do a come-what-may OCR just for full text indexing - if it picks up a diagram and thinks that is a word, who cares, maybe someone will find something cool via a typo they didn't expect. No human intervention, just index words to the page. Highly imperfect but still super useful.
> 



