[sdiy] Can anyone OCR the AN23.PDF File Here?
rsdio at audiobanshee.com
Sun Jul 2 07:25:04 CEST 2017
Hi Bernie. Thanks for everything.
I think that you'll have to provide the original bitmap image of the AN23 scan if anyone is going to get a better OCR of it. I'm not sure that anyone can make improvements from the resulting PDF (but I might be missing something).
By the way, never use JPEG compression or JPEG-based file formats to store line drawings or text, because JPEG adds artifacts that ruin the image. JPEG is great for color photographs, but it's horrible for pure black and white with sharp edges, as found in schematics and text. Those artifacts make OCR far more difficult than it already is by nature. It's best to use a high scan resolution and a lossless compression scheme like the ones designed for fax. I think that someone else has already mentioned these guidelines earlier in the thread. From this high-resolution master, OCR is easier to pull off, and the final PDF can then be produced at a reduced resolution.
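To illustrate why fax-style lossless schemes suit black-and-white scans, here is a minimal sketch of plain run-length coding, the idea that the fax codes build on. It is not an implementation of any actual fax standard; the function names and the sample scan line are made up for the example.

```python
def rle_encode(row):
    """Encode a row of 0/1 pixels as run lengths.

    Assumed convention for this sketch: the first run is always white (0),
    so a row that starts with black gets a leading run of length 0.
    """
    runs = []
    current = 0
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def rle_decode(runs):
    """Rebuild the exact pixel row from the run lengths."""
    row = []
    value = 0
    for length in runs:
        row.extend([value] * length)
        value = 1 - value
    return row


# Hypothetical 100-pixel scan line crossing two thin black strokes.
row = [0] * 40 + [1] * 3 + [0] * 20 + [1] * 3 + [0] * 34
runs = rle_encode(row)
print(runs)                     # [40, 3, 20, 3, 34]
assert rle_decode(runs) == row  # lossless: every edge is preserved exactly
```

A hundred pixels collapse to five numbers, and the decoded row is identical to the input, so the sharp edges that OCR depends on survive untouched.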
I worked on OCR back in the nineties. At that time, the software could automate the division of a page into columns and divide text from images. There was also the ability to give hints to the OCR system about where the columns should fall and what regions contained text. An important feature of the OCR software that I used was that it would ask for human help with certain characters that were difficult to discern, and it would learn as you went along. If it was confused between "cl" and "d" or "rn" versus "m" then you'd get a bitmap of the problem part of the input image, and then you could type in the correct characters. Over time, the software got smarter about the particular artifacts of your document.
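The ask-and-learn loop described above can be sketched roughly as follows. This is a text-level simplification, not how that nineties software worked (it operated on bitmaps of the problem glyphs); the class name, the vocabulary, and the `ask` callback standing in for the human operator are all assumptions made for the example.

```python
# Glyph sequences that are easily mistaken for one another, per the text above.
CONFUSABLE = ("cl", "d", "rn", "m")


class AssistedCorrector:
    """Hypothetical helper: asks a human once per doubtful word, then remembers."""

    def __init__(self, vocabulary, ask):
        self.vocabulary = set(vocabulary)  # words accepted without question
        self.ask = ask                     # callback standing in for the operator
        self.learned = {}                  # OCR output -> operator's correction

    def correct(self, word):
        if word in self.vocabulary:
            return word
        if word in self.learned:
            return self.learned[word]      # answered before, no need to ask again
        if any(seq in word for seq in CONFUSABLE):
            answer = self.ask(word)
            self.learned[word] = answer
            return answer
        return word


questions = []

def operator(word):
    # Simulated human: records what was asked and types the right word.
    questions.append(word)
    return {"moclern": "modern"}.get(word, word)

corrector = AssistedCorrector({"the", "modern", "circuit"}, operator)
result = [corrector.correct(w) for w in "the moclern circuit is moclern".split()]
print(result)      # ['the', 'modern', 'circuit', 'is', 'modern']
print(questions)   # ['moclern']
```

The operator is consulted only once for "moclern"; the second occurrence is fixed from memory, which is the sense in which the software gets smarter about one document's particular artifacts.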
These days, I've only briefly used the Adobe PDF tools for OCR, and I'm not sure how to access these highly detailed aspects of the process. I fear that OCR has been made easier by hiding the details, but that the quality might suffer as a result. It might be worth investing in training or in better software that can handle the task.
Brian
On Jul 1, 2017, at 9:34 PM, Bernard Arthur Hutchins Jr <bah13 at cornell.edu> wrote:
> Here is a brand new Electronotes webnote with regard to digital conversion. It includes a test PDF scan of an app note (AN23) with my failure to get anything usable barring extensive editing. Can anyone get an acceptable automated conversion?
>
> http://electronotes.netfirms.com/ENWN49.pdf
>
> Thanks -Bernie
>