<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="color: rgb(0, 0, 0); font-family: Arial, Helvetica, sans-serif, EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji', EmojiSymbols, EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji', EmojiSymbols;" dir="ltr">
<div style="color: rgb(0, 0, 0);">
<div style="font-size: 10pt;">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> rsdio@audiobanshee.com &lt;rsdio@audiobanshee.com&gt;<br>
<b>Sent:</b> Thursday, July 6, 2017 4:15 PM<br>
<b>To:</b> Bernard Arthur Hutchins Jr<br>
<b>Cc:</b> synth-diy@synth-diy.org List<br>
<b>Subject:</b> Re: [sdiy] Can anyone OCR the AN23.PDF File Here?</font>
<div> </div>
</div>
</div>
<font>
<div class="PlainText" style="font-size: 10pt;"><br>
On Jul 5, 2017, at 5:21 PM, Bernard Arthur Hutchins Jr &lt;bah13@cornell.edu&gt; wrote:<br>
> Thanks Brian. Well there is no original scan. The original is typewritten text and pen drawings on white. It looks a GREAT deal like what you would get if you print out my 300 dpi PDF. The text is picked up quite well even from this scan of a copy. The figures
and the equations are a complete failure.<br>
I'm thinking like an engineer - sorry. At some point in the process of creating your 300 dpi PDF, there must have existed a digital bitmap of the original page. The fact that it was not saved as a separate file, distinct from the PDF, is a consequence of the
new, simplified world we live in now where everything is made accessible to non-engineers, and quality suffers because the details are hidden from us or lost.</div>
<div class="PlainText" style="font-size: 10pt;">**********************************************************</div>
<div class="PlainText" style="font-size: 10pt;">Please see comments to Bruno</div>
<div class="PlainText" style="font-size: 10pt;">***********************************************************</div>
<div class="PlainText" style="font-size: 10pt;"><br>
<br>
If you really want to make a fair challenge that a professional could live up to, then make a quality scan of those two original pages and submit it to the audience. Not all OCR software takes a PDF as an input, and as you complained in your response to Rob's
efforts, the quality of the drawings is degraded by printing the PDF and then scanning it a second time. That's called generational loss.</div>
<div class="PlainText" style="font-size: 10pt;">************************************************************</div>
<div class="PlainText" style="font-size: 10pt;">I wasn't complaining - I suggested printing it out and scanning it back in. This was because it would be comparable to what would happen if someone scanned a copy (generation) that was sold. </div>
<div class="PlainText" style="font-size: 10pt;">*******************************************************************</div>
<div class="PlainText" style="font-size: 10pt;"><br>
<br>
<br>
> Who would proofread the OCRs? Except for the 6000 pages (!!!) I might be able to do the job and the average reader here might be of some use. Ultimately, it gets down to a word-by-word and sometimes character-by-character comparison. What? 3 months of
8 hour days? Something like that. Sorry.<br>
I realize that this part of your response came before my original solution. Who would proofread the OCRs? A crowd-sourced group of people could both submit corrections and vet them. Admittedly, that would probably require coordinating software like a wiki that
we don't have now, and the group would probably be much smaller than most crowd-sourced efforts.</div>
<div class="PlainText" style="font-size: 10pt;">***************************************************</div>
<div class="PlainText" style="font-size: 10pt;">Sounds unlikely, given that I didn't even get two pages proofed. Do you want to try, even though I already posted the answers? (Ooooops! I just noticed - you tried and failed miserably.)</div>
<div class="PlainText" style="font-size: 10pt;">*********************************************************************<br>
<br>
<br>
> There have been plenty of volunteers for the automated scan and OCR steps. I'm wondering if it would be possible to use a sort of "cloud sourced" document review (to borrow some terminology) so that a handful of volunteers could submit corrections. Of course,
even with plenty of volunteers, this would still be a significant effort, if only in coordinating all the submissions to make sure they're benefitting the total effort.<br>
> ***********************************************************************************<br>
> All this for a clearer font? Do you really think so?<br>
> ***********************************************************************************<br>
The goal is not a clearer font, but more accurate text. The easy part is scanning the pages into digital format because it can be automated. The hard part is training the OCR and correcting any remaining mistakes. The latter part could be distributed so that
it's not all on one person.<br>
********************************************</div>
<div class="PlainText" style="font-size: 10pt;">The scanning is NOT easy, as I have said. How am I going to get anyone to proof the OCRs? That was hard for me - quite impossible for you, apparently. </div>
<div class="PlainText" style="font-size: 10pt;">********************************************************************<br>
<br>
> The least amount of effort is NOT doing it. Leave well-enough alone. ENWN-49.<br>
Fair enough. I think that what we're dealing with here is a group - myself included - who are used to doing what it takes to maintain vintage analog electronics for the sake of posterity, and who don't want to lose valuable historical information. Doing nothing
is certainly an option, but it's still disappointing.<br>
<br>
> Which is why someone needs to submit a business plan and put up money (or admit that it is NOT viable). The surest way to assure that a project will not get done is to NOT try to make money and rely on volunteers. Not my time, and certainly not my money.<br>
Also fair, but there are exceptions. Wikipedia is an example of what a huge number of volunteers can accomplish. Despite the vandalism, which wouldn't be possible if volunteers weren't part of the system, it still works because other volunteers clean up after
the vandals. What I'm suggesting would be not as open as Wikipedia, but could still be successful.<br>
<br>
> Did you note that my PDF is already searchable?<br>
Yes!<br>
<br>
There were only two or three mistakes in the OCR. One missed a space between two words, which created a fictitious word, "jumpingoff" (not likely to be a problem). Another OCR error turned "transconductance and can be seen" into "transconductand can be ance
seen" - looks like the OCR was confused by the hyphenated word and also somehow lost track of which line it was on in the middle of a sentence. If you double-click on "transconduct-" in the PDF, Acrobat will highlight "and" on the second line instead of "ance"
- this is the kind of cleanup that is important, but would take a lot of time.<br>
***********************************************************</div>
<div class="PlainText"><span style="font-size: 10pt;">I'm astounded now. You didn't even try, did you, Brian? There are 25 errors, half of them quite serious. Read my solution - and then apologize. I am very much afraid your carelessness is typical of
what I would get from volunteers. You should have done your homework and THEN spoken up. </span><span style="font-size: 13.3333px;"> </span></div>
<div class="PlainText" style="font-size: 10pt;"><span style="font-size: 10pt;"><br>
</span></div>
<div class="PlainText" style="font-size: 10pt;"><span style="font-size: 10pt;">Bernie</span></div>
<div class="PlainText" style="font-size: 10pt;"><br>
Brian<br>
<br>
</div>
</font></div>
</div>
</body>
</html>