<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>You're right about that, Dave. You don't need to be able to search
every single word on every page. That is why a good taxonomical index
is the way to go.</p>
<p>Mike<br>
</p>
<br>
<div class="moz-cite-prefix">On 7/6/2017 4:32 PM, Dave Magnuson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:014101d2f697$031f4040$095dc0c0$@dmdrafting.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered
medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]-->
<style><!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"Segoe UI Light";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:"Segoe UI";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:"Lucida Console";
panose-1:2 11 6 9 4 5 4 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.m-7117098588547062152m6399548960396420844yiv0105428593spelle
{mso-style-name:m_-7117098588547062152m_6399548960396420844yiv0105428593spelle;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Is
it possible to leave the scans as images and just have
someone add some sort of metadata to each page instead?
Then you’d be searching through perhaps a dozen or two
“keywords” per page instead of the actual document text.
<o:p></o:p></span></p>
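<p class="MsoNormal">A rough way to picture the keyword approach: keep a small table of hand-entered keywords per scanned page and invert it, so that a search looks up pages by keyword. The sketch below is an illustration only. The page identifiers, the keywords, and the helper names (<code>build_index</code>, <code>search</code>) are all hypothetical and do not come from any existing index.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical hand-entered metadata: page identifier mapped to its keywords.
# The page names and keywords are invented for illustration.
PAGE_KEYWORDS = {
    "AN23-p1": ["op-amp", "feedback", "gain"],
    "AN23-p2": ["op-amp", "filter", "input resistor"],
    "AN24-p1": ["filter", "oscillator"],
}


def build_index(page_keywords):
    """Invert the page-to-keywords mapping into keyword-to-pages."""
    index = defaultdict(set)
    for page, keywords in page_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(page)
    return index


def search(index, *terms):
    """Return the sorted pages tagged with every one of the given terms."""
    if not terms:
        return []
    page_sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*page_sets))


INDEX = build_index(PAGE_KEYWORDS)
print(search(INDEX, "filter"))            # ['AN23-p2', 'AN24-p1']
print(search(INDEX, "op-amp", "filter"))  # ['AN23-p2']
```
</pre>
<p class="MsoNormal">The scans stay untouched as images; only the keyword table has to be typed in and kept up to date by hand.</p>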
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Just
a thought… I’ve had terrible luck with OCR myself, except
on the simplest of scans.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Dave
<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">
Synth-diy [<a class="moz-txt-link-freetext" href="mailto:synth-diy-bounces@synth-diy.org">mailto:synth-diy-bounces@synth-diy.org</a>] <b>On
Behalf Of </b>Bruno Afonso<br>
<b>Sent:</b> Thursday, July 6, 2017 3:42 PM<br>
<b>To:</b> Bernard Arthur Hutchins Jr; Rob Kam<br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:synth-diy@synth-diy.org">synth-diy@synth-diy.org</a><br>
<b>Subject:</b> Re: [sdiy] Can anyone OCR the AN23.PDF File
Here?<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Bernie, <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I'd be happy to have a go using open
source software such as Tesseract. I feel you cannot
tackle this problem without further training the
classifier on the nuances of this text and constraining
what the possibilities are. I'd like you to rescan a
representative item into TIFFs at 300 or 600 dpi. You
stated that the TIFFs would look the same, but that is not
true. The TIFF exports out of AN23.pdf do not have the
same quality as the original scanned images (likely TIFFs),
and this does not help.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Is the end goal to replace the
original image text with just text using a similar font?
For me, the most useful outcome would be to have most of
it OCR'ed so it's searchable. But again, you have not said what
you find acceptable or what your goal is. It's ok if you
don't know. You either propose this as a challenge that
will never be possible to accomplish (some teachers
never want to give students a top score) or you compromise
and propose a solution that enhances your original PDFs
with value worth money for most people. I find most
people would be happy keeping the original scanned text
and simply having it OCR'ed as well as possible for
their cursory searches. But I may see things differently
than most people :) In academic lingo, you should
provide some ground-truth examples of what you imagine
is a perfect conversion of AN23.pdf, an acceptable one, and one
not worth the time. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">You can never rely on ONE
volunteer, but you can certainly get many excited so
that, over time, something is accomplished as a group. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Cheers<o:p></o:p></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal">b<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Thu, Jul 6, 2017 at 3:11 PM
Bernard Arthur Hutchins Jr &lt;<a
href="mailto:bah13@cornell.edu" target="_blank"
moz-do-not-send="true">bah13@cornell.edu</a>&gt;
wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC
1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divtagdefaultwrapper">
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Thanks
Rob -<o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black">Really
makes my point, and I guess I should not rely on
volunteers! I don't blame you one bit - it just
does not work.<o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black">I
expect no one else wants to try either. If
anyone does, don't look at the crib below until
after you try. The errors are located and circled in
red. <o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><a
href="http://electronotes.netfirms.com/AN23Rob.PDF" target="_blank"
id="m_-7117098588547062152m_6399548960396420844LPlnk587417"
moz-do-not-send="true">http://electronotes.netfirms.com/AN23Rob.PDF</a><o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Please
all, let's agree that the OCR issue is bogus as
applied here.<o:p></o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Bernie<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<div>
<div class="MsoNormal" style="text-align:center"
align="center"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<hr size="2" align="center" width="98%"></span></div>
<div
id="m_-7117098588547062152m_6399548960396420844divRplyFwdMsg">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">
Rob Kam &lt;<a
href="mailto:robkam@ymail.com"
target="_blank" moz-do-not-send="true">robkam@ymail.com</a>&gt;<br>
<b>Sent:</b> Thursday, July 6, 2017 1:51 PM</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
</div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divtagdefaultwrapper">
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divRplyFwdMsg">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"><br>
<b>To:</b> Bernard Arthur Hutchins Jr<br>
<b>Cc:</b> <a
href="mailto:synth-diy@synth-diy.org"
target="_blank" moz-do-not-send="true">synth-diy@synth-diy.org</a><br>
<b>Subject:</b> Re: [sdiy] Can anyone OCR
the AN23.PDF File Here?</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
</div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divtagdefaultwrapper">
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divRplyFwdMsg">
<div>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7157">
<p class="MsoNormal"
style="background:white"><span
style="font-family:"Lucida
Console";color:black">Thanks for
the challenge Bernie but no thanks. I
don't have the patience to correct the
OCR.<br>
<br>
Rob<o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7155">
<p class="MsoNormal"
style="background:white"><span
style="font-family:"Lucida
Console";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7036">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7035">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7034">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7153">
<div class="MsoNormal"
style="text-align:center;background:white"
align="center"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<hr size="1" align="center"
width="100%"></span></div>
<p class="MsoNormal"
style="background:white"><b
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7151"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">From:</span></b><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
Bernard Arthur Hutchins Jr &lt;<a
href="mailto:bah13@cornell.edu"
target="_blank"
moz-do-not-send="true">bah13@cornell.edu</a>&gt;<br>
<b>To:</b> Rob Kam &lt;<a
href="mailto:robkam@ymail.com"
target="_blank"
moz-do-not-send="true">robkam@ymail.com</a>&gt;
<br>
<b>Cc:</b> "<a
href="mailto:synth-diy@synth-diy.org"
target="_blank"
moz-do-not-send="true">synth-diy@synth-diy.org</a>"
&lt;<a
href="mailto:synth-diy@synth-diy.org"
target="_blank"
moz-do-not-send="true">synth-diy@synth-diy.org</a>&gt;<br>
<b>Sent:</b> Thursday, 6 July
2017, 18:30</span><span
style="font-family:"Helvetica","sans-serif";color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divtagdefaultwrapper">
<div>
<div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7036">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7035">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7034">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7153">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><br>
<b>Subject:</b> Re: [sdiy] Can
anyone OCR the AN23.PDF File Here?</span><span
style="font-family:"Helvetica","sans-serif";color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844divtagdefaultwrapper">
<div>
<div>
<div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7036">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7035">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7034">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7033">
<p class="MsoNormal"
style="background:white"><span
style="font-family:"Helvetica","sans-serif";color:black"><o:p> </o:p></span></p>
<div
id="m_-7117098588547062152m_6399548960396420844yiv0105428593">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7032">
<div
id="m_-7117098588547062152m_6399548960396420844yiv0105428593divtagdefaultwrapper">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7161">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Thanks
Rob -<o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7162">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7163">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">True
- the equations are now
usable, but slightly more
blurred than my original
PDF. Likewise, the figures
are now OK but of slightly
lower quality, which does
NOT matter much for hand
drawings. <o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7305">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7306">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black">I
did note a lot of
OCR misreads in the text.
A careful proofing of the
text took me 18 minutes
and there are 25 errors,
some not at all obscure,
and for about 13 of them I
had to look at the
original to see what they
were supposed to be. (One
was hard to detect since
it substituted an Rf for
an Ri, a disaster.) A
full proofread/correction
would take at least
30 minutes (188 eight-hour
days for 6000 pages).
And I wrote this! Almost
certainly a volunteer
would have more trouble
and miss errors.<o:p></o:p></span></p>
</div>
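<p class="MsoNormal" style="background:white">The day counts in this thread can be checked with a few lines of arithmetic. This is only a sketch: the function name <code>working_days</code> is made up, and it assumes AN23 is a two-page note, so that 30 minutes for the note works out to 15 minutes per page. The thread does not state the page count; that reading is simply the one that reproduces the 188-day figure. The second call uses the 5 minutes/page estimate from the earlier message quoted further down.</p>
<pre>
```python
def working_days(pages, minutes_per_page, hours_per_day=8):
    """Convert a per-page correction time into full working days."""
    total_minutes = pages * minutes_per_page
    return total_minutes / 60 / hours_per_day


# Assumption: AN23 is two pages, so 30 minutes per note is 15 minutes per page.
print(working_days(6000, 15))  # 187.5
# The earlier estimate of 5 minutes per page.
print(working_days(6000, 5))   # 62.5
```
</pre>
<p class="MsoNormal" style="background:white">Rounded up, 187.5 days matches the 188 eight-hour days quoted here, and 62.5 working days is roughly three months.</p>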
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7236">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7235">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">In
the spirit of no good deed
going unpunished, Rob, let
me put you on the spot.
Take your scan, find and
fix the 25 errors. Let us
know how easy/hard this
was and the time it took,
and show your results. <o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7147">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7042">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">I
will post the "solution"
to the "find the errors"
this evening if I get the
chance.<o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7031">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7309">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Since
there is no improvement in
the figures/equations, and
the text is a serious
downgrade, tell me again
(anyone) why an OCR/ebook
is a good idea here. <o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7310">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7311">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Bernie<o:p></o:p></span></p>
</div>
<p class="MsoNormal"
style="margin-bottom:12.0pt;background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7230">
<div class="MsoNormal"
style="text-align:center;background:white"
align="center"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<hr size="2"
align="center"
width="98%"></span></div>
<div
id="m_-7117098588547062152m_6399548960396420844yiv0105428593divRplyFwdMsg">
<p class="MsoNormal"
style="background:white"><b><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">
Rob Kam &lt;<a
href="mailto:robkam@ymail.com"
target="_blank"
moz-do-not-send="true">robkam@ymail.com</a>&gt;<br>
<b>Sent:</b> Thursday,
July 6, 2017 7:24 AM<br>
<b>To:</b> Bernard
Arthur Hutchins Jr<br>
<b>Cc:</b> <a
href="mailto:synth-diy@synth-diy.org"
target="_blank"
moz-do-not-send="true">synth-diy@synth-diy.org</a><br>
<b>Subject:</b> RE:
[sdiy] Can anyone OCR
the AN23.PDF File Here?</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<o:p></o:p></span></p>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7314">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7229">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7228">
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7227">
<p class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">There’s
a second attempt at
<a
href="http://www.sdiy.info/AN23b.rtf"
target="_blank"
id="m_-7117098588547062152m_6399548960396420844LPlnk72125"
moz-do-not-send="true"><span
style="font-size:12.0pt;color:#1F497D;text-decoration:none">http://www.sdiy.info/AN23b.rtf</span></a>
converting the
equations to images
instead (and still
manually tweaking
the OCR). It took
six minutes to do
from the scan/PDF
and the text still
needs comparing and
correcting against
the original.</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7315">
<p class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"> </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yui_3_16_0_ym19_1_1499355715660_7317">
<p class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">There
are already experts
at this sort of
project at
Archive.org, who have
been doing this for
a number of years <a
href="https://archive.org/details/texts&tab=about" target="_blank"
id="m_-7117098588547062152m_6399548960396420844LPlnk637705"
moz-do-not-send="true"><span
style="font-size:12.0pt;color:#1F497D;text-decoration:none">https://archive.org/details/texts&tab=about</span></a>
</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
<div
style="margin-bottom:15.0pt;overflow:auto"
id="m_-7117098588547062152m_6399548960396420844LPBorder_GT_14993670741300.17156451145104">
<table
class="MsoNormalTable"
style="width:90.0%;background:white;border-top:dotted #C8C8C8
1.0pt;border-left:none;border-bottom:dotted
#C8C8C8
1.0pt;border-right:none"
cellspacing="0"
cellpadding="0"
width="90%"
border="1">
<tbody>
<tr>
<td
style="border:none;padding:0in
0in 0in 0in"
valign="top">
<div
id="m_-7117098588547062152m_6399548960396420844LPTitle_14993670741270.29065428092860457">
<p
class="MsoNormal"
style="margin-top:15.0pt;mso-line-height-alt:15.75pt"><span
style="font-size:16.0pt;font-family:"Segoe
UI
Light","sans-serif";color:#B31B1B"><a
href="https://archive.org/details/texts&tab=about" target="_blank"
moz-do-not-send="true"><span
style="text-decoration:none">Free Books : Download & Streaming :
eBooks and
Texts ...</span></a><o:p></o:p></span></p>
</div>
<div
style="margin-top:7.5pt;margin-bottom:12.0pt"
id="m_-7117098588547062152m_6399548960396420844LPMetadata_14993670741280.7021146677640517">
<p
class="MsoNormal"
style="margin-top:15.0pt;line-height:10.5pt"><span
style="font-size:10.5pt;font-family:"Segoe
UI","sans-serif";color:#666666"><a
href="http://archive.org"
target="_blank" moz-do-not-send="true">archive.org</a><o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844LPDescription_14993670741290.10828915176730747">
<p
class="MsoNormal"
style="margin-top:15.0pt;line-height:15.0pt"><span
style="font-size:10.5pt;font-family:"Segoe
UI","sans-serif";color:#666666">The Internet Archive
offers over
12,000,000
freely
downloadable
books and
texts. There
is also a
collection of
550,000 modern
eBooks that
may be
borrowed by
anyone ...<o:p></o:p></span></p>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><br>
<br>
<br>
To put my two
cents in, the synth
DIY community should
see whether they are
able to raise the
funds to compensate
(against unsold
hardcopy, <span
class="m-7117098588547062152m6399548960396420844yiv0105428593spelle">ebooks</span>
etc.) for releasing <span
class="m-7117098588547062152m6399548960396420844yiv0105428593spelle">Electronotes</span>
under a non-commercial
Creative Commons
licence <a
href="https://creativecommons.org/licenses/by-nc/2.0/uk/"
target="_blank"
moz-do-not-send="true">https://creativecommons.org/licenses/by-nc/2.0/uk/</a>
<o:p></o:p></span></p>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"> </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">Rob</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"> </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
<div>
<div
style="border:none;border-top:solid
#E1E1E1
1.0pt;padding:3.0pt
0in 0in 0in">
<div>
<p class="MsoNormal"
style="background:white"><b><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:black">From:</span></b><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:black">
Bernard Arthur
Hutchins Jr
[mailto:<a
href="mailto:bah13@cornell.edu"
target="_blank" moz-do-not-send="true">bah13@cornell.edu</a>] <br>
<b>Sent:</b> 06
July 2017 01:42<br>
<b>To:</b> Rob
Kam &lt;<a
href="mailto:robkam@ymail.com"
target="_blank" moz-do-not-send="true">robkam@ymail.com</a>&gt;; <a
href="mailto:mskala@ansuz.sooke.bc.ca"
target="_blank" moz-do-not-send="true">mskala@ansuz.sooke.bc.ca</a><br>
<b>Cc:</b> <a
href="mailto:synth-diy@synth-diy.org"
target="_blank" moz-do-not-send="true">synth-diy@synth-diy.org</a><br>
<b>Subject:</b>
Re: [sdiy] Can
anyone OCR the
AN23.PDF File
Here?</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
<div
id="m_-7117098588547062152m_6399548960396420844yiv0105428593divtagdefaultwrapper">
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black">Thanks
Rob - <o:p></o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black">But
manual identification at 5 minutes/page is
no good for the small improvement.
Still months of 8-hour days to
do 6000 pages. My PDF is still
much better already. The
equations are still unusable.
It makes the same text errors,
apparently. Why not just
say it can't do this? It wasn't
intended to. <o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Thanks
for trying -
useful data
point! <o:p></o:p></span></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
</div>
<div>
<div
style="margin-bottom:12.0pt">
<p class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">Bernie<o:p></o:p></span></p>
</div>
<div>
<div
class="MsoNormal"
style="text-align:center;background:white" align="center"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<hr size="2"
align="center"
width="98%"></span></div>
<div
id="m_-7117098588547062152m_6399548960396420844yiv0105428593divRplyFwdMsg">
<div>
<p
class="MsoNormal"
style="background:white"><b><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:black">From:</span></b><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:black">
Rob Kam &lt;</span><span
style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black"><a
href="mailto:robkam@ymail.com" target="_blank" moz-do-not-send="true"><span
style="font-size:11.0pt">robkam@ymail.com</span></a></span><span
style="font-size:11.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:black">&gt;<br>
<b>Sent:</b>
Wednesday,
July 5, 2017
6:47 PM<br>
<b>To:</b>
Bernard Arthur
Hutchins Jr; </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><a
href="mailto:mskala@ansuz.sooke.bc.ca" target="_blank"
moz-do-not-send="true"><span
style="font-size:11.0pt">mskala@ansuz.sooke.bc.ca</span></a></span><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:black"><br>
<b>Cc:</b> </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><a
href="mailto:synth-diy@synth-diy.org" target="_blank"
moz-do-not-send="true"><span
style="font-size:11.0pt">synth-diy@synth-diy.org</span></a></span><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:black"><br>
<b>Subject:</b>
RE: [sdiy] Can
anyone OCR the
AN23.PDF File
Here?</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<o:p></o:p></span></p>
</div>
<div>
<div>
<p
class="MsoNormal"
style="background:white"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"> <o:p></o:p></span></p>
</div>
</div>
</div>
<div>
<div>
<div>
<p
class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">Hi
Bernie,</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><o:p></o:p></span></p>
</div>
<div>
<p
class="MsoNormal"
style="background:white"><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D"><br>
At </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><a
href="http://www.sdiy.info/AN23.rtf" target="_blank"
id="m_-7117098588547062152m_6399548960396420844yiv0105428593LPlnk309394"
moz-do-not-send="true"><span style="font-size:11.0pt">http://www.sdiy.info/AN23.rtf</span></a></span><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">
this took 10
minutes to OCR
with </span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black"><a
href="https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiZhc6ZmPPUAhVG6RQKHRHpA1UQFggoMAA&url=http%3A%2F%2Fwww.abbyy.com%2Fen-gb%2Fsupport%2Ffinereader-12%2F&usg=AFQjCNHLOjsz219pjjTDqDytG2Cpm9N90w"
target="_blank" moz-do-not-send="true"><span
style="font-size:11.0pt;color:#1F497D;text-decoration:none">ABBYY
FineReader 12</span></a></span><span
style="font-size:11.0pt;font-family:"Arial","sans-serif";color:#1F497D">,
first manually
identifying
areas of text
vs. images.
Obviously it
still needs
further
corrections. <br>
<br>
Rob</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif";color:black">
<o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"
style="margin-bottom:12.0pt;background:white"><span
style="font-family:"Helvetica","sans-serif";color:black"><o:p> </o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal">_______________________________________________<br>
Synth-diy mailing list<br>
<a href="mailto:Synth-diy@synth-diy.org"
target="_blank" moz-do-not-send="true">Synth-diy@synth-diy.org</a><br>
<a
href="http://synth-diy.org/mailman/listinfo/synth-diy"
target="_blank" moz-do-not-send="true">http://synth-diy.org/mailman/listinfo/synth-diy</a><o:p></o:p></p>
</blockquote>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>