[sdiy] Can anyone OCR the AN23.PDF File Here?
Joel B
onephatcat at earthlink.net
Thu Jul 6 05:41:41 CEST 2017
Why not just scan it and run a come-what-may OCR purely for full-text indexing? If it picks up a diagram and thinks it is a word, who cares; maybe someone will find something cool via a typo they didn't expect. No human intervention, just index the words to the page. Highly imperfect, but still super useful.
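A minimal sketch of the "index words to the page" idea in Python. It assumes the OCR engine has already produced one plain-text string per page. The OCR step itself, the sample page text, and the names `build_page_index` and `lookup` are hypothetical illustrations, not any particular tool's API. Misread diagram fragments are indexed like any other word, with no cleanup.

```python
import re
from collections import defaultdict


def build_page_index(pages):
    """Map each lowercased word to the sorted 1-based page numbers it appears on.

    `pages` is assumed to be raw, uncorrected OCR text, one string per page.
    Garbage tokens (e.g. a diagram misread as text) are indexed like any
    other word; nothing is filtered or hand-corrected.
    """
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        # Split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


def lookup(index, word):
    """Return the pages containing `word` (case-insensitive), or an empty list."""
    return index.get(word.lower(), [])


if __name__ == "__main__":
    # Hypothetical OCR output for a three-page scan.
    pages = [
        "State-variable filter using an op-amp integrator",
        "Figure 3: l|-|~ 0pamp ~~ R1 C1",  # a diagram misread as text
        "The integrator gain sets the filter frequency",
    ]
    idx = build_page_index(pages)
    print(lookup(idx, "integrator"))  # [1, 3]
    print(lookup(idx, "filter"))      # [1, 3]
```

Because the index only records which page a word occurs on, OCR typos cost nothing beyond a few junk entries. A search still lands the reader on the right page, and the scanned image remains the authoritative copy.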
Joel
> On Jul 5, 2017, at 5:42 PM, Bernard Arthur Hutchins Jr <bah13 at cornell.edu> wrote:
>
>
> Thanks Rob,
>
> But manual identification at 5 minutes per page is not worth it for such a small improvement: it would still take months of 8-hour days to do 6000 pages. My PDF is already much better. The equations are still unusable, and it apparently makes the same text errors. Why not just say it can't do this? It wasn't intended to.
>
> Thanks for trying - useful data point!
>
> Bernie
>
> From: Rob Kam <robkam at ymail.com>
> Sent: Wednesday, July 5, 2017 6:47 PM
> To: Bernard Arthur Hutchins Jr; mskala at ansuz.sooke.bc.ca
> Cc: synth-diy at synth-diy.org
> Subject: RE: [sdiy] Can anyone OCR the AN23.PDF File Here?
>
> Hi Bernie,
>
>
> The result is at http://www.sdiy.info/AN23.rtf. It took 10 minutes to OCR with ABBYY FineReader 12, including first manually identifying the areas of text vs. images. Obviously it still needs further corrections.
>
> Rob
>
> _______________________________________________
> Synth-diy mailing list
> Synth-diy at synth-diy.org
> http://synth-diy.org/mailman/listinfo/synth-diy