[sdiy] re-publishing typewritten material

Michael E Caloroso mec.forumreader at gmail.com
Thu Nov 12 22:22:17 CET 2020


Good tips, thanks.

Fonts can impact the accuracy of OCR too.

Headhunters scan resumes for employers.  During my job hunt I was
directed to use Times Roman font for my resume because that works best
with OCR applications.

I don't think any two publishers use the same font.  That presents a
challenge when scanning legacy documents and magazines.

MC

On 11/11/20, KA4HJH <ka4hjh at gmail.com> wrote:
> This thread has gotten so long that I can't begin to reply to everything
> in-line. At the risk of repeating what others have said here and in earlier
> discussions (always about Electronotes), here are some things that I've
> learned about turning print documents into pdfs over the decades...
>
> Years ago, Skeptical Inquirer digitized the first 25 or so volumes of the
> magazine. Whoever did it was a pro and I'm sure it wasn't cheap. Everything
> is, relatively speaking, beautiful. I paid $100 for the indexed, fully
> searchable archive on a DVD-ROM.
>
> OCRing text is an art form in itself. I've tried all of the Mac OCR apps and
> FineReader is probably the best of the lot. For a technical publication it
> has to be trained to recognize oddball glyphs like "±" and so on.
>
> Having a really good original helps but if it's been printed double-sided
> there will always be bleed-through from the back side. Getting a good OCR of
> the text will be trickier but for illustrations it can turn into a lot of
> time-consuming sh*twork. Also, OCR apps typically complain about scans
> larger than 150 DPI. IOW, they work best with fairly lo-res scans.
>
> There are two kinds of illustrations: photos and line art (schematics, block
> diagrams). Black & white printed photos should be scanned as 8-bit gray at
> 300 DPI, de-screened, and saved as JPEG. Line art should be scanned at 600
> DPI monochrome (black and white) and saved as CCITT. Schematics with very
> small text or other very fine details may require 1200 DPI.
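
For anyone scripting this, the per-type settings in the paragraph above could be written down as a small lookup. This is only a sketch: the names ScanSettings and scan_settings, and the mode labels "gray8" and "mono1", are made up for illustration and do not come from any scanner software. The numbers are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    mode: str       # pixel format label (hypothetical naming)
    dpi: int        # scan resolution
    descreen: bool  # whether to remove the halftone screen
    save_as: str    # output compression / format


def scan_settings(kind: str, fine_detail: bool = False) -> ScanSettings:
    """Return the settings suggested in the quoted post for one illustration type."""
    if kind == "photo":
        # Printed B&W photo: 8-bit gray, 300 DPI, de-screened, JPEG.
        return ScanSettings("gray8", 300, True, "JPEG")
    if kind == "line_art":
        # Schematics and block diagrams: 1-bit monochrome, CCITT compression.
        # Very small text or fine detail may need 1200 DPI instead of 600.
        return ScanSettings("mono1", 1200 if fine_detail else 600, False, "CCITT")
    raise ValueError(f"unknown illustration kind: {kind!r}")


if __name__ == "__main__":
    print(scan_settings("photo"))
    print(scan_settings("line_art", fine_detail=True))
```
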
>
> The upshot of all this is that a document on electronics will require three
> separate scans if a page has all three on it. You can try doing a single
> hi-res, lossless scan but you'll still have to process the mega-scan itself
> three different ways afterwards. Regardless, afterwards you have to put all
> of the pieces back together.
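
The single "mega-scan" route comes down to resampling one master three times. A rough sketch of the arithmetic follows, assuming a 1200 DPI master of an 8.5 x 11 inch page (both my assumptions, not from the post). The target resolutions are the ones mentioned above: 150 DPI for OCR text, 300 for photos, 600 for line art. The function and pass names are hypothetical.

```python
# Target resolution per processing pass, taken from the quoted post.
PASSES = {"text_for_ocr": 150, "photo": 300, "line_art": 600}


def derivative_size(master_px, master_dpi, target_dpi):
    """Pixel size of a derivative downsampled from the master scan."""
    if target_dpi > master_dpi:
        raise ValueError("target DPI exceeds the master scan resolution")
    width, height = master_px
    scale = target_dpi / master_dpi
    return (round(width * scale), round(height * scale))


def plan_derivatives(master_px, master_dpi):
    """Map each pass name to the pixel size of its derivative image."""
    return {
        name: derivative_size(master_px, master_dpi, dpi)
        for name, dpi in PASSES.items()
    }


if __name__ == "__main__":
    # Assumed master: 8.5 x 11 inches at 1200 DPI = 10200 x 13200 pixels.
    for name, size in plan_derivatives((10200, 13200), 1200).items():
        print(name, size)
```

This only plans the sizes. De-screening, thresholding to monochrome, and reassembling the page are still separate jobs.
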
>
> A little-known feature of Acrobat is that it supports "articling". This links
> text flows together so that every time you click on a column of text it
> scrolls down one page. When it gets to the bottom of the column it jumps up
> to the top of the next one. This makes reading a pdf much easier.
>
> As an experiment I re-created some printed articles from scratch. I OCRed
> the text and re-set it with the same typeface as the original in InDesign.
> Then I scanned the photos and line art as described above, cleaned them up,
> inserted them into the InDesign document and ran the finished document
> through Acrobat.
>
> Here are some examples that I've shared multiple times before—see the linked
> pdf files in these blog posts:
>
> https://www.astarcloseup.com/2016/02/the-oscilloscope-artist.html
>
> https://www.astarcloseup.com/2016/10/craig-andertons-multiple-identity-filter.html
>
>
> Note that I scanned the line art as 8-bit gray TIFFs at 150 DPI. I think that
> was the maximum resolution of the scanner I had back then.
>
>
> On an unrelated note, I have a 5000 DPI slide scanner that can save each scan
> as a RAW file just like a digital camera. The proprietary dust & scratch
> removal software is in the scanner itself and I can run the RAW files back
> through the scanner at different settings to get the best possible finished
> scan.
>
>
>> On Nov 10, 2020, at 8:47 AM, mskala at ansuz.sooke.bc.ca wrote:
>>
>> In more detail:  I was reprinting an academic work.  It was important for
>> the page numbers to remain the same as in the first printing so that
>> references wouldn't break.
>
> Stupid. Really, REALLY stupid. I've created documents with that kind of page
> numbering. There's no excuse for this, period.
>
>
>
> Terry Bowman, KA4HJH
> "The Mac Doctor"
>
> "By the end of Chuck Statler's 'Rock Videos' of Devo we agreed that even if
> Devo did not take the stage it was still the best concert any of us had ever
> attended." --Kim Thayil (Soundgarden), 1995
>
>



