[sdiy] re-publishing typewritten material

KA4HJH ka4hjh at gmail.com
Thu Nov 12 01:16:41 CET 2020


This thread has gotten so long that I can't begin to reply to everything inline. At the risk of repeating what others have said here and in earlier discussions (always about Electronotes), here are some things that I've learned about turning print documents into PDFs over the decades...

Years ago, Skeptical Inquirer digitized the first 25 or so volumes of the magazine. Whoever did it was a pro and I'm sure it wasn't cheap. Everything is, relatively speaking, beautiful. I paid $100 for the indexed, fully searchable archive on a DVD-ROM.

OCRing text is an art form in itself. I've tried all of the Mac OCR apps and FineReader is probably the best of the lot. For a technical publication it has to be trained to recognize oddball glyphs like "±" and so on. 

Having a really good original helps, but if it's been printed double-sided there will always be bleed-through from the back side. That makes getting a good OCR of the text trickier, and for illustrations it can turn into a lot of time-consuming sh*twork. Also, OCR apps typically complain about scans larger than 150 DPI. IOW, they work best with fairly lo-res scans.

There are two kinds of illustrations: photos and line art (schematics, block diagrams). Black & white printed photos should be scanned as 8-bit gray at 300 DPI, de-screened, and saved as JPEG. Line art should be scanned at 600 DPI monochrome (black and white) and saved with CCITT compression. Schematics with very small text or other very fine details may require 1200 DPI.
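To put rough numbers on those settings, here is a minimal sketch that treats them as a lookup table and works out how big each scan is before compression. The names are made up for illustration, and the US Letter page size (8.5 x 11 inches) is an assumption, not something tied to any particular scanner app.

```python
# Scan settings from the paragraph above, as a lookup table.
# Keys and function names are hypothetical, chosen for illustration only.
SCAN_SETTINGS = {
    "photo": {"dpi": 300, "bits_per_pixel": 8, "save_as": "JPEG"},
    "line_art": {"dpi": 600, "bits_per_pixel": 1, "save_as": "CCITT"},
    "fine_line_art": {"dpi": 1200, "bits_per_pixel": 1, "save_as": "CCITT"},
}


def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw size of the scan, with each row padded to a whole byte."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # Assumed page size: US Letter, 8.5 x 11 inches.
    for kind, s in SCAN_SETTINGS.items():
        size = uncompressed_bytes(8.5, 11, s["dpi"], s["bits_per_pixel"])
        print(f"{kind}: {size / 1e6:.1f} MB before compression")
```

The point of the arithmetic is that a 1-bit scan at 600 DPI is smaller than an 8-bit scan at 300 DPI even before compression, which is part of why line art can afford the higher resolution.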

The upshot of all this is that a document on electronics will require three separate scans if a page has all three kinds of content on it: text, photos, and line art. You can try doing a single hi-res, lossless scan, but you'll still have to process that mega-scan three different ways afterwards. Either way, you then have to put all of the pieces back together.
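As a rough illustration of the single mega-scan route, the sketch below derives three versions from one 8-bit grayscale master. It assumes a 1200 DPI master held as a plain list of pixel rows, uses simple block averaging and a fixed threshold, and leaves out de-screening entirely. The function names and the cutoff value are hypothetical; a real workflow would use an image editor or imaging library.

```python
def downsample(gray, factor):
    """Average factor x factor blocks of an 8-bit grayscale image."""
    height = len(gray) // factor
    width = len(gray[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                gray[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def threshold(gray, cutoff=128):
    """Convert to monochrome: 1 = black ink, 0 = white paper."""
    return [[1 if pixel < cutoff else 0 for pixel in row] for row in gray]


def split_master(master, master_dpi=1200):
    """Process one master scan three ways (resolutions from the text above)."""
    return {
        "text": downsample(master, master_dpi // 150),    # lo-res copy for OCR
        "photo": downsample(master, master_dpi // 300),   # 8-bit gray for photos
        "line_art": threshold(downsample(master, master_dpi // 600)),  # 1-bit
    }
```

Each output still has to be cropped to the region it belongs to and placed back on the page, which is the reassembly step mentioned above.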

A little-known feature of Acrobat is that it supports "articling". This links text flows together so that every time you click on a column of text it scrolls down one page. When it gets to the bottom of the column it jumps up to the top of the next one. This makes reading a PDF much easier.
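The click-to-advance behavior can be pictured as a chain of linked boxes. The toy model below is not Acrobat's API or file format, just a sketch of the navigation logic as described above. The `Box` fields, the point-based heights, and the `click` function are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One linked region of text in the article (hypothetical model)."""
    page: int
    column: int
    height: float  # height of the column, in points


def click(thread, index, offset, view_height):
    """Return (box index, scroll offset) after one click.

    Scrolls down by one view while text remains below in the current
    box; otherwise jumps to the top of the next box in the thread.
    """
    box = thread[index]
    if offset + view_height < box.height:
        return index, offset + view_height
    if index + 1 < len(thread):
        return index + 1, 0.0
    return index, offset  # end of the article: stay put
```

With two 700-point columns and a 300-point view, three clicks from the top of the first column land at the top of the second one.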

As an experiment I re-created some printed articles from scratch. I OCRed the text and re-set it with the same typeface as the original in InDesign. Then I scanned the photos and line art as described above, cleaned them up, inserted them into the InDesign document and ran the finished document through Acrobat. 

Here are some examples that I've shared multiple times before; see the linked PDF files in these blog posts:

https://www.astarcloseup.com/2016/02/the-oscilloscope-artist.html

https://www.astarcloseup.com/2016/10/craig-andertons-multiple-identity-filter.html


Note that I scanned the line art as 8-bit gray TIFFs at 150 DPI. I think that was the maximum resolution of the scanner I had back then.


On an unrelated note, I have a 5000DPI slide scanner that can save each scan as a RAW file just like a digital camera. The proprietary dust & scratch removal software is in the scanner itself and I can run the RAW files back through the scanner at different settings to get the best possible finished scan.


> On Nov 10, 2020, at 8:47 AM, mskala at ansuz.sooke.bc.ca wrote:
> 
> In more detail:  I was reprinting an academic work.  It was important for
> the page numbers to remain the same as in the first printing so that
> references wouldn't break.

Stupid. Really, REALLY stupid. I've created documents with that kind of page numbering. There's no excuse for this, period.



Terry Bowman, KA4HJH
"The Mac Doctor"

"By the end of Chuck Statler's 'Rock Videos' of Devo we agreed that even if Devo did not take the stage it was still the best concert any of us had ever attended." --Kim Thayil (Soundgarden), 1995




