Scanning old documents (like EN)

KA4HJH ka4hjh at gte.net
Thu Feb 17 00:13:41 CET 2000


Aside from the tangle of legal issues involved, the fact is that 
scanning old documents and touching up the results is a 
time-consuming, laborious business. Something the size of the EN 
collection would be a massive undertaking.

Then there's OCR, Optical Character Recognition. The idea is to 
turn the scanned text back into ASCII text, which takes up a LOT 
less space in a pdf file!

As an experiment I have already done this with a couple of 
interesting old articles, with impressive results (I think). The only 
problem is that it took FOREVER to get one done. Part of the problem 
stems from the fact that things like parts lists and technical jargon 
are not readily recognized by the OCR software, and have to be 
corrected by hand. "S5" becomes "SS" and "R11" becomes "Rll". It's 
still faster than retyping the original but a real pain. The better 
programs, like OmniPage, can be "trained" to a certain extent, but I 
have never had the opportunity to look into this.
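
Some of that hand correction could probably be scripted. The following 
is only a rough sketch in Python, not something I have run on real OCR 
output: it assumes parts-list designators are one prefix letter 
followed by up to three digits, and that the OCR confuses digits with a 
small set of look-alike characters. The prefix list, the look-alike 
table and the function name fix_designators are all my own assumptions 
for illustration.

```python
import re

# Assumed look-alike table: characters OCR tends to emit in place of digits.
LOOKALIKES = str.maketrans({"l": "1", "I": "1", "O": "0", "S": "5", "B": "8"})

# Assumed designator prefixes (R = resistor, C = capacitor, S = switch, ...).
PREFIXES = "RCSQDLU"

# A candidate is one prefix letter followed by 1-3 digits or digit look-alikes,
# standing alone as a whole word.
CANDIDATE = re.compile(rf"\b([{PREFIXES}])([0-9lIOSB]{{1,3}})\b")


def fix_designators(line: str) -> str:
    """Replace digit look-alikes in the numeric part of each designator."""
    def repl(match: re.Match) -> str:
        prefix, number = match.group(1), match.group(2)
        return prefix + number.translate(LOOKALIKES)

    return CANDIDATE.sub(repl, line)


if __name__ == "__main__":
    print(fix_designators("Rll 10k"))   # R11 10k
    print(fix_designators("SS SPDT"))   # S5 SPDT
```

This is only safe on lines already known to be parts lists. Run over 
ordinary prose it would happily turn a word like "US" into "U5", so the 
results still need a look by eye.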

Once I had the text converted and the images tweaked I recreated the 
original article in FrameMaker, then used Acrobat Distiller 4 to 
create the final pdf. Not only does it print great, but it looks 
great on the screen, too. MUCH more readable than a scan. You can 
zoom in as deep as you want.

I hope to have this online this week so everyone can take a look at it.


Terry Bowman, KA4HJH
"The Mac Doctor"

ICQ: 45652354


