

OCR - What It Is, How to Do it Better

Turning images into editable text


OCR (Optical Character Recognition) is the process of turning a picture of words (such as a scan of a typed letter) into an editable document that you can open and use in your desktop publishing software, word processor, or other text editor.

While the technology has been around for years, it has also been a hit-or-miss process. Some software does the job better than others. Some of the newest packages offer better support for less-than-perfect originals and for documents with elaborate formatting, including columns, tables, numerous font changes, and graphics. See the sidebar links for a round-up of some of the top OCR solutions currently available for Windows and Macintosh desktop systems.

Tips for better OCR results
Scanners often come with a limited-edition or "stripped down" version of OCR software. Other types of programs also include OCR modules. The CorelDRAW suite has a utility called OCR-Trace that has always yielded a fairly acceptable level of OCR accuracy for me. If your OCR needs are modest, these solutions may be adequate.

Whatever type of program you use (and no matter what accuracy rate the program claims), there are things you can do to ensure the best possible results from your OCR software:

  • Start with a good original. Is the paper wrinkled? Try ironing it (with a warm, not hot, iron) or pressing it between heavy books. Erase smudges.

  • Make the scan the best you can. Make sure the scanner bed/glass is clean and smudge-free. Keep the document straight and even so you don't end up with a "skewed" image. Adjust the color/contrast/brightness so the background is light/white and free of "artifacts" (such as a pattern in the paper) and the text is dark. Scan at 300 dpi or better.

  • Turn one document into many. With older or stripped-down software, graphics, lines on forms, columns of text, and other formatting can cause recognition errors. Try breaking the scanned original into smaller chunks (crop out non-text elements or save each column of text as its own image) and run your OCR software on each part separately. You'll lose formatting but gain a more accurate text document. Newer OCR software is steadily getting better at retaining the formatting of forms and tables, so it may be worth trading in an older package for a newer one.

  • Try different settings. Experiment with different options in your software. If your first attempt is less than usable, adjust the controls.

  • Proofread. No matter how accurate a program claims to be, every one of them is fallible. Proofread, proofread, proofread the finished document.
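
If you are comfortable with a little scripting, two of the tips above (a light background with dark text, and breaking a page into separate columns) can be roughed out in code. What follows is only a minimal sketch in Python, not how any particular OCR package works. It assumes the scan has already been loaded as rows of grayscale pixel values from 0 (black) to 255 (white); the function names binarize, find_columns, and crop, the threshold of 128, and the two-pixel gutter width are all made up for illustration. A real scan would be far larger and would be loaded with an imaging library.

```python
from typing import List, Tuple

# Assumption: an image is a list of rows, each row a list of
# grayscale values where 0 is black and 255 is white.
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Push every pixel to pure black (0) or pure white (255)."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def find_columns(image: Image, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) x-ranges of text columns, end exclusive.

    A run of pixel columns with no black pixels counts as a gutter
    once it is at least `min_gap` pixels wide.
    """
    width = len(image[0])
    # True wherever a vertical strip holds at least one black pixel.
    has_ink = [any(row[x] == 0 for row in image) for x in range(width)]

    columns = []
    start = None  # x position where the current text column began
    gap = 0       # width of the blank run seen since the last ink
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The text column ended just before this blank run.
                columns.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # The last column runs to the edge, minus any trailing blanks.
        columns.append((start, width - gap))
    return columns


def crop(image: Image, start: int, end: int) -> Image:
    """Cut out the vertical slice between x = start and x = end."""
    return [row[start:end] for row in image]


# A tiny made-up "scan": dark text at the left and right edges,
# with a light gutter in the middle.
page = [
    [40, 60, 230, 240, 235, 50, 30],
    [35, 220, 245, 238, 241, 45, 225],
]

clean = binarize(page)
columns = find_columns(clean)
print(columns)  # [(0, 2), (5, 7)]

# Each piece could now be saved and run through OCR on its own.
pieces = [crop(clean, start, end) for start, end in columns]
```

The threshold is the setting most worth experimenting with: set it too low and faint text disappears into the background, set it too high and paper texture turns into black specks.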

Do you have an OCR program that you highly recommend? Are there some really good freeware, shareware, or very inexpensive OCR solutions that you've found? Tell us about it on the forum.


©2014 About.com. All rights reserved.