IDAT Consulting | Time to Reconsider OCR/ICR and OMR/IMR

 
 

  Surprise! OCR and OMR Are Hotter Than Ever

Bert Moore

Introduction

I-C-R, O-C-R, O-M-R, and Mark Sense...sounds a little like something children might chant skipping rope. But if you haven't given much thought lately to "old" technologies such as OCR (Optical Character Recognition) and OMR (Optical Mark Recognition), don't "skip" this article.

Why? Because we haven't been able to make the "paperless" office a reality – but modern OCR/OMR hardware and software can make child's play of handling the mounds of paper we do have to deal with on a daily basis. And it's not just the No. 2 pencils [be sure to fill the box carefully and completely] and funny-looking characters you might remember from when you were a kid. It's a whole new ball game.

OCR and ICR are character-based recognition systems. OCR recognizes machine-printed text (e.g., computer print-out) while ICR (Intelligent Character Recognition) recognizes machine- and hand-printed characters.

OMR and Mark Sense recognize the presence or absence of marks in specified locations. (Mark Sense is the early version of this technology that relied on electrical conductivity of No. 2 pencil marks for reading.)

Even five years ago, combining these two technologies in a single article would simply have been a matter of convenience – because not too many people were interested in either technology. Today, however, there's a lot more interest and OMR and OCR are often combined in the same application to improve the flexibility of data collection and processing. Thus, it makes sense to look at these technologies together.

How accurate is this method of entry? According to one vendor, accuracy rates can reach up to:

  • 98% for ICR,
  • 99.8% for OCR, and
  • 99.99% for OMR.

Most large-scale systems are designed to flag "problem" areas for review by human operators – thereby increasing overall accuracy.
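As a rough illustration of how such flagging might work, here is a minimal Python sketch. It assumes the recognition engine reports a confidence score between 0.0 and 1.0 for each field; the `FieldResult` class, the field names, and the 0.90 cut-off are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str          # field label on the form (hypothetical)
    text: str          # value the engine recognized
    confidence: float  # engine's self-reported score, 0.0 to 1.0 (assumed)


def split_for_review(results, threshold=0.90):
    """Return (accepted, flagged) lists based on a confidence cut-off."""
    accepted = [r for r in results if r.confidence >= threshold]
    flagged = [r for r in results if r.confidence < threshold]
    return accepted, flagged


results = [
    FieldResult("account_number", "4471902", 0.99),
    FieldResult("amount", "125.00", 0.72),
]
accepted, flagged = split_for_review(results)
print([r.name for r in flagged])  # fields routed to a human operator
```

Only the low-scoring "amount" field would go to an operator here; the rest passes straight through, which is where the labor savings come from.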

If you've ever scanned documents into your PC and spent a lot of time correcting recognition errors, you're probably wondering how you can get such high performance in OCR. It's important to remember that, in data collection applications, fields are typically short and pre-defined for the data type, such as an all-numeric account number. Document scanning software doesn't have the benefit of this type of structure.
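To see why a pre-defined data type helps, consider the following Python sketch. It assumes the recognizer returns several scored candidates for each character position; that output format, and the function names, are illustrative assumptions rather than any vendor's interface.

```python
import string


def best_in_alphabet(candidates, allowed):
    """Pick the highest-scoring candidate whose character is in `allowed`.

    `candidates` is a list of (character, score) pairs for one position.
    Returns None if no candidate is permitted.
    """
    permitted = [(ch, score) for ch, score in candidates if ch in allowed]
    if not permitted:
        return None
    return max(permitted, key=lambda pair: pair[1])[0]


def read_numeric_field(positions):
    """Read an all-numeric field, or return None if any position fails."""
    chars = [best_in_alphabet(c, string.digits) for c in positions]
    if any(ch is None for ch in chars):
        return None
    return "".join(chars)


# Hypothetical engine output: letter "O" narrowly beats digit "0", and so on.
positions = [
    [("O", 0.55), ("0", 0.45)],
    [("l", 0.50), ("1", 0.48)],
    [("7", 0.97)],
]
print(read_numeric_field(positions))  # prints 017
```

Without the numeric constraint the top-scoring reading would be "Ol7"; knowing the field holds only digits removes the look-alike letters from consideration.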

Applications

OMR/OCR are finding a wide range of applications today, primarily because of advanced software.

OMR applications have typically been those where the person entering data does not have access to sophisticated data recording equipment (e.g., students, poll-takers).

OCR has typically been used where selected information on a pre-printed form is to be read (e.g., account number from a remittance stub).

Here are a couple of applications to start you thinking.

Data Retrieval

One major U.S. manufacturer was faced with the task of retrieving data from 1.7 million frames of archived COM fiche created between 1977 and 1990 – over 100 million characters of data. (COM, or computer output microfilm, fiche is a photograph taken from a CRT display of computer output.) After getting quotes of over $3,000,000 for key entry at an off-shore location, the company turned to the new generation of OCR software that could "learn" forms layouts and character shapes.

Forms Processing

Reading a combination of pre-printed and hand-checked information on U.K. road tax forms (the equivalent of auto registrations), an OCR imaging system verifies the information and then microfilms the form for archiving. Thruput averages 4,800 forms per hour, including time for operator intervention to correct data incorrectly entered on the form or to key unreadable characters.

Check Processing

ICR now reads the machine- or hand-printed amount of the check (numeric values) as the check is being processed (OCR is used to read the check and account number). Thruput is dramatically improved.

Other Applications

Some of the more "typical" OMR/OCR applications are entering and processing data from:

  • consumer invoices (e.g., utility bills),
  • market surveys, public opinion polls, election ballots, testing,
  • employment applications, and
  • tax payments and audit reports.

Other applications include:

  • membership/enrollment forms,
  • route sales,
  • expense reports,
  • customer registration, change of address,
  • process tracking and monitoring,
  • medical histories and examinations,
  • insurance forms,
  • technical document data retrieval, and
  • entry and indexing of documents in archive.

How It Works

Mark Sense/Optical Mark Recognition (OMR)

Mark Sense and OMR recognize the presence or absence of a mark in a specific area of a specially-designed form. The exact meaning of the mark depends on the form's design. In other words, a question could be true/false or multiple choice and the mark could indicate any of the valid responses. This technology is ideal for collecting standardized data where the person recording the data does not have sophisticated (and expensive) data recording equipment. In other words, a No. 2 pencil and piece of paper work just fine.
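The idea that the same marks mean different things on different forms can be sketched in a few lines of Python. The function name, the list-of-booleans input, and the "AMBIGUOUS" result are assumptions made for illustration only.

```python
def interpret_question(marks, choices):
    """Map one question's mark flags to a response.

    `marks` holds one boolean per answer position (True = mark present).
    `choices` lists what each position means on this particular form.
    Returns the chosen value, None if the question was left blank, or
    "AMBIGUOUS" if more than one position is marked.
    """
    if len(marks) != len(choices):
        raise ValueError("marks and choices must be the same length")
    selected = [choice for mark, choice in zip(marks, choices) if mark]
    if not selected:
        return None
    if len(selected) > 1:
        return "AMBIGUOUS"
    return selected[0]


# The same reader output, interpreted under two different form designs.
print(interpret_question([False, True], ["True", "False"]))
print(interpret_question([False, False, True, False], ["A", "B", "C", "D"]))
```

The reader itself only reports where marks were found; the form definition supplies the meaning.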

Mark Sense is the early, contact-read version of this technology. It has been replaced by the optically-based version, OMR. OMR allows much closer spacing of marks (to within one-sixth of an inch of each other).

Although typically used for paper forms that are read by page readers or document readers, OMR isn't restricted to hand-printed "boxes."

Some of today's mark recognition technology is also called Intelligent Mark Recognition (IMR) because of its ability to recognize "X" and "check" marks – it can even tell if a check-mark has been "X-ed" out.

Historically, OMR has relied on special preprinted forms and hardware-based reading. Questions and answer blocks are printed in a color (typically light blue) that isn't recognized by the reader. The reader then sees only the answers. Special timing marks along the form's edge provide location information and a bar code is typically used to identify the specific form. Special OMR page or form readers process the documents.
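One way timing marks could supply location information is sketched below in Python. It assumes the reader delivers the strip of pixels along the form's edge as a simple list of 0s (white) and 1s (dark); real readers do this in hardware, so treat the function and data as a hypothetical illustration.

```python
def timing_mark_rows(edge_column):
    """Return the center row index of each run of dark pixels."""
    centers = []
    start = None  # row where the current dark run began, if any
    for row, pixel in enumerate(edge_column):
        if pixel and start is None:
            start = row
        elif not pixel and start is not None:
            centers.append((start + row - 1) // 2)
            start = None
    if start is not None:  # a run that reaches the bottom edge
        centers.append((start + len(edge_column) - 1) // 2)
    return centers


# Three timing marks, each three pixels tall, along the edge of a form.
edge = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
print(timing_mark_rows(edge))  # prints [2, 7, 12]
```

Each center found this way marks a row of answer positions, so the reader knows where to look even if the form feeds through slightly early or late.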

New software-based solutions offer the opportunity to produce forms in-house, using black printing, and a wider variety of page readers. These packages can be taught where to look for answers and what parts of the form (e.g., the questions) to ignore.
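"Teaching" a package where to look can be pictured as giving it a template of named zones. The Python sketch below assumes the scanned form is a grid of 0s (white) and 1s (black) and that a zone counts as marked when at least half of it is dark; the template layout, zone names, and threshold are all hypothetical.

```python
def fill_ratio(image, zone):
    """Fraction of dark pixels inside zone = (top, left, bottom, right)."""
    top, left, bottom, right = zone
    cells = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not cells:
        raise ValueError("zone is empty")
    return sum(cells) / len(cells)


def read_form(image, template, threshold=0.5):
    """Check only the zones named in the template; ignore everything else."""
    return {
        name: fill_ratio(image, zone) >= threshold
        for name, zone in template.items()
    }


image = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # question text printed in black (ignored)
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],  # "yes" box filled in, "no" box empty
    [1, 1, 0, 0, 0, 0, 0, 0],
]
template = {"yes": (2, 0, 4, 2), "no": (2, 4, 4, 6)}
print(read_form(image, template))  # prints {'yes': True, 'no': False}
```

Because the black question text in the top row lies outside every zone, it never affects the result, which is why drop-out colors are no longer required.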

Optical Character Recognition (OCR)/Intelligent Character Recognition (ICR)

OCR/ICR recognize the shape of printed characters. OCR recognizes machine-printed characters (e.g., computer printouts) while ICR recognizes hand-printed characters.

ICR is the technology employed by pen-based systems for handwriting recognition.

OCR readers are no longer restricted to highly stylized OCR-A and OCR-B fonts. This is one reason OCR is enjoying renewed interest: the machine-readable characters are also human-readable. This saves space and doesn't add the sometimes "impersonal" look of a bar code on some forms.

For some, these technologies may be more familiar in their electronic forms. OCR is incorporated into many computer fax packages to allow editing of faxed text. ICR is used by many hand-held touch-screen terminals to recognize hand-printed data.

Regardless of the basis (paper or electronic), both OCR and ICR require sophisticated algorithms to interpret shapes to determine their meaning. One OCR/ICR engine developer uses up to five different algorithms to increase accuracy.
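The article's source does not say how the results of several algorithms are combined. One plausible approach is a simple majority vote, sketched here in Python; the voting scheme, the function name, and the 60% agreement level are assumptions, not a description of any developer's engine.

```python
from collections import Counter


def vote(readings, min_agreement=0.6):
    """Combine several readings of one character by majority vote.

    Returns (value, needs_review). The character is sent for review
    when the winning share of votes falls below `min_agreement`.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    value, votes = Counter(readings).most_common(1)[0]
    needs_review = votes / len(readings) < min_agreement
    return value, needs_review


# Five hypothetical algorithms read the same character.
print(vote(["8", "8", "B", "8", "8"]))  # prints ('8', False)
print(vote(["5", "S", "5", "S", "6"]))  # no clear winner, so review is needed
```

When the algorithms disagree, the disagreement itself is useful: it tells the system which characters deserve a human look.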

OCR can recognize a wide variety of machine-produced typefaces, including proportionally-spaced type such as that used in this magazine. Typically, OCR readers have been dedicated to OCR tasks but, again, software-based solutions have greatly increased the choice of reading devices. Standard flat-bed scanners can even be used for low-volume applications.

Instead of simply interpreting the character shapes, many OCR/ICR software packages first capture the full image. This allows the software to "deskew" the image if the form isn't aligned precisely. It also provides a human operator with a retrievable image for comparison or correction purposes.
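The geometry behind deskewing can be shown with a short Python sketch. It assumes the software has located two reference marks that should sit on a level line, measures the angle between them, and rotates coordinates back by that angle. Commercial packages rotate the whole image and may find skew by other means; the function names and sample points here are hypothetical.

```python
import math


def skew_angle(left_mark, right_mark):
    """Angle in degrees of the line through two marks that should be level.

    Points are (x, y) pixel coordinates.
    """
    dx = right_mark[0] - left_mark[0]
    dy = right_mark[1] - left_mark[1]
    return math.degrees(math.atan2(dy, dx))


def deskew_point(point, angle_degrees, origin=(0.0, 0.0)):
    """Rotate a point by -angle about `origin`, undoing the measured skew."""
    theta = math.radians(-angle_degrees)
    x = point[0] - origin[0]
    y = point[1] - origin[1]
    return (
        origin[0] + x * math.cos(theta) - y * math.sin(theta),
        origin[1] + x * math.sin(theta) + y * math.cos(theta),
    )


left, right = (0.0, 0.0), (100.0, 5.0)  # right mark sits 5 pixels too low
angle = skew_angle(left, right)
fixed = deskew_point(right, angle, origin=left)
print(round(angle, 2), round(fixed[1], 6))  # skew in degrees, corrected y offset
```

After correction the right-hand mark lines up level with the left one, so the field zones defined for the form land where the software expects them.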

Conclusion

OCR/OMR have grown in capabilities to the point where many types of forms that needed to be processed manually can now be handled automatically. The combination of recognition technologies means that hand-printed and check-box data can now be captured at the same time. And the fact that OMR forms no longer need to be produced by outside printers greatly increases flexibility.

Of course, the "traditional" methods are still viable, particularly for companies that can benefit from the expertise and assistance of outside service bureaus.

A version of this article originally appeared in the August 1996 issue of Automatic ID News.
Copyright © 1996 Advanstar Communications. Reprinted by permission.