Executive Strategies

Document Management Magazine

 

Trends and Applications of Handprint OCR Recognition – ICR – Today and In The Next Millennium

by Herbert F. Schantz, HLS Associates

 

This year, the United States market for Optical Character Recognition (OCR) system components used for Data Capture is estimated to exceed $55 million. Data Capture OCR covers applications in which data is captured from forms for subsequent computer processing. These forms can be page-sized or coupon-sized (such as checks). Compared with Text Recognition OCR, Data Capture OCR is characterized by the need to capture data from formatted and unformatted forms very accurately. In Text Recognition OCR, the text to be recognized is usually on pages such as magazine pages and legal discovery documents, and most of the characters being read are machine printed, not handprinted or handwritten. In Text Recognition applications, error rates of one to three percent are acceptable, even after spell checking. Such rates would be totally unacceptable in Data Capture applications, where the data will be used for business and financial purposes. (A single mistake on a tax return could be expensive and time-consuming for both the taxpayer and the government.)
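To make the accuracy gap concrete, here is a small Python calculation of how a per-character error rate compounds across a whole form. It assumes, purely for illustration, that character errors are independent and that a form carries 200 captured characters; the form size and the function names are hypothetical.

```python
def prob_form_error_free(per_char_error_rate: float, chars_per_form: int) -> float:
    """Probability that every character on a form is read correctly,
    assuming (hypothetically) that character errors are independent."""
    return (1.0 - per_char_error_rate) ** chars_per_form


def prob_form_has_error(per_char_error_rate: float, chars_per_form: int) -> float:
    """Probability that at least one character on the form is misread."""
    return 1.0 - prob_form_error_free(per_char_error_rate, chars_per_form)


if __name__ == "__main__":
    # Hypothetical form with 200 captured characters.
    for rate in (0.01, 0.03, 0.0001):
        print(f"character error rate {rate:.2%}: "
              f"{prob_form_has_error(rate, 200):.1%} of forms contain an error")
```

Under these assumptions, a one percent character error rate leaves only about 13 percent of 200-character forms error-free, which is why Data Capture cannot tolerate Text Recognition error rates.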

ICR is a form of OCR; in the industry, the term refers primarily to the recognition of handprinted (not handwritten) characters, mostly in Data Capture applications. These include the recognition of handprinted characters in the following business applications:

  1. Tax returns – Federal, State, Local (Hempstead)
  2. Magazine subscription applications (Ziff Davis)
  3. Mail Orders (L.L. Bean)
  4. Express Mail Waybills (FedEx, Airborne, UPS)
  5. Transportation Waybills
  6. Sales Reporting
  7. Medical and Dental Health Claims
  8. Applications for Insurance
  9. Applications for Credit Cards
  10. Mortgage Applications (Loans)
  11. Wildlife Applications
  12. Checks – POD
  13. Payments – Remittances
  14. Order Processing (Avon)

These applications are typical of those that use ICR (OCR) technologies to identify handprinted characters accurately. The characters can be alphabetic, numeric, or alphanumeric.

The characters can also be constrained or unconstrained, and segmented or unsegmented. That is, they can be carefully placed in "blend-ink" boxes (constrained and segmented), or they can be printed at random in a minimal space without any constraint boxes. This latter case, in which the handprinted characters are unconstrained and unsegmented, is the "real world" for ICR.

The third class of ICR characters is the most difficult of all to read accurately. Known as Courtesy Amount Read (CAR), this application is the recognition of the handprinted amount in the "Courtesy Amount Box," the money field on checks. The characters are all numeric, but their patterns are as random and diverse as the payers' handprinting skills and consistency; each payer has a unique style and format. Any one bank processing checks receives items from several thousand accounts, intermixed. With CAR, accuracy is critical because the amount of each check is imprinted on the check in MICR ink and used to transfer funds from one account to another. Thus, error rates for CAR handprint recognition are usually very low, on the order of one error for every 10,000 valid characters presented. ICR on checks usually runs in "real time," at the speed of the check transport. Transports typically run at 600 to 1,200 items per minute (i.e., 10 to 20 items per second).
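The CAR throughput and accuracy figures can be related with simple arithmetic. The Python sketch below converts transport speed to items per second, gives the per-item time budget for real-time recognition, and estimates misread characters per hour; the six-digit average amount length is an assumption chosen for illustration, as are the function names.

```python
def items_per_second(items_per_minute: float) -> float:
    """Convert a check transport speed from items per minute to items per second."""
    return items_per_minute / 60.0


def ms_per_item(items_per_minute: float) -> float:
    """Time budget, in milliseconds, for recognizing one item at transport speed."""
    return 1000.0 / items_per_second(items_per_minute)


def expected_errors_per_hour(items_per_minute: float,
                             chars_per_item: int,
                             errors_per_char: float = 1 / 10_000) -> float:
    """Expected misread characters per hour of continuous operation.

    chars_per_item is a hypothetical average courtesy-amount length;
    errors_per_char defaults to the one-in-10,000 figure cited for CAR.
    """
    chars_per_hour = items_per_minute * 60 * chars_per_item
    return chars_per_hour * errors_per_char


if __name__ == "__main__":
    for speed in (600, 1200):
        print(f"{speed} items/min = {items_per_second(speed):.0f} items/s, "
              f"{ms_per_item(speed):.0f} ms per item, "
              f"~{expected_errors_per_hour(speed, chars_per_item=6):.1f} errors/hour")
```

For example, at 1,200 items per minute the recognizer has 50 milliseconds per check, and with six digits per amount and one error per 10,000 characters it would be expected to misread about 43 characters per hour of continuous running.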

Today, ICR systems are reading "real world" characters from "real world" forms at machine speeds. ICR is most commonly and readily used in check processing and remittance processing applications. It is also becoming more common in transaction processing applications where time-critical decisions are required. ICR is also commonly employed in data capture applications where timing is not critical and the applications can run in-line rather than on-line. This latter case applies when on-line reject re-entry (data purification) is used or when the applications are not time-critical.
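The reject re-entry (data purification) step can be sketched in Python as follows. This is a minimal illustration, not any vendor's method: it assumes the recognizer reports a confidence score for each character, and the CharResult type, the 0.90 threshold, and the function names are all hypothetical. Characters below the threshold are flagged so an operator can key them, and the operator's entries are then merged back into the field.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical reject threshold; an actual system would tune this per application.
REJECT_THRESHOLD = 0.90


@dataclass
class CharResult:
    char: str          # character reported by the recognizer
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


def route_field(results: List[CharResult],
                threshold: float = REJECT_THRESHOLD) -> Tuple[str, List[int]]:
    """Return the recognized text and the positions rejected for manual re-entry."""
    text = "".join(r.char for r in results)
    rejects = [i for i, r in enumerate(results) if r.confidence < threshold]
    return text, rejects


def apply_reentry(text: str, corrections: Dict[int, str]) -> str:
    """Overwrite rejected positions with the characters an operator keyed in."""
    chars = list(text)
    for pos, keyed in corrections.items():
        chars[pos] = keyed
    return "".join(chars)


if __name__ == "__main__":
    field = [CharResult("1", 0.99), CharResult("O", 0.42), CharResult("5", 0.97)]
    text, rejects = route_field(field)
    print(text, rejects)                  # recognized text and rejected positions
    print(apply_reentry(text, {1: "0"}))  # operator keys "0" for position 1
```

Raising the threshold sends more characters to operators but lets fewer misreads through; where to set it depends on how costly an undetected error is in the application.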

Today, most of the "low-hanging fruit," i.e., the easy ICR applications, have been implemented. What remains is the need for accurate and rapid recognition of "real world" characters from the "real world" forms that are commonplace in U.S. businesses, today… and in the next millennium.