OCR, or Optical Character Recognition, is a scanning process that not only makes a digital image of the original document but also “reads” that image, creating editable text and graphics. Newspapers often use OCR to scan articles, letters, advertising copy, and classified ads into their computer systems so that the items can be edited to fit the prescribed space and length requirements.
Once a document has been scanned using OCR technology, it can be opened in any number of word processing or picture formats, edited, and saved under a new file name.
OCR systems use both hardware (scanners) and software to accomplish this task. Many types of text are OCR-readable, but handwritten documents can still pose a problem for these readers because of the natural variations in handwriting style. The United States Postal Service uses a form of OCR for mail sorting. So that as many letters as possible can pass through its automated sorters, it asks patrons to print addresses rather than write them in cursive, and to follow a prescribed format. One part of that format is that state names appear in their two-letter abbreviated form.
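As a rough illustration of the two-letter state convention, the Python sketch below rewrites the last line of an address so that a spelled-out state name becomes its abbreviation. The function name `normalize_state`, the small lookup table, and the assumed "City, State ZIP" layout are all hypothetical choices made for this example. They are not the Postal Service's actual rules or software.

```python
import re

# Partial, illustrative lookup table; a real one would cover every state.
STATE_ABBREVIATIONS = {
    "california": "CA",
    "new york": "NY",
    "ohio": "OH",
    "texas": "TX",
}

# Assumed layout for the last address line: "City, State ZIP" or "City, ST ZIP".
LAST_LINE = re.compile(r"\s*(.+?),\s*(.+?)\s+(\d{5}(?:-\d{4})?)\s*")


def normalize_state(last_line):
    """Return the line with a two-letter state code, or None if it cannot be parsed."""
    match = LAST_LINE.fullmatch(last_line)
    if match is None:
        return None
    city, state, zip_code = match.groups()
    if len(state) == 2 and state.isalpha():
        # Already looks like an abbreviation; only fix the letter case.
        abbreviation = state.upper()
    else:
        abbreviation = STATE_ABBREVIATIONS.get(state.lower())
    if abbreviation is None:
        return None
    return f"{city}, {abbreviation} {zip_code}"


print(normalize_state("Austin, Texas 78701"))
print(normalize_state("Albany, New York 12207"))
```

A line that does not fit the assumed layout returns None rather than a guess, which mirrors the idea that mail not following the prescribed format is harder to sort automatically.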
The legal profession also makes heavy use of OCR technology to sort, maintain, and index the mountains of paperwork that must be kept for each case, as well as to make research materials such as Westlaw and Lexis more easily available to offices with limited library space.
Archives also use forms of OCR to offer easier access to images of historical, genealogical, and research data such as census images and old city and county land records.
The major drawback of OCR is that a person must still proofread the digital text to ensure that the original document has been read accurately.
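Software can narrow the proofreading job when a trusted version of the text is available for comparison, though it does not remove the need for a human check. The Python sketch below uses the standard library's difflib module to list the words where OCR output disagrees with a proofread version. The function name `ocr_differences` and the sample sentences are made up for illustration.

```python
import difflib


def ocr_differences(ocr_text, proofread_text):
    """Return (ocr_words, corrected_words) pairs where the two texts disagree."""
    ocr_words = ocr_text.split()
    good_words = proofread_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, good_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the mismatched run of words from each side.
            differences.append(
                (" ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2]))
            )
    return differences


# Invented example of typical misreads: "vv" for "w" and "1" for "l".
sample_ocr = "The quick brovvn fox jumps over the 1azy dog"
sample_good = "The quick brown fox jumps over the lazy dog"
differences = ocr_differences(sample_ocr, sample_good)
print(differences)
```

Comparing word by word keeps each reported difference short enough for a proofreader to check against the original page at a glance.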