Streamlined Document Capture

The Benefits of Automation

Once you have decided how best to organize, store, and retrieve your documents, the next part of the planning stage is to find the most efficient way to scan them and associate them with the correct index field values. An efficient scanning and indexing process will save you many hours of labor over the life of your project.

The two main methods for automating indexing are barcode recognition and Optical Character Recognition (OCR). Barcode recognition is faster and more accurate, but it only works if a barcode is printed on the document itself or on a cover page. OCR reads printed data directly from the page, which means most documents can be processed as-is. However, many conditions can affect how practical OCR is; these are discussed in this section.

If your index data already exists in another database, oscFile has two features that can use it to automate processing. The Index Autofill feature lets you enter a single key field, which is used in a database lookup to retrieve the matching values and fill in the remaining index fields automatically. oscFile can also pre-set index values through its Command Line Interface, so that a scanned document receives those values automatically.

Using Barcode Recognition and OCR

Barcode recognition is the most efficient way to capture index data printed on documents. Some documents already carry key information in barcode form. If your project involves scanning new documents on an ongoing basis, you may be able to redesign those documents to include barcodes. A barcode containing the index data is the best-case scenario: all of the index data is on the document from the moment it is created, in a format that can be read with near-100% accuracy.

If it is not possible to print barcodes on the document itself, an alternative is for the person who creates the document to print a barcode coversheet and place it on top of the file before it is scanned. The oscFile CoverSheet application was designed to make this easy: it provides a simple interface for selecting index values and prints a standard coversheet that contains those values in barcode format.

Barcode recognition is also useful for documents with a variable number of pages that all receive the same index values. If it is not possible to generate an indexed coversheet when these documents are created, a generic barcode coversheet can be used to separate the scanned images into multi-page files, one per document. A second process can then index these images one file at a time instead of one page at a time, greatly increasing throughput.
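
The separation step described above can be sketched as follows. This is a minimal illustration, not oscFile's implementation: it assumes the barcode on each page has already been decoded by a recognition step, that the generic coversheet carries the hypothetical payload `DOC-SEPARATOR`, and that the separator sheet itself is discarded. The names `split_on_separators`, `Page`, and the file names are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical payload printed on the generic separator coversheet.
SEPARATOR = "DOC-SEPARATOR"

# One scanned page: (image file name, decoded barcode text or None if no barcode).
Page = Tuple[str, Optional[str]]


def split_on_separators(pages: Sequence[Page]) -> List[List[str]]:
    """Group scanned pages into documents, starting a new one at each separator sheet."""
    documents: List[List[str]] = []
    current: List[str] = []
    for image, barcode in pages:
        if barcode == SEPARATOR:
            if current:  # close the document collected so far
                documents.append(current)
            current = []  # the separator sheet itself is not kept (assumption)
        else:
            current.append(image)
    if current:  # close the last document in the batch
        documents.append(current)
    return documents


batch = [
    ("p001.tif", SEPARATOR),
    ("p002.tif", None),
    ("p003.tif", None),
    ("p004.tif", SEPARATOR),
    ("p005.tif", None),
]
print(split_on_separators(batch))  # [['p002.tif', 'p003.tif'], ['p005.tif']]
```

Each inner list would become one multi-page file, so the indexing step that follows needs only one set of index values per list rather than one per page.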

Traditional zone OCR solutions require you to specify a region on the page where the index information will be found. That region is recognized and the result is inserted into an index field. The problem is that if the text shifts slightly relative to the region because of variations in scanning, the recognized result may pick up neighboring characters or cut off part of the value. This limits traditional zone OCR to documents where the index value is always in exactly the same place and has plenty of white space around it.

oscFile’s OCR overcomes these limitations of zone OCR with template and dictionary matching for OCR fields. These features search the OCR results for a specific pattern (template) or a list of possible values (dictionary) and return only the matching data. This lets you draw your OCR zones much larger than usual, so the data stays within the zone even when it shifts around on the page.
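
The idea behind template matching can be sketched with a regular expression standing in for the template. This is an assumption for illustration: oscFile's own template syntax is not described here, and `match_template`, the sample text, and the invoice-number format are hypothetical. The point is that the zone can be drawn generously, because only the text that fits the pattern is returned.

```python
import re
from typing import Optional


def match_template(zone_text: str, pattern: str) -> Optional[str]:
    """Return the first part of the OCR text that matches the pattern, or None."""
    match = re.search(pattern, zone_text)
    return match.group(0) if match else None


# OCR output from a deliberately oversized zone: the wanted value is surrounded
# by unrelated text, and its exact position within the zone does not matter.
zone_text = """ACME Supply Co.  Date: 03/14/2024
Invoice No. INV-004217
Total Due: $1,250.00"""

print(match_template(zone_text, r"\bINV-\d{6}\b"))          # INV-004217
print(match_template(zone_text, r"\b\d{2}/\d{2}/\d{4}\b"))  # 03/14/2024
```

A pattern only helps if it is unique on the page; a looser pattern such as `\d+` would return whichever number happened to come first.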
It is even possible to draw your zone around the entire page and find key
information that is not printed in any fixed location. For example, a
doctor’s office may receive lab reports from many different labs. Each
report is formatted differently, but each contains the patient’s name
somewhere on it. Using the dictionary matching feature with a list of patient names, oscFile can identify the correct patient for each lab report automatically.
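
Dictionary matching over a whole-page zone can be sketched as a whole-word search for each known value. This is a simplified, hypothetical version: it assumes exact (normalized) matches, whereas real OCR output contains recognition errors and names may appear in a different order (for example "Smith, John"), which a production system would likely handle with fuzzy matching. The function names, sample report, and the "accept only a single match" policy are illustrative assumptions.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Uppercase and collapse punctuation and whitespace runs into single spaces."""
    return re.sub(r"[^A-Z0-9]+", " ", text.upper()).strip()


def find_dictionary_matches(ocr_text: str, values: Iterable[str]) -> List[str]:
    """Return every dictionary value that appears as whole words in the OCR text."""
    # Pad with spaces so that " WEI CHEN " cannot match inside "WEI CHENG".
    haystack = f" {normalize(ocr_text)} "
    return [v for v in values if normalize(v) and f" {normalize(v)} " in haystack]


patients = ["Maria Lopez", "John Smith", "Wei Chen"]
lab_report = """Northside Clinical Laboratory
Patient Name: John  Smith    DOB: 04/02/1961
Ordering physician: Dr. Wei Cheng"""

matches = find_dictionary_matches(lab_report, patients)
print(matches)  # ['John Smith']

# Hypothetical policy: accept a single match, otherwise route to manual review.
patient = matches[0] if len(matches) == 1 else None
```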

When implementing OCR for document automation, consider carefully the data you are trying to recognize. Is the text legible? Does it appear in a fixed location? Does it follow a unique pattern that won’t appear anywhere else on the page? Is there a list of all the possible values for this field? The answers point to the right OCR approach: a value in a fixed location with white space around it suits a traditional zone, a value with a distinctive pattern suits template matching, and a value drawn from a known list suits dictionary matching.

Using Index Autofill and Pre-Indexed Batches

The Autofill feature of oscFile is an easy way to associate many index fields with a document without retyping data that already exists in another application. Autofill uses a database lookup to retrieve records that match a key value entered by the user, then fills in blank index fields automatically with the data from that lookup. The result is a document database with many possible search fields, only one of which had to be entered during scanning.
The key field may be typed by the user, or it may be read from the document
automatically using barcode recognition or OCR. The lookup is performed
either when the user changes this field or when the index values are saved.
If the lookup finds multiple matching records, the user will be notified and
the first set of values will be used by default.
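
The lookup behavior described above can be sketched with an in-memory SQLite table standing in for the existing application's database. Everything here is a hypothetical illustration rather than oscFile's configuration: the `customers` table, its columns, the key field `account_no`, and the `autofill` function are assumptions. The sketch follows the rules stated above: only blank fields are filled, and when several records match, the user is notified and the first one is used.

```python
import sqlite3
from typing import Dict, Optional

Index = Dict[str, Optional[str]]

# Hypothetical stand-in for the existing application's database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (account_no TEXT, name TEXT, city TEXT, rep TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("10042", "Harbor Marine Supply", "Norfolk", "D. Ellis"),
        ("10077", "Summit Dental Group", "Boulder", "R. Patel"),
        ("10077", "Summit Dental Annex", "Boulder", "R. Patel"),
    ],
)

LOOKUP_FIELDS = ("name", "city", "rep")


def autofill(conn: sqlite3.Connection, index: Index) -> Index:
    """Fill blank index fields from the first record matching the key field."""
    rows = conn.execute(
        "SELECT name, city, rep FROM customers WHERE account_no = ? ORDER BY rowid",
        (index["account_no"],),
    ).fetchall()
    if not rows:
        return dict(index)  # no match: leave the fields for the operator
    if len(rows) > 1:
        print(f"{len(rows)} records match account {index['account_no']}; using the first")
    filled = dict(index)
    for field, value in zip(LOOKUP_FIELDS, rows[0]):
        if not filled.get(field):  # only blank fields are filled in
            filled[field] = value
    return filled


doc = {"account_no": "10042", "name": None, "city": "", "rep": None}
print(autofill(db, doc))
# {'account_no': '10042', 'name': 'Harbor Marine Supply', 'city': 'Norfolk', 'rep': 'D. Ellis'}
```

The key value is passed as a query parameter rather than pasted into the SQL string, which keeps a value read by barcode or OCR from altering the query.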

Pre-indexed batches, a unique feature of oscFile, greatly improve throughput when scanning one document at a time. A pre-indexed batch can be configured to let the user enter index values before scanning, or it can be run from the command line to bypass user interaction altogether.
Some typical scenarios for pre-indexed batches are:
- The user scans one document at a time: they enter the field values first, then scan, and the images are saved with those values automatically.
- The user regularly scans several predefined documents. All of the field values for each are saved in its configuration file, so the user loads the scanner and double-clicks the appropriate configuration to scan and save that document automatically.
- oscFile is integrated with an existing application. A “Scan Current Record” button in that application launches oscFile and passes the index values for the current document through the command line. The user loads the scanner and clicks this button; the images are scanned and saved automatically.