Index Unstructured Documents Affordably
Dynamic OCR (also known as Dynamic Forms Processing or Unstructured Forms
Processing) is the cutting-edge new technology that allows very expensive
software packages to read an index field no matter what its location on the
image. Huge data processing centers pay $25,000 or more to implement this
technology in their organizations. How can it be that this feature is included
in oscFile?
oscFile makes this possible by simplifying Dynamic OCR, using Template and
Dictionary Matching to find a desired index value no matter where it is on the
image.
For example, if you want to find a social security number automatically, simply
enter the template ###-##-#### and oscFile will search the entire page until it
finds a match. Since only one social security number is likely to appear on the
page, a match on this pattern is almost certainly the required value.
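To illustrate the idea, here is a minimal Python sketch of template matching over OCR text. It is not oscFile's actual implementation: the function names are hypothetical, and it assumes that # stands for a single digit and that every other template character must match literally.

```python
import re

def template_to_regex(template):
    """Turn a template such as ###-##-#### into a compiled regular expression.

    Assumption: '#' means one digit; any other character is matched literally.
    """
    parts = [r"\d" if ch == "#" else re.escape(ch) for ch in template]
    # The lookarounds stop a match from starting or ending inside a longer run of digits.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")

def find_by_template(page_text, template):
    """Return the first value on the page that fits the template, or None."""
    match = template_to_regex(template).search(page_text)
    return match.group(0) if match else None

page = "Name: Jane Doe\nSSN: 123-45-6789\nPhone: 555-0100"
print(find_by_template(page, "###-##-####"))
```

Because the whole page text is searched, the value is found wherever the OCR engine happened to read it.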
With dictionary matching, you can give oscFile a list of possible values and it
will automatically search the page for each possible value until it finds a
match.
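Dictionary matching can be sketched in a few lines of Python as well. Again, this is only an illustration under stated assumptions: the function name is hypothetical, the comparison is assumed to be case-insensitive, and the first listed value found on the page wins.

```python
def find_by_dictionary(page_text, candidates):
    """Return the first candidate value that appears in the page text, or None.

    Assumption: matching ignores upper/lower case, since OCR output and
    dictionary entries may be capitalized differently.
    """
    haystack = page_text.casefold()
    for value in candidates:
        if value.casefold() in haystack:
            return value
    return None

vendors = ["Globex", "Acme Corp", "Initech"]
page = "INVOICE from ACME CORP, payment due in 30 days"
print(find_by_dictionary(page, vendors))
```

The returned value is the dictionary entry itself rather than the raw OCR text, so the index field stays consistent even when the page uses different capitalization.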
Many dynamic forms processing applications can be implemented using these simple
algorithms. This makes oscFile far more versatile than other zone OCR solutions
that require the index value to be in the exact same location on every page. Yet
oscFile costs only a fraction of the price!
oscFile’s dynamic forms processing can greatly speed up data entry by
eliminating a good percentage of indexing work. For many organizations, this
brings the labor cost of scanning within reach.
Dynamic OCR can also be applied to MS Office and PDF files, creating a fully
automated process for intelligently indexing and reorganizing electronic
documents.
Check our Advanced OCR Guide for information on how to use third-party OCR
applications with oscFile to improve accuracy and efficiency in dynamic and
zone OCR applications.
Support for Regular Expressions
oscFile 6.0 adds the ability to use Regular Expressions to find values
within the OCR text or input file path. Regular Expressions (RegEx for short)
let you define complex search patterns to extract matching values from the text.
This greatly enhances the functionality of the dynamic OCR in oscFile, making
it capable of finding variable-length fields that do not fit a fixed template.
Regular Expressions are commonly used in text parsing applications. The Perl
programming language makes extensive use of RegEx, as do UNIX utilities like "grep".
Many programmers and IT personnel are already familiar with RegEx and can create
complex expressions without specific training.
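As a rough illustration of the kind of expression involved, the Python sketch below pulls a variable-length invoice number out of OCR text and a year out of a file path. The patterns, field names, and sample text are hypothetical examples, and the RegEx dialect oscFile accepts may differ from Python's.

```python
import re

# Hypothetical pattern: "Invoice No", "Invoice Number" or "Invoice #",
# followed by a value of any length made of letters, digits and hyphens.
INVOICE_RE = re.compile(
    r"Invoice\s+(?:No\.?|Number|#)\s*:?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)

# Hypothetical pattern: a four-digit year used as a folder name in the path.
PATH_YEAR_RE = re.compile(r"[\\/]((?:19|20)\d{2})[\\/]")

def extract_first(pattern, text):
    """Return the first captured group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

ocr_text = "ACME Supplies\nInvoice No: INV-2024-00917\nTotal due: $1,250.00"
file_path = "/scans/2023/batch7/page001.tif"

print(extract_first(INVOICE_RE, ocr_text))
print(extract_first(PATH_YEAR_RE, file_path))
```

The capture group in parentheses is what separates the value to keep from the label text that merely locates it.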
Traditional Zone OCR
If you have standard documents where index data is always in the same
location, the fastest way to index them is with traditional Zone OCR. The
Template and Dictionary Matching features of Dynamic OCR also come in handy with
Zone OCR, since documents can skew during scanning, and unwanted text may shift
into the zone. With oscFile you can draw oversize zones that account for any
amount of skewing, and use Template or Dictionary Matching to extract the data
you need.
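The combination of an oversize zone and a pattern can be sketched as follows. This is an assumed model, not oscFile's internals: the Word and Zone types, the coordinates, and the function names are hypothetical, and the OCR engine is assumed to report a position for each recognized word.

```python
import re
from typing import NamedTuple

class Word(NamedTuple):
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page

class Zone(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

def words_in_zone(words, zone):
    """Keep only the words whose top-left corner falls inside the zone."""
    return [
        w for w in words
        if zone.left <= w.x <= zone.right and zone.top <= w.y <= zone.bottom
    ]

def extract_from_zone(words, zone, pattern):
    """Search the text inside the zone and return the first pattern match, or None."""
    zone_text = " ".join(w.text for w in words_in_zone(words, zone))
    match = re.search(pattern, zone_text)
    return match.group(0) if match else None

# The zone is drawn larger than the field, so neighboring labels shift into it.
words = [
    Word("Date:", 400, 95),
    Word("SSN:", 400, 120),
    Word("123-45-6789", 450, 122),
    Word("Total", 50, 700),
]
oversize_zone = Zone(left=380, top=80, right=620, bottom=160)
print(extract_from_zone(words, oversize_zone, r"\d{3}-\d{2}-\d{4}"))
```

The zone narrows the search area, and the pattern discards whatever unwanted text the generous zone picked up.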
Index operators may also manually perform zone OCR during indexing by simply
drawing a box around the desired field value.
Full Page OCR
oscFile also has the ability to perform full-page OCR conversion to MS
Word, HTML, Text or WordPerfect documents. Image and text zone templates can be
drawn and recognized text can be corrected before it is output. You can also use
oscFile's streamlined scanning and indexing workflow to batch process images
and apply standard naming schemes to the OCRed files.