Open Access Article

Business Process Automation: A Workflow Incorporating Optical Character Recognition and Approximate String and Pattern Matching for Solving Practical Industry Problems

Department of EIT, University of Arkansas at Little Rock (UALR), Little Rock, AR 72204, USA
* Author to whom correspondence should be addressed.
Appl. Syst. Innov. 2019, 2(4), 33; https://doi.org/10.3390/asi2040033
Received: 17 September 2019 / Revised: 16 October 2019 / Accepted: 18 October 2019 / Published: 24 October 2019
Companies increasingly rely on artificial intelligence and machine learning to enhance and automate existing business processes. While OCR (Optical Character Recognition) technologies can be harnessed to digitize image data, the digitized text still needs to be validated and enhanced so that it meets the data quality standards required for the data to be usable. This paper develops an automated workflow that follows image digitization and produces a dictionary of the desired information. The workflow is a three-step process applied after the OCR output has been generated, and each step increases the accuracy of the key-value matches between field names and field values. The first step takes the raw OCR output and identifies field names using exact string matching and field values using regular expressions from an externally maintained file. The second step introduces index pairing, which matches field values to field names based on the location of the field name and value on the document. The third step adds approximate string matching, which further increases accuracy. The F-measure for key-value pair matches is 60.18% after the first step, 80.61% once index pairing is introduced, and 90.06% after approximate string matching is introduced. The results show that accurate, usable data can be obtained automatically from images by applying a workflow after OCR.
Keywords: business process automation; Levenshtein; OCR; Google vision API; Tesseract; approximate and exact string matching; index pairing; workflow
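
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the regular expressions, the one-token-per-line OCR output, the "nearest following value" pairing rule, and the edit-distance threshold are all assumptions chosen for illustration. Setting `max_distance=0` gives exact field-name matching, while a larger value gives Levenshtein-based approximate matching.

```python
import re

# Hypothetical field definitions; in the paper these come from an
# externally maintained file whose format is not given in the abstract.
FIELD_PATTERNS = {
    "Invoice Number": r"INV-\d{4}",
    "Date": r"\d{4}-\d{2}-\d{2}",
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def find_field_names(lines, max_distance=0):
    """Map each recognized field name to the index of the line it was found on."""
    found = {}
    for idx, line in enumerate(lines):
        text = line.strip().rstrip(":").lower()
        for name in FIELD_PATTERNS:
            if name not in found and levenshtein(text, name.lower()) <= max_distance:
                found[name] = idx
    return found


def find_values(lines):
    """Return (line_index, field_name, value) for every line matching a value pattern."""
    hits = []
    for idx, line in enumerate(lines):
        for name, pattern in FIELD_PATTERNS.items():
            if re.fullmatch(pattern, line.strip()):
                hits.append((idx, name, line.strip()))
    return hits


def pair_by_index(names, values):
    """Assumed pairing rule: take the nearest matching value after the field name."""
    result = {}
    for name, name_idx in names.items():
        candidates = [(idx - name_idx, val)
                      for idx, field, val in values
                      if field == name and idx > name_idx]
        if candidates:
            result[name] = min(candidates)[1]
    return result


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Simulated OCR output in which "Number" was misread as "Nurnber".
ocr_lines = ["Invoice Nurnber:", "INV-0042", "Date", "2019-10-24"]
values = find_values(ocr_lines)
exact = pair_by_index(find_field_names(ocr_lines, max_distance=0), values)
fuzzy = pair_by_index(find_field_names(ocr_lines, max_distance=2), values)
print(exact)
print(fuzzy)
```

In this toy input, exact matching misses the misread field name, so only the date is paired. Allowing a small edit distance recovers the invoice number as well, which mirrors the kind of gain the abstract attributes to approximate string matching.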
MDPI and ACS Style

de Jager, C.; Nel, M. Business Process Automation: A Workflow Incorporating Optical Character Recognition and Approximate String and Pattern Matching for Solving Practical Industry Problems. Appl. Syst. Innov. 2019, 2, 33.
