Article

Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care

by Perasuk Worragin 1, Suepphong Chernbumroong 1, Kitti Puritat 2, Phichete Julrode 2,* and Kannikar Intawong 3,*

1 College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
2 Department of Library and Information Science, Faculty of Humanities, Chiang Mai University, Chiang Mai 50200, Thailand
3 Faculty of Public Health, Chiang Mai University, Chiang Mai 50200, Thailand
* Authors to whom correspondence should be addressed.
Technologies 2025, 13(11), 530; https://doi.org/10.3390/technologies13110530
Submission received: 26 September 2025 / Revised: 14 November 2025 / Accepted: 15 November 2025 / Published: 17 November 2025

Abstract

Manual data entry in dialysis centers is time-consuming, error-prone, and increases the administrative burden on healthcare professionals. Traditional optical character recognition (OCR) systems partially automate this process but lack the ability to handle complex data anomalies and ensure reliable clinical documentation. This study presents the design and evaluation of an AI-enhanced OCR system that integrates advanced image processing, rule-based validation, and large language model-driven anomaly detection to improve data accuracy, workflow efficiency, and user experience. A total of 65 laboratory reports, each containing approximately 35 fields, were processed and compared under two configurations: a basic OCR system and the AI-enhanced OCR system. System performance was evaluated using three key metrics: error detection accuracy across three error categories (Missing Values, Out-of-Range, and Typo/Free-text), workflow efficiency measured by average processing time per record and total completion time, and user acceptance measured using the System Usability Scale (SUS). The AI-enhanced OCR system outperformed the basic OCR system in all metrics, particularly in detecting and correcting Out-of-Range errors, such as decimal placement issues, achieving near-perfect precision and recall. It reduced the average processing time per record by almost 50% (85.2 to 42.1 s) and improved usability, scoring 81.0 (Excellent) compared to 75.0 (Good). These results demonstrate the potential of AI-driven OCR to reduce clerical workload, improve healthcare data quality, and streamline clinical workflows, while maintaining a human-in-the-loop verification process to ensure patient safety and data integrity.

1. Introduction

In Thailand, nephrology centers play a critical role in providing life-sustaining hemodialysis care to an estimated 80,000 patients with end-stage kidney disease across more than 2500 units nationwide. Despite the essential role of these facilities, the exchange of data between dialysis centers and government agencies remains limited. National authorities have avoided implementing open application programming interfaces (APIs) for transmitting health records because of concerns about cybersecurity and data privacy. This has forced healthcare providers to rely on repetitive, manual data entry, often rekeying the same information into multiple platforms maintained by different government agencies. Nurses and administrative staff must frequently log into web-based portals and desktop applications to submit treatment data, claims, and patient outcomes. The duplication of effort is not only inefficient but also increases the likelihood of errors, delays, and inconsistencies in reporting [1,2].
The absence of secure and interoperable mechanisms for data exchange has produced a series of operational challenges in dialysis centers. Surveys and observational studies suggest that nurses may spend 20–50 percent of their time on clerical tasks such as retyping laboratory results and treatment notes into government systems. This workload diverts attention from patient care and increases the risk of fatigue-related mistakes. Clinical consequences also arise. For instance, when laboratory results are delayed in being entered, physicians may lack timely information needed for decision-making, forcing patients to wait longer for follow-up or medication adjustments. International studies have confirmed that manual transcription is one of the least accurate and most time-consuming methods of clinical documentation. Systematic reviews show that automation, including optical character recognition, can substantially reduce error rates compared to manual entry [3]. Likewise, recent research demonstrates that novel optical character recognition systems can outperform human operators in terms of speed and reliability in real-world clinical environments [4]. These findings highlight the potential for advanced image-processing techniques to serve as reliable tools in streamlining nephrology information management.
Beyond accuracy and speed, the impact of manual entry on workforce morale has also drawn increasing attention. Repeated clerical duties contribute to staff burnout and dissatisfaction, which in turn affect retention in already understaffed healthcare systems [5]. In nephrology care, where continuity and specialized expertise are vital, losing trained nurses and technicians because of workload-related fatigue can negatively influence patient outcomes. Accordingly, solutions that not only reduce data errors but also alleviate staff burden can bring systemic benefits, including more sustainable workforce management. Agent-based artificial intelligence systems have recently emerged as powerful tools capable of planning, coordinating, and executing complex workflows. These intelligent agents can simulate the tasks of human clerks by navigating between different applications, validating extracted data, and interacting with legacy systems. The integration of agent-based AI with modern image processing therefore offers a promising approach to building “virtual clerks” that can safely and efficiently carry out administrative tasks in healthcare [6].
The present project introduces a fully implemented prototype of an intelligent virtual clerk developed as an add-on module to the NephroM system, an enterprise resource planning (ERP) platform widely used for dialysis data management. Supported by research funding from the National Research Council of Thailand (NRCT), this project aims to operationalize intelligent automation within existing clinical workflows. The system integrates image-processing pipelines, rule-based validation, and large-language-model (LLM) reasoning to automate the capture, verification, and secure submission of clinical data to external government platforms.
The specific objectives of this study are threefold: (1) to improve the accuracy and efficiency of clinical data entry through AI-enhanced automation; (2) to evaluate the system’s ability to reduce administrative workload and enhance healthcare workers’ job satisfaction by minimizing repetitive clerical tasks; and (3) to assess its potential to shorten patient waiting times by accelerating documentation and submission workflows. Accordingly, this paper focuses on the design, implementation, and evaluation of the OCR and validation modules as core components of the working prototype. By aligning with national digital-health strategies while maintaining compliance with cybersecurity policies, the proposed framework demonstrates how an AI-driven add-on module can enhance interoperability, improve service quality, and allow healthcare professionals to dedicate more time to patient care rather than clerical work. The primary contribution of this work lies in its practical applicability within real clinical workflows rather than in introducing algorithmic novelty.

2. Related Work

2.1. AI in Healthcare Information Management

Artificial intelligence has become a core enabler of healthcare information management by improving how data are captured, curated, and used for clinical and administrative decision-making. At the infrastructure level, AI methods help transform heterogeneous electronic health record data into machine-actionable formats, support patient representation learning, and enable predictive analytics that inform quality improvement and population management [7,8,9]. Beyond prediction, AI is increasingly deployed to streamline routine information workflows such as data abstraction, coding, and document classification, with the aim of reducing latency and improving completeness and consistency in health datasets [10]. These capabilities are critical in domains like nephrology, where high-frequency encounters and laboratory monitoring generate substantial documentation and reporting requirements. Coupled with modern image processing and optical character recognition, AI systems can accurately extract key fields from semi-structured forms and scanned documents, supporting safer and faster ingestion into registries and reporting systems [4].
The operational rationale for automation is grounded in well-documented burdens associated with EHR work and clerical tasks. Time–motion and workflow studies show that a large share of clinician effort is consumed by documentation and desk work, while EHR-related clerical load contributes to burnout and reduced job satisfaction [5,11]. Recent approaches therefore combine intelligent document understanding with orchestration layers that can navigate legacy web or desktop interfaces. In practice, this is achieved through agent-based systems and intelligent automation frameworks, which plan multi-step tasks, validate extracted content, and interact with multiple applications under policy constraints [6]. As organizations scale such solutions, attention to secure health information exchange and interoperability standards remains essential so that automation improves throughput without compromising privacy or cybersecurity [2]. Literature on the integration of AI with robotic process automation also indicates growing maturity in using these tools to reduce administrative friction while keeping governance and sustainability considerations in view [12].

2.2. Image Processing and Optical Character Recognition in Clinical Contexts

Image processing and optical character recognition (OCR) are foundational technologies for converting semi-structured and unstructured clinical documents into machine-readable data, enabling downstream analytics and workflow automation. Over recent years, deep learning has substantially improved resilience to noise, variability in layouts, and the diverse fonts and formats often found in healthcare documentation such as laboratory reports, dialysis treatment logs, and admission forms. Advances in convolutional neural networks, recurrent architectures, and more recently transformer-based models have enhanced text detection and recognition accuracy in challenging environments. Layout-aware frameworks that encode both spatial and textual information further improve field-level data extraction, which is particularly useful in nephrology, where recurring treatment forms and frequent laboratory reports require consistent and accurate transcription [13,14,15,16].
Clinical implementations increasingly integrate preprocessing, OCR, and post-processing validation with domain-specific rules or natural language processing to ensure accuracy and shorten turnaround times for data entry. Tailored OCR systems have been shown to outperform manual transcription in terms of speed and reliability, especially for vital-signs documentation and prescription forms. Moreover, pipelines that combine OCR with natural language processing have demonstrated high levels of precision in extracting usable information from scanned health records, enabling accurate population of registries and quality measurement databases. Multi-center evaluations in intensive care settings have further shown that OCR-based data entry can accelerate information flows and reduce staff burden, provided that layout variations and device heterogeneity are managed through preprocessing and human-in-the-loop validation. Collectively, these findings underscore the potential of healthcare-specific OCR pipelines to improve the efficiency, safety, and reliability of information management in nephrology and other chronic disease domains [4,7,17].

2.3. Agent-Based Systems for Workflow Automation

Agent-based systems provide a principled foundation for automating complex, multi-step workflows by encapsulating autonomy, social ability, reactivity, and proactivity in software entities that can perceive their environment, plan actions, and collaborate to achieve organizational goals. In administrative healthcare contexts, agents can coordinate extraction, validation, and submission tasks across heterogeneous applications while respecting local policies, role-based permissions, and exception handling. This aligns with the evolution of robotic process automation from scripted UI macros toward intelligent orchestration layers capable of decision-making and resilience to variability in interfaces and data [18,19]. Classic agent research established the architectural and coordination principles that enable such capabilities, including task decomposition, negotiation, and cooperative problem solving, which remain directly relevant when simulating human clerks that must navigate legacy web portals and desktop systems [20,21]. Recent literature further argues for integrating agent reasoning with analytics and document understanding so that automation not only executes keystrokes but also validates content, detects anomalies, and triggers human review when confidence is low [12].
Building on these foundations, contemporary “agentic AI” systems extend workflow automation with tool use, planning, and self-monitoring, enabling agents to call OCR and NLP services, enforce domain rules, and maintain auditable trails under governance constraints. In healthcare, this makes it possible to operationalize human-in-the-loop patterns where agents handle routine steps and escalate ambiguous cases to clinicians or administrators, thereby reducing turnaround time without sacrificing safety. Evidence from biomedicine demonstrates that agentic approaches can structure complex, multi-application tasks and coordinate specialized tools, suggesting strong applicability to clerical data flows in nephrology [6]. At scale, however, secure health-information exchange and interoperability remain prerequisites; automated agents must comply with cybersecurity controls and data-sharing policies so that throughput gains do not introduce privacy or integrity risks [2]. Taken together, the literature supports a layered design in which agent-based orchestration governs document AI pipelines, integrates with existing portals, and embeds oversight and auditing, an approach well-suited to automating repetitive, rule-bound reporting workflows in dialysis centers.
In the present study, these agent-based principles are operationalized in a fully implemented prototype integrated with the NephroM platform. The proposed virtual-clerk model adopts a three-layer architecture: document ingestion and recognition, data validation and adaptive reasoning, and task-execution automation, all of which were developed and deployed within the working system. The agent-based automation layer, built on Playwright and PyWinAuto, enables the virtual clerk to automatically submit verified records to external government portals while maintaining compliance and auditability. Accordingly, the agent-based framework presented here represents not only a guiding architecture but also an implemented orchestration system that coordinates OCR, validation, and automation modules in real-world clinical workflows. This implementation demonstrates the feasibility of applying agent-based AI to healthcare administration, bridging conceptual design with practical deployment in nephrology documentation.

2.4. Digital Health Transformation and Cybersecurity Constraints

Digital health transformation is frequently framed as an API-first modernization of clinical systems in which interoperability standards such as HL7 FHIR and application frameworks like SMART on FHIR enable secure, modular exchange of health information. In principle, this architecture allows external applications to retrieve and submit data in a governable manner while preserving auditability, consent management, and least-privilege access. In practice, however, many health authorities and public agencies remain reluctant to expose data-ingest interfaces because operational and legal risks around cybersecurity, privacy, and data misuse are perceived to outweigh the efficiency gains. The result is a persistent gap between the promise of interoperable, standards-based exchange and the reality of policy-constrained ecosystems that continue to rely on manual re-entry into legacy web portals and desktop applications. Prior work highlights both the technical feasibility of safe, standards-conformant exchange and the governance challenges that limit routine cross-organizational sharing, underscoring the need for solutions that respect existing controls while reducing clerical burden. Representative analyses of secure health-information exchange and interoperable app ecosystems emphasize that successful adoption hinges on end-to-end security controls, identity management, and rigorous auditing capabilities that must be demonstrated to regulators before broader API access is permitted [2,22].
Concurrently, the healthcare threat landscape has intensified, with systematic reviews documenting escalating risks from ransomware, phishing, credential compromise, and exploitation of third-party components. These incidents have real operational consequences, including care delays and data integrity concerns, which further discourage authorities from opening inbound programmatic channels without robust mitigations [23]. To balance transformation with risk, emerging approaches combine privacy-preserving computation and verifiable infrastructure such as blockchain-backed audit trails for provenance and federated learning to keep raw patient data local while enabling shared model improvement. Although these technologies do not eliminate risk, they provide concrete mechanisms to strengthen auditability, reduce data movement, and demonstrate compliance, thereby making controlled automation more acceptable to oversight bodies [24,25]. Within such policy and security constraints, agent-based automation that operates through sanctioned user interfaces paired with document-AI pipelines and human-in-the-loop review offers a pragmatic path to efficiency. It can preserve existing governance boundaries while reducing redundant data entry and improving timeliness in high-frequency domains like nephrology information management.
Despite the rapid progress of AI in healthcare data management, several research gaps remain unresolved. Existing studies on electronic health record automation and OCR pipelines largely focus on general clinical documentation or radiology reports, with limited exploration of high-frequency, high-volume specialties such as nephrology, where repeated dialysis sessions generate a significant clerical burden. While robotic process automation and agent-based approaches have been applied in finance and business process management, their integration with healthcare-specific image processing pipelines is still underdeveloped. Moreover, most implementations emphasize technical accuracy without sufficiently addressing the policy and cybersecurity constraints that prevent the use of open APIs in government health systems. This leaves a critical gap for research that demonstrates how intelligent, agent-based automation can operate effectively within restrictive security environments, reduce redundant manual data entry, and improve timeliness in nephrology information flows while ensuring compliance with privacy and regulatory requirements.

3. Methodology

3.1. System Architecture

The intelligent virtual clerk was developed as an add-on module to the NephroM system, an enterprise resource planning (ERP) platform used for managing dialysis operations and patient data. Rather than replacing existing infrastructure, the module extends NephroM’s capabilities by introducing an agent-based automation layer that interfaces with external government portals such as the National Health Security Office (NHSO) and the Health Service Information Office. This integration enables automated data exchange while maintaining compliance with security and interoperability requirements. The intelligent virtual clerk is organized into a three-layer system architecture that was fully implemented in the working prototype, as illustrated in Figure 1. The first layer, Document Ingestion and Recognition, processes inputs from scanned forms, dialysis logs, and laboratory reports through preprocessing methods such as noise reduction, normalization, and segmentation, followed by optical character recognition using layout-aware models to generate structured data. The second layer, Validation and Domain Rules, ensures that extracted values are clinically plausible and consistent with administrative standards by applying predefined rules and allowing human-in-the-loop verification for ambiguous cases. The third layer, Agent-Based Automation, was also developed and deployed within the prototype. It functions as the operational core of the virtual clerk, where intelligent agents automatically interact with government platforms, handle exceptions, and ensure compliance through automated logging and monitoring. Functional testing confirmed that the automation agents can execute submission and verification tasks reliably across both web-based and desktop systems.
In order to provide project-specific details, the technical implementation of this architecture is further illustrated in Figure 2, which maps the tools and APIs applied to each system layer. In the input stage, scanned forms are processed using OpenCV and Tesseract OCR, while structured digital entries are collected directly from user inputs. In Layer 2, Pydantic is employed to enforce administrative and clinical validation rules, while ChatGPT (GPT-4 API, 1 March 2025) supports anomaly detection and explanatory reasoning, complemented by human verification where necessary. In Layer 3, the automation framework integrates Playwright for web-based workflows and PyWinAuto for PC-based government systems. Both were configured and tested to perform automated data entry, submission, and report generation, ensuring adaptability and compliance across heterogeneous integration environments. This implementation demonstrates the practical application of the virtual clerk in a real-world project, emphasizing its modularity and showing how open-source libraries, LLM reasoning, and agent-based automation can be integrated into a workflow that directly addresses the lack of interoperable APIs in government health information systems.

3.2. Agent-Based AI Design

The proposed virtual clerk is designed and implemented as an agent-based AI system that follows a continuous cycle of Perception, Decision, Action, and Monitoring, as illustrated in Figure 3. All four stages were developed within the working prototype, enabling end-to-end automation of data recognition, validation, and submission. In the perception stage, the agent acquires data from two main sources: scanned clinical forms processed by OpenCV and Tesseract OCR, and structured data directly entered by users. These inputs are transformed into structured representations such as JSON, often accompanied by confidence scores that indicate recognition accuracy. The decision stage combines deterministic validation with adaptive intelligence. Rule-based checks implemented through Pydantic enforce administrative and clinical constraints, while LLM reasoning (ChatGPT) supports anomaly detection, normalization of ambiguous inputs, and generation of explanatory feedback. Specifically, the language-model component was implemented using the OpenAI GPT-4 API (version released on 1 March 2025) under the gpt-4-turbo configuration, selected for its contextual reasoning capability and robust handling of clinical text validation tasks. Each request had an average latency of approximately 1.6 s per query, which was considered acceptable for near-real-time operations in clinical data entry.
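The perception stage described above can be sketched as a small Python routine that reshapes per-word OCR output (word text plus a 0–100 confidence, as produced by tools such as pytesseract) into the JSON field records consumed by the decision stage. The field mapping and the worst-token confidence rule shown here are illustrative assumptions, not the system's exact implementation:

```python
import json

def to_structured(words, confidences, field_name):
    """Join OCR tokens into one field record with a 0-1 confidence score.

    `words` and `confidences` are parallel lists as returned by a
    word-level OCR call; the lowest token confidence governs the field
    (a conservative choice assumed here for illustration).
    """
    value = " ".join(words)
    conf = min(confidences) / 100.0 if confidences else 0.0
    return {"field_name": field_name, "value": value, "confidence": round(conf, 2)}

# Example: a single-token hemoglobin reading recognized at 91% confidence
record = to_structured(["12.4"], [91], "HGB")
print(json.dumps(record))
```

Records in this shape carry the confidence score that later drives the 0.85 escalation threshold in the decision stage.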
The agent’s inferential capability resides within this decision stage, which constitutes the reasoning layer of the architecture. In this layer, symbolic (rule-based) inference and statistical (LLM-based) inference operate in concert. The rule engine applies 84 deterministic constraints covering field types, numeric ranges, logical dependencies, and temporal consistency. When these rules are violated, or when OCR confidence falls below 0.85 (an operating point selected empirically during pilot evaluations, where confidence values below this threshold frequently correlated with character-level ambiguities and schema violations), a constrained GPT-4 reasoning routine is invoked to propose context-consistent corrections while strictly prohibiting data fabrication. Each candidate output is then re-validated against the deterministic schema before being accepted. This rule → bounded reasoning → human escalation policy operationalizes bounded rationality, allowing the agent to optimize accuracy and compliance under uncertainty while maintaining reactive efficiency for deterministic cases.
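The rule → bounded reasoning → human escalation policy can be sketched as a small routing function. The `invoke_llm` and `revalidate` callables stand in for the constrained GPT-4 routine and the deterministic schema re-check, respectively; both names are placeholders introduced for this sketch:

```python
CONF_THRESHOLD = 0.85  # empirically selected operating point from the pilot evaluation

def route(record, violations, invoke_llm, revalidate):
    """Return (status, record) under the rule -> LLM -> human escalation policy.

    `violations` is the list of deterministic rule IDs the record breaks;
    `invoke_llm(record, violations)` returns a corrected record or None;
    `revalidate(record)` returns the remaining violations after correction.
    """
    if not violations and record["confidence"] >= CONF_THRESHOLD:
        return "validated", record                  # deterministic fast path
    corrected = invoke_llm(record, violations)      # bounded reasoning, no fabrication
    if corrected is not None and not revalidate(corrected):
        return "corrected", corrected               # passes the deterministic schema
    return "flagged", record                        # escalate to human review
```

The three status strings mirror the `validated|corrected|flagged` vocabulary used by the system's validation prompt.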
In terms of technical rule design, the 84 deterministic rules are organized into five categories: (1) type and format rules (e.g., enforcing fixed-length alphanumeric HN identifiers); (2) numeric range rules (e.g., validating that Creatinine lies within 0.3–20.0 mg/dL); (3) unit and normalization rules (e.g., converting Urea from mmol/L to mg/dL when necessary); (4) cross-field dependency rules (e.g., ensuring PreWeight > DryWeight and PostWeight < PreWeight); and (5) temporal consistency rules (e.g., requiring SpecimenDate ≤ ReportDate). Representative examples include: R12—HGB must be within 6.0–20.0 g/dL; R23—flag BUN values inconsistent with Creatinine-based physiological ratios; R41—reject records where laboratory timestamps exceed session timestamps; and R73—escalate potassium levels above 7.0 mmol/L for human review. These rule categories constitute the system’s explicit knowledge base, providing the deterministic inference layer that interfaces with the constrained LLM reasoning routine.
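A few of the rule categories above can be illustrated with a minimal pure-Python validator; in the deployed system these checks are encoded as Pydantic validators over the full 84-rule set, so the function below is a simplified sketch using only the field names and bounds quoted in the text:

```python
from datetime import date

# Numeric range rules (category 2), using bounds quoted in the text:
# R12: HGB within 6.0-20.0 g/dL; Creatinine within 0.3-20.0 mg/dL.
NUMERIC_RANGES = {
    "HGB": (6.0, 20.0),
    "Creatinine": (0.3, 20.0),
}

def validate_record(rec: dict) -> list:
    """Return a list of violated rule labels for one extracted record."""
    violations = []
    for field, (lo, hi) in NUMERIC_RANGES.items():
        if field in rec and not (lo <= rec[field] <= hi):
            violations.append(f"range:{field}")
    # Cross-field dependency rule (category 4): PreWeight > DryWeight
    if rec.get("PreWeight") is not None and rec.get("DryWeight") is not None:
        if not rec["PreWeight"] > rec["DryWeight"]:
            violations.append("cross:PreWeight>DryWeight")
    # Temporal consistency rule (category 5): SpecimenDate <= ReportDate
    if rec.get("SpecimenDate") and rec.get("ReportDate"):
        if rec["SpecimenDate"] > rec["ReportDate"]:
            violations.append("temporal:SpecimenDate<=ReportDate")
    return violations
```

An empty return value corresponds to a record that clears the deterministic layer without escalation.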
To make explicit how these rules formalize domain knowledge, the virtual clerk follows the three canonical components of an expert system. First, the 84 deterministic rules form the knowledge base, encoding clinical, administrative, physiological, and temporal constraints required in dialysis documentation. Second, the inference engine consists of (a) a deterministic rule evaluator that performs deductive checks, (b) a bounded LLM reasoning module invoked only under uncertainty, and (c) a deterministic re-validation layer that enforces schema compliance before accepting any output. This hybrid mechanism ensures that deductive logic remains primary while uncertainty is tightly controlled. Third, the user interface layer is implemented within the NephroM platform, providing OCR upload interfaces, real-time validation feedback, and human-in-the-loop review pathways.
The action stage enables the agent to simulate the role of a clerk by automatically filling government forms and submitting records through existing platforms. Web-based portals are handled with Playwright, while legacy PC applications are managed with PyWinAuto, ensuring flexibility across heterogeneous infrastructures. Finally, the monitoring stage incorporates human-in-the-loop validation, systematic audit logging, and reporting, which provide transparency, error recovery, and compliance with security requirements. Collectively, these four stages form a closed-loop cycle that allows the agent to perceive its environment, reason over data, act autonomously, and adapt based on feedback.
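For the web-portal path, the action stage can be sketched as a two-step design: a pure function translates a verified record into a fill plan, and Playwright executes it. The CSS selectors and field names below are illustrative placeholders, not the real NHSO form layout:

```python
# Assumed mapping from record fields to form controls (placeholder selectors)
FIELD_SELECTORS = {
    "HN": "#input-hn",
    "HGB": "#input-hgb",
}

def build_fill_plan(record: dict) -> list:
    """Translate a validated record into (css_selector, text) fill actions,
    skipping any fields the target form does not expose."""
    return [(FIELD_SELECTORS[k], str(v)) for k, v in record.items() if k in FIELD_SELECTORS]

def submit(page, record):
    """Execute the fill plan on a Playwright page already logged in to the portal."""
    for selector, text in build_fill_plan(record):
        page.fill(selector, text)   # standard Playwright form-fill call
    page.click("#btn-submit")       # placeholder submit-button selector
```

Keeping the fill plan separate from the browser calls makes the mapping auditable and testable without a live portal session, which fits the monitoring stage's logging requirements.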
When compared with conventional robotic process automation (RPA), the agent-based AI design offers significant advantages. RPA workflows are typically brittle, failing when user interfaces change or when unexpected data are encountered. In contrast, intelligent agents embody autonomy, adaptability, and reasoning [20]. By combining rule-based validation with LLM reasoning, the virtual clerk does more than execute static scripts: it proactively identifies anomalies, explains discrepancies, and collaborates with human reviewers when required. The monitoring layer reinforces accountability through audit trails and continuous feedback, moving beyond linear automation pipelines. This implementation demonstrates the four canonical properties of intelligent agents (autonomy, reactivity, proactivity, and social ability) within an operational prototype that reduces clerical burden, improves data accuracy, and enhances trustworthiness compared with traditional automation approaches.

3.3. Image Processing Pipeline

This section describes the image-processing pipeline that converts printed paper forms and scanned dialysis logs into machine-readable data ready for validation and automation. An overview of the pipeline is shown in Figure 4. Preprocessing begins with denoising, de-skewing, and contrast normalization, followed by binarization to improve text–background separation. Global thresholding [26] and adaptive binarization methods [27] are applied depending on illumination and paper artifacts, with morphological operations used to repair broken strokes and suppress speckle noise. Text regions are localized and recognized using the Tesseract OCR engine, which employs adaptive classifiers and language models to support robust character recognition in printed clinical forms [28]. For semi-structured layouts such as tables, labels, and key–value zones, the pipeline aligns OCR results with layout-aware models to preserve spatial relationships, thereby enabling reliable field mapping across variable templates [13,15].
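The global-thresholding step cited above [26] is Otsu's method; the deployed pipeline would call OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag, but the idea can be shown in a self-contained pure-Python sketch operating on a flat list of grayscale values:

```python
def otsu_threshold(pixels) -> int:
    """Return the 0-255 threshold that maximizes between-class variance
    (Otsu's method), given a flat list of grayscale pixel values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0
    w_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue                      # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                         # no foreground pixels remain
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(pixels, t):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > t else 0 for p in pixels]
```

In practice this global method is chosen when illumination is even; the adaptive binarization methods [27] take over when lighting or paper artifacts vary across the page.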
Post-OCR processing further structures and quality-controls the extracted text. Rule-based parsing standardizes identifiers, dates, and units, while confidence scores and heuristics trigger reprocessing or human review in ambiguous cases. Normalized fields are serialized into JSON and passed to the validation module, where schema checks and range constraints are applied. Such OCR-to-validation pipelines have been shown to reduce turnaround times and improve registry data usability [29]. As manual data entry is known to introduce errors, the pipeline is specifically designed to minimize keystrokes and surface only exceptions, aligning with evidence that optimized data processing methods can substantially lower error rates in clinical research [3]. The final structured output with confidence scores, provenance, and audit artifacts feeds the agent’s decision stage and ultimately the automation layer for secure submission to government systems.

3.4. OCR Configurations and Technical Integration

The intelligent virtual clerk employs a dual-configuration OCR pipeline engineered for high-fidelity extraction and validation of semi-structured clinical records. The baseline deterministic configuration utilizes Tesseract v5.3.2 in legacy (non-LSTM) mode, executing rule-driven segmentation and glyph-pattern correlation for character decoding. Input frames are pre-conditioned through OpenCV-based Gaussian denoising, Hough-transform de-skewing, and adaptive binarization, followed by morphological opening/closing to reconstruct stroke continuity and eliminate impulse noise. Post-processing modules implement deterministic normalization routines that apply regular-expression filters to enforce field syntax (patient identifiers, timestamps, measurement units) and to correct recurrent optical ambiguities (e.g., “O→0”, “I→1”). The structured output is mapped to the NephroM schema, which defines 84 deterministic constraints covering data types, numeric bounds, and inter-field dependencies, providing a transparent baseline for audit-compliant data ingestion.
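The deterministic normalization routines described above can be sketched with Python's `re` and `str.translate`: a glyph-confusion table repairs the recurrent optical ambiguities ("O→0", "I→1") inside tokens declared numeric, and per-field regular expressions enforce syntax. The specific patterns shown are illustrative, not the production schema:

```python
import re

# Recurrent optical confusions repaired only in numeric tokens
AMBIGUOUS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})

# Illustrative field-syntax filters (the real schema defines 84 constraints)
FIELD_SYNTAX = {
    "HN": re.compile(r"^[A-Z]\d{7}$"),         # assumed fixed-length identifier
    "Creatinine": re.compile(r"^\d{1,2}\.\d$"),  # e.g., "1.2"
}

def normalize_numeric(raw: str) -> str:
    """Map commonly confused glyphs to digits in a numeric token."""
    return raw.translate(AMBIGUOUS)

def conforms(field: str, value: str) -> bool:
    """Check a value against its field's syntax filter, if one is defined."""
    pat = FIELD_SYNTAX.get(field)
    return bool(pat and pat.match(value))
```

A token that still fails its syntax filter after normalization falls through to the confidence-gated reasoning path rather than being silently accepted.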
The enhanced configuration activates the Long Short-Term Memory (LSTM) recognition engine of Tesseract v5.3.2, enabling contextual sequence modeling across character windows. Pre-trained English–Thai models are extended with a domain-specific lexical set incorporating nephrology terminology such as dialysate, hemodiafiltration, and Kt/V. Tokens with confidence scores below 0.85 invoke a generative reasoning sub-module implemented via the OpenAI GPT-4 Turbo API. The model operates under a constrained, instruction-based validation prompt that enforces bounded semantic behavior and prohibits uncontrolled text generation, for example:
System Role: You are an AI-based clinical data validator operating within a rule-constrained data entry system.
Your task is to analyze structured OCR outputs from nephrology forms, identify anomalies, and propose corrections
only when they are derivable from contextual or domain-consistent evidence.
Instructions:
1. Input will be provided as a JSON object containing {field_name, value, confidence, data_type, rule_reference}.
2. For each record:
- Verify that the value conforms to expected type, unit, and range constraints (as indicated by rule_reference).
- If confidence < 0.85 or rule violation is detected:
a. Analyze related fields for contextual inference (e.g., Pre_Weight vs. Post_Weight, Urea vs. Creatinine).
b. If a correction is logically deducible, output the revised value and reasoning note.
c. If ambiguity remains, flag for human review.
3. NEVER fabricate or infer data outside the observed record set.
4. Return all outputs in strict JSON format:
{
  "field_name": "",
  "original_value": "",
  "suggested_value": "",
  "confidence": "",
  "status": "validated|corrected|flagged",
  "reason": ""
}
All inference transactions are encapsulated with execution metadata—token counts, latency, and probabilistic confidence metrics—to support deterministic replay. The post-LLM output undergoes Pydantic schema validation, applying rule-based range checks, relational logic (e.g., dry-weight < post-dialysis weight), and temporal-ordering verification before serialization into JSON with provenance hashes for downstream decision-layer ingestion.
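The post-LLM validation logic can be sketched as follows, written in plain Python rather than Pydantic for brevity. The field names, the weight bounds, and the ISO-timestamp representation are illustrative assumptions; only the dry-weight relation and the temporal-ordering check come from the text.

```python
from dataclasses import dataclass

@dataclass
class DialysisRecord:
    dry_weight_kg: float
    post_weight_kg: float
    session_start: str  # ISO timestamp, e.g., "2025-03-01T08:00"
    session_end: str

def validate_record(rec: DialysisRecord) -> list[str]:
    """Return a list of rule violations (empty list means the record passes).

    Bounds are illustrative assumptions; the relational and temporal
    checks mirror those described in the text.
    """
    violations = []
    if not (20.0 <= rec.dry_weight_kg <= 200.0):       # range check (assumed bounds)
        violations.append("dry_weight_kg out of range")
    if not rec.dry_weight_kg < rec.post_weight_kg:     # relational logic
        violations.append("dry weight must be below post-dialysis weight")
    if not rec.session_start < rec.session_end:        # temporal ordering
        violations.append("session_end precedes session_start")
    return violations
```

Because ISO-8601 timestamps of equal precision sort lexicographically, the string comparison suffices for the ordering check in this sketch.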
The overall implementation follows a hybrid microservice architecture integrating a .NET (C#) front-end for visualization and supervisory control with Python-based back-end services executing OpenCV, Tesseract, and GPT-4 operations through RESTful APIs. Process automation leverages Playwright for web-form orchestration and PyWinAuto for legacy desktop interfacing. This composite architecture constitutes a hybrid deterministic–AI pipeline, where rule-based modules guarantee compliance and reproducibility, and learning-based components contribute adaptive reasoning and contextual correction. The resulting system achieves an explainable, auditable, and regulation-conformant automation framework for nephrology information management. The GPT-4 Turbo model was not retrained or fine-tuned; rather, it was configured through a constrained instruction schema and an internal validation wrapper that enforces structured input/output formats, response length limits, and deterministic key–value alignment.
A key technical characteristic of the proposed architecture lies in its pipeline-level determinism rather than reliance on the raw OCR engine alone. While the Tesseract legacy model is algorithmically deterministic, its output can vary when input images differ in illumination, rotation, or contrast. To address this, the system applies a fixed and repeatable normalization sequence (grayscale conversion, resolution standardization, global thresholding, binarization, geometric de-skewing, and noise suppression) before every OCR operation. These steps ensure that input frames are rendered into a stable canonical form, enabling reproducible OCR behavior across heterogeneous capture conditions. Furthermore, downstream components, including the 84-rule deterministic validator, bounded LLM inference, and deterministic post-validation, act as stabilizing layers that correct residual variability and constrain uncertainty. This pipeline-level determinism, combined with the rule → reasoning → re-validation loop, represents the most technically distinctive aspect of the architecture and differentiates it from conventional OCR workflows that lack inferential stabilization or safety-layered correction.
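The rule → reasoning → re-validation loop can be sketched as follows, with the LLM call stubbed out as a callable. The 0.85 threshold and the validated/corrected/flagged statuses come from the text; the function names and the field dictionary layout are our assumptions.

```python
CONFIDENCE_THRESHOLD = 0.85

def passes_rules(field: dict) -> bool:
    """Stand-in for the 84-rule deterministic validator (illustrative)."""
    low, high = field.get("range", (float("-inf"), float("inf")))
    return low <= field["value"] <= high

def process_field(field: dict, llm_suggest) -> dict:
    """Gate a field through rules, bounded LLM reasoning, and re-validation."""
    if field["confidence"] >= CONFIDENCE_THRESHOLD and passes_rules(field):
        return {**field, "status": "validated"}
    # Escalate to the bounded reasoning sub-module (stubbed callable here).
    suggestion = llm_suggest(field)
    if suggestion is not None and passes_rules({**field, "value": suggestion}):
        # Re-validate the suggested value before accepting it.
        return {**field, "value": suggestion, "status": "corrected"}
    return {**field, "status": "flagged"}  # ambiguity remains: human review
```

Note that the LLM's suggestion is never accepted on its own authority; it must pass the same deterministic rules before the record is marked corrected.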

3.5. Evaluation Metrics

3.5.1. Error Detection Rate of Automation Accuracy

We evaluated the system’s capability to detect and correct erroneous data fields by comparing two configurations: the basic OCR system, which relies solely on deterministic pattern matching of OCR outputs, and the AI-enhanced OCR system, which integrates advanced AI models for anomaly detection and normalization alongside basic rule-based checks, and also suggests the correct value whenever possible. The primary endpoint was the Error Detection Rate (EDR), equivalent to recall on the error class, calculated as $\text{Recall} = \frac{TP}{TP + FN}$, where $TP$ represents the number of correctly detected erroneous fields and $FN$ represents the number of undetected errors. To account for over-flagging, $\text{Precision} = \frac{TP}{TP + FP}$ and the $F_1$-score, $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$, the harmonic mean of precision and recall [30], were also reported. The evaluation was performed across three common error categories: Missing Values, Out-of-Range, and Typo/Free-text, which reflect typical clinical data entry problems [31]. Performance metrics were visualized through grouped bar charts comparing precision, recall, and F1-score for both systems, as well as receiver operating characteristic (ROC) curves [32] to illustrate overall detection performance, with the area under the curve (AUC) calculated for each system. The evaluation aimed not only to compare the raw recognition performance between the basic OCR and the AI-enhanced OCR systems but also to validate how improved text accuracy supports the agent-based automation layer in achieving reliable data submission. This connection between recognition accuracy and automation reliability forms a key validation step for the virtual clerk architecture.
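The three metrics can be computed directly from the detection counts; a minimal sketch (the function name is ours):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from error-detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```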

3.5.2. Time Efficiency

To evaluate operational performance, we measured the processing speed and time efficiency of both systems using two key metrics. The first was Average Time per Record, defined as the mean time required to complete the processing of a single record from initial capture to final confirmation. The second was Total Completion Time, which measured the total elapsed time required to process all records in a batch. Identical tasks were executed under controlled conditions using both the basic OCR system and the AI-enhanced OCR system, and the resulting mean values were recorded and compared. Time reduction was expressed as both absolute time saved and percentage improvement. Each participant processed dialysis reports under both configurations (basic OCR and AI-enhanced OCR) in a counterbalanced order, and the system automatically recorded timestamps for start and completion events to ensure objective measurement. LLM latency was computed from server-side timestamps (request dispatch to response receive), and per-record cost was derived from API token-usage logs averaged across the test set. This evaluation was conducted by clinical staff in a real-world setting to ensure that the recorded times accurately reflected the system’s practical performance.
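The per-record time saving, expressed as both absolute seconds and percentage improvement, can be derived as follows (function and key names are ours):

```python
def time_efficiency(baseline_s: float, enhanced_s: float) -> dict:
    """Absolute and percentage time saving per record."""
    saved = baseline_s - enhanced_s
    return {
        "saved_seconds": round(saved, 1),
        "saved_percent": round(100.0 * saved / baseline_s, 1),
    }
```

Applied to the reported means of 85.2 s and 42.1 s, this yields a saving of 43.1 s per record, or roughly half the baseline time.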

3.5.3. System Usability Evaluation

User acceptance and usability were assessed using the System Usability Scale (SUS) [33], a standardized instrument consisting of ten items with alternating positive and negative statements rated on a five-point Likert scale. Each participant used both the basic OCR system and the AI-enhanced OCR system and completed the SUS questionnaire for each configuration after task completion. For each configuration, the mean SUS score and standard deviation were calculated on a 0–100 scale, where higher scores indicate better usability. These scores were interpreted against conventional benchmarks [34], with scores below 50 categorized as Not Acceptable, scores between 50 and 70 as Marginal, and scores above 70 as Acceptable, including the sub-ranges of Good (70–80) and Excellent (>80). The results were visualized using an acceptability scale diagram to highlight the usability difference between the two systems.
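SUS scoring follows a fixed arithmetic rule: odd-numbered (positive) items contribute (score − 1), even-numbered (negative) items contribute (5 − score), and the sum is scaled by 2.5 onto the 0–100 range. A minimal sketch (the function name is ours):

```python
def sus_score(responses: list[int]) -> float:
    """Score a ten-item SUS questionnaire (1-5 Likert) on the 0-100 scale."""
    assert len(responses) == 10, "SUS requires exactly ten item responses"
    total = 0
    for i, score in enumerate(responses, start=1):
        # Odd items are positively worded; even items are negatively worded.
        total += (score - 1) if i % 2 == 1 else (5 - score)
    return total * 2.5
```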

3.6. Experimental Design and Workflow

Figure 5 presents the overall experimental design and workflow of the proposed AI-enhanced OCR system. The process begins with data collection and preprocessing of clinical laboratory documents obtained from dialysis centers. The baseline OCR system is first applied to extract textual content, followed by the AI-enhanced OCR process that integrates rule-based validation and an LLM reasoning component to detect anomalies and correct recognition errors. The processed data are then evaluated using precision, recall, and F1-score metrics, while user feedback is collected to assess workflow efficiency and usability. A human-in-the-loop validation step is included at the final stage to ensure the integrity of the corrected data before reporting. This experimental design provides a structured framework for comparing both the baseline OCR and AI-enhanced OCR configurations and for validating their effectiveness in real-world clinical documentation tasks.

3.7. Data Sources

The data for testing the virtual clerk were obtained from real-world clinical environments, specifically a private dialysis center, where routine patient care and administrative documentation are performed. In this study, most inputs consisted of printed laboratory reports containing key biochemical parameters transferred from other clinical laboratories, with the main challenge at the dialysis clinic being the need to manually enter these data into the system, as shown in Figure 6. The evaluation of the virtual clerk system was conducted using a dataset consisting of 65 documents of a single type, specifically printed laboratory reports containing computer-generated printed text only, with no handwritten entries or signatures. Each document included approximately 35 data fields representing key clinical and administrative information. The data entry tasks were performed by ten specialized dialysis nurses (five using the basic OCR system and five using the AI-enhanced OCR system), all of whom routinely handle patient care documentation in a real-world clinical setting.
To realistically simulate operational conditions, scanned images were supplemented by webcam captures commonly used in clinical settings. This approach introduced natural variability in resolution, lighting, and perspective, reflecting how forms are actually digitized in practice. As a result, the input data exhibited heterogeneous quality: some documents were clear and properly aligned, whereas others showed skew, shadowing, or angled views due to handheld captures. Such diversity in data quality was essential for testing the robustness of the image-processing pipeline, which must reliably normalize noisy or distorted inputs before feeding them into the validation and automation layers.

4. Results

4.1. Results of Error Detection Rate of Automation Accuracy

The evaluation was performed on 65 documents, each containing approximately 35 data fields, totaling 2275 fields for error detection analysis. The comparison between the basic OCR system and the AI-enhanced OCR system demonstrated clear improvements in error detection performance across all three error categories, as presented in Table 1 and Figure 7. For Missing Values, the AI-enhanced OCR achieved a precision of 0.990, recall of 0.950, and F1-score of 0.969, all higher than those of the basic OCR system (precision 0.968, recall 0.900, F1-score 0.933). In the Out-of-Range category, the AI-enhanced OCR showed the greatest improvement with near-perfect recall (0.999) and precision (0.995), yielding an F1-score of 0.997, compared to the basic OCR system’s precision of 0.951, recall of 0.967, and F1-score of 0.959. Similarly, for Typo/Free-text errors, the AI-enhanced OCR reached a precision of 0.990, recall of 0.977, and F1-score of 0.983, outperforming the basic OCR’s precision of 0.922, recall of 0.950, and F1-score of 0.936. As shown in the grouped bar chart, the AI-enhanced OCR consistently achieved higher precision, recall, and F1-scores across all error categories, with the most notable improvement observed in Out-of-Range errors. These results indicate that the integration of AI with traditional OCR significantly enhances automation accuracy and reduces manual verification needs, particularly when handling complex or ambiguous data fields.
Figure 8 shows the ROC curves comparing the error-detection performance of the basic OCR system and the AI-enhanced OCR system across three error categories: Missing Values, Out-of-Range, and Typo/Free-text. The curves show that the AI-enhanced OCR consistently outperformed the basic OCR system in all three categories, with the most notable improvement observed in the Out-of-Range category, where its performance approached near-perfect discrimination. This demonstrates the effectiveness of integrating AI capabilities with traditional OCR in improving error detection and correction, leading to more reliable and accurate data processing.

4.2. Results of Efficiency of Time

The evaluation of time efficiency was conducted using a total of 65 documents, each containing approximately 35 data fields, processed under both configurations: the basic OCR system and the AI-enhanced OCR system. As shown in Table 2, the average time per record for the basic OCR system was 85.2 s, whereas the AI-enhanced OCR system reduced this to 42.1 s, resulting in a time saving of 43.1 s per record. In terms of total completion time for all 65 documents, the basic OCR system required 92.3 min, while the AI-enhanced OCR system completed the task in only 45.6 min, representing a reduction of 46.7 min. These results demonstrate that the AI-enhanced OCR system nearly doubled the processing speed, significantly reducing manual effort and improving overall workflow efficiency in the clinical data entry process.

4.3. Results of System Usability and Adoption

The usability of the two systems was evaluated using the SUS, which consists of ten standardized questions rated on a five-point Likert scale. As shown in Table 3 and Figure 9, the AI-enhanced OCR system achieved a higher overall SUS score of 81.0, placing it in the “Excellent” category, whereas the basic OCR system received a score of 75.0, which falls within the “Good” range. Across individual questions, the AI-enhanced OCR system consistently scored slightly higher than the basic OCR system, particularly in areas related to ease of use (Q3, Q9) and user confidence (Q1). However, both systems showed lower scores for Q6 and Q7, indicating that users perceived some inconsistency in the system and recognized the need for improvement in training and learning speed. These results suggest that while both systems are generally acceptable for clinical use, the integration of AI significantly enhances user satisfaction and system adoption by reducing perceived complexity and improving workflow integration.

5. Discussion

5.1. Summary of Key Findings

The evaluation revealed that the AI-enhanced OCR system significantly improved its ability to detect and correct errors compared to the basic OCR system, as evidenced by higher precision, recall, and F1-scores across all three error categories, as shown in Figure 10 and Figure 11. In the Out-of-Range category, the AI-enhanced OCR effectively addressed common decimal placement errors, such as when a laboratory value like “20.2” was misread as “2.02.” This improvement not only enhanced data accuracy but also reduced the risk of misinterpretation in clinical decision-making. For Typo/Free-text errors, the AI-enhanced OCR accurately matched hospital numbers (HN) with patient names, even when names were misspelled or inconsistently recorded, a task at which the basic OCR system often failed. In the case of Missing Values, the system actively flagged data fields that were expected to contain information but were left empty, providing notifications to users to manually verify and input the correct data. These capabilities illustrate how AI integration enhances both detection accuracy and error resolution, enabling the system to handle complex, real-world clinical data challenges that basic OCR approaches alone cannot effectively manage.
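The decimal-placement correction described above (e.g., “2.02” repaired to “20.2”) can be illustrated by a simple heuristic: reposition the decimal point and accept a candidate only if exactly one repositioning falls inside the clinically plausible range. This unique-candidate rule and the range-based acceptance are our illustrative assumptions, not the system's documented algorithm.

```python
def suggest_decimal_correction(raw: str, low: float, high: float):
    """Suggest a decimal-point repositioning for an out-of-range value.

    Returns the corrected string only when exactly one repositioning lands
    inside [low, high]; otherwise None (flag for human review). The
    unique-candidate rule is an illustrative safety assumption.
    """
    try:
        value = float(raw)
    except ValueError:
        return None
    if low <= value <= high:
        return raw  # already plausible, no correction needed
    digits = raw.replace(".", "")
    candidates = []
    for pos in range(1, len(digits)):
        shifted = digits[:pos] + "." + digits[pos:]
        if shifted != raw and low <= float(shifted) <= high:
            candidates.append(shifted)
    return candidates[0] if len(candidates) == 1 else None
```

Requiring a unique in-range candidate keeps the heuristic conservative: ambiguous cases are deferred to the human-in-the-loop step rather than auto-corrected.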
Beyond error detection, the AI-enhanced OCR system demonstrated substantial improvements in workflow efficiency and user satisfaction. The average time per record was reduced by nearly half, and total completion time decreased significantly, showing the potential to accelerate routine clinical documentation tasks. Usability testing using the SUS indicated that users rated the AI-enhanced system as “Excellent”, compared to the “Good” rating for the basic OCR system [34]. Open-ended feedback highlighted that users valued the system’s ability to reduce manual verification and improve accuracy, though some areas, such as consistency and ease of learning, still require enhancement. Together, these findings demonstrate that integrating AI into OCR systems not only increases technical performance but also supports more efficient, user-friendly workflows, aligning with previous research on AI-driven health informatics solutions [31,35].

5.2. Comparison with Previous Studies

The findings of this study are consistent with previous research that highlights the limitations of traditional OCR systems and the benefits of integrating AI for improving accuracy in clinical documentation. Prior studies have shown that conventional rule-based OCR systems are prone to common errors such as decimal misplacements and misinterpretation of numeric values, which can lead to clinically significant inaccuracies [36,37]. Our results demonstrate that by incorporating AI-driven anomaly detection, the system was able to correct errors automatically, particularly in the Out-of-Range category, reducing risks to patient safety and improving data quality. Similar improvements were reported by [38], who emphasized the role of machine learning models in identifying and correcting inconsistent or erroneous health data in electronic health records (EHRs). These outcomes also align with the broader literature on data quality management in healthcare, which identifies completeness, accuracy, and consistency as key dimensions for reliable EHR data [31].
In terms of usability, our findings align with previous studies that have validated the System Usability Scale (SUS) as a reliable measure of user acceptance in clinical systems. The AI-enhanced OCR system substantially reduced the manual workload required for data entry, allowing healthcare staff to spend more time focusing on direct patient care rather than administrative tasks. Ref. [34] established interpretation thresholds for SUS scores, categorizing them into levels such as Good and Excellent, which guided the interpretation of our results. Although the AI-enhanced OCR system in our study achieved an Excellent usability rating, the handling of healthcare data requires a higher level of reliability due to its direct impact on patient safety and clinical decision-making. Therefore, even highly accurate and user-friendly systems must maintain a human-in-the-loop (HITL) process at the final stage to verify critical information before it is entered into electronic health records. This approach has been recommended by several researchers as a safeguard to mitigate residual risks and ensure accountability, particularly when AI systems are deployed in high-stakes healthcare environments [39,40]. Similar to prior studies, our results indicate that while AI automation can significantly reduce manual workload, human oversight remains essential for final verification to maintain data integrity and protect patient safety.

5.3. Practical Implications for Clinical Workflow

The implementation of the AI-enhanced OCR system has substantial implications for optimizing clinical workflows, particularly in high-volume settings such as dialysis centers. By automating routine data entry tasks and intelligently detecting common errors, this system significantly reduces the administrative workload placed on nurses and administrative staff, enabling them to devote more time to direct patient care and clinical decision-making rather than repetitive clerical work. This mirrors recent findings in generative AI research showing that large language models (LLMs) embedded within EHRs can improve documentation quality and reduce editing burden, leading to more efficient note-taking and summarization workflows [41,42]. Moreover, by correcting decimal placement errors and ensuring accurate linkage of hospital numbers with patient names even when names are misspelled, the system enhances the completeness and accuracy of patient records. Such improvements directly address long-standing issues with electronic health record (EHR) data quality, which is critical for safe and reliable clinical decision-making [31]. Early studies on GenAI-driven clinical documentation also emphasize its potential to make records more comprehensive and organized, though privacy, bias, and accuracy remain active concerns that must be continuously managed [32]. From a computational perspective, the integration of the LLM (GPT-4 API, version released on 1 March 2025) introduced an average response latency of approximately 1.6 s per query, primarily during the anomaly-detection and normalization steps. This delay was considered acceptable for near-real-time clinical workflows, as most data-entry tasks occur asynchronously with patient encounters. The average GPU-equivalent cost of each API call was estimated at $0.002 per record, resulting in minimal operational expense for batch processing. 
System-level optimization through prompt truncation, caching of frequent templates, and asynchronous request handling further mitigated latency and ensured that end-to-end throughput remained compatible with daily dialysis-unit workloads.
Beyond immediate workflow efficiency, the system provides a model for integrating generative AI into healthcare in a manner that balances automation with safety. Recent global guidance from the World Health Organization (WHO) highlights that large multimodal models must include governance mechanisms and a human-in-the-loop process to ensure transparency, accountability, and patient safety, particularly in high-stakes environments [43]. Even though the AI-enhanced OCR system in this study demonstrated excellent usability and substantial reductions in manual workload, final verification by human experts remains essential to mitigate residual risks and safeguard against potential errors that may arise from automated processing [39,40,44]. This approach is consistent with current evidence that AI systems should be viewed as augmenting rather than replacing human expertise, supporting clinicians by streamlining documentation while maintaining professional oversight. In the long term, widespread adoption of such systems could transform healthcare operations by reducing administrative costs, improving regulatory compliance, and ultimately allowing clinicians to spend more time on patient-centered care, while adhering to global standards for ethical AI deployment [45,46].
Beyond usability and governance considerations, the quantitative findings further highlight the operational impact of the system. The improvements in precision, recall, and processing time observed in Table 1 and Table 2 directly support the functionality of the agent-based automation layer (Layer 3). Higher OCR accuracy ensures that the virtual clerk can perform automated submission with minimal human correction, reducing propagation of data errors into national reporting platforms. The latency measurements confirm that the LLM-based reasoning process introduces negligible delay, maintaining near real-time responsiveness required in clinical documentation workflows. Together, these outcomes demonstrate that enhanced recognition reliability and operational efficiency are critical enablers of safe and effective automation, validating the design of the virtual clerk as a practical agent system rather than a standalone OCR tool.
Comparatively, previous studies on clinical documentation automation have explored a range of approaches, including conventional rule-based systems, convolutional neural networks (CNNs) for image recognition, and transformer-based models for text normalization [15,41,42]. Rule-based frameworks, while transparent and interpretable, often lack scalability across different form formats and require frequent manual updates when institutional templates change. Deep-learning methods achieve higher recognition accuracy but typically demand large annotated datasets and extensive computational resources, which may not be feasible in smaller healthcare facilities. The agent-based OCR framework proposed in this study seeks to balance these trade-offs by combining deterministic rule validation with adaptive LLM reasoning, enabling flexible data interpretation without retraining for each new layout. Nonetheless, this approach also inherits limitations related to LLM latency, dependency on API availability, and the absence of domain-specific fine-tuning [41,42]. Understanding these comparative strengths and weaknesses helps clarify the methodological landscape and provides a foundation for selecting suitable approaches in future research and system deployment.

5.4. Limitations and Future Work

While the results of this study demonstrate the potential of the AI-enhanced OCR system, several limitations must be acknowledged. The dataset was limited to 65 documents from a single dialysis center and consisted solely of computer-printed text, excluding handwritten notes and mixed-content scanned forms. This narrow dataset restricts the generalizability of the findings, as real-world clinical workflows often involve diverse document formats and varying image quality. Moreover, the study primarily focused on error detection and workflow efficiency without assessing the downstream clinical impact of these improvements, such as whether enhanced data accuracy contributes to better patient outcomes or operational decision-making. Although the AI-enhanced OCR system substantially reduced manual workload, a HITL verification step remained necessary to ensure data integrity and patient safety. The usability evaluation also involved a relatively small group of participants, limiting the representativeness of user experience findings across different clinical roles and institutional contexts. In addition, the experiment compared only two configurations, basic OCR and AI-enhanced OCR, without including other control conditions such as commercial OCR systems, human-only data entry, or hybrid approaches, which could provide a more comprehensive evaluation. Another limitation concerns algorithmic optimization; the current system employs general-purpose pretrained models for image processing and language reasoning without domain-specific fine-tuning, which may constrain its ability to capture the linguistic and contextual nuances of dialysis data. Furthermore, the study did not conduct formal sensitivity analyses of key parameters such as OCR confidence thresholds or preprocessing settings, and uncertainty handling was evaluated only qualitatively. 
Common failure cases (e.g., ambiguous numeric characters or low-contrast regions) were observed during pilot testing but were not quantified systematically, representing another limitation of the present evaluation.
Future work should address these limitations through several strategic directions. First, expanding the dataset to include a larger and more diverse sample from dialysis centers of different sizes, ownership types, and geographic regions will improve external validity and ensure robust performance across varied operational contexts. Second, comparative studies involving multiple control groups such as commercial OCR software or human-only workflows will help benchmark the relative advantages of AI-assisted approaches. Third, future versions of the system should focus on customizing and fine-tuning models using dialysis-specific terminology, parameter ranges, and common error patterns to improve anomaly detection and contextual correction. Fourth, enhancements to the user interface should emphasize quick-operation features such as one-click filling, batch confirmation, and smart auto-completion to further streamline workflow efficiency. Fifth, the design of charts and illustrations can be optimized for greater intuitiveness and clarity, supporting real-time decision-making by clinical staff. Finally, large-scale deployment studies should examine governance mechanisms, explainability features, and privacy safeguards to ensure the ethical and safe integration of AI-enhanced OCR systems into national healthcare infrastructures. By addressing these directions, future iterations of the system could evolve into highly reliable and adaptive tools that improve clinical efficiency, data quality, and patient-centered care.

6. Conclusions

This study introduced and evaluated an AI-enhanced OCR system designed to improve the accuracy and efficiency of clinical data entry in dialysis care settings. By integrating advanced anomaly detection and normalization capabilities, the system successfully addressed common and critical errors, such as decimal placement issues and mismatches between hospital numbers and patient names, while also flagging missing values for manual review. The evaluation demonstrated that the AI-enhanced OCR system significantly reduced error rates and processing times compared to a basic OCR system, while achieving excellent usability ratings among clinical users. These improvements indicate that the system can serve as a valuable tool for enhancing data quality and streamlining administrative workflows, ultimately allowing healthcare professionals to dedicate more time to direct patient care.
Although the system delivered substantial benefits, human oversight remained a crucial component to ensure patient safety and data integrity. The findings support the concept of a human-in-the-loop approach, where automation assists with high-volume, repetitive tasks while final verification remains under professional supervision. Looking ahead, the expansion of this system to include diverse document types, integration with electronic health record platforms, and the incorporation of more advanced AI technologies could further transform clinical documentation practices. By continuing to refine both technical capabilities and governance frameworks, AI-driven OCR systems have the potential to become trusted, scalable solutions that not only reduce administrative burden but also contribute to safer, more efficient, and patient-centered healthcare delivery.

Author Contributions

Conceptualization, S.C. and K.I.; methodology, P.J. and K.I.; software, P.W.; validation, P.W. and K.I.; formal analysis, K.P. and K.I.; investigation, P.W.; resources, K.I.; data curation, K.I.; writing—original draft preparation, P.W. and K.I.; writing—review and editing, K.P.; visualization, P.W.; supervision, K.I.; project administration, K.P.; funding acquisition, K.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by Chiang Mai University and the National Research Council of Thailand (NRCT).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Committee of Research Ethics, Faculty of Public Health, Chiang Mai University (ET031/2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to restrictions. The data are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial Intelligence
APIs	Application Programming Interfaces
EHR	Electronic Health Records
OCR	Optical Character Recognition
RPA	Robotic Process Automation
SUS	System Usability Scale

References

  1. Satirapoj, B.; Tantiyavarong, P.; Thimachai, P.; Chuasuwan, A.; Lumpaopong, A.; Kanjanabuch, T.; Ophascharoensuk, V. Thailand Renal Replacement Therapy Registry 2023: Epidemiological Insights into Dialysis Trends and Challenges. Ther. Apher. Dial. 2025, 29, 721–729. [Google Scholar] [CrossRef]
  2. Spanakis, E.G.; Sfakianakis, S.; Bonomi, S.; Ciccotelli, C.; Magalini, S.; Sakkalis, V. Emerging and Established Trends to Support Secure Health Information Exchange. Front. Digit. Health 2021, 3, 636082. [Google Scholar] [CrossRef]
  3. Garza, M.Y.; Williams, T.; Ounpraseuth, S.; Hu, Z.; Lee, J.; Snowden, J.; Walden, A.C.; Simon, A.E.; Devlin, L.A.; Young, L.W.; et al. Error Rates of Data Processing Methods in Clinical Research: A Systematic Review and Meta-Analysis of Manuscripts Identified through PubMed. Int. J. Med. Inform. 2025, 195, 105749. [Google Scholar] [CrossRef]
  4. Zhou, X.; Zeng, T.; Zhang, Y.; Liao, Y.; Smith, J.; Zhang, L.; Wang, C.; Li, Q.; Wu, D.; Chong, Y.; et al. Automated Data Collection Tool for Real-World Cohort Studies of Chronic Hepatitis B: Leveraging OCR and NLP Technologies for Improved Efficiency. New Microbes New Infect. 2024, 62, 101469. [Google Scholar] [CrossRef]
  5. Budd, J. Burnout Related to Electronic Health Record Use in Primary Care. J. Prim. Care Community Health 2023, 14, 21501319231166921. [Google Scholar] [CrossRef]
  6. Gao, S.; Fang, A.; Huang, Y.; Giunchiglia, V.; Noori, A.; Schwarz, J.R.; Ektefaie, Y.; Kondic, J.; Zitnik, M. Empowering Biomedical Discovery with AI Agents. Cell 2024, 187, 6125–6151. [Google Scholar] [CrossRef]
  7. Rajkomar, A.; Dean, J.; Kohane, I. Machine Learning in Medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [Google Scholar] [CrossRef] [PubMed]
  8. Beam, A.L.; Kohane, I.S. Big Data and Machine Learning in Health Care. JAMA 2018, 319, 1317–1318. [Google Scholar] [CrossRef] [PubMed]
  9. Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J. Biomed. Health Inform. 2018, 22, 1589–1604. [Google Scholar] [CrossRef] [PubMed]
  10. Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56. [Google Scholar] [CrossRef]
  11. Sinsky, C.; Colligan, L.; Li, L.; Prgomet, M.; Reynolds, S.; Goeders, L.; Westbrook, J.; Tutty, M.; Blike, G. Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Ann. Intern. Med. 2016, 165, 753–760. [Google Scholar] [CrossRef]
  12. Patrício, L.; Varela, L.; Silveira, Z. Integration of Artificial Intelligence and Robotic Process Automation: Literature Review and Proposal for a Sustainable Model. Appl. Sci. 2024, 14, 9648. [Google Scholar] [CrossRef]
  13. Wang, X.-F.; He, Z.-H.; Wang, K.; Wang, Y.-F.; Zou, L.; Wu, Z.-Z. A Survey of Text Detection and Recognition Algorithms Based on Deep Learning Technology. Neurocomputing 2023, 556, 126702. [Google Scholar] [CrossRef]
  14. Liu, Z.; Song, R.; Li, K.; Li, Y. From Detection to Understanding: A Systematic Survey of Deep Learning for Scene Text Processing. Appl. Sci. 2025, 15, 9247. [Google Scholar] [CrossRef]
  15. Xu, Y.; Li, M.; Cui, L.; Huang, S.; Wei, F.; Zhou, M. LayoutLM: Pre-Training of Text and Layout for Document Image Understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; ACM: New York, NY, USA, 2020. [Google Scholar]
  16. Chen, X.; Jin, L.; Zhu, Y.; Luo, C.; Wang, T. Text Recognition in the Wild: A Survey. ACM Comput. Surv. 2022, 54, 1–35. [Google Scholar] [CrossRef]
  17. Nitayavardhana, P.; Liu, K.; Fukaguchi, K.; Fujisawa, M.; Koike, I.; Tominaga, A.; Iwamoto, Y.; Goto, T.; Suen, J.Y.; Fraser, J.F.; et al. Streamlining Data Recording through Optical Character Recognition: A Prospective Multi-Center Study in Intensive Care Units. Crit. Care 2025, 29, 117. [Google Scholar] [CrossRef]
  18. van der Aalst, W.M.P.; Bichler, M.; Heinzl, A. Robotic Process Automation. Bus. Inf. Syst. Eng. 2018, 60, 269–272. [Google Scholar] [CrossRef]
  19. Syed, R.; Suriadi, S.; Adams, M.; Bandara, W.; Leemans, S.J.J.; Ouyang, C.; ter Hofstede, A.H.M.; van de Weerd, I.; Wynn, M.T.; Reijers, H.A. Robotic Process Automation: Contemporary Themes and Challenges. Comput. Ind. 2020, 115, 103162. [Google Scholar] [CrossRef]
  20. Jennings, N.R.; Sycara, K.; Wooldridge, M. A Roadmap of Agent Research and Development. Auton. Agent. Multi. Agent. Syst. 1998, 1, 7–38. [Google Scholar] [CrossRef]
  21. Maes, P. Agents That Reduce Work and Information Overload. Commun. ACM 1994, 37, 30–40. [Google Scholar] [CrossRef]
  22. Mandel, J.C.; Kreda, D.A.; Mandl, K.D.; Kohane, I.S.; Ramoni, R.B. SMART on FHIR: A Standards-Based, Interoperable Apps Platform for Electronic Health Records. J. Am. Med. Inform. Assoc. 2016, 23, 899–908. [Google Scholar] [CrossRef]
  23. Kruse, C.S.; Frederick, B.; Jacobson, T.; Monticone, D.K. Cybersecurity in Healthcare: A Systematic Review of Modern Threats and Trends. Technol. Health Care 2017, 25, 1–10. [Google Scholar] [CrossRef]
  24. Kuo, T.-T.; Kim, H.-E.; Ohno-Machado, L. Blockchain Distributed Ledger Technologies for Biomedical and Health Care Applications. J. Am. Med. Inform. Assoc. 2017, 24, 1211–1220. [Google Scholar] [CrossRef]
  25. Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations without Sharing Patient Data. Sci. Rep. 2020, 10, 12598. [Google Scholar] [CrossRef] [PubMed]
  26. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  27. Sauvola, J.; Pietikäinen, M. Adaptive Document Image Binarization. Pattern Recognit. 2000, 33, 225–236. [Google Scholar] [CrossRef]
  28. Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; IEEE: Piscataway, NJ, USA, 2007; Volume 2. [Google Scholar]
  29. Hsu, E.; Malagaris, I.; Kuo, Y.-F.; Sultana, R.; Roberts, K. Deep Learning-Based NLP Data Pipeline for EHR-Scanned Document Information Extraction. JAMIA Open 2022, 5, ooac045. [Google Scholar] [CrossRef]
  30. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv 2020. [Google Scholar] [CrossRef]
  31. Weiskopf, N.G.; Weng, C. Methods and Dimensions of Electronic Health Record Data Quality Assessment: Enabling Reuse for Clinical Research. J. Am. Med. Inform. Assoc. 2013, 20, 144–151. [Google Scholar] [CrossRef] [PubMed]
  32. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  33. Brooke, J. SUS—A Quick and Dirty Usability Scale. Ahrq.gov. Available online: https://digital.ahrq.gov/sites/default/files/docs/survey/systemusabilityscale%2528sus%2529_comp%255B1%255D.pdf (accessed on 26 September 2025).
  34. Bangor, A.; Kortum, P.; Miller, J. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. J. Usability Stud. 2009, 4, 114–123. [Google Scholar]
  35. Lewis, J.R. The System Usability Scale: Past, Present, and Future. Int. J. Hum. Comput. Interact. 2018, 34, 577–590. [Google Scholar] [CrossRef]
  36. Wu, Y.; Dalianis, H.; Velupillai, S. Errors in Clinical Text Processing and Their Impact on Decision-Making: A Review. Artif. Intell. Med. 2020, 104, 101833. [Google Scholar]
  37. Nguyen, P.A.; Shim, J.S.; Ho, T.B.; Li, W. Machine Learning-Based Approaches for Clinical Text Error Detection: A Systematic Review. J. Biomed. Inform. 2022, 127, 104018. [Google Scholar]
  38. Luo, Y.; Thompson, W.K.; Herr, T.M.; Zeng, Z.; Berendsen, M.A.; Jonnalagadda, S.R.; Carson, M.B.; Starren, J. Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review. Drug Saf. 2017, 40, 1075–1089. [Google Scholar] [CrossRef]
  39. Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I.; Precise4Q consortium. Explainability for Artificial Intelligence in Healthcare: A Multidisciplinary Perspective. BMC Med. Inform. Decis. Mak. 2020, 20, 310. [Google Scholar] [CrossRef] [PubMed]
  40. Kelly, C.J.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key Challenges for Delivering Clinical Impact with Artificial Intelligence. BMC Med. 2019, 17, 195. [Google Scholar] [CrossRef]
  41. Small, W.R.; Wang, L.; Horng, S. EHR-Embedded Large Language Models for Hospital-Course Summarization. JAMA Netw. Open 2025, 8, e250112. [Google Scholar] [CrossRef]
  42. Kernberg, A.; Gold, J.A.; Mohan, V. Using ChatGPT-4 to Create Structured Medical Notes from Audio Recordings of Physician-Patient Encounters: Comparative Study. J. Med. Internet Res. 2024, 26, e54419. [Google Scholar] [CrossRef]
  43. World Health Organization. Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models. Who.int. Available online: https://www.who.int/publications/i/item/9789240084759 (accessed on 26 September 2025).
  44. Howell, M.D. Generative Artificial Intelligence, Patient Safety and Healthcare Quality: A Review. BMJ Qual. Saf. 2024, 33, 748–754. [Google Scholar] [CrossRef] [PubMed]
  45. Reddy, S. Generative AI in Healthcare: An Implementation Science Informed Translational Path on Application, Integration and Governance. Implement. Sci. 2024, 19, 27. [Google Scholar] [CrossRef] [PubMed]
  46. Bakken, S. AI in Health: Keeping the Human in the Loop. J. Am. Med. Inform. Assoc. 2023, 30, 1225–1226. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Three-layer architecture of the intelligent virtual clerk showing implemented OCR, validation, and automation modules.
Figure 2. Technical Implementation Architecture of the Virtual Clerk.
Figure 3. Agent cycle design of the virtual clerk.
Figure 4. Overview of the image processing pipeline for input and output.
Figure 5. Experimental design and evaluation workflow of the AI-enhanced OCR system.
Figure 6. Example of a printed laboratory report used as a data source.
Figure 7. Grouped bar chart comparing precision, recall, and F1-score for the two systems.
Figure 8. ROC curve comparing error detection performance of basic OCR and AI-enhanced OCR systems.
Figure 9. SUS acceptability scale indicating overall usability levels of the two systems.
Figure 10. Example of the Virtual Clerk detecting and highlighting Typo/Free-text errors and Missing Values.
Figure 11. Example of the Virtual Clerk detecting and highlighting Out-of-Range errors.
Table 1. Precision, recall, and F1-score of the basic OCR and AI-enhanced OCR systems across three error categories.
Error Category	Detect Method	Precision	Recall (EDR)	F1-Score
Missing Values	OCR only	0.968	0.900	0.933
Missing Values	OCR + AI	0.990	0.950	0.969
Out-of-Range	OCR only	0.951	0.967	0.959
Out-of-Range	OCR + AI	0.995	0.999	0.997
Typo/Free-text	OCR only	0.922	0.950	0.936
Typo/Free-text	OCR + AI	0.990	0.977	0.983
Table 2. Comparison of processing times between the basic OCR and AI-enhanced OCR systems.
Metric	OCR Only	OCR + AI	Average Time Difference
Average Time per Record (sec)	85.2	42.1	43.1
Total Completion Time (min)	92.3	45.6	46.7
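The roughly 50% time saving cited in the abstract follows directly from Table 2; both metrics show the same relative reduction:

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from 'before' to 'after', in percent."""
    return 100 * (before - after) / before

print(round(percent_reduction(85.2, 42.1), 1))  # per record:      50.6
print(round(percent_reduction(92.3, 45.6), 1))  # total completion: 50.6
```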
Table 3. Mean SUS scores for individual questionnaire items comparing the basic OCR and AI-enhanced OCR systems.
No. | Type | Question | N | OCR Only Mean (SD) | OCR + AI Mean (SD)
1 | Positive | I think I would like to use the OCR system/AI-enhanced OCR system regularly for completing healthcare data entry tasks. | 5 | 4.68 (0.47) | 4.74 (0.42)
2 | Negative | I found the OCR system/AI-enhanced OCR system unnecessarily complex. | 5 | 1.92 (0.58) | 1.64 (0.51)
3 | Positive | I thought the OCR system/AI-enhanced OCR system was easy to use. | 5 | 4.28 (0.52) | 4.52 (0.46)
4 | Negative | I think I would need support from a technical expert to effectively use the OCR system/AI-enhanced OCR system. | 5 | 2.08 (0.63) | 1.78 (0.55)
5 | Positive | I found the functions of the OCR system/AI-enhanced OCR system to be well integrated into the existing workflow. | 5 | 4.40 (0.49) | 4.46 (0.48)
6 | Negative | I thought there was too much inconsistency in the OCR system/AI-enhanced OCR system. | 5 | 3.18 (0.81) | 2.86 (0.77)
7 | Positive | I imagine most healthcare staff would learn to use the OCR system/AI-enhanced OCR system very quickly. | 5 | 3.58 (0.69) | 3.92 (0.66)
8 | Negative | I found the OCR system/AI-enhanced OCR system cumbersome to use. | 5 | 1.84 (0.54) | 1.66 (0.53)
9 | Positive | I felt confident using the OCR system/AI-enhanced OCR system. | 5 | 4.32 (0.51) | 4.62 (0.47)
10 | Negative | I needed to learn many things before I could start using the OCR system/AI-enhanced OCR system. | 5 | 2.24 (0.60) | 1.92 (0.56)
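The overall SUS scores reported in the abstract (75.0 "Good" and 81.0 "Excellent") can be reproduced from the per-item means in Table 3 with the standard SUS formula: odd (positive) items contribute (x − 1), even (negative) items contribute (5 − x), and the sum is scaled by 2.5:

```python
def sus_score(item_means):
    """Standard SUS scoring (Brooke): odd items contribute (x - 1),
    even items contribute (5 - x); the total is multiplied by 2.5."""
    total = 0.0
    for i, x in enumerate(item_means, start=1):
        total += (x - 1) if i % 2 == 1 else (5 - x)
    return 2.5 * total

# Per-item means from Table 3, in questionnaire order (items 1-10)
ocr_only = [4.68, 1.92, 4.28, 2.08, 4.40, 3.18, 3.58, 1.84, 4.32, 2.24]
ocr_ai   = [4.74, 1.64, 4.52, 1.78, 4.46, 2.86, 3.92, 1.66, 4.62, 1.92]

print(round(sus_score(ocr_only), 1))  # 75.0 ("Good")
print(round(sus_score(ocr_ai), 1))    # 81.0 ("Excellent")
```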

Share and Cite

MDPI and ACS Style

Worragin, P.; Chernbumroong, S.; Puritat, K.; Julrode, P.; Intawong, K. Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care. Technologies 2025, 13, 530. https://doi.org/10.3390/technologies13110530
