1. Introduction
Electronic health records (EHRs) have become a cornerstone of modern healthcare, providing critical infrastructure for clinical decision-making, research, and health system management. Global adoption has grown rapidly: many high-income countries now have national or regional EHR systems in place (often exceeding 50–70% adoption), whereas adoption in low- and lower-middle-income countries remains substantially lower, often in the range of 10–40%, depending on region [1]. In the United States, nearly 96% of hospitals and 78% of office-based physicians now use certified EHR systems [2]. Similarly, the European Union has committed to creating a European Health Data Space by 2025, aiming to improve cross-border interoperability of EHRs [3].
Despite this widespread implementation, the incompleteness of EHR data remains a critical, unresolved challenge. Empirical studies reveal that many EHR datasets have substantial missingness: in some settings, 30–40% of variables may be missing more than half their expected values, especially for laboratory or social determinants data, and overall missingness rates for key clinical variables often fall in the 15–30% range, depending on context [4,5,6]. Missing or fragmented EHR records have been linked to an increased risk of medication errors, redundant testing, misdiagnoses, and worse health outcomes, especially among marginalized populations. Evidence shows that discontinuities in patient records degrade predictive model fairness for racial and ethnic minorities [7], while missing data and documentation gaps contribute to diagnostic inaccuracies and bias in clinical decision-making [8,9]. As health systems increasingly rely on EHR data for predictive analytics, population health management, and machine learning applications, the risks associated with incompleteness are magnified.
While prior research has described aspects of data incompleteness, most existing work remains fragmented. Earlier contributions have focused on conceptual frameworks [10], technical completeness metrics [11], or specific methodological biases [12]. However, a unifying perspective that situates incompleteness as a systemic process problem spanning patients, providers, technology, and policy has been largely absent. Furthermore, the global dimension of this issue, manifested in differences across health systems, governance structures, and digital divides, has not been adequately integrated into prior reviews. The novelty of this article lies in the following contributions:
It provides an integrative framing of EHR incompleteness across six interdependent domains: definitions and dimensions, types of missingness, research contributions, process gaps, implications, and mitigation strategies with future directions.
It synthesizes empirical findings from multiple countries, moving beyond single-region perspectives.
It highlights both the statistical foundations and the sociotechnical drivers of incompleteness.
It emphasizes the ethical and policy consequences of incomplete records, linking data quality to equity and governance.
In parallel with these academic and operational developments, major strategic and regulatory bodies have also advanced initiatives to improve the completeness, interoperability, and governance of electronic health records. For instance, the European Commission’s European Health Data Space initiative seeks to harmonize standards and enhance the quality and accessibility of health data across member states, promoting cross-border interoperability and secondary data use for research and innovation [3,13,14]. Similarly, the U.S. Food and Drug Administration (FDA) has emphasized real-world data reliability and electronic health record usability as part of its broader digital health modernization efforts, reinforcing the importance of structured and complete data capture for regulatory decision-making. These activities reflect growing international momentum toward policy frameworks that prioritize data completeness as a foundation for safe, equitable, and interoperable digital health systems [13,15].
1.1. Perspective Design and Scope
As this work is presented as a perspective, rather than a systematic review, its objective is to synthesize key conceptual, empirical, and policy-oriented developments related to EHR incompleteness. The selection of literature emphasized peer-reviewed studies published between 2013 and 2025 that addressed data completeness, interoperability, bias, and governance within electronic health record systems. Foundational theoretical frameworks were included to establish conceptual grounding, complemented by recent computational and policy-driven studies that reflect current trends. The perspective was designed to integrate diverse viewpoints—spanning technical, organizational, patient-centered, and policy domains—into a coherent argument highlighting incompleteness as a systemic process issue. This narrative and integrative design was chosen to encourage cross-disciplinary dialogue and to identify pragmatic directions for future research and governance.
The remainder of this article is structured as follows:
Section 2 develops the conceptual foundations of incompleteness, including definitions, mathematical representations, and a taxonomy of missingness.
Section 3 reviews key research contributions, bridging conceptual, statistical, and computational perspectives.
Section 4 synthesizes cross-country evidence, underscoring the global scope of the problem.
Section 5 identifies process gaps across patients, providers, technology, and policy.
Section 6 analyzes the implications of incompleteness for clinical care, operations, research, and ethics.
Section 7 outlines mitigation strategies and future directions, spanning technical, organizational, policy, and patient-centered solutions.
Finally, Section 8 presents conclusions and authorial reflections.
1.2. Definitions and Dimensions
The IEEE Standards Association defines incompleteness in electronic health records (EHRs) as the absence of a required data element that should exist for a patient encounter [16]. Building upon this, Weiskopf et al. [10] proposed four operational dimensions:
Documentation—whether a data element is recorded at all.
Breadth—the extent to which all relevant data types are captured.
Density—the frequency of data entries over time.
Predictive value—the ability of a record to support reliable inference about outcomes.
As shown in Table 1, each dimension of completeness can be formalized mathematically. Documentation and breadth quantify whether essential data types are present, while density emphasizes longitudinal data capture across encounters. Predictive value links record completeness to downstream inferential strength, bridging data quality and clinical utility. Together, these metrics offer a structured foundation for evaluating EHR completeness beyond qualitative assessment [10,11].
To concretize the four completeness dimensions, we computed RSS on a de-identified MIMIC-IV subset of type 2 diabetes encounters (n = 1000). As shown in Table 2, two medical informaticians selected “required” items for diabetes management through expert consensus; inter-rater agreement was high. Component values were normalized and aggregated with equal weights:
$$\mathrm{RSS} = \tfrac{1}{4}\left(C_{\text{doc}} + C_{\text{breadth}} + C_{\text{density}} + C_{\text{pred}}\right)$$
where $C_{\text{doc}}$, $C_{\text{breadth}}$, $C_{\text{density}}$, and $C_{\text{pred}}$ denote the normalized documentation, breadth, density, and predictive value components. To enable cross-domain comparability, components were z-normalized within-dataset prior to aggregation.
Table 3 provides an illustrative calculation for three encounters, showing the individual component scores (documentation, breadth, density, and predictive value) and their aggregated RSS values. The visual distribution of these normalized components across encounters is presented in Figure 1, where the bar profiles highlight variation in data completeness among patient records.
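The aggregation step described above can be sketched in a few lines. This is a minimal illustration, assuming that each encounter already carries one raw score per component; the component keys, the example values, and the function names are hypothetical and are not taken from Tables 1–3.

```python
from statistics import mean, pstdev

# Hypothetical component names; the formal definitions live in Table 1.
COMPONENTS = ("documentation", "breadth", "density", "predictive_value")

def z_normalize(values):
    """Return within-dataset z-scores; a constant column maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def aggregate_equal_weights(encounters):
    """Z-normalize each component across encounters, then average the four
    z-scores of every encounter with equal weights."""
    z_columns = {c: z_normalize([e[c] for e in encounters]) for c in COMPONENTS}
    return [
        mean(z_columns[c][i] for c in COMPONENTS)
        for i in range(len(encounters))
    ]

# Made-up raw component scores for three encounters (illustration only).
encounters = [
    {"documentation": 0.9, "breadth": 0.8, "density": 0.7, "predictive_value": 0.6},
    {"documentation": 0.5, "breadth": 0.6, "density": 0.4, "predictive_value": 0.5},
    {"documentation": 0.2, "breadth": 0.3, "density": 0.1, "predictive_value": 0.2},
]
scores = aggregate_equal_weights(encounters)
for i, s in enumerate(scores, start=1):
    print(f"encounter {i}: aggregated score = {s:+.3f}")
```

Because z-scores are centered within the dataset, the aggregated values are relative: a positive score marks an encounter that is more complete than the cohort average, not one that is complete in an absolute sense.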
1.3. Types of Missingness
Rubin’s taxonomy [17] distinguishes three canonical forms of missingness:
Missing Completely at Random (MCAR):
Missingness is independent of both observed and unobserved data.
$$P(M \mid X, Y) = P(M)$$
Missing at Random (MAR):
Missingness depends only on observed data X.
$$P(M \mid X, Y) = P(M \mid X)$$
Missing Not at Random (MNAR):
Missingness depends on the unobserved data itself, so $P(M \mid X, Y)$ cannot be reduced to $P(M \mid X)$.
Here, M is the missingness indicator ($M = 1$ if missing), X denotes observed data, and Y unobserved values. Identifying the missingness mechanism is crucial for selecting appropriate imputation or modeling strategies in EHR analyses.
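The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the Gaussian variables, the missingness probabilities (0.3, 0.5, 0.1), and the function names are assumptions chosen for clarity, not values drawn from any EHR dataset.

```python
import random

def simulate_missingness(n=10_000, seed=0):
    """Simulate an observed covariate X, a value Y, and three missingness
    indicators for Y, one per mechanism in Rubin's taxonomy."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)        # observed covariate X
        y = x + rng.gauss(0, 1)    # value Y that may go unrecorded
        rows.append({
            "x": x,
            "y": y,
            "mcar": rng.random() < 0.3,                      # constant probability
            "mar": rng.random() < (0.5 if x > 0 else 0.1),   # depends on X only
            "mnar": rng.random() < (0.5 if y > 0 else 0.1),  # depends on Y itself
        })
    return rows

def missing_rate(rows, flag, condition):
    """Share of rows with the given flag set, among rows meeting the condition."""
    subset = [r for r in rows if condition(r)]
    return sum(r[flag] for r in subset) / len(subset)

rows = simulate_missingness()
for flag in ("mcar", "mar", "mnar"):
    high = missing_rate(rows, flag, lambda r: r["x"] > 0)
    low = missing_rate(rows, flag, lambda r: r["x"] <= 0)
    print(f"{flag}: P(M=1 | X>0) = {high:.2f}, P(M=1 | X<=0) = {low:.2f}")
```

Under MCAR the two rates are nearly equal, while under MAR they diverge. Under MNAR they also diverge here, but only because X and Y are correlated in this toy setup; in real data the dependence on Y cannot be checked directly, since Y is precisely what is missing.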
As this work offers a conceptual perspective, validation is discussed rather than empirically performed. We illustrate how simulated data and the Kolmogorov–Smirnov test can support distinguishing random from systematic missingness [18]. Expert interpretation, aligned with prior frameworks [10,12], further informs practical classification. A conceptual illustration summarizing the complementary roles of statistical and expert reasoning in identifying MCAR, MAR, and MNAR patterns is presented in Table 4. This reflective approach emphasizes combining statistical reasoning with expert judgment to guide the robust identification of missingness mechanisms across EHR contexts.
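One way to operationalize this idea is to compare the distribution of an observed covariate between records where a target field is missing and records where it is present. The following sketch uses a two-sample Kolmogorov–Smirnov statistic on simulated data; the simulation parameters and helper names are assumptions, and the critical value uses the standard large-sample approximation at the 0.05 level.

```python
import math
import random
from bisect import bisect_right

def ks_two_sample(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        gap = abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        d = max(d, gap)
    return d

def critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))

def compare_groups(xs, missing_flags):
    """Split the covariate by missingness of the target field and compare."""
    missing = [x for x, m in zip(xs, missing_flags) if m]
    observed = [x for x, m in zip(xs, missing_flags) if not m]
    return ks_two_sample(missing, observed), critical_value(len(missing), len(observed))

rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(5000)]                    # observed covariate
random_flags = [rng.random() < 0.3 for _ in xs]                # random missingness
systematic_flags = [rng.random() < (0.6 if x > 0 else 0.1) for x in xs]

d_random, crit_random = compare_groups(xs, random_flags)
d_systematic, crit_systematic = compare_groups(xs, systematic_flags)
print(f"random:     D = {d_random:.3f} (critical value {crit_random:.3f})")
print(f"systematic: D = {d_systematic:.3f} (critical value {crit_systematic:.3f})")
```

A statistic well above the critical value signals that missingness is related to the observed covariate, which argues against MCAR. The test cannot by itself separate MAR from MNAR, which is where the expert interpretation described above comes in.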
1.4. A Taxonomy for EHR Incompleteness
Incompleteness extends beyond data properties to include causal pathways and downstream effects. Figure 2 presents an integrated taxonomy comprising six domains: Data Dimensions (documentation, breadth, density, predictive value); Types of Missingness (MCAR, MAR, MNAR); Causes (human factors, administrative constraints, technological limitations, patient-generated data challenges); Detection and Quantification Approaches (RSS, Kolmogorov–Smirnov test [18], graph theory [19], pattern recognition [20]); Process Gaps (patient, provider, technology, and administrative levels); and Implications (clinical decision risk, interoperability challenges such as HL7/FHIR, regulatory issues, and bias in machine learning models).
This taxonomy underscores that incompleteness is not merely a technical flaw but a systemic process problem with profound clinical, organizational, and ethical consequences.
2. Key Research Contributions: Complementary Lenses
Research into EHR incompleteness has progressed through several distinct phases. Early work focused on establishing conceptual frameworks, defining the dimensions of completeness, and highlighting the structural gaps in health records. This conceptual grounding provided the language for later operationalization. Building on these foundations, subsequent studies introduced quantifiable measures that allowed completeness to be evaluated more systematically. Among these, the Record Strength Score (RSS) proposed by Nasir et al. [11] offered a simple yet powerful way to quantify data completeness as the ratio of recorded to required elements within a patient’s clinical context:
$$\mathrm{RSS} = \frac{N_{\text{documented}}}{N_{\text{expected}}}$$
where $N_{\text{documented}}$ is the number of required elements documented, and $N_{\text{expected}}$ is the total expected number of elements. This measure was later extended by Nasir et al. [21] to examine disparities across patient subgroups, demonstrating that incompleteness often reflects underlying inequities in healthcare access and utilization.
As the field matured, more rigorous statistical methods were employed to characterize incompleteness patterns. For instance, Gurupur et al. [22] applied the Kolmogorov–Smirnov (K–S) test [18] to detect whether missingness was random or systematic. The K–S statistic is defined as follows:
$$D = \sup_{x} \left| F_n(x) - F(x) \right|$$
where D is the maximum distance between the observed empirical cumulative distribution, $F_n(x)$, and the expected distribution, $F(x)$. By identifying deviations from expected distributions, this method provides insight into whether incompleteness arises from systematic process gaps, rather than random variation.
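For a finite sample, the statistic can be evaluated exactly at the sorted data points, because the empirical distribution only changes there. The sketch below is a generic one-sample implementation; the uniform reference distribution and the three-point sample are assumptions chosen so the result can be checked by hand, and they do not reproduce the analysis in [22].

```python
def ks_statistic(sample, cdf):
    """One-sample K-S statistic: the largest gap between the empirical CDF of
    the sample and a reference CDF, checked just before and at each jump."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def uniform_cdf(x):
    """Reference CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))

# Hypothetical per-encounter completeness ratios (illustration only).
sample = [0.25, 0.5, 0.75]
d = ks_statistic(sample, uniform_cdf)
print(f"D = {d:.2f}")
```

For this sample the largest gap is 0.25, reached just before the first data point and again at the last one. In practice the reference distribution would encode what completeness should look like if documentation gaps were random.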
More recently, advanced computational approaches have reframed incompleteness as a learnable pattern. Ontology-driven validation has been used to detect structural errors in clinical model design [23], and machine learning classifiers have been built to predict which physicians will omit documentation in hospital EHR systems [24]. Real-world studies also continue to uncover missingness patterns across demographic, behavioral, and health history fields [25]. These developments emphasize that incompleteness is not merely a technical issue but also a behavioral and systemic one, requiring methods that integrate statistical rigor with contextual understanding. Table 5 summarizes key research contributions on EHR incompleteness chronologically.
As summarized in Table 6, research on EHR incompleteness has progressed along a chronological trajectory from early conceptual frameworks to advanced computational modeling. Initial work by Weiskopf et al. (2013) [10] established key dimensions of completeness, followed by the introduction of quantitative measures such as the Record Strength Score (RSS) by Nasir and colleagues (2016, 2019) [11,21]. More recent studies have applied graph theory, statistical distribution tests, and machine learning to characterize and predict incompleteness [19,20,22]. Reviews [12,27] and empirical studies [8,28] from 2023 onward reflect both global perspectives and emerging standards, underscoring that incompleteness is a multidimensional issue requiring conceptual, methodological, and policy-driven solutions.
Recent global studies reinforce that incompleteness is not a U.S.-centric phenomenon [
12]. Reviews from Europe [
12], empirical evaluations in Africa [
28], and comparative analyses of EHR versus paper-based records [
27] demonstrate that data gaps persist even in advanced health systems and are compounded in resource-limited settings. Taken together, these contributions reveal a multidimensional research trajectory that naturally motivates a broader synthesis of cross-country evidence, which we present in the next section.
While previous studies have examined EHR incompleteness from isolated viewpoints—such as conceptual frameworks [
10], completeness metrics like the Record Strength Score (RSS) [
11], or methodological bias analyses [
12]—our framework provides a unifying perspective that integrates these approaches within a process-oriented model. Specifically, it bridges technical and behavioral dimensions by situating data incompleteness within patient, provider, technology, and policy workflows. This comparison underscores that, unlike earlier studies that primarily focused on measurement or statistical aspects alone, our framework advances the field by linking incompleteness to systemic process gaps and offering actionable pathways for mitigation through multi-level coordination.
3. Global Evidence of EHR Incompleteness
While much of the early research on EHR incompleteness originated in the United States, recent work shows that the problem is global in scope. Studies across diverse regions demonstrate that missingness arises not only from technical limitations but also from structural, organizational, and socio-political factors.
Table 7 summarizes representative evidence across countries and regions, illustrating that incompleteness is a universal challenge with context-specific manifestations.
As Table 8 demonstrates, incompleteness in EHRs is a universal challenge, but it manifests differently depending on health system maturity and resources. In high-income countries such as the United States and United Kingdom, incompleteness often arises from workflow limitations, underreporting in structured fields, or a lack of interoperability across vendor systems. In contrast, low- and middle-income countries face structural incompleteness due to limited EHR adoption, partial digitization, and infrastructural challenges such as unreliable connectivity. Hybrid contexts like Latin America highlight the role of policy and governance, with regional variation in reporting standards contributing to data gaps.
Concrete case studies reinforce these findings. In Germany, Wurster et al. (2024) [
29] showed that, even after EMR adoption, perioperative and laboratory documentation remained incomplete. In Belgium, Declerck et al. (2024) [
30] demonstrated that routine measurements such as height and weight were often missing or inconsistently recorded across hospitals. In the United States, Huang et al. (2023) [
4] revealed that incomplete longitudinal data continuity can amplify racial–ethnic disparities in predictive modeling. In Kuwait, AlHussainan et al. (2025) [
34] identified critical gaps in safety-related fields across public hospitals. These examples highlight that incompleteness persists even in resource-rich settings and has direct consequences for clinical care, equity, and research validity.
To enable comparability across health systems with differing EHR maturity and data definitions, prior studies highlight the utility of schema alignment through HL7/FHIR mappings and the Observational Medical Outcomes Partnership (OMOP) Common Data Model [
15]. Reported incompleteness rates in the reviewed literature were interpreted as relative indicators after accounting for contextual factors such as data model scope and reporting granularity.
Recognizing its global scope is essential for developing mitigation strategies that are context-sensitive while also guided by universal standards. The next section explores the specific process gaps at the patient, provider, technology, and administrative levels that drive incompleteness within healthcare workflows.
6. Mitigation Strategies and Future Directions
Having outlined the conceptual foundations, process gaps, and broad implications of incompleteness, we now turn to solutions. Addressing incompleteness requires a multi-level approach that integrates technical, organizational, policy, and patient-centered interventions. Each of these domains contributes distinct but complementary mechanisms to reduce missingness, harmonize workflows, and build trust in electronic health records (EHRs). This section synthesizes emerging strategies and future directions across these four domains.
6.1. Technical Solutions
Technical innovations offer the most direct pathway to mitigating incompleteness in EHRs. Real-time data quality assessment systems that flag missing, inconsistent, or implausible entries at the point of documentation can substantially reduce downstream errors. Recent studies demonstrate that automated completeness checks integrated into EHR workflows improve both data capture and clinical usability [67,68,69]. Beyond rule-based validation, machine learning methods have been developed to predict likely missing values and harmonize heterogeneous inputs across systems. For example, natural language processing pipelines now extract structured information from unstructured clinical notes, partially closing gaps in structured fields [70,71].
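A rule-based check of the kind described above can be sketched as a function that runs when an entry is saved and returns a list of flags. Everything in this example is hypothetical: the field names, the required-field list, and the numeric ranges are placeholders for illustration and are not clinical reference values.

```python
# Hypothetical required fields and plausibility ranges (not clinical guidance).
REQUIRED_FIELDS = ("patient_id", "encounter_date", "systolic_bp", "diastolic_bp")
PLAUSIBLE_RANGES = {"systolic_bp": (50, 300), "diastolic_bp": (20, 200)}

def check_entry(entry):
    """Return (issue_type, detail) flags for one documentation entry."""
    issues = []
    # Missing: a required field is absent or empty.
    for field in REQUIRED_FIELDS:
        if entry.get(field) in (None, ""):
            issues.append(("missing", field))
    # Implausible: a numeric value falls outside its configured range.
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = entry.get(field)
        if isinstance(value, (int, float)) and not low <= value <= high:
            issues.append(("implausible", field))
    # Inconsistent: two recorded values contradict each other.
    sbp, dbp = entry.get("systolic_bp"), entry.get("diastolic_bp")
    if isinstance(sbp, (int, float)) and isinstance(dbp, (int, float)) and dbp >= sbp:
        issues.append(("inconsistent", "diastolic_bp >= systolic_bp"))
    return issues

entry = {"patient_id": "p1", "encounter_date": "", "systolic_bp": 80, "diastolic_bp": 400}
flags = check_entry(entry)
for issue_type, detail in flags:
    print(f"{issue_type}: {detail}")
```

Returning flags instead of rejecting the entry keeps the clinician in control, which matters because hard stops at the point of documentation can add to the burden discussed in the organizational solutions.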
Standardization remains equally critical. Despite the wide adoption of HL7/FHIR, the lack of uniform implementation undermines the consistency of captured data. Ongoing initiatives have emphasized maturing FHIR profiles and strengthening mapping between terminologies such as SNOMED CT and LOINC to reduce structural incompleteness during interoperability [15,72]. Finally, technical advances in federated learning (FL) and privacy-preserving record linkage (PPRL) are showing promise in integrating fragmented patient records across institutions without compromising security. For example, Kho et al. (2020) used PPRL in the All of Us Research Program to detect care fragmentation across health provider organizations [73]. Similarly, recent frameworks such as FED-EHR (Wani et al. 2025) and reviews of methodological advances in FL (Zhang et al. 2024) demonstrate that federated models can maintain performance while preserving privacy [74,75]. Altogether, these methods help mitigate longitudinal incompleteness by allowing information linkage and learning across distributed datasets.
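The basic intuition behind privacy-preserving record linkage can be illustrated with keyed hashing: each site turns normalized identifiers into an opaque token using a shared secret, and only tokens are compared. The sketch below is a deliberately simplified exact-match variant and is not the method used in the studies cited above; the identifiers, the key, and the function name are hypothetical, and production systems typically add error-tolerant encodings and formal key management.

```python
import hashlib
import hmac

def linkage_token(first_name, last_name, birth_date, secret_key):
    """Derive an opaque token from normalized identifiers with HMAC-SHA256."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical shared key; in practice it would be managed by a trusted party.
key = b"shared-secret-for-illustration"

# Each site maps tokens to its own local record identifiers.
site_a = {linkage_token(" Jane", "Doe", "1980-01-01", key): "A-001"}
site_b = {
    linkage_token("jane", "DOE ", "1980-01-01", key): "B-417",
    linkage_token("John", "Roe", "1975-06-30", key): "B-902",
}

# Records sharing a token are treated as belonging to the same patient.
linked = {token: (site_a[token], site_b[token]) for token in site_a.keys() & site_b.keys()}
print(list(linked.values()))
```

Only the two records for the same person share a token, so the sites can identify that a patient's history is split across them without exchanging names or birth dates in the clear.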
6.2. Organizational Solutions
Organizational strategies are essential to address the human and workflow dimensions of incompleteness. Evidence consistently shows that high documentation burden and poorly aligned workflows are leading contributors to missing or incomplete EHR data. Interventions such as team-based documentation, medical scribes, and task redistribution have been associated with improved completeness and reduced clinician fatigue [
76]. Workflow redesign, usability improvements, and reduced click/navigation burden likewise appear to improve documentation completeness and reduce burnout [
77,
78].
Training and education represent another critical organizational strategy. Studies reveal that clinicians and ancillary staff often lack sufficient training in structured data entry, resulting in incomplete or inconsistent records [
56,
61]. Structured onboarding, refresher courses, and role-specific training have demonstrated measurable gains in data completeness across hospital units.
Finally, organizational culture plays an important role. Institutions that emphasize quality improvement and data stewardship, supported by leadership commitment and feedback mechanisms, show higher completeness rates compared to those without such frameworks [
79]. Equally important is addressing patient-facing workflows, where initiatives such as improving digital literacy support, patient portals, and structured patient-generated health data entry have shown promise in reducing missingness [
80,
81].
Together, these findings suggest that organizational solutions are indispensable complements to technical interventions, ensuring that completeness is embedded into day-to-day clinical practice rather than treated as an afterthought.
6.3. Policy and Governance Solutions
Policy and governance interventions provide the structural backbone for improving EHR completeness across health systems. Unlike technical or organizational efforts that focus on workflows or tools, governance frameworks establish accountability, incentives, and standards that shape long-term data quality. Recent work emphasizes that national and regional governance structures significantly influence data completeness. For example, a systematic review of digital health governance models demonstrated that countries with mandated reporting standards and centralized oversight (e.g., Denmark, Estonia) achieved higher levels of record completeness compared to systems with fragmented governance [
13]. Similarly, regulatory policies enforcing the mandatory coding of diagnoses and medications were associated with significant improvements in EHR completeness in France and Canada [
82]. Another crucial area involves the alignment of reimbursement incentives with documentation quality. Studies indicate that, when reimbursement policies reward accurate and complete documentation, providers demonstrate higher compliance with structured EHR data entry [
83].
Conversely, poorly designed incentives risk promoting “check-box” behavior without meaningful improvements in data quality. Governance frameworks must also address interoperability and equity. A multicountry 2024 study reported that the absence of coordinated regulatory mandates for HL7/FHIR adoption perpetuates fragmentation and missingness, especially in cross-border care [
14,
15]. At the same time, the failure to incorporate equity-driven governance exacerbates disparities, as underrepresented populations often face higher levels of incomplete data capture [
84].
Together, these findings indicate that policy and governance solutions must extend beyond compliance checklists. Effective governance should embed standardization, accountability, financial alignment, and equity considerations as cornerstones of completeness. Without such frameworks, technical and organizational interventions risk remaining piecemeal, unable to address systemic incompleteness across healthcare ecosystems.
6.4. Patient-Centered Solutions
Patients play a pivotal role in mitigating EHR incompleteness, especially through engagement, transparency, and the contribution of patient-generated health data (PGHD). A growing body of evidence suggests that, when patients are empowered as active participants in their health data journey, record completeness and accuracy improve substantially.
One promising approach is the use of patient portals and mobile health applications that allow patients to review and supplement their records. A multicenter trial demonstrated that providing patients with structured opportunities to correct medication lists and allergy information via portals significantly increased data completeness while reducing discrepancies during clinical encounters [
85,
86]. Similarly, mobile apps integrated with EHRs have been shown to improve the reporting of chronic disease management metrics, such as blood glucose and blood pressure, filling critical gaps in longitudinal monitoring [
87,
88]. Another important dimension is addressing health literacy and digital divide barriers. Studies show that patients with limited digital literacy are less likely to contribute reliable PGHD, perpetuating disparities in data completeness [
89,
90]. Targeted training programs and simplified user interfaces can help mitigate this issue, ensuring more equitable patient participation in data contribution.
Finally, involving patients in data governance and quality oversight has gained traction as a mechanism for accountability. A 2025 study reported that incorporating patient representatives in data quality review committees improved trust and identified gaps that clinicians and administrators often overlooked [
91]. In addition, expanding the integration of PGHD through wearables and remote monitoring devices provides richer datasets, although the challenges of standardization and validation remain [
92,
93]. Taken together, patient-centered solutions highlight that data completeness is not solely a technical or institutional responsibility. Patients themselves, when supported with the right tools, literacy, and governance frameworks, become key contributors to more accurate and equitable EHR systems.
Table 10 consolidates the wide range of mitigation strategies discussed in this section. Technical solutions address structural and interoperability gaps, while organizational interventions reduce documentation burden and embed data quality into workflows. Policy and governance frameworks provide accountability and standardization, and patient-centered initiatives extend completeness by empowering individuals to contribute and verify their own health data. Together, these strategies emphasize that no single intervention is sufficient; sustained improvements require coordinated action across all four domains.
8. Conclusions
This perspective has highlighted that incompleteness in electronic health records (EHRs) is not merely a technical artifact but a systemic process problem, arising at the intersection of patient behaviors, provider workflows, technological limitations, and policy structures. Across sections, we demonstrated how incompleteness compromises clinical decision-making, weakens interoperability, biases research and machine learning models, and raises serious ethical and governance concerns. From an authorial standpoint, our central argument is that EHR incompleteness must be treated as a foundational barrier to the promise of digital health. It is not sufficient to pursue interoperability or advanced analytics without first ensuring that the underlying data are consistently captured, validated, and contextualized. Our synthesis suggests that mitigation requires a multi-level approach: technical advances such as automated quality checks and federated integration; organizational strategies that reduce documentation burden and embed usability; governance models that align incentives and mandate standards; and patient-centered solutions that empower individuals to contribute and verify their health data. While absolute completeness is unattainable, systematic improvement is both feasible and urgent. By embedding completeness into the design of health information systems and aligning it with equity, accountability, and patient safety, we can shift EHRs from fragmented repositories toward reliable, learning health infrastructures. As researchers and practitioners, we see this as an opportunity: addressing incompleteness is not simply a corrective task but a necessary step toward realizing the transformative potential of digital health for safe, equitable, and globally relevant healthcare systems.