Data Quality: A Negotiator between Paper-Based and Digital Records in Pakistan's TB Control Program

Background: Increasingly, healthcare organizations are using technology for the efficient management of data. The aim of this study was to compare the data quality of digital records with the quality of the corresponding paper-based records by using a data quality assessment framework.

Methodology: We conducted a desk review of paper-based and digital records over the study period from April 2016 to July 2016 at six enrolled TB clinics. We input all data fields of the patient treatment (TB01) card into a spreadsheet-based template to undertake a field-to-field comparison of the shared fields between TB01 and digital data.

Findings: A total of 117 TB01 cards were prepared at the six enrolled sites, but only 50% of the records (n = 59; 59 out of 117 TB01 cards) were digitized. There were 1,239 comparable data fields, of which 65% (n = 803) were correctly matched between paper-based and digital records. However, 35% of the data fields (n = 436) had anomalies in either the paper-based or the digital records. There were 1.9 data quality issues per digital patient record, compared with 2.1 issues per paper-based record. The analysis of valid data quality issues showed more data quality issues in paper-based records (n = 123) than in digital records (n = 110).

Conclusion: There were fewer data quality issues in digital records than in the corresponding paper-based records. Greater use of mobile data capture and continued use of the data quality assessment framework can deliver more meaningful information for decision making.


Introduction
With the increased adoption of performance indicators for monitoring healthcare delivery systems, the need for high-quality data has also increased [1]. Health information management systems are intended to provide the right information to their users through feedback and data sharing, and are designed to facilitate data-driven decisions, policy making, and health planning [2].
Improving the quality of healthcare data is beneficial in many ways, such as making informed decisions about service delivery, ensuring patient safety, conducting research, informing patients about their illness and care, and measuring the effectiveness of clinical pathways. Sharing data within and across departments or organizations can provide much-needed evidence about healthcare community needs [3], offer a reliable summary of the true health status of patients and the community, and guide policy makers in adjusting the healthcare system as necessary [4]. Similarly, the cornerstones of the public health function are to identify healthcare needs, to influence policy development, and to ensure that healthcare services are provided equitably [5].
Organizations rely heavily on various data resources for the effective and efficient management of their operational processes. However, the volume and complexity of some data resources can make them susceptible to defects that reduce data quality [6] and raise operational costs [7]. Data quality (DQ) management aims to measure quality objectively, with particular emphasis on various data quality aspects [8,9]; accordingly, many DQ management approaches, drawing on different perspectives, have been developed and adopted by organizations [10].
In medical and public health communities, documentation is a critical aspect of DQ and quality of care. Complete documentation records the history of the clinical pathway and its outcomes, and supports decision-making by healthcare providers. In low-resource settings, documentation is commonly maintained in paper-based format [11,12]. Previous research has shown that paper-based information systems tend to produce low-quality data and result in limited or suboptimal data use [13]. Paper-based information systems also adversely affect the quality of care and quality improvement planning; for example, illegibility, incompleteness, and poor organization of records are problems that often plague the paper format [14].
On the other hand, the benefits of maintaining digital records in healthcare, such as rapid data sharing, reduced paperwork, lower incidence of medical errors, and cost savings, have been widely discussed in the literature [15-17]. Furthermore, when proper digital data security and handling provisions are implemented, digital records can offer stronger patient data confidentiality and privacy protection than any paper-based system [18].
Many organizations have started using technology for efficient data management because of the large quantities of data involved in their operational processes [9]. Among these technologies, mobile health (mHealth) technologies have gained particular attention for digital data capture in the public health domain [19]. However, in the absence of an adequate DQ improvement strategy, it is challenging to translate data into meaningful information and, later, into programmatic and strategic decisions [9]. A data quality assessment framework (DQAF) is a vital component of an effective DQ improvement strategy [20,21].
Despite efforts to improve the data quality of paper-based records, overall data quality remains low, especially in developing countries. In most developing countries, data quality defects arise from the information system's inability to detect and prevent errors. In addition, these countries do not routinely adopt context-specific data quality measurement. Because Mercy Corps Pakistan is digitizing data collection and setting up a computerized management information system for its Tuberculosis Control Program, the objective of this study is to compare data quality in digital records with that in their corresponding paper-based records.

Sample Description
Supported by the Global Fund, Mercy Corps Pakistan undertook an mHealth initiative in the public-private mix (PPM) model of the TB control program in Pakistan. In the PPM model, all registered care providers offer free treatment and diagnostic services to TB patients.
The sample for this study included six clinics (or six healthcare providers) that qualified for inclusion specifically because paper-based and digital recording systems were maintained in parallel at each of them. The initiative was limited to the two intervention districts, Narowal and Chiniot, with three clinics in each.
Mobile Application for Physician-Patient-Lab Efficiency (MAPPLE) was developed on the CommCare platform (https://www.commcarehq.org/), an open-source platform that works well on Java-enabled phones. CommCare is an extension of the JavaROSA codebase (code.javarosa.org), which supports a range of mobile data collection applications in low-income countries. MAPPLE is an mHealth application loaded with TB-related forms that allows users to enter data in the application and share it with a remote cloud server (Figure 1).
University graduates from the United States supported Mercy Corps in developing MAPPLE (the mHealth application) for the TB Control Program in Pakistan. The application design and development phase could not follow a participatory approach because prospective application users and developers were not co-located. However, MAPPLE was tested and redesigned based on feedback before its actual use.
Before the enrollment of healthcare providers, it was agreed that they would be responsible for completing both paper and digital records during the pilot phase. At each clinic, paramedic staff were given this responsibility, and no incentive was offered to application users. In March 2016, each paramedic was given a smartphone with MAPPLE deployed on it.

Data Collection
Paper-based patient treatment cards (TB01 cards) prepared during the four-month study period (April 2016-July 2016) were requested from the six enrolled clinics in the Narowal and Chiniot districts. These clinics are operated by private primary healthcare providers, where a single clinician conducts clinical assessments with the help of support staff, who also manage the medical stock inventory and patient recording registers. Generally, these healthcare providers are not regulated by health authorities, and clinical documentation is not mandatory.
During the study period, support staff collected both digital and handwritten data. Copies of the TB01 cards were compared with the corresponding digital records retrieved from the server. The TB01 card contains data fields representing the patient's profile and clinical and diagnostic details. It captures a multi-visit record of a patient's treatment extending over either six or eight months, depending on the patient's TB treatment category (CAT I or CAT II, respectively). The data fields in each data category are summarized in Table 1.
Table 1. Type and number of data fields on patient treatment card (TB01 card).

Data Quality Assessment Method
Prior to analysis, an approach for a logical and comprehensive review was developed, and a desk review of the collected paper-based records and the corresponding digital records from the same service delivery points was conducted. All data fields of the TB01 card were input into a spreadsheet-based template to undertake field-to-field comparisons of the data fields shared between the TB01 card (paper-based data) and MAPPLE (digital data) (Table 2). After the review, non-matching data fields were sorted into classifiable and non-classifiable issues. Classifiable issues were categorized according to context-specific data quality dimensions, namely completeness, accuracy, consistency, understandability, and timeliness, the details of which are reported elsewhere [22]. Non-classifiable issues were differences for which correctness or completeness could not be determined without contacting the patient. For example, a difference in reported age between the digital and paper-based records can only be resolved if the patient is contacted.
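The field-to-field comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the spreadsheet template used in the study: the field names are hypothetical, each record is assumed to be a dictionary keyed by field name, and values are compared as trimmed, case-insensitive strings.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record type: field name -> recorded value (None if absent).
Record = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> str:
    """Treat None and blank strings as empty; ignore case and surrounding spaces."""
    return "" if value is None else value.strip().lower()


def compare_records(paper: Record, digital: Record,
                    shared_fields: List[str]) -> Tuple[int, List[str]]:
    """Compare the shared fields of one paper record and its digital copy.

    Returns the number of matching fields and the names of non-matching ones.
    Note: a field left blank in both records counts as a match here.
    """
    matched = 0
    mismatched: List[str] = []
    for field in shared_fields:
        if normalize(paper.get(field)) == normalize(digital.get(field)):
            matched += 1
        else:
            mismatched.append(field)
    return matched, mismatched


# Usage with made-up values for four hypothetical shared fields.
shared = ["patient_name", "age", "weight", "treatment_start_date"]
paper_card = {"patient_name": "A. Khan", "age": "34", "weight": "",
              "treatment_start_date": "2016-04-05"}
digital_record = {"patient_name": "a. khan", "age": "42", "weight": "51",
                  "treatment_start_date": "2016-04-05"}
print(compare_records(paper_card, digital_record, shared))  # (2, ['age', 'weight'])
```

Summing the matched counts and mismatch lists over all paired records gives the totals of matched and anomalous fields; each mismatch then still has to be classified.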

Data Analysis
The operational definitions of the identified data quality dimensions were applied to the data variances for classification purposes. Each non-matching field between paper-based and digital records was regarded as a data quality issue. An issue that could be attributed to either the paper-based record or the digital record was called a classifiable issue. Issues that arose from application design modifications were excluded from the main dataset, because they stemmed from technology shortcomings or from an application workflow that was not aligned with the clinical workflow.
The data quality issues in both paper-based and digital records were recorded against each data quality dimension in an Excel spreadsheet. In addition to basic descriptive statistical analyses, a test of proportion was conducted to test the significance of the results.
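The study does not state which test of proportion it used. As one common option, a pooled two-proportion z-test can be computed with the Python standard library alone; the sketch below is illustrative only, and the example counts are invented rather than taken from the study.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test of H0: x1/n1 == x2/n2.

    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # pooled is 0 or 1, so both observed proportions are identical.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented example: 60 of 100 fields with errors versus 40 of 100.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.83, p = 0.0047
```

This relies on a normal approximation; with small monthly counts, an exact test such as Fisher's exact test may be more appropriate.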

Ethical Considerations
In Pakistan, ethical approval is required only for experimental research involving humans, and this study is exempt because it does not qualify as experimental research. However, the study followed all of Mercy Corps' established confidentiality guidelines (https://www.mercycorps.org/researchresources) and was carefully checked by the Monitoring, Evaluation, and Learning Unit and the Data Controller of Mercy Corps Pakistan.

Comparison of the Paper-Based and Digital Records
During the study period (April 2016-July 2016), a total of 117 TB01 cards were prepared at the six enrolled sites: 68 from three sites in Chiniot district and 49 from three sites in Narowal district. Only 50% of the records (n = 59; 59 out of 117 TB01 cards) were digitized by paramedics and sent to the server, which reflects rather low use of the mHealth application (MAPPLE) for data collection. The TB01 card and MAPPLE had 21 data fields in common, giving a total of 1239 (n = 59 × 21) comparable data fields available for analysis (Figure 2).
Of the 1239 data fields, 65% (n = 803) were correctly matched across paper-based and digital records. However, 35% of the data fields (n = 436) had anomalies in either the paper-based or the digital records. Among the data anomalies, 67% (292 out of 436) were classifiable and 33% (144 out of 436) were non-classifiable issues. Non-classifiable issues were differences in data fields that could not be clearly attributed as an issue to either the paper-based record or the corresponding digital record. Discrepancies in comparable data fields, such as differing paper and digital values for the patient's contact number, national identification number, age, weight, and lab serial number, could not be resolved without feedback from the provider or patient, which was not possible in this study because the researchers had no access to the patients in question. These mismatches were therefore categorized as non-classifiable issues. For example, if a patient's age is 34 in the paper-based record and 42 in the digital record, the difference was categorized as a non-classifiable issue.
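The counts reported in this section are internally consistent, which can be checked with simple arithmetic. The short Python snippet below reproduces the percentages from the reported counts; it introduces no new data.

```python
# Counts reported in the text.
tb01_cards, digitized = 117, 59
shared_fields = 21
matched, anomalies = 803, 436
classifiable, non_classifiable = 292, 144

comparable = digitized * shared_fields  # 59 x 21 = 1239 comparable fields
assert matched + anomalies == comparable
assert classifiable + non_classifiable == anomalies

print(f"digitized:        {digitized / tb01_cards:.1%}")        # 50.4%
print(f"matched:          {matched / comparable:.1%}")          # 64.8%
print(f"anomalies:        {anomalies / comparable:.1%}")        # 35.2%
print(f"classifiable:     {classifiable / anomalies:.1%}")      # 67.0%
print(f"non-classifiable: {non_classifiable / anomalies:.1%}")  # 33.0%
```

The unrounded values round to the 65%, 35%, 67%, and 33% reported above.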
Conversely, classifiable issues were differences in data fields that could be attributed to either the paper-based record or the digital record. For example, if the patient's age is given in the digital record but the field was left empty in the corresponding paper-based record, this is considered a completeness issue in the paper-based record.
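The distinction between classifiable and non-classifiable mismatches can be written as a small decision rule. The Python sketch below is a simplified, hypothetical rendering: it automates only the completeness case and the patient-verified fields named in the text (contact number, national identification number, age, weight, lab serial number), using invented field identifiers. Accuracy, consistency, understandability, and timeliness judgements were made by the reviewers and are left as a placeholder.

```python
from typing import Optional

# Hypothetical identifiers for the fields that, per the text, could only be
# resolved by contacting the provider or patient when the two values differ.
PATIENT_VERIFIED_FIELDS = {"contact_number", "national_id", "age",
                           "weight", "lab_serial_number"}


def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def classify_mismatch(field: str, paper: Optional[str],
                      digital: Optional[str]) -> str:
    """Classify one non-matching field between a paper and a digital record."""
    if (is_blank(paper) and is_blank(digital)) or paper == digital:
        raise ValueError("values match; nothing to classify")
    if is_blank(paper):
        # Value present only in the digital record.
        return "paper-based completeness issue"
    if is_blank(digital):
        # Value present only in the paper record.
        return "digital completeness issue"
    if field in PATIENT_VERIFIED_FIELDS:
        # Two different values and no way to check which one is right.
        return "non-classifiable"
    # Other dimensions (accuracy, timeliness, ...) need reviewer judgement.
    return "needs reviewer judgement"


print(classify_mismatch("age", "34", "42"))  # non-classifiable
print(classify_mismatch("age", "", "42"))    # paper-based completeness issue
```

The two usage lines mirror the two examples given in the text.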
In an effort to integrate data collection and care delivery processes, various design modifications of the data entry forms were made within the study period (e.g., making fields 'required', reorganizing questions, and adding new forms or questions to capture missing information) in response to feedback from application users. Among the classifiable issues, a subset (n = 59) was excluded from the analysis because it had been affected by these design modifications (Table 3). Therefore, only the valid issues (n = 110) of the digital records (DRs) were compared with the issues recorded in the paper-based records (PBRs). The distribution of the excluded issues in the digital records that occurred as a result of changes in the application design is shown in Table 4.
Overall, there were 1.9 DQ issues per digital patient record, compared with 2.1 issues per paper-based record. At the beginning of the study, the number of issues per digital and paper-based record was 1.5 and 2.2, respectively; by the end of the study period, these figures had dropped to 0.7 and 1.4 issues per record, respectively. The analysis of valid data quality issues showed more DQ issues in the paper-based records (n = 123) than in the digital records (n = 110). A month-by-month comparison showed a significant difference in entry errors between DRs and PBRs only in April, when errors in the paper-based records significantly exceeded those in the digital records; the other months did not differ significantly. Across months, the digital records showed a significant improvement (p-value = 0.0328), whereas no significant improvement over time was observed in the paper-based records (p-value = 0.0629).

Table 5 lists all 13 data fields in which differences were recorded but could not be resolved because of patient confidentiality constraints (the researchers had no access to patients). Patient's age was the data field with the most differences (n = 47). However, differences in patient's weight (n = 22), among others, were critically important for effective case management, because body weight is clinically significant, is used to monitor the patient's condition, and can affect certain treatment decisions. Issues with the patient identifier code (n = 20) were also of considerable significance.
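The overall per-record rates can be reproduced from the valid issue counts, assuming (as the figures suggest, although the text does not say so explicitly) that both rates use the 59 paired records as the denominator. The Python snippet below also confirms that the valid and excluded issues add up to the 292 classifiable issues.

```python
paired_records = 59           # digitized records with a matching TB01 card
classifiable_issues = 292
excluded_design_change = 59   # issues excluded because of form redesign
valid_digital, valid_paper = 110, 123

assert valid_digital + valid_paper + excluded_design_change == classifiable_issues

digital_rate = valid_digital / paired_records
paper_rate = valid_paper / paired_records
print(f"{digital_rate:.1f} issues per digital record")      # 1.9
print(f"{paper_rate:.1f} issues per paper-based record")    # 2.1
```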

Analysis of Classifiable Issues
All valid classifiable issues (excluding those that occurred because of the aforementioned design modifications) were further categorized according to data quality dimensions, as shown in Figure 3. Completeness issues were the most frequent (n = 148; 63.5%), followed by timeliness (n = 44; 19%), accuracy (n = 30; 13%), understandability (n = 10; 4%), and consistency issues (n = 1; 0.5%). The detailed findings of the data quality assessment exercise are presented below by data quality dimension.
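The dimension shares follow directly from the counts over the 233 valid classifiable issues (110 digital plus 123 paper-based). The Python snippet below computes the unrounded shares from the reported counts; the text reports rounded values.

```python
valid_issues = {
    "completeness": 148,
    "timeliness": 44,
    "accuracy": 30,
    "understandability": 10,
    "consistency": 1,
}
total = sum(valid_issues.values())
assert total == 110 + 123  # valid digital + valid paper-based issues

for dimension, count in valid_issues.items():
    print(f"{dimension:<18} {count:>4} {count / total:7.1%}")
```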

Classifier 1: Completeness
An operational definition of completeness is "information having all required parts of an entity's description" [23].
Data completeness issues were found in both datasets; however, there were more such issues in the paper-based records, which accounted for 58% of all observed data completeness issues (Figure 4). Further analysis showed that patient name (n = 9) and address (n = 14), and treatment supporter name (n = 11) and address (n = 12), were the digital data fields with the most completeness issues. In the paper-based records, the top completeness issues were in patient's address (n = 10), type of referral (n = 11), and laboratory examination date (n = 7). Therefore, most of the data completeness issues encountered were in patient profile data types that allowed free text input and were hence more prone to errors. Fewer completeness issues were observed in clinical and diagnostic data types, except for laboratory examination date.

Classifier 2: Accuracy
Applying an understanding of the field of practice (tuberculosis treatment) and work settings, accuracy can be defined as "the degree to which data correctly describe the 'real world' object or event being described" [24].
Accuracy is one of the key data quality dimensions that helps data users trust that the data are representative. Data-field-level analysis showed that most accuracy issues were found in the digital records (Figure 5). Of the 30 observed accuracy issues, 77% (n = 23) were found in the digital records, mostly in the patient identifier code (n = 12) and national identity card number (n = 6).

Classifier 3: Consistency
By consistency, we mean that the "representation of data values remains the same in multiple data items in multiple locations" [25].
Consistency was the least reported issue type in our set, with only one issue, found in the paper-based records (Figure 6).

Classifier 4: Understandability
Utilizing the "fitness-for-use" perspective, understandability can be defined as "the statement or the term that has clear or specific meaning" [1].
The understandability findings of this data quality assessment exercise can mainly be linked to one of the most common problems associated with paper-based records, namely illegible handwriting. There were 10 understandability issues in our set, most of which were in the paper-based records (n = 8; 80%), as shown in Figure 7.

Classifier 5: Timeliness
Under the MAPPLE mHealth initiative, timeliness means that "shared data should be as near to real-time as possible. Thus, data should be timely, in that it relates to the present" [26].
General informatics principles encourage the integration of application and clinical workflows, and technology use is also expected to ensure timely data recording and reporting. However, in our set, slightly more timeliness issues were observed in digital records than in paper-based records, as shown in Figure 8 (DR = 23; PBR = 21; difference = 2), which indicates weak integration between the application and clinical workflows. Beyond workflow integration, the treatment start date and laboratory examination date are critically important for monitoring the treatment timeline and achieving the desired health outcomes. All of these timeliness issues (n = 44), in either paper-based or digital records, were in the data fields storing the treatment start date and follow-up evaluation dates.
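Read together, the per-dimension counts for digital (DR) and paper-based (PBR) records can be cross-checked against the totals of 110 and 123 valid issues. The completeness split is not stated directly; the Python sketch below derives it by subtraction, assuming every valid issue falls into exactly one of the five dimensions, and checks it against the 148 completeness issues and the 58% paper-based share reported above.

```python
# Per-dimension counts from the subsections; values marked "derived" are
# obtained from the stated totals rather than reported directly.
digital = {"accuracy": 23, "consistency": 0,
           "understandability": 2,   # derived: 10 - 8
           "timeliness": 23}
paper = {"accuracy": 7,              # derived: 30 - 23
         "consistency": 1,
         "understandability": 8,
         "timeliness": 21}

# Completeness is whatever remains of each record type's valid-issue total.
digital["completeness"] = 110 - sum(digital.values())
paper["completeness"] = 123 - sum(paper.values())

completeness_total = digital["completeness"] + paper["completeness"]
assert completeness_total == 148  # matches the reported completeness count

print(digital["completeness"], paper["completeness"])              # 62 86
print(f"{paper['completeness'] / completeness_total:.0%} in PBR")  # 58% in PBR
```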

Discussion
Global evidence identifies high data quality as a necessary condition for delivering quality healthcare [27]. In developing countries, health information systems are needed to tackle growing public health concerns, as current paper-based documentation systems are becoming increasingly inadequate [28]. Therefore, mHealth technology is being implemented in the public health settings of developing countries.
This study examined paper-based records and their corresponding digital records at six TB care sites that started using a mobile data collection application (MAPPLE) in March 2016. Because a theoretical framework is helpful in addressing data variability issues [29], we used a data quality assessment framework to assess data quality. According to the study's findings, digital records yielded better data quality in the first months of their implementation. In contrast, despite years of staff practice in maintaining paper-based patient records, our assessment showed relatively poor data quality in handwritten paper forms.
Moreover, the relatively low use of MAPPLE in data collection (50.4%) may be explained by the added burden on data collection workflows, which may have frustrated the staff involved. Additionally, factors such as unregulated and non-standardized practices in developing countries and non-incentivized data collection in private healthcare settings are possible reasons for low mHealth adoption.
Currently, in the public-private mix model of TB care delivery, there are multiple stakeholders representing different levels of management within an organization and across organizations. This complex management structure demands close collaboration between management units [29]. However, the problem of management complexity can be addressed if different organizations have a similar level of direct control over the data they generate during their normal care and management procedures [30]. All stakeholders would then have an equal opportunity to review data quality, and organizations could produce higher-quality data by strategically adopting a data quality assessment framework.
Data quality issues were found in all three data types: patient profile, clinical, and diagnostic data. Issues in clinical variables are of critical importance [18]. As part of a data quality improvement strategy, there should be a mechanism to flag disparities in clinically important data fields [31]. Errors in clinical practice are sometimes attributed to documentation errors in paper-based records [32], but digital records that are not properly designed and implemented can equally suffer from data inaccuracies leading to medical errors [18]. Furthermore, receiving complete and correct patient information is critically important, and this is achievable if mHealth technology is exploited beyond the basic functions of digital data collection, storage, retrieval, and sharing [31,33].

User Adoption and Acceptance Issues of Digital Data Collection
Although the data quality of digital records improved somewhat over the four-month study period, use of MAPPLE (the mobile application) gradually declined. This might be because of frequent application design modifications and non-incentivized data collection. MAPPLE, used primarily for data collection at the six study sites, is currently used inconsistently, without supportive supervision or an active management role in ensuring regular use of the application. Additionally, no reward mechanism was introduced to encourage use of the application for data collection.
It has been observed that data collection, digitization, and aggregation are increasingly difficult tasks in developing countries [34] because of the lack of incentive programs [35]. Additionally, application design should make all required functions available on the user's device in a highly usable and intuitive fashion [36]. Applications should be designed with full user involvement from the early design stages and throughout the application's lifecycle, including its regular maintenance and updates. Applications should integrate seamlessly with existing clinical workflows, improving rather than overburdening them, and take into consideration the already high work and cognitive loads of most healthcare professionals today [37]. Free text input should be kept to a minimum in digital forms (also to avoid errors); instead, clear and comprehensive choices should be offered for users to select from. Integrity and validation checks should be built into digital forms. Other strategies for minimizing user input, reducing errors, and improving acceptance include cross-linking relevant databases to 'autocomplete' certain fields where applicable, based on values entered in other fields.
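The recommendations on required fields, constrained choices, and validation checks can be illustrated with a small entry-time validator. This Python sketch is hypothetical: the field names, choice list, and plausibility ranges are invented for illustration, and it is neither the MAPPLE form definition nor CommCare's own validation syntax; in practice, such rules would be configured in the data collection platform itself.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical form rules, for illustration only.
REQUIRED = ["patient_name", "age", "weight_kg", "treatment_start_date"]
CHOICES = {"referral_type": {"self", "private provider", "public facility", "other"}}
RANGES = {"age": (0, 120), "weight_kg": (1, 250)}


def validate(form: Dict[str, Any], today: date) -> List[str]:
    """Return human-readable validation errors (an empty list means valid)."""
    errors: List[str] = []
    # Required fields: flag missing or blank values before submission.
    for field in REQUIRED:
        if form.get(field) in (None, ""):
            errors.append(f"{field} is required")
    # Constrained choices instead of free text.
    for field, allowed in CHOICES.items():
        value = form.get(field)
        if value is not None and value not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    # Plausibility ranges for numeric fields.
    for field, (low, high) in RANGES.items():
        value = form.get(field)
        if isinstance(value, (int, float)) and not low <= value <= high:
            errors.append(f"{field} must be between {low} and {high}")
    # Date check: treatment cannot start after the day of entry.
    start = form.get("treatment_start_date")
    if isinstance(start, date) and start > today:
        errors.append("treatment_start_date cannot be in the future")
    return errors


entry = {"age": 150, "weight_kg": 51, "referral_type": "walk-in",
         "treatment_start_date": date(2016, 8, 1)}
for message in validate(entry, today=date(2016, 7, 31)):
    print(message)
```

Returning all errors at once, rather than stopping at the first, lets the user correct a form in a single pass.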
Improving data quality is task-dependent and includes aligning data collection processes, operationalizing a quality improvement strategy, and building the capacity of those responsible for data entry and review. Moreover, for an application like MAPPLE, a wide range of organizational and system-specific factors may affect the adoption of healthcare information technology [38].

Novel Contribution, Replicability and Generalizability of the Work beyond the Six Study Locales
The novel contribution of this study is our model of using an assessment framework that is inclusive of the management perspective and more relevant to local work settings and the field of practice. We believe that a meaningful assessment would not have been possible had we used existing frameworks (generic or developed for other contexts), as only local data users can conceptualize and contextualize data quality [21,29]. It has long been recognized that the definition of data quality varies between users, locales, and contexts, which makes the concept multi-dimensional and complicated [39,40]. A similar approach has also been used elsewhere [1,41].
Although the study included only six participating primary healthcare clinics, the characteristics of clinics and their clinical and data management practices remain nearly the same across the country [42]. The private healthcare system remains largely unregulated because of a lack of interest on the part of public health authorities [43]. This gives non-governmental organizations (NGOs) an opportunity to bridge the gap between service need and service provision [43]. As the public-private mix model operates in 65 districts of Pakistan, our approach can be replicated in other districts of the country (and in other countries with similar settings) when digitization and data quality improvement plans are rolled out there.

Follow-Up on Current Work
The current work was completed as part of the mHealth initiative within the Tuberculosis Control Program of Mercy Corps Pakistan. A context-specific data quality assessment framework was developed [22] to report the field-to-field review and comparison of digital records with the corresponding paper-based records. Mercy Corps also conducted operational research to examine the external and organizational factors that have affected the adoption level of the mHealth application. As discussed in this study, unregulated private healthcare practice is the biggest challenge, so data collection is not given importance in routine clinical practice. Because of the work burden of healthcare providers, stakeholders from outside clinical practice have been identified and involved in the mHealth initiative. Additionally, the results of the current work are being used iteratively to refine the mHealth initiative (MAPPLE), and an expansion plan has already been developed. The mHealth initiative does not emphasize data collection alone; it also includes application design elements that will fulfil the information needs of users in their routine work.

Research Implications
This study reviewed patient records in paper and digital formats and concluded that, in the studied set of records, digital data were of moderately better quality than data from the corresponding paper-based records. For significant and sustained improvement in data quality, the study emphasizes improved technology adoption supported by an incentive program. The study also identifies the need for iterative revisions to achieve a successful transition from paper-based to digital records. In addition to engaging users in the design and development phases, sufficient time should be allowed for application development and for iterations driven by detailed user feedback [44].

• The scope of the study included a comprehensive review and comparison of paper-based and digital data to identify quality issues and to categorize them as classifiable or non-classifiable.

• The strength of the present work is its usefulness in building a case for implementing agencies to expand their digital health initiatives, particularly for data collection.

• Because of patient information confidentiality concerns and provisions (the researchers had no access to or contact with the patients), the researchers were unable to attribute non-classifiable issues (data that would have required contacting the patient for verification), which can be considered a limitation of the current study. Nonetheless, we demonstrated the need for an adequate data quality improvement strategy so that the reliability and integrity of healthcare data can be fully achieved.

• With the limited human and other resources in the enrolled clinics, running two systems (paper-based and digital) in parallel during the study period might have frustrated clinic staff. The added burden on the data collection workflows of the involved staff might also explain the relatively low (50.4%) overall use of MAPPLE in data collection. With sufficient incentives in place and a complete switch to a digital format (following any necessary refinement and optimization of MAPPLE), digital data collection rates could improve greatly in the future.

Conclusions
The overall quality of digital records is moderately better than that of paper-based records. Therefore, in addition to a data quality improvement strategy, data quality assessment should be introduced as routine practice. Likewise, given the inherent ability of technology to improve data quality, design modifications and workflow optimization and integration should be considered essential for the adoption of mHealth technology. Efforts to improve adoption levels should concentrate on system-level initiatives, such as regulating private practice, incentivizing data collection, and making data collection an essential part of private clinical practice. Consequently, a strengthened information management system would help organizations build trust in data and make evidence-based, informed decisions about health policy and practice.

Figure 2. Overview of data quality assessment result.

Figure 4. Trends in data completeness issues. DR, digital records; PBR, paper-based records.

Figure 5. Trends in data accuracy issues.

Figure 6. Trends in data consistency issues.

Figure 8. Trends in data timeliness issues.

Table 2. Comparable data fields of the patient treatment card (TB01).

Table 3. Quantification and type of issues that occurred as a result of design change.

Table 4. Monthly chart of classifiable issues in paper-based records (PBRs) and digital records (DRs).

Table 5. Data field-wise distribution of the non-classifiable issues.