Agreement in All-in-One Dataset between Diagnosis and Prescribed Medication for Common Cardiometabolic Diseases in the NDB-K7Ps

The Japanese National Database (NDB), a useful data source for epidemiological studies, contains information on health checkups, disease diagnoses, and medications, which can be used when investigating common cardiometabolic diseases. However, before the initiation of an integrated analysis, we need to combine several pieces of information prepared separately into an all-in-one dataset (AIOD) and validate the dataset for the study. In this study, we aimed to confirm the degree of agreement between diagnoses and both prescribed medications and self-reported pharmacotherapy for common cardiometabolic diseases in a newly assembled AIOD. The present study included 10,183,619 people who underwent health checkups from April 2018 to March 2019. Over 95% of patients prescribed antihypertensive and antidiabetic medications were diagnosed with each disease. For dyslipidemia, over 95% of patients prescribed medications were diagnosed with at least one of the following: dyslipidemia, hypercholesterolemia, or hyperlipidemia. Similarly, over 95% of patients prescribed medications for hyperuricemia were diagnosed with either hyperuricemia or gout. Additionally, over 90% of patients with self-reported medications for hypertension, diabetes, and dyslipidemia were diagnosed with each disease, although the proportions differed among age groups. Our study demonstrated high levels of agreement between diagnoses and both prescribed medications and self-reported pharmacotherapy for common cardiometabolic diseases in our AIOD.


Introduction
In the last few decades, vast amounts of healthcare data have been stored and managed in electronic systems as database technology, computers, and IT systems have advanced, giving rise to so-called healthcare big data [1][2][3].
The Japan National Database of Health Insurance Claims and Specific Health Checkups (NDB), provided by the Japanese Ministry of Health, Labour and Welfare (MHLW), contains data on all disease diagnoses and prescribed medications (administered medicinal substances) for the whole Japanese population, in addition to annual health checkup information [4]. The NDB is an important data source for epidemiological studies that aim to provide novel insights into the incidence of diseases, pathophysiology, and the underlying mechanisms for specific target diseases.
At the present time, the MHLW offers the NDB to investigators as separate datasets in the form of comma-separated value (CSV) files (datasets of health checkups, disease diagnoses, and prescribed medications). Therefore, before the initiation of an integrated analysis, we need to assemble these datasets into an all-in-one dataset (AIOD) that includes disease diagnoses, prescribed medications, clinical features, and biochemical results (see graphical abstract).
Medication is typically prescribed after a diagnosis. Ideally, every patient prescribed a drug for a disease would have a confirmed diagnosis of that disease, giving 100% agreement. Nevertheless, this has not yet been confirmed, particularly for the newly prepared AIOD. Furthermore, the reliability of self-reported medications on questionnaires administered during health checkups is poorly understood.
Therefore, in this study, we aimed to investigate the agreement between diagnoses and both prescribed medications and self-reported medications in the AIOD, an assembled dataset obtained from several separate datasets from the NDB. Health checkups, whose data are stored in the NDB, were launched primarily to address metabolic syndrome [5]. For several decades, hypertension, diabetes, and dyslipidemia have been major common cardiometabolic diseases that lead to fatal atherosclerosis and heart, brain, and vascular damage worldwide [6][7][8][9]; however, their underlying mechanisms and the associations between medications, clinical parameters, and lifestyles have not been fully revealed. Therefore, we investigated diagnoses and prescribed medications primarily for these three diseases in this study. This study did not investigate the appropriateness of prescribing for these common cardiometabolic diseases.

Study Design and Participants
The present study was a composite multidisciplinary study involving the secondary use of annual health checkup data in Japan as a part of the National Database Study in the Kanto 7 Prefectures (Tokyo, Kanagawa, Saitama, Chiba, Ibaraki, Gunma, and Tochigi) (the NDB-K7Ps Study); the data were collected to investigate clinical factors primarily associated with cardiometabolic diseases. Details of the study concept and design have been described elsewhere [10]. After a rigorous review of our research project by the MHLW, our protocol was accepted in December 2020 (No. 1320). We received digitally recorded anonymous data from the MHLW in July 2022.
In Japan, electronic submission of all insurance claims data from medical institutions has been mandatory according to the MHLW since 2011, with nearly complete penetration in 2015 (Figure S1 [11]). Therefore, the present study included nearly all claims data from 10,183,619 non-hospitalized individuals who were living in the above seven prefectures of Kanto and underwent specific health checkups from April 2018 to March 2019, which are mandatory for people aged 40-74 years in Japan [5]. During data processing to link the health checkup data with datasets comprising disease diagnoses and prescribed medications, we used special IDs prepared by the MHLW, which are unique to each person and are expressed as hashed 64-character alphanumeric codes based on sex, birth date, and insurance identification number. Data for diseases coded as "suspected" or "withdrawn" were excluded in this study. Data assembly via common ID and processing were performed using Excel CSV datasets offered by the MHLW and SAS datasheets that imported Excel data. Finally, we obtained an AIOD that included all relevant factors initially located in separate datasets.
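The assembly step described above can be pictured with a minimal sketch. It is not the SAS procedure used in this study: the field names (`id`, `status`, `disease`, `drug`), the ID values, and the toy records are all hypothetical, and the real NDB files are far larger and differently structured. The sketch only illustrates joining three separate datasets on a common ID while dropping diagnoses flagged as "suspected" or "withdrawn".

```python
from collections import defaultdict

# Hypothetical miniature records; real NDB field names and ID values differ.
checkups = [
    {"id": "a1", "age": 52, "sex": "M"},
    {"id": "b2", "age": 67, "sex": "F"},
    {"id": "c3", "age": 45, "sex": "F"},
]
diagnoses = [
    {"id": "a1", "disease": "hypertension", "status": "confirmed"},
    {"id": "b2", "disease": "diabetes mellitus", "status": "suspected"},
    {"id": "b2", "disease": "hyperlipidemia", "status": "confirmed"},
]
prescriptions = [
    {"id": "a1", "drug": "Valsartan Tablets"},
    {"id": "b2", "drug": "Zetia Tablets"},
]

# Diagnosis statuses excluded from the assembled dataset, as stated in the text.
EXCLUDED_STATUSES = {"suspected", "withdrawn"}


def build_aiod(checkups, diagnoses, prescriptions):
    """Attach diagnoses and drugs to each checkup record via the common ID."""
    dx_by_id = defaultdict(set)
    for row in diagnoses:
        if row["status"] not in EXCLUDED_STATUSES:
            dx_by_id[row["id"]].add(row["disease"])

    rx_by_id = defaultdict(set)
    for row in prescriptions:
        rx_by_id[row["id"]].add(row["drug"])

    aiod = []
    for person in checkups:
        pid = person["id"]
        # People without any claim keep empty lists rather than being dropped.
        aiod.append({
            **person,
            "diagnoses": sorted(dx_by_id.get(pid, ())),
            "drugs": sorted(rx_by_id.get(pid, ())),
        })
    return aiod


aiod = build_aiod(checkups, diagnoses, prescriptions)
for record in aiod:
    print(record)
```

Keeping every checkup participant in the output, including those with no diagnosis or prescription, matters because the accuracy measure used later counts individuals with neither a diagnosis nor a medication.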

Disease Diagnosis, Prescribed Medications, and Self-Reported Medication
In this study, we identified each disease using Japan's disease codes, which are used for insurance claims in Japan [12] and correspond to a more detailed disease classification than the International Classification of Diseases, 10th revision (ICD-10) codes [13]. For example, type 2 diabetic nephropathy and renal failure were separately coded as "8830042" and "8845088", respectively, in Japan's disease codes, although they share the same code ("E112") in ICD-10. The disease codes in insurance claims were rigorously checked by medical clerks in each hospital and clinic. We defined patients with hypertension as those who were diagnosed with at least one of hypertension or essential hypertension [14]. Similarly, we defined patients with diabetes as those who were diagnosed with at least one of the 259 codes for diabetes (Table S1).
To investigate the proportion of diagnosed patients among those who were prescribed medications, we selected the following 19 medications: 8 antihypertensive medications (trade names: Perdipine, Valsartan Tablets, RENIVACE Tablets, NU-LOTAN Tablets, Mikelan LA capsules, CALSLOT TABLETS, Selara Tablets, and PREMINENT Tablets), 3 antidiabetic medications (METGLUCO Tablets, TENELIA TABLETS, and Suglat Tablets), 3 medications for dyslipidemia (LIVALO OD TABLETS, BEZATOL SR Tablets, and Zetia Tablets), 2 medications for hyperuricemia and gout (Zyloric Tablets and Feburic Tablets), 2 vitamin K preparations (Glakay capsules and Kaytwo Capsules, Syrup and Injection), and 1 vitamin E preparation (Juvela Tablets, Capsules and Powder). Information about the selected medications, namely, the trade name, nonproprietary name (Japanese Accepted Names for Pharmaceuticals [15]), therapeutic category, and clinical indications, was collected from the package inserts or the prescription medication search system of the Pharmaceuticals and Medical Devices Agency [16].
Patients with self-reported medications for hypertension, diabetes, and dyslipidemia were defined as those who reported taking the following medicines on the questionnaire during health checkups: medications to reduce blood pressure, insulin injections or medications to reduce blood glucose, and medications to reduce cholesterol levels. Missing data on self-reported medications, which were considered missing at random, were excluded from the analysis.
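A small sketch may help make the coding scheme concrete. The two Japanese disease codes and their shared ICD-10 code are taken from the example in the text; the set `DIABETES_CODES` and the helper names are hypothetical stand-ins, since the 259 codes of Table S1 are not reproduced here.

```python
# Mapping taken from the example in the text: two distinct Japanese disease
# codes collapse onto one ICD-10 code.
JP_TO_ICD10 = {"8830042": "E112", "8845088": "E112"}

# Hypothetical placeholder codes standing in for the 259 diabetes codes (Table S1).
DIABETES_CODES = {"DM-0001", "DM-0002", "DM-0003"}


def has_disease(patient_codes, disease_codes):
    """True if the patient holds at least one code from the disease definition."""
    return not disease_codes.isdisjoint(patient_codes)


def icd10_groups(mapping):
    """Group Japanese disease codes by the ICD-10 code they map to."""
    groups = {}
    for jp_code, icd_code in mapping.items():
        groups.setdefault(icd_code, []).append(jp_code)
    return groups


print(has_disease({"DM-0002", "XX-9999"}, DIABETES_CODES))
print(icd10_groups(JP_TO_ICD10))
```

The "at least one code" rule is what makes the definition robust to the way a single clinical condition can be recorded under many detailed codes.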

Statistical Analysis
Accuracy was calculated by summing up all cases of agreement (number of individuals that have a diagnosis and medication, plus the number of individuals that have neither diagnosis nor medication) and dividing by the total number of individuals [17]. An individual's response to the questionnaire depends on several factors. As sex and age are important influencing factors for cardiometabolic diseases [18][19][20][21], the proportions of patients who were diagnosed with hypertension, diabetes, and dyslipidemia among those who reported receiving pharmacotherapy on the questionnaire were compared among the four age groups (40s, 50s, 60s, and 70s) using the χ² test with Bonferroni correction. Statistical analysis was performed using SAS-Enterprise Guide (SAS-EG 7.1) in SAS, version 9.4 (SAS Institute, Cary, NC, USA). Values of p < 0.05 were considered statistically significant. When differences in the proportion of patients between two selected groups among the four age groups were evaluated by the χ² test, values of p < 0.007 were considered to represent statistical significance on the basis of the Bonferroni correction.
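The two calculations described in this section can be sketched as follows. This is an illustration only, not the SAS code used in the study: the function names and all counts are hypothetical, and the χ² statistic is computed as the plain Pearson statistic for a 2 × 2 table without continuity correction, which is an assumption about a detail the text does not specify. The threshold of 0.007 is the one stated above.

```python
import math


def accuracy(both, neither, total):
    """(diagnosis and medication) + (neither), divided by all individuals."""
    return (both + neither) / total


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (1 degree of freedom).

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: diagnosed vs. not diagnosed among people reporting
# pharmacotherapy, in two age groups.
group_40s = (880, 120)
group_70s = (945, 55)

ALPHA_BONFERRONI = 0.007  # pairwise threshold stated in the text

stat, p = chi2_2x2(*group_40s, *group_70s)
print(f"chi2 = {stat:.2f}, p = {p:.2e}, significant = {p < ALPHA_BONFERRONI}")
print(f"accuracy = {accuracy(both=30, neither=50, total=100):.2f}")
```

Because accuracy also counts individuals with neither a diagnosis nor a medication, it can differ markedly from the proportion of diagnosed patients among those prescribed a drug.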

Proportion of Diagnosed Patients among Those Who Were Prescribed Medications
Table 1 shows the proportion of diagnosed patients among those who were prescribed medications. The number of patients who were prescribed at least one of the eight antihypertensive medications was 51,598, while 1,982,782 reported receiving antihypertensive medication, 2,369,621 were diagnosed with hypertension, and 45,288 met all three criteria. Similarly, the number of patients who were prescribed at least one of the three antidiabetic medications was 114,597, while 515,636 reported taking antidiabetic medication, 1,293,487 were diagnosed with diabetes, and 96,900 met all three criteria. The number of patients who were prescribed at least one of the three medications for dyslipidemia was 117,930, while 1,341,702 reported taking medication for dyslipidemia, 2,335,645 were diagnosed with dyslipidemia, and 93,724 met all three criteria.
* Accuracy was calculated by summing up all cases of agreement (number of individuals that have a diagnosis and medication, plus the number of individuals that have neither diagnosis nor medication) and dividing by the total number of individuals.
1 Nonproprietary name according to Japanese Accepted Names for Pharmaceuticals [15].
2 Hypertension included hypertension and essential hypertension.
3 Generic drugs made by 37 pharmaceutical companies.
4 Diabetes included the following 259 diagnoses, except for gestational diabetes: 60 type 1 diabetes mellitus, 61 type 2 diabetes mellitus, 10 viral diabetes mellitus, 10 steroid diabetes mellitus, 10 mitochondrial diabetes mellitus, 10 slowly progressive type 1 diabetes mellitus, 10 hepatic diabetes mellitus, 3 proliferative diabetic retinopathy, 48 diabetes mellitus, 10 secondary diabetes mellitus, 10 drug-induced diabetes, 10 pancreatic diabetes mellitus, 1 insulin-resistant diabetes mellitus, 1 stable diabetes mellitus, 1 brittle diabetes mellitus, 1 malnutrition-related diabetes mellitus, 1 fulminant type 1 diabetes mellitus, 1 juvenile type 2 diabetes, and 1 bronchogenic diabetes mellitus.
5 Vitamin K deficiency included deficiency of coagulation factor owing to vitamin K deficiency.
6 Diabetic retinopathy included the following nine diagnoses: diabetic retinopathy, type 1 diabetic central retinopathy, type 1 diabetic retinopathy, type 2 diabetic central retinopathy, type 2 diabetic retinopathy, proliferative diabetic retinopathy, proliferative diabetic retinopathy/type 1 diabetes, proliferative diabetic retinopathy/type 2 diabetes, and diabetic central retinopathy.
7 The detailed number is not expressed because of the small number of participants (<10), which could affect confidentiality.
ICD-10, International Classification of Diseases, 10th revision.
Over 95% of patients prescribed antihypertensive medications were diagnosed with hypertension. Notably, over 99% of patients prescribed antidiabetic medications were diagnosed with diabetes. In contrast, a low percentage (approximately 28%) of patients prescribed medication for dyslipidemia were diagnosed with dyslipidemia, with 24-65% for hypercholesterolemia (including familial hypercholesterolemia) and 40-71% for hyperlipidemia. However, when the target diagnosis was broadened to at least one of those three diseases (dyslipidemia, hypercholesterolemia, or hyperlipidemia), over 97% of patients prescribed medications for dyslipidemia were diagnosed. Similarly, over 98% of patients prescribed medications for hyperuricemia were diagnosed with either hyperuricemia or gout. The accuracy was 77.2% for hypertension, 88.4% for diabetes, 78.2% for dyslipidemia, and 95.3% for hyperuricemia and gout.
Of the patients taking Glakay capsules (vitamin K2 preparation), 92.1% were diagnosed with osteoporosis, and 1.9% were diagnosed with vitamin K deficiency. Similarly, in patients treated with Juvela (vitamin E preparation), 2.9% were diagnosed with vitamin E deficiency.
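The effect of broadening the target diagnosis can be illustrated with a toy calculation. The records, the function name `proportion_diagnosed`, and the resulting proportions are hypothetical and are not drawn from Table 1; the sketch only shows how the same prescribed group yields a low proportion under a single diagnosis name and a high one under the union of related names.

```python
def proportion_diagnosed(patients, drug_names, target_diagnoses):
    """Share of patients prescribed any of drug_names who hold any target diagnosis."""
    prescribed = [p for p in patients if p["drugs"] & drug_names]
    if not prescribed:
        return None  # undefined when nobody was prescribed the drugs
    hits = sum(1 for p in prescribed if p["diagnoses"] & target_diagnoses)
    return hits / len(prescribed)


# Hypothetical patients; the drug names are among those listed in the Methods.
patients = [
    {"drugs": {"Zetia Tablets"}, "diagnoses": {"dyslipidemia"}},
    {"drugs": {"Zetia Tablets"}, "diagnoses": {"hyperlipidemia"}},
    {"drugs": {"LIVALO OD TABLETS"}, "diagnoses": {"hypercholesterolemia"}},
    {"drugs": {"LIVALO OD TABLETS"}, "diagnoses": {"hyperlipidemia", "hypertension"}},
    {"drugs": set(), "diagnoses": {"dyslipidemia"}},  # not prescribed, so ignored
]

lipid_drugs = {"Zetia Tablets", "LIVALO OD TABLETS"}
narrow = proportion_diagnosed(patients, lipid_drugs, {"dyslipidemia"})
broad = proportion_diagnosed(
    patients, lipid_drugs, {"dyslipidemia", "hypercholesterolemia", "hyperlipidemia"}
)
print(f"single diagnosis name: {narrow:.0%}, any of three names: {broad:.0%}")
```

In this toy example the narrow definition captures one of four prescribed patients and the broad definition captures all four, mirroring the direction of the difference reported above.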

Proportion of Diagnosed Patients among Those Who Reported Receiving Pharmacotherapy on the Questionnaire
Table 2 shows the proportion of patients who reported receiving pharmacotherapy for hypertension, diabetes, and dyslipidemia on the questionnaire during health checkups and who were diagnosed with each disease. Overall, for all three diseases, the proportion of patients who reported receiving pharmacotherapy and who were diagnosed with each disease was over 90%. The accuracy was 93.3% for hypertension, 91.7% for diabetes, and 88.1% for dyslipidemia.
As with patients who were prescribed medications for dyslipidemia (Table 1), among patients who reported receiving pharmacotherapy for dyslipidemia, the proportion diagnosed with dyslipidemia, hypercholesterolemia, or hyperlipidemia individually was low (24-49%), whereas the proportion diagnosed with at least one of them was higher (91.8% in total).
Additionally, the proportion of diagnosed male patients who reported receiving pharmacotherapy (89.5-92.8%) was significantly lower than that of diagnosed female patients (94.1-94.7%) (χ² test with Bonferroni correction for multiple comparisons, p < 0.007). Moreover, the proportion of diagnosed patients in their 40s who reported receiving pharmacotherapy (87.7-89.9%) was significantly lower than that of diagnosed patients in their 70s (94.4-94.9%) (χ² test with Bonferroni correction for multiple comparisons, p < 0.007). In addition, there were more missing data for the questions about self-reported medications among people in their 40s than those in their 70s (Table S2).
* Significant differences between sex or age groups were determined using χ² test with Bonferroni correction for multiple comparisons (p < 0.007).
† Proportions are calculated based on the available numbers for each question.
‡ Accuracy was calculated by summing up all cases of agreement (number of individuals that have a diagnosis and medication, plus the number of individuals that have neither diagnosis nor medication) and dividing by the total number of individuals.
1 Hypertension included hypertension and essential hypertension.
2 Diabetes included the following 259 diagnoses, except for gestational diabetes: 60 type 1 diabetes mellitus, 61 type 2 diabetes mellitus, 10 viral diabetes mellitus, 10 steroid diabetes mellitus, 10 mitochondrial diabetes mellitus, 10 slowly progressive type 1 diabetes mellitus, 10 hepatic diabetes mellitus, 3 proliferative diabetic retinopathy, 48 diabetes mellitus, 10 secondary diabetes mellitus, 10 drug-induced diabetes, 10 pancreatic diabetes mellitus, insulin-resistant diabetes mellitus, stable diabetes mellitus, brittle diabetes mellitus, malnutrition-related diabetes mellitus, fulminant type 1 diabetes mellitus, juvenile type 2 diabetes, and bronchogenic diabetes mellitus.
3 Hypercholesterolemia included familial hypercholesterolemia.
ICD-10, International Classification of Diseases, 10th revision.

Discussion
Although the NDB has covered over 95% of medical and dispensing claims for the Japanese population during the past decade (Figure S1 [11]), with the coverage reaching almost 100% in 2018 [22], little is known about the degree of agreement in data entries in an AIOD, an assembled NDB dataset, which we used in a previous study [23] and will continue to use. Our study demonstrated a high level of agreement between diagnoses and prescribed medications for hypertension, diabetes, dyslipidemia (including hypercholesterolemia and hyperlipidemia), and hyperuricemia (including gout) in our assembled NDB datasheet. This suggests that prescribed medications can reasonably be used as one means of confirming a diagnosis in further studies using the NDB. Although "hyperlipidemia" was renamed "dyslipidemia" in 2007 in Japan [24], the previous term was still used for diagnosis in 2018. The prescription of the vitamin K preparation Glakay capsules was not in good agreement with the diagnosis of vitamin K deficiency, whereas it was in relatively high agreement with the diagnosis of osteoporosis, in accordance with clinical indications for Glakay [16]. Similarly, the vitamin E preparation Juvela is also used to treat arteriosclerosis and diabetic retinopathy. These results suggest that widely used medications, such as vitamin preparations, are not suitable as a means to validate a disease diagnosis, and that diseases without disease-specific medications, such as vitamin deficiency, are difficult to define using information on the prescribed medication.
Meanwhile, the proportion of patients having a diagnosed disease was not 100% among those who were administered medications, although it was over 95%. A time lag between the entry (or revision) of a diagnosis by the attending physician and the actual administration of a drug might contribute to this shortfall of approximately 5% from 100%, but the reasons for the shortfall observed in our AIOD remain unknown.
One strength of the NDB is that it has detailed information available about medications, namely, the dosage, form, method of administration (powder, tablet, capsules, or injection), trade name, regardless of the original or generic product, and the pharmaceutical company that manufactures the drug. This information can reveal the frequency of prescribed medications, which might be useful in pharmacoeconomics as well as pharmacoepidemiology.
In this study, self-reported medications for hypertension, diabetes, and dyslipidemia on the health checkup questionnaire corresponded well with diagnoses, showing the high accuracy of self-reported medications. Although female individuals answered the questions in higher concordance with clinical diagnostic information than males, both proportions were approximately 90% or higher. Interestingly, our results showed that individuals in their 70s answered the questions with the highest concordance with clinical diagnostic information (almost 95%) among the four age groups, whereas those in their 40s answered with the lowest (almost 90%). There were more missing data in the questions about self-reported medications among people in their 40s than those in their 70s (Table S2). Moreover, the proportion of individuals who underwent health guidance by medical staff, which is recommended for individuals at risk of cardiometabolic disease, was higher among people in their 70s than among those in their 40s [25]. Taken together, the results suggest that individuals in their 70s might be more conscious about promoting their own health and might complete the questionnaire more accurately than people in their 40s. Future studies targeting individuals in their 40s to investigate inaccuracies in self-reported medications are warranted.
Misclassification of the disease diagnosis is an important problem in claims databases [26][27][28][29]. Because matching between the NDB and other data sources is unfeasible in Japan, at least at the present time, several studies have been conducted to validate disease diagnoses using claims data from a single or several hospitals in which specialists in each disease confirm the condition by reviewing past medical records in detail. For example, high positive predictive values (almost 90%) were obtained in several studies by confirming both the diagnosis and treatment using disease-specific medications for type 1 diabetes, age-related macular degeneration, and medication-related osteonecrosis of the jaw [13,30,31]. Our study showed high positive predictive values (patients with diagnoses/patients with prescriptions (%): over 95%), which supports the acceptability of our method. Moreover, the diagnosis procedure combination case mix scheme (DPC) claims information, which is included in the NDB, may be useful for defining a disease without disease-specific medication, such as acute myocardial infarction [12], suggesting that further research involving DPC databases is needed.
Of note, ICD-10 codes were not always an appropriate method to identify a patient's disease; for instance, viral, steroid, hepatic, secondary, drug-induced, and pancreatic diabetes were all categorized using the same code in ICD-10 (Table S1). Japan's disease codes provide greater detail regarding the condition of a disease using a 7-digit code in comparison with ICD-10 codes, which is a strength of our study. However, misclassification, where present, is a serious problem in claims databases. Additionally, Japan's disease codes include several unclear disease names, for instance, a diagnosis coded only as "diabetes mellitus" without other information, which does not allow us to investigate the disease in detail.
The NDB, which comprises complete big data, is useful for investigating orphan diseases in which conditions, etiologies, and even epidemiological characteristics such as the prevalence rate are poorly understood because of inadequate numbers of observations and corresponding cases [32]. Future multidisciplinary analysis using the NDB, particularly the AIOD, will be helpful in obtaining a wide range of novel findings, confirming previously inconclusive findings, and providing new perspectives for health promotion and disease prevention. To this end, it is crucial to assemble several NDB datasets and combine them into one AIOD through common IDs.
Several limitations of our study should be mentioned. First, it is impossible to confirm whether the actual diagnoses in the NDB were correct because linking the NDB with other databases, such as medical records at the individual level, is currently unavailable in Japan. Second, when the target disease has various pharmacotherapies and no disease-specific medications and, in particular, when pharmacotherapy is not essential for the target disease, our method is limited in use. Third, there may be misreporting of self-reported medications due to recall bias, social desirability bias, and bias due to the level of health education/awareness in self-reported data. Finally, we had no available information about the duration of administration and the patient's adherence to the medication, which can modify the effects of medications. Therefore, additional studies are needed to address these limitations.

Conclusions
Our results demonstrated a high level of agreement between diagnoses and prescribed medications for hypertension, diabetes, and dyslipidemia, common cardiometabolic diseases, and self-reported pharmacotherapy for these diseases, which suggests that the newly prepared AIOD is a reasonable and useful data source for epidemiological studies that will explore the association between diagnosed disease, prescribed medications, and clinical parameters, especially in terms of cardiometabolic diseases.

Institutional Review Board Statement:
The study was conducted in accordance with the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the Ethics Committee of Japan Women's University (513) and the MHLW of Japan (No. 1320).
Informed Consent Statement: Informed consent was not required because of the use of anonymous data from the MHLW of Japan, as part of its nationwide program involving the provision of medical data to third parties. The study protocol is available online (https://www.jwu.ac.jp/unv/educationresearch/NationalDatabase.html; accessed on 24 September 2023).

Table 1. Proportion of diagnosed patients among those who were prescribed a specific medication.

Table 2. Proportion of diagnosed patients among those who reported receiving pharmacotherapy on the health checkup questionnaire.