Diagnostic Policies Optimization for Chronic Diseases Based on POMDP Model

During the process of disease diagnosis, overdiagnosis can lead to potential health loss and unnecessary anxiety for patients as well as increased medical costs, while underdiagnosis can result in patients not being treated on time. To deal with these problems, we construct a partially observable Markov decision process (POMDP) model of chronic diseases to study optimal diagnostic policies, which takes into account individual characteristics of patients. The objective of our model is to maximize a patient’s total expected quality-adjusted life years (QALYs). We also derive some structural properties, including the existence of the diagnostic threshold and the optimal diagnosis age for chronic diseases. The resulting optimization is applied to the management of coronary heart disease (CHD). Based on clinical data, we validate our model, demonstrate how the quantitative tool can provide actionable insights for physicians and decision makers in health-related fields, and compare optimal policies with actual clinical decisions. The results indicate that the diagnostic threshold first decreases and then increases as the patient’s age increases, which contradicts the intuitive non-decreasing thresholds. Moreover, diagnostic thresholds were higher for women than for men, especially at younger ages.


Introduction
Chronic diseases remain one of the major public health problems in the world, imposing a high cost burden on healthcare systems and reducing the quality of life of patients [1][2][3]. About half of all adults have one or more chronic diseases, which are responsible for 70% of deaths in the United States and account for more than 85% of total healthcare expenses [4]. In China, chronic diseases contribute 86.6% to total deaths, and the burden accounted for 70% of the total medical cost in 2017 [5]. Given that the population is increasing and aging, especially due to the impact of COVID-19, people are more concerned about their health. Therefore, how to take effective intervention strategies early enough to be effectively diagnosed is of paramount importance [6].
There are many advanced medical examinations for screening and diagnosing chronic diseases, including non-invasive and invasive tests, which makes it possible to make personalized diagnostic policies based on individual characteristics to detect chronic diseases early [7]. Non-invasive examinations have important value for early screening of chronic diseases. When a patient undergoes a non-invasive test for basic screening, a physician will decide whether he needs further invasive tests based on the screening results, such as coronary angiography (CAG), prostate biopsy, or breast biopsy. Such policies for invasive tests made following basic screening are called diagnostic policies [8].
The latest screening program of the U.S. Preventive Services Task Force emphasizes that the purpose of screening should be to maximize the life of the patient while minimizing the damage to the body [9]. When diagnosing chronic diseases for patients, a trade-off is required between the long-term benefits of early detection on the one hand and the shortterm side effects of invasive tests and the long-term side effects of subsequent treatment on the other hand. Meanwhile, overdiagnosis (unnecessary examinations) cause patients anxiety and increases personal and national medical costs, whereas underdiagnosis (lack of tests) means patients may not be treated in time [10][11][12][13]. Thus, the underdiagnosisoverdiagnosis trade-off must be confronted for chronic diseases, which serves as our motivation.
Therefore, we address the following research questions about chronic disease diagnosis based on differences in specificity, sensitivity, and associated side effects of the above different examinations: 1.
Whether and when should a physician advise an individual patient to take an invasive test based on basic screening results via non-invasive tests? 2.
How should physicians and decision makers in health-related fields make diagnostic policies for populations with various individual characteristics?
Our motivating example is CAG services for coronary heart disease (CHD) diagnosis. The risk of CHD is predominant among all diseases, and about one in three adults has CHD [14]. Furthermore, patients with CHD are at greater risk of death with COVID-19 than patients with other chronic diseases [15]. Common diagnostic methods include electrocardiogram (ECG), computed tomography (CT), and CAG. ECG and CT are convenient and non-invasive and are often used as a basic screening tool to detect CHD at its early stages, yet they are prone to misdiagnosis and missed diagnosis due to low specificity and sensitivity; CAG is invasive and expensive as well and entails related side effects.
The results of CAG in some patients, however, are normal in the clinic. Currently, some studies show that the proportion of normal examination results is about 16.9-31.1% in China [16]. Such overdiagnosis is not unique to China: For instance, data reports from 663 hospitals in the American College of Cardiology National Cardiovascular Data Registry showed that 62.4% of patients had no obstructive coronary artery disease in 2010 [17,18]. Hatzidakis et al. [19] indicated that no lesions or nonsignificant lesions were found (47.4% of patients). This implies that unnecessary tests can result in excessive diagnosis and additional medical costs. Furthermore, individual characteristics have a significant impact on the diagnostic rate of CAG [20][21][22][23].
To do so, we constructed a partially observable Markov decision process (POMDP) model for quantitative trade-offs to optimize diagnostic policies for patients with suspected chronic diseases, which can help governments to optimize the allocation of medical resources. Our objective is to maximize a patient's total expected quality-adjusted life years (QALYs). Moreover, combined with clinical data, we evaluated the performance of optimal policies on CAG diagnosis and compared them with physicians' decisions in clinical practice.
Different from the existing studies, we studied optimal diagnostic policies and analyzed the impact of individual characteristics of patients on them. To the best of our knowledge, this is the first study from two different perspectives (i.e., individual perspectives and population perspectives) that determines optimal policies for clinical diagnosis based on the POMDP model. The updating of belief state makes our model more consistent with the imperfect nature of basic screening and partially observable health conditions in chronic diseases, which is the difference between our POMDP model and other Markov models in the related literature.
The remainder of our study is organized as follows. In Section 2, we review the relevant literature about chronic diseases management and POMDP in medical decisionmaking, especially about CHD. We characterize the problem of the diagnostic decision in Section 3. In Section 4, we propose the POMDP model and describe the optimal equation as well as some properties. In Section 5, a case study of CHD is presented to illustrate the applicability of the proposed model. The discussion and conclusion are revealed in Sections 6 and 7, respectively.

Literature Review
This paper is related to chronic diseases management and POMDP applications in medical decision-making. We review the related literature as follows, and then point out the differences between our study and existing literature.

Chronic Diseases Management
Research on chronic disease policies has been primarily conducted in terms of three aspects: screening, treatment, and diagnosis. Many studies mainly focus on cost-benefit analysis of screening policies and treatment policies about chronic diseases, such as hypertension [2], breast cancer [8,24], liver cancer [25], colon cancer [26], and glaucoma [3] based on Markov models.
Disease diagnosis plays a critical role in designating screening and treatment policies [27]. In making diagnosis decisions, physicians must consider the trade-off between detecting diseases early and sparing healthy people unnecessary procedures [28]. There are many studies on diagnostic policies of prostate cancer [29] and breast cancer [8,30], which demonstrate that the threshold to biopsy increases with age, implying less potential benefit of a biopsy in older ages.
Regarding decision-making on CHD, while there have been many studies that focus on the optimal screening policies [31,32] and treatment policies [33][34][35][36], there have been relatively few studies that focus on improving CHD diagnosis based on quantitative trade-offs. On the other hand, it is worth noting that the heterogeneity of patients is of paramount importance in making decisions [2,29,37]. Due to the heterogeneity in the patient population with CHD, the impact of treatment may vary among subgroups of the overall population in terms of resource use, health benefits, and cost-effectiveness [38].

POMDP Applications in Healthcare
In healthcare, Markov models are extensively used for decision-making [39]. The use of POMDP in the health context is rather recent, which has attracted much attention within operations management.
As we all know, the POMDP model is a generalization of the Markov decision process (MDP), which is the most popular stochastic sequential decision process in reinforcement learning. Because patients' condition evolves with time, the effects of diagnosis can be accessed via POMDP, and it is not assumed that the state of the system is fully observable, which is consistent with reality and makes it suitable to study chronic diseases.
POMDP has been used to study optimal disease decisions by considering different factors in the literature, such as individual risk factors [37], heterogeneity of adherence rates [40,41], and limited mammography resources [42]. In addition, some studies determine the best time for drug susceptibility testing [43] and analyze the telemedicine triage system [44].
Among the aforementioned literature, Zhang et al. [29] is the study most related to our study. They assumed that the biopsy rate was fixed at different ages. Notwithstanding the relevance, our paper differs from it in three aspects. First, we not only consider the impact of age and gender on the prevalence and deterioration rate, but also analyze dynamic changes under various individual characteristics in the model. Second, we specifically prove some unique structural properties including the optimal diagnosis time for a certain patient with CHD and the belief threshold of diagnostic policies for different populations, while they focused on population-based policies. Third, due to the specificity of CHD and its examinations, our model differs from other literature. For instance, patients with suspected CHD will have health loss, and the transfer of health status is different from other diseases. In our study, the threshold first decreases and then increases as the patient's age increases, which is different from the non-decreasing thresholds in other studies. Finally, combined with clinical data, we evaluate the performance of optimal policies on CAG diagnosis and compare them with physicians' decisions in clinical practice.

Problem Definition
In our model, a patient is in one of five health states including health (H), pre-clinical state (P), clinical state (C), deterioration (M), and death (D), as shown in Figure 1. The state P indicates that the disease is present but not detected, while the state C indicates that the disease is detected through an invasive test. In addition, the state M indicates that the patient's condition has deteriorated. The states in our model are associated with disease incidence and progression as well as patient mortality. If the patient is diagnosed with diseases, his health status will be transferred to state C. In this state, he can be treated with drugs and possibly surgery. For example, if patients are diagnosed with CHD, many of them may be treated with drugs, while some of these patients with severe conditions may undergo PCI or CABG. This means they will exit the diagnosis system and start treatment, and hence we assume a patient in a clinical state (C) will not transition to a deteriorating state (M).
Patients in states H and P can be screened by non-invasive tests such as ECG and CT. Due to the limitation and the inaccuracy of these tests, they cannot confirm whether the patient has diseases, which implies that the patient's condition is partially observable. Because of this feature, we use a probability distribution to represent the belief on true states and update the belief states in a Bayesian manner based on test results.
A diagnostic process is illustrated in Figure 2. When a patient undergoes non-invasive tests, a physician assesses screening results and makes a decision from the two alternatives: wait or administer invasive tests. That is, the physician decides whether to administer an invasive test for the patient based on the results of the basic screening and the value of updated risk. If the decision is to wait, the patient will continue to perform non-invasive tests and make a decision for the same problem in the next year; if the decision is diagnosis via an invasive test, the patient will have two results after the examination: (1) health, the patient will stay in the current state and perform non-invasive tests at next decision epoch; (2) diagnosed with diseases, the patient's health status will be transferred to state C, which implies that he will exit the diagnosis system and start treatment. Patients diagnosed with diseases will be treated, implying that there are long-term side effects of subsequent treatment.
In our model, QALYs is chosen as the criteria since it is extensively used in evaluating the benefits of patients in the aforementioned literature. By capturing the decrements in the quality of life due to side effects of related events such as tests and treatment, it can consider both quality and quantity of life under different health conditions. Specifically, QALYs is normalized to 1 with perfect health for a patient in a full year, and 0 for death [45]. We describe various assumptions made throughout our model. Due to the high accuracy of invasive tests, patients will leave this diagnostic system immediately once they take an invasive test, which is consistent with previous medical decision-making studies [29,37,[40][41][42], and hence we assume a patient in state C will not transition to state M. On the other hand, once a patient's health status has deteriorated, we assume that he will exit the diagnosis system because our model focuses on preventive diagnosis rather than treatment, and therefore we assume a patient in state M will not transition to state C. In addition, because it is not reported in the relevant literature that non-invasive tests will reduce QALYs of patients, we assume that there is no QALYs decrement associated with basic screening. In recent years, many patients have participated in physical examinations every year in hopes of detecting diseases in China. In light of this case, we assume that patients receive basic screening annually. Lastly, we assume that patients fully comply with physicians' diagnosis recommendations, since physicians are often altruistic and act in the best interest of patients.
Therefore, on the one hand, from the patient's perspective the optimal diagnosis age for a certain patient should be determined based on his basic screening results, which means that this patient should be recommended to diagnose via invasive tests in this time, namely optimal diagnosis time. On the other hand, from the perspective of physicians and the government as well as decision makers, we analyze optimal diagnostic policies, denoted by the belief threshold between administering invasive tests and waiting, for populations with various individual characteristics. Our objective is to maximize patient health benefits, thereby efficiently balancing underdiagnosis and overdiagnosis.

Mathematical Model
As mentioned earlier, we developed a POMDP model to maximize a patient's total expected QALYs for optimizing policies in chronic diseases diagnosis, which can characterize the diagnosis progress. The model combines the health states of patients with decisions and updates their belief states (i.e., the probability of being in various disease stages) at each decision epoch according to basic screening results. Next, we detail our POMDP model.
The decision epochs in our model are denoted by t ∈ T = {40, 41, 42, . . . , 100}. That is, the starting age of screening is 40 years old, and the ending age is 100 years old. Assume that each screening is conducted at the beginning of each year.
At period t, health states of patients are denoted by s t ∈ S = {H, P, C, M, D}. H and P are partially observable, whereas the other states are completely observable. Patients have a basic screening via non-invasive tests once a year, so the state transition interval is one year.
The action is taken by a patient aged t, denoted by a t ∈ A = {G, W}. The action space A consists of two decisions: diagnosis via an invasive test (G) or wait (W) until the next decision epoch. Specifically, if the health state of the patient s t ∈ {H, P} at the start of year is t, he will undergo basic screening, and then the physician advises whether he will be subjected to invasive testing (a t = G) or not (a t = W) based on basic screening results in that year.

Observation States and Observation Probabilities
The health states of patients can be obtained through observable health statuses and basic screening results. Due to the imperfect nature of basic screening, states H and P are partially observable. However, some observation information can be obtained that can be used to make further decisions. The observation space is defined as O = {PH, PP, C, M, D}. We use PH to denote an observed health state by results, and PP to denote an observed CHD present but not diagnosed state according to results. All other states are fully observable.
We let l t denote the observation when the patient is in a true state s t at time t, and use q t (l t |s t ) to denote the probability of observing l t . We define the observation probability matrix as Q t (l t |s t ). Note that Q t (l t |s t ) is independent of the actions because actions are only taken for the patient in states s t ∈ {H, P} and these actions do not affect basic screening results. If he is in health state s t ∈ {C, M, D}, it implies that the disease has been diagnosed and thus no invasive test is needed and the observation on the state is deterministic and reflects the true state, i.e., q t (l t |s t ) = 1 for l t ∈ {C, M, D} and q t (l t |s t ) = 0 for l t ∈ {PH, PP}, e.g., q t (C|C ) = 1. With a certain probability (specificity), the result of a healthy patient will be negative; with a certain probability (sensitivity), the result of a patient with CHD will be positive. We let p 1 and p 2 denote the specificity and the sensitivity of basic screening, respectively, and then where rows refer to states H and P and columns refer to observed states PH and PP.

Transition Probabilities
We let p t (s t+1 |s t , a t ) denote health state transition probability, i.e., the probability of the patient in state s t+1 at age t + 1, given that he was in state s t and took action a t at age t. Taking CHD as an example, if a t = W, the patient's CHD-related health state evolves naturally. CHD is caused by a narrowing of the blood vessels in the heart resulting in inadequate blood supply. Even if patients take drugs for a long time, it can only slow down the progression of CHD, not cure it. The patient's health condition will not be reversed but will only remain the same or deteriorate. There are some parameters used to construct the state transition probability matrices in our model. We let i t and e t be the annual incidence of CHD and the probability of health deterioration which are age-specific, and d 1 and d 2 be the annual mortality from CHD and other reasons, respectively.
If a t = G, CAG test changes the course of CHD natural disease progression in that it triggers treatment at a clinical state. We assume that the CAG test can diagnose CHD and identify the patient's state with certainty.
For s t ∈ {H, P}, if the patient is diagnosed with certainty by CAG (a t = G), the patient will be transferred to state C with certainty and the probability of transitioning to any other states is 0.
For s t ∈ {C, M, D}, no CAG is needed and p t (s t+1 |s t , a t ) does not depend on a t . Hence, the patient's health state evolves naturally, which can be modeled by setting p t (s t+1 |s t , W ) = p t (s t+1 |s t , G ).

Belief States and Belief Update
We denote a belief state (or belief vector) to be where b t is the probability with which a person is in one of the five health states at epoch t.
We let B be the set containing all possible belief states, i.e., b t ∈ B ≡ {b t ∈ R 5 |∑ s∈S b t (s) = 1, 0 ≤ b t (s) ≤ 1, s ∈ S}, where R denotes the set of real numbers. Note that, if a patient has an observation l t ∈ {C, M, D}, his belief state is b t (s t ) = 1 for s t ∈ {C, M, D}. This indicates that our five-state POMDP model includes two partially observable states (H and P) and three absorbing states (C, M, D). For a patient without a positive result of invasive tests, his belief state can be represented as b t = (1 − b t (P), b t (P), 0, 0,0).
If l t ∈ {PH, PP}, we apply Bayesian updating for the patient's belief state b t based on l t and a t , which can be used to revise the belief state b t+1 . Based on [29,41,42], the updated belief can be obtained by the following transformation If l t ∈ {C, M, D}, no belief needs to be updated because the true disease state can be observed.

Rewards
We define the rewards which incorporate the expected future longevity and side effects associated with tests. Let r t (s t , a t ) be the immediate reward (measured in QALYs) when the patient's true health state is s t and action a t is selected at time t, and hence the belief state expected immediate reward is computed by In addition, let p 3 be the diagnostic rate of invasive tests for patients, i.e., the probability of detecting the disease through an invasive test (such as CAG) in state P, µ be the expected loss associated with a single invasive test, and let α, β, and γ denote the annual expected loss of QALYs in states P, C, and M, respectively. We assume that all parameters and immediate rewards have values Immediate rewards are provided in Table 1.

Optimality Equation
Our objective is to determine optimal diagnostic policies for chronic diseases, which maximizes the patient's total expected discounted QALYs. The optimal value function of the model can be written as where p t (l t+1 |b t , a t ) = ∑ s t+1 ∈S q t+1 (l t+1 s t+1 )∑ s t ∈S p t (s t+1 |s t , a t )b t (s t ) and λ ∈ [0, 1] denotes the annual discount factor. In the optimality equation, the first component represents the expected immediate reward of a patient at decision epoch t, and the other represents the total expected future rewards. Thus, the diagnostic policy depends not only on current immediate rewards but also on future rewards. The larger the discount factor, the more important future rewards. As previously assumed, patients will leave our diagnostic system immediately once they take an invasive test, and hence the optimal diagnostic policy can be viewed as a partially observable optimal stopping time problem. Let v t (b t , W) and v t (b t , G) be the total expected discounted QALYs for actions W and G in belief state b t , respectively. At each decision epoch, the physician will choose between the expected reward associated with invasive tests, v t (b t , G), or deferral of the decision to invasive tests to the next decision epoch, v t (b t , W). Thus, the optimality equation can be rewritten as Let R t (s) denote the expected discounted future reward given the patient is never referred for invasive tests (i.e., a t = W), and is in state s t , in age t. On the contrary, R t (s) is the expected discounted future reward given a t = G in decision epoch t. Lastly, the difference between a patient's health rewards under two strategies (wait and diagnosis) can be calculated as:

Structural Properties
In this section, we discuss the structure of our POMDP and prove relevant propositions to provide insights to decision makers on optimal diagnostic policies of chronic diseases. POMDPs are computationally difficult to solve. However, it was shown by Smallwood and Sondik (1973) [46] that the value functions for POMDPs are finite, piecewise linear, and convex, and POMDPs can be converted into an equivalent completely observable Markov decision process on the continuous belief states [9,29,47,48], which greatly simplifies the solution of our model.
According to previous assumptions, we first analyze optimal diagnostic decisions by measuring the difference of health rewards under different strategies. Next, we demonstrate that optimal policies of chronic diseases diagnosis follow a control-limit type policy with respect to a belief threshold. The proofs of all results in this section are presented in Appendix A. Proposition 1. Under the assumptions above, for any∀t ∈ T, we have d∆QALYs db t (P) ≥ 0.
Proposition (1) presents that ∆QALYs increases with b t (P) (the probability of having chronic diseases) at each period t. As a result, compared to waiting, patients with high risk tend to diagnose by invasive tests. This suggests that an increased risk of chronic diseases naturally increases the future reward of diagnosis and reduces the relative effect from side effects of an invasive test and subsequent treatment on patients.

Proposition 2.
There exists an optimal diagnosis time t * for the patient with the probability of having chronic diseases b t (P), and the optimal time is uniquely determined by the following equation Proposition (2) indicates that there is the maximum incremental rewards in QALYs of chronic diseases diagnosis at time t * between two diagnostic decisions in belief state b t (P), and the patient's optimal policy is diagnosis by an invasive test, i.e., a * t = G. From the patient's perspective, Proposition (2) shows how a physician determines the optimal time for a single patient in belief state b t (P) based on ∆QALYs. Next, Proposition (3) will make optimal diagnostic policies for populations from the perspective of physicians and governments.

Proposition 3.
There exists a threshold b * t (P) such that 1.
If b t (P) > b * t (P), the optimal policy is diagnosis, i.e., a * t (b t )= G; where the threshold equals Proposition (3) states that optimal diagnostic policies exhibit a threshold structure. This implies that b * t (P) is the maximum probability that the patient who chooses to wait is in state P. When a patient's belief state b t (P) is lower than this threshold, the optimal policy is wait, while when b t (P) is higher than this threshold, the optimal policy is performing an invasive test.
Propositions (2) and (3) provide insights for physicians and governments as well as policymakers to make diagnostic decisions more effectively. Specifically, from the patient's perspective, Proposition (2) indicates that a given patient should be recommended to diagnose in the optimal diagnosis time based on his basic screening results. In addition, from the government and policy makers' perspective, Proposition (3) shows that optimal diagnostic policies are denoted by the belief threshold between diagnosis and waiting for different populations with various individual characteristics, thereby efficiently improving health and balancing between overdiagnosis and underdiagnosis. For example, if available resources are restricted, such as essential medical supplies and well-trained personnel, diagnostic policies can be made for patients with different disease probabilities in an optimized manner to effectively use resources and improve overall QALYs.

Case Study
In this section, we perform numerical analyses on CHD that investigate how patient characteristics and model parameters may impact diagnostic policies by CAG as well as health rewards of patients. The primary goal of this research is to verify the feasibility of the proposed model and provide relevant insights obtained from the results to physicians and decision makers in health-related fields for diagnosing CHD, effectively.
First, we summarize parameters from the available clinical data and present optimal policies in the basic model. Second, to achieve a more personalized diagnostic decision, we consider the influence of individual characteristics on optimal policies. Third, we evaluate the performance of optimal diagnostic policies on CAG diagnosis and compare them with physicians' decisions in clinical practice. Last, we use sensitivity analysis to evaluate the influence of changes in each of the model parameters to identify which parameters most significantly affect optimal diagnostic policies.

Parameter Estimation
Our clinical data were obtained from a dataset of CAG examinations from a large general hospital in Kaifeng, Henan Province, collected between January and December 2019. The data consists of 1291 patients. Because we focused on primary diagnosis, we removed unrelated records, which included 399 patients who were in states of percutaneous coronary intervention (PCI) or coronary artery bypass grafting (CABG), and 59 patients with acute myocardial infarction. Finally, our data set consisted of 833 effective records. The results of data statistics showed that 495 patients did not suffer from CHD. Therefore, we assumed that the average diagnostic rate of CAG was 41% in the basic model. Table 2 summarizes the above parameters used to construct the transition probability and the rewards in our model, and all parameters are nonnegative and not greater than 1 by definition.

Baseline Analysis
We discuss some interesting observations and general insights under the baseline scenario that can be drawn below. b t (p) at age t = 40 is 0.01, which is based on the annual incidence of CHD for ages 40-60, as reported by the Center for Health Statistics and Information [49]. The belief threshold between diagnosis and waiting in the basic model is illustrated in Figure 3. Obviously, it can be seen that optimal policies have the following properties. First, there is a threshold for optimal policies. Second, the change in the optimal threshold will be related to the age of the patient. Third, the threshold is higher for younger patients. This is because when a patient is at a young age, the health loss of undergoing a CAG is generally greater than the health loss of waiting. Fourth, there is a termination age for diagnostic decisions. If the patient is over age 88, namely the termination age, it will not be recommended to diagnose.
It can be seen from Figure 4 that not everyone benefits from CAG. When the patient is between age 41 and 84, the benefit of undergoing a CAG is greater than that of waiting. Especially for patients between age 43 and 66, the health rewards are significantly higher given the decision to diagnose, and the patient aged 61 who receives a CAG has the greatest health rewards. Furthermore, if patients are older than 84 years, taking CAG tests results in a decrease of their rewards, for which physicians currently hesitate to recommend patients receive a CAG for diagnosing CHD. The change is intuitive because with the increasing probability of death from other causes such as patients' age, their remaining lifetime QALYs following treatment will be lower, which is similar to the related literature and guidelines [8,40]. Therefore, the government and decision makers in health-related fields should pay attention to patients between age 43 and 66 to promote early screening and diagnosis of CHD to reduce health loss and medical costs relating to CHD. Unlike chronic diseases studied in other literature, we can find interesting results from Figures 3 and 4. If the patient is under the age of 41, the side effects of undergoing a CAG are higher than that of waiting. Note that in this case patients will not be recommended for diagnosis, which is consistent with low adherence rates of younger patients receiving a CAG in reality. We can explain this as follows. First, because CAG is an invasive examination, it has potential risks and side effects such as bleeding or hematoma at the puncture site and arrhythmia. Secondly, CAG has proven to have a significant role in measuring the degree of vascular stenosis, and patients may be treated by PCI or CABG during the diagnosis process by performing a CAG, which is another important role. In clinical practice, a patient does not need to be treated if the degree is less than 50%. Finally, because of the lower prevalence of CHD for patients at a young age compared with other diseases, they are relatively less likely to receive treatment after diagnosis, and hence their threshold is higher. This is different from the optimal screening policies proposed in other literature, which recommend more aggressive screening for younger patients.

Optimal Policies Considering Individual Characteristics
In this section, in order to investigate the impact of individual characteristics on optimal policies, we updated the diagnostic rate of CAG based on the dataset described above, as shown in Table 3. The updated diagnostic rates are disaggregated according to different ages and genders. As shown in Figure 5, optimal diagnostic policies vary with individual characteristics of patients; that is, the belief threshold and the termination time are different with various characteristics. In general, as the patient's age increases, the threshold first decreases and then increases, which is different from non-decreasing thresholds in the age seen in other literature. Furthermore, females have a higher diagnostic threshold than males, especially at a younger age, and the termination time for diagnosis becomes smaller for female patients. Thus, at such low belief levels, aggressive policies would lead to many false positives and unnecessary CAG tests, and the diagnosis rate would not increase significantly, which in turn would lead to a reduction in QALY. For evaluating the difference of health rewards, Figure 6 compares the influence of individual characteristics on the improvements in the QALY criterion under two diagnostic policies. Obviously, we find that ∆QALYs of males undergoing diagnosis is higher than that of females, which implies that having aggressive diagnosing between the ages of 50 and 75 is especially critical for males. However, the benefits for females are less because the efficacy of the screening test is lower for them. For example, the improvement between two strategies of a male aged 61 is 0.1008 QALYs, and hence a man has 0.0405 QALYs more than a woman of the same age. In addition, the diagnostic benefits of young patients may be negative due to the side effects of examinations and the low probability of having CHD, implying that the optimal policy is to wait in this case.
Therefore, physicians and decision-makers in health-related fields should focus on diagnosing patients with a high probability of having CHD via CAG to avoid underdiagnosis and delays in treatment, especially between the ages of 50 and 75. At the same time, unnecessary testing of patients with a low probability, especially young females, could be reduced, which would avoid overdiagnosis. This is in line with diagnostic guidelines and the literature [22,54,55]. It should be noted that in clinical practice, however, physicians need to consider other factors comprehensively when deciding whether a patient should go for further testing to avoid delays in diagnosis and treatment.

Comparison with Other Policies
We measured the total health rewards of CHD diagnosis by estimating how much the total expected QALYs for a 40-year-old patient improves when the optimal policy is adopted as opposed to other policies. Table 4 shows that the improvement and percentage improvement in QALYs between optimal decisions and decisions in clinical practice. Based on our experiments, we found that the optimal strategies we proposed can result in higher QALY gains for patients than others. For example, the optimal policy for male patients in b 40 (P)= 1 is diagnosis by CAG. Compared with the decision to wait, the incremental benefit of the optimal policy is 0.1067 QALYs per person for the male population. As a result, males in a higher belief state get more rewards when taking the optimal policy than when taking others. Note that optimal policies are different for patients of different genders. When b 40 (P) is equal to 0.5, male patients are expected to have up to 0.177% more QALYs by following the optimal policy rather than waiting, while female patients should be advised to wait. We focused on two different perspectives, i.e., individual patients and populations with various individual characteristics. Thus, the improvement in quality-adjusted life years of an individual patient is not obvious, whereas the health benefits at the population level will increase significantly as the benefits of each patient increase.

Sensitivity Analysis
Because some parameters in our model are subject to variation due to differences in actual conditions, patient preferences, and physiology, we conducted sensitivity analysis for these parameters in Figure 7. The variation range of these parameters is based on relevant literature or ±20% of the basic values [26,29]. The optimal policy is particularly sensitive to the annual loss of QALYs per person (namely α, β, and γ) in states P, C, and M, respectively, as illustrated in Figure 7e,g. In Figure 7c,d, optimal policies are sensitive to the change in specificity p 1 and sensitivity p 2 of basic screening. In addition, Figure 7i is the sensitivity analysis of the discount factor, λ. As illustrated in Figure 7a,b,h, we observe that ∆QALYs is insensitive to changes in d 1 , d 2 , and µ.
The above sensitivity analysis shows that it is crucial to assess patients' health loss under each of the states when studying the diagnosis of chronic diseases. Therefore, model parameters are well estimated, which can help effectively optimize diagnostic policies based on various personal characteristics. Furthermore, improving the sensitivity of basic screening and reducing the side effects of CAG and subsequent treatment can effectively improve health rewards for undergoing a CAG, which implies that patients will be willing to diagnose.

Discussion
From the patient's perspective, there exists an optimal diagnosis time based on basic screening results via non-invasive tests. When patients are in the recommended age, they will be encouraged to diagnose via invasive tests for maximizing their health rewards.
From the perspective of physicians and the government as well as decision makers, optimal policies of chronic disease diagnosis follow a control-limit type policy with respect to a belief threshold for different populations with various individual characteristics. This can lead to better resource utilization for healthcare providers and governments. Unlike most studies in the literature, due to the specificity of CHD and its examinations, the diagnostic threshold first decreases and then increases as the patient's age increases. In addition, not everyone benefits from non-invasive tests, and we were able to identify patients that could benefit from tests through this model. Therefore, physicians should focus on the diagnosis of males with a high probability of having CHD to efficiently avoid underdiagnosis and a delay in treatment, particularly between the ages of 50 and 75. In contrast, unnecessary examinations for patients with a low probability, especially young females, should be reduced which can avoid overdiagnosis.
In clinical practice, however, physicians need to comprehensively consider other clinical factors to avoid delays in diagnosis and treatment. Therefore, decision makers can make personalized decisions based on the characteristics of different populations. This can not only help high-risk patients to be diagnosed and treated as soon as possible, but also make effective use of medical resources. This further emphasizes the importance of individual characteristics, which would be of great significance for the government and policymakers in health-related fields when setting health care policies and for insurance companies when designing insurance programs.
The contribution of this paper is two-fold. First, we extended the POMDP model to a new problem and studied personalized diagnostic decisions of chronic diseases considering individual characteristics of patients. This model can also be extended to other diseases and other medical decisions such as screening and treatment policies. Second, we derived some structural properties, including the optimal diagnosis time for a certain patient with chronic diseases and the belief threshold of diagnostic policies for different populations, and investigated the effect of individual characteristics on properties.

Conclusions
Diagnostic policies of chronic diseases have become a critical research issue for patients and physicians as well as decision makers in health-related fields, which currently has potential problems such as overdiagnosis and underdiagnosis. We constructed a POMDP model for chronic diseases to optimize diagnostic policies incorporating individual characteristics of patients and derived some structural properties, including the existence of the diagnostic threshold and the optimal diagnosis age.
More specifically, from the patient's perspective, we determined the optimal diagnosis time for a certain patient based on basic screening results; from the perspective of physicians and the government as well as decision makers, we analyzed optimal diagnostic policies, denoted by the belief threshold for populations with different individual characteristics. Our proposed model maximizes health benefits for patients and achieves the trade-off between underdiagnosis and overdiagnosis from a systematic point of view. Furthermore, the application of this model can improve overall public health QALYs and can be an efficient use of resources, especially where available resources, such as essential medical supplies and well-trained personnel, are constrained.
There are several future research directions of this study. Firstly, we only considered the impact of age and gender on the decision-making of invasive examinations, while other risk factors such as family medical history, smoking, and drinking are related to chronic diseases. Secondly, we assumed 100% adherence rate of patients to diagnosis recommendations in POMDP, and hence the impact of adherence behavior on the diagnostic policies is a potential topic for further exploration. Thirdly, an increasing number of patients prefer to play an active role during the decision-making process in clinical practice, which suggests it is critical to incorporate various risk preferences into the process for individuals or populations. Finally, because this paper focused on diagnostic policies, in terms of extending the analysis another research direction is to comprehensively analyze optimal policies of patients during the entire process including screening and treatment.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
To prove the aforementioned propositions, we need to prove related lemmas. First, we present the expected discounted future rewards under two policies after age t for five states H, P, C, M and D, respectively. The rewards can be written as Proof. From the above reward functions and the assumption α ≤ β ≤ γ, we have When the patient is in state M, R t (M) is greater than 0, thereby the above inequality holds. Expanding this inequality for t + 1, t + 2, . . . , 100, we have which implies R t (P) ≥ R t (M) for all t ∈ T. Lemma (A1) means that the patient in state P has expected QALYs no less than that in state M given the decision to wait.
Lemma A2. R t (H) ≥ R t (P) for all t ∈ T.
Proof. From the above reward functions, we have From Lemma (A1), the above inequality holds. Expanding this inequality for t + 1, t + 2, . . . , 100, we have hence the result holds. Lemma (A2) means that the patient in state H has expected QALYs no less than that in state P given the decision to wait.
Proof. From the above reward functions and the assumption α ≤ β ≤ γ, we have From Lemma (A2), the above inequality holds. Expanding this inequality for t + 1, t + 2, . . . , 100, we have hence the result holds when α = β. Lemma (A3) means that the patient in state C has expected QALYs no less than that in state P.
Lemma A4. R t (C) ≥ R t (M) for all t ∈ T.
Proof. From the above reward functions and the assumption α ≤ β ≤ γ, we have R t (C) − R t (M) = r t (C) + λp t (C|C )R t+1 (C) − (r t (M) + λp t (M|M )R t+1 (M)) When the patient is in state M, R t (M) is greater than 0, thereby the above inequality holds. Expanding this inequality for t + 1, t + 2, . . . , 100, we have hence the result holds. Lemma (A4) means that the patient in state C has expected QALYs no less than that in state M.
Lemma A5. R t (H) ≥ R t (P) for all t ∈ T.
From Lemmas (A1) and (A3), we have k ≥ 1, where k = R t+1 (C)−R t+1 (P) R t+1 (P)−R t+1 (M) . Because the probability of health deterioration in a CAG diagnosis of healthy is not more than k times the probability of a CAG diagnosis of having CHD, we have p 3 k ≥ e t (1 − p 3 ), and hence the above inequality holds. Expanding this inequality for t + 1, t + 2, . . . , 100, we have which implies R t (H) ≥ R t (P) for all t ∈ T. Lemma (A5) means that the patient in state H has expected QALYs no less than that in state P given the decision to diagnosis. Lemma A6. R t (P) − R t (P) ≥ −µ for all t ∈ T.
Proof of Proposition A1. Due to R t (H) − R t (H) = −µ, we have ∆QALYs = b t (P)(R t (P) − R t (P) + µ) − µ. For any ∀t ∈ T, by taking the derivative of ∆QALYs with respect to b t (P), we have d∆QALYs db t (P) ≥ 0 according to Lemma (A6). Then, we have ∆QALYs increases with b t (P) at each period for any t.
Proof of Proposition A2. By Lemma (A5), we have dv t (b t ,G) db t (P) = −(R t (H) − R t (P)) ≤ 0. Similarly, we conduct the derivation for v t (b t , W), dv t (b t ,W) db t (P) = − R t (H) − R t (P) ≤ 0 by Lemma A2. Because b t (P) ∈ [0, 1], the value of ∆QALYs is finite in T. As a result, there exists an optimal diagnosis time t * ∈ T for the patient with the probability of having CHD b t (P), which maximizes the incremental rewards in QALYs of CHD diagnosis. And the optimal time is uniquely determined by the following equation t * ∈ argmax(v t (b t , G) − v t (b t , W)) = argmax b t (P) R t (P) − R t (P) + µ − µ .
Proof of Proposition A3. According to Lemma (A6), we have R t (P) − R t (P) + µ ≥ 0. If b t (P) ≤ µ R t (P)−R t (P)+µ , ∆QALYs = b t (P) R t (P) − R t (P) + µ − µ ≤ 0 holds, which implies that the optimal diagnosis decision is wait, namely a * t = W. In contrast, if b t (P) > µ R t (P)−R t (P)+µ , we have v t (b t , G) > v t (b t , W), i.e., ∆QALYs > 0 holds, which implies that the optimal diagnosis decision a * t = G. This proposition can be categorized and proved by considering the two different cases presented below.
Case 1 (a * t (b t (P) = 1) = G): In this case, v t (b t = 1, W) < v t (b t = 1, G). From Proposition (2), v t (b t , G) and v t (b t , W) are non-increasing in b t (P). From Lemmas (A1) and (A3), we have dv t (b t ,G) db t (P) ≥ dv t (b t ,W) db t (P) . It follows that v t (b t , W) and v t (b t , G) have at most one intersection of b * t (P). As a result, if b t (P) > b * t (P), a * t (b t ) = G; if b t (P) ≤ b * t (P), a * t (b t ) = W. Case 2 (a * t (b t (P) = 1) = W): By the condition of this case, ∆QALYs is not greater than 0. From Proposition (1), ∆QALYs is positively correlated with b t (P). As a result, the optimal diagnosis decision is wait at any b t (P), which implies a * t (b t ) = W, if b t (P) ≤ b * t (P) and a * t (b t ) = G, if b t (P) > b * t (P), where b * t (P)= 1.