Applying Sequential Pattern Mining to Investigate the Temporal Relationships between Commonly Occurring Internal Medicine Diseases and Intervals for the Risk of Concurrent Disease in Canine Patients

Simple Summary
This study used a technique called sequential pattern mining to uncover connections between common internal medicine diseases in dogs. The goal was to understand how these diseases relate to each other over time. Researchers collected medical records from dogs treated at the Konkuk University Veterinary Medicine Teaching Hospital, focusing on their diseases and the time intervals between diagnoses. They also calculated the 3-year risk of developing another disease after the initial diagnosis. This study identified 547 dogs with at least one internal medicine disease. The sequential pattern mining analysis revealed strong associations and time intervals for five of the most common diseases in dogs, including hyperadrenocorticism, myxomatous mitral valve disease, canine atopic dermatitis, chronic kidney disease, and chronic pancreatitis. This research suggests that sequential pattern mining is a useful tool for understanding disease connections and predicting future health issues in dogs. Veterinarians can use these findings to recommend preventive measures and treatments for dogs at risk of developing additional medical conditions, ultimately improving the care and health of canine patients.

Abstract
Sequential pattern mining (SPM) is a data mining technique used for identifying common association rules in multiple sequential datasets and patterns in ordered events. In this study, we aimed to identify the relationships between commonly occurring internal medicine diseases in canine patients. We obtained medical records of dogs referred to the Konkuk University Veterinary Medicine Teaching Hospital. The data used for SPM included comorbidities and intervals between the diagnoses of internal medicine diseases. Additionally, we estimated the 3-year risk of developing an additional disease after the initial diagnosis of a commonly occurring veterinary internal medicine disease using logistic regression. We identified 547 canine patients diagnosed with ≥ 1 internal medicine disease. The SPM-based analysis assessed comorbidities and intervals for each of the five most common internal medicine diseases, including hyperadrenocorticism, myxomatous mitral valve disease, canine atopic dermatitis, chronic kidney disease, and chronic pancreatitis. The highest values of the association rule were 3.01%, 6.02%, 3.9%, 4.1%, and 4.84%, and the shortest intervals were 1.64, 13.14, 5.37, 17.02, and 1.7 days, respectively. This study proposes that SPM is an effective technique for identifying common associations and temporal relationships between internal medicine diseases, and can be used to assess the probability of additional admission due to the development of the subsequent disease that may be diagnosed in canine patients. The results of this study will help veterinarians suggest appropriate preventive measures or other medical treatments for canine patients with medical conditions that have not yet been diagnosed, but are likely to develop in the short term.


Introduction
The veterinary healthcare system has made considerable progress in transitioning from paper charts to electronic medical records (EMRs) [1]. This transition has resulted in the accumulation of extensive clinical data, which can serve as a valuable resource for enhancing our understanding of current clinical practices and developing decision support systems [1]. Data mining, the process of uncovering hidden knowledge within vast datasets, has found applications in various sectors, including healthcare [2] and the biomedical field [3,4].
Sequential pattern mining (SPM) is a data-mining technique employed to discover common association rules within multiple sequential datasets, and identify patterns in ordered events stored within a database [1,5,6]. Introduced in 1995 by Agrawal at IBM's Almaden Research Center [7], it was first applied in the retail industry, where it could predict, for instance, that a customer might purchase the sequel to a book shortly after buying the first installment. Beyond retail, SPM has been applied as a methodology to investigate relationships among medical events, and develop predictive models for extensive healthcare data in human medicine [1,8]. SPM-based studies in medicine have demonstrated promising outcomes, including disease susceptibility prediction [9,10], improved understanding of disease progression patterns [11,12], identification of revisit patterns [13,14], enhanced pharmacovigilance for medication safety [15,16], and the exploration of relationships between medical conditions [17,18]. Despite several limitations, including variations in data quality, privacy concerns, the complexity of developing predictive models, and ethical considerations regarding data usage and transparency, medical research employing SPM has been actively reported to date [19,20].
In the field of veterinary medicine, SPM-based relationship analyses have not yet been applied. Given the diversity and complexity of veterinary data involving various animal species, the application of SPM within EMRs emerges as an area of significant potential. This approach can predict future disease statuses based on current patient conditions, which is particularly valuable in veterinary medicine with its diverse data sources and patient populations.
In veterinary medicine, it is common for diseases, especially those related to internal medicine, to remain undiagnosed in canines until clear symptoms develop; thus, predicting their development holds significant importance [21]. Moreover, the prompt and precise diagnosis of internal medicine diseases stands as a pivotal factor in managing and preserving the health of veterinary internal medicine patients. The prognosis of many internal medicine diseases can be affected by the presence of concurrent illnesses, and therapeutic medications targeted at specific diseases may exacerbate comorbidities. Therefore, large-scale studies on the relationships between common internal medicine diseases are imperative to empower veterinarians to deliver more precise prognostic information to owners, and guide appropriate treatment strategies. However, the associations between the most common internal medicine diseases in dogs remain inadequately explored.
This population-based study aimed to identify the relationships between commonly occurring internal medicine diseases in canine patients using SPM and logistic regression analysis.

Materials and Methods
This section describes the source of data and methodology for the analysis of relationships among internal diseases used in this study.

Canine Patients
This retrospective study included client-owned dogs referred to the Department of Veterinary Internal Medicine at the Konkuk University Veterinary Medicine Teaching Hospital (KU-VMTH), Seoul, Republic of Korea. We conducted a thorough search of the EMRs for dogs diagnosed with internal diseases between November 2014 and December 2017. This study was approved by the Konkuk University Institutional Animal Care and Use Committee (reference no. IACUC 20158). All procedures adhered to the relevant guidelines and regulations, with informed consent acquired from the owners of all participating dogs.

Identification of Internal Medicine Diseases
We obtained the EMRs of dogs referred to the KU-VMTH between 2014 and 2017. Medical records were retrospectively reviewed, and signalments, clinical signs, and diagnostic evaluations were extracted. Patients with previous internal diseases were excluded, as were cases with missing or irregular data. We selected the five most common veterinary internal medicine diseases and estimated the 3-year risk of progression to other diseases for each.

SPM
We employed SPM to examine the temporal relationships between diseases, using the generalized sequential pattern (GSP) algorithm. GSP was developed to address sequence mining challenges and relies predominantly on the Apriori (level-wise) approach. Within this level-wise paradigm, an initial step involves identifying all frequently occurring diseases in a systematic manner. This entails counting the appearances of individual elements within the medical records. Subsequently, comorbidities are refined by excluding infrequent diseases. Consequently, each comorbidity encompasses only the frequent elements that were initially present. The refined medical records then serve as the input for the algorithm; this preprocessing requires a single comprehensive scan of the entire medical record collection. The algorithm then makes several passes over the records. During the first pass, individual items, termed 1-sequences, are counted. Using the frequent items, a collection of candidate 2-sequences is constructed, followed by another pass to determine their frequency. The frequent 2-sequences serve as the basis for generating candidate 3-sequences. This process is repeated until no additional frequent sequences emerge. Each pass fundamentally comprises two steps. In the first step, given the set of frequent sequences D_{k−1} from the (k − 1)th iteration, candidates for the next iteration are derived by self-joining D_{k−1}. During this step, any sequence with at least one infrequent subsequence is discarded. In the second step, a search strategy based on a hash tree is used to ensure efficient support counting. Finally, sequences that are not maximally frequent are removed. Figure 1 shows the GSP algorithm used in the study.
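The level-wise procedure described above can be illustrated with a minimal Python sketch. It is not the SAS Enterprise Miner implementation used in the study: it assumes one disease per diagnosis event, counts support by a plain scan instead of a hash tree, and the function names (`gsp`, `support`, `maximal`) and the toy patient sequences are hypothetical.

```python
def is_subsequence(candidate, sequence):
    """True if the candidate's items occur in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in candidate)


def support(candidate, sequences):
    """Number of patient sequences that contain the candidate."""
    return sum(is_subsequence(candidate, seq) for seq in sequences)


def gsp(sequences, min_support):
    """Simplified level-wise (GSP-style) mining; returns {sequence: support}."""
    items = sorted({item for seq in sequences for item in seq})
    # Pass 1: count 1-sequences and keep the frequent ones.
    frequent = {}
    for item in items:
        n = support((item,), sequences)
        if n >= min_support:
            frequent[(item,)] = n
    result = dict(frequent)
    while frequent:
        prev = set(frequent)
        # Join step: a and b join when a without its first item equals b without its last.
        candidates = {a + b[-1:] for a in prev for b in prev if a[1:] == b[:-1]}
        # Prune step: discard candidates that have an infrequent (k-1)-subsequence.
        candidates = {
            c for c in candidates
            if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))
        }
        # Counting step: plain scan here (the text describes a hash tree).
        frequent = {}
        for c in candidates:
            n = support(c, sequences)
            if n >= min_support:
                frequent[c] = n
        result.update(frequent)
    return result


def maximal(frequent):
    """Keep only frequent sequences not contained in another frequent sequence."""
    return sorted(
        s for s in frequent
        if not any(s != t and is_subsequence(s, t) for t in frequent)
    )


# Hypothetical diagnosis sequences, one tuple per patient, in date order.
sequences = [
    ("HAC", "CKD", "pyoderma"),
    ("HAC", "CKD"),
    ("MMVD", "HAC", "CKD"),
    ("CAD", "pyoderma"),
]
frequent = gsp(sequences, min_support=2)
```

With a minimum support of two patients, only "HAC then CKD" survives as a 2-sequence in this toy data, and no 3-sequence candidate can be joined from it, so the loop stops after the third pass.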
The main parameter is the sequence length k, that is, the number of diseases included in a sequence; a sequence comprising two diseases is referred to as a 2-length sequence. The SPM measures are based on confidence and duration values. Confidence (see Equation (1)) was defined as the conditional probability of a sequential pattern (i.e., disease "A" to disease "B") [6], and duration (see Equation (2)) was the average time between the diagnosis of the initial disease and the diagnosis of the subsequent disease. In this analysis, sex, size, analogous classification, and date were used as variables to identify the sequential pattern of the disease using SAS Enterprise Miner statistical software, version 13.2 (SAS Institute, Cary, NC, USA).
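As an illustration of the two measures, the sketch below computes confidence and duration for a single rule "A to B" from diagnosis records. It is an assumption-laden reading of the definitions above, not the study's SAS code: confidence is taken as the share of patients with A who are later diagnosed with B, duration as the mean gap in days, "later" is taken to mean a strictly later date, and `rule_stats` and the sample records are hypothetical.

```python
from datetime import date


def rule_stats(records, first, second):
    """Return (confidence, mean interval in days) for the rule `first` -> `second`."""
    by_patient = {}
    for patient, disease, day in records:
        diagnoses = by_patient.setdefault(patient, {})
        # Keep the earliest diagnosis date of each disease per patient.
        if disease not in diagnoses or day < diagnoses[disease]:
            diagnoses[disease] = day
    with_first = [d for d in by_patient.values() if first in d]
    if not with_first:
        return 0.0, None
    # Gaps for patients diagnosed with `second` strictly after `first` (assumption).
    gaps = [
        (d[second] - d[first]).days
        for d in with_first
        if second in d and d[second] > d[first]
    ]
    confidence = len(gaps) / len(with_first)
    duration = sum(gaps) / len(gaps) if gaps else None
    return confidence, duration


# Hypothetical records: (patient id, disease, diagnosis date).
records = [
    (1, "HAC", date(2015, 1, 1)), (1, "CKD", date(2015, 1, 3)),
    (2, "HAC", date(2015, 2, 1)), (2, "CKD", date(2015, 2, 5)),
    (3, "HAC", date(2015, 3, 1)),
    (4, "CKD", date(2015, 4, 1)), (4, "HAC", date(2015, 4, 10)),
]
hac_to_ckd = rule_stats(records, "HAC", "CKD")
ckd_to_hac = rule_stats(records, "CKD", "HAC")
```

The two directions are computed separately because a rule and its reverse generally have different denominators and different intervals, which is why the Results report, for example, "HAC to CKD" and "CKD to HAC" as distinct rules.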
Figure 1. The SPM algorithm used in this study.

Statistical Analysis
To statistically determine whether a specific internal disease occurred more frequently in patients with other commonly occurring internal medicine diseases, we established a cohort of patients diagnosed with internal diseases between 2014 and 2017, excluding those diagnosed before 2014. We compared the logistic regression results with the occurrence of each disease in patients with other newly diagnosed diseases, and the five most common internal medical diseases in the control group. The independent variable was defined as 1 or 0, depending on whether patients newly diagnosed with a disease had another condition, and the covariates were the presence or absence of other diseases. Disease-adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were calculated using multivariate logistic regression analysis. Data analyses were performed using SAS Enterprise Miner, version 13.2 (SAS Institute, Cary, NC, USA). Statistical significance was set at p < 0.05.
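For orientation, the relationship between a logistic regression coefficient and the reported aOR can be written as follows. This is the standard textbook form, given here as an assumption: the text does not state which interval type the software produced, so Wald intervals are assumed, and the symbols (Y for the outcome disease, x_j for the 0/1 indicator of disease j, \hat\beta_j for its estimated coefficient) are introduced only for illustration.

```latex
\begin{align}
\operatorname{logit} P(Y = 1 \mid x_1, \dots, x_p)
  &= \beta_0 + \sum_{j=1}^{p} \beta_j x_j, \\
\mathrm{aOR}_j &= \exp\bigl(\hat\beta_j\bigr), \\
95\%\ \mathrm{CI}_j
  &= \Bigl[\exp\bigl(\hat\beta_j - 1.96\,\mathrm{SE}(\hat\beta_j)\bigr),\;
          \exp\bigl(\hat\beta_j + 1.96\,\mathrm{SE}(\hat\beta_j)\bigr)\Bigr].
\end{align}
```

Under this form, an interval that excludes 1 corresponds to a coefficient that differs from 0 at approximately the 0.05 level.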

Results
This section describes the results of SPM and logistic regression in relation to internal diseases during the study period.

Canine Patient Population
In total, 547 dogs were evaluated for clinical signs of internal diseases during the study period. Overall, 697 diseases were diagnosed, of which, hyperadrenocorticism

Comorbidity Association Rules and Intervals for Internal Medicine Diseases
For each of the five most common internal diseases, the sequential patterns of diseases and intervals between their diagnoses obtained using the SPM are expressed in parentheses, as shown in Figures 2-6. The comorbidity association rule "disease A to disease B" indicates the percentage of patients with disease B among the patients with disease A [23]. Duration refers to the interval between the diagnoses of two diseases (average number of days).

Hyperadrenocorticism
Figure 2 illustrates the association rules for hyperadrenocorticism (HAC) and their confidence levels and intervals. The highest value was 3.01% for the following association rules: "HAC to chronic kidney disease (CKD)", "HAC to food allergy", and "HAC to canine atopic dermatitis (CAD)". The second highest value was 2.26% for the association rules between HAC and hepatitis, renal calculi, and pyoderma. The association rule for HAC with CKD exhibited the shortest interval of 1.64 days, followed by 2.0, 4.95, 5.36, 6.86, and 9.35 days for the association rules for HAC with hepatitis, renal calculi, CAD, food allergy, and pyoderma, respectively. In the reverse direction (other diseases to HAC), the highest value of 4.55% was observed for the association rule "hepatitis to HAC", followed by 4.1%, 3.9%, 3.53%, and 3.33% for associations with CKD, CAD, renal calculi, and food allergies, respectively. The association rule for "CAD to HAC" had the shortest interval of 5.37 days, followed by 6.95, 17.02, 17.35, and 33.78 days for the association rules of food allergy, CKD, renal calculi, and hepatitis with HAC, respectively.

Myxomatous Mitral Valve Disease
Figure 3 shows the association rules for myxomatous mitral valve disease (MMVD) and their confidence levels and intervals. The highest value was 6.02% for the association rule of "MMVD to HAC". The association rule for "MMVD to food allergy" had the shortest interval of 13.14 days, followed by 25.22 and 31.67 days for the association rules for MMVD with hepatitis and HAC, respectively. In the reverse direction, a value of 4.55% was obtained for the association rule "hepatitis to MMVD". The association rule for "food allergy to MMVD" demonstrated the shortest interval of 1.5 days, followed by 50.67 days for the association rule of hepatitis with MMVD.

Chronic Pancreatitis
Figure 4 illustrates the association rules for chronic pancreatitis and their confidence levels and intervals. The association rule for "chronic pancreatitis to HAC" had the highest value of 4.84%. The interval of 1.7 days for the association rule for "chronic pancreatitis to food allergy" was the shortest, followed by 26.63 days for the association rule for chronic pancreatitis and HAC.

Chronic Kidney Disease
Figure 5 shows the association rules for CKD and their confidence levels and intervals. The association rule for "CKD to HAC" had the highest value of 4.1%. Additionally, the association rule "CKD to hypothyroidism" exhibited the shortest interval of 3.1 days, followed by 17.02 days for the association rule of CKD and HAC. In the reverse direction, the association rule "HAC to CKD" showed the highest value of 3.01%. The association rule for "HAC to CKD" had the shortest interval of 1.64 days, followed by 2.1 days for the association rule for hypothyroidism and CKD.

Canine Atopic Dermatitis
Figure 6 displays the association rules for CAD and their confidence levels and intervals. The association rule of "CAD to HAC" had the highest value of 3.9%. The association rule for "CAD to HAC" had the shortest interval of 5.37 days, followed by 7.96 days for the association rule of CAD and pyoderma. In the reverse direction, the association rule for "pyoderma to CAD" exhibited the highest value of 6.25%, and the association rule for "HAC to CAD" had the shortest interval of 5.35 days, followed by 5.57 days for the association rule for pyoderma and CAD.

Risk of Progression of the Five Most Common Veterinary Internal Medicine Diseases
We estimated the 3-year risk of developing comorbidities in dogs with the five most common veterinary internal medicine diseases using logistic regression analysis. The aORs and CIs for comorbidities are shown in Table 2. Patients with HAC were at an elevated risk of developing CKD and chronic pancreatitis (aOR 1.653 [95% CI 1.086-2.515]; aOR 2.162 [95% CI 1.273-3.672], respectively), whereas patients with MMVD were at a reduced risk of developing CAD and an elevated risk of developing CKD (aOR 0.476 [95% CI 0.255-0.890]; aOR 2.003 [95% CI 1.314-3.051], respectively). Additionally, patients with CKD were at an elevated risk of developing HAC, MMVD, and chronic pancreatitis (aOR 1.582 [95% CI 1.035-2.419]; aOR 2.003 [95% CI 1.314-3.051]; and aOR 3.937 [95% CI 2.318-6.689], respectively). Moreover, patients with chronic pancreatitis had an elevated risk of developing HAC and CKD (aOR 2.008 [95% CI 1.171-3.444]; aOR 3.822 [95% CI 2.242-6.513], respectively). However, patients with CAD were at a reduced risk of developing MMVD (aOR 0.440 [95% CI 0.231-0.837]). No statistically significant decrease or increase in the risk of developing MMVD or CAD was observed in patients with HAC. Further, there was no significant decrease or increase in the risk of developing HAC and chronic pancreatitis in patients with MMVD, the risk of developing HAC, CKD, and chronic pancreatitis in patients with CAD, the risk of developing CAD in patients with CKD, or the risk of developing MMVD and CAD in patients with chronic pancreatitis.
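As a reading aid for the values above, the following sketch back-computes the log-odds coefficient and its standard error from a reported aOR and 95% CI, and checks whether the interval excludes 1. It assumes the reported intervals are Wald intervals on the log-odds scale, which the text does not confirm; the function names are hypothetical, and the input is the HAC-to-CKD estimate quoted above.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def wald_from_report(aor, lower, upper, z=Z_95):
    """Recover (beta, standard error) from a reported aOR and CI (Wald assumption)."""
    beta = math.log(aor)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def ci_from_wald(beta, se, z=Z_95):
    """Rebuild the CI on the odds-ratio scale from beta and its standard error."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_one(lower, upper):
    """An interval that excludes 1 indicates a changed risk at roughly the 0.05 level."""
    return lower > 1 or upper < 1


# Reported estimate for HAC -> CKD: aOR 1.653 [95% CI 1.086-2.515].
beta, se = wald_from_report(1.653, 1.086, 2.515)
rebuilt_lower, rebuilt_upper = ci_from_wald(beta, se)
```

If the rebuilt bounds agree with the reported ones to within rounding, the reported interval is symmetric around the estimate on the log scale, as a Wald interval would be. The same check distinguishes elevated risks (lower bound above 1) from reduced risks such as CAD to MMVD (upper bound below 1).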

Discussion
This retrospective study applied SPM to clinical data from canine patients extracted from the EMRs of the VMTH network in the Republic of Korea to assess the probability of additional admission for developing a subsequent disease within 3 years, in patients with at least one veterinary internal medicine disease. We evaluated the comorbidity association rules and intervals among commonly occurring internal medicine diseases in dogs using SPM.
In veterinary medicine, awareness of the possible associations between diseases and the common interval between diagnoses can facilitate early detection [24]. Additionally, some medications used to treat common illnesses can exacerbate other diseases, necessitating careful consideration for patients at high risk for these diseases. Therefore, studies on the comorbidities of commonly occurring internal medicine diseases, and the intervals between diagnoses, are required to inform veterinary clinical practice. Further, when there are many concurrent diseases, confidently linking the reported clinical signs to one specific disease, and not to one or more comorbidities, can be challenging. Moreover, owing to the vague and poorly defined clinical presentation of internal medicine diseases in dogs, identifying the signs that define the clinical presentation of a given disease is difficult. In this study, the shortest interval for disease association was typically less than 1 month. This finding suggests that a concurrent disease may remain undiagnosed until clear symptoms develop, rather than being diagnosed as the disease progresses. Therefore, awareness of the risk of comorbidities in veterinary medicine patients with the most common internal diseases is important. Results of sequence patterns and association mining in this study provided valuable insights into optimizing medical services for disease management, early detection, treatment, and revisits for canine patients with concurrent internal medicine disease patterns. Thus, appropriate preventive measures and recommendations for other medical treatments for canine patients who have been diagnosed with these internal medicine diseases can be provided accordingly.
The percentage of patients with a given disease who experienced the new onset of another specific disease was defined as the confidence parameter [25]. The value of the "confidence" parameter is mathematically synonymous with the concept of "comorbidity" in epidemiology [6]; hence, we regarded the confidence parameter as indicative of comorbidity. We investigated the comorbidities of HAC, MMVD, CAD, CKD, and chronic pancreatitis in canine patients using SPM. We also investigated the interval between the onset of specific diseases.
Regarding the association rules for HAC, the highest value (3.01%) was observed for CKD, food allergies, and CAD. The interval of 1.64 days found for the association of HAC with CKD was the shortest. For the association rules for MMVD, the value of 6.02% for the association between MMVD and HAC was the highest. Although the highest association rule values in this study were low (approximately 3-6%), this result is noteworthy because it indicates that patients with a specific disease had a significantly higher risk of developing another disease than those without that disease. Additionally, the association between MMVD and food allergies had the shortest interval of 13.14 days. Regarding the association rules for CAD, the value of 3.9% for the association with HAC was the highest. The association of CAD with HAC showed the shortest interval (5.37 days). Concerning the association rules for CKD, the highest value of 4.1% was obtained for an association with HAC. The association between CKD and HAC exhibited the shortest interval (17.02 days). Finally, for the association rules for chronic pancreatitis, the value of 4.84% for association with HAC was the highest. An interval of 1.7 days for the association of chronic pancreatitis with food allergies was the shortest.
In the present study, many internal medicine diseases were diagnosed in canine patients within a few days of the initial diagnosis. Thus, in patients diagnosed with an internal medicine disease, focusing solely on the initial diagnosis may result in missed opportunities to prevent additional admissions, including those for other internal medicine diseases. It is important to monitor canine patients with the five most common diseases for comorbidities that may put them at risk of additional admissions.
Additionally, we estimated the 3-year risk of developing additional diseases after the initial diagnosis of one of the five most common veterinary internal medicine diseases using logistic regression. We found that patients with HAC had a significantly higher risk of developing CKD and chronic pancreatitis than those without HAC. Patients with MMVD had a significantly higher risk of developing CKD than non-MMVD patients. Moreover, patients with CKD had significantly higher risks of HAC, MMVD, and chronic pancreatitis than non-CKD patients. Patients with chronic pancreatitis had a significantly higher risk of developing HAC and CKD than those without chronic pancreatitis. However, patients with CAD had a significantly decreased risk of MMVD, and vice versa. This finding may be attributed to differences in the age at onset of these two diseases: CAD occurs predominantly in relatively young dogs [26], whereas MMVD is usually diagnosed in relatively old dogs [27]. These findings provide valuable insights into the co-occurrence of diseases reported in previous studies on HAC [28], MMVD [29], CKD [30], and chronic pancreatitis [31] in canine patients.
However, given the retrospective nature of this study, there were several inherent limitations, such as incomplete medical records, nonstandardized diagnostic tools, and clinician bias. Moreover, considering the statistical power, we were unable to control for breed because many breeds (53 breeds/547 dogs) were included in this study. Additionally, the findings of this study may not be equally applicable to dogs of all ages because of differences in the common age at the onset of each disease (e.g., atopic dermatitis is diagnosed predominantly in young dogs, whereas heart disease usually occurs in old dogs). To overcome these limitations, including the publication bias from the single-center retrospective study, further larger-scale multicenter epidemiological studies that control for breed, age, and sex are warranted. Furthermore, if a substantial amount of data becomes available, we intend to conduct research aimed at predicting the onset dates of diseases after the initial diagnosis. This will involve deep learning algorithms that can effectively analyze time-series data, such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and gated recurrent units (GRUs), to enhance our understanding of disease progression.

Conclusions
In this study, we used SPM to determine the temporal relationships between the onset of internal medicine diseases, visualize these relationships, and generate rules for assessing the probability of additional admission for the development of a subsequent disease that is likely to be diagnosed in canine patients.These findings can provide valuable information to enhance the quality of medical services by recommending suitable medical follow-ups and treatments for the subsequent visits, based on a better understanding of the patterns of concurrent internal medicine diseases in canine patients.

Figure 2. Visualization of the sequential patterns (association rules: comorbidity and average interval between the diagnosis of diseases in days) between hyperadrenocorticism (HAC), chronic kidney disease (CKD), food allergy, hepatitis, renal calculi, pyoderma, and canine atopic dermatitis (CAD).

Table 1. Characteristics of the study population with diagnoses of HAC, MMVD, CAD, CKD, and chronic pancreatitis.

Table 2. Risk of developing an additional disease after the diagnosis of a commonly occurring internal medical disease in 547 dogs.