A Data-Driven Approach to Defining Risk-Adjusted Coding Specificity Metrics for a Large U.S. Dementia Patient Cohort

Medical coding impacts patient care quality, payor reimbursement, and system reliability through the precision of patient information documentation. Inadequate coding specificity can have significant consequences at administrative and patient levels. Models to identify and/or enhance coding specificity practices are needed. Clinical records are not always available, complete, or homogeneous, and clinically driven metrics to assess medical practices are not logistically feasible at the population level, particularly in non-centralized healthcare delivery systems and/or for those who only have access to claims data. Data-driven approaches that incorporate all available information are needed to explore coding specificity practices. Using N = 487,775 hospitalization records of individuals diagnosed with dementia and discharged in 2022 from a large all-payor administrative claims dataset, we fitted logistic regression models using patient and facility characteristics to explain the coding specificity of principal and secondary diagnoses of dementia. A two-step approach was produced to allow for the flexible clustering of patient-level outcomes. Model outcomes were then used within a Poisson binomial model to identify facilities that over- or under-specify dementia diagnoses against healthcare industry standards across hospitalizations. The results indicate that multiple factors are significantly associated with dementia coding specificity, especially for principal diagnoses of dementia (AUC = 0.727). The practical use of this novel risk-adjusted metric is demonstrated for a sample of facilities and geospatially via a U.S. map. This study’s findings provide healthcare facilities with a benchmark for assessing coding specificity practices and developing quality enhancements to align with healthcare industry standards, ultimately contributing to better patient care and healthcare system reliability.


Introduction
The precise recording, evaluation, and documentation of patient information through medical coding play a large role in the quality of care delivered, reimbursement from payors, and the reliability of healthcare systems [1]. The process of clinical coding has multiple facets, such as accuracy, completeness, and appropriate levels for the specificity of diagnostic coding. Coding specificity, an important aspect of the coding process, refers to the level of granularity at which a clinical diagnosis is recorded [2].
A widely used medical coding system is the International Classification of Diseases Clinical Modification (or ICD-CM) [3]. ICD-10-CM (herein ICD-10) refers to the 10th revision of the taxonomy and serves to categorize diseases and health conditions at varying degrees of specificity, from coarse or unspecified more general diagnoses to more granular ones representing a deeper level of knowledge of the clinical condition. Medical coding helps standardize documentation across healthcare systems, serves as a tool for medical billing, standardizes risk factors in risk-adjusted quality measures, and helps guide healthcare policy by accurately identifying disease prevalence and supporting national and international decision making [3]. The coding system also helps organize and find medical information easily, impacting how we understand and use medical data, including assessing in real time the spread or prevalence of diseases and the optimization of resource allocations [3][4][5][6].
When considering coding practice enhancements, it is important to assess how healthcare professionals in hospitals document the presence of a specific condition or disease. Transcription errors (potentially including voice transcription for electronic health records, or EHRs), missing information in patient charts, and illegible handwriting all contribute to inadequate specificity in coding [7]. Coding to the highest appropriate level of specificity is a shared responsibility among all parties involved [8]. The move from ICD-9 to ICD-10 on 1 October 2015 led to a fivefold increase in the number of codes, exacerbating the complexity of coders' work and the potential for coding errors following this transition [3]. Clinical specialties have been affected differently, with this ICD-10 transition representing an uneven burden across facilities and patients depending on the diagnosis family and the facility's level of specialization [9,10].
A lower coding specificity is a potential burden for Medicare and other payors, which are billed for potentially under-specified diagnoses when a higher specificity (i.e., enhanced diagnosis quality) could be available [3,8]. Medicare is the health insurance coverage provided by the United States (U.S.) government for individuals 65 years and older. Medicare within the acute care inpatient setting refers to payments reimbursed through the Inpatient Prospective Payment System (IPPS). Through the IPPS, hospitalizations are grouped into medical severity diagnosis-related groups (MS-DRGs) based largely on the presence of principal and secondary ICD-10-CM diagnosis codes, and in some cases, ICD-10-PCS procedure codes. The MS-DRG grouping is associated with a weight that further adjusts the base hospital payment (determined by a set of hospital-level characteristics) to determine the final discharge-level reimbursement. There is a tradeoff between productivity and coding quality, as enhancing the coding specificity can be time intensive in some instances, requiring the coder to explore whether more specific coding is appropriate given the information provided in the patient medical record. At the facility level, the associated costs to the healthcare payor as well as the impact on patients' clinical history, treatment, and resulting health outcomes must also be considered [8]. Coding specificity is especially relevant when a diagnosis is unrelated to the principal cause of hospitalization, or when diagnoses are not made by specialists, as added specification for these codes may not affect the hospital's rate of reimbursement for the hospitalization. Thus, coding specificity is not only important at the administrative level but also has the potential to impact both healthcare facilities and patients.
At the facility level, coding that accurately captures clinical diagnoses ensures that healthcare facilities maintain effective billing operations. Coding specificity has the potential to impact a facility's financial capital and allocation of resources, since facilities, after a short grace period that ended in October 2016, can be denied Medicare-based claims based on insufficient diagnostic specificity [8,11]. ICD-10 codes are the foundation of hospital billing processes, so misdiagnoses or misclassifications of codes can impact hospital reimbursement and insurance eligibility, including Medicare reimbursements [12]. Inaccurate coding further has the potential to affect facilities' reputations and pay-for-performance incentive payments, as such hospital-ranking programs evaluate quality performance using risk-adjusted outcomes that rely on ICD-10 coding [13][14][15][16][17][18].
From the patient perspective, accurate documentation ensures that the patient is receiving an adequate treatment plan tailored to the specifics of their diagnosis and needs, both during their inpatient stay and after discharge [8]. In addition to individuals who may go undiagnosed, those with an under-specified diagnosis may also suffer worsened clinical outcomes. Among other factors, practitioners' unconscious biases as well as inappropriate facility practices could result in deviations from universal documentation and coding standards, thus potentially exacerbating health disparities [19], which may manifest through specificity gaps across subpopulations. A lack of specificity can lead to patients' clinical histories being affected, thus resulting in potential variations in care and resulting outcomes across social strata [8]. From the standpoint of coding specificity, the literature lacks a wider understanding of how decreased (or increased) levels of specificity may be associated with sociodemographic factors, thus potentially exacerbating the aforementioned healthcare disparities [19].
Not all unspecified diagnoses are inappropriate. In fact, unspecified diagnosis codes are recommended by the U.S. Centers for Medicare and Medicaid Services (CMS): "When sufficient clinical information is not known or available about a particular health condition to assign a more specific code, it is acceptable to report the appropriate unspecified code" [20]. Hence, there is a balance between the necessary level of unspecificity and the unnecessary level of unspecificity that needs to be considered, since a pure minimization of unspecified codes could also lead to incorrectly specified diagnoses. Conversely, achieving a higher level of specificity may require additional clinical tests or interventions, which may be subject to additional considerations regarding cost-effectiveness [21][22][23], especially when the primary cause of the inpatient stay is neither related to nor affected by the unspecified diagnosis. Also, higher levels of specificity may not be warranted by clinical diagnoses. Incorrect levels of both specificity and unspecificity can lead to inappropriate treatment. Thus, when a diagnosis is not confirmed, it is appropriate to provide an initial, temporary unspecified diagnosis [20] until further tests can be performed, if clinically recommended.
While our approach is generalizable and can be applied across clinical strata, our motivating example consists of a large patient cohort across the U.S. of nearly 500,000 unique inpatient individuals who were diagnosed with dementia and discharged in 2022. In 2020, dementia affected the lives of over 55 million people across the world, which is close to 1% of the global population [24]. Projections suggest that this number will experience nearly twofold growth every 20 years, surging to 78 million by 2030 and about 139 million by 2050 [24]. This number is further increased by those providing caregiving and other family members indirectly suffering from this debilitating disease. In the absence of enhanced treatments or preventive measures, adverse outcomes associated with dementia will persistently rise [25]. Many patients are likely to receive unspecified dementia diagnoses when seeing a primary care provider compared to when seeing a specialized provider (like a neurologist or geriatrician) [26,27]. Thus, dementia represents an important disease within an aging population that is likely to be of increased relevance as treatment interventions are developed, and enhanced coding specificity is needed in this area to identify resources properly [25]. In a 2017 study, researchers reviewed the medical records of dementia ICD-10 code cases, and they discovered that many of the cases lacked specific descriptions that would aid in confirming the diagnosis of specific types of dementia [28]. This study revealed that 63% of cases did not provide a specific diagnosis of dementia in the medical records, but instead considered other conditions as the likely explanation of the patient's hospitalization [28]. More generally, mental health conditions have been identified among conditions suffering from higher rates of unspecified diagnoses [8].
Models have been developed for assessing risk-adjusted coding intensity for both diagnoses and procedures, as well as identifying facilities that over- or under-code [29,30]. This area tangentially relates to coding specificity. However, the literature still lacks risk-adjusted approaches that account for factors potentially associated with coding specificity, adjusting for patient and facility characteristics, with only some initial work developed in the area of depression [31], but none, to our knowledge, in the area of dementia or other neurocognitive diseases. The aim of this study is to provide a novel risk-adjusted metric, demonstrated through a population-based dementia patient cohort in the U.S., to estimate dementia ICD-10 coding specificity by facility upon adjusting for a set of commonly available facility- and patient-level characteristics.
While enhancements in coding specificity practices are possible through other means, such as through the clinical identification of potential coding specificity inaccuracies or increased training, such approaches are not cost-effective if they need to be performed at the population level. There is a need for cost-effective approaches that serve to pre-screen and identify facilities which may need such enhancements the most. Clinical assessments may be possible if electronic health records are available, but this is not always the case. When they are not, data-driven approaches may provide insights into how coding specificity can vary across patients and facilities and whether these variations occur in ways that may depart from anticipated randomness. Our proposed data-driven metric can help facilities self-assess variation in coding specificity compared with their healthcare peers and can provide a benchmark to identify facilities that could benefit from a further analysis of diagnostic coding specificity practices.

Data and Variables
De-identified data sourced and provided by Premier, Inc.'s private database serve as the foundation of this analysis [32]. The dataset is composed of N = 487,775 observations containing information on the first inpatient hospitalization for each patient with a principal or secondary diagnosis related to dementia who was discharged in the year 2022 using the F ICD-10 diagnosis codes provided in Supplementary Tables S1 and S2. The ICD-10 codes corresponding to these diagnoses were identified by an expert team of medical coders at Premier, Inc. Patients who were admitted prior to 2022 were also included if they were discharged in 2022.
The data were further categorized into three types of variables: (1) outcome variables; (2) patient characteristics; and (3) facility characteristics. Outcome variables include coding specificity of principal diagnosis of dementia codes and coding specificity of secondary diagnosis of dementia codes. Principal diagnosis specificity denotes whether the ICD-10 dementia-related principal diagnosis code was specified (versus unspecified), and for secondary diagnoses, a specified diagnosis is assumed when at least one secondary diagnosis related to dementia was specified. In addition to masked patient IDs, patient characteristics for this study include the following: age group; sex; race; length of stay; primary payor; point of origin; discharge status; number of procedure codes; ICD-10 coding period (2022 for coding prior to 1 October 2022 and 2023 for codes from 1 October 2022); five Centers for Disease Control and Prevention's Agency for Toxic Substances and Disease Registry's (ATSDR) social vulnerability indices [33]; a COVID-19 indicator; and Medicare Severity Diagnosis Related Group (MS-DRG) type for the inpatient stay. In addition to masked facility IDs, facility characteristics include the following: three facility status variables (teaching, academic, and urban); ownership; size (bed count, grouped); case mix index (CMI); and U.S. state.
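The outcome and coding-period definitions above can be illustrated with a minimal Python sketch. The code prefixes below are placeholders chosen for illustration only (an assumption); the study's actual dementia code lists are those in Supplementary Tables S1 and S2, which are not reproduced here, and the function names are hypothetical.

```python
from datetime import date

# ASSUMPTION: illustrative placeholders, not the study's code lists
# (those are given in Supplementary Tables S1 and S2).
DEMENTIA_PREFIXES = ("F01", "F02", "F03")
UNSPECIFIED_PREFIXES = ("F03",)


def is_dementia(code):
    return code.startswith(DEMENTIA_PREFIXES)


def is_specified(code):
    return is_dementia(code) and not code.startswith(UNSPECIFIED_PREFIXES)


def principal_specificity(principal_code):
    """1 if specified, 0 if unspecified, None if not a dementia code."""
    if not is_dementia(principal_code):
        return None
    return int(is_specified(principal_code))


def secondary_specificity(secondary_codes):
    """1 if at least one dementia-related secondary code is specified."""
    dementia_codes = [c for c in secondary_codes if is_dementia(c)]
    if not dementia_codes:
        return None
    return int(any(is_specified(c) for c in dementia_codes))


def coding_period(discharge_date):
    """2023 for discharges on or after 1 October 2022, otherwise 2022."""
    return 2023 if discharge_date >= date(2022, 10, 1) else 2022
```

Under these placeholder lists, a hospitalization with secondary codes `["I10", "F03.90", "F02.80"]` would be counted as specified, because one of its two dementia-related secondary codes is specified.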

Statistical Analysis
Descriptive statistics were calculated and tabulated. Variables for which certain subgroups had limited representation (e.g., charity and indigent payors) were grouped together. Patients under 45 years old were grouped together due to their low counts. Discharge status codes indicating that the patient expired were collapsed into a single category. A diverse set of categories representing patients' points of origin with low counts were grouped into a single 'other' category.
Univariate and multivariate logistic regression analyses were utilized to identify associations between patient- and facility-level characteristics and each of the two outcomes (specificity of dementia principal and secondary diagnoses per patient hospitalization). Univariate and adjusted odds ratios (ORs), as well as corresponding 95% confidence intervals (CIs) and p-values, were computed and tabulated. Receiver operating characteristic (ROC) curves were calculated and depicted, and area under the curve (AUC) values were extracted to demonstrate the multivariate models' fitted performances to explain principal and secondary dementia diagnoses.
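To make these quantities concrete, the following is a minimal sketch rather than the software used in this study. It assumes a single binary predictor, for which the univariate logistic regression OR coincides with the cross-product ratio of the 2x2 table, and it computes the AUC as the proportion of (specified, unspecified) pairs ranked correctly by the fitted probabilities. Function and argument names are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Univariate OR with a Wald 95% CI from a 2x2 table.

    a: exposed and specified      b: exposed and unspecified
    c: unexposed and specified    d: unexposed and unspecified
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as one half. This pairwise form is quadratic in the sample
    size, so it is only meant for small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of specified from unspecified diagnoses.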
Clustering of this metric is demonstrated at the facility level, though other clustering factors are possible. Importantly, as opposed to variables used for constructing the patient-specific metric, clustering variables do not need to be observable for the full sample. A facility-specific metric of diagnostic coding specificity was also calculated from the risk-adjusted probabilities of specificity. Let $Y_{i,j}$ be the binary variable denoting coding specificity of the principal or secondary diagnosis for hospitalization $i$ at facility $j$. This variable follows a Bernoulli (Ber) distribution with estimated probability $\hat{p}_{i,j}$ as shown below:
$$Y_{i,j} \sim \mathrm{Ber}\left(\hat{p}_{i,j}\right).$$
The set $\{\hat{p}_{i,j}\}$ was estimated from the multivariate logistic regression model which was adjusted for patient and facility characteristics. Assuming that each hospitalization's coding specificity was independently, though not identically, distributed per facility, the total count of facility-specific coding specificity follows a Poisson binomial (PoiBin) distribution with probability vector $\hat{\mathbf{p}}_j = \left(\hat{p}_{1,j}, \hat{p}_{2,j}, \ldots, \hat{p}_{n_j,j}\right)$ for $n_j$ hospitalizations in facility $j$ as follows:
$$\sum_{i=1}^{n_j} Y_{i,j} \sim \mathrm{PoiBin}\left(\hat{\mathbf{p}}_j\right).$$
Facility-specific 95% CIs were extracted through the Poisson binomial facility-specific cumulative distribution functions (CDFs). These were used to identify facilities which under- (p < 0.025) and over- (p > 0.975) specified in their coding versus facilities' peers using the estimated CDF for the specificity count.
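The second step can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used in this study: the Poisson binomial probability mass function is built exactly by dynamic programming from one facility's fitted probabilities, and the facility is flagged by comparing the CDF at its observed count with the 0.025 and 0.975 thresholds described above. The function names and labels are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli(p_i) variables.

    Built by adding one hospitalization at a time; cost is quadratic in
    the number of hospitalizations at the facility.
    """
    pmf = [1.0]  # with zero hospitalizations, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this stay is coded as unspecified
            nxt[k + 1] += mass * p       # this stay is coded as specified
        pmf = nxt
    return pmf


def poisson_binomial_cdf(probs, k):
    """P(S <= k) for the facility's count S of specified diagnoses."""
    return sum(poisson_binomial_pmf(probs)[: k + 1])


def flag_facility(probs, observed, alpha=0.05):
    """Label a facility from the CDF evaluated at its observed count."""
    cdf = poisson_binomial_cdf(probs, observed)
    if cdf < alpha / 2:
        return "under-specifying"
    if cdf > 1 - alpha / 2:
        return "over-specifying"
    return "as expected"
```

For example, a facility with ten hospitalizations, each with a fitted probability of 0.5, and no specified diagnoses observed has a CDF value of about 0.001 and would be flagged as under-specifying relative to its peers.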
Error bars were constructed to demonstrate the facility-specific metric for a sample of 20 facilities for both dementia principal and secondary diagnoses. For each of these facilities (X-axis), the indicator variable for coding specificity of the dementia diagnosis was defined, and the observed count of specified diagnoses was plotted (dots). A 95% CI for each facility, built on the basis of the Poisson binomial model, was added to identify these facilities' adjusted levels of coding specificity against peers. Over- and under-coding risk-adjusted specificity practices were then identified by facility.
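The interval behind each error bar can be sketched as follows. This is a minimal illustration that assumes equal-tailed quantiles of the Poisson binomial distribution; the exact interval convention used for the figures is not specified in the text, and the function name is hypothetical.

```python
def poisson_binomial_interval(probs, level=0.95):
    """Expected count and equal-tailed interval for one facility.

    probs: fitted probabilities of specificity for the facility's stays.
    Returns (expected, lower, upper), where lower and upper are the smallest
    counts whose cumulative probability reaches each tail threshold.
    """
    # Exact PMF by dynamic programming, one hospitalization at a time.
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt

    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = None
    upper = len(pmf) - 1  # fallback if rounding keeps the sum below 1 - tail
    for k, mass in enumerate(pmf):
        cumulative += mass
        if lower is None and cumulative >= tail:
            lower = k
        if cumulative >= 1.0 - tail:
            upper = k
            break
    return sum(probs), lower, upper
```

An observed count below `lower` or above `upper` would then be read as under- or over-specification relative to peers with a comparable patient mix.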
Finally, geospatial U.S. maps were created to display adjusted ORs of principal and secondary diagnosis coding specificity by state against the reference of New York, which is the state with the highest per capita healthcare expenditure in the U.S. [34].

Results
Table 1 provides a summary of the descriptive statistics for N = 487,775 hospitalization records and patients, since each patient is only observed once due to the cohort definition (the first hospitalization for each patient within the year). The dataset comprised observations from 866 facilities, with an average of 563.25 patients per facility. The distribution of age among this dementia patient cohort is naturally skewed, with 61% of individuals being 80 years and older. Females constituted 58% of the patients, and the majority of the patients identified as White (76%). The median length of stay, which was log-transformed due to its large right skewness, was 5 days, and the most common primary payor was Medicare traditional (53%). The point of origin was predominantly non-healthcare facilities (79%), and the discharge status varied, with 19% of the patients being discharged to home or self-care, while the majority were transferred to other healthcare facilities, often skilled nursing facilities (36%). The average number of procedures during inpatient stays was 2.7, with surgical MS-DRGs representing 15% of hospitalizations. Additionally, 13% were COVID-19-positive patients. Most of the facilities were non-teaching (78%) and non-academic (85%). Urban facilities were more prevalent (86%) than rural ones. Voluntary non-profit private was the most common ownership status (64%). The bed capacity varied, with 1-50 beds (3%) and >400 beds (39%) being the least and most common facility sizes, respectively. The mean case mix index was 1.7. The dataset represented multiple states, with New York (9%) and Florida (12%) being the top states in the number of hospitalizations.
Table 2 contains the adjusted ORs, 95% CIs, and p-values for the univariate and multivariate logistic regression analyses for modeling the coding specificity of dementia-related principal diagnoses. Younger patients were generally associated with higher odds of coding specificity than patients in the oldest age group (85+). Males experienced 45% higher odds of dementia-related principal diagnosis coding specificity than females (OR = 1.454; 95% CI: 1.301-1.625). Race was generally non-significant, except for Black patients, who experienced significantly higher odds of principal diagnosis coding specificity than White patients (OR = 1.237; 95% CI: 1.058-1.446). The log-length of stay was significant, with longer stays associated with higher odds of coding specificity (OR = 1.124; 95% CI: 1.060-1.191), but primary payor and point of origin were generally not significant. Patients with certain discharge statuses experienced significantly higher odds of coding specificity than those discharged to home or self-care, namely patients discharged to hospice homes, hospice medical facilities, or psychiatric hospitals (OR ≥ 1.354). The number of procedures was also significant, with each additional procedure performed associated with 23% increased odds of coding specificity (OR = 1.230; 95% CI: 1.179-1.283). The CMS fiscal year was highly significant, indicating 73.6% lower odds of specificity for 2023 (discharges occurring between 1 October and 31 December 2022) compared to 2022 (discharges between 1 January and 30 September 2022) (OR = 0.264; 95% CI: 0.224-0.311). Social vulnerability indices were not significant at the multivariate level, though some were significant univariately, indicating that some of the information content may be present in other patient characteristics. COVID-19 status and MS-DRG type were not statistically significant, except for at the univariate level, at which the latter showed surgical MS-DRGs associated with increased odds of specificity. At the facility level, patients in facilities whose teaching status was not available experienced 65.1% lower odds of specificity than those in non-teaching facilities (OR = 0.349; 95% CI: 0.142-0.856). Neither academic nor rural/urban status showed significant variability at the multivariate level. Most ownership categories were not significantly different from the voluntary non-profit private reference, except for other non-profit voluntary (OR = 0.605; 95% CI: 0.442-0.827) and local government (OR = 2.104; 95% CI: 1.401-3.159). Patients from facilities with bed counts lower than the reference category (>400) experienced lower odds of coding specificity, though only three categories were statistically significant. The case mix index was significant, with each unit increase accompanied by 57.9% increased odds of dementia-related principal diagnosis coding specificity (OR = 1.579; 95% CI: 1.188-2.100). Finally, most states demonstrated no statistically significant differences in principal diagnosis coding specificity compared to New York, with the exception of Hawaii, Louisiana, Minnesota, Oregon, Pennsylvania, and Virginia (which had higher odds) as well as Illinois and Tennessee (which had lower odds of coding specificity).
Table 3 reports the univariate and multivariate logistic regression results (ORs, 95% CIs, and p-values) for the specificity of secondary dementia diagnoses' outcome. For the multivariate results, all age groups experienced higher odds of specificity of dementia secondary diagnoses than the reference group of ages 85+ (OR ≥ 1.316; p < 0.001). Male patients had significantly higher odds of dementia secondary diagnosis specificity compared to females (OR = 1.224, 95% CI: 1.209-1.239; p < 0.001). Individuals identifying as Black were associated with lower odds of dementia secondary diagnosis specificity (OR = 0.955; 95% CI: 0.937-0.973) compared to White patients, while the opposite was found for those identifying as other races (OR = 1.069; 95% CI: 1.040-1.099). For some categories, primary payor, patient origin, and discharge status also showed significant associations with dementia secondary diagnosis coding specificity (see Table 3). Length of stay (in log terms) was also associated with higher odds of dementia secondary diagnosis specificity (OR = 1.017; 95% CI: 1.008-1.025). Those undergoing a larger number of procedures experienced higher odds of dementia secondary diagnosis specificity (OR = 1.039; 95% CI: 1.036-1.042). The CMS fiscal year was not substantially different, with those who were hospitalized in the new 2023 fiscal year experiencing 1.4% higher odds of specificity (OR = 1.014; 95% CI: 1.000-1.028). Patient socioeconomic (OR = 0.829; 95% CI: 0.717-0.958) and racial/ethnic minority (OR = 1.09; 95% CI: 1.03-1.154) statuses within the social vulnerability indices were significantly associated with decreased and increased, respectively, odds of dementia secondary diagnosis specificity. COVID-19-positive patients were associated with lower odds of dementia secondary diagnosis specificity (OR = 0.948; 95% CI: 0.930-0.965). Patients undergoing a surgical MS-DRG experienced 14% lower odds of dementia secondary diagnosis specificity compared to those undergoing a medical MS-DRG (OR = 0.859; 95% CI: 0.844-0.875). Academic facilities demonstrated higher odds of dementia secondary diagnosis specificity (OR = 1.052; 95% CI: 1.020-1.085), whereas those in rural settings experienced lower odds of dementia secondary diagnosis specificity (OR = 0.976; 95% CI: 0.955-0.997). Patients at facilities of different ownership types also experienced differing odds of dementia secondary diagnosis specificity (see Table 3). Lower bed counts were generally associated with lower odds of dementia secondary diagnosis specificity (OR ≤ 0.954) than those in the largest cluster of hospitals (>400 beds), with the exception of facilities with 51-100 beds and those with 351-400 beds. Substantial differences in the odds of dementia secondary diagnosis specificity were found by state when compared to the reference state of New York.
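The percentage statements in this section follow directly from the reported ORs: an OR above 1 is read as (OR − 1) × 100% higher odds, and an OR below 1 as (1 − OR) × 100% lower odds. A small helper (hypothetical name) makes the conversion explicit:

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an OR; negative means lower odds."""
    return (odds_ratio - 1.0) * 100.0


# ORs reported above for principal diagnosis specificity
for label, or_value in [("male vs. female", 1.454),
                        ("CMS fiscal year 2023 vs. 2022", 0.264),
                        ("per unit of case mix index", 1.579)]:
    print(f"{label}: {percent_change_in_odds(or_value):+.1f}% odds")
```

These are changes in odds, not in probabilities; the implied change in the probability of a specified diagnosis depends on the baseline probability.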

Discussion
The literature on diagnostic coding specificity remains scarce, with healthcare facilities and practitioners limited in their ability to self-evaluate against healthcare industry standards of practice. It is also unclear whether non-clinical characteristics can explain variability in specificity practices. To address this gap, a novel approach was demonstrated to evaluate facility-specific practices for the dementia-related coding specificity of principal and secondary diagnoses upon making risk adjustments for commonly available patient and facility characteristics. A logistic regression was applied to make risk adjustments to the probability of receiving a specified dementia diagnosis. The statistical output is used in a two-step approach, building on a Poisson binomial model, to evaluate the performance of healthcare facilities in providing specified dementia-related principal, or at least one secondary, diagnoses. This metric can be used to identify facilities that perform differently (under- or over-specifying) compared to their healthcare industry peers and can provide an objective standard against which the coding specificity practices of facilities can be evaluated. These findings offer valuable insights for healthcare stakeholders and quality-control personnel, facilitating the identification of facilities that may benefit from targeted interventions to enhance the levels of specificity of dementia-related diagnosis coding.
Our results indicate that the coding specificity of dementia diagnoses is associated with a range of patient and facility characteristics, particularly for principal diagnoses, as demonstrated through a higher AUC value. Younger patients were generally associated with higher odds of coding specificity for dementia-related principal and secondary diagnoses. While dementia has been found to be more easily identifiable among older patients [35], our findings indicate that, conditional on a dementia diagnosis, the odds of coding specificity are higher among younger patients. However, it is unclear whether there is a clinical association between the prevalence of specified cases of dementia and age, particularly when comparing age groups with those at least 85 years old.
Prior studies have found that the prevalence of types of dementia differs by sex [36], which could also be due to environmental and behavioral differences according to sex. Males had approximately 22% (secondary) and 45% (principal) higher odds of dementia diagnosis specificity compared to females, though this could be confounded with age. Black patients demonstrated significantly higher odds of principal diagnosis coding specificity than White patients. However, the reverse is observed for secondary diagnosis specificity. In both cases, there could be confounders due to collinearity with other factors, including social vulnerability indices. Patients have been shown to experience differences in the prevalence of dementia and its associated symptoms and severity by race [37], which could potentially have an association with the ability of doctors to provide a specified dementia diagnosis.
The significant association between longer hospital stays and higher odds of both principal and secondary coding specificity could be due to the additional inpatient time which allows for more comprehensive evaluations, diagnoses, and documentation. Patients discharged to specific destinations, such as hospice homes, hospice medical facilities, or psychiatric hospitals, exhibited significantly higher odds of principal and secondary diagnosis specificity. This could be related to the severity of their case or their prior history, which could, in turn, be associated with a potentially more accurate clinical diagnosis. Patients undergoing more procedures had higher odds of receiving a specified principal or secondary diagnosis. Though the cause of this association is unclear, this could be related to there being more resources allocated for identifying a patient's disease when procedures are necessary during their inpatient stay. While a COVID-19 diagnosis was not associated with differing odds of principal diagnosis specificity, it was associated with lower odds of secondary diagnosis specificity. However, it is unclear whether the association between the severity of patients' COVID-19 symptoms and age could be a confounder [38].
While the differences by CMS fiscal year in secondary diagnosis specificity were minor and are probably clinically irrelevant, the differences were more substantial among those with a dementia principal diagnosis. However, this could be due to seasonal confounders. The new fiscal year, denoted as 2023, was only measured in the October-December 2022 period, which may also be a period with seasonally over-burdened hospitals and less time for healthcare personnel to perform more in-depth diagnoses of patients.
From a payor perspective, none of the payor types were associated with differing odds of principal diagnosis specificity when compared to traditional Medicare. This is encouraging, as it indicates that principal diagnosis specificity may not be attributable to healthcare payor type. However, the substantial differences in the univariate results indicate that some complex associations may be embedded, though this is unclear, since the patient mix would not be homogeneous across payor types. For example, age could be acting as a proxy for Medicare status. Also, some differences were found when assessing the odds of secondary diagnosis specificity. Some of these differences could be due to other patient characteristics. For example, those covered by traditional Medicare may be in widely different age groups than those for whom the payor is a direct employer contract or who receive workers' compensation. Thus, health insurance coverage may be substantially different across patients, leading to different propensities of patients to seek hospitalization [39].
Additionally, the ownership status of the facilities displayed some significant differences, with local government-owned facilities showing notably higher odds of principal and secondary diagnosis specificity. Again, the non-clinical patient characteristics by facility and facility ownership could differ widely. The case mix index of the hospital was significantly and positively associated with the specificity of the principal diagnosis, indicating that the overall complexity of patients' needs in a facility is related to higher degrees of specificity provided during a hospitalization. However, no significant association between specificity and the facility case mix index was found when the dementia diagnosis was secondary during the inpatient stay. Substantial differences were also found by state, particularly for secondary diagnoses. These differences could stem from the population mix or could be related to the substantially larger sample size for this analysis. Differences in health care provision by state across multiple metrics, such as care setting and type of disease/clinical area, have been documented [40]. However, we cannot link coding specificity with the quality of care directly, since a low quality of care can occur when there are low levels of specificity state-wide but also when there are high levels of specificity and such an excessive level of specificity is not clinically warranted.
These variations in coding practices demonstrate the potential influence of organizational characteristics or state-wide standards of practice on coding specificity. State-level variations may be attributed to regional variations in healthcare infrastructure, regulatory frameworks, insurance-related expectations/requirements, or coding practices. Also, there is state clustering of hospitals within a common health system, which may share a coding department and/or coding standards. However, these variations could also be influenced by the patient mix and other correlated factors in these states, given the socioeconomic, racial, and age differences across states, which may reflect the underlying reasons for non-idiosyncratic specificity disparities [41].
Providing high levels of coding specificity, when possible and appropriate, supports the accuracy and completeness of health records for patients, potentially enhancing their subsequent health outcomes. However, high coding standards require both time and educational/training resources for coders to conduct efficient and consistent coding practices that are current and accurate. Unspecified diagnoses may sometimes be a consequence of insufficient knowledge about all possible ICD-10 codes available for a condition. Over-specified diagnoses may be a consequence of miscoding. Therefore, there is a tradeoff between the cost of specificity-related accuracy (oftentimes paid by the provider) and the cost of specificity-related inaccuracy (oftentimes a burden for the payor and the patient). Our approach demonstrates that facilities with dementia-related hospitalizations can be compared against a common/industry standard in a risk-adjusted form, so that facilities over- or under-specifying can be identified and their coding standards of practice can be adjusted, when needed.
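The facility-level comparison described above can be sketched in a few lines of Python. The sketch assumes that each hospitalization in a facility already has a model-based probability of a specified diagnosis, treats hospitalizations as independent, and takes the p-value to be the lower-tail probability P(X ≤ observed) under the Poisson binomial distribution, with the 0.025 and 0.975 cutoffs used for the figures. The function names and the example probabilities are illustrative assumptions, not the study's implementation or data.

```python
from typing import List, Tuple


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """PMF of the number of successes among independent Bernoulli trials
    with possibly different success probabilities (dynamic programming)."""
    pmf = [1.0]  # with zero trials, zero successes has probability 1
    for p in probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)   # this trial is not a success
            updated[k + 1] += mass * p       # this trial is a success
        pmf = updated
    return pmf


def flag_facility(probs: List[float], observed: int,
                  alpha: float = 0.05) -> Tuple[float, str]:
    """Return the lower-tail p-value P(X <= observed) and a flag."""
    if not 0 <= observed <= len(probs):
        raise ValueError("observed must lie between 0 and the number of stays")
    p_value = sum(poisson_binomial_pmf(probs)[: observed + 1])
    if p_value < alpha / 2:
        return p_value, "under-specificity"
    if p_value > 1 - alpha / 2:
        return p_value, "over-specificity"
    return p_value, "in line with peers"


# Hypothetical facility: 10 stays with assumed model probabilities, 2 specified.
example_probs = [0.9] * 10
print(flag_facility(example_probs, observed=2))
```

Because the distribution is discrete, the lower-tail convention used here is only one of several reasonable choices; a different tail definition would shift facilities that sit close to a cutoff.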
While the proposed approach is demonstrated with an example of clustering at the facility level, for which full information is available for all patients, clustering by other factors is also possible. For example, clustering by zip code can allow for geospatial analyses of coding specificity. Also, clustering factors do not need to be available for all observations, allowing for more flexible analyses. For example, some hospitals may collect information about patients or systems that other hospitals do not collect. Clustering analyses are possible in such instances, which is one of the core advantages of the two-step approach of performing patient-level analyses and subsequently clustering by any desired factor.
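The second step of this two-step approach can be illustrated with a minimal Python sketch. It assumes that step one has already produced a predicted probability of a specified diagnosis for each hospitalization, and it then aggregates observed and expected counts by whichever clustering factor is requested, skipping records where that factor is missing. The field names (`facility`, `zip_code`, `p_hat`, `specified`) and all values are hypothetical placeholders, not fields or results from the study dataset.

```python
from collections import defaultdict
from typing import Any, Dict, List


def aggregate_by(records: List[Dict[str, Any]], key: str) -> Dict[Any, Dict[str, float]]:
    """Aggregate patient-level outcomes by an arbitrary clustering factor.

    Records whose clustering factor is missing (None or absent) are skipped,
    so the factor does not need to be available for every hospitalization.
    """
    groups: Dict[Any, Dict[str, float]] = defaultdict(
        lambda: {"n": 0, "observed": 0, "expected": 0.0, "variance": 0.0}
    )
    for record in records:
        level = record.get(key)
        if level is None:
            continue
        p = record["p_hat"]
        group = groups[level]
        group["n"] += 1
        group["observed"] += record["specified"]
        group["expected"] += p                # Poisson binomial mean: sum of p
        group["variance"] += p * (1.0 - p)    # Poisson binomial variance: sum of p(1 - p)
    return dict(groups)


# Hypothetical patient-level output of step one (values are made up).
records = [
    {"facility": "A", "zip_code": "10001", "p_hat": 0.8, "specified": 1},
    {"facility": "A", "zip_code": None, "p_hat": 0.6, "specified": 0},
    {"facility": "B", "zip_code": "10001", "p_hat": 0.5, "specified": 1},
]

print(aggregate_by(records, "facility"))
print(aggregate_by(records, "zip_code"))
```

The same patient-level probabilities feed both aggregations, so no model needs to be refitted when the clustering factor changes.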
Our findings emphasize the association of multiple patient and facility characteristics with coding specificity. The relative significance of the evaluated variables in explaining the variability in coding specificity further underscores the importance of risk-adjusted performance metrics when comparing healthcare outcomes and facility performances.

Strengths and Limitations
A large comprehensive dataset with nearly 488,000 patient observations related to dementia was used for this study, which represents, to our knowledge, the largest dementia-related study approaching the topic of diagnostic coding specificity. Developing and utilizing the proposed risk-adjusted metric allows for a fair assessment of coding specificity among healthcare facilities while producing an extrapolatable approach that allows for the incorporation of any available information about patient hospitalizations.
Though the dataset contains the most recently completed year (2022), it only encompasses a single year of discharges, yielding temporal limitations since coding policies and practices can be updated yearly. However, because of these potential dynamics, it is important to have a recent dataset that reflects current practices. The demonstrated method can also be applied on a rolling basis, so that facilities can assess their practices over time and evaluate any adjustments made along the way.
While the dataset comprises a large portion of U.S. hospitalizations, there could be data imbalances by state or other factors not considered in the study. This may affect our ability to measure associations with some variables with low counts, such as some of the states. However, this would not affect our results as long as the data imbalances are not directly related to the coding specificity. Also, the cohort definition includes only a subset of dementia-related codes (F ICD-10 diagnosis codes). A more expansive cohort definition is possible, but it would not affect the approach taken, since the cohort definition is common across facilities.
We utilize administrative claims data to explain a substantial portion of the coding specificity variability in healthcare facilities. While this is insufficient to explain the full variability of diagnostic coding specificity, it is noteworthy that this explanatory power was achieved with minimal access to patients' clinical characteristics, such as those provided in EHRs, many of which are not commonly available in claims data. This indicates that the model provides a baseline from which substantial improvements are possible if additional information is available, such as the granularity and clinical details found in EHRs. However, by making EHRs an optional input, our model gains generalizability, since there is no need for a clinical metric against which to measure the 'correctness' of the degree of coding specificity. Thus, while such a clinical metric would be ideal, it is also infeasible. Therefore, our approach should only be used as a metric to compare against industry standards and averages or against aspirational peer facilities.
Our approach assumes that patients are provided homogeneous treatments within facilities conditional on the set of variables used in the multivariate logistic regression. However, this assumption could be relaxed by introducing additional clustering factors/variables, such as the physicians within facilities, which may explain additional sources of coding specificity variability. The assumption of independence across hospitalizations could also be questionable, since there will be a substantial number of unmeasured factors that could contribute to a lack of independence (e.g., how busy the facilities were during the hospitalizations, who provided treatment, what the commonalities of the unmeasured clinical components across patients were, etc.). However, the model provides an initial metric to flag facilities with the potential for non-standard specificity practices, which can then be investigated more thoroughly by quality-control personnel.
The inclusion of random effects in the model was first considered across a range of facility-level characteristics, particularly the facility identifier. However, for the purpose of this study, we did not include random effects for the following reasons: (1) Computational complexity: for example, facility-specific random effects added hundreds of random effects in this particular dataset and potentially thousands or tens of thousands for other cohorts, leading to memory limitations. The proposed approach still required nearly 8 GB of RAM. Additionally, if even larger computational resources are needed, then the ability of quality-control personnel to use this approach could be substantially limited. (2) The reduced level of extrapolatability to even more complex or larger datasets, as administrative data may contain few observations or just one observation per facility, particularly if the tool is used for 'live' monitoring purposes. (3) The assumptions behind a random effects approach would be highly questionable, since the random effects would likely be correlated with some of the patient-level characteristics. (4) While random effects and other modeling enhancements (e.g., different machine learning approaches or semiparametric models with spline components for some of the continuous variables, such as the log-length of stay) could have been considered for variables with lower numbers of categories, the purpose of our approach is not to find the optimal model for a particular cohort/year or set of variables. Instead, this manuscript aims to demonstrate the methodology and utility of administrative information in explaining diagnostic coding specificity variability among patients diagnosed with dementia. The purpose of the two-step approach (first at the patient visit level and then aggregated at any level) is also to provide tools that can be used in different forms, both in a disaggregated form for patient visit monitoring and in an aggregated form for facility monitoring.
The presence of multicollinearity among risk adjustment factors can complicate coefficient interpretation. Alternative approaches that map the information content to smaller sets of uncorrelated factors may be viable to reduce variance inflation, though they would be highly complex to construct due to the mostly categorical structure of the explanatory variable set. Such alternatives could also reduce interpretability. Multicollinearity, however, does not impact the main outcome of this manuscript, which is the estimation of a probability metric for coding specificity at the hospitalization level and a subsequent facility-level aggregation to measure facilities against healthcare industry standards. Neither the goodness of fit nor the use of the model for prediction is affected by collinearity, which allows for wide arrays of explanatory variables to be combined, regardless of potential information overlap in these variables. Thus, the focus of this manuscript is the metrics at the hospitalization and facility levels and their utility in identifying hospitalizations and facilities whose outcomes may substantially differ from industry practices, rather than the specific associations between explanatory variables and outcomes.
Finally, some quantitative data were provided in grouped categories for confidentiality purposes (e.g., age and bed size), and additional variables were not included to maintain the confidentiality of the records. This additional granularity and information could enhance model outcomes within healthcare facility settings.

Conclusions
Medical coding is a very important component of healthcare systems, with an extensive impact on patient care quality, reimbursement, and system reliability. An understudied aspect of coding accuracy relates to coding specificity to the highest precision clinically possible. Our study focused on dementia coding specificity in the U.S. and demonstrates that a large number of readily available patient- and facility-level characteristics can be used to make risk adjustments to the odds of coding specificity and thus provide a standardized metric against which facilities can compare their coding specificity practices and standards. This study provides healthcare facilities with a valuable tool to enhance and assess variations in coding specificity, thus contributing to improved healthcare system

Figure 1 panel (a) shows the ROC curve for the multivariate model of the coding specificity of a principal diagnosis related to dementia. The estimated AUC was 0.7269, indicating good reliability of the multivariate model in assessing the coding specificity of dementia-related principal diagnoses. Panel (b) shows the ROC curve corresponding to the multivariate logistic regression analysis for assessing the coding specificity of secondary dementia diagnoses. The corresponding AUC was 0.5919, demonstrating worse model performance than that of the model assessing the coding specificity of principal dementia diagnoses.
Healthcare 2024, 12, x FOR PEER REVIEW 18 of 25

Figure 2 represents a subset of the facilities' observed dementia-related principal diagnosis coding specificity (a) and secondary diagnosis coding specificity (b) relative to industry standards. The p-values (and the 95% CIs, which are represented as error bars) from the estimated Poisson binomial distribution are used so that under-specificity versus peers (p < 0.025) is represented in blue; specificity in line with peers (0.025 ≤ p ≤ 0.975) is represented in black; and over-specificity versus peers (p > 0.975) is represented in orange.

Figure 1. Receiver operating characteristic (ROC) curve of the multivariate logistic regression model for the specificity of a dementia-related principal diagnosis (a) and secondary diagnosis (b).

Figure 2. Observed counts of indicators of principal diagnosis coding specificity (a) and secondary diagnosis coding specificity (b) for dementia diagnoses by facility (dots) and 95% confidence intervals based on the Poisson binomial metric (error bars), with colors denoting over-specificity (orange), under-specificity (blue), and specificity in line with peers (black).

Figure 3 represents the adjusted ORs for the coding specificity of a dementia-related principal diagnosis (a) and secondary diagnosis (b). All of the adjusted ORs are represented against New York as the reference state. Only a few states demonstrate statistically different adjusted odds of coding specificity of a dementia-related principal diagnosis versus New York, while a larger amount of variability is observed for states' secondary diagnosis coding specificity. The gray states had non-significant adjusted ORs of coding specificity.

Figure 3. Geographical U.S. map of adjusted odds ratios (ORs) of coding specificity of dementia-related principal (a) and secondary (b) diagnoses by state, with a reference state of New York. Odds ratios that were not statistically significant are shown in gray.

Table 1. Descriptive statistics of the dementia-related principal and secondary diagnosis coding specificity outcomes as well as patient and facility characteristics (counts and means/proportions and corresponding percentages/standard deviations).

Table 2. Univariate and multivariate logistic regression results including odds ratios (ORs), corresponding 95% confidence intervals (CIs), and p-values for specificity of a dementia-related principal diagnosis.

Table 3. Univariate and multivariate logistic regression results including odds ratios (ORs), corresponding 95% confidence intervals (CIs), and p-values for specificity of a dementia-related secondary diagnosis.