Multimorbidity from Diabetes, Heart Failure, and Related Conditions: Assessing a Panel of Depressive Symptoms as Both Formative and Reflective Indicators of a Latent Trait

Abstract: Through exploring specific conditions (diabetes, heart failure, related vascular/metabolic diagnoses) and their multimorbidities, I develop a more thorough means to adjust for confounders of clinical targets within main or interactive contexts in epidemiological panel studies. Regression-based multiple indicators-multiple causes (MIMIC) models combine multiple or moderated regression and confirmatory factor analysis. In a novel specification, each of twenty depressive symptoms is both a "formative" (causal) indicator and a "reflective" (effect) indicator of a latent trait (Depression). Although both indicators of a symptom carry identical information (under different variable names), formative indicators provide "exogenous" information (from outside the model) to estimate, within groups or subgroups, "endogenous" effects (recovered by the model) from the latent trait and its reflective indicators. Within the multiple regressions, formative indicators serve as comprehensive proxies for unspecified confounders by completely mediating all unspecified confounder effects on the endogenous latent trait and its reflective indicators, the latter estimated through confirmatory factor analysis. Findings of symptom clusters of Depression in these specific conditions, and in subgroups that capture their synergies, corroborate parallel MIMIC models with instrumental variables that specify several known confounders, but suggest some confounding biases remain. All multimorbidities involve synergy from co-occurring diabetes and heart failure. There may be opportunities to target screening and optimize metformin treatment for these co-occurring conditions. This strategy avoids the need to specify all confounders, which may be neither possible nor verifiable.


Background
Analysts often use principal components analysis (PCA) or confirmatory factor analysis (CFA) to investigate panels of metabolic, biomarker, or symptom data in metabolomic and epidemiologic studies [1]. For instance, PCA of a metabolic panel determined specific metabolites that distinguished lean from obese subgroups with insulin resistance [2]. CFA assesses the effect of the disease group or subgroup on each of the measurement items while simultaneously controlling for the influence of the latent factor on each measurement item. This allows CFA to adjust for measurement error and unreliability. In contrast, PCA measurement loadings may be inflated due to the lack of similar adjustments. Indeed, evidence from Monte Carlo analyses [3,4] suggests measurement loadings for the same symptoms tend to be higher (inflated) in PCA compared to CFA (see Appendix A, note 1).
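To make the inflation argument concrete, the following Python sketch compares first-component PCA loadings with iterated principal-axis factor loadings on a population correlation matrix generated by one common factor. It is an illustration under stated assumptions, not the Monte Carlo design of [3,4]: the six equal loadings of 0.6 are hypothetical, and principal-axis factoring is swapped in as a simple stand-in for the maximum likelihood CFA estimators used in practice. Because PCA analyses total variance (a unit diagonal) while factoring analyses only shared variance (communalities on the diagonal), the PCA loadings exceed the true values.

```python
import numpy as np

def one_factor_corr(loadings):
    """Population correlation matrix implied by a single common factor."""
    lam = np.asarray(loadings, dtype=float)
    R = np.outer(lam, lam)
    np.fill_diagonal(R, 1.0)  # unit variances: communality + uniqueness = 1
    return R

def first_loading(matrix):
    """Loadings on the largest eigenvector, scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(matrix)   # eigenvalues in ascending order
    v = vecs[:, -1] * np.sqrt(vals[-1])
    return v if v.sum() >= 0 else -v      # fix the arbitrary sign

def pca_loadings(R):
    """PCA analyses total variance, so the unit diagonal is kept."""
    return first_loading(R)

def principal_axis_loadings(R, iters=500):
    """Iterated principal-axis factoring: analyse shared variance only."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # start from squared multiple correlations
    for _ in range(iters):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)           # communalities replace the unit diagonal
        lam = first_loading(reduced)
        h2 = lam ** 2                           # update communalities
    return lam

true_lam = np.full(6, 0.6)                      # hypothetical loadings
R = one_factor_corr(true_lam)
print(np.round(pca_loadings(R), 3))             # inflated above 0.6
print(np.round(principal_axis_loadings(R), 3))  # recovers 0.6
```

With p items that all load λ on one factor, the first PCA loading equals sqrt((1 + (p − 1)λ²)/p), which is sqrt(2.8/6), or about 0.683, here, whereas principal-axis factoring converges to the generating value of 0.6.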
CFA also estimates the "multiple indicators" that constitute the measurement model portion of a structural equations latent trait model known as the multiple indicators-multiple causes (MIMIC) model. Each participant has a true score reflected by their innate position on a latent trait (e.g., Depression), which generates or precipitates manifest or observed measurement indicators (e.g., different depressive symptoms). The other portion of a MIMIC model, the "structural model", predicts components of the measurement model. When the latent trait is controlled, only variation unique to each observed indicator remains. The observed indicators therefore exhibit "local independence": they are conditionally independent of each other given the latent trait, because the latent trait accounts for the shared variation across them. The addition of the structural model to CFA permits more valid modeling than CFA alone when symptoms, biomarkers, or metabolites may not all stem from a single biological pathway and confounding influences are more likely.
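Local independence can be illustrated by simulation. The sketch below assumes a single standard-normal latent trait, four hypothetical standardized indicators with loadings from 0.8 to 0.5, and normal unique errors; because the true latent trait is known in a simulation, its contribution can be removed directly. The items correlate marginally through the latent trait, but their residuals after removing it are essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 50_000
lam = np.array([0.8, 0.7, 0.6, 0.5])           # hypothetical standardized loadings

eta = rng.standard_normal(n)                    # latent trait (e.g., Depression)
unique = rng.standard_normal((n, lam.size)) * np.sqrt(1 - lam**2)  # item-specific errors
y = eta[:, None] * lam + unique                 # observed indicators

off_diag = ~np.eye(lam.size, dtype=bool)
marginal = np.corrcoef(y, rowvar=False)[off_diag]                       # shared variation via eta
residual = np.corrcoef(y - eta[:, None] * lam, rowvar=False)[off_diag]  # eta partialled out
print(marginal.min().round(2), np.abs(residual).max().round(3))
```

In the population, the marginal correlation between items j and k is λ_j λ_k (at least 0.30 here), while the residual correlations are zero by construction, so any nonzero value above is sampling noise.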
MIMIC model estimation may be based on data in the form of a matrix of covariances (or correlations with means that can be converted to covariances), in which all variables are endogenous (estimated by the model). If data are in the form of individual observations, regression-based estimation is a more powerful option, in which only some of the variables are endogenous and the remaining ones are exogenous (they provide information from outside the model to assist in estimation) [5][6][7][8][9]. In contrast to the covariance-based approach, in which all variables are considered jointly as dependent (y) variables, the regression-based approach treats the exogenous predictors as independent (x) variables on which the estimated effects for the endogenous dependent (y) variables and the latent trait are conditioned (i.e., the estimated effects on y are conditional on the set of x).
Let us assume a regression-based MIMIC model in which some or all of the observed indicators of the endogenous measurement model ($y_i$) are categorical (binary or ordinal). The relationships of the measurement and structural model portions are, respectively:

$$y^*_i = \nu + \Lambda \eta_i + K x_i + \varepsilon_i, \qquad \eta_i = \alpha + B \eta_i + \Gamma x_i + \zeta_i, \qquad (1)$$

where $\nu$ = vector of measurement intercepts; $\Lambda$ = matrix of factor loadings; $\eta_i$ = vector of latent traits (or latent constructs or latent factors); $K$ = matrix of regression slopes of the latent response variables on the independent variables (exogenous); $x_i$ = vector of independent variables (exogenous); $\varepsilon_i$ = vector of measurement errors in the measurement model, uncorrelated with other variables; $\alpha$ = vector of intercepts in the structural model; $B$ = matrix of regression slopes of the latent trait on other latent traits; $\Gamma$ = matrix of regression slopes of the latent trait on the independent variables (exogenous); $\zeta_i$ = vector of residuals in the structural model, uncorrelated with other variables.
The vector of latent traits ($\eta_i$) appears on the left side of the structural model equation in Equation (1) because each latent trait may be predicted by other endogenous latent traits and by the vector of exogenous independent variables ($x_i$). It also appears on the right side, in the term $B\eta_i$, in order to represent endogenous relationships specified among latent traits, constructs, or factors. The current study includes only one latent trait and thus omits this right-side term ($B = 0$).
The estimation approach developed and demonstrated by Muthén [5][6][7][8][9] and available in the Mplus software program [9] applies ordinal probit regression when $y_i$ is a categorical variable, fitting the correlation structure of the model to sample correlations. A derived diagonal matrix of scale factors ($\Delta$) yields the scaled continuous latent response variable ($y^*_{si}$) behind the observed indicator (see Appendix B in [9] for further details). There may be as few as two observed indicators. When an observed indicator is continuous, it is set equal to its continuous latent response variable ($y^*_i$), and the corresponding diagonal element of $\Delta$ is set to one. Thus, $y^*_{si} = \Delta y^*_i$.
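The ordinal probit link behind a categorical indicator can be sketched in a few lines. This is a minimal illustration, assuming a latent response that is normal given x with a known conditional mean and standard deviation; the thresholds and function names are hypothetical, and in an actual Mplus analysis the thresholds and scaling are estimated rather than supplied. Each observed category's probability is the normal mass of the latent response between adjacent thresholds.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF; math.erf returns +/-1 at +/-infinity."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(mean: float, sd: float, thresholds: list[float]) -> list[float]:
    """P(y = c) for an ordinal item whose latent response y* ~ N(mean, sd^2)
    is cut at increasing thresholds tau_1 < ... < tau_{C-1}."""
    cuts = [-math.inf, *thresholds, math.inf]
    return [norm_cdf((hi - mean) / sd) - norm_cdf((lo - mean) / sd)
            for lo, hi in zip(cuts[:-1], cuts[1:])]

# Hypothetical thresholds for an item scored 0-3 (e.g., one depressive symptom).
taus = [0.0, 1.0, 2.0]
print([round(p, 4) for p in category_probs(0.0, 1.0, taus)])
# Raising the conditional mean of y* shifts probability toward higher categories.
print([round(p, 4) for p in category_probs(0.5, 1.0, taus)])
```

Here `sd` is just an argument; in the model it is the role of the scaling in Δ to fix the metric of the latent response variable.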
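By assuming conditional normality for $y^*_{si}$ given $x_i$, the regression-based MIMIC model yields estimates of the conditional expectation and conditional variation:

$$E(y^*_{si} \mid x_i) = \Delta\left[\nu + \Lambda (I - B)^{-1}\alpha + \left(\Lambda (I - B)^{-1}\Gamma + K\right) x_i\right],$$

$$V(y^*_{si} \mid x_i) = \Delta\left[\Lambda (I - B)^{-1}\Psi \left((I - B)^{-1}\right)^{\top}\Lambda^{\top} + \Theta\right]\Delta,$$

where $\Psi$ = covariance matrix of $\zeta_i$; $\Theta$ = covariance matrix of $\varepsilon_i$; and $(I - B)$ is non-singular. In contrast to covariance-based approaches, the assumption of conditional normality allows the continuous latent response variables ($y^*_{si}$) behind categorical $y_i$ variables to be non-normal as a function of non-normality in the exogenous ($x_i$) variables (see Appendix A, note 2).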
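The conditional moments translate directly into matrix code. The following sketch, with a hypothetical function name and argument names, computes the model-implied conditional mean and covariance for given parameter matrices; it does not estimate them (that is Mplus's job), and it treats Δ as a known diagonal matrix. The usage example mirrors the current study's single latent trait (so B = 0), with three hypothetical indicators, two hypothetical binary predictors, and one direct effect in K.

```python
import numpy as np

def mimic_conditional_moments(x, nu, Lam, K, alpha, B, Gamma, Psi, Theta, Delta):
    """Model-implied E(y*_s | x) and V(y*_s | x) for a regression-based MIMIC model."""
    m = B.shape[0]
    A = np.linalg.inv(np.eye(m) - B)   # (I - B)^{-1}; requires I - B non-singular
    mean = Delta @ (nu + Lam @ A @ alpha + (Lam @ A @ Gamma + K) @ x)
    cov = Delta @ (Lam @ A @ Psi @ A.T @ Lam.T + Theta) @ Delta
    return mean, cov

# Hypothetical parameters: one latent trait, three indicators, two predictors.
nu = np.zeros(3)
Lam = np.array([[1.0], [0.8], [0.6]])
K = np.zeros((3, 2))
K[2, 1] = 0.3                          # direct effect of predictor 2 on indicator 3
alpha = np.array([0.0])
B = np.zeros((1, 1))                   # single latent trait, so B = 0
Gamma = np.array([[0.5, 0.4]])
Psi = np.array([[1.0]])
Theta = np.diag([0.5, 0.5, 0.5])
Delta = np.eye(3)                      # continuous-style scaling for simplicity

mean, cov = mimic_conditional_moments(np.array([1.0, 1.0]), nu, Lam, K,
                                      alpha, B, Gamma, Psi, Theta, Delta)
print(mean)
print(cov)
```

With both predictors set to 1, the latent mean is 0.5 + 0.4 = 0.9, so the indicator means are 0.9, 0.72, and 0.54 + 0.3 = 0.84; the direct effect in K enters only the third indicator.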

Purpose of the Study
There is a need for approaches to address confounding biases and to account for synergistic effects from multimorbidities that influence metabolites, biomarkers, and symptoms. A major contribution of this article is that it easily adapts the latent trait model into a nonrecursive, bidirectional specification of the MIMIC model, and in so doing provides a means to partial out unspecified confounding factors.
I call attention to a previously unrecognized possibility: using the regression-based approach to model any given panel item as both an exogenous variable and an endogenous variable within the same MIMIC model. This special type of MIMIC model is an alternative to conventional multidimensional data reduction strategies with PCA or CFA for research on panels of metabolites, biomarkers, or symptoms, because it enables CFA either to be conducted across the sample or to be targeted within an overall group or interactive subgroup, while providing extensive and comprehensive control of confounding factors.
The analyst may specify interaction terms to detect synergistic effects on metabolites, biomarkers, or symptoms within homogeneous disease subgroups or subphenotypes, distinguishing them from the main effects of the overall disease groups or phenotypes. Interactions among predictor variables define the subgroups, rather than subsets of observations derived in cluster analysis, where there is often uncertainty about optimal feature selection for deriving unbiased clusters [10]. Furthermore, research has not accounted for synergies within disease subgroups based on interactions of the individual disease components that arise from multimorbidity, which may lead analysts to misattribute metabolic or symptom cluster variation to specific disease groups.
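The way an interaction term defines a subgroup can be shown with a moderated regression predictor. In this sketch the slopes are hypothetical and the function name is illustrative; it only shows that, with binary Diabetes (D) and Heart Failure (HF) indicators, the product term's slope is exactly the extra effect present when both conditions co-occur, beyond the sum of the two main effects.

```python
import math
from itertools import product

def linear_predictor(diabetes: int, heart_failure: int, b: tuple[float, ...]) -> float:
    """Moderated regression: b0 + b1*D + b2*HF + b3*(D x HF)."""
    b0, b1, b2, b3 = b
    return b0 + b1 * diabetes + b2 * heart_failure + b3 * diabetes * heart_failure

b = (0.0, 0.20, 0.30, 0.25)  # hypothetical slopes on the latent trait

for d, hf in product((0, 1), repeat=2):
    print(f"D={d} HF={hf}: {linear_predictor(d, hf, b):.2f}")

# The synergy is the difference-in-differences across the four cells, which equals b3.
synergy = (linear_predictor(1, 1, b) - linear_predictor(1, 0, b)
           - linear_predictor(0, 1, b) + linear_predictor(0, 0, b))
print(math.isclose(synergy, b[3]))
```

No cluster analysis is needed to locate the subgroup: it is the cell where both binary predictors equal one.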
This overlooked strategy allows some or all items from a panel to be modeled simultaneously as formative (causal) indicators and reflective (effect) indicators of a latent trait, overcoming a commonly assumed restriction that it is necessary to choose only one of these options to specify and model any given indicator, e.g., [11][12][13][14]. The formative (causal) indicators are included to improve the detection of metabolic, biomarker, or symptom clusters (represented by the reflective indicators) either (1) within disease subgroups distinguished by different synergistic effects (interactions) of multimorbidity or (2) across disease groups (see Appendix A, note 3).
Using epidemiological data on metabolic and vascular disease conditions, I develop a protocol to specify MIMIC models that unveil unbiased clusters of psychometric items (the specific metabolites, biomarkers, or symptoms) of a latent trait (the overall level of the panel of items) within main or interactive disease contexts or across the full sample. The regression-based approach available in the Mplus statistical software affords this modeling advantage over the alternative covariance-based approach. I derive the MIMIC models and multivariate regressions to develop this protocol in Mplus (Version 5.21) using the MLR estimator (maximum likelihood parameter estimates with standard errors that are robust to non-normality and non-independence in complex random samples) [9].
The literature supports testing a panel of metabolites, or higher-order features such as depressive symptoms, within diagnostic subgroups of co-occurring metabolic and vascular conditions, which are identified by their interaction or synergistic effects. I highlight here certain factors that support a context of synergy among metabolic and vascular conditions and related depressive symptoms. First, diabetes, and especially the metabolic syndrome (based on a constellation of conditions such as obesity, diabetes, hypertension and/or atherosclerosis), has long been known to double the risk of heart attack or of developing chronic heart failure [15]. Second, the action of the diabetes medication metformin provides a clue to these metabolic-vascular interrelationships. Metformin consistently shows promise for preventing or slowing atherosclerosis and heart conditions such as myocardial infarction and chronic heart failure. Among these positive effects, metformin appears to improve how well the protein titin folds and recoils within the heart muscle, which determines how well blood is pumped through the arteries [16][17][18][19][20][21][22][23][24][25][26][27]. The unexpected effects of metformin suggest that diabetes may be interrelated with episodes of heart failure or heart attack, and with related conditions such as atherosclerosis, in previously unknown ways; these may be co-occurring conditions with synergistic effects that worsen depressive symptoms of sickness malaise. Third, much evidence accumulated over the years attests that depression occurs in the context of each of these individual medical conditions, as sickness malaise that is part of the biological disease process itself and not only a psychosocial reaction to it, as discussed in [28].
Finally, individuals with multimorbidity from more than one of these medical conditions experience higher levels of depression because of synergies among the intersecting and interrelated disease processes. I recently detected clusters of depressive symptoms that were associated with multiple and interacting co-occurrences of metabolic conditions (excess weight, diabetes) with heart attack and with progressive vascular disease (hypertension, silent cerebrovascular disease, stroke, and vascular cognitive impairment), although congestive heart failure was not considered [28].
As we shall see, in the current study several reflective (effect) indicators are statistically significant in congestive heart failure. Several formative (causal) indicators of the latent trait (Depression), i.e., depressive symptoms specified as its causes, are also statistically significant in the overall disease group of congestive heart failure but not in any of the other disease groups or subgroups (where only two or fewer formative indicators are significant). This distinctive pattern provides a clue that at least part of the influence of congestive heart failure on the reflective indicators of depressive symptoms may occur indirectly through its interactions with other disease conditions. The current investigation focuses on the individual direct effects and synergistic comorbidity of diabetes and heart failure, with this targeted diagnostic subgroup also expanded to incorporate synergies from related conditions, such as hypertension, silent cerebrovascular disease, and previous heart attack. It will yield a protocol for testing a panel of metabolites, biomarkers, or symptoms as formative and reflective indicators in a MIMIC model. The innovation identifies and adjusts for background confounding biases by estimating bidirectional causal pathways based on both types of indicators of each panel item (and the derivative latent trait) in order to unveil effects from disease conditions (diagnoses, genes, risk factors) or synergies from multimorbidity in co-occurring conditions.

Materials and Methods
In order to specify MIMIC models that test a panel of metabolites, biomarkers, or symptoms, this study compares two competing alternatives that I derived. One applies instrumental variables to represent panel items considered "non-traditional", and the other, introduced for the first time in the current study, specifies bidirectional causal relationships that incorporate formative indicators of all panel items. I introduced and tested the instrumental variable MIMIC approach within vascular-metabolic subgroups of participants [28]; I direct the reader to that study for details regarding the sample, measures, and methodology. In that study and the current one, I use survey data from the New Haven, Connecticut subsample of community-residing older adults from the Established Populations for the Epidemiological Study of the Elderly (EPESE; unweighted n = 2812). The original data were collected after participants (or proxies for two percent of the sample) provided written consent. The Adelphi University institutional review board exempted the de-identified, publicly available data from review [29].

Alternative MIMIC Models: Initial Comparisons
The two panels of Figure 1a,b target the subgroup in which Diabetes and Heart Failure interact (Diabetes × Heart Failure). In both (a) and (b), the right column labeled "REFLECTIVE INDICATORS" draws on all items from the Center for Epidemiologic Studies Depression (CES-D) inventory. {In (a), the items are "Depressed" to "Each of 4 Non-Traditional CES-D Items (Fearful, Lonely, People Unfriendly, People Disliked Me)"; and in (b), the items are "Depressed 2" to "Each of 4 Non-Traditional CES-D Items 2 (Fearful 2, Lonely 2, People Unfriendly 2, People Disliked Me 2)"}. These are effects of the latent trait (Depression). Furthermore, the two panels distinguish separate participant subgroups based on validated ranges of CES-D total scores. All participants are included when CES-D ≥ 0. Participants with scores of 11 or greater may be experiencing either subthreshold depression, just below the threshold of clinically significant depression (CES-D total score of 11 to 15), or clinically significant depression (CES-D total score of 16 or greater). The notes to Figure 1 are reported in Figure S1. Whereas (a) predicts each of the four non-traditional items with an instrumental variable for the same item, (b) does not involve instrumental variables. Instead, all twenty CES-D items are "FORMATIVE INDICATORS (CES-D ≥ 0)," which are causes of the latent trait (Depression CES-D ≥ 0). The first set of depression items {from "Depressed 1" to "Each of 4 Non-Traditional CES-D Items 1 (Fearful 1, Lonely 1, People Unfriendly 1, People Disliked Me 1)"} that are causes of the latent trait (Depression) are formative indicators. The other set of depression items {"Depressed 2" to "Each of 4 Non-Traditional CES-D Items 2 (Fearful 2, Lonely 2, People Unfriendly 2, People Disliked Me 2)"} that are effects of the latent trait (Depression) are reflective indicators. An especially desirable feature of this dual specification in (b) is that it can provide more extensive and comprehensive control of confounders than trying to specify each of them directly.
Each unspecified confounder operates through its effects on each of the specified formative indicators that predict the latent trait (Depression).
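The CES-D score ranges that define the participant subgroups can be written as a small helper. This is a hedged sketch: the function names are hypothetical, the 0 to 60 range assumes the standard scoring of twenty items from 0 to 3, and the cut points (11 to 15 subthreshold, 16 or greater clinically significant) are those given in the text.

```python
def cesd_range(total: int) -> str:
    """Classify a CES-D total score (20 items scored 0-3, so 0-60)
    using the ranges described in the text."""
    if not 0 <= total <= 60:
        raise ValueError("CES-D total must lie between 0 and 60")
    if total >= 16:
        return "clinically significant"
    if total >= 11:
        return "subthreshold"
    return "below 11"

def in_subgroup(total: int, cutoff: int) -> bool:
    """Membership in the CES-D >= cutoff analysis subgroup (0 or 11 in Figure 1)."""
    return total >= cutoff

scores = [3, 11, 15, 16, 42]
print([cesd_range(s) for s in scores])
print([in_subgroup(s, 11) for s in scores])
```

Every participant satisfies the CES-D ≥ 0 condition, so the ≥ 0 subgroup is the full sample, while the ≥ 11 subgroup pools the subthreshold and clinically significant ranges.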
The error-conditioned reflective (effect) indicators can be modeled across the sample at large, in an overall group (a diagnosis, gene, epigenetic factor, environmental or other risk factor), or within a more targeted subgroup of interacting diagnoses, genes, epigenetic factors, or environmental or other risk factors.
The full panel of metabolites, biomarkers, or symptoms serves as the set of observed items that contribute to a latent trait representing the overall level of the total metabolite, biomarker, or symptom panel {e.g., Depression in Figure 1a,b}. The simultaneous control for the level of the total panel allows estimation of more valid specific effects for individual metabolites, biomarkers, or symptoms (see Appendix B, note 1).
In Figure 1, all instrumental variables in (a) and all formative indicators in (b) are not predicted by other factors and so are considered exogenous. The diagnoses (Diabetes, Heart Failure) and their interaction (synergy within the subgroup when the diagnoses co-occur in the same individuals) are also exogenous in both panels. Depending on the MIMIC model specification, the formative indicators may provide either exogenous or endogenous information. If the variables reflecting the exogenous diagnosis subgroup in Figure 1 had predicted some of the formative indicators (e.g., the four non-traditional CES-D items), these formative indicators would be endogenous because they would mediate information contributed by the variables that constitute the exogenous diagnosis subgroup. This tighter specification detects suppressor effects, as discussed below.
In the two MIMIC models of Figure 1a,b the specification of indicators (enclosed in boxes in Figure 1a,b) to the left of the latent trait (enclosed by the circle) is estimated by multiple regression, or when interaction terms are also specified as indicators, by moderated multiple regression, which predicts the latent trait. These indicators may include: (1) predictors representing main or interactive epidemiological contexts of diagnoses, genes, epigenetic factors, environmental or other risk factors {in (a) and (b)}; (2) instrumental variables to estimate certain non-traditional items (symptoms) within the measurement model {in (a)}; and/or (3) formative (causal) indicators {in (b)}. All three types of indicators are exogenous because they provide outside information to estimate and recover the latent trait and its reflective (effect) indicators (i.e., the model does not estimate and recover the predictors and formative indicators) (see Appendix B, note 2).
CFA estimates the specification of indicators (enclosed in boxes) to the right of the latent trait, where the direction of causality is from the latent trait to the reflective (effect) indicators, which represent multiple "observed" expressions of the latent trait. It is this joint application of multiple regression and confirmatory factor analysis in estimating the latent trait that allows both reflective (effect) indicators and either instrumental variables or formative (causal) indicators to be specified, with the formative indicators serving to absorb biases from confounding factors across the sample that would otherwise result in biased reflective indicators within the targeted subgroup. Figure 1a reveals a descriptive MIMIC model within a subgroup with both diabetes and heart failure (based on predictors for Diabetes, Heart Failure, and their interaction). The MIMIC models represented by Figure 1a incorporate the sixteen traditional items and four non-traditional items of the CES-D inventory to identify cases of clinically significant depression. The non-traditional items are modeled so that they do not interfere with the contribution of traditional items to the distinct presentations of depression within the metabolic/vascular illness subgroups, while still contributing to the latent trait (Depression) that reflects the overall level of depression necessary to identify cases of subthreshold or clinically significant depression (CES-D ≥ 11). In participants with subthreshold or clinically significant depression (CES-D ≥ 11), each non-traditional item (CES-D ≥ 11) is predicted by its targeted, subsumed instrumental variable for the same item (CES-D ≥ 0).

The Instrumental Variables Approach
These four non-traditional items predict the non-traditional "formative (causal) indicator" portion of the variation in the latent trait for Depression. This leaves the exogenous predictors of interest (i.e., Diabetes, Heart Failure, Diabetes × Heart Failure) to predict the "reflective (effect) indicator" portion of the variation in the latent trait for Depression. This portion includes the effects of these three predictors on "Each of 4 Non-Traditional CES-D Items (Fearful, Lonely, People Unfriendly, and People Disliked Me)" in the measurement model portion (the right side) of the MIMIC model. The modeling of the four non-traditional items in this way adjusts for their inter-correlated variation with the sixteen traditional items so that the inclusion of the four non-traditional items does not bias their separate individual influences.
This original approach affords insight into distinct presentations, even phenomenology, by unveiling elusive clusters of psychometric items (metabolites, biomarkers, or symptoms) of a latent trait, either broadly (across a diagnosis, gene, epigenetic, environmental, or other risk group) or uniquely (within a subgroup targeted by interactions of two or more such groups). It overcomes the potential for common, insidious confounding biases in regression-based MIMIC models, which can estimate direct (unique) effects of predictors to the latent trait and to all but one of its reflective indicators. Even if it seems justified not to specify the direct effect to a certain reflective indicator (i.e., usually fixed at zero), hidden bias may infect the latent trait and proliferate across reflective indicators, undermining the validity of specified (shared and direct) effects. The advance avoids this difficulty by offering a new way to specify a MIMIC model that enables the direct effect on every single reflective indicator of a scale or subscale to be unveiled (while still adjusting shared effects across the reflective indicators to account for the level of the latent trait). Thus, it reveals the subset of reflective indicators that have statistically significant direct effects, which comprise the item cluster within the group or subgroup [28].

The Formative Indicators Approach
A MIMIC model that specifies causal pathways from every single predictor (the disease groups, their interactions, and all formative indicators) is not "identified" (i.e., unique estimates do not exist). Therefore, at least one causal pathway must not be estimated (i.e., its regression slope is fixed at zero). However, the analyst must justify which pathway(s) to exclude on valid grounds, and such grounds are often not apparent or available. The formative indicators approach offers another solution to this dilemma. The exogenous/endogenous modeling distinction in regression-based MIMIC models affords a unique and valid opportunity to estimate such models. It permits us to specify (1) the endogenous portion of the model to estimate effects within the specific disease group/subgroup (i.e., all effects from the disease groups and their interactions) and (2) the exogenous portion to estimate effects across the sample at large instead of within the disease group/subgroup (i.e., no effects from the disease groups and their interactions). Normal and non-normal variation from predictors in the exogenous portion of the structural model (the formative indicators) across the sample at large conditions the estimates of effects in the endogenous portion of the model (the reflective indicators) within specific disease subgroups. Figure 1b retains the distinction between traditional and non-traditional CES-D items; however, this distinction is unnecessary because, as formative indicators, all items contribute in the same way to the distinct presentations of depression within the metabolic/vascular illness subgroups captured by the reflective (effect) indicators. Complete mediation occurs because the formative and reflective indicators involve the same measurement items.
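The identification problem can be seen in the conditional mean. With one latent trait and one predictor x, the slope of x for reflective indicator j is the reduced-form coefficient π_j = λ_j γ + k_j, an indirect path through the latent trait plus a direct path. Taking the loadings λ_j as pinned down by the covariance structure, p indicators supply p reduced-form slopes for p + 1 structural parameters (γ and k_1, …, k_p), so shifting γ by any constant c and each k_j by −λ_j c leaves every π_j unchanged. The sketch below, with hypothetical values and a hypothetical function name, demonstrates this equivalence and how fixing one direct effect at zero restores a unique γ.

```python
import numpy as np

def reduced_form_slopes(gamma, k, lam):
    """Slope of x in E(y*_j | x): indirect path lam_j * gamma plus direct path k_j."""
    return lam * gamma + k

lam = np.array([1.0, 0.8, 0.6, 0.7])   # hypothetical loadings, taken as known
gamma = 0.5                            # hypothetical effect of x on the latent trait
k = np.array([0.0, 0.1, 0.0, 0.2])     # hypothetical direct effects of x on items

c = 0.3                                # any shift gives an observationally equivalent model
pi_a = reduced_form_slopes(gamma, k, lam)
pi_b = reduced_form_slopes(gamma + c, k - lam * c, lam)
print(np.allclose(pi_a, pi_b))         # the data cannot tell these two models apart

# Fixing one direct effect (here k_0 = 0) pins down gamma from the first item.
gamma_hat = pi_a[0] / lam[0]
print(gamma_hat)
```

This is why at least one direct path must be fixed at zero when every predictor-to-indicator path is otherwise free.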
The specification of all CES-D depression items as exogenous formative indicators derives from a modeling conceptualization in which all unspecified and unknown confounders are direct predictors of all of these formative indicators, which mediate all confounder effects on the latent trait (e.g., Depression) and its reflective indicators. Thus, it is necessary only to specify the formative indicators as exogenous indicators in order to achieve comprehensive control for confounders (since the confounders operate completely through the formative indicators, which are controlled). In contrast to the instrumental variables approach, the formative indicators approach avoids the need to identify and specify all of the important confounding factors, a task that is impossible in many contexts and that, even when possible, usually cannot be verified as complete.
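The complete-mediation argument can be illustrated with a simulated regression. This sketch assumes a hypothetical linear Gaussian process with one confounder C, one formative indicator F, and no direct path from C to the latent trait; for simplicity the latent trait is treated as observed, whereas the MIMIC model estimates it. Once F is controlled, the confounder's slope shrinks to zero, even though C was never needed to obtain the correct slope for F.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical data-generating process: C -> F -> eta, no direct C -> eta path.
confounder = rng.standard_normal(n)                     # unspecified confounder C
formative = 0.7 * confounder + rng.standard_normal(n)   # formative indicator F
eta = 0.6 * formative + rng.standard_normal(n)          # latent trait (observed here)

def ols(y, *columns):
    """OLS coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive = ols(eta, confounder)                # C alone carries its effect via F (0.7 * 0.6)
adjusted = ols(eta, formative, confounder)  # F controlled: C's slope shrinks toward zero
print(np.round(naive, 3), np.round(adjusted, 3))
```

The argument holds only under its premise: if some confounder also had a direct path to the latent trait that bypassed the formative indicators, controlling those indicators would not remove it.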
Instead, it relies on specifying non-recursive, bidirectional causal relationships between each panel item and the latent trait. This type of modeling adjusts for the impact of the formative indicator of each panel item as a cause of the latent trait, which would otherwise bias the relationship of the latent trait to the reflective indicator of the same panel item. Failing to specify and adjust for these reciprocal causal relationships introduces confounding biases [30] (biased estimation occurs within the formative indicators portion of the MIMIC model because regression predictors become correlated with residual terms through simultaneity [31] and reverse causation [32]). Regression bias in estimating the latent trait, in turn, triggers non-optimal, biased estimation across its reflective indicators. (In contrast, bias restricted to a particular reflective indicator will not distort the inter-correlations among the remaining reflective indicators in the measurement model.) In contrast to the fully endogenous, covariance-based MIMIC model, the exogenous/endogenous distinction in the regression-based MIMIC approach with both formative and reflective indicators allows a sample-wide exogenous focus while constraining the endogenous focus to be either across a particular disease group or within a more targeted disease subgroup. This distinction means that the variable(s) tapping the disease group or subgroup predict the reflective items and the latent trait, but not the formative indicators, avoiding the need for an instrumental variable approach to estimation.
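Simultaneity bias, the mechanism cited above, can be shown with two variables that cause each other. The sketch assumes a hypothetical linear system y1 = b·y2 + u and y2 = c·y1 + v with independent unit-variance errors; solving the system shows that y2 is correlated with u, so the ordinary least squares slope of y1 on y2 converges to (c·σ_u² + b·σ_v²)/(c²·σ_u² + σ_v²) rather than to b.

```python
import numpy as np

def ols_plim(b, c, var_u=1.0, var_v=1.0):
    """Probability limit of the OLS slope of y1 on y2 when
    y1 = b*y2 + u and y2 = c*y1 + v (requires b*c != 1)."""
    return (c * var_u + b * var_v) / (c**2 * var_u + var_v)

rng = np.random.default_rng(11)
n, b, c = 200_000, 0.5, 0.5
u, v = rng.standard_normal(n), rng.standard_normal(n)
y1 = (u + b * v) / (1 - b * c)      # reduced form obtained by solving both equations
y2 = (c * u + v) / (1 - b * c)

slope = np.cov(y1, y2)[0, 1] / np.var(y2, ddof=1)
print(round(ols_plim(b, c), 3), round(slope, 3))  # both well above the true b = 0.5
```

With b = c = 0.5 the limit is 1.0/1.25 = 0.8, so ignoring the reverse path inflates the estimated effect by 60 percent in this example.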

Further Comparisons of Both Approaches
Although the four non-traditional CES-D items operate individually, each panel of Figure 1 does not show them individually but groups them within "Each of 4 Non-Traditional CES-D Items (Fearful, Lonely, People Unfriendly, and People Disliked Me)". Figure 1b distinguishes the formative indicator (with a '1' following each indicator name) from the reflective indicator (with a '2' following each indicator name).
As an illustration, "Lonely 1" and "Lonely 2" are equivalent representations of the item Lonely, but because they have different variable names, the Mplus software program treats them as different variables. More precisely, in Figure 1a,b, Mplus models the observed distribution of Lonely as a continuous variable {as an instrumental variable in (a) and a formative or causal indicator in (b)} within the multiple regression framework of the MIMIC model. However, within the confirmatory factor analysis framework, it models the postulated latent ordinal probit distribution that gives rise to the observed ordinal variable Lonely (reflective or effect indicator). Strictly speaking, then, the two variables are virtually rather than absolutely identical in this context; they would be absolutely identical only if the reflective (effect) indicator were also modeled as a continuous variable (see Appendix B, note 3).
Thus, the instrumental variable in Figure 1a or the formative (causal) indicator in Figure 1b, which is exogenous, and the reflective (effect) indicator (in both panels) of the same item, which is endogenous, comprise bidirectional non-recursive pathways involving the latent trait (Depression). They derive essentially from the same item, which provides exogenous information in one pathway and is recovered as an endogenous factor in the other, and therefore labeled differently as separate variables in the two pathways. Figure 1b reveals an expansion of this MIMIC model by specifying each of the twenty CES-D items as both a formative (causal) and reflective (effect) indicator. In this expansion, there is no distinction between traditional and non-traditional CES-D items; all of the CES-D items now have both a formative (causal) indicator and a reflective (effect) indicator. Thus, in the measurement model portion, analyzed using CFA, there is shared variation across all CES-D items {and not only across the sixteen traditional CES-D items, as in Figure 1a}. This highly flexible specification models both the formative (causal) and reflective (effect) indicators for every CES-D item, avoiding the need to assume that only one of these options is operative within the measurement model for traditional items (and both options for non-traditional items). Rather, it allows the data to shape latent trait and measurement model distributions while providing comprehensive adjustment for confounding factors, which the twenty formative (causal) indicators serve to mediate.
The use of instrumental exogenous variables of some of the psychometric items {Figure 1a}, or the use of original exogenous variables to incorporate formative (causal) indicators of all of the psychometric items {Figure 1b}, each allows a more expansive and flexible specification involving reflective (effect) indicators and bidirectional non-recursive pathways. However, the incorporation of formative (causal) indicators of all CES-D items in Figure 1b is likely to partial out confounding factors more thoroughly. Both approaches overcome confounding from model misspecification in which actual instrumental variable effects or formative (causal) indicator effects are incorrectly attributed to reflective (effect) indicator effects (see Appendix B, note 4).
In the current study, the descriptive MIMIC model in the instrumental variables approach excludes confounders, while the explanatory MIMIC model adjusts for specified confounders (Black, male, age 75 or older, not a high school graduate, recent widow, income equivalence adjusted for family size, isolated, smoker, alcohol consumption, hypertension, silent cerebrovascular disease, heart failure, excess weight, lost ten pounds during the past three months, diabetes, heart attack, and number of cerebrovascular risk factors) (see Appendix B, note 5). However, specifying a range of confounders does not usually adjust for all confounders. In contrast, the new formative indicators approach relies on the formative indicators of the measurement items to tap all confounders, since the formative indicators necessarily mediate the effects of all confounders on the latent trait. To the extent that formative indicators partial out unspecified confounders (including other unspecified symptoms) related to the specified symptoms, the reflective indicators will tap more valid symptom clusters within a disease group or subgroup than CFA or PCA alone. To provide the best comparison between the two approaches, the current study does not specify any confounders in the formative indicators approach, although that approach can incorporate individual specified confounders. Table 1 reports descriptive and explanatory MIMIC analyses based on the instrumental variables approach I developed previously. Table 2 reports MIMIC analyses based on the formative indicators approach developed in the current article. When these latter analyses include formative indicators for all twenty CES-D items, this comprehensive adjustment for unspecified confounders parallels the comprehensive adjustment for specified confounders in the explanatory MIMIC analyses of Table 1.
To be clear, although the analyses in Table 2 (the formative indicators approach) adjust for unspecified confounders, they do not specify them, in contrast to the explanatory analyses in Table 1 (the instrumental variables approach). Finally, I report footnotes for Tables 1 and 2 in the corresponding Supplementary Tables S1 and S2.
Table 1 reports findings from MIMIC models with instrumental variables of four endogenous non-traditional CES-D items of depression that serve as formative indicators. Because these items are non-traditional, the use of instrumental variables allows them to contribute to the level of the latent trait, or additive composite, of CES-D depression without influencing its presentation among the sixteen traditional depressive symptoms. Table 1 (A) reports findings for descriptive MIMIC models that specify only the predictor(s) targeting the disease group {a single main effects term} or subgroup {interaction term(s) and their one-way component terms}. Table 1 (B) reports findings for explanatory MIMIC models that specify these terms as well as potential confounders tapped by variables reflecting demographic groups and related vascular and metabolic conditions. Table 2 reports findings from MIMIC models with formative indicators as exogenous predictors or illness context mediators. Table 2 (A) reports findings when the original variables of the four exogenous, non-traditional CES-D depression items are specified as four formative indicators. Table 2 (B) reports findings when the original variables of all twenty exogenous CES-D depression items are specified as twenty formative indicators. In the findings from both (A) and (B), the lack of instrumental variables means that the variation from the four non-traditional CES-D items may now compete with the sixteen traditional CES-D items in accounting for variation in the latent trait or additive composite, and therefore in the presentation of statistically significant depressive symptoms as reflective indicators of symptoms and symptom clusters. Table 1 (B) adjusts only for known, specified confounders comprising demographic groups and related vascular and metabolic conditions; Table 2 does not, in addition, adjust for known, specified confounders.
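To make the difference between the descriptive {Table 1 (A)} and explanatory {Table 1 (B)} predictor sets concrete, the following Python sketch assembles, for a single participant, the one-way component terms of a disease subgroup, their product (interaction) term, and, optionally, specified confounders. The helper `build_predictor_terms` and the variable names (`diabetes`, `heart_failure`, `smoker`, `age_75_plus`) are hypothetical. The sketch enters only the one-way components and the highest-order product, following the wording above; it is an illustration, not the author's code.

```python
from math import prod


def build_predictor_terms(record, components, confounders=()):
    """Hypothetical helper: assemble the predictor terms for one participant.

    record      -- dict mapping variable names to 0/1 dummies (or numeric values)
    components  -- names of the one-way dummies that define the disease group
                   (one name) or subgroup (two or more names)
    confounders -- specified control variables; empty for a descriptive model,
                   non-empty for an explanatory model
    """
    # One-way component terms are always entered.
    terms = {name: record[name] for name in components}
    if len(components) > 1:
        # The subgroup interaction is the product of its dummies, so it equals 1
        # only for participants who have every condition in the combination.
        terms[" x ".join(components)] = prod(record[name] for name in components)
    # Explanatory models add the specified confounders as further predictors.
    for name in confounders:
        terms[name] = record[name]
    return terms


participant = {"diabetes": 1, "heart_failure": 1, "smoker": 0, "age_75_plus": 1}
subgroup = ["diabetes", "heart_failure"]

descriptive = build_predictor_terms(participant, subgroup)
explanatory = build_predictor_terms(
    participant, subgroup, confounders=["smoker", "age_75_plus"]
)
print(descriptive)
print(explanatory)
```

Here `descriptive` holds the two one-way dummies and their product, while `explanatory` adds the two confounders; a participant with diabetes but without heart failure would receive 0 on the product term.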

Comparisons of Formative Indicators as Instrumental Versus Original Variables
I previously focused on the instrumental variables approach in latent trait models with predictors of diabetes, excess weight, and progressive cerebrovascular disease and their interactions, but these models excluded heart failure and its interactions with these included disease predictors [28]. The current article updates the instrumental variables analyses by including heart failure both as an additional predictor and as an additional component of the disease predictor interactions reported in Table 1. The instrumental variables analyses in Table 1 (B) also control for all other progressive cerebrovascular disease conditions, heart attack, and demographic variables in order to adjust impartially for these overlapping sources of variation, which would act as confounders if not specified. In contrast, the formative indicators analyses exclude these control variables.
I compare these updated analyses in the instrumental variables approach (Table 1) with the parallel analyses in the new formative indicators approach (Table 2) to test the same disease group (a one-way variable) or subgroup (an interaction of variables). All of the disease subgroup interactions involve diabetes and heart failure (or heart failure without heart attack) combined with different related conditions (hypertension or silent cerebrovascular disease, heart attack, and excess weight, up to a five-way interaction). Almost all have reasonably large regression slope estimates (greater than one) for symptoms, and almost all symptoms occur in one or more symptom clusters across disease subgroups. The highly similar, overlapping findings from both approaches provide evidence that the formative indicators approach (which does not include the control variables specified in the instrumental variables approach) adjusts for unspecified confounders. Furthermore, some additional predictors that were not significant in the instrumental variables approach (Table 1) become significant in the formative indicators approach (Table 2), suggesting that there are additional salient confounders beyond those specified as control variables in the instrumental variables approach.
I create the instrumental variables in Table 1 only from the four non-traditional CES-D items, which also constitute the four formative (causal) indicators of the MIMIC reported in Table 2 (A). The descriptive MIMIC findings in Table 1 (A) specify only diagnostic predictors for the group or subgroup of interest (i.e., no co-occurring conditions or confounders are specified). Strikingly, in certain disease subgroups, some descriptive MIMIC regression slopes and standard errors in Table 1 (A) are almost identical to those in Table 2 (A). These disease subgroups are Diabetes × Heart Attack; Diabetes × High BP × Heart Failure; Diabetes × High BP × Heart Failure, without Heart Attack; and Diabetes × Silent CVD × Heart Attack × Heart Failure. Thus, in these subgroups, both specifications {instrumental variables and formative (causal) indicators} converge to estimate the same MIMIC model. In the remaining diagnostic subgroups, findings from the descriptive MIMIC in Table 1 (A) are strongly consistent with those in Table 2 (A), with almost all of the same CES-D reflective (effect) indicators found to be statistically significant. In one case (Diabetes × Heart Failure), the descriptive MIMIC in Table 1 (A) found each of the twenty CES-D items to be statistically significant, whereas Table 2 (A) found the latent trait (Depression) to be significant.
In addition to diagnostic predictors for the group or subgroup of interest, the explanatory MIMIC models reported in Table 1 (B) include a comprehensive (not exhaustive) set of related personal characteristics and diagnostic conditions (dummy variables) in order to control for known co-occurring conditions or confounders. (These are: Black, male, age 75 or older, not a high school graduate, recent widow, income equivalence adjusted for family size, isolated, smoker, alcohol consumption, hypertension, silent cerebrovascular disease, heart failure, excess weight, lost ten pounds during the past three months, diabetes, heart attack, and number of cerebrovascular risk factors). I describe these variables in [28]. To a considerable extent, the Explanatory MIMIC findings {Table 1 (B)} are similar to the findings in Table 2 (B), in which all twenty CES-D items are each specified as a formative (causal) indicator of an "exploded" MIMIC model, along with the diagnostic predictors for the group or subgroup of interest.
A lack of statistically significant effects in the explanatory MIMIC for a diagnostic subgroup in Table 1 (B) always coincided with a similar lack of statistically significant effects in Table 2 (B). Every effect that was statistically significant in Table 2 (B) was also statistically significant in the Explanatory MIMIC findings in Table 1 (B). However, the Explanatory MIMIC runs in Table 1 (B) also tended to find additional items significant. This could result partly from the instrumental variables approach, unlike the formative indicators approach, excluding the variation that reflective indicators of non-traditional CES-D items share with reflective indicators of traditional CES-D items.
More broadly, however, these additional significant items also suggest that, despite the attempt to specify a comprehensive (but not necessarily exhaustive) set of related personal characteristics and diagnostic conditions associated with confounders, some confounding factors remain unadjusted. Thus, the specification of all twenty CES-D items as formative (causal) indicators in the "exploded" MIMIC models {Table 2 (B)} may achieve more complete conditioning than the counterpart Explanatory MIMIC models {Table 1 (B)}. This improved conditioning reduces bias and improves reliability. Although measurement loadings are often lower in Table 2 when all twenty CES-D items, rather than only the four non-traditional items, have formative indicators, this more restricted specification improves the assignment of variation among competing formative and reflective indicators and leads to perfect model fit (R² = 1).
Curiously, in both Tables 1 and 2 . Thus, findings initially attributed to an overall condition such as diabetes or heart failure may well mask confounding biases from other co-occurring and interacting conditions, detected either in an explanatory MIMIC model that specifies them or in a MIMIC with a more carefully specified measurement model that includes formative indicators for all items. The R² fit statistic, which reveals the percent of the variation within the latent trait (Depression) that is predicted, also suggests confounding biases. In Table 1, when comparing each Descriptive MIMIC {(A)} to the respective Explanatory MIMIC {(B)}, the R² values do not increase (and sometimes even decrease, owing to multicollinearity among specified predictors). However, in Table 2, there is always an appreciable increase in the R² fit statistic when comparing (1) each MIMIC with four non-traditional items as formative indicators in (A) to (2) the corresponding MIMIC with all twenty items as formative indicators in (B), in which R² always equals 1.000 due to perfect fit between the formative and reflective indicators. Table 2 also indicates when one or more of the formative indicators are statistically significant predictors of the weighted, additive composite within each disease group or subgroup. These may reveal a real causal effect of a symptom or symptom cluster, or they may be artifacts of outliers, heteroscedasticity, and multicollinearity from earlier confounding factors for which the symptom(s) serve as a proxy.
In the final section (III) of Table 2, the MIMIC specification is extended to incorporate an additional variable (Excess Weight) to refine or further target the subgroups of chronic conditions (Table 1 does not include parallel findings).
Supplementary Table S2 footnotes 5-10 discuss findings and suppressor effects from all of the mediated and sequential MIMIC models reported in Table 2 and their implications for detecting synergistic effects from unspecified, co-occurring illness conditions.

The Formative Indicators Approach
In contrast to the covariance-based MIMIC model, which depends on the analysis of a covariance matrix to generate a unidimensional latent trait (or additive composite), the regression-based MIMIC model relies on the availability of the actual responses on each predictor (x variable) across observations. Using the original data from the exogenous predictors incorporates skewness and non-normality into the generated distributions of the separate, endogenous latent variables behind the observed ordinal reflective indicators. The shared variation across these separate latent variables, which constitutes the latent trait or additive composite, may also include skewness and non-normality. Thus, the conditional effects of the regression-based MIMIC model incorporate multidimensionality within the latent trait or additive composite. Specifying the same items as both formative and reflective indicators incorporates all the relevant non-normality in the perfectly estimated (R² = 1) MIMIC model with an additive composite {reported in Table 2 (B)}, in contrast to the imperfectly specified and estimated instrumental variables approach (where R² is much lower), which may retain confounding biases from missing predictors. Thus, the multidimensionality incorporated by the formative indicators approach is comprehensive, unbiased, and meaningful.
The reflective indicators may be more likely to tap symptom clusters that stem from the same or shared biological processes, whereas the formative indicators appear more likely to tap symptoms that stem from non-shared biological processes common only to subsets of participants within the same disease group or subgroup. A given symptom (e.g., poor appetite) can act in some individuals as a formative indicator, causing the latent trait of underlying depression, while it acts in other individuals as a reflective indicator, an effect of the latent trait (e.g., poor appetite as a manifest item of depression). This simultaneous specification allows the symptom to trigger, or precipitate, the underlying latent trait in some participants, whereas in other participants the same symptom results from the latent trait and may manifest or perpetuate it. This modeling flexibility allows for two different manifestations of the same symptom.
The disease group interactions capture the synergies unique to particular variable-defined (and not participant-defined) subgroups. This allows for detecting the synergies shared across participants within that disease subgroup while factoring out, as formative indicator effects, those that occur only in some of the participants of that subgroup (otherwise expressed as influential outliers, heteroscedasticity, and/or multicollinearity within the recursive or unidirectional regression portion of the MIMIC model).
The regression model portion of the MIMIC model is similar to PCA (a regression-based procedure), while CFA estimates the measurement model portion. However, estimating both portions simultaneously results in some differences. The latent trait generated reflects both regression (akin to PCA) and CFA because both types of procedures condition it. Since variation from both formative (causal) and reflective (effect) indicators shapes the derivation of the latent trait, it is likely to differ from a latent trait derived using only one of these procedures. As a modeling approach, this is more valid than assuming, in the absence of evidence or other strong justification, that the latent trait derives from only one of these procedures. It follows that metabolomic and symptom cluster studies relying on only one of these two procedures may include biases that, in certain cases, could undermine statistical conclusion validity.
Formative (causal) and reflective (effect) indicators may both occur in specific disease processes, but research on symptom clusters does not typically incorporate both. The statistically significant formative (causal) indicators reflect different, additive (non-overlapping) sources of variation across the significant items throughout the sample (i.e., the items tap different sources of variation, so individual items cannot be dropped without affecting the influence of the remaining items). Thus, symptoms based on formative indicators tap different sources of variation from participants in the overall sample. The statistically significant reflective (effect) indicators reflect shared (overlapping) variation across the significant items within the disease group or subgroup (i.e., the items tap similar variation, so individual items can be dropped without affecting the influence of the remaining items). Thus, the symptoms within symptom clusters based on reflective indicators tap the same or similar sources of variation that tend to co-occur within the same disease group or subgroup.
Symptom clusters based on reflective indicators occur because the symptoms that constitute the cluster are similar: they stem from the same latent trait reflecting the underlying symptom level. Thus, dropping any of these symptoms does not affect the other symptoms within the cluster. Symptoms based on formative indicators, however, differ from each other as causes of the latent trait. Thus, dropping any such symptom will change the nature of the other symptoms because it contributes unique variation. Allowing both possibilities within disease groups or subgroups makes sense. Symptoms that are highly similar in their effects are likely to form clusters. On the other hand, symptoms that predict different portions of non-shared variation within the latent trait may or may not also co-occur (if they do, the data for the observations concerned will show multicollinearity and heteroscedasticity), but they are dissimilar in their effects (i.e., they do not also share a common variation detected through factor analysis). The formative indicators approach is unique in modeling these multiple influences.
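The claim that a formative item cannot be dropped without changing the influence of the remaining items can be illustrated with ordinary least squares (OLS), used here as a stand-in for the regression portion of the MIMIC model. That substitution is an assumption: the article's models estimate the regression and CFA portions jointly. In the Python sketch below, a hypothetical composite is exactly the unweighted sum of three made-up items, two of which (`x1`, `x2`) are correlated. The small normal-equations solver `ols` is written here only to keep the example dependency-free.

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical item scores for eight participants; x1 and x2 are correlated.
x1 = [0, 1, 2, 3, 0, 1, 2, 3]
x2 = [0, 1, 1, 3, 1, 2, 2, 3]
x3 = [1, 0, 1, 0, 1, 0, 1, 0]
# Composite defined as the unweighted sum of the three items.
composite = [a + b + c for a, b, c in zip(x1, x2, x3)]

full = ols([x1, x2, x3], composite)  # [intercept, b_x1, b_x2, b_x3]
dropped = ols([x1, x3], composite)   # x2 omitted: [intercept, b_x1, b_x3]

print("all items:", [round(b, 3) for b in full])
print("x2 dropped:", [round(b, 3) for b in dropped])
```

With all three items entered, the slopes recover a weight of 1 for each item. Omitting `x2` moves the slope of `x1` to 1.625 and that of `x3` to 0.375, because the remaining items absorb the variation they share with the dropped one, which is the behavior the text attributes to formative indicators.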
When all panel items are formative indicators, the same data serve as formative (causal) and reflective (effect) indicators, which participate equally in shaping the distributions of the latent trait and measurement model and together yield perfect fit (R² = 1) of the latent trait (Depression). This perfect fit means the latent trait is equivalent to a weighted, additive composite of all formative indicators. This composite can be assessed for individual observations, in contrast to a latent trait based on factor scores, which is valid for the sample at large but indeterminate for individual observations (see Appendix C, note 1). Otherwise, omitting the formative indicators from the structural portion of the MIMIC model leads to a CFA model that retains confounding bias, has a non-zero error term in the structural model, and cannot be estimated as determinate (nonstochastic; i.e., R² = 1), so no weighted additive composite can be derived. Each symptom or item is then treated as contributing equally to the overall level, when in reality some items contribute disproportionately to the level of the latent trait (as an artifact of uncontrolled confounding bias that is allowed to operate through them in order to derive this equal weighting) (see Appendix C, note 2).
Arguably, these properties yield more valid derivations of these distributions than does the instrumental variables approach, which depends on the extent to which it captures the important co-occurring conditions and confounders (i.e., unspecified confounders contribute to confounding biases) and yields much lower R² fit statistics. The equivalence of the latent trait to a weighted composite collapses the MIMIC model such that the individual unique variation of the reflective (effect) indicators, considered "measurement error" within the measurement model prediction of the latent trait or weighted composite, is the only remaining type of error in prediction within the MIMIC model. Furthermore, this unique context of perfect fit, in which the same measurement items are used as both formative and reflective indicators, means that both types of indicators can be assumed to have internal consistency (see Appendix C, note 3). These properties make it attractive to use the same estimated weights to specify, a priori, a fixed-weight additive composite for use in subsequent MIMIC or structural equation models, in either the same or different samples of data.
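The perfect-fit property (R² = 1) and the reuse of estimated weights as a fixed-weight composite can be sketched in a few lines. OLS is again assumed as a stand-in for the structural (regression) portion of the MIMIC model, and the latent trait is constructed directly as an exact weighted sum of four hypothetical items, so the example shows only the algebra. With every item entered, the structural residual is zero and R² = 1; with only a subset entered (loosely analogous to Table 2 (A)), R² falls below 1. The weights, item names, and data are invented for illustration and are not estimates from the article.

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(columns, y, coefs):
    """Share of the variance in y reproduced by the fitted linear predictor."""
    fitted = [
        coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns))
        for i in range(len(y))
    ]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical formative-indicator scores for eight participants.
items = {
    "item_a": [0, 1, 2, 3, 0, 1, 2, 3],
    "item_b": [1, 1, 0, 0, 2, 2, 1, 0],
    "item_c": [0, 1, 0, 1, 1, 0, 1, 0],
    "item_d": [2, 0, 1, 1, 0, 2, 1, 3],
}
weights = {"item_a": 0.5, "item_b": 1.5, "item_c": 1.0, "item_d": 2.0}
# Latent trait built as an exact weighted composite (zero structural residual).
latent = [sum(w * items[k][i] for k, w in weights.items()) for i in range(8)]

cols = list(items.values())
coefs = ols(cols, latent)
r2_all = r_squared(cols, latent, coefs)  # all items entered: R^2 = 1

subset = [items["item_a"], items["item_b"]]
r2_subset = r_squared(subset, latent, ols(subset, latent))  # R^2 < 1

# Reuse the estimated weights a priori as a fixed-weight additive composite,
# giving a determinate score for any individual with the same four items.
fixed = dict(zip(items, coefs[1:]))


def composite_score(responses):
    return coefs[0] + sum(w * responses[k] for k, w in fixed.items())


new_person = {"item_a": 2, "item_b": 0, "item_c": 1, "item_d": 1}
print(round(r2_all, 6), round(r2_subset, 3), round(composite_score(new_person), 3))
```

Because `composite_score` is a fixed linear formula, it returns the same value for an individual every time it is applied, which is the sense in which the composite is determinate at the level of individual observations; factor scores from a CFA fitted without formative indicators lack this property.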
The formative indicators are of interest in generating the latent trait not necessarily in themselves but as proxies for unspecified confounders that influence its generation. Thus, we secure control over what would otherwise be considered biases (heteroscedasticity, influential outliers, and multicollinearity) if the formative indicators were interpreted as operating strictly as explanatory variables that do not also serve as proxies for unspecified confounders. Rather, these so-called biases reflect influences from unspecified confounders for which the formative indicators serve as proxies. Just as we do not adjust specified control variables for issues such as heteroscedasticity and influential outliers, because they serve only to partial out non-random noise (their slopes are not interpreted), we similarly use the formative indicators to partial out non-random noise from unspecified confounders that must operate through the mediating formative indicators in order to influence the latent trait. The retained, non-adjusted biases from heteroscedasticity, influential outliers, and multicollinearity from unspecified confounders, mediated through the formative (causal) indicators, all contribute variation that results in perfect model fit (see Appendix C, note 4).
The regression-based MIMIC approach with formative indicators has practical utility in addressing what would otherwise be unresolved residual confounding (due to limitations in collected data) and reverse causation (unanticipated, data-driven causal pathways in epidemiological studies that become unmasked as formative indicators). For instance, Heart Failure is a dummy variable, but in the absence of formative indicators, residual confounding may occur if there is a threshold effect based on the number of days and/or severity of heart failure symptoms. A structural model residual term (ε) equal to zero means that all residual confounding is adjusted when the latent trait is equivalent to the additive weighted composite: no remaining variation contributes residual confounding to the residual term. The only other specified predictor(s) target and estimate the bidirectional relationships within a disease group or subgroup. Thus, this MIMIC model addresses epidemiological biases that would otherwise result from residual confounding and reverse causation (simultaneity). The absence of residual confounding, and of confounding due to unspecified reverse causation, overcomes biases that would otherwise arise from heterogeneity of effects in different subgroups (see Appendix C, note 5).

The Derived Protocol
Realizing the utility and promise of the formative indicators approach {Figure 1b} for specifying MIMIC models that test panels of symptoms, biomarkers, or metabolites in epidemiological samples requires deriving and articulating a protocol for the approach. The experience of conducting the MIMIC analyses with formative indicators in Table 2 forms the basis for deriving the protocol, which is intended to guide analysts in conducting separate runs of regression-based MIMIC models that include formative indicators.
First, the analyst specifies all pathways from the predictor terms representing the disease group or subgroup to all of the reflective indicators. This run reveals the set of statistically significant endogenous reflective indicators that cluster within the exogenous disease group (a predictor specified as a main effect) or exogenous disease subgroup (two or more predictors specified separately and together as components of interaction terms).
Second, the analyst specifies all pathways from the predictor terms representing the disease group or subgroup to all of the now-endogenous formative indicators (dropping the previous pathways to all of the reflective indicators to obtain an identified model). This run reveals the set of statistically significant formative indicators that cluster within the same exogenous disease group as suppressor effects.
Third, when the first of the two previous runs does not converge to yield unique estimates, the analyst reruns the MIMIC sequentially in two parts:

• In the first part, the analyst no longer specifies all pathways from the disease group or subgroup to the latent trait or additive composite, retaining only the pathways from the disease group or subgroup to the reflective indicators. This broader modeling does not also adjust within the disease group or subgroup for the mediating pathway that accounts for the level of the latent trait or additive composite. Thus, the modeling completely attributes all effects to the reflective indicators without simultaneous inclusion of the influence of the overall level of the latent trait or additive composite within the disease group or subgroup (see Appendix C, note 6).

• In the second part, the analyst includes all pathways from the disease group or subgroup to the latent trait or additive composite, dropping the pathways from the disease group or subgroup to the reflective indicators. This second part tests whether the latent trait or additive composite is statistically significant within the disease group or subgroup without considering whether any of the reflective indicators are statistically significant within the disease subgroup or group.
Fourth, the analyst may specify a replication of these separate runs in the overall sample at large (i.e., only the formative indicators are exogenous predictors, dropping the predictors representing the disease group or subgroup).
Appendix C, note 7 reflects on these steps with greater specificity, especially in relation to Figure 1b and Table 2.
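The protocol steps can be summarized as five model specifications. The Python sketch below writes each run in lavaan-style model syntax, where `=~` reads "is measured by" and `~` reads "is regressed on"; the notation is borrowed only for readability. Each CES-D item appears under two names, `f_<item>` for its formative (exogenous) role and `r_<item>` for its reflective (endogenous) role, mirroring Figure 1b. The function `mimic_syntax`, the run labels, and the item and predictor names are hypothetical. Steps one and two are read as retaining the disease-to-latent pathway, which the wording of the third step implies. The sketch does not address estimation choices (for example, estimators suited to ordinal CES-D items) or whether a given package accepts each run as written.

```python
RUNS = {
    # run label: (disease -> latent, disease -> reflective, disease -> formative)
    "step1": (True, True, False),
    "step2": (True, False, True),
    "step3_part1": (False, True, False),
    "step3_part2": (True, False, False),
    "step4": (False, False, False),  # overall sample: no disease predictors
}


def mimic_syntax(items, predictors, run):
    """Return lavaan-style syntax for one protocol run (hypothetical sketch)."""
    if run not in RUNS:
        raise ValueError(f"unknown run: {run}")
    to_latent, to_reflective, to_formative = RUNS[run]
    reflective = [f"r_{item}" for item in items]
    formative = [f"f_{item}" for item in items]
    rhs = " + ".join(predictors)

    # Measurement model (CFA) and structural regression on formative indicators.
    structural = formative + (list(predictors) if to_latent else [])
    lines = [
        "Depression =~ " + " + ".join(reflective),
        "Depression ~ " + " + ".join(structural),
    ]
    if to_reflective:  # disease group/subgroup -> every reflective indicator
        lines += [f"{name} ~ {rhs}" for name in reflective]
    if to_formative:  # formative indicators become endogenous in step 2
        lines += [f"{name} ~ {rhs}" for name in formative]
    return "\n".join(lines)


items = ["happy", "appetite", "sleep"]  # hypothetical subset of CES-D items
predictors = ["diabetes", "heart_failure", "diabetes_x_hf"]
for run in RUNS:
    print(f"# {run}")
    print(mimic_syntax(items, predictors, run))
```

Keeping the three pathway switches in one table makes it easy to check that step two drops the disease-to-reflective paths, that the two parts of step three split the disease-to-latent and disease-to-reflective paths, and that step four removes the disease predictors entirely.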

The Pattern of Findings
In the MIMIC models to derive the protocol, I expand the causal indicator pathways predicted by four non-traditional CES-D items {Figure 1a} into one predicted by all twenty CES-D items {Figure 1b}. I run this expansion {Table 2 (B)} of the Descriptive MIMIC model {Table 1 (A)} for the targeted subgroup of Diabetes × Heart Failure and for related or derivative interactions to provide more thorough conditioning for confounders. There is consistent evidence over the years of a stable factor analytic structure of the CES-D Depression Inventory, along with clinical evidence that only four of the items are non-traditional. (The non-traditional items are included to optimize sensitivity and specificity in detecting actual cases of depression, but they should be modeled in such a way that they do not interfere with prediction by traditional items when it is necessary to identify traditional symptoms of depression that comprise symptom clusters with different presentations). These characteristics led to the development of thresholds for depression scores that reflect either subthreshold (CES-D ≥ 11 and CES-D < 16) or clinically significant (CES-D ≥ 16) levels [33][34][35].
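The screening thresholds cited above can be expressed as a small scoring function. The Python sketch assumes the standard CES-D scoring of twenty items, each coded 0 to 3 (total range 0 to 60), with the positively worded items already reverse-scored before summing. The function names are illustrative, and the label "below threshold" for totals under 11 is a name chosen here, not one taken from the cited sources.

```python
def cesd_total(item_scores):
    """Sum twenty CES-D item scores, each assumed already coded 0-3
    (positively worded items assumed reverse-scored beforehand)."""
    if len(item_scores) != 20:
        raise ValueError("expected 20 CES-D item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def cesd_category(total):
    """Classify a CES-D total using the thresholds described in the text."""
    if total >= 16:
        return "clinically significant"  # CES-D >= 16
    if total >= 11:
        return "subthreshold"  # 11 <= CES-D < 16
    return "below threshold"  # label chosen for this sketch


scores = [1] * 12 + [0] * 8  # hypothetical responses
total = cesd_total(scores)
print(total, cesd_category(total))
```

Checking the upper threshold first keeps the two cut points non-overlapping, so a total of exactly 16 falls in the clinically significant range and a total of exactly 11 in the subthreshold range.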
Even so, the CES-D was developed as a screening instrument to identify cases of potential, clinically significant depression, which require follow-up to ascertain a clinical diagnosis [24]. It is possible that the expanded model over-corrects the findings in some respects, even as it appropriately adjusts for additional confounding biases in others. In Table 2, note the very high slope value for Depression in the Diabetes × Heart Failure subgroup when formative indicators are specified only for the four non-traditional CES-D items {i.e., in (A)}. This slope is not significant when formative indicators are specified for all twenty CES-D items; only the CES-D item Happy, as a formative indicator, remains significant {i.e., in (B)}. From a clinical perspective, the sixteen traditional CES-D items constitute symptoms considered in diagnosing depression (e.g., in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition), and there is much factor-analytic evidence validating them as reflective indicators (without their simultaneous modeling as formative indicators). However, this literature does not necessarily generalize to the broad context of "depression in the context of medical illness," since many of these traditional depressive symptoms may also be symptoms of medical illness; modeling within each disease group or subgroup should therefore also specify them as formative indicators to test bidirectional pathways. Indeed, the several statistically significant formative indicators in Table 2 for the overall disease group, Heart Failure, suggest that these symptoms may be part of the medical illness and do not necessarily stem from co-occurring underlying depression.
Regardless of etiology, the findings in Table 2 for the disease subgroup with both diabetes and heart failure (Diabetes × Heart Failure) suggest an important nexus for screening and intervention. The very high slope value for Depression in (A) reveals pronounced Depression when individuals experience both diabetes and heart failure, which suggests there may be much utility in targeting screening for symptoms among participants with both conditions. The findings in (B) suggest that much of this symptomatology may consist of direct symptoms of multimorbidity from these two medical conditions rather than stemming from a separate, underlying, co-occurring condition of depression. The patterns of symptomatology may be diverse and complicated across participants, such that only the CES-D item Happy remains statistically significant as a formative indicator, which suggests that, as a group, these individuals are prone to low positive affect (i.e., they are less likely to endorse feeling happy). This finding, albeit derived solely from statistical modeling, provides indirect, tentative, and cautious support for the role of the protein titin (as discussed earlier in the review of the literature) in individuals with both diabetes and heart failure and for the potential of the medication metformin in treating both conditions. Furthermore, the greater number of symptoms {especially in (B)} that form reflective symptom clusters when Hypertension, Heart Attack, and/or Excess Weight are also part of the disease subgroup suggests that the role of the protein titin and/or other metabolomic pathways is expressed through these additional sources of multimorbidity.
It is possible to make too much of the potential for overcorrection, especially compared with a more restrictive MIMIC model specification without formative indicators (which provides less flexibility to model any given panel item across participants) and when the investigation is exploratory. In exploring symptom, biomarker, or metabolite panels without any predetermined non-traditional items, we would not expect the unexplained variation in an item to bias the remaining items. The confounding effects related to each item are captured as part of the explained variation in the effect of the formative (causal) indicator (e.g., each of the twenty CES-D items used as formative indicators) on the latent trait (e.g., Depression) for each participant; the remaining part of the explained variation is the unbiased effect of the measurement item. Thus, the formative (causal) indicator effect consists of both the confounding effects associated with the formative (causal) indicator and the unbiased effect of the formative (causal) indicator itself. This adaptive conditioning and modeling with formative (causal) indicators is expected to yield unbiased reflective (effect) indicators.
There are parallel, mirroring processes captured by (1) the exogenous versus endogenous pathways, (2) the formative versus reflective indicators of the same measurement items that capture perfect model fit (R² = 1), and (3) the bidirectional, nonrecursive estimation of effects of the measurement items and the weighted additive composite. Considered together, these parallel, mirroring processes all tap more deeply and precisely, in a modeling sense, into how "multiple indicators" truly mimic their "multiple causes." Formative indicators tap the extent to which specific individual symptoms differ in their relationships to other symptoms. The formative indicators factor away biases from uncontrolled confounding factors, which could include differences in symptom expression in only some of the symptom items within the disease group or subgroup of interest. This strategy leaves the reflective indicators and the latent trait to tap the common or shared symptom expression across the full range of symptoms in the disease group or subgroup. It sidesteps the controversial issue as to whether symptoms should contribute variation to more than one symptom cluster because the formative indicators automatically factor out the influence of uncontrolled confounding factors, which may otherwise lead to heterogeneity in the effects of individual symptoms or across smaller subsets of symptoms.

Future Issues
4.4.1. Extending the Utility of the Regression-Based MIMIC Model
By capturing multidimensionality within the composite equivalent of the latent trait, the MIMIC model with formative indicators could overcome the CFA restriction of unidimensionality within the measurement model of reflective indicators, a restriction that applies when CFA is used alone (outside the regression-based MIMIC framework). Just because a latent trait can be postulated and estimated when only reflective indicators are used in CFA does not necessarily mean the derived latent trait is the most valid estimate of the true latent trait. A true latent trait should have the property that it can be modeled by dissimilar formative symptoms that do not in themselves constitute a symptom cluster of reflective indicators. These formative symptoms provide additional, exogenous modeling information that reveals statistically significant reflective symptoms and symptom clusters by identifying a more plausible latent trait, equivalent to the additive composite of the formative indicators. Because this modeling captures all of the variation across the formative indicators (i.e., R² = 1), latent factor scores are determinate at the level of individual observations (they are equivalent to the additive composite scores), in contrast to the indeterminacy of factor scores for individual observations from CFA outside of this MIMIC framework.
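To make the determinacy contrast concrete, here is a minimal NumPy sketch under assumed values (five items, loadings of 0.7, arbitrary composite weights; none of these come from the study). An additive composite is recovered exactly from the items (R² = 1), whereas a common factor that generated the items is not, because each item also carries unique error. For these loadings the population squared multiple correlation of the factor on the items is 5/6, so individual factor scores remain indeterminate.

```python
import numpy as np

# Hypothetical one-factor data: each item = loading * factor + unique error.
rng = np.random.default_rng(1)
n, k = 2000, 5
factor = rng.normal(size=n)
loadings = np.full(k, 0.7)
items = factor[:, None] * loadings + rng.normal(scale=0.7, size=(n, k))


def r_squared(y, X):
    """R^2 from an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid.var() / y.var()


weights = rng.uniform(0.1, 1.0, size=k)  # arbitrary illustrative weights
composite = items @ weights              # a determinate score for every person

r2_composite = r_squared(composite, items)  # 1.0 up to rounding error
r2_factor = r_squared(factor, items)        # below 1: about 5/6 in the population
print(r2_composite, r2_factor)
```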
CFA is based on the restrictive assumption that the reflective indicators tap a unidimensional latent construct. Even if each exogenous predictor in the regression estimation of the MIMIC structural model is unidimensional, the additive composite may not be, since it consists of the weighted sum of the predictors. Because the additive composite is equivalent to the latent trait, the latent trait may thus be multidimensional. The determinacy of the latent trait/additive composite allows the structural portion of the MIMIC model to be treated separately as a multiple regression model. The multidimensionality among the predictors and within the latent trait/additive composite may be modeled by the regression-based structural portion of the MIMIC model, which overcomes the CFA restriction of unidimensionality and allows the CFA portion also to model the same non-normal variation across the reflective (effect) indicators [5][6][7][8][9]. Even if the symptoms as reflective indicators together tap a single dimension (latent construct) within a broader multidimensional latent trait, this does not preclude legitimately modeling local multidimensionality within the latent trait in both the structural and measurement-model portions of an encompassing MIMIC model.
A lack of thorough and careful attention to model specification decisions, both initially and in subsequent bias conditioning, may lead to undetected outliers, heteroscedasticity, and nonessential multicollinearity in regression-based models, which in turn produce biased probability values and incorrect inferences within homogeneous and heterogeneous samples. Researchers in applied fields, however, still meet these standards only inconsistently. Moreover, a lack of data on important variables may lead to misattributed effects even when these standards are met. In regression-based MIMIC models with formative indicators, by contrast, the combined regression (structural) and CFA (measurement) models together partial out unknown and unspecified confounding biases and create a more valid and precise additive composite of the shared trait "behind" what would be a more indeterminate latent trait shaped only by CFA. This modeling improvement leads to more sound probability values and confidence intervals. It helps safeguard against drawing incorrect inferences and therefore should be a promising option in the arsenal of approaches to address the scientific replication crisis (see Appendix C, note 8).
Analysts can also use the regression-based MIMIC model with formative indicators when the exogenous predictor is for an intervention group rather than a disease group (such as a dummy variable in which the zero category refers to a comparison condition such as treatment as usual). The predictor can also be an interaction term that reflects the effect of the intervention within a targeted participant subgroup (e.g., males) in which randomization in a randomized controlled trial (RCT) no longer holds. This addresses residual confounding, such as when RCTs involve insufficiently randomized small samples, or when the confounder variable specified in stratified or regression analysis is not precise enough. Regression-based MIMIC analysis has promising utility for intervention subgroups that are no longer randomized and for observational studies lacking randomization altogether. Confounding bias categories based on dummy variables (e.g., whether a study is an RCT) may be insufficiently precise, resulting in residual confounding when randomization is not adequate across the overall sample. Finally, the approach may apply not only within a disease group or subgroup, or to revealing the effect of an intervention, but also to targeting the intervention within a disease group or subgroup (e.g., diabetes × heart failure × intervention), whether in an RCT or in a quasi-experimental or observational study. In these situations, the regression-based MIMIC model with formative indicators partials out confounding from imperfect or absent randomization, other residual confounding, and reverse causation.
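As an illustration of how such predictors might be coded, the sketch below builds 0/1 dummies and their products for a diabetes × heart failure × intervention design. The field names and records are hypothetical, and keeping all lower-order terms alongside the three-way product is standard moderated-regression practice rather than a detail taken from the article.

```python
# Hypothetical records; field names are illustrative, not the study's variables.
participants = [
    {"diabetes": 1, "heart_failure": 1, "intervention": 1},
    {"diabetes": 1, "heart_failure": 0, "intervention": 1},
    {"diabetes": 0, "heart_failure": 1, "intervention": 0},
    {"diabetes": 1, "heart_failure": 1, "intervention": 0},
]


def design_row(p):
    """Main effects plus all two- and three-way products of 0/1 dummies.

    The three-way product D*HF*I targets the intervention within the
    diabetes x heart failure subgroup; the lower-order terms are kept so
    that the product can be read as a departure from additivity.
    """
    d, hf, i = p["diabetes"], p["heart_failure"], p["intervention"]
    return {
        "D": d, "HF": hf, "I": i,
        "D*HF": d * hf, "D*I": d * i, "HF*I": hf * i,
        "D*HF*I": d * hf * i,
    }


rows = [design_row(p) for p in participants]
for row in rows:
    print(row)
```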

Application to Metabolomic Profiling and Symptom Clusters in Epidemiology
Beyond these broader modeling concerns, there are issues specific to metabolomic profiling and symptom clusters in epidemiological studies. Measures of metabolite levels in high-throughput profiling studies, and of symptoms in related epidemiological studies, tend to be semi-quantitative, which can make it difficult to contrast and integrate findings across such studies [36]. In MIMIC models, even when minimum threshold concentrations for each metabolite are unknown, including curvilinear and interaction predictors may still detect effects that would be masked when specifying only main-effects (one-way) predictors. These higher-order effects manifest once the unknown threshold is exceeded, or as synergies based on thresholds within particular subgroups. The analyst may also adjust for the different types and levels of confounding factors across these studies by specifying panel items, whether continuous or ordinal, as formative (causal) indicators that mediate their biasing influence.
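A minimal simulation, assuming a single hypothetical metabolite whose effect begins only above an unknown concentration threshold, shows how a curvilinear term can pick up variation that a main-effects (linear) specification leaves unexplained. A squared term is used here only as a simple stand-in for the curvilinear predictors mentioned above; the threshold and noise level are arbitrary.

```python
import numpy as np

# Hypothetical metabolite whose effect appears only above an unknown threshold.
rng = np.random.default_rng(2)
n = 1000
conc = rng.uniform(0, 10, size=n)   # semi-quantitative level
threshold = 6.0                     # unknown to the analyst
outcome = np.maximum(0.0, conc - threshold) + rng.normal(scale=0.5, size=n)


def r_squared(y, X):
    """R^2 from an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid.var() / y.var()


r2_linear = r_squared(outcome, conc)
r2_curvilinear = r_squared(outcome, np.column_stack([conc, conc ** 2]))
print(f"linear R^2 = {r2_linear:.3f}, with quadratic term = {r2_curvilinear:.3f}")
```

Because the curvilinear model nests the linear one, its R² can never be lower; the point of the sketch is the size of the gain when a threshold is present.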
Ordinal MIMIC models allow greater modeling flexibility and may be more valid than continuous MIMIC models since reflective (effect) indicators often reflect semi-quantitative, ordinal, and not strictly continuous items. Ordinal variables with multiple rather than binary categories provide a better means for comparison across panel studies when fully quantitative metabolite concentrations are not feasible; they can be accommodated correctly through ordinal probit estimation of MIMIC models rather than over-estimated by considering reflective indicators of the panel items as if they were strictly continuous. Furthermore, ordinal specification of the reflective (effect) indicators avoids the need to specify numerous covariances (and possible over-fitting) across the reflective (effect) indicators in order to obtain acceptable, continuous-model fit statistics (which occurred in parallel continuous MIMIC model runs in [28]). There is much untapped scope to apply the MIMIC model, especially the ordinal probit MIMIC model, in metabolomic profiling and symptom cluster studies (see Appendix C, note 9).
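For readers less familiar with ordinal probit estimation, the sketch below shows how an ordinal probit link turns one item's linear predictor into category probabilities, using P(Y ≤ c) = Φ(τ_c − η), where η is the linear predictor and the τ_c are increasing thresholds. The four categories mirror the CES-D's 0 to 3 response scale; the threshold and η values are assumed for illustration and are not estimates from this study.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordinal_probit_probs(eta, thresholds):
    """P(Y = c) for c = 0..len(thresholds) under an ordinal probit link.

    eta        -- linear predictor for one item (e.g., loading * latent trait)
    thresholds -- strictly increasing cut points tau_1 < ... < tau_{C-1}
    """
    cuts = [-math.inf, *thresholds, math.inf]
    return [normal_cdf(cuts[c + 1] - eta) - normal_cdf(cuts[c] - eta)
            for c in range(len(cuts) - 1)]


# A four-category item with assumed thresholds; a higher eta shifts mass upward.
probs_low = ordinal_probit_probs(eta=-0.5, thresholds=[0.0, 1.0, 2.0])
probs_high = ordinal_probit_probs(eta=1.5, thresholds=[0.0, 1.0, 2.0])
print([round(p, 3) for p in probs_low])
print([round(p, 3) for p in probs_high])
```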
Regression-based MIMIC interactions tap synergy among disease group predictors in order to reveal disease subgroups or subphenotypes implicated in prognostic or predictive enrichment. These disease subgroups or subphenotypes may differ from those based on the detection of clusters of observations derived in cluster analysis, an exploratory, data-driven approach with various options for feature selection and commonly used to detect metabolomic subgroups/subphenotypes and symptom clusters [10]. The moderated regression framework in regression-based MIMIC models avoids the feature selection issues inherent in selecting an optimal algorithm of cluster analysis for a given data context. This means that regression-based MIMIC models with measurement model items specified simultaneously as formative and reflective indicators can be tested as to whether they replicate and confirm the same, or similar, subgroups or subphenotypes detected in cluster analysis (and vice versa), thus providing analyses of statistical conclusion validity for the same data.
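With two 0/1 disease dummies and their product, a saturated regression reproduces the four cell means exactly, so the interaction coefficient equals the departure from additivity. The sketch below computes these contrasts from hypothetical cell means of an observed depression score; in the MIMIC model the outcome is the latent trait, so this is only an analogy for how the interaction term captures synergy.

```python
def saturated_2x2(cell_means):
    """Coefficients of y = b0 + b1*D + b2*HF + b3*D*HF for 0/1 dummies.

    Keys are "DH" strings: "10" means diabetes present, heart failure absent.
    With all four cells present, OLS reproduces the cell means exactly, so
    the coefficients are simple contrasts of those means.
    """
    m00, m10, m01, m11 = (cell_means[k] for k in ("00", "10", "01", "11"))
    b0 = m00                    # neither condition
    b1 = m10 - m00              # diabetes without heart failure
    b2 = m01 - m00              # heart failure without diabetes
    b3 = m11 - m10 - m01 + m00  # synergy: departure from additivity
    return b0, b1, b2, b3


# Hypothetical mean depression scores per cell.
means = {"00": 8.0, "10": 9.0, "01": 10.0, "11": 15.0}
b0, b1, b2, b3 = saturated_2x2(means)  # b3 = 15 - 9 - 10 + 8 = 4.0
print(b0, b1, b2, b3)
```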
The MIMIC model is a promising approach to identify integrated, rather than isolated, individual metabolomic processes (e.g., within the proteome), and to conduct analyses at the higher level of symptoms or of the whole system. It allows us to determine which specific 'omics perspectives involving cells, tissues, organs, and symptoms (and, more definitively, which of the observed items from a panel within any given 'omics focus) are most active as potential clinical targets [37]. A depressive symptom has a different meaning when it occurs as a residual depressive symptom with no clinical significance than when it occurs as part of clinically significant depression. Similarly, the meaning of a metabolite-specific effect is likely to differ when the functioning and metabolic reaction rate of the system is within a healthy range versus when it is functioning poorly. The first-order latent trait (or additive composite) reflects the total level of the known reaction networks within the particular 'omics focus in modeling the observed items from the panel, which could provide a proxy indication of the metabolic reaction rate. A poorly functioning system may reveal low or inconsistent effects across the metabolites. Although certain metabolites may have prominent effects, others may have impaired effects, resulting in an inadequate metabolic reaction rate suggested by the latent trait or weighted additive composite. These factors could precipitate and sustain prominent symptoms and symptom clusters.

Summary
Through an exploration of specific conditions (diabetes, heart failure, related vascular/metabolic diagnoses) and their multimorbidities, I developed a more thorough means to adjust confounders of clinical targets within main or interactive contexts (diseases, genes, epigenetic factors, risk factors) in epidemiological panel studies of symptoms, biomarkers, or metabolites. Regression-based multiple indicators-multiple causes (MIMIC) models combine multiple or moderated regression and confirmatory factor analysis. In a novel specification, each of twenty depressive symptoms is both a "formative" (causal) indicator and a "reflective" (effect) indicator of a latent trait (Depression). Formative indicators within the multiple regressions constitute comprehensive proxies for unspecified confounders (which may be unknown) by completely mediating all unspecified confounder effects on the endogenous latent trait and its reflective indicators, the latter estimated through confirmatory factor analysis. This strategy avoids the need to specify all confounders, which may not be possible or verifiable.
Using epidemiological data on metabolic and vascular conditions, I developed a protocol to specify MIMIC models that unveil unbiased clusters of psychometric items (the specific metabolites, biomarkers, or symptoms) of a latent trait (the overall level of the panel of items) within main or interactive disease contexts or even across the sample. The current investigation focused on the individual direct effects and synergistic comorbidity of diabetes and heart failure with this targeted diagnostic subgroup also expanded to incorporate synergies from related conditions, such as hypertension, silent cerebrovascular disease, and previous heart attack. It yields a protocol for testing a panel of metabolites, biomarkers, or symptoms as formative and reflective indicators in a MIMIC model, but beyond this specific epidemiological focus, the protocol is useful to guide analysts across disciplines in conducting separate runs of regression-based MIMIC models that include formative indicators.
Findings of symptom clusters of depression in specific conditions, and in subgroups that capture their multimorbidity synergies, corroborate parallel MIMIC models with instrumental variables that specify several known confounders, but suggest that those models retain some confounding biases. In particular, there is evidence of pronounced levels of depression when individuals experience both diabetes and heart failure. Other analyses suggest that much of this symptomatology may be direct symptoms of multimorbidity from these two medical conditions rather than stemming from a separate, underlying, co-occurring condition of depression. These findings indirectly support the role of the protein titin and the potential of the medication metformin in co-occurring diabetes and heart failure.

Model Specification and Methodological Contributions
A major contribution of this article is that it easily adapts the latent trait model into a nonrecursive, bidirectional specification of the MIMIC model, and in so doing provides a means to partial out unspecified confounding factors. I call attention to the unrecognized possibility of using the regression-based approach to model any given panel item as both an exogenous variable and an endogenous variable within the same MIMIC model. This overlooked strategy allows some or all items from a panel to be modeled simultaneously as formative (causal) and reflective (effect) indicators of a latent trait. It enables CFA to be conducted across the sample or to be targeted within an overall group or interactive subgroup while providing extensive and comprehensive control of confounding factors.
This innovation identifies and adjusts for background confounding biases by estimating bidirectional causal pathways based on both types of indicators of each panel item (and the derivative latent trait) in order to unveil effects from groups of participants (e.g., disease conditions) or subgroups (e.g., synergies from multimorbidity in co-occurring conditions). It relies on specifying non-recursive, bidirectional causal relationships between each panel item and the latent trait. This type of modeling adjusts for the impact of the formative indicator of each panel item as a cause of the latent trait that would otherwise bias the relationship of the latent trait on the reflective indicator of the same panel item. Failing to specify and adjust for these relationships of reciprocal causation introduces confounding biases within the formative indicators portion of the MIMIC model. Regression bias in estimating the latent trait, in turn, triggers non-optimal, biased estimation across its reflective indicators.
Each unspecified confounder operates through its effects on each of the specified formative indicators that predict the latent trait. The error-conditioned reflective (effect) indicators can be modeled across the sample at large, in an overall group or within a more targeted subgroup of interactions among the group factors.
The full panel of observed items contributes to a latent trait representing the overall level of the total panel {e.g., Depression in Figure 1a,b}. The simultaneous control for the level of the total panel allows the estimation of more valid specific effects for individual observed items. This approach adjusts for the overall dynamic state of a system within cross-sectional data.
The instrumental variable in Figure 1a or the formative (causal) indicator in Figure 1b, which is exogenous, and the reflective (effect) indicator (in both panels) of the same item, which is endogenous, comprise bidirectional non-recursive pathways involving the latent trait (Depression). They derive essentially from the same item, which provides exogenous information in one pathway and is recovered as an endogenous factor in the other, and are therefore labeled differently as separate variables in the two pathways.
The use of instrumental exogenous variables of some of the psychometric items {Figure 1a}, or the use of original exogenous variables to incorporate formative (causal) indicators of all of the psychometric items {Figure 1b}, each allows a more expansive and flexible specification involving reflective (effect) indicators and bidirectional non-recursive pathways. However, the incorporation of formative (causal) indicators of all of the psychometric items in Figure 1b is likely to partial out confounding factors more thoroughly.
The highly similar, overlapping findings from both approaches provide evidence that the formative indicators approach (which does not include the control variables specified in the instrumental variables approach) adjusts for unspecified confounders. Furthermore, some additional reflective (effect) indicators within specific disease subgroups that were not significant in the instrumental variables approach (Table 1) become significant in the formative indicators approach (Table 2). This suggests that there are additional salient confounders beyond those specified as control variables in the instrumental variables approach and that the formative indicators approach has practical utility.
In certain disease subgroups, some Descriptive MIMIC regression slopes and standard errors in (A) of Table 1 are almost identical to those in (A) of Table 2. Thus, both specifications {instrumental variables and formative (causal) indicators} converge to estimate the same MIMIC model. In the remaining diagnostic subgroups, findings from the Descriptive MIMIC in (A) of Table 1 are strongly consistent with those in (A) of Table 2, with almost all of the same reflective (effect) indicators found statistically significant.
In addition to diagnostic predictors for the group or subgroup of interest, the explanatory MIMIC models reported in Table 1 (B) include a comprehensive (not exhaustive) set of related personal characteristics and diagnostic conditions (dummy variables) specified in order to control for known co-occurring conditions or confounders. (These are: Black, male, age 75 or older, not a high school graduate, recent widow, income equivalence adjusted for family size, isolated, smoker, alcohol consumption, hypertension, silent cerebrovascular disease, heart failure, excess weight, lost ten pounds during the past three months, diabetes, heart attack, and number of cerebrovascular risk factors). To a strong degree, the Explanatory MIMIC findings {Table 1 (B)} are similar to the findings in Table 2 (B), in which all twenty psychometric items are each specified as a formative (causal) indicator of a MIMIC model, along with the diagnostic predictors for the group or subgroup of interest but with no comprehensive set of specified confounders (see Appendix D, note 1).
The use of the original data from the exogenous predictors incorporates skewness and non-normality into the generated distributions of the separate, endogenous latent variables behind the observed ordinal reflective indicators. The shared variation across these separate latent variables, which constitute the latent trait or additive composite, may also include skewness and non-normality. Thus, the conditional effects of the regression-based MIMIC model incorporate multidimensionality within the latent trait or additive composite. The specification of the same items as both formative and reflective indicators incorporates all the relevant non-normality in the perfectly estimated (R² = 1) MIMIC model with an additive composite, in contrast to the imperfectly specified and estimated instrumental variables approach (where R² is much lower), which may retain confounding biases from missing predictors. Thus, the multidimensionality incorporated by the formative indicators approach is comprehensive, nonbiased, and meaningful (see Appendix D, note 2).
In the formative indicators approach, the same data serve as formative (causal) and reflective (effect) indicators, which participate equally in shaping the distributions of the latent trait and measurement model and result in perfect fit (R² = 1) of the latent trait. This perfect fit (R² = 1) of the latent trait means it is equivalent to a weighted, additive composite of all formative indicators. This additive composite can be assessed for individual observations, in contrast to a latent trait based on factor scores, which is valid for the sample at large but indeterminate for individual observations. Arguably, these properties result in more valid derivations of these distributions than does the instrumental variables approach, which depends on the extent of capturing the important co-occurring conditions and confounders and results in much lower R² fit statistics. The equivalence of the latent trait to a weighted additive composite collapses the MIMIC model such that the individual unique variation of the reflective (effect) indicators, considered "measurement error" within the measurement model prediction of the latent trait or weighted additive composite, is the only remaining type of error in the model (see Appendix D, note 3).
The formative (causal) indicator effect consists of both the confounding effects associated with the formative (causal) indicator, along with the unbiased effects of the formative (causal) indicator itself. This adaptive conditioning and modeling with formative (causal) indicators leads to expected unbiased reflective (effect) indicators.
The determinacy of the additive composite allows the structural portion of the MIMIC model to be treated separately as a multiple regression model. Multidimensionality among the predictors and within the additive composite may be modeled by the regression (structural) portion of the MIMIC model, which overcomes the CFA restriction of unidimensionality, allowing the CFA portion to also model the same non-normal variation across the reflective (effect) indicators [5][6][7][8][9].
In regression-based MIMIC models with formative indicators, the combined regression (structural) and CFA (measurement) models together partial out unknown and unspecified confounding biases and create a more valid and precise additive composite of the shared trait "behind" what would be a more indeterminate latent trait shaped only by CFA. This modeling improvement leads to more sound probability values and confidence intervals. It helps safeguard against drawing incorrect inferences and therefore should be a promising option in the arsenal of approaches to address the scientific replication crisis.

Further Implications for Research on Symptom Clusters
Formative and reflective indicators may both occur in specific disease processes, but research on symptom clusters has not properly incorporated both. The formative (causal) indicators that are statistically significant reflect different, and additive (nonoverlapping), sources of variation across the significant items throughout the sample (i.e., the items tap different sources of variation; individual items cannot be dropped without affecting the influence of the remaining items). Thus, these symptoms based on formative indicators tap different sources of variation from participants in the overall sample. The reflective (effect) indicators that are statistically significant reflect shared (overlapping) variation across the significant items within the disease group or subgroup (i.e., the items tap similar variation; individual items can be dropped without affecting the influence of the remaining items). Thus, these symptoms within the symptom clusters based on reflective indicators tap the same or similar sources of variation that tend to co-occur within the same disease group or subgroup.
Symptom clusters based on reflective indicators occur because the symptoms that constitute the cluster are similar in that they stem from the same latent trait reflecting the underlying symptom level. Thus, dropping any of the symptoms does not affect the other symptoms within the cluster. Allowing both possibilities within disease groups or subgroups makes sense. Symptoms that are highly similar in their effects are likely to form clusters. On the other hand, symptoms that predict different portions of nonshared variation within the latent trait may or may not also co-occur (leading, if they do, to multicollinearity and heteroscedasticity from the data for the particular observations concerned), but are dissimilar in their effects (i.e., they are not also based on common variation detected through factor analysis). The formative indicators approach is unique in modeling these multiple influences.
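The contrast between dropping formative and reflective items can be shown with population (not sampled) values. In this sketch, which uses assumed covariances and loadings, dropping one of two correlated formative items changes the weight of the other, whereas in a one-factor reflective model the loading of item 1, identified from items 1 to 3, is the same whether or not item 4 is in the panel. The triad formula for the loading is a textbook identification device for a single-factor model, not the estimator used in the article.

```python
import numpy as np

# Formative side (population values, hypothetical): two correlated items form
# the composite with weights 0.6 and 0.4.
cov_x = np.array([[1.0, 0.5], [0.5, 1.0]])
w = np.array([0.6, 0.4])
cov_x_comp = cov_x @ w                               # Cov(X_j, composite)
weights_both = np.linalg.solve(cov_x, cov_x_comp)    # recovers [0.6, 0.4]
weight_x1_alone = cov_x_comp[0] / cov_x[0, 0]        # about 0.8: X1 absorbs X2's share

# Reflective side: one-factor model with assumed standardized loadings.
lam = np.array([0.8, 0.7, 0.6, 0.5])
sigma = np.outer(lam, lam) + np.diag(1 - lam ** 2)   # implied covariance matrix


def triad_loading(s, i, j, k):
    """Loading of item i identified from its covariances with items j and k."""
    return np.sqrt(s[i, j] * s[i, k] / s[j, k])


lam1_full = triad_loading(sigma, 0, 1, 2)
sigma_dropped = sigma[:3, :3]                        # drop item 4 from the panel
lam1_dropped = triad_loading(sigma_dropped, 0, 1, 2)  # unchanged, about 0.8
print(weights_both, weight_x1_alone, lam1_full, lam1_dropped)
```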
Formative indicators tap the extent to which specific individual symptoms differ in their relationships to other symptoms. The formative indicators factor away biases from uncontrolled confounding factors, which could include differences in symptom expression in only some of the symptom items, or in smaller clusters with fewer symptoms, within the disease group or subgroup of interest. This strategy leaves the reflective indicators and the latent trait to tap the common or shared symptom expression across the full range of symptoms in the disease group or subgroup. It sidesteps the controversial issue as to whether symptoms should contribute variation to more than one symptom cluster because the formative indicators automatically factor out the influence of uncontrolled confounding factors, which may otherwise lead to heterogeneity in the effects of individual symptoms or across smaller subsets of symptoms.
Ordinal variables with multiple categories for psychometric items provide a better means for comparison across panel studies when fully quantitative symptom measures or metabolite concentrations are not feasible. As in the current study, they can be accommodated correctly through ordinal probit estimation of MIMIC models rather than over-estimated by modelling reflective indicators of the panel items as if they were strictly continuous. Furthermore, ordinal specification of the reflective (effect) indicators avoids the need to specify numerous covariances (and possible over-fitting) across the reflective (effect) indicators in order to obtain acceptable, continuous-model fit statistics (which occurred in parallel continuous MIMIC model runs in [28]). There is much untapped scope and potential to apply the MIMIC model, especially the ordinal probit MIMIC model, in studies of symptom clusters or metabolomic profiling.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/math9212715/s1, Figure S1: Footnotes to Figure 1.

Thus, the use of both partitions of the unique and shared variation from data captured by the regression-based structural model, and by the CFA model, to create the additive weighted composite is more valid than a purely CFA-determined latent trait of only the shared portion of the data. The weighted additive composite also comprises the unique portion of the variation, unlike the latent trait that would be determined only by CFA, which means it has a determinate value for each observation, in contrast to factor scores from CFA. The weighted additive composite provides the best total score within the particular disease group or subgroup tested in the MIMIC model. The different weights make sense since some items are more pronounced in their effects in the particular disease group or subgroup. In contrast, CFA alone provides a latent trait total score, based equally on all items, which is valid across the sample at large. Some items, based on their formative indicators, are more important influences in shaping the additive weighted composite than the unique variation in other items. Thus, the greater the unique variation contributed by an item, the more it contributes to the additive weighted composite. Since only the unique variation is contributed by a formative indicator, the statistical significance of two formative indicators does not mean that they constitute a symptom cluster within the same disease group or subgroup, only that these two formative indicators are significant in the sample at large.
3. Of course, the reflective indicators inter-correlate since they all stem from a common cause (the latent trait, which in this case, is equivalent to the composite). Because the formative indicators are the same measurement items specified as reflective indicators, they must also be inter-correlated, which suggests they share similar antecedents and consequences, although not necessarily completely, as it is the unique or non-shared variation within each formative indicator that predicts the composite.
4. However, since the structural model taps only the variation unique to each formative indicator, changing the order of specifying the formative indicators as predictors should not shift the regression slope values of the composite weights. Similarly, measurement model loadings do not shift from changing the order of specifying the reflective indicators.
5. The property of local independence and the conditional independence of the reflective (effect) indicators mean that their prediction within disease or other groups and subgroups, in the measurement-model portion of the MIMIC model, avoids regression-based complications, often undetected or unadjusted, that may compromise effects by the formative (causal) indicators in the structural portion of the MIMIC model. These include biases from multicollinearity, heteroscedasticity, and influential outliers that arise from differential effects in subsets of participants and from unspecified confounders, and may undermine the local independence of effects by formative (causal) indicators. Thus, the formative indicators approach appears sound to interpret the reflective (effect) indicators within disease or other groups and subgroups.
6. These are not unique effects but rather total effects of the reflective indicators comprising the symptom cluster. When the initial model of the unique effects is inestimable, we cannot calculate the total effect of each reflective indicator (i.e., its unique effect plus the effect of the latent trait or additive composite within the disease group or subgroup). Rather, this broader modeling detects these hidden effects by estimating the total effects of each reflective indicator directly.
7. The following reviews these steps in greater detail, especially in relation to Figure 1b and Table 2.
1. The first step of a protocol for conducting MIMIC models in symptom, biomarker, or metabolite panels is to run the model with all metabolites, biomarkers, or symptoms as both formative and reflective indicators, as in Figure 1b. Confounding biases may be pronounced, especially when predictors represent overall groups (i.e., without interactions among predictors that would detect effects within targeted subgroups). I recommend including formative indicators of all measurement items in all MIMIC models, especially when MIMIC models are limited to main (noninteractive) effects. In Table 2, these runs are either unlabeled or, to distinguish them when other types of MIMIC runs are reported for the same diagnosis subgroup, labeled "Full".

2. Each MIMIC model should then be rerun to account for potential suppressor effects by regressing each formative (causal) indicator on the predictors that make up the exogenous group or subgroup {i.e., the formative indicator becomes a mediator and switches from exogenous to endogenous}. These are "Mediated" runs in Table 2.
{While the formative (causal) indicators mediate unspecified confounders in all of the MIMIC analyses, the term "mediated" in this second step of the protocol refers to MIMIC models in which the formative (causal) indicators mediate the specified group or subgroup. Also, note that Table 2 reports some mediated runs in which only the four non-traditional CES-D items have formative indicators {i.e., in (A)}. In these cases, the corresponding mediated runs in which all twenty CES-D items have formative indicators {i.e., in (B)} yield no estimates because the model is unidentified; these runs are labeled "Non-Mediated" to distinguish them from the mediated runs in (A)}.

3. Some full or mediated MIMIC models do not converge to yield unique estimates of the latent trait or additive composite along with all of its reflective indicators, which leads to the third approach in the protocol. In sequential estimation, the MIMIC model is first specified and run to interpret the reflective indicators without also controlling for the latent trait or additive composite {e.g., Depression; all of the predictor arrows leading to Depression are dropped (represented by pathways 1 and 4 in Figure 1b)}. The analyst then re-specifies and reruns the model to interpret the latent trait or additive composite without also controlling for the individual reflective psychometric items {dropping all '2' pathways in Figure 1b}. These are "Sequential" runs in Table 2.
{This sequential approach detects statistically significant panel items without also adjusting for the overall level across all items, or detects a statistically significant latent trait without also adjusting for individual items. It is especially appropriate when the latent trait or additive composite is specified only as a device to model a panel of metabolites, biomarkers, or symptoms and is not inherently or substantively meaningful. For instance, the analyst may specify a panel of metabolites even though they are not necessarily all reflective indicators of a single latent trait or dimension; the latent trait merely controls for the overall level of metabolites, even if they are not unidimensional.}
4. A final approach in the protocol is to exclude the exogenous predictors representing specific groups or subgroups {e.g., all of the '1' pathways representing the diagnosis subgroup (Diabetes, Heart Failure, and their interaction) are dropped in Figure 1b} to provide estimates across the entire panel sample rather than within a predetermined disease group or subgroup. The analyst may run this final approach whether or not any of the previous approaches converge to an optimal solution. It is appropriate if there is no real basis for identifying specific groups or targeting subgroups. The unreported MIMIC model for the sample at large in the current study reveals all of the reflective indicators to be statistically significant, with items of dysphoria or low positive affect (Depressed 2, Sad 2, Blues 2, Happy 2) having the highest measurement loadings (0.616 to 1). All of the formative indicators are statistically significant, with two items of dysphoria (Depressed, Sad) having the highest regression slopes (0.724 and 0.671, respectively).
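The four steps above can be sketched as control flow. This is a hypothetical outline, not the author's software: `fit` is an assumed callable that runs one model specification and reports whether it converged to unique estimates. The specification strings and the "Sequential (...)" and "Sample-wide" labels are illustrative placeholders; only "Full", "Mediated", and "Sequential" appear in Table 2. The outline follows the text: the mediated rerun always follows the full run, sequential estimation is triggered by non-convergence, and the sample-wide run may be added regardless.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: fit(spec) returns True if that MIMIC specification
# converges to unique estimates. Spec strings are illustrative placeholders.
FitFn = Callable[[str], bool]


@dataclass
class Run:
    label: str       # illustrative run label
    converged: bool


def mimic_protocol(fit: FitFn, include_sample_wide: bool = True) -> list[Run]:
    runs = []
    # Step 1: all items as both formative and reflective indicators.
    runs.append(Run("Full", fit("full")))
    # Step 2: rerun with formative indicators regressed on the group
    # predictors (as mediators) to check for suppressor effects.
    runs.append(Run("Mediated", fit("mediated")))
    # Step 3: if a full or mediated run fails to converge, estimate
    # sequentially: items without the latent trait, then the latent trait
    # without the individual items.
    if not all(r.converged for r in runs):
        runs.append(Run("Sequential (items)", fit("sequential_items")))
        runs.append(Run("Sequential (latent trait)", fit("sequential_latent")))
    # Step 4: optionally drop the group predictors for sample-wide estimates,
    # whether or not the earlier steps converged.
    if include_sample_wide:
        runs.append(Run("Sample-wide", fit("sample_wide")))
    return runs


# Usage with a stub in which only the mediated specification fails:
runs = mimic_protocol(lambda spec: spec != "mediated")
print([r.label for r in runs])
# ['Full', 'Mediated', 'Sequential (items)', 'Sequential (latent trait)', 'Sample-wide']
```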
8. How might analysts integrate formative and reflective indicators of the same measurement items into covariance-based MIMIC models? Even though both indicators provide identical information under different variable names, it is unclear whether covariance-based estimation can accommodate both within the same MIMIC model, especially since all variables are modeled as endogenous factors (in contrast to the exogenous nature of the formative indicators in regression-based MIMIC models). For instance, the formative indicators would duplicate the pattern of covariances of the reflective indicators, the variance of each formative indicator would equal the variance of its counterpart reflective indicator, and the covariance of each formative indicator with its counterpart reflective indicator would equal this variance. Together, these three properties would make the covariance matrix non-invertible (singular). As another strategy for creating a covariance matrix for analysis, the analyst would first specify a multiple regression (without latent variables) predicting each observed psychometric item from all remaining items. This approach would yield a highly correlated, but not perfect, relationship in which the predicted values serve as an instrumental variable for each item. If the original data across all observations are not available, but only the covariance matrix is, the analyst can estimate a covariance-based structural equation predicting each item from the remaining items to create its instrumental variable. Assuming derivable estimates for each of the instrumental variables, they can serve as formative indicators, and the original items can serve as reflective indicators, in the subsequent covariance-based MIMIC model.
The covariance-based MIMIC model should result in a small residual distribution (ζ) in the structural portion of the model {unlike a regression-based MIMIC model, which fits perfectly and whose residual (ζ) is a constant equal to zero}. The analyst can either fix this residual term to zero prior to estimation or ignore it and interpret the latent trait as an additive composite. The two strategies suggested in this note need testing and vetting in different data to determine their validity and feasibility.
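The auxiliary regressions in this note can be sketched with numpy on simulated items. This is an illustration under stated assumptions, not a tested SEM workflow. First, it confirms that stacking each item with an exact copy gives a 2k × 2k covariance matrix of rank k, which cannot be inverted. Next, it builds each item's instrumental variable as the OLS prediction from all remaining items. Finally, it checks that the same slopes follow from the covariance matrix alone by solving the normal equations, which stands in here for the covariance-based structural equation the note describes. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 1_000, 5

# Hypothetical item data: a strong shared component plus item-specific noise.
items = 0.5 * rng.normal(size=(n, k)) + rng.normal(size=(n, 1))

# Duplicating the items (formative copy + reflective copy) yields a
# 2k x 2k covariance matrix of rank k, i.e., a singular matrix.
dup_cov = np.cov(np.hstack([items, items]), rowvar=False)
dup_rank = np.linalg.matrix_rank(dup_cov)


def instruments_from_data(X):
    """Predicted value of each item from OLS on all remaining items."""
    n_obs, n_items = X.shape
    preds = np.empty_like(X)
    for j in range(n_items):
        others = np.column_stack([np.ones(n_obs), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        preds[:, j] = others @ coef
    return preds


def slopes_from_cov(S, j):
    """Slopes of item j on the remaining items, using only the covariance matrix S."""
    rest = [m for m in range(S.shape[0]) if m != j]
    return np.linalg.solve(S[np.ix_(rest, rest)], S[rest, j])


ivs = instruments_from_data(items)
# Each instrument is highly, but not perfectly, correlated with its item.
iv_corr = [np.corrcoef(items[:, j], ivs[:, j])[0, 1] for j in range(k)]

# Covariance-only slopes match the raw-data OLS slopes for item 0.
S = np.cov(items, rowvar=False)
cov_slopes = slopes_from_cov(S, 0)
data_X = np.column_stack([np.ones(n), items[:, 1:]])
data_slopes = np.linalg.lstsq(data_X, items[:, 0], rcond=None)[0][1:]

print("rank of duplicated covariance matrix:", dup_rank)
print("item-instrument correlations:", np.round(iv_corr, 2))
```

The covariance-only route applies when, as the note says, only the covariance matrix is available; it yields slopes but not observation-level predicted values, which require the raw data.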
9. This novel approach with formative (causal) indicators that partial out biases from confounders may provide an alternative to, or a replication of, analyses that rely on metabolite set enrichment, which compares scores generated from sets of metabolites under different conditions using prior knowledge from previous research about sets of genes involved in cellular processes [36,39]. Analysts use metabolite set enrichment especially when knowledge from the current specific context is lacking or confounded, or is difficult to assemble or derive from external databases. The approach with formative (causal) indicators may also be an alternative to, or serve to replicate, analyses based on Mendelian randomization, a technique that uses genetic variants as instrumental variables to estimate causal relationships between metabolites and traits or diseases. The genetic variants (instrumental variables) are not associated with confounders because of their random assignment from parental genotypes during the formation of gametes [40]. On the other hand, I advise caution when findings are derived entirely or primarily from statistical modeling using formative (causal) indicators without reliance on sets of genes or genetic variants for substantive justification and corroboration. In these circumstances, the statistical approach with formative indicators may validate the application of metabolite set enrichment or Mendelian randomization, but its lack of, or insufficient, genetic information means it may be questionable as a complete replacement.
That a latent trait can be postulated and estimated when only reflective indicators are used in CFA does not necessarily mean the derived latent trait is the most valid estimate of the true latent trait. A true latent trait should be capable of being modeled by dissimilar variation across formative indicators that do not in themselves constitute a cluster of reflective indicators.
The formative indicators provide additional, exogenous modeling information that reveals statistically significant reflective indicators and clusters by identifying this more plausible latent trait, which is equivalent to the additive composite of the formative indicators. By capturing all of the variation across these formative indicators (i.e., R² = 1), this modeling makes latent factor scores determinate at the level of individual observations, because they are equivalent to the additive composite scores. This contrasts with the indeterminacy of factor scores for individual observations from CFA outside this MIMIC framework.
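The contrast between indeterminate CFA factor scores and determinate composite scores can be quantified. The sketch below uses a hypothetical standardized one-factor model. It computes the squared multiple correlation of the factor with its indicators, λ'Σ⁻¹λ, a standard index of factor-score determinacy that stays below 1 whenever unique variances are positive, and checks it against the closed form s/(1 + s) with s = Σ λ²/θ. It then shows that a composite defined as an exact weighted sum of the items (hypothetical weights) is recovered with R² = 1.

```python
import numpy as np

# Hypothetical standardized one-factor CFA: loadings and unique variances.
loadings = np.array([0.8, 0.75, 0.7, 0.6, 0.55])
uniques = 1 - loadings**2
sigma = np.outer(loadings, loadings) + np.diag(uniques)

# Squared multiple correlation of the factor with the items (determinacy).
rho_sq = loadings @ np.linalg.solve(sigma, loadings)
s = np.sum(loadings**2 / uniques)
closed_form = s / (1 + s)

# A composite defined as an exact weighted sum of the items (hypothetical
# weights) is perfectly recovered by regressing it on the items: R^2 = 1.
rng = np.random.default_rng(5)
items = rng.multivariate_normal(np.zeros(loadings.size), sigma, size=2_000)
weights = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
composite = items @ weights
X = np.column_stack([np.ones(len(composite)), items])
coef, *_ = np.linalg.lstsq(X, composite, rcond=None)
resid = composite - X @ coef
r_squared = 1 - resid.var() / composite.var()

print(f"CFA factor-score determinacy: {rho_sq:.3f}")
print(f"composite R^2: {r_squared:.6f}")
```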
3. Furthermore, this unique context of perfect fit, in which the same measurement items are used as both formative and reflective indicators, means that both types of indicators can be assumed to have internal consistency. Of course, the reflective indicators inter-correlate, since they all stem from a common cause (the latent trait, which in this case is equivalent to the weighted additive composite). Because the formative indicators are the same measurement items specified as reflective indicators, they must also be inter-correlated. This suggests that they share similar antecedents and consequences, although not necessarily completely, since it is the unique, or non-shared, variation within each formative indicator that predicts the weighted additive composite. These properties make it attractive to use the estimated weights to specify, a priori, an additive composite with these fixed weights for use in subsequent MIMIC or other structural equation models (e.g., multiple regression), in either the same or different samples.
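Reusing the weights a priori amounts to computing a fixed-weight composite in the new data rather than re-estimating the weights. The minimal sketch below is hypothetical throughout: the weights are placeholders rather than the estimates reported in this study, the item scores are simulated on the 0 to 3 range used by CES-D items, and `exposure` is an invented binary predictor. The fixed-weight composite then serves as the outcome of a simple multiple regression.

```python
import numpy as np


def fixed_weight_composite(items, weights):
    """Weighted additive composite with weights fixed a priori (not re-estimated)."""
    items = np.asarray(items, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if items.shape[1] != weights.size:
        raise ValueError("one weight per item is required")
    return items @ weights


# Hypothetical weights carried over from an earlier MIMIC run (placeholders).
weights = np.array([0.5, 0.4, 0.3, 0.2])

# New sample: simulated item scores on a 0-3 scale and a hypothetical exposure.
rng = np.random.default_rng(6)
new_items = rng.integers(0, 4, size=(300, weights.size))
exposure = rng.binomial(1, 0.4, size=300)

score = fixed_weight_composite(new_items, weights)

# Use the fixed-weight composite as the outcome of a simple multiple regression.
X = np.column_stack([np.ones(score.size), exposure])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
print("intercept, exposure slope:", coef.round(3))
```

Keeping the weights fixed makes composite scores comparable across samples, at the cost of assuming the original weights transfer to the new population.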