Machine Learning Diagnosis and Local Shrinkage of Covariance-Level Pathologies in SEM

Vanbrabant, Leonard; Rosseel, Yves

doi:10.3390/math14111936


by Leonard Vanbrabant ¹,²,* and Yves Rosseel

¹ Department of Data-Analysis, Ghent University, B-9000 Ghent, Belgium

² Research Department, GGD West-Brabant, 4816 CZ Breda, The Netherlands

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(11), 1936; https://doi.org/10.3390/math14111936

Submission received: 15 April 2026 / Revised: 27 May 2026 / Accepted: 28 May 2026 / Published: 2 June 2026


Abstract

Non-convergence in small-sample structural equation modeling (SEM) is frequently driven by localized pathologies in the sample covariance matrix—specific covariance patterns that, for a given model specification, substantially elevate the risk of non-convergence. We propose a diagnose–localize–shrinkage framework in which a machine learning classifier predicts SEM non-convergence and SHAP values localize the covariance pairs that drive this prediction. Sample covariance matrices are generated independently of the fitted SEM and modified through controlled pathology injection. A fixed SEM is then fitted to each matrix, and the resulting convergence status is used as the prediction target. The classifier is trained on the unique off-diagonal elements of the corresponding correlation matrix. The SHAP-identified covariance pairs are then used to construct a local shrinkage step toward a well-conditioned model-based target matrix to evaluate whether SEM convergence can be restored with limited distortion of the original covariance pattern. We demonstrate the proposed method using an illustrative example.

Keywords:

structural equation modeling; non-convergence; small samples; covariance matrix pathology; local shrinkage; machine learning; SHAP

MSC:

62H25

1. Introduction

Non-convergence is a persistent and practically relevant problem in structural equation modeling (SEM). When an estimation algorithm fails to converge, parameter estimates, standard errors, and model fit measures are unavailable, which prevents researchers from addressing their substantive research questions.

Empirically, non-convergence occurs disproportionately in small samples, where two distinct datasets of equal (small) sample size may differ in convergence status even when the same SEM specification is fitted to both. This relationship is well documented [1,2], but is often summarized as insufficient information or increased sampling variability. While such explanations are correct, they do not fully capture the mechanisms through which small samples lead to estimation failure. In SEM, parameter estimation is usually based on the discrepancy between the sample covariance (VCOV) matrix S and its model-implied counterpart $\Sigma$. In small samples, this discrepancy depends less on the exact model specification and more on the numerical properties of the VCOV matrix. Throughout the remainder of this paper, the term VCOV refers to the sample covariance matrix S, unless explicitly stated otherwise.

Limited sample sizes often produce VCOV matrices that are ill-conditioned (i.e., with a large spread in eigenvalues), nearly singular (i.e., with eigenvalues close to zero), or, for more general sample covariance/correlation matrices such as tetrachoric or polychoric correlation matrices or matrices based on pairwise deletion, even indefinite (i.e., containing negative eigenvalues). In the context of a VCOV matrix, eigenvalues indicate how much of the total variance is explained by independent linear combinations of variables. Large eigenvalues correspond to dominant directions of variation in the data, whereas eigenvalues close to zero signal strong linear dependencies or numerical instability. The condition number, defined as the ratio of the largest to the smallest eigenvalue, quantifies the numerical stability of the matrix: a large condition number indicates that the matrix is ill-conditioned and that small changes in the data may lead to large changes in parameter estimates.

Extreme covariances, unstable eigenvalues, and inflated condition numbers are therefore not exceptional, but may occur more frequently when information is limited. These properties directly affect the computation of the discrepancy function, its gradient, and its Hessian. As a result, optimization algorithms may be unable to find a good solution, may get stuck near the boundary of the parameter space, or may terminate due to numerical instability. From this perspective, non-convergence should not be viewed solely as an optimization failure, but as an indication of problematic VCOV patterns caused by sampling variability, sparse information, or the way the covariance/correlation matrix was estimated.
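As a small worked illustration of the condition number (our own example, not part of the original analysis), consider a 2 × 2 correlation matrix with off-diagonal element $r$. Its eigenvalues are $1 + r$ and $1 - r$, so a single correlation close to $\pm 1$ is already enough to inflate the condition number:

```latex
\kappa(S) = \frac{\lambda_{\max}(S)}{\lambda_{\min}(S)}, \qquad
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}
\;\Rightarrow\;
\kappa(R) = \frac{1 + |r|}{1 - |r|}, \qquad
r = 0.99 \;\Rightarrow\; \kappa(R) = \frac{1.99}{0.01} = 199 .
```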

A range of approaches has been proposed to address small-sample and numerically unstable SEM. Broadly, these can be divided into two categories. First, some approaches modify the input statistics by adjusting the sample VCOV matrix prior to estimation. A widely used strategy is shrinkage estimation [3,4,5,6,7,8], in which the sample VCOV matrix is combined with a structured, well-conditioned target so as to improve numerical stability and increase the likelihood of positive definiteness. Recent work has extended this idea to small-sample multilevel SEM by replacing the sample covariance matrix with a regularized shrinkage estimate [9]. Second, other approaches leave the input statistics unchanged but modify the estimation procedure or restrict the parameter space. These include regularized and penalized SEM [10,11,12], although recent simulations show that these methods themselves can suffer from non-convergence [13]; Bayesian regularized SEM [14,15,16,17,18]; bounded estimation [19,20]; and various forms of factor score regression [21,22,23].

The broader covariance-estimation literature has shown that shrinkage can substantially improve the conditioning and stability of covariance estimates, particularly when the sample covariance matrix is noisy, high-dimensional, or poorly conditioned, e.g., [24]. These developments provide the general statistical rationale for replacing an unstable sample covariance matrix by a better-conditioned compromise between the sample matrix and a structured target. However, such approaches are typically designed to improve the covariance estimator as a whole. They do not aim to identify which specific covariance elements contribute to estimation failure in a given SEM.

This limitation extends beyond covariance-based approaches: regularization at the parameter level—whether through penalized maximum likelihood, Bayesian priors, or factor score regression—equally treats the SEM as a global optimization problem, without diagnosing which parts of the sample covariance pattern are responsible for estimation failure.

The current paper builds on the work of [4], who propose a shrinkage approach specifically designed for SEM. They show that shrinkage approaches, in which the VCOV matrix is combined with a highly structured and well-conditioned shrinkage target matrix T, substantially improve convergence rates in small samples. More formally, the method constructs an adjusted VCOV matrix as a weighted combination of S and T, given by $(1 - \lambda) S + \lambda T$, where $\lambda$ denotes the shrinkage intensity with $0 \leq \lambda \leq 1$. However, the shrinkage is applied uniformly to the entire VCOV matrix and therefore does not identify which covariance elements drive non-convergence in any particular case.
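The global shrinkage combination can be sketched in a few lines. This is a minimal illustration under our own assumptions: the function name `shrink`, the 2 × 2 matrices, and the identity target are hypothetical and are not the model-based target used in [4].

```python
def shrink(S, T, lam):
    """Return (1 - lam) * S + lam * T for two square matrices given as nested lists."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(S)
    return [[(1.0 - lam) * S[i][j] + lam * T[i][j] for j in range(p)] for i in range(p)]


# Hypothetical example: a nearly singular sample matrix and an identity target.
S = [[1.0, 0.98], [0.98, 1.0]]
T = [[1.0, 0.0], [0.0, 1.0]]
S_adj = shrink(S, T, lam=0.25)  # every off-diagonal element is pulled toward 0
```

Note that every element is moved by the same intensity, which is exactly the uniformity that the localized approach in this paper tries to avoid.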

We address this diagnostic gap by treating non-convergence as an observable outcome of the interaction between the VCOV matrix and a specified SEM. The same VCOV matrix may lead to non-convergence for one SEM specification while converging without difficulty for another. To study this interaction, we construct a large-scale, model-independent adversarial matrix-generating process that produces a wide range of VCOV matrices spanning realistic and stress-test conditions, including near-singular, indefinite, and highly correlated cases. A pre-specified SEM is then fitted to each generated VCOV matrix to obtain a convergence label.

Using these labeled matrices, we train a machine learning model—inspired by ([25], Chapter 5)—to predict SEM non-convergence using only information from the VCOV matrix. SHAP (SHapley Additive exPlanations) values are then used to decompose the prediction into feature-specific contributions [26]. This allows us to identify covariance pairs that contribute most strongly to the predicted risk of non-convergence. Based on these identified pairs, we outline a localized shrinkage strategy that targets destabilizing covariance patterns while preserving the overall structure of the VCOV matrix as much as possible.

2. Materials and Methods

A SEM may fail to converge when fitted to a given VCOV matrix, but it is often unclear which parts of the VCOV matrix are responsible for this behavior. To illustrate and study this mechanism in a controlled setting, we consider the simple SEM shown in Figure 1 as a running example. For this model, the number of observed variables is $p = 6$.

2.1. Adversarial Generation of VCOV Matrices and Pathology Injection

In contrast to traditional SEM simulations that rely on correctly or incorrectly specified population models, we separate sample covariance/correlation patterns from model truth. To this end, we adopt a model-agnostic simulation approach with controlled pathology injection [27], where sample covariance/correlation matrices ($R_{s}$) are generated independently of any fitted model and then modified in controlled ways to reflect realistic but problematic empirical situations. This stress-test design is conceptually related to adversarial contamination ideas [28]. It allows us to study non-convergence as a function of the sample-matrix pattern itself, rather than as a by-product of model misspecification.

Because the matrices are generated directly rather than sampled from finite datasets, the matrix-generation mechanism is independent of sample size. The injected pathologies should therefore be interpreted as matrix-level stress-test conditions designed to mimic numerical features often encountered in small-sample SEM, such as near-singularity, unstable local correlations, and poor conditioning.

Figure 2 summarizes the full diagnose–localize–shrinkage workflow; the individual steps are described in the subsections that follow.

In practice, we first sampled, with equal probability, a baseline template of size $p \times p$, with $p = 6$, from a heterogeneous set of generators designed to capture common dependence patterns observed in practice (i.e., Wishart-type, block-structured, Toeplitz-like, low-rank, and uniform random templates). Illustrative examples of the baseline templates are presented in Appendix A. Next, we injected controlled pathologies by manipulating the smallest eigenvalue and local covariance patterns; see Appendix B for a detailed description of the injection mechanisms. Pathology severity was controlled using a pre-specified distribution: clean cases were sampled with probability 45%, mild cases with 25%, moderate cases with 15%, severe cases with 10%, and extreme cases with 5%. Pathology types included (i) near-singularity, generated by shrinking the smallest eigenvalue to a severity-dependent range, (ii) indefiniteness, generated by forcing a small negative eigenvalue with severity-dependent magnitude, and (iii) extreme covariance clusters, created by imposing high within-block correlations for a randomly selected subset of variables. A mixed condition combined these mechanisms while avoiding unrealistically extreme matrices.
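The severity sampling described above can be sketched as follows. Only the five probabilities are taken from the text; the function name, the seed, and the use of Python's `random.choices` are illustrative assumptions rather than the authors' implementation.

```python
import random

# Severity distribution as stated in the text.
SEVERITY_PROBS = {
    "clean": 0.45,
    "mild": 0.25,
    "moderate": 0.15,
    "severe": 0.10,
    "extreme": 0.05,
}


def sample_severities(n, seed=0):
    """Draw n severity labels according to SEVERITY_PROBS (hypothetical helper)."""
    rng = random.Random(seed)
    levels = list(SEVERITY_PROBS)
    weights = list(SEVERITY_PROBS.values())
    return rng.choices(levels, weights=weights, k=n)


draws = sample_severities(50_000)
# Empirical share of each severity level among the draws.
shares = {level: draws.count(level) / len(draws) for level in SEVERITY_PROBS}
```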

A total of 50,000 adversarial VCOV matrices were sampled. All generated matrices were symmetrized and standardized to correlation form, with a unit diagonal. The “clean” condition was generated using a rejection criterion that retained only well-conditioned, strictly positive definite matrices. Candidate matrices that did not satisfy these criteria were discarded and resampled. If repeated attempts failed, a nearest positive definite projection [29] was used to obtain a stable clean reference.
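A rejection criterion of this kind can be sketched without an eigenvalue routine. The check below is a sketch under our own assumptions: it tests whether all eigenvalues exceed a tolerance by attempting a Cholesky factorization of $R - \epsilon I$, and the tolerance 0.05 is a hypothetical value, not the threshold used in the study. For a correlation matrix of size $p$ the eigenvalues sum to $p$, so passing this check also bounds the condition number by $p / \epsilon$.

```python
import math


def min_eig_exceeds(R, eps=0.0):
    """True if every eigenvalue of the symmetric matrix R exceeds eps.

    R - eps*I is positive definite exactly when its Cholesky factorization
    succeeds with strictly positive pivots.
    """
    p = len(R)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = R[i][i] - eps - s
                if pivot <= 0.0:
                    return False
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return True


# Hypothetical candidates: one clean, one near-singular, one indefinite.
clean = [[1.0, 0.3], [0.3, 1.0]]
near_singular = [[1.0, 0.99], [0.99, 1.0]]
indefinite = [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]]

accepted = [min_eig_exceeds(M, eps=0.05) for M in (clean, near_singular, indefinite)]
```

In this toy setting only the first candidate would be retained as a clean matrix; the other two would be discarded and resampled.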

To evaluate whether the adversarial matrix-generation procedure produced the intended numerical variation, we computed model-independent diagnostics. Specifically, we inspected the smallest eigenvalue, the condition number, the maximum absolute off-diagonal correlation, and the proportion of matrices with at least one non-positive eigenvalue across pathology severity levels.

Figure 3 shows that the severity levels induced systematic variation in matrix stability. Clean matrices were generally well-conditioned and positive definite, whereas matrices from more severe conditions showed poorer conditioning, more extreme local correlations, and more frequent proximity to, or violation of, positive definiteness. Thus, the adversarial generation procedure produced a heterogeneous set of sample matrices with the intended matrix-level numerical properties.

In a second step, a SEM (see Figure 1) was fitted to each adversarially generated correlation matrix $R_{s}$ using both the R [30] packages lavaan ([31], version 0.6-21) and OpenMx ([32], version 2.21.13). The convergence status reported by each package was recorded, with 1 indicating that the model failed to converge and 0 indicating that the model converged. After all iterations had been completed, only cases with equal convergence status in both packages were retained for further analysis. This filtering step reduced the influence of optimizer-specific behavior. A model was classified as converged if (i) the optimizer reported convergence and (ii) all standard errors could be computed, indicating that the local curvature information required for standard error estimation was available.

In total, $N = 13{,}506$ cases converged in both lavaan and OpenMx, whereas $N = 30{,}922$ resulted in non-convergence in both packages. The resulting dataset therefore consisted of the convergence status and the vectorized lower-triangular elements of the correlation matrix. The rate of non-convergence increased with the severity of the injected pathology (see Table 1). This pattern provided an internal validity check of the adversarial data-generating process, confirming that increasing numerical severity was associated with an increased likelihood of non-convergence. The deviation from this trend in the extreme category was likely related to inadmissible solutions, indicating that numerical convergence alone is not a sufficient indicator of model validity. The aim of this study, however, is not to guarantee admissible model solutions, but to increase the likelihood of convergence in the presence of problematic correlation patterns.
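The vectorization of the lower-triangular correlation elements can be sketched as below. The variable names $x_1, \dots, y_3$ follow the running example; the `a~b` labeling and the function name are our own illustrative choices. For $p = 6$ this yields $p(p-1)/2 = 15$ features per matrix.

```python
def lower_triangle_features(R, names):
    """Return {"row~col": r} for the strictly lower-triangular elements of R."""
    p = len(names)
    return {
        f"{names[i]}~{names[j]}": R[i][j]
        for j in range(p)             # column index
        for i in range(j + 1, p)      # row index strictly below the diagonal
    }


names = ["x1", "x2", "x3", "y1", "y2", "y3"]
# Hypothetical block-structured correlation matrix used only for illustration.
R = [
    [1.0 if i == j else (0.80 if (i < 3) == (j < 3) else 0.16) for j in range(6)]
    for i in range(6)
]
features = lower_triangle_features(R, names)
```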

2.2. An XGBoost Model to Detect Local Correlation Instability

To identify local correlation instability associated with non-convergence, we trained a gradient-boosted decision-tree classifier using the Extreme Gradient Boosting algorithm, implemented in the R package xgboost ([33,34], version 3.1.3.1).

The prediction target was coded as 0 for convergence and 1 for non-convergence. The predictors consisted of the vectorized lower-triangular elements of the correlation matrix. To stabilize the scale of the correlation features and to reduce the influence of extreme values near $\pm 1$, all correlations were transformed using the Fisher-z transformation before model fitting.
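The Fisher-z transformation is $z = \operatorname{atanh}(r) = \tfrac{1}{2} \ln \frac{1 + r}{1 - r}$. A minimal sketch follows; the clipping of $|r|$ slightly below 1 is our own assumption to keep the transform finite for correlations of exactly $\pm 1$, and the text does not say how such values were handled.

```python
import math


def fisher_z(r, clip=0.999999):
    """Fisher z-transform atanh(r); |r| is clipped below 1 (an assumption) to keep z finite."""
    r = max(-clip, min(clip, r))
    return math.atanh(r)


# Hypothetical correlation features.
features = {"x3~x2": 0.025, "y1~x2": -0.385, "y2~y1": 0.5}
z_features = {pair: fisher_z(r) for pair, r in features.items()}
```

The transform is nearly linear for small correlations and stretches values near $\pm 1$, which is why it evens out the scale of the features.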

The full dataset was split into training, validation, and test sets using a stratified sampling procedure, thereby preserving the proportion of converged and non-converged cases across splits. Specifically, 70% of the data were assigned to the training set, 15% to the validation set, and the remaining 15% to the test set. This three-way split allowed model development and hyperparameter tuning to be performed without using information from the final test set.
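A stratified 70/15/15 split can be sketched as follows. This is an illustration under our own assumptions (function name, seed, rounding rule, and the toy label vector are hypothetical); it shuffles the indices within each class and cuts each class at the same fractions so that the class shares are preserved.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return index lists (train, valid, test) with class shares preserved."""
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_valid = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:n_train + n_valid])
        test.extend(idx[n_train + n_valid:])   # remainder goes to the test set
    return train, valid, test


# Hypothetical labels: 1 = non-convergence, 0 = convergence.
labels = [1] * 700 + [0] * 300
train, valid, test = stratified_split(labels)
```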

Model training used a binary logistic objective function with a fixed learning rate of 0.1. To account for class imbalance between converged and non-converged cases, class weighting was applied to the loss function. The optimal number of boosting iterations was selected using 5-fold cross-validation within the training data, with early stopping based on cross-validation performance to reduce the risk of overfitting.

Predicted probabilities were obtained for the validation and test sets. The validation set was used during model development, whereas the test set was reserved for the final evaluation of model performance. Performance was assessed using multiple metrics, including logloss, AUC-PR, AUC-ROC, and the Brier score. The results in Table 2 show that the XGBoost model discriminated well between converged and non-converged cases. Although predictive accuracy is not the primary objective of the diagnostic model, an adequate level of discrimination is required before model attributions can be used to identify correlation patterns associated with non-convergence.
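Two of the reported metrics, the Brier score and the logloss, are simple enough to sketch directly from their definitions. The labels and predicted probabilities below are hypothetical and are not results from Table 2; the clipping constant in `log_loss` is an assumption that guards against taking the logarithm of zero.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative Bernoulli log-likelihood of the 0/1 outcomes."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(1.0 - eps, max(eps, p))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical outcomes (1 = non-convergence) and predicted probabilities.
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.6, 0.8]
brier = brier_score(y, p)
ll = log_loss(y, p)
```

Both metrics are lower for better-calibrated predictions, which complements the ranking-based AUC measures.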

A final model was trained on the combined training and validation data using the optimal number of boosting iterations identified above.

SHAP-Based Localization of Unhappy Correlation Pairs

Next, we identified the specific correlation pairs that contributed most to the predicted risk of non-convergence, referred to here as unhappy correlations. The detailed algorithmic steps are provided in Appendix C. Here, we summarize the procedure at a conceptual level.

To localize unhappy correlations, SHAP values were used to decompose the model prediction into additive contributions of individual correlation pairs. For each observation (i.e., each correlation matrix), the SHAP value of a feature quantified how much the corresponding correlation pair increased or decreased the predicted risk of non-convergence relative to the model baseline. Global importance was summarized by the mean absolute SHAP value across all observations, providing a ranking of the correlation features most strongly associated with non-convergence.

Two qualifications of this procedure should be noted. First, many commonly used SHAP implementations rely, either explicitly or implicitly, on assumptions about feature independence that are not strictly met in the present setting: the features are pairwise correlations from a single matrix and are therefore structurally dependent. This dependence can affect how attribution is distributed across closely related correlation pairs [35]. Second, SHAP values quantify contributions to model predictions rather than causal effects on convergence. We therefore used SHAP as a localization device: it identified which correlation pairs the classifier associated most strongly with predicted non-convergence, rather than providing a causal explanation of why the SEM failed to converge. The practical usefulness of the localization was evaluated empirically by examining whether the subsequent targeted repair step restored convergence for the original matrix.

The selection of unhappy correlations proceeded in three steps. First, only correlation pairs with positive SHAP values were considered, because these pairs increased the predicted risk of non-convergence according to the diagnostic classifier. Negative SHAP values were ignored because they indicated correlation pairs associated with a lower predicted risk of non-convergence. Second, the absolute SHAP value of each pair was evaluated relative to its empirical distribution across the training data. This yielded a percentile score that reflected how extreme the contribution was compared with typical cases. Third, a combined score was computed as the product of the SHAP value and its percentile rank. Only correlation pairs that exceeded the predefined thresholds were retained.

This procedure ensured that the selected unhappy correlations were both influential for the current case and unusually large relative to what was typically observed in the training data. To limit unnecessary modifications, the number of unhappy correlations per case could be capped. The final set of unhappy correlations was then used in the targeted repair step.
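The three selection steps can be sketched as below. This is a hedged illustration, not the algorithm of Appendix C: the percentile is taken here as the share of absolute training SHAP values not exceeding the case value, the default thresholds 0.50 and 0.05 are those reported later for the case study, and all SHAP values in the example are hypothetical.

```python
from bisect import bisect_right


def select_unhappy(shap_case, shap_train, min_percentile=0.50, min_strength=0.05,
                   max_pairs=None):
    """Select 'unhappy' correlation pairs for one case.

    shap_case  : {pair: SHAP value} for the matrix being diagnosed
    shap_train : {pair: list of SHAP values for that pair in the training data}
    """
    selected = []
    for pair, value in shap_case.items():
        if value <= 0.0:                      # step 1: risk-increasing pairs only
            continue
        reference = sorted(abs(v) for v in shap_train[pair])
        percentile = bisect_right(reference, abs(value)) / len(reference)  # step 2
        strength = value * percentile                                      # step 3
        if percentile >= min_percentile and strength >= min_strength:
            selected.append((pair, value, percentile, strength))
    selected.sort(key=lambda row: row[3], reverse=True)
    return selected if max_pairs is None else selected[:max_pairs]


# Hypothetical SHAP values.
shap_train = {
    "x3~x2": [0.01, -0.02, 0.05, 0.10],
    "y1~x2": [0.02, 0.03, -0.04, 0.30],
    "y2~y1": [0.01, 0.02, 0.03, 0.04],
}
shap_case = {"x3~x2": 0.20, "y1~x2": 0.04, "y2~y1": -0.10}
unhappy = select_unhappy(shap_case, shap_train)
```

In this toy case only the first pair survives: the second is positive but too weak once weighted by its percentile, and the third has a negative SHAP value.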

2.3. Shrinkage Approach for Targeted Repair of Correlation Patterns

In the previous section we identified a small set of unhappy correlations associated with non-convergence. We next describe how these unhappy correlations can be used to construct a localized shrinkage update with minimal distortion of the original correlation pattern.

2.3.1. Construction of the Model-Based Target Correlation Matrix

The localized repair operator shrinks selected correlations in $R_{s}$ toward a well-conditioned and highly structured target matrix $R_{target}$. Rather than relying on a generic identity or constant-correlation target, we used a model-based target matrix that reflected the assumed measurement structure of the SEM. This choice was inspired by the model-based shrinkage framework of [4], but differs in that the target matrix was used here as a reference for localized repair rather than as a global replacement of the VCOV matrix.

For the six-indicator example, the target correlation matrix has the block structure

$$
R_{target} =
\begin{pmatrix}
1.00 \\
0.80 & 1.00 \\
0.80 & 0.80 & 1.00 \\
0.16 & 0.16 & 0.16 & 1.00 \\
0.16 & 0.16 & 0.16 & 0.80 & 1.00 \\
0.16 & 0.16 & 0.16 & 0.80 & 0.80 & 1.00
\end{pmatrix} .
$$

We constructed this target using a well-conditioned SEM with two latent factors and three indicators per factor (see Figure 1). The factor loadings were fixed at 0.7, implying equally strong indicators. Indicator reliabilities were fixed at $0.8$, yielding homogeneous residual variances; all latent correlations were set to $0.2$; and residual errors were assumed to be uncorrelated. The resulting model-implied covariance matrix was then rescaled to correlation form.
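The target matrix can be reproduced from these settings. The sketch below assumes unit latent variances and defines reliability as $\lambda^2 / (\lambda^2 + \theta)$ with loading $\lambda$ and residual variance $\theta$; both are our assumptions, and the function name is hypothetical. Under them, the within-factor correlation equals the reliability (0.80) and the between-factor correlation equals $0.80 \times 0.2 = 0.16$, matching the matrix shown above.

```python
def target_correlation(loading=0.7, reliability=0.8, latent_cor=0.2,
                       n_factors=2, per_factor=3):
    """Model-implied correlation matrix of a simple-structure factor model."""
    # Residual variance implied by: loading^2 / (loading^2 + theta) = reliability.
    theta = loading ** 2 * (1.0 - reliability) / reliability
    variance = loading ** 2 + theta          # implied variance of every indicator
    p = n_factors * per_factor
    R = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                cov = variance
            elif i // per_factor == j // per_factor:
                cov = loading ** 2                  # indicators of the same factor
            else:
                cov = loading ** 2 * latent_cor     # indicators of different factors
            R[i][j] = cov / variance                # rescale to correlation form
    return R


R_target = target_correlation()
```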

2.3.2. Localized Shrinkage Update of Unhappy Correlations

A purely local update of only the unhappy correlations may be too restrictive, because non-convergence can reflect broader dependency patterns rather than a single destabilizing correlation. Therefore, the adjustment of each unhappy correlation was also propagated to connected correlations. The size of this secondary adjustment depended on the strength of the corresponding correlation and was controlled by a smooth sigmoid weighting function. Correlations more strongly connected to an unhappy correlation received a larger adjustment, resulting in a more coherent modification of the correlation matrix.

For a given shrinkage level $\lambda$, this procedure yielded a candidate repaired correlation matrix $R(\lambda)$ that preserved symmetry and a unit diagonal by construction. However, the localized shrinkage update did not guarantee positive definiteness. Therefore, for each candidate value of $\lambda$, we computed the smallest eigenvalue of $R(\lambda)$ and rejected the candidate unless

$$
\lambda_{\min}\{R(\lambda)\} > \epsilon,
$$

where $\epsilon$ is a small positive tolerance. The shrinkage level $\lambda$ was selected by line search. Starting from a small initial value, $\lambda$ was increased until $R(\lambda)$ was positive definite and the SEM fitted to $R(\lambda)$ converged. The first accepted candidate was retained, corresponding to the smallest accepted value of $\lambda$ and therefore minimizing distortion of the original correlation pattern.
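The line search can be sketched as follows. Several parts are assumptions made for illustration: the grid (start 0.001, step 0.001), the tolerance, the purely local update without neighbor propagation, and the `converges` callback, which stands in for refitting the SEM and is hypothetical. The eigenvalue condition is checked through a Cholesky factorization of $R(\lambda) - \epsilon I$ rather than an explicit eigendecomposition.

```python
import math


def min_eig_exceeds(R, eps):
    """True if all eigenvalues of symmetric R exceed eps (Cholesky of R - eps*I succeeds)."""
    p = len(R)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = R[i][i] - eps - s
                if pivot <= 0.0:
                    return False
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return True


def local_update(R, target, pairs, lam):
    """Shrink only the listed (i, j) pairs toward the target; keep everything else."""
    out = [row[:] for row in R]
    for i, j in pairs:
        out[i][j] = out[j][i] = (1.0 - lam) * R[i][j] + lam * target[i][j]
    return out


def line_search(R, target, pairs, converges, lam_start=0.001, lam_step=0.001, eps=1e-6):
    """Return the smallest grid value of lambda whose candidate is accepted."""
    n_steps = int(round((1.0 - lam_start) / lam_step)) + 1
    for k in range(n_steps):
        lam = lam_start + k * lam_step
        candidate = local_update(R, target, pairs, lam)
        if min_eig_exceeds(candidate, eps) and converges(candidate):
            return lam, candidate
    return None, None


# Toy 3 x 3 example: the pair (2, 1) makes the matrix indefinite.
R = [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]]
target = [[1.0, 0.8, 0.8], [0.8, 1.0, 0.8], [0.8, 0.8, 1.0]]
# Stand-in for the SEM refit: every positive definite candidate "converges".
lam, repaired = line_search(R, target, pairs=[(2, 1)], converges=lambda M: True)
```

In practice the callback would refit the SEM to each candidate, so the accepted $\lambda$ can be larger than the value at which positive definiteness is first reached.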

2.3.3. Jackknife-Based Reference Interval

To assess whether a repaired correlation introduces changes that exceed the natural variability of the observed data, we evaluated the size of each correlation adjustment relative to a jackknife-based reference interval. The jackknife provides an estimate of the sensitivity of each correlation coefficient to individual observations.

Starting from the observed dataset with n observations, we computed leave-one-out correlation matrices by removing one observation at a time and recomputing the correlation matrix. This procedure yielded n jackknife correlation matrices. For each correlation pair, the 95% interval across these matrices reflected the range of correlation values that one may expect from ordinary single-observation fluctuations in the data.

After the repair step, each repaired correlation was compared with its jackknife interval. If the repaired correlation fell within the interval, the adjustment was considered consistent with the natural sampling variability of the data.
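For a single correlation pair, the jackknife reference interval can be sketched as below. The linear-interpolation percentile rule and the toy data are our own assumptions; the text does not state which percentile definition was used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile(values, q):
    """Linearly interpolated percentile for q in [0, 1] (an assumed convention)."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


def jackknife_interval(x, y, lower=0.025, upper=0.975):
    """Percentile interval of the leave-one-out correlations."""
    loo = [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]
    return percentile(loo, lower), percentile(loo, upper)


def within_interval(r_repaired, interval):
    """True if the repaired correlation lies inside the jackknife interval."""
    return interval[0] <= r_repaired <= interval[1]


# Hypothetical data for one pair of variables.
x = [0.1, 0.5, 0.9, 1.4, 1.8, 2.2, 2.9, 3.1]
y = [0.3, 0.2, 1.1, 0.9, 2.1, 1.7, 2.6, 3.4]
interval = jackknife_interval(x, y)
```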

3. Results

Illustrative Case Study

This section illustrates the diagnose–localize–shrinkage pipeline using a controlled example. The goal is to show how a non-convergent SEM fit can be stabilized through a local adjustment of the covariance matrix.

We simulated a dataset of size $n = 30$ from the population model shown in Figure 1. The random seed was chosen such that fitting the SEM to the resulting VCOV matrix would not converge.

Step 1: SEM fit on $R_{s}$ (non-convergence).

We fitted the SEM using lavaan with maximum likelihood estimation. As expected, the model did not converge:

lavaan 0.6-21 did NOT end normally after 2302 iterations
** WARNING ** Estimates below are most likely unreliable
	 
  Estimator                                           ML
  Optimization method                             NLMINB
  Number of model parameters                          13
	 
  Number of observations                              30

Step 2: Diagnose and localize the instability.

The VCOV matrix was transformed to its correlation matrix $R_{s}$ (see Table 3), after which the unique off-diagonal correlations were Fisher-z transformed and evaluated using the trained XGBoost diagnosis model. For each correlation pair, a SHAP value was computed to quantify its contribution to the model’s prediction of non-convergence for this specific case.

Unhappy correlations were selected using the SHAP-based procedure described in Section SHAP-Based Localization of Unhappy Correlation Pairs, with two criteria: a percentile score of at least 0.50 and a minimum strength score of 0.05. These thresholds were chosen heuristically; they are not intended as optimal cut-off values, but as practical criteria for identifying a small set of unhappy correlations. Table 4 shows the resulting SHAP values, percentile scores, and strength scores for all correlation pairs.

This resulted in two unhappy correlations, $x_{3}$–$x_{2}$ and $y_{1}$–$x_{2}$. The positive SHAP value for $x_{3}$–$x_{2}$ indicated that this correlation pair increased the predicted risk of non-convergence for the example matrix. The second selected pair, $y_{1}$–$x_{2}$, also contributed positively to the non-convergence prediction. These two pairs therefore defined the primary repair targets. The localized shrinkage update was then applied to the correlation pattern surrounding these targets: the unhappy correlations received the strongest adjustment, while connected correlations were adjusted simultaneously according to the sigmoid weighting scheme.

Step 3: Repair with sigmoid-smoothed shrinkage.

The repair extended beyond the two unhappy correlations. These unhappy correlations were treated as the primary repair targets and received the direct shrinkage adjustment. In addition, a sigmoid smoothing function was applied to correlations that shared a variable with an unhappy correlation, so that the surrounding correlation pattern could also be adjusted. For each connected correlation $r_{ij}$, a weight was computed using a logistic function with center = 0.65, sharpness = 10, weight_min = 0.00, and weight_max = 0.30. As a result, connected correlations with an absolute value well below 0.65 received virtually no additional adjustment, whereas connected correlations substantially above 0.65 approached the maximum adjustment weight of 0.30. The parameter sharpness = 10 yielded a relatively steep transition around the center point. Consequently, the unhappy correlations received the primary shrinkage adjustment, while moderate to strong connected correlations were adjusted proportionally and weak connected correlations remained essentially unchanged. For the selected unhappy correlations and their neighbors, the shrinkage update is

$$
r_{ij}^{(rep)} = (1 - \lambda_{used}) \, r_{ij}^{(n)} + \lambda_{used} \, r_{ij}^{(target)} ,
$$

where $r_{ij}^{(n)}$ is the corresponding sample correlation in $R_{s}$ and $r_{ij}^{(target)}$ is the corresponding element of $R_{target}$.
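The sigmoid weighting of connected correlations can be sketched as follows. The four parameter values are those reported above; the exact functional form (a logistic curve in $|r|$ rescaled to the interval from weight_min to weight_max) and the function name are our assumptions, and the sketch only computes the weights, not how they enter the update.

```python
import math


def neighbor_weight(r, center=0.65, sharpness=10.0, weight_min=0.00, weight_max=0.30):
    """Logistic weight for a connected correlation, increasing in |r| (assumed form)."""
    s = 1.0 / (1.0 + math.exp(-sharpness * (abs(r) - center)))
    return weight_min + (weight_max - weight_min) * s


# Weights for a weak, a borderline, and a strong connected correlation.
weights = {r: neighbor_weight(r) for r in (0.20, 0.65, 0.95)}
```

Under this form, a correlation exactly at the center receives half of the maximum weight (0.15), a weak correlation of 0.20 receives a weight close to zero, and a strong correlation of 0.95 receives a weight close to 0.30.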

The shrinkage level determined by line search was $\lambda_{used} = 0.016$. Applying $(1 - 0.016) R_{s} + 0.016 R_{target}$ to the selected correlations and their neighbors yielded the repaired correlations summarized in Table 5, which jointly reports the sample correlations ($R_{s}$), repaired values ($R_{rep}$), and corresponding repair adjustments ($\Delta R = R_{rep} - R_{s}$).

The two unhappy correlations changed only slightly:

$$
R_{s}(x_{3}, x_{2}) = 0.025 \to R_{rep}(x_{3}, x_{2}) = 0.037,
$$

$$
R_{s}(y_{1}, x_{2}) = -0.385 \to R_{rep}(y_{1}, x_{2}) = -0.382 .
$$
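The first of these values can be checked by hand. Assuming the variable order $x_1, x_2, x_3, y_1, y_2, y_3$, the target value for the within-factor pair $x_{3}$–$x_{2}$ is 0.80, and the shrinkage update with $\lambda_{used} = 0.016$ gives:

```latex
r_{x_3 x_2}^{(rep)} = (1 - 0.016) \times 0.025 + 0.016 \times 0.80
                    = 0.0246 + 0.0128
                    = 0.0374 \approx 0.037 .
```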

No nearest positive definite projection was required. The repair therefore consisted of a minimal adjustment. As shown in Table 5, all changes were small in magnitude, with adjustments typically on the order of $10^{-3}$ to $10^{-2}$. The largest modifications occurred for correlations directly connected to the unhappy ones, while the majority of correlations remained nearly unchanged.

To put the size of the repair into perspective, we compared the repaired correlations with the variability of the observed correlations using a leave-one-out jackknife procedure. Starting from the observed dataset with 30 observations, we recomputed the correlation matrix 30 times, each time leaving out one observation. This yielded 30 jackknife correlation matrices. For each correlation pair, we determined the 2.5th and 97.5th percentiles across the jackknife replicates, representing the range of correlation values expected under ordinary sampling fluctuations.

The resulting 95% jackknife bounds are also reported in Table 5. Comparing $R_{rep}$ with these bounds showed that all repaired correlations remained well within the range induced by sampling variability. This indicated that the repair was effective and remained small relative to the natural variability in the observed correlations.

Thus, the repair can be understood as a local stabilization of the correlation pattern identified by the SHAP diagnosis. Rather than replacing the full matrix, the update moved the destabilizing pairs and their connected correlations toward a stable target matrix.

Step 4: SEM refit on $R_{rep}$ (convergence).

The original covariance matrix S was rescaled to its corresponding correlation matrix $R_{s}$ to remove scale effects. Model training, diagnosis, and shrinkage were performed on the correlation scale. Because this transformation is reversible, the repaired matrix can be mapped back to a covariance matrix, and the SEM can therefore be refitted on the covariance scale if desired. We refitted the same SEM to $R_{rep}$ using maximum likelihood estimation. After the repair, the model converged normally after 41 iterations:

lavaan 0.6-21 ended normally after 41 iterations
 
  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13
 
  Number of observations                            30

Thus, the originally non-convergent SEM converged after a small, SHAP-guided local shrinkage update of the identified destabilizing correlation patterns.
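The rescaling between the covariance and correlation scale used in this step can be sketched as follows. The function names and the 2 × 2 covariance matrix are hypothetical; the point is only that storing the standard deviations makes the transformation reversible, so a repaired correlation matrix can be mapped back to the covariance scale.

```python
import math


def cov_to_cor(S):
    """Return the correlation matrix of S together with the standard deviations."""
    p = len(S)
    sd = [math.sqrt(S[i][i]) for i in range(p)]
    R = [[S[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]
    return R, sd


def cor_to_cov(R, sd):
    """Map a correlation matrix back to the covariance scale using stored SDs."""
    p = len(R)
    return [[R[i][j] * sd[i] * sd[j] for j in range(p)] for i in range(p)]


# Hypothetical covariance matrix with standard deviations 2 and 3.
S = [[4.0, 1.2], [1.2, 9.0]]
R, sd = cov_to_cor(S)
S_back = cor_to_cov(R, sd)   # recovers S up to floating-point error
```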

4. Discussion

In this paper, we proposed a diagnose–localize–shrinkage framework to address non-convergence in small-sample structural equation modeling (SEM). In contrast to existing approaches, which treat non-convergence either as a global optimization problem or as a direct consequence of limited information, our framework conceptualizes non-convergence as the result of local pathologies in the sample covariance matrix and their interaction with a given SEM specification.

A central contribution of this paper is the strict separation between model structure and diagnostic information. Covariance matrices are generated adversarially and independently of the SEM, while convergence is treated solely as an outcome variable. This design avoids information leakage from the model into the diagnostic stage. This separation is important, as many simulation studies confound model misspecification, sampling variability, and numerical instability, which makes it difficult to identify the mechanisms underlying non-convergence. The results show that specific local covariance patterns, such as extreme or internally inconsistent covariances, are systematically associated with non-convergence, even when the SEM itself is correctly specified.

Furthermore, this study demonstrates that machine learning models, such as XGBoost, can serve as a diagnostic tool in SEM. The XGBoost classifier was not primarily used to maximize predictive accuracy, but to identify via SHAP values which specific covariance pairs contribute to non-convergence for a given model. The strong predictive performance (test AUC–ROC of 0.944 and AUC–PR of 0.971; Table 2) indicates that problematic covariance configurations are consistent and detectable. Importantly, these patterns are interpretable at the level of individual covariance elements, which makes local shrinkage possible.

Another key feature of the proposed framework is the local nature of the shrinkage update. Global shrinkage methods, such as ridge or Tikhonov regularization, have well-understood effects on the covariance or correlation matrix, but they modify the matrix as a whole. By contrast, the proposed approach restricts the update to a small set of unhappy correlations and their connected correlations. The illustrative case study shows that relatively small, targeted adjustments can be sufficient to restore convergence without substantially altering the overall covariance patterns.

We do not claim that the local repair has the same general statistical interpretation as global shrinkage. This local procedure is data-, model-, and diagnosis-dependent. It therefore has a diagnostic purpose: to show whether targeted local adjustments can restore convergence under a specified target structure. The repaired matrix can be used as an alternative sample matrix for subsequent SEM fitting, although users may also choose other remedies, such as changing the model, collecting more data, using another estimator, applying global shrinkage, or reporting the instability without repair.

A natural refinement of the procedure is to apply corrections sequentially rather than simultaneously, starting with the most influential unhappy correlation, refitting the model, and continuing only if needed. This strategy may achieve convergence with smaller overall modifications, although a larger shrinkage factor per step may be required. The trade-off between the number of corrected correlations and the magnitude of each adjustment can be monitored against the jackknife-based variability bounds described in Section 2.3.3. We leave a systematic evaluation of this sequential variant to future work.
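The control flow of such a sequential variant can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Section 2.3.2: it applies a plain linear shrinkage to one flagged element at a time, ignores connected correlations and the positive-definiteness check, and uses a hypothetical `converges` callable as a stand-in for an SEM refit. The function name, the `weight` parameter, and the toy matrices are likewise hypothetical.

```python
def sequential_repair(R, T, ranked_pairs, converges, weight=0.1):
    """Shrink one flagged pair at a time toward the target T until `converges` is True.

    R, T         : correlation matrices as nested lists of equal dimension
    ranked_pairs : (i, j) index pairs, most influential first
    converges    : hypothetical callable standing in for an SEM refit
    weight       : hypothetical per-step shrinkage factor in [0, 1]
    """
    R = [row[:] for row in R]                  # work on a copy
    corrected = []
    for i, j in ranked_pairs:
        if converges(R):
            break                              # stop as soon as the refit succeeds
        new_value = (1 - weight) * R[i][j] + weight * T[i][j]
        R[i][j] = R[j][i] = new_value          # keep the matrix symmetric
        corrected.append((i, j))
    return R, corrected, converges(R)

# Toy inputs; the "refit" is a stand-in rule, not a real SEM fit.
R0 = [[1.0, 0.95, 0.2], [0.95, 1.0, 0.1], [0.2, 0.1, 1.0]]
T0 = [[1.0, 0.30, 0.2], [0.30, 1.0, 0.1], [0.2, 0.1, 1.0]]
toy_refit = lambda M: abs(M[0][1]) < 0.9

R_new, corrected, ok = sequential_repair(R0, T0, [(0, 1), (0, 2)], toy_refit)
print(corrected, ok)
```

In the toy run only the first pair is adjusted, because the stand-in refit already succeeds before the second pair is considered.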

Several limitations should be acknowledged. First, the diagnostic step is inherently model-specific. The same covariance matrix may lead to non-convergence for one SEM specification while posing no problems for another. Although this dependency is an explicit part of the framework, it implies that diagnostic models need to be retrained for substantively different SEM structures. Second, the analysis focuses on convergence as a binary outcome. Convergence alone does not guarantee substantive validity. As illustrated by the extreme pathology condition, numerical convergence may coincide with biased parameter estimates, distorted standard errors, or inadmissible solutions, particularly in small samples.

Closely related to this issue is the question of how local shrinkage affects the statistical properties of the estimated model parameters. Although the shrinkage strategy is targeted, its impact on parameter bias, variance, mean squared error, and substantive research questions is currently unknown. In line with the model-based shrinkage approaches in [4], future simulation studies are needed to systematically evaluate these effects. The proposed framework should therefore be viewed as a stabilizing preprocessing step rather than as a guarantee of model validity.

An additional limitation is that the effects of the hyperparameters in the sigmoid transformation used to scale local covariance adjustments are not yet well understood. Different choices for its slope and midpoint may influence both the strength and the localization of the repair. Similarly, the selection of the quantile-based cutoff values used to identify extreme SHAP contributions is currently heuristic: no principled criterion for their selection exists yet, and it is unclear how sensitive the results are to these choices.

A further limitation concerns the diagnostic instrument itself. As discussed in Section SHAP-Based Localization of Unhappy Correlation Pairs, SHAP-based attribution assumes feature independence that does not strictly hold when features are pairwise correlations from a single matrix, and SHAP values quantify contributions to model predictions rather than causal effects on convergence. In our framework, the validity of the localization is therefore not asserted on the basis of the SHAP values alone but evaluated empirically, through whether the proposed repair restores convergence. Future work could examine attribution methods that explicitly accommodate feature dependence, or compare SHAP-based localization with alternative selection strategies—for instance, sensitivity-based approaches that perturb individual covariance pairs and measure the resulting change in convergence.

Despite these limitations, the proposed framework offers something that existing approaches to non-convergence do not: a case-specific diagnosis of which covariance elements drive estimation failure for a given SEM, combined with a localized intervention whose effect is empirically verifiable. Global shrinkage, regularization, and Bayesian priors stabilize estimation but treat the input covariance pattern as a black box. The framework introduced here opens that box, identifies which specific correlation pairs the classifier associates with non-convergence, and offers a minimally invasive adjustment that can be evaluated against the natural variability of the observed data.

Author Contributions

Conceptualization, L.V. and Y.R.; methodology, L.V. and Y.R.; software, L.V.; validation, L.V. and Y.R.; formal analysis, L.V.; investigation, L.V.; data curation, L.V.; writing—original draft preparation, L.V.; writing—review and editing, Y.R.; visualization, L.V.; supervision, Y.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Special Research Fund (BOF) of Ghent University through the BOF Basic Research Funding (Grant/Project No. BOF/BAF/4Y/2024/01/1035).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All tools required to reproduce this illustrative example or to apply the method to other models are available at https://github.com/LeonardV/repairing_unhappy_covariances (accessed on 27 May 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Illustrative Baseline Templates (p = 6)

This appendix shows one illustrative example of each baseline template used for the adversarially generated VCOV matrices. Each matrix is transformed to a correlation matrix with unit diagonal and constructed such that its characteristic structure is visible from the numerical configuration of its entries.

Appendix A.1. Wishart-Type Template

$$R_{wishart} = \begin{bmatrix}
1.00 & 0.18 & -0.12 & 0.09 & -0.21 & 0.05 \\
0.18 & 1.00 & 0.26 & -0.08 & 0.11 & -0.17 \\
-0.12 & 0.26 & 1.00 & 0.14 & -0.06 & 0.22 \\
0.09 & -0.08 & 0.14 & 1.00 & 0.19 & -0.10 \\
-0.21 & 0.11 & -0.06 & 0.19 & 1.00 & 0.07 \\
0.05 & -0.17 & 0.22 & -0.10 & 0.07 & 1.00
\end{bmatrix}$$

The mix of moderate positive and negative correlations reflects the structure typically obtained from $S = A^{\top} A$ when $A$ has Gaussian entries.
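A Wishart-type template of this kind can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the authors' generator: the function name, the seed, and the number of rows `n = 30` of the Gaussian matrix are hypothetical choices.

```python
import math
import random

def wishart_type_correlation(p, n, seed=0):
    """Build S = A^T A from an n x p standard-normal matrix A, then rescale to correlations."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # S[i][j] is the inner product of columns i and j of A.
    S = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    sd = [math.sqrt(S[i][i]) for i in range(p)]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

# n = 30 is a hypothetical choice for illustration.
R = wishart_type_correlation(p=6, n=30)
for row in R:
    print("  ".join(f"{value:5.2f}" for value in row))
```

Because the columns of `A` are independent here, the off-diagonal entries scatter around zero with mixed signs, which is the qualitative pattern visible in the template above.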

Appendix A.2. Block-Structured Template

Variables within the same block are strongly correlated, while correlations between blocks are small. This pattern clearly indicates two distinct blocks.

Appendix A.3. Toeplitz-like Template

$$R_{toeplitz} = \begin{bmatrix}
1.00 & 0.50 & 0.25 & 0.12 & 0.06 & 0.03 \\
0.50 & 1.00 & 0.50 & 0.25 & 0.12 & 0.06 \\
0.25 & 0.50 & 1.00 & 0.50 & 0.25 & 0.12 \\
0.12 & 0.25 & 0.50 & 1.00 & 0.50 & 0.25 \\
0.06 & 0.12 & 0.25 & 0.50 & 1.00 & 0.50 \\
0.03 & 0.06 & 0.12 & 0.25 & 0.50 & 1.00
\end{bmatrix}$$

Correlations depend solely on the lag $|i - j|$, producing constant subdiagonals parallel to the main diagonal. Correlations become smaller as variables are further apart, which characterizes ordered dependence.
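The printed entries are consistent, up to rounding, with a geometric decay $r_{ij} = 0.5^{|i - j|}$. This is an inference from the displayed values, not a statement about the authors' generator; under that assumption the template can be reproduced as follows (the function name and the parameter `rho` are illustrative).

```python
def toeplitz_correlation(p, rho):
    """Correlation matrix with entries rho ** |i - j| (constant along each diagonal)."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

# rho = 0.5 is inferred from the printed template, not stated in the text.
R = toeplitz_correlation(p=6, rho=0.5)
for row in R:
    print("  ".join(f"{value:.2f}" for value in row))
```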

Appendix A.4. Low-Rank Template

$$R_{lowrank} = \begin{bmatrix}
1.00 & 0.77 & 0.73 & -0.74 & 0.64 & 0.75 \\
0.77 & 1.00 & 0.83 & -0.85 & 0.73 & 0.85 \\
0.73 & 0.83 & 1.00 & -0.80 & 0.69 & 0.80 \\
-0.74 & -0.85 & -0.80 & 1.00 & -0.70 & -0.82 \\
0.64 & 0.73 & 0.69 & -0.70 & 1.00 & 0.71 \\
0.75 & 0.85 & 0.80 & -0.82 & 0.71 & 1.00
\end{bmatrix}$$

The matrix has an approximately low-rank pattern. The first eigenvalue explains 80.1% of the total variance, indicating that most dependence is driven by a single dominant component.
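The share of variance carried by the first eigenvalue can be checked from the printed matrix. The sketch below uses a plain power iteration (our choice for a dependency-free illustration, not necessarily how the reported figure was computed); because the entries are rounded to two decimals, the result is expected to be close to, but not necessarily identical with, the reported 80.1%.

```python
import math

R_lowrank = [
    [ 1.00,  0.77,  0.73, -0.74,  0.64,  0.75],
    [ 0.77,  1.00,  0.83, -0.85,  0.73,  0.85],
    [ 0.73,  0.83,  1.00, -0.80,  0.69,  0.80],
    [-0.74, -0.85, -0.80,  1.00, -0.70, -0.82],
    [ 0.64,  0.73,  0.69, -0.70,  1.00,  0.71],
    [ 0.75,  0.85,  0.80, -0.82,  0.71,  1.00],
]

def largest_eigenvalue(M, iterations=200):
    """Power iteration for the dominant eigenvalue of a symmetric matrix."""
    p = len(M)
    v = [1.0] * p
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' M v for the normalized vector v.
        lam = sum(v[i] * M[i][j] * v[j] for i in range(p) for j in range(p))
    return lam

# The trace of a correlation matrix equals p, so the share is lambda_1 / p.
share = largest_eigenvalue(R_lowrank) / len(R_lowrank)
print(f"first-eigenvalue share: {share:.3f}")
```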

Appendix A.5. Uniform Random Template

$$R_{uniform} = \begin{bmatrix}
1.00 & -0.61 & 0.08 & 0.47 & -0.22 & 0.31 \\
-0.61 & 1.00 & -0.35 & 0.12 & 0.58 & -0.06 \\
0.08 & -0.35 & 1.00 & -0.74 & 0.19 & 0.44 \\
0.47 & 0.12 & -0.74 & 1.00 & -0.11 & -0.52 \\
-0.22 & 0.58 & 0.19 & -0.11 & 1.00 & -0.67 \\
0.31 & -0.06 & 0.44 & -0.52 & -0.67 & 1.00
\end{bmatrix}$$

The matrix represents a uniform random pattern, as its off-diagonal entries are sampled from a uniform distribution, leading to freely varying signs and magnitudes across the matrix.

Appendix B. Pathology Injection Mechanisms

Starting from a $p \times p$ baseline correlation matrix R as described in Appendix A, controlled pathologies are injected to create correlation structures that are known to cause numerical difficulties in structural equation modeling. Pathologies are applied according to a severity level $s \in \{1, 2, 3, 4\}$, where 1 = mild, 2 = moderate, 3 = severe, and 4 = extreme deviation from a well-conditioned correlation matrix.

Appendix B.1. Near-Singularity

Near-singular matrices are created by shrinking the smallest eigenvalue of R. Let $R = V \Lambda V^{\top}$ denote the eigendecomposition of R. The smallest eigenvalue $\lambda_{\min}$ is replaced by a target value drawn from a severity range:

$$\lambda_{\min}^{*} \sim \begin{cases} 10^{U(-2, -1)} & s = 1 \\ 10^{U(-3, -2)} & s = 2 \\ 10^{U(-4, -3)} & s = 3 \\ 10^{U(-5, -4)} & s = 4 \end{cases}$$

where $U(a, b)$ denotes a uniform distribution on $[a, b]$. The modified matrix is reconstructed as

$$R^{*} = V \Lambda^{*} V^{\top}.$$

Finally, the matrix is rescaled to a correlation matrix to ensure unit diagonal.
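The near-singularity injection can be sketched with NumPy as follows. The function names, the seed, and the Toeplitz-like baseline are illustrative assumptions; only the exponent ranges and the sequence of steps (replace the smallest eigenvalue, reconstruct, rescale) are taken from the description above.

```python
import numpy as np

# Exponent ranges for lambda_min* = 10 ** U(a, b), one per severity level.
EXPONENT_RANGE = {1: (-2, -1), 2: (-3, -2), 3: (-4, -3), 4: (-5, -4)}

def cov_to_cor(M):
    """Rescale a symmetric matrix with positive diagonal to unit diagonal."""
    d = np.sqrt(np.diag(M))
    return M / np.outer(d, d)

def inject_near_singularity(R, severity, rng):
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    a, b = EXPONENT_RANGE[severity]
    eigvals = eigvals.copy()
    eigvals[0] = 10.0 ** rng.uniform(a, b)      # replace the smallest eigenvalue
    R_star = eigvecs @ np.diag(eigvals) @ eigvecs.T
    return cov_to_cor(R_star)                   # restore the unit diagonal

rng = np.random.default_rng(1)                  # hypothetical seed
lags = np.abs(np.subtract.outer(np.arange(6), np.arange(6)))
R = 0.5 ** lags                                 # Toeplitz-like baseline, for illustration
R_star = inject_near_singularity(R, severity=3, rng=rng)
print("smallest eigenvalue after injection:", np.linalg.eigvalsh(R_star)[0])
```

Note that the final rescaling changes the eigenvalues slightly, so the smallest eigenvalue of the returned matrix is close to, but generally not equal to, the drawn target value.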

Appendix B.2. Indefinite Matrices

Indefinite matrices are created by forcing the smallest eigenvalue to become negative. Again using the eigendecomposition $R = V \Lambda V^{\top}$, the smallest eigenvalue is replaced by

$$\lambda_{\min}^{*} \sim \begin{cases} -U(0.01, 0.05) & s = 1 \\ -U(0.05, 0.10) & s = 2 \\ -U(0.10, 0.25) & s = 3 \\ -U(0.25, 0.50) & s = 4. \end{cases}$$

The resulting matrix $R^{*} = V \Lambda^{*} V^{\top}$ is numerically adjusted (i.e., sanitized). Specifically, non-finite entries are replaced by zero, the matrix is symmetrized as $R = (R + R^{\top}) / 2$, and the diagonal is set to one.
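A minimal NumPy sketch of the indefiniteness injection and the sanitizing step is given below. Function names, the seed, and the baseline matrix are hypothetical; the severity ranges and the three sanitizing operations follow the description above.

```python
import numpy as np

# Ranges for the magnitude of the negative eigenvalue, one per severity level.
NEGATIVE_RANGE = {1: (0.01, 0.05), 2: (0.05, 0.10), 3: (0.10, 0.25), 4: (0.25, 0.50)}

def sanitize(M):
    M = np.where(np.isfinite(M), M, 0.0)   # non-finite entries -> 0
    M = (M + M.T) / 2.0                    # symmetrize
    np.fill_diagonal(M, 1.0)               # set the diagonal to one
    return M

def inject_indefiniteness(R, severity, rng):
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    low, high = NEGATIVE_RANGE[severity]
    eigvals = eigvals.copy()
    eigvals[0] = -rng.uniform(low, high)   # force the smallest eigenvalue negative
    return sanitize(eigvecs @ np.diag(eigvals) @ eigvecs.T)

rng = np.random.default_rng(2)             # hypothetical seed
lags = np.abs(np.subtract.outer(np.arange(6), np.arange(6)))
R = 0.5 ** lags                            # Toeplitz-like baseline, for illustration
R_star = inject_indefiniteness(R, severity=4, rng=rng)
print("smallest eigenvalue after injection:", np.linalg.eigvalsh(R_star)[0])
```

Resetting the diagonal to one adds a non-negative amount to each diagonal element, so the smallest eigenvalue of the sanitized matrix is somewhat larger than the injected value; for this baseline and severity it remains negative.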

Appendix B.3. Correlation Clusters

Correlation clusters are created by selecting a random subset of variables and imposing high bivariate correlations within that subset. Let k denote the cluster size, sampled uniformly from

$$k \sim \{3, \dots, \min(5, p)\}.$$

For the selected subset, bivariate correlations $\rho$ are drawn from severity ranges:

$$\rho \sim \begin{cases} U(0.60, 0.80) & s = 1 \\ U(0.75, 0.90) & s = 2 \\ U(0.85, 0.95) & s = 3 \\ U(0.95, 0.98) & s = 4. \end{cases}$$

These values are inserted into the corresponding submatrix while keeping the diagonal equal to one. The matrix is then sanitized.
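The cluster injection can be sketched as follows. Two points are assumptions of this sketch rather than facts from the text: a separate $\rho$ is drawn for each pair within the cluster (a single shared value per cluster would be an equally plausible reading), and the final sanitizing step is omitted. Function names, the seed, and the identity baseline are illustrative.

```python
import random

# Severity-specific ranges for the within-cluster correlations.
CLUSTER_RANGE = {1: (0.60, 0.80), 2: (0.75, 0.90), 3: (0.85, 0.95), 4: (0.95, 0.98)}

def inject_cluster(R, severity, rng):
    """Impose high correlations within a randomly chosen subset of variables."""
    p = len(R)
    R = [row[:] for row in R]                    # work on a copy
    k = rng.randint(3, min(5, p))                # cluster size, bounds inclusive
    members = rng.sample(range(p), k)
    low, high = CLUSTER_RANGE[severity]
    for a in range(k):
        for b in range(a + 1, k):
            i, j = members[a], members[b]
            # Assumption: one independent draw per pair.
            R[i][j] = R[j][i] = rng.uniform(low, high)
    return R, sorted(members)

rng = random.Random(3)                           # hypothetical seed
identity = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
R_star, members = inject_cluster(identity, severity=2, rng=rng)
print("cluster members:", members)
```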

Appendix B.4. Mixed Pathologies

The mixed condition combines multiple mechanisms in order to mimic more realistic empirical pathologies. The applied combinations depend on severity:

$s = 1$: correlation cluster + mild near-singularity;
$s = 2$: correlation cluster + moderate near-singularity;
$s = 3$: strong correlation cluster + mild indefiniteness;
$s = 4$: strong correlation cluster + moderate indefiniteness + near-singularity.

The resulting matrix is finally rescaled to a correlation matrix.

Appendix C. SHAP-Based Localization of Unhappy Correlation Pairs

Here, we localize the specific correlation pairs that contribute most to predicted non-convergence. The following steps describe how the observed correlation matrix is processed and reduced to a small set of unhappy correlations.

Step 0: Fixed feature schema and vectorization

We start from a correlation matrix $R_{s}$ of size $p \times p$. Each feature corresponds to one off-diagonal element from the lower triangle of $R_{s}$, stored under a fixed naming scheme (the schema) of the form Vi_Vj. Using this schema, the matrix is vectorized as

$$x = \operatorname{vech}(R_{s}) \in \mathbb{R}^{p(p - 1)/2}.$$
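The vectorization can be illustrated with the sample correlation matrix from Table 3. In the sketch below, the ordering of the two names within a feature label (row variable first, then column variable, as in the pair labels of Table 4) is an assumption, and the function name is hypothetical.

```python
# Lower triangle of R_s copied from Table 3 (row i holds columns 0..i).
NAMES = ["x1", "x2", "x3", "y1", "y2", "y3"]
LOWER = [
    [1.00],
    [0.48, 1.00],
    [0.22, 0.03, 1.00],
    [-0.14, -0.39, -0.22, 1.00],
    [-0.11, 0.03, -0.07, 0.23, 1.00],
    [-0.08, -0.25, -0.21, 0.36, 0.24, 1.00],
]

def vectorize(lower, names):
    """Map the strictly lower triangle to a dict keyed by '<row>_<column>'."""
    features = {}
    for i in range(1, len(lower)):
        for j in range(i):                      # j < i: skip the diagonal
            features[f"{names[i]}_{names[j]}"] = lower[i][j]
    return features

x = vectorize(LOWER, NAMES)
print(len(x), x["x2_x1"], x["y1_x2"])
```

For $p = 6$ this yields $6 \cdot 5 / 2 = 15$ features, matching the 15 correlation pairs listed in Table 4.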

Step 1: Fisher-z transform

Prior to classification, each correlation $r_{ij}$ is transformed using the Fisher-z transformation,

$$z_{ij} = \frac{1}{2} \log\left(\frac{1 + r_{ij}}{1 - r_{ij}}\right),$$

with numerical clipping (using a tolerance of $1 \times 10^{-6}$) to keep $r_{ij}$ away from $\pm 1$ in practice. The XGBoost model is trained and evaluated on these Fisher-z features.

Step 2: Pairwise contribution analysis using SHAP

For a given case, SHAP expresses the model prediction as a sum of contributions from individual features. Let $\phi_{ij}$ denote the SHAP value for correlation pair $(i, j)$, and let $\phi_{0}$ denote the baseline prediction of the model. On the logit scale,

$$\operatorname{logit}(\hat{p}) = \phi_{0} + \sum_{i < j} \phi_{ij}.$$

Each $\phi_{ij}$ indicates whether, and by how much, correlation pair $(i, j)$ increases ($\phi_{ij} > 0$) or decreases ($\phi_{ij} < 0$) the predicted likelihood of non-convergence, relative to the baseline and keeping all other correlations fixed.
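The additive decomposition can be illustrated with the SHAP values reported in Table 4. The baseline $\phi_{0}$ is not reported in this part of the paper, so the value used below is purely hypothetical and serves only to show how the logit is converted into a predicted probability.

```python
import math

# SHAP values copied from Table 4 (rounded to two decimals).
shap_values = {
    "x3-x2": 0.82, "y1-x2": 0.45, "x2-x1": 0.18, "y3-x2": 0.15, "y2-x2": 0.06,
    "y2-y1": 0.04, "y1-x1": 0.03, "y3-y2": -0.04, "y2-x1": -0.04, "y3-x3": -0.06,
    "y3-x1": -0.07, "y1-x3": -0.10, "x3-x1": -0.19, "y2-x3": -0.20, "y3-y1": -0.32,
}

phi_0 = 1.0  # hypothetical baseline on the logit scale (not given in the source)

total_contribution = sum(shap_values.values())
logit = phi_0 + total_contribution
p_hat = 1.0 / (1.0 + math.exp(-logit))       # inverse logit

risk_increasing = [pair for pair, phi in shap_values.items() if phi > 0]
print(f"sum of SHAP values: {total_contribution:.2f}")
print(f"predicted probability under the assumed baseline: {p_hat:.3f}")
print("pairs that raise the predicted risk:", risk_increasing)
```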

Step 3: Identify unhappy correlations

Unhappy correlation selection starts by keeping only those correlation pairs that increase the predicted risk of non-convergence. In our implementation, we use the threshold

$$\phi_{ij} > 0,$$

so that only unhappy correlation pairs are kept.

Step 4: Filtering by global SHAP importance

A positive SHAP value may be important for a single case, but we additionally require that it is unusual compared to the overall behavior of the classifier. To examine this, we construct the empirical distribution of absolute SHAP values across all observations and all features in the training and validation data,

$$D = \{ |\phi_{ij}^{(m)}| : m \in \text{train} + \text{valid},\ i < j \}.$$

For each candidate pair, we compute its percentile within this distribution,

$$\pi_{ij} = F_{D}(|\phi_{ij}|),$$

where $F_{D}$ denotes the empirical cumulative distribution function of $D$. By default, we keep only pairs satisfying

$$\pi_{ij} \geq \tau_{percentile},$$

where $\tau_{percentile} \in (0, 1)$ is a user-defined threshold.
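The percentile $\pi_{ij}$ is simply the empirical CDF of the reference distribution evaluated at $|\phi_{ij}|$. A minimal sketch is given below; the reference pool and the threshold are made-up numbers for illustration, and the function name is hypothetical.

```python
from bisect import bisect_right

def ecdf_percentile(reference, value):
    """F_D(value): share of reference values that are less than or equal to value."""
    ordered = sorted(reference)
    return bisect_right(ordered, value) / len(ordered)

# Hypothetical pool of absolute SHAP values from training and validation cases.
reference_pool = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.50, 0.80, 1.00, 1.50]
tau_percentile = 0.80                      # hypothetical threshold

pi = ecdf_percentile(reference_pool, abs(0.82))
print(pi, pi >= tau_percentile)
```

In practice the pool would contain one absolute SHAP value per feature and per training or validation case, so it only needs to be sorted once and can be reused for every candidate pair.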

Step 5: Strength

Local impact and global relevance are combined into a strength score,

$$a_{ij} = \phi_{ij} \pi_{ij}.$$

Pairs are retained if

$$a_{ij} \geq \tau_{strength},$$

where $\tau_{strength} > 0$ is a user-specified tuning parameter.

Step 6: Final unhappy correlation selection

All remaining pairs are ranked in decreasing order of $a_{ij}$.
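Steps 3 to 6 can be traced with the SHAP values and percentiles reported in Table 4. The two thresholds below are hypothetical: the values actually used are not stated in this appendix, and these were chosen only so that the sketch returns the two pairs highlighted in Table 4. Because the inputs are rounded, the computed strengths may differ from the tabulated ones in the last digit.

```python
# (pair, SHAP value, percentile) copied from Table 4.
rows = [
    ("x3-x2", 0.82, 0.96), ("y1-x2", 0.45, 0.81), ("x2-x1", 0.18, 0.47),
    ("y3-x2", 0.15, 0.39), ("y2-x2", 0.06, 0.17), ("y2-y1", 0.04, 0.11),
    ("y1-x1", 0.03, 0.07), ("y3-y2", -0.04, 0.11), ("y2-x1", -0.04, 0.12),
    ("y3-x3", -0.06, 0.17), ("y3-x1", -0.07, 0.19), ("y1-x3", -0.10, 0.27),
    ("x3-x1", -0.19, 0.49), ("y2-x3", -0.20, 0.50), ("y3-y1", -0.32, 0.70),
]

TAU_PERCENTILE = 0.80   # hypothetical threshold for Step 4
TAU_STRENGTH = 0.30     # hypothetical threshold for Step 5

selected = []
for pair, phi, percentile in rows:
    if phi <= 0:                         # Step 3: keep risk-increasing pairs only
        continue
    if percentile < TAU_PERCENTILE:      # Step 4: require global importance
        continue
    strength = phi * percentile          # Step 5: a_ij = phi_ij * pi_ij
    if strength >= TAU_STRENGTH:
        selected.append((pair, strength))

selected.sort(key=lambda item: item[1], reverse=True)   # Step 6: rank by strength
for pair, strength in selected:
    print(f"{pair}: strength = {strength:.2f}")
```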

References

1. Bentler, P.M.; Yuan, K.H. Structural Equation Modeling with Small Samples: Test Statistics. Multivar. Behav. Res. 1999, 34, 181–197.
2. Nevitt, J.; Hancock, G.R. Evaluating Small Sample Approaches for Model Test Statistics in Structural Equation Modeling. Multivar. Behav. Res. 2004, 39, 439–478.
3. Arruda, E.H.; Bentler, P.M. A Regularized GLS for Structural Equation Modeling. Struct. Equ. Model. A Multidiscip. J. 2017, 24, 657–665.
4. De Jonckere, J.; Rosseel, Y. A Model-Based Shrinkage Target to Avoid Nonconvergence in Small Sample SEM. Struct. Equ. Model. A Multidiscip. J. 2023, 30, 941–955.
5. Touloumis, A. Nonparametric Stein-type shrinkage covariance matrix estimators in high-dimensional settings. Comput. Stat. Data Anal. 2015, 83, 251–261.
6. Yuan, K.H.; Chan, W. Structural equation modeling with near singular covariance matrices. Comput. Stat. Data Anal. 2008, 52, 4842–4858.
7. Yuan, K.H.; Wu, R.; Bentler, P.M. Ridge Structural Equation Modeling with Correlation Matrices for Ordinal and Continuous Data. Br. J. Math. Stat. Psychol. 2011, 64, 107–133.
8. Yuan, K.H.; Chan, W. Structural equation modeling with unknown population distributions: Ridge Generalized Least Squares. Struct. Equ. Model. A Multidiscip. J. 2016, 23, 163–179.
9. Walther, J.K.; Hecht, M.; Zitzmann, S. Shrinking Small Sample Problems in Multilevel Structural Equation Modeling via Regularization of the Sample Covariance Matrix. Struct. Equ. Model. A Multidiscip. J. 2025, 32, 46–65.
10. Huang, P.H.; Chen, H.; Weng, L.J. A Penalized Likelihood Method for Structural Equation Modeling. Psychometrika 2017, 82, 329–354.
11. Jacobucci, R.; Grimm, K.; McArdle, J. Regularized Structural Equation Modeling. Struct. Equ. Model. A Multidiscip. J. 2016, 23, 555–566.
12. Asparouhov, T.; Muthén, B. Penalized Structural Equation Models. Struct. Equ. Model. A Multidiscip. J. 2024, 31, 429–454.
13. Le, T.T.; Vermunt, J.K.; Ballhausen, N.; Van Deun, K. Exploratory Structural Equation Modeling and the Curse of Dimensionality. Behav. Res. Methods 2026, 58, 84.
14. Lee, S.Y.; Song, X.Y. Evaluation of the Bayesian and Maximum Likelihood Approaches in Analyzing Structural Equation Models with Small Sample Sizes. Multivar. Behav. Res. 2004, 39, 653–686.
15. Smid, S.; Rosseel, Y. SEM with Small Samples: Two-Step Modeling and Factor Score Regression Versus Bayesian Estimation with Informative Priors. In Small Sample Size Solutions; Routledge: London, UK, 2020; pp. 239–254.
16. Smid, S.C. Bayesian SEM with Small Samples: Precautions and Guidelines. Ph.D. Thesis, Utrecht University, Utrecht, The Netherlands, 2023.
17. Van Erp, S.; Mulder, J.; Oberski, D.L. Prior sensitivity analysis in default Bayesian structural equation modeling. Psychol. Methods 2018, 23, 363–388.
18. Van Erp, S. Bayesian Regularized SEM: Current Capabilities and Constraints. Psych 2023, 5, 814–835.
19. De Jonckere, J.; Rosseel, Y. Using Bounded Estimation to Avoid nonconvergence in Small Sample Structural Equation Modeling. Struct. Equ. Model. A Multidiscip. J. 2022, 29, 412–427.
20. Lüdtke, O.; Ulitzsch, E.; Robitzsch, A. A Comparison of Penalized Maximum Likelihood Estimation and Markov Chain Monte Carlo Techniques for Estimating Confirmatory Factor Analysis Models With Small Sample Sizes. Front. Psychol. 2021, 12, 615162.
21. Bogaert, J.; Loh, W.W.; Rosseel, Y. A Small Sample Correction for Factor Score Regression. Educ. Psychol. Meas. 2023, 83, 495–519.
22. Croon, M. Using predicted latent scores in general latent structure models. In Latent Variable and Latent Structure Models; Marcoulides, G., Moustaki, I., Eds.; Lawrence Erlbaum: Washington, DC, USA, 2002; pp. 195–224.
23. Skrondal, A.; Laake, P. Regression Among Factor Scores. Psychometrika 2001, 66, 563–575.
24. Ledoit, O.; Wolf, M. The Power of (Non-)Linear Shrinking: A Review and Guide to Covariance Matrix Estimation. J. Financ. Econom. 2022, 20, 187–218.
25. De Jonckere, J. Nonconvergence in Small Sample Structural Equation Modeling. Ph.D. Thesis, Ghent University, Gent, Belgium, 2023.
26. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
27. Lewandowski, D.; Kurowicka, D.; Joe, H. Generating random correlation matrices based on vines and extended onion method. J. Multivar. Anal. 2009, 100, 1989–2001.
28. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
29. Higham, N.J. Computing the Nearest Correlation Matrix - A Problem from Finance. IMA J. Numer. Anal. 2002, 22, 329–343.
30. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2025; ISBN 3-900051-07-0.
31. Rosseel, Y. lavaan: An R Package for Structural Equation Modeling. J. Stat. Softw. 2012, 48, 1–36.
32. Neale, M.C.; Hunter, M.D.; Pritikin, J.N.; Zahery, M.; Brick, R.T.; Kirkpatrick, B.R.M.; Estabrook, R.; Bates, T.C.; Maes, H.H.; Boker, S.M. OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika 2016, 81, 535–549.
33. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
34. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. xgboost: Extreme Gradient Boosting, R Package Version 3.2.1.1. 2026. Available online: https://cran.r-project.org/web/packages/xgboost/index.html (accessed on 27 May 2026).
35. Bilodeau, B.; Jaques, N.; Koh, P.W.; Kim, B. Impossibility Theorems for Feature Attribution. Proc. Natl. Acad. Sci. USA 2024, 121, e2304406120.

Figure 1. A simple model where the latent variable Y is regressed on the latent variable X. Each latent variable is measured by three observed indicators.

Figure 2. Overview of the proposed diagnose–localize–shrinkage framework. Sample covariance matrices are generated independently of the fitted SEM and modified through controlled pathology injection. A classifier predicts SEM non-convergence from Fisher-z-transformed, vectorized unique off-diagonal correlations. SHAP values then identify locally destabilizing correlation pairs. These pairs guide a localized shrinkage update toward a stable model-based target matrix. Candidate repaired matrices are accepted only if they satisfy the positive-definiteness criterion and yield a convergent SEM refit.

Figure 3. Model-independent diagnostics of the adversarial matrix-generation procedure by pathology severity. Panels show the smallest eigenvalue, the log-transformed condition number, the maximum absolute off-diagonal correlation, and the proportion of matrices with at least one non-positive eigenvalue. These diagnostics were computed before fitting the model and therefore assess whether the generated sample matrices exhibited the intended numerical pathologies independently of model convergence.

Table 1. Convergence and non-convergence rates by severity level for cases with identical convergence status in lavaan and OpenMx. Percentages are reported per severity level, with absolute counts shown in parentheses.

Severity Level	Converged	Non-Converged
Clean	48.31% (9540)	51.69% (10,209)
Mild	21.14% (2331)	78.86% (8698)
Moderate	11.33% (762)	88.67% (5961)
Severe	5.36% (253)	94.64% (4464)
Extreme	28.05% (620)	71.95% (1590)

Table 2. Predictive performance of the XGBoost classifier for detecting non-convergence based on correlation features.

Dataset	Log Loss	AUC–PR	AUC–ROC	Brier
Validation	0.276	0.973	0.947	0.084
Test	0.280	0.971	0.944	0.085

Table 3. Sample correlation matrix $R_{s}$.

	$x_{1}$	$x_{2}$	$x_{3}$	$y_{1}$	$y_{2}$	$y_{3}$
$x_{1}$	1.00
$x_{2}$	0.48	1.00
$x_{3}$	0.22	0.03	1.00
$y_{1}$	−0.14	−0.39	−0.22	1.00
$y_{2}$	−0.11	0.03	−0.07	0.23	1.00
$y_{3}$	−0.08	−0.25	−0.21	0.36	0.24	1.00

Table 4. SHAP contributions of correlation pairs for the example matrix. Positive SHAP values indicate unhappy correlations. The two unhappy correlations are highlighted in bold.

Correlation Pair	SHAP	$| SHAP |$	Percentile	Strength
$x_{3}$ – $x_{2}$	0.82	0.82	0.96	0.78
$y_{1}$ – $x_{2}$	0.45	0.45	0.81	0.37
$x_{2}$ – $x_{1}$	0.18	0.18	0.47	0.09
$y_{3}$ – $x_{2}$	0.15	0.15	0.39	0.06
$y_{2}$ – $x_{2}$	0.06	0.06	0.17	0.01
$y_{2}$ – $y_{1}$	0.04	0.04	0.11	0.00
$y_{1}$ – $x_{1}$	0.03	0.03	0.07	0.00
$y_{3}$ – $y_{2}$	−0.04	0.04	0.11	0.00
$y_{2}$ – $x_{1}$	−0.04	0.04	0.12	−0.01
$y_{3}$ – $x_{3}$	−0.06	0.06	0.17	−0.01
$y_{3}$ – $x_{1}$	−0.07	0.07	0.19	−0.01
$y_{1}$ – $x_{3}$	−0.10	0.10	0.27	−0.03
$x_{3}$ – $x_{1}$	−0.19	0.19	0.49	−0.09
$y_{2}$ – $x_{3}$	−0.20	0.20	0.50	−0.10
$y_{3}$ – $y_{1}$	−0.32	0.32	0.70	−0.23

Table 5. SHAP contributions, sample correlations ($R_{s}$), repaired values ($R_{rep}$), repair adjustment ($\Delta R$), and 95% jackknife bounds. Positive SHAP values indicate unhappy correlations. The two unhappy correlations are highlighted in bold.

Correlation Pair	SHAP	$R_{s}$	$R_{rep}$	$Δ R$	Lower	Upper
$x_{3}$ – $x_{2}$	0.82	0.025	0.037	0.012	−0.071	0.113
$y_{1}$ – $x_{2}$	0.45	−0.385	−0.382	0.004	−0.443	−0.330
$x_{2}$ – $x_{1}$	0.18	0.480	0.485	0.005	0.444	0.522
$y_{3}$ – $x_{2}$	0.15	−0.247	−0.246	0.001	−0.311	−0.161
$y_{2}$ – $x_{2}$	0.06	0.025	0.022	−0.003	−0.035	0.094
$y_{2}$ – $y_{1}$	0.04	0.226	0.235	0.009	0.101	0.273
$y_{1}$ – $x_{1}$	0.03	−0.142	−0.142	0.000	−0.191	−0.078
$y_{3}$ – $y_{2}$	−0.04	0.236	0.236	0.000	0.170	0.294
$y_{2}$ – $x_{1}$	−0.04	−0.110	−0.110	0.000	−0.187	−0.024
$y_{3}$ – $x_{3}$	−0.06	−0.212	−0.211	0.001	−0.293	−0.134
$y_{3}$ – $x_{1}$	−0.07	−0.081	−0.081	0.000	−0.130	−0.009
$y_{1}$ – $x_{3}$	−0.10	−0.219	−0.218	0.001	−0.277	−0.158
$x_{3}$ – $x_{1}$	−0.19	0.220	0.229	0.009	0.114	0.297
$y_{2}$ – $x_{3}$	−0.20	−0.073	−0.074	−0.001	−0.130	0.014
$y_{3}$ – $y_{1}$	−0.32	0.361	0.368	0.007	0.312	0.419

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

