Distributional CNN-LSTM, KDE, and Copula Approaches for Multimodal Multivariate Data: Assessing Conditional Treatment Effects
Abstract
1. Introduction
2. Methods
2.1. Kernel Density Estimation (KDE) Baseline
2.2. Gaussian Copula
2.3. Coverage and Evaluation Metrics
1. Log-Likelihood (LL): The average test log-likelihood, which measures the model’s ability to fit the observed conditional density:
$$\mathrm{LL} = \frac{1}{n}\sum_{i=1}^{n} \log \hat{p}(\mathbf{y}_i \mid \mathbf{x}_i).$$
2. Coverage: The empirical proportion of true outcomes falling within the model’s 95% predictive region (e.g., the ellipsoidal contour for the Gaussian/Copula models or the Highest Density Region for KDE).
3. CATE RMSE: The root mean squared error between estimated and true CATE values, quantifying the accuracy of individual-level causal effect estimation:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\hat{\tau}(\mathbf{x}_i) - \tau(\mathbf{x}_i)\big)^2}.$$
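For KDE, the coverage region in item 2 is a highest density region (HDR) rather than an ellipse. The sketch below shows one way to compute it, under stated assumptions. It draws Monte Carlo samples from the fitted KDE and takes the (1 − level) quantile of their estimated densities as the HDR threshold. It then counts the test points whose estimated density is at least that threshold. The isotropic kernel, the Scott's-rule bandwidth, and all helper names are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def gaussian_kde_2d(data, bandwidth):
    """Density function of an isotropic 2-D Gaussian KDE (illustrative helper)."""
    norm = 1.0 / (len(data) * 2.0 * math.pi * bandwidth ** 2)

    def density(point):
        x, y = point
        return norm * sum(
            math.exp(-0.5 * ((x - xi) ** 2 + (y - yi) ** 2) / bandwidth ** 2)
            for xi, yi in data
        )

    return density


def hdr_threshold(data, density, bandwidth, level=0.95, n_draws=2000, seed=0):
    """Density cut-off f_a such that the region {f >= f_a} holds about `level` probability."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        xi, yi = rng.choice(data)  # sampling from a KDE: pick a data point, add kernel noise
        draws.append((xi + rng.gauss(0.0, bandwidth), yi + rng.gauss(0.0, bandwidth)))
    values = sorted(density(p) for p in draws)
    # The (1 - level) quantile of f(Y), Y ~ f, is the HDR threshold.
    return values[int((1.0 - level) * n_draws)]


def hdr_coverage(test_points, density, threshold):
    """Fraction of test points that fall inside the highest density region."""
    return sum(1 for p in test_points if density(p) >= threshold) / len(test_points)


# Toy usage on standard bivariate normal data (not the paper's data).
rng = random.Random(1)
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
test_pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
h = len(train) ** (-1 / 6)  # Scott's rule for d = 2 with unit-variance data
f = gaussian_kde_2d(train, h)
t = hdr_threshold(train, f, h, level=0.95)
print(hdr_coverage(test_pts, f, t))
```

Because the KDE is slightly over-dispersed relative to the true density, coverage measured on fresh data in this toy setting tends to land at or a little above the nominal level.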
2.4. Conditional Average Treatment Effect (CATE) Estimation
- Distributional CNN-LSTM: Samples are drawn from the predicted multivariate Gaussian distribution (or mixture distribution) parameterized by the network output [27].
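The sketch below illustrates the sampling step. It assumes, hypothetically, that the network is queried twice for the same covariates: once with the treatment indicator set to 1 and once with it set to 0. Under that assumption, the CATE is the difference between the two Monte Carlo means. `predict_gaussian` is a hypothetical toy stand-in for the trained CNN-LSTM output head, and the paper's actual conditioning scheme may differ.

```python
import math
import random


def cholesky_2x2(cov):
    """Entries (l11, l21, l22) of the lower-triangular L with L L^T = cov."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    return l11, l21, math.sqrt(d - l21 ** 2)


def sample_bivariate_gaussian(mu, cov, n, rng):
    l11, l21, l22 = cholesky_2x2(cov)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return draws


def predict_gaussian(x, t):
    """HYPOTHETICAL toy output head: mean and covariance of (Y1, Y2) given x and treatment t."""
    mu = (1.0 + 0.5 * x + 2.0 * t, -0.5 * x + 1.0 * t)
    cov = ((1.0, 0.3), (0.3, 0.5 + 0.2 * t))
    return mu, cov


def cate_by_sampling(x, n=20000, seed=0):
    """Monte Carlo CATE for each outcome: mean under t = 1 minus mean under t = 0."""
    rng = random.Random(seed)
    means = {}
    for t in (0, 1):
        mu, cov = predict_gaussian(x, t)
        draws = sample_bivariate_gaussian(mu, cov, n, rng)
        means[t] = tuple(sum(d[j] for d in draws) / n for j in range(2))
    return tuple(means[1][j] - means[0][j] for j in range(2))


print(cate_by_sampling(x=0.7))
```

For a single Gaussian, the Monte Carlo average simply converges to the difference of the predicted means. Sampling becomes necessary when the head outputs a mixture distribution, as the text allows.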
2.5. Evaluation Details and Reproducibility
2.6. Comparison Metrics
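1. Mean Log-Likelihood (MLL): For a multivariate observation $\mathbf{y}_i$, let the conditional predictive density be denoted by $\hat{p}(\mathbf{y}_i \mid \mathbf{x}_i)$. The log-likelihood contribution of $\mathbf{y}_i$ is $\ell_i = \log \hat{p}(\mathbf{y}_i \mid \mathbf{x}_i)$, and the MLL averages these contributions over the $n$ test points:
$$\mathrm{MLL} = \frac{1}{n}\sum_{i=1}^{n} \ell_i.$$
2. 90% Coverage: The proportion of test points falling within the model’s 90% credible region evaluates the calibration of predicted uncertainty [30]. For Gaussian-based models (CNN-LSTM, Gaussian Copula), this is computed using the Mahalanobis distance condition:
$$(\mathbf{y}_i - \hat{\boldsymbol{\mu}}_i)^{\top}\hat{\boldsymbol{\Sigma}}_i^{-1}(\mathbf{y}_i - \hat{\boldsymbol{\mu}}_i) \le \chi^2_{d,\,0.90},$$
where $d$ is the outcome dimension and $\chi^2_{d,\,0.90}$ is the 0.90 quantile of the chi-square distribution with $d$ degrees of freedom.
3. CATE RMSE: The root mean squared error of the estimated conditional average treatment effects (CATEs) is
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\hat{\tau}(\mathbf{x}_i) - \tau(\mathbf{x}_i)\big)^2}.$$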
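The metrics above can be computed for bivariate Gaussian predictive distributions as in the plain-Python sketch below. It assumes that each test point comes with a predicted mean and a 2×2 covariance. For d = 2, the chi-square quantile has the closed form −2 ln(1 − level), so no statistics library is needed. Bias is taken as estimate minus truth, which is an assumed sign convention.

```python
import math


def _inv_det_2x2(cov):
    (a, b), (c, d) = cov
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det)), det


def mahalanobis_sq(y, mu, cov):
    """Squared Mahalanobis distance (y - mu)^T cov^{-1} (y - mu)."""
    inv, _ = _inv_det_2x2(cov)
    r0, r1 = y[0] - mu[0], y[1] - mu[1]
    return r0 * (inv[0][0] * r0 + inv[0][1] * r1) + r1 * (inv[1][0] * r0 + inv[1][1] * r1)


def gaussian_logpdf_2d(y, mu, cov):
    _, det = _inv_det_2x2(cov)
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * mahalanobis_sq(y, mu, cov)


def mean_log_likelihood(ys, params):
    """params[i] = (predicted mean, predicted covariance) for test point ys[i]."""
    return sum(gaussian_logpdf_2d(y, mu, cov) for y, (mu, cov) in zip(ys, params)) / len(ys)


def coverage(ys, params, level=0.90):
    q = -2.0 * math.log(1.0 - level)  # chi-square quantile with 2 degrees of freedom
    hits = sum(mahalanobis_sq(y, mu, cov) <= q for y, (mu, cov) in zip(ys, params))
    return hits / len(ys)


def cate_rmse(est, true):
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(est))


def cate_bias(est, true):
    return sum(e - t for e, t in zip(est, true)) / len(est)


params = [((0.0, 0.0), ((1.0, 0.3), (0.3, 1.0)))] * 3
ys = [(0.1, -0.2), (1.5, 1.0), (2.5, -2.0)]
print(mean_log_likelihood(ys, params), coverage(ys, params))
```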
3. Simulation Study
3.1. Bivariate Distribution
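1. Distributional CNN-LSTM: A neural network composed of convolutional layers followed by an LSTM was trained to output the parameters of a bivariate Gaussian,
$$\mathbf{y}_i \mid \mathbf{x}_i \sim \mathcal{N}_2\big(\hat{\boldsymbol{\mu}}(\mathbf{x}_i), \hat{\boldsymbol{\Sigma}}(\mathbf{x}_i)\big).$$
Training minimized the negative log-likelihood of the observed training points. Because both the mean and the covariance depend on the input, this approach allows heteroscedasticity and covariance to vary across samples.
2. Kernel Density Estimation (KDE): A two-dimensional KDE was fitted on the training data. Test densities were computed via bilinear interpolation across the evaluation grid. KDE captures multimodality but assumes a smooth density surface.
3. Gaussian Copula: Empirical marginals $\hat{F}_1, \hat{F}_2$ with densities $\hat{f}_1, \hat{f}_2$ were combined with a Gaussian copula to estimate joint dependence. Test set densities were obtained as
$$\hat{f}(y_1, y_2) = c_{\mathbf{R}}\big(\hat{F}_1(y_1), \hat{F}_2(y_2)\big)\,\hat{f}_1(y_1)\,\hat{f}_2(y_2), \qquad c_{\mathbf{R}}(\mathbf{u}) = |\mathbf{R}|^{-1/2}\exp\!\Big(-\tfrac{1}{2}\,\mathbf{z}^{\top}(\mathbf{R}^{-1} - \mathbf{I})\,\mathbf{z}\Big),$$
where $z_j = \Phi^{-1}(u_j)$ and $\mathbf{R}$ is the copula correlation matrix.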
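The training objective of the distributional CNN-LSTM in item 1 can be made concrete. The sketch below assumes that the network's final layer emits five unconstrained numbers per sample. These are mapped to two means, two positive standard deviations (via exp), and a correlation in (−1, 1) (via tanh). This parameterization is a common choice for bivariate Gaussian output heads, not necessarily the one used in the paper. The convolutional and LSTM layers are omitted, and only the loss is shown.

```python
import math


def head_to_params(raw):
    """Map 5 unconstrained outputs to (mu1, mu2, sd1, sd2, rho) of a bivariate Gaussian."""
    m1, m2, s1, s2, r = raw
    return m1, m2, math.exp(s1), math.exp(s2), math.tanh(r)


def bivariate_gaussian_nll(y, raw):
    """Negative log-density of y = (y1, y2) under the Gaussian encoded by `raw`."""
    mu1, mu2, sd1, sd2, rho = head_to_params(raw)
    z1, z2 = (y[0] - mu1) / sd1, (y[1] - mu2) / sd2
    one_minus_rho2 = 1.0 - rho ** 2
    quad = (z1 ** 2 - 2.0 * rho * z1 * z2 + z2 ** 2) / (2.0 * one_minus_rho2)
    return (math.log(2.0 * math.pi) + math.log(sd1) + math.log(sd2)
            + 0.5 * math.log(one_minus_rho2) + quad)


def batch_nll(ys, raws):
    """Mini-batch training loss: mean NLL over the batch."""
    return sum(bivariate_gaussian_nll(y, r) for y, r in zip(ys, raws)) / len(ys)


print(batch_nll([(0.2, -0.1), (1.0, 1.0)], [(0.0, 0.0, 0.0, 0.0, 0.5)] * 2))
```

The exp and tanh maps keep every network output valid without constrained optimization, so the loss can be minimized directly with a gradient method such as Adam.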
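- Mean Log-Likelihood (MLL): Average log-density of the test points under the fitted conditional model $\hat{p}(\mathbf{y} \mid \mathbf{x})$:
$$\mathrm{MLL} = \frac{1}{n}\sum_{i=1}^{n} \log \hat{p}(\mathbf{y}_i \mid \mathbf{x}_i).$$
- 90% Confidence Coverage: For Gaussian-based models (CNN-LSTM, Gaussian Copula), this is the proportion of test points lying within the 90% confidence ellipse derived from the predicted covariance matrix:
$$\mathrm{Coverage} = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\Big\{(\mathbf{y}_i - \hat{\boldsymbol{\mu}}_i)^{\top}\hat{\boldsymbol{\Sigma}}_i^{-1}(\mathbf{y}_i - \hat{\boldsymbol{\mu}}_i) \le \chi^2_{2,\,0.90}\Big\}.$$
- CATE Estimates: Synthetic treatment effects $\tau(\mathbf{x}_i)$ (the oracle values) were simulated. The accuracy of the model-based CATE estimates $\hat{\tau}(\mathbf{x}_i)$ was quantified using the Root Mean Squared Error (RMSE) and Bias on the test set:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\hat{\tau}(\mathbf{x}_i) - \tau(\mathbf{x}_i)\big)^2}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\big(\hat{\tau}(\mathbf{x}_i) - \tau(\mathbf{x}_i)\big).$$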
3.1.1. General Setup
3.1.2. Bimodal Distribution
- CNN-LSTM: Produces multiple ellipsoidal contours that reflect a flexible, localized structure capable of capturing multimodality. The color gradient represents the heterogeneity in the estimated CATE values. This flexibility in modeling local uncertainty and CATE heterogeneity comes at the cost of greater model complexity and a potential risk of over-fitting.
- KDE: Provides a smooth, non-parametric approximation of the joint distribution. Because of smoothing, the visible modes are less sharply defined. The range of estimated CATE values is narrower than that of the CNN-LSTM, reflecting a less expressive capture of treatment heterogeneity but good stability.
- Gaussian Copula: Generates an intrinsically elliptical dependence structure that models the correlation between the two outcomes. Compared with the CNN-LSTM, it underrepresents the multimodality of the underlying data, and it exhibits the smallest range of estimated CATE heterogeneity.
- Mean Log-Likelihood: KDE (−3.7248) achieves the highest value, indicating the strongest generative fit to the data; CNN-LSTM and Copula are slightly lower.
- 90% Coverage: The Gaussian Copula provides the highest coverage, followed by KDE, suggesting more conservative uncertainty estimates in this setting; CNN-LSTM is closer to the nominal level but covers fewer test points than the baselines.
- CATE RMSE: KDE has the lowest RMSE, suggesting the most accurate treatment effect estimation in this relatively simple bimodal scenario.
- CATE Bias: CNN-LSTM exhibits nearly zero bias, while KDE slightly underestimates and the Copula slightly overestimates the true effects.
3.1.3. Trimodal Distribution
- CNN-LSTM: Produces multiple ellipsoidal contours colored by the estimated CATE. The varying sizes, orientations, and spatial distribution of these contours capture the inherent multimodality, asymmetry, and heterogeneous correlations of the mixture components.
- KDE: Due to smoothing, the estimated density appears as a single smooth, large region. While KDE is non-parametric, the degree of mode overlap and the selected bandwidth lead to an oversimplified representation that fails to resolve the three distinct modes.
- Gaussian Copula: Represents dependencies using a single Gaussian-based elliptical structure. As a semi-parametric model with a Gaussian dependence function, it fundamentally underrepresents the underlying multimodality and, consequently, underestimates the true range of treatment effect heterogeneity.
- Mean Log-Likelihood: CNN-LSTM achieves the highest value, indicating the best generative fit to the observed data under this complex distributional structure.
- 90% Coverage: CNN-LSTM achieves coverage that closely matches the nominal 90% level, indicating well-calibrated predictive regions.
- CATE RMSE: CNN-LSTM has the lowest RMSE, reflecting superior accuracy in predicting heterogeneous treatment effects.
- CATE Bias: CNN-LSTM exhibits minimal bias, indicating nearly unbiased treatment effect estimates.
3.1.4. Quadrimodal Distribution
- CNN-LSTM: Displays multiple overlapping ellipses with varying sizes and orientations, capturing multimodality and asymmetric correlations. Ellipse colors correspond to the estimated CATE, showing heterogeneous treatment effects across the distribution.
- KDE: Shows a single smooth ellipse representing the overall distribution. Captures global structure but fails to represent multimodality and asymmetric correlations.
- Gaussian Copula: Represents dependencies using a single Gaussian-based ellipse. Captures correlation structure but underestimates multimodality and treatment effect heterogeneity due to Gaussian assumptions.
- Mean Log-Likelihood: CNN-LSTM achieves the highest value, indicating the best fit to the observed data.
- 90% Coverage: CNN-LSTM achieves coverage close to the nominal 90% level, suggesting that its predictive regions closely match the true distribution.
- CATE RMSE: CNN-LSTM attains the lowest RMSE, reflecting superior accuracy in predicting treatment effects.
- CATE Bias: CNN-LSTM exhibits the lowest bias, indicating nearly unbiased CATE estimates.
3.2. Multivariate Distribution
3.2.1. Data Splitting and Preprocessing
3.2.2. Distributional CNN-LSTM Model
3.2.3. Baselines: KDE and Gaussian Copula
Kernel Density Estimation (KDE)
KDE Implementation Details
Gaussian Copula
Gaussian Copula Implementation Details
3.2.4. Evaluation Metrics for Density Models
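- Mean Log-Likelihood (MLL): The average conditional log-density of the test outcomes $\mathbf{y}_i$:
$$\mathrm{MLL} = \frac{1}{n}\sum_{i=1}^{n} \log \hat{p}(\mathbf{y}_i \mid \mathbf{x}_i).$$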
- 90% Coverage: Percentage of test points lying within the estimated 90% probability region.
3.2.5. Simulated Treatment Effects (CATE)
3.2.6. CATE Estimation Approaches
Distributional CNN-LSTM
CNN-LSTM CATE Details
KDE-Based CATE
KDE CATE Detail
Gaussian Copula-Based CATE
Gaussian Copula CATE Details
3.2.7. Evaluation Metrics
Computational and Stability Metrics
3.2.8. Three-Variable Mixture Data
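- Case 1 (K = 2): Two equally weighted components. Their covariance structures differ: $\boldsymbol{\Sigma}_1$ has moderate positive correlations, while $\boldsymbol{\Sigma}_2$ includes both positive and negative dependencies.
- Case 2 (K = 3): Adds a third cluster.
- Case 3 (K = 4): Introduces a fourth component.
- Case 4 (K = 5): Adds a fifth component.
- Case 5 (K = 6): Adds a sixth component.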
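The mixture designs above can be simulated by selecting a component with probability equal to its weight and then drawing from that component's multivariate normal through a Cholesky factor. The means and covariances below are hypothetical placeholders. They only mimic the qualitative description of Case 1: one component with moderate positive correlations and one with mixed-sign dependencies. The paper's actual parameter values are not reproduced here.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a for a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def sample_mixture(weights, means, covs, n, seed=0):
    """Draw n points from a Gaussian mixture; also return each point's component label."""
    rng = random.Random(seed)
    factors = [cholesky(c) for c in covs]
    samples, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        z = [rng.gauss(0.0, 1.0) for _ in means[k]]
        L = factors[k]
        x = [means[k][i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(len(z))]
        samples.append(x)
        labels.append(k)
    return samples, labels


# HYPOTHETICAL Case 1 (K = 2) parameters, for illustration only.
weights = [0.5, 0.5]
means = [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]
covs = [
    [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]],     # moderate positive correlations
    [[1.0, -0.4, 0.3], [-0.4, 1.0, -0.2], [0.3, -0.2, 1.0]],  # mixed-sign dependencies
]
data, comp = sample_mixture(weights, means, covs, n=2000)
```

Cases 2 to 5 follow by appending further weights, means, and covariance matrices to the three lists.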
3.2.9. Model Comparison
- K = 2: KDE achieves the highest (least negative) log-likelihood and the lowest RMSE, demonstrating the strongest overall performance in the simplest case. CNN-LSTM is competitive in both LL and RMSE and exhibits substantially lower bias than the Gaussian Copula.
- K = 3: Bias increases substantially across all models, with both CNN-LSTM and Copula bias reaching their highest values across the simulation scenarios. KDE retains the highest log-likelihood and the lowest RMSE.
- K = 4: RMSE rises across models, indicating increased difficulty in CATE estimation. The Gaussian Copula achieves the lowest RMSE and the best bias control (closest to zero), while the CNN-LSTM and KDE exhibit a large negative shift in bias.
- K = 5: RMSE values converge among all models, suggesting similar predictive accuracy in this highly overlapping case. The CNN-LSTM achieves the lowest CATE RMSE and the best bias control (closest to zero), indicating superior causal calibration.
- K = 6: The Gaussian Copula attains the lowest RMSE and the best bias control. KDE’s mean log-likelihood collapses (−27.6310) due to the severe curse of dimensionality in the three-dimensional, highly overlapping space. The CNN-LSTM remains stable in terms of log-likelihood and RMSE, exhibiting moderate negative bias.
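The per-scenario rankings above can be checked mechanically against the mixture-simulation results table. The short script below transcribes those values. For each K, it reports which model has the highest mean log-likelihood, the lowest CATE RMSE, and the bias closest to zero.

```python
# (mean log-likelihood, CATE RMSE, CATE bias) per model, keyed by number of components K;
# values transcribed from the three-variable mixture results table.
RESULTS = {
    2: {"CNN-LSTM": (-7.0081, 2.0313, 0.2140), "KDE": (-6.0352, 2.0229, 0.1139),
        "Copula": (-6.6024, 2.1770, 0.8119)},
    3: {"CNN-LSTM": (-7.4576, 2.2828, 0.8241), "KDE": (-6.5099, 2.2486, 0.7241),
        "Copula": (-7.5509, 2.3097, 0.8960)},
    4: {"CNN-LSTM": (-7.7268, 2.6603, -0.2822), "KDE": (-7.0799, 2.6727, -0.3822),
        "Copula": (-7.8317, 2.6489, -0.1390)},
    5: {"CNN-LSTM": (-7.6992, 2.3897, -0.0638), "KDE": (-7.0063, 2.3944, -0.1639),
        "Copula": (-7.7034, 2.3946, 0.1664)},
    6: {"CNN-LSTM": (-7.7029, 2.3765, -0.5122), "KDE": (-27.6310, 2.4000, -0.6122),
        "Copula": (-7.6682, 2.3247, -0.1371)},
}


def best_by_metric(results):
    """Best model per K: highest MLL, lowest RMSE, and smallest absolute bias."""
    summary = {}
    for k, models in sorted(results.items()):
        summary[k] = {
            "highest_mll": max(models, key=lambda m: models[m][0]),
            "lowest_rmse": min(models, key=lambda m: models[m][1]),
            "smallest_abs_bias": min(models, key=lambda m: abs(models[m][2])),
        }
    return summary


for k, best in best_by_metric(RESULTS).items():
    print(k, best)
```

Running it reproduces the orderings stated in the bullets. For example, the Gaussian Copula has the lowest RMSE and the smallest absolute bias for K = 4 and K = 6, and the CNN-LSTM does for K = 5.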
3.2.10. CNN-LSTM Ellipsoids and CATE Visualization
- K = 2–3: Distinct ellipsoids form per cluster, with smoothly varying CATE gradients, indicating clear separation between treatment effects in the less complex scenarios.
- K = 4–5: Overlapping ellipsoids reflect mixed correlations and high-dimensional complexity; CATE variation emerges both within and across clusters, showcasing the model’s ability to handle ambiguous component assignments.
- K = 6: Dense and highly overlapping ellipsoids reveal maximal heterogeneity; the CNN-LSTM captures nonlinear dependencies and heterogeneous CATE more flexibly than the KDE or Copula baselines, which exhibited either collapsed density fit (KDE) or rigid correlation structures (Copula).
1. Flexible multivariate density modeling: Ellipsoid shapes, sizes, and orientations adapt locally to cluster-specific variances and asymmetric correlations, which is essential for accurate probabilistic forecasting.
2. Heterogeneous treatment effect representation: CATE values vary smoothly and nonlinearly across the multivariate space, reflecting both local and global heterogeneity, which is critical for robust individual-level causal inference.
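The ellipse shapes, sizes, and orientations described above follow directly from each predicted covariance matrix. The sketch below handles a bivariate (2×2) predictive covariance. The eigenvalues give the axis lengths, the leading eigenvector gives the orientation, and the chi-square quantile scales the ellipse to 90% probability. The three-dimensional ellipsoids in this section would instead require a general symmetric eigendecomposition. The 2-D case is shown only because it has a closed form.

```python
import math


def confidence_ellipse(mu, cov, level=0.90, n_points=100):
    """Semi-axes, orientation, and boundary points of a Gaussian confidence ellipse."""
    (a, b), (_, d) = cov
    half_trace = 0.5 * (a + d)
    spread = math.hypot(0.5 * (a - d), b)
    lam_major, lam_minor = half_trace + spread, half_trace - spread  # eigenvalues of cov
    angle = 0.5 * math.atan2(2.0 * b, a - d)  # orientation of the major axis (radians)
    q = -2.0 * math.log(1.0 - level)          # chi-square quantile, 2 degrees of freedom
    r_major, r_minor = math.sqrt(q * lam_major), math.sqrt(q * lam_minor)
    c, s = math.cos(angle), math.sin(angle)
    boundary = []
    for i in range(n_points):
        t = 2.0 * math.pi * i / n_points
        u, v = r_major * math.cos(t), r_minor * math.sin(t)
        # rotate the axis-aligned ellipse by `angle`, then shift it to the mean
        boundary.append((mu[0] + c * u - s * v, mu[1] + s * u + c * v))
    return (r_major, r_minor, angle), boundary


axes, pts = confidence_ellipse((0.0, 0.0), ((2.0, 0.8), (0.8, 1.0)))
print(axes)
```

Every boundary point has squared Mahalanobis distance equal to the chi-square quantile. That makes the plotted contour consistent with the coverage condition used for the Gaussian-based models.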
4. Real Data Analysis
4.1. Real Data Experiments: Iris Dataset
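1. Distributional CNN-LSTM: a deep learning model capturing complex dependencies. For each input $\mathbf{x}_i$, the network outputs the parameters of the predictive distribution and is trained by minimizing the negative log-likelihood with the Adam optimizer.
2. Kernel Density Estimation (KDE): a multivariate Gaussian kernel estimator [1] that provides density estimates at the test points.
3. Gaussian Copula: models the marginal distributions and captures their dependence via a Gaussian copula [6]:
$$\hat{f}(\mathbf{y}) = c_{\mathbf{R}}\big(\hat{F}_1(y_1), \ldots, \hat{F}_d(y_d)\big)\prod_{j=1}^{d}\hat{f}_j(y_j).$$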
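The copula density in item 3 can be evaluated in any dimension d. The sketch below is one plausible implementation, resting on several assumptions that the paper does not confirm:

- the margins use rank-based empirical CDFs, rescaled to stay strictly inside (0, 1);
- the marginal densities are one-dimensional Gaussian KDEs with Silverman's rule-of-thumb bandwidth;
- the copula correlation matrix is the Pearson correlation of the normal scores.

The log copula density uses a Cholesky factor of R, so that log|R| and zᵀR⁻¹z both come from a single factorization.

```python
import bisect
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides inv_cdf, i.e. Phi^{-1}


def cholesky(a):
    """Lower-triangular L with L L^T = a for a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L w = b for lower-triangular L."""
    w = []
    for i in range(len(b)):
        w.append((b[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    return w


class GaussianCopulaDensity:
    """Gaussian copula with empirical-CDF margins and 1-D Gaussian-KDE marginal densities."""

    def __init__(self, train):
        self.n = len(train)
        self.cols = [sorted(col) for col in zip(*train)]
        # Assumption: Silverman's rule-of-thumb bandwidth for each marginal KDE.
        self.bw = [1.06 * statistics.stdev(c) * self.n ** -0.2 for c in self.cols]
        scores = [self.normal_scores(row) for row in train]
        d = len(self.cols)
        means = [sum(s[j] for s in scores) / self.n for j in range(d)]
        cov = [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in scores) / (self.n - 1)
                for j in range(d)] for i in range(d)]
        sd = [math.sqrt(cov[i][i]) for i in range(d)]
        self.R = [[cov[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]
        self.L = cholesky(self.R)

    def normal_scores(self, y):
        """z_j = Phi^{-1}(u_j), with u_j a rescaled empirical CDF inside (0, 1)."""
        return [STD_NORMAL.inv_cdf((bisect.bisect_right(col, v) + 0.5) / (self.n + 1))
                for col, v in zip(self.cols, y)]

    def log_marginal(self, j, v):
        h = self.bw[j]
        s = sum(math.exp(-0.5 * ((v - c) / h) ** 2) for c in self.cols[j])
        return math.log(max(s, 1e-300) / (self.n * h * math.sqrt(2.0 * math.pi)))

    def logpdf(self, y):
        z = self.normal_scores(y)
        w = forward_solve(self.L, z)  # z^T R^{-1} z equals ||w||^2
        log_det_r = 2.0 * sum(math.log(self.L[i][i]) for i in range(len(z)))
        log_c = -0.5 * log_det_r - 0.5 * (sum(x * x for x in w) - sum(x * x for x in z))
        return log_c + sum(self.log_marginal(j, v) for j, v in enumerate(y))


# Toy usage on correlated synthetic data (not the Iris data used in the paper).
rng = random.Random(0)
train = []
for _ in range(500):
    z1 = rng.gauss(0.0, 1.0)
    train.append([z1, 0.6 * z1 + 0.8 * rng.gauss(0.0, 1.0)])
model = GaussianCopulaDensity(train)
print(round(model.R[0][1], 2), model.logpdf([0.1, 0.2]))
```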
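- 90% Confidence Coverage: the proportion of test points contained within the predicted 90% probability region. For CNN-LSTM and Copula, this region is the confidence ellipsoid derived from the predictive covariance matrix $\hat{\boldsymbol{\Sigma}}_i$.
- Mean Log-Likelihood (MLL): the average log-likelihood of the test observations, measuring the generative fidelity of the model’s estimated density:
$$\mathrm{MLL} = \frac{1}{n}\sum_{i=1}^{n} \log \hat{p}(\mathbf{y}_i \mid \mathbf{x}_i).$$
- CATE Metrics: for known or oracle treatment effects $\tau(\mathbf{x}_i)$, the accuracy and systematic error of the estimates $\hat{\tau}(\mathbf{x}_i)$ are quantified by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\hat{\tau}(\mathbf{x}_i) - \tau(\mathbf{x}_i)\big)^2}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\big(\hat{\tau}(\mathbf{x}_i) - \tau(\mathbf{x}_i)\big).$$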
- 90% Coverage: Both KDE and Gaussian Copula achieve nearly full coverage (0.956), exceeding the nominal 90% level. This indicates conservative predictive regions, possibly due to over-smoothing (KDE) or a rigid dependence structure (Copula). CNN-LSTM slightly undercovers (0.889), indicating that its predicted variance is less conservative.
- Log-Likelihood: The Gaussian Copula attains the highest log-likelihood (0.149), followed by CNN-LSTM (−119.005) and KDE (−128.074). CNN-LSTM therefore fits the test density better than KDE but worse than the Copula, suggesting a trade-off between generative fidelity and conditional modeling flexibility.
- CATE_RMSE: CNN-LSTM attains the lowest RMSE (0.065), substantially outperforming KDE (0.915) and Copula (0.916). This suggests that the CNN-LSTM better models the conditional dependencies required for accurate individual-level CATE estimation, even though its overall density fit (MLL) is lower than that of the Gaussian Copula.
4.2. Real Data Experiments: Criteo Uplift Dataset
5. Conclusions and Future Work
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
1. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman and Hall: London, UK, 1986.
2. Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed.; Wiley: New York, NY, USA, 2015.
3. Friedman, J.H.; Stuetzle, W.; Schroeder, A. Projection Pursuit Density Estimation. J. Am. Stat. Assoc. 1984, 79, 599–608.
4. Li, Q.; Racine, J.S. Nonparametric Econometrics: Theory and Practice; Princeton University Press: Princeton, NJ, USA, 2007.
5. Genest, C.; Favre, A.C. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 2009, 14, 465–476.
6. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006.
7. Joe, H. Dependence Modeling with Copulas; Chapman & Hall/CRC: London, UK, 2014.
8. Bishop, C.M. Mixture Density Networks; Technical Report NCRG/94/004; Neural Computing Research Group, Aston University: Birmingham, UK, 1994.
9. Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; NeurIPS Foundation: La Jolla, CA, USA, 2017; Volume 30, pp. 6405–6416.
10. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
11. Rezende, D.J.; Mohamed, S. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1530–1538.
12. Dinh, L.; Sohl-Dickstein, J.; Bengio, S. Density estimation using Real NVP. arXiv 2017, arXiv:1605.08803.
13. Tagasovska, N.; Ackerer, D.; Vatter, T. Copulas as high-dimensional generative models: Vine copula autoencoders. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; NeurIPS Foundation: La Jolla, CA, USA, 2019; Volume 32, pp. 6525–6537.
14. Girard, S.; Gobet, E.; Pachebat, J. Deep Generative Modeling of Multivariate Dependent Extremes. 2024. Available online: https://inria.hal.science/hal-04700084v2/document (accessed on 19 March 2025).
15. Kim, J.-M. Integrating copula-based random forest and deep learning approaches for analyzing heterogeneous treatment effects in survival analysis. Mathematics 2025, 13, 1659.
16. Kim, J.-M. Treatment effect estimation in survival analysis using deep learning-based causal inference. Axioms 2025, 14, 458.
17. Kim, J.-M. A copula-driven CNN-LSTM framework for estimating heterogeneous treatment effects in multivariate outcomes. Mathematics 2025, 13, 2384.
18. Kim, J.-M. Multi-task CNN-LSTM modeling of zero-inflated count and time-to-event outcomes for causal inference with functional representation of features. Axioms 2025, 14, 626.
19. Kim, G. A copula-based deep graphical causal model for multivariate conditional treatment effect estimation. Meas. Interdiscip. Res. Perspect. 2025, in press.
20. Kim, J.-M.; Ha, I.D.; Kim, S. Deep learning-based survival analysis with copula-based activation functions for multivariate response prediction. Comput. Stat. 2025, in press.
21. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
22. Wasserman, L. All of Nonparametric Statistics; Springer: Berlin/Heidelberg, Germany, 2006.
23. Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 1959, 8, 229–231.
24. Hofert, M.; Kojadinovic, I.; Mächler, M.; Yan, J. Elements of Copula Modeling with R; Springer Series in Statistics; Springer: New York, NY, USA, 2018.
25. Demarta, S.; McNeil, A.J. The t copula and related copulas. Int. Stat. Rev. 2005, 73, 111–129.
26. Patton, A.J. Modelling asymmetric exchange rate dependence. Int. Econ. Rev. 2006, 47, 527–556.
27. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191.
28. Hill, J.L. Bayesian nonparametric modeling for causal inference. J. Comput. Graph. Stat. 2011, 20, 217–240.
29. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 2007, 102, 359–378.
30. Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press: Cambridge, MA, USA, 1979.
31. Rubin, D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 1974, 66, 688–701.
32. Imbens, G.W.; Rubin, D.B. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: Cambridge, MA, USA, 2015.
33. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
34. Criteo Research. Criteo Uplift Modeling Dataset, version 2.1; Criteo Research: Paris, France, 2021; Available online: https://huggingface.co/datasets/criteo/criteo-uplift/blob/main/criteo-research-uplift-v2.1.csv.gz (accessed on 15 October 2025).
Components | Model | Mean Log-Likelihood | CATE RMSE | CATE Bias |
---|---|---|---|---|
2 | CNN-LSTM | −7.0081 | 2.0313 | 0.2140 |
2 | KDE | −6.0352 | 2.0229 | 0.1139 |
2 | Copula | −6.6024 | 2.1770 | 0.8119 |
3 | CNN-LSTM | −7.4576 | 2.2828 | 0.8241 |
3 | KDE | −6.5099 | 2.2486 | 0.7241 |
3 | Copula | −7.5509 | 2.3097 | 0.8960 |
4 | CNN-LSTM | −7.7268 | 2.6603 | −0.2822 |
4 | KDE | −7.0799 | 2.6727 | −0.3822 |
4 | Copula | −7.8317 | 2.6489 | −0.1390 |
5 | CNN-LSTM | −7.6992 | 2.3897 | −0.0638 |
5 | KDE | −7.0063 | 2.3944 | −0.1639 |
5 | Copula | −7.7034 | 2.3946 | 0.1664 |
6 | CNN-LSTM | −7.7029 | 2.3765 | −0.5122 |
6 | KDE | −27.6310 | 2.4000 | −0.6122 |
6 | Copula | −7.6682 | 2.3247 | −0.1371 |
Model | Coverage | LogLik | CATE_RMSE |
---|---|---|---|
CNN-LSTM | 0.889 | −119.005 | 0.065 |
KDE | 0.956 | −128.074 | 0.915 |
Gaussian Copula | 0.956 | 0.149 | 0.916 |
Model | Coverage | LogLik | CATE_RMSE |
---|---|---|---|
CNN-LSTM | 0.986 | −4090.264 | 0.034 |
KDE | 0.986 | −68,488.785 | 1.008 |
Gaussian Copula | 0.986 | 10.980 | 0.951 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, J.-M. Distributional CNN-LSTM, KDE, and Copula Approaches for Multimodal Multivariate Data: Assessing Conditional Treatment Effects. Analytics 2025, 4, 29. https://doi.org/10.3390/analytics4040029