# Cascading and Ensemble Techniques in Deep Learning


## Abstract


## 1. Introduction and Related Works

## 2. Methods

#### 2.1. Bagging

#### 2.2. Boosting

#### 2.3. Stacking

#### 2.4. Cascading

## 3. Methodology

- Bagging: Techniques such as Random Forests, employed predominantly to reduce a model’s variance.
- Boosting: Methods such as AdaBoost, XGBoost, and gradient-boosted decision trees, which primarily aim to reduce a model’s bias.
- Stacking: Techniques mainly employed to enhance prediction accuracy by combining the outputs of several base learners through a meta-model.
- Cascading: Models used predominantly in situations demanding high precision, such as detecting fraudulent credit-card transactions.
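The cascading idea in the last bullet can be sketched in code. The example below is a minimal illustration only, not the paper's configuration: a cheap first-stage classifier handles high-confidence cases, and only the uncertain samples are passed to a stronger second stage. The models, data, and threshold are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for a tabular binary-classification task.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: a cheap model decides the "easy", high-confidence cases.
stage1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Stage 2: a stronger model handles samples that stage 1 is unsure about.
stage2 = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

def cascade_predict(X, threshold=0.9):
    """Defer to stage 2 only where stage 1's confidence is below `threshold`."""
    proba1 = stage1.predict_proba(X)
    confident = proba1.max(axis=1) >= threshold
    preds = proba1.argmax(axis=1)
    if (~confident).any():
        preds[~confident] = stage2.predict(X[~confident])
    return preds

preds = cascade_predict(X_te)
print("cascade accuracy:", (preds == y_te).mean())
```

Raising the threshold routes more samples to the expensive second stage; lowering it trades accuracy for speed.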

#### 3.1. Dataset

- Pregnancies: Number of times the patient has been pregnant.
- Glucose: 2 h plasma glucose concentration during an oral glucose tolerance test.
- Blood Pressure: Diastolic blood pressure (measured in mm Hg).
- Skin Thickness: Triceps skinfold thickness (measured in mm).
- Insulin: 2 h serum insulin (measured in μU/mL).
- BMI: Body mass index (weight in kg divided by the square of height in m).
- Diabetes Pedigree Function: A score of the likelihood of diabetes based on family history.
- Age: Age of the patient (in years).
- Outcome (target variable): Class variable indicating the presence (1) or absence (0) of diabetes.
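The schema above can be inspected by loading the data into a pandas DataFrame. The sketch below uses the first few rows of the public Pima Indians Diabetes dataset for illustration; in practice the full CSV would be loaded, e.g. with `pd.read_csv("diabetes.csv")` (the file path is an assumption, not specified by the paper).

```python
import pandas as pd

# Column schema of the diabetes dataset described above.
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# First rows of the public dataset, shown inline to keep the sketch self-contained.
rows = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
]
df = pd.DataFrame(rows, columns=columns)

# Separate the eight predictors from the binary target.
X = df.drop(columns="Outcome")
y = df["Outcome"]
print(X.shape, y.value_counts().to_dict())
```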

#### 3.2. Parallel Combination of Classifiers

#### 3.2.1. Decision Trees

#### 3.2.2. Bagging

#### 3.2.3. Boosting

#### 3.3. Preliminary Trials

#### 3.3.1. K-Nearest Neighbors (KNN)

#### 3.3.2. Support Vector Machines (SVMs)

#### 3.3.3. Stacking

#### 3.3.4. Cascading

#### 3.3.5. NN Ensembles and Cascading

#### 3.4. ML Models and Hyperparameter Tuning

#### 3.5. The Importance of Data

#### 3.6. Evaluation and ROC Curve

#### 3.7. Predictor Importance Evaluation and Sensitivity Analysis

## 4. Discussion

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Term | Abbreviation |
|---|---|
| Machine Learning | ML |
| Deep Learning | DL |
| Neural Networks | NN |
| K-Nearest Neighbors | K-NN |
| Support Vector Machines | SVM |
| Stochastic Gradient Descent | SGD |


**Figure 1.** Distribution of variables in the diabetes dataset. Each subplot displays a histogram along with the kernel density estimate (KDE; blue line) for each of the eight features: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, and Age. These distributions provide an overview of the data’s spread and central tendencies.

**Figure 2.** Heatmap displaying the correlation matrix of the predictors in the diabetes dataset. Color intensity and square size are proportional to the correlation coefficients. The 'coolwarm' color map is used, where warm colors represent positive correlations and cool colors negative ones. Annotations within the squares give the exact correlation coefficient for each pair of variables.

**Figure 3.** ROC curve from the averaged probabilities of the secondary ensemble network (SuperNet). The curve illustrates the improvement in model performance achieved through the stacked architecture. The blue dashed line represents the baseline performance of a random classifier.
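A curve like the one in Figure 3 can be computed from averaged positive-class probabilities with scikit-learn. In this minimal sketch, `y_true` and `avg_proba` are synthetic stand-ins for the labels and the SuperNet's averaged probabilities, not the paper's actual outputs.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
# Stand-in labels and averaged positive-class probabilities (illustrative only).
y_true = rng.integers(0, 2, size=200)
avg_proba = np.clip(0.4 * y_true + 0.6 * rng.random(200), 0, 1)

# False-positive rate, true-positive rate, and the decision thresholds.
fpr, tpr, thresholds = roc_curve(y_true, avg_proba)
auc = roc_auc_score(y_true, avg_proba)
print(f"AUC = {auc:.3f}")
# The dashed diagonal in Figure 3 corresponds to fpr == tpr (AUC = 0.5).
```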

**Table 1.** Structure of the NN: This table outlines each layer of the initial model, providing information about the type of layer, its parameters, and the output dimensions. The model consists of four fully connected layers, each followed by a ReLU activation function. The output dimensions gradually decrease, ultimately leading to a two-dimensional output suitable for binary classification.

| Layer # | Layer Type | Parameters | Output Dimensions |
|---|---|---|---|
| 1 | Linear (Fully Connected) | Input: 8, Output: 50 | 50 |
| 2 | ReLU | - | 50 |
| 3 | Linear (Fully Connected) | Input: 50, Output: 20 | 20 |
| 4 | ReLU | - | 20 |
| 5 | Linear (Fully Connected) | Input: 20, Output: 10 | 10 |
| 6 | ReLU | - | 10 |
| 7 | Linear (Fully Connected) | Input: 10, Output: 2 | 2 |
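The architecture in Table 1 maps directly onto a sequential model; the sketch below assumes PyTorch as the framework (layer sizes are taken from the table, everything else is illustrative).

```python
import torch
import torch.nn as nn

# Four fully connected layers with ReLU activations, as in Table 1:
# 8 -> 50 -> 20 -> 10 -> 2 (two logits for binary classification).
model = nn.Sequential(
    nn.Linear(8, 50), nn.ReLU(),
    nn.Linear(50, 20), nn.ReLU(),
    nn.Linear(20, 10), nn.ReLU(),
    nn.Linear(10, 2),
)

# Forward pass on a dummy batch of 4 samples with 8 features each.
out = model(torch.randn(4, 8))
print(out.shape)
```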

**Table 2.** Structure of the improved neural network: This table presents the details of each layer in the model, including the layer type, the parameters used, and the output dimensions after each layer. The model includes several techniques to prevent overfitting, such as dropout layers and batch normalization.

| Layer # | Layer Type | Parameters | Output Dimensions |
|---|---|---|---|
| 1 | Linear (Fully Connected) | Input: 8, Output: 100 | 100 |
| 2 | Batch Normalization | Input: 100 | 100 |
| 3 | ReLU | - | 100 |
| 4 | Dropout | p = 0.5 | 100 |
| 5 | Linear (Fully Connected) | Input: 100, Output: 50 | 50 |
| 6 | Batch Normalization | Input: 50 | 50 |
| 7 | LeakyReLU | negative_slope = 0.02 | 50 |
| 8 | Dropout | p = 0.5 | 50 |
| 9 | Linear (Fully Connected) | Input: 50, Output: 20 | 20 |
| 10 | Batch Normalization | Input: 20 | 20 |
| 11 | ReLU | - | 20 |
| 12 | Dropout | p = 0.5 | 20 |
| 13 | Linear (Fully Connected) | Input: 20, Output: 10 | 10 |
| 14 | Batch Normalization | Input: 10 | 10 |
| 15 | ReLU | - | 10 |
| 16 | Dropout | p = 0.5 | 10 |
| 17 | Linear (Fully Connected) | Input: 10, Output: 2 | 2 |
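The regularized variant of Table 2 can likewise be written as a sequential stack, again assuming PyTorch; each hidden block follows the table's pattern of batch normalization, activation, and dropout.

```python
import torch
import torch.nn as nn

# Table 2's regularized architecture: batch normalization and dropout after
# each hidden layer; the second block uses LeakyReLU with slope 0.02.
improved = nn.Sequential(
    nn.Linear(8, 100), nn.BatchNorm1d(100), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(100, 50), nn.BatchNorm1d(50), nn.LeakyReLU(0.02), nn.Dropout(0.5),
    nn.Linear(50, 20), nn.BatchNorm1d(20), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(20, 10), nn.BatchNorm1d(10), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(10, 2),
)

improved.eval()  # disable dropout and use running BN statistics at inference
out = improved(torch.randn(4, 8))
print(out.shape)
```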

| Method | Ensemble/Cascading | Accuracy |
|---|---|---|
| Gradient Boosting | Cascading | 76.1% |
| Neural Network | Ensemble | 71.1% |
| Improved Neural Network | Ensemble | 71.6% |
| Improved Neural Network | Ensemble and Cascading | 82.6% |

| Model | Training Accuracy | Testing Accuracy |
|---|---|---|
| XGB Classifier | 1.0 | 0.748 |
| Gradient Boosting Classifier 1 | 1.0 | 0.759 |
| Gradient Boosting Classifier 2 | 1.0 | 0.759 |
| Decision Tree Classifier | 0.791 | 0.683 |
| Ada Boost Classifier | 1.0 | 0.757 |
| SGD Classifier | 0.420 | 0.418 |
| Logistic Regression | 0.752 | 0.765 |
| KNN Classifier | 0.801 | 0.715 |
| SVM Classifier | 0.781 | 0.750 |
| Random Forest Classifier | 0.986 | 0.767 |
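Train/test accuracy gaps like those in the table above (a training accuracy of 1.0 alongside a much lower test accuracy) are a standard overfitting signal. A hedged sketch of how such a comparison can be produced with scikit-learn; the models and synthetic data are illustrative, not the paper's exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data (8 predictors, binary target).
X, y = make_classification(n_samples=800, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    # A large train/test gap indicates overfitting.
    scores[name] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```

An unpruned decision tree will typically reach a training accuracy of 1.0, mirroring the pattern in the table.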

| Model | Accuracy before Data Curation | Accuracy after Data Curation |
|---|---|---|
| Improved Neural Network (Ensemble) | 71.6% | 84.6% |
| Improved Neural Network (Ensemble and Cascading) | 82.6% | 91.5% |

| | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.93 | 0.85 | 0.89 | 297 |
| 1 | 0.76 | 0.88 | 0.82 | 164 |
| Accuracy | | | 0.86 | 461 |
| Macro Avg | 0.84 | 0.86 | 0.85 | 461 |
| Weighted Avg | 0.87 | 0.86 | 0.86 | 461 |

| | Predicted: 0 | Predicted: 1 |
|---|---|---|
| Actual: 0 | 251 | 46 |
| Actual: 1 | 19 | 145 |

**Table 8.** Detailed classification metrics showing the model’s precision, recall, F1-score, and support for each class, along with overall accuracy, macro average, and weighted average metrics.

| | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.93 | 0.94 | 0.93 | 297 |
| 1 | 0.88 | 0.87 | 0.88 | 164 |
| Accuracy | | | 0.91 | 461 |
| Macro Avg | 0.91 | 0.90 | 0.91 | 461 |
| Weighted Avg | 0.91 | 0.91 | 0.91 | 461 |

**Table 9.** Confusion matrix representing the distribution of actual and predicted classes, allowing for the visualization of true positives, true negatives, false positives, and false negatives.

| | Predicted: 0 | Predicted: 1 |
|---|---|---|
| Actual: 0 | 278 | 19 |
| Actual: 1 | 21 | 143 |
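Metrics like those in Tables 8 and 9 come directly from scikit-learn's `classification_report` and `confusion_matrix`; a minimal sketch on illustrative labels (not the paper's actual predictions):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative labels and predictions, kept tiny for readability.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 1])

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision/recall/F1 plus macro and weighted averages,
# matching the layout of Tables 8 and 9.
print(classification_report(y_true, y_pred, digits=2))
```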

**Table 10.** First-order and total-order Sobol indices for each predictor in the initial diabetes-prediction model. This analysis was performed on the first network of the ensemble, prior to cascading. Examining predictor influence in later ensemble stages is considerably more complex, because the SuperNet architecture increases the dimensionality from the original eight attributes to a 100-dimensional space corresponding to the size of the primary ensemble; interpretability is therefore inherently harder in those stages.

| Predictor | First-Order Sobol Index | Total-Order Sobol Index |
|---|---|---|
| Pregnancies | 0.04393478 | 0.04300161 |
| Glucose | 0.04792455 | 0.04966603 |
| BloodPressure | 0.1176754 | 0.11754018 |
| SkinThickness | 0.39438682 | 0.39478155 |
| Insulin | 0.09117344 | 0.09335395 |
| BMI | 0.10445807 | 0.10289758 |
| DiabetesPedigreeFunction | 0.1756332 | 0.17328341 |
| Age | 0.02469845 | 0.02507718 |
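First- and total-order Sobol indices like those in Table 10 can be estimated with the Saltelli sampling scheme. Below is a self-contained numpy sketch (the paper's own tooling is not specified here), validated on an additive test function whose analytic first-order indices are 0.2 and 0.8.

```python
import numpy as np

def sobol_indices(f, d, n=4096, seed=0):
    """Saltelli-style estimates of first- and total-order Sobol indices
    for f over d independent Uniform(0, 1) inputs."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, d)), rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    s1, st = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with column i taken from B
        fABi = f(ABi)
        s1[i] = np.mean(fB * (fABi - fA)) / var        # first-order (Saltelli)
        st[i] = 0.5 * np.mean((fA - fABi) ** 2) / var  # total-order (Jansen)
    return s1, st

# Additive test function x0 + 2*x1: analytically S1 = (0.2, 0.8),
# and with no interactions the total-order indices coincide with S1.
s1, st = sobol_indices(lambda X: X[:, 0] + 2 * X[:, 1], d=2)
print(np.round(s1, 2), np.round(st, 2))
```

For an additive function the first- and total-order indices match, as they nearly do for most predictors in Table 10, suggesting weak interaction effects in the first-stage network.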

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

de Zarzà, I.; de Curtò, J.; Hernández-Orallo, E.; Calafate, C.T.
Cascading and Ensemble Techniques in Deep Learning. *Electronics* **2023**, *12*, 3354.
https://doi.org/10.3390/electronics12153354
