CalexNet: Soft Cascade-Aligned Training and Calibration for Lightweight Early-Exit Branches
Abstract
1. Introduction
Contributions
- A unified cascade-alignment framework: A single principle that jointly aligns branch training, threshold calibration, and soft-target distillation with the inference-time cascade-survivor distribution, yielding three concrete training-recipe modifications for frozen-backbone post-training early-exit cascades.
- Lightweight 1D-conv branch architecture: Compact prototype-based branches that remain expressive on small survivor subsets while adding minimal inference overhead.
- Class Precision Margin (CPM): A class-wise calibration procedure for setting exit thresholds under class imbalance while preserving per-class precision relative to the backbone.
- Empirical characterization of cascade-survivor structure: Per-super-class exit-rate profiles and a sample-flow waterfall showing that population shrinkage through the cascade is concentrated on a small number of “easy” super-classes, while a long tail of harder classes survives to the backbone, supporting the design choice of cascade-aligned training.
- Pareto-frontier-as-comparison-object methodology: Methods are compared via the full accuracy-vs.-FLOPs-reduction Pareto frontier obtained by sweeping the CPM precision margin over a fixed eight-point set per dataset, rather than via accuracy at any single operating point. This removes operating-point selection bias and exposes regime-specific behavior (small-margin, near-backbone operation vs. large-margin, aggressive-exit operation), which a single-margin comparison hides.
- Comprehensive evaluation: A fully matched four-configuration study (ResNet18/ResNet50 × CIFAR-100 coarse/CINIC-10), an ablation study isolating all three cascade-alignment components, a comparison with the PTEEnet, ZTW, and BoostNet baselines, latency and energy measurements, and a multi-seed robustness check over three random seeds.
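To make the Pareto-frontier comparison concrete, the following minimal sketch extracts the non-dominated operating points from a margin sweep. The function name `pareto_frontier` and the sweep values are hypothetical illustrations, not results from the paper; each point is assumed to be a (FLOPs reduction, accuracy) pair where higher is better on both axes.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (flops_reduction, accuracy), both "higher is better"


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing FLOPs reduction."""
    # Visit points from the largest FLOPs reduction down; ties by accuracy.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_acc = float("-inf")
    for fr, acc in ordered:
        # Keep a point only if it beats every point with >= FLOPs reduction.
        if acc > best_acc:
            frontier.append((fr, acc))
            best_acc = acc
    return sorted(frontier)


# Hypothetical sweep: one (FR, accuracy) pair per margin value.
sweep = [(0.40, 0.83), (0.45, 0.80), (0.50, 0.81),
         (0.60, 0.76), (0.60, 0.74), (0.70, 0.70)]
frontier = pareto_frontier(sweep)
print(frontier)
```

Comparing two methods then amounts to comparing their frontiers over the whole FR range instead of picking one margin for each method.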
2. Related Work
2.1. Static Inference Optimization
2.2. Selective and Early-Exit Inference
2.3. The Post-Training Covariate-Shift Problem
3. Materials and Methods
3.1. Lightweight Early-Exit Branch Architecture
Refined Branch Head
3.2. Selective Inference Process
| Algorithm 1 Selective Inference with Cascade-Aligned Early-Exit Branches |
|---|
| 1: Initialize. Input: sample $x$, backbone, branches $\{B_l\}$, thresholds $\{T_i^l\}$. |
| 2: for $l = 1$ to $L$ do |
| 3: Residual Block Inference: $X_l \leftarrow \mathrm{ResBlock}_l(X_{l-1})$. |
| 4: Branch Classification Head Inference: |
| 5: $\mathrm{scores} \leftarrow H_l^{\mathrm{class}}(X_l)$; $\hat{C} \leftarrow \arg\max(\mathrm{scores})$. |
| 6: Branch Confidence Head Inference: |
| 7: $R_{\hat{C}}^l \leftarrow [H_l^{\mathrm{conf}}(X_l)]_{\hat{C}}$. |
| 8: Early-Exit Decision: if $R_{\hat{C}}^l > T_{\hat{C}}^l$ then return $\hat{y} \leftarrow \hat{C}$. |
| 9: end for |
| 10: Final Classification: $\hat{y} \leftarrow \mathrm{FinalClassifier}(X_L)$. |
| 11: return $\hat{y}$. |
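The control flow of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the blocks, branches, and final classifier are passed in as generic callables, and `toy_branch`, `double`, and the threshold values are hypothetical stand-ins chosen only to exercise the loop.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
# A branch maps features to (class scores, per-class confidences).
Branch = Callable[[Features], Tuple[Sequence[float], Sequence[float]]]


def selective_inference(
    x: Features,
    blocks: Sequence[Callable[[Features], Features]],
    branches: Sequence[Branch],
    thresholds: Sequence[Sequence[float]],  # thresholds[l][c]
    final_classifier: Callable[[Features], int],
) -> Tuple[int, int]:
    """Return (predicted class, exit index); index len(blocks) means no early exit."""
    feats = x
    for l, (block, branch) in enumerate(zip(blocks, branches)):
        feats = block(feats)                        # step 3: residual block
        scores, conf = branch(feats)                # steps 4-7: both branch heads
        c_hat = max(range(len(scores)), key=scores.__getitem__)
        if conf[c_hat] > thresholds[l][c_hat]:      # step 8: per-class exit test
            return c_hat, l
    return final_classifier(feats), len(blocks)     # step 10: backbone classifier


def double(f: Features) -> Features:
    return [2.0 * v for v in f]


def toy_branch(f: Features):
    total = sum(abs(v) for v in f) or 1.0
    probs = [abs(v) / total for v in f]
    return probs, probs  # toy choice: confidence equals the normalised score


toy_thresholds = [[0.9, 0.9], [0.9, 0.7]]
early = selective_inference([1.0, 3.0], [double, double],
                            [toy_branch, toy_branch], toy_thresholds, lambda f: 0)
late = selective_inference([1.0, 1.0], [double, double],
                           [toy_branch, toy_branch], toy_thresholds, lambda f: 0)
print(early, late)
```

In the toy run, the first sample exits at the second branch because its class-1 confidence (0.75) exceeds the class-1 threshold there (0.7), while the second sample never clears a threshold and falls through to the final classifier.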
3.3. Class Precision Margin (CPM) Calibration
3.4. Cascade-Aligned Training of Early-Exit Branches
- Extract feature maps by running the frozen backbone on the training and validation sets.
- Train the classification head via cross-entropy (Equation (8)), using the backbone-predicted pseudo-label as the target.
- Train the confidence head via binary cross-entropy on the correctness of the branch prediction.
- CPM-calibrate the thresholds on the validation set (Equation (7)).
- [Cascade-aligned only] Update the cascade state for the next branch. In the hard-filter reference formulation, the survivor subset of the training set consists of the samples that do not exit at the current branch, and analogously for the validation set.
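The calibration and cascade-update steps above can be illustrated with a small sketch. It is written under stated assumptions, not taken from Equation (7): the per-class target is assumed to be the backbone's per-class precision minus the margin, the lowest qualifying threshold is chosen, and confidences are assumed to lie in (0, 1]. The names `cpm_thresholds` and `survivors` and all toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def cpm_thresholds(
    pred: Sequence[int],      # branch-predicted class per validation sample
    conf: Sequence[float],    # branch confidence for the predicted class
    label: Sequence[int],     # ground truth, used only for calibration
    target_precision: Dict[int, float],  # assumed per-class backbone precision
    margin: float,
) -> Dict[int, float]:
    """Lowest per-class threshold whose exiting set (conf > T) keeps
    precision >= target - margin; inf means the class never exits."""
    thresholds: Dict[int, float] = {}
    for c, target in target_precision.items():
        idx = [i for i, p in enumerate(pred) if p == c]
        best = float("inf")
        # Candidates in ascending order; 0.0 lets every sample of class c exit.
        for t in [0.0] + sorted({conf[i] for i in idx}):
            exiting = [i for i in idx if conf[i] > t]
            if not exiting:
                break
            precision = sum(label[i] == c for i in exiting) / len(exiting)
            if precision >= target - margin:
                best = t
                break
        thresholds[c] = best
    return thresholds


def survivors(pred: Sequence[int], conf: Sequence[float],
              thresholds: Dict[int, float]) -> List[int]:
    """Hard-filter cascade update: indices of samples that do NOT exit."""
    inf = float("inf")
    return [i for i, (p, r) in enumerate(zip(pred, conf))
            if not r > thresholds.get(p, inf)]


# Toy validation data (hypothetical values).
pred = [0, 0, 0, 0, 1, 1, 1]
conf = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.2]
label = [0, 0, 1, 1, 1, 0, 1]
target = {0: 0.9, 1: 0.8}

tight = cpm_thresholds(pred, conf, label, target, margin=0.05)
loose = cpm_thresholds(pred, conf, label, target, margin=0.5)
print(tight, survivors(pred, conf, tight))
print(loose, survivors(pred, conf, loose))
```

With the small margin only the most confident samples exit and four samples survive to the next branch; with the large margin every sample exits, which mirrors the near-backbone vs. aggressive-exit regimes swept by the margin.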
3.4.1. Cascade-Aligned Sample Weighting
3.4.2. Cascade-Aware Calibration
3.4.3. Knowledge Distillation Soft Target
3.4.4. Relationship to Published Baselines
- PTEEnet: In our matched implementation, PTEEnet uses the basic prototype head (Equations (1)–(4)) and minimizes a cumulative cross-entropy loss on backbone pseudo-labels. Branches share a gradient signal through the cumulative loss, but do not filter or reweight samples. The method described here trains each branch with its own loss under survivor-aware sample weighting, uses the augmented head (Equation (5)), applies cascade-aware calibration (Equation (12)), and adds the distilled target (Equation (13)).
- ZTW (Zero-Time Waste) trains exit branches with a weighted ensemble of all earlier branches’ predictions and uses geometric-mean confidence aggregation across the cascade. ZTW does not employ distillation against the backbone softmax or explicit per-class precision calibration. The method here is conceptually simpler (no inference-time ensembling of prior branches), adds CPM precision calibration, and replaces argmax cross-entropy with the distilled soft target.
- BoostNet is included as an evaluated published baseline under the same frozen-backbone post-training protocol used for the other methods. In its original formulation, BoostNet addresses the early-exit train–test mismatch by formulating the dynamic network as a boosting-inspired additive model and combining mini-batch joint optimization, prediction reweighting with temperature, and fixed gradient rescaling. In our matched implementation, the backbone remains frozen, and the BoostNet branch-training mechanism is adapted to the post-training setting. The behavior of this frozen-backbone adaptation is discussed in Section 5, under Accuracy–FLOPs Pareto Frontier Across Backbones and Datasets.
- The “CalexNet (no alignment, no KD)” baseline (within-paper) has the same augmented branch head and CPM calibration as CalexNet, but is trained with uniform per-sample weights and standard cross-entropy on argmax pseudo-labels. Comparing this reference to CalexNet isolates the joint contribution of cascade-aligned weighting, cascade-aware calibration, and the distilled target as orthogonal training-recipe modifications. Among the configurations evaluated, soft-target KD (Equation (13)) is the dominant lever; cascade-survivor weighting (Equation (11)) is a corrective refinement whose effect varies with margin and dataset. Both are retained because they address distinct train–inference mismatches and no evaluated combination shows a downside.
- Use of ground-truth labels. For CalexNet, the “CalexNet (no alignment, no KD)” baseline, PTEEnet, and ZTW, branch heads are trained on backbone pseudo-labels only and do not require ground-truth labels for per-sample training. Ground-truth labels are used only for (a) the one-shot per-class precision target in CPM calibration on the held-out validation set and (b) post hoc test-accuracy reporting. BoostNet is included as a matched frozen-backbone adaptation of its published training mechanism, as described above.
- BranchyNet is the foundational multi-exit method from which this research line derives. Under the post-training assumption used in this work, BranchyNet is superseded by PTEEnet: both methods minimize the weighted sum of per-exit cross-entropy losses on backbone pseudo-labels with the backbone frozen, and the remaining differences (Conv-BN-ReLU branch stacks vs. prototype head, entropy vs. max-softmax exit signal) are subsumed by the PTEEnet design. We therefore use PTEEnet as the representative of the joint-cumulative-loss baseline family and do not run BranchyNet as a separate baseline. BranchyNet is cited and discussed for historical completeness in Section 2.2.
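The difference between the two classification objectives contrasted in this list (argmax cross-entropy on pseudo-labels vs. a distilled soft target, each optionally combined with per-sample weights) can be sketched as follows. This is a generic temperature-softened KL divergence in the style of standard knowledge distillation, not a transcription of Equation (13); the weights are a hypothetical stand-in for the cascade-aligned weights of Equation (11), and the temperature value and function names are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_ce(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """Cross-entropy against the teacher's hard (argmax) pseudo-label."""
    target = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return -math.log(softmax(student_logits)[target])


def distilled_kl(student_logits: Sequence[float], teacher_logits: Sequence[float],
                 temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def weighted_mean(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Per-sample weighted average; uniform weights recover the plain mean."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Two toy samples: backbone (teacher) logits and branch (student) logits.
teacher = [[2.0, 1.0, 0.0], [3.0, 0.0, 0.0]]
student = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
kd_losses = [distilled_kl(s, t) for s, t in zip(student, teacher)]
ce_losses = [argmax_ce(s, t) for s, t in zip(student, teacher)]

uniform = weighted_mean(kd_losses, [1.0, 1.0])
# Hypothetical survivor-style weights: sample 0 is down-weighted to zero.
aligned = weighted_mean(kd_losses, [0.0, 1.0])
print(kd_losses, ce_losses, uniform, aligned)
```

The soft target is zero only when the branch reproduces the backbone's full distribution, whereas the argmax objective stays positive even for a perfect match; reweighting shifts the average toward the samples assumed to reach the branch at inference time.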
4. Experimental Setup
4.1. Datasets
4.2. Backbone Architecture
4.3. Training Configuration
4.4. Metrics
5. Results
Accuracy–FLOPs Pareto Frontier Across Backbones and Datasets
6. Discussion and Conclusions
Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| CNN | Convolutional Neural Network |
| CPM | Class Precision Margin |
| CUDA | Compute Unified Device Architecture |
| ECE | Expected Calibration Error |
| FLOPs | Floating-point operations |
| FR | FLOPs reduction |
| GELU | Gaussian Error Linear Unit |
| GPU | Graphics Processing Unit |
| KD | Knowledge distillation |
| KL | Kullback–Leibler |
| MLP | Multilayer perceptron |
| NLL | Negative log-likelihood |
| SGD | Stochastic gradient descent |
| SI | Selective inference |
Appendix A. Supplementary Analyses, Ablations, and Reproducibility Details
Appendix A.1. Calibration: CPM vs. Temperature Scaling
| Margin | Branch 1 (n/ECE) | Branch 2 (n/ECE) | Branch 3 (n/ECE) |
|---|---|---|---|
| 0.05 | 1417/0.054 | 1404/0.052 | 2734/0.100 |
| 0.20 | 3711/0.102 | 2297/0.146 | 3344/0.210 |
| 0.40 | 6996/0.094 | 2017/0.166 | 962/0.214 |
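For reference, the ECE values in the table above can be related to the standard binned definition with a short sketch. The bin count and the half-open equal-width binning are assumptions of this example, since the source does not state them here, and the toy inputs are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count
) -> float:
    """ECE = sum_b (n_b / n) * |accuracy_b - mean_confidence_b|."""
    n = len(confidences)
    if n == 0:
        return 0.0
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for c, ok in zip(confidences, correct):
        b = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        conf_sum[b] += c
        hit_sum[b] += 1.0 if ok else 0.0
    # (n_b / n) * |hit_sum/n_b - conf_sum/n_b| simplifies to |hit_sum - conf_sum| / n.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / n


ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55],
                                 [True, True, True, False])
print(round(ece, 3))  # → 0.05
```

When the metric is reported per branch, the inputs would be restricted to the n samples that exit at that branch.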
Appendix A.2. Quantitative Covariate Shift
| Reach Branch | Samples | Divergence from Full Set | Gini | Top-3 Share |
|---|---|---|---|---|
| full set | 10,000 | 0.000 | 0.000 | 15.0% |
| ≥1 | 6289 | 0.068 | 0.193 | 21.5% |
| ≥2 | 3992 | 0.106 | 0.256 | 24.4% |
| ≥3 | 648 | 0.194 | 0.342 | 31.5% |
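The two concentration statistics in the table above can be reproduced from per-class survivor counts with a sketch like the one below. It assumes that "Gini" denotes the standard Gini inequality coefficient of the class counts and that "Top-3 Share" is the fraction of samples in the three most frequent classes; both definitions are assumptions, although they agree with the full-set row, since a balanced 20-class set of 10,000 samples yields a Gini of 0 and a top-3 share of 15%.

```python
from typing import Sequence


def gini_coefficient(counts: Sequence[int]) -> float:
    """Gini inequality coefficient of class counts (0 = perfectly uniform)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # For ascending x_1..x_n: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def top_k_share(counts: Sequence[int], k: int = 3) -> float:
    """Fraction of all samples that fall in the k most frequent classes."""
    return sum(sorted(counts, reverse=True)[:k]) / sum(counts)


balanced = [500] * 20   # 20 classes, 10,000 samples in total
skewed = [6, 3, 1]      # hypothetical survivor counts for three classes

print(gini_coefficient(balanced), top_k_share(balanced))
print(gini_coefficient(skewed), top_k_share(skewed, k=1))
```

Both statistics grow as the surviving population concentrates on fewer classes, which is the trend the table reports for deeper branches.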
Appendix A.3. Component Ablation: Cascade Alignment vs. KD
| FR (FLOPs Reduction) | A. Baseline (No Alignment, No KD) | B. Alignment Only (No KD) | C. KD Only (No Alignment) | D. Full CalexNet (Alignment + KD) |
|---|---|---|---|---|
| 0.4 | 6.1% | 7.3% | 4.9% | 4.2% |
| 0.6 | 16.6% | 18.0% | 12.6% | 12.4% |
Appendix A.4. Reproducibility and Hyperparameters
Appendix A.5. Per-Sample Exit Visualization

Appendix A.6. Wall-Clock Latency and GPU Energy
| Method | Test Acc | Latency (ms) | Energy (mJ) |
|---|---|---|---|
| ZTW | 0.79/0.68/0.53 | 2.11/1.66/1.07 | 118/94/65 |
| PTEEnet | 0.80/0.71/0.54 | 1.94/1.67/1.30 | 124/102/76 |
| “no align, no KD” ref. | 0.82/0.74/0.55 | 3.19/2.49/1.65 | 171/142/91 |
| CalexNet | 0.83/0.76/0.59 | 2.54/2.19/1.31 | 164/120/83 |
| CalexNet vs. | Acc 0.80 | Acc 0.75 | Acc 0.70 |
|---|---|---|---|
| ZTW | |||
| PTEEnet | -- | ||
| “no align, no KD” ref. | any | any | any |
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Rokach, L.; Aperstein, Y.; Akselrod-Ballin, A. Deep active learning framework for chest-abdominal CT scans segmentation. Expert Syst. Appl. 2025, 263, 125522.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Li, H.; Ota, K.; Dong, M. Learning IoT in edge: Deep learning for the Internet of Things with edge computing. IEEE Netw. 2018, 32, 96–101.
- Wang, Y.; Han, Y.; Wang, C.; Song, S.; Tian, Q.; Huang, G. Computation-efficient deep learning for computer vision: A survey. Cybern. Intell. 2024, 1, 9390002.
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–21 June 2018; pp. 2704–2713.
- Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 2021, 461, 370–403.
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9.
- Cheng, H.; Zhang, M.; Shi, J.Q. A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10558–10578.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Teerapittayanon, S.; McDanel, B.; Kung, H.T. BranchyNet: Fast inference via early exiting from deep neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 139–144.
- Farina, P.; Biswas, S.; Yildiz, E.; Akhunov, K.; Ahmed, S.; Islam, B.; Yildirim, K.S. Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems. arXiv 2024, arXiv:2405.10426.
- Odema, M.; Rashid, N.; Al Faruque, M.A. Eexnas: Early-exit neural architecture search solutions for low-power wearable devices. In Proceedings of the 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED); IEEE: New York, NY, USA, 2021; pp. 1–6.
- Li, X.; Lou, C.; Chen, Y.; Zhu, Z.; Shen, Y.; Ma, Y.; Zou, A. Predictive exit: Prediction of fine-grained early exits for computation and energy-efficient inference. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI: Washington, DC, USA, 2023; Volume 37, pp. 8657–8665.
- Laskaridis, S.; Kouris, A.; Lane, N.D. Adaptive inference through early-exit networks: Design, challenges, and directions. In Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, Virtual, 25 June 2021; pp. 1–6.
- Matsubara, Y.; Levorato, M.; Restuccia, F. Split computing and early exiting for deep learning applications: Survey and research challenges. ACM Comput. Surv. 2022, 55, 1–30.
- Rahmath, P.H.; Srivastava, V.; Chaurasia, K.; Pacheco, R.G.; Couto, R.S. Early-Exit Deep Neural Network - A Comprehensive Survey. ACM Comput. Surv. 2024, 57, 1–37.
- Li, B.; Cao, X.; Li, J.; Ji, L.; Wei, X.; Geng, J.; Zhang, R. CaDCR: An Efficient Cascaded Dynamic Collaborative Reasoning Framework for Intelligent Recognition Systems. Electronics 2025, 14, 2628.
- Li, H.; Zhang, H.; Qi, X.; Yang, R.; Huang, G. Improved techniques for training adaptive deep networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 29 October–1 November 2019; pp. 1891–1900.
- Liang, Y.P.; Chao, W.C.; Chung, C.C. Low-Power Branch CNN Hardware Accelerator with Early Exit for UAV Disaster Detection Using 16 nm CMOS Technology. Sensors 2025, 25, 4867.
- Wang, M.; Mo, J.; Lin, J.; Wang, Z.; Du, L. Dynexit: A dynamic early-exit strategy for deep residual networks. In 2019 IEEE International Workshop on Signal Processing Systems (SiPS); IEEE: New York, NY, USA, 2019; pp. 178–183.
- Ma, Y.; Wang, Y.; Tang, B. Joint Optimization of Model Partitioning and Resource Allocation for Multi-Exit DNNs in Edge-Device Collaboration. Electronics 2025, 14, 1647.
- Lahiany, A.; Aperstein, Y. PTEENET: Post-trained early-exit neural networks augmentation for inference cost optimization. IEEE Access 2022, 10, 69680–69687.
- Wójcik, B.; Przewięźlikowski, M.; Szatkowski, F.; Wołczyk, M.; Bałazy, K.; Krzepkowski, B.; Podolak, I.; Tabor, J.; Śmieja, M.; Trzciński, T.; et al. Zero time waste in pre-trained early exit neural networks. Neural Netw. 2023, 168, 580–601.
- Peng, X.; Wu, X.; Xu, L.; Wang, L.; Fei, A. DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices. In Proceedings of the GLOBECOM 2024 - 2024 IEEE Global Communications Conference; IEEE: New York, NY, USA, 2024; pp. 3116–3121.
- Elhoushi, M.; Shrivastava, A.; Liskovich, D.; Hosmer, B.; Wasti, B.; Lai, L.; Mahmoud, A.; Acun, B.; Agarwal, S.; Roman, A.; et al. LayerSkip: Enabling early exit inference and self-speculative decoding. arXiv 2024, arXiv:2404.16710.
- Khalilian, S.; Aghapour, E.; Meratnia, N.; Pimentel, A.; Pathania, A. Early-Exit DNN Inference on HMPSoCs. In Proceedings of the 2025 IEEE International Conference on Edge Computing and Communications (EDGE); IEEE: New York, NY, USA, 2025; pp. 75–82.
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, 8–14 December 2001; pp. 511–518.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Fahlman, S.E.; Lebiere, C. The cascade-correlation learning architecture. In Advances in Neural Information Processing Systems; NeurIPS: Sydney, Australia, 1990; Volume 2.
- Yu, H.; Li, H.; Hua, G.; Shi, H. Boosted Dynamic Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI: Washington, DC, USA, 2023; Volume 37, pp. 10989–10997.
- Mokssit, S.; Karrakchou, O.; Mousist, A.; Ghogho, M. Confidence-gated training for efficient early-exit neural networks. arXiv 2025, arXiv:2509.17885.
- Regol, F.; Chataoui, J.; Coates, M. Jointly-learned exit and inference for a dynamic neural network: Jei-dnn. arXiv 2023, arXiv:2310.09163.
- Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7436–7456.
- Elhoushi, M.; Shrivastava, A.; Liskovich, D.; Hosmer, B.; Wasti, B.; Lai, L.; Mahmoud, A.; Acun, B.; Agarwal, S.; Roman, A.; et al. LayerSkip: Enabling early exit inference and self-speculative decoding. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; Volume 1, pp. 12622–12642.
- Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–19 June 2019; pp. 9268–9277.
- Byrd, J.; Lipton, Z. What is the effect of importance weighting in deep learning? In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 872–881.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1321–1330.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
- Darlow, L.N.; Crowley, E.J.; Antoniou, A.; Storkey, A.J. CINIC-10 is not ImageNet or CIFAR-10. arXiv 2018, arXiv:1810.03505.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 630–645.




| Strategy | Backbone Modification | Handles Cascade-Distribution Mismatch | Key Examples |
|---|---|---|---|
| Joint training | Yes | No | BranchyNet, DynExit, DistrEE |
| Independent post-training | No | No | PTEEnet, ZTW |
| Cascade-aligned post-training | No | Yes | CalexNet (this work) |
| Component | CalexNet (Proposed) | “CalexNet (No Alignment, No KD)” Baseline | PTEEnet | ZTW | BoostNet |
|---|---|---|---|---|---|
| Backbone | frozen post hoc | frozen post hoc | frozen post hoc | frozen post hoc | frozen post hoc |
| Branch head | augmented prototype (Equation (5)) | same | basic prototype (Equations (1)–(4)) | BasicBlock + linear | basic prototype (Equations (1)–(4)) |
| Cls training objective | distilled KL (Equation (13)) | argmax CE (Equation (8)) | cumulative CE on pseudo-label | weighted CE with prior-branch ensemble | additive CE with stopped prior-output term |
| Sample weighting | cascade-aligned (Equation (11)) | uniform | uniform | uniform | uniform |
| Exit-decision signal | per-class CPM threshold | per-class CPM threshold | per-class CPM threshold | geometric-mean cascade confidence | per-class CPM threshold |
| Calibration distribution | cascade-aware (Equation (12)) | full validation set | full validation set | full validation set | full validation set |
| Inference cost beyond backbone | per-exit branch forward | same | same | per-exit branch + ensemble aggregation | per-exit branch + additive logit aggregation |
| Dataset | Classes | Train/Val/Test | Per-Class Val (Raw) | Per-Class Val After Branch-1 Exits | Backbone Test Acc (R18/R50) |
|---|---|---|---|---|---|
| CINIC-10 | 10 | 90k/90k/90k | ~9000 | ~5000–7000 | 0.874/0.908 |
| CIFAR-100 coarse | 20 | 40k/10k/10k | ~500 | ~200–300 | 0.867/0.881 |
| FR | ZTW | PTEEnet | BoostNet | CalexNet (No Align, No KD) | CalexNet |
|---|---|---|---|---|---|
| 0.4 | 8.8/9.5 | 7.7/6.0 | 8.2/8.4 | 5.9/4.5 | 4.1/2.7 |
| 0.5 | 14.8/12.4 | 11.9/9.3 | 14.1/12.5 | 9.9/6.8 | 7.0/5.4 |
| 0.6 | 21.7/15.2 | 17.8/13.4 | 20.8/17.1 | 15.7/12.6 | 12.3/9.1 |
| 0.7 | 29.5/19.9 | 28.2/18.7 | 29.4/24.1 | 23.5/20.1 | 20.9/12.9 |
| 0.8 | 38.9/27.3 | 37.3/26.5 | 39.2/32.7 | 36.7/31.7 | 29.4/21.5 |
| FR | ZTW | PTEEnet | BoostNet | CalexNet (No Align, No KD) | CalexNet |
|---|---|---|---|---|---|
| 0.4 | --/-- | --/-- | --/-- | --/-- | --/-- |
| 0.5 | 5.6/5.3 | 5.4/5.1 | 5.9/5.6 | --/-- | --/-- |
| 0.6 | 7.1/6.5 | 7.7/6.7 | 7.7/7.3 | 6.8/6.6 | 5.2/4.9 |
| 0.7 | 11.0/10.9 | 10.5/9.9 | 10.8/9.9 | 8.7/10.7 | 8.2/7.8 |
| 0.8 | 18.6/18.5 | 18.5/18.0 | 19.4/18.4 | 18.8/20.1 | 15.6/15.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Aperstein, Y.; Apartsin, A. CalexNet: Soft Cascade-Aligned Training and Calibration for Lightweight Early-Exit Branches. Electronics 2026, 15, 2149. https://doi.org/10.3390/electronics15102149

