Expert-Transformer with Prototype-Aware Contrastive Learning for Semi-Supervised Time-Series Classification
Abstract
1. Introduction
1. We integrate an uncertainty-guided Mixture-of-Experts (MoE) with the Transformer encoder, enabling dynamic routing of input sequences to specialized experts and improving generalization across diverse temporal patterns.
2. We design an expert-balancing strategy that prevents expert collapse and ensures a meaningful contribution from every expert, enhancing robustness and adaptability.
3. We introduce a prototype-aware contrastive learning loss, which iteratively computes class prototypes and guides both labeled and high-confidence unlabeled samples, embedding categorical semantics to strengthen discriminative feature learning.
4. We conduct extensive experiments on public benchmark datasets. The results demonstrate that ExT-PACL consistently outperforms state-of-the-art SSL methods, particularly in extreme low-label scenarios, while also being more computationally efficient than standard contrastive baselines.
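
As a rough illustration of contributions 2 and 3, the sketch below shows one common way to penalize unbalanced expert usage and one way to pull embeddings toward class prototypes. It is not the formulation used in this paper: the squared coefficient-of-variation penalty, the cosine-similarity cross-entropy, the `temperature` of 0.1, the confidence `threshold` of 0.95, and all function names are assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def load_balance_penalty(gate_probs):
    """Assumed penalty: squared coefficient of variation of mean gate probability per expert."""
    num_experts = len(gate_probs[0])
    importance = [sum(row[e] for row in gate_probs) / len(gate_probs)
                  for e in range(num_experts)]
    mean = sum(importance) / num_experts
    var = sum((v - mean) ** 2 for v in importance) / num_experts
    return var / (mean ** 2 + 1e-12)  # 0 when all experts are used equally

def class_prototypes(embeddings, labels, num_classes):
    """Prototype of a class = re-normalized mean of its normalized embeddings."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for emb, y in zip(embeddings, labels):
        z = l2_normalize(emb)
        for d in range(dim):
            sums[y][d] += z[d]
        counts[y] += 1
    return [l2_normalize([s / max(c, 1) for s in row])
            for row, c in zip(sums, counts)]

def prototype_contrastive_loss(embeddings, labels, prototypes, temperature=0.1):
    """Cross-entropy over temperature-scaled cosine similarities to each prototype."""
    total = 0.0
    for emb, y in zip(embeddings, labels):
        z = l2_normalize(emb)
        logits = [sum(a * b for a, b in zip(z, p)) / temperature for p in prototypes]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[y]
    return total / len(embeddings)

def select_confident(probabilities, threshold=0.95):
    """Return (index, pseudo_label) for unlabeled samples predicted above the threshold."""
    picked = []
    for i, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            picked.append((i, label))
    return picked

# Hypothetical toy data: four 2-D embeddings from two classes.
embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = [0, 0, 1, 1]
prototypes = class_prototypes(embeddings, labels, num_classes=2)
loss = prototype_contrastive_loss(embeddings, labels, prototypes)
```

In an iterative scheme of the kind the contribution describes, the pseudo-labels returned by `select_confident` would be appended to the labeled set before the prototypes are recomputed, so that high-confidence unlabeled samples also shape the class centers.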
2. Related Work
2.1. Semi-Supervised Time-Series Classification
2.2. Contrastive Representation Learning for Time-Series
3. Methodology
3.1. Problem Formulation
3.2. MoE-Enhanced Expert-Transformer
3.3. Balanced Expert Training and Prototype-Guided Representation Learning
3.4. Training Losses
4. Experiments
4.1. Datasets
4.2. Comparison Baselines
4.3. Evaluation Metrics
4.4. Implementation Details
4.5. Experimental Results
4.6. Ablation Study
4.7. Sensitivity Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest





| Methods | Epilepsy Accuracy | Epilepsy MF1-Score | Wafer Accuracy | Wafer MF1-Score | POC Accuracy | POC MF1-Score |
|---|---|---|---|---|---|---|
| 1% of labeled data | ||||||
| Random Init | 70.3 ± 2.1 | 66.2 ± 2.6 | 90.6 ± 1.6 | 58.1 ± 2.1 | 61.4 ± 0.0 | 38.3 ± 0.0 |
| Supervised | 76.1 ± 0.7 | 74.8 ± 0.4 | 91.9 ± 1.3 | 67.6 ± 9.2 | 62.0 ± 0.8 | 40.0 ± 2.1 |
| SSL-ECG | 89.3 ± 0.4 | 86.0 ± 0.3 | 93.4 ± 0.5 | 76.1 ± 2.4 | 62.5 ± 1.8 | 41.2 ± 4.9 |
| CPC | 88.9 ± 1.1 | 85.8 ± 0.3 | 93.5 ± 0.4 | 78.4 ± 1.5 | 64.8 ± 1.0 | 48.2 ± 2.9 |
| SimCLR | 88.3 ± 1.5 | 84.0 ± 1.0 | 93.8 ± 0.2 | 78.5 ± 1.1 | 61.5 ± 0.1 | 3.8 ± 0.3 |
| TS-TCC | 91.2 ± 0.5 | 89.2 ± 0.2 | 93.2 ± 0.8 | 76.7 ± 4.6 | 63.8 ± 0.5 | 48.1 ± 0.9 |
| Mean-Teacher | 91.5 ± 0.3 | 90.6 ± 0.6 | 94.7 ± 0.2 | 84.7 ± 0.3 | 62.1 ± 0.3 | 40.8 ± 1.2 |
| DivideMix | 90.9 ± 0.7 | 89.4 ± 1.4 | 93.2 ± 0.5 | 82.0 ± 0.8 | 62.1 ± 0.6 | 40.7 ± 2.1 |
| SemiTime | 91.6 ± 0.3 | 90.8 ± 0.6 | 94.4 ± 0.6 | 84.4 ± 1.2 | 62.0 ± 0.5 | 40.4 ± 1.6 |
| FixMatch | 93.2 ± 0.2 | 92.2 ± 0.5 | 95.0 ± 0.4 | 84.8 ± 1.2 | 61.9 ± 0.5 | 40.0 ± 1.8 |
| CA-TCC | 92.0 ± 0.1 | 91.9 ± 0.1 | 95.1 ± 0.3 | 85.1 ± 0.6 | 63.4 ± 0.4 | 49.3 ± 0.7 |
| ExT-PACL | 93.8 ± 0.3 | 89.4 ± 0.5 | 95.1 ± 0.2 | 85.2 ± 0.7 | 65.6 ± 0.3 | 50.4 ± 0.5 |
| 5% of labeled data | ||||||
| Random Init | 75.5 ± 3.6 | 70.5 ± 3.3 | 91.2 ± 1.2 | 65.5 ± 8.2 | 61.6 ± 0.3 | 38.8 ± 1.0 |
| Supervised | 83.4 ± 0.7 | 80.4 ± 0.7 | 94.6 ± 0.3 | 83.9 ± 0.6 | 61.4 ± 0.0 | 38.3 ± 0.0 |
| SSL-ECG | 92.8 ± 0.2 | 89.0 ± 0.3 | 94.9 ± 0.3 | 84.5 ± 0.7 | 62.9 ± 0.3 | 43.3 ± 1.4 |
| CPC | 92.8 ± 0.3 | 90.2 ± 0.5 | 92.5 ± 0.4 | 79.4 ± 0.8 | 66.9 ± 2.6 | 44.3 ± 8.4 |
| SimCLR | 74.9 ± 1.5 | 89.2 ± 1.0 | 94.8 ± 0.2 | 83.3 ± 0.6 | 62.7 ± 1.1 | 42.4 ± 4.0 |
| TS-TCC | 93.1 ± 0.3 | 93.7 ± 0.6 | 93.2 ± 0.4 | 81.2 ± 0.7 | 62.6 ± 1.1 | 42.6 ± 3.0 |
| Mean-Teacher | 94.0 ± 0.4 | 93.6 ± 0.7 | 94.4 ± 0.7 | 83.8 ± 1.4 | 62.1 ± 0.6 | 41.2 ± 2.5 |
| DivideMix | 93.9 ± 0.6 | 93.4 ± 1.1 | 94.7 ± 0.6 | 84.6 ± 1.5 | 62.9 ± 1.3 | 45.9 ± 7.0 |
| SemiTime | 94.0 ± 0.5 | 93.0 ± 0.9 | 95.0 ± 0.4 | 84.7 ± 1.0 | 62.4 ± 0.5 | 41.8 ± 1.7 |
| FixMatch | 93.7 ± 1.4 | 92.4 ± 0.3 | 94.9 ± 0.6 | 84.4 ± 1.2 | 63.1 ± 1.4 | 43.6 ± 4.3 |
| CA-TCC | 94.5 ± 0.1 | 94.0 ± 0.1 | 95.8 ± 0.2 | 85.2 ± 0.6 | 66.4 ± 0.3 | 52.8 ± 0.3 |
| ExT-PACL | 94.4 ± 0.2 | 90.8 ± 0.4 | 98.1 ± 0.1 | 94.1 ± 0.0 | 67.1 ± 0.2 | 53.5 ± 0.3 |
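
The table reports both Accuracy and MF1-Score. Assuming MF1-Score denotes the macro-averaged F1 (the unweighted mean of per-class F1 scores), the following sketch shows how the two metrics are computed and why they can diverge when classes are imbalanced. The labels are hypothetical toy data, not drawn from any of the benchmark datasets.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1; a class with no hits contributes 0."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes

# Hypothetical imbalanced problem: 6 samples of class 0, 4 of class 1,
# and a degenerate classifier that always predicts the majority class.
y_true = [0] * 6 + [1] * 4
y_pred = [0] * 10

print(accuracy(y_true, y_pred))      # 0.6
print(macro_f1(y_true, y_pred, 2))   # 0.375
```

A classifier that ignores the minority class still reaches 60% accuracy here, while its macro F1 drops to 37.5%. A gap of that kind between the two columns could arise from a similar effect, which is why reporting both metrics is informative.
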

| Methods | POC Accuracy | POC MF1-Score | Power Accuracy | Power MF1-Score |
|---|---|---|---|---|
| 1% of labeled data | ||||
| CA-TCC | 63.4 ± 0.4 | 49.3 ± 0.7 | 69.9 ± 0.4 | 67.9 ± 0.6 |
| E-PACL(2) | 63.2 ± 0.3 | 48.8 ± 0.5 | 70.4 ± 0.5 | 68.2 ± 0.4 |
| ExT-PACL(2) | 64.1 ± 0.2 | 50.1 ± 0.8 | 70.4 ± 0.6 | 68.8 ± 0.4 |
| E-PACL(4) | 64.1 ± 0.2 | 49.7 ± 0.5 | 70.2 ± 0.6 | 70.2 ± 0.2 |
| ExT-PACL(4) | 65.6 ± 0.3 | 50.4 ± 0.5 | 71.7 ± 0.2 | 69.4 ± 0.7 |
| E-PACL(8) | 63.2 ± 0.3 | 48.2 ± 0.8 | 70.1 ± 0.2 | 67.9 ± 0.2 |
| ExT-PACL(8) | 64.2 ± 0.3 | 49.8 ± 0.5 | 71.2 ± 0.3 | 68.6 ± 0.3 |
| 5% of labeled data | ||||
| CA-TCC | 66.4 ± 0.3 | 52.8 ± 0.3 | 76.8 ± 0.6 | 74.1 ± 0.7 |
| E-PACL(2) | 66.5 ± 0.2 | 53.0 ± 0.2 | 77.5 ± 0.4 | 75.5 ± 0.2 |
| ExT-PACL(2) | 66.9 ± 0.6 | 52.9 ± 0.5 | 77.2 ± 0.6 | 76.4 ± 0.5 |
| E-PACL(4) | 66.3 ± 0.2 | 52.3 ± 0.6 | 77.8 ± 0.6 | 76.9 ± 0.4 |
| ExT-PACL(4) | 67.1 ± 0.2 | 53.5 ± 0.3 | 78.7 ± 0.3 | 78.2 ± 0.5 |
| E-PACL(8) | 65.7 ± 0.2 | 52.9 ± 0.4 | 75.8 ± 0.6 | 76.2 ± 0.3 |
| ExT-PACL(8) | 66.2 ± 0.4 | 53.4 ± 0.5 | 76.9 ± 0.2 | 77.8 ± 0.7 |

| Datasets | Methods | Params | Params (One Forward) | Train (ms/it) | Infer (ms/it) |
|---|---|---|---|---|---|
| POC | CA-TCC | 453.58 K | 703.86 M | 86.91 | 9.97 |
| POC | ExT-PACL | 520.14 K | 704.53 M | 92.21 | 10.38 |
| Power | CA-TCC | 340.81 K | 356.72 M | 61.30 | 7.63 |
| Power | ExT-PACL | 407.37 K | 359.46 M | 77.03 | 8.52 |
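
To make the cost comparison easier to read, the short script below converts the parameter counts and timings from the table into relative overhead of ExT-PACL over CA-TCC. The numbers are copied from the table; the dictionary layout and the helper name `relative_overhead` are illustrative only.

```python
# (params in K, train ms/it, infer ms/it), copied from the table above.
costs = {
    "POC": {"CA-TCC": (453.58, 86.91, 9.97), "ExT-PACL": (520.14, 92.21, 10.38)},
    "Power": {"CA-TCC": (340.81, 61.30, 7.63), "ExT-PACL": (407.37, 77.03, 8.52)},
}

def relative_overhead(baseline, candidate):
    """Percentage increase of candidate over baseline."""
    return 100.0 * (candidate - baseline) / baseline

overheads = {}
for dataset, methods in costs.items():
    base, ours = methods["CA-TCC"], methods["ExT-PACL"]
    overheads[dataset] = tuple(relative_overhead(b, o) for b, o in zip(base, ours))
    params, train, infer = overheads[dataset]
    print(f"{dataset}: params +{params:.1f}%, train +{train:.1f}%, infer +{infer:.1f}%")
```

On POC this works out to roughly 14.7% more parameters, 6.1% longer training iterations, and 4.1% longer inference; on Power the corresponding figures are about 19.5%, 25.7%, and 11.7%.
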
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, Z.; Peng, F.; Hou, K.; Xia, D.; An, T. Expert-Transformer with Prototype-Aware Contrastive Learning for Semi-Supervised Time-Series Classification. Electronics 2026, 15, 2303. https://doi.org/10.3390/electronics15112303

