Author Contributions
Conceptualization, C.G.; funding acquisition, W.L.; methodology, J.H.; project administration, W.L.; software, J.H.; supervision, C.G.; validation, C.G.; visualization, J.H.; writing—original draft, J.H. and C.G.; writing—review and editing, J.H., C.G. and W.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the United Kingdom EPSRC (grant numbers UKRI256, EP/V028251/1, EP/N031768/1, EP/S030069/1, and EP/X036006/1).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original data presented in the study are openly available in the repositories mentioned in Section 4.
Acknowledgments
The support of Altera, Intel, AMD and Google Cloud is gratefully acknowledged.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
- Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38. [Google Scholar] [CrossRef]
- Chen, C.; Li, O.; Tao, D.; Barnett, A.; Rudin, C.; Su, J.K. This Looks Like That: Deep Learning for Interpretable Image Recognition. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar] [CrossRef]
- Donnelly, J.; Barnett, A.J.; Chen, C. Deformable protopnet: An interpretable image classifier using deformable prototypes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 10265–10275. [Google Scholar] [CrossRef]
- Singh, G.; Yow, K.C. These do not look like those: An interpretable deep learning model for image recognition. IEEE Access 2021, 9, 41482–41493. [Google Scholar] [CrossRef]
- Barnes, E.A.; Barnes, R.J.; Martin, Z.K.; Rader, J.K. This looks like that there: Interpretable neural networks for image tasks when location matters. Artif. Intell. Earth Syst. 2022, 1, e220001. [Google Scholar] [CrossRef]
- Nauta, M.; Van Bree, R.; Seifert, C. Neural prototype trees for interpretable fine-grained image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14933–14943. [Google Scholar] [CrossRef]
- Seo, S.; Kim, S.; Park, C. Interpretable Prototype-based Graph Information Bottleneck. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Volume 36, pp. 76737–76748. [Google Scholar] [CrossRef]
- Gao, C.; Zhang, T.; Jiang, X.; Huang, W.; Chen, Y.; Li, J. ProtoPLSTM: An Interpretable Deep Learning Approach for Wearable Fine-Grained Fall Detection. In Proceedings of the 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), Haikou, China, 15–18 December 2022; pp. 516–524. [Google Scholar] [CrossRef]
- Liu, M.; Ren, S.; Ma, S.; Jiao, J.; Chen, Y.; Wang, Z.; Song, W. Gated transformer networks for multivariate time series classification. arXiv 2021, arXiv:2103.14438. [Google Scholar] [CrossRef]
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar] [CrossRef]
- Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, Singapore, 14–17 August 2021; pp. 2114–2124. [Google Scholar] [CrossRef]
- Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: New York, NY, USA, 2017; pp. 1578–1585. [Google Scholar] [CrossRef]
- Ding, H.; Trajcevski, G.; Scheuermann, P.; Wang, X.; Keogh, E. Querying and mining of time series data: Experimental comparison of representations and distance measures. Proc. VLDB Endow. 2008, 1, 1542–1552. [Google Scholar] [CrossRef]
- Abanda, A.; Mori, U.; Lozano, J.A. A review on distance based time series classification. Data Min. Knowl. Discov. 2019, 33, 378–412. [Google Scholar] [CrossRef]
- Xing, Z.; Pei, J.; Yu, P.S. Early Prediction on Time Series: A Nearest Neighbor Approach. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, CA, USA, 11–17 July 2009; pp. 1297–1302. [Google Scholar]
- Górecki, T.; Łuczak, M.; Piasecki, P. An exhaustive comparison of distance measures in the classification of time series with 1NN method. J. Comput. Sci. 2024, 76, 102235. [Google Scholar] [CrossRef]
- Ye, L.; Keogh, E. Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 947–956. [Google Scholar] [CrossRef]
- Karlsson, I.; Papapetrou, P.; Boström, H. Generalized random shapelet forests. Data Min. Knowl. Discov. 2016, 30, 1053–1085. [Google Scholar] [CrossRef]
- Yang, Y.; Deng, Q.; Shen, F.; Zhao, J.; Luo, C. A shapelet learning method for time series classification. In Proceedings of the 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), San Jose, CA, USA, 6–8 November 2016; pp. 423–430. [Google Scholar] [CrossRef]
- Hills, J.; Lines, J.; Baranauskas, E.; Mapp, J.; Bagnall, A. Classification of time series by shapelet transformation. Data Min. Knowl. Discov. 2014, 28, 851–881. [Google Scholar] [CrossRef]
- Zhao, B.; Lu, H.; Chen, S.; Liu, J.; Wu, D. Convolutional neural networks for time series classification. J. Syst. Eng. Electron. 2017, 28, 162–169. [Google Scholar] [CrossRef]
- He, J.; Wang, W.; Jin, X.; Li, H.; Liu, J.; Chen, B. Multiscale Cross-Attention CNN-Transformer Two-Branch Fusion Network for Detecting Railway U-Shaped Bolts & Nuts Defects. IEEE/ASME Trans. Mechatron. 2025, 1–12. [Google Scholar] [CrossRef]
- He, J.; Lv, F.; Liu, J.; Wu, M.; Chen, B.; Wang, S. C2T-HR3D: Cross-Fusion of CNN and Transformer for High-Speed Railway Dropper Defect Detection. IEEE Trans. Instrum. Meas. 2025, 74, 1–16. [Google Scholar] [CrossRef]
- He, J.; Duan, R.; Dong, M.; Kao, Y.; Guo, G.; Liu, J. CNN-Transformer Bridge Mode for Detecting Arcing Horn Defects in Railway Sectional Insulator. IEEE Trans. Instrum. Meas. 2024, 73, 1–16. [Google Scholar] [CrossRef]
- Hüsken, M.; Stagge, P. Recurrent neural networks for time series classification. Neurocomputing 2003, 50, 223–235. [Google Scholar] [CrossRef]
- Tang, Y.; Xu, J.; Matsumoto, K.; Ono, C. Sequence-to-sequence model with attention for time series classification. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 503–510. [Google Scholar] [CrossRef]
- Narayanan, A.; Bergen, K. Prototype-Based Methods in Explainable AI and Emerging Opportunities in the Geosciences. In Proceedings of the ICML 2024 AI for Science Workshop, Vienna, Austria, 21–27 July 2024. [Google Scholar] [CrossRef]
- Li, O.; Liu, H.; Chen, C.; Rudin, C. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar] [CrossRef]
- Senin, P.; Malinchik, S. SAX-VSM: Interpretable time series classification using SAX and vector space model. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 1175–1180. [Google Scholar] [CrossRef]
- Xing, Z.; Pei, J.; Yu, P.S.; Wang, K. Extracting Interpretable Features for Early Classification on Time Series. In Proceedings of the 2011 SIAM International Conference on Data Mining (SDM), Mesa, AZ, USA, 29–30 April 2011; pp. 247–258. [Google Scholar] [CrossRef]
- Liang, Z.; Wang, H. Efficient class-specific shapelets learning for interpretable time series classification. Inf. Sci. 2021, 570, 428–450. [Google Scholar] [CrossRef]
- Lee, Z.; Lindgren, T.; Papapetrou, P. Z-time: Efficient and effective interpretable multivariate time series classification. Data Min. Knowl. Discov. 2024, 38, 206–236. [Google Scholar] [CrossRef]
- Middlehurst, M.; Ismail-Fawaz, A.; Guillaume, A.; Holder, C.; Guijo-Rubio, D.; Bulatova, G.; Tsaprounis, L.; Mentel, L.; Walter, M.; Schäfer, P.; et al. aeon: A Python Toolkit for Learning from Time Series. J. Mach. Learn. Res. 2024, 25, 1–10. [Google Scholar] [CrossRef]
- Dau, H.A.; Keogh, E.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Chen, Y.; Hu, B.; Begum, N.; et al. The UCR Time Series Classification Archive. 2018. Available online: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (accessed on 2 November 2025).
- Bagnall, A.; Dau, H.A.; Lines, J.; Flynn, M.; Large, J.; Bostrom, A.; Southam, P.; Keogh, E. The UEA multivariate time series classification archive, 2018. arXiv 2018, arXiv:1811.00075. [Google Scholar] [CrossRef]
Figure 1.
The architecture of ProtoPGTN. The inputs are passed to two separate transformer encoders that capture temporal and spatial features, with positional encoding added to the step-wise encoder. The prototype layer learns k prototypes per class (n prototypes in total) to explain the decision-making process. Similarity scores between the learned prototypes and the encodings are computed and passed to a fully connected layer followed by a softmax activation to produce the final prediction.
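For readers who prefer code, the forward pass summarized in this caption can be sketched as follows. This is an illustrative reconstruction rather than the authors' released implementation: the module names, the mean pooling, and the simple two-way gate are assumptions; only the layer counts, head count, and hidden size follow the defaults reported in Table 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoPGTNSketch(nn.Module):
    """Illustrative two-tower (step-wise / channel-wise) encoder with a
    prototype layer, following Figure 1. Layer choices are assumptions."""

    def __init__(self, n_channels, seq_len, d_model=128, n_classes=10, k_per_class=5):
        super().__init__()
        self.step_proj = nn.Linear(n_channels, d_model)   # temporal (step-wise) tower input
        self.chan_proj = nn.Linear(seq_len, d_model)       # spatial (channel-wise) tower input
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.step_encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.chan_encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.pos = nn.Parameter(torch.randn(1, seq_len, d_model))  # positional encoding, step tower only
        self.gate = nn.Linear(2 * d_model, 2)               # gate fusing the two towers
        n_protos = n_classes * k_per_class                   # k prototypes per class, n in total
        self.prototypes = nn.Parameter(torch.randn(n_protos, d_model))
        self.fc = nn.Linear(n_protos, n_classes)

    def forward(self, x):                                    # x: (batch, seq_len, n_channels)
        t = self.step_encoder(self.step_proj(x) + self.pos).mean(dim=1)
        c = self.chan_encoder(self.chan_proj(x.transpose(1, 2))).mean(dim=1)
        g = torch.softmax(self.gate(torch.cat([t, c], dim=-1)), dim=-1)
        z = g[:, :1] * t + g[:, 1:] * c                      # gated fusion of the two views
        sim = F.cosine_similarity(z.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)
        return self.fc(sim)                                  # logits; softmax applied for the final prediction
```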
Figure 2.
Prototypes learned for representative classes (0, 1, 6, and 17) of the ArticularyWordRecognition dataset. Each colored row corresponds to one class and shows three of the five learned prototypes. Each prototype is visualized using its nearest training segment, with the sample index, channel, and temporal window indicated in the legend. Prototypes within the same class exhibit similar patterns, whereas prototypes across different classes show distinct temporal dynamics. This suggests that ProtoPGTN learns representative prototypes, contributing to both interpretability and accurate final decision-making.
Figure 3.
Illustration of the reasoning process of the proposed network for a sample from the ArticularyWordRecognition dataset. The top panel shows the original test sample, which belongs to class 0 and contains nine channels. The two blocks below compare the latent features of this test sample with learned prototypes from class 0 (left) and class 1 (right). Within each block, the first column shows one channel of the original sample, with the red box highlighting the subsequence most similar to the prototype in the second column. The third column displays the cosine similarity score computed between them in the latent space. The class-0 prototypes achieve substantially higher similarity scores (0.9598, 0.9480, 0.9679) than the class-1 prototypes (0.1053, 0.0942, 0.0982), demonstrating that the model’s decision-making process is interpretable and evidence-based.
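The similarity scores shown in the figure follow the standard cosine similarity between a latent encoding and a prototype; interpreting the per-sample score as a maximum over encoded subsequences is an assumption based on the caption, not a formula quoted from the paper:

```latex
s(\mathbf{z}, \mathbf{p}_j) = \frac{\mathbf{z}^{\top}\mathbf{p}_j}{\lVert \mathbf{z}\rVert\,\lVert \mathbf{p}_j\rVert},
\qquad
\mathrm{sim}_j(x) = \max_{t}\, s\big(\mathbf{z}_t(x), \mathbf{p}_j\big),
```

where \(\mathbf{z}_t(x)\) denotes the latent encoding of the \(t\)-th subsequence of sample \(x\) and \(\mathbf{p}_j\) the \(j\)-th learned prototype.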
Table 1.
Comparison of prototype-based interpretable models and their applicability across domains.
| Model | Backbone | Similarity Metric | Domain |
|---|---|---|---|
| ProtoPNet [3] | CNN | Euclidean distance | Computer Vision |
| Deformable ProtoPNet [4] | CNN | Euclidean distance | Computer Vision |
| NP-ProtoPNet [5] | CNN | Euclidean distance | Computer Vision |
| ProtoLNet [6] | CNN | Euclidean distance | Earth Systems |
| ProtoTree [7] | CNN + Decision Tree | Euclidean distance | Computer Vision |
| PGIB [8] | GNN + Information Bottleneck | Euclidean distance | Graph Data |
| ProtoPLSTM [9] | CNN + LSTM | Euclidean distance | Time Series (SisFall) |
| ProtoPGTN | Gated Transformer | Cosine similarity | General TSC |
Table 2.
Comparison of interpretability techniques across categories and their applicability to TSC.
| Category | Technique | Interpretation | TSC |
|---|---|---|---|
| Prototype-based | ProtoPNet | Image-patch prototypes | No |
|  | Def. ProtoPNet | Flexible prototypes | No |
|  | NP-ProtoPNet | Pos/Neg prototypes | No |
|  | ProtoLNet | Location-aware prototypes | No |
|  | ProtoTree | Prototype + Decision Tree | No |
|  | ProtoPLSTM | Prototype + CNN/LSTM | Yes |
| Shapelet/Feature-based | SAX-VSM | Symbolic pattern extraction | Yes |
|  | Shapelet-based | Discriminative subsequences | Yes |
|  | Interpretable Feature Extraction | Local shapelets for early classification | Yes |
|  | Z-time | Temporal abstraction | Yes |
| Model-agnostic/Other | Linear Regression | Weight-based | No |
|  | Decision Tree | Rule/Path-based | No |
|  | Attention-based | Saliency/attention weights | Yes |
|  | BETA | Rule-based approximation | No |
|  | LIME | Local surrogate models | Yes |
Table 3.
Comparison of the existing prototype-based interpretable model (ProtoPLSTM) and the proposed ProtoPGTN. ✓ indicates the presence of the feature and ✗ indicates absence.
| Aspect | ProtoPLSTM [9] | ProtoPGTN |
|---|---|---|
| Linear layer before feature extraction | ✗ | ✓, reduces memory usage |
| Feature extraction | 2D Conv + LSTM | GTN |
| Gate mechanism | ✗ | ✓, fuses temporal and spatial results |
| Similarity measure | Euclidean distance | Cosine similarity |
| Prototype dimension | 2D | 1D |
| Efficient prototype matching | ✗ | ✓, direct projection to prototype shape |
| Evaluation scope | One dataset (SisFall) | 165 datasets (UCR/UEA) |
| Accuracy | Comparable on UCR/UEA | Comparable to ProtoPLSTM |
| Training time | Slow | ∼8× faster on average |
| GPU memory usage | Extremely high | Highly efficient (∼379× reduction on average) |
Table 4.
Hyperparameters used in ProtoPGTN and their default configurations.
| Parameter | Default Value |
|---|---|
| Model Architecture Parameters |  |
|  | tuned |
| Number of layers | 4 |
| Number of heads | 4 |
| Hidden units | 128 |
| Query/Value dimension | 8 |
| Dropout rate | 0.1 |
| Positional encoding | Yes |
| Mask | Yes |
| Number of prototypes per class | 5 |
| Feature dimension | 128 |
| Training Parameters |  |
| Batch size | 32 |
| Epochs | 100 |
| Projection interval | 5 |
| Random seed | tuned |
| Learning rate | 0.001 |
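To make the defaults above easy to reuse, they can be collected into a single configuration object, as in the sketch below. The dictionary keys are hypothetical and are not taken from the authors' code; only the values come from Table 4.

```python
# Hypothetical configuration mirroring the defaults in Table 4.
# Key names are illustrative, not the authors' identifiers.
PROTOPGTN_DEFAULTS = {
    # model architecture
    "num_layers": 4,
    "num_heads": 4,
    "hidden_units": 128,
    "qv_dim": 8,
    "dropout": 0.1,
    "positional_encoding": True,
    "mask": True,
    "prototypes_per_class": 5,
    "feature_dim": 128,
    # training
    "batch_size": 32,
    "epochs": 100,
    "projection_interval": 5,   # prototype projection performed every 5 epochs
    "random_seed": None,        # tuned per dataset
    "learning_rate": 1e-3,
}
```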
Table 5.
Hardware configuration used for recording the training time.
| Category | Specification |
|---|---|
| Hardware |  |
| GPU | NVIDIA A30 (24 GB) |
| Number of GPUs | 1 |
| CPU | AMD EPYC 7702P (32 cores) |
| System Memory (RAM) | 111 GB |
| Storage | 180 GB SSD (root: 35 GB, var: 20 GB, data: 121 GB) |
| Software Environment |  |
| Operating System | Ubuntu 24.04.2 LTS (Noble) |
| CUDA/cuDNN | CUDA 12.8 |
Table 6.
Accuracy comparison across multivariate time series datasets. Bold values indicate the higher accuracy between ProtoPGTN and ProtoPLSTM. Missing values for ProtoPLSTM and ProtoPConv are due to excessive GPU memory usage or invalid convolution kernel settings on certain datasets. For Zerveas’s transformer-based model, missing results also occur when GPU memory is insufficient for large-channel datasets. The average accuracy is computed only on datasets where all models successfully produced results.
| Dataset | ProtoPGTN | ProtoPLSTM | ProtoPConv | TimesNet | Zerveas | MLP |
|---|---|---|---|---|---|---|
| AsphaltObstaclesCoordinates | 73.66% | 80.36% | 52.69% | 78.01% | 79.54% | 0.00% |
| AsphaltPavementTypeCoordinates | 80.49% | 88.26% | 84.19% | 85.89% | 80.78% | 0.00% |
| AsphaltRegularityCoordinates | 91.61% | 96.54% | 95.74% | 95.87% | 89.08% | 0.13% |
| ArticularyWordRecognition | 93.00% | 96.67% | 81.00% | 98.00% | 98.33% | 1.00% |
| BasicMotions | 82.50% | 75.00% | 62.50% | 97.50% | 100.00% | 0.00% |
| Blink | 55.56% | 55.56% | 77.33% | 95.33% | 95.78% | 0.67% |
| CharacterTrajectories | 88.65% | 98.47% | 97.21% | 98.89% | 98.89% | 0.07% |
| Cricket | 81.94% | 94.44% | 59.72% | 90.28% | 0.00% | 1.39% |
| EMOPain | 78.31% | 78.31% | 66.76% | 81.13% | 86.20% | 0.85% |
| ERing | 68.52% | 13.33% | 33.33% | 81.48% | 94.81% | 1.11% |
| EthanolConcentration | 28.14% | 28.90% | 25.10% | 29.67% | – | 0.76% |
| EyesOpenShut | 50.00% | 50.00% | 50.00% | 66.67% | 64.29% | 4.76% |
| FaceDetection | 65.81% | – | 65.32% | 67.65% | 67.68% | 0.03% |
| DuckDuckGeese | 36.00% | – | 24.00% | 58.00% | 70.00% | 7.98% |
| JapaneseVowels | 95.41% | – | – | 98.38% | 99.46% | 0.81% |
| MotionSenseHAR | 92.45% | 98.87% | 95.09% | 92.08% | 95.85% | 1.13% |
| RacketSports | 81.58% | – | – | 84.21% | 88.16% | 0.66% |
| SelfRegulationSCP1 | 88.05% | 80.89% | 69.28% | 90.44% | 89.76% | 0.00% |
| Handwriting | 13.88% | 3.41% | 6.94% | 30.94% | 32.12% | 0.00% |
| Epilepsy | 74.64% | 91.30% | 72.46% | 92.03% | 97.83% | 1.45% |
| AtrialFibrillation | 33.33% | 33.33% | 33.33% | 46.67% | 46.67% | 0.00% |
| FingerMovements | 45.00% | – | – | 57.00% | 57.00% | 2.00% |
| HandMovementDirection | 50.00% | 33.78% | 17.57% | 58.11% | 67.57% | 0.00% |
| Heartbeat | 72.20% | 72.20% | 72.20% | 78.05% | 74.63% | 0.00% |
| Libras | 53.89% | – | – | 75.56% | 86.67% | 2.78% |
| LSST | 54.62% | – | – | 37.99% | 58.92% | 0.04% |
| MindReading | 23.12% | – | 23.12% | 41.50% | 45.33% | 0.15% |
| NATOPS | 76.11% | – | – | 83.89% | 96.11% | 0.06% |
| PEMS-SF | 52.60% | – | 32.95% | 82.08% | 84.97% | 3.47% |
| PenDigits | 97.97% | – | – | 98.28% | 97.68% | 0.06% |
| PhonemeSpectra | 2.56% | 18.43% | 14.52% | 13.93% | 8.44% | 0.06% |
| SelfRegulationSCP2 | 50.56% | 50.00% | 50.00% | 53.89% | 55.00% | 1.11% |
| SpokenArabicDigits | 98.04% | 96.77% | 97.68% | 98.18% | 98.04% | 0.09% |
| StandWalkJump | 33.33% | 33.33% | 33.33% | 48.00% | – | 13.33% |
| UWaveGestureLibrary | 83.44% | 75.00% | 13.75% | 86.88% | 86.56% | 0.00% |
| Average | 67.69% | 66.02% | 53.83% | 77.14% | 78.10% | 5.92% |
| Wins | 19 | 10 | – | – | – | – |
Table 7.
Accuracy comparison across univariate time series datasets. Bold values indicate the higher accuracy between ProtoPGTN and ProtoPLSTM. Missing values for ProtoPLSTM and ProtoPConv are due to invalid kernel sizes.
| Dataset | ProtoPGTN | ProtoPLSTM | ProtoPConv | TimesNet | Zerveas | MLP |
|---|---|---|---|---|---|---|
| BeetleFly | 90.00% | 65.00% | 50.00% | 85.00% | 95.00% | 85.00% |
| BME | 96.00% | 38.67% | 51.33% | 64.00% | 90.00% | 96.67% |
| Car | 70.00% | 60.00% | 31.67% | 78.33% | 85.00% | 81.67% |
| CBF | 86.44% | 82.67% | 33.33% | 83.89% | 98.33% | 86.78% |
| Coffee | 100.00% | 60.71% | 53.57% | 89.29% | 96.43% | 100.00% |
| Computers | 54.80% | 62.00% | 66.80% | 65.60% | 69.60% | 53.60% |
| Crop | 73.77% | – | – | 75.72% | 74.88% | 66.92% |
| DistalPhalanxTW | 58.27% | 30.22% | 67.63% | 71.94% | 71.22% | 66.19% |
| ECG200 | 81.00% | 80.00% | 68.00% | 84.00% | 90.00% | 90.00% |
| ECG5000 | 92.49% | 91.96% | 92.02% | 94.09% | 94.13% | 92.22% |
| ElectricDeviceDetection | 85.93% | 83.97% | 64.30% | 86.28% | 86.31% | 81.82% |
| FaceAll | 81.42% | 68.99% | 63.91% | 77.40% | 75.50% | 81.30% |
| FaceFour | 73.86% | 25.00% | 15.91% | 82.95% | 84.09% | 85.23% |
| Haptics | 40.91% | 37.66% | 19.16% | 44.48% | 48.38% | 42.86% |
| FreezerSmallTrain | 76.63% | 53.54% | 51.16% | 76.70% | 77.02% | 67.61% |
| GesturePebbleZ2 | 61.39% | 70.89% | 34.18% | 85.44% | 75.95% | 59.49% |
| GunPointMaleVersusFemale | 96.52% | 52.53% | 47.47% | 99.05% | 100.00% | 99.68% |
| Ham | 82.86% | 64.76% | 67.62% | 79.05% | 75.24% | 71.43% |
| Lightning2 | 70.49% | 60.66% | 45.90% | 77.05% | 77.05% | 68.85% |
| MixedShapesSmallTrain | 72.04% | 56.29% | 38.56% | 82.39% | 83.55% | 84.49% |
| Plane | 93.33% | 88.57% | 73.33% | 98.10% | 99.05% | 97.14% |
| PowerCons | 100.00% | 99.44% | 93.33% | 99.44% | 100.00% | 100.00% |
| ProximalPhalanxOutlineAgeGroup | 85.37% | 48.78% | 83.90% | 85.37% | 86.83% | 82.93% |
| ShapeletSim | 47.22% | 50.00% | 53.89% | 56.11% | 53.89% | 48.33% |
| SonyAIBORobotSurface1 | 88.02% | 42.93% | 42.93% | 77.04% | 78.04% | 65.56% |
| SonyAIBORobotSurface2 | 87.20% | 61.70% | 68.10% | 82.27% | 83.00% | 83.84% |
| UWaveGestureLibraryX | 73.03% | 77.39% | 49.69% | 72.39% | 72.17% | 76.66% |
| WormsTwoClass | 59.74% | 54.55% | 55.84% | 64.94% | 67.53% | 59.74% |
| LargeKitchenAppliances | 56.27% | 76.00% | 79.47% | 60.00% | 53.07% | 48.00% |
| Earthquakes | 74.82% | 74.82% | 74.82% | 76.98% | 74.82% | 70.50% |
| Average | 76.99% | 62.75% | 56.48% | 78.51% | 80.53% | 76.48% |
| Number of Wins | 24 | 5 | – | – | – | – |
Table 8.
Detailed training time in seconds (s) for ProtoPGTN and ProtoPLSTM on multivariate datasets. Missing values are due to memory limitations and invalid kernel sizes. The bold values indicate faster training time.
| Dataset | ProtoPGTN | ProtoPLSTM |
|---|---|---|
| AsphaltObstaclesCoordinates | 0.9839 | 1.8569 |
| AsphaltPavementTypeCoordinates | 2.1802 | 4.3391 |
| AsphaltRegularityCoordinates | 1.4258 | 2.3824 |
| ArticularyWordRecognition | 1.0612 | 6.5625 |
| BasicMotions | 0.5367 | 0.5922 |
| Blink | 1.4250 | 2.6275 |
| CharacterTrajectories | 2.5115 | 6.4605 |
| Cricket | 2.7315 | 9.6463 |
| EMOPain | 1.4805 | 14.0935 |
| ERing | 0.5438 | 0.6205 |
| EthanolConcentration | 0.5211 | 6.1584 |
| EyesOpenShut | 0.5373 | 0.7162 |
| FaceDetection | 6.0796 | – |
| DuckDuckGeese | 1.5141 | – |
| JapaneseVowels | 0.7107 | – |
| MotionSenseHAR | 4.0186 | 68.0326 |
| RacketSports | 0.5967 | – |
| SelfRegulationSCP1 | 1.5961 | 3.6851 |
| Handwriting | 0.8535 | 1.9227 |
| Epilepsy | 0.6433 | 0.9020 |
| AtrialFibrillation | 0.4843 | 1.6728 |
| FingerMovements | 0.6868 | – |
| HandMovementDirection | 0.7891 | 2.9576 |
| Heartbeat | 0.9343 | 10.8406 |
| Libras | 0.7334 | – |
| LSST | 2.4925 | – |
| MindReading | 2.0731 | – |
| NATOPS | 0.6762 | – |
| PEMS-SF | 1.7808 | – |
| PenDigits | 5.6463 | – |
| PhonemeSpectra | 11.3748 | 227.0682 |
| SelfRegulationSCP2 | 0.6953 | 3.9527 |
| SpokenArabicDigits | 6.2297 | 16.5668 |
| StandWalkJump | 0.9033 | 1.0293 |
| UWaveGestureLibrary | 0.7198 | 1.5138 |
| Number of Wins | 35 | 0 |
Table 9.
Detailed training time in seconds (s) for ProtoPGTN and ProtoPLSTM on univariate datasets. Only datasets for which both models produced training time results are shown. The bold values indicate faster training time.
| Dataset | ProtoPGTN | ProtoPLSTM |
|---|---|---|
| BeetleFly | 0.5328 | 0.5660 |
| BME | 0.5385 | 0.6015 |
| Car | 0.6966 | 0.7052 |
| CBF | 0.5469 | 0.6645 |
| Coffee | 0.5253 | 0.5151 |
| Computers | 1.2655 | 0.9348 |
| DistalPhalanxTW | 0.8235 | 0.7525 |
| ECG200 | 0.5170 | 1.0160 |
| ECG5000 | 0.7974 | 2.1640 |
| ElectricDeviceDetection | 1.0514 | 1.3032 |
| FaceAll | 1.2722 | 1.4824 |
| FaceFour | 0.5430 | 0.7160 |
| Haptics | 1.9317 | 1.5354 |
| FreezerSmallTrain | 0.5461 | 0.9043 |
| GesturePebbleZ2 | 0.8747 | 1.0640 |
| GunPointMaleVersusFemale | 0.6117 | 0.7709 |
| Ham | 0.6375 | 0.8147 |
| Lightning2 | 0.6237 | 0.9095 |
| MixedShapesSmallTrain | 1.5439 | 1.4716 |
| Plane | 0.6202 | 0.8274 |
| PowerCons | 0.6474 | 0.8431 |
| ProximalPhalanxOutlineAgeGroup | 0.7405 | 0.8287 |
| ShapeletSim | 0.5200 | 0.7512 |
| SonyAIBORobotSurface1 | 0.5108 | 0.7453 |
| SonyAIBORobotSurface2 | 0.5182 | 0.7689 |
| UWaveGestureLibraryX | 2.3779 | 3.0240 |
| WormsTwoClass | 1.2762 | 1.0682 |
| LargeKitchenAppliances | 1.8021 | 1.5819 |
| Earthquakes | 1.1020 | 1.0097 |
| Number of Wins | 21 | 8 |
Table 10.
Memory usage (MB) comparison between ProtoPGTN and ProtoPLSTM across selected high-dimensional datasets. The smaller value in each row is highlighted in bold.
| Dataset | ProtoPGTN | ProtoPLSTM | No. of Variables | Reduction Factor |
|---|---|---|---|---|
| ArticularyWordRecognition | 103.60 | 281.17 | 9 | 2.71 |
| BasicMotions | 55.76 | 138.78 | 6 | 2.49 |
| EMOPain | 85.68 | 2857.00 | 30 | 33.34 |
| EthanolConcentration | 42.20 | 53.40 | 3 | 1.27 |
| EyesOpenShut | 25.75 | 642.34 | 14 | 24.94 |
| FaceDetection | 62.43 | 62,267 (est.) | 144 | 997.61 |
| DuckDuckGeese | 1490.77 | 5,440,971.19 (est.) | 1345 | 3649.38 |
| JapaneseVowels | 34.88 | 435.54 (est.) | 12 | 12.49 |
| MotionSenseHAR | 460.78 | 523.58 | 12 | 1.14 |
| RacketSports | 35.15 | 111.93 | 6 | 3.18 |
| FingerMovements | 54.37 | 2362.59 | 28 | 43.45 |
| Heartbeat | 263.24 | 11,776.62 | 61 | 44.75 |
| LSST | 37.67 | 112.12 (est.) | 6 | 2.98 |
| MindReading | 323.94 | 125,282.15 (est.) | 204 | 386.72 |
| NATOPS | 45.21 | 1737.69 (est.) | 24 | 38.44 |
| PEMS-SF | 1774.83 | 2,784,217.19 (est.) | 963 | 1569.33 |
| PhonemeSpectra | 180.03 | 431.73 | 11 | 2.40 |
| SpokenArabicDigits | 66.35 | 574.74 | 13 | 8.66 |
| Number of wins | 18 | 0 | – | – |
Table 11.
Results of the Wilcoxon signed-rank test comparing ProtoPGTN and ProtoPLSTM across different metrics. p-values below 0.05 indicate statistical significance; bold values indicate significance at α = 0.05.
| Metric | Test Type | p-Value |
|---|---|---|
| Accuracy (Multivariate) | two-sided | 0.32587 |
| Accuracy (Univariate) | two-sided | 0.00069 |
| Training Time (Multivariate) | one-sided (less) | 0.00000 |
| Training Time (Univariate) | one-sided (less) | 0.00691 |
| Memory Usage | one-sided (less) | 0.00000 |
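The comparisons in this table can be reproduced with a standard paired test. The snippet below is a minimal sketch using SciPy's Wilcoxon signed-rank test; the helper function name is hypothetical, and the example call uses only the first five multivariate training times from Table 8 as an illustration, so its p-value is not one of the values reported above.

```python
from scipy.stats import wilcoxon

def compare(model_a, model_b, alternative="two-sided"):
    """Paired Wilcoxon signed-rank test over per-dataset results.

    model_a, model_b: equal-length sequences of per-dataset scores
    (accuracy, training time, or memory usage, as in Table 11).
    alternative: "two-sided", or "less" to test whether model_a tends
    to yield smaller values than model_b (used for time and memory).
    """
    _, p_value = wilcoxon(model_a, model_b, alternative=alternative)
    return p_value

# Illustration with the ProtoPGTN vs. ProtoPLSTM training times (seconds)
# of the first five multivariate datasets listed in Table 8.
p = compare([0.9839, 2.1802, 1.4258, 1.0612, 0.5367],
            [1.8569, 4.3391, 2.3824, 6.5625, 0.5922],
            alternative="less")
print(f"p-value: {p:.5f}")
```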
Table 12.
Ablation study results comparing ProtoPGTN, ProtoPGTN (Euclidean), and GTN. ProtoPGTN is the original proposed model, ProtoPGTN (Euc.) replaces cosine similarity with Euclidean distance, and the GTN classifier uses the representation learned by the two-tower transformer directly for classification. Bold values indicate the best performance among the three methods for each dataset.
| Dataset | ProtoPGTN | ProtoPGTN (Euc.) | GTN |
|---|---|---|---|
| AsphaltObstaclesCoordinates | 73.66% | 49.10% | 55.75% |
| AsphaltPavementTypeCoordinates | 80.49% | 66.57% | 67.05% |
| AsphaltRegularityCoordinates | 91.61% | 81.76% | 80.56% |
| ArticularyWordRecognition | 93.00% | 94.67% | 96.33% |
| BasicMotions | 82.50% | 47.50% | 95.00% |
| Blink | 55.56% | 55.56% | 90.22% |
| CharacterTrajectories | 88.65% | 98.19% | 98.26% |
| Cricket | 81.94% | 76.39% | 84.72% |
| EMOPain | 78.31% | 79.15% | 84.23% |
| ERing | 68.52% | 84.07% | 89.63% |
| EthanolConcentration | 28.14% | 25.10% | 34.98% |
| EyesOpenShut | 50.00% | 50.00% | 50.00% |
| FaceDetection | 65.81% | 67.57% | 65.30% |
| DuckDuckGeese | 36.00% | 20.00% | 58.00% |
| JapaneseVowels | 95.41% | 94.86% | 98.38% |
| MotionSenseHAR | 92.45% | 94.34% | 93.58% |
| RacketSports | 81.58% | 76.32% | 83.55% |
| SelfRegulationSCP1 | 88.05% | 87.37% | 84.30% |
| Handwriting | 13.88% | 12.12% | 19.18% |
| Epilepsy | 74.64% | 68.12% | 67.39% |
| AtrialFibrillation | 33.33% | 13.33% | 26.67% |
| FingerMovements | 45.00% | 62.00% | 47.00% |
| HandMovementDirection | 50.00% | 43.24% | 39.19% |
| Heartbeat | 72.20% | 74.63% | 73.66% |
| Libras | 53.89% | 12.78% | 83.33% |
| LSST | 54.62% | 56.29% | 52.60% |
| MindReading | 23.12% | 23.12% | 23.12% |
| NATOPS | 76.11% | 86.67% | 91.67% |
| PEMS-SF | 52.60% | 12.72% | 85.55% |
| PenDigits | 97.97% | 97.74% | 96.91% |
| PhonemeSpectra | 2.56% | 6.29% | 9.22% |
| SelfRegulationSCP2 | 50.56% | 52.78% | 50.56% |
| SpokenArabicDigits | 98.04% | 97.36% | 97.95% |
| StandWalkJump | 33.33% | 33.33% | 33.33% |
| UWaveGestureLibrary | 83.44% | 78.13% | 85.00% |
| Average | 64.20% | 59.40% | 68.35% |
| No. Wins | 9 | 6 | 20 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).