Evaluating the Perception, Understanding, and Forgetting of Progressive Neural Networks: A Quantitative and Qualitative Analysis
Abstract
1. Introduction
2. Related Work
3. Background
Progressive Neural Networks (PNNs)
4. Materials, Problem Definition, and Baseline Model (BM)
4.1. Materials and Problem Definition
4.2. Baseline Model (BM)
5. Method
5.1. Adversarial Environments
5.2. PNN Agent Training and Evaluation Methodology
5.2.1. Training Procedure
5.2.2. Post-Training Evaluation Procedure
5.2.3. Forgetting Evaluation Procedure
6. Results and Discussion
6.1. Results in the Adversarial Environments
6.2. PNN Agent Training, Evaluation and Forgetting Results
7. Conclusions and Future Work
- Regarding the relevance of the visual layers, address visual mismatches between the two scenes by, for instance, preprocessing the real images so that they resemble the virtual setup more closely (e.g., mitigating reflections and other visual artifacts), or by applying DR or DA during the training phase of the teacher agent.
- Investigate alternative ways of joining the teacher and the student columns through the lateral connections, beyond the one proposed in this paper.
- Study whether any mechanism, such as a lateral-connection drop-off, can help the agent balance old and new knowledge correctly and prevent partial forgetting (a minimal sketch of such a mechanism follows this list).
- Extend the analysis to more complex visual discrepancies, such as variations in texture and lighting conditions. These combined perturbations would provide a deeper understanding of the generalization capabilities of PNNs under other domain shifts.
- Perform the sim-to-real experiments with the PNN architecture designed and tested in this paper, but with the student trained in the real setting.
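To make the two directions about lateral connections concrete, the following is a minimal PyTorch sketch of a two-column PNN in which a frozen teacher column feeds a student column through a lateral adapter, with an optional drop-off probability on the lateral signal. The layer sizes, names, and the `lateral_dropoff` mechanism are illustrative assumptions, not the architecture used in this paper.

```python
import torch
import torch.nn as nn

class TwoColumnPNN(nn.Module):
    """Minimal two-column PNN sketch: a frozen teacher column feeds the
    student column through a lateral adapter. Layer sizes are illustrative,
    not the ones used in the paper."""

    def __init__(self, in_dim=64, hidden=32, n_actions=3, lateral_dropoff=0.0):
        super().__init__()
        # Teacher column (trained first, then frozen).
        self.teacher_l1 = nn.Linear(in_dim, hidden)
        self.teacher_l2 = nn.Linear(hidden, n_actions)
        for p in [*self.teacher_l1.parameters(), *self.teacher_l2.parameters()]:
            p.requires_grad = False
        # Student column plus a lateral adapter from the teacher's hidden layer.
        self.student_l1 = nn.Linear(in_dim, hidden)
        self.lateral = nn.Linear(hidden, hidden)
        self.student_l2 = nn.Linear(hidden, n_actions)
        # Probability of dropping the lateral signal (hypothetical mechanism).
        self.drop = nn.Dropout(p=lateral_dropoff)

    def forward(self, x):
        t1 = torch.relu(self.teacher_l1(x))    # frozen teacher features
        s1 = torch.relu(self.student_l1(x))
        s1 = s1 + self.drop(self.lateral(t1))  # lateral connection
        return self.student_l2(s1)             # student policy logits

policy = TwoColumnPNN(lateral_dropoff=0.2)
logits = policy(torch.randn(1, 64))
```

Setting `lateral_dropoff=0.0` recovers the standard always-on lateral connection; nonzero values randomly silence the teacher's contribution during training, which is one possible way to probe the old/new knowledge balance mentioned above.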
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. PNN Hyperparameters
| Action Set a | Reward Distribution b | Success Distance |
|---|---|---|
| 0 | 70 if goal reached; … if goal not reached | 5 cm in training; 10 cm in post-training evaluation |
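A minimal sketch of how the success rule in the table could be checked; the reward for steps where the goal is not reached was lost in extraction, so the `step_outcome` helper below uses a zero placeholder, and all names are hypothetical.

```python
import numpy as np

GOAL_REWARD = 70.0          # from the table: reward when the goal is reached
SUCCESS_DIST_TRAIN = 0.05   # 5 cm, expressed in meters
SUCCESS_DIST_EVAL = 0.10    # 10 cm in post-training evaluation

def step_outcome(gripper_pos, target_pos, training=True):
    """Illustrative success check; the non-goal reward was lost in
    extraction, so a zero placeholder is used here."""
    threshold = SUCCESS_DIST_TRAIN if training else SUCCESS_DIST_EVAL
    dist = float(np.linalg.norm(np.asarray(gripper_pos) - np.asarray(target_pos)))
    reached = dist <= threshold
    reward = GOAL_REWARD if reached else 0.0  # placeholder for the lost value
    return reached, reward
```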
| Hyperparameter | Value |
|---|---|
| Seed | 123 |
| Training steps | 30 million |
| Episode length | 50 steps or target reached |
| Success distance | 5 cm |
| Evaluation interval | 50,000 steps |
| Evaluation episodes | 40 episodes |
| Discount factor (γ) | 0.99 |
| Learning rate | 1 |
| RMSprop decay | 0.99 |
| Entropy weight | 0.01 |
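For concreteness, the optimizer-related values in the table map onto PyTorch as sketched below. The learning rate entry above appears truncated, so `1e-4` is only a placeholder, and `actor_critic_loss` is a generic actor-critic loss using the table's entropy weight, not the authors' exact implementation.

```python
import torch

# Values from the training table; the digits after "1" in the learning rate
# were lost in extraction, so 1e-4 below is a placeholder, not the paper's value.
GAMMA = 0.99           # discount factor
ENTROPY_WEIGHT = 0.01  # entropy regularization weight
RMSPROP_DECAY = 0.99   # RMSprop smoothing constant (alpha in PyTorch)
LEARNING_RATE = 1e-4   # placeholder

model = torch.nn.Linear(64, 3)  # stand-in for the actual policy network
optimizer = torch.optim.RMSprop(model.parameters(),
                                lr=LEARNING_RATE, alpha=RMSPROP_DECAY)

def actor_critic_loss(log_prob, value, ret, entropy):
    """Generic actor-critic loss with an entropy bonus matching the table's
    entropy weight; not the authors' exact implementation."""
    advantage = ret - value.detach()
    policy_loss = -(log_prob * advantage)
    value_loss = 0.5 * (ret - value).pow(2)
    return (policy_loss + value_loss - ENTROPY_WEIGHT * entropy).mean()
```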
| Hyperparameter | Value |
|---|---|
| Seed | 803 |
| Success distance | 10 cm |
| Episodes evaluated | 1000 episodes |
Appendix B. Environments Evaluated
| Environment | Ground (Background) | Sky (Background) | Robot Link n | Fingers (Gripper) | Connection Part (Gripper) | Target |
|---|---|---|---|---|---|---|
| BM | Green (0.19, 0.30, 0.23) | Gray (0.67, 0.71, 0.75) | Orange (1.0, 0.5, 0.0); Black (0.0, 0.0, 0.0) | White (1.0, 1.0, 1.0) | Yellow (1.0, 1.0, 0.0) | Red (1.0, 0.0, 0.0) |
| Ground | Black (0.0, 0.0, 0.0) | – | – | – | – | – |
| Ground | White (1.0, 1.0, 1.0) | – | – | – | – | – |
| Sky | – | Black (0.0, 0.0, 0.0) | – | – | – | – |
| Sky | – | White (1.0, 1.0, 1.0) | – | – | – | – |
| Robot | – | – | Black (0.0, 0.0, 0.0) | – | – | – |
| Robot | – | – | White (1.0, 1.0, 1.0) | – | – | – |
| Robot | – | – | Black (0.0, 0.0, 0.0) | – | – | – |
| Robot | – | – | White (1.0, 1.0, 1.0) | – | – | – |
| Robot | – | – | Red (1.0, 0.0, 0.0) | – | – | – |
| Robot | – | – | Blue (0.0, 0.0, 1.0) | – | – | – |
| Fingers | – | – | – | Black (0.0, 0.0, 0.0) | – | – |
| Fingers | – | – | – | Red (1.0, 0.0, 0.0) | – | – |
| Target | – | – | – | – | – | Black (0.0, 0.0, 0.0) |
| Target | – | – | – | – | – | White (1.0, 1.0, 1.0) |
| Target | – | – | – | – | – | Blue (0.0, 0.0, 1.0) |
| Environment | Ground (Background) | Sky (Background) | Robot Link n | Fingers (Gripper) | Connection Part (Gripper) | Target |
|---|---|---|---|---|---|---|
| BM | Green (0.19, 0.30, 0.23) | Gray (0.67, 0.71, 0.75) | Orange (1.0, 0.5, 0.0); Black (0.0, 0.0, 0.0) | White (1.0, 1.0, 1.0) | Yellow (1.0, 1.0, 0.0) | Red (1.0, 0.0, 0.0) |
| Gripper | – | – | – | Green (0.19, 0.30, 0.23) | Green (0.19, 0.30, 0.23) | – |
| Gripper | – | – | – | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | – |
| Gripper | – | – | – | White (1.0, 1.0, 1.0) | White (1.0, 1.0, 1.0) | – |
| Background | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | – | – | – | – |
| Background | White (1.0, 1.0, 1.0) | White (1.0, 1.0, 1.0) | – | – | – | – |
| Background | Blue (0.0, 0.0, 1.0) | Blue (0.0, 0.0, 1.0) | – | – | – | – |
| Background | Green (0.19, 0.30, 0.23) | Green (0.19, 0.30, 0.23) | – | – | – | – |
| Background | Gray (0.67, 0.71, 0.75) | Gray (0.67, 0.71, 0.75) | – | – | – | – |
| Background & gripper | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | – | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | – |
| Background & gripper | Blue (0.0, 0.0, 1.0) | Blue (0.0, 0.0, 1.0) | – | Blue (0.0, 0.0, 1.0) | Blue (0.0, 0.0, 1.0) | – |
| Sky & gripper | – | Black (0.0, 0.0, 0.0) | – | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | – |
| Sky & gripper | – | White (1.0, 1.0, 1.0) | – | White (1.0, 1.0, 1.0) | White (1.0, 1.0, 1.0) | – |
| Robot | – | – | Black (0.0, 0.0, 0.0) | – | – | – |
| Robot | – | – | White (1.0, 1.0, 1.0) | – | – | – |
| Robot & gripper | – | – | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | Black (0.0, 0.0, 0.0) | – |
| Robot & gripper | – | – | White (1.0, 1.0, 1.0) | White (1.0, 1.0, 1.0) | White (1.0, 1.0, 1.0) | – |
Appendix C. Adversarial Environments Results
| Environment Change | Mean Return | Std. Dev. Return | Mean Episode Length | Std. Dev. Episode Length | Mean Failure Distance (m) | Std. Dev. Failure Distance (m) | Max. Failure Distance (m) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| BM | 55.65 | 0.79 | 25.73 | 0.64 | 0.00 | 0.00 | 0.00 | 100.0 |
| White Ground | 25.26 | 5.90 | 36.62 | 1.70 | 0.15 | 0.02 | 0.20 | 60.7 |
| Black Sky | 28.31 | 5.82 | 37.01 | 1.72 | 0.18 | 0.02 | 0.22 | 64.5 |
| Black Robot | 21.11 | 6.14 | 38.17 | 1.77 | 0.23 | 0.02 | 0.28 | 54.4 |
| White Robot | 19.59 | 5.24 | 38.91 | 1.52 | 0.23 | 0.02 | 0.27 | 54.0 |
| Blue Robot | 18.19 | 5.88 | 38.82 | 1.72 | 0.24 | 0.02 | 0.29 | 51.7 |
| Black Target | 35.21 | 4.76 | 34.95 | 1.57 | 0.18 | 0.02 | 0.22 | 72.9 |
| White Target | 39.20 | 4.78 | 32.85 | 1.30 | 0.20 | 0.03 | 0.26 | 78.8 |
| Blue Target | 17.43 | 6.18 | 39.67 | 1.63 | 0.20 | 0.01 | 0.22 | 49.7 |
| Black Ground and Sky | 37.76 | 4.87 | 32.82 | 1.71 | 0.16 | 0.02 | 0.19 | 76.2 |
| White Ground and Sky | 32.58 | 5.12 | 34.34 | 1.39 | 0.18 | 0.02 | 0.22 | 69.8 |
| Blue Ground and Sky | 27.54 | 5.74 | 35.82 | 1.63 | 0.27 | 0.04 | 0.34 | 65.1 |
| Black Ground, Sky, Connection Part and Fingers | 37.18 | 5.63 | 33.15 | 1.71 | 0.16 | 0.03 | 0.27 | 75.8 |
| Blue Ground, Sky, Connection Part and Fingers | 29.36 | 5.29 | 35.20 | 1.64 | 0.27 | 0.03 | 0.32 | 67.8 |
| Black Sky, Connection Part and Fingers | 24.80 | 5.04 | 38.08 | 1.39 | 0.19 | 0.01 | 0.22 | 60.0 |
| Black Link 3 and Link 4 | 45.21 | 4.42 | 31.25 | 1.42 | 0.21 | 0.04 | 0.30 | 86.4 |
| White Link 3 and Link 4 | 35.96 | 4.29 | 34.40 | 1.51 | 0.21 | 0.02 | 0.28 | 73.5 |
| Black Link 4 and Link 5 | 38.17 | 3.97 | 33.87 | 1.53 | 0.19 | 0.02 | 0.25 | 75.6 |
| Black Link 4, Link 5, Connection Part and Fingers | 39.34 | 4.59 | 33.16 | 1.37 | 0.20 | 0.04 | 0.27 | 78.2 |
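The columns above can be reproduced from per-episode logs roughly as follows. This sketch assumes distances are in meters and that failure distances are averaged over failed episodes only (consistent with the BM row showing 0.00 at 100% accuracy); all function and variable names are hypothetical.

```python
import numpy as np

def summarize_episodes(returns, lengths, final_dists, success_dist=0.10):
    """Aggregate per-episode results into the columns of the table above.
    Assumes failure distance statistics are computed over failed episodes
    only, which matches the BM row (0.00 at 100% accuracy)."""
    returns, lengths, final_dists = map(np.asarray, (returns, lengths, final_dists))
    failed = final_dists > success_dist       # episode ended beyond the threshold
    fail_d = final_dists[failed]
    return {
        "mean_return": returns.mean(), "std_return": returns.std(),
        "mean_len": lengths.mean(), "std_len": lengths.std(),
        "mean_fail_dist": fail_d.mean() if failed.any() else 0.0,
        "std_fail_dist": fail_d.std() if failed.any() else 0.0,
        "max_fail_dist": fail_d.max() if failed.any() else 0.0,
        "accuracy_pct": 100.0 * (1.0 - failed.mean()),
    }
```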
| Layer | First Column (Teacher) | Second Column (Student) |
|---|---|---|
| LSTM(2) | – | – |
| Total | 301,147 | 43,323 |
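The parameter totals in the table above (301,147 for the teacher column, 43,323 for the student column) are the kind of figure obtained by summing element counts across a module's parameter tensors; a generic sketch with a stand-in module, not the paper's architecture:

```python
import torch.nn as nn

def count_parameters(module: nn.Module, trainable_only: bool = False) -> int:
    """Sum of element counts over a module's parameter tensors; this is how
    totals such as 301,147 (teacher) and 43,323 (student) are obtained."""
    return sum(p.numel() for p in module.parameters()
               if p.requires_grad or not trainable_only)

# Example with a stand-in module, not the paper's network:
demo = nn.LSTM(input_size=32, hidden_size=16, num_layers=2)
print(count_parameters(demo))
```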
| Background | Robot | Gripper | Target |
|---|---|---|---|
| Green (ground) (0.19, 0.30, 0.23); Gray (sky) (0.67, 0.71, 0.75) | Orange (1.0, 0.5, 0.0); Black (joint 6) (0.0, 0.0, 0.0) | Yellow (connector) (1.0, 1.0, 0.0); White (fingers) (1.0, 1.0, 1.0) | Red (1.0, 0.0, 0.0) |
| Ground (Background) | Sky (Background) | Robot Link n | Fingers (Gripper) | Connection Part (Gripper) | Target | Accuracy (%) |
|---|---|---|---|---|---|---|
| Green (0.19, 0.30, 0.23) | Gray (0.67, 0.71, 0.75) | Orange (1.0, 0.5, 0.0); Black (0.0, 0.0, 0.0) | White (1.0, 1.0, 1.0) | Yellow (1.0, 1.0, 0.0) | White (1.0, 1.0, 1.0) | 78.8 |
| White (1.0, 1.0, 1.0) | White (1.0, 1.0, 1.0) | Orange (1.0, 0.5, 0.0); Black (0.0, 0.0, 0.0) | White (1.0, 1.0, 1.0) | Yellow (1.0, 1.0, 0.0) | Red (1.0, 0.0, 0.0) | 69.6 |
| Green (0.19, 0.30, 0.23) | Gray (0.67, 0.71, 0.75) | Orange (1.0, 0.5, 0.0); Black (0.0, 0.0, 0.0) | White (1.0, 1.0, 1.0) | Yellow (1.0, 1.0, 0.0) | Blue (0.0, 0.0, 1.0) | 49.7 |
| Agent | Evaluation Step | Mean Return | Std. Dev. Return | Mean Episode Length | Std. Dev. Episode Length | Mean Failure Distance (m) | Std. Dev. Failure Distance (m) | Max Failure Distance (m) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| White target | 0 | 39.20 | 4.78 | 32.85 | 1.30 | 0.20 | 0.03 | 0.26 | 78.8 |
| White target | 1,600,031 | 54.32 | 9.22 | 27.60 | 5.46 | 0.27 | 0.10 | 0.41 | 99.0 |
| White ground & sky | 0 | 32.58 | 5.12 | 34.34 | 1.39 | 0.18 | 0.02 | 0.22 | 69.8 |
| White ground & sky | 5,650,232 | 50.85 | 18.08 | 29.49 | 7.32 | 0.17 | 0.04 | 0.30 | 94.7 |
| Blue target | 0 | 17.43 | 6.18 | 39.67 | 1.63 | 0.20 | 0.01 | 0.22 | 49.7 |
| Blue target | 10,650,675 | 44.59 | 25.92 | 33.09 | 9.80 | 0.21 | 0.11 | 0.50 | 85.6 |
| Architecture | Training Environment | Evaluated Environment | Mean Return | Std. Dev. Return | Mean Episode Length | Std. Dev. Episode Length | Mean Failure Distance (m) | Std. Dev. Failure Distance (m) | Max Failure Distance (m) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | White target | White target | 54.32 | 9.22 | 27.60 | 5.46 | 0.27 | 0.10 | 0.41 | 99.0 |
| LSTM | White target | Red target | 44.19 | 26.57 | 34.84 | 9.46 | 0.18 | 0.10 | 0.55 | 83.7 |
| FC | White target | White target | 2.13 | 38.96 | 43.05 | 10.56 | 0.22 | 0.08 | 0.46 | 32.7 |
| FC | White target | Red target | 1.37 | 38.24 | 43.24 | 10.36 | 0.22 | 0.07 | 0.41 | 31.6 |
| LSTM | White ground & sky | White ground & sky | 50.85 | 18.08 | 29.49 | 7.32 | 0.17 | 0.04 | 0.30 | 94.7 |
| LSTM | White ground & sky | Green ground & gray sky | 40.82 | 29.65 | 32.24 | 9.62 | 0.21 | 0.05 | 0.35 | 80.9 |
| FC | White ground & sky | White ground & sky | −4.85 | 37.75 | 44.30 | 9.69 | 0.23 | 0.08 | 0.42 | 27.5 |
| FC | White ground & sky | Green ground & gray sky | −39.98 | 19.96 | 49.80 | 1.39 | 0.32 | 0.09 | 0.60 | 2.4 |
| LSTM | Blue target | Blue target | 44.59 | 25.92 | 33.09 | 9.80 | 0.21 | 0.11 | 0.50 | 85.6 |
| LSTM | Blue target | Red target | 54.88 | 5.65 | 26.54 | 3.95 | 0.21 | 0.05 | 0.27 | 99.8 |
| FC | Blue target | Blue target | 21.26 | 39.04 | 37.53 | 11.63 | 0.22 | 0.06 | 0.38 | 56.7 |
| FC | Blue target | Red target | 29.69 | 37.18 | 35.23 | 10.85 | 0.23 | 0.07 | 0.40 | 68.3 |