# Self-Improving Generative Artificial Neural Network for Pseudorehearsal Incremental Class Learning

## Abstract

## 1. Introduction

- Designing a neural network model that combines a generative model with a classifier to learn new patterns while reducing the need for storage of training data.
- Introducing a novelty-detection model that helps recognize new tasks during incremental learning.

## 2. Theoretical Framework

#### 2.1. Rehearsal and Pseudorehearsal Learning

#### 2.2. Variational Autoencoders for Image Generation

#### 2.3. Novelty Recognition

## 3. Proposal

**Algorithm 1** SIGANN model training.

**Require:** Set of initial data $X_{train}$ and output classes $y_{train} \in \{1, \cdots, C\}$

1: Uniform distribution generator $Z = \mathcal{U}(-\sqrt{3}, \sqrt{3})$ of sample length $n$

2: Train adversarial autoencoder: $\widehat{X}, \widehat{z}, \widehat{y} = AAE(X_{train}, y_{train}, Z)$

3: **for** $c = 1, \ldots, C$ **do**

4: Get mean activation of trained samples: $\mu_c = \mathrm{mean}(\widehat{y} \in c)$

5: Compute distance of each output of class $c$: $d_c = |\widehat{y}_c - \mu_c|$

6: Fit $d_c$ to Weibull distribution $\mathcal{W}_c$ and get parameters $\rho_c = (\kappa_c, \lambda_c)$

7: **while** receiving new data $X_{new}$ **do**

8: Evaluate samples of data $X_{new}$ on inference: $\widehat{y} = AAE_{enc}(X_{new})$

9: Revise OpenMax activation of data: $y^{\star} = \mathrm{OpenMax}(\widehat{y}, \rho, \epsilon = 0.95)$

10: **if** $y^{\star} = C + 1$ **then**

11: Store input $X^{\star}$ and output $y^{\star}$

12: Generate samples $X_{gen}$ from classes $y_{gen} \in \{1, \cdots, C\}$

13: Evaluate generated samples: $\widehat{y}_{gen} = AAE_{enc}(X_{gen})$

14: **if** $\widehat{y}_{gen} = y_{gen}$ and $P(\widehat{y}_{gen} = c \mid X) > 0.9$ **then**

15: Store $X_{gen}, y_{gen}$

16: **else**

17: Discard $X_{gen}, y_{gen}$ and re-generate

18: Update number of classes: $C = C + 1$

19: Retrain adversarial autoencoder with new and generated data: $\widehat{X}, \widehat{z}, \widehat{y} = AAE(X_{gen} + X^{\star}, y_{gen} + y^{\star}, Z)$

20: Fit new class and samples to Weibull distribution
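The sample-filtering part of Algorithm 1 (steps 12 to 17) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: `generate` and `classify` are hypothetical callables standing in for the decoder and the classifier output of the adversarial autoencoder, classes are indexed from 0 rather than 1, and `max_tries` is an added safeguard that does not appear in the algorithm. The latent draw follows step 1; the interval $(-\sqrt{3}, \sqrt{3})$ gives a uniform variable with zero mean and unit variance.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = List[float]


def sample_latent(n: int, rng: random.Random) -> Sample:
    """Draw a latent vector of length n from U(-sqrt(3), sqrt(3)) (step 1)."""
    bound = math.sqrt(3.0)  # this interval has zero mean and unit variance
    return [rng.uniform(-bound, bound) for _ in range(n)]


def generate_pseudo_samples(
    generate: Callable[[Sample, int], Sample],      # hypothetical decoder: (z, class) -> x
    classify: Callable[[Sample], Sequence[float]],  # hypothetical classifier: x -> class probabilities
    classes: Sequence[int],                         # known classes, 0-indexed in this sketch
    per_class: int,
    latent_dim: int,
    min_confidence: float = 0.9,                    # threshold from step 14
    max_tries: int = 1000,                          # assumption: cap on attempts per class
    seed: int = 0,
) -> List[Tuple[Sample, int]]:
    """Keep generated samples that are classified as their intended class
    with probability above min_confidence; discard and re-generate otherwise."""
    rng = random.Random(seed)
    kept: List[Tuple[Sample, int]] = []
    for c in classes:
        accepted = 0
        tries = 0
        while accepted < per_class and tries < max_tries:
            tries += 1
            x_gen = generate(sample_latent(latent_dim, rng), c)       # step 12
            probs = classify(x_gen)                                   # step 13
            predicted = max(range(len(probs)), key=probs.__getitem__)
            if predicted == c and probs[c] > min_confidence:          # step 14
                kept.append((x_gen, c))                               # step 15
                accepted += 1
            # steps 16-17: the sample is dropped and the loop draws a new one
    return kept
```

The returned pairs play the role of $(X_{gen}, y_{gen})$ in step 19, where they are combined with the stored novel samples for retraining.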

#### 3.1. Module 1: Classifier

#### 3.2. Module 2: Generator

#### 3.3. Module 3: Novelty Detector

**Algorithm 2** Meta-recognition algorithm.

**Require:** Logits from the final layer for each class, $\mathbf{v}_c(x) = v_1(x) \cdots v_N(x)$, from the training phase; number of extreme values to fit, $\eta$

1: **for** $c = 1 \cdots N$ **do**

2: Compute mean activation: $\mu_c = \mathrm{mean}(\mathbf{v}_c)$

3: Fit to Weibull: $\rho_c = (\kappa_c, \lambda_c) = \mathrm{FitHigh}(\Vert \mathbf{v}_c - \mu_c \Vert, \eta)$

4: **return** $\mu_c$ and $\rho_c$ for each class
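As a rough illustration of Algorithm 2, the sketch below computes a mean activation vector per class, measures the distance of each training activation to it, and fits a Weibull distribution to the $\eta$ largest distances. It is not the `FitHigh` routine itself: the fit here is a plain two-parameter maximum-likelihood estimate solved by bisection, swapped in as a simplified stand-in, and the function names `fit_weibull_mle` and `meta_recognition` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_weibull_mle(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood fit of a two-parameter Weibull.

    Returns (kappa, lam): shape and scale. Simplified stand-in for FitHigh.
    """
    xs = [x for x in samples if x > 0.0]
    if len(xs) < 2 or max(xs) == min(xs):
        raise ValueError("need at least two distinct positive samples")
    x_max = max(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(kappa: float) -> float:
        # Likelihood equation for the shape parameter. Weights are rescaled
        # by x_max so that x ** kappa cannot overflow.
        weights = [(x / x_max) ** kappa for x in xs]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / kappa - mean_log

    # score() is increasing in kappa, so bisection on a log scale finds the root
    # (search range 1e-3 to 1e3 is an assumption of this sketch).
    lo, hi = 1e-3, 1e3
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    kappa = math.sqrt(lo * hi)
    lam = x_max * (sum((x / x_max) ** kappa for x in xs) / len(xs)) ** (1.0 / kappa)
    return kappa, lam


def meta_recognition(
    activations: Sequence[Sequence[float]],  # one logit vector per training sample
    labels: Sequence[int],                   # class of each training sample
    eta: int,                                # number of extreme values to fit (>= 2)
) -> Dict[int, Tuple[List[float], float, float]]:
    """Return {class: (mu_c, kappa_c, lambda_c)} as in Algorithm 2."""
    models: Dict[int, Tuple[List[float], float, float]] = {}
    for c in sorted(set(labels)):
        vectors = [v for v, y in zip(activations, labels) if y == c]
        dim = len(vectors[0])
        mu = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
        distances = sorted(math.dist(v, mu) for v in vectors)
        kappa, lam = fit_weibull_mle(distances[-eta:])  # eta largest distances
        models[c] = (mu, kappa, lam)
    return models
```

The resulting per-class tuples correspond to the $\mu_c$ and $\rho_c = (\kappa_c, \lambda_c)$ that Algorithm 3 takes as input.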

**Algorithm 3** OpenMax algorithm.

**Require:** Logits from the final layer for each class, $\mathbf{v}_c(x) = v_1(x) \cdots v_C(x)$, from evaluation; $\alpha$, the number of top classes to revise

1: Execute Algorithm 2 (meta-recognition) to obtain the mean $\mu_c$ and Weibull parameters $\rho_c = (\kappa_c, \lambda_c)$ for each class $c$

2: **for** $i = 1, \cdots, \alpha$ **do**

3: $s(i) = \mathrm{argsort}(v_c(x))$

4: $\omega_{s(i)}(x) = 1 - \frac{\alpha - i}{\alpha} e^{-\left(\frac{\Vert v_i(x) - \mu_{s(i)} \Vert}{\lambda_{s(i)}}\right)^{\kappa_{s(i)}}}$

5: Revise activations: $\widehat{v}(x) = \mathbf{v}(x) \circ \omega(x)$

6: Define $\widehat{v}_{new}(x) = \sum_i (\mathbf{v}_i(x) - \widehat{v}_i(x))$

7: $\widehat{P}(y = j \mid x) = \frac{e^{\widehat{v}_j(x)}}{\sum_{c=0}^{N} e^{\widehat{v}_c(x)}}$

8: Let $y^{\star} = \arg\max_i \widehat{P}(y = i \mid \mathbf{x})$

9: **return** new class if $y^{\star} = C + 1$ or $\widehat{P}(y = y^{\star} \mid \mathbf{x}) < \epsilon$
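Steps 5 to 9 of Algorithm 3 can be sketched in a few lines of Python. The sketch assumes the per-class weights $\omega(x)$ from step 4 have already been computed and are passed in; the function name `openmax_decision` is hypothetical, classes are 0-indexed, and the new class is represented by the index one past the last known class (the role of $C + 1$ in the algorithm). The softmax subtracts the maximum score for numerical stability, which does not change the probabilities.

```python
import math
from typing import List, Sequence, Tuple


def openmax_decision(
    logits: Sequence[float],  # v(x), one entry per known class
    omega: Sequence[float],   # weights from step 4, assumed precomputed
    epsilon: float,           # rejection threshold, e.g. 0.95 in Algorithm 1
) -> Tuple[int, List[float]]:
    """Return (label, probabilities); label == len(logits) means 'new class'."""
    revised = [v * w for v, w in zip(logits, omega)]       # step 5
    v_new = sum(v - r for v, r in zip(logits, revised))    # step 6
    scores = revised + [v_new]                             # last slot: new class
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]            # step 7 (stable softmax)
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)   # step 8
    new_class = len(logits)
    if best == new_class or probs[best] < epsilon:         # step 9
        return new_class, probs
    return best, probs
```

With all weights equal to 1 the activations are unchanged and the new-class score is 0, so a confident prediction passes through; lowering the weight of the top class moves activation mass to the new-class slot.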

## 4. Experiments

#### 4.1. Experiment 1: Does the Method Gradually Forget?

#### 4.2. Experiment 2: Is the Method Able to Detect New Unknown Classes?

#### 4.3. Experiment 3: How Long Could the Method Last?

#### 4.4. Case of Study with CIFAR10

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 4.** Histogram of activation distances between each training class and the new character class A.

**Figure 6.** t-SNE visualization of distribution space $\widehat{z}$ with generated samples for the starting classes (**a**), after 4 steps (**b**) and after 8 training steps (**c**). Each color represents a class.

Training Scheme | SIGANN with the Complete Data Set | Pseudorehearsal SIGANN | Rehearsal SIGANN with 10% Real Data |
---|---|---|---|
Split 1 | $87.71\%$ | $97.91\pm 0.24\%$ | $97.97\pm 0.17\%$ |
Split 2 | − | $89.25\pm 0.54\%$ | $90.34\pm 0.42\%$ |
Split 3 | − | $81.84\pm 0.98\%$ | $86.58\pm 0.55\%$ |
Split 4 | − | $70.11\pm 2.21\%$ | $81.28\pm 0.71\%$ |
Decay Rate | − | $9.27\pm 0.75\%$ | $5.56\pm 0.21\%$ |

**Table 2.** Percentage of new-class identification for each non-number character image in EMNIST after training with number images.

Character | Identified as New Class | Character | Identified as New Class |
---|---|---|---|
A | $49.91\pm 7.66\%$ | T | $32.82\pm 9.12\%$ |
B | $28.76\pm 8.88\%$ | U | $44.44\pm 13.44\%$ |
C | $61.49\pm 10.40\%$ | V | $41.43\pm 11.63\%$ |
D | $45.71\pm 27.46\%$ | W | $47.37\pm 8.36\%$ |
E | $45.42\pm 8.17\%$ | X | $47.07\pm 11.92\%$ |
F | $60.38\pm 7.66\%$ | Y | $19.45\pm 5.03\%$ |
G | $38.38\pm 7.44\%$ | Z | $8.27\pm 3.54\%$ |
H | $14.92\pm 7.78\%$ | a | $46.16\pm 9.43\%$ |
I | $14.11\pm 11.89\%$ | b | $19.83\pm 7.69\%$ |
J | $40.46\pm 6.87\%$ | d | $51.22\pm 8.56\%$ |
K | $45.56\pm 7.56\%$ | e | $57.91\pm 6.29\%$ |
L | $11.38\pm 6.33\%$ | f | $60.82\pm 8.16\%$ |
M | $60.21\pm 8.85\%$ | g | $25.36\pm 4.39\%$ |
N | $42.64\pm 8.00\%$ | h | $43.84\pm 8.01\%$ |
O | $35.84\pm 37.46\%$ | n | $54.74\pm 9.65\%$ |
P | $42.73\pm 7.54\%$ | q | $22.27\pm 4.51\%$ |
Q | $48.37\pm 8.46\%$ | r | $55.91\pm 10.83\%$ |
R | $36.09\pm 6.66\%$ | t | $29.99\pm 8.64\%$ |
S | $15.38\pm 8.06\%$ | | |

Training Scheme | SIGANN with the Complete Data Set | Pseudorehearsal SIGANN | Rehearsal SIGANN with 10% Real Data |
---|---|---|---|
Split 1 | $81.23\%$ | $83.76\%$ | $83.76\%$ |
Split 2 | − | $77.12\%$ | $79.48\%$ |
Split 3 | − | $68.82\%$ | $72.29\%$ |
Decay Rate | − | $9.34\%$ | $7.08\%$ |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mellado, D.; Saavedra, C.; Chabert, S.; Torres, R.; Salas, R.
Self-Improving Generative Artificial Neural Network for Pseudorehearsal Incremental Class Learning. *Algorithms* **2019**, *12*, 206.
https://doi.org/10.3390/a12100206
