A Survey of Handwritten Character Recognition with MNIST and EMNIST

Abstract: This paper summarizes the top state-of-the-art contributions reported on the MNIST dataset for handwritten digit recognition. This dataset has been extensively used to validate novel techniques in computer vision, and in recent years many authors have explored the performance of convolutional neural networks (CNNs) and other deep learning techniques on it. To the best of our knowledge, this paper is the first exhaustive and updated review of this dataset; some online rankings exist, but they are outdated, and most published papers survey only closely related works, omitting most of the literature. This paper distinguishes between works using some kind of data augmentation and works using the original dataset out of the box. Works using CNNs are also reported separately, as they have become the state-of-the-art approach for solving this problem. Nowadays, a significant number of works have attained a test error rate smaller than 1% on this dataset, which is becoming non-challenging. In mid-2017, a new dataset was introduced: EMNIST, which involves both digits and letters, with a larger amount of data acquired from a different source than MNIST's. In this paper, EMNIST is explained and some results are surveyed.


Introduction
In recent years, deep learning-based techniques have gained significant interest in the research community for solving a variety of supervised, unsupervised and reinforcement learning problems. One of the most well-known and widely used techniques is the convolutional neural network (CNN), a kind of neural network able to automatically extract relevant features from input data.
The properties of convolutional neural networks, including the fact that they are able to retrieve features from multidimensional inputs, turn them into a very interesting alternative for solving problems within the field of computer vision. In fact, computer vision has become a testbed for validating novel techniques and innovations in CNNs.
More specifically, one dataset has been widely used for this purpose: the MNIST dataset. MNIST is a database of labeled handwritten digits with separate training and test sets, and is therefore an easily interpretable domain that allows fast comparison between different techniques. In fact, MNIST is so widely used that it even appears in the "hello world" tutorials of some deep learning frameworks, such as TensorFlow [1].
Soon after the release of the MNIST dataset in 1998, some authors began publishing rankings of the performance of different works. In fact, the first ranking was published by LeCun et al. [2], the authors of the original MNIST paper, and includes works as recent as 2012. To the best of our knowledge, the most recent and exhaustive ranking was released by Benenson [3], updated as of 2016. However, this ranking is still very incomplete: it omits a substantial number of relevant works, as well as those published after 2016. Finally, many authors of papers reporting results on the MNIST dataset have compared their performance against related works; however, they often include only other works using similar techniques.
As a consequence, there is no exhaustive and up-to-date ranking for the MNIST database, and this paper intends to fill that gap. In particular, it gathers and summarizes all the works we have found reporting competitive results on the MNIST dataset, i.e., results better than a 1% test error rate, classifying them according to two different criteria: whether or not they perform preprocessing and/or data augmentation, and whether or not the techniques used involve convolutional neural networks. This paper has been written with completeness in mind, and therefore includes not only works published in journals and international conferences, but also those available in pre-print and source code repositories. However, results reported in works not published after a peer-review process are explicitly noted as such.
Additionally, as of today, MNIST is no longer widely considered a challenging database. As a consequence, an extended version of MNIST, the EMNIST dataset, was released in April 2017. EMNIST comprises both handwritten digits and handwritten letters, is significantly larger than MNIST, and its data comes from a different source. In this paper, we also describe the acquisition of the EMNIST dataset and survey its state of the art.
This paper is structured as follows: Section 2 describes the acquisition process for the MNIST database as well as the data format, and surveys the state of the art for the MNIST dataset. Then, Section 3 explains the acquisition process for the EMNIST database and the different subsets along with their format, later describing works using this dataset as well as NIST Special Database 19 (from which EMNIST is obtained). Finally, Section 4 summarizes the main findings found when surveying works on MNIST and EMNIST.

Acquisition and Data Format
The MNIST (Modified National Institute of Standards and Technology) database was introduced by LeCun et al. [2] in 1998. Since then, this dataset has been widely used as a testbed for different machine learning and pattern recognition proposals.
The MNIST database contains a total of 70,000 instances, of which 60,000 are for training and the remainder for testing. The database draws on two different sources: NIST's Special Database 1 (collected among high-school students) and NIST's Special Database 3 (collected from Census Bureau employees). The training and test sets were chosen so that the same writer would not appear in both sets. The training set contains samples from more than 250 writers.
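MNIST is distributed as IDX files: a big-endian binary format with a magic number, the dimension sizes, and then raw pixel bytes. A minimal reader can be sketched as follows (the function name is ours, and the gzipped filename in the comment is only an assumption about how the files are stored locally):

```python
import gzip
import struct

import numpy as np

def read_idx_images(path):
    """Read an IDX image file, e.g. a local 'train-images-idx3-ubyte.gz'.

    The header is four big-endian 32-bit integers: the magic number
    (2051 for unsigned-byte 3-D image data), the number of images,
    the number of rows, and the number of columns. The remainder of
    the file is the raw pixel bytes.
    """
    with gzip.open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)
```

The same scheme (with magic number 2049 and a single dimension) applies to the label files.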
The original images were preprocessed. This procedure first involved normalizing the images to fit in a 20 × 20-pixel box while preserving the aspect ratio. Then, an anti-aliasing filter was applied, effectively transforming the black and white images into grayscale. Finally, blank padding was added to fit the images in a larger 28 × 28-pixel box, shifted so that the center of mass of the digit matched the center of the box. An example of a sample corresponding to the digit '7' can be found in Figure 1a. A larger set of instances belonging to the training set is displayed in Figure 1b.
Looking at this figure, we can see that some digits can easily be confused. Depending on how they are written, it can be difficult to tell whether a certain digit is a '4' or a '9', for example. Some digits that are difficult to recognize, even for humans, can be seen in Figure 1b (see the ninth '2' or the ninth '7').
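A rough sketch of the normalization pipeline described above may clarify the steps. This is not the original NIST/MNIST code: it uses simple box averaging as a stand-in for the anti-aliasing filter and an integer shift for the center-of-mass alignment, and the function names are ours:

```python
import numpy as np

def _box_average(img, new_h, new_w):
    """Crude anti-aliased downscaling: average each source block.
    Turns a black-and-white input into a grayscale output."""
    h, w = img.shape
    out = np.zeros((new_h, new_w))
    for i in range(new_h):
        for j in range(new_w):
            y0, y1 = i * h // new_h, max(i * h // new_h + 1, (i + 1) * h // new_h)
            x0, x1 = j * w // new_w, max(j * w // new_w + 1, (j + 1) * w // new_w)
            out[i, j] = img[y0:y1, x0:x1].mean()
    return out

def mnist_style_normalize(binary_img):
    """Fit the glyph into a 20x20 box preserving the aspect ratio,
    then pad to 28x28 with the center of mass at the box center."""
    h, w = binary_img.shape
    scale = 20.0 / max(h, w)  # longer side maps to 20 pixels
    small = _box_average(binary_img.astype(float),
                         max(1, round(h * scale)), max(1, round(w * scale)))
    out = np.zeros((28, 28))
    top, left = (28 - small.shape[0]) // 2, (28 - small.shape[1]) // 2
    out[top:top + small.shape[0], left:left + small.shape[1]] = small
    total = out.sum()
    if total > 0:
        # Shift so the center of mass lands at the image center
        ys, xs = np.mgrid[0:28, 0:28]
        cy = (ys * out).sum() / total
        cx = (xs * out).sum() / total
        out = np.roll(out, (round(13.5 - cy), round(13.5 - cx)), axis=(0, 1))
    return out
```

For a typical digit occupying a small region of the input, the integer `np.roll` shift never wraps pixels around the border, so this is an adequate approximation of the translation step.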

State of the Art
Because MNIST has been widely used to test many classifier implementations, there have been several efforts to publish rankings using MNIST as a benchmark. Most of these rankings, as well as the established literature, use the "test error rate" metric when referring to performance on MNIST: the percentage of incorrectly classified test instances.
One of the earliest rankings was published by LeCun et al. [2] themselves, which includes references up to 2012. This website provides a taxonomy of classifiers and points out whether each work performed data preprocessing or augmentation (addition of new instances to the training set resulting from distortions or other modifications of the original data).
In this ranking, we can verify that the early machine learning approaches used by LeCun et al. [4] included linear classifiers (whose error rate ranges from 7.6 to 12%), K-nearest neighbors approaches (K-NN, ranging from 1.1 to 5%), non-linear classifiers (about 3.5%), support vector machines (SVM, from 0.8 to 1.4%), neural networks (NN, from 1.6 to 4.7%) and convolutional neural networks (CNN, from 0.7 to 1.7%). It is remarkable that data augmentation leads to better results; in particular, the best error rate achieved with a convolutional neural network without distortions or preprocessing in LeCun et al. [4] was 0.95%.
Among the works gathered in LeCun et al.'s ranking, those based on convolutional neural networks outperform the other techniques. However, some classical machine learning techniques are still able to provide competitive error rates (in this paper, we consider "competitive" those techniques achieving an error rate under 1.0%). For instance, Belongie et al. [5] achieved 0.63% and Keysers et al. [6] achieved 0.54% and 0.52% using K-NN, Kégl and Busa-Fekete [7] achieved 0.87% using boosted stumps on Haar features, LeCun et al. [4] achieved 0.8% and Decoste and Schölkopf [8] attained results from 0.56 to 0.68% using SVMs.
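As a reference for what such a classical baseline looks like, here is a minimal brute-force K-NN classifier. This is illustrative only: the tuned, deformation-aware variants cited above are far more elaborate than this sketch, and the function name is ours:

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Brute-force K-NN: for each test point, vote among the labels
    of the k training points closest in Euclidean distance."""
    preds = []
    for x in test_x:
        dists = np.linalg.norm(train_x - x, axis=1)
        neighbors = train_y[np.argsort(dists)[:k]]
        labels, counts = np.unique(neighbors, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

On MNIST each instance would be a flattened 784-dimensional pixel vector; the competitive K-NN results cited above additionally rely on better distance measures (e.g., shape context or deformation models) rather than plain Euclidean distance.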
Regarding non-convolutional neural networks, Simard et al. [9] achieved 0.9% and 0.7% using a 2-layer neural network with MSE and cross-entropy loss functions, respectively. Deng and Yu [10] achieved 0.83% using a deep convex network without data augmentation or preprocessing. Very interesting results were attained by Meier et al. [11] (0.39%) using a committee of 25 neural networks, and by Cireşan et al. [12] (0.35%) using a 6-layer neural network (it is worth mentioning that the reproducibility of the latter result has been called into question by Martin [13] in his blog).
On the other hand, works based on convolutional neural networks attained a much better average performance (in fact, the worst result reported in LeCun's ranking was 1.7%). LeCun et al. [4] combined different convolutional architectures along with data augmentation techniques, obtaining error rates ranging from 0.7 to 0.95%. Lauer et al. [14] attained between 0.54% and 0.83% using a trainable feature extractor along with SVMs, and Labusch et al. [15] reported an error rate of 0.59% using a similar technique consisting of a CNN to extract sparse features and SVMs for classification. Simard et al. [9] obtained error rates between 0.4% and 0.6% using a CNN with a cross-entropy loss function and data augmentation. Ranzato et al. [16] reported results between 0.39% and 0.6% using a large CNN along with unsupervised pretraining, and some years later Jarrett et al. [17] reported a test error of 0.59% with a similar approach and without data augmentation. The best results in this ranking are those obtained by Cireşan et al. [18], who reported an error rate of 0.35% using a large CNN, and 0.27% [19] and 0.23% [20] using committees of 7 and 35 neural networks respectively, with data augmentation in all cases.
In 2015, McDonnell et al. [21] proposed an approach using the 'extreme learning machine' (ELM) [22] algorithm to train shallow non-convolutional neural networks, confirming that data augmentation with distortions can reduce error rates even further. In their work, McDonnell et al. compared their results with previous ELM-based and selected non-ELM approaches, updated with some CNN-based works as of 2015. The best ELM-based results achieved test error rates of 0.97% in the work by Kasun et al. [23], and ranged from 0.57 to 1.36% in the proposal by McDonnell et al. [21].
However, the most interesting results included in the comparison by McDonnell et al. [21] are not those provided in that work, but those from previous works using convolutional neural networks. For example, Wan et al. [24] used CNNs with a generalization of dropout they called DropConnect, and reported an error rate of 0.57% without data augmentation and as low as 0.21% with data augmentation. Zeiler and Fergus [25] proposed the use of stochastic pooling, achieving an error rate of 0.47%. Goodfellow et al. [26] described the maxout model averaging technique, attaining a test error rate of 0.45% without data augmentation. In 2015, Lee et al. [27] described "deeply supervised nets", an approach that introduces a classifier (SVM or softmax) at hidden layers, reporting a result of 0.39% without data augmentation.
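As a rough illustration of how DropConnect generalizes dropout: dropout zeroes entire activations, while DropConnect zeroes individual weights of the layer. The following is a minimal sketch of the two masking schemes (not Wan et al.'s implementation; the inverted 1/(1-p) scaling is one common convention):

```python
import numpy as np

def dropout(activations, p, rng):
    """Standard dropout: zero each activation with probability p,
    scaling survivors by 1/(1-p) so the expected value is unchanged."""
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

def dropconnect(W, x, p, rng):
    """DropConnect: zero each individual *weight* with probability p
    before the forward pass W @ x, rather than zeroing activations."""
    mask = rng.random(W.shape) >= p
    return (W * mask) @ x / (1.0 - p)
```

With p = 0 both functions reduce to the ordinary forward pass; at test time, the cited works replace the random masks with deterministic approximations.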
A more recent ranking, updated as of 2016, has been made available by Benenson [3] on his GitHub page. This ranking includes many of the works reviewed so far, as well as others with very competitive results. For example, Sato et al. [28] explore and optimize data augmentation, and attain a test error rate of 0.23% on the MNIST database using a convolutional neural network. A very interesting result is reported by Chang and Chen [29], where an error rate of 0.24% (the best result found as of 2017 in the state of the art without data augmentation) was obtained by using a network-in-network approach with a maxout multi-layer perceptron, an approach they named MIN. Also, very competitive performances without data augmentation were reported by Lee et al. [30] when using gated pooling functions in CNNs (0.29%), by Liang and Hu [31] when using a recurrent CNN (0.31%), by Liao and Carneiro [32] when using CNNs with normalization layers and piecewise linear activation units (0.31%) and competitive multi-scale convolutional filters (0.33%) [33], or by Graham [34] using fractional max-pooling with random overlapping (0.32%).
Additional works where state-of-the-art results were obtained without data augmentation include those by McDonnell and Vladusich [35], who reported a test error rate of 0.37% using a fast-learning shallow convolutional neural network; Mairal et al. [36], who achieved 0.39% using convolutional kernel networks; Xu et al. [37], who explored multi-loss regularization in CNNs, obtaining an error rate of 0.42%; and Srivastava et al. [38], who used so-called convolutional "highway" networks (inspired by LSTM recurrent networks) to achieve an error rate of 0.45%.
Lin et al. [39] reported an error rate of 0.47% using a network-in-network approach, where micro neural networks are used as convolutional layers. Ranzato et al. [40] used convolutional layers for feature extraction and two fully connected neural network layers for classification, attaining an error rate of 0.62%. Bruna and Mallat [41] designed a network that applies wavelet transform convolutions to the input and a generative PCA (principal component analysis) classifier, obtaining an error of 0.43%.
Calderón et al. [42] proposed a variation of a CNN where the first layer applies Gabor filters over the input, obtaining an error of 0.68%. Also, Le et al. [43] explored the application of limited-memory BFGS (L-BFGS) and conjugate gradient (CG) as alternatives to stochastic gradient descent for network optimization, achieving a 0.69% error rate. Finally, Yang et al. [44] developed a new transform named Adaptive Fastfood to reparameterize the fully connected layers, obtaining a test error rate of 0.71%.
An interesting approach is followed by Hertel et al. [45], who reuse convolutional kernels previously learned on the ILSVRC-12 dataset, showing that CNNs can learn generic feature extractors that can be reused across tasks, and achieving a test error rate on the MNIST dataset of 0.46% (vs. 0.32% when training the network from scratch).
It can be seen that most recent works are based on convolutional neural networks, due to their high performance. However, some exceptions can be noticed. One example is the work by Wang and Tan [46], who attained a test error rate of 0.35% using a single-layer centering support vector data description (C-SVDD) network with multi-scale receptive voting and SIFT (scale-invariant feature transform) features. Also, Zhang et al. [47] devised a model (HOPE) for projecting features from raw data that can be used for training a neural network, attaining a test error rate of 0.40% without data augmentation. Visin et al. [48] proposed an architecture that replaces the convolutional layers with recurrent neural networks (RNNs) that sweep across the image, attaining an error rate of 0.45% with data augmentation.
Additional examples with lower performance include the work by Azzopardi and Petkov [49], who used a combination of shifted filter responses (COSFIRE), attaining an error rate of 0.52%. Also, Chan et al. [50] used a PCA network to learn multi-stage filter banks, reporting an error rate of 0.62%. Mairal et al. [51] used task-driven dictionary learning, reporting a test error rate of 0.54%, albeit using data augmentation. Jia et al. [52] explored the receptive fields of pooling and obtained an error of 0.64%. Thom and Palm [53] explored the application of sparse connectivity and sparse activity properties to neural networks, obtaining an error rate of 0.75% using a supervised online autoencoder with data augmentation. Lee et al. [54] described convolutional deep belief networks with probabilistic max-pooling, achieving an error rate of 0.82%. Min et al. [55] reported an error of 0.95% using a deep encoder network to perform non-linear feature transformations whose outputs are then fed to K-NN for classification. Finally, Yang et al. [56] used supervised translation-invariant sparse coding with a linear SVM, attaining an error rate of 0.84%.
Another approach to the MNIST classification problem involves so-called "deep Boltzmann machines". The first application of this technique was proposed by Salakhutdinov and Hinton [57], who reported an error rate of 0.95%. A few years later, Goodfellow et al. [58] introduced the multi-prediction deep Boltzmann machine, achieving a test error rate of 0.88%.
The reviewed rankings gather most of the works providing very competitive results on the MNIST database. Besides those rankings, we have found a work by Mishkin and Matas [59], in which the authors work on the weight initialization of CNNs and replace the softmax classifier with an SVM, achieving a test error rate of 0.38%, and another work by Alom et al. [60], where inception-recurrent CNNs were used to attain an error of 0.29%.
Finally, it is worth mentioning some techniques that automatically optimize the topology of CNNs: MetaQNN [61], which relies on reinforcement learning for the design of CNN architectures, has reported an error rate on MNIST of 0.44%, and 0.32% when using an ensemble of the best neural networks found. Also, DEvol [62], which uses genetic programming, has obtained an error rate of 0.6%. Baldominos et al. [63] presented a work in 2018 where the topology of the network is evolved using grammatical evolution, attaining a test error rate of 0.37% without data augmentation; this result was later improved by means of the neuroevolution of committees of CNNs [64], down to 0.28%. Similar approaches evolving a committee of CNNs were presented by Bochinski et al. [65], achieving a very competitive test error rate of 0.24%, and by Baldominos et al. [66], where the models comprising the committee were evolved using a genetic algorithm, reporting a test error rate of 0.25%.
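Many of the committee-based results above combine their members' outputs; a common scheme is to average the per-class scores of the members and take the argmax. The cited papers differ in the exact combination rule, so the following is only a minimal sketch of that step:

```python
import numpy as np

def committee_predict(member_probs):
    """Combine a committee by averaging the members' per-class scores.

    member_probs has shape (n_members, n_instances, n_classes);
    the result is one predicted class index per instance.
    """
    return np.mean(member_probs, axis=0).argmax(axis=1)
```

Averaging reduces the variance of the individual models' errors, which is why committees of 7 or 35 CNNs can noticeably improve on a single network of the same architecture.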
A summary of all the reviewed works can be found in Tables 1 and 2. Table 1 includes works where some preprocessing or data augmentation took place before classification, whereas Table 2 includes the remaining works. The top of each table shows the performance of classical machine learning techniques, while the lower part shows the results reported using CNNs. In those cases where authors have reported more than one result using a single classifier (e.g., with different parameters), only the best is shown.

Table 1. Side-by-side comparison of the most competitive (error rate < 1%) results found in the state of the art for the MNIST database with data augmentation or preprocessing. Best achieved performances are boldfaced.

Acquisition and Data Format
As can be observed from the previous section, there is a significant number of works tested on MNIST that report a test error rate below 1%. The fact that many of these works approach an error rate of 0.2% suggests that this value may be close to irreducible for this problem. For this reason, MNIST is often considered to be solved already. As a result, Cohen et al. [67] introduced the Extended MNIST database (EMNIST) in April 2017, consisting of both handwritten digits and letters and sharing the same structure as the MNIST database. The authors of EMNIST stated that at that point "the dataset labeling can be called into question", describing MNIST as a non-challenging benchmark.
A sample of the letters and digits in the EMNIST dataset can be found in Figure 2. The source for building the EMNIST database was NIST Special Database 19 (NIST SD 19) [68], which contains the entire corpus of training materials from NIST (the National Institute of Standards and Technology of the US) for handprinted document and character recognition, including over 800,000 manually checked and labeled characters from almost 3700 writers [69] who filled in a form. Although this database has been available since the mid-1990s, it remained mostly unused due to the difficulty of accessing and using it on modern computers, because of the way it was stored. This was fixed in 2016, when a second version of the database was released in a simpler format.
The authors of EMNIST have intentionally made the database compliant with MNIST in terms of structure, performing a similar processing pipeline that involves the following steps:

1. Storing the original NIST SD 19 images as 128 × 128-pixel black and white images.
2. Applying a Gaussian blur filter to soften the edges.
3. Removing the empty padding to reduce each image to its region of interest.
4. Centering the digit in a square box, preserving the aspect ratio.
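Under the assumption that the square box is then downsampled to the 28 × 28-pixel MNIST format with a simple box filter (the official conversion code may differ in filter choices and parameters), the pipeline above can be sketched as follows; the function name is ours:

```python
import numpy as np

def emnist_style_convert(img128):
    """Sketch of the EMNIST conversion pipeline for one 128x128
    black-and-white image: blur, crop to the region of interest,
    center in a square box, and downsample to 28x28."""
    # Gaussian blur to soften edges (separable 1-D convolutions)
    t = np.arange(-3, 4)
    kernel = np.exp(-t ** 2 / 2.0)
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, "same"), 1, img128.astype(float))
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, "same"), 0, blurred)

    # Crop to the region of interest (bounding box of nonzero pixels)
    ys, xs = np.nonzero(blurred > 0)
    roi = blurred[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Center in a square box, preserving the aspect ratio
    side = max(roi.shape)
    box = np.zeros((side, side))
    top = (side - roi.shape[0]) // 2
    left = (side - roi.shape[1]) // 2
    box[top:top + roi.shape[0], left:left + roi.shape[1]] = roi

    # Downsample to 28x28 by averaging each source block
    out = np.zeros((28, 28))
    for i in range(28):
        for j in range(28):
            y0, y1 = i * side // 28, max(i * side // 28 + 1, (i + 1) * side // 28)
            x0, x1 = j * side // 28, max(j * side // 28 + 1, (j + 1) * side // 28)
            out[i, j] = box[y0:y1, x0:x1].mean()
    return out
```

This mirrors the MNIST-compatible 28 × 28 output format; intensity rescaling, if any, is omitted here.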
Originally, NIST SD 19 provided two different labeling schemata, which have been ported to EMNIST as the By_Class and By_Merge datasets. Besides these two, EMNIST also derives additional datasets, including the Balanced, Letters and Digits subsets referenced in the works surveyed below.

State of the Art
In the original EMNIST paper [67], the authors included a baseline using a linear classifier and another technique named OPIUM (Online Pseudo-Inverse Update Method), a classifier introduced by van Schaik and Tapson [70]. The authors stated that they chose this technique because it generates a deterministic, analytical solution, and that their aim was not to obtain cutting-edge results. In fact, they report an accuracy of 85.15% on the Letters dataset and 95.90% on the Digits dataset using OPIUM. Since Cohen et al. [67] reported performance in terms of accuracy instead of error rate, we have decided to do so as well in this survey.
Techniques not belonging to the field of deep learning have also been tested on EMNIST with successful results. For example, Ghadekar et al. [71] hybridized the discrete wavelet transform (DWT) and the discrete cosine transform (DCT) to extract features from the handwritten samples, later using K-NN and SVM for classification. The authors report that SVMs outperform K-NN in all scenarios, attaining an accuracy of up to 89.51% on EMNIST Letters and 97.74% on EMNIST Digits.
As happened with MNIST, outstanding results have been achieved on EMNIST with deep learning techniques. This can be seen in a recent paper by Botalb et al. [72] comparing a multi-layer perceptron with CNNs on EMNIST. They report a maximum accuracy of 89.47% using a multi-layer perceptron and 99.2% using a CNN, although the test set used by the authors is not standard. An early attempt to apply deep learning to EMNIST was presented by Peng and Yin [73], who described the application of Markov random field-based CNNs, reporting accuracies ranging between 87.77% and 99.75%. Singh et al. [74] used a CNN with three convolutional layers and two dense layers, reporting an accuracy of 99.62% on EMNIST Digits. Mor et al. [75] used a CNN with two convolutional layers and one dense layer for classifying EMNIST with the purpose of developing a handwritten text classification app for Android, obtaining an accuracy of 87.1% on EMNIST By_Class.
An interesting work using capsule layers (a concept recently introduced by Sabour et al. [76]) is TextCaps [77], a CNN with three convolutional layers and two capsule layers, which obtains accuracies of 95.36% on EMNIST Letters and 99.79% on EMNIST Digits. Another interesting work was published by dos Santos et al. [78], who proposed a deep convolutional extreme learning machine in which gradient descent is not used to train the network, allowing a very fast learning stage (only 21 min using a CPU, according to the authors). They tested their proposal only on EMNIST Digits, attaining an accuracy of 99.775%. Finally, Cavalin and Oliveira [79] presented a method to generate a hierarchical classifier by automatically analyzing the confusion matrix of a "flat" (regular) classifier. They tested logistic regression, multi-layer perceptrons and CNNs, and report the best results for the CNN, although the hierarchical classifier does not beat the flat setting. They report an accuracy of 99.46% on EMNIST Digits and 93.63% on EMNIST Letters.
Recently, some authors have made use of neural architecture search to automatically optimize the hyperparameters of CNN classifiers. For example, Dufourq and Bassett [80] presented EDEN, in which they applied neuroevolution, obtaining an accuracy of 88.3% on EMNIST Balanced and 99.3% on EMNIST Digits. More recently, the authors of [64] used committees of neuroevolved CNNs with topology transfer learning, obtaining an accuracy of 95.35% on EMNIST Letters and 99.77% on EMNIST Digits.
The EMNIST database has been used in some application papers as well. As an example, the dataset has been used by Neftci et al. [81] for training neuromorphic deep neural networks and by Shu et al. [82] for pairwise classification. However, due to the way in which the dataset was used, these results are not comparable to other works. Srivastava et al. [83] used EMNIST for training an optical character recognition system for bank cheques, and Sharma et al. [84] used the database for optical recognition of specialized forms filled in by students, although in neither case was performance on the EMNIST test set reported. Similarly, Shawon et al. [85] designed a system using a CNN with six convolutional layers and two fully connected layers for Bangla handwritten digit recognition, testing it also on EMNIST and reporting accuracies of 90.59% on EMNIST Balanced and 99.79% on EMNIST Digits.
Additionally, some researchers have relied on NIST SD 19 in the past as a testbed for their proposals. These works are not directly comparable to EMNIST since datasets are not identical, but can be used as a reference. For example, Milgram et al. [86] reported an average accuracy of 98.75% using SVMs with sigmoid function over a test set made only of digits. Granger et al. [87] used particle swarm optimization for evolving the topology of neural networks, obtaining an accuracy over a test set of digits of 96.49%. Moreover, Oliveira et al. [88] tested a multi-layer perceptron, reporting an accuracy of 98.39% in the same dataset.
In other works, letters from NIST SD 19 are also taken into account. For example, Radtke et al. [89] optimized a classifier using an annealing-based approach called record-to-record travel, reporting an accuracy of 93.78% for letters and 96.53% for digits. Koerich and Kalva [90] tested a multi-layer perceptron, attaining an accuracy of 87.79% on the letters dataset. Also, Cavalin et al. [91] used hidden Markov models, reporting an accuracy of 87% for letters (both uppercase and lowercase) and 98% for digits. Finally, Cireşan et al. [19] used this database to test the performance of committees of CNNs, attaining accuracies of 88.12% for the whole database, 92.42% for letters and 99.19% for digits.
Surveyed works are summarized in Table 3, with the upper side of the table showing the performance of classical machine learning models and the lower part showing works involving CNNs.

Conclusions
This paper has provided an exhaustive review of the state of the art for both the MNIST and EMNIST databases.
The MNIST database of handwritten digits was introduced more than two decades ago and has been extensively used to validate computer vision algorithms, and more recently, also as a benchmark to test different convolutional neural network architectures and approaches. To the best of our knowledge, this paper provides the most extensive and updated survey of the MNIST dataset, including papers as recent as 2019 and making a distinction on whether or not they use preprocessing and/or data augmentation. As of today (July 2019), the best result obtained for this dataset involves a test error rate of 0.21%, and uses a convolutional neural network with two convolutional layers, one dense layer, and DropConnect, a regularization approach that generalizes dropout. To achieve such a competitive result, this proposal relies on data augmentation. On the other hand, the best result without any kind of data augmentation corresponds to a test error rate of 0.24% using network-in-network, a technique that shares many properties with convolutional neural networks.
After MNIST, we have described the EMNIST database, which was introduced in 2017 and includes a significantly larger number of samples, coming from a different source than MNIST, as well as handwritten letters, thus constituting a more challenging benchmark. Although the number of works addressing EMNIST is significantly smaller than the number testing on MNIST, owing to the recency of the former, works with very competitive results are starting to emerge. In this case, the highest accuracy found for the Letters database is 95.44%, obtained by combining Markov random field models and convolutional neural networks. In the case of the Digits database, the best accuracy is also achieved using convolutional neural networks; one of the works reporting this top result uses capsule layers after convolution.
It is remarkable that MNIST was introduced in 1998 by LeCun et al. [2] in the same paper where they presented convolutional neural networks. This technique has evolved over two decades and, with the improvement of graphics hardware, has become an industry standard for solving a large number of computer vision problems, as well as problems in other domains. In particular, the best results reported in this survey are based on convolutional neural networks, sometimes combined with other techniques to further improve performance. In some cases, CNNs are combined with evolutionary algorithms or other optimization techniques to enhance the selection of hyperparameters, leading to better accuracy. In other cases, different CNN models are combined into a committee, reducing classification error.
Also, from the surveyed works we can infer that data augmentation techniques can enhance performance. These techniques are most often used with neural networks or SVMs, and when combined with a CNN classifier they reach the top of the absolute ranking on the MNIST dataset.
As a general conclusion, we can see that over two decades there have been no radical changes in the artificial intelligence techniques used for computer vision. Indeed, much of the merit for the good performance can instead be attributed to hardware improvements, which have eventually enabled more complex architectures and the ability to deal with larger amounts of data. Still, some works propose novel developments or improvements, often combined with convolutional neural networks, reporting outstanding results. Although accuracy on MNIST and EMNIST is very close to 100% and will hardly increase, these novel developments might hold the key to cracking more complex computer vision problems.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: