# Improving the Accuracy of Convolutional Neural Networks by Identifying and Removing Outlier Images in Datasets Using t-SNE


## Abstract


## 1. Introduction

## 2. t-Distributed Stochastic Neighbour Embedding (t-SNE)

## 3. Mathematical Notation

## 4. Outlier Analysis

## 5. Transfer Learning as Feature Extraction

## 6. Methodology

**Algorithm 1.** Identifying outlier images

Load VGG-16

Read raw dataset

Create function train_vgg16(epochs: integer, data: raw dataset)

Initialise i ← 1

While i ≤ epochs do

Train VGG-16 on data

i ← i + 1

End while

Return feature map

End function

Create function t-SNE(v: feature map, dim: 2, iter: 1000)

Initialise t ← 1

$\sigma_i^2$ = variance of the Gaussian centred on $x_i$

Repeat

For each pair of points $x_i$ and $x_j$ in v do

If $i = j$ then

$p_{i|i} = 0$

Else

Compute $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \ne i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$

End if

End for

For each counterpart pair of points $y_i$ and $y_j$ in the low-dimensional space do

If $i = j$ then

$q_{i|i} = 0$

Else

Compute $q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \ne i} \exp(-\|y_i - y_k\|^2)}$

End if

End for

Update each $y_i$ by gradient descent on the Kullback-Leibler divergence between the distributions $P$ and $Q$

t ← t + 1

Until t > iter

Write JSON_file ← class, image_name, $y_i$, $y_j$

Return JSON_file

End function

Create function IQR(classes: JSON_file)

Outliers ← [ ]

For each class do

For each image_name do

If the corresponding $y_i$ and $y_j$ are NOT in $[Q_1 - k(Q_3 - Q_1),\ Q_3 + k(Q_3 - Q_1)]$ then

Append image_name to Outliers

Else

Continue

End if

End for

End for

Return Outliers

End function

Create function Main()

Call train_vgg16()

Call t-SNE()

Call IQR()

End function
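The high-dimensional similarity step of Algorithm 1 can be illustrated with a minimal Python sketch. It computes $p_{j|i}$ for a small list of feature vectors, normalising over all $k \ne i$. The function name `conditional_probabilities`, the toy `features` list, and the single shared `sigma` are assumptions made for illustration; t-SNE implementations normally tune a separate $\sigma_i$ per point from a target perplexity, which is not shown here.

```python
import math


def conditional_probabilities(points, sigma=1.0):
    """Return a matrix P with P[i][j] = p_{j|i} for a list of equal-length vectors.

    Assumption: one shared sigma for every point (a simplification).
    """
    n = len(points)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Unnormalised Gaussian affinity from point i to every other point k != i.
        weights = [
            0.0 if k == i
            else math.exp(-math.dist(points[i], points[k]) ** 2 / (2 * sigma ** 2))
            for k in range(n)
        ]
        total = sum(weights)  # denominator: sum over k != i
        for j in range(n):
            P[i][j] = weights[j] / total  # p_{i|i} stays 0 by construction
    return P


# Hypothetical 2-D feature vectors: two close neighbours and one distant point.
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
P = conditional_probabilities(features)
```

Each row of `P` sums to 1, and the nearby neighbour receives almost all of the probability mass in row 0, which is the behaviour the conditional probability is meant to capture.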

## 7. Results

## 8. Discussion and Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 1.** Boxplot showing outliers. The upper fence lies above the 75th percentile (3rd quartile), and the lower fence lies below the 25th percentile (1st quartile), each by 1.5 times the difference between the 3rd and 1st quartiles. An outlier is any value above the upper fence or below the lower fence.
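The fence rule in this caption, applied per class as in the IQR function of Algorithm 1, might look like the following sketch. The record keys `y1` and `y2` (standing for the two embedding coordinates), the function names, and the sample records are hypothetical; the quartile convention (`method="inclusive"`) is also an assumption, since the source does not state which one was used.

```python
from statistics import quantiles


def fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(records, k=1.5):
    """Flag images whose embedding coordinates fall outside their class fences."""
    axes = ("y1", "y2")  # hypothetical names for the two t-SNE coordinates
    by_class = {}
    for record in records:
        by_class.setdefault(record["class"], []).append(record)

    outliers = []
    for rows in by_class.values():
        # One pair of fences per coordinate, computed within the class only.
        bounds = {axis: fences([r[axis] for r in rows], k) for axis in axes}
        for r in rows:
            if any(not (bounds[a][0] <= r[a] <= bounds[a][1]) for a in axes):
                outliers.append(r["image_name"])
    return outliers


# Made-up coordinates: the last stain image sits far from the rest of its class.
records = [
    {"class": "stain", "image_name": f"stain_{i}.jpg", "y1": float(v), "y2": 0.0}
    for i, v in enumerate([1, 2, 3, 4, 5, 6, 7, 8, 100])
] + [
    {"class": "mould", "image_name": f"mould_{i}.jpg", "y1": float(v), "y2": float(i)}
    for i, v in enumerate([10, 11, 12, 13])
]
flagged = find_outliers(records)
```

With these made-up values the stain class has $Q_1 = 3$ and $Q_3 = 7$ on the first coordinate, so the fences are $-3$ and $13$ and only `stain_8.jpg` is flagged.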

**Figure 2.** Interquartile range (IQR) projection on a normally distributed density. The median is equivalent to the mean, $0\sigma$. The value IQR = $Q_3 - Q_1$ covers the central 50% of the density: the first quartile lies at approximately $-0.67\sigma$ and the third quartile at approximately $+0.67\sigma$.
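The quartile positions quoted in this caption can be checked numerically with a short sketch using Python's standard library. The multiplier `k = 1.5` is taken from the Figure 1 caption; the variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Quartiles of the standard normal, in units of sigma.
q1 = standard_normal.inv_cdf(0.25)
q3 = standard_normal.inv_cdf(0.75)
iqr = q3 - q1

# Fences at k times the IQR beyond the quartiles.
k = 1.5
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr

# Fraction of a normal population that falls outside the fences.
outside = 2 * standard_normal.cdf(lower_fence)

print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f}), outside = {outside:.4f}")
```

For normally distributed data the fences sit at roughly $\pm 2.7\sigma$, so well under 1% of points would be flagged by chance alone.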

**Figure 3.** VGG-16 model. Diagram showing the architecture of the VGG-16 used in this study for feature extraction. Early convolution layers are re-trained on the custom dataset while the fully connected layers are used for classification.

**Figure 4.** Dataset used in this study. A sample of the dataset used to train our model, showing mould images (first row), paint deterioration (second row), stains (third row), and images with no defect (normal) in the fourth row.

**Figure 5.** VGG-16 architecture. The VGG-16 model consists of five convolution blocks (in blue), each followed by a pooling layer (in orange), and three fully connected layers (in green), followed by a final SoftMax classifier (in purple).

**Figure 8.** Summary of the interquartile range (IQR) boxplots. (**a**) IQR boxplot showing no outlier images in the normal class; (**b**) IQR boxplot showing two outlier images in the deterioration class; (**c**) IQR boxplot showing one outlier image in the mould class; and (**d**) IQR boxplot showing three outlier images in the stain class.

**Figure 9.** Outlier images in the training dataset. (**a**,**b**) represent outlier images from the deterioration class; (**c**–**e**) represent outlier images from the stain class; and (**f**) is an outlier image from the mould class.

Class | Precision (without t-SNE) | Recall (without t-SNE) | F1-Score (without t-SNE) | Precision (with t-SNE) | Recall (with t-SNE) | F1-Score (with t-SNE) |
---|---|---|---|---|---|---|
Deterioration | 0.72 | 0.77 | 0.74 | 0.82 | 0.82 | 0.82 |
Mould | 0.89 | 0.89 | 0.89 | 0.89 | 0.90 | 0.89 |
Normal | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Stain | 0.76 | 0.71 | 0.71 | 0.83 | 0.80 | 0.82 |
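
As a reading aid for the table, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it for the Deterioration row; the helper name `f1_score` is illustrative and is not taken from the source. Because the table reports rounded values, a recomputed score can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Deterioration class, using the rounded precision and recall from the table.
deterioration_without = f1_score(0.72, 0.77)
deterioration_with = f1_score(0.82, 0.82)

print(round(deterioration_without, 2), round(deterioration_with, 2))
```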

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Perez, H.; Tah, J.H.M.
Improving the Accuracy of Convolutional Neural Networks by Identifying and Removing Outlier Images in Datasets Using t-SNE. *Mathematics* **2020**, *8*, 662.
https://doi.org/10.3390/math8050662
