Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks
Abstract
1. Introduction
2. Proposed System
2.1. SNN Training
2.2. Prototype Selection
2.3. Projection in the Dissimilarity Space
2.4. SVM Classification
2.5. HASC
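The stages named in Sections 2.2–2.4 (prototype selection, projection into the dissimilarity space, and classification) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the Euclidean distance stands in for the dissimilarity learned by the SNN, and the prototype and sample values are invented for the example.

```python
import math

# Sketch of Sections 2.2-2.4: represent a sample by its distances to a set
# of prototypes (the "dissimilarity space"). The Euclidean distance below
# is an illustrative stand-in for the dissimilarity learned by the SNN.

def dissimilarity_vector(x, prototypes):
    """Project sample x into the dissimilarity space: one coordinate per
    prototype, holding the distance from x to that prototype."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return [dist(x, p) for p in prototypes]

# Two prototypes define a 2-D dissimilarity space; the resulting vectors
# would then be fed to the SVM classifier of Section 2.4.
prototypes = [(0.0, 0.0), (4.0, 0.0)]
print(dissimilarity_vector((1.0, 0.0), prototypes))  # [1.0, 3.0]
```

The dimensionality of the resulting representation is set by the number of prototypes, not by the size of the original feature space, which is what makes prototype selection (Section 2.2) a key design choice.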
3. Siamese Neural Network (SNN)
3.1. The Two Identical Twin Subnetworks
3.2. Subtract Block, FC Layer, and Sigmoid Function
4. Clustering
- Step 1. Randomly select a set of centroids from the training data points;
- Step 2. For each remaining data point in the training set, find the distance between it and the nearest centroid;
- Step 3. Calculate new centroids via a weighted probability distribution;
- Step 4. Repeat Steps 2 and 3 until convergence.
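The seeding loop in the steps above can be sketched in plain Python. Two caveats: Step 3's "weighted probability distribution" is modeled here as k-means++-style sampling (probability proportional to squared distance to the nearest centroid), which is an assumption since the exact weighting is not restated in this section, and the sketch stops once k centroids are placed rather than running full Lloyd iterations to convergence.

```python
import random

# Sketch of the clustering steps above, on 1-D data for clarity.
# Assumption: Step 3 is modeled as k-means++-style seeding, i.e., a point
# is drawn with probability proportional to its squared distance to the
# nearest existing centroid.

def nearest_dist(point, centroids):
    # Step 2: distance from a point to its nearest centroid.
    return min(abs(point - c) for c in centroids)

def seed_centroids(points, k, rng):
    centroids = [rng.choice(points)]  # Step 1: random initial centroid
    while len(centroids) < k:
        # Step 3: draw the next centroid via the weighted distribution.
        weights = [nearest_dist(p, centroids) ** 2 for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
cents = seed_centroids(points, 2, random.Random(0))
```

Because an already-chosen centroid has weight zero, the sampled centroids are always distinct, and points far from existing centroids are strongly favored.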
5. Results
- BIRDz [39]: This balanced data set is a real-world benchmark for bird species vocalizations. The testing protocol is ten runs using the data split in [39]. The audio tracks were extracted from the Xeno-Canto Archive (http://www.xeno-canto.org/, accessed on 20 January 2021). BIRDz contains a total of 2762 acoustic samples from eleven North American bird species, along with 339 unclassified audio samples (consisting of noise and unknown bird vocalizations). The bird classes vary in size from 246 to 259 samples. Each observation is represented by five spectrograms: (1) constant frequency, (2) frequency-modulated whistles, (3) broadband pulses, (4) broadband with varying frequency components, and (5) strong harmonics;
- InfLar [43]: This data set contains eighteen narrow-band imaging (NBI) endoscopic videos of eighteen different patients with laryngeal cancer. The videos were retrospectively analyzed and categorized into four classes based on the quality of the images (informative, blurred, with saliva or specular reflections, and underexposed). The average video length is 39 s. The videos were acquired with an NBI endoscopic system (Olympus Visera Elite S190 video processor and an ENF-VH rhino-laryngo videoscope) at a frame rate of 25 fps and an image size of 1920 × 1072 pixels. A total of 720 video frames, 180 for each of the four classes, were extracted and labeled. The testing protocol is three-fold cross-validation with data separated at the patient level to ensure that frames from the same class were classified based on the features characteristic of each class and not on features linked to the individual patient (e.g., vocal fold anatomy).
- RPE [44]: This data set contains 195 images for classifying the maturation of human stem cell-derived retinal pigmented epithelium. The images were divided into sixteen subwindows, each of which was assigned to one of four classes: (1) Fusiform (216 images of nuclei and separated cells that are spindle shaped), (2) Epithelioid (547 images of relatively packed cells and nuclei that are globular in shape), (3) Cobblestone (949 images of tightly packed cells with well-defined contours and cell walls, homogeneous cytoplasm, and a hexagonal shape), and (4) Mixed (150 images containing two or more instances of the other three classes). Images that were out of focus or that contained only background information or other clutter were removed, leaving a total of 1862 labeled images.
- The proposed F_NN6-Hasc ensemble improves on previous methods based on Siamese networks;
- F_NN6 obtains performance similar to that of eCNN on BIRD but lower than eCNN on the other data sets;
- The results show that the performance gap between an ensemble of Siamese networks and CNNs is closing.
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Tversky, A. Features of Similarity. Psychol. Rev. 1977, 84, 327–352. [Google Scholar] [CrossRef]
- Cha, S.-H. Use of Distance Measures in Handwriting Analysis. Ph.D. Thesis, State University of New York at Buffalo, Buffalo, NY, USA, April 2001. [Google Scholar]
- Pękalska, E.; Duin, R.P.W. The Dissimilarity Representation for Pattern Recognition-Foundations and Applications; World Scientific: Singapore, 2005. [Google Scholar]
- Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: New York, NY, USA, 2000. [Google Scholar]
- Rubner, Y.; Tomasi, C.; Guibas, L.J. The Earth Mover’s Distance as a metric for image retrieval. Int. J. Comput. Vis. 2000, 40, 99–121. [Google Scholar] [CrossRef]
- Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522. [Google Scholar] [CrossRef] [Green Version]
- Grauman, K.; Darrell, T. The pyramid match kernel: Discriminative classification with sets of image features. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–21 October 2005; Volume 2, p. 1458. [Google Scholar]
- Chen, Y. Similarity-based Classification: Concepts and Algorithms. J. Mach. Learn. Res. 2009, 10, 747–776. [Google Scholar]
- Riesen, K.; Bunke, H. Graph Classification based on vector space embedding. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 1053–1081. [Google Scholar] [CrossRef]
- Pękalska, E.; Duin, R.P.W. Beyond Traditional Kernels: Classification in Two Dissimilarity-Based Representation Spaces. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2008, 38, 729–744. [Google Scholar] [CrossRef]
- Cortes, C.; Mohri, M.; Rostamizadeh, A. Algorithms for Learning Kernels Based on Centered Alignment. J. Mach. Learn. Res. 2012, 13, 795–828. [Google Scholar]
- Scholkopf, B.; Mika, S.; Burges, C.J.; Knirsch, P.; Muller, K.-R.; Ratsch, G.; Smola, A.J. Input space versus feature space in kernel-based methods. IEEE Trans. Neural Netw. 1999, 10, 1000–1017. [Google Scholar] [CrossRef] [Green Version]
- Duin, R.P.W.; Loog, M.; Pękalska, E.; Tax, D.M.J. Feature-Based Dissimilarity Space Classification. In ICPR Contests; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6388, pp. 46–55. [Google Scholar]
- Song, K. Adaptive Nearest Neighbor: A General Framework for Distance Metric Learning. arXiv 2019, arXiv:1911.10674. [Google Scholar]
- Wang, D.; Cheng, Y.; Yu, M.; Guo, X.; Zhang, T. A hybrid approach with optimization-based and metric-based meta-learner for few-shot learning. Neurocomputing 2019, 349, 202–211. [Google Scholar] [CrossRef]
- Zheng, F.; Deng, C.; Sun, X.; Jiang, X.; Guo, X.; Yu, Z.; Huang, F.; Ji, R. Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8506–8514. [Google Scholar]
- Hou, R.; Ma, B.; Chang, H.; Gu, X.; Shan, S.; Chen, X. Interaction-And-Aggregation Network for Person Re-Identification. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9309–9318. [Google Scholar]
- Niethammer, M.; Kwitt, R.; Vialard, F.-X. Metric Learning for Image Registration. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 8455–8464. [Google Scholar]
- Wang, X.; Han, X.; Huang, W.; Dong, D.; Scott, M.R. Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5017–5025. [Google Scholar]
- Filkovic, I.; Kalafatic, Z.; Hrkac, T. Deep metric learning for person Re-identification and De-identification. In Proceedings of the 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 30 May–3 June 2016; pp. 1360–1364. [Google Scholar]
- Bromley, J.; Bentz, J.W.; Bottou, L.; Guyon, I.; LeCun, Y.; Moore, C.; Säckinger, E.; Shah, R. Signature verification using a “siamese” time delay neural network. Int. J. Pattern Recognit. Artif. Intell. 1993, 7, 669–688. [Google Scholar] [CrossRef] [Green Version]
- Kaya, M.; Bilge, H.Ş. Deep Metric Learning: A Survey. Symmetry 2019, 11, 1066. [Google Scholar] [CrossRef] [Green Version]
- Costa, Y.M.G.; Bertolini, D.; Britto, A.S.; Cavalcanti, G.D.C.; Oliveira, L.E.S. The dissimilarity approach: A review. Artif. Intell. Rev. 2019, 53, 2783–2808. [Google Scholar] [CrossRef]
- Cha, S.-H.; Srihari, S.N. Writer Identification: Statistical Analysis and Dichotomizer. In Computer Vision; Springer: Berlin/Heidelberg, Germany, 2000; pp. 123–132. [Google Scholar]
- Pękalska, E.; Duin, R.P. Dissimilarity representations allow for building good classifiers. Pattern Recognit. Lett. 2002, 23, 943–956. [Google Scholar] [CrossRef]
- Oliveira, L.S.; Justino, E.; Sabourin, R. Off-line Signature Verification Using Writer-Independent Approach. In Proceedings of the 2007 International Joint Conference on Neural Networks, Orlando, FL, USA, 12–17 August 2007; pp. 2539–2544. [Google Scholar]
- Hanusiak, R.K.; Oliveira, L.S.; Justino, E.; Sabourin, R. Writer verification using texture-based features. Int. J. Doc. Anal. Recognit. (IJDAR) 2011, 15, 213–226. [Google Scholar] [CrossRef]
- Martins, J.G.; Oliveira, L.S.; Britto, A.S.; Sabourin, R. Forest species recognition based on dynamic classifier selection and dissimilarity feature vector representation. Mach. Vis. Appl. 2015, 26, 279–293. [Google Scholar] [CrossRef]
- Zottesso, R.H.; Costa, Y.M.; Bertolini, D.; Oliveira, L.E. Bird species identification using spectrogram and dissimilarity approach. Ecol. Inform. 2018, 48, 187–197. [Google Scholar] [CrossRef]
- Souza, V.L.F.; Oliveira, A.L.I.; Sabourin, R. A Writer-Independent Approach for Offline Signature Verification using Deep Convolutional Neural Networks Features. In Proceedings of the 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), Sao Paulo, Brazil, 22–25 October 2018; pp. 212–217. [Google Scholar]
- Pękalska, E.; Duin, R.P.; Paclík, P. Prototype selection for dissimilarity-based classifiers. Pattern Recognit. 2006, 39, 189–208. [Google Scholar] [CrossRef]
- Nguyen, G.; Worring, M.; Smeulders, A. Similarity learning via dissimilarity space in CBIR. In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, Santa Barbara, CA, USA, 26–27 October 2006. [Google Scholar]
- Theodorakopoulos, I.; Kastaniotis, D.; Economou, G.; Fotopoulos, S. HEp-2 cells classification via sparse representation of textural features fused into dissimilarity space. Pattern Recognit. 2014, 47, 2367–2378. [Google Scholar] [CrossRef]
- Hernández-Durán, M.; Calaña, Y.P.; Vazquez, H.M. Low-Resolution Face Recognition with Deep Convolutional Features in the Dissimilarity Space. In Proceedings of the 6th International Workshop on Artificial Intelligence and Pattern Recognition (IWAIPR), Havana, Cuba, 24–26 September 2018. [Google Scholar]
- Mekhazni, D.; Bhuiyan, A.; Ekladious, G.; Granger, E. Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020. [Google Scholar]
- Agrawal, A. Dissimilarity learning via Siamese network predicts brain imaging data. arXiv 2019. Neurons and Cognition. [Google Scholar]
- Nanni, L.; Rigo, A.; Lumini, A.; Brahnam, S. Spectrogram Classification Using Dissimilarity Space. Appl. Sci. 2020, 10, 4176. [Google Scholar] [CrossRef]
- Nanni, L.; Brahnam, S.; Lumini, A.; Maguolo, G. Animal Sound Classification Using Dissimilarity Spaces. Appl. Sci. 2020, 10, 8578. [Google Scholar] [CrossRef]
- Zhang, S.-H.; Zhao, Z.; Xu, Z.-Y.; Bellisario, K.; Pijanowski, B.C. Automatic Bird Vocalization Identification Based on Fusion of Spectral Pattern and Texture Features. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 271–275. [Google Scholar]
- Pandeya, Y.R.; Kim, D.; Lee, J. Domestic Cat Sound Classification Using Learned Features from Deep Neural Nets. Appl. Sci. 2018, 8, 1949. [Google Scholar] [CrossRef] [Green Version]
- Pandeya, Y.R.; Lee, J. Domestic Cat Sound Classification Using Transfer Learning. Int. J. Fuzzy Log. Intell. Syst. 2018, 18, 154–160. [Google Scholar] [CrossRef] [Green Version]
- San Biagio, M.; Crocco, M.; Cristani, M.; Martelli, S.; Murino, V. Heterogeneous auto-similarities of characteristics (HASC): Exploiting relational information for classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 809–816. [Google Scholar]
- Moccia, S.; Vanone, G.O.; De Momi, E.; Laborai, A.; Guastini, L.; Peretti, G.; Mattos, L.S. Learning-based classification of informative laryngoscopic frames. Comput. Methods Programs Biomed. 2018, 158, 21–30. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Nanni, L.; Paci, M.; Dos Santos, F.L.C.; Skottman, H.; Juuti-Uusitalo, K.; Hyttinen, J. Texture Descriptors Ensembles Enable Image-Based Classification of Maturation of Human Stem Cell-Derived Retinal Pigmented Epithelium. PLoS ONE 2016, 11, e0149399. [Google Scholar] [CrossRef]
- Vapnik, V.N. The Support Vector method. In Computer Vision; Springer: Berlin/Heidelberg, Germany, 1997; pp. 261–271. [Google Scholar]
- Chicco, D. Siamese neural networks: An overview. In Artificial Neural Networks. Methods in Molecular Biology; Cartwright, H., Ed.; Springer Protocols: Humana, NY, USA, 2020; pp. 73–94. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics Conference (AISTATS), Ft. Lauderdale, FL, USA, 11–13 April 2011. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier Nonlinearities Improve Neural Network Acoustic Models. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
- Huang, J.; Ling, C. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299–310. [Google Scholar] [CrossRef] [Green Version]
- Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
- Nanni, L.; Costa, Y.M.G.; Lumini, A.; Kim, M.Y.; Baek, S.R. Combining visual and acoustic features for music genre classification. Expert Syst. Appl. 2016, 45, 108–117. [Google Scholar] [CrossRef]
- Nanni, L.; Costa, Y.; Lucio, D.; Silla, C.; Brahnam, S. Combining visual and acoustic features for audio classification tasks. Pattern Recognit. Lett. 2017, 88, 49–56. [Google Scholar] [CrossRef]
- Zhao, Z.; Zhang, S.-H.; Xu, Z.-Y.; Bellisario, K.; Dai, N.-H.; Omrani, H.; Pijanowski, B.C. Automated bird acoustic event detection and robust species classification. Ecol. Inform. 2017, 39, 99–108. [Google Scholar] [CrossRef]
- Patrini, I.; Ruperti, M.; Moccia, S.; Mattos, L.S.; Frontoni, E.; De Momi, E. Transfer learning for informative-frame selection in laryngoscopic videos through learned features. Med Biol. Eng. Comput. 2020, 58, 1225–1238. [Google Scholar] [CrossRef]
- Nanni, L.; Paci, M.; Brahnam, S.; Ghidoni, S. An ensemble of visual features for Gaussians of local descriptors and non-binary coding for texture descriptors. Expert Syst. Appl. 2017, 82, 27–39. [Google Scholar] [CrossRef]
- Fristrup, K.M.; Watkins, W.A. Marine animal sound classification. J. Acoust. Soc. Am. 1995, 97, 3369. [Google Scholar] [CrossRef] [Green Version]
Siamese Network 1 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | |||
2D Convolution | 215 × 215 × 64 | 6464 | 10 × 10 | 64 |
ReLU | 215 × 215 × 64 | 0 | ||
Max Pooling | 107 × 107 × 64 | 0 | 2 × 2 | |
2D Convolution | 101 × 101 × 128 | 401,536 | 7 × 7 | 128 |
ReLU | 101 × 101 × 128 | 0 | ||
Max Pooling | 50 × 50 × 128 | 0 | 2 × 2 | |
2D Convolution | 47 × 47 × 128 | 262,272 | 4 × 4 | 128 |
ReLU | 47 × 47 × 128 | 0 | ||
Max Pooling | 23 × 23 × 128 | 0 | 2 × 2 | |
2D Convolution | 19 × 19 × 64 | 204,864 | 5 × 5 | 64 |
ReLU | 19 × 19 × 64 | 0 | ||
Fully Connected | 4096 | 94,638,080 | ||
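The activation sizes and learnable-parameter counts in the table above follow from standard valid-convolution arithmetic (stride 1, no padding, 2 × 2 pooling). The short check below reproduces the Siamese Network 1 column; the single-channel 224 × 224 input is an assumption inferred from the first layer's count (10 · 10 · 1 · 64 + 64 = 6464).

```python
# Reproduces the Siamese Network 1 table: activation sizes via valid
# convolution (out = in - filter + 1) and 2x2 max pooling (floor(in / 2)),
# parameter counts as filt*filt*in_ch*out_ch + out_ch (weights + biases).
# A single-channel 224x224 input is assumed.

def conv(size, ch_in, filt, ch_out):
    params = filt * filt * ch_in * ch_out + ch_out
    return size - filt + 1, ch_out, params

def pool(size):
    return size // 2

size, ch = 224, 1
size, ch, p1 = conv(size, ch, 10, 64)   # 215 x 215 x 64
size = pool(size)                       # 107 x 107 x 64
size, ch, p2 = conv(size, ch, 7, 128)   # 101 x 101 x 128
size = pool(size)                       # 50 x 50 x 128
size, ch, p3 = conv(size, ch, 4, 128)   # 47 x 47 x 128
size = pool(size)                       # 23 x 23 x 128
size, ch, p4 = conv(size, ch, 5, 64)    # 19 x 19 x 64
fc = size * size * ch * 4096 + 4096     # fully connected layer, 4096 units
print(p1, p2, p3, p4, fc)  # 6464 401536 262272 204864 94638080
```

The same arithmetic applies to the other networks below; it is a quick way to sanity-check any row of these tables.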
Siamese Network 2 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | 0 | ||
2D Convolution | 220 × 220 × 64 | 1664 | 5 × 5 | 64 |
LeakyReLU | 220 × 220 × 64 | 0 | ||
2D Convolution | 216 × 216 × 64 | 102,464 | 5 × 5 | 64 |
LeakyReLU | 216 × 216 × 64 | 0 | ||
Max Pooling | 108 × 108 × 64 | 0 | 2 × 2 | |
2D Convolution | 106 × 106 × 128 | 73,856 | 3 × 3 | 128 |
LeakyReLU | 106 × 106 × 128 | 0 | ||
2D Convolution | 104 × 104 × 128 | 147,584 | 3 × 3 | 128 |
LeakyReLU | 104 × 104 × 128 | 0 | ||
Max Pooling | 52 × 52 × 128 | 0 | 2 × 2 | |
2D Convolution | 49 × 49 × 128 | 262,272 | 4 × 4 | 128 |
LeakyReLU | 49 × 49 × 128 | 0 | ||
Max Pooling | 24 × 24 × 128 | 0 | 2 × 2 | |
2D Convolution | 20 × 20 × 64 | 204,864 | 5 × 5 | 64 |
LeakyReLU | 20 × 20 × 64 | 0 | ||
Fully Connected | 2048 | 52,430,848 | ||
Siamese Network 3 | ||||
Layers | Activations | Learnable | Filter Size | Num. Filters |
Input Layer | 224 × 224 | |||
2D Convolution | 55 × 55 × 128 | 6400 | 7 × 7 | 128 |
Max Pooling | 27 × 27 × 128 | 0 | 2 × 2 | |
2D Convolution | 23 × 23 × 256 | 819,456 | 5 × 5 | 256 |
ReLU | 23 × 23 × 256 | 0 | ||
2D Convolution | 19 × 19 × 128 | 819,328 | 5 × 5 | 128 |
Max Pooling | 9 × 9 × 128 | 0 | 2 × 2 | |
2D Convolution | 7 × 7 × 64 | 73,792 | 3 × 3 | 64 |
ReLU | 7 × 7 × 64 | 0 | ||
Max Pooling | 3 × 3 × 64 | 0 | 2 × 2 | |
Fully Connected | 4096 | 2,363,392 | ||
Siamese Network 4 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | |||
2D Convolution | 218 × 218 × 128 | 6400 | 7 × 7 | 128 |
Max Pooling | 54 × 54 × 128 | 0 | 4 × 4 | |
ReLU | 54 × 54 × 128 | 0 | ||
2D Convolution | 50 × 50 × 256 | 819,456 | 5 × 5 | 256 |
ReLU | 50 × 50 × 256 | 0 | ||
2D Convolution | 48 × 48 × 64 | 147,520 | 3 × 3 | 64 |
Max Pooling | 24 × 24 × 64 | 0 | 2 × 2 | |
2D Convolution | 22 × 22 × 128 | 73,856 | 3 × 3 | 128 |
ReLU | 22 × 22 × 128 | 0 | ||
2D Convolution | 18 × 18 × 64 | 204,864 | 5 × 5 | 64 |
Fully Connected | 4096 | 84,938,752 | ||
Siamese Network 5 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | |||
2D Convolution | 215 × 215 × 64 | 6464 | 10 × 10 | 64 |
Max Pooling | 107 × 107 × 64 | 0 | 2 × 2 | |
ReLU | 107 × 107 × 64 | 0 | ||
2D Convolution | 26 × 26 × 128 | 401,536 | 7 × 7 | 128 |
ReLU | 26 × 26 × 128 | 0 | ||
2D Convolution | 9 × 9 × 128 | 409,728 | 5 × 5 | 128 |
ReLU | 9 × 9 × 128 | 0 | ||
2D Convolution | 6 × 6 × 64 | 131,136 | 4 × 4 | 64 |
ReLU | 6 × 6 × 64 | 0 | ||
Fully Connected | 4096 | 9,441,280 | ||
Siamese Network 6 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | |||
2D Convolution | 218 × 218 × 64 | 3200 | 7 × 7 | 64 |
Max Pooling | 109 × 109 × 64 | 0 | 2 × 2 | |
ReLU | 109 × 109 × 64 | 0 | ||
2D Convolution | 107 × 107 × 128 | 73,856 | 3 × 3 | 128 |
Max Pooling | 53 × 53 × 128 | 0 | 2 × 2 | |
ReLU | 53 × 53 × 128 | 0 | ||
2D Convolution | 53 × 53 × 64 | 8256 | 1 × 1 | 64 |
ReLU | 53 × 53 × 64 | 0 | ||
2D Convolution | 51 × 51 × 128 | 73,856 | 3 × 3 | 128 |
ReLU | 51 × 51 × 128 | 0 | ||
Max Pooling | 25 × 25 × 128 | 0 | 2 × 2 | |
2D Convolution | 25 × 25 × 128 | 16,512 | 1 × 1 | 128 |
ReLU | 25 × 25 × 128 | 0 | ||
2D Convolution | 22 × 22 × 64 | 131,136 | 4 × 4 | 64 |
Max Pooling | 11 × 11 × 64 | 0 | 2 × 2 | |
ReLU | 11 × 11 × 64 | 0 | ||
Fully Connected | 4096 | 31,723,520 | ||
Siamese Network 7 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | |||
Dropout Layer | 224 × 224 | 0 | ||
2D Convolution | 218 × 218 × 64 | 3200 | 7 × 7 | 64 |
Max Pooling | 109 × 109 × 64 | 0 | 2 × 2 | |
2D Convolution | 105 × 105 × 128 | 204,928 | 5 × 5 | 128 |
Max Pooling | 52 × 52 × 128 | 0 | 2 × 2 | |
2D Convolution | 48 × 48 × 64 | 204,864 | 5 × 5 | 64 |
Max Pooling | 24 × 24 × 64 | 0 | 2 × 2 | |
2D Convolution | 22 × 22 × 256 | 147,712 | 3 × 3 | 256 |
Max Pooling | 11 × 11 × 256 | 0 | 2 × 2 | |
2D Convolution | 9 × 9 × 256 | 590,080 | 3 × 3 | 256 |
Fully Connected | 4096 | 16,781,312 | ||
Siamese Network 8 | ||||
Layers | Activations | Learnable | Filter Size | Num. of Filters |
Input Layer | 224 × 224 | |||
2D Convolution | 215 × 215 × 32 | 3232 | 10 × 10 | 32 |
Max Pooling | 107 × 107 × 32 | 0 | 2 × 2 | |
ReLU | 107 × 107 × 32 | 0 | ||
2D Grouped Convolution | 101 × 101 × 64 | 50,240 | 7 × 7 | 64 |
2D Convolution | 97 × 97 × 128 | 204,928 | 5 × 5 | 128 |
Max Pooling | 48 × 48 × 128 | 0 | 2 × 2 | |
ReLU | 48 × 48 × 128 | 0 | ||
2D Grouped Convolution | 46 × 46 × 256 | 147,712 | 3 × 3 | 256 |
Fully Connected | 4096 | 2,218,790,912 | ||
| Name | Input Image | Network Topology | #Classifiers | CAT | InfLar | BIRD | RPE |
|---|---|---|---|---|---|---|---|
| | HASC | NN1 | 4 | 78.64 | 90.56 | 94.52 | 84.46 |
| | HASC | NN2 | 4 | 81.69 | 88.33 | 93.22 | 84.75 |
| | HASC | NN3 | 4 | 78.64 | 79.44 | 94.91 | 82.59 |
| | HASC | NN4 | 4 | 82.37 | 88.33 | 93.33 | 84.58 |
| | HASC | NN5 | 4 | 78.98 | 87.64 | 94.04 | 80.09 |
| | HASC | NN6 | 4 | 80.68 | 89.72 | 93.09 | 85.22 |
| | HASC | NN7 | 4 | 76.61 | 80.97 | 91.97 | 82.18 |
| | HASC | NN8 | 4 | 78.64 | 85.69 | 91.37 | 80.84 |
| F_NN4 | HASC | NN1 … NN4 | 16 | 84.07 | 89.86 | 94.99 | 84.80 |
| F_NN6 | HASC | NN1 … NN6 | 24 | 84.41 | 91.11 | 95.10 | 85.24 |
| F_NN8 | HASC | NN1 … NN8 | 32 | 84.75 | 90.56 | 95.10 | 84.80 |
| [37] | | | | 82.41 | 74.86 | 92.97 | 66.19 |
| [38] | | | | 84.07 | 89.86 | 94.99 | 84.80 |
| Name | Input Image | Network Topology | #Classifiers | CAT | InfLar | BIRD | RPE |
|---|---|---|---|---|---|---|---|
| | Spect/Im | NN1 | 4 | 78.64 | 74.72 | 92.46 | 63.60 |
| | Spect/Im | NN2 | 4 | 76.95 | 71.39 | 92.74 | 37.81 |
| | Spect/Im | NN3 | 4 | 75.25 | 83.47 | 93.02 | --- |
| | Spect/Im | NN4 | 4 | 81.36 | 74.17 | 91.86 | --- |
| | Spect/Im | NN5 | 4 | 76.95 | 81.25 | 94.03 | --- |
| | Spect/Im | NN6 | 4 | 78.31 | 75.46 | 91.96 | --- |
| | Spect/Im | NN7 | 4 | 72.54 | 66.81 | 88.43 | --- |
| | Spect/Im | NN8 | 4 | 79.32 | 77.92 | 94.14 | --- |
| F_NN4 | Spect/Im | NN1 … NN4 | 16 | 79.32 | 79.17 | 93.44 | --- |
| F_NN6 | Spect/Im | NN1 … NN6 | 24 | 81.69 | 80.69 | 93.76 | --- |
| F_NN8 | Spect/Im | NN1 … NN8 | 32 | 83.39 | 79.58 | 94.24 | --- |
Method | CAT | BIRD | InfLar | RPE |
---|---|---|---|---|
F_NN4-Hasc | 84.07 | 94.99 | 89.86 | 84.80 |
F_NN6-Hasc | 84.41 | 95.10 | 91.10 | 85.24 |
F_NN8-Hasc | 84.75 | 95.10 | 90.56 | 84.00 |
GoogleNet | 82.98 | 92.41 | 90.42 | 87.70 |
VGG16 | 84.07 | 95.30 | 91.53 | 89.27 |
VGG19 | 83.05 | 95.19 | 92.22 | 89.30 |
GoogleNetP365 | 85.15 | 92.94 | 93.61 | 88.51 |
eCNN | 87.36 | 95.81 | 94.03 | 89.82 |
F_NN6-Hasc + eCNN | 88.14 | 96.04 | 95.56 | 89.75 |
F_NN8-Hasc + eCNN | 88.14 | 96.04 | 94.86 | 89.86 |
Method | CAT | BIRD | InfLar | RPE |
---|---|---|---|---|
[37] | 0.967 | 0.983 | 0.906 | |
F_NN4-Hasc | 0.973 | 0.993 | 0.982 | 0.938 |
F_NN6-Hasc | 0.973 | 0.993 | 0.985 | 0.937 |
F_NN8-Hasc | 0.975 | 0.995 | 0.985 | 0.933 |
GoogleNet | 0.979 | 0.994 | 0.992 | 0.966 |
VGG16 | 0.984 | 0.997 | 0.994 | 0.966 |
VGG19 | 0.981 | 0.997 | 0.995 | 0.972 |
GoogleNetP365 | 0.986 | 0.995 | 0.993 | 0.969 |
eCNN | 0.987 | 0.997 | 0.996 | 0.972 |
F_NN6-Hasc + eCNN | 0.986 | 0.996 | 0.997 | 0.968 |
F_NN8-Hasc + eCNN | 0.987 | 0.997 | 0.997 | 0.969 |
Authors | Reference | CAT | BIRD | InfLar | RPE |
---|---|---|---|---|---|
Nanni et al. | [51] | — | 96.3 | — | — |
Nanni et al. | [52] | — | 95.1 | — | — |
Zhao et al. | [53] | — | 93.6 | — | — |
Pandeya & Lee. | [41] | 87.7 | — | — | — |
Pandeya et al. | [40] | 91.1 | — | — | — |
Pandeya et al. | [40]−CNN | 90.8 | — | — | — |
Zhang et al. | [39] | — | 96.7 | — | — |
Patrini et al. | [54] | — | — | 93.25 | — |
Moccia et al. | [43] | — | — | 80.25 | — |
Nanni et al. | [55] | — | — | — | 97.1 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nanni, L.; Minchio, G.; Brahnam, S.; Maguolo, G.; Lumini, A. Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks. Sensors 2021, 21, 1573. https://doi.org/10.3390/s21051573