Neighbor-Based Label Distribution Learning to Model Label Ambiguity for Aerial Scene Classification
Abstract
1. Introduction
1.1. Difficulties in Modeling Label Ambiguity
- The widely used label distributions are generated by Gaussian smoothing from the sample ground truth to nearby labels [7,8,9,10,11], as shown in Figure 1 (a minimal sketch follows this list). For age images, local sample similarity is known, i.e., similar images have close ages; global label correlations are also available, i.e., the age difference reflects the label correlations. The Gaussian distribution properly models label ambiguity because both the local similarity and the label correlations are captured.
- For scene classification or generic SLL problems, neither local similarity nor label correlations are known in advance, which makes label ambiguity hard to model. As mentioned in Gao et al. [7], modeling label ambiguity is challenging due to the diversity of the label space.
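For readers unfamiliar with Gaussian label smoothing, the sketch below generates such a distribution for a single age label. It is a minimal illustration of the idea used in [7,8,9,10,11], not code from the cited works; the function name and the default sigma are our choices.

```python
import numpy as np

def gaussian_label_distribution(true_age, ages, sigma=2.0):
    """Smooth a single ground-truth age into a distribution over all ages.

    `ages` is the discrete label set (e.g., 0..100); `sigma` controls how
    much probability mass leaks to neighboring ages.
    """
    d = np.exp(-((ages - true_age) ** 2) / (2.0 * sigma ** 2))
    return d / d.sum()  # normalize so the distribution sums to 1

# An image annotated with age 30 also receives mass at nearby ages,
# which encodes both local similarity and label correlations.
dist = gaussian_label_distribution(30, np.arange(101))
```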
1.2. Motivation and Fundamental Idea
- We assume that label ambiguity is caused by sample neighbors having different but correlated labels. As shown in Figure 2, images of the scenes Center, Square, and Stadium share similar visual features; hence, the labels Square and Stadium cause the label ambiguity of the Center sample annotated by the red box. As label ambiguity originates from label correlations, we seek to uncover the local sample neighbors that satisfy the global label correlations.
- Subspace learning [12,13,14] has the potential to uncover sample neighbors that satisfy a certain property. To find neighbors with discrimination ability, subspace learning is adopted for semi-supervised learning (SSL) or clustering problems [15,16,17]. In subspace learning, the neighboring relations are formulated as a representation coefficient matrix that takes the samples as the dictionary and reconstructs each sample as a linear combination of the others, i.e., the self-expressiveness assumption [12] (a minimal sketch follows this list). The sample affinity graph is determined by the coefficient matrix. Discriminative neighbors require the coefficient matrix to be block-diagonal, i.e., neighboring relations occur only between samples of the same label. Many constraints have been proposed to pursue a block-diagonal matrix, such as the block-wise [18], low-rank [19], and group-sparse [20] constraints.
- The subspace learning methods developed for SSL or clustering problems are suboptimal for modeling label ambiguity. Their discriminative neighbors are used for capturing reliable unlabeled samples in SSL or for producing accurate clustering memberships. The block-diagonal matrix requires that sample neighbors share the same label, which is inappropriate for uncovering neighbors with different labels.
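To make the self-expressiveness assumption concrete, the sketch below computes representation coefficients with a ridge-regularized least-squares model and derives an affinity graph from them. It is a generic illustration of subspace learning [12], assuming features are stored as column vectors; it is not the objective solved in this paper.

```python
import numpy as np

def self_expressive_coefficients(X, lam=0.1):
    """Reconstruct each sample as a linear combination of the others.

    X: (D, N) matrix whose columns are sample features. Solves
    min_Z ||X - XZ||_F^2 + lam * ||Z||_F^2 in closed form, then zeroes
    the diagonal as a post-hoc stand-in for the diag(Z) = 0 constraint.
    """
    N = X.shape[1]
    XtX = X.T @ X
    Z = np.linalg.solve(XtX + lam * np.eye(N), XtX)
    np.fill_diagonal(Z, 0.0)
    return Z

def affinity_graph(Z):
    """Symmetrize the coefficient matrix to obtain the affinity graph A."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0
```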
1.3. Contributions
- Different from most subspace learning methods, our method is developed for modeling label ambiguity. Most subspace learning methods emphasize the discrimination ability of neighboring relations, and the learned affinity graph is encouraged to be block-diagonal. Conversely, we aim to uncover neighboring relations among different but correlated labels. Figure 3 shows that our affinity graph is consistent with the global label correlations and is not block-diagonal.
- Most LDL methods are inapplicable to generic SLL problems, and we model label ambiguity by jointly capturing local sample similarity and global label correlations. Although the data-dependent LDL (D2LDL) proposed by He et al. [21] has the potential to handle generic SLL problems, it uncovers sample neighbors with only a norm constraint and overlooks label correlations. In contrast, we introduce label correlations to uncover sample neighbors, which can reduce label noise, as explained in Figure 2.
- We define a subspace learning problem that jointly captures local sample similarity and global label correlations. The resulting neighbor-based label distributions can robustly express label ambiguity.
- To our knowledge, this is the first LDL work that can handle generic SLL problems. Experimental results demonstrate that using the label distributions prevents CNNs from over-fitting and assists feature learning.
2. Related Works
2.1. Deep Learning
2.2. Label Distribution Learning (LDL)
2.3. Subspace Learning
3. Proposed Method
3.1. Modeling Label Ambiguity
- Agreement with local sample similarity: if samples m and n have similar features, then samples m and n are neighboring, i.e., Z_mn ≠ 0.
- Consistency with global label correlations: if samples m and n have substantially different labels, then samples m and n are not neighboring, i.e., Z_mn = 0 (a sketch of one plausible formulation follows this list).
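The paper's objective Function (3) is not reproduced in this extract, so the sketch below only illustrates one plausible way to encode the two requirements: a reconstruction term rewards feature-level similarity, while a weight matrix derived from the label correlations G penalizes coefficients between weakly correlated classes. The weighting scheme and the parameter lam are our assumptions.

```python
import numpy as np

def correlation_penalized_objective(X, Z, W, lam=1.0):
    """Illustrative loss: reconstruction + correlation-weighted sparsity.

    X: (D, N) features; Z: (N, N) coefficients; W: (N, N) nonnegative
    weights, large where the labels of samples m and n are weakly
    correlated, so such pairs are discouraged from becoming neighbors.
    """
    reconstruction = np.linalg.norm(X - X @ Z, "fro") ** 2  # local similarity
    penalty = lam * np.sum(W * np.abs(Z))                   # label consistency
    return reconstruction + penalty
```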
3.1.1. Uncovering Sample Neighbors
3.1.2. Predefining Label Correlations
3.1.3. Optimization
Algorithm 1: Solving the objective Function (3) through ADMM
Input: X, , , ,
Initialize: , , , , , , ,
Compute the constant term in Equation (9)
While the convergence conditions are not satisfied:
1: update according to Equations (7) and (8)
2: update according to Equation (9)
3: update according to Equation (10)
4: update and according to Equation (11)
5: update by
6: check the convergence conditions: and
7:
8: if , break
End while
Output:
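Since the variable symbols of Algorithm 1 were lost in extraction, the sketch below shows the same ADMM pattern (alternating closed-form updates, a dual ascent step, and a residual-based convergence check [47]) applied to the illustrative objective sketched in Section 3.1. The subproblem solutions here stand in for Equations (7)-(11) and are not the paper's exact derivations.

```python
import numpy as np

def soft_threshold(V, T):
    """Elementwise soft-thresholding: the prox of the weighted l1 penalty."""
    return np.sign(V) * np.maximum(np.abs(V) - T, 0.0)

def admm_solver(X, W, lam=0.1, rho=1.0, tol=1e-6, max_iter=500):
    """ADMM for  min_Z ||X - XZ||_F^2 + lam * sum(W * |J|)  s.t.  Z = J.

    Mirrors the structure of Algorithm 1: alternating updates, dual
    ascent, and a convergence check on the primal residual.
    """
    N = X.shape[1]
    XtX = X.T @ X
    J = np.zeros((N, N))
    Y = np.zeros((N, N))  # Lagrange multiplier for the constraint Z = J
    for _ in range(max_iter):
        # Z-subproblem is quadratic and has a closed-form solution.
        Z = np.linalg.solve(2 * XtX + rho * np.eye(N),
                            2 * XtX + rho * J - Y)
        # J-subproblem reduces to weighted soft-thresholding (prox step).
        J = soft_threshold(Z + Y / rho, lam * W / rho)
        Y += rho * (Z - J)  # dual ascent
        if np.linalg.norm(Z - J) < tol:  # primal residual check
            break
    return J
```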
3.2. Constructing Label Distributions
3.2.1. Label Propagation
3.2.2. Rectifying Label Distributions
3.3. CNN Learning Framework
4. Experiments and Results
4.1. Implementation Details
4.1.1. Datasets and Protocols
4.1.2. Network Backbones
4.1.3. Parameters for Subspace Learning
4.2. Analysis of Label Distributions
4.2.1. Agreement with Local Similarity
4.2.2. Consistency with Label Correlations
- The dark elements in G indicate the correlated label pairs. For example, the green boxes show that the scenes Rail S and Indus are correlated. The explanation is that many Rail S and Indus images are confused with each other on the validation sets.
- The light elements in A are globally consistent with the dark elements in G. For example, the red boxes suggest that many Rail S samples have Indus neighbors. The explanation is that Z is penalized by G in the objective Function (3), so the neighboring relations are consistent with the label correlations (a consistency check is sketched below).
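One way to quantify the consistency described above is to average the affinity between every pair of classes and compare the resulting C × C matrix against G. The helper below is an illustrative diagnostic (the names are ours), assuming integer class labels and non-empty classes.

```python
import numpy as np

def class_pair_affinity(A, y, C):
    """Mean affinity between every ordered pair of classes.

    A: (N, N) affinity graph; y: (N,) integer labels in [0, C).
    A large entry at (c1, c2), e.g., (Rail S, Indus), indicates that
    many class-c1 samples have class-c2 neighbors; the pattern of large
    entries should mirror the dark (correlated) entries of G.
    """
    M = np.zeros((C, C))
    for c1 in range(C):
        for c2 in range(C):
            # Boolean masks select the block of A linking the two classes.
            M[c1, c2] = A[np.ix_(y == c1, y == c2)].mean()
    return M
```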
4.2.3. Comparisons with D2LDL
- D2LDL can construct adaptive label distributions but produces many distorted distributions, which imply substantial label noise. For example, the yellow rectangle shows that many Rail S samples are associated with the label Indus, reflecting the correlation between Rail S and Indus.
- Our method produces far fewer distorted distributions than D2LDL (138 vs. 411), suggesting a reduction in label noise. Compared to the yellow rectangular region, the red rectangular region is ‘cleaner’, indicating that fewer samples are contaminated by noisy labels. Therefore, introducing label correlations to regularize the discovery of neighbors reduces label noise.
- The distorted distributions of our method originate from severely confused image contents. As shown in Figure 9, even humans confuse the image semantics with the ambiguous labels.
4.3. Analysis of the Network Performance
4.3.1. Classification Accuracy
- LSR, CWL, and D2LDL all achieve higher accuracies than training with the sample ground truth alone; hence, efforts to address label ambiguity benefit aerial scene classification.
- CWL yields performance competitive with D2LDL, which demonstrates the value of incorporating label correlations into label representations.
- Our methods clearly outperform CWL. The reason is that the label representations of CWL are class-specific and thus cannot flexibly adapt to the large intraclass variation of scene images. In contrast, our neighbor-based label distributions capture local sample similarity and can express different patterns of label distributions.
- Our methods substantially outperform D2LDL, which highlights the effectiveness of considering label correlations. Figure 10 presents the comparisons in terms of confusion matrices. There are large accuracy gaps in the scenes of Center (0.81 vs. 0.78), Rail S (0.91 vs. 0.84), and School (0.74 vs. 0.69). These scenes usually comprise complicated visual contents and tend to involve diverse scene labels; accordingly, they are susceptible to label noise. Facilitated by label correlations, our methods reduce the label noise and robustly encode label ambiguity. Therefore, our label distributions are effective in representing complex scenes.
4.3.2. Feature Robustness
4.4. Comparisons with State-of-the-Art Methods
- Although we only use common network structures (i.e., ResNet), our method achieves performance comparable to recent deep learning methods that devise complicated network structures, such as Attention-GAN [22] and CapsNet [56]. The explanation is that the label distributions substantially improve data utilization by associating samples with multiple labels. Thus, our method is a simple yet effective approach to aerial scene classification.
- SF-CNN [57] slightly surpasses our method on NR-0.1 (89.89 vs. 89.80). SF-CNN, the scale-free CNN, enlarges the number of samples fourfold by resizing each image to four scales ranging from 224 × 224 to 400 × 400. In contrast, our computational complexity is much lower because we do not apply such data augmentation.
4.5. Experiments Using Different Sizes of Datasets
4.6. Influence of Parameters and Time Efficiency
4.6.1. Influence of
4.6.2. Influence of D
4.6.3. Influence of
4.6.4. Time Efficiency
5. Discussion
5.1. Summary of the Experiment Results
- Our label distributions can robustly model label ambiguity for aerial images. Compared to D2LDL, our method substantially reduced label noise by incorporating label correlations, as shown in Figure 8.
- Using our label distributions for network training yielded competitive classification performance. Compared to the label representations of LSR, CWL, or D2LDL, our label distributions led to higher accuracies, as presented in Table 3 (a minimal loss sketch follows this list).
- Our method can improve the generalization ability of networks and is especially useful for small datasets. Regularizing output distributions with our label distributions helps to mitigate over-fitting, as illustrated in Figure 13.
- It is convenient and time-efficient to apply our method to common network structures. We only introduced label distributions to regularize network learning, and the time for constructing label distributions was short, as listed in Table 5.
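As context for the training setup summarized above, regularizing a network with label distributions usually just replaces the one-hot cross-entropy target with the constructed soft distribution. Below is a minimal PyTorch sketch of such a loss, assuming the rectified distributions are provided as row-normalized targets; the exact loss form is our assumption, since Section 3.3 is only summarized here.

```python
import torch
import torch.nn.functional as F

def label_distribution_loss(logits, target_dist):
    """KL divergence between predicted and constructed label distributions.

    logits: (B, C) network outputs; target_dist: (B, C) rectified label
    distributions whose rows sum to 1. With one-hot targets this reduces
    to ordinary cross-entropy, so plugging it into a common backbone
    (VGG, ResNet) requires no architectural change.
    """
    log_p = F.log_softmax(logits, dim=1)
    return F.kl_div(log_p, target_dist, reduction="batchmean")
```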
5.2. Why Does N-LDL Work Well?
5.3. Are There Better Label Distributions?
- As the matrix of label distributions is expected to have a low trace-norm value (i.e., to be low-rank), a trace norm can be imposed on the network outputs (i.e., predicted label distributions) in the loss function, which makes the framework end-to-end trainable (see the sketch after this list).
- GCNs learn to map an initial label graph (such as the S or G in this paper) into inter-dependent label embeddings that implicitly model label correlations. The label embeddings project sample features into network outputs, enabling the predicted label distributions to be consistent with label correlations.
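A sketch of the trace-norm idea from the first bullet above: the penalty below could be added to a fitting loss such as the KL sketch in Section 5.1. This is not an implementation from the paper, and the weighting of the penalty is a hypothetical hyperparameter.

```python
import torch

def nuclear_norm_penalty(logits):
    """Trace (nuclear) norm of a batch of predicted label distributions.

    Adding beta * nuclear_norm_penalty(logits) to the training loss
    encourages the predicted distributions to be low-rank, which keeps
    the framework end-to-end trainable; beta is a hypothetical weight.
    """
    P = torch.softmax(logits, dim=1)  # (B, C) predicted distributions
    return torch.linalg.matrix_norm(P, ord="nuc")
```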
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981.
- Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883.
- Shao, Z.; Yang, K.; Zhou, W. A benchmark dataset for performance evaluation of multi-label remote sensing image retrieval. Remote Sens. 2018, 10, 964.
- Ji, J.; Jing, W.; Chen, G.; Lin, J.; Song, H. Multi-label remote sensing image classification with latent semantic dependencies. Remote Sens. 2020, 12, 1110.
- Shin, S.J.; Kim, S.; Kim, Y.; Kim, S. Hierarchical multi-label object detection framework for Remote Sensing Image. Remote Sens. 2020, 12, 2734.
- Geng, X. Label Distribution Learning. IEEE Trans. Knowl. Data Eng. 2016, 28, 1734–1748.
- Gao, B.B.; Xing, C.; Xie, C.W.; Wu, J.; Geng, X. Deep Label Distribution Learning with Label Ambiguity. IEEE Trans. Image Process. 2017, 26, 2825–2838.
- Gao, B.B.; Zhou, H.Y.; Wu, J.; Geng, X. Age estimation using expectation of label distribution learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 712–718.
- Yang, J.; Chen, L.; Zhang, L.; Sun, X.; She, D.; Lu, S.P.; Cheng, M.M. Historical context-based style classification of painting images via label distribution learning. In Proceedings of the 2018 ACM Multimedia Conference (MM 2018), Seoul, Korea, 22–26 October 2018; pp. 1154–1162.
- Xu, L.; Chen, J.; Gan, Y. Head pose estimation with soft labels using regularized convolutional neural network. Neurocomputing 2019, 337, 339–353.
- Wu, X.; Wen, N.; Liang, J.; Lai, Y.K.; She, D.; Cheng, M.M.; Yang, J. Joint acne image grading and counting via label distribution learning. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 10641–10650.
- Elhamifar, E.; Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781.
- Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184.
- Adler, A.; Elad, M.; Hel-Or, Y. Linear-Time Subspace Clustering via Bipartite Graph Modeling. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2234–2246.
- Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
- Wright, J.; Ma, Y.; Mairal, J.; Sapiro, G.; Huang, T.S.; Yan, S. Sparse representation for computer vision and pattern recognition. Proc. IEEE 2010, 98, 1031–1044.
- Fang, X.; Han, N.; Wong, W.K.; Teng, S.; Wu, J.; Xie, S.; Li, X. Flexible Affinity Matrix Learning for Unsupervised and Semisupervised Classification. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1133–1149.
- Li, C.G.; Vidal, R. Structured Sparse Subspace Clustering: A unified optimization framework. In Proceedings of the 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 277–286.
- Vidal, R.; Favaro, P. Low rank subspace clustering (LRSC). Pattern Recognit. Lett. 2014, 43, 47–61.
- Fang, Y.; Wang, R.; Dai, B.; Wu, X. Graph-based learning via auto-grouped sparse regularization and kernelized extension. IEEE Trans. Knowl. Data Eng. 2015, 27, 142–154.
- He, Z.; Li, X.; Zhang, Z.; Wu, F.; Geng, X.; Zhang, Y.; Yang, M.H.; Zhuang, Y. Data-Dependent Label Distribution Learning for Age Estimation. IEEE Trans. Image Process. 2017, 26, 3846–3858.
- Yu, Y.; Li, X.; Liu, F. Attention GANs: Unsupervised Deep Feature Learning for Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 519–531.
- Bi, Q.; Qin, K.; Li, Z.; Zhang, H.; Xu, K.; Xia, G.S. A Multiple-Instance Densely-Connected ConvNet for Aerial Scene Classification. IEEE Trans. Image Process. 2020, 29, 4911–4926.
- Guo, Y.; Ji, J.; Lu, X.; Huo, H.; Fang, T.; Li, D. Global-Local Attention Network for Aerial Scene Classification. IEEE Access 2019, 7, 67200–67212.
- Tong, W.; Chen, W.; Han, W.; Li, X.; Wang, L. Channel-Attention-Based DenseNet Network for Remote Sensing Image Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4121–4132.
- Hashim, F.A.; Hussain, K.; Houssein, E.H.; Mabrouk, M.S.; Al-Atabany, W. Archimedes optimization algorithm: A new metaheuristic algorithm for solving optimization problems. Appl. Intell. 2020, 51, 1531–1551.
- Pathak, Y.; Arya, K.V.; Tiwari, S. Feature selection for image steganalysis using levy flight-based grey wolf optimization. Multimed. Tools Appl. 2019, 78, 1473–1494.
- Canayaz, M. MH-COVIDNet: Diagnosis of COVID-19 using deep neural networks and meta-heuristic-based feature selection on X-ray images. Biomed. Signal Process. Control 2021, 64, 102257.
- Wang, J.; Geng, X. Classification with label distribution learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 3712–3718.
- González, M.; González-Almagro, G.; Triguero, I.; Cano, J.R.; García, S. Decomposition-Fusion for Label Distribution Learning. Inf. Fusion 2021, 66, 64–75.
- González, M.; Cano, J.R.; García, S. ProLSFEO-LDL: Prototype selection and label-specific feature evolutionary optimization for label distribution learning. Appl. Sci. 2020, 10, 3089.
- Xu, M.; Zhou, Z.H. Incomplete label distribution learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 3175–3181.
- Jia, X.; Ren, T.; Chen, L.; Wang, J.; Zhu, J.; Long, X. Weakly supervised label distribution learning based on transductive matrix completion with sample correlations. Pattern Recognit. Lett. 2019, 125, 453–462.
- Zeng, X.Q.; Chen, S.F.; Xiang, R.; Wu, S.X.; Wan, Z.Y. Filling missing values by local reconstruction for incomplete label distribution learning. Int. J. Wirel. Mob. Comput. 2019, 16, 314–321.
- Zeng, X.Q.; Chen, S.F.; Xiang, R.; Li, G.Z.; Fu, X.F. Incomplete label distribution learning based on supervised neighborhood information. Int. J. Mach. Learn. Cybern. 2020, 11, 111–121.
- Chen, K.; Kämäräinen, J.K.; Zhang, Z. Facial age estimation using robust label distribution. In Proceedings of the 2016 ACM Multimedia Conference (MM 2016), Amsterdam, The Netherlands, 15–19 October 2016; pp. 77–81.
- Ling, M.; Geng, X. Indoor Crowd Counting by Mixture of Gaussians Label Distribution Learning. IEEE Trans. Image Process. 2019, 28, 5691–5701.
- Li, P.; Hu, Y.; Wu, X.; He, R.; Sun, Z. Deep label refinement for age estimation. Pattern Recognit. 2020, 100, 107178.
- Ou, Y.; Luo, J.; Li, B.; Swamy, M.N.S. Gray-level image denoising with an improved weighted sparse coding. J. Vis. Commun. Image Represent. 2020, 72, 102895.
- Zhuang, L.; Gao, S.; Tang, J.; Wang, J.; Lin, Z.; Ma, Y.; Yu, N. Constructing a Nonnegative Low-Rank and Sparse Graph with Data-Adaptive Features. IEEE Trans. Image Process. 2015, 24, 3717–3728.
- Li, C.G.; You, C.; Vidal, R. Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework. IEEE Trans. Image Process. 2017, 26, 2988–3001.
- Zhai, H.; Zhang, H.; Zhang, L.; Li, P.; Plaza, A. A new sparse subspace clustering algorithm for hyperspectral remote sensing imagery. IEEE Geosci. Remote Sens. Lett. 2017, 14, 43–47.
- Chen, H.; Wang, W.; Feng, X. Structured Sparse Subspace Clustering with Within-Cluster Grouping. Pattern Recognit. 2018, 83, 107–118.
- Wang, L.; Guo, S.; Huang, W.; Xiong, Y.; Qiao, Y. Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs. IEEE Trans. Image Process. 2017, 26, 2055–2068.
- Lei, Y.; Dong, Y.; Xiong, F.; Bai, H.; Yuan, H. Confusion Weighted Loss for Ambiguous Classification. In Proceedings of the 2018 IEEE International Conference on Visual Communications and Image Processing, Taichung, Taiwan, China, 9–12 December 2018.
- Chang, C.C.; Lin, C.J. LIBSVM: A Library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 27.
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2010, 3, 1–122.
- Talukdar, P.P.; Crammer, K. New regularized algorithms for transductive learning. Lect. Notes Comput. Sci. 2009, 5782, 442–457.
- Müller, R.; Kornblith, S.; Hinton, G. When does label smoothing help? In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Pereyra, G.; Tucker, G.; Chorowski, J.; Kaiser, Ł.; Hinton, G. Regularizing neural networks by penalizing confident output distributions. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Hou, J.; Zeng, H.; Cai, L.; Zhu, J.; Chen, J.; Ma, K.K. Multi-label learning with multi-label smoothing regularization for vehicle re-identification. Neurocomputing 2019, 345, 15–22.
- He, N.; Fang, L.; Li, S.; Plaza, A.; Plaza, J. Remote sensing scene classification using multilayer stacked covariance pooling. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6899–6910.
- Cheng, G.; Yang, C.; Yao, X.; Guo, L.; Han, J. When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2811–2821.
- Yu, Y.; Liu, F. Dense connectivity based two-stream deep feature fusion framework for aerial scene classification. Remote Sens. 2018, 10, 1158.
- Zhang, W.; Tang, P.; Zhao, L. Remote sensing image scene classification using CNN-CapsNet. Remote Sens. 2019, 11, 494.
- Xie, J.; He, N.; Fang, L.; Plaza, A. Scale-free convolutional neural network for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6916–6928.
- He, N.; Fang, L.; Li, S.; Plaza, J.; Plaza, A. Skip-Connected Covariance Network for Remote Sensing Scene Classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 1461–1474.
- Xu, K.; Huang, H.; Deng, P.; Shi, G. Two-stream feature aggregation deep neural network for scene classification of remote sensing images. Inf. Sci. 2020, 539, 250–268.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Bagherinezhad, H.; Horton, M.; Rastegari, M.; Farhadi, A. Label refinery: Improving ImageNet classification through label progression. arXiv 2018, arXiv:1805.06241.
- Liu, Z.; Chen, Z.; Bai, J.; Li, S.; Lian, S. Facial pose estimation by deep learning from label distributions. In Proceedings of the 2019 International Conference on Computer Vision Workshop, Seoul, Korea, 27 October–2 November 2019; pp. 1232–1240.
- Van Der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2625.
- Xie, L.; Wang, J.; Wei, Z.; Wang, M.; Tian, Q. DisturbLabel: Regularizing CNN on the loss layer. In Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4753–4762.
- Chen, B.; Deng, W.; Du, J. Noisy softmax: Improving the generalization ability of DCNN via postponing the early softmax saturation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4021–4030.
- Wu, B.; Jia, F.; Liu, W.; Ghanem, B.; Lyu, S. Multi-label Learning with Missing Labels Using Mixed Dependency Graphs. Int. J. Comput. Vis. 2018, 126, 875–896.
- Zhu, C.; Miao, D.; Wang, Z.; Zhou, R.; Wei, L.; Zhang, X. Global and local multi-view multi-label learning. Neurocomputing 2020, 371, 67–77.
- Chen, Z.M.; Wei, X.S.; Wang, P.; Guo, Y. Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5172–5181.
- Liu, Y.; Chen, W.; Qu, H.; Mahmud, S.M.H.; Miao, K. Weakly supervised image classification and pointwise localization with graph convolutional networks. Pattern Recognit. 2021, 109, 107596.
Symbol | Description |
---|---|
N | The number of training samples |
C | The number of labels |
D | The dimension of the sample features |
 | The ground truth of the nth sample |
 | The binary matrix of sample truth labels |
X | The set of sample features |
G | The matrix of global label correlations |
Z | The matrix of representation coefficients |
A | The sample affinity graph |
 | The set of label distributions |
 | The set of rectified label distributions |
Method | Description |
---|---|
Original CNNs | Sample ground truth is used to train original VGG and ResNet. |
Label smoothing regularization (LSR) [50,52] | LSR is the standard label smoothing method, which handles label ambiguity by uniform label smoothing. The smoothing parameter is set to 0.1. |
Confusion weighted loss (CWL) [44,45] | CWL encodes label ambiguity on the basis of the confusion matrices on validation sets. Label pairs with high confusion proportions are posited to be correlated, and the label intensity is smoothed from the sample ground truth to the correlated labels. The label representations are class-specific. We set the confusion thresholds [45] for AID and NR to 0.02 and 0.03, respectively, due to the lower accuracy achieved on NR. |
D2LDL | D2LDL constructs neighbor-based label distributions but overlooks label correlations. The label distributions are rectified by Equation (13). |
N-LDL | N-LDL jointly captures local similarity and label correlations. The label distributions are rectified by Equation (13). |
Method | AID-0.2 | NR-0.1 |
---|---|---|
VGG | 91.21 ± 0.28 | 85.90 ± 0.22 |
LSR-v | 91.93 ± 0.21 | 86.66 ± 0.16 |
CWL-v | 92.79 ± 0.35 | 87.32 ± 0.28 |
D2LDL-v | 92.61 ± 0.26 | 87.29 ± 0.17 |
N-LDL-v (ours) | 93.66 ± 0.23 | 88.58 ± 0.20 |
ResNet | 92.61 ± 0.26 | 88.09 ± 0.23 |
LSR-r | 93.10 ± 0.22 | 88.41 ± 0.17 |
CWL-r | 93.68 ± 0.31 | 88.95 ± 0.27 |
D2LDL-r | 93.51 ± 0.25 | 89.01 ± 0.18 |
N-LDL-r (ours) | 94.11 ± 0.22 | 89.80 ± 0.19 |
Methods | Year | AID-0.2 | NR-0.1 |
---|---|---|---|
MSCP [53] | 2018 | 91.52 ± 0.21 | 85.33 ± 0.17 |
D-CNN [54] | 2018 | 90.82 ± 0.16 | 89.22 ± 0.50 |
TEX-TS-Net [55] | 2018 | 93.31 ± 0.11 | 84.77 ± 0.24 |
CapsNet [56] | 2019 | 93.79 ± 0.13 | 89.03 ± 0.21 |
SF-CNN [57] | 2019 | 93.60 ± 0.12 | 89.89 ± 0.16 |
SCCov [58] | 2020 | 93.12 ± 0.25 | 89.30 ± 0.35 |
Attention-GAN [22] | 2020 | 93.97 ± 0.23 | 88.06 ± 0.19 |
MIDC-Net [23] | 2020 | 88.51 ± 0.41 | 86.12 ± 0.29 |
TFADNN [59] | 2020 | 93.21 ± 0.32 | 87.78 ± 0.11 |
N-LDL-r (ours) | | 94.11 ± 0.22 | 89.80 ± 0.19 |
Dataset | Step 1 | Step 2 | Step 3 | Total Time |
---|---|---|---|---|
AID-0.2 | 55.2 | 6.5 | 0.6 | 62.3 |
NR-0.1 | 49.6 | 22.0 | 0.8 | 72.4 |
Dataset | Sample Truth Labels | D2LDL-v | N-LDL-v | D2LDL-r | N-LDL-r |
---|---|---|---|---|---|
AID-0.2 | 244.03 | 166.74 | 142.27 | 162.30 | 110.54 |
NR-0.1 | 376.12 | 183.51 | 155.15 | 179.28 | 124.67 |