A Lightweight Self-Supervised Representation Learning Algorithm for Scene Classification in Spaceborne SAR and Optical Images
Abstract
1. Introduction
- To improve scene classification accuracy under insufficient annotated data, we propose a simple yet effective self-supervised representation learning algorithm called Lite-SRL. To reduce computational cost, we design a lightweight contrastive learning structure in Lite-SRL and adopt the stop-gradient operation;
- To realize on-board deployment of the Lite-SRL algorithm, we propose a distributed hybrid parallelism training framework called DHP and a generic computation workload balancing module called CWB. To the best of our knowledge, this is the first work to combine self-supervised learning with on-board data processing;
- Extensive experiments on four representative datasets demonstrate that Lite-SRL improves scene classification accuracy under limited annotated data and generalizes to both SAR and optical images. Compared with six state-of-the-art methods, Lite-SRL has clear advantages in overall accuracy, number of parameters, memory consumption, and training latency;
- Finally, to evaluate the proposed work’s on-board operational capability, we deploy Lite-SRL on the low-power NVIDIA Jetson TX2 computing platform.
2. Related Works
2.1. RSSC under Limited Annotated Samples
2.2. Self-Supervised Contrastive Learning
2.3. Distributed Training under Limited Resources
3. Methods
3.1. Overview of the Proposed Framework
3.2. Lite-SRL Self-Supervised Representation Learning Network
3.2.1. Network Structure
Algorithm 1. Learning Procedure of Lite-SRL
E: encoder; P: prediction MLP; Aug: random image augmentation; θ: parameters of E and P; Stop: stop-gradient operation
Input: training samples. Output: negative cosine similarity loss.
1: for number of training epochs do
2:   Draw training samples in mini-batch form x
3:   Do augmentation: x1 = Aug(x), x2 = Aug(x)
4:   In the Lite-SRL 2-way structure do z1 = E(x1) and p1 = P(z1); z2 = E(x2) and p2 = P(z2)
5:   Calculate the negative cosine similarity loss with the stop-gradient operation: L = D(p1, Stop(z2))/2 + D(p2, Stop(z1))/2
6:   Do backward propagation with the SGD optimizer
7:   Update the weights θ
8: end for
9: After training, use the pre-trained model for downstream Remote Sensing Scene Classification
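To make the procedure concrete, below is a minimal PyTorch sketch of the 2-way forward pass, stop-gradient operation, and negative cosine similarity loss from Algorithm 1. The predictor layout, feature dimensions, and the symmetric 1/2-weighted loss follow common SimSiam-style conventions and are assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn
import torch.nn.functional as F

class LiteSRLSketch(nn.Module):
    """Sketch of the Lite-SRL 2-way structure: encoder E plus prediction MLP P.
    Dimensions (512/128) assume a ResNet-18-like backbone and are illustrative."""
    def __init__(self, backbone, feat_dim=512, pred_dim=128):
        super().__init__()
        self.encoder = backbone  # E: lightweight feature extractor, e.g., ResNet-18
        self.predictor = nn.Sequential(  # P: prediction MLP
            nn.Linear(feat_dim, pred_dim),
            nn.BatchNorm1d(pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, feat_dim),
        )

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)    # 2-way forward pass
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # detach() realizes the stop-gradient operation on the encoder outputs
        return p1, p2, z1.detach(), z2.detach()

def neg_cosine(p, z):
    """Negative cosine similarity D(p, z); z already carries stop-gradient."""
    return -F.cosine_similarity(p, z, dim=-1).mean()

# One training step (x1, x2 are two random augmentations of the same mini-batch):
#   p1, p2, z1, z2 = model(x1, x2)
#   loss = 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
#   loss.backward(); sgd_optimizer.step()
```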
3.2.2. Lite-SRL Network Partition
3.3. Distributed Training Strategy
3.3.1. Computation Workload Balancing Module
Algorithm 2. CWB Search for the Best Partition Point
Step 1: CWB performs memory workload balancing. Step 2: CWB performs time equalization.
1: Assign the sub-networks and their memory workloads to the TX2 nodes
2: Assume 3 TX2s can satisfy the memory allocation; record the 2 sets of candidate partition points that satisfy memory workload balancing as A and B
3: for a in A do
4:   for b in B do
5:     Partition point 1 adopts a, partition point 2 adopts b
6:     Denote the running time of sub-network 1 as t1
7:     Denote the running time of sub-network 2 as t2
8:     Denote the running time of sub-network 3 as t3
9:     The training time for a mini-batch is T = t1 + t2 + t3
10:    For the partition points in use, the ratio of running time to waiting time of sub-network i is r_i = t_i / (T − t_i)
11:    Calculate the equipment utilization index E using Equation (9)
12:  end for
13: end for
14: The partition point pair with the highest E score is the best partition point
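The Step-2 search loop of Algorithm 2 can be sketched as below. The helpers are hypothetical: `profile(a, b)` stands for measuring the running times of the three sub-networks under a given pair of partition points, and `utilization` stands in for the equipment utilization index of Equation (9), which is not reproduced here.

```python
from itertools import product

def cwb_search(candidates_a, candidates_b, profile, utilization):
    """Sketch of CWB time equalization: exhaustively score every pair of
    candidate partition points that passed the memory-balancing step."""
    best_pair, best_score = None, float("-inf")
    for a, b in product(candidates_a, candidates_b):
        t1, t2, t3 = profile(a, b)        # running time of each sub-network
        T = t1 + t2 + t3                  # training time for one mini-batch
        # ratio of running time to waiting time for each sub-network
        ratios = [t / (T - t) for t in (t1, t2, t3)]
        score = utilization(ratios)       # equipment utilization index E, Eq. (9)
        if score > best_score:
            best_pair, best_score = (a, b), score
    return best_pair                      # the best partition point pair
```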
3.3.2. Dynamic Chain System
4. Experimental Setups
4.1. Datasets Description
- The OpenSARUrban [14] dataset consists of 10 categories of urban scene images collected from Sentinel-1; its scene images cover 21 major cities in China. Each category contains about 40 to 2000 images with a size of 100 × 100 pixels, and the resolution of the images is about 20 m;
- The WHU-SAR6 [11] dataset consists of six categories of scene images collected from Sentinel-1 and GF-3. Each category contains about 250 to 420 images, with sizes ranging from 500 to 600 pixels. Since the total number of WHU-SAR6 images is relatively small, to increase the dataset volume we crop the images into 256 × 256 pixel patches without destroying the scene semantic information (see the cropping sketch after this list).
- The NWPU-RESISC45 [3] dataset is currently the largest open benchmark dataset for the scene classification task, consisting of 45 categories of scene images. Each category contains 700 images with a size of 256 × 256 pixels, and the spatial resolution of the images ranges from about 0.2 to 30 m.
- The AID [15] dataset consists of 30 categories of scene images; each category contains about 200 to 400 images, for a total of 10,000 samples, each with a size of 600 × 600 pixels.
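As mentioned in the WHU-SAR6 item above, the images are cropped into 256 × 256 patches. A minimal sketch of such a cropping routine follows; the half-patch stride is an assumption, since the overlap between crops is not specified.

```python
from PIL import Image

def crop_patches(img: Image.Image, patch: int = 256, stride: int = 128):
    """Tile a (roughly 500-600 px) scene image into patch x patch crops.
    A sliding grid with half-patch stride is assumed for illustration."""
    w, h = img.size
    return [
        img.crop((left, top, left + patch, top + patch))
        for top in range(0, h - patch + 1, stride)
        for left in range(0, w - patch + 1, stride)
    ]
```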
4.2. Data Augmentation
4.3. Implementation Details
- Experiments on self-supervised learning. In this part, we use workstations to comprehensively compare the proposed Lite-SRL with other advanced self-supervised methods. The two workstations are identically configured with an NVIDIA RTX 3090 GPU, an Intel Xeon E5-1650 CPU, and 64 GB of RAM.
- Experiments for on-board deployment of the Lite-SRL algorithm. We use the proposed distributed training modules and provide detailed records of the deployment process. The experimental on-board computing platform consists of NVIDIA Jetson TX2 nodes connected by a high-speed switch; a minimal cluster-initialization sketch follows this list.
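For reference, a minimal sketch of how such a TX2 cluster could be brought up with PyTorch's distributed package. The master address, port, backend choice, and six-node world size are illustrative assumptions, not the actual communication setup of the DHP framework.

```python
import os
import torch.distributed as dist

def init_cluster(rank: int, world_size: int = 6):
    """Join this TX2 node (identified by rank) to the training cluster."""
    os.environ["MASTER_ADDR"] = "192.168.1.100"  # hypothetical address of node 1
    os.environ["MASTER_PORT"] = "29500"          # hypothetical rendezvous port
    # gloo is a safe fallback on embedded boards; NCCL is typical on server GPUs
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
```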
5. Experimental Results
5.1. Guaranteed Accuracy with Less Computation
5.2. Self-Supervised Representation Extractor
5.3. Improving the Scene Classification Accuracy with Limited Annotated Data
5.4. Confusion Matrix Analysis
6. Deployment of Lite-SRL
6.1. Computation Workload Balancing
6.2. Distributed Training with Higher Efficiency
7. Conclusions
- We will design a dedicated lightweight feature extractor for the self-supervised structure to further reduce memory consumption and computation;
- We will explore techniques such as gradient compression, network pruning, etc., to further improve distributed training efficiency;
- We will explore hardware acceleration solutions for onboard distributed training;
- We expect to extend on-board distributed self-supervised training to more remote sensing observation missions.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Abbreviation | Full Name
---|---
AID | Aerial Image Dataset
BN | Batch Normalization
BYOL | Bootstrap Your Own Latent
CNN | Convolutional Neural Network
CWB | Computation Workload Balancing module
DHP | Distributed Hybrid Parallelism training framework
Lite-SRL | Lightweight Self-Supervised Representation Learning algorithm
MG-CAP | Multi-Granularity Canonical Appearance Pooling
MLP | Multi-Layer Perceptron
MoCo | Momentum Contrast for Visual Representation Learning
MTL | Multitask Learning
NWPU-45 | NWPU-RESISC45 Dataset
ReLU | Rectified Linear Unit
RSIs | Remote Sensing Images
RSSC | Remote Sensing Scene Classification
SGD | Stochastic Gradient Descent
SimCLR | Simple Framework for Contrastive Learning
SimSiam | Simple Siamese Representation Learning
SwAV | Unsupervised Learning by Contrasting Cluster Assignments
t-SNE | t-Distributed Stochastic Neighbor Embedding
References
1. Hu, F.; Xia, G.-S.; Hu, J.; Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 2015, 7, 14680–14707.
2. Ni, K.; Liu, P.; Wang, P. Compact Global-Local Convolutional Network with Multifeature Fusion and Learning for Scene Classification in Synthetic Aperture Radar Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7284–7296.
3. Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883.
4. Xu, X.; Zhang, X.; Zhang, T. Multi-Scale SAR Ship Classification with Convolutional Neural Network. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Online Event, 11–16 July 2021; pp. 4284–4287.
5. Lu, X.; Sun, X.; Diao, W.; Feng, Y.; Wang, P.; Fu, K. LIL: Lightweight Incremental Learning Approach through Feature Transfer for Remote Sensing Image Scene Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5611320.
6. Zhang, T.; Zhang, X. Squeeze-And-Excitation Laplacian Pyramid Network with Dual-Polarization Feature Fusion for Ship Classification in SAR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4019905.
7. Gu, Y.; Wang, Y.; Li, Y. A Survey on Deep Learning-Driven Remote Sensing Image Scene Understanding: Scene Classification, Scene Retrieval and Scene-Guided Object Detection. Appl. Sci. 2019, 9, 2110.
8. Zhang, T.; Zhang, X.; Ke, X.; Liu, C.; Xu, X. HOG-ShipCLSNet: A Novel Deep Learning Network with HOG Feature Fusion for SAR Ship Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5210322.
9. Liao, N.; Datcu, M.; Zhang, Z.; Guo, W.; Zhao, J.; Yu, W. Analyzing the Separability of SAR Classification Dataset in Open Set Conditions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7895–7910.
10. Zhang, T.; Zhang, X.; Shi, J.; Wei, S. HyperLi-Net: A Hyper-Light Deep Learning Network for High-Accurate and High-Speed Ship Detection from Synthetic Aperture Radar Imagery. ISPRS J. Photogramm. Remote Sens. 2020, 167, 123–153.
11. Su, B.; Liu, J.; Su, X.; Luo, B.; Wang, Q. CFCANet: A Complete Frequency Channel Attention Network for SAR Image Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11750–11763.
12. Zhang, T.; Zhang, X. A Polarization Fusion Network with Geometric Feature Embedding for SAR Ship Classification. Pattern Recognit. 2022, 123, 108365.
13. Dumitru, C.O.; Schwarz, G.; Datcu, M. SAR Image Land Cover Datasets for Classification Benchmarking of Temporal Changes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1571–1592.
14. Zhao, J.; Zhang, Z.; Yao, W.; Datcu, M.; Xiong, H.; Yu, W. OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 187–203.
15. Xia, G.-S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981.
16. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
17. Zhang, T.; Zhang, X. A Full-Level Context Squeeze-And-Excitation ROI Extractor for SAR Ship Instance Segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4506705.
18. Kolesnikov, A.; Zhai, X.; Beyer, L. Revisiting self-supervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1920–1929.
19. Noroozi, M.; Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 69–84.
20. Stojnic, V.; Risojevic, V. Self-supervised learning of remote sensing scene representations using contrastive multiview coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 1182–1191.
21. Zhang, T.; Zhang, X.; Shi, J.; Wei, S.; Wang, J.; Li, J.; Su, H.; Zhou, Y. Balance Scene Learning Mechanism for Offshore and Inshore Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4004905.
22. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; Volume 119, pp. 1597–1607.
23. Ayush, K.; Uzkent, B.; Meng, C.; Tanmay, K.; Burke, M.; Lobell, D.; Ermon, S. Geography-aware self-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10181–10190.
24. Franklin, D. NVIDIA Developer Blog: NVIDIA Jetson TX2 Delivers Twice the Intelligence to the Edge. Available online: https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/ (accessed on 13 April 2022).
25. Xu, X.; Zhang, X.; Zhang, T. Lite-YOLOv5: A Lightweight Deep Learning Detector for On-Board Ship Detection in Large-Scene Sentinel-1 SAR Images. Remote Sens. 2022, 14, 1018.
26. Aitech’s S-A1760 Venus™ Brings NVIDIA-Based AI Supercomputing to Next Generation Space Applications: Radiation-Characterized COTS System Qualified for Use in Small Sat Clusters and Short-Duration Spaceflights. Available online: https://aitechsystems.com/aitechs-s-a1760-venus-brings-nvidia-based-ai-supercomputing-to-next-generation-space-applications/ (accessed on 13 April 2022).
27. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037.
28. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
29. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678.
30. Shazeer, N.; Cheng, Y.; Parmar, N.; Tran, D.; Vaswani, A.; Koanantakool, P.; Hawkins, P.; Lee, H.; Hong, M.; Young, C.; et al. Mesh-TensorFlow: Deep learning for supercomputers. arXiv 2018, arXiv:1811.02084.
31. Onoufriou, G.; Bickerton, R.; Pearson, S.; Leontidis, G. Nemesyst: A hybrid parallelism deep learning-based framework applied for internet of things enabled food retailing refrigeration systems. Comput. Ind. 2019, 113, 103133.
32. Grill, J.-B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G. Bootstrap your own latent: A new approach to self-supervised learning. arXiv 2020, arXiv:2006.07733.
33. Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 15750–15758.
34. Li, X.; Shi, D.; Diao, X.; Xu, H. SCL-MLNet: Boosting Few-Shot Remote Sensing Scene Classification via Self-Supervised Contrastive Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5801112.
35. Li, Y.; Shao, Z.; Huang, X.; Cai, B.; Peng, S. Meta-FSEO: A Meta-Learning Fast Adaptation with Self-Supervised Embedding Optimization for Few-Shot Remote Sensing Scene Classification. Remote Sens. 2021, 13, 2776.
36. Tao, C.; Qi, J.; Lu, W.; Wang, H.; Li, H. Remote Sensing Image Scene Classification With Self-Supervised Paradigm Under Limited Labeled Samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8004005.
37. Kang, J.; Fernandez-Beltran, R.; Duan, P.; Liu, S.; Plaza, A.J. Deep Unsupervised Embedding for Remotely Sensed Images Based on Spatially Augmented Momentum Contrast. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2598–2610.
38. Jung, H.; Oh, Y.; Jeong, S.; Lee, C.; Jeon, T. Contrastive Self-Supervised Learning with Smoothed Representation for Remote Sensing. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8010105.
39. Zhao, L.; Luo, W.; Liao, Q.; Chen, S.; Wu, J. Hyperspectral Image Classification with Contrastive Self-Supervised Learning under Limited Labeled Samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6008205.
40. Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1422–1430.
41. Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
42. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
43. Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297.
44. Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. arXiv 2020, arXiv:2006.09882.
45. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
46. Kim, S.; Yu, G.-I.; Park, H.; Cho, S.; Jeong, E.; Ha, H.; Lee, S.; Jeong, J.S.; Chun, B.-G. Parallax: Sparsity-aware data parallel training of deep neural networks. In Proceedings of the Fourteenth EuroSys Conference, Dresden, Germany, 25–28 March 2019; pp. 1–15.
47. Jia, Z.; Zaharia, M.; Aiken, A. Beyond data and model parallelism for deep neural networks. Proc. Mach. Learn. Syst. 2019, 1, 1–13.
48. Lee, S.; Kim, J.K.; Zheng, X.; Ho, Q.; Gibson, G.; Xing, E.P. On Model Parallelization and Scheduling Strategies for Distributed Machine Learning; Carnegie Mellon University: Pittsburgh, PA, USA, 2014; pp. 2834–2842.
49. Akintoye, S.B.; Han, L.; Zhang, X.; Chen, H.; Zhang, D. A hybrid parallelization approach for distributed and scalable deep learning. arXiv 2021, arXiv:2104.05035.
50. Demirci, G.V.; Ferhatosmanoglu, H. Partitioning sparse deep neural networks for scalable training and inference. In Proceedings of the ACM International Conference on Supercomputing, Virtual Event, 14–17 June 2021; pp. 254–265.
51. Moreno-Alvarez, S.; Haut, J.M.; Paoletti, M.E.; Rico-Gallego, J.A. Heterogeneous model parallelism for deep neural networks. Neurocomputing 2021, 441, 1–12.
52. Das, D.; Avancha, S.; Mudigere, D.; Vaidynathan, K.; Sridharan, S.; Kalamkar, D.; Kaul, B.; Dubey, P. Distributed deep learning using synchronous stochastic gradient descent. arXiv 2016, arXiv:1602.06709.
53. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
54. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
55. Cheng, G.; Yang, C.; Yao, X.; Guo, L.; Han, J. When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2811–2821.
56. Chen, Z.; Wang, S.; Hou, X.; Shao, L. Recurrent transformer network for remote sensing scene categorisation. In Proceedings of the 2018 British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; Volume 266, p. 0987.
57. Wang, S.; Guan, Y.; Shao, L. Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification. IEEE Trans. Image Process. 2020, 29, 5396–5407.
58. Zhao, Z.; Luo, Z.; Li, J.; Chen, C.; Piao, Y. When Self-Supervised Learning Meets Scene Classification: Remote Sensing Scene Classification Based on a Multitask Learning Framework. Remote Sens. 2020, 12, 3276.
59. Zhang, T.; Zhang, X. HTC+ for SAR Ship Instance Segmentation. Remote Sens. 2022, 14, 2395.
Datasets | Number of Images | Number of Categories | Training Proportions
---|---|---|---
OpenSARUrban 1 [14] | 16,679 | 10 | 10%, 20%
WHU-SAR6 2 [11] | 17,590 | 6 | 10%, 20%
NWPU-RESISC45 [3] | 31,500 | 45 | 10%, 20%
AID [15] | 10,000 | 30 | 10%, 20%, 50%
All dataset columns report overall accuracy (%) under the Freeze protocol.

Method: Freeze | Parameters (Millions) | WHU-SAR6 10% | WHU-SAR6 20% | OpenSARUrban 10% | OpenSARUrban 20% | NWPU-45 10% | NWPU-45 20% | AID 10% | AID 20%
---|---|---|---|---|---|---|---|---|---
ImageNet 1 (Supervised) [16] | - | - | - | - | - | 73.17 | 77.08 | 79.40 | 80.45
SimCLR [22] | 13.57 | 83.40 | 86.73 | 67.87 | 68.33 | 86.45 | 88.32 | 85.52 | 87.23
MoCo-v2 [43] | 22.48 | 82.39 | 85.07 | 65.52 | 66.07 | 83.37 | 86.63 | 84.56 | 86.05
SWAV [44] | 18.45 | 83.04 | 86.30 | 65.98 | 67.28 | 84.16 | 87.85 | 84.85 | 86.59
BYOL [32] | 31.81 | 86.11 | 87.75 | 68.36 | 69.73 | 88.63 | 90.06 | 87.24 | 88.32
SimSiam [33] | 22.73 | 87.59 | 88.64 | 70.20 | 70.86 | 91.19 | 91.26 | 89.15 | 90.49
Lite-SRL (ours) | 12.82 | 87.71 | 88.56 | 70.23 | 71.09 | 91.22 | 91.28 | 89.27 | 90.67
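The Freeze and Fine-Tune protocols behind this table and the next differ only in whether the pre-trained encoder's weights are updated during downstream training. A minimal sketch, assuming a backbone with 512-dimensional output features:

```python
import torch.nn as nn

def build_classifier(encoder: nn.Module, num_classes: int, freeze: bool = True):
    """Freeze: train only a linear head on the frozen pre-trained encoder.
    Fine-tune: update the encoder weights together with the head."""
    if freeze:
        for p in encoder.parameters():
            p.requires_grad = False       # frozen-encoder (linear) evaluation
    head = nn.Linear(512, num_classes)    # 512 assumes a ResNet-18 feature dim
    return nn.Sequential(encoder, head)
```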
All dataset columns report overall accuracy (%) under the Fine-Tune protocol.

Method: Fine-Tune | Parameters (Millions) | WHU-SAR6 10% | WHU-SAR6 20% | OpenSARUrban 10% | OpenSARUrban 20% | NWPU-45 10% | NWPU-45 20% | AID 10% | AID 20%
---|---|---|---|---|---|---|---|---|---
Randomly initialized | - | - | - | - | - | 77.16 | 82.87 | 80.63 | 83.47
ImageNet (Supervised) | - | - | - | - | - | 84.74 | 89.93 | 89.81 | 90.54
SimCLR | 13.57 | 91.85 | 93.74 | 80.21 | 83.87 | 90.35 | 92.02 | 91.32 | 93.54
MoCo-v2 | 22.48 | 90.59 | 92.70 | 79.07 | 82.75 | 88.71 | 90.56 | 89.96 | 91.47
SWAV | 18.45 | 91.58 | 93.37 | 79.85 | 83.63 | 89.26 | 92.07 | 91.53 | 92.84
BYOL | 31.81 | 93.21 | 94.86 | 80.62 | 84.88 | 90.57 | 92.94 | 91.95 | 93.68
SimSiam | 22.73 | 94.77 | 95.69 | 81.49 | 85.29 | 92.68 | 93.48 | 92.38 | 94.63
Lite-SRL (ours) | 12.82 | 94.57 | 95.83 | 81.76 | 85.43 | 92.77 | 93.51 | 92.55 | 94.82
Values are overall accuracy (%).

Method | NWPU 10% | NWPU 20% | AID 20% | AID 50%
---|---|---|---|---
D-CNN with GoogLeNet [55] | 86.89 | 90.49 | 86.89 | 90.49
RTN [56] | 89.90 | 92.71 | 92.44 | -
MG-CAP (Sqrt-E) [57] | 90.83 | 92.95 | 93.34 | 96.12
ResNet-101 [53] | 89.41 | 92.51 | 93.31 | 96.34
ResNet-101+MTL [58] | 91.61 | 93.93 | 93.67 | 96.61
ResNet-18+Lite-SRL (ours) | 92.77 | 93.51 | 94.82 | 95.78
ResNet-101+Lite-SRL (ours) | 93.41 | 94.43 | 95.29 | 96.82
Baseline (average running time of one iteration: 3572 ms) | Partition 1, Node 1 | Partition 1, Node 2 | Partition 2, Node 3 | Partition 2, Node 4 | Partition 3, Node 5 | Partition 3, Node 6
---|---|---|---|---|---|---
Average runtime of each node in one iteration (ms) | 1035 | 1039 | 1145 | 1139 | 921 | 923
Running iterations | 500 | 500 | 500 | 500 | 500 | 500
Dynamic (average running time of one iteration: 2750 ms) | Partition 1, Node 1 | Partition 1, Node 2 | Partition 2, Node 3 | Partition 2, Node 4 | Partition 3, Node 5 | Partition 3, Node 6
---|---|---|---|---|---|---
Average runtime of each node in one iteration (ms) | 1036 | 1038 | 1037 | 1140 | 1142 | 922
Running iterations | 329 | 331 | 340 | 406 | 594 | 1000
Method | Memory Consumption (MB) | Training Time: Baseline (ms) | Training Time: Dynamic (ms) | Training Time: Improvement | Accuracy 2 (%): Baseline | Accuracy 2 (%): Dynamic
---|---|---|---|---|---|---
ResNet18 + Lite-SRL | 7599.7 | 3572 | 2750 | 23.0% | 91.31 | 91.27
ResNet34 + Lite-SRL | 10,185.9 | 4895 | 3984 | 18.6% | 91.75 | 91.78
ResNet50 1 + Lite-SRL | 13,039.3 | 6473 | 5962 | 7.9% | 92.11 | 92.09
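The Improvement column is the relative reduction in per-iteration training time, (Baseline − Dynamic)/Baseline; for example, ResNet18 + Lite-SRL gives (3572 − 2750)/3572 ≈ 23.0%.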