CLRS: Continual Learning Benchmark for Remote Sensing Image Scene Classification
Abstract
1. Introduction
- (1)
- We analyze three continual learning scenarios and propose training batch partitioning criteria for each of them.
- (2)
- We construct CLRS, a large-scale remote sensing image scene classification database. It provides researchers with a better data resource to evaluate and improve the performance of continual learning methods in remote sensing image scene classification.
- (3)
- We provide a new method for constructing a large-scale scene classification database based on a pretrained object detection model, which saves manual annotation costs.
- (4)
- We test and analyze several mainstream continual learning methods under the three continual learning scenarios; the results can serve as baselines for future work.
2. Construction Principles and Methods of CLRS Dataset
2.1. Construction Principles of CLRS Dataset
- New Instances scenario (NI): New instances of the same scene category will exist in subsequent batches. Although these instances belong to the same category, they have different textures, backgrounds, resolutions, regions, etc. (as shown in Figure 1a). In the NI scenario, the model is required to continuously consolidate the knowledge of the scene categories that have been learned to achieve better prediction accuracy.
- New Classes scenario (NC): Figure 1b shows a schematic diagram of the NC scenario. The scene categories in subsequent batches are all new categories that the model has not learned before. In the NC scenario, the model must be able to quickly learn new scene categories and also not forget the knowledge of previously learned categories. In other words, the model can accurately predict new scene categories without losing the prediction accuracy for categories that have already been learned.
- New Instances and Classes scenario (NIC): In the NIC scenario (as shown in Figure 1c), subsequent training batches have both new categories that the model has not learned and new instances of the classes that the model has learned. The NIC scenario is the closest to the real-world remote sensing image scene classification problem. It requires the model to correctly distinguish different scene categories and to also continuously consolidate the knowledge of the categories that have been learned. Therefore, the NIC scenario is also the most difficult of the three scenarios.
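The three protocols above differ only in how each batch's label space relates to what the model has already seen. The following toy Python sketch makes the distinction concrete; the class names and batch contents are made up for illustration and are not the actual CLRS splits.

```python
# Toy illustration of the NI / NC / NIC protocols. Class names and
# batch contents are made up for illustration only.

def seen_classes(batches):
    """Cumulative set of classes seen after each training batch."""
    seen, history = set(), []
    for classes in batches:
        seen |= set(classes)
        history.append(set(seen))
    return history

# NI: the label space is fixed; later batches bring new *instances* only.
ni = [{"airport", "beach"}, {"airport", "beach"}, {"airport", "beach"}]

# NC: every batch introduces previously unseen classes.
nc = [{"airport", "beach"}, {"forest", "river"}]

# NIC: new classes arrive together with new instances of old classes.
nic = [{"airport", "beach"}, {"airport", "forest"}]

assert seen_classes(ni)[0] == seen_classes(ni)[-1]  # label space never grows
assert seen_classes(nc)[-1] == {"airport", "beach", "forest", "river"}
assert seen_classes(nic)[-1] == {"airport", "beach", "forest"}
```

In the NI case the classifier always predicts over the same classes; in NC and NIC the output space must grow with the stream, which is what makes those scenarios harder.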
- (1)
- Regarding the selection of the CLRS categories, we referenced various land-use classification standards. The authors of Reference [23] constructed a scene category network for remote sensing image scene classification (as shown in Table 1), which synthesizes various land-use classification standards and details which subclasses fall under each parent class. From this scene category network, we selected 25 common scenes as the categories of the CLRS.
- (2)
- The scene classification model based on deep learning can easily overfit small datasets. Therefore, the CLRS should have a large amount of sample data. Each class has 600 images, and the size of each CLRS image is , which will satisfy the majority of deep learning models.
- (3)
- In practical applications of the scene classification problem, scenes in the same category will have remarkable differences due to many factors, such as illumination, background, geographical location, spatial resolution, scale, etc. Therefore, the intraclass samples of the CLRS should be diverse and representative and be able to reflect the characteristics of the scene category as truly as possible, in order to improve the robustness and generalization ability of the model. CLRS samples can be collected from multiple sensors to increase the variation of the samples within the same scene category.
- (4)
- Many similar scenes exist in actual scene classification. Their high similarity brings difficulties and challenges to the classification model. Therefore, small differences between CLRS scene categories bring the dataset closer to actual applications, and similar scenes should be considered in the selection of the CLRS categories.
- (5)
- For the above three continual learning scenarios, the CLRS should define a set of criteria for dividing training batches and ensure that image data are not duplicated across batches. Spatial resolution can be divided quantitatively according to its numerical value. Therefore, the CLRS should record the spatial resolution of each remote sensing image during collection so that training batches can be divided quantitatively.
- (6)
- Remote sensing images are more complicated than natural images due to their backgrounds and textures. Therefore, the CLRS must consider the smoothness between different batches to reduce this difficulty. In other words, the division of training batches should be as balanced as possible, and the differences between batches should not be extremely large.
2.2. Construction Methods of CLRS Dataset
- (1)
- As shown in Figure 5a, if the detected object is at the center of the remote sensing image, the image is cropped with the center of the detected object's bounding box as the starting point. If other similar objects are included in the cropped area, only one image is output.
- (2)
- If the detected object is on the edge of the remote sensing image (as shown in Figure 5b) and cropping cannot be centered on the object, the image is cropped along the boundary of the remote sensing image, as long as the object is included in the cropped area.
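The two cropping rules above amount to a center crop clamped to the image boundary. A minimal sketch, assuming a fixed 256-pixel crop size (an illustrative value, not stated in this section) and axis-aligned pixel coordinates:

```python
# Sketch of the two cropping rules: a fixed-size window centred on the
# detected object's bounding box, clamped to the image boundary when the
# object lies near an edge. The 256-pixel crop size is an assumed value.

def crop_window(img_w, img_h, box, size=256):
    """box = (x_min, y_min, x_max, y_max); returns (left, top, right, bottom)."""
    cx = (box[0] + box[2]) // 2  # bounding-box centre
    cy = (box[1] + box[3]) // 2
    left = cx - size // 2
    top = cy - size // 2
    # Rule 2: if centring would run past the image edge, slide the
    # window back inside the image so the object is still covered.
    left = max(0, min(left, img_w - size))
    top = max(0, min(top, img_h - size))
    return (left, top, left + size, top + size)

# Centred object (rule 1) vs. edge object (rule 2):
assert crop_window(1000, 1000, (400, 400, 600, 600)) == (372, 372, 628, 628)
assert crop_window(1000, 1000, (0, 0, 50, 50)) == (0, 0, 256, 256)
```

Clamping rather than padding keeps every crop entirely inside the source image, matching the rule that edge objects are cropped along the image boundary.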
3. The Proposed CLRS Dataset
- (1)
- Multisource. To meet the requirements of deep learning models for sample diversity, the CLRS ensures the diversity and representativeness of its samples during collection. As with most existing datasets, such as AID++ [23] and RSD46-WHU [32,33], CLRS images are mainly collected from Google Earth, Bing Map, Google Map, and Tianditu, which use different remote sensing imaging sensors. Therefore, the CLRS images are multisource and provide rich sample data.
- (2)
- The samples within a class are more diverse. During acquisition, various factors were considered for each image, such as illumination, background, time, scale, and angle. The image locations are widely distributed worldwide, covering major cities and regions in Asia, Africa, Europe, North America, South America, and Oceania (as shown in Figure 7). These factors remarkably increase the intraclass diversity of the CLRS samples (see Figure 8). Figure 8a shows differences within the same category due to seasonal changes. Figure 8b shows that climate and the geographical environment can also lead to large variations within the same category. Figure 8c shows sample differences due to the different cultures and architectural styles of different countries. Figure 8d displays two samples of the same scene category at different resolutions.
- (3)
- The difference between classes is smaller. Given that scenes in actual applications are often similar, the CLRS also selects some similar scene categories (as shown in Figure 9) to narrow its interclass differences. In Figure 9a, the main difference is that the railway station contains not only the railway but also the station building. In Figure 9b, unlike the playground, the stadium has surrounding stands and buildings. In Figure 9c, the airport not only contains many planes but also a runway. In Figure 9d, the bare land shows some artificial traces, whereas the desert does not. The CLRS thus has higher interclass similarity and is closer to the actual remote sensing image scene classification task.
- (4)
- The CLRS provides a training batch partitioning standard. Existing datasets lack such standards and thus cannot be used to evaluate and compare the performance of different continual learning algorithms in remote sensing image scene classification. The CLRS records the spatial resolution of every image and tallies the resolution range of each category. Based on resolution, the images of each category are divided into three levels, each with 200 images. Table 2 presents the resolution range of the three levels for each category. To facilitate training batch division, each image is named in the following format: Category_Number_Resolution Level_Resolution Size.tif.
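Under this naming convention, an image's category and resolution level can be recovered directly from its filename. A small helper sketch, assuming underscore-separated fields and a category name containing no underscores; the example filename and its values are hypothetical:

```python
# Helper for the naming convention Category_Number_Level_Resolution.tif.
# Assumes underscore-separated fields and a category name containing no
# underscores; the example values below are hypothetical.

import os
from collections import defaultdict

def parse_name(filename):
    stem, _ = os.path.splitext(os.path.basename(filename))
    category, number, level, resolution = stem.rsplit("_", 3)
    return category, int(number), int(level), float(resolution)

def group_by_level(filenames):
    """Bucket images by (category, resolution level) for batch building."""
    groups = defaultdict(list)
    for name in filenames:
        category, _, level, _ = parse_name(name)
        groups[(category, level)].append(name)
    return groups

category, number, level, resolution = parse_name("golf-course_017_2_0.61.tif")
assert (category, number, level) == ("golf-course", 17, 2)
```

Splitting from the right (`rsplit`) keeps the parse robust to hyphens inside category names such as `golf-course`.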
4. Experiment
4.1. Training Batch Partitioning in Three Scenarios
- NI scenario: After the test set is held out, the remaining images are divided into batch1, batch2, and batch3 for training. Each training batch has 25 scene classes and 3750 images. In the NI scenario, all the scene classes in the test set have appeared in the training set; that is, all classes are known. The classifier only needs to classify the 25 classes in each batch and does not need to distinguish new classes.
- NC scenario: Considering the smoothness between training batches, the NC scenario randomly divides the 25 scene classes into five groups; to reduce difficulty, each group contains all the images of the three resolution levels for its five scene classes. Each training batch contains five scene classes and 2250 images. In the NC scenario, the test set contains scene classes that have not yet appeared in the training set. A classifier must learn to distinguish not only all the classes it has seen so far but also those that have never appeared before. Therefore, the NC scenario is more difficult than the NI scenario.
- NIC scenario: Combining the five training batches of the NC scenario with the three training batches of the NI scenario, the NIC scenario is divided into 15 batches. Each training batch has five scene classes and 750 images. In this scenario, the training batch sequence is longer (15 batches), and the model must continue to consolidate previously learned knowledge while continuously learning new scene classes. Among the three scenarios, the NIC scenario is the closest to real-world remote sensing image scene classification and the most difficult.
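The batch counts above can be reproduced on a synthetic index of the dataset (25 classes × 3 resolution levels × 200 images). The sketch below assumes that NI batches follow resolution levels and that 50 images per class per level are reserved for testing; both are inferences from the stated batch sizes (e.g., 3750 = 25 classes × 150 images), not specifications given here. Class names are placeholders.

```python
# Synthetic reconstruction of the NI / NC / NIC batch partitioning.
# Assumptions (inferred, not specified here): NI batches follow the three
# resolution levels, and 50 images per class per level form the test set.

import random

CLASSES = [f"class{c:02d}" for c in range(25)]  # placeholder names

def build_index():
    # (class, resolution level, image id): 25 x 3 x 200 = 15000 entries.
    return [(cls, level, i) for cls in CLASSES
            for level in (1, 2, 3) for i in range(200)]

def split_train_test(index, test_per_class_level=50):
    train, test = [], []
    for item in index:
        (test if item[2] < test_per_class_level else train).append(item)
    return train, test

def ni_batches(train):
    # One batch per resolution level; every batch holds all 25 classes.
    return [[x for x in train if x[1] == level] for level in (1, 2, 3)]

def nc_batches(train, rng=random.Random(0)):
    # Five batches of five classes each, all resolution levels included.
    classes = CLASSES[:]
    rng.shuffle(classes)
    groups = [set(classes[i:i + 5]) for i in range(0, 25, 5)]
    return [[x for x in train if x[0] in g] for g in groups]

def nic_batches(train, rng=random.Random(0)):
    # 15 batches: each NC class group split across the three levels.
    return [[x for x in batch if x[1] == level]
            for batch in nc_batches(train, rng) for level in (1, 2, 3)]

train, test = split_train_test(build_index())
assert all(len(b) == 3750 for b in ni_batches(train))   # NI: 3 batches
assert all(len(b) == 2250 for b in nc_batches(train))   # NC: 5 batches
assert all(len(b) == 750 for b in nic_batches(train))   # NIC: 15 batches
```

The assertions confirm that the three partitionings are mutually consistent: the NIC batches are exactly the NC class groups further split by the NI resolution levels.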
4.2. Baseline Methods
4.3. Evaluation Metrics
4.4. Parameter Settings
4.5. Experiment Results and Analysis
4.5.1. The NI Scenario
4.5.2. The NC Scenario
4.5.3. The NIC Scenario
4.6. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Zou, Q.; Ni, L.; Zhang, T.; Wang, Q. Deep Learning Based Feature Selection for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2321–2325.
- Du, P.; Li, E.; Xia, J.; Samat, A.; Bai, X. Feature and Model Level Fusion of Pretrained CNN for Remote Sensing Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2600–2611.
- Cheng, G.; Li, Z.; Yao, X.; Guo, L.; Wei, Z. Remote sensing image scene classification using bag of convolutional features. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1735–1739.
- Zhang, B.; Zhang, Y.; Wang, S. A Lightweight and Discriminative Model for Remote Sensing Scene Classification With Multidilation Pooling Module. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2636–2653.
- Helber, P.; Bischke, B.; Dengel, A.; Borth, D. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2217–2226.
- Gomez-Chova, L.; Camps-Valls, G.; Munoz-Mari, J.; Calpe, J. Semisupervised Image Classification With Laplacian Support Vector Machines. IEEE Geosci. Remote Sens. Lett. 2008, 5, 336–340.
- Chen, L.; Cui, X.; Li, Z.; Yuan, Z.; Xing, J.; Xing, X.; Jia, Z. A New Deep Learning Algorithm for SAR Scene Classification Based on Spatial Statistical Modeling and Features Re-Calibration. Sensors 2019, 19, 2479.
- Alhichri, H. Multitask Classification of Remote Sensing Scenes Using Deep Neural Networks. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22 July 2018; pp. 1195–1198.
- Liu, M.; Nie, L.; Wang, X.; Tian, Q.; Chen, B. Online Data Organizer: Micro-Video Categorization by Structure-Guided Multimodal Dictionary Learning. IEEE Trans. Image Process. 2019, 28, 1235–1247.
- McCloskey, M.; Cohen, N.J. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation; Elsevier: Amsterdam, The Netherlands, 1989; Volume 24, pp. 109–165.
- French, R.M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 1999, 3, 128–135.
- Ratcliff, R. Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychol. Rev. 1990, 97, 285–308.
- Parisi, G.I.; Kemker, R.; Part, J.L.; Kanan, C.; Wermter, S. Continual lifelong learning with neural networks: A review. Neural Netw. 2019, 113, 54–71.
- Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
- Zenke, F.; Poole, B.; Ganguli, S. Improved multitask learning through synaptic intelligence. arXiv 2017, arXiv:1703.04200.
- Li, Z.; Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2935–2947.
- Tasar, O.; Tarabalka, Y.; Alliez, P. Incremental Learning for Semantic Segmentation of Large-Scale Remote Sensing Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3524–3537.
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
- Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981.
- Cheng, G.; Han, J.; Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE 2017, 105, 1865–1883.
- LeCun, Y.; Cortes, C.; Burges, C.J.C. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 14 January 2020).
- Lomonaco, V.; Maltoni, D. CORe50: A new dataset and benchmark for continuous object recognition. arXiv 2017, arXiv:1705.03550.
- Jin, P.; Xia, G.S.; Hu, F.; Lu, Q.; Zhang, L. AID++: An Updated Version of AID on Scene Classification. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22 July 2018; pp. 4721–4724.
- Li, H.; Tao, C.; Wu, Z.; Chen, J.; Gong, J.; Deng, M. RSI-CB: A large scale remote sensing image classification benchmark via crowdsource data. arXiv 2017, arXiv:1705.10450.
- Kang, J.; Körner, M.; Wang, Y.; Taubenböck, H.; Zhu, X.X. Building instance classification using street view images. ISPRS J. Photogramm. Remote Sens. 2018, 145, 44–59.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2015, arXiv:1512.02325.
- Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 19 June 2018.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
- Xiao, Z.; Long, Y.; Li, D.; Tang, G.; Liu, J. High-Resolution Remote Sensing Image Retrieval Based on CNNs from a Dimensional Perspective. Remote Sens. 2017, 9, 725.
- Zhao, B.; Zhong, Y.; Xia, G.; Zhang, L. Dirichlet-Derived Multiple Topic Scene Classification Model for High Spatial Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2016, 54, 2108–2123.
- Maltoni, D.; Lomonaco, V. Continuous learning in single-incremental-task scenarios. arXiv 2018, arXiv:1806.08568.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv 2013, arXiv:1311.2524.
- Yuan, X.; Liu, X.; Yan, S. Visual Classification With Multitask Joint Sparse Representation. IEEE Trans. Image Process. 2012, 21, 4349–4360.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2019.
- Zhu, D.; Zhang, F.; Wang, S.; Wang, Y.; Cheng, X.; Huang, Z.; Liu, Y. Understanding Place Characteristics in Geographic Contexts through Graph Convolutional Neural Networks. Ann. Am. Assoc. Geogr. 2020, 110, 408–420.
Parent Class | Subclasses |
---|---|
airport | airport, runway |
highway | bridge, parking, parking_by_the_road, road, viaduct |
port land | port |
railway | railway, station |
waters | beach, lake, river |
unused land | bareland, desert, ice, rock, mountain |
resident | mix, multi-family, single-family |
arable land | dry land, paddy fields, terraces |
grassland | meadow, shrub |
woodland | forest |
power station | solar, wind, hydraulic |
factory | storage tank, works |
mining area | mine, oilfield |
commerce | commercial |
religious land | church |
sports land | baseball-field, basketball-field, golf-course, stadium, soccer field, tennis court |
special land | cemetery |
leisure land | amusement park, park, pool, square |
Scene Categories | Level1 (m) | Level2 (m) | Level3 (m)
---|---|---|---
airport | | |
bare-land | | |
beach | | |
bridge | | |
commercial | | |
desert | | |
farmland | | |
forest | | |
golf-course | | |
highway | | |
industrial | | |
meadow | | |
mountain | | |
overpass | | |
park | | |
parking | | |
playground | | |
port | | |
railway | | |
railway station | | |
residential | | |
river | | |
runway | | |
stadium | | |
storage-tank | | |
Datasets | Scene Classes | Total Images | Spatial Resolution (m) | Data Source | Location Distribution | Training Batch Partitioning Standard
---|---|---|---|---|---|---
UCM [18] | 21 | 2100 | | USGS | Urban areas in the United States | No
SIRI-WHU [34] | 12 | 2400 | 2 | Google Earth | Mainly urban areas in China | No
AID [19] | 30 | 10000 | | Google Earth | Mainly China, the United States, England, etc. | No
NWPU-RESISC45 [20] | 45 | 31500 | | Google Earth | More than 100 countries | No
CLRS | 25 | 15000 | | Google Earth, Bing Map, Google Map, and Tianditu | More than 100 countries | Yes
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, H.; Jiang, H.; Gu, X.; Peng, J.; Li, W.; Hong, L.; Tao, C. CLRS: Continual Learning Benchmark for Remote Sensing Image Scene Classification. Sensors 2020, 20, 1226. https://doi.org/10.3390/s20041226