Attention to Both Global and Local Features: A Novel Temporal Encoder for Satellite Image Time Series Classification
Abstract
1. Introduction
1. The proposed GL-TAE is an early attempt to extract hybrid global–local temporal features for SITS classification, adopting self-attention mechanisms for global attention extraction and convolution for local attention extraction.
2. The proposed GL-TAE is a lightweight model. Two lightweight submodules (LTAE and LConv) are placed in parallel to extract global and local temporal features, respectively; combined with a channel-dimension split strategy, this further reduces the number of parameters under the same hyperparameter settings.
2. Methodology
2.1. Overview
2.2. Self-Attention Mechanism
1. Compute a query–key–value triplet for each time step t in the time series by applying three shared linear layers, with learned transformation matrices, to the input.
2. Compute the attention mask, in which the weight assigned to each value is given by a compatibility function of the query with the corresponding key. The compatibility function computes the dot products of the query with all keys, divides each by the square root of the key dimension, and applies a softmax function to obtain the weights on the values.
3. Compute, for each time step, the sum of the values weighted by the corresponding attention mask as the output.
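The three steps above can be sketched as a minimal NumPy implementation of scaled dot-product attention. This is an illustrative sketch, not the authors' code; the dimensions (23 dates, 10 features, echoing TiSeLaC) and weight matrices are assumed for the example.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Steps 1-3 above: project X (T x E) to queries/keys/values,
    score queries against keys, softmax, and weight the values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # step 1: shared linear projections
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # step 2: scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # step 3: weighted sum of values

rng = np.random.default_rng(0)
T, E, d = 23, 10, 8                                  # e.g. 23 dates x 10 features
X = rng.normal(size=(T, E))
Wq, Wk, Wv = (rng.normal(size=(E, d)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)   # (23, 8): one encoded vector per time step
```

Note that the output keeps one vector per time step; the lightweight variants below collapse this further.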
2.3. Lightweight Self-Attention for Global Attention Extraction
1. Unlike the vanilla transformer [41], which computes multiple query–key–value triplets for multi-head attention from the full input sequence and produces an output whose size grows with the number of heads, LTAE applies multi-head attention along the channel dimension: each head processes its own slice of channels, and the outputs of all heads are concatenated into a vector with the same number of channels as the input sequence, regardless of the number of heads. This modified multi-head attention mechanism retains the diversity of feature extraction while significantly reducing the number of trainable parameters.
2. Reference [52] defines a learnable query, named the master query, for each head in LTAE instead of computing queries from the input sequence by a linear layer, which further reduces the parameter count. The attention mask of each head is defined as the scaled softmax of the dot product between the keys and the master query. The output of each head is defined as the sum of the corresponding inputs weighted by the attention mask along the temporal dimension.
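The channel-split, master-query mechanism described above can be sketched as follows. This is a simplified NumPy illustration in the spirit of LTAE [52], not the reference implementation; the head count, key dimension, and helper name `ltae_attention` are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ltae_attention(X, Wk_heads, master_queries):
    """Channel-split multi-head attention: each head sees E/h channels
    and owns a learnable 'master query', so the concatenated output has
    the same number of channels as the input regardless of h."""
    T, E = X.shape
    h = len(master_queries)
    outs = []
    for head in range(h):
        Xh = X[:, head * (E // h):(head + 1) * (E // h)]        # channel slice
        K = Xh @ Wk_heads[head]                                 # keys from the slice
        d_k = K.shape[-1]
        a = softmax(master_queries[head] @ K.T / np.sqrt(d_k))  # 1 x T attention mask
        outs.append(a @ Xh)                                     # temporal weighted sum
    return np.concatenate(outs, axis=-1)                        # back to E channels

rng = np.random.default_rng(1)
T, E, h, d_k = 23, 32, 4, 8
X = rng.normal(size=(T, E))
Wk = [rng.normal(size=(E // h, d_k)) for _ in range(h)]
q = [rng.normal(size=(1, d_k)) for _ in range(h)]
summary = ltae_attention(X, Wk, q)
print(summary.shape)   # (1, 32): a single E-dim temporal summary
```

The key point is that the per-head master query replaces the T per-step queries, so the whole series is collapsed into one vector.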
2.4. Lightweight Convolution for Local Attention Extraction
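The local branch uses lightweight convolution in the sense of Wu et al. [56]: a depthwise temporal convolution whose kernel weights are softmax-normalized, so each kernel acts as a fixed local attention mask. The following is a hedged NumPy sketch of that idea, not the paper's LConv module; the per-channel kernels and the edge padding are assumptions for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lightweight_conv(X, W):
    """Depthwise temporal convolution with softmax-normalised kernels:
    each output step is a convex combination of a k-step local window,
    i.e. a local attention mask. W holds one length-k kernel per channel."""
    T, E = X.shape
    k = W.shape[-1]
    Wn = np.stack([softmax(w) for w in W])                    # normalise each kernel
    Xp = np.pad(X, ((k // 2, k // 2), (0, 0)), mode='edge')   # same-length output
    out = np.empty_like(X)
    for t in range(T):
        window = Xp[t:t + k]                                  # k x E local window
        out[t] = np.einsum('ke,ek->e', window, Wn)            # per-channel weighted sum
    return out

rng = np.random.default_rng(2)
T, E, k = 23, 16, 5
W = rng.normal(size=(E, k))
out = lightweight_conv(np.ones((T, E)), W)
print(np.allclose(out, 1.0))   # True: normalised kernels preserve a constant signal
```

Because the softmax weights sum to one, a constant input passes through unchanged, which is exactly the "attention mask" reading of the kernel.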
2.5. Channel Dimension Split and Concatenation
3. Experiments and Results
3.1. Datasets
(1) TiSeLaC: The study data come from the 2017 Time Series Land Cover Classification Challenge (https://sites.google.com/site/dinoienco/tiselac-time-series-land-cover-classification-challenge, accessed on 20 December 2021). The original data were generated from level-2A Landsat-8 images from 23 acquisition dates in 2014 over Reunion Island, shown in Figure 2. The spatial resolution was 30 m, and 10 features were selected for each pixel: the first 7 bands of Landsat-8 (Band 1–Band 7) and 3 complementary radiometric indices (normalized difference vegetation index, normalized difference water index, and brightness index) computed from the original data. The original geographic coordinates of each pixel were not considered in the experiments. The dataset was officially divided into train and test sets; we re-integrated it into a single dataset of 99,687 pixels distributed over 9 land cover classes; the sample statistics are shown in Table 1. The ground truth label of each pixel is referenced from the 2012 Corine Land Cover map (https://www.eea.europa.eu/publications/COR0-landcover) and the 2014 local farmers' graphical land parcel registration results.
(2) TimeSen2Crop [57]: The data were collected from 15 Sentinel-2 tiles covering Austria, shown in Figure 3, between September 2017 and September 2018, with cloud coverage of less than 80%, and were atmospherically corrected using the radiative transfer model MODTRAN [58]. The number of acquisition dates varies per tile, ranging from 23 to 38. The dataset has 16 crop types and 1.2 million pixels in total; the sample statistics are shown in Table 2. Each pixel has 9 Sentinel-2 bands (B2, B3, B4, B5, B6, B7, B8A, B11, B12) and 1 non-spectral band indicating the condition of the pixel, which was not considered in the experiments. The 20-m bands (B5, B6, B7, B8A, B11, B12) were resampled to 10 m, so the final spatial resolution for each pixel is 10 m. The ground truth label of each pixel was extracted from the publicly available Austrian crop type map (https://www.data.gv.at/katalog/dataset/e21a731f).
3.2. Competing Methods and Classification Architecture
1. TempCNN: TempCNN [32] has three vanilla convolutional layers that apply convolutions along the temporal dimension with kernels of size 3. We added the same channel enhancement layer as in GL-TAE before the convolutions. TempCNN then flattens the time series and reduces its dimensionality with a dense layer. However, this flatten-and-dense design cannot handle time series of variable temporal length, so we replaced it with a global (temporal) average pooling layer for the experiments on the TimeSen2Crop dataset.
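The pooling substitution described above can be shown in isolation. This sketch (names and feature width assumed) illustrates why global temporal average pooling handles the variable-length TimeSen2Crop series (23 to 38 dates) where a flatten-and-dense layer cannot: the output size no longer depends on T.

```python
import numpy as np

def temporal_pool(features):
    """Global (temporal) average pooling: collapse a T x E feature map
    to a single E-vector, so the classifier head is independent of T."""
    return features.mean(axis=0)

rng = np.random.default_rng(4)
short, long = rng.normal(size=(23, 64)), rng.normal(size=(38, 64))
print(temporal_pool(short).shape, temporal_pool(long).shape)   # (64,) (64,)
```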
2. TCN: TCN [35], unlike TempCNN, has three causal convolutional layers with increasing dilation, which gives TCN a larger receptive field than TempCNN along the temporal dimension.
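The receptive-field claim can be made concrete with a small calculation. The doubling dilation schedule below is an assumed example (the paper's exact dilation values are not reproduced here), but the formula itself is standard for stacked dilated causal convolutions.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions:
    each layer adds (kernel_size - 1) * dilation time steps."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Assumed example: three k=3 layers, undilated vs. doubling dilations.
plain = receptive_field(3, [1, 1, 1])     # TempCNN-like stack
dilated = receptive_field(3, [1, 2, 4])   # TCN-like stack
print(plain, dilated)   # 7 15: the dilated stack sees a wider temporal window
```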
3. LSTM: We adopted a one-layer LSTM with an adjustable hidden size, took the last hidden state, and transformed it with a dense layer. A channel enhancement layer was also added before the LSTM layer.
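The last-hidden-state readout can be sketched with a minimal single-layer LSTM. This is an illustrative NumPy cell, not the paper's exact layer; the hidden size and random weights are assumptions, and the dense classification head is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(X, Wx, Wh, b, H):
    """Run a one-layer LSTM over a T x E series and return only the
    last hidden state, which a dense layer would then map to classes."""
    h, c = np.zeros(H), np.zeros(H)
    for x in X:
        z = Wx @ x + Wh @ h + b                            # all four gates at once
        i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
        g = np.tanh(z[3 * H:])
        c = f * c + i * g                                  # cell-state update
        h = o * np.tanh(c)                                 # hidden-state update
    return h

rng = np.random.default_rng(5)
T, E, H = 23, 10, 16
X = rng.normal(size=(T, E))
Wx, Wh, b = rng.normal(size=(4 * H, E)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h_last = lstm_last_hidden(X, Wx, Wh, b, H)
print(h_last.shape)   # (16,): one fixed-size vector regardless of T
```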
4. Transformer: The architecture in [50] was adopted. In the original transformer, a query–key–value triplet is computed for each time step in the time series, and the attention mask is obtained by calculating the similarity between queries and keys. The encoded result is defined as the sum of the values weighted by the attention mask. Reference [50] took the original transformer encoder, with positional encoding, multi-head attention, and feedforward networks, and used a global max pooling operation over the temporal dimension to reduce the dimensionality of the output.
5. TAE: Reference [51] removed the feedforward networks and defined the master query as the mean of the queries. The master query is then multiplied with the keys to produce a single attention mask for weighing the input time series.
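The TAE readout described above differs from full self-attention only in how the query is formed, as this sketch shows (an illustration in the spirit of [51], with assumed dimensions, not the reference implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tae_single_mask(Q, K, V):
    """Collapse the per-step queries into one master query (their mean),
    giving a single T-length attention mask that weighs the whole
    series into one encoded vector."""
    q_master = Q.mean(axis=0, keepdims=True)                 # 1 x d master query
    mask = softmax(q_master @ K.T / np.sqrt(K.shape[-1]))    # 1 x T attention mask
    return mask @ V                                          # 1 x d encoding

rng = np.random.default_rng(6)
T, d = 23, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
enc = tae_single_mask(Q, K, V)
print(enc.shape)   # (1, 8)
```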
3.3. Implementation Details
3.4. Classification Results
3.5. Comprehensive Comparison with Different Hyperparameter Settings
4. Discussion
4.1. Importance Comparison of Global and Local Attention
4.2. Necessity of Channel Dimension Split
4.3. Impact of Different Values of Hyperparameters
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Langley, S.K.; Cheshire, H.M.; Humes, K.S. A comparison of single date and multitemporal satellite image classifications in a semi-arid grassland. J. Arid. Environ. 2001, 49, 401–411.
- Franklin, S.E.; Ahmed, O.S.; Wulder, M.A.; White, J.C.; Hermosilla, T.; Coops, N.C. Large area mapping of annual land cover dynamics using multitemporal change detection and classification of Landsat time series data. Can. J. Remote Sens. 2015, 41, 293–314.
- Zheng, B.; Myint, S.W.; Thenkabail, P.S.; Aggarwal, R.M. A support vector machine to identify irrigated crop types using time-series Landsat NDVI data. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 103–112.
- Pelletier, C.; Valero, S.; Inglada, J.; Champion, N.; Dedieu, G. Assessing the robustness of Random Forests to map land cover with high resolution satellite image time series over large areas. Remote Sens. Environ. 2016, 187, 156–168.
- Guo, Y.; Jia, X.; Paull, D. Effective sequential classifier training for SVM-based multitemporal remote sensing image classification. IEEE Trans. Image Process. 2018, 27, 3036–3048.
- Ouzemou, J.e.; El Harti, A.; Lhissou, R.; El Moujahid, A.; Bouch, N.; El Ouazzani, R.; Bachaoui, E.M.; El Ghmari, A. Crop type mapping from pansharpened Landsat 8 NDVI data: A case of a highly fragmented and intensive agricultural system. Remote Sens. Appl. Soc. Environ. 2018, 11, 94–103.
- Kang, J.; Zhang, H.; Yang, H.; Zhang, L. Support vector machine classification of crop lands using Sentinel-2 imagery. In Proceedings of the 2018 7th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), Hangzhou, China, 6–9 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
- Gbodjo, Y.J.E.; Ienco, D.; Leroux, L. Toward spatio–spectral analysis of Sentinel-2 time series data for land cover mapping. IEEE Geosci. Remote Sens. Lett. 2019, 17, 307–311.
- Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E. A multiscale random forest kernel for land cover classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2842–2852.
- Hao, P.; Zhan, Y.; Wang, L.; Niu, Z.; Shakir, M. Feature selection of time series MODIS data for early crop classification using random forest: A case study in Kansas, USA. Remote Sens. 2015, 7, 5347–5369.
- Cai, Y.; Lin, H.; Zhang, M. Mapping paddy rice by the object-based random forest method using time series Sentinel-1/Sentinel-2 data. Adv. Space Res. 2019, 64, 2233–2244.
- Berndt, D.J.; Clifford, J. Using Dynamic Time Warping to Find Patterns in Time Series; KDD Workshop: Seattle, WA, USA, 1994; Volume 10, pp. 359–370.
- Jiang, W. Time series classification: Nearest neighbor versus deep learning models. SN Appl. Sci. 2020, 2, 1–17.
- Petitjean, F.; Inglada, J.; Gançarski, P. Satellite image time series analysis under time warping. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3081–3095.
- Zhang, Z.; Tang, P.; Huo, L.; Zhou, Z. MODIS NDVI time series clustering under dynamic time warping. Int. J. Wavelets Multiresolution Inf. Process. 2014, 12, 1461011.
- Zhao, Y.; Lin, L.; Lu, W.; Meng, Y. Landsat time series clustering under modified Dynamic Time Warping. In Proceedings of the 2016 4th International Workshop on Earth Observation and Remote Sensing Applications (EORSA), Guangzhou, China, 4–6 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 62–66.
- Maus, V.; Câmara, G.; Cartaxo, R.; Sanchez, A.; Ramos, F.M.; De Queiroz, G.R. A time-weighted dynamic time warping method for land-use and land-cover mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3729–3739.
- Belgiu, M.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ. 2018, 204, 509–523.
- Maus, V.; Câmara, G.; Appel, M.; Pebesma, E. dtwSat: Time-weighted dynamic time warping for satellite image time series analysis in R. J. Stat. Softw. 2019, 88, 1–31.
- Belgiu, M.; Zhou, Y.; Marshall, M.; Stein, A. Dynamic time warping for crops mapping. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 947–951.
- Ienco, D.; Gaetano, R.; Dupaquier, C.; Maurel, P. Land cover classification via multitemporal spatial data by deep recurrent neural networks. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1685–1689.
- Rußwurm, M.; Korner, M. Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 June 2017; pp. 11–19.
- Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211.
- Mikolov, T.; Karafiát, M.; Burget, L.; Cernockỳ, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the Interspeech, Makuhari, Japan, 26–30 September 2010; Volume 2, pp. 1045–1048.
- Sharma, A.; Liu, X.; Yang, X. Land cover classification from multi-temporal, multi-spectral remotely sensed imagery using patch-based recurrent neural networks. Neural Netw. 2018, 105, 346–355.
- Minh, D.H.T.; Ienco, D.; Gaetano, R.; Lalande, N.; Ndikumana, E.; Osman, F.; Maurel, P. Deep recurrent neural networks for winter vegetation quality mapping via multitemporal SAR Sentinel-1. IEEE Geosci. Remote Sens. Lett. 2018, 15, 464–468.
- Ienco, D.; Gaetano, R.; Interdonato, R.; Ose, K.; Minh, D.H.T. Combining Sentinel-1 and Sentinel-2 time series via RNN for object-based land cover classification. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4881–4884.
- Yin, R.; He, G.; Wang, G.; Long, T.; Li, H.; Zhou, D.; Gong, C. Automatic Framework of Mapping Impervious Surface Growth With Long-Term Landsat Imagery Based on Temporal Deep Learning Model. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
- Kwak, G.H.; Park, C.W.; Ahn, H.Y.; Na, S.I.; Lee, K.D.; Park, N.W. Potential of bidirectional long short-term memory networks for crop classification with multitemporal remote sensing images. Korean J. Remote. Sens. 2020, 36, 515–525.
- Crisóstomo de Castro Filho, H.; Abílio de Carvalho Júnior, O.; Ferreira de Carvalho, O.L.; Pozzobon de Bem, P.; dos Santos de Moura, R.; Olino de Albuquerque, A.; Rosa Silva, C.; Guimarães Ferreira, P.H.; Fontes Guimarães, R.; Trancoso Gomes, R.A. Rice crop detection using LSTM, Bi-LSTM, and machine learning models from Sentinel-1 time series. Remote Sens. 2020, 12, 2655.
- Bakhti, K.; Arabi, M.E.A.; Chaib, S.; Djerriri, K.; Karoui, M.S.; Boumaraf, S. Bi-Directional LSTM Model For Classification Of Vegetation From Satellite Time Series. In Proceedings of the 2020 Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), Tunis, Tunisia, 9–11 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 160–163.
- Pelletier, C.; Webb, G.I.; Petitjean, F. Temporal convolutional neural network for the classification of satellite image time series. Remote Sens. 2019, 11, 523.
- Ienco, D.; Gbodjo, Y.J.E.; Gaetano, R.; Interdonato, R. Weakly supervised learning for land cover mapping of satellite image time series via attention-based CNN. IEEE Access 2020, 8, 179547–179560.
- Račič, M.; Oštir, K.; Peressutti, D.; Zupanc, A.; Čehovin Zajc, L. Application of temporal convolutional neural network for the classification of crops on Sentinel-2 time series. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 1337–1342.
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
- Brock, J.; Abdallah, Z.S. Investigating Temporal Convolutional Neural Networks for Satellite Image Time Series Classification. arXiv 2022, arXiv:2204.08461.
- Zhao, B.; Lu, H.; Chen, S.; Liu, J.; Wu, D. Convolutional neural networks for time series classification. J. Syst. Eng. Electron. 2017, 28, 162–169.
- Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Time series classification using multi-channels deep convolutional neural networks. In Proceedings of the International Conference on Web-Age Information Management, Macau, China, 16–18 June 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 298–310.
- Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Front. Comput. Sci. 2016, 10, 96–112.
- Ismail Fawaz, H.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. 2018. Available online: https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf (accessed on 15 March 2022).
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; ACM: New York, NY, USA, 2021; pp. 10347–10357.
- Yuan, Y.; Lin, L. Self-supervised pretraining of transformers for satellite image time series classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 474–487.
- Rußwurm, M.; Körner, M. Self-attention for raw optical satellite time series classification. ISPRS J. Photogramm. Remote. Sens. 2020, 169, 421–435.
- Garnot, V.S.F.; Landrieu, L.; Giordano, S.; Chehata, N. Satellite image time series classification with pixel-set encoders and temporal self-attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12325–12334.
- Garnot, V.S.F.; Landrieu, L. Lightweight temporal self-attention for classifying satellite images time series. In Proceedings of the International Workshop on Advanced Analytics and Learning on Temporal Data, Ghent, Belgium, 18 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 171–181.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Garnot, V.S.F.; Landrieu, L. Panoptic segmentation of satellite image time series with convolutional temporal attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4872–4881.
- Interdonato, R.; Ienco, D.; Gaetano, R.; Ose, K. DuPLO: A DUal view Point deep Learning architecture for time series classificatiOn. ISPRS J. Photogramm. Remote Sens. 2019, 149, 91–104.
- Wu, F.; Fan, A.; Baevski, A.; Dauphin, Y.N.; Auli, M. Pay less attention with lightweight and dynamic convolutions. arXiv 2019, arXiv:1901.10430.
- Weikmann, G.; Paris, C.; Bruzzone, L. TimeSen2Crop: A million labeled samples dataset of Sentinel-2 image time series for crop-type classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4699–4708.
- Berk, A.; Anderson, G.P.; Bernstein, L.S.; Acharya, P.K.; Dothe, H.; Matthew, M.W.; Adler-Golden, S.M.; Chetwynd, J.H., Jr.; Richtsmeier, S.C.; Pukall, B.; et al. MODTRAN4 radiative transfer modeling for atmospheric correction. In Proceedings of the Optical Spectroscopic Techniques and Instrumentation for Atmospheric and Space Research III, Denver, CO, USA, 19–21 July 1999; SPIE: Bellingham, WA, USA, 1999; Volume 3756, pp. 348–353.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
Table 1. Sample statistics of the TiSeLaC dataset.
Class Name | Number of Samples | Proportion (%) |
---|---|---|
Urban areas | 20,000 | 20.06 |
Other built-up surfaces | 3883 | 3.89 |
Forests | 20,000 | 20.06 |
Sparse vegetation | 19,398 | 19.46 |
Rocks and bare soil | 15,530 | 15.58 |
Grassland | 6817 | 6.84 |
Sugarcane crops | 9187 | 9.22 |
Other crops | 1754 | 1.76 |
Water | 3118 | 3.13 |
Total | 99,687 | 100.0 |
Table 2. Sample statistics of the TimeSen2Crop dataset.
Class Name | Number of Samples | Proportion (%) |
---|---|---|
Legumes | 7951 | 0.66 |
Grassland | 283,263 | 23.37 |
Maize | 164,316 | 13.55 |
Potato | 30,678 | 2.53 |
Sunflower | 22,787 | 1.88 |
Soy | 70,884 | 5.85 |
Winter Barley | 94,061 | 7.76 |
Winter Caraway | 1472 | 0.12 |
Rye | 53,694 | 4.42 |
Rapeseed | 41,901 | 3.46 |
Beet | 34,064 | 2.81 |
Spring Cereals | 85,353 | 7.04 |
Winter Wheat | 132,327 | 10.92 |
Winter Triticale | 66,448 | 5.48 |
Permanent Plantation | 46,312 | 3.82 |
Other Crops | 76,713 | 6.33 |
Total | 1,212,224 | 100.0 |
Model | Candidates of Hyperparameters | Reference |
---|---|---|
GL-TAE | ||
LTAE | [52] | |
TAE | [51] | |
Transformer | [50] | |
TempCNN | [32] | |
TCN | [35] | |
LSTM | [21] |
(a) TiSeLaC | |||||||
---|---|---|---|---|---|---|---|
Class | GL-TAE | LTAE | TAE | Transformer | TempCNN | TCN | LSTM |
N.params | 163 K | 161 K | 164 K | 165 K | 156 K | 160 K | 156 K |
OA(%) | |||||||
mIoU(%) | |||||||
Urban areas | |||||||
Other built-up surfaces | |||||||
Forest | |||||||
Sparse Vegetation | |||||||
Rocks and bare soil | |||||||
Grassland | |||||||
Sugarcane crops | |||||||
Other crops | |||||||
Water | |||||||
(b) TimeSen2Crop | |||||||
Class | GL-TAE | LTAE | TAE | Transformer | TempCNN | TCN | LSTM |
N.params | 139 K | 137 K | 142 K | 142 K | 137 K | 139 K | 137 K |
OA(%) | |||||||
mIoU(%) | |||||||
Legumes | |||||||
Grassland | |||||||
Maize | |||||||
Potato | |||||||
Sunflower | |||||||
Soy | |||||||
Winter Barley | |||||||
Winter Caraway | |||||||
Rye | |||||||
Rapeseed | |||||||
Beet | |||||||
Spring Cereals | |||||||
Winter Wheat | |||||||
Winter Triticale | |||||||
Permanent Plantation | |||||||
Other Crops |
(a) | (b) | ||||
---|---|---|---|---|---|
N.params | mIoU | N.params | mIoU | ||
256 | 82 K | 8 | 113 K | ||
512 | 163 K | 16 | 130 K | ||
1024 | 488 K | 32 | 163 K | ||
2048 | 2.8 M | 64 | 229 K | ||
(c) | (d) | ||||
N.params | mIoU | N.params | mIoU | ||
4 | 130 K | 3 | 163 K | ||
8 | 163 K | 5 | 163 K | ||
16 | 229 K | 7 | 163 K | ||
32 | 361 K | 9 | 163 K |
(a) | (b) | ||||
---|---|---|---|---|---|
N.params | mIoU | N.params | mIoU | ||
256 | 70 K | 8 | 89 K | ||
512 | 139 K | 16 | 106 K | ||
1024 | 481 K | 32 | 139 K | ||
2048 | 2.7 M | 64 | 205 K | ||
(c) | (d) | ||||
N.params | mIoU | N.params | mIoU | ||
4 | 106 K | 3 | 139 K | ||
8 | 139 K | 5 | 139 K | ||
16 | 205 K | 7 | 139 K | ||
32 | 337 K | 9 | 139 K |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Zhang, W.; Zhang, H.; Zhao, Z.; Tang, P.; Zhang, Z. Attention to Both Global and Local Features: A Novel Temporal Encoder for Satellite Image Time Series Classification. Remote Sens. 2023, 15, 618. https://doi.org/10.3390/rs15030618