This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Leveraging Sentinel-2 Data and Machine Learning for Drought Detection in India: The Process of Ground Truth Construction and a Case Study

by Shubham Subhankar Sharma, Jit Mukherjee and Fabio Dell’Acqua *
Department of Electrical, Computer & Biomedical Engineering, University of Pavia, 27100 Pavia, Italy
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(18), 3159; https://doi.org/10.3390/rs17183159
Submission received: 27 June 2025 / Revised: 25 August 2025 / Accepted: 8 September 2025 / Published: 11 September 2025

Abstract

Droughts significantly impact agriculture, water resources, and ecosystems, and their timely detection is essential for implementing effective mitigation strategies. This study explores the use of multispectral Sentinel-2 remote sensing indices and machine learning techniques to detect drought conditions in three distinct regions of India, namely Jodhpur, Amravati, and Thanjavur, during the Rabi season (October–April). Twelve remote sensing indices were studied to assess different aspects of vegetation health, soil moisture, and water stress, and their possible joint use and influence as indicators of regional drought events. Reference data used to define drought conditions in each region were primarily sourced from official government drought declarations and from regional and national news publications, which provide seasonal maps of drought conditions across the country. Based on this information, a district vs. year (3 × 10) ground truth table was created, indicating the presence or absence of drought (Drought/No Drought) for each region across the ten-year period. Using this ground truth table, we extended the remote sensing dataset by adding a binary drought label to each observation: 1 for “Drought” and 0 for “No Drought”. The dataset is organized by year (2016–2025) in a two-dimensional format, with indices as columns and observations as rows. Each observation represents a single measurement of the remote sensing indices. This enriched dataset serves as the foundation for training and evaluating machine learning models aimed at classifying drought conditions based on spectral information. The resulting dataset was used to predict drought events with several machine learning models: Random Forest, XGBoost, Bagging Classifier, and Gradient Boosting. Among these models, XGBoost achieved the highest accuracy (84.80%), followed closely by the Bagging Classifier (83.98%) and Random Forest (82.98%). In terms of precision, the Bagging Classifier and Random Forest performed comparably (82.31% and 81.45%, respectively), while XGBoost achieved a precision of 81.28%. We then applied a seasonal majority voting strategy, assigning a final drought label to each region and Rabi season based on the majority of the predicted monthly labels. Using this method, XGBoost and the Bagging Classifier achieved 96.67% accuracy, precision, and recall, while Random Forest and Gradient Boosting reached 90% and 83.33%, respectively, across all metrics. SHapley Additive exPlanations (SHAP) analysis revealed that the Normalized Multi-band Drought Index (NMDI) and Day of Season (DOS) consistently emerged as the most influential features in determining model predictions. This finding is supported by the Borda Count and Weighted Sum analyses, which ranked NMDI and DOS as the top features across all models. Additionally, the Red-edge Chlorophyll Index (RECI), Normalized Difference Water Index (NDWI), Normalized Difference Moisture Index (NDMI), and Ratio Drought Index (RDI) were identified as important features contributing to model performance. These features help reveal the underlying spatiotemporal dynamics of drought indicators, offering interpretable insights into model decisions. To evaluate the impact of feature selection, we further conducted a feature ablation study, training each model on different combinations of the top-ranked features: Top 1, Top 2, Top 3, Top 4, and Top 5. The performance of each model was assessed in terms of accuracy, precision, and recall. XGBoost demonstrated the best overall performance, especially when using the Top 5 features.
Keywords: copernicus; agricultural applications; Sentinel-2; SHAP; drought detection; Borda Count; XGBoost; India; machine learning; remote sensing indices; bagging classifier
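
The two aggregation steps mentioned in the abstract, seasonal majority voting over monthly predictions and Borda Count ranking of features across models, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input layout, the tie-breaking rule (a tie counts as drought), and the example predictions and rankings are all assumptions made for illustration and are not results from the study.

```python
from collections import defaultdict


def seasonal_majority_vote(monthly_preds):
    """Collapse monthly labels into one label per (region, season).

    monthly_preds: iterable of (region, season, label), label in {0, 1}.
    Assumption: a tie is resolved as drought (1); the abstract does not
    state how ties are handled.
    """
    groups = defaultdict(list)
    for region, season, label in monthly_preds:
        groups[(region, season)].append(label)
    # Label is 1 when at least half of the monthly predictions are 1.
    return {key: int(2 * sum(labels) >= len(labels))
            for key, labels in groups.items()}


def borda_count(rankings):
    """Aggregate per-model feature rankings with a standard Borda Count.

    rankings: list of lists, each ordered from most to least important.
    In a ranking of n features, the top feature earns n - 1 points and
    the last earns 0. Returns (feature, score) pairs, best first.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Sort by descending score, then by name for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical monthly predictions (illustrative only).
preds = [
    ("Jodhpur", "2018-19", 1), ("Jodhpur", "2018-19", 1),
    ("Jodhpur", "2018-19", 0),
    ("Thanjavur", "2018-19", 0), ("Thanjavur", "2018-19", 0),
    ("Thanjavur", "2018-19", 1),
]
season_labels = seasonal_majority_vote(preds)

# Hypothetical per-model feature rankings (illustrative only).
model_rankings = [
    ["NMDI", "DOS", "RECI"],
    ["DOS", "NMDI", "RECI"],
    ["NMDI", "DOS", "RECI"],
]
feature_order = borda_count(model_rankings)
```

In practice the per-model rankings would come from an importance measure such as mean absolute SHAP values, and the season-level labels would then be compared against the district vs. year ground truth table.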
