Abstract
Droughts significantly impact agriculture, water resources, and ecosystems, and their timely detection is essential for implementing effective mitigation strategies. This study explores the use of multispectral Sentinel-2 remote sensing indices and machine learning techniques to detect drought conditions in three distinct regions of India, namely Jodhpur, Amravati, and Thanjavur, during the Rabi season (October–April). Twelve remote sensing indices were studied to assess different aspects of vegetation health, soil moisture, and water stress, as well as their possible joint use and influence as indicators of regional drought events. Reference data defining drought conditions in each region were sourced primarily from official government drought declarations and from regional and national news publications, which provide seasonal maps of drought conditions across the country. Based on this information, a district-by-year (3 × 10) ground truth table was created, indicating the presence or absence of drought (Drought/No Drought) for each region across the ten-year period. Using this table, we extended the remote sensing dataset by adding a binary drought label to each observation: 1 for “Drought” and 0 for “No Drought”. The dataset is organized by year (2016–2025) in a two-dimensional format, with indices as columns and observations as rows; each observation represents a single measurement of the remote sensing indices. This labeled dataset serves as the foundation for training and evaluating machine learning models that classify drought conditions from spectral information. We used it to predict drought events with several machine learning models, including Random Forest, XGBoost, Bagging Classifier, and Gradient Boosting. Among the models, XGBoost achieved the highest accuracy (84.80%), followed closely by the Bagging Classifier (83.98%) and Random Forest (82.98%).
In terms of precision, the Bagging Classifier and Random Forest performed comparably (82.31% and 81.45%, respectively), while XGBoost achieved 81.28%. We then applied a seasonal majority voting strategy, assigning a final drought label to each region and Rabi season based on the majority of the predicted monthly labels. Using this method, XGBoost and Bagging Classifier achieved accuracy, precision, and recall, while Random Forest and Gradient Boosting reached and , respectively, across all metrics. SHapley Additive exPlanations (SHAP) analysis revealed that the Normalized Multi-band Drought Index (NMDI) and Day of Season (DOS) were consistently the most influential features in determining model predictions. This finding is supported by Borda Count and Weighted Sum analyses, which ranked NMDI and DOS as the top features across all models. Additionally, the Red-edge Chlorophyll Index (RECI), Normalized Difference Water Index (NDWI), Normalized Difference Moisture Index (NDMI), and Ratio Drought Index (RDI) were identified as important features contributing to model performance. These features help reveal the underlying spatiotemporal dynamics of drought indicators, offering interpretable insights into model decisions. To evaluate the impact of feature selection, we further conducted a feature ablation study in which each model was trained on different combinations of the top-ranked features: Top 1, Top 2, Top 3, Top 4, and Top 5. The performance of each model was assessed in terms of accuracy, precision, and recall. XGBoost demonstrated the best overall performance, especially when using the Top 5 features.
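The seasonal majority voting step described above can be illustrated with a minimal sketch. It is not the study's implementation: the function name `seasonal_majority_vote`, the `(region, season, label)` tuple layout, the sample predictions, and the rule that ties resolve to "Drought" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def seasonal_majority_vote(monthly_predictions):
    """Collapse monthly 0/1 predictions into one label per (region, season).

    monthly_predictions: iterable of (region, season, label) tuples,
    where label is 1 for "Drought" and 0 for "No Drought".
    """
    votes = defaultdict(Counter)
    for region, season, label in monthly_predictions:
        votes[(region, season)][label] += 1

    # Assumption: a tie is resolved as "Drought" (1); the source does not
    # state how ties are handled.
    return {
        key: 1 if counts[1] >= counts[0] else 0
        for key, counts in votes.items()
    }


# Hypothetical monthly predictions, not data from the study.
predictions = [
    ("Jodhpur", "2018-19", 1),
    ("Jodhpur", "2018-19", 1),
    ("Jodhpur", "2018-19", 0),
    ("Thanjavur", "2018-19", 0),
    ("Thanjavur", "2018-19", 0),
    ("Thanjavur", "2018-19", 1),
]

seasonal_labels = seasonal_majority_vote(predictions)
for (region, season), label in sorted(seasonal_labels.items()):
    print(region, season, "Drought" if label == 1 else "No Drought")
```

Aggregating to one label per region and season makes the predictions directly comparable with a district-by-year ground truth table, which holds one Drought/No Drought entry per region and year.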