Abstract
Over the past decade, both natural and human-induced disasters have become markedly more frequent and more damaging, underscoring the urgent need for effective and timely relief operations. Disaster response depends on allocating resources to the right locations and disaster types at minimal cost and delay. During such events, however, large volumes of unverified and rapidly spreading information, especially on social media, often complicate situational awareness and decision-making. Extracting actionable insights from social media platforms and accurately classifying disaster-related content has therefore become a critical research challenge. Machine Learning (ML) approaches have shown strong potential for categorizing disaster-related tweets, yet model accuracy still varies substantially across disaster types and regional contexts, suggesting that universal models may overlook linguistic and cultural nuances. This paper investigates the categorization and sub-categorization of natural disaster tweets using a labeled dataset of over 32,000 samples. After comprehensive preprocessing, Logistic Regression and Random Forest classifiers were trained and evaluated on predicting disaster categories and sub-categories. In addition, a country-specific prediction framework was implemented to assess how regional and cultural variations influence model performance. The results show strong overall classification accuracy but also reveal marked differences across countries, emphasizing the importance of context-aware, culturally adaptive ML approaches for reliable disaster information management.
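The cross-country differences mentioned above are typically quantified by scoring a trained classifier separately on each country's tweets. The sketch below shows one minimal way to do this, assuming predictions are already available as hypothetical `(country, true_label, predicted_label)` tuples; the function name `accuracy_by_country` and the toy records are illustrative only and are not the paper's code, data, or results.

```python
from collections import defaultdict


def accuracy_by_country(records):
    """Return {country: accuracy} for (country, true_label, predicted_label) tuples.

    Hypothetical helper: it only scores existing predictions per country and
    does not train or select any model.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for country, true_label, predicted_label in records:
        total[country] += 1
        if true_label == predicted_label:
            correct[country] += 1
    # Accuracy = fraction of correctly predicted tweets within each country.
    return {country: correct[country] / total[country] for country in total}


# Hypothetical toy predictions, not results from the paper.
records = [
    ("India", "flood", "flood"),
    ("India", "earthquake", "flood"),
    ("Japan", "earthquake", "earthquake"),
    ("Japan", "typhoon", "typhoon"),
]
print(accuracy_by_country(records))  # {'India': 0.5, 'Japan': 1.0}
```

Comparing the per-country values against the overall accuracy makes regional gaps visible that a single aggregate score would hide.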