AirCalypse: A Case Study of Temporal and User-Behaviour Contrasts in Social Media for Urban Air Pollution Monitoring in New Delhi Before and During COVID-19

Pramanik, Prithviraj; Mondal, Tamal; Arosh, Sirshendu; Saha, Mousumi

doi:10.3390/su17198924

by Prithviraj Pramanik ¹,*, Tamal Mondal ², Sirshendu Arosh ² and Mousumi Saha

¹ Department of Computer Science & Engineering, National Institute of Technology Durgapur, Durgapur 713209, India

² Symbiosis Centre for Information Technology, Symbiosis International (Deemed University), Pune 411057, India

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(19), 8924; https://doi.org/10.3390/su17198924

Submission received: 25 August 2025 / Revised: 1 October 2025 / Accepted: 3 October 2025 / Published: 8 October 2025

(This article belongs to the Special Issue Air Pollution and Sustainability)

Abstract

Air pollution has become a significant concern for human health, especially in developing countries. Among primary pollutants, particulate matter 2.5 (${PM}_{2.5}$), which refers to airborne particles with a diameter of 2.5 micrometres or less, has become a widely used measure for monitoring air quality globally. The standard method uses Federal Reference Grade sensors to measure air quality. However, these are cost-prohibitive, so the popular alternative is low-cost (LC) air quality sensors. Even LC air quality monitors do not cover many areas, especially across the Global South. On the other hand, the ubiquitous use of online social media (OSM) has led to its evolution into a medium for participatory sensing. While it does not function as a physical sensor, it can be a proxy indicator of public perception on the topic under study. OSM platforms such as Twitter/X and Reddit have already demonstrated their value in understanding human perception across various domains, including air quality monitoring. This study focuses on understanding air pollution in a resource-constrained setting by examining how community perception on social media can complement traditional monitoring. We leverage metadata readily available from social media user data to find patterns associated with air quality fluctuations before and during the pandemic. We use the US Embassy ${PM}_{2.5}$ data as the baseline measurement. We empirically analyse the variations in quantitative and intent-based community perception across seasonal changes and the pandemic outbreak under varying air quality. We compare the baseline against temporal and user-specific attributes of Twitter/X tweets, such as daily tweet frequency, tweet lags 1–5, user followers, user verified status, and user list memberships, across two timelines: pre-COVID-19 (20 March 2019–29 February 2020) and COVID-19 (1 March 2020–20 September 2020). Our analysis examines both the quantitative and the intent-based community engagement, highlighting the significance of features such as user authenticity, tweet recurrence rates, and intensity of participation. Furthermore, we show how behavioural patterns in the online discussions diverged across the two periods, reflecting broader shifts in air pollution levels and public attention. This study empirically demonstrates the significance of X/Twitter metadata, beyond standard tweet content, and provides additional features for modelling and understanding air quality in developing countries.

Keywords:

participatory sensing; air quality; social media; association; Social Network Features; COVID-19

1. Introduction

“You can’t manage what you don’t measure”—Peter Drucker [1]

The Global Industrial Revolution has significantly improved the quality of life and the development of the global economy [2]. But at what cost? The economic and quality of life improvements have been accompanied by an exponential increase in air pollutant emissions, which directly impact human health [3]. Global air pollution causes an estimated 8.1 trillion US dollars in losses to the global economy, approximately 6.1% of the global Gross Domestic Product (GDP) [4]. Poor air quality significantly affects human health, and given the insidious effect of air pollution on health, there is a severe need for awareness and precautionary measures against it [5].

This situation is particularly exacerbated in developing countries. Due to escalated air pollution, Southeast Asian and African countries face the highest burden, and poor air quality was the second-highest risk factor for death in 2021 [6]. The issue of poor air quality is clearly visible in India, where some 13 of the top 20 polluted cities are located, per the World Air Quality Report 2024 [6]. According to the Global Burden of Disease (GBD) Assessment 2017 [7], India is one of the leading nations where increasing air pollution is a significant health risk factor, causing millions of unnatural deaths and health hazards in urban regions like Delhi, Kanpur, Kolkata, Kochi, and suburban areas like Korba (Chhattisgarh) and Ghaziabad (outside of Delhi).

Air pollution in India’s leading cities is primarily caused by particulate matter (PM), particularly the primary pollutant ${PM}_{2.5}$. In New Delhi, it is reported that improving air quality to the WHO standard can add 7.8 years to residents’ life expectancy [8,9]. Our decision to focus on ${PM}_{2.5}$ was guided by both data availability and public health relevance for later downstream tasks. Among ${PM}_{1.0}$, ${PM}_{2.5}$ and ${PM}_{10}$, the ultrafine ${PM}_{1.0}$ particles are of emerging interest, but they are not routinely monitored or widely reported in Delhi or across India. The lack of consistent datasets at this scale makes meaningful correlation analysis infeasible within the scope of our study. For ${PM}_{10}$, the coarse fraction (2.5–10 µm) is typically less directly linked to severe cardiopulmonary and mortality outcomes than ${PM}_{2.5}$. Studies, including the WHO Global Air Quality Guidelines (2021) [5], identify ${PM}_{2.5}$ as the dominant pollutant of concern because of its ability to penetrate deeply into the alveolar regions of the lungs and its strong association with health burden indicators. Thus, ${PM}_{2.5}$ is selected as the major pollutant for our study.

Monitoring air pollutants is crucial for identifying hotspots and shaping effective policies. The Central Pollution Control Board of India (CPCB) (https://cpcb.nic.in/, accessed on 19 August 2025) has established Continuous Ambient Air Quality Monitoring Stations (CAAQMS) in various cities, including Delhi, to assess air quality and regulate pollution levels according to WHO [10] and NAAQS [11] standards. However, air quality monitoring (AQM) stations only cover 12% of the country, according to CPCB’s rule of thumb [12]. The limited deployment of CAAQMS sites makes it difficult to collect air quality data. Additionally, each CAAQMS station incurs significant installation and annual maintenance expenditures [13]. Alternatives to federal reference monitors also exist, including low-cost air quality sensing [14] and satellite-based sensing [15,16], along with physical [17] and hybrid machine learning models [18]. With data-driven machine learning techniques, novel data signals can yield better pattern recognition [19] and better interpretability, but gaps remain.

In parallel, the ubiquitous use of online social media (OSM) such as Twitter/X, Reddit, and Sina Weibo has strengthened its role in participatory sensing. While OSM does not replace physical monitors, it functions as a proxy indicator of public perception. Studies comparing multiple social platforms (e.g., Twitter vs. Reddit) have shown that user responses can differ in timing, volume, and content, underscoring the potential value of cross-platform perspectives in sensing public sentiment [20]. Recent cross-platform studies also show that user perceptions can vary depending on the medium. For example, Ref. [21] compared public discourse on electric vehicles across Reddit and Twitter, highlighting differences in demographic representation and discussion focus. Urban populations increasingly use platforms like Twitter/X and Sina Weibo to express their opinions on environmental issues. Recent studies have shown that social media data can capture community views on pollutants (especially ${PM}_{2.5}$), track spatiotemporal pollution dynamics using machine learning, and reveal how people respond to deteriorating air quality [22,23,24,25,26,27]. India’s growing user base on Twitter/X offers a unique opportunity to analyse perceptions and patterns.

In our prior work [13,28], we examined how Twitter/X data can identify pollution signals amid noise, including time drifts between public perception and ground truth, anomalous surges of discussion, and user-network influences such as retweets, followers, and favourites. Baseline ${PM}_{2.5}$ measurements were collected from the US Embassy reference-grade monitor at RK Puram, New Delhi (Figure 1). Building on prior findings [13,28], and after integrating this stream with the second dataset, the present study extends the analysis to characterise temporal and user-specific variation across the pre-COVID-19 and COVID-19 periods.

During COVID-19, Delhi experienced up to a 52% reduction in particulate matter concentrations [31], alongside shifts in online discussions that focused on “Government Initiatives,” “Pollution Control Behaviours,” and, to a lesser extent, “Awareness Campaigns” [32]. Under normal conditions, seasonal changes in ${PM}_{2.5}$ are mirrored in Twitter/X discussions, with frequent use of terms such as “severe,” “breathe,” “choke,” and “worse” [25,33].

Based on this context, our research is driven by two questions:

1. How do temporal and user-specific features of tweets (e.g., frequency, lags, and user characteristics) relate to ${PM}_{2.5}$ concentrations?
2. How do the behavioural patterns of these features differ across seasonal variation and black swan events such as the COVID-19 pandemic?

The term “AirCalypse,” combining “Air” and “Apocalypse,” highlights the urgency of the air pollution crisis and continues our earlier series [13,28]. In this study, we empirically explore the time-synchronised association of ${PM}_{2.5}$ with Twitter/X metadata in New Delhi over 18 months (March 2019–September 2020), spanning pre-COVID-19 and COVID-19 timelines. Unlike prior work focused primarily on content, this paper emphasises metadata (tweet frequency, lags, recurrence, and user authenticity) as features with potential utility for future data-driven air quality modelling.

2. Literature Survey

The current study examines the temporal relationship between Twitter-specific features, such as daily tweet frequency, tweet lags (1–5), user followers, verified, listed, and user favourites, and ${PM}_{2.5}$ levels as part of outdoor air quality monitoring. Given the efficacy of online social media (OSM) in air quality monitoring, popular platforms such as Twitter/X and Sina-Weibo (China’s largest micro-blogging service) serve as the primary data sources for reports, opinions, sentiments, and so on, disseminated by the community. In previous investigations, researchers used various mining techniques to interpret community perception traits on these OSM platforms with respect to outdoor air quality monitoring.

First, several works have focused on prediction and modelling approaches using meteorological and pollutant data. In [34,35], the authors exploit primary and secondary meteorological and pollutant data, suggesting a machine learning-based technique to improve air quality prediction. Similarly, Ref. [36] applied a Bi-LSTM deep learning model to assess air quality changes before, during, and after COVID-19 lockdowns across multiple cities in Henan, China. Their findings highlight how restrictions reduced ${PM}_{2.5}$, ${PM}_{10}$, ${NO}_{2}$, and ${SO}_{2}$, illustrating both the predictive capacity of deep learning and the unique natural experiment created by the pandemic. These authors used various statistical analyses, machine learning, and deep learning methods to estimate pollutant concentrations. After the most impacted pollutants were identified using statistical methods, machine learning and deep learning models were implemented, and deep learning models were found to be more effective in forecasting.

Next, we review studies that have examined Twitter/X as a tool for air quality monitoring. In [37], the authors investigate how ${PM}_{2.5}$ air pollution hazards are defined and appraised in a networked public sphere, utilising Twitter data, government documents, and media stories. The study blends Beck’s idea of risk society with digital media theories. The approach also emphasises a transnational but linguistically split public sphere and the influence of media. The article provides statistical analysis of the implications of research questions connected to the above features and maps their impacts. In contrast, Ref. [25] investigated the use of Twitter data for qualitative air pollution monitoring in Delhi between 2019 and 2020. Tweets were classified as indicating poor, good, or neutral air quality using a machine learning model that included embedding and BiLSTM layers. ${PM}_{2.5}$ concentration values were determined by analysing tweets and official CAAQMS data. The approach demonstrated remarkable accuracy (80–99%) under harsh air quality conditions. Its success depends on public awareness, Twitter engagement, and visible air quality improvements.

The authors in [26] investigate predicting urban air quality using Twitter data in cities without monitoring stations. A framework for gathering and geo-tagging relevant tweets is created, and transfer learning is utilised to apply ideas from monitored to unmonitored cities. Tests in UK and US cities reveal that Twitter-based estimations are accurate, although not as precise as spatial interpolation. However, combining the two systems enhances accuracy, particularly in remote towns. The study emphasises the utility of social media for air quality monitoring. Gradient tree boosting, a regression-based method, was applied here. The work in [27] proposes a framework to model and analyse how air quality messages spread via Twitter/X. It investigates both the flow of messages and the content supplied by users. The method employs natural language processing (NLP) tools and deep learning classification algorithms to categorise tweets from scratch. It uses both quantitative and qualitative methodologies within an interdisciplinary framework. The methodology is demonstrated through a specific air quality use case. Finally, the work in [33] analyses nearly two years of Twitter data (September 2015–May 2018) from Paris, London, and New Delhi to assess public responses to air quality issues. It was discovered that health concerns outweighed reactions to deteriorating air quality, particularly in New Delhi. The study identifies hashtags that best correlate with local pollution levels and demonstrates consistent public behaviour patterns across cities. Topic modelling identifies major themes such as health, policy, and event-specific pollution spikes. The study shows that Twitter can be useful for large-scale, real-time public opinion research on environmental health. Text classification was carried out using machine learning methods.

Beyond single-platform analyses, researchers have also examined cross-platform differences in perception. For example, Ref. [20] examined the 2019 Ridgecrest earthquake across Twitter and Reddit, showing how public response varies between platforms. Similarly, Ref. [21] compared discussions on electric vehicles across Reddit and Twitter, finding distinct patterns of demographic representation and discourse focus. These studies highlight the importance of understanding cross-platform variation when analysing public perceptions.

Besides Twitter, several studies have explored community perceptions posted on Sina-Weibo in the context of air quality monitoring. The authors in [22] investigate how to track air quality trends and public perception. Researchers assessed 93 million posts using keyword filtering and topic models to discover pollution-related information. Message volumes were compared to official pollution data from 74 cities to evaluate reliability. A qualitative analysis of sample posts indicated frequent discussions of health issues and behavioural responses. The findings emphasise Sina Weibo’s potential as a valuable real-time environmental health monitoring source in China. Basic statistical tools such as Pearson correlation and qualitative data were used in this study. Similarly, Ref. [23] uses geo-targeted Sina Weibo posts to track air quality trends in major Chinese cities. A social media analytics framework was created to investigate the relationship between Weibo postings and official Air Quality Index (AQI) data. Messages were divided into three categories: retweets, app-generated, and original individual posts. The original individual messages had the strongest association with AQI changes. The findings indicate that filtered social media data can track air quality changes over time. Gradient Tree Boosting (GTB) was used to solve the classification problems. In contrast, Ref. [24] suggests that social media analysis can be a cost-effective alternative to traditional environmental monitoring in China. An Environmental Quality Index (EQI) was created to gauge public opinion about air, water, and food quality. Text data from Sina Weibo and Baidu Tieba (2015–2016) were examined using a support vector machine (SVM), obtaining 85.67% classification accuracy. The EQI scores were determined for 27 provinces. Results were consistent with official data, demonstrating the model’s viability and effectiveness.

Beyond outdoor air quality, researchers have also investigated indoor environments through social media. Ref. [38] analysed indoor air quality using social media and NLP methods from the perspective of United States-based occupants, highlighting the role of OSM in understanding indoor environmental health concerns. More broadly, OSM-based analysis has extended to environmental issues beyond air quality. Ref. [39] conducted a sentiment and emotion analysis of environmental posts, which provides insights into how communities express concerns about ecological issues online. These studies show that OSM-based environmental monitoring is becoming more widespread for both indoor and outdoor settings.

Finally, beyond social media–driven studies, several investigations have specifically assessed the impact of COVID-19 lockdowns on urban air quality. In [40], the authors examined the Madrid region, finding significant ${NO}_{2}$ and ${NO}_{x}$ reductions during mobility restrictions. Ref. [41] analysed pollutant patterns in Lahore, Pakistan, reporting sharp declines during lockdown followed by post-lockdown surges, with strong correlations between PM and ${NO}_{2}$/${SO}_{2}$. Similarly, Ref. [42] studied Shanghai, observing reductions of 61% in ${PM}_{2.5}$ and 43% in ${PM}_{10}$, underscoring the combined role of emission reductions and meteorological influences. Together, these studies reinforce that COVID-19 restrictions offered valuable insight into the anthropogenic drivers of urban air quality.

Novelty of Present Study: Past studies show that AQI prediction has already been addressed by several investigations. So far, the community has analysed the significance of meteorological and seasonal factors for pollutants in predictive modelling using classical machine learning, deep learning, and attention models [34,35]. In addition, Sina-Weibo textual messages associated with air quality have been used to (a) compare social media data with AQI, (b) quantify public perceptions of pollution, (c) monitor the spatio-temporal dynamics of AQI through machine learning, and (d) analyse and classify community responses to air quality degradation [22,23,24]. Furthermore, studies [20,21,25,26,27,37] indicate that contextual or cross-platform analysis of community response tweets enables policy makers and researchers to map social perception to AQI in real time. Such associations were captured by analysing (a) the temporal correlation of tweets containing trending hashtags, (b) content classification through machine learning, and (c) trending pollution intent topics through unsupervised models over a timeline. However, how platform-specific metadata and its derived features vary with changing pollution levels remains unexplored. The significance of criticality over relevance in data stream volume, user handles, and other additional metadata for air quality should be explored. Moreover, the contrasts in behavioural patterns of such factors with shifting air quality across the pre-COVID-19 and COVID-19 timelines (pre-COVID-19: 20 March 2019 to 19 March 2020; COVID-19: 20 March 2020 to 20 September 2020), and the rationale behind such patterns, should also be examined.
In the current study, such an attempt is made by exploring the temporal and user-defined properties of tweet objects, namely daily magnitude, lags 1–5, user followers, user verified, user listed, and user favourite, to analyse their relationship with air quality (particularly variations in ${PM}_{2.5}$ concentration). Next, the significance of features derived from these temporal and user-specific properties, i.e., intensity of community engagement, community intents, user authenticity, and tweet recurrence rate, is studied under changing ${PM}_{2.5}$ concentration across the pre-COVID-19 and COVID-19 timelines. Finally, the behavioural patterns of key features are evaluated on the pre-COVID-19 and COVID-19 timelines, highlighting their efficacy in detecting ${PM}_{2.5}$ concentration.

3. Material & Methods

Considering severe air pollution in Delhi, hashtags such as #airpollutiondelhi, #delhismog, #delhiairpollution, and #delhipollution gained significant traction on Twitter as air quality worsened dramatically. The increase in pollution adversely impacts residents of both urban and suburban regions, resulting in a substantial surge of tweets that rapidly turn these hashtags into trending topics. The initial phase of our analytical framework involved collecting tweets related to air pollution from Twitter. For this purpose, we used Twitter’s Streaming API with the Researcher Access API (https://docs.tweepy.org/en/stable/api.html, accessed on 20 August 2025). Tweets were streamed continuously through a local server running 24 × 7 from 20 March 2019 to 20 September 2020, retrieving more than 1.1 million tweets. Network and power interruptions caused gaps in data collection, so some days of data are missing within this period. Every tweet gathered via the API contains several essential attributes, including a unique 64-bit integer tweet-id, the creation timestamp (created_at), the user_id of the tweet author, the tweet text, and many more. Given our focus on monitoring Delhi’s air pollution, we filtered the dataset to include only English-language tweets explicitly related to Delhi’s air pollution, which supports reliable preprocessing and NLP tooling. We used the filtering features of the X/Twitter streaming API, which allow filtering by language, location, and specific keywords. Additional filtering was performed using combinations of targeted hashtags such as #NewDelhiairpollution, #delhipollution, #delhismog, #delhichokes and #savedelhi. After applying these parameters, we analysed the dataset of 1.1 million tweets, focusing on tweet content and user profiles. When filtering tweet content for user intent analysis, we removed only undesired elements, i.e., stop words, hashtags, links, emojis, URLs, @ mentions, other exclamations, and non-ASCII characters, since they are not required in the intent analysis. We also collected air quality data from the US Embassy’s monitoring station in Delhi (https://in.usembassy.gov/air-quality-data-information-4/ accessed on 20 August 2025). The US Embassy’s data provided detailed monitoring of principal pollutants, including ${PM}_{2.5}$, with rigorous data validation, recorded at 60-min intervals. These ground truth data were analysed alongside the tweets for the time frame from March 2019 to September 2020, enabling a comprehensive assessment of pollution patterns, particularly concerning ${PM}_{2.5}$ levels in Delhi. The details of the data collection process are depicted in Figure 2.
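The content filtering described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expressions, the deliberately tiny stop-word list, and the function name `clean_tweet` are assumptions introduced here for illustration only.

```python
import re

# Hypothetical, very small stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and", "so", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links and URLs
TAG_RE = re.compile(r"[#@]\w+")                 # hashtags and @ mentions
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")     # emojis and other non-ASCII characters
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")        # punctuation and exclamations


def clean_tweet(text: str) -> str:
    """Return lower-cased tweet text with the undesired elements removed."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text.lower())
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# Illustrative input only; \U0001F637 is a face-with-mask emoji.
example = "Cannot breathe in Delhi!! \U0001F637 #delhismog @user https://t.co/abc"
print(clean_tweet(example))
```

Removing URLs before hashtags and mentions keeps fragments of a link from surviving as stray tokens; the order of the remaining steps is a design choice rather than a requirement.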

Feature Analysis in Pre-COVID-19 & COVID-19 Scenario

To analyse the relationship between temporal and user-specific features and air quality in the pre-COVID-19 and COVID-19 scenarios, tweets related to air pollution in Delhi are considered over two timelines, i.e., 20 March 2019 to 19 March 2020 and 20 March 2020 to 20 September 2020, respectively. Here, the periodic distribution of Twitter-specific features, i.e., tweet frequency and tweet lags 1–5, and user-specific features, i.e., followers, verified, listed, and favourite counts, has been assessed. Tweet frequency is defined as the number of tweets posted in an interval on a particular topic. It plays an important role in OSM because it affects topic visibility and user engagement. These features are measured against the raw concentration of ${PM}_{2.5}$ over the two timelines.

Temporal Features: Considerable recent research shows a correlation between ${PM}_{2.5}$ concentration levels and pollution-related social media posts (X/Twitter and Sina-Weibo) at different geographic granularities. For instance, the authors in [33] established significant associations for pollution-related posts from London, Delhi, Beijing, and many more. Such insights show that public concern rises with ${PM}_{2.5}$ concentration, so social media activity can serve as a proxy for air quality monitoring. In addition, Ref. [13] shows that, along with the rise of social perception as ${PM}_{2.5}$ increases, there is a time drift between social perception on Twitter and the actual ground truth (raw concentration of ${PM}_{2.5}$). The reason is that social perception takes longer to form than the chemical sensors in monitoring stations take to register a change. Given this behaviour, daily tweet frequency lags (lags 1–5) are used as features to align the delayed social perception with the sensor ground truth data. The temporal lag features are generated from the aggregated daily tweet counts to explore the relationship between X/Twitter community perception and measured air quality. A lag feature represents a shifted value of the original time series, such that information from prior days is used to explain present-day variation. In this study, the lagged variables were created for one to five days preceding the current observation, i.e., Lag 1 corresponds to the number of tweets posted one day prior, Lag 2 corresponds to two days prior, and so on up to Lag 5. The rationale for including lagged features is twofold. First, human behavioural responses to changes in air quality are not instantaneous. For example, exposure to elevated ${PM}_{2.5}$ levels may increase online discourse only after symptoms are felt or after media coverage disseminates the event. Second, from a modelling perspective, lagged features minimise the risk of temporal leakage by ensuring that only past user activity is used to interpret or forecast present air quality levels. Prior studies on temporal dynamics of social media have also indicated that event-related discussions often peak with a delay due to the diffusion of information across online networks. By incorporating daily tweet lags, we aim to capture these delayed behavioural patterns and assess their association with ground-truth pollution measurements at the RK Puram station in Delhi.
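The lag construction described above can be sketched as follows. This is an illustrative reading, not the authors' code: the helper names `daily_counts` and `lag_features` are hypothetical, and days with no collected tweets are counted as zero here, whereas days lost to collection outages would more properly be marked as missing.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(tweet_dates, start, end):
    """Count tweets per calendar day from start to end (inclusive).

    Assumption: a day with no tweets gets a count of 0.
    """
    counts = Counter(tweet_dates)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return days, [counts.get(d, 0) for d in days]


def lag_features(series, max_lag=5):
    """Return {lag: shifted series}.

    Position t of the lag-k series holds the value observed k days earlier,
    or None when no earlier observation exists.
    """
    return {
        lag: ([None] * lag + list(series))[: len(series)]
        for lag in range(1, max_lag + 1)
    }


# Toy posting dates (one entry per tweet), for illustration only.
dates = [date(2019, 11, 1)] * 3 + [date(2019, 11, 2)] + [date(2019, 11, 4)] * 2
days, counts = daily_counts(dates, date(2019, 11, 1), date(2019, 11, 5))
lags = lag_features(counts, max_lag=2)
print(counts)   # daily tweet frequency
print(lags[1])  # Lag 1: tweets posted one day prior
print(lags[2])  # Lag 2: tweets posted two days prior
```

Because each lagged value only looks backwards in time, a model fitted on these columns never sees same-day or future tweet activity, which is the leakage safeguard mentioned in the text. With pandas, `Series.shift` produces the same shifted columns.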

User Features: Recently, several studies examined how Twitter user profile variables, including follower count, verification status, listing count, and favourite count, can predict user influence in debates about air pollution and ${PM}_{2.5}$ monitoring. The authors in [28] investigated user-specific attributes to identify important users and forecast retweets, implying that these measures are strong markers of user influence and engagement. In this study, such influence and community engagement in pollution monitoring are analysed daily against the sensor ground truth, i.e., ${PM}_{2.5}$ concentration.
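One possible way to turn per-tweet user attributes into a daily series is sketched below. The record layout (keys `day`, `verified`, `followers`, `listed`, `favourites`) and the choice of a daily mean for the numeric attributes are assumptions made for illustration; the text does not state how the daily values were aggregated.

```python
from collections import defaultdict
from statistics import mean


def daily_user_features(tweets):
    """Aggregate per-tweet user attributes into one summary row per day."""
    by_day = defaultdict(list)
    for tw in tweets:
        by_day[tw["day"]].append(tw)

    rows = {}
    for day in sorted(by_day):
        group = by_day[day]
        rows[day] = {
            "tweet_count": len(group),
            # Number of tweets (with repetition) posted from verified handles.
            "verified_count": sum(1 for tw in group if tw["verified"]),
            "mean_followers": mean([tw["followers"] for tw in group]),
            "mean_listed": mean([tw["listed"] for tw in group]),
            "mean_favourites": mean([tw["favourites"] for tw in group]),
        }
    return rows


# Hypothetical flattened tweet records, for illustration only.
sample = [
    {"day": "2019-11-01", "verified": True, "followers": 1000, "listed": 10, "favourites": 5},
    {"day": "2019-11-01", "verified": False, "followers": 200, "listed": 0, "favourites": 1},
    {"day": "2019-11-02", "verified": False, "followers": 50, "listed": 2, "favourites": 0},
]
rows = daily_user_features(sample)
print(rows["2019-11-01"])
```

The resulting per-day rows can then be placed next to the daily ${PM}_{2.5}$ values for the same dates.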

4. Results

This section presents the empirical results linking Twitter/X metadata with ground-truth ${PM}_{2.5}$ in New Delhi. We organise the findings by period: pre-COVID-19 (20 March 2019 to 29 February 2020) and COVID-19 (1 March 2020 to 20 September 2020). We further evaluate two types of features: (i) temporal tweet signals (daily tweet frequency and lags 1–5 days) and (ii) user-specific attributes (followers, verification, list membership, favourites), alongside reproduction dynamics (retweet rates and tweet–retweet matches) in the following subsections.

4.1. Evaluating Feature Significance—Pre-COVID-19 Scenario

Next, the temporal and user-specific features of the tweet corpus posted in the pre-COVID-19 timeline are analysed to evaluate their relationship with ${PM}_{2.5}$ raw concentration. Figure 3 depicts the timeline plot of behavioural patterns of tweet count and count lags (1–5), along with the daily change in ${PM}_{2.5}$ raw concentration.

It is evident from the figure that the raw concentration started rising by the end of October 2019 and reached its peak during November and December 2019. Likewise, the daily count of tweets started increasing from the start of November 2019 as the community reacted to the pollution it experienced. Here, the cross-correlation between daily tweet count and ${PM}_{2.5}$ concentration is observed, which evaluates the correlation between tweet counts and ${PM}_{2.5}$ levels at different time shifts (lags). As Figure 3 shows, all lags are positive, meaning a delayed community response to pollution. It is also evident from the figure that shifting by lags of 1–5 days improves the similarity of the count pattern to that of ${PM}_{2.5}$ concentration. Table 1 reports the frequency of community perceptions, in the form of tweets, as concentration changed over the pre-COVID-19 timeline. We observed that from October 2019 onwards, the community responded increasingly as ${PM}_{2.5}$ levels rose, and the response reached its highest level in November and December 2019, when the air quality index was poorest. From February 2020, the response declined as the air quality index improved. Table 2 depicts the community intent-based tweet snapshots. From March to August 2019, the low concentration level steered the community’s intent towards appreciating efforts to improve air quality; hence, minimal change in tweet frequency was observed. However, from the end of August 2019 to September 2019, the intent shifted to suggestions for avoiding the adverse effects of air pollution and economic loss, among others, and tweet frequency increased correspondingly. Finally, from November 2019 to January 2020, due to a sudden climb in concentration levels, a drastic rise in community response can be observed (Table 1). Here, the community intent mostly focused on complaints and accusations directed at each other, society, the Government, etc., due to poor air quality (Table 2).
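The lagged cross-correlation discussed above can be sketched as a Pearson correlation computed at each day shift. The sign convention is an assumption made here: a positive lag k pairs ${PM}_{2.5}$ on day t with the tweet count on day t + k, so tweets trail pollution. The function names and the toy numbers are hypothetical and are not taken from the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def lagged_correlation(pm25, tweets, max_lag=5):
    """Correlate PM2.5 on day t with tweet counts on day t + lag, for lag = 0..max_lag."""
    if len(pm25) != len(tweets):
        raise ValueError("series must cover the same days")
    if max_lag > len(pm25) - 2:
        raise ValueError("max_lag leaves fewer than two paired days")
    return {
        lag: pearson(pm25[: len(pm25) - lag], tweets[lag:])
        for lag in range(max_lag + 1)
    }


# Toy series: from day 2 onwards the tweet count equals PM2.5 two days earlier, divided by 10.
pm25 = [50, 80, 200, 300, 150, 90, 60, 55]
tweets = [4, 5, 5, 8, 20, 30, 15, 9]
corr = lagged_correlation(pm25, tweets)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))
```

In this toy series the strongest correlation falls at the two-day shift built into the data, which is the kind of delayed response that a lag analysis is meant to expose.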

Figure 4 plots the timeline of behavioural patterns of user-specific features, i.e., count of followers, verified, listed, favourite, etc., along with the change in

{PM}_{2.5}

raw concentration daily. To reduce the day-to-day variability and highlight underlying patterns, we added a 5-day rolling mean with the 95% confidence interval for each feature. This allows more precise visualisation of the temporal trends and the uncertainty around them. It is evident from Figure 4 that, except for the ‘verified count’, the other user features do not have an immediate impact on raw concentration. Here, ‘verified count’ specifies whether a verified or non-verified handle generated a particular tweet. For verified users, the monthly distribution of users (with repetition) tweeting on pollution has been analysed. Table 3 shows that the climb in concentration levels from September 2019 to February 2020 is reflected in the frequency of tweets posted from verified handles. The verified users act as social sensors who actively monitor air quality, report on air quality status, suggest necessary actions, etc. Such user participation also increased from September 2019 to February 2020 due to a rapid increase in

{PM}_{2.5}

concentration level. A total of 26,696 tweets (with repetition) were generated by verified users in the pre-COVID-19 timeline, most of them between October 2019 and February 2020.

Around 93% of these tweets were posted between October 2019 and February 2020, during the rise in concentration levels (Table 3). Considering the participation of non-verified users, Table 3 shows similar patterns in tweet propagation from September 2019 to February 2020 as the raw concentration level increases. Here also, out of 941,078 total tweets generated (with repetition) in the pre-COVID-19 timeline, around

87% of tweets were posted between October 2019 and February 2020, during the rise in concentration levels (Table 3). Note that these observations indicate that verified and non-verified users act as sensors for monitoring and reporting air quality updates as the concentration level changes. Further, the indirect impact of tweets at the

{PM}_{2.5}

level, in terms of retweet rates and the tweet-retweet ratio, is also analysed. For this analysis, one X/Twitter-specific feature, rt_created_at, is considered.
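
The 5-day rolling mean with a 95% confidence interval used for Figure 4 can be sketched as follows. The text does not state how the interval was computed, so this sketch assumes a trailing window and a normal-approximation interval (mean ± 1.96 × standard error); the function name and the daily counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def rolling_mean_ci(values, window=5, z=1.96):
    """Trailing rolling mean with a normal-approximation 95% interval.

    Returns one entry per day: None until the window is full,
    then a (mean, lower, upper) tuple.
    """
    bands = []
    for i in range(len(values)):
        if i + 1 < window:
            bands.append(None)  # not enough history yet
            continue
        chunk = values[i + 1 - window : i + 1]
        m = mean(chunk)
        half_width = z * stdev(chunk) / sqrt(window)
        bands.append((m, m - half_width, m + half_width))
    return bands


# Hypothetical daily counts of tweets from verified handles.
verified_daily = [12, 15, 11, 40, 38, 42, 45, 41, 20, 18]
bands = rolling_mean_ci(verified_daily)
for day, band in enumerate(bands):
    print(day, band)
```

A wider band marks days where the feature fluctuated strongly inside the window, which is the uncertainty that the shaded region in such a plot conveys.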

The metadata field rt_created_at records when the tweet underlying a retweet (the original tweet or an earlier retweet) was posted. For original tweets, rt_created_at is null; for retweets, it stores this timestamp. Based on this feature, the original tweets and retweets posted during the pre-COVID-19 timeline are segregated. After segregation, a total of 222,779 original tweets (with repetition) and 744,995 retweets (with repetition) are found for the pre-COVID-19 timeline, i.e., April 2019 to February 2020. The timeline-based retweet rates in Figure 5 show that the retweet rate started rising at the end of October 2019 and peaked in November and December 2019 due to the climb in

{PM}_{2.5}

concentration (Figure 6). Besides, around 80% of the retweets were propagated from November 2019 to February 2020. Such observations depict the association between the variation rate in

{PM}_{2.5}

levels and community awareness about pollution. Such awareness in turn fosters environmental responsibility, encouraging positive change, necessary courses of action, criticism, etc. In addition to retweet rates, the tweet-retweet ratio evaluates the recurrence rate of an original tweet, i.e., how often it is retweeted in community posts. Here, the recurrence rate signifies a wide and rapid spread of a tweet as retweets across X/Twitter owing to the relevance of its content, e.g., controversy, relatability, calls for action, or impactful text related to air quality in Delhi. Figure 7 shows that the recurrence rate rises as concentration levels increase from the end of October 2019. Original tweets draw a growing number of users into posting because of their actionable insights, influential opinions, essential reports, criticisms, and so on. The monthly distribution of this virality of original tweets, in terms of recurrence rates, is extracted for the pre-COVID-19 timeline. Table 4 lists the top retweets propagated each month in the pre-COVID-19 timeline. Note that this indicates how the community’s intent shifted towards highly engaging tweet content with the changes in

{PM}_{2.5}

levels (discussed earlier in this Section) in the pre-COVID-19 timeline. The virality of these intents thus appears to depend on the raw concentration at that point in the timeline.
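
The segregation by rt_created_at and the resulting ratios can be sketched as below. This is an illustrative sketch only: the record layout, the timestamp format, and the helper names are hypothetical, and the recurrence rate is assumed here to mean retweets per original tweet, which the text does not define formally. The two totals are the pre-COVID-19 counts reported above.

```python
def split_by_rt_created_at(records):
    """Separate original tweets (null rt_created_at) from retweets."""
    originals = [r for r in records if r.get("rt_created_at") is None]
    retweets = [r for r in records if r.get("rt_created_at") is not None]
    return originals, retweets


def retweets_per_original(n_original, n_retweet):
    """Assumed form of the tweet-retweet (recurrence) ratio."""
    return n_retweet / n_original


# Hypothetical records; only the rt_created_at field matters here.
records = [
    {"id": 1, "rt_created_at": None},
    {"id": 2, "rt_created_at": "2019-11-03T08:15:00Z"},
    {"id": 3, "rt_created_at": "2019-11-03T08:15:00Z"},
    {"id": 4, "rt_created_at": None},
]
originals, retweets = split_by_rt_created_at(records)
print(len(originals), len(retweets))

# Pre-COVID-19 totals reported in the text (with repetition).
n_original, n_retweet = 222_779, 744_995
ratio = retweets_per_original(n_original, n_retweet)
retweet_share = n_retweet / (n_original + n_retweet)
print(round(ratio, 2), round(100 * retweet_share, 1))
```

Under this assumed definition, the reported totals correspond to roughly 3.3 retweets per original tweet, with retweets making up about 77% of the pre-COVID-19 corpus.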

4.2. Evaluating Feature Significance—COVID-19 Scenario

Here, we analyse the temporal and user-specific features from the COVID-19 tweet corpus to assess their impact on

{PM}_{2.5}

raw concentration. Studies [31,32] have already shown that a significant reduction in

{PM}_{2.5}

level took place during the first phase of COVID-19 in major Indian cities, which in turn reduced community debate on pollution, climate change, and climate policies. However, the intensity of community engagement, in terms of tweet count, community intents, user authenticity, and tweet recurrence rates, should be evaluated to analyse its association with the changes in

{PM}_{2.5}

level over the COVID-19 timeline. Figure 8 shows that the concentration level declined from March 2020 and climbed between May and June 2020, but to no more than 95 μg/m³. Likewise, the community engagement, in terms of frequency and lags, is low across the timeline except for mid-April 2020, where a sharp peak can be observed. Table 5 shows the monthly distribution of tweet count and mean concentration level over the COVID-19 timeline, i.e., March to September 2020. An unexpected climb in tweet frequency can be observed in April 2020, although the concentration is low. The daily tweet frequency for April 2020 is shown in Figure 9. The majority of the tweets were propagated on 9–11 April 2020. The repetitions within that period were then discarded, leaving 4746 unique tweets. These tweets show that most intents were suggestions, i.e., propagating ideas, advice, recommendations, etc., related to the pandemic outbreak and reduced pollution. Comparatively fewer intents related to praise and complaint are observed in community engagement. This observation remains consistent across the COVID-19 timeline. A few example tweets illustrating the intents propagated on 9–11 April 2020 are given in Table 6.
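
The two steps described above, finding the busiest days and then discarding repetitions within them, can be sketched as follows. The text does not say how a repetition was identified, so this sketch assumes that tweets with identical text are repeats; the record layout, helper names, and sample tweets are hypothetical.

```python
from collections import Counter


def daily_counts(tweets):
    """Number of tweets (with repetition) per ISO date string."""
    return Counter(t["date"] for t in tweets)


def unique_texts(tweets, start, end):
    """Distinct tweet texts posted between start and end, inclusive.

    Dates are ISO strings (YYYY-MM-DD), so string comparison
    matches chronological order.
    """
    return {t["text"] for t in tweets if start <= t["date"] <= end}


# Hypothetical sample; the study's corpus is far larger.
tweets = [
    {"date": "2020-04-08", "text": "Lockdown day 15"},
    {"date": "2020-04-09", "text": "Delhi air is clean today"},
    {"date": "2020-04-10", "text": "Delhi air is clean today"},
    {"date": "2020-04-10", "text": "Keep pollution low after lockdown"},
    {"date": "2020-04-11", "text": "Keep pollution low after lockdown"},
    {"date": "2020-04-12", "text": "Blue skies"},
]

counts = daily_counts(tweets)
peak_day = max(counts, key=counts.get)
unique = unique_texts(tweets, "2020-04-09", "2020-04-11")
print(peak_day, len(unique))
```

Exact-text matching is the simplest possible rule; matching on a retweet identifier or on normalised text would be reasonable alternatives, depending on what the corpus provides.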

Figure 10 depicts the patterns related to user features, i.e., count of followers, verified, listed, favourite, etc., with

{PM}_{2.5}

concentration on a daily basis. As in Figure 4, to reduce the day-to-day variability and highlight underlying patterns, we added the 5-day rolling mean with the 95% confidence interval for each feature, giving a clearer visualisation of the temporal trends and the uncertainty around them. Here also, it is evident that except for the ‘verified’ count, the other user features do not immediately impact

{PM}_{2.5}

concentration. Therefore, the utility of user authenticity needs to be analysed. Table 7 shows that the majority of verified and non-verified users were active in April 2020, resulting in around 46% and 56%, respectively, of the total tweets (with repetition) being posted that month, irrespective of the low concentration level. For the other months, however, the engagement of verified and non-verified users tracks the low concentration level. These behavioural patterns are attributable to several facts and reports published on the sharp decline of

{PM}_{2.5}

and improvement in NAAQS (https://www.thehindu.com/news/cities/Delhi/delhi-pollution-halved-during-first-phase-of-lockdown-cpcb/article31419356.ece, https://timesofindia.indiatimes.com/city/delhi/despite-lockdown-pm2-5-as-high-as-two-years-ago/articleshow/83793790.cms, accessed on 19 August 2025) for several days, which drove the community to engage in discussions. The impact of retweet rates and the tweet-retweet ratio is analysed for March–September 2020, and specifically for April 2020, using rt_created_at. The utility of rt_created_at has already been discussed earlier in this Section. The original tweets and retweets made within the COVID-19 period are separated using this feature. After separation, a total of 59,417 original tweets (with repetition) and 147,457 retweets (with repetition) are found for the COVID-19 timeline, i.e., March–September 2020. Figure 11a,b shows that the retweet rate also peaked in April 2020, irrespective of the low concentration level (Table 7). Around 70% of overall retweets (with repetition) were posted in this month. Note that such observations show some reliance of the community’s engagement on discussions of the reports or facts related to the reduction in the air quality index. Such discussions mostly concern general awareness, hopes, and the course of action to keep pollution levels low. The recurrence rate of original tweets as retweets is depicted in Figure 11c,d. Here, the increasing community influence of original tweets is observed in April 2020. The monthly distribution of recurrence rates has been extracted from the COVID-19 timeline.

Table 8 shows the top retweeted tweets for each month. The tweets most retweeted from April to September 2020 relate to suggestions and praise. In April 2020, the community was retweeting about the surprising and unexpected improvement in air quality. In May 2020, the engagement shifted to wishful thinking and calls for action. In June 2020, the community’s retweets primarily concerned pollution reduction initiatives, actions, etc., which continued in July 2020. Finally, in August and September 2020, the most retweeted intents related to policy suggestions, mandated solutions for pollution control, praise for the current situation, and initiatives.

5. Discussions

Numerous recent studies have discussed the effects of COVID-19 lockdown measures on air quality in different parts of India. The comprehensive country-wide curfew dramatically improved India’s air quality.

Contrasts in Feature Influence in Pre-COVID-19 & COVID-19 Timeline: Studies show that the declining trend in

{PM}_{2.5}

concentration levels was nearly consistent across the country. In addition, there was a significant drop compared to the pre-COVID-19 situation. Conversely, a moderate correlation has been noted between the community’s opinions on X/Twitter regarding the fluctuating AQI levels and the air quality data published by print or electronic media at different times. The X/Twitter platform raises public concerns and provides a forum for citizen participation, including complaints and experience sharing, posting of news and print media reports on AQI levels, public sentiments, and the difficulties that rising pollution causes in daily life. These interactions frequently lead to a cycle of community knowledge, government policy responses, and reactions to pollution levels. Regarding the generated temporal and user-specific features, the differences in their dependence on the real-time variations in

{PM}_{2.5}

concentration level are now examined for both the pre-COVID-19 and COVID-19 scenarios.

5.1. Metadata-Based Attribute Reliance in Pre-COVID-19 Situation

The community’s role in spreading reports, opinions, and attitudes was investigated in the pre-COVID-19 era with

{PM}_{2.5}

variations. The daily raw concentration of

{PM}_{2.5}

correlates with patterns of community interaction in the conversation (Table 1). The temporal nature of tweet propagation is observed, which further correlates with the magnitude of the community’s intents and the transformation of those intents with AQI levels over the timeline (Table 2). The trend in the prevalence of verified and non-verified users’ posts is also associated with

{PM}_{2.5}

concentration change. Verified and non-verified users can be seen acting as sensors that monitor and report air quality updates as the concentration level changes (Table 3). Besides, Figure 5 and Figure 7 show that the propagation of reports, intents, sentiments, and opinions, measured by retweets and the recurrence ratio, varies with the concentration. They also show that daily PM concentrations are closely related to patterns of public engagement through social ties while spreading situational awareness. Further, the community influence of original tweets comprising actionable insights, influential opinions, essential reports, criticisms, and so on is also observed through the recurrence rate (Figure 7). These insights suggest a critical reliance of the derived temporal features (tweet frequency lags, retweet rate, and recurrence rate) and user-specific features (user authenticity and user intents) on the

{PM}_{2.5}

variations in the pre-COVID-19 timeline. Therefore, under normal circumstances, public reactions on social media correspond to the variations in AQI levels.

5.2. Metadata-Based Attribute Reliance in COVID-19 Situation

In the COVID-19 timeline, the temporal and user-specific features remain associated with

{PM}_{2.5}

variations, except for the month of April 2020 (Table 5), although the variations in

{PM}_{2.5}

remained low across the timeline (Figure 8). In terms of temporal aspects, i.e., tweet frequency lags, retweet rate, and recurrence rate, there was a sharp peak in tweet frequency, i.e., community engagement, in April 2020 (Figure 10). The retweet rate peaked in April 2020 (Figure 11a,b), and the recurrence rate also shot up over the same period (Figure 11c,d). The user features, i.e., user authenticity and user intents, show similar behavioural patterns. Both verified and non-verified users were active in April 2020, and a large share of the discussion took place during that period, propagating intents such as suggestions, ideas, advice, and recommendations. These insights depict a moderate change in feature reliance on concentration in the COVID-19 scenario. The successive lockdowns significantly reduced traditional pollution sources such as automobile traffic and industrial activities. This resulted in a visible and quantifiable improvement in air quality, a rare and significant environmental outcome amid a global health catastrophe. Furthermore, the COVID-19 pandemic considerably altered established patterns of media reportage and public conversation. During this time, traditional and social media focused overwhelmingly on the pandemic, pushing other critical issues, such as air pollution in major centres like Delhi, to the background. As seen in the preceding section, the dominant focus of environmental discourse during the COVID-19 period moved to the significant decrease in PM (particulate matter) levels. Comparative examinations of PM data from the last five years revealed record-low concentrations, which were mainly ascribed to lower human activity during the lockdown. As a result, public mood and social media involvement (measured by tweet densities) closely reflected media coverage patterns. This association indicates that print and electronic media narratives significantly influence public receptivity, as shown in Figure 11.

Further, the contrasts over the influences of significant features are summarised through Table 9. Besides, the disparities in summary statistics associated with

{PM}_{2.5}

concentrations are also highlighted in Table 10. The overall volume of tweets decreased drastically, a 78.62% reduction in the COVID-19 timeline. The engagement of verified users declined even more sharply, by 84.12%. Likewise, the unverified users, who generated the majority of content throughout the pre-COVID-19 and COVID-19 periods, decreased their activity by 78.46%. The retweet rate also dropped by 80.20%. However, the recurrence rate decreased only moderately, by 12.67%, indicating the continued presence of a core group of highly engaged users who still talked about the improvements in air quality, the course of action to be taken, etc. Subjective variations can also be observed in the user discourse across the two periods. In the pre-COVID-19 timeline, the user intent comprised mostly complaints, suggestions, and compliments; during the COVID-19 timeline, however, complaints became less common, and the discourse shifted primarily to ideas and praise. Finally, the peak activity month for the pre-COVID-19 discourse was November 2019 (with 17 November 2019 as the peak activity day), when the

{PM}_{2.5}

concentration reached its peak. However, during the COVID-19 timeline, the peak activity was observed on 10 April 2020. April 2020 emerged as the most active month for community discussion, aided by various published reports on air quality improvement (https://www.indiatoday.in/india/story/lockdown-cuts-pm2-5-pm10-levels-by-half-in-delhi-cpcb-1670273-2020-04-23, https://www.ndtv.com/delhi-news/delhi-breathes-easy-as-air-quality-improves-to-good-category-amid-coronavirus-lockdown-2202099, accessed on 20 August 2025). In addition, Table 10 specifies the statistical contrasts on

{PM}_{2.5}

levels in the pre-COVID-19 and COVID-19 timelines. The mean concentration dropped sharply, by around 60%, in the COVID-19 timeline. Also, compared to the pre-COVID-19 timeline, fewer variations in concentration levels are observed in the COVID-19 timeline. Furthermore, the median concentration was halved in the COVID-19 timeline (a 50% decrease). These observations illustrate that community engagement increases in response to unpleasant and uncomfortable levels of air pollution, with people increasingly expressing their concerns on micro-blogging platforms and other social media channels. Such engagement can be captured effectively through temporal and user-defined features across various phases in the pollution timeline. Such features can further be prepared for modelling/estimating

{PM}_{2.5}

concentration. Therefore, social media activity could serve as a real-time air quality indicator. Such an indicator would in turn have the potential to inform immediate policy actions, such as targeted sensor deployment, vehicle restriction schemes, and the activation of air pollution controls.
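
The percentage reductions quoted above can be checked against the tweet counts reported in Section 4. The sketch below is a worked arithmetic example, not the authors' code: the function name is hypothetical, and the totals are obtained by adding the original-tweet and retweet counts given for each period.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


# Counts reported in the text (with repetition).
pre_original, pre_retweet = 222_779, 744_995       # April 2019 to February 2020
covid_original, covid_retweet = 59_417, 147_457    # March to September 2020

pre_total = pre_original + pre_retweet
covid_total = covid_original + covid_retweet

total_drop = percent_reduction(pre_total, covid_total)
retweet_drop = percent_reduction(pre_retweet, covid_retweet)

print(f"all tweets: {total_drop:.2f}% fewer")
print(f"retweets:   {retweet_drop:.2f}% fewer")
```

The overall figure reproduces the 78.62% reduction stated above, and the retweet counts give a drop of about 80.2%. The verified-user figure is not recomputed here because the COVID-19 verified count appears only in Table 7.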

6. Conclusions & Future Research

This study empirically demonstrates that predictive modelling for air quality monitoring can be significantly enhanced by using methods beyond the conventional seasonal, meteorological, and content-based features. While prior research has significantly used community sentiments, trending hashtags, and qualitative intent analysis from social media data, our findings highlight the importance of systematically incorporating platform-specific metadata like temporal lags, user-authenticity, engagement patterns, and recurrence rates into the modelling frameworks.

The results show a clear empirical relationship between

{PM}_{2.5}

concentrations and community perceptions when assessed through temporal and user-specific attributes. Moreover, the current analysis shows that the features exhibit strong dependencies when observed with air quality fluctuations across both seasonal and COVID-19 transitions. This underscores their utility as complementary signals for ground-truth measurements. In this study our analysis is limited to New Delhi, but there are similar studies in other contexts (e.g., Paris and London [33], multiple Chinese cities [22,23,42], and South Asian urban centers such as Lahore [41]) which demonstrate that linking social media signals with air quality monitoring has broader relevance. Referencing these findings strengthens the generalizability of our framework beyond Delhi. In the next step, we should assess this work across multiple cities using additional monitoring stations and multilingual analyses, enabling broader insights into public perception and region-specific participatory monitoring strategies.

Thus, the results from this research can be expanded in the following ways:

Integration of multimodal data sources: We plan to combine social media metadata with content-based features, low-cost sensor data, and satellite observations to construct more robust forecasting models.
Model explainability: Quantifying the relative contribution of each feature (temporal, user-specific, content-based, and multimodal) can improve transparency, interpretability, and policy relevance, and is another avenue for extending this work.
Regional Variability: Expand the study in a multi-city context by incorporating data from additional monitoring stations and expanding to multilingual analyses.
Sustainability-driven Policy Making: When we cannot measure, we cannot make decisions that can help in sustainable development. Thus, deployment of models that are enriched with explainable features can guide urban planning, adaptive pollution control, and participatory monitoring in contexts where traditional infrastructure is limited. Hence, expansion of this avenue needs further attention.

Thus, through these research directions, we propose to advance predictive accuracy while ensuring that air quality monitoring frameworks remain robust, transparent, equitable, and actionable even though most of the globe lacks continuous federal reference grade monitors. Without proper monitoring of air quality, there will be a dearth of informed policy making. Hence, multimodal data-based air quality models can provide a way towards data-driven decision making and attaining the Sustainable Development Goals.

Author Contributions

Conceptualisation, P.P. and M.S.; methodology, P.P. and T.M.; software, P.P.; validation, P.P., T.M. and S.A.; formal analysis, P.P.; investigation, P.P. and T.M.; resources, M.S.; data curation, P.P.; writing—original draft preparation, P.P.; writing—review and editing, P.P., T.M., S.A. and M.S.; visualisation, P.P. and T.M.; supervision, M.S.; P.P. led the research design, data analysis, software development, and manuscript drafting. M.S. provided overall supervision, critical review, and guidance throughout the research process. T.M. contributed to data analysis, manuscripting updates, refining the methodology, improving model reproducibility, and strengthening the technical sections. S.A. assisted in conducting supplementary surveys, providing contextual interpretation, and validating insights. All authors have read and agreed to the published version of the manuscript.

Funding

This publication is an outcome of the R&D work undertaken by the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation. The lead author, Prithviraj Pramanik, received this fellowship. This research received no additional external funding.

Data Availability Statement

The Twitter/X data presented in this work are available on request from the corresponding author, subject to the Terms and Conditions of Twitter’s Academic Research access. The

{PM}_{2.5}

data are openly available from the U.S. Embassy and Consulate air quality monitoring sites through the US Department of State’s AirNow program (https://in.usembassy.gov/embassy-consulates/new-delhi/air-quality-data/, accessed on 19 August 2025).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. World Bank. You Can’t Manage What You Don’t Measure. 2025. Available online: https://blogs.worldbank.org/en/education/you-can-t-manage-what-you-don-t-measure (accessed on 20 August 2025).
2. The Editors of Encyclopaedia Britannica. The Rise of the Machines: Pros and Cons of the Industrial Revolution. 2025. Available online: https://www.britannica.com/story/the-rise-of-the-machines-pros-and-cons-of-the-industrial-revolution (accessed on 20 August 2025).
3. OECD. The Economic Consequences of Outdoor Air Pollution; OECD Publishing: Paris, France, 2016.
4. United Nations Environment Programme (UNEP). Why Dirty Air Costs Us Trillions Every Year. 2025. Available online: https://www.unep.org/news-and-stories/video/why-dirty-air-costs-us-trillions-every-year (accessed on 20 August 2025).
5. World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide. 2021. Available online: https://www.who.int/publications/i/item/9789240034228 (accessed on 20 August 2025).
6. Health Effects Institute. State of Global Air 2024 Annual Report. 2024. Available online: https://www.stateofglobalair.org/resources/report/state-global-air-report-2024 (accessed on 20 August 2025).
7. Balakrishnan, K.; Dey, S.; Gupta, T.; Dhaliwal, R.S.; Brauer, M.; Cohen, A.J.; Stanaway, J.D.; Beig, G.; Joshi, T.K.; Aggarwal, A.N.; et al. The impact of air pollution on deaths, disease burden, and life expectancy across the states of India: The Global Burden of Disease Study 2017. Lancet Planet. Health 2019, 3, e26–e39.
8. Air Quality Life Index (AQLI); Energy Policy Institute at the University of Chicago (EPIC). India Fact Sheet: Air Quality Life Index. 2024. Available online: https://aqli.epic.uchicago.edu/wp-content/uploads/2024/08/India-FactSheet_2024.pdf (accessed on 20 August 2025).
9. Greenstone, M.; Ganguly, T.; Hasenkopf, C.; Sharma, N.; Gautam, H. Air Quality Life Index: 2024 Annual Update. 27 August 2024. Available online: https://aqli.epic.uchicago.edu/files/AQLI-2024-Report_English.pdf (accessed on 20 August 2025).
10. World Health Organization. WHO Global Air Quality Guidelines: Questions and Answers. 2021. Available online: https://www.who.int/news-room/questions-and-answers/item/who-global-air-quality-guidelines (accessed on 20 August 2025).
11. United States Environmental Protection Agency (EPA). National Ambient Air Quality Standards (NAAQS) Table. 2025. Available online: https://www.epa.gov/criteria-air-pollutants/naaqs-table (accessed on 20 August 2025).
12. Central Pollution Control Board (Ministry of Environment & Forests, Government of India). Guidelines for Ambient Air Quality Monitoring. 2003. Available online: https://urbanemissions.info/wp-content/uploads/docs/2003_CPCB_Guidelines_for_Air_Monitoring.pdf (accessed on 20 August 2025).
13. Pramanik, P.; Nandi, S.; Saha, M. AirCalypse: Revealing fine-grained air quality from social media. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; pp. 1507–1508.
14. Hagan, D.H.; Gani, S.; Bhandari, S.; Patel, K.; Habib, G.; Apte, J.S.; Hildebrandt Ruiz, L.; Kroll, J.H. Inferring aerosol sources from low-cost air quality sensor measurements: A case study in Delhi, India. Environ. Sci. Technol. Lett. 2019, 6, 467–472.
15. Dey, S.; Purohit, B.; Balyan, P.; Dixit, K.; Bali, K.; Kumar, A.; Imam, F.; Chowdhury, S.; Ganguly, D.; Gargava, P.; et al. A satellite-based high-resolution (1-km) ambient PM2.5 database for India over two decades (2000–2019): Applications for air quality management. Remote Sens. 2020, 12, 3872.
16. Gupta, P.; Christopher, S.A.; Wang, J.; Gehrig, R.; Lee, Y.; Kumar, N. Satellite remote sensing of particulate matter and air quality assessment over global cities. Atmos. Environ. 2006, 40, 5880–5892.
17. Dhaka, S.; Lakshmi, S.; Vaishya, A.; Ojha, N.; Pozzer, A.; Ansari, T.; Deb, P.; Sharma, A. Influences of regional and trans-regional anthropogenic emissions on meteorology and cloud properties over western India assessed using WRF-Chem model. Environ. Sci. Pollut. Res. 2025, 32, 17931–17951.
18. Masood, A.; Ahmad, K. Data-driven predictive modeling of PM2.5 concentrations using machine learning and deep learning techniques: A case study of Delhi, India. Environ. Monit. Assess. 2023, 195, 60.
19. Pramanik, P.; Karmakar, P.; Sharma, P.K.; Chatterjee, S.; Roy, A.; Mandal, S.; Nandi, S.; Chakraborty, S.; Saha, M.; Saha, S. Aquamoho: Localized low-cost outdoor air quality sensing over a thermo-hygrometer. ACM Trans. Sens. Netw. 2023, 19, 1–30.
20. Ruan, T.; Kong, Q.; McBride, S.K.; Sethjiwala, A.; Lv, Q. Cross-platform analysis of public responses to the 2019 Ridgecrest earthquake sequence on Twitter and Reddit. Sci. Rep. 2022, 12, 1634.
21. Ruan, T.; Lv, Q. Public perception of electric vehicles on Reddit and Twitter: A cross-platform analysis. Transp. Res. Interdiscip. Perspect. 2023, 21, 100872.
22. Wang, S.; Paul, M.J.; Dredze, M. Social media as a sensor of air quality and public response in China. J. Med. Internet Res. 2015, 17, e22.
23. Jiang, W.; Wang, Y.; Tsou, M.H.; Fu, X. Using social media to detect outdoor air pollution and monitor air quality index (AQI): A geo-targeted spatiotemporal analysis framework with Sina Weibo (Chinese Twitter). PLoS ONE 2015, 10, e0141185.
24. Wang, Z.; Ke, L.; Cui, X.; Yin, Q.; Liao, L.; Gao, L.; Wang, Z. Monitoring environmental quality by sniffing social media. Sustainability 2017, 9, 85.
25. Kumbalaparambi, T.S.; Menon, R.; Radhakrishnan, V.P.; Nair, V.P. Assessment of urban air quality from Twitter communication using self-attention network and a multilayer classification model. Environ. Sci. Pollut. Res. 2023, 30, 10414–10425.
26. Charitidis, P.; Spyromitros-Xioufis, E.; Papadopoulos, S.; Kompatsiaris, Y. Twitter-based sensing of city-level air quality. In Proceedings of the 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Zagori, Greece, 10–12 June 2018; pp. 1–5.
27. Juanals, B.; Minel, J.L. An instrumented methodology to analyze and categorize information flows on twitter using nlp and deep learning: A use case on air quality. In Proceedings of the International Symposium on Methodologies for Intelligent Systems, Limassol, Cyprus, 29–31 October 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 315–322.
28. Pramanik, P.; Mondal, T.; Nandi, S.; Saha, M. AirCalypse: Can Twitter help in urban air quality measurement and who are the influential users? In Proceedings of the Companion Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 540–545.
29. Survey of India. Online Maps Portal. 2025. Available online: https://onlinemaps.surveyofindia.gov.in/ (accessed on 20 August 2025).
30. QGIS Development Team. QGIS Geographic Information System. 2025. Available online: https://www.qgis.org (accessed on 20 August 2025).
31. Verma, R.L.; Kamyotra, J.S. Impacts of COVID-19 on air quality in India. Aerosol Air Qual. Res. 2021, 21, 200482.
32. Malakar, K.; Majumder, P.; Lu, C. Twitterati on COVID-19 pandemic-environment linkage: Insights from mining one year of tweets. Environ. Dev. 2023, 46, 100835.
33. Gurajala, S.; Dhaniyala, S.; Matthews, J.N. Understanding public response to air quality using tweet analysis. Soc. Media Soc. 2019, 5, 2056305119867656.
34. Liu, Q.; Cui, B.; Liu, Z. Air quality class prediction using machine learning methods based on monitoring data and secondary modeling. Atmosphere 2024, 15, 553.
35. Rad, A.K.; Nematollahi, M.J.; Pak, A.; Mahmoudi, M. Predictive modeling of air quality in the Tehran megacity via deep learning techniques. Sci. Rep. 2025, 15, 1367.
36. Bhatti, M.A.; Song, Z.; Bhatti, U.A.; Ahmad, N. Predicting the impact of change in air quality patterns due to COVID-19 lockdown policies in multiple urban cities of henan: A deep learning approach. Atmosphere 2023, 14, 902.
37. Chen, W.; Tu, F.; Zheng, P. A transnational networked public sphere of air pollution: Analysis of a Twitter network of PM2.5 from the risk society perspective. Inf. Commun. Soc. 2017, 20, 1005–1023.
38. Ashayeri, M.; Piri, S.; Abbasabadi, N. Exploring US occupant perception toward indoor air quality via social media and NLP analysis. J. Environ. Sci. Public Health 2024, 8, 49–58.
39. Amangeldi, D.; Usmanova, A.; Shamoi, P. Understanding environmental posts: Sentiment and emotion analysis of social media data. IEEE Access 2024, 12, 33504–33523.
40. Bañuelos-Gimeno, J.; Sobrino, N.; Arce-Ruiz, R.M. Effects of mobility restrictions on air pollution in the Madrid region during the COVID-19 pandemic and post-pandemic periods. Sustainability 2023, 15, 12702.
41. Khan, W.A.; Sharif, F.; Khokhar, M.F.; Shahzad, L.; Ehsan, N.; Jahanzaib, M. Monitoring of ambient air quality patterns and assessment of air pollutants’ correlation and effects on ambient air quality of Lahore, Pakistan. Atmosphere 2023, 14, 1257.
42. Zhang, Q.; Zhang, Q.; Liu, H.; Lu, M. The impact of COVID-19 lockdown on ambient air quality in Shanghai, 2022. Atmosphere 2023, 14, 898.

Figure 1. US Embassy Reference Grade Sensor location (RK Puram) and New Delhi Boundary [29,30].

Figure 2. Data Collection Process for the Tweets.

Figure 3. Behavioural Patterns of Temporal Features and ${PM}_{2.5}$ Concentration in the Pre-COVID-19 Period.

Figure 4. Behavioural Patterns of User-Specific Features & Raw ${PM}_{2.5}$ Concentration in Pre-COVID-19 Scenario.

Figure 5. Daily Retweet Rate on Air Quality in the Pre-COVID-19 Timeline.

Figure 6. Daily Raw Concentration in the Pre-COVID-19 Timeline.

Figure 7. Daily Tweet-Retweet Match on Air Quality in the Pre-COVID-19 Timeline.

Figure 8. Behavioural Patterns of Temporal Features & Raw ${PM}_{2.5}$ Concentration in COVID-19 Scenario.

Figure 9. Daily Distribution of Tweets for April 2020.

Figure 10. Behavioural Patterns of User Features & Raw ${PM}_{2.5}$ Concentration in COVID-19 Scenario.

Figure 11. Retweet rates and tweet-retweet matches observed in the COVID-19 timeline.

Table 1. Pollution Tweet Frequency Posted at Pre-COVID-19 Time & ${PM}_{2.5}$ Raw Concentration.

Timeline	Tweet Frequency (Tweet Count)	${PM}_{2.5}$ Concentration (µg/m³)
March 2019	7939	71.23
April 2019	11,782	75.21
May 2019	13,437	92.21
June 2019	11,605	57.00
July 2019	15,129	44.00
August 2019	14,983	31.53
September 2019	50,812	36.62
October 2019	83,298	111.63
November 2019	596,767	204.60
December 2019	69,446	208.00
January 2020	41,897	153.13
February 2020	50,690	118.65

Table 2. Categories of Intent-based Tweets Posted in Pre-COVID-19 Scenario.

Category	Tweets
Complaint	“society doing nothing about pollution, which is shortening life, but is taking away”
	Air Quality Index in Delhi is more than 300. Seems todays Crackers are emitting more Oxygen that’s why No Koharam by Liberals
	Just imagine the negative impact of air pollution on learning for children in Delhi or Islamabad this result implies.
	Why is no one complaining of air pollution? The AQI in Delhi at my place is more than 300. Is it only a Deepawali centric
	Sir why do you conduct a match in Delhi this season and make fun of India about pollution, can’t you see.
Praise	Saw a Protest done by Mr.Vijay Goel against Delhi Govt. He is the only Politician in BJP Delhi unit who Protest
	This is a GOOD MOVE. This should be rolled out NATION WIDE. Let’s make INDIA safer and cleaner.
	LET’S SEGREGATE OUR WASTE! We can help. Visit: url It can help reduce contribution to a.
	Pedestrianised Karol Bagh road has much cleaner air: CSE
	GGF distributed pollution masks among the kids of Delhi School from his own expenses.
Suggestions	Delhi pollution board has advised people not to go out for morning walks.
	Little kids, heart patients and others should not get carried away by false sense of security of clean air after
	Delhi Needs a 65% Cut in Pollution Levels, Says new CSE Analysis
	India requires immediate action to control Air Pollution. And AONE would like to contribute by encouraging people
	The economic cost of premature death and disability resulting in the loss of income due to #airpollution in India is estimate

Table 3. Verified and Non-verified User Posts vs. ${PM}_{2.5}$ Concentration before COVID-19.

Timeline	Verified User Posts (Count)	${PM}_{2.5}$ Concentration (µg/m³)	Non-Verified User Posts (Count)
March 2019	215	71.23	7724
April 2019	336	75.21	11,445
May 2019	361	92.21	13,075
June 2019	329	57.00	11,275
July 2019	500	44.00	14,628
August 2019	433	31.53	14,549
September 2019	1588	36.62	49,223
October 2019	3495	111.63	79,802
November 2019	14,939	204.60	581,827
December 2019	2264	208.00	67,181
January 2020	956	153.13	40,940
February 2020	1280	118.65	49,409

Table 4. A Few of the Most Retweeted Pollution Tweets Each Month in the Pre-COVID-19 Timeline.

Month	Selected Top Retweeted Tweets
April 2019	The last time Delhi was on Top of a table, I was reading Air Pollution data. #RRvDC
April 2019	Air pollution has cut the average lifespan of a South Asian child by two-and-a-half years url
May 2019	Could India say goodbye to air pollution? url
May 2019	Air pollution in India is a problem – if exposed, it can be as bad as smoking 50 cigarettes per day. To help fight pollution.
June 2019	Life expectancy in India down by 2.6 yrs due to air pollution: Study url
June 2019	Air pollution kills 100,000 children under five in India each year url
July 2019	India air pollution: Will Gujarat’s ‘cap and trade’ programme work? url
July 2019	This is what the future looks like. @nilanjanaroy on New Delhi’s blistering heat and pollution url
August 2019	India air pollution: Will Gujarat’s ‘cap and trade’ programme work? url
August 2019	This is what the future looks like. @nilanjanaroy on New Delhi’s blistering heat and pollution url
September 2019	According to the World Bank, India lost over 8.5% of its GDP in 2013 due to air pollution url
September 2019	Air pollution in Delhi drops 25% in four years: How Arvind Kejriwal cleared the air url
October 2019	Delhi is the most polluted city in the world today, says Air quality report url
	Delhi: Air Quality Index (AQI) at 306 (very poor) in Lodhi Road area. url
	A lot of people here are rightly demanding immediate closure of schools in Noida/Delhi. Pollution level dangerously high!
November 2019	India air pollution at ‘unbearable levels’, Delhi minister says url
	‘Gas chamber’: Pollution hits record high in India’s New Delhi url
	India declares a pollution health emergency as official compares Delhi’s air to a GAS CHAMBER’ url
December 2019	pollution or protest, it has to reach Delhi, else it doesn’t seem to matter.
	Delhi’s air quality turns “severe” as toxic haze lingers. url
	India Should Take Urgent Action To Tackle Air Pollution: UN Body Official url
January 2020	Are any of the climate change hysterical, protesting in India? Looks like Ground Zero to me.
	Eastern expressway cut air pollution in Delhi by 7%: CRRI study url
	Give maximum funds to fight pollution in Delhi: CM @ArvindKejriwal to Centre. url
February 2020	Delhi CM Arvind Kejriwal asks AAP volunteers not to burst crackers during victory celebrations to prevent pollution
	Twenty-one of the world’s 30 cities with the worst air pollution are in India, with six in the top ten. url
	Did you know 7 million premature deaths annually are linked to air pollution? url

Table 5. Pollution Tweet Frequency Posted at COVID-19 Time & ${PM}_{2.5}$ Raw Concentration.

Timeline	Tweet Frequency (Tweet Count)	${PM}_{2.5}$ Concentration (µg/m³)
March 2020	26,342	52.95
April 2020	114,616	40.83
May 2020	12,508	50.52
June 2020	12,974	42.24
July 2020	7408	30.28
August 2020	29,722	21.28
September 2020	3304	44.03

Table 6. Sample of Tweets Categorised by Intent During 9–11 April 2020 (COVID-19 Period).

Tweet No.	Intent	Tweet
1	Suggestion	@NYT now that we have seen the potential for blue sky in Delhi and elsewhere, can we add carbon back strategical url
2	Suggestion	RT @Greenpeace: “Pollution is going down, but we cannot let the suffering of so many human beings be the way to clean the air url
3	Complaint	@rachbarnhart Air quality in New Delhi sometime will go as bad as #500. Now it is #38!!
4	Complaint	I did not realise how polluted the air is in India. The bad polluted air backs up along the Himalaya Mountain Range … url
5	Praise	Amazing difference in air #pollution occurring in India. Once the coronavirus pandemic is past there will be a lot url.
6	Praise	This has been a fascinating aspect/byproduct of the pandemic. Amazing how quickly the air quality is impacted by tr. url.

Table 7. Verified and Non-verified User Posts vs. ${PM}_{2.5}$ Concentration during COVID-19.

Timeline	Verified User Posts (Count)	${PM}_{2.5}$ Concentration (µg/m³)	Non-Verified User Posts (Count)
March 2020	607	52.95	25,735
April 2020	1948	40.83	112,668
May 2020	278	50.52	12,230
June 2020	279	42.24	12,695
July 2020	209	30.28	7199
August 2020	845	21.28	28,877
September 2020	73	44.03	3231

Table 8. Top Retweeted Pollution Tweets on Monthly Basis in COVID-19 Timeline.

Month	Top Retweeted Tweets
April 2020	People in India can see the Himalayas for first time in ‘decades,’ as lockdown reduces air pollution url.
	Finally delhi air quality is better than mumbai
	Delhi records a 63% reduction in nitrogen oxide poisonous gas url.
May 2020	Wow now if only this realisation could stay in pps minds n we really work to lower the pollution we create
May 2020	Amit Shah Logic: The use of fossil fuels will reduce air pollution. url.
June 2020	Mumbai Startup Creates Carbon Tiles Out of Polluted Air. Brilliant! url.
June 2020	Delhi govt. to provide free sewer connections to reduce pollution in the Yamuna url.
July 2020	Delhi govt will work with neighbouring states to deal with air pollution problem in winters. url.
July 2020	Delhi air quality is GOOD status today! Never ever before. #AQI url.
August 2020	Ditching fossil fuels would pay for itself through clean air alone. #CleanAirIsAHumanRight url.
August 2020	This Electric Vehicle Policy is the country’s most progressive policy: Delhi CM Arvind Kejriwal
September 2020	Delhi Air Pollution: SC Directs Completion Of Smog Towers Construction In 10 Months [Read Order] url.
September 2020	Rain, Wind Speed, #COVID19 Curbs Help Delhi Breathe Clean: Pollution Body url.

Table 9. Comparison of temporal and user-defined features in pre-COVID-19 and COVID-19 tweets on air quality in India.

Metric	Pre-COVID-19	COVID-19	% Change
Number of Tweets	967,774	206,874	−78.62%
Posts by Verified Users	26,696	4239	−84.12%
Posts by Unverified Users	941,078	202,635	−78.46%
Retweet Count	744,995	147,457	−80.20%
Recurrence Rate	70,971	61,978	−12.67%
Users’ Intent Analysis	Complaints, Suggestions, Praises	Suggestions, Praises	–
Average Tweet Length (chars)	152	128	−9.40%
Top Relevant Keywords	#DelhiPollution, #OddEven, #DelhiAirEmergency	#KnowYourAir, #coronavirus, #COVID19	–
Peak Activity Day	17 November 2019	10 April 2020	–
Peak Activity Month	November	April	–
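
The "% Change" column in Table 9 appears to follow the usual relative-change formula, (COVID-19 value − pre-COVID-19 value) / pre-COVID-19 value × 100. The following is a minimal sketch of that arithmetic for three of the count rows; the function name `percent_change` and the dictionary layout are illustrative assumptions, not the authors' code.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# (pre-COVID-19, COVID-19) counts copied from Table 9.
table9_counts = {
    "Number of Tweets": (967_774, 206_874),
    "Posts by Verified Users": (26_696, 4_239),
    "Recurrence Rate": (70_971, 61_978),
}

# Round to two decimals, matching the precision used in the table.
changes = {
    name: round(percent_change(pre, covid), 2)
    for name, (pre, covid) in table9_counts.items()
}

for name, value in changes.items():
    print(f"{name}: {value:+.2f}%")
```

Under this assumption, the three rows come out at the values listed in the table (−78.62%, −84.12% and −12.67%).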

Table 10. Summary statistics of ${PM}_{2.5}$ concentrations in pre-COVID-19 and COVID-19 timelines.

Timeline	Mean	SD	Min	25%	Median	75%	Max
Pre-COVID	100.38	61.24	31.53	53.75	83.71	127.27	208.73
COVID	40.31	11.14	21.29	35.56	42.24	47.28	52.96
% Change	−59.84%	−81.81%	−32.49%	−33.85%	−49.54%	−73.36%	−74.62%
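
As a rough cross-check, the COVID row of Table 10 can be approximated from the seven monthly ${PM}_{2.5}$ means in Table 5. The sketch below assumes, without confirmation from the source, that the summary was computed over monthly values using the sample standard deviation and linearly interpolated quartiles; the helper name `describe` is hypothetical.

```python
import statistics

# Monthly PM2.5 means (µg/m³), March to September 2020, copied from Table 5.
covid_pm25 = [52.95, 40.83, 50.52, 42.24, 30.28, 21.28, 44.03]


def describe(values):
    """Mean, sample SD, min, quartiles and max of a list of numbers."""
    # method="inclusive" interpolates linearly between observed points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "median": median,
        "75%": q3,
        "max": max(values),
    }


summary = describe(covid_pm25)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the results agree with the COVID row to within about 0.01 µg/m³, which is consistent with rounding in the monthly means.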

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

