Estimation of Ground PM2.5 Concentrations in Pakistan Using Convolutional Neural Network and Multi-Pollutant Satellite Images

During the last few decades, worsening air quality has been diagnosed in many cities around the world. Accurate prediction of air pollutants, particularly particulate matter 2.5 (PM2.5), is extremely important for environmental management. This paper presents a convolutional neural network (CNN) model, P-CNN, which uses seven different pollutant satellite images, namely Aerosol index (AER AI), Methane (CH4), Carbon monoxide (CO), Formaldehyde (HCHO), Nitrogen dioxide (NO2), Ozone (O3) and Sulfur dioxide (SO2), as auxiliary variables to estimate daily average PM2.5 concentrations in various cities of Pakistan (Islamabad, Lahore, Peshawar and Karachi). The dataset contains a total of 2562 images from May 2019 to April 2020. We compare and analyze AlexNet, VGG16, ResNet50 and P-CNN on every dataset. The accuracy of the models was evaluated with the Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). The results show that P-CNN is more accurate than the other approaches in estimating PM2.5 concentrations from satellite images. This study thus presents a robust model for estimating PM2.5 concentrations from satellite images.


Introduction
Particulate matter with a diameter of 2.5 µm or less (PM2.5) is hazardous to human health, damaging the lungs and impairing lung function [1][2][3][4][5][6]. These fine particles are extremely dangerous once they reach the lungs: they might compound the severity of COVID-19 infection and increase the risk of attacks on, and damage to, the respiratory system [7]. Overall, these hazardous pollutants affect human health and can produce life-threatening complications within a short period when present in the atmosphere at high concentrations [8]. Research has shown that particulate matter can potentially affect humans at the genetic level [9].
Various methods have been presented to better characterize city-wide air quality by making the most of the limited data gathered by monitoring stations through spatial interpolation, for example, nearest-neighbor interpolation and spatial averaging [10,11]. Most of these systems address the data sparsity problem by interpolating the monitoring data, based on the assumption that air pollution particles diffuse in a spatially continuous manner. However, these methods have two significant drawbacks. First, different estimation approaches can obtain completely different results. Second, these discrepancies are particularly severe for raw data with a sparse spatial distribution. Various researchers have also optimized the air quality monitoring network [12]. For instance, Mei et al. [13] suggest a method to monitor air quality using mobile data. Crowdsourcing computing, … In contrast, this study uses satellite images and employs a novel deep learning-based method for PM2.5 prediction. Prediction from satellite images is not limited to particular locations and can be used to estimate air quality at any location. This study uses seven satellite images (AER AI, CH4, CO, HCHO, NO2, O3 and SO2) collected by the high-resolution TROPOMI sensor aboard the Sentinel-5P satellite. Our method differs from existing methods in that it estimates the daily average PM2.5 concentration from the satellite images collected every day by the TROPOMI sensor. It can address the weaknesses of present air quality monitoring technologies and offer fine-grained, low-cost air quality monitoring. The air quality index (AQI) is a daily indicator of air quality at a certain location; it measures how air pollution affects a person's health over a short period (less than 24 h). The proposed technique can estimate the AQI directly, which is a broader measure that better reflects air quality.
In short, this study investigates the relationship between PM2.5 concentrations and the concentrations of various pollutants derived from satellite images. P-CNN, a deep convolutional neural network, recognizes and extracts patterns and features from seven input images and uses them to estimate the daily average PM2.5 concentration. This study uses four datasets covering Islamabad, Karachi, Lahore and Peshawar; each dataset contains seven pollutant images for each day. In addition, we conduct a comparative analysis of the proposed model against three other deep learning models on the four datasets to obtain more robust results.
The remainder of this paper is structured as follows. The second section introduces the study area, datasets and methodology. The third section presents the results and discussion, and the last section presents the conclusions and implications.

Study Area and Dataset
The study area of this paper is Pakistan. We selected four metropolitan cities for our experiments: Karachi, Lahore, Islamabad and Peshawar. Because there is no openly available library for estimating PM2.5 concentrations from satellite images, a multi-input air quality image database based on the Sentinel-5P satellite was built for each city (Islamabad, Lahore, Peshawar and Karachi). The library contains 2562 satellite scene images collected at different PM2.5 levels. We used the following steps to create the dataset:

• We collected scene images for each city from the official website [34] from May 2019 to April 2020. Each day contains seven different pollutant images (AER AI, CH4, CO, HCHO, NO2, O3 and SO2). Table 1 describes the air quality image collection points. A single image cannot capture the concentrations of various gases; therefore, each sample in our work is described by at least seven satellite images. Because the standard single-input CNN architecture is not suitable for this task, a novel P-CNN model was built to accept seven images as input.

Figure 2 shows actual satellite images of the seven air pollutants at different PM2.5 air quality levels in the image library. A to D denote different days, while I, II, III, IV, V, VI and VII are the seven pollutant images captured by the Sentinel-5P satellite on the same day: I shows the AER AI concentration, II the CH4 concentration, III the CO concentration, IV the HCHO concentration, and V, VI and VII the NO2, O3 and SO2 concentrations, respectively.

Real-time monitoring stations in the main cities of Pakistan, such as Islamabad, Lahore, Karachi and Peshawar, measure air quality levels and upload them to a website for open access. Figure 1 shows the locations of the monitoring stations. Hourly real-time PM2.5 data were obtained from the official website [35]. Since the monitoring stations measure PM2.5 concentrations hourly, we averaged the 24 hourly values of each day into a daily average to train our model. For model training, 70% of the images were randomly selected for training and 30% for testing. Furthermore, to prevent … (1) random image rotation between 0 and 360 degrees.
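The daily averaging of hourly station data and the random 70/30 split described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the `(timestamp, value)` record format and the function names `daily_means` and `split_70_30` are hypothetical, and the rotation augmentation is not shown.

```python
import random
from collections import defaultdict
from datetime import datetime


def daily_means(hourly):
    """Average hourly PM2.5 readings into one value per calendar day.

    `hourly` is a list of (ISO-8601 timestamp, value) pairs; this record
    format is a hypothetical stand-in for the station data.
    """
    buckets = defaultdict(list)
    for stamp, value in hourly:
        buckets[datetime.fromisoformat(stamp).date()].append(value)
    # One mean per day, in chronological order.
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


def split_70_30(samples, seed=0):
    """Randomly assign 70% of the samples to training and 30% to testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


readings = [("2019-05-01T00:00", 80.0), ("2019-05-01T01:00", 100.0),
            ("2019-05-02T00:00", 60.0)]
print(daily_means(readings))
train, test = split_70_30(range(10))
print(len(train), len(test))  # 7 3
```

Fixing the seed makes the split reproducible; in practice the split would be applied to whole daily samples (all seven images of a day together) so that one day never appears in both subsets.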

Convolutional Neural Network (CNN)
CNN, first proposed by LeCun et al. [36] for handwritten digit recognition, has been widely successful in image detection, segmentation and identification tasks [37][38][39][40][41][42]. CNNs have shown a remarkable capacity to classify large-scale image collections. A CNN consists of three types of layers: convolutional layers, pooling layers and fully connected layers. The essential layers are the convolutional and pooling layers. The convolutional layers extract features by convolving image regions with numerous filters; as more layers are stacked, the CNN gradually builds a more abstract understanding of the image. The pooling layers reduce the dimensions of the output maps of the convolutional layers and help avoid overfitting. Together, these two layer types substantially reduce the number of neurons, parameters and connections in the model. Thus, CNNs are much more efficient than backpropagation (BP) neural networks with correspondingly sized layers.
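To make the parameter savings concrete, the sketch below counts the trainable parameters of a convolutional layer with 32 kernels of size 4 × 4 × 3 (the configuration of P-CNN's first convolutional layers) and of a hypothetical fully connected layer that maps the same 300 × 300 × 3 image to 32 units. The helper names are illustrative, and the fully connected layer is an assumption chosen only for scale.

```python
def conv_params(kernel_h, kernel_w, in_channels, n_kernels):
    """Trainable parameters of a convolutional layer (weights + one bias per kernel)."""
    return n_kernels * (kernel_h * kernel_w * in_channels + 1)


def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer (weights + one bias per unit)."""
    return n_units * (n_inputs + 1)


conv = conv_params(4, 4, 3, 32)          # 32 kernels of size 4 x 4 x 3
dense = dense_params(300 * 300 * 3, 32)  # hypothetical dense layer, same input, 32 units
print(conv, dense)  # 1568 8640032
```

Because the kernels are shared across all image positions, the convolutional count does not grow with the image size, whereas the fully connected count grows with every input pixel.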

Architecture of P-CNN
Based on the standard CNN architecture, we propose a model named P-CNN, which is employed to estimate PM2.5 concentrations and achieves favorable results on the dataset. Figure 3 shows the complete P-CNN architecture.

The convolutional layers C1-C7 filter the seven 300 × 300 × 3 input images with 32 kernels of size 4 × 4 × 3 and a stride of 1 pixel. The pooling layers S1-S7 have a stride of 2 pixels. The convolutional layers C8-C14 filter the pooled maps with 16 kernels of size 4 × 4 (each spanning the 32 input feature maps) and a stride of 1 pixel. The pooling layers S8-S14 also have a stride of 2 pixels. The outputs of S8-S14 are flattened (E1-E4), and dropout is applied to them. D1 is the concatenation of the flattened vectors E1-E4. The fully connected layers FC1 and FC2 each have ten neurons, and FC3 has one. The output layer uses a linear activation function.
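The spatial size of the feature maps in one input branch can be traced with the standard output-size formula for convolution and pooling. This sketch assumes "valid" (no) padding and a 2 × 2 pooling window; the text gives only the pooling stride, so other padding or window choices would change the numbers. The function names are illustrative.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output side length of 'valid' max pooling."""
    return (size - window) // stride + 1


def branch_output(size=300, last_maps=16):
    """Trace one input branch: C1-C7 -> S1-S7 -> C8-C14 -> S8-S14 -> flatten."""
    side = pool_out(conv_out(size, kernel=4))  # 32 maps after C1-C7, then S1-S7
    side = pool_out(conv_out(side, kernel=4))  # 16 maps after C8-C14, then S8-S14
    return side, side * side * last_maps       # (side length, flattened length)


print(branch_output())  # (72, 82944)
```

Under these assumptions each branch shrinks from 300 to 297, 148, 145 and finally 72 pixels per side, so its flattened vector has 72 × 72 × 16 = 82944 entries before concatenation.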
Remote Sens. 2022, 14, 1735

The model is implemented with Keras, a high-level neural network API [16]. All experiments were carried out on an Ubuntu Kylin 14.04 server equipped with a 3.40 GHz i7-3770 CPU (16 GB RAM) and a GTX 1070 graphics card (8 GB memory). The original images have a resolution of 3310 × 1575 pixels, which must be reduced to fit into GPU memory. All original images are scaled to 300 × 300 pixels, and each pixel value is then divided by 255. In addition, the images are normalized and standardized before being fed into the model to achieve rapid convergence. A randomization process ensures that the model is not influenced by the order in which images are input: both the order of the samples and the order of the seven images within each sample are randomized. The CNN training procedure is divided into two steps: forward propagation and backward propagation.
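The scaling, standardization and sample shuffling described above might look as follows in plain Python. This is a simplified sketch under stated assumptions: each image is treated as a flat list of 8-bit pixel values, standardization is done per image, and only the order of samples is shuffled (with their labels kept aligned); the function names are hypothetical.

```python
import random
import statistics


def preprocess(pixels):
    """Scale 8-bit pixel values to [0, 1], then standardize to zero mean and unit variance.

    `pixels` is a flat list of values for one image (a stand-in for a
    300 x 300 x 3 array after resizing).
    """
    scaled = [p / 255.0 for p in pixels]
    mean = statistics.mean(scaled)
    std = statistics.pstdev(scaled) or 1.0  # avoid division by zero for constant images
    return [(p - mean) / std for p in scaled]


def shuffle_samples(samples, labels, seed=0):
    """Shuffle sample order while keeping each sample paired with its PM2.5 label."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]


print(preprocess([0, 255]))  # [-1.0, 1.0]
```

Shuffling an index list, rather than the samples and labels separately, is what keeps each daily PM2.5 label attached to its own images.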

Forward Propagation
Data are transmitted from the input layer to the output layer through a sequence of convolution, pooling and fully connected operations. Each convolutional layer filters the output of the preceding layer with trainable kernels, followed by an activation function, to build the output feature map.
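In general, a convolutional layer computes

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \qquad (1)$$

where $x_j^{l}$ is the $j$-th output map of layer $l$, $M_j$ denotes the collection of input maps we choose, $*$ denotes convolution, $k_{ij}^{l}$ is the kernel connecting input map $i$ to output map $j$, $b_j^{l}$ is the bias applied to the whole output map, and $f$ is the activation function. The pooling layer summarizes the outputs of neighboring neurons within each kernel window:

$$x_j^{l} = f\left(\beta_j^{l}\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^{l}\right) \qquad (2)$$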
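The convolution and pooling operations described above (summing kernel responses over the selected input maps plus a bias, then applying an activation; and max-pooling subsampling scaled by a multiplicative bias β plus an additive bias b) can be illustrated with a small pure-Python sketch. As in common CNN libraries, the convolution is implemented as a "valid" cross-correlation (the kernel is not flipped), which makes no practical difference because the kernels are learned. The 2 × 2 non-overlapping pooling window, the ReLU used in the example and all function names are assumptions for illustration.

```python
def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation of feature map x with kernel k (stride 1)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def conv_layer_map(inputs, kernels, bias, f):
    """One output map: f(sum over selected inputs of x_i * k_ij, plus bias b_j)."""
    responses = [conv2d_valid(x, k) for x, k in zip(inputs, kernels)]
    h, w = len(responses[0]), len(responses[0][0])
    return [[f(sum(m[r][c] for m in responses) + bias) for c in range(w)]
            for r in range(h)]


def down(x, size=2):
    """Subsampling by non-overlapping max pooling with a size x size window."""
    return [[max(x[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(x[0]) - size + 1, size)]
            for r in range(0, len(x) - size + 1, size)]


def pool_layer_map(x, beta, bias, f):
    """One pooled map: f(beta_j * down(x_j) + b_j)."""
    return [[f(beta * v + bias) for v in row] for row in down(x)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 1]]
fmap = conv_layer_map([x], [k], bias=0.0, f=lambda v: max(0.0, v))  # ReLU, for illustration
print(fmap)                                                      # [[6.0, 8.0], [12.0, 14.0]]
print(pool_layer_map(fmap, beta=1.0, bias=0.0, f=lambda v: v))   # [[14.0]]
```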
where β denotes the multiplicative bias, b the additive bias, and "down" a subsampling function that uses the max-pooling algorithm [43]. We chose max pooling over mean pooling because mean pooling averages each region and can therefore blur critical information such as the edges of objects, whereas max pooling selects the most active neuron of each region in the feature maps, which is more efficient [44]. As a result, useful features are easier to extract with max pooling. The fully connected layers are equivalent to the hidden layers of a multilayer perceptron. For regression, the output layer uses the linear activation function [45]:

$$f(x) = a x \qquad (3)$$

where a can be any constant. The derivative of f(x) is therefore not zero but equal to the constant a, independent of the input x. Consequently, the weights and biases are still updated during backpropagation, although the factor contributed by the output activation remains the same.
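The two arguments above, max versus mean pooling and the constant gradient of the linear output activation, can be checked numerically. The 2 × 2 patch below is a made-up example in which a single strong response marks an edge; the function names are illustrative.

```python
def max_pool(patch):
    """Keep the strongest response in the window."""
    return max(v for row in patch for v in row)


def mean_pool(patch):
    """Average all responses in the window."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def linear(x, a=1.0):
    """Linear output activation f(x) = a * x."""
    return a * x


def linear_grad(x, a=1.0):
    """Derivative f'(x) = a: a non-zero constant that does not depend on x."""
    return a


edge_patch = [[0.0, 0.0], [0.0, 8.0]]   # one strong response, e.g. at an edge
print(max_pool(edge_patch), mean_pool(edge_patch))         # 8.0 2.0
print(linear_grad(-5.0, a=2.0), linear_grad(50.0, a=2.0))  # 2.0 2.0
```

Mean pooling dilutes the single edge response to a quarter of its strength, while max pooling preserves it; the linear activation passes the same gradient a back regardless of the input value.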

Backward Propagation
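Backward propagation adjusts the parameters using stochastic gradient descent (SGD) to reduce the discrepancy between the predicted and actual outcomes. To avoid overfitting, L1 and L2 regularization are used. With L1 regularization, the cost function becomes

$$C = C_0 + \lambda \sum_{w} |w| \qquad (4)$$

where $C_0$ is the original (unregularized) loss and $\lambda$ is the regularization weight. With L2 regularization, the cost function is

$$C = C_0 + \lambda \sum_{w} w^{2} \qquad (5)$$

This paper uses a weight of $\lambda = 0.0001$ for both L1 and L2 regularization. Dropout is also used to prevent overfitting [46], with its rate set to 0.1. The SGD algorithm calculates the gradients and updates the weights. For a convolutional layer followed by a pooling layer, the sensitivities are propagated backward as

$$\delta_j^{l} = \beta_j^{l+1}\left(f'\left(u_j^{l}\right) \circ \mathrm{up}\left(\delta_j^{l+1}\right)\right) \qquad (6)$$

and the gradient of the multiplicative bias of a pooling layer is

$$\frac{\partial C}{\partial \beta_j} = \sum_{u,v}\left(\delta_j^{l} \circ \mathrm{down}\left(x_j^{l-1}\right)\right)_{uv} \qquad (7)$$

Each weight is then updated as

$$w \leftarrow w - \eta\,\frac{\partial C}{\partial w} \qquad (8)$$

where $\delta$ denotes the sensitivity of each unit to changes in the bias b ($\delta = \partial C / \partial b$), $u_j^{l}$ is the pre-activation input of map $j$ in layer $l$, $\circ$ denotes element-wise multiplication, up() is an upsampling operation, down() is the subsampling operation, $w$ is the weight being updated, and $\eta$ is the learning rate.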
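A minimal sketch of the regularized loss and the SGD weight update for a flat list of weights follows. It assumes the L1 and L2 penalties are simply added to the same loss (as a combined L1/L2 regularizer would do), uses sign(w) as the L1 subgradient, and takes the gradients of the unregularized loss C0 as given; the function names are hypothetical.

```python
def regularized_loss(c0, weights, l1=1e-4, l2=1e-4):
    """C = C0 + l1 * sum|w| + l2 * sum w^2 (L1 and L2 penalties applied together)."""
    return c0 + l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def sgd_step(weights, loss_grads, lr, l1=1e-4, l2=1e-4):
    """One SGD update w <- w - lr * dC/dw.

    loss_grads[i] is dC0/dw_i; the penalty terms add l1 * sign(w) (an L1
    subgradient, 0 at w == 0) and 2 * l2 * w (the L2 gradient).
    """
    def sign(w):
        return (w > 0) - (w < 0)

    return [w - lr * (g + l1 * sign(w) + 2 * l2 * w)
            for w, g in zip(weights, loss_grads)]
```

The L1 term pushes every non-zero weight toward zero by a fixed amount, while the L2 term shrinks each weight in proportion to its size; with λ = 0.0001 both act as mild penalties next to the data loss.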

Evaluation Metrics
Three evaluation measures were employed to quantitatively assess the capabilities of the constructed P-CNN model: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). MAE is an assessment statistic commonly used for regression models; it measures the average absolute discrepancy between estimates and actual results and is used here to assess the model's accuracy.
RMSE is a commonly used metric for determining how well a model predicts quantitative data. Here, RMSE measures the error between the actual values (station measurements) and the predicted values (model outputs).
MAPE, the mean absolute percentage error, is a statistical indicator of prediction accuracy expressed as a percentage. For each sample, the absolute difference between the actual and predicted values is divided by the actual value, and these percentages are averaged over all samples. Because larger concentrations tend to produce larger absolute errors, MAPE, which scales each error by the actual value, allows the models to be compared more fairly across concentration levels.
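The three metrics can be written directly from their standard definitions, MAE = (1/n) Σ |y_i − ŷ_i|, RMSE = √((1/n) Σ (y_i − ŷ_i)²) and MAPE = (100%/n) Σ |(y_i − ŷ_i)/y_i|, where y_i is the station value and ŷ_i the model prediction. The sketch below is a plain-Python illustration with made-up values, not the evaluation code used in this study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


station = [50.0, 100.0]   # made-up daily PM2.5 values
model = [60.0, 90.0]
print(mae(station, model), rmse(station, model))  # 10.0 10.0
print(round(mape(station, model), 6))             # 15.0
```

Both predictions miss by 10, so MAE and RMSE are equal, but the miss on the smaller value counts twice as much in MAPE (20% versus 10%).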

Results
AlexNet, VGG16, ResNet50 and P-CNN were all evaluated for their prediction abilities using three indicators: MAE, RMSE and MAPE. Table 2 displays the MAE results for Lahore, Karachi, Peshawar and Islamabad for the different models. AlexNet achieved an average MAE of 34.464 across the datasets; ResNet50 reduced this by 5.113, VGG16 reduced it by a further 7.723, and P-CNN reduced it by another 6.475, giving an average MAE over all cities of 15.152, a very good result. For MAPE, the average value for all cities with the AlexNet model was 43.932. ResNet50 decreased the MAPE by 7.373, VGG16 by a further 11.990, and P-CNN by another 9.403, giving an average MAPE over all cities of 15.167. All of these metrics show that P-CNN is superior to the other models.

Figure 5 shows the difference between the actual and predicted values after applying the same models to Lahore. The graphs clearly demonstrate that P-CNN (d) outperformed the other models, achieving the lowest MAE, RMSE and MAPE (14.205, 20.835 and 12.394). The actual and estimated outcomes for Peshawar are depicted in Figure 6. The graph clearly shows that the values estimated by P-CNN (d) are more accurate than those of AlexNet (a), VGG16 (b) and ResNet50 (c), and the performance metrics likewise reveal that P-CNN (d) outperformed all other models.

To further test the efficiency of P-CNN, we trained the model on one city's dataset and tested it on all remaining cities. As seen in Figure 8, after training on Islamabad, the model can be used to predict PM2.5 concentrations in other cities, such as Karachi, Lahore and Peshawar.
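The reported MAE and MAPE averages can be cross-checked by applying the stated reductions in sequence (AlexNet, then ResNet50, VGG16 and P-CNN). The sketch below only redoes this arithmetic with the numbers given in the text; its final values differ from the reported 15.152 (MAE) and 15.167 (MAPE) by 0.001, which is consistent with rounding in the reported figures. The function name is illustrative.

```python
def cumulative_averages(start, reductions):
    """Apply successive reductions to a starting average (rounded to 3 decimals)."""
    values = [start]
    for r in reductions:
        values.append(round(values[-1] - r, 3))
    return values


# Order as reported: AlexNet -> ResNet50 -> VGG16 -> P-CNN
mae_avgs = cumulative_averages(34.464, [5.113, 7.723, 6.475])
mape_avgs = cumulative_averages(43.932, [7.373, 11.990, 9.403])
print(mae_avgs)   # [34.464, 29.351, 21.628, 15.153]
print(mape_avgs)  # [43.932, 36.559, 24.569, 15.166]
```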
Figure 8 clearly demonstrates that the P-CNN predicted values for Karachi, Lahore and Peshawar are very close to the real values. The proposed model was also trained on the Karachi dataset and evaluated on the datasets of Lahore, Peshawar and Islamabad (Figure 9); the results indicate that a model trained on Karachi data can be applied to these cities. Likewise, a model trained on Lahore data accurately predicted PM2.5 concentrations in Islamabad, Karachi and Peshawar (Figure 10), and, according to Figure 11, a model trained on Peshawar data can predict PM2.5 concentrations in Islamabad, Lahore and Karachi. These results show that P-CNN can be applied to other cities after being trained on a single city. Overall, they demonstrate that the P-CNN model is useful for predicting PM2.5 concentrations from satellite images.
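The cross-city protocol above (train on one city, then test on each of the others) can be sketched generically. `train_fn`, `predict_fn` and `metric` are hypothetical placeholders for model training, inference and an error metric; the sketch only shows how the train/test pairs are formed, not how P-CNN itself is trained.

```python
from typing import Callable, Dict, List, Tuple

# (samples, daily PM2.5 labels) for one city; the sample type is left open.
Dataset = Tuple[List[object], List[float]]


def cross_city_eval(
    datasets: Dict[str, Dataset],
    train_fn: Callable[[Dataset], object],
    predict_fn: Callable[[object, List[object]], List[float]],
    metric: Callable[[List[float], List[float]], float],
) -> Dict[Tuple[str, str], float]:
    """Train on each city in turn and score the model on every other city."""
    results = {}
    for train_city, train_data in datasets.items():
        model = train_fn(train_data)
        for test_city, (x_test, y_test) in datasets.items():
            if test_city == train_city:
                continue  # only unseen cities are evaluated
            results[(train_city, test_city)] = metric(y_test, predict_fn(model, x_test))
    return results
```

With four cities this yields twelve (training city, test city) scores, one for each pairing shown across Figures 8 to 11.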

Discussion
This study uses seven inputs to estimate PM2.5 concentrations in four cities of Pakistan, namely Islamabad, Lahore, Karachi and Peshawar. The findings suggest that the seven input pollutants (AER AI, CH4, CO, HCHO, NO2, O3 and SO2) are closely linked with PM2.5. Existing studies have used different approaches for PM2.5 estimation. Li et al. [47] used transmission and depth matrices to estimate haze levels as a proxy for PM2.5 and evaluated their method on two datasets. The PM2.5 dataset contains 8761 photographs, with a reported absolute Spearman correlation of 40.83%; the second dataset contains three classes (HeavyHaze, LightHaze and NonHaze), with a reported correlation of 89.05%. Zhang et al. [48] proposed a deep learning method to classify camera images into six AQI levels: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy and Hazardous; the method achieved 74.0% accuracy on their dataset. Both studies developed deep learning models for classification, whereas the proposed P-CNN approach uses seven auxiliary satellite images to estimate PM2.5 concentrations as continuous real values. Estimating PM2.5 concentrations differs from classifying, segmenting or recognizing objects based on attributes such as color or texture. We tested the P-CNN model on four different datasets using statistical metrics and achieved satisfactory values of MAE (15.152), RMSE (19.557) and MAPE (15.167). These results show that P-CNN provides better PM2.5 estimates than the compared models. One advantage of this model is that it can cover remote areas when estimating air quality.
There are several reasons why the air quality in Lahore and Karachi is far worse than in Islamabad and Peshawar. Peshawar and Islamabad are smaller and less populated than Lahore and Karachi. Islamabad and Peshawar have less public transit than Lahore and Karachi, as well as fewer industries and construction sites. Lahore and Karachi also have higher rates of urbanization than Peshawar and Islamabad. Lahore, the second-largest metropolitan city in Pakistan with a population of 11 million residents, has topped the daily rankings of the world's most polluted cities for the second time this year. Tree cover in Lahore has declined significantly over the previous 15 years as a result of an ambitious effort to develop highways, bridges and tunnels. Increasing population, industry, the deplorable condition of municipal utilities and traffic congestion are the primary sources of air pollution in Karachi. Furthermore, rapid urbanization has aggravated environmental issues such as inadequate sewage systems, overcrowding, inadequate transportation and uncontrolled growth, particularly in Karachi. Air pollution is also exacerbated by industrial pollutants, waste burning, house fires and other particulates. However, it appears that neither the government nor environmental organizations are taking this matter seriously or responding quickly enough. Similarly, population growth accelerates agricultural and industrial production, resulting in an increase in waste [49]. The government can help relevant industries by providing green credit funds for an eco-friendly environment, which would help the business community accelerate green technology and research and development. As a developing economy, Pakistan suffers huge losses due to environmental problems.
Between 1999 and 2018, the country spent around USD 3.8 billion fighting environmental issues in Karachi, Lahore and Peshawar [50]. Water- and land-based ecosystems are being destroyed, and unplanned urban structures have badly damaged the environment. This suggests that poor socioeconomic systems contribute to environmental degradation. Lahore, the second-largest metropolitan city in Pakistan, hosts 2233 manufacturing firms [51] and is regarded as one of the most socioeconomically developed cities. However, factors such as industrial waste, poor sanitation systems and a lack of urban planning are barriers to environmental quality. Compared with Karachi and Lahore, Islamabad is a well-planned city with developed transportation and construction sectors. Peshawar is also one of the important hubs in Pakistan. Urban sprawl, deforestation and the burning of contaminated fuel have been identified as drivers of greenhouse gas emissions [52,53].
Overall, the poor socioeconomic status of these cities has hindered efforts to maintain the ecosystem. Poor infrastructure, dense population and dependence on traditional cook stoves can increase PM2.5 as well as CO2 and other greenhouse gas emissions. Mehmood et al. [53] found that most households in rural areas of Pakistan burn wood, straw, animal dung and crop residues for cooking, indicating that most households depend on contaminated fuels. Moreover, cooking with contaminated fuel is directly associated with PM2.5 concentrations [54]; thus, the government should promote clean energy, provide modern cook stoves and reduce fossil fuel consumption to mitigate PM2.5 and greenhouse gas emissions in Pakistan.
From 1 January to 31 December 2017, all four cities (Lahore, Peshawar, Islamabad and Karachi) had PM2.5 concentrations above the recommended standard (10 µg/m³). In the AQI rankings of the world's most polluted cities, Lahore was ranked sixth and Karachi sixteenth, with AQI levels of 170 and 155, respectively [55]. Most recently, Lahore was ranked as the world's most polluted city [56]. Hence, the best and most effective tools and methods to analyze, understand and estimate air quality are urgently needed. Our proposed deep learning model for estimating PM2.5 concentrations is efficient and cost-saving: once trained, it does not require physical measurement tools to be deployed in each city. Using portable devices (laptops, mobile phones, etc.), PM2.5 concentrations for any city can be estimated with our model. Pudasaini et al. [57] proposed a model to estimate PM2.5 concentrations from photographs; however, that approach requires traveling to the site and taking a picture with a mobile phone. In contrast, our method only requires selecting a city to predict its PM2.5 concentrations on a portable device anywhere. Thus, this study suggests a reliable and effective way of estimating PM2.5 concentrations.

Conclusions
This paper proposes a deep learning model, P-CNN, for estimating PM2.5 concentrations. The model uses deep convolutional neural networks to extract feature representations related to PM2.5 from satellite images in order to estimate PM2.5 concentration levels. We also compared the constructed model with other deep learning models, namely AlexNet, VGG16 and ResNet50, on four different datasets (Karachi, Lahore, Peshawar and Islamabad), using MAE, RMSE and MAPE as accuracy metrics. The experimental results demonstrated that the P-CNN model is more suitable for predicting PM2.5 concentrations than the other models, and that the PM2.5 concentrations it predicts from satellite images are closely related to the actual measurements. Although the model provides better results, it has some limitations. Based on the available data, we used samples from May 2019 to April 2020, and the study focuses on four cities of Pakistan. Future research should focus on making the model more accurate and on seasonal PM2.5 estimation, and should use larger datasets covering more cities, which may give more robust results.