An Outlier Detection Study of Ozone in Kolkata India by the Classical Statistics, Statistical Process Control and Functional Data Analysis

: Air pollution is prevalent throughout the entire world due to the release of various gases such as NO x , PM, SO 2 , tropospheric ozone (O 3 ), etc. Ground-stage ozone is the predominant issue in smog and is the product of the interplay between sunlight and emissions. The destructive impact on the health of the populace might also still occur in cities with noticeably clean air and where ozone levels hardly ever exceed safe limits. Therefore, the ﬁndings of small variations in air quality and the technique of regulating air contamination are thought-provoking. The study employs various techniques to effectively observe and assess strategies for detecting and eliminating outliers in ozone emissions from pollution episodes. This technique helps to describe the sources and exceedance values and enhance the value of monitoring the data. In this study, the data have some missing observations. The method of imputation, the classical statistical technique, the statistical process control (SPC) technique, functional data analysis (FDA), and functional process control help to ﬁll in the data and detect outliers, trend deviations, and changes in ozone concentration at ground level. A comparison study is carried out using these three techniques: classical analysis, SPC, and FDA, and the results show how the statistical process control and functional data methods performed better than the classical technique for the detection of outliers and also in what way this methodology can enable an additional, comprehensive method of deﬁning air pollution control measures and water pollution control measures.


Introduction
Air pollution contributes to climate change by affecting humans, animals, and ecosystems and causing illnesses like pneumonia, lung cancer, and influenza. It also causes smog, aerosol formation, reduced eyesight, rising temperatures, acid rain, and early death. According to the 2011 India census, with an urban agglomeration comprising the city and its suburbs, Kolkata has a population of 4.5 million, making it the third most densely populated metropolitan area in the world. Researchers used a machine learning model to forecast air pollution and how it affects human health [1][2][3]. Majumdar et al. [4] projected emissions for 2030 in a business-as-usual case, revealing that existing measures and policies are insufficient to significantly reduce PM 2.5 emissions in Kolkata Metropolitan City by 2030. Understanding the geographical distribution of PM 2.5 , relative humidity, temperature, and wind speed is crucial for assessing air quality in metropolitan areas. This helps to understand the atmospheric conditions contributing to their dispersion. Various studies, comparable to air pollution analyses, have carried out various types of spatial interpolations of atmospheric variables, such as PM 2.5 and PM 10 , and their influence on humans [5][6][7]. Additionally, Refs. [8][9][10] investigated the effects of Kolkata's air quality before and after the lockdown. Imputation is the assignation by inference of a value to something from the value of goods or the value-filling process. In all forms of collected data, missing data is a common concern. Many ways of managing missing data exist. Full or accessible case analysis, the missing indicator method, and general mean imputation are the most basic and commonly used approaches [11]. Missing attribute values in incomplete datasets hinder data mining and machine learning efficiency. Deep learning techniques outperform previous methods for missing value imputation [12,13].
SPC is a process that aims to maintain quality features with minimal variability. Control of statistical processes is crucial for achieving reliability and improving capacity by reducing variability. It involves continuous monitoring of normal variation to identify deviations and adjust for disturbance removal. SPC is a technique for data collection, organization, analysis, and decision-making that determines the mean, lower control limit, and upper control limit (LCL and UCL, respectively). If the values fall, the process is not under control.
In order to apply them to vector problems, FDA has been developed. The FDA approach was inspired by the classical technique of data mining to cope with vectorial data treatment. The applications of FDA have also been used for environmental research [14][15][16][17][18][19][20], medical research [21], and the manufacturing sector [22]. This functional model provides two important features: First, the correlation of the data structure with time is taken into account, and second, comparisons are made with a view of the global problem. The application compares functional depth principal curves, a metric that reflects within a group of curves the centrality of a given curve. By generating a new functional sample, the model transforms the sample vectors to find functional outliers by adapting the principle of functional depth. In order to achieve an improved air quality management solution, the researcher compares the corresponding results with classical and functional approaches and obtains the most suitable methodology to evaluate the dataset. Torres et al. [23] identified a solution that uses functional data analysis to discover the outliers in urban gas emissions. Over time, the researchers considered gas emissions as curves, with outliers found by comparing curves rather than vectors. The method identified outliers in Oviedo's gas omissions using the functional depth principle and compared it with traditional vector comparison methods. A tool for outlier detection in water quality parameters was provided by Di Blasi et al. [24] using the variables of conductivity, turbidity, and ammonium. The methods were based on the consideration of the various parameters as a vector whose concentration values were components. A groundbreaking approach to monitoring water quality over time views the dataset as a time-dependent function, identifying outliers in samples from the Miño river basin based on functional depth in NW Spain. This technique helps identify trends in water quality over time. The approach of practical pattern recognition for the identification of fluvial valleys and topographic profiles of glaciers [25] uses a functional method for the classification of vector machines in order to determine process stability. Martinez et al. [26] addressed statistical process monitoring and control charts used for outlier detection methods. The one-class peeling (OCP) method provides statistical and machine learning techniques in multivariate data to identify multiple outliers. The one-class peeling approach is suitable for statistical process control in high-dimensional data sets with high outliers. However, the one-class peeling approach is more effective and stable in high dimensions. In the additional materials, examples of R commands and data sets determine the respective OCP distances and threshold availability. Garca-Nieto [27] studied the foraging efficacies of aerosol elements for removal devices, including congealing, heterogeneous nucleation, and gravitational subsidence, and examined the health effects of the aerosols on respirable dirt fractions. The scavenging equations were applied to three atmospheric situations: pure, cloudy, and city. The primary elimination mechanism for respirable aerosol was gravitational settling, which is nearly six times better than rainout. Imputation of missing data replaces missing data with values derived from variable distribution estimation. Just one approximation is used in one single imputation. Various projections are used in several imputations, representing the ambiguity in the Sustainability 2023, 15, 12790 3 of 13 distribution calculation. Both single and multiple imputations produce unbiased study correlation estimates under missing-at-random and absolutely-at-random circumstances. However, single imputation leads to small approximate standard errors, whereas multiple imputations can result in incorrect results [28]. In this research, the imputation technique was used to address missing data, ensuring a more complete dataset for analysis. Additionally, the traditional statistical technique, SQC, provided a well-established method for identifying outliers in the air pollution monitoring data. The functional approach, on the other hand, offered a novel perspective by considering the underlying functions and patterns in the data. By comparing and contrasting these three approaches, this research aimed to determine the most effective method for detecting outliers and providing valuable insights for enhancing local air quality measures.

Air Quality Monitoring Station, Kolkata, India
In urban areas, air pollution is very important for environmental health, especially in developing countries. Air pollution is a major concern in Indian cities, and it affects human health. In order to ensure pleasant breathing air in the future, Kolkata is one of the Indian cities that desperately needs intervention policies. More areas are being affected by air pollution from smog, industrial activity, etc. Moreover, ground-stage ozone is the predominant issue in smog and is the product of the interplay between sunlight and emissions. Kolkata has a number of air quality monitoring stations throughout the country that are part of the national program for monitoring ambient air quality. The air quality of Kolkata in relation to ozone (µg/m 3 ) is considered to be depreciating because measurement data indicate that it may in the future exceed the limit values and the national emissions ceiling.

Analysis Methodology
The systems can provide valuable insights into the quality of air, water, and soil in a given area. By analyzing these data, researchers and policymakers can make informed decisions to improve environmental conditions and protect public health. The expected value of the sample position and taking into account classical analysis, patterns, and differences between neighboring stations may be used to identify particular data values that are not usual. The pattern analysis in R-programming is an illustration of the expert structure of the data and the validation of an environmental parameter [29]. Analyzing data sets using mathematical models and statistical methods yields conclusions about a population without subjective interpretation or qualitative insights. In order to extract conclusions, the suggested approach requires using a large amount of data that already exists, with some incomplete findings. Today, automated analysis techniques are needed for the number of data stored in databases. The research methodology discussed here is oriented towards the discovery of information in databases (knowledge discovery database) (KDD) [30]. The KDD is a comprehensive data extraction procedure for preparing and analyzing results. It offers a clear and collaborative method for identifying design and model parameters for outlier identification, prediction, and classification. This research paper uses steps like imputation, classical analysis, SPC, and FDA.

Imputation
Imputation is a simple technique for missing observations, replacing each observation with a true value. Various imputation procedures fill in missing values using respondent data. For example, when the missing value is an average replacement for pollution characteristics like NO 2 , O 3 , and CO 2 , it is considered reasonable [31]. The method that is used for imputation is mean substitution. The mean value of the data replaces the missing data.

Classical Analysis
This approach allows researchers to make inferences about the larger data based on the characteristics observed in the sample. By analyzing the data collected, classical statistics (mean, quartiles (first quartile: Q1, second quartile: Q2, third quartile: Q3), time series, box plot, etc.) provides insights into the behavior and patterns of the variables under study. Additionally, it helps determine the level of confidence or uncertainty associated with the findings, enhancing the overall reliability of the statistical analysis.
Traditional statistical analysis seeks to evaluate the empirical frequency distribution that yields the absolute frequency of occurrence of each of the several possible results of the frequent size of a discrete event [19]. If there is just a finite number of various outcomes (a discrete example) and if the distribution function is utilized in the situation of an indefinitely frequent and randomly trustworthy calculation and each result is different, the outcome of relative frequency will not be very enlightening. This returns all values of the absolute frequency of occurrence that are less than x in this example [32].

Statistical Process Control
By applying SPC to monitor the system, it is possible to identify the outliers. However, in conditions where the points do not reach the defined limit, the analysis focuses on substantially low and high measurements. To study individual observation, the techniques can be used to study individual or average maps. It should split the dataset into logical subgroups [33]. In such cases, it may not be feasible to form rational subgroups due to the lack of variability. However, it is crucial to identify and address these sources of error in order to ensure accurate measurements and reliable data.
Data collection using the rational subgroup method shows intrinsic variation, a common cause of variation, which can be ignored. The method establishes special cause variation to avoid imperfect subgroups while determining the control chart's border limit based on variability within each subgroup. If the mechanism is violated, only subgroups that duplicate the process's common cause of variance should be collected [19].
When the data have some missing observations, the data have been imputed, and if normality has been established, then the data are correctly structured. If the researcher rejects the null hypothesis, then there are two methods to normalize the data: using modified techniques for non-normal distributions or transforming the data to normalize the set, and using Box-Cox transformation [34,35]. The Box-Cox transformation is as follows: where ω denotes the maximizes profile likelihood function of the data. Classical process analysis can be divided into two stages: the control stage, where patterns are evaluated and conditions outside of control are encountered, and the first stage, which removes normality and atypical measurement from the results. The average, UCL, and LCL are specified at the first level. The average is defined precisely by the control model and signifies the objective point. Then, the confidence interval is set to be the standard deviation of the process [36].
Shewart's control chart is a popular monitoring system for graphical statistical processes, detecting major changes in processes. It is designed as a conventional control chart when the underlying form of the distribution of processes is known. Charts using recent samples show no significant improvement in the process. Complementary rules are needed to identify deviations and add to the initial rules, as established by different authors [37,38]. Supplementary rules enhance the alertness and detection capacity for non-random samples of Shewart's control chart, improving non-random sample detection [27].
The average run length (ARL) is the most used and simple mode to measure the capacity of a control chart with supplementary run rules. In the control charts, the run rule is used before the warning alarm's indication when the process is not controlled. The important thing to do if this happens is to identify it as soon as possible. On the other hand, it would be reasonable to have a few false alarms when the mechanism is statistically in a state of control. This term is defined specifically as an alpha error (type I) and a β error (type II). The technique of sensitivity is also defined and is highly linked to the number of outliers. It must be considered that the potential to identify out-of-control techniques is high; for this reason, there are a lot of points that fall outside [39].

Functional Data Analysis
FDA is a method for studying curves and functions to analyze data over time [40]. It converts vector samples into functional samples, using discrete values as starting points.
Smoothing transforms vector points into continuous functions over time, making it valuable in air pollution research. This data composition allows for outlier detection, as days with varying ozone values may have identical averages. Functional analysis identifies possible outliers, making functional techniques superior for such investigations.
Let x(t j ) represent the initial observations, t j ∈ R signifies the time steps, and p represents the number of observations (j = 1, 2, . . . . . . , p). The individual value of the function where φ k is the set of basis functions (k = 1, 2, . . . . . . , n b ) and p is the number of basis functions necessary to generate a functional sample. In statistics, there are various types of bases, but the Fourier basis is the most commonly employed. Furthermore, for periodic data like the ones in our study, the Fourier basis is the best option [19].
x is the observing point at t j , j is the random noise with zero mean, λ is the level of regulization, and Γ is the penalized operator.
where c k } p k=1 is the coefficient that multiplies the basis function. This can be written the problem of smoothing as . . . . . , z p T , the expansion of vector coefficient c = (c 1 , . . . . . ., c p ) T , a (p, n b )-matrix φ whose elements are φ jk = φ k t j , and a (p, n b )-matrix R whose elements are: The problem can be solved with the following equation: Functional data help identify higher-than-mean time intervals and their differences, allowing for the removal of outliers caused by system failure and detecting system failure. The notion of depth allows you to sort a collection of data in Euclidian space by how close it is to the sample core. In multivariate analysis, the concept of depth emerged and was generated to calculate a point centrality among a cloud. This idea started to be incorporated into practical data analysis over the course of the year. In this region, the centrality of a certain curve x i is defined by depth, and the center of the sample is the mean curve. The two-depth measurement of Fraiman-Muniz depth (FMD) and the H-model depth (HMD) [19] are most usual in the sense of functional data.
Through the estimation of depths, it is also possible to classify outliers with a practical approach. In this case, it takes into account elements that have different behavioral designs than the rest. Instead of summarizing the curve observations into a single point, such as the average, the definition of depth makes it possible to deal with observations identified at a given interval in curve types. The depth technique is used for the identification of outliers and significance: There is a low depth of an element that is distant from the sample. Thus, practical outliers are the curves with the least depth.
Firstly, F n,t (x i (t)) is the cumulative empirical distribution function of the values of the curves {x i (t)}, (i = 1, 2, . . . . . . , n) in a certain time t [a, b] in which it is contemplated. It can be defined as: where I(.) is an indicator function. Next, the FMD for curve x i is calculated as: where t [a, b]. The functional mode in HMD, on the other hand, is the element or curve that is most densely surrounded by the other curves in the dataset. HMD is written as: In a functional space, with a kernel function K : R + → R + , a bandwidth parameter h and · as the norm. In a vast majority of cases, norm L 2, is expressed as: There is also a number of parameters for the kernel functions K(·). The truncated Gaussian kernel is a popular one and can be expressed as: In this paper, the HMD depth was chosen for the identification of outliers. The value of h is the value that leaves, below it, 15% of the data coming from the distribution of x i (t) − x j (t) , i, j = 1, 2, . . . . . . , n , and the cut-off C is selected, especially the 1% type I error, according to P r (HMD n (x i (t))) < c = 0.01, i = 1, 2, . . . . . . , n.
Because the functional depth distribution is unknown, the cut-off C must be computed. There is a variety of ways in which this estimation can be carried out. The bootstrapping approach, on the other hand, was best suited to the study's objectives [19]. The steps are below: Extract the substitution of the original with a new sample. II.
Via the statistics of this new sample, estimate the research parameter. III.
Repeat the steps overhead a significant number of times. The Monte-Carlo simulation is often referred to as this repetition. It uses duplication to extract evidence from the data. IV.
Determine the empirical statistical distribution.

Results and Discussion
In this paper, the results of the technique that applied are as follows. The whole analysis and figure generation were carried out with R software (R 4.0.2) [41]. The complete 2018 hourly data were collected from the Central Pollution Control Board of India [42]. The data that were collected for analysis have some missing values, so the data were imputed by simple imputation (mean imputation).

Classical Statistical Analysis
The classical monitoring strategy was used to evaluate the air quality within limits. The statistical parameters of the data were also used to evaluate the trends. The statistical techniques used to perform the classical air quality monitoring strategy are presented in the data on O 3 concentrations at the Victoria air quality monitoring station in Kolkata, India. A summary of the statistical analysis of hourly data is presented in Table 1 for the minimum, maximum, mean, Q1, Q2, Q3, and interquartile range. The descriptive analysis parameter in Table 1 demonstrates that the limit values were not exceeded. Additional steps are to analyze the hourly data from 2018 of individual time series plots (Figure 1) ranging from a minimum value of 3.03 µg/m 3 to a maximum value of 153.32 µg/m 3 . ysis and figure generation were carried out with R software (R 4.0.2) [41]. The complete 2018 hourly data were collected from the Central Pollution Control Board of India [42]. The data that were collected for analysis have some missing values, so the data were imputed by simple imputation (mean imputation).

Classical Statistical Analysis
The classical monitoring strategy was used to evaluate the air quality within limits. The statistical parameters of the data were also used to evaluate the trends. The statistical techniques used to perform the classical air quality monitoring strategy are presented in the data on O3 concentrations at the Victoria air quality monitoring station in Kolkata, India. A summary of the statistical analysis of hourly data is presented in Table 1 for the minimum, maximum, mean, Q1, Q2, Q3, and interquartile range. The descriptive analysis parameter in Table 1 demonstrates that the limit values were not exceeded. Additional steps are to analyze the hourly data from 2018 of individual time series plots (Figure 1) ranging from a minimum value of 3.03 (μg/m ) to a maximum value of 153.32 (μg/m ).     Figure 3 shows the QQ plot (normal probability plot) of the data. The null hypothesis is that the value follows a normal distribution and the alternate hypothesis is that the value does not follows a normal distribution.   Figure 3 shows the QQ plot (normal probability plot) of the data. The null hypothesis is that the value follows a normal distribution and the alternate hypothesis is that the value does not follows a normal distribution.  Figure 3 shows the QQ plot (normal probability plot) of the data. The null is that the value follows a normal distribution and the alternate hypothesis is tha does not follows a normal distribution. Other tests were applied to see whether the data resembled any of the foll tributions: normal, generalized extreme value, or Weibull and Rayleigh; how was no suitable null hypothesis at the 5% significance level.

(I) Control I-MR Chart with Individual Mean
The control chart of the I-MR hourly concentration was generated usin through the SPC method. In the examination of the results in Figure 4, there false alarms, i.e., outliers that were significant. This challenge is attributable to t the data were non-normal, as shown in Figure 3, and that there was greater va the data. Other tests were applied to see whether the data resembled any of the following distributions: normal, generalized extreme value, or Weibull and Rayleigh; however, there was no suitable null hypothesis at the 5% significance level.

(I) Control I-MR Chart with Individual Mean
The control chart of the I-MR hourly concentration was generated using the data through the SPC method. In the examination of the results in Figure 4, there were some false alarms, i.e., outliers that were significant. This challenge is attributable to the fact that the data were non-normal, as shown in Figure 3, and that there was greater variability in the data.  The results of the hourly data as rational subgroups of the and S chart are not under control because there were more variations in the data. The results of the hourly data as rational subgroups of the X and S chart are not under control because there were more variations in the data.

Functional Data Analysis
In the functional methodology technique, the initial step is to generate a sample curve based on discrete measurements taken each hour. The graph shows 365 functions derived from 24 h data. If the data have been translated into a functional form, i.e., curves with 24 points in a day, each of which takes into consideration the correlation between the O 3 readings and may be examined for outliers, the data can be analyzed.
When the depths are taken into account, the functional analysis results let us discover days with aberrant functional points, even if there are no outliers. Despite the fact that the daily limit values were not exceeded, the O 3 absorption may have demonstrated aberrant behavior throughout the course of the day. The functional technique, on the other hand, detects any variation from normal daily O 3 emission behavior without depending on any distribution limits. The functional outliers found in this study. Outliers are indicated by black and dotted lines.
The variable O 3 hourly data set was studied with a box plot, X/S chart, and functional depth. The hourly box plot ( Figure 2) shows that there was a large number of outliers because of the O 3 concentration. There were 5425 points not in between the limit using Tukey's fences, as shown in Figure 2. The IMR and X/S chart (Figures 4-6) shows that it was not statistically under control because there was greater variability in the O 3 concentration. The X hourly control chart cycle shows that the UCL was 50.83 (µg/m 3 ) and the LCL was 16.40 (µg/m 3 ) and that the number beyond the limit was 4014, in which the smallest number of outliers was shown in the months of January to March and October to December and the largest number of outliers was shown in the months of April to September. This information shows that the highest concentration was in the months of April to September and that the minimum concentration was in the months of January to March and October to December. The S control chart shows that 1080 values went beyond the limit. This shows that there was a large number of concentrations beyond the standard limit (180 µg/m 3 ), which is defined by the Indian Pollution Control Board. Moving on to functional data analysis, based on depth, Figure 7 represents the 365 functions (for each days) generated with 24 hourly data and Figure 8 shows functional outliers for 365 days. There were 25 outliers (24 h per day) detected out of 365 days. The data analysis revealed functionally significant changes on some days, which suggests that there were minimal O 3 pollution issues. The FDA technique is good at identifying days where the patterns are distinct from those of the rest of the data. It is essential to examine pollution within permitted limits as well as days where the levels were detected to be different from what is expected. The functional approach detected an atypical day on the sixth day, as shown in Figure 8. The mean was higher than that obtained through SPC ( Figure 5), but within the limit values, this day was not considered an outlier by classical analysis or SPC. The functional approach's strength in detecting such a day allows for the study of the reasons behind the O 3 behavior on that particular day.
expected. The functional approach detected an atypical day on the sixth day, as shown i Figure 8. The mean was higher than that obtained through SPC ( Figure 5), but within th limit values, this day was not considered an outlier by classical analysis or SPC. The func tional approach's strength in detecting such a day allows for the study of the reasons be hind the O3 behavior on that particular day.    expected. The functional approach detected an atypical day on the sixth day, as shown in Figure 8. The mean was higher than that obtained through SPC ( Figure 5), but within the limit values, this day was not considered an outlier by classical analysis or SPC. The functional approach's strength in detecting such a day allows for the study of the reasons behind the O3 behavior on that particular day.     Figure 8. The mean was higher than that obtained through SPC ( Figure 5), but withi limit values, this day was not considered an outlier by classical analysis or SPC. The tional approach's strength in detecting such a day allows for the study of the reason hind the O3 behavior on that particular day.    Box plots can detect outliers, but they are disproportionate in all cases and fail t identify validation events. Additionally, the non-normality of the data makes it impossibl to detect outliers below the minimum limit of the box plot. Control charts, such as an chart, can detect trends but have flaws due to their initial design for industrial processe with low variability. The rational subgroups and mean values contribute to loss of infor mation, preventing accurate outlier detection. Despite these flaws, control charts offer be ter and more consistent results than box plots, providing a clear graphical representatio of variable changes over time. The functional approach studies the entire dataset, min mizing information loss and enabling reliable analysis of hidden trends. The transfor mation from discrete information points to functional data reduces instrumental error detected by classical analysis and control charts. Instead, one can focus on analyzing th patterns and relationships within the data to gain insights. This allows for a more flexibl approach in understanding the data without relying solely on their original distribution The FDA approach performs well compared to classical statistics and control chart meth ods. Greater traceability of major sources, including information on the weather, traffi patterns, and industry sources, is required to comprehend aberrant O3 emissions behav ior. Outlier evaluation can be enhanced by including meteorological factors like tempera tures, daylight hours, and precipitation. Separating the sources of common and uniqu variability may be done with the use of outlier detection and air pollution events, whic enable the creation and application of efficient mitigation strategies.

Conclusions
This study evaluates three analytical methodologies for identifying outliers in hourl data from a metropolitan air quality monitoring station in Victoria, Kolkata, India. Ko kata's air quality is worsening day by day and it is challenging to find outliers for a quality. Traditional vectorial techniques, statistical process control, and functional dat analysis were used to analyze data, dividing them into days or hours and control chart The mean imputation and proposed methodology findings help to identify the air pollu tion events and outliers. To effectively control air pollution and achieve pure air qualit conditions, a new strategy and new techniques are needed to measure local air pollution The result obtained using the classical vectorial method is simple but has weaknesses i terms of the temporal correlation structure and recognizing true outliers. Advanced meth odologies can provide deeper insights into mitigating air pollution issues, similar to sta tistical process control. As a result, the method might be unable to effectively identify pa terns or trends that could help in differentiating between true abnormalities and fals Box plots can detect outliers, but they are disproportionate in all cases and fail to identify validation events. Additionally, the non-normality of the data makes it impossible to detect outliers below the minimum limit of the box plot. Control charts, such as an X chart, can detect trends but have flaws due to their initial design for industrial processes with low variability. The rational subgroups and mean values contribute to loss of information, preventing accurate outlier detection. Despite these flaws, control charts offer better and more consistent results than box plots, providing a clear graphical representation of variable changes over time. The functional approach studies the entire dataset, minimizing information loss and enabling reliable analysis of hidden trends. The transformation from discrete information points to functional data reduces instrumental errors detected by classical analysis and control charts. Instead, one can focus on analyzing the patterns and relationships within the data to gain insights. This allows for a more flexible approach in understanding the data without relying solely on their original distribution. The FDA approach performs well compared to classical statistics and control chart methods. Greater traceability of major sources, including information on the weather, traffic patterns, and industry sources, is required to comprehend aberrant O 3 emissions behavior. Outlier evaluation can be enhanced by including meteorological factors like temperatures, daylight hours, and precipitation. Separating the sources of common and unique variability may be done with the use of outlier detection and air pollution events, which enable the creation and application of efficient mitigation strategies.

Conclusions
This study evaluates three analytical methodologies for identifying outliers in hourly data from a metropolitan air quality monitoring station in Victoria, Kolkata, India. Kolkata's air quality is worsening day by day and it is challenging to find outliers for air quality. Traditional vectorial techniques, statistical process control, and functional data analysis were used to analyze data, dividing them into days or hours and control charts. The mean imputation and proposed methodology findings help to identify the air pollution events and outliers. To effectively control air pollution and achieve pure air quality conditions, a new strategy and new techniques are needed to measure local air pollution. The result obtained using the classical vectorial method is simple but has weaknesses in terms of the temporal correlation structure and recognizing true outliers. Advanced methodologies can provide deeper insights into mitigating air pollution issues, similar to statistical process control. As a result, the method might be unable to effectively identify patterns or trends that could help in differentiating between true abnormalities and false alarms. The accuracy of outlier identification might be improved and the incidence of false alarms could be decreased by including techniques that can record and analyze continuous data. Additionally, working from a functional point of view may also require expertise in statistical modeling and analysis to accurately interpret the results. Furthermore, the use of functional data analysis techniques may not be suitable for datasets with missing or incomplete observations. To do so, it is necessary to impute the data, as complete time units are required for analysis. By identifying and analyzing outliers from a functional point of view, researchers can gain valuable insights into the factors that contribute to air pollution and its control. This information can then be used to develop targeted actions and policies that effectively address the root causes of these distinct O 3 value sets. Additionally, understanding the O 3 outliers helps with accurately quantifying the impact of air pollution on public health and enabling better decision-making for sustainable development. Future studies aim to eliminate percentiles for outliers by testing classification techniques like isolation forest or k-means, identifying functions as outliers.