Majority Voting Based Multi-Task Clustering of Air Quality Monitoring Network in Turkey

Abstract: Air pollution, which is the result of the urbanization brought by modern life, has a dramatic impact on the global scale as well as on local and regional scales. Since air pollution has important effects on human health and other living things, the issue of air quality is of great importance all over the world. Accordingly, many studies based on classification, clustering and association rule mining applications for air pollution have been proposed in the field of data mining and machine learning to extract hidden knowledge from environmental parameters. One approach is to model a region in such a way that cities having similar characteristics are determined and placed into the same clusters. Instead of using traditional clustering algorithms, a novel algorithm, named Majority Voting based Multi-Task Clustering (MV-MTC), is proposed and utilized to consider multiple air pollutants jointly. Experimental studies showed that the proposed method is superior to five well-known clustering algorithms: K-Means, Expectation Maximization, Canopy, Farthest First and Hierarchical clustering.


Introduction
Air pollution is now recognized as an important problem all over the world. It can be described as a mixture of multiple pollutants that vary in size and composition. Air pollutants (also referred to as "criteria pollutants") are commonly grouped into particulate matter, such as PM 10 and PM 2.5, and ground-level pollutants, such as ozone (O 3 ), carbon monoxide (CO), sulfur dioxide (SO 2 ), nitric oxide (NO) and nitrogen dioxide (NO 2 ).
It is known that air pollution has negative impacts on human health, the range of vision, materials, and plant and animal health. Air pollutants trigger or worsen chronic diseases such as asthma, pneumonia, heart attack, bronchitis and other respiratory problems. Since particulate matter is very small and light, it tends to stay in the air longer than heavier particles. This increases the likelihood of humans and animals inhaling these particles through respiration. Due to their small size, these particles can easily pass through the nose and throat and penetrate the lungs, and some may even enter the circulatory system. Smoke, a mixture of gases and solid and liquid particles resulting from incompletely burned carbon materials such as solid fuels and fuel oil, is one form of air pollution and reduces the range of visibility. Air pollution also has a destructive and disturbing effect on artistic and architectural structures. For plants, air pollutants can be lethal or can prevent growth. Thus, high concentrations of air pollutants can harm human health, adversely influence the environment, and also cause property damage [1][2][3][4].
Due to the seriousness of the issue, air pollution control policies require systematic monitoring and evaluation of air quality. The causes of air pollution should be investigated and necessary precautions should be taken in accordance with the findings. Therefore, it is very important to develop an appropriate tool to understand the air quality in an area. For this purpose, effective methods are continuously developed with new studies.
In this context, this study aimed at examining the air quality monitoring stations in Turkey according to their similarities in terms of five air pollutants, PM 10 , SO 2 , NO 2 , NO and O 3 , and making appropriate inferences based on the analysis of the levels of air pollutants measured at these stations from 1 November 2017 to 1 November 2018. In this way, city areas with similar air pollution behavior can be identified, so that decision-making authorities can direct emission sources to suitable regions. To perform the experiments, a novel algorithm, named Majority Voting based Multi-task Clustering (MV-MTC), is proposed instead of applying traditional clustering algorithms, in order to benefit from the common decision coming from different pollutant sources. The novelty of this study is the implementation of multi-task clustering (MTC) in the field of environmental science and the examination of air pollution in Turkey with this method for the first time.
The proposed algorithm (MV-MTC) was compared with popular clustering algorithms, namely K-Means, Expectation Maximization, Canopy, Farthest First and Hierarchical clustering methods, in terms of sum of squared error (SSE). The experimental results obtained in this study indicate that the proposed approach produces better clusters than standard clustering algorithms by considering relationships among multiple air pollutants jointly.
The remainder of this study is organized as follows. In Section 2, a detailed literature survey investigating the studies using data mining methods to deal with the air quality control of Turkey is given in addition to the recent studies on the proposed method of multi-task clustering. In Section 3, background information on the applied methodology used in the experiments is explained. The proposed MTC technique and dataset description are mentioned in Sections 4 and 5, respectively. The experimental studies are presented and the obtained results are discussed in Section 6. Lastly, concluding remarks, a brief summary and future directions are given.

Related Work
Monitoring stations located in nearby areas are characterized by the same specific air pollution characteristics, and many studies in the literature have exploited this information. Data mining and machine learning are intensively applied to environmental subjects to identify interesting structure in large amounts of environmental data, where such structure takes the form of patterns, rules, predictive models and relationships among the data. Ignaccolo, Ghigo and Giovenali [5] classified the air quality monitoring network in Piemonte (Northern Italy) using functional cluster analysis based on the Partitioning Around Medoids algorithm, considering three air pollutants, namely NO 2 , PM 10 and O 3 , to classify sites into homogeneous clusters and identify representative ones. Barrero, Orza, Cabello and Cantón [6] analyzed the variations of PM 10 concentrations at 43 stations in the air quality monitoring network of the Basque Country to group them according to their common characteristics; they implemented the autocorrelation function and K-means clustering. Similarly, Lu, He and Dong [7] used principal component analysis and cluster analysis for the management of the air quality monitoring network of Hong Kong and for the reduction of the associated expenses.
In Turkey, the importance of environmental issues has also gained much attention, and studies related to air quality continue to increase in this direction. Several of the environmental data mining studies on "air quality in Turkey" are compared in Table 1, which displays the year of publication, the target pollutants, the dataset content used in the experiments, the aim of the study, which data mining task was applied and which algorithms/methods were implemented, as well as the performance metrics used to evaluate the results of the applied methodology. The bold notation in the Algorithms/Methods column marks the algorithm which performs best among the others. According to the findings, most of the experiments were done using measurements of PM 10 concentrations [8][9][10][11][12][13], and prediction of the air pollutant amount is the main goal. In addition to pollution data, some of the studies also integrate meteorological data such as temperature, wind speed, wind direction, pressure and humidity into the problem domain [8,11-13].
Multi-task classification and multi-task clustering are two well-known types of multi-task learning recently presented in the literature. Wang, Yan, Lu, Zhang and Li [44] used multi-task classification in the prediction of air pollution particles by implementing a deep multi-task learning framework. On the other hand, multi-task clustering has not been studied for air quality management until now, nor in environmental science in general.
There is an issue to be addressed: "what to share" while learning multiple tasks. The chosen form of sharing determines what knowledge can be shared among the tasks. Usually, there are three forms of sharing: feature, instance and parameter. Feature-based MTL aims to learn common features among different tasks. Instance-based MTL identifies useful data objects in one task for other tasks and then shares knowledge via the identified instances. Parameter-based MTL uses the model parameters of one task to help learn the model parameters of other tasks [42]. The proposed method in this study (MV-MTC) belongs to the popularly applied unsupervised learning schemes of instance-based MTL applications.
MTC has been applied in many different areas including bioinformatics, text mining, web mining, image mining, daily activity recognition and so on [18][19][20][21][22][23][45]. The resulting clustering template of MTC has generally outperformed the outputs of any single clustering algorithm. Table 2 presents a brief list of studies in which different MTC algorithms are proposed and applied in various subject areas. It has been experimentally shown that MTC algorithms provide remarkable performance when compared to single-task learners.

The proposed MV-MTC algorithm has many advantages over existing multi-task clustering methods. First, some methods have a complicated theoretical foundation, which leads to implementation difficulties. For instance, graph-based methods and matrix factorization for nonnegative data are commonly applied (e.g., [21]) by implementing a semi-nonnegative matrix tri-factorization method to co-cluster the data in each view of each task. Likewise, the algorithm introduced in [49] has several sophisticated steps: feature extraction, clustering-based regularization, convex relaxation, and optimization. Spectral clustering, which uses the eigenvectors of the Laplacian of a graph for clustering, is another way to implement multi-task clustering [20]. In addition to graph-based methods (e.g., [19]), multi-task clustering can be performed by reweighting the distance between data points in different tasks by learning a shared subspace. In this way, the clustering operation for each individual task is generated by selecting the nearest neighbors for each sample from the other tasks in the learned shared subspace.
Second, some proposed MTC methods (e.g., scVDMC [18] and Arboretum-Hi-C [20]) were designed as field-specific methods and have a valid use only for bioinformatics data to analyze the genome architecture or to simultaneously capture the differentially expressed genes. These methods are not suitable for the analysis of geographical data (or for the identification of air pollution levels of a region).
Third, our algorithm is particularly advantageous since it does not need any a priori information about the data. In contrast, the Convex Multi-task Clustering (CMTC) algorithm proposed by Yan et al. [22] requires some a priori knowledge about the data relationships.
Fourth, some multi-task clustering algorithms (e.g., [23]) require additional parameters, and their results change significantly with different parameter values. This makes such an algorithm difficult to use, since the user must determine the optimal parameter value for each problem. Our algorithm does not require any additional parameter tuning.
Fifth, the execution time of some multi-task clustering algorithms (e.g., [45]) grows quadratically as the input data grow. In contrast, our algorithm (MV-MTC) requires computation time that grows linearly with the number of instances, clusters and tasks.
Sixth, our proposed method can effectively avoid the imbalance of cluster distribution by merging multiple models according to majority voting. In addition, the MV-MTC framework can effectively reduce clustering errors by selecting the best clustering algorithm for the problem under consideration.
Our goal is to propose an easily implemented, generally applicable, fast, prior-knowledge- and parameter-independent multi-task clustering method. Unlike existing methods, the algorithm in this paper is a new kind of multi-task clustering method that is much easier to understand and implement, taking a jointly obtained common decision from different tasks using cluster labels. It was developed as a method that can appeal to every area rather than being specific to one (e.g., [18,20]). Different types of MTC algorithms have been proposed. For instance, multi-task multi-view clustering [21] is presented to handle the learning problem of multiple related tasks with one or more common views. Each view is associated with one task or with multiple related tasks, the inter-task knowledge is transferred from one task to another, and the multi-task and multi-view relationships are exploited to improve clustering performance. In [21], it is applied for webpage and image mining operations under a clustering framework.

Materials and Methods
In this section, the applied methodologies and the datasets for the experiments, in addition to the platforms used, are presented. The overall goal of the applied techniques is to create clusters with a consistent set of similarly behaving points by ensuring maximum similarity among intra-cluster objects while keeping inter-cluster differences high. The clustering algorithms used in this study were K-Means, Expectation Maximization, Canopy, Farthest First and Hierarchical clustering, in addition to the proposed multi-task clustering technique.

K-Means Clustering
Consider a dataset D = {o 1 , o 2 , . . . , o n }, where each o i represents an object as a p-dimensional explanatory variable and n is the number of objects (instances) in the dataset. Assume that the problem domain is to be divided into k clusters, the combination of which is represented as a vector C KM = {C 1 , C 2 , . . . , C k }, and that the centroids of the k clusters are denoted by µ = {m 1 , m 2 , . . . , m k }.
The first step is to assign k points as cluster centers at random. The distance between each data point o i and each cluster centroid m j , where i = {1, . . . , n} and j = {1, . . . , k}, is calculated using one of the distance metrics (e.g., Euclidean, Manhattan, Chebyshev or Minkowski distance), and each instance is assigned to its nearest cluster, argmin_j dist(o i , m j ). New cluster centroids are then calculated as m_j = (1/n_j) Σ_{o_i ∈ C_j} o_i, where n j denotes the number of objects in cluster C j . This process continues iteratively until no data point changes cluster membership. Depending on the method used for initialization, techniques such as K-Means++, Farthest First or Canopy can be used instead of random initialization.
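The assignment and update steps described above can be sketched in a few lines of plain Python. The helper below is illustrative only (function and variable names are ours, not from the paper) and works on one-dimensional values for brevity:

```python
import random

def kmeans(points, k, max_iter=100, seed=42):
    """Lloyd's K-Means on 1-D values with random initialization."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # random initial centers
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for o in points:
            nearest = min(range(k), key=lambda j: abs(o - centroids[j]))
            clusters[nearest].append(o)
        # update step: recompute each centroid as its cluster mean
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:                      # no membership change: stop
            break
        centroids = updated
    return clusters, centroids

clusters, centroids = kmeans([3, 10, 21, 9, 18, 12, 6, 4, 17, 19, 10, 3], k=3)
```

Replacing the random `rng.sample` seeding with a smarter scheme (K-Means++, Farthest First or Canopy) only changes the first line of the loop setup, which is why those variants are treated as initialization methods here.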

Expectation Maximization Clustering
Expectation Maximization extends the K-Means paradigm in a different way. While the K-Means algorithm assigns each data point to a single cluster, in the Expectation-Maximization (EM) model each object is assigned to each cluster with a weight representing its probability of membership. In other words, there is no sharp boundary between clusters, and new centers are calculated in terms of weighted measures [50]. EM clusters data points using a finite mixture density model of k probability distributions (e.g., normal distributions), where each distribution represents a cluster.
As in K-Means clustering, the process starts by selecting cluster centroids randomly. The procedure then iteratively refines the parameters (i.e., the clusters) based on statistical modeling in two steps: the Expectation (E) step and the Maximization (M) step [51]. In Step E, the probability of cluster membership of an instance, p(o i ∈ C j ), is computed from the present parameter estimates using Equation (1), where p(o i ∈ C j ) follows the normal distribution and i = {1, . . . , n}, j = {1, . . . , k}.
Step M is applied as in Equation (2) to re-estimate the model parameters by finding the values which maximize the expected log-likelihood computed in Step E. The iterative process continues until the estimates converge.
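The E and M steps can be sketched compactly for one-dimensional data under a Gaussian mixture assumption. This is an illustrative sketch (names, seeds and the starting variance of 1.0 are our own choices, not from the paper):

```python
import math
import random

def em_gmm(points, k, iters=50, seed=1):
    """Soft clustering of 1-D points with a k-component Gaussian mixture."""
    rng = random.Random(seed)
    means = rng.sample(points, k)
    variances = [1.0] * k                  # illustrative starting spread
    weights = [1.0 / k] * k                # equal prior cluster weights
    for _ in range(iters):
        # E step: responsibility (membership probability) of each component
        resp = []
        for o in points:
            dens = [w * math.exp(-(o - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300    # guard against underflow
            resp.append([d / total for d in dens])
        # M step: re-estimate parameters from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(points)
            means[j] = sum(r[j] * o for r, o in zip(resp, points)) / nj
            variances[j] = max(sum(r[j] * (o - means[j]) ** 2
                                   for r, o in zip(resp, points)) / nj, 1e-6)
    return means, variances, weights

means, variances, weights = em_gmm([1.0, 1.2, 0.8, 10.0, 10.3, 9.7], k=2)
```

Unlike the hard assignments of K-Means, every point contributes to every component in the M step, weighted by its responsibility.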

Canopy Clustering
Canopy is generally applied as a preprocessing step for other clustering algorithms, such as K-Means or Hierarchical clustering, to speed up the process in the case of large datasets [52]. The procedure uses two distance thresholds T1 > T2 and a list of data points to cluster. An initial canopy center is chosen at random from the data points, and the distances of all other instances to this canopy center are then approximated. Instances whose distance falls within the threshold T1 are placed into the canopy, while data points whose distance falls within the threshold T2 are removed from the list. These removed points are excluded from being selected as a new canopy center or creating new canopies. The process continues iteratively until the list is empty.
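The loose/tight thresholding procedure can be sketched as follows; the threshold values and names are illustrative, and distances are taken on one-dimensional values for brevity:

```python
import random

def canopy(points, t1, t2, seed=7):
    """Canopy clustering of 1-D points with a loose threshold (t1) and a
    tight threshold (t2), where t1 > t2. Canopies may overlap."""
    assert t1 > t2, "the loose threshold must exceed the tight one"
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[rng.randrange(len(remaining))]      # random canopy center
        members = [o for o in points if abs(o - center) < t1]  # loose membership
        canopies.append((center, members))
        # points within the tight threshold can no longer seed new canopies
        remaining = [o for o in remaining if abs(o - center) >= t2]
    return canopies

canopies = canopy([1, 2, 3, 20, 21, 22, 50], t1=5, t2=2)
```

Because each chosen center removes itself (distance 0 < t2) from the remaining list, the loop is guaranteed to terminate, and every point ends up in at least one canopy.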

Farthest First Clustering
Farthest First is a variant of K-Means clustering in which each cluster centroid is selected in turn as the point farthest from the existing cluster centers; this point must be an actual data point. This selection significantly boosts the speed of clustering in general, since fewer reassignments and adjustments are needed [53].
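The farthest-first traversal for picking centers can be sketched in a few lines (an illustrative helper with our own names, shown on 1-D values):

```python
def farthest_first_centers(points, k):
    """Pick k centers from the data: start from one point, then repeatedly
    take the point farthest from all centers chosen so far."""
    centers = [points[0]]                  # every center is a real data point
    while len(centers) < k:
        # a candidate's score is its distance to the nearest chosen center
        def gap(o):
            return min(abs(o - c) for c in centers)
        centers.append(max(points, key=gap))
    return centers

centers = farthest_first_centers([1, 2, 3, 50, 51, 100], k=3)
```

The selected centers spread across the extremes of the data, which is why K-Means initialized this way typically needs fewer reassignment passes.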

Hierarchical Clustering
Hierarchical clustering is used to group data objects into a tree of clusters in either a bottom-up (agglomerative) or top-down (divisive) fashion [49]. In the agglomerative version, each instance of the dataset is initially put into its own cluster, and these atomic clusters are merged continuously until either a single cluster holds all data points or a termination condition is met. The divisive version is the opposite of agglomerative clustering: it begins with a single cluster containing all data points and repeatedly subdivides clusters into smaller distinct ones until a termination criterion is satisfied, such as reaching a predetermined number of clusters.
According to the distance calculation method between different clusters, there are many link types used in Hierarchical clustering such as Single (the minimum link that is the closest distance between any items of two different clusters), Complete (the maximum link that is the largest distance between any items of two different clusters), Average (the average distance between the elements of two clusters), Mean (the mean distance of merged cluster) and Centroid (the distance from one centroid to another).
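The agglomerative variant with a pluggable link type can be sketched as below; this is an illustrative 1-D implementation (names are ours), where passing `min` gives single link and `max` gives complete link:

```python
def agglomerative(points, k, link=min):
    """Bottom-up hierarchical clustering of 1-D points; `link` chooses the
    linkage: min -> single link, max -> complete link."""
    clusters = [[o] for o in points]           # every instance starts alone
    while len(clusters) > k:
        best = None                            # (distance, i, j) of closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

groups = agglomerative([1, 2, 3, 20, 21, 22], k=2)
```

Average, mean and centroid linkages fit the same skeleton: only the inter-cluster distance computed inside the double loop changes.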

Multi-task Clustering
A task generally refers to the construction of a model using a specific dataset for a single target or sub-goal. In this sense, "multiple tasks" means modeling multiple output targets simultaneously by using task-related datasets and by considering task relations. Based on this definition of "multiple tasks", we can define multi-task clustering as follows: multi-task clustering (MTC) is the process of generating global clusters that are shared by multiple related tasks. MTC is intended to merge information among tasks to improve the clustering performance of the individual tasks. The most important aspect of MTC is discovering the shared information among tasks. In this paper, a novel algorithm, named Majority Voting based Multi-task Clustering (MV-MTC), is proposed to provide this aspect.
Consider the unlabeled dataset D = {o 1 , o 2 , . . . , o n } where each o i represents an object as a p-dimensional explanatory variable and n is the number of objects (instances) in the dataset. Assume that the problem domain consists of r different tasks T = {t 1 , t 2 , . . . , t r }, each of which is represented as t i .
In the first step of the algorithm, the instance set allotted to each task should be properly clustered using one of the traditional clustering algorithms. For r different tasks, let us denote the resulting clustering assignments as C = {C t 1 , C t 2 , . . . , C t r }, where C t i = {c 1 , c 2 , . . . , c k } for the predetermined number of clusters k and each cluster consists of different instances from the dataset D. To take the joint decision from all C t i 's, a common factor should be determined, because the same cluster names do not have to represent the same clustering structure across the task groups. We need to determine common cluster labels that convey the same information through all tasks.
In this context, after clustering the instances of each task by one of the single clustering algorithms, all clusters are labeled from the common label set L = {L 1 , L 2 , . . . , L k }, as in Table 3, in terms of the mean weights of their intra-cluster objects; thus, k cluster labels for k clusters are produced according to the cluster weights. To illustrate, if we have three clusters, the heaviest, the medium and the lightest one can be labeled as "L 3 ", "L 2 " and "L 1 ", respectively. The same procedure is applied for all r tasks. As shown in the following example, all instances in the dataset are labeled with a suitable cluster label L i for each task t i . In the final stage, following the majority voting approach, the most common cluster label among all tasks for a given instance o i is selected as its final cluster assignment. Therefore, the novel MTC algorithm is called Majority Voting based Multi-task Clustering (MV-MTC).
This study proposes two novel concepts: single-task clusters and multi-task clusters. In the first phase, the proposed algorithm discovers local clusters (single-task clusters) from each task data separately, and, in the second phase, these local clusters are combined to produce the global result (multi-task clusters).

Definition 1.
(Single-Task Clusters) Single-task clusters are groups of instances discovered from the data partition D t of a particular task t, i.e., D = ∪ r t=1 D t , and denoted by C t = {c 1 , c 2 , . . . , c k }, where k is the number of clusters.

Definition 2.
(Multi-Task Clusters) Given r tasks T = {t 1 , t 2 , . . . , t r }, where all the tasks are related but not identical, multi-task clusters, which are denoted by C = {C t 1 , C t 2 , . . . , C t r }, are groups of instances that mostly appear in the same level of the clusters of the tasks.
Based on these definitions, it is possible to say that there are two elementary factors in multi-task clustering. The first factor is the definition of a task. Many real world problems consist of a number of related subtasks. For instance, the PM 10 , SO 2 , NO 2 , NO and O 3 air pollutants can be considered as the tasks of the air quality monitoring problem. The second factor is the definition of the ensemble method used to combine multiple tasks. In our study, we used the majority voting mechanism, which selects the cluster label with the most votes.
To figure out the rationale behind the algorithm, the example scenario in Tables 4 and 5 explains the process step by step. In the first stage, the dataset D, which consists of instances with only one feature, is given. There are three tasks (t 1 , t 2 and t 3 ) and the aim is to group the dataset into three clusters by taking the joint decision from each task. The attribute value of an instance can change according to the task. In the next step, the instances are clustered by one of the clustering algorithms separately for each task, and each instance is assigned to one of three clusters (C 1 , C 2 or C 3 ). However, we need to determine a common decision point on the cluster groups of the different tasks to get the final cluster assignments. Therefore, three labels (L 1 , L 2 and L 3 ) are used to generalize the clusters so that they mean the same groupings under different tasks according to average intra-cluster weights. In the final part, after the instances are labeled with the new label set for every task (C t 1 , C t 2 and C t 3 ), the majority voting scheme is applied to obtain the final cluster labels of the MV-MTC algorithm. Figure 1 displays the general framework of the multi-task clustering algorithm, where each t i denotes a single task of the task space and D is the unlabeled data. The main purpose is to ensure that the instances grouped together before the MV-MTC step remain in the same set in the final step. The number of instances remaining in the same cluster is maximized according to Equation (3), where C ij (o r ) means that the instance o r is a member of cluster c j of task t i and MV-MTC j indicates the resulting cluster c j of the MV-MTC algorithm. The pseudocode of the proposed algorithm is given in Algorithm 1.

(1) Dataset D and its instance values in terms of three tasks.

Ins. ID | Task t1 | Task t2 | Task t3
   1    |    3    |   23    |   43
   2    |   10    |   15    |   30
   3    |   21    |   40    |   54
   4    |    9    |   32    |   89
   5    |   18    |   14    |   72
   6    |   12    |   27    |   28
   7    |    6    |   24    |   26
   8    |    4    |   41    |   22
   9    |   17    |   33    |   79
  10    |   19    |   28    |   58
  11    |   10    |   15    |   47
  12    |    3    |   18    |   73
As shown in Algorithm 1, the methodology is made up of four steps. In the first step, single-task clusters are generated by taking each individual task into consideration.
Step 2 is performed to calculate intra-cluster weights under the different tasks. In Step 3, cluster labels are assigned to the clusters according to their weight values: L 1 is assigned to the cluster with the lowest mean value, and the label index increases until L k is given to the cluster with the highest one. In the last step, the joint decision from the different tasks is taken by applying a majority voting mechanism. As a result, all data points are placed into the most suitable clusters and the final cluster labels are assigned from the joint decision.

Algorithm 1: Majority Voting based Multi-task Clustering (MV-MTC)
Input: dataset D, tasks T = {t 1 , . . . , t r }, clustering algorithm CA, number of clusters k
// Step 1: Cluster the data under each individual task
1.  for i = 1 to r
2.      C t i = CA(D, t i )
3.      C.add(C t i )
Output: C = {C t 1 , C t 2 , . . . , C t r } // cluster assignments in terms of different tasks,
        // where C t i = {c 1 , c 2 , . . . , c k } are the k clusters under task t i
// Step 2: Determine average intra-cluster weights
4.  for each C t i in C
5.      for j = 1 to k
6.          for each o in c j
7.              sum = sum + o // value of the instance
8.          m j = sum / |c j |
Output: µ i = {m 1 , m 2 , . . . , m k } // k average intra-cluster weights under task t i
// Step 3: Label each cluster c j in C t i for all tasks according to µ i values
9.  for each C t i in C
10.     sort the clusters of C t i by their weights in ascending order
11.     assign the labels L 1 , . . . , L k to the sorted clusters
// Step 4: Take the joint decision by majority voting
12. for each instance o in D
13.     assign o the cluster label appearing most often among C t 1 , . . . , C t r
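The four steps of Algorithm 1 can be sketched in plain Python. The sketch below uses the example values of Table 4 and a deterministic 1-D K-Means as the single-task clusterer CA; all function names and the seeding heuristic inside `simple_kmeans` are ours and only illustrative:

```python
from collections import Counter

def simple_kmeans(values, k, iters=25):
    """Deterministic 1-D K-Means used as the single-task clusterer CA."""
    step = max(1, len(values) // k)
    cents = sorted(values)[::step][:k]         # spread-out initial centers
    for _ in range(iters):
        assign = [min(range(len(cents)), key=lambda j: abs(v - cents[j]))
                  for v in values]
        for j in range(len(cents)):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                cents[j] = sum(members) / len(members)
    return assign

def mv_mtc(task_values, k, cluster_fn):
    """Majority Voting based Multi-task Clustering, following Algorithm 1."""
    n = len(next(iter(task_values.values())))
    per_task_labels = []
    for values in task_values.values():
        assign = cluster_fn(values, k)               # Step 1: single-task clusters
        weight = {}                                  # Step 2: intra-cluster means
        for c in set(assign):
            members = [v for v, a in zip(values, assign) if a == c]
            weight[c] = sum(members) / len(members)
        order = sorted(weight, key=weight.get)       # Step 3: L1 = lightest, ...
        relabel = {c: "L%d" % (rank + 1) for rank, c in enumerate(order)}
        per_task_labels.append([relabel[a] for a in assign])
    # Step 4: majority vote across tasks for every instance
    return [Counter(labels[i] for labels in per_task_labels).most_common(1)[0][0]
            for i in range(n)]

tasks = {   # instance values from Table 4, one list per task
    "t1": [3, 10, 21, 9, 18, 12, 6, 4, 17, 19, 10, 3],
    "t2": [23, 15, 40, 32, 14, 27, 24, 41, 33, 28, 15, 18],
    "t3": [43, 30, 54, 89, 72, 28, 26, 22, 79, 58, 47, 73],
}
final = mv_mtc(tasks, k=3, cluster_fn=simple_kmeans)
```

Any of the single clustering algorithms of Section 3 can be plugged in as `cluster_fn`, since the relabeling in Step 3 makes the per-task cluster identities comparable before the vote.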

Dataset Description and Used Platforms
There are seven geographical regions in Turkey, namely Eastern Anatolia, Central Anatolia, Southeastern Anatolia, Black Sea, Mediterranean, Aegean, and Marmara, with numerous air quality monitoring stations (AQMSs) in each region. This study was conducted on 49 AQMSs from 32 provinces across different regions. The features of each station are listed in Tables 6 and 7, showing the name of the AQMS, the city in which it is located, the corresponding county, longitude and latitude information, network type (urban/rural/industrial), and which air pollutants are regularly measured there.
The National Air Quality Monitoring Network of Turkey includes 330 Air Quality Monitoring Stations, and the air quality of all provinces in the country is monitored. To facilitate public access to information on air quality, the monitoring results are published online at http://laboratuvar.cevre.gov.tr [54]. At all of the air pollution measurement stations, SO 2 and PM 10 are measured; in addition, NO, NO 2 , NO x , CO and O 3 are measured automatically at many of them. In this study, all of the AQMSs were examined and 49 out of the 330 stations were selected, because the aforementioned air pollutants (PM 10 , SO 2 , NO 2 , NO and O 3 ) are all regularly measured at these stations.
Since the data become roughly periodic after a one-year period, only one year of data (November 2017 to November 2018) was used in the experiments. The pollutant concentrations are mean values of daily (24 h) measurements. The application was developed using the Weka open source data mining library [55] on Visual Studio.

Experimental Results
In this study, the proposed MTC method, MV-MTC, was compared with the traditional clustering algorithms K-Means (KM), Expectation Maximization (EM), Hierarchical Clustering (HIER), Canopy and Farthest First (FFIRST). Each task was clustered by the selected algorithm and then the consensus decision was obtained within the MV-MTC framework. Performance evaluation was done via the sum of squared error (SSE). Before constructing the model, the data were normalized and missing values were imputed using mean values.
The number of clusters, k, was selected as 10% of the number of instances in the dataset; therefore, it was 5. The distance metric was chosen as the Euclidean distance. To take the joint decision from each single clustering algorithm, each cluster was labeled according to the weights calculated as the average value of the intra-cluster instances. According to this scheme, five cluster labels were determined: "L 1 ", "L 2 ", "L 3 ", "L 4 " and "L 5 ". Table 8 displays the average normalized weight of each cluster in terms of the different air pollutants and their corresponding cluster labels. As a result of the joint decision of the different tasks, where the joint evaluation of the PM 10 , SO 2 , NO 2 , NO and O 3 pollutants was treated as a new task, the final cluster assignments were made.
To evaluate the performance of the applied methodology, values of the sum of squared error were calculated. SSE is the sum of the squared differences between each observation and its group's mean, as in Equation (4): SSE = Σ_{j=1}^{k} Σ_{o_i ∈ C_j} (o_i − m_j)², where o i represents an instance of dataset D, C j represents the jth cluster, m j is the centroid value of the cluster j to which o i is assigned and k is the number of clusters. The total SSE of a method is the sum of all separate SSE calculations coming from the distinct clusters. Final assignments are obtained both by MV-MTC and by the single clustering algorithms. Each clustering algorithm is also applied on each single task, i.e., the model is formed by taking only one pollutant into consideration. The SSE results of the different pollutants under different algorithms are shown as C pollutantName , where "pollutantName" is one of the pollutants (PM 10 , SO 2 , NO 2 , NO or O 3 ), in Table 9. C ALL is the average SSE value over all pollutants. KM is applied in two different versions in terms of the initialization method used: KM with random initialization is denoted as KM, and KM initialized with K-Means++ is denoted as KM++. Hierarchical clustering is implemented with different link types among clusters: HIER Sing , HIER Comp , HIER Avg , HIER Mean and HIER Centro represent hierarchical clustering with single, complete, average, mean and centroid links, respectively. The bold notations in Table 9 show the best results in the respective rows.
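The SSE criterion of Equation (4) can be computed with a short helper; this is an illustrative 1-D sketch with our own names:

```python
def total_sse(clusters):
    """Sum of squared differences between each observation and its
    cluster's mean, summed over all clusters (Equation (4))."""
    sse = 0.0
    for members in clusters:
        m = sum(members) / len(members)        # cluster centroid
        sse += sum((o - m) ** 2 for o in members)
    return sse

# tighter groupings yield a lower total SSE
compact = total_sse([[1, 2, 3], [10, 11]])
mixed = total_sse([[1, 2, 10], [3, 11]])
```

A lower total SSE therefore indicates more coherent clusters, which is why it is used here to rank MV-MTC against the single clustering algorithms.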
We can conclude that the proposed MV-MTC method outperforms all single clustering algorithms: similar AQMSs are assigned to the same cluster group more accurately when multi-task clustering is applied. Moreover, the most promising output of MV-MTC is obtained with KM++. Among the single clustering algorithms, EM performs the best.
Final cluster assignments after performing MV-MTC with KM++ are shown in Figure 2. It points out the geographical locations of the AQMSs on the map of Turkey with different colored markers, where each color represents a cluster.
In the MV-MTC approach, a clustering algorithm is performed for t tasks and a merging operation is done in the final step ("majority voting"). In this study, the best results were obtained using the K-Means++ algorithm. The time complexity of K-Means++ is O(n × k + n × k × I), where n is the number of instances, k is the number of clusters, and I is the number of iterations needed for convergence [56]. After the single-task clustering step, the merging process takes a runtime cost of O(t × n), where t is the number of tasks. Considering this, the total time complexity of the MV-MTC algorithm is O(n × k + n × k × I + t × n). This time complexity indicates that the proposed MV-MTC algorithm requires computation time that grows linearly with the number of instances, clusters and tasks. Thus, the execution time of the algorithm remains reasonable even when a large volume of data is processed. Table 10 shows the execution time (in seconds) of the MV-MTC algorithm in terms of different clustering methods. Single-task clustering results are shown as C PollutantName, and C ALL represents the sum of the running times of all single-task clustering results. Experiments were performed on a desktop computer with an Intel Core i7-6700 3.40 GHz processor and 8 GB memory. In each experiment, the algorithms were executed 10 times and the average values were reported. The empirical results show that the running time of the proposed K-Means++ algorithm under the MV-MTC framework is better than those of the EM and hierarchical clustering algorithms. In addition, the MV-MTC algorithm has comparable speed with the traditional clustering algorithms when the C ALL and MV-MTC results on the datasets are compared. The MV-MTC algorithm was also compared with one of the recently proposed MTC methods, MTCMRL [45], in terms of time complexity.
In [45], multi-task clustering is combined with model relation learning (MTCMRL) to automatically learn the model parameter relatedness between each pair of tasks by solving a non-convex optimization problem. Even though that algorithm achieves better clustering performance than other multi-task clustering methods, it does not offer the expected performance in terms of time complexity, which is O(n² × m), where m is the number of features and n is the number of instances per task; its cost therefore grows quadratically as n increases to larger volumes. On the other hand, MV-MTC remains preferable because of its linearly growing time complexity.
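The two-phase procedure described above (single-task clustering per pollutant task, then a majority vote over the per-task labels) can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses a plain 1-D k-means with random seeding instead of K-Means++, and it aligns cluster labels across tasks by sorting centroids in ascending order, as a stand-in for the paper's intra-cluster-weight labeling.

```python
import numpy as np
from collections import Counter

def kmeans_1d(X, k, iters=50, seed=0):
    """Plain 1-D k-means (random seeding; the paper uses K-Means++ seeding)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(np.abs(X[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean()
    return labels, centers

def mv_mtc(tasks, k):
    """Majority-voting multi-task clustering sketch.
    tasks: list of 1-D arrays (one per pollutant), same stations in each."""
    per_task = []
    for t, X in enumerate(tasks):
        labels, centers = kmeans_1d(X, k, seed=t)
        # Relabel clusters by ascending centroid so labels are comparable
        # across tasks (stand-in for intra-cluster-weight labeling).
        remap = {old: new for new, old in enumerate(np.argsort(centers))}
        per_task.append(np.array([remap[l] for l in labels]))
    # Majority voting across tasks per station: O(t x n) merge step.
    votes = np.stack(per_task, axis=1)
    return np.array([Counter(row).most_common(1)[0][0] for row in votes])
```

For two well-separated synthetic pollutant tasks over four stations, e.g. `mv_mtc([np.array([1., 1.1, 5., 5.1]), np.array([2., 2.1, 9., 9.2])], k=2)`, the first two stations land in one global cluster and the last two in the other.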
This study aimed to identify regions that are similar in terms of air quality, which enables flexible decision-making at the cluster level: decision makers responsible for air quality control can take similar actions for the members of the same cluster. Since the data of many air quality monitoring stations are summarized in a few clusters, richer but more compact information is provided for control and modeling. Clustering finds structure in air quality data and is therefore exploratory in nature. Representing the whole environmental dataset by a few clusters offers the great advantage of simplifying data analysis. Identifying the monitoring station groups can be used to understand why the stations in the same cluster are similar. Clustering monitoring stations also minimizes information overload: grouping similar information and summarizing common characteristics help environmental scientists understand the current situation more clearly. In addition, a new station can be classified by assigning it to the cluster with the closest center.
The potential contributions of this study to the prediction of air quality can be listed as follows:
• Multi-task clustering can be used to label all the observed elements before air quality prediction, by calculating the distance between each centroid and each element in the data, and then selecting the cluster label (or level) with the minimum distance.
• Multi-task clustering can also be used as a preprocessing step to improve the speed and performance of the classification algorithm that is used to predict the air quality index.
• In air quality index prediction, temporal data clustering results can give information about air quality variations, so that a set of forecasting systems dedicated to reflecting temporal changes can be formed.
• The identification of the air pollution levels of different regions by clustering can be useful for designing the air quality monitoring network structure. Such networks must consider the monitoring locations, sampling frequencies and the pollutants of concern. For instance, clustering results can lead to an optimal network design, i.e., a network providing maximum data with a minimum number of measurement devices. Spatial relationship analysis is used to compare the information given by the potential sites that may form the network.
• In forecasting the level of air pollution, it is possible to find the closest cluster for a new instance to be predicted, and then use the values in this cluster for the prediction.
• Multi-task clustering can also be useful for detecting extreme air pollution events and can help predict future exceedances. In this sense, an air pollutant value of a region may be considered an outlier if it exceeds the minimum or maximum value of the cluster it belongs to.
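Two of the uses listed above, assigning a new instance to the nearest cluster centroid and flagging values that fall outside the min/max range of that cluster, can be sketched together. The function and its inputs are hypothetical names introduced for illustration; they are not part of the paper's implementation.

```python
import numpy as np

def assign_and_flag(x_new, centroids, cluster_ranges):
    """Assign a new station's pollutant vector to the nearest cluster
    centroid, then flag pollutant values outside that cluster's observed
    min/max range (a simple exceedance/outlier check).
    centroids:      (k, d) array of cluster centers (d pollutants).
    cluster_ranges: (k, d, 2) array of [min, max] per cluster/pollutant."""
    dists = np.linalg.norm(centroids - x_new, axis=1)
    c = int(np.argmin(dists))                     # nearest-centroid label
    lo = cluster_ranges[c, :, 0]
    hi = cluster_ranges[c, :, 1]
    outliers = (x_new < lo) | (x_new > hi)        # per-pollutant exceedance
    return c, outliers
```

For example, with two clusters over two pollutants, a new measurement close to the first centroid but above that cluster's observed maximum for the second pollutant would be assigned to cluster 0 with an exceedance flag on that pollutant.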

Conclusions
The main purpose of this study was to present a new multi-task clustering algorithm to determine which provinces of Turkey have the same air pollution characteristics, so that similar precautions for the reduction of pollution can be taken by the decision-making authority for the cities in the same group. The main air pollutants selected for the experiments were PM 10, SO 2, NO 2, NO and O 3, and their mean daily concentrations were taken into consideration. All of the data were taken from 49 air quality monitoring stations in different regions of Turkey. Two phases were performed under the MV-MTC scheme: single-task clustering and multi-task clustering. In single-task clustering, each air pollutant was handled individually and the air quality monitoring stations were assigned to respective clusters (local clustering). In the multi-task clustering phase, clusters were labeled according to the intra-cluster weights, so that reaching a common decision across different tasks becomes easier by applying majority voting on these cluster labels for each instance. Final cluster labels were obtained in this phase by combining the results of the single-task clusters (global clustering). According to the sum of squared error results, the proposed multi-task clustering method MV-MTC performed well compared to the classical single clustering algorithms K-Means, Expectation Maximization, Canopy, Farthest First and Hierarchical clustering. MV-MTC with K-Means initialized by K-Means++ provided promising results in the detection of similar AQMSs.
With this study, the following benefits can be obtained:
• Similar regions can be detected easily, so that similar air quality management strategies can be applied to them by the decision-making authority.
• Collecting similar information together and summarizing common features help environmental scientists figure out the present situation more clearly.
• Data analysis becomes easier because only a few cluster instances are handled instead of the whole environmental dataset.
• Data summarization produces compact and useful information, so one does not need to handle huge amounts of redundant data.
• Clustering can be used as a pre-processing step before performing the essential environmental study.
• Inherent hidden patterns in air quality data can be discovered.
• A new station can be classified by placing it into the cluster with the nearest cluster center.
In the future, other unsupervised learning methods, such as association rule mining, outlier detection or time series analysis, can be applied to Turkey's air pollution data. Instead of using only pollutant levels, meteorological factors such as temperature, humidity, wind speed and direction, and pressure could be added to the problem domain, because they can significantly influence the air quality level of a region. Seasonal changes could also be observed instead of using yearly data. The severity of air quality may be clustered based on its impact on health or its potential damage to the environment. Furthermore, a new study could be conducted to investigate the main causes of pollution by utilizing data such as fuel, exhaust and industrial waste. PM 2.5 is one of the most dangerous particulate matters; however, in Turkey, there is a missing data problem in the measurements of PM 2.5. The same is true for CO pollution, so these pollutants are not dealt with in this study. If the study is extended to other countries, additional air pollutants can also be handled.
Author Contributions: G.T., D.B. and A.P. were the main investigators; G.T. and D.B. contributed to writing paper and critically reviewed the paper; D.B. contributed to the design of the paper; G.T. performed the review of the literature; G.T. implemented the methodology; D.B. supervised the work and provided experimental insights; and A.P. critically reviewed the paper and contributed to its final edition.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.