A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization

In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., ozone, particulate matter (PM2.5), and sulfur dioxide). Machine learning, one of the most popular families of techniques, can efficiently train a model on big data by using large-scale optimization algorithms. Although some works have applied machine learning to air quality prediction, most prior studies are restricted to a few years of data and simply train standard regression models (linear or nonlinear) to predict the hourly air pollution concentration. In this work, we propose refined models to predict the hourly air pollution concentration on the basis of meteorological data of previous days by formulating the prediction over 24 h as a multi-task learning (MTL) problem. This formulation enables us to select a good model with different regularization techniques. We propose a useful regularization that enforces the prediction models of consecutive hours to be close to each other and compare it with several typical regularizations for MTL, including standard Frobenius norm regularization, nuclear norm regularization, and ℓ2,1-norm regularization. Our experiments show that the proposed parameter-reducing formulations and consecutive-hour-related regularizations achieve better performance than existing standard regression models and existing regularizations.


Introduction
Adverse health impacts from exposure to outdoor air pollutants are complicated functions of pollutant compositions and concentrations [1]. Major outdoor air pollutants in cities include ozone (O3), particulate matter (PM), sulfur dioxide (SO2), carbon monoxide (CO), nitrogen oxides (NOx), volatile organic compounds (VOCs), pesticides, and metals, among others [2,3]. Increased mortality and morbidity rates have been found in association with increased concentrations of air pollutants such as O3, PM, and SO2 [3][4][5]. According to a report from the American Lung Association [6], a 10 parts per billion (ppb) increase in the O3 mixing ratio might cause over 3700 premature deaths annually in the United States (U.S.). Chicago, like many other megacities in the U.S., has struggled with air pollution as a result of industrialization and urbanization. Although emissions of O3 precursors (such as VOCs, NOx, and CO) have decreased significantly since the late 1970s, O3 levels in Chicago have not complied with the standards set by the Environmental Protection Agency (EPA) to protect public health [7]. Particle size is critical in determining where particles deposit in the human respiratory system [8]. PM2.5, which refers to particles with a diameter less than or equal to 2.5 µm, has been of increasing concern, as these particles can be deposited in the gas-exchange region of the lung, the alveoli [9]. The U.S. EPA revised the annual PM2.5 standard by lowering it to 12 µg/m³ to provide improved protection against health effects associated with long- and short-term exposure [10]. SO2, an important precursor of new particle formation and particle growth, has also been associated with respiratory diseases in many countries [11][12][13][14][15]. Therefore, we selected O3, PM2.5, and SO2 for testing in this study.
Meteorological conditions, including regional and synoptic meteorology, are critical in determining air pollutant concentrations [16][17][18][19][20][21]. According to the study by Holloway et al. [22], the O3 concentration over Chicago is most sensitive to air temperature, wind speed and direction, relative humidity, incoming solar radiation, and cloud cover. For example, lower ambient temperature and weaker incoming solar radiation slow down photochemical reactions and lead to lower concentrations of secondary air pollutants, such as O3 [23]. Increasing wind speed can either increase or decrease air pollutant concentrations. For instance, when the wind speed was low (weak dispersion/ventilation), traffic-related pollutants were found at their highest concentrations [24,25]. However, strong winds may form dust storms by lifting particles from the ground [26]. High humidity is usually associated with high concentrations of certain air pollutants (such as PM, CO, and SO2) but with low concentrations of others (such as NO2 and O3) because of their different formation and removal mechanisms [25]. In addition, high humidity can be an indicator of precipitation events, whose strong wet deposition leads to low concentrations of air pollutants [27]. Because particle composition and the interaction of particles with light were found to be the most important factors in attenuating visibility [28,29], low visibility can be an indicator of high PM concentrations. Clouds scatter and absorb solar radiation, which is significant for the formation of some air pollutants (e.g., O3) [23,30]. Therefore, these important meteorological variables were selected to predict air pollutant concentrations in this study.
Statistical models have been applied to air pollution prediction on the basis of meteorological data [31][32][33][34][35]. However, existing statistical modeling studies have mostly been restricted to standard classification or regression models, which neglect the nature of the problem itself or ignore the correlation between sub-models for different time slots. On the other hand, machine learning approaches have been developed over more than 60 years and have achieved tremendous success in a variety of areas [36][37][38][39][40][41]. The machine learning community has invented various tools and techniques that allow for more refined modeling of a specific problem. In particular, model regularization is a fundamental technique for improving the generalization performance of a predictive model, and many efficient optimization algorithms have been developed for solving machine learning formulations with different regularizations.
In this study, we focus on refined modeling for predicting hourly air pollutant concentrations on the basis of historical meteorological data and air pollution data. A striking difference between this work and previous works is that we emphasize how to regularize the model in order to improve its generalization performance and how to learn a complex regularized model from big data with advanced optimization algorithms. We collected 10 years' worth of meteorological and air pollution data from the Chicago area. The air pollutant data were obtained from the EPA [42,43], and the meteorological data from MesoWest [44]. From these databases, we fetched consecutive hourly measurements of various meteorological variables and pollutants reported by two weather stations and two air quality monitoring sites in the Chicago area. Each record of hourly measurements included meteorological variables, such as solar radiation, wind direction and speed, temperature, and atmospheric pressure, as well as air pollutants, including PM2.5, O3, and SO2. We used two methods for model regularization: (i) explicitly controlling the number of parameters in the model and (ii) explicitly enforcing a certain structure in the model parameters. For controlling the number of parameters, we compared three model formulations, which can be considered in a unified MTL framework with a diagonal- or full-matrix model. For enforcing a certain structure on the model matrix, we considered the relationship between the prediction models of different hours and compared three regularizations with standard Frobenius norm regularization. The experimental results show that the model of intermediate size combined with the proposed regularization, which enforces the prediction models of two consecutive hours to be close, achieved the best results and was far better than standard regression models. We have also developed efficient optimization algorithms for solving the different formulations and demonstrated their effectiveness through experiments.
The rest of the paper is organized as follows. In Section 2, we discuss related work. In Section 3, we describe the data collection and preprocessing. In Section 4, we describe the proposed solutions, including formulations, regularizations, and optimizations. In Section 5, we present the experimental studies and the results. In Section 6, we give conclusions and indicate future work.

Related Work
Many previous works have applied machine learning algorithms to air quality prediction. Some researchers have aimed to predict targets as discretized levels. Kalapanidas et al. [32] studied the effects of meteorological features alone, such as temperature, wind, precipitation, solar radiation, and humidity, on air pollution and classified air pollution into different levels (low, med, high, and alarm) by using a lazy learning approach, the case-based reasoning (CBR) system. Athanasiadis et al. [45] employed the σ-fuzzy lattice neurocomputing classifier to predict and categorize O3 concentrations into three levels (low, mid, and high) on the basis of meteorological features and other pollutants such as SO2, NO, and NO2. Kurt and Oktay [33] incorporated geographic connections into a neural network model and predicted daily concentration levels of SO2, CO, and PM10 three days in advance. However, converting a regression task into a classification task is problematic, as it discards the magnitude information of the numeric data within each level and consequently limits prediction accuracy.
Other researchers have worked on predicting pollutant concentrations directly. Corani [46] trained neural network models to predict hourly O3 and PM10 concentrations on the basis of data from the previous day, mainly comparing the performance of feed-forward neural networks (FFNNs) and pruned neural networks (PNNs). Further efforts have been made on FFNNs: Fu et al. [47] applied a rolling mechanism and a gray model to improve traditional FFNN models. Jiang et al. [48] explored multiple models (a physical and chemical model, a regression model, and a multiple layer perceptron) for the air pollutant prediction task, and their results show that statistical models are competitive with the classical physical and chemical models. Ni et al. [49] compared multiple statistical models on the basis of PM2.5 data around Beijing, and their results implied that linear regression models can in some cases outperform the other models.
MTL learns multiple tasks that share commonalities [50] jointly, which can improve the efficiency and accuracy of the models. It has achieved tremendous success in many fields, such as natural language processing [37], image recognition [38], bioinformatics [39,40], and marketing prediction [41]. A variety of regularizations can be used to exploit the commonalities of related tasks, including the ℓ2,1-norm [51], nuclear norm [52], spectral norm [53], and Frobenius norm [54]. However, most previous machine learning works on air pollutant prediction did not consider the similarities between the models and focused only on improving model performance for a single task, that is, improving the prediction performance for each hour either separately or identically.
Therefore, we used meteorological and pollutant data to predict hourly concentrations on the basis of linear models. In this work, we focused on three prediction model formulations and used the MTL framework with different regularizations. To the best of our knowledge, this is the first work that has used MTL for the air pollutant prediction task. We exploited analytical approaches and optimization techniques to obtain the optimal solutions. The evaluation metric was the root-mean-squared error (RMSE).

Data Collection
We collected air pollutant data from two air quality monitoring sites and meteorological data from two weather stations from 2006 to 2015 (summarized in Table 1). The air pollutant data in this study included the concentrations of O3, PM2.5, and SO2. We downloaded the air pollutant data from the U.S. EPA's Air Quality System (AQS) database (https://www.epa.gov/outdoor-air-quality-data), which has been widely used for model evaluation [42,43]. We selected the meteorological variables that would affect the air pollutant concentrations, including air temperature, relative humidity, wind speed and direction, wind gust, precipitation accumulation, visibility, dew point, wind cardinal direction, pressure, and weather conditions. We downloaded the meteorological data from MesoWest (http://mesowest.utah.edu/), a project within the Department of Meteorology at the University of Utah that has been aggregating meteorological data since 2002 [44].
The locations of the two air quality monitoring sites and two weather stations are shown in Figure 1. The Alsip Village (AV) air quality monitoring site is also located in a suburban residential area, which is in southern Cook County, Illinois (AQS ID:

Table 1. Measurement sites and variables.
Alsip Village (AV): ozone concentration and PM2.5 concentration

Preprocessing
We paired the collected meteorological data and air pollutant data by time to obtain the data format required by the machine learning methods. In particular, for each variable, we formed one value for each hour. However, the original data could contain multiple records or missing values at some hours. To preprocess the data, we computed the hourly mean of each numeric variable when multiple records were observed within an hour, and we chose the most frequent category per hour for each categorical variable with multiple values. Missing values existed for some variables, which the machine learning methods used in this study cannot handle. Therefore, we imputed the missing values by using the closest-neighbor values for four continuous variables and one categorical variable: wind gust, pressure, altimeter reading, precipitation, and weather conditions. We deleted the days that still had missing values after imputation. We applied dummy coding to two categorical variables, the cardinal wind direction (16 values, e.g., N, S, E, W, etc.) and weather conditions (31 values, e.g., sunny, rainy, windy, etc.). Then, we added weekday and weekend as two boolean features. Finally, we obtained 60 features in total (9 numerical meteorological features, 16 dummy codes for wind direction, 31 dummy codes for weather conditions, 2 boolean features for weekday/weekend, 1 numerical feature for the pollutant, and 1 bias term). We normalized all features and pollutant targets so that their values fall in the range [0, 1].
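The preprocessing steps above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the pipeline used in the study: raw readings are assumed to arrive as (hour, value) pairs, "closest-neighbor" imputation is interpreted as copying the nearest non-missing value in time (ties going to the earlier hour), and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def hourly_numeric(records):
    """Average all readings of a numeric variable that fall in the same hour.

    records: iterable of (hour, value) pairs (hypothetical input layout).
    """
    buckets = defaultdict(list)
    for hour, value in records:
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def hourly_categorical(records):
    """Keep the most frequent category per hour (ties: first one seen)."""
    buckets = defaultdict(list)
    for hour, value in records:
        buckets[hour].append(value)
    return {hour: Counter(values).most_common(1)[0][0] for hour, values in buckets.items()}


def fill_nearest(series):
    """Replace None by the nearest non-missing value in time (ties: earlier hour)."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return list(series)  # nothing to copy from; such days would be dropped
    filled = []
    for i, v in enumerate(series):
        if v is None:
            nearest = min(known, key=lambda j: (abs(j - i), j))
            v = series[nearest]
        filled.append(v)
    return filled


def one_hot(value, categories):
    """Dummy-code a categorical value, e.g. one of 16 cardinal wind directions."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(values):
    """Rescale a feature or target to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in values]
```

In practice, the minimum and maximum used by `min_max` would usually be computed on the training days only and then reused for the test days; the text does not state which choice was made.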

Machine Learning Approaches for Air Pollution Prediction
In this section, we describe the proposed approaches for predicting the ambient concentration of air pollutants.

A General Formulation
Our goal is to predict the concentration of air pollutants on the next day on the basis of historical meteorological and air pollutant data. In this work, we use the previous day's data to predict the next day's hourly pollutant concentrations. In particular, we let $(x_i, y_i)$ denote the $i$th training example, where $y_i \in \mathbb{R}^{24\times 1}$ denotes the concentration of a certain air pollutant over the 24 h of a day, and $x_i = (u_i; v_i)$ denotes the observed data on the previous day, which include two components; here a semicolon ";" denotes vertical (column-wise) stacking. The first component $u_i = (u_{i,1}; \ldots; u_{i,D}) \in \mathbb{R}^{24D\times 1}$ includes all meteorological data over the 24 h of the previous day, where $u_{i,j} \in \mathbb{R}^{24\times 1}$ denotes the $j$th meteorological feature over the 24 h and $D$ is the number of meteorological features; the second component $v_i \in \mathbb{R}^{24\times 1}$ includes the hourly concentration of the same air pollutant on the previous day. With $n$ training examples, the general formulation can be expressed as

$$\min_{W} \; \frac{1}{n}\sum_{i=1}^{n} \|f(W, x_i) - y_i\|_2^2 + \phi(W), \tag{1}$$

where $W$ denotes the parameters of the model, $f(W, x_i) \in \mathbb{R}^{24\times 1}$ denotes the prediction of the air pollutant concentration, and $\phi(\cdot)$ denotes a regularization function of the model parameters $W$.
Next, we introduce two levels of model regularization. The first level explicitly controls the number of model parameters. The second level explicitly imposes a certain regularization on the model parameters. For the first level, we consider the three models described below:

• Baseline Model. The first model is a baseline model that has been considered in existing studies and has the fewest parameters. In particular, the prediction of the air pollutant concentration at the $k$th hour is given by

$$[f(W, x_i)]_k = w_0 + \sum_{j=1}^{D} w_j\, e_k^\top u_{i,j} + w_{D+1}\, e_k^\top v_i, \quad k = 1, \ldots, 24,$$

where $e_k \in \mathbb{R}^{24\times 1}$ is a basis vector with 1 at only the $k$th position and 0 at other positions, and $w_0, w_1, \ldots, w_D, w_{D+1} \in \mathbb{R}$ are the model parameters, with $w_0$ being the bias term. We denote this model by $W = (w_0, w_1, \ldots, w_{D+1})^\top$. Notably, this model predicts the concentration at each hour on the basis of the data from the same hour of the previous day, and it has $D + 2$ parameters. This simple model assumes that all 24 h share the same model parameters.
• Heavy Model. The second model takes all the data of the previous day into account when predicting the concentration at every hour of the next day. In particular, for the $k$th hour, the prediction is given by

$$[f(W, x_i)]_k = w_{k,0} + \sum_{j=1}^{D} w_{k,j}^\top u_{i,j} + w_{k,D+1}^\top v_i,$$

where $w_{k,j} \in \mathbb{R}^{24\times 1}$ for $j = 1, \ldots, D+1$ and $w_{k,0} \in \mathbb{R}$. This model is defined by

$$W = (w_1, \ldots, w_{24}), \qquad w_k = (w_{k,0}; w_{k,1}; \ldots; w_{k,D+1}) \in \mathbb{R}^{(24(D+1)+1)\times 1}.$$

Each column of $W$ corresponds to the prediction model for one hour, giving a total of $24 \times (24(D+1) + 1)$ parameters. The baseline model is a special case obtained by setting $w_{k,0} = w_0$ and $w_{k,j} = w_j e_k$ for all $k$; that is, each $w_{k,j}$ has only one nonzero element, at the $k$th position, and its value is shared by all hours.

• Light Model. The third model lies between the baseline model and the heavy model. It uses the 24 h pattern of the air pollutant on the previous day and the meteorological data from the same hour of the previous day to predict the concentration at a particular hour. The prediction is given by

$$[f(W, x_i)]_k = w_{k,0} + \sum_{j=1}^{D} w_{k,j}\, e_k^\top u_{i,j} + w_{k,D+1}^\top v_i,$$

where $w_{k,j} \in \mathbb{R}$ for $j = 0, \ldots, D$ and $w_{k,D+1} \in \mathbb{R}^{24\times 1}$. This model is defined by

$$W = (w_1, \ldots, w_{24}), \qquad w_k = (w_{k,0}; w_{k,1}; \ldots; w_{k,D}; w_{k,D+1}) \in \mathbb{R}^{(D+25)\times 1}.$$

Each column again corresponds to the predictive model for one hour, and $W$ has a total of $24 \times (D + 1) + 24 \times 24$ parameters.

Regularization of Model Parameters
In this section, we describe different regularizations for the model parameter matrices W in the heavy and light models.We consider the problem using MTL, in which predicting the concentration of air pollutants over one hour is one task.In the literature, a number of regularizations have been proposed by considering the relationship between different tasks.We first describe three baseline regularizations in the literature and then present the proposed regularization that takes the dimension of time into consideration for modeling the relationship between models at different times.

• Frobenius norm regularization. Frobenius norm regularization generalizes standard Euclidean norm regularization to the matrix case:

$$\phi(W) = \lambda \|W\|_F^2 = \lambda \sum_{j,k} W_{jk}^2,$$

where $\lambda > 0$ is a regularization parameter.
• ℓ2,1-norm regularization. ℓ2,1-norm regularization has been used for feature selection in MTL. The norm is formed by first computing the ℓ2-norm of each row of $W$ (across different tasks) and then computing the ℓ1-norm of the resulting vector. In particular, for $W \in \mathbb{R}^{d\times K}$,

$$\|W\|_{2,1} = \sum_{j=1}^{d} \|W_{j,*}\|_2,$$

where $W_{j,*}$ denotes the $j$th row of $W$. We consider the ℓ2,1-norm regularizer $\phi(W) = \lambda \|W\|_{2,1}$.

• Nuclear norm regularization. The nuclear norm is defined as the sum of the singular values of a matrix and is a standard regularizer for enforcing a matrix to have low rank. The motivation for using a low-rank matrix is that the models for consecutive hours are highly correlated, which could make $W$ (approximately) low rank. Denoting by $\|W\|_*$ the nuclear norm of $W$, the regularization is $\phi(W) = \lambda \|W\|_*$.

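• Consecutive close (CC) regularization. Finally, we propose a regularization for the considered problem that explicitly enforces the predictive models for two consecutive hours to be close to each other. The intuition is that the concentrations of air pollutants at two consecutive hours are usually close to each other. We denote the model by $W = (w_1, \ldots, w_K)$ and define

$$\mathrm{Cons}(W) = (w_1 - w_2, \; w_2 - w_3, \; \ldots, \; w_{K-1} - w_K).$$

The CC regularization is given by

$$\phi(W) = \lambda \|\mathrm{Cons}(W)\|_p^p = \lambda \sum_{k=1}^{K-1} \|w_k - w_{k+1}\|_p^p, \tag{2}$$

where $p = 1$ or $p = 2$ and the norm is applied entrywise.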
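The four regularizers can be evaluated directly, which is a convenient way to sanity-check an implementation. The NumPy sketch below assumes that the columns of `W` are the hourly models and that the CC penalty takes the entrywise $\|\mathrm{Cons}(W)\|_p^p$ form written above; the function names are hypothetical. The helper `difference_matrix` builds a matrix $E$ whose columns have +1 and −1 in consecutive positions, so that `W @ E` equals $\mathrm{Cons}(W)$; the same matrix reappears in the CC solver.

```python
import numpy as np


def frobenius_reg(W, lam):
    """lam * ||W||_F^2 (squared Frobenius norm)."""
    return lam * np.sum(W ** 2)


def l21_reg(W, lam):
    """lam * sum_j ||W_{j,*}||_2: l2-norm of each row (across tasks), then summed."""
    return lam * np.sum(np.linalg.norm(W, axis=1))


def nuclear_reg(W, lam):
    """lam * ||W||_*: sum of singular values (computed here with a full SVD)."""
    return lam * np.sum(np.linalg.svd(W, compute_uv=False))


def difference_matrix(K):
    """E in R^{K x (K-1)}: column i has +1 at row i and -1 at row i+1, so W @ E = Cons(W)."""
    E = np.zeros((K, K - 1))
    idx = np.arange(K - 1)
    E[idx, idx] = 1.0
    E[idx + 1, idx] = -1.0
    return E


def cc_reg(W, lam, p=1):
    """lam * ||Cons(W)||_p^p with an entrywise p-norm; columns of W are hourly models."""
    cons = W @ difference_matrix(W.shape[1])  # columns w_k - w_{k+1}
    return lam * np.sum(np.abs(cons) ** p)
```

With p = 2 the CC penalty is a squared Frobenius norm of $WE$, which is why that variant, like plain Frobenius regularization, keeps the problem a smooth quadratic.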

Stochastic Optimization Algorithms for Different Formulations
Except for the Frobenius norm regularized model (with or without ℓ2-norm CC regularization), which has a closed-form solution, we solved the models via advanced stochastic optimization techniques. We denote the training inputs and targets by X and Y, respectively, and the total number of features by D. Although the standard stochastic (sub)gradient method [55] could be used to solve all the formulations considered in this work, it does not necessarily yield the fastest convergence. To address this issue, we employed advanced stochastic optimization techniques tailored to each formulation.
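One way to obtain the closed-form solution mentioned above is sketched below, under two assumptions: the squared loss of Equation (1), and a design in which all hourly tasks share the same input vector $x_i$, as in the heavy model (the light model uses hour-specific inputs and would need a different linear system). Setting the gradient of $\frac{1}{n}\|XW - Y\|_F^2 + \lambda\|W\|_F^2 + \mu\|WE\|_F^2$ to zero gives the Sylvester equation $AW + WB = C$ with $A = X^\top X/n + \lambda I$, $B = \mu EE^\top$, and $C = X^\top Y/n$. The sketch solves it by diagonalizing the symmetric matrix $B$; the function name and the parameter `mu` are hypothetical.

```python
import numpy as np


def difference_matrix(K):
    """E in R^{K x (K-1)} with W @ E = (w_1 - w_2, ..., w_{K-1} - w_K)."""
    E = np.zeros((K, K - 1))
    idx = np.arange(K - 1)
    E[idx, idx] = 1.0
    E[idx + 1, idx] = -1.0
    return E


def closed_form_f_cc2(X, Y, lam, mu=0.0):
    """Minimise (1/n)||X W - Y||_F^2 + lam ||W||_F^2 + mu ||W E||_F^2.

    X: (n, d) inputs shared by all K hourly tasks; Y: (n, K) targets.
    mu = 0 gives plain Frobenius (ridge) regularization.
    """
    n, d = X.shape
    K = Y.shape[1]
    E = difference_matrix(K)
    A = X.T @ X / n + lam * np.eye(d)  # positive definite because lam > 0
    B = mu * E @ E.T                   # symmetric positive semidefinite
    C = X.T @ Y / n
    # B = Q diag(evals) Q^T. With Z = W Q, A W + W B = C decouples into
    # K independent systems (A + evals[k] I) z_k = (C Q)[:, k].
    evals, Q = np.linalg.eigh(B)
    CQ = C @ Q
    Z = np.column_stack([np.linalg.solve(A + evals[k] * np.eye(d), CQ[:, k]) for k in range(K)])
    return Z @ Q.T
```

Because $E$ is only K × (K − 1) with K = 24, the eigendecomposition is negligible in cost, and the work is dominated by K solves of size d × d.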

Optimizing ℓ2,1-Norm Regularized Model
We used the accelerated stochastic subgradient (ASSG) method [56] with proximal mapping to optimize this model. The algorithm runs in multiple stages, and each stage calls the standard stochastic gradient method with a constant step size. To handle the non-smooth ℓ2,1-norm, we used proximal mapping [57]. The stochastic gradient descent step is

$$W'_t = W_{t-1} - \eta_s \nabla_W \|f(W_{t-1}, x_i) - y_i\|_2^2, \tag{3}$$

where $\eta_s$ is the stage-wise step size and $i$ is a sampled index. Then a proximal mapping is applied (with $\lambda' = 2\eta_s\lambda$):

$$W_t = \arg\min_{W} \|W - W'_t\|_F^2 + \lambda' \|W\|_{2,1}. \tag{4}$$

The above problem has an analytical solution. Denote by $w^j$ and $w'^j$ the $j$th rows of $W_t$ and $W'_t$, respectively, written as column vectors. Then the solution to Equation (4) can be computed row by row as follows [51]:

$$w^j = \max\!\left(0, \; 1 - \frac{\lambda'}{2\|w'^j\|_2}\right) w'^j. \tag{5}$$

The pseudocode of the algorithm is as follows:

Algorithm 1: ASSG method with proximal mapping for solving the ℓ2,1-norm regularized model.
Input: X, Y, W_0, η_0, S, and T
for s = 1, ..., S do
    η_s = η_{s−1}/2
    for t = 1, ..., T do
        sample i ∈ {1, ..., n}
        update W'_t using Equation (3)
        update W_t using Equation (4)
    end for
end for
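A compact NumPy sketch of Algorithm 1 follows. It assumes the squared loss with a shared-input linear model $f(W, x_i) = W^\top x_i$ (heavy-style design), halves the step size at the start of every stage as in the pseudocode, and simply carries the last iterate into the next stage; any per-stage averaging or domain constraints used in [56] are omitted, so this is a simplified illustration rather than a faithful reimplementation. The function names and default values are hypothetical.

```python
import numpy as np


def prox_l21(V, lam_p):
    """Solve argmin_W ||W - V||_F^2 + lam_p * ||W||_{2,1} row by row (Equation (5)).

    Each row is shrunk by group soft-thresholding with threshold lam_p / 2;
    rows whose l2-norm falls below it become exactly zero.
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam_p / (2.0 * np.maximum(norms, 1e-12)))
    return scale * V


def assg_l21(X, Y, lam, eta0=0.1, S=5, T=1000, seed=0):
    """Stage-wise proximal SGD for min_W (1/n) sum_i ||W^T x_i - y_i||^2 + lam ||W||_{2,1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    eta = eta0
    for _ in range(S):
        eta /= 2.0                           # eta_s = eta_{s-1} / 2
        for _ in range(T):
            i = rng.integers(n)              # sample i in {1, ..., n}
            residual = X[i] @ W - Y[i]       # f(W, x_i) - y_i
            W_half = W - eta * 2.0 * np.outer(X[i], residual)  # Equation (3)
            W = prox_l21(W_half, 2.0 * eta * lam)              # Equation (4), lambda' = 2 eta_s lambda
    return W
```

The row-wise form of the proximal step is what makes ℓ2,1 regularization act as feature selection: a feature whose row is driven to zero is dropped for all 24 hourly tasks at once.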

Optimizing Nuclear Norm Regularized Model
For most optimization algorithms, the main challenge in solving the nuclear norm regularized problem lies in computing the full singular value decomposition (SVD) of the involved matrix W, which is an expensive operation. To avoid a full SVD, we employed the stochastic extension of the SVD-free convex-concave algorithm (SECONE-S) [58]. Based on the dual representation of the nuclear norm, $\|W\|_* = \max_{\|U\|_2 \le 1} \mathrm{tr}(U^\top W)$, where $\|U\|_2$ is the spectral norm, the algorithm solves a min-max problem over W and a dual matrix U. Stochastic gradient descent and ascent are then used to update W and U, respectively, at each iteration; we refer to these updates as Equation (6). They involve $(u_1, v_1)$, the top left and right singular vectors of $U_t$, and $\sigma_1$, its top singular value, so only the top singular triplet, rather than a full SVD, is required. The pseudocode for the algorithm is as follows:

Algorithm 2: SECONE-S for solving the nuclear norm regularized model.
Input: X, Y, T, η_0, and τ_0
for t = 1, ..., T do
    sample i ∈ {1, ..., n}
    update W_t and U_t using Equation (6)
end for

Optimizing CC Regularized Model
The challenge in tackling the proposed CC regularization is that the standard proximal mapping cannot be computed efficiently. We addressed this challenge by using the alternating-direction method of multipliers (ADMM). Specifically, we used a recently proposed locally adaptive stochastic ADMM (LA-SADMM) [59] to solve the CC regularized model. Below, we discuss the updates for p = 1 (i.e., using the ℓ1-norm) in Equation (2); the updates for p = 2 can be derived similarly.
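For the SECONE-S solver of the nuclear norm model (Algorithm 2), the two SVD-free ingredients can be illustrated in NumPy: the top singular triplet $(\sigma_1, u_1, v_1)$ can be found by power iteration using only matrix-vector products, and $u_1 v_1^\top$ is a (sub)gradient of the spectral norm $\|U\|_2 = \sigma_1$. This sketch does not reproduce the exact SECONE-S updates; the function names and the iteration count are hypothetical, and `nuclear_norm_via_dual` deliberately uses a full SVD only to verify the dual representation stated in the text.

```python
import numpy as np


def top_singular_triplet(U, iters=200, seed=0):
    """Power iteration: (sigma_1, u_1, v_1) of U using only matrix-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(U.shape[1])
    v /= np.linalg.norm(v)
    sigma, u = 0.0, np.zeros(U.shape[0])
    for _ in range(iters):
        u = U @ v
        u /= np.linalg.norm(u)
        v = U.T @ u
        sigma = np.linalg.norm(v)
        v /= sigma
    return sigma, u, v


def spectral_norm_subgradient(U):
    """Return ||U||_2 = sigma_1 and u_1 v_1^T, a (sub)gradient of ||U||_2 at U."""
    sigma, u, v = top_singular_triplet(U)
    return sigma, np.outer(u, v)


def nuclear_norm_via_dual(W):
    """Evaluate tr(U*^T W) at the maximiser U* = P Q^T of max_{||U||_2 <= 1} tr(U^T W).

    Finding U* needs a full SVD, which is exactly what SECONE-S avoids;
    this helper is for verification only.
    """
    P, _, Qt = np.linalg.svd(W, full_matrices=False)
    U_star = P @ Qt
    return np.trace(U_star.T @ W)
```

Each power-iteration step costs two matrix-vector products, which is far cheaper than a full SVD when only the leading singular pair is needed.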
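The objective function can be written as

$$\min_{W} \; \frac{1}{n}\sum_{i=1}^{n} \|f(W, x_i) - y_i\|_2^2 + \lambda \|WE\|_1.$$

Here, $E = (\hat{e}_1, \ldots, \hat{e}_{K-1}) \in \mathbb{R}^{K\times(K-1)}$, where $\hat{e}_i = (0, \ldots, 1, -1, \ldots, 0)^\top$ for $i = 1, \ldots, K-1$ has 1 as its $i$th element and $-1$ as its $(i+1)$th element. Therefore, $\mathrm{Cons}(W) = WE$. A dummy variable $U = WE$ is introduced to decouple the last term from the first term, and an augmented Lagrangian function is formed as follows:

$$\mathcal{L}(W, U, \Lambda) = \frac{1}{n}\sum_{i=1}^{n} \|f(W, x_i) - y_i\|_2^2 + \lambda \|U\|_1 + \langle \Lambda, WE - U \rangle + \frac{\beta}{2}\|WE - U\|_F^2,$$

where $\Lambda$ is the Lagrangian multiplier and $\beta$ is the penalty parameter. This problem can then be solved by optimizing each variable alternately: SADMM updates W using a stochastic gradient of the loss, then updates U, and then updates Λ by a dual ascent step. LA-SADMM solves the problem more efficiently by increasing the penalty parameter β stage-wise. The pseudocode for the algorithm is as follows:

Algorithm 3: LA-SADMM for solving the consecutive close (CC) regularized problem with the ℓ1-norm.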
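The alternating updates can be sketched for the augmented Lagrangian above, assuming a shared-input linear model $f(W, x_i) = W^\top x_i$. The W-step linearizes the loss around the current iterate and adds a proximal term $\frac{1}{2\eta}\|W - W_{t-1}\|_F^2$, which gives the linear system $W(\beta EE^\top + I/\eta) = W_{t-1}/\eta - G - \Lambda E^\top + \beta U E^\top$; the U-step is entrywise soft-thresholding; and Λ takes a dual ascent step. The stage-wise schedule (doubling β and halving η after each stage) is a hypothetical stand-in for the locally adaptive rule of [59], not its actual specification, and all names are illustrative.

```python
import numpy as np


def difference_matrix(K):
    """E in R^{K x (K-1)} with W @ E = Cons(W)."""
    E = np.zeros((K, K - 1))
    idx = np.arange(K - 1)
    E[idx, idx] = 1.0
    E[idx + 1, idx] = -1.0
    return E


def soft_threshold(A, tau):
    """Entrywise proximal operator of tau * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def la_sadmm_cc1(X, Y, lam, beta0=1.0, eta0=0.05, S=4, T=500, seed=0):
    """Stochastic linearized ADMM with a stage-wise increasing penalty for
    min_W (1/n) sum_i ||W^T x_i - y_i||^2 + lam ||W E||_1 (shared inputs x_i)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = Y.shape[1]
    E = difference_matrix(K)
    W = np.zeros((d, K))
    U = np.zeros((d, K - 1))    # dummy variable, U = W E at convergence
    Lam = np.zeros((d, K - 1))  # Lagrangian multiplier
    beta, eta = beta0, eta0
    for _ in range(S):
        M = beta * E @ E.T + np.eye(K) / eta  # symmetric positive definite
        for _ in range(T):
            i = rng.integers(n)
            G = 2.0 * np.outer(X[i], X[i] @ W - Y[i])  # stochastic gradient of the loss
            # W-step: linearized loss plus proximal term; solves W M = R
            R = W / eta - G - Lam @ E.T + beta * U @ E.T
            W = np.linalg.solve(M, R.T).T
            # U-step: soft-thresholding; Lambda-step: dual ascent
            U = soft_threshold(W @ E + Lam / beta, lam / beta)
            Lam = Lam + beta * (W @ E - U)
        beta, eta = 2.0 * beta, eta / 2.0  # hypothetical stage-wise schedule
    return W
```

Splitting off U = WE is what makes each step cheap: the non-smooth ℓ1 term only ever meets U, whose proximal operator is entrywise, while W only appears in a smooth quadratic.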

Extensive Discussion
It is noteworthy that the main contribution of this work is the incorporation of model parameter reduction and regularized MTL into air pollutant prediction. As described above, for the parameter reduction part, our light formulation reduces the number of model parameters by removing, from each hour's submodel, the meteorological parameters associated with the other hours. For the MTL part, we expected the models of consecutive hours to be similar; therefore, we added appropriate regularizers to encourage this similarity.
The high-level idea of MTL is rooted in transfer learning, which generally aims to transfer knowledge from a related source task to a target task and consequently improve performance on the target task. There are multiple variants of transfer learning, such as inductive, transductive, and unsupervised transfer learning, and the approaches to transfer learning mainly include instance transfer, feature-representation transfer, parameter transfer, and relational-knowledge transfer [60]. One of the most common examples is feature-representation transfer for deep neural networks: after supervised or unsupervised learning on other related datasets, the pretrained model can be reused to learn the target task with better performance. The MTL technique in this work is an example of parameter transfer in an inductive transfer learning setting.
A similar idea can be applied to other problems. First, if the submodels are built not for each hour but for each day (or even for each location, from a spatial perspective), we can still apply the parameter reduction idea, keeping the more important information and removing the information with low priority. Second, for the MTL part, we can still add regularizations that encourage the similarity of the submodels. Furthermore, in this work, each submodel $w_i$ was a linear regression model; it could also be replaced with support vector regression (SVR), nonlinear regression, neural networks, and so on. Finally, the techniques used in this work can be further combined with many other transfer learning techniques, such as feature-representation transfer for deep neural networks.

Experiments
We denote each dataset by the paired names of a weather station and an air quality monitoring site: LU-LV (Lewis University-Lemont) and LMA-AV (Lansing Municipal Airport-Alsip Village). LU-LV contains the data for predicting the concentrations of two air pollutants, O3 and SO2; LMA-AV contains the data for predicting the concentrations of O3 and PM2.5.
We compared 11 models learned with different combinations of model formulations and regularizations: Baseline, Heavy-F, Light-F, Heavy-ℓ2,1, Heavy-nuclear, Heavy-CCL2, Heavy-CCL1, Light-ℓ2,1, Light-nuclear, Light-CCL2, and Light-CCL1. In each name, the prefix indicates the model formulation and the suffix indicates the regularization (F: Frobenius norm; ℓ2,1: ℓ2,1-norm; nuclear: nuclear norm; CCL2 and CCL1: CC regularization with the ℓ2- and ℓ1-norm, respectively); the Baseline model uses standard Frobenius norm regularization. We also added the standard Frobenius norm regularizer to the heavy/light-nuclear, -CCL2, and -CCL1 models, because their regularizers mainly control the similarities among submodels and may not be sufficient to prevent overfitting. We divided each dataset into training data and testing data. Each model was trained on the training data, with the regularization parameters and the learning rate selected by 5-fold cross-validation, and was then evaluated on the testing data. The data were split by dividing all days into chunks of 11 consecutive days, in which the first 8 days were used for training and the remaining 3 days for testing. We used the RMSE as the evaluation metric.
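The chunked train/test split and the RMSE metric described above can be sketched as follows. The function names are hypothetical, and the handling of a trailing chunk shorter than 11 days is an assumption (the same first-8-days rule is applied), since the text does not say how such days were treated.

```python
import math


def chunked_split(days, chunk=11, n_train=8):
    """Split an ordered list of days: in every block of `chunk` consecutive days,
    the first `n_train` go to training and the rest to testing. A trailing partial
    block is split by the same rule (an assumption; the text does not say)."""
    train, test = [], []
    for start in range(0, len(days), chunk):
        block = days[start:start + chunk]
        train.extend(block[:n_train])
        test.extend(block[n_train:])
    return train, test


def rmse(pred, target):
    """Root-mean-squared error over paired predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))
```

Interleaving short test windows with training windows spreads the test days across all seasons and years, instead of testing only on the last part of the record.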
We first report the improvement of each method over the baseline method, measured as a percentage relative to the baseline:

$$\frac{(\text{RMSE of compared method} - \text{RMSE of baseline method}) \times 100}{\text{RMSE of baseline method}},$$

so that negative values correspond to a lower RMSE than the baseline. The results are shown in Figures 2 and 3. To facilitate the comparison between methods, for each air pollutant of each dataset we report two figures, one grouping the results by regularization and the other grouping them by model formulation. The results show that (i) the light model formulation had a clear advantage over the heavy and baseline model formulations, which suggests that controlling the number of parameters is important for improving generalization performance, and (ii) the proposed CC regularization yielded better performance than the other regularizations, which indicates that considering the similarities between the models of consecutive hours is helpful. We also report the exact RMSE of each method in Table 2. Finally, we compared the convergence speed of the employed optimization algorithms with that of their standard counterparts. In particular, we compared ASSG with SSG for optimizing the ℓ2,1-norm regularized problem, SECONE-S with SSG for solving the nuclear norm regularized problem, and LA-SADMM with SADMM for solving the CC regularized problem. The results, plotted in Figure 4, demonstrate that the employed advanced optimization techniques converged much faster than the classical techniques.
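The relative-change measure above is a one-line computation; the sketch below (function name hypothetical) makes the sign convention explicit.

```python
def relative_change(rmse_method, rmse_baseline):
    """Percentage change in RMSE relative to the baseline, as defined in the text.

    Negative values mean the compared method has a lower (better) RMSE.
    """
    return (rmse_method - rmse_baseline) * 100.0 / rmse_baseline
```

For example, a method with RMSE 0.9 against a baseline RMSE of 1.0 gives about -10, that is, a 10% reduction in RMSE.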

Conclusions
In this paper, we have developed efficient machine learning methods for air pollutant prediction. We formulated the problem as regularized MTL and employed advanced optimization algorithms to solve the different formulations. We focused on reducing model complexity by limiting the number of model parameters and on improving performance by using a structured regularizer. Our results show that the proposed light formulation achieves much better performance than the other two model formulations and that the regularization enforcing the prediction models of two consecutive hours to be close further improves prediction performance. We have also shown that advanced optimization techniques are important for improving the convergence of optimization and that they speed up the training process for big data. In future work, we will further consider the commonalities between nearby meteorological stations and combine them in an MTL framework, which may further improve the predictions.

Figure 1. Locations of measurement sites. Blue stars denote the two air quality monitoring sites. Red circles denote the two meteorological sites.

• Baseline: the baseline model with standard Frobenius norm regularization.
• Heavy-F: the heavy model with standard Frobenius norm regularization.
• Light-F: the light model with standard Frobenius norm regularization.
• Heavy-ℓ2,1: the heavy model with ℓ2,1-norm regularization.
• Heavy-nuclear: the heavy model with nuclear norm regularization.
• Heavy-CCL2: the heavy model with CC regularization using the ℓ2-norm.
• Heavy-CCL1: the heavy model with CC regularization using the ℓ1-norm.
• Light-ℓ2,1: the light model with ℓ2,1-norm regularization.
• Light-nuclear: the light model with nuclear norm regularization.
• Light-CCL2: the light model with CC regularization using the ℓ2-norm.
• Light-CCL1: the light model with CC regularization using the ℓ1-norm.
Figure 2. Improvement of different methods over the baseline method for the Lewis University-Lemont (LU-LV) dataset (panel grouped by model formulations).

Figure 3. Improvement of different methods over the baseline method for the Lansing Municipal Airport-Alsip Village (LMA-AV) dataset (panel grouped by model formulations).

Table 2. Root-mean-squared error (RMSE) for all approaches and datasets. The best approaches are marked in bold.