Query-By-Committee Framework Used for Semi-Automatic Sleep Stages Classification

Active learning is very useful for classification problems where it is hard or time-consuming to acquire class labels for training a classifier. The classification of over-night polysomnography (PSG) records into sleep stages is an example of such an application, because an expert has to annotate a large number of segments of each record. Active learning methods enable us to iteratively select only the most informative instances for manual classification, so the total expert effort is reduced. However, because of the large dimensionality of PSG data, the process may be insufficiently initialised, which puts the fast convergence of active learning at risk. To prevent this, we propose a variant of the query-by-committee active learning scenario which takes all features of the data into account, so it is not necessary to reduce the feature space, yet the process is quickly initialised. The proposed method is compared to random sampling and to margin uncertainty sampling, another well-known active learning method. We show that, during the crucial first iterations of the process, the proposed variant of query-by-committee achieved the best results among the compared strategies in most cases.


Introduction
Although a large number of machine learning techniques exist and can be adopted for numerous applications, in more and more real-world settings we encounter the problem that it is possible to gather a large amount of data, but the process of annotating it (i.e., assigning each instance to a specific class so that it can be used for training a classifier) is expensive and time-consuming.
The classification of over-night polysomnography (PSG) records into sleep stages is a good example of the mentioned problem. In practice, a doctor or a trained annotator has to go through the whole several-hours-long PSG record, previously split into 30-second segments, and manually classify every segment into one of the sleep stages [1]. Nowadays, the resolution of sleep stages provided by the American Academy of Sleep Medicine (AASM) is used: sleep is divided into five stages: wake, REM (rapid eye movement) sleep, N1, N2 and N3, where N1, N2 and N3 are sub-stages of non-REM (non-rapid eye movement) sleep [2]. It is clear that the whole process is very time-demanding and it would be desirable to make it more automatic; on the other hand, the information about sleep stages is used for the patient's diagnosis, so the review of an expert is crucial.
A possible solution is the adoption of semi-supervised methods such as active learning, which is used for choosing the instances that are sufficient for learning an adequately good classifier [3].

Active Learning
Let X be the observation space and Y the space of classes. At the beginning of a semi-supervised method, there are two sets of instances: a small set of labeled instances S_L, which contains observations assigned to some class (i.e., their class is known), S_L = {(x_1, y_1), ..., (x_l, y_l)}, x_i ∈ X, y_i ∈ Y, and a large set of unlabeled instances S_U = {x_1, ..., x_u}, x ∈ X, about whose classes we have no information. The active learning process can be divided into a few steps [3]:
1. Learn a classifier c on the set of labeled instances S_L.
2. Assign instances from the set of unlabeled instances S_U to some class by using the learnt classifier c.
3. Use a query strategy in order to select an instance from set S_U.
4. Ask an "oracle" for the class which the selected instance belongs to. By "oracle", an expert is usually meant: a human annotator with expertise in the given field.
5. Add the newly classified instance to set S_L (and remove it from set S_U).
6. Repeat steps 1-5 until a terminal condition is met (e.g., a given number of iterations is reached, the error falls below a specified threshold, etc.).
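The loop above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the logistic-regression learner and the `query_strategy` and `oracle` callables are our own assumed interfaces.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def active_learning_loop(X_labeled, y_labeled, X_unlabeled,
                         query_strategy, oracle, n_iterations=50):
    """Generic pool-based active learning (steps 1-6 above)."""
    X_L, y_L = X_labeled.copy(), y_labeled.copy()
    X_U = X_unlabeled.copy()
    for _ in range(n_iterations):
        clf = LogisticRegression(max_iter=1000).fit(X_L, y_L)  # step 1
        idx = query_strategy(clf, X_U)                         # steps 2-3
        label = oracle(X_U[idx])                               # step 4
        X_L = np.vstack([X_L, X_U[idx:idx + 1]])               # step 5
        y_L = np.append(y_L, label)
        X_U = np.delete(X_U, idx, axis=0)
    return LogisticRegression(max_iter=1000).fit(X_L, y_L)     # step 6 done
```

The `query_strategy` argument is where the strategies discussed in the next section plug in.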

Query Strategies
In this section, we discuss the third step of the active learning process outlined above. The most crucial part of active learning is determining how instances will be selected for classification by the "oracle". Settles et al. [3] introduced a large number of methods which are commonly used. We will mention two of them, which are in our opinion the most popular ones: margin uncertainty sampling (MUS) [4] and query-by-committee (QBC) [5]. These two methods are often compared in the literature [6,7].
In the margin uncertainty sampling scenario, the instance whose class the classifier c is least certain of is queried [8]. Formally, the observation x^* is chosen for which:

x^* = \arg\min_{x \in S_U} \left[ P_{Y|X}(\bar{y}_1 \mid x) - P_{Y|X}(\bar{y}_2 \mid x) \right],

where P_{Y|X} is the conditional probability of a class given the observed instance, \bar{y}_1 is the most probable class of x and \bar{y}_2 is the second most probable class of x.
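This selection rule can be sketched as follows, assuming a scikit-learn-style classifier exposing `predict_proba`:

```python
import numpy as np


def margin_uncertainty_query(clf, X_unlabeled):
    """Return the index of the instance with the smallest margin between
    its two most probable classes (a small margin = low certainty)."""
    proba = clf.predict_proba(X_unlabeled)   # P(y | x) for every class
    proba_sorted = np.sort(proba, axis=1)    # ascending within each row
    margins = proba_sorted[:, -1] - proba_sorted[:, -2]
    return int(np.argmin(margins))
```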
The second approach consists in the utilisation of an ensemble of classifiers which represent competing hypotheses [9]. All models are learnt on the set of labeled instances S_L, and the instance which the classifiers disagree about the most is selected for labeling. To measure the level of disagreement we will use the vote entropy [10]: the instance x^* is queried for which:

x^* = \arg\max_{x \in S_U} \left( -\sum_{i=1}^{n} \frac{v(y_i)}{p} \log \frac{v(y_i)}{p} \right),

where n is the number of classes, p is the number of models in the committee and v(y_i) is the number of classifiers that decided that instance x belongs to class y_i.

Finally, let us mention the third query strategy we will use to compare the performance of the described strategies: random sampling (RS). As the name suggests, in each iteration of the algorithm a randomly selected instance is queried.
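The vote-entropy criterion can be sketched as follows; the committee is assumed to be any list of classifiers with a `predict` method:

```python
import numpy as np


def vote_entropy_query(committee, X_unlabeled, n_classes):
    """Query-by-committee: return the index of the instance with the
    highest vote entropy (the most disagreement in the committee)."""
    p = len(committee)
    # votes[j, k] = class predicted for instance k by classifier j
    votes = np.stack([clf.predict(X_unlabeled) for clf in committee])
    entropies = np.zeros(X_unlabeled.shape[0])
    for yi in range(n_classes):
        frac = (votes == yi).sum(axis=0) / p   # v(y_i) / p per instance
        nonzero = frac > 0                     # 0 * log 0 is taken as 0
        entropies[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    return int(np.argmax(entropies))
```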

Advantages and Disadvantages of Active Learning
In this section we would like to discuss a few pros and cons which can be encountered when active learning is utilised.
• Advantages of Active Learning
  - Saving of time and money: there is no need to annotate a large amount of data; it is sufficient to label only the most informative instances.
  - Online adaptation of the classifier: the classifier is automatically retrained when new unseen instances become available.
• Disadvantages of Active Learning
  - Application-dependent selection of the query strategy: the query strategy has to be chosen wisely according, e.g., to the chosen classifier (margin uncertainty sampling is suitable when the classifier computes posterior probabilities [3]), to some specific relationship among data instances in the observation space (then density-weighted methods are useful [11]), etc.
  - Sensitivity to the initialisation: when the process is not properly initialised, the performance of the chosen classifier is insufficient during the first several iterations (the so-called "cold start problem" [12]), which can result in slower convergence of the learning process.

Dataset
In our work, a dataset consisting of 36 full-night PSG recordings was used. 18 healthy individuals and 18 insomniac patients were examined using the standard 10-20 EEG montage [13] at the National Institute of Mental Health, Czech Republic. Although EOG and EMG were also recorded, only EEG signals were used in this study due to the varying quality of the EOG and EMG recordings. The detailed specification of the measured group of patients is provided in Table 1. All records were split into 30-second segments without overlaps. 21 features were extracted from each of the used EEG derivations (namely Fp1, Fp2, F3, F4, C3, C4, P3, P4, F7, F8, T3, T4, T5, T6, Fz, Cz, Pz, O1 and O2), i.e., the total number of features was 21 × 19 = 399. The list of all computed features is shown in Table 2. The continuous wavelet transformation (CWT) was used to obtain the frequency spectrum, which was utilised for computing features 7-21. The features listed in Table 2 were aggregated by taking their median value over all EEG derivations. As a result, every 30-second segment was described by 21 features.
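The aggregation step can be sketched as follows; the array layout (segments × derivations × features) is our assumption, not the authors' stated implementation:

```python
import numpy as np


def aggregate_over_derivations(features):
    """Collapse a (n_segments, n_derivations, n_features) array of
    per-derivation feature values to one median per feature, reducing
    21 x 19 = 399 values per segment to 21."""
    return np.median(features, axis=1)
```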

Proposed Method
As we mentioned in Section 1, active learning often suffers from the cold start problem. Note that at the beginning of the active learning process, the initial set of labeled instances contains only a few instances described by relatively many features, which can lead to overfitting. In our previous work [14] we proposed a method which enlarges the initial set of labeled instances by means of a 1-nearest-neighbour classifier without any additional information about the classes of the selected instances.
It is also possible to tackle this problem by reducing the feature space. This can be done, e.g., by calculating the mutual information between features and labels [15], i.e., by detecting the features which describe the instances' classes best. Values of the mutual information between individual features and labels for all datasets are shown in Figure 1. At first sight it is clear that skewness, kurtosis and spectral entropy do not describe the classes of instances acceptably (the values of the mutual information for these features approach zero for all datasets). Furthermore, there is no obvious pattern showing that some features give a better account of the labels than others, so a selection of fewer features cannot be made. Our idea was instead to utilise a property of the query-by-committee framework. We created an ensemble of simple linear classifiers, each learnt on only one feature. The instance about whose class the classifiers disagreed the most was queried, classified, and moved to set S_L. If there are several instances with the same level of disagreement, one of them is selected at random and queried. With this method, we expect the error on testing data to be smaller (i.e., the classifier to be better adapted to the data) in the first crucial iterations of the algorithm when the proposed version of query-by-committee is adopted.
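The proposed query rule can be sketched as follows. This is a minimal illustration under our assumptions: logistic regression stands in for the simple per-feature linear classifier, and disagreement is measured with the vote entropy described earlier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def one_feature_committee_query(X_L, y_L, X_U, rng=None):
    """Proposed QBC variant: one linear classifier per feature; query the
    instance with the highest vote entropy, ties broken at random."""
    if rng is None:
        rng = np.random.default_rng()
    n_features = X_L.shape[1]
    classes = np.unique(y_L)
    votes = np.empty((n_features, X_U.shape[0]), dtype=int)
    for j in range(n_features):
        # each committee member sees only feature j
        clf = LogisticRegression(max_iter=1000).fit(X_L[:, [j]], y_L)
        votes[j] = clf.predict(X_U[:, [j]])
    entropies = np.zeros(X_U.shape[0])
    for yi in classes:
        frac = (votes == yi).sum(axis=0) / n_features  # v(y_i) / p
        nonzero = frac > 0
        entropies[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    best = np.flatnonzero(entropies == entropies.max())
    return int(rng.choice(best))                       # random tie-break
```

Because every committee member uses a single feature, no feature has to be discarded, yet each individual model is too simple to overfit the few initial labels.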

Experiments and Results
We split each dataset into a training and a testing subset; the training sets always contain 60% of all instances of a dataset. Training data were divided into the set of unlabeled data and the set of labeled data in such a way that five instances of each class were randomly chosen and added to set S_L; the set of unlabeled instances was created from the rest of the training instances. A linear classifier was chosen for learning on training data and consequently for the estimation of the test error E on testing data, which is defined as:

E = \frac{1}{n} \sum_{i=1}^{n} e_i,

where n is the number of classes and e_i is the percentage of incorrectly classified testing instances of class i. The whole process followed the previously mentioned steps of active learning (see Section 2). Note that E is computed in each iteration.
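The error measure can be computed as in the following sketch, reconstructed from the definition above (an average of per-class error percentages, so each sleep stage weighs equally regardless of how many segments it has):

```python
import numpy as np


def balanced_test_error(y_true, y_pred):
    """Test error E: the mean over classes of the per-class
    misclassification percentage e_i."""
    per_class_errors = []
    for c in np.unique(y_true):
        mask = y_true == c
        per_class_errors.append(100.0 * np.mean(y_pred[mask] != c))
    return float(np.mean(per_class_errors))
```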
In order to get more reliable results, the whole process was repeated ten times (each time with different initialisation) and the estimations of E for each iteration were averaged.
We decided to compare three query strategies: random sampling, margin uncertainty sampling and our version of query-by-committee. In Tables 3-5, the mean values of E acquired in the 5th, 10th and 50th iterations are shown.

Table 3. Values of the average of E obtained in the 5th iteration of the algorithm for all strategies. The smallest value of E among query strategies is in bold.

Table 5. Values of the average of E obtained in the 50th iteration of the algorithm for all strategies. The smallest value of E among query strategies is in bold.

Let us summarise the achieved results. Except on Dataset 7, both active learning strategies reached a smaller test error than random sampling in the 5th iteration. Furthermore, the query-by-committee framework outperformed margin uncertainty sampling in 30 cases. In the 10th iteration, random sampling acquired the smallest test error only on Dataset 18, while the query-by-committee scenario reached the best results in 18 cases. Finally, in the 50th iteration, this framework beat the other strategies in 9 cases.

Let us show examples of typical results in Figure 2, where the averaged test error during the first 100 iterations is plotted. In both cases, the error of query-by-committee achieves smaller values than the other strategies in the first several iterations; then the results turn in favour of margin uncertainty sampling.

Conclusions and Discussion
In this paper, we adopted a query-by-committee framework which consists in training an ensemble of basic linear classifiers (each classifier learnt on one feature) on the set of labeled data. The instance whose class the classifiers disagree about the most is then chosen, annotated and added to the set of labeled instances.
The acquired results showed that the test error in the first several iterations is indeed smaller when query-by-committee is used in comparison with margin uncertainty sampling, although margin uncertainty sampling converges faster in the following iterations in most cases. The hypothesis that the proposed variant of the query-by-committee framework helps the classifier adapt faster to high-dimensional data was thus confirmed. This leads to the conclusion that the proposed variant of the query-by-committee scenario helps prevent the cold start problem. Note that random sampling almost always achieved the worst results, which validates the usage of active learning strategies.
The contribution of margin uncertainty sampling is invaluable, but the utilisation of this method is often limited, because margin uncertainty sampling requires a suitable classifier (as mentioned above, only classifiers which estimate posterior probabilities can be used). On the other hand, query-by-committee is more robust, as was shown, e.g., in [6]. Furthermore, our proposed method handles both the selection of the most informative instance and the treatment of high-dimensional data.
This raises the question of combining both tested query strategies: the variant of query-by-committee at the beginning and margin uncertainty sampling in the following iterations. This will be tested in future work, as well as the application of the proposed method to different data.