Fast Training Set Size Reduction Using Simple Space Partitioning Algorithms

Abstract: The Reduction by Space Partitioning (RSP3) algorithm is a well-known data reduction technique. It summarizes the training data and generates representative prototypes. Its goal is to reduce the computational cost of an instance-based classifier without penalty in accuracy. The algorithm keeps on dividing the initial training data into subsets until all of them become homogeneous, i.e., they contain instances of the same class. To divide a non-homogeneous subset, the algorithm computes its two furthest instances and assigns all instances to their closest furthest instance. This is a very expensive computational task, since all distances among the instances of a non-homogeneous subset must be calculated. Moreover, noise in the training data leads to a large number of small homogeneous subsets, many of which have only one instance. These instances are probably noise, but the algorithm mistakenly generates prototypes for these subsets. This paper proposes simple and fast variations of RSP3 that avoid the computationally costly partitioning tasks and remove the noisy training instances. The experimental study conducted on sixteen datasets and the corresponding statistical tests show that the proposed variations of the algorithm are much faster and achieve higher reduction rates than the conventional RSP3 without negatively affecting the accuracy.


Introduction
Data reduction is a crucial pre-processing task [1] in instance-based classification [2]. Its goal is to reduce the high computational cost involved in such classifiers by reducing the training data as much as possible without penalty in classification accuracy. In effect, Data Reduction Techniques (DRTs) attempt to either select or generate a small set of training prototypes that represent the initial large training set so that the computational cost of the classifier is vastly reduced. The selected or generated set of training prototypes is called a condensing set.
DRTs can be based on either the concept of Prototype Selection (PS) [3] or the concept of Prototype Generation (PG) [4]. A PS algorithm collects representative prototypes from the initial training set, while a PG algorithm summarizes similar training instances and generates a prototype that represents them. PS and PG algorithms are based on the hypothesis that training instances far from the boundaries of the different classes, also called class decision boundaries, can be removed without penalty in classification accuracy. On the other hand, the training instances that are close to class decision boundaries are the only useful training instances in instance-based classification. In this paper, we focus on the PG algorithms.
The RSP3 algorithm [5] is a well-known parameter-free PG algorithm. Its condensing set leads to accurate and fast classifiers. However, the algorithm requires high computational cost to generate its condensing set because it is based on a recursive partitioning process that divides the training set into subsets that contain training instances of only one class, i.e., they are homogeneous. The algorithm keeps dividing each non-homogeneous subset into two new subsets and stops when all created subsets become homogeneous. The center/mean of each subset constitutes a representative prototype that replaces all instances of the subset. In each algorithm step, a subset is divided by finding the pair of its furthest instances. The instances of the initial subset are distributed to the two subsets according to their distances from those furthest instances. The pair of the furthest instances is retrieved by computing all the distances between instances and finding the pair of instances with the maximum distance. The computational cost of this task is high and may become even prohibitive in cases of very large datasets. This weak point constitutes the first motive of the present paper, namely Motive-A.
The quality and the size of the condensing set created by RSP3 depends on the degree of noise in the training data [6]. Suppose that a training instance x that belongs to class A lies in the middle of a data neighborhood with instances that belong to class B. In this case, x constitutes noise. RSP3 splits the neighborhood into multiple subsets, with one of them containing only instance x. The algorithm mistakenly considers x as a prototype and places it in the condensing set. This observation constitutes the second motive of the present work, namely Motive-B.
In this paper, we propose simple RSP3 variations which consider the two motives presented above. More specifically, this paper proposes:
• RSP3 variations that replace the costly task of retrieving the pair of the furthest instances with simpler and faster tasks based on which each subset is divided.
• A mechanism for noise removal. This mechanism considers each subset containing only one instance as noise and does not generate prototypes for that subset. As a result, it improves the reduction rates and the classification accuracy when it is applied on noisy training sets. The proposed mechanism can be incorporated in any of the RSP3 variations (conventional RSP3 included).
The experiments show that the proposed variations are much faster than the original version of RSP3. In most cases, accuracy is retained high, and the variations that incorporate the mechanism for noise removal improve the reduction rates and the classification accuracy, especially on noisy datasets. The experimental results are statistically validated by utilizing the Wilcoxon signed rank test and the Friedman test.
The rest of the paper is organized as follows: The recent research works in the field of PG algorithms are reviewed in Section 2. The original RSP3 algorithm is presented in Section 3. The new RSP3 variations are presented in detail in Section 4. Section 5 presents and discusses the experimental results and the results of the statistical tests. Section 6 concludes the paper and outlines future work.

Related Work
Prototype Generation is a research field that has attracted numerous works over the last decades; nowadays, the field is highly active and challenging due to the explosion of Big Data.
In this direction, Triguero et al. [4] review the PG algorithms introduced prior to 2012 and present a taxonomy and an experimental comparison of them, which shows that the RSP3 algorithm achieves considerably high accuracy. Hence, in this paper, we focus our review on research papers published after 2013.
Giorginis et al. [7] introduce two RSP3 variants that accelerate the original RSP3 algorithm by adopting computational geometry methods. The first variant exploits the concept of convex hull in the procedure of finding the pair of the furthest instances in each subset created by RSP3. More specifically, the variant finds the instances that define the convex hull in each subset. Then, it computes only the distances between the convex hull instances and keeps the pair of instances with the largest distance. The second variant is even faster since it approximates the convex hull by finding the Minimum Bounding Rectangle (MBR). The two variants share the motive of the high computational cost of RSP3 with the algorithms presented in this work. However, the algorithms proposed by the present work avoid complicated computational geometry methods. In effect, the development of the algorithms presented here was motivated by the research conducted in [7].
A fast PG algorithm that is based on k-means clustering [8,9] is Reduction through Homogeneous Clustering (RHC) [10]. Like RSP3, this algorithm is based on the concept of homogeneity. Initially, the algorithm considers the whole training set as a non-homogeneous cluster, and a mean instance is computed for each class present in the cluster. Then, the k-means clustering algorithm uses the aforementioned instances as initial seeds. This procedure is recursively applied for each non-homogeneous discovered cluster until all clusters become homogeneous. The set of means of the homogeneous clusters becomes the final condensing set. The experimental results show that the RHC algorithm is slightly less accurate than RSP3; however, it achieves higher reduction rates. Not only was the RHC algorithm found to be much faster than RSP3, but it was also one of the fastest approaches that took part in that experimental study [10]. A modified version of the RHC algorithm has recently been applied on string data spaces [11,12].
ERHC [13] is a simple variation of RHC. It considers the clusters that contain only one instance as noise and does not generate prototypes for them. In other words, ERHC incorporates an editing mechanism that removes noise from the training data in its data reduction procedure. New RSP3 variations presented in the present paper adopt the same idea.
Gallego et al. [14] present a simple clustering-based algorithm which accelerates the k-NN classification. Firstly, by using c-means clustering, their algorithm discovers clusters in the training set, the number of which is defined by the user as an input parameter. Afterwards, by searching for the nearest neighbors in the nearest cluster, the k-NN classifier performs classification. Furthermore, the authors use Neural Codes (NC), which are feature-based representations extracted by Deep Neural Networks, in order to further improve their algorithm. These NCs group same-class instances so that they are placed within the same cluster. Strictly speaking, this algorithm cannot be considered a PG algorithm. Although the paper refers to the means of clusters as prototypes, the algorithm does not achieve training data reduction. However, the authors empirically compare their algorithm against several DRTs.
The algorithms presented in [15,16] cannot be considered PG algorithms either. Both are based on pre-processing tasks that build a two-level data structure. The first level holds prototypes, while the second one stores the "real" training instances. The classification is performed by accessing either the first or the second level of the data structure. The decision is based on the area where the new unclassified instance lies and on pre-specified criteria. Like the previous paper, the authors compare their algorithms against DRTs.
The algorithm presented in [17] cannot be considered a PG algorithm. However, it is able to perform efficient Nearest Neighbor searches in the context of k-NN classification. As the authors state, the proposed caKD+ algorithm combines clustering, feature selection, a different k parameter in each resulting cluster and multi-attribute indexing in order to perform efficient k-NN searches and classification.
Impedovo et al. [18] introduce a handwriting digit recognition PG algorithm. This algorithm consists of two phases. In the first one, using the Adaptive Resonance Theory [19], the number of prototypes is determined, and the initial set of prototypes is generated. In the second phase, a naive evolution strategy is used to generate the final set of prototypes. The technique is incremental, and, by modifying the previously generated prototypes or by adding new prototypes, it can be adapted to writing style changes.
Rezaei and Nezamabadi-pour [20] present a swarm-based metaheuristic search algorithm inspired by motion and gravity Newtonian laws [21], namely the gravitational search algorithm, which is adapted for prototype generation. The authors include the RSP3 in their experimental study.
Hu and Tan [22] improve the performance of NN classification by presenting two methods for evolutionary computation-based prototype generation. The first one, namely error rank, aims at improving the NN classifier's generalization ability by taking into account the misclassified instances, while the second one avoids over-fitting by pursuing performance on multiple data subsets. The paper shows that, by using the two proposed methods, particle swarm optimization achieves better classification performance. This paper also includes RSP3 in its experimental study.
Elkano et al. [23] present a one-pass Map-Reduce Prototype Generation technique, namely CHI-PG, which exploits the Map-Reduce paradigm and uses fuzzy rules in order to produce prototypes that are exactly the same, regardless of the number of Mappers/Reducers used. The proposed approach improves the execution time of distributed prototype reduction without decreasing reduction rates and classification accuracy; however, its input parameters must be empirically determined.
Escalante et al. [24] present a PG algorithm, namely Prototype Generation via Genetic Programming, which is based on genetic programming. This algorithm generates prototypes that maximize an estimate of the NN classifier's generalization performance by combining training instances through arithmetic operators. Furthermore, the proposed algorithm is able to automatically select the number of prototypes for each class.
Calvo-Zaragoza et al. [25] use dissimilarity space techniques in order to apply PG algorithms to structured representations. The initial structural representation is mapped to a feature-based one, hence allowing the use of statistical PG methods on the original space. In the experimental study, RSP3 and two other PG algorithms are used, while the results show that RSP3 achieves the highest accuracy.
Cruz-Vega and Escalante [26] present a Learning Vector Quantization technique, which is based on granular computing and includes Big Data incremental learning mechanisms. This technique first groups instances with similar features very quickly, using a one-pass clustering task, and then covers the class distribution by generating prototypes. It comprises two stages: in the first, the number of prototypes is controlled using a usage-frequency indicator and the best prototype is kept using a life index, while in the second, the useless dimensions are pruned from the training data.
Escalante et al. [27] present a prototype generation multi-objective evolutionary technique. This technique targets enhancing the reduction rate and accuracy at the same time, and achieving a better trade-off between them, by formulating the prototype generation task as a multi-objective optimization problem. In this technique, the number of prototypes, as well as the generalization performance estimation that the selected prototypes achieve, are the key factors. The authors include RSP3 in their experimental study.
The algorithm proposed by Brijnesh J. Jain and David Schultz in [28] adapts the Learning Vector Quantization (LVQ) PG method to time-series classification. In effect, the paper extends the LVQ approach from Euclidean spaces to Dynamic Time Warping spaces. The work presented by Leonardo A. Silva et al. [29] focuses on the number of prototypes generated by PG algorithms. The work introduces a model that estimates the ideal number of prototypes according to the characteristics of the dataset used. Last but not least, I. Sucholutsky and M. Schonlau in [30] focus on PG methods for datasets with complex geometries.

The Original RSP3 Algorithm
RSP3 is one of the three proposed RSP algorithms [5]. The three algorithms are descendants of the Chen and Jozwik algorithm (CJA) [31]. However, RSP3 is the only parameter-free RSP algorithm (CJA included) and builds the same condensing set regardless of the order of the data in the training set.
RSP3 works as follows: It initially finds the pair of the furthest instances, a and b, in the training set (see Figure 1). Then, it splits the training set into two subsets, C_a and C_b, with the training instances assigned to their closest furthest instance. Then, in each algorithm iteration and by following the aforementioned procedure, a non-homogeneous subset is divided into two subsets. The splitting tasks stop when all created subsets become homogeneous. Then, the algorithm generates prototypes. For each created subset C, RSP3 computes its mean by averaging its training instances. The mean instance, labeled with the class of the instances in C, plays the role of a generated prototype and is placed in the condensing set. The pseudo-code presented in Algorithm 1 is a possible non-recursive implementation of RSP3 that uses a data structure S to store subsets. In the beginning, the whole training set (TS) is a subset to be processed, and it is placed in S (line 2). At each iteration, RSP3 selects a subset C from S and checks whether it is homogeneous or not. If C is homogeneous, the algorithm computes its mean instance and stores it in the condensing set (CS) as a prototype (lines 6-9). Then, C is removed from S (line 17). If C is non-homogeneous, the algorithm finds the furthest instances a and b in C (line 11) and divides C into two subsets C_a and C_b by assigning each instance of C to its closest furthest instance (lines 12 and 14). The new subsets C_a and C_b are added to S (lines 13 and 15), and C is removed from S (line 17). The loop terminates when S becomes empty (line 18), i.e., when all subsets have become homogeneous.
Algorithm 1 The RSP3 algorithm
1: Input: training set TS
2: add(S, TS)
3: CS ← ∅
4: repeat
5:   C ← a subset in S
6:   if C is homogeneous then
7:     r ← mean instance of C
8:     r.label ← class of instances in C
9:     add(CS, r)
10:  else
11:    (a, b) ← furthest instances in C {Algorithm 2 is applied}
12:    C_a ← set of C instances closer to a
13:    add(S, C_a)
14:    C_b ← set of C instances closer to b
15:    add(S, C_b)
16:  end if
17:  remove(S, C)
18: until IsEmpty(S) {all subsets became homogeneous}
19: return CS

In the areas close to the class decision boundaries, training instances from different classes are close to each other. RSP3 creates more prototypes for those data areas, since many small homogeneous subsets are created. Similarly, more subsets are created and more prototypes are generated for noisy data areas. In effect, a subset with only one instance constitutes noise. In contrast, fewer and larger subsets are created for the "internal" data areas, which are far from the decision boundaries and where a single class dominates.
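As an illustration, the partitioning loop described above can be sketched in Python. This is a minimal, unoptimized sketch under our own assumptions: instances are represented as (feature_vector, label) pairs, the helper names are ours, and duplicate points with conflicting labels are assumed absent (the original paper's implementation is in C++).

```python
import math

def dist(x, y):
    # Euclidean distance between two feature vectors.
    return math.dist(x, y)

def furthest_pair(subset):
    # Brute-force furthest-pair search (the costly step of RSP3).
    best, pair = -1.0, None
    for i in range(len(subset)):
        for j in range(i + 1, len(subset)):
            d = dist(subset[i][0], subset[j][0])
            if d > best:
                best, pair = d, (subset[i], subset[j])
    return pair

def rsp3(training_set):
    # training_set: list of (feature_vector, label) pairs.
    stack = [training_set]   # S: subsets awaiting processing
    condensing_set = []      # CS: generated prototypes
    while stack:
        c = stack.pop()
        labels = {label for _, label in c}
        if len(labels) == 1:
            # Homogeneous subset: its mean becomes one prototype.
            n, dim = len(c), len(c[0][0])
            mean = tuple(sum(v[i] for v, _ in c) / n for i in range(dim))
            condensing_set.append((mean, labels.pop()))
        else:
            # Non-homogeneous: split on the two furthest instances.
            (a, _), (b, _) = furthest_pair(c)
            ca = [p for p in c if dist(p[0], a) <= dist(p[0], b)]
            cb = [p for p in c if dist(p[0], a) > dist(p[0], b)]
            stack.extend([ca, cb])
    return condensing_set
```

On a toy two-class set, the loop splits once and emits one prototype per class.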
Sánchez, in his experimental study presented in [5], showed that RSP3 generates a small condensing set. When an instance-based classifier such as k-NN utilizes the RSP3-generated condensing set, it achieves accuracy almost as high as when k-NN runs over the original training set. However, the computational cost of the classification step is significantly lower.
The retrieval of the pair of the furthest instances in each subset requires the computation of all distances between the instances of the subset. This approach is simple and straightforward. However, it is a computationally expensive task that burdens the overall pre-processing cost of the algorithm. In cases of large datasets, this drawback may render the execution of RSP3 prohibitive.
In this respect, the conventional RSP3 algorithm computes |C| × (|C| − 1)/2 distances in order to find the most distant instances in each subset C. Thus, for each subset division, RSP3 proceeds with the pseudo-code outlined in Algorithm 2.

Algorithm 2 The Grid algorithm
1: D_max ← 0
2: for each instance x_i in C do
3:   for each instance x_j in C, with j > i do
4:     d ← distance(x_i, x_j)
5:     if d > D_max then
6:       D_max ← d
7:       f_inst1 ← x_i
8:       f_inst2 ← x_j
9:     end if
10:  end for
11: end for
12: return (f_inst1, f_inst2, D_max)

In effect, a grid of distances is computed; hence, Algorithm 2 is labeled "The Grid Algorithm". It returns the furthest instances f_inst1 and f_inst2 in C along with their distance D_max. Hereinafter, each reference to "RSP3" implies the RSP3 algorithm whereby the most distant instances are found by applying the Grid algorithm to all the instances of each subset. It is worth mentioning that the RSP3 implementation in the KEEL software [32] applies this simple and straightforward approach for finding the pair of the most distant instances in C.
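For concreteness, the brute-force search of Algorithm 2 can be written as follows (an illustrative sketch; the function and variable names are ours, and a distance counter is added to make the quadratic cost visible):

```python
import math
from itertools import combinations

def grid_furthest(c):
    # Compute all |C|*(|C|-1)/2 pairwise distances and keep the maximum.
    d_max, f_inst1, f_inst2, n_dists = -1.0, None, None, 0
    for x, y in combinations(c, 2):
        n_dists += 1
        d = math.dist(x, y)
        if d > d_max:
            d_max, f_inst1, f_inst2 = d, x, y
    return f_inst1, f_inst2, d_max, n_dists
```

On a subset of n instances the search always performs exactly n(n − 1)/2 distance computations, which is the quadratic cost the proposed variants avoid.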

The RSP3 with Editing (RSP3E) Algorithm
The RSP3 with editing (RSP3E) algorithm incorporates an editing mechanism that removes noise from the training data. RSP3E is almost identical to the conventional RSP3, with one major difference: if a subset with only one instance is created, this subset is considered to be noise. In effect, such an instance is surrounded by instances that belong to different classes. The algorithm does not proceed with the prototype generation for this subset. Therefore, for each subset containing only one instance, RSP3E does not place a prototype in the condensing set. RSP3E addresses Motive-B (defined in Section 1). RSP3E was inspired by the idea first adopted by ERHC [13] and EHC [33].
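The editing check itself amounts to a one-line condition at the prototype generation step. A hedged sketch, using our own naming and the (feature_vector, label) representation assumed earlier:

```python
def generate_prototype(subset, condensing_set):
    # RSP3E editing mechanism: a homogeneous subset with a single
    # instance is treated as noise and contributes no prototype.
    if len(subset) <= 1:
        return
    n, dim = len(subset), len(subset[0][0])
    mean = tuple(sum(v[i] for v, _ in subset) / n for i in range(dim))
    # The subset is homogeneous, so any member carries the class label.
    condensing_set.append((mean, subset[0][1]))
```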

The RSP3-RND and RSP3E-RND Algorithms
As already explained, RSP3 finds the pair of the furthest instances in each subset in order to divide it. The most distant instances in a subset are likely to belong to different classes. By splitting the subset using such instances, the probability of creating two large homogeneous subsets is higher. Thus, RSP3 may need fewer iterations in order to divide the whole training set into homogeneous subsets, and the reduction rates may be higher.
The RSP3-RND and RSP3E-RND algorithms were inspired by the following observation: RSP3 can run and produce a condensing set even if it selects any pair of instances instead of the pair of the furthest instances. In that case, the algorithm will likely need more subset divisions and the data reduction rate will be lower. However, the procedure of subset division will be much faster, since the costly retrieval of the furthest instances will be avoided. This simple idea is adopted by RSP3-RND and RSP3E-RND that work similarly to RSP3 and RSP3E, respectively, but they randomly select the pair of instances used for subset division.
RSP3-RND and RSP3E-RND will generate different condensing sets in different executions. In other words, the number of divisions and the generated prototypes depend on the selection of the random pairs of instances. RSP3-RND addresses Motive-A, while RSP3E-RND addresses both motives.
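Sketched in Python, the only change with respect to RSP3 is the pivot-selection step (illustrative code with our own naming, not the authors' implementation):

```python
import random

def random_pair(subset):
    # RSP3-RND pivot selection: any two distinct instances, O(1) work
    # instead of the O(|C|^2) distance grid of the original RSP3.
    return random.sample(subset, 2)
```

Seeding the generator (random.seed) makes a particular run reproducible, although, as noted above, different seeds generally yield different condensing sets.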

The RSP3-M and RSP3E-M Algorithms
The RSP3-M and RSP3E-M algorithms are two more simple variations of RSP3 and RSP3E, respectively. Both work as follows: Initially, the algorithms find the two classes with the most instances in the subset. These classes are called the common classes. The mean instances of the common classes constitute the pair of instances based on which a non-homogeneous subset is divided into two subsets (see Figure 2). Obviously, similar to RSP3-RND and RSP3E-RND, RSP3-M addresses Motive-A, while RSP3E-M addresses both motives. In effect, RSP3-M and RSP3E-M speed up the original algorithm because they replace the computation of the furthest instances of a set with the computation of the two common classes of the set and the corresponding mean instances, which is a much faster approach. The idea behind RSP3-M and RSP3E-M is quite simple: By dividing a non-homogeneous set into two subsets based on the means of its most common classes, the algorithms are more likely to obtain large homogeneous subsets earlier. We expect that both RSP3-M and RSP3E-M will maximize the reduction rates. However, this may negatively affect accuracy.
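A sketch of the pivot computation described above (the function name and the (feature_vector, label) layout are our own assumptions):

```python
from collections import Counter

def common_class_means(subset):
    # RSP3-M pivots: the mean vectors of the two most common classes
    # ("common classes") in a non-homogeneous subset.
    counts = Counter(label for _, label in subset)
    pivots = []
    for cls, _ in counts.most_common(2):
        members = [v for v, l in subset if l == cls]
        dim = len(members[0])
        pivots.append(tuple(sum(v[i] for v in members) / len(members)
                            for i in range(dim)))
    return tuple(pivots)
```

Counting class frequencies and averaging two groups of vectors is a linear pass over the subset, far cheaper than the quadratic furthest-pair search.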

The RSP3-M2 and RSP3E-M2 Algorithms
The RSP3-M2 and RSP3E-M2 algorithms are almost identical to RSP3-M and RSP3E-M, respectively. The only difference is that, instead of using the generated mean instances of the most common classes in order to divide a non-homogeneous subset, RSP3-M2 and RSP3E-M2 identify and use the training instances that are closest to those mean instances (see Figure 3). We expect that this may reduce the reduction rates and, as a result, the accuracy achieved by RSP3-M2 and RSP3E-M2 will be higher compared to that of RSP3-M and RSP3E-M.
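The corresponding pivot computation can be sketched as follows (again, our own naming and data layout; not the authors' code):

```python
import math
from collections import Counter

def common_class_pivots(subset):
    # RSP3-M2 pivots: the actual training instances lying closest to the
    # means of the two most common classes of the subset.
    counts = Counter(label for _, label in subset)
    pivots = []
    for cls, _ in counts.most_common(2):
        members = [v for v, l in subset if l == cls]
        dim = len(members[0])
        mean = tuple(sum(v[i] for v in members) / len(members)
                     for i in range(dim))
        pivots.append(min(members, key=lambda v: math.dist(v, mean)))
    return tuple(pivots)
```

The extra cost over RSP3-M is one more linear pass per common class to locate the member nearest to its mean.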

Experimental Setup
The original RSP3 algorithm and its proposed variations were coded in C++. Moreover, we include the results of the NOP approach (no data reduction) for comparison purposes. The experiments were conducted on a Debian GNU/Linux server equipped with a 12-core CPU and 64 GB of RAM. The experimental results were measured by running the k-NN classifier (with k = 1) over the original training set (case of NOP classifier) and the condensing sets generated by the conventional RSP3 algorithm and its proposed variations. The k parameter value is the only parameter used in the experimental study. Following the common practice in the field of data reduction for instance-based classifiers, we used the setting k = 1.
We used 16 datasets distributed by the KEEL [34] and UCI machine learning [35] repositories, whose main characteristics are summarized in Table 1. Each dataset's attribute values were normalized to the range [0, 1], and we used the Euclidean distance as a similarity measure. We removed all nominal and fixed-value attributes and the duplicate instances from the KDD dataset, thus reducing its size to 141,481 instances.
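The [0, 1] normalization applied to the attribute values is plain per-attribute min-max scaling; a minimal sketch of this pre-processing step (the helper name is our own):

```python
def min_max_normalize(rows):
    # Scale each attribute independently to [0, 1]; a constant attribute
    # (max == min) is mapped to 0.0 to avoid division by zero.
    dim = len(rows[0])
    lo = [min(r[i] for r in rows) for i in range(dim)]
    hi = [max(r[i] for r in rows) for i in range(dim)]
    return [tuple((r[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
                  for i in range(dim))
            for r in rows]
```

Normalizing first keeps every attribute on an equal footing in the Euclidean distance.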
As mentioned above, the major goal of the proposed variants of RSP3 is to minimize the computational cost needed for the condensing set construction. High reduction rates as well as keeping the accuracy at high levels are also goals. Thus, for each algorithm and dataset, we used a five-fold cross-validation schema to measure the following four metrics: (i) Accuracy (ACC), (ii) Reduction Rate (RR), (iii) Distance Computations (DC) required for the condensing set construction (in millions (M)), and (iv) CPU time (CPU) in seconds required for the condensing set construction. Table 2 presents, for each dataset and algorithm, the ACC, RR, DC and CPU measurements. Table 3 summarizes the measurements of Table 2 and presents the average measurements as well as the standard deviation and the coefficient of variation of the measurements.

Experimental Results
Figures 4-7 present an overview of the average measurements in bar diagrams. More specifically, Figure 4 depicts the average accuracy measurements, computed by averaging the ACC measurements achieved by the 1-NN classifier using the condensing sets generated by the algorithms. Correspondingly, Figure 5 presents the average RR measurements achieved by the algorithms on the different datasets. Figure 6 illustrates the average distance computations, and Figure 7 shows the average CPU times. The diagrams presented in Figures 4-6 are in linear scale, while the diagram presented in Figure 7 is in logarithmic scale.
The results reveal that all algorithms are relatively close in terms of accuracy. However, RSP3, RSP3E, RSP3-RND, RSP3E-RND and RSP3E-M2 achieve the highest ACC measurements, whereas the high reduction rates achieved by RSP3-M, RSP3-M2 and RSP3E-M seem to negatively affect accuracy. In almost all cases, RSP3E achieves the highest accuracy, with RSP3E-RND and RSP3-M2 following. The results indicate that the editing mechanism incorporated by these algorithms is effective.
Concerning RR measurements, we observe in Table 2 that RSP3E-M has the highest performance. However, as mentioned above, these high reduction rates negatively affect accuracy. Furthermore, we observe that the algorithms that incorporate the editing mechanism seem to be more effective in terms of RR measurements. In particular, by removing the useless noisy instances from the data, they achieve higher RR measurements than the algorithms that do not incorporate editing and, at the same time, their accuracy is either improved or is not negatively affected.
Moreover, we can observe that the proposed RSP3 variations outperform the original RSP3, in terms of RR, DC and CPU measurements, which concern the pre-processing cost required for the condensing set construction. This happens because RSP3 computes a large number of distances. In contrast, the proposed variations divide the subsets by avoiding computationally costly procedures. As far as the large datasets are concerned (i.e., KDD, SH, LIR, MGT), the gains are extremely high. In contrast, RSP3 leads to noticeably high CPU costs. Figures 6 and 7 visualize this extreme superiority in terms of pre-processing computational cost.
The experimental results reveal that RSP3-M and RSP3E-M are faster than RSP3-M2 and RSP3E-M2, respectively. In addition, RSP3-M2 and RSP3E-M2 are faster than RSP3-RND and RSP3E-RND, and the latter are faster than the original RSP3 algorithm and the proposed RSP3E variant.
Tables 2 and 3 and Figures 4-7 also show that the variations with the editing mechanism that removes the subsets containing only one instance (i.e., RSP3E, RSP3E-RND, RSP3E-M and RSP3E-M2) achieve considerably higher RR measurements than the corresponding methods without the editing mechanism and, at the same time, in most cases, higher accuracy.
Finally, the experimental results show that the RR measurements achieved by RSP3E-M are the highest. In contrast, as expected, RSP3-RND is the algorithm with the lowest reduction rates.

Wilcoxon Signed Rank Test Results
Following the common approach that is applied in the field of PS and PG algorithms [3,4,10,14,24,25,27], the experimental study is complemented with a Wilcoxon signed rank test [36]. Thus, we statistically confirm the validity of the measurements presented in Table 2. The Wilcoxon signed rank test compares all the algorithms in pairs, considering the result achieved against each dataset. We applied the Wilcoxon signed rank test using the PSPP statistical software.
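For reference, the same paired test can be reproduced with SciPy (assuming SciPy is available; the accuracy vectors below are illustrative placeholders, not the paper's measurements):

```python
from scipy.stats import wilcoxon

# Paired per-dataset accuracies of two compared algorithms
# (illustrative values only).
acc_alg1 = [0.91, 0.88, 0.76, 0.83, 0.95, 0.70, 0.81, 0.89]
acc_alg2 = [0.93, 0.90, 0.78, 0.86, 0.96, 0.74, 0.84, 0.90]

stat, p = wilcoxon(acc_alg1, acc_alg2)
# The difference is deemed statistically significant when p < 0.05.
```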
As mentioned above, it is clear that RSP3-M and RSP3E-M compute fewer distances than RSP3-M2 and RSP3E-M2, respectively. Furthermore, RSP3-M2 and RSP3E-M2 compute fewer distances than RSP3-RND and RSP3E-RND, and the latter compute fewer distances than RSP3 and RSP3E. Thus, we do not run the Wilcoxon test for the DC measurements. Table 4 presents the results of the Wilcoxon signed rank test obtained for the ACC, RR and CPU measurements. The column labeled "w/l/t" lists the number of wins, losses and ties for each comparison test. The column labeled "Wilcoxon" (last column) lists a value that quantifies the significance of the difference between the two algorithms compared. When this value is lower than 0.05, one can claim that the difference is statistically significant.
In terms of accuracy, the results show that the statistical difference between the following pairs is not significant: NOP versus RSP3E, NOP versus RSP3E-RND and NOP versus RSP3E-M2. In contrast, the statistical difference between the conventional RSP3 algorithm and NOP is significant. Thus, we can claim that the 1-NN classifier that runs over the condensing set generated by the proposed RSP3E, RSP3E-RND and RSP3E-M2 algorithms achieves as high accuracy as the 1-NN classifier that runs over the original training set. Moreover, the test shows that there is no significant difference in terms of accuracy between the original version of RSP3 and the following proposed variants: RSP3E, RSP3-RND, and RSP3E-RND.
In contrast, there are statistically significant differences in terms of reduction rates and CPU times. This means that we can obtain accuracy as high as that of the original RSP3 algorithm, while the cost of the condensing set construction is lower. Moreover, the test confirms that there is a statistically significant difference in terms of accuracy between the pairs RSP3-M versus RSP3-M2 and RSP3E-M versus RSP3E-M2. Although there are also significant differences in the RR and CPU measurements, RSP3-M2 and RSP3E-M2 can be considered better. Last but not least, the test shows that RSP3E, RSP3E-RND, RSP3E-M and RSP3E-M2 dominate RSP3, RSP3-RND, RSP3-M and RSP3-M2, respectively, in terms of reduction rates, while the accuracy and the CPU times are not negatively affected.

Friedman Test Results
The non-parametric Friedman test was used in order to rank the algorithms. The test ranks the algorithms for each dataset separately: the best performing algorithm is ranked number 1, the second best is ranked number 2, and so on. We ran the Friedman test through the PSPP statistical software, three times in total, once for each measured criterion. Table 5 presents the resulting rankings.
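A minimal SciPy illustration of the ranking test (assuming SciPy is available; the values are placeholders, not the paper's measurements):

```python
from scipy.stats import friedmanchisquare

# Per-dataset accuracies of three algorithms (illustrative values).
alg1 = [0.91, 0.88, 0.76, 0.83, 0.95, 0.70]
alg2 = [0.93, 0.90, 0.78, 0.86, 0.96, 0.74]
alg3 = [0.85, 0.82, 0.71, 0.80, 0.90, 0.65]

stat, p = friedmanchisquare(alg1, alg2, alg3)
# p < 0.05 indicates that the per-dataset ranking differences
# among the algorithms are statistically significant.
```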

Conclusions
This paper proposed three RSP3 variations that aim at reducing the computational cost incurred by the original RSP3 algorithm. All the proposed variations replace the costly task of finding the pair of the furthest instances in a subset with a faster procedure. The first one (RSP3-RND) selects two random instances. The second one (RSP3-M) computes and uses the means of the two most common classes in a subset. The last variation (RSP3-M2) uses the instances that are closest to the means of the two most common classes in a subset.
Moreover, the present paper proposed an editing mechanism for noise removal. This mechanism does not generate a prototype for a homogeneous subset that contains only one training instance. In effect, such an instance is considered noise and is removed. The editing mechanism can be incorporated into any RSP3 variant (original RSP3 included). Therefore, in this paper, we developed and tested seven new versions of the original RSP3 PG algorithm (i.e., RSP3E, RSP3-RND, RSP3E-RND, RSP3-M, RSP3E-M, RSP3-M2, RSP3E-M2).
The experimental study as well as the Wilcoxon and Friedman tests revealed that the editing mechanism is quite effective, since it removes a high number of irrelevant training instances that do not contribute to classification accuracy. Thus, the reduction rates are improved either with gains or, at least, without loss in accuracy. In addition, the results showed that RSP3-M2 is more effective than RSP3-M. Although the RSP3-RND variation is simple, it is quite accurate. This happens because the RR achieved by RSP3-RND is not very high.
In our future work, we plan to develop data reduction techniques for complex data, such as multi-label data, data in non-metric spaces and data streams.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://archive.ics.uci.edu/ml/ and https://sci2s.ugr.es/keel/datasets.php (accessed on 2 October 2022).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: