# Semantics of Voids within Data: Ignorance-Aware Machine Learning


## Abstract


## 1. Introduction

## 2. Ignorance, AI, and the Open World Assumption

## 3. Ignorance Discovery and Visualization

#### 3.1. Ignorance Driven by Gabriel Neighbors

#### 3.2. Shape-Aware Ignorance

- The term “cluster” names each manifold in the data space formed around a group of data exemplars labeled with the same class (color).
- All clusters are potential parents, and the discovered ignorance points are their children.
- Every parent has as many “chromosomes” as it has data exemplars.
- Each pair of chromosomes from different parents may produce a child (an exemplar of ignorance), and the child is located exactly in the middle between the parent chromosomes.
- Each chromosome can be used only k times for making children (then it is “retired”), and no more than once with the chromosomes from the same partner.
- The closer two parent chromosomes are located to each other, the sooner they produce a child (if the distance between two pairs of chromosomes is the same, the advantage is given to the “fresher” parents, i.e., those more relaxed after a previous birth).
- Additional (optional) rule: if a newly born “child” (ignorance point) appears in the area of a certain cluster of data (one of the parents’ clusters or some other one), then transform (“recolor”) the ignorance point into a data point (“chromosome”) of that cluster, which then has the full right to make its own children like a regular data exemplar.
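The rules above can be sketched in code. This is a simplified illustration, not the authors' implementation: the function name `ignorance_points`, the closest-pairs-first processing order, the "one child per chromosome pair" reading of the partner rule, and the omission of the "freshness" tie-breaking and the optional recoloring rule are all our assumptions.

```python
import itertools
import math

def ignorance_points(clusters, k=3):
    """Generate 'ignorance points' as midpoints between exemplars
    ('chromosomes') of different clusters ('parents').

    clusters: dict mapping a class label to a list of (x, y) exemplars.
    k: maximum number of children each chromosome may produce.
    """
    usage = {}          # chromosome -> number of children produced so far
    partnered = set()   # chromosome pairs that already produced a child
    # Collect all cross-cluster pairs with their distances.
    pairs = []
    for la, lb in itertools.combinations(list(clusters), 2):
        for a in clusters[la]:
            for b in clusters[lb]:
                pairs.append((math.dist(a, b), a, b))
    pairs.sort(key=lambda t: t[0])  # closer parents produce children sooner
    children = []
    for _, a, b in pairs:
        if usage.get(a, 0) >= k or usage.get(b, 0) >= k:
            continue  # a 'retired' chromosome makes no more children
        if (a, b) in partnered:
            continue  # no more than one child per pair of chromosomes
        partnered.add((a, b))
        usage[a] = usage.get(a, 0) + 1
        usage[b] = usage.get(b, 0) + 1
        # The child lies exactly in the middle of its parents.
        children.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
    return children

clusters = {"red": [(0.0, 0.0), (1.0, 0.0)], "blue": [(0.0, 2.0)]}
print(ignorance_points(clusters, k=1))  # [(0.0, 1.0)]
```

With `k=1`, the single blue chromosome retires after producing one child with its closest red partner, so only one midpoint is emitted.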

1. First, the manifold distance $\mathfrak{D}\left(\mathbb{A},\mathbb{B}\right)$ is computed (Figure 6a).
2. The valid pairs of “parents” (${A}_{i},{B}_{j}$) from the manifolds’ boundaries are nominated as follows: $\forall {A}_{i},{B}_{j}\left\{{A}_{i}\in \mathbb{A};{B}_{j}\in \mathbb{B};d\left({A}_{i},{B}_{j}\right)=\mathfrak{D}\left(\mathbb{A},\mathbb{B}\right)\right\}$, where $d$ is the distance between points, e.g., the Euclidean distance (see Figure 6b).
3. For each pair, the “child” (ignorance boundary point) is created, located exactly in the middle between the parent points (Figure 6b–e).
4. The points of the discovered ignorance boundary are connected to form an ignorance zone, as shown in Figure 6f.
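Steps (1)–(3), read literally, can be sketched as follows. The helper name `ignorance_boundary` and the tolerance-based nomination of minimal-distance pairs are our assumptions for illustration:

```python
import math

def ignorance_boundary(A, B, tol=1e-9):
    """Steps (1)-(3): manifold distance, valid parent pairs,
    and midpoint 'children' forming the ignorance boundary.
    A, B: lists of boundary points (x, y) of the two manifolds."""
    # (1) Manifold distance = smallest point-to-point distance.
    D = min(math.dist(a, b) for a in A for b in B)
    # (2) Nominate all pairs realizing that distance (within tolerance).
    parents = [(a, b) for a in A for b in B
               if abs(math.dist(a, b) - D) <= tol]
    # (3) Each pair's child sits exactly midway between the parents.
    children = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in parents]
    return D, children

A = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
B = [(2.0, 0.0), (2.0, 1.0)]
D, kids = ignorance_boundary(A, B)
print(D, kids)  # 2.0 [(1.0, 0.0), (1.0, 1.0)]
```

Two parent pairs realize the manifold distance here, so the discovered boundary consists of the two midpoints at x = 1.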

1. We nominate the valid pairs of “parents” (${A}_{i},{B}_{j}$) from the manifolds’ boundaries, $\forall {A}_{i},{B}_{j}\left({A}_{i}\in \mathbb{A};{B}_{j}\in \mathbb{B}\right)$, such that, for each pair, there exists an empty circle touching the manifold boundaries $\mathbb{A}$ and $\mathbb{B}$ exactly at the points ${A}_{i}$ and ${B}_{j}$, respectively (see Figure 7a).
2. We set up the “sightline” through each pair of points ${A}_{i}$ and ${B}_{j}$, and we discover the corresponding points ${A}_{i}^{\prime}$ and ${B}_{j}^{\prime}$ $\left({A}_{i}^{\prime}\in \mathbb{A};{B}_{j}^{\prime}\in \mathbb{B}\right)$ on the manifolds’ boundaries so that the sightline segments ${A}_{i}{A}_{i}^{\prime}$ and ${B}_{j}{B}_{j}^{\prime}$ lie completely within the corresponding manifolds, as shown in Figure 7a.
3. For each pair of parents ${A}_{i}$ and ${B}_{j}$, and with respect to their “counterparts” ${A}_{i}^{\prime}$ and ${B}_{j}^{\prime}$, the “child” (ignorance curve point) ${I}_{k}$ is created on the sightline between the parents so that the following balance is kept (Figure 7a): $\frac{d\left({A}_{i},{I}_{k}\right)}{d\left({B}_{j},{I}_{k}\right)}=\frac{d\left({B}_{j},{B}_{j}^{\prime}\right)}{d\left({A}_{i},{A}_{i}^{\prime}\right)}$.
4. The ends of the ignorance curve are computed as shown in Figure 7b. The sightlines ${A}_{i}{B}_{j}$ and ${A}_{r}{B}_{s}$ correspond to circles of infinite radius touching both manifold boundaries; therefore, the children ${I}_{k}$ and ${I}_{t}$ are produced exactly in the middle between the parents, without the use of counterparts.
5. …
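The balance equation of step (3) has a closed-form solution: writing ${I}_{k}={A}_{i}+t\left({B}_{j}-{A}_{i}\right)$ for ${I}_{k}$ on the sightline, the ratio constraint gives $t=\frac{d\left({B}_{j},{B}_{j}^{\prime}\right)}{d\left({A}_{i},{A}_{i}^{\prime}\right)+d\left({B}_{j},{B}_{j}^{\prime}\right)}$. A minimal sketch (the function name and the toy collinear configuration are ours):

```python
import math

def balanced_child(A, B, A_prime, B_prime):
    """Balanced View step (3): place the ignorance-curve point I on the
    sightline A-B so that d(A, I) / d(B, I) = d(B, B') / d(A, A').
    A', B' are the 'counterpart' points where the sightline exits the
    manifolds (assumed already found in step (2))."""
    wA = math.dist(B, B_prime)
    wB = math.dist(A, A_prime)
    t = wA / (wA + wB)  # fraction of the way from A towards B
    return (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

# A' is 1 unit behind A along the sightline, B' is 3 units behind B,
# so the child lands closer to B, satisfying d(A,I)/d(B,I) = 3/1.
I = balanced_child(A=(0, 0), B=(4, 0), A_prime=(-1, 0), B_prime=(7, 0))
print(I)  # (3.0, 0.0)
```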

#### 3.3. Density-Aware Ignorance

## 4. A Generic Model of Ignorance

… remains the same for all the computations. All such zones, when merged (Figure 11b), form an ignorance area around each point within each cell. Finally, we get a kind of “ignorance-aware” Voronoi diagram, in which the ignorance zones and the zones of believed certainty are clearly indicated (Figure 11c).
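The per-cell separation of certainty and ignorance can be illustrated with a toy labelling rule. This is our simplification for illustration only, not the paper's exact (elliptic) construction: a query point is "believed certain" if it lies within a fixed fraction of its Voronoi generator's distance to the nearest other generator.

```python
import math

def label_point(q, generators, alpha=0.5):
    """Toy 'ignorance-aware' Voronoi labelling (a simplification): the
    query point q belongs to the believed-certainty zone of its nearest
    generator g if it lies within alpha times g's distance to g's own
    nearest neighbouring generator; otherwise it falls into the
    ignorance area of that Voronoi cell."""
    g = min(generators, key=lambda p: math.dist(q, p))       # Voronoi owner
    nn = min(math.dist(g, p) for p in generators if p != g)  # g's nearest generator
    zone = "certainty" if math.dist(q, g) <= alpha * nn else "ignorance"
    return g, zone

gens = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(label_point((0.5, 0.5), gens))  # ((0.0, 0.0), 'certainty')
print(label_point((1.8, 1.8), gens))  # ((0.0, 0.0), 'ignorance')
```

Near the generator the cell reads as believed certainty; towards the cell edges, far from all generators, the label flips to ignorance.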

## 5. Using Ignorance as a Driver for Prototype Selection

#### 5.1. Incremental Prototype Selection

1. The domain boundary for the given training set is calculated (either the smallest rectangle or the smallest circle circumscribing all the points in the original dataset). The set of already selected prototypes is initialized as empty.
2. At every iteration step, the ignorance zones are discovered within the domain populated only with the already selected prototypes. (Notice that the domain as a whole, being empty at the very beginning, is considered the largest and only ignorance zone at the initial stage of the iteration process.)
3. At every iteration step, a curiosity focus (the center of the largest ignorance zone) is discovered; a nearest-neighbor query is issued against the original dataset; and the data sample closest to the curiosity focus and located within the corresponding ignorance zone is taken from the original dataset and added to the set of already selected prototypes. If the largest ignorance zone has no new points to be taken from the original dataset, then the second-largest ignorance zone is examined, and so on. Stopping criteria: the algorithm stops when either none of the ignorance zones have vacant points in the original dataset, or the radius of the largest ignorance zone reaches some predefined minimum ${\epsilon}_{0}$. Otherwise, stages (2) and (3) are repeated.
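The loop above can be sketched as follows. This is a heavily simplified stand-in, not the authors' algorithm: the geometric search for the largest ignorance zone is approximated by picking the dataset sample farthest from every already-selected prototype (a farthest-first heuristic), and the domain-boundary bookkeeping is omitted.

```python
import math

def ips(dataset, eps0=0.5):
    """Simplified sketch of the Incremental Prototype Selection loop."""
    # Step 1: seed with an arbitrary sample; at the start the whole
    # domain is one big ignorance zone, so any first pick is defensible.
    prototypes = [dataset[0]]
    remaining = list(dataset[1:])
    while remaining:
        # Steps 2-3: the distance of a candidate to its nearest prototype
        # approximates the radius of the ignorance zone it sits in; the
        # candidate with the largest such radius is the curiosity focus.
        gaps = [(min(math.dist(q, p) for p in prototypes), q) for q in remaining]
        radius, focus = max(gaps)
        if radius < eps0:  # stopping criterion: all zones became too small
            break
        prototypes.append(focus)
        remaining.remove(focus)
    return prototypes

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.1), (2.5, 4.0)]
print(ips(data, eps0=1.0))  # [(0.0, 0.0), (5.1, 0.1), (2.5, 4.0)]
```

Near-duplicate samples are never selected: once a prototype covers their neighborhood, their ignorance radius stays below the threshold.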

#### 5.2. Adversarial Prototype Selection

1. The domain boundary for the given training set is calculated (either the smallest rectangle or the smallest circle circumscribing all the points in the original dataset). The sets of already selected prototypes are initialized as empty for both actors: the professor and the student.
2. At every iteration step, the ignorance zones are discovered (similarly to the IPS algorithm) synchronously and independently: for the professor using his/her already selected prototypes, and for the student using his/her already selected prototypes.
3. At every iteration step, the curiosity zones are discovered for both actors separately: $$Curiosit{y}_{professor}=Ignoranc{e}_{student}\cap Ignoranc{e}_{professor};$$ $$Curiosit{y}_{student}=Ignoranc{e}_{student}\cap \left(\neg Ignoranc{e}_{professor}\right).$$
4. At every iteration step, a curiosity focus (the center of the largest circle within the curiosity zone) is discovered for each actor separately; the corresponding nearest-neighbor queries are issued against the original dataset (one from the professor and one from the student); the data sample closest to the professor’s curiosity focus and located within the corresponding curiosity zone is taken from the original dataset and added to the professor’s set of already selected prototypes; the same is done for the student’s query. If both queries result in the same prototype, the advantage of obtaining it is given to the student. If the largest curiosity zone has no new points to be taken from the original dataset, then the second-largest curiosity zone is examined, and so on. Stopping criteria: the algorithm stops when the curiosity of both actors is completely satisfied, i.e., either none of the curiosity zones (neither the student’s nor the professor’s) have vacant points in the original dataset, or the radius of the largest curiosity zone reaches some predefined minimum ${\epsilon}_{0}$ simultaneously for the professor and the student. Otherwise, stages (2), (3), and (4) are repeated.
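The curiosity formulas of step (3) are plain set algebra. Representing ignorance zones as sets of discretized domain cells (our simplification for illustration; the paper works with continuous zones):

```python
def curiosity_zones(ign_student, ign_professor):
    """Step (3) of the adversarial scheme: the professor is curious where
    BOTH actors are ignorant; the student is curious where only the
    student is ignorant (i.e., where the professor already 'knows')."""
    curiosity_professor = ign_student & ign_professor
    curiosity_student = ign_student - ign_professor  # intersection with complement
    return curiosity_professor, curiosity_student

ign_s = {"c1", "c2", "c3"}   # cells where the student is ignorant
ign_p = {"c2", "c3", "c4"}   # cells where the professor is ignorant
cp, cs = curiosity_zones(ign_s, ign_p)
print(sorted(cp), sorted(cs))  # ['c2', 'c3'] ['c1']
```

Note that the professor's ignorance outside the student's (here `c4`) drives neither actor's curiosity under these formulas.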

#### 5.3. Generic Settings for the Experiments with the Prototype Selection Algorithms

#### 5.4. Results of the Experiments

1. Apply Principal Component Analysis (PCA) and get the 2-D projection of S, which will be named S(0).
2. $\forall i$: remove attribute $i$ from S, obtain the reduced dataset, and then apply PCA to get its 2-D projection, named S(i).
3. After applying step (2) $\forall i$, one gets n different “quasi-orthogonal” 2-D projections of S; therefore, the whole set of 2-D projections of S, together with S(0), is {S(0), S(1), …, S(n)}.
4. $\forall S\left(i\right),i=\overline{0,n}$: apply our prototype selection algorithm (either IPS or APS), as described for the 2-D analysis, separately to each projection and get n + 1 sets of selected prototypes: {P(0), P(1), P(2), …, P(n)}.
5. Complete the final selection $\mathbf{P}$ of prototypes from the original dataset S as follows: the sample ${p}_{r}$ from the dataset S is included in $\mathbf{P}$ if it appears at least k times within the prototype sets {P(0), P(1), P(2), …, P(n)}, where the best k (1 $\le $ k $\le $ n) can be chosen experimentally (we recommend using k = n, aiming for the best retention rate).

(5*) Another option for the completion rule (5), which does not require any assumption on the parameter k, is: $$\mathbf{P}=P\left(0\right)\cap \left[P\left(1\right)\cup P\left(2\right)\cup \dots \cup P\left(n\right)\right]$$
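Both completion rules can be sketched directly (the function name is ours):

```python
from collections import Counter

def complete_selection(prototype_sets, k=None):
    """Complete the final prototype set P from the per-projection sets
    {P(0), ..., P(n)}.  With k given, rule (5): keep a sample appearing
    in at least k of the sets.  With k=None, rule (5*):
    P = P(0) ∩ (P(1) ∪ ... ∪ P(n))."""
    if k is not None:
        counts = Counter(p for s in prototype_sets for p in set(s))
        return {p for p, c in counts.items() if c >= k}
    head, *rest = prototype_sets
    return set(head) & set().union(*rest)

P0, P1, P2 = {"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}
print(sorted(complete_selection([P0, P1, P2], k=2)))  # ['b', 'c']
print(sorted(complete_selection([P0, P1, P2])))       # ['b', 'c'] via rule (5*)
```

On this toy input the two rules happen to agree; in general, rule (5*) privileges the full-projection set P(0), while rule (5) treats all projections symmetrically.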

#### 5.5. Experiments with the Datasets in the GIS Context

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 1.** Supervised learning visualized: (**a**) for the Closed World Assumption; (**b**,**c**) for the Open World Assumption, which assumes the presence of the ignorance zones.

**Figure 2.** (**a**) The two-point ignorance zone and its ignorance focus; (**b**) the three-point ignorance zone and its focus.

**Figure 3.** Screenshots of the discovered ignorance zones for the artificial datasets in 2D using Gabriel neighbors.
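Figure 3's construction relies on Gabriel neighbors. By the standard definition (Gabriel and Sokal), two points are Gabriel neighbors iff the circle having their connecting segment as diameter contains no other point; equivalently, iff $d(p,z)^2+d(q,z)^2>d(p,q)^2$ for every other point $z$. A minimal check:

```python
import itertools
import math

def gabriel_pairs(points):
    """All Gabriel-neighbor pairs among 2-D points: p and q qualify iff
    the circle with segment pq as its diameter contains no other point,
    i.e. iff d(p,z)^2 + d(q,z)^2 > d(p,q)^2 for every other z."""
    pairs = []
    for p, q in itertools.combinations(points, 2):
        d2 = math.dist(p, q) ** 2
        if all(math.dist(p, z) ** 2 + math.dist(q, z) ** 2 > d2
               for z in points if z != p and z != q):
            pairs.append((p, q))
    return pairs

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
# The pair (0,0)-(2,0) is rejected: (1, 0.1) lies inside its diameter circle.
print(gabriel_pairs(pts))
```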

**Figure 4.** Screenshots of the discovered ignorance zones (sets of grey points) for the artificial datasets (two differently colored data manifolds) in 2D with the kNDN algorithm.

**Figure 6.** Illustration of the ignorance zone discovery with the Manifold Distance: (**a**) computed Manifold Distance; (**b**) nomination of valid “parents”; (**c**,**d**) discovery of a “child” for each pair of “parents”; (**e**) all the discovered children; (**f**) the discovered ignorance boundary and the corresponding ignorance zone. One can see that the shape of such a zone depends less on the near-neighbors from the two manifolds and more on the overall shapes of the manifolds.

**Figure 7.** Illustration of the ignorance (decision boundary) curve discovery with the Balanced View method: (**a**) the “balance” equation for each pair of “parents”; (**b**) computing the locations of the ends of the ignorance curve; (**c**) computing the locations of the “children” of all pairs of “parents”; (**d**) the resulting ignorance curve. One may see here how the actual shapes of the manifolds influence the curve shape.

**Figure 8.** Illustration of the ignorance computed with the Social Distance Metric: (**a**) selection of a pair of points A and B, which are Gabriel neighbors; (**b**) computing the left end of the ignorance segment as the touch-point of two circles (the left circle also touches the nearest same-color neighbor of point A); (**c**) computing the right end of the ignorance segment as the touch-point of two circles (the right circle also touches the nearest same-color neighbor of point B); (**d**) the resulting ignorance line segment. Notice that the ignorance focus produced by the two points A and B in the Social Distance Metric is not a unique middle point between A and B but a line segment. The requirement for the ignorance line segment between A and B is that the number of homogeneous data points within the blue circle around A is the same as the number of data points within the yellow circle around B.

**Figure 9.** Illustration of the origin and parameters of the believed certainty and the ignorance areas around point A within the domain: (**a**) the ignorance zone produced by one void; (**b**) ignorance zones produced by some other voids; (**c**) the ignorance zones merged so that a clear (elliptic) boundary is seen between the ignorance and certainty areas; (**d**) computations for the boundary between ignorance and certainty.

**Figure 10.** Illustration of the origin and parameters of the believed certainty and the ignorance areas created by a pair of conflicting data points A and B within the domain: (**a**) the conflict between A and B makes a decision boundary between them, on which all the additional ignorance zones are centered; (**b**) the discovery of the A-B-conflict-related ignorance zones may go synchronously with the discovery of the A-Domain-conflict-related and B-Domain-conflict-related ignorance zones; (**c**) both types of ignorance zones merged so that a clear boundary is seen between the ignorance and the two heterogeneous certainty areas.

**Figure 11.** An example of the Voronoi diagram evolution towards the ignorance-aware Voronoi diagram: (**a**) the ignorance zones are computed separately for each Voronoi cell, which works as a small domain with a generating point in it; (**b**) the ignorance zones within each cell are merged into the ignorance area around the point, also leaving some space for a zone of believed certainty; (**c**) the same is done for all the Voronoi cells, resulting in the ignorance-aware Voronoi diagram.

**Figure 12.** Ignorance-aware Voronoi diagram creation with two homogeneous clusters of data points: (on the left) the Voronoi cells with the same class label are merged into two subdomains separated by the decision boundary; (on the right) the ignorance zones are computed separately for each of the two subdomains and merged into the ignorance area around the homogeneous groups of points, also leaving some space for two zones of believed certainty.

**Figure 13.** Incremental prototype selection algorithm with a circular domain (Wine dataset): (**a**) after the first iteration; (**b**) after the second iteration; (**c**) after the last iteration. Colored balls are data samples from the dataset, and the bolder ones are from the set of already selected prototypes. The domain boundary is shown as a red circle, and the blue circles are the voids from which the ignorance zones are computed.

**Figure 14.** Adversarial prototype selection algorithm (“Student vs. Professor”) with a circular domain (Wine dataset) after the last iteration. Colored balls are data samples from the original dataset. The domain boundary is shown as a red circle. Selected prototypes: (**a**) bold squares are the prototypes selected by the professor; (**b**) bold triangles are the prototypes selected by the student.

**Figure 15.** Injecting spatial awareness into the Iris dataset. Samples of the three types of irises from the dataset were randomly placed within the three corresponding spatial areas (I, II, and III) on the map. Spatial coordinates were added to the Iris dataset, producing the IRIS-GIS dataset.

| Dataset | #Exemplars | #Attributes | #Classes | Type of Dim. Reduction | Proportion of Variance |
|---|---|---|---|---|---|
| Iris | 150 | 4 | 3 | $\chi^2$ test | - |
| Wine | 178 | 13 | 3 | $\chi^2$ test | - |
| Pima | 768 | 8 | 2 | $\chi^2$ test | - |
| Breast Cancer | 699 | 9 | 2 | PCA | 0.76 |
| Ionosphere | 351 | 34 | 2 | PCA | 0.42 |
| Glass | 214 | 9 | 7 | PCA | 0.63 |
| Bupa | 345 | 6 | 2 | PCA | 0.60 |
| Transfusion | 748 | 4 | 2 | PCA | 0.93 |

| Dataset | Full Set ER | Full Set RR | IPS (Rect.) ER | IPS (Rect.) RR | IPS (Circ.) ER | IPS (Circ.) RR |
|---|---|---|---|---|---|---|
| Iris | 3.7 | 100 | 4.4 | 19.5 | 2.6 | 17.6 |
| Wine | 12.9 | 100 | 14.1 | 22.9 | 13.9 | 23.7 |
| Pima | 35.6 | 100 | 44.8 | 9.2 | 28.3 | 10.7 |
| Breast Cancer | 5.3 | 100 | 4.7 | 4.6 | 4.2 | 5.6 |
| Ionosphere | 29.0 | 100 | 28.4 | 18.1 | 19.9 | 18.7 |
| Glass | 38.4 | 100 | 48.7 | 40.2 | 42.7 | 45.8 |
| Bupa | 46.8 | 100 | 48.2 | 25.8 | 50.8 | 23.0 |
| Transfusion | 31.3 | 100 | 27.6 | 6.2 | 26.3 | 7.4 |
| AVERAGE | 25.38 | 100 | 27.61 | 18.31 | 24.84 | 19.06 |

**Table 3.** Results of the performance comparison of the suggested ignorance-aware incremental prototype selection (IPS) algorithm vs. the Condensed Nearest Neighbor (CNN) algorithm vs. the Edited Nearest Neighbor (ENN) algorithm.

| Dataset | Full Set ER | Full Set RR | IPS ER | IPS RR | CNN ER | CNN RR | ENN ER | ENN RR |
|---|---|---|---|---|---|---|---|---|
| Iris | 11.0 | 100 | 11.2 | 18.2 | 12.2 | 36.4 | 9.2 | 94.5 |
| Wine | 17.6 | 100 | 17.1 | 28.0 | 26.6 | 37.1 | 15.1 | 84.3 |
| Pima | 36.4 | 100 | 29.4 | 12.6 | 36.8 | 59.0 | 38.5 | 69.0 |
| Breast Cancer | 6.3 | 100 | 5.1 | 5.7 | 6.2 | 38.4 | 4.6 | 96.9 |
| Ionosphere | 31.3 | 100 | 27.3 | 25.9 | 35.0 | 60.1 | 39.0 | 69.3 |
| Glass | 41.6 | 100 | 44.1 | 48.8 | 59.3 | 17.3 | 46.5 | 33.7 |
| Bupa | 48.0 | 100 | 48.0 | 34.6 | 51.8 | 70.2 | 50.2 | 57.3 |
| Transfusion | 32.2 | 100 | 28.7 | 14.9 | 37.1 | 47.5 | 44.6 | 65.8 |
| AVERAGE | 28.05 | 100 | 26.36 | 23.59 | 33.13 | 45.8 | 30.96 | 71.35 |

| Dataset | Full Set (Arith.) | Full Set (Contra-H.) | Professor’s Set (Arith.) | Professor’s Set (Contra-H.) | Prof.’s + Student’s Set (Arith.) | Prof.’s + Student’s Set (Contra-H.) |
|---|---|---|---|---|---|---|
| Iris | 3.7 | 11.0 | 3.9 | 9.9 | 2.9 | 11.0 |
| Wine | 12.9 | 17.6 | 12.9 | 19.7 | 13.0 | 17.5 |
| Pima | 35.6 | 36.4 | 29.8 | 31.0 | 31.2 | 32.2 |
| Breast Cancer | 5.3 | 6.3 | 4.0 | 5.6 | 4.1 | 5.3 |
| Ionosphere | 29.0 | 31.3 | 26.4 | 29.6 | 25.8 | 28.3 |
| Glass | 38.4 | 41.6 | 47.1 | 50.1 | 38.4 | 41.5 |
| Bupa | 46.8 | 48.0 | 46.1 | 47.5 | 46.9 | 48.1 |
| Transfusion | 31.3 | 32.2 | 27.5 | 28.4 | 27.2 | 28.7 |
| AVERAGE | 25.38 | 28.05 | 24.71 | 27.73 | 23.69 | 26.58 |
| Compared with AVERAGE (Full Set) | | | 25.38 | 28.05 | 25.38 | 28.05 |
| Compared with AVERAGE (IPS) | | | 23.46 | 26.36 | 23.46 | 26.36 |
| Compared with AVERAGE (CNN) | | | 30.51 | 33.13 | 30.51 | 33.13 |
| Compared with AVERAGE (ENN) | | | 28.46 | 30.96 | 28.46 | 30.96 |

| Dataset | Full Set (Arith.) | Full Set (Contra-H.) | Professor’s Set (Arith.) | Professor’s Set (Contra-H.) | Prof.’s + Student’s Set (Arith.) | Prof.’s + Student’s Set (Contra-H.) |
|---|---|---|---|---|---|---|
| Iris | 100 | 100 | 16.5 | 16.8 | 40.0 | 40.3 |
| Wine | 100 | 100 | 21.9 | 22.8 | 52.2 | 52.8 |
| Pima | 100 | 100 | 10.4 | 11.2 | 26.9 | 28.0 |
| Breast Cancer | 100 | 100 | 5.9 | 6.1 | 13.8 | 14.1 |
| Ionosphere | 100 | 100 | 18.0 | 19.3 | 43.6 | 44.7 |
| Glass | 100 | 100 | 28.5 | 29.9 | 70.4 | 71.6 |
| Bupa | 100 | 100 | 20.8 | 23.5 | 53.5 | 56.3 |
| Transfusion | 100 | 100 | 8.0 | 9.1 | 21.4 | 22.8 |
| AVERAGE | 100 | 100 | 16.25 | 17.34 | 40.23 | 41.33 |
| Compared with AVERAGE (Full Set) | | | 100 | 100 | 100 | 100 |
| Compared with AVERAGE (IPS) | | | 22.31 | 23.59 | 22.31 | 23.59 |
| Compared with AVERAGE (CNN) | | | 45.71 | 45.80 | 45.71 | 45.80 |
| Compared with AVERAGE (ENN) | | | 71.26 | 71.35 | 71.26 | 71.35 |

| “Forest Types” Dataset | Full Set | CNN | ENN | IPS | APS (Prof.) | APS (Prof.+Student) |
|---|---|---|---|---|---|---|
| ER (Arithmetic) | 14.30 | 29.00 | 13.80 | 12.60 | 13.50 | 12.80 |
| ER (Contra-Harmonic) | 18.00 | 34.00 | 17.10 | 16.90 | 17.20 | 16.70 |
| RR (Arithmetic) | 100.00 | 26.90 | 76.30 | 34.10 | 22.70 | 56.00 |
| RR (Contra-Harmonic) | 100.00 | 27.00 | 76.40 | 34.40 | 23.30 | 56.60 |
| Overall Quality (Arithmetic) | 42.85 | 72.05 | 54.95 | 76.65 | 81.90 | 65.60 |
| Overall Quality (Contra-Harmonic) | 41.00 | 69.50 | 53.25 | 74.35 | 79.75 | 63.35 |
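The Overall Quality rows in this and the following tables are numerically consistent with averaging the classification accuracy (100 − ER) and the achieved reduction (100 − RR). This is our inference from the reported numbers, not a definition stated in this excerpt; the check below reproduces two table cells:

```python
def overall_quality(er, rr):
    """Inferred (not stated in this excerpt): Overall Quality =
    ((100 - ER) + (100 - RR)) / 2, i.e. the mean of classification
    accuracy and storage reduction, both in percent."""
    return round(((100 - er) + (100 - rr)) / 2, 2)

# Values from the 'Forest Types' table: Full Set and IPS columns.
print(overall_quality(14.30, 100.00))  # 42.85, matching the table
print(overall_quality(12.60, 34.10))   # 76.65, matching the table
```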

**Table 8.** Results of the experiments with the Iris vs. IRIS-GIS datasets and the Wine vs. WINE-GIS datasets.

| IRIS (with PCA) | Full Set | CNN | ENN | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|---|---|
| ER (Arithmetic) | 6.70 | 8.80 | 5.50 | 6.30 | 7.50 | 7.10 | 6.10 |
| ER (Contra-Harmonic) | 12.20 | 14.90 | 11.60 | 13.00 | 14.20 | 12.80 | 12.10 |
| RR (Arithmetic) | 100.00 | 38.00 | 92.80 | 23.90 | 22.30 | 30.00 | 52.30 |
| RR (Contra-Harmonic) | 100.00 | 38.20 | 92.80 | 24.70 | 22.90 | 30.50 | 53.00 |
| Ov. Quality (Arithmetic) | 46.65 | 76.60 | 50.85 | 84.90 | 85.10 | 81.45 | 70.50 |
| Ov. Quality (Contra-H.) | 43.90 | 73.45 | 47.80 | 81.15 | 81.45 | 78.35 | 67.45 |

| IRIS-GIS (with PCA) | Full Set | CNN | ENN | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|---|---|
| ER (Arithmetic) | 4.20 | 5.30 | 3.50 | 6.10 | 4.40 | 4.60 | 4.00 |
| ER (Contra-Harmonic) | 9.40 | 10.80 | 9.00 | 14.00 | 9.10 | 11.70 | 9.30 |
| RR (Arithmetic) | 100.00 | 36.70 | 95.20 | 20.30 | 21.40 | 25.10 | 46.60 |
| RR (Contra-Harmonic) | 100.00 | 36.80 | 95.20 | 20.50 | 21.70 | 25.60 | 47.10 |
| Ov. Quality (Arithmetic) | 47.90 | 79.00 | 50.65 | 86.80 | 87.10 | 85.15 | 74.70 |
| Ov. Quality (Contra-H.) | 45.30 | 76.20 | 47.90 | 82.75 | 84.60 | 81.35 | 71.80 |

| WINE (with PCA) | Full Set | CNN | ENN | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|---|---|
| ER (Arithmetic) | 3.60 | 7.50 | 2.90 | 2.90 | 4.40 | 3.80 | 3.50 |
| ER (Contra-Harmonic) | 8.40 | 15.10 | 7.00 | 7.20 | 9.40 | 8.10 | 8.80 |
| RR (Arithmetic) | 100.00 | 30.80 | 94.80 | 19.80 | 19.10 | 26.90 | 46.10 |
| RR (Contra-Harmonic) | 100.00 | 30.90 | 94.90 | 20.10 | 19.40 | 27.50 | 46.50 |
| Ov. Quality (Arithmetic) | 48.20 | 80.85 | 51.15 | 88.65 | 88.25 | 84.65 | 72.20 |
| Ov. Quality (Contra-H.) | 45.80 | 77.00 | 49.05 | 86.35 | 85.60 | 82.20 | 72.35 |

| WINE-GIS (with PCA) | Full Set | CNN | ENN | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|---|---|
| ER (Arithmetic) | 2.40 | 4.40 | 1.50 | 2.30 | 1.90 | 2.10 | 2.40 |
| ER (Contra-Harmonic) | 7.50 | 12.00 | 6.10 | 6.70 | 6.60 | 7.50 | 7.50 |
| RR (Arithmetic) | 100.00 | 29.40 | 96.20 | 20.60 | 18.80 | 27.00 | 45.80 |
| RR (Contra-Harmonic) | 100.00 | 29.50 | 96.30 | 20.70 | 19.00 | 27.60 | 46.20 |
| Ov. Quality (Arithmetic) | 48.80 | 83.10 | 51.15 | 88.55 | 89.65 | 85.45 | 75.90 |
| Ov. Quality (Contra-H.) | 46.25 | 79.25 | 48.80 | 86.30 | 87.20 | 82.45 | 73.15 |

**Table 6.** Results of the experiments with the quasi-orthogonal projections algorithm in 3D applied to the Iris dataset. Shown are the performance of the IPS and APS algorithms in 2D and, for comparison, the 3D results provided by the two options of the QOP algorithm.

| IRIS-2D (with PCA) | Full Set | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|
| ER (Arithmetic) | 6.70 | 6.30 | 7.50 | 7.10 | 6.10 |
| ER (Contra-Harmonic) | 12.20 | 13.00 | 14.20 | 12.80 | 12.10 |
| RR (Arithmetic) | 100.00 | 23.90 | 22.30 | 30.00 | 52.30 |
| RR (Contra-Harmonic) | 100.00 | 24.70 | 22.90 | 30.50 | 53.00 |
| Ov. Quality (Arithmetic) | 46.65 | 84.90 | 85.10 | 81.45 | 70.50 |
| Ov. Quality (Contra-Harmonic) | 43.90 | 81.15 | 81.45 | 78.35 | 67.45 |

| IRIS-3D (with PCA) | Full Set | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|
| ER (Arithmetic) | 6.40 | 11.10 | 18.90 | 10.90 | 7.30 |
| ER (Contra-Harmonic) | 12.20 | 18.20 | 34.30 | 18.30 | 13.50 |
| RR (Arithmetic) | 100 | 16.50 | 5.10 | 18.20 | 44.00 |
| RR (Contra-Harmonic) | 100 | 16.90 | 5.90 | 18.70 | 44.40 |
| Ov. Quality (Arithmetic) | 46.80 | 86.20 | 88.00 | 85.45 | 74.35 |
| Ov. Quality (Contra-Harmonic) | 43.90 | 82.45 | 79.85 | 81.50 | 71.05 |

| IRIS-3D* (with PCA) | Full Set | IPS | APS (S.) | APS (P.) | APS (P.+S.) |
|---|---|---|---|---|---|
| ER (Arithmetic) | 6.40 | 8.10 | 8.50 | 6.90 | 6.80 |
| ER (Contra-Harmonic) | 12.20 | 15.30 | 16.50 | 13.80 | 12.50 |
| RR (Arithmetic) | 100 | 21.20 | 14.10 | 25.50 | 49.70 |
| RR (Contra-Harmonic) | 100 | 21.80 | 14.80 | 25.90 | 50.20 |
| Ov. Quality (Arithmetic) | 46.80 | 85.35 | 88.70 | 83.80 | 71.75 |
| Ov. Quality (Contra-Harmonic) | 43.90 | 81.45 | 84.35 | 80.15 | 68.65 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Terziyan, V.; Nikulin, A.
Semantics of Voids within Data: Ignorance-Aware Machine Learning. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 246.
https://doi.org/10.3390/ijgi10040246
