A New Background-Based Extraction Algorithm under Factor Space Theory

Under factor space theory, this paper proposes a new background base extraction algorithm to address three problems of existing background base extraction algorithms: sensitivity to outliers, an excessive number of iterations, and long computation time. First, a background base anomaly detection algorithm built on the isolation forest algorithm identifies and deletes the anomalous points that would interfere with the background base extraction results; because it uses each factor as an attribute to segment the data space, it is suitable for high-dimensional factor spaces. Second, a new base point detection algorithm combined with the α-shape algorithm is proposed, which has low time and space complexity.


Background-Based Development Process
With the advent of the information revolution, intelligent science and data science came into being, and the era of intelligent networks has brought mankind major opportunities and challenges. Factor space is the mathematical basis of institutionalist artificial intelligence theory [1] and further develops the existing fuzzy set [2], rough set, and formal background theories. Factor space has had a far-reaching influence; for example, the concept of fuzzy background was derived by combining it with fuzzy set theory [3]. Factor space provides a coordinate system for the transformation of information concepts [4]: virtual content is converted into data, and the data for a concept form a background distribution. Finally, the large-scale background distribution data are compressed into a small number of background base points. In large-scale data, factor space is applied by generating the background distribution of concepts and then performing rule induction and logical reasoning. Background base theory is an important part of factor space thinking, and the main task of the background base is to map a large-scale data space onto a small-scale data set that still characterizes the data [5].
The background relationship is the mathematical basis of causal analysis, which is the core of knowledge representation in factor space. The background base can generate the background relationship and is a compression of it without information loss. The data processing idea of factor space is to convert data processed online into background base points in real time, so that, in the face of an influx of big data, the database only needs to receive a small data set that retains the information. Wang Peizhuang [6] discussed the important relationship between databases and factor space and provided an approximate algorithm for background base extraction. Lv Jinhui [7] proposed a background base extraction algorithm based on the approximate algorithm and defined internal point determination, attempting to upgrade background base extraction to an exact algorithm. Pu Lingjie [8] improved the background base extraction algorithm and proposed the IBBE algorithm based on the internal point discrimination method. However, for each sample point the IBBE algorithm re-determines every base point in the original base point set, which leads to high complexity and many iterations. The knowledge expressed in this way is not conducive to inheritance and preservation, nor can the algorithm be extended to high-dimensional space.
Given the above problems, this paper proposes, under factor space theory, a new background base extraction algorithm that addresses the sensitivity of existing algorithms to outliers and the long computation time caused by too many iterations when selecting base points. The new algorithm consists of an anomaly detection algorithm and a base point detection algorithm. Firstly, a background base anomaly detection algorithm is proposed. It is built on the isolation forest algorithm from machine learning [9] and identifies and deletes abnormal points that would interfere with the background base extraction results, thereby completing the preprocessing of the data. Background base anomaly detection uses each factor as an attribute to segment the data space, which makes it suitable for high-dimensional factor spaces and improves the efficiency and accuracy of background base extraction. Secondly, to address the complexity of the existing background base extraction algorithm, a base point detection algorithm combined with the α-shape algorithm [10] is proposed. Its principle is to roll a circle of radius r over the sample point set; the set of points touched by the rolling circle is the base point set. The time and space complexity of the base point detection algorithm is low. Finally, the new background base extraction algorithm is obtained by combining the background base anomaly detection algorithm with the base point detection algorithm. The numerical experiments show that the new background base extraction algorithm not only has low complexity and high accuracy but can also be applied to high-dimensional, large-scale data [11], which benefits big data processing and the extraction of high-dimensional base points.

The Background-based Anomaly Detection Algorithm
Abnormal samples among the sample points affect the extraction of the background base. To address this, this paper designs an anomaly detection algorithm based on the isolation forest to identify and remove the abnormal points that interfere with background base extraction. Anomaly detection is applied in many fields, such as financial fraud detection [12], and many anomaly detection algorithms exist. Compared with density-based and distance-based anomaly detection algorithms such as k-means and LOF, the isolation forest has low time and space complexity and is therefore more suitable for high-dimensional, large-scale anomaly detection of background base samples. Based on the isolation forest algorithm, this paper designs a background base sample anomaly detection algorithm for factor space theory that identifies the abnormal data in the sample and improves the accuracy of the background base algorithm.
The algorithm first builds an isolation forest from the background base samples and then uses the forest to perform background base anomaly detection.
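The two steps above (build the forest, then score and remove anomalies) can be illustrated with a minimal pure-Python sketch of a generic isolation forest. It is not the paper's implementation: the function names, the dictionary-based tree layout, and the parameters n_trees, sample_size, seed, and threshold (with their default values) are all assumptions made for illustration. Each split picks one factor (coordinate) at random, which mirrors the idea of using each factor as an attribute to segment the data space.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def average_path_length(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the sample on a random factor at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    factor = rng.randrange(len(points[0]))
    low = min(p[factor] for p in points)
    high = max(p[factor] for p in points)
    if low == high:  # this factor cannot separate the points any further
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[factor] < split]
    right = [p for p in points if p[factor] >= split]
    return {
        "factor": factor,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Depth at which the point reaches a leaf, plus a correction for leaf size."""
    depth = 0
    while "factor" in node:
        go_left = point[node["factor"]] < node["split"]
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score each point in (0, 1); values close to 1 indicate likely anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    norm = average_path_length(psi)
    if norm == 0.0:  # fewer than two points: nothing to isolate
        return [0.0] * len(points)
    max_depth = math.ceil(math.log2(psi))
    forest = [
        build_tree(rng.sample(points, psi), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, tree) for tree in forest) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


def remove_outliers(points, threshold=0.7, **kwargs):
    """Keep only points whose score is below the (assumed) threshold."""
    scores = anomaly_scores(points, **kwargs)
    return [p for p, s in zip(points, scores) if s < threshold]
```

Points that are isolated after only a few random splits receive scores near 1, while points inside a dense region need many splits and score near 0.5 or lower. The threshold of 0.7 is an illustrative choice only and would need tuning for real background base samples.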

The Base Point Detection Algorithm
This algorithm combines factor space theory with the α-shape algorithm to propose a new definition of base point. A circle of radius r is rolled around the outside of the sample point set S; when r is large enough, the circle rolls only around the outside of the point set and never enters it. The base point detection algorithm iterates through each point A in the sample point set S and determines whether another point A' exists in the point set such that at least one of the circles of radius r passing through both points contains no other sample point. A sample point C lies inside such a circle when its distance from the center of the circle is less than r.
In short, the algorithm rolls a circle of radius r around the outside of the sample, and the points that the circle touches are the base points. The complexity of the algorithm is low and the base point detection results are accurate.
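As an illustration of the rolling-circle test, the following two-dimensional sketch checks every pair of sample points, constructs the two circles of radius r through the pair, and marks both points as base points when at least one circle contains no other sample point. This is a brute-force reading of the α-shape idea written for clarity, not the paper's implementation: the function names and the tolerance eps are assumptions, and the triple loop is far slower than an optimized version would be.

```python
import math


def circle_centers(a, b, r):
    """Centers of the two circles of radius r that pass through points a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > 2.0 * r:
        return []  # coincident points, or too far apart for a circle of radius r
    mx, my = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    h = math.sqrt(max(r * r - (d / 2.0) ** 2, 0.0))  # midpoint-to-center distance
    ux, uy = -dy / d, dx / d  # unit vector perpendicular to the segment ab
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]


def base_points(points, r, eps=1e-9):
    """Return the points touched by a circle of radius r rolling around the set."""
    found = set()
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            for c in circle_centers(points[i], points[j], r):
                # The circle is empty if no other point is closer than r to its center.
                empty = all(
                    math.hypot(p[0] - c[0], p[1] - c[1]) >= r - eps
                    for k, p in enumerate(points)
                    if k != i and k != j
                )
                if empty:
                    found.update((i, j))
                    break
    return [points[i] for i in sorted(found)]


# Four corners of a square plus one interior point.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
print(base_points(square, r=2.0))
```

In this example the four corners are reported as base points, while the interior point (1.0, 1.0) is not, because every circle of radius 2 through it and a corner also encloses another corner. Note that this sketch cannot detect a point that lies farther than 2r from every other point, since no circle of radius r passes through it and a second sample point.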

New Background Base Extraction Algorithm
The new background base extraction algorithm comprises an anomaly detection algorithm and a base point detection algorithm. The background base anomaly detection algorithm identifies anomalies in the background base sample and prevents abnormal points from interfering with base point detection; it serves as the preprocessing of the initial sample data and ensures high-quality detection of the base points. The base point detection algorithm quickly and efficiently traverses all points in the sample set to detect the base points and can be extended to high-dimensional space. The new algorithm first runs the background base anomaly detection algorithm on the sample point data; the detected outliers are then eliminated, and the base point detection algorithm extracts the base points from the remaining samples.
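The order of the two stages can be sketched as a simple composition. The code below is only an illustration of the data flow: extract_background_base is a hypothetical name, and the two stage functions passed to it (drop_far_points and bounding_extremes) are deliberately simple placeholders, not the isolation forest or the rolling-circle detector described in this paper.

```python
import math
import statistics


def extract_background_base(samples, remove_outliers, detect_base_points):
    """Stage 1: remove outliers. Stage 2: detect base points among what remains."""
    cleaned = remove_outliers(samples)
    return detect_base_points(cleaned)


def drop_far_points(samples, limit=10.0):
    """Placeholder outlier filter: drop points far from the per-factor median."""
    dims = range(len(samples[0]))
    center = [statistics.median(p[d] for p in samples) for d in dims]
    return [
        p for p in samples
        if math.sqrt(sum((p[d] - center[d]) ** 2 for d in dims)) <= limit
    ]


def bounding_extremes(samples):
    """Placeholder base point detector: keep points extreme in at least one factor."""
    dims = range(len(samples[0]))
    lows = [min(p[d] for p in samples) for d in dims]
    highs = [max(p[d] for p in samples) for d in dims]
    return [
        p for p in samples
        if any(p[d] == lows[d] or p[d] == highs[d] for d in dims)
    ]


samples = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (50, 50)]
print(extract_background_base(samples, drop_far_points, bounding_extremes))
print(bounding_extremes(samples))  # without stage 1, the outlier distorts the result
```

With the outlier (50, 50) removed first, the placeholder detector returns the four corners of the square. Applied directly to the raw samples, it instead reports the outlier as a base point and loses the corner (2, 2), which illustrates why anomaly detection is run before base point detection.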
The traditional background base extraction algorithm requires too many iterations, and the number of iterations increases exponentially, so it is not suitable for large-scale data and cannot be extended to high-dimensional space. The main idea of the new background base extraction algorithm is to roll a circle of user-defined radius over all sample points to generate the background base. Therefore, the new background base extraction algorithm does not need repeated iteration and can be applied to big data or extended to n-dimensional space. Finally, the new background base extraction algorithm and the IBBE algorithm are compared through experiments. The experimental results show that (1) the new algorithm runs faster and has low time complexity; (2) the new algorithm removes outliers, whereas outliers remain in the result of the IBBE algorithm; and (3) the accuracy of the new algorithm is higher.

Numerical Experiments
To verify the feasibility of the new background base extraction algorithm and its effect in high-dimensional space, two experiments were designed. (1) The new background base extraction algorithm was applied to actual data to verify whether it can detect abnormal data and extract the background base quickly and accurately, and it was compared with the IBBE algorithm of literature [8]. (2) To verify the effect of the new background base extraction algorithm in high-dimensional space, it was compared with the IBBE algorithm of literature [8] in high dimensions.
The experiments were run on a server with the Windows 10 operating system and standard memory. On this platform, the original data were imported into the new background base extraction algorithm for anomaly detection and background base extraction.
The experiments show that although the IBBE algorithm can also extract the background base, it performs no anomaly detection and therefore includes the abnormal points in the background base. Because background base extraction is a process of data compression, redundant base points affect classification, so the new background base extraction algorithm extracts the background base better. The experimental comparison of the new algorithm with the IBBE algorithm gave the following findings: (1) the new algorithm runs faster and has low time complexity; (2) the new algorithm removes outliers, whereas outliers remain in the result of the IBBE algorithm; and (3) the accuracy of the new algorithm is higher.
In summary, this paper first designed an anomaly detection algorithm based on the isolation forest to identify and remove abnormal points that interfere with background base extraction. Then, a base point detection algorithm was designed, which rolls a circle of radius r around the outside of the sample; the points that the circle touches are the base points. The complexity of the algorithm was low and the base point detection results were accurate.
The background base anomaly detection algorithm and the base point detection algorithm were then combined into a new background base extraction algorithm, with which the background base can be extracted from any initial sample. Numerical experiments were conducted, and the complexity was linear. The improved background base extraction algorithm greatly reduced the computational and programming difficulty and improved the accuracy of background base extraction. Finally, the new background base extraction algorithm lays a foundation for improving related algorithms such as the base point classification algorithm.