2.1. Adaptive Pendular Truncation Algorithm
Let be a sample of size N of independent, identically distributed random variables with unknown distribution F(x), where is Tukey’s model of outliers, is the reference a priori distribution, is the outlier distribution, is the outlier fraction, and is the number of outliers in the sample. We assume that , and are absolutely continuous unimodal distributions with densities , and , respectively.
The standard problem of detecting and selecting outliers remote from the center of the distribution reduces to the problem of testing the hypotheses:
Let us consider an anomaly measure based on the functional
where is a known function, and introduce a sample , with variable size. According to the anomaly measure, we transform the sample observations to the form
Let us sort the variables , , and consider the sequential procedure for detecting outlier candidates. According to the anomaly measure T, the outliers are represented by the extreme order statistics . The observation corresponding to is a candidate for outlier status; therefore, we remove it from the sample . As a result, we obtain a sample of size (n − 1). This candidate-detection procedure is repeated for . The observations removed in this way are not necessarily outliers; they are only candidates. To determine which of them are outliers, an additional decision-making procedure is required.
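The removal loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the anomaly measure is taken to be the absolute deviation from the median of the current sample, which stands in for the functional of the text, and the names `remove_candidates` and `n_remove` are hypothetical.

```python
import numpy as np

def remove_candidates(sample, n_remove):
    """Sequentially strip the most anomalous observation n_remove times.

    ASSUMPTION: the anomaly measure is |x - median(current sample)|;
    the functional used in the text is not reproduced here.
    """
    current = np.asarray(sample, dtype=float)
    candidates = []
    for _ in range(n_remove):
        center = np.median(current)          # center re-estimated on every pass
        t = np.abs(current - center)         # assumed anomaly measure T
        worst = int(np.argmax(t))            # position of the largest order statistic
        candidates.append(float(current[worst]))
        current = np.delete(current, worst)  # sample size drops from n to n - 1
    return candidates, current

# Hypothetical data: a tight bulk plus one extreme value on each side.
data = [0.1, -0.3, 0.2, 9.0, 0.0, -8.5, 0.4, -0.1]
candidates, remaining = remove_candidates(data, 2)
```

Because the center is recomputed after every removal, the sketch picks candidates from either tail without being told which side to look at. The removed values are only candidates; a separate decision rule is still needed to declare them outliers.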
Let us introduce the statistic
where
Since and , it follows that and, hence, the statistic Ln satisfies 0 < Ln ≤ 1 and is a monotonically decreasing function of n.
Let us find the average values of the statistics , , and :
where . Let us consider the first-order differences of :
and find the average value of the difference :
As follows from Equation (10), the first-order differences in the presence of k outliers are, on average, constant at the level , and in the absence of outliers (), they are, on average, constant at the level , where . At the point , the function jumps on average by .
Let us consider the second-order differences . They are on average equal to zero, and at the point , a delta-shaped spike of the function is observed.
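The behavior of the differences can be illustrated numerically. The sketch below uses a made-up, noise-free sequence whose slope changes once, so it shows only the mechanism (two constant levels in the first differences, a single spike in the second differences) and not the statistic of the text. The function name `locate_spike` and the slope values are assumptions chosen for the example.

```python
import numpy as np

def locate_spike(values):
    """Return first differences, second differences, and the index
    (into `values`) at which the slope changes most."""
    v = np.asarray(values, dtype=float)
    d1 = np.diff(v)   # d1[i] = v[i+1] - v[i]
    d2 = np.diff(d1)  # d2[i] = v[i+2] - 2*v[i+1] + v[i]
    # d2[i] is centered on v[i+1], hence the +1 shift.
    spike = int(np.argmax(np.abs(d2))) + 1
    return d1, d2, spike

# Hypothetical sequence: slope -0.10 for the first 3 steps, then slope -0.02.
steps = np.arange(9)
values = np.where(steps <= 3, 1.0 - 0.10 * steps, 0.7 - 0.02 * (steps - 3))
d1, d2, spike = locate_spike(values)
```

Here the first differences sit at one level before the change point and at another level after it, while the second differences are zero (up to rounding) everywhere except at the change point. With real data the differences fluctuate around these levels, so the spike has to be judged against that noise.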
The special features in the behavior of the statistics , , and indicated above allow us to construct a sequential procedure of adaptive pendular truncation (APT) for outlier detection and selection based on the empirical influence and sensitivity functions [7,8] that generalizes the adaptive pendular truncation algorithm (APTA) [20].
2.2. Adaptive Pendular Truncation Algorithm
For the sample , , we perform the following procedures:
1. Calculate ,
2. Calculate ,
3. Sort the variables , ,
4. Calculate ,
5. Calculate ,
6. Find the first-order differences ,
7. Find the second-order differences ,
8. Remove the observation corresponding to from the sample,
9. Execute the above cycle from item 1 to item 9 for .
We note that the APTA is nonparametric: its result does not depend on the form of the distribution , and it automatically determines on which side of the center the outlier candidate is located.
Generalization of the Algorithm
The functionals , , and can be used as the anomaly measure and the transformation described by Equation (1), where is a continuous function of bounded variation, is a parameter, and is an estimate of the parameter .