1. Introduction
With the continuous development of national defense science and technology, aircraft play a pivotal role in aerospace, navigation, electronic communication, and other military industries. The landing point hit rate of an aircraft is a very important tactical indicator, and landing point measurement is of great significance to aircraft landing point evaluation [
1,
2]. In recent years, many scholars have carried out research on flight landing forecasting. Asadi D. et al. [
3] estimated the flight path and landing point coordinates of a damaged aircraft using numerical calculation methods; Bavle H. et al. [
4] proposed a method to accurately predict the flight altitude of unmanned aerial vehicles using 3D point cloud sensors in a cluttered and dynamic indoor environment; Marcu A. et al. [
5], based on the embeddable ConvNet theory, proposed a method that can accurately predict the landing area of the aircraft and make the predicted landing area as small as possible; Lin S. et al. [
6], based on the vision system design theory, proposed a method to enable an unmanned aerial vehicle to automatically land on the designated ship flight deck position, improving the accuracy of aircraft target tracking; Malyavej V. et al. [
7] designed a missile landing point prediction method that can be as close as possible to the target and can be specified from a specific direction by exploring the target of the precision guidance problem. Y. Kong and S. Mahadevan et al. [
8,
9] proposed an index prediction method for aircraft landing process based on DASH-link data. Based on machine learning and Bayesian neural networks, the landing vertical velocity of aircraft during landing is predicted numerically, and the landing distance of aircraft is predicted based on the multi-step rolling prediction strategy. The prediction model is verified by the test data of several flights. In particular, Yasser Yousefi et al. [
10] collected a lot of data on aircraft deviation accidents on the runway from several countries with similar aviation accident rates, and then predicted the landing position of aircraft on the existing runway so as to remove obstacles near the runway when necessary to ensure safety.
However, the above studies all require a large number of actual aircraft landing points to provide data support and have high requirements for the sample capacity and confidence of the aircraft’s field data. If a large number of field tests are carried out on an aircraft to study its landing point, there are many problems, such as high equipment cost, a large amount of deployment engineering, difficulty in organization and implementation, and potential safety hazards, so usually only a small amount of field landing point data information can be obtained [
11]. To satisfy the demand for the sample capacity of aircraft landing points, many domestic and foreign research institutions have established semi-physical simulation laboratories [
12]. Many researchers use computers to simulate target shooting and design and establish semi-physical simulation system models, which greatly reduces the test cost and provides test data, improving the sample capacity and confidence of hit assessment. As a typical method that can replace the real environment and test equipment, semi-physical simulation technology connects some entities of a research system to a simulation test loop and has become an important means and construction tool for the research, design, and development of aircraft equipment systems. More placement information has been obtained through simulation design [
13,
14]. Some scholars have established a landing point prediction model based on a multisource data fusion method for aircraft landing point prediction. Through the effective fusion of field test data and the semi-physical simulation data of the coordinates of the landing point of an aircraft, the computer automatically processes a number of predicted landing points and then combines the artificial experience to predict the aircraft landing point.
At present, domestic and foreign scholars have carried out extensive research on the accuracy fusion evaluation of multisource data, such as field tests and semi-physical simulation tests. The main data fusion techniques include deep neural network methods [
15], Bayesian inference [
16], fuzzy logic [
17] and the Dempster–Shafer evidence theory [
18]. For example, to improve the accuracy of multisensor data fusion, Zhang B. C. et al. [
19] proposed a data fusion algorithm based on a backpropagation neural network (BPNN), which can effectively address the problem of network nonconvergence and improve network performance. Gao S. Y. et al. [
20] proposed a multisensor data fusion algorithm based on particle swarm optimization and an improved backpropagation neural network that optimizes the network input, which improves the accuracy of sensor data fusion. Y. Wang et al. [
21] and J. Huang et al. [
22] fused the data from a simulation model with real-time data based on the Bayesian entropy information fusion method and a posterior regularized Bayesian neural network method combining soft and hard knowledge constraints. Paola A. D. et al. [
23] and Hoang G. M. et al. [
24] proposed a series of data fusion methods based on Bayesian reasoning, which have good fusion effects. Majumder S. et al. [
25], Jian L. et al. [
26], and Vators K. K. et al. [
27] applied fuzzy set theory to multisensor data fusion, which reduces the uncertainty of sensor data and lowers the computational complexity. Denoeux T. et al. [
28] and Kanjanatarakul O. et al. [
29] used the D-S evidence theory for multisource data fusion, which reduces the dependence on prior knowledge and demonstrates that a single-cycle fusion method has better fusion performance. In addition, some scholars have improved these basic data fusion methods. Mai F. J. et al. [
30] proposed a new method of optimizing data fusion based on a BPNN improved using a GSO algorithm, which improves the efficiency of data fusion and extends the network life. Hu Xiangdong et al. [
31] optimized the weights and thresholds of a neural network based on an artificial fish swarm algorithm and improved the data fusion accuracy of a wireless sensor network. Wang Hong et al. [
32] used the beetle antennae search algorithm to improve a particle swarm optimization BPNN and applied it to multisource data fusion, which effectively improves the fusion efficiency. Fan Leisong et al. [
33] proposed an improved data fusion method based on a BPNN and applied it to the fire monitoring sensor network, which effectively improves the monitoring accuracy and the efficiency of a wireless sensor network for forest fire monitoring. Li Jie et al. [
34] proposed a multi-sensor fusion method to improve the Dempster–Shafer evidence theory, considering that the complex space environment of the battlefield has led to many difficulties in target recognition of flight equipment. Lu S. F. et al. [
35] applied Dempster–Shafer evidence theory knowledge to information fusion and classification. Qu Qiang et al. [
36] put forward the theoretical method of reciprocal fuzzy closeness in calculating the weight of a data fusion model.
The above methods all aim to improve the fusion technique itself and do not examine the validity, completeness, and credibility of the simulation data source. The premise of data fusion is that the data to be fused are valid and complete. Incomplete semi-physical simulation samples cannot fully characterize the landing information of an aircraft, which degrades the landing prediction accuracy of the fusion model. In this case, an integrity detection of the semi-physical simulation data samples can be carried out: supplementary test conditions are designed for the sample points that do not satisfy the completeness requirements, the test condition data corresponding to the landing points are supplemented, and the test is performed again, so that a complete semi-physical simulation data source is obtained. Using the reconstructed semi-physical simulation dataset for data fusion modeling can substantially improve the accuracy of the multisource data fusion model.
Therefore, on the basis of analyzing the characteristics of a semi-physical simulation system, this paper proposes the concept of integrity detection of semi-physical simulation data and establishes a completeness test criterion based on field test data and expert estimation information. An integrity detection method for semi-physical simulation data is constructed, including data dimensionality reduction, data expansion, and regularization based on the bootstrap method, drawing on the statistical envelope of the confidence region and establishment of completeness test criteria. The integrity detection results are analyzed by the envelope diagram and the coverage calculation value. For the semi-physical simulation data that do not satisfy the integrity detection, a supplementary test condition model is established based on a BPNN method, and a complete semi-physical simulation data source is obtained through retesting. With the aid of the area measurement criterion, the semi-physical simulation data are compared with the fusion effect of the external field data before and after the addition of the landing point data, under the condition that the semi-physical simulation data do not satisfy the integrity detection conditions. The practicability and effectiveness of the method are verified through research in the field and a semi-physical simulation test of a 534A aircraft.
  2. Fusion Evaluation of Semi-Physical Simulation and Field Test Data
The assessment of aircraft landing point accuracy draws on many types of landing test. These include not only system-level field prototype tests but also alternative equivalent tests, such as numerical simulation tests, semi-physical simulation tests, and ground static simulation tests. Restricted by factors such as region, cost, and resources, few test data are available from system-level field prototype tests, and it is difficult to evaluate the landing point accuracy effectively with only a small amount of prototype test data. The information source should therefore be expanded, and other alternative equivalent test data should be fully utilized as auxiliary information. A common solution is to carry out a fusion evaluation on this basis, which is a multisource data fusion statistical method. In the evaluation process, it should be noted that the landing point data obtained from the substitute equivalent test may be incomplete. Fusing such data directly with the system-level field prototype test data biases the landing point fusion model and introduces errors into the predicted landing point coordinates. Incompleteness of semi-physical simulation system data [
37] refers to the situation of incomplete and missing information in the system information. Incomplete data information may appear in the process of information acquisition. Information loss caused by external uncertainties (such as failure of measuring equipment, failure of information recording) or information loss during information transmission reduces the reliability of the system analysis results.
Therefore, this paper proposes a method for integrity detection and supplementary test design of semi-physical simulation data of aircraft landing point coordinates. By establishing completeness inspection criteria, it is determined whether the semi-physical simulation data satisfy the completeness requirements. If not, simulation landing point supplementation is realized by the supplementary test condition design method, improving the accuracy of subsequent multisource landing data fusion. The multisource fusion evaluation process of the semi-physical simulation and field test data is shown in 
Figure 1.
  3. Integrity Detection Method for Semi-Physical Simulation Data
  3.1. Dimensionality Reduction in Landing Point Data
The semi-physical simulation landing point coordinates of an aircraft $(x, y, z)$ include three dimensions. Considering that two-dimensional data are more convenient to visualize and analyze, the coordinates of the simulated landing point are projected onto the three two-dimensional planes $xOy$, $yOz$, and $xOz$ to achieve data dimensionality reduction, as shown in Figure 2. Taking the coordinate data $(x, y)$ projected on plane $xOy$ as an example, the ordinate is divided into intervals. Referring to the selection method of the optimal histogram group spacing and the number of groups in histogram theory, the semi-physical simulation data interval $[y_{\min}, y_{\max}]$ is divided into $k$ equidistant intervals on the $y$-axis.
Many studies do not consider the statistical characteristics of the original data when selecting the group distance of a histogram; they directly adopt empirical rules based on the sample capacity and pay little attention to histogram theory and optimal histogram construction. Many methods for choosing the group distance exist in the literature, and the choice remains controversial. The group distance strongly affects the shape of the histogram and the population distribution characteristics it reveals, and an improper group distance can even “distort” the histogram. In the same way, when finding the envelope function for the semi-physical simulation data, different divisions of the ordinate of the projected scatterplot lead to very different or even “distorted” envelope functions.
Many scholars have discussed the determination of the histogram group distance. Mao Shisong adopts an empirical method and suggests that when the sample capacity n is large, the number of groups k is between 10 and 20; when the sample capacity n < 50, k is usually between 5 and 6, and the sample frequency in each interval is usually not less than 5. Xie Zhong-jie uses Moore’s formula in his work. Montgomery gives two suggestions for making histograms: ① the number of groups can be approximately equal to the square root of the sample capacity and ② the distances between groups are equal. Sturges obtains a rough relationship between the number of groups k and the sample capacity n under the assumption that the sample data satisfy the population distribution symmetry and approximate normality: $k = 1 + \log_2 n$. Many scholars and some statistical software still rely on this formula. Doane modifies the Sturges formula to account for the skewness of the sample data. The number of groups is $k = 1 + \log_2 n + \log_2\left(1 + |g_1| / \sigma_{g_1}\right)$, where $\sigma_{g_1} = \sqrt{6(n-2) / \left((n+1)(n+3)\right)}$, n is the sample capacity, and $g_1$ is the estimated skewness.
According to the above analysis, when the population distribution of the sample data is symmetrical and approximately normal, the Sturges formula considers both the population distribution characteristics and the sample capacity and is therefore the preferred method for determining the group distance. The main idea of the Sturges method is to approximate the normal distribution with a symmetric binomial distribution $B(k-1, 1/2)$. Consider an idealized grouping with the number of groups as k. Assume that the probability of each sample observation falling in the i-th interval approximately obeys a binomial distribution with a probability of $p = 1/2$. The average sample frequency of the i-th group is $C_{k-1}^{i}$, $i = 0, 1, \ldots, k-1$, where the number of combinations is $C_{k-1}^{i} = \frac{(k-1)!}{i!\,(k-1-i)!}$. Then, when k is large, such an ideal histogram can be approximated as a normal distribution with mean $(k-1)/2$ and variance $(k-1)/4$. Since the total sample capacity is

$$n = \sum_{i=0}^{k-1} C_{k-1}^{i} = 2^{k-1},$$

the logarithm of both sides is taken to obtain the Sturges formula of the number of groups k with respect to the sample capacity n:

$$k = 1 + \log_2 n.$$
If the three-dimensional landing point data of the semi-physical simulation pass the hypothesis tests, that is, the population distribution is symmetrical and approximately normal, the Sturges method is selected to divide the projected ordinate data into equidistant intervals.
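The group-number rules discussed above can be illustrated with a short Python sketch. It is only an illustration: the function names are hypothetical, the fractional group number is rounded up (a common convention that the text does not specify), and Doane's rule is implemented in its usual textbook form, which needs at least three observations.

```python
import math

def sturges_bins(n):
    """Sturges rule: k = 1 + log2(n), rounded up (rounding is an assumption)."""
    return 1 + math.ceil(math.log2(n))

def doane_bins(data):
    """Doane's skewness-corrected rule (requires len(data) >= 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    g1 = m3 / m2 ** 1.5 if m2 > 0 else 0.0        # estimated skewness
    sigma_g1 = math.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))
    return 1 + math.ceil(math.log2(n) + math.log2(1 + abs(g1) / sigma_g1))

def equal_width_edges(data, k):
    """Split [min(data), max(data)] into k equidistant intervals."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]
```

For symmetric data the skewness term vanishes and both rules return the same number of groups.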
  3.2. Data Expansion and Regularization Based on the Bootstrap Method
Considering that the coordinates of the landing point of an aircraft are small sample data, the data need to be expanded and regularized. At present, many methods have been accumulated in engineering to address the problem of small sample data expansion. These methods can be roughly divided into two categories according to whether they utilize prior information. One is traditional estimation methods represented by the Bayesian method, which uses historical information as prior information to estimate the parameters of small sample information. The other is represented by the bootstrap method, which does not require historical prior information and obtains new samples by resampling the small sample data, thereby transforming the small sample problem into a large sample.
Since the focus of this paper is to expand the sample capacity based on the existing small sample data information, there is no prior information available, and a Bayesian-like method is not applicable. In recent years, some scholars have proposed data expansion methods based on neural networks, but the process of unsupervised learning in these methods requires a large sample capacity and is not suitable for the expansion of small sample data. To satisfy the needs of small sample data expansion, this paper presents a data expansion method based on bootstrapping. This method requires less calculation and is suitable for a situation in which the number of dynamic data sequence calculation calls is high, and the calculation amount is large.
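Definition 1:  The observed sample $X = (x_1, x_2, \ldots, x_n)$ is the overall sample, and its sample capacity is limited. The sample is called the original sample. If $X \sim F(x)$, where $F(x)$ is unknown, the sample is sorted to obtain the ordered sample $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. It is assumed that datapoints fall into each cell interval $[x_{(i)}, x_{(i+1)})$ with equal probability, that is, they are uniformly distributed in each cell. The empirical distribution function constructed from these original samples is:

$$F_n(x) = \begin{cases} 0, & x < x_{(1)} \\ i/n, & x_{(i)} \le x < x_{(i+1)},\ i = 1, \ldots, n-1 \\ 1, & x \ge x_{(n)} \end{cases}$$
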
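Let $\theta = \theta(F)$ be the unknown parameter of population distribution $F$, and let $\hat{\theta} = \hat{\theta}(F_n)$, computed from the empirical distribution function $F_n$ of the original sample, be an estimate of $\theta$. Then, the estimation error is:

$$T_n = \hat{\theta}(F_n) - \theta(F)$$

In the formula, $T_n$ is a function of X and F.
Let $X^* = (x_1^*, x_2^*, \ldots, x_n^*)$ be another set of samples resampled from $F_n$, and let

$$R_n = \hat{\theta}(F_n^*) - \hat{\theta}(F_n)$$

In the formula, $F_n^*$ is the empirical distribution of $X^*$ and $R_n$ is the bootstrap statistic of $T_n$.
Therefore, the mean and variance of $R_n$ characterize the mean and variance of the estimation error $T_n$. Using a computer to perform N rounds of sample resampling, N bootstrap statistics $R_n^{(1)}, R_n^{(2)}, \ldots, R_n^{(N)}$ can be obtained. Thus, the probability distribution of $R_n$ can be obtained from the frequency curve of these values, and the distribution of $T_n$ can be approximated by the distribution of $R_n$. From Formula (5), $\theta = \hat{\theta} - T_n \approx \hat{\theta} - R_n$. N samples of $\theta$ can be obtained through computer simulation (usually N is more than 1000). The distribution and eigenvalues of the unknown parameter $\theta$ can then be obtained via corresponding statistical methods.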
The most critical aspect of the above process is the sampling method. At present, there are three kinds of sampling methods, including sampling methods that generate n random numbers between 0 and 1 and the cumulative sum is one, methods that directly use the original data for replacement sampling, and empirical distribution sampling methods. Among them, the data generated by the first type of sampling method are concentrated in the middle area, and the generated data will not reflect the individual circumstances of the upper and lower limits of the original sample. The new sample sequence generated by the second type of method after sampling may not conform to the population distribution to which the original sample follows. The third type of method does not have the shortcomings of the other two methods, so the third type is used for the empirical distribution sampling method, which is detailed as follows:
Step 1: Take , rand (1) can randomly generate any decimal in [0,1]
Step 2: Define 
,
, where 
 means rounding to the nearest integer and satisfies:
Step 3: The method to generate a regeneration sample is:
After completing the distribution of unknown parameters  and estimation of eigenvalues, the distribution can be used to generate the required M sample points to implement data expansion.
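The resampling and expansion procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the empirical-distribution draw is assumed to interpolate linearly between adjacent order statistics, the estimated parameter is taken to be the mean, and the function names `edf_sample`, `bootstrap_mean`, and `expand_sample` are hypothetical.

```python
import random
import statistics

def edf_sample(sorted_x, rng):
    """Draw one value from the ordered sample (assumed rule: pick a cell
    between adjacent order statistics, then interpolate uniformly in it)."""
    n = len(sorted_x)
    beta = (n - 1) * rng.random()   # position in [0, n - 1)
    i = int(beta)                   # 0-based index of the lower order statistic
    return sorted_x[i] + (beta - i) * (sorted_x[i + 1] - sorted_x[i])

def bootstrap_mean(sample, n_rounds=1000, seed=0):
    """Return the estimate and n_rounds bootstrap draws of the mean."""
    rng = random.Random(seed)
    xs = sorted(sample)
    theta_hat = statistics.fmean(xs)
    draws = []
    for _ in range(n_rounds):
        resample = [edf_sample(xs, rng) for _ in xs]
        r_n = statistics.fmean(resample) - theta_hat   # bootstrap statistic
        draws.append(theta_hat - r_n)                  # theta ~ theta_hat - R_n
    return theta_hat, draws

def expand_sample(sample, m, seed=0):
    """Expand a small sample (at least two points) to m points."""
    rng = random.Random(seed)
    xs = sorted(sample)
    return [edf_sample(xs, rng) for _ in range(m)]
```

Because every draw is an interpolation between two neighbouring observations, the expanded sample never leaves the range of the original data.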
  3.3. Statistical Envelope of the Confidence Region
At present, statisticians have accumulated many methods for constructing confidence intervals. According to whether prior information is used, these methods can be roughly divided into two categories. The first comprises estimation methods represented by the Bayesian method, which integrates population information, sample information, and prior information; the parameters are estimated in intervals using historical information as prior information. The second comprises classical statistical estimation methods that use only the population information and sample information. These can be further divided according to whether the type of the population distribution is known: nonparametric interval estimation methods represented by Hodges‒Lehmann, which estimate the parameter interval directly from the sample information when the population distribution type is unknown, and interval estimation methods for a known population distribution, such as estimating the interval of the mean $\mu$ or variance $\sigma^2$ of a normal population.
Since the focus of this paper is estimating the confidence interval of the sample data after bootstrap resampling and expansion, there is no prior information available and Bayesian-class methods are not applicable. Conventional confidence interval methods, such as the interval estimation of the single-parameter mean $\mu$ of a normal population, essentially cover the point estimate of the mean $\mu$ with a confidence interval. This paper, however, tests the integrity of the semi-physical simulation data by drawing a polygon envelope function graph that must cover the original sample data, so a conventional confidence interval is not suitable. The coordinate data of the landing points follow a normal distribution according to the above test, and for a normal distribution an interval of width $6\sigma$ centred on the mean covers 99.7% of the data. Assuming that the mean and standard deviation obtained by bootstrap resampling are $\mu$ and $\sigma$, respectively, the confidence interval can be $[\mu - 3\sigma, \mu + 3\sigma]$.
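As a quick numerical check of the 99.7% figure, the following Python sketch computes the interval of three standard deviations on either side of the mean and its nominal coverage under normality. The function names are hypothetical, and the sample standard deviation (divisor n - 1) is an assumption.

```python
import math
import statistics

def three_sigma_interval(values):
    """Interval [mu - 3*sigma, mu + 3*sigma] from a list of values."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)   # sample standard deviation (assumed)
    return mu - 3.0 * sigma, mu + 3.0 * sigma

def normal_coverage(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))
```

Here `normal_coverage(3)` evaluates to about 0.9973, matching the figure quoted in the text.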
  3.4. Integrity Detection Standard
After obtaining the envelope function  of the semi-physical simulation data and the envelope function  of the external field data, the completeness of the semi-physical simulation data is judged as follows:
(a) If 
, as shown in 
Figure 3a, calculate the area enclosed by 
 and 
 separately:
Then, there are:
(i) If  (where  is the minimum acceptable value estimated by experts, such as 80%), it is considered that the semi-physical simulation data have a high degree of completeness. There is no need to supplement data, and it can directly participate in the follow-up research.
(ii) If , it means that the coverage of the semi-physical simulation data is insufficient, and it is necessary to supplement the data before participating in the follow-up research.
(b) If 
, as shown in 
Figure 3b, it means that the coverage of semi-physical simulation data is insufficient, and it is necessary to supplement the data before participating in the follow-up research.
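The coverage calculation can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: each envelope is represented as a simple polygon given by its vertices in order, the simulation envelope is taken to lie inside the field envelope, and the coverage value is taken to be the ratio of the two enclosed areas. The names `polygon_area` and `coverage_check` are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def coverage_check(sim_envelope, field_envelope, epsilon=0.8):
    """Return (coverage ratio, passes) for an expert threshold epsilon.

    Assumes the simulation envelope is nested inside the field envelope,
    so the area ratio can be read as a coverage value.
    """
    ratio = polygon_area(sim_envelope) / polygon_area(field_envelope)
    return ratio, ratio >= epsilon
```

With the example threshold of 80%, a simulation envelope that encloses 81% of the field envelope area would pass, and one that encloses 25% would call for supplementary tests.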
  4. Design of Supplementary Experimental Conditions Based on a BPNN
If the data fail to pass the above test, it indicates that the completeness of the semi-physical simulation test data is insufficient. A proxy model of the semi-physical simulation system needs to be constructed according to the simulation data. The model is verified and corrected through field test data, and then the proxy model is used to invert the additional data that need to be supplemented. For the semi-physical simulation test conditions, the supplementary test condition data are used to retest and obtain a complete semi-physical simulation dataset to ensure that the two data sources to be fused have a good fusion foundation.
Regarding the further prediction of the coordinates of landing points, the prediction models include two categories. The first category is traditional, based on probability theories such as time series models and regression models; the second category is new, based on new algorithms such as machine learning, which mainly include neural networks, intelligent algorithms, and data mining. These new prediction models are mainly aimed at nonlinear systems and the prediction accuracy is higher than that of traditional models. The first type of model relies on the relevant principles of statistics for analysis, which requires the sample data to be relatively clear, and the sample data have a good distribution law. The second type of model is based on nonlinear systems. In the context of big data, new intelligent algorithms and models can make fast and effective predictions. Considering that traditional prediction models have too many constraints and strict requirements for samples, it is generally difficult to obtain better prediction accuracy.
Therefore, this paper establishes a semi-physical simulation system proxy model based on a BPNN and uses the neural network method for prediction. A BPNN has excellent performance, and the prediction accuracy is better. In the learning process, the input data of the semi-physical simulation system are input into the input layer, and the result is transmitted to the output layer after being processed by a hidden layer. If the result does not match the output of the actual semi-physical simulation system, the error is backpropagated to the hidden layer and the corresponding weights and thresholds are modified. After the learning process is complete, the verification phase begins. First, the field data are input, and then the results obtained by the neural network are compared with the actual situation and the model is revised. 
Figure 4 shows a BP neuron model with 
d inputs and 
n outputs. In the figure, 
 is the input vector of the neuron; 
 is the output vector of the hidden layer neuron; 
 is the input layer weight link; 
 is the hidden layer weight link; 
 is the bias of the hidden layer neuron; 
 is the output of the output layer; 
 is the output layer neuron bias; and the excitation function 
 is a sigmoid transfer function.
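The learning process described above can be sketched as a small single-hidden-layer network in plain Python. This is an illustrative sketch rather than the network used in this study: the class name `TinyBPNN`, the hidden layer size, the learning rate, the additive form of the biases, and the squared-error loss are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBPNN:
    """d inputs, q hidden neurons, n outputs, sigmoid activations."""

    def __init__(self, d, q, n, seed=0):
        rng = random.Random(seed)
        self.v = [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(d)]  # input -> hidden weights
        self.gamma = [0.0] * q                                               # hidden biases
        self.w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(q)]  # hidden -> output weights
        self.theta = [0.0] * n                                               # output biases

    def forward(self, x):
        h = [sigmoid(sum(x[i] * self.v[i][j] for i in range(len(x))) + self.gamma[j])
             for j in range(len(self.gamma))]
        y = [sigmoid(sum(h[j] * self.w[j][k] for j in range(len(h))) + self.theta[k])
             for k in range(len(self.theta))]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient step on a single sample; returns the loss before the update."""
        h, y = self.forward(x)
        # Output-layer error term for squared loss with a sigmoid output.
        g = [(y[k] - target[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
        # Error backpropagated to the hidden layer (uses weights before the update).
        e = [h[j] * (1 - h[j]) * sum(self.w[j][k] * g[k] for k in range(len(g)))
             for j in range(len(h))]
        for j in range(len(h)):
            for k in range(len(g)):
                self.w[j][k] -= lr * g[k] * h[j]
        for k in range(len(g)):
            self.theta[k] -= lr * g[k]
        for i in range(len(x)):
            for j in range(len(e)):
                self.v[i][j] -= lr * e[j] * x[i]
        for j in range(len(e)):
            self.gamma[j] -= lr * e[j]
        return 0.5 * sum((y[k] - target[k]) ** 2 for k in range(len(y)))
```

Because the output layer is a sigmoid, targets must be scaled to the interval (0, 1) before training.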
If the large-sample and small-sample data are directly mixed as the input to the BPNN, the information in the small-sample data will be swamped because of the difference in sample capacity. Therefore, the two data sources must first be brought to the same sample capacity. Using the data expansion theory in 
Section 3.2 and based on the bootstrap resampling method, the external field test samples are expanded to be consistent with the semi-physical simulation data sample capacity. The expanded field test samples and semi-physical simulation samples are used as the input to the BPNN.
There are many test conditions affecting the landing point of an aircraft, mainly including launch conditions and environmental conditions. The launch conditions include altitude, speed, and attitude angles in the $x$, $y$, and $z$ axial directions; environmental conditions include other disturbances such as temperature, wind speed, and wind direction. As the above eight test conditions can be established by the semi-physical simulation data model in this study, they form the output layer of the BPNN.
According to the settings of the input terminal and output terminal of the BPNN, the training of the neural network is started. The dataset containing the field test data in the semi-physical simulation data envelope is denoted by , and the dataset that is not included is denoted by .  data are substituted into the neural network to train the model. After completing the learning, the test conditions  of the dataset are taken as the center point. Search algorithms, such as the fastest gradient method and quasi-Newton iteration, are used to search in its neighborhood, and the test conditions, including the dataset , are inverted as supplementary semi-physical simulation test conditions. The semi-physical simulation test is repeated under the supplementary test conditions, and the obtained landing point coordinates are used as a supplement to the original semi-physical simulation landing point to complete the supplement required for the completeness of the simulation data.
  5. Evaluation of the Fusion Effect of Semi-Physical Simulation and Field Data
The semi-physical simulation landing point data that failed the integrity detection and were subsequently supplemented are fused with the field test data, and the empirical distribution function (EDF) of Definition 1 is plotted. As a control group, the semi-physical simulation data that failed the integrity detection and were not supplemented are fused directly with the field test data, and the EDF is plotted in the same way.
Considering the utilization of semi-physical simulation data information, the Bayesian fusion method is selected. The idea of Bayesian methods for solving statistical problems is different from classical statistical methods. One of their distinguishing features is using all possible information as much as possible while ensuring that the decision-making risk is as small as possible. This is not only the field test information but also the information before the field test, such as the useful information of the aircraft system during development, the information of the simulation test, and the test information of similar weapon equipment systems. The actual number of field trials can be small. Therefore, in the presence of the abovementioned prior information, as a data fusion method, the Bayesian method can be used for small sample test analysis.
Generally, Bayesian fusion easily obtains an explicit expression in the case of a normal distribution. The specific fusion process is described below. Let the external field sample data be . Each test sample datapoint is an -dimensional vector, and the -dimensional vector is denoted as . The semi-physical simulation sample data are . Each test sample is an -dimensional vector, and the -dimensional vector is denoted as . We use the semi-physical simulation data source as the prior information and the field test data source as the field sample for data fusion.
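Definition 2:  The observed sample $X = (x_1, x_2, \ldots, x_n)$ is the overall sample. The density function of $X$ is $p(x; \theta)$ and the parameter to be estimated is $\theta$. The Bayesian school treats $\theta$ as a random variable and treats $p(x; \theta)$ as the conditional probability density $p(x \mid \theta)$ of classical statistics. Under the prior distribution $\pi(\theta)$, a new understanding of $\theta$ is formed after the sample is observed, and the posterior density is obtained as

$$\pi(\theta \mid x) = \frac{p(x \mid \theta)\,\pi(\theta)}{\int_{\Theta} p(x \mid \theta)\,\pi(\theta)\,d\theta}$$

where $\pi(\theta)$ is the prior distribution, $p(x \mid \theta)$ is the likelihood function of the sample, and the distribution type of $\pi(\theta \mid x)$ is determined only by $p(x \mid \theta)\,\pi(\theta)$. Therefore, the Bayesian statistical model can be simplified as $\pi(\theta \mid x) \propto p(x \mid \theta)\,\pi(\theta)$.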
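The data $X_j = (x_{1j}, x_{2j}, \ldots, x_{nj})$ of the $j$-th dimension of the field test data source obey a normal distribution $N(\mu_j, \sigma_j^2)$, where $\sigma_j^2$ is treated as known and taken as the sample variance of $X_j$. Then, the joint probability density of $X_j$ is:

$$p(X_j \mid \mu_j) = \left(2\pi\sigma_j^2\right)^{-n/2} \exp\left(-\frac{1}{2\sigma_j^2}\sum_{i=1}^{n}\left(x_{ij} - \mu_j\right)^2\right)$$

Assuming that the data $Y_j$ of the $j$-th dimension of the semi-physical simulation data source obey a normal distribution $N(\mu_{0j}, \tau_j^2)$, then take $N(\mu_{0j}, \tau_j^2)$ as the prior distribution of $\mu_j$, that is:

$$\pi(\mu_j) = \frac{1}{\sqrt{2\pi}\,\tau_j} \exp\left(-\frac{\left(\mu_j - \mu_{0j}\right)^2}{2\tau_j^2}\right)$$

where $\mu_{0j}$ and $\tau_j^2$ are obtained by moment estimation using the data $Y_j$ of the $j$-th dimension of the semi-physical simulation data source. Then, $\pi(\mu_j \mid X_j) \propto p(X_j \mid \mu_j)\,\pi(\mu_j)$. Thus, the posterior density of $\mu_j$ is:

$$\pi(\mu_j \mid X_j) = \frac{1}{\sqrt{2\pi}\,\tau_{1j}} \exp\left(-\frac{\left(\mu_j - \mu_{1j}\right)^2}{2\tau_{1j}^2}\right)$$

It can be seen from the posterior density that the posterior distribution of the unknown parameter is also a normal distribution, with a mean of $\mu_{1j} = \dfrac{n\bar{x}_j/\sigma_j^2 + \mu_{0j}/\tau_j^2}{n/\sigma_j^2 + 1/\tau_j^2}$ and a variance of $\tau_{1j}^2 = \left(n/\sigma_j^2 + 1/\tau_j^2\right)^{-1}$. In the above formula, $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$. Similarly, the posterior probability density function (PDF) and posterior cumulative distribution function (CDF) in other dimensions of the field test data source can be obtained.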
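The normal fusion step for a single coordinate dimension can be sketched in Python as follows. It uses the standard conjugate result for a normal likelihood with known variance and a normal prior on the mean. The function name `normal_posterior` is hypothetical, and the choice of the n - 1 divisor for the field variance and the n divisor for the moment estimate of the prior variance is an assumption.

```python
import statistics

def normal_posterior(field, sim):
    """Posterior mean and variance of the landing coordinate mean.

    field: field test data for one dimension (likelihood, variance treated as known)
    sim:   semi-physical simulation data for the same dimension (prior)
    """
    n = len(field)
    x_bar = statistics.fmean(field)
    sigma2 = statistics.variance(field)   # sample variance, treated as known
    mu0 = statistics.fmean(sim)           # prior mean by moment estimation
    tau2 = statistics.pvariance(sim)      # prior variance by moment estimation
    precision = n / sigma2 + 1.0 / tau2
    mu1 = (n * x_bar / sigma2 + mu0 / tau2) / precision
    return mu1, 1.0 / precision
```

The posterior mean is a precision-weighted average, so it always lies between the field sample mean and the simulation mean, and it moves toward the field mean as the number of field tests grows.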
To quantify how good the fusion is, we use the area metric. The area metric was first proposed by the American scholars Ferson and Oberkampf [
38,
39] in 2008. It is a validation metric based on the distance between probability distributions and is mainly used in the field of modeling and simulation. As shown in 
Figure 5, all the real landing point data of an aircraft must come from the same data population, obey a specific distribution, and have a unique CDF. By calculating the area between the overall cumulative distribution function of the aircraft landing point data and the empirical CDF of the experimental observation fusion (the area of the shaded part in 
Figure 5), the fusion effect of the external field data and the semi-physical simulation data before and after expansion is quantified and evaluated.
Definition 3:  Assume that the overall cumulative distribution function of the real landing point data of the aircraft at time  is . After the fusion of the field data and the semi-physical simulation data, the cumulative distribution function at time  is . Following the area metric used in model validation, the area metric of the fusion effect in this paper is defined as:  It can be seen from the above formula that the closer the probability distribution of the fused data is to that of the real landing point data, the smaller the area metric, and vice versa. Therefore, the area metric characterizes the quality of the fusion of the field data and the semi-physical simulation data. If the integrity detection fails and the supplemented semi-physical simulation data are then fused with the field test data, the resulting value of  is smaller than with direct fusion. This means that the obtained empirical distribution function is closer to , which shows that our semi-physical simulation data completeness test method and supplementary experimental design are practical and effective.
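As a rough illustration of the area metric, the sketch below computes the area between two empirical distribution functions, that is, the integral of the absolute difference between them. It is written in Python for illustration only. Using a reference sample in place of the true population CDF is an assumption, and the names `edf` and `area_metric` are hypothetical.

```python
def edf(sample, x):
    """Empirical distribution function of `sample` evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)


def area_metric(sample_a, sample_b):
    """Area between the EDFs of two samples (integral of |F_a - F_b|)."""
    grid = sorted(set(sample_a) | set(sample_b))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both EDFs are constant on [left, right), so the integrand is constant there.
        area += abs(edf(sample_a, left) - edf(sample_b, left)) * (right - left)
    return area
```

Outside the range of the pooled data both functions equal 0 or 1, so those regions contribute nothing. A smaller value returned for the fused sample would indicate a distribution closer to the reference.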
  6. Application
Take the landing point data of a 534A aircraft as an example. The coordinates of the landing point where the aircraft hits the air for the first time are marked as the origin (0, 0, 0) and a space Cartesian coordinate system is established. 
, and 
 represent the coordinate axis data in the three directions. The front, right, and top are taken as the positive directions of the three coordinate axes and the unit of distance is meters. The sample size of the semi-physical simulation data of the landing point is 
 and the sample size of the field test data landing point is 
. These parameters are shown in 
Table 1. All the following calculations are implemented in MATLAB 2021b.
After reviewing the mainstream methods for determining the group distance (histogram bin width), we tested them on the 
z-axis data projected from the semi-physical simulation data on the plane 
 to find the best group distance determination method. The methods proposed by Mao Shi-song, Xie Zhong-jie, Montgomery, Sturges, Doane, and Scott are used to determine the group distance, and the grouping situation is shown in 
Figure 6. The number of groups is 
.
Clearly, the histograms based on the methods proposed by Mao Shi-song, Xie Zhong-jie, and Scott are too flat, and the original data information is largely lost. The normality of the z-axis data was tested with the Jarque–Bera test and the Lilliefors test (a modified Kolmogorov‒Smirnov test) at a significance level of 0.05; the p-values were 0.4093 and 0.1176, respectively. Both are greater than 0.05, so the null hypothesis is not rejected; that is, the z-axis data can be considered to obey a normal distribution, and it is therefore reasonable to use the methods proposed by Sturges, Doane, and Scott to determine the group distance.
The skewness of the data is tested as follows and the skewness coefficient is calculated:
Therefore, the sample data show no significant skewness, and Doane’s correction method is not needed. We choose the Sturges method to determine the group distance.
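A skewness check of this kind can be sketched as follows. The example is in Python for illustration and uses the moment-based coefficient g1 = m3 / m2^1.5, which is an assumption; the estimator used in this paper may differ, for example by a small-sample correction. The function name `sample_skewness` is hypothetical.

```python
from statistics import fmean


def sample_skewness(data):
    """Moment-based skewness coefficient g1 = m3 / m2**1.5."""
    mean = fmean(data)
    n = len(data)
    m2 = sum((v - mean) ** 2 for v in data) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in data) / n  # third central moment
    return m3 / m2 ** 1.5
```

A value near zero indicates an approximately symmetric sample, which is the situation in which Doane's skewness correction adds little to Sturges' rule.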
According to the projection map of the landing point data of the semi-physical simulation on the plane, the interval is divided according to the Sturges method of determining the group distance. The sample size is 50, so the number of groups k and the group distance h can be calculated as follows:
Taking −0.9 as the minimum value of interval division, the 
z-axis is divided according to the above criteria, as shown in 
Figure 7. After the three-dimensional landing point data are projected onto the three coordinate planes and divided into equal intervals along the vertical axis, the sample data in each interval are expanded by bootstrap resampling using the above method. The sample in each cell is expanded to a standard size of 100, and a scatterplot of the expanded sample data is shown in 
Figure 8. Similarly, the expanded sample data projected on the other two coordinate planes can be obtained by the same method.
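The interval division and sample expansion described above can be sketched as follows. This is an illustrative Python version, not the MATLAB implementation used in this paper. It assumes Sturges' rule in the form k = 1 + log2(n) rounded up, equal-width cells along the z-axis, and a plain bootstrap that resamples the points of each cell with replacement. The function names and the optional `z_min` parameter are hypothetical.

```python
import math
import random


def sturges_groups(n):
    """Number of groups by Sturges' rule, k = 1 + log2(n), rounded up (assumed rounding)."""
    return math.ceil(1 + math.log2(n))


def split_into_cells(points, k, z_min=None):
    """Split (x, z) points into k equal-width cells along the z-axis.

    z_min, if given, must not exceed the smallest z value in `points`.
    """
    zs = [z for _, z in points]
    lo = min(zs) if z_min is None else z_min
    h = (max(zs) - lo) / k                       # group distance
    if h == 0:
        raise ValueError("all points share the same z value")
    cells = [[] for _ in range(k)]
    for p in points:
        idx = min(int((p[1] - lo) / h), k - 1)   # the top edge falls in the last cell
        cells[idx].append(p)
    return cells


def bootstrap_expand(cell, target=100, rng=random):
    """Expand one cell to `target` points by resampling with replacement."""
    if not cell:
        return []                                # nothing to resample from
    return [rng.choice(cell) for _ in range(target)]
```

With n = 50 this form of Sturges' rule gives 7 groups. Cells that contain no original points stay empty, since a plain bootstrap cannot create data where none were observed.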
The confidence interval is calculated for the coordinates in each of the divided intervals, and the envelope diagram is drawn from the endpoints of the confidence intervals, as shown in 
Figure 9. To calculate the envelope area, the region enclosed by the confidence interval envelope is divided into parallel strips, as shown in 
Figure 10. The area of each small trapezoid is calculated and summed to obtain a total area value of 5.9632 for the confidence interval envelope. Similarly, the same method can be used to draw the confidence region envelope map projected on the planes 
xoy and 
yoz and to calculate the corresponding envelope area.
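The trapezoid summation used for the envelope area can be sketched as below. The example is in Python for illustration and assumes that the lower and upper confidence-interval endpoints of each cell are already available. The function name `envelope_area` and its argument layout are hypothetical.

```python
def envelope_area(z, lower, upper):
    """Area between the lower and upper envelope polylines by the trapezoid rule.

    z:     positions of the cells along the division axis (increasing).
    lower: lower confidence-interval endpoint in each cell.
    upper: upper confidence-interval endpoint in each cell.
    """
    area = 0.0
    for i in range(len(z) - 1):
        w0 = upper[i] - lower[i]           # envelope width at the left edge of the strip
        w1 = upper[i + 1] - lower[i + 1]   # envelope width at the right edge of the strip
        area += 0.5 * (w0 + w1) * (z[i + 1] - z[i])
    return area
```

Each strip between two neighboring cells is a trapezoid whose parallel sides are the two confidence-interval widths, so the sum equals the area enclosed by the two polylines.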
According to the above calculation method, the projection envelopes on the three two-dimensional coordinate planes 
, 
, and 
 of the semi-physical simulation data and the external field data are drawn and the corresponding areas are calculated, as shown in 
Figure 11. The black line is the semi-physical simulation data envelope polyline, the blue line is the external field data envelope polyline, and the corresponding area ratios of the planes 
, 
, and 
 are 71.75%, 56.61%, and 79.68%, respectively. The envelopes show that the integrity detection criterion is not satisfied. The simulation data that fail the integrity detection in this example are first fused directly with the field test data using the Bayesian method, and the fusion result is shown in 
Figure 12.
Because the above example does not satisfy the integrity detection conditions, a surrogate model of the semi-physical simulation system is built with the BPNN method and used to supplement the test conditions. The semi-physical simulation test is repeated under the supplementary test conditions to obtain supplementary landing point data, thereby obtaining a complete semi-physical simulation landing point dataset. The envelope of the field datapoints is shown in 
Figure 12.
Based on the test conditions corresponding to the landing point data in this example, the BPNN method is used to predict the test conditions of the field landing points that are not enveloped. The predicted results are shown in 
Table 2.
According to the prediction results of the test conditions that are not enveloped, the originally set test condition range is expanded. After that, the semi-physical simulation test is carried out under the complete test conditions and the corresponding simulation data of the landing point coordinates are obtained to realize the expansion of the semi-physical simulation landing data and obtain a complete semi-physical simulation landing dataset. Based on Bayesian theory, the semi-physical simulation data in the calculation example and the field test data are fused and analyzed, and the empirical distribution function image is drawn as shown in 
Figure 13.
The qualitative analysis in 
Figure 13 shows that, once the simulation data that failed the completeness test have been supplemented, the empirical distribution function (EDF) obtained by fusion with the field test data is closer to the cumulative distribution function of the actual flight landing points of the 534A aircraft. Let the CDF of the data population be 
; the EDF of the fusion after supplementation be 
; and the EDF of the directly fused data be 
. The area metrics calculated by Formula (13) are:
Therefore, the proposed method of integrity detection and supplementary test design for the semi-physical simulation data of the aircraft landing point is shown to be effective for data fusion.
  7. Discussions and Conclusions
Taking the test results of the landing point coordinates of a semi-physical simulation system of an aircraft as the research object, an integrity detection criterion based on field test data and expert estimation information is established. The integrity detection of semi-physical simulation data includes four steps: ① reducing the dimensionality of the 3D landing point data; ② realizing data expansion and regularization based on a bootstrap resampling method; ③ drawing the statistical envelope of the confidence region; and ④ establishing the integrity detection criterion. Based on the BPNN method, a surrogate model of the semi-physical simulation system is constructed for the landing points that do not satisfy the completeness requirement, and the test conditions for these landing points are supplemented.
In the application to the field and semi-physical simulation tests of aircraft 534A in 
Section 6, integrity detection and a supplementary experimental design were carried out on the landing point coordinate data of the semi-physical simulation. The results show that after supplementing the semi-physical simulation data that do not satisfy the integrity detection, the EDF obtained by fusion with the field test data is closer to the CDF of the actual landing points of aircraft 534A. This verifies that the method proposed in this paper lays a good foundation for the subsequent fusion of semi-physical simulation data and field test data. The approach has the following benefits:
(1) An integrity detection method for semi-physical simulation data is proposed that quantitatively tests the completeness of the simulation data of a semi-physical simulation system based on field landing point data and expert-estimated envelope conditions. This makes the assessment of data completeness, and of the effect of fusing simulation data with field test data, more useful in practice.
(2) The established test standard is applicable to data completeness measurements for both large and small samples (for small samples, bootstrap resampling can be used to expand the data).
(3) The established test standard is applicable both to the integrity detection of a single index and to the overall completeness measure of multiple indexes of an entire system (after data dimensionality reduction), and therefore has good generality.
At the same time, it should be noted that fully verifying the completeness of the model data requires a variety of methods, with the model indicators aggregated bottom-up, index by index and layer by layer. Only then can the final comprehensive evaluation of the completeness of the semi-physical simulation model be obtained. In addition, the method in this paper has been applied only to semi-physical simulation data. Equivalent experiments for aircraft include not only semi-physical simulation experiments but also digital simulation experiments, numerical simulation experiments, ground static simulation experiments, and so on. Whether the data processing method in this paper can be directly applied to data from these other equivalent experiments needs further study.