1. Introduction
The global threat landscape now includes terrorist groups, extremist organizations, human traffickers, drug cartels, mercenaries, and other destabilizing actors. The increasing use of asymmetric tactics, particularly improvised explosive devices (IEDs) and targeted attacks, has further complicated the security environment [1]. Among these threats, attacks on the aviation sector present especially serious challenges in today’s interconnected world. Airports and aircraft, as symbols of global connectivity, are attractive targets because disruptions can cause large-scale devastation [2]. Suicide attackers often carry explosives on their person or hide them in vehicles, including aircraft [3].
To counter these threats, various government and nongovernmental organizations began developing, certifying, and deploying explosive trace detectors (ETDs) before the 21st century [4]. In parallel, aviation security has advanced through the adoption of increasingly sophisticated explosive detection technologies. Ion mobility spectrometry (IMS), introduced in the 1990s, became widely used at airports due to its rapid detection of trace explosive residues [5]. Mass spectrometry (MS) followed in the early 2000s, offering enhanced specificity and precision [6]. MS methods are also used to detect viruses, such as coronaviruses, for disease diagnostics [7].
Several studies have evaluated IMS-based ETDs, investigating detection limits, ionization efficiency, and surface-dependent sensitivity under real-world conditions [5,8,9]. These studies highlighted the impact of surface properties and operational environments, confirming IMS as a rapid, sensitive, and field-deployable technique. However, careful optimization of sampling and ionization conditions is essential to ensure reliability and minimize false alarms [5,8].
Methodological advances in statistical evaluation have also been proposed to improve ETD testing. For instance, binomial confidence intervals and Fisher’s exact test were applied to improve accuracy in estimating detection probabilities, especially under limited data conditions [10]. Another study identified temperature and humidity as key factors affecting the measurement stability of IMS-based ETDs, emphasizing the need to account for environmental effects during certification [11].
Despite these efforts, most prior studies focused on individual IMS-based devices, limiting comparative insights into measurement uncertainty and performance stability across ETDs. Differences in ionization methods, internal components, and environmental sensitivity have not been fully explored. Research has largely emphasized technological development, with limited attention to evaluating real-world operational performance.
To address these gaps, this study systematically compared the measurement uncertainty, operational stability, and environmental robustness of two commercially available IMS-based ETDs under controlled conditions. In particular, it focused on a statistical perspective to analyze phenomena not easily explained by physical approaches. This comparative analysis offers deeper insights into practical reliability and contributes to the advancement of certification processes.
This paper is organized as follows:
Section 2 introduces IMS-based ETDs.
Section 3 describes the experimental procedure.
Section 4 presents and discusses the results.
Section 5 concludes the study.
2. Explosive Trace Detectors Based on Ion Mobility Spectrometry
ETDs are employed to detect trace amounts of explosive residues, thereby helping to prevent bombing attacks on passenger aircraft. ETDs identify explosive materials by detecting particles or vapor traces [9]. Various methods exist for detecting explosive residues, with MS and IMS being the most commonly employed.
Mass spectrometry is an analytical technique that ionizes a sample under vacuum and separates the resulting ions according to their mass-to-charge ratios using geometric path analyzers such as magnetic sectors or time-of-flight instruments. In contrast, ion mobility spectrometry separates ions based on their mobility through a drift gas under an applied electric field. Ion mobility is influenced by both the mass-to-charge ratio and molecular structure and is typically normalized through calibration with internal standards. Both techniques provide complementary information and are widely utilized for the detection and characterization of trace explosive compounds [12]. By the early 2000s, the use of IMS had already expanded significantly across various applications [5,12]. Consequently, this study focuses specifically on IMS-based ETDs.
Two commercially available IMS-based ETDs were compared. To avoid potential brand bias, the devices are referred to as Product A and Product B. Key specifications for both devices are summarized in Table 1.
Manufacturers implement ionization in different ways, and the two devices evaluated in this study employ distinct techniques: dielectric barrier discharge (DBD) and impulsed corona discharge (ICD). The principles, advantages, and disadvantages of these techniques are summarized below.
DBD generates non-thermal plasma by applying alternating high voltage across electrodes separated by a dielectric layer. This method provides stable plasma generation, reduced electrode degradation, and consistent ion production even under varying humidity conditions, making it suitable for long-term laboratory-based operation. However, DBD sources typically involve more complex circuitry and moderately higher power consumption [13].
In contrast, ICD employs short-duration high-voltage pulses to produce localized corona discharges at a sharp electrode tip. This technique enables compact device design and low power consumption, which is ideal for portable or field-deployable systems. Corona discharge ionization sources are widely adopted as non-radioactive alternatives capable of generating high ion currents, enhancing sensitivity and reliability in IMS-based detection [14]. Additionally, corona discharge has demonstrated effective direct ionization of analytes from surfaces, facilitating field applications without extensive sample pretreatment [15]. However, ICD may be more sensitive to environmental fluctuations, potentially affecting measurement stability during prolonged use [13].
Thus, the choice of ionization source significantly impacts the analytical sensitivity, operational stability, and field applicability of IMS-based ETDs. Both evaluated devices can operate on battery power; however, they were connected to an external power supply throughout the experiments. To facilitate direct comparison despite manufacturer-specific units, visualization analyses were conducted using normalized data.
3. Experimental Procedure for Head-to-Head Comparison of Measurement Variability Between Two Commercial Explosive Trace Detectors
In this study, a comparative analysis of the measurement uncertainty in two ETDs was conducted according to the procedure illustrated in Figure 1.
When an explosive is detected by an ETD, a quantitative measurement is displayed. An alarm is triggered if the measurement exceeds a predefined threshold. All measurements inherently involve uncertainty, formally defined as measurement uncertainty [16]. In this context, the variance of these measurements serves not only as a key indicator of sensor performance [17] but also as a contributing factor in estimating measurement uncertainty.
The first step was to prepare TNT at the 5 ng detection limit and apply it to swabs for measurement. Sim et al. [11] identified consecutive operation as a key influencing factor using a cause-and-effect diagram; the resulting experimental conditions are summarized in Table 2.
The target hazardous substance was TNT dissolved in acetone. Each device was tested using its designated swab. A 5 ng TNT solution was applied to each swab at the specific location indicated by the red box in Figure 2. To preserve product anonymity, a generic illustration of the swab, rather than an actual photograph, is used to represent its typical shape. Swabs were manually inserted into the device through a chamber-like heated inlet and remained inside only until detection was confirmed. A new swab was used for every measurement: the solution was applied to one side only, and the swab was discarded after the test. Although the two devices used in the experiment were not newly manufactured, they were purchased around the same time and were consistently maintained.
The operational intervals were set to 20, 40, 60, and 80 consecutive operations. To ensure comparability, 240 measurements (the least common multiple of these intervals) were performed for each interval. Accordingly, 12 cycles were conducted for the 20-operation interval, 6 for 40, 4 for 60, and 3 for 80. After each cycle, the detector’s built-in cleaning function was manually activated once for exactly two minutes. When more than 8 h elapsed between experiments, the device was rebooted and calibrated with the calibration pen provided with the device (applied to a swab) before testing resumed.
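The cycle counts above follow from simple arithmetic, which can be checked with a short sketch. The variable names are illustrative and not taken from the study.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Consecutive-operation intervals described in the text.
intervals = [20, 40, 60, 80]

# Measurements per interval condition: the least common multiple of the intervals.
total_per_interval = reduce(lcm, intervals)

# Number of cycles (runs between cleanings) needed to reach that total.
cycles = {n: total_per_interval // n for n in intervals}

# Total measurements per device across all four interval conditions.
total_per_device = total_per_interval * len(intervals)

print(total_per_interval)  # 240
print(cycles)              # {20: 12, 40: 6, 60: 4, 80: 3}
print(total_per_device)    # 960
```

Using the least common multiple gives every interval condition the same sample size, so differences in variance cannot be attributed to unequal numbers of measurements.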
Following sample preparation and data collection, the repeated measurements were analyzed with a Type A evaluation of measurement uncertainty. The resulting uncertainty was expressed as the standard uncertainty ($u$) and the expanded uncertainty ($U$), as defined in Equations (1) and (2):
$$u = \frac{s}{\sqrt{n}} \tag{1}$$
$$U = k \cdot u \tag{2}$$
In these equations, $s$ is the sample standard deviation, $n$ is the number of measurements, and $k$ is the coverage factor for the chosen confidence level.
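As a minimal sketch of this Type A evaluation, the following assumes that Equations (1) and (2) take the usual form u = s/√n and U = k·u. The function name and the readings are hypothetical and are not data from the study.

```python
import math
import statistics


def type_a_uncertainty(values, k=2.0):
    """Return (standard uncertainty u, expanded uncertainty U) of the mean.

    Assumes u = s / sqrt(n) and U = k * u, where s is the sample
    standard deviation (n - 1 denominator).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    s = statistics.stdev(values)
    u = s / math.sqrt(n)
    return u, k * u


# Hypothetical detector readings in arbitrary manufacturer units.
readings = [102.0, 98.0, 101.0, 99.0]
u, U = type_a_uncertainty(readings, k=2.0)
print(f"u = {u:.4f}, U = {U:.4f}")
```

Because n and k are fixed within each interval condition, both u and U scale directly with s, so comparing uncertainties across intervals reduces to comparing sample standard deviations.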
Subsequent statistical tests were conducted to evaluate whether measurement uncertainty was consistent across operational intervals. Significant differences among the 20-, 40-, 60-, and 80-measurement groups were then examined.
Data normality was first assessed with the Shapiro–Wilk and Anderson–Darling tests; this step informed the choice of variance tests and clarified whether uncertainty remained stable or varied with the number of consecutive operations.
To enable a fair comparison between Product A and Product B—whose output units differ—the data were rescaled to the 0–1 range by min–max normalization. This common scale allowed direct comparison of measurement patterns.
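The rescaling step can be sketched as follows. The sketch assumes each product's data are normalized separately against that product's own minimum and maximum, which the text implies but does not state. The sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; min-max scaling is undefined")
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical readings in two different manufacturer-specific units.
product_a = [410.0, 395.0, 402.0, 420.0]
product_b = [1.8, 2.6, 2.1, 1.5]

norm_a = min_max_normalize(product_a)
norm_b = min_max_normalize(product_b)
print(norm_a)
print(norm_b)
```

Min-max scaling depends on the two extreme observations alone, so a single outlier compresses the remaining values toward one end of the range. This is worth keeping in mind when reading plots of normalized data.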
The normalized data were subsequently explored with distribution plots, 95% confidence interval charts, and density graphs. These visualizations provided intuitive insight into measurement consistency, variability, and performance trends for the two ETDs under diverse operational and environmental conditions.
4. Experimental Evaluation
Four consecutive operation intervals (20, 40, 60, and 80 measurements) were selected, and a total of 240 measurements (the least common multiple of these intervals) were performed for each interval. Consequently, measurements were conducted for 12 cycles at the 20-measurement interval, 6 cycles at the 40-measurement interval, 4 cycles at the 60-measurement interval, and 3 cycles at the 80-measurement interval. After completing each cycle, the ETD was cleaned once. The data collected using these methods are shown in Table 3.
TNT was detected sequentially using each product, and the corresponding measurement values were recorded. At the beginning of each measurement cycle, the date, temperature, and relative humidity were documented. Since temperature and relative humidity were controlled using an air conditioning system, it was assumed that no significant variation occurred in these parameters during each cycle.
4.1. Comparative Analysis of Measurement Characteristics Between Explosive Trace Detectors Under Experimental Conditions
Analysis of Standard Uncertainty by Interval
Each product detected TNT 960 times (240 measurements for each of the four intervals), and the corresponding measurements were recorded. As noted above, temperature and relative humidity were treated as constant within each cycle. The Type A standard uncertainties and their associated expanded uncertainties were then calculated for each consecutive operation interval, as summarized in Table 4.
Table 4 presents the Type A standard uncertainties and their associated expanded uncertainties for each interval, expressed in the units specific to Products A and B. Expanded uncertainties were calculated with a coverage factor of k = 2, corresponding to a confidence level of approximately 95%. Visually, Product B shows interval-dependent changes in standard uncertainty, whereas Product A remains nearly constant. Because the values in Table 4 are point estimates, statistical tests are required to determine whether the apparent differences are significant.
To test equality of variances, a suitable method must be selected. The F-test applies only to two groups. For more than two groups, as is the case in this study, Bartlett’s test is appropriate when the data are normally distributed, whereas Levene’s test is preferred when the normality assumption is violated. Accordingly, normality was assessed for each product with the Shapiro–Wilk and Anderson–Darling tests.
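The selection rule described above can be written as a small decision function. This is an illustrative sketch only: the function names are hypothetical, and returning Levene's test for two non-normal groups is an assumption that goes beyond what the paragraph states.

```python
def looks_normal(p_values, alpha=0.05):
    """True when none of the normality tests rejects H0 at level alpha."""
    return all(p > alpha for p in p_values)


def choose_variance_test(n_groups, all_groups_normal):
    """Pick a variance-equality test following the rule in the text."""
    if n_groups < 2:
        raise ValueError("at least two groups are required")
    if not all_groups_normal:
        # Assumption: Levene's test is also used for two non-normal groups.
        return "Levene"
    return "F-test" if n_groups == 2 else "Bartlett"


# Hypothetical p-values from two normality tests for each of four intervals.
p_by_interval = {20: [0.01, 0.02], 40: [0.00, 0.00], 60: [0.30, 0.25], 80: [0.04, 0.03]}
all_normal = all(looks_normal(p) for p in p_by_interval.values())
print(choose_variance_test(len(p_by_interval), all_normal))  # Levene
```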
To assess whether the data follow a normal distribution, we applied two widely used tests: the Shapiro–Wilk and Anderson–Darling tests. Both evaluate the null hypothesis ($H_0$) that the sample is drawn from a normally distributed population and produce p-values, which are compared to a significance level of $\alpha = 0.05$.
The Shapiro–Wilk test calculates the $W$ statistic, as defined in Equation (3):
$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} \tag{3}$$
where $n$ is the sample size, $x_{(i)}$ refers to the ordered data values, $x_i$ is the $i$-th observation in the sample, $\bar{x}$ is the mean, and $a_i$ refers to optimally derived weights based on the expected order statistics of a normal distribution [18]. The p-value is calculated using Royston’s log-normal approximation [19].
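The Anderson–Darling test calculates the $A^2$ statistic, as defined in Equation (4):
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[\ln \Phi\left(z_{(i)}\right) + \ln\left(1 - \Phi\left(z_{(n+1-i)}\right)\right)\right] \tag{4}$$
where $z_{(i)} = (x_{(i)} - \bar{x})/s$ are the standardized ordered values and $\Phi$ is the cumulative distribution function of a standard normal distribution [20]. After applying Stephens’s finite-sample correction to obtain $A^{*2}$ as defined in Equation (5),
$$A^{*2} = A^2\left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right) \tag{5}$$
the p-value is determined using piecewise exponential approximations provided by Stephens [21].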
Both the Shapiro–Wilk and Anderson–Darling procedures test the null hypothesis that the sample is normally distributed. The resulting p-value indicates the probability of obtaining a test statistic at least as extreme as the observed one, assuming the null hypothesis is true [18,19,20,21]. If this probability is ≤0.05, the evidence against normality is considered statistically significant, and the null hypothesis is rejected. If the p-value exceeds 0.05, we fail to reject the null hypothesis, indicating no statistically detectable deviation from normality. In other words, a p-value greater than 0.05 is treated as consistent with normality, while a value of 0.05 or less indicates a significant departure from it. The p-values obtained from both tests are summarized in Table 5.
As shown in Table 5, Product A produced p-values below 0.05 for every interval except 60, indicating departures from normality in those groups. Product B displayed a similar pattern: p-values in both normality tests were near zero for all intervals except 80, so normality was likewise rejected for that device. The Shapiro–Wilk results confirmed these findings.
Because the data for both products violated the normality assumption, Levene’s test was used to compare variances across intervals. The corresponding statistics appear in Table 6 (Product A) and Table 7 (Product B).
Although both products use the same IMS technique, the variance tests produced different outcomes. For Product A, the p-values indicated no significant differences among intervals. This result suggests that changes in the number of operations do not significantly affect the measurement uncertainty of Product A. Therefore, Product A demonstrates greater stability with respect to operational cycles. In contrast, Product B displayed highly significant variance changes. Consequently, pairwise Levene’s tests were applied to Product B to pinpoint which intervals differed; the results are reported in Table 8.
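The overall and pairwise comparisons can be sketched as follows. The source does not state which centering was used, so median centering (the Brown–Forsythe variant) is an assumption here. The sketch returns only the W statistic. A p-value follows by referring W to an F distribution with (k − 1, N − k) degrees of freedom, which the standard library does not provide. The group data are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def levene_statistic(groups, center=median):
    """Levene's W statistic for equality of variances across groups."""
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    # Absolute deviations of each value from its own group's center.
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(v - c) for v in g])
    sizes = [len(g) for g in z]
    n_total = sum(sizes)
    group_means = [mean(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(n * (gm - grand_mean) ** 2 for n, gm in zip(sizes, group_means))
    within = sum((v - gm) ** 2 for g, gm in zip(z, group_means) for v in g)
    if within == 0:
        raise ValueError("no within-group spread in the absolute deviations")
    return (n_total - k) / (k - 1) * between / within


def pairwise_levene(named_groups):
    """W statistic for every pair of groups, keyed by the pair of names."""
    return {
        (a, b): levene_statistic([named_groups[a], named_groups[b]])
        for a, b in combinations(named_groups, 2)
    }


# Hypothetical normalized readings grouped by consecutive-operation interval.
data = {
    20: [0.40, 0.42, 0.41, 0.39, 0.43],
    40: [0.20, 0.55, 0.35, 0.60, 0.25],
    60: [0.30, 0.33, 0.31, 0.29, 0.32],
    80: [0.31, 0.30, 0.33, 0.32, 0.29],
}
print(levene_statistic(list(data.values())))
for pair, w in pairwise_levene(data).items():
    print(pair, round(w, 3))
```

Running six pairwise tests inflates the chance of a false positive. A multiplicity correction such as Bonferroni is a common safeguard, although the text does not say whether one was applied.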
As shown in Table 8, pairwise Levene’s tests for Product B revealed significant variance differences for most interval pairs. Comparisons of 20 vs. 40, 20 vs. 60, 20 vs. 80, 40 vs. 60, and 40 vs. 80 all produced p-values < 0.05, indicating heterogeneity of variances. In contrast, the 60 vs. 80 comparison yielded p = 0.245, indicating no significant difference. These results suggest that Product B exhibits reduced measurement stability when operated in intervals shorter than 60 consecutive operations, with variance stabilizing only after reaching 60 or more consecutive operations.
These findings corroborate earlier results, demonstrating that measurement uncertainty during consecutive operations can vary even between devices that share the IMS principle. Because Products A and B differ in physical size, internal components, and—most notably—ionization method, these factors are the most plausible drivers of the observed disparities in uncertainty as the run length increases.
4.2. Comparison of Patterns by Environment Using Visualization
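To further examine the measurement characteristics of Products A and B, we visually analyzed their normalized patterns under varying environmental conditions. Given that temperature and relative humidity influence the performance of IMS-based ETDs, all data were rescaled to a common range using min–max normalization, as defined in Equation (6):
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$
where $x$ is an original measurement, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data being rescaled, and $x'$ is the normalized value.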
This unity-based transformation preserves the rank order of observations while imposing a common numerical scale, thereby preventing high-magnitude features from dominating distance metrics or gradient updates [22]. After normalization, we visualized 95% confidence intervals for each operational interval to examine how measurement uncertainty evolves across consecutive operations. Because the number of replicates and the coverage factor were fixed when estimating Type A uncertainty, the sample standard deviation was used as the sole indicator of uncertainty.
Figure 3 shows these confidence intervals, providing a clear basis for comparing the measurement stability of Products A and B across all operational intervals and environmental conditions.
As indicated by the statistical test results presented in Section 4.1, the confidence intervals for the standard deviation of Product A remained relatively stable despite changes in the number of consecutive operations. In contrast, Product B’s standard deviation was not statistically different from that of Product A at the 20-operation interval, but it exhibited a significant increase at the 40-operation interval, followed by a noticeable decrease after 60 operations. Based on Figure 3, measurement uncertainty for Product B initially demonstrated an increasing trend but appeared to stabilize after exceeding 60 consecutive operations.
The subsequent figures further illustrate how each device responds differently under varying operational conditions, influenced by environmental factors. Specifically, Figure 4 presents the normalized measurement data plotted sequentially from left to right in the order they were collected.
Figure 4 shows normalized measurement values of Product A and Product B plotted sequentially. Vertical lines and shaded areas delineate the consecutive operation intervals: Interval 20 (indices 1–240), Interval 40 (indices 241–480), Interval 60 (indices 481–720), and Interval 80 (indices 721–960).
In general, Product A consistently exhibits higher normalized measurement values than Product B across all intervals. Product A’s data points (blue squares) show persistent variability throughout the test sequence. In contrast, Product B’s values (red circles) are consistently lower and display clear changes in variability across intervals. During Intervals 20 and 40, both products exhibit relatively high dispersion, indicating unstable measurement behavior. However, beyond 60 consecutive operations, Product B shows reduced variance and greater consistency compared to Product A, suggesting improved stability and lower measurement uncertainty under extended use.
Figure 5 illustrates density distributions of normalized measurements for Product A and Product B under two environmental conditions: (a) temperature and (b) relative humidity.
In panel (a), Distribution by Temperature, the measurement distributions of Product A and Product B show generally similar shapes across most temperature levels, with a notable exception at 21.4 °C, where the difference becomes more pronounced. As temperature increases, the distributions increasingly overlap, indicating convergence in measurement behavior under higher-temperature conditions.
In panel (b), Distribution by Relative Humidity, Product B consistently shows narrower distributions than Product A across the humidity range of approximately 0.43 to 0.50. Additionally, Product B’s distributions are typically unimodal and symmetric, suggesting more consistent and stable measurement performance under varying humidity conditions.
These similarities in distribution patterns across temperature levels suggest that the behavior may be attributed to the inherent properties of IMS technology. This is particularly notable given that, as demonstrated in Section 4.1, Products A and B produced markedly different levels of measurement uncertainty despite both using IMS.
5. Conclusions
In this study, we conducted a comparative analysis of measurement uncertainty and performance stability for two IMS-based explosive trace detectors (ETDs). Given the critical role ETDs play in aviation security, it is essential to assess their performance under realistic operational and environmental conditions.
Statistical tests revealed that despite employing the same IMS principle, the two ETDs showed distinct differences in measurement uncertainty. Product A exhibited stable performance across all intervals, while Product B displayed greater variability that stabilized only after extended use. Visualization analyses further showed differing responses to temperature and humidity, underscoring the importance of environmental robustness in evaluating ETDs.
These findings demonstrate that variations in ionization methods, hardware specifications, and internal architecture can significantly impact device performance. However, the consistency observed under varying temperature conditions also highlights the intrinsic thermal stability of IMS technology.
This study provides a structured, data-driven framework for comparative evaluation of commercial ETDs. Unlike previous studies that focused on individual devices in development settings, our research evaluates side-by-side performance and environmental resilience. This approach offers practical insights that can support certification, procurement, and deployment decisions in the field.
Based on these results, improvements to the performance certification test procedures for ETDs should be considered to better reflect operational realities.
The methodology and findings presented here contribute to closing the gap between development-oriented studies and operational evaluation. Future work should extend this framework to include additional devices and broader testing conditions to advance the reliability and effectiveness of aviation security technologies.