1. Introduction
The globalization of integrated circuit (IC) supply chains has intensified concerns regarding hardware Trojan (HT) insertion during third-party fabrication. Unlike software vulnerabilities, HTs exploit the physical nature of hardware, making them difficult to detect and capable of undermining the security of critical systems. Traditional detection methods rely heavily on side-channel analysis (SCA) or comparison against a trusted “golden” reference design. However, these approaches suffer from major drawbacks: SCA signals are inherently noisy and environment-dependent [1], and golden reference models are costly and often impractical to maintain in untrusted fabrication settings [2]. These limitations necessitate the development of golden-free, data-driven solutions for HT detection.
While “golden-free” detection is not a novel concept in itself, the prior literature has largely relied on purely unsupervised anomaly detection, such as One-Class Support Vector Machines (OC-SVM) [3] or reconstruction-based Autoencoders [4]. These methods operate on the assumption that deviations from a learned “normal” distribution are automatically indicative of Trojans. However, in the context of side-channel analysis, this assumption is fundamentally fragile; process variation (PV), environmental drift, and measurement noise induce natural deviations that often mask or mimic the Trojan footprint. Consequently, purely unsupervised detectors frequently suffer from unstable decision boundaries, either overfitting to noise or failing to distinguish benign PV from malicious perturbations.
Compounding this challenge is the rise in adversarial attacks, which capitalize on vulnerabilities in machine learning models by incorporating designed perturbations to outsmart detection mechanisms [5]. For instance, malicious entities might employ substitute models to investigate how a target clustering model organizes data, allowing them to shift Trojan-infected chips toward the statistical center of the benign distribution. This underscores the importance of enhancing detection systems not just against process variation, but against active evasion tactics.
To address this limitation, we propose a hybrid unsupervised–supervised framework. The theoretical motivation for this shift lies in the fundamental distinction between density estimation and discriminative learning. From a statistical learning perspective, unsupervised clustering attempts to model the complex joint distribution $P(X)$ of the chip data, a task that is notoriously difficult in high-dimensional spaces affected by PV [6]. In contrast, classification targets the posterior $P(Y \mid X)$, focusing solely on the decision boundary required to separate classes.
By utilizing high-confidence cluster assignments as proxies for ground truth, our framework leverages the principle of Self-Training or Deep Clustering [7,8]. This allows the model to shift from merely identifying outliers to learning a separating hyperplane that maximizes the margin between “Trojan-free” and “Trojan-infected” regions. This transition is critical for adversarial robustness: while an attacker can easily manipulate samples to slide within a vague density threshold, it is significantly harder to cross a maximized margin defined by a discriminative classifier. Therefore, the proposed framework is not merely a heuristic, but a principled application of discriminative learning to convert unreliable anomaly scores into robust Trojan detectors.
To implement this framework, our approach begins with per-chip normalization of Ring Oscillator (RO) frequencies, followed by the extraction of statistical descriptors and correlation-based features. Dimensionality reduction is applied, and chips are clustered using k-means. Instead of treating all clusters as equally reliable, we introduce a clusterwise filtering strategy that selects only high-confidence chips (based on silhouette score and bootstrap stability) as trusted pseudo-labeled seeds. These seeds are then used to train an AdaBoost classifier, which generalizes the labels to the full dataset. Finally, adversarial training is incorporated to strengthen the resilience of the model against worst-case perturbations.
1.1. Key Contributions
The main contributions of this paper are as follows:
We propose a novel hybrid framework that integrates unsupervised clustering with supervised learning, enabling golden-free HT detection.
The proposed method demonstrates resilience to adversarial attacks, ensuring robustness against sophisticated manipulation techniques aimed at evading detection.
We introduce a clusterwise filtering mechanism that quantifies cluster quality using silhouette scores and stability analysis, ensuring reliable pseudo-label generation.
We demonstrate that the AdaBoost classifier trained on pseudo-labeled seeds achieves 99.05% accuracy and a 0% false negative rate, while remaining resilient under adversarial attacks.
We provide a practical and scalable methodology for IC security, bridging the gap between noise-prone unsupervised detection and label-dependent supervised methods.
1.2. Organization of the Paper
The remainder of the paper is organized as follows. Section 2 provides background on HTs, including the threat model and a review of the existing literature. Section 3 explains how unsupervised machine learning techniques form the basis of an AI-assisted framework for HT detection. Section 4 describes the experimental setup, covering the ring oscillator configuration, the Field-Programmable Gate Array (FPGA) implementation, and the data collection and preprocessing procedures. Section 5 presents the proposed adversarial attack on clustering-based detection models. Section 6 introduces defensive methods that improve the model’s resistance to adversarial perturbations. Section 7 reports the experimental results, showing the detection framework’s performance before and after the defense mechanism is applied. Section 8 concludes by summarizing the main findings and proposing future research directions.
3. AI Assisted HT Detection
This section describes how unsupervised clustering techniques are used to detect Trojans in integrated circuits while reducing the false positive rate. The approach applies hybrid machine learning to identify Trojan-infected chips and to verify Trojan-free chips. During data collection, each chip, whether infected or non-infected, is measured in a test setup. The collected data are then clustered in an unsupervised manner to group the chips by infection status. After the samples are collected, the detection process follows the steps illustrated in Figure 3. These steps include data preprocessing and the transformation of the sequential data into a form suitable for analysis. The amplitude values of the sequential data are used to select relevant features. The clustering technique then partitions the chips into Trojan-free and Trojan-infected clusters, and the AI model is fine-tuned to refine the clustering.
The unsupervised model thus assigns each chip to one of two categories: Trojan-free or Trojan-infected. Following this step, the proposed localization model analyzes the sequential chip data. The localization algorithm orders the ROs in each chip to reveal where the Trojan was inserted. The resulting lists speed up reverse engineering by narrowing it to the targeted chip area. Together, these procedures distinguish chips with inserted Trojans from those without them.
This strategy supports the evaluation of golden-free Trojan detection. The evaluation metrics are accuracy, precision, recall, F1-score, AUC, False Negative Rate (FNR), and False Positive Rate (FPR).
4. Experimental Setup
4.1. Ring Oscillator
Generally, a ring oscillator comprises a closed loop of an odd number of inverting stages ($N$). In this investigation, we utilize a 5-stage Ring Oscillator ($N = 5$) consisting of one NAND gate and four inverters, as illustrated in Figure 4.
The NAND gate serves as the control stage. If the loop consisted entirely of standard inverters, the circuit would oscillate continuously upon device power-up, preventing controlled data acquisition. By using a NAND gate, we introduce an Enable signal input. When the Enable signal is high, the NAND gate acts as an inverter, completing the loop and initiating oscillation; when low, the loop is broken, and the RO enters a static state. This gating mechanism is essential for synchronizing the oscillation with the system counter and minimizing power consumption during idle periods.
The data acquisition process, illustrated in Figure 4, follows a sequential pipeline to capture the side-channel signatures. As shown in the figure, the collection workflow consists of the following steps:
Oscillation (Sensing): When enabled, the Ring Oscillator (RO) generates a high-frequency clock signal. This frequency is physically dependent on the local process parameters and the instantaneous voltage drop ($\Delta V$) at that specific location on the die.
Selection: A Multiplexer (MUX) selects the output from one of the distributed ROs to be measured, allowing the system to scan different regions of the FPGA sequentially.
Quantization: An 8-bit counter captures the RO oscillations over a fixed time window defined by the system clock. This converts the analog frequency behavior into a discrete digital count (the “signature”).
Transmission: The Universal Asynchronous Receiver-Transmitter (UART) module serializes the counter output and transmits the data to the host PC for post-processing and clustering analysis.
The RO blocks are placed at different locations on the FPGA die so that, collectively, they cover virtually every power rail on the chip. A voltage drop on the rail occurs if adversaries introduce an HT, even if the payload remains dormant. This is due to two primary factors:
Trigger Overhead: The Trojan’s trigger mechanism must continuously monitor system signals to detect the activation code. This constant switching activity consumes dynamic power, creating a localized voltage drop.
Leakage and Loading: The physical insertion of additional gates (both trigger and payload) increases the local leakage current and parasitic capacitance.
These factors contribute to an extra voltage drop $\Delta V$ and additional capacitive loading, both of which increase the propagation delay. Based on the standard Alpha-Power Law model for CMOS delay [30], the oscillation frequency can be expressed as described in Equation (1):
where:
$f$ is the oscillation frequency of the Ring Oscillator.
$\mu$ is the gate carrier mobility.
$V_{DD}$ is the supply voltage.
$V_{th}$ is the transistor threshold voltage.
$\Delta V$ is the additional voltage drop caused by the Trojan payload.
$\alpha$ is the velocity saturation index (typically between 1 and 2).
$N$ is the number of inverter stages in the ring.
$k$ is the total capacitance and delay constant of the inverter stages.
Equation (1) demonstrates that the frequency of the ROs is inversely proportional to the capacitive loading and delay factors ($k$) and is directly reduced by the voltage drop ($\Delta V$) induced by the Trojan insertion.
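As a reading aid, one form consistent with the stated dependencies is sketched below. It is an assumed reconstruction based on the Alpha-Power Law, not necessarily the exact expression of Equation (1); constant factors are absorbed into the proportionality. Here $\mu$ is the carrier mobility, $V_{DD}$ the supply voltage, $V_{th}$ the threshold voltage, $\Delta V$ the Trojan-induced voltage drop, $\alpha$ the velocity saturation index, $N$ the number of stages, and $k$ the capacitance and delay constant.

```latex
f \;\propto\; \frac{\mu \,\bigl(V_{DD} - \Delta V - V_{th}\bigr)^{\alpha}}{N \, k}
```

Under this form, a larger $\Delta V$ shrinks the overdrive term $(V_{DD} - \Delta V - V_{th})$ and a larger $k$ increases the denominator, so both effects lower $f$.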
Trojan Design
The Trojan circuits are designed to leak confidential information generated by the test chip. They are inserted and hidden within the chip in 8-bit, 16-bit, or 32-bit variants. Their physical presence and activity introduce local voltage drops, which alter the oscillation frequencies of the nearby ROs as described by Equation (1).
4.2. FPGA Setup
The MicroBlaze CPUs are implemented on three Basys 3 FPGAs together with the HTs and ROs. This subsection details the FPGA specifications used in this study. The Basys 3 FPGA development boards carry Xilinx Artix-7 (XC7A35T-1CPG236C) FPGA chips and provide a USB-JTAG interface for programming and communication. To implement the ROs, the placement and routing macro feature of the Xilinx Design Constraints (XDC) was used to place each RO at a fixed location and ensure uniform routing.
Dataset Generation: Trojan-Free vs. Trojan-Infected Scenarios
To construct a robust dataset for training and evaluation, we collected RO frequency signatures under two distinct operating conditions. First, we established a “Trojan-Free” baseline by measuring the RO frequencies while the Trojan circuits were inactive (dormant). Second, we activated the Trojan payloads (8-bit, 16-bit, and 32-bit variants) to generate the “Trojan-Infected” data classes.
Crucially, to ensure experimental validity, strictly identical placement and routing constraints were applied to both configurations using Xilinx Vivado. The physical setup, including the FPGA board orientation and Plexiglass thermal shielding, was maintained constant throughout the data acquisition process to minimize environmental noise. This methodology ensures that statistically significant deviations in the RO frequencies can be attributed to the Trojan’s power consumption rather than process variation or ambient thermal fluctuations.
4.3. Data Collection and Preparation
The data preparation procedure is crucial for obtaining correct clustering results during the unsupervised learning phase. Each data sample collected from the IC is fully representative, containing feature vectors for both “golden” (Trojan-free) scenarios and Trojan-infected scenarios (labeled solely for ground-truth evaluation). The collected data are summarized in Table 1.
It is important to clarify the granularity of the learning task. While the physical dataset comprises 18 FPGA chips (9 Trojan-free and 9 Trojan-infected), the fundamental unit of analysis in our framework is not the chip, but the individual RO. Each chip contains 32 independent ROs distributed across the fabric, and each RO generates 1001 instances, which together form a temporal frequency trace. Consequently, the effective dataset size for the machine learning model is $18 \times 32 = 576$ RO-level samples.
Furthermore, to prevent the model from merely “fingerprinting” specific chips (i.e., memorizing the absolute process corner of a specific die), we apply two critical preprocessing steps:
RO-Level Feature Extraction: Instead of feeding raw time-series data (which would lead to high dimensionality and overfitting), we extract statistical descriptors from the 1001-point traces. These features capture the dynamic behavior and stability of the signal rather than its absolute frequency.
Per-Chip Normalization: Algorithm 1 normalizes RO frequencies against the chip’s own median baseline. This removes the global process variation bias (the unique “fingerprint” of the chip) and isolates the relative anomalies caused by local Trojan activity.
Algorithm 1 Normalization Algorithm for Data Preparation
Require: $F$: Frequencies data matrix
Ensure: $F'$: Normalized frequencies data matrix
1: Initialize $F'$ as an $m \times n$ matrix
2: $\mu_g \leftarrow$ global average of all frequency instances in $F$
3: for $i = 1$ to $m$ do
4:   $\mu_i \leftarrow$ average of row $i$ in $F$
5:   for $j = 1$ to $n$ do
6:     $F'_{i,j} \leftarrow F_{i,j}$ normalized with respect to $\mu_i$ and $\mu_g$
7:   end for
8: end for
9: return $F'$
Therefore, the learning objective is to identify the statistical signature of Trojan-induced loading on a specific RO, a pattern that generalizes across different chips, rather than memorizing the identity of the 18 physical devices.
Noise introduces undesired fluctuations into a dataset, mainly through random variation or measurement error. Here, a moving average filter was applied to increase the signal-to-noise ratio. The filter smooths the values within a specified window, effectively reducing high-frequency fluctuations. The difference between the original data and the cleaned data is shown in Figure 5. The window size is chosen according to the properties of the dataset and the required level of noise reduction. After cleaning, the data are normalized because the effect of the Trojan differs from chip to chip. Due to the different physical characteristics of the chips, Trojan-affected SCA data of chip A can look similar to Trojan-unaffected data of chip B. To avoid this situation, Algorithm 1 is employed.
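The smoothing step can be illustrated with a minimal sketch. The window length of 9, the synthetic trace, and the function name `moving_average` are assumptions chosen for illustration; the paper does not state the window it used.

```python
import numpy as np

def moving_average(trace, window=9):
    """Smooth a 1-D RO frequency trace with a moving-average filter."""
    trace = np.asarray(trace, dtype=float)
    if window < 1 or window > trace.size:
        raise ValueError("window must be between 1 and len(trace)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only fully overlapping windows, so the output
    # has len(trace) - window + 1 points and no edge artefacts.
    return np.convolve(trace, kernel, mode="valid")

# Synthetic 1001-point trace: constant frequency plus Gaussian noise (assumed values).
rng = np.random.default_rng(0)
noisy = 200.0 + rng.normal(0.0, 2.0, size=1001)
smooth = moving_average(noisy, window=9)
```

A wider window removes more high-frequency noise but also blurs short-lived frequency drops, which is why the window has to be matched to the dataset.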
4.4. Feature Extractions
This section describes how information is extracted to distinguish Trojan-free from Trojan-inserted chips. The preprocessed data sequences are subjected to image-based methods, and a set of statistical features is extracted from the data. The experimental implementation and the individual statistical features are detailed in the following subsections.
To verify the discriminatory power of our selected statistical descriptors, we conducted a feature importance analysis using the Gini importance metric derived from the trained AdaBoost classifier (Figure 6). The high importance scores of the higher-order moments empirically support their ability to distinguish Trojan activity from process variation. The analysis reveals that autocorrelation is the most significant predictor of Trojan activity, followed by standard deviation. From a physical perspective, these results are consistent with HT behavior. Autocorrelation captures the temporal signature of the ring oscillator, which is disrupted by the intermittent switching activity of the Trojan trigger, while Delta flags sudden localized frequency drops caused by the capacitive loading of the Trojan payload. Standard Deviation, in turn, reflects the increased jitter and instability introduced by the malicious logic. This confirms that our feature set is not merely capturing random noise, but is isolating distinct physical perturbations introduced by the Trojan.
4.4.1. Standard Deviation
The standard deviation $\sigma$ quantifies the amount by which each value in the sequence differs from the mean. Mathematically, this is shown as:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}$$
where $x_i$ represents each value in the dataset, $\mu$ denotes the mean of the dataset, and $n$ is the total number of values.
4.4.2. Range
The range of a dataset measures the spread between the maximum and minimum RO frequency value for each RO of every chip and is calculated as:
$$\text{Range} = x_{\max} - x_{\min}$$
4.4.3. Delta
Delta represents the difference between consecutive values in a dataset, calculated as:
$$\Delta_i = x_i - x_{i-1}$$
4.4.4. Percentage Change
Percentage change measures the relative change between two consecutive values of RO frequency in each instance, computed as:
$$\text{PC}_i = \frac{x_i - x_{i-1}}{x_{i-1}} \times 100$$
4.4.5. Autocorrelation
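Autocorrelation measures the correlation between a signal and a delayed version of itself at different time lags. It can be calculated using the formula:
$$r_k = \frac{\sum_{t=1}^{n-k}\left(x_t - \mu\right)\left(x_{t+k} - \mu\right)}{\sum_{t=1}^{n}\left(x_t - \mu\right)^2}$$
where $r_k$ represents the autocorrelation at lag $k$, $x_t$ is the value at time $t$, $\mu$ is the mean of the dataset, and $n$ is the total number of values.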
These statistically based features provide valuable insights into the characteristics and behavior of the data, contributing to the Trojan detection process.
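The five features of this section can be computed for a single RO trace as sketched below. Reducing the per-step Delta and Percentage Change sequences to their largest absolute value, the lag of 1, and the function name `ro_features` are assumptions made for illustration; the paper does not specify how the sequences are aggregated.

```python
import numpy as np

def ro_features(trace, lag=1):
    """Statistical features of one RO frequency trace (lag must be >= 1)."""
    x = np.asarray(trace, dtype=float)
    mu = x.mean()
    centered = x - mu
    delta = np.diff(x)                 # x[i] - x[i-1]
    pct = 100.0 * delta / x[:-1]       # relative change in percent
    denom = np.sum(centered ** 2)
    # Lag-k autocorrelation: lagged cross-products over total sum of squares.
    acf = np.sum(centered[:-lag] * centered[lag:]) / denom if denom > 0 else 0.0
    return {
        "std": float(np.sqrt(np.mean(centered ** 2))),   # population form (1/n)
        "range": float(x.max() - x.min()),
        "delta": float(np.max(np.abs(delta))),           # assumed aggregation
        "pct_change": float(np.max(np.abs(pct))),        # assumed aggregation
        "autocorr": float(acf),
    }

features = ro_features([1.0, 2.0, 4.0, 7.0])
```

For the toy trace above, the range is 6, the largest step is 3, and the lag-1 autocorrelation is 4.75/21.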
4.5. Analysis Method
The K-Means clustering algorithm is utilized as the primary analysis method to differentiate Trojan-infected chips from clean ones using the extracted statistical features. K-Means is a centroid-based unsupervised learning algorithm that partitions a dataset into $K$ distinct, non-overlapping clusters [31]. Each data point is assigned to the cluster with the nearest mean, known as the centroid, which serves as the representative of that cluster.
The algorithm operates by initially selecting K centroids at random. Each data point is then associated with the nearest centroid based on a distance metric, typically Euclidean distance. Subsequently, the centroids are recalculated as the mean of all data points assigned to the corresponding cluster. These steps are repeated iteratively until convergence is achieved, which occurs when the centroids stabilize or a maximum number of iterations is reached.
The objective function of K-Means is to minimize the within-cluster sum of squares (WCSS), defined as:
$$\text{WCSS} = \sum_{k=1}^{K}\sum_{x_i \in C_k}\left\lVert x_i - \mu_k \right\rVert^2$$
where $C_k$ denotes the set of data points in cluster $k$, $\mu_k$ is the centroid of cluster $k$, and $x_i$ represents a data point in the feature space.
In the context of HT detection, each data point corresponds to a chip instance characterized by statistical features such as standard deviation, range, delta, percentage change, and autocorrelation of RO frequencies. The clustering process aims to separate these instances into groups that reflect Trojan-free and Trojan-infected behavior, without relying on labeled data.
The centroid-based nature of K-Means presents both strengths and weaknesses. While it offers computational efficiency and interpretability, it is inherently sensitive to small perturbations in the feature space. This characteristic can be exploited by adversaries, who may subtly modify the input data to shift Trojan-infected samples closer to centroids representing Trojan-free clusters. As a result, these manipulated samples may be misclassified, thereby evading detection.
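The assignment and update steps, together with the WCSS objective, can be sketched in a few lines of NumPy. This is a minimal illustration with random initialization from the data points; the function name `kmeans`, the iteration cap, and the toy data are assumptions and not the configuration used in the experiments.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k=2, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = assign(X, centroids)
    wcss = float(np.sum((X - centroids[labels]) ** 2))
    return labels, centroids, wcss

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, wcss = kmeans(X, k=2)
```

Because every point is assigned purely by distance to a centroid, a sample that is nudged across the midplane between two centroids changes cluster, which is the sensitivity the adversarial attack exploits.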
5. Proposed Adversarial Attack on Clustering
Adversarial attacks remain a persistent concern despite the application of various defense techniques. Even small alterations to test samples, imperceptible to humans, can confuse machine learning models and cause misclassifications. The models and experimental setups proposed in our research are no exception: they exhibit vulnerabilities that can be exploited.
In HT security, we use frequency side-channel analysis to build defenses against HT attacks. However, an adversary can exploit these weak points by cloning a legitimate chip and embedding a malicious circuit in the clone whose frequency differs only slightly from that of the authentic circuit. This confuses the ML detection models, so the circuit is classified as Trojan-free.
Adversarial attacks on clustering models aim to make the model assign manipulated data points to the wrong cluster. Algorithm 2 perturbs data points from a source cluster A toward a target cluster B so that they are misclustered, subject to a maximum perturbation threshold $\epsilon$ and a step size $t$.
Physical Realizability of Adversarial Perturbations
A critical consideration in hardware security is the mapping between mathematical feature perturbations and physical reality. While Algorithm 2 operates in the feature space (modifying $x$ to $x'$), these mathematical shifts correspond to tangible modifications in the Trojan’s design implemented during the pre-silicon layout phase.
Algorithm 2 Adversarial Attack Algorithm
Require: Sorted pairs array $D$, maximum perturbation $\epsilon$, step size $t$
Ensure: Subset $X_{adv}$ of perturbed data points successfully misclustered
1: for all pair $d = (a, b) \in D$ do
2:   Let $a$ be the original sample and $a'$ be the adversarial sample (initially $a' = a$)
3:   Perturb the features of $a'$ in the direction of point $b$ in the pair
4:   for all feature $a'_f$ of the adversarial sample $a'$ do
5:     Compute the direction from $a'_f$ toward $b_f$
6:     Move $a'_f$ by step size $t$ in that direction
7:     Compute magnitude: $\lVert a' - a \rVert_2$ {Euclidean distance}
8:     if $\lVert a' - a \rVert_2 > \epsilon$ then
9:       continue to next pair $d$
10:     end if
11:     if $a'$ is clustered into target cluster B then
12:       Add $a'$ to set $X_{adv}$
13:       break
14:     else
15:       Continue perturbation loop
16:     end if
17:   end for
18: end for
where:
Source Cluster (A) contains the data points intended for misclassification.
Target Cluster (B) is the desired misclassification cluster.
Sorted Pairs array ($D$) is an array of $k$ $(a, b)$ point pairs with the smallest Euclidean distance in ascending order, where $a \in X_A$ and $b \in X_B$.
Maximum perturbation ($\epsilon$) represents the threshold for permissible perturbations.
Step size ($t$) indicates the magnitude of each perturbation step.
$k$ denotes the number of closest points to Target Cluster B to be perturbed.
$X_A$ and $X_B$ are subsets of the dataset clustered in Source Cluster (A) and Target Cluster (B), respectively.
$a$ represents a data point in Source Cluster (A), and $b$ represents a data point in Target Cluster (B).
$X_{adv}$ is the subset of perturbed data points successfully misclustered, where each $a' \in X_{adv}$ originates from a point $a \in X_A$ and is assigned to Target Cluster (B).
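The core loop of Algorithm 2 for a single pair can be sketched as follows. Two simplifications are assumptions of this sketch and not part of the paper: the clustering model is stood in for by nearest-centroid assignment, and the whole feature vector is moved by step size `t` along the unit direction toward `b` in each iteration rather than feature by feature. The names `perturb_toward`, `nearest_centroid`, and the toy centroids are hypothetical.

```python
import numpy as np

def nearest_centroid(x, centroids):
    """Cluster index of x under nearest-centroid assignment."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

def perturb_toward(a, b, centroids, target, eps, t):
    """Move a toward b in steps of size t.

    Returns the perturbed sample once it is assigned to `target`, or None
    if the perturbation budget eps is exceeded (or b is reached) first.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    adv = a.copy()
    while True:
        gap = b - adv
        dist = np.linalg.norm(gap)
        if dist == 0.0:
            return None                      # reached b without flipping cluster
        adv = adv + min(t, dist) * gap / dist
        if np.linalg.norm(adv - a) > eps:    # Euclidean perturbation magnitude
            return None
        if nearest_centroid(adv, centroids) == target:
            return adv

centroids = np.array([[0.0, 0.0], [10.0, 0.0]])   # cluster A = 0, cluster B = 1
a, b = [4.0, 0.0], [6.0, 0.0]
adv_ok = perturb_toward(a, b, centroids, target=1, eps=2.0, t=0.25)
adv_blocked = perturb_toward(a, b, centroids, target=1, eps=1.0, t=0.25)
```

With a budget of 2.0 the sample crosses the midplane after five steps; with a budget of 1.0 the search stops before the cluster assignment flips.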
We map our specific feature set to physical hardware parameters as follows:
Modulating ‘Delta’ and ‘Range’ (Magnitude Features): The ‘Delta’ feature captures the maximum localized frequency drop, while ‘Range’ captures the global spread. In a physical setting, these are directly proportional to the Capacitive Load of the Trojan payload. During the mask design stage, an attacker physically “perturbs” these values by adding or removing gates in the Trojan design. A larger Trojan payload induces a steeper frequency drop (increasing ‘Delta’), effectively sliding the feature vector along the magnitude axis.
Modulating ‘Autocorrelation’ (Temporal Feature): This feature quantifies the temporal dependency and stability of the ring oscillator signal. Physically, this is controlled by the Trojan Triggering Activity. By adjusting the trigger logic definition in the netlist to switch less frequently or in specific burst patterns, an attacker can mathematically shift the autocorrelation value to mimic benign process noise.
It is important to note that these perturbations are not applied to fabricated chips. Instead, the adversary utilizes a surrogate model (as defined in Section 2.2) to iteratively optimize the Trojan’s design parameters before fabrication. The optimized feature vector $x'$ guides the foundry to manufacture a chip that inherently possesses the statistical properties required to evade detection.
6. Proposed Defensive Method
The proposed defense consists of two major stages: (1) unsupervised clusterwise pseudo-label generation, and (2) supervised classification and adversarial training. The first stage builds a trusted pseudo-labeled dataset from raw chip measurements, while the second leverages this dataset to train an adversarially resilient AdaBoost classifier. This section details the pseudo-label generation process, which is critical to ensuring robustness without golden references. The flow of the defensive approach is illustrated in Figure 7.
6.1. Per-Chip Normalization
Each chip provides frequency measurements from ring oscillators (ROs) across multiple instances (1001 per RO). Since each chip is fabricated independently, its absolute RO frequencies may differ due to process variations, voltage fluctuations, and measurement conditions. Consequently, direct comparison of raw frequency values across chips is misleading, as differences may reflect environmental drift rather than Trojan activity. The step-by-step implementation of this procedure is detailed in Algorithm 3.
To ensure comparability, we normalize each chip individually by standardizing all of its RO measurements. For chip $i$, the normalized measurement $\tilde{f}_{i,j}$ for RO $j$ is given by
$$\tilde{f}_{i,j} = \frac{f_{i,j} - \mu_i}{\sigma_i}$$
where $f_{i,j}$ is the raw frequency of RO $j$ on chip $i$, while $\mu_i$ and $\sigma_i$ denote the mean and standard deviation computed across all ROs and instances of chip $i$. This normalization preserves the shape of the distribution while removing chip-specific offsets and scales, allowing the subsequent feature extraction to focus on statistical anomalies caused by Trojan insertion rather than unrelated chip-to-chip variability.
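Algorithm 3 Per-Chip Normalization
Require: Raw frequency measurements $f_{i,j}$ for each chip $i$
Ensure: Normalized measurements $\tilde{f}_{i,j}$
1: for all chip $i$ do
2:   Compute $\mu_i \leftarrow$ mean of all measurements of chip $i$
3:   Compute $\sigma_i \leftarrow$ standard deviation of all measurements of chip $i$
4:   for all measurement $f_{i,j}$ of chip $i$ do
5:     $\tilde{f}_{i,j} \leftarrow (f_{i,j} - \mu_i) / \sigma_i$
6:   end for
7: end for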
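Per-chip standardization can be expressed compactly with array broadcasting. The sketch below assumes the measurements are stored as an array of shape (chips, ROs, instances); that layout, the function name `normalize_per_chip`, and the synthetic offsets are illustrative assumptions.

```python
import numpy as np

def normalize_per_chip(raw):
    """Z-score every chip over all of its ROs and instances.

    raw has shape (chips, ros, instances); the result has the same shape.
    """
    raw = np.asarray(raw, dtype=float)
    mu = raw.mean(axis=(1, 2), keepdims=True)      # one mean per chip
    sigma = raw.std(axis=(1, 2), keepdims=True)    # one std per chip
    sigma = np.where(sigma == 0, 1.0, sigma)       # guard against constant data
    return (raw - mu) / sigma

# Three synthetic chips with different process-dependent baselines (assumed values).
rng = np.random.default_rng(1)
offsets = np.array([180.0, 200.0, 220.0])[:, None, None]
raw = offsets + rng.normal(0.0, 1.5, size=(3, 32, 1001))
norm = normalize_per_chip(raw)
```

After this step every chip has zero mean and unit variance, so the 20-unit baseline differences between the synthetic chips no longer dominate the comparison.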
6.2. Statistical Descriptors
While normalization aligns chips to a common reference scale, it does not explicitly capture higher-order distributional information. HTs are known to alter not only the mean frequency of ROs, but also their variability and distributional shape. To capture these effects, we compute a set of statistical descriptors for each RO of each chip.
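For chip $i$ and RO $j$, let $\tilde{f}_{i,j}^{(k)}$ denote the normalized frequency measurement of the $k$-th instance ($k = 1, \dots, n$) of RO $j$. We define the following descriptors:
$$\mu_{i,j} = \frac{1}{n}\sum_{k=1}^{n}\tilde{f}_{i,j}^{(k)}, \qquad \sigma_{i,j} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\tilde{f}_{i,j}^{(k)} - \mu_{i,j}\right)^2}$$
$$\text{skew}_{i,j} = \frac{1}{n}\sum_{k=1}^{n}\left(\frac{\tilde{f}_{i,j}^{(k)} - \mu_{i,j}}{\sigma_{i,j}}\right)^3, \qquad \text{kurt}_{i,j} = \frac{1}{n}\sum_{k=1}^{n}\left(\frac{\tilde{f}_{i,j}^{(k)} - \mu_{i,j}}{\sigma_{i,j}}\right)^4 - 3$$
where:
$\tilde{f}_{i,j}^{(k)}$ is the normalized frequency of the $k$-th measurement of RO $j$ on chip $i$ after per-chip normalization.
$n$ is the total number of measurement instances per RO.
$\mu_{i,j}$ is the mean of the normalized RO measurements, representing the central tendency.
$\sigma_{i,j}$ is the standard deviation of the normalized RO measurements, capturing variability.
$\text{skew}_{i,j}$ measures the asymmetry of the RO frequency distribution. Positive skew indicates a longer tail on the right, while negative skew indicates a longer tail on the left.
$\text{kurt}_{i,j}$ quantifies the tail heaviness (peakedness) of the distribution. The subtraction of 3 makes it excess kurtosis, so that a normal distribution has kurtosis zero.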
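Concatenating these descriptors across all $d$ ROs yields the chip-level statistical feature vector:
$$\mathbf{v}_i = \left[\mu_{i,1}, \sigma_{i,1}, \text{skew}_{i,1}, \text{kurt}_{i,1}, \dots, \mu_{i,d}, \sigma_{i,d}, \text{skew}_{i,d}, \text{kurt}_{i,d}\right]$$
This feature vector summarizes both the central tendency and higher-order moments of each RO’s frequency distribution, enabling detection of subtle anomalies caused by HTs.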
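The four descriptors and their concatenation into a chip-level vector can be sketched as below, assuming one chip is stored as an array of shape (ROs, instances) and using population (1/n) moments. The function name `ro_descriptors` and the toy input are illustrative assumptions.

```python
import numpy as np

def ro_descriptors(z):
    """Mean, std, skewness and excess kurtosis per RO, flattened to one vector.

    z has shape (ros, instances); the result has length 4 * ros, ordered
    RO by RO as [mean, std, skew, kurt].
    """
    z = np.asarray(z, dtype=float)
    mu = z.mean(axis=1)
    sigma = z.std(axis=1)
    safe = np.where(sigma == 0, 1.0, sigma)          # avoid division by zero
    u = (z - mu[:, None]) / safe[:, None]            # standardized values
    skew = (u ** 3).mean(axis=1)
    kurt = (u ** 4).mean(axis=1) - 3.0               # excess kurtosis
    return np.column_stack([mu, sigma, skew, kurt]).ravel()

z = np.array([[-1.0, 1.0, -1.0, 1.0],
              [0.0, 0.0, 0.0, 4.0]])
v = ro_descriptors(z)
```

The first toy RO is symmetric (zero skew, excess kurtosis of -2), while the second has a single large value and therefore a positive skew.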
6.3. Correlation Features
HTs may also influence how ROs interact with each other. A Trojan that alters power distribution, for example, may create correlated fluctuations across multiple oscillators. To capture such interdependencies, we compute the Pearson correlation coefficients between every pair of ROs:
$$\rho^{(i)}_{j,k} = \frac{\operatorname{Cov}\left(\tilde{f}_{i,j}, \tilde{f}_{i,k}\right)}{\sigma_{i,j}\,\sigma_{i,k}}$$
where $\sigma_{i,j}$ and $\sigma_{i,k}$ are the standard deviations of oscillators $j$ and $k$, respectively. The upper-triangular portion of the correlation matrix $R_i = [\rho^{(i)}_{j,k}]$ is vectorized and appended to the statistical descriptors, yielding a combined feature vector:
$$\mathbf{x}_i = \left[\mathbf{v}_i,\; \operatorname{vec}\left(\operatorname{triu}(R_i)\right)\right]$$
This feature vector now reflects both local RO statistics and global RO interactions. The collection of these vectors constitutes the input dataset for the proposed framework. Specifically, each $\mathbf{x}_i$ serves as an input sample for the Normalization procedure (Algorithm 1) and is subsequently processed by the Adversarial Attack Algorithm (Algorithm 2) to generate perturbed training examples.
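The pairwise correlation features can be obtained with `numpy.corrcoef` and the strict upper triangle of the resulting matrix. The sketch assumes one chip is stored as an array of shape (ROs, instances); the function name `correlation_features` is illustrative. Note that a constant trace has zero variance and would yield NaN correlations.

```python
import numpy as np

def correlation_features(z):
    """Pearson correlations between every pair of ROs (rows of z).

    Returns the strict upper triangle of the correlation matrix as a
    vector of length d * (d - 1) / 2 for d ROs.
    """
    z = np.asarray(z, dtype=float)
    corr = np.corrcoef(z)                         # rows are treated as variables
    rows, cols = np.triu_indices(z.shape[0], k=1) # index pairs with row < col
    return corr[rows, cols]

z = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [4.0, 3.0, 2.0, 1.0]])
pairs = correlation_features(z)   # order: (0,1), (0,2), (1,2)
```

For 32 ROs this yields 32 * 31 / 2 = 496 correlation features per chip.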
6.4. Dimensionality Reduction
The combined feature space is high-dimensional: for $d$ ROs it contains $4d$ statistical descriptors and $d(d-1)/2$ pairwise correlations (624 features for $d = 32$). To improve clustering efficiency and robustness, we apply principal component analysis (PCA). Features are first standardized, then projected into a lower-dimensional subspace. Principal components are retained until 95% of the total variance is preserved, ensuring that the reduced representation maintains essential discriminatory information while discarding redundant noise. The reduced chip representation is denoted as $\mathbf{z}_i$.
The 95% variance threshold is motivated as follows. While it is often argued that anomalies reside in low-variance components, this assumption holds primarily for raw data. In our framework, we utilize engineered statistical features (e.g., Delta, Autocorrelation) specifically designed to amplify the Trojan’s signature. Consequently, the Trojan-induced deviations manifest as structural variance captured in the top Principal Components.
The discarded bottom 5% of variance corresponds primarily to uncorrelated stochastic noise, such as thermal fluctuations and measurement jitter, which do not correlate with the Trojan’s systematic structural impact. Removing these low-variance components acts as a denoising step, enhancing the signal-to-noise ratio (SNR) for the subsequent clustering phase.
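The standardize-then-project step with a 95% variance cutoff can be sketched using a singular value decomposition. The function name `pca_95` and the synthetic data (18 samples driven by two latent factors plus small noise) are assumptions for illustration only.

```python
import numpy as np

def pca_95(X, var_keep=0.95):
    """Standardize features, then keep the fewest principal components
    whose cumulative explained variance reaches var_keep."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)
    # Rows of Vt are the principal directions; S**2 is proportional to variance.
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)
    m = int(np.searchsorted(np.cumsum(ratio), var_keep)) + 1
    m = min(m, len(S))
    return Xs @ Vt[:m].T, ratio[:m]

rng = np.random.default_rng(0)
latent = rng.normal(size=(18, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(18, 10))
Z, ratio = pca_95(X)
```

The components that fall outside the retained set are the ones carrying the remaining low-variance share, which in the synthetic data is the added noise.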
6.5. Clusterwise Pseudo-Labeling
To separate Trojan-free and Trojan-inserted chips without a golden reference, we apply k-means clustering ($k = 2$) to the PCA-reduced chip representations $\mathbf{z}_i$. Each chip is thus assigned a cluster label $c_i \in \{0, 1\}$. However, directly using cluster labels as ground truth risks propagating noise, as some chips may be ambiguously assigned. To mitigate this, we introduce a clusterwise filtering strategy that retains only high-confidence chips.
6.5.1. Per-Chip Validation Metrics
For each chip $i$, two complementary metrics are computed:
Silhouette score $s_i$, which quantifies how well chip $i$ fits within its assigned cluster relative to the nearest alternative cluster. Higher $s_i$ indicates stronger cluster membership.
Bootstrap stability $b_i$, defined as the fraction of clustering runs (under bootstrap resampling of features) in which chip $i$ is consistently assigned to the same cluster. This measures robustness to perturbations.
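The two metrics above could be computed roughly as follows with scikit-learn. This is a sketch under assumptions: two clusters, resampling of feature columns with replacement, and a simple label-alignment step (cluster ids are arbitrary between runs) that the text does not describe. The function name `per_chip_metrics`, the run count, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

def per_chip_metrics(z, n_runs=20, seed=0):
    """Return base cluster labels, per-chip silhouette, and bootstrap stability."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(z)
    silhouette = silhouette_samples(z, base)

    agree = np.zeros(len(z))
    for _ in range(n_runs):
        # Bootstrap the feature columns (sampling with replacement).
        cols = rng.integers(0, z.shape[1], size=z.shape[1])
        run_seed = int(rng.integers(1 << 31))
        run = KMeans(n_clusters=2, n_init=10, random_state=run_seed).fit_predict(z[:, cols])
        # Assumed alignment step: flip the run's ids if they disagree with most base labels.
        if np.mean(run == base) < 0.5:
            run = 1 - run
        agree += (run == base)
    return base, silhouette, agree / n_runs

rng = np.random.default_rng(1)
# Hypothetical reduced representations: 30 chips in one group, 10 in another.
z = np.vstack([rng.normal(0.0, 1.0, (30, 6)), rng.normal(4.0, 1.0, (10, 6))])
labels, silhouette, stability = per_chip_metrics(z)
print(silhouette.round(2)[:5], stability[:5])
```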
6.5.2. Clusterwise Thresholding
To avoid global thresholds that may unfairly penalize minority clusters, thresholds are computed within each cluster. Specifically, for cluster $c$, we calculate the median silhouette $\tilde{s}_c$ and median stability $\tilde{b}_c$ over the chips assigned to it. A chip $i$ in cluster $c$ is retained if

$$s_i \ge \tilde{s}_c \quad \text{and} \quad b_i \ge \tilde{b}_c, \tag{13}$$

where $s_i$ and $b_i$ are the silhouette score and bootstrap stability of chip $i$.
The retained chips form a high-confidence seed set with pseudo-labels $\hat{y}_i$. Non-retained chips are excluded from training, preventing mislabeled samples from corrupting the supervised model. This filtering step thus converts noisy unsupervised clusters into a small but trustworthy pseudo-labeled dataset, which can then be leveraged by downstream classifiers.
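A minimal sketch of the clusterwise rule is given below. It assumes that a chip must meet both per-cluster medians, with non-strict comparisons; the function name `clusterwise_filter` and the toy values are hypothetical.

```python
import numpy as np

def clusterwise_filter(labels, silhouette, stability):
    """Keep chips whose silhouette and stability both reach their own cluster's median."""
    keep = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        members = labels == c
        s_med = np.median(silhouette[members])  # per-cluster silhouette threshold
        b_med = np.median(stability[members])   # per-cluster stability threshold
        keep |= members & (silhouette >= s_med) & (stability >= b_med)
    return keep

# Toy input: cluster 0 has four chips, cluster 1 (the minority) has three.
labels = np.array([0, 0, 0, 0, 1, 1, 1])
silhouette = np.array([0.80, 0.60, 0.70, 0.20, 0.30, 0.50, 0.40])
stability = np.array([1.00, 0.90, 0.95, 0.60, 0.70, 0.90, 0.80])
keep = clusterwise_filter(labels, silhouette, stability)
print(np.flatnonzero(keep))
```

In this toy input the minority cluster keeps two of its three chips even though its silhouette scores are lower overall, which is the behavior a single global threshold would not give.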
6.6. Refining Decision Boundaries via Core-Based Learning
A critical aspect of our framework is ensuring that the supervised classifier does not merely replicate the errors of the initial unsupervised clustering. Standard k-means assumes spherical clusters and often misclassifies samples at the decision boundaries (low confidence regions).
By applying Clusterwise Filtering (Equation (13)), we remove these ambiguous boundary samples and retain only the “Cluster Cores”, samples with high silhouette scores and bootstrap stability. The supervised model (AdaBoost) is then trained exclusively on these high-confidence cores. Unlike k-means, AdaBoost learns a non-linear discriminative decision boundary. This allows the model to generalize the properties of the core samples to correctly classify the previously ambiguous boundary samples, effectively correcting the initial clustering errors rather than propagating them.
6.7. Training the Supervised Model
The pseudo-labeled dataset is used to train an AdaBoost classifier. AdaBoost, short for Adaptive Boosting, is an ensemble learning method that combines multiple weak classifiers (e.g., decision trees) to form a strong classifier. The training process emphasizes misclassified samples by assigning them higher weights, ensuring that subsequent classifiers focus on these challenging cases.
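The weight update process in AdaBoost is governed by the following equations:

$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right),$$

where $\alpha_t$ represents the weight of the $t$-th weak classifier, and $\epsilon_t$ is its error rate. The sample weights are updated as:

$$w_i^{(t+1)} = w_i^{(t)} \exp\left(-\alpha_t \, y_i \, h_t(x_i)\right),$$

where $y_i$ is the true label of sample $i$, $x_i$ is the input feature, and $h_t(x_i)$ is the prediction of the $t$-th weak classifier. These weights are normalized to maintain a valid probability distribution.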
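A single boosting round can be worked through numerically. The sketch below assumes the standard discrete AdaBoost form with labels in {-1, +1}, weights that already sum to one, and an error rate strictly between 0 and 1; the function name `adaboost_round` and the five-sample example are hypothetical.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One discrete-AdaBoost update for labels in {-1, +1}."""
    error = np.sum(weights[y_true != y_pred])    # weighted error rate
    alpha = 0.5 * np.log((1.0 - error) / error)  # weight of this weak classifier
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return alpha, new_weights / new_weights.sum()  # renormalize to sum to one

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])  # only the last sample is misclassified
weights = np.full(5, 0.2)
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

With one of five equally weighted samples misclassified, the error rate is 0.2, the classifier weight is ln 2, and after normalization the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.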
The first AdaBoost model, model1, is trained using the pseudo-labeled dataset and saved as model1.pkl. This model is tested against adv_sample to evaluate its initial robustness against adversarial attacks.
6.8. Adversarial Training and Model Enhancement
To enhance resilience, the pseudo-labeled dataset is augmented with adv_sample2, creating a comprehensive training set that includes diverse adversarial scenarios. A second AdaBoost model, model2, is trained on this augmented dataset following the same methodology as model1. The inclusion of adversarial examples during training enables model2 to learn more robust decision boundaries, making it more resilient to subsequent adversarial perturbations.
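The two-stage training can be sketched with scikit-learn's `AdaBoostClassifier`. Everything numeric here is an assumption for illustration: the synthetic seed set, the hyperparameters, and in particular the perturbation used to stand in for adv_sample2, which is a simple shift toward the benign cluster and not the paper's Algorithm 2. The model is serialized in memory rather than written to model1.pkl to keep the example side-effect free.

```python
import pickle
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
# Hypothetical pseudo-labeled seed set: 0 = Trojan-free, 1 = Trojan-inserted.
x_seed = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(3.0, 1.0, (60, 4))])
y_seed = np.repeat([0, 1], 60)

# Stage 1: train on the pseudo-labeled seeds only.
model1 = AdaBoostClassifier(n_estimators=50, random_state=0).fit(x_seed, y_seed)
model1_bytes = pickle.dumps(model1)  # in-memory stand-in for saving model1.pkl

# Placeholder adversarial samples: Trojan chips shifted toward the benign centre.
# This is NOT Algorithm 2, only an assumed perturbation for the sketch.
x_adv = x_seed[y_seed == 1] - 1.5
y_adv = np.ones(len(x_adv), dtype=int)

# Stage 2: augment the seed set with the adversarial samples and retrain.
x_aug = np.vstack([x_seed, x_adv])
y_aug = np.concatenate([y_seed, y_adv])
model2 = AdaBoostClassifier(n_estimators=50, random_state=0).fit(x_aug, y_aug)

print("model1 on adversarial samples:", model1.score(x_adv, y_adv))
print("model2 on adversarial samples:", model2.score(x_adv, y_adv))
```

Scoring model2 on samples it was trained on only shows that the augmented set was fitted; a fair robustness comparison needs held-out adversarial samples.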
6.9. Validation and Testing
The final model, model2, is tested against both adv_sample and a validation set. The validation process evaluates classification accuracy, precision, recall, F1-score, and robustness against adversarial attacks. Results demonstrate that model2 significantly outperforms model1, achieving higher accuracy and resilience.
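For reference, the four classification metrics named above follow directly from the confusion counts. The sketch below treats the Trojan-inserted class as positive, which is an assumption; the function name `detection_metrics` and the eight-sample example are hypothetical and do not reproduce any reported result.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with `positive` as the Trojan class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]  # one missed Trojan, one false alarm
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Recall is the metric most directly tied to missed Trojans, so it is worth reporting separately for clean and adversarial test sets.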
To further validate the model’s reliability, reverse engineering is applied to Trojan-infected samples. This step identifies the insertion points of HTs within the integrated circuits, providing actionable insights for hardware security.
The proposed method effectively transforms an unsupervised clustering model into a pseudo-supervised framework. By integrating adversarial training, outlier removal, and pseudo-labeling, the approach enhances the robustness of HT detection. Unlike traditional supervised learning, this method does not rely on ground truth labels, making it adaptable to scenarios where labeled data is scarce or unavailable.
The use of AdaBoost further strengthens the system by creating a robust classifier capable of handling adversarial perturbations. This iterative and modular methodology can be extended to other unsupervised machine learning applications, paving the way for more secure and trustworthy clustering systems.
This defensive approach establishes a secure and reliable framework for HT detection, ensuring the integrity of integrated circuits in adversarial environments.
8. Conclusions
This study presents a golden-free framework for detecting HTs by leveraging self-supervised clustering and adversarial training. By introducing a pseudo-supervised learning approach, the proposed method enhances the resilience of unsupervised models against specific adversarial manipulations. Experimental results on FPGA platforms demonstrate that the framework achieves high classification accuracy and recall, offering a robust alternative to traditional methods that rely on expensive golden reference models.
Furthermore, the approach exhibits computational efficiency, as the detection logic scales with the number of distributed sensors rather than the underlying transistor count. While the current validation on FPGAs yields promising results, extending this framework to heterogeneous System-on-Chips (SoCs) remains a critical direction for future research. Specifically, addressing the complex power noise floors and sensor placement constraints in billion-transistor designs will be essential for industrial adoption. Ultimately, these findings highlight the potential of integrating adversarial defense mechanisms into statistical side-channel analysis to enhance hardware trust.