1. Introduction
Integrating the internet and the cloud has transformed the way smart buildings process data [1], enhancing perception, understanding, and responsiveness to complex operations. This integration plays a crucial role in tasks such as unauthorized access detection, equipment behavior analysis, and vulnerability identification in building automation systems [2]. Data has become the oil of the information era. By leveraging real-time, multi-location Internet of Things (IoT) sensor data from building components, we can uncover the value of streams captured by sensors embedded in walls, elevators, and energy meters [3,4]. This enables deep learning (DL) to automate tasks such as classifying building traffic patterns, detecting anomalies in fire alarm communications, and semantically segmenting occupancy-related network behaviors, improving the efficiency and accuracy of smart building traffic data processing [5].
As AI-enabled smart buildings become more prevalent, securing building network traffic and ensuring its reliability become increasingly important [6]. This traffic forms the data backbone that supports access control, energy monitoring, and fire safety, so abnormal traffic can disrupt building decision-making and put the safety of occupants at risk.
Modern building automation systems increasingly rely on AI-driven analytics for energy optimization and predictive maintenance. This reliance also opens new avenues of attack. For example, AI can be used to develop targeted malware that exploits building automation protocols to simulate falsification or modification attacks on the chiller controller, thereby endangering the entire network infrastructure [7]. In addition, the transmission of critical operational data over public or converged networks amplifies vulnerabilities [8]. Intelligent buildings incorporate multiple heterogeneous networks, such as building automation, energy management, and security monitoring. The heterogeneous integration of communication technologies exacerbates these vulnerabilities by expanding the attack surface. AI-based security systems are not invulnerable to attack either. Adversarial perturbations can deceive DL-powered intrusion detection models, leading to misclassification of malicious traffic as normal and compromising the reliability of network monitoring systems [9]. These evolving threats underscore the urgent need for robust, adaptive security frameworks.
AI models, including traffic classification models and intrusion detection systems (IDS), rely on abundant labeled abnormal traffic data to learn patterns, enabling automated, high-precision, real-time detection. A lack of high-quality samples harms generalization, leading to missed detections or false alarms against new attack variants and weakening overall defense. Acquiring abnormal traffic data is therefore crucial for training robust AI detectors, and the quality of data acquisition directly affects AI security effectiveness [10].
Abnormal traffic data from building networks often contains sensitive information such as device vulnerabilities and attack paths, and direct transmission may trigger security policies or be maliciously intercepted. Existing encryption technologies can protect content, but cannot hide the transmission behavior itself [11]. Sharing abnormal traffic data on open networks therefore presents challenges. Security devices such as firewalls, IDS, and antivirus software have predefined rules for known abnormal traffic, covering ports, payload characteristics, and behavioral patterns [12]. When abnormal traffic triggers these rules during transmission, it is intercepted, truncated, or even isolated. This makes it challenging to share real abnormal traffic samples for model training and verification, and it creates a data island effect in which key abnormal traffic data cannot be transmitted safely, hindering the upgrade of AI-based IDS models. Abnormal traffic may also involve attack details or sensitive information, so its transmission can pose compliance risks, making it even more challenging to share [13,14].
To address the challenges of sharing abnormal traffic data, this paper proposes a secure, undetectable traffic transmission framework based on steganography, the art of concealing secrets within everyday media such as audio, images, or text to enable covert communication [15]. Distributing critical information over public networks increases the risk of cyberattacks. Image steganography embeds secret messages within pixels to protect sensitive assets and maintain intellectual property authenticity. The abundance of image data provides an ideal platform for generating and transmitting adversarial examples (AEs).
Figure 1 shows the proposed method, which first creates container images and then embeds abnormal traffic into frequency-domain adversarial perturbations through steganography to create visually imperceptible stego images. Clean examples follow the natural data distribution, while adversarial examples (AEs) deviate from it through added subtle perturbations [16]. We embed abnormal traffic in AEs through steganography [17]. Perturbations in AEs can push samples across the decision boundaries of AI models, making them less detectable by traditional IDS tools. When abnormal traffic is transformed into pixel perturbations or frequency-domain data in containers such as images, its original characteristics are masked, bypassing firewall rules based on traffic features. This approach not only employs AEs to evade AI detection but also conceals covert traffic transmission with steganography, enabling cross-network and cross-entity sharing of abnormal data. The integrated protection scheme, which blends camouflage in the frequency domain with steganographic embedding, protects against interception during cross-network or cross-departmental transmission. It also preserves the accuracy of the extracted traffic, which is crucial for training robust building security models and addressing the sample island limitation [18]. The main contributions of this work are threefold:
We propose a hidden transmission method for network abnormal traffic based on multi-channel image reconstruction, which improves the limited information-carrying capacity in current abnormal traffic transmission. The method begins by extracting one-dimensional feature sequences from abnormal traffic. These sequences are then transformed into two-dimensional grayscale images using three distinct mapping algorithms. The multi-channel traffic images are steganographically embedded within the frequency domain.
We propose an adversarial steganography method that embeds traffic images in the frequency domain to evade security systems. Compared to traditional steganography in the spatial domain, frequency-domain embedding disperses information features and enhances concealment.
Experiments show that our framework is effective and combines high- and low-frequency traffic image information to protect against high-fidelity adversarial attacks (AAT). This, in turn, enables secure and undetectable sharing of abnormal traffic data across open networks.
The rest of this paper is structured as follows: Section 2 provides a comprehensive review of current literature on sensing images in security. In Section 3, we introduce our proposed method and explain how we integrate frequency quantization and image enhancement techniques to produce adversarial steganography samples. Section 4 showcases the benefits of our approach. Finally, Section 5 summarizes the paper and discusses future directions.
2. Literature Review
Detecting Abnormal Traffic within Smart Building Environments. The BACnet protocol is the core communication protocol for critical equipment such as HVAC (heating, ventilation, and air conditioning) and fire alarm systems in smart buildings. Real BACnet traffic datasets are scarce because of confidentiality, which is a key limitation for developing IDS for HVAC systems in smart buildings. To overcome this, Seyed et al. [19] propose a framework that injects real collected data into a scenario-based virtual controller simulator to generate BACnet traffic. Tsion et al. [20] propose countermeasures based on network and physical security and develop an IDS based on tree-based algorithms tailored to the characteristics of BACnet devices in industrial control networks. Desta et al. [21] use long short-term memory (LSTM) networks to detect anomalies in automotive control data. Kang et al. [22] build a classifier with a deep belief network and test it on simulated datasets. Traditional detection methods based on rules or single-feature analysis struggle to detect cross-system collaborative attacks. In contrast, our embedding preserves the multidimensional features of abnormal traffic.
DL has developed swiftly, but concerns have been raised about the complexity and inscrutability of DL-based models, including their susceptibility to attack and the lack of a rationale behind their decisions. Sharma et al. [23] find that removing low-frequency adversarial perturbations can make AEs easier to detect. Duan et al. [16] propose an attack method called AdvDrop, which optimizes a quantization table to generate AEs. However, the visual quality of the generated AEs is often poor. Luo et al. [24] propose a low-frequency constrained method that adds adversarial perturbations to the high-frequency components of an image, resulting in better visual effects. However, this method is optimization-based and requires more iterations than gradient-based attack methods. Li et al. [25] introduce a low-frequency AAT. Since it only requires one transformed example per iteration, it may overfit the substitute model. Du et al. [26] develop a fast C&W algorithm for image recognition, whose adversarial noise concentrates mainly near the target area, yielding effective image recognition attacks. DL-based models are vulnerable to AEs created by adding imperceptible perturbations. Despite the various strategies proposed to address AEs, enhancing defense against unknown attacks remains unresolved. To address this, Gong et al. [27] employ image reconstruction to train classifiers, eliminating any biases. AEs often mislead DL models into confident but inaccurate classifications, hindering the use of DL. To address this, Zhang et al. [9] employ adversarial training (AT) based on an adaptive frequency-domain transform to strengthen deep models against AEs.
Our Solution against Secure Traffic Recognition. Existing research on network anomaly detection often focuses on isolated aspects, such as temporal modeling with LSTMs or statistical features like frequency or entropy. However, these methods exhibit critical limitations: they rely on single-mode data representations that fail to capture the multidimensional characteristics of abnormal traffic, lack mechanisms for secure cross-domain data sharing without triggering alerts, sacrifice visual quality for adversarial effectiveness, and require computationally expensive optimization processes. Most techniques address either detection accuracy or data hiding separately, without integrating both objectives into a unified framework.
To address these gaps, this paper proposes a novel multi-channel imaging hiding method that transforms abnormal traffic into enriched visual representations. As shown in Figure 1, the method uses an information fusion approach that minimizes noticeable perturbations during adversarial steganography, enhancing image robustness against potential attacks. By fusing temporal and structural features into a unified image representation and embedding it into frequency-based AEs, our approach simultaneously enhances feature integrity for accurate detection and enables stealthy transmission that evades both conventional security systems and AI-based monitors. The proposed framework advances information fusion for a traffic data secret-sharing mechanism, ensuring statistically undetectable pixel distributions.
3. Proposed Method
3.1. Threat Model
The threat model for this work focuses on passive adversaries in smart building networks, including external eavesdroppers capable of monitoring inter-domain traffic and internal security devices such as IDS and firewalls equipped with rule-based or AI-driven detection capabilities. Adversaries may perform deep packet inspection, traffic feature analysis, or steganalysis to identify abnormal traffic patterns, but they lack knowledge of specific steganographic keys, frequency-domain perturbation parameters, or multi-channel fusion strategies. The model aims to protect the integrity and stealth of abnormal traffic during cross-departmental sharing while ensuring that the covert transmission process does not trigger false alarms or disrupt legitimate building automation systems such as BACnet-based control loops. It assumes that effective communication security is in place and ignores active attacks such as data manipulation or poisoning.
3.2. Traffic Image Generation
The traffic image fusion system combines images generated from traffic data. Existing methods directly map network traffic data into images. However, storing one-dimensional data to generate two-dimensional images often produces images that lack practical significance, making it challenging to extract the full characteristics of traffic data and resulting in insufficient information extraction. This is because the generated traffic images are fundamentally different from natural images. The spatial structures of directly converted traffic images, such as edges, color blocks, and textures, lack real physical meaning, making it challenging to leverage their built-in prior knowledge for effective feature extraction and learning.
While RGB is the most commonly used color space, it may not be the best choice for traffic data representation, because traffic visualizations do not align with the red, green, and blue channels as natural images do. We instead convert the one-dimensional data into two-dimensional images using three mappings: Markov transition fields (MTF), recurrence plots (RP), and Gramian angular fields (GAF). Each method has unique features that effectively extract information and enhance the detail in traffic images. The core idea of MTF is to convert one-dimensional sequence data into a two-dimensional image representation based on the transition probabilities of a Markov chain. The transition patterns are analyzed to create a visual representation of the data. We divide the given data into $Q$ equal-sized quantile bins, so that each bin contains the same number of data points and each data point falls in exactly one bin, and build the transition matrix according to Equation (1).
Let the original one-dimensional traffic feature sequence be $X = \{x_1, x_2, \ldots, x_N\}$. To convert it into an MTF, we first partition the values of $X$ into $Q$ discrete quantile bins. Each bin $q_j$ (where $j \in [1, Q]$) contains an equal number of data points. The MTF matrix $M$ is then constructed as a $Q \times Q$ matrix, where each element $w_{ij}$ represents the transition probability of a value in quantile bin $q_i$ at time $t$ being followed by a value in quantile bin $q_j$ at time $t+1$. This probability is empirically estimated from the frequency of transitions observed across the sequence $X$:
$$M = \begin{bmatrix} w_{11} & \cdots & w_{1Q} \\ \vdots & \ddots & \vdots \\ w_{Q1} & \cdots & w_{QQ} \end{bmatrix}, \qquad w_{ij} = P\left(x_{t+1} \in q_j \mid x_t \in q_i\right). \tag{1}$$
The size of the matrix, which is also the size of the generated image, is $Q \times Q$. $w_{ij}$ represents the transition probability from quantile bin $q_i$ to quantile bin $q_j$.
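The quantile binning and transition counting described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `markov_transition_matrix` and the parameter `n_bins` (the number of quantile bins) are assumptions introduced here, and rows for bins with no outgoing transition are left at zero.

```python
import numpy as np

def markov_transition_matrix(x, n_bins=8):
    """Estimate the n_bins x n_bins transition matrix between quantile bins.

    Hypothetical helper: entry (i, j) is the empirical probability that a
    value in bin i at time t is followed by a value in bin j at time t+1.
    """
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, so every bin holds roughly the same number of points.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # bin index in 0 .. n_bins-1 for each sample
    counts = np.zeros((n_bins, n_bins))
    # Count every observed transition bins[t] -> bins[t+1].
    np.add.at(counts, (bins[:-1], bins[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize rows to probabilities; rows without transitions stay zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

Rendering the returned matrix as a grayscale image gives one channel of the traffic image. A finer binning yields a larger image but a sparser estimate of each transition probability.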
RPs are constructed through phase space reconstruction, where a one-dimensional time series is embedded into an $m$-dimensional space with a time delay of $\tau$. This transformation creates state vectors that capture the system's dynamics, with $m$ setting the dimension of the reconstructed space and $\tau$ controlling the temporal resolution of the embedded trajectory.
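A recurrence plot can be sketched as below, assuming the usual time-delay embedding in which the $i$-th state vector is $(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau})$. The Euclidean distance and the optional threshold `eps` are illustrative assumptions; the text does not state which distance or threshold is used.

```python
import numpy as np

def recurrence_plot(x, m=3, tau=1, eps=None):
    """Recurrence plot of a 1-D series via time-delay embedding.

    Hypothetical helper: returns the pairwise distance image when eps is None,
    otherwise a binary image marking state pairs closer than eps.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau  # number of state vectors
    if n <= 0:
        raise ValueError("sequence too short for the chosen m and tau")
    # Row i is the state vector (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    states = np.stack([x[k * tau : k * tau + n] for k in range(m)], axis=1)
    # Pairwise Euclidean distances between all state vectors.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    if eps is None:
        return dists
    return (dists <= eps).astype(np.uint8)
```

The unthresholded variant keeps more information per pixel, which may be preferable when the image is later fused with other channels.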
While MTF emphasizes state transition patterns and RP highlights inherent similarities, neither accounts for temporal dynamics, frequency distribution, or abrupt change patterns. Single-representation images have low information entropy, causing DL models to miss critical attack features during feature extraction, such as concurrent DDoS connection spikes or abnormal R2L login sequences. As a result, training accuracy and generalization may decline, hindering adaptation to the evolving landscape of internet attacks.
GAF leverages the angular relationships among data points to reveal the dynamics and structure of the input data. It calculates the cosine of the summed angles, or the sine of the angle differences, between pairs of data points in the sequence and maps these values to a two-dimensional image, capturing features and patterns. The conversion rescales the traffic data $X = \{x_1, \ldots, x_N\}$ to $[-1, 1]$ or $[0, 1]$ to obtain $\tilde{X}$. It then generates polar coordinates with the timestamp as the radius and the arccosine of the rescaled value as the angular component. After transforming $\tilde{X}$ to the polar system, the Gram matrix is constructed from the cosine or sine of the inter-point angles according to Equation (2):
$$\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j), \qquad \mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j), \tag{2}$$
where $\phi_i = \arccos(\tilde{x}_i)$ represents the angle of the polar coordinate for data point $i$. GASF and GADF represent the Gramian angular summation field and the Gramian angular difference field, respectively. The traffic data can thus be transformed into an image in which each matrix element is displayed as a color. GASF focuses on capturing overall trends, while GADF focuses on local changes. As a result, GASF is better suited for binary classification tasks.
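The GAF construction can be sketched as follows, assuming min-max rescaling to $[-1, 1]$ (one of the two ranges mentioned above). The function name `gramian_angular_fields` is hypothetical, and the radius component is omitted because it does not enter the Gram matrices.

```python
import numpy as np

def gramian_angular_fields(x):
    """Return (GASF, GADF) for a 1-D sequence.

    Hypothetical helper: rescales x to [-1, 1], maps each value to an angle
    phi = arccos(x_tilde), then forms cos(phi_i + phi_j) and sin(phi_i - phi_j).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant sequence cannot be rescaled")
    x_tilde = 2.0 * (x - lo) / (hi - lo) - 1.0
    x_tilde = np.clip(x_tilde, -1.0, 1.0)  # guard arccos against rounding error
    phi = np.arccos(x_tilde)
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf
```

A quick sanity check is that the GASF diagonal equals $2\tilde{x}_i^2 - 1$ (the double-angle identity) and that GADF is antisymmetric with a zero diagonal.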
3.3. Traffic Steganography
The multi-channel traffic image is then converted to a JPEG-encoded image. Image compression is used to reduce the bandwidth and storage required for image transmission, and JPEG has become the dominant format for this purpose [28], owing to its high compression efficiency and widespread applicability [29]. However, while JPEG provides efficient transmission, its lossy compression can cause quality degradation and data loss after decompression. These issues are crucial for secure image processing, as even minor image quality degradation can affect image decoding and analysis accuracy.
To ensure secure and efficient covert transmission of the generated multi-channel traffic images, we adopt an adaptive steganographic framework that minimizes an additive distortion function. Specifically, we use the J-UNIWARD (JPEG Universal Wavelet Relative Distortion) algorithm [30], widely regarded as one of the most secure distortion measures for JPEG-domain steganography. J-UNIWARD computes embedding distortion from changes in spatial-domain wavelet coefficients, enabling the embedding to adapt to local content and favor complex, textured regions that better conceal perturbations.
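For reference, the distortion that J-UNIWARD minimizes is commonly written as below. This is the general form from the literature on the algorithm [30], not a formula stated in this section, and the symbols are notation introduced here: $W^{(k)}_{uv}(\cdot)$ denotes the $(u, v)$-th wavelet coefficient in the $k$-th directional subband, $\sigma > 0$ is a small stabilizing constant, and $J^{-1}(\cdot)$ denotes JPEG decompression to the spatial domain.

$$D(X, Y) = \sum_{k=1}^{3} \sum_{u, v} \frac{\left| W^{(k)}_{uv}(X) - W^{(k)}_{uv}(Y) \right|}{\sigma + \left| W^{(k)}_{uv}(X) \right|}, \qquad D_{\mathrm{J}}(X, Y) = D\left(J^{-1}(X), J^{-1}(Y)\right).$$

Because the denominator grows with the magnitude of the cover's wavelet coefficients, changes in textured regions cost less than changes in smooth regions, which is why the embedding concentrates there.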
The human eye is more sensitive to brightness than to color, so reducing the color resolution has little visual impact. To exploit this, we transform the image from the $RGB$ to the $YC_bC_r$ space, keeping the luminance $Y$ at levels 0–255 and subsampling the chrominance components $C_b$ and $C_r$. Specifically, we perform chroma subsampling at $4{:}2{:}0$, which saves 50% of the storage space compared to the original image.
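The 50% figure follows from keeping one full-resolution luminance plane and two chrominance planes at a quarter of the resolution each (1 + 0.25 + 0.25 = 1.5 of the original 3 planes). The sketch below illustrates this under two assumptions: the full-range BT.601 conversion used by JFIF, and 2×2 block averaging as the subsampling filter. The function names are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JFIF) conversion; rgb has shape (H, W, 3), values 0-255."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_420(channel):
    """4:2:0 subsampling by averaging each 2x2 block (H and W assumed even)."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def stored_samples_420(rgb):
    """Number of samples kept after conversion and 4:2:0 subsampling."""
    y, cb, cr = rgb_to_ycbcr(rgb)
    return y.size + subsample_420(cb).size + subsample_420(cr).size
```

For a 16×16 RGB image this keeps 256 + 64 + 64 = 384 of the original 768 samples, which is the 50% saving quoted above.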
Encoding traffic data as images enables steganographic embedding within AEs. This integration exploits the invisibility of adversarial perturbations to further conceal the traffic data and reduce suspicion. As AEs already contain subtle, human-imperceptible modifications, embedding traffic images within them makes the hidden information even harder to perceive. By masking traffic features within these perturbations, the fused data is less likely to be detected by security mechanisms, enabling secure transmission of sensitive information while preserving the integrity of both the AE and the embedded data for subsequent analysis.
3.4. Frequency Compression-Based Adversarial Examples
The core challenge in concealing mixed-traffic images for covert transmission is to balance two requirements: preserving the embedded traffic features and maintaining the visual imperceptibility of the stego image, which is crucial for seamless integration and evasion. Traditional spatial-domain steganography and frequency-domain modification methods face trade-offs: altering low-frequency coefficients can degrade overall image quality and trigger detection in smart building monitoring, while changing high-frequency coefficients compromises robustness, because minor transmission noise or compression may corrupt the embedded traffic data and cause feature loss when anomalies are extracted for security model training. For mixed-traffic images representing multi-dimensional smart building features, any loss or distortion of embedded information undermines their value for training security models, exacerbating the sample-island problem.
By fusing temporal and structural features into a unified image representation and embedding it in frequency-based AEs, our approach enhances feature integrity for accurate detection and enables stealthy transmission. Given a DL model $f$ and an original input $x$ with true label $y$, an adversarial attack seeks a perturbation $\delta$ such that
$$f(x + \delta) \neq y, \qquad \delta \in \mathcal{S} = \{\delta : \|\delta\|_\infty \le \epsilon\},$$
where $\mathcal{S}$ is the allowable perturbation space and $\epsilon$ is the upper bound on perturbations to ensure imperceptibility. For each iteration $t$, the perturbation is updated as follows:
$$\delta_{t+1} = \Pi_{\mathcal{S}}\left(\delta_t + \alpha \cdot \mathrm{sign}\left(\nabla_{\delta} L(f(x + \delta_t), y)\right)\right),$$
where $\Pi_{\mathcal{S}}$ is the projection operation constraining the perturbation within the $\epsilon$-ball under the $\ell_\infty$ norm, $\alpha$ is the step size, and $L$ is the loss function. The final adversarial example is computed as $x^{adv} = x + \delta_T$ after $T$ iterations.
The initial value of the AE is usually determined by a random perturbation near the original sample $x$. Random initialization helps avoid local optima and thereby improves the attack success rate:
$$x_0 = x + \eta,$$
where $\eta$ is a random perturbation following a uniform distribution over $[-\epsilon, \epsilon]$. The initial sample $x_0$ must first be projected by $\Pi$ to ensure that it remains a legitimate sample. For the $t$-th iteration ($t = 0, 1, \ldots, T-1$), we compute the gradient of the loss function $L$ with respect to the current sample $x_t$, denoted as $g_t = \nabla_{x_t} L(f(x_t), y)$. We then update the sample along the gradient direction and restrict the perturbation range through the projection:
$$x_{t+1} = \Pi_{x + \mathcal{S}}\left(x_t + \alpha \cdot \mathrm{sign}(g_t)\right),$$
where the perturbation range is constrained to prevent the updated sample from exceeding the pre-set boundaries.
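The iterative procedure above (random start, gradient step, projection) can be sketched as follows. Several choices are assumptions made only to keep the example self-contained: a tiny linear softmax classifier stands in for the DL model $f$, the loss is cross-entropy, the step follows the sign of the gradient, the perturbation is bounded in the $\ell_\infty$ norm, and inputs are assumed to lie in $[0, 1]$.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(W, b, x, y):
    """Cross-entropy loss of a linear softmax model and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    loss = -np.log(p[y] + 1e-12)
    p[y] -= 1.0  # p now holds (p - onehot(y))
    return loss, W.T @ p

def pgd_attack(W, b, x, y, eps=0.1, alpha=0.02, steps=20, seed=0):
    """Projected gradient ascent on the loss within an l_inf ball of radius eps."""
    rng = np.random.default_rng(seed)
    # Random start near x, projected back into the valid input range.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        _, g = loss_and_grad(W, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # step along the gradient sign
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a legitimate sample
    return x_adv
```

With a real network, only `loss_and_grad` would change (the gradient would come from automatic differentiation); the projection logic stays the same.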
We further optimize the frequency-domain transformation and modification strategy. We divide an image into $8 \times 8$ blocks and transform each block from the spatial domain to the frequency domain. We choose the discrete cosine transform (DCT), which represents a finite sequence of data points as a sum of cosine functions of different frequencies. After the DCT, most of the image information is concentrated in a few low-frequency coefficients, while texture and edge details are distributed in the mid-to-high-frequency coefficients. The DCT can be expressed as follows:
$$D(u, v) = \frac{1}{4} C(u) C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} I(i, j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16}, \tag{9}$$
where $C(u)$ and $C(v)$ are equal to $1/\sqrt{2}$ when $u$ and $v$, respectively, are equal to 0, and equal to 1 for any other values. Equation (9) computes the $(u, v)$-th entry of $D$; $I(i, j)$ is the value at image coordinates $(i, j)$. J-UNIWARD embeds traffic features into the quantized DCT coefficients of the AE cover image. The objective of embedding is to minimize the sum of relative changes in wavelet coefficients between cover and stego images, preserving statistical undetectability.
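The block DCT can be sketched in matrix form as below, assuming the orthonormal DCT-II used by JPEG on $8 \times 8$ blocks. With basis matrix $T$, the coefficients are $T B T^{\top}$ for a block $B$, which matches the double-sum definition with the $\frac{1}{4} C(u) C(v)$ normalization. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: T[u, i] = c(u) * sqrt(2/n) * cos((2i+1) u pi / (2n))."""
    u = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * u * np.pi / (2 * n))
    T[0, :] /= np.sqrt(2.0)  # c(0) = 1/sqrt(2), c(u) = 1 otherwise
    return T

def block_dct(block):
    """2-D DCT of a square block."""
    T = dct_matrix(block.shape[0])
    return T @ block @ T.T

def block_idct(coeffs):
    """Inverse 2-D DCT (T is orthogonal, so its inverse is its transpose)."""
    T = dct_matrix(coeffs.shape[0])
    return T.T @ coeffs @ T
```

For a constant block with value 100, all energy lands in the DC coefficient, which equals 800 (the block sum divided by 8), and every other coefficient is zero up to rounding.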
Although the DCT itself is lossless and invertible, information loss occurs during the subsequent quantization step. After the transformation to the frequency domain, we therefore launch the attack during the quantization phase, where coefficient information is already being modified. This modification can lead classification models to misclassify the image.
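The loss introduced by quantization can be illustrated as follows. The table below is the example luminance quantization table from Annex K of the JPEG standard, used here purely as an assumption; the attack described in this section may use or optimize a different table. The function names are hypothetical.

```python
import numpy as np

# Example luminance quantization table (JPEG standard, Annex K); an assumption here.
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize(coeffs, q_table=Q_LUMA):
    """Divide DCT coefficients by the table and round to integer levels."""
    return np.round(coeffs / q_table).astype(np.int32)

def dequantize(levels, q_table=Q_LUMA):
    """Map integer levels back to approximate DCT coefficients."""
    return levels * q_table.astype(np.float64)
```

Each reconstructed coefficient differs from the original by at most half of its quantization step. The large steps at high frequencies show why information embedded there is fragile, and why changing the quantization behavior alters what a classifier sees.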
For the adversarial loss
, we design the loss function [
31] as follows:
where
represents an AE generated by attacking a clean sample, and
represents the confidence score of the AE
for the target model
f on classes other than the target class
k.
and
represent the confidence scores of the AE
for the target model
f on the original label class
y and the target label class
, respectively.
Abnormal traffic is concealed by generating AEs. Without adversarial training (AT), these samples may be misclassified as real attacks by other IDS, triggering false alarms or defensive actions that disrupt operations and maintenance. Incorporating AEs into training enhances the robustness of the internal security detection module, enabling it to distinguish malicious concealed traffic from benign business data. This not only ensures the reliability of the concealed transmission link but also prevents unnecessary resource consumption or operational interruptions due to false reports.
The AT can be expressed as the following min-max optimization problem:
$$\min_{\theta} \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_\infty \le \epsilon} L\left(f_\theta(x + \delta), y\right) \right],$$
where the inner maximization finds the worst-case perturbation $\delta$ within an $\epsilon$-ball that maximizes the loss $L$. The outer minimization optimizes the model parameters $\theta$ to minimize the expected loss under these worst-case perturbations. The robust model learns to perform well not just on clean examples, but on adversarially perturbed versions of them, thereby improving its robustness against attacks.
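The min-max structure can be sketched on a model simple enough for the inner maximization to have a closed form. The example below assumes a linear logistic classifier with labels in $\{-1, +1\}$ and an $\ell_\infty$ ball: the worst-case perturbation is then $\delta^{*} = -\epsilon \, y \, \mathrm{sign}(w)$. This is a stand-in for illustration, not the DL model or training setup of this paper, and all names are hypothetical.

```python
import numpy as np

def worst_case_delta(w, y, eps):
    """Inner maximization for a linear model: delta* = -eps * y * sign(w)."""
    return -eps * y[:, None] * np.sign(w)[None, :]

def adversarial_train(X, y, eps=0.1, lr=0.1, epochs=200):
    """Outer minimization: gradient descent on the logistic loss at worst-case inputs."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        X_adv = X + worst_case_delta(w, y, eps)  # solve the inner max exactly
        margin = y * (X_adv @ w + b)
        # Derivative of log(1 + exp(-margin)) w.r.t. the raw score, per sample.
        coef = -y / (1.0 + np.exp(margin))
        w -= lr * (X_adv.T @ coef) / n
        b -= lr * coef.mean()
    return w, b
```

For deep models the inner maximization has no closed form and is typically approximated by a few iterations of a gradient-based attack before each parameter update.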
5. Conclusions
The increasing sophistication of smart building networks, which combine critical systems such as access control, energy monitoring, and HVAC management, creates a pressing demand for robust anomaly detection and secure data exchange. However, current approaches struggle with two fundamental weaknesses. Single-mode traffic analysis often overlooks the complex nature of multi-faceted attacks, and traditional secure transmission methods such as encryption or VPNs inadvertently highlight the very data they aim to protect, rendering it vulnerable to interception. To address these limitations, we propose a novel framework that enables secure and undetectable traffic transmission. Our solution employs a two-stage approach. First, it transforms anomalous traffic patterns into comprehensive visual representations through complementary mappings to prevent feature loss. Second, it embeds these representations into ordinary cover images using a frequency-domain steganographic technique that maintains the statistical properties of the carrier image, leaving no detectable trace. Experimental evaluations confirm the effectiveness of the proposed framework. It maintains a remarkably low bit error rate (BER) for data recovery while simultaneously evading detection by standard security appliances. This dual capability facilitates the reliable and covert sharing of critical operational intelligence across smart building infrastructures.
In future work, we will explore the framework’s application to encrypted traffic and pursue real-time optimizations for large-scale networks. Furthermore, as a complementary direction beyond the cover-based steganography methods used in this work, we will investigate carrier-free steganography mechanisms based on deep learning to achieve enhanced secrecy.