3.1. System Overview
This method addresses the challenges of high concurrency and low latency in smart logistics edge environments. It adopts a three-layer architecture consisting of data plane optimization, semantic processing, and dynamic scheduling, and it integrates four functional modules: traffic scheduler, publisher, message center, and subscriber. Each module is mapped to a specific layer, and the modules work together to enable efficient and adaptive data stream management.
Table 1 lists the basic symbols used in this work.
The data plane optimization layer is responsible for high-speed data ingestion. It incorporates the publisher module, which consists of the network packet aggregator, data controller, and message publisher. These components aggregate packets from smart terminals, extract essential features, and convert the data into structured messages. The layer uses zero-copy and polling techniques based on DPDK, which avoid operating-system kernel switching overhead and significantly improve packet processing efficiency.
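To make the publisher's packet-to-message flow concrete, the Python sketch below shows its logical steps only (aggregate packets, extract features, emit structured messages); the actual data plane runs on DPDK in user space and in C, and the names used here (aggregate_packets, extract_features, StructuredMessage) are hypothetical placeholders rather than the system's real API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class StructuredMessage:
    """Message built by the publisher: extracted features plus the raw payload."""
    features: Dict[str, float]
    payload: bytes
    tags: List[str] = field(default_factory=list)

def aggregate_packets(raw_packets: List[bytes], batch_size: int = 32) -> List[List[bytes]]:
    """Group raw packets from smart terminals into fixed-size batches
    (stand-in for the network packet aggregator)."""
    return [raw_packets[i:i + batch_size] for i in range(0, len(raw_packets), batch_size)]

def extract_features(packet: bytes) -> Dict[str, float]:
    """Toy feature extraction: packet length and a simple checksum-like value."""
    return {"length": float(len(packet)), "checksum": float(sum(packet) % 256)}

def publish(raw_packets: List[bytes]) -> List[StructuredMessage]:
    """Data controller + message publisher: turn aggregated packets into messages."""
    messages = []
    for batch in aggregate_packets(raw_packets):
        for pkt in batch:
            messages.append(StructuredMessage(features=extract_features(pkt), payload=pkt))
    return messages
```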
The semantic processing layer performs message classification, filtering, and decoupling. It is supported by the message center, which comprises the message controller, message/subscriber matcher, message sender, subscriber controller, and user-mode forwarder. Messages from the publisher are buffered, semantically matched through a reverse-indexing mechanism, and routed accordingly. This layer enables scalable, content-aware message distribution and reduces the coupling between data sources and receivers.
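As an illustration of how a reverse index can accelerate content-based matching, the sketch below maps each tag to the set of subscribers interested in it, so a message is matched by looking up only its own tags instead of scanning every subscriber. The structure and names are illustrative assumptions, not the message center's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

class ReverseIndex:
    """Tag -> subscriber-id index for content-based matching."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, subscriber_id: str, tags: Iterable[str]) -> None:
        """Register a subscriber's interest tags."""
        for tag in tags:
            self._index[tag].add(subscriber_id)

    def match(self, message_tags: Iterable[str]) -> Set[str]:
        """Return all subscribers interested in at least one of the message's tags."""
        matched: Set[str] = set()
        for tag in message_tags:
            matched |= self._index.get(tag, set())
        return matched

# Example: route a "temperature" reading only to the subscriber that registered for it.
index = ReverseIndex()
index.subscribe("cold-chain-monitor", ["temperature", "humidity"])
index.subscribe("fleet-dashboard", ["gps"])
print(index.match(["temperature"]))  # {'cold-chain-monitor'}
```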
The dynamic scheduling layer ensures stable performance under varying workloads. It involves the traffic scheduler together with parts of the message center and subscriber modules. The scheduler monitors resource usage, latency, and throughput in real time, profiles each edge node from these measurements, and applies a load-balancing algorithm to allocate new terminal requests. The subscriber controller maintains subscription states, and the subscriber module supports subscription requests and localized data handling.
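A minimal sketch of how such per-node profiles might drive the allocation of a new terminal request is shown below; the profile fields and the simple least-load selection rule are assumptions made for illustration, not the scheduler's actual policy (the mechanism used in this work is detailed in Section 3.2).

```python
from dataclasses import dataclass

@dataclass
class NodeProfile:
    """Runtime profile of one edge node, refreshed by the traffic scheduler."""
    node_id: str
    cpu_util: float         # fraction in [0, 1]
    latency_ms: float       # recent one-way latency
    throughput_mbps: float  # recent forwarding throughput

def load_score(p: NodeProfile, latency_budget_ms: float = 10.0) -> float:
    """Combine CPU utilization and latency pressure into a single load score."""
    return 0.5 * p.cpu_util + 0.5 * min(p.latency_ms / latency_budget_ms, 1.0)

def allocate(profiles: list[NodeProfile]) -> str:
    """Assign a new terminal request to the least-loaded edge node."""
    return min(profiles, key=load_score).node_id

profiles = [
    NodeProfile("edge-1", cpu_util=0.70, latency_ms=8.0, throughput_mbps=900),
    NodeProfile("edge-2", cpu_util=0.40, latency_ms=3.0, throughput_mbps=650),
]
print(allocate(profiles))  # edge-2
```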
This layered architecture forms an end-to-end data processing pipeline. Data streams from smart terminals are acquired, semantically processed, and adaptively scheduled for forwarding across the edge cluster based on system conditions. The coordination among different layers ensures that the system meets the performance, scalability, and flexibility requirements of smart logistics edge applications.
Figure 2 illustrates the system architecture.
3.2. Data Stream Publish/Subscribe Model
Smart-logistics deployments comprise numerous terminal devices that continuously emit real-time data streams. Formally, the system is represented as the quadruple $(T, S, E, C)$, where $T$ is the set of smart terminals, $S$ the set of data streams, $E = \{e_1, \dots, e_m\}$ the set of edge nodes, and $C$ the set of compute nodes, with $m$ denoting the number of edge nodes. Each stream $s_i \in S$ produced by terminal $t_i \in T$ is written as $s_i = (f_i, d_i)$, where $f_i$ denotes the extracted feature vector and $d_i$ the raw payload.
Data streams travel from terminal $t_i$ to edge node $e_j$ and onward to compute node $c_k$, forming the path $t_i \rightarrow e_j \rightarrow c_k$. At $e_j$, each stream has its feature vector $f_i$ normalized to $\hat{f}_i$. The pair $(\hat{f}_i, d_i)$ is packaged as a message $m_i$, and all such messages constitute the set $M$. Edge nodes forward only those $m_i \in M$ that satisfy the filtering rules of a given compute node $c_k$. Hence, the system must absorb high-concurrency traffic under limited edge resources while maintaining reliability through effective load balancing when traffic is uneven.
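To make the stream-to-message mapping concrete, a small sketch is given below: a stream's feature vector is normalized at the edge node and packaged with the raw payload as a message. The min-max normalization and the field names are illustrative assumptions, not the exact normalization used by the system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    """Message m_i = (normalized features, raw payload) built at an edge node."""
    features: List[float]
    payload: bytes

def normalize(features: List[float]) -> List[float]:
    """Min-max normalize a feature vector to [0, 1] (assumed normalization)."""
    lo, hi = min(features), max(features)
    if hi == lo:
        return [0.0 for _ in features]
    return [(x - lo) / (hi - lo) for x in features]

def package(features: List[float], payload: bytes) -> Message:
    """Package a stream s_i = (f_i, d_i) as a message m_i = (f_i_hat, d_i)."""
    return Message(features=normalize(features), payload=payload)

msg = package([2.0, 8.0, 5.0], b"sensor-frame")
print(msg.features)  # [0.0, 1.0, 0.5]
```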
To address these challenges, this work proposes the CBPS-DPDK method, which combines DPDK kernel-bypass forwarding with a content-based publish/subscribe (CBPS) mechanism. DPDK supplies zero-copy, polling-driven I/O in the data plane, while CBPS performs semantic matching and asynchronous routing. The method consists of two tightly coupled components: a publish/subscribe module (Algorithm 1) that realizes high-throughput content matching, and a dual-feedback load-balancing mechanism that monitors resource utilization from both global and local perspectives and adapts traffic weights accordingly. These components are described below.
Publishers extract $\hat{f}_i$ to create messages $m_i \in M$; subscribers register matching strategies that form the strategy set $\Sigma$. Each message adopts the binary form $m_i = (\hat{f}_i, d_i)$; the matching routine is detailed in Algorithm 1. For every $m_i \in M$, the system evaluates the registered strategies and forwards $m_i$ to each subscriber whose strategy returns true.
Algorithm 1 Message/subscriber matching algorithm.

Require: Messages, Subscribers
Ensure: Messages with updated target subscriber sets
1:  procedure MessageSubscriberMatching
2:      Messages ← FetchMessages
3:      Subscribers ← FetchSubscribers
4:      for each Message in Messages do
5:          RelevantSubs ← ∅
6:          for each Subscriber in Subscribers do
7:              if IsInterested(Subscriber, Message) then
8:                  Add(RelevantSubs, Subscriber)
9:              end if
10:         end for
11:         UpdateTargetSet(Message, RelevantSubs)
12:     end for
13: end procedure
14: procedure IsInterested(Subscriber, Message)
15:     Interests ← GetInterests(Subscriber)
16:     MessageTags ← GetTags(Message)
17:     if Interests ∩ MessageTags ≠ ∅ then return True
18:     else return False
19:     end if
20: end procedure
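For concreteness, a runnable Python sketch of Algorithm 1 is given below; it assumes messages carry tag sets and subscribers carry interest sets, and the helper names mirror the pseudocode rather than any real API of the system.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Message:
    tags: Set[str]
    payload: bytes
    targets: List["Subscriber"] = field(default_factory=list)

@dataclass
class Subscriber:
    name: str
    interests: Set[str]

def is_interested(subscriber: Subscriber, message: Message) -> bool:
    """A subscriber matches when its interests and the message tags overlap."""
    return bool(subscriber.interests & message.tags)

def match(messages: List[Message], subscribers: List[Subscriber]) -> None:
    """Algorithm 1: attach the relevant subscribers to every message."""
    for message in messages:
        message.targets = [s for s in subscribers if is_interested(s, message)]

subs = [Subscriber("cold-chain", {"temperature"}), Subscriber("fleet", {"gps"})]
msgs = [Message(tags={"temperature", "humidity"}, payload=b"...")]
match(msgs, subs)
print([s.name for s in msgs[0].targets])  # ['cold-chain']
```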
To keep Algorithm 1 stable under bursty workloads, a novel lightweight dual-feedback load-balancing mechanism is introduced. At the global layer, the mechanism periodically adjusts traffic weights across all edge nodes to smooth long-term load. At the local layer, each node reacts to instantaneous queuing latency and fine-tunes its weight, allowing fast suppression of emerging hot spots. The overall scheduling workflow is summarized in Algorithm 2.
Algorithm 2 Dual-feedback load-balancing mechanism.

Require: Edge node set E
Ensure: Final weight array W
1:  procedure SchedulerTick
2:      /* Global layer */
3:      for each e_j ∈ E do
4:          CollectGlobalMetrics(e_j)
5:          ComputeNormLoad(e_j)
6:          MapCapacity(e_j)
7:      end for
8:      NormalizeWeights()
9:      /* Local layer */
10:     for each e_j ∈ E do
11:         CollectLocalMetrics(e_j)
12:         ComputeLatencyGap(e_j)
13:         MapLocalCapacity(e_j)
14:     end for
15:     BlendWeights()
16:     return W
17: end procedure
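A compact Python sketch of the SchedulerTick procedure is shown below. The specific metric fields, the normalized-load formula, and the blending coefficient alpha are illustrative assumptions; only the overall global-then-local structure follows Algorithm 2.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EdgeNode:
    node_id: str
    input_rate: float      # current arrival rate (msg/s)
    throughput: float      # instantaneous throughput (msg/s)
    queue_depth: int       # messages waiting
    latency_ms: float      # measured one-way latency
    max_throughput: float  # throughput upper bound

def scheduler_tick(nodes: List[EdgeNode],
                   latency_target_ms: float = 5.0,
                   alpha: float = 0.7) -> Dict[str, float]:
    """One scheduling round: global capacity weights blended with local latency feedback."""
    # Global layer: normalized load -> usable capacity -> weight.
    global_w = {}
    for n in nodes:
        load = 0.5 * (n.input_rate / n.max_throughput) + 0.5 * min(n.queue_depth / 100.0, 1.0)
        global_w[n.node_id] = max(n.max_throughput * (1.0 - load), 1e-6)
    total = sum(global_w.values())
    global_w = {k: v / total for k, v in global_w.items()}

    # Local layer: the latency gap relative to the target fine-tunes each weight.
    local_w = {}
    for n in nodes:
        gap = (latency_target_ms - n.latency_ms) / latency_target_ms
        local_w[n.node_id] = max(1.0 + gap, 1e-6)
    total = sum(local_w.values())
    local_w = {k: v / total for k, v in local_w.items()}

    # Blend global and local feedback and renormalize.
    blended = {k: alpha * global_w[k] + (1 - alpha) * local_w[k] for k in global_w}
    total = sum(blended.values())
    return {k: v / total for k, v in blended.items()}

nodes = [
    EdgeNode("edge-1", 800, 780, 120, 9.0, 1000),
    EdgeNode("edge-2", 300, 300, 10, 2.0, 800),
]
print(scheduler_tick(nodes))  # higher weight goes to the less loaded edge-2
```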
Leveraging a multi-metric weight-adjustment strategy [23], this mechanism models each edge node as a single-server queue [24] and tunes traffic weights via coordinated global and local feedback loops.
Let $\lambda$ denote the aggregated input traffic and $\mu = \sum_{j=1}^{m} \mu_j$ the combined processing capacity of all edge nodes, where $\mu_j$ is node $e_j$'s throughput upper bound.
For node $e_j$, let the current input rate be $\lambda_j$, the instantaneous throughput be $\theta_j$, the queue depth be $q_j$, and the one-way latency be $d_j$. From these measurements and the corresponding system thresholds, the scheduler periodically computes a normalized load indicator for each node. After exponential smoothing of this indicator, a monotone mapping yields per-node capacities, and the derived weights $w_j$ are normalized so that $\sum_{j=1}^{m} w_j = 1$, converging toward values proportional to the mapped capacities once the smoothed loads reach steady state.
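The snippet below illustrates the global-layer update in isolation: exponential smoothing of a per-node load indicator, a simple monotone mapping from smoothed load to capacity, and normalization into weights. The smoothing factor and the linear mapping are assumptions chosen only to make the example concrete.

```python
def smooth(prev: float, current: float, beta: float = 0.8) -> float:
    """Exponentially weighted moving average of a node's load indicator."""
    return beta * prev + (1 - beta) * current

def load_to_capacity(load: float, max_throughput: float) -> float:
    """Monotone (decreasing) mapping: higher load -> lower usable capacity."""
    return max(max_throughput * (1.0 - load), 1e-6)

# Three hypothetical edge nodes: (previous smoothed load, fresh load, throughput bound).
nodes = {"edge-1": (0.6, 0.9, 1000.0), "edge-2": (0.3, 0.2, 800.0), "edge-3": (0.5, 0.5, 600.0)}

capacities = {
    name: load_to_capacity(smooth(prev, cur), bound)
    for name, (prev, cur, bound) in nodes.items()
}
total = sum(capacities.values())
weights = {name: c / total for name, c in capacities.items()}
print(weights)  # weights sum to 1 and favor the least-loaded node
```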
For the local layer, each node batches messages and is therefore modeled as an M/G/1 single-server queue. Let $t_s$ be the basic processing time per message, $t_e$ the extra matching/forwarding overhead, and $t_b$ the batch window; the mean service time $\bar{x}$ aggregates these three components. With arrival rate $\lambda_j$, the Pollaczek–Khinchin result for M/G/1 queues [24] gives the mean waiting time

$W_q = \dfrac{\lambda_j \, \mathrm{E}[x^2]}{2\,(1 - \rho_j)}, \qquad \rho_j = \lambda_j \bar{x},$

and the total latency is then approximated by $W \approx \bar{x} + W_q$. The deviation of this latency from its target is mapped to a local capacity; the updated weights blend global and local feedback and preserve stability even under extreme concurrency.
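As a worked example of this queueing approximation (with made-up numbers purely for illustration), the following snippet evaluates the M/G/1 waiting time and total latency for one edge node, assuming for simplicity a deterministic service time so that E[x^2] = x_bar^2.

```python
def mg1_latency(arrival_rate: float, service_time: float) -> tuple[float, float]:
    """Pollaczek–Khinchin mean waiting time for an M/G/1 queue with
    deterministic service (E[x^2] = service_time**2), plus total latency."""
    rho = arrival_rate * service_time  # utilization
    assert rho < 1.0, "queue is unstable when rho >= 1"
    wq = arrival_rate * service_time ** 2 / (2 * (1 - rho))  # mean waiting time
    return wq, wq + service_time  # (waiting time, total sojourn time)

# Example: 8000 msg/s arriving, 0.1 ms mean service time per message (rho = 0.8).
wq, total = mg1_latency(arrival_rate=8000.0, service_time=0.0001)
print(f"waiting {wq * 1000:.3f} ms, total {total * 1000:.3f} ms")  # 0.200 ms, 0.300 ms
```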
By combining user-space high-speed forwarding, content-based message matching, and a concise dual-feedback scheduler, CBPS-DPDK ingests and routes heterogeneous high-concurrency data streams, balances multi-indicator loads, and maintains low latency in edge environments.
3.3. Evaluation Metrics
The performance of the proposed CBPS-DPDK system is evaluated from three perspectives: end-to-end latency, transmission bitrate, and video transmission quality.
End-to-end latency measures the total delay experienced by each data packet from generation to reception. It comprises three components: transmission latency $D_t$, processing latency $D_p$, and queuing latency $D_q$, with the total latency given by

$D_{\text{total}} = D_t + D_p + D_q.$

Here, $D_t$ denotes transmission latency, the interval required for data to traverse the network link; $D_p$ is processing latency, the time taken for data to be processed within the reception-and-forwarding system; and $D_q$ represents queuing latency, the duration data remains in the queue awaiting processing within that system. This metric is measured by timestamping packets at the sender and receiver. Packet sizes are varied from 64 bytes to 1024 bytes to observe latency performance under different data granularities. The coefficient of variation of latency, defined as the ratio of the standard deviation to the mean, is calculated to assess stability.
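A small sketch of how these statistics could be computed from sender and receiver timestamps is shown below; the timestamp values are illustrative only.

```python
import statistics

def latency_stats(send_ts: list[float], recv_ts: list[float]) -> tuple[float, float]:
    """Mean per-packet end-to-end latency and its coefficient of variation."""
    latencies = [r - s for s, r in zip(send_ts, recv_ts)]
    mean = statistics.mean(latencies)
    cv = statistics.stdev(latencies) / mean  # coefficient of variation
    return mean, cv

# Example: timestamps in seconds for five packets.
send = [0.000, 0.001, 0.002, 0.003, 0.004]
recv = [0.0021, 0.0032, 0.0040, 0.0052, 0.0061]
mean, cv = latency_stats(send, recv)
print(f"mean latency {mean * 1000:.2f} ms, CV {cv:.3f}")
```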
Transmission bitrate reflects the ability to sustain high-throughput data forwarding. It is defined as the total number of bits transmitted over a time interval:

$R = \dfrac{N_{\text{bits}}}{T},$

where $R$ is the transmission bitrate and $N_{\text{bits}}$ is the number of bits sent in duration $T$. This metric is averaged over a one-hour continuous transmission test to ensure statistical reliability. The coefficient of variation of the bitrate is calculated to evaluate temporal consistency.
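For completeness, the bitrate and its coefficient of variation over successive measurement windows can be computed as in the short example below (the sample byte counts are made up).

```python
import statistics

def bitrate_bps(bytes_sent: int, interval_s: float) -> float:
    """Average transmission bitrate R = N_bits / T in bits per second."""
    return bytes_sent * 8 / interval_s

# Bytes forwarded in each 1-second window of a longer test run.
windows = [118_000_000, 121_500_000, 119_200_000, 120_800_000]
rates = [bitrate_bps(b, 1.0) for b in windows]
cv = statistics.stdev(rates) / statistics.mean(rates)
print(f"mean {statistics.mean(rates) / 1e6:.1f} Mbit/s, CV {cv:.4f}")
```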
The transmission quality of data stream reception and forwarding is an important system evaluation metric. In this work, video streams are selected as the experimental object: the proposed system receives and forwards video stream data, and the video quality is evaluated at the reception end. If the transmission quality of the data streams is high, the quality of the received video is also high; if it is poor, packet loss, frame errors, or latency may occur and degrade the received video. The video quality at the reception end can therefore be used to evaluate the transmission quality of the data stream. In this work, the Video Multimethod Assessment Fusion (VMAF) [25] method is adopted. VMAF is a video quality evaluation metric designed by Netflix that assesses playback quality by fusing multiple evaluation methods, including structural similarity and perceptual quality metrics. Scores lie in the range [0, 100], and a higher score indicates better video quality. In this system, the subscriber is the video playback process: after the subscriber receives all sub-video streams of the same topic, the complete video stream is played, and the VMAF method is used to evaluate its quality. Better video stream quality indicates better data transmission quality.
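One practical way to obtain a VMAF score, assuming an FFmpeg build that includes the libvmaf filter is available on the evaluation host, is sketched below; the file names are placeholders, and this is not necessarily the exact toolchain used in this work.

```python
import subprocess

def vmaf_score(distorted: str, reference: str) -> None:
    """Run FFmpeg's libvmaf filter to compare the received (distorted) video
    against the original reference; the VMAF score appears in FFmpeg's log output."""
    subprocess.run(
        [
            "ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", "libvmaf", "-f", "null", "-",
        ],
        check=True,
    )

# Example (placeholder file names): compare the reassembled stream with the source.
vmaf_score("received_topic_video.mp4", "original_video.mp4")
```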