RIS Wireless Network Optimization Based on TD3 Algorithm in Coal-Mine Tunnels

Wang, Shuqi; Wang, Fengjiao

doi:10.3390/s25196058



by Shuqi Wang * and Fengjiao Wang

School of Communication and Information Engineering, Xi’an University of Science and Technology, Xi’an 710600, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(19), 6058; https://doi.org/10.3390/s25196058

Submission received: 28 August 2025 / Revised: 19 September 2025 / Accepted: 29 September 2025 / Published: 2 October 2025

(This article belongs to the Section Communications)


Abstract

As an emerging technology, the Reconfigurable Intelligent Surface (RIS) offers an efficient way to optimize communication performance in the complex and spatially constrained environment of coal mines by controlling signal-propagation paths. This study investigates the channel attenuation characteristics of a semi-circular arch coal-mine tunnel with a dual-RIS reflection link. To jointly optimize the base-station beamforming matrix and the RIS phase-shift matrix, a propagation optimization scheme based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with Noise Fading (TD3-NF) is proposed, which improves the sum rate of the coal-mine wireless communication system. Simulation results show that when the transmit power is 38 dBm, the average link rate of the system reaches 11.1 bps/Hz, a 29.07% improvement over Deep Deterministic Policy Gradient (DDPG). The average sum rate of the 8 × 8 RIS is 3.3 bps/Hz higher than that of the 4 × 4 RIS. These findings offer a new approach to improving mine communication quality and to applying artificial intelligence in complex environments.

Keywords:

coal-mine tunnel communication; reconfigurable intelligent surface; twin delayed deep deterministic policy gradient; sum rate

1. Introduction

With the widespread application of 5G technology in underground coal-mine communications, the construction of “smart mines” with the goals of visualization, unmanned operation, informatization, and intelligence is gradually being realized [1,2,3]. To effectively address the challenges of signal quality and coverage blind spots in the complex underground propagation environment, novel metasurface-based reflective surface technology with passive controllable signal characteristics has been introduced into mine safety production and information exchange processes. By dynamically adjusting signal-propagation characteristics, this technology constructs an adaptive underground signal coverage network, enabling the controllability of random and uncontrollable complex channels [4,5,6,7]. Currently, research on reflector-assisted wireless communication technology in mines can be primarily categorized into two aspects: the first focuses on the hardware structure design of reflectors [8,9,10] and the study of mine channel enhancement mechanisms using traditional algorithms [11,12,13]; the second focuses on the autonomous optimization design of mine reflectors based on intelligent algorithms such as deep learning.

Reference [14] investigates the field-strength superposition principle of multiple Reconfigurable Intelligent Surfaces (RIS) in tunnel scenarios. By constructing a vector model, it studies the signal field-strength characteristics of the RIS receiving end at a frequency of 28 GHz in subway tunnels and provides a deployment scheme for multiple RIS within a limited dynamic distribution space. Reference [15] focused on wireless communication in mine tunnels, specifically investigating non-line-of-sight propagation at the 28 GHz frequency band. It employed 3D ray-tracing simulations to model mine tunnel environments and evaluated the effectiveness of different RIS deployment modes—reflective and absorptive—in enhancing communication capacity under high path loss and multipath interference conditions. In [16], a joint deployment strategy for 5G-R base stations assisted by RIS is proposed, providing a wireless coverage solution for mountainous railways using RIS. It studies system gains using an optimized SINR criterion and the Charnes–Cooper solution method, and investigates the signal compensation characteristics of a 100-unit array RIS at the 930 MHz frequency band. In [17], a joint beamforming scheme for RIS-assisted multi-user MIMO systems based on deep learning is proposed, investigating system performance and data rates under various RIS array configurations. However, this study does not fully consider multi-RIS deployment scenarios or multi-hop signal propagation. In [18,19], multi-hop RIS-assisted joint communication and sensing (JCAS) methods are employed to enhance the energy efficiency and overall sensing rate of JCAS access points in underground coal mines. In [20], an error-rate detection method for mine RIS systems based on the Parallel Interference Cancellation (PIC) algorithm is proposed and its error characteristics are analyzed. 
However, this study only targets ideal rectangular tunnels and does not consider the complex tunnel structures in real-world scenarios. In [21], a RIS-NOMA mine communication system model based on wireless local area network noise interference is proposed, evaluating system transmission reliability by analyzing the outage probability. In [22], an underground communication system model based on the Nakagami-g fading channel model and RIS signal-propagation model is constructed, with preliminary channel estimation performed using the least squares (LS) algorithm, and the channel estimation results optimized using an octave convolution (OCT) neural network. In [23], the reconfigurable characteristics of passive RIS beams are implemented through the principles of passive coding and splicing, suitable for coal-mine tunnels with different turning angles. However, traditional algorithms are slow and inefficient when handling complex environments. Therefore, adopting more efficient intelligent algorithms is particularly important for addressing the intricate scenarios encountered in coal-mine communication systems.

Reference [24] introduces RIS into the Vehicle-to-Everything (V2X) communication system for mines. By integrating meta-heuristic optimization algorithms with deep learning techniques and incorporating an error-correction strategy, it optimizes the RIS phase configuration to enhance channel gain. Reference [25] uses deep reinforcement learning (DRL) to obtain the optimal RIS-assisted computation offloading strategy in dynamic mining environments, proposing a RIS-assisted computation offloading scheme based on Deep Deterministic Policy Gradient (DDPG) that jointly optimizes the phase shift and offloading rate of the RIS components, thereby maximizing the utility of mining IoT devices, improving energy efficiency, and reducing computational latency. In [26], RIS is used to optimize signal quality and maximize spectral efficiency in high-speed railway tunnels, and an algorithm combining Long Short-Term Memory (LSTM) and DDPG, namely LSTM-DDPG, is proposed to address this issue. In [27,28,29,30], the channel model and fading for RIS-assisted rectangular mines are derived, the feasibility of RIS in mine communications is verified, and the DDPG algorithm is used to optimize the system sum rate in rectangular tunnels. DDPG and LSTM-DDPG have achieved some progress in RIS optimization, but they still face challenges in handling noise in complex environments and multi-hop RIS deployments. The DDPG algorithm suffers from Q-value overestimation, which may lead to inefficient and unstable policy optimization. Therefore, more efficient intelligent algorithms are needed.

The existing literature primarily focuses on simple rectangular tunnel scenarios assisted by RIS, without addressing more complex mine tunnel geometric structures. The main contributions of this paper are summarized in the following three key aspects:

(1): This paper proposes for the first time a novel RIS channel propagation model specifically designed to model complex semi-circular vaulted mine structures. This model accounts for the geometric characteristics of mine tunnels, filling a gap in the existing literature that has only addressed simple rectangular tunnels.
(2): To overcome the noise issues in the existing Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, this paper proposes an improved TD3 algorithm—TD3 with Noise Fading. By dynamically adjusting the standard deviation of Ornstein–Uhlenbeck (OU) noise, this paper effectively mitigates the negative impact of noise on algorithm performance, enhancing the stability and efficiency of the optimization process.
(3): The proposed TD3-NF algorithm jointly optimizes the base-station beamforming matrix and the RIS phase-shift matrix to maximize the system link rate. The paper also analyzes the impact of base-station transmit power and the number of RIS units on the link rate, further explores the influence of key neural network parameters on algorithm performance, and provides new insights for the design of coal-mine RIS communication systems.

The remainder of this paper is structured as follows: Section 2 introduces the system model, Section 3 presents the research questions and proposes corresponding solutions, Section 4 evaluates the proposed approach through simulation, and Section 5 concludes the paper.

2. System Model

Figure 1 shows the simulated distribution of multi-branch mine tunnels. The tunnel geometric cross-section consists of a combination of rectangular tunnel structures and slightly curved semi-circular arched roofs. The figure illustrates two RIS-assisted tunnel scenarios: one is a communication system model assisted by RIS on the same side wall, and the other is a communication system model assisted by RIS on the opposite side wall. To enhance the feasibility of the system model design, the study fully considers the curved and narrow conditions of the mine tunnels. It is important to note that RIS is used for signal coverage in underground environments, so the research focuses solely on the signal-propagation characteristics of RIS-assisted mine tunnels under Non-Line-of-Sight (NLoS) conditions. Due to the complex long-distance propagation in confined spaces, the study does not consider the impact of Line-of-Sight (LoS) propagation on the system.

Consider a dual-RIS-assisted multi-user coal-mine communication system with obstacles. The system consists of a base station (BS) with M antennas, two RISs with $N_{r}$ ($r \in \{1, 2\}$) reflective units each, and K single-antenna users (UEs). Here, we assume that the two RISs have an equal number of reflective units (i.e., $N_{1} = N_{2}$). This ensures dimensional matching in the matrix operations and thus the mathematical consistency of the system model; it also simplifies the signal-processing model and keeps the signal optimization symmetric. The RISs are deployed on the curved walls of the mine tunnel, with RIS1 near the BS end and RIS2 near the UE end. The channels from the BS to RIS1 and RIS2 are denoted as $H_{n_{1} m} \in \mathbb{C}^{N_{1} \times M}$ and $H_{n_{2} m} \in \mathbb{C}^{N_{2} \times M}$, respectively; the channels from RIS1 and RIS2 to the UEs are denoted as $H_{n_{1} k} \in \mathbb{C}^{N_{1} \times K}$ and $H_{n_{2} k} \in \mathbb{C}^{N_{2} \times K}$, respectively; and the channel from RIS1 to RIS2 is denoted as $F \in \mathbb{C}^{N_{2} \times N_{1}}$.

2.1. Coal-Mine Tunnel Modeling

The reflection coefficient matrix of the r-th RIS is defined as shown in Equation (1):

Φ_{r} = \mathrm{diag} (β (θ_{n_{r}}) e^{j θ_{n_{r}}}), \quad n_{r} = 1, \dots, N_{r}, \quad θ_{n_{r}} \in [0, 2 π)

(1)

where $θ_{n_{r}} \in [0, 2 π)$ is the phase shift introduced by the n-th reflection element of the r-th RIS, and $β (θ_{n_{r}})$ is the amplitude, which is a function of the phase shift $θ_{n_{r}}$:

β (θ_{n_{r}}) = (1 - β_{min}) (\frac{\sin (θ_{n_{r}} - μ) + 1}{2})^{α} + β_{min}

(2)

where $μ$ represents the horizontal offset of the phase shift and $α$ controls the steepness of the function curve. The constants $β_{min}$, $μ \geq 0$, and $α \geq 0$ depend on the hardware implementation of the RIS. If the signal undergoes ideal reflection on the RIS, then $|ϕ_{n}|^{2} = 1$, which corresponds to $β_{min} = 1$ or $α = 0$.
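The phase-dependent amplitude in Equation (2) and the diagonal matrix in Equation (1) can be illustrated with a short NumPy sketch. The function names and the default values of `beta_min`, `mu`, and `alpha` are illustrative assumptions, not parameters reported in this paper.

```python
import numpy as np

def ris_amplitude(theta, beta_min, mu, alpha):
    """Amplitude beta(theta) from Equation (2)."""
    theta = np.asarray(theta, dtype=float)
    return (1.0 - beta_min) * ((np.sin(theta - mu) + 1.0) / 2.0) ** alpha + beta_min

def ris_reflection_matrix(theta, beta_min=0.2, mu=0.0, alpha=1.6):
    """Diagonal reflection matrix Phi_r from Equation (1).

    The default constants are placeholders chosen for illustration only.
    """
    theta = np.asarray(theta, dtype=float)
    coeffs = ris_amplitude(theta, beta_min, mu, alpha) * np.exp(1j * theta)
    return np.diag(coeffs)

# Ideal reflection (alpha = 0) gives unit-modulus diagonal entries.
phi_ideal = ris_reflection_matrix([0.0, 1.0, 2.0], alpha=0.0)
```

With `alpha = 0` or `beta_min = 1` every diagonal entry has unit modulus, which matches the ideal-reflection case described above.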

The aggregated channel from the BS to the user k is shown in Equation (3):

H_{k} = \underbrace{H_{n_{2} k} Φ_{2} F Φ_{1} H_{n_{1} m}}_{\text{dual-reflection link}} + \underbrace{H_{n_{1} k} Φ_{1} H_{n_{1} m} + H_{n_{2} k} Φ_{2} H_{n_{2} m}}_{\text{single-reflection links}}

(3)

Because electrical noise has a wide frequency spectrum, the electrical noise that affects radio signals is treated as impulsive interference. Underground noise can therefore be represented by an independent and identically distributed Bernoulli–Gaussian process:

W = w_{bg} + w_{imp} = w_{bg} + B \cdot G_{a}

(4)

where $w_{bg} \sim N (0, σ^{2})$ is additive white Gaussian noise, $w_{imp} = B \cdot G_{a}$ is the impulsive interference noise (Gaussian noise gated by random pulses), $B$ is a Bernoulli random process taking the values 0 or 1, and $G_{a} \sim N (0, σ_{imp}^{2})$.
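As a minimal sketch of Equation (4), the snippet below draws real-valued Bernoulli–Gaussian noise samples. The function name and the parameter values in the usage line are assumptions for illustration; `p` denotes the probability that the Bernoulli variable equals 1, so the average noise power is `sigma2 + p * sigma2_imp`.

```python
import numpy as np

def bernoulli_gaussian_noise(size, sigma2, sigma2_imp, p, rng=None):
    """Sample W = w_bg + B * G_a (real-valued sketch of Equation (4))."""
    rng = np.random.default_rng() if rng is None else rng
    w_bg = rng.normal(0.0, np.sqrt(sigma2), size)    # background Gaussian noise
    b = rng.random(size) < p                         # Bernoulli gate, P(B = 1) = p
    g_a = rng.normal(0.0, np.sqrt(sigma2_imp), size) # impulsive Gaussian component
    return w_bg + b * g_a

# Illustrative values only: the empirical power should be close to 1 + 0.1 * 100 = 11.
samples = bernoulli_gaussian_noise(200_000, 1.0, 100.0, 0.1, np.random.default_rng(0))
empirical_power = samples.var()
```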

The signal received by the k-th user can be expressed as shown in Equation (5):

y_{k} = P_{k} H_{k} g_{k} s_{k} + P_{n} H_{k} \sum_{j = 1, j \neq k}^{K} g_{j} s_{j} + w_{k}

(5)

where the first term represents the desired signal of the k-th user; the second term represents the interference caused by the signals of all other users ($j \neq k$) to the k-th user, i.e., co-channel interference (CCI); $s_{k} \sim CN (0, σ^{2})$ is the data symbol transmitted by the base station to user k; $g_{k} \in \mathbb{C}^{M \times 1}$ is the transmit beamforming vector of the base station for user k; and P is the transmission power of each signal ($P_{k}$ for the desired signal and $P_{n}$ for the interfering signals). Therefore, the signal-to-interference-plus-noise ratio (SINR) of the k-th user in the system can be expressed as shown in Equation (6):

β_{k} = \frac{P_{k} | H_{k} g_{k} |^{2}}{P_{n} \sum_{j = 1, j \neq k}^{K} | H_{k} g_{j} |^{2} + σ^{2} + p \cdot σ_{imp}^{2}}

(6)

where p is the probability that $B$ equals 1. Then, the total link rate of the system (in bps/Hz) can be expressed as shown in Equation (7):

C = \sum_{k = 1}^{K} \log_{2} (1 + β_{k})

(7)
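Equations (6) and (7) can be evaluated numerically as in the sketch below. It assumes that the aggregated channel of user k is available as row k of a K × M array `h_eff` and that the beamforming vectors are the columns of an M × K array `G`; these array layouts and the function name are illustrative assumptions.

```python
import numpy as np

def sum_rate(h_eff, G, p_sig, p_int, sigma2, p_imp, sigma2_imp):
    """Return (sum rate in bps/Hz, per-user SINR) following Equations (6) and (7)."""
    gains = np.abs(h_eff @ G) ** 2                 # gains[k, j] = |H_k g_j|^2
    desired = p_sig * np.diag(gains)               # numerator of Equation (6)
    interference = p_int * (gains.sum(axis=1) - np.diag(gains))
    sinr = desired / (interference + sigma2 + p_imp * sigma2_imp)
    return float(np.log2(1.0 + sinr).sum()), sinr

# Toy example: two users with orthogonal channels and matched beams,
# so there is no co-channel interference and each SINR is 3 / (0.5 + 0.5 * 1).
rate, sinr = sum_rate(np.eye(2), np.eye(2), 3.0, 3.0, 0.5, 0.5, 1.0)
```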

2.2. Signal-Propagation Path Analysis

The study uses the mirror method to predict the propagation characteristics of signals. First, it is necessary to obtain the effective reflection surfaces required by the mirror method. As shown in Figure 2, the arch is divided into Q equal parts. Regardless of whether Q is odd or even, the Y-axis and Z-axis coordinates of any point $Q_{q} = (x_{q}, y_{q}, z_{q})$ can be expressed as shown in Equation (8):

y_{q} = a - a \cos (\frac{q π}{Q}), \quad z_{q} = b + a \sin (\frac{q π}{Q})

(8)

The q-th small plane of the arch is determined by the two adjacent points $(y_{q}, z_{q})$ and $(y_{q - 1}, z_{q - 1})$. The plane equation of the q-th small plane is shown in Equation (9):

\frac{(y - y_{q}) (z_{q} - z_{q - 1})}{y_{q} - y_{q - 1}} - z + z_{q} = 0

(9)

The plane equations of the coal-mine tunnel surfaces are shown in Equation (10):

\begin{cases} \frac{(y - y_{q}) (z_{q} - z_{q - 1})}{y_{q} - y_{q - 1}} - z + z_{q} = 0 & \text{Top plate} \\ x = \pm a & \text{Left/right side walls} \\ z = 0 & \text{Bottom plate} \end{cases}

(10)
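The arch discretization in Equations (8) and (9) can be sketched as follows. The helper names are hypothetical, and the facet plane is returned in the multiplied-out form a2·y + a3·z + a4 = 0, which is equivalent to Equation (9) but avoids dividing by y_q − y_{q−1}. The values a = 2.5 and b = 3.5 in the usage line are assumptions chosen to match a 2.5 m arch radius on a 6 m high tunnel.

```python
import math

def arch_points(a, b, Q):
    """Arch vertices (y_q, z_q) for q = 0..Q, following Equation (8)."""
    return [(a - a * math.cos(q * math.pi / Q), b + a * math.sin(q * math.pi / Q))
            for q in range(Q + 1)]

def facet_plane(p_prev, p_cur):
    """Coefficients (a2, a3, a4) of the facet plane a2*y + a3*z + a4 = 0
    through two adjacent arch vertices (Equation (9) cleared of its denominator)."""
    (y_prev, z_prev), (y_cur, z_cur) = p_prev, p_cur
    a2 = z_cur - z_prev
    a3 = -(y_cur - y_prev)
    a4 = -(a2 * y_cur + a3 * z_cur)
    return a2, a3, a4

points = arch_points(a=2.5, b=3.5, Q=4)
planes = [facet_plane(points[q - 1], points[q]) for q in range(1, len(points))]
```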

Let the transmitting point be $A (x_{0}, y_{0}, z_{0})$, the receiving point be $B (x_{1}, y_{1}, z_{1})$, the reflection point be $C (x, y, z)$, and the equation of the reflection plane m be $a_{1} x + a_{2} y + a_{3} z + a_{4} = 0$, all expressed in the Cartesian coordinate system shown in Figure 2. The mirror point $M (x_{m}, y_{m}, z_{m})$ can be expressed as shown in Equation (11):

\begin{cases} x_{m} = x_{0} \\ y_{m} = [(a_{3}^{2} - a_{2}^{2}) y_{0} - 2 a_{2} a_{3} z_{0} - 2 a_{2} a_{4}] / (a_{2}^{2} + a_{3}^{2}) \\ z_{m} = [(a_{2}^{2} - a_{3}^{2}) z_{0} - 2 a_{2} a_{3} y_{0} - 2 a_{3} a_{4}] / (a_{2}^{2} + a_{3}^{2}) \end{cases}

(11)

Since the line $M B$ connecting the mirror point and the receiving point intersects the reflection plane at the reflection point, we have:

\frac{x_{m} - x}{x_{1} - x} = \frac{y_{m} - y}{y_{1} - y} = \frac{z_{m} - z}{z_{1} - z} = u

(12)

From this, we can obtain:

x = \frac{x_{m} - u x_{1}}{1 - u}, \quad y = \frac{y_{m} - u y_{1}}{1 - u}, \quad z = \frac{z_{m} - u z_{1}}{1 - u}

(13)

The reflection point also lies on the reflection plane and therefore satisfies the plane equation, which gives:

u = \frac{a_{1} x_{m} + a_{2} y_{m} + a_{3} z_{m} + a_{4}}{a_{1} x_{1} + a_{2} y_{1} + a_{3} z_{1} + a_{4}}

(14)

The coordinates of the reflection point can be obtained from Equations (13) and (14).
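The mirror-method steps in Equations (11) to (14) can be sketched in a few lines of Python. The function names are hypothetical, and because Equation (11) leaves the x coordinate unchanged, the sketch assumes reflection planes with a1 = 0 (planes parallel to the x-axis).

```python
def mirror_point(A, plane):
    """Mirror image of A across the plane a2*y + a3*z + a4 = 0 (Equation (11))."""
    x0, y0, z0 = A
    _, a2, a3, a4 = plane
    d = a2 ** 2 + a3 ** 2
    ym = ((a3 ** 2 - a2 ** 2) * y0 - 2 * a2 * a3 * z0 - 2 * a2 * a4) / d
    zm = ((a2 ** 2 - a3 ** 2) * z0 - 2 * a2 * a3 * y0 - 2 * a3 * a4) / d
    return (x0, ym, zm)

def reflection_point(A, B, plane):
    """Reflection point C on the plane for transmitter A and receiver B
    (Equations (13) and (14)). Assumes A and B are on the same side of the plane."""
    a1, a2, a3, a4 = plane
    M = mirror_point(A, plane)
    side = lambda P: a1 * P[0] + a2 * P[1] + a3 * P[2] + a4
    u = side(M) / side(B)                                  # Equation (14)
    return tuple((m - u * b) / (1 - u) for m, b in zip(M, B))  # Equation (13)

# Reflection off the bottom plate z = 0, written as (a1, a2, a3, a4) = (0, 0, 1, 0).
C = reflection_point((0.0, 1.0, 2.0), (4.0, 1.0, 2.0), (0.0, 0.0, 1.0, 0.0))
```

For this symmetric geometry the reflection point lies midway between the transmitter and receiver on the floor, as expected for specular reflection.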

In a semi-circular arch mine tunnel, rays that undergo more than three reflections suffer significant energy attenuation. Considering only the first three reflections is therefore sufficient to capture the signal-propagation behavior, and the upper limit on the number of reflections is set to three [31,32]. The number of reflected rays depends on the number of reflective surfaces $ξ$ in the tunnel, where the reflective surfaces include the tunnel walls and the RISs. There are at most $ξ$ single-reflection rays, each involving one reflective surface; at most $ξ (ξ - 1)$ double-reflection rays, each formed by two different reflective surfaces; and at most $ξ (ξ - 1)^{2}$ triple-reflection rays, each passing through three reflection points, where adjacent reflection points cannot lie on the same reflective surface but the first and third may. In this process, the RIS, as a new type of controllable reflective surface, works together with the conventional reflective surfaces to construct a multipath propagation environment. The total number of effective scattering paths is therefore given by Equation (15):

ξ + ξ (ξ - 1) + ξ (ξ - 1)^{2}

(15)
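The count in Equation (15) can be checked by brute force: enumerate every surface sequence of length one to three and keep those in which consecutive reflections use different surfaces. The function names and the value ξ = 5 in the usage line are illustrative assumptions.

```python
from itertools import product

def max_paths(xi):
    """Upper bound on reflection paths from Equation (15)."""
    return xi + xi * (xi - 1) + xi * (xi - 1) ** 2

def enumerate_paths(xi, max_order=3):
    """Count surface sequences with no two consecutive reflections on the same surface."""
    count = 0
    for order in range(1, max_order + 1):
        for seq in product(range(xi), repeat=order):
            if all(s != t for s, t in zip(seq, seq[1:])):
                count += 1
    return count

n_formula = max_paths(5)
n_enumerated = enumerate_paths(5)
```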

Assuming that the signal passes through n reflection points $C_{1}, C_{2}, \dots, C_{n}$ from the transmission point to the reception point, the path length of the signal propagation can be expressed as $L = \sum_{i = 1}^{n} p_{i}$, where $p_{i}$ ($i = 1, 2, \dots, n$) represents the distance from the i-th reflection point $C_{i}$ to the (i + 1)-th reflection point $C_{i + 1}$, and the reflection points include the positions of the RISs. In the dual-RIS case, the length of the reflection path changes with the number of RISs, so the distance between consecutive reflection points must be calculated step by step.

According to the Fresnel equations, the reflection coefficients of vertically polarized waves and horizontally polarized waves can be expressed as shown in Equations (16) and (17):

R_{⊥} = \frac{\cos θ_{i} - \sqrt{ε_{r} - \sin^{2} θ_{i}}}{\cos θ_{i} + \sqrt{ε_{r} - \sin^{2} θ_{i}}}

(16)

R_{‖} = \frac{ε_{r} \cos θ_{i} - \sqrt{ε_{r} - \sin^{2} θ_{i}}}{ε_{r} \cos θ_{i} + \sqrt{ε_{r} - \sin^{2} θ_{i}}}

(17)

where $θ_{i}$ is the angle of incidence and $ε_{r}$ is the relative dielectric constant of the tunnel wall. Assuming that the roughness of the tunnel wall follows a Gaussian distribution with a mean of 0 and a variance of h, and that $ρ_{r}$ is the roughness loss factor, the reflection coefficients modified by the rough surface of the tunnel wall are:

ρ_{⊥} = ρ_{r} R_{⊥}, \quad ρ_{‖} = ρ_{r} R_{‖}

(18)

where $ρ_{r} = \exp [- 8 (\frac{π h \cos θ_{i}}{λ})^{2}]$ and $λ$ is the wavelength. Multiple reflections of the signal on the side walls and on the top and bottom plates cumulatively affect the total reflection coefficient. Assuming the ray undergoes m reflections at the side walls and n reflections at the top and bottom plates (i.e., m reflections of horizontally polarized waves followed by n reflections of vertically polarized waves at the top and bottom plates), the reflection coefficient can be expressed as shown in Equation (19):

ω_{p} = \begin{cases} R_{⊥}^{m} R_{‖}^{n} ρ_{r}^{m + n} \\ R_{‖}^{m} R_{⊥}^{n} ρ_{r}^{m + n} \end{cases}

(19)
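Equations (16) to (18) translate directly into code. The sketch below uses only the standard library; the function names are hypothetical, and the complex square root is used so that the expressions remain valid if a complex permittivity is supplied.

```python
import cmath
import math

def fresnel_coefficients(theta_i, eps_r):
    """Return (R_perp, R_par) from Equations (16) and (17) for incidence angle theta_i (rad)."""
    root = cmath.sqrt(eps_r - math.sin(theta_i) ** 2)
    c = math.cos(theta_i)
    r_perp = (c - root) / (c + root)
    r_par = (eps_r * c - root) / (eps_r * c + root)
    return r_perp, r_par

def roughness_factor(theta_i, h, wavelength):
    """Roughness loss factor rho_r defined after Equation (18)."""
    return math.exp(-8.0 * (math.pi * h * math.cos(theta_i) / wavelength) ** 2)

# Normal incidence on a wall with eps_r = 4 (illustrative value):
r_perp, r_par = fresnel_coefficients(0.0, 4.0)
rho_perp = roughness_factor(0.0, 0.01, 0.125) * r_perp  # Equation (18), lambda = 0.125 m at 2.4 GHz
```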

3. Joint Design of Base-Station Beamforming and RIS Phase Shifting Based on TD3-NF

3.1. Optimization of Beamforming Matrix and RIS Phase Shift Matrix

The transmit power of the multi-antenna BS is subject to the maximum transmission power constraint shown in Equation (20):

E \{ \mathrm{Tr} (G G^{H}) \} \leq P_{max}

(20)

where $G$ is the transmit beamforming matrix, $P_{max}$ is the maximum transmission power at the BS, $E \{ \cdot \}$ denotes statistical expectation, and $\mathrm{Tr} (\cdot)$ denotes the trace of a matrix.

To maximize the total link rate of the system by optimizing $G$ and $Φ$, the optimization problem is defined as:

\begin{matrix} \max_{G, Φ} & \sum_{k = 1}^{K} \log_{2} (1 + β_{k}) \\ \mathrm{s.t.} & \mathrm{Tr} (G G^{H}) \leq P_{max} \\ & | ϕ_{n} |^{2} = 1 \\ & 0 \leq θ_{n} < 2 π, \quad n = 1, 2, \dots, N \end{matrix}

(21)

Due to the non-convexity of the objective function and the complexity of the constraints, traditional mathematical optimization methods require a large number of iterative calculations, resulting in high computational complexity. Therefore, this paper adopts a framework based on the TD3 algorithm to solve the problem in Equation (21) and obtain feasible $G$ and $Φ$ that satisfy all constraints.

3.2. TD3-NF Algorithm Design

The TD3 algorithm is an improvement on the DDPG algorithm; both are designed for continuous control problems. Compared with DDPG, TD3 introduces two key mechanisms: twin Critic networks, whose smaller Q-value estimate is used to form the target, and delayed updates of the Actor (policy) network. Together, these reduce estimation errors during training.

In standard TD3, Ornstein–Uhlenbeck (OU) noise is used to increase the randomness of action selection and thereby promote exploration. However, the standard deviation of the OU noise is typically fixed, which may result in an overly prolonged exploration phase or excessive noise persisting in the later stages of training, thereby impairing the model’s exploitation efficiency. To address this issue, this paper proposes the TD3-NF algorithm to solve the problem in Equation (21). This algorithm balances exploration and exploitation by gradually reducing the standard deviation of the noise, i.e.,

σ_{t} = \max (σ_{t - 1} \cdot γ_{s}, σ_{min})

(22)

where $γ_{s}$ is the noise decay rate, $σ_{min}$ is the minimum standard deviation of the noise, and $σ_{t - 1}$ is the standard deviation of the noise in the previous step. This decay strategy allows the algorithm to explore extensively in the early stages and to transition gradually to a policy-driven exploitation phase as training progresses. The design process for solving the problem in Equation (21) is shown in Figure 3.
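A minimal sketch of the noise-fading rule in Equation (22), combined with a discrete OU update, is given below. The class name, the mean-reversion rate `theta`, the zero long-run mean, and all default values are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

class FadingOUNoise:
    """OU exploration noise whose standard deviation decays as in Equation (22)."""

    def __init__(self, dim, sigma0=0.2, decay=0.995, sigma_min=0.01, theta=0.15, seed=0):
        self.sigma, self.decay, self.sigma_min = sigma0, decay, sigma_min
        self.theta = theta                      # assumed mean-reversion rate
        self.x = np.zeros(dim)                  # OU state, assumed zero long-run mean
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Discrete OU step: pull the state toward zero, then add scaled Gaussian noise.
        self.x = self.x - self.theta * self.x + self.sigma * self.rng.standard_normal(self.x.shape)
        # Noise fading: sigma_t = max(sigma_{t-1} * gamma_s, sigma_min).
        self.sigma = max(self.sigma * self.decay, self.sigma_min)
        return self.x.copy()

noise = FadingOUNoise(dim=3, sigma0=1.0, decay=0.5, sigma_min=0.2)
draws = [noise.sample() for _ in range(3)]      # sigma: 1.0 -> 0.5 -> 0.25 -> 0.2
```

In an agent, the sampled vector would be added to the Actor output before the action is applied, so exploration shrinks as `sigma` approaches `sigma_min`.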

The TD3-NF agent takes the channel state information (CSI) as input and generates the optimal $G$ and $Φ$. The specific settings for state, action, and reward are as follows:

State $s_{t}$ : Transmit power at time t; channel matrices from BS to RIS1 and RIS2; channel matrices from RIS1 and RIS2 to UE; channel matrix from RIS1 to RIS2. The state space size is shown in Equation (23):

$N_{i} = 2 K + 2 K^{2} + 2 M K + 2 (N_{1} + N_{2}) + 2 M (N_{1} + N_{2}) + 2 N_{1} N_{2} + 2 (N_{1} + N_{2}) K$

(23)
Action $a_{t}$ : Composed of the beamforming matrix and phase shift matrix at time t; the size of the action space is shown in Equation (24):

$N_{o} = 2 M K + 2 N_{1} + 2 N_{2}$

(24)
Reward $r_{t}$ : The reward at time t is the value of the objective function in Equation (21):

$r_{t} = \sum_{k = 1}^{K} {log}_{2} (1 + β_{k})$

(25)

The TD3-NF algorithm proposed in this paper follows the procedure below, as summarized in Algorithm 1.

Algorithm 1. TD3-NF algorithm

1: Initialize: transmit beamforming matrix G, phase-shift matrix $Φ$ , experience replay pool E, parameters of the Critic training networks $w_{c_{1}}, w_{c_{2}}$ , parameters of the Actor training network $w_{a}$ , parameters of the target networks $w_{c_{1}}^{'}, w_{c_{2}}^{'}, w_{a}^{'}$
2: Input: channel matrices from BS to RIS1 and RIS2 $H_{n_{1} m}, H_{n_{2} m}$ ; channel matrices from RIS1 and RIS2 to UE $H_{n_{1} k}, H_{n_{2} k}$ ; channel matrix from RIS1 to RIS2 $F$
3: Output: Q-value function, optimal action $a = \{G, Φ\}$
4: for $j = 1$ to J do
5: Get the initial state $s_{0}$ ; for $t = 1$ to T do
6: Obtain the action $a_{t} = π (s_{t}; w_{a})$ from the Actor training network
7: Based on the action $a_{t}$ , obtain the next state $s_{t + 1}$ and the immediate reward $R_{t + 1}$ ; store $(s_{t}, a_{t}, R_{t + 1}, s_{t + 1})$ in the experience replay pool E
8: Randomly draw a minibatch of W samples from the experience replay pool E
9: Use the two target Critic networks to predict the Q-values $y_{1}, y_{2}$
10: Calculate the TD target $y_{min}$
11: Update Critic: $w_{c} \leftarrow w_{c} - β_{c} \cdot \nabla_{w_{c}} l (w_{c})$
12: Update Actor: $w_{a} \leftarrow w_{a} - β_{a} \cdot \nabla_{a} q (s_{t}, a_{t}; w_{c}) \nabla_{w_{a}} π (s_{t}; w_{a})$
13: Update target networks: $w_{a}^{'} \leftarrow τ w_{a} + (1 - τ) w_{a}^{'}$ , $w_{c}^{'} \leftarrow τ w_{c} + (1 - τ) w_{c}^{'}$ , $0 < τ ≪ 1$
14: Let $s_{t} = s_{t + 1}$
15: end
16: end
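Steps 9, 10, and 13 of Algorithm 1 can be illustrated with a small NumPy sketch. It assumes the TD target takes the usual TD3 form, reward plus discounted minimum of the two target-Critic estimates; the discount factor `gamma`, the function names, and the default values are assumptions, since they are not specified in the listing.

```python
import numpy as np

def td_target(reward, q1_next, q2_next, gamma=0.99):
    """TD target built from the smaller of the two target-Critic Q-values (steps 9-10)."""
    return reward + gamma * np.minimum(q1_next, q2_next)

def soft_update(target_params, online_params, tau=0.005):
    """Soft update w' <- tau * w + (1 - tau) * w' for each parameter array (step 13)."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]

# Toy numbers: the smaller Q-value (2.0) is used, giving 1.0 + 0.5 * 2.0.
y_min = td_target(np.array([1.0]), np.array([2.0]), np.array([3.0]), gamma=0.5)
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

Taking the minimum of the two estimates is what counteracts the Q-value overestimation attributed to DDPG earlier in the paper.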

As the number of training iterations increases, the agent continuously interacts with the environment to select better actions, ultimately maximizing the long-term return. When training approaches convergence, the agent outputs the optimized beamforming matrix and RIS phase-shift matrix. The neural network structure is shown in Figure 4. The Actor and Critic networks, as well as their respective target networks, are structurally similar, each consisting of four fully connected layers: an input layer, two hidden layers (with batch normalization applied in the hidden layers), and an output layer. All layers of the deep neural networks in this paper use the tanh activation function, and all networks use the Adam optimizer. The learning rate in Adam is adaptively adjusted according to the gradient changes of each parameter and is decayed as ${β_{a}}^{(t + 1)} = λ_{a} {β_{a}}^{(t)}$ and ${β_{c}}^{(t + 1)} = λ_{c} {β_{c}}^{(t)}$, where $λ_{a}$ and $λ_{c}$ are the decay rates of the Actor and Critic networks, respectively, $β_{c}$ is the learning rate for updating the Critic training network, and $β_{a}$ is the learning rate for updating the Actor training network.

During the training phase, L denotes the number of network layers, $Z_{0}$ the size of the input layer, and $Z_{l}$ the number of neurons in layer l, and the number of samples in a minibatch is set to B. The computational complexity of each time step is $O (B (Z_{0} Z_{1} + \sum_{l = 1}^{L - 1} Z_{l} Z_{l + 1}))$. Training comprises $N^{epi}$ episodes, each T time steps long, and each model is iterated until convergence. Therefore, the total computational complexity of training is expressed as $O (N^{epi} B (Z_{0} Z_{1} + \sum_{l = 1}^{L - 1} Z_{l} Z_{l + 1}))$. In addition to the forward propagation and backpropagation computations of the neural network, the training phase requires environment simulation. This involves calculating the composite channel matrix for each user and performing the matrix multiplications for the RIS phase shifts. The computational complexity of the channel calculation is $O (N_{r} \cdot K)$, where $N_{r}$ represents the number of reflective units in the RIS and K denotes the number of users. Additionally, the SINR of each user must be computed, introducing further computational overhead. During the simulation phase, the overall computational complexity increases to $O (N_{r} \cdot K \cdot J)$, where J corresponds to the number of SINR calculations performed per user.

4. Results

To validate the feasibility of the proposed TD3-NF-based algorithm for improving transmission rates in RIS-assisted semi-circular arch tunnels, this section presents numerical simulations implemented in Python. First, an environment initialization function is constructed and initial values are set. An environment transition function is designed to describe the change in the environment state after each action step. The neural network structure and parameters of the proposed TD3-NF-based algorithm are then designed, and a DRL agent is constructed to interact with the environment. The tunnel height is set to 6 m, the width to 5 m, and the arch crown radius to 2.5 m; the distances from the base station and the receiver to the tunnel sidewall are 4.5 m and 2 m, respectively. The RIS arrays have $4 \times 4$, $4 \times 8$, and $8 \times 8$ elements, and the element size is $d_{x} = d_{y} = 5$ cm. At 2.4 GHz, the dimensions of the RISs are $23.75 \times 23.75$ cm, $23.75 \times 48.75$ cm, and $48.75 \times 48.75$ cm, with a transmission distance of 200 m. The distance between the RIS and the base station is fixed at 50 m. We assume that the underground mine communication system employs a leaky feeder antenna configuration, an antenna structure commonly used in complex environments such as mines to provide effective signal coverage. Signals are modulated using QPSK, with a transmission bit rate of 1 Mbps, a beamwidth of 120°, and a standard time-division multiplexing (TDM) frame structure. The channel matrices $H_{n_{1} m}$, $H_{n_{2} m}$ and $H_{n_{1} k}$, $H_{n_{2} k}$ are generated using ray tracing, where the total number of effective scattering paths is $R = 128$, the gain of the receiving and transmitting antennas is 5 dBi, the roughness variance of the tunnel walls is $h = 0.01$, and the relative permittivity is $ε_{r} = 5$. The simulation parameters are shown in Table 1. In this paper, the average reward is used as the metric to evaluate system performance, defined as shown in Equation (26):

R_{avg} (t) = \frac{\sum_{i = 1}^{t} reward (i)}{t}, \quad t = 1, 2, \dots, T

(26)
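Reading Equation (26) as a running average over the first t steps (an interpretation, since the index range in the formula is ambiguous), the metric can be computed in one line with a cumulative sum. The function name and the sample rewards are illustrative.

```python
import numpy as np

def running_average_reward(rewards):
    """Running average of the reward after each step t = 1..T."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards) / np.arange(1, len(rewards) + 1)

avg = running_average_reward([2.0, 4.0, 6.0])
```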

As shown in Figure 5, this paper compares the performance of four reinforcement learning algorithms (TD3-NF, DDPG, Asynchronous Advantage Actor–Critic (A3C), and Deep Q-Network (DQN)) in a RIS-assisted tunnel communication system. TD3-NF performs best, with its link rate improving from an initial 2.5 bps/Hz to 11.1 bps/Hz. DDPG eventually stabilizes at 8.6 bps/Hz, with its performance bottleneck stemming primarily from overestimated Q-values. The learning curve of A3C is relatively flat, increasing from approximately 1.0 bps/Hz to 5.7 bps/Hz. DQN, constrained by the limited expressiveness of its discrete action space, improves only slowly from 2.3 bps/Hz to 4.8 bps/Hz. Overall, TD3-NF is the most suitable of the four algorithms for this tunnel communication system owing to its adaptability and efficient policy learning, whereas DDPG, A3C, and DQN face performance bottlenecks of varying degrees, particularly when handling complex continuous or large-scale action spaces.

When $P_{t} = 38$ dBm, the changes in the system average reward under different RIS arrays for the DDPG and TD3-NF algorithms are shown in Figure 6. As the number of RIS elements increases, the system average reward using the TD3-NF algorithm improves from 7.8 bps/Hz to 11.1 bps/Hz, an increase of 3.3 bps/Hz, while DDPG only improves by 1.8 bps/Hz. This indicates that the TD3-NF algorithm can more effectively boost the system’s average reward, exhibiting stronger scalability and optimization capabilities as the number of RIS elements increases.

When the number of RIS elements in the system is 64, the average reward of the two algorithms under different transmit power levels is shown in Figure 7. When the power increases by 2 dBm, the TD3-NF algorithm exhibits a faster reward growth rate than DDPG. At $P_{t} = 38$ dBm, the average reward of the system under TD3-NF has increased from 3 bps/Hz to 11.1 bps/Hz, a gain of 8.1 bps/Hz, while the reward of DDPG improves by only 3 bps/Hz. The TD3-NF algorithm consistently outperforms the DDPG algorithm across transmit power levels under identical conditions, particularly at high power, demonstrating superior adaptability and optimization capability in complex environments.

Figure 6 and Figure 7 show that the advantage of the TD3-NF algorithm becomes more pronounced as the number of RIS elements and the transmit power increase: under larger RIS configurations and higher transmit power, both the learning speed and the reward value improve significantly. This indicates that increasing the number of RIS elements and the transmit power has a positive effect on algorithm performance.

The CDF of the total system rate for different numbers of RIS elements and transmit powers is shown in Figure 8. As the transmit power and the number of RIS elements increase, the CDF curve shifts to the right. When CDF = 0.6 and $P_{t} = 37$ dBm, the system rate increases with the number of RIS elements, and the rate for an 8 × 8 array is 13.45 bps/Hz. The CDF curves are consistent with the observations in Figure 6 and Figure 7, confirming that both the average and peak rates of the system improve with increases in transmit power and the number of RIS elements.
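To make the CDF reading concrete, the following sketch builds an empirical CDF from sum-rate samples and reads off the rate at a given CDF level, as done above for CDF = 0.6. The samples are synthetic: the Gaussian model, its parameters, and the names `empirical_cdf` and `rate_at_cdf_level` are assumptions for illustration and do not reproduce the data behind Figure 8.

```python
import random


def empirical_cdf(samples):
    """Return the sorted samples and the empirical CDF value (i / n) at each one."""
    ordered = sorted(samples)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


def rate_at_cdf_level(samples, level):
    """Smallest sample whose empirical CDF value is at least `level`."""
    ordered, cdf = empirical_cdf(samples)
    for rate, prob in zip(ordered, cdf):
        if prob >= level:
            return rate
    return ordered[-1]


# Synthetic sum-rate samples in bps/Hz (placeholder model, not the paper's data).
rng = random.Random(0)
small_array = [rng.gauss(10.0, 1.0) for _ in range(1000)]
large_array = [rng.gauss(13.0, 1.0) for _ in range(1000)]

print(rate_at_cdf_level(small_array, 0.6), rate_at_cdf_level(large_array, 0.6))
```

A curve that lies further to the right returns a larger rate at the same CDF level, which is the sense in which a rightward shift indicates a better rate distribution.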

In the algorithm proposed in this paper, the Actor and Critic networks use fixed learning rates and decay rates. Under the conditions of $N = 8 \times 8$ and $P_{t} = 37$ dBm, the relationship between different learning rates, decay rates, average rewards, and episode steps is shown in Figure 9 and Figure 10.

The impact of different learning rates on algorithm performance is shown in Figure 9. The system achieves its best performance at a learning rate of $10^{-3}$, so this paper selects $10^{-3}$ as the learning rate for this model. The algorithm performs worst at a learning rate of $10^{-2}$, possibly because an excessively high learning rate causes the optimization to oscillate continuously and fail to reach the optimal solution.
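The oscillation effect can be illustrated on a toy problem. The sketch below runs plain gradient descent on a one-dimensional quadratic, not on the Actor and Critic networks of this paper. The curvature value, the starting point, and the extra learning rate $10^{-4}$ are hypothetical choices made only so that the three regimes (divergent oscillation, fast convergence, slow convergence) are visible.

```python
def gradient_descent_trace(learning_rate, curvature=250.0, x0=1.0, steps=20):
    """Minimize f(x) = 0.5 * curvature * x**2 and return |x| after each step."""
    x = x0
    trace = []
    for _ in range(steps):
        gradient = curvature * x       # f'(x)
        x -= learning_rate * gradient  # x is scaled by (1 - learning_rate * curvature)
        trace.append(abs(x))
    return trace


for lr in (1e-2, 1e-3, 1e-4):
    final_distance = gradient_descent_trace(lr)[-1]
    print(f"lr={lr:g}: |x| after 20 steps = {final_distance:.4g}")
```

With these toy settings, each step multiplies x by -1.5 for a learning rate of $10^{-2}$, so the iterate flips sign and grows. For $10^{-3}$ the factor is 0.75 and the iterate converges quickly, and for $10^{-4}$ the factor is 0.975 and convergence is slow.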

The relationship between the average reward and the time-step length under different decay rates is shown in Figure 10. Compared with the learning rate, the decay rate has a smaller impact on system performance. When the decay rate is $10^{-6}$, the system achieves optimal performance, while the algorithm performs worst when the decay rate is $10^{-3}$. Therefore, this paper selects $10^{-6}$ as the decay rate for this model.

5. Conclusions

This study proposes a RIS-assisted communication design scheme for semi-circular arch mine tunnels. The base-station beamforming matrix and the RIS phase-shift matrix are jointly optimized with the TD3-NF algorithm, and the resulting system link rate is evaluated. The results show that the TD3-NF algorithm significantly outperforms the traditional DDPG, A3C, and DQN algorithms in link-rate optimization, with the average link rate improving from an initial 2.5 bps/Hz to 11.1 bps/Hz. With a transmit power of 38 dBm and an 8 × 8 RIS, the system's average link rate reaches 11.1 bps/Hz, a 29.07% improvement over the 8.6 bps/Hz achieved by DDPG, the second-best algorithm, under the same conditions. The study also analyzes the impact of the RIS element count and base-station transmit power on system performance, finding that increasing the number of RIS elements and the transmit power further enhances system performance, offering new solutions to improve the efficiency of intelligent mine communication systems.

However, current research still faces several challenges, including performance optimization in complex mine structures, interference issues in multi-hop RIS deployment, and stability concerns with intelligent algorithms. Future research could focus on the following directions: further optimizing RIS deployment and multi-RIS collaborative optimization in complex environments, enhancing algorithm adaptability and stability in dynamic mine settings, and exploring the integrated application of RIS with other intelligent technologies. In addition, practical deployment and experimental validation will provide crucial evidence for further refining this solution.

Author Contributions

Conceptualization, S.W. and F.W.; methodology, S.W. and F.W.; software, S.W. and F.W.; validation, S.W. and F.W.; formal analysis, F.W.; investigation, S.W. and F.W.; resources, S.W. and F.W.; data curation, S.W. and F.W.; writing—original draft preparation, F.W.; writing—review and editing, S.W.; visualization, S.W. and F.W.; supervision, S.W.; project administration, S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC52174197) and the UWB Radar Life Information Feature Extraction and Quantitative Identification for Mine Drill Hole Rescue project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Acknowledgments

I would like to express special thanks to Shuqi Wang for his teaching and guidance, which has benefited me greatly.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Dual RIS-assisted coal-mine communication system.

Figure 2. Semi-circular arch coal-mine tunnel roof decomposition diagram.

Figure 3. TD3-NF algorithm block diagram.

Figure 4. Neural network structure diagram.

Figure 5. Impact of different algorithms on average reward.

Figure 6. Impact of different RIS element numbers on average reward ($P_{t}$ = 38 dBm).

Figure 7. Impact of different transmission power on average reward ($N = 8 \times 8$).

Figure 8. CDF of total rate for different system settings.

Figure 9. Impact of learning rate on average reward under TD3-NF algorithm.

Figure 10. Impact of decay rate on average reward under TD3-NF algorithm.

Table 1. Simulation parameters.

Parameter	Value
Frequency	2.4 GHz
Batch size	16
Number of episodes	$5 \times 10^{3}$
Episode steps	$1.5 \times 10^{4}$
Experience replay $E$	$10^{6}$
Discount factor $γ$	0.999
Learning rate	$10^{- 3}$
Decay rate $τ$	$10^{- 6}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

