1. Introduction
Social networks have become the primary medium for information dissemination and opinion exchange in modern society, profoundly shaping public opinion formation, marketing strategies, and social governance effectiveness [1,2,3,4,5]. Understanding how collective opinions evolve over time (opinion dynamics) and designing effective interventions under resource constraints (optimal control) have emerged as a frontier research area at the intersection of control theory and network science [6,7]. The classic DeGroot model [8] and its extensions, such as the Friedkin–Johnsen model [9], elegantly capture interpersonal influence mechanisms through linear weighting rules, laying the foundation for subsequent theoretical control studies.
Currently, research on opinion control proceeds mainly along two technical routes. The first focuses on optimal reconstruction of the network topology, indirectly guiding opinion convergence by adjusting edge weights or connection relationships [10,11]. However, such methods often face rigid constraints in practice, where the network structure cannot be adjusted in real time. The second focuses on designing external intervention strategies that directly influence node states by applying control inputs, which better matches real-world scenarios such as advertising placement and policy propaganda [12,13].
The problem of opinion intervention in reality inherently involves a profound “cost–benefit” trade-off. Decision-makers expect group opinions to approach the target gradually while strictly controlling long-term input costs, and the marginal utility of future benefits often decreases with time [14,15]. This structural feature naturally corresponds to the infinite-horizon discounted optimal control framework. By introducing a discount factor, the objective function penalizes opinion deviations while balancing long-term control energy, better reflecting the logic of resource optimization in the economic sense.
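As an illustrative sketch (the formal problem statement is given in Section 2; the symbols $\gamma$, $Q$, $R$, and $x^{\ast}$ used here are assumed notation), a discounted quadratic objective of this type takes the form

$$
J = \sum_{t=0}^{\infty} \gamma^{t}
\Bigl[ (x_t - x^{\ast}\mathbf{1})^{\top} Q \,(x_t - x^{\ast}\mathbf{1})
 + u_t^{\top} R\, u_t \Bigr],
\qquad \gamma \in (0,1),
$$

where the first term penalizes deviation of the opinions $x_t$ from the target consensus and the second penalizes control effort, with the discount factor geometrically down-weighting future costs.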
Despite these advances, current research still faces two critical limitations in addressing optimal control problems for opinion dynamics over large-scale complex networks:
L1: Dimensionality versus analytical tractability. Representative studies [13,15,16,17], exemplified by Jiang et al. [13], focus on deriving analytical, closed-form solutions for optimal control. While mathematically rigorous, such approaches are typically limited to highly symmetric network topologies (e.g., complete graphs, star graphs) and rely on scalar-valued analysis. Their computational complexity grows rapidly with the network size n, making them impractical for realistic networks with moderate or large numbers of nodes and heterogeneous structures.
L2: Insufficient exploration of general complex topologies. Existing numerical methods, such as the greedy strategies in [18,19], improve scalability but often overlook the fundamental role of network structure in shaping optimal control performance. In particular, few works systematically compare control performance between representative complex networks [20] such as small-world networks [21] and scale-free networks [22], which capture the core structural features of real social networks.
To overcome the above limitations, this paper develops a unified optimal control framework for opinion dynamics by recasting the problem as a discounted discrete-time LQR problem. By introducing a deviation-state transformation, the original opinion control problem is converted into a standard discounted LQR form, allowing us to avoid the complicated linear terms that arise in traditional dynamic programming derivations [23]. The proposed matrix-based approach naturally handles general network topologies and remains computationally tractable for medium-scale networks (tens to hundreds of nodes).
The main contributions of this paper are summarized as follows:
Theoretical Contribution: We establish a discounted LQR formulation tailored to row-stochastic opinion dynamics. The main theoretical clarification is not the invention of a new Riccati theory, but the observation that the row-stochastic structure of opinion dynamics, together with the discount factor, automatically makes the transformed pair stabilizable. This removes the need for case-by-case stabilizability verification with respect to network connectivity, controllability, or control-node placement.
Methodological Contribution: We derive the corresponding discounted DARE and optimal state-feedback law in the deviation-state coordinates. Unlike analytical approaches that are restricted to small or highly symmetric networks, the resulting matrix-based computation can be applied to arbitrary row-stochastic topologies. The proposed fixed-point Riccati iteration is used primarily as a transparent implementation of the discounted DARE, and its numerical consistency is checked against a standard DARE solver.
Applied Contribution: Numerical simulations validate the framework on benchmark cases, complete graphs, scale-free networks, and small-world networks. The results reveal that network heterogeneity significantly affects convergence speed and control energy consumption, providing insights for real-world opinion guidance strategies.
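The stabilizability observation above can be illustrated numerically: a row-stochastic matrix has spectral radius exactly 1, so scaling it by the square root of any discount factor in (0, 1) yields a Schur-stable matrix, and the scaled pair is stabilizable regardless of the actuation. A minimal sketch (random matrix and parameter values are illustrative assumptions):

```python
import numpy as np

# Any row-stochastic A has spectral radius rho(A) = 1, so the
# discount-scaled matrix sqrt(gamma) * A is Schur stable for any
# gamma in (0, 1), making the scaled pair trivially stabilizable.
rng = np.random.default_rng(0)
n, gamma = 8, 0.95

W = rng.random((n, n))                  # random nonnegative weights
A = W / W.sum(axis=1, keepdims=True)    # row-stochastic normalization

rho_A = max(abs(np.linalg.eigvals(A)))
rho_scaled = max(abs(np.linalg.eigvals(np.sqrt(gamma) * A)))

print(round(rho_A, 6))    # 1.0 (row-stochastic)
print(rho_scaled < 1.0)   # True: Schur stable after discount scaling
```

No connectivity or control-placement condition enters this argument, which is exactly why the case-by-case verification can be dropped.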
The remainder of this paper is organized as follows.
Section 2 formulates the controlled opinion dynamics model and the discounted LQR problem.
Section 3 presents the theoretical framework, including the discounted DARE and the stabilizability theorem.
Section 4 describes the numerical solution algorithm.
Section 5 reports numerical experiments, including benchmark validation and network topology comparisons.
Section 6 concludes the paper.
5. Numerical Experiments and Verification
This section evaluates the proposed discounted LQR formulation from three complementary perspectives. First, a four-agent complete graph benchmark is used to verify numerical consistency with the analytical case in [13]. Second, a 20-agent complete graph is used to test whether the matrix-based Riccati computation remains stable and efficient beyond the small analytical setting. Third, scale-free and small-world networks are compared to clarify how topology, through the row-stochastic matrix A, affects convergence speed, control energy allocation, and final regulation accuracy.
Unless otherwise stated, all experiments use the fixed-point Riccati iteration in Algorithm 1 with a fixed convergence tolerance and initialization. The control energy and the mean absolute error are computed as
$$E_{\mathrm{total}} = \sum_{t} u_t^{2}, \qquad e(t) = \frac{1}{n}\sum_{i=1}^{n}\bigl|x_i(t) - x^{\ast}\bigr|,$$
where $x^{\ast}$ denotes the target opinion. The convergence time is defined as the first time step at which $e(t)$ drops below a given threshold and remains below it over the reported horizon. The implementation uses 600 dpi figures, enlarged labels, and fixed random seeds wherever random network generation is involved. The current numerical evidence is intended to validate the proposed formulation and clarify structural mechanisms, rather than to provide an exhaustive large-scale simulation benchmark.
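As a minimal sketch of such a fixed-point iteration (a single-input illustration under assumed notation, not the paper's exact Algorithm 1; the initialization P0 = Q and all parameter values are placeholders):

```python
import numpy as np

def discounted_riccati(A, b, Q, r, gamma, tol=1e-10, max_iter=10000):
    """Fixed-point iteration for the single-input discounted DARE:
    P <- Q + g*A'PA - g^2 * (A'Pb)(b'PA) / (r + g*b'Pb)."""
    P = Q.copy()                      # assumed initialization P0 = Q
    for _ in range(max_iter):
        Pb = P @ b
        s = r + gamma * (b @ Pb)      # scalar denominator, single input
        P_next = (Q + gamma * (A.T @ P @ A)
                  - (gamma**2 / s) * np.outer(A.T @ Pb, Pb @ A))
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    # Optimal state feedback on the deviation state: u_t = -K (x_t - target)
    K = gamma * (b @ P @ A) / (r + gamma * (b @ P @ b))
    return P, K

# Illustrative 4-agent symmetric trust matrix (placeholder values)
n = 4
A = 0.4 * np.eye(n) + 0.2 * (np.ones((n, n)) - np.eye(n))
b = np.zeros(n); b[0] = 1.0           # control acts on Agent 1
P, K = discounted_riccati(A, b, np.eye(n), 1.0, 0.9)
```

Because the discount-scaled matrix is Schur stable for row-stochastic A, this value-iteration-style recursion converges from P0 = Q without any extra stabilizability check.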
5.1. Benchmark Case Validation
We first reproduce the classical four-agent example in [13]. This test serves two purposes: it checks whether the deviation-state discounted LQR formulation reproduces the known opinion control behavior, and it compares Algorithm 1 with a standard DARE solver applied to the transformed undiscounted system obtained by the usual discount scaling. The influence matrix is the symmetric four-agent benchmark matrix from [13]. The scalar control input is applied to Agent 1 only; the target consensus value, the control parameters, and the initial state likewise follow the benchmark setting. The results are shown in Figure 1 and Table 1.
Figure 1a shows that the controlled agent initially overshoots the target, which is typical of an optimal feedback policy that first applies a relatively strong intervention and then lets the network interaction redistribute the effect. Agents 2–4 have nearly identical trajectories because of the symmetry of the benchmark matrix. All opinions approach the target
within the displayed horizon.
Figure 1b shows that the control input decays rapidly from its initial value, reflecting the decreasing marginal need for external intervention as the deviation state becomes small.
For numerical verification, Algorithm 1 is compared with the standard DARE solver after applying the usual discount scaling to the system matrices. The resulting Frobenius-norm error of the Riccati matrix P and Euclidean-norm error of the feedback gain K are both negligibly small. This comparison confirms consistency with a mature DARE solver; it is not intended to claim computational superiority of the fixed-point implementation.
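The cross-check can be reproduced with a standard solver, assuming the usual square-root-of-gamma scaling that converts a discounted LQR problem into an undiscounted DARE (the matrix below is an illustrative stand-in for the benchmark matrix, not the exact one from [13]):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

n, gamma, r = 4, 0.9, 1.0
A = 0.4 * np.eye(n) + 0.2 * (np.ones((n, n)) - np.eye(n))  # placeholder trust matrix
b = np.zeros((n, 1)); b[0, 0] = 1.0                        # input on Agent 1

# Discounted problem -> undiscounted DARE via sqrt(gamma) scaling
At, bt = np.sqrt(gamma) * A, np.sqrt(gamma) * b
P = solve_discrete_are(At, bt, np.eye(n), np.array([[r]]))

# Gain of the scaled system coincides with the discounted-LQR gain
K = np.linalg.solve(np.array([[r]]) + bt.T @ P @ bt, bt.T @ P @ At)
```

Comparing this P and K entrywise with the fixed-point output yields the Frobenius- and Euclidean-norm error checks described above.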
Table 1 complements Figure 1 by reporting the evolution of agent opinions and the trajectory of the optimal control input. As shown in Figure 1a, Agent 1’s initial opinion of 0.7 first jumps to approximately 0.9 under the control action, after which all agents’ opinions converge to the target consensus value within the displayed horizon. The control input exhibits a typical exponential decay (Figure 1b): its initial value of 0.331 decreases to the order of 0.001 after 20 steps, consistent with the theoretical behavior of optimal control for linear systems. These results match the analytical solution in [13] to within numerical tolerance, further verifying the correctness and effectiveness of the proposed numerical framework.
5.2. Scalability Validation
We next consider a 20-agent complete graph to test the computational scalability of the matrix-based implementation. Each agent assigns 40% of its trust to itself and distributes the remaining 60% uniformly among the other agents, leading to
$$A = 0.4\,I_{20} + \tfrac{0.6}{19}\bigl(\mathbf{1}\mathbf{1}^{\top} - I_{20}\bigr).$$
A single scalar control signal is simultaneously applied to two selected agents, weighted by different actuation gains (0.7 and 0.3); the target value and cost parameters are kept fixed, and the initial opinions are linearly distributed over their admissible range. This example therefore corresponds to a distributed single-input control scheme, not a true multi-input control experiment: in a standard multi-input LQR formulation, the actuation would be described by a matrix B and the control input would be a vector u.
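The trust matrix and the distributed single-input actuation described above can be constructed directly (the indices of the two actuated agents and the opinion interval are illustrative assumptions):

```python
import numpy as np

n = 20
# Complete-graph trust: 40% self-trust, remaining 60% split over the 19 peers
A = 0.4 * np.eye(n) + (0.6 / (n - 1)) * (np.ones((n, n)) - np.eye(n))

# Distributed single-input actuation: one scalar signal, two actuated agents
b = np.zeros(n)
b[0], b[1] = 0.7, 0.3       # gains from the text; agent indices assumed

# Initial opinions spread linearly (interval chosen for illustration)
x0 = np.linspace(0.0, 1.0, n)
```

Because every row of A sums to 0.4 + 19 · (0.6/19) = 1, the matrix is row-stochastic by construction.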
While the extension to multi-input LQR is straightforward within the proposed framework, we leave this direction for future work. Our ongoing research will focus on multi-agent control design under game-theoretic frameworks, where multiple independent decision-makers optimize their own objectives, leading to a richer class of distributed intervention problems.
As shown in
Figure 2a, all 20 opinions approach the target despite the broad initial opinion spread.
Figure 2b shows a strong initial input followed by a smooth decay. The Riccati iteration converges in 26 iterations in this setting. Since each iteration is dominated by dense matrix products of the form $A^{\top} P A$, the per-iteration complexity is $O(n^{3})$ for dense matrices. This supports the applicability of the proposed implementation to medium-scale networks, while very large-scale sparse networks would require specialized sparse or low-rank Riccati solvers.
Figure 3 provides a direct numerical explanation of the role of the discount factor. As the discount factor increases over the tested range, the Riccati iteration count grows from 17 to 38 and the control energy from 0.1787 to 0.4100, indicating a higher numerical and intervention cost. At the same time, the convergence time decreases from 50 to 33 steps and the final error shrinks, indicating better long-term regulation accuracy. Hence, the discount factor should be interpreted as a planning-horizon parameter rather than as a purely technical constant: larger values emphasize long-term accuracy, whereas smaller values reduce control effort and numerical burden.
5.3. Complex Network Topology Comparison
We compare a scale-free network and a small-world network to isolate the effect of topology on optimal opinion control (see Figure 4). Both networks contain the same number of nodes and use the same control parameters as in Section 5.2, so the observed differences arise solely from the structure of the row-stochastic influence matrix A.
Scale-Free Network: Generated using the Barabási–Albert model. This network exhibits significant degree heterogeneity and contains a few highly connected hub nodes.
Small-World Network: Generated using the Watts–Strogatz model. This network combines the local clustering of regular lattices with the short path lengths of random networks.
Trust matrices are generated through row normalization: edge weights are first assigned at random according to the network topology, then each row is scaled so that, together with a fixed self-trust weight, it sums to 1.
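A sketch of this generation procedure, using networkx generators with assumed parameters (the paper's exact edge-weight range, self-trust value, and generator parameters are not fully specified here):

```python
import numpy as np
import networkx as nx

def trust_matrix(G, self_trust=0.4, seed=0):
    """Row-stochastic trust matrix: random symmetric edge weights,
    rows scaled so self-trust plus neighbor trust sums to 1."""
    rng = np.random.default_rng(seed)
    n = G.number_of_nodes()
    W = np.zeros((n, n))
    for i, j in G.edges():
        W[i, j] = W[j, i] = rng.uniform(0.1, 1.0)   # assumed weight range
    A = np.zeros((n, n))
    for i in range(n):
        s = W[i].sum()
        A[i] = (1.0 - self_trust) * W[i] / s if s > 0 else 0.0
        A[i, i] = self_trust if s > 0 else 1.0      # isolated node trusts itself
    return A

# Assumed generator parameters for a 20-node comparison
A_sf = trust_matrix(nx.barabasi_albert_graph(20, 2, seed=1))      # scale-free
A_sw = trust_matrix(nx.watts_strogatz_graph(20, 4, 0.3, seed=1))  # small-world
```

Fixing the generator seeds, as noted earlier, makes the random network instances reproducible across runs.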
Network topology directly determines the row-stochastic matrix A, which governs the system’s convergence speed, control energy distribution, and steady-state error. This section quantifies the impact of different topologies on control performance, clarifying the role of network structure in the proposed framework. For the discount-factor sensitivity test on the complete graph, we additionally report the stricter final-tracking behavior over a longer horizon; for the topology Monte Carlo test, the same convergence threshold is used so that the results remain comparable with Figure 5.
In the scale-free network, several trajectories move rapidly toward the target after the initial intervention, indicating that hub-mediated diffusion helps spread the control effect as shown in
Figure 5a. In the small-world network, trajectories are more homogeneous but converge more slowly, because no small set of hubs dominates information propagation as shown in
Figure 5b. This contrast explains why a topological difference can be visible even though the same discounted LQR law is used.
The control inputs in Figure 5c further reveal different energy allocation mechanisms. The scale-free network requires a larger early input that then decays faster; the small-world network starts with a smaller input but requires a more persistent intervention. Thus, the scale-free topology favors concentrated early action through hub-mediated diffusion, whereas the small-world topology favors a slower and more sustained regulation process.
The convergence error comparison in Figure 5d confirms this mechanism: the scale-free case reaches the convergence threshold earlier and attains a smaller final error than the small-world case. These results suggest that heterogeneous topologies can improve convergence speed and final accuracy when the actuation vector is aligned with influential nodes.
To further check whether this topology-dependent tendency is robust to random network generation, we conducted a Monte Carlo comparison with 50 random realizations of each topology. For each run, the network structure, edge weights, and initial opinions are regenerated, while the control parameters remain fixed. The resulting means and standard deviations of the final error, the convergence time, and the control energy are summarized in Figure 6.
The Monte Carlo results support the mechanism observed in the representative example. The scale-free networks yield a smaller average final error and a shorter average convergence time than the small-world networks, while the average control energies of the two topologies are comparable. These results do not replace a full statistical hypothesis test, but they show that the faster convergence and smaller final error of scale-free networks are not artifacts of a single network instance.
The comparison with pinning control should also be understood carefully. Pinning control methods often emphasize the importance of controlling hub or leader nodes. Our results are consistent with this intuition, but the present discounted LQR formulation optimizes the feedback gain for a prescribed actuation vector b rather than solving the separate combinatorial actuator placement problem. A systematic comparison among hub node, peripheral node, and budget-constrained optimized actuation sets is therefore left as future work.
Thus, the present experiments should be read as evidence that topology affects the performance of a prescribed LQR actuator, not as a complete solution to optimal actuator placement. This distinction keeps the numerical claim aligned with the theoretical scope of the paper.
6. Conclusions
This paper developed a discounted LQR framework for the optimal control of opinion dynamics on row-stochastic networks. By shifting the original opinion state to a deviation state measured relative to the target, the target-consensus regulation problem is converted into a standard quadratic regulation problem. This transformation removes the affine terms that would otherwise appear in a direct dynamic programming treatment and makes it possible to compute the optimal feedback law through a discounted DARE. The main theoretical clarification is that discounting automatically guarantees the stabilizability condition for the transformed pair whenever A is row-stochastic and the discount factor lies strictly between 0 and 1. This observation should be understood as a problem-specific verification of classical discounted LQR assumptions, rather than as a new Riccati theory. Accordingly, the revised stability statements are made in the transformed discounted coordinates, while the Schur stability of the unscaled matrix is treated as a numerical property to be checked in concrete examples.
The numerical results support the usefulness of the formulation in three ways. The four-agent benchmark confirms consistency with a standard DARE solver and with the known analytical example. The 20-agent complete graph demonstrates that the matrix-based implementation remains computationally feasible for medium-scale networks. The scale-free and small-world comparison shows that topology affects the temporal distribution of control energy, convergence speed, and final error through the influence matrix A. In particular, hub-mediated scale-free networks enable faster diffusion of early interventions, whereas small-world networks require more persistent control.
From a practical viewpoint, these findings suggest that opinion guidance strategies should not only choose the magnitude of intervention, but also account for how the network structure propagates that intervention. When influential hub nodes are available, early concentrated actuation may be more efficient; when the network is more homogeneous, sustained moderate intervention may be required. This provides a control-theoretic explanation for intuitions commonly used in pinning or leader-based opinion control, while keeping the optimality criterion explicit through the discounted LQR cost.
Several limitations remain. The actuation vector is prescribed rather than optimized, so the combinatorial problem of actuator placement is outside the scope of this paper. Moreover, the current model assumes a known and time-invariant row-stochastic influence matrix. Future work will therefore consider budget-constrained actuator selection, multi-input distributed control, time-varying or partially observed networks, stubborn agents, multiplex structures, and reinforcement learning methods for model-free opinion guidance. In such model-free settings, the discounted LQR solution derived here can serve as a benchmark for evaluating learned policies. The sensitivity and Monte Carlo checks partially address the remaining numerical concerns: the former clarifies how the discount factor trades off regulation accuracy, control effort, and Riccati iteration count, while the latter shows that the scale-free advantage in convergence speed and final error persists across repeated random realizations. Nevertheless, broader large-scale benchmarks and formal actuator placement optimization remain important future work.