To evaluate the performance of the proposed CIM-LP mechanism, this section compares it with three existing incentive mechanisms through simulations, covering the simulation settings, the comparison mechanisms, the performance metrics, and a performance comparison and analysis.
5.1. Simulation Environment Settings
This simulation is based on Python 3.9 and uses the PyTorch 2.1 framework for model construction and training. For the dataset, the Gowalla dataset is used to model the social relationships among participants in the real world. Gowalla is a location-based social networking service that allows users to check in at specific locations and share location information, thereby forming a social network through natural interactions among users [30]. The dataset includes users' geographical check-in information and activities across different times and locations, as well as a social network consisting of 196,591 user nodes and 950,327 edges. As shown in Equation (1), Jaccard similarity is used to calculate the social closeness between each pair of users in the Gowalla dataset, generating a social closeness matrix [26]. Based on this matrix, subsets corresponding to small-scale social networks with a specific average social closeness level can be extracted, and the corresponding social closeness matrices constructed. Unlike [31], which constructs the social network from a normal distribution, we use a real dataset, making the simulation more representative of the real world.
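To make this preprocessing step concrete, the sketch below builds a social closeness matrix from the Gowalla edge list using Jaccard similarity. It is a minimal illustration rather than the exact implementation: the file name gowalla_edges.txt is hypothetical, and computing similarity over each user's neighbor set is an assumption, since Equation (1) is not reproduced here.

```python
import numpy as np

def load_neighbor_sets(edge_file):
    """Read an edge list (one 'u v' pair per line) into per-user neighbor sets."""
    neighbors = {}
    with open(edge_file) as f:
        for line in f:
            u, v = map(int, line.split())
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    return neighbors

def jaccard(a, b):
    """Jaccard similarity between two sets; 0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closeness_matrix(user_ids, neighbors):
    """Pairwise social closeness matrix for a sampled subset of users."""
    n = len(user_ids)
    eps = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = jaccard(neighbors.get(user_ids[i], set()),
                        neighbors.get(user_ids[j], set()))
            eps[i, j] = eps[j, i] = s
    return eps

# Example: sample 100 users and inspect the subset's average social closeness.
# neighbors = load_neighbor_sets("gowalla_edges.txt")   # hypothetical path
# users = list(neighbors)[:100]
# eps = closeness_matrix(users, neighbors)
# print(eps[np.triu_indices(100, k=1)].mean())
```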
The experiments simulate social networks of varying sizes, with participant counts ranging from 100 to 500 in increments of 100, to evaluate the model's performance across different scales. In addition, the influence of the average social closeness between the participants, varied from 0.1 to 0.9 in steps of 0.2, is also considered.
For the time-series modeling problem, this study designs a prediction module based on a single-layer long short-term memory (LSTM) network. Drawing on prior experience with LSTM structures in mobile crowdsensing [13], the initial configuration uses 128 hidden units, a sequence length of 5, and a dropout probability of 0.2 to mitigate overfitting. To validate this hyperparameter selection, sensitivity experiments were conducted comparing the impact of different numbers of hidden units (64, 128, 256) and different dropout rates (0.1, 0.2, 0.3) on model performance. The results indicate that the combination of 128 hidden units and a 0.2 dropout rate strikes a good balance between accuracy and stability.
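A minimal PyTorch sketch of such a prediction module, matching the configuration above (a single LSTM layer with 128 hidden units, sequence length 5, dropout 0.2), is shown below. The input feature dimension and the single-value output are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class DurationPredictor(nn.Module):
    """Single-layer LSTM that maps a length-5 history to one predicted value."""
    def __init__(self, input_dim, hidden_units=128, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_units, num_layers=1, batch_first=True)
        self.dropout = nn.Dropout(dropout)   # applied to the last hidden state
        self.head = nn.Linear(hidden_units, 1)

    def forward(self, x):
        # x: (batch, seq_len=5, input_dim)
        out, _ = self.lstm(x)
        last = self.dropout(out[:, -1, :])   # hidden state of the final time step
        return self.head(last).squeeze(-1)

# Example usage with a batch of 32 histories of 4 features each (assumed shape).
model = DurationPredictor(input_dim=4)
pred = model(torch.randn(32, 5, 4))
print(pred.shape)   # torch.Size([32])
```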
In the strategy optimization part, the proximal policy optimization (PPO) algorithm is used, with the learning rate set to 0.0003, a discount factor of 0.99, and a clipping parameter of 0.1. These settings follow the findings in reference [13], which highlight the strong performance of PPO in dynamic environments. During training, the model uses mini-batch updates, sampling 125 trajectory samples from the experience buffer at each training step. To further demonstrate the stability of these hyperparameter settings, experiments were conducted with different combinations of learning rates (0.0001, 0.0003, 0.001) and clipping ranges (0.05, 0.1, 0.2). The results show that the chosen configuration achieves better convergence speed and policy generalization.
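For reference, the clipped surrogate objective that these hyperparameters control can be sketched as follows. This is a generic PPO policy loss under the stated settings (clipping parameter 0.1, mini-batches of 125 samples, learning rate 0.0003), not the authors' full LSTM-PPO implementation; the tensor names are illustrative.

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.1):
    """Clipped surrogate objective used by PPO (returned as a loss to minimize)."""
    ratio = torch.exp(new_log_probs - old_log_probs)      # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Example: one mini-batch of 125 trajectory samples, as in the setup above.
B = 125
loss = ppo_policy_loss(torch.randn(B), torch.randn(B), torch.randn(B))
# optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # lr = 0.0003
# loss.backward(); optimizer.step()
```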
The specific parameters used in the simulation experiments are provided in Table 2.
5.3. Performance Metrics
In this section, evaluation metrics including the average participant utility, average monetary utility, average social utility, and task completion rate are introduced to assess the performance of each mechanism.
1. Average participant utility: This represents the average utility of all participants over all time periods, as shown in Equation (25):

$$\bar{U} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} U_i^{t}$$

Here, $n$ is the total number of participants, $T$ is the total number of time periods, and $U_i^{t}$ is the utility of participant $i$ in period $t$.
2. Average monetary utility: This represents the average monetary utility of all participants over all time periods, as shown in Equation (26):

$$\bar{M} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} M_i^{t}$$

Here, $M_i^{t}$ is the monetary utility of participant $i$ in period $t$.
3. Average social utility: This represents the average social utility of all participants over all time periods, as shown in Equation (27):

$$\bar{S} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} S_i^{t}$$

Here, $S_i^{t}$ is the social utility of participant $i$ in period $t$.
4. Task completion rate: To evaluate the effectiveness of the various mechanisms in completing tasks during the sensing process, the average task completion rate over the whole sensing process of $T$ periods is calculated as shown in Equation (28):

$$\mathrm{TCR} = \frac{1}{T}\sum_{t=1}^{T}\phi(t)$$

Here, the indicator function $\phi(t)$ is defined as shown in Equation (29):

$$\phi(t) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n}\tau_i^{t} \ge \theta^{t} \\ 0, & \text{otherwise} \end{cases}$$

This function indicates whether the sum of the sensing durations $\tau_i^{t}$ in time period $t$ satisfies the threshold $\theta^{t}$ required by the task. When the total sensing duration is greater than or equal to the task threshold, the task is considered completed; otherwise, it is not.
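For concreteness, the following sketch computes these four metrics from logged simulation outputs; the array names and shapes (per-participant, per-period utility components, sensing durations, and task thresholds) are assumptions about how such logs might be stored.

```python
import numpy as np

def evaluate(utility, monetary, social, durations, thresholds):
    """Compute the four evaluation metrics from per-period logs.

    utility, monetary, social, durations: arrays of shape (n_participants, T)
    thresholds: array of shape (T,) with each period's task threshold
    """
    avg_utility = utility.mean()                     # Eq. (25)
    avg_monetary = monetary.mean()                   # Eq. (26)
    avg_social = social.mean()                       # Eq. (27)
    completed = durations.sum(axis=0) >= thresholds  # Eq. (29), one flag per period
    completion_rate = completed.mean()               # Eq. (28)
    return avg_utility, avg_monetary, avg_social, completion_rate

# Example with random logs for 100 participants over 50 periods.
rng = np.random.default_rng(0)
u, m, s = (rng.random((100, 50)) for _ in range(3))
d, th = rng.random((100, 50)), np.full(50, 45.0)
print(evaluate(u, m, s, d, th))
```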
5.4. Performance Comparison and Analysis
The convergence of the average participant utility obtained using the various comparison mechanisms, under a fixed number of participants and average social closeness, is illustrated in Figure 4. The experimental results show that the utility achieved by CIM-LP converges to approximately 4.50 after 500 training rounds and remains stable thereafter. In contrast, the benchmark mechanisms PPO-DSIM and RLPM converge to approximately 3.65 and 2.75 only after about 450 and 600 training rounds, respectively. The CIM-LP mechanism thus enables the participants to achieve a higher utility. Compared with PPO-DSIM, CIM-LP introduces a long short-term memory network on top of deep reinforcement learning. This enhancement improves the neural network's ability to process sequential, and especially time-dependent, data, thereby increasing the accuracy of decision-making. Compared with RLPM, the experimental results demonstrate CIM-LP's advantage in handling continuous action spaces. Furthermore, the incentive mechanisms based on deep reinforcement learning achieve higher utility than approaches such as GSIM-SPD, because GSIM-SPD struggles to handle complex, dynamic decision environments and lacks the ability to learn and optimize online, which lowers its efficiency in practice.
Figure 5a–d show the effects of changes in the number of participants on the average participant utility, average monetary utility, average social utility, and task completion rate, respectively, at a fixed average social closeness. As can be seen from Figure 5a, regardless of the number of participants, CIM-LP consistently achieves a higher average utility. At a given participant count, the average utility of CIM-LP exceeds that of PPO-DSIM, RLPM, and GSIM-SPD by 0.75, 1.86, and 2.45, respectively. This demonstrates that CIM-LP can dynamically adjust the sensing duration based on factors such as credibility, resource status, and task rewards. Additionally, compared to PPO-DSIM, the CIM-LP mechanism incorporates an LSTM network, which enables a large amount of critical temporal data to be included in the decision-making process, thereby improving the utility reward.
As shown in Figure 5b, the average monetary utility of the participants decreases as the number of participants increases. For example, when the number of participants is 100, the CIM-LP mechanism demonstrates a relatively high average monetary utility; however, when the number of participants increases to 300, the average monetary utility of CIM-LP drops to 0.59, which is 1.38 lower than at 100 participants. This suggests that as the number of participants grows, the competition for resources intensifies, leading to a gradual decrease in the rewards that each participant can obtain. Despite this, the CIM-LP mechanism consistently maintains a high level of average monetary utility. This is primarily attributed to the ability of CIM-LP, through LSTM-PPO, to dynamically adjust the reward distribution and sensing duration strategies, ensuring that each participant receives a reasonable monetary utility even in larger-scale environments. In contrast, the PPO-DSIM, RLPM, and GSIM-SPD mechanisms cannot adjust their strategies and reward distributions as flexibly as CIM-LP when dealing with a large number of participants. Because they do not process the historical temporal data of the other participants, these mechanisms cannot maintain a comparable average monetary utility in more complex resource competition scenarios. As shown in Figure 5c, the average social utility of the participants increases with the number of participants. For instance, for CIM-LP, when the number of participants is 300, the average social utility is 9.86, which is 7.33 higher than when the number of participants is 100. This is because a larger number of participants leads to a broader social network, which in turn motivates each participant's sensing duration strategy, resulting in a higher average social utility. Additionally, for RLPM, as the number of participants increases from 300 to 500, there is a significant increase in the average social utility of the participants. This arises because RLPM enhances the social utility by extending the participants' sensing durations; however, this strategy leads to excessive energy consumption, which negatively impacts the task completion rate.
As shown in Figure 5d, for the three deep-reinforcement-learning-based mechanisms (CIM-LP, PPO-DSIM, and RLPM), the task completion rate decreases at a progressively faster rate as the number of participants increases. For CIM-LP, for example, the task completion rate reaches 90.20%, a decrease of only 0.3% compared with the smaller-scale setting and an increase of 5.2% compared with the larger-scale setting. This is because as the number of participants increases from 100 to 300, the amount of data collected from the participants increases, which facilitates smooth completion of the task. However, as the number of participants continues to increase, the social utility becomes more significant, and the participants tend to gain more utility rewards by increasing their sensing durations. Since mobile devices have limited energy, indiscriminately extending the sensing duration can cause devices to run out of power before tasks are completed, thereby reducing the task completion rate.
Figure 6a–d show the impact of changes in the average social closeness on the average participant utility, average monetary utility, average social utility, and task completion rate, respectively, at a fixed number of participants. As shown in Figure 6a, CIM-LP achieves the best sensing strategy and the highest utility reward. For instance, at a given average social closeness, the average participant utility for CIM-LP is 7.73, surpassing PPO-DSIM, RLPM, and GSIM-SPD by 0.45, 2.61, and 4.35, respectively. Through its deep reinforcement learning mechanism, CIM-LP can effectively adjust its sensing strategy in complex environments, enabling the participants to achieve a higher utility; this reflects the mechanism's comprehensive consideration of multiple factors during decision-making optimization. Furthermore, as shown in Figure 6b, CIM-LP demonstrates a clear advantage in average monetary utility, yielding more than the other mechanisms. This is because the mechanism takes factors such as the historical data, the participants' individual conditions, and the device energy into account when optimizing decisions, thereby effectively compensating for shortcomings in sensing duration decision-making.
As shown in Figure 6c, for all mechanisms, the average social utility of the participants increases as the average social closeness grows. For GSIM-SPD, for example, the average social utility at the higher closeness level is 3.37, which is 1.26 higher than the 2.11 obtained at the lower level. Additionally, combining Figure 6c with Figure 6b shows that, compared with the other mechanisms, the simple sensing strategy mechanism GSIM-SPD is the most affected by the average social closeness. However, once limited energy resources and task rewards are taken into account, the sensing strategy requires reasonable optimization over the long-term sensing process. The CIM-LP mechanism guides the participants to adopt appropriate sensing strategies, thereby maximizing the monetary and social utility rewards and achieving higher efficiency in the long term.
As shown in Figure 6c,d, the average social utility increases with the growth of the average social closeness for two reasons. For GSIM-SPD, the increase in the average social utility is primarily due to longer sensing durations, which is evidenced by a reduction in the task completion rate. GSIM-SPD lacks foresight and indiscriminately guides the participants to extend their sensing durations to gain more immediate social utility, leading to excessive energy consumption that affects the completion of subsequent tasks. In contrast, for CIM-LP, PPO-DSIM, and RLPM, the increase in the average social utility is mainly caused by the increase in the average social closeness itself, which is confirmed by their nearly constant task completion rates. In addition, it is noteworthy that the task completion rate of CIM-LP reaches 91.13% under this setting. This advantage lies in the fact that CIM-LP aims to maximize the overall utility across all sensing tasks by guiding the participants to conserve device energy in a reasonable way, achieving greater utility in the future. Meanwhile, by integrating LSTM networks, the prediction of the other participants' sensing durations is improved, enabling participants to intelligently adjust their strategy execution, effectively manage their energy consumption, and ensure efficient completion of a large number of tasks.
CIM-LP demonstrates significant advantages in the experimental results, effectively enhancing participant utility and the task completion rate. However, this advantage comes with a computational complexity that increases with the number of participants. According to the computational complexity analysis, CIM-LP's complexity grows with the number of participants N, the number of network layers L, and the number of neurons per layer n. In particular, CIM-LP performs decision optimization through deep reinforcement learning (LSTM-PPO), which requires significant computational resources, especially when handling the participants' decisions, since each participant's behavior and decision-making involve complex network computations. In contrast, the computational complexity of PPO-DSIM, RLPM, and GSIM-SPD is lower, primarily because of their simplified decision-making processes. Specifically, PPO-DSIM uses the PPO algorithm within deep reinforcement learning, RLPM relies on Q-learning, and GSIM-SPD adjusts decisions through a policy optimization model, resulting in a much lower computational overhead than that of CIM-LP.
For the CIM-LP mechanism, participants with different levels of credibility receive different utility rewards at each moment, and these differences ultimately influence their cumulative utility rewards. Figure 7a illustrates the effect of varying the initial credibility levels within the same group on the cumulative utility rewards under the given simulation settings. The experimental results indicate that participants with a higher initial credibility achieve faster growth in their cumulative utility rewards. For instance, after 10 experimental rounds, the cumulative utility reward of the participants with a high initial credibility reaches 550.25, which is 481.85 higher than that of the participants with a low initial credibility. This demonstrates that participants with higher credibility are more likely to receive rewards under the CIM-LP mechanism, motivating them to consistently provide high-quality data in future tasks. Furthermore, Figure 7b shows the trend in the average credibility across experimental rounds for groups with different initial credibilities, where each group consists of 100 participants. The experimental results indicate that the CIM-LP mechanism can effectively guide different groups to improve their data quality through a dynamic reward system. Notably, groups with a high initial credibility maintain a relatively high average credibility early in the experiment, with their credibility stabilizing over the course of the task. In contrast, groups with a low initial credibility gradually improve their average credibility over the experimental rounds. For example, the average credibility of the low-initial-credibility group reaches 0.482 in the 10th round, an increase of 0.282 compared to the first round. This dynamic trend demonstrates that the CIM-LP mechanism can enhance the overall credibility levels while reducing fluctuations in data quality.
Figure 8 illustrates the energy consumption performance of the four mechanisms across time slots. By comparing the remaining device energy across time slots, the strengths and weaknesses of each mechanism in energy management are revealed. The experimental parameters are the same as those in Figure 5a, with 100 participants and an initial energy of 50 for each device. The CIM-LP mechanism performs best, with the slowest rate of energy depletion and residual energy still available after 50 time slots, underscoring its efficiency and long-term stability in managing energy consumption. By prioritizing high-reward tasks and minimizing the energy spent on low-reward ones, CIM-LP demonstrates its capability to optimize the sensing strategies. In contrast, the GSIM-SPD mechanism consumes energy rapidly in the early stages, depleting the device energy after approximately 30 time slots and rendering it incapable of completing subsequent tasks, which reflects its inefficiency. The PPO-DSIM and RLPM mechanisms achieve a balance between short-term and long-term efficiency through reinforcement learning, but their energy management remains inferior to that of CIM-LP.
5.5. Analysis of Key Parameters
To evaluate the performance stability and adaptability of the CIM-LP mechanism under different parameter settings, this study conducts sensitivity experiments and robustness tests on the key parameters, including the LSTM structure parameters, the beta distribution initialization parameters, and the PPO hyperparameters (the learning rate, clipping parameter, and discount factor). The focus is on how these parameters affect the average participant utility, the task completion rate, and the model convergence speed.
The number of hidden units in the LSTM network is set to 64, 128, and 256 to compare their effects on the model utility and convergence speed. The results are shown in Table 3.
It can be observed that when the number of hidden units is 128, the model achieves an optimal balance between utility improvement and convergence speed. Next, the initial parameters of the beta distribution are set to several combinations, including (1,1) and (2,2), and their impact on the credibility convergence and long-term rewards is tested; the results are shown in Table 4.
The experiment shows that the initial credibility setting affects the early-stage strategy formation and reward accumulation. Initial values that are too high or too low may lead to bias, while (1,1) or (2,2) settings exhibit greater generality and robustness.
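As an illustration of how beta-distributed credibility of this kind is commonly maintained, the sketch below applies a standard beta-Bernoulli update, using the (1,1) and (2,2) settings from Table 4 as priors. It is a generic example, not necessarily the exact update rule used by CIM-LP.

```python
class BetaCredibility:
    """Credibility tracked as the mean of a Beta(alpha, beta) distribution.

    Standard beta-Bernoulli update: each task outcome judged high-quality
    increments alpha, each low-quality outcome increments beta.
    """
    def __init__(self, alpha=1.0, beta=1.0):   # (1,1) corresponds to a uniform prior
        self.alpha, self.beta = alpha, beta

    def update(self, high_quality: bool):
        if high_quality:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def credibility(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# Example: a participant who submits mostly high-quality data, with a (2,2) prior.
cred = BetaCredibility(2.0, 2.0)
for ok in [True, True, False, True, True]:
    cred.update(ok)
print(round(cred.credibility, 3))   # 0.667
```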
With the other settings unchanged, the hyperparameters, namely the learning rate, the clipping parameter, and the discount factor, are adjusted individually to test their impact on the model performance. The results are shown in Table 5. As can be seen from the table, the default settings (learning rate of 0.0003, clipping parameter of 0.1, discount factor of 0.99) achieve an optimal balance between utility and stability. A larger discount factor helps guide long-term strategy optimization, while an excessively large clipping parameter weakens the constraint on the policy updates, leading to instability.
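A simple driver for such a sensitivity sweep might look like the following. Here run_simulation is a hypothetical placeholder for the actual training-and-evaluation routine, and the discount factor values are assumed, since only the tested learning rates and clipping ranges are stated above.

```python
from itertools import product

# Candidate values from the sensitivity tests above; the discount factors are
# assumed test values, as the exact gammas examined are not stated.
learning_rates = [0.0001, 0.0003, 0.001]
clip_ranges = [0.05, 0.1, 0.2]
discount_factors = [0.95, 0.99]

def sweep(run_simulation):
    """Run one simulation per hyperparameter combination and collect the results."""
    results = {}
    for lr, clip_eps, gamma in product(learning_rates, clip_ranges, discount_factors):
        # run_simulation is a hypothetical callable returning, e.g.,
        # (average participant utility, task completion rate).
        results[(lr, clip_eps, gamma)] = run_simulation(lr=lr, clip_eps=clip_eps, gamma=gamma)
    return results
```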
The above experimental results indicate that the CIM-LP mechanism is robust to the key parameters: within a moderate range of parameter perturbations, the system performance remains stable, with the average utility and task completion rate maintained at high levels. Among these parameters, the LSTM structure settings, the beta distribution's initial parameters, and the clipping parameter and discount factor in PPO have a significant impact on the model's performance and therefore deserve attention and fine-tuning during actual deployment.