4.1. Collecting Data on Online Public Opinion and Tagging Events
The dataset used in this paper was obtained by web crawling from websites that cover agricultural product safety, yielding a multimodal news and public opinion corpus. The crawled data were preprocessed to meet the basic input requirements of the online public opinion analysis model for agricultural product safety. Sources of the corpus include agricultural product information portals and news sites such as Partner.com, Toutiao, China Quality News Network, and Baidu News. In total, 3847 news reports on online public opinion risks related to agricultural product safety were collected. Given the characteristics of the corpus, we developed an event pattern specific to online public opinion in the field of agricultural product safety; the pattern uses the same event description to match both text and images. The resulting dataset contains 3847 image-text pairs, annotated with 5 event types and 28 argument roles.
Table 5 shows the crawled and processed multimodal corpus.
Table 6 details the five categories of online public opinion events related to agricultural product safety: non-compliance events, hygiene events, poisoning events, counterfeiting and infringement events, and expiration events.
The public opinion texts were preprocessed in four steps: HTML tag removal, handling of missing values, Chinese word segmentation, and stop-word removal. First, tools such as Beautiful Soup were used to remove HTML tags and clean redundant text. Next, missing values were handled: rows with empty text were deleted, and empty URLs were replaced with empty strings. Then, the Chinese text was segmented into words using Jieba’s precise mode, in preparation for subsequent vectorization and word embedding model training. Finally, words without practical meaning were removed using the Harbin Institute of Technology stop-word list, while domain terms related to the safety of edible agricultural products were retained.
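The four preprocessing steps can be sketched roughly as follows. This is a minimal illustration rather than the pipeline used in the paper: it relies only on the Python standard library, so `html.parser` stands in for Beautiful Soup and whitespace splitting stands in for Jieba. The record keys `text` and `url` and the function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags (stdlib stand-in for Beautiful Soup)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def preprocess(records, stop_words, segment=str.split):
    """Clean a list of dicts with hypothetical keys 'text' and 'url'.

    `segment` is any callable mapping a string to a list of tokens; the
    whitespace default is only a placeholder for a Chinese word segmenter.
    """
    cleaned = []
    for rec in records:
        text = strip_html(rec.get("text") or "")
        if not text:
            continue  # delete rows whose text is empty
        url = rec.get("url") or ""  # an empty or missing URL becomes ""
        tokens = [tok for tok in segment(text) if tok not in stop_words]
        cleaned.append({"url": url, "tokens": tokens})
    return cleaned
```

With Jieba installed, passing `segment=jieba.lcut` would apply its default (precise) mode in place of the whitespace placeholder, and `stop_words` would be loaded from the stop-word list file.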
After the collected multimodal public opinion corpus was preprocessed, the data were annotated. First, online public opinion events on the safety of edible agricultural products were categorized into five types: non-compliance events, hygiene events, poisoning events, counterfeiting and infringement events, and expiration events. Then, the event texts were annotated: each event type present in a text was labeled “1,” and each event type not present was labeled “0.” A sample of text annotations is shown in Table 7.
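The 0/1 annotation scheme amounts to a multi-hot vector over the five event types. The sketch below illustrates the idea; the column order and the function name are assumptions and need not match the layout of Table 7.

```python
# Assumed column order; the paper's annotation files may order the types differently.
EVENT_TYPES = [
    "non-compliance",
    "hygiene",
    "poisoning",
    "counterfeiting and infringement",
    "expiration",
]


def encode_event_types(present):
    """Return a 0/1 vector with a 1 for every event type present in a text."""
    present = set(present)
    unknown = present - set(EVENT_TYPES)
    if unknown:
        raise ValueError(f"unknown event types: {sorted(unknown)}")
    return [1 if event_type in present else 0 for event_type in EVENT_TYPES]
```

For example, a report that describes both a hygiene problem and expired products would be encoded by `encode_event_types({"hygiene", "expiration"})` as `[0, 1, 0, 0, 1]`.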
The argument role extraction task extracts the argument roles associated with each event from the event corpus, based on the event types the corpus contains. With integration into subsequent research in mind, this paper adopts the BIO tagging method, which is straightforward to implement for text, for the argument role annotation task. Each argument role within an event type is abbreviated to a three-letter label. Because of the specificity of the domain corpus, argument roles that share the same value are annotated with the same BIO labels. For example, when the “Involved Amount” of the “Counterfeiting and Infringement” event type and the “Fine Amount” of the “Expiration” event type have the same value in the same sentence, both are labeled with the same “B-MOY” and “I-MOY” tags. The detailed annotation of argument roles is shown in Table 8.
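As an illustration of BIO tagging, the following sketch converts token-level argument spans into a tag sequence. The span representation (start index, exclusive end index, label) and the function name are assumptions made for this example; only the `MOY` label comes from the text above.

```python
def bio_tags(tokens, spans):
    """Convert argument spans into a BIO tag sequence.

    `spans` is a list of (start, end, label) tuples over token indices,
    with `end` exclusive. Tokens outside every span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        tags[start] = f"B-{label}"  # first token of the argument
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the argument
    return tags
```

For a hypothetical token sequence `["fined", "50", "000", "yuan"]` with the amount spanning tokens 1 to 3, `bio_tags(tokens, [(1, 4, "MOY")])` returns `["O", "B-MOY", "I-MOY", "I-MOY"]`.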
4.2. Analysis of Experimental Results of Multi-Modal Event Extraction Method for Safety of Edible Agricultural Products Online Public Opinion
This paper uses 80% of the dataset to train the multimodal event extraction model and holds out the remaining 20% as the test set, as shown in Table 9.
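An 80/20 split of this kind can be sketched as below. The shuffling, the fixed seed, and the rounding convention are assumptions for illustration; the paper does not state how the split was drawn, and the actual set sizes are those given in Table 9.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=42):
    """Shuffle a copy of `items` and split it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_ratio)  # rounds down
    return shuffled[:n_train], shuffled[n_train:]
```

Fixing the seed keeps the same items in the test set across runs, which matters when several baseline models are compared on one held-out set.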
To rigorously assess the performance of the developed model, several cutting-edge models were chosen for comparison experiments and evaluated against the model presented in this paper. Specifically, they are as follows:
DMCNN: A two-phase event extraction approach utilizing a dynamic multi-pooling convolutional neural network model.
Joint3EE: A jointly trained model based on shared bidirectional GRU hidden layer representations, which can simultaneously complete the prediction tasks of entity mentions, event triggers, and arguments.
BERT-CRF: An event extraction method that acquires trigger word features based on the BERT pre-training model and classifies through Conditional Random Field (CRF).
BERT-BLSTM-CRF: An event extraction method that, after the BERT pre-training model, uses a bidirectional LSTM network in the feature layer to extract text context features and then classifies through Conditional Random Field (CRF).
For the event recognition task, the model introduced in this paper is evaluated against the four event extraction methods described above. The results of these experiments are presented in Table 10.
The experimental results show that, on the online public opinion dataset for the safety of edible agricultural products constructed in this paper, the proposed multimodal event extraction method delivers the highest performance among all compared models in the event recognition task. Compared with the four existing event extraction methods (DMCNN, Joint3EE, BERT-CRF, and BERT-BLSTM-CRF), the multimodal approach shows notable gains in precision, recall, and F1 score. Precision increased from 75.32% with the DMCNN model to 81.03%, the F1 score improved from 76.27% with the Joint3EE model to 82.70%, and recall rose from 74.93% with the Joint3EE model to 81.86%.
These results support the effectiveness of the proposed model in event recognition. The performance improvement stems from the model’s integration and use of multimodal features, as well as its richer representation of event semantics. On the one hand, the multimodal event extraction method integrates information from different modalities and captures the context and intrinsic relationships of events more comprehensively, thereby improving the accuracy and robustness of event recognition. On the other hand, it optimizes the event classification process and processes information more efficiently. The experiments indicate that adding image features to the traditional event recognition task not only supplements semantic information missing from some public opinion news texts but also helps to resolve ambiguities, ultimately improving the model’s event classification ability. For the argument extraction task, the proposed model is also compared with the neural network models discussed earlier, and the experimental results are presented in Table 11.
The experimental results show that the proposed model also performs best in the comparison experiments for the argument extraction task. Compared with existing models such as DMCNN, Joint3EE, BERT-CRF, and BERT-BLSTM-CRF, the multimodal event extraction method achieves the highest precision, recall, and F1 score. In particular, the highest recall indicates that integrating entity information from images with textual features improves the model’s ability to identify event-related argument roles. Precision increased from 73.31% with the BERT-BLSTM-CRF model to 78.15%, recall rose from 75.76% with the Joint3EE model to 79.47%, and the F1 score improved from 73.71% with BERT-BLSTM-CRF to 78.80%. These results indicate that the proposed model generalizes better in the argument extraction task and captures entity information that the other models fail to annotate. In addition, compared with joint learning models such as Joint3EE, the proposed model is more flexible and accurate in argument extraction. This suggests that the model not only performs well on single tasks but can also handle complex, multi-task learning scenarios effectively.
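As a quick consistency check, the F1 score is the harmonic mean of precision and recall. The sketch below applies that standard definition to the argument extraction figures quoted above; it assumes the reported F1 is computed from the reported overall precision and recall rather than averaged per class.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Argument extraction figures quoted in the text (percent).
f1 = f1_score(78.15, 79.47)
print(f"{f1:.2f}")  # prints 78.80
```

The computed value agrees with the reported F1 score of 78.80% for the proposed model.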
4.3. Analysis of Experimental Results of Model Based on HDBSCAN Algorithm
In the experiments on the event discovery model, the event types and argument roles of online public opinion news on the safety of edible agricultural products are first obtained through the multimodal event extraction method. Taking the “non-compliance” event type as an example, the extracted arguments and their corresponding roles are shown in Table 12.
To improve the accuracy of the event discovery task, argument roles are represented as simple sentences to construct the text dataset. An example is as follows: on 28 June 2022, the Market Supervision Administration of Qingyuan County, Zhejiang Province, reported that pure milk produced by Maqu’er was found to contain the non-compliant item propylene glycol.
Similarly, all news articles are processed using this method to obtain a simple-sentence text dataset. Combined with the image dataset for each news article, image-text pairs are created, and a multimodal event dataset for the event discovery task is ultimately constructed.
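Rendering extracted argument roles as a simple sentence can be sketched with a fixed template, as below. The role keys (`date`, `agency`, `product`, `producer`, `item`) and the template wording are hypothetical and chosen to reproduce the non-compliance example; the actual role names are those listed in Table 12.

```python
# Hypothetical role keys for the "non-compliance" event type.
REQUIRED_ROLES = ("date", "agency", "product", "producer", "item")


def roles_to_sentence(roles):
    """Render extracted argument roles as one simple sentence."""
    missing = [key for key in REQUIRED_ROLES if not roles.get(key)]
    if missing:
        raise ValueError(f"missing argument roles: {missing}")
    return (
        f"On {roles['date']}, {roles['agency']} reported that "
        f"{roles['product']} produced by {roles['producer']} was found to "
        f"contain the non-compliant item {roles['item']}."
    )


example = roles_to_sentence({
    "date": "28 June 2022",
    "agency": "the Market Supervision Administration of Qingyuan County, Zhejiang Province",
    "product": "pure milk",
    "producer": "Maqu’er",
    "item": "propylene glycol",
})
```

Each event type would need its own template, since the set of argument roles differs between types.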
To evaluate the differences between the HDBSCAN event discovery clustering algorithm used in this paper and general text clustering methods, three algorithms (K-Means, DBSCAN, and HDBSCAN) were compared in three sets of experiments. To eliminate the influence of variables other than the clustering algorithm, all comparative experiments use the same multimodal feature representation method. Three evaluation metrics are used to quantify clustering performance: Silhouette Coefficient (SC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). The detailed results of the event discovery clustering comparison experiments are shown in Table 13.
For all three metrics, larger values within the metric’s range indicate better clustering quality. According to Table 13, HDBSCAN achieves the highest SC, NMI, and ARI values (0.8692, 0.8751, and 0.8721, respectively), a clear advantage over the traditional K-Means and DBSCAN algorithms. DBSCAN scores slightly lower on all three metrics, which may be because its eps parameter is a single global neighborhood radius, whereas HDBSCAN adapts to varying density. The clustering performance of K-Means shows greater variation, likely because of its inherent limitations: the text data in this paper contain a certain level of noise and may not be uniformly distributed, and K-Means assigns every point to a cluster, so it is notably less robust than the other two algorithms. Based on this comparison, the HDBSCAN clustering algorithm is well suited to clustering-based event discovery on the corpus in this study.
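Of the three metrics, ARI and NMI compare predicted clusters with reference labels, whereas SC needs only the feature vectors. As an illustration, the sketch below computes ARI from its standard pair-counting definition using only the standard library; it is not the evaluation code used in the paper, and how noise points were treated there is not stated.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree
    # Pairs of items placed together in both, in the reference, in the prediction.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs  # chance agreement
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)
```

The index is 1 when the two partitions are identical up to relabeling and close to 0 for random assignments; it can be negative when agreement is below chance.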
After the event discovery clustering task, each multimodal data point is assigned a cluster label representing its corresponding event topic. By analyzing these cluster labels, all multimodal information belonging to the same event can be filtered, thereby obtaining the complete event clustering results. In this experiment, a total of 432 different event topics were identified, with some of the corpus samples shown in Table 14.
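Filtering items by cluster label can be sketched as follows. The function name is hypothetical; the default noise label of -1 follows the convention of common HDBSCAN implementations, and whether noise points were discarded in the paper is an assumption.

```python
from collections import defaultdict


def group_by_cluster(items, labels, noise_label=-1):
    """Group items by cluster label, skipping points marked as noise."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    clusters = defaultdict(list)
    for item, label in zip(items, labels):
        if label == noise_label:
            continue  # noise points belong to no event topic
        clusters[label].append(item)
    return dict(clusters)
```

The number of keys in the returned dictionary is the number of discovered event topics, and each value holds all image-text pairs assigned to that topic.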
4.4. Analysis of Experimental Results of the Online Public Opinion Risk Trend Prediction Model for Safety of Edible Agricultural Products
This paper aims to construct an intelligent model that predicts whether online public opinion events related to food safety will evolve positively or negatively, addressing the shortcomings of traditional public opinion analysis methods in understanding temporal evolution and responding to sudden risks. Because food safety public opinion spreads rapidly, has wide impact, and evolves in complex ways, its development is reflected not only in changes in emotional intensity but also in the rise (negative evolution) or fall (positive evolution) of risk levels over time. Therefore, this paper constructs a deep learning prediction model that extracts multimodal sentiment and semantic features in order to predict future positive or negative trends accurately.
To test whether the LSTM-PPO model designed in this paper outperforms standalone LSTM and PPO models, an ablation experiment was designed. Then, based on the dataset collected in this paper, the risk trends of online public opinion on food safety were predicted using multiple models, and the comparative experimental results were analyzed.
(1) Ablation Experiment
The ablation experiments use four evaluation metrics: success rate (task completion rate); time per experiment (average time per experiment); convergence steps (number of steps required to train to a stable policy); and stability index (measures the degree of policy fluctuation during training; the lower the index, the more stable the policy). Regarding the experimental environment, the LSTM component is configured identically in the LSTM and LSTM-PPO models, and the PPO settings are identical in the PPO and LSTM-PPO models.
As shown in Table 15, across the three tasks (pure memory tasks, interference/noise tasks, and temporal inference tasks), the combined LSTM-PPO model performs best on all four key indicators: success rate, time efficiency, convergence steps, and policy stability. Specifically, LSTM-PPO achieves a higher success rate than either LSTM or PPO alone, reaching up to 92% and maintaining 78% even in noisy or interference environments. In terms of time efficiency, each round takes only 0.11–0.12 s, significantly faster than both LSTM and PPO, giving the fastest training and inference speeds. It also requires the fewest convergence steps, only 70k steps in pure memory tasks and significantly fewer than the single models in the other tasks, indicating the highest learning efficiency. Regarding policy stability, the stability index of LSTM-PPO is 0.80–0.85, far higher than LSTM and PPO, indicating small reward fluctuations and strong policy reliability during training. Overall, combining the LSTM model, which captures sequence features, with the PPO model, which optimizes the policy through reinforcement learning, exploits the advantages of both: it achieves a high success rate, high efficiency, fast convergence, and stable policies across task environments, demonstrating the overall performance advantage of the combined model in complex reinforcement learning scenarios.
(2) Comparative Experiment on Public Opinion Risk Trend Prediction
From the 432 event topics related to the safety of edible agricultural products, this section selects the “Maqu’er Propylene Glycol Non-compliance” incident as a typical case study. This public opinion event began on 29 June 2022 and ended on 24 August 2022. Using web crawling, indicator data for the event during its dissemination period were collected, including event duration, number of comments, number of reports, and public attention. These public opinion risk indicator data were used for the subsequent trend prediction experiments. For model training and evaluation, the data were divided into training and testing sets at a ratio of 80% to 20%.
After evaluating multiple candidate models, this paper selected four typical deep learning prediction models (RNN, LSTM, Transformer, and Autoformer) for comparative experiments, examining how their performance differs from that of the proposed LSTM-PPO model in predicting online public opinion trends for the safety of edible agricultural products. During the experiments, multiple predictions were made on the same dataset, and model performance was assessed with five metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Symmetric Mean Absolute Percentage Error (SMAPE). Comparing the models on these metrics clarifies the advantages and limitations of each. The experimental results are presented in Table 16.
All five metrics are error measures, so smaller values indicate better predictions. The results in Table 16 show that the LSTM-PPO trend prediction model outperforms the other four deep prediction models (RNN, LSTM, Transformer, and Autoformer). The LSTM-PPO model achieved the lowest error on every evaluation metric, with MAE, MSE, MAPE, RMSE, and SMAPE values notably lower than those of the other models. Its MAE was the smallest, at only 0.0422, while the MAE values of RNN, LSTM, Transformer, and Autoformer were 0.3814, 0.2643, 0.1123, and 0.0592, respectively. This shows that the improved LSTM-PPO model has the smallest gap between predicted and actual values in terms of mean absolute error. The Autoformer model produced errors closest to those of LSTM-PPO but was still slightly inferior, which suggests that the improved PPO model incorporating LSTM identifies public opinion risk information more accurately.
Moreover, the SMAPE values of all prediction models were relatively high, possibly due to the presence of outliers or extreme values in the public opinion prediction data, such as data spikes during short-term public opinion surges. These values may affect the calculation of SMAPE, resulting in larger SMAPE values.
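For reference, the five error metrics can be computed as in the sketch below. The SMAPE form used here, the mean of 2|ŷ − y| / (|y| + |ŷ|) expressed as a percentage, is one common variant; the paper does not state which variant it uses, so this choice is an assumption.

```python
from math import sqrt


def regression_errors(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE (%), and SMAPE (%) for two sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in abs_errors) / n
    mape = 100 / n * sum(e / abs(a) for e, a in zip(abs_errors, actual))
    # A term with zero actual and zero predicted value counts as zero error.
    smape = 100 / n * sum(
        2 * e / (abs(a) + abs(p)) if (abs(a) + abs(p)) > 0 else 0.0
        for e, a, p in zip(abs_errors, actual, predicted)
    )
    return {
        "MAE": sum(abs_errors) / n,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAPE": mape,
        "SMAPE": smape,
    }
```

Because MAPE and SMAPE scale each error by the magnitude of the values involved, a few points where the prediction and the actual value differ by a large relative amount, such as a sudden spike, can dominate the average even when the absolute errors elsewhere are small.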
Overall, the excellent predictive performance of LSTM-PPO can be attributed to its model design, which employs sequence-level feature aggregation that better aligns with the continuity of time series, allowing it to capture dependencies within the input sequence more effectively. In addition, the LSTM component enhances the temporal feature representation of public opinion risk data, leading to more accurate predictions. Other models, constrained by their traditional RNN, LSTM, or Transformer structures, fail to fully exploit the characteristics of sequential data. Therefore, an examination of the experimental results reveals that the LSTM-PPO model has higher prediction accuracy and stability compared to other deep prediction models in the safety of edible agricultural products online public opinion trend prediction task, making it a more effective prediction model.
Furthermore, the prediction performance was visualized by plotting prediction curves, which allow an intuitive comparison of how well the predictions match the actual trends. The curves compare the predicted and actual values for the LSTM-PPO model and reflect its performance in the online public opinion trend prediction task for the safety of edible agricultural products, as shown in Figure 3. With the model designed in this paper, the development trends of trending public opinion events on the safety of edible agricultural products can be predicted more accurately over time in this safety-sensitive domain.
To further verify the performance of the LSTM-PPO model, 100 events were extracted from each of the five types of data in the dataset, and the prediction accuracy of the model was calculated. The results are shown in Figure 4.
Figure 4 illustrates that the LSTM-PPO-based food safety online public opinion trend prediction model achieves high prediction accuracy across the event types. In three independent experiments, the model’s average accuracy was 90%, 89%, and 91%, respectively, showing stable overall performance and good robustness and reliability. By event category, poisoning events had the highest prediction accuracy at 92%, 91%, and 93%, indicating that these events exhibit clear public opinion characteristics whose development trends the model captures easily. Expiration events had slightly lower accuracy at 88%, 87%, and 89%, respectively. The remaining categories (non-compliance, hygiene, and counterfeiting and infringement events) showed intermediate accuracy, indicating balanced overall performance. Overall, the model not only provides high-accuracy predictions but also handles various types of public opinion events, providing reliable data support for the design of public opinion guidance and response strategies.