A Novel GDMD-PROMETHEE Algorithm Based on the Maximizing Deviation Method and Social Media Data Mining for Large Group Decision Making

Abstract: Multi-attribute group decision making is widely used in practice and has been studied extensively. In the social media big data environment, the public’s focus on emergencies provides an important reference for emergency handling decisions. Because emergency handling decisions are complex, the asymmetry of user evaluation information can easily cause important information to be lost, so it is important to mine valuable decision information from online reviews. To this end, a generalized extended hybrid distance measure between probabilistic linguistic term sets is proposed. Based on it, an extended GDMD-PROMETHEE large-scale multi-attribute group decision-making method is proposed, which can be used for decision making under both symmetric and asymmetric information. Firstly, web crawler technology is used to explore the topics of public concern about emergency handling on social media platforms, k-means cluster analysis is used to classify the crawled variables, and the attributes and subjective weights of emergency handling plans are obtained by TF-IDF and Word2vec. Secondly, in order to better retain the linguistic evaluation information from decision makers, a new generalized probabilistic hybrid distance measure based on the Hamming distance is proposed. Considering the differences among decision makers’ evaluations, the objective weights of decision makers are calculated by combining the maximizing deviation method with the new extended hybrid Euclidean distance. On this basis, the comprehensive weights of the attributes are calculated by combining subjective and objective factors. Meanwhile, this paper realizes the distance measures and information fusion of probabilistic linguistic term sets under cumulative prospect theory, and the ranking results of the emergency handling plans based on the extended GDMD-PROMETHEE algorithm are given. Finally, the feasibility and effectiveness of the extended GDMD-PROMETHEE algorithm are verified by a case study of the Shanghai “6.18” Petrochemical explosion accident handling decision, and comparative analyses with several traditional algorithms demonstrate that the extended GDMD-PROMETHEE algorithm is more scientific and superior.


Introduction
In recent years, unconventional emergencies have become more frequent. Such events not only constrain economic and social development but also pose a serious threat to human livelihood security. It is therefore important to improve the emergency management capabilities of emergency management agencies and to reduce the adverse effects caused by emergencies. Since emergency decision-making events are uncertain, risky, and variable, different emergency management options need to be developed for different types of events [1][2][3][4]. How to select the best solution among various alternatives is a major problem that urgently needs to be solved and is the subject of this paper.
Unlike traditional decision problems, large group decision problems are complex and asymmetric, and decision makers differ in knowledge level, life experience, and research direction. As a result, it is difficult for decision makers to judge decision options accurately under time pressure. In such cases, they often choose to express their preferences in the form of fuzzy numbers. In multi-attribute decision making (MADM), Herrera et al. [5] extended linguistic forms of decision making to group decision making (GDM), where linguistic terms allow a convenient and intuitive representation of the evaluator's uncertain preferences. This allows experts to weigh the choice of emergency response options for major disaster relief, corporate investment choices, and large construction projects [6].
With the popularity and development of the Internet, more and more social media platforms, such as Weibo, Douban, AutoZone, and GoWhere, encourage the public to post their opinions as text comments on the web. How to help decision makers (DMs) make choices based on text comments after an emergency event is a meaningful question and an essential task of this paper. So far, some scholars have mined and studied the behavior of social media users. Xu et al. [7][8][9] mined the topics of public concern through social media platforms, introduced the social relationship network of experts, and built a consensus model to complete the selection of alternatives. Traditional GDM with multi-granularity linguistic information, on the other hand, focuses more on the experts' opinions and loses part of the information in the original data [10]. In this paper, social network users are clustered based on user behavior data, and the degree of utilization and behavior patterns of different user groups are mined. This better retains the integrity of the information, solves the problem of completely unknown attribute weights [11], and helps to understand the information behavior patterns of academic and social network users.
For complex large-group decision problems, the representation and fusion of information are crucial. Many aggregation operators have been developed in the literature, such as the ordered weighted average (OWA) operator and the induced ordered weighted average (IOWA) operator. Many scholars have also recently applied prospect theory in different linguistic settings. Gao et al. [12] introduced prospect theory into the probabilistic linguistic environment and proposed a prospect decision method based on the probabilistic linguistic term set (PLTS). Yu Zhang et al. [13] proposed an improved probabilistic linguistic multicriteria compromise solution group decision method, PL-VIKOR, based on cumulative prospect theory (CPT) and learned ratings.
Determining weights is an essential part of decision making. According to the source of the original data used to calculate the weights, weighting methods can be divided into three categories: subjective, objective, and combined assignment methods. The subjective assignment method is an early and mature approach in which the DMs determine the attribute weights according to their own judgment of importance, so the original data come from the DMs' experience. Commonly used subjective assignment methods are the expert survey method (Delphi method) [14], the analytic hierarchy process (AHP) [15], the binomial coefficient method [16], and the ring score method [17]. In the objective assignment method, the original data are the actual values of each attribute in the decision scheme. Commonly used objective assignment methods are principal component analysis, the entropy method [18], the multi-objective planning method, the deviation method [19], and the mean square deviation method. To make the decision results accurate and reliable, scholars have proposed a third type, the subjective-objective integrated assignment method, which includes the compromise coefficient integrated weighting method, the linear-weighted single-objective optimization method, the combined assignment method [20], the Frank-Wolfe method, etc. In the past decades, MADM methods have been successfully applied in several fields and disciplines, and different MADM methods yield similar final rankings [21]. These methods include the technique for order preference by similarity to ideal solution (TOPSIS) [22], VIKOR [13], and the preference ranking organization method for enrichment of evaluations (PROMETHEE) [23]. However, many real-life problems involve vague and uncertain information, which motivates the use of probabilistic linguistic information.
In 1965, Zadeh [24] introduced the concept of the "fuzzy set". Later, Pang et al. [11] extended hesitant fuzzy linguistic term sets by adding probability values and gave the first definition of the probabilistic linguistic term set. Wang [25] proposed a comparison algorithm based on the score function and deviation function for probabilistic hesitant fuzzy sets. In this paper, we use the probabilistic linguistic PROMETHEE. PROMETHEE is an outranking method proposed by Brans and Vincke [26] in 1985 for obtaining partial (PROMETHEE I) and complete (PROMETHEE II) rankings of alternatives based on multiple attributes or criteria.
Considering the timeliness of emergency decision making, this paper uses the maximizing deviation method [27][28][29] to obtain the weight of each decision expert quickly. Gong et al. [30] proposed a method based on cardinal deviations to measure the differences between multiplicative linguistic term sets and combined it with VIKOR. Akram et al. [31] proposed a decision method that combines the maximizing deviation method with TOPSIS to solve the MADM problem with incomplete attribute weight information. This paper combines the maximizing deviation method with PROMETHEE on the basis of a hybrid distance to solve the multi-attribute group decision-making (MAGDM) problem.
Based on the above discussion, this paper addresses the problem of complex large-group emergency decision making in the social media big data environment. The paper is organized as follows. In Section 2, we define the basic concepts of probabilistic linguistic term sets and a new generalized extended hybrid distance based on PLTSs. In Section 3, we collect public opinions on social media platforms, extract keywords, and explore the attributes of emergency decision-making events as an essential basis for expert evaluation of solutions. Then, we use a combination of subjective and objective weighting models to integrate public opinions with expert decision making by CPT. In Section 4, we provide the specific flow of the GDMD-PROMETHEE algorithm. In Section 5, we verify the validity and feasibility of the proposed method through the "6-18" Shanghai Petrochemical explosion, compare it with other methods, and conduct a sensitivity analysis. In Section 6, we present conclusions.

Probabilistic Linguistic Term Sets
The PLTS is one of the most widely used research tools in MAGDM. In this section, we introduce the basic concepts of linguistic term sets (LTSs) and the distance measures between them. On this basis, the basic concepts of PLTSs and an improved distance are given.
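Definition 1. Let $S = \{s_\alpha \mid \alpha = 0, 1, 2, \ldots, 2g,\ g \in N^+\}$ be an LTS; depending on $g$, different linguistic terms may be used. For example, with $g = 4$, let $S$ be the following LTS: $S_9$ = {$s_0$ = extremely low, $s_1$ = very low, $s_2$ = low, $s_3$ = slightly low, $s_4$ = fair, $s_5$ = slightly high, $s_6$ = high, $s_7$ = very high, $s_8$ = extremely high}. The terms $s_\alpha$ satisfy the following conditions:
1. The set is ordered: $s_\alpha > s_\beta$ if and only if $\alpha > \beta$.
2. The negation operator is defined: $neg(s_\alpha) = s_{2g-\alpha}$.
The term $s_\alpha$ can be expressed by the linguistic scale transformation function $f$ as $f(s_\alpha) = \alpha/(2g)$, where $\alpha$ is the subscript of $s_\alpha$.
Definition 2. [1] Let $S = \{s_\alpha \mid \alpha = 0, 1, 2, \ldots, 2g,\ g \in N^+\}$ be an LTS; a PLTS can be defined as:
$$L(p) = \left\{ L^{(k)}(p^{(k)}) \;\middle|\; L^{(k)} \in S,\ p^{(k)} \ge 0,\ k = 1, 2, \ldots, \#L(p),\ \sum_{k=1}^{\#L(p)} p^{(k)} \le 1 \right\}$$
where $L^{(k)}(p^{(k)})$ denotes the linguistic term $L^{(k)}$ associated with the probability $p^{(k)}$, and $\#L(p)$ denotes the number of linguistic terms in $L(p)$.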

Note that if $\sum_{k=1}^{\#L(p)} p^{(k)} = 1$, then we have complete information on the probability distribution of all possible linguistic terms; if $\sum_{k=1}^{\#L(p)} p^{(k)} < 1$, then partial ignorance exists because current knowledge is not enough to provide complete assessment information, which is not rare in practical GDM problems. In particular, $\sum_{k=1}^{\#L(p)} p^{(k)} = 0$ means complete ignorance. Handling the ignorance in $L(p)$ is therefore crucial for the use of PLTSs.
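Definition 4. [11] Let $L(p)$ be a PLTS; the score of $L(p)$ is $E(L(p)) = s_{\bar{\alpha}}$, where:
$$\bar{\alpha} = \sum_{k=1}^{\#L(p)} \alpha^{(k)} p^{(k)} \Big/ \sum_{k=1}^{\#L(p)} p^{(k)}$$
Definition 5. [11] The deviation degree of $L(p)$ is:
$$\sigma(L(p)) = \left( \sum_{k=1}^{\#L(p)} \left( p^{(k)} \left( \alpha^{(k)} - \bar{\alpha} \right) \right)^2 \right)^{1/2} \Big/ \sum_{k=1}^{\#L(p)} p^{(k)}$$
where $\alpha^{(k)}$ is the subscript of the linguistic term $L^{(k)}$. Given two PLTSs $L_1(p)$ and $L_2(p)$:
1. If $E(L_1(p)) > E(L_2(p))$, then $L_1(p) > L_2(p)$.
2. If $E(L_1(p)) = E(L_2(p))$, then $L_1(p) < L_2(p)$ when $\sigma(L_1(p)) > \sigma(L_2(p))$, and $L_1(p) \sim L_2(p)$ when $\sigma(L_1(p)) = \sigma(L_2(p))$.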
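The score and deviation degree can be illustrated with a minimal Python sketch. It assumes the forms given by Pang et al. [11], that is, a probability-weighted mean of the subscripts for the score and a probability-weighted spread around that mean for the deviation degree. Representing a PLTS as a list of (subscript, probability) pairs, and the helper names `scale`, `neg`, `score`, `deviation`, and `compare`, are illustrative choices, not part of the original method.

```python
from math import sqrt

# Assumption: a PLTS is a list of (subscript, probability) pairs,
# e.g. [(4, 0.5), (6, 0.5)] stands for {s4(0.5), s6(0.5)}.
# A PLTS with total probability 0 (complete ignorance) is not handled here.

def scale(alpha, g):
    """Linguistic scale function f(s_alpha) = alpha / (2g)."""
    return alpha / (2 * g)

def neg(alpha, g):
    """Subscript of the negation: neg(s_alpha) = s_(2g - alpha)."""
    return 2 * g - alpha

def score(plts):
    """Subscript of the score E(L(p)): probability-weighted mean subscript."""
    total_p = sum(p for _, p in plts)
    return sum(a * p for a, p in plts) / total_p

def deviation(plts):
    """Deviation degree: spread of the weighted subscripts around the score."""
    total_p = sum(p for _, p in plts)
    mean = score(plts)
    return sqrt(sum((p * (a - mean)) ** 2 for a, p in plts)) / total_p

def compare(l1, l2, tol=1e-12):
    """Return 1 if l1 > l2, -1 if l1 < l2, 0 if they are indifferent."""
    e1, e2 = score(l1), score(l2)
    if abs(e1 - e2) > tol:
        return 1 if e1 > e2 else -1
    d1, d2 = deviation(l1), deviation(l2)
    if abs(d1 - d2) > tol:
        # With equal scores, the PLTS with the smaller deviation ranks higher.
        return -1 if d1 > d2 else 1
    return 0

l1 = [(4, 0.5), (6, 0.5)]
l2 = [(5, 1.0)]
print(score(l1), deviation(l1), compare(l1, l2))
```

Here both PLTSs have the same score, so the comparison falls back to the deviation degree and the more concentrated set `l2` ranks higher.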

Distance Measures between PLTSs
PLTSs can more accurately represent the qualitative information of DMs in complex linguistic environments. However, existing distance measures may distort the original information and lead to unreasonable results. For this reason, a new generalized hybrid distance based on the classical distance is proposed.

Definition 6.
Let $L_1(p)$ and $L_2(p)$ be two PLTSs, where $L_1^{(k)}$ and $L_2^{(k)}$ are the kth linguistic terms of $L_1(p)$ and $L_2(p)$, and $p_1^{(k)}$ and $p_2^{(k)}$ are the probabilities of the kth linguistic terms of $L_1(p)$ and $L_2(p)$, respectively. Then, based on Reference [33], a new probabilistic linguistic distance between them is defined.
Symmetry 2023, 15, 387
Definition 7. Let $L_1(p)$ and $L_2(p)$ be two PLTSs, where $L_1^{(k)}$ and $L_2^{(k)}$ are the kth linguistic terms of $L_1(p)$ and $L_2(p)$, and $p_1^{(k)}$ and $p_2^{(k)}$ are the probabilities of the kth linguistic terms of $L_1(p)$ and $L_2(p)$, respectively. Then the extended Hausdorff distance is given by Equation (6), where $f$ is the linguistic scale function and $\lambda > 0$. When $\lambda = 1$, Equation (6) is the Hamming-Hausdorff distance; when $\lambda = 2$, it is the Euclidean-Hausdorff distance.
In MAGDM, when the above distances cannot meet the decision needs, this paper introduces a probability-related distance that integrates with probabilistic linguistic information and also takes the wishes of each decision maker into account. The new distance is given as follows.

Definition 8.
Let $S = \{s_\alpha \mid \alpha = 0, 1, 2, \ldots, 2g\}$ be an LTS and let $L_1(p)$ and $L_2(p)$ be two PLTSs; then the generalized hybrid distance between the PLTSs is defined by Equation (7), with $\lambda \ge 1$ and $\eta \in [0, 1]$. The generalized hybrid distance combines the generalized probabilistic linguistic distance and the extended Hausdorff distance through the parameter $\eta$. The parameter $\lambda$ can be regarded as the expert's risk attitude, so the proposed distance gives experts more options to express their risk preferences through the parameters.
Theorem 1. Let $L_1(p)$, $L_2(p)$ and $L_3(p)$ be three complete probabilistic linguistic term sets, where $L_1^{(k)}$, $L_2^{(k)}$ and $L_3^{(k)}$ are their kth linguistic terms and $p_1^{(k)}$, $p_2^{(k)}$ and $p_3^{(k)}$ are the probabilities of the kth linguistic terms in $L_1(p)$, $L_2(p)$ and $L_3(p)$, respectively. Then, the generalized hybrid distance has the following properties: The proof of Theorem 1 is given in Appendix A.

Probabilistic Linguistic CPT

Classical CPT
To better retain the true evaluation information of DMs, probabilistic fusion is performed using CPT. CPT [35], proposed by Tversky et al. in 1992, is an improved version of prospect theory (PT) [36] that explains phenomena such as stochastic dominance. It measures the total value of a prospect through a value function and probability weights; the combined prospect value is built from these two components.
CPT asserts that there exists a strictly increasing value function $v(x_i)$. The value function is defined on the deviations from a reference point, represents the behavior of the DMs, and can be expressed as follows.
Value function:
$$v(x_i) = \begin{cases} (x_i - b)^{\xi}, & x_i \ge b \\ -\theta (b - x_i)^{\beta}, & x_i < b \end{cases}$$
The key difference between CPT and PT is that the weight function used in CPT is no longer linear but an inverse S-shaped curve, indicating that individual decision makers tend to overestimate the likelihood of small-probability events and underestimate the likelihood of medium- and high-probability events. The probability weights of gains and losses are formulated as follows.
Weighting function:
$$w^{+}(p) = \frac{p^{\delta}}{\left( p^{\delta} + (1 - p)^{\delta} \right)^{1/\delta}}, \qquad w^{-}(p) = \frac{p^{\varepsilon}}{\left( p^{\varepsilon} + (1 - p)^{\varepsilon} \right)^{1/\varepsilon}}$$
where $b$ denotes the reference point; $\xi, \beta \in (0, 1)$ are the risk attitude coefficients towards value in the face of gain and loss; $\theta > 1$ is the loss aversion coefficient; and $\delta, \varepsilon \in (0, 1)$ are the risk attitude coefficients towards probability weights for gain and loss. Following Reference [35], the values $\xi = \beta = 0.88$, $\theta = 2.25$, $\delta = 0.61$, $\varepsilon = 0.69$ are generally used. Considering the risk preferences of DMs facing gains and losses in real problems, CPT gives a specific form of the value function and of the decision weights, which allows it to be combined with probabilistic linguistic information and integrated into practical GDM applications.
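The value and weighting functions can be sketched in Python as follows. The sketch assumes the standard forms of Tversky and Kahneman, with a power value function around the reference point b and inverse S-shaped probability weights, and uses the parameter values quoted above. The function names are illustrative.

```python
# Parameter values quoted in the text from Reference [35].
XI = 0.88       # risk attitude towards value for gains
BETA = 0.88     # risk attitude towards value for losses
THETA = 2.25    # loss aversion coefficient
DELTA = 0.61    # probability weighting exponent for gains
EPSILON = 0.69  # probability weighting exponent for losses

def value(x, b=0.0):
    """Value of outcome x relative to the reference point b."""
    if x >= b:
        return (x - b) ** XI
    return -THETA * (b - x) ** BETA

def weight_gain(p):
    """Inverse S-shaped probability weight for gains, p in [0, 1]."""
    return p ** DELTA / (p ** DELTA + (1 - p) ** DELTA) ** (1 / DELTA)

def weight_loss(p):
    """Inverse S-shaped probability weight for losses, p in [0, 1]."""
    return p ** EPSILON / (p ** EPSILON + (1 - p) ** EPSILON) ** (1 / EPSILON)

# Small probabilities are overweighted, large ones underweighted.
print(weight_gain(0.01) > 0.01, weight_gain(0.9) < 0.9)
# A unit loss hurts more than a unit gain pleases.
print(value(1.0), value(-1.0))
```

The asymmetry between `value(1.0)` and `value(-1.0)` is the loss aversion that the coefficient θ encodes.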

The Measures between PLTSs Based on CPT
In order to measure probabilistic linguistic terms more accurately, a new probabilistic linguistic measure is obtained by fusing information based on the value function of the relative reference point variables and the probability weight function.
Definition 9. [13] The measures between PLTSs based on CPT consist of a score value and a variance value, where $\alpha_k$ is the subscript of the corresponding linguistic term in the LTS $S$.
CPT not only analyzes the psychological risk factors of humans in the decision-making process but also considers the value function and probability weight function of the relative reference point variables, which makes up for the shortcomings of PT.

Data Clustering of Large Groups Based on Data Attention
The typical way for the public to express their feelings, views, and opinions is through behavior [7]. Public behavior data are mainly divided into operational (interaction) behavior data and content behavior data. Content behavior data refer to published texts, while operational behavior data mainly involve public commenting, liking, and retweeting behaviors. This study uses a Python-based crawler to obtain the raw microblog data, which mainly include the blogger's ID screen name, blog post text, posting time, number of likes, number of retweets, number of comments, and others. The flow for obtaining event attributes and attribute weights is shown in Figure 1.
First, a Python-based crawler is used to collect a large amount of information about user behavior on social media networks. Each piece of data can be denoted as D = (UGC, AN, CN, RN, FN, FSN), where UGC is the tweet text (user-generated content) and AN, CN, RN, FN, and FSN are the numbers of likes, comments, retweets, followers, and fans, respectively. Then, each text is pre-processed using a Python natural language processing package, including word segmentation, cleaning, part-of-speech tagging, and entity recognition. Finally, after pre-processing, the k-means clustering algorithm is used to classify the data based on the level of public attention.
To obtain the optimal number of clusters and ensure more reliable classification results, the elbow method and the silhouette coefficient method are generally adopted. However, the k value selected by the silhouette coefficient method is not necessarily optimal, and sometimes it needs to be determined with the aid of SSE. Therefore, in this paper, we first use the elbow method of Equation (13) to determine the optimal number of clusters. The core index of the elbow method is the sum of squared errors, SSE:
$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} \left| p - m_i \right|^2$$
where $C_i$ is the ith cluster, $p$ is a sample point in $C_i$, $m_i$ is the center of mass of $C_i$ (the mean of all samples in $C_i$), and SSE is the total clustering error, which indicates the quality of the clustering. The core idea of the elbow method is that as the number of clusters k increases, the samples are divided more finely and each cluster becomes more compact, so SSE gradually decreases.
After classification, each dataset contains multiple data objects. Based on the information of each data item in the dataset, the attention coefficient of the dataset is calculated, where $DS_i(AN)$, $DS_i(CN)$, $DS_i(RN)$, $DS_i(FN)$ and $DS_i(FSN)$ denote the average numbers of likes, comments, retweets, followers, and fans of the ith dataset, respectively, and $n_i$ denotes the number of data items in the dataset. The attention factor $\gamma_i$ is computed following [7].
A linear programming model is developed to maximize the influence of the data, where $Z$ is the influence of the data; $A$, $B$, $C$, $D$, $E$ are the numbers of likes, comments, retweets, followers, and fans of the data, respectively; and $\partial_a$, $\partial_c$, $\partial_r$, $\partial_f$, $\partial_s$ are the weights of the respective indicators. Assuming the influence of each factor is equal, the model finds the maximum influence of the data and the indicator weights. Solving this linear programming model yields the weight of each indicator. Determining the indicator weights with this model is more objective than other approaches: it avoids the decision risk caused by experts' subjective determination of the indicator weights, which makes the method more reliable and applicable.
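The SSE index used by the elbow method can be sketched in Python as follows. The sketch computes SSE for a given cluster assignment; in practice the assignments would come from running k-means for each candidate k, and the elbow is the k after which SSE stops dropping sharply. The function names and the toy data are hypothetical.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        m = centroid(members)
        total += sum(sum((x - c) ** 2 for x, c in zip(p, m)) for p in members)
    return total

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
sse_k1 = sse(points, [0, 0, 0, 0])  # everything in one cluster
sse_k2 = sse(points, [0, 0, 1, 1])  # one cluster per pair
print(sse_k1, sse_k2)
```

Going from one cluster to two removes almost all of the error on this toy data, which is the kind of sharp drop the elbow method looks for.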

Obtain Attributes and Weights
Once an emergency breaks out, the microblogging platform forms real-time hot topics, and the microblog texts representing public views proliferate into a significant data stream. Term frequency-inverse document frequency (TF-IDF) is a widely used keyword extraction technique in data mining that evolved from IDF, which was proposed by Sparck Jones [37,38] on heuristic grounds. It is a common weighting technique in information retrieval and text mining that evaluates the importance of a word in a document collection by combining the term frequency and the inverse document frequency to determine the weight of the keyword.
The specific steps of the algorithm are as follows:
Step 1. Calculate the term frequency. The term frequency is the number of times a word appears in an article. It is normalized to make different articles comparable and to account for differences in article length:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the word $t_i$ in the document $d_j$, and the denominator is the sum of the occurrences of all words in the document $d_j$.
Step 2. Calculate the inverse document frequency as
$$idf_i = \log \frac{|D|}{\left| \{ j : t_i \in d_j \} \right|}$$
where $|D|$ is the total number of documents in the corpus and $|\{ j : t_i \in d_j \}|$ denotes the number of documents containing the word $t_i$. If the word is not in the corpus, the denominator is 0. Therefore, in general, $1 + |\{ j : t_i \in d_j \}|$ is used, i.e.,
$$idf_i = \log \frac{|D|}{1 + \left| \{ j : t_i \in d_j \} \right|}$$
Step 3. Calculate TF-IDF as
$$tfidf_{i,j} = tf_{i,j} \times idf_i$$
Finally, the weights of the public attributes are obtained by combining the TF-IDF values with the attention coefficients $\gamma_i$ of the datasets obtained by Equation (16), and the standard decision attribute weights are obtained after normalization.
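The three TF-IDF steps can be sketched in plain Python as follows. The sketch uses the smoothed inverse document frequency with 1 added to the document count and assumes each document is already a list of tokens. The toy corpus and function names are hypothetical.

```python
from math import log

def tf(term, doc):
    """Term frequency: occurrences of term divided by document length."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Smoothed inverse document frequency: log(|D| / (1 + document count))."""
    containing = sum(1 for d in docs if term in d)
    return log(len(docs) / (1 + containing))

def tfidf(term, doc, docs):
    """TF-IDF weight of a term in one document of the corpus."""
    return tf(term, doc) * idf(term, docs)

# Toy corpus of pre-tokenized documents.
docs = [
    ["fire", "explosion", "fire"],
    ["rescue", "fire"],
    ["rescue", "evacuation"],
]
# "fire" appears in 2 of 3 documents, so log(3 / (1 + 2)) = 0 and its weight vanishes.
# "explosion" appears in only 1 document and gets a positive weight.
print(tfidf("fire", docs[0], docs), tfidf("explosion", docs[0], docs))
```

With this smoothing, a word that occurs in every document gets a negative idf, which is one reason very common words are usually filtered out before keyword selection.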

Determine Expert Objective Weights Based on Disparity Maximization
In this paper, the idea of disparity maximization is used to determine the weight of each decision maker. Wang [34] proposed the maximizing deviation method to deal with MADM problems with numerical information [39]. For the MAGDM problem, if the deviation of a DM's attribute evaluation values across all solutions is small, the DM's evaluation plays a small role in the ranking of solutions; conversely, if the deviation is large, the DM's evaluation plays a larger role in the ranking and the DM should be given a larger weight. This method motivates DMs to make objective and reasonable evaluations of the known solutions.
Suppose all the attribute indicators in this paper are benefit-based indicators, which do not need to be normalized.
The specific steps are as follows:
Step 1. Obtain the decision-making matrix $R = (r_{ij})_{n \times m}$ from the expert $e_k$. The evaluated value of the alternative $A_i$ on $C_j$ is denoted $v_{ij}^{k}$ and is expressed as a PLTS.
Step 2. Based on the maximizing deviation method, construct the objective function and solve the resulting optimization model with a Lagrange function. Setting the partial derivatives of Equation (23) to zero yields the optimal solution.
Step 3. Normalize the weights.
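The idea behind the maximizing deviation method can be illustrated with a simplified Python sketch. It is not the paper's model: it works on a crisp score matrix, uses the absolute difference in place of the hybrid PLTS distance, and weights columns rather than expert-specific entries. In the classical method, maximizing the total weighted deviation under a unit-norm constraint and then normalizing gives each column a weight proportional to its total pairwise deviation, which is what the sketch computes.

```python
def max_deviation_weights(matrix):
    """Weights proportional to each column's total pairwise deviation.

    matrix[i][j] is the crisp score of alternative i in column j.
    Assumption: absolute difference stands in for the PLTS distance.
    """
    n, m = len(matrix), len(matrix[0])
    deviations = []
    for j in range(m):
        total = sum(
            abs(matrix[i][j] - matrix[k][j])
            for i in range(n)
            for k in range(n)
        )
        deviations.append(total)
    overall = sum(deviations)
    if overall == 0:
        # No column discriminates between alternatives: fall back to equal weights.
        return [1.0 / m] * m
    return [d / overall for d in deviations]

# Column 0 separates the alternatives; column 1 rates them identically.
scores = [
    [0.2, 0.5],
    [0.8, 0.5],
    [0.5, 0.5],
]
weights = max_deviation_weights(scores)
print(weights)
```

The column in which all alternatives receive the same score gets zero weight, since it cannot contribute to the ranking.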

Combined Weights
Let $w_{ij}^{k}$ denote the combined weight of expert $e_k$ for alternative $A_i$ on the attribute $C_j$, obtained as a linear combination of the subjective weight $w'_j$ and the objective weight, where $\alpha$ and $\beta$ are the linear coefficients of the objective and subjective weights, respectively, and satisfy $0 \le \alpha, \beta \le 1$ and $\alpha + \beta = 1$. When $\alpha = 0$ and $\beta = 1$, only subjective weights are considered in GDM; when $\alpha = 1$ and $\beta = 0$, only objective weights are considered in GDM.
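The linear combination can be sketched in Python as follows. The sketch assumes that α scales the objective weight and β = 1 − α scales the subjective weight, which matches the two limiting cases described above; any further normalization the method may apply is not shown. The function and argument names are hypothetical.

```python
def combined_weight(subjective, objective, alpha):
    """Linear combination alpha * objective + (1 - alpha) * subjective."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * objective + beta * subjective

# alpha = 0 keeps only the subjective weight, alpha = 1 only the objective one.
print(combined_weight(0.3, 0.5, 0.0), combined_weight(0.3, 0.5, 1.0))
print(combined_weight(0.3, 0.5, 0.4))
```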

GDMD-PROMETHEE Algorithm Based on CPT
This section provides a new extended PROMETHEE using probabilistic linguistic information, namely the GDMD-PROMETHEE method, to evaluate multi-criterion GDM. Let $A_i$ $(i = 1, 2, \ldots, n)$ be the alternatives, $C_j$ $(j = 1, 2, \ldots, m)$ be the criteria mined through social media, and $E_k$ $(k \in N^+, k \ge 20)$ be the decision-making experts from relevant fields. Based on pairwise comparison of the alternatives, the specific steps of GDMD-PROMETHEE are as follows:
Step 1. Combine big data network behavior data to mine event keywords, obtain the event evaluation criteria, and use the TF-IDF technique to find the subjective weights of the event attributes.
Step 2. Solve the objective weights of experts using Equations (22)-(26) to determine the comprehensive weights $w_{ij}^{k}$.
Step 3. Combine Equations (12)-(14) to fuse the probabilistic linguistic evaluation information into specific real values and obtain the fused initial evaluation matrix.
Step 4. Combine the integrated weights with the initial evaluation matrix to obtain the group evaluation matrix.
Step 5. Calculate the priority indices of each pair of solutions under the different attributes.
Step 6. Construct the dominance matrix for pairwise comparisons between solutions. When a solution is compared with itself, the dominance ratio is 0.5; all other cases satisfy $r_{jk} + r_{kj} = 1$.
Step 7. From Equation (33), obtain the net flow value $\varphi(i)$ of each solution; the larger $\varphi(i)$ is, the better the solution. The outflow $\varphi^{+}(A_j)$ of $A_j$ indicates the extent to which $A_j$ outperforms the other $(n - 1)$ solutions in the set, and the larger $\varphi^{+}(A_j)$, the better $A_j$ is. The inflow $\varphi^{-}(A_j)$ indicates the extent to which the other $(n - 1)$ solutions in the set outperform $A_j$, and the smaller $\varphi^{-}(A_j)$, the better $A_j$ is.
As one of the most widely used ranking methods in MAGDM, PROMETHEE is convenient and flexible to use because it is easy to understand. On this basis, this paper proposes the GDMD-PROMETHEE algorithm based on CPT.
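The flow computation in Step 7 can be sketched in Python under the standard PROMETHEE II definitions, in which the outflow and inflow of an alternative are the averages of its pairwise preference degrees over the other n − 1 alternatives and the net flow is their difference. This is an assumption: the paper's Equation (33) may differ in detail. The preference matrix below is hypothetical.

```python
def promethee_flows(pi):
    """Outflow, inflow, and net flow from a pairwise preference matrix.

    pi[i][j] is the degree to which alternative i is preferred to j;
    diagonal entries are ignored.
    """
    n = len(pi)
    plus = [sum(pi[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    minus = [sum(pi[j][i] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    net = [p - m for p, m in zip(plus, minus)]
    return plus, minus, net

def rank(net):
    """Indices of the alternatives ordered from best to worst net flow."""
    return sorted(range(len(net)), key=lambda i: net[i], reverse=True)

# Hypothetical preference degrees for three alternatives.
pi = [
    [0.0, 0.6, 0.8],
    [0.2, 0.0, 0.5],
    [0.1, 0.3, 0.0],
]
plus, minus, net = promethee_flows(pi)
print(net, rank(net))
```

The net flows always sum to zero, so they express only the relative standing of the alternatives.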


Case Background
The Shanghai Petrochemical explosion on 18 June 2022 is taken as an example to verify the feasibility of the proposed method. At 4:28 pm on 18 June 2022, the chemical department of Shanghai Petrochemical caught fire, and a fireball shot up into the sky with explosions in many places. To protect the basic life safety of the public and ensure that the emergency command could coordinate quickly, four alternatives were identified after consulting professional information, and 20 emergency decision-making experts from firefighting, medical, chemical, and other related departments evaluated each alternative in terms of the attributes. The four alternatives are:
$A_1$: Promptly assess the damage to surrounding traffic, communications, power supply, water supply, and other facilities; deploy drones to draw a 360-degree panoramic map of the explosion site; survey hidden fire points; determine the rescue route; organize the rescue; and arrange firefighting and rescue forces reasonably to protect the safety of people and property. After the fire is extinguished, organize forces to seal the leak points and carry out repair work to ensure the successful completion of the disaster relief work.
$A_2$: After the fire breaks out, send an attack team to the scene to detect gas, strengthen the personal protection of rescue personnel, and quickly rescue trapped personnel. In addition, adopt initial-attack tactical measures of fire control, cooling, and explosion suppression, and simultaneously cool and protect multiple fire points and the surrounding storage tanks and devices to prevent heating and pressure build-up from causing a secondary fire or explosion.
$A_3$: Immediately after discovering the leaking device, stop transmission, close the cut-off valves on both sides of the pipeline leak point, take the necessary protective measures for other pipelines near the leaking pipeline and, at the same time, be alert to electricity leakage and to highly toxic and highly corrosive substances. Make every effort to help the injured, and take isolation, warning, and evacuation measures to keep unrelated personnel out of the danger area. Activate the environmental emergency plan and arrange testing of the surrounding air and water quality.
$A_4$: To avoid a secondary explosion of unknown hazardous materials, suspend large-scale firefighting; dispatch the chemical defense regiment and the nuclear, biological, and chemical emergency rescue team to search and rescue in depth at the scene; sample the burning materials; and select firefighting methods that correspond to their composition. Take anti-leakage and anti-diffusion control measures to prevent the spread of fire. After the fire is under control, implement protective burning.
Microblog data were crawled using Python with keywords such as "Shanghai petrochemical fire accident has set up an investigation team", "Shanghai petrochemical fire information", "aerial photography of Shanghai petrochemical fire scene", and "Shanghai petrochemical fire latest progress". A total of 1200 pieces of data were extracted, each of the form D = (UGC, AN, CN, RN, FN, FSN). Data Availability Statement: The data of this study are available from the authors upon request. Relevant data are available from the "Wei Bo" website (https://weibo.com/ (accessed on 18 June 2022)).
After cleaning and filtering the data, about 400 pieces of valid data were retained and used to generate the word cloud map shown in Figure 3.

Data Analysis
Step 1. After data pre-processing, the numbers of likes, comments and retweets of the data are used as distance measures, and the k-means clustering algorithm is applied to cluster the behavioral big data according to public attention. As the number of clusters k increases, the SSE first drops sharply and then levels off. The elbow method selects this inflection point, so k = 3 is selected, as shown in Figure 4.
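The SSE curve inspected by the elbow method in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the records are invented (likes, comments, retweets) triples, the clustering is a plain Lloyd's k-means written with the standard library only, and the farthest-point initialization is an assumption made here to keep the run deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_farthest(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from all centroids chosen so far (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans_sse(points, k, iters=100):
    """Run Lloyd's k-means and return the final sum of squared errors (SSE)."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical (likes, comments, retweets) records forming three loose groups.
centers = [(1.0, 1.0, 1.0), (10.0, 10.0, 10.0), (20.0, 1.0, 10.0)]
offsets = [(-0.5, 0.0, 0.2), (0.5, 0.1, -0.2), (0.0, -0.4, 0.3), (0.1, 0.4, -0.3)]
points = [tuple(c + o for c, o in zip(center, off)) for center in centers for off in offsets]

# SSE for each candidate k; the elbow is the k after which the curve flattens.
sse_curve = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_curve.items():
    print(f"k={k}  SSE={sse:.2f}")
```

On these toy records the SSE falls steeply up to k = 3 and changes little afterwards, which is the pattern the elbow method looks for. With real crawled data the features would normally be rescaled first, since raw counts can differ by orders of magnitude.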
With the optimal number of clusters set to k = 3, t-SNE (implemented in Python) is used for visualization: the distances in the high-dimensional space are converted into probability distributions using Gaussian distributions, and the 3D features of the data are reduced to 2D. The different colors in the diagram represent small groups, and each small group is a category. Blue, green and red represent $DS_1$, $DS_2$ and $DS_3$ in the low-dimensional space, respectively. In this way, the data retain in the low-dimensional space the information they carry in the high-dimensional space, as shown in Figure 5.

Using the linear programming model established by Equation (18), the weights of the resulting indicators are calculated as shown in Table 1, and the concern coefficients for each data set are obtained according to Equation (16), as presented in Table 2.

In this paper, the words with high TF-IDF values are selected as keywords, using the Python package Jieba. To facilitate the subsequent analysis, words that are not closely related to the emergency event or that have dark themes are deleted. For example, words unrelated to the explosion include "original", "good night" and "takeout". Based on the above analysis, the emergency decision guidelines and their corresponding keywords, which reflect the topics of public concern in mega emergencies, are shown in Table 3. Four attributes $C = \{C_1, C_2, C_3, C_4\}$ are determined, where $C_1$ is "emergency response", including emergency response, control, preplanning, etc.; $C_2$ is "fire suppression and derivative disaster control", including fire, burning, etc.; $C_3$ is "site and surrounding environment detection", including photography, smoke, pollution, etc.; and $C_4$ is "casualty and rescue", including injury, death, rescue, etc. The corresponding attribute weights are $w = (w_1, w_2, w_3, w_4) = (0.250, 0.204, 0.174, 0.372)$.

Step 2. The objective weights are obtained using Equations (22)-(26), as shown in Table 4. The deviation values of experts $e_7$, $e_9$ and $e_{15}$ show no difference, so their weights are set to 0.
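The idea behind the objective expert weights in Step 2 can be sketched as follows. Equations (22)-(26) are not reproduced in this section, so the sketch rests on two loudly stated assumptions: an expert's deviation is taken to be the total pairwise difference between alternatives within that expert's own rating matrix, and a plain absolute difference between real-valued scores stands in for the paper's extended hybrid distance. The three rating matrices are invented.

```python
from itertools import combinations


def expert_deviation(matrix):
    """Total pairwise deviation between alternatives, summed over attributes.
    Assumption: absolute difference replaces the paper's hybrid distance."""
    n_attr = len(matrix[0])
    return sum(
        abs(row_a[j] - row_b[j])
        for row_a, row_b in combinations(matrix, 2)
        for j in range(n_attr)
    )


def maximizing_deviation_weights(expert_matrices):
    """Weight each expert in proportion to the deviation of their ratings."""
    deviations = [expert_deviation(m) for m in expert_matrices]
    total = sum(deviations)
    if total == 0:
        # No expert discriminates between alternatives: fall back to equal weights.
        return [1 / len(deviations)] * len(deviations)
    return [d / total for d in deviations]


# Hypothetical ratings: 3 experts, 3 alternatives (rows) x 2 attributes (columns).
e1 = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
e2 = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # rates every alternative the same
e3 = [[0.4, 0.6], [0.5, 0.5], [0.6, 0.4]]

expert_weights = maximizing_deviation_weights([e1, e2, e3])
print([round(w, 3) for w in expert_weights])
```

Under these assumptions an expert whose ratings do not distinguish between the alternatives receives a weight of 0, which is consistent with the treatment of the zero-deviation experts described in Step 2.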
Step 3. According to Equation (27), the integrated weights are calculated with α = β = 0.5, and Table 5 is obtained.

Step 4. For the four attributes identified in Table 3, the experts gave their ratings using the five-granularity linguistic term set $S = \{s_0, s_1, s_2, s_3, s_4\}$ = {very low, low, fair, high, very high}. Due to space limitations, only the rating matrices of the first two experts are listed, in Tables 6 and 7. The evaluation information was fused based on cumulative prospect theory using Equations (11)-(13) and transformed into real values to obtain the initial evaluation matrix. The results are shown in Table 8, and the complementary results are given in Appendix B.

Step 5. The weights were combined with the evaluation information to obtain the normalized group evaluation matrix, as shown in Table 9.

Step 6. The advantage ratios between each pair of alternatives are calculated using Equations (29)-(30), as shown in Table 10.

Step 7. The inflow, outflow and net flow of each alternative are calculated, and the results are shown in Table 11. Comparing the net flows gives the final ranking $A_2 \succ A_1 \succ A_3 \succ A_4$, so alternative $A_2$ is chosen: after the fire breaks out, send the attack team to the scene to detect the gas, strengthen the personal protection of rescue personnel and quickly rescue trapped personnel; adopt the tactical measures of controlling the fire at the initial stage, cooling and explosion suppression; and simultaneously cool and protect the multiple fire points and the surrounding tanks and devices, to prevent heating and pressure build-up from causing a secondary fire or explosion.
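The flow computation in Steps 6 and 7 follows the general PROMETHEE II pattern, which can be sketched as below. This is a hedged illustration rather than the paper's Equations (29)-(30): it assumes benefit-type attributes and the "usual" preference function (full preference whenever the difference is positive), and the score matrix is invented rather than taken from Table 9. Only the attribute weights are the ones quoted in the case study.

```python
def usual_preference(d):
    """Usual criterion: strict preference as soon as the difference is positive."""
    return 1.0 if d > 0 else 0.0


def promethee_net_flows(scores, weights, preference=usual_preference):
    """Return {alternative: (outflow, inflow, net flow)} for benefit-type attributes."""
    names = list(scores)
    n = len(names)

    def pi(a, b):
        # Weighted preference of a over b, aggregated across attributes.
        return sum(w * preference(x - y) for w, x, y in zip(weights, scores[a], scores[b]))

    flows = {}
    for a in names:
        outflow = sum(pi(a, b) for b in names if b != a) / (n - 1)  # how much a beats others
        inflow = sum(pi(b, a) for b in names if b != a) / (n - 1)   # how much others beat a
        flows[a] = (outflow, inflow, outflow - inflow)
    return flows


weights = [0.250, 0.204, 0.174, 0.372]  # attribute weights quoted in the case study

# Hypothetical normalized scores on the four attributes (not the values of Table 9).
toy_scores = {
    "A1": [0.6, 0.5, 0.7, 0.6],
    "A2": [0.8, 0.7, 0.8, 0.9],
    "A3": [0.5, 0.6, 0.4, 0.5],
    "A4": [0.3, 0.4, 0.3, 0.2],
}

flows = promethee_net_flows(toy_scores, weights)
ranking = sorted(flows, key=lambda a: flows[a][2], reverse=True)
for name in ranking:
    out_f, in_f, net = flows[name]
    print(f"{name}: outflow={out_f:.3f} inflow={in_f:.3f} net={net:.3f}")
```

The net flows always sum to zero, so they only order the alternatives relative to one another. A different preference function can be passed through the `preference` argument if thresholds are needed.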

Ranking Results under Different Parameters by the Same Decision Method
For sensitivity analysis, the effect of different values of α and β in the combined weights on the ranking results was investigated, where the coefficient α represents the proportion of the objective weights and the coefficient β represents the proportion of the subjective weights. The results are shown in Table 12.
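A parameter sweep of this kind can be sketched as follows. The sketch assumes that the combined weight is the linear blend α·(objective weight) + β·(subjective weight) with β = 1 − α, which is a simplification since Equation (27) is not reproduced here, and it ranks alternatives by a simple weighted score as a stand-in for the full GDMD-PROMETHEE procedure. The subjective vector reuses the attribute weights quoted in the case study; the objective vector and the scores are invented.

```python
def combine_weights(w_obj, w_subj, alpha):
    """Linear blend of objective and subjective weights (assumed form), beta = 1 - alpha."""
    beta = 1.0 - alpha
    return [alpha * o + beta * s for o, s in zip(w_obj, w_subj)]


def rank_by_weighted_score(scores, weights):
    """Stand-in ranking rule: sort alternatives by weighted score, best first."""
    totals = {name: sum(w * x for w, x in zip(weights, row)) for name, row in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)


w_subj = [0.250, 0.204, 0.174, 0.372]  # attribute weights quoted in the case study
w_obj = [0.10, 0.40, 0.40, 0.10]       # hypothetical objective weights

# Hypothetical scores on the four attributes (not the data behind Table 12).
toy_scores = {
    "A1": [0.6, 0.8, 0.7, 0.5],
    "A2": [0.8, 0.6, 0.6, 0.9],
    "A3": [0.5, 0.6, 0.5, 0.6],
    "A4": [0.3, 0.4, 0.4, 0.3],
}

sweep = {}
for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    combined = combine_weights(w_obj, w_subj, alpha)
    sweep[alpha] = rank_by_weighted_score(toy_scores, combined)
    print(f"alpha={alpha:.2f} beta={1 - alpha:.2f} ranking={sweep[alpha]}")
```

With these toy inputs the top-ranked alternative changes as the invented objective weights take over, which is the kind of effect a sensitivity table is meant to expose.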

As can be seen from Table 12, the optimal solution is $A_1$ only when α = 1, β = 0 (i.e., only objective weights are considered); in all other cases, the ranking results remain consistent, i.e., $A_2 \succ A_1 \succ A_3 \succ A_4$. This shows that the ranking of the schemes is largely unaffected by fluctuations of the parameters, and the comparison with the method in Reference [40] confirms that the GDMD-PROMETHEE method combining the generalized probabilistic distance and the Hausdorff distance is more stable. The scores in Figures 6 and 7 show that the schemes score relatively close to one another under each parameter setting, but $A_2$ is the best and $A_4$ is the worst, and it is clearly undesirable to consider only the objective weights.

Comparison of the Ranking Results of Different Decision Methods
To verify the validity and feasibility of the model in this paper, the methods of the literature [40,41] and TOPSIS were selected for comparison, and the results are shown in Table 13.
(1) Table 13 shows that the PROMETHEE ranking based on the literature [41] is $A_3 \succ A_2 \succ A_1 \succ A_4$, which differs from the result of this paper. The main reason is that this paper considers the weights of the individual decision experts, whereas the literature [41] assigns the same average weight to all members of the decision group. Assigning expert weights based on the maximum deviation values extracted from the evaluations of the individual experts is more consistent with their individual risk levels and attitudes than a simple average weight.
(2) As can be seen from Table 13, the PROMETHEE method based on CPT gives the same ranking as the traditional TOPSIS method and the literature [40]. Combined with Figure 8, it can be seen that the results of the first three methods are relatively consistent, i.e., the best solution is $A_2$ and the worst solution is $A_4$, which further verifies the validity and reasonableness of the method in this paper.
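For readers unfamiliar with the TOPSIS baseline used in this comparison, a minimal sketch of the classical procedure is given below. It assumes real-valued, benefit-type scores with vector normalization, so it does not handle probabilistic linguistic term sets as the paper does; the score matrix is invented and only the attribute weights come from the case study.

```python
import math


def topsis(scores, weights):
    """Classical TOPSIS closeness coefficients for benefit-type attributes.
    Assumes the alternatives are not all identical (otherwise both distances are 0)."""
    names = list(scores)
    columns = list(zip(*(scores[a] for a in names)))
    # Vector normalization: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    weighted = {
        a: [w * x / nrm for w, x, nrm in zip(weights, scores[a], norms)]
        for a in names
    }
    n_attr = len(weights)
    ideal = [max(weighted[a][j] for a in names) for j in range(n_attr)]
    anti_ideal = [min(weighted[a][j] for a in names) for j in range(n_attr)]
    closeness = {}
    for a in names:
        d_plus = math.dist(weighted[a], ideal)        # distance to the ideal solution
        d_minus = math.dist(weighted[a], anti_ideal)  # distance to the anti-ideal solution
        closeness[a] = d_minus / (d_plus + d_minus)
    return closeness


weights = [0.250, 0.204, 0.174, 0.372]  # attribute weights quoted in the case study

# Hypothetical scores on the four attributes (not the data behind Table 13).
toy_scores = {
    "A1": [0.6, 0.5, 0.7, 0.6],
    "A2": [0.8, 0.7, 0.8, 0.9],
    "A3": [0.5, 0.6, 0.4, 0.5],
    "A4": [0.3, 0.4, 0.3, 0.2],
}

closeness = topsis(toy_scores, weights)
topsis_ranking = sorted(closeness, key=closeness.get, reverse=True)
for name in topsis_ranking:
    print(f"{name}: closeness={closeness[name]:.3f}")
```

A closeness of 1 means the alternative coincides with the ideal solution and 0 means it coincides with the anti-ideal solution, so alternatives are ranked in descending order of closeness.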

Conclusions
In this paper, we study emergency decision making in the social media environment in the era of big data and use probabilistic linguistic methods to cluster the decision results. Compared with traditional GDM, this paper not only extracts event attributes from public information but also combines public opinion with the weights, which effectively and quickly incorporates public opinion into the final decision information and helps to grasp the actual development of the emergency. A new generalized extended hybrid distance is proposed, and the objective weight of each decision expert is determined from the expert decision information using the maximizing deviation method. The decision weight coefficients are used to adjust the proportions of the subjective and objective weights to obtain the total weights, and the influence of the different weightings on the decisions is studied. CPT is used to combine the probabilistic linguistic evaluation information with the total weights, and finally the Shanghai Petrochemical "6.18" explosion is taken as an example to verify the rationality and feasibility of the GDMD-PROMETHEE method. How to combine the external influence of public opinion with the influence of each member of the public in decision making needs to be studied further in the future. In addition, the dynamic change of experts' opinions could be described so that the decision-making process is closer to the actual situation and the decision results are more scientific.

Appendix B
All the results of Table 8 are as follows:


Figure 1. Attribute acquisition framework.

First, a Python-based crawler is used to collect a large amount of information about user behavior on social media networks. Each piece of data can be denoted as D = (UGC, AN, CN, RN, FN, FSN), where UGC is the tweet text (user-generated content) and AN, CN, RN, FN and FSN are count indicators covering the numbers of likes, comments, retweets and followers. Then, each text is pre-processed using a Python natural language processing package, including word segmentation, cleaning, part-of-speech tagging and entity word recognition. Finally, after data pre-processing, the k-means clustering algorithm is chosen to classify the data according to the level of public attention. To obtain the optimal number of clusters and ensure more scientific classification results, the elbow method and the silhouette coefficient method are generally adopted. However, the k value determined by the silhouette coefficient method is not necessarily optimal and sometimes has to be obtained with the aid of the SSE; therefore, this paper first uses the elbow method of Equation (13) to determine the optimal number of clusters. The core index of the elbow method is the sum of squared errors (SSE).
When α = 0 and β = 1, only subjective weights are considered in GDM; when α = 1 and β = 0, only objective weights are considered in GDM.
The flow chart of GDMD-PROMETHEE is shown in Figure 2.

Figure 5. Visualization of data clustering results.

Figure 7. Scheme scores under different parameter fluctuations.

Figure 8. Comparison of the results of the four methods.

Table 1. Information on the weight of each index.

Table 2. Attention factor for each data set.

Table 3. Attributes and weights.

Table 4. The weights of the 20 experts.

Table 6. The decision matrix given by $e_1$.

Table 7. The decision matrix given by $e_2$.

Table 8. The initial evaluation matrix.

Table 9. The group evaluation matrices.

Table 10. Priority index of each program.

Table 11. Ranking of alternatives.

Table 12. Comparison of different parameters.

Table 13. Comparison results with other literature methods.

Table A1. The complete evaluation matrix.