A Malware Propagation Model Considering Conformity Psychology in Social Networks

Abstract: At present, malware is still a major security threat to computer networks. However, only a fraction of users with some security consciousness take security measures to protect their computers on their own initiative, and others who learn of the current situation through social networks usually follow suit. This phenomenon is referred to as conformity psychology. More users can therefore be expected to take countermeasures to prevent their computers from being infected once the malware has spread to a certain extent. This paper proposes a deterministic nonlinear SEIQR propagation model to investigate the impact of conformity psychology on malware propagation. Both the local and global stabilities of the malware-free equilibrium are proven, while the existence and local stability of the endemic equilibrium are proven by using the center manifold theory. Additionally, some numerical examples and simulation experiments based on two network datasets are performed to verify the theoretical analysis results. Finally, the sensitivity analysis of system parameters is carried out.


Introduction
Malware is a program that can obtain unauthorized access and perform malicious tasks on a computer system [1]. In essence, malware can perform a sequence of operations to obtain control of the operating system so it can interrupt system operations, spy on users, and steal sensitive data [2]. With the development of modern malware programs, fileless malware has been developed, which does not need traditional executables to perform its operations. The fileless malware works directly within the memory of the target system instead of the hard drive [3,4]. With the application of obfuscation techniques in malware development, the detection of new malware will become even more difficult than ever before [5-7].
Much effort has been made over recent years to deal with the threat of malware; however, it is still a severe risk in cyberspace. For example, by the end of 2016, the Mirai virus had infected more than 500,000 devices and performed Distributed Denial of Service (DDoS) attacks against many corporations and governments, including the French data service provider, the major Internet service of America, and a telecommunication service provider in Liberia [8]. In 2017, the earliest version of WannaCry was discovered by researchers from Fortinet [9]. It attacked more than 230,000 computers in over 150 countries and organizations, ranging from the UK National Health Service, the Bank of China, the Russian Interior Ministry, to FedEx [10]. Moreover, some viruses have been developed to launch advanced persistent threats (APTs) on industrial control systems, such as Stuxnet, Industroyer, and Triton [11,12]. In this context, it is extremely important to understand the propagation behavior of malware and then propose efficient control strategies to prevent its spread.
In the classical propagation models, the susceptible-infected-recovered (SIR) [13,14] and susceptible-exposed-infected-recovered (SEIR) [15-17] models are widely used, which depict the basic propagation process clearly. In [18], the perturbation method was used to obtain the asymptotic solution of the SEIR model. Bentaleb and Amine [19] used the Lyapunov function to prove that the disease-free equilibrium is globally asymptotically stable in the two-strain SEIR model. In [20], Khouzani et al. introduced the optimal control strategy to control the spread of malware. To study the computer virus propagation, the work [21] proposed a novel method that integrated the evolutionary computing paradigm to analyze the nonlinear dynamical behaviors of the model. In [22], the authors carried out numerical simulations on the trend of safety entropy creatively. Meanwhile, some prototype SEIR models, such as the SLBS model, have been investigated [23,24].
Quarantine is an early intervention measure to control the population of infected individuals [25]. In the study of malware propagation, some researchers have investigated the quarantine strategy in the SEIR model [26,27]. During the propagation of malware, one can quarantine the infected nodes by closing the connection between infected nodes and other nodes [28]. In [29], Piqueira and Batistela used the numerical approach to obtain the parameter conditions of two equilibria in both saturated and unsaturated cases of the quarantine population, respectively. However, most existing work neglects the effect of user awareness on malware propagation.
Individuals can be influenced by the behaviors of others and begin to imitate them, which is referred to as conformity psychology [30]. In the early stage of malware distribution, people whose computers are infected will tell their friends what happened to them, and their friends will be more alert. When the malware starts to become known to the general public, some people realize the threat and take precautions against it; then more people will follow suit [31]. Thus, malware propagation can be controlled by raising user security awareness. In [32], the authors proposed an SEIRS with vaccination and quarantine states (SEIRS-QV) model considering the impacts of user awareness, network delay, and diverse configuration of nodes. Many other researchers have also considered the impact of user awareness on the spread of malware (e.g., [33,34]). In addition, social networks, as a carrier of information dissemination, can affect user awareness by sharing messages about malware. Thus, it is necessary to study the characteristics of social networks in information transmission.
With more and more people chatting online, online social networks have become an important part of people's lives. The trust between online users allows information to spread quickly through social networking applications [35,36]. Hence, disseminating information about the spread of malware through social networks can make the public recognize the risk level. Then, the public can consciously take some preventive measures to avoid being infected, such as upgrading the firewall and running the security software. Many researchers have focused on the characteristics of social networks [37-39]. Jia et al. [40] considered the heterogeneity of infection rates and proposed an HSID model to describe the spread of viruses in social networks. Owing to the importance of describing the information dissemination process of social networks, Du and Wang [41] studied a reaction-diffusion malware propagation model with mixed delays. In [42], the authors investigated a fear effect where information about the impact of the virus from different networks can cause people to feel fear and confusion. Clearly, user awareness is an important factor in preventing virus propagation.
This paper aims to explore the impact of conformity psychology on malware propagation. Section 2 gives the description of the formulated model. In Section 3, the dynamic behaviors of the malware-free equilibrium and the endemic equilibrium are explored. In Sections 4-6, the numerical simulations, experimental analysis, and sensitivity analysis are given, respectively. Section 7 presents conclusions to end this work.

Assumptions and Model Formulation
In this section, we classify all the computers into five categories: susceptible, exposed, infected, quarantined, and recovered computers. Susceptible computers are vulnerable to malware. Exposed computers have been infected with malware that has not yet broken out, so they cannot infect others. An infected computer can infect other susceptible computers. Quarantined computers are disconnected from the network but still alive. Quarantined computers will eventually be transferred to the recovered state and become immune to the current malware. Let $S(t)$, $E(t)$, $I(t)$, $Q(t)$, and $R(t)$ represent the corresponding densities, so that $S(t) + E(t) + I(t) + Q(t) + R(t) = 1$ holds at any time $t$.
In the modeling of malware propagation, the bilinear incidence rate $\beta SI$ is used to represent the rate at which susceptible computers become exposed computers, which is affected by the numbers of susceptible and infected computers. Here, β denotes the rate of malware contact transmission and infection. User awareness plays a significant role in controlling the number of infected computers, so $\beta e^{-mI}$ is usually used to describe the impact of psychology [43,44]. Here, $m$ is a non-negative parameter that measures the impact of information dissemination. In social networks, the effects of information about the numbers of exposed, infected, and quarantined computers are expressed as $m_E$, $m_I$, and $m_Q$, respectively. Hence, the effect of conformity is given by $M_e = e^{-m_E E - m_I I - m_Q Q}$.
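The conformity factor can be illustrated with a minimal Python sketch. It evaluates $M_e$ exactly as defined above; multiplying it into the bilinear incidence as $\beta M_e S I$ is our assumption about how the two are combined, and the function names and numerical values are hypothetical and chosen only for illustration.

```python
import math


def conformity_factor(E, I, Q, m_E, m_I, m_Q):
    """Return M_e = exp(-m_E*E - m_I*I - m_Q*Q) for densities E, I, Q."""
    return math.exp(-(m_E * E + m_I * I + m_Q * Q))


def damped_incidence(beta, S, E, I, Q, m_E, m_I, m_Q):
    """Bilinear incidence beta*S*I scaled by M_e (assumed combination)."""
    return beta * conformity_factor(E, I, Q, m_E, m_I, m_Q) * S * I


# Illustrative densities and weights (hypothetical, not taken from Table 1).
S, E, I, Q = 0.45, 0.2, 0.15, 0.12
baseline = damped_incidence(0.3, S, E, I, Q, 0.0, 0.0, 0.0)  # no awareness effect
aware = damped_incidence(0.3, S, E, I, Q, 1.0, 2.0, 2.0)     # m_E below m_I and m_Q
print(f"baseline incidence: {baseline:.5f}, with conformity: {aware:.5f}")
```

With all weights set to zero the factor equals one and the classical bilinear incidence is recovered; any positive weight lowers the rate at which susceptible computers become exposed.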
Then, the following assumptions are made for this model.
(A1) Information is spread steadily and evenly on social networks. It is supposed that exposed computers have a lower impact on user awareness than infected and quarantined computers, since the more damage malware causes, the more people worry about it.
(A2) Let φ represent the probability of people adopting quarantine to address the problem of infected computers, and let γ denote the probability of quarantined computers reconnecting to the network.
(A3) As computers can deteriorate over time and be physically damaged, the mortality rate µ must be included in the model. Suppose that the recruitment rate is equal to the mortality rate.

Remark 1.
As the model will eventually reach a dynamic equilibrium, the total number of nodes will eventually remain stable, which means that the recruitment rate is infinitely close to the mortality rate. If not, the total number will keep decreasing or increasing as $t \to \infty$. Therefore, the recruitment rate is required to equal the mortality rate in order to maintain the dynamic equilibrium and be consistent with other existing efforts [28,45,46].
The state transitions of nodes in the model are presented in Figure 1, and the meanings of the parameters are given in Table 1. Note that all parameters in this model are non-negative constants. The corresponding ordinary differential equations form system (1). The feasible region Ψ of system (1) is defined as:
$$\Psi = \{(S, E, I, Q, R) \in \mathbb{R}_+^5 : S + E + I + Q + R = 1\},$$
which is a positively invariant set, and the system has been normalized. Due to the equation $S + E + I + Q + R = 1$, the variable $S$ can be eliminated through $S = 1 - E - I - Q - R$, and system (1) can be written as the reduced system (2) in the variables $(E, I, Q, R)$. In the following parts, both the local and global asymptotic stabilities will be the focus of our discussion.
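To make the compartment structure concrete, the following Python sketch integrates a five-state SEIQR system with a classical fourth-order Runge-Kutta scheme. The flows are an assumption inferred from the parameter descriptions in (A1)-(A3) and the state definitions, not a transcription of system (1): α is taken as the E-to-I rate, η as the direct I-to-R rate, and γ as the Q-to-R rate. The parameter values are hypothetical; only the initial densities are those used in the numerical simulations.

```python
import math


def seiqr_rhs(state, p):
    """Right-hand side of an assumed SEIQR flow structure (hypothetical)."""
    S, E, I, Q, R = state
    m_e = math.exp(-(p["m_E"] * E + p["m_I"] * I + p["m_Q"] * Q))
    new_exposed = p["beta"] * m_e * S * I
    dS = p["mu"] - new_exposed - p["mu"] * S       # recruitment equals mortality
    dE = new_exposed - (p["alpha"] + p["mu"]) * E
    dI = p["alpha"] * E - (p["phi"] + p["eta"] + p["mu"]) * I
    dQ = p["phi"] * I - (p["gamma"] + p["mu"]) * Q
    dR = p["eta"] * I + p["gamma"] * Q - p["mu"] * R
    return (dS, dE, dI, dQ, dR)


def rk4_step(state, p, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, scale):
        return tuple(x + scale * dx for x, dx in zip(state, k))

    k1 = seiqr_rhs(state, p)
    k2 = seiqr_rhs(shifted(k1, h / 2), p)
    k3 = seiqr_rhs(shifted(k2, h / 2), p)
    k4 = seiqr_rhs(shifted(k3, h), p)
    return tuple(
        x + h / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, p, t_end, h=0.1):
    """Integrate from t = 0 to t_end and return the final state."""
    for _ in range(int(round(t_end / h))):
        state = rk4_step(state, p, h)
    return state


# Hypothetical parameter values (not those of Table 1).
params = {"beta": 0.3, "alpha": 0.2, "phi": 0.2, "eta": 0.1, "gamma": 0.1,
          "mu": 0.05, "m_E": 1.0, "m_I": 2.0, "m_Q": 2.0}
initial = (0.45, 0.2, 0.15, 0.12, 0.08)  # S, E, I, Q, R at t = 0
final = simulate(initial, params, 200.0)
print("densities at t = 200:", [round(x, 6) for x in final])
```

Under these assumed flows the five derivatives sum to $\mu(1 - S - E - I - Q - R)$, so the normalization is preserved along every trajectory, which is what allows $S$ to be eliminated.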

Stability Analysis of the Equilibria
In this section, we will calculate the basic reproduction number $R_0$ and explore the local and global asymptotic stability of the equilibria in the region Ψ.

Local Stability of the Equilibria
The malware-free equilibrium $E_0 = (1, 0, 0, 0, 0)$ is obtained from system (2) in the original state. Here, the matrices $F$ and $V$ represent the new infection terms and the remaining transition terms, respectively. The Jacobian matrices of $F$ and $V$ are evaluated at the malware-free equilibrium $E_0$, and the matrix $J = FV^{-1}$ follows from them. The basic reproduction number $R_0$ of system (2) is given exactly by the spectral radius of this matrix (Equation (3)).
Theorem 1. $E_0$ is locally asymptotically stable if $R_0 < 1$.
Proof of Theorem 1. We can obtain the Jacobian matrix $J(E_0)$ of system (2) at the malware-free equilibrium, together with its eigenvalues and the corresponding characteristic equation. According to Vieta's theorem, all roots of this characteristic equation have negative real parts only if $R_0 < 1$. The proof is completed.
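The next-generation construction of $R_0$ as the spectral radius of $FV^{-1}$ can be sketched in a few lines of Python. The entries of $F$ and $V$ below are hypothetical: they assume that only the exposed and infected compartments matter for new infections, that α is the E-to-I rate, and that infected nodes leave at rate φ + η + µ. They are not the matrices of the paper, and the parameter values are illustrative.

```python
import cmath


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus, from the trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


# Hypothetical parameter values (not those of Table 1).
beta, alpha, phi, eta, mu = 0.3, 0.2, 0.2, 0.1, 0.05

# Assumed linearization at the malware-free equilibrium (S = 1, M_e = 1).
F = [[0.0, beta], [0.0, 0.0]]                      # new infections enter E
V = [[alpha + mu, 0.0], [-alpha, phi + eta + mu]]  # transitions out of E and I

r0 = spectral_radius_2x2(matmul_2x2(F, inverse_2x2(V)))
print(f"R0 under the assumed flows: {r0:.4f}")
```

For these assumed matrices the spectral radius reduces to $\beta\alpha / ((\alpha + \mu)(\varphi + \eta + \mu))$, which increases with β and α and decreases with φ, η, and µ.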

Existence and Local Stability of Endemic Equilibrium
Theorem 2. The endemic equilibrium of system (2) exists if and only if $R_0 > 1$.
Proof of Theorem 2. The endemic equilibrium $(E^*, I^*, Q^*, R^*)$ of system (2) is determined by $I^*$, which satisfies Equation (4). Equation (4) can be divided into two equations, Equations (5) and (6). Since $0 \le I^* \le 1$, Equation (5) is monotonically decreasing in $I^*$ with maximum value 1, and Equation (6) is monotonically increasing in $I^*$ with minimum value $1/R_0$. The curves of Equations (5) and (6) have one point of intersection if and only if $R_0 \ge 1$. When $R_0 = 1$, both extreme values equal 1, so the curves meet only at $I^* = 0$, which corresponds to the malware-free equilibrium. Hence, the endemic equilibrium exists if and only if $R_0 > 1$.
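The intersection argument can be mimicked numerically. The Python sketch below assumes a hypothetical flow structure (α as the E-to-I rate, η as the direct I-to-R rate, γ as the Q-to-R rate), under which the equilibrium condition takes the form $1 - cI^* = e^{kI^*}/R_0$ with a decreasing left-hand side and an increasing right-hand side. The constants $c$ and $k$, the function name, and the parameter values are our own illustration, not Equations (4)-(6).

```python
import math


def endemic_infected_density(p, tol=1e-12):
    """Bisection for I* under the assumed flows; returns None if R0 <= 1."""
    a, mu = p["alpha"], p["mu"]
    out_i = p["phi"] + p["eta"] + mu
    r0 = p["beta"] * a / ((a + mu) * out_i)
    if r0 <= 1.0:
        return None
    e_per_i = out_i / a                        # E* / I*
    q_per_i = p["phi"] / (p["gamma"] + mu)     # Q* / I*
    r_per_i = (p["eta"] + p["gamma"] * q_per_i) / mu  # R* / I*
    c = 1.0 + e_per_i + q_per_i + r_per_i      # S* = 1 - c * I*
    k = p["m_E"] * e_per_i + p["m_I"] + p["m_Q"] * q_per_i

    def gap(i):
        # Decreasing curve minus increasing curve; positive at i = 0 when R0 > 1.
        return (1.0 - c * i) - math.exp(k * i) / r0

    lo, hi = 0.0, 1.0 / c                      # gap(lo) > 0 > gap(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical parameter values (not those of Table 1).
base = {"alpha": 0.2, "phi": 0.2, "eta": 0.1, "gamma": 0.1, "mu": 0.05,
        "m_E": 1.0, "m_I": 2.0, "m_Q": 2.0}
i_star = endemic_infected_density(dict(base, beta=0.8))   # R0 > 1
no_root = endemic_infected_density(dict(base, beta=0.3))  # R0 < 1
print("I* for beta = 0.8:", i_star, "| beta = 0.3:", no_root)
```

Because one curve is strictly decreasing and the other strictly increasing, the bisection bracket contains exactly one root whenever $R_0 > 1$, mirroring the uniqueness of the intersection point.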
Theorem 3. The endemic equilibrium $E^*$ of system (2) is locally asymptotically stable if $R_0 > 1$.
Proof of Theorem 3. Following the center manifold theory, we consider the parameter β as a bifurcation parameter [47]. When $R_0 = 1$, the bifurcation parameter takes the critical value $\beta = \beta_0$. It can be easily verified that the Jacobian matrix $J$ at $\beta = \beta_0$ has a right eigenvector (corresponding to the zero eigenvalue) given by $W = (\omega_1, \omega_2, \omega_3, \omega_4)^T$ and a left eigenvector (corresponding to the zero eigenvalue) given by $V = (v_1, v_2, v_3, v_4)$. The vector $V$ is obtained by solving $VJ = 0$ together with the normalization $VW = 1$. The coefficients $f_1$ and $f_2$ are then computed, with all of the second-order derivatives evaluated at the malware-free equilibrium and $\beta = \beta_0$. From this calculation, we conclude that $f_1 < 0$ and $f_2 > 0$, so a transcritical bifurcation occurs at $R_0 = 1$, and the endemic equilibrium that appears for $R_0 > 1$ close to 1 is locally asymptotically stable.

Global Stability of the Malware-Free Equilibrium
Theorem 4. The malware-free equilibrium $E_0$ is globally asymptotically stable if $R_0 < 1$.
Proof of Theorem 4. We use the theorem in [48] to prove the global stability of the malware-free equilibrium. Let $X = (R)$ and $Z = (E, I, Q)$ denote the uninfected group and the infected group, respectively, where $X_0 = (0)$ and $Z_0 = (0, 0, 0)$. $U_0 = (X_0, Z_0)$ denotes the malware-free equilibrium of this system. Then, the conditions (H1) and (H2) of that theorem should be satisfied.

Numerical Simulations
In this section, we carry out a series of numerical simulations to demonstrate the dynamic behavior of the SEIQR model and the impact of the parameters on the infected nodes. Here, the initial densities of the five components are given as $S(0) = 0.45$, $E(0) = 0.2$, $I(0) = 0.15$, $Q(0) = 0.12$, and $R(0) = 0.08$, respectively. The parameter values of the malware propagation model are given in Table 1. Based on Equation (3) and the parameters in Table 1, we can obtain $R_0 = 0.8483 < 1$.
In Figure 2, it is evident that different initial densities of the five components converge to the malware-free equilibrium with the parameter values in Table 1 and the basic reproduction number $R_0 = 0.8483 < 1$. We further consider the case in which the malware becomes more infectious (β = 0.1 and α = 0.1), so that $R_0 = 1.9261 > 1$. Then, all solutions converge to the endemic equilibrium, as shown in Figure 3. The curves in Figures 2 and 3 indicate that the system is asymptotically stable in both cases.
The impact of conformity psychology plays a crucial role in the dynamic behavior of infected nodes. However, the dissemination of information about malware is influenced by social networks. Thus, we utilize the parameters $m_E$, $m_I$, and $m_Q$ to represent the impact of social networks corresponding to exposed, infected, and quarantined computers, respectively. Figure 4 shows the curves of the density of infected nodes under different sets of values for $m_E$, $m_I$, and $m_Q$. Figure 4a shows that the conformity psychology contributes very little to reducing exposed nodes when the density of infected nodes decreases and $R_0 < 1$. It means that most people do not care about the malware when it has not infected enough nodes, especially if the malware is not contagious enough. From the curves of Figure 4b, we can see that the density of infected nodes increases rapidly at the beginning of malware propagation. Once the infected nodes reach a certain size, many computer users hear information about the malware through social networks. Due to the conformity psychology, most users may take some proper security measures to protect their computers. Larger values of $m_E$, $m_I$, and $m_Q$ mean that the social network is more powerful in spreading information. Then, many people are aware of the danger of malware and will strengthen the security of their own computers and inform their acquaintances about the protection methods. After that, malware will no longer infect computers on a large scale. In this context, conformity psychology works well in limiting the spread of malware.
In Figure 5, we set different values of the quarantine rate φ to study the effectiveness of the quarantine measure. From Figure 5, we can conclude that the density of infected nodes decreases as the value of φ increases. A higher quarantine rate reduces the number of infected nodes and thereby prevents more nodes from being infected.

Experimental Analysis
In this section, we will perform a series of experiments based on two real datasets. One dataset consists of 55,863 nodes and 858,490 edges from the Reddit hyperlink network [49]. The other dataset consists of 81,306 nodes and 1,768,149 edges from Twitter [50]. The emulation program will be used to simulate the state transition of computers during malware propagation.
To compare with Section 4, $S$, $E$, $I$, $Q$, $R$ indicate numerical simulation results, and $S_r$, $E_r$, $I_r$, $Q_r$, $R_r$ are the output results of Algorithm 1. The experiments (Examples 1 and 2) will validate the theoretical results with real datasets and analyze the important parameters. In Examples 1 and 2, we will use the Reddit and Twitter datasets to validate the model.
The main algorithm for validating this model is given in Algorithm 1.
From Figure 6, we can see that the results on the real dataset and the numerical simulation results are basically the same. With different initial values, the number of nodes in each state eventually tends to stabilize. This means that the stability of the system is not affected by the initial conditions as time passes, which supports the validity of the proposed model.
Figure 7 depicts the evolution of the two kinds of simulations under Example 2, and they both reach stability. This supports the feasibility of the model and the theoretical results in Section 3. From these five plots, we can see that Figure 7a,b show a larger deviation than the remaining three plots after reaching stability. One possible reason for this is that the two are simulated in different ways.

Sensitivity Analysis
$R_0$ is an important parameter to determine whether the malware will die out after its outbreak. If $R_0 < 1$, the number of infected computers will decrease to zero over time. If $R_0 > 1$, the malware will persist. Hence, we have to figure out how to reduce $R_0$ below one by controlling system parameters. By calculating the partial derivatives of $R_0$, we find that $\partial R_0 / \partial \beta > 0$ and $\partial R_0 / \partial \alpha > 0$, whereas $\partial R_0 / \partial \varphi < 0$, $\partial R_0 / \partial \mu < 0$, and $\partial R_0 / \partial \eta < 0$, which gives the relationship between $R_0$ and these parameters. $R_0$ increases with β and α, but decreases as φ, µ, or η increases. Furthermore, we need the sensitivity of $R_0$ to the different parameters.
Sensitivity analysis is a method that can be used to study the sensitivity of $R_0$ to system parameters. The estimation of sensitive parameters should be performed with caution because even slight changes in such a parameter can result in significant quantitative changes. Therefore, it is important to find out which parameters have a high impact on $R_0$ through sensitivity analysis.
Definition 1. The normalized forward sensitivity index of the variable $R_0$, whose value depends on a parameter $x_i$, is defined by (see [51,52]):
$$\Upsilon_{x_i}^{R_0} = \frac{\partial R_0}{\partial x_i} \times \frac{x_i}{R_0}.$$
Applying this definition to β, α, η, µ, and φ yields the five sensitivity indices in Equations (7)-(11). A conclusion can be drawn from these five equations: an increase in β or α causes $R_0$ to increase, while an increase in η, µ, or φ leads to a decrease in $R_0$. We set up five groups of parameters with different values in Table 3 to evaluate the sensitivity indices of $R_0$. Evaluating Equations (7)-(11) with these parameter values, we obtain Table 4.
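The normalized forward sensitivity index can also be approximated numerically, which is a convenient cross-check on closed-form indices. The Python sketch below uses a central finite difference for the partial derivative. The closed form of $R_0$ in `basic_reproduction_number` is a hypothetical stand-in consistent with the stated signs of the partial derivatives, not Equation (3), and the parameter values are illustrative rather than those of Table 3.

```python
def basic_reproduction_number(p):
    """Hypothetical closed form for R0 (an assumption, not Equation (3))."""
    return p["beta"] * p["alpha"] / (
        (p["alpha"] + p["mu"]) * (p["phi"] + p["eta"] + p["mu"])
    )


def sensitivity_index(r0_func, p, name, rel_step=1e-6):
    """Normalized forward sensitivity index (dR0/dx) * (x / R0) for parameter `name`."""
    h = p[name] * rel_step
    up = dict(p, **{name: p[name] + h})
    down = dict(p, **{name: p[name] - h})
    derivative = (r0_func(up) - r0_func(down)) / (2 * h)  # central difference
    return derivative * p[name] / r0_func(p)


# Hypothetical parameter values (not those of Table 3).
params = {"beta": 0.3, "alpha": 0.2, "phi": 0.2, "eta": 0.1, "mu": 0.05}
indices = {
    name: sensitivity_index(basic_reproduction_number, params, name)
    for name in params
}
for name, value in indices.items():
    print(f"index of R0 with respect to {name}: {value:+.4f}")
```

An index of +1 for a parameter means that a 1% increase in that parameter raises $R_0$ by about 1%; negative indices identify the parameters whose increase suppresses propagation.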

Conclusions
In this paper, we first briefly introduce malware and some of the damage it causes. Then, we describe some characteristics of conformity psychology, paving the way for studying how it affects malware transmission. A deterministic nonlinear SEIQR model considering the conformity psychology in social networks is designed. We calculate the basic reproduction number $R_0$ and investigate the stability of the two equilibria. There is only one malware-free equilibrium, $E_0$, which is locally and globally asymptotically stable when $R_0 < 1$, and one endemic equilibrium, which is locally asymptotically stable when $R_0 > 1$. The simulation results show that conformity psychology plays a great role in preventing the spread of malware. From the sensitivity analysis, we conclude that the basic reproduction number is most sensitive to the parameters β and µ. Our future work will research more effective methods to control the spread of malware. We are considering complex networks to simulate malware propagation and solve the problems involved.

Figure 1. The transfer diagram of the model.

Figure 4. The density of infected nodes over time under different values of $m_E$, $m_I$, and $m_Q$ for $R_0 < 1$ and $R_0 > 1$.

Figure 5. The impact of the quarantine rate φ on the infected nodes with respect to time.

Figure 6 demonstrates the progression of system (1) under the initial numbers of Example 1 and the parameter values in Table 1.

Figure 6. Trends of the numbers of different nodes for the Reddit dataset.

Table 2. The initial parameter values in our simulations.

Algorithm 1: The state transformation of computers on the Internet
Input: the network G = (v, e), which is given by the dataset, and the original number of nodes per state
Output: the number of nodes in each state at time t*
1 some description;
2 for t = 0 to t* with the step of 1 do
3 new nodes will be born with probability µ and all in state S
...
13 else
14 it will turn to R with probability α or its state remains unchanged;
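A node-level simulation in the spirit of Algorithm 1 can be sketched in Python as follows. This is a sketch under stated assumptions rather than a reproduction of Algorithm 1: the per-step transition rules, the use of a small random graph instead of the Reddit or Twitter datasets, and all names (`random_graph`, `step`, `run`, `PARAMS`) and parameter values are hypothetical.

```python
import math
import random

# Hypothetical per-step probabilities (not those of Table 2).
PARAMS = {"beta": 0.3, "alpha": 0.2, "phi": 0.2, "eta": 0.1, "gamma": 0.1,
          "mu": 0.01, "m_E": 1.0, "m_I": 2.0, "m_Q": 2.0}


def random_graph(n, p_edge, rng):
    """Small undirected random graph as an adjacency dict (stand-in for a dataset)."""
    neighbors = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p_edge:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def step(states, neighbors, p, rng):
    """Advance every node by one time step under the assumed transition rules."""
    n = len(states)
    counts = {s: 0 for s in "SEIQR"}
    for s in states.values():
        counts[s] += 1
    # Conformity factor computed from the current densities of E, I, and Q.
    m_e = math.exp(-(p["m_E"] * counts["E"] + p["m_I"] * counts["I"]
                     + p["m_Q"] * counts["Q"]) / n)
    new_states = {}
    for v, s in states.items():
        if rng.random() < p["mu"]:
            new_states[v] = "S"  # node retires; its replacement is born susceptible
            continue
        draw = rng.random()
        if s == "S":
            infected = sum(1 for u in neighbors[v] if states[u] == "I")
            p_exposed = 1.0 - (1.0 - p["beta"] * m_e) ** infected
            new_states[v] = "E" if draw < p_exposed else "S"
        elif s == "E":
            new_states[v] = "I" if draw < p["alpha"] else "E"
        elif s == "I":
            if draw < p["phi"]:
                new_states[v] = "Q"
            elif draw < p["phi"] + p["eta"]:
                new_states[v] = "R"
            else:
                new_states[v] = "I"
        elif s == "Q":
            new_states[v] = "R" if draw < p["gamma"] else "Q"
        else:
            new_states[v] = "R"
    return new_states


def run(n=200, steps=50, seed=1):
    """Simulate and return the per-step counts of nodes in each state."""
    rng = random.Random(seed)
    neighbors = random_graph(n, 0.05, rng)
    states = {v: ("I" if v < 10 else "S") for v in range(n)}
    history = []
    for _ in range(steps):
        states = step(states, neighbors, PARAMS, rng)
        history.append({s: list(states.values()).count(s) for s in "SEIQR"})
    return history


history = run()
print("final counts:", history[-1])
```

Replacing a retired node with a new susceptible one keeps the population size fixed, which matches the assumption that the recruitment rate equals the mortality rate.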

Table 3. Five groups of parameters with different values.

Table 4. The sensitivity indices of $R_0$.
The sensitivity indices in Table 4 show that $R_0$ is most sensitive to β in Cases 1, 3, 4, and 5, and to µ in Case 2.