Optimization of Big Data Scheduling in Social Networks

In social network big data scheduling, it is easy for target data to conflict in the same data node. Of the different kinds of entropy measures, this paper focuses on the optimization of target entropy. We therefore present an optimized method for the scheduling of big data in social networks, which takes into account each task's amount of data communication during target data transmission to construct a big data scheduling model. Firstly, the task scheduling model is constructed to solve the problem of conflicting target data in the same data node. Next, the necessary conditions for the scheduling of tasks are analyzed. Then, the aperiodic task distribution function is calculated. Finally, for each task the product of the corresponding resource level and the minimum execution time is calculated and tasks are scheduled by minimizing this product. Experimental results show that our optimized scheduling model quickly optimizes the scheduling of social network data and solves the problem of strong data collision.


Introduction
Social networking services delivered via the Internet have come to permeate every corner of people's lives [1], including business, academia, entertainment and dating, and their commercial and academic value are gradually increasing. With the development of "Web 2.0" technologies, the amount of data on the Internet has grown explosively [2,3]. However, more of this data is unstructured or semi-structured, rather than structured [4][5][6]. Because of this explosion of data, data sources are no longer singular and such data requires processing to yield meaningful insights [7,8]. Target data can be obtained from massive amounts of social network data using a reasonable scheduling method [9,10], so the problem of realizing accurate and efficient scheduling of massive data in social networks is one that requires analysis. This issue has also received the attention of relevant technical personnel. We have also seen optimization efforts for cloud systems; Persico et al. propose a reproducible methodology to assess performance over cloud platforms [11,12]. Building on the breadth of the research thus far, we propose a new, optimized method for big data scheduling in social networks to provide a theoretical basis for the field of social network data management.

Entropy in Social Networks
The robustness of any defined system can be found by calculating its entropy, which is defined as the system's degree of disorder or, more specifically, the number of microstates the system possesses. The number of these states enters through the well-known Boltzmann equation given in Equation (1):

S = k_B ln W, (1)

where S is the entropy, k_B is Boltzmann's constant and W is the number of microstates. Using statistical mechanics, an expression of entropy in terms of a probability distribution is more plausible and practically more versatile, as most statistical problems, such as the network problem, lead to a distribution function over the microstates rather than an actual count. Entropy is described through the probability distribution in Equation (2):

S = −k_B Σ_k P(k) ln P(k), (2)
where P(k) is the probability that the system is in state k. Besides basic entropy, there are three forms of calculated entropy in an information transfer domain: (i) search entropy, (ii) access entropy and (iii) target entropy. In this paper, we focus on access and target entropy. The interpretation of "entropy" as used in social network analysis is controversial. One interpretation of entropy, the degree of robustness of the network, leads to a profound philosophical question: what does "robustness" mean or refer to in a social network? "Robustness", for social networks, implies that social links are dynamic and allowed to change with and without restrictions or constraints. This is applicable to all social networks. Some social networks are more dynamic, as the restrictions on changes in social links are either minimal or non-existent, such as those of Facebook friends or LinkedIn connections. On the other hand, there also exist constrained social networks where social links are rigid and cannot change, such as in the real-life social networks of close families and organized crime syndicates. These links are almost completely rigid and unchanging over time; consequently, social networks built on them possess a small number of possible configurations (shapes). In summary, social networks can be classified into one of two extremes: unconstrained, very robust social networks, which have very high entropy, and constrained, very non-robust social networks, which have very low entropy. A network is said to be in equilibrium, a state with no driving-force potential for the network to change its configuration, when it has the maximum entropy for a given number of nodes. In this work, we focus on unconstrained robust social networks and try to optimize their target entropies using mass data scheduling techniques.
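As a minimal illustration of Equation (2), the following Python sketch (the function name `shannon_entropy` is our own, not from the paper, and k_B is normalized to 1) computes the entropy of a state probability distribution. A constrained network that concentrates probability on a few configurations yields a lower entropy than one that spreads probability over many configurations.

```python
import math

def shannon_entropy(probabilities, k_b=1.0):
    """Entropy from a state probability distribution,
    S = -k_b * sum(P(k) * ln P(k)), as in Equation (2)."""
    return -k_b * sum(p * math.log(p) for p in probabilities if p > 0)

# A rigid (constrained) network concentrates probability on few
# configurations; a robust (unconstrained) one spreads it evenly.
rigid = [0.97, 0.01, 0.01, 0.01]
robust = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(rigid) < shannon_entropy(robust))  # True
```

The uniform distribution attains the maximum entropy ln W for W states, matching Equation (1) at equilibrium.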

Partition Calculation for Scheduling Task Volume
First, the amount of massive data scheduling for social networks is calculated and expanded in a partitioned computing manner. The overall task of scheduling big data in a social network is divided into several subtasks, defined as u = [u(0), u(1), ..., u(e − 1)]_k, and the condition a ~ g(u, µ) is satisfied for the arrangement state. Next, the total number of scheduled tasks is calculated by Equation (3), where H(μ̂, µ) represents the cost function of the task, g(µ|u) represents the distribution probability of the subtask and ε is the set of all scheduling subtasks, which is calculated by Equation (4), where S is the upper limit on the amount of tasks. The corresponding cost function is calculated by Equation (6), which is obtained by substituting Equation (5), where μ̂ represents the average time when the data is scheduled. Assuming the scheduling time satisfies the condition ∑_µ μ̂(µ|u) = 1, the estimation function is defined in Equation (7):

H(μ̂, µ) = |μ̂ − µ|. (7)

Then, Equation (7) is substituted into the task amount calculation formula to obtain Equation (8). Finally, the number of subtasks in scheduling is found by Equation (9).
The above method yields the subtasks of the massive data scheduling task for a social network; the massive data scheduling model of a social network is then constructed on this data foundation.
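The estimation function of Equation (7) can be sketched directly. The snippet below (the function name `subtask_costs` is our own illustration, not from the paper) takes the scheduling times of the subtasks, computes the average scheduling time μ̂ and evaluates the cost H(μ̂, µ) = |μ̂ − µ| for each subtask.

```python
def subtask_costs(schedule_times):
    """Cost of each subtask via the estimation function
    H(mu_hat, mu) = |mu_hat - mu| (Equation (7)), where mu_hat
    is the average scheduling time over all subtasks."""
    mu_hat = sum(schedule_times) / len(schedule_times)
    return [abs(mu_hat - mu) for mu in schedule_times]

costs = subtask_costs([2.0, 4.0, 6.0])  # mu_hat = 4.0
print(costs)  # [2.0, 0.0, 2.0]
```

Subtasks whose scheduling time sits near the average incur near-zero cost, which is what drives the subtask count in Equations (8) and (9).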

Construction of Massive Data Scheduling Model for Social Networks
During the process of scheduling big data in social networks, the existence of a large amount of interfering information and the randomness of data nodes lead to conflicts in data nodes of the scheduling model, which seriously affects accuracy and efficiency [19][20][21]. Therefore, the task quantity of big data communication is calculated based on the transmission state of target data in the server. In addition, an optimized big data scheduling model is constructed to schedule massive data in a social network to avoid target data conflicting in the same data node.
z is defined to indicate the amount of data sent by data nodes per unit of time and γ indicates the fuzzy value of transmission efficiency for data nodes, where γ ∈ [0, 1]. The calculation method of the data efficiency increment coefficient of the data node for a scheduling model is as follows.
A mapping relationship between the massive data scheduling task of a social network and data nodes is constructed based on the above equation. γ = {γ_1, γ_2, ..., γ_q} is defined to represent a data set consisting of all scheduled task quantities in the mapping relationship, where γ_i represents the data sent by the i-th data node. The target data that data node T is requested to send is defined as j_i, where j = {λ_1 j_1, λ_2 j_2, ..., λ_q j_q} and the condition U_i, j_i ∈ [0, 1] is satisfied. The priority level of big data scheduling is λ_i. During execution of data scheduling tasks, there is a constraint on the amount of data sent by data nodes [22] and the data volume is calculated as shown in Equation (11).
where ϕ indicates the efficiency of scheduling tasks. The scheduling tasks are efficient when the correlation between γ and j is small.
The massive data scheduling model for social networks is constructed in Equation (13). The model takes into account each task's amount of data communication during transmission of target data. Because the big data scheduling model is constructed according to the quantity of tasks, it effectively solves the problem of target data conflicting in the same data node.
In big data classification optimization scheduling, it is assumed that the task interval of periodic tasks is A, understood as the total time taken to complete the current instance and the next instance of a classification optimization scheduling task [23]. If there are multiple tasks and they are selectively executed, the periodic function selects the task with the smallest task cycle to perform classification optimization scheduling first. If there are m tasks in a periodic task set, the constraints of scalable scheduling for big data cycle tasks are obtained by Equations (14)-(17), where B_i(A) represents the weight of the task interval A, the periodic big data classification optimization scheduling task set is denoted q_i and b_i denotes its optimization scheduling period. Periodic task classification optimization is subject to the scheduling constraint D ≤ 1. In the classification and optimization of big data scheduling, some data does not arrive as periodic tasks, so the randomness and contingency of its activities cannot be studied comprehensively and the stopping time of each such task is difficult to determine [24,25]. Therefore, we use stochastic process theory to describe the occurrence process of aperiodic tasks and to calculate a distribution function and mathematical expectation that do not rely on the tasks being periodic.
In big data classification optimization scheduling, aperiodic tasks are modeled by a Poisson distribution with parameter φA over the interval [0, A]. The likelihood function D(y_1, y_2, ..., y_m, φ) is obtained by Equation (18), where the mathematical expectation over the interval [0, A] is set to φA; φ is thus understood as the expected number of occurrences per unit time. y represents a variable of classification optimization scheduling and d represents a scheduling time factor. Construction of the scheduling model is completed by Equation (19). The model considers the task amount of data communication when target data is transmitted [26], which effectively solves the problem of target data conflicts in the same data node.
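The Poisson model for aperiodic arrivals can be sketched as follows. This is our own illustration under the stated assumptions (the function names are hypothetical): given observed task counts y_1, ..., y_m over intervals of length A, the log-likelihood under a Poisson model with mean φA is evaluated and φ is estimated by its closed-form maximum-likelihood value, the sample mean count divided by A.

```python
import math

def poisson_log_likelihood(counts, phi, A):
    """Log-likelihood of aperiodic task counts y_1..y_m, each observed
    over an interval [0, A], under a Poisson model with mean phi * A
    (cf. the likelihood function of Equation (18))."""
    lam = phi * A
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def phi_mle(counts, A):
    """Maximum-likelihood arrival rate: the sample mean count divided by A."""
    return sum(counts) / (len(counts) * A)

counts = [3, 5, 4, 4]          # observed aperiodic arrivals per interval
print(phi_mle(counts, A=2.0))  # 2.0 arrivals per unit time
```

Any other candidate rate yields a lower log-likelihood than the MLE, which is the sense in which the estimate does not rely on the tasks being periodic.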

Analysis of Schedulable Conditions for Massive Data Classification Tasks in Social Networks
According to the above calculation, when big data classification optimization scheduling is performed, task A is a periodic task with completion time c_A, preparation time e_A, running time r_A and stop time w_A. E is an aperiodic task with completion time c_E and preparation time e_E. The scheduling stop time for aperiodic tasks is given by w.
In big data classification optimization scheduling, we define n to be the number of processors and m to be the number of periodic tasks. d_m indicates the execution time of classification optimization scheduling and A_m indicates the stop time of task classification optimization scheduling. g is the number of aperiodic tasks, d_Sg is the task execution time and A_Sg is the average stop time of the task. From these definitions, the schedulable constraint for big data classification tasks is obtained in Equation (20).
In the optimization scheduling of big data classification, the main ideas for the equalization of data tasks are as follows. When the processors are homogeneous, the number of classification optimization scheduling task types is set to N, where N ≥ 3. In addition, each task A_i belongs to a task type j, where j ≤ N. The following relationship is then obtained in Equation (21).
When big data scheduling performs classification optimization, the type of task A_i is j. Based on the above calculations, the classification of massive data scheduling is optimized. The detailed process is as follows. Firstly, the constraint conditions of scalable scheduling for periodic tasks and the distribution function of periodic tasks are analyzed. Then, a massive data classification optimization scheduling model for social networks is established to lay the foundation for later optimization.
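A minimal sketch of the schedulability constraint D ≤ 1 is given below. It is our own simplification, not the paper's exact Equation (20): each periodic task is a pair (d_i, A_i) of execution time and period, and the summed utilization must not exceed the available processor capacity.

```python
def is_schedulable(tasks, n_processors=1):
    """Utilization-style check for the periodic-task constraint D <= 1:
    the summed ratio of execution time d_i to period A_i must not
    exceed processor capacity. Task tuples are (d_i, A_i)."""
    utilization = sum(d / A for d, A in tasks)
    return utilization <= n_processors

# Three periodic tasks with utilizations 0.25 + 0.25 + 0.5 = 1.0
print(is_schedulable([(1.0, 4.0), (2.0, 8.0), (1.0, 2.0)]))  # True
```

A task set whose utilization exceeds capacity fails the check and cannot be admitted to classification optimization scheduling.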

Big Data Classification and Optimization Scheduling Method in Social Networks
Based on the model in the previous section, when big data scheduling performs classification optimization, the execution value of each task on all social network resources is calculated; that is, for each task, the product of the corresponding resource level and the minimum execution time of the scheduling task is computed [27,28]. Then, the minimum value of this product is obtained and the optimization scheduling of massive data classification in a social network is completed.
It is assumed that the set of m classification optimization scheduling tasks is X = {x_1, x_2, ..., x_m} and the set of n social network resources is Y = {y_1, y_2, ..., y_n}. We then iterate through the following steps until all the sets are empty:
1. In classification optimization of big data scheduling, the optimal minimum time min_time(x_i) over y_1, y_2, ..., y_n is calculated by Equation (22):

A_min(i) = Min(min_time(x_i, y_1), min_time(x_i, y_2), ..., min_time(x_i, y_n)), (22)

where A_min(i) represents the minimum time consumption of big data classification optimization scheduling.
2. When classifying and optimizing big data, Equation (23) is used to obtain the two-dimensional array com_sx[i, j].
3. When big data classification optimization scheduling is performed, com_sx[i, j] is sorted to obtain the minimum com_sx[i, j].
4. x_i is dispatched to y_j during big data classification optimization scheduling.
The principle of big data scheduling optimization in social networks is described using the above calculations, completing the optimization of big data scheduling.
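The steps above can be sketched as a simplified dispatch loop. This is our own illustration under stated assumptions (the names `exec_time`, `resource_level` and `min_product_schedule` are hypothetical, and the sketch dispatches each task independently rather than reproducing the paper's full iterative removal from X and Y): each task x_i is assigned to the resource y_j minimizing the product of resource level and execution time, playing the role of com_sx[i, j].

```python
def min_product_schedule(exec_time, resource_level):
    """Dispatch each task x_i to the resource y_j that minimizes the
    product of resource level and execution time (com_sx[i][j]).
    exec_time[i][j] is the time of task i on resource j;
    resource_level[j] is the level of resource j."""
    assignments = {}
    for i, row in enumerate(exec_time):
        com = [t * resource_level[j] for j, t in enumerate(row)]
        # pick the resource index with the smallest product
        assignments[i] = min(range(len(com)), key=com.__getitem__)
    return assignments

exec_time = [[4.0, 2.0, 6.0],
             [3.0, 5.0, 1.0]]
resource_level = [1.0, 2.0, 0.5]
print(min_product_schedule(exec_time, resource_level))  # {0: 2, 1: 2}
```

Here task 0 has products [4.0, 4.0, 3.0] and task 1 has [3.0, 10.0, 0.5], so both are dispatched to the lowest-product resource y_2.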

Results
To verify the effectiveness and superiority of our proposed method, we carry out optimization analysis of big social network data scheduling using the ns-2.34 network simulation platform.

Throughput Analysis
Our simulation includes three network interfaces, referred to as path A, path B and path C, with bandwidths of 0.35 MB/s, 0.65 MB/s and 0.95 MB/s, respectively; the sender's application-layer data bandwidth is 1.45 MB/s.
In the process of social network big data scheduling, we define throughput to be the amount of data successfully transmitted to facilities such as networks and ports; higher throughput indicates better performance. The average throughput of each path without the proposed method is shown in Figure 1. Figure 1 shows that the throughputs of the three paths increase at the start of the test; from 60 to 120 s, the throughputs of the three paths decrease linearly.
The average data throughput of each path in the social network after scheduling is optimized by this method is shown in Figure 2. What we can clearly see when comparing Figure 1 to Figure 2 is that throughput increases across the board. Furthermore, the linear decrease evident later in time in Figure 1 is eliminated using our model and we see a more constant throughput as time increases, which was an expected result of the optimization. Figure 2 shows that the throughputs of the three paths are stable and their data throughputs are virtually equal to their bandwidths from about 60 s onwards.

Analysis of Transmission Efficiency
The available bandwidth and delay parameters of each path change rapidly when scheduling big data in a social network. A massive data scheduling model for social networks should therefore be adaptable and exhibit excellent data scheduling performance under different network conditions. Accordingly, the efficiency and bandwidth consumption of data transmitted by each path, from the perspective of changes in bandwidth, are analyzed before and after the data scheduling model is constructed using this method. The bandwidth consumption of each data transmission path is shown in Table 1 and the time taken by each path to transmit data is shown in Table 2. Table 1 shows that the transmission bandwidth of path C is greater than that of path B, which in turn is greater than that of path A. Under these bandwidths, the data transmission times of the three paths are compared, with results shown in Table 2. Table 2 shows that, after using the method in this paper, the data transmission times of paths A, B and C are stable even though the path bandwidths are constantly changing; paths A, B and C transmit data in approximately 9.3 s, 8.4 s and 8.4 s, respectively.

Performance Comparison
To verify the superiority of this method, its performance in big social network data scheduling optimization experiments is compared against a dynamic organization scheduling method and a greedy algorithm-based scheduling method. These three methods are compared in terms of their performance by the following indicators:

1. Task response time for the three methods. This can be understood as the reaction time between when a massive data scheduling task in a social network starts and when it stops.
2. Overall task completion time for the three methods. This can be understood as the time from when the first task starts until the last task stops during big data scheduling optimization.
3. Efficiency reduction ratio of the three methods. This can be understood as a comparison of the response times and actual completion times of the three methods.
4. Optimized social network resource usage rate. This can be understood as a comparison between the effective sharing of social network resources and the maximum utilization after optimization by the three methods.
5. Comprehensiveness. Different amounts of target data are set, the scheduling optimization of the three methods is compared with the actual number and the comprehensiveness of scheduling optimization is analyzed for the three methods.
6. Balance. This shows the balance of data nodes.
7. Frequency normalized value. This is used to evaluate the stability of a scheduling optimization method. The smaller the numerical fluctuation, the stronger the stability of a scheduling optimization method.
The results for scheduling optimization task response time for each of the three methods are shown in Figure 3. Figure 4 shows that the overall completion time of the classification optimization scheduling task for the proposed method is less than 200 ms. The overall completion times for the classification optimization scheduling task under the dynamic organization scheduling method and the greedy algorithm-based scheduling method increase with the number of tasks; their overall task completion times show a gradual increase and are greater than that of the proposed method.
The efficiency reduction ratios of the three scheduling optimization methods are compared in Figure 5, which shows that the efficiency reduction ratio of the proposed method is lower than those of the dynamic organization scheduling method and the greedy algorithm-based scheduling method. The social network resource utilization rates after scheduling optimization by the three methods are compared in Figure 6. Figure 6 shows that the optimized resource usage rate of the proposed method is as high as 100%, which is greater than the resource usage rates of both the dynamic organization scheduling method and the greedy algorithm-based scheduling method.
The comprehensiveness of the scheduling optimization of the three methods is shown in Table 3. Table 3 shows that, with the proposed method, the maximum difference between the target data volume and the actual data volume is 1, so its error is small. The maximum difference between the target and actual data volumes for the other two methods is greater than that of the proposed method.
The balance of massive data in social networks after scheduling optimization by the three methods is compared in Figure 7. Figure 7 shows that there is a large gap in the balance of big data in social networks across the three methods. After optimization by the proposed method, the balance of massive data in a social network is as high as 0.99, which indicates that there is only a 0.01 probability of conflict between data. The other two methods only achieve balance degrees not greater than 0.6.
The distribution of normalized frequency values when the three methods optimize massive data of a social network is shown in Figure 8. Figure 8 shows that the frequency of the scheduling model optimized by the proposed method lies between 0.5 and 0.7 and its fluctuation is small. Meanwhile, the frequency of the social network data scheduling model optimized by the dynamic organization scheduling method fluctuates between 0.2 and 0.9, a large range, and the frequency under the greedy algorithm-based scheduling method fluctuates between 0.28 and 0.95, an even higher range. Therefore, the social network massive data scheduling model optimized by the proposed method has strong stability and can be used for big data scheduling and optimization in actual social networks.

Discussion
The results of the three path throughput tests before using the optimization method in this paper are analyzed first. Experimental results show that the throughputs of the three paths increase at the initial stage of the test but decrease linearly from 60 s to 120 s. Many out-of-order packets exist in the network receiving buffer, occupying a large amount of receiving buffer space and causing a linear reduction in the amount of transmitted data. After optimization, the throughputs of the three paths are stable from about 60 s onwards and the throughput is basically equal to the bandwidth. This shows that the maximum possible throughput is achieved after optimization by this method and that bandwidth utilization efficiency is superior.
The three paths satisfy the condition that path C's transmission bandwidth > path B's transmission bandwidth > path A's transmission bandwidth. Under this constraint, the data transmission time of each of the three paths is compared. After optimization using the proposed method, although the path bandwidth is constantly changing, the data transmission times for the three paths are basically stable: path A takes approximately 9.3 s, path B approximately 8.4 s and path C approximately 8.4 s to transmit data. This shows that the massive data scheduling model of a social network optimized by this method adapts to changes in bandwidth; its data scheduling time is stable, demonstrating superior data transmission efficiency.
We compare the performance of the three methods in detail:
1. The three methods differ greatly in scheduling response time. The task response time of the proposed method increases with the number of tasks and its maximum response time is 20 ms. The maximum response times of the dynamic organization scheduling method and the greedy algorithm-based scheduling method are 48 ms and 64 ms, respectively. The task response time of the proposed method is the shortest and its tasks are executed quickly.
2. The overall completion time of each task under the proposed method is less than 200 ms. Tasks scheduled under the dynamic organization scheduling method and the greedy algorithm-based scheduling method take more time as the number of tasks increases. Their task completion times also gradually increase and exceed those of the proposed method. The overall task completion time of the proposed method is the shortest of the three.
3. The efficiency reduction ratio of the proposed method is lower than those of the dynamic organization scheduling method and the greedy algorithm-based scheduling method. The proposed method maintains the same efficiency and high stability throughout.
4. The optimized social network resource usage rate under the proposed method is as high as 100%, which is greater than the resource usage rates of the dynamic organization scheduling method and the greedy algorithm-based scheduling method. This demonstrates that there is no redundant data in big data optimized by the proposed method and its availability is high.
5. When the proposed method optimizes big data scheduling in social networks, the maximum difference between the target data volume and the actual data volume is one and its error is small. For the other two methods, the maximum difference between the target data volume and the actual data volume is greater than that of the proposed method. The optimal scheduling of the proposed method is therefore more comprehensive.
6. There is a big gap in the balance of big data in social networks after the three methods are used. The balance after scheduling optimization by the proposed method is as high as 0.99, indicating that the probability of conflict between data is only 0.01. After scheduling optimization by the other two methods, the balance of big data in social networks is not more than 0.6; the conflict between data is large and hinders the scheduling optimization process.
7. The frequency of the scheduling model optimized by the proposed method is between 0.5 and 0.7 and its fluctuation is small. It has the advantage of high stability compared with the other two methods.
In summary, the scheduling optimization effect of this method is significantly better than those of the dynamic organization scheduling method and the greedy algorithm-based scheduling method. This is because the schedulability constraints for big data cycle tasks in social networks [29,30] are analyzed and the distribution function of aperiodic tasks is computed using our proposed method. On this basis, the task processor is prompted to follow classification criteria for each task and a flexible transformation of task assignments is implemented. The product of each scheduled task's resource level and minimum execution time is calculated; the optimization of big data scheduling in social networks is then completed by minimizing this product [31][32][33].

Conclusions
Big data scheduling models of social networks are optimized by our proposed method. The optimization process is as follows: firstly, a big data scheduling model is built; secondly, the model is optimized to realize optimal scheduling. The social network big data scheduling model constructed in this paper considers the task amount of data communication during target data transmission. Constructing the big data scheduling model based on the amount of tasks effectively solves the problem of conflicting target data in the same data node. Before the method optimizes big data, the necessary conditions for the schedulability of periodic tasks in social network big data and the calculation of the aperiodic task distribution function are analyzed. Experimental results show that the proposed method performs better than a dynamic organization scheduling method and a greedy algorithm-based scheduling method. It adapts to different network environments, solves the problem of load balancing in a short time and completes data scheduling and transmission quickly. It provides an effective means for reasonable and efficient scheduling of big data in social networks.