Cloud Storage Strategy of Blockchain Based on Genetic Prediction Dynamic Files

Abstract: With the rapid expansion of data volume, traditional data storage methods can no longer meet the practical requirements of blockchain cloud storage. To address the cloud storage problem of blockchain, a new storage access method based on predicting dynamic file load is proposed. The load of each blockchain data node at the next moment is first estimated by predicting the load status of cloud storage files in advance. A hierarchical genetic algorithm is used to construct the connection weights between the hidden layer and the output layer, which makes the network converge faster and more accurately, thereby effectively predicting the node load. In addition, based on the file allocation, an evaluation analysis model is constructed to obtain the time response capability of each file during the allocation process. The node's periodic load prediction value is used to calculate the corresponding weight of the node, which is continuously updated while retaining the advantages of the static weighted polling algorithm. Combined with the genetic algorithm, which helps predict the later load of each node, the file assignment access strategy can meet the system requirements under complex load conditions and provides a reasonable and effective cloud storage method. Experimental evaluation of the proposed strategy and algorithm shows that the new storage method has a faster response time, a more balanced load, and greatly reduced energy consumption.


Introduction
Blockchain is regarded as a public ledger in which all committed transactions are stored in a chain of blocks. This chain grows continuously as new blocks are appended to it. Blockchain technology has key characteristics such as decentralization, persistency, anonymity, and auditability. It can work in a decentralized environment by integrating several core technologies, such as cryptographic hashes, digital signatures (based on asymmetric cryptography), and a distributed consensus mechanism. With blockchain technology, a transaction can take place in a decentralized fashion [1][2][3][4][5][6]. During the storage phase of blockchain, the problem to be solved is how to use redundant configuration, distribution, and cloud computing technologies to classify the blockchain data according to certain rules, reduce the required storage capacity through filtering and de-duplication, and add metadata that is easy to retrieve later, so as to achieve low cost, low energy consumption, and high reliability. Cloud storage is developed on the basis of cloud computing technology and regards data storage and management as its core task. It can be built with related software on top of cluster applications, grid technology, and distributed file systems.
Cloud storage is based on distributed network technology and offers an efficient, low-cost data storage method for blockchain. Users do not need to consider complicated underlying technical details such as storage capacity, storage devices, storage locations, and data characteristics; they obtain near-infinite storage space and enterprise-class service quality from their service provider. As shown in Figure 1, the cloud data center is the core module of cloud storage services. It uses distributed technology and parallel databases to store various types of data and, based on the cloud service level agreement, provides professional resource services to cloud users through interfaces. The cloud client is the medium through which the user interacts with the system: the user customizes the service through the browser and operates the service within the granted authority. The cloud service interface manages users' authorization, authentication, and login, as well as the available computing resources and services. It accepts user requests, forwards them to the corresponding programs, and dynamically allocates, schedules, and recycles resources.
Under the cloud platform, the data are distributed over a large number of nodes in the form of files. The file layout method and its quality strongly affect the storage access performance of blockchain on the cloud platform system [22,23]. To analyze the dynamic file assignment problem of blockchain data, the system is described by a mathematical model that abstracts files and nodes into two independent sets, so that file processing under the cloud platform is transformed into a mapping problem between these two sets. This makes it possible to analyze and solve the response speed of blockchain storage access and the load balancing problem of the cloud storage system.
The load balancing algorithm distributes requested tasks among multiple server nodes so that the load of each node is roughly balanced. Therefore, the quality of the algorithm directly affects the overall system performance. Load balancing strategies are generally applied in two stages. In the first, a balancing algorithm is called when a request task arrives and assigns the task to an appropriate node. In the second, the algorithm is started when a node becomes overloaded during operation, and tasks on heavily loaded nodes are transferred to lightly loaded nodes for processing.
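The two stages described above can be illustrated with a minimal Python sketch. It is not the strategy proposed in this paper: the least-loaded placement rule, the overload `threshold`, and the function names `assign_on_arrival` and `rebalance` are assumptions chosen only to make the two stages concrete.

```python
def assign_on_arrival(loads, task_cost):
    """Stage 1: place an arriving task on the currently least-loaded node."""
    target = min(range(len(loads)), key=lambda j: loads[j])
    loads[target] += task_cost
    return target


def rebalance(node_tasks, threshold):
    """Stage 2: move tasks off nodes whose total load exceeds `threshold`.

    `node_tasks` is a list of per-node task-cost lists and is modified in place.
    Returns the migrations performed as (source, destination, cost) tuples.
    """
    migrations = []
    for src, tasks in enumerate(node_tasks):
        while tasks and sum(tasks) > threshold:
            # Hypothetical policy: send the smallest task to the lightest node.
            dst = min(range(len(node_tasks)), key=lambda j: sum(node_tasks[j]))
            cost = min(tasks)
            # Stop when a move would no longer improve the balance.
            if dst == src or sum(node_tasks[dst]) + cost >= sum(tasks):
                break
            tasks.remove(cost)
            node_tasks[dst].append(cost)
            migrations.append((src, dst, cost))
    return migrations


loads = [3, 1, 2]
chosen = assign_on_arrival(loads, 2)      # node 1 is the lightest, so it receives the task

cluster = [[4, 3, 2], [1], []]
moves = rebalance(cluster, threshold=5)   # node 0 is overloaded and sheds two tasks
```

In this toy run the overloaded node gives up its two smallest tasks, after which no node exceeds the threshold.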
Under the blockchain cloud storage platform, the set of data nodes can be expressed as
$$D = \{D_1, D_2, \ldots, D_m\},$$
where $D_j$ represents node $j$ and $m$ is the number of nodes.
The set of files can be expressed as
$$F = \{f_1, f_2, \ldots, f_n\},$$
where $f_i$ is file $i$ and $n$ is the number of files.
According to the file set and the data node set, the mapping relationship between the two is established. It can be expressed as a matrix $A$ whose entry $a_{ij}$ indicates whether file $f_i$ is stored on node $D_j$:
$$a_{ij} = \begin{cases} 1, & f_i \text{ is stored on } D_j, \\ 0, & \text{otherwise.} \end{cases}$$
We normalize the column vectors of matrix $A$, expand $A_{ij}$ by rows, and then obtain the maximum eigenvalue of $A$. In addition, because $f_i$ is stored only in $D_j$,
$$\sum_{j=1}^{m} a_{ij} = 1, \quad i = 1, 2, \ldots, n.$$
A single file can also be defined by a tuple of its attributes, and any data node can be defined as
$$D_j = (c_j, tr_j, l_j),$$
where $c_j$, $tr_j$, and $l_j$ represent the capacity, read rate, and load of $D_j$, respectively.
Suppose the system request set is
$$R = \{r_1, r_2, \ldots, r_{|R|}\},$$
where $r_k$ is request $k$ and $|R|$ is the number of requests. The response time of the cloud storage can then be obtained from the per-request response times, where $rt_k$ is the response time of $r_k$.
The node load is then calculated from these quantities.
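As a small illustration of the file-to-node mapping model in this section, the following Python sketch builds a 0/1 mapping matrix from an assignment and checks that each file is stored on exactly one node. The helper names, the `file_sizes` values, and the use of stored volume as a per-node quantity are assumptions for illustration and are not the paper's node load formula.

```python
def build_mapping(assignment, num_nodes):
    """Return the 0/1 matrix A with A[i][j] == 1 iff file i is stored on node j."""
    matrix = [[0] * num_nodes for _ in assignment]
    for i, j in enumerate(assignment):
        matrix[i][j] = 1
    return matrix


def stored_volume(matrix, file_sizes):
    """Total size of the files mapped to each node (size-weighted column sums)."""
    num_nodes = len(matrix[0])
    return [
        sum(row[j] * size for row, size in zip(matrix, file_sizes))
        for j in range(num_nodes)
    ]


assignment = [0, 2, 2, 1]    # hypothetical: file i is placed on node assignment[i]
file_sizes = [10, 4, 6, 8]   # hypothetical file sizes
A = build_mapping(assignment, num_nodes=3)

# Each file is stored on exactly one node, so every row of A sums to 1.
assert all(sum(row) == 1 for row in A)
print(stored_volume(A, file_sizes))  # [10, 8, 10]
```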

Load Forecast Analysis
The cloud platform manages and schedules storage, computing, and network resources through the network to implement configuration optimization and resource integration, enabling each user to acquire and use computing resources on demand while minimizing application costs and maximizing benefits. To realize this, the cloud platform must allocate resources reasonably, reducing waste while still meeting the resource requirements of the application. This in turn requires predicting the resource load of the cloud platform, understanding the future trend of the resource requirements of application services, preparing additional resources before high-load application services arrive, and preventing violations of the service level agreement (SLA) of applications.
The blockchain load data under the cloud platform are typical time series data. Before performing the short-term load forecast, we first normalize all the data, fit the load data, and then convert quantitative data such as CPU, memory, hard disk, and network usage into qualitative concepts made up of multiple clouds.
As shown in Figure 2, short-term forecasting of the cloud platform resource load is an indispensable component of its resource optimization configuration process. Through the relevant prediction model, it obtains the resource load information of the blockchain application service for a future period and provides decision support for the reasonable and systematic configuration of cloud computing resources. The accuracy of the predicted load value is the key factor that determines whether the cloud platform system can allocate storage resources reasonably and effectively.
Define $LB$ as the load balancing parameter, which is used to evaluate the load balancing status of the cloud platform. Let the number of computing nodes in the cloud computing cluster be $m$, the number of parameters be $n$, and the projection center of gravity of the computing nodes in the $n$-dimensional parameter space be $G = (X_1, X_2, \ldots, X_n)$. Writing $x_{jk}$ for the value of parameter $k$ on node $j$, the average distance from the projection of each node to the center of gravity is
$$\bar{d} = \frac{1}{m} \sum_{j=1}^{m} \sqrt{\sum_{k=1}^{n} (x_{jk} - X_k)^2}.$$
Obviously, the load balance of the cluster is worst when half of the nodes are empty and the other half are fully loaded. The average distance between the projection points of the nodes and the center of gravity, after normalization, is defined as the load balance degree of the node projection space, $LB_{ij}$, that is, the system load balance degree when task $i$ is assigned to node $j$. The load balancing degree is an important indicator of the load balancing status of cloud computing clusters and gives a quantitative measure of it. In the ideal case, all computing blockchain nodes have the same load at a given time, and their projection points coincide in a single point. The load balancing degree of the cluster is then 0, and the cloud computing cluster is in an ideal load balancing state. The maximum value of $LB_{ij}$ is 1, and the smaller the value, the better the load balance of the current cloud computing cluster.
Blockchain storage access under the cloud platform can be handled with the load table of each node according to the $AR(n)$ model, which is calculated as
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_n x_{t-n} + \varepsilon_t,$$
so the $AR(n)$ model is the predictive model determined by the parameters $\varphi_1, \varphi_2, \ldots, \varphi_n$. Combining the least squares method, Equation (14) is converted into $Y = Xa + \varepsilon$, in which $Y$ collects the observed loads, $X$ collects the corresponding lagged loads, and $a = (\varphi_1, \varphi_2, \ldots, \varphi_n)^T$. The least squares estimate is then
$$\hat{a} = (X^T X)^{-1} X^T Y.$$
To achieve a reasonable load prediction, a suitable criterion function for evaluating the $AR(n)$ model is required. The final prediction error is adopted:
$$FPE(n) = \hat{\sigma}_n^2 \, \frac{N + n}{N - n},$$
where $N$ is the number of load samples and $\hat{\sigma}_n^2$ is the residual variance of the fitted $AR(n)$ model. The order $n$ for which $FPE(n)$ takes its minimum value corresponds to the best $AR(n)$ model.
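The least squares fit of an $AR(n)$ load model and the FPE-based choice of order can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the normal equations are solved with a small Gaussian elimination, and the FPE is taken in its common form $\hat{\sigma}^2 (N + n)/(N - n)$ with $N$ equal to the number of fitted samples.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(series, order):
    """Least squares AR(order) fit; returns (coefficients, residual variance)."""
    rows = [[series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = series[order:]
    # Normal equations (X^T X) a = X^T Y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
           for i in range(order)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    coeffs = solve(xtx, xty)
    residuals = [y - sum(c * v for c, v in zip(coeffs, r))
                 for r, y in zip(rows, targets)]
    return coeffs, sum(e * e for e in residuals) / len(residuals)


def fpe(sigma2, num_samples, order):
    """Final prediction error (assumed form); smaller values indicate a better order."""
    return sigma2 * (num_samples + order) / (num_samples - order)


def predict_next(series, coeffs):
    """One-step-ahead load forecast from the most recent len(coeffs) values."""
    return sum(c * series[-k] for k, c in enumerate(coeffs, start=1))


# Synthetic noise-free load history generated by a known AR(2) process.
series = [1.0, 2.0]
for _ in range(30):
    series.append(0.5 * series[-1] + 0.3 * series[-2])

coeffs, sigma2 = fit_ar(series, 2)   # coeffs is close to [0.5, 0.3]
```

On real load tables the order would be chosen by evaluating `fpe` over a range of candidate orders and keeping the one with the smallest value.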

Genetic Prediction File Storage Access Algorithm
To make the cloud storage response time as short as possible while keeping each data node in the blockchain system as balanced as possible, a dynamic file storage access strategy combined with the predicted load state is proposed. In blockchain cloud storage, files are transmitted and stored in batches, so the new algorithm distributes files to the corresponding nodes in batches. When a new file is transmitted, the load status and change of each node are first queried through the node load table, and the node load model obtained in the previous analysis is then used to estimate the subsequent load change of the cloud storage nodes. Finally, a genetic algorithm is used to assign the transmitted file to the predicted nodes. In this way, unbalanced nodes are prevented, the system response delay caused by node overload is reduced, and the reliability of the system data is guaranteed.
Genetic algorithms have good parallel search and global optimization capabilities and are also commonly used to train neural networks to better approximate the global minimum [24][25][26]. In each generation of population evolution, the individuals with the best fitness values are retained and copied directly to the next generation. This elite retention strategy is an important guarantee for the convergence of genetic algorithms: it keeps the best individuals from being destroyed by the genetic operators, so they remain in the population until they are replaced by better individuals.
First, whenever a combination of storage characteristics is generated, the combination of file storage characteristics is directly encoded to form a chromosome individual. This process is repeated until the required number of individuals is reached, thereby forming a population. Then, the intra-class distance is used to calculate the average intra-class distance between the storage feature combinations represented by each chromosome, and the average inter-class distance is used as the index for determining the degree of aggregation. Dominant individuals are selected from the population by a selection algorithm, and three operations (replication, hybridization, and mutation) are performed with given probabilities to generate a new generation of individuals. The above process is repeated until the population contains individuals that meet the given degree of aggregation or the specified number of iterations is completed.
In the genetic algorithm, an individual is defined as a correspondence between the files and the nodes. An individual $A$ represents a desired individual only when the following conditions (1) and (2) both hold.
(1) Every file corresponds to a unique node.
(2) The amount of files assigned to any node is less than its total load.
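Conditions (1) and (2) can be checked with a short Python sketch. The individual is assumed here to be a 0/1 file-by-node matrix, and the "total load" of a node is interpreted as a capacity limit given in `capacities`; both the representation and the names are assumptions made for illustration.

```python
def is_feasible(matrix, file_sizes, capacities):
    """Return True if the individual satisfies conditions (1) and (2)."""
    # Condition (1): every file (row) is mapped to exactly one node.
    for row in matrix:
        if any(v not in (0, 1) for v in row) or sum(row) != 1:
            return False
    # Condition (2): the volume assigned to each node stays below its limit.
    for j, limit in enumerate(capacities):
        assigned = sum(size for row, size in zip(matrix, file_sizes) if row[j] == 1)
        if assigned >= limit:
            return False
    return True


individual = [[1, 0], [0, 1], [0, 1]]   # hypothetical: 3 files, 2 nodes
sizes = [5, 3, 4]                       # hypothetical file sizes

print(is_feasible(individual, sizes, capacities=[6, 8]))  # True
print(is_feasible(individual, sizes, capacities=[6, 7]))  # False: node 1 would hold 7
```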
First, a random number in $[0, 1]$ is generated. Using the roulette algorithm [27][28][29], individuals $p_1$ and $p_2$ are extracted from the existing population $P_t$ and recombined by single-point crossover. The resulting new individuals are then mutated to obtain a new generation of samples, and individuals are selected from the current population according to individual fitness to form the new generation. A simple elite strategy is used to preserve the optimal individual. During evolution, the simple elite strategy always preserves the fittest individuals in the population, preventing them from being destroyed by genetic operations such as mutation, inversion, and insertion. If an individual in the current population has a fitness value greater than that of the best individual found so far, it is saved and replaces the current best. Finally, the expectation function is used to identify the individuals that meet the system's requirement for rapid response. Factors such as the population size $N$, the crossover probability $P_c$, and the mutation probability $P_m$ are considered in the algorithm design. The basic process is as follows:
(1) Use a binary string to encode the search solution space and randomly generate an initial population of $N$ individuals.
(2) Calculate the fitness function value $F_i$ of each individual in the population.
(3) Determine whether the fitness function value $F_i$ meets the algorithm termination conditions. If the conditions are met, exit the algorithm; otherwise, continue.
The improved roulette strategy not only ensures that good individuals enter the next-generation population with a higher probability, but also gives individuals with low fitness a certain chance of being chosen. No individuals are lost, which ensures the integrity and diversity of the population.
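The overall loop (roulette selection, single-point crossover, mutation, and the simple elite strategy) can be sketched in Python as follows. Several choices here are assumptions rather than the paper's design: individuals use an integer encoding (gene $i$ is the node index of file $i$) instead of a binary string, the fitness is a stand-in based on the mean deviation of node loads from their average, termination is a fixed number of generations, and the default values of `pop_size`, `p_c`, and `p_m` are arbitrary.

```python
import random


def fitness(assignment, file_sizes, num_nodes):
    """Stand-in fitness in (0, 1]; equals 1 when all node loads are identical."""
    loads = [0.0] * num_nodes
    for i, j in enumerate(assignment):
        loads[j] += file_sizes[i]
    mean = sum(loads) / num_nodes
    deviation = sum(abs(load - mean) for load in loads) / num_nodes
    return 1.0 / (1.0 + deviation)


def roulette(population, scores, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(scores))
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running >= pick:
            return individual
    return population[-1]


def evolve(file_sizes, num_nodes, pop_size=30, generations=100,
           p_c=0.8, p_m=0.05, seed=0):
    """Return the best file-to-node assignment found by the genetic search."""
    rng = random.Random(seed)
    n = len(file_sizes)
    score = lambda ind: fitness(ind, file_sizes, num_nodes)
    population = [[rng.randrange(num_nodes) for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        scores = [score(ind) for ind in population]
        children = [best[:]]  # simple elite strategy: the best individual survives
        while len(children) < pop_size:
            p1 = roulette(population, scores, rng)
            p2 = roulette(population, scores, rng)
            child = p1[:]
            if n > 1 and rng.random() < p_c:
                cut = rng.randrange(1, n)          # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Mutation: each gene is reassigned to a random node with probability p_m.
            child = [rng.randrange(num_nodes) if rng.random() < p_m else gene
                     for gene in child]
            children.append(child)
        population = children
        best = max(population, key=score)
    return best


sizes = [4, 4, 4, 4]                     # hypothetical file sizes
best = evolve(sizes, num_nodes=2, seed=1)
```

Because the elite individual is copied unchanged into every new generation, the best fitness in the population never decreases from one generation to the next.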

Simulation Environments
To validate the performance of the proposed blockchain cloud storage strategy for genetic prediction dynamic files, a cloud storage verification system [30,31] is built on Linux. In the experiments, a Linux server with 16 GB of memory and an Intel Core i7 CPU is used to run the simulation in the blockchain environment. The software environment uses CloudSim [32] to evaluate the improved algorithm through simulation experiments. We rewrite the bindCloudletToVm method in the DatacenterBroker class and use the Ant tool to add the improved algorithm to the task scheduling unit of the platform.
CloudSim is cross-platform open source software that provides cloud computing data center virtualization technology, together with a series of interfaces for virtualized cloud modeling and simulation. The simulation experiment calls the simulation layer module in CloudSim through the UserCode layer. This layer mainly supports the simulation of cloud computing data center environments, including dedicated management interfaces for virtual machines, memory, storage, and bandwidth. As shown in Figure 3, the Simulation layer can formulate and perform virtual machine deployment plans, perform host-to-virtual-machine mapping, and dynamically monitor the system. We create 60 virtual machines as data nodes on the Linux server for scheduling 60 to 120 restore tasks. The mapping and distribution of virtual machines to hosts is implemented by CloudSim's own Time-Shared algorithm. The priority weighting factor is set to 0.5. Experiments are performed to compare the execution time, packet loss rate, and load balance of the improved and traditional algorithms.

Parameter Configurations of Genetic Algorithm
After the load prediction analysis, the genetic algorithm is adopted to verify the mapping relationship between files and nodes. We use the improved algorithm proposed in Section 4 to establish an appropriate fitness function based on node memory, link bandwidth, and transmission path in order to obtain the optimal solution. In addition, we adopt a penalty function method that allows infeasible solutions to survive, which prevents the algorithm from prematurely falling into a local optimum and helps it achieve global convergence. The main process of applying the genetic algorithm in the cloud storage strategy is shown in Figure 4. In the verification system, the spatial location of the user's attribute data does not change, and the amount of this data is relatively small. If the user's access rights change, the access control of the network data also changes. In this protection mechanism, the data in the cloud computing environment need to be segmented into minimum attribute units. The data object of each minimum attribute unit is then dynamically encrypted to obtain the minimum attribute key. During network operation, the secret key is obtained according to the authorization and parsed according to the decryption method.
In order to verify the feasibility of the improved algorithm in blockchain cloud storage, all data nodes are assigned five mappings (customer, date, supplier, part, and lineorder) with a small amount of data. The nodes in this area include surviving, routing, and newly generated nodes. The corresponding genetic parameter settings are shown in Table 1.

Simulation Results and Analysis
After the source data packets collected by the genetic algorithm convergence node are passed to some nodes in the cloud storage system, we arrange the historical blockchain data in the cloud storage data volume into a time stamp sequence in chronological order, based on these nodes. Multi-level hierarchical sampling and storage of historical data are used, with different sampling ratios, to ensure the randomness of new samples stored in the blockchain. During the extraction experiment, the number of data nodes is increased from 15 to 60 (in steps of 15), and five mappings and one restoration task are assigned to all data nodes. The details of the cloud storage system verification process are shown in Table 2.
Using the genetically predicted file storage access method, all nodes are effectively utilized. The method is compared with the traditional method under the same experimental conditions, and the experimental data for four data sets are recorded, as shown in Figure 5. The figure shows that, for the same amount of data, the execution time of the new storage access method is shorter and is less affected by the amount of data, which indicates that the new method is fast and efficient.
Next, the data volume is fixed at 80 GB and the number of data nodes is varied. The new method and the traditional method are each applied, and the experimental data are shown in Figure 6. For the same number of nodes, the new method executes faster, and it retains a significant speed advantage as the number of nodes changes.
Finally, the amount of data is varied dynamically for both the new and the traditional methods, and the experimental data are shown in Figure 7. Because the method in this paper is based on the load balancing mechanism, it can prolong the network life cycle and reduce the network packet loss rate, and its effect is better than that of the traditional method.
Table 3 shows the comparison of the load balance degree. The comparison of the two methods shows that the blockchain cloud storage strategy of genetic forecasting dynamic files distributes the node load reasonably and effectively improves the load balance, which is also beneficial to consumption control.

Table 3. Comparison of load balance degree.

No.   Traditional algorithm     Blockchain cloud storage based on genetic
      load balance (%)          prediction dynamic file load balance (%)
1     69                        97
2     69                        97
3     69                        97
4     69                        96
5     68                        96
6     68                        96
7     69                        96
8     69                        96
9     69                        96

The information collection module collects the resource usage status information of the nodes in the cloud and submits it to the message processing module. The message processing module quantifies the collected information into a form that the inference engine can identify. The inference engine uses the related rule information to infer the load of the nodes in the system and stores the resulting node load status information in the database. The coordination module selects from the database the nodes that need load balancing and executes the load balancing decision through the load balancing algorithm module. The number of jobs to migrate is then obtained through the migration strategy. Finally, the communication module issues a load balancing instruction to each server.

Conclusions
To address the current state and existing problems of blockchain cloud storage, a blockchain cloud storage strategy for genetic prediction dynamic files is proposed. First, we establish a blockchain file assignment model and analyze the node load situation and storage response time. Then, load prediction analysis is carried out. Finally, a genetic algorithm is adopted to assign the transmitted file to the predicted node, thereby realizing a reasonable mapping relationship between files and nodes. Experimental comparison verifies the performance advantage of the blockchain cloud storage strategy for genetic forecasting dynamic files: the system storage access is faster and more efficient, and the load is well balanced. Some optimizations can still be applied to minimize the cost of running the genetic algorithm in the blockchain data environment. We need to further optimize the algorithm process and design a faster population evolution method to achieve a more stable and efficient cloud storage strategy.