Article

Dynamic Task Scheduling in Remote Sensing Data Acquisition from Open-Access Data Using CloudSim

1
School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
2
Bohai-Rim Energy Research Institute, Northeast Petroleum University, Qinhuangdao 066004, China
3
School of Computing, Ulster University, Belfast BT15 1ED, UK
4
School of Communication and Electronic Engineering, Qiqihaer University, Qiqihaer 161003, China
5
State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute of Chinese Academy of Sciences, Beijing 100101, China
6
University of Chinese Academy of Sciences, Beijing 100049, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11508; https://doi.org/10.3390/app122211508
Submission received: 20 September 2022 / Revised: 7 November 2022 / Accepted: 8 November 2022 / Published: 12 November 2022
(This article belongs to the Special Issue Sustainable Agriculture and Advances of Remote Sensing)

Abstract

With the rapid development of cloud computing and network technologies, large-scale remote sensing data collection tasks are attracting growing interest from individuals and small and medium-sized enterprises. Large-scale remote sensing data collection faces several challenges, including few available node resources, short collection time, and low collection efficiency. Moreover, public remote sensing data sources restrict user access in terms of IP address, request frequency, and bandwidth. To satisfy users' demand for accessing public remote sensing data through collection nodes and to effectively increase the data collection speed, this paper proposes the TSCD-TSA dynamic task scheduling algorithm, which combines the BP neural network prediction algorithm with PSO-based task scheduling algorithms. Comparative experiments were carried out using the proposed task scheduling algorithms on an acquisition task using data from Sentinel2. The experimental results show that the MAX-MAX-PSO dynamic task scheduling algorithm has a smaller fitness value and a faster convergence speed.

1. Introduction

In recent years, with the rapid advancement of satellite remote sensing technology, the number of satellites launched by countries across the world has been increasing year by year [1]. Low- and medium-resolution satellites used for ocean, atmosphere, environment, social welfare, and scientific research at the regional to global scale account for a large proportion of them [2]. Research on massive remote sensing images utilizing deep learning models plays a significant part in urban planning [3,4], environmental observations [5,6], and energy development [7], and interest in it has been increasing recently. The collection of data is time-consuming, labor-intensive, and difficult. The amount of data is undergoing explosive growth and increasing at the petabyte level. Taking the Landsat 8 OLI/TIRS Level-2 data type in the United States Geological Survey (USGS) [8] as an example, the China region alone generated about 1500 remote sensing data products per month in 2019, each with a data volume of 2 GB. Remote sensing technology has improved to an unprecedented degree in terms of data acquisition and information service capabilities. To enable the sharing of satellite remote sensing data, the National Aeronautics and Space Administration (NASA) [9] and European Space Agency (ESA) [10] have established remote sensing data-sharing portals for the above public satellites and provide each registered user with equal access to remote sensing data. Therefore, these data-sharing portals have gradually become a popular source of large amounts of remote sensing data for individuals; small, medium, and micro enterprises; and scientific research institutions. Although these data-sharing portals provide each registered user with an equal opportunity to obtain remote sensing data, their limits on IP access, request frequency, and bandwidth pose challenges for users who acquire remote sensing data at a large scale.
Google Earth Engine (GEE) is a very useful resource for large-scale remote sensing data applications [11,12]. Although it is free for research projects, its capacity for data export is limited. It is also not well suited to processing private datasets, which would have to be uploaded to GEE [13]. GEE cannot satisfy large-scale remote sensing scientific research. Many remote sensing research activities, including remote sensing observations and data assimilation for Earth system models, are not supported by GEE. Rapidly developing cloud technologies [14] can be used to process and integrate large-scale, heterogeneous resources, which provides further opportunities for large-scale remote sensing data acquisition.
Early work on task scheduling with parallel computing was conducted by Sarkar in 1989 [15]. Task scheduling has long been an active research topic in both industrial production and scientific research. This type of problem is ultimately NP-hard. Generally, task scheduling algorithms are divided into traditional task scheduling algorithms and meta-heuristic algorithms [16]. Traditional task scheduling algorithms include First Come First Serve (FCFS) [17], Shortest Job First (SJF) [17], Max-Min [18], Min-Min [19], and polling scheduling [20,21]. Maheswaran et al. [22] modified standard heuristics for task assignment in predictable environments. Meta-heuristic task scheduling algorithms include Particle Swarm Optimization (PSO) [23,24], the genetic algorithm (GA) [25], the ant algorithm [26], BP algorithms [27,28], and an adaptive meta-heuristic-based method [29]. Recently, some studies have focused on improved hybrid algorithms based on the above-mentioned algorithms. Elmougy et al. [30] proposed a novel hybrid algorithm that combines SJF and Round Robin (RR) schedulers considering a dynamic variable task quantum. Manasrah and Ali [31] proposed a hybrid GA-PSO algorithm to reduce the makespan and balance the load of dependent tasks over heterogeneous resources. Choudhary et al. [32] proposed an efficient hybrid algorithm combining the gravitational search algorithm and heterogeneous earliest finish time algorithm.
With the rapidly growing amounts of data produced by satellites, more studies have been conducted to address the challenges in remote sensing big data storage [33,34] and the efficient and scalable processing of remotely sensed data [1,35,36,37]. PSO is widely used in remote sensing task scheduling research. An et al. [38] used PSO in task scheduling for unmanned aerial vehicle (UAV) swarm remote sensing in distributed photovoltaic array maintenance. Wu et al. [39] proposed a hybrid algorithm of PSO and a genetic algorithm to improve the dynamic regional splitting planning of a remote sensing satellite swarm. There is a growing need to develop an effective multi-objective scheduling framework that jointly minimizes the task cost, task energy consumption, and execution time. Alkayal et al. [40] developed a multi-objective PSO algorithm to minimize the waiting time and maximize the system throughput. Gabi et al. [41] proposed a multi-objective Quality-of-Service model to address customers’ expectations based on execution time and execution cost criteria. Xing et al. [42] proposed a comprehensive multi-objective model to solve a task scheduling sub-problem by using the Bayes belief model and learnable ant colony optimization. Chen et al. [43] used an evolutionary computation scheduling method for satellite periodic continuous observation task scheduling, and several constraints were considered for the satellite working pattern. In a recent study by Sun et al. [44], an energy-efficient solution for multi-objective task scheduling for the cloud implementation of hyperspectral image classification was proposed. However, only execution time and energy consumption were taken into consideration.
With the emergence of new types of Internet of Things (IoT) devices, the performance requirements placed on network services have increased. In addition to the traditional approach of upgrading network bandwidth to avoid network congestion, the available performance level of the network can also be predicted by analyzing historical service data so that network resources can be selected and adjusted according to actual conditions. However, relatively few methods study the patterns of node network transmission based on the historical transmission data of nodes. Especially in the field of remote sensing data collection, there is little relevant research on predicting the collection speed of each node, guiding dynamic task scheduling, and improving the resource utilization of nodes. This paper mainly studies network-intensive task scheduling. Research on network data transmission problems is mostly aimed at the transmission of a large amount of task data or at scenarios with higher requirements for network transmission. In order to reduce the impact of Coflow Completion Time (CCT) on applications in the data parallel framework, in 2018, Yangming Zhao [45] changed the practice of pre-setting the endpoints of the flows composing a coflow in traditional research work and proposed a joint online Reducer Placement and Coflow bandwidth scheduling framework to minimize the average CCT in the cloud cluster. In 2019, Duggan et al. [46] used the recurrent neural network sequence prediction algorithm to predict the central processing unit (CPU) utilization and network bandwidth utilization of real-time migration in the cloud environment. In order to monitor the network traffic of the network data center, network utilization must be improved. The conventional approach is to modify the hardware or terminal host to increase monitoring resources, but this increases monitoring overhead and is not conducive to long-term development. Chao et al. [47] proposed a fast, low-overhead image flow monitoring and scheduling system called FlowSeer for data flow mining. The model can accurately and quickly predict the speed and duration of any initial data flow and meet the needs of the dynamic adjustment of data flow-routing strategies.
Many research methods on network performance prediction and network-intensive task scheduling are discussed above, but most of these methods aim to improve the internal execution efficiency of the network, avoid data collisions by scheduling time slots, and achieve the purpose of reducing the network data delay. In order to improve the resource utilization of the collection nodes of remote sensing data collection users and effectively improve the collection speed of a large amount of remote sensing data, this paper uses the backpropagation (BP) neural network algorithm to predict the collection speed of the remote sensing data collection node. The dataset used in this work was obtained from Sentinel2, and the selected area is the whole China Region in 2019 [10]. According to the difference in the speed of the collection task performed by each node at different times, a two-stage combined dynamic task scheduling algorithm (TSCD-TSA) is proposed to match the task set and the resource node set, making dynamic adjustments to achieve the aim of improving the utilization of resource node sets. Comparative experiments were designed to evaluate the performance of applying different improved dynamic task scheduling algorithms, including FCFS-PSO, SJF-PSO, MAX-MAX-PSO, and PSO. In order to evaluate the effectiveness of the proposed optimized task scheduling algorithms, the CloudSim platform was used to evaluate the algorithms. The execution results of different task sets were analyzed.
The rest of this paper is organized as follows: In Section 2, methods for task scheduling and our proposed optimized task scheduling algorithms are introduced. In Section 3, the experiment design to compare the performance of the different proposed optimized task scheduling algorithms is described, and the experimental results are presented and discussed. Finally, the conclusion is drawn in Section 4.

2. Methods

Given the massive amount of public remote sensing data to be collected using limited resource nodes, this paper focuses on the problem of matching large-scale public remote sensing data collection tasks to collection nodes. First, a model of multi-objective task scheduling is proposed with the consideration of time limit constraints, energy consumption constraints, cost constraints, and the task scheduling objective function. Then, an improved multi-objective PSO optimization algorithm is proposed to further improve the allocation efficiency. Finally, multiple groups of comparative simulation experiments were designed to compare the task execution time, energy consumption, and cost of the allocation schemes produced by the various task scheduling algorithms.
The task scheduling architecture for conducting public remote sensing data collection is shown in Figure 1. A web crawler is used to obtain remote sensing data task Uniform Resource Locators (URLs) in batches from public remote sensing data sources and add them to the task pool. The scheduler selects a task set of an appropriate size from the task pool to match the execution resource set and outputs the corresponding scheduling scheme. Each executor selects its tasks for execution in strict accordance with its own task sequence in the scheduling plan output by the scheduler, and the prediction module predicts the future execution speed based on the historical execution records of each executor. The scheduler dynamically adjusts the current task scheduling scheme according to the prediction results of the prediction module and the task execution status of each executor (it only reassigns the tasks in the task set that have not been executed; tasks that have already been executed are not within the scope of rescheduling). The results of the tasks completed by each executor need to be verified, for example, by checking the quality of the remote sensing data collected. Qualified data are then entered into the storage center, while tasks with unqualified data need to be re-collected.
According to the scheduling architecture of the open remote sensing data acquisition task in Figure 1, the following assumptions are made during the execution of the task:
  • The executors perform tasks independently and do not interfere with each other.
  • The choice of the task set size in the task pool is defined by the user, and the number of task sets is usually much larger than the tasks being executed. Users can select an appropriate number of task sets according to the size of each task in the task set and the execution speed of each executor so as to avoid some time-limited tasks being in a waiting state for a long time.
  • If the task URL has not been validated, each task URL will only be executed by an executor and only once; that is, there is no execution conflict in the task.
  • In order to speed up the training, the prediction module adopts the strategy of incremental learning of the execution history.
  • The update frequency of the scheduling algorithm in the scheduler is similar to that of the prediction module. After the prediction module produces new prediction results, task rescheduling is started immediately.

2.1. Task Scheduling Model

2.1.1. Problem Statement

It is assumed that the task set is $T = \{t_1, t_2, \ldots, t_m\}$ and the node set is $W = \{w_1, w_2, \ldots, w_n\}$. Each task in set $T$ is expressed as $t_i$ $(1 \le i \le m)$, and the task scheduling algorithm will match task $t_i$ with node $w_j$ $(1 \le j \le n)$ according to the collection ability of each collection node in collection node set $W$. The result of static task scheduling can be expressed by $\{(t_1, w_1), (t_2, w_2), \ldots, (t_i, w_j)\}$. Dynamic task scheduling aims to dynamically adjust the matching of tasks and nodes: the scheduler dynamically reschedules tasks that have not yet been executed according to the current task execution speed of each node.

2.1.2. Multi-Objective Optimization Model

Multi-objective optimization, also known as multi-criteria optimization, aims to optimize multiple sub-objectives at the same time and is considered to be a hot research topic in the field of mathematics and multi-criteria analysis. Optimizing one sub-objective may degrade the performance of other sub-objectives. Therefore, there is a nonlinear relationship among multiple optimization sub-objectives, which makes it difficult to optimize multiple sub-objectives at the same time.
According to the Pareto optimality principle, when a problem has multiple optimization objectives at the same time, the sub-objectives conflict and cannot be compared directly. A solution may be optimal for one objective but not for the others, and improving the solution for one objective inevitably weakens it for at least one other objective. A solution of this kind is called a non-dominated solution or Pareto solution. Such a problem is handled by converting the corresponding mathematical model into a multi-objective function to be solved. The minimization problem with an m-dimensional decision vector and n optimization objectives is usually defined as:
$$\min F(x) = \left( f_1(x),\ f_2(x),\ \ldots,\ f_n(x) \right) \tag{1}$$
in which there is a decision space $Q$ and a target space $R$, with an m-dimensional decision vector $x = (x_1, x_2, \ldots, x_m) \in Q$ and an n-dimensional target vector $(f_1, f_2, \ldots, f_n) \in R$. The solution process maps the decision space into the target space through the objective function, that is, $Q^m \to R^n$. For the multi-objective task scheduling based on the time limit, energy consumption, and cost in this work, the problem is transformed into an optimization problem for these three mutually influencing objectives. The solution to this problem is a set of Pareto optimal solutions, and each solution in the set represents a task scheduling scheme.
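The notion of a non-dominated solution can be illustrated with a minimal Python sketch. It assumes that every objective is minimized and that each candidate schedule has already been reduced to a tuple of objective values; the function names and the four candidate tuples are hypothetical and are not taken from the experiments.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # e.g. (time, energy, cost), all to be minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (time, energy, cost) values for four candidate schedules.
candidates = [(10.0, 5.0, 3.0), (12.0, 4.0, 3.0), (11.0, 6.0, 4.0), (10.0, 5.0, 2.0)]
front = pareto_front(candidates)
```

In this toy data, the first and third candidates are dominated by the fourth, so only two schedules remain on the front; neither of the two is better than the other in all three objectives.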

2.1.3. Multi-Objective Task Scheduling Based on Task Publisher

When the task publisher uses the crowdsourcing model to conduct large-scale public remote sensing data collection, the collection is released on a crowdsourcing platform. It is necessary to select different scheduling strategies according to the needs of different remote sensing data to determine the matching between tasks and nodes. Three task scheduling strategies, namely, the time limit priority, energy priority, and cost priority, are common scheduling strategies. Time-limited task scheduling is activated for particularly urgent task assignments. For normalized task assignments without time requirements, energy and cost constraint task scheduling can be utilized. Suppose the $m$ collection nodes are $P = \{P_1, P_2, \ldots, P_m\}$; the resources available for each collection node are represented as $P_i = \{Mips_i, Energy_i, Cost_i\}$, among which $Mips_i$ represents the execution speed of node $i$ in node set $W$ for a certain type of task, $Energy_i$ represents the energy consumption of node $i$ per unit time, and $Cost_i$ represents the cost of executing unit tasks for node $i$.
The expression $B[i][j]$ represents the matching relationship between task $i$ $(i = 1, 2, \ldots, n)$ and node $j$ $(j = 1, 2, \ldots, m)$, and the assignment of $B[i][j]$ is shown in Equation (2):
$$B[i][j] = \begin{cases} 0, & \text{task } i \text{ is not assigned to node } j \\ 1, & \text{task } i \text{ is assigned to node } j \end{cases} \tag{2}$$
The entries $B[i][j]$ in Equation (2) form the task execution matrix $B$, which is defined in Equation (3):
$$B = \begin{bmatrix} B[1][1] & \cdots & B[n][1] \\ \vdots & \ddots & \vdots \\ B[1][m] & \cdots & B[n][m] \end{bmatrix} \tag{3}$$
In Equation (3), each column represents a collection task, and each row represents a node resource. The condition $\sum_{j=1}^{m} B[i][j] = 1$ indicates that task $i$ is assigned to exactly one node.
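As a minimal sketch of this indicator, the Python snippet below builds $B$ from a list that stores the node chosen for each task and checks the one-node-per-task condition. The nested list is indexed task-first so that `B[i][j]` can be read directly; the helper names, the 0-based indices, and the example assignment are assumptions made for illustration.

```python
from typing import List


def assignment_matrix(assignment: List[int], num_nodes: int) -> List[List[int]]:
    """Build B with B[i][j] = 1 if task i is assigned to node j, else 0."""
    matrix = [[0] * num_nodes for _ in assignment]
    for task, node in enumerate(assignment):
        matrix[task][node] = 1
    return matrix


def is_valid(matrix: List[List[int]]) -> bool:
    """Check that every task is assigned to exactly one node."""
    return all(sum(row) == 1 for row in matrix)


# Hypothetical example: five tasks on three nodes (0-based indices).
B = assignment_matrix([0, 2, 1, 0, 2], num_nodes=3)
valid = is_valid(B)
```

Storing the assignment as one node index per task makes the constraint hold by construction, which is why later sketches of this kind usually work on the list rather than on the full matrix.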
Considering the large amount of public remote sensing data, the large data capacity, and the different execution speeds of different collection nodes for different types of tasks, the execution time of each task is determined by the length of the task assigned to the node and the execution speed of the node. For task set $T = \{t_1, t_2, \ldots, t_n\}$, the purpose of dynamic task scheduling in this paper is to find an optimal task scheduling scheme to assign all tasks to the corresponding collection nodes in order to make the total task completion time the shortest. Since the execution speed of different tasks on the node is different, the execution speed of the task on the node can be represented using the execution time matrix in Equation (4):
$$ExecuteTime = \begin{bmatrix} Time_{11} & Time_{21} & \cdots & Time_{n1} \\ Time_{12} & Time_{22} & \cdots & Time_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ Time_{1m} & Time_{2m} & \cdots & Time_{nm} \end{bmatrix} \tag{4}$$
In Equation (4), the symbol $Time_{ij}$ $(1 \le i \le n,\ 1 \le j \le m)$ represents the execution time of task $i$ on node $j$, and its calculation equation can be expressed in Equation (5):
$$Time_{ij} = \frac{Length_i}{Mips_j} \tag{5}$$
The symbol $Length_i$ represents the task length of task $i$, and $Mips_j$ represents the execution speed of collection node $j$.
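As a quick check of Equation (5) with purely hypothetical values (an 800 MB task on a node that downloads at 4 MB/s; neither figure comes from the experiments):

```latex
Time_{ij} = \frac{Length_i}{Mips_j} = \frac{800\ \text{MB}}{4\ \text{MB/s}} = 200\ \text{s}
```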
When all tasks are distributed and executed, the tasks are executed independently and in parallel on each node. Therefore, the total execution time of the task set is expressed as Equation (6):
$$SumTime = \max\left\{ \sum_{i=1}^{n} B[i][j] \times Time[i,j] \;\middle|\; 1 \le j \le m \right\} \tag{6}$$
If all nodes obtain their tasks at the same time, the total execution time of task set $T$ is determined by the node that finishes its assigned tasks last. The optimization goal of task scheduling is therefore to find an optimal task allocation scheme that minimizes $SumTime$.
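The total execution time can be sketched in a few lines of Python. The sketch assumes that an assignment is stored as one node index per task (0-based) and that task lengths and node speeds use compatible units; the function names and the sample data are hypothetical.

```python
from typing import List


def execution_times(lengths: List[float], speeds: List[float]) -> List[List[float]]:
    """time[i][j] = lengths[i] / speeds[j], following Equation (5)."""
    return [[length / speed for speed in speeds] for length in lengths]


def sum_time(assignment: List[int], lengths: List[float], speeds: List[float]) -> float:
    """Equation (6): the largest per-node total of assigned execution times."""
    per_node = [0.0] * len(speeds)
    for task, node in enumerate(assignment):
        per_node[node] += lengths[task] / speeds[node]
    return max(per_node)


# Hypothetical data: task lengths in MB, node speeds in MB/s.
lengths = [800.0, 400.0, 600.0, 200.0]
speeds = [4.0, 2.0]
total = sum_time([0, 1, 0, 1], lengths, speeds)
```

With this assignment the faster node works for 350 s and the slower one for 300 s, so the task set finishes after 350 s.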
Due to the differences in the machine configuration of each task execution node, the energy consumption per unit time differs between nodes when tasks are executed. The power of hardware such as motherboards, processors, graphics cards, hard drives, memory, and monitors is the main factor affecting the energy consumption of the machine, and the total power is the sum of the power of this hardware. Therefore, when energy-constrained task scheduling is carried out in this paper, the energy that each node consumes while running tasks over a period of time is used as the index for evaluating the energy consumption of each execution node. The total energy consumption of node set $P$ to complete task set $T$ can be expressed as:
$$SumEnergy = \sum_{j=1}^{m} \sum_{i=1}^{n} B_{ij} \times Energy_{ij} \tag{7}$$
in which $Energy_{ij} = Energy_j \times Time_{ij}$ represents the energy consumed by node $P_j$ while executing task $t_i$. $Energy_j$ represents the energy consumption of node $P_j$ per unit time, that is, the total power of the node. $Time_{ij}$ represents the execution time of task $t_i$ on node $P_j$. Energy-constrained task scheduling aims to find a set of optimal task allocation schemes that minimizes $SumEnergy$.
Since each node participating in the collection of public remote sensing data is obtained through crowdsourcing, the task initiator needs to pay remuneration to crowdsourcing users. According to the different abilities of each node to perform tasks, the unit task cost obtained is different. The total cost of the task publisher using node resource set P to complete a batch of tasks T can be expressed as:
$$SumCost = \sum_{j=1}^{m} \sum_{i=1}^{n} B_{ij} \times Cost_{ij} \tag{8}$$
In the equation above, $Cost_{ij} = Cost_j \times count$ represents the total cost of executing the tasks allocated to node $P_j$, where $Cost_j$ represents the unit task cost of node $P_j$, and $count$ represents the number of tasks allocated to node $P_j$. Cost-constrained task scheduling aims to find a set of optimal task assignments that minimizes $SumCost$.
From the task issuer’s perspective, in this work, when carrying out multi-objective task scheduling of the time limit, energy consumption, and cost, the goal of task scheduling is to map the task set to the appropriate collection nodes so as to achieve the shortest completion time, the lowest energy consumption, and the lowest cost. Therefore, the task scheduling objective function can be expressed as:
$$\min \{ SumTime,\ SumEnergy,\ SumCost \} \tag{9}$$
$SumTime$ represents the total task execution time, $SumEnergy$ represents the total energy consumption of the task, and $SumCost$ indicates the total cost of the task.
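The three objectives can be evaluated together for one candidate assignment, as in the hedged Python sketch below. It reads the energy term as node power multiplied by the node's busy time and the cost term as unit task cost multiplied by the number of tasks on the node; this reading, the function name, and all numeric values are assumptions for illustration.

```python
from typing import List, Tuple


def objectives(assignment: List[int], lengths: List[float], speeds: List[float],
               power: List[float], unit_cost: List[float]) -> Tuple[float, float, float]:
    """Return (SumTime, SumEnergy, SumCost) for one task-to-node assignment."""
    busy_time = [0.0] * len(speeds)   # total execution time per node
    task_count = [0] * len(speeds)    # number of tasks per node
    for task, node in enumerate(assignment):
        busy_time[node] += lengths[task] / speeds[node]
        task_count[node] += 1
    total_time = max(busy_time)
    total_energy = sum(p * t for p, t in zip(power, busy_time))
    total_cost = sum(c * k for c, k in zip(unit_cost, task_count))
    return total_time, total_energy, total_cost


# Hypothetical data: lengths in MB, speeds in MB/s, power in W, cost per task.
result = objectives([0, 1, 0, 1], [800.0, 400.0, 600.0, 200.0], [4.0, 2.0],
                    power=[100.0, 60.0], unit_cost=[0.5, 0.25])
```

The returned tuple is the vector that a Pareto comparison or a weighted fitness function would consume; how the three values are combined is a separate design decision.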

2.2. PSO Optimization Algorithm and PSO-BP Algorithm

Improved PSO Optimization Algorithm

In order to solve the problem of slow particle optimization speed when the traditional PSO optimization algorithm uses the random initialization of particles, this paper proposes the use of the FCFS, SJF, and MAX_MAX (maximum resources required allocated to maximum capacity) algorithms to initialize the particles, which changes the traditional random initialization method. The specific method is to first use the FCFS, SJF, and MAX_MAX algorithms to perform task scheduling ahead of the PSO optimization algorithm and then execute the PSO optimization algorithm based on the scheduling results.
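One possible reading of the three seeding heuristics is sketched below in Python. It assumes that each heuristic differs only in the order in which tasks are considered (arrival order for FCFS, shortest first for SJF, longest first for MAX_MAX) and that each task then goes to the node that would finish it earliest. This greedy placement rule, the function names, and the data are illustrative assumptions rather than the implementation used in the experiments.

```python
from typing import List


def greedy_assign(order: List[int], lengths: List[float], speeds: List[float]) -> List[int]:
    """Take tasks in the given order; give each to the node that finishes it earliest."""
    busy_until = [0.0] * len(speeds)
    assignment = [0] * len(lengths)
    for task in order:
        finish = [busy_until[j] + lengths[task] / speeds[j] for j in range(len(speeds))]
        node = finish.index(min(finish))  # ties go to the lowest node index
        busy_until[node] = finish[node]
        assignment[task] = node
    return assignment


def fcfs_order(lengths: List[float]) -> List[int]:
    """Arrival order."""
    return list(range(len(lengths)))


def sjf_order(lengths: List[float]) -> List[int]:
    """Shortest task first."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i])


def max_max_order(lengths: List[float]) -> List[int]:
    """Longest task first, so the largest task meets the fastest free node."""
    return sorted(range(len(lengths)), key=lambda i: -lengths[i])


# Hypothetical data: task lengths in MB, node speeds in MB/s.
lengths = [800.0, 400.0, 600.0, 200.0]
speeds = [4.0, 2.0]
fcfs_seed = greedy_assign(fcfs_order(lengths), lengths, speeds)
sjf_seed = greedy_assign(sjf_order(lengths), lengths, speeds)
max_max_seed = greedy_assign(max_max_order(lengths), lengths, speeds)
```

Each resulting list can serve as an initial particle position, replacing a purely random starting point.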
The pseudo-code of the improved PSO optimization algorithm (Algorithm 1) is as follows:
Algorithm 1: Improved PSO optimization algorithm.
Initialize the particle population using the FCFS, SJF, or MAX_MAX algorithm
DO
  FOR each particle i
       Use Equation (10) to calculate the fitness value of the particle
       IF the fitness value of the particle at the current position is better than its historical local best value
            Update the local best individual with the current particle
       END
  END
  IF the best local particle is better than the global best particle of the population
       Update the global best particle with that local best particle
  END
  FOR each particle i
       Update the velocity and position of the particle according to Equation (12)
  END
WHILE the maximum number of iterations has not been reached
The advantage of the above-mentioned improved PSO optimization algorithm is that the PSO algorithm starts from the scheduling results of the FCFS, SJF, and MAX_MAX algorithms and optimizes them further, which can speed up the optimization of the entire algorithm. The fitness function $Fitness$ can be expressed as:
$$Fitness = \alpha \cdot Makespan_{position} + (1 - \alpha) \cdot Makespan_{position} \tag{10}$$
In Equation (10), the expression $Makespan_{position}$ represents the total task completion time of the particles under the current allocation plan, which can be expressed as follows:
$$Makespan_{position} = \max(p_1, p_2, \ldots, p_m) \tag{11}$$
$$v_i^{k+1} = w \cdot v_i^{k} + c_1 r_1 [p_i - x_i^{k}] + c_2 r_2 [p_g - x_i^{k}], \qquad x_i^{k+1} = x_i^{k} + v_i^{k}, \qquad i = 1, 2, \ldots, N \tag{12}$$
The position and velocity of particles in the PSO algorithm are updated using Equation (12). The current iteration number of the particle is represented by $k$, and the inertia weight coefficient is represented by $w$. The larger the value of the inertia weight, the stronger the global search ability of the particle; otherwise, the stronger the local search ability. $c_1$ and $c_2$ are learning factors. $c_1$ describes whether the particle is affected by the individual extreme value, and it enables the particle to have a global search ability to avoid falling into a local solution. $c_2$ represents whether the particle is affected by the global extreme value. $r_1$ and $r_2$ are random numbers in the range (0,1). $w$, $c_1$, and $c_2$ jointly determine the space-searching ability of the particle. The position of the optimal fitness value calculated by the particle in the iterative process is represented by the individual extreme value, which is expressed as $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, $i = 1, 2, \ldots, N$. The position of the optimal fitness calculated by all particles in the population during the iteration process is represented by the global extreme, which is expressed as $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$.
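A compact Python sketch of a seeded PSO scheduler in this spirit follows. Several choices are assumptions made for illustration and are not confirmed by the text: positions are continuous and rounded to node indices, only one particle is seeded with a heuristic schedule while the rest are random, the makespan alone serves as fitness, the position update uses the freshly updated velocity (a common convention), and the parameter values `w`, `c1`, and `c2` are arbitrary.

```python
import random
from typing import List, Optional, Tuple


def makespan(assignment: List[int], lengths: List[float], speeds: List[float]) -> float:
    """Completion time of the busiest node for a task-to-node assignment."""
    per_node = [0.0] * len(speeds)
    for task, node in enumerate(assignment):
        per_node[node] += lengths[task] / speeds[node]
    return max(per_node)


def decode(position: List[float], num_nodes: int) -> List[int]:
    """Round each continuous coordinate to a valid node index."""
    return [min(num_nodes - 1, max(0, round(x))) for x in position]


def pso_schedule(seed: List[int], lengths: List[float], speeds: List[float],
                 particles: int = 20, iterations: int = 100, w: float = 0.7,
                 c1: float = 1.5, c2: float = 1.5,
                 rng: Optional[random.Random] = None) -> Tuple[List[int], float]:
    rng = rng or random.Random(0)
    n, m = len(lengths), len(speeds)

    def fitness(pos: List[float]) -> float:
        return makespan(decode(pos, m), lengths, speeds)

    # Particle 0 starts from the heuristic schedule; the others start at random.
    xs = [[float(a) for a in seed]]
    xs += [[rng.uniform(0, m - 1) for _ in range(n)] for _ in range(particles - 1)]
    vs = [[0.0] * n for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [float("inf")] * particles
    gbest, gbest_fit = xs[0][:], float("inf")

    for _ in range(iterations):
        for i in range(particles):  # evaluate and update each local best
            fit = fitness(xs[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], fit
        best = min(range(particles), key=lambda i: pbest_fit[i])
        if pbest_fit[best] < gbest_fit:  # update the global best
            gbest, gbest_fit = pbest[best][:], pbest_fit[best]
        for i in range(particles):  # velocity and position update
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d] + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(m - 1.0, max(0.0, xs[i][d] + vs[i][d]))
    return decode(gbest, m), gbest_fit


# Hypothetical data; the seed could come from an FCFS-, SJF-, or MAX_MAX-style heuristic.
lengths = [800.0, 400.0, 600.0, 200.0]
speeds = [4.0, 2.0]
plan, fit = pso_schedule([0, 1, 0, 1], lengths, speeds)
```

Because the seeded particle is evaluated in the first iteration and the global best is only ever replaced by a better one, the returned schedule is never worse than the heuristic seed.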

2.3. TSCD-TSA Dynamic Task Scheduling Algorithm

Because the task execution nodes differ in network performance, they perform public remote sensing data collection tasks at different speeds. In order to predict the task execution speed of each node and thereby give dynamic task scheduling a measure of each node's task execution ability, TSCD-TSA, which combines a node network transmission speed prediction algorithm with a task scheduling algorithm, is proposed. The principle of TSCD-TSA is as follows: first, the BP neural network algorithm is used together with a timer to dynamically predict the network transmission capacity of each node; then, the prediction results are used in the dynamic FCFS_PSO, dynamic SJF_PSO, dynamic MAX_MAX_PSO, and dynamic PSO algorithms to achieve dynamic task scheduling.
The TSCD-TSA dynamic task scheduling algorithm has the following rules during execution:
  • The update frequency of the BP neural network algorithm and dynamic task scheduling algorithm needs to be consistent.
  • The BP neural network algorithm adopts an incremental data prediction strategy; that is, the training data of the algorithm are always taken from the dataset within the most recent period of time, which avoids an excessively long training time when a node's historical collection data become too large.
  • When the dynamic task scheduling algorithm is executed, the scheduling result is updated, and tasks are only reallocated if they have been allocated but the node has not yet started execution. (That is, the result of dynamic task scheduling is the dynamic reallocation of those tasks that have been pre-allocated but not executed.)
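The rules above can be sketched as one rescheduling step in Python. Two stand-ins are used and should not be read as the method itself: a moving average over the most recent speed samples replaces the BP neural network predictor, and a greedy longest-first placement replaces the PSO-based scheduler. All names and numbers are hypothetical.

```python
from typing import Dict, List


def predict_speed(samples: List[float], window: int = 5) -> float:
    """Stand-in predictor: mean of the most recent samples (not a BP network)."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def reschedule(pending: List[int], lengths: Dict[int, float],
               predicted_speed: List[float], busy_until: List[float]) -> Dict[int, int]:
    """Reassign only tasks that have not started, using predicted node speeds."""
    finish = busy_until[:]  # time at which each node finishes its running task
    plan: Dict[int, int] = {}
    for task in sorted(pending, key=lambda t: -lengths[t]):  # longest first
        options = [finish[j] + lengths[task] / predicted_speed[j]
                   for j in range(len(finish))]
        node = options.index(min(options))
        finish[node] = options[node]
        plan[task] = node
    return plan


# Hypothetical state: node 0 has slowed down recently and is still busy for 50 s.
history = [[4.0, 4.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]]  # speed samples per node
predicted = [predict_speed(h, window=2) for h in history]
plan = reschedule(pending=[2, 3], lengths={2: 600.0, 3: 200.0},
                  predicted_speed=predicted, busy_until=[50.0, 0.0])
```

Running tasks are represented only through `busy_until`, so they are never moved; calling `reschedule` each time a new prediction arrives keeps the two update frequencies aligned.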

3. Experiment and Results Analysis

3.1. Experimental Environment

CloudSim is a cloud simulation platform launched in 2009 by the University of Melbourne’s GRIDS Laboratory [48]. The platform was originally designed to provide a simulated cloud computing environment in which the control and use of cloud computing service resources can be carried out entirely in simulation, which is convenient for users who want to test the deployment of related services and strategies. It is an open-source project developed in Java. It can run on multiple operating systems, such as Windows and Linux, and users can change its source code or add related functions according to their needs. The platform not only provides the definition of infrastructure in cloud computing but also provides a simulation interface for resource management and task scheduling in cloud computing. The simulation platform mainly has the following core classes: Cloudlet, DataCenter, DataCenterBroker, Host, VirtualMachine, VMScheduler, VMCharacteristics, VMMAllocationPolicy, and VMProvisioner, and the framework provides good interface extension services that can meet task scheduling needs. The version of CloudSim used in this work is 3.0.3. A number of scheduling algorithm classes have been developed under the original framework, including FCFS-PSO, SJF-PSO, Max-Max-PSO, and PSO.

3.2. Simulation Experiment

The Sentinel2 acquisition task was used for the simulation experiment on dynamic task scheduling. The task set was taken from one of the four resource nodes, and detailed information about task length and task execution nodes is shown in Table 1. The download power is the average download speed of a node performing this type of task over a period of time. The energy consumption per unit time represents the total power of the node when performing tasks, and the unit task cost represents the node's cost for one task of this type.
To compare task performance under different algorithms, FCFS-PSO, SJF-PSO, MAX-MAX-PSO, and PSO were each implemented. For every algorithm, the maximum number of iterations is 500 and the number of particles is 20; experiments were run with 50, 100, and 150 tasks on 4 task nodes. The node network transmission capacity prediction module adopts the BP neural network algorithm, and both the prediction algorithm and the task scheduling algorithm are updated every 3 min. Table 2 lists the total working time of each node and the number of tasks assigned to each node for the scheduling results of the different algorithms.
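To make the experimental setup concrete, the following is a minimal sketch of a plain PSO scheduler using the stated settings (20 particles, 500 iterations, 4 nodes) and the download rates from Table 1. Several parts are assumptions made for illustration only: the fitness here is simply the makespan (longest per-node download time) rather than the paper's fitness function, the continuous position encoding and the coefficients `w`, `c1`, and `c2` are conventional textbook choices, and the task sizes passed in are hypothetical.

```python
import random

# Average download rates of the four nodes in kB/s (Table 1).
DOWNLOAD_KBPS = [2914.7, 5049.45, 4028.56, 1531.6]


def decode(position):
    """Map each real-valued coordinate to a node index (0-based)."""
    last = len(DOWNLOAD_KBPS) - 1
    return [min(int(x), last) for x in position]


def makespan(position, task_kb):
    """Stand-in fitness (assumption): longest per-node download time in seconds."""
    load = [0.0] * len(DOWNLOAD_KBPS)
    for size, node in zip(task_kb, decode(position)):
        load[node] += size / DOWNLOAD_KBPS[node]
    return max(load)


def pso_schedule(task_kb, particles=20, iterations=500,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO over task-to-node assignments."""
    rng = random.Random(seed)
    n, hi = len(task_kb), float(len(DOWNLOAD_KBPS))
    pos = [[rng.uniform(0.0, hi) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [makespan(p, task_kb) for p in pos]
    g = min(range(particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every coordinate still decodes to a valid node.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), hi - 1e-9)
            fit = makespan(pos[i], task_kb)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit
```

A call such as `pso_schedule([800_000.0] * 50)` returns one node index per task together with the best makespan found. The FCFS, SJF, and MAX-MAX variants compared in the experiment are not reproduced in this sketch.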
Table 3 compares the scheduling results of the different algorithms on five indexes: average task execution time, best fitness value, total task execution time, total task energy consumption, and total task cost.
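The totals in Table 3 appear to follow directly from Tables 1 and 2. The sketch below reproduces the FCFS_PSO row for 50 tasks under two assumptions that are inferred from the numbers rather than stated in the text: the energy figure in Table 1 is consumed per minute of working time, and the unit task cost is charged once per assigned task. The function and variable names are illustrative.

```python
# Table 1: energy consumption per unit time and unit task cost (CNY) per node.
ENERGY_PER_MIN = {1: 300, 2: 330, 3: 320, 4: 285}  # assumed to be per minute
UNIT_TASK_COST = {1: 2, 2: 4, 3: 3, 4: 1}

# Table 2, FCFS_PSO with 50 tasks: node -> (working time in min, tasks assigned).
FCFS_PSO_50 = {1: (3544, 14), 2: (2254, 18), 3: (1861, 14), 4: (5246, 4)}


def summarize(schedule):
    """Aggregate per-node results into the totals reported in Table 3."""
    total_time = sum(minutes for minutes, _ in schedule.values())
    total_tasks = sum(count for _, count in schedule.values())
    total_energy = sum(minutes * ENERGY_PER_MIN[node]
                       for node, (minutes, _) in schedule.items())
    total_cost = sum(count * UNIT_TASK_COST[node]
                     for node, (_, count) in schedule.items())
    return {
        "total_time_min": total_time,
        "average_time_min": total_time / total_tasks,
        "total_energy": total_energy,
        "total_cost_cny": total_cost,
    }


summary = summarize(FCFS_PSO_50)
```

Under these assumptions, `summary` gives a total time of 12,905 min, an average of about 258 min per task, a total energy of 3,897,650, and a total cost of 146 CNY, which matches the corresponding row of Table 3.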
As shown in Figure 2, Node 4, which has the lowest download power, was assigned the fewest tasks but had the longest working time. Node 2 and Node 3, which have similar download power, maintained roughly the same working time during task execution, and nodes with similar download power differed little in the number of tasks assigned. The download power of a node is therefore closely related to both its working time and the number of tasks assigned to it: in Table 2, the slowest node receives the fewest tasks yet works the longest.
Figure 3 shows the optimal fitness value of the particle for the different task scheduling results when the number of tasks is 50, 100, and 150. As can be seen in Figure 3 and Table 3, MAX-MAX-PSO has the lowest optimal fitness value at every task count, and its value decreases further relative to those of the other algorithms as the number of tasks increases, which suggests that this algorithm is better suited to scheduling a large number of long tasks.
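One way to read this trend from the numbers in Table 3 is to compare the MAX_MAX_PSO fitness with the best value achieved by any other algorithm at the same task count. The ratio used below is an illustrative measure chosen for this sketch, not a metric defined in the paper.

```python
# Optimum fitness values from Table 3, keyed by number of tasks.
BEST_FITNESS = {
    50: {"FCFS_PSO": 2766, "SJF_PSO": 2768, "MAX_MAX_PSO": 2657, "PSO": 2757},
    100: {"FCFS_PSO": 4912, "SJF_PSO": 4876, "MAX_MAX_PSO": 4653, "PSO": 5768},
    150: {"FCFS_PSO": 9365, "SJF_PSO": 8397, "MAX_MAX_PSO": 7844, "PSO": 8946},
}


def relative_fitness(n_tasks, algo="MAX_MAX_PSO"):
    """Fitness of `algo` divided by the best fitness among the other algorithms."""
    row = BEST_FITNESS[n_tasks]
    best_other = min(v for name, v in row.items() if name != algo)
    return row[algo] / best_other


ratios = [relative_fitness(n) for n in (50, 100, 150)]
```

The three ratios are all below 1 and shrink as the task count grows, even though the absolute fitness values of every algorithm increase with the number of tasks.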
Figure 4 presents the convergence curves of the four task scheduling algorithms, averaged over repeated experiments. It can be seen in Figure 4 that the MAX-MAX-PSO algorithm always has the fastest convergence speed.

4. Discussion

Based on the optimization results, MAX-MAX-PSO is the best of the four algorithms: it has the fastest convergence speed and the lowest optimal fitness value. To compare the four algorithms in more detail, the task execution results are presented in Figure 5, where the total task execution time, total task energy consumption, total task cost, average task execution time, average task energy consumption, and average task cost are visualized for three task numbers: 50, 100, and 150.
As shown in Figure 5, when the number of tasks is the same, the total task execution time, the total task energy consumption, and the total task cost change very little across the four algorithms. The values of the PSO algorithm are slightly lower than those of FCFS-PSO, SJF-PSO, and MAX-MAX-PSO. In addition, for the same number of tasks, the average task time of the FCFS-PSO algorithm and that of the PSO algorithm are almost the same. As the number of tasks increases, the average task time, the average task energy consumption, and the average task cost show a decreasing trend.

5. Conclusions

This paper first reviews the research on network performance prediction and network-intensive task scheduling. Due to the limited service capabilities of data sources and the constrained collection capabilities of user nodes, large-scale public remote sensing data collection suffers from low efficiency and low utilization of user collection nodes. To solve these problems, we propose a task scheduling model for open remote sensing data acquisition. An improved PSO-BP algorithm is proposed by improving the inertia weight and learning factor and introducing a dynamic precision adjustment function. To address the challenges in the dynamic task allocation of a large amount of remote sensing data, especially to improve the resource utilization of the collection nodes and the collection speed, a multi-objective task scheduling model was established. Considering that the speed at which each node performs a collection task differs over time, the TSCD-TSA dynamic task scheduling algorithm was developed to improve the traditional PSO optimization algorithm by using the FCFS, MAX-MAX, and SJF algorithms. Comparative simulation experiments of four dynamic task scheduling algorithms, namely, FCFS-PSO, SJF-PSO, MAX-MAX-PSO, and PSO, were carried out on the CloudSim platform with the Sentinel2 acquisition task, using the BP neural network algorithm to predict the network transmission capacity of the acquisition nodes. The task execution of each node under the various task scheduling algorithms was analyzed and compared, as were the fitness values and convergence of the algorithms. The experimental results show that, as the number of tasks increases, the fitness value of the MAX-MAX-PSO algorithm decreases relative to those of the other algorithms, and its convergence is noticeably faster.
In the future, we would like to establish a crowdsourced prediction model for node network transmission capacity and, by analyzing the characteristics of node network transmission capability, compare deep-learning-based algorithms with the current algorithms.

Author Contributions

Conceptualization, Z.W. and L.B.; methodology, Z.W. and L.B.; software, X.L. and Y.C.; validation, X.L., Y.C. and M.Z.; formal analysis, X.L.; investigation, L.B. and Z.W.; resources, L.B., Z.W. and X.L.; data curation, X.L.; writing—original draft preparation, L.B. and X.L.; writing—review and editing, Z.W., L.B., X.L., Y.C., M.Z. and J.T.; visualization, X.L.; supervision, Z.W. and L.B.; project administration, J.T.; funding acquisition, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by TUOHAI special project 2020 of the Bohai Rim Energy Research Institute of Northeast Petroleum University under Grant HBHZX202002 and the Project of Excellent and Middle-aged Scientific Research Innovation Team of Northeast Petroleum University under Grant KYCXTD201903.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ma, Y.; Wu, H.; Wang, L.; Huang, B.; Ranjan, R.; Zomaya, A.; Jie, W. Remote sensing big data computing: Challenges and opportunities. Future Gener. Comput. Syst. 2015, 51, 47–60.
  2. Ma, Y.; Wang, L.; Zomaya, A.Y.; Chen, D.; Ranjan, R. Task-tree based large-scale mosaicking for massive remote sensed imageries with dynamic dag scheduling. IEEE Trans. Parallel Distrib. Syst. 2013, 25, 2126–2137.
  3. Wellmann, T.; Lausch, A.; Andersson, E.; Knapp, S.; Cortinovis, C.; Jache, J.; Scheuer, S.; Kremer, P.; Mascarenhas, A.; Kraemer, R. Remote sensing in urban planning: Contributions towards ecologically sound policies? Landsc. Urban Plan. 2020, 204, 103921.
  4. Zhu, M.; Wang, Z.; Bai, L.; Zhang, J.; Tao, J.; Chen, L. Detection of industrial storage tanks at the city-level from optical satellite remote sensing images. In Proceedings of the Image and Signal Processing for Remote Sensing XXVII, Online, 12 September 2021; SPIE: Bellingham, WA, USA, 2021; Volume 11862, pp. 254–260.
  5. Barrett, E.C. Introduction to Environmental Remote Sensing; Routledge: London, UK, 2013; ISBN 0203761030.
  6. Zhang, J.; Wang, Z.; Bai, L.; Song, G.; Tao, J.; Chen, L. Deforestation Detection Based on U-Net and LSTM in Optical Satellite Remote Sensing Images. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 3753–3756.
  7. Wang, Z.; Bai, L.; Song, G.; Zhang, J.; Tao, J.; Mulvenna, M.D.; Bond, R.R.; Chen, L. An oil well dataset derived from satellite-based remote sensing. Remote Sens. 2021, 13, 1132.
  8. USGS. Available online: https://www.usgs.gov/ (accessed on 5 November 2022).
  9. NASA. Available online: https://ladsweb.modaps.eosdis.nasa.gov/search/ (accessed on 5 November 2022).
  10. ESA. Available online: https://scihub.copernicus.eu/dhus/#/home (accessed on 5 November 2022).
  11. Zhao, Q.; Yu, L.; Li, X.; Peng, D.; Zhang, Y.; Gong, P. Progress and trends in the application of Google Earth and Google Earth Engine. Remote Sens. 2021, 13, 3778.
  12. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S. Google earth engine cloud computing platform for remote sensing big data applications: A comprehensive review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350.
  13. Xu, C.; Du, X.; Yan, Z.; Fan, X. ScienceEarth: A Big Data Platform for Remote Sensing Data Processing. Remote Sens. 2020, 12, 607.
  14. Houssein, E.H.; Gad, A.G.; Wazery, Y.M.; Suganthan, P.N. Task scheduling in cloud computing based on meta-heuristics: Review, taxonomy, open challenges, and future trends. Swarm Evol. Comput. 2021, 62, 100841.
  15. Sarkar, V. Determining average program execution times and their variance. In Proceedings of the ACM SIGPLAN 1989 Conference on Programming Language Design and Implementation, Portland, OR, USA, 19–23 June 1989; pp. 298–312.
  16. Soltani, N.; Soleimani, B.; Barekatain, B. Heuristic Algorithms for Task Scheduling in Cloud Computing: A Survey. Int. J. Comput. Netw. Inf. Secur. 2017, 9, 16–22.
  17. Aladwani, T. Types of task scheduling algorithms in cloud computing environment. In Scheduling Problems: New Applications and Trends; IntechOpen: London, UK, 2020.
  18. Mao, Y.; Chen, X.; Li, X. Max–min task scheduling algorithm for load balance in cloud computing. In Advances in Intelligent Systems and Computing, Proceedings of the International Conference on Computer Science and Information Technology, Kunming, China, 21–23 September 2013; Springer: New Delhi, India, 2014; pp. 457–465.
  19. Anousha, S.; Ahmadi, M. An improved Min-Min task scheduling algorithm in grid computing. In Lecture Notes in Computer Science, Proceedings of the International Conference on Grid and Pervasive Computing, Seoul, Korea, 9–11 May 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 103–113.
  20. Raj, A.; Kaur, K.; Dutta, U.; Sandeep, V.V.; Rao, S. Enhancement of hadoop clusters with virtualization using the capacity scheduler. In Proceedings of the 2012 Third International Conference on Services in Emerging Markets, Mysore, India, 12–15 December 2012; pp. 50–57.
  21. Yadav, R.K.; Mishra, A.K.; Prakash, N.; Sharma, H. An improved round robin scheduling algorithm for CPU scheduling. Int. J. Comput. Sci. Eng. 2010, 2, 1064–1066.
  22. Casanova, H.; Legrand, A.; Zagorodnov, D.; Berman, F. Heuristics for scheduling parameter sweep applications in grid environments. In Proceedings of the 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No. PR00556), Cancun, Mexico, 1 May 2000; pp. 349–363.
  23. Xu, X.; Xue, S.; Shi, W. A Heuristic Scheduling Algorithm based on PSO in the Cloud Computing Environment. Int. J. u-and e-Serv. 2016, 9, 349–362.
  24. Fong, S.; Wong, R.; Vasilakos, A.V. Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data. IEEE Trans. Serv. Comput. 2016, 9, 33–45.
  25. Hamad, S.A.; Omara, F.A. Genetic-based task scheduling algorithm in cloud computing environment. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 550–556.
  26. Fidanova, S.; Durchova, M. Ant algorithm for grid scheduling problem. In Lecture Notes in Computer Science, Proceedings of the International Conference on Large-Scale Scientific Computing, Sozopol, Bulgaria, 6–10 June 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 405–412.
  27. Tripathy, B.; Dash, S.; Padhy, S.K. Dynamic task scheduling using a directed neural network. J. Parallel Distrib. Comput. 2015, 75, 101–106.
  28. Jena, R.K. Multi objective task scheduling in cloud environment using nested PSO framework. Procedia Comput. Sci. 2015, 57, 1219–1227.
  29. Kiani, F.; Seyyedabbasi, A.; Nematzadeh, S.; Candan, F.; Çevik, T.; Anka, F.A.; Randazzo, G.; Lanza, S.; Muzirafuti, A. Adaptive Metaheuristic-Based Methods for Autonomous Robot Path Planning: Sustainable Agricultural Applications. Appl. Sci. 2022, 12, 943.
  30. Elmougy, S.; Sarhan, S.; Joundy, M. A novel hybrid of Shortest job first and round Robin with dynamic variable quantum time task scheduling technique. J. Cloud Comput. 2017, 6, 12.
  31. Manasrah, A.M.; Ba Ali, H. Workflow Scheduling Using Hybrid GA-PSO Algorithm in Cloud Computing. Wirel. Commun. Mob. Comput. 2018, 2018, 1934784.
  32. Choudhary, A.; Gupta, I.; Singh, V.; Jana, P.K. A GSA based hybrid algorithm for bi-objective workflow scheduling in cloud computing. Future Gener. Comput. Syst. 2018, 83, 14–26.
  33. Huang, Y.; Chen, Z.; Tao, Y.U.; Huang, X.; Gu, X. Agricultural remote sensing big data: Management and applications. J. Integr. Agric. 2018, 17, 1915–1931.
  34. Ma, X.; Wang, Z.; Bai, L.; Xu, B.; Gao, J.; Wen, B.; Tao, J. Implementation of a Federated Large-Scale Remote Sensing Data Sharing Platform. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 5771–5774.
  35. Guo, J.; Huang, C.; Hou, J. A Scalable Computing Resources System for Remote Sensing Big Data Processing Using GeoPySpark Based on Spark on K8s. Remote Sens. 2022, 14, 521.
  36. Yu, Z.; Wang, Z.; Bai, L.; Chen, L.; Tao, J. Remote Sensing Inversion of PM10 Based on Spark Platform. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 1685–1688.
  37. Chebbi, I.; Boulila, W.; Mellouli, N.; Lamolle, M.; Farah, I.R. A comparison of big remote sensing data processing with Hadoop MapReduce and Spark. In Proceedings of the 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 21–24 March 2018; pp. 1–4.
  38. An, Q.; Hu, Q.; Tang, R.; Rao, L. Intelligent Scheduling Methodology for UAV Swarm Remote Sensing in Distributed Photovoltaic Array Maintenance. Sensors 2022, 22, 4467.
  39. Wu, X.; Yang, Y.; Sun, Y.; Xie, Y.; Song, X.; Huang, B. Dynamic regional splitting planning of remote sensing satellite swarm using parallel genetic PSO algorithm. Acta Astronaut. 2022, in press.
  40. Alkayal, E.S.; Jennings, N.R.; Abulkhair, M.F. Efficient task scheduling multi-objective particle swarm optimization in cloud computing. In Proceedings of the 2016 IEEE 41st Conference on Local Computer Networks Workshops (LCN Workshops), Dubai, United Arab Emirates, 7–10 November 2016; pp. 17–24.
  41. Gabi, D.; Ismail, A.S.; Zainal, A.; Zakaria, Z.; Al-Khasawneh, A. Cloud scalable multi-objective task scheduling algorithm for cloud computing using cat swarm optimization and simulated annealing. In Proceedings of the 2017 8th International Conference on Information Technology (ICIT), Amman, Jordan, 17–18 May 2017; pp. 1007–1012.
  42. Xing, L.; Li, W.; He, M.; Tan, X. Comprehensive multi-objective model to remote sensing data processing task scheduling problem. Concurr. Comput. Pract. Exp. 2017, 29, e4248.
  43. Chen, H.; Du, C.; Li, J.; Jing, N.; Wang, L. An approach of satellite periodic continuous observation task scheduling based on evolutionary computation. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Berlin, Germany, 15–19 July 2017; pp. 15–16.
  44. Sun, J.; Li, H.; Zhang, Y.; Xu, Y.; Zhu, Y.; Zang, Q.; Wu, Z.; Wei, Z. Multiobjective task scheduling for energy-efficient cloud implementation of hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 587–600.
  45. Zhao, Y.; Tian, C.; Fan, J.; Guan, T.; Qiao, C. RPC: Joint Online Reducer Placement and Coflow Bandwidth Scheduling for Clusters. In Proceedings of the International Conference on Network Protocols, ICNP, Cambridge, UK, 25–27 September 2018.
  46. Duggan, M.; Shaw, R.; Duggan, J.; Howley, E.; Barrett, E. A multitime-steps-ahead prediction approach for scheduling live migration in cloud data centers. Softw.-Pract. Exp. 2019, 49, 617–639.
  47. Chao, S.C.; Lin, K.C.J.; Chen, M.S. Flow Classification for Software-Defined Data Centers Using Stream Mining. IEEE Trans. Serv. Comput. 2019, 12, 105–116.
  48. Calheiros, R.N.; Ranjan, R.; Beloglazov, A.; De Rose, C.A.F.; Buyya, R. CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw.-Pract. Exp. 2011, 41, 23–50.
Figure 1. Task scheduling architecture diagram.
Figure 2. The working state of nodes.
Figure 3. Optimal fitness values of different algorithms.
Figure 4. Convergence of different algorithms.
Figure 5. Results of task execution of different algorithms.
Table 1. List of node resources.

| Node Number | Download Power (kB/s) | Energy Consumption per Unit Time (Wh) | Unit Task Cost (CNY) |
|---|---|---|---|
| 1 | 2914.7 | 300 | 2 |
| 2 | 5049.45 | 330 | 4 |
| 3 | 4028.56 | 320 | 3 |
| 4 | 1531.6 | 285 | 1 |
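Table 2. Task execution results of different algorithms.

| Number of Tasks | Algorithm | Node 1 Time (min) | Node 1 Tasks | Node 2 Time (min) | Node 2 Tasks | Node 3 Time (min) | Node 3 Tasks | Node 4 Time (min) | Node 4 Tasks |
|---|---|---|---|---|---|---|---|---|---|
| 50 | FCFS_PSO | 3544 | 14 | 2254 | 18 | 1861 | 14 | 5246 | 4 |
| 50 | SJF_PSO | 3027 | 14 | 1874 | 17 | 2426 | 14 | 5998 | 5 |
| 50 | MAX_MAX_PSO | 3152 | 14 | 1935 | 16 | 2349 | 14 | 5761 | 6 |
| 50 | PSO | 3543 | 13 | 2253 | 13 | 1861 | 12 | 5246 | 12 |
| 100 | FCFS_PSO | 5675 | 33 | 3858 | 31 | 3663 | 23 | 9776 | 13 |
| 100 | SJF_PSO | 5691 | 28 | 3166 | 34 | 4041 | 26 | 11,032 | 12 |
| 100 | MAX_MAX_PSO | 5586 | 30 | 3346 | 33 | 4117 | 22 | 10,439 | 15 |
| 100 | PSO | 5676 | 24 | 3858 | 32 | 3663 | 29 | 9776 | 15 |
| 150 | FCFS_PSO | 8951 | 45 | 6065 | 54 | 5805 | 33 | 14,807 | 18 |
| 150 | SJF_PSO | 8651 | 44 | 5128 | 47 | 6485 | 36 | 16,678 | 23 |
| 150 | MAX_MAX_PSO | 8764 | 37 | 5174 | 50 | 6427 | 42 | 16,464 | 21 |
| 150 | PSO | 8951 | 43 | 6065 | 50 | 5805 | 33 | 14,807 | 24 |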
Table 3. Task execution results of different algorithms.

| Number of Tasks | Algorithm | Average Task Execution Time (min) | Optimum Fitness Value | Total Task Execution Time (min) | Total Task Energy Consumption (Wh) | Total Task Cost (CNY) |
|---|---|---|---|---|---|---|
| 50 | FCFS_PSO | 258 | 2766 | 12,905 | 3,897,650 | 146 |
| 50 | SJF_PSO | 266 | 2768 | 13,324 | 4,012,270 | 143 |
| 50 | MAX_MAX_PSO | 264 | 2657 | 13,196 | 3,977,715 | 140 |
| 50 | PSO | 258 | 2757 | 12,905 | 3,897,020 | 126 |
| 100 | FCFS_PSO | 230 | 4912 | 22,972 | 6,933,960 | 278 |
| 100 | SJF_PSO | 239 | 4876 | 23,930 | 7,189,320 | 282 |
| 100 | MAX_MAX_PSO | 235 | 4653 | 23,488 | 7,072,535 | 273 |
| 100 | PSO | 229 | 5768 | 22,972 | 6,934,260 | 278 |
| 150 | FCFS_PSO | 238 | 9365 | 35,629 | 10,764,345 | 423 |
| 150 | SJF_PSO | 246 | 8397 | 36,943 | 11,115,970 | 407 |
| 150 | MAX_MAX_PSO | 245 | 7844 | 36,830 | 11,085,500 | 421 |
| 150 | PSO | 237 | 8946 | 35,628 | 10,764,345 | 409 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wang, Z.; Bai, L.; Liu, X.; Chen, Y.; Zhao, M.; Tao, J. Dynamic Task Scheduling in Remote Sensing Data Acquisition from Open-Access Data Using CloudSim. Appl. Sci. 2022, 12, 11508. https://doi.org/10.3390/app122211508
