A Greedy Scheduling Approach for Peripheral Mobile Intelligent Systems

Smart, pervasive devices have recently experienced accelerated technological development in the fields of hardware, software, and wireless connections. The promotion of various kinds of collaborative mobile computing requires an upgrade in network connectivity with wireless technologies, as well as enhanced peer-to-peer communication. Mobile computing also requires appropriate scheduling methods to speed up the implementation and processing of various computing applications by better managing network resources. Scheduling techniques are relevant to the modern architectural models that support the IoT paradigm, particularly smart collaborative mobile computing architectures at the network periphery. In this regard, load-balancing techniques have also become necessary to exploit all the available capabilities and thus speed up execution. However, since the problem of scheduling and load balancing, which we addressed in this study, is known to be NP-hard, a heuristic approach is well justified. We thus designed and validated a greedy scheduling and load-balancing algorithm to improve the utilization of resources. We conducted a comparison study with the longest cloudlet fastest processing (LCFP), shortest cloudlet fastest processing (SCFP), and Min-Min heuristic algorithms. The choice of these three algorithms is based on the efficiency and simplicity of their mechanisms for allocating tasks to devices, as reported in the literature. The simulation we conducted showed the superiority of our approach over those algorithms with respect to the overall completion time criterion.


Introduction
Continual and accelerated innovation in communications and information technologies has led to an expansion of the modern Internet and its applications and extensions, such as the Internet of things (IoT). The IoT continues to grow rapidly. It has become the basis for the so-called Fourth Industrial Revolution and the digital transformation of business and society [1]. It is an emerging paradigm aimed at providing appropriate and smart systems of objects [2]. Making objects smart may be interpreted in two ways. The first interpretation is about entrusting computing power to an object, making it autonomous by allowing it to make its own choices. The second interpretation, for its part, consists of allowing objects to communicate with the outside world and, if necessary, to communicate with machines that can calculate and make decisions. These machines can be located in cloud centers to limit their economic impact [3]. However, the proliferation of connected objects generates a huge amount of data that is difficult for resource-limited IoT objects to manage. In addition, the exchange and security of this data is a challenge when it is downloaded and uploaded between peripheral devices and cloud server centers. In this regard, artificial intelligence (AI) can make a significant contribution to solving this problem, where it can become, along with the IoT, an important achievement [4]. The IoT supports a wide range of distributed systems that are interconnected with various devices such as servers, database centers, and computers. This is in addition to more powerful and compact portable devices such as smart cell phones and personal digital assistants (PDAs) [5]. It is obvious to researchers and specialists that the demand for these devices has increased steadily. Unfortunately, technological progress has not always followed this tendency. Therefore, scheduling and load balancing techniques [6][7][8][9] have been developed to close this technological gap.
The scheduling process is concerned with the allocation of tasks to devices over time, whatever the environment may be (deterministic, stochastic, or online), with respect to a given criterion. For the sake of clarity, let us recall the definitions of these environments [10]:
1. Deterministic scheduling: the data defining the problem are known in advance;
2. Stochastic scheduling: all or most of the parameters that describe the problem are random variables with known distributions;
3. Online scheduling: all or some of the parameters describing the model become known only at the time the decision must be made.
Improving the scheduling process requires a better understanding of the environment under study and depends on whether it is static (fixed and non-modifiable scheduling) or dynamic (re-evaluated online to respond to changes). Likewise, the environment treatment criteria on a local or distributed level can be classified into different types: (1) hard constraints (the constraints must be strictly respected), (2) soft constraints (there is a flexibility with respect to the constraints), (3) preemptive (tasks may be interrupted before completion), or (4) non-preemptive (tasks are executed without interruption until completion) [11][12][13].
Scheduling problems occur in various real-world applications, such as industrial production, hoist scheduling, airport control towers, assignment of processes to processors, and data transfer services. Scheduling problems were originally studied in the industrial sector, such as the manufacture and assembly of large generators, and this led to the introduction of critical path analysis (CPA) and the critical path method (CPM) [14,15]. Recently, scheduling problems have emerged as an active field of research in the IoT community [16] and in mobile collaborative computing models via architectural models of cloud, fog, and edge computing [6,17]. As these emerging architectural models of computing are distributed systems, we focused our study on scheduling parallel machine models.
The development of mobile devices and communication technologies has contributed to their growing utilization in collaborative applications that are complex and resource-intensive [18][19][20]. This has led to innovative designs in mobile collaborative computing and interaction techniques between devices. Among the most important closely related techniques are those for the scheduling and load balancing of tasks. Indeed, the scheduling problem consists of organizing the task execution order on ready devices, while load balancing aims to balance the tasks between devices [11]. As pointed out in Mishra et al. [21], load balancing determines where certain applications should be executed; it is used for distributing workloads across computing resources.
In this paper, we focus on establishing collaborative mobile systems based on modern technologies such as Pycom's LOPY4 and direct radio communication between devices to maintain wireless connectivity in different environments, even the harshest ones [22,23]. In the present study, we develop a new scheduling algorithm based on a greedy approach, which is well suited to these environments in terms of speed, quality, and ease of decision-making at every step [24][25][26]. This approach enables small mobile devices to process tasks that match their capabilities despite the resource constraints they face in the IoT paradigm. Therefore, we adopted a strategy that consists of dividing the task and device groups into corresponding subgroups in terms of task size and device capacity. This algorithm aims to ensure load balancing through a technique that chooses, among these corresponding subgroups, the devices with the longest execution time first to allocate them to the appropriate tasks.
We conducted an experimental study (simulation) to compare, with respect to criteria based on the overall completion times, three existing solutions with our approach to improve the scheduling and load balancing techniques on the available devices of these systems. Let us note that the choice of these three algorithms was mainly based on their simplicity, efficiency, and speed, as well as their mechanisms for allocating devices to tasks [27,28]. We believe that the present study provides a path to further research on the development and efficient use of mobile and pervasive computing devices in crucial and urgent situations, such as highly intensive communications that require high bandwidth, smart healthcare, smart cities, harsh cases, and smart cars. This paper is organized as follows. Section 2 presents the related works. Section 3 introduces greedy algorithms as a solution in our approach. Section 4 describes our contribution to scheduling and load balancing in peripheral autonomous mobile systems. Through an intensive experimental study, Section 5 validates our approach, then analyzes and undertakes a comparison between our solution and three other algorithms. Finally, Section 6 presents our concluding remarks and perspectives.

Related Works
In this section, we present a review of the literature on scheduling techniques applied in the IoT field. The algorithms presented in [11] show that the tendencies in this area are evolving more toward approximate (near-optimal) solutions that are generated within a reasonable time. The main reason for these trends is that most scheduling problems are difficult to solve from the computational point of view [29].
Le et al. [30] pointed out that source code optimization is an emerging technique used to reduce energy consumption in parallel with increasing the storage and power capacity of mobile devices. This improvement may be achieved by partitioning tasks into smaller tasks.
The classification of task scheduling algorithms differs depending on the viewpoints of the authors. The algorithms fall into six categories [31], which are presented in Table 1.

Table 1. Classification of task scheduling algorithms.

Immediate scheduling: New tasks are scheduled directly on VMs upon their arrival.
Batch scheduling of tasks: Tasks are pregrouped into batches before they are dispatched.
Static scheduling: Scheduling strategies in this environment are usually based on information known beforehand about the system's global state.
Dynamic scheduling: Does not rely on prior information about the global state of the system; tasks are distributed according to the current capacity of the available VMs.
Preemptive scheduling: Portions of a task executed on a resource may be interrupted and resumed later on the same or another resource.
Non-preemptive scheduling: The entire task is processed without interruption until its completion on the same resource.
In cloud computing, there are three levels suitable for the task scheduling system [31]:
1. The first level is the set of tasks (cloudlets) to be executed;
2. The second level is the process of appropriately allocating resources to tasks to optimize the use of these resources with respect to the makespan or overall completion times;
3. The third level is the use of a set of virtual machines (VMs) to perform tasks.
In Sharma et al. [32], it is stated that several task scheduling techniques are available for scheduling tasks in a cloud environment, under the following categories:
1. Heuristic techniques can be divided into two subcategories. The first consists of traditional techniques for scheduling various tasks, such as FCFS, round robin (RR), and SJF. These approaches are simple and imperative, but they tend to converge to local optima [33]. The second subcategory uses random sampling to find optimal or near-optimal solutions; its techniques include Min-Min, Max-Min, improved Max-Min [34], and priority-based Min-Min. These techniques generate better results than the traditional approaches [35];
2. Metaheuristic algorithms generally have functionalities resembling aspects of biological science. They are classified into three categories: (1) metaheuristics based on gene transfer, such as genetic and transgenic algorithms; (2) metaheuristics based on insect behaviors and their interactions, such as ant colony optimization, the firefly algorithm, the bee mating optimization algorithm, and the bee colony algorithm; and (3) metaheuristics based on aspects of biological life, such as the tabu search algorithm, the simulated annealing algorithm, particle swarm optimization, and artificial immune systems [36].
The main difference between heuristic and metaheuristic approaches is that the former are problem-specific and generate, step by step, only one solution, while the latter are problem-independent and generate several solutions [32].
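To make the heuristic subcategory above concrete, here is a minimal sketch of the Min-Min strategy in Python. The completion-time matrix, task indices, and tie-breaking rule are illustrative assumptions, not taken from a specific implementation in the literature.

```python
# Minimal sketch of the Min-Min heuristic: repeatedly pick the task whose
# minimum completion time (over all resources) is smallest, assign it to
# that resource, and update the resource's ready time.

def min_min(exec_times):
    """exec_times[t][r] = execution time of task t on resource r."""
    n_res = len(exec_times[0])
    ready = [0.0] * n_res            # ready time of each resource
    unassigned = set(range(len(exec_times)))
    schedule = {}                    # task -> resource

    while unassigned:
        # Among all unassigned tasks, find the (task, resource) pair
        # with the smallest completion time.
        t, r, ct = min(
            ((t, r, ready[r] + exec_times[t][r])
             for t in unassigned for r in range(n_res)),
            key=lambda x: x[2],
        )
        schedule[t] = r
        ready[r] = ct
        unassigned.remove(t)

    return schedule, max(ready)      # assignment and makespan

# Example: 3 tasks on 2 resources.
times = [[4, 6], [3, 8], [5, 2]]
plan, makespan = min_min(times)      # task 2 goes first (completion time 2)
```

Running the example, task 2 is scheduled first on resource 1, then tasks 1 and 0 on resource 0, giving a makespan of 7.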
Recently, improvements made to scheduling algorithms in cloud computing were discussed and investigated extensively in the literature (see, for example, Sharma et al. [32]). The analysis of scheduling algorithms is summarized in Scheme 1.
Effective scheduling algorithm

Sindhu and Mukherjee proposed two algorithms to minimize the completion time of cloudlets or tasks, namely longest cloudlet fastest processing (LCFP) and shortest cloudlet fastest processing (SCFP) [37].

Min-Min algorithm
Kaur et al. [38] proposed an improved Min-Min algorithm which aims to maximize the utilization of resources in a distributed environment. It adopts the traditional Min-Min algorithm as a first stage, minimizing completion time. In the second stage, tasks are reassigned so that resources that would otherwise remain idle for an extended period of time are utilized.

Max-Min algorithm
It combines two algorithms: a task with a longer-than-average execution time is selected, instead of the largest task, and assigned to the resource with the lowest completion time [39].

Selective algorithm
To ensure QoS, it takes advantage of the basic Min-Min and Max-Min algorithms, defining the selective parameter as the standard deviation of the completion times of the unassigned tasks in the meta-task [40].

Bandwidth-aware scheduling algorithm

Razzaque et al. [42] proposed this algorithm to achieve the appropriate use of network bandwidth in the cloud environment, using a non-linear programming model to allocate a suitable number of tasks to each virtual machine.
Cloud-based workflow scheduling algorithm

Rimal et al. [43] proposed this algorithm to improve workflow scheduling, in a new service layer (the fourth, above the infrastructure layer), in a multitenant cloud environment. Moreover, this algorithm aims to reduce resource utilization time and makespan while maximizing resource utility and minimizing cost.

QoS-based scheduling algorithm
Wu et al. [44] proposed this algorithm to optimize QoS in cloud computing by forming a list of all tasks ordered by priority in terms of their minimum completion time. Three indices are considered to measure its performance: (i) makespan, (ii) the average waiting time of the longest task, and (iii) the load balancing index (LBI), which is used to determine the loading conditions of the system and its maximum capacity [45].
Multiple factor task scheduling algorithms

Bansal et al. [46] compared traditional scheduling techniques (such as FCFS, QoS-driven, and activity-based costing) with priority-based algorithms. They concluded that the cost factor and the use of resources were the essential criteria for a better treatment of scheduling algorithms.

Scheme 1. Categories of several task scheduling techniques in the cloud environment [37][38][39][40][41][42][43][44][45][46].

The need for the fair sharing of resources in wireless networks increases with both the increased demand for their use and the cost of their equipment, as it becomes difficult to install more devices (Sherin et al. [47]). Therefore, the scheduling process becomes the most important factor for improving wireless resource management. Through this scheduling process, multiple users of the system can access their shared resources efficiently. These resources are mostly limited when compared with the increasing number of users. Scheduling algorithms offer equitable user access and are essential for ensuring the provision of quality of service (QoS). Let us observe that researchers and practitioners are becoming more concerned with packet scheduling algorithms, which are highly essential for QoS. The design of scheduling algorithms has become more complex due to several factors, including the dynamism of mobile ad hoc networks (MANETs) and vehicular ad hoc networks (VANETs), with frequent topology changes and breakdowns in connectivity. A MANET has many applications, including disaster management, emergency relief, VANET services, battlefield communications, mobile teleconferencing, and electronic payments. QoS is essential in these real-time applications due to the limited resources and dynamic topology. QoS indicates the service performance level provided by the network to the end user.
Depending on the application, it can be based on bandwidth, delay, packet loss, throughput, overhead, jitter, and so on.
One of the key features of ad hoc networks is their capability of operating without a standard infrastructure, and they can be deployed easily for various applications. However, it is difficult to provide QoS for all packets because of the dynamic nature of nodes [48]. Generally, scheduling algorithms manage changes in queuing dynamics and indicate the packets to be submitted from the queue. To ensure the QoS parameters, these algorithms are designed based on network requirements [49]. There are two major problems in multi-hop wireless networks: the scheduling of packets (packages of information, in computer networking jargon [48]) and the scheduling of channels. In this area, scheduling algorithms, generally considered to be non-preemptive, are based on analysis of QoS parameters such as the throughput rate, fairness in the network, bounded delay, and jitter. The traditional scheduling approach, which takes advantage of priority lists, is mainly used in mobile ad hoc networks.
Wireless scheduling algorithms in mobile ad hoc networks are classified according to their application domains into two general classes: packet scheduling and channel-aware scheduling [47], as presented in Tables 2 and 3 and Scheme 2.

Table 2. Class of channel-aware scheduling algorithms in mobile ad hoc networks.

Class: channel-aware scheduling.

CaSMA: Concentrates on the awareness and coordination of the end-to-end channel condition to reduce the accumulation of packets in the network and avoid congestion by increasing the number of completely served packets.

Channel-aware AOMDV: Uses a preemptive handoff technique and the non-fade duration to select the path during routing. Each node maintains a table with information about the signal strength of the previously received packet.

PALM: Power-aware link maintenance is based on an ad hoc on-demand distance vector routing algorithm and is responsible for power control with route maintenance. It establishes the routing mechanism in MANETs (mobile ad hoc networks).

Table 3. Class of techniques for packet scheduling algorithms in mobile ad hoc networks.

Class: packet scheduling (existing techniques).

FIFO: In first-in, first-out, all packets are inserted into a single queue and processed in the order of their arrival. The delay is directly proportional to the length of the queue.

Priority queuing: Packets are categorized and grouped into queues with different priorities. High-priority packets are processed first, while low-priority packets are likely to be dropped.

WFQ: In weighted fair queuing, packets are scheduled according to their bandwidth requirements and placed in the respective queues. The packet with the smallest finish time is chosen as the next packet for transmission.

CBWFQ: Class-based weighted fair queuing extends the WFQ functionality and supports user-defined traffic classes. CBWFQ services each class queue fairly, based on the weight assigned to the queued packets.

WHS: Weighted-hop scheduling gives high priority to data packets with only a few remaining hops to pass. A weighted round robin scheduler is used instead of static priority to give all service classes a chance.

WDS: The weighted distance scheduling algorithm considers the physical distance using GPSR, where each data packet contains the destination address.

RR: The round robin scheduling algorithm maintains per-flow queues and provides equal service opportunity among flows.

Greedy scheduling: Each node forwards its own packets before the other nodes' packets, which are processed based on FIFO scheduling.

LLQ: Low-latency queuing has a single strict priority queue in which separate traffic classes are placed. This queue allows delay-sensitive traffic to be treated before other queues are processed. All other queues are regulated by a (percentage of) bandwidth.
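To make the priority-queuing discipline from Table 3 concrete, here is a minimal, illustrative sketch in Python. The packet names and priority levels are hypothetical; the only behavior shown is that the highest-priority non-empty queue is always served first, with FIFO order within a priority level.

```python
import heapq

# Minimal sketch of priority queuing: packets are classified into
# priority levels; the dispatcher always serves the highest-priority
# packet first (lower number = higher priority). A sequence counter
# preserves FIFO order among packets of equal priority.

class PriorityScheduler:
    def __init__(self):
        self._heap = []            # (priority, seq, packet)
        self._seq = 0

    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        # Returns the next packet to transmit, or None if idle.
        return heapq.heappop(self._heap)[2] if self._heap else None

sched = PriorityScheduler()
sched.enqueue("telemetry", priority=2)
sched.enqueue("voice", priority=0)     # delay-sensitive traffic first
sched.enqueue("bulk", priority=5)
order = [sched.dequeue() for _ in range(3)]
```

As described in Table 3, the low-priority "bulk" packet is served last and would be the first candidate to drop under congestion.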

Scheme 2. Class of recent techniques for packet scheduling algorithms in mobile ad hoc networks.

Greedy Approach
Greedy algorithms are a problem-solving paradigm in which locally optimal choices are made at every step, in the expectation that they yield an optimal overall solution. However, in many problems, they only produce near-optimal solutions [50]. Let us recall that the greedy approach is well justified for NP-hard problems and is well suited to solving real-world problems. It is worth noting, moreover, that for problems with an optimal substructure, greedy algorithms are able to find globally optimal solutions [51].
For the objective of minimizing an unloading cost function, Mazouzi et al. [52] studied discharging policies to choose the tasks that must be discharged before selecting the assigned cloudlet, depending on the network and system resources. The unloading cost is declared to be a combination of task execution time and energy consumption. This problem is represented with mixed binary programming. As this problem is NP-hard, a distributed linear relaxation-based heuristic approach is proposed. Likewise, a greedy heuristic algorithm is proposed to solve the subproblems, and it calculates the best cloudlet selection and bandwidth allocation based on the task unloading costs.
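As an illustration only, and not the exact formulation of Mazouzi et al. [52], a greedy offloading decision driven by a combined time-and-energy cost can be sketched as follows. The cost model, field names, and the alpha weight are our assumptions.

```python
# Hedged sketch of greedy cloudlet selection: for each task, pick the
# cloudlet minimizing an unloading cost combining execution time,
# transfer time, and (weighted) transmission energy. This is a toy
# stand-in for the mixed-binary-programming formulation in the text.

def greedy_offload(tasks, cloudlets, alpha=0.5):
    """tasks: list of (cpu_cycles, data_bits); cloudlets: list of dicts
    with 'speed' (cycles/s), 'bandwidth' (bits/s), and optionally
    'tx_energy_per_bit' (J/bit)."""
    assignment = []
    for cycles, data in tasks:
        best = min(
            range(len(cloudlets)),
            key=lambda c: (
                cycles / cloudlets[c]["speed"]        # execution time
                + data / cloudlets[c]["bandwidth"]    # transfer time
                + alpha * data * cloudlets[c].get("tx_energy_per_bit", 0.0)
            ),
        )
        assignment.append(best)
    return assignment

cloudlets = [{"speed": 1e9, "bandwidth": 1e6},
             {"speed": 2e9, "bandwidth": 5e5}]
choice = greedy_offload([(1e9, 1e6)], cloudlets)
```

In the example, the slower but better-connected cloudlet wins (cost 2.0 s versus 2.5 s), illustrating how the greedy rule trades computation speed against bandwidth.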
Ayanzadeh et al. [51] proposed a novel hybrid approach, the quantum-assisted greedy algorithm (QAGA), to improve the performance of physical quantum annealers in the process of finding the global minimum of an Ising Hamiltonian model. Ising models are widely used in many areas of science; a Hamiltonian equation is routinely applied to, among other subjects, alloy thermodynamics and the thermal properties of solids [53]. The QAGA algorithm leverages quantum annealers to better select candidates at each stage of the greedy algorithm. In fact, it uses a quantum annealer at each step to draw samples of the ground state of the problem. These samples are used to estimate the probability distribution of the problem variables. It then fixes the variables with insignificant uncertainties and moves to the next step, where quantum annealing solves a smaller problem with sparser couplings [51]. In their experimental study, the authors used a D-Wave 2000Q quantum computer with 2000 qubits and new control features to solve larger problems. The results showed that the QAGA approach could find samples with remarkably lower energy values compared to the better-known improvements in the field of quantum annealing.
In their study, Durmus et al. [54] aimed to support the assumption that the classic algorithms generate optimal solutions while the greedy heuristic algorithms generate proximate solutions. They argued that as the dimensions of problems and the number of constraints in integer programs grow, it becomes difficult for the classic algorithms to provide appropriate results. However, regardless of the dimensions or the number of constraints, the greedy approach generates appropriate results. As shown in the literature, both the former and the newer versions, such as the ones proposed by Akçay et al. [55] and Zhou et al. [56], work efficiently. Indeed, greedy algorithms are more efficient in comparison with other complex algorithms; they produce solutions to everyday problems within a reasonable time. Durmus et al. [54] applied greedy and some classical algorithms to integer linear programming problems and then compared the differences and similarities of the obtained results. Throughout this work, they classified the most-used algorithms in the literature, as displayed in Table 4, to which we added other greedy approaches and algorithms [51,57].

Table 4. Algorithm classifications and descriptions.

Classical algorithms (exact solution algorithms): rounding and the graphical method, the cutting plane method, branch and bound, the Balas and Lagrange methods, branch and cut, the Benders decomposition method, and all-integer integer programming. These algorithms are known in the literature as exact solution algorithms. Their drawback is often their high computational cost; however, they are effective for small and medium instances.

Metaheuristic algorithms: These methods provide solutions close to the optimal one. They can solve a large variety of problems, are conceptually simple, and can be adapted flexibly to the problem under study. For example, in genetic algorithms, an analogy is developed between an individual in a population and a solution of a problem in the global solution space. The simulated annealing method is inspired by the process used in metallurgy to cool down steel. Likewise, particle swarm optimization is a cooperative, population-based stochastic global search technique used for solving multimodal continuous optimization problems. Moreover, the tabu search algorithm selects a new search movement in such a way that it temporarily forbids the evaluation of previous solutions [11].

Greedy algorithms: These are approaches or techniques in which only one choice is considered at each stage; their strategy is to seek the best option for the current state. It is known in the literature that they may produce globally optimal solutions; we may cite the Dijkstra, Kruskal, Prim, and Huffman algorithms [58]. They are the most widely used algorithms in everyday life. They tackle problems with a given objective function, selecting at each stage the variable that yields the most benefit. They are quite easy to apply and implement, their computational costs are quite low, and they can be applied to many kinds of problems.
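As a concrete instance of the greedy algorithms cited in Table 4, Kruskal's minimum spanning tree algorithm makes one locally optimal choice per stage (the cheapest cycle-free edge) and is nevertheless globally optimal, illustrating the optimal-substructure case mentioned earlier. The graph below is an illustrative example.

```python
# Kruskal's algorithm: at each stage, greedily pick the cheapest edge
# that does not create a cycle (checked with a union-find structure).
# For minimum spanning trees, this greedy choice is globally optimal.

def kruskal(n, edges):
    """n vertices labeled 0..n-1; edges = [(weight, u, v), ...]."""
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):     # greedy: cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # no cycle -> keep the edge
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3)]
tree, cost = kruskal(4, edges)        # spanning tree of total weight 6
```

The expensive edge of weight 4 is rejected because its endpoints are already connected, exactly the stage-wise "best for the current state" behavior described above.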

Scheduling Algorithms in Peripheral Autonomous Mobile and Pervasive Systems
The scheduling problems we discuss in this paper generally occur in the context of collaborative pervasive architectures within mobile and intelligent distributed systems. We focus our study particularly on the emerging architectures and technologies introduced in recent years, especially at the periphery of autonomous mobile systems.

Autonomous Mobile and Pervasive Architecture at the Periphery
The problem with these architectures is how to make autonomous smart mobile systems more efficient in terms of completing tasks quickly while respecting their limitations (energy consumption, data storage, computing, connection preservation, and mutual charging between terminals, cloud centers, or other structures). The present study aims to address these issues efficiently in terms of partitioning, scheduling, and load balancing, relying solely on these devices to accomplish the required tasks. Given their modest capabilities, this calls for innovation, on the one hand, in the system architecture in terms of equipment and maintaining connectivity and, on the other hand, in improving the scheduling and load balancing processes [59]. The scheduling problem we address in this study consists of transferring as many tasks as possible to be managed on the appropriate devices of a peripheral autonomous mobile system. This problem is described as follows.
Assume there are m available devices d_i, i = 1, . . . , m, and n tasks t_j, j = 1, . . . , n. With no priority, the n tasks are processed by the m devices according to their due dates ddt_j, j = 1, . . . , n. The devices are also assigned to the tasks based on each device's ability to accomplish the task: a device is eligible when its free execution time is greater than the task's execution time, as illustrated by Table 5.

Table 5. Parameters of devices and tasks.

As usual, we assume that no device may process more than one task at a time, and no task is processed by more than one device at a time. We seek an assignment of these tasks to these devices mainly to minimize the overall completion time. This kind of problem is known in scheduling theory as the parallel machine problem, which is known to be NP-hard with respect to most of the scheduling criteria, including the criterion we are addressing [29].
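The parallel machine model above can be written down compactly. The following sketch is illustrative (field names and the example values are our own); it encodes the eligibility rule and evaluates an assignment by its makespan.

```python
from dataclasses import dataclass

# Minimal model of the problem statement: m devices, n tasks, each task
# runs on exactly one device, each device runs one task at a time, and
# an assignment is evaluated by its overall completion time (makespan).

@dataclass
class Task:
    exec_time: float     # estimated execution time
    due_date: float      # ddt_j in the text

@dataclass
class Device:
    free_time: float     # remaining free execution time

def feasible(task, device):
    # Eligibility rule from the text: the device's free execution time
    # must exceed the task's execution time.
    return device.free_time > task.exec_time

def makespan(assignment, tasks, m):
    """assignment[j] = device index for task j; returns the max load."""
    load = [0.0] * m
    for j, i in enumerate(assignment):
        load[i] += tasks[j].exec_time
    return max(load)

tasks = [Task(3, 10), Task(2, 8), Task(4, 12)]
span = makespan([0, 1, 0], tasks, m=2)   # device 0 carries tasks 0 and 2
```

Minimizing `span` over all feasible assignments is exactly the NP-hard parallel machine problem cited above, which is why the next subsection resorts to a greedy heuristic.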

Scheduling Algorithms Based on a Greedy Approach
By comparing different classes of scheduling algorithms, we concluded that greedy algorithms are appropriate for performing tasks on mobile systems within the autonomous mobile architecture at the network's periphery or edges. The reason is that these systems are compatible with greedy algorithms, which are suited for use as a selection algorithm to prioritize options in a search in simple problems. They are generally used in situations where the number of possibilities for improvement is too large to be considered in a meaningful way within a reasonable time [10]. Consequently, to remedy the problems of these networks, it is necessary to design an appropriate algorithm inspired by these greedy algorithms, which require reasonable processing time. For this purpose, we designed a scheduling algorithm based on the greedy approach, composed of one choice at each stage. This approach seems to be appropriate for avoiding complex calculations. Our algorithm was inspired by those like FCFS [46], the expected time to compute (ETC) matrix, minimum execution time (MET), minimum completion time (MCT) [60], Min-Min algorithm for task scheduling [33], longest cloudlet fastest processing (LCFP), and shortest cloudlet fastest processing (SCFP), proposed by Sindhu et al. [37].
To evaluate the performance of our algorithm, we selected three algorithms for comparison in an experimental study with our own, with respect to the quality of the generated solutions, simplicity of calculation, and speed of execution. These algorithms were chosen to cover the different mechanisms for allocating devices to tasks. They are described as follows:
1. The Min-Min algorithm selects the smallest tasks, with respect to processing time, to be executed first on the appropriate devices [61];
2. The LCFP algorithm chooses the longest cloudlet or task to be executed on the fastest processing element;
3. The SCFP algorithm directs the shortest cloudlet or task to the fastest processing element.
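As an illustration, the LCFP and SCFP rules can be sketched as follows. The task lengths, device speeds, and the earliest-completion placement rule are our illustrative assumptions, not the exact implementations compared in the study:

```python
# Minimal sketches of two comparison heuristics. Each task is a
# (task_id, length) pair and each device has a speed; "fastest
# processing" is read here as earliest-completion placement.

def lcfp(tasks, speeds):
    """Longest cloudlet, fastest processing: sort tasks in descending
    order of length and greedily place each on the device that would
    finish it earliest."""
    loads = [0.0] * len(speeds)          # accumulated busy time per device
    plan = {}
    for t in sorted(tasks, key=lambda x: -x[1]):
        # completion time if task t is appended to device k
        i = min(range(len(speeds)),
                key=lambda k: loads[k] + t[1] / speeds[k])
        loads[i] += t[1] / speeds[i]
        plan[t[0]] = i
    return plan, max(loads)              # assignment and makespan

def scfp(tasks, speeds):
    """Shortest cloudlet, fastest processing: same rule, ascending order."""
    loads = [0.0] * len(speeds)
    plan = {}
    for t in sorted(tasks, key=lambda x: x[1]):
        i = min(range(len(speeds)),
                key=lambda k: loads[k] + t[1] / speeds[k])
        loads[i] += t[1] / speeds[i]
        plan[t[0]] = i
    return plan, max(loads)

tasks = [("t1", 40), ("t2", 10), ("t3", 25), ("t4", 5)]
speeds = [2.0, 1.0]                      # device 0 is twice as fast
print(lcfp(tasks, speeds)[1])            # makespan under LCFP: 27.5
```

On this toy instance, LCFP spreads the long tasks across both devices, while SCFP front-loads the fast device, which illustrates how the task-ordering rule alone changes the makespan.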
The approach we developed to solve the problem described above consists of the following steps: (1) problem modeling, (2) describing the greedy algorithm, and (3) the operating mechanism for solving the problem.
Before closing this subsection, let us say a word about the time complexity of the three algorithms. Each of them is dominated by a sorting procedure, applied either to the set of tasks or to the speeds of the devices, while the assignment of tasks to devices can be implemented in O(nm). Therefore, the time complexity of each algorithm is O(nm + max(m log m, n log n)).

Problem Modeling
Our approach consists of several steps. The first step breaks down large tasks into smaller tasks based on the devices' capabilities, using the number of instructions as a criterion in the case of applications and the size as a criterion in the case of data. The second step then groups and classifies these tasks, as well as the network devices, into three categories: large, medium, and small. In the third step, the tasks are classified into three categories, large, medium, and small, in such a way that the sizes of the tasks in a category are proportional to the capacities of the devices of the analogous category (Figure 1). These capacities are estimated based on a device score calculation. The most important aspect of a device's capacity is that it has sufficient execution time to accomplish the task assigned to it.
The score for each device d is determined as a function of the battery energy (ValBattery), storage (ValStok), bandwidth (ValBandwidth), RAM (ValRam), and CPU (ValCPU). Each of these factors is assigned a constant percentage weight, depending on its importance for device effectiveness, and the score is calculated as the weighted sum of these factors. This formula is implemented as Algorithm 1: CalculatingScores, which is called within Algorithm 4: SchedulingThroughAnalogousSubgroups. It takes a device d as a parameter and obtains the capacity values of the factors mentioned above to return the device's score value (ValScore).
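A minimal sketch of this score calculation follows. The weight values are illustrative placeholders; the actual percentage constants assigned to each factor are not reproduced here:

```python
# Sketch of Algorithm 1 (CalculatingScores). The weights below are
# illustrative placeholders, not the paper's constants; they must sum
# to 1 so that ValScore is a weighted average of the factor values.
WEIGHTS = {
    "ValBattery":   0.30,
    "ValStok":      0.15,
    "ValBandwidth": 0.15,
    "ValRam":       0.20,
    "ValCPU":       0.20,
}

def calculating_score(device):
    """Return ValScore as the weighted sum of the device's factor values."""
    return sum(w * device[factor] for factor, w in WEIGHTS.items())

d = {"ValBattery": 80, "ValStok": 60, "ValBandwidth": 50,
     "ValRam": 70, "ValCPU": 90}
print(calculating_score(d))   # 72.5
```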
The variables (criteria) used in our algorithms (Algorithms 1-4) are described in Table 6. Ejaz et al. [62] formally defined the component execution time in terms of the processor speed, average number of instructions in the component, file size, number of processes running simultaneously on the mobile device, propagation delay, and transmission delay.
Based on [62], we supposed that the available device execution time would be proportional to its score, as it depends on the same factors: the bandwidth and the capacities of the battery, storage or warehousing, and computing (RAM and CPU speed). This enabled us to estimate the execution time of a task as well as the available execution time of a device. Thus, we propose Algorithm 2: DeviceHasExecutionTime for the function that checks whether a device has enough time to perform a task, for use in our greedy Algorithm 4: SchedulingThroughAnalogousSubgroups. Algorithm 2 uses the following parameters: TaskTime (task processing time), DeviceTime (available device execution time), and a constant β (0.85 ≤ β ≤ 0.90), used to ensure that the device's ready execution time is sufficiently larger than the task processing time. It then returns a Boolean value B that indicates whether the machine has sufficient time to process the task in a non-preemptive mode.
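The check performed by Algorithm 2 reduces to a single comparison; a sketch under the parameter names above:

```python
def device_has_execution_time(task_time, device_time, beta=0.9):
    """Sketch of Algorithm 2 (DeviceHasExecutionTime): the device
    qualifies only if the task fits within a beta fraction of its
    available execution time (0.85 <= beta <= 0.90 in the text),
    leaving a safety margin for non-preemptive execution."""
    return task_time <= beta * device_time

print(device_has_execution_time(40, 50))   # True: 40 <= 0.9 * 50
print(device_has_execution_time(46, 50))   # False: 46 > 45.0
```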

Proposed Greedy Algorithm
We propose a new approach to map tasks to devices such as laptops, smartphones, tablets, and similar machines, based on their categories and execution times. The approach we propose consists of partitioning each of the groups of tasks and devices into three subgroups and then sorting them in descending order, according to the processing times and the devices' available execution times: LT (longest tasks), MT (medium tasks), ST (shortest tasks), LD (large devices), MD (medium devices), and SD (small devices). The next phase maps tasks from their subgroup to the appropriate devices in the subgroup of the same category and, if necessary, to other subgroups of devices having greater capabilities. Therefore, the tasks of the LT subgroup are mapped to the devices of the LD device subgroup. The MT subgroup tasks are then assigned to MD subgroup devices and, if required, to LD subgroup devices. The ST subgroup tasks are also mapped to the SD subgroup of devices and, if necessary, to MD and then LD device subgroups, as illustrated in Figure 2. Using these group patterns, we aimed to initiate and direct parallel processes of assigning devices to tasks in order to reduce the task completion times. We proceeded by restricting the search to the subgroup of devices that was similar to the subgroup of tasks or to larger subgroups of devices. Due to our prior classification, smaller sets of devices did not contain the appropriate devices. As previously mentioned, three parallel processes could be launched. The first was from the LT subgroup to the LD subgroup, but only if the devices in the MD and SD subgroups had no available execution time to perform LT tasks in a non-preemptive mode. Likewise, a second process could be started from the MT subgroup to the MD and LD subgroups. In parallel, a third process could go from the ST subgroup to the SD, MD, and LD subgroups.
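The partitioning step can be sketched as follows; splitting at equal thirds is our illustrative assumption, since the exact category thresholds are not specified here:

```python
# Sketch of the partitioning phase: split a group of items into
# large/medium/small subgroups, each kept in descending order as the
# approach prescribes. Splitting at equal thirds is an assumption.
def partition_descending(items, key):
    ordered = sorted(items, key=key, reverse=True)
    third = max(1, len(ordered) // 3)
    return ordered[:third], ordered[third:2 * third], ordered[2 * third:]

# Tasks as (task_id, processing_time) pairs; the same helper would
# partition devices by available execution time.
tasks = [("t%d" % i, t) for i, t in enumerate([50, 8, 33, 21, 4, 60])]
LT, MT, ST = partition_descending(tasks, key=lambda x: x[1])
print([t[0] for t in LT])   # the two longest tasks
```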
The next phase, after the partitioning and the generation of these subgroups, was to proceed through the assignment of the devices to the tasks via Algorithm 4: Scheduling ThroughAnalogousSubgroups.
We structured this algorithm in three parts, which could be executed in parallel with the aim of reducing the overall completion time of the tasks and thus save the device's energy. Many techniques, such as multithread, message passing interface (MPI), or ExaMPI, were used in parallel [63][64][65]. All the three parts were involved in allocating devices to subtasks from their similar groups at the same time viz. Part I from the large group of subtasks LT to the large group of devices LD, Part II from the medium group of subtasks MT to the medium group MD, and if necessary, to the large group of the devices LD, and Part III from the small group of subtasks ST to the small group SD, and then, as needed, to the MD and LD groups of devices. In addition, we performed load balancing on the devices by updating their information and sorting them again in descending order by the available execution time of the devices.
We also facilitated the understanding and effective use of this algorithm, as mentioned above, through the following algorithms, which are called by the main algorithm:
• Algorithm 1: CalculatingScores;
• Algorithm 2: DeviceHasExecutionTime;
• Algorithm 3: AllocatingDeviceToTask.
They act as follows. Algorithm 1 calculates the score of a device, which is proportional to its available runtime. Algorithm 2 compares this time with the time required to complete a subtask and checks whether it is sufficient to execute it. If so, Algorithm 3 allocates the device to that task.
Algorithm 3 takes a subset of devices and a task as parameters, and it calls Algorithm 2 to check whether the device in this subset has enough time to perform the corresponding task. When such a device is found, it is assigned to the task. Therefore, the ready execution time of the device is reduced by the execution time of the task. Next, the SubGroupDevice device subgroup is sorted to perform load balancing. Algorithm 4 works, as is explained above, on the six sorted subgroups already generated (LT, MT, and ST and LD, MD, and SD) to produce a plan for allocating the devices of those subgroups to their tasks while ensuring true load balancing.
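A sketch of Algorithm 3 under the description above; the data shapes (mutable [device_id, available_time] pairs) are our assumptions:

```python
# Sketch of Algorithm 3 (AllocatingDeviceToTask): scan a device
# subgroup sorted by available time (descending), assign the first
# device with enough time, deduct the task's execution time, and
# re-sort the subgroup to keep the load balanced.
def device_has_execution_time(task_time, device_time, beta=0.9):
    return task_time <= beta * device_time       # Algorithm 2 check

def allocating_device_to_task(subgroup, task):
    """subgroup: list of [device_id, available_time]; task: (id, time)."""
    for dev in subgroup:
        if device_has_execution_time(task[1], dev[1]):
            dev[1] -= task[1]                    # consume execution time
            subgroup.sort(key=lambda d: -d[1])   # restore descending order
            return dev[0]
    return None                                  # no device qualifies

devices = [["d1", 100.0], ["d2", 40.0]]
print(allocating_device_to_task(devices, ("t1", 70.0)))   # d1
print(devices)   # d1 drops to 30.0 and falls behind d2
```

Re-sorting after every assignment is what makes the device with the largest remaining execution time the first candidate for the next task.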

Operating Mechanism
Our greedy scheduling algorithm is designed to build the subgroups, as described in Algorithm 4. The operating mechanism is carried out by launching a search from a task subgroup in a similar subgroup of devices and the larger one, if needed, to allocate the appropriate devices to the tasks. It is based on the technique of parallelism, considering the number of processors and kernels of the devices in addition to their load-balancing factor (as illustrated in Figure 3). The score of a device is used as a factor to determine the availability of execution time, which is needed to accomplish specific tasks, as shown in Algorithm 2. This score represents the sum of the percentage of battery usage and the capacity of storage, computing, and connectivity (or bandwidth), as illustrated in Algorithm 1. The mechanism we used to map tasks into the appropriate devices in similar subgroups reduced the completion times. This improves the system performance of mobile devices as it reduces the limited energy consumption of this system's resources.

Algorithm 4 SchedulingThroughAnalogousSubgroups

Integer isdone = 0, lg, j; String task;
CalculatingScores(LD); CalculatingScores(MD); CalculatingScores(SD);
while isdone < 3 do
    Part I
    while LT ≠ Ø do
        lg ← length(LT);
        for j = 0; j < lg do
        …

We close this section by mentioning the time complexity of our algorithm. To divide the tasks into the three classes, we needed to sort the tasks. This step could be done in O(n log n). The same argument goes for the three groups of devices with respect to their capacity scores. This step could be done in O(m). The rest of the algorithm (i.e., the assignment of those groups of tasks to the corresponding group of devices) could be implemented in O(nm). Therefore, the overall time complexity of our algorithm was O(nm + n log n).
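A hedged Python reconstruction of the three-part control flow of Algorithm 4: the helper and subgroup names mirror the listing, while the loop bodies, data shapes, and escalation order are our assumptions based on the description in the preceding subsections:

```python
# Sketch of Algorithm 4 (SchedulingThroughAnalogousSubgroups): three
# parts drain LT, MT, and ST, each searching its analogous device
# subgroup first and escalating to larger subgroups when needed.
def device_has_execution_time(task_time, device_time, beta=0.9):
    return task_time <= beta * device_time       # Algorithm 2 check

def allocate(search_order, task):
    """Try each device subgroup in order; return the chosen device id."""
    for sg in search_order:
        for dev in sg:
            if device_has_execution_time(task[1], dev[1]):
                dev[1] -= task[1]                # consume execution time
                sg.sort(key=lambda d: -d[1])     # load balancing
                return dev[0]
    return None

def schedule(LT, MT, ST, LD, MD, SD):
    plan = {}
    for tasks, search_order in ((LT, [LD]),          # Part I
                                (MT, [MD, LD]),      # Part II
                                (ST, [SD, MD, LD])): # Part III
        for task in tasks:
            plan[task[0]] = allocate(search_order, task)
    return plan

LD, MD, SD = [["d1", 90.0]], [["d2", 50.0]], [["d3", 20.0]]
plan = schedule([("t1", 60.0)], [("t2", 40.0)], [("t3", 15.0)],
                LD, MD, SD)
print(plan)
```

In the paper the three parts run concurrently (e.g., via threads or MPI); the sequential loop above only shows the allocation logic each part performs.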

Simulation
We present in this section the results of a simulation study we conducted to compare the performance of Algorithm 4, as described above, with LCFP, SCFP, and Min-Min. In these algorithms, a variety of factors and their proper values are considered, such as the processing time, bandwidth, number of tasks, number of devices, sending and receiving tasks, waiting times in the queues of the devices, and capacity of each device.
Note that the simulation did not consider some factors in mobile systems such as device communication, bandwidth, the sending and receiving of tasks through networks, and their waiting time in the queues of the devices.
Even though we used a single computer in this experimental study instead of several devices to process all the tasks, we accurately calculated the values of the estimated factors of algorithm performance. We compared and analyzed the four algorithms based on the following criteria:
1. The makespan M (M = max(C i : i = 1, . . . , m)), i.e., the maximum completion time over the devices;
2. The standard deviation SD of the device completion times;
3. The absolute difference AD between the maximum completion time M and the minimum completion time L (L = min(C i : i = 1, . . . , m)), i.e., AD = |M − L|.
We then computed the percentage for the number of times that each algorithm produced the minimum values of these factors (makespan, SD, and AD). The value of the makespan indicates the time at which the entire set of tasks is completed, thus representing how efficiently the devices are used. The value AD represents the range of the completion times of the devices, thus representing how well-balanced the generated solution is. The standard deviation expresses the dispersion of the completion times of the devices around the makespan. A low value of standard deviation indicates that the completion times tend to be close to the value of the makespan, which is to say that the schedule is well-balanced.
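The three criteria can be computed directly from the per-device completion times C_i; a minimal sketch (the standard deviation is computed here around the mean, the usual definition):

```python
import statistics

# Evaluation criteria from per-device completion times C_i:
# makespan M, standard deviation SD of the completion times, and
# absolute difference AD = |M - L| between the extremes.
def criteria(completion_times):
    M = max(completion_times)
    L = min(completion_times)
    SD = statistics.pstdev(completion_times)   # population std deviation
    return M, SD, abs(M - L)

M, SD, AD = criteria([10.0, 12.0, 14.0])
print(M, AD)   # 14.0 4.0
```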
The experimental study we conducted used the WampServer web platform, an SQL server, and the PHP programming language to implement and validate the proposed algorithms. It was implemented on a laptop with the following characteristics: Intel ® Core™ i7 processor, x64-based processor, Windows 10 Pro 64-bit OS, 2.3 GHz CPU, and 16 GB of RAM. Figure 4 presents the graphical interface for the simulation.
To ensure the reliability of our method, it is necessary to mention the external factors that might have affected the tested algorithms. Among them are the computer's self-configuring resources, which compete for execution time and thus influence how the operating system schedules the algorithms' execution. As these factors operate automatically while the algorithms are executed, they affect the length of the processing time. Considering these factors, in terms of the impact they may have had on the execution of the algorithms, enabled us to verify and confirm the credibility of the results.
For each instance, we compared the makespan values of the four algorithms. We increased by one the score of the algorithm with the minimum makespan. If two algorithms had the same minimum makespan, we compared their standard deviations and increased by one the score of the algorithm with the minimum standard deviation. If they again produced the same minimum standard deviations, we compared their absolute difference and increased by one the score of the algorithm with the minimum value. If they still had the same absolute difference, we increased by one the score of the algorithm with the minimum score. The best algorithm was the one that had the highest score over the 100 instances.
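The per-instance tie-breaking can be sketched as a lexicographic comparison of the (makespan, SD, AD) triples, which applies the first three rules above in order; the final fallback rule is omitted from this sketch:

```python
# Sketch of the per-instance scoring: compare makespans first, then
# standard deviations, then absolute differences; the winner's score
# is incremented. Tuple comparison gives exactly this ordering.
def award_point(results, scores):
    """results: {name: (makespan, sd, ad)}; scores: {name: int}."""
    winner = min(results, key=lambda a: results[a])  # lexicographic
    scores[winner] += 1

results = {"Algorithm 4": (120.0, 4.0, 9.0),   # illustrative values
           "LCFP":        (120.0, 6.0, 9.0),
           "Min-Min":     (131.0, 3.0, 5.0),
           "SCFP":        (140.0, 8.0, 12.0)}
scores = {name: 0 for name in results}
award_point(results, scores)
print(max(scores, key=scores.get))   # Algorithm 4 (ties broken by SD)
```

Over the 100 instances, the algorithm accumulating the highest score is declared best.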
In the following, we first present a comparison of the minimum makespan, the standard deviation, and the absolute difference with respect to the completion times of the four algorithms (LCFP, SCFP, Min-Min, and Algorithm 4). Next, we also present, for the same instances, the results with respect to the running times of these algorithms.
Displayed in Figure 5 are the different tables that summarize the results of the experimental study with respect to the criteria (completion time M, standard deviation SD, and absolute difference AD). Figure 5 illustrates the performance of the four algorithms for various numbers of devices and tasks, whereas Figure 6 summarizes the average running times of these four algorithms.
Figure 6. Performance of the algorithms with respect to their running times.
Regarding the scheduling solution, our algorithm succeeded in reducing the completion time of the tasks. At the same time, it improved the device load balancing by preparing devices with larger execution times to be chosen first by the algorithm and accomplish the appropriate tasks. Figure 7 shows our algorithm performance compared with the other algorithms for load balancing in the experimental results, with respect to 10 devices and 20, 70, 100, and 200 tasks. For the correct interpretation of these graphic lines in Figure 7, we proceeded from the fact that the ideal line representing the best performance is the horizontal line crossing them in the middle. Therefore, the best algorithm is the one with the closest graphic line to that ideal line, illustrated by our algorithm for most of the cases.
The results in these above tables clearly show the superiority of our algorithm. Indeed, our algorithm became more advantageous whenever the difference between the number of tasks and the number of devices became larger (i.e., in cases where the number of tasks is greater than the number of devices). This advantage is directly proportional to the increase in this difference in favor of tasks. Whenever this difference is large, the results generated by our algorithm are the best. The main reason for this is that the device and task groups were partitioned into three similar and relatively small subgroups. This reduced the time it took to navigate through them and find the appropriate devices to execute the tasks. The performance of the remaining algorithms, in decreasing order, is as follows: LCFP, Min-Min, and SCFP. From the above tables, it is easy to observe that our algorithm outperformed the other three algorithms with respect to three criteria: makespan, standard deviation, and the maximum of the absolute difference in completion times, whatever the number of devices or the number of tasks.
To summarize the results, our contribution is made through a dedicated approach adapted to the ubiquitous devices widely used in the IoT paradigm, particularly at the periphery of their communication networks. The goal was to build a new heuristic algorithm and compare it with the scheduling heuristic algorithms most used in the literature. The main advantages of this approach come from reducing the size of the tasks to be performed by breaking them into smaller subtasks. In addition, the process of dividing the groups of subtasks and devices into smaller subgroups makes it possible for a parallel search to reach the desired solution. Proceeding as such makes our algorithm faster and the quality of the generated solutions much better. Consequently, our approach saves a lot of energy, in addition to improving the efficiency of these devices.

Conclusions and Future Works
The effectiveness of any scheduling algorithm depends, among other things, upon the overall completion time. Partitioning large tasks into smaller tasks plays an important role in getting these tasks done with shorter completion times. Greedy scheduling algorithms are a popular and widely used approach that produces a balanced, near-optimal solution for scheduling problems. Based on this idea, we proposed a new greedy scheduling algorithm for autonomous smart mobile systems, especially those found at the network's periphery. It takes advantage of the ability of these devices to maintain connectivity through a D2D radio connection. This feature allows devices to continue to communicate and therefore to work with each other, even in the event of wireless failures or communication flooding on the network. Thus, they can always perform the necessary tasks at any time and in any situation.

This algorithm is based on the idea of dividing large tasks into smaller tasks to facilitate and speed up the scheduling process and thus their rapid completion. It creates similar subgroups in terms of the size of the tasks and devices. Then, it allocates tasks to devices between similar subgroups and, if necessary, to the larger device subgroups. We have shown through a preliminary study that this algorithm is compatible with mobile systems having resource constraints. Due to the limited capacity of devices, they do not tolerate long computing processes. Therefore, they are not able to adopt a complex scheduling method. Instead, they require a series of simple scheduling strategies.

The preliminary study we conducted showed that our algorithm had a clear advantage over the LCFP, SCFP, and Min-Min algorithms. On the one hand, our algorithm generated small average times for allocating devices to tasks. On the other hand, our algorithm outperformed the three comparison algorithms we presented with respect to the makespan, standard deviation, and range of completion times criteria.
Likewise, significant improvements in load balancing could also be added to these advantages.
Current research provides reliability in reducing the completion time for urgent requests and thus in speeding up their completion, especially in harsh environments. For future work, we aim to apply our algorithm to more scenarios and situations. In particular, we aim to use it with more autonomous intelligent mobile systems having a large number of tasks to be performed on multiple devices. These advantages of the proposed method could be added to other functionalities, such as maintaining connectivity and processing via parallelism.

Institutional Review Board Statement: Not applicable, but the Université du Québec à Chicoutimi requires that all research involving the participation of human beings, whether funded or not, conducted or supervised by its professors, employees, and students, be ethically reviewed. Link: http://recherche.uqac.ca/cer/ (accessed on March 2021).

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable; we generated the data randomly to simulate and compare the algorithms.