An ANN-Based Approach for Real-Time Scheduling in Cloud Manufacturing

The cloud manufacturing platform needs to allocate the endlessly emerging tasks to the resources scattered in different places for processing. However, this real-time scheduling problem in the cloud environment is more complicated than that in a traditional workshop because constraints, such as type matching, task precedence, resource occupation, and logistics duration, need to be met, and the internal manufacturing plan of providers must also be considered. Since the platform aggregates massive manufacturing resources to serve large-scale manufacturing tasks, the space of feasible solutions is huge, so many conventional search algorithms are no longer applicable. In this paper, we consider resource allocation as the key procedure for real-time scheduling, and an ANN (Artificial Neural Network) based model is established to predict the task completion status for resource allocation among candidates. The trained ANN model has high prediction accuracy, and the ANN-based scheduling approach performs better than the preferred method in terms of the optimization objectives, including total cost, service satisfaction, and make-span. In addition, the proposed approach has potential for application in smart manufacturing or Industry 4.0 because of its high response performance and good scalability.


Introduction
The development of the Internet, automation, intelligent decision support, and other technologies has driven the manufacturing industry to transform towards digitalization, networking, and intelligence [1][2][3]. The manufacturing environment of enterprises has gradually expanded from the traditional workshop to a large-scale, highly dynamic networked environment [4][5][6][7], such as CMfg (Cloud Manufacturing), where the cloud platform pools massive virtualized manufacturing resources from geographically distributed providers [8,9], including hardware resources (such as machining centers, lathes, assembly lines, and materials), software resources (such as CAD/CAE, simulation, and computing capabilities), human resources, knowledge resources, and so on. Therefore, how to schedule these manufacturing resources to serve emerging manufacturing tasks in real time is not only an urgent requirement in the CMfg environment but also a common problem of generalized networked manufacturing models, such as Smart Manufacturing and Industry 4.0 [10,11]. Compared with the traditional workshop environment, real-time scheduling in CMfg is more complicated because logistics factors need to be considered when using massive, distributed, heterogeneous manufacturing resources [12], and complex manufacturing projects bring precedence constraints at the same time. In addition, the provider needs to take into account its internal processing plan while coordinating with the demander, and differences in management, specifications, and protocols among enterprises also need to be considered [13,14]. Although there have been many studies focused on
The CMfg environment within the research scope of this article is shown in Figure 1, and the meaning of the symbols is described in Table 1.
[Figure 1: CMfg environment; panel labels include Manufacturing Service Hub and S1, S2, S3, S4.]

Table 1. Meaning of the symbols.
L c: The logistics duration vector of P c; each element corresponds to a type-matched MR.
The processing pending queue of MTs in R.
The immediate successor set of T.
The maximum available capacity of R.
The objective value of P c; x ∈ {d, m, u} represents {make-span, cost, quality}.
The mean value of MTs in the queue; x ∈ {g, p, l, v, d, d} represents the feature mark.
The standard deviation value of MTs in the queue; x ∈ {g, p, l, v, d, d} represents the feature mark.
The probability that the due time value is located in interval n of T.
1 predicate: Indicator function (1 predicate = 1 if the predicate is true, else 0).
|S|: The element count of set S.
This architecture is mainly composed of three functional layers, namely interface, application, and decision-making. The real-time scheduling problem arises in the decision-making layer. Accordingly, the process by which the demander uses the services provided by the platform can be described as follows:
1. Demanders publish an MP (Manufacturing Project) to the platform;
2. The platform decomposes the MP into MTs (Manufacturing Tasks);
3. The platform discovers and matches MRs (Manufacturing Resources) in the type-matched MS (Manufacturing Service) for these MTs;
4. The platform allocates an MR for the processing of each MT;
5. Providers arrange MTs using professional software and deliver the completed MTs to demanders.
Therefore, the scheduling problem in the CMfg environment is as follows: how to reasonably allocate the emerging MTs to the type-matched MRs in real time under the constraints of task precedence, resource occupation, and logistics duration while reducing the service costs, improving the service quality, and shortening the make-span for each MP?
Since the scheduling environment of CMfg is larger and more dynamic than that of the traditional workshop [17,18], it is much more difficult to find the optimal solution. As a consequence, the focus of related research has turned to finding approximately optimal solutions. As a major branch of approximation methods, search-based heuristic scheduling approaches expand the current schedule with well-designed strategies. The keys to designing such an approach are the neighborhood structure and the search direction; examples include Simulated Annealing, Tabu Search, Discrete Search, Genetic Algorithm, and so on [19][20][21][22][23]. However, these algorithms cannot be directly applied to the scheduling problem in the CMfg environment because searching for a solution takes them a long time.
In order to solve the real-time scheduling problem, researchers have proposed and developed numerous approaches and algorithms that can be divided into two categories: generative and adaptive. Generative scheduling adopts the idea of integrating partial schedules, and the key lies in the design of priority rules. However, priority rules designed for a deterministic manufacturing environment are not suitable for dynamic random situations [24]. Hence, adaptive methods have attracted more researchers. Adaptive scheduling, which is also called rescheduling, modifies the existing schedule dynamically according to the decision environment. Proactive and reactive are the two main modes of rescheduling [25]. Due to uncertain factors, such as task delay, unit failure, and order insertion, disturbances are difficult to predict, which means that proactive rescheduling cannot be used in real-time environments [26]. As a result, more and more researchers have begun to focus on reactive or hybrid rescheduling methods [27]. One group studied the random resource-constrained project scheduling problem. They transformed the problem into a multi-stage decision problem and made dynamic decisions by designing rescheduling strategies [28]. Another group studied the rescheduling problem with new operation insertion in a single-machine environment. They set up an event-triggered rescheduling model with the goal of minimizing the absolute deviation of the maximum delay compared with the baseline [29]. For the dynamic manufacturing environment where tasks arrive randomly, a dynamic event-driven task rescheduling method was designed to avoid service rearrangement, and the constructed parallel processing strategy takes service time, logistics time, the earliest available time, and other factors into consideration at the same time for optimal service selection [30].
Researchers have also proposed two reactive scheduling strategies based on the selection and buffering to solve the random resource constrained project scheduling problem. They found that the buffer-based reaction rules combined with the baseline scheduling turned out to be an effective solution [31].
The above review indicates that it is difficult to efficiently allocate massive manufacturing resources in the CMfg environment for real-time scheduling. Research on real-time scheduling in CMfg is scarce, and the proposed scheduling methods are based on a reactive scheduling strategy, the baseline of which is set in advance and needs to be modified as the environment changes. For the dynamic CMfg environment, such scheduling methods consume substantial computing resources. Changing schedules often results in the re-allocation of resources, which leads to additional logistics costs. Generally, real-time scheduling using these approaches requires a trade-off between solution quality and decision efficiency. With the development of artificial intelligence, more and more teams are trying to apply related technologies to the manufacturing domain [32]. In order to solve the scheduling problem in the manufacturing environment, neural networks have been used to determine the key parameters that improve the performance of the genetic algorithm [33,34].
In this paper, we convert the aforementioned scheduling problem into the form of estimating the optimal objective value, and an ANN-based model is established to predict the task completion status for each candidate resource as the principle for allocating manufacturing tasks. The trained ANN model has high prediction accuracy, and the proposed ANN-based approach performs well in terms of the optimization objectives, including cost, satisfaction, and make-span. The high responsiveness of the ANN-based approach makes it suitable for the real-time scheduling in the CMfg environment.
The structure of the paper is as follows. First, we establish a mathematical model of the real-time scheduling problem in the CMfg environment in Section 2. Then, the solution framework of this problem is defined in Section 3. As the key issue in this framework, the decision of task allocation is made using a trained ANN model. Section 4 conducts comparison experiments for scheduling methods in terms of objectives and responsiveness. The application design for the proposed ANN-based approach is depicted in Section 5. Section 6 presents the conclusion.

Mathematical Modeling for the CMfg Scheduling Problem
According to the CMfg platform architecture (Figure 1), the real-time scheduling can be described as the procedure shown in Figure 2. The MPs in the CMfg platform consist of MTs with a precedence sequence: the MTs waiting for allocation are shown in white, the MTs being allocated are filled in gray, and the completed MTs are filled in black. By using level-order traversal, the operator can allocate the MTs whose precedence constraints are already satisfied to the type-matched MRs, and these tasks will be manufactured by the corresponding providers.
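The level-order traversal mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each MT label maps to the labels of its immediate predecessors; the function name `level_order` and the example project are illustrative and not taken from the paper. The sketch is static: it groups MTs into levels in advance, whereas in the real-time setting an MT would only be released once its predecessors are actually completed.

```python
def level_order(predecessors):
    """Group MT labels into levels so that every MT appears in a later
    level than all of its immediate predecessors."""
    remaining = {j: set(p) for j, p in predecessors.items()}
    done = set()
    levels = []
    while remaining:
        # MTs whose predecessors are all already placed can be released together.
        ready = sorted(j for j, p in remaining.items() if p <= done)
        if not ready:
            raise ValueError("precedence graph contains a cycle")
        levels.append(ready)
        done.update(ready)
        for j in ready:
            del remaining[j]
    return levels


# Hypothetical MP: MT 0 is the single entry, MT 4 the single exit.
example = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}
levels = level_order(example)
print(levels)
```

MTs within one level have no precedence relation among themselves, so they can be allocated independently of each other.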
The following basic assumptions are needed to focus on our research scope before mathematical modeling:
1. The set-up time for an MT is already included in the processing time;
2. No interruption is considered in the processing of MTs;
3. The capacity of an MR occupied by task processing will be released when the processing is completed;
4. Transportation logistics need to be considered before and after the processing of an MT.

Formal Expression of the Main Components
The main components of the CMfg environment are MT and MR. Due to the consideration of management granularity, these two components are not directly presented to the scheduling problem.
Providers distributed in different locations register their encapsulated and virtualized MR capabilities to the CMfg platform. Then, the platform operator classifies these MRs according to their capability types and abstracts them into MS, which is an integrated form of MRs for the demander to use. Specifically, MS of type k can be expressed by a tuple, as shown in Equation (1).
where I k represents the label set of MR in the same type, and Q k (t) denotes the allocation pending queue of MTs at time t.
For MR labeled i in S k , its attributes are expressed as Equation (2).
represent unit cost and service quality, respectively. V j and e i,j is a 0-1 decision variable defined as Equation (4).
In the cloud platform, MP that is published by the demander will be pooled, and the precedence relationship of the contained MTs can be described as the activity-on-vertex network, as shown in Figure 3. −1 are dummy MTs to help stylize the MP into a single-input-single-output graph. Specifically, an MP labelled c can be expressed by a tuple shown in Equation (5).
where J c represents the label set of MT included in P c , B c denotes the publish time, and L c is the logistics duration vector defined as Equation (6).
where L (k) c,i is the logistics duration between P c and R (k) i . For MT labelled j in P c , its attributes can be represented as Equation (7).
where v j represent required service capacity, service capability type, and processing time, respectively. P (c) j and S (c) j , respectively, denote the immediate predecessor set and the immediate successor set. In this way, the mathematical model of real-time scheduling in the CMfg environment can be expressed as Equations (8)-(16), and it is a multi-objective optimization problem.
where C t b :t e denotes the set of arrived MPs during time interval [t b , t e ), and the objective function of Equation (8) is the accumulated mean value of objective vector Z (c) defined as Equation (9). The composition of the objectives and the Constraints (10)-(16) will be described in detail in the rest of this section.

Optimization Objectives for Real-Time Scheduling
For each MP, the scheduling optimization objectives include three aspects, namely cost, make-span, and service quality. Without loss of generality, we proceed from the perspective of P c .
First of all, the service usage cost for any MP is incurred when the demander requests and uses specific MRs included in the type-matched MSs; it is equal to the sum of the usage costs of the MTs inside the MP. The service usage cost for an MT is determined by the unit cost of the allocated MR and the requirement for task processing.
i for processing, the service usage cost is determined by Equation (17).
Then, the service usage cost for P c is given by Equation (18).
Secondly, the make-span of an MP refers to the overall processing time needed to complete all its MTs. It needs to account for the logistics duration of all the MTs inside the MP. According to the activity-on-vertex network of P c shown in Figure 3, its make-span can be expressed as Equation (19), where e (c)
Thirdly, the quality of the product refers to the satisfaction of manufacturing P c , and it is determined by the intrinsic service quality of the specific MRs. Since the satisfaction of processing T (c) j can be expressed as Equation (20), the quality of the product for processing P c can be defined as Equation (21), which is the lowest satisfaction obtained by processing the MTs included in P c .
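To make the three objectives concrete, the following is a minimal sketch under stated assumptions rather than the paper's exact formulation. It assumes that the usage cost of an MT is unit cost × required capacity × processing time (the exact form of Equation (17) is not reproduced here), that the satisfaction of an MT equals the quality of its allocated MR, that allocated MRs are idle so no queueing delay occurs, that only the logistics before processing is counted, and that MT labels are already in a precedence-compatible order. The dictionary keys and the function name `project_objectives` are hypothetical.

```python
def project_objectives(tasks, alloc, publish_time=0.0):
    """Return (cost, make-span, quality) of one MP for a given allocation."""
    end = {}
    cost = 0.0
    # Labels are assumed to be ordered so that predecessors come first.
    for j in sorted(tasks):
        t, r = tasks[j], alloc[j]
        # Ready once all immediate predecessors are complete.
        ready = max((end[q] for q in t["preds"]), default=publish_time)
        # Start only after the logistics duration to the allocated MR.
        start = ready + r["logistics"]
        end[j] = start + t["p"]
        # Assumed cost form: unit cost x required capacity x processing time.
        cost += r["unit_cost"] * t["v"] * t["p"]
    makespan = max(end.values()) - publish_time
    # Quality of the MP is the lowest satisfaction among its MTs.
    quality = min(alloc[j]["quality"] for j in tasks)
    return cost, makespan, quality


tasks = {
    0: {"v": 1, "p": 2.0, "preds": []},
    1: {"v": 2, "p": 3.0, "preds": [0]},
    2: {"v": 1, "p": 1.0, "preds": [0]},
    3: {"v": 1, "p": 2.0, "preds": [1, 2]},
}
alloc = {
    0: {"unit_cost": 1.0, "quality": 0.9, "logistics": 1.0},
    1: {"unit_cost": 2.0, "quality": 0.8, "logistics": 0.5},
    2: {"unit_cost": 1.5, "quality": 0.95, "logistics": 2.0},
    3: {"unit_cost": 1.0, "quality": 0.85, "logistics": 1.0},
}
cost, makespan, quality = project_objectives(tasks, alloc)
print(cost, makespan, quality)
```

In this toy project the make-span is driven by the path through MT 1, while the quality is bounded by the weakest allocated MR.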

Constraints for Real-Time Scheduling
Constraints for real-time scheduling problems in the CMfg environment include task precedence, resources occupation, logistics duration, and so on.
Specifically, Constraints (10) and (11) indicate the logical limitation of allocation, which means that for type-matched MRs, any MT can only be assigned to one of them for processing, and each of the MT can only be processed once. As formulated in Equation (12), the processing time of any MT needs to be guaranteed, that is, the time span between the start time b    (15) and (16), which, respectively, mean the ready time of any MT is no earlier than the complete time of its predecessors and the start time of any MT is no earlier than its ready time plus its logistics duration. Where the auxiliary variable l (c) j is defined as Equation (22).
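The timing constraints described above can be checked with a short sketch. It assumes a hypothetical schedule record per MT with `ready`, `logistics`, `start`, and `end` fields, and it covers only the three conditions stated in the text: guaranteed processing time, ready time no earlier than the completion of predecessors, and start time no earlier than ready time plus logistics duration. The function name `check_schedule` is illustrative.

```python
def check_schedule(tasks, sched, eps=1e-9):
    """Return a list of (MT label, violated condition) pairs."""
    violations = []
    for j, t in tasks.items():
        s = sched[j]
        # The span between start and end must cover the processing time.
        if s["end"] - s["start"] < t["p"] - eps:
            violations.append((j, "processing time"))
        # Ready time is no earlier than the completion of any predecessor.
        if any(s["ready"] < sched[q]["end"] - eps for q in t["preds"]):
            violations.append((j, "precedence"))
        # Start time is no earlier than ready time plus logistics duration.
        if s["start"] < s["ready"] + s["logistics"] - eps:
            violations.append((j, "logistics"))
    return violations


tasks = {0: {"p": 2.0, "preds": []}, 1: {"p": 3.0, "preds": [0]}}
good = {
    0: {"ready": 0.0, "logistics": 1.0, "start": 1.0, "end": 3.0},
    1: {"ready": 3.0, "logistics": 0.5, "start": 3.5, "end": 6.5},
}
# MT 1 is marked ready before MT 0 completes and starts before its logistics finish.
bad = {**good, 1: {"ready": 2.0, "logistics": 0.5, "start": 2.25, "end": 5.25}}
print(check_schedule(tasks, good))
print(check_schedule(tasks, bad))
```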

Artificial Neural Network based Resource Allocation Methodology
The main purpose of the real-time scheduling problem is to allocate MSs for every MT, and the allocation of T (c) j can be depicted as Figure 6. The candidate MRs are determined by their capability type and the upper bound of available capacity as formulated in Equation (23).
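The candidate filtering step can be sketched as follows. This is a minimal illustration that assumes an MR qualifies when its capability type equals the type required by the MT and its maximum available capacity is at least the required capacity; the exact form of Equation (23) is not reproduced here, and the field names and resource labels are hypothetical.

```python
def candidate_resources(task, resources):
    """Return labels of MRs that match the MT's type and can hold its capacity."""
    return [
        i
        for i, r in resources.items()
        if r["type"] == task["type"] and r["capacity"] >= task["v"]
    ]


resources = {
    "R1": {"type": "milling", "capacity": 4},
    "R2": {"type": "milling", "capacity": 1},  # type matches, capacity too small
    "R3": {"type": "casting", "capacity": 8},  # capacity fits, type differs
}
task = {"type": "milling", "v": 2}
candidates = candidate_resources(task, resources)
print(candidates)
```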
However, since the number of candidate MRs is large and the processing statuses of these MRs are changing over time, the conventional search algorithms are no longer applicable because they will spend a long time checking all of the possible time intervals in every candidate MR. Therefore, we adopt an ANN that is based on the multi-layer perceptron architecture to speed up the searching procedure by estimating the objective values of each candidate MR.
Specifically, taking T (c) j as an example, after filtering out the candidate MRs according to Equation (23), the objective values projected on the MT can be estimated by the completion status prediction model, as shown in Figure 7.
where ϕ (c,j) i,n means the probability of completion time d j can be defined as Equation (27), which motivates the allocation of MR to process T (c) j as early as possible.
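One way to turn predicted completion-status probabilities into an allocation decision is sketched below. It is an illustration under assumptions, not the paper's Equation (27): the trained ANN is replaced by a stub `predict` callable that returns, for each candidate MR, a probability for each completion interval n (earlier intervals first), and candidates are ranked by the expected interval index so that earlier completion is preferred. Cost and quality are ignored here for brevity.

```python
def expected_interval(probs):
    """Expected index of the completion interval (lower means earlier)."""
    return sum(n * p for n, p in enumerate(probs))


def allocate(task, candidates, predict):
    """Pick the candidate MR with the earliest expected completion interval."""
    return min(candidates, key=lambda i: expected_interval(predict(task, i)))


# Stub standing in for the trained model: fixed probabilities per candidate.
table = {
    "R1": [0.1, 0.3, 0.6],
    "R2": [0.7, 0.2, 0.1],
    "R3": [0.4, 0.4, 0.2],
}
chosen = allocate({"label": 5}, list(table), lambda task, i: table[i])
print(chosen)
```

Because each decision only requires one forward evaluation per candidate, the cost of an allocation grows linearly with the number of candidate MRs instead of requiring a search over time intervals.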
As shown in Figure 8, the ANN predicts the completion status of MT.
The inputs of this ANN are written in Input (29), which means all the possible features from the decision condition.
The decision condition for T (c) c,i ) and dynamic features of R (k) i defined in Equation (30).  i is defined as Equation (28).
where µ (k) i (t) and σ
The outputs of the ANN are formulated in Output (31), which gives the probabilities of the completion statuses defined in Equations (24)-(26).
Since the real completion status obtained by sampling is a discrete value as Equation (32), we convert the outputs into the style of one-hot encoding, which is in the same style of Equation (28).
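The one-hot conversion of a sampled completion status can be illustrated with a minimal sketch. The number of status classes is a parameter here because it is not restated in this section; the function name `one_hot` is illustrative.

```python
def one_hot(status, num_classes):
    """Encode a discrete completion status as a one-hot vector."""
    if not 0 <= status < num_classes:
        raise ValueError("status out of range")
    return [1.0 if n == status else 0.0 for n in range(num_classes)]


encoded = one_hot(1, 3)
print(encoded)
```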

Experimental Environment Setting
We conduct comparative experiments to verify the effectiveness of the proposed ANN-based approach for real-time scheduling, and the dataset is adapted from MPSPLib (www.mpsplib.com/download.php), which consists of multiple project instances defined in Table 2. Each project in the MPSPLib is enabled with random features as Equations (33)-(37) to imitate the process of publishing MPs to the CMfg platform from the demanders and to enlarge the diversity of MS from the providers.
Table 2. Project instance in MPSPLib (taking file j301_1.sm as an example).

MT Label (j) Processing Time (p
where U(a, b) represents a uniform random distribution over the interval [a, b], and the involved parameters are listed in Table 3.

Preparation for Real-Time ANN-Based Scheduling Approach
The network inside the ANN-based approach is constructed in PyTorch, and its main parameters are listed in Table 4. Before training the ANN model, we need to sample data that is consistent with the style of Input (29) and Output (31). The quality of an ANN model can be judged by its cross-entropy loss function in Equation (38) during training, as depicted in Figure 9. As shown in Figure 9, after 1000 epochs, the training loss and testing loss both fall into a small range and gradually converge. As the number of training epochs increases, the difference between training loss and testing loss becomes smaller, which indicates that the proposed ANN-based approach is able to generalize. Figure 10 plots the prediction accuracy defined in Equation (39) to evaluate the performance of the ANN model. After 1000 iterations, the prediction accuracy on the test set gradually converges to about 94.7%, and other evaluation metrics are listed in Table 5. These metric values demonstrate the effectiveness of the ANN-based completion status prediction model.
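For reference, the two training metrics can be sketched in plain Python. This assumes the standard definitions, namely cross-entropy between a predicted probability vector and a one-hot target, and accuracy as the fraction of samples whose most probable class equals the target class; the exact forms of Equations (38) and (39) are not reproduced here, and the sample values are made up for illustration.

```python
import math


def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda n: values[n])


def accuracy(pred_batch, target_batch):
    """Fraction of samples whose most probable class matches the target."""
    hits = sum(argmax(p) == argmax(t) for p, t in zip(pred_batch, target_batch))
    return hits / len(pred_batch)


preds = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.3, 0.2]]
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
loss = sum(cross_entropy(p, t) for p, t in zip(preds, targets)) / len(preds)
acc = accuracy(preds, targets)
print(loss, acc)
```

In PyTorch the same loss is usually computed from raw logits rather than from probabilities, so this sketch is meant only to show what the reported numbers measure.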

Performance with Discussions
We use a modified NSGA-II (Nondominated Sorting Genetic Algorithm version II) [35] as the reference method for the comparative experiment. Specifically, the modification is based on the approximation that the whole problem is first divided along the time dimension, and then NSGA-II(α) is called to solve the divided sub-problems one by one. The problem segmentation factor α ∈ (0, 1] indicates the degree of the division: α = 1 means no division, and α → 0 mimics the effect of real-time scheduling. Figure 11a-e show the performance comparison of these methods in terms of total cost, service satisfaction, and make-span. Since the dataset groups for these experiments mentioned in Section 4.1 are dimensionless quantities, we use the comparison ratio in Equation (40) to evaluate the performance of the scheduling methods.
where Z obj (method) means the average of the objective value gained by the corresponding method, and the summarized experimental results are listed in Table 6. In each group of the dataset, the ANN-based method has an absolute advantage in the cost (Z m ) compared to the NSGA-II(α) method series. Although the advantages of the ANN in make-span (Z d ) and service satisfaction (Z u ) are not significant, this method also reaches higher levels in both criteria. For the NSGA-II(α) method series, as the value of the division factor α increases, the number of sub-problems decreases, which leads to a decrease in cost but an increase in make-span. As for the service satisfaction, all of the methods perform similarly, which is mainly due to the limited number of candidate MRs in each MS. Since the ANN-based approach performs well in different dataset groups, it can be inferred that it has a strong generalization ability.
In addition to the objective values, another important indicator is the responsiveness, which measures the decision time of each allocation procedure in the scheduling. Figure 12 summarizes the average responsiveness for each dataset group; the decision time for using the ANN to determine a schedule is only about 4.4% of that of NSGA-II(α) on average.

[Figure 12: x-axis "Dataset Group Name" with groups Exp-30, Exp-60, Exp-90, Exp-120, Exp-mix.]
As for the performance of the ANN-based approach on a single MR allocation for each MT, Table 7 summarizes the average decision time. The average decision time for MR allocation is under 50 ms, and the average decision time for determining a schedule is under 40 s. This indicates that the ANN-based approach is suitable for real-time scheduling: compared to NSGA-II, it takes only about 4.4% of the decision time to determine a sound schedule in such a discrete manufacturing environment, and this decision time is negligible compared to the duration of resource configuration, manufacturing execution, transportation, and so on, which are usually measured in hours or days.

Application Design of ANN-Based Real-Time Scheduling in the Cloud Manufacturing Environment
Mold casting is a common industry that applies CMfg. Figure 13 shows its demand and supply distribution in Zhejiang province. It can be seen that the demanders are surrounded by numerous providers, and resource optimization and coordination are urgently needed. With the increase in diversity and complexity of requirements from demanders, real-time scheduling becomes much more difficult. Figure 14 shows the procedure for the demander to use MRs via the CMfg platform. The key step is the allocation of MRs, which can be realized by the ANN. Figure 15 shows the functional design of the application of the proposed ANN-based approach in the CMfg environment. The "Intelligent coordination" module that implements the ANN will handle the real-time scheduling problem, and putting the design into practical application is ongoing work.

Conclusions
In this paper, we studied the real-time scheduling problem in the CMfg environment. The main contributions and results are as follows:
• We constructed a mathematical model for real-time scheduling in the CMfg environment, with the optimization objectives of minimizing the cost, minimizing the make-span, and maximizing the service satisfaction, constrained by task precedence, resource occupation, and logistics duration;
• We designed an ANN-based approach for real-time scheduling, which uses the MT attributes and the processing pending queue of the MR as inputs to predict the completion status of the MT if it were allocated to any of the candidate MRs;
• We conducted comparison experiments using a modified NSGA-II as the reference scheduling method. The results show that:
1. The proposed ANN-based approach performs better than the NSGA-II in terms of objective values;
2. The response time of the ANN is only about 4.4% of that of the NSGA-II on average;
3. Using the ANN, the average decision time for MR allocation is under 50 ms, which indicates that the proposed ANN-based approach is suitable for real-time scheduling;
• We designed the application of ANN-based real-time scheduling to show how to implement this method to help coordinate providers of MRs in the CMfg environment.
In addition, the proposed method performs well on datasets of different MT scales, which indicates its generalization ability. Since the proposed ANN-based approach addresses a common problem of platform-based manufacturing environments, it can also be used in related web-based manufacturing environments.
It should be noted that the research in this article is mainly based on library instances rather than real-world data. Since no other work addresses exactly the same real-time scheduling problem as described in this paper, the proposed ANN-based approach has not been compared with other articles that also use an ANN model. Improving the ANN of the proposed real-time scheduling approach and applying the approach to the real world are our ongoing work.

Conflicts of Interest:
The authors declare no conflict of interest.