Research on Collaborative Optimization of Green Manufacturing in Semiconductor Wafer Distributed Heterogeneous Factory

Abstract: Production scheduling of semiconductor wafer manufacturing is a challenging research topic in the field of industrial engineering. Based on this, the green manufacturing collaborative optimization problem of the semiconductor wafer distributed heterogeneous factory is first proposed, which is a typical NP-hard problem with practical application value and significance. To solve this problem, it is very important to find an effective algorithm for the rational allocation of jobs among the factories and the production scheduling of the allocated jobs within each factory, so as to realize collaborative optimization of the manufacturing process. In this paper, a scheduling model for green manufacturing collaborative optimization of the semiconductor wafer distributed heterogeneous factory is constructed. For the grey wolf algorithm, a swarm intelligence optimization algorithm proposed in recent years, a new learning strategy for the initial population and the leadership level and a new search strategy for the predatory behavior are designed, which expand the diversity of the population and help the algorithm avoid local optima. In the experimental stage, test cases with two factories and with three factories are generated, respectively. The effectiveness and feasibility of the algorithm proposed in this paper are verified through a comparative study with the improved grey wolf algorithms MODGWO and MOGWO and the fast and elitist multi-objective genetic algorithm NSGA-II.


Introduction
With the advent of the fourth industrial revolution in the context of Industry 4.0, intelligent manufacturing has become a symbol of the new information age. The semiconductor industry, as the cornerstone of the information age, belongs to the high-tech industry and faces many opportunities and challenges. While it brings high profits, fierce industry competition also forces semiconductor manufacturers to continuously improve production efficiency and resource utilization [1]. In a semiconductor manufacturing process, wafer manufacturing is called the front-end process, while wafer testing and packaging are called the back-end process. Because the equipment used in wafer manufacturing is expensive and the processing is complex, it has attracted wide attention from scholars at home and abroad. Re-entrance is a remarkable feature of semiconductor wafer fabrication [2][3][4], which means that a job may repeatedly access certain devices at different stages of the processing process, and at least one stage has more than one machine available for processing. In addition, during the processing of the wafer, various physical and chemical processing steps are needed to form the required circuit layer on the surface of the wafer, and the processing steps involved mainly include oxidation, among others. Fu et al. [19] established a stochastic multi-objective distributed permutation flow shop scheduling model considering total tardiness, make span, and total energy consumption, and designed a new multi-objective brainstorming optimization algorithm to solve the problem. Rifai et al. [20] used an improved adaptive large neighborhood search algorithm to solve the distributed re-entrant permutation flow shop scheduling problem. It was the first time that the distributed re-entrant flow shop (DRFS) scheduling problem was studied, and it pointed out that the DRFS problem is NP-hard.
To sum up, the studies on the distributed scheduling problem in the domestic and foreign literature are mostly confined to homogeneous factories, and most of them focus on the flow shop, the permutation flow shop, and the flexible job shop. To the best of our knowledge, there is little research on the distributed reentrant hybrid flow shop. Distributed manufacturing of semiconductor wafers can be attributed to the distributed reentrant hybrid flow shop (DRHFS) problem, which has similar properties to the RHFS problem but extends it to multiple factories. Therefore, it clearly also belongs to the class of NP-hard problems.
Environmental issues are the focus of global attention, and sustainable green manufacturing is a serious challenge for today's manufacturing industry. Noise and CO2 emissions from the manufacturing industry are major sources of pollution. Semiconductor wafer factories are usually equipped with hundreds of pieces of precision equipment, the wafer manufacturing process is very complex, and there may be hundreds of products flowing simultaneously on the production line. Once a factory is established, it runs continuously 24 hours per day, 365 days a year. In addition, the strict control of environmental parameters in the wafer production process means that manufacturing factories and processing equipment must meet their production continuity requirements with uninterrupted high energy consumption. For example, the etching process requires precise temperature control: a temperature fluctuation of ±0.1 °C will cause deviations in the wafer yield rate, which results in wafer scrapping. Therefore, semiconductor wafer manufacturing should care not only about the process but also put energy saving and environmental protection on the agenda. In this paper, carbon emissions from semiconductor wafer fabrication are taken as the energy-saving indicator for research. Regarding the study of the carbon emissions indicator, Zhang et al. [21] created a flexible job shop low-carbon scheduling model that considers make span, machine load, and carbon emissions indicators, and proposed a hybrid non-dominated sorting genetic algorithm II to solve the model. In order to help enterprises quantify the carbon footprint indicator, Liu et al. [22] proposed a method for calculating the carbon footprint of workshop products and an improved fruit fly optimization algorithm to minimize the carbon footprint and production cycle of all products. Nilakantan et al.
[23] developed a study on the robot assembly line system, using a multi-objective co-evolutionary algorithm to solve the carbon footprint minimization problem. Piroozfar et al. [24] proposed an improved multi-objective genetic algorithm to solve the multi-objective flexible job shop scheduling problem while minimizing the total carbon footprint and the total number of delayed jobs. So far, there is no literature studying the carbon emissions indicator in semiconductor wafer manufacturing.
Grey Wolf Optimizer (GWO) is a swarm intelligence optimization algorithm designed in 2014 by Professor Mirjalili and his team. The algorithm is inspired by the strict social hierarchy and predatory behavior of the grey wolf population in nature [25] and was originally used to solve single-objective optimization problems. In 2016, Mirjalili et al. proposed a Multi-Objective Grey Wolf Optimizer (MOGWO) [26] based on the GWO algorithm to solve multi-objective optimization problems. Due to its simple structure, few parameters, and good optimization performance, the grey wolf algorithm has been widely used in scheduling, image processing, medical treatment, and the design of photonic crystal waveguides [27][28][29][30][31]. However, there is no literature on its application to the green collaborative scheduling of the semiconductor wafer distributed heterogeneous factory. In summary, this paper constructs a green manufacturing collaborative optimal scheduling model for a semiconductor wafer distributed heterogeneous factory for the first time, and designs an improved multi-objective Grey Wolf Optimizer (IMOGWO) to solve this problem. The remainder of this paper is organized as follows. The green manufacturing collaborative optimal scheduling model of the semiconductor wafer distributed heterogeneous factory is formulated in Section 2. The green collaborative scheduling problem of the semiconductor wafer distributed heterogeneous factory is described in detail in Section 3. Section 4 presents the experimental results and their analysis. Section 5 provides the conclusions and future work.

DRHFS Problem Description
Under the concept of a global cooperative sharing economy, distributed job shop scheduling has become a hot topic for scholars at home and abroad. Distributed job shop scheduling involves many dispersed factories; in addition to the characteristics of traditional job shop scheduling, collaborative optimization among factories must also be considered. In this problem, n_total wafers need to be assigned to F factories for processing. Because the wafer production workshop is a dust-free environment with very strict temperature, humidity, and air quality requirements, the wafer surface must be protected from oxidative corrosion by ambient air during processing. Therefore, wafer i can only be allocated to one factory and, once it is allocated to factory f, all operations of wafer i must be processed in factory f, which avoids discarding wafers due to contamination caused by changing the production workshop.
No replacement of the processing factory is allowed during processing. Job scheduling and machine allocation inside each factory belong to the RHFS problem, i.e., n (n < n_total) jobs are processed on s serial stages, and a job may need to be processed multiple times at one stage. At stage l, m_l (m_l ≥ 1) identical parallel machines can be selected for processing; the processing time on each machine is the same, and each operation can select any machine at the corresponding stage. There is no priority constraint on the processing order between jobs, but the operations of a job are subject to a sequential precedence constraint. At any time, each operation can be processed on only one machine, and each machine can process only one operation. The number of re-entries, the number of jobs, and their processing times at each stage are fixed and known in advance, as are the number of production stages and the number of machines at each stage. The capacity of the buffers between successive stages is infinite. Preemption is not permitted, i.e., once an operation is started, it cannot be interrupted. The machines are continuously available for processing jobs throughout the scheduling period, with no maintenance, breakdowns, or other interruptions. Because this paper studies the collaborative optimal scheduling problem of green manufacturing in the semiconductor wafer distributed heterogeneous factory, it is assumed that all jobs can be processed in any factory and the number of production stages is the same among factories, while the numbers of machines, the processing times of jobs, and the delivery times of jobs differ between factories. Figure 1 is a schematic diagram of the DRHFS problem for n jobs processed in three factories.
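The factory-level assumptions above (a shared number of stages, but factory-specific machine counts, processing times, and delivery times) can be collected in a small data container. The following sketch is purely illustrative; all names and the toy numbers are our own choices, not data from the paper:

```python
from dataclasses import dataclass

@dataclass
class DRHFSInstance:
    """Toy container for a DRHFS instance: F heterogeneous factories with
    the same number of stages but factory-specific machine counts,
    processing times, and delivery (due) times."""
    num_factories: int        # F
    num_stages: int           # s, identical across factories
    machines_per_stage: list  # machines_per_stage[f][l] = m_l in factory f
    proc_time: list           # proc_time[f][i][j] = time of operation j of job i in factory f
    due_time: list            # due_time[f][i] = delivery time of job i in factory f

    def num_jobs(self):
        return len(self.proc_time[0])

# A 2-factory, 3-stage toy instance with 2 jobs of 3 operations each
inst = DRHFSInstance(
    num_factories=2, num_stages=3,
    machines_per_stage=[[1, 2, 2], [2, 1, 2]],
    proc_time=[[[2, 3, 1], [4, 2, 2]], [[3, 2, 2], [3, 3, 1]]],
    due_time=[[10, 12], [11, 9]],
)
print(inst.num_jobs())  # 2
```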

Optimization Model of a Multi-Objective DRHFS Problem
In multi-objective optimization problems, the objectives often conflict with each other. Enterprise decision-makers need to comprehensively consider the economic benefits as well as social and environmental factors, and select a scheduling scheme from the Pareto solution set based on the actual production situation. In this paper, make span, total carbon emissions, and total tardiness are taken as the optimization objectives. Among them, make span is the maximum completion time over all processing factories, total carbon emissions are the sum of the carbon emissions of all factories, and total tardiness is the sum of the tardiness of all factories. Carbon emissions in the semiconductor wafer manufacturing process mainly include those caused by energy consumption during machine switching, processing, and idling, by coolant consumption, by material loss of jobs, and by lubricant consumption. The carbon emissions generated from the loss of raw materials, coolant consumption, and the on-off power consumption of equipment are not related to scheduling; therefore, only the carbon emissions from equipment processing, idling, and lubricant use are studied. In this paper, a mixed-integer programming model is established. Its description is shown in Tables 1 and 2.
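Since the three objectives aggregate per-factory values (make span as a maximum, total carbon emissions and total tardiness as sums), the aggregation step can be sketched as follows; the function name and tuple layout are illustrative assumptions:

```python
def aggregate_objectives(per_factory):
    """Combine per-factory results into the three global objectives:
    make span is the maximum completion time over all factories, while
    total carbon emissions and total tardiness are sums over factories.
    `per_factory` is a list of (Cmax_f, carbon_f, tardiness_f) tuples."""
    makespan = max(c for c, _, _ in per_factory)
    total_carbon = sum(e for _, e, _ in per_factory)
    total_tardiness = sum(t for _, _, t in per_factory)
    return makespan, total_carbon, total_tardiness

# Per-factory values taken from the paper's small worked example
result = aggregate_objectives([(14, 1.5518, 0.8), (17, 3.6688, 3.6)])
print(tuple(round(v, 4) for v in result))  # (17, 5.2206, 4.4)
```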
Total tardiness of jobs processed in factory f. The constraints of the model (Table 2) indicate that: the completion time of O_ij must be earlier than the starting time of O_i,j+1 (∀i, j, l, l′, l ≠ l′); any operation can be processed on only one machine in the corresponding stage; one job can only be assigned to one factory for processing; each machine at any stage can process at most one operation at the same time; and the make span is defined accordingly.

Basic GWO Algorithm Description
All grey wolves are divided into four grades according to the fitness values of their solutions. The optimal, suboptimal, and third optimal solutions in the solution set are selected as the alpha (α) wolf, beta (β) wolf, and delta (δ) wolf, and the rest are called ω wolves. The search process of the ω wolves is guided by the α, β, and δ wolves. The mathematical model of the grey wolves surrounding the prey is as follows: D = |C·Xp(t) − X(t)| and X(t + 1) = Xp(t) − A·D, where Xp is the position vector of the prey, X is the position vector of a grey wolf, t is the current iteration, A = 2a·r1 − a, C = 2·r2, a decreases linearly from 2 to 0 over the iterations, and r1, r2 are random vectors in [0, 1]. Since the basic GWO algorithm solves continuous function optimization problems, the ascending-order random key coding rule [32] is used to encode the jobs, which forms a one-to-one mapping between an individual's position and the job processing order, thereby constructing a one-dimensional chromosome encoding scheme. This method does not require excessive memory, but it does not show the assignment information of the jobs to the factories. In this paper, the transformation of the chromosome from one dimension to two dimensions is realized by the decoding operation. The newly added one-dimensional chromosome stores the information on which factory each job is assigned to; it appears only during the decoding process and is therefore called the invisible chromosome in this paper. The decoding operation is divided into two phases. The first phase realizes the assignment of jobs to each processing factory. The second phase realizes the scheduling and corresponding machine allocation of the assigned jobs within each factory. In the first phase, the heuristic rule DR, which considers the processing times of jobs and the factory load, is designed for decoding. In the second phase, the PS decoding method of literature [33] is used.
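The ascending-order random key rule described above can be sketched in a few lines; a minimal illustrative version:

```python
def random_key_decode(position):
    """Ascending-order random key rule: the smallest coordinate of the
    continuous position vector gets the first place in the processing
    order, the second smallest the second place, and so on."""
    order = sorted(range(len(position)), key=lambda k: position[k])
    return [k + 1 for k in order]  # 1-based job numbers

# A position whose decode reproduces the example encoding scheme [1, 3, 2, 4]
print(random_key_decode([0.12, 0.77, 0.35, 0.91]))  # [1, 3, 2, 4]
```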
In order to clearly describe the two-stage decoding method, this paper gives a detailed description of it through a small-scale test case of semiconductor wafer fabrication with four jobs, three stages, and two re-entries. In this example, two processing factories are set up, each of which has three stages. The numbers of identical parallel machines at the three stages of the first factory are [1, 2, 2]. The processing power of the one machine at the first stage is 7 kW and its idle power is 1 kW. The processing power of the two machines at the second stage is 5 kW and their idle power is 2 kW. The processing power of the two machines at the third stage is 8 kW and their idle power is 2 kW. The effective service time of the lubricants is equal for machines at the same stage, in the order [5.7, 5. The job data are listed in Table 3: the data outside the brackets are the processing time and delivery time of jobs processed in factory 1, and the data inside the brackets are the processing time and delivery time of jobs processed in factory 2. [1, 3, 2, 4] represents the encoding scheme of the jobs in this example: the decoding operation is first performed on job 1, and then on jobs 3, 2, and 4 in sequence.
The heuristic rule DR (Dynamic Rules) determines the strategy for allocating jobs to the factories. The specific steps are as follows: Step 1: for each job, select the factory with the smaller average processing time to process it. If there are multiple choices, go to Step 2.
Step 2: compare the numbers of jobs that have already been allocated to the two factories, and select the factory with fewer allocated jobs to process the job. If there are multiple choices, go to Step 3.
Step 3: compare the total numbers of machines in the two factories, and assign the job to the factory with the greater number of machines. If there are multiple choices, go to Step 4.
Step 4: randomly select one of the factories to process the job. The pseudocode for the core part of Step 1 is outlined in Algorithm 1.
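The four steps above form a cascade of tie-breakers. A minimal Python sketch with illustrative names (the paper's Algorithm 1 covers only the average-time computation of Step 1):

```python
import random

def dr_assign(job, avg_time, assigned_count, total_machines):
    """Assign one job to a factory with the DR (Dynamic Rules) heuristic.
    avg_time[f]       -- average processing time of the job in factory f
    assigned_count[f] -- jobs already assigned to factory f
    total_machines[f] -- total number of machines in factory f
    Ties cascade: smaller average time -> fewer assigned jobs ->
    more machines -> random choice."""
    factories = list(range(len(avg_time)))
    for key in (lambda f: avg_time[f],          # Step 1
                lambda f: assigned_count[f],    # Step 2
                lambda f: -total_machines[f]):  # Step 3 (more machines first)
        best = min(key(f) for f in factories)
        factories = [f for f in factories if key(f) == best]
        if len(factories) == 1:
            return factories[0]
    return random.choice(factories)             # Step 4

# Worked example from the text: job 3 ties on average time (7.5 vs 7.5);
# factory 2 has fewer assigned jobs, so it wins.
print(dr_assign(3, avg_time=[7.5, 7.5], assigned_count=[1, 0],
                total_machines=[5, 5]))  # 1  (0-based index of factory 2)
```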

Algorithm 1. Calculation of average processing time.
Input: The processing time of jobs (Table 3) and the number of machines at each stage in the two factories.
Output: The average processing time of job i in factory f (Table 4).
We now use this test case to explain the DR rule. First, we calculate the average processing time of each job in the two factories separately; the results are shown in Table 4, with the calculation for factory 1 taken as an example. The decoding operation of the first phase is then performed according to the [1, 3, 2, 4] encoding scheme.
Because P^1_1 = 9 < P^2_1 = 9.5 (where P^f_i denotes the average processing time of job i in factory f), the first job is scheduled to be processed in factory 1. For the second job, P^1_3 = 7.5 = P^2_3 = 7.5, so factory 1 and factory 2 have the same average processing time; we then consider the number of jobs already assigned to the two factories. Job 1 has been assigned to factory 1, while no job has been assigned to factory 2, so job 3 is assigned to factory 2. For the third job, P^1_2 = 10 > P^2_2 = 8.5, so job 2 is assigned to factory 2 for processing. The last job, job 4, is assigned to factory 2 as well. In summary, job 1 is processed in factory 1, and jobs 3, 2, and 4 are processed in factory 2, which completes the first phase of the decoding operation.
In the second phase of the decoding operation, the PS method determines the processing order and machine allocation of the assigned jobs in each factory. Figure 2a shows the scheduling Gantt chart of factory 1 processing job 1, and Figure 2b shows the scheduling Gantt chart of factory 2 processing jobs 3, 2, and 4. The PS decoding method arranges each job on the machine that, in the corresponding stage, can start processing the job earliest. As shown in Figure 2b, the first operation of job 4 can be processed by either of the two machines at stage 1, and its processing time is P21_41 = 2. The first idle interval of the first machine at stage 1 is from time 2 to 4, which is just enough for processing O_41. Since the earliest time at which processing can start on the second machine is S41T12 = 2 (41 denotes the first operation of job 4, and 12 denotes the second machine of stage 1), the start time of O_41 is S2_41 = 2. According to the PS method, when the start times are equal, the machine with the smaller serial number is selected, so the first machine is chosen, and the completion time is C_41 = S2_41 + P21_41 = 2 + 2 = 4. Similarly, the sixth operation of job 4 can select either of the two machines at stage 3. The processing time of O_46 is P23_46 = 1, and the completion time of the previous operation is C_45 = 9. The earliest start time on the first machine at stage 3 that satisfies the precedence constraint between operations is S46T31 = 10, while that on the second machine is S46T32 = 9. Since S46T31 > S46T32, the second machine is chosen, and the start time of O_46 is S2_46 = 9. The machine allocation for every operation of the other jobs follows the same strategy.
The corresponding objective function values in factory 1 are C_max = 14 min, TCT = 1.5518 kgCO2, and TDT = 0.8 min, and those in factory 2 are C_max = 17 min, TCT = 3.6688 kgCO2, and TDT = 3.6 min.

Learning Strategies of Initial Population and Leadership Level
For swarm intelligence algorithms based on the iteration idea, the quality of the initial population has a direct impact on the quality of the solutions. The IMOGWO algorithm performs a reverse learning operation after generating the initial grey wolf population [35]: the reverse individuals of the initial population are generated, and the two populations are combined into a new grey wolf population. According to the fitness value of each individual, fast non-dominated sorting and the crowding distance of each individual are calculated [36]. According to the Pareto dominance relationship, the top N (population size) grey wolf individuals are selected to form the final initial grey wolf population, which improves the quality of the initial population's solutions. The calculation formula for reverse learning of the individual position information is shown in Equation (17).

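Since Equation (17) is not reproduced here, the sketch below assumes the common opposition-based learning form x' = lb + ub − x; the selection step is simplified to a scalar fitness as a stand-in for the paper's fast non-dominated sorting with crowding distance:

```python
import random

def reverse_learning_init(pop_size, dim, lb=0.0, ub=1.0, fitness=None):
    """Opposition-based ("reverse") learning initialisation sketch,
    assuming x' = lb + ub - x for Equation (17). Generate a random
    population, build its reverse population, merge both, and keep the
    best pop_size individuals by fitness (a simplified stand-in for
    non-dominated sorting and crowding-distance selection)."""
    pop = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    reverse = [[lb + ub - x for x in ind] for ind in pop]
    merged = pop + reverse
    if fitness is not None:
        merged.sort(key=fitness)  # smaller fitness = better
    return merged[:pop_size]

random.seed(0)
# Single-objective stand-in fitness: distance of the mean gene from 0.25
best = reverse_learning_init(4, 3, fitness=lambda ind: abs(sum(ind) / 3 - 0.25))
print(len(best), len(best[0]))  # 4 3
```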
The pseudocode for the core part of the reverse learning strategy is outlined in Algorithm 2. The α wolf, β wolf, and δ wolf represent three different leadership levels in the grey wolf population and guide the search behavior of the ω wolves. Therefore, the quality of the individual solutions of the α, β, and δ wolves directly affects the quality of the solutions of the entire grey wolf population. However, in actual predation behavior, the α wolf also needs to learn from the predation experience of grey wolf individuals at the same level to constantly improve itself, the β wolf needs to learn from the α wolf, and the δ wolf needs to learn from the β wolf or the α wolf to enhance its predation skills. Therefore, this paper designs a learning strategy for the leadership level. For an α wolf individual, two unequal random numbers a and b (a < b) are generated, and the fliplr mutation is performed on the variables between a and b to generate a new α wolf individual. For a β wolf individual, one α wolf individual is randomly selected, and LOX cross-mutation [37] is carried out between them to produce a new β wolf individual. For a δ wolf individual, a random number r is generated: if r < 0.5, one α wolf individual is randomly selected; otherwise, one β wolf individual is randomly selected, and the LOX cross-mutation is carried out between the two to produce a new δ wolf individual. The pseudocode for the core part of LOX cross-mutation is outlined in Algorithm 3.
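The two leadership-learning operators can be sketched on job permutations. The exact LOX variant of literature [37] may differ in details, so the crossover below is a common order-crossover form, not the paper's exact operator:

```python
import random

def flip_segment(perm):
    """Alpha-wolf self-learning: pick two cut points a < b and reverse
    the segment between them (the "fliplr" mutation)."""
    a, b = sorted(random.sample(range(len(perm)), 2))
    return perm[:a] + perm[a:b + 1][::-1] + perm[b + 1:]

def lox(p1, p2):
    """Order crossover sketch (a common LOX form): copy a random segment
    of p1 in place, then fill the remaining slots left-to-right with the
    genes of p2 that are not yet used, preserving their order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    rest = [g for g in p2 if g not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

random.seed(1)
print(flip_segment([1, 3, 2, 4]))
print(lox([1, 3, 2, 4], [4, 2, 3, 1]))
```

Both operators always return a valid permutation of the parent's jobs, which is the property the decoding scheme relies on.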

Predatory Behavior Search Strategy
The position update of each individual in the grey wolf population is influenced not only by the three leadership levels (α wolf, β wolf, and δ wolf) but also by the ω wolves. Therefore, this paper designs a new formula for updating grey wolf individual positions to ensure the diversity of the predation search strategies. A random number rand between [0, 1] is generated. According to the value of rand, the LMOX cross operation is carried out between the current i-th grey wolf individual and the α wolf, the β wolf, the δ wolf, or an arbitrarily selected ω wolf individual, respectively. The formula is shown in Equation (18).
The LMOX cross operation generates an r_flag array whose elements consist of 0 and 1. Take Figure 4 as an example. An r_flag array with the same length as the grey wolf individual is generated; the value of each element is determined by a decision number rr randomly generated in [0, 1]: if rr < 0.5, the corresponding position of the r_flag array is set to 1, otherwise 0. The variables [3, 4], whose positions are 1 and 4, are copied to the offspring with their order unchanged. These two variables are then deleted from the selected β wolf individual π^t_β, and the remaining variables [1, 2] are copied to the offspring to form the new offspring individual π^(t+1)_i. When the r_flag array is all 1 or all 0, there are two special cases: the new offspring individual equals the grey wolf individual π^t_i or π^t_β, respectively.

The pseudocode for the core part of the LMOX cross operation is outlined in Algorithm 4.
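Following the r_flag description above, a minimal LMOX sketch; we assume that the flagged genes keep their positions, with the leader's unused genes filling the gaps in order:

```python
import random

def lmox(current, leader, rng=random.random):
    """LMOX crossover sketch: build a 0/1 flag array (a position is set
    to 1 when a random draw rr < 0.5). Genes of `current` at flagged
    positions are kept in place; the gaps are filled left-to-right with
    the genes of `leader` not yet present, in their original order.
    All-1 flags reproduce `current`; all-0 flags reproduce `leader`."""
    flags = [1 if rng() < 0.5 else 0 for _ in current]
    child = [g if f else None for g, f in zip(current, flags)]
    rest = iter(g for g in leader if g not in child)
    return [g if g is not None else next(rest) for g in child]

# Flags [1, 0, 0, 1] keep genes 3 and 4 of the current individual,
# as in the Figure 4 example; the leader supplies the rest in order.
draws = iter([0.1, 0.9, 0.8, 0.2])  # forces flags [1, 0, 0, 1]
print(lmox([3, 1, 2, 4], [2, 4, 1, 3], rng=lambda: next(draws)))
# [3, 2, 1, 4]
```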

Simulation Experiments
In order to verify the effectiveness of the IMOGWO algorithm proposed in this paper for solving green manufacturing collaborative optimization problems of the semiconductor wafer distributed heterogeneous factory, the NSGA-II algorithm proposed in literature [36], the MODGWO algorithm proposed in literature [38], and the MOGWO algorithm proposed in literature [26] are used as comparison algorithms in the experimental stage. Among them, MODGWO is an improved version of the GWO algorithm that has achieved good experimental results, MOGWO is the original multi-objective grey wolf algorithm, and NSGA-II is a classical and effective algorithm for solving multi-objective optimization problems. In the experimental stage, the three objective functions are first normalized according to the min-max standard so that they are of the same order of magnitude, which makes them suitable for comprehensive comparative evaluation. The experimental simulation environment is the Windows 10 operating system, the processor is an Intel(R) Core(TM) i7-4770 CPU @3.40 GHz, the memory is 8 GB, and the algorithms are programmed in MATLAB R2017a.
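The min-max normalization mentioned above can be sketched as:

```python
def min_max_normalise(values):
    """Min-max normalisation used to bring objective values to the same
    order of magnitude before computing the evaluation indicators:
    v' = (v - min) / (max - min). A constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalise([14, 17, 20]))  # [0.0, 0.5, 1.0]
```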

Test Cases
In this paper, the collaborative optimization of green manufacturing in a distributed heterogeneous factory is studied. The relevant parameters of factory 1 are selected from the 12 small-scale bi-objective RHFS scheduling test cases designed by Cho et al. [8]; on this basis, the carbon emissions indicator is added. Due to the heterogeneity and rapid-manufacturing features of distributed manufacturing, the processing times of the jobs at each stage of factory 2 and factory 3 are integers uniformly distributed on [1, 10] and [1, 20], respectively. The numbers of identical parallel machines at each stage are integers uniformly distributed on [1, 2]. The delivery times of jobs are calculated according to the method in literature [39,40]. The formula is shown below.
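The test-case generation for the heterogeneous factories can be sketched as follows; the function and parameter names are illustrative, and the due-date formula of [39,40] is omitted since it is not reproduced here:

```python
import random

def generate_factory(num_jobs, num_ops, num_stages, high):
    """Generate heterogeneous-factory data as described in the text:
    integer processing times uniform on [1, high] (high = 10 for
    factory 2, 20 for factory 3) and parallel-machine counts uniform
    on [1, 2] at each stage."""
    proc = [[random.randint(1, high) for _ in range(num_ops)]
            for _ in range(num_jobs)]
    machines = [random.randint(1, 2) for _ in range(num_stages)]
    return proc, machines

random.seed(42)
proc2, m2 = generate_factory(num_jobs=4, num_ops=6, num_stages=3, high=10)
proc3, m3 = generate_factory(num_jobs=4, num_ops=6, num_stages=3, high=20)
print(len(proc2), len(proc2[0]), m2)
```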
Among them, PT i is the sum of the processing time of job i at each stage in the processing factory. The other parameters and values involved in algorithms are shown in Table 5.

Performance Measures
In this paper, SP, GD, IGD, and Ω are selected as evaluation indicators [33,41]. It is important to note that neither the true Pareto optimal solutions nor the best-known Pareto solutions of the considered problem are known. Thus, we regard the non-dominated solutions among the results of all tested algorithms as the reference Pareto optimal solutions.
The SP indicator is used to evaluate the uniformity of the non-inferior solutions distributed on the Pareto front. The minimum value of SP is 0; the smaller the value, the more uniform the distribution of the non-inferior solutions. The formula is shown in Equation (21).
where i, j = 1, . . . , |PF|, i ≠ j; d̄ is the mean of all d_i; and |PF| is the number of non-inferior solutions on the Pareto front.
The GD indicator is used to measure the approximation between the Pareto front obtained by an algorithm and the optimal Pareto front. The minimum value of GD is 0; the smaller the value, the closer the non-inferior solutions on the obtained Pareto front are to the optimal Pareto front. The formula is shown in Equation (22).
where |PF| is the number of non-inferior solutions on the Pareto front. d i is the Euclidean distance between the i-th solution on the Pareto front and the nearest solution on the optimal Pareto front.
The IGD indicator is more comprehensive than the GD indicator, which can evaluate the convergence and diversity of the non-dominant solutions simultaneously. The smaller the IGD value, the better the performance of the algorithm is. The formula is shown in Equation (23).
where dist(x, PF) is the Euclidean distance between a solution on the optimal Pareto front and the nearest solution on the obtained Pareto front, and |PF*| represents the number of non-inferior solutions on the optimal Pareto front. The Ω indicator calculates the percentage of the non-inferior solution set obtained by an algorithm in the optimal Pareto set, which reflects the dominance performance of the algorithm. The formula is shown in Equation (24).
where L_i represents the non-inferior solution set obtained by the i-th algorithm, and |P(O)| represents the number of non-dominated solutions in the set O. Ω_Lk takes a value in [0, 1]; the larger it is, the better the dominance of the Pareto front obtained by the algorithm.
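The four indicators can be sketched as follows. SP uses Schott's common Manhattan-distance form and GD/IGD use average Euclidean distances; these are widespread definitions that may differ in detail from Equations (21)-(24):

```python
import math

def spacing(front):
    """SP: uniformity of the front (Schott's spacing, a common form)."""
    d = [min(sum(abs(x - y) for x, y in zip(p, q))
             for j, q in enumerate(front) if j != i)
         for i, p in enumerate(front)]
    dbar = sum(d) / len(d)
    return math.sqrt(sum((dbar - di) ** 2 for di in d) / (len(d) - 1))

def gd(front, ref):
    """GD: average distance from the obtained front to the reference front."""
    return sum(min(math.dist(p, r) for r in ref) for p in front) / len(front)

def igd(front, ref):
    """IGD: average distance from the reference front to the obtained front;
    it reflects both convergence and diversity."""
    return sum(min(math.dist(r, p) for p in front) for r in ref) / len(ref)

def omega(front, ref):
    """Omega: share of the reference Pareto set contributed by this algorithm."""
    return sum(1 for r in ref if r in front) / len(ref)

ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
front = [(0.0, 1.0), (1.0, 0.0)]
print(gd(front, ref), omega(front, ref))
```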

Performance Testing and Results Analysis
In the experimental stage, each algorithm runs 10 times independently, taking the average value as the final result, and the optimal values of each indicator are shown in bold. The mean value (Avg) and the minimum value (Min) or the maximum value (Max) of each indicator are selected for comparative analysis. For the SP, GD, and IGD indicators, the smaller the value, the better the algorithm, so the minimum value (Min) is selected. For the Ω indicator, the larger the value, the better the dominance of the algorithm, so the maximum value (Max) is selected. It can be seen from Table 6 that the IMOGWO algorithm is superior to the MODGWO, MOGWO, and NSGA-II algorithms in the uniformity, convergence, diversity, and dominance of the solutions on the Pareto front for the 2-factory cases. In order to determine more clearly whether there are significant differences between the algorithms, SPSS Statistics 17 was used to conduct the Wilcoxon signed-rank test, and the results are shown in Table 7. Table 6 shows the experimental results of the test cases with 2 factories. For the dominant proportion of 'Avg' in the SP indicator, the IMOGWO algorithm is 59%, the MODGWO algorithm is 8%, the MOGWO algorithm is 33%, and the NSGA-II algorithm is 0%. For its dominant proportion of 'Min', the IMOGWO algorithm is 25%, the MODGWO algorithm is 33%, the MOGWO algorithm is 25%, and the NSGA-II algorithm is 17%. For the dominant proportion of 'Avg' in the GD indicator, the IMOGWO algorithm is 83%, the MODGWO algorithm is 0%, the MOGWO algorithm is 17%, and the NSGA-II algorithm is 0%. For its dominant proportion of 'Min', the IMOGWO algorithm is 83%, the MODGWO algorithm is 8.5%, the MOGWO algorithm is 8.5%, and the NSGA-II algorithm is 0%. For the dominant proportion of 'Avg' in the IGD indicator, the IMOGWO algorithm is 100%, and the MODGWO, MOGWO, and NSGA-II algorithms are all 0%.
For its dominant proportion of 'Min', the IMOGWO algorithm is 83%, the MODGWO algorithm is 8.5%, the MOGWO algorithm is 8.5%, and the NSGA-II algorithm is 0%. For the dominant proportion of 'Avg' in the Ω indicator, the IMOGWO algorithm is 100%, and the MODGWO, MOGWO, and NSGA-II algorithms are all 0%. For its dominant proportion of 'Max', the IMOGWO algorithm is 75%, the MODGWO algorithm is 8%, the MOGWO algorithm is 0%, and the NSGA-II algorithm is 17%. As can be seen from 'Mean' in the last line of Table 6, the IMOGWO algorithm is superior to the other three compared algorithms in the SP, GD, IGD, and Ω indicators. Table 7 summarizes the results of the Wilcoxon signed-rank test with two factories, from which it can be seen that all relevant p-values are less than 0.05 for the IGD and Ω indicators. For the SP and GD indicators, the p-values are less than 0.05 in all cases except when IMOGWO is compared with the MOGWO algorithm on one problem. It can be concluded that the proposed IMOGWO is significantly superior to the other compared algorithms at the 95% confidence level for the IGD and Ω indicators. For the SP and GD indicators, there is no significant difference between IMOGWO and MOGWO on the 01 problem, but IMOGWO is significantly superior to the other algorithms. The box-plots of the four evaluation indicators in Figure 5 further verify this conclusion. It can also be seen from Table 8 that the IMOGWO algorithm is superior to the MODGWO, MOGWO, and NSGA-II algorithms in the uniformity, convergence, diversity, and dominance of the solutions on the Pareto front for the three-factory cases. To determine more clearly whether there are significant differences between the algorithms, SPSS Statistics 17 was used to conduct the Wilcoxon signed-rank test, and the results are shown in Table 9. Table 8 shows the experimental results of the test cases with three factories.
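The paper runs the Wilcoxon signed-rank test in SPSS Statistics 17; an equivalent paired test can be sketched with SciPy. The per-run IGD values below are hypothetical placeholders (not the paper's data), used only to show the shape of the comparison between two algorithms over 10 paired runs.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-run IGD values for two algorithms over the same 10 runs.
igd_imogwo = np.array([0.012, 0.015, 0.011, 0.014, 0.013,
                       0.012, 0.016, 0.011, 0.013, 0.014])
igd_modgwo = np.array([0.031, 0.028, 0.035, 0.030, 0.033,
                       0.029, 0.034, 0.032, 0.031, 0.030])

# Paired two-sided Wilcoxon signed-rank test on the run-by-run differences.
stat, p = wilcoxon(igd_imogwo, igd_modgwo)
significant = p < 0.05  # 95% confidence level, as used in the paper
```

Since every difference here has the same sign, the rank-sum statistic is 0 and the p-value falls well below 0.05, mirroring the "significantly superior" conclusion drawn from Tables 7 and 9.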
For the dominant proportion of 'Avg' in the SP indicator, the IMOGWO algorithm is 67%, the MODGWO algorithm is 0%, the MOGWO algorithm is 16.5%, and the NSGA-II algorithm is 16.5%. For its dominant proportion of 'Min', the IMOGWO algorithm is 42%, the MODGWO algorithm is 16.5%, the MOGWO algorithm is 16.5%, and the NSGA-II algorithm is 25%. For the dominant proportion of 'Avg' in the GD indicator, the IMOGWO algorithm is 100%, and the MODGWO, MOGWO, and NSGA-II algorithms are all 0%. For its dominant proportion of 'Min', the IMOGWO algorithm is 83%, the MODGWO algorithm is 8.5%, the MOGWO algorithm is 8.5%, and the NSGA-II algorithm is 0%. For the dominant proportion of 'Avg' in the IGD indicator, the IMOGWO algorithm is 100%, and the MODGWO, MOGWO, and NSGA-II algorithms are all 0%. For its dominant proportion of 'Min', the IMOGWO algorithm is 92%, the MODGWO algorithm is 8%, the MOGWO algorithm is 0%, and the NSGA-II algorithm is 0%. For the dominant proportion of 'Avg' in the Ω indicator, the IMOGWO algorithm is 92%, the MODGWO algorithm is 8%, the MOGWO algorithm is 0%, and the NSGA-II algorithm is 0%. For its dominant proportion of 'Max', the IMOGWO algorithm is 92%, the MODGWO algorithm is 8%, the MOGWO algorithm is 0%, and the NSGA-II algorithm is 0%. As can be seen from 'Mean' in the last line of Table 8, the IMOGWO algorithm is superior to the other three compared algorithms in the GD, IGD, and Ω indicators; for the SP indicator, IMOGWO is slightly worse than the NSGA-II algorithm. Table 9 summarizes the results of the Wilcoxon signed-rank test with three factories, from which it can be seen that all relevant p-values are less than 0.05 for the GD, IGD, and Ω indicators. For the SP indicator, the p-values are less than 0.05 in all cases except when IMOGWO is compared with the MOGWO algorithm on the 01 problem and when IMOGWO is compared with the MOGWO and NSGA-II algorithms on the 11 problem.
It can be concluded that the proposed IMOGWO is significantly superior to the other compared algorithms at the 95% confidence level for the GD, IGD, and Ω indicators. For the SP indicator, there is no significant difference between IMOGWO and MOGWO (for the 01 and 11 problems) or NSGA-II (for the 11 problem), but IMOGWO is significantly superior to the MODGWO algorithm. The box-plots of the four evaluation indicators in Figure 6 further verify this conclusion.

Case Analysis
This paper takes the test case Sproblem-02-01 with two factories as an example for specific analysis. There are 15 jobs in this case, and the number of re-entries is 1. It is assumed that there are eight stages in each production line: oxidation, deposition, injection, metallization, lithography, etching, polishing, and cleaning. The related parameters of the first factory are shown in Table 10, and those of the second factory are shown in Table 11. The Pareto optimal fronts obtained by each algorithm for this test case are shown in Figure 7. Figure 7a is a three-dimensional graph of the three optimization objectives. The Pareto front obtained by the IMOGWO algorithm lies to the lower right of the fronts obtained by the other three algorithms and is therefore closest to the true Pareto front, followed by NSGA-II, MOGWO, and MODGWO, which has the worst convergence. Among the 17 solutions of the obtained optimal Pareto front, 15 are obtained by IMOGWO, one by MODGWO, and one by NSGA-II. The two-dimensional diagram of makespan and total carbon emissions is shown in Figure 7b, that of total carbon emissions and total tardiness in Figure 7c, and that of makespan and total tardiness in Figure 7d. They show that the optimization objectives conflict with one another: an improvement in one indicator can lead to the deterioration of at least one other. Table 12 lists part of the solutions and their corresponding scheduling schemes.

Taking the first solution set in Table 12 as an example, this paper draws the scheduling Gantt charts of jobs in factory 1 and factory 2, shown in Figure 8. The makespan of jobs in factory 1 is 126 min, the total carbon emissions are 40.2161 kgCO2, and the total tardiness is 120.1580 min.
The makespan of jobs in factory 2 is 140 min, the total carbon emissions are 67.3113 kgCO2, and the total tardiness is 41.4895 min. The makespan of all jobs is 140 min, the maximum of the makespans of the two factories; the total carbon emissions are 107.5274 kgCO2, the sum of the two factories' total carbon emissions; and the total tardiness is 161.6475 min, the sum of the two factories' total tardiness.
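The aggregation rule described above — global makespan as the maximum over factories, carbon emissions and tardiness as sums — can be sketched as follows, using the factory-level values of the Figure 8 example; the dictionary layout is illustrative, not the paper's data structure.

```python
def aggregate_objectives(per_factory):
    """Combine per-factory objectives into the three global objectives:
    makespan is the max over factories; carbon emissions and total
    tardiness are summed across factories."""
    makespan = max(f["makespan"] for f in per_factory)
    carbon = sum(f["carbon"] for f in per_factory)
    tardiness = sum(f["tardiness"] for f in per_factory)
    return makespan, carbon, tardiness

factories = [
    {"makespan": 126, "carbon": 40.2161, "tardiness": 120.1580},  # factory 1
    {"makespan": 140, "carbon": 67.3113, "tardiness": 41.4895},   # factory 2
]
# aggregate_objectives(factories) yields makespan 140,
# carbon ≈ 107.5274 kgCO2, tardiness ≈ 161.6475 min.
```

This max/sum structure is why the per-factory job allocation matters: shifting a job between factories can lower the global makespan while raising the summed carbon or tardiness, producing the trade-offs visible in Figure 7.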

Conclusions
In this paper, the IMOGWO algorithm is designed to solve the green manufacturing collaborative optimization problem of the semiconductor wafer distributed heterogeneous factory. IMOGWO realizes the rational allocation of jobs to factories and the production scheduling of the assigned jobs within each factory by adopting the heuristic rule DR and the PS decoding method. The reverse learning strategy designed for the initial population, the fliplr mutation and LOX crossover learning strategies designed for the leadership level, and the LMOX crossover search strategy between a grey wolf individual and randomly selected α, β, δ, and ω wolves enlarge the diversity of the population and improve the quality of the solutions. Simulation results demonstrate that the proposed IMOGWO has clear advantages and competitiveness compared with the other three algorithms. By enlarging the diversity of the population, the algorithm can escape local optima and find high-quality scheduling schemes. The green scheduling model designed in this paper focuses on the carbon emission indicator. All of these elements are also applicable to other production scheduling problems, such as distributed flexible job shop green scheduling and distributed uncertain job shop green scheduling. In actual semiconductor production, new wafers are continuously put into processing every day, and operations such as trial processing make wafer processing times uncertain; therefore, uncertain scheduling in semiconductor wafer manufacturing is a direction of our future research. In addition, a semiconductor production line may comprise hundreds of devices and hundreds of different product types in flow, which makes the problem very large in scale, so studying the scheduling of large-scale jobs in semiconductor production is also imperative.
Lastly, applying more novel technical methods, such as game theory models, fuzzy theory, machine learning, and Internet of Things technology, to solve the distributed job shop scheduling problem within an effective time is also a focus of our future research.
Author Contributions: J.D. contributed to developing ideas about the framework with algorithms, conducted the experiments, and wrote the paper. C.Y. contributed toward providing guidance for this paper.