An Expandable Yield Prediction Framework Using Explainable Artificial Intelligence for Semiconductor Manufacturing

Abstract: Enormous amounts of data are generated and analyzed in the modern semiconductor industry. Established yield prediction studies have dealt with one type of data or a dataset from one procedure. However, semiconductor device fabrication comprises hundreds of processes, and various factors affect device yields. This challenge is addressed in this study by using an expandable input data-based framework to include diverse factors in the prediction and by adopting explainable artificial intelligence (XAI), which utilizes model interpretation to modify fabrication conditions. After preprocessing the data, several machine learning models are optimized and compared to select the best performing model for the dataset, which is a random forest (RF) regression with a root mean square error (RMSE) value of 0.648. The prediction results enhance production management, and the explanations of the model deepen the understanding of yield-related factors with Shapley additive explanation (SHAP) values. This work provides evidence with an empirical case study of device production data. The framework improves prediction accuracy, and the relationships between yield and features are illustrated with the SHAP value. The proposed approach can potentially analyze expandable fields of fabrication conditions to interpret multifaceted semiconductor manufacturing.


Introduction
In the highly competitive semiconductor manufacturing industry, yield analysis plays a vital role in increasing production efficiency and reducing operating costs. This is because yield is one of the most critical factors in profit, and yield analysis results are applied to enhance yield and adjust settings for poor process conditions. If yield can be predicted in advance, production planning efficiency would increase, and the accomplishment of the yield-enhancing plan could be confirmed promptly. Since modern electronic devices are scaled down and complicated, the overall manufacturing procedure takes several weeks from the input of wafers to chip packaging. Briefly, the fabrication of a device generally consists of three stages: (1) the wafer fabrication procedure constructs integrated circuits on each wafer via hundreds of precise and well-controlled processes; (2) the assembly utilizes wafer probing test results to sort dice and encases each die through multiple steps; and (3) the final test stage includes electrical testing to verify the reliability and the quality of produced chips. When a problem in the wafer fabrication stages lowers the yield and its cause is detected only in the final test phase, profit falls and it takes considerable time to normalize the whole manufacturing procedure [1]. Therefore, at the end of the wafer fabrication phase, there are probing and testing steps to exclude wafers expected to have a low final test yield. Developing a model to predict the yield and to counteract yield-lowering procedures in a timely manner, based on the induced knowledge, is one way to effectively utilize the intermediate test results. Such prediction models can strengthen the competitiveness of a semiconductor manufacturer.
Appl. Sci. 2023, 13, 2660
In work-site operations, a substantial part of yield analysis is performed manually by experienced domain engineers. For instance, wafers expected to show yield improvement are analyzed to confirm yield-enhancing process conditions. After the confirmation, wafers processed with the modified conditions are predicted to have the same yield gain. On the other hand, if an instrument malfunctions, the wafers processed in that machine during a certain period are predicted to have a lower yield. This method is limited because engineers may miss yield-affecting circumstances, and the same condition at one step may affect each wafer differently. Moreover, there are too many data to monitor and examine manually in practical yield prediction. The various monitoring systems in production lines include metrology and inspection tools for monitoring processes, sensors on instruments to check the process conditions, and tests of properties for each wafer, such as electric die sorting (EDS) yield, wafer acceptance test (WAT), and final test (FT) [2]. These systems monitor only a limited number of wafers, a part of the area on chips, or certain conditions due to limitations of time, capacity, and cost. Therefore, they are not enough to identify multifaceted factors that enhance or lower the yield. Another limitation is that various data are stored separately in several systems for individual areas and are organized in different formats, while there are many kinds of data, including numerical, categorical, and serial types, among others. Additionally, practical yield analysis is conducted repeatedly for each stored dataset to screen the yield-related features. Consequently, yield analysis necessarily includes wide-ranging data and needs to be expandable.
An overview of the related literature is presented in this paragraph. Semiconductor yield analysis has a long history, and research on yield modeling is summarized in reference [3]. Various perspectives include defective yield loss [4], statistical process control [5], and integrated process specification systems [6]. Recently, machine learning techniques have attracted much attention as they enable researchers to handle large-scale data and automate analysis. In numerous prediction studies in diverse fields, including information technology, geographical science, industrial manufacturing, and energy production, a regression model is established for each research purpose through comparison and selection among various machine learning and deep learning models [7–12]. In our study, we adopted this procedure of choosing the best-fitting model for the prediction. Specifically, machine learning-based methods have been applied in various studies in the semiconductor industry [2,13–18]. In our view, yield research mainly focuses on analysis with one kind of input variable. Various types of independent variables exist; however, the variables used are mostly of one type or from a dataset obtained in one test procedure. One study focuses on non-normally distributed test parameters to estimate the yield using principal component analysis [19]. Considering the delays of gates and paths on a chip, researchers predict parametric yield using statistical timing analysis [20]. Machine learning-based root cause detection approaches are developed to target a specific circuit test yield [21]. The existing approaches to using metrology data for yield classification focus on the imputation of missing data and the counteraction of imbalanced classes [22]. There are studies on back-end FT yield regression and classification modeling based on front-end WAT data [1,23]. Some studies preprocess categorical input data using a one-hot encoder or other encoders to utilize non-numerical data as input for machine learning models, where the categorical and numerical data are recorded in the same step [1,24]. In previous research, big data analysis for low yield validates its detection efficiency with simulation, especially for the development of new devices [25]. One study modifies the gradient boosting algorithm to analyze multi-step data from semiconductor manufacturing [26].
Regarding XAI, a few studies have applied this method to semiconductor manufacturing. As discussed in the literature, XAI has been studied to apply artificial intelligence with more transparency while maintaining high performance [27]. Additionally, explainability makes the continuous improvement of models possible and supports the approval of AI-based decisions. The SHAP value method is adopted to improve process quality in the real production of semiconductor devices [28]. SHAP analysis is used to rank the features that affect the electrical test scores of the device [24]. To the best of our knowledge, few studies cover machine learning-based modeling using various kinds of wafer fabrication data to predict EDS yield and explain each wafer and feature with the XAI method. This study uses different types of data from the whole wafer fabrication stage as input data to examine yield impact. Multiple machine learning models are trained to compare prediction performance for the wafer yield, and the top model is explained on the basis of SHAP values. Brief explanations of the utilized machine learning algorithms and the SHAP value method are given in the proposed framework part of this manuscript.
Less attention has been paid to the analysis of multiple kinds of data in the front-end production stage to predict EDS yield. Therefore, this paper aims to predict wafer yield based on combined fabrication data, to include possible causes by investigating as many fields as possible, and to speed up the feedback. The framework is expanded by interpreting the model to determine the yield-lowering factors and by explaining the relationship between factors and yields. The XAI technique SHAP decomposes the individual attribution of target-affecting factors [29,30]. The main motivation for this research is to build an automated yield prediction model, to enhance the effectiveness of production scheduling, to report an analysis for adjusting the problematic condition of processes, and to deepen the understanding of fabrication conditions. To sum up, the work presented here provides the highlights listed below:

• The yield prediction framework utilizes various types of fabrication data, allowing input data expansion;
• XAI technology, i.e., SHAP, is implemented and improves the explainability of the best-performing model;
• A demonstration on a real-world dataset is analyzed with SHAP values, including the discovery of factors affecting yield.
Moreover, this work possibly contributes to developing procedures of advanced devices, including frequent evaluation of various process modifications in the wafer fabrication, with the determination of yield-affecting factors.
The remainder of this paper is organized as follows: Section 2 describes the framework and methods used in this study. Section 3 provides the main findings of a case study and further discussion. Finally, Section 4 provides the conclusion, which contains the possible application of this proposal and suggestions for future studies.

Proposed Framework
The previous section reveals that earlier studies have mostly not included diversified fabrication data in modeling or XAI techniques for explaining models. To solve these problems, we propose a method that builds models with four different types of data to predict EDS yield and interprets the chosen model with SHAP values. Although the input and output data are selected for practical use in the field, the framework is adaptable for expansion and various investigations. The framework includes (1) data preparation with preprocessing, (2) model optimization and selection, and (3) prediction of yield and explanation of the model. Figure 1 shows the overall framework proposed in this paper. To confirm the stability of the expandable framework, the framework is applied to each dataset, and the extracted yield-related features are compared.

Data Preparation with Preprocessing
The input variables consist of various data subsets related to wafer fabrication, including process operating conditions, time spans, equipment units, and some sensor parameters. These are common fields of investigation for practical yield analysis. Various fabrication data are combined for each wafer to organize the input variables into a two-dimensional form.
The numerical data include the time spans from process steps to a designated step, sensor parameters measured for each processed wafer at several steps, and EDS yield data as the target variable. Each data item has a different scale of values; hence, standardization is necessary to prevent models from being biased. The categorical data include the settings of conditions for process operation and the units of instruments in several processes. These are string-type data that need to be converted into numeric values for machine learning applications [1]. The one-hot encoding method is used to transform the data and retain the information in column names, which is useful in the explanation stage. This encoding is chosen since most of the categorical data have no rank order, and ordinal encoding, which separates categories by equal steps, could convey unintended meaning. The Pearson correlation coefficients are calculated, and the dimensions of the input data are reduced when coefficient values exceed 0.95 [23]. We split the dataset into training and test sets and used only the training set for model optimization and selection.
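The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration with pandas and NumPy, not the authors' code: the function name `preprocess`, the toy column names, the 80/20 split, and the choice to compute scaling statistics and correlations on the training rows only are all assumptions made for the example. The text does not state the order in which splitting and scaling were performed.

```python
import numpy as np
import pandas as pd

def preprocess(df, target, corr_threshold=0.95, test_fraction=0.2, seed=0):
    """Standardize numeric inputs, one-hot encode categoricals, drop highly
    correlated columns, and split into training and test sets."""
    y = df[target]
    X = df.drop(columns=[target])

    # Assumption: split first so that scaling statistics use training rows only.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

    numeric_cols = X.select_dtypes(include="number").columns
    categorical_cols = X.columns.difference(numeric_cols)

    # Standardize numeric columns with the training mean and standard deviation.
    mean = X.iloc[train_idx][numeric_cols].mean()
    std = X.iloc[train_idx][numeric_cols].std(ddof=0).replace(0, 1.0)
    X_num = (X[numeric_cols] - mean) / std

    # One-hot encode; new column names take the form "<feature>_<category>".
    X_cat = pd.get_dummies(X[categorical_cols], dtype=float)
    X_all = pd.concat([X_num, X_cat], axis=1)

    # Drop one column of every pair whose |Pearson r| exceeds the threshold.
    corr = X_all.iloc[train_idx].corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    X_all = X_all.drop(columns=to_drop)

    return (X_all.iloc[train_idx], X_all.iloc[test_idx],
            y.iloc[train_idx], y.iloc[test_idx], to_drop)

# Toy data with hypothetical column names (not the paper's dataset).
toy_rng = np.random.default_rng(1)
n_wafers = 40
toy = pd.DataFrame({
    "step_a_T": toy_rng.normal(50.0, 5.0, n_wafers),           # time span
    "step_b_P": toy_rng.normal(1.0, 0.1, n_wafers),            # sensor value
    "step_c_R": toy_rng.choice(["51", "52", "53"], n_wafers),  # process condition
    "eds_yield": toy_rng.normal(0.0, 1.0, n_wafers),           # target
})
toy["step_a_T_copy"] = toy["step_a_T"] * 2.0  # perfectly correlated duplicate

X_train, X_test, y_train, y_test, dropped = preprocess(toy, "eds_yield")
print("dropped columns:", dropped)
print("model inputs:", list(X_train.columns))
```

One design point worth noting: a categorical feature with exactly two levels yields two one-hot columns that are perfectly anti-correlated, so a filter on the absolute correlation would remove one of them. Whether the original pipeline treats such pairs specially is not stated in the text.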

Model Optimization and Selection
In this research, ten popular and high-performance regression models were employed for comparison. The ten machine learning algorithms consist of a linear-based learner, four tree-based models, two kernel-based learners, a neural network-based learner, an instance-based learner, and a sample consensus algorithm [8]. The training dataset was used for training and validation, adopting the cross-validation method, which splits the training set multiple times to obtain different training and validation datasets and helps avoid overfitting. The lasso algorithm is a linear model with an L1 regularizer, whose strength is excluding useless variables [31]. The adaptive boosting (AdaBoost) regression model is a decision tree-based model that uses multiple regressors trained according to the errors of the prior regressors [32]. Two more tree-based boosting models are extreme gradient boosting (XGBoost) regression and light gradient boosting machine (LightGBM) regression. XGBoost uses a gradient boosting mechanism and is well-known for efficient computing [33]. LightGBM is well-known for its speed and leaf-wise tree growth strategy [34]. The RF regression algorithm uses bagging to build trees from subsamples and random subsets of predictors, and RF aggregates multiple tree models to avoid overfitting [35]. The two kernel-based regression models are the support vector regression (SVR) model and the Gaussian process regression (GPR) model. SVR, the support vector machine-based approach to regression, employs the kernel trick of mapping input vectors to higher-dimensional feature spaces [36]. GPR is a nonparametric machine learning model based on the Bayesian approach, which is beneficial for measuring uncertainty over predictions [37]. Multilayer perceptron (MLP) is a feedforward artificial neural network algorithm that trains models with the backpropagation technique [38]. K nearest neighbor (KNN) regression predicts the target by interpolating over the neighborhood [39]. Random Sample Consensus (RANSAC) is an algorithm that generates putative solutions with the most points in a consensus set through random sampling iteration and is suitable for datasets with a high number of outliers [40]. In this study, sklearn in Python was used for Lasso, AdaBoost, RF, SVR, GPR, MLP, KNN, and RANSAC modeling. The Python packages xgboost and lightgbm were used for XGBoost and LightGBM modeling.
The RMSE and mean absolute error (MAE) are the metrics for calculating model performance [7,12]. The formulas are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (1)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (2)$$

where $n$ denotes the total number of data points, $y_i$ denotes the empirical yield values in the dataset, and $\hat{y}_i$ denotes the predicted yield values from the model. The grid search with cross-validation is used to determine the optimal hyper-parameters for each model. The RMSE and MAE values of the hyper-parameter-tuned models are compared to select the best model. The scores are measured multiple times with the N-fold cross-validation method to enhance the robustness of the evaluation.
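As a rough illustration of the two metrics and of grid search with cross-validation, the following sketch tunes a random forest with scikit-learn on synthetic data. The data, the hyper-parameter grid, the 5-fold setting, and the helper names `rmse` and `mae` are assumptions for the example; the grids actually used in the study are those reported in Table 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Synthetic stand-in for the preprocessed fabrication data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=120)
X_train, X_test, y_train, y_test = X[:90], X[90:], y[:90], y[90:]

# Hypothetical grid; scikit-learn scorers are negated errors (higher is better).
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring={"rmse": "neg_root_mean_squared_error",
             "mae": "neg_mean_absolute_error"},
    refit="rmse",  # refit the final model on the setting with the best RMSE
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# The held-out test set is touched only after tuning is finished.
y_pred = search.best_estimator_.predict(X_test)
test_rmse, test_mae = rmse(y_test, y_pred), mae(y_test, y_pred)
print("best hyper-parameters:", search.best_params_)
print(f"test RMSE = {test_rmse:.3f}, test MAE = {test_mae:.3f}")
```

Repeating the same search for each candidate algorithm and comparing the cross-validated scores gives the kind of model comparison described in the text.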

Prediction and Explanation
The tuned models are applied to predict the test dataset, which is kept separate from the model optimization and selection procedure. The prediction performance scores can support the feasibility of the model selection procedure. The best performing model is combined with the XAI method, i.e., SHAP. Originating from game theory in economic science, the Shapley value is the relative contribution of a factor to the outcome [29]. The SHAP value is a calculated attribution value combining conditional expectation and the Shapley value [30]. As shown in the following Equation (3), the SHAP value for each feature is defined as

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right)\right] \qquad (3)$$

where $F$ is the set of input features, $S \subseteq F \setminus \{i\}$ is a subset of features that excludes feature $i$, $f$ is the model prediction, and $f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right)$ denotes the marginal contribution of feature $i$ to the prediction. There are several reasons to pay attention to SHAP, such as local accuracy, consistency, its model-agnostic nature, and the ability to visualize the interpretation.
The computed SHAP values show the contribution of each feature to the prediction for each instance. The explanation model $f(x)$, which approximates the model, is the sum of the feature attributions, and the sum is equal to the model output for a single instance $x$, following the local accuracy property [30,41–43]:

$$f(x) = \phi_0 + \sum_{i \in F} \phi_i \qquad (4)$$

where $\phi_0$ is $E[f(x)]$, the expected value of the function. The method also calculates the rank of features, specifically from the averaged absolute SHAP values of each feature. The SHAP values for a specific parameter illustrate the relationship between the parameter and the target values in the model [44]. The Python package shap is used in this manuscript.
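To make the Shapley formula and the local accuracy property concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets for a tiny hand-written model. It is not the TreeSHAP algorithm or the shap package used in the study. The toy model, the background rows, and the function names are hypothetical, and the subset value is approximated by averaging the model over background rows, which is a simplifying assumption rather than a true conditional expectation.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact Shapley values of `model` at instance `x` by subset enumeration.

    The value of a subset S is the model output averaged over the background
    rows after overwriting the features in S with the values from `x`.
    """
    n_features = len(x)
    features = range(n_features)

    def f_subset(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in features]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        value = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! for subsets of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = set(subset)
                value += weight * (f_subset(s | {i}) - f_subset(s))
        phi.append(value)

    base_value = f_subset(set())  # phi_0: average output over the background
    return base_value, phi

def toy_model(v):
    """Hypothetical model with one additive term and one interaction."""
    return 2.0 * v[0] + v[1] * v[2] - 0.5 * v[2]

background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [3.0, 3.0, 2.0]]
x = [1.0, 2.0, 3.0]

base_value, phi = shapley_values(toy_model, x, background)
print("phi_0 =", base_value, "phi =", phi)
# Local accuracy: the base value plus all attributions recovers the output.
print("phi_0 + sum(phi) =", base_value + sum(phi), "model(x) =", toy_model(x))

# Global ranking by mean absolute Shapley value over several instances.
all_phi = [shapley_values(toy_model, row, background)[1] for row in background]
mean_abs = [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(x))]
ranking = sorted(range(len(x)), key=lambda i: mean_abs[i], reverse=True)
print("features ranked by mean |phi|:", ranking)
```

The enumeration grows exponentially with the number of features, which is why specialized algorithms are needed for models with hundreds of inputs.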

Results and Discussion
The purpose of this experiment is to confirm the proposed method with recent empirical data. The fabrication and yield data are provided from the device processes of a semiconductor manufacturer. For proprietary reasons, the exact names of the variables and the device are not revealed. The data are restricted to the wafers of a specific device with EDS yield data so that they can be used in supervised modeling. The organized dataset has 352 parameters and 327 wafers for the specific device production. After input data preprocessing, including one-hot encoding and Pearson correlation coefficient-based dimension reduction, the number of input parameters becomes 983. Table 1 summarizes the counts of the overall dataset and each type of dataset during preprocessing and after the train-test split. The distributions of the target variable are similar and not skewed, as statistically summarized in Table 1.

Model Selection and Prediction
During the model optimization and selection procedure, grid search tunes the hyper-parameters, which vary depending on the model algorithm, as shown in Table 2. The hyper-parameters include the number of estimators, max depth, subsampling size, learning rate, kernels, regularization factors, activation functions, max iterations, etc. The ten tuned models are compared using MAE and RMSE scores, which are obtained dozens of times by the cross-validation method, to select the best-performing model. As shown in Figure 2, the RF model has the smallest average MAE and RMSE, and the standard deviation of the RF scores is also good among the ten models. The KNN model shows the second-best performance with the training and test datasets. In terms of validation and prediction scores, the RANSAC and MLP models are ranked at the bottom. The other models, namely SVR, LightGBM, GPR, XGBoost, AdaBoost, and Lasso, have similar validation and prediction performances for estimating EDS yield with the combined fabrication data.
The prediction of EDS yield with the tuned models is executed using the test dataset. The MAE and RMSE values of each model are compared, as shown in Figure 3. The prediction results reveal that the RF model performs best among all the models in both metrics, with MAE and RMSE values of 0.520 and 0.648, respectively. The result shows a similar order to the earlier cross-validated scores in the model selection phase. RMSE values are consistently larger than MAE values because RMSE uses squared differences, so larger errors increase RMSE more. As reference scores, a statistical estimator predicts every target value as the average of the training dataset. The scores of this estimator are 0.744 for MAE and 0.919 for RMSE, and they are poorer than the scores of the machine learning models, as presented in Table 2.
To conclude the yield model building part, the case study shows how the multi-fabrication data are delivered as input data and how the optimization and comparison of models work. Through the suggested framework, the EDS yield of the device is predicted more accurately by 30.1% in the MAE score compared with the simple statistical prediction. The RF algorithm-based model ranks first for the case and is explained with SHAP values in the following section. The XAI method illustrates the chosen model and itemizes important features with visualization.
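The 30.1% figure follows directly from the reported test scores, as the short calculation below shows. The helper name `relative_improvement` is introduced only for this illustration; the four error values are the ones quoted in the text.

```python
def relative_improvement(baseline_error, model_error):
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error

# Test-set scores reported in the text.
baseline = {"MAE": 0.744, "RMSE": 0.919}  # mean-of-training-set estimator
rf_model = {"MAE": 0.520, "RMSE": 0.648}  # tuned random forest

improvement = {
    metric: relative_improvement(baseline[metric], rf_model[metric])
    for metric in baseline
}
for metric, gain in improvement.items():
    print(f"{metric}: {100 * gain:.1f}% lower error than the baseline")
```

The same calculation applied to RMSE gives a reduction of about 29.5%, so the gain is similar under both metrics.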
Table 2. Prediction performance of 10 regression models and their tuned hyper-parameters.

Explanation of the Model Using the SHAP Value Method
In the following interpretation part, the selected RF model is analyzed with SHAP, specifically TreeSHAP, a specialized method for tree-based models [30,41]. XAI is an active research topic because there is uncertainty in decision-making based on machine learning results. In this study, the SHAP value method is adopted, enabling granular explanations of the contribution of each feature [24,27]. Figure 4a is the SHAP summary plot, showing how the top parameters affect yield prediction. The plot lists the features in descending order of the average absolute SHAP value and shows the correlation between the input variables and the output. As a numerical feature example, as the "step_aa_T" parameter value decreases, the SHAP values decrease to −0.06, where a negative SHAP value means a negative impact on the prediction, as shown in Equation (4). To examine partial dependency, SHAP value scatterplots describe the effect of changing an individual feature [41,42]. In Figure 5, the SHAP value scatterplot for the "step_aa_T" parameter shows the details of the nonlinear relationship, in which the EDS yield is roughly proportional to the parameter values and differs only near the peak value. As shown in the other plot of Figure 5, the "step_ax_P" feature implies that both low and high sensor values are related to a relatively lower yield. As an example of a categorical parameter, the "step_s_R_53" parameter shows its yield-enhancing influence with SHAP values up to 0.04 on the summary plot in Figure 4a and on the individual feature's SHAP value scatterplot in Figure 5. The "step_s_R_52" parameter is the other processing-condition category of the same step, produced by the one-hot encoder, and the two parameters show opposite responses in the summary plot, as shown in Figure 4a. As shown in the waterfall chart of Figure 4b, these two categorical parameters share SHAP values, as both indicate that the wafer is processed with the operating condition represented by "52" instead of "53." As shown in Figures 4 and 5, the "53" process condition is expected to increase yield in the prediction model. Practically, these parameters represent a process design change, and the influence on yield is consistent with the domain knowledge that "53" is the advanced process of the step.

The waterfall charts indicate how each feature influences the expected target prediction for a specific data point, as shown in Figure 4b–d. In other words, SHAP waterfall charts illustrate how the explanation model decomposes the model output for an instance (i.e., a wafer) into the sum of positive and negative SHAP values of each feature, as in Equation (4). The basic prediction yield of the RF model is $E[f(x)]$, and the predicted yield according to the summation of SHAP values for each example wafer is $f(x)$ [30,44]. This approach can help analysts perform investigations on specific low-yield wafers. For example, Figure 4c shows that a specific low-yield wafer is processed by a specific unit, as represented by the plus value for "step_ac_U_8", and this has a yield-lowering effect of −0.04. The sum of all the SHAP values of the 983 parameters for the wafer, $f(x)$ of Figure 4c, is −0.694, and the predicted yield of the wafer by the RF model, $f(x)$, is −0.673. The SHAP value method decodes the model well for the data point and illustrates the stepwise prediction. The top parameters identified in the expandable framework overlap with those of individual dataset-based models, as shown in Figure S1. The proposed approach can replace iterative modeling for each dataset and save effort and time.
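The stepwise reading of a waterfall chart can be mimicked in a few lines: start from the base value and add the per-feature SHAP values one at a time until the prediction for the wafer is reached. The sketch below only illustrates that bookkeeping. The function name, the feature names, and all numbers are hypothetical and are not taken from Figure 4.

```python
def waterfall_rows(base_value, shap_values, top_k=3):
    """Return (label, contribution, running_total) rows for one instance.

    Features are added in order of decreasing absolute SHAP value; features
    beyond the first `top_k` are pooled into a single row.
    """
    ordered = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    rows = [("E[f(x)]", base_value, base_value)]
    running = base_value
    for name, value in ordered[:top_k]:
        running += value
        rows.append((name, value, running))
    rest = ordered[top_k:]
    if rest:
        pooled = sum(value for _, value in rest)
        running += pooled
        rows.append((f"{len(rest)} other features", pooled, running))
    return rows

# Hypothetical base value and SHAP values for one wafer (illustrative only).
example_base = 0.10
example_shap = {
    "feature_a": -0.20,
    "feature_b": 0.05,
    "feature_c": -0.04,
    "feature_d": 0.01,
    "feature_e": -0.01,
}

rows = waterfall_rows(example_base, example_shap)
for label, contribution, total in rows:
    print(f"{label:>18s}  {contribution:+.2f}  ->  {total:+.2f}")
```

The last running total is the explanation model's output for the instance, which is the quantity that a waterfall chart labels as the final prediction.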

Discussion and Limitation
The proposed framework to predict EDS yield with extensive wafer fabrication data is demonstrated with real-world data. The dataset consists of diverse data applied for practical yield analysis in the field. Ten different machine learning models are optimized and compared to select the best performing prediction model for the data. The chosen RF model shows improved prediction scores of 0.520 for MAE and 0.648 for RMSE. We employ the XAI method, i.e., SHAP, to explain the model and present the relationship between the key features and the yield. Thus, this study raises the possibility of practical yield prediction with expandable fabrication data and interpretation through SHAP analysis.
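As a reminder of how the two reported scores are defined, the sketch below computes MAE and RMSE under repeated k-fold cross-validation for a mean-value baseline of the kind described in the Table 2 footnote. It is illustrative only: the yield values are synthetic, and the split of 5 folds with 4 repeats is an assumption chosen merely because it produces 20 validation scores; the actual validation scheme is not restated here.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate_mean_baseline(y, k=5, repeats=4, seed=0):
    """Score a baseline that predicts the training-set mean for every test wafer."""
    scores = []
    for r in range(repeats):
        for test_idx in kfold_indices(len(y), k, seed + r):
            held_out = set(test_idx)
            train = [y[i] for i in range(len(y)) if i not in held_out]
            guess = sum(train) / len(train)
            y_test = [y[i] for i in test_idx]
            y_hat = [guess] * len(y_test)
            scores.append((mae(y_test, y_hat), rmse(y_test, y_hat)))
    return scores


# Synthetic, standardized "yield" values (hypothetical data).
rng = random.Random(42)
yields = [rng.gauss(0.0, 1.0) for _ in range(100)]

scores = cross_validate_mean_baseline(yields)
avg_mae = sum(m for m, _ in scores) / len(scores)
avg_rmse = sum(r for _, r in scores) / len(scores)
print(f"{len(scores)} validation scores, avg MAE={avg_mae:.3f}, avg RMSE={avg_rmse:.3f}")
```

Any candidate regressor can be scored the same way by replacing the constant `guess` with the model's predictions on the held-out fold; a useful model should beat this baseline on both metrics.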
Feature analysis must be applied cautiously because these relationships, inferred from the SHAP value scatterplots over feature values, do not guarantee causality. Therefore, any counteraction on the fabrication process should be considered carefully, with domain knowledge and proper experiments to establish possible causation. Knowledge derived from the XAI analysis can be considered a candidate controlling factor in the wafer fabrication processes, although the physical or chemical background of the phenomenon needs to be discussed and examined through further research. The framework goes beyond simply predicting the yield and listing important features; that is, XAI shows how each feature is reflected in the yield prediction and how the yield of each wafer is predicted based on the correlations in the model. Therefore, the XAI method increases the transparency of the model and improves its usability.
The limitations of this study must be acknowledged. For new trial operating conditions, modification and retraining of the model are necessary. Some datasets from wafer fabrication are not included in the analysis because of their characteristics. As a typical example, metrology and inspection data have a high missing rate, which makes these data challenging to use in this framework. Nevertheless, there is research that proposes a method for identifying the key steps in fabrication processes using missing value imputation [22], and other studies focus on advanced imputation mechanisms, such as virtual metrology (VM) [45,46]. Moreover, various other information is related to quality control, e.g., line condition, equipment maintenance [17,47], source material change, and engineers' notes. Without wafer information, these datasets require a complicated conversion to serve as the input dataset for this modeling. To counteract the expansion of input variables in future studies, principal component analysis could make dimension reduction during preprocessing more efficient [13,19,48]. Another limitation is that any yield-improving action based on the analysis results needs to consider various aspects of manufacturing, such as production efficiency, equipment maintenance costs, and serial changes to other features. For example, some time span parameters affect the yield, as shown in Figure 5; however, a restriction on these features would affect prior and later steps and require advanced scheduling [49,50]. In addition, deep learning models are not studied in this paper because of the limited number of data points, but they should be included in future research [12,51]. Furthermore, a study of the SHAP method with non-tree-based models is required, considering other candidate models such as KNN and SVR [42,52].
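The missing-rate issue mentioned above can be made concrete with a small screening step. The sketch below is only an illustration: the column names, the example values, and the 50% cutoff are hypothetical and are not taken from the dataset or the preprocessing used in this study.

```python
def missing_rate(column):
    """Fraction of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def drop_sparse_columns(table, max_missing=0.5):
    """Keep only columns whose missing rate does not exceed `max_missing`."""
    return {
        name: column
        for name, column in table.items()
        if missing_rate(column) <= max_missing
    }


# Hypothetical wafer-level table: one list per feature, one entry per wafer.
table = {
    "process_unit": ["U_8", "U_3", "U_8", "U_1"],
    "queue_hours": [5.0, None, 12.0, 3.0],
    "metrology_thickness": [None, None, None, 101.2],  # sampled on few wafers
}

rates = {name: missing_rate(column) for name, column in table.items()}
usable = drop_sparse_columns(table, max_missing=0.5)
print("missing rates:", rates)
print("columns kept:", sorted(usable))
```

A sparsely sampled measurement such as the hypothetical `metrology_thickness` column is screened out by this rule, which is why imputation or virtual metrology would be needed before such data could enter the framework.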

Figure 1. Overall flowchart of yield prediction and explanation with multi-type fabrication data.

Figure 2. Distribution of the 20 validated (a) MAE and (b) RMSE values for the RF, KNN, LightGBM, SVR, GPR, XGBoost, AdaBoost, Lasso, RANSAC, and MLP models using the cross-validation method, and statistical tables of the average (Avg) and standard deviation (StdDev) values for the models.

Figure 4. (a) SHAP value plot of feature attribution for top parameters by the RF model. The color corresponds to the range of feature values from high (red) to low (blue). A positive SHAP value means a positive impact on prediction, leading the model to predict a high yield for the wafer. Waterfall plots of example wafers (b) with mid yield, (c) with low yield, and (d) with high yield. The color corresponds to the SHAP value: positive (red) and negative (blue).

Figure 5. SHAP value scatterplots according to the values of six example features.

Table 1. Summary of the dataset.

Table 2. Prediction performance of 10 regression models and their tuned hyper-parameters.
* Statistical estimator predicts target values of the test dataset as the average value of the training dataset. ** This table is sorted in ascending order by the RMSE value.