Peer-Review Record

Predicting Work-in-Process in Semiconductor Packaging Using Neural Networks: Technical Evaluation and Future Applications

Electronics 2024, 13(21), 4275; https://doi.org/10.3390/electronics13214275
by Chin-Ta Wu 1,2, Shing-Han Li 3,* and David C. Yen 4
Submission received: 21 September 2024 / Revised: 23 October 2024 / Accepted: 29 October 2024 / Published: 31 October 2024
(This article belongs to the Special Issue Feature Review Papers in Electronics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study compared the Backpropagation Neural Network with SVM. Why SVM? SVM, by itself, requires a lot of tuning of its hyperparameters (kernel, misclassification penalty, etc.). Were these parameters investigated? It is stated that MAPE is much lower for the neural network. Which particular SVM was chosen and why? What about other classifiers that could offer some explainability, like nearest neighbors?

Are the SVM predictions good enough to be used in practice?

 

I think these issues must be addressed before this can be considered for publication.

 

Author Response

Comments 1: The study compared the Backpropagation Neural Network with SVM. Why SVM? SVM, by itself, requires a lot of tuning of its hyperparameters (kernel, misclassification penalty, etc.). Were these parameters investigated? It is stated that MAPE is much lower for the neural network. Which particular SVM was chosen and why?

Response 1: The goal of this study is to assess deep learning capability against a traditional machine learning algorithm. Among traditional machine learning algorithms, SVM (or SVR for regression) is a classic choice for regression and classification tasks.

However, SVM has high computational complexity: when the data size is large, obtaining optimal modeling performance requires considerably more time and cost. Considering the size of our experimental data, we adopted Least Squares Support Vector Regression (LS-SVR), proposed by Suykens and Vandewalle (1999), which introduces the least squares method to resolve the optimization problem of SVM.

To handle the SVM hyperparameters, we adopted the Radial Basis Function (RBF) as the kernel function and set the RBF kernel parameter to 0.01; this value was selected through a grid search.
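The kernel-parameter selection described above can be sketched as follows. This is a simplified, dependency-free least-squares kernel regression (the bias term of the full LS-SVR dual formulation is omitted), and the toy data, candidate gamma values, and regularization value are purely illustrative, not the study's actual settings:

```python
import math

def rbf(x, z, gamma):
    """RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ls_svr(X, y, gamma, lam):
    """Least-squares fit: solve the linear system (K + lam*I) alpha = y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)

def predict(X_train, alpha, gamma, x):
    """f(x) = sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X_train))

def mape(y_true, y_pred):
    """Mean absolute percentage error."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data plus a small validation set for the grid search (illustrative only).
X_tr, y_tr = [[0.0], [1.0], [2.0], [3.0]], [1.0, 2.0, 4.0, 8.0]
X_val, y_val = [[0.5], [2.5]], [1.4, 5.7]

# Grid search: pick the RBF kernel parameter that minimizes validation MAPE.
best_err, best_gamma = min(
    (mape(y_val, [predict(X_tr, fit_ls_svr(X_tr, y_tr, g, 1e-6), g, x)
                  for x in X_val]), g)
    for g in (0.01, 0.1, 1.0)
)
```

The same selection loop applies unchanged to a full LS-SVR implementation; only `fit_ls_svr` would need to solve the slightly larger dual system that includes the bias term.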

 

SVM References:

Suykens, J. A. K., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9(3), 293–300.

Xu, S., An, X., Qiao, X., Zhu, L., & Li, L. (2013). Multi-output least-squares support vector regression machines. Pattern Recognition Letters, 34(9), 1078–1084.

 

Comments 2: What about other classifiers that could offer some explainability, like nearest neighbors?  Are the SVM predictions good enough to be used in practice?

Response 2: In this study, SVM is used for a regression task. On the experimental dataset, we compared the regression performance of classic machine learning algorithms (linear regression, Lasso regression, Ridge regression, and SVM) and found that SVM outperformed all the other algorithms examined in this study.

 

          SVM     Linear Regression   Lasso Regression   Ridge Regression
MAPE      0.475   0.725               1.664              0.729
Adj-R2    0.826   0.721               0.573              0.721
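For reference, the two metrics reported above can be computed as follows; this is a plain-Python sketch, and the function and variable names are illustrative rather than taken from the study's code:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error (lower is better)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2: R^2 penalized by the number of predictors (higher is better)."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
```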

 

The analysis source code has been attached (please check the following file: MDPI_Source_Code.html).


https://html-preview.github.io/?url=https://github.com/LiShingHan/Python/blob/main/MDPI_Source_Code.html

Reviewer 2 Report

Comments and Suggestions for Authors

The paper describes the application of neural networks in semiconductor packaging - examining how the BPNN model predicts the work-in-progress arrival rates at various stages of semiconductor packaging processes and focusing on enhancing production efficiency.

The paper is structured well, it adequately covers the introduction and motivation for the research, related research, and research methodology. On the other hand, the paper is pretty short and does not highlight why BPNN is better in comparison to other neural network types.

Please consider accepting the following suggestions to improve the paper:

1) Sometimes WIP is explained as "work-in-process" and sometimes as "work-in-progress" - please explain the difference or use the same term if there is a typo or mistake.

2) Please reference Moore's Law with external sources in the Introduction section.

3) Please provide the link to the source code used during the research.

4) Please provide the details of the machine configuration used during the research.

5) Please describe why you think that BPNN is so superior to other neural network types.

6) Please reflect on possible usages of this methodology and approach to other areas and domains.

7) Please provide more details related to the dataset (data entries), the data percentage you used for training, and the validation of the neural network precision.

Author Response

Comments 1: Sometimes WIP is explained as "work-in-process" and sometimes as "work-in-progress" - please explain the difference or use the same term if there is a typo or mistake.

Response 1: While “work-in-process” and “work-in-progress” are often used interchangeably, there are subtle differences. In the semiconductor industry, “work-in-process” is the more precise term, referring to wafers or chips actively moving through specific manufacturing steps. Following this comment, we have standardized the term as “work-in-process” throughout the manuscript.

 

Comments 2: Please reference Moore's Law with external sources in the Introduction section.

Response 2: We have added an external reference to Moore’s Law in the Introduction section and updated the reference list with the following citation, added as item 19:

“19. Moore, G. (2021). Cramming more components onto integrated circuits (1965).”

 

Comments 3: Please provide the link to the source code used during the research.

Response 3: The source code is attached (MDPI_Source_Code.html).

Because the research method has since been applied to data from a more recent period, we have updated the experimental data presented in this paper; the results on the new data are consistent with the conclusions of the original version.

https://html-preview.github.io/?url=https://github.com/LiShingHan/Python/blob/main/MDPI_Source_Code.html

Comments 4: Please provide the details of the machine configuration used during the research.

Response 4: The hardware used for training the BPNN model consists of an Intel 12th Gen i7-12700 CPU with 12 cores (8 Performance-cores and 4 Efficient-cores) and a maximum single-core frequency of 4.9 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 3090 graphics card. This information has also been added to the manuscript.

 

Comments 5: Please describe why you think that BPNN is so superior to other neural network types.

Response 5: As mentioned in our response to another reviewer, our goal was to evaluate a deep learning algorithm against traditional machine learning methods. Among deep learning algorithms, we selected the classic BPNN as a representative model. We acknowledge, however, that other, newer neural network architectures could potentially yield better results; this is a promising direction for future research.

 

Comments 6: Please reflect on possible usages of this methodology and approach to other areas and domains.

Response 6: Beyond its application in the manufacturing industry, the proposed methodology also has potential in the transportation sector. Accurate prediction of arrival rates could improve overall operational efficiency and hence reduce time-related costs for related businesses and industries.

 

Comments 7: Please provide more details related to the dataset (data entries), the data percentage you used for training, and the validation of the neural network precision.

Response 7: The description for each variable and related response have been updated in the revised paper (See Table 1).

For the experimental data presented in this study, we collected 5,740 records from the production system and split them into three sub-datasets: a training set, a validation set, and a testing set. The training set (68%) is used to train the regression model, the validation set (12%) is employed to evaluate performance during model training, and the testing set (20%) is used to measure performance on previously unseen data.
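The 68/12/20 split described above can be sketched as follows. This is a generic shuffled split; the function name and seed are illustrative, and the study may equally have split the data chronologically:

```python
import random

def split_dataset(records, train_frac=0.68, val_frac=0.12, seed=42):
    """Shuffle indices and cut them into train / validation / test partitions;
    whatever is left after the train and validation cuts becomes the test set."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(records) * train_frac)
    n_val = int(len(records) * val_frac)
    train = [records[i] for i in idx[:n_train]]
    val = [records[i] for i in idx[n_train:n_train + n_val]]
    test = [records[i] for i in idx[n_train + n_val:]]
    return train, val, test

# With 5,740 records this yields 3,903 / 688 / 1,149 examples.
train, val, test = split_dataset(list(range(5740)))
```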

 

For the BPNN algorithm, the model contains an input layer, an output layer, and five hidden layers between them. To balance computational efficiency against the risk of vanishing gradients, the Rectified Linear Unit (ReLU), one of the most popular activation functions for neural networks, was adopted as the activation function (Ramachandran, 2017). Training runs for up to 5,000 epochs, with early stopping terminating training when the validation loss does not decrease for 30 consecutive epochs.

ReLU Reference:

Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941.
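The early-stopping rule in Response 7 (stop once the validation loss has not improved for 30 consecutive epochs) can be sketched framework-independently as follows; `evaluate` is a hypothetical callback standing in for one epoch of training plus validation, and the simulated loss curve is purely illustrative:

```python
def train_with_early_stopping(evaluate, max_epochs=5000, patience=30):
    """Run up to max_epochs, stopping once the validation loss has not
    improved for `patience` consecutive epochs.
    Returns (last_epoch, best_validation_loss)."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        val_loss = evaluate(epoch)  # one epoch of training + validation
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted: stop training
    return epoch, best_loss

# Simulated loss curve: improves until epoch 90, then plateaus at 0.1,
# so training stops 30 epochs later, at epoch 120.
stopped_at, best = train_with_early_stopping(lambda e: max(0.1, 1.0 - 0.01 * e))
```

In Keras this behavior corresponds to the `EarlyStopping` callback with `monitor="val_loss"` and `patience=30`.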

Reviewer 3 Report

Comments and Suggestions for Authors

It would be beneficial for the paper to include more detailed information on the experimental environment, such as the hardware specifications (e.g., CPU, GPU) and the software environment used, including the specific libraries or frameworks (e.g., TensorFlow, PyTorch). Providing these details would likely improve the reproducibility of the experiments and give readers a clearer understanding of the computational resources required.

 

Clarifying the model training process could strengthen the paper. Specifically, explaining the number of epochs used and the criteria for selecting that number would provide valuable insight. Additionally, offering more details on the validation and test set separation, as well as any early stopping or convergence criteria, would likely enhance the transparency and rigor of the training process.

Author Response

Comments 1: It would be beneficial for the paper to include more detailed information on the experimental environment, such as the hardware specifications (e.g., CPU, GPU) and the software environment used, including the specific libraries or frameworks (e.g., TensorFlow, PyTorch). Providing these details would likely improve the reproducibility of the experiments and give readers a clearer understanding of the computational resources required.

Response 1: For the software environment, the experiments were performed mainly in Python 3.6.14; the relevant libraries and versions are TensorFlow (1.12.0), Keras (2.2.4), and scikit-learn (0.18.1). For the hardware environment, the machine used for training the BPNN model consists of an Intel 12th Gen i7-12700 CPU with 12 cores (8 Performance-cores and 4 Efficient-cores) and a maximum single-core frequency of 4.9 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 3090 graphics card.

 

Comments 2: Clarifying the model training process could strengthen the paper. Specifically, explaining the number of epochs used and the criteria for selecting that number would provide valuable insight. Additionally, offering more details on the validation and test set separation, as well as any early stopping or convergence criteria, would likely enhance the transparency and rigor of the training process.

Response 2: We appreciate the reviewer's insightful feedback on our manuscript and agree that a detailed description of the model training process would enhance the clarity and robustness of our study. Below, we address each point raised:

  1. Number of Epochs and Criteria for Selection: The number of epochs was determined based on the convergence behavior observed during the training phase. Specifically, we set the training to continue for a maximum of 100 epochs with a patience parameter of 10 epochs for early stopping. This setup was chosen after several preliminary trials, which indicated that extending training beyond this point did not yield significant improvements in model performance on the validation set, hence avoiding overfitting.
  2. Validation and Test Set Separation: Our data was split into three distinct sets: training (68%), validation (12%), and testing (20%). This distribution was carefully selected to ensure that the model would be trained on a sufficiently large dataset while also allowing for rigorous validation and independent testing. The validation set was used to monitor the model's performance during training, particularly for tuning hyperparameters and early stopping to prevent overfitting.
  3. Early Stopping and Convergence Criteria: We implemented an early stopping mechanism to halt training when the validation loss did not improve for 10 consecutive epochs. This approach helps in preventing the model from overfitting to the training data. The convergence of the model was assessed based on the stabilization of loss and accuracy metrics on the validation set, which is a standard practice in neural network training.

These methodologies were selected to ensure the robustness and generalizability of our neural network model, aiming to provide reliable predictions of WIP arrival rates in semiconductor manufacturing processes. Further details of these procedures have now been added to the Methods section of our manuscript to provide clarity and enhance the transparency of our research approach.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have sufficiently addressed the previous review comments and have revised the manuscript accordingly. It can be published.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors, thank you for accepting the suggestions for improving your manuscript.
