Article
Peer-Review Record

Development of Data-Driven Machine Learning Models for the Prediction of Casting Surface Defects

by Shikun Chen 1,*,† and Tim Kaufmann 2,*,†
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 2 November 2021 / Revised: 10 December 2021 / Accepted: 17 December 2021 / Published: 21 December 2021
(This article belongs to the Special Issue Optimizing Techniques and Understanding in Casting Processes)

Round 1

Reviewer 1 Report

The authors applied six machine learning algorithms to predict casting surface defects and used the SHAP framework to interpret them. The following points should be addressed: 1. The paper introduces the algorithms in detail, but the data sets are not described. More information about the data sets is needed. 2. Six algorithms are used, but there is no conclusion. A brief conclusion should be included in the paper.

Author Response

Thank you for your suggestions. 

Point 1: We have added subsection 3.1 to introduce the input data. 

Point 2: We added a conclusion for the six algorithms (lines 368-376).

Reviewer 2 Report

An approach for the application of machine learning to the prediction and understanding of casting surface defects is presented in the current manuscript. Six different machine learning algorithms were trained, and the SHAP framework was used to interpret the model output, which can provide guidance for optimizing the casting process chain. The manuscript still has deficiencies that should be corrected prior to publication.

  • Please check the use of headings. Section 2.3 has only one subsection, 2.3.1. In addition, the headings for the SHAP benefits should not use “1. 2. 3.”
  • The title of chapter 2 is "Machine Learning and Casting Defects"; however, casting defects are described only in 2.4. This description should be moved to chapter 1, Introduction.
  • The paragraph directly below the chapter 3 heading, “Experiments and Results”, describes the data set. It therefore needs its own heading.
  • The legends of Fig. 4 and Fig. 5 should be changed to explain the meaning of these two figures.
  • What are the parameters para1, para2, and para3, respectively? Please add a note.
  • As mentioned above, there are 51 independent variables. Why are there 282 SHAP features? Please check the data carefully.

Author Response

Thank you for your suggestions. 

  1. We removed subtitle 2.3.1.
  2. We changed the title of chapter 2 to "Machine learning" and moved "Casting Defects" to subsection 1.1.
  3. We added a new subsection, 3.1, called "Introduction to data set".
  4. We modified the legends of the figures.
  5. Because of business considerations, we cannot give the explicit names of those parameters; however, we explain their meaning at lines 414-416.
  6. We preprocessed the input data; after preprocessing, the number of input features increased from 51 to 282.
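
The response to point 6 does not say which preprocessing step increases the feature count. One common cause is one-hot encoding of categorical variables, where every distinct level of a text column becomes its own 0/1 column. The sketch below illustrates that mechanism only; it is an assumption, not the authors' documented pipeline, and the column names (`pour_temp`, `alloy`, `shift`) and the helper `one_hot_expand` are hypothetical.

```python
def one_hot_expand(rows, categorical):
    """Expand each categorical column into one 0/1 indicator column per level.

    Illustrative only: the record does not state which encoding the authors used.
    """
    # Collect the sorted set of levels observed in each categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}

    # Build the expanded column names in the original column order.
    columns = []
    for name in rows[0]:
        if name in levels:
            columns.extend(f"{name}={lvl}" for lvl in levels[name])
        else:
            columns.append(name)

    # Encode every row: indicators for categorical columns, floats otherwise.
    encoded = []
    for r in rows:
        out = []
        for name in rows[0]:
            if name in levels:
                out.extend(1.0 if r[name] == lvl else 0.0 for lvl in levels[name])
            else:
                out.append(float(r[name]))
        encoded.append(out)
    return columns, encoded


# Hypothetical process records: 3 raw variables, 2 of them categorical.
rows = [
    {"pour_temp": 1400, "alloy": "A", "shift": "day"},
    {"pour_temp": 1385, "alloy": "B", "shift": "night"},
    {"pour_temp": 1410, "alloy": "C", "shift": "day"},
]
columns, encoded = one_hot_expand(rows, categorical=["alloy", "shift"])
print(len(rows[0]), "raw variables ->", len(columns), "model features")
```

In this toy case three raw variables become six model features, which is the same kind of growth as the 51 to 282 reported above. SHAP values are computed per model feature, so they are reported for the expanded columns rather than for the raw variables.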

Reviewer 3 Report

(1) First, the input variables selected for the present models should be clearly stated. Second, please present the ranges of the input variables in the training data, together with their average values.

(2) Some of the input variables must have been difficult to quantify numerically, such as the data from quality management. How did you deal with them?

(3) Six types of machine learning algorithm were applied. Please show the calculation parameters of these algorithms in more detail.

(4) In lines 344-350, the training data was also used for testing in order to find the best hyperparameters. Why was the training data divided into 10 groups, with the fold left out then used for testing? What is the difference between the results in Table 1 and Table 2? The results in Table 1 already show that ET gave the best response.

(5) The prediction model indicates that the process variables para_1, para_2 and para_3 are the main variables influencing the occurrence of the casting surface defect. What do these parameters refer to in detail?

Comments for author File: Comments.pdf

Author Response

Thank you for your suggestions.

  1. We added a new subsection 3.1, "Introduction to data set", to explain the input data. We also added Table 1 to present their mean and standard deviation.
  2. As explained in subsection 3.2, the input variables are pre-processed first. We transform text or string values to numeric values so that they can be fed into machine learning algorithms.
  3. We added Table 3 to show the hyperparameters after tuning.
  4. As explained in subsection 3.3, the 20% of the data reserved for testing is kept unseen until the training phase has ended. K-fold cross-validation is applied only to the training data set: the prediction function is learned using k-1 folds, and the resulting model is validated on the remaining fold. The purpose of k-fold cross-validation is to select the best combination of hyperparameters.
  5. Because of business considerations, we cannot give their names explicitly, but we explain their meaning at lines 414-416.
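
The procedure described in point 4 can be sketched in a few lines: hold out 20% of the data, run k-fold cross-validation on the remaining 80% to choose a hyperparameter, then score the chosen model once on the held-out part. The sketch below is a minimal illustration under stated assumptions. The data are synthetic, and a one-variable ridge regression with penalty `alpha` stands in for the six algorithms of the paper. All function names are hypothetical.

```python
import random


def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and return (train_idx, test_idx)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_folds(indices, k):
    """Yield (train, validation) index lists; each fold is validation once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def fit_ridge_1d(x, y, idx, alpha):
    """Closed-form ridge slope for y ~ w * x (no intercept) on rows idx."""
    sxy = sum(x[i] * y[i] for i in idx)
    sxx = sum(x[i] * x[i] for i in idx)
    return sxy / (sxx + alpha)


def mse(x, y, idx, w):
    """Mean squared error of the model y ~ w * x on rows idx."""
    return sum((y[i] - w * x[i]) ** 2 for i in idx) / len(idx)


def select_alpha(x, y, train_idx, alphas, k=10):
    """Return the alpha with the lowest mean validation MSE over k folds."""
    scores = {}
    for alpha in alphas:
        errors = [
            mse(x, y, val, fit_ridge_1d(x, y, tr, alpha))
            for tr, val in k_folds(train_idx, k)
        ]
        scores[alpha] = sum(errors) / len(errors)
    return min(scores, key=scores.get), scores


# Synthetic data (an assumption for illustration, not the foundry data).
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

train_idx, test_idx = train_test_split(len(x))           # 80 / 20 split
best_alpha, cv_scores = select_alpha(x, y, train_idx, alphas=[0.0, 1.0, 100.0])
w = fit_ridge_1d(x, y, train_idx, best_alpha)            # refit on all training rows
test_mse = mse(x, y, test_idx, w)                        # test rows used only here
```

The cross-validation scores are used only to rank hyperparameter settings, while the held-out score is the one that estimates performance on unseen data. That difference is one way to read the two tables Reviewer 3 asks about.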

Reviewer 4 Report

Casting defects are a complex problem. Defects of a similar appearance have different causes. The details are decisive here. The cause of the surface defect can be inferred from the inclusions it contains, the defect's surface quality, color, etc. 

Do I understand correctly that quality control is human and that this feedback is entered into the algorithm? The algorithm then associates the data with the process parameters.

  • The article does not validate theoretical considerations. Two or three instances where the system diagnosed a fault can be shown along with an example of the fault on an actual casting.
  • Chapter 4 should be divided into two parts. Separate discussion of the results. Here you can show the actual result from the foundry (I am writing about it above). Separate ending of the chapter. Here, you should show what is most important in the form of short conclusions.

  • Figure 5. The residuals plot shows the difference between residuals on the vertical axis and the dependent variable on the horizontal axis, allowing us to detect regions within the target that might be susceptible to more or less error. Test R2 = 0.745. Poor result.
  • There are minor editorial errors in the work; see lines 58, 70, 80-81 and 248.

Author Response

Thank you for your suggestions.

  1. Yes, that is correct. We took all the available data that the foundry provided to us. The quality control is done by the workers in the foundry. They check the components for surface defects; the components are then cleaned and further processed in the machining center. It is then determined whether the whole component is 100% scrap or whether the component is still usable after its surface has been cleaned in the machining center. The amount that resulted in scrap, either 100% if the entire component was scrap or the amount that had to be removed from the component, was then recorded and given to us.
  2. The system is not running in real time yet; we did a post-production analysis based on the data that was given. We cannot provide such images because the foundry does not want sensitive process information to be made public. This is very common in industry! For the same reason, we also had to shorten the given table of process parameters. If we publicly gave all the information about which components were cast with which specific process parameters, this could give competitors of the foundry an unfair advantage.
  3. We split the chapter in two. From our point of view, we wanted to show that machine learning can help in better understanding the process and the related quality problems. The foundry process has a lot of different sub-processes: the mold has to be created, the scrap has to be melted, etc. A “normal” analysis, e.g. linear regression, does not work on this data at all. Even though the ML results are not “optimal” (R2 of roughly 75%), this is still much better than an analysis with “normal” regression, which is very often done in foundries.
  4. The residuals plot shows this, yes. It is there to give a rough idea of how well the model performs in different regions of scrap amounts. We analyzed a wide variety of different components. Obviously, a 1 kg component can result in at most 1 kg of scrap, if the whole component could not be processed in the machining center. The point of this paper is also to show that machine learning models correctly identify the influencing parameters that lead to scrap (SHAP analysis). Achieving a “perfect” process model (e.g. an R2 of 95-99%) is very unrealistic in a real foundry environment. This is because, even though a lot of process data is recorded, there is still uncertainty in the measurements. As described in the discussion, the sand quality plays the major role in the casting defects. However, the sand is only tested at intervals, not for every single mold created. It is therefore impossible to have conclusive information about the sand quality for each mold, only for a certain time span of molds produced. The model needs improvement, but this will only be possible if more sensors are installed and therefore more data is generated, e.g. from the sand molds. At this time, foundries and the companies that produce the sand-mixing machines are working on this problem in order to obtain better information about the sand and its quality. Overall, it can be said that it is extremely unrealistic to achieve high R2 values for such complex processes, for the reasons mentioned above.
  5. We have fixed those issues. 
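
For reference, the two quantities discussed in point 4, the residuals and R2, can be computed as in the sketch below. It uses the standard definition R2 = 1 - SS_res / SS_tot and takes a residual as observed minus predicted; the sign convention is an assumption, since the record does not state which one Figure 5 uses. The scrap values and function names are hypothetical.

```python
def residuals(y_true, y_pred):
    """Observed minus predicted value for every sample."""
    return [t - p for t, p in zip(y_true, y_pred)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical scrap amounts in kg (not the foundry data).
scrap_kg = [0.0, 0.2, 0.5, 1.0]
mean_scrap = sum(scrap_kg) / len(scrap_kg)

perfect = r2_score(scrap_kg, scrap_kg)                        # exact predictions
baseline = r2_score(scrap_kg, [mean_scrap] * len(scrap_kg))   # always predict the mean
```

Exact predictions give an R2 of 1 and always predicting the mean gives 0, so a test R2 of 0.745 means the model explains roughly three quarters of the variance in scrap amount on unseen data.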

Round 2

Reviewer 2 Report

Please check the use of headings. Some sections contain only a single subsection, such as 1.1 and 4.1.

Reviewer 4 Report

Thank you for your answers. The explanations provided by the authors are meaningful and I fully agree with the authors' position. Some data cannot be disclosed due to company secrets; this is a legitimate reason to hide sensitive production data. I accept the authors' responses. I am satisfied with the amendments made to the text by the authors. Thank you for understanding my comments and for the cooperation that allowed us to improve the article.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

