Forecasting Obsolescence of Components by Using a Clustering-Based Hybrid Machine-Learning Algorithm

Product obsolescence occurs in every production line in the industry as better-performing or more cost-effective products become available. A proactive strategy for obsolescence allows firms to prepare for such events and reduces manufacturing losses, which eventually leads to improved customer satisfaction. We propose a machine learning-based algorithm to forecast the obsolescence date of electronic diodes, a task for which the amount of available data is limited. The proposed algorithm overcomes this limitation in two ways. First, an unsupervised clustering algorithm is applied to group the data based on their similarity and build independent machine-learning models specialized for each group. Second, a hybrid method including several reliable techniques is constructed to improve the prediction accuracy and overcome the lack of data. It is empirically confirmed that the prediction accuracy of the obsolescence date for the electrical component data is improved through the proposed clustering-based hybrid method.


Introduction
A rapidly changing technological industry has caused the market to rapidly incorporate new materials and parts. This has caused product obsolescence to occur in every production line in the industry owing to the availability of products that achieve better performance, are more cost-effective, or both. Strategies for addressing obsolescence are related to the expenses of firms and customer satisfaction. For obsolescence management, reactive strategies such as lifetime buy, last-time buy, or identification of alternative parts are only temporary and may cause additional delays compared to proactive strategies. If the probability of obsolescence and the cost associated with the obsolescence are high, it is recommended that one apply proactive management strategies to minimize the risk of obsolescence and associated costs. In fact, forecasting the occurrence of obsolescence is the key factor in proactive management, and many researchers have focused on the development of methods based on the prediction of obsolescence. Proactive strategies allow firms to prepare for the event of obsolescence; manufacturing losses can be reduced by predicting the life cycle of various components, including electronic components [1][2][3].
In this study, we aim to predict the cycle of diminishing manufacturing sources and materials shortages (DMSMS) obsolescence, which is defined as the loss of the ability to procure a technology or part from its original manufacturer. It is necessary to accurately predict the obsolescence cycle to reduce the risk for manufacturers and various companies caused by problems such as fast technology processes and short technology life cycles. Various statistical models for the accurate prediction of the obsolescence risk and date have been studied [4][5][6][7]. A Weibull-based conditional probability method as a risk-based approach to predicting microelectronic component obsolescence is described in [6]. The references to the problem of component obsolescence are summarized in [8]. However, it is difficult to implement a rapidly adapting statistical model to predict the obsolescence cycle of thousands of different types of components. Moreover, it is difficult to gather the input parameters of different models.
With recent improvements in computer performance, many methods for predicting future trends by learning large-capacity data and collecting necessary information are being studied. These learning methods, particularly machine-learning or deep-learning methods, are demonstrating outstanding results in various fields [9][10][11][12]. Depending on the data type or application, various machine-learning methods can be used. To the best of the authors' knowledge, there are few studies in which these machine-learning or deep-learning methods have been applied to predict the cycle of DMSMS obsolescence. Jennings et al. (2016) [13] proposed two machine learning-based methods for predicting the obsolescence risk and life cycle. Good prediction results were reported by using random forest, artificial neural networks, and support vector machines for cell phone market data. Grichi et al. (2017, 2018) [14,15] proposed the use of a random forest, and of a random forest combined with genetic-algorithm searches for optimal parameter and feature selection, respectively, for cell phone data. Trabelsi et al. (2021) [16] combined feature selection and machine learning for obsolescence prediction. As described above, previous work attempted to increase the accuracy of prediction by combining existing machine-learning methods and applying them to the component obsolescence data. Although it is necessary to present efficient methods and hybridize them, it is expected that the accuracy of prediction can be improved further if the characteristics of each part's data are used for learning. Therefore, in this study, a clustering method, which first groups the data according to their characteristics before learning, is newly applied to predict the obsolescence of components.
The objective of this paper is to address the following questions: Does machine learning improve the proactive strategy and prediction of obsolescence? Can it be effective and reliable? The obsolescence of diode parts is predicted in this study when a sufficient amount of data is not available; the lack of available data for obsolescence problems is a crucial weakness for ordinary machine- or deep-learning methods. We propose a very accurate, fast, and reliable machine-learning method, which overcomes this weakness by using an unsupervised clustering algorithm and an ensemble of supervised regression techniques. Supervised regression tries to identify the parameters of the model from the labelled data, and unsupervised clustering partitions the entire data into a few groups of similar data based on outward appearance. It is expected that the parameters obtained from a cluster of similar data fit machine-learning models better than the parameters from the entire set because the entire set has more variation and randomness. Thus, instead of constructing a single model for the entire set, several models are constructed, each of which is independently trained with the data in one cluster only, and the conjecture is experimentally validated by using several real datasets. It is a novelty of the study to apply an unsupervised clustering algorithm to supervised regression to improve model training. The use of a hybrid ensemble method including several reliable regression techniques additionally improves the prediction accuracy; this is another novelty of the study. It is confirmed by using various measures that the prediction accuracy of the obsolescence date is improved through the proposed clustering-based hybrid method for diode data from three categories: Zener diodes, varactors, and bridge rectifier diodes.
The proposed clustering-based hybrid method can be easily extended not only to electrical component data but also to other types of obsolescence cycle prediction problems.
The rest of the paper is organized as follows. Section 2 describes the machine-learning and deep-learning algorithms used in the experiments. The proposed hybrid method based on k-means clustering is explained in Section 3. The statistics of the data and the descriptions of the hyperparameters are presented in Section 4. The accuracy measures and experimental results are presented and discussed in Section 5. The conclusions are drawn in Section 6.

Learning Models
It is important to choose a machine-learning or deep-learning algorithm with good predictive and computational performance for the dataset. For example, decision tree (DT) is a tree-building algorithm, which is easy to interpret and can adapt to learn complex relationships. An ensemble method can be constructed by combining several techniques into one that has better generalization performance than each individual algorithm. The two popular ensemble methods are bagging and boosting. We propose a hybrid method in this study and the merits of the proposed method are compared with those of various standard algorithms from individual algorithms (DT) to bagging algorithms (random forest), boosting algorithms (gradient boosting), and deep learning methods (deep neural network and recurrent neural network). We briefly introduce the following machine-learning and deep-learning algorithms and consider their combinations for improved results.

• Decision tree, random forest, gradient boosting
• Deep neural network, recurrent neural network

Decision Tree
The decision tree (DT) is a machine-learning method that is easy to understand and interpret and easy to use for both classification and regression. Starting at the root of the tree, a DT splits the training data based on the features that maximize the information gain. The following objective function is maximized at each split:

\[ IG(D_p, f) = I(D_p) - \sum_{j=1}^{n} \frac{N_j}{N_p} I(D_j). \quad (1) \]

Here, I is the impurity indicator, N_p is the number of samples of the parent node, N_j is the number of samples of the j-th child node, and n is the number of child nodes. As an impurity indicator, the entropy I_E or the Gini impurity I_G is widely used:

\[ I_E(t) = - \sum_{i=1}^{m} p(i|t) \log_2 p(i|t), \qquad I_G(t) = 1 - \sum_{i=1}^{m} p(i|t)^2, \]

where m is the number of classes in node t and p(i|t) is the proportion of samples of the i-th class in node t. DTs impose few restrictions on the training data; thus they are prone to overfitting. Therefore, the maximum depth of the DT is usually controlled as a regularization variable [10,11].
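As an illustrative sketch (not the paper's exact configuration), a depth-limited regression tree can be fit with scikit-learn on synthetic data; the feature names and values here are assumptions for demonstration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))        # synthetic part features
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, 200)  # synthetic obsolescence target

# Capping max_depth acts as the regularization variable mentioned above,
# limiting how finely the tree can partition the training data
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5, random_state=0)
tree.fit(X, y)
pred = tree.predict(X[:5])
```

Lowering `max_depth` trades training accuracy for better generalization, which is the overfitting control described above.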

Random Forest
The random forest (RF) uses multiple DTs to improve prediction performance and reduce the risk of overfitting. First, a DT is trained on samples randomly selected from the training data based on an objective function such as that in Equation (1). This process is then repeated several times to collect the prediction of each tree and make a decision by the majority vote method. When an RF splits the nodes of a tree, it finds the optimal features by considering randomly selected feature candidates among all features. This makes the trees more diverse and lowers the variance. Additionally, it is easy to measure the relative importance of a feature by checking how much a node using a certain feature reduces the impurity. The number of trees generated by an RF is a hyperparameter; the larger the number of trees, the higher the computational cost, but the better the performance. Although an RF is more complex than a DT, it is more stable and can handle high dimensionality and multicollinearity better, being both fast and insensitive to overfitting [17][18][19].
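A minimal sketch of these two points, the number-of-trees hyperparameter and the impurity-based feature importances, again on synthetic data (not the diode dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 300)

# n_estimators is the number-of-trees hyperparameter: more trees cost more
# computation but lower the variance of the ensemble
rf = RandomForestRegressor(n_estimators=100, random_state=1)
rf.fit(X, y)

# Impurity-based importances are normalized to sum to 1; the dominant
# feature 0 should receive the largest score
importances = rf.feature_importances_
```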

Gradient Boosting
The gradient boosting (GB) method trains a set of predictors sequentially, with each added tree complementing the previous model. Starting from the leaf nodes of a DT, the estimate of the target is found from the argument that minimizes the sum of the loss functions. In other words, given the dataset \{(x_i, y_i)\}_{i=1}^{n}, the initial prediction is first computed by

\[ f_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma). \]

For instance, with the differentiable loss function L(y_i, \gamma) = (y_i - \gamma)^2 / 2, we obtain the sample average f_0(x) = \frac{1}{n} \sum_{i=1}^{n} y_i. The prediction is then sequentially updated by reducing the average of the pseudo-residuals as follows:

\[ r_{im} = - \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{m-1}}, \qquad f_m(x) = f_{m-1}(x) + \nu \sum_{j=1}^{J_m} R_{jm} \, \mathbf{1}(x \in \text{leaf}_{jm}), \]

where r_{im} is the residual of the data, R_{jm} is the average of the residuals r_{im} of the samples in the j-th leaf node of the m-th tree in which a sample x can be found, and J_m is the number of leaf nodes of the m-th tree. Here, \nu is the learning rate, between 0 and 1, which reduces the effect of each tree and eventually improves the accuracy [11,20].
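A brief illustrative sketch of this sequential fitting with scikit-learn, on synthetic data; the parameter values are assumptions, not the paper's tuned settings:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# learning_rate is the shrinkage factor nu in (0, 1]; with squared loss the
# initial prediction f_0 is the sample mean of y, and each of the 200 trees
# then fits the pseudo-residuals of the current model
gb = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                               random_state=2)
gb.fit(X, y)
pred = gb.predict(X)
```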

Deep Neural Network
Deep learning is based on artificial neural networks created by mimicking the principles and structure of human neural networks. In the human brain, a neuron receives a signal or stimulus, and when this stimulus exceeds a certain threshold, the neuron fires and transmits a resulting signal. Here, the input stimulus corresponds to the input data of the artificial neural network, the threshold corresponds to a weight, and the action produced by the stimulus corresponds to the output data. Hidden layers exist between the input and output layers, and each hidden layer uses an activation function; the optimal weights and biases are determined during training. A network with two or more hidden layers is referred to as a deep neural network (DNN), as shown in Figure 1a. The network forms internal representations on its own, distorts the feature space, and repeats the process of classifying the data to derive the optimal decision boundary [11,21].
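As a minimal sketch (assuming scikit-learn's `MLPRegressor` rather than the authors' specific DNN framework), a network with two hidden layers qualifies as deep in the sense used above:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(400, 5))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.05, 400)

# Two hidden layers (32 and 16 units) sit between the input and output
# layers; ReLU is the activation function applied in each hidden layer
dnn = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                   max_iter=2000, random_state=3)
dnn.fit(X, y)
```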

Recurrent Neural Network
The recurrent neural network (RNN) algorithm is a type of artificial neural network specialized in repetitive and sequential data learning and contains an internal cyclic structure, as shown in Figure 1b. Through this cyclic structure, past learning is reflected in the current learning through weights. It is an algorithm that overcomes the limitations of existing algorithms for continuous, iterative, and sequential data learning. It connects the present learning with the past learning and is time dependent [11,22].
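The cyclic structure can be sketched as a single NumPy forward pass (a toy illustration with assumed dimensions, not the trained RNN used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(4)
T, d_in, d_h = 6, 3, 8               # sequence length, input size, hidden size
x_seq = rng.normal(size=(T, d_in))   # a toy input sequence

W_xh = rng.normal(scale=0.1, size=(d_in, d_h))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))   # hidden-to-hidden (recurrent)
b_h = np.zeros(d_h)

# The same weights are reused at every time step; the hidden state h carries
# information from past steps into the current one, making the model
# time dependent
h = np.zeros(d_h)
for x_t in x_seq:
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
```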

Grid Search
For each machine-learning algorithm mentioned above, hyperparameter optimization is performed by using a grid search as shown in Figure 2 to determine the optimal parameters through which the best learning model is derived.  Figure 2. Flowchart of the grid search, which finds the right hyperparameters of a machine-learning model to achieve optimal performance.
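The flowchart in Figure 2 corresponds to an exhaustive cross-validated search; a minimal sketch with scikit-learn's `GridSearchCV` on synthetic data (the grid values are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(0, 1.0, 200)

# Every combination in the grid is trained and scored by cross-validation;
# the best-scoring combination is kept as the optimal hyperparameters
param_grid = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeRegressor(random_state=5), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_
```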

Hybrid Method
The machine-learning and deep-learning methods introduced in Section 2 can be applied as they are, but the prediction results can be further improved by grouping data with common properties. The k-means method from unsupervised learning is first introduced as a grouping method.

k-Means Clustering
A partition of a set {X 1 , X 2 , . . . , X n } in mathematics is a grouping of its elements into non-empty subsets {A 1 , A 2 , . . . , A k } in such a way that every element is included in exactly one subset. The k-means clustering method aims to partition the observations into k clusters so as to minimize the within-cluster variance, i.e., the squared distances to the cluster centroids. It is one of the unsupervised learning methods, which are algorithms that learn patterns from unlabelled data.
The detailed process is as follows. First, k data points are randomly selected and set as the centroids of the clusters. All data are then assigned to the cluster with the nearest of the k centroids. The centroid of each resulting cluster is recalculated, and these two steps are repeated until the cluster assignment of each data point no longer changes. That is, the method finds the k clusters {A 1 , . . . , A k } that minimize the following variance:

\[ \arg\min_{A_1, \ldots, A_k} \sum_{i=1}^{k} \sum_{x \in A_i} \| x - \mu_i \|^2, \quad (2) \]

where \mu_i is the centroid of cluster A_i. The proposed method performs the k-means method on the training data so that the unsorted data as in Figure 3a can be clustered into groups with certain similarities as in Figure 3b.
Although the k-means method has the advantage of improving the learning results, it also has limitations. First, the number of clusters k should be specified in advance, and depending on this value, different clustering results may be obtained. Additionally, the algorithm may converge to a local minimum rather than the global minimum, and it is sensitive to outliers in the data [10,23]. Because the number k of clusters is a parameter dependent on the dataset, it is obtained by performing preliminary preprocessing as presented in Section 4.2.

When machine learning is used for problem solving, a single model is usually constructed from the entire dataset. The k-means clustering algorithm in this study partitions the training data into disjoint clusters of similar data, and multiple machine-learning models are then constructed, i.e., one model for each cluster. The grid search in Figure 2 is performed on each cluster as in Figure 4 to train each model separately and independently and determine the optimal hyperparameters of the model for each cluster. Learning through k-means clustering is referred to as learning with clustering in this study. Learning in Figure 2 without k-means clustering, i.e., learning the entire training data all together, is referred to as learning without clustering. Figure 4. Determining the optimal hyperparameters of a machine-learning method for each cluster obtained by using unsupervised k-means clustering.
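The cluster-then-train idea can be sketched as follows, assuming scikit-learn and two synthetic groups of samples (not the diode data); one independent model is fit per cluster:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
# Two synthetic groups of "similar" samples
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(5, 1, (100, 3))])
y = X[:, 0] + rng.normal(0, 0.1, 200)

# Partition the training data into k disjoint clusters of similar samples
k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=6)
labels = km.fit_predict(X)

# One model per cluster, trained separately and independently on the data
# of that cluster only
models = {}
for c in range(k):
    m = DecisionTreeRegressor(max_depth=4, random_state=6)
    m.fit(X[labels == c], y[labels == c])
    models[c] = m
```

Each per-cluster model sees less variation than a single model fit on all of `X`, which is the conjecture the paper validates experimentally.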

Hybrid Method with Clustering
The proposed method first divides the training data into k groups through k-means clustering. Then, for each point in the test data, an appropriate model is selected for the prediction as follows. Suppose that {C 1 , C 2 , . . . , C k } are the centroids of the groups in the partition of the training data. Given test data X, the Euclidean distance between X and each of the centroids is measured as in (2), and the index of the nearest centroid is found:

\[ i^* = \arg\min_{1 \le i \le k} \| X - C_i \|. \quad (3) \]

The learning model obtained from the (i^*)-th cluster of the training data is then applied for the prediction of X. The procedure is repeated for each of the test data as in Figure 4.

As shown in Section 5, different machine-learning methods exhibit different prediction results and no method is dominant in accuracy. Thus, a modification of ordinary machine-learning methods is considered, which is an ensemble method. When the obsolescence dates are predicted by the three machine-learning methods DT, RF, and GB, their average defines an obsolescence date (denoted by y_Hybrid) by

\[ y_{Hybrid} = \frac{1}{3} \left( y_{DT} + y_{RF} + y_{GB} \right), \quad (4) \]

where y_DT, y_RF, and y_GB are the obsolescence dates from DT, RF, and GB, respectively. The proposed hybrid method shows accurate and reliable results as presented in Section 5, and the application of the hybrid method is another novelty of this study. Algorithm 1 summarizes the procedure of the proposed method. It should be noted that the algorithm is automatic, so that no human intervention is required from the input data processing to the prediction of the obsolescence date.
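The whole routing-and-averaging procedure can be sketched end to end on synthetic data (a simplified illustration assuming scikit-learn defaults, not the paper's tuned per-cluster hyperparameters):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(7)
X_tr = rng.uniform(0, 10, size=(300, 3))
y_tr = 2.0 * X_tr[:, 0] + rng.normal(0, 0.2, 300)

# Step 1: partition the training data into k clusters
k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=7)
labels = km.fit_predict(X_tr)

# Step 2: train the three base learners on each cluster independently
cluster_models = {}
for c in range(k):
    Xc, yc = X_tr[labels == c], y_tr[labels == c]
    trio = [DecisionTreeRegressor(max_depth=6, random_state=7),
            RandomForestRegressor(n_estimators=50, random_state=7),
            GradientBoostingRegressor(random_state=7)]
    for m in trio:
        m.fit(Xc, yc)
    cluster_models[c] = trio

def hybrid_predict(x):
    """Route x to its nearest centroid, then average the DT, RF, GB outputs."""
    c = int(np.argmin(np.linalg.norm(km.cluster_centers_ - x, axis=1)))
    return float(np.mean([m.predict(x.reshape(1, -1))[0]
                          for m in cluster_models[c]]))

y_hat = hybrid_predict(np.array([5.0, 5.0, 5.0]))
```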

Algorithm 1: Proposed algorithm
Input: information of the parts
Output: obsolescence dates of the parts
1: convert the categorical data to numeric data;
2: define the features;
3: define the target data;
4: find the optimal size of clustering;
5: partition the training data;
6: for each cluster do
7:     find optimal hyperparameters;
8:     find best model;
9: end
10: for each data in the test data do
11:     find the closest cluster using (3);
12:     for each learning method in DT, RF, GB do
13:         predict using the best model of the closest cluster;
14:     end
15: end

Data and Measures
We consider a case study to demonstrate the performance of the proposed machine-learning method in forecasting.

Data Collection and Problem Description
Because the prediction of the probability or the period of components obsolescence reduces the cost of purchase and maintenance, many defense industries and electronic component manufacturers have developed commercial component obsolescence prediction software. Many companies such as RAC, Z2Data, IHS, QTEC, Silicon Expert, Total Parts Plus, and AVCOM provide their own obsolescence prediction information by using various data and statistical methodologies, but the detailed methodologies or algorithms have not been disclosed. Particularly, in the case of software that provides the expected discontinuation period of parts, the error range is large or uncertain and it is provided without reference to any evidence. Therefore, it is difficult to use the obsolescence information from the commercial software as a basis for the study.
However, in the case of parts that have already been discontinued, the part number can be obtained from QTEC along with the evidence that the discontinuation is certain, and the detailed characteristics and specifications of the part can be obtained from the Z2Data software. Among the parts available from these sources, the active discontinued parts in Zener diodes, varactor diodes, and bridge rectifier diodes with more than 10,000 cases have been selected and used in the study. In the case of passive components, the detailed characteristics and specifications of the parts are not diverse; thus they are excluded from this study. The data of Zener diodes, varactor diodes, and bridge rectifier diodes used in this study are provided by Leo Innovision Ltd.
The characteristics and specifications of the parts from various manufacturers differ in content and format. To standardize them, the detailed technical specifications in the data sheets and test reports for each part have been thoroughly reviewed. Subsequently, the characteristics that are common to most manufacturers and considered important are selected as the features for each part. Through this process, 2366 Zener diodes, 350 varactor diodes, and 307 bridge rectifier diodes, consisting only of discontinued parts among active electronic components while retaining different characteristics and specifications for each type, are selected for the research. The diode data from these three categories have 31, 44, and 41 features, respectively, and each dataset consists of numeric and categorical features. Table 1 lists the features for each category and the data type of the dataset used in this study.

The features in Table 1 have different contributions to machine learning. Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable. Although there are many types and sources of feature importance scores, the feature importance is quantified in the current study by the permutation feature importance, which is a model inspection technique that can be used for any fitted estimator [24]. It is defined as the decrease in a model score when a single feature value is randomly shuffled. This procedure breaks the relationship between the feature and the target; thus the drop in the model score indicates how much the model depends on the feature. See [24] for more information. In this study, the features are standardized by removing the mean and scaling to unit variance, and then the feature importance is computed by using the R^2 score as the scoring parameter of the permutation importance function.
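This importance computation can be sketched with scikit-learn's `permutation_importance` on synthetic data (the data and model here are illustrative stand-ins, not the diode features):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 2] + rng.normal(0, 0.1, 300)  # only feature 2 matters

# Standardize the features (zero mean, unit variance) as described above
Xs = StandardScaler().fit_transform(X)
model = RandomForestRegressor(n_estimators=100, random_state=8).fit(Xs, y)

# Shuffling a feature breaks its link to the target; the resulting drop in
# the R^2 score measures how much the model depends on that feature
result = permutation_importance(model, Xs, y, scoring="r2",
                                n_repeats=10, random_state=8)
```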
Figures 5a-c show the top 10 features for the Zener diodes, varactors, and bridge rectifier diodes, respectively, when the DT method is applied. Figures 6 and 7 show the importances when the RF and GB methods are applied, respectively. Table 3 presents the statistics of the features of the Zener diodes. The count, mean, and std represent the number, mean, and standard deviation of the data, respectively. Min and max are the minimum and maximum values, respectively, and 25%, 50%, and 75% are quartiles, which divide the data points into four parts. The statistics for the varactors and bridge rectifier diodes are shown in Tables 4 and 5, respectively.

Hyperparameters
For the k-means to be effective, an appropriate number k of clusters should be estimated. As a preprocessing step, for each of k = 1, 2, . . . , the training data is partitioned into k clusters and DT is applied to estimate the accuracy. k is increased until the improvement |e_k - e_{k-1}| is small enough, where e_k represents the MRE error defined in Section 5 with k clusters. That is, k is chosen such that

\[ e_{k,k-1} \equiv \alpha \, | e_k - e_{k-1} | < h, \quad (5) \]

where h is a threshold. The normalization \alpha \equiv | e_2 - e_1 |^{-1} is introduced to avoid dependency on the dataset. Figure 8 shows e_{k,k-1} in (5) for several k values. h = 0.06 is used in this study and the optimal k for the datasets are listed in Table 6.

Each machine-learning method has hyperparameters, and the hyperparameters used in this study are summarized in Table 7. The leftmost column in Table 7 lists the names of the parameters, which are taken from the scikit-learn library [24]. For instance, DT in the current study considers 4 hyperparameters: min_samples_split, max_depth, min_samples_leaf, and max_leaf_nodes. The column in the middle describes the definition of each hyperparameter, and the values of the hyperparameter considered in this study are shown in the rightmost column. For instance, the maximum depth of the tree (max_depth) for DT is one of 2, 4, 6, and 8. A grid of all possible hyperparameter combinations is then created. For instance, in the case of DT, all combinations of 5 values of min_samples_split, 4 values of max_depth, 9 values of min_samples_leaf, and 4 values of max_leaf_nodes are created, and DT is trained with each one of them to find the best parameters. Model tuning with such a grid search is performed similarly for the other models with the values in Table 7. Table 7. Hyperparameters for the machine learning methods used in this study.
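The stopping criterion for choosing k can be sketched with hypothetical MRE values (the numbers below are invented for illustration, not the values from Figure 8):

```python
# Hypothetical MRE values e_k for k = 1, ..., 5 clusters (illustrative only)
e = [0.30, 0.22, 0.19, 0.185, 0.184]

h = 0.06                        # threshold used in the paper
alpha = 1.0 / abs(e[1] - e[0])  # normalization 1/|e_2 - e_1|

# Increase k while the normalized improvement stays at or above the
# threshold; stop at the first k where it falls below h
k = 2
while k < len(e) and alpha * abs(e[k] - e[k - 1]) >= h:
    k += 1
```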

Results and Discussion
To compare the performance of different methods, the accuracy is measured by the mean relative error (MRE):

\[ \mathrm{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{| y_i - \tilde{y}_i |}{y_i}, \quad (6) \]

and the root mean squared relative error (RMSRE):

\[ \mathrm{RMSRE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \tilde{y}_i}{y_i} \right)^2 }, \quad (7) \]

where y_i is the actual value, \tilde{y}_i is the predicted value, and N is the number of predictions. If a machine-learning method is not applied, statistical methods can be applied for the prediction of the obsolescence date. For the expected value of the obsolescence date, the sample mean of the observed, i.e., known, obsolescence dates from the training data can be used as a prediction value, which is referred to as "Statistic" below. That is, Statistic is defined by

\[ \bar{y} = \frac{1}{N_{tr}} \sum_{i=1}^{N_{tr}} y_i, \quad (8) \]

where N_{tr} is the number of the training data; \bar{y} can be used as a naive prediction value for the test data.

We first determine whether learning with clustering produces any improvement over learning without clustering. Figure 9 shows the distribution of the relative error of the prediction,

\[ r_i = \frac{\tilde{y}_i - y_i}{y_i}, \quad (9) \]

for the Zener diode data when DT and the naive statistic are applied. Figure 9a shows the distribution without clustering and Figure 9b with clustering. The deviations from DT are smaller and the corresponding predictions are closer to the actual values than those of the naive approach. It should be noted that the predicted values from DT with clustering are closer to the actual values than those without clustering. Clustering is observed to reduce the variation and improve the prediction accuracy.

Figure 10 shows the distributions of r_i in (9) by using the hybrid method (a) without clustering and (b) with clustering. Similarly to Figure 9, the range of the distributed values from the hybrid method is narrower than that from the naive statistic, and the result from the hybrid method with clustering is superior to the result from the hybrid method without clustering. Similar trends are observed for the other machine-learning methods and the other datasets as well (not shown), and it is empirically supported that clustering leads to improvement.
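The two accuracy measures can be implemented directly from their definitions; a small worked example with invented values:

```python
import numpy as np

def mre(y_true, y_pred):
    """Mean relative error: average of |y_i - y~_i| / y_i, as in (6)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) / y_true))

def rmsre(y_true, y_pred):
    """Root mean squared relative error, as in (7)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)))

# Invented actual and predicted obsolescence values for illustration
y_true = [10.0, 20.0, 40.0]
y_pred = [9.0, 22.0, 40.0]
```

Because the squaring in the RMSRE weights large relative errors more heavily, the RMSRE is at least as large as the MRE on the same predictions, matching the discussion of the two measures below.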
Next, we determine the machine-learning method that produces the best prediction result. Figure 11 shows the distributions of the deviation of the prediction from various machine-learning methods, DT, RF, GB, DNN, RNN, and hybrid, when clustering is applied to Zener diodes. Four machine-learning methods, DT, RF, GB, and hybrid, result in similar prediction distributions, whereas the results from the two deep-learning methods, DNN and RNN, are slightly worse than those from the machine-learning methods. Figures 12 and 13 show the results for the varactors and bridge rectifier diodes, and similar trends are observed. One reason for the poor results from the deep-learning methods may be insufficient data. In fact, deep learning is superior to ordinary shallow machine learning if the number of data is large enough. However, the data for the current case study are insufficient, and the ordinary shallow machine-learning methods produce better results than deep learning in this study. Subsequently, we compare the prediction accuracy with respect to two measures, MRE and RMSRE. Table 8 presents the MRE errors of the training data with and without clustering. It shows that the errors from the naive statistic prediction and the two deep-learning methods, DNN and RNN, are larger than those of the other shallow machine-learning methods and that training with GB overfits the given training data. Table 9 lists the MRE errors of the test data with and without clustering. The predictions from all the machine-learning or deep-learning methods with or without clustering are better than the naive statistic prediction, and the four shallow machine-learning methods, DT, RF, GB, and hybrid, produce better results than DNN and RNN for all three categories. Deep-learning methods produce good regression accuracies in many applications, but they have difficulty in finding the right parameters in this study owing to the lack of data.
Although the prediction of Statistic from clustering is improved over the prediction without clustering, the results from the machine learning still dominate. When clustering is applied, the errors from the four shallow learning methods are smaller than those from deep-learning methods. Among shallow machine-learning methods, the DT, GB, and hybrid methods give good predictions for the Zener diodes and bridge rectifier diodes, whereas the DT, RF, and hybrid methods give good predictions for the varactors. Because the data in each cluster from the k-means algorithm has less variation than the entire data, the machine-learning model trained with the clusters represents the data better than a single model trained with the entire data and thus the accuracies of the models with clustering are better than those without clustering even when the same model is applied. It should be noted that the hybrid method produces good accuracy regardless of the category or the training method, which implies that the hybrid method is reliable. Figure 14a presents the MRE of the test data with and without clustering for Zener diodes, which shows that model training with unsupervised clustering algorithm improves the prediction accuracy and reduces the errors. Similar reduction in MRE is observed in the varactors as in Figure 14b and bridge rectifier diodes as in Figure 14c.  Table 10 lists the RMSRE errors of the training data with and without clustering. Similarly to Table 8, the errors from the naive statistic, DNN, and RNN methods are larger than the others and training with GB seems to overfit.  Table 11 lists the RMSRE errors of the test data with and without clustering. The predictions from all the machine-learning methods without clustering are better than the naive statistic prediction for the Zener diodes and varactors. In case of the bridge rectifier diodes, the Statistic and RNN methods without clustering result in large errors. 
In fact, the RMSRE errors from the RNN method are large for all three categories. The RMSRE errors from the models with clustering are smaller than those without clustering, as in Table 12. The RMSRE errors from the deep-learning methods, DNN and RNN, with clustering are as small as those from the other methods for the varactors. Although the trends of the results from the RMSRE are quite similar to those from the MRE, the errors from the RMSRE are relatively larger than those from the MRE because some errors are large owing to an insufficient amount of data and the RMSRE depends more on such values than the MRE. Figure 15 presents the RMSRE of the test data with and without clustering for the Zener diodes, varactors, and bridge rectifier diodes, respectively. The figure shows again that the unsupervised clustering algorithm improves the prediction accuracy of the supervised regression models, as observed in Figure 14. Table 13 lists the widths of the 95% confidence intervals of the predicted values using various methods. As shown in Table 13, the confidence interval of the hybrid method with clustering is much narrower than that of the method without clustering. Therefore, it can be inferred that the estimate using the proposed method with clustering is more stable and accurate. As an example, for the bridge rectifier diodes data, the width of the confidence interval of the predicted value using an RNN is 24 times wider, and in the case of using an RF, the width is 7.8 times wider than that obtained by using the proposed hybrid method. Figure 16 presents the widths of the 95% confidence intervals using a bar graph, which shows the variation of the prediction accuracy of various machine-learning methods. The bar corresponding to the proposed hybrid method with clustering (red) is shorter than the others for all three categories, which confirms the superiority of the proposed method.

Conclusions
This paper proposed an accurate and reliable method for the prediction of the obsolescence date of the components of the diodes based on the k-means method and a hybrid ensemble method. It is the novelty of the study to apply the unsupervised clustering method to the supervised regression problem to improve the prediction. The k-means unsupervised clustering algorithm partitioned the entire set into clusters of similar data. The proposed method trained with similar data in each cluster demonstrated better predictions than the single model trained with the entire set regardless of the category of the diodes even when a sufficient amount of data was not provided whereby ordinary shallow or deep-learning methods would face difficulties in realizing accurate forecasts. The hybrid method including several regression techniques made further improvements in prediction accuracy.
There are two research directions from the current proposed model. One is the combination of unsupervised clustering and deep-learning models with many hidden layers and sufficiently many data samples, which was not supported in the current study. It is expected that the accuracy of the deep-learning method will be improved when training is performed with similar data samples. The other direction is to improve the clustering method. Although the k-means algorithm is a good clustering method, there still exist areas for continued development, such as sensitivity to initial values and hyperparameter tuning. Moreover, because the unsupervised clustering method partitions the entire data into disjoint clusters, some samples near a cluster boundary are assigned to clusters that are not intuitively appropriate. If there is a way to handle those data properly and assign them to appropriate clusters, the prediction will be improved even further.
The proposed method is applied to the obsolescence of electronic diodes in this study, but it can be applied to various fields, from the obsolescence of other components to regression problems in other sciences such as financial market prediction.

Conflicts of Interest:
The authors declare no conflict of interest.