Reliability-Enhanced Camera Lens Module Classification Using Semi-Supervised Regression Method

Abstract: Artificial intelligence has become a primary driver in the era of Industry 4.0, accelerating the realization of the self-driven smart factory. It is transforming various manufacturing sectors, including the assembly line for camera lens modules. The recent development of bezel-less smartphones necessitates large-scale production of the camera lens module. However, the assembly of the parts of a module leaves much room for improvement, since the procedure and its subsequent inspection are costly and time-consuming. Consequently, the collection of labeled data is often limited. In this study, a reliable means to predict the state of an unseen camera lens module using simple semi-supervised regression is proposed, and an experimental study investigating the effect of different numbers of training samples is demonstrated. Increasing the amount of data through simple pseudo-labeling is shown to improve the general performance of a deep neural network for the prediction of the Modulation Transfer Function (MTF) by as much as 18%, 15%, and 25% in terms of RMSE, MAE, and R-squared, respectively. Cross-validation is used to ensure generalized predictive performance. Furthermore, binary classification is conducted based on a threshold value of MTF to demonstrate the improved prediction outcome in a real-world scenario. As a result, the overall accuracy, recall, specificity, and f1-score increase by 11.3%, 9%, 1.6%, and 7.6%, respectively, showing that the classification of camera lens modules is improved through the suggested semi-supervised regression method.


Introduction
The recent development of bezel-less smartphones necessitates large-scale production of the camera lens module before it is ready for use in the assembly stage. A camera lens module, normally placed at the front and back of a smartphone, consists of several spherical and aspheric lenses stacked vertically inside a customized barrel [1]. Figure 1 illustrates the exploded view of an ordinary camera lens module. The determination of a well-made camera lens module relies upon the combination of an upright lens configuration as well as several external factors including barrel thickness and lens diameter [1]. With the number of factors that influence the performance of a module reaching a few hundred, it is important to understand which ones are more significant to its overall performance. Just as importantly, one must figure out how to modify these factors to achieve a higher yield rate in the production of camera modules. One of the key factors, the lens arrangement, directly determines the amount and focus of light hitting the sensor. The main purpose of stacking multiple lenses in the right arrangement is to reduce the light from a large scene to fit the small size of the sensor. The Modulation Transfer Function (MTF), a continuous variable that indicates the definition of the image created by the focused light, is a commonly used measure for comparing the performance of optical systems. By definition, the MTF is the magnitude response of an optical system to sinusoids of various spatial frequencies. In this study, MTF is used as the target variable to represent the quality of a lens module.
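By way of illustration, the modulation (contrast) of a sinusoidal intensity pattern and the resulting MTF value at one spatial frequency can be computed as follows; the intensity values used here are hypothetical, not measurements from the study:

```python
def modulation(i_max, i_min):
    """Contrast (modulation depth) of a sinusoidal intensity pattern."""
    return (i_max - i_min) / (i_max + i_min)

def mtf(image_contrast, object_contrast):
    """MTF at one spatial frequency: ratio of image to object modulation."""
    return image_contrast / object_contrast

# Hypothetical values: a perfect target (contrast 1.0) imaged at 60% contrast.
m_obj = modulation(1.0, 0.0)   # object-side modulation = 1.0
m_img = modulation(0.8, 0.2)   # image-side modulation = 0.6
print(mtf(m_img, m_obj))       # 0.6
```

Repeating this ratio across spatial frequencies traces out the MTF curve that the inspection process plots for each module.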
Despite its huge impact on quality, the traditional method to find the right arrangement is simply to turn each lens manually, one at a time, by a certain number of degrees until the requirement is met. The procedure includes multiple inspections, sampling every 10 products whenever an issue is discovered in already manufactured products. By plotting the MTF curve of each product, either a lens re-arrangement or a thickness modification of a spacer and the lenses is made. Modifying the thickness of a spacer and the lenses is done if the vertices are concentrated at a wrong spot. On the other hand, the lens re-arrangement is conducted when the vertex of the MTF curve is lower than a threshold value, which is the most common case; it is also the preferred option since it is easier to carry out. The lenses are normally rotated 45 degrees clockwise from their original positions. This kind of task is not only time-consuming but also does not guarantee a better result once conducted. In addition, no reliable way exists to screen the effect of a slight modification in real time, implying that a sample can be tested only after it is completely capped inside a barrel. Such repetitive and tedious operations necessitate an automated system to foresee the outcome of products without assembly.
Over the past few years, deep learning has emerged as a powerful tool in many engineering fields such as image processing, voice recognition, and natural language processing, triggered by the advent of fast computing hardware, e.g., the GPU [2]. Its wide range of applicability has brought a new trend to both research and industrial sectors, leading to the development of the popular deep neural network (DNN), convolutional neural network (CNN), recurrent neural network (RNN), and autoencoder [2]. A DNN is simply an extension of the ordinary neural network with a single hidden layer, and a simplified model representation of the complex human brain. It is often referred to as a multi-layered perceptron network that is capable of understanding the non-linear nature of a high-dimensional feature space, owing to the non-linear activation function applied after the nodes of the hidden layers. Despite its huge potential as a non-linear function approximator, one limitation of the DNN is that it normally requires a sufficient amount of data to provide reliable output. Though no strict guideline is established for the right amount of training data, since it varies depending on the type of problem to be solved, a general rule of thumb is to have at least ten times as many samples as the feature dimension [3]. However, obtaining big data is often limited by the lack of available infrastructure, which raises demand for a method to create a similar training set and provide it with reasonable labels. In the manufacturing industry, the labeling of acquired data is often very restricted since it is only possible after the actual manufacturing of products, which requires considerable resources and time. Such restrictions led to the development of various methods that artificially provide labels to data without the need for an actual experiment. Most previous studies, however, focus on the pseudo-labeling of categorical data.
Semi-supervised regression (SSR), a method to pseudo-label a continuous variable, has so far seen limited use in real-world applications due to the complexity of understanding the existing algorithms that provide a suitable regression output.
In this study, a simple but effective SSR coupled with simple random sampling (SRS) for the generation of synthetic data is proposed to boost the overall regression performance of DNN. Then, camera lens module classification is performed based on the improved regression model enhancing the reliability of the classification result. To the best of our knowledge, no prior work has ever attempted a deep learning-based approach to the classification of the camera lens module. The module is classified as either 'satisfactory' or 'unsatisfactory' based on the regression result that is improved using SSR. The rest of the paper is broken down as follows. Section 2 provides an overview of related work, whilst Section 3 details the proposed methodology. The experimental result is discussed in Section 4.1, and finally, the paper is concluded in Section 5.

Semi-Supervised Learning
Semi-supervised learning is a branch of machine learning that utilizes both labeled and unlabeled data to improve the overall generalization performance of a model. It has earned a tremendous amount of attention in various fields of research such as bioscience, material science, and the manufacturing industry, where the actual testing of unseen data is time-consuming and labor-intensive [4]. Due to its boost in popularity, despite its abstruse nature, several approaches including expectation-maximization, split-learning, graph-based methods, co-training, and self-training have been developed [5]. However, none of these approaches is easily comprehensible to non-experts, especially those practitioners of the deep learning community who do not know the physical meaning of, or the statistics behind, the data they deal with. Hao et al. [6] used a clustering algorithm, the Dirichlet process mixture model (DPMM), to provide pseudo-class labels for unlabeled data. It has been shown to be effective since the DPMM is a non-parametric clustering algorithm and is thus able to figure out the unknown background classes from the unlabeled samples. Even if no labels are given, samples can be grouped based on their similarities, providing them with pseudo-labels. To improve performance, a pre-trained DNN model is first constructed using all the training data with the pseudo-labels. Then, a second DNN model is introduced by taking all the layers except the last one from the pre-trained model. The new model is then fine-tuned using only the true class labels.
The problem gets trickier in a regression setting because the dependent variable is no longer a discrete categorical value and is thus difficult to categorize into groups. To make matters worse, most studies deal with semi-supervised classification (SSC) rather than SSR, as the former is deemed the more general case [7]. As a result, several SSR algorithms, despite their efficiency, are hard to implement. Algorithms such as semi-supervised support vector regression (SS-SVR) [8], COREG [9], CoBCReg [10], semi-supervised kernel regression (SSKR) [11], and spectral regression [12] have been proposed, but only a few of them have been used in real-world applications due to their limited applicability, i.e., high dependence on data type [7]. Previous literature on SSR and its use in real-world applications is tabulated in Table 1. To circumvent the issue, this study utilizes the self-labeling method [13], which is simple and effective, to enhance the overall regression performance on data collected from sensors at a production line.
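The self-labeling idea [13] can be sketched with a toy nearest-neighbor regressor standing in for the actual model (the study itself uses Extra Trees); the one-dimensional data here is synthetic:

```python
import random

def knn_predict(train, x, k=3):
    """Predict by averaging the labels of the k nearest 1-D neighbors."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def self_label(labeled, unlabeled, k=3):
    """Self-labeling: a model fit on the labeled data assigns pseudo-labels
    to the unlabeled samples, which are then merged into the training set."""
    pseudo = [(x, knn_predict(labeled, x, k)) for x in unlabeled]
    return labeled + pseudo

random.seed(0)
labeled = [(x, 2.0 * x) for x in [0.1, 0.4, 0.5, 0.8, 0.9]]   # y = 2x
unlabeled = [random.random() for _ in range(20)]
augmented = self_label(labeled, unlabeled)
print(len(augmented))  # 25 samples: 5 labeled + 20 pseudo-labeled
```

A stronger base model yields better pseudo-labels, which is exactly why the model used for labeling is selected for performance on the raw data.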

Machine Learning Algorithms
In this section, machine learning algorithms used for pseudo-labeling are briefly explained.

• XGBoost
XGBoost [34] is an ensemble boosting algorithm that combines multiple decision trees. It is a scalable version of Gradient Boosting introduced by Friedman et al. [35]. Each new tree is fitted to correct the errors of the ensemble built so far. The algorithm generally shows strong predictive performance and has earned much popularity in Kaggle competitions by achieving high marks in numerous machine learning tasks.

• Random Forest
Random Forest [36] is a bagging algorithm that uses an ensemble of decision trees. It effectively decreases the variance in the prediction by generating various combinations of training data. Its major advantage lies in the diversity of decision trees as variables are randomly selected. It is frequently used for both classification and regression tasks in machine learning.
• Extra Trees
Extra Trees [37] is a variant of Random Forest, a popular bagging algorithm based on the max-voting scheme. An Extra Trees model is made up of several decision trees, each with root, internal, and leaf nodes. Unlike Random Forest, Extra Trees does not bootstrap: each tree is grown on the whole learning sample, so all samples are eventually used. At each step in building a tree and forming a node, a random subset of variables is considered, resulting in a wide variety of trees. At the same time, a random cut-point is selected for each variable under consideration, leading to more diversified trees and fewer splits to evaluate. This diversity and the smaller tree depths are what make Extra Trees more effective than Random Forest as well as individual decision trees.
Once the decision trees are established, a test sample runs down each tree to deliver an output. In classification, the most frequent output is provided as the final label according to the max-voting scheme; in regression, the tree outputs are averaged.
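The aggregation step for a tree ensemble can be sketched as follows; the per-tree outputs are hypothetical:

```python
from collections import Counter

def max_vote(tree_outputs):
    """Classification ensemble: return the most frequent tree output."""
    return Counter(tree_outputs).most_common(1)[0][0]

def average(tree_outputs):
    """Regression ensemble: return the mean of the tree outputs."""
    return sum(tree_outputs) / len(tree_outputs)

# Hypothetical outputs from five trees for one test sample.
print(max_vote([1, 0, 1, 1, 0]))           # 1
print(average([5.2, 5.0, 4.8, 5.1, 4.9]))  # 5.0
```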

• Multilayer Perceptron
Multilayer perceptron [38] is a feed-forward neural network composed of multiple layers of perceptrons. It is usually differentiated from a DNN by the number of hidden layers. It utilizes a non-linear activation function to learn the non-linearity of the input features.

Dataset Preprocessing
Raw data goes through a series of steps called pre-processing so that it becomes readily actionable, or structured from the viewpoint of a data scientist. One example is the labeling of the MTF outcome, which is assigned '0' if the value is above a threshold and '1' otherwise; the labels indicate the 'satisfactory' and 'unsatisfactory' status of products, respectively. The class-labeled data is used only for classification. The input variables are then normalized to the range [0, 1]. In this study, camera lens module data obtained by various sensors is used as input. Table 2 shows the types of input variables used for training, while the output variable is MTF. However, the number of data samples collected from the sensors is less than a few hundred, making it extremely challenging to predict the output. To solve this issue, an unlabeled data set is synthesized using SRS on each input variable. SRS samples each element of a variable with equal probability [39]. The synthesized samples are then given labels following the simple semi-supervised pseudo-labeling method illustrated in Figure 2. First, a model is selected and trained on the raw data set. The best performing model on the raw data is then used to assign pseudo-labels to the synthesized data. For model selection, only machine learning algorithms are considered, as they are known to be much more suitable for small amounts of data. Of the various algorithms tested (Random Forest, XGBoost, Gradient Boost, K-nearest Neighbor, Extra Trees, and Multi-layer Perceptron), Extra Trees is selected since it showed the best performance on the raw data. Lastly, the newly labeled data is shuffled together with the raw data, leading to a better performing model with an increased number of data samples. The disadvantage of this simplified pseudo-labeling method is that the choice of model determines the quality of the synthesized data labels.
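The pre-processing steps described above (min-max normalization, SRS synthesis, and threshold-based class labeling) can be sketched in pure Python; the feature values, threshold, and sample count below are hypothetical:

```python
import random

def normalize(column):
    """Min-max scale a list of values to [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def srs_synthesize(columns, n_samples, rng):
    """Simple random sampling: draw each variable independently, with
    equal probability, from its observed values."""
    return [[rng.choice(col) for col in columns] for _ in range(n_samples)]

def class_label(mtf, threshold):
    """'0' = satisfactory (MTF above threshold), '1' = unsatisfactory."""
    return 0 if mtf > threshold else 1

rng = random.Random(42)
thickness = normalize([1.2, 1.5, 1.1, 1.4])   # hypothetical sensor readings
diameter = normalize([3.0, 3.2, 2.9, 3.1])
synthetic = srs_synthesize([thickness, diameter], n_samples=10, rng=rng)
print(len(synthetic), class_label(mtf=62.0, threshold=50.0))  # 10 0
```

Because each variable is sampled independently, SRS can produce combinations that never co-occur in the raw data, which is one source of the labeling error discussed later.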
Moreover, the error might accumulate as the number of data samples increases. Therefore, the optimal number of samples for the best prediction is determined by an exhaustive method, in which 10 different data sets with increasing numbers of samples are tested respectively. Table 3 lists the aforementioned data sets with their corresponding numbers of samples. Furthermore, 5-fold cross-validation is implemented to ensure a more generalized test result and to avoid overfitting. The performance metrics used for the regression task are RMSE, MAE, and R-squared. For the computation, a GeForce RTX 2080 (NVIDIA, Santa Clara, CA, USA) was used to run Python 3.5.3 and Tensorflow 1.13.1.
Table 3. Raw data and nine data sets augmented by Simple Random Sampling.

(Table columns: Sampling Method | Training Data Set | Number of Sampled Data. Sampling method: Simple Random Sample (SRS).)
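The regression metrics named above (RMSE, MAE, and R-squared) and a contiguous k-fold split can be computed as in this minimal sketch, using made-up predictions rather than the study's results:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds for cross-validation."""
    fold = n // k
    return [list(range(i * fold, (i + 1) * fold)) for i in range(k)]

# Hypothetical MTF targets and predictions for one validation fold.
y_true = [50.0, 55.0, 60.0, 48.0, 52.0]
y_pred = [51.0, 54.0, 58.0, 50.0, 52.0]
print(round(rmse(y_true, y_pred), 3), round(mae(y_true, y_pred), 3))
```

Averaging these metrics over the five folds gives the cross-validated figures reported in the results.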

Deep Learning Architecture
Following data pre-processing, a DNN model as shown in Figure 3 is constructed to predict the MTF. The structure of the model is determined in a heuristic fashion. There are three hidden layers, each with 10 nodes. The final layer constitutes a single node for outputting a regression result. For training, the Adam optimizer, ReLU activation function, and Xavier initializer are used. The ReLU activation is applied after every weight multiplication plus bias, but is omitted at the final layer. The objective function is the Mean Squared Error (MSE), which calculates the mean of the squared difference between the target values and the predictions over each batch. The optimal hyper-parameters, including learning rate, iteration count, and batch size, are discovered by grid search using a small portion (15%) of the training data. Early stopping is implemented to break the training procedure. Table 4 shows the top 5 results when different hyper-parameters are applied during training. Once the hyper-parameters are optimized, the best regression result is attained, and the optimal number of samples is determined, the next step is to implement classification using the class-labeled data. The deep learning model is similar to the one used for regression except for the objective function and the activation function used at the final layer. Since the binary classification is cast as a two-class categorical problem, categorical cross-entropy loss is minimized for optimization and a softmax function is applied at the final layer of the DNN. Here, the area under the ROC curve (AUC) is calculated to see whether there is any improvement by comparing the classification results between the augmented and the raw data.
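The described architecture (three hidden layers of 10 ReLU nodes and a single linear output node) can be sketched as a pure-Python forward pass; the study itself uses Tensorflow, the weights here are Xavier-style random placeholders rather than trained parameters, and the input dimension of 8 is an assumption:

```python
import random

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]

def dense(x, weights, biases):
    """One fully connected layer: y = Wx + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Xavier-style initialization (uniform, scaled by fan-in and fan-out)."""
    limit = (6.0 / (n_in + n_out)) ** 0.5
    w = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward(x, layers):
    """Three ReLU hidden layers, linear single-node output."""
    for i, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if i < len(layers) - 1:   # ReLU is omitted at the final layer
            x = relu(x)
    return x

rng = random.Random(0)
sizes = [(8, 10), (10, 10), (10, 10), (10, 1)]  # 8 input features assumed
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in sizes]
print(len(forward([0.5] * 8, layers)))  # 1: a single regression output
```

Swapping the final layer for two softmax nodes and the MSE loss for categorical cross-entropy yields the classification variant described above.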

Experimental Workflow
The overall procedure is illustrated and summarized in the flowchart (Figure 4), where 'HP opt' stands for hyper-parameter optimization. The workflow is divided largely into two parts. The first part improves the general regression performance by training on additional data in a semi-supervised manner. The second part performs classification with the raw and the augmented models. To convert the task into binary classification, the continuous target variable is switched to a discrete variable with either '0' or '1' labels based on the user-defined discrimination threshold value of MTF; this is done after an optimized regression model has been discovered. The class-labeling process is described in Section 3.1. Optimizing the regression model is necessary because the classification accuracy depends directly on how closely the established regression model can predict the output. In more detail, in the first part, given the raw data, the procedure splits into 10 branches in which the different data sets from Table 3 are processed to give the best performing regression model. To synthesize the data sets listed in Table 3, simple random sampling of the raw data is conducted. The sampled data are then appended to the raw data to make up the augmented data sets and are given pseudo-labels by the Extra Trees regressor. The regression model used is a DNN with the architecture specified in Section 3.2. Each regression model is tested by 5-fold cross-validation. In the second part, the aforementioned class-labeling is implemented, followed by training of a similar network with the selected model k. Finally, it is compared with the model using raw data for the classification task.

Regression Performance
The prediction of MTF is plotted in Figure 5. Since the test set is split into five subsets, the regression plot for each test set is shown from top to bottom. Even though the best performing data set for each performance metric tends to vary as the test set is changed, the overall regression result improves as the number of samples is increased. This is shown by the prediction curve in the red dotted line, which is much more generalized than the blue dotted line. Figure 6 shows the average values of the validation results presented in Figure 5 in terms of RMSE, MAE, and R-squared. It shows that it is best to use 768 samples for training to get the lowest RMSE and MAE and the highest R-squared, which in this case are 5.89, 4.78, and 0.65, respectively.
The result implies that the performance increases by 18%, 15%, and 25% for each performance metric when more training data is used. However, the plots also show that the performance begins to degrade beyond 768 samples. This is because the first model used for pseudo-labeling the synthesized data is not perfect, so the pseudo-labels inevitably carry some error. In addition, SRS for generating synthetic data may have produced infeasible samples. This implies that additional training data containing error should be used only in a limited amount. To demonstrate that the data augmentation through pseudo-labeling improves the general regression performance, the evaluation metrics are also measured for several other promising machine learning algorithms, as shown in Table 5. Though it does not excel in every aspect, the proposed model shows the best result overall. The hyper-parameters for each machine learning algorithm have been heuristically optimized as follows: Support Vector Machine (kernel = 'rbf' and regularization parameter C = 5.0), Random Forest (number of trees = 100 and max depth = 2), Gradient Boosting (number of boosting stages = 100, learning rate = 0.1, and max depth = 5), XGBoost (learning rate = 0.3, max depth = 6, and L2 regularization term on weights = 1).

Classification Performance
The effectiveness of increasing the number of training data through pseudo-labeling is also apparent in the classification task. Table 6 shows the accuracy, recall, specificity, and f1-score for the augmented and the raw data. The overall accuracy, recall, specificity, and f1-score increase by 11.3%, 9%, 1.6%, and 7.6% at best when more data is used to train the deep learning model, implying that the proposed method is more likely to be correct in camera lens module classification and is more reliable in terms of prediction. The better f1-score adds reliability to this conclusion, because the test sets are class-imbalanced and the f1-score is a more suitable measure for such a case. The rates are calculated based on the summed confusion matrices for the five test sets shown in Figure 7. Looking at the ROC curves, the AUCs generally get higher when the augmented data is used for training, implying that the constructed model is more capable of distinguishing between the two classes; it is now better at predicting zeros as zeros and ones as ones.
Table 6. Accuracy, recall, specificity, and f1-score for augmented and raw data.

(Table columns: Data Type | Accuracy | Recall | Specificity | F1-Score.)
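The reported rates follow directly from confusion-matrix counts; a minimal sketch with hypothetical counts, not the study's matrices:

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, recall, specificity, and f1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)          # sensitivity: true positives found
    specificity = tn / (tn + fp)     # true negatives found
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, specificity, f1

# Hypothetical counts for a class-imbalanced test set.
acc, rec, spec, f1 = metrics(tp=90, fp=3, tn=5, fn=2)
print(round(acc, 3), round(rec, 3), round(spec, 3), round(f1, 3))
```

Because the f1-score balances precision and recall, it is less flattered by class imbalance than raw accuracy, which motivates its use here.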

Conclusions
Collecting data is often very restricted due to the lack of infrastructure, and even more so when labeling data is possible only through the manufacturing of products. The goal of this paper is to demonstrate the effectiveness of making predictions through a simple but effective method when only a small amount of labeled data is available on hand. In this study, semi-supervised regression (SSR) coupled with simple random sampling (SRS) for the generation of synthetic data is proposed to boost the overall regression performance of a DNN. Camera lens module classification is then performed based on the improved regression model, enhancing the reliability of the classification result. The performance of the proposed model is validated on five different test sets for both the raw and the augmented models. The proposed scheme shows that the regression performance increases by 18%, 15%, and 25%, while the classification performance increases by 11.3%, 9%, 1.6%, and 7.6% at best, achieving 97.7% accuracy, 98.8% recall, 75.7% specificity, and 98.8% f1-score on average. This shows that the classification performance, which is heavily influenced by the discrimination threshold value of MTF, is increased with the help of the improved regression. Although the overall performance has improved, the regression results show that it begins to degrade at some point. This is due to the error in the pseudo-labels and the lack of raw data, which makes it challenging to figure out the true distribution of the data. One possible alternative to SRS would be to try other sampling methods, given that the distributions of the input variables are known. Another interesting approach would be to iterate the pseudo-labeling process a few more times until the performance ceases to improve.

Conflicts of Interest:
The authors declare that there is no conflict of interest.