Quality-Related Monitoring and Grading of Granulated Products by Weibull-Distribution Modeling of Visual Images with Semi-Supervised Learning

The topic of online product quality inspection (OPQI) with smart visual sensors is attracting increasing interest in both the academic and industrial communities on account of the natural connection between the visual appearance of products with their underlying qualities. Visual images captured from granulated products (GPs), e.g., cereal products, fabric textiles, are comprised of a large number of independent particles or stochastically stacking locally homogeneous fragments, whose analysis and understanding remains challenging. A method of image statistical modeling-based OPQI for GP quality grading and monitoring by a Weibull distribution(WD) model with a semi-supervised learning classifier is presented. WD-model parameters (WD-MPs) of GP images’ spatial structures, obtained with omnidirectional Gaussian derivative filtering (OGDF), which were demonstrated theoretically to obey a specific WD model of integral form, were extracted as the visual features. Then, a co-training-style semi-supervised classifier algorithm, named COSC-Boosting, was exploited for semi-supervised GP quality grading, by integrating two independent classifiers with complementary nature in the face of scarce labeled samples. Effectiveness of the proposed OPQI method was verified and compared in the field of automated rice quality grading with commonly-used methods and showed superior performance, which lays a foundation for the quality control of GP on assembly lines.


Introduction
Product quality is the driving force for every enterprise, which is an important factor to keep an impregnable position in the modern global competitive environment [1,2]. Product quality inspection or monitoring is basically performed by the performance tests of products as well as appearance assessments, to avoid the possible defects in products and ensure customer satisfaction [3][4][5]. The quality of most of types of products can be reflected with their corresponding visual attributes, e.g., glossiness, color, object size, surface coarseness and varieties of defects on the product surface, which are effective sensory indicators for product quality inspection or condition monitoring to a  [19]; (e) image segmentation result by the watershed algorithm integrated with a morphological grayscale reconstruction method [20]. Results from the canny operator, Sobel operator are post-processed by using Otsu's threshold method.
some first or second statistics as image features in a special transform domain. Unfortunately, the results are possibly misleading or ambiguous in some extreme circumstances. For instance, we can get the same statistics from two sample groups, which actually come from different distributions or distribution models.
As research continued, considerable efforts were devoted to the probabilistic model-based methods to interpret ISS, especially integrated with the prevalent multi-channel image analysis [28], such as Wavelet transform and Gabor filtering. Many useful distribution models are introduced to do statistical modeling of the ISS based on some basic assumptions, e.g., the independent and identically distribution of pixels, and homogeneous spatial assumptions.
In particular, researchers adopted leptokurtosis and fat-tail-shaped distribution models, e.g., Gaussian scale mixture (GSM) [29], generalized Gaussian (GG) [30], Gamma distribution (GD) [31], Gaussian Mixture model (GMM) [32] to characterize the marginal distribution of the image wavelet coefficients, owing to the sparse behavior of the wavelet coefficients. In other words, the marginal distributions of wavelet coefficients are highly kurtotic and long tailed in the coefficient domain or seriously left-skewed in the magnitude coefficient domain. The higher order statistics, e.g., the joint distribution representing the statistical correlations of the pixels in adjustable distances and fixed orientations, is subsequently investigated by augmenting a simple parametric model with a set of hidden random variables that govern the parameters. These established statistical models are used as the prior probability and substantially improve the power of image processing and analysis technologies, such as image denoising [33], foreground or object segmentation [31,32], texture image retrieval [34].
Although model-based methods provide a promising idea for the GPI analysis, current model-based methods conduct limited theoretical analyses of the underlying spatial distribution characteristics of these complex GPIs. These proposed statistical models mainly depend on the experience of experts with the observation of limited image samples. If the predefined statistical models do not really conform to the real distribution profiles of the candidate GPIs, image-based OPQI system may lead to wrong decisions and cause potential economic losses. Thus, the visual appearance with respect to the microheterogeneity, complexity, and uncertainty, and spatial stochastic distribution properties, exhibiting in the GPI remain a great challenge to be depicted effectively [14,35] for the visual sensor-based OPQI.
This work concerns the theoretical statistical distribution of GPIs, introducing an essential statistical model, WD model, by the theory of sequential fragmentation, which is well known in the continued comminution processes. The WD-MPs are then extracted as the GPI feature, whose corresponding significant perceptual meaning is discussed.

Classification
Visual images-based OPQI is essentially a pattern classification or recognition problem. Mathematically, given the labeled example set L " tpx L1 , y L1 q , . . . , px Li , y Li q , . . . , px LM , y LM qu , the task of OPQI is to assign the proper product quality tagŷ t to the probe sample t (with the image feature vector x t ), in order to judge whether the product sample t is in compliance with the quality requirements.
Many supervised learning methods such as linear discriminant analysis (LDA), support vector machine (SVM), least squares-support vector machine classifier (LS-SVM), linear regression (LR), kernel ridge regression (KRR), artificial neural network (ANN) [36] and their variants can solve this problem. The performance of the existing pattern classification methods mainly depends on the amount of labeled samples as well as their distribution in the whole sample space. Generally speaking, the larger the amount of training samples, the better the performance that can be achieved for every supervised learning classifier. Unfortunately, labeling the samples is expensive in terms of cost and effort in most practical applications. For example, in the OPQI of rice products, rice product grade tags should be assigned based on the aggregative indicators of rice surface gloss, Sensors 2016, 16, 998 5 of 34 grain size, and the nutritional ingredient assay measured in the laboratory, which is a very tedious and time-consuming work. Hence, although we can easily obtain a great amount of unlabeled rice image samples U " tpx U1 ,´q , px U2 ,´q , . . . , px UN ,´qu by visual sensors, where the strikeout means the corresponding quality label is unknown, only a few labeled samples are available for classifier learning.
Apparently, exploiting unlabeled samples to help supervised classifier learning is a promising solution to solve the scarcity of labeled samples and has been a hot research topic in recent years. To take full advantage of the underlying classification information from the unlabeled samples, semi-supervised learning-based classifier design cause great attention and many successful cases have been reported in the literature, see [37][38][39][40]. Roughly speaking, current semi-supervised learning methods can be categorized into three groups: the first are the generative model-based semi-supervised learning methods. These methods regard the probability of the category labels of the unlabeled samples as a missing parameter, and then the expectation-maximization (EM) algorithm is usually employed to estimate the unknown model parameters [41]. Many commonly-used models are reported in the literature, e.g., Gaussian mixture model [42], and mixture-of-experts system [43]. This method is intuitive and easy to understand and simple to implement, but its accuracy relies on the choice of generative models.
Another are the graph-regularization-framework based methods [44]. These methods usually build a data graph structure based on the marked sample points and unlabeled data points, the tags of the samples are propagated from the labeled points based on the adjacency diagram of the tags to the unlabeled points. Analogously, the performance of these methods also depends on the construction of the data graph.
A third are the co-training methods, which have undergone many improvements [21,45] and have been recognized as one of the main paradigms of semi-supervised learning since they were first proposed [46]. Based on the idea of ensemble learning, more than one, e.g., two, classifiers are established separately on the corresponding sufficient and redundant views. Then, each classifier predicts the labels of the unlabeled samples for the other classifier during the learning process. Predicted labels with high confidence are chosen to augment the training set.
Although co-training methods have been used in many fields, sufficient and redundant views for the corresponding classifiers are required for the traditional semi-supervised learning, which is a condition that cannot be met in many scenarios, especially in practical applications [21,45]. Hence, researchers have attempted to design algorithms that overcome that adverse restriction.
Actually, as stated in [45], with the idea of bagging ensemble learning, different supervised learning classifiers can work without attribute partition or redundant view construction. The labeling confidence can be explicitly measured when a classifier attempts to label the unmarked samples to the other classifier. Hence, researchers have attempted to establish different classifiers by different learning algorithms with complementary prosperities to realize the semi-supervised learning, which do not need the attribute partition and redundant view construction. The appropriate unlabeled samples with high enough confidence labeled by the classifier are chosen to regularize the learning process in order to gain much better generalizationability. More detailed information can be found in [47].
In this paper, a co-training-style semi-supervised classifier named COSC-Boosting algorithm, inspired by the semi-supervised co-training regressor algorithm, COREG [48], is proposed for OPQI. This algorithm employs two different classifiers with complementary natures, each of which labels the unlabeled samples for the other during the learning process on account of the parallel learning property of Bagging ensemble learning. This algorithm does not require sufficient redundant view construction, nor does it require a tenfold cross-validation for label confidence evaluation. These two complementary classifiers are thin plate spline regression classifier (TPSRC) [49] and multivariate adaptive regression spline classifier (MARSC) [50]. TPSRC mines classification information from the overall description of the feature vector of each labeled object, ignoring the local factor interaction in each feature vector (in this paper, any one variable in the GPI feature vector is called a factor). Complementarily, MARSC considers the factor interaction in each feature vector while making full usage of each factor in the feature vector.

WD Model
As stated in [2,51], any object in the viewing field fragments the scene into two regions, e.g., the internal (foreground or object) and external region (background). In terms of the number of particles in the GPI, the visual scene is inevitably divided into plenty of fragments. The mixture of the shapes, edges, cast shadows with their distributions or organizations of the fragments result in the visual appearance of the ISS of GPI [51].
The best way to describe the ISS of GPI is the spatial statistics of the organization of the texture elements (fragments or particles) using their spatial stochastic aspect. The spatial statistics are the result of an image forming process [51], which is actually equivalent to a single-event fragmentation process of continued comminution in the ore grinding process, which is well studied and can be characterized by the sequential fragmentation theory [2,52].
According to the theory of sequential fragmentation, the probability distribution of the fragmentations of the GPI shows a power-law distribution with the assumption that small details are occurring more often in an image than large structures [53,54]. The resolution power of the visual sensor in real applications cannot be infinite, the fragmentation process of the local particles will inevitably cease. Hence, the statistical distributions of the ISS of the GPI just correspond to the fragments with local contrast larger than a certain fine structure x in GPI. Therefore, the statistical distribution model of the ISS of GPI can be described by the integral form WD model [2,51,52]. A detailed demonstration of the WD process of GPI can be found in Appendix A.
The probability density function (PDF) of the WD of integral form is given by a tri-parameter function f px; µ, λ, βq , namely: 1 λ βΓ´1 λ¯ı is a normalizing constant, only related to the model parameters λ and β and Γ( ) is the gamma function and Γ pxq " ş 8 0 t x´1 e´tdt. Since we cannot get a closed-form solution of WD-MPs, µ, λ, β, based on the maximum likelihood estimation (MLE), WD-MPs can be estimated by an iterative procedure, the Nelder-Mead simplex algorithm [55], to get the optimal numerical solution. Detailed computation steps can be found in Appendix B.
To evaluate the accuracy of the WD model, the statistics χ 2 and Kullback-Leibler divergence (KLD) can be used to measure the goodness of fit, which is computed as follows: KLD " where h(x i ) and f px i ;μ,λ,β) represent the empirical and expected probability on x i , respectively. Both lower χ 2 and KLD values indicate more precise statistical modeling results.

Perceptual Significance of WD Model
The WD model can represent a series of classical statistical distribution by changing the model parameters. For example, when λ = 1, WD becomes the double exponential distribution with a mean value of β. Given λ = 2, WD becomes a Gaussian distribution (GD). Assigning a small value of λ, WD is basically close to the symmetric power-law distribution, given by f py; δ, uq " 1 2δ {y´µ{´δ´1, where the exponent δ is the measure of fractal dimension [51]. In addition, some studies have shown that the fractal dimension D f of an image can be estimated by the shape parameter of WD [53], namely, Researchers have demonstrated that the WD-MPs are directly related to the visual perception characteristics of biological vision systems [51]. With respect to WD-MPs, µ is the location parameter, indicating the global reflectance of the image and it can be derived from the shape-from-shading method [56]. λ is a shape parameter, indicating the grain size with respect to the resolving power, and β is the scale parameter, controlling the width of the distribution. Studies [2,51] have demonstrated that the WD-MPs can make a physical explaination of the human visual perception (HVP) properties [57], such as coarseness, regularity, contrast, roughness, and directionality.
(1) Coarseness or fineness is a fundamental HVP attribute. Commonly, the larger the basis element (fragment or particle) size is, the coarser it is felt in the HVP of the image texture. The coarse texture comes from the magnification of the image accompanied with an increase of resolving power. It can be indicated with the shape parameter λ of the WD model. However, the perception property "coarseness" is evidently related to the observation scale. For example, if we magnify a GPI, but without increasing the resolving power, no new details are included, then the scale invariance will achieved by the HVP with the adaption of the reception field size to the digital magnification of the image. The WD model well fits this kind of scale invariance of the HVP property, namely, a constant shape parameter λ remains. Whereas, if the image magnification with an increase of the resolving power, more details with larger grain are captured in the field of view. Though this process does not affect the WD nature of the ISS, the shape parameter λ becomes smaller in the perception of the "coarse" texture [51]. (2) Regularity is a visual perceptual property regarding the layout variations of the basis texture elements (particles) in the image. In general, a fine texture image tends to be perceived as regular.
As addressed above, the coarseness or fineness can be indicated by the shape parameter λ of the WD model. Hence, the shape parameter λ can make a physical explain of the HVP property, regularity. In the extreme case, a few particles or even just one particle exists in the receptive field, namely, the image can be distinguished by a few foreground objects and the remaining background regions. Then, the WD model is rejected, and the spatial distribution of the viewing field can be depicted either by power-law distribution or as pure regular texture (one object fully occupies the entire view of the field). Alternatively, the exhibiting statistical distribution shape is often multimodal when the GPI includes numerous fine particles fully filled in the entire receptive field. This phenomenon can be reflected with the maximum likelihood estimation (MLE) of the shape parameter λ with the resultant of λ " 2 [51], which is the overfitting result of fat tails of the WD model. In this case, the spatial distribution of the grain image is perceived to be regular. (3) Contrast records the variation range of the illumination intensity or even the color depth of the texture images. The perceptual property "contrast" is essentially caused by the illumination intensity together with the surface height variations of observation objects. The incident and the reflection of the light ray on the object surface codetermine the visual appearance of the ISS with the perception of illumination contrast. The global illumination variation can be reflected by the location parameter µ, which determines the center of the WD and indicates the inhomogeneous illumination surface. Whereas, the surface height variations of the observed objects can be reflected by the scale parameter β, the "width" of WD model, according to the theory of "sequential fragmentation". Hence, the perceptual contrast can be indicated by two WD-MPs, µ and β [51]. (4) Roughness is originally a tactile property of a surface. Though it seems a 3D property, humans can perceive it with the observation of the 2D image. Roughness is the perceptual properties results from the height variations of a certain granularity of particles. The greater the height variations of the special sizes of particles are in the view of the field, the rougher perception it is felt. As discussed above, β is an indicator of the height variations of the texture, and the shape parameter λ is the indicator of the granularity of the particles (perceptual coarseness) in the grain image. Hence, perceptual roughness can be reflected by the WD-MPs, β and λ, under a special illumination condition, indicated by the location parameter µ as addressed by Geusebroek [51]. Thus, the combination of these parameters can effectively indicate the roughness of the grain image. (5) Directionality is a global sense over the entire view of the field. It indicates the dominant orientation of the texture, which is caused by the shapes of the texture elements as well as their placement rules. Though WD-MPs do not include the direct shape information of thetextons(particles), the placement of textons (particles) can be implicitly characterized with WDMPs. Studies [51,53,54] have demonstrated that the anisotropy of grain sizes can be described by the dominant direction of the shape parameter λ. Anisotropy in texture shadows (or contrast) can be reflected by the dominant orientation of the scale parameter β. GPI may exhibit two types of directionalities or anisotropies. The first type is caused by the particle size, and the second type is caused by the contrast variations of particles. Thus, if we fully consider the structural information of the grain image, the HVP-related perceptual attribute, directionality, can be described by the corresponding dominant orientation information of the shape parameter λ and scale parameter β of WD model.

Gaussian Derivative Filter (GDF)
Digital images are discrete versions of continuous 2D functions. The image functions at a given point are the finite order truncations of their Taylor series according to Taylor's theorem. The pixel intensity I(x,y) at the position (x,y) in any local spatial fragment around a predefined original observation point (x 0 ,y 0 ) can be determined by the Taylor expansion, namely: where the term I x i y n´i is a n-order mixed derivative of the image function I evaluated at the point (x 0 ,y 0 ), the orders of partial derivative with respect to x direction and y direction are i, n -i respectively; n! denotes the factorial of n; R n is the Lagrange residual term. Equation (4) indicates that the local observed value in the visual pattern is actually determined by the weighted addition of the derivatives at the original observation point over a certain spatial extent (the observation scale), which are derived from the visual appearance ISS. The n-order mixed derivatives (I x i y n´i ) in Equation (4) are closely related to the edge layout, involving the most important spatial structure information of the image I. The combination of the mixed derivatives in the Taylor expansion gives a complete representation of the local ISS. I x i y n´i is usually expressed by the GDF, namely: where˚denotes the convolution operator and G x i y n´i px, y, σq is an i+ (n -i) = n-order mixed derivative of Gaussian filters, subject to i ě 0; n -i ě 0: G σ px, yq is the Gaussian kernel with the scale parameter σ.

OGDF
To facilitate description, we denote a simple notation G κ,σ as a κ-order mixed GDF with scale parameter σ. The use of G κ,σ for ISS characterization can reflect the structure information of a GPIat the corresponding x coordinate and y coordinate direction with respect to the mixed partial derivative of G κ,σ . As mentioned above, the ISS of GPI is directional. The shapes and directional arrangements of the local homogeneous particles contribute the visual appearance of the ISS of the GPI. In order to take full consideration of the multi-directional ISS, it is necessary to construct the orientated filter for omnidirectional ISS feature characterization.
Intuitively, the response of the image I at direction θ results from the convolution operation of the image I with the rotated GDF at the corresponding orientation θ. Hence, we should do the convolution operation at all of the directions if we need the omnidirctional ISS features. Unfortunately, this computing style is fairly time-consuming, which is not suitable for GPI processing in the OPQI.
As stated by Freeman [58], the linear combination of several Gaussian derivative filter bases is steerable, hence we can obtain the omnidirectional ISS by linear weighting addition of a limited number of filtering responses of GDFs. Suppose we have the following filter template: where G κ,σ px, yq represents a mixed κ-order GDF with the highest order of κ. As the filter template in Equation (7) is steerable, the filtering response of image I with filter template G κ,σ px, yq at a special rotation angle θ satisfies the following formula [59]: where X = (x,y) T , R θ is the rotation factor, R θ "˜c osθ sinθ sinθ cosθ¸, G κ,σ pX, R θ q is the rotational version of G κ,σ pXq in the direction of θ, * represents the convolution operation; I m,i (X) is the filtering response of the image I with the mixed partial derivative of Gaussian filter G m,i,σ with the derivative orders of i and m´i with respect to the x and y directions, respectively, namely: Hence, if we obtain the linear weighting coefficient α m,i , we can achieve ISS at any orientation with the steerable filter G κ,σ px, yq via a weighted summation of several base GDF responses of the GPI I at a very low computational cost. The computing mode of the steerable filtering is displayed in Figure 2.
Hence, if we obtain the linear weighting coefficient α , , we can achieve ISS at any orientation with the steerable filter , ( , ) via a weighted summation of several base GDF responses of the GPI I at a very low computational cost. The computing mode of the steerable filtering is displayed in Figure 2. The coefficient α , can be computed by using the properties of linearity, rotational and differentiation of the Fourier transform of Equation (8), and the eventual result α , is a trigonometric polynomial function of the orientation θ, given by the following computing rule: The detailed derivation process can be seen in Appendix C. The coefficient α m,i can be computed by using the properties of linearity, rotational and differentiation of the Fourier transform of Equation (8), and the eventual result α m,i is a trigonometric polynomial function of the orientation θ, given by the following computing rule: The detailed derivation process can be seen in Appendix C.

GPI Feature Extraction
The responses of OGDF always exhibit a periodicity of period π on the orientation map in terms of the properties of steerable GDF. Hence, only the directions in the [0~π] are essential for ISS analysis. We discretize the continuous direction of [0~π] into N discrete orientations uniformly for omnidirectional ISS characterization. ISS at the N pre-chosen directions (ISSPDs) are computed by linear weighted summation of the responses of the separable GDF bases as expressed in Equation (8). We do statistical modeling of ISSPDs based on the WD model and the WD-MPs are used to generate the feature variables of ISS.
ISS is not only direction-dependent, but also related to the Gaussian scale parameter σ of G κ,σ . In order to obtain the features of ISS of multi-scales, we employ a series of scales {σ i |i = 1,2, . . . ,T} for ISS analysis. Suppose a fixed scale space for ISS analysis is σ, the omnidirectional statistical distribution feature vector, f Global I,σ of ISS of a global image I, can be expressed as follows:  The fitting accuracies with WD model and GD model ofISS are also compared and displayed with measures of statistics KLD and χ 2 , which indicate clearly that the WD model is much better than the GD model to do statistical modeling of the ISSI of grain images. (a) GPI feature extraction with a first-order GDF template G 1,0 ; (b) GPI feature extraction with a second-order GDF template G 2,0 .
However, the global WD-MPs (GWD-MPs) obtained from the entire filtered images suffer from the loss of the local structure information of ISS. We adopt the analogous processing mode reported in [30] to gain the local information, namely, we split the filtered images into non-overlapping sub-images and each sub-image is treated as an individual image, whose WD-MPs are extracted as the local WD-MPs (LWD-MPs) of GPIs. Finally, GWD-MPs and LWD-MPs are concatenated into an extended ISS feature representation. Finally, the extended WD-MPs feature of a GPI is formulated as: where f GWD´MPs

Basic Idea of COSC-Boosting
COSC-Boosting algorithm is a semi-supervised classifier algorithm based on the parallel learning property of Bagging ensemble learning, which employ two classifiers, TPSRCh TPSRC (X) and MARSC h MARSC (X). Each of the classifier labels the unlabeled samples for the other. The predicted labels of the unlabeled samples with high confidence are chosen to augment the labeled data set, to regularize the classifier learning process. The labeling confidence is estimated by consulting the influence of the labeling of the unlabeled samples on the actually labeled samples, namely, the classification error of the classifier on the labeled example set should decrease most quickly if the most confidential unlabeled samples are used [48].
In more detail, COSC-Boosting algorithm repeatedly selects M' unlabeled samples from the unlabeled samples U randomly with replacement. After each selection, the pre-configured TPSRC and MARSC (trained based on the labeled samples) are used separately to pre-label the M' unlabeled samples, and then the samples with high degrees of confidence are applied to the TPSRC and MARSC for classifier model updating and eventually to achieve better classification performance. Such selection of M' unlabeled samples for classifier updating is to ensure the diversity of sample selection on one hand and to avoid the classifier overfitting by restricting the number of potential confident points on the other hand.
In terms of the complementary nature of TPSRC and MARSC, for every unlabeled sample x ui in the repeated selection, if TPSRC and MARSRC assign the sample label on x ui separately on the current training condition, namelyŷ ui " Label th TPSRC px ui qu " Label th MARSC px ui qu , then the sample px ui ,ŷ ui q is a candidate high confidence sample. However, only the truly high confidence samples should be used to update the classifier learning. Intuitively, the truly high confidence labeled samples should be the samples that make the classifier consistent with the labeled example set (augment set, including the unlabeled samples but assigned labels with the classifier in the previous learning steps with high confidence level). To evaluate the consistency, the mean square error (MSE) of the classifier on the labeled are used to evaluate the confidence.
The MSE of the classifier utilizing the information provided by px ui ,ŷ ui q can be evaluated on the labeled example set. Let hT PSRC pxq and hM ARSC pxq be the refined classifiers of h TPSRC pxq and h MARSC pxq respectively, which are updated by a high confidential label px u ,ŷ u q , and r L be the augmented training sample set. The high confidence labeled examples tpx ui ,ŷ ui qu are identified through the following rule: where if and only if both MSE ph TPSRC q and MSE ph MARSC q satisfy the above conditions at the same time, then the unlabeled sample is confirmed as a high confidence labeled sample candidate and can possibly be used to refine the classifier, in other words, we accept hT PSRC pxq and hM ARSC pxq at the condition when MSE declines and the labeled set is augmented by px u ,ŷ u q who maximizes the values of MSE ph l q.

TPSRC
Here we take the two-class case as an instance. In this case the class label y i on the sample feature x i belongs to a two-value label set {+1,-1}, e.g., let "´1" represents the "high quality" product and "+1" label the "other quality" product. Given that L " tx Lt , y Lt u M t"1 is a training data set consisting of M samples, including M 1 sampling points representing the "high quality" product samples, denoted as Ω H , and M -M 1 sampling points of "other quality" product samples, denoted as Ω 0 , i˘" y 0 1 « 1. This regression or classification task can be solved in a generalized framework with data fitting and function smoothness by solving the following optimal problem, namely, [49,60]: where s(f ) is the smoothness penalty function on function f, and η is the regularization parameter.
Minimizing J(h TPSRC ) with different hypotheses will achieve the spline function h TPSRC of different forms. In this study, we attempt to find the form of function h TPSRC in the Sobolev space [60], where s(f ) is defined as a semi-norm and researchers have proved that the result of Equation (15) is a unique spline function given by [49,60]: s is the order of the partial derivative of the semi-norm, satisfying s > 0; tp i pxqu d i"1 p i pxq is a set of primitive polynomials, which spans the polynomial space of total degrees less than s. The larger the s is, the smoother function h TPSRC will be achieved.
In real applications, researchers usually limit the polynomial space to be linear and consider a kind of radial basis function as the only one form of Green's function, then the following spline function can be derived [49,60]: where φ H j pxq and φ 0 j pxq both take the radial basis function as the form of Green's function, namely, In terms of the geometrical meaning of the Equality (17), ω 0 represents the translation, reflects the affine transformation, whereas the remaining terms in the Equality (17) record the locally nonlinear deformation of the Ntraining samples. As stated in [49,60], the coefficients , ω 0 , ω, ψ H and ψ 0 , in the classifier h TPSRC (x) can be learned by a group of linear equations based on the training set at a very low computational cost. Detailed construction of the linear equation set with the solution of the coefficients can be found in Appendix D.

MARSC
The multivariate adaptive regression spline (MARS) method is a multivariable, nonlinear and nonparametric regression technique for flexible modeling of high dimensional data. The classification model simulates the complex nonlinear relationship by spline functions, which takes the form of an expansion in product spline basis functions, where the number of basic functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the training samples [50].
MARS is not only a data-driven function regression method, but also has the advantages of accurate classification ability, which has broad applications in pattern recognition, system identification, process control. In contrast to TPSRC, it provides a computationally feasible approach that approximates the basis function subset selection procedure [50].
Given the training set L " tx Lt , y Lt u M t"1 and x = [x 1 ,x 2 , . . . ,x p ] T , a feature vector with p-dimensional variable, each variable in the feature vector represents a factor, a general MARSC is as follows: where K is the number of spline basis functions, α = {α 0 ,α 1 , . . . ,α K } is the output weighting value, K m is the number of factors in the m-th basis function H i (x), s km accepts only two values {+1,-1}, which indicates the sense of the truncation, v(k,m) labels the predictor variables and represents which predictor variable locating in the k-th section of m-th basis function, t km is the knot location or partition threshold. [s km (x v ( k,m )-t km )] + is a step polynomial, namely: MARSC function described in the Equation (18) can be expanded in a more explicit form by a simple rearrangement of terms: where the first summation term in Equation (20) collects together all basis functions that involve only one variable (K m = 1), the second term collects together all the basic functions which involve two but only two factors, analogously, the i-th term collects all the basic functions that involve i factors.
Standard MARS uses a forward/backward stepwise strategy to produce a set of basic functions. The forward part is an iterative procedure, each of which simultaneously constructs an expanded list of basic functions to be considered and then decides which ones to enter at that step. After implementing the forward stepwise selection of basic functions, a backward procedure is applied in which the model is pruned by removing those basis functions that are associated with the smallest increase in the (least squares) goodness-of-fit. A least squares error function (inverse of goodness-of-fit) is computed. The so-called generalized cross validation (GCV) error is a measure of the goodness of fit that takes into account not only the residual error but also the model complexity as well. When the GCV reaches the least value, the best model is achieved. GCV is given by [50]: where C(K) = K(d/2 + 1), d regards as a smooth parameter, usually d = 3.

COSC-Boosting Algorithm Steps
The main steps of the COSC-Boosting algorithm for semi-supervised classifier learning are displayed in Algorithm 1. It is noteworthy that the unlabeled samples set and the test set can overlap. In each iteration, COSC-Boosting chooses M'samples randomly from the unlabeled sample set, based on the labeling confidence computing, the samples who are assigned the sample labels by the complementary classifier are recorded as the candidate high confidence samples to update the classifiers in the current repeating step and the most confidential labeling samples are elected as the virtually labeled samples to augment the labeled sample set and consequently regularize the classifier learning.
Labelˆy MARSC x U 1 i˙% gain the same label from TPSRC and MARSC If neither L 1 and L 2 changes then directly exit the repeating; Else Training h TPRSC (x) based on L 1 and h MARSC (x) based on L 2 separately; Reset U' and Randomly select M' samples from U with replacement to U'; End If End Repeat Output: the ultimate classifier f pxq " 1 2 ph TPRSC pxq`h MARSC pxqq ;;

Post-Processing for Product Quality Labeling
After the coefficients of the classifiers h TPRSC (x) and h MARSC (x) have been learnt, the regression values of the unlabeled points can be directly evaluated by the function described in Equations (17) and (20). For instance, for a new input image t i with feature vector x t i , its associated quality-related output isŷ TPRSC However, the output is apparently not restricted to be the labels in {+1,-1}. Hence, we should employ a threshold T, e.g., assigning the "medium" value zero as the classification threshold, namely, a very simple labeling rule is as follows: p"high quality" productq ifŷ t i ď T 1 p"other quality" productq others Furthermore, the eventual labeling results by the output of COSC-Boosting algorithm should be post-processed analogously as the labeling process of single classifier, namely, a thresholding processing should be carried out, in the two-class case the threshold can be set as the "medium" value zero as described in Equation (22).

Relation to Other Algorithms
The proposed COSC-Boosting algorithm is inspired by the COREG algorithm, a co-training-style semi-supervised regression algorithm [48], whose effectiveness is demonstrated in detail. Analogously, in the learning processing of the COSC-Boosting algorithm, if and only if the newly labeled sample x u makes the classifier more consistent with the labeled samples is set as the candidate set to regulate the classifier learning. The evaluation criterion of consistence of x u is as follows: 1ˇˇr where h l represents TPSRC or MARSC, hl denotes the refined version of h l with the newly and properly labeled example (x u , h l px u qq. If ∆ u is positive, it indicates that the refined classifier hl evolves towards being more consistent with the labeled sample. The labeling sample of most confidence, maximizing the value of MSE ph l ; x u q is picked out to augment the labeled samples for classifier learning. The unlabeled example chosen according to the maximization of MSE ph l ; x u q will result in a positive value of ∆ u , as explained in [48]. Unlike the commonly-used cross validation method in semi-supervised learning for determining the label confidence of the unlabeled samples, the proposed COSC-Boosting algorithm employs two complementary classifiers for classifier learning, which does not require cross validation, nor does it require redundant views construction on the training samples. The complementarities of the classifiers with the label confidence evaluation criterion guarantee the most confident unlabeled samples in each iteration to benefit the classifier learning. In the final formula of TPSRC in Equation (17), the Green's function ϕ(r) relates to the labeled sample's overall feature vector, r " x-x i , which is a distance metric and cannot use the single factor as well as the coordinating role or the interaction of the factors in the feature vector x. In other words, TPSRC establish an appropriate regression or classification model from the overall feature vector of the sample, whereas MARSC does not only consider the contribution of single factor, but also the cooperation and interaction of the factors in the feature vector. As described by Equation (20), MARSC takes full advantage of the characteristic of the sample feature information and mines the underlying complex structure information from the multidimensional feature vector, which is complementary to the TPSRC.
The COREG algorithm is more inclined to the diversities of the newly labeled samples, while the proposed COSC-Boosting algorithm doses not only consider the diversity of newly labeling samples, but also the noise suppression ability and overfitting prevention. The diversity is carried out by the selection of M' unlabeled samples in each iteration for classifier updating. Since the established classifiers are complementary, the probability of misclassification on an unlabeled sample with both of the classifiers will significantly decrease, when the same tag unanimously made by the two classifiers. Hence, in terms of the problem of noisy samples, the COSC-Boosting algorithm achieves high accuracy on the labeling of unlabeled samples.
The criterion for high confident sample selection by complementary classifiers is much more stringent in the proposed COSC-Boosting algorithm, namely, only the samples gaining consistent labels by two complementary classifiers and both the two classifiers can be refined more consistently with the labeled samples are chosen as the high confident samples for classifier learning, which can effectively prevent the problem of overfitting or vibration in the learning procedure caused by the wrongly regulation based on the impertinent introduction of newly labeled samples. In the extreme situation, only a few unlabeled samples in U achieve consistent labels based on TPSRC and MARSC, then the Learning process COSC-Boosting algorithm degenerates to a normal ensemble of two classifiers, by consulting the results of two complementary classifiers, better classification accuracy will also be achieved.

Experimental Verification
The proposed image statistical modeling with semi-supervised learning-based product quality inspection approach is tested on a food processing factory, in which several kinds of cereal, e.g., rice, corn, are processed by dehusking, polishing, screening, grading and packaging for marketing, located in a province in south China. The proposed method is tested on the assembly line for rice processing-quality grading.

Overview of Visual Sensors-Based Cereal Product Quality Classification
Rice is the world's most consumed staple food, and the predominant dietary energy source in China. Almost all of the rice processing enterprises attempt to employ OPQI technology instead of inefficient and subjective manual inspection to provide high-quality rice products. Nowadays, visual sensor-based rice quality inspection [61] has drawn wide attention.
Rice product is a typical GP. In earlier years, people tended to be more concerned about the physical properties of each individual grain, e.g., surface gloss, shape, size and other related characteristics [62], in visual sensor-based cereal quality monitoring. As addressed in the surveys [63,64], the first step is GPI segmentation, which is the foundation for GPI feature extraction [61]. Then, a kind of classifier such as artificial neural networks (ANN), support vector machines (SVM) or other supervised methods is exploited for product quality grading [65].
Although many experiments have verified the effectiveness of these methods and their high grading accuracies (higher than 90%), many unresolved problems restrict their practical application. For example, there is a lack of sufficiently effective and efficient GPI segmentation methods in GPI analysis. As reported in the literature, the highest record reported is only 1200 particles per minute [63]. Furthermore, in the product quality classification stage, the lack of adequate labeled samples is another barrier for classifier learning, because labeling the rice quality is fairly expensive. Figure 4 displays the schematic of a visual inspection system for rice quality monitoring.
During the rice processing, rice grains are evenly distributed on the conveyor belts. The OPQI system examines the cereal grains on the conveyor belts to identify the rice quality based on rice image features. When the cereal quality is found inferior, the actuator (blast nozzle) is automatically controlled to blast air. Then, the low-quality product is blown away from the conveyor belt for cereal food reprocessing or recovery to classify the cereal grains of different qualities and to yield high-quality rice product for consumers.
During the rice processing, rice grains are evenly distributed on the conveyor belts. The OPQI system examines the cereal grains on the conveyor belts to identify the rice quality based on rice image features. When the cereal quality is found inferior, the actuator (blast nozzle) is automatically controlled to blast air. Then, the low-quality product is blown away from the conveyor belt for cereal food reprocessing or recovery to classify the cereal grains of different qualities and to yield high-quality rice product for consumers.

Configuration
In the rice processingassembly line, to process the ricein parallel in the rice milling process there are multiple conveyors, among which each the conveyor belt is approximately 95mm wide. The capacity of each conveyor belt reaches 45 kg/h. The visual sensors in the OPQI system in the experiment are equipped above the conveyor belt perpendicular to the belt surface for rice quality monitoring. In the experiments, an IL-P3-2048 Linear CCD is mounted, whose pixel dimensions are14μm × 14μm, and the active pixel per line is 2048 with a 20MHz data rate per tap. The F24mm/2.8 fixed-focus lens (Computar, CBC Group, Tokyo, Japan) is used, and 8-bit gray scale image frame with the pixel resolution of 2048 × 128 is captured for rice quality inspection.
According to the actual demand of the factory, we only consider a two-value classification problem. The label of product quality is defined as follows:

Configuration
In the rice processingassembly line, to process the ricein parallel in the rice milling process there are multiple conveyors, among which each the conveyor belt is approximately 95mm wide. The capacity of each conveyor belt reaches 45 kg/h. The visual sensors in the OPQI system in the experiment are equipped above the conveyor belt perpendicular to the belt surface for rice quality monitoring.
In the experiments, an IL-P3-2048 Linear CCD is mounted, whose pixel dimensions are14 µmˆ14 µm, and the active pixel per line is 2048 with a 20MHz data rate per tap. The F24mm/2.8 fixed-focus lens (Computar, CBC Group, Tokyo, Japan) is used, and 8-bit gray scale image frame with the pixel resolution of 2048ˆ128 is captured for rice quality inspection.
According to the actual demand of the factory, we only consider a two-value classification problem. The label of product quality is defined as follows: 1 "high quality" rice "other quality" rice (24) A total number of 6250 samples, including rice images with corresponding rice quality labels calibrated manually, from five different rice varieties (RVs) are collected for experimental verification. The rice varieties with their corresponding number of samples in the experiment are displayed in Table 1. Table 1. Rice varieties with the corresponding sample numbers for experimental verification.

Rice Variety Number of Samples
Chinese "see-mew" rice(CSMR) 1 "high quality" rice 1 "other quality" rice A total number of 6250 samples, including rice images with corresponding rice quality labels calibrated manually, from five different rice varieties (RVs) are collected for experimental verification. The rice varieties with their corresponding number of samples in the experiment are displayed in Table 1.

Rice Variety Number of Samples
Chinese "see-mew" rice(CSMR) 1295 Ningxia Pearl rice(NPR) 1206 Jinyou rice(JR) 1198 Round-grain glutinous rice(RGGR) 1246 Wuchang paddy aroma rice (WPAR) 1305 In the classification experiments, given samples of rice variety i are randomly selected as the labeled samples for training samples and the remainder are used as the unlabeled samples both for classifier learning and classification performance evaluation, the classification error (CE) of rice variety i can be calculated with the following formula: 1 "high quality" rice 1 "other quality" rice A total number of 6250 samples, including rice images with corresponding rice quality labels calibrated manually, from five different rice varieties (RVs) are collected for experimental verification. The rice varieties with their corresponding number of samples in the experiment are displayed in Table 1.

Rice Variety Number of Samples
Chinese "see-mew" rice(CSMR) 1295 Ningxia Pearl rice(NPR) 1206 Jinyou rice(JR) 1198 Round-grain glutinous rice(RGGR) 1246 Wuchang paddy aroma rice (WPAR) 1305 In the classification experiments, given samples of rice variety i are randomly selected as the labeled samples for training samples and the remainder are used as the unlabeled samples both for classifier learning and classification performance evaluation, the classification error (CE) of rice variety i can be calculated with the following formula: 1 "high quality" rice 1 "other quality" rice A total number of 6250 samples, including rice images with corresponding rice quality labels calibrated manually, from five different rice varieties (RVs) are collected for experimental verification. The rice varieties with their corresponding number of samples in the experiment are displayed in Table 1.

Rice Variety Number of Samples
Chinese "see-mew" rice(CSMR) 1295 Ningxia Pearl rice(NPR) 1206 Jinyou rice(JR) 1198 Round-grain glutinous rice(RGGR) 1246 Wuchang paddy aroma rice (WPAR) 1305 In the classification experiments, given samples of rice variety i are randomly selected as the labeled samples for training samples and the remainder are used as the unlabeled samples both for classifier learning and classification performance evaluation, the classification error (CE) of rice variety i can be calculated with the following formula: Round-grain glutinous rice(RGGR) 1 "high quality" rice 1 "other quality" rice A total number of 6250 samples, including rice images with corresponding rice quality labels calibrated manually, from five different rice varieties (RVs) are collected for experimental verification. The rice varieties with their corresponding number of samples in the experiment are displayed in Table 1.

Rice Variety Number of Samples
Chinese "see-mew" rice(CSMR) 1295 Ningxia Pearl rice(NPR) 1206 Jinyou rice(JR) 1198 Round-grain glutinous rice(RGGR) 1246 Wuchang paddy aroma rice (WPAR) 1305 In the classification experiments, given samples of rice variety i are randomly selected as the labeled samples for training samples and the remainder are used as the unlabeled samples both for classifier learning and classification performance evaluation, the classification error (CE) of rice variety i can be calculated with the following formula: 1 "high quality" rice 1 "other quality" rice A total number of 6250 samples, including rice images with corresponding rice quality labels calibrated manually, from five different rice varieties (RVs) are collected for experimental verification. The rice varieties with their corresponding number of samples in the experiment are displayed in Table 1.

Rice Variety Number of Samples
Chinese "see-mew" rice(CSMR) 1295 Ningxia Pearl rice(NPR) 1206 Jinyou rice(JR) 1198 Round-grain glutinous rice(RGGR) 1246 Wuchang paddy aroma rice (WPAR) 1305 In the classification experiments, given samples of rice variety i are randomly selected as the labeled samples for training samples and the remainder are used as the unlabeled samples both for classifier learning and classification performance evaluation, the classification error (CE) of rice variety i can be calculated with the following formula: In the classification experiments, given NT T i samples of rice variety i are randomly selected as the labeled samples for training samples and the remainder are used as the unlabeled samples both for classifier learning and classification performance evaluation, the classification error (CE) of rice variety i can be calculated with the following formula: whereŷ t i and y t i are the estimated and true label of the test sample t of rice variety i. Thus, the average CE (ACE) can be evaluated with respect to the RVs (ACE RV ), which does not consider the different test numbers of different RVs: And it can be also calculated with respect to the total test samples (ACE TS ), which does not take into account of the RVs of the samples: To increase the statistical robustness of the classification results, 100 independent Monte Carlo experiments are conducted and the experimental mean value and standard deviation are recorded. The average CE,ACE RV and ACE TS with the standard deviation of the repeated experiments is recorded to evaluate the rice quality classification performance.
Regarding the image feature extraction, the optimal steerable edge and ridge GDF templates proposed by Jacob [12] is adopted to capture the omnidirectional ISS of GPI by OGDF based on the Formula (8). Six steerable GDF templates adopted in the experiment are as follows: T 3 "´1.0655G y´0 .2σ 2 G xxy´0 .042σ 2 G yyy (30) Among these filter templates, T 1 , T 2 and T 3 are the optimal edge detectors with different derivative orders of mixed GDF based on the optimization of a Canny-like criterion, T 4 and T 5 are the best ridge detectors of different derivative order, and T 6 is a Wedge detector with the wedge angle ϕ. Templates T 1~T6 with their corresponding rotated versions in [0~π] are displayed in Figure 5. The computing method of OGDF with these steerable templates is discussed in Appendix E.

 
Among these filter templates, T1, T2 and T3 are the optimal edge detectors with different derivative orders of mixed GDF based on the optimization of a Canny-like criterion, T4 and T5 are the best ridge detectors of different derivative order, and T6 is a Wedge detector with the wedge angle φ. Templates T1~T6 with their corresponding rotated versions in [0~π] are displayed in Figure 5. The computing method of OGDF with these steerable templates is discussed in Appendix E. A total of 60 directions is uniformly sampled in [0~π] to obtain omnidirectional ISS of GPI. Five Gaussian scales, [σ1,σ2,…,σ5] = [0.5, √2/2,1,√2, 2], are selected for the sake of extracting the multiscale ISSs under various Gaussian observation scales. Hence, we can achieve a 5 × 60 × 3 × (N + 1)-dimensional WD-MP feature vector from the 5 × 60 filtering responses with regard to each steerable filter template as denoted by Equation (13), where N denotes the number of the nonoverlapingsubimages considered in the original image. If N = 0, it means we do not partition the original image into subimages for local ISS feature extraction and only the global ISS information is considered.

Classification Result and Performance Comparison Experiment I: Validation Experiments
Both single steerable filter template experiments and hybrid filter template experiments are performed in this study to evaluate the rice quality classification accuracy. Table 2 displays the rice quality classification results measured in CE (%) with single different filter templates (T1~T6) of the 100 independent Monte Carlo repetition experiments.
In the experiments, the number of the sub-images for local ISS feature analysis is four, namely, one image is partitioned to four local subimages for feature extraction and the extended WD-MP features are used for rice quality classifier. In terms of the classifier learning, the label rate of the samples is 45% for classifier training and the reminding 55% samples as the unlabeled samples both used for training and classification test in each repetition experiment. In the COSC-Boosting learning, the number ' M of unlabeled samples selected with replacement for classifier learning is a constant 100. 2,2s, are selected for the sake of extracting the multiscale ISSs under various Gaussian observation scales. Hence, we can achieve a 5ˆ60ˆ3ˆ(N + 1)-dimensional WD-MP feature vector from the 5ˆ60 filtering responses with regard to each steerable filter template as denoted by Equation (13), where N denotes the number of the nonoverlapingsubimages considered in the original image. If N = 0, it means we do not partition the original image into subimages for local ISS feature extraction and only the global ISS information is considered.

Classification Result and Performance Comparison Experiment I: Validation Experiments
Both single steerable filter template experiments and hybrid filter template experiments are performed in this study to evaluate the rice quality classification accuracy. Table 2 displays the rice quality classification results measured in CE (%) with single different filter templates (T 1~T6 ) of the 100 independent Monte Carlo repetition experiments.
In the experiments, the number of the sub-images for local ISS feature analysis is four, namely, one image is partitioned to four local subimages for feature extraction and the extended WD-MP features are used for rice quality classifier. In terms of the classifier learning, the label rate of the samples is 45% for classifier training and the reminding 55% samples as the unlabeled samples both used for training and classification test in each repetition experiment. In the COSC-Boosting learning, the number M1 of unlabeled samples selected with replacement for classifier learning is a constant 100.
In Table 2, ACE RV,mean and ACE TS,mean record the average values of ACE RV and ACE TS of the 100 independent repetition experiments, respectively; whereas ACE RV,std and ACE TS,std are the mean standard deviation values of ACE RV and ACE TS , respectively. The column titles "mean" and "std" record the mean and standard deviation values of CE for each rice variety.
As can be seen from Table 2, when only one kind of GDF template is chosen independently for ISS feature extraction, the best classification result can be achievedwith the filter template T 5 , whose average classification accuracy (ACA) on the five RVs ((1´ACE RV )ˆ100%) can reach 94.53% and ACA on all of the test samples ((1´ACE TS )ˆ100%) is 93.08%. That the rice image is composed of a large number of locally homogeneous grains of random distribution and the ISS of rice image is determined largely on the edges and ridges of the grains in the rice image, resulting from the randomly distributed grains of locally homogeneous. In accordance with the properties of the templates, T 5 is a kind of ridge detector, it can effectively extract the ridge structures as well as the edge curves of the rice grains, thus the most important ISS of the rice image can be attained. Hence, the extracted WD-MP features with GDF template T 5 , attaining the more essential ISS characteristics of the rice image, achieve relatively better classification results. The next best classification result can be achieved with the GDF template T 1 , whose ACA on the six RVs reaches 93.28%. The worst classification accuracy comes from the GDF template T 3 , whose ACA on the six RVs is slightly below 90%. However, the classification accuracies on any of the five RVs basically satisfy the practical requirements. The ACAs of the six templates on the five RVs and on all the test samples are 91.95% and 90.52%, respectively.
In addition, the proposed method in the single template experiment is also quite stable, which can be apparently indicated by the low standard deviation values of CE of the independent Monte Carlo experiments on different RVs. Hence, the proposed WD-MP features on the omnidirectional ISS based on the optimal steerable edge or ridge detection templates T 1´T6 is reasonable and basically satisfying in the rice quality inspection.
The focuses of the filter templates T 1´T6 are different, e.g., T 1´T3 are dedicated to detect the edge information in images, and T 4 , T 5 are ridge detectors, and T 6 is the wedge detector, thus the combination of the filter templates can obtain more elaborate ISS characterization of GPI and consequently it can achieve better classification performance for the OPQI on the theory view. Hence, we made additional rice quality classification experiments based on the combination of multiple filter templates. The rules are the combination of an edge template from different type of detectors as well as a fully integrated template (T 1 + T 2 + T 3 + T 4 + T 5 + T 6 ) experiment. The classification results are displayed in Table 3. The table entries by the fully integrated template (FIT) are boldfaced and underlined. Apart from the results by FIT, the best results on each of the five RVs by the other combination of GDF templates are underlined and the worst results on each RV are in boldface. Comparing the classification results of single steerable filter template experiments in Table 2 with hybrid filter template experiments in Table 3, the classification accuracies improved apparently in the hybrid filter template experiments (much lower CE). Specifically, the FIT(T 1 + T 2 + T 3 + T 4 Sensors 2016, 16, 998 23 of 34 + T 5 + T 6 ) achieve the best classification accuracy, the average CE on all of the six RVs reaches as low as 1.77% and the average CE on the all of the test samples is 3.22%, in other words, the average classification accuracy can reach as high as 98.23% on all of the six RVs, which is generally a quite high classification accuracy.
The next optimal combination is "T 1 + T 5 ", which can also achieve a very low CE or very high classification accuracy. The relatively poor results come from the combination of T 2 + T 4 , however, the ACA on the entire six RVs still over 93%.
In view of the standard deviation values of CE to the 100 independent repeated experiments, the statistical result from the fully integrated filter templates is about 1.6%, whereas the statistics from the combination of T 2 + T 2 is slightly below 1%. And even the "largest" standard deviation value of CE among the different combination experiments is less than 3%.

Experiment II: Parameter Selection on Classifier Performance
In this group of experiments, we are mainly concerned with the classifier performance resulting from the different parameters setting in the COSC-Boosting algorithm. In contrast to the validation experiment I, the label rate of the samples for classifier learning is changed from 10% to 60%, and the number of unlabeled samples M'is no long a constant, whereas the GPI feature is the proposed WD-MP feature based on the fully integrated template (FIT) as the classifier model input. Analogously, 100 independent Monte Carlo repetition experiments are conducted for robust comparison. The improvements (average improvement with the standard deviation value on the 100 independent repetition experiments) on the CE are displayed in Table 4, where the improvement is evaluated by subtracting the final CE (based on the proposed COSC-Boosting algorithm) from the initial CE, which does not perform the semi-supervised learning procedure and just performs by consulting the results from the two separated TPSC and MARSC (ph TPRSC pxq`h MARSC pxqq {2q. The results from Table 4 can be seen clearly that the proposed COSC-Boosting algorithm performs significantly better than the simple ensemble of the two initial classifiers. In other words, that the proposed semi-supervised learning classifier can effectively exploit the underlying information of the unlabeled samples and hence it eventually greatly improves the performance of product quality grading.

Experiment III: Comparisons
In order to compare the performance of the proposed method with related methods, we perform rice quality classification experiments with different GPI feature extraction methods and different pattern classification methods. In the GPI feature extraction, we selected some well-known related feature extraction methods for rice quality classification test, namely, the gray level co-occurrence matrix (GLCM) [66] and the gray level run length matrix (GLRM) [66], Wavelet transform analysis (WTA) method [67], and Gabor transform (GT) method [68].
Detailed GLCM/GLRM feature extractionmethod are as follows:(a) The image intensity is quantized to 8, 32 and 64 brightness levels; (b) At each quantization scale, calculate the GLCM/GLRM matrix;(c) Based on each GLCM/GLRM matrix, we extract in a total of 14 statistics [66], e.g., energy, inertia moment, partial correlation, entropy, fineness, to constitute the spatial structural feature vector of the rice images.
WTA feature are extracted as follows: (a) Image colour space is transformed into HIS and CIE L*a*b*color spaces; (b) In each independent colour space, we conduct multi-scale image decomposition using Db4 wavelet for rice image analysis, until the coarsest scale of the image size is not less than 8ˆ8; (c) In each decomposition scale, we calculate energy, colour covariance, etc. In total of 15 parameters [69] computed from the detail Wavelet decomposition coefficients.
GT feature is extracted as follows: (a) Gabor wavelets are defined by: where u and v define the orientation and scale of Gabor kernels, Z " px, yq T , denotes the norm operator, and the wave vector k u,v " k v e iφ u , where k v = k max /f v and ϕ u = πu/8, k max is the maximum frequency and f is the spacing factor between kernels in the frequency domain. (b) Five scales and eight orientations of the defined Gabor kernels are considered in the rice image analysis, namely u = 8, and v = 5, the other parameters in the Gabor kernel are defined as σ = 2π, k max = π/2 and f " ? 2. (c) Statistics, mean value and standard deviation value of each magnitude response of Gabor filtering are extracted to establish a 40ˆ2 dimensional feature vector as the rice image feature.
The classifier selection is another influencing factor besides the GPI feature extraction to OPQI. As addressed before, any supervised classifier can be used in this task. Two commonly-used supervised learning classifier, least squares-support vector machine (LS-SVM) and learning vector quantization-neural network(LVQ-NN) [70] classifier is used for the rice-quality grading experiment. The node number of hidden layer of LVQ-NN is determined by the best classification performance through cross-validation. The procedure of tenfold cross-validation repeated 10 times is used in the comparative experiment. By the repetitive experiments with different training samples, we find that 25 hidden layer nodes could obtain the best recognition results.
In order to achieve robust classification results, 100 independent repetition experiments are also carried out.In each repetition experiment, the training and test samples used for five different kinds of rice are identical (in this experiment, 45% of the samples are used for classifier training and the remainder is used for testing). The classification results (average improvement with the standard deviation value on the 100 independent repetition experiments) achieved by the combination of different image features with different classifiers are displayed in Table 5. The proposed WD-MP feature is extracted based on the FIT. The best classification performance (the minimum CE mean value) on each rice variety is underlined in the Table 5. As can be seen from Table 5, regardless of the choice of different classifier, the best classification performance (minimum CE) comes from the proposed WD-MP feature on almost every rice variety. The only exception of the rice variety of RGGR, where the best classification performance is achieved by the combination of the GLCM and GLRM feature with the LS-SVM, however, it is only a little better than that of the combination of the WD-MP feature with LS-SVM. The performances based on the commonly-used GPI features, GLCM, GLRM, WTA, GT, regardless of the classifier, are actually basically comparable, in other words, no single feature extraction method performed better than the others in the GP quality classification. The average classification accuracy by the commonly-used image feature with commonly-used classifiers on every variety of rice is basically lower than 90%, whereas, the average classification accuracy will be slightly higher than 91% if we change the commonly-used image feature extraction methods to the proposed WD-MP feature. However, the classification performance based on the combination of the proposed WD-MP feature with the commonly supervised learning classifier is apparently inferior to that of the combination of WD-MP feature with COSC-Boosting classifier, which effectively exploits the unlabeled samples and the classification accuracies with the same GPI feature can reach higher than 98% on four rice varieties and the classification accuracy on the reminder rice variety is only a litter lower than 98%.
Combining the results from the validation experiment and the comparative experiment, we can draw the following conclusions: (1) With the combination of the edge, ridge and wedge detector of GDF template, the ISS of GPI can be effectively characterized. (2) Because the ISS of GPI is proved to conform to the WD model, the proposed WD-MP features are a generally elaborate descriptor of ISS of grain images, which are closely related to the perceptual significance of HVP. (3) According to the presented design procedure of OGDF, the omnidirectional ISS of grain image can be effectively computed with low computation cost by the formula, which facilitates the practical applications of the proposed image statistical modeling based OPQI of product quality. (4) The proposed COSC-Boosting algorithm is an effective semi-supervised learning algorithm based on the complementary classifiers, TPSRC and MARSC, which can effectively exploit the underlying information of the unlabeled samples and achieve much better classification performance.
In summary, OPQI based on the proposed WD-MP features integrated with the semi-supervised COSC-Boosting classifier can achieve pretty high classification accuracies with sufficient robustness, which can effectively meet the imperative demand of industrial product quality inspection.

Conclusions
A kind of image statistical modeling integrated with a semi-supervised learning method for GP quality grading is presented to facilitate the practical applications of OPQI systems. We focus on delineating the ISS features of GPIs, comprising of stochastically stacking fragments (particles) of local homogeneity, without distinctive foregrounds and backgrounds, which brings great challenges in the intelligent identification of the product qualities, e.g., rice images, fabric images.
The WD processes of ISS of these images are explained by introducing the theory of sequential fragmentation. The OGDFs are established with low computation complexity to attain the ISS of complex grain images. The WD-MPs of ISS are extracted as the visual features for product quality identification, which are demonstrated to be closely related to the human visual perceptual properties with great perceptual significance. In the face of the scarcity of labeled samples, a co-training-style semi-supervised classifier algorithm, named COSC-Boosting, is exploited for semi-supervised GP quality recognition, by integrating two independent classifiers, TPSRC and MARSC, with complementary nature.
The proposed GP quality grading method integrated WD-MP features with COSC-Boosting classifier is tested in the field of rice quality grading for rice processing monitoring onan industrial scale assembly line. The experimental results indicate that the proposed WD-MP features can effectively characterize the statistical distribution profiles of ISS of these intricate texture images with a large number of stochastically accumulative fragmentations. The proposed method provides an effective tool for grain image modeling and analysis and consequently lays a foundation for the intelligent perception of the product-quality on assembly lines. small details are occurring more often in an image than large structures [53,54]. It can be described by the following formula: where x 1 Ñ x represents the process of decomposing a coarse fragment structure x' into fine fragment structures x; µ means the average mass of the particles in the mill of the ore grinding processes, which in this study can be represented by the average illuminant contrast of the fragments(particles) in the image; β is a scale parameter, related to the "width" of the contrast of the local fragments; λ is a shape parameter, controlling the distribution shape and satisfying λ ě 0, which is related to the particle size.
In accordance with the real phenomenon that small details are occurring more often, the parameter λ increases, as the particle size becomes smaller.
Since the visual image is composed of diverse local fragments, the statistical distribution of the contrast of the fragment structures is the result of the integral over the various power-laws patches caused by each coarse particle [35,51]: where N T " ş 8 0 n pmq dm is a normalized parameter. The resolving power of the visual sensor in real applications cannot be infinite, the fragmentation process of the local particles will inevitably cease. After the particle details tend to be stable, and the special visual appearance of ISS exhibits. Hence, the statistical distributions of ISS just correspond to the fragmentswith local contrast larger than x. Therefore, the statistical distribution model of ISSI can be described by the WD model of integral form. Its probability density model is given by: f px; µ, λ, βq " N pą xq " 1 λ βΓ´1 λ¯ı is a normalized constant, only related to the model parameter λ and β and Γ(x) is the gamma function.

Appendix B. Parameter Estimation of WD Model
Given X = {x 1 ,x 2 , . . . ,x n } is the sampling data from an integral-form WD variable. WD Model parameters µ, λ and β can be estimated by the maximum likelihood estimation (MLE) method. The corresponding log-likelihood function lnL`X;μ,λ,β˘indicates how well the model describes the sampling data: lnL`X;μ,λ,β˘" ln whereμ,λ,β represent the estimation values of WD Model parameters µ, λ and β . The expansion of lnL`X;μ,λ,β˘is: lnL`X;μ,λ,β˘" The model parameters can be estimated by setting the partial derivative of lnL`X;μ,λ,β˘with respect toμ,λ andβ be equal to zero, respectively. Thus we can obtain:

B Bβ
lnL`X;μ,λ,β˘"´n β`1 β where Φ(‚) is the digamma function, Φ pxq " d dx lnΓ pxq " d dx Γ pxq {Γ pxq. Since we cannot obtain the closed-form solution from the Equation set (B3)-(B5), we use an iterative procedure for numerically searching the maximum value of the log-likelihood function lnL`X;μ,λ,β˘to get the numerical solution of the model parametersμ,λ andβ. The kernel principle of the iterative procedure is derived from the Nelder-Mead simplex algorithm [55], which only uses the function values without any derivative information. This maximum value-searching method is identical to the minimum value search problem by multiplying -1 by the log-likelihood function lnL`X;μ,λ,β˘. For convenience, the negative log-likelihood function is denoted as T`X;θ˘"´lnL`X;μ,λ,β˘, whereθ represents the parameter vector. The initialization parameter vector is denoted asθ 0 ,θ 0 " "μ 0 ,λ 0 ,β 0 ‰ . The procedure for the statistical model parameter estimation is as follows: (1) Initiation: (a) A simplex of four points is generated for the 3D parameter vector θ = [θ 0 ;θ 1 ;θ 2 ;θ 3 ], where θ i (i > 0) is assigned as the initial value of θ 0 by adding a% of each component θ 0 (i), (b) Four scalar parameters ρ, χ, γ and δ and the terminal threshold tol X are initiated. (c) The subscripts of the simplex points are reordered from the lowest function value to highest function value with i, i = 1, 2, 3, 4, with the new subscript of θ, T pX; θ 1 q ď T pX; θ 2 q ď T pX; θ 3 q ď T pX; θ 4 q .
(2) Iterative process: Four-point convergence is repeated, e.g., max θ i´θj i,j"1,2,3,4 ď tolX ( , then obtain the estimation θ " θ i . The repeated procedure is as follows: (a) θ r " p1`γq θ´γθ 4 , where θ " mean pθq If T pX; θ r q ă T pX; θ 1 q ; (b) θ e " p1`ρχq θ´ρχθ n`1 If T pX; θ e q ă T pX; θ r q ; θ 4 " θ e % accept θ e else θ 4 " θ r ; % accept θ r end (c) elseif T pX|θ r q ă T pX|θ 3 q θ 3 " θ r ; u% accept θ r elseif T pX|θ r q ă T pX|θ 4 q θ c " p1`γρq θ´γρθ n`1 If T pX; θ c q ă T pX; θ r q ; θ 4 " θ c else goto shrink; end else θ cc " p1´γq θ´γθ 4 If T pX; θ cc q ă T pX; θ 4 q ; θ 4 " θ cc else gotoshrink; end end end shrink: for j = 2:4θ j " θ 1``θj´θ1˘e nd end (d) The subscripts of the simplex points are reordered from lowest function value to the highest function value rT pX|θq , js " sort pT pX|θqq , θ new " θ pjq The convergence properties of this procedure are presented in the study [55]. In the original report, the four scalar parameters can be set asρ = 1, χ = 2, γ = 0.5, and δ = 0.5. In practical applications, this procedure converges rapidly near the minimum position of the negative log-likelihood function. Hence, for fast searching, the initial model parameter θ 0 is set as the empirical value close to the extremum, namely,μ 0 " median pXq,λ 0 " 2 As stated in [58,59], the design and steer of the steerable filter can be better explained in the Fourier domain. Hence, we transform the rotated filter G κ,σ pXR θ q into the Fourier domain to facilitate computing, and we can then obtain [59]: where p f pxqq means the Fourier transform of the function f pxq ; C n m is the binomial coefficient, C n m " m! n!pm´nq! ;Ĝ σ`ωx , ω y˘i s the Fourier transform of the Gaussian function G σ px, yq . According to the convolution theorem, we can perform Fourier transform on both sides of Equation (8), and then we can obtain: tI pXq˚G κ,σ pXR θ qu "Î`ω x , ω y˘ tG κ,σ pXR θ qu "Î`ω x , ω y˘k We then perform inverse Fourier transformation on the formula (46) and we achieve the time domain expression of pXq˚G κ,σ pXR θ q: I pXq˚G κ,σ pXR θ q " ´1 t tI pXq˚G κ,σ pXR θ quu Consequently, we can obtain α m,j as the following expression [59]: