# Valid Probabilistic Predictions for Ginseng with Venn Machines Using Electronic Nose

## Abstract

## 1. Introduction

## 2. Probabilistic Prediction Methods

#### 2.1. Venn Machine

#### 2.2. Naive Bayes Classification

#### 2.3. Softmax Regression

#### 2.4. Platt’s Method

#### 2.5. The Validity of Probabilistic Predictions

#### 2.5.1. Loss Function

#### 2.5.2. Cumulative Probability Values versus Cumulative Correct Predictions

## 3. Materials and Methods

#### 3.1. Sample Preparation

#### 3.2. E-Nose Equipment and Measurement

#### 3.3. Data Processing

- **1.** The maximal absolute response, ${\mathrm{V}}_{\mathrm{max}}=\mathrm{max}(\left|\mathrm{V}\right|)$, which is the most efficient and widely used steady-state feature.
- **2.** The area under the full response curve, ${\mathrm{V}}_{\mathrm{int}}={\displaystyle \int}_{0}^{\mathrm{T}}\mathrm{V}(\mathrm{t})\,\mathrm{dt}$, where T (= 340 s) is the total measurement time. This is also a widely used steady-state feature.
- **3–8.** The exponential moving average of the derivative [18,19] of V, ${\mathrm{E}}_{\mathrm{a}}(\mathrm{V})=[\mathrm{min}(\mathrm{y}(\mathrm{k})),\ \mathrm{max}(\mathrm{y}(\mathrm{k}))]$, where the discretely sampled exponential moving average is $\mathrm{y}(\mathrm{k})=(1-\mathrm{a})\mathrm{y}(\mathrm{k}-1)+\mathrm{a}(\mathrm{V}(\mathrm{k})-\mathrm{V}(\mathrm{k}-1))$ with initial value $\mathrm{y}(1)=\mathrm{aV}(1)$ and smoothing factors $\mathrm{a}=1/(100\times \mathrm{SR}),\ 1/(10\times \mathrm{SR}),\ 1/\mathrm{SR}$. SR is the sampling rate, SR = 10 Hz. Two features were extracted for each smoothing factor, giving a total of six transient features.

Besides steady-state features, transient features were considered to contain much useful information that should be fully exploited [20].

Finally, 16 × 8 = 128 features are extracted from each sample, and all features are scaled to [0, 1]:

$${\mathrm{x}}_{\mathrm{i},\mathrm{j}}^{\prime}=\frac{{\mathrm{x}}_{\mathrm{i},\mathrm{j}}-\underset{\mathrm{j}}{\mathrm{min}}({\mathrm{x}}_{\mathrm{i},\mathrm{j}})}{\underset{\mathrm{j}}{\mathrm{max}}({\mathrm{x}}_{\mathrm{i},\mathrm{j}})-\underset{\mathrm{j}}{\mathrm{min}}({\mathrm{x}}_{\mathrm{i},\mathrm{j}})}$$
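The eight features per response curve and the [0, 1] scaling described above can be sketched in plain Python. This is a minimal illustration, not the authors' MATLAB implementation. The function names are hypothetical, the integral is approximated with the trapezoidal rule (an assumption), the index j in the scaling formula is assumed to run over samples, and a constant feature is mapped to 0 (also an assumption).

```python
def ema_derivative_min_max(v, a):
    """Min and max of y(k) = (1 - a) * y(k-1) + a * (V(k) - V(k-1)), with y(1) = a * V(1)."""
    y = a * v[0]
    y_min = y_max = y
    for k in range(1, len(v)):
        y = (1.0 - a) * y + a * (v[k] - v[k - 1])
        y_min = min(y_min, y)
        y_max = max(y_max, y)
    return y_min, y_max


def extract_features(v, sr=10.0):
    """Return the 8 features of one response curve v sampled at sr Hz."""
    dt = 1.0 / sr
    v_max = max(abs(x) for x in v)
    # Assumption: trapezoidal rule for the integral of V(t) over the measurement.
    v_int = dt * sum((v[k] + v[k + 1]) / 2.0 for k in range(len(v) - 1))
    features = [v_max, v_int]
    # The three smoothing factors named in the text, two features each.
    for a in (1.0 / (100.0 * sr), 1.0 / (10.0 * sr), 1.0 / sr):
        features.extend(ema_derivative_min_max(v, a))
    return features


def min_max_scale(values):
    """Scale the values of one feature (assumed: across samples) to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(x - lo) / (hi - lo) for x in values]


# Usage on a short, made-up response curve (not data from the paper).
curve = [0.0, 1.0, 3.0, 2.0]
features = extract_features(curve, sr=10.0)
```

Applying `extract_features` to every sensor's curve and concatenating the results would give the per-sample feature vector, after which `min_max_scale` would be applied feature by feature.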

^{−1}] with 5-fold cross-validation to obtain the optimum parameters for the model. The trained model was then applied to the testing examples. For Naïve Bayes, the distribution was set to Gaussian. All algorithms were implemented in MATLAB 2014a.

## 4. Results and Discussion

#### 4.1. Performance of Probabilistic Predictors in Offline Mode

#### 4.2. Performance of Venn Predictors in Online Mode

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

**Figure 2.** Validity of probabilistic predictions by (**a**) VM-SVM; (**b**) VM-SR; (**c**) VM-NB; (**d**) Platt’s method; (**e**) Softmax Regression; and (**f**) Naïve Bayes in offline mode.

**Figure 3.** Validity of probabilistic predictions by (**a**) VM-SVM (Venn machine based on SVM); (**b**) VM-SR (Venn machine based on Softmax Regression); (**c**) VM-NB (Venn machine based on Naïve Bayes) in online mode.

**Figure 4.** Change of precision of predicted probability intervals for samples from category (**1**–**9**) during online prediction process with VM-SVM.

| No. | Ginseng Species | Places of Production |
|---|---|---|
| 1 | Chinese red ginseng | Ji’an |
| 2 | Chinese red ginseng | Fusong |
| 3 | Korean red ginseng | Ji’an |
| 4 | Chinese white ginseng | Ji’an |
| 5 | Chinese white ginseng | Fusong |
| 6 | American ginseng | Fusong |
| 7 | American ginseng | USA |
| 8 | American ginseng | Canada |
| 9 | American ginseng | Tonghua |

**Table 2.** Classification rates and assessment criteria of validity ($d_{\mathrm{ln}}$, $d_{\mathrm{sq}}$, $d_{1}$) of probabilistic prediction results for ginseng samples by each method.

| Methods | Classification Rate | $d_{\mathrm{ln}}$ | $d_{\mathrm{sq}}$ | $d_{1}$ |
|---|---|---|---|---|
| VM-SVM | 86.35% | 0.3862 | 0.3419 | 0.0373 |
| Platt’s method | 88.57% | 0.3876 | 0.3439 | 0.1480 |
| VM-SR | 77.78% | 0.4690 | 0.3938 | 0.1085 |
| SR | 76.19% | Inf ^{a} | 0.4376 | 0.1853 |
| VM-NB | 60.32% | 0.5683 | 0.4475 | 0.0266 |
| NB | 40.32% | 0.5851 | 0.4510 | 0.0332 |

^{a} In the prediction results of SR, the predicted probability for a certain sample was 1 although the prediction was wrong, which led this criterion to be infinite.
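Footnote a can be illustrated with a small sketch. It assumes that the logarithmic criterion is built from the negative log of the probability assigned to the true class, and that the squared criterion is built from the squared difference between that probability and 1. The exact definitions belong to Section 2.5.1 and are not reproduced here, so the functions below are hypothetical stand-ins rather than the paper's formulas.

```python
import math


def mean_log_loss(true_class_probs):
    """Mean of -log(p), where p is the probability assigned to the true class."""
    total = 0.0
    for p in true_class_probs:
        if p == 0.0:
            # A confident wrong prediction (probability 1 on another class)
            # leaves p = 0 for the true class, so the loss is unbounded.
            return math.inf
        total += -math.log(p)
    return total / len(true_class_probs)


def mean_square_loss(true_class_probs):
    """Mean of (1 - p)^2; bounded by 1 even for confident wrong predictions."""
    return sum((1.0 - p) ** 2 for p in true_class_probs) / len(true_class_probs)


# Made-up probabilities: the second sample gets probability 0 for its true class.
probs = [0.9, 0.0]
log_loss_value = mean_log_loss(probs)
square_loss_value = mean_square_loss(probs)
```

Under these assumptions a single such sample makes the logarithmic criterion infinite while the squared criterion stays finite, which matches the pattern of the SR row.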

| Method/Category | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| VM-SVM | Sensitivity | 0.9714 | 0.8857 | 1 | 0.8571 | 0.8286 | 1 | 0.6857 | 0.8286 | 0.7143 |
| | Specificity | 0.9857 | 0.9893 | 0.9964 | 0.9821 | 0.9786 | 1 | 0.9536 | 0.9857 | 0.9750 |
| Platt’s method | Sensitivity | 0.9714 | 0.8857 | 1 | 0.8571 | 0.8857 | 1 | 0.7143 | 0.8571 | 0.7714 |
| | Specificity | 0.9857 | 0.9893 | 0.9964 | 0.9893 | 0.9821 | 1 | 0.9607 | 0.9857 | 0.9786 |
| VM-SR | Sensitivity | 0.8000 | 0.8286 | 0.9429 | 0.7429 | 0.5714 | 1 | 0.5143 | 0.8000 | 0.8000 |
| | Specificity | 0.9714 | 0.9714 | 1 | 0.9536 | 0.9643 | 0.9964 | 0.9571 | 0.9750 | 0.9607 |
| SR | Sensitivity | 0.7429 | 0.7429 | 0.9714 | 0.7143 | 0.6000 | 1 | 0.5429 | 0.8000 | 0.7429 |
| | Specificity | 0.9714 | 0.9714 | 1 | 0.9536 | 0.9643 | 0.9964 | 0.9571 | 0.9750 | 0.9607 |
| VM-NB | Sensitivity | 0.8000 | 0 | 0.9714 | 0.6571 | 0.4286 | 0.9429 | 0.4286 | 0.4857 | 0.7143 |
| | Specificity | 0.8464 | 0.9786 | 0.9929 | 0.9500 | 0.9321 | 1 | 0.9429 | 0.9714 | 0.9393 |
| NB | Sensitivity | 0 | 0.3143 | 0.9714 | 0.2857 | 0.286 | 0.6857 | 0.571 | 0.3714 | 0.1429 |
| | Specificity | 1 | 0.9607 | 0.9929 | 0.8750 | 0.8786 | 0.9107 | 0.8929 | 0.8679 | 0.9500 |
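Per-category sensitivity and specificity values such as those in the table above are commonly obtained from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below shows that standard calculation. The function name and the 3-class matrix are made up for illustration and are not taken from the paper, and the code assumes every category has at least one positive and one negative sample.

```python
def sensitivity_specificity(confusion, k):
    """One-vs-rest sensitivity and specificity for category k.

    confusion[i][j] is the number of samples of true category i
    that were predicted as category j.
    """
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                       # true k, predicted as another category
    fp = sum(confusion[i][k] for i in range(n)) - tp  # other categories predicted as k
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 3-category confusion matrix with 10 samples per category.
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
sens_0, spec_0 = sensitivity_specificity(confusion, 0)
```

A category that is never predicted, like category 2 for VM-NB or category 1 for NB, yields a sensitivity of 0 in this calculation.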

