A Novel Virtual Sample Generation Method to Overcome the Small Sample Size Problem in Computer Aided Medical Diagnosing
Abstract
1. Introduction
- Mega Trend Diffusion (MTD): MTD was proposed in [15] based on the information diffusion method, which uses fuzzy theories to fill in missing data [16,17]. The main difference between the two is that MTD employs a general diffusion function to diffuse a collection of data jointly across the whole dataset, whereas the information diffusion method diffuses each sample separately [17]. MTD combines mega diffusion with data trend estimation to control the symmetrical expansion issue and to increase learning accuracy in flexible manufacturing system scheduling. Both the diffusion-neural-network and MTD require membership function values as extra information, namely the possibility of appearance of each input attribute and the number of input attributes needed for artificial neural network training. This makes the calculation more complicated and time-consuming; moreover, the membership function values in themselves carry no managerial meaning [15,18]. Finally, the MTD technique has only been applied to simulated systems, so what it achieves in real systems remains unclear [19].
- Functional Virtual Population (FVP): The functional virtual population was developed in [20]. FVP is based on expanding the domain of the data (to the left, to the right, or on both sides) for small datasets [17]. The method adds virtual samples to assist training and to acquire scheduling knowledge in dynamic manufacturing systems. The strategy combines data decreasing, data increasing, and mixed data to form a functional virtual population, and the generated virtual samples improve the learning performance of neural networks [21]. FVP was the first method proposed for managing small datasets; it extends the domain of the attributes and produces virtual samples for constructing early scheduling knowledge. It is based on a trial-and-error procedure and requires many steps to complete [22]. The method has significant limitations when applied to systems that include nominal variables or high variance between stages [23].
- Multivariate Normal synthetic sample generation (MVN): MVN has two parameters, each carrying more than one piece of information: one sets the centre of the distribution, and the other determines the dispersion, i.e., how widely the distribution spreads around its centre [24]. Whereas MTD and FVP extend the dataset by enlarging the domains of the features, MVN synthetically generates new samples directly from a multivariate normal distribution fitted to the original data [19,25]. MVN exploits the multivariate covariance dependencies among the original samples and thereby preserves their inherent noise [19]. It relies on the covariance matrix, which summarizes the interactions between the different components of the data [26]; a minimal sketch of this idea follows this list.
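As a point of reference, the core of the MVN approach can be written in a few lines. The sketch below is our own Python/NumPy illustration (the function name `mvn_virtual_samples` is hypothetical; the code is not taken from [19,24,25]): it fits the two parameters described above, the mean vector and the covariance matrix, to the original small sample and draws synthetic samples from the resulting multivariate normal distribution.

```python
import numpy as np

def mvn_virtual_samples(X, n_virtual, rng=None):
    """Fit a multivariate normal to the small sample X (rows = samples,
    columns = features) and draw n_virtual synthetic samples from it."""
    rng = np.random.default_rng() if rng is None else rng
    mu = X.mean(axis=0)              # parameter 1: centre of the distribution
    sigma = np.cov(X, rowvar=False)  # parameter 2: covariance matrix (dispersion
                                     # and interaction between the components)
    return rng.multivariate_normal(mu, sigma, size=n_virtual)
```

Calling, e.g., `mvn_virtual_samples(X_class_A, 10_000)` would produce 10,000 synthetic rows whose component-wise interactions follow the covariance structure of the original class-A samples.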
2. Virtual Sample Generation Technique
3. The Proposed Method
3.1. Virtual Sample Generation Method
3.1.1. The Pre-Processing Method
Algorithm 1 The algorithm for virtual sample generation.
```
Input:  Small number of samples as classes A and B.
Output: Hundreds of thousands or millions of virtual samples (N per class).
1:  loop I = 1 : number_of_features              // number of columns in the matrix
2:    {Find the Max, Min, and Mean of feature I in classes A and B}
3:    {Initialize MaxA = the maximum value of feature I in matrix A}
4:    {Initialize MinA = the minimum value of feature I in matrix A}
5:    {Initialize MeanA = the mean of all values of feature I in matrix A}
6:    {Initialize MaxB = the maximum value of feature I in matrix B}
7:    {Initialize MinB = the minimum value of feature I in matrix B}
8:    {Initialize MeanB = the mean of all values of feature I in matrix B}
9:    if (MinA >= MaxB) or (MinB >= MaxA)        // Ideal cases: see Figure 2
10:     {if (MaxA > MaxB)                        // Expand the intervals before generating N virtual samples
11:        MaxA = MaxA + MeanA
12:      else
13:        MaxB = MaxB + MeanB
14:      if (MinA < MinB)                        // Expand the intervals
15:        MinA = MinA - MeanA
16:      else
17:        MinB = MinB - MeanB}
        // Generate N virtual samples with an approximately normal distribution as follows
18:     VGA = MinA + (MaxA - MinA) * sum(rand(N, constant), 2) / constant;
19:     VGB = MinB + (MaxB - MinB) * sum(rand(N, constant), 2) / constant;
20:   else                                       // Non-ideal cases: see Figure 3
21:     call Algorithm 2                         // The pre-processing
22: end loop
23: {New samples = small number of samples + N samples}
```
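Lines 18–19 of Algorithm 1 use MATLAB-style notation: `rand(N, constant)` draws an N × constant matrix of uniform values, `sum(..., 2)` adds up each row, and dividing by `constant` averages them. By the central limit theorem the average is approximately normal with mean 0.5 and variance 1/(12·constant), so each virtual value clusters around the midpoint of the (expanded) feature interval. A minimal Python transliteration follows; it is a sketch under our own naming (`generate_virtual_feature`, default `constant = 12`), not the authors' released code.

```python
import numpy as np

def generate_virtual_feature(f_min, f_max, n, constant=12, rng=None):
    """Sketch of lines 18-19 of Algorithm 1: the average of `constant`
    uniform draws is approximately normal (central limit theorem) with
    mean 0.5 and variance 1/(12*constant), so the virtual values cluster
    around the midpoint of the expanded interval [f_min, f_max]."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random((n, constant)).sum(axis=1) / constant
    return f_min + (f_max - f_min) * u
```

For one feature, `VGA = generate_virtual_feature(MinA, MaxA, N)` and `VGB = generate_virtual_feature(MinB, MaxB, N)` play the roles of lines 18 and 19.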
Algorithm 2 The algorithm for solving overlapping.
```
Input:  Feature with overlapping (A and B are two different classes).
Output: Feature without overlapping.
1:  {Find MaxA, MaxB, MinA, MinB, MeanA, and MeanB}
2:  {Initialize MlsA = the feature values of class A in descending order}
3:  {Initialize mlsA = the feature values of class A in ascending order}
4:  {Initialize MlsB = the feature values of class B in descending order}
5:  {Initialize mlsB = the feature values of class B in ascending order}
6:  {Initialize C = 1}                                   // counter
7:  loop until (MinA >= MaxB) or (MinB >= MaxA)          // Ideal case: Figure 2
8:    {C = C + 1
9:    if (MinA < MinB and MaxB < MaxA)                   // Case E: Figure 3
10:     if (MeanA - MinA <= MaxA - MeanA)
11:        MinA = mlsA[C]
12:     else
13:        MaxA = MlsA[C]
14:   if (MinA > MinB and MaxB > MinA and MaxB < MaxA)   // Case G: Figure 3
15:     if (MeanA - MinA <= MaxB - MeanB)
16:        MinA = mlsA[C]
17:     else
18:        MaxA = MlsA[C]
19:   if (MinB > MinA and MaxA > MinB and MaxA < MaxB)   // Case H: Figure 3
20:     if (MeanB - MinB <= MaxA - MeanA)
21:        MinB = mlsB[C]
22:     else
23:        MaxB = MlsB[C]
24:   if (MinB < MinA and MaxA < MaxB)                   // Case F: Figure 3
25:     if (MeanB - MinB <= MaxB - MeanB)
26:        MinB = mlsB[C]
27:     else
28:        MaxB = MlsB[C]}
29: end loop
30: {Back to Algorithm 1}
```
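For concreteness, the following is a minimal Python sketch of Algorithm 2 (our own transliteration; the name `remove_overlap` is ours). It mirrors the four published cases, treating any remaining overlap pattern as Case H, and assumes, as the pseudocode does, that the loop reaches the ideal case before the counter runs past the sorted value lists.

```python
import numpy as np

def remove_overlap(a, b):
    """Sketch of Algorithm 2: shrink one class interval, one sorted value
    at a time, until the two class intervals no longer overlap.
    a, b: 1-D arrays holding one overlapping feature of classes A and B."""
    asc_a, asc_b = np.sort(a), np.sort(b)        # mlsA, mlsB (ascending)
    desc_a, desc_b = asc_a[::-1], asc_b[::-1]    # MlsA, MlsB (descending)
    min_a, max_a, mean_a = a.min(), a.max(), a.mean()
    min_b, max_b, mean_b = b.min(), b.max(), b.mean()
    c = 0                                        # counter (0-indexed here)
    while not (min_a >= max_b or min_b >= max_a):    # loop until ideal case
        c += 1
        if min_a < min_b and max_b < max_a:          # Case E: B inside A
            if mean_a - min_a <= max_a - mean_a:
                min_a = asc_a[c]
            else:
                max_a = desc_a[c]
        elif min_b < min_a and max_a < max_b:        # Case F: A inside B
            if mean_b - min_b <= max_b - mean_b:
                min_b = asc_b[c]
            else:
                max_b = desc_b[c]
        elif min_a > min_b and min_a < max_b < max_a:    # Case G
            if mean_a - min_a <= max_b - mean_b:
                min_a = asc_a[c]
            else:
                max_a = desc_a[c]
        else:                                        # remaining overlap: Case H
            if mean_b - min_b <= max_a - mean_a:
                min_b = asc_b[c]
            else:
                max_b = desc_b[c]
    return (min_a, max_a), (min_b, max_b)
```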
3.1.2. Random Virtual Generation Technique
4. Experiment
4.1. Datasets
4.2. Classification Techniques
5. Experimental Results
6. The Numerical Example
7. Conclusions and Future Extensions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Deng, L.; Yu, D. Deep learning: Methods and applications. Found. Trends Signal Process. 2014, 7, 197–387.
- Charalambous, C.C.; Bharath, A.A. A data augmentation methodology for training machine/deep learning gait recognition algorithms. arXiv 2016, arXiv:1610.07570.
- Masood, A.; Al-Jumaily, A. Semi-advised learning model for skin cancer diagnosis based on histopathalogical images. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016; pp. 631–634.
- Li, D.C.; Wen, I.H. A genetic algorithm-based virtual sample generation technique to improve small data set learning. Neurocomputing 2014, 143, 222–230.
- Strauss, K.A.; Puffenberger, E.G.; Morton, D.H. Maple syrup urine disease. J. Pediatr. 2013, 132, 17S–23S.
- Fu, Q.; Cao, J.; Renner, J.B.; Jordan, J.M.; Caterson, B.; Duance, V.; Luo, M.; Kraus, V.B. Radiographic features of hand osteoarthritis in adult Kashin-Beck Disease (KBD): The Yongshou KBD study. Osteoarthr. Cartil. 2015, 23, 868–873.
- European Society of Radiology (ESR). Medical imaging in personalised medicine: A white paper of the research committee of the European Society of Radiology (ESR). Insights Imaging 2015, 6, 141–155.
- Colubri, A.; Silver, T.; Fradet, T.; Retzepi, K.; Fry, B.; Sabeti, P. Transforming clinical data into actionable prognosis models: Machine-learning framework and field-deployable app to predict outcome of Ebola patients. PLoS Negl. Trop. Dis. 2016, 10, e0004549.
- Vymetal, J.; Skacelova, M.; Smrzova, A.; Klicova, A.; Schubertova, M.; Horak, P.; Zadrazil, J. Emergency situations in rheumatology with a focus on systemic autoimmune diseases. Biomed. Pap. Med. Fac. Palacky Univ. Olomouc 2016, 160, 20–29.
- Ildstad, S.T.; Evans, C.H. Small Clinical Trials: Issues and Challenges; National Academy Press: Washington, DC, USA, 2001.
- Orru, G.; Pettersson-Yeo, W.; Marquand, A.F.; Sartori, G.; Mechelli, A. Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: A critical review. Neurosci. Biobehav. Rev. 2012, 36, 1140–1152.
- Wedyan, M.; Al-Jumaily, A. Early diagnosis autism based on upper limb motor coordination in high risk subjects for autism. In Proceedings of the 2016 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS), Tokyo, Japan, 17–20 December 2016; pp. 13–18.
- Wedyan, M.; Al-Jumaily, A. Upper limb motor coordination based early diagnosis in high risk subjects for Autism. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016; pp. 1–8.
- Li, D.C.; Wu, C.S.; Tsai, T.I.; Lin, Y.S. Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge. Comput. Oper. Res. 2007, 34, 966–982.
- Huang, C.; Moraga, C. A diffusion-neural-network for learning from small samples. Int. J. Approx. Reason. 2004, 35, 137–161.
- Khot, L.; Panigrahi, S.; Woznica, S. Neural-network-based classification of meat: Evaluation of techniques to overcome small dataset problems. Biol. Eng. Trans. 2008, 1, 127–143.
- Li, D.C.; Chen, C.C.; Chang, C.J.; Lin, W.K. A tree-based-trend-diffusion prediction procedure for small sample sets in the early stages of manufacturing systems. Expert Syst. Appl. 2012, 39, 1575–1581.
- Khot, L.R.; Panigrahi, S.; Doetkott, C.; Chang, Y.; Glower, J.; Amamcharla, J.; Logue, C.; Sherwood, J. Evaluation of technique to overcome small dataset problems during neural-network based contamination classification of packaged beef using integrated olfactory sensor system. LWT Food Sci. Technol. 2012, 45, 233–240.
- Li, D.C.; Chen, L.S.; Lin, Y.S. Using functional virtual population as assistance to learn scheduling knowledge in dynamic manufacturing environments. Int. J. Prod. Res. 2003, 41, 4011–4024.
- Li, D.C.; Yeh, C.W. A non-parametric learning algorithm for small manufacturing data sets. Expert Syst. Appl. 2008, 34, 391–398.
- Chao, G.Y.; Tsai, T.I.; Lu, T.J.; Hsu, H.C.; Bao, B.Y.; Wu, W.Y.; Lin, M.T.; Lu, T.L. A new approach to prediction of radiotherapy of bladder cancer cells in small dataset analysis. Expert Syst. Appl. 2011, 38, 7963–7969.
- Li, D.C.; Lin, Y.S. Using virtual sample generation to build up management knowledge in the early manufacturing stages. Eur. J. Oper. Res. 2006, 175, 413–434.
- Johnson, R.; Wichern, D. The multivariate normal distribution. In Applied Multivariate Statistical Analysis; Prentice-Hall: Englewood Cliffs, NJ, USA, 1982; pp. 150–173.
- Scott, P.D.; Wilkins, E. Evaluating data mining procedures: Techniques for generating artificial data sets. Inf. Softw. Technol. 1999, 41, 579–587.
- Khot, L.R. Characterization and Pattern Recognition of Selected Sensors for Food Safety Applications; North Dakota State University: Fargo, ND, USA, 2009.
- Li, D.C.; Liu, C.W.; Chen, W.C. A multi-model approach to determine early manufacturing parameters for small-data-set prediction. Int. J. Prod. Res. 2012, 50, 6679–6690.
- Niyogi, P.; Girosi, F.; Poggio, T. Incorporating prior information in machine learning by creating virtual examples. Proc. IEEE 1998, 86, 2196–2209.
- Li, D.C.; Fang, Y.H.; Lai, Y.Y.; Hu, S.C. Utilization of virtual samples to facilitate cancer identification for DNA microarray data in the early stages of an investigation. Inf. Sci. 2009, 179, 2740–2753.
- Dheeru, D.; Karra Taniskidou, E. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml/datasets.php (accessed on 25 November 2019).
- Liu, Y.; Zhou, Y.; Liu, X.; Dong, F.; Wang, C.; Wang, Z. Wasserstein GAN-based small-sample augmentation for new-generation artificial intelligence: A case study of cancer-staging data in biology. Engineering 2019, 5, 156–163.
- Martin, C.; Springate, C. Synthetic sample generation representing the English population using Spearman rank correlation and Chomsky decomposition. Value Health 2018, 21, S221.
- MathWorks. Normally Distributed Random Numbers. 2018. Available online: https://www.mathworks.com/help/matlab/ref/randn.html (accessed on 25 November 2018).
- Yang, J.; Yu, X.; Xie, Z.Q.; Zhang, J.P. A novel virtual sample generation method based on Gaussian distribution. Knowl. Based Syst. 2011, 24, 740–748.
- Crippa, A.; Salvatore, C.; Perego, P.; Forti, S.; Nobile, M.; Molteni, M.; Castiglioni, I. Use of machine learning to identify children with autism and their motor abnormalities. J. Autism Dev. Disord. 2015, 45, 2146–2156.
- UCI. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml/datasets.php (accessed on 25 November 2019).
- Lichman, M. UCI Machine Learning Repository. 2013. Available online: http://archive.ics.uci.edu/ml (accessed on 25 November 2019).
- MathWorks. Train Stacked Autoencoders for Image Classification. 2017. Available online: https://www.mathworks.com/help/deeplearning/examples/train-stacked-autoencoders-for-image-classification.html (accessed on 25 November 2019).
- Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. arXiv 2014, arXiv:1404.2188.
- Svozil, D.; Kvasnicka, V.; Pospichal, J. Introduction to multi-layer feed-forward neural networks. Chemom. Intell. Lab. Syst. 1997, 39, 43–62.
Datasets | Number of Samples | Number of Attributes | Number of Classes |
---|---|---|---|
IRCCSMedea | 30 | 17 | 2 |
E. coli | 336 | 8 | 8 |
Breast Tissue | 106 | 9 | 6 |
Number of Original Samples for Each Class | Number of Virtual Samples for Each Class | Total Number of Virtual Samples | Number of Samples for Training | Number of Samples for Testing | Deep Learning Accuracy without VSG | Linear SVM Accuracy without VSG | Accuracy with VSG and Deep Learning | Improvement |
---|---|---|---|---|---|---|---|---|
3 | 10,000 | 20,000 | 20,006 | 24 | 66.7% | 70.83% | 79.2% | 8.37% |
3 | 25,000 | 50,000 | 50,006 | 24 | 66.7% | 70.83% | 79.2% | 8.37% |
3 | 50,000 | 100,000 | 100,006 | 24 | 66.7% | 70.83% | 75% | 4.17% |
3 | 100,000 | 200,000 | 200,006 | 24 | 66.7% | 70.83% | 75% | 4.17% |
3 | 200,000 | 400,000 | 400,006 | 24 | 66.7% | 70.83% | 75% | 4.17% |
3 | 300,000 | 600,000 | 600,006 | 24 | 66.7% | 70.83% | 75% | 4.17% |
5 | 10,000 | 20,000 | 20,010 | 20 | 65.0% | 75% | 90% | 15% |
5 | 25,000 | 50,000 | 50,010 | 20 | 65.0% | 75% | 95% | 20% |
5 | 50,000 | 100,000 | 100,010 | 20 | 65.0% | 75% | 85% | 10% |
5 | 100,000 | 200,000 | 200,010 | 20 | 65.0% | 75% | 90% | 15% |
5 | 200,000 | 400,000 | 400,010 | 20 | 65.0% | 75% | 90% | 15% |
5 | 300,000 | 600,000 | 600,010 | 20 | 65.0% | 75% | 85% | 10% |
6 | 10,000 | 20,000 | 20,012 | 18 | 72.2% | 77.78% | 94.4% | 16.62% |
6 | 25,000 | 50,000 | 50,012 | 18 | 72.2% | 77.78% | 88.9% | 11.12% |
6 | 50,000 | 100,000 | 100,012 | 18 | 72.2% | 77.78% | 88.9% | 11.12% |
6 | 100,000 | 200,000 | 200,012 | 18 | 72.2% | 77.78% | 83.3% | 5.52% |
6 | 200,000 | 400,000 | 400,012 | 18 | 72.2% | 77.78% | 88.9% | 11.12% |
6 | 300,000 | 600,000 | 600,012 | 18 | 72.2% | 77.78% | 83.3% | 5.52% |
9 | 10,000 | 20,000 | 20,018 | 12 | 75.0% | 83.33% | 83.3% | 0%
9 | 25,000 | 50,000 | 50,018 | 12 | 75.0% | 83.33% | 83.3% | 0%
9 | 50,000 | 100,000 | 100,018 | 12 | 75.0% | 83.33% | 91.8% | 8.33%
9 | 100,000 | 200,000 | 200,018 | 12 | 75.0% | 83.33% | 91.8% | 8.33%
9 | 200,000 | 400,000 | 400,018 | 12 | 75.0% | 83.33% | 83.3% | 0%
9 | 300,000 | 600,000 | 600,018 | 12 | 75.0% | 83.33% | 83.3% | 0%
Number of Original Samples for Each Class | Number of Virtual Samples for Each Class | Total Number of Virtual Samples | Number of Samples for Training | Number of Samples for Testing | Deep Learning Accuracy without VSG | Linear SVM Accuracy without VSG | Accuracy with VSG and Deep Learning | Improvement |
---|---|---|---|---|---|---|---|---|
3 | 10,000 | 20,000 | 20,006 | 148 | 50% | 83.8% | 92.8% | 9.0% |
3 | 25,000 | 50,000 | 50,006 | 148 | 50% | 83.8% | 93.9% | 10.1% |
3 | 50,000 | 100,000 | 100,006 | 148 | 50% | 83.8% | 93.9% | 10.1%
3 | 100,000 | 200,000 | 200,006 | 148 | 50% | 83.8% | 91.9% | 8.1% |
3 | 200,000 | 400,000 | 400,006 | 148 | 50% | 83.8% | 93.2% | 10.1% |
3 | 300,000 | 600,000 | 600,006 | 148 | 50% | 83.8% | 91.2% | 8.1% |
5 | 10,000 | 20,000 | 20,010 | 144 | 50% | 85.4% | 93.8% | 8.4% |
5 | 25,000 | 50,000 | 50,010 | 144 | 50% | 85.4% | 91.0% | 5.6% |
5 | 50,000 | 100,000 | 100,010 | 144 | 50% | 85.4% | 94.4% | 9.0% |
5 | 100,000 | 200,000 | 200,010 | 144 | 50% | 85.4% | 91.0% | 5.6% |
5 | 200,000 | 400,000 | 400,010 | 144 | 50% | 85.4% | 94.4% | 9.0% |
5 | 300,000 | 600,000 | 600,010 | 144 | 50% | 85.4% | 90.3% | 4.9% |
6 | 10,000 | 20,000 | 20,012 | 142 | 50% | 85.2% | 92.3% | 7.1%
6 | 25,000 | 50,000 | 50,012 | 142 | 50% | 85.2% | 93.0% | 7.8%
6 | 50,000 | 100,000 | 100,012 | 142 | 50% | 85.2% | 92.3% | 7.1%
6 | 100,000 | 200,000 | 200,012 | 142 | 50% | 85.2% | 91.5% | 6.3%
6 | 200,000 | 400,000 | 400,012 | 142 | 50% | 85.2% | 92.9% | 7.7%
6 | 300,000 | 600,000 | 600,012 | 142 | 50% | 85.2% | 91.5% | 6.3%
9 | 10,000 | 20,000 | 20,018 | 136 | 50% | 91.9% | 93.4% | 1.5% |
9 | 25,000 | 50,000 | 50,018 | 136 | 50% | 91.9% | 92.6% | 0.7%
9 | 50,000 | 100,000 | 100,018 | 136 | 50% | 91.9% | 94.9% | 3.0% |
9 | 100,000 | 200,000 | 200,018 | 136 | 50% | 91.9% | 94.9% | 3.0% |
9 | 200,000 | 400,000 | 400,018 | 136 | 50% | 91.9% | 95.6% | 3.74% |
9 | 300,000 | 600,000 | 600,018 | 136 | 50% | 91.9% | 94.9% | 3.0% |
Number of Original Samples for Each Class | Number of Virtual Samples for Each Class | Total Number of Virtual Samples | Number of Samples for Training | Number of Samples for Testing | Deep Learning Accuracy without VSG | Linear SVM Accuracy without VSG | Accuracy with VSG and Deep Learning | Improvement |
---|---|---|---|---|---|---|---|---|
3 | 10,000 | 20,000 | 20,006 | 24 | 79.17% | 87.5% | 95.8% | 8.3% |
3 | 25,000 | 50,000 | 50,006 | 24 | 79.17% | 87.5% | 95.8% | 8.3% |
3 | 50,000 | 100,000 | 100,006 | 24 | 79.17% | 87.5% | 95.8% | 8.3% |
3 | 100,000 | 200,000 | 200,006 | 24 | 79.17% | 87.5% | 95.8% | 8.3% |
3 | 200,000 | 400,000 | 400,006 | 24 | 79.17% | 87.5% | 91.7% | 4.2% |
3 | 300,000 | 600,000 | 600,006 | 24 | 79.17% | 87.5% | 91.7% | 4.2% |
5 | 10,000 | 20,000 | 20,010 | 20 | 85.0% | 85.0% | 90.0% | 5.0% |
5 | 25,000 | 50,000 | 50,010 | 20 | 85.0% | 85.0% | 95.0% | 10.0% |
5 | 50,000 | 100,000 | 100,010 | 20 | 85.0% | 85.0% | 90.0% | 5.0% |
5 | 100,000 | 200,000 | 200,010 | 20 | 85.0% | 85.0% | 90.0% | 5.0% |
5 | 200,000 | 400,000 | 400,010 | 20 | 85.0% | 85.0% | 90.0% | 5.0% |
5 | 300,000 | 600,000 | 600,010 | 20 | 85.0% | 85.0% | 90.0% | 5.0% |
6 | 10,000 | 20,000 | 20,012 | 18 | 88.89% | 88.89% | 88.9% | 0.0% |
6 | 25,000 | 50,000 | 50,012 | 18 | 88.89% | 88.89% | 88.9% | 0.0% |
6 | 50,000 | 100,000 | 100,012 | 18 | 88.89% | 88.89% | 88.9% | 0.0% |
6 | 100,000 | 200,000 | 200,012 | 18 | 88.89% | 88.89% | 88.9% | 0.0% |
6 | 200,000 | 400,000 | 400,012 | 18 | 88.89% | 88.89% | 94.4% | 5.51% |
6 | 300,000 | 600,000 | 600,012 | 18 | 88.89% | 88.89% | 94.4% | 5.51% |
9 | 10,000 | 20,000 | 20,018 | 12 | 91.70% | 91.70% | 91.7% | 0.0%
9 | 25,000 | 50,000 | 50,018 | 12 | 91.70% | 91.70% | 100% | 8.3% |
9 | 50,000 | 100,000 | 100,018 | 12 | 91.70% | 91.70% | 100% | 8.3% |
9 | 100,000 | 200,000 | 200,018 | 12 | 91.70% | 91.70% | 95.8% | 4.19% |
9 | 200,000 | 400,000 | 400,018 | 12 | 91.70% | 91.70% | 91.7% | 0.00% |
9 | 300,000 | 600,000 | 600,018 | 12 | 91.70% | 91.70% | 91.70% | 0.00% |
Feature 1 | Feature 2 | Feature 3 | Feature 4 |
---|---|---|---|
5.1000 | 3.5000 | 1.4000 | 0.2000 |
4.9000 | 3.0000 | 1.4000 | 0.2000 |
4.7000 | 3.2000 | 1.3000 | 0.2000 |
4.6000 | 3.1000 | 1.5000 | 0.2000 |
5.0000 | 3.6000 | 1.4000 | 0.2000 |
Feature 1 | Feature 2 | Feature 3 | Feature 4 |
---|---|---|---|
7.000 | 3.2000 | 4.7000 | 1.4000 |
6.4000 | 3.2000 | 4.5000 | 1.5000 |
6.9000 | 3.1000 | 4.9000 | 1.5000 |
5.5000 | 2.3000 | 4.0000 | 1.3000 |
6.5000 | 2.8000 | 4.6000 | 1.5000 |
Feature 1 | Feature 2 | Feature 3 | Feature 4 | |
---|---|---|---|---|
MaxA | 5.1000 | 3.6000 | 1.5000 | 0.2000
MaxB | 7.0000 | 3.2000 | 4.9000 | 1.5000
Feature 1 | Feature 2 | Feature 3 | Feature 4 | |
---|---|---|---|---|
MinA | 4.6000 | 3.0000 | 1.3000 | 0.2000
MinB | 5.5000 | 2.3000 | 4.0000 | 1.3000
Feature 1 | Feature 2 | Feature 3 | Feature 4 | |
---|---|---|---|---|
MeanA | 4.8600 | 3.2800 | 1.4000 | 0.2000
MeanB | 6.4600 | 2.9200 | 4.5400 | 1.4400
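As a worked check of the ideal-case test in Algorithm 1 (our own arithmetic on the table values above): for Feature 1, MinB = 5.5 ≥ MaxA = 5.1; for Feature 3, MinB = 4.0 ≥ MaxA = 1.5; and for Feature 4, MinB = 1.3 ≥ MaxA = 0.2, so these three features satisfy the ideal case of Figure 2 and go straight to the expansion and generation steps. Feature 2 does not (MinA = 3.0 < MaxB = 3.2 and MinB = 2.3 < MaxA = 3.6); since MinA > MinB, MaxB > MinA, and MaxB < MaxA, it falls under Case G of Figure 3 and is handed to Algorithm 2 before generation.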
Feature 1 | Feature 2 | Feature 3 | Feature 4 |
---|---|---|---|
4.8856 | 3.7828 | 1.3874 | 0.2000 |
4.9425 | 3.0040 | 1.4145 | 0.2000 |
5.0047 | 3.4980 | 1.4130 | 0.2000 |
5.0954 | 3.4012 | 1.4648 | 0.2000 |
4.6168 | 3.0401 | 1.4256 | 0.2000 |
4.3748 | 3.3998 | 1.3600 | 0.2000 |
4.5446 | 3.5024 | 1.3004 | 0.2000 |
5.2189 | 3.5120 | 1.4902 | 0.2000 |
4.0010 | 3.5980 | 1.4533 | 0.2000 |
4.3178 | 3.5890 | 1.4503 | 0.2000 |
Feature 1 | Feature 2 | Feature 3 | Feature 4 |
---|---|---|---|
5.7083 | 2.8643 | 4.4777 | 1.3378 |
6.0240 | 2.4625 | 4.2046 | 1.4321 |
5.7270 | 2.8160 | 4.6385 | 1.4882 |
6.2451 | 2.4472 | 4.1338 | 1.4951 |
6.7130 | 3.1154 | 4.5923 | 1.3216 |
6.4493 | 2.3696 | 4.5706 | 1.3358 |
6.5326 | 2.6047 | 4.2064 | 1.4493 |
6.4594 | 2.8226 | 4.1640 | 1.3099 |
6.5940 | 2.7277 | 4.1497 | 1.3143 |
6.7898 | 3.0248 | 4.1346 | 1.3978 |