A Novel Virtual Sample Generation Method to Overcome the Small Sample Size Problem in Computer Aided Medical Diagnosing

Deep neural networks are successful learning tools for building nonlinear models. However, a robust deep learning-based classification model needs a large dataset. Indeed, these models are often unstable when they use small datasets. To solve this issue, which is particularly critical in light of the possible clinical applications of these predictive models, researchers have developed approaches such as virtual sample generation. Virtual sample generation significantly improves learning and classification performance when working with small samples. The main objective of this study is to evaluate the ability of the proposed virtual sample generation to overcome the small sample size problem, which is a feature of the automated detection of a neurodevelopmental disorder, namely autism spectrum disorder. Results show that our method enhances diagnostic accuracy from 84% to 95% using virtual samples generated on the basis of five actual clinical samples. The present findings show the feasibility of using the proposed technique to improve classification performance even in cases of clinical samples of limited size. Accounting for concerns in relation to small sample sizes, our technique represents a meaningful step forward in terms of pattern recognition methodology, particularly when applied to diagnostic classifications of neurodevelopmental disorders. In addition, the proposed technique was tested with other available benchmark datasets. The experimental outcomes showed that the accuracy of the classification that used virtual samples was superior to that of the classification that used the original training data without virtual samples.


Introduction
Deep learning computational models consist of many processing layers that learn representations of data with many levels of abstraction [1]. Deep learning is a machine learning mechanism that employs many layers of nonlinear information processing for the purposes of supervised or unsupervised learning, feature extraction, and classification. Deep learning algorithms depend heavily on the amount of data and on computing power [2][3][4]. Accordingly, they do not perform to an optimal level with small datasets. Recent advances in terms of adequate amounts of collected data and increased levels of computing power [2] have led to a resurgence in neural network research, and this has in turn sparked a new era of (deep) machine learning research.
The size of available clinical samples is ordinarily small in the field of medical research due to the intrinsic prevalence of disorders and other factors such as the elevated costs of patient recruitment and the limited time available for evaluations. Small sample sizes significantly limit the ability of pattern recognition methods to predict or classify individuals of different groups correctly, and this leads to inaccurate classification performance. Indeed, supervised classification methods require a training dataset in order to learn the classification algorithm that best differentiates the two groups and a testing dataset in order to verify the classification performance on previously unseen data. In medical research, samples and datasets are usually not large enough to perform the training phase and the testing phase of the computerized algorithm on totally independent datasets. This renders the methodology prone to overfitting. Nevertheless, different methods have been proposed to overcome this critical issue. The three main methods are Mega Trend Diffusion (MTD), Functional Virtual Population (FVP), and Multivariate Normal synthetic sample generation (MVN).

• Mega Trend Diffusion (MTD):
MTD was proposed in [15] based on the information diffusion method that uses fuzzy theories to fill in missing data [16,17]. The main difference between MTD and the information diffusion method is that MTD employs a general diffusion function to scatter a collection of data across the whole dataset, while the information diffusion method diffuses each sample separately [17].
MTD merges mega diffusion with data trend estimation to control the symmetrical expansion issue and to increase learning accuracy in flexible manufacturing system scheduling. Both the diffusion neural network and mega-trend diffusion require membership function values as extra information, i.e., the appearance possibility for each input attribute and the number of input attributes needed for artificial neural network training. This makes the calculation more complicated and lengthy. In addition, the membership function values basically do not hold any managerial meaning [15,18]. Lastly, the MTD technique has been applied to simulated systems; therefore, it is not clear what this technique achieves in real system situations [19].
• Functional Virtual Population (FVP): The functional virtual population was developed in [20]. FVP is based on data domain expansion methods (i.e., left, right, and both sides) for small datasets [17]. The FVP method operates by adding virtual samples for training assistance and for acquiring scheduling knowledge in dynamic manufacturing systems. The strategy combines data decreasing, data increasing, and mixed data to form a functional virtual population. The generated virtual samples increase the learning performance of neural networks [21]. The FVP technique was the first method proposed for managing small datasets, and it was developed to extend the domain of the attributes and produce virtual samples for the purposes of constructing early scheduling knowledge. It is based on a trial and error procedure and requires many steps to complete the process [22]. This method has significant limitations when applied to systems including nominal variables or high variance between stages [23].

• Multivariate Normal synthetic sample generation (MVN):
MVN has two parameters, and each parameter contains more than one piece of information. One parameter sets the centre of the distribution, and the other determines the dispersion and the width of the spread around the distribution centre [24]. The MTD and FVP methods extend the dataset by enlarging the domains of the feature dataset, while the MVN method synthetically produces samples drawn from a multivariate normal distribution fitted to the input data [19,25]. MVN synthetic sample generation uses the multivariate covariance dependencies among the basic samples. In addition, it maintains the ingrained noise of the samples [19]. MVN utilizes the covariance matrix, which summarizes the interaction between the different components of the data [26].
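As a minimal sketch (not the implementation used in the cited works), MVN generation can be illustrated in Python with NumPy: the mean vector and covariance matrix are estimated from the real samples, and virtual samples are then drawn from the resulting multivariate normal distribution. The function name and interface are assumptions for illustration.

```python
import numpy as np

def mvn_virtual_samples(X, n_virtual, seed=None):
    """Draw virtual samples from a multivariate normal fitted to X.

    X: (n_samples, n_features) array of real samples.
    Returns an (n_virtual, n_features) array; the fitted covariance
    matrix preserves the dependencies between features, as MVN requires.
    """
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)               # centre of the distribution
    cov = np.cov(X, rowvar=False)     # dispersion / spread around the centre
    return rng.multivariate_normal(mu, cov, size=n_virtual)
```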

Virtual Sample Generation technique
As mentioned above, the main goal of MTD, FVP, and MVN is to generate virtual samples. Virtual Sample Generation (VSG) is a data preprocessing technique proposed to increase prediction accuracy for the small-dataset issue [27]. This idea was first proposed in 1998 by [28] to improve recognition (object and speech recognition) performance for image and sound datasets.
Another work [22] used the bootstrap method to generate virtual samples in order to enhance the accuracy of a computerized algorithm derived from a small clinical sample in predicting radiotherapy outcomes. Results showed that the amount of training data and the learning accuracy were directly correlated, and the prediction accuracy for radiotherapy outcomes increased stably from 55% to 85%.
The study published in [29] developed a novel technique, based on random variate generation, that was used in the early phase of data gathering to handle a DNA microarray categorization issue. This technique was employed to search for discrete collections in the DNA microarray and to treat outliers as separate sets in a nonlinear method. The technique generated virtual samples from the original small datasets based on binary classification. Moreover, experiments on the UCI database [30] were used in this study. Finally, the results showed that this method largely enhanced the accuracy of machine learning in the early stages of DNA microarray data, and it significantly helped to solve the problem of an extremely small training dataset with nonlinear data or outliers.
The work in [31] proposed a method based on a generative adversarial network combined with a deep neural network. The generative adversarial network was trained with the training group to generate virtual sample data, which enlarged the training group. Then, the deep neural network classifier was trained with the virtual samples. After that, the classifier was tested with the original test group, and the indicators validated the effectiveness of the method for multi-classification with a small sample size. As an experimental case, the method was then used to recognize the stages of cancers with a small classified sample size. The empirical results verified that the proposed method achieved better accuracy than traditional methods.
The study [32] aimed to generate a synthetic sample that reflected the attributes of the people listed in the "Health Survey for England". The authors used data from the "Health Survey for England" to define gender- and age-dependent distributions of continuous risk factors, such as weight, height, number of cigarettes/day, and units of alcohol/week, and the prevalence of binary risk factors, such as diabetes and smoking status. Spearman rank correlations, including gender and age, were defined for these risk variables. A table of normally-distributed random numbers was produced. The sample was then generated by a reverse lookup of the gamma distribution value at the random percentile for continuous variables, or by setting a binary variable to one when the random percentile fell below the prevalence threshold. The new method produced large virtual samples with risk factor distributions very closely matching those of the real "Health Survey for England" population. Such a sample can be utilized to model the likely impact of new therapies or to predict mortality.
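The percentile-based synthesis described above can be sketched as follows; the function and parameter names are hypothetical, and an empirical quantile lookup stands in for the gamma inverse CDF used in the original study.

```python
import numpy as np

def percentile_synthesis(observed, prevalence, n, seed=None):
    """Illustrative sketch of percentile-based synthesis (names are
    assumptions; the empirical quantile replaces the gamma inverse CDF).

    observed: 1-D array of real values for one continuous risk factor.
    prevalence: probability of a binary risk factor (e.g., smoking status).
    """
    rng = np.random.default_rng(seed)
    # Continuous risk factor: reverse lookup of the distribution at a
    # uniform random percentile (inverse transform sampling).
    continuous = np.quantile(observed, rng.uniform(0.0, 1.0, size=n))
    # Binary risk factor: set to 1 when the random percentile falls
    # below the prevalence threshold.
    binary = (rng.uniform(0.0, 1.0, size=n) < prevalence).astype(int)
    return continuous, binary
```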
The work published in another paper [5] developed a new VSG method called genetic algorithm-based virtual sample generation, which was based on the integrated effects and restrictions of the data attributes. The first procedure determined the appropriate range by utilizing MTD functions. Then, a genetic algorithm was applied to accelerate the generation of the most appropriate virtual samples. The last step was the verification of the performance of the proposed method by comparing the results with two different forecasting models. The experimental outcomes showed that the performance of the method that used virtual samples was superior to that of the method that used the original training data without virtual samples.
Consequently, virtual sample generation can enhance learning performance significantly when working with small samples. Many of these previous sample generation approaches have shown a good ability to enhance prediction and classification performance. However, none of them accounts for the overlap that exists at the feature level. Accordingly, this study presents a new virtual sample technique that also avoids overlaps between each of the features in the different classes. Furthermore, this study is distinguished by its ability to generate and handle a huge number of virtual samples, i.e., hundreds of thousands of samples instead of tens or hundreds.

The Proposed Method
This section presents detailed steps to illustrate the method we developed in the present work, from data selection to the construction and generation of the virtual samples necessary to build the models and the classification tool. A schematic description of the whole procedure is shown in Figure 1. The procedure involved four steps. In the first step, the algorithm automatically ranked the top discriminative features and excluded the non-discriminative features. In the second step, a small set of samples, called selected samples, was randomly selected from the entire dataset, and the remaining samples were used as testing data. In the third step, we applied the proposed VSG method to generate new virtual samples based on the samples selected in the second step. Then, the virtual samples were added to the selected samples for the next learning step. In the fourth step, the combined dataset (i.e., the original dataset plus the newly-generated virtual samples) was used to train the deep neural network; we then checked the classifier performance by classifying the testing data.
The first step of the proposed technique was to select samples randomly. These numbers were selected randomly from the interval 3-10 as a small database the first time; then, they were kept fixed for all experiments to allow comparisons. The numbers of randomly selected samples per class were 3, 5, 6, and 9. The Selected Samples (SS) were drawn from the whole dataset (30 samples), and the remaining 24, 20, 18, and 12 samples, respectively, were used as testing data.
The features of the small selected samples of the two different classes are represented in two matrices A and B of the same size, as follows. Let I = {1, . . ., m} denote the indices of the rows of matrices A and B, and let J = {1, . . ., n} denote the indices of the columns of matrices A and B. Let a_ij denote the value of the element of A in row i and column j. The algorithm started by finding, for each feature, six single-dimension matrices: the minimum (NA), maximum (MA), and mean (MeA) values of each feature in matrix A, and the minimum (NB), maximum (MB), and mean (MeB) values of each feature in matrix B. Then, for each iteration, starting from the first elements of the matrices NA, MA, NB, and MB and continuing to the last, these four elements were the inputs of Equation (1), which checked whether any overlap occurred between them. If there was no overlap, the intervals were expanded by Equations (5) and (6); otherwise, we tried to resolve the overlap by Equation (2).
The second process was the random variate generation used to create new Virtual Samples (VS) based on the maximum, minimum, and mean calculated for every discriminative feature in each group of SS. After this, the virtual samples were added to SS by Equations (3) and (4). The number of generated samples could vary in terms of N (N = 10,000, 25,000, 50,000, 100,000, 200,000, and 300,000). All generated values of feature j lay within the interval (Min_j, Max_j), and the VS were generated from a normal Gaussian distribution by Equation (7).
The third step was the training phase of the system: SS and VS were employed as learning data, and the data remaining from the first step were used as testing data. Lastly, the classification performance was evaluated by using the Softmax classifier.
The proposed work can be summarized as follows: new virtual samples were generated based on the original small dataset; the original dataset plus the newly-generated virtual samples were then both added to the training samples to expand the training dataset for machine learning significantly. Thus, our method can produce virtual samples closely related to the original datasets by applying a normal Gaussian distribution. The proposed work aims to make the classification model more stable and to overcome the common limitations of the classification performance.
The working condition of the proposed system was that there were no overlaps between the intervals. If the intervals presented a direct ideal case, the system started to generate virtual samples without any pre-processing. Otherwise, the system tried to remove the overlaps; if it succeeded, it expanded the interval range and started to generate virtual samples within the known intervals. Otherwise, it skipped to the next feature.
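The working condition above can be sketched as follows, assuming Python/NumPy; Equations (1)-(7) are not reproduced here, and this simplified sketch skips overlapping features instead of repairing them.

```python
import numpy as np

def generate_virtual_samples(A, B, n_virtual, seed=None):
    """Hedged sketch of the VSG working condition.

    A, B: (samples, features) arrays for the two classes.
    Returns (VA, VB, kept): Gaussian virtual samples per class for the
    features whose class intervals do not overlap, plus the indices of
    those features. Overlapping features are skipped in this sketch.
    """
    rng = np.random.default_rng(seed)
    kept = [j for j in range(A.shape[1])
            if A[:, j].max() < B[:, j].min() or B[:, j].max() < A[:, j].min()]
    VA = np.empty((n_virtual, len(kept)))
    VB = np.empty((n_virtual, len(kept)))
    for k, j in enumerate(kept):
        for src, dst in ((A, VA), (B, VB)):
            col = src[:, j]
            # Gaussian generation around the feature mean, clipped so that
            # every virtual value stays inside (Min_j, Max_j).
            v = rng.normal(col.mean(), col.std(ddof=1), size=n_virtual)
            dst[:, k] = np.clip(v, col.min(), col.max())
    return VA, VB, kept
```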

Virtual Sample Generation Method
In the third step, for each discriminative feature, we applied the VSG method described in Algorithm 1 to generate virtual samples and then added them to the original dataset to produce a combined dataset for learning. The proposed method consisted of two parts. The first was the pre-processing technique, which searched for the rational interval (without overlaps) of each feature with respect to the corresponding feature in the other class. The second part was the random variate generation, which enlarged the training dataset based on the original dataset for each rational feature. The code structure of the VSG method was as follows.

The Pre-Processing Method
The purpose of the pre-processing step was to prepare the discriminative features within the selected samples precisely. This step was mandatory in order to avoid overlaps between corresponding discriminative features before generating new virtual samples. As an example, one of the datasets tested in this work included two groups: a Related Group (RG) including participants with autism spectrum disorder and an Unrelated Group (URG) including healthy participants. Clearly, it was not reasonable for the two different groups to overlap during generation; therefore, this event was precluded by the pre-processing method through the following equations. For each feature, its values are sorted in descending order for class A in matrix MlsA and in ascending order in matrix mlsA; similarly, the values are sorted in descending order for class B in matrix MlsB and in ascending order in matrix mlsB. Here, Equation (2) tries to handle the overlaps and, after each execution, checks for the ideal case.
If it succeeded in handling the overlap (c < I), then Equations (5) and (6) were executed to expand the intervals, followed by Equation (7), which was executed to generate N virtual features. Otherwise, the feature was deleted and was not taken into consideration for generating virtual samples. Furthermore, the columns (features) in matrices A and B that failed to generate virtual numbers according to Equation (2) were deleted, and two new matrices with a new column dimension H were generated, NA (of size I × H) and NB (of size I × H), where H denotes the new number of columns of matrices NA and NB after the deletion of the unsuccessful columns (features) from matrices A and B, with H ≤ J.
In addition, the generated single-dimension matrices GA_h (of size N × 1) and GB_h (of size N × 1), h = 1, . . ., H, were combined horizontally into single matrices NGA (of size N × H) and NGB (of size N × H), respectively, as shown in Equation (3), where N is the number of virtual samples.
Finally, Equation (4) vertically combined NA (of size I × H) with NGA (of size N × H), and NB (of size I × H) with NGB (of size N × H), into single matrices AA and BB, each of size (I + N) × H. These matrices were then used as learning data for the classifier.
Algorithm 1: The algorithm for virtual sample generation. The pre-processing technique primarily grouped samples based on their spatial relationship. The pre-processing algorithm can be summarized as follows:

Input
Step 1: Find the maximum value, the minimum value, and the arithmetic mean of each feature for each class. Following this, check for any possible overlaps between the maximum and minimum values of each feature in the first class and those of the corresponding feature in the second class. It is necessary to exclude possible overlaps (ideal cases, as can be noticed in Figure 2) before generating meaningful virtual samples.
Step 2: After having verified the absence of overlaps, expand the sides using the mean of each feature and generate meaningful virtual samples based on a Gaussian distribution, as shown in Figure 4. The intervals are expanded on the right and left sides by Equations (5) and (6), respectively.
Step 3: In the case of overlaps, as depicted in Figure 3, start removing the overlaps by moving the points that are far from the mean.
Step 4: Repeat Steps 1-3 until all overlaps are removed, based on Algorithm 2. After this, the process generates virtual samples based on a Gaussian distribution, Equation (7).
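Since Algorithm 2 itself is not reproduced in this excerpt, the following is only a plausible sketch of the overlap-removal idea in Steps 3-4: while the two class intervals of a feature overlap, the extreme value lying farthest from its class mean is dropped, until the intervals become disjoint or too few points remain.

```python
import numpy as np

def remove_overlap(a, b, min_keep=2):
    """Hedged sketch of the overlap-removal idea (the paper's exact rule in
    Algorithm 2 is not reproduced): repeatedly drop the extreme value that
    lies farthest from its class mean until the intervals are disjoint.

    a, b: 1-D arrays holding one feature's values in classes A and B.
    Returns trimmed copies of (a, b) and a success flag.
    """
    a, b = np.sort(a), np.sort(b)
    while len(a) >= min_keep and len(b) >= min_keep:
        if a.max() < b.min() or b.max() < a.min():
            return a, b, True            # intervals are now disjoint
        # Distance of each class's extremes from its mean
        da = max(abs(a[0] - a.mean()), abs(a[-1] - a.mean()))
        db = max(abs(b[0] - b.mean()), abs(b[-1] - b.mean()))
        if da >= db:
            a = a[1:] if abs(a[0] - a.mean()) >= abs(a[-1] - a.mean()) else a[:-1]
        else:
            b = b[1:] if abs(b[0] - b.mean()) >= abs(b[-1] - b.mean()) else b[:-1]
    return a, b, False                   # feature could not be made disjoint
```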
During this step, after the collections were formed using the pre-processing method for each discriminative feature, the mean and standard deviation of each group for all discriminative features were recalculated before moving to the next process. An example of the virtual sample distribution after the pre-processing method is illustrated in Figure 5.

Random Virtual Generation Technique
In the second process, Equation (7) generated new virtual samples based on the maximum, minimum, mean, and standard deviation computed for every collection of all discriminative features and added them to the corresponding group of the original dataset by Equations (3) and (4). A MATLAB function [33] was used to generate virtual samples following a normal distribution. The distribution of the virtual samples is illustrated in Figure 6.
The Gaussian distribution is very widely used in probability, statistics, and many natural phenomena [34]. For example, height, weight, blood pressure, the temperature during a year, and IQ scores follow a normal distribution. The Gaussian normal distribution is helpful because of the central limit theorem, which is a very important theorem in statistics. In addition, using it avoids the accumulation of all values in only a few points. In Equation (7), X is a normal random variable, µ is the mean, and σ is the standard deviation.
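Although Equation (7) itself is not reproduced in this excerpt, the Gaussian density underlying the generation step is presumably the standard normal density with the per-feature mean and standard deviation:

```latex
X \sim \mathcal{N}(\mu, \sigma^{2}), \qquad
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```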

Datasets
The above-described method was developed using a dataset provided by the Scientific Institute IRCCS Eugenio Medea. In addition, the proposed technique was tested with other available benchmark datasets.
The first dataset was collected at the Scientific Institute IRCCS Eugenio Medea in Italy (Bosisio Parini, Italy). The dataset included data from 30 participants divided into two groups: 15 children with a clinical diagnosis of Autism Spectrum Disorder (ASD) and 15 participants with typical development. ASD is a neurodevelopmental disorder characterised by persistent social impairment, communication abnormalities, and restricted and repetitive behaviours (DSM-5); it is a complex condition with an average prevalence of about 1% worldwide. For each participant, 17 kinematic parameters (numerical features) related to an upper-limb movement were registered [35]. In the original work, a feature selection algorithm was used to identify the seven features that best differentiated participants with ASD from healthy controls. In the present work, our algorithm randomly selected six participants (from now on, samples) as the original small dataset, with three samples for each group (ASD and healthy children), and the remaining 24 samples were used as the testing dataset. For comparison, we generated 10,000 virtual samples using the proposed method for every collection in the first trial, and this number was then raised each time as follows: 25,000, 50,000, 100,000, 200,000, and 300,000. The identical learning procedure was repeated for five rounds.
The second dataset was the Escherichia coli dataset [30,36], which included values of measurements of E. coli bacteria (commonly found in the human gut). This dataset included 336 samples. Each sample belonged to one of eight classes, where each class referred to a type of E. coli bacteria. Each sample consisted of eight attributes; see the dataset description listed in Table 1. The training and testing sets for this dataset were chosen as follows. We selected 77 samples for each of the first two classes (CP and IM) to ensure that we had binary classes with an equal number of samples. Then, three samples were selected randomly from each of the selected classes. These three samples represented the original small dataset. Starting from these three samples, the proposed method generated 10,000 virtual samples for every class the first time. Then, the number of virtual samples was increased as follows: 25,000, 50,000, 100,000, 200,000, and 300,000 from each class. As an example, the total number of training samples for the first round was 20,006 (six actual samples plus 20,000 virtual samples). The remaining 148 samples were used as testing data. After that, we repeated the experiment, randomly selecting five samples from each class of the E. coli dataset as the original small dataset. In this way, the total number of training samples was 10 plus the virtual samples. The remaining 144 samples were used as testing samples, and these steps were repeated five times.
The third dataset was the Breast Tissue dataset [36,37], which included values of measurements of the electrical impedance of freshly-excised breast tissue. This dataset included 106 samples, where each sample belonged to one of six classes. Six classes of freshly-excised tissue were studied using electrical impedance measurements: carcinoma, fibroadenoma, mastopathy, glandular, connective, and adipose. The training and testing sets for this dataset were chosen as follows. We selected 15 samples for each of the first two classes (carcinoma and fibroadenoma) to ensure that we had binary classes with an equal number of samples. Then, three samples were selected randomly from each of the selected classes. These three samples represented the original small dataset. Starting from these three samples, the proposed method generated 10,000 virtual samples for every class the first time. Then, the number of virtual samples was increased as follows: 25,000, 50,000, 100,000, 200,000, and 300,000 from each class. As an example, the total number of training samples for the first round was 20,006 (six actual samples plus 20,000 virtual samples). The remaining 24 samples were used as testing data. After that, we repeated the experiment, randomly selecting five samples from each class of the Breast Tissue dataset as the original small dataset. In this way, the total number of training samples was 10 plus the virtual samples. The remaining 20 samples were used as testing samples. These steps were repeated five times.

Classification Techniques
In this study, a deep neural network was implemented with a stacked auto-encoder, using the MATLAB Neural Network Toolbox™ auto-encoder functionality [38], to train a deep neural network and to classify ASD. In more detail, a stacked auto-encoder is a neural network composed of several layers of sparse auto-encoders in which the outputs of each layer are connected to the inputs of the subsequent layer.
For classification, we used Softmax, which is a layer located on top of the fully-connected network. Its non-linearity predicts the probability distribution over the classes given the input [38,39]. Details of the Softmax method were described in [40]. In the present work, we applied Softmax to validate and evaluate the performance of our newly-proposed method.
In a neural network, it is not feasible to determine the goodness of the network topology only on the basis of the number of inputs and outputs [40]. Usually, a neural network must be specified empirically, selecting a model and determining the number of hidden layers with the minimum validation error among multiple variants of the models [40]. Our first experiment was implemented using a deep learning network with 100 hidden nodes in each layer. In the second experiment, we reduced the number of hidden nodes to 50, whereas in the third experiment, the first layer included 100 hidden nodes and the second layer 50 hidden nodes. This procedure was carried out to investigate the effect of the hidden node size in the deep learning network. We continued until we obtained satisfactory results. The best performance was obtained with 100 hidden nodes in the first layer and 50 hidden nodes in the second layer.
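As a hedged Python stand-in for the MATLAB implementation (scikit-learn's `MLPClassifier` replaces the stacked auto-encoder, and the greedy layer-wise auto-encoder pretraining is omitted), the best-performing topology reported above can be sketched as:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_classifier(seed=0):
    """Supervised stand-in for the paper's stacked auto-encoder: 100 hidden
    nodes in the first layer, 50 in the second, with a probabilistic
    (logistic/softmax) output layer."""
    return MLPClassifier(hidden_layer_sizes=(100, 50),
                         max_iter=1000, random_state=seed)

# Illustrative use on synthetic two-class data standing in for SS + VS.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 7)),    # class 0 training samples
               rng.normal(4.0, 1.0, (60, 7))])   # class 1 training samples
y = np.array([0] * 60 + [1] * 60)
clf = build_classifier().fit(X, y)
```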

Experimental Results
This section presents in detail the results of the experiments on each of the three datasets (IRCCS Medea, E. coli, and Breast Tissue).
For the IRCCS Medea dataset, the average accuracies are shown in Figures 7-9 and listed in Table 2.
For the E. coli dataset, the average accuracies are shown in Figures 10 and 11 and are listed in Table 3.
The third dataset was the Breast Tissue dataset. The average accuracies after performing these steps are shown in Figures 12 and 13 and listed in Table 4.
The graph in Figure 7 depicts the relationship between the number of original samples per class used to generate the virtual samples and the average classification accuracy. The graph shows growth up to six samples, followed by a levelling out. Overall, from this graph, it is possible to conclude that the number of original samples was not critical for generating the virtual samples. Moreover, this demonstrates that the proposed method was efficient with only a small number of original samples. The plot depicted in Figure 8 clearly shows that the present deep learning method with the proposed VSG exceeded, in the first three intervals, the linear Support Vector Machine (SVM) method used in [35]. With 3, 5, and 6 samples, the proposed method showed better results in terms of classification performance when using a limited sample size compared with traditional linear SVM. The box plot in Figure 9 clearly shows that the present deep learning method with the proposed VSG exceeded the linear SVM method in overall comparisons using the IRCCS Medea dataset. The line graph in Figure 10 shows the comparison between the performance of linear SVM without VSG and deep learning with VSG on the E. coli dataset. The experiment performed with our method on the E. coli dataset replicated the trend of the findings for the IRCCS Medea dataset. Indeed, it was clear that deep learning with the proposed method outperformed the linear SVM method without VSG for all of the intervals considered.
The box plot in Figure 11 clearly shows that the present deep learning method with the proposed VSG exceeded the linear SVM method in overall comparisons using the E. coli dataset.
The line graph in Figure 12 shows the comparison between the performance of linear SVM without VSG and deep learning with VSG on the Breast Tissue dataset. The experiment performed with our method on the Breast Tissue dataset replicated the trend of the findings for the IRCCS Medea and E. coli datasets. Indeed, it is clear that deep learning with the proposed method outperformed the linear SVM method without VSG for the first three intervals considered. The box plot in Figure 13 clearly shows that the present deep learning method with the proposed VSG exceeded the linear SVM method in overall comparisons using the Breast Tissue dataset. The graph in Figure 14 shows the relationship between the number of virtual samples (generated from five original samples) and the average classification accuracy. It can be seen clearly that the average accuracy reached a peak at 25,000 samples. The accuracy remained adequately stable up to 200,000 samples; then, a loss of accuracy was observed at 300,000 samples.

The Numerical Example
In this section, we provide a numerical example, i.e., a simulation, to explain the proposed method. Five samples from classes A and B are shown in Tables 5 and 6, respectively. Then, the maximum and minimum values were extracted, and the mean values were calculated, for the first five features in both classes, as shown in Tables 7-9, respectively. After checking that the case is an ideal condition, the intervals are expanded, and virtual random numbers are generated based on a Gaussian distribution, as shown in Tables 10 and 11.

Conclusions and Future Extensions
Deep neural networks are successful learning tools for building nonlinear models; however, their performance is unstable when using small datasets. To solve this issue, the present work proposed a new technique that generates virtual samples from the original small dataset; these can be added to the original samples to expand the training dataset significantly for machine learning. Our method can thus produce meaningful virtual samples closely related to the original data, making the learning phase more stable and overcoming common limitations on classification performance, such as the limited size of clinical sample data.
The newly developed technique focused on binary classification. However, it can also be used to generate virtual samples for multi-class data; the only additional condition is that the class intervals must not overlap. Apart from this, the procedure is the same as for binary classification.
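The multi-class condition stated above, that for a given feature the per-class intervals must be pairwise disjoint, can be checked mechanically. The helper name below is hypothetical; this is only a sketch of the condition, not the authors' code.

```python
from itertools import combinations

def intervals_disjoint(intervals):
    """Return True when the per-class (min, max) intervals for one
    feature are pairwise disjoint, i.e., the multi-class condition holds."""
    return all(hi1 < lo2 or hi2 < lo1
               for (lo1, hi1), (lo2, hi2) in combinations(intervals, 2))

print(intervals_disjoint([(0, 1), (2, 3), (4, 5)]))  # True: all disjoint
print(intervals_disjoint([(0, 2), (1, 3)]))          # False: overlap
```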
As just mentioned, our method requires that the class intervals do not overlap. If the intervals are in the direct ideal case (Figure 2), the system starts generating virtual samples without any pre-processing. Otherwise, the system tries to remove the overlap; if it succeeds, it expands the interval range and generates virtual samples within the resulting intervals. If the overlap cannot be removed, the feature is skipped and the system moves on to the next one. The technique is therefore easy to apply and helps make the learning stage more stable while enhancing classification performance.
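The per-feature decision flow just described can be summarised in a small function. This is a minimal sketch under simplifying assumptions: the case names are invented for illustration, and the overlap-removal test is reduced to a boolean flag standing in for the paper's step-through of sorted sample values.

```python
def feature_action(min_a, max_a, min_b, max_b, can_fix_overlap):
    """Decide what to do with one feature:
    'generate'    -> intervals are disjoint (ideal case), generate directly;
    'fix-overlap' -> intervals overlap but the overlap can be removed,
                     after which the interval range is expanded;
    'skip'        -> the overlap cannot be removed, move to the next feature.
    """
    if max_a < min_b or max_b < min_a:
        return "generate"
    return "fix-overlap" if can_fix_overlap else "skip"

print(feature_action(0.0, 1.0, 2.0, 3.0, False))  # disjoint -> 'generate'
print(feature_action(0.0, 2.0, 1.0, 3.0, True))   # -> 'fix-overlap'
print(feature_action(0.0, 2.0, 1.0, 3.0, False))  # -> 'skip'
```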
The application of our method to the IRCCS Medea and UCI datasets showed that the present technique can significantly improve classification accuracy when the dataset has a limited size. More specifically, our method increased the classification accuracy from 84.9% [35] to 95% on the IRCCS Medea dataset, from 83.8% to 93.9% with only three samples per class on the E. coli dataset, and from 91.7% to 100% on the Breast Tissue dataset.
These findings demonstrated that our technique not only offers high classification accuracy but is also reliable and easy to apply (refer to Algorithms 1 and 2). The experiments reported in this study also showed that, given an appropriate number of generated virtual samples, the generalization performance of the classifier on the expanded training set can be better than on the original training set, in agreement with many previous studies such as [34].
Future studies might address more specific matters, such as determining the best numbers of original and virtual samples needed to maximize accuracy. In addition, the technique could be applied to a wider range of datasets to determine its strengths and limitations. These suggested directions could help ensure that this methodology is further developed and enhanced.

Figure 1 .
Figure 1. Flowchart of the whole procedure with three different steps: (1) selection of a small sub-sample from the entire dataset, with the remaining samples used as testing data; (2) application of the VSG method to the samples selected in the first step; (3) data prediction and classification.

Figure 2 .
Figure 2. Ideal cases where there is no overlapping between the feature and the same feature in the corresponding class.

Figure 3 .
Figure 3. Overlapping cases between the feature and the same feature in the corresponding class.

Figure 4 .
Figure 4. Generating meaningful virtual samples based on a Gaussian distribution.

Figure 6 .
Figure 6. Normal distribution for one feature after generating the virtual sample.

Figure 7 .
Figure 7. The curve of the average accuracies of the simulated datasets.

Figure 8 .
Figure 8. The trend of average accuracy using the IRCCS Medea dataset. The line graph shows the comparison between the performance of linear SVM without Virtual Sample Generation (VSG) and deep learning with VSG on the IRCCS Medea dataset. The x-axis shows the number of original samples per class used to generate the virtual samples for deep learning (and used as the training samples for linear SVM), while the y-axis shows the average classification accuracy.

Figure 9 .
Figure 9. Overall comparisons of the experimental results using the IRCCS Medea dataset.

Figure 10 .
Figure 10. The trend of average accuracy using the E. coli dataset. The x-axis shows the number of original samples per class used to generate the virtual samples for deep learning (and used as the training samples for linear SVM), while the y-axis shows the average classification accuracy.

Figure 11 .
Figure 11. Overall comparisons of the experimental results using the E. coli dataset.

Figure 12 .
Figure 12. The trend of average accuracy using the Breast Tissue dataset. The x-axis shows the number of original samples per class used to generate the virtual samples for deep learning (and used as the training samples for linear SVM), while the y-axis shows the average classification accuracy.

Figure 13 .
Figure 13. Overall comparisons of the experimental results using the Breast Tissue dataset.

Figure 14 .
Figure 14. Accuracy using only the five original training samples from the IRCCS Medea dataset. The x-axis shows the number of virtual samples, while the y-axis shows the average accuracy.

Table 1 .
Description of the three datasets.

Table 2 .
The results of the average accuracy using the IRCCS Medea dataset.

Table 3 .
The results of the average accuracy using the E. coli dataset.

Table 4 .
The results of the average accuracy using the Breast Tissue dataset.

Table 5 .
Five samples from the first class (A).

Table 6 .
Five samples from the second class (B).

Table 7 .
The maximum values extracted from the first five features.

Table 8 .
The minimum values extracted from the first five features.

Table 9 .
The mean values calculated from the first five features.

Features 1, 3, and 4 represented ideal cases, where we expanded the intervals and started to generate virtual random numbers based on a Gaussian distribution without any pre-processing. Feature 2, however, was case H, so we performed pre-processing before generating random numbers. In case H, to convert it to the ideal case, either MaxB or MinA is changed depending on the following simple comparison.

If (MaxA − MinA > MaxB − MinB) then MinA = next position in the list, which is sorted in ascending order; Else MaxB = next position in the list, which is sorted in descending order; End
The condition is false (3.28 − 3 > 3.2 − 2.92 does not hold), so MaxB = 3.1. The case is checked again: it is still not ideal, but remains case H. The same condition is checked once more; it is now true, so MinA = 3.1.
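The two steps walked through above can be reproduced in code. This is a hedged reconstruction: the shrinking rule and the "next value" of 3.1 in each class's sorted list are inferred from the worked numbers and may differ in detail from the authors' full algorithm.

```python
def shrink_once(min_a, max_a, min_b, max_b, next_a, next_b):
    """Apply one step of the overlap-removal rule: if the condition
    (MaxA - MinA > MaxB - MinB) holds, raise MinA to the next value in
    class A's ascending-sorted list; otherwise lower MaxB to the next
    value in class B's descending-sorted list.  Returns (MinA, MaxB)."""
    if max_a - min_a > max_b - min_b:
        return next_a, max_b
    return min_a, next_b

# Step 1: 3.28 - 3 > 3.2 - 2.92 is false, so MaxB becomes 3.1.
step1 = shrink_once(3.0, 3.28, 2.92, 3.2, 3.1, 3.1)
# Step 2: still case H; 3.28 - 3 > 3.1 - 2.92 is now true, so MinA becomes 3.1.
step2 = shrink_once(step1[0], 3.28, 2.92, step1[1], 3.1, 3.1)
print(step1, step2)  # (3.0, 3.1) (3.1, 3.1)
```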