Algorithm 1 Embedded Feature Subset Optimization Algorithm | |

function Genetic-Algorithm(population, Fitness-Function) | |

inputs: population, set of c random feature subsets | ▹ c: number of chromosomes |

repeat | |

$new\_population\leftarrow $ empty set | |

for $i=1$ to Size(population) do | |

$x\leftarrow $Random-Selection(population, Fitness-Function) | ▹ x is selected at random, using its fitness-score as the selection probability |

$y\leftarrow $Random-Selection(population, Fitness-Function) | ▹ y is selected at random, using its fitness-score as the selection probability |

child ← Reproduce($x,y$) | ▹ $x,y$ are chromosomes and subsets of the feature set |

if small random probability then child ← Mutate(child) | ▹ this probability is defined by a selected mutation-rate |

add child to new$\_$population | |

$population\leftarrow new\_population$ | |

until a solution is found that satisfies the minimum criteria, or enough generations have elapsed | |

return the best set in population, according to Fitness-Function | ▹ best feature subset, according to the Fitness-Function |

function Reproduce($x,y$) | |

inputs: $x,y$, two chromosomes from the population | ▹ evaluated by the Fitness-Function |

$n\leftarrow $Length($x$) | |

$l\leftarrow $ number from 1 to n | ▹ l is defined by a selected crossover-rate |

$child\leftarrow $Append(Substring($x,1,l$), Substring($y,l+1,n$)) | ▹ new chromosome |

return child | |

function Fitness-Function(population) | ▹ user defined |

inputs: population, a set of c random feature subsets | ▹ c: number of chromosomes |

for $j=1$ to Size(population) do | |

$m\leftarrow $Size(population(j)) | ▹ $m$ is the dimensionality of the ${j}^{th}$ chromosome |

SVM-model ← Train-SVM(Training-Samples(population(j))) | |

$AC{C}_{train}\leftarrow $ accuracy of the SVM-model on the Training-Samples | |

${\mathit{w}}_{\mathit{m}}\leftarrow $ Decision-Boundary(SVM-model) | |

$\|{\mathit{w}}_{\mathit{m}}\|\leftarrow $Magnitude(${\mathit{w}}_{\mathit{m}}$) | ▹ $\|{\mathit{w}}_{\mathit{m}}\|$ is the 2-norm of ${\mathit{w}}_{\mathit{m}}$, which determines the decision boundary margin |

$F{F}_{1}\leftarrow \frac{1}{AC{C}_{train}}$ | |

$F{F}_{2}\leftarrow a\frac{1}{m}$ | ▹ a is a user defined weight parameter of $F{F}_{2}$ |

$F{F}_{3}\leftarrow b\frac{1}{\|{\mathit{w}}_{\mathit{m}}\|}$ | ▹ b is a user defined weight parameter of $F{F}_{3}$ |

$\mathit{FF}\left(\mathit{j}\right)\leftarrow F{F}_{1}+F{F}_{2}+F{F}_{3}$ | ▹ fitness-score of the ${j}^{th}$ chromosome |

fitness-score $\leftarrow \mathit{FF}$ | ▹ fitness-scores for all chromosomes in the population |

return fitness-score | |
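
The listing above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: chromosomes are encoded as tuples of 0/1 feature masks, the cut point in `reproduce` is drawn uniformly (the listing ties it to a crossover rate), `mutate` flips a single bit, the loop stops after a fixed generation budget only, and a larger fitness value is treated as better. All function and parameter names are hypothetical. `combined_score` only mirrors the arithmetic of $F{F}_{1}+F{F}_{2}+F{F}_{3}$; the SVM training that would supply the accuracy and weight norm is left out.

```python
import random


def reproduce(x, y, rng):
    """Single-point crossover: prefix of x joined to the suffix of y."""
    n = len(x)
    cut = rng.randint(1, n)  # cut point l in 1..n (assumed uniform)
    return x[:cut] + y[cut:]


def mutate(child, rng):
    """Flip one randomly chosen bit (assumed mutation operator)."""
    i = rng.randrange(len(child))
    return child[:i] + (1 - child[i],) + child[i + 1:]


def combined_score(acc_train, m, w_norm, a=1.0, b=1.0):
    """FF = 1/ACC_train + a/m + b/||w_m||, as written in the listing."""
    return 1.0 / acc_train + a / m + b / w_norm


def genetic_algorithm(population, fitness, generations=50,
                      mutation_rate=0.05, seed=0):
    """Evolve feature masks; fitness must return a positive number."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Fitness-proportional (roulette-wheel) selection weights.
        weights = [fitness(c) for c in population]
        new_population = []
        for _ in range(len(population)):
            x, y = rng.choices(population, weights=weights, k=2)
            child = reproduce(x, y, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness (hypothetical): prefer masks that select more features.
    init_rng = random.Random(1)
    start = [tuple(init_rng.randint(0, 1) for _ in range(8)) for _ in range(10)]
    print(genetic_algorithm(start, lambda c: 1 + sum(c)))
```

In practice the toy fitness would be replaced by a function that trains a classifier on the features selected by the mask and returns a score derived from it.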

**Figure 2.** Dimensionality reduction and most relevant feature subset selection: (**a**) Filter method and (**b**) Wrapper method.

**Figure 3.** Optimization function $F{F}_{3}$ in terms of set distribution, where the set ${B}_{k}\in {A}_{m}$, $k\le m$, and $m$ is the number of all features.

**Figure 4.** Filter method (benchmark): Four machine learning algorithms evaluated with respect to sequential dimensionality reduction using the mutual information relevance metric. (**a**) Simple Tree, (**b**) Radial Basis Function kernel Support Vector Machine, (**c**) Linear Support Vector Machine, (**d**) k-Nearest Neighbor. Data: overall $N=56$ $(p=32, n=24)$, training $R=38$ $(p=23, n=15)$, test $E=18$ $(p=9, n=9)$.

**Figure 5.** Feature Optimization Results (left y-scale) of the Embedded Genetic Algorithm Wrapper Method and the Dimension Reduction (right y-scale). Data: $N=56$ $(p=32, n=24)$, $R=38$ $(p=23, n=15)$, $E=18$ $(p=9, n=9)$.

