# Optimizing Image Classification: Automated Deep Learning Architecture Crafting with Network and Learning Hyperparameter Tuning

## Abstract


## 1. Introduction

#### 1.1. Recent Advances in Automated Network Architecture Design

#### 1.2. Existing Challenges of MSA-Based Automated Network Architecture Design

#### 1.3. Research Objectives and Contributions of the Current Work

- ETLBOCBL-CNN is an automated network design approach for discovering optimal CNN architectures for specific classification tasks. It harnesses ETLBOCBL’s optimization capability to identify the best combinations of network hyperparameters (e.g., network depth, layer types, kernel size, filter numbers, pooling size, pooling stride, and neuron numbers) and learning hyperparameters (e.g., optimizer type, learning rate, initializer type, and L2-regularizer) without human intervention.
- ETLBOCBL-CNN incorporates an efficient solution encoding scheme, enabling the search for CNN architectures of varying lengths for diverse datasets while ensuring model validity and promoting the discovery of novel architectures. Moreover, it employs an efficient fitness evaluation process for practicality.
- In ETLBOCBL-CNN, a competency-based learning concept is integrated into the modified teacher phase to encourage exploration and prevent convergence towards local optima. Learners are grouped based on their competency levels, with the more proficient learners collaborating with the teacher solution and population mean to provide more effective guidance to those with lower competence, promoting the discovery of promising CNN architectures.
- To enhance ETLBOCBL-CNN’s robustness against premature convergence, a stochastic peer interaction scheme is introduced in the modified learner phase. This scheme emulates collaborative learning dynamics observed in a classroom, enabling each learner to effectively use available information during the search process by engaging in knowledge sharing and retention with one or multiple peer learners.
- In ETLBOCBL-CNN, a tri-criterion selection scheme is introduced as an enhanced alternative to the conventional greedy selection method. This new selection scheme determines learners’ survival in subsequent iterations by considering their fitness, diversity, and improvement rates. The proposed scheme preserves valuable network information and contributes to long-term population quality improvement by favoring learners with relatively good diversity and commendable fitness improvement, even if their current fitness levels are temporarily lower.
- Extensive simulation studies are performed on image datasets with varying complexity to assess the effectiveness and feasibility of ETLBOCBL-CNN in autonomously discovering optimal CNN architectures. The findings reveal that ETLBOCBL-CNN produces superior CNN architectures, achieving excellent classification performance with reduced complexity compared to state-of-the-art methods on most datasets.

#### 1.4. Paper Outline

## 2. Related Works

#### 2.1. Original TLBO

#### 2.2. CNN

#### 2.3. Existing MSA-Based Network Architecture Design Methods

## 3. Proposed ETLBOCBL-CNN

#### 3.1. Proposed Solution Encoding Scheme

#### 3.2. Population Initialization of ETLBOCBL-CNN

Algorithm 1: Population Initialization of ETLBOCBL-CNN

Input: $N$, ${N}_{min}^{Conv}$, ${N}_{max}^{Conv}$, ${N}_{min}^{Fil}$, ${N}_{max}^{Fil}$, ${S}_{min}^{Ker}$, ${S}_{max}^{Ker}$, ${S}_{min}^{Pool}$, ${S}_{max}^{Pool}$, ${S}_{min}^{Str}$, ${S}_{max}^{Str}$, ${N}_{min}^{FC}$, ${N}_{max}^{FC}$, ${N}_{min}^{Neu}$, ${N}_{max}^{Neu}$, $L{H}_{min}^{Opt}$, $L{H}_{min}^{LR}$, $L{H}_{min}^{Int}$, $L{H}_{min}^{L2}$, $L{H}_{max}^{Opt}$, $L{H}_{max}^{LR}$, $L{H}_{max}^{Int}$, $L{H}_{max}^{L2}$

01: Compute the dimensional size as $D=5{N}_{max}^{Conv}+{N}_{max}^{FC}+6$;
02: Initialize the teacher solution as ${X}^{Teacher}.Pos\leftarrow \varnothing$ and ${X}^{Teacher}.Err\leftarrow \infty$;
03: for $n=1$ to $N$ do
04:  Initialize ${X}_{n}.Pos\leftarrow \varnothing$;
05:  for $d=1$ to $D$ do
06:   if $d==1$ then
07:    Assign ${X}_{n}.{Pos}_{d}$ with ${N}^{Conv}\in \left\{{N}_{min}^{Conv},{N}_{max}^{Conv}\right\}$;
08:   else if $d==2l$ then
09:    Assign ${X}_{n}.{Pos}_{d}$ with ${N}_{l}^{Fil}\in \left\{{N}_{min}^{Fil},{N}_{max}^{Fil}\right\}$ for $l=1,\dots ,{N}_{max}^{Conv}$;
10:   else if $d==2l+1$ then
11:    Assign ${X}_{n}.{Pos}_{d}$ with ${S}_{l}^{Ker}\in \left\{{S}_{min}^{Ker},{S}_{max}^{Ker}\right\}$ for $l=1,\dots ,{N}_{max}^{Conv}$;
12:   else if $d==2{N}_{max}^{Conv}+3l-1$ then
13:    Assign ${X}_{n}.{Pos}_{d}$ with ${P}_{l}^{Pool}\in \left[0,1\right]$ for $l=1,\dots ,{N}_{max}^{Conv}$;
14:   else if $d==2{N}_{max}^{Conv}+3l$ then
15:    Assign ${X}_{n}.{Pos}_{d}$ with ${S}_{l}^{Pool}\in \left\{{S}_{min}^{Pool},{S}_{max}^{Pool}\right\}$ for $l=1,\dots ,{N}_{max}^{Conv}$;
16:   else if $d==2{N}_{max}^{Conv}+3l+1$ then
17:    Assign ${X}_{n}.{Pos}_{d}$ with ${S}_{l}^{Str}\in \left\{{S}_{min}^{Str},{S}_{max}^{Str}\right\}$ for $l=1,\dots ,{N}_{max}^{Conv}$;
18:   else if $d==5{N}_{max}^{Conv}+2$ then
19:    Assign ${X}_{n}.{Pos}_{d}$ with ${N}^{FC}\in \left\{{N}_{min}^{FC},{N}_{max}^{FC}\right\}$;
20:   else if $d==\left(5{N}_{max}^{Conv}+2\right)+q$ then
21:    Assign ${X}_{n}.{Pos}_{d}$ with ${N}_{q}^{Neu}\in \left\{{N}_{min}^{Neu},{N}_{max}^{Neu}\right\}$ for $q=1,\dots ,{N}_{max}^{FC}$;
22:   else if $d==5{N}_{max}^{Conv}+{N}_{max}^{FC}+3$ then
23:    Assign ${X}_{n}.{Pos}_{d}$ with $L{H}^{Opt}\in \left\{L{H}_{min}^{Opt},L{H}_{max}^{Opt}\right\}$;
24:   else if $d==5{N}_{max}^{Conv}+{N}_{max}^{FC}+4$ then
25:    Assign ${X}_{n}.{Pos}_{d}$ with $L{H}^{LR}\in \left\{L{H}_{min}^{LR},L{H}_{max}^{LR}\right\}$;
26:   else if $d==5{N}_{max}^{Conv}+{N}_{max}^{FC}+5$ then
27:    Assign ${X}_{n}.{Pos}_{d}$ with $L{H}^{Int}\in \left\{L{H}_{min}^{Int},L{H}_{max}^{Int}\right\}$;
28:   else if $d==5{N}_{max}^{Conv}+{N}_{max}^{FC}+6$ then
29:    Assign ${X}_{n}.{Pos}_{d}$ with $L{H}^{L2}\in \left\{L{H}_{min}^{L2},L{H}_{max}^{L2}\right\}$;
30:   end if
31:  end for
32:  Evaluate the fitness of ${X}_{n}.Pos$ as ${X}_{n}.Err$ using Algorithm 2;
33:  if ${X}_{n}.Err<{X}^{Teacher}.Err$ then
34:   ${X}^{Teacher}.Pos\leftarrow {X}_{n}.Pos$, ${X}^{Teacher}.Err\leftarrow {X}_{n}.Err$;
35:  end if
36: end for

Output: $P=\left[{X}_{1},\dots ,{X}_{n},\dots ,{X}_{N}\right]$, ${X}^{Teacher}$
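The index arithmetic of Algorithm 1 can be made concrete with a short Python sketch. The bound values used as defaults below are illustrative placeholders, not the paper's settings, and the learning hyperparameters are assumed to be encoded as integer indices:

```python
import random

def init_position(n_conv_max=3, n_fc_max=2,
                  n_conv=(1, 3), fil=(8, 64), ker=(3, 7),
                  pool=(2, 3), stride=(1, 2), n_fc=(1, 2), neu=(16, 128),
                  opt=(0, 3), lr=(0, 3), init=(0, 2), l2=(0, 3)):
    """Sample one learner position; all bounds are illustrative, not the paper's."""
    D = 5 * n_conv_max + n_fc_max + 6
    pos = [0.0] * (D + 1)                  # 1-indexed, as in the pseudocode
    pos[1] = random.randint(*n_conv)       # number of convolutional layers
    for l in range(1, n_conv_max + 1):
        pos[2 * l] = random.randint(*fil)                          # filters of layer l
        pos[2 * l + 1] = random.randint(*ker)                      # kernel size of layer l
        pos[2 * n_conv_max + 3 * l - 1] = random.random()          # pooling probability in [0, 1]
        pos[2 * n_conv_max + 3 * l] = random.randint(*pool)        # pooling size
        pos[2 * n_conv_max + 3 * l + 1] = random.randint(*stride)  # pooling stride
    pos[5 * n_conv_max + 2] = random.randint(*n_fc)                # number of FC layers
    for q in range(1, n_fc_max + 1):
        pos[5 * n_conv_max + 2 + q] = random.randint(*neu)         # neurons of FC layer q
    base = 5 * n_conv_max + n_fc_max
    pos[base + 3] = random.randint(*opt)   # optimizer index
    pos[base + 4] = random.randint(*lr)    # learning-rate index
    pos[base + 5] = random.randint(*init)  # weight-initializer index
    pos[base + 6] = random.randint(*l2)    # L2-regularizer index
    return pos[1:]                         # drop the dummy 0th slot
```

With $N_{max}^{Conv}=3$ and $N_{max}^{FC}=2$, the 23 dimensions $d=1,\dots ,D$ are covered exactly once, and only the pooling-probability dimensions remain real-valued.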

#### 3.3. Fitness Evaluation of ETLBOCBL-CNN

Algorithm 2: Fitness Evaluation of ETLBOCBL-CNN

Inputs: ${X}_{n}.Pos$, ${\mathcal{R}}^{train}$, ${\mathcal{R}}^{valid}$, ${S}^{batch}$, ${\epsilon}^{train}$, ${R}^{L}$, ${C}^{num}$

01: Construct a candidate CNN architecture based on the network and learning hyperparameters decoded from ${X}_{n}.Pos$ and insert a fully connected layer with ${C}^{num}$ output neurons;
02: Compute ${\tau}^{train}$ and ${\tau}^{valid}$ using Equations (4) and (6), respectively;
03: Generate the initial weights of the CNN model as $\mathit{\varpi}=\left\{{\varpi}_{1},{\varpi}_{2},\dots \right\}$ using the selected weight initializer;
04: for $\epsilon =1$ to ${\epsilon}^{train}$ do
05:  for $i=1$ to ${\tau}^{train}$ do
06:   Calculate $f\left(\mathit{\varpi},{\mathcal{R}}_{i}^{train}\right)$ of the CNN model;
07:   Update the weights ${\mathit{\varpi}}^{new}=\left\{{\varpi}_{1}^{new},{\varpi}_{2}^{new},\dots \right\}$ based on Equation (5);
08:  end for
09: end for
10: for $j=1$ to ${\tau}^{valid}$ do
11:  Classify the ${\mathcal{R}}_{j}^{valid}$ batch using the trained CNN model;
12:  Record the classification errors for the ${\mathcal{R}}_{j}^{valid}$ batch as $Err\_Batc{h}_{j}$;
13: end for
14: Calculate ${X}_{n}.Err$ of the candidate CNN architecture built from ${X}_{n}.Pos$ with Equation (7);

Output: ${X}_{n}.Err$
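Under one plausible reading of Equations (4), (6), and (7) — batch counts as ceilings over the split sizes and fitness as the mean validation-batch error — the bookkeeping around the training loop is minimal; the model training itself (lines 03–09) is elided here:

```python
import math

def num_batches(n_samples, batch_size):
    # Assumed form of Equations (4)/(6): mini-batches needed to cover a data split.
    return math.ceil(n_samples / batch_size)

def fitness(err_batches):
    # Assumed form of Equation (7): mean classification error over validation batches.
    return sum(err_batches) / len(err_batches)
```

For instance, a 1050-sample split with a batch size of 100 yields 11 batches per epoch.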

#### 3.4. Modified Teacher Phase of ETLBOCBL-CNN

#### 3.4.1. Construction of Mean Network Architecture Represented by Population Mean

Algorithm 3: Computation of Mean Network Architecture Represented by Population Mean

Input: $P=\left[{X}_{1},\dots ,{X}_{n},\dots ,{X}_{N}\right]$, $N$, $D$

01: $\overline{X}.Mean\leftarrow \varnothing$;
02: for $d=1$ to $D$ do
03:  Compute $\overline{X}.Mea{n}_{d}$ using Equation (8);
04:  if $d\ne 2{N}_{max}^{Conv}+3l-1$ for all $l=1,\dots ,{N}_{max}^{Conv}$ then
05:   $\overline{X}.Mea{n}_{d}\leftarrow Round\left(\overline{X}.Mea{n}_{d}\right)$;
06:  end if
07: end for

Output: $\overline{X}.Mean$
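Assuming Equation (8) is the per-dimension arithmetic mean over all learners, Algorithm 3 reduces to a mean followed by selective rounding: every dimension is rounded to the nearest integer except the real-valued pooling probabilities at indices $2{N}_{max}^{Conv}+3l-1$. A minimal sketch under that assumption:

```python
def is_pool_prob_dim(d, n_conv_max):
    # Dimensions 2*Nmax^Conv + 3l - 1 hold pooling probabilities and stay real-valued.
    return any(d == 2 * n_conv_max + 3 * l - 1 for l in range(1, n_conv_max + 1))

def population_mean(population, n_conv_max):
    # Assuming Equation (8) is the arithmetic mean over all learners, per dimension.
    D = len(population[0])
    mean = []
    for d in range(1, D + 1):
        m = sum(x[d - 1] for x in population) / len(population)
        mean.append(m if is_pool_prob_dim(d, n_conv_max) else round(m))
    return mean
```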

#### 3.4.2. Construction of New CNN Architecture Using Competency-Based Learning

- **Scenario 1:** ${X}_{n}.Grp$ is assigned a group index $g\ge 3$ for the n-th learner with $n=2{S}^{Group}+1,\dots ,G{S}^{Group}$; at least two groups of learners perform better than ${X}_{n}$.
- **Scenario 2:** ${X}_{n}.Grp$ is assigned the group index $g=2$ for the n-th learner with $n={S}^{Group}+1,\dots ,2{S}^{Group}$; only one group of learners performs better than ${X}_{n}$.
- **Scenario 3:** ${X}_{n}.Grp$ is assigned the group index $g=1$ for the n-th learner with $n=1,\dots ,{S}^{Group}$; no learners from any other group perform better than ${X}_{n}$.

Algorithm 4: Competency-Based Learning in ETLBOCBL-CNN’s Modified Teacher Phase

Inputs: $P=\left[{X}_{1},\dots ,{X}_{n},\dots ,{X}_{N}\right]$, $N$, $D$, ${X}^{Teacher}$, ${\mathcal{R}}^{train}$, ${\mathcal{R}}^{valid}$, ${S}^{batch}$, ${\epsilon}^{train}$, ${R}^{L}$, ${C}^{num}$, ${S}^{Group}$, $G$

01: Initialize the offspring population set as ${P}^{off}\leftarrow \varnothing$;
02: Calculate the population mean $\overline{X}.Mean$ using Algorithm 3;
03: Sort all solution members of $P$ in ascending order of their fitness values ${X}_{n}.Err$;
04: Determine the group index $g$ assigned to ${X}_{n}.Grp$ of all sorted learners using Equations (9) and (10);
05: for $n=1$ to $N$ do
06:  Initialize the n-th offspring learner as ${X}_{n}^{off}\leftarrow \varnothing$;
07:  if ${X}_{n}.Grp\ge 3$ then
08:   Randomly select two better group indices ${g}_{r1},{g}_{r2}\in \left\{1,g-1\right\}$, where ${g}_{r1}<{g}_{r2}<g$;
09:   Randomly select two predominant learners with the population indices ${n}_{r1}^{{g}_{r1}}\in \left\{\left({g}_{r1}-1\right){S}^{Group},{g}_{r1}{S}^{Group}\right\}$ and ${n}_{r2}^{{g}_{r2}}\in \left\{\left({g}_{r2}-1\right){S}^{Group},{g}_{r2}{S}^{Group}\right\}$;
10:   Calculate ${X}_{n}^{off}.Pos$ using Equation (11);
11:  else if ${X}_{n}.Grp=2$ then
12:   Randomly select two predominant learners from the first group (i.e., $g=1$) with the population indices ${n}_{r1}^{1},{n}_{r2}^{1}\in \left\{1,{S}^{Group}\right\}$ and ${n}_{r1}^{1}\ne {n}_{r2}^{1}$, where ${X}_{{n}_{r1}^{1}}.Grp={X}_{{n}_{r2}^{1}}.Grp=1$;
13:   Compare the fitness values of the two predominant learners, i.e., ${X}_{{n}_{r1}^{1}}.Err$ and ${X}_{{n}_{r2}^{1}}.Err$;
14:   Calculate ${X}_{n}^{off}.Pos$ using Equation (12);
15:  else if ${X}_{n}.Grp=1$ then
16:   Calculate ${X}_{n}^{off}.Pos$ using Equation (13);
17:  end if
18:  for $d=1$ to $D$ do
19:   if $d\ne 2{N}_{max}^{Conv}+3l-1$ for all $l=1,\dots ,{N}_{max}^{Conv}$ then
20:    ${X}_{n}^{off}.{Pos}_{d}\leftarrow Round\left({X}_{n}^{off}.{Pos}_{d}\right)$;
21:   end if
22:  end for
23:  Perform fitness evaluation on ${X}_{n}^{off}.Pos$ to obtain ${X}_{n}^{off}.Err$ using Algorithm 2;
24:  if ${X}_{n}^{off}.Err<{X}^{Teacher}.Err$ then
25:   ${X}^{Teacher}.Pos\leftarrow {X}_{n}^{off}.Pos$, ${X}^{Teacher}.Err\leftarrow {X}_{n}^{off}.Err$;
26:  end if
27:  ${P}^{off}\leftarrow {P}^{off}\cup {X}_{n}^{off}$;
28: end for

Outputs: ${P}^{off}=\left[{X}_{1}^{off},\dots ,{X}_{n}^{off},\dots ,{X}_{N}^{off}\right]$, $P=\left[{X}_{1},\dots ,{X}_{n},\dots ,{X}_{N}\right]$, ${X}^{Teacher}$
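The group assignment on lines 03–04 can be sketched in a few lines of Python. Equations (9) and (10) are assumed here to assign every consecutive block of ${S}^{Group}$ sorted learners the same group index, which matches the three scenarios above; this reading is an assumption, not the paper's exact formulation:

```python
import math

def assign_groups(errors, group_size):
    """Sort learners by error (ascending) and give every block of `group_size`
    learners the same group index; assumed form of Equations (9)-(10)."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    groups = [0] * len(errors)
    for rank, i in enumerate(order, start=1):      # rank 1 = fittest learner
        groups[i] = math.ceil(rank / group_size)   # group 1 = most competent block
    return groups
```

For example, with errors `[0.3, 0.1, 0.5, 0.2]` and a group size of 2, the two fittest learners (errors 0.1 and 0.2) land in group 1 and the other two in group 2.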

#### 3.5. Modified Learner Phase of ETLBOCBL-CNN

#### 3.6. Tri-Criterion Selection Scheme

Algorithm 5: Stochastic Peer Interaction in ETLBOCBL-CNN’s Modified Learner Phase

Inputs: $N$, $D$, ${P}^{off}=\left[{X}_{1}^{off},\dots ,{X}_{n}^{off},\dots ,{X}_{N}^{off}\right]$, ${X}^{Teacher}$, ${\mathcal{R}}^{train}$, ${\mathcal{R}}^{valid}$, ${S}^{batch}$, ${\epsilon}^{train}$, ${R}^{L}$, ${C}^{num}$

01: Initialize the clone population set as ${P}^{clone}\leftarrow \varnothing$;
02: Construct ${P}^{clone}$ by duplicating ${P}^{off}$ and sorting the offspring learners in ascending order of their fitness values ${X}_{n}^{clone}.Err$ for $n=1,\dots ,N$;
03: Construct ${P}^{T20}$ and ${P}^{T50}$ by extracting the top 20% and 50% of the offspring learners stored in ${P}^{clone}$;
04: for $n=1$ to $N$ do
05:  for $d=1$ to $D$ do
06:   Randomly generate $rand\in \left[0,1\right]$ from a uniform distribution;
07:   if $0\le rand<1/3$ then
08:    Randomly select ${X}_{p}^{T20}$ and ${X}_{q}^{T20}$ from ${P}^{T20}$, where $p\ne q\ne n$;
09:    Update ${X}_{n}^{off}.{Pos}_{d}$ using Equation (14);
10:   else if $1/3\le rand<2/3$ then
11:    Randomly select ${X}_{r}^{T50}$ from ${P}^{T50}$, where $r\ne n$;
12:    Update ${X}_{n}^{off}.{Pos}_{d}$ using Equation (15);
13:   else if $2/3\le rand\le 1$ then
14:    Retain the original value of ${X}_{n}^{off}.{Pos}_{d}$;
15:   end if
16:   if $d\ne 2{N}_{max}^{Conv}+3l-1$ for all $l=1,\dots ,{N}_{max}^{Conv}$ then
17:    ${X}_{n}^{off}.{Pos}_{d}\leftarrow Round\left({X}_{n}^{off}.{Pos}_{d}\right)$;
18:   end if
19:  end for
20:  Perform fitness evaluation on the updated ${X}_{n}^{off}.Pos$ to obtain the new ${X}_{n}^{off}.Err$ using Algorithm 2;
21:  if ${X}_{n}^{off}.Err<{X}^{Teacher}.Err$ then
22:   ${X}^{Teacher}.Pos\leftarrow {X}_{n}^{off}.Pos$, ${X}^{Teacher}.Err\leftarrow {X}_{n}^{off}.Err$;
23:  end if
24: end for

Output: Updated ${P}^{off}=\left[{X}_{1}^{off},\dots ,{X}_{n}^{off},\dots ,{X}_{N}^{off}\right]$ and ${X}^{Teacher}$
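The per-dimension three-way branch of Algorithm 5 can be sketched as follows. Equations (14) and (15) are not reproduced; the `blend` callable stands in for them and is purely an assumption, as is the omission of the $p\ne q\ne n$ and $r\ne n$ index exclusions:

```python
import random

def peer_update(pos, top20, top50, blend=None):
    """Dimension-wise stochastic peer interaction sketch. `blend` is a stand-in
    for Equations (14)/(15); the n-exclusion constraints are omitted for brevity."""
    if blend is None:
        # Placeholder update rule: move toward the peer(s) by a random step.
        blend = lambda x, a, b=None: x + random.random() * ((a - x) if b is None else (a - b))
    new = list(pos)
    for d in range(len(pos)):
        r = random.random()
        if r < 1 / 3:                # learn from two distinct top-20% peers
            p, q = random.sample(range(len(top20)), 2)
            new[d] = blend(pos[d], top20[p][d], top20[q][d])
        elif r < 2 / 3:              # learn from one top-50% peer
            s = random.randrange(len(top50))
            new[d] = blend(pos[d], top50[s][d])
        # else: retain the original value of dimension d
    return new
```

Because the branch is drawn independently per dimension, each offspring mixes retained, top-20%-guided, and top-50%-guided components within a single position vector.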

Algorithm 6: Tri-Criterion Selection Scheme

Inputs: $N$, $P=\left[{X}_{1},\dots ,{X}_{n},\dots ,{X}_{N}\right]$, ${P}^{off}=\left[{X}_{1}^{off},\dots ,{X}_{n}^{off},\dots ,{X}_{N}^{off}\right]$

01: Initialize ${P}^{Next}\leftarrow \varnothing$;
02: for $n=1$ to $N$ do
03:  Assign ${X}_{n}.Impr=0$ to each n-th original learner stored in $P$;
04:  Calculate ${X}_{n}^{off}.Impr$ of every n-th offspring learner stored in ${P}^{off}$ with Equation (16);
05: end for
06: Construct the merged population ${P}^{MG}$ using Equation (17);
07: Sort the solution members in ${P}^{MG}$ in ascending order of fitness values;
08: for $n=1$ to $2N$ do
09:  Calculate ${X}_{n}^{MG}.Dis$ of every n-th solution stored in ${P}^{MG}$ with Equation (18);
10: end for
11: Randomly generate the integers ${K}_{1}\in \left\{1,N\right\}$ and ${K}_{2}\in \left\{1,N-{K}_{1}\right\}$, and set ${K}_{3}=N-{K}_{1}-{K}_{2}$;
12: for $n=1$ to ${K}_{1}$ do /*Fitness criterion*/
13:  ${X}_{n}^{Next}\leftarrow {X}_{n}^{MG}$;
14:  ${P}^{Next}\leftarrow {P}^{Next}\cup {X}_{n}^{Next}$;
15: end for
16: for $n={K}_{1}+1$ to $2N$ do
17:  Randomly generate $\alpha$ from the normal distribution $N\left(0.9,0.05\right)$;
18:  Restrict the value of $\alpha$ to between 0.8 and 1;
19:  Compute ${X}_{n}^{MG}.WF$ of each n-th solution stored in ${P}^{MG}$ with Equation (19);
20:  Initialize the flag variable of each n-th solution stored in ${P}^{MG}$ as ${X}_{n}^{MG}.Flag=0$;
21: end for
22: for $n={K}_{1}+1$ to ${K}_{1}+{K}_{2}$ do /*Diversity criterion*/
23:  Randomly select ${X}_{a}^{MG}$ and ${X}_{b}^{MG}$ from ${P}^{MG}$, where $a,b\in \left\{{K}_{1}+1,2N\right\}$, $a\ne b$, and ${X}_{a}^{MG}.Flag={X}_{b}^{MG}.Flag=0$;
24:  Determine ${X}_{n}^{Next}$ with Equation (20);
25:  ${P}^{Next}\leftarrow {P}^{Next}\cup {X}_{n}^{Next}$;
26:  if ${X}_{a}^{MG}$ is selected as ${X}_{n}^{Next}$ then /*Prevent the selection of the same solution members*/
27:   ${X}_{a}^{MG}.Flag=1$;
28:  else if ${X}_{b}^{MG}$ is selected as ${X}_{n}^{Next}$ then
29:   ${X}_{b}^{MG}.Flag=1$;
30:  end if
31: end for
32: for $n={K}_{1}+{K}_{2}+1$ to $N$ do /*Improvement rate criterion*/
33:  Randomly select ${X}_{e}^{MG}$ and ${X}_{f}^{MG}$ from ${P}^{MG}$, where $e,f\in \left\{{K}_{1}+1,2N\right\}$, $e\ne f$, and ${X}_{e}^{MG}.Flag={X}_{f}^{MG}.Flag=0$;
34:  Determine ${X}_{n}^{Next}$ using Equation (21);
35:  ${P}^{Next}\leftarrow {P}^{Next}\cup {X}_{n}^{Next}$;
36:  if ${X}_{e}^{MG}$ is selected as ${X}_{n}^{Next}$ then /*Prevent the selection of the same solution members*/
37:   ${X}_{e}^{MG}.Flag=1$;
38:  else if ${X}_{f}^{MG}$ is selected as ${X}_{n}^{Next}$ then
39:   ${X}_{f}^{MG}.Flag=1$;
40:  end if
41: end for

Output: ${P}^{Next}=\left[{X}_{1}^{Next},\dots ,{X}_{n}^{Next},\dots ,{X}_{N}^{Next}\right]$
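The quota split and the fitness-criterion stage of Algorithm 6 can be sketched directly; the diversity and improvement-rate tournaments (Equations (18)–(21)) are replaced here by a plain random draw without replacement, which is a placeholder, not the paper's selection rule:

```python
import random

def split_quota(N, rng=random):
    """K1 + K2 + K3 = N, with K1 in {1,...,N} and K2 in {1,...,N-K1} when room remains."""
    K1 = rng.randint(1, N)
    K2 = rng.randint(1, N - K1) if N - K1 >= 1 else 0
    K3 = N - K1 - K2
    return K1, K2, K3

def select_next(merged_sorted, N, rng=random):
    """merged_sorted: 2N solutions sorted ascending by fitness (best first)."""
    K1, K2, K3 = split_quota(N, rng)
    survivors = merged_sorted[:K1]                # fitness criterion: best K1 overall
    pool = list(range(K1, len(merged_sorted)))    # remaining candidate indices
    rng.shuffle(pool)
    # Placeholder for the diversity and improvement-rate tournaments:
    # draw K2 + K3 distinct leftovers without replacement.
    survivors += [merged_sorted[i] for i in pool[:K2 + K3]]
    return survivors
```

Since ${K}_{1}\ge 1$, at least the single best solution of the merged population always survives, while the random split of ${K}_{2}$ and ${K}_{3}$ varies how much weight the diversity and improvement-rate criteria receive in each iteration.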

#### 3.7. Complete Mechanisms of ETLBOCBL-CNN

Algorithm 7: Proposed ETLBOCBL-CNN

Inputs: $N$, $D$, ${T}^{max}$, ${\mathcal{R}}^{train}$, ${\mathcal{R}}^{valid}$, ${S}^{batch}$, ${\epsilon}^{train}$, ${\epsilon}^{FT}$, ${R}^{L}$, ${C}^{num}$, ${N}_{min}^{Conv}$, ${N}_{max}^{Conv}$, ${N}_{min}^{Fil}$, ${N}_{max}^{Fil}$, ${S}_{min}^{Ker}$, ${S}_{max}^{Ker}$, ${S}_{min}^{Pool}$, ${S}_{max}^{Pool}$, ${S}_{min}^{Str}$, ${S}_{max}^{Str}$, ${N}_{min}^{FC}$, ${N}_{max}^{FC}$, ${N}_{min}^{Neu}$, ${N}_{max}^{Neu}$, ${S}^{Group}$, $G$

01: Load ${\mathcal{R}}^{train}$ and ${\mathcal{R}}^{valid}$ from the directory;
02: Initialize the population $P=\left[{X}_{1},\dots ,{X}_{n},\dots ,{X}_{N}\right]$ using Algorithm 1;
03: Initialize the iteration counter as $t\leftarrow 0$;
04: while $t<{T}^{max}$ do
05:  Generate ${P}^{off}$ and update ${X}^{Teacher}$ using the modified teacher phase (Algorithm 4);
06:  Update ${P}^{off}$ and ${X}^{Teacher}$ using the modified learner phase (Algorithm 5);
07:  Determine ${P}^{Next}$ using the tri-criterion selection scheme (Algorithm 6);
08:  $P\leftarrow {P}^{Next}$;
09:  $t\leftarrow t+1$;
10: end while
11: Fully train the CNN architecture constructed from ${X}^{Teacher}.Pos$ with the larger ${\epsilon}^{FT}$ (Algorithm 2);

Output: ${X}^{Teacher}$ and its corresponding optimal CNN architecture
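The overall control flow of Algorithm 7 can be expressed generically, with the initialization, evaluation, phase, and selection operators passed in as callables. This is only the outer loop; the callables are stand-ins for Algorithms 1–6, not implementations of them:

```python
def etlbocbl_cnn(evaluate, init, teacher_phase, learner_phase, select, N, T_max):
    """High-level control flow of Algorithm 7; the five callables are stand-ins
    for Algorithms 1-6 and must be supplied by the caller."""
    pop = [init() for _ in range(N)]
    errs = [evaluate(x) for x in pop]
    best = min(zip(errs, pop))                        # (error, position) of the teacher
    for _ in range(T_max):
        off = teacher_phase(pop, errs)                # modified teacher phase (Alg. 4)
        off = learner_phase(off)                      # modified learner phase (Alg. 5)
        off_errs = [evaluate(x) for x in off]
        best = min([best, min(zip(off_errs, off))])   # keep the best-ever teacher
        pop, errs = select(pop, errs, off, off_errs)  # tri-criterion selection (Alg. 6)
    return best
```

Plugging in toy operators (e.g., minimizing a sum of squares with a contraction as the "teacher phase") confirms that the loop monotonically improves the retained teacher solution.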

## 4. Performance Evaluation of ETLBOCBL-CNN

#### 4.1. Benchmark Dataset Selection

#### 4.2. Simulation Settings

#### 4.3. Performance Analyses

#### 4.3.1. Comparison in Classifying the First Eight Benchmark Datasets

#### 4.3.2. Comparison in Classifying the MNIST-Fashion Datasets

#### 4.3.3. Optimal Network and Learning Hyperparameters Obtained by ETLBOCBL-CNN

#### 4.4. Discussion

#### 4.4.1. Impact of Proposed Modifications in ETLBOCBL-CNN

#### 4.4.2. Qualitative Complexity Analysis of ETLBOCBL-CNN

#### 4.4.3. Quantitative Complexity Analysis of ETLBOCBL-CNN

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv
**2017**, arXiv:1708.07747. [Google Scholar] - Bruna, J.; Mallat, S. Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell.
**2013**, 35, 1872–1886. [Google Scholar] [CrossRef] - Chan, T.-H.; Jia, K.; Gao, S.; Lu, J.; Zeng, Z.; Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Trans. Image Process.
**2015**, 24, 5017–5032. [Google Scholar] [CrossRef] [PubMed] - Rifai, S.; Vincent, P.; Muller, X.; Glorot, X.; Bengio, Y. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the International Conference on Machine Learning, Bellevue, WA, USA, 28 June 2011; pp. 833–840. [Google Scholar]
- Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G. Evolving deep convolutional neural networks for image classification. IEEE Trans. Evol. Comput.
**2019**, 24, 394–407. [Google Scholar] [CrossRef] - Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM
**2017**, 60, 84–90. [Google Scholar] [CrossRef] - Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv
**2016**, arXiv:1602.07360. [Google Scholar] - Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput.
**2011**, 1, 3–18. [Google Scholar] [CrossRef] - Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv
**2014**, arXiv:1412.6806. [Google Scholar]

**Figure 7.** Visualization of the competency-based learning concept introduced into ETLBOCBL-CNN's modified teacher phase. Colored dots represent learners assigned to different groups.

**Figure 8.** Sample images of the datasets: (**a**) MNIST, (**b**) MNIST-RD, (**c**) MNIST-RB, (**d**) MNIST-BI, (**e**) MNIST-RD + BI, (**f**) Rectangles, (**g**) Rectangles-I, (**h**) Convex, and (**i**) Fashion.

**Figure 9.** Test errors obtained by ETLBOCBL-CNN while solving the eight datasets: (**a**) MNIST, (**b**) MNIST-RD, (**c**) MNIST-RB, (**d**) MNIST-BI, (**e**) MNIST-RD + BI, (**f**) Rectangles, (**g**) Rectangles-I, and (**h**) Convex.

| Section | Hyperparameter | Value |
|---|---|---|
| Convolution | Lower limit of convolutional layers, ${N}_{min}^{Conv}$ | 1 |
| | Upper limit of convolutional layers, ${N}_{max}^{Conv}$ | 3 |
| | Lower limit of filter numbers, ${N}_{min}^{Fil}$ | 3 |
| | Upper limit of filter numbers, ${N}_{max}^{Fil}$ | 256 |
| | Lower limit of kernel size, ${S}_{min}^{Ker}$ | $3\times 3$ |
| | Upper limit of kernel size, ${S}_{max}^{Ker}$ | $9\times 9$ |
| Pooling | Lower limit of pooling size, ${S}_{min}^{Pool}$ | $1\times 1$ |
| | Upper limit of pooling size, ${S}_{max}^{Pool}$ | $3\times 3$ |
| | Lower limit of stride size, ${S}_{min}^{Str}$ | $1\times 1$ |
| | Upper limit of stride size, ${S}_{max}^{Str}$ | $2\times 2$ |
| Fully connected | Lower limit of fully connected layer number, ${N}_{min}^{FC}$ | 1 |
| | Upper limit of fully connected layer number, ${N}_{max}^{FC}$ | 2 |
| | Lower limit of neuron numbers, ${N}_{min}^{Neu}$ | 1 |
| | Upper limit of neuron numbers, ${N}_{max}^{Neu}$ | 300 |
| Training | Lower limit of integer indices to select optimizer type, $L{H}_{min}^{Opt}$ | 1 |
| | Upper limit of integer indices to select optimizer type, $L{H}_{max}^{Opt}$ | 5 |
| | Lower limit of integer indices to select learning rate, $L{H}_{min}^{LR}$ | 1 |
| | Upper limit of integer indices to select learning rate, $L{H}_{max}^{LR}$ | 5 |
| | Lower limit of integer indices to select initializer type, $L{H}_{min}^{Int}$ | 1 |
| | Upper limit of integer indices to select initializer type, $L{H}_{max}^{Int}$ | 5 |
| | Lower limit of integer indices to select L2-regularizer, $L{H}_{min}^{L2}$ | 1 |
| | Upper limit of integer indices to select L2-regularizer, $L{H}_{max}^{L2}$ | 5 |
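The search-space bounds above can be sketched as a simple bounds table from which candidate learners are sampled. This is an illustrative sketch, not the paper's implementation; all names are hypothetical, and sizes given as $k\times k$ are stored as the scalar $k$:

```python
import random

# Hypothetical encoding of the ETLBOCBL-CNN search-space bounds from the table above.
SEARCH_SPACE = {
    "n_conv":   (1, 3),     # number of convolutional layers
    "n_filter": (3, 256),   # filters per convolutional layer
    "kernel":   (3, 9),     # kernel size k (k x k)
    "pool":     (1, 3),     # pooling size
    "stride":   (1, 2),     # pooling stride
    "n_fc":     (1, 2),     # number of fully connected layers
    "n_neuron": (1, 300),   # neurons per fully connected layer
    "lh_opt":   (1, 5),     # integer index selecting optimizer type
    "lh_lr":    (1, 5),     # integer index selecting learning rate
    "lh_init":  (1, 5),     # integer index selecting initializer type
    "lh_l2":    (1, 5),     # integer index selecting L2-regularizer
}

def sample_learner(rng=random):
    """Draw one random learner (candidate configuration) within the stated bounds."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}

learner = sample_learner()
assert all(SEARCH_SPACE[k][0] <= v <= SEARCH_SPACE[k][1] for k, v in learner.items())
```

Keeping every gene an integer within fixed bounds is what lets the optimizer explore architectures of varying depth while guaranteeing each sampled learner decodes to a valid CNN.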

| Integer Index | Optimizer Type $L{H}^{Opt}$ | Learning Rate $L{H}^{LR}$ | Initializer Type $L{H}^{Int}$ | L2-Regularizer $L{H}^{L2}$ |
|---|---|---|---|---|
| 1 | Adadelta [63] | 0.0001 | Glorot Normal [64] | 0.001 |
| 2 | Adagrad [65] | 0.0005 | Glorot Uniform [64] | 0.005 |
| 3 | Adam [66] | 0.001 | He Normal [67] | 0.01 |
| 4 | Adamax [68] | 0.005 | He Uniform [67] | 0.05 |
| 5 | SGD [69] | 0.01 | Random Uniform | 0.1 |
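The lookup table above maps each learning-hyperparameter gene, an integer in {1, …, 5}, to a concrete training setting. A minimal decoder sketch (function and variable names are illustrative, not from the paper):

```python
# Hypothetical decoder for the integer-index lookup table above.
OPTIMIZERS   = ["Adadelta", "Adagrad", "Adam", "Adamax", "SGD"]
LEARN_RATES  = [0.0001, 0.0005, 0.001, 0.005, 0.01]
INITIALIZERS = ["Glorot Normal", "Glorot Uniform", "He Normal",
                "He Uniform", "Random Uniform"]
L2_FACTORS   = [0.001, 0.005, 0.01, 0.05, 0.1]

def decode_learning_genes(lh_opt, lh_lr, lh_init, lh_l2):
    """Map four 1-based integer indices to concrete training hyperparameters."""
    return {
        "optimizer":     OPTIMIZERS[lh_opt - 1],
        "learning_rate": LEARN_RATES[lh_lr - 1],
        "initializer":   INITIALIZERS[lh_init - 1],
        "l2":            L2_FACTORS[lh_l2 - 1],
    }

# Example: the genes reported for the best MNIST model in Table 10
# (LH^Opt = 3, LH^LR = 3, LH^Int = 1, LH^L2 = 1) decode to
# Adam, learning rate 0.001, Glorot Normal, L2 factor 0.001.
cfg = decode_learning_genes(3, 3, 1, 1)
```

Encoding categorical and discretized continuous choices as small integers keeps the whole chromosome homogeneous, so the same teacher/learner update rules apply to network and learning hyperparameters alike.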

| Dataset | Total No. of Images | No. of Training Images | No. of Testing Images | Input Size | No. of Output Classes |
|---|---|---|---|---|---|
| MNIST | 70,000 | 60,000 | 10,000 | $28\times 28\times 1$ | 10 |
| MNIST-RD | 62,000 | 12,000 | 50,000 | $28\times 28\times 1$ | 10 |
| MNIST-RB | 62,000 | 12,000 | 50,000 | $28\times 28\times 1$ | 10 |
| MNIST-BI | 62,000 | 12,000 | 50,000 | $28\times 28\times 1$ | 10 |
| MNIST-RD + BI | 62,000 | 12,000 | 50,000 | $28\times 28\times 1$ | 10 |
| Rectangles | 51,200 | 1200 | 50,000 | $28\times 28\times 1$ | 2 |
| Rectangles-I | 62,000 | 12,000 | 50,000 | $28\times 28\times 1$ | 2 |
| Convex | 58,000 | 8000 | 50,000 | $28\times 28\times 1$ | 2 |
| Fashion | 70,000 | 60,000 | 10,000 | $28\times 28\times 1$ | 10 |

| Parameter | Value |
|---|---|
| Maximum iteration number, ${T}^{max}$ | 10 |
| Population size, $N$ | 20 |
| Dimension size, $D$ | 23 |
| Lower limit of convolutional layer numbers, ${N}_{min}^{Conv}$ | 1 |
| Upper limit of convolutional layer numbers, ${N}_{max}^{Conv}$ | 3 |
| Lower limit of filter number, ${N}_{min}^{Fil}$ | 3 |
| Upper limit of filter number, ${N}_{max}^{Fil}$ | 256 |
| Lower limit of kernel size, ${S}_{min}^{Ker}$ | $3\times 3$ |
| Upper limit of kernel size, ${S}_{max}^{Ker}$ | $9\times 9$ |
| Lower limit of pooling size, ${S}_{min}^{Pool}$ | $1\times 1$ |
| Upper limit of pooling size, ${S}_{max}^{Pool}$ | $3\times 3$ |
| Lower limit of stride size, ${S}_{min}^{Str}$ | $1\times 1$ |
| Upper limit of stride size, ${S}_{max}^{Str}$ | $2\times 2$ |
| Lower limit of fully connected layer numbers, ${N}_{min}^{FC}$ | 1 |
| Upper limit of fully connected layer numbers, ${N}_{max}^{FC}$ | 2 |
| Lower limit of neuron numbers, ${N}_{min}^{Neu}$ | 1 |
| Upper limit of neuron numbers, ${N}_{max}^{Neu}$ | 300 |
| Lower limit of integer index to select optimizer type, $L{H}_{min}^{Opt}$ | 1 |
| Upper limit of integer index to select optimizer type, $L{H}_{max}^{Opt}$ | 5 |
| Lower limit of integer index to select learning rate, $L{H}_{min}^{LR}$ | 1 |
| Upper limit of integer index to select learning rate, $L{H}_{max}^{LR}$ | 5 |
| Lower limit of integer index to select initializer type, $L{H}_{min}^{Int}$ | 1 |
| Upper limit of integer index to select initializer type, $L{H}_{max}^{Int}$ | 5 |
| Lower limit of integer index to select L2-regularizer, $L{H}_{min}^{L2}$ | 1 |
| Upper limit of integer index to select L2-regularizer, $L{H}_{max}^{L2}$ | 5 |
| Inclusion of batch normalization | Yes |
| Dropout rate | 0.5 |
| Epoch number for the fitness evaluation of learners, ${\epsilon}^{train}$ | 1 |
| Epoch number for the full training of the best learner returned, ${\epsilon}^{FT}$ | 100 |

**Table 5.** Classification accuracies obtained by ETLBOCBL-CNN and its peers when tackling the eight selected datasets.

| Algorithm | MNIST | MNIST-RD | MNIST-RB | MNIST-BI | MNIST-RD + BI |
|---|---|---|---|---|---|
| ScatNet-2 | 98.73% (+) | 92.52% (+) | 87.70% (+) | 81.60% (+) | 49.52% (+) |
| LDANet-2 | 98.95% (+) | 92.48% (+) | 93.19% (+) | 87.58% (+) | 61.46% (+) |
| PCANet-2 | 98.60% (+) | 91.48% (+) | 93.15% (+) | 88.45% (+) | 64.14% (+) |
| RandNet-2 | 98.75% (+) | 91.53% (+) | 86.53% (+) | 88.35% (+) | 56.31% (+) |
| NNet | 95.31% (+) | 81.89% (+) | 79.96% (+) | 72.59% (+) | 37.84% (+) |
| CAE-1 | 98.60% (+) | 95.48% (+) | 93.19% (+) | 87.58% (+) | 61.46% (+) |
| CAE-2 | 97.52% (+) | 90.34% (+) | 89.10% (+) | 84.50% (+) | 54.77% (+) |
| DBN-3 | 96.89% (+) | 89.70% (+) | 93.27% (+) | 83.69% (+) | 52.61% (+) |
| SAA-3 | 96.54% (+) | 89.70% (+) | 88.72% (+) | 77.00% (+) | 48.07% (+) |
| SVM + Poly | 96.31% (+) | 84.58% (+) | 83.38% (+) | 75.99% (+) | 43.59% (+) |
| SVM + RBF | 96.97% (+) | 88.89% (+) | 85.42% (+) | 77.49% (+) | 44.82% (+) |
| EvoCNN | 98.82% (+) | 94.78% (+) | 97.20% (+) | 95.47% (+) | 64.97% (+) |
| psoCNN | 99.51% (+) | 94.56% (+) | 97.61% (+) | 96.87% (+) | 81.05% (+) |
| ETLBOCBL-CNN (Best) | 99.72% | 96.67% | 98.28% | 97.22% | 83.45% |
| ETLBOCBL-CNN (Mean) | 99.66% | 95.65% | 98.00% | 96.85% | 81.72% |

| Algorithm | Rectangles | Rectangles-I | Convex | w/t/l | #BCA |
|---|---|---|---|---|---|
| ScatNet-2 | 99.99% (=) | 91.98% (+) | 93.50% (+) | 7/1/0 | 1 |
| LDANet-2 | 99.86% (+) | 83.80% (+) | 92.78% (+) | 8/0/0 | 0 |
| PCANet-2 | 99.51% (+) | 86.61% (+) | 95.81% (+) | 8/0/0 | 0 |
| RandNet-2 | 99.91% (+) | 83.00% (+) | 94.55% (+) | 8/0/0 | 0 |
| NNet | 92.84% (+) | 66.80% (+) | 67.75% (+) | 8/0/0 | 0 |
| CAE-1 | 99.86% (+) | 83.80% (+) | NA | 7/0/0 | 0 |
| CAE-2 | 98.46% (+) | 78.00% (+) | NA | 7/0/0 | 0 |
| DBN-3 | 97.39% (+) | 77.50% (+) | 81.37% (+) | 8/0/0 | 0 |
| SAA-3 | 97.59% (+) | 75.95% (+) | 81.59% (+) | 8/0/0 | 0 |
| SVM + Poly | 97.85% (+) | 75.95% (+) | 80.18% (+) | 8/0/0 | 0 |
| SVM + RBF | 97.85% (+) | 75.96% (+) | 80.87% (+) | 8/0/0 | 0 |
| EvoCNN | 99.99% (=) | 94.97% (+) | 95.18% (+) | 7/1/0 | 1 |
| psoCNN | 99.93% (+) | 96.03% (+) | 97.74% (+) | 8/0/0 | 0 |
| ETLBOCBL-CNN (Best) | 99.99% | 97.41% | 98.35% | NA | 8 |
| ETLBOCBL-CNN (Mean) | 99.97% | 96.02% | 97.76% | NA | NA |
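The w/t/l column above is a simple tally: each "(+)", "(=)", or "(−)" marker records whether ETLBOCBL-CNN's best result won, tied, or lost against that peer on one dataset. A small sketch of the bookkeeping (helper names are illustrative):

```python
# Tally the win/tie/loss markers attached to a peer algorithm's per-dataset results.
def tally_wtl(markers):
    """Count wins/ties/losses from a list of '+', '=', '-' markers."""
    wins = markers.count("+")
    ties = markers.count("=")
    losses = markers.count("-")
    return f"{wins}/{ties}/{losses}"

# ScatNet-2's eight markers across both halves of Table 5:
# seven '+' (ETLBOCBL-CNN wins) and one '=' (tie on Rectangles).
scatnet2 = ["+", "+", "+", "+", "+", "=", "+", "+"]
assert tally_wtl(scatnet2) == "7/1/0"
```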

| ETLBOCBL-CNN vs. | ${R}^{+}$ | ${R}^{-}$ | p Value | h Value |
|---|---|---|---|---|
| ScatNet-2 | 28.0 | 0.0 | $1.42\times {10}^{-2}$ | + |
| LDANet-2 | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| PCANet-2 | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| RandNet-2 | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| NNet | 36.0 | 0.0 | $8.37\times {10}^{-3}$ | + |
| DBN-3 | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| SAA-3 | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| SVM + Poly | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| SVM + RBF | 36.0 | 0.0 | $9.58\times {10}^{-3}$ | + |
| EvoCNN | 28.0 | 0.0 | $1.42\times {10}^{-2}$ | + |
| psoCNN | 36.0 | 0.0 | $8.37\times {10}^{-3}$ | + |

| Algorithm | Ranking | Chi-Square Statistic | p Value |
|---|---|---|---|
| ScatNet-2 | 5.8125 | 76.658654 | $0.00\times {10}^{0}$ |
| LDANet-2 | 5.2500 | | |
| PCANet-2 | 5.2500 | | |
| RandNet-2 | 6.0000 | | |
| NNet | 12.0000 | | |
| DBN-3 | 7.9375 | | |
| SAA-3 | 9.0625 | | |
| SVM + Poly | 10.5625 | | |
| SVM + RBF | 9.4375 | | |
| EvoCNN | 3.0625 | | |
| psoCNN | 2.5000 | | |
| ETLBOCBL-CNN | 1.1250 | | |

| ETLBOCBL-CNN vs. | z | Unadjusted p | Bonferroni–Dunn p | Holm p | Hochberg p |
|---|---|---|---|---|---|
| NNet | $6.03\times {10}^{0}$ | $0.00\times {10}^{0}$ | $0.00\times {10}^{0}$ | $0.00\times {10}^{0}$ | $0.00\times {10}^{0}$ |
| SVM + Poly | $5.23\times {10}^{0}$ | $0.00\times {10}^{0}$ | $2.00\times {10}^{-6}$ | $2.00\times {10}^{-6}$ | $2.00\times {10}^{-6}$ |
| SVM + RBF | $4.61\times {10}^{0}$ | $4.00\times {10}^{-6}$ | $4.40\times {10}^{-5}$ | $3.60\times {10}^{-5}$ | $3.60\times {10}^{-5}$ |
| SAA-3 | $4.40\times {10}^{0}$ | $1.10\times {10}^{-5}$ | $1.17\times {10}^{-4}$ | $8.50\times {10}^{-5}$ | $8.50\times {10}^{-5}$ |
| DBN-3 | $3.78\times {10}^{0}$ | $1.58\times {10}^{-4}$ | $1.73\times {10}^{-3}$ | $1.10\times {10}^{-3}$ | $1.10\times {10}^{-3}$ |
| RandNet-2 | $2.70\times {10}^{0}$ | $6.85\times {10}^{-3}$ | $7.53\times {10}^{-2}$ | $4.11\times {10}^{-2}$ | $4.11\times {10}^{-2}$ |
| ScatNet-2 | $2.60\times {10}^{0}$ | $9.32\times {10}^{-3}$ | $1.02\times {10}^{-1}$ | $4.66\times {10}^{-2}$ | $4.66\times {10}^{-2}$ |
| LDANet-2 | $2.29\times {10}^{0}$ | $2.21\times {10}^{-2}$ | $2.43\times {10}^{-1}$ | $8.85\times {10}^{-2}$ | $6.64\times {10}^{-2}$ |
| PCANet-2 | $2.29\times {10}^{0}$ | $2.21\times {10}^{-2}$ | $2.43\times {10}^{-1}$ | $8.85\times {10}^{-2}$ | $6.64\times {10}^{-2}$ |
| EvoCNN | $1.07\times {10}^{0}$ | $2.82\times {10}^{-1}$ | $3.11\times {10}^{0}$ | $5.65\times {10}^{-1}$ | $4.66\times {10}^{-1}$ |
| psoCNN | $7.63\times {10}^{-1}$ | $4.66\times {10}^{-1}$ | $4.90\times {10}^{0}$ | $5.65\times {10}^{-1}$ | $4.66\times {10}^{-1}$ |
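The Holm column above applies the standard step-down correction to the unadjusted p-values. A minimal, self-contained sketch of that procedure (the table's entries are rounded, so this reproduces the method rather than the exact printed numbers):

```python
# Holm step-down correction: sort p-values ascending, multiply the i-th smallest
# by (m - i), enforce monotonicity, and cap at 1.
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Toy example with m = 2 hypothetical comparisons:
# smallest p is doubled, the larger one kept (already >= the running maximum).
assert holm_adjust([0.01, 0.04]) == [0.02, 0.04]
```

Unlike the single-step Bonferroni–Dunn correction, which can exceed 1 before capping (as seen for EvoCNN and psoCNN above), Holm scales each p-value by progressively smaller factors, which is why its adjusted values are uniformly tighter.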

**Table 9.** Performance evaluation of the proposed ETLBOCBL-CNN alongside its peer algorithms when tackling the MNIST-Fashion dataset.

| Algorithm | Classification Accuracy | No. of Trainable Parameters |
|---|---|---|
| Human Performance ^{1} | 83.50% | NA |
| 2C1P2F + Dropout ^{1} | 91.60% | $3.27\times {10}^{6}$ |
| 2C1P ^{1} | 92.50% | $1.00\times {10}^{5}$ |
| 3C2F ^{1} | 90.70% | NA |
| 3C1P2F + Dropout ^{1} | 92.60% | $7.14\times {10}^{6}$ |
| GRU + SVM ^{1} | 88.80% | NA |
| GRU + SVM + Dropout | 89.70% | NA |
| HOG + SVM ^{1} | 92.60% | NA |
| AlexNet [77] | 89.90% | $6.00\times {10}^{7}$ |
| SqueezeNet-200 [78] | 90.00% | $5.00\times {10}^{5}$ |
| MLP 256-128-64 ^{1} | 90.00% | $4.10\times {10}^{4}$ |
| MLP 256-128-100 ^{1} | 88.33% | $3.00\times {10}^{6}$ |
| EvoCNN [76] | 94.53% | $6.68\times {10}^{6}$ |
| psoCNN [62] | 92.81% | $2.58\times {10}^{6}$ |
| ETLBOCBL-CNN (Best) | 93.70% | $8.43\times {10}^{5}$ |
| ETLBOCBL-CNN (Mean) | 93.12% | $1.95\times {10}^{6}$ |

^{1} https://github.com/zalandoresearch/fashion-mnist (accessed on 3 June 2023).

**Table 10.** Optimal network and learning hyperparameters derived by ETLBOCBL-CNN to solve each selected image dataset with the highest classification accuracy.

| Dataset | Layers | Network Hyperparameters | Learning Hyperparameters |
|---|---|---|---|
| MNIST | Convolutional | ${N}_{1}^{Fil}=231$, ${S}_{1}^{Ker}=9\times 9$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Maximum Pooling | ${S}_{1}^{Pool}=2\times 2$, ${S}_{1}^{Str}=1\times 1$ | $L{H}^{LR}=3$ (‘0.001’) |
| | Convolutional | ${N}_{2}^{Fil}=101$, ${S}_{2}^{Ker}=9\times 9$ | $L{H}^{Int}=1$ (‘Glorot Normal’) |
| | Convolutional | ${N}_{3}^{Fil}=97$, ${S}_{3}^{Ker}=9\times 9$ | $L{H}^{L2}=1$ (‘0.001’) |
| | Fully Connected | ${N}_{1}^{Neu}=10$ | |
| MNIST-RD | Convolutional | ${N}_{1}^{Fil}=96$, ${S}_{1}^{Ker}=9\times 9$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Convolutional | ${N}_{2}^{Fil}=47$, ${S}_{2}^{Ker}=9\times 9$ | $L{H}^{LR}=3$ (‘0.001’) |
| | Convolutional | ${N}_{3}^{Fil}=125$, ${S}_{3}^{Ker}=9\times 9$ | $L{H}^{Int}=1$ (‘Glorot Normal’) |
| | Average Pooling | ${S}_{3}^{Pool}=3\times 3$, ${S}_{3}^{Str}=1\times 1$ | $L{H}^{L2}=2$ (‘0.005’) |
| | Fully Connected | ${N}_{1}^{Neu}=10$ | |
| MNIST-RB | Convolutional | ${N}_{1}^{Fil}=47$, ${S}_{1}^{Ker}=3\times 3$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Convolutional | ${N}_{2}^{Fil}=112$, ${S}_{2}^{Ker}=9\times 9$ | $L{H}^{LR}=4$ (‘0.005’) |
| | Convolutional | ${N}_{3}^{Fil}=65$, ${S}_{3}^{Ker}=9\times 9$ | $L{H}^{Int}=2$ (‘Glorot Uniform’) |
| | Average Pooling | ${S}_{3}^{Pool}=3\times 3$, ${S}_{3}^{Str}=1\times 1$ | $L{H}^{L2}=3$ (‘0.01’) |
| | Fully Connected | ${N}_{1}^{Neu}=10$ | |
| MNIST-BI | Convolutional | ${N}_{1}^{Fil}=76$, ${S}_{1}^{Ker}=3\times 3$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Convolutional | ${N}_{2}^{Fil}=137$, ${S}_{2}^{Ker}=6\times 6$ | $L{H}^{LR}=4$ (‘0.005’) |
| | Convolutional | ${N}_{3}^{Fil}=181$, ${S}_{3}^{Ker}=7\times 7$ | $L{H}^{Int}=2$ (‘Glorot Uniform’) |
| | Maximum Pooling | ${S}_{3}^{Pool}=3\times 3$, ${S}_{3}^{Str}=2\times 2$ | $L{H}^{L2}=3$ (‘0.01’) |
| | Fully Connected | ${N}_{1}^{Neu}=10$ | |
| MNIST-RD + BI | Convolutional | ${N}_{1}^{Fil}=48$, ${S}_{1}^{Ker}=5\times 5$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Convolutional | ${N}_{2}^{Fil}=63$, ${S}_{2}^{Ker}=7\times 7$ | $L{H}^{LR}=3$ (‘0.001’) |
| | Maximum Pooling | ${S}_{2}^{Pool}=3\times 3$, ${S}_{2}^{Str}=1\times 1$ | $L{H}^{Int}=2$ (‘Glorot Uniform’) |
| | Convolutional | ${N}_{3}^{Fil}=108$, ${S}_{3}^{Ker}=8\times 8$ | $L{H}^{L2}=3$ (‘0.01’) |
| | Average Pooling | ${S}_{3}^{Pool}=2\times 2$, ${S}_{3}^{Str}=1\times 1$ | |
| | Fully Connected | ${N}_{1}^{Neu}=10$ | |
| Rectangles | Convolutional | ${N}_{1}^{Fil}=234$, ${S}_{1}^{Ker}=9\times 9$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Convolutional | ${N}_{2}^{Fil}=89$, ${S}_{2}^{Ker}=9\times 9$ | $L{H}^{LR}=2$ (‘0.0005’) |
| | Maximum Pooling | ${S}_{2}^{Pool}=3\times 3$, ${S}_{2}^{Str}=1\times 1$ | $L{H}^{Int}=2$ (‘Glorot Uniform’) |
| | Convolutional | ${N}_{3}^{Fil}=85$, ${S}_{3}^{Ker}=9\times 9$ | $L{H}^{L2}=1$ (‘0.001’) |
| | Average Pooling | ${S}_{3}^{Pool}=3\times 3$, ${S}_{3}^{Str}=2\times 2$ | |
| | Fully Connected | ${N}_{1}^{Neu}=2$ | |
| Rectangles-I | Convolutional | ${N}_{1}^{Fil}=74$, ${S}_{1}^{Ker}=3\times 3$ | $L{H}^{Opt}=4$ (‘Adamax’) |
| | Maximum Pooling | ${S}_{1}^{Pool}=2\times 2$, ${S}_{1}^{Str}=1\times 1$ | $L{H}^{LR}=3$ (‘0.001’) |
| | Convolutional | ${N}_{2}^{Fil}=161$, ${S}_{2}^{Ker}=9\times 9$ | $L{H}^{Int}=1$ (‘Glorot Normal’) |
| | Average Pooling | ${S}_{2}^{Pool}=1\times 1$, ${S}_{2}^{Str}=2\times 2$ | $L{H}^{L2}=4$ (‘0.05’) |
| | Convolutional | ${N}_{3}^{Fil}=207$, ${S}_{3}^{Ker}=9\times 9$ | |
| | Maximum Pooling | ${S}_{3}^{Pool}=3\times 3$, ${S}_{3}^{Str}=1\times 1$ | |
| | Fully Connected | ${N}_{1}^{Neu}=2$ | |
| Convex | Convolutional | ${N}_{1}^{Fil}=136$, ${S}_{1}^{Ker}=9\times 9$ | $L{H}^{Opt}=4$ (‘Adamax’) |
| | Maximum Pooling | ${S}_{1}^{Pool}=3\times 3$, ${S}_{1}^{Str}=1\times 1$ | $L{H}^{LR}=3$ (‘0.001’) |
| | Convolutional | ${N}_{2}^{Fil}=118$, ${S}_{2}^{Ker}=9\times 9$ | $L{H}^{Int}=1$ (‘Glorot Normal’) |
| | Convolutional | ${N}_{3}^{Fil}=197$, ${S}_{3}^{Ker}=9\times 9$ | $L{H}^{L2}=3$ (‘0.01’) |
| | Fully Connected | ${N}_{1}^{Neu}=2$ | |
| MNIST-Fashion | Convolutional | ${N}_{1}^{Fil}=164$, ${S}_{1}^{Ker}=3\times 3$ | $L{H}^{Opt}=3$ (‘Adam’) |
| | Maximum Pooling | ${S}_{1}^{Pool}=2\times 2$, ${S}_{1}^{Str}=1\times 1$ | $L{H}^{LR}=3$ (‘0.001’) |
| | Convolutional | ${N}_{2}^{Fil}=96$, ${S}_{2}^{Ker}=3\times 3$ | $L{H}^{Int}=3$ (‘He Normal’) |
| | Fully Connected | ${N}_{1}^{Neu}=10$ | $L{H}^{L2}=1$ (‘0.001’) |
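As a sanity check, the trainable-parameter count of the compact MNIST-Fashion model in Table 10 can be estimated by hand. The sketch below assumes details not listed here ('same' convolution padding, a $2\times 2$ stride-1 'valid' pooling window, and batch normalization with trainable gamma/beta after each convolution), so it reproduces the arithmetic, not the paper's exact bookkeeping; the result lands within about 0.1% of the reported $8.43\times {10}^{5}$:

```python
# Back-of-envelope parameter count for the best MNIST-Fashion model (Table 10),
# under assumed layer details (see lead-in): input 28x28x1.
def conv_params(k, in_ch, filters):
    return (k * k * in_ch + 1) * filters   # kernel weights + one bias per filter

def bn_params(channels):
    return 2 * channels                    # trainable gamma and beta only

total = 0
total += conv_params(3, 1, 164) + bn_params(164)   # Conv1: 164 filters, 3x3 -> 28x28x164
# 2x2 max pooling, stride 1, 'valid': 28x28 -> 27x27 (no parameters)
total += conv_params(3, 164, 96) + bn_params(96)   # Conv2: 96 filters, 3x3 -> 27x27x96
flat = 27 * 27 * 96                                # flattened feature map: 69,984
total += (flat + 1) * 10                           # fully connected layer, 10 classes
# total == 843,802, close to the reported 8.43e5 trainable parameters
```

Note that almost all of the parameter budget sits in the final fully connected layer, which is why the evolved architectures keep the flattened feature map small.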

| Dataset | Metric | TLBO-CNN | ETLBOCBL-CNN (v1) | ETLBOCBL-CNN (v2) | ETLBOCBL-CNN (v3) | ETLBOCBL-CNN (Complete) |
|---|---|---|---|---|---|---|
| MNIST | Accuracy | 98.54% | 98.66% | 98.74% | 99.01% | 99.72% |
| | Complexity | 12.10 M | 10.68 M | 8.94 M | 4.71 M | 3.41 M |
| MNIST-RD | Accuracy | 94.66% | 96.34% | 96.46% | 96.58% | 96.67% |
| | Complexity | 10.10 M | 3.92 M | 2.34 M | 2.19 M | 1.67 M |
| MNIST-RB | Accuracy | 96.91% | 97.90% | 97.92% | 98.04% | 98.28% |
| | Complexity | 7.23 M | 3.85 M | 6.16 M | 5.67 M | 1.47 M |
| MNIST-BI | Accuracy | 95.53% | 96.34% | 96.37% | 97.10% | 97.22% |
| | Complexity | 5.02 M | 2.07 M | 2.45 M | 1.98 M | 1.90 M |
| MNIST-RD + BI | Accuracy | 77.58% | 78.19% | 82.20% | 82.74% | 83.45% |
| | Complexity | 3.71 M | 6.11 M | 4.95 M | 2.22 M | 1.26 M |
| Rectangles | Accuracy | 99.68% | 99.71% | 99.79% | 99.90% | 99.99% |
| | Complexity | 12.60 M | 6.24 M | 11.51 M | 2.76 M | 2.34 M |
| Rectangles-I | Accuracy | 95.71% | 97.24% | 97.36% | 97.37% | 97.41% |
| | Complexity | 6.63 M | 2.02 M | 6.05 M | 3.47 M | 5.51 M |
| Convex | Accuracy | 95.20% | 97.12% | 97.55% | 97.71% | 98.35% |
| | Complexity | 1.54 M | 3.55 M | 2.59 M | 1.51 M | 1.46 M |
| MNIST-Fashion | Accuracy | 91.89% | 92.91% | 91.99% | 93.12% | 93.70% |
| | Complexity | 4.31 M | 3.12 M | 3.44 M | 2.97 M | 0.84 M |

| Dataset | Computational Time (s) |
|---|---|
| MNIST | 5945.89 |
| MNIST-RD | 2851.13 |
| MNIST-RB | 3087.26 |
| MNIST-BI | 3317.30 |
| MNIST-RD + BI | 2735.13 |
| Rectangles | 1013.88 |
| Rectangles-I | 4132.82 |
| Convex | 1380.63 |
| MNIST-Fashion | 14,227.21 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ang, K.M.; Lim, W.H.; Tiang, S.S.; Sharma, A.; Eid, M.M.; Tawfeek, S.M.; Khafaga, D.S.; Alharbi, A.H.; Abdelhamid, A.A.
Optimizing Image Classification: Automated Deep Learning Architecture Crafting with Network and Learning Hyperparameter Tuning. *Biomimetics* **2023**, *8*, 525.
https://doi.org/10.3390/biomimetics8070525
