Automatic Neural Architecture Search Based on an Estimation of Distribution Algorithm for Binary Classification of Image Databases

Franco-Gaona, Erick; Avila-Garcia, Maria Susana; Cruz-Aceves, Ivan

doi:10.3390/math13040605


by Erick Franco-Gaona ¹, Maria Susana Avila-Garcia ¹ and Ivan Cruz-Aceves ²,*

¹ Departamento de Estudios Multidisciplinarios, División de Ingenierías, Campus Irapuato-Salamanca, Universidad de Guanajuato, Av. Universidad S/N, Yuriria 38944, Guanajuato, Mexico

² SECIHTI-Centro de Investigación en Matemáticas (CIMAT), Valenciana 36023, Guanajuato, Mexico

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(4), 605; https://doi.org/10.3390/math13040605

Submission received: 18 January 2025 / Revised: 6 February 2025 / Accepted: 7 February 2025 / Published: 12 February 2025

(This article belongs to the Special Issue New Advances in Image Processing and Computer Vision)


Abstract

Convolutional neural networks (CNNs) are widely used for image classification; however, setting appropriate hyperparameters before training is subjective and time consuming, and the search space is rarely explored properly. This paper presents a novel method for automatic neural architecture search based on an estimation of distribution algorithm (EDA) for binary classification problems. The hyperparameters were coded in binary form to suit the metaheuristics used in the automatic search stage of CNN architectures. The search was performed with the Boltzmann Univariate Marginal Distribution algorithm (BUMDA), chosen by statistical comparison among four metaheuristics, to explore a search space whose complexity is $O(2^{29})$. Moreover, the proposed method is compared with multiple state-of-the-art methods on five databases, testing its efficiency in terms of accuracy and F1-score. In the experimental results, the proposed method achieved F1-scores of 97.2%, 98.73%, 97.23%, 98.36%, and 98.7% in its best evaluation on the five databases, respectively, which are better results than those reported in the literature. Finally, the computational time of the proposed method for the test set was ≈0.6 s, 1 s, 0.7 s, 0.5 s, and 0.1 s, respectively.

Keywords:

Boltzmann Univariate Marginal Distribution; classification; convolutional neural network; estimation of distribution algorithms; neural architecture search

MSC:

68T07; 68T20; 68T27

1. Introduction

Convolutional neural networks (CNNs) have radically transformed the field of computer vision, becoming the primary tool for image classification [1,2]. This popularity is not fortuitous; it results from characteristics that make CNNs efficient and effective at processing grid-structured data, chiefly convolutions, which extract local patterns hierarchically [3]. In addition, their structure takes advantage of spatial relationships in the data. Traditional machine learning methods such as support vector machines (SVMs) or K-Nearest Neighbors (KNN) need descriptors such as Histogram of Oriented Gradients (HOG), Speeded-Up Robust Features (SURF), and Scale-Invariant Feature Transform (SIFT) to extract features from images [4,5,6]. These descriptors must be carefully designed for each type of problem, which is expensive. Unlike traditional methods, CNNs learn these features directly from the data during the training process. The first layers of the network detect basic patterns, such as edges and lines, while deeper layers combine these patterns into more complex representations [7]. This allows CNNs to adapt to a wide range of tasks and images without human intervention to define which features are relevant. CNNs consist of three basic layer types: convolutional layers extract important features through the convolution operation, pooling layers reduce the size of the feature maps generated by the convolutional layers, and fully connected layers classify the features found.

However, because multiple architectural aspects must be adjusted to optimize training, CNNs have a considerable number of hyperparameters. Hyperparameters are parameters that are configured before the training process and affect the architecture and performance of the model [8]. Examples include the number of convolutional layers, the size of the convolution filters, and the number of pooling layers. The number of hyperparameters available to modify raises the research question of this work: what values should the hyperparameters take to obtain the best possible result in image classification? The answer is complex because the architecture of a neural network can be treated as a combinatorial problem in which the number of possible solutions depends on how many hyperparameters are considered.

To avoid this combinatorial problem, some authors use pretrained networks such as VGG16 [9] and VGG19 [10]; however, an extensive database is not always available, and the deeper a CNN is, the more likely the model is to overfit. A model that suffers from overfitting performs exceptionally well on the training set, but its performance drops significantly when evaluated with test data. It is also common to search for the best possible architecture manually, i.e., by trying different architectures and keeping the best one found. This approach is very time consuming and requires a lot of human effort [11].

Therefore, it is important to have a method that automatically generates CNN architectures and finds the best solution. Neural Architecture Search (NAS) is a branch of machine learning and artificial intelligence aimed at automating the design of neural network architectures [12]. Its primary goal is to discover highly effective network structures for specific tasks with minimal human intervention [13]. NAS uses optimization techniques to automatically explore the search space and evaluate different neural network architectures, delivering the best solution found according to a chosen evaluation metric. These strategies include random search, evolutionary algorithms, and gradient-based optimization techniques.

In the literature, NAS works largely agree on the search space and the evaluation strategy. However, the search strategy is approached in different ways by different authors. For instance, in [14,15,16,17] the authors use Bayesian optimization for the classification of medical images, reporting improvements over the state of the art. Works based on evolutionary computation and genetic algorithms (GAs) [18,19,20], chosen for their versatility on multi-purpose databases, concluded that these algorithms are effective in finding architectures with better results than the literature. Among other population-based algorithms [21,22,23,24], particle swarm optimization (PSO) has been applied to multi-purpose databases with efficient results. The choice of technique is influenced by the authors' preferences and the available computational hardware. An advantage of population algorithms such as GAs is that they are highly parallelizable, which reduces execution time. The key contributions of this study include the following:

A novel method to automatically generate convolutional neural network architectures for binary classification on image databases using an estimation of distribution algorithm.
The proposed method was tested on five databases of different sizes, obtaining F1-scores of 97.2%, 98.7%, 97.2%, 98.4%, and 98.7%, respectively, which are better results than those reported in the literature.
A comparative analysis of different metaheuristics in NAS, demonstrating the effectiveness of the Boltzmann Univariate Marginal Distribution algorithm (BUMDA) as an optimization method in this field.

In this work, the following sections are presented: Section 2 shows the proposed methods for the search of architectures. In Section 3, the databases used in this work are presented in detail. Section 4 shows the tests performed and the results obtained from BUMDA with the compared methods. Finally, in Section 5, the work is concluded and future work for this research is presented.

2. Methods

This section presents the general method on which the proposed method is based and the optimization technique selected for the automatic search of architectures. Finally, the proposal for this research is described.

2.1. Neural Architecture Search

NAS has evolved along with advances in deep neural networks (DNNs). The effectiveness of a deep learning model for a specific task is highly dependent on the complexities of its network architecture. To address this challenge, it is necessary to automate the design process using machine learning, which is the main goal of NAS [25]. The steps of NAS are described below, and the general methodology is shown in Figure 1.

Search space: A search space is defined that describes all possible architectures that could be evaluated. This space includes network hyperparameters such as the number of layers, the type of layers (convolutional, recurrent, fully connected, etc.), the connectivity between layers, and the learning rate, among others.
Architecture generation: Algorithms are used to automatically generate new architectures within the search space. These architectures can be generated randomly or using more sophisticated approaches, such as metaheuristics.
Performance evaluation: Each generated architecture is evaluated on a training dataset using some performance criterion, such as F1-score in the classification task.
Model update: Based on the evaluation results, the automatic search model adjusts its parameters to improve the generation of architectures in subsequent iterations. This process is repeated several times to continuously refine the model performance.
Selection of the best architecture: After a given number of iterations, the automatic search model selects the architecture that has demonstrated the best performance according to the specified criteria.
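The steps above can be illustrated with a minimal sketch that uses random search, the simplest of the search strategies mentioned in the Introduction. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for training a CNN and returning its F1-score, and random search has no model-update step, so only generation, evaluation, and selection appear.

```python
import random


def random_architecture(n_bits, rng):
    """Architecture generation: sample one candidate from a binary search space."""
    return [rng.randint(0, 1) for _ in range(n_bits)]


def toy_score(bits):
    """Hypothetical placeholder for 'train the CNN and return its F1-score'."""
    return sum(bits) / len(bits)


def random_search(n_bits=29, iterations=200, seed=0, evaluate=toy_score):
    """Generate, evaluate, and keep the best candidate seen so far."""
    rng = random.Random(seed)
    best_bits, best_score = None, float("-inf")
    for _ in range(iterations):
        candidate = random_architecture(n_bits, rng)  # architecture generation
        score = evaluate(candidate)                   # performance evaluation
        if score > best_score:                        # keep the elite candidate
            best_bits, best_score = candidate, score
    return best_bits, best_score                      # selection of the best


best_bits, best_score = random_search()
```

A metaheuristic replaces the blind sampling in `random_architecture` with a generator that is updated from the evaluation results.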

NAS has three basic concepts: the search space, the search strategy, and the evaluation strategy. The search space defines the architectures that can be represented. Integrating prior knowledge about the typical properties of architectures suitable for a given task can reduce the search space and speed up the exploration process. The search strategy determines how the search space is navigated, addressing the delicate balance between exploration and exploitation. The goal is to quickly identify high-performance architectures while avoiding premature convergence to a region of local optima. Metaheuristics are used for this purpose because they inherently possess the capabilities needed to achieve this balance. For performance estimation, any CNN evaluation metric such as accuracy or F1-score can be used. Each performance estimation therefore involves training a CNN architecture and computing the chosen metric.

2.2. Boltzmann Univariate Marginal Distribution

Soft computing constitutes a subfield of artificial intelligence that is distinguished by its reliance on uncertainty management, imprecision, and approximation to address real-world problems [26]. Unlike hard computing, which employs precise methods, soft computing encompasses techniques that can effectively handle ambiguity [27]. Metaheuristics are closely associated with the soft computing domain because they do not guarantee optimal solutions [28]. However, they excel in discovering feasible solutions in a reasonable amount of time, especially when faced with complex and high-dimensional problems. One of the characteristics of metaheuristics is their potential to prolong execution times. Consequently, it is crucial to discern the appropriate context for their application. These techniques are often used to address nondeterministic polynomial-time (NP) problems, i.e., those in which a proposed solution can be verified in polynomial time but no polynomial-time algorithm is known for determining the optimal solution, as exemplified by the NAS problem.

According to several works [29,30,31], metaheuristic methods represent a primary domain within stochastic optimization, employing varying degrees of randomness to discover solutions that are as optimal as possible for challenging optimization problems. In designing a metaheuristic, two features must be considered in the techniques: exploration of the search space to diversify and exploitation of the best solutions to intensify [32]. In exploitation, promising regions continue to be explored with the prospect of finding better solutions. On the other hand, exploration causes unexplored regions to be visited to rule out whether promising solutions exist there. Population metaheuristics are optimization algorithms that operate on a population of candidate solutions, rather than working on a single solution at a time, as trajectory metaheuristics do. Some techniques are based on evolutionary computation, such as genetic algorithms, and others on probabilistic methods known as estimation of distribution algorithms (EDAs). In an EDA, the statistical distribution of promising solutions is modeled to generate new solutions. These algorithms learn a probabilistic model that describes the most suitable solutions.
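To make the EDA idea concrete, the following minimal sketch implements a univariate marginal distribution algorithm (UMDA) on binary strings. This is a simpler EDA than BUMDA and is shown only as an illustration: the population size, truncation ratio, probability clamping, and the toy fitness (the number of ones) are all assumptions.

```python
import random


def umda(fitness, n_bits=29, pop_size=60, generations=40, seed=0):
    """Binary UMDA: learn one Bernoulli probability per bit from the best half."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit is equally likely 0 or 1
    elite = None
    for _ in range(generations):
        # Sample a new population from the current probabilistic model.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        if elite is not None:
            pop[0] = elite  # keep the best individual found so far
        pop.sort(key=fitness, reverse=True)
        elite = pop[0]
        selected = pop[: pop_size // 2]  # truncation selection
        # Re-estimate the marginal distribution of each bit from the selected set.
        probs = [sum(ind[i] for ind in selected) / len(selected)
                 for i in range(n_bits)]
        # Clamp the probabilities so no bit value becomes impossible to sample.
        probs = [min(0.95, max(0.05, p)) for p in probs]
    return elite, fitness(elite)


best, score = umda(sum)  # toy fitness: count the ones in the string
```

The model here is a vector of independent bit probabilities; BUMDA follows the same sample, select, and re-estimate cycle with a Gaussian model.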

In the present research, BUMDA, which belongs to the EDAs, obtained the best results compared to iterated local search (ILS), simulated annealing (SA), and genetic algorithms (GAs). BUMDA uses a normal (Gaussian) model to approximate the Boltzmann distribution; the formulas for the mean and variance of the Gaussian model are derived from the analytical minimization of the Kullback–Leibler divergence [33]. The resulting Gaussian distribution has a better bias for intensively sampling the most promising regions than the maximum likelihood estimator of the selected set. Algorithm 1 shows the BUMDA pseudocode.

Algorithm 1. Pseudocode of BUMDA.
	Input: Population size N, minimum variance allowed minvar
	Output: Best individual found I*
	Uniformly generate the initial population $P_{0}$; set $t = 0$;
	while $v > minvar$ do
		$t \leftarrow t + 1$;
		Evaluate and truncate the population;
		Compute the approximations to $\mu$ and $v$ by using the selected set:
		$\mu \approx \frac{\sum_{i = 1}^{n_{selected}} x_{i} \bar{g}(x_{i})}{\sum_{i = 1}^{n_{selected}} \bar{g}(x_{i})}$, $v \approx \frac{\sum_{i = 1}^{n_{selected}} \bar{g}(x_{i}) (x_{i} - \mu)^{2}}{1 + \sum_{i = 1}^{n_{selected}} \bar{g}(x_{i})}$;
		Generate $n_{sample} - 1$ individuals from the new model $Q(x, t)$ and insert the elite individual;
	return best individual found I*;

Therefore, BUMDA has the following features:

BUMDA converges to the best approximation found.
The variance tends to 0 for a large number of generations.
BUMDA has a low number of parameters required for tuning.
The average quickly shifts to the region where the best solutions are found.
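A minimal Python sketch of Algorithm 1 for a continuous maximization problem is given below. Several details are assumptions rather than the authors' choices: truncation keeps the best half of the population, the weights are taken as $\bar{g}(x_{i}) = g(x_{i}) - g(x_{n_{selected}}) + 1$, a generation cap is added as a safeguard, and the objective is a toy function. How the authors map the Gaussian samples onto their 29-bit encoding is not stated in the text and is not modeled here.

```python
import random


def bumda(objective, dim, low, high, pop_size=30, min_var=1e-10,
          max_gen=200, seed=0):
    """Maximize `objective` over real vectors with a univariate Gaussian model."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    elite = pop[0]
    for _ in range(max_gen):  # generation cap: an added safeguard (assumption)
        ranked = sorted(pop, key=objective, reverse=True)
        elite = ranked[0]
        selected = ranked[: max(2, pop_size // 2)]  # truncation (assumed: best half)
        fitness = [objective(x) for x in selected]
        # Assumed weights: fitness relative to the worst selected individual, plus 1.
        weights = [f - fitness[-1] + 1.0 for f in fitness]
        w_sum = sum(weights)
        # Weighted mean and variance per dimension, as in Algorithm 1.
        mu = [sum(w * x[d] for w, x in zip(weights, selected)) / w_sum
              for d in range(dim)]
        var = [sum(w * (x[d] - mu[d]) ** 2 for w, x in zip(weights, selected))
               / (1.0 + w_sum) for d in range(dim)]
        if max(var) <= min_var:  # stop once the model has collapsed
            break
        # Sample pop_size - 1 new individuals and re-insert the elite.
        pop = [[rng.gauss(mu[d], var[d] ** 0.5) for d in range(dim)]
               for _ in range(pop_size - 1)]
        pop.append(elite)
    return elite, objective(elite)


def sphere(x):
    """Toy objective with its maximum (value 0) at x = (1, ..., 1)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


best, value = bumda(sphere, dim=2, low=-5.0, high=5.0, pop_size=50, seed=1)
```

The only parameters exposed are the population size and the minimum variance, which matches the low number of tuning parameters listed above.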

2.3. Proposed Method

First, the search space was defined with binary encoding so that the metaheuristics could work naturally with the representation of the architectures. The hyperparameters used to build the architectures were selected based on the literature [15,18,21], as shown in Table 1. They were encoded in a binary array of 29 bits, and the range of values of each hyperparameter depends on the number of bits assigned to it; consequently, the search space contains $2^{29}$ candidate encodings and the complexity of the problem is $O(2^{29})$. For instance, if a hyperparameter has only two options, a single bit is used, where 0 represents the first option and 1 the second. If the hyperparameter takes a range of values, that range is defined explicitly by the bit-to-decimal conversion.
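The bit-to-decimal decoding can be sketched as follows. The field names and bit widths in `FIELDS` are hypothetical; they were chosen only so that the widths add up to 29 and do not reproduce the contents of Table 1.

```python
# Hypothetical layout: (field name, number of bits). Not the contents of Table 1.
FIELDS = [
    ("conv_layers", 3), ("filters", 4), ("kernel_size", 2), ("pooling_type", 1),
    ("dense_layers", 2), ("dense_units", 5), ("activation", 1), ("optimizer", 2),
    ("learning_rate", 3), ("batch_size", 3), ("dropout", 3),
]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 29 for this layout


def decode(bits):
    """Split a binary array into fields and convert each field to an integer code."""
    if len(bits) != TOTAL_BITS:
        raise ValueError(f"expected {TOTAL_BITS} bits, got {len(bits)}")
    codes, start = {}, 0
    for name, width in FIELDS:
        chunk = bits[start:start + width]
        # Bit-to-decimal conversion, most significant bit first.
        codes[name] = int("".join(str(b) for b in chunk), 2)
        start += width
    return codes


all_ones = decode([1] * TOTAL_BITS)
search_space_size = 2 ** TOTAL_BITS
```

Each integer code would then index a list of admissible values for its hyperparameter; a 1-bit field selects between two options and a 3-bit field among eight.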

Following the workflow in Figure 1, four metaheuristics were tested with the maximization of the F1-score as the objective function, since this metric assesses the effectiveness of the network during the optimization process [34,35,36]. The metrics typically employed to assess CNNs are presented in Equations (1)–(4). The F1-score is used in this work because it is the harmonic mean of precision and recall and therefore penalizes both false positive and false negative errors.

In the following equations, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Accuracy is the proportion of correct predictions over the total number of observations.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

Precision is calculated by dividing the number of correct positive predictions by the total number of positive predictions. It measures how many of the samples returned as positive are actually positive.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

Recall is the proportion of actual positive samples that the model correctly identifies.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

The F1-score is the harmonic mean of precision and recall.

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
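As a worked illustration of Equations (1)–(4), the sketch below computes the four metrics from confusion-matrix counts. The counts in the example are invented for illustration, and returning 0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts: 90 true positives, 10 false positives,
# 5 false negatives, 95 true negatives.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

For these counts the accuracy is 185/200 = 0.925, the precision is 0.9, the recall is 90/95 ≈ 0.947, and the F1-score is 180/195 ≈ 0.923.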

For each technique and database, the elite architecture, i.e., the one with the best F1-score in training, was retained and finally evaluated on a test set to validate its performance.

3. Databases for Automatic Binary Classification

This section presents the databases used to test the method and to find the best architecture for each of them. Table 2 shows the general description of each database used in this work. Subsequently, examples of database images are shown. These images are shown at their original scale, so some of them appear in low resolution.

3.1. Information Elements in Digital Documents Databases

The presented databases called Marmot are designed for the detection of tables and equations in digital documents; however, due to their binary nature, they were used for classification in this research.

3.1.1. Tables

The Marmot database for table classification [37] contains 2000 PNG images extracted from PDF documents collected from the following:

The Founder Apabi digital library, which contains PDF research documents in Chinese. A total of 120 e-books from different subjects were extracted.
1500 conference and journal articles published in English and Chinese obtained from 1970 to 2011.

The database includes both “positive” and “negative” instances, representing the presence and absence of tables, respectively, in an equal 1:1 proportion. There are 1000 pages that have at least one table, while the other 1000 pages do not have tables but may have page components that resemble tables, such as arrays and figures. Patches were extracted from the images, forming a dataset of 2752 table and non-table images in total. The resulting patches are presented in Figure 2.

3.1.2. Equations

For the equation classification database [38], 400 PDFs were collected, from which patches were extracted, obtaining 777 positive and 777 negative images for a total of 1554 images. The image patches are shown in Figure 3.

3.2. Medical Image Databases

The following databases were taken from the literature, two of them for binary classification of coronary stenosis and the third for classification of pneumonia with real images. The authors indicate that the images were classified by experts.

3.2.1. Coronary Stenosis from Antczak and Liberadzki

The first stenosis database, obtained from the natural stenosis database of Antczak and Liberadzki [39], contains 125 positive patches of 32 × 32 pixels; the negative patches had to be trimmed to the same number. Figure 4 illustrates samples of positive and negative natural patches.

3.2.2. Coronary Stenosis608

The second stenosis database, proposed in [40], is composed of 608 natural patches of 64 × 64 pixels, 304 positive and 304 negative. The database was proposed for the first time in that work and was validated by specialists. Figure 5 illustrates the natural patches.

3.2.3. Pneumonia

The database presented by Melendez et al. [41] consists of a total of 5856 anteroposterior chest radiographs of pediatric patients between 1 and 5 years of age (see Figure 6). All chest radiographs were performed as part of the patients’ routine medical care. The database is divided between images of healthy lungs and those with pneumonia.

4. Results and Discussion

In this section, we justify the selection of the metaheuristic for the proposed method with a statistical analysis and compare the method with other techniques in the literature, showing superior results with an architecture found automatically.

The experiments were conducted on a server with the following specifications: 128 GB of RAM, an Intel Xeon Silver 4214 processor, and an NVIDIA Titan RTX 24 GB graphics card. Tools such as the Parallel Computing Toolbox supported the execution of the population algorithms, allowing us to leverage the processor cores and finish the process more efficiently. In addition, MATLAB 2024b integrates with hardware such as NVIDIA GPUs to accelerate training. The construction of CNNs from scratch during the execution of the algorithms was facilitated by the Deep Learning Toolbox.

Comparative Analysis of Metaheuristics

Some of the selected techniques have been previously tested in NAS as described in the literature review. The iterated local search (ILS), simulated annealing (SA), and genetic algorithms (GAs) were chosen because of their effectiveness in combinatorial problems and low computational cost. In addition, the algorithms work naturally with binary coding. The objective function of each algorithm was set as the F1-score, in order to maximize its value. Table 3 shows an overview of the parameters of each algorithm, ensuring the same number of iterations to standardize all processes. A total of 200 function evaluations were considered as it was proven to be sufficient to find an efficient solution due to the convergence of the algorithms.
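To put the evaluation budget in perspective, the following arithmetic compares 200 function evaluations with the size of the 29-bit search space. It assumes, for illustration only, that every evaluation visits a distinct encoding.

```latex
\[
2^{29} = 536{,}870{,}912,
\qquad
\frac{200}{2^{29}} \approx 3.7 \times 10^{-7}
\]
```

Under that assumption, a single run inspects fewer than one in a million of the candidate encodings, which is why a guided search strategy is needed rather than exhaustive enumeration.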

The algorithms used for NAS were run 30 times for each database with the same number of function evaluations. Table 4 shows the minimum, maximum, mean, median, and standard deviation of the F1-score. For the experiments, each database was divided into 80% for training, 10% for validation, and 10% for testing. The results shown in Table 4 are from the training set.
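The 80/10/10 split and the summary statistics described above can be sketched as follows. The shuffling, the rounding of the partition sizes, and the use of the sample standard deviation are assumptions, since the text does not specify them, and the scores in the example are invented.

```python
import random
import statistics


def split_indices(n_samples, seed=0, train=0.8, val=0.1):
    """Shuffle sample indices and split them into train, validation, and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    # The test set receives whatever remains after rounding down.
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def summarize(scores):
    """Minimum, maximum, mean, median, and sample standard deviation of scores."""
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std": statistics.stdev(scores),
    }


train_idx, val_idx, test_idx = split_indices(608)
summary = summarize([0.90, 0.95, 1.00])  # invented scores, not values from Table 4
```

In an experiment like the one described, `summarize` would receive the 30 best F1-scores obtained by one algorithm on one database.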

As can be seen from the results in Table 4, the technique with the best F1-score was BUMDA, which also has the lowest standard deviation. As mentioned, the techniques were run 30 times on each database. Figure 7 shows the best BUMDA run on each of them, tracing the evolution of the elite solution over 20 generations. A low mean F1-score indicates that, on average, an algorithm did not find better solutions than the other algorithms; algorithms with a low mean were probably trapped in local optima. Table 5, Table 6, Table 7, Table 8 and Table 9 compare the best result (BUMDA) with the best results in the literature on the same databases. According to their authors, the methods from the literature are more computationally demanding because of the different ways in which they test performance. The execution times of the reviewed papers are not provided; in the tables, the acronym NP stands for not provided by the author.

In Table 5, most of the works reported in the literature for the classification and detection of tables use deformable convolutional networks or some of their variants. All reported architectures were proposed empirically by testing different configurations. The proposed method outperformed the literature while generating the architecture automatically. In this case, the authors provide three metrics to evaluate the models.

Table 5. Comparison of the best results in the literature for the Marmot database for tables [37] with the proposed method in testing set.

Models	Accuracy	Precision	Recall	F1-Score
DeCNT [42]	NP	84.9%	94.6%	89.5%
CDeC-Net [43]	NP	97.5%	93%	95.2%
HybridTabNet [44]	NP	96.2%	96.1%	95.6%
CasTabDetectorRS [45]	NP	95.2%	96.5%	95.8%
DCTable [46]	NP	96.9%	97.1%	96.9%
Proposed	97.4%	97%	97.5%	97.2%

In Table 6, the authors used the Marmot database [38] for the classification of equations in digital documents using deformable convolutional networks and variants. These proposals were empirical, and their results were surpassed by the proposed method.

Table 6. Comparison of the best results in the literature for the Marmot database for equations [38] with the proposed method in testing set.

Models	Accuracy	Precision	Recall	F1-Score
WAP method [47]	93%	NP	NP	NP
Cascade Network [48]	93%	NP	NP	NP
Proposed	98.71%	98.75%	98.72%	98.73%

Table 7 shows two different approaches to compare with the proposed method. In [49], metaheuristics are used to choose the best features to classify coronary stenosis images using machine learning. That approach has shorter execution and testing times than the deep learning approach of [39], which uses a CNN with an empirically designed architecture but reaches higher accuracy. However, combining both concepts, as in the proposed method, produced better results than either of them separately.

Table 7. Comparison of the best results in the literature for the natural stenosis database [39] with the proposed method in testing set.

Models	Accuracy	Precision	Recall	F1-Score
Features selection [49]	88%	NP	NP	NP
CNN stenosis [39]	90%	NP	NP	NP
Proposed	97.27%	97.73%	96.73%	97.23%

Paper [40] released the database for coronary stenosis whose results are shown in Table 8. As in the present work, the authors of [40] used BUMDA, in their case to optimize the number of extracted features used by a machine learning system to classify coronary stenosis. As can be seen, the difference between the literature and this work is the proposal to optimize a deep learning method such as a CNN, which resulted in better performance than using machine learning.

Table 8. Comparison of the best results in the literature for the natural Stenosis608 database [40] with the proposed method in testing set.

Models	Accuracy	Precision	Recall	F1-Score
Feature selection with BUMDA [40]	92%	NP	NP	92%
Proposed	98.33%	98.39%	98.33%	98.36%

For the results in Table 9 with the pneumonia database, the authors generally use pretrained CNNs and data augmentation because the classes are unbalanced. The authors of [50] use a weighted classifier to solve this problem. In this work, the database was truncated to balance the classes, and with the proposed method no additional data had to be generated.

Table 9. Comparison of the best results in the literature for the pneumonia database [41] with the proposed method in testing set.

Models	Accuracy	Precision	Recall	F1-Score
CNN from scratch and data augmentation [51]	93.73%	NP	NP	NP
Lightweight deep learning architecture [52]	97.09%	97%	98%	97%
CNN with weighted classifier [50]	98.43%	98.26%	99%	98.63%
Proposed	98.72%	98.76%	98.68%	98.7%
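The class balancing by truncation mentioned for the pneumonia database can be sketched as below. Choosing the retained samples at random is an assumption, since the text does not say which samples were discarded, and the sample names and counts in the example are invented.

```python
import random


def balance_by_truncation(samples, labels, seed=0):
    """Keep the same number of samples per class by discarding the surplus."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_keep = min(len(items) for items in by_class.values())  # minority class size
    balanced = []
    for label, items in by_class.items():
        # Assumption: the retained samples are chosen uniformly at random.
        for sample in rng.sample(items, n_keep):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Invented toy data: 10 "pneumonia" samples and 4 "healthy" samples.
names = [f"img_{i}" for i in range(14)]
labels = ["pneumonia"] * 10 + ["healthy"] * 4
balanced = balance_by_truncation(names, labels)
```

Truncation trades data volume for balance, whereas data augmentation and weighted classifiers keep all the samples.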

According to the comparative results, the method that consistently obtained the best accuracy or F1-score was the proposed method using BUMDA. One of the advantages of BUMDA is that it is a population algorithm whose solutions are independent of each other; it is therefore highly parallelizable, which makes the technique more efficient than trajectory-based algorithms. BUMDA found solutions that the other algorithms did not find, so we conclude that it exploited the search space better. The results were obtained automatically for each database with the proposed method using BUMDA, so human intervention in the proposal of each architecture was minimal. While EDAs such as BUMDA have already been used to optimize the number of features extracted from images for classification with machine learning, to the best of our knowledge, this is the first EDA applied to NAS. The testing times for the classification of each database were 0.6 s, 1 s, 0.7 s, 0.5 s, and 0.1 s, respectively, in the order in which the databases were presented. It can be seen that the deeper an architecture is, the longer it takes to classify, with the time also depending on the size of the input. Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 show the architectures obtained with the proposed method for each database. A single architecture is not able to classify all databases; however, the proposed method is generic, as the database can be replaced to obtain a specific architecture for each problem.
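Because the individuals of a population are evaluated independently, the evaluation step can be distributed across workers. The sketch below shows the pattern with Python's standard `concurrent.futures` module; it is not the authors' MATLAB setup, and `toy_evaluate` is a hypothetical stand-in for training a CNN. Threads are used only to keep the example self-contained; CPU-bound or GPU-bound training would normally call for separate processes or devices.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_evaluate(bits):
    """Hypothetical placeholder for 'train one CNN and return its F1-score'."""
    return sum(bits) / len(bits)


def evaluate_population(population, evaluate=toy_evaluate, max_workers=4):
    """Evaluate every individual independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, population))


population = [[0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
scores = evaluate_population(population)
```

A trajectory-based algorithm cannot be parallelized this way, because each candidate it evaluates depends on the outcome of the previous one.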

The proposal presented in this paper is limited by the restrictions on the value ranges. Some hyperparameters, such as the learning rate, are continuous variables; therefore, the number of values considered in this work may not be adequate, even though typical values were used. Computational time is a crucial factor when considering wider hyperparameter ranges; the hardware with which the experiments were performed is robust, but its availability is limited. As the search space grows, the number of iterations must increase to explore it efficiently, which increases the computational cost. This also affects the exploration of various kernel sizes in the convolutions and of the number of layers. Therefore, the values were chosen to keep the search within size-controlled architectures.

5. Conclusions

In this paper, a novel method for the automatic search of neural architectures based on an estimation of distribution algorithm (EDA) for binary classification problems was presented. The proposal helped to better explore the search space, whose complexity is $O(2^{n})$ with $n = 29$, finding architectures with better results than those reported in the literature. The hyperparameters were encoded in a binary array of size 29, and using the Boltzmann Univariate Marginal Distribution algorithm (BUMDA), chosen from four metaheuristics, the best architectures found were obtained for five databases. In addition, the proposed method was compared with multiple state-of-the-art methods, testing its efficiency in terms of accuracy and F1-score. In the experimental results, the proposed method with BUMDA was run 30 times in training and its best architectures achieved F1-scores of 97.2%, 98.73%, 97.23%, 98.36%, and 98.7% on the test sets, which are better results than those in the literature. The state-of-the-art proposals, besides being empirically generated, had to address problems in the databases, such as class imbalance, to obtain the best possible performance, which required a greater effort from their authors. With the proposed method this was not necessary, and the results were superior. For the test set, the computational time of the proposed method was ≈0.6 s, 1 s, 0.7 s, 0.5 s, and 0.1 s, respectively; this time is efficient but depends on the network depth and input size, which increase the number of calculations and the memory usage.

Author Contributions

Conceptualization, E.F.-G. and I.C.-A.; methodology, E.F.-G.; software, E.F.-G.; validation, E.F.-G., I.C.-A. and M.S.A.-G.; formal analysis, I.C.-A.; investigation, E.F.-G.; resources, M.S.A.-G.; data curation, M.S.A.-G.; writing—original draft preparation, E.F.-G.; writing—review and editing, E.F.-G.; visualization, I.C.-A.; supervision, M.S.A.-G.; project administration, M.S.A.-G.; funding acquisition, M.S.A.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

This research has been funded by the Secretaría de Ciencia, Humanidades, Tecnología e Innovación de México (SECIHTI), under the national scholarship for doctoral studies No. 812-657 for PhD student Erick Franco-Gaona, and IxM-SECIHTI No. 3097-7185. The Centro de Investigación en Matemáticas (CIMAT), through the Laboratorio de Supercómputo del Bajío, provided the necessary hardware for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Krichen, M. Convolutional Neural Networks: A Survey. Computers 2023, 12, 151.
2. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
3. Taye, M.M. Theoretical Understanding of Convolutional Neural Network: Concepts, Architectures, Applications, Future Directions. Computation 2023, 11, 52.
4. Patel, C.I.; Labana, D.; Pandya, S.; Modi, K.; Ghayvat, H.; Awais, M. Histogram of oriented gradient-based fusion of features for human action recognition in action video sequences. Sensors 2020, 20, 7299.
5. Zhang, J.; Li, Y.; Tai, A.; Wen, X.; Jiang, J. Motion Video Recognition in Speeded-Up Robust Features Tracking. Electronics 2022, 11, 2959.
6. Kuo, C.H.; Huang, E.H.; Chien, C.H.; Hsu, C.C. Fpga design of enhanced scale-invariant feature transform with finite-area parallel feature matching for stereo vision. Electronics 2021, 10, 1632.
7. Khan, S.; Rahmani, H.; Shah, S.A.A.; Bennamoun, M. A Guide to Convolutional Neural Networks for Computer Vision, 1st ed.; Morgan & Claypool: San Rafael, CA, USA, 2018; Volume 8.
8. Probst, P.; Boulesteix, A.L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 2019, 20, 1–32.
9. Shuying, L.; Weihong, D. Very deep convolutional neural network based image classification using small training sample size. In Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 3–6 November 2015; pp. 730–734.
10. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
11. Jang, S.; Son, Y. Empirical Evaluation of Activation Functions and Kernel Initializers on Deep Reinforcement Learning. In Proceedings of the 2019 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 16–18 October 2019; pp. 1140–1142.
12. Kang, J.S.; Kang, J.K.; Kim, J.J.; Jeon, K.W.; Chung, H.J.; Park, B.H. Neural Architecture Search Survey: A Computer Vision Perspective. Sensors 2023, 23, 1713.
13. Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A comprehensive survey of neural architecture search: Challenges and solutions. ACM Comput. Surv. 2021, 54, 76.
14. Borgli, R.J.; Stensland, H.K.; Riegler, M.A.; Halvorsen, P. Automatic Hyperparameter Optimization for Transfer Learning on Medical Image Datasets Using Bayesian Optimization. In Proceedings of the 13th International Symposium on Medical Information and Communication Technology (ISMICT), Oslo, Norway, 8–10 May 2019.
15. Monteiro, T.G.; Skourup, C.; Zhang, H. Optimizing CNN Hyperparameters for Mental Fatigue Assessment in Demanding Maritime Operations. IEEE Access 2020, 8, 40402–40412.
16. Amou, M.A.; Xia, K.; Kamhi, S.; Mouhafid, M. A Novel MRI Diagnosis Method for Brain Tumor Classification Based on CNN and Bayesian Optimization. Healthcare 2022, 10, 494.
17. Atteia, G.; Samee, N.A.; El-Kenawy, E.S.M.; Ibrahim, A. CNN-Hyperparameter Optimization for Diabetic Maculopathy Diagnosis in Optical Coherence Tomography and Fundus Retinography. Mathematics 2022, 10, 3274.
18. Gaspar, A.; Oliva, D.; Cuevas, E.; Zaldívar, D.; Pérez, M.; Pajares, G. Hyperparameter Optimization in a Convolutional Neural Network Using Metaheuristic Algorithms. Stud. Comput. Intell. 2021, 967, 37–59.
19. Mostafa, S.S.; Mendonca, F.; Ravelo-Garcia, A.G.; Julia-Serda, G.; Morgado-Dias, F. Multi-Objective Hyperparameter Optimization of Convolutional Neural Network for Obstructive Sleep Apnea Detection. IEEE Access 2020, 8, 129586–129599.
20. Kilichev, D.; Kim, W. Hyperparameter Optimization for 1D-CNN-Based Network Intrusion Detection Using GA and PSO. Mathematics 2023, 11, 3724.
21. Singh, P.; Chaudhury, S.; Panigrahi, B.K. Hybrid MPSO-CNN: Multi-level Particle Swarm optimized hyperparameters of Convolutional Neural Network. Swarm Evol. Comput. 2021, 63, 100863.
22. Raziani, S.; Azimbagirad, M. Deep CNN hyperparameter optimization algorithms for sensor-based human activity recognition. Neurosci. Inform. 2022, 2, 100078.
23. Lestari, C.A.D.; Anam, S.; Sa’adah, U. Tomato Leaf Disease Classification with Optimized Hyperparameter: A DenseNet-PSO Approach. In Proceedings of the First International Conference on Applied Mathematics, Statistics, and Computing (ICAMSAC 2023), Denpasar, Indonesia, 21–22 November 2023; pp. 228–239.
24. Aguerchi, K.; Jabrane, Y.; Habba, M.; El Hassani, A.H. A CNN Hyperparameters Optimization Based on Particle Swarm Optimization for Mammography Breast Cancer Classification. J. Imaging 2024, 10, 30.
25. Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1–21.
26. Samir, R.; Udit, C. Introduction to Soft Computing: Neuro-Fuzzy and Genetic Algorithms; Pearson: London, UK, 2013.
27. Duke, O.; Okwong, E.; Emmanuel, O.I. The synopsis of soft computing. Br. J. Comput. 2024, 7, 47–57.
28. Tomar, V.; Bansal, M.; Singh, P. Metaheuristic Algorithms for Optimization: A Brief Review. Eng. Proc. 2023, 59, 238.
29. Sean, L. Essentials of Metaheuristics: A Set of Undergraduate Lecture Notes, 2nd ed.; Lulu: Morrisville, NC, USA, 2013.
30. Johann, D.; Alain, P.; Eric, T. Metaheuristics for Hard Optimization; Springer: Berlin/Heidelberg, Germany, 2006.
31. Gendreau, M.; Potvin, J.-Y. Handbook of Metaheuristics, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2019.
32. Talbi, E.-G. Metaheuristics: From Design to Implementation; John Wiley & Sons: Hoboken, NJ, USA, 2009.
33. Valdez, S.I.; Hernández, A.; Botello, S. A Boltzmann based estimation of distribution algorithm. Inf. Sci. 2013, 236, 126–137.
34. Diniz, D.N.; Souza, M.J.; Carneiro, C.M.; Ushizima, D.M.; de Medeiros, F.N.S.; Oliveira, P.H.; Bianchi, A.G. An iterated local search algorithm for cell nuclei detection from pap smear images. In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), Heraklion, Greece, 3–5 May 2019; pp. 307–315.
35. Dhamar, B.; Arif, D.; Ahmad, M.; Wiwik, A.; Sasmi, H. Advanced Traveller Information Systems: Itinerary Optimisation Using Orienteering Problem Model and Great Deluge Iterative Local Search (Case Study: Angkot’s Route in Surabaya). In Proceedings of the 2019 2nd International Conference on Applied Engineering (ICAE), Batam, Indonesia, 2–3 October 2019; pp. 1–6.
36. Rodrigues, N.M.; Malan, K.M.; Ochoa, G.; Vanneschi, L.; Silva, S. Fitness landscape analysis of convolutional neural network architectures for image classification. Inf. Sci. 2022, 609, 711–726.
37. Fang, J.; Tao, X.; Tang, Z.; Qiu, R.; Liu, Y. Dataset, ground-truth and performance metrics for table detection evaluation. In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems, DAS, Gold Coast, Australia, 27–29 March 2012; pp. 445–449.
38. Lin, X.; Gao, L.; Tang, Z.; Lin, X.; Hu, X. Performance evaluation of mathematical formula identification. In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems, DAS, Gold Coast, Australia, 27–29 March 2012; pp. 287–291.
39. Antczak, K.; Liberadzki, Ł. Stenosis Detection with Deep Convolutional Neural Networks. MATEC Web Conf. 2018, 210, 04001.
40. Gil-Rios, M.A.; Cruz-Aceves, I.; Hernandez-Aguirre, A.; Hernandez-Gonzalez, M.A.; Solorio-Meza, S.E. Improving Automatic Coronary Stenosis Classification Using a Hybrid Metaheuristic with Diversity Control. Diagnostics 2024, 14, 2372.
41. Melendez, J.; Van Ginneken, B.; Maduskar, P.; Philipsen, R.H.; Reither, K.; Breuninger, M.; Adetifa, I.M.O.; Maane, R.; Ayles, H.; Sánchez, C.I. A novel multiple-instance learning-based approach to computer-aided detection of tuberculosis on chest X-rays. IEEE Trans. Med. Imaging 2015, 34, 179–192.
42. Siddiqui, S.A.; Malik, M.I.; Agne, S.; Dengel, A.; Ahmed, S. DeCNT: Deep deformable CNN for table detection. IEEE Access 2018, 6, 74151–74161.
43. Agarwal, M.; Mondal, A. CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2020; pp. 1–12.
44. Nazir, D.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Afzal, M.Z. HybridTabNet: Towards Better Table Detection in Scanned Document Images. Appl. Sci. 2021, 11, 8396.
45. Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution. J. Imaging 2021, 7, 214.
46. Kazdar, T.; Mseddi, W.S.; Akhloufi, M.A.; Agrebi, A.; Jmal, M. DCTable: A Dilated CNN with Optimizing Anchors for Accurate Table Detection. J. Imaging 2023, 9, 62.
47. Phong, B.-H.; Dat, L.-T.; Yen, N.-T.; Hoang, T.-M.; Le, T.-L. A deep learning based system for mathematical expression detection and recognition in document images. In Proceedings of the 2020 12th International Conference on Knowledge and Systems Engineering (KSE), Can Tho City, Vietnam, 12–14 November 2020; pp. 85–90.
48. Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Cascade network with deformable composite backbone for formula detection in scanned document images. Appl. Sci. 2021, 11, 7610.
49. Gil-Rios, M.A.; Guryev, I.V.; Cruz-Aceves, I.; Avina-Cervantes, J.G.; Hernandez-Gonzalez, M.A.; Solorio-Meza, S.E.; Lopez-Hernandez, J.M. Automatic feature selection for stenosis detection in x-ray coronary angiograms. Mathematics 2021, 9, 2471.
50. Hashmi, M.F.; Katiyar, S.; Keskar, A.G.; Bokde, N.D.; Geem, Z.W. Efficient pneumonia detection in chest xray images using deep transfer learning. Diagnostics 2020, 10, 417.
51. Stephen, O.; Sain, M.; Maduh, U.J.; Jeong, D.U. An Efficient Deep Learning Approach to Pneumonia Classification in Healthcare. J. Healthc. Eng. 2019, 2019, 4180949.
52. Trivedi, M.; Gupta, A. A lightweight deep learning architecture for the automatic detection of pneumonia using chest X-ray images. Multimed. Tools Appl. 2022, 81, 5515–5536.

Figure 1. Flowchart of the proposed neural architecture search methodology for multi-purpose databases. The metaheuristic search starts from a random architecture; at the end of the process, an optimal architecture is obtained for each case study.

Figure 2. Subsection (a) shows examples of tables, and subsection (b) shows examples of non-tables from Marmot [37].

Figure 3. Subsection (a) shows examples of equations, and subsection (b) shows examples of non-equations from Marmot [38].

Figure 4. Subsection (a) shows examples of stenosis, and subsection (b) shows examples of non-stenosis in real angiograms from Antczak and Liberadzki [39].

Figure 5. Subsection (a) shows examples of stenosis, and subsection (b) shows examples of non-stenosis in real angiograms from Stenosis608 [40].

Figure 6. Subsection (a) shows examples of lungs with pneumonia, and subsection (b) shows examples of healthy lungs from Melendez et al. [41].

Figure 7. Evolution of the elite in the best run of the BUMDA algorithm for each database during the training process.

Figure 8. The best architecture found by the proposed method for the classification of tables with the Marmot database [37].

Figure 9. The best architecture found by the proposed method for the classification of equations with the Marmot database [38].

Figure 10. The best architecture found by the proposed method for the classification of coronary stenosis with the Antczak and Liberadzki database [39].

Figure 11. The best architecture found by the proposed method for the classification of coronary stenosis with the Stenosis608 database [40].

Figure 12. The best architecture found by the proposed method for the classification of pneumonia with the pneumonia database [41].

Table 1. Ranges and values of hyperparameters encoded in a binary array.

Hyperparameter	Ranges or Values
Kernel size for convolutions	3 × 3/5 × 5
Pooling stride size	1/2
Number of dropout layers	0–3
Number of pooling layers	0–7
Batch size	8/16/24/32
Number of filters per convolutional layer	5–63
Number of convolutional layers	1–15
Number of neurons per fully connected layer	4–31
Number of fully connected layers	1–7
Learning rate	0.001/0.01/0.1/0.02
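
To make the 29-bit encoding concrete, the following is a minimal sketch of how a binary array could be decoded into the hyperparameters of Table 1. The bit widths are inferred from the number of values listed for each hyperparameter (they sum to 29); the order of the fields, the names `FIELDS` and `decode`, and the clamping of codes that fall below a listed range are assumptions made for illustration and are not taken from the article.

```python
# Bit widths inferred from the value counts in Table 1 (1+1+2+3+2+6+4+5+3+2 = 29).
# ASSUMPTION: the order of the fields inside the array is illustrative only.
FIELDS = [
    ("kernel_size", 1),
    ("pool_stride", 1),
    ("n_dropout", 2),
    ("n_pooling", 3),
    ("batch_size", 2),
    ("n_filters", 6),
    ("n_conv", 4),
    ("n_neurons", 5),
    ("n_fc", 3),
    ("learning_rate", 2),
]

KERNELS = [3, 5]                   # 3 x 3 or 5 x 5
STRIDES = [1, 2]
BATCHES = [8, 16, 24, 32]
RATES = [0.001, 0.01, 0.1, 0.02]   # listed in the same order as Table 1


def bits_to_int(bits):
    """Interpret a sequence of 0/1 values as an unsigned integer (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value


def decode(bits):
    """Map a 29-bit array to a dictionary of hyperparameters."""
    if len(bits) != 29:
        raise ValueError("expected a binary array of size 29")
    raw, pos = {}, 0
    for name, width in FIELDS:
        raw[name] = bits_to_int(bits[pos:pos + width])
        pos += width
    return {
        "kernel_size": KERNELS[raw["kernel_size"]],
        "pool_stride": STRIDES[raw["pool_stride"]],
        "n_dropout": raw["n_dropout"],          # 0-3
        "n_pooling": raw["n_pooling"],          # 0-7
        "batch_size": BATCHES[raw["batch_size"]],
        # ASSUMPTION: codes below the lower bound of Table 1 are clamped to it.
        "n_filters": max(5, raw["n_filters"]),  # 5-63
        "n_conv": max(1, raw["n_conv"]),        # 1-15
        "n_neurons": max(4, raw["n_neurons"]),  # 4-31
        "n_fc": max(1, raw["n_fc"]),            # 1-7
        "learning_rate": RATES[raw["learning_rate"]],
    }


if __name__ == "__main__":
    print(decode([1] * 29))
```

With this layout, the all-ones array decodes to the upper end of every range in Table 1, and the all-zeros array decodes to the lower end. The full search space contains 2^29 = 536,870,912 such arrays.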

Table 2. Description of the databases taken from the literature and used to test the architectures generated for each one.

Database	Number of Images	Size of Images	Description
Marmot tables [37]	2752	Various sizes	The database contains 2000 PNG images extracted from PDF documents, from which 2752 patches were extracted. The patches have multiple sizes, and the original images are of low resolution. Positive and negative instances are balanced in a 1:1 proportion. The database includes multiple types of tables (positive) and non-tables (negative); the negative instances are elements similar to tables.
Marmot equations [38]	1554	Various sizes	The database originally contains 400 PDF documents, from which 1554 patches of multiple sizes were extracted. Positive and negative instances are balanced in a 1:1 proportion. The database includes multiple types of equations (positive) and non-equations (negative).
Coronary stenosis from Antczak [39]	250	32 × 32 pixels	The database consists of 250 patches taken from real angiograms and classified by experts. The patches show examples of arteries that are blocked or unblocked by coronary stenosis. Positive and negative instances are balanced in a 1:1 proportion. At 32 × 32 pixels, the images are of low resolution.
Stenosis608 [40]	608	64 × 64 pixels	The database contains 180 digital images of 512 × 512 pixels delineated by experts, from which a set of 608 patches of 64 × 64 pixels was formed. The patches show arteries with or without coronary stenosis. Positive and negative instances are balanced in a 1:1 proportion.
Pneumonia [41]	5856	Various sizes	The database consists of 5856 anteroposterior chest radiographs of pediatric patients between 1 and 5 years of age. The images do not have a standardized size but are of high resolution. The complete image, rather than patches, is used for classification. The database has 4273 positive images and 1583 negative images.
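
Because the pneumonia database is the only one in Table 2 that is not balanced, accuracy and F1-score can diverge on it. The sketch below computes both metrics from confusion-matrix counts with the standard definitions. The function name `f1_and_accuracy` and the trivial "always positive" classifier are hypothetical illustrations, and the counts are those of the whole database rather than the test split used in the article, which is not given here.

```python
def f1_and_accuracy(tp, fp, fn, tn):
    """Return (F1-score, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy


# Class counts of the pneumonia database as listed in Table 2.
POSITIVES, NEGATIVES = 4273, 1583

# HYPOTHETICAL baseline: a classifier that labels every image as positive.
f1, acc = f1_and_accuracy(tp=POSITIVES, fp=NEGATIVES, fn=0, tn=0)
print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")
```

Even this uninformative baseline reaches an accuracy of roughly 0.73 and an F1-score of roughly 0.84 on these counts, which gives a floor against which scores on the unbalanced database can be read.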

Table 3. Parameters of three metaheuristics to be compared with BUMDA in the NAS process.

Algorithm	Parameter	Value
ILS	Number of iterations	200
ILS	Type of local search	First improvement
SA	Initial temperature	50
SA	Final temperature	0.0014
SA	Alpha	0.9
GA	Generations	20
GA	Population size	10
GA	Percentage of selection	0.6
GA	Percentage of mutation	0.1
BUMDA	Generations	20
BUMDA	Population size	10
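
As an illustration of how BUMDA could run with the settings in Table 3 (10 individuals, 20 generations), the following sketch applies it to a 29-bit array. It uses the fitness-weighted mean and variance estimators described for BUMDA in [33], which should be consulted for the exact formulation. Several parts are assumptions and not the authors' implementation: the continuous samples are thresholded at 0.5 to obtain bits, truncation selection keeps the better half of the population, a small variance floor avoids degenerate sampling, and the toy `onemax` fitness stands in for the F1-score of a trained CNN.

```python
import random


def onemax(bits):
    """Toy fitness (number of ones). ASSUMPTION: stands in for a CNN's F1-score."""
    return sum(bits)


def to_bits(x):
    """ASSUMPTION: a continuous gene above 0.5 is read as bit 1, otherwise 0."""
    return [1 if xi > 0.5 else 0 for xi in x]


def bumda_binary(fitness, n_bits=29, pop_size=10, generations=20, seed=0):
    """Maximize `fitness` over binary arrays with a BUMDA-style univariate Gaussian EDA."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_bits)] for _ in range(pop_size)]
    best_x, best_f = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(to_bits(x)), x) for x in pop),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_f:
            best_f, best_x = scored[0]
        # Truncation selection (simplified): keep the better half.
        selected = scored[:max(2, pop_size // 2)]
        f_worst = selected[-1][0]
        # Boltzmann-inspired weights: g(x_i) - g(x_worst) + 1.
        weights = [f - f_worst + 1.0 for f, _ in selected]
        w_sum = sum(weights)
        mu, sd = [], []
        for j in range(n_bits):
            m = sum(w * x[j] for w, (_, x) in zip(weights, selected)) / w_sum
            v = sum(w * (x[j] - m) ** 2
                    for w, (_, x) in zip(weights, selected)) / (1.0 + w_sum)
            mu.append(m)
            sd.append(max(v, 1e-6) ** 0.5)  # variance floor is an assumption
        # Elitism: carry the best individual over, then sample the rest.
        pop = [best_x[:]]
        while len(pop) < pop_size:
            pop.append([rng.gauss(mu[j], sd[j]) for j in range(n_bits)])
    return to_bits(best_x), best_f


if __name__ == "__main__":
    bits, score = bumda_binary(onemax)
    print(bits, score)
```

In the setting of the article, each fitness evaluation would correspond to training and scoring one decoded architecture, so a run with these parameters evaluates at most 10 × 20 = 200 architectures.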

Table 4. Statistical summary of the F1-score for SA, ILS, GA, and BUMDA for each database using the training set.

Metaheuristic	Minimum	Maximum	Mean	Median	Standard Deviation
Marmot Tables [37]
SA	0.919	0.942	0.929	0.923	0.0102
ILS	0.923	0.957	0.941	0.940	0.011
GA	0.940	0.968	0.958	0.962	0.011
BUMDA	0.968	0.988	0.975	0.975	0.005
Marmot Equations [38]
SA	0.913	0.943	0.932	0.935	0.012
ILS	0.913	0.953	0.939	0.943	0.013
GA	0.962	0.987	0.975	0.975	0.010
BUMDA	0.962	0.989	0.981	0.987	0.009
Stenosis from Antczak and Liberadzki [39]
SA	0.923	0.960	0.949	0.960	0.018
ILS	0.923	0.960	0.949	0.960	0.018
GA	0.923	0.960	0.952	0.960	0.0157
BUMDA	0.960	0.978	0.965	0.960	0.006
Stenosis608 [40]
SA	0.885	0.968	0.941	0.950	0.025
ILS	0.937	0.984	0.957	0.952	0.017
GA	0.937	0.984	0.970	0.983	0.019
BUMDA	0.952	0.989	0.974	0.984	0.015
Pneumonia [41]
SA	0.959	0.972	0.963	0.962	0.007
ILS	0.962	0.975	0.969	0.972	0.006
GA	0.962	0.975	0.971	0.974	0.006
BUMDA	0.962	0.992	0.970	0.972	0.005

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

