Search Results (29)

Search Parameters:
Keywords = Adamax

36 pages, 9139 KiB  
Article
On the Synergy of Optimizers and Activation Functions: A CNN Benchmarking Study
by Khuraman Aziz Sayın, Necla Kırcalı Gürsoy, Türkay Yolcu and Arif Gürsoy
Mathematics 2025, 13(13), 2088; https://doi.org/10.3390/math13132088 - 25 Jun 2025
Viewed by 483
Abstract
In this study, we present a comparative analysis of gradient descent-based optimizers frequently used in Convolutional Neural Networks (CNNs), including SGD, mSGD, RMSprop, Adadelta, Nadam, Adamax, Adam, and the recent EVE optimizer. To explore the interaction between optimization strategies and activation functions, we systematically evaluate all combinations of these optimizers with four activation functions—ReLU, LeakyReLU, Tanh, and GELU—across three benchmark image classification datasets: CIFAR-10, Fashion-MNIST (F-MNIST), and Labeled Faces in the Wild (LFW). Each configuration was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, mean absolute error (MAE), and mean squared error (MSE). All experiments were performed using k-fold cross-validation to ensure statistical robustness. Additionally, two-way ANOVA was employed to validate the significance of differences across optimizer–activation combinations. This study aims to highlight the importance of jointly selecting optimizers and activation functions to enhance training dynamics and generalization in CNNs. We also consider the role of critical hyperparameters, such as learning rate and regularization methods, in influencing optimization stability. This work provides valuable insights into the optimizer–activation interplay and offers practical guidance for improving architectural and hyperparameter configurations in CNN-based deep learning models. Full article
(This article belongs to the Special Issue Artificial Intelligence and Data Science, 2nd Edition)
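Adamax, the search keyword behind these results, differs from Adam by tracking an exponentially weighted infinity norm of the gradients instead of a squared-gradient average. A minimal NumPy sketch of the update rule, following the standard Kingma and Ba formulation (the learning rate and test function here are illustrative, not taken from the benchmarked implementations):

```python
import numpy as np

def adamax_step(theta, grad, m, u, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update: the infinity norm u replaces Adam's squared-gradient average."""
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    u = np.maximum(beta2 * u, np.abs(grad))     # exponentially weighted infinity norm
    theta = theta - (lr / (1 - beta1 ** t)) * m / (u + eps)
    return theta, m, u

# minimize f(x) = (x - 3)^2
theta, m, u = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, u = adamax_step(theta, grad, m, u, t)
# theta converges toward 3.0
```

Because the denominator is a max rather than an average, Adamax's effective step size is bounded by the learning rate, which is one reason it is often reported as stable in comparisons like the one above.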

44 pages, 3458 KiB  
Article
Fractional Optimizers for LSTM Networks in Financial Time Series Forecasting
by Mustapha Ez-zaiym, Yassine Senhaji, Meriem Rachid, Karim El Moutaouakil and Vasile Palade
Mathematics 2025, 13(13), 2068; https://doi.org/10.3390/math13132068 - 22 Jun 2025
Viewed by 545
Abstract
This study investigates the theoretical foundations and practical advantages of fractional-order optimization in computational machine learning, with a particular focus on stock price forecasting using long short-term memory (LSTM) networks. We extend several widely used optimization algorithms—including Adam, RMSprop, SGD, Adadelta, FTRL, Adamax, and Adagrad—by incorporating fractional derivatives into their update rules. This novel approach leverages the memory-retentive properties of fractional calculus to improve convergence behavior and model efficiency. Our experimental analysis evaluates the performance of fractional-order optimizers on LSTM networks tasked with forecasting stock prices for major companies such as AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH. Considering four metrics (Sharpe ratio, directional accuracy, cumulative return, and MSE), the results show that fractional orders can significantly enhance prediction accuracy for moderately volatile stocks, especially among lower-cap assets. However, for highly volatile stocks, performance tends to degrade with higher fractional orders, leading to erratic and inconsistent forecasts. In addition, fractional optimizers with short-memory truncation offer a favorable trade-off between computational efficiency and modeling accuracy in medium-frequency financial applications. Their enhanced capacity to capture long-range dependencies and robust performance in noisy environments further justify their adoption in such contexts. These results suggest that fractional-order optimization holds significant promise for improving financial forecasting models—provided that the fractional parameters are carefully tuned to balance memory effects with system stability. Full article
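Fractional derivatives in such update rules are usually discretized with Grünwald–Letnikov coefficients under the short-memory principle, so that only a truncated window of past gradients is weighted. A pure-Python sketch of that general construction (an illustration of the idea, not the authors' exact update rules):

```python
def gl_coefficients(alpha, memory):
    """Coefficients (-1)^k * C(alpha, k) via the standard recurrence,
    truncated to `memory` terms (the short-memory principle)."""
    coeffs = [1.0]
    for k in range(1, memory):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / k))
    return coeffs

def fractional_sgd_step(theta, grad_history, lr=0.01, alpha=0.9, memory=8):
    """Replace the plain gradient with a GL-weighted sum over recent gradients."""
    c = gl_coefficients(alpha, memory)
    window = list(reversed(grad_history[-memory:]))   # newest gradient first
    frac_grad = sum(ck * g for ck, g in zip(c, window))
    return theta - lr * frac_grad

# sanity check: alpha = 1 degenerates to the first-order difference g_t - g_{t-1}
c1 = gl_coefficients(1.0, 4)
step = fractional_sgd_step(0.0, [2.0, 5.0], lr=1.0, alpha=1.0)
```

The memory window is exactly the trade-off the abstract mentions: longer windows capture more long-range dependence at higher per-step cost.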

18 pages, 3561 KiB  
Article
Improving TMJ Diagnosis: A Deep Learning Approach for Detecting Mandibular Condyle Bone Changes
by Kader Azlağ Pekince, Adem Pekince and Buse Yaren Kazangirler
Diagnostics 2025, 15(8), 1022; https://doi.org/10.3390/diagnostics15081022 - 17 Apr 2025
Viewed by 2350
Abstract
Objectives: This paper evaluates the potential of using deep learning approaches for the detection of degenerative bone changes in the mandibular condyle. The aim of this study is to enable the detection and diagnosis of mandibular condyle degenerations, which are difficult to observe and diagnose on panoramic radiographs, using deep learning methods. Methods: A total of 3875 condylar images were obtained from panoramic radiographs. Condylar bone changes were represented by flattening, osteophyte, and erosion, and images in which two or more of these changes were observed were labeled as “other”. Due to the limited number of images containing osteophytes and erosion, two approaches were used. In the first approach, images containing osteophytes and erosion were combined into the “other” group, resulting in three groups: normal, flattening, and deformation (“deformation” encompasses the “other” group, together with osteophyte and erosion). In the second approach, images containing osteophytes and erosion were completely excluded, resulting in three groups: normal, flattening, and other. The study utilizes a range of advanced deep learning algorithms, including Dense Networks, Residual Networks, VGG Networks, and Google Networks, which are pre-trained with transfer learning techniques. Model performance was evaluated using datasets with different distributions, specifically 70:30 and 80:20 training-test splits. Results: The GoogleNet architecture achieved the highest accuracy. Specifically, with the 80:20 split of the normal-flattening-deformation dataset and the Adamax optimizer, an accuracy of 95.23% was achieved. The results demonstrate that CNN-based methods are highly successful in determining mandibular condyle bone changes. Conclusions: This study demonstrates the potential of deep learning, particularly CNNs, for the accurate and efficient detection of TMJ-related condylar bone changes from panoramic radiographs. 
This approach could assist clinicians in identifying patients requiring further intervention. Future research may involve using cross-sectional imaging methods and training the right and left condyles together to potentially increase the success rate. This approach has the potential to improve the early detection of TMJ-related condylar bone changes, enabling timely referrals and potentially preventing disease progression. Full article
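The two grouping strategies described above amount to a simple label remapping (or filtering) before training; a sketch, assuming string labels (the dataset's actual encoding may differ):

```python
# strategy 1: fold osteophyte and erosion into "deformation" together with "other"
GROUP_DEFORMATION = {"normal": "normal", "flattening": "flattening",
                     "osteophyte": "deformation", "erosion": "deformation",
                     "other": "deformation"}

# strategy 2: drop osteophyte and erosion entirely, keeping normal/flattening/other
def apply_strategy_2(labels):
    return [label for label in labels if label not in ("osteophyte", "erosion")]

labels = ["normal", "erosion", "flattening", "osteophyte", "other", "normal"]
grouped = [GROUP_DEFORMATION[label] for label in labels]
kept = apply_strategy_2(labels)
```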

29 pages, 8824 KiB  
Article
Toward Reliable Post-Disaster Assessment: Advancing Building Damage Detection Using You Only Look Once Convolutional Neural Network and Satellite Imagery
by César Luis Moreno González, Germán A. Montoya and Carlos Lozano Garzón
Mathematics 2025, 13(7), 1041; https://doi.org/10.3390/math13071041 - 23 Mar 2025
Viewed by 775
Abstract
Natural disasters continuously threaten populations worldwide, with hydrometeorological events standing out due to their unpredictability, rapid onset, and significant destructive capacity. However, developing countries often face severe budgetary constraints and rely heavily on international support, limiting their ability to implement optimal disaster response strategies. This study addresses these challenges by developing and implementing YOLOv8-based deep learning models trained on high-resolution satellite imagery from the Maxar GeoEye-1 satellite. Unlike prior studies, we introduce a manually labeled dataset, consisting of 1400 undamaged and 1200 damaged buildings, derived from pre- and post-Hurricane Maria imagery. This dataset has been publicly released, providing a benchmark for future disaster assessment research. Additionally, we conduct a systematic evaluation of optimization strategies, comparing SGD with momentum, RMSProp, Adam, AdaMax, NAdam, and AdamW. Our results demonstrate that SGD with momentum outperforms Adam-based optimizers in training stability, convergence speed, and reliability across higher confidence thresholds, leading to more robust and consistent disaster damage predictions. To enhance usability, we propose deploying the trained model via a REST API, enabling real-time damage assessment with minimal computational resources, making it a low-cost, scalable tool for government agencies and humanitarian organizations. These findings contribute to machine learning-based disaster response, offering an efficient, cost-effective framework for large-scale damage assessment and reinforcing the importance of model selection, hyperparameter tuning, and optimization functions in critical real-world applications. Full article
(This article belongs to the Special Issue Mathematical Methods and Models Applied in Information Technology)
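The SGD-with-momentum update that outperformed the Adam family here is short enough to sketch. A minimal NumPy version on an ill-conditioned quadratic (illustrative only, unrelated to the YOLOv8 training loop itself):

```python
import numpy as np

def sgd_momentum(grad_fn, theta0, lr=0.1, momentum=0.9, steps=200):
    """Heavy-ball SGD: the velocity accumulates past gradients and damps oscillation."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(theta)
        theta = theta + v
    return theta

# ill-conditioned quadratic f(x, y) = 5x^2 + 0.5y^2, minimum at the origin
grad = lambda p: np.array([10.0 * p[0], 1.0 * p[1]])
theta = sgd_momentum(grad, [1.0, 1.0])
```

Unlike Adam-family methods, the step direction is not rescaled per coordinate, which is one reason momentum SGD often generalizes more consistently across confidence thresholds, as the study reports.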

23 pages, 9189 KiB  
Article
A Dendritic Neural Network-Based Model for Residential Electricity Consumption Prediction
by Ting Jin, Rui Xu, Kunqi Su and Jinrui Gao
Mathematics 2025, 13(4), 575; https://doi.org/10.3390/math13040575 - 9 Feb 2025
Viewed by 933
Abstract
Residential electricity consumption represents a large percentage of overall energy use. Therefore, accurately predicting residential electricity consumption and understanding the factors that influence it can provide effective strategies for reducing energy demand. In this study, a dendritic neural network-based model (DNM), combined with the AdaMax optimization algorithm, is used to predict residential electricity consumption. The case study uses the U.S. residential electricity consumption dataset. This paper constructs a feature selection framework for the dataset, reducing the high-dimensional data to 12 features. The DNM is then used for fitting and compared with five commonly used prediction models. The R2 of the DNM is 0.7405, the highest among the six models, followed by the XGBoost model with an R2 of 0.7286. The paper then leverages the interpretability of the DNM to further filter the data, obtaining a dataset with 6 features, on which the R2 improves to 0.7423, an increase of 0.0018. Full article
(This article belongs to the Special Issue Biologically Plausible Deep Learning)
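Dendritic neuron models of this kind typically stack a sigmoid synaptic layer, multiplicative dendrites, a summing membrane, and a sigmoid soma. A hedged NumPy sketch of one forward pass through that generic structure (parameter shapes are made up for illustration, not the paper's trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dnm_forward(x, w, q, k=5.0, soma_k=5.0, soma_theta=0.5):
    """Generic dendritic neuron model: synapse -> dendrite -> membrane -> soma."""
    y = sigmoid(k * (w * x[:, None] - q))       # synaptic layer, (n_features, n_dendrites)
    z = np.prod(y, axis=0)                      # dendritic layer: multiplicative interaction
    v = np.sum(z)                               # membrane layer: sum over dendrite branches
    return sigmoid(soma_k * (v - soma_theta))   # soma output in (0, 1)

rng = np.random.default_rng(0)
x = rng.random(12)                              # stand-in for the 12 selected features
out = dnm_forward(x, w=rng.normal(size=(12, 4)), q=rng.normal(size=(12, 4)))
```

The interpretability the abstract mentions comes from this structure: after training, near-constant synapses can be pruned, revealing which input features each dendrite actually uses.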

25 pages, 1676 KiB  
Article
Research of Chinese Entity Recognition Model Based on Multi-Feature Semantic Enhancement
by Ling Yuan, Chenglong Zeng and Peng Pan
Electronics 2024, 13(24), 4895; https://doi.org/10.3390/electronics13244895 - 12 Dec 2024
Cited by 1 | Viewed by 873
Abstract
Chinese Entity Recognition (CER) aims to extract key information entities from Chinese text data, supporting subsequent natural language processing tasks such as relation extraction, knowledge graph construction, and intelligent question answering. However, CER faces several challenges, including limited training corpora, unclear entity boundaries, and complex entity structures, resulting in low accuracy and leaving room for further improvement. To address issues such as high annotation costs and ambiguous entity boundaries, this paper proposes the SEMFF-CER model, a CER model based on semantic enhancement and multi-feature fusion. The model employs character feature extraction algorithms, SofeLexicon semantic enhancement for vocabulary feature extraction, and deep semantic feature extraction from pre-trained models. These features are integrated into the entity recognition process via gating mechanisms, effectively leveraging diverse features to enhance contextual semantics and improve recognition accuracy. Additionally, the model incorporates several optimization strategies: an adaptive loss function to balance negative samples and improve the F1 score, data augmentation to enhance model robustness, and dropout and Adamax optimization algorithms to refine training. The SEMFF-CER model is characterized by low dependence on training corpora, fast computation, and strong scalability. Experiments conducted on four Chinese benchmark entity recognition datasets validate the proposed model, demonstrating superior performance over existing models with the highest F1 score. Full article
(This article belongs to the Section Artificial Intelligence)
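The gating mechanism described for fusing the different feature streams can be sketched generically: a sigmoid gate computes per-dimension mixing weights between two feature vectors, yielding a convex combination (illustrative shapes and parameters, not the SEMFF-CER architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, W, b):
    """Per-dimension sigmoid gate mixing two feature vectors."""
    gate = sigmoid(np.concatenate([feat_a, feat_b]) @ W + b)   # entries in (0, 1)
    return gate * feat_a + (1.0 - gate) * feat_b

rng = np.random.default_rng(1)
d = 8
char_feat = rng.normal(size=d)   # e.g. character-level features
lex_feat = rng.normal(size=d)    # e.g. lexicon-enhanced features
fused = gated_fusion(char_feat, lex_feat,
                     W=0.1 * rng.normal(size=(2 * d, d)), b=np.zeros(d))
```

Because the gate is learned, the model can lean on lexicon semantics for ambiguous boundaries and on character features elsewhere.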

44 pages, 11292 KiB  
Article
Enhancing Efficacy in Breast Cancer Screening with Nesterov Momentum Optimization Techniques
by Priyanka Ramdass, Gajendran Ganesan, Salah Boulaaras and Seham Sh. Tantawy
Mathematics 2024, 12(21), 3354; https://doi.org/10.3390/math12213354 - 25 Oct 2024
Cited by 1 | Viewed by 1308
Abstract
In the contemporary landscape of healthcare, machine learning models are pivotal in facilitating precise predictions, particularly in the nuanced diagnosis of complex ailments such as breast cancer. Traditional diagnostic methodologies grapple with inherent challenges, including excessive complexity, elevated costs, and reliance on subjective interpretation, which frequently culminate in inaccuracies. The urgency of early detection cannot be overstated, as it markedly broadens treatment modalities and significantly enhances survival rates. This paper delineates an innovative optimization framework designed to augment diagnostic accuracy by amalgamating momentum-based optimization techniques within a neural network paradigm. Conventional machine learning approaches are often encumbered by issues of overfitting, data imbalance, and the inadequacy of capturing intricate patterns in high-dimensional datasets. To counter these limitations, we propose a sophisticated framework that integrates an adaptive threshold mechanism across an array of gradient-based optimizers, including SGD, RMSprop, Adam, Adagrad, Adamax, Adadelta, Nadam, and Nesterov momentum. This novel approach effectively mitigates oscillatory behavior, refines parameter updates, and accelerates convergence. A salient feature of our methodology is the incorporation of a momentum threshold for early stopping, which ceases training upon the stabilization of momentum below a pre-defined threshold, thereby pre-emptively preventing overfitting. Leveraging the Wisconsin Breast Cancer Dataset, our model achieved a remarkable 99.72% accuracy and 100% sensitivity, significantly curtailing misclassification rates compared to traditional methodologies. This framework stands as a robust solution for early breast cancer diagnosis, thereby enhancing clinical decision making and improving patient outcomes. Full article
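The momentum-threshold early stopping described above can be sketched with plain Nesterov momentum on a toy quadratic: training halts once the velocity norm stays below a preset threshold for a few consecutive steps. This is a generic illustration with assumed hyperparameters, not the paper's implementation:

```python
import numpy as np

def train_with_momentum_stop(grad_fn, theta0, lr=0.1, beta=0.9,
                             threshold=1e-4, patience=5, max_steps=10_000):
    """Nesterov-momentum descent; stop when momentum stabilizes below threshold."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    calm = 0
    for step in range(1, max_steps + 1):
        v = beta * v - lr * grad_fn(theta + beta * v)   # gradient at the look-ahead point
        theta = theta + v
        calm = calm + 1 if np.linalg.norm(v) < threshold else 0
        if calm >= patience:                            # momentum has settled: stop early
            return theta, step
    return theta, max_steps

grad = lambda p: 2.0 * (p - 1.0)                        # f(x) = (x - 1)^2
theta, stopped_at = train_with_momentum_stop(grad, np.array([5.0]))
```

Requiring several consecutive calm steps avoids stopping at a momentary zero-crossing of the velocity during oscillation.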

13 pages, 2961 KiB  
Article
LSTM Model Integrated Remote Sensing Data for Drought Prediction: A Study on Climate Change Impacts on Water Availability in the Arid Region
by Haitham Abdulmohsin Afan, Atheer Saleem Almawla, Basheer Al-Hadeethi, Faidhalrahman Khaleel, Alaa H. AbdUlameer, Md Munir Hayet Khan, Muhammad Izzat Nor Ma’arof and Ammar Hatem Kamel
Water 2024, 16(19), 2799; https://doi.org/10.3390/w16192799 - 1 Oct 2024
Cited by 4 | Viewed by 3149
Abstract
Climate change is one of the trending terms in the world nowadays due to its profound impact on human health and activity. Extreme drought events and desertification are some of the results of climate change. This study applied the long short-term memory (LSTM) model to predict the drought index for Anbar Province, Iraq. Standardized precipitation evapotranspiration index (SPEI) data spanning 118 years were used. The proposed model employed seven different optimizers to enhance the prediction performance. Based on different performance indicators, the results show that the RMSprop and Adamax optimizers achieved the highest accuracy (90.93% and 90.61%, respectively). Additionally, the models forecasted the SPEI for the study area over the next 40 years, with all models showing an upward trend in the SPEI; accordingly, the best-performing models projected no increase in drought severity. This research highlights the vital role of machine learning models and remote sensing in drought forecasting and the significance of these applications in providing accurate climate data for better water resources management, especially in arid regions like Anbar Province. Full article
(This article belongs to the Section Water and Climate Change)
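Feeding a century-long SPEI series to an LSTM requires reshaping it into fixed-length windows with one-step-ahead targets. A pure-Python sketch of that preprocessing step (the window length here is an assumption; the paper's actual lookback is not stated in the abstract):

```python
def make_windows(series, lookback=12, horizon=1):
    """Turn a long index series into (input window, target) pairs for an LSTM."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback + horizon - 1])
    return X, y

spei = [0.1 * t for t in range(30)]   # toy stand-in for the 118-year SPEI record
X, y = make_windows(spei, lookback=12)
```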

30 pages, 3939 KiB  
Article
Predictive Modeling of Conveyor Belt Deterioration in Coal Mines Using AI Techniques
by Parthkumar Parmar, Leszek Jurdziak, Aleksandra Rzeszowska and Anna Burduk
Energies 2024, 17(14), 3497; https://doi.org/10.3390/en17143497 - 16 Jul 2024
Cited by 6 | Viewed by 2524
Abstract
Conveyor belts are vital for material transportation in coal mines due to their cost-effectiveness and versatility. These belts endure significant wear from harsh operating conditions, risking substantial financial losses if they fail. This study develops five artificial neural network (ANN) models to predict conveyor belt damage using 11 parameters from the Belchatow brown coal mine in Poland. The models target five outputs: number of repairs and cable cuts, cumulative number of repairs and cable cuts, and their ages. Various optimizers (Adam, Nadam, RMSprop, Adamax, and stochastic gradient descent or SGD) and activation functions (ReLU, Swish, sigmoid, tanh, Leaky ReLU, and softmax) were tested to find the optimal configurations. The predictive performance was evaluated using three error indicators against actual mine data. Superior models can forecast belt behavior under specific conditions, aiding proactive maintenance. The study also advocates for the Diagbelt+ system over human inspections for failure detection. This modeling approach enhances proactive maintenance, preventing total system breakdowns due to belt wear. Full article
(This article belongs to the Special Issue Advances in Optimization and Modelling of Coal Mining)
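Testing every optimizer against every activation function, as described, is a Cartesian product of configurations. A sketch of the enumeration, with names taken from the abstract (the training and scoring of each configuration is omitted):

```python
from itertools import product

optimizers = ["Adam", "Nadam", "RMSprop", "Adamax", "SGD"]
activations = ["ReLU", "Swish", "sigmoid", "tanh", "Leaky ReLU", "softmax"]

# each (optimizer, activation) pair would be trained and evaluated with the
# three error indicators before selecting the best configuration per output
configs = list(product(optimizers, activations))
```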

18 pages, 2185 KiB  
Article
Evaluation of Optimization Algorithms for Measurement of Suspended Solids
by Daniela Lopez-Betancur, Efrén González-Ramírez, Carlos Guerrero-Mendez, Tonatiuh Saucedo-Anaya, Martín Montes Rivera, Edith Olmos-Trujillo and Salvador Gomez Jimenez
Water 2024, 16(13), 1761; https://doi.org/10.3390/w16131761 - 21 Jun 2024
Cited by 3 | Viewed by 2631
Abstract
Advances in convolutional neural networks (CNNs) provide novel and alternative solutions for water quality management. This paper evaluates state-of-the-art optimization strategies available in PyTorch to date using AlexNet, a simple yet powerful CNN model. We assessed twelve optimization algorithms: Adadelta, Adagrad, Adam, AdamW, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, and SGD under default conditions. The AlexNet model, pre-trained and coupled with a Multiple Linear Regression (MLR) model, was used to estimate the quantity of black pixels (suspended solids) randomly distributed on a white background image, representing total suspended solids in liquid samples. Simulated images were used instead of real samples to maintain a controlled environment and eliminate variables that could introduce noise and optical aberrations, ensuring a more precise evaluation of the optimization algorithms. The performance of the CNN was evaluated using the accuracy, precision, recall, specificity, and F_Score metrics, while the MLR was evaluated with the coefficient of determination (R2), mean absolute error, and mean squared error. The results indicate that the top five optimizers are Adagrad, Rprop, Adamax, SGD, and ASGD, with accuracy rates of 100% for each optimizer, and R2 values of 0.996, 0.959, 0.971, 0.966, and 0.966, respectively. In contrast, the three worst-performing optimizers were Adam, AdamW, and NAdam, with accuracy rates of 22.2%, 11.1%, and 11.1%, and R2 values of 0.000, 0.148, and 0.000, respectively. These findings demonstrate the significant impact of optimization algorithms on CNN performance and provide valuable insights for selecting suitable optimizers for water quality assessment, filling existing gaps in the literature. This motivates further research testing the best-performing optimizers on real samples to validate the findings and enhance their practical applicability. Full article
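The R2 values quoted above are coefficients of determination for the MLR stage; for reference, a pure-Python sketch of the metric:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))   # perfect fit -> 1.0
```

Note that R2 can be negative, as in the 0.000-floored values above, when predictions fit worse than the mean of the targets.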

23 pages, 6087 KiB  
Article
Automatic Calibration of Microscopic Traffic Simulation Models Using Artificial Neural Networks
by Rodrigo F. Daguano, Leopoldo R. Yoshioka, Marcio L. Netto, Claudio L. Marte, Cassiano A. Isler, Max Mauro Dias Santos and João F. Justo
Sensors 2023, 23(21), 8798; https://doi.org/10.3390/s23218798 - 29 Oct 2023
Cited by 7 | Viewed by 3374
Abstract
Traffic simulations are valuable tools for urban mobility planning and operation, particularly in large cities. Simulation-based microscopic models have enabled traffic engineers to understand local transit and transport behaviors more deeply and manage urban mobility. However, for the simulations to be effective, the transport network and user behavior parameters must be calibrated to mirror real scenarios. In general, calibration is performed manually by traffic engineers who use their knowledge and experience to adjust the parameters of the simulator. Unfortunately, there is still no systematic and automatic process for calibrating traffic simulation networks, although some methods have been proposed in the literature. This study proposes a methodology that facilitates the calibration process, where an artificial neural network (ANN) is trained to learn the behavior of the transport network of interest. The ANN used is the Multi-Layer Perceptron (MLP), trained with back-propagation methods. Based on this learning procedure, the neural network can select the optimized values of the simulation parameters that best mimic the traffic conditions of interest. Experiments considered two microscopic models of traffic and two psychophysical models (Wiedemann 74 and Wiedemann 99). The microscopic traffic models are located in the metropolitan region of São Paulo, Brazil. Moreover, we tested the different configurations of the MLP (layers and numbers of neurons) as well as several variations of the backpropagation training method: Stochastic Gradient Descent (SGD), Adam, Adagrad, Adadelta, Adamax, and Nadam. The results of the experiments show that the proposed methodology is accurate and efficient, leading to calibration with a correlation coefficient greater than 0.8, when the calibrated parameters generate more visible effects on the road network, such as travel times, vehicle counts, and average speeds. 
For the psychophysical parameters, in the most simplified model (W74), the correlation coefficient was greater than 0.7. The advantage of using ANN for the automatic calibration of simulation parameters is that it allows traffic engineers to carry out comprehensive studies on a large number of future scenarios, such as at different times of the day, as well as on different days of the week and months of the year. Full article
(This article belongs to the Special Issue Advanced Sensing Technology for Intelligent Transportation Systems)
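The calibration quality reported above is a Pearson correlation between simulated and observed traffic measures. A pure-Python sketch of the coefficient, with hypothetical travel-time values for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

observed = [31.0, 40.0, 55.0, 62.0]    # hypothetical observed travel times (s)
simulated = [30.0, 42.0, 53.0, 65.0]   # hypothetical simulator output
r = pearson_r(observed, simulated)     # close to 1 for a well-calibrated model
```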

22 pages, 6487 KiB  
Article
An Intelligent Attention-Based Transfer Learning Model for Accurate Differentiation of Bone Marrow Stains to Diagnose Hematological Disorder
by Hani Alshahrani, Gunjan Sharma, Vatsala Anand, Sheifali Gupta, Adel Sulaiman, M. A. Elmagzoub, Mana Saleh Al Reshan, Asadullah Shaikh and Ahmad Taher Azar
Life 2023, 13(10), 2091; https://doi.org/10.3390/life13102091 - 20 Oct 2023
Cited by 18 | Viewed by 2459
Abstract
Bone marrow (BM) is an essential part of the hematopoietic system, which generates all of the body’s blood cells and maintains the body’s overall health and immune system. The classification of bone marrow cells is pivotal in both clinical and research settings because many hematological diseases, such as leukemia, myelodysplastic syndromes, and anemias, are diagnosed based on specific abnormalities in the number, type, or morphology of bone marrow cells. A robust deep-learning algorithm is therefore needed to classify bone marrow cells reliably. This study proposes a framework for categorizing bone marrow cells into seven classes. In the proposed framework, five transfer learning models—DenseNet121, EfficientNetB5, ResNet50, Xception, and MobileNetV2—are applied to the bone marrow dataset to classify the cells into seven classes. The best-performing DenseNet121 model was fine-tuned by adding one batch-normalization layer, one dropout layer, and two dense layers. The proposed fine-tuned DenseNet121 model was optimized using several optimizers, such as AdaGrad, AdaDelta, Adamax, RMSprop, and SGD, along with different batch sizes of 16, 32, 64, and 128. The fine-tuned DenseNet121 model was integrated with an attention mechanism to improve its performance by allowing the model to focus on the most relevant features or regions of the image, which can be particularly beneficial in medical imaging, where certain regions may carry critical diagnostic information. The proposed fine-tuned and integrated DenseNet121 achieved the highest accuracy, with a training success rate of 99.97% and a testing success rate of 97.01%. The key hyperparameters, such as batch size, number of epochs, and different optimizers, were all considered when optimizing these pre-trained models to select the best one. This study will help medical research effectively classify BM cells to prevent diseases like leukemia. Full article
(This article belongs to the Section Medical Research)
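The fine-tuning head described (one batch-normalization layer, one dropout layer, and two dense layers on top of DenseNet121 features) can be sketched in NumPy. The 1024-dimensional feature size, hidden width, and batch size are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def batch_norm(x, eps=1e-5):
    """Training-mode batch normalization (learned scale/shift omitted for brevity)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dropout(x, rate=0.5):
    """Inverted dropout, training mode: scale survivors so expectations match."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def dense(x, w, b):
    return x @ w + b

# hypothetical 1024-d DenseNet121 features for a batch of 32 images, 7 BM classes
feats = rng.normal(size=(32, 1024))
h = dropout(batch_norm(feats))
h = np.maximum(dense(h, 0.01 * rng.normal(size=(1024, 256)), np.zeros(256)), 0.0)  # ReLU
logits = dense(h, 0.01 * rng.normal(size=(256, 7)), np.zeros(7))
```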

22 pages, 868 KiB  
Article
An Efficient Optimization Technique for Training Deep Neural Networks
by Faisal Mehmood, Shabir Ahmad and Taeg Keun Whangbo
Mathematics 2023, 11(6), 1360; https://doi.org/10.3390/math11061360 - 10 Mar 2023
Cited by 70 | Viewed by 13362
Abstract
Deep learning is a sub-branch of artificial intelligence that acquires knowledge by training a neural network. It has many applications in banking, the automobile industry, agriculture, and healthcare. Deep learning has played a significant role in solving complex tasks such as image classification, natural language processing, and object detection. Optimizers, in turn, play an intrinsic role in training deep learning models. Recent studies have proposed many deep learning models, such as VGG, ResNet, and DenseNet, typically trained on large datasets such as ImageNet. In addition, there are many optimizers, such as stochastic gradient descent (SGD), Adam, AdaDelta, AdaBelief, and AdaMax. In this study, we selected models with lower hardware requirements and shorter training times, which facilitates the overall training process. We modified the Adam-based optimizers and minimized the cyclic path. We removed an additional hyper-parameter from RMSProp and observed that the optimizer works with various models. The learning rate is set to a small constant value. The initial weights are updated after each epoch, which helps to improve the accuracy of the model. We also changed the position of the epsilon in the default Adam optimizer; changing its position alters how the update accumulates. We used various models with SGD, Adam, RMSProp, and the proposed optimization technique. The results indicate that the proposed method is effective in achieving high accuracy and works well with state-of-the-art architectures. Full article
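The effect of epsilon's position in Adam can be illustrated directly: placing ε inside versus outside the square root of the second-moment term changes the effective step size, most visibly for tiny gradients. This is a generic sketch of the two placements, not the authors' exact modification:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
              eps=1e-8, eps_inside_sqrt=False):
    """One Adam step; `eps_inside_sqrt` moves epsilon under the square root."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                     # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                     # bias-corrected second moment
    denom = np.sqrt(v_hat + eps) if eps_inside_sqrt else np.sqrt(v_hat) + eps
    return theta - lr * m_hat / denom, m, v

theta = np.array([1.0])
g = np.array([1e-6])                              # a tiny gradient exaggerates the gap
out_outside, *_ = adam_step(theta, g, np.zeros(1), np.zeros(1), t=1)
out_inside, *_ = adam_step(theta, g, np.zeros(1), np.zeros(1), t=1,
                           eps_inside_sqrt=True)
```

With ε inside the root, the denominator cannot fall below √ε, so small-gradient steps are damped much more strongly than in the standard placement.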
18 pages, 4175 KiB  
Article
Metaheuristics with Deep Learning Model for Cybersecurity and Android Malware Detection and Classification
by Ashwag Albakri, Fatimah Alhayan, Nazik Alturki, Saahirabanu Ahamed and Shermin Shamsudheen
Appl. Sci. 2023, 13(4), 2172; https://doi.org/10.3390/app13042172 - 8 Feb 2023
Cited by 36 | Viewed by 4516
Abstract
Since the development of information systems during the last decade, cybersecurity has become a critical concern for many groups, organizations, and institutions. Malware applications are among the most commonly used tools and tactics for perpetrating cyberattacks on Android devices, and developing novel ways of identifying them is becoming a challenging task. Various malware detection models are available to strengthen the Android operating system against such attacks. These malware detectors categorize target applications based on the patterns that exist in the features present in Android applications. As analytics data continue to grow, they negatively affect the Android defense mechanisms. Since large numbers of unwanted features create a performance bottleneck for the detection mechanism, feature selection techniques are found to be beneficial. This work presents a Rock Hyrax Swarm Optimization with deep learning-based Android malware detection (RHSODL-AMD) model. The presented technique identifies the Application Programming Interface (API) calls and the most significant permissions, which results in effective discrimination between goodware and malware applications. An RHSO-based feature subset selection (RHSO-FS) technique is therefore derived to improve the classification results. In addition, the Adamax optimizer with an attention recurrent autoencoder (ARAE) model is employed for Android malware detection. The experimental validation of the RHSODL-AMD technique on the Andro-AutoPsy dataset exhibits promising performance, with a maximum accuracy of 99.05%. Full article
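The Adamax optimizer used in this work is the infinity-norm variant of Adam: the second-moment estimate is replaced by an exponentially weighted infinity norm of the gradients. A minimal NumPy sketch of the standard Adamax update (the generic rule, not the paper's ARAE training code) could look like:

```python
import numpy as np

def adamax_step(w, g, m, u, t, lr=0.002, b1=0.9, b2=0.999, eps=1e-8):
    """One Adamax update for parameters w given gradient g at step t (t >= 1).

    m is the biased first-moment estimate; u is the exponentially
    weighted infinity norm that replaces Adam's second moment.
    """
    m = b1 * m + (1 - b1) * g                  # first moment (momentum)
    u = np.maximum(b2 * u, np.abs(g))          # infinity-norm "second moment"
    w = w - (lr / (1 - b1 ** t)) * m / (u + eps)
    return w, m, u
```

Because `u` is a max rather than an average, Adamax tends to produce more stable step sizes when individual gradient components spike, which is one reason it is a common choice for recurrent models like the ARAE described here.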
(This article belongs to the Special Issue Information Security and Privacy)

34 pages, 14410 KiB  
Article
Practical Evaluation of Lithium-Ion Battery State-of-Charge Estimation Using Time-Series Machine Learning for Electric Vehicles
by Marat Sadykov, Sam Haines, Mark Broadmeadow, Geoff Walker and David William Holmes
Energies 2023, 16(4), 1628; https://doi.org/10.3390/en16041628 - 6 Feb 2023
Cited by 6 | Viewed by 3180
Abstract
This paper presents a practical usability investigation of recurrent neural networks (RNNs) to determine the best-suited machine learning method for estimating electric vehicle (EV) batteries' state of charge (SoC). Using models from multiple published sources and cross-validation testing with several driving scenarios to determine the state of charge of lithium-ion batteries, we assessed their accuracy and drawbacks. Five models were selected from various published state-of-charge estimation models, based on GRU or LSTM cell types and optimisers such as stochastic gradient descent, Adam, Nadam, AdaMax, and Robust Adam, with extensions via momentum calculus or an attention layer. Each method was examined by applying training techniques such as a learning rate scheduler or rollback recovery to speed up fitting, highlighting the implementation specifics. All of this was carried out using the TensorFlow framework, and the implementation followed the published sources as closely as possible on openly available battery data. The results showed an average accuracy of 96.56% for correct SoC estimation, along with several drawbacks of the overall implementation, and we propose potential solutions for further improvement. Every implemented model shared a similar drawback: poor estimation in the middle region of the charge curve, with a higher weight applied to the voltage than to the current. Combining these techniques into a single custom model could yield a better-suited model, further improving accuracy. Full article
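The GRU cells at the core of several of the compared models process a time series of battery measurements one step at a time. The NumPy sketch below shows the standard GRU forward step, not the paper's TensorFlow code; treating the input as a (voltage, current, temperature) vector is an assumption typical of SoC estimation setups:

```python
import numpy as np

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: map input x (e.g. voltage, current, temperature)
    and previous hidden state h to the next hidden state.

    W* are input-to-hidden weights, U* are hidden-to-hidden weights.
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate: how much to overwrite
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate: how much history to use
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # convex blend of old and candidate
```

In a full SoC estimator, the hidden state after the last time step would be passed through a dense output layer to produce the scalar state-of-charge prediction.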
(This article belongs to the Special Issue Computational Intelligence in Electrical Systems)
