Search Results (8)

Search Parameters:
Keywords = general average pooling layer (GAP)

16 pages, 23492 KiB  
Article
CAGNet: A Network Combining Multiscale Feature Aggregation and Attention Mechanisms for Intelligent Facial Expression Recognition in Human-Robot Interaction
by Dengpan Zhang, Wenwen Ma, Zhihao Shen and Qingping Ma
Sensors 2025, 25(12), 3653; https://doi.org/10.3390/s25123653 - 11 Jun 2025
Viewed by 477
Abstract
The development of Facial Expression Recognition (FER) technology has significantly enhanced the naturalness and intuitiveness of human-robot interaction. In the field of service robots, particularly in applications such as production assistance, caregiving, and daily service communication, efficient FER capabilities are crucial. However, existing Convolutional Neural Network (CNN) models still have limitations in terms of feature representation and recognition accuracy for facial expressions. To address these challenges, we propose CAGNet, a novel network that combines multiscale feature aggregation and attention mechanisms. CAGNet employs a deep learning-based hierarchical convolutional architecture, enhancing the extraction of features at multiple scales through stacked convolutional layers. The network integrates the Convolutional Block Attention Module (CBAM) and Global Average Pooling (GAP) modules to optimize the capture of both local and global features. Additionally, Batch Normalization (BN) layers and Dropout techniques are incorporated to improve model stability and generalization. CAGNet was evaluated on two standard datasets, FER2013 and CK+, and the experimental results demonstrate that the network achieves FER accuracies of 71.52% and 97.97%, respectively. These results not only validate the effectiveness and superiority of our approach but also provide a new technical solution for FER. Furthermore, CAGNet offers robust support for the intelligent upgrade of service robots.
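The GAP operation this abstract refers to can be illustrated with a minimal numpy sketch (shapes and values here are illustrative assumptions, not CAGNet's actual configuration): each channel's spatial grid is collapsed to its mean, turning a feature map into a compact per-channel global descriptor.

```python
import numpy as np

# Hypothetical feature map: batch of 2 samples, 64 channels, 7x7 spatial grid.
features = np.random.rand(2, 64, 7, 7)

# Global average pooling: average over the two spatial axes, leaving one
# value per channel -- a fixed-length global descriptor per sample.
gap = features.mean(axis=(2, 3))  # shape (2, 64)
```

Because the output length depends only on the channel count, GAP-based heads need far fewer parameters than flattening into fully connected layers.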

21 pages, 452 KiB  
Article
LG-BiTCN: A Lightweight Malicious Traffic Detection Model Based on Federated Learning for Internet of Things
by Yuehua Huo, Junhan Chen, Yunhao Guo, Wei Liang and Jiyan Sun
Electronics 2025, 14(8), 1560; https://doi.org/10.3390/electronics14081560 - 11 Apr 2025
Viewed by 389
Abstract
The rapid growth of IoT devices has increased security attack behaviors, posing a challenge to IoT security. Some Federated-Learning-based detection methods have been widely used to detect malicious attacks in the IoT by analyzing network traffic; because of the nature of Federated Learning, these methods can protect user privacy and reduce bandwidth consumption. However, existing malicious traffic detection models are often complex, requiring significant computational resources for training. In addition, high-dimensional input features often contain redundant information, which further increases computational overhead. To mitigate this, model lightweighting techniques and non-end-to-end dimensionality reduction methods are often employed; however, the former still struggle to meet the computational demands, and the latter tend to compromise the model's generalizability and accuracy. In addition, existing methods are unable to dynamically select long-term dependencies when extracting traffic time-series features, limiting the performance of the model when dealing with long time series. To address the above challenges, this paper proposes a lightweight malicious traffic detection model, named the lightweight gated bidirectional temporal convolutional network (LG-BiTCN), based on Federated Learning. First, we use global average pooling (GAP) and a pointwise convolutional layer as a classification module, significantly reducing the model's parameter count. We also propose an end-to-end adaptive PCA dimension adjustment algorithm for automatic dimensionality reduction to reduce computational complexity and enhance model generalizability. Second, we incorporate gated convolution into the LG-BiTCN architecture, allowing for the dynamic selection of long-term dependencies, enhancing detection accuracy while maintaining computational efficiency. We evaluated the LG-BiTCN's effectiveness by comparing it with three advanced baseline models on three generic datasets. The results show that the LG-BiTCN achieves over 99.6% accuracy while maintaining the lowest computational complexity. Additionally, in a Federated Learning setup, it requires just two communication rounds to reach 96.75% accuracy.
(This article belongs to the Special Issue Internet of Things (IoT) Privacy and Security in the Age of Big Data)
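The GAP-plus-pointwise-convolution classification head described above can be sketched in numpy (shapes, class count, and weights are illustrative assumptions, not the LG-BiTCN's actual dimensions): a kernel-size-1 convolution is simply a per-position linear map over channels, and GAP then averages the per-position logits.

```python
import numpy as np

# Hypothetical 1-D traffic features: batch 4, 32 channels, 10 time steps.
x = np.random.rand(4, 32, 10)

# Pointwise (kernel-size-1) convolution mapping 32 channels to 5 classes:
# a linear map applied independently at every time step.
w = np.random.rand(5, 32)
logits_per_step = np.einsum('oc,bct->bot', w, x)  # shape (4, 5, 10)

# GAP over the temporal axis yields one logit vector per sample,
# independent of the input sequence length.
logits = logits_per_step.mean(axis=2)  # shape (4, 5)
```

This head needs only 5x32 weights, versus 5x(32x10) for a flatten-plus-fully-connected classifier over the same features.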

16 pages, 3363 KiB  
Article
E-HRNet: Enhanced Semantic Segmentation Using Squeeze and Excitation
by Jin-Seong Kim, Sung-Wook Park, Jun-Yeong Kim, Jun Park, Jun-Ho Huh, Se-Hoon Jung and Chun-Bo Sim
Electronics 2023, 12(17), 3619; https://doi.org/10.3390/electronics12173619 - 27 Aug 2023
Cited by 9 | Viewed by 3839
Abstract
In the field of computer vision, convolutional neural network (CNN)-based models have demonstrated high accuracy and good generalization performance. However, in semantic segmentation, CNN-based models have a problem: the spatial and global context information is lost owing to a decrease in resolution during feature extraction. High-resolution networks (HRNets) can mitigate this problem by keeping high-resolution processing layers parallel. However, information loss still occurs. Therefore, in this study, we propose an HRNet combined with an attention module to address the issue of information loss. The attention module is strategically placed immediately after each convolution to alleviate information loss by emphasizing the information retained at each stage. To achieve this, we employed a squeeze-and-excitation (SE) block as the attention module, which can seamlessly integrate into any model and enhance performance without imposing significant parameter increases. It emphasizes the spatial and global context information by compressing and recalibrating features through global average pooling (GAP). A performance comparison between the existing HRNet model and the proposed model on various datasets shows that the mean class-wise intersection over union (mIoU) and mean pixel accuracy (MeanACC) improved with the proposed model, although with a small increase in the number of parameters. With the Cityscapes dataset, MeanACC decreased by 0.1% with the proposed model compared to the baseline model, but mIoU increased by 0.5%. With the LIP dataset, the MeanACC and mIoU increased by 0.3% and 0.4%, respectively. The mIoU decreased by 0.1% with the PASCAL Context dataset, whereas the MeanACC increased by 0.7%. Overall, the proposed model showed improved performance compared to the existing model.
(This article belongs to the Special Issue Applications of Smart Internet of Things)
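The squeeze-and-excitation block the abstract describes can be sketched in numpy (channel count, reduction ratio, and random weights are illustrative assumptions): GAP "squeezes" each channel to a scalar, two small fully connected layers produce per-channel gates in (0, 1), and the gates rescale the original feature map.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c, r = 16, 4                              # channels, reduction ratio
x = np.random.rand(1, c, 8, 8)            # hypothetical feature map
w1 = np.random.randn(c // r, c) * 0.1     # squeeze FC: c -> c/r
w2 = np.random.randn(c, c // r) * 0.1     # excitation FC: c/r -> c

s = x.mean(axis=(2, 3))[0]                # squeeze: GAP -> (c,)
s = np.maximum(w1 @ s, 0.0)               # bottleneck FC + ReLU -> (c/r,)
gates = sigmoid(w2 @ s)                   # FC + sigmoid -> (c,) gates in (0,1)
y = x * gates.reshape(1, c, 1, 1)         # recalibrate: rescale each channel
```

The bottleneck keeps the added parameter count at roughly 2c^2/r, which is why SE blocks can be dropped into existing models cheaply.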

17 pages, 2232 KiB  
Article
Gaze Tracking Based on Concatenating Spatial-Temporal Features
by Bor-Jiunn Hwang, Hui-Hui Chen, Chaur-Heh Hsieh and Deng-Yu Huang
Sensors 2022, 22(2), 545; https://doi.org/10.3390/s22020545 - 11 Jan 2022
Cited by 5 | Viewed by 2385
Abstract
Based on experimental observations, there is a correlation between time and consecutive gaze positions in visual behaviors. Previous studies on gaze point estimation usually use images as the input for model training without taking into account the sequence relationship between image data. In addition to the spatial features, the temporal features are considered in this paper to improve accuracy, by using videos instead of images as the input data. To capture spatial and temporal features at the same time, the convolutional neural network (CNN) and long short-term memory (LSTM) network are introduced to build a training model. In this way, the CNN extracts the spatial features, and the LSTM correlates the temporal features. This paper presents a CNN Concatenating LSTM network (CCLN) that concatenates spatial and temporal features to improve the performance of gaze estimation in the case of time-series videos as the input training data. In addition, the proposed model can be optimized by exploring the number of LSTM layers and the influence of batch normalization (BN) and the global average pooling (GAP) layer on the CCLN. It is generally believed that larger amounts of training data will lead to better models. To provide data for training and prediction, we propose a method for constructing video datasets for gaze point estimation. Several issues are studied, including the effectiveness of different commonly used general models and the impact of transfer learning. Through exhaustive evaluation, it is shown that the proposed method achieves better prediction accuracy than existing CNN-based methods. Finally, accuracies of 93.1% for the best model and 92.6% for the general MobileNet-based model are obtained.
(This article belongs to the Section Biomedical Sensors)
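The concatenation step that gives the CCLN its name can be sketched in numpy (feature dimensions are illustrative assumptions, not the paper's actual layer sizes): a per-sample spatial vector from the CNN branch and a temporal vector from the LSTM branch are joined into one fused vector before the gaze-estimation head.

```python
import numpy as np

# Hypothetical per-sample features: a 128-d spatial vector from the CNN
# branch and a 64-d temporal vector from the final LSTM hidden state.
spatial = np.random.rand(8, 128)
temporal = np.random.rand(8, 64)

# CCLN-style fusion: concatenate the two streams along the feature axis,
# so the regression head sees both appearance and motion information.
fused = np.concatenate([spatial, temporal], axis=1)  # shape (8, 192)
```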

15 pages, 3202 KiB  
Article
A Novel Analog Circuit Soft Fault Diagnosis Method Based on Convolutional Neural Network and Backward Difference
by Chenggong Zhang, Daren Zha, Lei Wang and Nan Mu
Symmetry 2021, 13(6), 1096; https://doi.org/10.3390/sym13061096 - 21 Jun 2021
Cited by 22 | Viewed by 2792
Abstract
This paper develops a novel soft fault diagnosis approach for analog circuits. The proposed method employs the backward difference strategy to process the data, and a novel variant of the convolutional neural network, i.e., the convolutional neural network with global average pooling (CNN-GAP), is adopted for feature extraction and fault classification. Specifically, the measured raw domain response signals are first processed by the backward difference strategy, and the first-order and second-order backward difference sequences are generated, which contain the signal variation and rate-of-variation characteristics. Then, based on the one-dimensional convolutional neural network, the CNN-GAP is developed by introducing the global average pooling technique. Since global average pooling calculates each input vector's mean value, the designed CNN-GAP can deal with input signals of different lengths and be applied to diagnose different circuits. Additionally, the first-order and second-order backward difference sequences, along with the raw domain response signals, are directly fed into the CNN-GAP, in which the convolutional layers automatically extract and fuse multi-scale features. Finally, fault classification is performed by the fully connected layer of the CNN-GAP. The effectiveness of our proposal is verified on two benchmark circuits under symmetric and asymmetric fault conditions. Experimental results show that the proposed method outperforms existing methods in terms of diagnosis accuracy and reliability.
(This article belongs to the Section Computer)
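The backward difference preprocessing described above maps directly onto `numpy.diff`, which computes `x[t] - x[t-1]` (the signal values here are made up for illustration): the first-order sequence captures signal variation, and applying the difference twice captures its rate of variation.

```python
import numpy as np

# Hypothetical raw response signal sampled from an analog circuit test node.
raw = np.array([1.0, 1.5, 2.5, 2.0, 3.0])

# First-order backward differences: x[t] - x[t-1], the signal variation.
d1 = np.diff(raw)        # [0.5, 1.0, -0.5, 1.0]

# Second-order backward differences: difference of d1, the rate of variation.
d2 = np.diff(raw, n=2)   # [0.5, -1.5, 1.5]
```

Each differencing step shortens the sequence by one sample, which is unproblematic here because GAP makes the classifier length-independent.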

17 pages, 2604 KiB  
Article
Malware Classification Based on Shallow Neural Network
by Pin Yang, Huiyu Zhou, Yue Zhu, Liang Liu and Lei Zhang
Future Internet 2020, 12(12), 219; https://doi.org/10.3390/fi12120219 - 2 Dec 2020
Cited by 10 | Viewed by 3256
Abstract
The emergence of large amounts of new malicious code poses a serious threat to network security, and most of it consists of derivative versions of existing malicious code. The classification of malicious code is helpful for analyzing the evolutionary trends of malicious code families and tracing the source of cybercrime. Existing methods of malware classification emphasize the depth of the neural network, which leads to long training times and large computational costs. In this work, we propose the shallow neural network-based malware classifier (SNNMAC), a malware classification model based on shallow neural networks and static analysis. Our approach bridges the gap between precise but slow methods and fast but less precise methods in existing works. For each sample, we first generate n-grams from the opcode sequence of the binary file, obtained with a decompiler. An improved n-gram algorithm based on control transfer instructions is designed to reduce the size of the n-gram dataset. Then, the SNNMAC exploits a shallow neural network, replacing the fully connected layer and softmax with an average pooling layer and hierarchical softmax, to learn from the dataset and perform classification. We perform experiments on the Microsoft malware dataset. The evaluation results show that the SNNMAC outperforms most related works with 99.21% classification precision and reduces the training time by more than half compared with methods using DNNs (Deep Neural Networks).
(This article belongs to the Section Cybersecurity)
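The opcode n-gram generation step can be sketched in plain Python (the helper name and the example opcode sequence are illustrative assumptions; this is the basic sliding-window form, not the paper's improved control-transfer-instruction variant):

```python
def opcode_ngrams(opcodes, n=3):
    """Slide a window of length n over an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

# Hypothetical decompiled opcode sequence.
seq = ["push", "mov", "call", "ret", "mov"]
grams = opcode_ngrams(seq)
# [("push", "mov", "call"), ("mov", "call", "ret"), ("call", "ret", "mov")]
```

The paper's improvement restricts which windows are kept based on control transfer instructions, shrinking the resulting n-gram dataset.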

15 pages, 636 KiB  
Article
Multi-Pooled Inception Features for No-Reference Image Quality Assessment
by Domonkos Varga
Appl. Sci. 2020, 10(6), 2186; https://doi.org/10.3390/app10062186 - 23 Mar 2020
Cited by 35 | Viewed by 5165
Abstract
Image quality assessment (IQA) is an important element of a broad spectrum of applications ranging from automatic video streaming to display technology. Furthermore, the measurement of image quality requires a balanced investigation of image content and features. Our proposed approach extracts visual features by attaching global average pooling (GAP) layers to multiple Inception modules of a convolutional neural network (CNN) pretrained on the ImageNet database. In contrast to previous methods, we do not take patches from the input image. Instead, the input image is treated as a whole and is run through a pretrained CNN body to extract resolution-independent, multi-level deep features. As a consequence, our method can easily be generalized to any input image size and pretrained CNN. Thus, we present a detailed parameter study with respect to the CNN base architectures and the effectiveness of different deep features. We demonstrate that our best proposal, called MultiGAP-NRIQA, is able to outperform the state-of-the-art on three benchmark IQA databases. Furthermore, these results were also confirmed in a cross-database test using the LIVE In the Wild Image Quality Challenge database.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
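The multi-level GAP descriptor described above can be sketched in numpy (the three levels' channel counts and spatial sizes are illustrative assumptions, not the actual Inception module shapes): GAP reduces each level to a fixed-length vector regardless of input resolution, and concatenation yields one multi-level feature vector.

```python
import numpy as np

# Hypothetical feature maps from three depths of a pretrained CNN body;
# spatial resolution shrinks and channel count grows with depth.
levels = [np.random.rand(1, 64, 28, 28),
          np.random.rand(1, 128, 14, 14),
          np.random.rand(1, 256, 7, 7)]

# GAP each level to a per-channel vector, then concatenate across levels
# into one resolution-independent descriptor.
desc = np.concatenate([f.mean(axis=(2, 3)) for f in levels], axis=1)
# desc.shape == (1, 64 + 128 + 256) == (1, 448)
```

Because each level's contribution depends only on its channel count, the descriptor length is the same for any input image size.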

10 pages, 2566 KiB  
Article
Low-Cost Image Search System on Off-Line Situation
by Mery Diana, Juntaro Chikama, Motoki Amagasaki and Masahiro Iida
Electronics 2020, 9(1), 153; https://doi.org/10.3390/electronics9010153 - 14 Jan 2020
Cited by 3 | Viewed by 3220
Abstract
Implementation of deep learning on low-cost hardware, such as an edge device, is challenging. Reducing the complexity of the network is one of the solutions for reducing resource usage in the system, which is needed for low-cost system implementation. In this study, we use a global average pooling (GAP) layer to replace the fully connected layers of the convolutional neural network (CNN) model used in a previous study, reducing the number of network parameters without decreasing model performance, in order to develop image classification for image search tasks. We apply cosine similarity to measure the similarity between the feature vector of the input image and the feature vectors extracted from the test images in the database; the images with the highest cosine similarity are returned as the search results. In the implementation, we use a Raspberry Pi 3 as the low-cost hardware and the CIFAR-10 dataset for the training and test images. Based on the development and implementation, the accuracy of the model is 68%, and the system generates image search results based on the similarity of image characteristics.
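The cosine-similarity search step can be sketched in numpy (the helper name, feature vectors, and image keys are illustrative assumptions, not the paper's actual features): database images are ranked by the cosine of the angle between their feature vectors and the query's.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical query feature vector and a tiny feature database.
query = np.array([1.0, 0.0, 1.0])
db = {"img1": np.array([1.0, 0.0, 0.9]),
      "img2": np.array([0.0, 1.0, 0.0])}

# Rank database images by similarity to the query; the top entry is the
# search result.
ranked = sorted(db, key=lambda k: cosine(query, db[k]), reverse=True)
```

Cosine similarity ignores vector magnitude, so it compares the direction of the feature vectors, which is useful when feature norms vary with image content.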
