Search Results (11)

Search Parameters:
Keywords = GPU slice

25 pages, 1573 KB  
Article
Lightweight Multi-Class Autoencoder Model for Malicious Traffic Detection in Private 5G Networks
by Jinha Kim, Seungjoon Na and Hwankuk Kim
Appl. Sci. 2025, 15(22), 12242; https://doi.org/10.3390/app152212242 - 18 Nov 2025
Viewed by 453
Abstract
This study proposes a lightweight autoencoder-based detection framework for the efficient detection of multi-class malicious traffic within a private 5G network slicing environment. Conventional deep learning-based detection approaches encounter difficulties in real-time processing and edge deployments because of their significant computational complexity and resource demands. To address this issue, this study balances traffic data using slice-label-based hierarchical sampling and performs domain-specific feature grouping to reflect semantic similarity. Independent autoencoders are trained for each group, and the latent vectors from the encoder outputs are combined and used as input to an SVM-based multi-class classifier. This structure reflects traffic differences between slices while also improving computational efficiency. Four sets of experiments were constructed to evaluate, from various perspectives, the model's structural performance, resource usage efficiency, classifier generalization, and compliance with SLA constraints. The proposed Multi-AE model achieved an accuracy of 0.93, a balanced accuracy of 0.93, and an ECE of 0.03, demonstrating high stability and detection reliability. Regarding resource utilization, GPU utilization was under 7% and the average memory usage was approximately 5.7 GB. In SLA verification, an inference latency below 10 ms and a throughput of 564 samples/s were achieved under URLLC requirements. This study is significant in that it experimentally demonstrates a detection structure that balances accuracy, lightweight design, and real-time performance in a 5G slicing environment. Full article
(This article belongs to the Special Issue AI-Enabled Next-Generation Computing and Its Applications)
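
As a rough illustration of the pipeline sketched in this abstract, the snippet below trains one small autoencoder per feature group and feeds the concatenated latent codes to an SVM classifier. The feature groups, latent size, synthetic data, and the use of PyTorch/scikit-learn are assumptions for illustration only, not the authors' configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class GroupAE(nn.Module):
    """Small autoencoder trained independently on one semantic feature group."""
    def __init__(self, n_in, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 32), nn.ReLU(), nn.Linear(32, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, n_in))
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train_group_ae(x, n_latent=8, epochs=50, lr=1e-3):
    model = GroupAE(x.shape[1], n_latent)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    xt = torch.tensor(x, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = model(xt)
        loss = nn.functional.mse_loss(recon, xt)   # reconstruction loss only
        loss.backward()
        opt.step()
    return model

# Hypothetical feature groups (e.g. flow-level, packet-level, timing statistics).
groups = {"flow": slice(0, 10), "packet": slice(10, 25), "timing": slice(25, 40)}
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 40)).astype(np.float32)   # placeholder traffic features
y = rng.integers(0, 5, size=2000)                    # placeholder multi-class labels

# One autoencoder per group; concatenate the latent codes as the classifier input.
latents = []
for name, cols in groups.items():
    ae = train_group_ae(X[:, cols])
    with torch.no_grad():
        _, z = ae(torch.tensor(X[:, cols]))
    latents.append(z.numpy())
Z = np.concatenate(latents, axis=1)

clf = SVC(kernel="rbf").fit(Z, y)                    # SVM multi-class classifier
print("train accuracy:", clf.score(Z, y))
```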

20 pages, 914 KB  
Article
LR-SQL: A Supervised Fine-Tuning Method for Text2SQL Tasks Under Low-Resource Scenarios
by Wuzhenghong Wen, Yongpan Zhang, Su Pan, Yuwei Sun, Pengwei Lu and Cheng Ding
Electronics 2025, 14(17), 3489; https://doi.org/10.3390/electronics14173489 - 31 Aug 2025
Viewed by 1604
Abstract
In supervised fine-tuning (SFT) for Text2SQL tasks, particularly for databases with numerous tables, encoding schema features requires excessive tokens, escalating GPU resource requirements during fine-tuning. To bridge this gap, we propose LR-SQL, a general dual-model SFT framework comprising a schema linking model and an SQL generation model. At the core of our framework lies the schema linking model, which is trained on a novel downstream task termed slice-based related table filtering. This task dynamically partitions a database into adjustable slices of tables and sequentially evaluates the relevance of each slice to the input query, thereby reducing token consumption per iteration. However, slicing fragments the database information, impairing the model's ability to comprehend the complete database. Thus, we integrate Chain of Thought (CoT) into training, enabling the model to reconstruct the full database context from discrete slices, thereby enhancing inference fidelity. Ultimately, the SQL generation model uses the result from the schema linking model to generate the final SQL. Extensive experiments demonstrate that our proposed LR-SQL reduces total GPU memory usage by 40% compared to baseline SFT methods, with only a 2% drop in table prediction accuracy for the schema linking task and a negligible 0.6% decrease in overall Text2SQL Execution Accuracy. Full article
(This article belongs to the Special Issue Advances in Data Security: Challenges, Technologies, and Applications)
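
The slice-based related-table filtering idea lends itself to a simple sketch: the schema is split into slices of a few tables, and each slice is scored against the question independently, so the per-call token cost stays bounded. The keyword-overlap scorer below is a toy stand-in for the fine-tuned schema linking model, and the table names are invented for illustration.

```python
def partition_tables(schemas, slice_size):
    """Split the database schema into adjustable slices of tables."""
    items = list(schemas.items())
    for i in range(0, len(items), slice_size):
        yield dict(items[i:i + slice_size])

def toy_relevance(question, table, columns):
    """Placeholder for the schema linking model: crude keyword-overlap score."""
    words = set(question.lower().split())
    hits = sum(tok.lower() in words for tok in [table] + columns)
    return hits / (1 + len(columns))

def filter_related_tables(question, schemas, slice_size=2, threshold=0.1):
    related = {}
    for table_slice in partition_tables(schemas, slice_size):
        # Each slice is scored on its own, keeping the per-iteration input small.
        for table, columns in table_slice.items():
            if toy_relevance(question, table, columns) >= threshold:
                related[table] = columns
    return related

schemas = {
    "orders": ["order_id", "customer_id", "total"],
    "customers": ["customer_id", "name", "country"],
    "products": ["product_id", "name", "price"],
    "shipments": ["shipment_id", "order_id", "status"],
}
print(filter_related_tables("total orders per customer country", schemas))
```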

21 pages, 2346 KB  
Article
Explainable Liver Segmentation and Volume Assessment Using Parallel Cropping
by Nitin Satpute, Nikhil B. Gaikwad, Smith K. Khare, Juan Gómez-Luna and Joaquín Olivares
Appl. Sci. 2025, 15(14), 7807; https://doi.org/10.3390/app15147807 - 11 Jul 2025
Viewed by 1174
Abstract
Accurate liver segmentation and volume estimation from CT images are critical for diagnosis, surgical planning, and treatment monitoring. This paper proposes a GPU-accelerated voxel-level cropping method that localizes the liver region in a single pass, significantly reducing unnecessary computation and memory transfers. We integrate this pre-processing step into two segmentation pipelines: a traditional Chan-Vese model and a deep learning U-Net trained on the LiTS dataset. After segmentation, a seeded region growing algorithm is used for 3D liver volume assessment. Our method reduces unnecessary image data by an average of 90%, speeds up segmentation by a factor of 1.39 for Chan-Vese, and improves Dice scores from 0.938 to 0.960. When integrated into U-Net pipelines, the post-processed Dice score rises drastically from 0.521 to 0.956. Additionally, the voxel-based cropping approach achieves a 2.29× acceleration compared to state-of-the-art slice-based methods in 3D volume assessment. Our results demonstrate high segmentation accuracy and precise volume estimates with errors below 2.5%. This proposal offers a scalable, interpretable, and efficient liver segmentation and volume assessment solution. It eliminates unwanted artifacts and facilitates real-time deployment in clinical environments where transparency and resource constraints are critical. It has also been tested on other anatomical structures, such as skin, lungs, and vessels, enabling broader applicability in medical imaging. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision Applications)
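
The voxel-level cropping step can be pictured as a single reduction over a coarse organ mask that yields a bounding box, after which only the cropped sub-volume is passed to segmentation. The sketch below uses NumPy and synthetic data; a GPU array library such as CuPy could be substituted for a GPU-resident version, and the margin value is an arbitrary illustrative choice.

```python
import numpy as np  # cupy could be dropped in for a GPU-resident version

def crop_to_mask(volume, mask, margin=5):
    """Crop `volume` to the bounding box of nonzero voxels in `mask` (plus a margin)."""
    coords = np.argwhere(mask)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, volume.shape)
    box = tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))
    return volume[box], box

# Toy example: a 128^3 CT-like volume with a small "liver" blob.
vol = np.random.rand(128, 128, 128).astype(np.float32)
mask = np.zeros(vol.shape, dtype=bool)
mask[40:80, 50:90, 30:70] = True
cropped, box = crop_to_mask(vol, mask)
print("original voxels:", vol.size, "-> cropped voxels:", cropped.size)
```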

23 pages, 5188 KB  
Article
Comparison of Affine and Rational Quadratic Spline Coupling and Autoregressive Flows through Robust Statistical Tests
by Andrea Coccaro, Marco Letizia, Humberto Reyes-González and Riccardo Torre
Symmetry 2024, 16(8), 942; https://doi.org/10.3390/sym16080942 - 23 Jul 2024
Cited by 15 | Viewed by 3686
Abstract
Normalizing flows have emerged as a powerful brand of generative models, as they not only allow for efficient sampling of complicated target distributions but also deliver density estimation by construction. We propose here an in-depth comparison of coupling and autoregressive flows, both based on symmetric (affine) and non-symmetric (rational quadratic spline) bijectors, considering four different architectures: real-valued non-Volume preserving (RealNVP), masked autoregressive flow (MAF), coupling rational quadratic spline (C-RQS), and autoregressive rational quadratic spline (A-RQS). We focus on a set of multimodal target distributions of increasing dimensionality ranging from 4 to 400. The performances were compared by means of different test statistics for two-sample tests, built from known distance measures: the sliced Wasserstein distance, the dimension-averaged one-dimensional Kolmogorov–Smirnov test, and the Frobenius norm of the difference between correlation matrices. Furthermore, we included estimations of the variance of both the metrics and the trained models. Our results indicate that the A-RQS algorithm stands out both in terms of accuracy and training speed. Nonetheless, all the algorithms are generally able, without too much fine-tuning, to learn complicated distributions with limited training data and in a reasonable time of the order of hours on a Tesla A40 GPU. The only exception is the C-RQS, which takes significantly longer to train, does not always provide good accuracy, and becomes unstable for large dimensionalities. All algorithms were implemented using TensorFlow2 and TensorFlow Probability and have been made available on GitHub. Full article
(This article belongs to the Special Issue Machine Learning and Data Analysis II)
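
Of the distance measures listed, the sliced Wasserstein distance is the easiest to sketch: project both samples onto random directions and average the resulting one-dimensional Wasserstein-1 distances. The snippet below is an illustrative NumPy version for equal-size samples, not the authors' implementation.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, seed=0):
    """Average 1D Wasserstein-1 distance over random projection directions."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_projections, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    total = 0.0
    for u in dirs:
        px, py = np.sort(x @ u), np.sort(y @ u)
        total += np.abs(px - py).mean()   # W1 between equal-size empirical samples
    return total / n_projections

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(5000, 4))
b = rng.normal(0.1, 1.0, size=(5000, 4))
print(sliced_wasserstein(a, b))
```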

12 pages, 1921 KB  
Brief Report
Efficient Brain Age Prediction from 3D MRI Volumes Using 2D Projections
by Johan Jönemo, Muhammad Usman Akbar, Robin Kämpe, J. Paul Hamilton and Anders Eklund
Brain Sci. 2023, 13(9), 1329; https://doi.org/10.3390/brainsci13091329 - 15 Sep 2023
Cited by 6 | Viewed by 3399
Abstract
Using 3D CNNs on high-resolution medical volumes is very computationally demanding, especially for large datasets like UK Biobank, which aims to scan 100,000 subjects. Here, we demonstrate that using 2D CNNs on a few 2D projections (representing mean and standard deviation across axial, sagittal and coronal slices) of 3D volumes leads to reasonable test accuracy (mean absolute error of about 3.5 years) when predicting age from brain volumes. Using our approach, one training epoch with 20,324 subjects takes 20–50 s using a single GPU, which is two orders of magnitude faster than a small 3D CNN. This speedup is explained by the fact that 3D brain volumes contain a lot of redundant information, which can be efficiently compressed using 2D projections. These results are important for researchers who do not have access to expensive GPU hardware for 3D CNNs. Full article
(This article belongs to the Special Issue Advanced Machine Learning Algorithms for Biomedical Data and Imaging)
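
The compression step is easy to reproduce in outline: each 3D volume is reduced to six 2D images, the mean and standard deviation along each axis, which then feed a 2D CNN. The volume size below is a placeholder.

```python
import numpy as np

def volume_to_projections(volume):
    """Return six 2D projections: mean and std along each of the three axes."""
    projections = []
    for axis in range(3):   # the three anatomical orientations; order depends on the data
        projections.append(volume.mean(axis=axis))
        projections.append(volume.std(axis=axis))
    return projections

vol = np.random.rand(182, 218, 182).astype(np.float32)   # placeholder T1 volume
projs = volume_to_projections(vol)
print([p.shape for p in projs])   # three pairs of 2D images for a 2D CNN
```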

21 pages, 8057 KB  
Article
GPU-Accelerated Infill Criterion for Multi-Objective Efficient Global Optimization Algorithm and Its Applications
by Shengguan Xu, Jiale Zhang, Hongquan Chen, Yisheng Gao, Yunkun Gao, Huanqin Gao and Xuesong Jia
Appl. Sci. 2023, 13(1), 352; https://doi.org/10.3390/app13010352 - 27 Dec 2022
Cited by 3 | Viewed by 2401
Abstract
In this work, a novel multi-objective efficient global optimization (EGO) algorithm, namely GMOEGO, is presented by proposing a thread-parallel multi-objective infill criterion. The work applies the outstanding hypervolume-based expected improvement criterion to enhance the Pareto solutions in view of their accuracy and their distribution on the Pareto front, and the values of the sophisticated hypervolume improvement (HVI) are approximated by counting Monte Carlo sampling points under the modern GPU (graphics processing unit) architecture. Compared with traditional methods, such as slice-based hypervolume integration, the programming complexity of the present approach is greatly reduced thanks to such simple counting-like operations. That is, the calculation of the sophisticated HVI, which has proven to be the most time-consuming part with many objectives, can be kept lightweight in the programmed implementation. Meanwhile, the heavy computation associated with such Monte Carlo-based HVI approximation (MCHVI) is greatly alleviated by parallelizing it on the GPU. A set of mathematical function cases and a real engineering airfoil shape optimization problem from the literature are taken to validate the proposed approach. All the results show that the approach is less time-consuming: a speedup of up to around 13.734 times is achieved while appropriate Pareto solutions are captured. Full article
(This article belongs to the Section Aerospace Science and Engineering)
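
The counting-like operation the abstract refers to can be sketched as follows for a minimization problem: sample points uniformly in the box between an ideal point and the reference point, and estimate the hypervolume improvement of a candidate as the fraction of samples it dominates that the current front does not. This is an illustrative serial NumPy version; on a GPU each sample's dominance test would map to a thread. The front, candidate, and box bounds are invented.

```python
import numpy as np

def mc_hvi(candidate, front, ideal, ref, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the hypervolume improvement of `candidate` (minimization)."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(ideal, ref, size=(n_samples, len(ref)))
    covered_by_front = np.zeros(n_samples, dtype=bool)
    for y in front:
        covered_by_front |= np.all(samples >= y, axis=1)   # y dominates the sample
    covered_by_cand = np.all(samples >= candidate, axis=1)
    gained = covered_by_cand & ~covered_by_front            # newly dominated samples
    box_volume = float(np.prod(np.asarray(ref) - np.asarray(ideal)))
    return gained.mean() * box_volume

front = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])
print(mc_hvi(np.array([0.3, 0.3]), front, ideal=[0.0, 0.0], ref=[1.0, 1.0]))
```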

15 pages, 894 KB  
Article
Secure and Robust Internet of Things with High-Speed Implementation of PRESENT and GIFT Block Ciphers on GPU
by Hyunjun Kim, Siwoo Eum, Wai-Kong Lee, Sokjoon Lee and Hwajeong Seo
Appl. Sci. 2022, 12(20), 10192; https://doi.org/10.3390/app122010192 - 11 Oct 2022
Cited by 4 | Viewed by 2351
Abstract
With the advent of the Internet of Things (IoT) and cloud computing technologies, vast amounts of data are being created and communicated in IoT networks. Block ciphers are used to protect these data from malicious attacks. The massive computation overhead introduced by bulk encryption with block ciphers can become a performance bottleneck on servers that require high throughput. As the need for high-speed encryption in such communications has emerged, research is underway to exploit the high processing power of graphics processors for encryption. Bit-slicing of lightweight ciphers was not covered in previous GPU implementations of lightweight ciphers. In this paper, we implement the PRESENT and GIFT lightweight block ciphers on GPU architectures, minimizing computation overhead by optimizing the algorithms with the bit-slicing technique. We performed a practical analysis by testing realistic use cases, evaluating the PRESENT-80, PRESENT-128, GIFT-64, and GIFT-128 block ciphers on an RTX 3060 platform. The exhaustive-search throughputs are 553.932 Gbps, 529.952 Gbps, 583.859 Gbps, and 214.284 Gbps for PRESENT-80, PRESENT-128, GIFT-64, and GIFT-128, respectively. For data encryption, the implementation achieved 24.264 Gbps, 24.522 Gbps, 85.283 Gbps, and 10.723 Gbps for PRESENT-80, PRESENT-128, GIFT-64, and GIFT-128, respectively. In particular, the proposed PRESENT implementation delivers approximately 4× higher performance than the latest published PRESENT implementation. Lastly, the proposed GPU implementation of GIFT is the first targeting the server environment. Full article
(This article belongs to the Special Issue IoT in Smart Cities and Homes)
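
The bit-slicing idea itself is independent of the ciphers: one bit position from many independent cipher states is packed into a single word, so a word-wide Boolean operation processes that bit of all states at once, and a round function written purely with AND/OR/XOR/NOT then runs on many blocks in parallel. The sketch below only shows the packing transform (a 64×64 bit transpose) in plain Python; it is not the paper's CUDA code.

```python
import random

def bitslice(words):
    """Pack bit b of each of the 64 input words into bit position `lane` of output word b."""
    planes = [0] * 64
    for bit in range(64):
        for lane, w in enumerate(words):
            planes[bit] |= ((w >> bit) & 1) << lane
    return planes

random.seed(0)
states = [random.getrandbits(64) for _ in range(64)]   # 64 independent 64-bit cipher states
planes = bitslice(states)
assert bitslice(planes) == states                      # the bit transpose is its own inverse
# Round functions expressed as word-wide Boolean operations on `planes`
# now act on all 64 states simultaneously; a GPU runs many such batches in parallel.
```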

12 pages, 4929 KB  
Article
Slicing Algorithm and Partition Scanning Strategy for 3D Printing Based on GPU Parallel Computing
by Xuhui Lai and Zhengying Wei
Materials 2021, 14(15), 4297; https://doi.org/10.3390/ma14154297 - 31 Jul 2021
Cited by 7 | Viewed by 4555
Abstract
Aiming at the problems of over-stacking, warping deformation, and rapid layer-thickness adjustment in electron beam additive manufacturing, a 3D printing slicing algorithm and a partition scanning strategy for numerical control systems are studied. The GPU (graphics processing unit) is used to slice the 3D model, and the STL (stereolithography) file is processed in parallel according to the normal vectors and vertex coordinates. The voxel information of a specified layer is obtained dynamically by adjusting the projection matrix to the slice height. The MS (marching squares) algorithm is used to extract the coordinate sequence from the binary image and output ordered contour coordinates. To avoid shaking of the electron gun when the numerical control system traces micro-segment straight lines, and to reduce metal over-stacking where the contour is only C0 continuous, the NURBS (non-uniform rational B-splines) basis function is used to interpolate the contour data. To address the deformation of large components during forming, a hexagonal partition and variable-angle parallel-line scanning strategy is adopted, and an effective temperature and deformation control strategy is formed by planning the scan order of the partitions according to their Euclidean distances. The results show that the NURBS fit lies closer to the original poly-surface cut line, with an error reduced by 34.2% compared with the STL file slice data. As the number of triangular patches increases, the algorithm exhibits higher efficiency: an STL file with 1,483,132 facets can be sliced into 4488 layers in 89 s. The slicing algorithm in this research can serve as a general data-processing algorithm for additive manufacturing, reducing the waiting time of the contour extraction process. Combined with the partition strategy, it provides new ideas for dynamic layer-thickness adjustment and deformation control in the forming of large parts. Full article
(This article belongs to the Topic Modern Technologies and Manufacturing Systems)
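
One step that is easy to mirror outside the GPU pipeline is the contour extraction: take the binary occupancy image of the requested layer and trace it with marching squares (here via scikit-image's find_contours), giving the ordered coordinates that a NURBS/spline fit would then smooth. The voxel grid, layer height, and slice height below are synthetic placeholders, and the projection-matrix voxelization itself is not reproduced.

```python
import numpy as np
from skimage import measure

# Synthetic voxelized part: a cylinder of radius 20 voxels, 100 layers tall.
zz, yy, xx = np.mgrid[0:100, -32:32, -32:32]
voxels = (xx**2 + yy**2) <= 20**2

layer_height = 0.1                      # mm per voxel layer (illustrative)
slice_z = 4.2                           # requested slice height in mm
k = int(round(slice_z / layer_height))  # index of the voxel layer at that height

binary_image = voxels[k].astype(float)
contours = measure.find_contours(binary_image, level=0.5)   # marching squares
print(f"layer {k}: {len(contours)} contour(s); first contour has {len(contours[0])} points")
```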

20 pages, 7955 KB  
Article
Detecting the Early Flowering Stage of Tea Chrysanthemum Using the F-YOLO Model
by Chao Qi, Innocent Nyalala and Kunjie Chen
Agronomy 2021, 11(5), 834; https://doi.org/10.3390/agronomy11050834 - 23 Apr 2021
Cited by 15 | Viewed by 4232
Abstract
Detecting the flowering stage of tea chrysanthemum is a key mechanism of a selective chrysanthemum harvesting robot. However, under complex, unstructured scenarios such as illumination variation, occlusion, and overlapping, detecting tea chrysanthemum at a specific flowering stage is a real challenge. This paper proposes a highly fused, lightweight detection model named Fusion-YOLO (F-YOLO). First, cutout and mosaic input components are incorporated, with which the fusion module can better capture the features of the chrysanthemum through slicing. In the backbone component, the Cross-Stage Partial DenseNet (CSPDenseNet) network is used as the main network, and feature fusion modules are added to maximize the gradient flow difference. Next, in the neck component, the Cross-Stage Partial ResNeXt (CSPResNeXt) network is taken as the main network to truncate the redundant gradient flow. Finally, in the head component, a multi-scale fusion network is adopted to aggregate the parameters of two different detection layers from different backbone layers. The results show that the F-YOLO model is superior to state-of-the-art technologies in terms of object detection, that the method can be deployed on a single mobile GPU, and that it will be one of the key technologies for building a selective chrysanthemum harvesting robot system in the future. Full article
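
The mosaic input component mentioned above can be sketched independently of the detector: four training images are resized and tiled into one composite so that objects appear at varied scales and positions in every batch. The sketch below is a simplified illustration (bounding boxes and random crop offsets are omitted), not the F-YOLO implementation.

```python
import numpy as np

def mosaic(images, out_size=416):
    """Tile four HxWx3 images into a single out_size x out_size mosaic."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=images[0].dtype)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (r, c) in zip(images, corners):
        # naive nearest-neighbour resize of each image to its quadrant
        rows = np.linspace(0, img.shape[0] - 1, half).astype(int)
        cols = np.linspace(0, img.shape[1] - 1, half).astype(int)
        canvas[r:r + half, c:c + half] = img[rows][:, cols]
    return canvas

imgs = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic(imgs).shape)   # (416, 416, 3)
```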

22 pages, 3238 KB  
Article
Groupwise Non-Rigid Registration with Deep Learning: An Affordable Solution Applied to 2D Cardiac Cine MRI Reconstruction
by Elena Martín-González, Teresa Sevilla, Ana Revilla-Orodea, Pablo Casaseca-de-la-Higuera and Carlos Alberola-López
Entropy 2020, 22(6), 687; https://doi.org/10.3390/e22060687 - 19 Jun 2020
Cited by 11 | Viewed by 4360
Abstract
Groupwise (GW) image registration is customarily used for subsequent processing in medical imaging. However, it is computationally expensive due to the repeated calculation of transformations and gradients. In this paper, we propose a deep learning (DL) architecture that achieves GW elastic registration of a 2D dynamic sequence on an affordable average GPU. Our solution, referred to as dGW, is a simplified version of the well-known U-net. In our GW solution, the image to which the other images are registered, referred to in the paper as the template image, is obtained iteratively together with the registered images. Design and evaluation have been carried out using 2D cine cardiac MR slices from two databases consisting of 89 and 41 subjects, respectively. The first database was used for training and validation with a 66.6–33.3% split. The second one was used for validation (50%) and testing (50%). Additional network hyperparameters, which in essence control the degree of transformation smoothness, are obtained by means of a forward selection procedure. Our results show a 9-fold runtime reduction with respect to an optimization-based implementation; in addition, using the well-known structural similarity (SSIM) index, we obtained significant differences between dGW and an alternative DL solution based on VoxelMorph. Full article

20 pages, 2138 KB  
Article
AVIST: A GPU-Centric Design for Visual Exploration of Large Multidimensional Datasets
by Peng Mi, Maoyuan Sun, Moeti Masiane, Yong Cao and Chris North
Informatics 2016, 3(4), 18; https://doi.org/10.3390/informatics3040018 - 7 Oct 2016
Cited by 2 | Viewed by 9075
Abstract
This paper presents the Animated VISualization Tool (AVIST), an exploration-oriented data visualization tool that enables rapid exploration and filtering of large time-series multidimensional datasets. AVIST highlights interactive data exploration by revealing fine data details. This is achieved through the use of animation and cross-filtering interactions. To support interactive exploration of big data, AVIST features a GPU (Graphics Processing Unit)-centric design. Two key aspects are emphasized in this design: (1) both data management and computation are implemented on the GPU to leverage its parallel computing capability and fast memory bandwidth; and (2) a GPU-based directed acyclic graph is proposed to characterize data transformations triggered by users' demands. Moreover, we implement AVIST based on the Model-View-Controller (MVC) architecture. In the implementation, we consider two aspects: (1) user interaction is highlighted to slice big data into small data; and (2) data transformation is based on parallel computing. Two case studies demonstrate how AVIST helps analysts identify abnormal behaviors and infer new hypotheses by exploring big datasets. Finally, we summarize lessons learned about GPU-based solutions for interactive information visualization with big data. Full article
(This article belongs to the Special Issue Information Visualization for Massive Data)
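
The cross-filtering interaction maps naturally onto array operations: each filtered dimension yields a Boolean mask over all records, and the conjunction of the masks determines what every linked view draws. The NumPy sketch below illustrates the idea on synthetic records; a GPU array library such as CuPy could be substituted, and the field names and filter predicates are invented.

```python
import numpy as np  # cupy could be substituted for a GPU-resident version

rng = np.random.default_rng(0)
n = 1_000_000
records = {
    "timestamp": rng.uniform(0, 3600, n),        # seconds within the captured hour
    "port": rng.integers(0, 65536, n),
    "bytes": rng.lognormal(6, 2, n),
}

filters = {
    "timestamp": lambda t: (t > 600) & (t < 1200),   # brushed time window
    "port": lambda p: p == 443,                      # selected category
}

mask = np.ones(n, dtype=bool)
for dim, predicate in filters.items():
    mask &= predicate(records[dim])              # per-record evaluation, trivially parallel

print("records passing all filters:", int(mask.sum()))
```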
