MDPI - Publisher of Open Access Journals

18 pages, 2109 KB

Open Access Article

An FPGA-Based YOLOv5n Accelerator for Online Multi-Track Particle Localization

by Zixuan Song, Wangwang Tang, Wendi Deng, Hongxia Wang, Guangming Huang, Haoran Wu, Yueting Guo, Jun Liu, Kai Jin and Zhiyuan Ma

Electronics 2026, 15(4), 810; https://doi.org/10.3390/electronics15040810 - 13 Feb 2026

Viewed by 227

Abstract

Reliability testing for Single Event Effects (SEEs) requires accurate localization of heavy-ion tracks from projection images. Conventional localization often relies on handcrafted features and geometric fitting, which is sensitive to noise and difficult to accelerate in hardware. This paper presents a lightweight detector based on YOLOv5n that treats charge tracks in Topmetal pixel sensor projections as distinct objects and directly regresses the track angle and intercept, along with bounding boxes, in a single forward pass. On a synthetic dataset, the model achieves a precision of 0.9626 and a recall of 0.9493, with line-parameter errors of 0.3930° in angle and 0.4842 pixels in intercept. On experimental krypton beam data, the detector reaches a precision of 0.92 and a recall of 0.96, with a position resolution of 52.05 μm. We further deploy the model on a Xilinx Alveo U200, achieving an average per-frame accelerator latency of 3.1 ms while preserving measurement quality. This approach enables accurate, online track localization for SEE monitoring on Field-Programmable Gate Array (FPGA) platforms.

(This article belongs to the Section Industrial Electronics)

13 pages, 835 KB

Open Access Editor’s Choice Article

Layer-Pipelined CNN Accelerator Design on 2.5D FPGAs

by Mengxuan Wang and Chang Wu

Electronics 2025, 14(23), 4587; https://doi.org/10.3390/electronics14234587 - 23 Nov 2025

Viewed by 643

Abstract

With the rapid advancement of 2.5D FPGA technology, the integration of multiple FPGA dies enables larger design capacity and higher computing power. This progress provides a high-speed hardware platform well-suited for neural network acceleration. In this paper, we present a high-performance accelerator design for large-scale neural networks on 2.5D FPGAs. First, we propose a layer pipeline architecture that utilizes multiple accelerator cores, each equipped with individual high-bandwidth DDR memory. To address inter-die data dependencies, we introduce a block convolution mechanism that enables independent and efficient computation across dies. Furthermore, we propose a design space exploration scheme to optimize computational efficiency under resource constraints. Experimental results demonstrate that our proposed accelerator achieves 4860.87 GOPS when running VGG-16 on the Alveo U250 board, significantly outperforming existing layer pipeline designs on the same platform. Full article

(This article belongs to the Special Issue Advances in High-Performance and Parallel Computing)

21 pages, 653 KB

Open Access Article

A Stateful Extension to P4THLS for Advanced Telemetry and Flow Control

by Mostafa Abbasmollaei, Tarek Ould-Bachir and Yvon Savaria

Future Internet 2025, 17(11), 530; https://doi.org/10.3390/fi17110530 - 20 Nov 2025

Viewed by 574

Abstract

Programmable data planes are increasingly essential for enabling In-band Network Telemetry (INT), fine-grained monitoring, and congestion-aware packet processing. Although the P4 language provides a high-level abstraction to describe such behaviors, implementing them efficiently on FPGA-based platforms remains challenging due to hardware constraints and limited compiler support. Building on the P4THLS framework, which leverages HLS for FPGA data-plane programmability, this paper extends the approach by introducing support for P4-style stateful objects and a structured metadata propagation mechanism throughout the processing pipeline. These extensions enrich pipeline logic with real-time context and flow-level state, thereby facilitating advanced applications while preserving programmability. The generated codebase remains extensible and customizable, allowing developers to adapt the design to various scenarios. We implement two representative use cases to demonstrate the effectiveness of the approach: an INT-enabled forwarding engine that embeds hop-by-hop telemetry into packets and a congestion-aware switch that dynamically adapts to queue conditions. Evaluation of an AMD Alveo U280 FPGA implementation reveals that incorporating INT support adds roughly 900 LUTs and 1000 Flip-Flops relative to the baseline switch. Furthermore, the proposed meter maintains rate measurement errors below 3% at 700 Mbps and achieves up to a 5× reduction in LUT usage and a 2× reduction in Flip-Flop usage compared to existing FPGA-based stateful designs, substantially expanding the applicability of P4THLS for complex and performance-critical network functions.

(This article belongs to the Special Issue Key Enabling Technologies for Beyond 5G Networks—2nd Edition)

9 pages, 583 KB

Open Access Article

Porting MADGRAPH to FPGA Using High-Level Synthesis (HLS)

by Héctor Gutiérrez Arance, Luca Fiorini, Alberto Valero Biot, Francisco Hervás Álvarez, Santiago Folgueras, Carlos Vico Villalba, Pelayo Leguina López, Arantza Oyanguren Campos, Valerii Kholoimov, Volodymyr Svintozelskyi and Jiahui Zhuo

Particles 2025, 8(3), 63; https://doi.org/10.3390/particles8030063 - 20 Jun 2025

Viewed by 1219

Abstract

The escalating demand for data processing in particle physics research has spurred the exploration of novel technologies to enhance the efficiency and speed of calculations. This study presents an implementation of MADGRAPH, a widely used tool in particle collision simulations, on a Field Programmable Gate Array (FPGA) using High-Level Synthesis (HLS). This research presents a proof of concept limited to a single, relatively simple process, $e^{+} e^{-} \to \mu^{+} \mu^{-}$. The experimental evaluation methodology is described, focusing on performance comparison between traditional CPU implementations, GPU acceleration, and the new FPGA approach. This study describes the complex process of adapting MADGRAPH to FPGA using HLS, focusing on optimizing algorithms for parallel processing. These advancements could enable faster execution of complex simulations, highlighting FPGA’s crucial role in advancing particle physics research. The encouraging results obtained in this proof of concept motivate testing the performance of the FPGA implementation on more complex processes.

(This article belongs to the Special Issue Selected Papers from the 4th MODE Workshop on Differentiable Programming for Experiment Design)

18 pages, 2152 KB

Open Access Article

Development and Laboratory Validation of Rapid, Bird-Side Molecular Diagnostic Assays for Avian Influenza Virus Including Panzootic H5Nx

by Matthew Coopersmith, Remco Dijkman, Maggie L. Bartlett, Richard Currie, Sander Schuurman and Sjaak de Wit

Microorganisms 2025, 13(5), 1090; https://doi.org/10.3390/microorganisms13051090 - 8 May 2025

Cited by 2 | Viewed by 6225

Abstract

Avian influenza A viruses (AIV) significantly impact both animal and human health. Reliable diagnostics are crucial for controlling AIV, including the highly pathogenic strains like H5Nx. In this study, we developed and validated the on-site Alveo Sense Poultry Avian Influenza Tests to rapidly detect the AIV M gene and subtypes H5, H7, and H9 in unprocessed samples using reverse-transcription loop-mediated isothermal amplification (RT-LAMP) and impedance-based measurements. The Alveo Sense tests, using single-use microfluidic cartridges, deliver results within 45 min. Each cartridge includes assays for the AIV M gene and specific H5 and H7 or H9 subtypes, with internal process controls. The laboratory validation involved specificity, limit of detection (LoD), diagnostic sensitivity, reproducibility, and robustness tests using various AIV strains, other avian pathogens, and field samples. The assays showed 100% specificity for AIV subtypes without cross-reactivity with non-AIV pathogens. The LoD95 for H5, H7, and H9 corresponded to RT-PCR Ct values of 29–33 in both cloacal and oropharyngeal samples, and the assays detected avian influenza virus in both spiked and field samples. Reproducibility and repeatability studies showed perfect agreement across operators and laboratories, and the assays remained stable and accurate under different pre-analytical conditions. The Alveo Sense tests offer rapid, accurate, and reliable on-site diagnostics for AIV subtypes H5, H7, and H9 on samples from fresh dead and sick birds, valuable for early flock-level detection and outbreak control. Further field validation will improve the understanding of their diagnostic performance across various avian species.

(This article belongs to the Section Virology)

19 pages, 494 KB

Open Access Article

Hardware-Accelerated Data Readout Platform Using Heterogeneous Computing for DNA Data Storage

by Xiaopeng Gou, Qi Ge, Quan Guo, Menghui Ren, Tingting Qi, Rui Qin and Weigang Chen

Appl. Sci. 2025, 15(9), 5050; https://doi.org/10.3390/app15095050 - 1 May 2025

Viewed by 1232

Abstract

DNA data storage has emerged as a promising alternative to traditional storage media due to its high density and durability. However, large-scale DNA storage systems generate massive sequencing reads, posing substantial computational complexity and latency challenges for data readout. Here, we propose a novel heterogeneous computing architecture based on a field-programmable gate array (FPGA) to accelerate DNA data readout. The software component, running on a general computing platform, manages data distribution and schedules acceleration kernels. Meanwhile, the hardware acceleration kernel is deployed on an Alveo U200 data center accelerator card, executing multiple logical computing units within modules and utilizing task-level pipeline structures between modules to handle sequencing reads step by step. This heterogeneous computing acceleration system enables the efficient execution of the entire readout process for DNA data storage. We benchmark the proposed system against a CPU-based software implementation under various error rates and coverages. The results indicate that under high-error, low-coverage conditions (error rate of 1.5% and coverage of 15×), the accelerator achieves a peak speedup of up to 373.1 times, enabling the readout of 59.4 MB of stored data in just 12.40 s. Overall, the accelerator delivers a speedup of two orders of magnitude. Our proposed heterogeneous computing acceleration strategy provides an efficient solution for large-scale DNA data readout. Full article

27 pages, 2477 KB

Open Access Article

BPAP: FPGA Design of a RISC-like Processor for Elliptic Curve Cryptography Using Task-Level Parallel Programming in High-Level Synthesis

by Rares Ifrim and Decebal Popescu

Cryptography 2025, 9(1), 20; https://doi.org/10.3390/cryptography9010020 - 19 Mar 2025

Cited by 1 | Viewed by 1885

Abstract

Popular technologies such as blockchain and zero-knowledge proof, which have already entered the enterprise space, heavily use cryptography as the core of their protocol stack. One of the most used systems in this regard is Elliptic Curve Cryptography, precisely the point multiplication operation, which provides the security assumption for all applications that use this system. As this operation is computationally intensive, one solution is to offload it to specialized accelerators to provide better throughput and increased efficiency. In this paper, we explore the use of Field Programmable Gate Arrays (FPGAs) and the High-Level Synthesis framework of AMD Vitis in designing an elliptic curve point arithmetic unit (point adder) for the secp256k1 curve. We show how task-level parallel programming and data streaming are used in designing a RISC processor-like architecture to provide pipeline parallelism and increase the throughput of the point adder unit. We also show how to efficiently use the proposed processor architecture by designing a point multiplication scheduler capable of scheduling multiple batches of elliptic curve points to utilize the point adder unit efficiently. Finally, we evaluate our design on an AMD-Xilinx Alveo-family FPGA and show that our point arithmetic processor has better throughput and frequency than related work. Full article

(This article belongs to the Special Issue Interdisciplinary Cryptography)

20 pages, 4631 KB

Open Access Article

An On-Chip Architectural Framework Design for Achieving High-Throughput Multi-Channel High-Bandwidth Memory Access in Field-Programmable Gate Array Systems

by Xiangcong Kong, Zixuan Zhu, Chujun Feng, Yongxin Zhu and Xiaoying Zheng

Electronics 2025, 14(3), 466; https://doi.org/10.3390/electronics14030466 - 24 Jan 2025

Cited by 1 | Viewed by 3371

Abstract

The integration of High-Bandwidth Memory (HBM) into Field-Programmable Gate Arrays (FPGAs) has significantly enhanced data processing capabilities. However, the segmentation of HBM into 32 pseudo-channels, each managed by a performance-limited crossbar, imposes a significant bottleneck on data throughput. To overcome this challenge, we propose a transparent HBM access framework that integrates a non-blocking network-on-chip (NoC) module and fine-grained burst control transmission, enabling efficient multi-channel memory access in HBM. Our Omega-based NoC achieves a throughput of 692 million packets per second, surpassing state-of-the-art solutions. When implemented on the Xilinx Alveo U280 FPGA board, the proposed framework attains near-maximum single-channel write bandwidth, delivering 12.94 GB/s in many-to-many unicast communication scenarios, demonstrating its effectiveness in optimizing memory access for high-performance applications. Full article

19 pages, 1186 KB

Open Access Article

PrismParser: A Framework for Implementing Efficient P4-Programmable Packet Parsers on FPGA

by Parisa Mashreghi-Moghadam, Tarek Ould-Bachir and Yvon Savaria

Future Internet 2024, 16(9), 307; https://doi.org/10.3390/fi16090307 - 27 Aug 2024

Cited by 2 | Viewed by 1978

Abstract

The increasing complexity of modern networks and their evolving needs demand flexible, high-performance packet processing solutions. The P4 language excels in specifying packet processing in software-defined networks (SDNs). Field-programmable gate arrays (FPGAs) are ideal for P4-based packet parsers due to their reconfigurability and ability to handle data transmitted at high speed. This paper introduces three FPGA-based P4-programmable packet parsing architectures, called base, overlay, and pipeline, that translate P4 specifications into adaptable hardware implementations, each optimized for a different level of packet parsing performance. As modern network infrastructures evolve, the need for multi-tenant environments becomes increasingly critical. Multi-tenancy allows multiple independent users or organizations to share the same physical network resources while maintaining isolation and customized configurations. The rise of 5G and cloud computing has accelerated the demand for network slicing and virtualization technologies, enabling efficient resource allocation and management for multiple tenants. By leveraging P4-programmable packet parsers on FPGAs, our framework addresses these challenges by providing flexible and scalable solutions for multi-tenant network environments. The base parser offers a simple design for essential packet parsing, using minimal resources for high-speed processing. The overlay parser extends the base design for parallel processing, supporting various bus sizes and throughputs. The pipeline parser boosts throughput by segmenting parsing into multiple stages. The efficiency of the proposed approaches is evaluated through detailed resource consumption metrics measured on an Alveo U280 board, demonstrating throughputs of 15.2 Gb/s for the base design, 15.2 Gb/s to 64.42 Gb/s for the overlay design, and up to 282 Gb/s for the pipelined design. These results demonstrate high performance across a range of throughput requirements. The proposed approach ensures low latency and high throughput and yields streaming packet parsers directly from P4 programs, supporting parsing graphs with up to seven transitioning nodes and four connections between nodes. The functionality of the parsers was tested on enterprise networks, a firewall, and a 5G Access Gateway Function graph.

(This article belongs to the Special Issue Convergence of Edge Computing and Next Generation Networking)

21 pages, 640 KB

Open Access Article

A High-Performance Non-Indexed Text Search System

by Binh Kieu-Do-Nguyen, Tuan-Kiet Dang, Nguyen The Binh, Cuong Pham-Quoc, Huynh Phuc Nghi, Ngoc-Thinh Tran, Katsumi Inoue, Cong-Kha Pham and Trong-Thuc Hoang

Electronics 2024, 13(11), 2125; https://doi.org/10.3390/electronics13112125 - 29 May 2024

Cited by 1 | Viewed by 2082

Abstract

Full-text search has a wide range of applications, including tracking systems, computer vision, and natural language processing. Standard methods usually implement a two-phase procedure of indexing and retrieving, with the retrieval performance entirely dependent on the index efficiency. In most cases, the more powerful the index algorithm, the more memory and processing time are required. The amount of time and memory required to index a collection of documents is proportional to its overall size. In this paper, we propose a full-text search hardware implementation without the indexing phase, thus removing the time and memory requirements for indexing. Additionally, we propose an efficient design to leverage the parallel architecture of High Bandwidth Memory (HBM). To our knowledge, few, if any, researchers have integrated their full-text search system with an effective data access control on HBM. The functionality of the proposed system is verified on the Xilinx Alveo U50 Field-Programmable Gate Array (FPGA). The experimental results show that our system achieved a throughput of 8 Gigabytes per second, about a 6697× speed-up compared to other software-based approaches.

(This article belongs to the Section Microelectronics)

16 pages, 1997 KB

Open Access Article

AlveoMPU: Bridging the Gap in Lung Model Interactions Using a Novel Alveolar Bilayer Film

by Minoru Hirano, Kosuke Iwata, Yuri Yamada, Yasuhiko Shinoda, Masateru Yamazaki, Sayaka Hino, Aya Ikeda, Akiko Shimizu, Shuhei Otsuka, Hiroyuki Nakagawa and Yoshihide Watanabe

Polymers 2024, 16(11), 1486; https://doi.org/10.3390/polym16111486 - 23 May 2024

Cited by 4 | Viewed by 4903

Abstract

The alveoli, critical sites for gas exchange in the lungs, comprise alveolar epithelial cells and pulmonary capillary endothelial cells. Traditional experimental models rely on porous polyethylene terephthalate or polycarbonate membranes, which restrict direct cell-to-cell contact. To address this limitation, we developed AlveoMPU, a new foam-based mortar-like polyurethane-formed alveolar model that facilitates direct cell–cell interactions. AlveoMPU features a unique anisotropic mortar-shaped configuration with larger pores at the top and smaller pores at the bottom, allowing the alveolar epithelial cells to gradually extend toward the bottom. The underside of the film is remarkably thin, enabling seeded pulmonary microvascular endothelial cells to interact with alveolar epithelial cells. Using AlveoMPU, it is possible to construct a bilayer structure mimicking the alveoli, potentially serving as a model that accurately simulates the actual alveoli. This innovative model can be utilized as a drug-screening tool for measuring transepithelial electrical resistance, assessing substance permeability, observing cytokine secretion during inflammation, and evaluating drug efficacy and pharmacokinetics. Full article

(This article belongs to the Special Issue Advanced Polymeric Scaffolds Applied in the Biomedical Field)

16 pages, 1447 KB

Open Access Article

High-Performance Reconfigurable Pipeline Implementation for FPGA-Based SmartNIC

by Xiaoyong Song, Rui Lu and Zhichuan Guo

Micromachines 2024, 15(4), 449; https://doi.org/10.3390/mi15040449 - 27 Mar 2024

Cited by 5 | Viewed by 3542

Abstract

As the key module of programmable switches or the SmartNIC card, the packet processing pipeline undertakes the task of packet forwarding and processing. However, the current pipeline for the FPGA-based SmartNIC is inflexible, and the related reconfigurable commercial device designs are closed-source. To solve this problem, this paper proposes a high-performance reconfigurable pipeline design, which has fully reconfigurable match-action units, supporting various network functions by its flexible reconfiguration. The fields of the match key and the size of the match table can be reconfigured without recompiling the HDL code or modifying the hardware. The processing rules and action instructions for the pipeline can be dynamically installed by the configuration module at runtime. We implement our design on the Xilinx Alveo U200 board with a Virtex UltraScale+ XCU200-2FSGD2104E FPGA and show that the designed pipeline supports fast reconfiguration to implement new network functions and that the throughput of the designed pipeline reaches 100 Gbps with low latency. Full article

(This article belongs to the Special Issue FPGA Applications and Future Trends)

16 pages, 5908 KB

Open Access Article

Memory-Tree Based Design of Optical Character Recognition in FPGA

by Ke Yu, Minguk Kim and Jun Rim Choi

Electronics 2023, 12(3), 754; https://doi.org/10.3390/electronics12030754 - 2 Feb 2023

Cited by 10 | Viewed by 6408

Abstract

As one of the fields of Artificial Intelligence (AI), Optical Character Recognition (OCR) systems have wide application in both industrial production and daily life. Conventional OCR systems commonly perform their computation on microprocessors, so their performance depends on that of the processor. However, due to the “Memory-wall” problem and Von Neumann bottlenecks, the drawbacks of traditional processor-based computing for OCR systems are gradually becoming apparent. In this paper, an approach based on Memory-Centric Computing and the “Memory-Tree” algorithm is proposed to perform hardware optimization of traditional OCR systems. The proposed algorithm was first implemented in software using C/C++ and OpenCV to verify the feasibility of the idea, and then the RTL conversion of the algorithm was done using the Xilinx Vitis High-Level Synthesis (HLS) tool to implement the hardware. This work chose the Xilinx Alveo U50 FPGA Accelerator to complete the hardware design, which can be connected to the x86 CPU in the PC by PCIe to form a heterogeneous computing system. The results of the hardware implementation show that the designed system can recognize English capital letters and numbers within 34.24 μs. The power of the FPGA is 18.59 W, which saves 77.87% of energy consumption compared to the 84 W of the processor in the PC.
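The saving quoted in the abstract can be reproduced with one line of arithmetic. The sketch below is not taken from the paper: the function name `relative_saving` is hypothetical, and it compares power draw only, so it equals an energy saving only under the assumption that both platforms run for the same length of time.

```python
def relative_saving(baseline_watts: float, measured_watts: float) -> float:
    """Fraction of the baseline power that is saved by the measured device."""
    if baseline_watts <= 0:
        raise ValueError("baseline_watts must be positive")
    return 1.0 - measured_watts / baseline_watts


if __name__ == "__main__":
    # Figures quoted in the abstract: 84 W processor, 18.59 W FPGA.
    saving = relative_saving(84.0, 18.59)
    print(f"relative power saving: {saving * 100:.2f}%")
```

With the two figures from the abstract, the result rounds to the reported 77.87%.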

(This article belongs to the Special Issue FPGAs Based Hardware Design)

15 pages, 4288 KB

Open Access Article

Archaeometric Surveys of the Artifacts from the Archaeological Site of Baro Zavelea, Comacchio (Ferrara, Italy)

by Elena Marrocchino, Chiara Telloli, Umberto Tessari, Mario Cesarano, Marco Bruni and Carmela Vaccaro

Appl. Sci. 2022, 12(22), 11692; https://doi.org/10.3390/app122211692 - 17 Nov 2022

Cited by 2 | Viewed by 2336

Abstract

This work is part of a project of the Superintendence of Archaeology, Fine Arts, and Landscape for the enhancement of the widespread archaeological heritage of the Po delta area. Excavation activities, carried out in 2015, allowed the sampling of the stratigraphic elements and artifacts of the archaeological site of the lighthouse tower of Baro Zavelea, municipality of Comacchio (Ferrara, northeast Italy). In this work, the geochemical characterization of sediments and building materials was conducted using granulometric analyses, X-ray fluorescence (XRF) analysis, and calcimetry on different types of samples, including sands, clays, mortars, and bricks, with the aim of better characterizing all of the different types of sediments collected. This multidisciplinary approach allowed the diagnosis and evaluation of the state of conservation of Baro Zavelea. Granulometric analyses highlighted the fact that depositional environments were of very different natures: fluvial environments and paleo–alveo environments. In addition, XRF analysis allowed the discrimination of different clay samples, some from basins poor in carbonates, while, for the construction of the bricks of the second wall structure, clays rich in carbonate were chosen to add lightness to the structure.

(This article belongs to the Special Issue Approaches and Challenges in Diagnostics and Conservation of Cultural Heritage)

17 pages, 19950 KB

Open Access Article

High Performance Computing PP-Distance Algorithms to Generate X-ray Spectra from 3D Models

by César González, Simone Balocco, Jaume Bosch, Juan Miguel de Haro, Maurizio Paolini, Antonio Filgueras, Carlos Álvarez and Ramon Pons

Int. J. Mol. Sci. 2022, 23(19), 11408; https://doi.org/10.3390/ijms231911408 - 27 Sep 2022

Cited by 2 | Viewed by 2631

Abstract

X-ray crystallography is a powerful method that has significantly contributed to our understanding of the biological function of proteins and other molecules. This method relies on the production of crystals that, however, are usually a bottleneck in the process. For some molecules, no crystallization has been achieved or insufficient crystals were obtained. Some other systems do not crystallize at all, such as nanoparticles which, because of their dimensions, cannot be treated by the usual crystallographic methods. To solve this, the whole pair distribution function has been proposed to bridge the gap between Bragg and Debye scattering theories. To execute a fitting, the spectra of several different constructs, composed of millions of particles each, should be computed using a particle–pair or particle–particle (pp) distance algorithm. Using this computation as a test bench for current field-programmable gate array (FPGA) technology, we evaluate how the parallel computation capability of FPGAs can be exploited to reduce the computation time. We present two different solutions to the problem using two state-of-the-art FPGA technologies. In the first one, the main C program uses OmpSs (a high-level programming model developed at the Barcelona Supercomputing Center that enables task offload to different high-performance computing devices) for task invocation, and kernels are built with OpenCL using reduced data sizes to save transmission time. The second approach uses task and data parallelism to operate on data locally and update data globally in a decoupled task. Benchmarks have been evaluated on an Intel D5005 Programmable Acceleration Card, computing a model of 2 million particles in 81.57 s (24.5 billion atom pairs per second, bapps), and on a ZU102 in 115.31 s. In our last test, on an up-to-date Alveo U200 board, the computation lasted for 34.68 s (57.67 bapps). In this study, we analyze the results in relation to the classic terms of speed-up and efficiency and give hints for future improvements focused on reducing the global job time.
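The pair-rate figures in this abstract can be related to the run times with a short calculation. The sketch below is illustrative only and is not the authors' FPGA kernel: the function names are hypothetical, the histogram is a plain brute-force CPU reference, and the rate calculation assumes that each unordered pair of particles is counted once, i.e. n(n-1)/2 pairs, which appears consistent with the quoted numbers.

```python
from itertools import combinations
from math import dist


def pair_count(n: int) -> int:
    """Number of unordered particle pairs among n particles."""
    return n * (n - 1) // 2


def pair_distance_histogram(points, bin_width: float, n_bins: int):
    """Brute-force O(n^2) histogram of pairwise distances (reference version)."""
    hist = [0] * n_bins
    for p, q in combinations(points, 2):
        b = int(dist(p, q) / bin_width)
        if b < n_bins:  # distances beyond the last bin are dropped
            hist[b] += 1
    return hist


def billions_of_pairs_per_second(n: int, seconds: float) -> float:
    """Pair throughput in units of 1e9 pairs per second."""
    return pair_count(n) / seconds / 1e9


if __name__ == "__main__":
    # Run times quoted in the abstract for a model of 2 million particles.
    for board, seconds in [("Intel D5005", 81.57), ("Alveo U200", 34.68)]:
        rate = billions_of_pairs_per_second(2_000_000, seconds)
        print(f"{board}: {rate:.2f} billion pairs/s")
```

Under that assumption, the 81.57 s and 34.68 s run times give roughly 24.5 and 57.67 billion pairs per second, matching the reported values.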

(This article belongs to the Section Molecular Biophysics)
