MDPI - Publisher of Open Access Journals

21 pages, 388 KB

Open Access Article

PhishGraph: A Disk-Aware Approximate Nearest Neighbor Index for Billion-Scale Semantic URL Search

by Dimitrios Karapiperis, Georgios Feretzakis and Sarandis Mitropoulos

Electronics 2025, 14(21), 4331; https://doi.org/10.3390/electronics14214331 - 5 Nov 2025

Viewed by 262

The proliferation of algorithmically generated malicious URLs necessitates a shift from syntactic detection to semantic analysis. This paper introduces PhishGraph, a disk-aware Approximate Nearest Neighbor (ANN) search system designed to perform billion-scale semantic similarity searches on URL embeddings for threat intelligence applications. Traditional in-memory ANN indexes are prohibitively expensive at this scale, while existing disk-based solutions fail to address the unique challenges of the cybersecurity domain: the high velocity of streaming data, the complexity of hybrid queries involving rich metadata, and the highly skewed, adversarial nature of query workloads. PhishGraph addresses these challenges through a synergistic architecture built upon the foundational principles of DiskANN. Its core is a Vamana proximity graph optimized for SSD residency, but it extends this with three key innovations: a Hybrid Fusion Distance metric that natively integrates structured attributes into the graph’s topology for efficient constrained search; a dual-mode update mechanism that combines high-throughput batch consolidation with low-latency in-place updates for streaming data; and an adaptive maintenance policy that monitors query patterns and dynamically reconfigures graph hotspots to mitigate performance degradation from skewed workloads. Our comprehensive experimental evaluation on a billion-point dataset demonstrates that PhishGraph’s adaptive, hybrid design significantly outperforms strong baselines, offering a robust, scalable, and efficient solution for modern threat intelligence.

(This article belongs to the Special Issue Advanced Research in Technology and Information Systems, 2nd Edition)

26 pages, 4196 KB

Open Access Article

Amphisbaena: A Novel Persistent Buffer Management Strategy to Improve SMR Disk Performance

by Chi Zhang, Fangxing Yu, Shiqiang Nie, Wei Tang, Fei Liu, Song Liu and Weiguo Wu

Appl. Sci. 2024, 14(2), 630; https://doi.org/10.3390/app14020630 - 11 Jan 2024

Cited by 1 | Viewed by 1684

Abstract

The explosive growth of data makes shingled magnetic recording (SMR) disks a promising candidate for balancing capacity and cost. SMR disks are typically configured with a persistent buffer to reduce the read–modify–write (RMW) overhead introduced by non-sequential writes. Traditional persistent buffers built on SMR zones are subject to sequential-write constraints, and their frequent cleanups degrade disk performance. Conventional magnetic recording (CMR) zones support in-place updates and therefore need less frequent cleanups, so they are gradually being used to construct persistent buffers in certain SMR disks. However, existing CMR-zone-based persistent buffer designs fail to accurately capture hot blocks with long update periods and are limited by an inflexible data layout, resulting in inefficient cleanups. To address these issues, we propose a strategy called Amphisbaena. First, a two-phase data block classification method is proposed to capture frequently updated blocks. Then, a locality-aware buffer space management scheme is developed to dynamically manage blocks with different update frequencies. Finally, a latency-sensitive garbage collection policy built on these two components is designed to mitigate the impact of cleanup on user requests. Experimental results show that Amphisbaena reduces latency by an average of 29.9% and the number of RMWs by an average of 37% compared to current state-of-the-art strategies.

(This article belongs to the Special Issue Resource Management for Emerging Computing Systems)

19 pages, 2067 KB

Open Access Article

Balloon: An Elastic Data Management Strategy for Interlaced Magnetic Recording

by Chi Zhang, Song Liu, Fangxing Yu, Menghan Li, Wei Tang, Fei Liu and Weiguo Wu

Appl. Sci. 2023, 13(17), 9767; https://doi.org/10.3390/app13179767 - 29 Aug 2023

Cited by 4 | Viewed by 1711

Abstract

Interlaced Magnetic Recording (IMR) is an emerging technology that has recently received widespread attention from both industry and academia. IMR-based disks combine interlaced track layouts with energy-assisted techniques to dramatically increase areal densities. The interlaced track layout means that in-place updates to the bottom track require rewriting the adjacent top track to ensure data consistency. At high disk utilization, however, frequent track rewrites degrade disk performance. To address this problem, we propose a solution called Balloon to reduce the frequency of track rewrites. First, an adaptive write-interference data placement policy is introduced, which judiciously places data on tracks with low rewrite probability to avoid unnecessary rewrites. Next, an on-demand data shuffling mechanism is designed to reduce the write latency of user requests by implicitly migrating data and promptly swapping tracks with high update block coverage to the top track. Finally, a write-interference-free persistent buffer design is proposed. This design dynamically adjusts buffer admission constraints and selectively evicts data blocks to improve the cooperation between data placement and data shuffling. Evaluation results show that Balloon significantly improves the write performance of IMR-based disks at medium and high utilization compared with state-of-the-art studies.
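
The rewrite penalty described in this abstract can be illustrated with a minimal sketch. It is not Balloon's algorithm. It assumes a simplified layout in which even-numbered tracks are bottom tracks and odd-numbered tracks are top tracks, and the names `tracks_to_rewrite` and `update_io_count` are hypothetical.

```python
def tracks_to_rewrite(track, occupied):
    """Return the adjacent top tracks that must be rewritten when
    `track` is updated in place.

    Assumption (not from the article): even track numbers are bottom
    tracks, odd track numbers are top tracks. `occupied` is the set of
    track numbers that currently hold valid data.
    """
    if track % 2 == 1:
        # Top tracks can be updated in place without disturbing neighbors.
        return []
    neighbors = [track - 1, track + 1]
    return [t for t in neighbors if t in occupied]


def update_io_count(track, occupied):
    """Count track-level I/Os for one in-place update: one write of the
    target track, plus one read and one rewrite per affected top track."""
    return 1 + 2 * len(tracks_to_rewrite(track, occupied))


occupied = {1, 2, 3}
print(update_io_count(2, occupied))  # bottom track with two occupied top neighbors
print(update_io_count(1, occupied))  # top track
```

Under this toy model, updating a bottom track with two occupied top neighbors costs five track I/Os instead of one, which is why placing frequently updated data on top tracks, or on bottom tracks whose neighbors are empty, reduces rewrites.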

(This article belongs to the Special Issue Resource Management for Emerging Computing Systems)

21 pages, 1688 KB

Open Access Review

Crash Recovery Techniques for Flash Storage Devices Leveraging Flash Translation Layer: A Review

by Abdulhadi Alahmadi and Tae Sun Chung

Electronics 2023, 12(6), 1422; https://doi.org/10.3390/electronics12061422 - 16 Mar 2023

Cited by 5 | Viewed by 4268

Abstract

Flash storage is a type of nonvolatile semiconductor memory that has been replacing hard disks and other secondary storage in several markets, such as PCs and laptops, mobile devices, and enterprise servers. It offers a number of benefits, including compact size, low power consumption, quick access, easy mobility, heat dissipation, shock tolerance, data preservation during a power outage, and random access. Embedded system products, including digital cameras, smartphones, personal digital assistants (PDAs), and sensor devices, now integrate flash memory. However, because flash memory imposes unique requirements such as “erase before write” and “wear-leveling”, an FTL (flash translation layer) is added as a software layer. Flash memory does not allow in-place updates, so a block must be erased before its data can be overwritten; the FTL software module addresses the performance problems that arise from erase-before-write and wear-leveling. Flash storage devices are also subject to failures, so they must be able to recover data and metadata (including address mapping information) after a crash. The FTL is responsible for this crash recovery. Although power-off recovery is essential for portable devices, most FTL algorithms do not take it into account. In this paper, we review various crash recovery schemes that leverage the FTL for flash storage devices. We present a classification of the FTL algorithms and discuss the metrics and parameters that each scheme evaluates when comparing itself with other approaches, along with the flash type. In addition, we analyze the FTL schemes and describe considerations that play a critical role in the design of power-off recovery employing FTL.
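
As a rough illustration of why the address mapping matters for crash recovery, the sketch below models a toy page-mapped FTL. It is a simplified assumption and not a scheme taken from the review: every write goes to a fresh physical page (an out-of-place update), each page also records its logical page number and a sequence number, and recovery rebuilds the in-memory mapping by scanning all pages. The class `TinyFTL` and its methods are hypothetical, and garbage collection and wear-leveling are omitted.

```python
class TinyFTL:
    """Toy page-mapped FTL with out-of-place updates (illustrative only)."""

    def __init__(self, num_pages=16):
        # Each physical page holds (logical_page, sequence_no, data),
        # or None if it is still erased.
        self.flash = [None] * num_pages
        self.mapping = {}    # volatile: logical page -> physical page
        self.next_free = 0   # volatile: next erased physical page
        self.seq = 0         # volatile: monotonically increasing write counter

    def write(self, lpn, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; a real FTL would garbage-collect")
        ppn = self.next_free
        self.next_free += 1
        self.seq += 1
        # Out-of-place update: the old copy stays on flash but becomes stale.
        self.flash[ppn] = (lpn, self.seq, data)
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.flash[self.mapping[lpn]][2]

    def crash(self):
        # Power loss wipes RAM state; flash contents survive.
        self.mapping, self.next_free, self.seq = {}, 0, 0

    def recover(self):
        # Scan every page and keep the newest copy of each logical page.
        newest = {}
        for ppn, page in enumerate(self.flash):
            if page is None:
                continue
            lpn, seq, _ = page
            if lpn not in newest or seq > newest[lpn][0]:
                newest[lpn] = (seq, ppn)
        self.mapping = {lpn: ppn for lpn, (_, ppn) in newest.items()}
        self.seq = max((seq for seq, _ in newest.values()), default=0)
        # Pages are filled in order here, so the written count is the next free index.
        self.next_free = sum(page is not None for page in self.flash)


ftl = TinyFTL()
ftl.write(7, "old")
ftl.write(7, "new")   # lands on a different physical page
ftl.crash()
ftl.recover()
print(ftl.read(7))
```

A full scan like this is simple but slow on large devices, which is one reason a recovery design might instead checkpoint or log the mapping table; that trade-off is stated here as general background rather than as a finding of the review.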

(This article belongs to the Special Issue Artificial Intelligence Driven Software-Defined Networking (SDN) Technologies for Next Generation Networks)

Search Results (4)
