Evaluating Image DNA Techniques for Filtering Unauthorized Content in Large-Scale Social Platforms

Cho, Kyungwoon; Bahn, Hyokyung

doi:10.3390/app15084539

by Kyungwoon Cho ¹ and Hyokyung Bahn ²,*

¹ Embedded Software Research Center, Ewha University, Seoul 03760, Republic of Korea
² Department of Computer Engineering, Ewha University, Seoul 03760, Republic of Korea
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(8), 4539; https://doi.org/10.3390/app15084539

Submission received: 31 December 2024 / Revised: 18 February 2025 / Accepted: 18 April 2025 / Published: 20 April 2025

(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Featured Application

This work can be applied to enhance the robustness of image filtering systems in large-scale content platforms, specifically for detecting unauthorized images and their transformed versions, preventing the dissemination of manipulated data.

Abstract

Image filtering systems have become essential in large-scale content platforms to prevent the dissemination of unauthorized data. While extensive research has focused on identifying images based on categories or visual similarity, the filtering problem addressed in this study presents distinct challenges. Specifically, it involves a predefined set of filtering images and requires real-time detection of whether a distributed image is derived from an unauthorized source. Although three major approaches—bitmap-based, image processing-based, and deep learning-based techniques—have been explored, no comprehensive comparison has been conducted. To bridge this gap, we formalize the concept of image equivalence and introduce performance metrics tailored for fair evaluation. Through extensive experiments, we derive the following key findings. First, bitmap-based methods are practically viable in real-world scenarios, offering reasonable detection rates and fast search speeds even under resource constraints. Second, despite their success in tasks such as image classification, deep learning-based methods underperform in our problem domain, highlighting the need for customized models and architectures. Third, image processing-based techniques demonstrate superior performance across all key metrics, including execution time and detection rates. These findings provide valuable insights into designing efficient image filtering systems for diverse content platforms, particularly for detecting unauthorized images and their transformations effectively.

Keywords:

image DNA; fingerprinting; unauthorized content; image filtering system; image equivalence

1. Introduction

Instagram users upload over 95 million photos and videos every day, with an average of 65,000 images posted per minute [1]. Pinterest hosts over 240 billion pins across 5 billion boards, providing a vast repository of user-generated and shared images [2]. These massive volumes of digital content are easily copied, modified, and shared through high-speed internet and mobile networks, leading to a rapid increase in unauthorized content, from copyright violations to the non-consensual distribution of intimate recordings captured through covert cameras [3,4].

To mitigate the spread of such unlawful digital content, social media platforms require filtering mechanisms at various stages involving media creators, service providers, and consumers. Automated filtering techniques leveraging AI are widely adopted to identify and block illegal content, including pornography [5,6]. However, the filtering systems discussed in this study specifically target predefined unauthorized or unlawful data, such as content violating personal portrait rights or non-consensual imagery. Failure to effectively filter such content can lead to its rapid spread online, causing significant harm to individuals [7,8]. Although traditional research in digital rights management (DRM) has tackled similar issues [9], the growing sophistication of data manipulation techniques to evade filtering presents new challenges.

Figure 1 illustrates the structure of the filtering system discussed in this paper. Service providers manage a database of known unauthorized data, periodically updated by regulatory authorities. When media data are uploaded by creators or downloaded by consumers, the system verifies whether the content is unauthorized and blocks it if necessary. The effectiveness of a filtering system depends on how accurately and quickly it can identify illegal data while avoiding excessive false positives. High false positive rates require manual intervention by administrators, increasing operational costs and excessively restricting users from uploading legitimate images.

Due to the inherent nature of digital data, it is trivial to modify original content, such as changing file formats or altering image properties. Consequently, filtering systems must not only detect exact matches but also identify derivative data generated through transformations. This study focuses on filtering techniques for images, aiming to provide a quantitative analysis of methods for detecting modified or derivative images based on the original. Because video data are a temporal sequence of images, the techniques discussed in this paper could be extended to video.

Traditionally, fingerprint techniques, a specific method within ICD (Image Copy Detection), have been used to uniquely identify digital data, generating unique identifiers akin to human fingerprints [10]. Recently, these techniques have expanded into the concept of image DNA, which extracts features in more diverse ways. This study adopts the term “image DNA” for a descriptor designed so that transformed images produce similar DNA values while distinct images yield dissimilar DNA. Thus, the extent to which DNA similarity reflects image transformation becomes the key metric for evaluating the effectiveness of image DNA techniques.

1.1. Problem Definition

Filtering systems for image services must distinguish between an image derived from an original and a visually similar but non-derivative image, as incorrectly filtering out unrelated yet similar-looking images can lead to frequent false positives. To address this, we introduce the concept of image equivalence, which distinguishes transformation-derived images from those that are merely similar in appearance.

Definition 1.

Two images, A₁ and A₂, are equivalent if and only if A₂ can be generated by applying a transformation to A₁.

Equivalent images encompass not only exact copies of the original but also images that have undergone various transformations, such as format conversion, caption addition, resizing, color adjustment, facial mosaicking, and other modifications. Most Image Copy Detection (ICD) techniques focus on identifying equivalent images with high precision and recall.

In contrast, images that may appear visually similar but are not transformation-derived are classified as similar images rather than equivalent images. To formalize this distinction, we define image DNA as a unique descriptor, ensuring that equivalent images exhibit similar DNA values, whereas visually similar but non-equivalent images possess distinct DNA characteristics.

Definition 2.

The image DNA of an image must satisfy the following two properties:

1. If images A and B are equivalent images, then their DNA representations, DNA(A) and DNA(B), should have a small distance under a predefined distance metric dist, i.e.,

dist(DNA(A), DNA(B)) ≤ τ₁    (1)

where τ₁ is a threshold ensuring that small variations from transformations do not disrupt equivalence.

2. If images A and B are not equivalent images, then their DNA representations should have a large distance under the same metric, i.e.,

dist(DNA(A), DNA(B)) > τ₂    (2)

where τ₂ is a threshold ensuring that visually similar but non-equivalent images remain distinguishable.

A robust image DNA technique, in this context, is one that consistently upholds these properties, ensuring clear differentiation between equivalent and non-equivalent images. The core challenge addressed in this study is the real-time verification of uploaded images in large-scale social media or content service platforms. This process requires computing the image DNA upon upload and instantly comparing it against a database of stored images to determine equivalence.
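
The two properties of Definition 2 can be expressed as a simple check. The sketch below is illustrative only: the Euclidean distance, the function names, the toy vectors, and the threshold values are assumptions, not choices stated in this paper.

```python
import numpy as np

def dist(dna_a: np.ndarray, dna_b: np.ndarray) -> float:
    """Euclidean distance between two DNA vectors (an assumed choice of metric)."""
    return float(np.linalg.norm(dna_a - dna_b))

def satisfies_definition_2(dna_a, dna_b, equivalent: bool,
                           tau1: float, tau2: float) -> bool:
    """Property (1) for equivalent pairs, property (2) for non-equivalent pairs."""
    d = dist(dna_a, dna_b)
    return d <= tau1 if equivalent else d > tau2

# Toy DNA vectors (hypothetical values).
original = np.array([0.2, 0.8, 0.5, 0.1])
transformed = original + 0.01            # small change caused by a transformation
unrelated = np.array([0.9, 0.1, 0.3, 0.7])

TAU1, TAU2 = 0.1, 0.5                    # hypothetical thresholds
print(satisfies_definition_2(original, transformed, True, TAU1, TAU2))
print(satisfies_definition_2(original, unrelated, False, TAU1, TAU2))
```

A technique is robust in the sense used here when such checks hold across all pairs of interest, not just for one hand-picked example.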

The process of filtering unauthorized images involves the steps illustrated in Figure 2. First, DNA is extracted from the filtering target images and stored in a searchable DNA database. This database can either be prepopulated with unauthorized images or updated incrementally as needed. For filtering, the DNA of distributed images is extracted, compared against the database, and analyzed for similarity to detect unauthorized content.

For image DNA techniques to be practical in real-world scenarios, several requirements must be satisfied. First, the overhead for DNA generation and search must remain within acceptable limits. Search overhead is particularly critical as the database grows, potentially impacting real-time upload and download performance. Second, the false positive rate must be kept low to avoid mistakenly restricting legitimate uploads and downloads, which would impose additional manual verification costs on operators. Third, the system must achieve high detection rates for equivalent images to effectively block the spread of unauthorized content.
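
The build and filter steps described above can be sketched as a small class. This is a minimal illustration under stated assumptions: `toy_extract` is a hypothetical stand-in for any DNA extractor, the database is a plain in-memory list, and the single threshold `tau` plays the role of τ₁.

```python
from typing import Callable, List, Optional, Tuple
import numpy as np

Extractor = Callable[[np.ndarray], np.ndarray]

def toy_extract(image: np.ndarray) -> np.ndarray:
    """Hypothetical extractor: 4x4 grid of block means of a grayscale image."""
    h, w = image.shape
    cropped = image[: h - h % 4, : w - w % 4]
    blocks = cropped.reshape(4, h // 4, 4, w // 4)
    return blocks.mean(axis=(1, 3)).ravel()

class DnaFilter:
    def __init__(self, extract: Extractor, tau: float):
        self.extract = extract
        self.tau = tau
        self.dna_db: List[Tuple[str, np.ndarray]] = []

    def register(self, image_id: str, image: np.ndarray) -> None:
        """Build step: extract DNA from a filtering target and store it."""
        self.dna_db.append((image_id, self.extract(image)))

    def check(self, image: np.ndarray) -> Optional[str]:
        """Filter step: return the id of a matching target, or None."""
        query = self.extract(image)
        for image_id, dna in self.dna_db:
            if np.linalg.norm(query - dna) <= self.tau:
                return image_id
        return None

rng = np.random.default_rng(0)
banned = rng.random((32, 32))
flt = DnaFilter(toy_extract, tau=0.05)
flt.register("banned-001", banned)

brightened = np.clip(banned + 0.01, 0.0, 1.0)   # a mild transformation
unrelated = rng.random((32, 32))
print(flt.check(brightened), flt.check(unrelated))
```

The linear scan in `check` is what makes search overhead grow with the database, which is the first requirement listed above.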

1.2. Main Findings

Image DNA techniques have been studied across three categories: bitmap-based, image processing-based, and deep learning-based methods. Each category demonstrates distinct trade-offs between performance and detection accuracy, making the choice of technique dependent on factors like service scale, user base, and detection accuracy requirements. However, no comprehensive quantitative comparison of these techniques has been conducted. We formalize the notion of image equivalence and define the requirements for image DNA techniques, particularly the real-time verification of uploaded images on large-scale social media and content service platforms. From this perspective, the originality of this study lies in its comprehensive, quantitative comparison of three existing categories, thoroughly analyzing their performance across varying datasets and conditions while evaluating their ability to meet these requirements.

The main findings of this study are as follows:

Each DNA technique is sensitive to threshold settings, which significantly impact its performance in detecting equivalent images, necessitating careful optimization of threshold values.
Bitmap-based methods offer fast DNA extraction and search with minimal computational requirements, demonstrating practical advantages for field deployment.
Image processing-based methods, particularly ORB (Oriented FAST and Rotated BRIEF) [11], provide superior detection accuracy due to their use of multiple feature vectors as DNA while maintaining reasonable overhead for DNA generation and search.
Deep learning-based methods perform poorly in both speed and detection accuracy, highlighting their limitations in addressing image equivalence compared to their strengths in classification and similarity tasks.
Our findings indicate that DNA DB build and search times depend heavily on DNA extraction time, emphasizing the need to prioritize extraction efficiency in practical large-scale filtering systems with high processing demands and frequent DB updates.

These findings provide valuable insights into the strengths and limitations of current techniques and contribute to the design of effective image filtering systems for large-scale content platforms.

1.3. Paper Structure

The remainder of this paper is organized as follows. Section 2 provides a detailed explanation of the three categories of image DNA techniques. Section 3 describes the experimental methodology used for the quantitative analysis of these techniques. Section 4 presents the results of comparative experiments on DNA extraction, insertion into the DNA database, search performance, and detection rates. Finally, Section 5 concludes this paper.

2. Image DNA Techniques

Image DNA techniques can be broadly categorized into three approaches: bitmap-based, image processing-based, and deep learning-based methods. Bitmap-based methods extract DNA directly from image bitmaps, offering advantages such as fast DNA extraction and compact DNA size, but they may struggle with identifying modified images. Image processing-based methods employ advanced computer vision techniques to extract features and generate DNA through computationally intensive processes. Recently, deep learning-based methods have been explored, leveraging pre-trained inference models or deep learning architectures to generate image DNA. Despite their promise, deep learning approaches require significant computational resources and prior training on image features, making them distinct from the other two approaches [12]. Table 1 presents a comparative analysis of the DNA features utilized in each of the three approaches.

2.1. Bitmap-Based Approaches

Bitmap-based methods employ straightforward operations or preprocessing steps to compress bitmap information while maintaining image distinguishability. Examples include histogram analysis and hash-based feature extraction, such as average hash and difference hash. Bitmap-based methods operate on low-level image data, resulting in low computational overhead for DNA generation.

This study adopts a bitmap-based image DNA method termed Nabla DNA (∇-DNA) [13], which converts image bitmaps into a reverse-pyramid structure. As illustrated in Figure 3, the method normalizes the brightness, color, and resolution of an image bitmap. It then segments the bitmap into quadrants along the diagonal, averaging pixel intensities within triangular regions to create a rotation-neutral vector representation.

Nabla generates DNA by resampling the original image into a square bitmap, where the number of pixels along one axis of the bitmap is defined as the resolution of the DNA. The grayscale bitmap is then normalized according to the number of bits per pixel, which is referred to as the DNA depth. While higher resolution and depth enhance distinguishability, they may reduce the ability to identify transformed images. The configuration of Nabla is written in a notation such as “x8d4”, where “x8” denotes a resolution of 8 × 8 and “d4” indicates a depth of 4.
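
To make the roles of resolution and depth concrete, the following sketch downsamples a grayscale image to a square grid and quantizes each cell to a given number of bits. It is a simplified illustration, not the Nabla algorithm of [13]: the triangular-region averaging is omitted, and the block averaging and min-max normalization are assumptions.

```python
import numpy as np

def bitmap_dna(gray: np.ndarray, resolution: int = 8, depth: int = 4) -> np.ndarray:
    """Reduce a grayscale image to resolution x resolution cells of `depth` bits."""
    h, w = gray.shape
    # Crop so that both sides divide evenly into `resolution` blocks.
    gray = gray[: h - h % resolution, : w - w % resolution].astype(np.float64)
    bh, bw = gray.shape[0] // resolution, gray.shape[1] // resolution
    cells = gray.reshape(resolution, bh, resolution, bw).mean(axis=(1, 3))
    # Stretch to the full range so a uniform brightness shift has no effect.
    lo, hi = cells.min(), cells.max()
    norm = (cells - lo) / (hi - lo) if hi > lo else np.zeros_like(cells)
    levels = (1 << depth) - 1            # depth 4 gives values 0..15
    return np.round(norm * levels).astype(np.uint8).ravel()

# A 64x64 diagonal gradient as a toy image.
gray = np.add.outer(np.arange(64), np.arange(64)).astype(np.float64)
dna_x8d4 = bitmap_dna(gray, resolution=8, depth=4)
dna_shifted = bitmap_dna(gray + 20, resolution=8, depth=4)
print(dna_x8d4.shape, np.array_equal(dna_x8d4, dna_shifted))
```

In this sketch an "x8d4" configuration yields a 64-element vector with 16 levels per element; raising either parameter keeps more detail, which is the trade-off noted above.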

2.2. Image Processing-Based Approaches

Image processing-based methods generate higher-dimensional feature vectors compared to bitmap-based methods, relying on image transformations and computations to extract distinctive features. Traditional computer vision algorithms, such as SIFT (Scale-Invariant Feature Transform), are prominent examples [14]. SIFT extracts key points from images, providing scale- and rotation-invariant features. It generates 128-dimensional feature vectors and can produce hundreds of unique vectors per image, depending on the number of detected key points.

SURF (Speeded-Up Robust Features) [15], an improvement over SIFT, enhances speed and efficiency by using the Hessian matrix for key point detection and reduces feature vector dimensions to 64. This results in a 3 to 10 times faster performance compared to SIFT.

ORB was developed to circumvent the patent restrictions of SIFT and SURF, offering a fast and memory-efficient algorithm. ORB combines FAST (Features from Accelerated Segment Test) [16] and BRIEF (Binary Robust Independent Elementary Features) [17], further optimizing speed and memory usage.

2.3. Deep Learning-Based Approaches

Deep learning-based methods utilize neural networks to automatically learn features from large datasets. Popular image recognition networks such as CNN (Convolutional Neural Network) [18], VGG [19], MobileNet [20], and ResNet [21] are commonly used as models pre-trained on datasets such as ImageNet [22], extracting feature vectors that represent probabilities across 1000 categories.

VGG employs a simple design with 3 × 3 convolutional kernels and “same” padding, utilizing deep architectures with 16–19 layers to enhance representational power. In contrast, MobileNet focuses on lightweight and efficient architectures, though its simplicity may limit its expressiveness compared to high-performance models. ResNet introduces Residual Blocks to address vanishing gradient issues in deep networks, achieving superior performance with fewer computations than VGG.

Autoencoders [23] are unsupervised learning models that compress input data into a low-dimensional latent space and reconstruct them through an encoder–decoder structure. They can generate variable-sized latent vectors by balancing compression efficiency with feature expressiveness. Self-supervised learning approaches such as SimCLR [24] and BYOL [25] learn image similarity and can be adapted for constructing image DNA. However, these methods require a dedicated training process, and incremental learning becomes challenging when filtering target images are continually added [26]. This necessitates the fine-tuning of incremental learning mechanisms for practical deployment.

3. Strategies for Analyzing Image DNA Techniques

3.1. Performance Metrics for Image DNA

Evaluating the performance of image DNA techniques requires well-defined metrics from the perspective of the DNA database. These metrics are essential for understanding the strengths and limitations of different techniques and ensuring their practical applicability in real-world scenarios. They can be categorized into two primary aspects: execution time and detection accuracy.

Execution time metrics are further divided into DNA build time and DNA search time. DNA build time refers to the time required to extract DNA from target images and insert it into the database. DNA search time is the time needed to determine the presence of similar DNA within the database, which includes both DNA extraction and similarity computation. DNA extraction time, common to both the build and search processes, is influenced by the computational complexity of generating image DNA. The size of DNA generated by each technique varies, and larger DNA data typically contain more detailed image features but increase storage costs and negatively impact extraction and search performance.

Detection accuracy is evaluated through two complementary metrics: the true detection rate (TDR) and the false detection rate (FDR). TDR measures the ability to correctly identify derivative images as equivalent to the original, while FDR quantifies the rate of incorrectly identifying distinct images as equivalent. These rates depend on the similarity threshold used to determine equivalence based on DNA similarity. A tighter threshold leads to a simultaneous decrease in both TDR and FDR.
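
The relationship between the threshold and the two rates can be illustrated with a few lines of code. The similarity scores below are made-up values, and treating a pair as "detected" when its similarity is at or above the threshold is an assumption consistent with the cosine-similarity setting used later.

```python
import numpy as np

def detection_rates(sim_equivalent, sim_non_equivalent, threshold):
    """Return (TDR, FDR) for a similarity threshold.

    TDR: fraction of equivalent pairs judged equivalent.
    FDR: fraction of non-equivalent pairs judged equivalent.
    """
    sim_equivalent = np.asarray(sim_equivalent, dtype=float)
    sim_non_equivalent = np.asarray(sim_non_equivalent, dtype=float)
    tdr = float(np.mean(sim_equivalent >= threshold))
    fdr = float(np.mean(sim_non_equivalent >= threshold))
    return tdr, fdr

# Hypothetical DNA similarities for four equivalent and four non-equivalent pairs.
equivalent_sims = [0.99, 0.97, 0.92, 0.80]
non_equivalent_sims = [0.95, 0.70, 0.60, 0.40]

for threshold in (0.5, 0.75, 0.9, 0.98):
    tdr, fdr = detection_rates(equivalent_sims, non_equivalent_sims, threshold)
    print(f"threshold={threshold:.2f}  TDR={tdr:.2f}  FDR={fdr:.2f}")
```

In this toy data, raising the threshold from 0.5 to 0.98 lowers both rates together, which is the trade-off a deployment has to balance.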

Figure 4 illustrates the average detection rates of 17 different image DNA techniques applied to 53 image pairs selected from a Kaggle dataset [27]. In the figure, the blue line represents the “true detection rate” (TDR), which indicates the proportion of correctly identified equivalent images as the threshold τ₁ in Definition 2 varies. Conversely, the dashed red line represents the “false detection rate” (FDR), which measures the proportion of non-equivalent images incorrectly identified as equivalent as the threshold τ₂ varies. Cosine similarity was used as the measure of DNA similarity in this experiment, with the threshold ranging from 0.5 to 1.

A lower threshold improves TDR, approaching 1, but also increases FDR, highlighting the importance of selecting an appropriate threshold. Even at thresholds close to 1, 9 out of 19 non-equivalent test images were misclassified as equivalent, underscoring the frequent occurrence of false detections. These results indicate the importance of fine-tuning τ₁ and τ₂ to optimize filtering performance. However, optimal thresholds depend heavily on image characteristics, DNA techniques, and distance metrics, requiring extensive environment-specific optimization. As such, fine-tuning falls beyond the scope of this study; we instead conduct a comparative analysis using empirically optimized threshold values to ensure a fair evaluation of each method’s effectiveness for the given dataset.

3.2. Image DNA DB

The DNA of target images stored in the DNA database consists of real-valued vector data, making exact matching infeasible and incompatible with traditional indexing techniques. Filtering decisions based on metrics such as Euclidean distance or cosine similarity often require computing the similarity between a query DNA and every DNA stored in the database. While these calculations are independent and can be parallelized to reduce processing time, the computational cost scales linearly with the size of the DNA database.
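
The brute-force comparison described here can be sketched with NumPy. The database contents, its size, and the function name are assumptions for illustration; the point is that one similarity is computed per stored DNA, so the cost grows linearly with the number of rows.

```python
import numpy as np

def cosine_search(db: np.ndarray, query: np.ndarray):
    """Return (index, similarity) of the stored DNA most similar to `query`."""
    db_unit = db / np.linalg.norm(db, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = db_unit @ query_unit          # one dot product per stored DNA
    best = int(np.argmax(sims))
    return best, float(sims[best])

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 64))                     # 1000 hypothetical DNA vectors
query = db[123] + 0.01 * rng.normal(size=64)         # slightly perturbed copy of entry 123
index, similarity = cosine_search(db, query)
print(index, round(similarity, 4))
```

Each row is processed independently, so the matrix product parallelizes well, but doubling the database still doubles the work.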

Recent advances in vector database (VDB) technologies [28] provide a way to significantly reduce DNA similarity computation time. Approximate indexing techniques, commonly used in VDBs, enable probabilistic similarity calculations that address the inefficiencies of brute-force methods. In this study, we utilize FAISS [29], a VDB supporting incremental updates, with the HNSW (Hierarchical Navigable Small World) [30] indexing technique to construct the DNA database.

Image processing-based DNA techniques generate a large and variable number of 128-dimensional feature vectors per image, often reaching hundreds, depending on image complexity. Consequently, their DNA databases are significantly larger than those of other methods and require additional overhead for managing the mapping information for individual vectors. Moreover, the similarity between the DNA must be calculated based on the ratio of matched vectors between the query DNA and stored DNA, rather than simple pairwise comparisons, further complicating the computation process. This ratio is referred to as the Matched Vector Ratio (MVR).
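
One way to compute a matched vector ratio is sketched below. The matching rule is an assumption: a query vector counts as matched when its nearest stored vector lies within `max_dist` in Euclidean distance. The paper does not spell out the rule it uses, so this should be read as one plausible instance rather than the authors' implementation.

```python
import numpy as np

def matched_vector_ratio(query_vecs: np.ndarray, stored_vecs: np.ndarray,
                         max_dist: float) -> float:
    """Fraction of query vectors whose nearest stored vector is within max_dist."""
    # Pairwise Euclidean distances, shape (n_query, n_stored).
    diffs = query_vecs[:, None, :] - stored_vecs[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return float(np.mean(dists.min(axis=1) <= max_dist))

rng = np.random.default_rng(2)
stored = rng.random((200, 128))          # feature vectors of one stored image
# Query image: 150 slightly perturbed stored vectors plus 50 unrelated vectors.
query = np.vstack([
    stored[:150] + 0.001 * rng.normal(size=(150, 128)),
    rng.random((50, 128)),
])
mvr = matched_vector_ratio(query, stored, max_dist=0.1)
print(mvr)
```

Because every query vector is compared with every stored vector, the cost per image pair is quadratic in the number of vectors, which illustrates why these methods carry more search overhead than single-vector DNA.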

For our experiments with Nabla DNA, we configured different resolutions and depths for storage DNA and indexing DNA to evaluate their performance. Specifically, the notation x8d4x16 indicates that vector indexing is performed with a resolution of 8 and a depth of 4, while storage DNA is created with a resolution of 16 and a default depth of 8. Such configurations allow us to analyze the impact of varying DNA parameters on indexing and retrieval performance.
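
The configuration notation can be decoded mechanically. The parser below is a hypothetical helper, not part of the published code; it handles only the three-part form described above and assumes the default storage depth of 8.

```python
import re
from typing import NamedTuple

class NablaConfig(NamedTuple):
    index_resolution: int
    index_depth: int
    storage_resolution: int
    storage_depth: int

# Matches the three-part form, e.g. "x8d4x16".
_PATTERN = re.compile(r"x(\d+)d(\d+)x(\d+)")

def parse_nabla_config(notation: str, default_storage_depth: int = 8) -> NablaConfig:
    """Split a notation such as 'x8d4x16' into indexing and storage parameters."""
    match = _PATTERN.fullmatch(notation)
    if match is None:
        raise ValueError(f"unrecognized Nabla notation: {notation!r}")
    index_res, index_depth, storage_res = (int(g) for g in match.groups())
    return NablaConfig(index_res, index_depth, storage_res, default_storage_depth)

print(parse_nabla_config("x8d4x16"))
```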

3.3. Experimental Data Setup

To compare the performance of image DNA techniques, experiments were conducted using two datasets: a small-scale Kaggle dataset [27] and the COCO dataset (2014 version) [31]. The Kaggle dataset contains 1752 images, excluding exact duplicates, of which 34 image pairs deemed equivalent were used for experimentation. This dataset includes images that appear visually similar but are inherently different, as well as transformed versions of the same image, making it ideal for testing equivalence detection while ensuring sufficient diversity.

The COCO dataset comprises 164,062 images with no equivalent pairs. For equivalence experiments, 20 types of transformations were performed as detailed in Table 2. For performance evaluation, only the large-scale COCO dataset was used, considering that the smaller Kaggle dataset exhibited significant speed fluctuations and that extraction and search time generally scale with dataset size.
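
Equivalent images for such experiments are produced by applying transformations to an original. The sketch below shows three illustrative transformations of the kinds named in the problem definition (resizing, color adjustment, mosaicking) on a grayscale array. These functions and their parameters are assumptions and do not reproduce the 20 transformations of Table 2.

```python
import numpy as np

def adjust_brightness(img: np.ndarray, delta: int) -> np.ndarray:
    """Add `delta` to every pixel, clamping to the 0..255 range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def downscale_half(img: np.ndarray) -> np.ndarray:
    """Halve both dimensions of a grayscale image by averaging 2x2 blocks."""
    h, w = img.shape
    cropped = img[: h - h % 2, : w - w % 2].astype(np.float64)
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)).astype(np.uint8)

def mosaic(img: np.ndarray, top: int, left: int, size: int, block: int) -> np.ndarray:
    """Pixelate a size x size region by replacing each block with its mean."""
    out = img.copy()
    region = out[top:top + size, left:left + size]   # view into `out`
    for y in range(0, size, block):
        for x in range(0, size, block):
            tile = region[y:y + block, x:x + block]
            tile[...] = tile.mean()
    return out

# Toy 64x64 grayscale image.
img = (np.arange(64 * 64).reshape(64, 64) % 256).astype(np.uint8)
variants = {
    "brighter": adjust_brightness(img, 40),
    "half_size": downscale_half(img),
    "mosaicked": mosaic(img, top=8, left=8, size=16, block=8),
}
print({name: v.shape for name, v in variants.items()})
```

Each variant is equivalent to `img` in the sense of Definition 1, so a DNA technique would be expected to place all of them close to the original.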

3.4. Limitations of the Study

Despite the promising results, this study has several limitations that should be addressed. First, while the COCO dataset is widely used in image-based deep learning research, its size and diversity may not be sufficient to fully represent real-world scenarios. As a result, the performance of the evaluated techniques in highly variable or complex environments remains uncertain. Nevertheless, the dataset serves as a reasonable benchmark for performance comparisons, providing a foundational reference for evaluating different approaches.

Additionally, although various editing methods were applied to the original images, the study did not incorporate manually augmented datasets. More extensive manual augmentation could have further enriched the dataset, enhancing the robustness of the evaluation. Moreover, future research could benefit from employing advanced data augmentation techniques to improve dataset diversity and generalization. Finally, while this study primarily focused on autoencoders and classification-based deep learning models pretrained on ImageNet, it did not evaluate models that directly learn transformation methods. Exploring such models could provide further insights into optimizing image DNA-based techniques.

4. Experiment Results

In this section, we conduct a quantitative analysis of three different types of image DNA techniques using various performance metrics through experiments. For the bitmap-based methods, two configurations of the Nabla DNA technique were used: low-resolution Nabla (x8d4x8) and high-resolution Nabla (x8d8x16). The image processing-based methods included SIFT and ORB, while the deep learning-based methods consisted of Autoencoder, MobileNet, and VGG. Table 3 summarizes the image DNA techniques employed in this study. The code and datasets used for the experiments are publicly available on GitHub [13], versioned at commit 0ca06ef.

Figure 5 illustrates a heatmap of DNA similarities for the Kaggle dataset, which comprises 34 equivalent image pairs and 19 non-equivalent image pairs. Cosine similarity was used to measure DNA similarity. For equivalent image pairs, similarities below 0.9 (to −1) are represented with a gradient of red, while for non-equivalent pairs, similarities from 0.5 to 1 are shown in blue. Darker red indicates low DNA similarity for equivalent images, signifying poor true detection rates, while darker blue indicates high DNA similarity for non-equivalent pairs, signifying elevated false detection rates. The closer the colors are to white, the better the performance.

As shown in Figure 5, ORB and Autoencoder (u16) achieved relatively good overall results. However, none of the methods could perfectly detect image equivalence. Many equivalent image pairs exhibited low cosine similarity across all techniques, likely due to the nature of the dataset. The Kaggle equivalent image pairs often included the same object photographed from different angles or heavily cropped images, resulting in low similarity for bitmap-based and image processing-based methods. Deep learning-based DNA techniques, which excel at image classification and similarity tasks, generally demonstrate higher cosine similarity. In particular, the Autoencoder trained on a limited dataset achieves a notably high true detection rate. However, apart from SIFT, all other methods show relatively high DNA similarity for non-equivalent image pairs, indicating a higher likelihood of incorrectly judging image equivalence.

Figure 6 illustrates the time required to construct DNA databases for each technique, using 3000 randomly selected images from the COCO dataset as the filtering target. The database construction time is composed of DNA extraction time and the time to insert the DNA into the VDB. DNA extraction time in our experiments excludes I/O delays, as image files are read from the buffer cache instead of secondary storage [32].

As shown in the figure, the bitmap-based Nabla method outperformed other techniques, achieving speeds that were 5.6 to 30.5 times faster. This superior performance is attributed to its reduced DNA extraction time and smaller DNA size, which collectively contribute to faster processing. When separating DNA extraction and database insertion times, the insertion time for bitmap-based and deep learning-based methods is negligible. In contrast, image processing-based methods incurred significant insertion delays, taking 1.5 to 2 times longer than DNA extraction. This is because image processing-based methods generate hundreds of high-dimensional feature vectors per DNA. To mitigate this limitation, enhancing parallelism for database insertion in image processing-based methods is essential. By contrast, the extraction process is embarrassingly parallel for filtering target images, giving all techniques the potential for performance improvements.

For the Autoencoder method, training time was excluded from the measurement. Including training would add several hours to the overall build time. Even when comparing DNA extraction times alone, the bitmap-based method demonstrated the best performance. Image processing methods and some lightweight deep learning methods (e.g., Autoencoder, MobileNet) also achieved reasonable overheads of less than 100 s. However, the VGG method required nearly 300 s, due to its deep network architecture with numerous layers, which increases computational demand for DNA extraction.

In summary, for environments where database insertion time is critical, bitmap-based and deep learning-based methods impose minimal overhead and are thus more suitable. Conversely, image processing-based methods exhibit considerable overhead during insertion. For scenarios prioritizing DNA extraction speed, bitmap-based and image processing-based methods outperform deep learning techniques.

Notably, the bitmap-based Nabla method required only 3.5 ms per image for DNA extraction, even when additional DNA was generated for vector indexing. Similarly, the image processing-based ORB method achieved an extraction time of approximately 5.9 ms per image. However, when accounting for the large number of feature vectors generated per DNA, the total time, including database insertion, increased to approximately 20 ms.
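
The per-image figures above can be scaled to the 3000-image build as a back-of-the-envelope check. The inputs are the approximate values quoted in the text; the derived totals are rough estimates, not measured results.

```python
N_IMAGES = 3000  # filtering target images used to build the DNA database

# Approximate per-image times quoted in the text, in milliseconds.
NABLA_EXTRACT_MS = 3.5
ORB_EXTRACT_MS = 5.9
ORB_BUILD_MS = 20.0      # extraction plus database insertion

def total_seconds(per_image_ms: float, n_images: int = N_IMAGES) -> float:
    """Scale a per-image time in milliseconds to a total in seconds."""
    return per_image_ms * n_images / 1000.0

nabla_extract_s = total_seconds(NABLA_EXTRACT_MS)
orb_extract_s = total_seconds(ORB_EXTRACT_MS)
orb_build_s = total_seconds(ORB_BUILD_MS)
orb_insert_ms = ORB_BUILD_MS - ORB_EXTRACT_MS   # per-image insertion share

print(f"Nabla extraction: {nabla_extract_s:.1f} s")
print(f"ORB extraction:   {orb_extract_s:.1f} s")
print(f"ORB build:        {orb_build_s:.1f} s")
print(f"ORB insertion:    {orb_insert_ms:.1f} ms per image")
```

Under these figures, most of ORB's build time goes to inserting its many feature vectors rather than to extraction.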

The DNA search time consists of both DNA extraction time and database query time. Figure 7 illustrates the total search time for 161,062 images from the COCO dataset, excluding the 3000 filtering target images used to construct the DNA database. As shown in the figure, the search time trends are consistent with those observed during DNA database construction. Except for image processing-based methods, the database query time is negligible compared to the DNA extraction time.

The bitmap-based Nabla method achieved a search time of approximately 3.5 ms per image, making it 3 to 27 times faster than other techniques. This superior performance stems from the crucial role of DNA extraction time in search efficiency. Since Nabla significantly reduces DNA extraction time, it directly accelerates the overall search process. For image processing-based methods such as SIFT and ORB, the database query time amounted to 60% and 83%, respectively, of the DNA extraction time. This is because these methods generate hundreds of high-dimensional feature vectors per DNA, increasing the computational cost of database queries.

Deep learning-based methods exhibited higher DNA extraction times, with VGG being the slowest due to its deep network architecture. Notably, the size of the DNA itself, regardless of whether it is derived from deep learning- or bitmap-based methods, had minimal impact on the extraction speed.

To quantitatively compare the detection accuracy of each technique, the detection rates were evaluated using 20 transformed images from the COCO dataset. Figure 8 plots the true detection rate (TDR) and false detection rate (FDR) for the bitmap-based Nabla method as a function of the cosine similarity threshold between DNA vectors. The results show that increasing the threshold reduces both the TDR and FDR.

For the bitmap-based method, certain equivalent images fail to be detected even at thresholds close to zero, indicating that its ability to distinguish image features is weaker compared to image processing-based methods. However, the TDR decreases more gradually compared to other techniques, resulting in a broader effective threshold range. Notably, when the DNA resolution is high, the gap between the TDR and FDR widens, making it easier to determine an optimal threshold.

Figure 9 shows the trends in the true detection rate (TDR) and false detection rate (FDR) for the image processing-based methods SIFT and ORB. Both methods used the Matched Vector Ratio (MVR) to determine thresholds, and the results indicate that the FDR remained exceptionally low across all threshold values. This is because image processing-based methods generate hundreds of feature vectors per DNA, so the likelihood that feature vectors from a non-equivalent DNA match those of a transformed DNA is extremely low. However, compared to bitmap-based methods, the TDR for image processing-based methods decreases more sharply as the threshold increases, making it relatively more challenging to determine an optimal threshold for deployment.
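The idea of thresholding on a matched-vector ratio can be illustrated with a small sketch. It assumes that MVR is the fraction of a query image's feature vectors that find a sufficiently close match among the feature vectors of a stored DNA; the paper's exact definition may differ. Descriptors are modelled here as small integers compared by Hamming distance, in the style of ORB's binary descriptors (SIFT descriptors would instead be compared by Euclidean distance). All names and values are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def matched_vector_ratio(query_descriptors, stored_descriptors, max_distance):
    """Fraction of query descriptors with at least one stored descriptor
    within max_distance (assumed definition of MVR)."""
    if not query_descriptors:
        return 0.0
    matched = sum(
        1
        for q in query_descriptors
        if any(hamming(q, s) <= max_distance for s in stored_descriptors)
    )
    return matched / len(query_descriptors)


def is_equivalent(query_descriptors, stored_descriptors, max_distance, mvr_threshold):
    """Report a detection when the MVR reaches the threshold."""
    mvr = matched_vector_ratio(query_descriptors, stored_descriptors, max_distance)
    return mvr >= mvr_threshold


# Hypothetical 8-bit descriptors (real ORB descriptors are 256 bits).
stored = [0b11110000, 0b00001111, 0b10101010]
transformed_copy = [0b11110001, 0b00001111, 0b01010101, 0b10101000]
unrelated = [0b01010101, 0b00110011]

print(matched_vector_ratio(transformed_copy, stored, max_distance=1))
print(matched_vector_ratio(unrelated, stored, max_distance=1))
print(is_equivalent(transformed_copy, stored, max_distance=1, mvr_threshold=0.5))
```

Because an unrelated image has to match many independent descriptors by chance, its MVR tends to stay near zero, which is consistent with the low FDR reported for SIFT and ORB.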

Figure 10 depicts the TDR and FDR for deep learning-based methods. The thresholds on the x-axis are based on L2 distances. Among the methods, Autoencoder performed well in detecting the 20 transformed images it was specifically trained on. However, overall TDR was lower than that of other approaches, and the lightweight MobileNet model exhibited a higher FDR than TDR, making it impractical for real-world use. Although VGG achieved higher detection rates compared to MobileNet, its performance was inadequate for practical equivalence detection. This limitation is attributed to its dependence on class-probability vectors, which are specifically designed for image classification tasks. These results indicate that deep learning models have significant room for improvement in addressing the image equivalence detection problem, particularly for untrained arbitrary images, by shifting the focus from similarity detection to equivalence detection.
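To make the role of class-probability vectors concrete, the following sketch reproduces the reduction described in Table 3, where a 1000-dimensional output vector is averaged in groups of 20 to obtain a 50-dimensional DNA, and compares DNAs by L2 distance. It is an illustration only: the one-hot probability vectors are hypothetical stand-ins for real MobileNet or VGG16 outputs, and the function names are not from the authors' code.

```python
import math


def reduce_by_group_mean(vector, group_size=20):
    """Average consecutive groups of group_size entries (1000 -> 50 dimensions)."""
    if len(vector) % group_size != 0:
        raise ValueError("vector length must be a multiple of group_size")
    return [
        sum(vector[i:i + group_size]) / group_size
        for i in range(0, len(vector), group_size)
    ]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical class-probability vectors that put all mass on
# different classes (7 and 12) that fall into the same group of 20.
probs_a = [0.0] * 1000
probs_a[7] = 1.0
probs_b = [0.0] * 1000
probs_b[12] = 1.0

u50_a = reduce_by_group_mean(probs_a)
u50_b = reduce_by_group_mean(probs_b)

print(len(u50_a))                      # dimensionality of the reduced DNA
print(l2_distance(probs_a, probs_b))   # distance between the u1000 DNAs
print(l2_distance(u50_a, u50_b))       # distance between the u50 DNAs
```

In this contrived case the two u1000 vectors are clearly separated while their u50 reductions coincide, which hints at how a DNA built from classification outputs can discard the information needed to tell distinct images apart.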

The poor performance of deep learning approaches can largely be attributed to the architecture itself. While current deep learning architectures demonstrate strong performance in terms of “image similarity”, they are less effective in determining “image equivalence”. If a model were trained on a sufficiently diverse dataset of entirely distinct images, it could establish an appropriate threshold while maintaining a low false negative rate. However, a model trained on a limited dataset may perform well in image classification problems yet remain less effective at determining whether an image is a derivative of another. Figure 10 clearly illustrates that a well-trained Autoencoder, which learns a compact representation of the data, achieves a higher detection rate than classification-based models such as MobileNet or VGG.

Figure 11 illustrates the maximum achievable TDR for each method across the 20 types of image transformations in Table 2. As shown in the figure, image processing-based methods achieved the highest detection rates for all image cases we considered. In contrast, low-resolution Nabla (x4d4x8) and the lightweight MobileNet network achieved TDRs below 20%, making them unsuitable for filtering systems. However, high-resolution Nabla (x8d8x16) achieved a TDR of 80%, demonstrating the potential of bitmap-based methods despite their simplicity.

Figure 12 compares the minimum FDR across 161,062 non-equivalent images. As shown, the image processing-based ORB method exhibited the lowest FDR among all techniques. The extraction of hundreds of 128-dimensional feature vectors in image processing-based methods minimizes the likelihood of non-equivalent DNA vectors matching, resulting in consistently low FDRs. In contrast, FDRs exceeding 2–3% impose substantial costs on system operators due to the manual effort required to manage false positives, making deep learning-based methods less suitable for practical deployment. These results suggest that deep learning techniques, while effective for image classification and similarity detection, are less robust for image equivalence detection.

Given the trade-off between the TDR and FDR, Figure 13 evaluates the FDR for each method when the TDR is fixed at 80%. Only four methods achieved a TDR of 80%, and their results are presented. Despite being a bitmap-based method, Nabla (x8d8x16) achieved an acceptable FDR at minimal cost. While Autoencoder demonstrated a low FDR for the limited dataset, it showed a higher tendency for false detections when handling untrained images, resulting in higher FDRs compared to other methods.
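The comparison at a fixed TDR can be sketched as a simple threshold search. The example assumes a similarity score where higher means more similar, picks the strictest threshold at which the TDR still reaches the target, and reports the FDR at that threshold. A score of `None` is a hypothetical way to model an equivalent image for which no candidate is retrieved at any threshold, which is one way a method could fail to reach 80% at all. The scores and names are illustrative and not from the experiments.

```python
def rate_at(scores, threshold):
    """Fraction of scores counted as detections at this threshold.
    A score of None models an image for which no candidate was retrieved."""
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s is not None and s >= threshold)
    return hits / len(scores)


def fdr_at_target_tdr(equivalent_scores, non_equivalent_scores, target_tdr=0.8):
    """Return (threshold, FDR) for the strictest threshold whose TDR meets
    the target, or None if the target TDR cannot be reached."""
    candidates = sorted(
        {s for s in equivalent_scores if s is not None}, reverse=True
    )
    for threshold in candidates:
        if rate_at(equivalent_scores, threshold) >= target_tdr:
            return threshold, rate_at(non_equivalent_scores, threshold)
    return None


# Hypothetical similarity scores.
equivalent_scores = [0.95, 0.9, 0.85, 0.7, 0.4]
non_equivalent_scores = [0.1, 0.2, 0.3, 0.75, 0.5, 0.05, 0.15, 0.25, 0.35, 0.45]

print(fdr_at_target_tdr(equivalent_scores, non_equivalent_scores))
print(fdr_at_target_tdr([0.9, None, None], non_equivalent_scores))
```

Fixing the TDR in this way turns the two-dimensional trade-off into a single number per method, which is what makes the methods directly comparable.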

5. Related Works

Image copy detection (ICD) is a well-known research area focused on detecting transformed versions of images. One of the earliest approaches to image copy detection involves perceptual hashing. Techniques such as Average Hashing (aHash) [33], Difference Hashing (dHash) [34], and the more robust Perceptual Hashing (pHash) [35] generate compact representations of images, enabling efficient comparison and duplicate detection. Microsoft PhotoDNA [36] is another significant development in this field, designed to detect and identify altered images through robust hashing techniques. These methods are computationally efficient but may struggle with highly transformed copies, such as those subjected to geometric distortions or heavy compression.
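For readers unfamiliar with perceptual hashing, the sketch below shows the core idea of average hashing on an already-downscaled grayscale grid: each pixel contributes one bit depending on whether it is brighter than the mean, and two hashes are compared by Hamming distance. It is a simplified illustration that omits resizing and grayscale conversion, uses a hypothetical 4×4 image, and is not the implementation from the cited works.

```python
def average_hash(gray):
    """Hash a small grayscale grid: one bit per pixel, set when the pixel
    is brighter than the mean of all pixels."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")


# Hypothetical 4x4 image: dark left half, bright right half.
original = [[10, 10, 200, 200] for _ in range(4)]
brighter = [[p + 30 for p in row] for row in original]      # uniform brightening
rotated = [list(col) for col in zip(*original[::-1])]       # 90-degree rotation

h_original = average_hash(original)
print(hamming_distance(h_original, average_hash(brighter)))
print(hamming_distance(h_original, average_hash(rotated)))
```

The uniformly brightened copy keeps the same hash, whereas the rotated copy differs in half of the bits, which illustrates why such hashes are efficient but fragile under geometric transformations.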

More recent research has leveraged deep learning to improve robustness against complex transformations. Convolutional Neural Networks (CNNs) [18] have been widely used to learn high-level feature representations that remain invariant to various image manipulations. Models such as ResNet [21] and VGG [19] have been fine-tuned on large-scale datasets to enhance their discriminative power in detecting copied images. Siamese and triplet networks have also been employed to learn similarity metrics for matching transformed versions of the same image.

Hybrid approaches combining traditional and deep learning methods have also been explored to achieve better accuracy and robustness. Some studies [37,38] integrate handcrafted features with CNN-extracted embeddings, leveraging both low-level texture patterns and deep feature representations to enhance detection performance. Others [39,40] utilize transformer-based architectures, capturing spatial and geometric relationships between image regions through self-attention mechanisms, enabling more effective copy detection.

While deep learning-based ICD methods perform well in controlled laboratory environments, they often struggle in large-scale production settings. DISC2021 [41] introduced a large-scale dataset comprising one million images to better reflect real-world workloads. This dataset simulates query conditions by including numerous distractor images that are not equivalent to the target images. However, it does not sufficiently address hard negative cases, which are crucial for distinguishing near-duplicates from true copies. To overcome this limitation, NDEC [42] has focused on developing benchmark datasets that explicitly incorporate hard negative cases, enabling more robust evaluation and adaptation to real-world scenarios.

6. Conclusions

Filtering systems are essential in large-scale content service platforms to prevent the dissemination of unauthorized media data. However, given the ease with which digital content can be manipulated, such systems are rendered ineffective if they cannot detect derivatives of unauthorized content. This study conducted a comprehensive quantitative performance evaluation of various DNA techniques designed to identify manipulated and derivative image data. To this end, four key performance metrics were defined from the perspective of image equivalence: database build time, database search time, true detection rate (TDR), and false detection rate (FDR). The analysis categorized image DNA techniques into three groups—bitmap-based, image processing-based, and deep learning-based—and evaluated them using extensive experiments.

The results highlight the distinct advantages and limitations of each approach. Bitmap-based techniques, with minimal computational requirements, demonstrated detection rates suitable for practical deployment, along with the fastest extraction and search speeds. Deep learning-based techniques, such as Autoencoder and VGG, excel in identifying similar images and image classification tasks but underperform in detecting image equivalence, with additional drawbacks in execution speed. For scenarios demanding extremely high detection rates and sufficient computational resources, the image processing-based ORB method emerged as the most effective choice. The findings of this study are expected to offer valuable insights for designing efficient and reliable filtering systems for large-scale content platforms.

This study evaluated three categories of image DNA techniques for detecting image transformations. Future research could explore an additional category based on image quality metrics such as SSIM (Structural Similarity Index Measure) and PSNR (Peak Signal-to-Noise Ratio), enabling a comparative analysis of the evaluated methods and quality-based metrics, particularly in low-computational environments. Furthermore, validating our experimental results using widely accepted benchmarks [41,42] would enhance the reliability of our findings and provide a standardized comparison with existing approaches in the field. Another direction for future research is to explore the feasibility of integrating fuzzy logic into image DNA-based filtering techniques and to compare its performance with the current threshold-based method.

Author Contributions

K.C. implemented the architecture and algorithm and performed the experiments. H.B. designed the work and provided expertise. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00247727) and in part by RP-Grant 2023 of Ewha Womans University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available on GitHub at https://github.com/cezanne/nabla-dna (accessed on 30 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DNA	Deoxyribonucleic Acid
DRM	Digital Rights Management
SURF	Speeded-Up Robust Features
ORB	Oriented FAST and Rotated BRIEF
CNN	Convolutional Neural Network
TDR	True Detection Rate
FDR	False Detection Rate
VDB	Vector Database
MVR	Matched Vector Ratio

References

1. Available online: https://www.omnicoreagency.com/instagram-statistics/ (accessed on 30 December 2024).
2. Available online: https://bloggingwizard.com/pinterest-statistics/ (accessed on 30 December 2024).
3. Rahman, Z. Enforcing Copyright on Online Streaming Platforms: Challenges Faced by Rights Holders in the Digital Era. Int. J. Multidiscip. Res. 2023, 5, 1–14.
4. Jun, W.-C. A study on the analysis of and educational solution for digital sex crimes in Korea. Int. J. Environ. Res. Public Health 2023, 20, 2450.
5. Okolie, C. Artificial intelligence-altered videos (deepfakes), image-based sexual abuse, and data privacy concerns. J. Int. Women’s Stud. 2023, 25, 11.
6. Yousaf, K.; Tabassam, N. A deep learning-based approach for inappropriate content detection and classification of youtube videos. IEEE Access 2022, 10, 16283–16298.
7. Kundur, D.; Kannan, K. Video fingerprinting and encryption principles for digital rights management. Proc. IEEE 2004, 92, 918–932.
8. Law-To, J.; Chen, L.; Joly, A.; Laptev, I.; Buisson, O.; Gouet-Brunet, V.; Boujemaa, N.; Stentiford, F. Video copy detection: A comparative study. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, Amsterdam, The Netherlands, 9–11 July 2007.
9. Coyle, K. The Technology of Rights: Digital Rights Management. 2003. Available online: http://www.kcoyle.net/drm_basics.pdf (accessed on 30 December 2024).
10. Memon, N.; Wong, P.W. Protecting digital media content. Commun. ACM 1998, 41, 35–43.
11. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
12. Park, S.; Bahn, H. Trace-based Performance Analysis for Deep Learning in Edge Container Environments. In Proceedings of the 8th IEEE International Conference on Fog and Mobile Edge Computing (FMEC), Tartu, Estonia, 18–20 September 2023; pp. 87–92.
13. Available online: https://github.com/cezanne/nabla-dna (accessed on 2 December 2024).
14. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the 7th IEEE International Conference on Computer Vision, Corfu, Greece, 20–25 September 1999.
15. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
16. Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006.
17. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. Brief: Binary robust independent elementary features. In Proceedings of the 11th European Conference on Computer Vision (ECCV), Heraklion, Crete, Greece, 5–11 September 2010.
18. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
19. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
20. Howard, A.G. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
22. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
23. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
24. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning; PMLR: Westminster, UK, 2020.
25. Grill, J.-B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; et al. Bootstrap your own latent-a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284.
26. Lee, J.; Bahn, H. Analyzing Data Access Characteristics of Deep Learning Workloads and Implications. In Proceedings of the 3rd IEEE International Conference on Electronic Information Engineering and Computer Science, Changchun, China, 22–24 September 2023; pp. 546–551.
27. Available online: https://www.kaggle.com/datasets/pavansanagapati/images-dataset (accessed on 2 December 2024).
28. Han, Y.; Liu, C.; Wang, P. A comprehensive survey on vector database: Storage and retrieval technique, challenge. arXiv 2023, arXiv:2310.11703.
29. Available online: https://github.com/facebookresearch/faiss (accessed on 30 December 2024).
30. Malkov, Y.A.; Yashunin, D.A. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. arXiv 2016, arXiv:1603.09320.
31. Available online: https://cocodataset.org/ (accessed on 2 December 2024).
32. Lee, J.; Bahn, H. File Access Characteristics of Deep Learning Workloads and Cache-Friendly Data Management. In Proceedings of the 10th IEEE International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Palembang, Indonesia, 20–21 September 2023; pp. 328–331.
33. Haviana, S.F.C.; Kurniadi, D. Average hashing for perceptual image similarity in mobile phone application. J. Telemat. Inform. 2016, 4, 12–18.
34. Krawetz, N. Kind of Like That. 2013. Available online: https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html (accessed on 2 December 2024).
35. Wong, H.C.; Bern, M.; Goldberg, D. An image signature for any kind of image. In Proceedings of the IEEE International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002.
36. Steinebach, M. An analysis of photodna. In Proceedings of the 18th International Conference on Availability, Reliability and Security, Benevento, Italy, 29 August–1 September 2023.
37. Nguyen, D.T.; Pham, T.D.; Baek, N.R.; Park, K.R. Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors. Sensors 2018, 18, 699.
38. Hamdi, M.; Senan, E.M.; Jadhav, M.E.; Olayah, F.; Awaji, B.; Alalayah, K.M. Hybrid Models Based on Fusion Features of a CNN and Handcrafted Features for Accurate Histopathological Image Analysis for Diagnosing Malignant Lymphomas. Diagnostics 2023, 13, 2258.
39. Lee, J.S.; Hsu, W.; Lee, M.L. An End-to-End Vision Transformer Approach for Image Copy Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024.
40. Li, X.; Ding, H.; Yuan, H.; Zhang, W.; Pang, J.; Cheng, G.; Chen, K.; Liu, Z.; Loy, C.C. Transformer-based visual segmentation: A survey. In Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Washington, DC, USA, 1 January 2024.
41. Pizzi, E.; Kordopatis-Zilos, G.; Patel, H.; Postelnicu, G.; Ravindra, S.N.; Gupta, A.; Papadopoulos, S.; Tolias, G.; Douze, M. The 2021 video similarity dataset and challenge. arXiv 2021, arXiv:2106.09672.
42. Wang, W.; Sun, Y.; Yang, Y. A benchmark and asymmetrical-similarity learning for practical image copy detection. Proc. AAAI Conf. Artif. Intell. 2023, 37, 2672–2679.

Figure 1. Unlawful media filtering by a media service provider.

Figure 2. Unlawful image filtering using image DNA.

Figure 3. The process of extracting Nabla DNA.

Figure 4. True and false detection rates on image DNA similarity.

Figure 5. DNA similarity heatmap for equivalent and non-equivalent image pairs. Cell colors transition from red (−1.0) to white (0.9) for equivalent pairs and from blue (1.0) to white (0.5) for non-equivalent pairs, representing increasing and decreasing DNA similarity, respectively. Cells exceeding 0.9 in red or below 0.5 in blue do not display gradients.

Figure 6. Comparison of DNA DB build time.

Figure 7. Comparison of DNA DB search time.

Figure 8. Detection rates of bitmap-based Nabla methods.

Figure 9. Detection rates of image processing-based methods.

Figure 10. Detection rates of deep learning-based methods.

Figure 11. Maximum true detection rates for 20 types of COCO-transformed images.

Figure 12. Minimum false detection rates for COCO 161,062 images.

Figure 13. False detection rates in cases satisfying an 80% true detection rate.

Table 1. Comparative analysis of image DNA features in three approaches.

Type	Image Characteristics Used for DNA	Feature Extraction Method
Bitmap	Pixel intensity, color values	Direct pixel mapping
Image Processing	Edges, texture, frequency components	Filters, histograms, transforms
Deep Learning	High-level patterns, learned representations	CNN feature maps

Table 2. The 20 transformations of COCO images in our experiments.

No	Image Name	Transformation Details
1	COCO_val2014_000000524333_modified	Reduce image size
2	COCO_val2014_000000393267_modified	Adjust image colors
3	COCO_val2014_000000393243_modified	Add partial mosaic
4	COCO_val2014_000000393226_modified	Crop image border
5	COCO_val2014_000000262175_modified	Add caption to image
6	COCO_val2014_000000262161_modified	Rotate 90 degrees
7	COCO_val2014_000000262148_modified	Stretch horizontally
8	COCO_val2014_000000131115_modified	Enhance image color
9	COCO_val2014_000000131108_modified	Rotate slightly and crop
10	COCO_val2014_000000131089_modified	Change image texture
11	COCO_val2014_000000262197_modified	Enlarge image size
12	COCO_val2014_000000393271_modified	Adjust color levels
13	COCO_val2014_000000262200_modified	Obscure face with mosaic
14	COCO_val2014_000000131138_modified	Crop image border
15	COCO_val2014_000000393284_modified	Add caption to image
16	COCO_val2014_000000000074_modified	Convert to black and white
17	COCO_val2014_000000131152_modified	Rotate −90 degrees
18	COCO_val2014_000000524373_modified	Stretch vertically
19	COCO_val2014_000000262229_modified	Blur face
20	COCO_val2014_000000262235_modified	Overlay with pen strokes

Table 3. Image DNA techniques used in the experiments.

Type	DNA Technique	Experiment Settings
Bitmap-based	Nabla	Configured with various resolutions and depths ranging from x4d4x8 to x8d8x16
Image Processing-based	SIFT	Using the SIFT class within the Python3 OpenCV package
Image Processing-based	ORB	Using the ORB class within the Python3 OpenCV package
Deep Learning-based	Autoencoder	The encoder consists of two Conv2D layers in Keras, and the decoder consists of two Conv2DTranspose layers. The latent vectors are configured with three settings of 4, 8, and 16 dimensions, denoted as Autoencoder u4, u8, and u16.
Deep Learning-based	MobileNet	Utilize MobileNet from Keras, using the output image classification probability vector as the DNA. Conduct experiments with two types of DNA: a 1000-dimensional output vector (denoted as u1000) and a 50-dimensional vector (u50), obtained by averaging the outputs in groups of 20.
Deep Learning-based	VGG	Utilize VGG16 from Keras, using the output image classification probability vector as the DNA. Conduct experiments with two types of DNA: a 1000-dimensional output vector (denoted as u1000) and a 50-dimensional vector (u50), obtained by averaging the outputs in groups of 20.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cho, K.; Bahn, H. Evaluating Image DNA Techniques for Filtering Unauthorized Content in Large-Scale Social Platforms. Appl. Sci. 2025, 15, 4539. https://doi.org/10.3390/app15084539

