Article

Leveraging Network Analysis and NLP for Intelligent Data Mining of Taxonomies and Folksonomies of PornHub

by Jan Sawicki 1,*, Loizos Bitsikokos 2, Yulia Belinskaya 3, Maria Ganzha 1 and Marcin Paprzycki 4

1 Faculty of Mathematics and Information Science, Warsaw University of Technology, Plac Politechniki 1, 00-661 Warszawa, Poland
2 Brian Lamb School of Communication, Purdue University, 610 Purdue Mall, West Lafayette, IN 47907, USA
3 Media and Digital Technologies, St. Pölten University of Applied Sciences, Campus-Platz 1, 3100 St. Pölten, Austria
4 Polish Academy of Sciences, Pałac Kultury i Nauki pl. Defilad 1, 00-901 Warszawa, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9250; https://doi.org/10.3390/app15179250
Submission received: 10 July 2025 / Revised: 4 August 2025 / Accepted: 14 August 2025 / Published: 22 August 2025

Abstract

This study explores graph-based methods to model and analyze the semantic interplay between editorial taxonomies and user-generated folksonomies on the PornHub platform, using a dataset of over 97,000 videos (2015–2024). We construct and examine a graph of user-assigned tags and platform-defined categories, applying the Leiden community detection algorithm to uncover latent semantic groupings. To enrich the graph structure, we embed textual metadata using state-of-the-art language models (Qwen3-Embedding-4B and all-MiniLM-L6-v2), enabling the integration of natural language processing within graph-based learning. Our analysis reveals that folksonomies partially align with taxonomies through synonymous structures but also diverge by capturing nuanced attributes such as body features and aesthetic styles. These asymmetries highlight how folksonomies introduce higher-resolution semantic layers absent from fixed-category systems. By fusing graph mining, NLP-driven embeddings, and network-based clustering, this work contributes a hybrid methodology for semantic knowledge extraction in large-scale, user-generated content. It offers implications for graph-based recommendation, content moderation, and metadata enrichment—demonstrating the utility of graph-centric AI techniques in real-world multimedia data settings.

1. Introduction

The rapid expansion of user-generated content on online platforms has intensified the need for effective content categorization and retrieval systems. In this context, adult video websites, such as PornHub, present a particularly rich and complex domain for studying classification frameworks due to the coexistence of platform-imposed categories and a vast array of user-defined tags. While the fixed-category taxonomy reflects the website’s editorial or organizational decisions, user-generated tags represent an emergent folksonomy shaped by viewers and contributors, potentially revealing alternative or more nuanced semantic structures.
Previous research has extensively investigated internet content classification using traditional methods [1] as well as recent advances in machine learning and natural language processing (NLP) [2,3,4]. Studies focusing specifically on PornHub have highlighted biases and limitations in its category and recommendation systems using vector space models and clustering techniques [5]. However, these efforts have predominantly concentrated on the platform’s fixed-category taxonomy, overlooking the rich semantic interplay with user-defined tags, which may offer complementary or divergent organizational insights.
The tension between imposed taxonomies and emergent folksonomies is well documented in the literature [6,7,8], with hybrid approaches often proposed to enhance retrieval performance and better reflect user perspectives. Given PornHub’s status as one of the most visited websites globally [9], it provides an ideal case study to explore this interplay in a large-scale, real-world setting.
The primary goal of this paper is to characterize the relationship between PornHub’s official categories and user-defined tags by leveraging advanced graph network analysis and state-of-the-art NLP embedding models. Specifically, we investigate whether the imposed categories overlap or align with emergent tagging patterns and explore how semantic similarity can bridge these two classification systems. Our approach employs network-based community detection via the Leiden algorithm [10] and multiple embedding models, including Qwen3-Embedding-4B [11] and all-MiniLM-L6-v2 [12], complemented by the instruction-tuned large language model Mistral-7B-Instruct-v0.2 [13] for semantic evaluation.
This study is grounded in information technology and data science—specifically graph networks and natural language processing—but it also contributes to interdisciplinary areas such as sociology and online content analysis. By addressing the hybrid nature of content classification on adult video platforms, this work contributes to the broader field of intelligent data analysis, particularly in the application of machine learning and NLP techniques for social network data mining and content understanding. The findings provide insights into how folksonomies and taxonomies coexist and interact in complex digital environments, with implications for recommendation systems, content moderation, and the design of user-centric classification frameworks.
The rest of this work is organized as follows. Section 2 presents the state of the art. Section 3 describes the methods used in this work. Section 4 presents the results. Section 5 discusses the results. Section 6 concludes the work.
  • Trigger Warning: This article contains discussion and analysis of explicit adult content from a publicly accessible website, which may be disturbing or sensitive to some readers.

2. Related Works

The challenge of categorizing internet content has been approached through various methods. This section highlights the most pertinent examples from the existing literature.
The approach proposed in one of the earlier works [1] addresses two primary challenges in web-based information retrieval: information overload and vocabulary mismatch. Traditional search mechanisms, which rely heavily on keyword matching (as in Lycos or Yahoo) or hypertext navigation (as in Mosaic or Netscape), struggle to accommodate the vast and semantically diverse content of the World Wide Web. The methodology introduces a concept-based categorization and search framework that leverages machine learning, specifically a multilayered neural network clustering algorithm based on the Kohonen Self-Organizing Feature Map (SOM). The core of the approach is the automatic textual analysis of internet homepages, which are then categorized by content into hierarchical, subject-specific clusters. These category hierarchies form a structured foundation for building concept spaces (semantic groupings of related documents), which can be employed to enhance associative information retrieval. The process operates in two stages: the first constructs the classification structure through self-organization, while the second uses these structures to support concept-based browsing and keyword search refinement. Initial testing of this multilayered SOM (M-SOM) approach demonstrated its capacity to identify meaningful patterns within small datasets, such as electronic brainstorming records and entertainment-related web pages, suggesting potential scalability to broader applications. The system offers a data-driven mechanism to dynamically organize unstructured web content, providing a basis for more intuitive and semantically aware search experiences.
More recent works apply deep learning [2,3] and large language models (LLMs) [4] to automatically categorize the content.
A particularly thematically close work [5] concerns the analysis of PornHub’s content categorization and recommendation systems, as well as preliminary results on its algorithmic bias. This work is part of a broader line of research demonstrating that the categorization of pornographic content can be meaningfully analyzed using quantitative or computational methods [14,15,16]. In particular, the authors use a large-scale corpus of video categories scraped directly from PornHub’s user interface from September 2023 to March 2024, in combination with a publicly available dataset of PornHub video metadata from 2019. Each video is treated as a document in the corpus and is represented by a list of category tokens, resembling a sentence composed of “words”. Subsequently, a Word2vec model [17] is trained, resulting in a 100-dimensional vector space of categories that preserves structural relationships between them. The space is reduced to lower dimensions using Principal Component Analysis (PCA), and categories are grouped into clusters using K-means clustering. The clusters are named after the categories that belong to them. Results indicate that the category meaning space of PornHub largely revolves around mainstream pornographic content, while also revealing certain patterns of bias within the categorization system itself. Categories regarding lesbian pornography are found to be spatially dislocated from the rest of the corpus in the embedding space. In addition, categories around race/ethnicity and gender are distributed in biased ways: categories signifying whiteness are related to female bodies, and male body parts are connected to non-white races/ethnicities. The work also demonstrates, as a proof of concept, how an understanding of the content categorization can be used in auditing studies of PornHub’s recommendation system, by projecting patterns of videos recommended to different types of users (e.g., straight male vs. straight female) onto the category embedding space.
However, the work deals only with categories and leaves the analysis of user-generated tags for future research.
The work in [5] thus returns to the problem stated previously [6] concerning the interplay between a website-imposed categorization system (taxonomy) and a user-created, user-driven categorization system (folksonomy).
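The clustering step of the pipeline used in [5] can be illustrated with a minimal pure-Python version of Lloyd's K-means algorithm. This is a sketch under stated assumptions, not the code used in [5]: the 2-D toy points are hypothetical stand-ins for PCA-reduced category vectors, the deterministic initialization with the first k points is chosen only for reproducibility (practical implementations typically use randomized seeding such as k-means++), and the Word2vec and PCA steps are omitted.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Assumption: deterministic initialization with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups of "category vectors".
labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
```

On this toy input, the two nearby pairs end up in separate clusters, with centroids at their respective means.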
Recent research has focused extensively on the comparison and integration of user-generated tags (folksonomies) with traditional structured taxonomies for enhanced content classification and retrieval. The work in [7] investigated the relationship between user-generated tags and traditional subject headings in library systems, aiming to evaluate how folksonomies can complement controlled vocabularies. Their study highlights that while folksonomies provide flexible and user-centric metadata, they often suffer from inconsistencies and lack the precision of formal subject headings, suggesting that a hybrid approach could improve subject access [7].
Similarly, a case study [8] in marine science explores the overlap between user-generated tags and formal metadata elements such as subject descriptors and author keywords. The study finds that folksonomies offer alternative access points that better reflect users’ perspectives, thus enhancing information retrieval, particularly in specialized domains [8]. This reflects a growing recognition of the value of user contributions in complementing expert-generated metadata.
In the cultural heritage domain, a hybrid model integrating folksonomy with practical taxonomy was proposed for the Virtual Museum of the Pacific [18]. The work demonstrates how user-generated tags can be systematically incorporated into formal classification frameworks to enrich digital collections and maintain dynamic vocabularies that evolve with community input. Their method enhances both the relevance and accessibility of digital cultural assets [18].
Further advancing the integration efforts, a 2022 study [19] applies machine learning techniques to analyze and merge taxonomy and folksonomy terms within hybrid subject devices. By leveraging semantic relationships identified through machine learning, this approach facilitates improved organization and retrieval of digital resources, showcasing the potential of AI-driven methods to reconcile structured and unstructured metadata [19].
Collectively, these works emphasize the importance of combining folksonomies with traditional taxonomies, addressing challenges such as semantic inconsistency and enhancing retrieval performance by employing hybrid frameworks and machine learning tools. This integrative direction points toward more user-adaptive and semantically rich metadata systems in digital information environments.
One particular subdomain of internet content is pornography, whose leading service is PornHub. According to recent reports, PornHub ranks among the most visited websites globally, receiving over 2.5 billion visits per month and hosting millions of user-uploaded videos spanning a wide range of adult content categories [9].
The platform’s user base is demographically and geographically diverse, with a significant portion of traffic originating from North America and Europe, and its data trends have been used in studies examining internet behavior, sexual content consumption patterns, and technology use [20].
The platform has been widely analyzed from various perspectives, such as the influence of COVID-19 on users’ connections and user-base solidarity [21], sexual identification [14], and traffic analysis through HTTP traces [22].
In their work, the authors of [21] also address the issue of PornHub and its platformization [23] and folksonomy. According to PornHub’s yearly published statistics, there are notable peaks in certain search terms that correlate with various cultural, political, and societal events. For example, in 2019, the search term “Joker” peaked following the release of the film. In the case of this particular study, where the authors focused on the COVID-19 pandemic, daily searches for “coronavirus porn” (and several variations) increased steadily from February 2020 through March and April, reaching an astonishing 60 million searches. The response from the community was also immediate: alongside themed porn videos (such as so-called “medical” or “quarantine” porn), various videos containing pedagogical or humorous content were uploaded. Furthermore, thousands of unrelated porn videos were tagged as “coronavirus porn” in order to be picked up by the algorithm and shown to a wider audience. This shows, on the one hand, how PornHub as a platform organizes, nudges, or constrains user behavior. On the other hand, this also gives a certain power and advocacy to the users who understand the rules of the game: how complying with the platform’s algorithmic tendencies helps to optimize visibility and engagement of the content. By strategically using tags, users are not simply passive content consumers but active participants and meaning-makers [24].
These observations raise important questions addressed in platform studies: How do tagging practices influence recommendation engines? How do the official categories defined by the platform, which are top-down imposed classifications, interact with the bottom-up content structuring carried out by the users? Who shapes what gets recommended: the platform’s hierarchy or the collective language of users?
There is, however, a gap in the direct analysis of content categorization on PornHub, which motivates extending internet categorization research to pornographic content.
With regard to content categorization, PornHub follows a twofold approach: a website-imposed set of fixed categories and a set of user-defined, ever-evolving tags. The interplay between the two approaches is the focus of this contribution and is analyzed with the methods presented in Section 3.

3. Methods

This section describes the methods used in this work. After the ethical considerations, it presents the statistical, network, and NLP methods applied to the dataset, followed by the embedding models and the data collection and preprocessing.

3.1. Ethical Considerations

In this research, the authors have followed the ethical guidelines of the Internet Research Association [25] and have undergone an assessment by the EAB (Ethics Advisory Board) of St. Pölten University of Applied Sciences. Even though the platform is publicly accessible and all the data can be accessed without registration, the paper does not mention any usernames, and all usernames in our database (if present) are pseudonymized. To further minimize risk, all data were stored securely, handled in accordance with institutional data protection policies, and used exclusively for the analysis described in this study. The authors are committed to protecting the participants’ integrity, especially as they were not aware that their data would be used for the purposes of this research. In addition, the researchers who engaged with the content were fully informed about the nature of the material, including potentially explicit or triggering elements.

3.2. Statistical and Network Methods

3.2.1. Statistical Methods

The initial attempt to characterize and understand the interaction between categories and tags relies on basic statistical techniques. While such methods provide useful insights through the analysis of co-occurrence patterns, they fall short when the focus shifts from mere frequency to popularity, as measured by view counts. These preliminary findings, which are limited to one-to-one relationships, are discussed in Section 4.1.
A comprehensive analysis of the relationships between categories and tags would require examining all possible subsets of categories in conjunction with all possible subsets of tags. With $N$ categories and $K$ tags, this exhaustive approach entails a computational complexity of $O(2^{N+K})$, rendering it infeasible. Additionally, many of the theoretically possible combinations of subsets are not practically relevant because they occur infrequently in the dataset.
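To make the infeasibility concrete, assume that the exponent counts the 136 categories and 214 tags of the final curated dataset (Section 3.4). The number of (category subset, tag subset) combinations is then $2^{136+214} = 2^{350}$, which the short check below sizes against the number of one-to-one category–tag pairs.

```python
N_CATEGORIES = 136  # categories in the curated dataset (Section 3.4)
N_TAGS = 214        # tags in the curated dataset (Section 3.4)

# Every pair (subset of categories, subset of tags) is a candidate relation.
combinations = 2 ** (N_CATEGORIES + N_TAGS)

# Order of magnitude of the exhaustive search space.
digits = len(str(combinations))
print(f"2^{N_CATEGORIES + N_TAGS} has {digits} decimal digits")

# One-to-one pairs, by contrast, are cheap to enumerate.
print(f"one-to-one category-tag pairs: {N_CATEGORIES * N_TAGS}")
```

The exhaustive space has 106 decimal digits, whereas there are only 29,104 one-to-one pairs, which is why only frequent, structurally related subsets are worth examining.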
To overcome these limitations, we propose a more efficient and principled method for subset selection. Specifically, we employ network analysis techniques and community detection algorithms to uncover many-to-many relationships in a computationally tractable and semantically meaningful manner.

3.2.2. Network Methods

A common way to model many-to-many relations is with graphs. A graph is an ordered pair $G = (V, E)$, where
  • $V$ is a non-empty set of elements called vertices or nodes, and 
  • $E \subseteq \{\{u, v\} \mid u, v \in V,\ u \neq v\}$ is a set of unordered pairs of distinct vertices, called edges, in the case of an undirected graph. In a directed graph (or digraph), $E \subseteq \{(u, v) \mid u, v \in V,\ u \neq v\}$, where the order of the pair indicates direction.
A network (or graph network) is a graph created based on real-life data.
In this approach, we build two graphs: one for categories and one for tags. The goal is then to find a mapping between groups of categories (nodes in the category network) and groups of tags (nodes in the tag network). To do so, the communities (also called clusters) must first be detected.
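How the two graphs are built from the video metadata is not spelled out here. A minimal sketch, assuming that two labels are connected whenever they are assigned to the same video and that the edge weight counts such videos (a common co-occurrence construction, not confirmed by the text), could look as follows. The metadata records and field names are hypothetical. Edges are stored as frozensets, mirroring the unordered pairs $\{u, v\}$ of the undirected-graph definition.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(label_lists):
    """Weighted undirected graph as {frozenset({a, b}): weight}.

    Assumption: two labels are linked when they are assigned to the same
    video, and the edge weight counts how many videos share them.
    """
    edges = Counter()
    for labels in label_lists:
        # set() drops duplicate labels; sorted() makes iteration deterministic
        for a, b in combinations(sorted(set(labels)), 2):
            edges[frozenset((a, b))] += 1
    return edges


# Hypothetical metadata: one record per video.
videos = [
    {"categories": ["Asian", "Pornstar"], "tags": ["japanese", "stockings"]},
    {"categories": ["Asian", "Pornstar"], "tags": ["japanese"]},
    {"categories": ["Latino", "Pornstar"], "tags": ["latin", "stockings"]},
]
category_graph = cooccurrence_graph(v["categories"] for v in videos)
tag_graph = cooccurrence_graph(v["tags"] for v in videos)
```

Here the category graph links Asian and Pornstar with weight 2, while the second video contributes no tag edge because it carries a single tag.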
Recent studies [26,27] in community detection have shown that the Louvain method [28] is outperformed by the Leiden method [10].

The Leiden Algorithm

The Leiden algorithm is an iterative method for community detection in complex networks, designed to optimize a quality function such as modularity or the Constant Potts Model (CPM). It was introduced by Traag, Waltman, and van Eck [10] to ensure key structural properties of the resulting communities, including internal connectivity, local optimality, and asymptotic stability. The method operates on a weighted or unweighted graph $G = (V, E)$, where $V$ denotes the set of nodes and $E$ the set of edges. The algorithm proceeds through a sequence of repeated iterations, each consisting of three main phases: (1) local movement of nodes, (2) refinement of the resulting partition, and (3) aggregation into a coarser graph.
  1. Local Moving of Nodes.
In the first phase, each node is initially assigned to its own community or to a predefined partition. Nodes are then sequentially considered for movement to neighboring communities if such a move yields a positive gain in the quality function. Unlike deterministic greedy strategies, the Leiden algorithm uses a stochastic approach that allows moves even in cases of equal gain, which helps escape poor local optima. This phase continues until no further improvement is possible, resulting in a preliminary partition.
  2. Refinement of Partitions.
The key innovation of the Leiden algorithm lies in the refinement phase. After the initial local movement phase, each community may be internally disconnected or only weakly connected. To address this, each community is further partitioned into subcommunities such that all subcommunities are guaranteed to be connected subgraphs. This refinement is achieved by building a subgraph for each community and applying the local moving heuristic within it, using the same quality function. The resulting subcommunities are merged only if their union results in a connected structure and leads to an increase in the overall quality function. This guarantees the property of subset optimality, which ensures that no subset of a community can be moved without violating connectivity or reducing quality.
  3. Aggregation into a Coarse Graph.
In the final phase of each iteration, the graph is aggregated based on the refined partition: each community is treated as a super-node, and weighted edges between these super-nodes represent the sum of weights between constituent nodes in the original graph. The three-phase process is then repeated on this coarse graph. The algorithm continues until a fixed point is reached, i.e., when the partition no longer changes between iterations.
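The first phase can be illustrated with a simplified pure-Python sketch that optimizes modularity on a small weighted, undirected graph stored as a dict of dicts. It covers only the local moving of nodes: nodes are visited in a seeded random order, and a move is accepted only when it strictly increases modularity. The refinement and aggregation phases, as well as the queue-based node visiting of the original algorithm [10], are omitted, so this is an illustration under simplifying assumptions rather than an implementation of Leiden; in practice, a library such as leidenalg would be used. The score is the standard modularity gain for inserting an isolated node $v$ into community $C$, up to the constant factor $1/m$: $\Delta Q \propto k_{v,C} - k_v \Sigma_C / (2m)$, where $k_{v,C}$ is the edge weight from $v$ into $C$, $k_v$ the degree of $v$, $\Sigma_C$ the total degree of $C$, and $m$ the total edge weight.

```python
import random
from collections import defaultdict


def local_moving(adj, seed=0):
    """Phase 1 sketch: move nodes to the neighboring community with the
    largest modularity gain until no single move improves modularity.

    adj maps each node to {neighbor: edge_weight} (undirected, symmetric).
    """
    rng = random.Random(seed)
    degree = {v: sum(adj[v].values()) for v in adj}
    two_m = sum(degree.values())          # 2m: twice the total edge weight
    community = {v: v for v in adj}       # start from singleton communities
    total = dict(degree)                  # total degree per community
    improved = True
    while improved:
        improved = False
        order = list(adj)
        rng.shuffle(order)                # random visiting order
        for v in order:
            current = community[v]
            # Edge weight from v into each neighboring community.
            links = defaultdict(float)
            for u, w in adj[v].items():
                if u != v:
                    links[community[u]] += w
            total[current] -= degree[v]   # take v out of its community
            # Gain (up to 1/m) of putting v back where it was.
            best = current
            best_gain = links[current] - degree[v] * total[current] / two_m
            for c, w_vc in links.items():
                gain = w_vc - degree[v] * total[c] / two_m
                if gain > best_gain + 1e-12:  # strict improvement only
                    best, best_gain = c, gain
            total[best] += degree[v]
            if best != current:
                community[v] = best
                improved = True
    return community


# Example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: {} for v in range(6)}
for a, b in edges:
    adj[a][b] = 1.0
    adj[b][a] = 1.0
parts = local_moving(adj)
```

On this example, the local moving phase recovers the two triangles as separate communities.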

Theoretical Guarantees

The Leiden algorithm provides stronger guarantees than many existing community detection algorithms. Most notably, it ensures the following:
  • All communities are internally connected: no community contains disconnected components. This property has also been confirmed empirically [27].
  • Subset optimality: No subset of nodes within a community can be moved to a different community (or form a new one) without decreasing the quality function or breaking connectivity.
  • Asymptotic stability: Repeated iterations lead to a stable partition that satisfies all local optimality conditions under the specified quality function.
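The first guarantee is easy to verify for any partition. A minimal sketch, assuming a hypothetical representation in which the graph maps each node to its neighbors and the partition maps each node to a community label, runs a breadth-first search restricted to each community:

```python
from collections import defaultdict, deque


def communities_are_connected(adj, community):
    """Return True if every community induces a connected subgraph."""
    members = defaultdict(set)
    for node, label in community.items():
        members[label].add(node)
    for nodes in members.values():
        # Breadth-first search that never leaves the current community.
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u in nodes and u not in seen:
                    seen.add(u)
                    queue.append(u)
        if seen != nodes:
            return False  # part of the community is unreachable internally
    return True
```

For a path 0–1–2, the partition {0, 2} versus {1} fails the check because 0 and 2 are connected only through node 1.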
These properties make the Leiden algorithm robust, scalable, and particularly well-suited for large-scale and high-resolution network analysis. It can be used with various quality functions beyond modularity, such as CPM, enabling multiresolution community detection.
Furthermore, the Leiden method allows the maximal community size to be set as a parameter. This makes it possible to perform community detection at different levels of granularity, as discussed further in Section 4.
This translates into logically connected subsets of categories and tags of different sizes. These subsets can then be compared between the category and tag graphs. In turn, the search of the subset space can be restricted to the relevant subsets of categories and tags only, reducing the time complexity. These results are presented in Section 4.
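How category communities and tag communities are compared is not specified at this point. One possible approach, assumed here purely for illustration, maps each community to the set of videos carrying at least one of its labels and pairs communities by the Jaccard similarity of those video sets. All function names and the toy data are hypothetical.

```python
def videos_covered(labels, video_labels):
    """IDs of videos carrying at least one label of the community."""
    return {vid for vid, assigned in video_labels.items() if labels & assigned}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(cat_comms, tag_comms, video_cats, video_tags):
    """For each category community, the best-overlapping tag community."""
    matches = {}
    for cname, cats in cat_comms.items():
        cat_videos = videos_covered(cats, video_cats)
        scores = {
            tname: jaccard(cat_videos, videos_covered(tags, video_tags))
            for tname, tags in tag_comms.items()
        }
        best = max(scores, key=scores.get)
        matches[cname] = (best, scores[best])
    return matches


# Hypothetical toy data: video ID -> labels, community name -> labels.
video_cats = {1: {"Asian"}, 2: {"Asian", "Pornstar"}, 3: {"Latino"}}
video_tags = {1: {"japanese"}, 2: {"korean"}, 3: {"latin"}}
cat_comms = {"C1": {"Asian"}, "C2": {"Latino"}}
tag_comms = {"T1": {"japanese", "korean"}, "T2": {"latin"}}
matches = match_communities(cat_comms, tag_comms, video_cats, video_tags)
```

Because the comparison runs over communities rather than arbitrary subsets, the number of candidate pairs grows with the product of the community counts, not exponentially.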

3.2.3. NLP Methods

In addition to examining the relationships between the co-occurrences of categories and tags, it is also important to assess them for redundancy.
Beyond the conventional exact matching of character strings (e.g., the category “Big Boobs” and the tag “big boobs”), categories and tags can be compared on a purely semantic basis (e.g., the category “tiny” and tag “petite”). The advent of large language models (LLMs) has facilitated advanced text similarity comparisons. The selection of an appropriate model presents its own challenges.
In this study, the model selection is justified based on performance metrics from Huggingface’s Massive Text Embedding Benchmark (MTEB) Leaderboard (https://huggingface.co/spaces/mteb/leaderboard, accessed on 1 August 2025). The customized benchmark utilized parameters such as Domain: Web, Task Type: Speed, Any2AnyRetrieval, Classification, Clustering, Reranking, Retrieval, and Summarization, as these parameters closely reflect the overall quality of embeddings. According to the benchmark, at the time of writing, the top-performing model was Qwen3-Embedding-8B (https://huggingface.co/Qwen/Qwen3-Embedding-8B, accessed on 1 August 2025). To enhance computational efficiency, the second-best model, Qwen3-Embedding-4B (https://huggingface.co/Qwen/Qwen3-Embedding-4B, accessed on 1 August 2025), was selected.
To provide a more representative evaluation across different computational capacities, two embedding models spanning the spectrum of model sizes are included instead of a single one: Qwen3-Embedding-4B as the larger model and all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2, accessed on 1 August 2025) as the smaller model. Both models are featured on the Huggingface Massive Text Embedding Benchmark (MTEB) Leaderboard (https://huggingface.co/spaces/mteb/leaderboard, accessed on 1 August 2025), a community-driven benchmark curated by Huggingface, the largest digital community focused on natural language processing. To ensure domain relevance and task alignment, we considered only benchmark results based on parameters tailored to our use case, including Domain: Web and Task Types such as Speed, Any2AnyRetrieval, Classification, Clustering, Reranking, Retrieval, and Summarization. This dual-model approach provides a broader view of the trade-offs between performance and efficiency in real-world deployment scenarios.

3.3. Key Embedding Features

  • Multi-stage Training Pipeline: The Qwen3 Embedding models utilize a sophisticated multi-stage training pipeline. This pipeline integrates large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets, ensuring the embeddings are both robust and contextually aware.
  • Multilingual Proficiency: The models exhibit exceptional performance across multilingual evaluation benchmarks. This proficiency enables effective cross-lingual and multilingual retrieval tasks, making them suitable for global applications.
  • State-of-the-Art Performance: Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. This performance underscores their superior capability in capturing semantic nuances and contextual information in text data.
  • Versatility in Model Sizes: Available in various sizes (0.6B, 4B, and 8B), the Qwen3 Embedding models cater to a wide range of deployment scenarios. This versatility allows users to optimize for efficiency or effectiveness based on specific requirements.
  • Open-Source and Community-Driven Development: The models are publicly available under the Apache 2.0 license, promoting transparency, reproducibility, and community-driven research and development.
Overall, the Qwen3 Embedding series is engineered to deliver high-quality text embeddings, making them highly suitable for a wide range of natural language processing and information retrieval applications.
However, due to questionable results in terms of embedding similarity (discussed in Section 4), another embedding model was also tested.
The sentence-transformers/all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2, accessed on 1 August 2025) model is a lightweight yet high-performing transformer-based architecture developed for producing dense vector representations of sentences and short texts [12]. Based on a distilled version of Microsoft’s MiniLM [29], it is optimized using a Siamese network structure to yield semantically meaningful sentence embeddings that perform effectively in semantic similarity, clustering, and retrieval tasks. Due to its low computational footprint and strong performance, it has become a preferred choice for real-time applications involving short text segments, such as dialogue systems, question-answering, and search ranking engines [30]. Its efficacy in capturing fine-grained semantic relationships makes it particularly suitable for use cases where short, context-sensitive text matching is critical.
In this contribution, these two models were used to embed the categories and tags, and the similarity between the resulting embeddings was measured with cosine similarity.
Cosine similarity is a measure used to determine the similarity between two non-zero vectors in an inner product space. It is defined as the cosine of the angle between these vectors and is commonly used in text analysis to measure document similarity. Mathematically, given two vectors A and B , the cosine similarity cos ( θ ) is calculated as
$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
where $A \cdot B$ represents the dot product of the vectors, and $\|A\|$ and $\|B\|$ are the magnitudes (or lengths) of vectors $A$ and $B$, respectively. This metric ranges from $-1$ to $1$, where $1$ indicates that the vectors are identical in orientation, $0$ indicates orthogonality, and $-1$ indicates that the vectors are diametrically opposed.
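The formula translates directly into code. The sketch below is a plain-Python illustration; in the study, $A$ and $B$ would be the embeddings produced by the models described above, whereas the toy vectors here are hypothetical and chosen to hit the three reference values of the metric.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| ||B||) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # orthogonal
print(cosine_similarity([1.0, 0.0], [-2.0, 0.0]))  # opposite directions
```

Because the metric depends only on orientation, scaling a vector (as in the first call) leaves the similarity unchanged.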
Furthermore, to reduce the bias of the two embedding models, another approach is introduced for confirmation: the textual labels of categories and tags were compared using the instruct model Mistral-7B-Instruct-v0.2 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2, accessed on 1 August 2025). Mistral-7B-Instruct-v0.2 is a variant of the Mistral language model fine-tuned for instruction-following tasks. An instruct model is a type of language model trained not only to generate coherent and contextually relevant text but also to follow specific instructions provided by users. This fine-tuning process involves training on datasets that include both input instructions and desired responses, enabling the model to perform a wide range of tasks such as answering questions, providing explanations, and generating content based on user prompts. The Mistral-7B-Instruct-v0.2 model leverages its foundational architecture, which is optimized for understanding and generating human-like text, to excel in applications requiring precise and contextually appropriate responses to user instructions. This makes it particularly useful in scenarios where nuanced understanding and task-specific outputs are crucial.
The template for the prompts was

You are an expert in pornographic studies.
Are these two pornographic categories similar?
Return “yes” or “no”.
 
Category 1: {category}
Category 2: {tag}
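The template can be filled programmatically for each category–tag pair, and the model's free-text reply reduced to a binary decision. The following is a sketch with hypothetical helper names: it reproduces the template with straight quotes, assumes that only the first word of the reply is decisive, and does not show the actual call to Mistral-7B-Instruct-v0.2.

```python
PROMPT_TEMPLATE = (
    "You are an expert in pornographic studies.\n"
    "Are these two pornographic categories similar?\n"
    'Return "yes" or "no".\n'
    "\n"
    "Category 1: {category}\n"
    "Category 2: {tag}"
)


def build_prompt(category, tag):
    """Fill the template with one category label and one tag label."""
    return PROMPT_TEMPLATE.format(category=category, tag=tag)


def parse_answer(text):
    """Map a free-text reply to True/False, or None if it is unclear.

    Assumption: only the first word of the reply is inspected.
    """
    words = text.strip().lower().split()
    if not words:
        return None
    first = words[0].strip('".,!')
    if first == "yes":
        return True
    if first == "no":
        return False
    return None
```

Returning None for unclear replies keeps ambiguous model outputs separate from explicit negative answers, so they can be counted or re-queried.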

3.4. Data Collection and Preprocessing

Following the description of the methods, we now introduce the data used in this study. The dataset comprises metadata of 5,231,225 videos collected from the PornHub website via the official API. Although the API is primarily intended for video embedding, it also provides comprehensive metadata in CSV format, including date, views, tags, and categories for each video.
The dataset spans a ten-year period, from January 2015 through December 2024 (inclusive). Basic statistics, such as the number of views per video, were extracted and used to filter out less significant entries. The distribution of video views is shown in Figure 1, where the y-axis is presented on a logarithmic scale. As evident from the figure, the distribution of views is highly skewed, with the vast majority of videos receiving minimal user attention. To mitigate this skew and ensure temporal representativeness, the top 10,000 most-viewed videos per year were retained.
Next, category and tag information from the metadata was extracted. The original dataset included 137 unique categories and 979,659 distinct tags. While the category list was preserved in its entirety (with the exception of one advertisement/technical category), the tag list was pruned to improve analytical tractability. Specifically, we selected the top 1000 tags per year based on total view counts, discarding the remainder.
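The two filtering steps can be sketched as follows. The record layout (year, views, and tags fields) is hypothetical, and two details are assumptions not stated in the text: a tag's total view count is taken as the sum of the views of the videos carrying it, and tag totals are computed on the retained videos.

```python
import heapq
from collections import defaultdict


def top_videos_per_year(videos, k):
    """Keep the k most-viewed videos of each year."""
    by_year = defaultdict(list)
    for video in videos:
        by_year[video["year"]].append(video)
    kept = []
    for year in sorted(by_year):
        kept.extend(heapq.nlargest(k, by_year[year], key=lambda v: v["views"]))
    return kept


def top_tags_per_year(videos, k):
    """Union over years of the k tags with the highest summed views."""
    views = defaultdict(lambda: defaultdict(int))  # year -> tag -> views
    for video in videos:
        for tag in video["tags"]:
            views[video["year"]][tag] += video["views"]
    selected = set()
    for per_tag in views.values():
        selected.update(heapq.nlargest(k, per_tag, key=per_tag.get))
    return selected


# Hypothetical records; in the study, k would be 10,000 and 1000.
videos = [
    {"year": 2015, "views": 100, "tags": ["a", "b"]},
    {"year": 2015, "views": 50, "tags": ["b"]},
    {"year": 2015, "views": 10, "tags": ["c"]},
    {"year": 2016, "views": 70, "tags": ["c"]},
    {"year": 2016, "views": 5, "tags": ["a"]},
]
kept = top_videos_per_year(videos, 2)
tags = top_tags_per_year(kept, 1)
```

Selecting per year, rather than globally, keeps earlier years from being crowded out by the view counts of later, more popular periods.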
The final curated dataset consists of 97,071 videos, encompassing 136 unique categories and 214 unique tags. This yields a total of 5,581,615 distinct existing category–tag pairs.
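The filtering steps above can be sketched in plain Python. This is an illustrative sketch, not the authors' pipeline. It assumes each video is a dict with hypothetical keys `year`, `views`, `categories`, and `tags`, that the tag ranking is computed on the already-retained videos, and that the union of the per-year top tags is applied to all years.

```python
import heapq
from collections import defaultdict


def top_videos_per_year(videos, n):
    """Keep the n most-viewed videos of each year."""
    by_year = defaultdict(list)
    for video in videos:
        by_year[video["year"]].append(video)
    kept = []
    for year in sorted(by_year):
        kept.extend(heapq.nlargest(n, by_year[year], key=lambda v: v["views"]))
    return kept


def top_tags_per_year(videos, k):
    """Union over years of the k tags with the highest total views in that year."""
    views_by_year = defaultdict(lambda: defaultdict(int))
    for video in videos:
        for tag in set(video["tags"]):
            views_by_year[video["year"]][tag] += video["views"]
    keep = set()
    for tag_views in views_by_year.values():
        best = heapq.nlargest(k, tag_views.items(), key=lambda item: item[1])
        keep.update(tag for tag, _ in best)
    return keep


def curate(videos, n_videos=10_000, n_tags=1_000, drop_categories=frozenset()):
    """Apply both filters; drop_categories would hold the advertisement/technical category."""
    kept = top_videos_per_year(videos, n_videos)
    tags = top_tags_per_year(kept, n_tags)
    return [
        {**v,
         "categories": [c for c in v["categories"] if c not in drop_categories],
         "tags": [t for t in v["tags"] if t in tags]}
        for v in kept
    ]
```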
For the instruct-model task only, the category and tag labels were additionally simplified by removing hyphens and converting all characters to lowercase.
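A minimal sketch of this normalization follows. It assumes that hyphens are replaced by spaces (rather than deleted outright) so that multi-word tags stay readable; the helper name is hypothetical.

```python
def normalize_label(label: str) -> str:
    """Lowercase a label, turn hyphens into spaces, and collapse repeated whitespace."""
    return " ".join(label.replace("-", " ").lower().split())


# A tag and a category become directly comparable after normalization.
print(normalize_label("big-tits"), "|", normalize_label("Big Tits"))
```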

4. Results

4.1. Statistical Results

The top 50 category–tag pairs by category views fraction are shown in Table 1, and the top 50 pairs by tag views fraction are shown in Table 2.
The category–tag pairs with the largest category views fraction (Table 1) reveal several patterns of alignment between categories and tags. They include exact matches (e.g., Latino—latin, Black—black, Bareback—bareback), but also finer-grained categories that share a broader tag (e.g., Japanese/Korean/Chinese—asian). In addition, they include logical extensions of categories, such as Solo Male—big-cock and Bukkake—cumshot. These observations suggest that tags play a supplemental role in the platform’s categorization scheme: they complement, clarify, or reproduce the categories and, in that sense, are closely related to them.
Category–tag pairs with the largest tag views fraction (Table 2) are mostly associated with the Pornstar category.
This category can be understood as both a functional and strategic element within the platform’s larger categorization framework. It does not offer specific details about the nature of the content beyond the fact that it features well-known adult film performers, so this category can also be seen as a placeholder.
However, this category shares a large number of views with tags describing female performers. Tags such as huge-tits, cougar, stockings, small-tits, girl-on-girl, reverse-cowgirl, fake-tits, and pussy-licking are associated with it, whereas the only tag referring to male body parts is big-dick. Performer-specific content (e.g., Pornstar—huge-tits, Pornstar—small-tits) consistently generates a high number of views and actively drives traffic to the platform.
The Pornstar category thus serves a dual function. First, it groups content by the appearance or involvement of well-known adult film performers, which presumably helps users locate content related to specific performers. Second, it follows a platform-market logic: from a business perspective, the Pornstar category plays a crucial role in driving traffic and maximizing engagement.
Much like in Table 1, tags complement categories, as in Interracial—blacked or Interracial—bbc. However, tags also provide synonyms for categories, such as Anal—ass-fuck, Threesome—3some, and Creampie—cum-in-pussy.
Numerically, among category–tag pairs that appeared in at least 10 videos, the maximal tag views fraction reached 20%, while the maximal category views fraction reached 16%. This indicates some overlap between categories and tags, but no clear one-to-one relation, which motivates a deeper analysis of many-to-many relations.
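The two fractions are not redefined in this section. The sketch below assumes the natural reading that, within one period, the category views fraction of a pair is the views of videos carrying both labels divided by the views of all videos in that category. The tag views fraction is taken as the same numerator divided by the views of all videos carrying the tag. The `min_count` threshold mirrors the at-least-10-videos filter above; the data layout is hypothetical.

```python
from collections import defaultdict


def pair_fractions(videos, min_count=10):
    """Views, video count, and both views fractions for every category-tag pair."""
    cat_views, tag_views = defaultdict(int), defaultdict(int)
    pair_views, pair_count = defaultdict(int), defaultdict(int)
    for video in videos:
        cats, tags, views = set(video["categories"]), set(video["tags"]), video["views"]
        for c in cats:
            cat_views[c] += views
        for t in tags:
            tag_views[t] += views
        for c in cats:
            for t in tags:
                pair_views[(c, t)] += views
                pair_count[(c, t)] += 1
    rows = []
    for (c, t), views in pair_views.items():
        if pair_count[(c, t)] < min_count:
            continue  # ignore rare pairs, as in the at-least-10-videos filter
        rows.append({
            "category": c, "tag": t, "views": views, "count": pair_count[(c, t)],
            "tag_views_fraction": views / tag_views[t],
            "category_views_fraction": views / cat_views[c],
        })
    return rows
```

Running it separately on each year's videos would give one set of rows per period, as in the tables.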

4.2. Network Results

As described in Section 3, co-occurrence networks were constructed to represent relationships among categories and among tags. These networks were then analyzed using the Leiden community detection algorithm with varying values of the maximal community size parameter.
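A minimal sketch of the network construction and clustering follows. It assumes that edge weights are the summed views of videos in which two labels co-occur, and that modularity is the Leiden quality function; this section does not state which partition type was used. The clustering step relies on the `python-igraph` and `leidenalg` packages, whose `find_partition` accepts a `max_comm_size` argument that caps the number of nodes per community. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(videos, field):
    """Weighted edges (u, v, w): w = total views of videos labelled with both u and v."""
    weights = defaultdict(int)
    for video in videos:
        labels = sorted(set(video[field]))  # "categories" or "tags"
        for u, v in combinations(labels, 2):
            weights[(u, v)] += video["views"]
    return [(u, v, w) for (u, v), w in weights.items()]


def leiden_communities(edges, max_comm_size, seed=0):
    """Cluster the weighted graph with Leiden (requires python-igraph and leidenalg)."""
    import igraph as ig
    import leidenalg

    graph = ig.Graph.TupleList(edges, directed=False, weights=True)
    partition = leidenalg.find_partition(
        graph, leidenalg.ModularityVertexPartition,
        weights="weight", max_comm_size=max_comm_size, seed=seed)
    names = graph.vs["name"]
    # Each community is a list of vertex indices; map them back to label names.
    return [[names[i] for i in community] for community in partition]
```

For example, `leiden_communities(cooccurrence_edges(videos, "categories"), max_comm_size=20)` would yield communities of at most 20 categories; sweeping `max_comm_size` corresponds to the parameter variation described above.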
For illustrative purposes, Figure 2 presents the category network with identified communities. Figure 3 depicts the corresponding tag network, also annotated with community structures. Given the high density of connections, a filtered version of the tag network—retaining only the top 10% of edges by weight (proportional to view counts)—is shown in Figure 4 to improve clarity.
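The filtering used for the clearer tag-network view can be sketched as keeping the heaviest 10% of edges. The `(u, v, weight)` edge format and the round-up rule for the cut-off are assumptions.

```python
def top_percent_edges(edges, percent=10):
    """Keep the heaviest `percent`% of (u, v, weight) edges, rounding the count up."""
    if not edges:
        return []
    k = -((-len(edges) * percent) // 100)  # ceiling division in integer arithmetic
    # sorted() is stable, so edges with equal weight keep their input order
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:k]
```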
To evaluate the correspondence between category and tag communities, co-occurrence patterns were examined based on their presence in actual videos. Specifically, the metric used to quantify co-occurrence strength was the total number of views of videos in which at least one category from a given category cluster and at least one tag from a corresponding tag cluster were present.
The results are summarized in Tables 3–12, which present the top 10 category–tag community pairs for each year from 2015 to 2024.
The columns “cat. max comm size” and “tag max comm size” indicate the maximum community sizes specified for the Leiden algorithm when applied to the category and tag networks, respectively. The column “views” reports the total view count of all videos where any category from the category cluster and any tag from the tag cluster appeared.
Pairs are ranked according to the “view × fraction” column, which is computed as the product of “views” and “tag views fraction”—the latter representing the proportion of views attributable to videos containing tags from the corresponding tag cluster.
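The co-occurrence metric and ranking described above can be sketched as follows. The text does not fully specify the denominator of the tag views fraction. This sketch assumes it is the total views of all videos carrying at least one tag from the tag cluster, so that the score is views × tag views fraction. The cluster lists and video layout are hypothetical.

```python
def rank_cluster_pairs(videos, cat_clusters, tag_clusters, top=10):
    """Rank (category cluster, tag cluster) pairs by views x tag views fraction."""
    cat_sets = [set(cluster) for cluster in cat_clusters]
    tag_sets = [set(cluster) for cluster in tag_clusters]
    tag_cluster_views = [0] * len(tag_sets)  # views of videos with any tag of cluster j
    pair_views = {}                          # views of videos matching both clusters
    for video in videos:
        cats, tags, views = set(video["categories"]), set(video["tags"]), video["views"]
        hit_tags = [j for j, cluster in enumerate(tag_sets) if tags & cluster]
        for j in hit_tags:
            tag_cluster_views[j] += views
        for i, cluster in enumerate(cat_sets):
            if cats & cluster:  # at least one category from cluster i
                for j in hit_tags:
                    pair_views[(i, j)] = pair_views.get((i, j), 0) + views
    rows = []
    for (i, j), views in pair_views.items():
        fraction = views / tag_cluster_views[j]
        rows.append({"cat_cluster": i, "tag_cluster": j, "views": views,
                     "tag_views_fraction": fraction, "score": views * fraction})
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows[:top]
```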
Although the matched community structures of categories and tags evolve over time, tag communities increasingly align with their categorical counterparts.
In 2015, 2018, and 2019 (Table 3, Table 6 and Table 7), matching communities that include Step Fantasy in the category graph also feature its synonyms in the tag graph (e.g., step-sister, step-brother, step-siblings, etc.). Additionally, there are direct matches in 2017, such as step-fantasy (Table 5). These observations align with a view of tags as complementary and closely matching platform taxonomies.
However, tag communities also include explanatory concepts that are not part of the categorical system. For example, concepts related to body hair (shaved), weight (skinny), sex positions (doggy-style, missionary, reverse-cowgirl), and clothing (stockings, lingerie) appear in community structures that match categories lacking any reference to such concepts (e.g., in Table 3 and Table 7).
Tags also introduce intermediary terms that complement existing categories. For example, while only Small Tits and Big Tits exist in the categorical system, the medium-boobs tag emerges in tag communities that align with category communities containing Big Tits (e.g., Table 6).
Nevertheless, it is important to note that the fact that tags complement or provide missing information to the categories does not inherently grant them emancipatory potential. The fine-grained information added in tag structures still aligns with the tendency of pornographic taxonomies to categorize, classify, and eventually dominate subjects.
This is even more evident in matching tag communities in recent years. From 2020 onwards, a certain stabilization in community matches can be observed. Tag communities in 2020 (Table 8) contain tags such as blonde, big-cock/big-dick, and hardcore, which match category communities containing the corresponding categories. A similar situation is observed in 2021 (Table 9), with tags such as pov and blowjob matching their respective categories. In 2022 (Table 10), although tag communities appear more diverse, they still revolve around the same themes (e.g., reverse-cowgirl, blowjob, brunette). In 2023 (Table 11), categories and tags continue to match on hair colors, while in 2024 (Table 12) tag communities center on doggystyle, which may not be part of the category communities yet recurs consistently across the best matches.
These qualitative observations suggest a semantic de-diversification of tags over time, as community content appears to consolidate around a narrower set of recurring themes. We speculate that this trend may reflect broader platform dynamics, such as algorithmic feedback loops that promote popular or high-engagement content, or platform-level homogenization in which recommendation systems steer users toward more universally resonant material. Alternatively, the convergence could be driven by users adapting their tagging behavior to dominant trends in order to increase visibility or conform to platform norms. Further research is needed to assess the mechanisms and implications of this semantic narrowing, particularly in relation to content diversity, recommendation fairness, and cultural representation.

4.3. NLP Results

Having examined the structural alignment and semantic evolution of category–tag communities through network analysis, we now present the NLP-based results obtained with the methods described in Section 3.
To facilitate interpretability, Figure 5 presents the t-SNE [31] visualization of the category embeddings, while Figure 6 depicts the t-SNE-reduced embeddings of the tags. These visualizations are intended solely for illustrative purposes; all analyses and conclusions are based on the original, high-dimensional embeddings prior to dimensionality reduction.
The distributions of cosine similarities are shown in Figure 7 (MiniLM) and Figure 8 (QWEN). Both the mean and the median similarity are higher for QWEN (mean of 95%) than for MiniLM (mean of 66%).
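For reference, the cosine similarity between a category embedding c and a tag embedding t is cos(c, t) = c · t / (‖c‖ ‖t‖). The sketch below computes it over all category–tag pairs and summarizes the distribution by its mean and median, as in Figures 7 and 8. It uses only the standard library. The vectors are assumed to come from an encoder, for example `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(labels)` from the `sentence-transformers` package, which is not shown here.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_summary(category_vectors, tag_vectors):
    """All category-tag cosine similarities plus their mean and median."""
    sims = [cosine(c, t) for c in category_vectors for t in tag_vectors]
    return sims, statistics.mean(sims), statistics.median(sims)
```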
To establish the best matching category for each tag, all pairs have been sorted by the cosine similarity and the Boolean similarity value returned by the instruct model.
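The ranking criterion is stated only briefly, so the following is one plausible reading rather than the authors' exact procedure. For each tag, candidate categories are ordered first by the instruct model's Boolean answer (“yes” before “no”) and then by cosine similarity, and the top candidate is kept. The tuple layout of `pairs` is hypothetical.

```python
def best_category_per_tag(pairs):
    """pairs: (tag, category, cosine, llm_yes) tuples; returns {tag: best category}.

    Assumed ordering: a "yes" from the instruct model outranks any "no",
    and cosine similarity breaks ties within each group.
    """
    best = {}
    for tag, category, cosine, llm_yes in pairs:
        key = (bool(llm_yes), cosine)
        if tag not in best or key > best[tag][0]:
            best[tag] = (key, category)
    return {tag: category for tag, (_, category) in best.items()}
```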
The results are presented in Tables 13–16 (MiniLM) and Tables 17–20 (QWEN).
The similarity-score matching further supports the qualitative observation that tags align with the platform-provided categories.
The most similar category–tag pairs identified by MiniLM (Table 13) reveal that many tags are simply exact matches of categories. The exceptions involve hyphens instead of spaces (big-tits vs. Big Tits), abbreviations (3some vs. Threesome), and the omission of the age qualifier “(18+)” (e.g., teen vs. Teen (18+)). In addition, tags provide direct synonyms or etymologically related terms (e.g., masturbate vs. Masturbation). More interestingly, the tag orgasm is matched with Female Orgasm, suggesting that orgasms in a pornographic context are primarily associated with women. Similarly, the matching of tits with Big Tits suggests that breasts in a pornographic context are more likely to be perceived as large, while the addition of amateur to preexisting categories (amateur-threesome vs. Threesome) further illustrates such perception patterns in a pornographic context.
Furthermore, similar tags can complement a category with information on type, size, action, or age (e.g., big-natural-tits, cum-on-tits, big-tits-milf, and perky-tits are related to Big Tits in Table 14).
The similarity of synonyms and explanatory terms is further confirmed in Table 16, where step-mom, step-sister, and step-siblings are similar to Step Fantasy. Also of particular interest is the relation between glasses and Reality in Table 16. In a PornHub marketing campaign, users were encouraged to fantasize about a glasses-wearing stranger they might encounter in real life by searching PornHub for the keywords “sexy girl glasses” and “nerdy girl glasses” [32].
The QWEN model further helps uncover semantically similar tags. Examples include the matching of hard-fuck/hardcore with Hardcore (Table 17), the numerous synonyms for large breasts and penises (Table 17), the additions to the Step Fantasy category (Table 17 and Table 18), and the explanatory terms for Cumshot (Table 17 and Table 18) and Amateur (Table 18).
Overall, tags, or at the very least popular tags, could be considered semantically “poor”. They provide multiple synonyms, etymologically similar terms, or additions that are largely conflated with the categorical taxonomic system, both in terms of video counts and shared views, as well as semantically.
However, there are cases where tags provide intermediary concepts that do not perfectly match categorical sets. For example, we previously observed that communities of tags containing medium-boobs matched categorical communities with Big Tits; yet the tag itself is found to be more similar to Small Tits. This conflation of concepts highlights the potential for tags to provide nuanced information that is lacking in the platform’s taxonomy. Nonetheless, as previously mentioned, this potential is largely tied to the overarching tendency for classification. At this point, it is important to note that these observations might be influenced by the context of being on a “tube” site, and there is qualitative evidence that sexual folksonomies can function effectively outside of this environment [33].
In general, videos related to physical traits and performance styles (e.g., Big Tits/big-boobs, Big Dick/big-cock) appear frequently across multiple categories and tags. Notably, the Big Ass category is highly prevalent (e.g., it ranked first in 2019 and 2021), yet it is nearly absent among tags.

5. Discussion

5.1. Interplay Between Categories and Tags

As demonstrated in Section 4, substantial overlap between categories and tags emerges primarily when the maximal community size allowed for both networks is large. In our experiments, the maximal community size was capped at 20 for both categories and tags, and most observable overlaps occurred when this cap was 18 or greater.
This suggests that a meaningful interplay between categories and tags is only observable when a broad selection is considered—essentially, when we “cast a very wide net”. However, such overlap is somewhat expected: the more categories and tags included, the more videos are captured in the aggregation, leading to higher overall view counts.
This indicates that categories and tags are not interchangeable. Their overlap is marginal under most conditions, and they do not appear to capture the same semantic dimensions. Consequently, tags and categories offer distinct types of information: they neither duplicate each other’s roles nor function in a complementary, hierarchical manner (i.e., tags do not refine categories, and categories do not generalize tags).

5.2. Naturalization of Tags

PornHub’s official categories and user-defined tags can be interpreted as two types of boundary objects [34], facilitating communication across different communities of practice. The platform-defined categories are top-down, standardized constructs intended to organize content uniformly. In contrast, user-generated tags emerge organically from community usage, reflecting subjective, personalized interpretations of content.
Since around 2020, we have observed a stabilization in both community tagging practices and category usage. This trend may reflect the process of naturalization [34], where users increasingly internalize and reproduce the system’s classifications through continued interaction. Over time, users adapt to and reinforce these socially and historically constructed categories, making them appear natural and self-evident.

5.3. QWEN vs. MiniLM

Although QWEN is a more recent model and ranks higher on HuggingFace benchmarks, its embedding similarities appear unusually high—averaging around 95%. In contrast, MiniLM produces average similarity scores closer to 69%.
While high similarity is not inherently problematic, this discrepancy suggests that MiniLM may differentiate between categories and tags more effectively. This may be due to architectural and training differences: MiniLM is optimized for short text spans (such as words, bigrams, and trigrams), whereas QWEN is designed to handle longer textual inputs (e.g., full sentences or paragraphs). MiniLM may therefore be better suited to the fine-grained semantic distinctions required in our use case.

5.4. Alternative Datasets

While PornHub is among the most visited adult content platforms and offers a rich dataset for research, alternative datasets have been considered. The most notable alternative in terms of scale is XVideos, which also publicly shares its metadata.
However, the XVideos dataset includes only user-generated tags and lacks formal category labels, making direct comparison with PornHub’s data infeasible. Nonetheless, it represents a promising direction for future research, particularly in understanding tag-based classification in the absence of formal categorization structures.

5.5. Beyond Pornographic Dataset

The selection of the dataset for this study was not motivated by its pornographic content, but rather by three key methodological criteria: (1) the dataset must be publicly accessible, (2) it must include a clearly defined taxonomy, and (3) it must contain user-generated folksonomies. These criteria were essential to ensure transparency, reproducibility, and the dual analysis of structured and unstructured classification systems.
Several alternative, non-pornographic platforms were evaluated but ultimately excluded because they failed to meet one or more of the specified requirements. A summary of these platforms and the reasons for their exclusion is provided in Table 21.
Future research may explore alternative datasets, either by obtaining access to restricted platforms or by developing methods to infer or reconstruct taxonomies and folksonomies from existing data sources.

5.6. Limitations

Despite the breadth of our analysis, several limitations must be acknowledged. These constraints stem from both methodological decisions and structural features of the available data, and they define the boundaries of our findings.
First, our dataset is exclusively derived from PornHub. This choice was not arbitrary: PornHub is currently the only major adult content platform that provides both user-generated tags and platform-defined category metadata in a publicly accessible format. Other large platforms, such as XVideos, offer tags but do not expose structured category data, making comparative or integrative analysis across platforms infeasible. As a result, our study may reflect platform-specific dynamics that do not generalize to the broader adult content ecosystem.
Second, the temporal component of our study remains underdeveloped. While we included some analysis over time, this aspect only scratches the surface. Time-based analysis in this context is particularly challenging due to two key factors: (1) the overall size and structure of PornHub’s video corpus has changed significantly over the years, which complicates normalization across time slices, and (2) there is a large anomaly in content production and consumption patterns during the years 2019–2022, likely attributable to COVID-19. These disruptions introduce noise that is difficult to control for without deeper historical metadata.
Third, we made an arbitrary but necessary methodological decision to restrict our analysis to the most popular tags (the top tags per year by total view count). This threshold was set to make the computational analysis tractable and interpretable; however, it is ultimately subjective. A more comprehensive analysis would ideally incorporate the entire tag set, including rare or niche tags, to capture the full semantic diversity of user-generated labeling. This would require significant scaling of our infrastructure and additional strategies to manage long-tail data sparsity.
Lastly, our use of large language models (LLMs) for semantic analysis was constrained by the availability of general-purpose models. While we selected models—such as QWEN and MiniLM—based on strong performance in recognized benchmarks, these models are not fine-tuned for pornography-specific content. A domain-adapted LLM trained on adult content metadata could, in principle, offer improved semantic resolution and more meaningful embeddings. However, no such model currently exists in the public domain, and training one falls outside the scope of this study.
These limitations point to several avenues for future research, including the development of pornography-specific language models, deeper temporal modeling, and the integration of cross-platform metadata.

6. Conclusions

This study investigated the interplay between PornHub’s fixed categorical taxonomy and its vast, user-generated folksonomy of tags, aiming to understand their semantic alignment, divergence, and complementary functions. By leveraging a large-scale dataset of over 97,000 videos and applying advanced graph-based community detection and NLP-based embedding techniques, we provided both quantitative and qualitative insights into how these two classification systems interact.
The results show a significant degree of overlap between categories and tags, particularly for high-frequency and high-visibility content. Tags frequently mirror platform categories—either as direct synonyms, orthographic variants, or semantically aligned phrases—suggesting that user tagging behavior often reinforces the platform’s editorial taxonomy. However, this alignment is not perfect. Tags also supplement categories by introducing finer-grained distinctions, such as body characteristics (e.g., medium-boobs), sexual positions, or specific aesthetics (e.g., stockings, glasses), which are often absent in the rigid category scheme. This supplementary role enhances discoverability and personalization but remains largely within the semantic boundaries set by the dominant taxonomy. Notably, NLP embedding results confirm that many of the top tag-category pairs are either identical or near-identical in meaning, with occasional divergences offering explanatory or context-rich additions.
Despite their potential for semantic enrichment, the most popular tags tend to exhibit limited conceptual novelty. They reflect dominant genre tropes and platform incentives, suggesting that user-driven folksonomies in this context are shaped as much by algorithmic visibility and market logic as by bottom-up cultural expression. Thus, while tags could theoretically subvert or diversify categorical meaning, in practice, they often reinforce existing classification structures.
In sum, this study illustrates that folksonomies and taxonomies on adult video platforms exist in a dynamic yet asymmetric relationship: tags tend to enrich but rarely challenge the platform-imposed order. These findings contribute to the understanding of hybrid classification systems in large-scale, user-driven content ecosystems and have broader implications for the design of recommendation engines, metadata management, and content moderation strategies in other domains.

Author Contributions

Conceptualization, J.S.; methodology, J.S.; software, J.S.; validation, J.S.; formal analysis, J.S., L.B. and Y.B.; investigation, J.S., L.B. and Y.B.; resources, J.S. and M.G.; data curation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.S., L.B., Y.B., M.G. and M.P.; visualization, J.S.; supervision, J.S., L.B., Y.B., M.G. and M.P.; project administration, J.S. and M.G.; funding acquisition, M.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

During the preparation of this manuscript/study, the authors used ChatGPT-4o (OpenAI GPT-4o) for the purposes of rephrasing, reformatting, proofreading, grammar, and vocabulary corrections. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chen, H.; Schuffels, C.; Orwig, R. Internet categorization and search: A self-organizing approach. J. Vis. Commun. Image Represent. 1996, 7, 88–102.
2. Mejia-Escobar, C.; Cazorla, M.; Martinez-Martin, E. Webpage Categorization Using Deep Learning. In Advances in Intelligent Systems and Computing, Proceedings of the 16th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2021), Bilbao, Spain, 22–24 September 2021; Springer: Cham, Switzerland, 2022; pp. 358–368.
3. Apandi, S.H.; Sallim, J.; Mohamed, R.; Ahmad, N. Automatic Topic-Based Web Page Classification Using Deep Learning. JOIV Int. J. Inform. Vis. 2023, 7, 2108–2114.
4. Vörös, T.; Bergeron, S.P.; Berlin, K. Web content filtering through knowledge distillation of large language models. In Proceedings of the 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Venice, Italy, 26–29 October 2023; pp. 357–361.
5. Bitsikokos, L. Living in the Hub: A Platform Study of Desire Semantics. Master’s Thesis, University of Chicago, Chicago, IL, USA, 2024.
6. Keilty, P. Tagging and sexual boundaries. KO Knowl. Organ. 2012, 39, 320–324.
7. Dutta, S.; Das, S.K. A Study of Subject Headings vs. User Generated Tags. SRELS J. Inf. Manag. 2022, 59, 177–182.
8. Vaidya, P.; Harinarayana, N.S. Comparison of User-generated Tags with Subject Descriptors, Author Keywords, and Title Terms of Scholarly Journal Articles: A Case Study of Marine Science. J. Inf. Sci. Theory Pract. 2020, 7, 29–38.
9. Pornhub Insights Team. Pornhub 2023 Year in Review. 2023. Available online: https://www.pornhub.com/insights/2023-year-in-review (accessed on 1 August 2025).
10. Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233.
11. Zhang, Y.; Li, M.; Long, D.; Zhang, X.; Lin, H.; Yang, B.; Xie, P.; Yang, A.; Liu, D.; Lin, J.; et al. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv 2025, arXiv:2506.05176.
12. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
13. Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; de Las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
14. Rama, I.; Bainotti, L.; Gandini, A.; Giorgi, G.; Semenzin, S.; Agosti, C.; Corona, G.; Romano, S. The platformization of gender and sexual identities: An algorithmic analysis of Pornhub. Porn Stud. 2023, 10, 154–173.
15. Mazières, A.; Trachman, M.; Cointet, J.P.; Coulmont, B.; Prieur, C. Deep tags: Toward a quantitative analysis of online pornography. Porn Stud. 2014, 1, 80–95.
16. Stegeman, H.M.; Velthuis, O.; Jokubauskaitė, E.; Poell, T. Hypercategorization and hypersexualization: How webcam platforms organize performers and performances. Sexualities 2023, 28, 118–136.
17. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013.
18. Eklund, P.; Goodall, P.; Wray, T.; Daniel, V.; Van Olffen, M. Folksonomy with practical taxonomy, a design for social metadata of the virtual museum of the pacific. In Proceedings of the 6th International Conference on Information Technology and Applications, Hanoi, Vietnam, 9–12 November 2009; pp. 112–117.
19. Chatterjee, S.; Das, R. Analysing and Examining Taxonomy and Folksonomy Terms in the Hybrid Subject Device using Machine Learning Techniques. DESIDOC J. Libr. Inf. Technol. 2022, 42, 154–167.
20. Price, D.; Patterson, R. Tracking adult content consumption: Pornhub as a case study. Media Psychol. 2019, 22, 123–138.
21. Rodriguez-Amat, J.R.; Belinskaya, Y. ‘No coronavirus can leave us without sex’: Relations of complicity and solidarity on Pornhub. Porn Stud. 2023, 10, 233–251.
22. Morichetta, A.; Trevisan, M.; Vassio, L.; Krickl, J. Understanding web pornography usage from traffic analysis. Comput. Netw. 2021, 189, 107909.
23. Nieborg, D.B.; Poell, T.; van Dijck, J. Platforms and platformization. In The SAGE Handbook of the Digital Media Economy; SAGE: Newcastle upon Tyne, UK, 2022; pp. 29–49.
24. Jenkins, H.; Deuze, M. Convergence culture, 2008. Convergence 2008, 14, 5–12.
25. Heise, A.H.H.; Hongladarom, S.; Jobin, A.; Kinder-Kurlanda, K.; Sun, S.; Lim, E.L.; Markham, A.; Reilly, P.J.; Tiidenberg, K.; Wilhelm, C. Internet Research: Ethical Guidelines 3.0. WildApricot. 2019. Available online: https://aoir.org/reports/ethics3.pdf (accessed on 1 July 2025).
26. Anuar, S.H.H.; Abas, Z.A.; Yunos, N.M.; Zaki, N.H.M.; Hashim, N.A.; Mokhtar, M.F.; Asmai, S.A.; Abidin, Z.Z.; Nizam, A.F. Comparison between Louvain and Leiden algorithm for network structure: A review. J. Phys. Conf. Ser. 2021, 2129, 012028.
27. Sawicki, J.; Ganzha, M.; Paprzycki, M. Application of Natural Language Processing And Temporal Networks To Analysis of Evolution of Reddit Communities. J. Autom. Mob. Robot. Intell. Syst. 2024, in press.
28. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008.
29. Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. arXiv 2020, arXiv:2002.10957.
30. Reimers, N.; Gurevych, I. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. arXiv 2020, arXiv:2004.09813.
31. Hinton, G.E.; Roweis, S. Stochastic neighbor embedding. Adv. Neural Inf. Process. Syst. 2002, 15.
32. Saunders, R. Bodies of Work: The Labour of Sex in the Digital Age, 1st ed., 2020 ed.; Palgrave Macmillan: London, UK, 2021.
33. Watson, B.M. A finding aid to the pornographic imaginary: Implications of amateur classifications on/by reddit’s NSFW411. Porn Stud. 2021, 8, 201–223.
34. Bowker, G.; Star, S.L. Sorting Things Out: Classification and Its Consequences; The MIT Press: Cambridge, MA, USA, 1999; Volume 4.
Figure 1. Histogram of views of the videos. The x-axis is the number of views, and the y-axis is the number of videos.
Figure 2. Graph showing filtered co-occurrence of categories in 2024.
Figure 3. Graph showing filtered co-occurrence of tags in 2024.
Figure 4. Graph showing filtered co-occurrence of tags (top 10% edges) in 2024.
Figure 5. Visualization of category embeddings using t-SNE (perplexity = 30).
Figure 6. Visualization of tag embeddings using t-SNE (perplexity = 30).
Figure 7. Histogram of cosine similarity for the MiniLM model.
Figure 8. Histogram of cosine similarity for the QWEN model.
Table 1. Top 50 category–tag pairs by category views fraction.
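| Period | Category | Tag | Views | Count (videos) | Tag Views Fraction | Category Views Fraction |
|---|---|---|---|---|---|---|
| 2020 | Cartoon | big-boobs | 131,235,391 | 55 | 0.00 | 0.16 |
| 2020 | Hunks | big-cock | 114,730,005 | 57 | 0.00 | 0.15 |
| 2024 | Muscle | big-cock | 10,424,222 | 10 | 0.00 | 0.15 |
| 2020 | Latino | latin | 37,550,065 | 16 | 0.00 | 0.15 |
| 2015 | Muscle | blowjob | 36,689,163 | 43 | 0.00 | 0.14 |
| 2018 | Cartoon | big-boobs | 38,109,808 | 24 | 0.00 | 0.14 |
| 2020 | Tattooed Men | big-cock | 39,377,400 | 23 | 0.00 | 0.14 |
| 2019 | Latino | latin | 18,228,194 | 14 | 0.00 | 0.14 |
| 2020 | Hentai | big-boobs | 30,394,027 | 13 | 0.00 | 0.14 |
| 2019 | Bukkake | cumshot | 70,353,261 | 32 | 0.00 | 0.14 |
| 2015 | Black | blowjob | 23,074,467 | 29 | 0.00 | 0.14 |
| 2018 | Japanese | asian | 272,358,578 | 126 | 0.04 | 0.13 |
| 2020 | Latino | big-cock | 33,774,829 | 15 | 0.00 | 0.13 |
| 2020 | Jock | big-cock | 78,627,460 | 39 | 0.00 | 0.13 |
| 2018 | Bukkake | cumshot | 86,417,178 | 39 | 0.00 | 0.13 |
| 2024 | Scissoring | lesbian | 53,572,077 | 25 | 0.03 | 0.13 |
| 2022 | Korean | asian | 50,081,577 | 13 | 0.01 | 0.13 |
| 2019 | Chinese | asian | 37,236,622 | 10 | 0.01 | 0.13 |
| 2015 | Daddy | cock-sucking | 23,626,815 | 27 | 0.00 | 0.13 |
| 2021 | Uncut | big-cock | 54,931,431 | 23 | 0.00 | 0.13 |
| 2021 | Cartoon | big-boobs | 138,874,747 | 47 | 0.00 | 0.13 |
| 2015 | Straight Guys | blowjob | 53,281,296 | 39 | 0.00 | 0.13 |
| 2016 | Hunks | blowjob | 77,739,115 | 83 | 0.00 | 0.13 |
| 2016 | Bareback | bareback | 93,178,665 | 96 | 0.12 | 0.13 |
| 2015 | Muscle | big-dick | 31,734,102 | 33 | 0.00 | 0.13 |
| 2024 | Solo Trans | cum | 26,335,640 | 14 | 0.01 | 0.13 |
| 2023 | Korean | asian | 67,835,607 | 23 | 0.01 | 0.12 |
| 2016 | Black | black | 7,328,768 | 15 | 0.00 | 0.12 |
| 2023 | Uncensored | big-boobs | 86,227,981 | 38 | 0.00 | 0.12 |
| 2021 | Solo Male | big-cock | 43,789,608 | 28 | 0.00 | 0.12 |
| 2016 | Solo Male | big-dick | 17,012,390 | 22 | 0.00 | 0.12 |
| 2016 | Hunks | anal | 75,084,356 | 80 | 0.01 | 0.12 |
| 2019 | Solo Male | big-cock | 81,943,926 | 43 | 0.00 | 0.12 |
| 2016 | Straight Guys | blowjob | 25,912,920 | 23 | 0.00 | 0.12 |
| 2022 | Group | bareback | 24,472,289 | 12 | 0.01 | 0.12 |
| 2017 | Bukkake | cum | 30,353,788 | 36 | 0.01 | 0.12 |
| 2020 | Bisexual Male | big-cock | 63,760,279 | 29 | 0.00 | 0.12 |
| 2024 | Cartoon | big-tits | 26,233,097 | 23 | 0.00 | 0.12 |
| 2023 | Hentai | big-boobs | 97,049,267 | 40 | 0.00 | 0.12 |
| 2020 | Solo Male | masturbate | 80,364,063 | 38 | 0.00 | 0.12 |
| 2017 | Bukkake | cumshot | 29,996,334 | 37 | 0.00 | 0.12 |
| 2015 | Group | blowjob | 12,109,558 | 18 | 0.00 | 0.12 |
| 2023 | Solo Male | big-dick | 30,499,048 | 19 | 0.00 | 0.12 |
| 2017 | Arab | big-boobs | 238,597,596 | 36 | 0.01 | 0.12 |
| 2020 | Straight Guys | big-cock | 24,245,749 | 13 | 0.00 | 0.12 |
| 2020 | Muscle | big-cock | 53,988,846 | 30 | 0.00 | 0.12 |
| 2017 | Japanese | asian | 119,831,293 | 115 | 0.05 | 0.12 |
| 2020 | Gay | big-cock | 279,495,884 | 139 | 0.00 | 0.12 |
| 2020 | Solo Male | big-cock | 77,568,836 | 43 | 0.00 | 0.12 |
| 2016 | Czech | czech | 76,749,883 | 45 | 0.02 | 0.12 |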
Table 2. Top 50 category–tag pairs by tag views fraction.
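| Period | Category | Tag | Views | Count | Tag Views Fraction | Category Views Fraction |
|---|---|---|---:|---:|---:|---:|
| 2015 | Interracial | blacked | 367,921,072 | 66 | 0.20 | 0.03 |
| 2015 | Pornstar | blacked | 363,689,856 | 65 | 0.20 | 0.00 |
| 2015 | Pornstar | 4k | 119,959,422 | 71 | 0.20 | 0.00 |
| 2015 | Big Dick | blacked | 354,194,143 | 64 | 0.19 | 0.01 |
| 2015 | Pornstar | naughtyamerica | 490,113,460 | 397 | 0.19 | 0.00 |
| 2015 | Pornstar | suck | 391,455,537 | 243 | 0.19 | 0.00 |
| 2016 | Pornstar | blacked | 389,491,431 | 76 | 0.19 | 0.00 |
| 2015 | Pornstar | russian | 249,794,672 | 245 | 0.19 | 0.00 |
| 2015 | Pornstar | deep-throat | 439,848,750 | 195 | 0.19 | 0.00 |
| 2016 | Interracial | blacked | 383,678,263 | 72 | 0.19 | 0.04 |
| 2015 | Pornstar | gagging | 670,263,444 | 272 | 0.19 | 0.01 |
| 2017 | Pornstar | naughtyamerica | 8,486,249 | 10 | 0.19 | 0.00 |
| 2015 | Pornstar | big | 72,228,121 | 55 | 0.18 | 0.00 |
| 2015 | Pornstar | pornstar | 570,675,295 | 541 | 0.18 | 0.00 |
| 2015 | Pornstar | czech | 614,430,083 | 471 | 0.18 | 0.01 |
| 2016 | Pornstar | naughtyamerica | 66,139,075 | 58 | 0.18 | 0.00 |
| 2015 | Pornstar | sensual | 467,751,039 | 258 | 0.18 | 0.00 |
| 2015 | Pornstar | huge-tits | 578,521,479 | 363 | 0.17 | 0.00 |
| 2015 | Pornstar | raven | 424,206,406 | 212 | 0.17 | 0.00 |
| 2015 | Pornstar | cum-on-tits | 218,527,119 | 127 | 0.17 | 0.00 |
| 2015 | Creampie | cum-in-pussy | 133,892,532 | 73 | 0.17 | 0.01 |
| 2015 | Pornstar | lingerie | 781,660,288 | 410 | 0.17 | 0.01 |
| 2015 | Pornstar | female-friendly | 438,968,260 | 284 | 0.17 | 0.00 |
| 2015 | Pornstar | natural-tits | 2,511,369,914 | 1829 | 0.17 | 0.02 |
| 2015 | Threesome | 3some | 1,127,683,439 | 686 | 0.17 | 0.06 |
| 2015 | Anal | ass-fuck | 1,423,940,245 | 1034 | 0.17 | 0.08 |
| 2015 | Pornstar | bubble-butt | 626,705,520 | 289 | 0.17 | 0.01 |
| 2015 | Pornstar | anal-sex | 145,535,435 | 113 | 0.17 | 0.00 |
| 2015 | Pornstar | cougar | 342,916,164 | 226 | 0.17 | 0.00 |
| 2015 | Pornstar | stockings | 659,676,972 | 338 | 0.17 | 0.01 |
| 2016 | Pornstar | czech | 604,811,600 | 421 | 0.17 | 0.00 |
| 2015 | Pornstar | rimming | 380,617,404 | 219 | 0.17 | 0.00 |
| 2016 | Pornstar | big | 44,748,282 | 38 | 0.17 | 0.00 |
| 2015 | Pornstar | big-dick | 3,223,427,373 | 1623 | 0.17 | 0.03 |
| 2015 | Pornstar | small-tits | 1,857,067,954 | 1223 | 0.17 | 0.02 |
| 2016 | Pornstar | anal-sex | 279,922,636 | 233 | 0.17 | 0.00 |
| 2015 | Pornstar | busty | 1,287,188,945 | 752 | 0.17 | 0.01 |
| 2015 | Pornstar | girl-on-girl | 970,612,616 | 671 | 0.17 | 0.01 |
| 2015 | Pornstar | fake-tits | 1,074,714,969 | 675 | 0.17 | 0.01 |
| 2015 | Pornstar | bbc | 765,592,910 | 382 | 0.17 | 0.01 |
| 2015 | Interracial | bbc | 765,420,047 | 418 | 0.17 | 0.06 |
| 2015 | Pornstar | teamskeet | 503,581,306 | 83 | 0.17 | 0.00 |
| 2015 | Pornstar | reverse-cowgirl | 1,394,876,190 | 763 | 0.17 | 0.01 |
| 2016 | Pornstar | sensual | 213,491,521 | 180 | 0.17 | 0.00 |
| 2015 | Pornstar | brazzers | 443,087,908 | 171 | 0.17 | 0.00 |
| 2015 | Pornstar | skinny | 1,290,307,822 | 622 | 0.17 | 0.01 |
| 2015 | Pornstar | kissing | 619,540,397 | 334 | 0.17 | 0.01 |
| 2015 | Pornstar | massage | 447,474,084 | 245 | 0.17 | 0.00 |
| 2016 | Pornstar | cum-on-tits | 204,938,239 | 118 | 0.17 | 0.00 |
| 2015 | Pornstar | pussy-licking | 1,391,653,934 | 806 | 0.17 | 0.01 |
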
Table 3. Top 10 category–tag pairs by category views fraction in 2015.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, MILF, Big Ass, Anal, Arab, Rough Sex, Interracial, Red Head, Italian | hd, blonde, bigcock, hardcore, smalltits, shaved, cumshot, doggy-style, missionary, skinny, petite, bigtits, teamskeet, step-sister, step-brother, step-siblings, compilation, tight, facialize | 16 | 19 | 10B | 0.39 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, Blonde, MILF, Massage, Big Ass, Arab, Interracial, Romantic, School (18+), Italian | hd, blonde, bigcock, hardcore, smalltits, shaved, cumshot, doggy-style, missionary, skinny, petite, bigtits, teamskeet, step-sister, step-brother, step-siblings, compilation, tight, facialize | 17 | 19 | 10B | 0.38 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, MILF, Big Ass, Anal, Arab, Rough Sex, Interracial, Red Head, Italian | doggy-style, cowgirl, blowjob, reverse-cowgirl, big-cock, facial, riding, big-dick, stockings, deep-throat, sloppy-blowjob, blacked, bbc, gagging, black, hairy-pussy, interracial, cheating, riding-cock, big-tits-milf | 16 | 20 | 9B | 0.39 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, MILF, Big Ass, Anal, Arab, Rough Sex, Interracial, Red Head, Italian | hd, group, blonde, bigcock, hardcore, smalltits, shaved, cumshot, doggy-style, missionary, facial, bigtits, teamskeet, step-siblings, compilation, tight, slim, facialize | 16 | 18 | 9B | 0.39 | 4B |
| Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Closed Captions, Big Tits, Blonde, MILF, Massage, Babe, Big Ass, Pussy Licking, Lesbian, Romantic, Red Head, Strap On, Italian, Tattooed Women | hd, blonde, bigcock, hardcore, smalltits, shaved, cumshot, doggy-style, missionary, skinny, petite, bigtits, teamskeet, step-sister, step-brother, step-siblings, compilation, tight, facialize | 20 | 19 | 10B | 0.38 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, Blonde, MILF, Massage, Big Ass, Arab, Interracial, Romantic, School (18+), Italian | hd, group, blonde, bigcock, hardcore, smalltits, shaved, cumshot, doggy-style, missionary, facial, bigtits, teamskeet, step-siblings, compilation, tight, slim, facialize | 17 | 18 | 9B | 0.39 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, Blonde, MILF, Massage, Big Ass, Arab, Interracial, Romantic, School (18+), Italian | doggy-style, cowgirl, blowjob, reverse-cowgirl, big-cock, facial, riding, big-dick, stockings, deep-throat, sloppy-blowjob, blacked, bbc, gagging, black, hairy-pussy, interracial, cheating, riding-cock, big-tits-milf | 17 | 20 | 9B | 0.38 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, MILF, Big Ass, Anal, Arab, Rough Sex, Interracial, Red Head, Italian | cowgirl, blowjob, reverse-cowgirl, big-cock, facial, riding, big-dick, stockings, deep-throat, blacked, bbc, gagging, black, hairy-pussy, interracial, cheating, bangbros, doggy, big-tits-milf | 16 | 19 | 9B | 0.39 | 4B |
| Big Dick, Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Big Tits, MILF, Big Ass, Anal, Arab, Rough Sex, Interracial, Red Head, Italian | doggy-style, cowgirl, blowjob, reverse-cowgirl, big-cock, riding, big-dick, stockings, blacked, bbc, gagging, black, interracial, cheating, bangbros, doggy, big-tits-milf | 16 | 17 | 9B | 0.39 | 4B |
| Brunette, Hardcore, Pornstar, Threesome, Popular With Women, Step Fantasy, Closed Captions, Big Tits, Blonde, MILF, Massage, Babe, Big Ass, Pussy Licking, Lesbian, Romantic, Red Head, Strap On, Italian, Tattooed Women | hd, group, blonde, bigcock, hardcore, smalltits, shaved, cumshot, doggy-style, missionary, facial, bigtits, teamskeet, step-siblings, compilation, tight, slim, facialize | 20 | 18 | 9B | 0.38 | 4B |

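In Tables 3 to 10 the last column, headed Views x Fraction, reads as the product of the Views and Tag Views Fraction columns: in the first row of Table 3, 10B × 0.39 = 3.9B, displayed as 4B. Because every column is rounded for display, recomputing from the printed values does not always reproduce the table (the last row of Table 3 gives 9B × 0.38 ≈ 3.4B, yet lists 4B), which suggests the score is computed from unrounded views. The sketch below illustrates that reading; the `CommunityPair` record, the helper names, and the toy numbers are hypothetical, and the interpretation of the header is an assumption rather than a definition given in this section.

```python
from dataclasses import dataclass


@dataclass
class CommunityPair:
    """One hypothetical row of Tables 3-10 (a category community paired with a tag community)."""
    category_community: str
    tag_community: str
    views: int            # the Views column, before rounding
    tag_fraction: float   # the Tag Views Fraction column

    @property
    def score(self) -> float:
        # Assumed reading of the "Views x Fraction" header: Views times Tag Views Fraction.
        return self.views * self.tag_fraction


def top_pairs(pairs, n=10):
    """Return the n pairs with the highest score, largest first."""
    return sorted(pairs, key=lambda p: p.score, reverse=True)[:n]


def to_billions(x: float) -> str:
    """Round to whole billions in the tables' display style, e.g. 3.9e9 -> '4B'."""
    return f"{round(x / 1e9)}B"


if __name__ == "__main__":
    pairs = [
        CommunityPair("A", "a", 10_000_000_000, 0.39),
        CommunityPair("B", "b", 9_450_000_000, 0.38),
        CommunityPair("C", "c", 12_000_000_000, 0.20),
    ]
    for p in top_pairs(pairs, n=2):
        print(p.category_community, to_billions(p.views), to_billions(p.score))
    # A 10B 4B
    # B 9B 4B
```

With these toy values, pair B has views that display as 9B yet a score that rounds to 4B, mirroring the apparent discrepancy in the last row of Table 3.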
Table 4. Top 10 category–tag pairs by category views fraction in 2016.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | big-cock, riding, facial, doggystyle, babe, blowjob, big-dick, ebony, black, deep-throat, huge-cock, blacked, bbc, deepthroat, creampie, gagging, interracial, hairy-pussy, ball-sucking, prone-bone | 19 | 20 | 11B | 0.43 | 5B |
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | big-cock, riding, facial, doggystyle, babe, blowjob, big-dick, ebony, black, huge-cock, blacked, bbc, deepthroat, creampie, gagging, interracial, hairy-pussy, ball-sucking, prone-bone | 19 | 19 | 11B | 0.43 | 5B |
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | big-cock, riding, facial, doggystyle, blowjob, big-dick, ebony, black, huge-cock, blacked, bbc, deepthroat, creampie, gagging, interracial, hairy-pussy, ball-sucking, prone-bone | 19 | 18 | 11B | 0.43 | 5B |
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | cowgirl, cumshot, step-sister, step-sis, blonde, skinny, booty, shaved, sislovesme, step-siblings, bigtits, hardcore, missionary, group, facialize, step-brother, smalltits, redhead, teamskeet, stepsis | 19 | 20 | 10B | 0.43 | 4B |
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | cowgirl, cumshot, step-sister, step-sis, blonde, skinny, booty, shaved, sislovesme, step-siblings, bigtits, missionary, group, facialize, step-brother, smalltits, redhead, teamskeet, stepsis | 19 | 19 | 10B | 0.44 | 4B |
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | cowgirl, cumshot, step-sister, step-sis, blonde, skinny, booty, shaved, sislovesme, step-siblings, bigtits, missionary, facialize, step-brother, smalltits, redhead, teamskeet, stepsis | 19 | 18 | 10B | 0.44 | 4B |
| Babe, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Hardcore, Orgy, Latina, Interracial, Threesome, FFM, Described Video | big-cock, riding, facial, doggystyle, babe, blowjob, big-dick, ebony, black, deep-throat, huge-cock, blacked, bbc, deepthroat, creampie, gagging, interracial, hairy-pussy, ball-sucking, prone-bone | 18 | 20 | 11B | 0.36 | 4B |
| Babe, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Hardcore, Orgy, Latina, Interracial, Threesome, FFM, Described Video | big-cock, riding, facial, doggystyle, babe, blowjob, big-dick, ebony, black, huge-cock, blacked, bbc, deepthroat, creampie, gagging, interracial, hairy-pussy, ball-sucking, prone-bone | 18 | 19 | 11B | 0.36 | 4B |
| Babe, Big Tits, MILF, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Ass, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Blonde, Hardcore, Latina, Threesome, Described Video | cumshot, step-sister, step-sis, blonde, skinny, booty, shaved, sislovesme, step-siblings, bigtits, missionary, facialize, step-brother, smalltits, redhead, teamskeet, stepsis | 19 | 17 | 9B | 0.44 | 4B |
| Babe, Pornstar, Teen (18+), Popular With Women, Closed Captions, Big Dick, Brunette, Small Tits, Rough Sex, Step Fantasy, Anal, Hardcore, Orgy, Latina, Interracial, Threesome, FFM, Described Video | big-cock, riding, facial, doggystyle, blowjob, big-dick, ebony, black, huge-cock, blacked, bbc, deepthroat, creampie, gagging, interracial, hairy-pussy, ball-sucking, prone-bone | 18 | 18 | 11B | 0.36 | 4B |

Table 5. Top 10 category–tag pairs by category views fraction in 2017.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | bigtits, brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, pov, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex, 4k, sloppy-blowjob | 20 | 20 | 13B | 0.35 | 5B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | bigtits, brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, pov, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex, 4k | 20 | 19 | 13B | 0.35 | 5B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, pov, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex, 4k | 20 | 18 | 13B | 0.35 | 5B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, pov, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex | 20 | 17 | 13B | 0.35 | 5B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex | 20 | 16 | 13B | 0.36 | 5B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | brunette, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, raw, close-up, doggy, babe, step-fantasy, hd, sex | 20 | 14 | 12B | 0.36 | 4B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | brunette, teasing, hardcore, cumshot, blowjob, cock-sucking, oral, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex | 20 | 15 | 12B | 0.36 | 4B |
| Brunette, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Threesome, Small Tits, Big Dick, Interracial, Orgy, Romantic, Latina, FMM, SFW | bigtits, brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, pov, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex, 4k, sloppy-blowjob | 19 | 20 | 13B | 0.31 | 4B |
| Brunette, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Threesome, Small Tits, Big Dick, Interracial, Orgy, Romantic, Latina, FMM, SFW | bigtits, brunette, point-of-view, teasing, hardcore, shaved, cumshot, blowjob, cock-sucking, pov, raw, close-up, doggy, babe, step-fantasy, cum-on-tits, hd, sex, 4k | 19 | 19 | 13B | 0.31 | 4B |
| Brunette, MILF, Pornstar, Popular With Women, Step Fantasy, Closed Captions, Blonde, Hardcore, Reality, Big Ass, Ebony, Babe, Threesome, Small Tits, Big Dick, Interracial, Romantic, Latina, FMM, SFW | point-of-view, blowjob, pov, hard-rough-sex, big-cock, reality, pov-blowjob, handjob, british, riding-dick | 20 | 10 | 12B | 0.35 | 4B |

Table 6. Top 10 category–tag pairs by category views fraction in 2018.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Closed Captions, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Role Play, Interracial, Red Head, Double Penetration, Threesome, Gangbang, SFW | blowjob, brunette, hardcore, deepthroat, hd, 4k, creampie, small-tits, big-dick, sex, asian, rimming, gagging, deep-throating, porhub, medium-boobs | 20 | 17 | 15B | 0.31 | 5B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Closed Captions, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Role Play, Interracial, Red Head, Double Penetration, Threesome, Gangbang, SFW | point-of-view, pov, big-cock, smalltits, skinny, stepsis, big, sislovesme, step-sister, step-brother, step-fantasy, shaved, bigcock, cumshot, redhead, step-siblings, bigtits, cum-shot, teamskeet, facialize | 20 | 20 | 15B | 0.31 | 5B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Closed Captions, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Role Play, Interracial, Red Head, Double Penetration, Threesome, Gangbang, SFW | point-of-view, pov, big-cock, smalltits, skinny, stepsis, big, sislovesme, step-sister, step-brother, step-fantasy, shaved, bigcock, cumshot, redhead, step-siblings, cum-shot, teamskeet, facialize | 20 | 19 | 15B | 0.31 | 5B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Closed Captions, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Role Play, Interracial, Red Head, Double Penetration, Threesome, Gangbang, SFW | blowjob, brunette, hardcore, deepthroat, cum-in-mouth, hd, 4k, creampie, small-tits, big-dick, sex, cum-on-face, pov-sex, gagging, sucking-dick, porhub, medium-boobs | 20 | 18 | 15B | 0.31 | 5B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Closed Captions, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Role Play, Interracial, Red Head, Double Penetration, Threesome, Gangbang, SFW | blowjob, brunette, hardcore, hd, 4k, creampie, small-tits, big-dick, sex, asian, pov-sex, porhub, medium-boobs | 20 | 20 | 14B | 0.31 | 5B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Closed Captions, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Role Play, Interracial, Red Head, Double Penetration, Threesome, Gangbang, SFW | blowjob, brunette, hardcore, cum-in-mouth, hd, 4k, creampie, small-tits, big-dick, sex, sucking-dick, porhub, medium-boobs | 20 | 19 | 14B | 0.31 | 4B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Interracial, Red Head, Double Penetration, Threesome, Gangbang, FFM, SFW | blowjob, brunette, hardcore, deepthroat, hd, 4k, creampie, small-tits, big-dick, sex, asian, rimming, gagging, deep-throating, porhub, medium-boobs | 19 | 17 | 15B | 0.29 | 4B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Interracial, Red Head, Double Penetration, Threesome, Gangbang, FFM, SFW | point-of-view, pov, big-cock, smalltits, skinny, stepsis, big, sislovesme, step-sister, step-brother, step-fantasy, shaved, bigcock, cumshot, redhead, step-siblings, bigtits, cum-shot, teamskeet, facialize | 19 | 20 | 15B | 0.29 | 4B |
| Babe, Blonde, Pornstar, POV, Small Tits, Popular With Women, Step Fantasy, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Interracial, Red Head, Double Penetration, Threesome, Gangbang, FFM, SFW | point-of-view, pov, big-cock, smalltits, skinny, stepsis, big, sislovesme, step-sister, step-brother, step-fantasy, shaved, bigcock, cumshot, redhead, step-siblings, cum-shot, teamskeet, facialize | 19 | 19 | 15B | 0.29 | 4B |
| Babe, Blonde, Pornstar, Popular With Women, Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Blowjob, Interracial, Double Penetration, Pussy Licking, Threesome, Gangbang, SFW | blowjob, brunette, hardcore, deepthroat, hd, 4k, creampie, small-tits, big-dick, sex, asian, rimming, gagging, deep-throating, porhub, medium-boobs | 16 | 17 | 15B | 0.29 | 4B |

Table 7. Top 10 category–tag pairs by category views fraction in 2019.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | brunette, blowjob, cowgirl, small-tits, big-dick, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, rimming, deep-throat, handjob, pussy-licking, hairy-pussy, suck, deep-throating | 20 | 19 | 19B | 0.27 | 5B |
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | brunette, blowjob, deepthroat, cowgirl, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, asian, rimming, deep-throat, handjob, pussy-licking, kissing, hairy-pussy, suck, deep-throating | 20 | 20 | 18B | 0.27 | 5B |
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | brunette, blowjob, cowgirl, small-tits, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, rimming, deep-throat, handjob, pussy-licking, kissing, hairy-pussy, suck | 20 | 18 | 18B | 0.27 | 5B |
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | brunette, blowjob, cowgirl, small-tits, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, rimming, deep-throat, handjob, pussy-licking, hairy-pussy, suck | 20 | 17 | 18B | 0.27 | 5B |
| Big Tits, Brunette, Hardcore, Pornstar, Big Dick, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Asian, Red Head, Massage, HD Porn, Cuckold, Parody | brunette, blowjob, cowgirl, small-tits, big-dick, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, rimming, deep-throat, handjob, pussy-licking, hairy-pussy, suck, deep-throating | 19 | 19 | 19B | 0.25 | 5B |
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | brunette, blowjob, cowgirl, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, rimming, deep-throat, handjob, pussy-licking, hairy-pussy, suck | 20 | 16 | 18B | 0.27 | 5B |
| Big Tits, Brunette, Hardcore, Pornstar, Big Dick, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Asian, Red Head, Massage, HD Porn, Cuckold, Parody | brunette, blowjob, deepthroat, cowgirl, facial, creampie, doggystyle, reverse-cowgirl, riding, missionary, lingerie, asian, rimming, deep-throat, handjob, pussy-licking, kissing, hairy-pussy, suck, deep-throating | 19 | 20 | 18B | 0.25 | 4B |

Table 8. Top 10 category–tag pairs by category views fraction in 2020.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | big-tits, stepmom, big-boobs, milf, cougar, stepson, step-fantasy, bigtits, busty, butt, curvy, mother, mom, older-younger | 20 | 14 | 15B | 0.28 | 4B |
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | big-cock, big-dick, facial, bbc, blonde, blacked, group, interracial, prone-bone, black, naughtyamerica, deep-throating | 20 | 12 | 16B | 0.28 | 4B |
| Big Ass, Big Tits, Brunette, Hardcore, Pornstar, Big Dick, Blonde, MILF, Step Fantasy, Reality, Threesome, Latina, Role Play, Arab, Small Tits, Massage, Closed Captions, HD Porn, Cuckold, Parody | big-tits, stepmom, big-boobs, milf, cougar, stepson, step-fantasy, taboo, bigtits, busty, fake-tits, blonde, curvy, step-mom, mother, mom, older-younger | 20 | 17 | 15B | 0.29 | 4B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+) | big-boobs, rough, butt, latina, big-ass-latina, big-natural-tits, big-ass, loud-moaning, big-booty, sucking-dick, college, latin, pornstar, bouncing-tits, hot-milf | 19 | 15 | 20B | 0.31 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+) | blonde, creampie, big-cock, big-dick, petite, hardcore, sex, small-tits, hd, oral, outdoors, oral-sex, casting, group, stepbrother, tight, cum-in-pussy, huge-cock | 19 | 18 | 20B | 0.31 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+) | big-boobs, big-tits, butt, latina, big-ass-latina, big-natural-tits, big-ass, couple, pawg, big-booty, latin, pornstar, cowgirl-riding | 19 | 13 | 19B | 0.32 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+) | blonde, creampie, big-cock, big-dick, petite, shaved-pussy, sex, small-tits, hd, 4k, interracial, oral, outdoors, oral-sex, casting, group, tight, cum-in-pussy, huge-cock | 19 | 19 | 20B | 0.31 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+), Vintage | blonde, creampie, big-cock, big-dick, petite, hardcore, sex, small-tits, hd, oral, outdoors, oral-sex, casting, group, stepbrother, tight, cum-in-pussy, huge-cock | 20 | 18 | 20B | 0.31 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+) | blonde, creampie, big-cock, big-dick, petite, hardcore, sex, small-tits, hd, step-siblings, step-sis, oral, casting, stepbrother | 19 | 14 | 20B | 0.31 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Role Play, Babysitter (18+) | blonde, creampie, big-cock, big-dick, petite, hardcore, sex, small-tits, hd, oral, outdoors, oral-sex, casting, group, stepbrother, tight, cum-in-pussy, huge-cock | 18 | 18 | 20B | 0.30 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+), Vintage | blonde, creampie, big-cock, big-dick, petite, shaved-pussy, sex, small-tits, hd, 4k, interracial, oral, outdoors, oral-sex, casting, group, tight, cum-in-pussy, huge-cock | 20 | 19 | 20B | 0.30 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+) | blonde, creampie, big-cock, big-dick, petite, shaved-pussy, submissive, sex, small-tits, skinny, hd, raw, outdoors, oral-sex, casting, group, tight, cum-in-pussy, bareback, huge-cock | 19 | 20 | 20B | 0.31 | 6B |
| Big Dick, Blonde, Pornstar, Step Fantasy, Big Tits, Brunette, Blowjob, Cumshot, Hardcore, POV, Verified Models, Big Ass, MILF, Latina, Small Tits, Muscular Men, Closed Captions, Role Play, Babysitter (18+), Vintage | big-boobs, rough, butt, latina, big-ass-latina, big-natural-tits, big-ass, loud-moaning, big-booty, sucking-dick, college, latin, pornstar, bouncing-tits, hot-milf | 20 | 15 | 20B | 0.30 | 6B |

Table 9. Top 10 category–tag pairs by category views fraction in 2021.
| Categories | Tags | Category Max Community Size | Tag Max Community Size | Views | Tag Views Fraction | Views × Fraction |
|---|---|---:|---:|---:|---:|---:|
| Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, MILF, Verified Models, Blowjob, POV, Step Fantasy, Blonde, Red Head, Small Tits, Role Play, Muscular Men, Ebony, Interracial, Casting, SFW | pov, brunette, blowjob, creampie, deepthroat, blonde, lingerie, pussy-licking, cowgirl, shaved-pussy, petite, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, hd, oral-sex | 20 | 19 | 18B | 0.31 | 6B |
| Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, MILF, Verified Models, Blowjob, POV, Step Fantasy, Blonde, Red Head, Small Tits, Role Play, Muscular Men, Ebony, Interracial, Casting, SFW | pov, blowjob, creampie, deepthroat, blonde, pussy-licking, cowgirl, shaved-pussy, petite, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, sex, suck, hd, oral-sex, teasing | 20 | 20 | 18B | 0.31 | 5B |
| Big Ass, Babe, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, Verified Models, Blowjob, POV, Step Fantasy, Red Head, Role Play, Ebony, Interracial, Casting, Hentai | pov, brunette, blowjob, creampie, deepthroat, blonde, lingerie, pussy-licking, cowgirl, shaved-pussy, petite, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, hd, oral-sex | 17 | 19 | 18B | 0.29 | 5B |
| Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, MILF, Verified Models, Blowjob, POV, Step Fantasy, Blonde, Red Head, Small Tits, Role Play, Muscular Men, Ebony, Interracial, Casting, SFW | pov, deep-throat, blowjob, creampie, deepthroat, blonde, pussy-licking, cowgirl, shaved-pussy, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, suck, teasing | 20 | 17 | 17B | 0.31 | 5B |
| Big Ass, Babe, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, Verified Models, Blowjob, POV, Step Fantasy, Red Head, Role Play, Ebony, Interracial, Casting, Hentai | pov, blowjob, creampie, deepthroat, blonde, pussy-licking, cowgirl, shaved-pussy, petite, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, sex, suck, hd, oral-sex, teasing | 17 | 20 | 18B | 0.29 | 5B |
| Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, MILF, Verified Models, Blowjob, POV, Step Fantasy, Blonde, Red Head, Small Tits, Role Play, Muscular Men, Ebony, Interracial, Casting, SFW | pov, deep-throat, blowjob, creampie, blonde, masturbation, pussy-licking, cowgirl, shaved-pussy, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, hd, oral-sex, teasing | 20 | 18 | 16B | 0.31 | 5B |
| Big Ass, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, MILF, Verified Models, Blowjob, POV, Step Fantasy, Blonde, Red Head, Small Tits, Role Play, Muscular Men, Ebony, Interracial, Casting, SFW | pov, deep-throat, blowjob, creampie, blonde, masturbation, pussy-licking, cowgirl, shaved-pussy, cum-in-mouth, ball-sucking, hardcore, rough-sex, big-dick | 20 | 14 | 16B | 0.31 | 5B |
| Big Dick, Brunette, Hardcore, Pornstar, Handjob, Verified Models, Blowjob, POV, Step Fantasy, Cumshot, Red Head, Small Tits, Role Play, Muscular Men, Interracial, Casting, Celebrity, Smoking, SFW | pov, brunette, blowjob, creampie, deepthroat, blonde, lingerie, pussy-licking, cowgirl, shaved-pussy, petite, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, hd, oral-sex | 19 | 19 | 18B | 0.28 | 5B |
| Big Ass, Babe, Big Dick, Big Tits, Brunette, Hardcore, Pornstar, Verified Models, Blowjob, POV, Step Fantasy, Red Head, Role Play, Ebony, Interracial, Casting, Hentai | pov, deep-throat, blowjob, creampie, deepthroat, blonde, pussy-licking, cowgirl, shaved-pussy, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, suck, teasing | 17 | 17 | 17B | 0.29 | 5B |
| Babe, Big Dick, Brunette, Hardcore, Pornstar, Rough Sex, Verified Models, Blowjob, POV, Step Fantasy, Red Head, Ebony, Interracial, Casting, Celebrity, Smoking | pov, brunette, blowjob, creampie, deepthroat, blonde, lingerie, pussy-licking, cowgirl, shaved-pussy, petite, cum-in-mouth, ball-sucking, hardcore, long-hair, rough-sex, big-dick, hd, oral-sex | 16 | 19 | 18B | 0.27 | 5B |

Table 10. Top 10 category–tag pairs by category views fraction in 2022.
Categories | Tags | Cat. Max Comm Size | Tag Max Comm Size | Views | Tag Views Fraction | Views x Fraction
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | pussy-licking, blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, brunette, riding, handjob, cum-in-mouth, suck, shaved, prone-bone, sloppy-blowjob, stockings | 20 | 17 | 19B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, facial, brunette, riding, handjob, cum-in-mouth, suck, shaved, cum-on-face, sloppy-blowjob | 20 | 16 | 19B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, brunette, riding, handjob, cum-in-mouth, suck, shaved, compilation, sloppy-blowjob | 20 | 15 | 19B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, facial, brunette, riding, handjob, cum-in-mouth, suck | 20 | 13 | 19B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, brunette, riding, handjob, cum-in-mouth, suck, shaved, stockings | 20 | 14 | 18B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | brazzers, pussy-licking, blowjob, cowgirl, missionary, natural-tits, doggystyle, bouncing-tits, cum-on-tits, deepthroat, lingerie, doggy, side-fuck, bald-pussy, bubble-butt, cum-in-mouth, cock-sucking, trimmed-pussy, stockings | 20 | 19 | 18B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Threesome, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, brunette, rough-sex, cum-in-mouth, suck, gagging | 20 | 12 | 18B | 0.32 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | pussy-licking, blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, brunette, riding, handjob, cum-in-mouth, suck, shaved, prone-bone, sloppy-blowjob, stockings | 19 | 17 | 19B | 0.30 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, facial, brunette, riding, handjob, cum-in-mouth, suck, shaved, cum-on-face, sloppy-blowjob | 19 | 16 | 19B | 0.30 | 6B
Big Dick, Big Tits, Hardcore, Pornstar, Role Play, Step Fantasy, Babe, Brunette, Blowjob, Small Tits, Pussy Licking, Verified Models, Handjob, Blonde, Interracial, Orgy, Casting, Muscular Men, Podcast | blowjob, reverse-cowgirl, cowgirl, missionary, doggystyle, cumshot, deepthroat, facial, brunette, riding, handjob, cum-in-mouth, suck | 19 | 13 | 19B | 0.30 | 5B
Table 11. Top 10 category–tag pairs by category views fraction in 2023.
Categories | Tags | Cat. Max Comm Size | Tag Max Comm Size | Views | Tag Views Fraction | Views x Fraction
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | blonde, brunette, deep-throating, blowjob, cum-shot, female-orgasm, pussy-licking, rough-sex, handjob, shaved-pussy, pov, stepsister, trimmed-pussy, lingerie, doggy-style, long-hair, ball-sucking | 19 | 17 | 15B | 0.32 | 5B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | blonde, brunette, deep-throating, blowjob, cum-shot, female-orgasm, pussy-licking, rough-sex, handjob, shaved-pussy, pov, lingerie, stockings, doggy-style, long-hair, deep-throat, ball-sucking, hand-job, naughtyamerica | 19 | 19 | 15B | 0.32 | 5B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | blonde, brunette, deep-throating, blowjob, cum-shot, pussy-licking, small-tits, rough-sex, handjob, shaved-pussy, pov, cock-sucking, 4k, lingerie, doggy-style, long-hair, czech, ball-sucking, sucking-dick, perky-tits | 19 | 20 | 15B | 0.32 | 5B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), Verified Models, Interracial, HD Porn, Podcast, Virtual Reality | blonde, brunette, deep-throating, blowjob, cum-shot, female-orgasm, pussy-licking, rough-sex, handjob, shaved-pussy, pov, stepsister, trimmed-pussy, lingerie, doggy-style, long-hair, ball-sucking | 20 | 17 | 15B | 0.31 | 5B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | deepthroat, facial, doggystyle, cum-in-mouth, brazzers, reverse-cowgirl, cum-on-face, cowgirl, riding, natural-tits, missionary, outdoors, bubble-butt, prone-bone, fake-tits, shaved, perfect-ass, riding-cock, suck, naughtyamerica | 19 | 20 | 14B | 0.32 | 5B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), Verified Models, Interracial, HD Porn, Podcast, Virtual Reality | blonde, brunette, deep-throating, blowjob, cum-shot, female-orgasm, pussy-licking, rough-sex, handjob, shaved-pussy, pov, lingerie, stockings, doggy-style, long-hair, deep-throat, ball-sucking, hand-job, naughtyamerica | 20 | 19 | 15B | 0.31 | 4B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), Verified Models, Interracial, HD Porn, Podcast, Virtual Reality | blonde, brunette, deep-throating, blowjob, cum-shot, pussy-licking, small-tits, rough-sex, handjob, shaved-pussy, pov, cock-sucking, 4k, lingerie, doggy-style, long-hair, czech, ball-sucking, sucking-dick, perky-tits | 20 | 20 | 15B | 0.31 | 4B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | big-ass, latina, hardcore, babe, pornstar, big-tits, oral-sex, sensual, couple, fetish, fingering, latin, massage, asian, kink, big-ass-latina, pussy-eating, lesbian, casting | 19 | 19 | 15B | 0.31 | 4B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | deepthroat, facial, doggystyle, cum-in-mouth, reverse-cowgirl, cum-on-face, cowgirl, riding, natural-tits, missionary, cock-sucking, outdoors, prone-bone, bj, shaved, perfect-ass, riding-cock, suck, older-younger | 19 | 19 | 14B | 0.33 | 4B
Big Ass, Babe, Big Tits, Blonde, Brunette, Blowjob, Hardcore, Pornstar, Cumshot, Ebony, Small Tits, Babysitter (18+), Big Dick, Role Play, Step Fantasy, POV, Old/Young (18+), HD Porn, Podcast | big-ass, latina, hardcore, babe, pornstar, big-tits, oral-sex, sensual, couple, fetish, fingering, massage, asian, kink, big-ass-latina, kissing, pussy-eating, lesbian, casting, girl-on-girl | 19 | 20 | 14B | 0.31 | 4B
Table 12. Top 10 category–tag pairs by category views fraction in 2024.
Categories | Tags | Cat. Max Comm Size | Tag Max Comm Size | Views | Tag Views Fraction | Views x Fraction
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Massage, Bukkake, Smoking, Czech, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, huge-tits, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 20 | 20 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Massage, Bukkake, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, huge-tits, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 18 | 20 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Bukkake, Smoking, Czech, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, huge-tits, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 19 | 20 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Massage, Bukkake, Smoking, Czech, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 20 | 19 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Massage, Bukkake, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 18 | 19 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Bukkake, Smoking, Czech, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 19 | 19 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, British, Verified Models, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Bukkake, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, huge-tits, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 17 | 20 | 10B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Massage, Bukkake, Smoking, Czech, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, deepthroat, cowgirl, facial, 3some, missionary, natural-tits, loud-moaning, point-of-view, doggy, brazzers, side-fuck, ball-sucking, prone-bone, suck, bj | 20 | 18 | 9B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, Verified Models, HD Porn, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Massage, Bukkake, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, deepthroat, cowgirl, facial, 3some, missionary, natural-tits, loud-moaning, point-of-view, doggy, brazzers, side-fuck, ball-sucking, prone-bone, suck, bj | 18 | 18 | 9B | 0.27 | 3B
Big Dick, Big Tits, Brunette, Hardcore, MILF, Pornstar, British, Verified Models, Blonde, Orgy, Interracial, Double Penetration, Gangbang, Bukkake, Cuckold, Bisexual Male, Vintage | doggystyle, bouncing-tits, titty-fuck, cum-on-tits, deepthroat, cowgirl, facial, 3some, missionary, natural-tits, point-of-view, doggy, brazzers, side-fuck, glasses, cum-in-pussy, ball-sucking, prone-bone, bj | 17 | 19 | 10B | 0.27 | 3B
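The last column of Tables 10 to 12 appears to be the product of the Views and Tag Views Fraction columns, rounded to whole billions (for example, 19B × 0.32 ≈ 6B in the first 2022 row). The short Python sketch below recomputes that product for the first row of each table as a sanity check. It assumes this definition of "Views x Fraction", which the tables do not state explicitly; because the printed inputs are themselves rounded, some rows may differ from the recomputed value by about 1B.

```python
"""Recompute the "Views x Fraction" column from the printed (rounded) table values.

Assumption: "Views x Fraction" = Views * Tag Views Fraction. This is inferred
from the numbers, not stated in the tables.
"""

# First data row of Tables 10-12 as printed:
# (year, views in billions, tag views fraction, printed "Views x Fraction" in billions)
FIRST_ROWS = [
    (2022, 19, 0.32, 6),
    (2023, 15, 0.32, 5),
    (2024, 10, 0.27, 3),
]


def views_x_fraction(views_billions: float, fraction: float) -> float:
    """Assumed definition of the last column: views scaled by the tag views fraction."""
    return views_billions * fraction


def check_rows(rows):
    """Return (year, recomputed product, printed value, agrees after rounding) per row."""
    results = []
    for year, views_b, frac, printed_b in rows:
        product = views_x_fraction(views_b, frac)
        # Compare at the precision the tables use (whole billions).
        results.append((year, product, printed_b, round(product) == printed_b))
    return results


if __name__ == "__main__":
    for year, product, printed_b, agrees in check_rows(FIRST_ROWS):
        status = "agrees" if agrees else "differs"
        print(f"{year}: {product:.2f}B recomputed vs {printed_b}B printed ({status})")
```

Rounding only at the comparison step keeps the recomputed product visible at full precision, which makes it easier to spot rows where the printed inputs' own rounding pushes the result across a billion boundary.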
Table 13. Best matching category for each tag by model MiniLM’s cosine similarity (places 0 to 54).
Tag | Category | Are Similar (LLM) | Cosine Similarity
asian | Asian | True | 1.00
big-dick | Big Dick | True | 1.00
compilation | Compilation | True | 1.00
creampie | Creampie | True | 1.00
ebony | Ebony | True | 1.00
female-orgasm | Female Orgasm | True | 1.00
massage | Massage | True | 1.00
milf | MILF | True | 1.00
pov | POV | True | 1.00
squirt | Squirt | True | 1.00
amateur | Amateur | True | 1.00
anal | Anal | True | 1.00
babe | Babe | True | 1.00
big-ass | Big Ass | True | 1.00
blowjob | Blowjob | True | 1.00
british | British | True | 1.00
casting | Casting | True | 1.00
cumshot | Cumshot | True | 1.00
fetish | Fetish | True | 1.00
fingering | Fingering | True | 1.00
interracial | Interracial | True | 1.00
latina | Latina | True | 1.00
lesbian | Lesbian | True | 1.00
masturbation | Masturbation | True | 1.00
reality | Reality | True | 1.00
small-tits | Small Tits | True | 1.00
step-fantasy | Step Fantasy | True | 1.00
threesome | Threesome | True | 1.00
bareback | Bareback | True | 1.00
big-tits | Big Tits | True | 1.00
black | Black | True | 1.00
blonde | Blonde | True | 1.00
brunette | Brunette | True | 1.00
czech | Czech | True | 1.00
group | Group | True | 1.00
handjob | Handjob | True | 1.00
hardcore | Hardcore | True | 1.00
pornstar | Pornstar | True | 1.00
public | Public | True | 1.00
pussy-licking | Pussy Licking | True | 1.00
rough-sex | Rough Sex | True | 1.00
russian | Russian | True | 1.00
huge-tits | Big Tits | True | 0.99
hard-rough-sex | Rough Sex | True | 0.99
3some | Threesome | True | 0.98
amateur-threesome | Threesome | True | 0.98
masturbate | Masturbation | False | 0.97
teen | Teen (18+) | False | 0.97
roleplay | Role Play | True | 0.97
amateur-blowjob | Blowjob | True | 0.97
girl-orgasm | Female Orgasm | True | 0.97
tits | Big Tits | True | 0.97
orgasm | Female Orgasm | True | 0.96
babes | Babe | True | 0.96
Table 14. Best matching category for each tag by model MiniLM’s cosine similarity (places 54 to 108).
Tag | Category | Are Similar (LLM) | Cosine Similarity
anal-sex | Anal | True | 0.96
latin | Latina | False | 0.95
young | Old/Young (18+) | True | 0.95
college | College (18+) | True | 0.95
big-natural-tits | Big Tits | True | 0.94
adult-toys | Toys | True | 0.94
ass | Big Ass | True | 0.94
blacked | Black | True | 0.94
ass-fuck | Big Ass | True | 0.94
sloppy-blowjob | Blowjob | True | 0.94
cum-shot | Cumshot | True | 0.94
teenager | Teen (18+) | True | 0.93
fake-tits | Big Tits | True | 0.93
tattoo | Tattooed Men | False | 0.93
older-younger | Old/Young (18+) | True | 0.93
big | Funny | False | 0.92
cum-on-tits | Big Tits | True | 0.92
real | Reality | True | 0.92
squirting | Squirt | True | 0.92
pussy | Pussy Licking | True | 0.92
hand-job | Handjob | True | 0.92
18-year-old | Old/Young (18+) | False | 0.92
couple | Verified Couples | True | 0.92
big-tits-milf | Big Tits | True | 0.92
perfect-ass | Big Ass | True | 0.92
hard-fuck | Funny | False | 0.92
big-butt | Big Ass | True | 0.91
suck | Funny | False | 0.91
big-boobs | Big Tits | True | 0.91
big-ass-latina | Latina | True | 0.90
natural-tits | Big Tits | True | 0.90
hot-milf | MILF | True | 0.90
female-friendly | Popular With Women | True | 0.90
cum | Cumshot | True | 0.90
squirting-orgasm | Female Orgasm | True | 0.90
tight | Strap On | False | 0.90
sucking-dick | Big Dick | True | 0.90
pov-sex | POV | True | 0.89
cum-in-pussy | Pussy Licking | True | 0.89
sex | Rough Sex | False | 0.89
rough | Rough Sex | False | 0.88
close-up | Funny | False | 0.88
hot | Funny | False | 0.88
big-cock | Big Dick | True | 0.88
amateur-couple | Verified Couples | False | 0.88
sexy | Funny | False | 0.88
side-fuck | Funny | False | 0.88
pov-blowjob | POV | True | 0.88
sensual | Romantic | True | 0.87
hd | HD Porn | False | 0.87
creamy-pussy | Pussy Licking | True | 0.87
point-of-view | Strap On | False | 0.87
pornohub | Pornstar | False | 0.87
perky-tits | Big Tits | True | 0.87
Table 15. Best matching category for each tag by model MiniLM’s cosine similarity (places 108 to 162).
Tag | Category | Are Similar (LLM) | Cosine Similarity
whooty | Funny | False | 0.87
tight-pussy | Pussy Licking | True | 0.87
bbc | British | True | 0.87
butt | Big Ass | True | 0.86
homemade | Exclusive | False | 0.86
huge-cock | Big Dick | True | 0.86
shaved-pussy | Pussy Licking | False | 0.86
18-year-cute-girl | Old/Young (18+) | True | 0.86
hairy-pussy | Pussy Licking | True | 0.86
cheating | Funny | False | 0.86
pussy-eating | Pussy Licking | True | 0.86
cum-mouth | Cumshot | True | 0.86
big-natural-boobs | Big Tits | True | 0.86
intense | Interactive | True | 0.86
medium-boobs | Big Tits | False | 0.85
wet-pussy | Pussy Licking | True | 0.85
deep-inside | Strap On | False | 0.85
skinny | Chubby | False | 0.85
cum-inside | Cumshot | True | 0.85
bouncing-tits | Big Tits | True | 0.85
slim | Chubby | False | 0.85
redhead | Red Head | True | 0.85
cum-on-face | Cumshot | True | 0.85
mom | Funny | False | 0.84
cum-in-mouth | Cumshot | True | 0.84
step-brother | Step Fantasy | True | 0.84
outside | Interactive | False | 0.84
cock-sucking | Masturbation | True | 0.84
outdoors | Interactive | False | 0.84
real-couple-homemade | Verified Couples | True | 0.84
bigcock | Big Dick | True | 0.84
taboo | Uncensored | True | 0.83
riding-dick | Big Dick | True | 0.83
bigtits | Big Ass | False | 0.83
mother | Funny | False | 0.83
big-booty | Big Ass | True | 0.83
oral-sex | Masturbation | True | 0.82
bald-pussy | Pussy Licking | False | 0.82
kissing | Masturbation | False | 0.82
curvy | Chubby | True | 0.82
titty-fuck | Big Tits | True | 0.82
4k | Funny | False | 0.82
trimmed-pussy | Pussy Licking | False | 0.82
stepbrother | Step Fantasy | True | 0.81
sislovesme | Funny | False | 0.81
submissive | Fetish | True | 0.81
extreme | Exclusive | True | 0.81
stepmom | Step Fantasy | True | 0.81
brazzers | BBW | False | 0.81
deepthroat | Behind The Scenes | False | 0.81
smalltits | Funny | False | 0.80
stepsis | Step Fantasy | True | 0.80
oral | Masturbation | True | 0.80
teamskeet | Group | True | 0.80
Table 16. Best matching category for each tag by model MiniLM’s cosine similarity (places 162 to 216).
Tag | Category | Are Similar (LLM) | Cosine Similarity
teasing | Interactive | True | 0.80
hottie | Reality | False | 0.80
gagging | Parody | False | 0.80
stepson | Step Fantasy | True | 0.80
raw | Exclusive | True | 0.79
doggy | Masturbation | False | 0.79
perfect-body | Strap On | False | 0.79
step-mom | Step Fantasy | True | 0.79
dirty-talk | Rough Sex | False | 0.79
facial | Reaction | True | 0.79
booty | Funny | False | 0.79
girl-on-girl | Popular With Women | True | 0.79
stepsister | Step Fantasy | True | 0.79
shaved | Uncut | False | 0.79
riding-cock | Big Dick | True | 0.79
hairy | Fetish | False | 0.78
facialize | Interactive | False | 0.78
drilled | Funny | False | 0.78
long-hair | Strap On | False | 0.78
lingerie | Lesbian | False | 0.78
kink | Fetish | True | 0.78
doggy-style | Pussy Licking | False | 0.78
riding | Strap On | True | 0.78
bj | BBW | False | 0.78
mgvideos | Vertical Video | False | 0.77
step-sister | Step Fantasy | True | 0.77
step-siblings | Step Fantasy | True | 0.77
glasses | Reality | False | 0.77
cowgirl | Babe | True | 0.77
bubble-butt | Big Ass | True | 0.76
bangbros | BBW | False | 0.76
missionary | Babe | False | 0.76
bang-bros | Group | True | 0.76
ball-sucking | Babe | False | 0.75
loud-moaning | Babe | False | 0.75
naughtyamerica | Exclusive | False | 0.75
petite | Solo Female | False | 0.74
raven | Funny | False | 0.74
familly-therapy | Exclusive | False | 0.74
porhub | Public | False | 0.74
stockings | Feet | True | 0.73
deep-throat | Exclusive | False | 0.73
busty | Babe | True | 0.73
doggystyle | Pussy Licking | False | 0.73
step-sis | Step Fantasy | True | 0.73
deep-throating | Exclusive | False | 0.73
rimming | Fingering | False | 0.73
pawg | Funny | False | 0.72
reverse-cowgirl | Babe | False | 0.70
prone-bone | Strap On | False | 0.70
cowgirl-riding | Popular With Women | False | 0.68
cougar | Chubby | False | 0.65
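Tables 13 to 16 pair each tag with the category whose embedding is closest under cosine similarity. A minimal pure-Python sketch of that matching step follows. It assumes the tag and category vectors are already computed (in the study they come from all-MiniLM-L6-v2 or Qwen3-Embedding-4B; here hypothetical three-dimensional toy vectors stand in), and it does not reproduce the separate LLM judgment in the "Are Similar (LLM)" column. The function names are illustrative, not taken from the authors' code.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy embeddings; real ones would come from an embedding model.
TOY_CATEGORIES: Dict[str, Vector] = {
    "Red Head": [1.0, 0.0, 0.0],
    "Step Fantasy": [0.0, 1.0, 0.0],
    "HD Porn": [0.0, 0.0, 1.0],
}
TOY_TAGS: Dict[str, Vector] = {
    "redhead": [0.9, 0.1, 0.0],
    "stepbrother": [0.1, 0.95, 0.05],
    "hd": [0.0, 0.2, 0.8],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matching_category(
    tag_vecs: Dict[str, Vector], cat_vecs: Dict[str, Vector]
) -> Dict[str, Tuple[str, float]]:
    """For each tag, pick the category with the highest cosine similarity."""
    matches = {}
    for tag, tv in tag_vecs.items():
        best_cat, best_sim = max(
            ((cat, cosine(tv, cv)) for cat, cv in cat_vecs.items()),
            key=lambda pair: pair[1],
        )
        # Round to two decimals, as in the printed tables.
        matches[tag] = (best_cat, round(best_sim, 2))
    return matches


def ranked(matches: Dict[str, Tuple[str, float]]) -> List[Tuple[str, str, float]]:
    """Rows (tag, category, similarity) sorted from most to least similar."""
    rows = [(tag, cat, sim) for tag, (cat, sim) in matches.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for tag, cat, sim in ranked(best_matching_category(TOY_TAGS, TOY_CATEGORIES)):
        print(f"{tag} -> {cat} ({sim:.2f})")
```

With real text, the vectors could be produced by, for example, the sentence-transformers library's `SentenceTransformer.encode`; for large vocabularies a single matrix product over normalized embeddings (e.g., with NumPy) would replace these Python loops.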
Table 17. Best matching category for each tag by model qwen’s cosine similarity (places 0 to 54).
Tag | Category | Are Similar (LLM) | Cosine Similarity
amateur | Amateur | True | 1.00
black | Black | True | 1.00
cum | Cumshot | True | 1.00
cum-inside | Cumshot | True | 1.00
cum-mouth | Cumshot | True | 1.00
cum-shot | Cumshot | True | 1.00
cumshot | Cumshot | True | 1.00
hand-job | Handjob | True | 1.00
handjob | Handjob | True | 1.00
hard-fuck | Hardcore | True | 1.00
hardcore | Hardcore | True | 1.00
milf | MILF | True | 1.00
pussy-eating | Pussy Licking | True | 1.00
pussy-licking | Pussy Licking | True | 1.00
russian | Russian | True | 1.00
glasses | Gaming | False | 1.00
pawg | Pussy Licking | False | 1.00
pov-blowjob | Pussy Licking | False | 1.00
pov-sex | Pussy Licking | False | 1.00
raven | Russian | False | 1.00
asian | Asian | True | 1.00
babe | Babe | True | 1.00
bareback | Bareback | True | 1.00
big-ass | Big Ass | True | 1.00
big-boobs | Big Tits | True | 1.00
big-booty | Big Ass | True | 1.00
big-butt | Big Ass | True | 1.00
big-cock | Big Dick | True | 1.00
big-dick | Big Dick | True | 1.00
big-tits | Big Tits | True | 1.00
bigcock | Big Dick | True | 1.00
blonde | Blonde | True | 1.00
blowjob | Blowjob | True | 1.00
british | British | True | 1.00
brunette | Brunette | True | 1.00
busty | Babe | True | 1.00
casting | Casting | True | 1.00
compilation | Compilation | True | 1.00
creampie | Creampie | True | 1.00
czech | Czech | True | 1.00
ebony | Ebony | True | 1.00
fetish | Fetish | True | 1.00
fingering | Fingering | True | 1.00
latina | Latina | True | 1.00
lesbian | Lesbian | True | 1.00
massage | Massage | True | 1.00
masturbation | Masturbation | True | 1.00
pornstar | Pornstar | True | 1.00
pov | POV | True | 1.00
reality | Reality | True | 1.00
roleplay | Role Play | True | 1.00
small-tits | Small Tits | True | 1.00
step-brother | Step Fantasy | True | 1.00
step-fantasy | Step Fantasy | True | 1.00
Table 18. Best matching category for each tag by model qwen’s cosine similarity (places 54 to 108).
Tag | Category | Are Similar (LLM) | Cosine Similarity
step-mom | Step Fantasy | True | 1.00
step-siblings | Step Fantasy | True | 1.00
step-sis | Step Fantasy | True | 1.00
step-sister | Step Fantasy | True | 1.00
stepmom | Step Fantasy | True | 1.00
stepson | Step Fantasy | True | 1.00
18-year-cute-girl | 180° | False | 1.00
18-year-old | 180° | False | 1.00
3some | 3D | False | 1.00
big | Big Ass | False | 1.00
blacked | Blonde | False | 1.00
kink | Korean | False | 1.00
kissing | Korean | False | 1.00
orgasm | Orgy | False | 1.00
pussy | Pissing | False | 1.00
anal | Anal | True | 1.00
anal-sex | Anal | True | 1.00
female-friendly | Female Orgasm | True | 1.00
female-orgasm | Female Orgasm | True | 1.00
hottie | Hunks | True | 1.00
interracial | Interracial | True | 1.00
public | Public | True | 1.00
redhead | Red Head | True | 1.00
squirt | Squirt | True | 1.00
threesome | Threesome | True | 1.00
bald-pussy | Brazilian | False | 1.00
bouncing-tits | Brazilian | False | 1.00
brazzers | Brazilian | False | 1.00
hairy | Hunks | False | 1.00
hd | HD Porn | False | 1.00
group | Group | True | 1.00
rough-sex | Rough Sex | True | 1.00
rough | Rough Sex | False | 1.00
squirting | Solo Female | False | 1.00
sucking-dick | Solo Male | False | 1.00
big-ass-latina | Big Ass | True | 1.00
big-natural-boobs | Big Tits | True | 1.00
big-natural-tits | Big Tits | True | 1.00
big-tits-milf | Big Tits | True | 1.00
bigtits | Big Tits | True | 1.00
hairy-pussy | Hentai | False | 1.00
cum-in-mouth | Cumshot | True | 1.00
cum-in-pussy | Cumshot | True | 1.00
cum-on-face | Cumshot | True | 1.00
cum-on-tits | Cumshot | True | 1.00
hard-rough-sex | Hardcore | True | 1.00
amateur-blowjob | Amateur | True | 1.00
amateur-couple | Amateur | True | 1.00
amateur-threesome | Amateur | True | 1.00
creamy-pussy | Creampie | True | 1.00
pornohub | Pornstar | False | 1.00
masturbate | Masturbation | False | 1.00
stepbrother | Step Fantasy | True | 1.00
intense | Interracial | False | 1.00
Table 19. Best matching category for each tag by model qwen’s cosine similarity (places 108 to 162).
Tag | Category | Are Similar (LLM) | Cosine Similarity
sensual | Solo Female | True | 1.00
suck | Solo Male | False | 1.00
smalltits | Small Tits | True | 1.00
gagging | Gaming | False | 1.00
familly-therapy | Fisting | False | 1.00
tattoo | Tattooed Men | False | 1.00
titty-fuck | Tattooed Women | False | 1.00
college | College (18+) | True | 1.00
teenager | Teen (18+) | True | 1.00
teen | Teen (18+) | False | 1.00
babes | Babysitter (18+) | False | 1.00
sloppy-blowjob | Solo Male | False | 1.00
squirting-orgasm | Solo Female | False | 1.00
tits | Tattooed Women | False | 1.00
doggystyle | Daddy | False | 1.00
4k | 60FPS | False | 1.00
missionary | Italian | False | 0.99
older-younger | Bareback | False | 0.99
riding | Casting | False | 0.99
riding-cock | Casting | False | 0.99
riding-dick | Casting | False | 0.99
huge-cock | Bareback | True | 0.99
huge-tits | Bareback | False | 0.99
oral | Italian | False | 0.99
oral-sex | Italian | False | 0.99
hot-milf | Hardcore | True | 0.99
hot | Hardcore | False | 0.99
dirty-talk | Old/Young (18+) | False | 0.99
adult-toys | Female Orgasm | False | 0.99
girl-on-girl | Gay | True | 0.99
girl-orgasm | Gay | False | 0.99
booty | Strap On | False | 0.99
butt | Bareback | True | 0.99
young | Female Orgasm | False | 0.99
ball-sucking | Bear | False | 0.99
sexy | Bareback | False | 0.99
sislovesme | Italian | False | 0.99
close-up | Exclusive | False | 0.99
outside | Exclusive | False | 0.99
tight | Rough Sex | False | 0.99
tight-pussy | Rough Sex | False | 0.99
bj | Ebony | False | 0.99
side-fuck | Step Fantasy | False | 0.99
lingerie | Bareback | False | 0.99
cock-sucking | Gangbang | False | 0.98
fake-tits | Old/Young (18+) | False | 0.98
teamskeet | Role Play | False | 0.98
deep-inside | Rough Sex | True | 0.98
deep-throat | Rough Sex | False | 0.98
deepthroat | Rough Sex | False | 0.98
deep-throating | Rough Sex | False | 0.98
cowgirl | German | False | 0.98
cowgirl-riding | German | False | 0.98
reverse-cowgirl | Rough Sex | False | 0.98
Table 20. Best matching category for each tag by model qwen’s cosine similarity (places 162 to 216).
Tag | Category | Are Similar (LLM) | Cosine Similarity
loud-moaning | Bareback | False | 0.98
bubble-butt | Massage | False | 0.98
slim | Blonde | True | 0.98
skinny | Strap On | False | 0.98
sex | Gay | False | 0.98
medium-boobs | Small Tits | False | 0.98
outdoors | Group | False | 0.98
bang-bros | Strap On | False | 0.98
bangbros | Strap On | False | 0.98
natural-tits | Exclusive | False | 0.98
wet-pussy | Hentai | False | 0.98
petite | HD Porn | False | 0.98
trimmed-pussy | Group | False | 0.98
perfect-ass | Rough Sex | False | 0.98
perfect-body | Rough Sex | False | 0.98
latin | Music | False | 0.98
stepsis | Step Fantasy | True | 0.98
stepsister | Step Fantasy | True | 0.98
rimming | Latina | False | 0.98
bbc | Music | False | 0.98
taboo | Masturbation | False | 0.98
perky-tits | Reality | False | 0.98
ass | Indian | False | 0.98
ass-fuck | Indian | False | 0.98
teasing | Gay | False | 0.98
cheating | Gangbang | False | 0.98
doggy-style | Party | False | 0.98
doggy | Party | False | 0.98
stockings | Bondage | False | 0.98
shaved-pussy | MILF | False | 0.98
shaved | MILF | False | 0.98
raw | Old/Young (18+) | False | 0.98
mgvideos | HD Porn | False | 0.98
mom | MILF | True | 0.97
drilled | Smoking | False | 0.97
prone-bone | Twink (18+) | False | 0.97
real | Rough Sex | True | 0.97
real-couple-homemade | Rough Sex | False | 0.97
facial | Ebony | False | 0.97
facialize | Ebony | False | 0.97
extreme | SFW | False | 0.97
naughtyamerica | MILF | True | 0.97
mother | Female Orgasm | False | 0.97
porhub | Parody | False | 0.97
point-of-view | Bondage | False | 0.97
homemade | Strap On | False | 0.97
submissive | testcategoryad | True | 0.97
cougar | Striptease | False | 0.97
couple | Striptease | False | 0.97
whooty | Brunette | False | 0.97
curvy | HD Porn | False | 0.96
long-hair | Virtual Reality | False | 0.94
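Tables 17 to 20 contain many tag–category pairs whose cosine similarity is at or near 1.00 but which the LLM marks as not similar (for example, glasses → Gaming). One way to quantify this, sketched below as an assumption rather than the paper's procedure, is to treat "similarity ≥ threshold" as a prediction and compare it with the LLM verdict. The six rows are copied from Tables 17, 18, and 20; the thresholds are arbitrary illustrations, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, bool, float]

# (tag, best-matching category, LLM says similar, cosine similarity), copied from
# Tables 17, 18 and 20 (Qwen embeddings); a small excerpt, not the full table.
EXCERPT: List[Row] = [
    ("hardcore", "Hardcore", True, 1.00),
    ("glasses", "Gaming", False, 1.00),
    ("hd", "HD Porn", False, 1.00),
    ("mom", "MILF", True, 0.97),
    ("curvy", "HD Porn", False, 0.96),
    ("long-hair", "Virtual Reality", False, 0.94),
]


def threshold_agreement(rows: List[Row], threshold: float) -> Tuple[Optional[float], float]:
    """Compare 'cosine >= threshold' with the LLM verdict.

    Returns (precision, agreement): precision is the share of above-threshold
    pairs that the LLM also judged similar (None if no pair passes the
    threshold); agreement is the share of all pairs where both judgments match.
    """
    predicted = [sim >= threshold for _, _, _, sim in rows]
    judged = [verdict for _, _, verdict, _ in rows]
    positives = [j for p, j in zip(predicted, judged) if p]
    precision = sum(positives) / len(positives) if positives else None
    agreement = sum(p == j for p, j in zip(predicted, judged)) / len(rows)
    return precision, agreement


if __name__ == "__main__":
    for t in (0.99, 0.95):
        precision, agreement = threshold_agreement(EXCERPT, t)
        shown = "n/a" if precision is None else f"{precision:.2f}"
        print(f"threshold {t}: precision={shown}, agreement={agreement:.2f}")
```

Applied to the complete Tables 13 to 20, the same function would give one precision/agreement pair per threshold for each embedding model; the excerpt here is only large enough to show the mechanics.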
Table 21. Assessment of non-pornographic platforms for dataset suitability.
Platform | Publicly Available | Taxonomy | Folksonomy
YouTube | No | Taxonomy | Folksonomy
Facebook | No | Groups | Hashtags
Instagram | No | Absent | Hashtags by creators
Reddit | Partially | Subreddit hierarchy | User-assigned flairs/tags
TikTok | No | Implicit (trending categories) | Hashtags by creators
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sawicki, J.; Bitsikokos, L.; Belinskaya, Y.; Ganzha, M.; Paprzycki, M. Leveraging Network Analysis and NLP for Intelligent Data Mining of Taxonomies and Folksonomies of PornHub. Appl. Sci. 2025, 15, 9250. https://doi.org/10.3390/app15179250


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
