Article

Systematic Evaluation of the Infrastructure of Free Content Websites: Network, Cloud, and Country-Level Security Analysis

1 Department of Digital Transformation Programs, Institute of Public Administration, Riyadh 11141, Saudi Arabia
2 Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
3 Information Systems Department, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
4 College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
5 Department of Computer Engineering, University College of Applied Sciences, Gaza P.O. Box 1415, Palestine
6 Department of Computer Science, Northeastern Illinois University, Chicago, IL 60625, USA
* Author to whom correspondence should be addressed.
Electronics 2026, 15(3), 497; https://doi.org/10.3390/electronics15030497
Submission received: 30 October 2025 / Revised: 15 December 2025 / Accepted: 1 January 2026 / Published: 23 January 2026
(This article belongs to the Special Issue Modeling and Performance Evaluation of Computer Networks)

Abstract

We statistically examine the global distribution of free content websites (FCWs) by analyzing their hosting network scale, cloud service provider, and country-level presence, both in aggregate and across specific content categories. These measurements are contrasted with those of premium content websites (PCWs) and with general websites sampled from the Alexa top-1M. We further evaluate their security characteristics using multiple security indicators. Our findings show that FCWs and PCWs are predominantly hosted in medium-scale networks, which are strongly associated with a high concentration of malicious websites. At the cloud and country level, FCW distributions follow heavy-tailed patterns that differ from those of PCWs. Beyond static distributions, our analysis also uncovers dynamic trends, where PCWs demonstrate improving security postures over time while FCWs reveal increasing maliciousness in several categories and hosting regions. This study contributes to understanding the FCW ecosystem through comprehensive quantitative analysis. The results suggest that the harm posed by malicious FCWs can potentially be contained through effective isolation and filtering, given their concentration at the network, cloud, and country levels, and that longitudinal monitoring is essential to capture their evolving risks.

1. Introduction

Websites are broadly classified into two categories based on payment requirements: free content websites (FCWs) and premium content websites (PCWs) [1]. As the name suggests, FCWs provide content such as books, music, movies, software, and games without requiring payment. PCWs, on the other hand, charge a premium for accessing similar types of content. Both categories are prevalent across the web, though FCWs are increasingly popular due to their accessibility and convenience. However, this very appeal exposes them to significant security and privacy threats. FCWs have become an integral part of the Internet, but their widespread use amplifies associated risks. Prior studies have shown that FCWs often lack robust privacy policies to protect users’ data and rights. Furthermore, FCWs host some of the most malicious content compared to both PCWs and the general web population (e.g., Alexa’s top million websites) [1,2,3,4].
Research Gap. Despite extensive research on the FCW ecosystem and its security implications, little attention has been paid to the infrastructure supporting FCWs and how it differs from that of PCWs. To understand the interplay between FCWs and Internet infrastructure, it is essential to: (1) investigate the networks that host FCWs and assess whether network size correlates with their security, (2) analyze the role of Cloud Service Providers (CSPs), including their attributes and associations with FCWs’ security posture, and (3) explore the spatial distribution of FCWs in comparison to PCWs, at the country level. This work addresses these gaps by modeling FCWs in contrast to PCWs and the general web population, thereby providing a better understanding of their ecosystem and associated risks.
Our Approach and Rationale. Our study evaluates several dimensions of the FCW ecosystem, each guided by distinct motivations. Examining the network characteristics of FCWs relative to PCWs provides insights into Internet-scale vulnerabilities. Analyzing the distribution of FCWs across networks, CSPs, and countries, especially in regions with concentrated malicious content, helps identify risks and informs mitigation strategies. Finally, studying FCW deployment at the country level highlights how national cybersecurity policies influence the prevalence and management of harmful content. We employ network analysis methods to characterize FCW hosting by classifying networks into four categories (small, medium, large, and very large) based on subnet masks. Subnet masks indicate the number of possible addresses reserved by hosting providers, which in turn reflects the number of publicly accessible hosts associated with FCWs.
Understanding the distribution of FCWs across networks is critical for designing practical defenses. Given that FCWs are reported to be more malicious than other website categories [1,4], identifying the most common network scales enables containment and isolation techniques to be applied more effectively. For example, if malicious FCWs are concentrated in small networks, isolating an entire network prefix may be the most effective strategy with minimal collateral disruption. Conversely, when malicious FCWs are hosted in very large networks, broad isolation becomes impractical, and filtering must instead target individual hosts. This perspective also supports prioritization under limited resources, allowing containment efforts to focus on the networks hosting the majority of malicious FCWs. Similarly, profiling FCWs at the CSP level provides a foundation for applying targeted risk-prevention procedures without inadvertently disrupting benign services.
Analyzing the geographical distribution of CSPs and hosting networks reveals how policies and regulations shape security. Understanding cross-border hosting patterns is vital, as FCWs often operate beyond the legal jurisdiction of their users. Users victimized by a malicious FCW hosted abroad may face significant challenges in pursuing legal remedies, such as requesting the removal of harmful or fraudulent content. These insights inform not only individual protection strategies but also regulatory and cooperative actions against CSPs linked to high concentrations of malicious FCWs.
Contributions. Building on a dataset of 1562 FCWs and PCWs from prior studies [1,4], augmented with multiple infrastructure and security dimensions and complemented with an independent sample from Alexa’s top one million websites [5], our work makes the following contributions:
  • Hosting and Network-Level Analysis. We systematically measure, analyze, and contrast the hosting patterns of FCWs and PCWs across the four network scales defined in our framework (Section 4.1). We further examine per-category hosting trends (Section 4.2), and quantify maliciousness using the indicators introduced in Section 3.3—Malicious Count (MC), Malicious Percentage (MP), and Malicious-Per-Feature Percentage (MPFP)—to evaluate how malicious websites are distributed across network sizes and to identify network scales associated with disproportionately higher malicious activity.
  • Hosting Networks Spatial Analysis. We identify malicious FCWs and PCWs and analyze their relationships with key infrastructure attributes. Beyond network scale, we enumerate the hosting countries associated with both website types and uncover heavy-tailed, highly concentrated geographical hosting patterns (Section 4.3).
  • Cloud Service Providers (CSPs) Analysis. We enumerate the major CSPs hosting FCWs and PCWs, characterize which providers host benign and malicious websites, and contrast these patterns with those in general web infrastructure using the Alexa-based benchmark dataset (Section 4.4).
  • Temporal Dataset Analysis. We perform a temporal reassessment of FCWs and PCWs using updated VirusTotal scans [6] to evaluate how maliciousness and hosting parameters—network scale, CSP distribution, and geographical location—change over time. This temporal dimension reveals the dynamic evolution of the FCW/PCW ecosystem and its implications for infrastructure-level security (Section 5).
The rest of the paper is organized as follows. Related work is presented in Section 2, data collection and methodology are described in Section 3, analysis results are reported in Section 4, temporal analysis is given in Section 5, a discussion is provided in Section 6, and conclusions are drawn in Section 7.

2. Related Work

Prior works have examined the security, privacy, and modeling of free content websites (FCWs) [1,3,7,8]. Other studies investigated the role of infrastructure, including the use of content management systems (CMSs) in FCWs [4], and explored the security and network-level characteristics of widely used websites [9,10,11,12,13,14,15,16,17,18]. In addition, prior works have statistically analyzed domain-specific security breaches in web services, such as those affecting healthcare providers, along with their associated network characteristics [19]. Related efforts have investigated the role of infrastructure in securing web services [20,21,22,23,24]. Furthermore, multiple studies have analyzed various security features of web infrastructure [25,26,27,28].
Free Content Websites Analysis. The security and privacy of FCWs have been a significant concern in recent research. Alabduljabbar et al. [3] examined FCW security through the analysis of SSL certificates, including their issuers, validity, and signatures. They assessed the authenticity of these certificates and evaluated their coverage terms and overall website security, finding that 36% of FCWs relied on invalid, expired, or fake SSL certificates. Alqadhi et al. [4] studied the impact of content management systems (CMSs) on FCW security. They built a database of CMSs used by FCWs and cross-validated it against online resources such as CMS-detector and W3Techs. Through frequency analysis of CMS-based versus custom-coded FCWs, coupled with VirusTotal annotations, they reported that over 56% of FCWs employed custom code and are more likely to be malicious. Their findings suggest that FCWs, relying on popular CMSs, may exhibit increased malicious behavior. The security of free hosting infrastructure has also been investigated. Roy et al. [7] analyzed phishing attacks hosted on free web hosting domains (FHDs), which can evade detection and takedown mechanisms by anti-phishing entities. Based on a large-scale analysis of 8800 FHD URLs shared on Twitter and Facebook, they showed that such phishing attacks remained active 1.5 times longer than regular phishing URLs, had 1.7 times lower coverage on blocklists, and took 3.8 times longer to be detected by security tools.
General Websites Analysis. Several studies have examined the security of widely used websites at scale. Kontaxis et al. [9] investigated cross-domain policies in Rich Internet Applications (RIAs) such as Microsoft Silverlight and Adobe Flash, which are extensively deployed but prone to malicious exploitation. Their study, conducted on Alexa’s top 100 K websites and the websites of Fortune 500 companies at both global and country levels, identified more than 6500 vulnerable websites exposed to cross-domain security threats. Li et al. [29] focused on malicious advertising activities within websites by analyzing 90,000 leading domains. They demonstrated how attackers infiltrated advertising networks and revealed the role of malicious nodes in online advertising, characterizing their behaviors and interactions in detail.
More recently, Lindkvist et al. [30] analyzed secure communication practices across malicious and benign domains. Their findings indicate that while HTTPS and strong cipher suites are often adopted, they do not necessarily guarantee trustworthiness, as phishing domains may exhibit stronger protections than many benign ones. In contrast, our work examines the broader ecosystem of websites, emphasizing hosting networks, cloud- and country-level distributions, and category-specific risks. Whereas their work highlights protocol-level practices, ours emphasizes structural and ecosystem-level characteristics.
Websites Content Analysis. Several works have investigated the relationships between content, usability, service quality, and website security. Figueras-Martín [31] analyzed website connectivity, relationships, and content within the Freenet darknet. The results revealed widespread website availability, key structural nodes in the network, and a predominance of illegal content. Samarasinghe et al. [32] studied privacy risks in religious websites and mobile apps, highlighting the extensive use of trackers that compromise user data and erode user trust. Chen et al. [33] examined the impact of migration stress on risky Internet behaviors, showing how it increased scam victimization among Chinese migrant workers. Hernandez-Suarez et al. [34] proposed a methodological approach using text transformers and dense neural networks to detect websites hosting infringing content. In contrast, our work emphasizes network-level affinities and distributions to better understand the ecosystem from an infrastructure perspective.
Network Security Analysis. Website infrastructure security is a fundamental factor in safeguarding networks, especially since 50% of all websites rely on specific content management systems [4,35]. Noroozian et al. [36] conducted a longitudinal study of broadband CSPs to evaluate their role in mitigating IoT malware, with a focus on Mirai. By analyzing infection rates across 342 global CSPs, they found that 55% of the observed variation was explained by the number of subscribers per CSP. Wickramasinghe et al. [18] analyzed the hosting patterns of malicious domains by examining the hosting types of IP addresses. Their results showed that more than 95% of malicious websites were hosted on regular hosting IPs, and 97.1% of these websites shared infrastructure with unrelated benign websites. They further identified Cloudflare, Amazon, Google, OVH, and Microsoft as the top five hosting providers of malicious domains. Their findings underscore the need for stronger security measures by hosting providers to safeguard shared hosting infrastructures.
The Role of Infrastructure in Website Security. Fryer et al. [37] investigated malicious web pages and proposed mitigation strategies that hosting providers might implement to strengthen their defenses. Liao et al. [38] examined long-tail search engine optimization (SEO) spam on cloud service providers (CSPs). By analyzing 15,774 cloud directories across 10 major providers, they identified 3186 abusive directories used for long-tail SEO spam. Their study revealed the monetization strategies of spammers and their evasion techniques, including obfuscation through link shorteners and client-side JavaScript in cases where server-side scripting was unavailable.
Tajalizadehkhoob et al. [39] explored the distribution of web security features and patching practices in shared hosting providers to assess their influence on website compromise. Wang et al. [23] investigated the growing consolidation of DNS and web hosting providers, a trend with significant implications for Internet security, reliability, and availability. Their findings showed that Amazon and Cloudflare exclusively host the name servers for more than 40% of domains, while only five organizations (Cloudflare, Amazon, Akamai, Fastly, and Google) collectively hosted approximately 62% of the Tranco top 10 K index pages along with most external resources. A comparison between our work and related studies, in terms of scope and focus, is presented in Table 1.

3. Methodology

In this section, we review our research questions (Section 3.1), the dataset and the data collection methods used for augmentation (Section 3.2), and the various distributional (Section 3.3) and temporal analysis dimensions (Section 3.4).

3.1. Research Questions

The main goal of this paper is to provide a systematic understanding of the hosting patterns of FCWs, their utilization of Internet infrastructure, and their contrast with PCWs and the general website population. Moreover, we undertake the task of identifying specific patterns in infrastructure utilization by considering the various content types (e.g., books, games, movies, music, or software). To this end, our pursuit raises several questions that we attempt to answer:
  • RQ1. What are the hosting patterns of malicious FCWs?
  • RQ2. Are there network-level patterns associated with malicious FCWs’ hosting?
  • RQ3. Is there an affinity between FCWs, PCWs, and their hosting at the country level?
  • RQ4. How do hosting patterns of FCWs and PCWs compare to those of general websites?
  • RQ5. What are the distributional characteristics of FCWs at major cloud providers?
  • RQ6. Do the security properties of FCWs and PCWs change over time?

3.2. Dataset and Data Collection

Our effort to address our research questions relies on several datasets: (1) a primary dataset of FCWs, PCWs, and their associated annotations, (2) two complementary datasets for augmenting the analysis of the primary dataset in terms of security (maliciousness detection) and network scale enumeration, and (3) a dataset representing the general website population to aid in our contrast analysis against FCWs, PCWs, and their infrastructure utilization. We review these datasets and describe how they were obtained.

3.2.1. Free and Premium Content Websites Dataset

The primary dataset of FCWs and PCWs used in this study consists of 1562 websites previously analyzed in [1,3,4]. The criteria for including a website in our list are inherited from prior work and based on three factors: (1) popularity, (2) language, and (3) activity. Popularity is assessed based on a website’s ranking in major search engines when it is returned as a result of keyword searches. Language is evaluated by ensuring that only websites with English as the primary language are retained. Activity is determined by checking whether a website returned by the search engine is online (live) at the time of analysis.
At the time of Alabduljabbar et al.’s work [1], all websites in their dataset were active. Additionally, their dataset construction ensured a balanced representation across different content categories, a criterion we also maintain.
In their original study, Alabduljabbar et al. [1] estimated website popularity using three search engines: Google, DuckDuckGo, and Bing. The classification of websites as FCWs or PCWs was determined through manual inspection. Each website was also manually categorized into one of five content types: books, games, movies, music, or software. Keywords such as “free,” “premium,” “paid,” and “pay-per-use” were used for classification.
After filtering websites according to these criteria, we query all domain names to extract their associated IP addresses. We verify that 1509 websites are still online (96.6% of the total dataset), with 788 classified as FCWs and 721 as PCWs. The content distribution among FCWs and PCWs is as follows: (1) Books: 144 (free) vs. 191 (premium), (2) Games: 78 (free) vs. 111 (premium), (3) Movies: 310 (free) vs. 152 (premium), (4) Music: 80 (free) vs. 86 (premium), (5) Software: 176 (free) vs. 181 (premium). The distribution of the websites is shown in Figure 1 (where # on the y-axis signifies the count).
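The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below only re-adds the per-category counts quoted above; the variable names are illustrative and are not part of the study's tooling.

```python
# Per-category counts quoted in the text, as (free, premium) pairs.
category_counts = {
    "books": (144, 191),
    "games": (78, 111),
    "movies": (310, 152),
    "music": (80, 86),
    "software": (176, 181),
}

total_collected = 1562  # size of the primary dataset before the liveness check

fcw_online = sum(free for free, _ in category_counts.values())
pcw_online = sum(premium for _, premium in category_counts.values())
online = fcw_online + pcw_online

# Share of the original dataset that was still reachable, in percent.
online_share = round(100 * online / total_collected, 1)

print(fcw_online, pcw_online, online, online_share)
```

The per-category sums reproduce the 788 FCWs and 721 PCWs reported above, and their total matches the 1509 online websites (96.6% of 1562).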
Annotation Consistency and Error Control. To ensure that website classification was not affected by annotator subjectivity, we employed a fixed, keyword-based coding protocol for distinguishing FCWs from PCWs and for assigning content categories. Keywords such as “free,” “premium,” “paid,” and “pay-per-use” were used to guide initial labeling, followed by manual inspection of each website’s access model and primary content type. All labels were then cross-verified in a second independent pass to reduce individual annotator bias and ensure consistency. This procedure has been validated in prior FCW studies and provides reliable internal error control for the dataset construction process.
Rationale for Manual, Rule-Based Classification. In line with prior FCW studies, we adopt a rule-based classification protocol in which websites are labeled using explicit keyword criteria followed by manual verification of the website’s access model and content semantics. Machine learning–based classification was not employed because distinguishing FCWs from PCWs requires semantic and contextual judgments that current automated models do not accurately infer without first depending on manually labeled training data. Relying on a controlled, keyword-driven labeling process therefore avoids subjectivity drift, preserves consistency with earlier datasets, and ensures high-precision annotation aligned with the business-model definitions central to this study.

3.2.2. Malicious Websites Annotation

A primary goal of this work is to examine the latent variables that may help explain the lax security in FCWs, particularly concerning their usage of Internet infrastructure. To start, we use VirusTotal [6] to identify whether a website is malicious or benign at the time of the analysis. VirusTotal (https://www.virustotal.com/gui/home/upload) is an online tool that integrates more than 70 scanning engines to determine whether a domain name (URL), IP address, or binary (identified by the hash value of its contents) is malicious or benign. It enables us to identify malicious IP addresses, domains, and URLs associated with the websites used in this study, and we augment the collected data with its output. Since VirusTotal returns multiple detection results, we consider an entity (a website or an IP address) to be malicious if at least one of the returned scan results is marked as malicious. In this study, we also carry out a comprehensive re-evaluation of the dataset’s security to ascertain the temporal stability of FCWs and PCWs, that is, whether the security posture associated with FCWs and PCWs remained constant over time or exhibited variability.
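The labeling rule described above (an entity is malicious if at least one engine flags it) can be sketched as follows. This is a minimal illustration: the verdict dictionary, the engine names, and the verdict strings are hypothetical stand-ins for parsed scan output and do not reproduce VirusTotal's actual response schema.

```python
from typing import Mapping


def is_malicious(engine_verdicts: Mapping[str, str]) -> bool:
    """Return True if at least one engine reports the entity as malicious.

    `engine_verdicts` maps a (hypothetical) engine name to a verdict string.
    An empty mapping yields False, since no engine flagged the entity.
    """
    return any(verdict == "malicious" for verdict in engine_verdicts.values())


# Hypothetical parsed results for one website.
example_scan = {
    "engine_a": "harmless",
    "engine_b": "malicious",
    "engine_c": "undetected",
}

print(is_malicious(example_scan))
```

A single-flag threshold favors recall over precision: one positive verdict among many engines is enough to label the website malicious.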
Detection Reliability and Cross-Validation. It is important to note that our analysis does not rely on a single detection engine. VirusTotal aggregates the results of approximately 90 independent security scanners, including antivirus engines, URL classifiers, and behavioral analysis systems. Accordingly, the malicious or benign labels used in this study reflect a multi-engine consensus rather than the judgment of any individual detector. This ensemble-based approach is widely adopted in prior large-scale web security measurement studies due to its robustness and its ability to reduce false positives or false negatives that may arise from individual classifiers. Given the breadth and diversity of VirusTotal’s engines, incorporating additional external tools would provide only marginal benefit and is unlikely to materially affect the correctness of our annotations. Nevertheless, future work may explore the integration of complementary services that provide orthogonal detection signals for longitudinal validation.

3.2.3. Network Scale Enumeration

Another goal of this work is to understand the scale of the network infrastructure associated with FCWs and PCWs, which leads us to define network scale. To identify the scale of the network for each website in our dataset, we use the IP address associated with its domain as an analysis feature. Then, we utilize two major APIs, https://ipdata.co/ [40] and https://en.ipshu.com/ [41], to extract intelligence related to the given IP address, such as domain name, subnet mask, cloud service provider, and geographical location, for further augmenting our dataset with scale information. The subnet mask for each IP address is extracted to identify the network scale for each website. Then, each website is classified according to the network scale using the CIDR (Classless Inter-Domain Routing) notation as follows: (1) small networks: any network with a prefix length between /25 and /32, (2) medium networks: any network between /16 and /24, (3) large networks: any network between /8 and /15, and (4) very large networks: any network with a prefix length of /7 or shorter. These classes correspond to address-space sizes of $2^{0}$ to $2^{7}$, $2^{8}$ to $2^{16}$, $2^{17}$ to $2^{24}$, and $2^{25}$ or more addresses, respectively. The characteristics of the network scales are presented in Table 2.
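The mapping from prefix length to network scale can be expressed compactly. The sketch below assumes IPv4 prefixes (0 to 32 bits); the function names are illustrative and are not taken from the study's code.

```python
def addresses_in_prefix(prefix_len: int) -> int:
    """Number of IPv4 addresses covered by a CIDR prefix of the given length."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_len)


def network_scale(prefix_len: int) -> str:
    """Map an IPv4 prefix length to the four scales defined in the text."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    if prefix_len >= 25:   # /25 to /32
        return "small"
    if prefix_len >= 16:   # /16 to /24
        return "medium"
    if prefix_len >= 8:    # /8 to /15
        return "large"
    return "very large"    # /7 and shorter


for prefix in (32, 25, 24, 16, 15, 8, 7):
    print(f"/{prefix}", network_scale(prefix), addresses_in_prefix(prefix))
```

The boundary prefixes in the loop show that each class spans the address ranges stated above, for example /24 (medium) covers 256 addresses and /16 (medium) covers 65,536.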
Justification of Subnet-Mask–Based Network Sizing. The subnet mask provides a standard, protocol-level indicator of network size under CIDR notation, as it directly determines the number of publicly routable IP addresses allocated to a hosting entity. Accordingly, using prefix lengths to define small, medium, large, and very large networks offers a concrete and reproducible measure of network scale. Our empirical analysis in Section 4 further validates this choice: maliciousness rates, hosting concentration, and category-specific patterns exhibit strong and consistent correlations with these CIDR-derived scales. These results demonstrate that subnet-mask-based network sizing is both an appropriate and sufficient method for characterizing hosting infrastructure within the scope of our study.
Determining the Number of Network Scales. For each website, we obtain the CIDR subnet mask and prefix length associated with its hosting IP address using the ipdata and IPSHU APIs. These prefix lengths are then mapped into the four network-scale classes defined in Table 2 (small, medium, large, and very large). The number of prefixes reported in the analysis corresponds to the number of distinct CIDR blocks present in the dataset, and each website is assigned a scale based on the prefix length of the block it belongs to. This process directly reflects the actual IPv4 allocation sizes and ensures that the network-scale counts are grounded in the underlying routing and allocation structure.
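Counting distinct CIDR blocks, as described above, amounts to normalizing each (IP address, prefix length) pair to its enclosing network and de-duplicating. The following sketch uses Python's standard `ipaddress` module; the records are hypothetical and use documentation-range addresses rather than data from the study.

```python
import ipaddress
from collections import Counter

# Hypothetical (hosting IP, prefix length) pairs, e.g., as returned by an
# IP-intelligence lookup. These are not records from the actual dataset.
records = [
    ("192.0.2.10", 24),
    ("192.0.2.200", 24),   # same /24 block as the previous record
    ("198.51.100.7", 25),
    ("203.0.113.5", 16),
]


def cidr_block(ip: str, prefix_len: int) -> ipaddress.IPv4Network:
    """Return the CIDR block containing `ip` for the given prefix length."""
    # strict=False masks the host bits instead of raising an error.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


# Number of websites observed in each distinct block.
sites_per_block = Counter(cidr_block(ip, plen) for ip, plen in records)
distinct_blocks = len(sites_per_block)

for block, n_sites in sites_per_block.items():
    print(block, block.prefixlen, n_sites)
print("distinct blocks:", distinct_blocks)
```

Each block's `prefixlen` can then be fed into whatever prefix-to-scale mapping is in use, so that every website inherits the scale of the block it belongs to.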
On the Use of WHOIS Data. Although WHOIS information can provide domain-level registration metadata, it is not used in this study because such records are often incomplete, inconsistent across registrars, and heavily redacted due to privacy regulations. Moreover, WHOIS fields are not reliably aligned with hosting infrastructure attributes such as subnet allocation, network size, or CSP ownership. Our framework focuses on IP-level features obtained via the ipdata and IPSHU APIs, which supply standardized and machine-readable subnet, ASN, CSP, and geolocation information. These attributes directly support the infrastructure-focused analyses conducted in this work and provide higher reliability for large-scale measurement.

3.2.4. General Websites Sample

A benchmark dataset that represents the general web ecosystem is needed to put the infrastructure utilization of FCWs and PCWs in context. To this end, an unbiased random sample of 2400 websites from Alexa’s top one million websites dataset [5] was generated and used. To ensure that the measured characteristics of the sampled websites represent the larger population, we fixed the margin of error and the confidence level to 2% and 95%, respectively, which yields a sample size of approximately 2400. By definition, the required sample size changes only insignificantly as the population grows. Therefore, the number of samples is kept at 2400. In the subsequent analysis, we refer to this dataset as “general”.
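One common way to arrive at a sample size close to 2400 for these parameters is Cochran's formula for proportions, optionally with a finite population correction. The text does not state which formula was used, so the sketch below is an assumption; the z-score of 1.96 (for 95% confidence) and the worst-case proportion p = 0.5 are also assumed values.

```python
def cochran_sample_size(z: float, margin: float, p: float = 0.5) -> float:
    """Cochran's sample size for estimating a proportion (infinite population)."""
    return (z ** 2) * p * (1 - p) / margin ** 2


def finite_population_correction(n0: float, population: int) -> float:
    """Adjust an infinite-population sample size for a finite population."""
    return n0 / (1 + (n0 - 1) / population)


# Assumed inputs: z = 1.96 (95% confidence), 2% margin of error, p = 0.5.
n0 = cochran_sample_size(z=1.96, margin=0.02)
n_fpc = finite_population_correction(n0, population=1_000_000)

print(round(n0), round(n_fpc))
```

Under these assumptions the uncorrected size is about 2401 and the corrected size for a population of one million is about 2395, which illustrates why the required sample barely changes as the population grows.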
As in the pre-processing and augmentation procedures of FCWs and PCWs, general websites are checked for activity (i.e., whether they are online or offline). As a result, only 2057 websites are found to be online, corresponding to 85.7% of the sample, compared to 96.6% for the final dataset of FCWs and PCWs. We then extracted the CSPs, countries, and subnet mask information of each sample using the ipdata API. Moreover, VirusTotal is used to identify malicious websites and their concentrations among different network scales for comparison.

3.3. Distribution Analysis Dimensions

We employ a statistical analysis approach to identify patterns and statistical differences between FCWs, PCWs, and general websites across various analytical dimensions. Our analysis focuses on eight key dimensions: network scale, CSP, country, count, percentage, malicious count (MC), malicious percentage (MP), and malicious per feature percentage (MPFP). We define each of these dimensions below in detail. The workflow of this analysis is illustrated in Figure 2.
Network Scale. The network scale analysis is based on the network scale feature, defined in Section 3.2.3. This feature signifies the network size where FCWs, PCWs, and the general websites reside. Based on the annotation in Section 3.2.3, this feature has four valid values: small, medium, large, and very large.
Cloud Service Provider (CSP). CSP signifies the cloud service provider where FCWs, PCWs, and general websites reside. Based on the analysis presented later in Section 4, this feature has 298 valid values (service providers’ names).
Country. This feature indicates the country name where the infrastructure (driven by IP allocation) of FCWs, PCWs, and general websites is located. Our analysis ultimately reveals that this feature has 41 distinct valid values across various countries.
Count. Count signifies the number of websites residing within the assigned entity type: network scale, CSP, or country.
Percentage. This feature signifies the count of FCWs, PCWs, or general websites residing in a given entity type, normalized by the total number of the studied websites for that given website type. This feature is used to understand the variance in the distribution of websites with respect to the studied feature.
Malicious Count (MC). This feature indicates the number of malicious websites residing within a specific infrastructure entity, based on the studied feature (CSP, country, or network size). Maliciousness is determined using VirusTotal scan results, as highlighted in Section 3.2.2.
Malicious Percentage (MP). This feature signifies the normalized malicious website count for the studied feature (i.e., country, CSP, network scale) over the sample’s total malicious count (i.e., total malicious websites in FCWs, PCWs, both, or general websites). In essence, this feature highlights the contribution of a specific infrastructure entity, among all other entities, to the maliciousness ascribed to the entity type (country, CSP, or network scale). Namely, $\mathrm{MP} = \frac{\mathrm{MC}}{\text{Total \# Malicious Websites}}$.
Malicious Per Feature Percentage (MPFP). This feature signifies the number of malicious websites normalized by the number of websites residing in the given infrastructure entity (country, CSP, or network scale). It describes the contribution of the studied entity to the malicious website population, considering its relative size in our dataset. Compared to the MC dimension, which characterizes an entity’s contribution to overall maliciousness, MPFP normalizes this quantity by the total number of websites residing in the given entity to address the fact that different entities could have vastly different scales. In contrast to what MC alone indicates, MPFP shows that an entity large in scale could contribute very little to the maliciousness once this scale is considered. Namely, $\mathrm{MPFP} = \frac{\mathrm{MC}}{\text{\# Websites in the Given Entity}}$.
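The difference between MC, MP, and MPFP is easiest to see on a small example. The sketch below follows the textual definitions above, with MP normalized by the total number of malicious websites and MPFP normalized by the number of websites in the same entity. The toy records and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (entity, is_malicious). The entity could be a
# network scale, a CSP, or a country; network scales are used here.
sites = [
    ("medium", True), ("medium", True), ("medium", False), ("medium", False),
    ("large", True), ("large", False),
    ("small", False), ("small", False),
]


def maliciousness_metrics(records):
    """Compute count, percentage, MC, MP, and MPFP per entity (in percent)."""
    totals = Counter(entity for entity, _ in records)
    mc = Counter(entity for entity, malicious in records if malicious)
    total_malicious = sum(mc.values())
    metrics = {}
    for entity, count in totals.items():
        metrics[entity] = {
            "count": count,
            "percentage": 100 * count / len(records),
            "MC": mc[entity],
            # Share of all malicious websites hosted in this entity.
            "MP": 100 * mc[entity] / total_malicious if total_malicious else 0.0,
            # Share of this entity's own websites that are malicious.
            "MPFP": 100 * mc[entity] / count,
        }
    return metrics


metrics = maliciousness_metrics(sites)
for entity, values in metrics.items():
    print(entity, values)
```

In this toy data, the medium scale accounts for two thirds of all malicious websites (MP) but only half of its own websites are malicious (MPFP), the same MPFP as the much smaller large-scale group.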

3.4. Temporal Analysis

After annotating the data, we performed a frequency analysis to assess the distribution of FCWs and PCWs in hosting patterns (networks, CSPs, and countries). The results show a high concentration of FCWs and PCWs in medium-sized networks compared to the other network sizes. We also identified the top ten CSPs across which FCWs, PCWs, and malicious websites are distributed. Similarly, we identified the top ten hosting countries for FCWs compared to PCWs, along with the number of malicious websites. The following dimensions are used in this analysis.
❶ Network size. This feature was defined earlier in Section 3.3.
❷ Cloud Service Provider (CSP). The CSP feature is the same as feature ❷ defined previously in Section 3.3.
❸ Country. This feature represents the hosting country, which is the same as feature ❸ in Section 3.3.
❹ Count. This feature represents the number of websites included in each category, the same as feature ❹ defined in Section 3.3.
❺ Percentage. This metric quantifies the proportion of various categories of websites, defined in Section 3.3 as feature ❺.
❻ Malicious Count (MC). This metric measures the count of malicious websites within a particular entity, defined in Section 3.3 as feature ❻.
❼ Malicious Percentage (MP). This feature is the same as feature ❼ in Section 3.3.
❽ Malicious Per Feature Percentage (MPFP). This feature calculates the ratio of malicious websites to the total number of websites within a specific infrastructure entity (country, CSP, or network size), defined earlier in Section 3.3 as feature ❽.
❾ Old Malicious Percentage (OMP). This feature signifies the previous MP, calculated from the past scan results obtained through the VirusTotal (https://www.virustotal.com/gui/home/upload) [6] API in previous work [4].
❿ Difference in Malicious Percent (Diff). This feature signifies the difference between the MP from the recent VirusTotal [6] scan and the MP from the previous scan. A positive difference indicates that the percentage of malicious websites increased in the studied category; a negative difference indicates a decrease. This feature is formulated as $Diff = MP - OMP$.
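As a small illustration of the Diff feature, the sketch below subtracts the old malicious percentage from the new one per category and labels the direction of change. The function names and the numeric values are hypothetical and serve only to show the sign convention.

```python
def malicious_diff(mp, omp):
    """Return Diff = MP - OMP (in percentage points) for each shared category."""
    return {cat: round(mp[cat] - omp[cat], 2) for cat in mp if cat in omp}


def trend(diff):
    """Interpret the sign of Diff as described in the text."""
    if diff > 0:
        return "increase"
    if diff < 0:
        return "decrease"
    return "unchanged"


# Hypothetical values, not taken from the study's tables.
new_mp = {"category_a": 30.0, "category_b": 12.5}
old_mp = {"category_a": 25.0, "category_b": 20.0}

diffs = malicious_diff(new_mp, old_mp)
labels = {cat: trend(d) for cat, d in diffs.items()}
```

Here `category_a` yields a positive Diff (more malicious websites than in the earlier scan) and `category_b` a negative one.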
Scope Clarification. While factors such as privacy policies, website operation duration, and WHOIS-based registration metadata can influence website security, an in-depth analysis of these dimensions is beyond the scope of the present study. Our work is purposefully focused on infrastructure-level characteristics—including hosting networks, cloud service providers, country-level distribution, and temporal maliciousness evolution. Investigating privacy-policy content or domain registration records would require a separate data collection and parsing pipeline distinct from the methodological framework developed here. Prior work has examined these aspects of FCWs (e.g., Alabduljabbar et al. [1]), and our analysis is designed to complement rather than duplicate those efforts.
Temporal and Geographical Applicability. Because CSP policies and regional hosting regulations evolve over time, we explicitly incorporate both temporal and geographical dimensions in our methodology. Our updated VirusTotal evaluation (Section 3.4 and Section 5) captures temporal variation in maliciousness, while the country-level hosting analysis (Section 4.3) reflects geographical differences in CSP deployment and regional infrastructure behaviors. Together, these dimensions ensure that our findings remain robust despite the dynamic nature of hosting ecosystems. We note that long-term temporal evolution is an inherent challenge in Internet measurement research, and extending the temporal window is an important direction for future work.

4. Analysis Results

This section presents the findings of our distribution analysis pipeline applied to the extracted dataset. We first compare the trends of free content websites (FCWs) and premium content websites (PCWs) across different network scales, CSPs, and countries. We then examine their distribution within the top one million most-visited websites. Finally, we provide a per-category analysis for books, games, movies, music, and software, highlighting similarities, differences, and security implications.

4.1. General Network Scale Analysis

The distribution analysis over the network scale yields several important insights, summarized as follows. (1) Most websites reside in medium-scale networks, accounting for 81.24% of the total number of studied FCWs, PCWs, and general websites. (2) PCWs are more likely to use large networks, which is reflected in a higher proportion of secure websites than in medium networks. (3) FCWs in medium networks are the riskiest category, with nearly 90% of FCWs hosted there and 40% classified as malicious. (4) Our per-category analysis shows that books, movies, and software websites rely more on large networks than games and music websites. These categories are also generally less malicious, except for the free software category, where most websites are in medium networks and exhibit the highest MP. This result is expected, as attackers often recruit victim devices by convincing users to install unauthenticated free software, which ultimately influences the security classification. (5) Across all categories (books, games, movies, music, and software), both FCWs and PCWs primarily reside in medium networks. Premium websites in medium networks account for ≈75% to ≈85% of each category, compared to ≈84% to over 97% for free websites, with the game category showing the highest concentration in both. (6) Most CSPs are fairly evenly distributed between medium- and large-scale networks. (7) Hosting of FCWs and PCWs is concentrated mainly in the United States, where ≈58% of websites reside. (8) Large-scale networks are predominantly located in the United States, which hosts ≈71% of them.

4.1.1. Dataset Versus Benchmark

As shown in Table 3, a concentration of malicious websites is observed in medium-scale networks for both the combined FCWs/PCWs dataset and the general dataset, with MP values of 23.06% and 3.89%, respectively. Specifically, 27.38% of medium-network websites in the FCWs/PCWs dataset exhibit malicious behavior per network scale count (MPFP). In contrast, the general dataset reports a significantly lower rate of 4.92% for the same feature. This notable difference supports our hypothesis that a higher proportion of malicious websites are hosted within the FCWs/PCWs dataset.
Consequently, it is essential to account for both network scale and the degree of malicious activity when managing network security risks. These findings highlight the importance of considering such factors when comparing and analyzing datasets. Failing to address them could result in inadequate strategies for addressing security threats, ultimately compromising safety and integrity.

4.1.2. Free Versus Premium Websites

As shown in Table 4a, the majority of websites are hosted in medium networks, accounting for ≈89.1% and ≈78.9% of FCWs and PCWs, respectively. The MPFP for FCWs is nearly double that of PCWs, with ≈40.5% compared to ≈22.2%. The highest MP values in both groups are observed in medium networks, at ≈37.7% for FCWs and ≈19.8% for PCWs. These findings highlight the need for targeted defenses against websites hosted on medium-sized networks that contain malicious content. Furthermore, ≈20% of PCWs are hosted in large networks, which may provide improved security given the relatively lower presence of FCWs in these networks. Overall, the results support our hypothesis that malicious websites in FCWs and PCWs follow similar hosting patterns. A more thorough examination of these patterns is necessary to mitigate vulnerabilities and enhance security.

4.2. Per-Category Network Scale Analysis

In this section, we review the results and findings of our measurements through a per-category analysis of websites associated with books, games, movies, music, and software.

4.2.1. Book Websites

As shown in Table 4b, clear trends emerge across different network scales hosting FCWs and PCWs. Approximately 85% of FCWs and 80.1% of PCWs are hosted in medium networks, which together account for ≈82.4% of both types of websites. The MPFP is ≈30% for FCWs and ≈27.8% for PCWs. Notably, ≈31.7% of FCWs were found malicious compared to ≈30% of PCWs, indicating a substantial issue with book websites in medium networks. Within these networks, ≈27% of FCWs and ≈24% of PCWs contribute to the total malicious website MP. It is also worth noting that ≈17.3% of PCWs are hosted in large networks compared to only 10% of FCWs. Furthermore, ≈28.6% of FCWs in small networks were identified as malicious, compared to 0% in PCWs. While the difference is less pronounced in large networks, it is quite significant in small networks, potentially explaining the overall MP gap between FCWs and PCWs that offer book content.

4.2.2. Games Websites

As shown in Table 4c, a significant concentration of game websites is hosted in medium networks, with ≈97.4% of FCWs and ≈85.6% of PCWs. In total, ≈90.5% of both types of websites are hosted in medium networks. This suggests that organizations providing gaming content prefer medium networks, possibly to ensure high network speeds for users worldwide. Additionally, ≈12.6% of PCWs are hosted in large networks compared to just ≈1.3% of FCWs. This aligns with the earlier finding that large networks enhance the security of PCWs, as they contribute only 1.8% MP despite the total MP of PCWs being ≈31.5%. In contrast, the MP of FCWs reaches 64.1%. These results highlight the elevated risk associated with free gaming websites compared to premium gaming websites, emphasizing the broader vulnerability of game-related platforms.

4.2.3. Movie Websites

As shown in Table 5a, most FCWs and PCWs in the movie category are hosted in medium networks, similar to the games category. Specifically, 91.61% of FCWs and 75.66% of PCWs fall into this category, meaning that nearly 9 out of 10 FCWs and 3 out of 4 PCWs are hosted in medium networks. Large networks are particularly appealing to PCWs, which account for 23.03% of them, compared to only 6.77% of FCWs.
Within medium networks, 26.41% of FCWs were identified as malicious, contributing 24.19% of the total 26.45% MP. In contrast, PCWs exhibit a lower MP of 15.13%, with 13.82% attributed to websites hosted in medium networks. Notably, 18.26% of PCWs in medium networks were identified as malicious. Small and very large networks are relatively uncommon for both FCWs and PCWs. A striking disparity is observed in small networks, where 50% of PCWs were found malicious compared to 0% of FCWs, although the overall count remains small.
These findings suggest that the movie category lags behind the book and games categories in terms of security. Movie websites frequently rely on cross-domain video players, which may increase the number of reported security threats and their susceptibility to malicious attacks. Overall, the results emphasize the dominance of medium networks in hosting movie websites and the higher MP among FCWs compared to their premium counterparts. The preference of PCWs for large networks may be explained by improved security or superior performance.

4.2.4. Music Websites

As shown in Table 5b, the distribution of music websites across network scales is dominated by medium networks. More than 90% of FCWs and 75% of PCWs are hosted in medium-sized networks. Large networks host approximately 8.8% of FCWs and 22.1% of PCWs, indicating a stronger preference for PCWs within these networks.
A clear disparity emerges in the malicious percentage (MP) between FCWs and PCWs. Nearly 40% of FCWs are classified as malicious, compared to only ≈17% of PCWs. Medium networks account for much of this difference, as 43% of FCWs in these networks are malicious, versus ≈18% of PCWs. This corresponds to a total MP contribution of 100% for FCWs and 80% for PCWs within medium-sized networks. Large networks, by contrast, play a critical role in PCW security, exhibiting a lower MP of only ≈3.5%.
Interestingly, no malicious FCWs are hosted in large networks, despite the overall MP of FCWs being more than double that of PCWs. This suggests that large networks provide a safer environment for FCWs. Common to both movie and music websites is the reliance on shared content players across multiple domains, which may increase the risk of malicious exploitation.

4.2.5. Software Websites

As shown in Table 5c, FCWs in the software category exhibit the highest malicious concentration among all categories. Approximately 84% of FCWs are hosted in medium networks, where the MPFP reaches ≈70%. This aligns with expectations, as software applications often require system-level access when installed from FCWs, making them highly susceptible to malicious activity. In comparison, ≈77% of PCWs are hosted in medium networks, but with a much lower MPFP of ≈22%.
PCWs make greater use of large networks, with ≈21% compared to ≈9.7% of FCWs. FCWs also show significant reliance on small networks, with ≈6.8% of websites, of which ≈33.3% are identified as malicious, compared to 0% for PCWs. Within large networks, FCWs demonstrate a high MPFP of ≈41.2%, contributing to a total MP of ≈64.2%, in contrast to only ≈18.8% for PCWs.
These findings highlight the severity of the risks associated with software websites, particularly FCWs hosted in medium networks, which exhibit a very high MPFP of ≈69.4%. The results underscore the critical need for tightened scrutiny of free software sources given their disproportionate contribution to malicious activity.

4.3. Networks’ Spatial Analysis

While the abstract network-level distribution analysis sheds light on the structure of FCW and PCW networks, annotation at the CSP and country level provides additional insight into interdependence within this ecosystem. We begin with an examination of cloud service providers (CSPs) and their hosting countries as part of the spatial analysis.
As shown in Table 6, most websites (≈84%) are hosted in medium networks, with large networks accounting for only ≈13% and small networks for ≈2.5%. A negligible fraction (<0.1%) of websites is hosted in very large networks. FCWs and PCWs are distributed across multiple CSPs, with the highest concentrations in Cloudflare (≈27%) and Amazon (≈16%), both of which operate predominantly in medium- to large-sized networks based on their IP allocations. Liquid Web (4.8%), Trellian (2.8%), and Google (2.7%) also host a significant number of websites, primarily on medium-sized networks.
Other CSPs each host fewer than 3% of websites and are associated only with medium networks. The “Others” category represents ≈36.6% of websites, dispersed across all network scales, with a notable concentration of ≈30.8% in medium networks. Further analysis of CSP distributions across countries is essential, as regional variations may influence the overall security posture of hosted websites.

4.3.1. Countries of Networks

As shown in Table 7, the distribution of hosting countries for FCWs and PCWs across network scales reveals several notable patterns. A significant fraction of websites (≈84.2%) are hosted in medium networks, with the United States accounting for the majority at ≈56.5%. Overall, the United States hosts 58.7% of websites distributed across small, medium, large, and very large networks. Belgium and the Netherlands also host a considerable share, primarily in medium networks, with ≈6.6% and ≈6.3% of websites, respectively.
This analysis highlights the diversity of hosting patterns across countries and network scales, while reaffirming the dominance of medium networks. These findings underscore the importance of evaluating the maturity of national security policies, as most malicious websites appear to depend on medium-scale networks for their operation. By focusing on these networks and examining scale-level practices, it may be possible to curb the proliferation of malicious websites more effectively.

4.3.2. CSPs over Countries

As shown in Table 8, the distribution of the most commonly used CSPs across the top hosting countries reveals clear trends. Cloudflare leads with 410 websites, primarily hosted in the United States (296 websites) and Belgium (98 websites). Amazon follows with 240 websites, the majority of which (191 websites, 79.6%) are hosted in the United States.
Other leading CSPs include Liquid Web and Trellian, which host 72 and 42 websites, respectively. Liquid Web is exclusively hosted in the United States (72 websites), while Trellian is exclusively hosted in Australia (42 websites). Google hosts 41 websites, of which 34 are in the United States. LeaseWeb serves 37 websites, most of them (34) in the Netherlands. SP-Team hosts 35 websites, all located in Germany. Akamai hosts 33 websites, with 28 in the Netherlands, while Fastly accounts for 26 websites, primarily in the United States (12 websites). Microsoft hosts 21 websites, with 16 located in the United States.
The “Others” category in Table 8 includes 552 websites, with 260 hosted in the United States. In total, 1509 websites were analyzed, and the United States accounts for the largest share at ≈58.6%, followed by Belgium, the Netherlands, and Germany.
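The per-country shares quoted above (for example, 79.6% of Amazon-hosted websites being in the United States) follow from a simple cross-tabulation. The sketch below reproduces that arithmetic using only counts stated in the text; the `"other"` buckets are remainders obtained by subtraction, and the function name `country_shares` is an assumption made for this illustration.

```python
def country_shares(counts):
    """Convert per-CSP country counts into percentage shares of each CSP's total."""
    shares = {}
    for csp, by_country in counts.items():
        total = sum(by_country.values())
        shares[csp] = {
            country: round(100.0 * n / total, 1)
            for country, n in by_country.items()
        }
    return shares


# Counts quoted in the text; "other" is the remainder (total minus listed countries).
csp_country_counts = {
    "Cloudflare": {"United States": 296, "Belgium": 98, "other": 410 - 296 - 98},
    "Amazon": {"United States": 191, "other": 240 - 191},
    "Liquid Web": {"United States": 72},
    "Trellian": {"Australia": 42},
}

shares = country_shares(csp_country_counts)
```

With these inputs, `shares["Amazon"]["United States"]` evaluates to 79.6, matching the figure reported for Amazon.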

4.3.3. Network Distribution Heatmaps

We generate heatmaps to illustrate the distribution of network scales across countries based on data from Table 7. Figure 3a highlights the distribution of small networks (SN column), identifying the United States as the primary host. Figure 3b shows the distribution of FCWs and PCWs within medium networks (MN column), where the United States, Belgium, and the Netherlands emerge as the main host countries. Figure 3c presents the distribution of FCWs and PCWs within large networks (LN column), with the United States, Germany, and France as the main hosting countries.
These visualizations address RQ1 and RQ2 by providing detailed information on the geographical distribution of network scales and reaffirming the dominance of medium networks in hosting FCWs and PCWs. The results are consistent with earlier findings, revealing hosting patterns and their potential impact on website security and reliability. Importantly, medium networks are identified as less secure or reliable than large networks. A closer examination of different types of medium networks is necessary to better understand the most severe hosting patterns. Implementing robust defensive measures against websites hosted on medium networks, particularly those that offer FCWs, and identifying the primary locations of malicious websites are crucial steps toward enhancing online security.

4.4. Cloud Service Providers Analysis

As highlighted in Section 4.3, a CSP-level analysis provides deeper insight into the ecosystem of FCWs and PCWs, particularly for malicious websites. To this end, we extend our analysis to examine the affinities between different categories of websites and the major cloud providers across our assessment metrics. The distribution of FCWs and PCWs across CSPs reveals several key aspects: (1) Most FCWs, PCWs, and general websites are hosted on Cloudflare. Furthermore, Cloudflare exhibits the highest concentration of malicious websites among all CSPs in the three categories. (2) Amazon, while one of the largest providers, has the lowest concentration of malicious websites. Although this cannot be concluded definitively, one possible explanation is the stronger measures Amazon employs to mitigate security risks in shared infrastructure, compared to more permissive providers. (3) In the per-category analysis, FCWs predominantly rely on Cloudflare, whereas PCWs use Cloudflare only in the game category. For the remaining categories of PCWs, Amazon is the most frequently used hosting provider. (4) Providers with the highest concentration of malicious websites are located in the United States and Belgium. This can be attributed to providers such as Cloudflare, which primarily operate in these countries. (5) Overall, there is a strong affinity between the state of a website (malicious or benign) and its hosting provider.
Clarifying Cloudflare Risk Interpretation. It is important to distinguish between Cloudflare’s Edge/Proxy infrastructure and the Origin infrastructure that hosts the actual content. Because Cloudflare frequently operates as a reverse proxy, malicious actors often use its edge nodes to obscure the true origin of their infrastructure. Consequently, the “hosting risk” associated with Cloudflare in our measurements predominantly reflects proxy abuse rather than physical content hosting by Cloudflare itself. Our results should therefore be interpreted as identifying the surface through which malicious traffic is routed, not the physical server where the malicious content resides.
Regional Subsidiaries of CSPs. Although large CSPs operate through multiple regional subsidiaries, the number of region-specific CSP entries (e.g., Amazon US, Amazon CN) in our dataset is extremely small and statistically insignificant. Aggregating these subsidiaries under their parent CSP does not affect the validity of our findings. Moreover, regional variation is already accounted for by our country-level hosting analysis in Section 4.3, where differences in geographical deployment are reflected in the associated hosting countries.

4.4.1. Free and Premium Websites Comparison

The hosting pattern follows a heavy-tailed distribution. For both FCWs and PCWs, the top eight providers (Cloudflare, Amazon, Liquid Web, LeaseWeb, SP-Team, Akamai International, Fastly, and Microsoft) host 63.42% of the websites, while the remaining websites are distributed across 290 providers. In particular, 80.59% of malicious websites are hosted in these top CSPs.
The top five providers in terms of MPFP are Cloudflare, Liquid Web, LeaseWeb, SP-Team, and Trellian. Interestingly, Amazon, the second largest provider in terms of hosting volume, has a relatively low MPFP (≈12.9%) and a smaller MP compared to the other top providers. Fastly, which hosts 26 websites, does not have malicious websites. The category “Others” includes 552 websites (≈36.6%) and shows an MPFP of ≈16.9% with an MP of 6.16%. These results highlight the variation in security levels among CSPs.
Although Cloudflare and Amazon are the most popular providers for FCWs and PCWs, they differ significantly in terms of MPFP and MP. Cloudflare, the leading provider, has an MPFP of ≈68.5% and contributes ≈18.6% of the total MP. In contrast, Liquid Web ranks second in MP, contributing only ≈2.1% of all malicious websites. This distinction underscores two points: a low MP may correlate with a low proportion of hosted websites, while significant differences in MPFP reflect the relative security posture of individual CSPs.
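The heavy-tailed hosting pattern described in this subsection can be quantified as the share of websites held by the k largest providers. The sketch below shows one way to compute such a concentration figure; the function name `top_k_share` and the provider counts are hypothetical and are not drawn from the study's dataset.

```python
def top_k_share(provider_counts, k):
    """Percentage of all websites hosted by the k largest providers."""
    ranked = sorted(provider_counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return 100.0 * sum(ranked[:k]) / total


# Hypothetical provider counts with a long tail of small providers.
toy_providers = {"A": 50, "B": 25, "C": 10, "D": 5, "E": 5, "F": 3, "G": 2}

share_top2 = top_k_share(toy_providers, 2)
share_top3 = top_k_share(toy_providers, 3)
```

Applying the same function to counts restricted to malicious websites would give the corresponding concentration of malicious hosting among the top providers.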

4.4.2. Benchmark Websites

A comparison of Table 9a and Table 10a provides a unified perspective, summarized as follows. First, Cloudflare and Amazon emerge as the most popular CSPs. Cloudflare hosts the largest number of websites and records the highest MC. According to Table 9a, Cloudflare hosts ≈27.2% of websites, while in Table 10a it hosts ≈16.4%. Amazon, second on the list, hosts ≈15.9% of websites in the first table and ≈10.9% in the second. Liquid Web consistently ranks second in terms of MPFP in both tables. In Table 9a, it is the third largest provider, hosting ≈4.8% of websites, while in Table 10a it ranks sixth, hosting ≈1.9%. Fastly stands out: in Table 9a, it hosts no malicious websites, yet in the benchmark results it records an MPFP of ≈4.2% and an MP of only 0.05%. The “Others” category represents a substantial share of websites, ranging from ≈36.6% to 53.5% across the tables.
In conclusion, Cloudflare and Amazon are the most popular CSPs among FCWs and PCWs, with Cloudflare hosting the highest number of malicious websites and exhibiting the highest MPFP. Liquid Web ranks second in MPFP, while Fastly is notable as a provider without any malicious websites in Table 9a.

4.4.3. Free Websites

The distribution of FCWs across various CSPs is presented in Table 9b. We observe that Cloudflare dominates the market by hosting ≈33.8% of FCWs, with ≈64.3% of its hosted websites identified as malicious, resulting in an MP of 21.7%. Liquid Web and Amazon are the second and third most popular CSPs, respectively, hosting 8.5% and ≈6.9% of FCWs. Liquid Web has an MP of ≈4%, while Amazon’s MP stands at 1.9%. CSPs such as Trellian, LeaseWeb, and Sp-Team each host ≈5% of FCWs and exhibit similar MC and MP values. Notably, the “Others” category, encompassing a variety of CSPs, hosts ≈30% of FCWs and presents an MP of ≈7.2%. Of the 788 FCWs in total, 319 (≈40.5%) were malicious.

4.4.4. Premium Websites

In Table 9c, which represents the distribution of PCWs in different CSPs, Amazon emerges as the most prominent host, accommodating 25.8% of the total PCWs. Among the websites hosted by Amazon, 8.6% are malicious, resulting in an overall MP of ≈2.2%. Cloudflare ranks as the second largest host with ≈20% of PCWs, with a higher proportion (≈76.4%) of malicious websites, leading to an MP of ≈15.3%. Other notable CSPs include Akamai, Google, Fastly, and Microsoft, which host around 2% to 4% of PCWs. Regarding malicious content, Google and Microsoft show MPs of ≈0.4% and ≈0.3%, respectively, while Akamai has a lower MP of ≈0.3%. Fastly and eBay host ≈3.2% and ≈1.1% of PCWs, respectively, but neither hosts any malicious content. Interestingly, Sp-Shopify hosts only ≈1.7% of PCWs but has a high proportion (≈83.3%) of malicious websites, resulting in an MP of ≈1.4%. Wal-Mart and OVH each host about 1% of PCWs and have MPs of ≈0.1%. Lastly, the “Others” category, which includes a variety of CSPs, hosts ≈35.1% of the total PCWs. With only ≈6% of its hosted websites classified as malicious, the category exhibits an MP of ≈2.1%. The table covers 721 PCWs, of which ≈22.2% are malicious.

4.4.5. Free Versus Premium Websites

Comparing the distribution of FCWs and PCWs across CSPs, as shown in Table 9b and Table 9c, yields several insights. First, Cloudflare is the most prominent host for FCWs, with ≈33.8% of the total FCWs, while Amazon is the most prominent host for PCWs, with 25.8%. Interestingly, Cloudflare’s MP is lower for PCWs (≈15.3%) than for FCWs (≈21.7%), yet the proportion of its hosted websites that are malicious is higher for PCWs (≈76.4%) than for FCWs (≈64.3%). Amazon’s MP is slightly lower for FCWs (1.9%) than for PCWs (≈2.2%); however, because Amazon hosts only ≈6.9% of FCWs compared to 25.8% of PCWs, it hosts proportionally more malicious FCWs than PCWs. Google has a relatively low MP for both FCWs (≈0.5%) and PCWs (≈0.4%), implying that it hosts a smaller proportion of malicious websites than other CSPs. The total number of websites is higher for FCWs (788) than for PCWs (721), with ≈40.5% and ≈22.2% being malicious, respectively, indicating that FCWs have a higher overall prevalence of malicious content than PCWs. Some CSPs, for example, Liquid Web, Trellian, LeaseWeb, and Sp-Team, host only FCWs. In contrast, others like Akamai, Fastly, Microsoft, Sp-Shopify, eBay, and Wal-Mart only host PCWs, suggesting that different CSPs may have different preferences when hosting FCWs or PCWs, or that these types of websites have affinities for selecting a specific provider.
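The per-CSP figures reported in Sections 4.4.3 and 4.4.4 are numerically consistent with MP being the product of a provider's hosting share and its MPFP, that is, with MP expressed relative to all websites of the given type. The sketch below checks this reading against the rounded percentages quoted in the text. The helper names are hypothetical, and the relation itself is an inference from the reported numbers rather than a formula stated by the authors.

```python
def mp_from_share(share_pct, mpfp_pct):
    """MP implied by a provider's hosting share and its MPFP (all in percent)."""
    return share_pct * mpfp_pct / 100.0


def mpfp_from_mp(mp_pct, share_pct):
    """MPFP implied by a provider's MP and hosting share (all in percent)."""
    return 100.0 * mp_pct / share_pct


# Rounded figures quoted in the text for Table 9b (FCWs) and Table 9c (PCWs).
cloudflare_fcw_mp = mp_from_share(33.8, 64.3)   # reported MP: 21.7%
cloudflare_pcw_mp = mp_from_share(20.0, 76.4)   # reported MP: ~15.3%
amazon_pcw_mp = mp_from_share(25.8, 8.6)        # reported MP: ~2.2%

# Amazon's FCW MPFP is not quoted directly; this value is derived from
# the rounded MP (1.9%) and hosting share (~6.9%), so it is approximate.
amazon_fcw_mpfp = mpfp_from_mp(1.9, 6.9)
```

Under this reading, Amazon's derived FCW rate (roughly 27.5%) is well above its reported PCW rate of 8.6%, which agrees with the observation that Amazon hosts proportionally more malicious FCWs than PCWs.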

4.5. Per-Category Cloud Service Providers Analysis

4.5.1. Book Websites

Table 10b shows the distribution of Books FCWs and PCWs on CSPs. In FCWs, Cloudflare hosts the most, with 39 websites representing ≈27% of the total. Amazon follows with 11 websites (≈7.6%), Liquid Web with 10 websites (≈7%), Trellian and Sp-Team with 6 and 5 websites, respectively, and others collectively hosting 73 websites (≈50.7%). For PCWs, Amazon tops the list with 41 websites (≈21.5%), followed by Cloudflare with 40 websites (≈21%). Other CSPs in this category include Google, Sp-Shopify, Fastly, and others, with varying counts and percentages. Regarding MC, Cloudflare dominates in both FCWs and PCWs with 28 and 32 instances, respectively. The MPFP is highest for Cloudflare among FCWs (≈71.8%) and Sp-Shopify among PCWs (≈75%). The MP is fairly distributed between Cloudflare (≈19.4% for FCWs and ≈16.8% for PCWs) and other CSPs.

4.5.2. Games Websites

Table 10c presents the distribution of Games FCWs and PCWs on different CSPs. For FCWs, Cloudflare is the dominant CSP, hosting 42 websites, which account for 53.85% of the total. Other CSPs in this category include Mivocloud with 5 websites (≈6.4%), LeaseWeb and Liquid Web with 3 websites each (≈3.9% each), Amazon with two websites (≈2.6%), and others collectively hosting 23 websites (≈30%). For PCWs, Cloudflare is the leading CSP, hosting 37 websites (≈33.3%). Amazon comes next with 22 websites (≈20%), Akamai with 11 websites (≈10%), Fastly with 5 websites (4.50%), Google with 4 websites (3.6%), and “Others” hosting 32 websites (≈28.8%). Considering the MC aspect, Cloudflare has the highest count for both FCWs and PCWs, with 39 and 29 instances, respectively. The MPFP for Liquid Web is highest among FCWs at 100%, while Cloudflare leads among PCWs at ≈78.4%. The MP is distributed between various CSPs, with Cloudflare accounting for 50% in FCWs and ≈26.1% in PCWs.

4.5.3. Movie Websites

Table 11a shows the distribution of Movies FCWs and PCWs on different CSPs. For FCWs, Cloudflare is the leading CSP, followed by Liquid Web (≈11.6%), Trellian (≈9.7%), Sp-Team (≈7.7%), Amazon (≈6.1%), and all others hosting 118 websites (≈38.1%). Regarding PCWs, Amazon leads with 56 websites (≈36.8%), followed by Cloudflare with 18 websites (≈11.8%), Akamai with nine websites (≈5.9%), Google with seven websites (≈4.6%), Fastly with five websites (≈3.3%), and “Others” hosting 57 websites (37.5%). In terms of MC, Cloudflare has the largest count in both FCWs (17) and PCWs (15). The MPFP highlights Sp-Team as the highest in FCWs with 37.5%, while Cloudflare tops PCWs with ≈83.3%. The MP is distributed among various CSPs, with Cloudflare accounting for ≈5.5% in FCWs and ≈9.9% in PCWs.

4.5.4. Music Websites

Table 11b presents the distribution of Music FCWs and PCWs across different CSPs. For FCWs, Cloudflare is the dominant CSP, hosting 22 websites (27.5% of the total). Sp-Team follows with six websites (7.5%), then Google with four websites (5%), Amazon with three websites (≈3.8%), Liquid Web with two websites (2.5%), and the Others category with 43 websites (≈53.8%). In the case of PCWs, Amazon leads with 30 websites (≈35%), followed by Cloudflare with 12 websites (≈14%), Fastly and Google each with 4 websites (≈4.7%), Apple with 2 websites (≈2.3%), followed by “Others” with 34 websites (≈39.5%). Regarding MC, Cloudflare has the highest count for both FCWs (16) and PCWs (9). In terms of MPFP, Liquid Web has the highest percentage in FCWs with 100%, while Cloudflare takes the lead in PCWs with 75%. The MP is distributed among various CSPs: Cloudflare accounts for 20% in FCWs, and in PCWs, Cloudflare accounts for ≈10.5%.

4.5.5. Software Websites

Table 11c shows the distribution of software FCWs and PCWs across various CSPs. In the case of FCWs, Cloudflare is the leading CSP with 80 websites (≈45.5%), followed by Amazon with 19 websites (10.8%), Liquid Web with 16 websites (≈9.1%), LeaseWeb with 11 websites (≈6.3%), Voxility LLP with 4 websites (≈2.3%), and Others with 46 websites (≈26.1%). On the other hand, for PCWs, Amazon and Cloudflare are the most prominent CSPs, each hosting 37 websites (≈20.4%), followed by Microsoft with 9 websites (≈5%), Akamai and Google each with 6 websites (≈3.3%), and “Others” with 86 websites (≈47.5%). In terms of MC, Cloudflare has the highest count for FCWs (71) and the second highest for PCWs (25). Cloudflare also has the highest MPFP for both FCWs (≈88.8%) and PCWs (67.57%). Regarding MP, Cloudflare has the highest percentage in FCWs at ≈40.3% and the second highest in PCWs at ≈13.8%.

5. Temporal Analysis Results

5.1. Security Analysis and Updates in Scans

This section presents the results of re-scanning the dataset with VirusTotal (https://www.virustotal.com/gui/home/upload). We compare the new results with the earlier scans and highlight the changes, some of which are improvements. We then examine the impact of these changes on the network size frequency analysis, CSPs, and hosting countries.

5.1.1. VirusTotal Overall Scanning Results

Table 12a presents the scan results across the content categories for FCWs and PCWs. Compared with the earlier scans, the aggregate results improve slightly, as shown by a 0.6% decrease in the overall discrepancy metric Diff. The improvement is most pronounced for book websites (13.77%), followed by gaming websites (11.64%), music websites (8.43%), and software-related websites (a modest 0.86%). Movie-related websites, by contrast, deteriorated substantially, by 14.5%. Taken together, these statistics indicate a generally positive change, with progress in reducing potentially malicious content in most categories.
Potential Explanation. The shifts in security scan results across different content categories highlight targeted changes in the cybersecurity landscape. In FCWs, there is a rise in malicious content, particularly in popular categories such as movies and software, possibly due to more sophisticated concealment methods evading detection. In contrast, PCWs show a notable decrease in malicious content across all categories, likely due to enhanced security measures and investments in cyber defenses to safeguard revenue-generating content. These trends (increased malicious activity in FCWs and improved security in PCWs) underscore distinct approaches to addressing cyber threats, with PCWs potentially implementing more effective measures to uphold consumer trust and comply with stricter standards. In the subsequent sections, we examine the results in more depth by analyzing the FCWs and PCWs separately.

5.1.2. Security Updates in FCWs

To elucidate the factors behind the overall results, we first scrutinize the FCW data presented in Table 12b. Our examination reveals an increase in the prevalence of malicious websites across FCW categories, as indicated by an increase in the Diff value for four categories. Specifically, the proportion of malicious websites in the movie category increased by 29.36%, followed by an 11.94% increase in the software category, an 8.98% increase in the gaming category, and a 1.61% increase in the book category. Overall, the percentage of malicious sites across all FCW categories increased by 15.23%. Contrary to this trend, the free music category improved marginally, with malicious content decreasing by 1.25% (statistically insignificant). This divergence is striking given the substantial deterioration observed in the other categories.
Potential Explanation. One possible reason for the deterioration in detection results on these websites is their high activity level and frequent updates. These sites often employ aggressive tracking and monitoring practices to sustain their business model, using scripts that prominent antivirus scanners sometimes flag as malicious. Such continuous changes make their security labels susceptible to frequent revision, and newly introduced content that scanners flag as malicious causes those labels to deteriorate.

5.1.3. Security Updates in PCWs

Table 12c presents a contrasting scenario, detailing the scan outcomes for PCWs. In a marked departure from the trend observed in FCWs, all five categories within the premium segment exhibited substantial improvements. The overall reduction in the discrepancy metric Diff is 17.89%. The games category showed the largest improvement (26.12%), closely followed by book websites (24.95%). The other categories also showed moderate but noteworthy improvements: music websites by 15.11%, software websites by 13.26%, and movie websites by 9.87%. These improvements in PCWs account for much of the general improvement observed in the scan results across FCWs and PCWs combined.
Potential Explanation. One possible reason for the enhanced security performance of the PCWs could be attributed to their concern for reputation. These websites may take proactive measures upon detection to identify the underlying cause and remove any scripts triggering the flagging of their sites as malicious. As some detections may be false positives, these websites also have an opportunity to rectify any inaccuracies identified by the antivirus scanner, thereby enhancing their overall security stance.

5.2. Network Size Evolution

Table 13a reports the frequency of malicious FCWs and PCWs across network sizes. The scanning results reveal a consistent pattern: medium-sized networks exhibit the highest concentration of malicious websites. In stark contrast, small networks host a much smaller share of websites, accounting for only 2.52% of the total FCWs and PCWs. However, a substantial 42.11% of the websites in these small networks are identified as malicious. Large networks, on the other hand, have a malicious website percentage of 19.6%.

5.2.1. Summary of Changes

Malicious Websites Distribution. A particularly intriguing observation is the recorded increase in the MPFP metric across all network sizes when compared to previous results [42]. This indicates a general upward trend in the level of malicious content, regardless of network size. Moreover, the overall rate of maliciousness has escalated from 24.98% to 31.15%, a rise that appears to be primarily driven by the increased prevalence of malicious content within free websites. This trend underscores a growing concern for digital safety and the need for enhanced security measures in FCWs.
FCWs Distribution. Table 13b reveals a marked increase in the concentration of malicious websites across networks of different sizes. Small networks show the largest rise, with the MPFP soaring from 23.08% to 61.54%. In medium-sized networks the MPFP increased from 42.31% to 55.56%, and in large networks from 26.67% to 55%. This trend highlights the critical need for robust monitoring and security measures, especially in smaller networks, where the growth of malicious content is most acute. The data indicate a strong association between network size and vulnerability, pointing to the importance of targeted cybersecurity strategies across the network scales that contain FCWs.
PCWs Distribution. The data presented in Table 13c pertaining to PCW networks demonstrate a significant reduction in the prevalence of malicious content. This trend is observed in all network sizes. Specifically, small networks experienced a decline in malicious website percentage from 8.33% to an impressive 0%, indicating the complete disappearance of such content in these networks. Medium networks showed a decrease, dropping from 25.31% to 4.39%. Large networks witnessed a reduction in malicious website presence from 11.51% to 4.32%. In general, the MPFP for PCWs decreased from 22.19% to 4.3%.
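The per-size changes described above can be tabulated as percentage-point differences between the earlier scan and the re-scan. The sketch below is illustrative only: it assumes the change is computed as the new MPFP minus the old MPFP, and the variable and function names are not taken from the study.

```python
# (earlier MPFP %, re-scan MPFP %) per network size, as quoted in the text
fcw = {"small": (23.08, 61.54), "medium": (42.31, 55.56), "large": (26.67, 55.00)}
pcw = {"small": (8.33, 0.00), "medium": (25.31, 4.39), "large": (11.51, 4.32)}

def point_changes(scans: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Percentage-point change (new minus old) for each network size."""
    return {size: round(new - old, 2) for size, (old, new) in scans.items()}

fcw_change = point_changes(fcw)
pcw_change = point_changes(pcw)

for size in fcw:
    print(f"{size:>6}: FCW {fcw_change[size]:+.2f} pp, PCW {pcw_change[size]:+.2f} pp")
```

Read this way, FCWs worsen at every network size, with the largest jump in small networks, while PCWs improve at every size.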

5.2.2. Insights and Potential Explanation

These findings support earlier observations that networks hosting PCWs are generally more secure and less prone to malicious activities than their FCWs counterparts. Moreover, the data suggests that large networks are more commonly utilized for hosting PCWs, which correlates with a lower incidence of malicious content. This trend underscores the effectiveness of the security measures implemented within PCWs networks and highlights the potential benefits of these strategies in enhancing online safety and integrity.
Potential Explanation. The comparative analysis of old and recent scans across network sizes reflects a nuanced cybersecurity environment where small networks, particularly in the context of FCWs, demonstrate increased maliciousness, possibly due to their inherent security limitations and a higher probability of being exploited by threat actors targeting these less secure sites. Medium networks, often the backbone for FCWs due to their cost effectiveness and popularity, show a heightened presence of malicious content, potentially because their substantial benign traffic conceals malicious activities, making them attractive for malevolent exploitation while difficult to isolate without affecting legitimate operations.
On the other end of the spectrum, the improvement in PCWs’ security profile, especially in large networks, suggests an active engagement in security practices, likely driven by a desire to maintain their reputation. They may be adopting more advanced cybersecurity measures and diligently addressing any security gaps highlighted by scans, including false positives, which in turn bolsters their defense against genuine threats. This layered approach underscores a broader trend of escalating security efforts in response to the evolving digital threat landscape, balancing the need to protect against malicious content with the imperative of uninterrupted service delivery.

5.3. Cloud Service Providers Evolution

5.3.1. Summary of Changes

Overall Distribution. The frequency analysis of FCWs and PCWs across different CSPs indicates a notable overall improvement in their security posture. As detailed in Table 14a, there is a slight decrease in the overall MPFP among the top hosting CSPs, dropping from 31.74% to 31.15%. Clearly, “Cloudflare” has significantly improved hosting safety, with a 6.1% reduction in total malicious content and a 22.44% improvement in MPFP. Despite this substantial progress, “Cloudflare” remains the CSP with the highest incidence of malicious websites.
In contrast, our analysis reveals a concerning increase in malicious websites hosted by other top CSPs. “Amazon” exhibited a modest increase of 0.83%. “Liquid Web” experienced a larger increase of 8.34%, and “Trellian” the largest, at 35.71%. Additionally, “Google” reported a 4.88% increase in malicious site hosting, “Leaseweb” an 8.11% rise, and “SP-Team” a notable 15.83% increase. However, other top hosting CSPs have shown a decrease in their MPFP, which suggests that their security measures and protocols effectively mitigate the risks associated with hosting malicious content.
FCWs Distribution. The analysis of FCWs in Table 14b reveals a distinct trend that deviates from the overall results: the top CSPs show an increase in malicious websites hosted. Specifically, “Cloudflare” saw its MPFP rise from 64.29% to 68.42%. Similarly, “Liquid Web” experienced an increase from 46.76% to 56.72%, “Amazon” from 27.78% to 50%, “Trellian” from 23.81% to 59.52%, and “Leaseweb” from 27.78% to 36.11%.
This trend is consistent with other top CSPs, contributing to an overall increase of 15.23% in hosting malicious FCWs. This significant increase underscores the challenges associated with hosting FCWs. It reveals an evolving landscape where the maliciousness of these sites becomes more pronounced over time, particularly within the top hosting CSPs networks. The data highlight the urgent need for increased vigilance and advanced security measures to address the increasing risks of hosting FCWs.
PCWs Distribution. Table 14c, which focuses on PCWs, indicates a marked reduction in the hosting of malicious PCWs. On average, the MPFP of the top hosting CSPs decreased by ≈19%. Several CSPs have shown remarkable progress, with MPFP falling to zero or below 5%; e.g., “Cloudflare” decreased from 76.39% to 4.86%, “Akamai” from 6.25% to 0%, “Microsoft” from 11.11% to 0%, and “SP-Shopify” from 83.33% to 0%.

5.3.2. Insights and Potential Explanation

Interestingly, “Google” deviates from this trend, with its MPFP rising from 10% to 13.33%. This change underscores the unique challenges and responses that different CSPs may face in managing PCW security. These findings also highlight the evolving nature of security in the context of PCW hosting. Substantial improvements in most CSPs suggest a trend toward more secure hosting environments over time. This trend indicates the increasing effectiveness of the security measures that these providers can potentially use, reflecting their commitment to improving their safety and integrity.
Potential Explanation. The differences in malicious concentration across CSPs hosting FCWs and PCWs in the recent scans highlight the variable cybersecurity postures and practices within the hosting industry. CSPs like Cloudflare and Amazon, which host a larger number of websites, have experienced a decrease in MPFP and total MP in recent scans. This trend may reflect their ongoing commitment to enhancing security measures, driven by the reputational risks of hosting a significant volume of malicious sites. These CSPs are likely to have the resources to implement detection and mitigation strategies, thereby reducing the prevalence of malicious content.
For smaller CSPs, e.g., Trellian and LeaseWeb, the MPFP remains relatively high, which could be due to the higher concentration of FCWs. These FCWs often feature dynamic content and aggressive advertising models that can inadvertently introduce malicious scripts, thereby inflating the MPFP. As these CSPs may cater to a niche market or operate with fewer resources, their capacity to enforce stringent security protocols might be limited, sustaining the MPFP levels seen in scans.
On the other hand, the decrease in MP among PCWs hosted on platforms like Amazon indicates that premium services are likely implementing rigorous security protocols, perhaps as part of a value proposition to their customers. This could include regular audits, immediate rectification of vulnerabilities, and swift responses to any detection of malicious content to maintain the integrity and trustworthiness of their hosted websites.
Across the board, the reduction in overall malicious counts and percentages reflects a positive shift towards better security, yet the data also points to the ongoing challenges CSPs face in balancing security with the need for accessible and dynamic content delivery. The nuanced changes underscore the importance of continuous security improvements and proactive measures to combat evolving threats in the digital landscape.

5.4. Hosting Countries Evolution

5.4.1. Summary of Changes

Overall Distribution. Table 15a highlights the frequency of host countries for FCWs and PCWs, revealing patterns consistent with the CSP analysis. Overall, the aggregate MPFP decreases marginally, by 0.6%, although the changes vary by country. For example, the United States shows a decrease from 33.60% to 27.15%, and Belgium from 67.68% to 65.66%; France decreases by 5.72%, and Ireland achieves a notable reduction to 0%. China remains stable at 21.21%. Several countries, however, show an increasing pattern: the Netherlands (+8.42%), Germany (+19.1%), and Australia (+31.3%) all report substantial increases, Canada rises by 8.33%, and the United Kingdom shows a more modest increase of 5.13%. These diverse patterns point to a complex interplay between content type and global distribution, where improvements often reflect a higher share of PCWs, while worsening cases correlate with increased FCW hosting.
FCWs Distribution. Table 15b reveals sharp increases in MPFP for several top FCW-hosting countries. The United States rises by 11.78%, Belgium by 3.41%, Germany by 24.33%, the Netherlands by 18.18%, and Australia by 35.71%. The United Kingdom and Russia also rise by 17.65% and 38.47%, respectively. Canada shifts from 0% to 50%, and Romania increases by 28.57%. These results confirm that FCWs are highly vulnerable, with 55.71% of FCWs classified as malicious. The data suggest a strong correlation between FCW hosting location and maliciousness, underscoring the need for enhanced monitoring of FCWs hosted in mid-tier countries such as Belgium, Germany, Australia, and Romania.
PCWs Distribution. By contrast, Table 15c shows that PCWs have generally improved in security. The United States reduced its MPFP from 17.48% to 4.54%, while other countries also show reductions. Only China remains stable at 21.43%. Most major PCW-hosting countries report single-digit or zero MPFP values, reinforcing the effectiveness of PCWs in implementing security strategies. The overall trend highlights a structural divide: PCWs maintain consistently lower malicious presence compared to FCWs.

5.4.2. Insights and Explanations

United States as a Hosting Hub with Divergent Risks. The United States dominates hosting across all categories, accounting for ≈59% overall, ≈51% of FCWs, and ≈67% of PCWs. Yet the security profile diverges: FCWs in the U.S. show very high malicious counts (MC = 218, MPFP ≈ 55%), while PCWs remain relatively secure (MPFP ≈ 4.5%). This contrast illustrates how business models, rather than geography alone, shape hosting risks.
Belgium Punching Above Its Weight. Belgium hosts only 6–12% of websites but consistently contributes disproportionately to FCW maliciousness (MC = 65, MPFP ≈ 74%). However, in PCWs, Belgium shows no malicious instances. This high-risk, low-volume profile suggests targeted misuse of Belgian infrastructure by FCWs.
Premium Websites as Structurally Cleaner. Across nearly all major hosting countries, PCWs show minimal malicious presence. Several countries (e.g., Ireland, India, Germany, France, Belgium) host PCWs with no malicious content at all. Even when malicious instances exist (e.g., United States, United Kingdom, Canada), they remain in the single digits. This pattern reflects stronger compliance and contractual enforcement in premium services compared to FCWs.
Medium-Sized Hosts as Outliers. Countries with smaller total hosting shares, such as Germany and Australia, record disproportionately high FCW maliciousness (Germany MPFP ≈ 46%; Australia ≈ 60%). These “peripheral hotspots” highlight how attackers leverage mid-tier providers with weaker oversight to host malicious FCWs.

5.4.3. Geographic Fragmentation of FCWs Versus Centralization of PCWs

FCWs are geographically diverse, with notable shares in Belgium, Germany, the Netherlands, Australia, France, and Romania. Many of these show elevated MPFP values (e.g., Romania ≈ 72%). PCWs, however, are highly centralized in the United States and some stable jurisdictions, reflecting stronger regulation and business incentives. This fragmentation versus centralization illustrates how attackers distribute FCWs to evade oversight while PCWs consolidate for reliability.
Potential Explanation. The divergence in malicious presence across countries reflects varying cybersecurity practices, regulatory environments, and hosting strategies. Countries like the United States benefit from an advanced cybersecurity infrastructure, which reduces PCW risks but does not fully mitigate FCW exploitation. In Belgium and France, high MPFP values for FCWs may stem from the accessibility and monetization models of free content, whereas PCWs face higher reputational and financial stakes that drive stronger defenses. The overall global reduction in MPFP suggests improvements in cybersecurity awareness, yet persistent malicious concentrations in certain regions underscore the need for targeted investment and regulatory harmonization.

6. Discussion

The results of the network-scale distribution, spatial analysis, CSP evaluation, and temporal re-scans reveal consistent yet nuanced patterns across the benchmark, FCWs, and PCWs datasets. Below, we summarize the key takeaways, followed by the shortcomings and limitations of this study and the overall recommendations.

6.1. Main Takeaways

Medium-Scale Networks as the Core Risk Zone. Across categories, FCWs and PCWs are disproportionately concentrated in medium-scale networks, which consistently show the highest malicious presence (MP). These networks provide the optimal balance of availability, cost, and oversight gaps for attackers. However, isolating medium-scale networks is ineffective, as many benign websites also reside there. A finer-grained tiering of medium networks is necessary to separate malicious from benign clusters. PCWs utilize larger-scale networks than FCWs, highlighting the link between reliable networks and stronger security. Overall, large networks enforce higher standards, while medium networks remain the weakest link.
Business Model as the Primary Risk Driver. The divide between FCWs and PCWs is a stronger determinant of maliciousness than geography or CSP alone. FCWs are consistently more malicious across all categories, regardless of the host country or provider. Premium services enforce stricter compliance and security practices, resulting in significantly lower MPFP values. The ranking of categories by average MP confirms this divide: games (47.82%), software (41.49%), books (28.81%), music (28.1%), and movies (20.79%), with an overall average MP of 31.34%. This order also holds for the share of malicious websites hosted in medium-scale networks, addressing RQ1 and RQ2.
CSP Affinities Shape Hosting Risk. A small set of CSPs dominate hosting, but their risk profiles differ sharply. Cloudflare is both the most widely used CSP and the one with the highest malicious concentration, while Amazon consistently shows lower MPFP values, likely reflecting stronger enforcement. Liquid Web ranks second in MPFP, and Fastly did not host malicious websites in one of the benchmark tables. Despite these contrasts, overlaps between malicious and benign websites across providers complicate efforts to isolate the “riskiest” CSPs. Notably, FCWs and PCWs follow heavy-tailed CSP distributions similar to the benchmark (top one million websites), though benign websites in PCWs cluster around a smaller set of providers. These findings provide answers for RQ5.
Geographic Risk is Uneven and Contextual. The United States dominates hosting overall (≈59%) and for both FCWs (≈51%) and PCWs (≈67%). Yet risk profiles diverge: FCWs in the U.S. exhibit high malicious counts (MC = 218, MPFP ≈ 55%), while PCWs remain comparatively clean (MPFP ≈ 4.5%). Other countries, such as Belgium, Germany, Romania, and Australia, host fewer websites but show disproportionately high MPFP values for FCWs, making them peripheral hot spots. In contrast, PCWs are concentrated in a few countries that report near-zero malicious footprints (e.g., Ireland, India, Germany, France, Belgium), reinforcing the role of regulatory maturity. More than half of the top CSPs (58.58%) are U.S.-based, while others are distributed across Belgium, the Netherlands, Germany, and Australia. These results contribute to addressing RQ3.
Category-Specific Vulnerability Profiles. Software FCWs are the most dangerous, with the highest MPFP values, reflecting the inherent risks of executable downloads. Games and music FCWs also show elevated maliciousness, often linked to cross-domain players. Books and movies show relatively lower rates but still contribute significant malicious shares, showing that distribution models (e.g., downloads vs. streaming) affect vulnerability levels.
Benchmark Comparison. Comparing FCWs and PCWs against the benchmark dataset highlights answers to RQ4. Network-scale distributions appear similar across datasets, but CSP distributions diverge. Cloudflare and Amazon dominate across all three datasets, though FCWs and PCWs have significantly higher malicious rates than the top one million websites. This elevated rate is largely driven by FCWs hosted on top CSPs. Liquid Web, for example, hosts the highest proportion of malicious websites in the benchmark (MPFP 23.08%), and ranks second in the combined FCW/PCW dataset. Heavy-tailed patterns are evident in all datasets, but benign websites are far more concentrated in PCWs.
Temporal Analysis. The re-scanning of FCWs and PCWs using VirusTotal provides insights into RQ6. FCWs reveal increasing maliciousness over time, while PCWs improve their security posture. Free movie websites show the highest increase in malicious activity, whereas premium games and premium books exhibit the strongest improvements. Overall, the security of content websites improved with time, as did the resilience of their infrastructure entities (networks, CSPs, and hosting countries).
Structural Insights for Future Defenses. Overall, the business model (free versus premium) is the most reliable predictor of maliciousness. Defensive strategies should focus on medium-scale networks and mid-tier hosting countries that repeatedly emerge as high-risk. Static blacklisting is inadequate, as attackers exploit dynamic infrastructure (e.g., shared players and free software) that requires behavioral and contextual defenses. Effective risk management must integrate fine-grained network tiering, CSP monitoring, and longitudinal scanning to capture evolving malicious patterns.

6.2. Limitations

This study has several limitations that should be taken into account when interpreting the results. First, the dataset represents a snapshot of FCWs and PCWs collected at a specific point in time. Because hosting infrastructures, CSP policies, and regional regulations evolve rapidly, the findings may not fully generalize to other timeframes or future ecosystem conditions. Although we perform an updated VirusTotal reassessment in Section 5, long-term longitudinal measurements remain an important direction for future work.
Second, manual annotation was necessary to distinguish FCWs from PCWs and to determine content categories. While we followed a structured, keyword-based protocol with a verification pass to minimize ambiguity (Section 3.2.1), manual labeling inevitably introduces the possibility of human error. Automated or ML-based classification was not used due to the semantic and context-dependent nature of business-model identification, which current automated methods cannot reliably perform without first relying on manually annotated training data.
Third, maliciousness detection relies on VirusTotal’s multi-engine evaluation. VirusTotal aggregates approximately 90 independent security engines, which provides strong cross-validation, yet individual engines may still produce false positives or false negatives. We employ the standard “≥1” threshold used in prior FCW/PCW studies to identify whether a website has been flagged by at least one engine. This heuristic supports a consistent comparison across FCWs and PCWs, but it does not imply causality, nor does it guarantee that every flagged website is malicious. These interpretive constraints are acknowledged in our analyses.
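The “≥1” rule can be sketched as a simple threshold over per-engine verdicts. The snippet below is a minimal illustration, not the study's pipeline: the engine names and the input format (a mapping from engine name to a boolean verdict) are hypothetical, and no VirusTotal API call is shown.

```python
def is_flagged(engine_verdicts: dict[str, bool], threshold: int = 1) -> bool:
    """Return True when at least `threshold` engines flag the website."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # Each True verdict counts as one flagging engine.
    return sum(engine_verdicts.values()) >= threshold

# Hypothetical verdicts for one website from three made-up engines
site = {"engine_a": False, "engine_b": True, "engine_c": False}

print(is_flagged(site))               # a single flag suffices under the ">=1" rule
print(is_flagged(site, threshold=2))  # a stricter rule would not label this site
```

Raising the threshold would reduce the influence of a single engine's false positive at the cost of missing websites flagged by only one engine, which is why the labels should be read as indicators rather than ground truth.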
Fourth, our infrastructure-focused framework leverages IP-level metadata obtained through the ipdata and IPSHU APIs, including subnet masks, prefix lengths, ASN information, CSP identifiers, and geolocation. While these sources provide standardized and machine-readable infrastructure attributes, they do not include detailed WHOIS-registration data. WHOIS records are often incomplete, inconsistent across registrars, and heavily redacted, making them unsuitable for large-scale, reproducible infrastructure measurements. As a result, our classification and enumeration do not incorporate domain-registration attributes such as ownership, administrative contacts, or registrar-level history.
Fifth, although our analysis considers hosting geography at the country level, we treat regional subsidiaries of major CSPs (e.g., Amazon Data Services Canada, Amazon Data Services France) as part of their parent CSP. This simplification facilitates aggregate CSP-level characterization, but it may obscure potential regional differences in security practices or operational policies. A more granular examination of region-specific CSP subsidiaries represents a promising direction for future research.
Finally, while we perform correlation analyses to understand how infrastructure attributes relate to maliciousness, the study does not attempt predictive modeling or causal inference. Additional data sources—such as behavioral traces, flow-level logs, or longitudinal snapshots—would be required to build predictive or causal frameworks. Our results should therefore be interpreted as descriptive associations rather than causal mechanisms.
On Predictive and Causal Modeling. The goal of this study is descriptive rather than predictive: we characterize how maliciousness is associated with infrastructure attributes such as network scale, CSP distribution, and hosting geography. While these correlations highlight conditions under which malicious FCWs and PCWs are more prevalent, they do not establish causality. Developing predictive or causal models would require additional longitudinal, behavioral, or content-level data and a modeling framework beyond the scope of the present measurement study. Our empirical findings, however, provide a useful foundation on which such future modeling efforts can build.
Relation to SDN-Based Traffic Classification. Software-Defined Networking (SDN) offers flow-level programmability and centralized policy enforcement that can support fine-grained traffic classification. Such methods require packet- or flow-level data and programmable control-plane environments, which differ fundamentally from the infrastructure-level attributes examined in this study. Because our focus is on hosting networks, CSPs, and geographical distributions rather than traffic-flow behaviors, SDN-based classification techniques are complementary but outside the scope of our measurement framework.
Dataset Scale and Annotation Feasibility. An important consideration in interpreting our findings is the scale and nature of the dataset used in this study. Our dataset includes 1562 websites, a substantial yet deliberately bounded scope given the requirement for detailed manual annotation of access models, content categories, and infrastructure attributes. Manual inspection is essential for achieving accurate FCW/PCW labeling and fine-grained semantic categorization, but it does not scale linearly and cannot be reliably replaced by automated classifiers without first relying on similar human-labeled ground truth. Within these constraints, our dataset captures a broad and diverse cross-section of the FCW/PCW ecosystem, spanning multiple hosting networks, geographies, and content domains, and provides sufficient coverage to reveal consistent structural and security-related patterns. While larger datasets would be valuable in future work, the present scale reflects a balance between methodological rigor, feasibility, and the goal of characterizing key infrastructural dynamics with high labeling fidelity.

6.3. Recommendations

As in [1,3,4], our study found that FCWs are consistently more malicious and vulnerable than PCWs. Similar to other works that analyzed the security of the most used websites [10,11,29,43], our contrast analysis shows that the top one million websites are less malicious than FCWs. This suggests that, even when benign, FCWs may be more susceptible to security breaches. Furthermore, prior research examined security factors of CSPs and networks [20,21,23,36,37,38,39,44,45,46,47,48,49,50] and proposed techniques to strengthen networks with strong CSP affinities. Our findings echo this, highlighting CSPs frequently used to host malicious FCWs.
The results suggest that network administrators should adopt more stringent security measures to defend against malicious activities. Specifically, organizations should focus on addressing risks of medium-scale networks, as these are often linked to malicious websites. Examining the CSPs associated with FCWs can also help identify providers that host disproportionately high numbers of malicious websites, enabling targeted defensive or legal actions where appropriate. To improve classification accuracy, additional security annotations should be incorporated using tools beyond VirusTotal, such as Google Safe Browsing, PhishTank, and other security services.
While medium-scale networks exhibit a higher concentration of malicious activity, isolating them outright is operationally challenging and could lead to substantial collateral damage, as approximately 80% of benign websites also reside within these networks. To minimize such side effects, we refine our recommendation to emphasize stricter inspection, risk-weighted monitoring, and reputation-based throttling for medium-scale networks, rather than blanket isolation. These targeted controls preserve the benefits of prioritizing medium-risk segments while maintaining acceptable availability for benign services.
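The share of benign websites hosted in medium-scale networks can be checked against the counts reported in Table 3. The following is a minimal sketch, assuming the figure is derived from the combined FCW and PCW counts per network scale; the names `TABLE3_COUNTS` and `benign_share` are illustrative and not part of the original study.

```python
# (website count, malicious count) per network scale, copied from Table 3
# (Free and Premium Content Websites). The names here are illustrative.
TABLE3_COUNTS = {
    "Small": (38, 7),
    "Medium": (1271, 440),
    "Large": (199, 32),
    "Very Large": (1, 0),
}


def benign_share(counts, scale):
    """Return the fraction of all benign websites that are hosted in `scale`."""
    total_benign = sum(n - mc for n, mc in counts.values())
    n, mc = counts[scale]
    return (n - mc) / total_benign


share = benign_share(TABLE3_COUNTS, "Medium")
print(f"Benign websites hosted in medium-scale networks: {share:.1%}")
```

Under this assumption, 831 of the 1030 benign websites sit in medium-scale networks, which is consistent with the approximate figure quoted above and illustrates why blanket isolation would affect mostly benign services.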
Further research is needed to deepen the understanding of the relationship between hosting patterns and malicious content. For instance, future studies could analyze factors such as website age or domain registration date, which may influence classification outcomes. Examining the dynamic code of FCWs could also reveal the severity of their vulnerabilities, thereby improving classification accuracy and offering deeper insight into their functionality.
Researchers should also investigate alternative methods for detecting malicious activity within medium-scale networks to strengthen internet security. Although the study shows that most websites fall within medium-scale networks, additional work is needed to determine which specific ranges pose the most significant risks. To address this, we propose dividing medium-scale networks into multiple tiers and assessing the security posture of websites within each tier. Such an approach would allow organizations to focus defenses on networks with heightened vulnerability, supporting more effective risk management.
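The tiering proposal can be sketched in a few lines. The example below maps an IPv4 CIDR block to the network scales of Table 2 and then to a sub-tier of the medium scale. The three tier boundaries (`MN-1`, `MN-2`, `MN-3`) are hypothetical placeholders chosen only for illustration; the study does not specify where the tiers should be cut.

```python
import ipaddress

# Hypothetical sub-tiers of the medium scale (/17 to /24); the cut points
# are an assumption for illustration, not values proposed in the study.
MEDIUM_TIERS = {
    "MN-1": range(17, 20),  # /17 to /19
    "MN-2": range(20, 23),  # /20 to /22
    "MN-3": range(23, 25),  # /23 to /24
}


def prefix_length(cidr: str) -> int:
    """Parse an IPv4 CIDR block and return its prefix length."""
    return ipaddress.IPv4Network(cidr, strict=False).prefixlen


def network_scale(cidr: str) -> str:
    """Map a CIDR block to the scale labels used in Table 2."""
    prefix = prefix_length(cidr)
    if prefix > 24:
        return "SN"
    if prefix > 16:
        return "MN"
    if prefix > 8:
        return "LN"
    if prefix > 0:
        return "VLN"
    raise ValueError("a /0 prefix falls outside the ranges in Table 2")


def medium_tier(cidr: str):
    """Return the hypothetical medium-scale tier, or None if not medium."""
    prefix = prefix_length(cidr)
    for name, prefixes in MEDIUM_TIERS.items():
        if prefix in prefixes:
            return name
    return None


print(network_scale("198.51.100.0/22"), medium_tier("198.51.100.0/22"))
```

Grouping the per-website maliciousness labels by the returned tier would then show whether risk is spread evenly across the medium scale or concentrated in a narrower range of prefix lengths.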
Finally, it is critical to study how attackers exploit free services hosted by trusted providers to launch attacks. Identifying these exploitation methods will enable organizations to detect and mitigate attacks more effectively, thereby reducing their duration and impact. By understanding attacker strategies, defenses can be proactively strengthened, either preventing attacks altogether or minimizing their consequences.

7. Conclusions and Future Work

Building on the limitations and recommendations discussed in the previous section, we now turn to the broader conclusions of this study. Our results show that FCWs and PCWs, like malicious websites, are concentrated in medium-scale networks, implying that isolating networks of this scale alone may not be an effective solution. Furthermore, we identified Cloudflare (≈68.9%), Liquid Web (≈44.4%), LeaseWeb (≈29.4%), Sp-Team (≈28.6%), and Trellian (≈23.8%) as the most common CSPs with high overlap between malicious and benign websites. This indicates a need for further investigation of their distribution and potential weaknesses in security protocols or policies in the countries where they operate.
Future work should examine how the distribution of FCWs and PCWs changes over time and whether these changes follow specific patterns. It is also essential to identify effective strategies to contain and limit the spread of malicious FCWs, considering factors such as network scale, CSPs, and hosting countries. Additionally, comparing the distribution and hosting patterns of FCWs with other cyber threats, such as phishing, scams, or ransomware attacks, is essential to uncover commonalities or differences in their spread and impact.
This study highlights the ongoing need to enhance the security of FCWs. Future research could explore vulnerability enumeration in FCWs to raise user awareness by identifying weak points in their infrastructure before attackers can exploit them.

Author Contributions

Conceptualization, M.A. and D.M.; methodology, M.A. and M.H.; software, M.A. and A.A. (Abdulrahman Alabduljabbar); validation, M.H., H.A., and A.A. (Ahmed Abdalaal); formal analysis, M.A. and M.M.; investigation, M.A., M.H., and A.A. (Abdulrahman Alabduljabbar); resources, D.M.; data curation, H.A. and A.A. (Ahmed Abdalaal); writing—original draft preparation, M.A.; writing—review and editing, M.H., M.M., and D.M.; visualization, H.A.; supervision, D.M.; project administration, D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by funding from Prince Sattam bin Abdulaziz University, project number (PSAU/2025/R/1447). The authors would like to acknowledge the Deanship of Graduate Studies and Scientific Research, Taif University, for funding this work. Finally, this work was supported by a SEED Grant from the Office of Research and Commercialization of the University of Central Florida (2024/2025).

Data Availability Statement

All data is available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alabduljabbar, A.; Mohaisen, D. Measuring the Privacy Dimension of Free Content Websites through Automated Privacy Policy Analysis and Annotation. In Proceedings of the Companion of the Web Conference, WWW’22, Lyon, France, 25–29 April 2022; pp. 860–867. [Google Scholar] [CrossRef]
  2. Akhawe, D.; Barth, A.; Lam, P.E.; Mitchell, J.C.; Song, D. Towards a Formal Foundation of Web Security. In Proceedings of the 23rd IEEE Computer Security Foundations Symposium, CSF, Edinburgh, UK, 17–19 July 2010; pp. 290–304. [Google Scholar] [CrossRef]
  3. Alabduljabbar, A.; Ma, R.; Choi, S.; Jang, R.; Chen, S.; Mohaisen, D. Understanding the Security of Free Content Websites by Analyzing their SSL Certificates: A Comparative Study. In Proceedings of the CySSS@AsiaCCS, Nagasaki, Japan, 30 May 2022; pp. 19–25. [Google Scholar] [CrossRef]
  4. Alaqdhi, M.; Alabduljabbar, A.; Thomas, K.; Salem, S.; Nyang, D.; Mohaisen, D. Do Content Management Systems Impact the Security of Free Content Websites? A Correlation Analysis. In Proceedings of the CSoNet, Virtual Event, 5–7 December 2022; pp. 141–154. [Google Scholar] [CrossRef]
  5. Ghodke, S. Top 1 Million Websites. 2022. Available online: https://www.kaggle.com/datasets/cheedcheed/top1m (accessed on 8 December 2022).
  6. Analyze Suspicious Files and URLs to Detect Types of Malware Automatically. 2022. Available online: https://www.virustotal.com/ (accessed on 14 December 2022).
  7. Roy, S.S.; Karanjit, U.; Nilizadeh, S. A Large-Scale Analysis of Phishing Websites Hosted on Free Web Hosting Domains. arXiv 2022. [Google Scholar] [CrossRef]
  8. Lee, D.; Nam, K.; Han, I.; Cho, K. From free to fee: Monetizing digital content through expected utility-based recommender systems. Inf. Manag. 2022, 59, 103681. [Google Scholar] [CrossRef]
  9. Kontaxis, G.; Antoniades, D.; Polakis, I.; Markatos, E.P. An empirical study on the security of cross-domain policies in rich internet applications. In Proceedings of the Fourth European Workshop on System Security, EuroSec, Salzburg, Austria, 10 April 2011; pp. 1–6. [Google Scholar] [CrossRef]
  10. Libert, T. Exposing the Hidden Web: An Analysis of Third-Party HTTP Requests on 1 Million Websites. arXiv 2015. [Google Scholar] [CrossRef]
  11. Alsmadi, I.; Mira, F. Website security analysis: Variation of detection methods and decisions. In Proceedings of the 21st IEEE/Saudi Computer Society National Computer Conference (NCC), Riyadh, Saudi Arabia, 25–26 April 2018; pp. 1–5. Available online: https://ieeexplore.ieee.org/abstract/document/8592962 (accessed on 20 December 2025).
  12. Dobolyi, D.G.; Abbasi, A. PhishMonger: A free and open source public archive of real-world phishing websites. In Proceedings of the IEEE Conference on Intelligence and Security Informatics, ISI, Tucson, AZ, USA, 28–30 September 2016; pp. 31–36. [Google Scholar] [CrossRef]
  13. Matic, S.; Tyson, G.; Stringhini, G. PYTHIA: A Framework for the Automated Analysis of Web Hosting Environments. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3072–3078. [Google Scholar] [CrossRef]
  14. Calzavara, S.; Rabitti, A.; Bugliesi, M. Content Security Problems?: Evaluating the Effectiveness of Content Security Policy in the Wild. In Proceedings of the ACM CCS, Vienna, Austria, 24–28 October 2016; pp. 1365–1375. [Google Scholar] [CrossRef]
  15. Calzavara, S.; Rabitti, A.; Bugliesi, M. Semantics-Based Analysis of Content Security Policy Deployment. ACM Trans. Web 2018, 12, 1–36. [Google Scholar] [CrossRef]
  16. Samarasinghe, N.; Adhikari, A.; Mannan, M.; Youssef, A.M. Et tu, Brute? Privacy Analysis of Government Websites and Mobile Apps. In Proceedings of the ACM Web Conference, Lyon, France, 25–29 April 2022; pp. 564–575. [Google Scholar] [CrossRef]
  17. Englehardt, S.; Narayanan, A. Online Tracking: A 1-million-site Measurement and Analysis. In Proceedings of the ACM CCS, Vienna, Austria, 24–28 October 2016; pp. 1388–1401. [Google Scholar] [CrossRef]
  18. Wickramasinghe, N.; Nabeel, M.; Thilakaratne, K.; Keppitiyagama, C.; Zoysa, K.D. Uncovering IP Address Hosting Types Behind Malicious Websites. arXiv 2021. [Google Scholar] [CrossRef]
  19. Alkinoon, M.; Choi, S.J.; Mohaisen, D. Measuring Healthcare Data Breaches. In Proceedings of the 22nd International Conference on Information Security Applications, WISA, Jeju Island, Republic of Korea, 11–13 August 2021; pp. 265–277. [Google Scholar] [CrossRef]
  20. Kohout, J.; Pevný, T. Automatic discovery of web servers hosting similar applications. In Proceedings of the IFIP International Symposium on Integrated Network Management, Ottawa, ON, Canada, 11–15 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1310–1315. [Google Scholar] [CrossRef]
  21. Rizvi, S.R.; Killough, B.D.; Cherry, A.; Gowda, S. Lessons Learned and Cost Analysis of Hosting a Full Stack Open Data Cube (ODC) Application on the Amazon Web Services (AWS). In Proceedings of the International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8643–8646. [Google Scholar] [CrossRef]
  22. Nguyen, V.L.; Lin, P.; Hwang, R. Preventing the attempts of abusing cheap-hosting Web-servers for monetization attacks. arXiv 2019. [Google Scholar] [CrossRef]
  23. Wang, S.; MacMillan, K.; Schaffner, B.; Feamster, N.; Chetty, M. A First Look at the Consolidation of DNS and Web Hosting Providers. arXiv 2021. [Google Scholar] [CrossRef]
  24. Khare, S.; Badholia, A. Analysis of Cloud and Self-Web-Hosting Services Based on Security Parameters. Int. J. Inf. Syst. Model. Des. 2022, 13, 1–14. [Google Scholar] [CrossRef]
  25. Kasturi, R.P.; Sun, Y.; Duan, R.; Alrawi, O.; Asdar, E.; Zhu, V.; Kwon, Y.; Saltaformaggio, B. TARDIS: Rolling Back the Clock on CMS-Targeting Cyber Attacks. In Proceedings of the IEEE Symposium on Security and Privacy, SP, San Francisco, CA, USA, 18–21 May 2020; pp. 1156–1171. [Google Scholar] [CrossRef]
  26. Fett, D.; Küsters, R.; Schmitz, G. The Web SSO Standard OpenID Connect: In-depth Formal Security Analysis and Security Guidelines. In Proceedings of the 30th IEEE Computer Security Foundations Symposium, CSF, Santa Barbara, CA, USA, 21–25 August 2017; pp. 189–202. [Google Scholar] [CrossRef]
  27. Mannes, E.; Maziero, C. Naming Content on the Network Layer: A Security Analysis of the Information-Centric Network Model. ACM Comput. Surv. 2019, 52, 44:1–44:28. [Google Scholar] [CrossRef]
  28. Bangera, P.; Gorinsky, S. Ads versus regular contents: Dissecting the web hosting ecosystem. In Proceedings of the Networking Conference, IFIP Networking and Workshops, Stockholm, Sweden, 12–16 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–9. [Google Scholar] [CrossRef]
  29. Li, Z.; Zhang, K.; Xie, Y.; Yu, F.; Wang, X. Knowing your enemy: Understanding and detecting malicious web advertising. In Proceedings of the ACM Conference on Computer and Communications Security, CCS, Raleigh, NC, USA, 16–18 October 2012; pp. 674–686. [Google Scholar] [CrossRef]
  30. Lindkvist, R.; Petersson, L.; Bruhner, C.M.; Hasselquist, D.; Arlitt, M.; Carlsson, N. Characterizing the Trust Dilemma: Comparing Web Security of Malicious and Benign Domains. In Proceedings of the 2025 9th Network Traffic Measurement and Analysis Conference (TMA), Copenhagen, Denmark, 10–13 June 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–4. [Google Scholar]
  31. Figueras-Martín, E.; Magán-Carrión, R.; Boubeta-Puig, J. Drawing the web structure and content analysis beyond the Tor darknet: Freenet as a case of study. J. Inf. Secur. Appl. 2022, 68, 103229. [Google Scholar] [CrossRef]
  32. Samarasinghe, N.; Kapoor, P.; Mannan, M.; Youssef, A. No salvation from trackers: Privacy analysis of religious websites and mobile apps. In Proceedings of the International Workshop on Data Privacy Management, Copenhagen, Denmark, 26–30 September 2022; Springer: Cham, Switzerland, 2022; pp. 151–166. [Google Scholar]
  33. Chen, H.; Pu, Y.; Atkin, D. Migration stress, risky Internet uses, and scam victimization: An empirical study among Chinese migrant workers. Telemat. Inform. 2023, 83, 102022. [Google Scholar] [CrossRef]
  34. Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, L.K.; Perez-Meana, H.M.; Portillo-Portillo, J.; Olivares-Mercado, J. Methodological Approach for Identifying Websites with Infringing Content via Text Transformers and Dense Neural Networks. Future Internet 2023, 15, 397. [Google Scholar] [CrossRef]
  35. Mohaisen, A.; Alrawi, O.; Mohaisen, M. AMAL: High-fidelity, behavior-based automated malware analysis and classification. Comput. Secur. 2015, 52, 251–266. [Google Scholar] [CrossRef]
  36. Noroozian, A.; Rodríguez, E.; Lastdrager, E.; Kasama, T.; van Eeten, M.; Gañán, C. Can ISPs Help Mitigate IoT Malware? A Longitudinal Study of Broadband ISP Security Efforts. In Proceedings of the IEEE European Symposium on Security and Privacy, EuroS&P, Virtual, 6–10 September 2021; pp. 337–352. [Google Scholar] [CrossRef]
  37. Fryer, H.; StallaBourdillon, S.; Chown, T. Malicious web pages: What if hosting providers could actually do something. Comput. Law Secur. Rev. 2015, 31, 490–505. [Google Scholar] [CrossRef]
  38. Liao, X.; Liu, C.; McCoy, D.; Shi, E.; Hao, S.; Beyah, R.A. Characterizing Long-tail SEO Spam on Cloud Web Hosting Services. In Proceedings of the 25th International Conference on World Wide Web, ACM, Montréal, QC, Canada, 11–15 April 2016; pp. 321–332. [Google Scholar] [CrossRef]
  39. Tajalizadehkhoob, S.; van Goethem, T.; Korczynski, M.; Noroozian, A.; Böhme, R.; Moore, T.; Joosen, W.; van Eeten, M. Herding Vulnerable Cats: A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting. In Proceedings of the SIGSAC Conference on Computer and Communications Security, ACM, Dallas, TX, USA, 30 October–3 November 2017; pp. 553–567. [Google Scholar] [CrossRef]
  40. Reliable IP Address Data. 2022. Available online: https://ipdata.co/about.html (accessed on 14 December 2022).
  41. IP Address Lookup Tools. 2023. Available online: https://en.ipshu.com/ (accessed on 19 January 2023).
  42. Alqadhi, M.; Alkinoon, M.; Lin, J.; Abdalaal, A.; Mohaisen, D. Entangled Clouds: Measuring the Hosting Infrastructure of the Free Contents Web. In Proceedings of the ACM on Cloud Computing Security Workshop, CCSW, Copenhagen, Denmark, 26 November 2023; pp. 75–87. [Google Scholar] [CrossRef]
  43. Raponi, S.; Pietro, R.D. A Longitudinal Study on Web-Sites Password Management (in)Security: Evidence and Remedies. IEEE Access 2020, 8, 52075–52090. [Google Scholar] [CrossRef]
  44. Kondakci, S. A concise cost analysis of Internet malware. Comput. Secur. 2009, 28, 648–659. [Google Scholar] [CrossRef]
  45. Schulz, M.; Pieper, M. Web Compliance Management: Barrier-Free Websites Just by Simply Pressing the Button? Accessibility and the Use of Content-Management-Systems. In Proceedings of the Universal Access in Ambient Intelligence Environments, 9th ERCIM Workshop on User Interfaces for All, Königswinter, Germany, 27–28 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 419–426. [Google Scholar] [CrossRef]
  46. Vasek, M.; Weeden, M.; Moore, T. Measuring the Impact of Sharing Abuse Data with Web Hosting Providers. In Proceedings of the Workshop on Information Sharing and Collaborative Security, ACM, Vienna, Austria, 24 October 2016; pp. 71–80. Available online: http://dl.acm.org/citation.cfm?id=2994548 (accessed on 22 December 2025).
  47. Mirheidari, S.A.; Arshad, S.; Khoshkdahan, S.; Jalili, R. Two novel server-side attacks against log file in Shared Web Hosting servers. In Proceedings of the 7th International Conference for Internet Technology and Secured Transactions, ICITST, London, UK, 10–12 December 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 318–323. Available online: https://ieeexplore.ieee.org/document/6470968/ (accessed on 22 December 2025).
  48. Mirheidari, S.A.; Arshad, S.; Khoshkdahan, S.; Jalili, R. A Comprehensive Approach to Abusing Locality in Shared Web Hosting Servers. arXiv 2018. [Google Scholar] [CrossRef]
  49. Everman, B.; Zong, Z. GreenWeb: Hosting High-Load Websites Using Low-Power Servers. In Proceedings of the Ninth International Green and Sustainable Computing Conference, Pittsburgh, PA, USA, 22–24 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar] [CrossRef]
  50. Huynh, T.T.; Nguyen, T.D.; Nguyen, N.T.H.; Tan, H. Privacy-Preserving for Web Hosting. In Proceedings of the Industrial Networks and Intelligent Systems—6th EAI International Conference, Proceedings of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Hanoi, Vietnam, 27–28 August 2020; Springer: Cham, Switzerland, 2020; Volume 334, pp. 314–323. [Google Scholar] [CrossRef]
Figure 1. Per-category distribution of the FCWs vs. PCWs.
Figure 2. The data enumeration and feature extraction, along with steps leading to the final distribution of websites.
Figure 3. The spatial country-level distribution of (a) small, (b) medium, and (c) large networks hosting FCWs and PCWs.
Table 1. Comparison of Related Work and the Scope of Our Study. ✓ indicates the feature is present whereas ✗ indicates the feature is absent in the related work.
Work | FCWs | Content | Protocol | Network | Cloud | Location
Alabduljabbar et al. [1,3]
Alqadhi et al. [4]
Roy et al. [7]
Kontaxis et al. [9]
Li et al. [29]
Lindkvist et al. [30]
Figueras-Martín [31]
Samarasinghe et al. [32]
Chen et al. [33]
Hernandez-Suarez et al. [34]
Noroozian et al. [36]
Wickramasinghe et al. [18]
Fryer et al. [37]
Liao et al. [38]
Tajalizadehkhoob et al. [39]
Wang et al. [23]
Our Work
Table 2. Network scales and their characteristics. The network size is represented by each slash bit of the CIDR notation, where the decimal number after the slash character represents the number of bits in the network prefix of the IP address. The maximum slash bit is 32 (IPv4). x represents the number of bits and y represents the number of addresses.
| Scale | Bits in CIDR | # Addresses |
|---|---|---|
| Small (SN) | /24 < x ≤ /32 | 2^8 > y ≥ 2^0 |
| Medium (MN) | /16 < x ≤ /24 | 2^16 > y ≥ 2^8 |
| Large (LN) | /8 < x ≤ /16 | 2^24 > y ≥ 2^16 |
| Very Large (VLN) | /0 < x ≤ /8 | 2^32 > y ≥ 2^24 |
Table 3. Comparison of Free and Premium Content Websites (FCWs + PCWs) and General Websites across different network scales using website count (#), percentage (%), malicious count (MC), malicious count per feature percentage (MPFP), and percentage of malicious websites among all websites (MP).
Free and Premium Content Websites

| Scale | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 38 | 2.52 | 7 | 18.42 | 1.06 |
| Medium | 1271 | 84.23 | 440 | 34.62 | 29.16 |
| Large | 199 | 13.19 | 32 | 16.08 | 2.12 |
| Very Large | 1 | 0.07 | 0 | 0.00 | 0.00 |
| Total | 1509 | 100 | 479 | 31.74 | 31.74 |

General Websites

| Scale | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 0 | 0 | 0 | 0 | 0 |
| Medium | 1626 | 79.05 | 80 | 4.92 | 3.89 |
| Large | 430 | 20.90 | 12 | 2.79 | 0.58 |
| Very Large | 1 | 0.05 | 0 | 0.00 | 0.00 |
| Total | 2057 | 100 | 92 | 4.47 | 4.47 |
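The two derived columns in Table 3 can be reproduced from the raw counts. The sketch below assumes, based on the reported values, that MPFP is the malicious count divided by the number of websites in the same row and that MP is the malicious count divided by the total number of websites in the table. The function names are illustrative.

```python
def mpfp(malicious_count: int, row_count: int) -> float:
    """Malicious percentage within one row (e.g., one network scale)."""
    if row_count == 0:
        return 0.0
    return round(100 * malicious_count / row_count, 2)


def mp(malicious_count: int, total_count: int) -> float:
    """Malicious percentage relative to all websites in the table."""
    return round(100 * malicious_count / total_count, 2)


# Medium-scale row of Table 3 (FCWs + PCWs): 1271 websites, 440 malicious,
# out of 1509 websites overall.
print(mpfp(440, 1271), mp(440, 1509))
```

For the medium-scale row, this yields the 34.62 and 29.16 values reported in the table, which supports the assumed definitions.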
Table 4. An overview of the distribution per category (FCWs vs. PCWs, books, games) across different network scales.
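(a) Free Content Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 26 | 3.30 | 6 | 23.08 | 0.76 |
| Medium | 702 | 89.09 | 297 | 42.31 | 37.69 |
| Large | 60 | 7.61 | 16 | 26.67 | 2.03 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 788 | 100 | 319 | 40.48 | 40.48 |

(b) Free Books Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 7 | 5.00 | 2 | 28.57 | 1.39 |
| Medium | 123 | 85.00 | 39 | 31.71 | 27.08 |
| Large | 14 | 10.00 | 2 | 14.29 | 1.39 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 144 | 100 | 43 | 29.86 | 29.86 |

(c) Free Games Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 1 | 1.28 | 0 | 0.00 | 0.00 |
| Medium | 76 | 97.44 | 50 | 65.79 | 64.10 |
| Large | 1 | 1.28 | 0 | 0.00 | 0.00 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 78 | 100 | 50 | 64.10 | 64.10 |

(a) Premium Content Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 12 | 1.66 | 1.00 | 8.33 | 0.14 |
| Medium | 569 | 78.92 | 143 | 25.13 | 19.83 |
| Large | 139 | 19.28 | 16 | 11.51 | 2.22 |
| Very Large | 1 | 0.14 | 0 | 0.00 | 0.00 |
| Total | 721 | 100 | 160 | 22.19 | 22.19 |

(b) Premium Books Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 4 | 2.09 | 0 | 0.00 | 0.00 |
| Medium | 153 | 80.10 | 46 | 30.07 | 24.08 |
| Large | 33 | 17.28 | 7 | 21.21 | 3.66 |
| Very Large | 1 | 0.52 | 0 | 0.00 | 0.00 |
| Total | 191 | 100 | 53 | 27.75 | 27.75 |

(c) Premium Games Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 2 | 1.80 | 0 | 0.00 | 0.00 |
| Medium | 95 | 85.59 | 33 | 34.74 | 29.73 |
| Large | 14 | 12.61 | 2 | 14.29 | 1.80 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 111 | 100 | 35 | 31.53 | 31.53 |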
Table 5. The distribution per category (movies, music, and software) of FCWs vs. PCWs across different network scales.
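(a) Free Movies Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 5 | 1.61 | 0 | 0.00 | 0.00 |
| Medium | 284 | 91.61 | 75 | 26.41 | 24.19 |
| Large | 21 | 6.77 | 7 | 33.33 | 2.26 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 310 | 100.00 | 82 | 26.45 | 26.45 |

(b) Free Music Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 1 | 1.25 | 0 | 0.00 | 0.00 |
| Medium | 72 | 90.00 | 31 | 43.06 | 38.75 |
| Large | 7 | 8.75 | 0 | 0.00 | 0.00 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 80 | 100 | 31 | 38.75 | 38.75 |

(c) Free Software Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 12 | 6.82 | 4 | 33.33 | 2.27 |
| Medium | 147 | 83.52 | 102 | 69.39 | 57.95 |
| Large | 17 | 9.66 | 7 | 41.18 | 3.98 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 176 | 100 | 113 | 64.20 | 64.20 |

(a) Premium Movies Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 2 | 1.32 | 1 | 50.00 | 0.66 |
| Medium | 115 | 75.66 | 21 | 18.26 | 13.82 |
| Large | 35 | 23.03 | 1 | 2.86 | 0.66 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 152 | 100.00 | 23 | 15.13 | 15.13 |

(b) Premium Music Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 1 | 1.16 | 0 | 0.00 | 0.00 |
| Medium | 66 | 76.74 | 12 | 18.18 | 13.95 |
| Large | 19 | 22.09 | 3 | 15.79 | 3.49 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 86 | 100 | 15 | 17.44 | 17.44 |

(c) Premium Software Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Small | 3 | 1.10 | 0 | 0.00 | 0.00 |
| Medium | 140 | 77.35 | 31 | 22.14 | 17.13 |
| Large | 38 | 20.99 | 3 | 7.89 | 1.66 |
| Very Large | 0 | 0.00 | 0 | 0.00 | 0.00 |
| Total | 181 | 100 | 34 | 18.78 | 18.78 |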
Table 6. The distribution of the CSPs of FCWs and PCWs across the small (SN), medium (MN), large (LN), and very large (VLN) network scales.
Networks Distribution Over CSPs

| CSP | # | SN | MN | LN | VLN |
|---|---|---|---|---|---|
| Cloudflare | 410 | 0 | 410 | 0 | 0 |
| Amazon | 240 | 0 | 121 | 119 | 0 |
| Liquid | 72 | 0 | 72 | 0 | 0 |
| Trellian | 42 | 0 | 42 | 0 | 0 |
| Google | 41 | 0 | 28 | 13 | 0 |
| LeaseWeb | 37 | 0 | 37 | 0 | 0 |
| Sp-Team | 35 | 0 | 35 | 0 | 0 |
| Akamai | 33 | 0 | 33 | 0 | 0 |
| Fastly | 26 | 0 | 26 | 0 | 0 |
| Microsoft | 21 | 0 | 2 | 19 | 0 |
| Others | 552 | 38 | 465 | 48 | 1 |
| Total | 1509 | 38 | 1271 | 199 | 1 |
| % | 100 | 2.52 | 84.23 | 13.19 | 0.07 |
Table 7. The distribution of the hosting countries (Alpha-3) of FCWs and PCWs across different network scales (SN, MN, LN, and VLN).
Networks Distribution Over Countries

| Country | # | SN | MN | LN | VLN |
|---|---|---|---|---|---|
| USA | 884 | 24 | 718 | 141 | 1 |
| BEL | 99 | 0 | 99 | 0 | 0 |
| NLD | 95 | 0 | 95 | 0 | 0 |
| DEU | 89 | 4 | 78 | 7 | 0 |
| AUS | 48 | 0 | 46 | 2 | 0 |
| FRA | 35 | 1 | 28 | 6 | 0 |
| CHN | 33 | 1 | 26 | 6 | 0 |
| GBR | 31 | 6 | 18 | 7 | 0 |
| CAN | 24 | 0 | 18 | 6 | 0 |
| IRL | 22 | 0 | 12 | 10 | 0 |
| Others | 149 | 2 | 133 | 14 | 0 |
| Total | 1509 | 38 | 1271 | 199 | 1 |
| % | 100 | 2.52 | 84.23 | 13.19 | 0.07 |
Table 8. The country-level distribution of commonly used CSPs, with the count per provider (#) and the count per country. The names are coded using Alpha-3.
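Cloud Service Providers Distribution Over Countries

| CSP | # | USA | BEL | NLD | DEU | AUS | FRA | CHN | GBR | CAN | IRL | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cloudflare | 410 | 296 | 98 | 1 | 0 | 0 | 13 | 0 | 1 | 0 | 0 | 1 |
| Amazon | 240 | 191 | 0 | 0 | 0 | 3 | 1 | 0 | 5 | 1 | 20 | 19 |
| Liquid | 72 | 72 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Trellian | 42 | 0 | 0 | 0 | 0 | 42 | 0 | 0 | 0 | 0 | 0 | 0 |
| Google | 41 | 34 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| LeaseWeb | 37 | 2 | 0 | 34 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sp-Team | 35 | 0 | 0 | 0 | 35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Akamai | 33 | 1 | 0 | 28 | 2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| Fastly | 26 | 12 | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 10 |
| Microsoft | 21 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 |
| Others | 552 | 260 | 0 | 27 | 50 | 1 | 21 | 32 | 23 | 22 | 0 | 116 |
| Total | 1509 | 884 | 99 | 95 | 89 | 48 | 35 | 33 | 31 | 24 | 22 | 149 |
| % | 100 | 58.58 | 6.56 | 6.30 | 5.90 | 3.18 | 2.32 | 2.19 | 2.05 | 1.59 | 1.46 | 9.87 |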
Table 9. An overview of the distribution of the (top-1M, FCWs, and PCWs) across different cloud service providers.
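(a) General Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 337 | 16.38 | 14 | 4.15 | 0.68 |
| Amazon | 224 | 10.86 | 5 | 2.33 | 0.24 |
| Google | 95 | 4.62 | 0 | 0 | 0 |
| OVH | 72 | 3.50 | 4 | 5.56 | 0.19 |
| Hetzner Online | 56 | 2.72 | 3 | 5.36 | 0.15 |
| Microsoft | 42 | 2.04 | 1 | 2.38 | 0.05 |
| Liquid Web | 39 | 1.90 | 9 | 23.08 | 0.44 |
| Automattic | 36 | 1.75 | 0 | 0 | 0 |
| Alibaba | 29 | 1.41 | 1 | 3.45 | 0.05 |
| Digitalocean | 27 | 1.31 | 2 | 7.41 | 0.10 |
| Others | 1100 | 53.48 | 53 | 4.82 | 2.58 |
| Total | 2057 | 100 | 92 | 4.47 | 4.47 |

(b) Free Content Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 266 | 33.76 | 171 | 64.29 | 21.70 |
| Liquid Web | 67 | 8.50 | 32 | 47.76 | 4.06 |
| Amazon | 54 | 6.85 | 15 | 27.78 | 1.90 |
| Trellian | 42 | 5.33 | 10 | 23.81 | 1.27 |
| LeaseWeb | 36 | 4.57 | 10 | 27.78 | 1.27 |
| Sp-Team | 35 | 4.44 | 10 | 28.57 | 1.27 |
| Bodis | 17 | 2.16 | 4 | 23.53 | 0.51 |
| SEDO GmbH | 13 | 1.65 | 2 | 15.38 | 0.25 |
| OVH | 11 | 1.40 | 4 | 36.36 | 0.51 |
| Google | 11 | 1.40 | 4 | 36.36 | 0.51 |
| Others | 236 | 29.95 | 57 | 24.15 | 7.23 |
| Total | 788 | 100 | 319 | 40.48 | 40.48 |

(c) Premium Content Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Amazon | 186 | 25.80 | 16 | 8.60 | 2.22 |
| Cloudflare | 144 | 19.97 | 110 | 76.39 | 15.26 |
| Akamai | 32 | 4.44 | 2 | 6.25 | 0.28 |
| Google | 30 | 4.16 | 3 | 10 | 0.42 |
| Fastly | 23 | 3.19 | 0 | 0 | 0 |
| Microsoft | 18 | 2.50 | 2 | 11.11 | 0.28 |
| Sp-Shopify | 12 | 1.66 | 10 | 83.33 | 1.39 |
| Ebay | 8 | 1.11 | 0 | 0 | 0 |
| Wal-Mart | 8 | 1.11 | 1 | 12.50 | 0.14 |
| Ovh | 7 | 0.97 | 1 | 14.29 | 0.14 |
| Others | 253 | 35.09 | 15 | 5.93 | 2.08 |
| Total | 721 | 100 | 160 | 22.19 | 22.19 |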
Table 10. An overview of the distribution per category (combined, books, and games) across different cloud service providers.
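(a) Combined

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 410 | 27.17 | 281 | 68.54 | 18.62 |
| Amazon | 240 | 15.90 | 31 | 12.92 | 2.05 |
| Liquid Web | 72 | 4.77 | 32 | 44.44 | 2.12 |
| Trellian | 42 | 2.78 | 10 | 23.81 | 0.66 |
| Google | 41 | 2.72 | 7 | 17.07 | 0.46 |
| LeaseWeb | 37 | 2.45 | 10 | 27.03 | 0.66 |
| Sp-Team | 35 | 2.32 | 10 | 28.57 | 0.66 |
| Akamai | 33 | 2.19 | 2 | 6.06 | 0.13 |
| Fastly | 26 | 1.72 | 0 | 0 | 0 |
| Microsoft | 21 | 1.39 | 3 | 14.29 | 0.20 |
| Ovh | 18 | 1.19 | 5 | 27.78 | 0.33 |
| Bodis | 17 | 1.13 | 4 | 23.53 | 0.27 |
| Linode | 13 | 0.86 | 5 | 38.46 | 0.33 |
| Others | 504 | 33.40 | 79 | 15.68 | 5.24 |
| Total | 1509 | 100 | 479 | 31.74 | 31.74 |

(b) Free Books Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 39 | 27.08 | 28 | 71.79 | 19.44 |
| Amazon | 11 | 7.64 | 1 | 9.09 | 0.69 |
| Liquid Web | 10 | 6.94 | 3 | 30 | 2.08 |
| Trellian | 6 | 4.17 | 0 | 0 | 0 |
| Sp-Team | 5 | 3.47 | 0 | 0 | 0 |
| Others | 73 | 50.69 | 6 | 8.22 | 4.17 |
| Total | 144 | 100 | 43 | 29.86 | 29.86 |

(b) Premium Books Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Amazon | 41 | 21.47 | 3 | 7.32 | 1.57 |
| Cloudflare | 40 | 20.94 | 32 | 80 | 16.75 |
| Google | 9 | 4.71 | 1 | 11.11 | 0.52 |
| Sp-Shopify | 8 | 4.19 | 6 | 75 | 3.14 |
| Fastly | 5 | 2.62 | 0 | 0 | 0 |
| Others | 88 | 46.07 | 64 | 72.73 | 33.51 |
| Total | 191 | 100 | 53 | 27.75 | 27.75 |

(c) Free Games Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 42 | 53.85 | 39 | 92.86 | 50 |
| Mivocloud | 5 | 6.41 | 0 | 0 | 0 |
| LeaseWeb | 3 | 3.85 | 2 | 66.67 | 2.56 |
| Liquid Web | 3 | 3.85 | 3 | 100 | 3.85 |
| Amazon | 2 | 2.56 | 1 | 50 | 1.28 |
| Others | 23 | 29.49 | 5 | 21.74 | 6.41 |
| Total | 78 | 100 | 50 | 64.10 | 64.10 |

(c) Premium Games Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 37 | 33.33 | 29 | 78.38 | 26.13 |
| Amazon | 22 | 19.82 | 3 | 13.64 | 2.70 |
| Akamai | 11 | 9.91 | 0 | 0 | 0 |
| Fastly | 5 | 4.50 | 0 | 0 | 0 |
| Google | 4 | 3.60 | 0 | 0 | 0 |
| Others | 32 | 28.83 | 3 | 9.38 | 2.70 |
| Total | 111 | 100 | 35 | 31.53 | 31.53 |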
Table 11. An overview of the distribution by category (movies, music, and software) across various cloud service providers.
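(a) Free Movies Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 83 | 26.77 | 17 | 20.48 | 5.48 |
| Liquid Web | 36 | 11.61 | 11 | 30.56 | 3.55 |
| Trellian | 30 | 9.68 | 8 | 26.67 | 2.58 |
| Sp-Team | 24 | 7.74 | 9 | 37.50 | 2.90 |
| Amazon | 19 | 6.13 | 5 | 26.32 | 1.61 |
| Others | 118 | 38.06 | 32 | 27.12 | 10.32 |
| Total | 310 | 100 | 82 | 26.45 | 26.45 |

(b) Free Music Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 22 | 27.50 | 16 | 72.73 | 20 |
| Sp-Team | 6 | 7.50 | 1 | 16.67 | 1.25 |
| Google | 4 | 5 | 1 | 25 | 1.25 |
| Amazon | 3 | 3.75 | 1 | 33.33 | 1.25 |
| Liquid Web | 2 | 2.50 | 2 | 100 | 2.50 |
| Others | 43 | 53.75 | 10 | 23.26 | 12.50 |
| Total | 80 | 100 | 31 | 38.75 | 38.75 |

(c) Free Software Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 80 | 45.45 | 71 | 88.75 | 40.34 |
| Amazon | 19 | 10.80 | 7 | 36.84 | 3.98 |
| Liquid Web | 16 | 9.09 | 13 | 81.25 | 7.39 |
| LeaseWeb | 11 | 6.25 | 3 | 27.27 | 1.70 |
| Voxility LLP | 4 | 2.27 | 0 | 0 | 0 |
| Others | 46 | 26.14 | 19 | 41.30 | 10.80 |
| Total | 176 | 100 | 113 | 64.20 | 64.20 |

(a) Premium Movies Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Amazon | 56 | 36.84 | 1 | 1.79 | 0.66 |
| Cloudflare | 18 | 11.84 | 15 | 83.33 | 9.87 |
| Akamai | 9 | 5.92 | 1 | 11.11 | 0.66 |
| Google | 7 | 4.61 | 2 | 28.57 | 1.32 |
| Fastly | 5 | 3.29 | 0 | 0 | 0 |
| Others | 57 | 37.50 | 4 | 7.02 | 2.63 |
| Total | 152 | 100 | 23 | 15.13 | 15.13 |

(b) Premium Music Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Amazon | 30 | 34.88 | 3 | 10 | 3.49 |
| Cloudflare | 12 | 13.95 | 9 | 75 | 10.47 |
| Fastly | 4 | 4.65 | 0 | 0 | 0 |
| Google | 4 | 4.65 | 0 | 0 | 0 |
| Apple | 2 | 2.33 | 0 | 0 | 0 |
| Others | 34 | 39.53 | 3 | 8.82 | 3.49 |
| Total | 86 | 100 | 15 | 17.44 | 17.44 |

(c) Premium Software Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Amazon | 37 | 20.44 | 6 | 16.22 | 3.31 |
| Cloudflare | 37 | 20.44 | 25 | 67.57 | 13.81 |
| Microsoft | 9 | 4.97 | 0 | 0 | 0 |
| Akamai | 6 | 3.31 | 0 | 0 | 0 |
| Google | 6 | 3.31 | 0 | 0 | 0 |
| Others | 86 | 47.51 | 3 | 3.49 | 1.66 |
| Total | 181 | 100 | 34 | 18.78 | 18.78 |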
Table 12. The scans of both FCWs and PCWs with https://www.virustotal.com/gui/home/upload demonstrate the variations in their levels of maliciousness over time. The website count (#), percentage (%), malicious count (MC), malicious count per feature percentage (MPFP), and percentage of malicious websites among all websites (MP) for each category.
(a) Overall Content Websites

| Category | # | MC | MP | OMP | Diff |
|---|---|---|---|---|---|
| Books | 335 | 50 | 14.93 | 28.7 | −13.77 |
| Games | 189 | 63 | 33.33 | 44.97 | −11.64 |
| Movies | 462 | 181 | 39.18 | 22.73 | +16.45 |
| Music | 166 | 32 | 19.28 | 27.71 | −8.43 |
| Software | 357 | 144 | 40.32 | 41.18 | −0.86 |
| Overall | 1509 | 470 | 31.15 | 31.75 | −0.6 |

(b) Free Content Websites

| Category | # | MC | MP | OMP | Diff |
|---|---|---|---|---|---|
| Books | 144 | 45 | 31.47 | 29.86 | +1.61 |
| Games | 78 | 57 | 73.08 | 64.1 | +8.98 |
| Movies | 310 | 173 | 55.81 | 26.45 | +29.36 |
| Music | 80 | 30 | 37.5 | 38.75 | −1.25 |
| Software | 176 | 134 | 76.14 | 64.2 | +11.94 |
| Overall | 788 | 439 | 55.71 | 40.48 | +15.23 |

(c) Premium Content Websites

| Category | # | MC | MP | OMP | Diff |
|---|---|---|---|---|---|
| Books | 191 | 5 | 2.62 | 27.57 | −24.95 |
| Games | 111 | 6 | 5.41 | 31.53 | −26.12 |
| Movies | 152 | 8 | 5.26 | 15.13 | −9.87 |
| Music | 86 | 2 | 2.33 | 17.44 | −15.11 |
| Software | 181 | 10 | 5.52 | 18.78 | −13.26 |
| Overall | 721 | 31 | 4.3 | 22.19 | −17.89 |
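The Diff column in Table 12 can be reproduced as a simple difference in percentage points. The sketch below assumes that OMP denotes the malicious percentage from the earlier scan and that Diff = MP − OMP, which matches the reported values; the caption does not define OMP explicitly, so this reading and the function name `scan_diff` are assumptions.

```python
def scan_diff(mp: float, omp: float) -> float:
    """Change in malicious percentage between scans, in percentage points.

    A positive value means the category became more malicious over time.
    """
    return round(mp - omp, 2)


# (MP, OMP) pairs copied from Table 12.
free_movies = scan_diff(55.81, 26.45)    # free movies websites
premium_books = scan_diff(2.62, 27.57)   # premium books websites
print(f"{free_movies:+.2f}", f"{premium_books:+.2f}")
```

A positive sign, as for free movies websites, indicates rising maliciousness, whereas the negative values across all premium categories reflect the improving security posture noted in the analysis.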
Table 13. The network size analysis across FCWs and PCWs.
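(a) Overall Content Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| VLN | 1 | 0.07 | 0 | 0 | 0 |
| LN | 199 | 13.19 | 39 | 19.6 | 2.58 |
| MN | 1271 | 84.23 | 415 | 32.65 | 27.5 |
| SN | 38 | 2.52 | 16 | 42.11 | 1.06 |
| Total | 1509 | 100 | 470 | 31.15 | 31.15 |

(b) Free Content Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| VLN | 0 | 0 | 0 | 0 | 0 |
| LN | 60 | 7.61 | 33 | 55 | 4.19 |
| MN | 702 | 89.09 | 390 | 55.56 | 49.49 |
| SN | 26 | 3.3 | 16 | 61.54 | 2.03 |
| Total | 788 | 100 | 439 | 55.71 | 55.71 |

(c) Premium Content Websites

| Network | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| VLN | 1 | 0.14 | 0 | 0 | 0 |
| LN | 139 | 19.28 | 6 | 4.32 | 0.83 |
| MN | 569 | 78.92 | 25 | 4.39 | 3.47 |
| SN | 12 | 1.66 | 0 | 0 | 0 |
| Total | 721 | 100 | 31 | 4.3 | 4.3 |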
Table 14. An overview of FCWs and PCWs frequency over their hosting CSPs.
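(a) Overall Content Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 410 | 27.17 | 189 | 46.1 | 12.52 |
| Amazon | 240 | 15.9 | 33 | 13.75 | 2.19 |
| Liquid Web | 72 | 4.77 | 38 | 52.78 | 2.52 |
| Trellian | 42 | 2.78 | 25 | 59.52 | 1.66 |
| Google | 41 | 2.72 | 9 | 21.95 | 0.6 |
| Leaseweb | 37 | 2.45 | 13 | 35.14 | 0.86 |
| Sp-Team | 35 | 2.32 | 15 | 42.86 | 0.99 |
| Akamai | 33 | 2.19 | 0 | 0 | 0 |
| Fastly | 26 | 1.72 | 0 | 0 | 0 |
| Microsoft | 21 | 1.39 | 2 | 9.52 | 0.13 |
| Others | 552 | 36.58 | 146 | 26.45 | 9.68 |
| Total | 1509 | 100 | 470 | 31.15 | 31.15 |

(b) Free Content Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Cloudflare | 266 | 33.76 | 182 | 68.42 | 23.10 |
| Liquid Web | 67 | 8.5 | 38 | 56.72 | 4.82 |
| Amazon | 54 | 6.85 | 27 | 50 | 3.43 |
| Trellian | 42 | 5.33 | 25 | 59.52 | 3.17 |
| Leaseweb | 36 | 4.57 | 13 | 36.11 | 1.65 |
| Sp-Team | 35 | 4.44 | 15 | 42.86 | 1.9 |
| Bodis | 17 | 2.16 | 9 | 52.94 | 1.14 |
| SEDO | 13 | 1.65 | 9 | 69.23 | 1.14 |
| Ovh | 11 | 1.4 | 7 | 63.64 | 0.89 |
| Google | 11 | 1.4 | 5 | 45.45 | 0.63 |
| Others | 236 | 29.95 | 109 | 46.19 | 13.83 |
| Total | 788 | 100 | 439 | 55.71 | 55.71 |

(c) Premium Content Websites

| CSP | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| Amazon | 186 | 25.8 | 6 | 3.23 | 0.83 |
| Cloudflare | 144 | 19.97 | 7 | 4.86 | 0.97 |
| Akamai | 32 | 4.44 | 0 | 0 | 0 |
| Google | 30 | 4.16 | 4 | 13.33 | 0.55 |
| Fastly | 23 | 3.19 | 0 | 0 | 0 |
| Microsoft | 18 | 2.5 | 1 | 5.56 | 0.14 |
| Sp-Shopify | 12 | 1.66 | 0 | 0 | 0 |
| Ebay | 8 | 1.11 | 0 | 0 | 0 |
| Wal-Mart | 8 | 1.11 | 0 | 0 | 0 |
| Ovh | 7 | 0.97 | 0 | 0 | 0 |
| Others | 253 | 35.09 | 13 | 5.14 | 1.8 |
| Total | 721 | 100 | 31 | 4.30 | 4.30 |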
Table 15. An overview of FCWs and PCWs frequency over their top ten hosting countries. The names are coded using Alpha-3, where GBR stands for the United Kingdom, which includes Northern Ireland.
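(a) Overall Content Websites

| Country | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| USA | 884 | 58.58 | 240 | 27.15 | 15.9 |
| BEL | 99 | 6.56 | 65 | 65.66 | 4.31 |
| NLD | 95 | 6.3 | 27 | 28.42 | 1.79 |
| DEU | 89 | 5.9 | 34 | 38.2 | 2.25 |
| AUS | 48 | 3.18 | 25 | 52.08 | 1.66 |
| GBR | 39 | 2.58 | 14 | 35.9 | 0.93 |
| FRA | 35 | 2.32 | 13 | 37.14 | 0.86 |
| CHN | 33 | 2.19 | 7 | 21.21 | 0.46 |
| CAN | 24 | 1.59 | 5 | 20.83 | 0.33 |
| IRL | 22 | 1.46 | 0 | 0 | 0 |
| Otr. | 141 | 9.34 | 40 | 28.37 | 2.65 |
| Total | 1509 | 100 | 470 | 31.15 | 31.15 |

(b) Free Content Websites

| Country | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| USA | 399 | 50.63 | 218 | 54.64 | 27.66 |
| BEL | 88 | 11.17 | 65 | 73.86 | 8.25 |
| DEU | 74 | 9.39 | 34 | 45.95 | 4.31 |
| NLD | 55 | 6.98 | 27 | 49.09 | 3.43 |
| AUS | 42 | 5.33 | 25 | 59.52 | 3.17 |
| FRA | 20 | 2.54 | 13 | 65 | 1.65 |
| GBR | 17 | 2.16 | 12 | 70.59 | 1.52 |
| RUS | 13 | 1.65 | 7 | 53.85 | 0.89 |
| CAN | 8 | 1.02 | 4 | 50 | 0.51 |
| ROU | 7 | 0.89 | 5 | 71.43 | 0.63 |
| Otr. | 65 | 8.25 | 29 | 44.62 | 3.68 |
| Total | 788 | 100 | 439 | 55.71 | 55.71 |

(c) Premium Content Websites

| Country | # | % | MC | MPFP | MP |
|---|---|---|---|---|---|
| USA | 485 | 67.27 | 22 | 4.54 | 3.05 |
| NLD | 40 | 5.55 | 0 | 0 | 0 |
| CHA | 28 | 3.88 | 6 | 21.43 | 0.83 |
| GBR | 22 | 3.05 | 2 | 9.09 | 0.28 |
| IRL | 21 | 2.91 | 0 | 0 | 0 |
| CAN | 16 | 2.22 | 1 | 6 | 0.14 |
| IND | 16 | 2.22 | 0 | 0 | 0 |
| DEU | 15 | 2.08 | 0 | 0 | 0 |
| FRA | 15 | 2.08 | 0 | 0 | 0 |
| BEL | 11 | 1.53 | 0 | 0 | 0 |
| Otr. | 52 | 7.21 | 0 | 0 | 0 |
| Total | 721 | 100 | 31 | 4.30 | 4.30 |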
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Alqadhi, M.; Hussain, M.; Alabduljabbar, A.; Althebeiti, H.; Abdalaal, A.; Mohaisen, M.; Mohaisen, D. Systematic Evaluation of the Infrastructure of Free Content Websites: Network, Cloud, and Country-Level Security Analysis. Electronics 2026, 15, 497. https://doi.org/10.3390/electronics15030497
