Privacy Risks of Cybersquatting Attacks

Kolenbrander, Jack; Rheault, Elliott; Michaels, Alan J.

doi:10.3390/jcp6010038

by Jack Kolenbrander ¹,*, Elliott Rheault ² and Alan J. Michaels ¹,²

¹ Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA

² National Security Institute, Virginia Tech, Blacksburg, VA 24061, USA

* Author to whom correspondence should be addressed.

J. Cybersecur. Priv. 2026, 6(1), 38; https://doi.org/10.3390/jcp6010038

Submission received: 1 January 2026 / Revised: 6 February 2026 / Accepted: 14 February 2026 / Published: 19 February 2026

(This article belongs to the Special Issue Building Community of Good Practice in Cybersecurity)

Abstract

Cybersquatting is a collection of methods commonly used by malicious actors to mislead or trick internet users into accessing fraudulent or malicious content. Much of the current research has concentrated on the specific techniques used by attackers in this domain, such as typosquatting, combosquatting, and sound squatting. Some research has explored the financial and time impacts of cybersquatting; however, an understanding of user privacy impacts is limited. Prior research into privacy implications has primarily relied on passive techniques such as analyzing DNS records, HTML content, and domain registrations. These passive approaches limit the ability to interact with these domains and track the downstream impact of sharing personally identifiable information (PII). This research develops an active open-source intelligence (OSINT) collection system capable of rapidly collecting and analyzing squatting domains through both passive and active techniques, with a particular emphasis on identifying those that solicit user information. Synthetic identities are then registered with these domains, and their associated communications are collected and analyzed to identify privacy-related risks and determine whether shared PII propagates.

Keywords:

cybersquatting; typosquatting; OSINT; active OSINT; privacy; personal information; synthetic identities

1. Introduction

Cybersquatting, or domain squatting, is an attack where malicious actors register domains that closely resemble, sound like, or are a common typo of popular legitimate domains [1]. Malicious actors leverage squatting attacks in order to generate a profit, spread malware, or gather information for phishing attacks [2]. In 2024, the top 500 most visited domains were found to have over 30,000 registered typosquatting domains, of which over 10,000 were identified as malicious [3]. Of the various monetization schemes and content hosted at these domains, attacks focused on stealing individual users’ personal and private information have been found to be the most dangerous [4].

Privacy implications are difficult to quantify and investigate, particularly without access to personal information. However, utilizing personal data derived from real individuals presents its own set of ethical challenges and issues. Due to these difficulties, the current body of research focused on understanding the privacy impacts of squatting attacks is limited. As an alternative, active open-source intelligence (OSINT) techniques can be used to emulate and offer up realistic fake personal information that can subsequently be tracked and analyzed, without representing any risk to real users [5]. As described in Section 2, the majority of work related to squatting investigations is focused on thwarting or mitigating attacks.

This paper aims to present an in-depth investigation into the scope of privacy risks associated with squatting attacks. By leveraging the capabilities developed for the Use and Abuse Research Project [6], described in Section 2.5, a unique investigation of the privacy implications of squatting attacks can be performed, while eliminating the ethical concerns associated with utilizing real personal information.

1.1. Motivation

Squatting attacks impact both the corporations hosting the targeted domains and the individuals attempting to access those domains. When targeting individuals with squatting attacks, squatters are attempting to generate a profit utilizing advertisements or other methods [7]. On the other hand, when targeting corporations, attackers are attempting to steal traffic and advertising revenue from corporations or damage the organization’s reputation [7]. Unlike phishing and pharming spoofing methods, an extensive understanding of the squatting landscape, as well as a mature set of potential countermeasures, has not yet been established [8]. Corporations and companies implement defensive techniques such as defensive domain registrations and forwarding to combat squatting attackers [9]. Individuals, however, are largely left defenseless and must remain vigilant to protect themselves and recognize the attack.

Within the squatting landscape, attackers use various methods to target individuals. These attack techniques are detailed in Section 2.1. Figure 1 compares the intended destination domain with the actual domain accessed in each squatting methodology. For example, in a typosquatting attack, a user attempting to visit facebook.com may mistakenly end up on racebook.com due to a typo or misclick. This paper largely focuses on typosquatting iterations of domains; however, the methodology could be applied to any of the squatting methodologies.

Any individual who uses the internet is subject to the threats of typosquatting; however, certain groups of individuals are more susceptible than others. In one survey, it was found that older individuals, along with those who did not have security training, were more susceptible to typosquatting attacks [10]. Estimates suggest that consumers who misspell a popular URL have a 1 in 14 chance of reaching a typosquatted domain, while 10–20% of handwritten domains result in a typo [8].

The specific impact or loss experienced by individuals and corporations can vary depending on the site or situation. Common schemes include the hosting of advertisements, redirection of traffic to third-party websites, distribution of malware, and the collection of data for phishing scams [2]. In 2023, it was estimated that the top 250 .com domains experienced a total loss of over $327 million due to the impacts of typosquatting attacks [11]. For these top 250 domains, 28,000 typosquatting iterations were identified, and 84% of those employed some form of pay-per-click advertising structure [11]. From the daily traffic to the top 100,000 websites, it is estimated that over 68.2 million visitors end up on a typosquatted website per day [12]. In the majority of cases, no matter what content is on the website visited, consumers lose time [13].

Apart from monetary impacts, from a corporation standpoint, typosquatting can result in consumer distrust or website traffic loss [14]. Additionally, typosquatting can represent trademark infringement, which can further harm the reputation of corporations [14]. In other cases, typosquatted domains can represent cases of competitor-squatting, where unscrupulous businesses can profit off of redirecting mistyped domains to their own domain [15].

Previous domain misuse and squatting investigations have leveraged techniques such as passive DNS and domain sinkholing. Compared to this study, these investigations aim to investigate broader domain abuse activities, including botnet command and control, spamming, and phishing [16]. Passive DNS investigations perform analysis of real-world DNS traffic to identify and detect malicious domains [16,17]. DNS sinkholing represents a more active approach, where clients attempting to resolve a malicious domain are redirected [18,19]. In contrast, our investigation expands beyond the network layer approaches and directly analyzes the privacy implications of squatting domains through active OSINT techniques.

The current scope of research and work investigating squatting has largely focused on types of squatting, techniques used to carry out those attacks, and potential mitigations or countermeasures to deter said attacks. An in-depth overview of the current body of typosquatting research is provided in Section 2. Within the current research body focused on typosquatting, the study of impacts largely focuses on financial and time-related costs, rather than the impact on individual privacy. Investigations into user harm have primarily examined it in terms of lost time, rather than focusing on privacy-related risks [13].

1.2. Paper Overview

This paper addresses the gap in research focusing on the privacy implications of typosquatting attacks by performing a comprehensive, privacy-focused investigation. Section 2 provides an in-depth overview of the current squatting research body, focusing on the evolution of specific squatting attacks, experiments attempting to further understand squatting attacks, and the development of countermeasures to combat these attacks. Additionally, Section 2.5 gives a brief overview of the Use and Abuse project at Virginia Tech’s National Security Institute and the unique privacy-oriented active OSINT research capabilities it provides. Section 3 gives a detailed description of the experimental process used to apply these capabilities to determine the effects of typosquatting on the privacy of individuals. This includes the selection and identification of potentially squatted domains, the triage of those domains to determine sign-up feasibility, and the sign-up of fake identities to the experimental set. Section 4 provides a detailed analysis of the findings and results of the triage process as well as the data collected after sign-up. Section 5 outlines opportunities for future work, including expanding the number of investigated domains, automating identity sign-ups, and enabling prolonged, automated interactions with received communications. Finally, Section 6 outlines key takeaways and conclusions.

2. Literature Review

Cybersquatting attacks emerged with the creation and implementation of the Domain Name System (DNS) in internet architecture and have since evolved to include various techniques that manipulate domain names for malicious purposes [20]. Initially, in a process called domain-squatting, attackers attempted to preemptively register desirable domain names and sell them at a premium to businesses and trademark owners [20]. Squatting attacks have subsequently expanded to include numerous different techniques and processes, exploiting both human and hardware errors [21]. These different attacks are described in greater detail in Section 2.1.

A large body of research has focused on understanding the expansive nature of squatting attacks and their effect on the DNS landscape. Experimental techniques and processes have varied, but most studies have focused on understanding malicious actors’ goals and the trends in domain selection and creation [2,13,22]. Section 2.2 provides an overview of various squatting experiments. In addition to understanding the space, researchers have strived to develop applications and models to combat and detect squatted domains. Approaches to these solutions vary, ranging from machine learning models focused on detecting squatted domains to browser extensions focused on detecting typos, and even gamified trainings for individuals [8,15,23]. These approaches are further detailed in Section 2.3. An overview of the current scope of research related to the privacy impacts of squatting is provided in Section 2.4. Finally, an introduction to the architecture of the Use and Abuse project, along with the privacy-oriented active OSINT capabilities developed for it, is presented in Section 2.5.

2.1. Overview of Squatting Techniques

Squatting has come to represent a class of attacks that exploit the DNS process to mislead users into accessing potentially malicious websites at domains that closely resemble the ones they intended to visit. Common techniques include typosquatting, combosquatting, sound squatting, homograph squatting, bitsquatting, and email squatting. As shown in Figure 1, each of these methods involves modifying the base domain in some form. Using one of these techniques, actors generate and register domains to host their own content and then count on users to unintentionally access their domain.

One of the more common techniques used by attackers is typosquatting, which relies on the likelihood that an individual will mistype a legitimate domain or mistake an illegitimate domain for the real one [21]. The majority of typosquatted domains rely on users adding, omitting, or mistyping one letter when attempting to access a legitimate domain, which corresponds to a Damerau-Levenshtein distance of one [24]. A related concept has been coined the “fat-finger distance,” representing a user’s likelihood to accidentally strike a nearby key when typing on a keyboard [25]. An example of a domain in this category is a user accidentally typing fscebook.com rather than facebook.com.
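To make the two distance notions concrete, the following is a minimal sketch that enumerates candidate labels at Damerau-Levenshtein distance one and the subset reachable by striking a neighboring key. The function names, the allowed character set, and the simplified QWERTY adjacency model are assumptions made for illustration; they are not taken from the cited studies or from the system described in this paper.

```python
import string

# Characters permitted in a DNS label (assumption: lowercase LDH labels only).
ALPHABET = string.ascii_lowercase + string.digits + "-"
# Simplified QWERTY letter rows; each lower row is shifted slightly to the right.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def distance_one_variants(label: str) -> set[str]:
    """Labels reachable by one insertion, deletion, substitution, or adjacent swap."""
    variants = set()
    for i in range(len(label) + 1):
        for ch in ALPHABET:  # insertion
            variants.add(label[:i] + ch + label[i:])
    for i in range(len(label)):
        variants.add(label[:i] + label[i + 1:])  # deletion
        for ch in ALPHABET:  # substitution
            variants.add(label[:i] + ch + label[i + 1:])
    for i in range(len(label) - 1):  # transposition of adjacent characters
        variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    variants.discard(label)
    # Drop empty labels and labels with a leading or trailing hyphen.
    return {v for v in variants if v and v[0] != "-" and v[-1] != "-"}


def adjacent_keys(ch: str) -> set[str]:
    """Letter keys physically next to ch under the simplified layout above."""
    for r, row in enumerate(QWERTY_ROWS):
        i = row.find(ch)
        if i == -1:
            continue
        candidates = [(r, i - 1), (r, i + 1), (r - 1, i), (r - 1, i + 1),
                      (r + 1, i - 1), (r + 1, i)]
        return {QWERTY_ROWS[rr][ii] for rr, ii in candidates
                if 0 <= rr < len(QWERTY_ROWS) and 0 <= ii < len(QWERTY_ROWS[rr])}
    return set()


def fat_finger_variants(label: str) -> set[str]:
    """Substitutions where the wrong key is a physical neighbor of the right one."""
    return {label[:i] + n + label[i + 1:]
            for i, ch in enumerate(label) for n in adjacent_keys(ch)}


if __name__ == "__main__":
    print(len(distance_one_variants("facebook")))
    print(sorted(fat_finger_variants("facebook"))[:5])
```

Under this model both fscebook and racebook appear among the fat-finger variants of facebook, since s and r neighbor a and f respectively. The fat-finger set is a strict subset of the distance-one set, which is one reason it is a useful way to prioritize candidates.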

Combosquatting is another squatting technique that involves creating domains by combining well-known company or organization names with believable keywords [26]. An example for facebook.com could be a domain registered as facebook-friends.com [22]. The purpose of combosquatting is to trick or mislead users by creating contextual and believable domain variations based on legitimate domains. Combosquatting and typosquatting represent the two most popular squatting techniques used by attackers [22]. Although squatting research primarily focuses on the English QWERTY keyboard layout, studies have also shown that attacks target non-English keyboards and languages [27].
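As a rough illustration of how combosquatting candidates can be enumerated, the sketch below joins a brand name with a keyword list in a few common arrangements. The keyword list, the joining patterns, and the function name are hypothetical; real campaigns and the cited measurement studies use far richer vocabularies.

```python
def combosquat_candidates(brand: str, keywords: list[str], tld: str = "com") -> list[str]:
    """Combine a brand with each keyword as prefix or suffix, with and without a hyphen."""
    candidates = set()
    for kw in keywords:
        candidates.add(f"{brand}-{kw}.{tld}")
        candidates.add(f"{brand}{kw}.{tld}")
        candidates.add(f"{kw}-{brand}.{tld}")
        candidates.add(f"{kw}{brand}.{tld}")
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical keyword list chosen only for demonstration.
    for domain in combosquat_candidates("facebook", ["friends", "login"]):
        print(domain)
```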

Another form of squatting attack is soundsquatting, which takes advantage of similar-sounding words, otherwise known as homophones, within domain names [28]. Soundsquatting is not as popular as other squatting techniques, but is becoming more critical as the usage of virtual assistants and smart devices increases [29,30]. Users are more likely to access these websites as a result of misinterpretations by smart virtual assistants and platforms. In addition, these attacks are prevalent in multi-language scenarios, where an individual’s language, accents, and comprehension abilities are critical [31].

Homograph squatting is a technique that relies on utilizing visually similar characters when creating domains in order to visually deceive users [32]. This type of attack exploits characters from other languages that have different ASCII values, but visually resemble those of a legitimate domain [33]. Documented attacks have utilized Russian, Latin, Cyrillic, and other language characters to target well-known domains [33]. For example, an attacker might register ‘facebook.com’ using the Cyrillic character ‘a’ (U+0430) to replace the Latin ‘a’ (U+0061) in ‘facebook.com’. While the domains appear visually identical, they are registered as two completely distinct domains.
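The difference between the two visually identical names can be shown with a short sketch that lists the non-ASCII characters in a domain and converts it to the ASCII-compatible form used in DNS. It uses Python's built-in `idna` codec and `unicodedata` module; the helper names are illustrative assumptions.

```python
import unicodedata

LEGITIMATE = "facebook.com"
SPOOFED = "f\u0430cebook.com"  # second character is CYRILLIC SMALL LETTER A


def describe_non_ascii(domain: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
            for ch in domain if ord(ch) > 127]


def to_ascii(domain: str) -> str:
    """Encode a domain to its ASCII-compatible (Punycode) form via the idna codec."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(LEGITIMATE == SPOOFED)          # the strings differ despite looking alike
    print(describe_non_ascii(SPOOFED))
    print(to_ascii(LEGITIMATE), to_ascii(SPOOFED))
```

The spoofed name encodes to a label beginning with `xn--`, which is what is actually registered and resolved, while the legitimate name is unchanged. Checking for such labels, or for mixed scripts within one label, is a common first-pass heuristic.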

A more challenging and randomized attack harnesses unavoidable errors in memory and routing infrastructure. This type of attack, known as bitsquatting, relies on random bit errors in device memory to redirect users from legitimate domains to the registered malicious one [9]. More specifically, in the process of transmitting data from the user interface, network errors or hardware faults can result in bit flips occurring during the DNS process, resulting in misdirected requests [20]. For example, a single bit flip in ‘google.com’ might result in the ‘g’ (binary 1100111) changing to ‘c’ (binary 1100011), creating ‘coogle.com’. Attackers could then register ‘coogle.com’ in an effort to capitalize on the domain misrouting. There are several potential solutions to address bitsquatting, including ECC memory, CRC checks, and DNSSEC protocols. However, these solutions are not widely implemented and would require significant time and financial investment to deploy at scale [9].
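The bit-flip example can be reproduced with a minimal sketch that flips each bit of each character in a domain and keeps only results that are still valid hostname characters. The function name and the validity filter are assumptions made for illustration.

```python
import string

# Assumption: candidates are restricted to lowercase letters, digits, and hyphens.
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_variants(domain: str) -> set[str]:
    """Domains that differ from the input by exactly one flipped bit in one character."""
    variants = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue  # leave label separators untouched
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in VALID_CHARS:
                continue  # e.g. uppercase letters or punctuation
            candidate = domain[:i] + flipped + domain[i + 1:]
            labels = candidate.split(".")
            if all(l and l[0] != "-" and l[-1] != "-" for l in labels):
                variants.add(candidate)
    return variants


if __name__ == "__main__":
    variants = bitsquat_variants("google.com")
    print("coogle.com" in variants)  # 'g' (0x67) XOR 0x04 gives 'c' (0x63)
    print(len(variants))
```

Flipping the 0x20 bit only changes letter case, which DNS treats as equivalent, so those results are discarded by the character filter rather than counted as distinct domains.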

Squatting attacks also present themselves in non-web applications like email domains. This attack aims to capture emails where a user mistypes the recipient email address or mistypes their own address when registering for a website or service [24]. In other cases, email squatting can take advantage of typos in SMTP configurations within email clients [24]. An example of an attack in this format may be a user typing gmaol.com rather than gmail.com.

A summary overview of each squatting technique described, as well as relevant citations, is provided in Table 1. Overall, there are a variety of techniques that are commonly employed by attackers when carrying out cybersquatting attacks.

2.2. Categories of Typosquatting Research

Most research within the squatting space focuses on further understanding squatting attacks’ targeting, generation, purpose, and impacts. A better understanding of the squatting space enables researchers and organizations to create effective countermeasures and solutions. To achieve a better understanding, squatting experiments typically focus on one of three categories:

1. Understand the motives of squatting actors (distinguish between profit-driven, politically motivated, or maliciously driven actors)
2. Survey and understand the current scope of squatting attacks
3. Perform active OSINT through registration of squatting domains

As mentioned previously, the typical goal of squatting is to generate a profit or to disseminate malware and phishing content via the content hosted at a squatted domain. Four main methodologies were discovered through a survey of squatted domain websites [4]. Thirty-three percent of the websites attempted to generate profit through advertisements, 17% were involved in affiliate abuse schemes, 13% violated trademarks, and 7%, considered to be the most harmful category, were phishing or scam websites designed to steal personal information [4]. With the wide range of potential motives, researchers have attempted to formulate methods to quantify the harm caused by squatting attacks as a function of time loss [13].

In addition to understanding motives and goals, it is crucial to understand the trends and scale of the squatting landscape so that mitigation efforts can be tailored and targeted effectively. Researchers have performed targeted analysis and surveys of DNS records and squatted domains to better understand the squatting field. One survey, focused on combosquatting domains, analyzed over 400 billion DNS records and determined that the majority of cases only add a single character to the legitimate domain to create the malicious domain [26]. Additionally, the majority of combosquatted domains were found to exist for extended periods without being subject to remediation efforts [26]. Researchers who tracked the registrations of bitsquatted domains for the 500 most popular websites found 5366 different domains over a 270-day period [9]. Additionally, throughout their experiment, the number of active registrations increased by 46% [9]. Another large-scale survey of 8255 typosquatting URLs found 8828 different malicious pop-up messages that attempted to mislead visitors into downloading malicious content or sharing their private information [34].

A more active approach to understanding the scope of squatting has also been utilized, where researchers create and register a set of squatted domains and monitor the traffic of those domains. Researchers who registered 76 misspelled variations of popular email domains found that the traffic received by squatted domains is largely influenced by the popularity of the legitimate domain and the degree of similarity between the squatted and legitimate domains [24]. Based on their data, they predict that five domains (gmail.com, hotmail.com, outlook.com, comcast.com, and verizon.com), targeted by 1211 typosquatted email domain registrations, would receive between 22,577 and 905,174 emails per year due to user typos [24].

2.3. Squatting Attack Countermeasures

To reduce the number of typosquatting incidents, researchers have developed and proposed a variety of solutions. Proposed solutions range from AI/ML models to predict the targeted iterations of domains to browser extensions to detect user typos [8,35]. Squatting is used to facilitate identity theft, financial fraud, and malware distribution; therefore, mitigating this threat is crucial to protecting the average individual using the Internet [36].

Many of the countermeasures developed and proposed seek to leverage machine learning and AI capabilities to detect squatted domains. Similar investigations have sought to leverage machine learning to detect phishing websites, utilizing techniques such as semantic feature extraction and mutual information-based classification [37,38]. These approaches share a similar methodology with domain squatting detection, especially in URL analysis and feature-based classification techniques. Approaches to detecting potentially malicious DNS queries can be divided into two process categories, query level approaches and traffic level approaches, each having a set of features that can be focused on to detect irregularities [35]. Additionally, AI/ML solutions are typically divided into one of two methodologies: employing AI/ML for the detection of squatted domains or employing AI/ML to generate the likely set of targeted domains so that they can be defensively registered. Defensive registration is the process of domain owners purchasing and registering similar domain names so that malicious actors cannot register the domains themselves [39].

Determining the features to prioritize when training a machine learning (ML) model to detect phishing or squatting domains is crucial. Current research identifies four main feature categories: URL-based features, domain-based features, page-based features, and content-based features [40]. One implementation utilizes an ensemble learning classifier model based on five classification algorithms: K-Nearest Neighbors (K-NN), the C4.5 algorithm, Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) [35]. This model achieved 88.4% accuracy and 85.5% precision based on 8 key features of domain names, including domain length, unique letters/numbers, and ratios of character types [35]. Another approach trains and compares different machine learning classifiers utilizing a dataset of known phishing URLs so that it can be applied to detect malicious domains, with the most successful classifier achieving an accuracy of 98.03% [29]. A potential limitation of some machine learning approaches is that they only focus on one type of squatting. This limitation is addressed by one approach, which employs and compares large language models to detect squatting attacks [36]. Through the application of the Llama-3-70B language model on a dataset of 1649 squatting domains, with curated prompts consisting of squatting domain examples and reference domains, this approach achieved 94.7% accuracy [36]. In real-world application, the system detected 34,359 squatting domains from 2.09 million new domains [36]. Adversaries are constantly developing methods to avoid detection by ML algorithms, a set of methods that has been coined the “evasion space”, through techniques such as HTML and URL manipulation [41]. Successful avoidance can significantly decrease the effectiveness of ML detection of phishing domains and techniques.
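To illustrate what domain-name features of this kind look like in practice, the sketch below computes a handful of lexical features (length, unique letters and digits, and character-type ratios) from a domain string. The specific feature set and names are assumptions for demonstration and should not be read as the eight features used in [35].

```python
def lexical_features(domain: str) -> dict[str, float]:
    """Compute simple lexical features over a domain name, ignoring the dots."""
    chars = domain.lower().rstrip(".").replace(".", "")
    n = len(chars)
    letters = [c for c in chars if c.isalpha()]
    digits = [c for c in chars if c.isdigit()]
    symbols = n - len(letters) - len(digits)  # hyphens and anything else
    return {
        "length": n,
        "unique_letters": len(set(letters)),
        "unique_digits": len(set(digits)),
        "letter_ratio": len(letters) / n if n else 0.0,
        "digit_ratio": len(digits) / n if n else 0.0,
        "symbol_ratio": symbols / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical example domain mixing a digit substitution and a keyword.
    for name, value in lexical_features("paypa1-login.com").items():
        print(f"{name}: {value}")
```

A feature vector like this would then be passed to a classifier; the choice of classifier and training data is outside the scope of this sketch.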

One solution researchers have pursued from a corporation standpoint is to analyze network traffic datasets to determine the most likely set of typo errors [15]. By training a random forest regressor model utilizing features from these datasets, researchers were able to achieve 95.7% accuracy in predicting likely iterations of domains, which organizations can then defensively register [15]. One tool, TypoWriter, utilizes a Recurrent Neural Network (RNN) to generate and predict the most probable set of typo-error domains for organizations to defensively register [42]. Alternative approaches are tailored towards specific squatting domains. A transformer neural network has been utilized to predict sound squatting domains for multi-language scenarios [30]. Using tools like this enables corporations to gain better oversight of squatting cases and domains that might otherwise go unnoticed.

Defensive registration can become extremely costly and complex, as attackers could abuse any number of the countless iterations of domains. This is especially prevalent for small businesses and organizations who may not have the resources to effectively register their squatting domain space. As a result, many organizations employ the services of defensive registrars like MarkMonitor or Com Laude [43]. Another underutilized approach organizations can adopt is the usage of sunrise periods, in which domain registrars will notify trademark owners if a potentially infringing domain is registered, so that organizations can take appropriate actions [43].

Other approaches by researchers are focused on protecting individual users accessing the Internet using their own devices. One example of this is an anti-typosquatting browser extension that provides real-time suggestions and error detection for users accessing domains on the Internet [8]. The tool itself functions by comparing user domain query inputs to a local database of common, popular website domains [8]. A similar approach aims to detect and prevent users from inputting sensitive information into untrustworthy websites or sources in order to prevent phishing attacks [44]. A different method takes advantage of the Swype keyboard framework, which is a keyboard that allows users to slide their finger from character to character to type words [45]. The TypoSwype tool analyzes Swype pattern images utilizing image recognition algorithms and a convolutional neural network (CNN) to compare entered queries to commonly queried domains [45].
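The idea of checking a typed domain against a local list of popular domains can be sketched with Python's standard `difflib` module. This is only an approximation of the concept: the domain list, similarity cutoff, and function name are hypothetical, and the matching method used by the tool in [8] may differ.

```python
import difflib

# Hypothetical local database of popular domains.
POPULAR_DOMAINS = ["facebook.com", "google.com", "gmail.com", "youtube.com"]


def suggest_domain(typed, known=POPULAR_DOMAINS, cutoff=0.8):
    """Return the closest known domain if the input looks like a typo, else None."""
    typed = typed.strip().lower()
    if typed in known:
        return None  # exact match, nothing to correct
    matches = difflib.get_close_matches(typed, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for query in ["fscebook.com", "facebook.com", "example.org"]:
        print(query, "->", suggest_domain(query))
```

A high cutoff keeps unrelated domains from triggering a warning, at the cost of missing typos that change several characters. A production tool would need to tune this trade-off against a much larger domain list.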

Increasing users’ ability to detect malicious domains can greatly decrease their likelihood of falling victim, and therefore the threat of squatting attacks. Researchers have created a gamified application with features such as scores and leaderboards to train users to detect and avoid spoofed websites [23].

In addition to technical and gamified approaches, another potential area to address domain squatting is policy-based approaches focusing on domain registration. Domain name registrations are overseen by the Internet Corporation for Assigned Names and Numbers (ICANN), which is responsible for defining policies for domain name registrations [46]. ICANN, however, has received criticism over its lack of enforcement and administration internationally for TLD registrations, with studies indicating the need for verification standards for WHOIS registrations [47]. Additionally, the expansion of generic top-level domain (gTLD) names has been shown to have resulted in an increase in typosquatting attacks targeting legitimate organizations [48]. One potential approach to improve domain name registration verification is requiring that to register domains with certain TLDs, a user must possess a registered business. A more complex approach, which would require reform in the domain registration space, would be to limit the number of registration providers and hold those companies liable for allowing registrations of squatting domains. A third approach, similar to defensive registrations, would be to make domain registrars responsible for blocking registrations that mimic well-known brands and trademarks. In general, policy-based reform could result in significant changes in the domain space that would greatly reduce squatting attacks.

Overall, a variety of countermeasure approaches have been developed and explored. A summary of these countermeasures is provided in Table 2. Machine learning algorithms and applications have demonstrated success in detecting typo domains, but have yet to be widely adopted and applied. Defensive registration is a practice applied by major corporations, but the sheer number of possible domains combined with constrained resources limits its effectiveness, especially for small businesses. Approaches tailored for user-side interactions lack widespread implementation. Effective applications and detection methods have been demonstrated, but are limited in their mainstream usage.

2.4. Privacy Impacts of Squatting Attacks

The increased digitization of people’s day-to-day lives has resulted in greater privacy and personally identifiable information (PII) related risks. The challenge for individuals and organizations tasked with protecting the PII they collect, however, is that there is no uniform definition or standard for what constitutes PII. OMB Memorandum M-07-16 defines PII as “information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual” [49]. This definition leaves room for nuanced scenarios and requires case-by-case assessments based on the data collected and available [50]. The European Union’s GDPR refers to personal data as information relating to an identified or identifiable natural person, and lists identifiers such as name, identification number, location data, and online identifiers [51]. Although both definitions focus on the identification component of privacy data, both leave room for interpretation based on the scenario and type of data that is being collected or processed.

In the context of online transactions, certain categories of PII are more commonly collected when creating accounts or signing up for services. The required information differs based on the type of service an individual is trying to register for. For example, the creation of an account on a blog site typically requires an email address, name, birth date, and password, while an account on a social media website tracks additional information about user activity, interactions, and location [5]. Malicious actors seek to harvest and steal PII to carry out impersonation, identity fraud, and other cyber attacks [52]. One common technique leveraged by attackers to collect PII is creating fraudulent websites that mimic legitimate websites or services. Hosting these websites at squatted domains is particularly effective, as users who mistype a URL or are redirected may unknowingly share their PII with a website they believe to be legitimate. To understand the true extent and impact of squatting attacks on PII collection, researchers must examine not only whether these domains solicit information, but also what happens to that information after it is submitted.

The body of research on the privacy impacts of domain squatting is currently limited. While studies have shown squatted domains being used for PII data collection and phishing schemes, the broader privacy implications remain largely unknown. In an analysis of 40,299 nonlegitimate domains, 174 were found to be conducting phishing attacks [26]. A similar study identified 657,000 domains impersonating 702 popular brands, of which 1175 were found to be squatted domains attempting to carry out phishing attacks [53]. A study on defensive registrations found a moderately positive relationship between phishing attacks targeting company domains and the number of defensive registrations made by the company [43]. Overall, research indicates that squatting techniques can be used to steal PII; however, the broader large-scale impacts remain uncertain. Existing studies have focused primarily on identifying phishing domains rather than investigating what happens to PII after it is collected. This gap underscores the need for active investigative approaches that can track the downstream consequences of sharing personal information with squatted domains.

2.5. Introduction to Use and Abuse

The Use and Abuse (U&A) of Personal Information (PI) project [6,54] at the Virginia Tech National Security Institute is a large-scale research experiment focused on determining how personal information propagates on the internet after signing up to second-party organizations. The U&A project has developed extensive fake identity generation and active OSINT collection capabilities [5,6], allowing researchers to answer privacy-related research questions utilizing realistic fake personal information and identities. The design process and developed capabilities of the U&A project are described in a series of companion papers [5,6,55].

By applying the fake identities and the ability to track communications and information of those identities using the U&A project capabilities, a unique perspective of the squatting landscape can be established. Research has previously identified that the most dangerous squatting websites attempt to steal individuals’ private information [4]; however, the true scope and extent of the danger have yet to be fully investigated. This work seeks to answer these questions by generating and signing up fake identities to a set of predetermined squatting domains, described further in Section 3.

3. Experimental Design and Methodology

The operation and process conducted in this experiment can be broken down into five main steps: website identification and domain selection, alias generation, triage, a sign-up process, and data collection and analysis. Figure 2 provides an overview of the overall process and tasks of each step. The designed process aims to streamline the repetitive, manual components of domain investigations and perform a tailored investigation of privacy impacts through the application of fake identities. The first two steps, domain selection and alias generation, establish a baseline set of likely targeted squatted domains based on popular internet websites. The next step, the triage process, performs a systematic investigation of the domain set to collect data and streamline the identification of domains collecting personal information. Fake identities are then deployed to this identified set with the goal of performing a more in-depth analysis of the privacy impacts of these squatted domains. Finally, all data collected is analyzed to identify trends among the squatted domains investigated.

The website identification and domain selection process builds a set of legitimate domains from a series of categories, which are then utilized as the basis for the generation of squatting domains. The alias generation phase creates the set of typosquatting domains from the legitimate domains that will be investigated throughout the experiment. The triage process serves to collect a wide array of information about each typosquatting domain, including the feasibility of providing each domain with a fake identity utilizing the U&A project architecture. Finally, the data collected from the triage and signup processes, as well as any data received by the identities shared with the given domains, is analyzed to form a better understanding of the typosquatting space and its privacy-related impacts.

3.1. Domain Selection and Alias Generation

Multiple domain categories were selected to create the experimental set of 166 legitimate domains. These categories were chosen based on factors such as general internet traffic, commonly accessed sites or areas of business, and general interest of the project researchers, intentionally selecting a few less prevalent domains for comparison purposes. The domain categories for this experiment are as follows:

Top Domains by Traffic (21 Domains) [56]
Top Job Search Sites (22 Domains) [57]
Top Weather Sites (16 Domains) [58]
Top U.S. Government Sites by Traffic (24 Domains) [59]
Intelligence Agencies (18 Domains) [60]
Top Antivirus Software Sites (25 Domains) [61]
Top Travel Websites (21 Domains) [62]
Virginia Tech Domains (19 Domains)

From all categories, a total base set of 166 legitimate domains was composed. To generate the set of typo-based domains, a typo domain generation tool [63] was used. The tool accepts a legitimate domain as input and generates the set of likely typos “based on a database of common spelling errors and typos as well as the proximity of the characters on a QWERTY keyboard” [63]. For the set of government domains, since attackers are unable to register domains under the .gov top-level domain, iterations were created by varying the top-level domain. In total, a set of 3460 typo or error-prone domains was generated from the original 166 legitimate domains. The number of domains generated from each legitimate domain varies based on domain length and complexity. For example, x.com resulted in a set of only 3 typos, whereas facebook.com resulted in 24 typos. To collect data and information, each generated domain is scanned and researched utilizing the triage process described in the next section.
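The keyboard-proximity idea and the top-level-domain variation described above can be illustrated with a short Python sketch. This is not the tool cited as [63]: the adjacency map below is a small, hypothetical excerpt of a QWERTY layout, the function names are invented for illustration, and only single-character substitutions are covered.

```python
# Hypothetical, partial QWERTY adjacency map (illustrative only; this is
# NOT the typo database used by the tool cited as [63]).
QWERTY_NEIGHBORS = {
    "a": "qwsz",
    "c": "xdfv",
    "e": "wsdr",
    "o": "iklp",
    "u": "yhji",
}


def adjacent_key_typos(domain: str) -> list[str]:
    """Return typo domains made by swapping one character of the first
    label for a neighboring key, leaving the rest of the domain unchanged."""
    label, _, tld = domain.partition(".")
    typos = set()
    for i, ch in enumerate(label):
        for neighbor in QWERTY_NEIGHBORS.get(ch, ""):
            typos.add(f"{label[:i]}{neighbor}{label[i + 1:]}.{tld}")
    typos.discard(domain)
    return sorted(typos)


def tld_variants(domain: str, tlds: tuple[str, ...]) -> list[str]:
    """Vary only the top-level domain, as was done for the .gov set.
    The list of replacement TLDs is supplied by the caller."""
    label = domain.rsplit(".", 1)[0]
    return [f"{label}.{tld}" for tld in tlds]
```

With this assumed map, `adjacent_key_typos("facebook.com")` includes `faxebook.com`, and `tld_variants("uscis.gov", ("com", "net", "ca"))` yields the three USCIS iterations discussed later in the results.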

While the domain set investigated in this paper does not represent an exhaustive sample of Internet domains, the goal of this research is to develop and evaluate a tool and investigative process that streamlines the identification of squatting domains that seek to collect PII. A subset of domains was selected from multiple categories to identify trends and evaluate how well the tool can be applied across a variety of online scenarios. Further planned application of the tool developed in this research to more broadly investigate larger sets of domains is discussed in Section 5.

3.2. Triage Process

As mentioned, the experiment stage is broken down into a series of steps, with the ultimate goal of determining the set of domains that ask for an input of personal information to be provided. A comprehensive analysis of each typosquatting domain, including an investigation of its metadata, such as activity, registration, and certificate information, can yield a significant amount of data. This data, when collected and analyzed for a large number of typosquatting domains, can provide the foundation for formulating conclusions and identifying trends within the typosquatting space.

Manually investigating and collecting data from each of the 3460 typosquatting domains would be highly time-consuming; therefore, a custom Python 3.12.10-based “triage” tool was developed to automate the querying and data collection process. The triage script allows for the automated and streamlined collection of registration, certificate, activity, and other additional data for each domain. Creating such a script allows for a more efficient investigation of each domain, with the ability to manually intervene for failed queries. This investigation specifically focused on active PII collection through registration and signup forms, as this represents a deliberate and measurable user action that allows interactions and sharing of PII to be directly traced. Passive collection techniques, such as browser fingerprinting and tracking scripts, represent a potential future expansion of the triage investigation process. Figure 3 presents an overview of the automated steps and processes executed for each domain ingested from the input CSV file. The triage process begins by conducting passive reconnaissance, such as domain connectivity tests and registration data collection, before escalating to active API querying and content scraping. Algorithm 1 provides a pseudocode representation of the triage methodology, highlighting the passive reconnaissance, API queries, and content scraping steps performed for each input domain.

An in-depth profile is created for each domain by completing the above actions. By comparing the information collected across all triaged domains, trends can be identified and analyzed, which will be further discussed in Section 4. The first action, sending a request to the domain using the Python Requests library, identifies if the domain is active and reachable on the internet. This establishes that the domain could be accessed accidentally by an individual. Querying the domain registration information provides detailed information about the organization responsible for the domain, as well as the IP address and name server information [64]. Additionally, the location where the domain’s organization is based can be determined. Querying the domain’s SSL certificate information provides insight into the domain’s identity and security, along with information about the organization responsible for providing the SSL certificate [65]. Table 3 provides a summary of the different tools, APIs, and libraries leveraged by the triage system, as well as their purpose.
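As a rough illustration of the certificate query, the following sketch uses only Python's standard `ssl` and `socket` modules. The function names and the choice of summary fields are assumptions made for this example and are not taken from the triage tool itself.

```python
import socket
import ssl


def fetch_certificate(domain: str, timeout: float = 5.0) -> dict:
    """Open a TLS connection to port 443 and return the peer certificate
    as the dictionary produced by SSLSocket.getpeercert()."""
    context = ssl.create_default_context()
    with socket.create_connection((domain, 443), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            return tls.getpeercert()


def summarize_certificate(cert: dict) -> dict:
    """Flatten the nested issuer tuples and keep the validity window.
    The selection of fields here is an assumption for illustration."""
    issuer = {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}
    return {
        "issuer_org": issuer.get("organizationName"),
        "issuer_country": issuer.get("countryName"),
        "not_before": cert.get("notBefore"),
        "not_after": cert.get("notAfter"),
    }
```

Because the default context verifies the certificate chain and host name, a domain with an invalid or mismatched certificate raises an `ssl.SSLError` subclass rather than returning data; a triage run would likely need to catch that and record the failure.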

Algorithm 1 Domain Triage System

Input: CSV file containing legitimate domains and typosquatting iterations
Output: MongoDB database populated with domain analysis results

// Initial Database
Connect to MongoDB and initialize Database
Load PhishTank database into MongoDB

for all row in domain CSV do
    root_domain ← row[’Domain’]
    domain_iteration ← row[’Domain Iterations’]

    // Passive Reconnaissance
    domain_up ← Send HTTP request (Python Requests)
    domain_registration ← Query WHOIS records (Python whois)
    domain_cert ← Query SSL certificate (Python SSL)

    // Third-Party API Queries
    domain_for_sale ← Query GoDaddy API
    domain_phish_query ← Search PhishTank database
    virus_total_analysis ← Query VirusTotal API

    // Content Scraping
    domain_scrape_chrome ← Scrape with Selenium
    Check for signup keywords and CAPTCHA

    // Data Storage
    Create document with all collected data
    Insert document into MongoDB

In addition to querying each domain’s registration and certificate details, a series of additional actions are performed utilizing third-party APIs and platforms. The first of these is to query PhishTank’s database [72] of known phishing websites and domains to determine if the URL is cataloged as a known phishing scam. The database contains over 4 million validated phishing domains and is populated by user reports [72]. The second action is querying GoDaddy’s API [71] to determine if the domain is for sale. GoDaddy provides information on the domain’s availability and price. Finally, the VirusTotal API [73] is used to scan each domain for known malware or other malicious activity. The VirusTotal API queries over 70 antivirus products, resulting in a holistic picture for each domain.

The final action that the triage script performs is scraping the domain utilizing Selenium [69]. Selenium was chosen over other web-scraping alternatives because of a few key advantages. The first main advantage is that Selenium provides support for rendering and visualizing websites that are based on JavaScript [74]. Selenium also provides support and capabilities for automatic interactions with webpages, which will be discussed in more detail as an opportunity for expansion in Section 5. Finally, Selenium provides opportunities to mimic different web browsers by changing the webdriver on which the scraping query is based, providing the foundation for an investigation into content variation based on the browser utilized to access the domain. All content for each domain is scraped and scanned for keywords that may indicate sign-up feasibility. These keywords include ‘signup’, ‘login’, ‘sign in’, ‘register’, ‘create account’, ‘profile’, and variations of these terms. If a keyword is found in the source code, the domain is marked as possible for signup. It is possible, however, that a developer could have used a non-standard keyword to refer to their login or an alternative authentication method, such as OAuth, which would not be detected by the triage script. To mitigate potential under-detection, domains that presented a CAPTCHA or where automated scraping was inconclusive were manually revisited during the sign-up phase to verify registration feasibility.
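The keyword scan described above can be sketched in a few lines of Python. The keyword list comes from the text; the regular expressions used to cover "variations of these terms" and the function name `scan_source` are assumptions for this example, and the input is assumed to be page source already retrieved by the scraper.

```python
import re

# Keywords listed in the text; the regular expressions covering their
# "variations" are an assumption made for this sketch.
SIGNUP_PATTERNS = [
    r"sign[\s_-]?up",
    r"log[\s_-]?in",
    r"sign[\s_-]?in",
    r"register",
    r"create[\s_-]?account",
    r"profile",
]
SIGNUP_RE = re.compile("|".join(SIGNUP_PATTERNS), re.IGNORECASE)
CAPTCHA_RE = re.compile(r"captcha", re.IGNORECASE)


def scan_source(page_source: str) -> dict:
    """Flag possible sign-up forms and CAPTCHA challenges in page HTML."""
    return {
        "signup_possible": bool(SIGNUP_RE.search(page_source)),
        "captcha_found": bool(CAPTCHA_RE.search(page_source)),
    }
```

A page offering only a "Continue with Google" button would return `signup_possible: False` here, which is the under-detection case noted above. Conversely, plain substring matching can over-match unrelated text, so flagged domains still benefit from manual review.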

After completing the above queries and steps, a document is created and inserted into a MongoDB database for further analysis. The results of each triage step are combined into a unified record that allows for streamlined analysis and post-processing. An example entry after analysis for the facebook.com typosquatting derivative domain faxebook.com is provided in Figure 4. By establishing a uniform JSON object for each domain investigated, the analysis and comparison processes are streamlined and patterns can be identified across large data sets. The first items in the JSON record are the typosquatting domain, faxebook.com, and the root domain that it is derived from, facebook.com. Additionally, each iteration is tracked and numbered, as indicated by the “root_iteration” field. For instance, faxebook.com represents the 11th iteration of facebook.com. The result of attempting to reach the domain is recorded in the “domain_up” field. The results of the GoDaddy and PhishTank API queries, along with the domain registration and certificate queries, are also recorded. The final entry contains the results of scraping the domain and scanning for CAPTCHA requirements or signup feasibility.

3.3. Triage Tool Process

As mentioned previously, the triage script, which was created with Python, performs a series of actions for each domain provided to it by an input CSV file. Each row of the input CSV file consists of a pair: the legitimate domain and the corresponding typo-generated domain. Figure 3 provides a high-level overview of the series of steps performed by the triage script.

Querying domain registration and certificate information is carried out utilizing the Python whois [67] library and the Python SSL library [68]. Both of these libraries provide the capability to easily query a domain’s information and receive a simple data object output with the results. As shown in Figure 4, the domain_registration output includes information regarding the responsible organization, the domain’s nameservers and registrars, and the corresponding location of the registered domain. The SSL certificate information includes information about the domain’s certificate issuer and expiration dates.

As mentioned previously, three APIs are utilized by the triage tool to collect information for each domain triaged: the GoDaddy API [71], PhishTank API [75], and the VirusTotal API [73]. Figure 4 contains a GoDaddy API response in the domain_sale field and a PhishTank response query in the domain_phish_query.

An example response for the VirusTotal API, consisting of the critical fields used for analysis, is provided in Figure 5. The last_analysis_results field contains the results from each individual antivirus platform or software, while the last_analysis_stats field summarizes the results from all platforms. Each of the platforms places the domain in one of five categories: malicious, harmless, suspicious, timeout, and undetected. In order to mitigate false positives from individual antivirus platform ratings, a threshold of three or more antivirus platforms was used to consider a domain as malicious. Analysis of the VirusTotal results for all 3460 domains revealed that certain providers were significantly more likely to classify domains as malicious compared to others. For example, Fortinet flagged 454 domains while most providers flagged fewer than 50. Additionally, 311 domains were marked as malicious by only a single provider, suggesting a high potential for false positives at lower thresholds. The threshold of three or more engines was selected to reduce the likelihood of false positives from these outlier providers. With this approach, however, truly malicious domains marked as malicious only by more advanced or specialized antivirus systems may be overlooked. The reputation field reflects a weighted score based on the VirusTotal user base’s assessment of the domain. The weighted reputation score places higher emphasis on evaluations from highly rated, credible VirusTotal users. Combined, these scoring metrics can provide a marker for previous malicious activity for given domains; however, more novel squatting campaigns may be missed. Due to a daily API limit of 500 requests, the queries for VirusTotal were spread out over several days and completed external to the triage script.
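The three-engine threshold can be expressed as a small Python sketch. It assumes the `last_analysis_results` field has already been extracted from the API response as a mapping from engine name to a verdict containing a `category` key; the function names, and the pairing of the threshold with the reputation check, are illustrative choices rather than the triage tool's actual code.

```python
MALICIOUS_THRESHOLD = 3  # three or more engines, as described in the text


def malicious_engines(last_analysis_results: dict) -> list[str]:
    """Names of the engines whose category for the domain is 'malicious'."""
    return sorted(
        engine
        for engine, verdict in last_analysis_results.items()
        if verdict.get("category") == "malicious"
    )


def is_malicious(last_analysis_results: dict,
                 threshold: int = MALICIOUS_THRESHOLD) -> bool:
    """Apply the multi-engine threshold to reduce single-provider noise."""
    return len(malicious_engines(last_analysis_results)) >= threshold


def flag_for_review(last_analysis_results: dict, reputation: int) -> bool:
    """Assumed combination: threshold met OR community reputation below zero."""
    return is_malicious(last_analysis_results) or reputation < 0
```

Raising or lowering `threshold` makes the trade-off explicit: a value of one would admit every domain flagged by a single outlier provider, while higher values exclude more false positives at the cost of missing domains that only specialized engines detect.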

The final step performed by the triage script is the domain scraping process utilizing Selenium. This process is outlined in Algorithm 2. In 3.8% of cases, the automated scraping was detected; these domains were then manually revisited in the sign-up phase to determine whether an identity could be passed.

3.4. Signup Process

An in-depth analysis of all data collected is provided in Section 4; in summary, of the 3460 domains scanned, 1158 were identified as active, and 265 contained keywords indicating that a sign-up may be possible. Additionally, 134 domains presented a CAPTCHA and required manual investigation for sign-up feasibility during the sign-up process. In order to generate fake identities and collect data after sign-up, the previously mentioned U&A fake ID generator [5] was utilized. Additionally, the project has developed a collection engine capable of collecting and processing communications and data of the fake identities at scale [6]. The detailed design of those capabilities has been published across two papers [5,6], and this work deploys those capabilities to understand the privacy impacts of cybersquatting cases via the application of robust fake identities.

Algorithm 2 Web Scraping Function

function webScrape(domain)
    Input: Domain URL to Scrape
    Output: Domain HTML Source Code, Signup Boolean, Captcha Boolean
    if Domain Reachable then
        Load webpage utilizing Selenium Driver
        domainSource ← domain HTML sourceCode
        if domainSource includes “captcha” then
            captchaFound ← true
        else
            captchaFound ← false
        if domainSource includes signupKeywords then
            signupPossible ← true
        else
            signupPossible ← false
        scrapeResult ← {domainSource, captchaFound, signupPossible}
        return scrapeResult

3.4.1. Creation of Identities

A detailed overview of the design of the fake identities utilized for this experiment has been previously published [5]. The fake identities are generated utilizing a pseudorandom number generator (PRNG) [76] and real census data to ensure that the generated identities accurately mimic the population base. Figure 6 provides a simplified overview of the fake identity generation process and the metadata generated for each identity. This approach, designed and tested in previous U&A research, allows for the repeatable, large-scale generation of high-quality fake identities. As shown, each identity generated consists of a robust profile, including name, gender, age, email, race, nationality, etc. Each identity is mapped 1:1 with one of the typosquatting domains to ensure all data collected can be traced back to the original domain. Overall, the fake identities generated provide a real-world, human-passable identity that can be deployed for privacy experimentation purposes without the ethical concerns associated with real personally identifiable information.
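A minimal sketch of seeded, frequency-weighted identity generation is given below to illustrate why the approach is repeatable. It is not the U&A generator described in [5]: the name tables and weights are made-up placeholders standing in for census-derived frequencies, and the field names and the `example.com` email domain are assumptions.

```python
import random

# Hypothetical, made-up weights standing in for census-derived frequencies.
FIRST_NAMES = {"James": 0.40, "Maria": 0.35, "Wei": 0.25}
LAST_NAMES = {"Smith": 0.50, "Garcia": 0.30, "Nguyen": 0.20}


def generate_identity(seed: int, domain: str) -> dict:
    """Build one fake identity from a seed and bind it to a single domain.
    The same seed always reproduces the same identity."""
    rng = random.Random(seed)  # independent, seeded PRNG instance
    first = rng.choices(list(FIRST_NAMES), weights=list(FIRST_NAMES.values()))[0]
    last = rng.choices(list(LAST_NAMES), weights=list(LAST_NAMES.values()))[0]
    age = rng.randint(18, 80)
    suffix = rng.randint(10, 99)
    return {
        "first_name": first,
        "last_name": last,
        "age": age,
        "email": f"{first.lower()}.{last.lower()}{suffix}@example.com",
        "assigned_domain": domain,  # 1:1 mapping back to the squatted domain
    }
```

Storing the assigned domain inside each identity means that any email later received at that address can be attributed to exactly one sign-up, which is what makes downstream PII propagation traceable.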

3.4.2. Sign Up Engine and Identity Registration

In order to efficiently sign up each identity to each domain, a “sign-up engine” was created for the Use & Abuse project [6]. The sign-up engine automates many of the repetitive processes, such as navigating to each domain and selecting an identity. Furthermore, the sign-up engine displays all available fields for registration, offers a user-friendly interface for copying and pasting values, and keeps a record of which fields have been utilized. This interface is displayed in Figure 7.

In total, 527 signup attempts were made across 396 websites identified as “signup possible” by the triage script and 131 websites flagged as malicious by three or more antivirus platforms within the VirusTotal API. Fake identities were successfully passed to only 31 domains, or 5.8% of these domains, suggesting that the direct harvesting of user personal information is less prevalent than previously believed. Anecdotally, we believe that the bar for a user to recognize the squatted domain as being distinct from the domain they intended to visit is very low, making it unlikely that they would proceed with a signup/signin if already familiar with the base domain. The majority of domains that were manually visited during the signup process were identified as defensive registrations, alternative companies, and other non-malicious websites. Some of the investigated domains, however, redirected to parked advertisement pages or potentially malicious sites, including gambling and cryptocurrency websites. An in-depth analysis of the signup process results is provided in Section 4.5.

4. Results

The triage and sign-up processes generated a substantial amount of data for each domain; however, even after completing the sign-up process, conclusions about privacy impacts remain limited. As previously mentioned, the triage process gathered information on various aspects such as domain activity, registration, certificates, content, and more. Analyzing this data allows for the identification of trends within the typosquatting landscape. Additionally, the sign-up process provided insights into the tactics employed by attackers and the content hosted at each domain. The overall progression of the domains examined in this study is illustrated in Figure 8.

4.1. Domain Activity and Sign Up Capability

As shown in Figure 8, of the 3460 domains scanned, 1158, or roughly 33.5%, were found to be active and reachable. A domain is considered to be active if it successfully responds to an HTTP request sent by the Python Requests library. The percentage of active domains varied widely by category. Figure 9 displays a normalized depiction of the number of active domains compared to the larger subset scanned by category. The domain categories of traffic, government, and travel show the highest percentages of active domains, at 62.3%, 49.2%, and 37.3%, respectively. Additionally, Figure 9 includes the number of domains that contained sign-up keywords. In comparison to the broader domain space, the percentage of domains where sign-up is possible is relatively small, accounting for only 263 or 7.7% of domains. In addition to this set of domains, signup attempts were also made for the 134 domains that presented a CAPTCHA, as well as the 131 domains flagged as malicious by three or more antivirus platforms or that had a reputation value less than zero from the VirusTotal API. Through the signup process, 31 signups were successfully completed, which represents 5.6% of the domains manually investigated or 2.7% of the active domains scanned. Signup was deemed successful if credentials were successfully entered into a form or signup process and submitted to the domain.

4.2. Domain Registrations

Malicious actors frequently exploit the domain registration process to host domains that distribute scams and other types of malware [77]. Of the typosquatting domains scanned, 1882 domains (54.4%) were found to have registrations. The proportion of registered domains also varied, with the website traffic and government domain groups having the highest portion of typosquatted domains registered, implying that they are prime targets of attackers. Each domain registration contains a variety of information about the domain, including the registrar’s name, contact information, and address. Across the registered domains, GoDaddy (10.8%), CSC Corporate Domains (9.7%), and MarkMonitor (5.1%) were found to be the most common recurring registrars. MarkMonitor and CSC Corporate Domains are both companies that provide defensive registration and brand protection services. In total, there were 261 unique registrars across the 1882 registered domains. Table 4 provides the top 10 most frequent domain registrars as well as the top registrars for domains found to be malicious.

In addition to registrar information, each registration includes additional data about the given domain. One point of interest, based on the assumption that typosquatting domains may be registered or based out of an adversarial nation, is the address information provided. Each domain registration includes the city, state, and country of the registered domain. Among the registered domains, 276 were found to have one or more fields marked as “REDACTED FOR PRIVACY,” while 790 domains contained a null value in one or more fields. Under the ICANN Temporary Specification and the European Union’s General Data Protection Regulation (GDPR), redaction of registrant information is required prior to the registrar publishing the WHOIS record [78]. In a 2021 study, 85% of large WHOIS providers were found to be in compliance with GDPR regulations [78]. These regulations allow for domain registrations to contain null or redacted fields to maintain the privacy of the registrants. Figure 10 provides a map of all traceable server locations for registered domains, color-coded by domain category. A large portion of domain servers were traced back to North America and Europe, while substantially fewer were traced to Asia or Australia. None of the domains investigated were located in South America or Africa.
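The redacted and null counts above could be tallied from parsed WHOIS records along the following lines. This is a sketch under stated assumptions: each record is taken to be a dictionary, the address field names `city`, `state`, and `country` are hypothetical, and a record may count as both redacted and null if different fields fall into each case.

```python
REDACTED = "REDACTED FOR PRIVACY"
ADDRESS_FIELDS = ("city", "state", "country")  # assumed field names


def address_flags(record: dict) -> dict:
    """Report whether any address field is redacted or missing."""
    values = [record.get(field) for field in ADDRESS_FIELDS]
    return {
        "redacted": any(
            isinstance(v, str) and v.strip().upper() == REDACTED for v in values
        ),
        "has_null": any(v is None for v in values),
    }


def summarize_registrations(records: list[dict]) -> dict:
    """Count redacted, null, and fully traceable registrations."""
    flags = [address_flags(r) for r in records]
    return {
        "redacted": sum(f["redacted"] for f in flags),
        "has_null": sum(f["has_null"] for f in flags),
        "traceable": sum(not f["redacted"] and not f["has_null"] for f in flags),
    }
```

Only the records counted as `traceable` carry enough address information to be placed on a map such as the one in Figure 10.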

By integrating domain registration data with a VirusTotal analysis for each domain, the registrations can be refined to target the subset of typosquatting domains identified as exhibiting malicious behaviors. Of the registered domains, 105 were found to be malicious by 3 or more antivirus platforms. Figure 11 is an updated map, limited to the malicious domains that contained address information. As shown, the majority of malicious domains come from the top website traffic category, the antivirus domains category, and the travel category. Only 5.6% of the registered domains were identified as malicious, but they appeared to be distributed across a wide range of registrars.

4.3. Domain Certificates

Similar to the domain registrations, information was collected about each domain’s certificate. Of the domains, 993, or 28.7%, were found to have an active SSL certificate. The presence of a valid SSL certificate indicates to users that the website is verified by a trusted authority, and therefore increases a user’s trust in a website. These certificates, however, can be forged or tampered with [79]. To receive an SSL certificate, a system administrator or domain owner needs to submit a certificate signing request (CSR) to a certificate authority, which reviews and approves the request [80]. The request contains information about the server IP address and public key [80]. Registering for an SSL certificate often, but not always, requires a fee to be paid to the certificate authority in exchange for the verified SSL certificate [81]. In Table 5, the top 10 certificate providers are listed along with the number of typosquatted domains for which a certificate was issued by that organization. In contrast to the large number of registrars, there were only 10 unique certificate issuers. Let’s Encrypt was the predominant certificate issuer, issuing 677 certificates, while GoDaddy was the second most common, issuing 109. This is likely because Let’s Encrypt offers free SSL certificates. Utilizing the VirusTotal data, five issuers were found to have issued a certificate to one or more websites marked as malicious by 3 or more antivirus providers.

For each certificate provided, the location information of the issuer is included. Figure 12 presents a heat map of the location distribution of certificate issuers, with the majority of certificates issued from providers in the United States and Europe. Of the providers attributed to malicious domains, four of the providers (Let’s Encrypt, GoDaddy.com, DigiCert Inc., and Google Trust) are all based in the United States, while ZeroSSL is based in Austria. Among the domains analyzed, malicious domains appear to exploit the availability of free SSL certificates to create a false sense of security for visitors accessing their sites. Among the domains investigated, Let’s Encrypt is the most prevalent certificate issuer across both legitimate and malicious domains. As a result, we suggest that browsers consider ignoring or downgrading trust of SSL certificates by “trusted authorities” that are clearly not performing due diligence when approving requests.

4.4. Domain Scanning

Every domain was scanned utilizing the VirusTotal API, which queries over 70 different antivirus providers. Of the domains scanned, 889 were marked as malicious by one or more providers. Certain providers were more likely to classify domains as malicious, with 311 domains being flagged as malicious by only a single provider. Table 6 provides the top 10 providers and the number of domains that were marked as malicious by that provider. Fortinet, CRDF, and SecLookup were the only providers to mark over 100 domains as malicious.

In place of a malicious classification, some providers utilized a “suspicious” classification for some domains. This classification, however, was much less frequent, as only 161 domains were marked suspicious by one or more providers. Of the domains scanned, seven were marked as malicious by 10 or more providers. Additionally, 47 domains were classified as malicious by five or more providers. Figure 13 provides an overview of the 20 domains with the highest combined number of malicious and suspicious classifications. The domain youtibe.com was found to be malicious by the highest number of providers. Additionally, VirusTotal allows users to vote and provide feedback on domains, which it aggregates and weights to generate a reputation score. Of the domains investigated, scores ranged from −169 to 65, where a negative score represents a malicious domain and a positive score represents a trustworthy domain. In total, 52 domains had a score below zero, and the domain facebok.com had the lowest score.

4.5. Domain Sign Ups

As mentioned previously, fake identity signups were successfully submitted to only 31 domains. The signup process required a manual investigation of 527 domains, which included those flagged as potentially supporting signups by the triage script, domains identified as malicious by three or more providers via the VirusTotal API, and domains with a reputation score below zero. Manual investigation and the signup process were performed utilizing the Browserling tool [82], which allows for sandboxing and the provision of virtual machines. The top content categories are outlined in Table 7. A legitimate website classification means that the domain is redirected to the legitimate, original domain, representing a defensive registration. An alternate company designation signals that the domain is routed to another legitimate company. The top types of companies routed to were other technology, commerce, and service provider companies. Similarly, the main types of alternate websites routed to were blogs, organizations, and schools.

Of the 527 domains, 69 were available for sale, and 18 displayed a message that they were registered. Of the domains that were for sale, 33.3% displayed some form of parked advertisements in addition to the for sale message. Overall, 14.7% of the domains hosted and displayed advertisements in some form. Error messages indicating that the domain was unavailable or was not found occurred for 37 of the domains. Interestingly, 7 of the domains redirected to gambling websites in some form, 5 redirected to some form of cryptocurrency website, and 2 attempted to download a file after clicking a button on the website.

In addition to the content hosted at each website, other information was collected during the investigation of each domain. Of the domains investigated, 286 redirected to another domain. Of these, 177 (61.9%) were defensive redirects to the legitimate domain, while 109 (38.1%) redirected to another domain or website. The most prominent redirect was to a domain registrar's for-sale page for the domain, accounting for 52 (18.1%) of the redirected domains. A further 40 (13.9%) of the redirected domains led to completely different webpages or businesses. One notable case involved iterations of the U.S. Citizenship and Immigration Services website uscis.gov, uscis.com, uscis.net, and uscis.ca, which all redirected to immigrationdirect.com. The Immigration Direct website, shown in Figure 14, provides services for preparing immigration forms and provides disclaimers that it is not the USCIS organization/website.
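The split between defensive and non-defensive redirects can be illustrated with a short classification sketch. It assumes the final URL reached after following redirects is already known. The function name, the labels, and the `examp1e.com` record are hypothetical, and the host comparison is a simplification of whatever matching was used in the study. The last lines recompute the reported shares from the counts in the text.

```python
from urllib.parse import urlsplit


def same_site(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_redirect(squat_domain: str, final_url: str, legit_domain: str) -> str:
    """Label where a squatting domain ultimately leads."""
    host = (urlsplit(final_url).hostname or "").lower()
    if same_site(host, squat_domain.lower()):
        return "no_redirect"  # content is served from the squatting domain
    if same_site(host, legit_domain.lower()):
        return "defensive"    # leads back to the legitimate site
    return "other"            # leads to an unrelated domain


# The first two pairs come from the text; the third is hypothetical.
print(classify_redirect("nytkmes.com", "https://cointiply.com/", "nytimes.com"))
print(classify_redirect("uscis.com", "https://www.immigrationdirect.com/", "uscis.gov"))
print(classify_redirect("examp1e.com", "https://www.example.com/", "example.com"))

# Shares of the 286 redirecting domains, from the counts reported in the text.
redirected, defensive, other = 286, 177, 109
defensive_pct = round(100 * defensive / redirected, 1)
other_pct = round(100 * other / redirected, 1)
print(defensive_pct, other_pct)
```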

Another common redirect target was joya.casino, an online cryptocurrency casino. Typos of instagram.com, twitter.com, and nytimes.com (inwtagram.com, twitteg.com, and nytikes.com, respectively) all redirected to this domain. In total, 5 different domains redirected to joya.casino, and 5 identities were successfully signed up to the domain. Figure 15 shows what is presented to users who are redirected to the domain.

Of the 31 successful signups, 15 of the identities received emails after six months. In total, 158 emails were received across all fake identities. The identity signed up to nytkmes.com, which redirected to cointiply.com, a website that claims to provide users cryptocurrency in exchange for completing various tasks and games, was the most frequent sender of emails and accounted for 61 or 38.6% of emails received. The cointiply homepage is displayed in Figure 16. Apart from the initial sign-up confirmation, all of the emails received from cointiply followed the format shown in Figure 17. The email appears to be a phishing or spam message; however, the embedded link did not resolve and instead returned an error.

Joya.Casino was the second most frequent sender of emails, with the five accounts registered to the domain receiving 59, or 37.3%, of the emails received. Despite all five identities being registered to the same domain, email delivery varied: one account received 15 emails, while the other four each received 11. The emails received from Joya.Casino were typically marketing emails encouraging users to take advantage of offers or play games on its website. An example email is shown in Figure 18.
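The email shares reported for the two most frequent senders can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
total_emails = 158  # emails received across all fake identities

sender_counts = {
    "cointiply.com": 61,
    "joya.casino": 15 + 4 * 11,  # one account with 15 emails, four with 11 each
}

# Percentage of all received emails attributable to each sender.
shares = {
    sender: round(100 * count / total_emails, 1)
    for sender, count in sender_counts.items()
}

# Emails sent by all remaining senders combined.
remaining = total_emails - sum(sender_counts.values())

print(shares, remaining)
```

Together the two senders account for 120 of the 158 emails, leaving 38 spread across the remaining senders.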

Overall, only a small fraction (0.44%) of the domains analyzed ultimately received email traffic. Nevertheless, there is evidence suggesting that certain companies and organizations deliberately register typographic variants of other domains as a strategy to redirect traffic to their own sites. As documented in other work [7], a large portion of the domains investigated were attempting to profit from typosquatting through parked advertisements. The domains investigated in this experiment likely represent a closely monitored subset of typosquatted websites. Expanding the number of legitimate domains and incorporating a wider array of squatting techniques could increase the number of instances where account registration is feasible. In the future, additional research could be conducted to determine whether any of the registered identities were propagated to other domains or appeared on the dark web. Because each identity was used only once, any such release of personal information can be attributed to its source.

5. Future Work and Extension

This work provided a detailed analysis of typosquatting domains using OSINT techniques and established a foundational process for the investigation of the privacy impacts of squatting attacks. There are, however, a variety of opportunities for future work and expansion to further investigate the privacy impacts of squatting as well as establish a more holistic picture of the squatting landscape. These extensions include:

5.1. Expansion to a Broader Set of Domains and Squatting Techniques

In this experiment, the base set of domains analyzed was fairly limited due to time and resource constraints. The initial set of 166 domains allowed a foundational method and approach to be established; however, it does not provide a comprehensive overview of the squatting space, and incorporating additional domains would allow for a more thorough investigation and more robust conclusions. Additionally, this experiment focused only on typosquatting iterations of the legitimate domains. Since this work automated many of the manual and cumbersome tasks associated with domain investigation, the investigation could easily be expanded to include combosquatting, TLD iterations, and other squatting techniques, allowing for a more in-depth look into the squatting landscape as a whole.

5.2. Automated Identity Signups for Squatting Domain Iterations

As mentioned previously, Selenium was utilized to scrape domains due to its ability to fully render each domain and interact with active content. Selenium was designed for automated website testing and therefore can automatically interact with webpage content. The capability to automatically test webpage logins and sign-ups with Selenium has been demonstrated [83] and could be modified to automate the signup of fake identities to eligible squatting domains. By automating this process, identities could be more efficiently distributed to squatting websites, and therefore, more data could be collected about the privacy implications of the space.
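One step of such automation, deciding which identity attribute belongs in which form field, can be sketched without a browser. The example below is a hypothetical illustration using only the Python standard library: the `FIELD_HINTS` keywords, the function names, the sample form, and the sample identity are all assumptions and are not part of the authors' system.

```python
from html.parser import HTMLParser

# Hypothetical keyword hints linking identity attributes to input names.
FIELD_HINTS = {
    "email": ("email", "e-mail"),
    "first_name": ("first", "fname"),
    "last_name": ("last", "lname", "surname"),
    "password": ("pass",),
}


class InputCollector(HTMLParser):
    """Collect (field name, descriptive label) pairs for every <input> tag."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        parts = (a.get("name"), a.get("id"), a.get("type"), a.get("placeholder"))
        label = " ".join(p for p in parts if p).lower()
        self.inputs.append((a.get("name") or a.get("id") or "", label))


def plan_signup(html: str, identity: dict) -> dict:
    """Map form field names to identity values using the keyword hints."""
    collector = InputCollector()
    collector.feed(html)
    plan = {}
    for field_name, label in collector.inputs:
        if not field_name:
            continue  # e.g. an unnamed submit button
        for attr, hints in FIELD_HINTS.items():
            if attr in identity and any(h in label for h in hints):
                plan[field_name] = identity[attr]
                break
    return plan


# Hypothetical form and identity, for illustration only.
form_html = """
<form action="/join" method="post">
  <input name="fname" type="text">
  <input name="user_email" type="email" placeholder="Email address">
  <input name="pwd" type="password">
  <input type="submit" value="Join">
</form>
"""
identity = {"first_name": "Alex", "email": "alex.doe@example.org", "password": "S3cret!pw"}

plan = plan_signup(form_html, identity)
print(plan)
```

With Selenium, each entry in the resulting plan could then be typed into the page with a call such as `driver.find_element(By.NAME, field_name).send_keys(value)`. CAPTCHAs and multi-step forms would still require manual handling.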

5.3. Automated and Continued Interaction with Content Received from Squatting Domains

The U&A project is currently working on an “Account Interaction Engine” to allow for prolonged interaction with content received by fake identities [5]. This capability would allow for automated and ongoing interaction with websites, with each identity acting as a live phish on the hook, potentially increasing the number of communications received and the amount of data collected.

5.4. Advanced Forensic Analysis Tracking of Fake Identity Information

In its current state, after identities are passed to typosquatted domains, the only means of tracking impact is the communications received by the fake identities. A planned expansion of the U&A project is to implement robust tracking capabilities to monitor data brokers and dark web sources for the existence of fake identity information post-signup. Additionally, enhanced automated forensic analysis capabilities are planned for classification and analysis of received communications.
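Because each fake identity is used only once, finding an identity's email address in an external data source points back to the single domain where it was registered. The sketch below illustrates that lookup under simplifying assumptions: the function name, the email addresses, and the sample text are hypothetical, and a real monitoring capability would need far more robust matching.

```python
def attribute_leaks(text: str, identity_sources: dict) -> dict:
    """Return {email: source_domain} for every seeded email found in the text."""
    lowered = text.lower()
    return {
        email: source
        for email, source in identity_sources.items()
        if email.lower() in lowered
    }


# One-time-use identities mapped to the domain each was registered with.
# Domain names are taken from the text; the email addresses are hypothetical.
identity_sources = {
    "id25520@example.org": "nytkmes.com",
    "id25521@example.org": "inwtagram.com",
    "id25522@example.org": "twitteg.com",
}

# Hypothetical excerpt of a scraped listing or leaked data set.
scraped_text = """
user: ID25520@example.org | source: unknown
user: someone.else@example.net | source: unknown
"""

hits = attribute_leaks(scraped_text, identity_sources)
print(hits)
```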

5.5. Ethical Investigation and Model Development for Fake Identity Applications

When applying fake identities for privacy research, there is a potential for deception or misuse, and a solution is necessary to ensure ethical application. In this experiment, fake identities were used once and did not perform any automated behavior or send responses to content received. The U&A project is currently developing a quantitative ethics evaluation framework for active OSINT experiments. This work aims to publish both an expansive literature survey and a proposed methodology for ensuring the ethical conduct of these experiments.

6. Conclusions

The work performed in this paper laid the foundation for in-depth OSINT analysis and research focused on the privacy implications of the squatting space. Automating the initial triage and investigation steps makes it easy to identify the subset of domains where fake identities can be registered. Additionally, trends across the overall set of squatted domains can be identified using various libraries and APIs. Of the domains investigated, 33.5% were found to be active, with the largest portion of active domains stemming from the highest traffic domains and government domains. Of the domains, 54.4% had active registrations from 261 unique registrars. The majority of the registered domain servers were located in North America and Europe, with a smaller portion located in Asia. Utilizing the VirusTotal API, 105 of the registered domains were identified as malicious and were located in North America. Finally, 28.7% of the domains investigated had a valid, active SSL certificate, of which 67.2% were provided for free by Let’s Encrypt. The triage script developed enables automated and efficient investigation of a large number of domains, allowing for comprehensive data collection and the identification of broader trends.

The triage script identified 7.7% of the domains as containing keywords related to account registration and signup. Additionally, domains that presented a CAPTCHA, along with domains marked by VirusTotal, were manually investigated. In total, 31 of the 527 domains manually investigated resulted in successful signups, and 11 of these resulted in emails received by fake identities. Although a definitive conclusion about the privacy implications of squatting cannot be drawn from the identities registered, this work demonstrates that some actors use squatted domains in an attempt to increase business and collect information from individuals by redirecting them to alternative domains. As mentioned previously, this study focused on a heavily scrutinized subset of typosquatting domains derived from the most popular legitimate domains. Expanding the initial domain set could potentially reveal a greater number of cases aimed at harvesting personally identifiable information (PII) from individuals.
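Keyword-based flagging of the kind summarized above can be sketched as follows. This is a minimal illustration, assuming the visible page text has already been extracted from the rendered page. The keyword list and the function name are hypothetical and are not the list used by the triage script.

```python
import re

# Hypothetical keyword list, not the one used in the study.
SIGNUP_KEYWORDS = (
    "sign up", "signup", "register", "create account", "subscribe", "join now",
)


def matched_signup_keywords(page_text: str) -> list:
    """Return the signup-related keywords that appear as whole words or phrases."""
    normalized = re.sub(r"\s+", " ", page_text.lower())
    return [
        kw for kw in SIGNUP_KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"\b", normalized)
    ]


print(matched_signup_keywords("Create   Account today and subscribe to our newsletter"))
print(matched_signup_keywords("This domain is registered."))
print(matched_signup_keywords("This domain is for sale"))
```

Whole-word matching keeps a notice such as "This domain is registered" from being flagged by the keyword "register". Keyword hits remain only a hint, which is consistent with the manual investigation described above.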

Overall, this work provided a foundational process for investigating squatted domains and employing the capabilities developed by the Use and Abuse project to investigate the privacy-related impacts of squatting.

Author Contributions

Conceptualization, J.K. and A.J.M.; methodology, J.K.; software, J.K.; validation, J.K., E.R. and A.J.M.; formal analysis, J.K.; investigation, J.K.; data curation, J.K.; writing—original draft preparation, J.K.; writing—review and editing, A.J.M.; visualization, J.K.; supervision, A.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Commonwealth Cyber Initiative, an investment in the advancement of cyber R&D, innovation, and workforce development. For more information about CCI, visit www.cyberinitiative.org. Additional support was also received from the VT National Security Institute’s Spectrum Dominance Division. Additionally, this material is based upon work supported by the National Science Foundation under Grant Number 1946493. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
API	Application Programming Interface
ASCII	American Standard Code for Information Interchange
CAPTCHA	Completely Automated Public Turing test to tell Computers and Humans Apart
CNN	Convolutional Neural Network
CRC	Cyclic Redundancy Check
CSR	Certificate Signing Request
CSV	Comma-Separated Values
DNS	Domain Name System
DNSSEC	Domain Name System Security Extensions
ECC	Error-Correcting Code
GDPR	General Data Protection Regulation
HTTP	Hypertext Transfer Protocol
ICANN	Internet Corporation for Assigned Names and Numbers
IP	Internet Protocol
JSON	JavaScript Object Notation
K-NN	K-Nearest Neighbors
LR	Left-to-Right (Algorithm)
ML	Machine Learning
NB	Naive Bayes
OSINT	Open-Source Intelligence
PI	Personal Information
PII	Personally Identifiable Information
PRNG	Pseudorandom Number Generator
RNN	Recurrent Neural Network
SMTP	Simple Mail Transfer Protocol
SSL	Secure Sockets Layer
SVM	Support Vector Machine
TLD	Top-Level Domain
U&A	Use and Abuse
URL	Uniform Resource Locator
WHOIS	Who Is (domain registration protocol)

References

Devi, G.; Vats, M. The Threat of Cyber Squatting: Understanding the Risks of Digital Identity Theft. SSRN Electron. J. 2024. [Google Scholar] [CrossRef]
Szurdi, J.; Kocso, B.; Cseh, G.; Spring, J.; Felegyhazi, M.; Kanich, C. The Long “Taile” of Typosquatting Domain Names. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, USA, 20–22 August 2014; pp. 191–206. [Google Scholar]
ThreatLabz. Phishing, Typosquatting, and Brand Impersonation: Trends and Tactics. 2024. Available online: https://www.zscaler.com/blogs/security-research/phishing-typosquatting-and-brand-impersonation-trends-and-tactics (accessed on 18 September 2025).
Zeng, Y.; Zang, T.; Zhang, Y.; Chen, X.; Wang, Y. A Comprehensive Measurement Study of Domain-Squatting Abuse. In Proceedings of the ICC 2019-2019 IEEE International Conference on Communications (ICC); IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar] [CrossRef]
Kolenbrander, J.; Husmann, E.; Henshaw, C.; Rheault, E.; Boswell, M.; Michaels, A.J. Use & Abuse of Personal Information, Part II: Robust Generation of Fake IDs for Privacy Experimentation. J. Cybersecur. Priv. 2024, 4, 546–571. [Google Scholar] [CrossRef]
Rheault, E.; Nerayo, M.; Leonard, J.; Kolenbrander, J.; Henshaw, C.; Boswell, M.; Michaels, A.J. Use and Abuse of Personal Information, Part I: Design of a Scalable OSINT Collection Engine. J. Cybersecur. Priv. 2024, 4, 572–593. [Google Scholar] [CrossRef]
Chandra, R.; Bhatnagar, V. Cyber-squatting: A cyber crime more than an unethical act. Int. J. Soc. Comput.-Cyber Syst. 2019, 2, 146. [Google Scholar] [CrossRef]
Chen, G.; Johnson, M.F.; Marupally, P.R.; Singireddy, N.K.; Yin, X.; Paruchuri, V. Combating Typo-Squatting for Safer Browsing. In Proceedings of the 2009 International Conference on Advanced Information Networking and Applications Workshops; IEEE: Piscataway, NJ, USA, 2009; pp. 31–36. [Google Scholar] [CrossRef]
Nikiforakis, N.; Van Acker, S.; Meert, W.; Desmet, L.; Piessens, F.; Joosen, W. Bitsquatting: Exploiting bit-flips for fun, or profit? In Proceedings of the 22nd International Conference on World Wide Web; ACM: New York, NY, USA, 2013; WWW ’13; pp. 989–998. [Google Scholar] [CrossRef]
Spaulding, J.; Upadhyaya, S.; Mohaisen, A. You’ve Been Tricked! A User Study of the Effectiveness of Typosquatting Techniques. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS); IEEE: Piscataway, NJ, USA, 2017; pp. 2593–2596. [Google Scholar] [CrossRef]
Bourne, J. Are Typosquatters Hijacking Your Brand? FairWinds (USA) Inc.: Boston, MA, USA, 2023. [Google Scholar]
Moore, T.; Edelman, B. Estimating Visitors and Advertising Costs of Typo Domains–Online Appendix. In Proceedings of the 14th International Conference on Financial Cryptography and Data Security; Lecture Notes in Computer Science (LNCS); Springer: Berlin/Heidelberg, Germany, 2010; Available online: https://www.benedelman.org/typosquatting/ (accessed on 10 December 2025).
Khan, M.T.; Huo, X.; Li, Z.; Kanich, C. Every Second Counts: Quantifying the Negative Externalities of Cybercrime via Typosquatting. In Proceedings of the 2015 IEEE Symposium on Security and Privacy; IEEE: Piscataway, NJ, USA, 2015; pp. 135–150. [Google Scholar] [CrossRef]
Puzari, J. Typosquatting and Its Impact upon Intellectual Property in Cyberspace: A Legal Study. J. Intellect. Prop. Stud. 2023, 7, 85–100. [Google Scholar]
Tahir, R.; Raza, A.; Ahmad, F.; Kazi, J.; Zaffar, F.; Kanich, C.; Caesar, M. It’s All in the Name: Why Some URLs are More Vulnerable to Typosquatting. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications; IEEE: Piscataway, NJ, USA, 2018; pp. 2618–2626. [Google Scholar] [CrossRef]
Bilge, L.; Kirda, E.; Kruegel, C.; Balduzzi, M. Exposure: Finding malicious domains using passive DNS analysis. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 6–9 February 2011; pp. 1–17. [Google Scholar]
Torabi, S.; Boukhtouta, A.; Assi, C.; Debbabi, M. Detecting Internet Abuse by Analyzing Passive DNS Traffic: A Survey of Implemented Systems. IEEE Commun. Surv. Tutorials 2018, 20, 3389–3415. [Google Scholar] [CrossRef]
Jung, H.M.; Lee, H.G.; Choi, J.W. Efficient Malicious Packet Capture Through Advanced DNS Sinkhole. Wirel. Pers. Commun. 2017, 93, 21–34. [Google Scholar] [CrossRef]
Lagunzad, H.C.; Gonzaga, M.V. Tracking and Blocking Adware using DNS Sinkholing Algorithm. In Proceedings of the 2024 16th International Conference on Computer and Automation Engineering (ICCAE); IEEE: Piscataway, NJ, USA, 2024; pp. 30–35. [Google Scholar] [CrossRef]
Hu, H.; Zivi, A.; Doerr, C. Dealing with Bad Apples: Organizational Awareness and Protection for Bit-flip and Typo-Squatting Attacks. In Proceedings of the 19th International Conference on Availability, Reliability and Security; Association for Computing Machinery: New York, NY, USA, 2024; ARES ’24. [Google Scholar] [CrossRef]
Spaulding, J.; Upadhyaya, S.; Mohaisen, A. The Landscape of Domain Name Typosquatting: Techniques and Countermeasures. In Proceedings of the 2016 11th International Conference on Availability, Reliability and Security (ARES); IEEE: Piscataway, NJ, USA, 2016; pp. 284–289. [Google Scholar] [CrossRef]
Loyola, P.; Gajananan, K.; Kitahara, H.; Watanabe, Y.; Satoh, F. Automating Domain Squatting Detection Using Representation Learning. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2020; pp. 1021–1030. [Google Scholar] [CrossRef]
Omotosho, A.; Awazie, D.; Ayegba, P.; Emuoyibofarhe, J. A Gamified Technique to Improve Users’ Phishing and Typosquatting Awareness. In Proceedings of the Information and Communication Technology and Applications; Misra, S., Muhammad-Bello, B., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 403–414. [Google Scholar]
Szurdi, J.; Christin, N. Email typosquatting. In Proceedings of the 2017 Internet Measurement Conference; Association for Computing Machinery: New York, NY, USA, 2017; IMC ’17; pp. 419–431. [Google Scholar] [CrossRef]
Moore, T.; Edelman, B. Measuring the Perpetrators and Funders of Typosquatting. In Proceedings of the Financial Cryptography and Data Security; Sion, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 175–191. [Google Scholar]
Kintis, P.; Miramirkhani, N.; Lever, C.; Chen, Y.; Romero-Gómez, R.; Pitropakis, N.; Nikiforakis, N.; Antonakakis, M. Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2017; CCS ’17; pp. 569–586. [Google Scholar] [CrossRef]
Le Pochat, V.; Van Goethem, T.; Joosen, W. A Smörgåsbord of Typos: Exploring International Keyboard Layout Typosquatting. In Proceedings of the 2019 IEEE Security and Privacy Workshops (SPW); IEEE: Piscataway, NJ, USA, 2019; pp. 187–192. [Google Scholar] [CrossRef]
Nikiforakis, N.; Balduzzi, M.; Desmet, L.; Piessens, F.; Joosen, W. Soundsquatting: Uncovering the Use of Homophones in Domain Squatting. In Proceedings of the Information Security; Chow, S.S.M., Camenisch, J., Hui, L.C.K., Yiu, S.M., Eds.; Springer: Cham, Switzerland, 2014; pp. 291–308. [Google Scholar]
Kumar, D.; Paccagnella, R.; Murley, P.; Hennenfent, E.; Mason, J.; Bates, A.; Bailey, M. Skill Squatting Attacks on Amazon Alexa. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 33–47. [Google Scholar]
Valentim, R.V.; Drago, I.; Mellia, M.; Cerutti, F. X-squatter: AI Multilingual Generation of Cross-Language Sound-squatting. ACM Trans. Priv. Secur. 2024, 27, 1–27. [Google Scholar] [CrossRef]
Valentim, R.; Drago, I.; Mellia, M.; Cerutti, F. Lost in Translation: AI-based Generator of Cross-Language Sound-squatting. In Proceedings of the 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW); IEEE: Piscataway, NJ, USA, 2023; pp. 513–520. [Google Scholar] [CrossRef]
Thao, T.P. Improving Homograph Attack Classification. arXiv 2020, arXiv:2009.08006. [Google Scholar] [CrossRef]
Thao, T.P.; Sawaya, Y.; Nguyen-Son, H.Q.; Yamada, A.; Kubota, A.; Van Sang, T.; Yamaguchi, R.S. Human Factors in Homograph Attack Recognition. In Proceedings of the Applied Cryptography and Network Security; Conti, M., Zhou, J., Casalicchio, E., Spognardi, A., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 408–435. [Google Scholar]
Dam, T.; Klausner, L.D.; Buhov, D.; Schrittwieser, S. Large-Scale Analysis of Pop-Up Scam on Typosquatting URLs. In Proceedings of the 14th International Conference on Availability, Reliability and Security; Association for Computing Machinery: New York, NY, USA, 2019; ARES ’19. [Google Scholar] [CrossRef]
Moubayed, A.; Injadat, M.; Shami, A.; Lutfiyya, H. DNS Typo-Squatting Domain Detection: A Data Analytics & Machine Learning Based Approach. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM); IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar] [CrossRef]
Chiba, D.; Nakano, H.; Koide, T. DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection. arXiv 2024. [Google Scholar] [CrossRef]
Vajrobol, V.; Gupta, B.B.; Gaurav, A. Mutual information based logistic regression for phishing URL detection. Cyber Secur. Appl. 2024, 2, 100044. [Google Scholar] [CrossRef]
Almomani, A.; Alauthman, M.; Shatnawi, M.T.; Alweshah, M.; Alrosan, A.; Alomoush, W.; Gupta, B. Phishing Website Detection With Semantic Features Based on Machine Learning Classifiers: A Comparative Study. Int. J. Semant. Web Inf. Syst. 2022, 18, 1–24. [Google Scholar] [CrossRef]
Spaulding, J.; Nyang, D.; Mohaisen, A. Understanding the effectiveness of typosquatting techniques. In Proceedings of the Fifth ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies; Association for Computing Machinery: New York, NY, USA, 2017; HotWeb ’17. [Google Scholar] [CrossRef]
Buber, E.; Demir, O.; Sahingoz, O.K. Feature selections for the machine learning based detection of phishing websites. In Proceedings of the 2017 International Artificial Intelligence and Data Processing Symposium (IDAP); IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [Google Scholar] [CrossRef]
Apruzzese, G.; Conti, M.; Yuan, Y. SpacePhish: The Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning. In Proceedings of the 38th Annual Computer Security Applications Conference; Association for Computing Machinery: New York, NY, USA, 2022; ACSAC ’22; pp. 171–185. [Google Scholar] [CrossRef]
Ahmad, I.; Parvez, M.A.; Iqbal, A. TypoWriter: A Tool to Prevent Typosquatting. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC); IEEE: Piscataway, NJ, USA, 2019; Volume 1, pp. 423–432. [Google Scholar] [CrossRef]
Benjamin, B.C.; Bayer, J.; Fernandez, S.; Duda, A.; Korczyński, M. Shielding Brands: An In-Depth Analysis of Defensive Domain Registration Practices Against Cyber-Squatting. In Proceedings of the 2024 8th Network Traffic Measurement and Analysis Conference (TMA); IEEE: Piscataway, NJ, USA, 2024; pp. 1–11. [Google Scholar] [CrossRef]
Kirda, E.; Kruegel, C. Protecting users against phishing attacks with AntiPhish. In Proceedings of the 29th Annual International Computer Software and Applications Conference (COMPSAC’05); IEEE: Piscataway, NJ, USA, 2005; Volume 2, pp. 517–524. [Google Scholar] [CrossRef]
Joon Sern, L.; Gui Peng David, Y. TypoSwype: An Imaging Approach to Detect Typo-Squatting. In Proceedings of the 2021 11th IFIP International Conference on New Technologies, Mobility and Security (NTMS); IEEE: Piscataway, NJ, USA, 2021; pp. 1–5. [Google Scholar] [CrossRef]
Internet Corporation for Assigned Names and Numbers; ICANN Policy; ICANN: Los Angeles, CA, USA, 2024.
Watters, P.A.; Herps, A.; Layton, R.; McCombie, S. ICANN or ICANT: Is WHOIS an Enabler of Cybercrime? In Proceedings of the 2013 Fourth Cybercrime and Trustworthy Computing Workshop; IEEE: Piscataway, NJ, USA, 2013; pp. 44–49. [Google Scholar] [CrossRef]
Pouryousef, S.; Dar, M.D.; Ahmad, S.; Gill, P.; Nithyanand, R. Extortion or Expansion? An Investigation into the Costs and Consequences of ICANN’s gTLD Experiments. In Proceedings of the Passive and Active Measurement; Sperotto, A., Dainotti, A., Stiller, B., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 141–157. [Google Scholar]
Office of Management and Budget. Safeguarding Against and Responding to the Breach of Personally Identifiable Information. In Memorandum for the Heads of Executive Departments and Agencies; Office of Management and Budget: Washington, DC, USA, 2007. [Google Scholar]
U.S. General Services Administration. Rules and Policies-Protecting PII-Privacy Act. GSA Privacy Program; GSA: Washington, DC, USA, 2025. [Google Scholar]
European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). Off. J. Eur. Union 2016, 679, 10–13. [Google Scholar]
Zaeifi, M.; Kalantari, F.; Oest, A.; Sun, Z.; Ahn, G.J.; Shoshitaishvili, Y.; Bao, T.; Wang, R.; Doupé, A. Nothing Personal: Understanding the Spread and Use of Personally Identifiable Information in the Financial Ecosystem. In Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy; Association for Computing Machinery: New York, NY, USA, 2024; CODASPY ’24; pp. 55–65. [Google Scholar] [CrossRef]
Tian, K.; Jan, S.T.K.; Hu, H.; Yao, D.; Wang, G. Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild. In Proceedings of the Internet Measurement Conference 2018; Association for Computing Machinery: New York, NY, USA, 2018; IMC ’18; pp. 429–442. [Google Scholar] [CrossRef]
Michaels, A.J.; George, K.B. Use and Abuse of Personal Information. 2021. Available online: https://www.blackhat.com/us-21/briefings/schedule/#use–abuse-of-personal-information-22688 (accessed on 5 May 2025).
Harrison, J.; Lyons, J.; Anderson, L.; Maunder, L.; O’Donnell, P.; George, K.B.; Michaels, A.J. Quantifying Use and Abuse of Personal Information. In Proceedings of the 2021 IEEE International Conference on Intelligence and Security Informatics (ISI); IEEE Press: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar] [CrossRef]
ahrefs. Top Websites Ranking in United States. November 2024. Available online: https://ahrefs.com/websites/united-states (accessed on 5 May 2025).
Robert Half. 25 Job Search Sites and Job Boards. 2024. Available online: https://www.roberthalf.com/us/en/insights/landing-job/best-job-search-sites-job-boards (accessed on 5 May 2025).
Sedykh, S. 15 Best Weather Sites to Get Accurate Forecast in 2024. Available online: https://codesupply.co/best-weather-sites/ (accessed on 5 May 2025).
GSA. 2025. Available online: https://analytics.usa.gov/ (accessed on 5 May 2025).
ODNI. Members of the IC. 2025. Available online: https://www.dni.gov/index.php/what-we-do/members-of-the-ic (accessed on 5 May 2025).
Miguel, P.G. Comprehensive Guide to the 25 Best Anti-Malware Software Solutions. 2025. Available online: https://thectoclub.com/tools/best-anti-malware-software/ (accessed on 5 May 2025).
Trends, O. Most Visited Travel & Tourism Websites in Worldwide 2024. 2025. Available online: https://www.semrush.com/trending-websites/global/travel-and-tourism (accessed on 5 May 2025).
Boykin, J. Domain Name Misspellings|Domain Typo Generator Tool. 2024. Available online: https://www.internetmarketingninjas.com/tools/domain-typo-generator/ (accessed on 5 May 2025).
IBM. Querying Domain Registration Information. 2025. Available online: https://www.ibm.com/docs/en/networkmanager/4.2.0?topic=information-querying-domain-registration (accessed on 5 May 2025).
Kaspersky. What Is an SSL Certificate & Why Is It Important? 2020. Available online: https://usa.kaspersky.com/resource-center/definitions/what-is-a-ssl-certificate#:~:text=An%20SSL%20certificate%20is%20a,server%20and%20a%20web%20browser (accessed on 5 May 2025).
Reitz, K. Requests: HTTP for Humans. Python Package Index (PyPI), Version 2.32.3; Python Software Foundation: Beaverton, OR, USA, 2024.
Penman, R. Python-Whois. 2024. Available online: https://pypi.org/project/python-whois/ (accessed on 5 May 2025).
Python Software Foundation. ssl: TLS/SSL Wrapper for Socket Objects. 2025. Available online: https://docs.python.org/3/library/ssl.html (accessed on 5 May 2025).
Selenium. 2023. Available online: https://www.selenium.dev/ (accessed on 5 May 2025).
MongoDB, Inc. MongoDB Documentation; MongoDB, Inc.: New York, NY, USA, 2024. [Google Scholar]
GoDaddy. 2025. Available online: https://developer.godaddy.com/ (accessed on 5 May 2025).
Phishtank. Join the Fight Against Phishing. 2025. Available online: https://phishtank.org/ (accessed on 5 May 2025).
VirusTotal. Virustotal API V3 Overview. 2025. Available online: https://docs.virustotal.com/reference/overview (accessed on 5 May 2025).
Udofia, E. Webscrapping: Beautifulsoup or Selenium? 2024. Available online: https://medium.com/@udofiaetietop/webscrapping-beautifulsoup-or-selenium-3467edb3c0d9 (accessed on 5 May 2025).
Cisco Talos Intelligence Group. PhishTank > API Information. Available online: https://phishtank.org/api_info.php (accessed on 5 May 2025).
Michaels, A.J. Improved RNS-based PRNGs. In Proceedings of the 13th International Conference on Availability, Reliability and Security; Association for Computing Machinery: New York, NY, USA, 2018; ARES ’18. [Google Scholar] [CrossRef]
Hao, S.; Thomas, M.; Paxson, V.; Feamster, N.; Kreibich, C.; Grier, C.; Hollenbeck, S. Understanding the domain registration behavior of spammers. In Proceedings of the 2013 Conference on Internet Measurement Conference; Association for Computing Machinery: New York, NY, USA, 2013; IMC ’13; pp. 63–76. [Google Scholar] [CrossRef]
Lu, C.; Liu, B.; Zhang, Y.; Li, Z.; Zhang, F.; Duan, H.; Liu, Y.; Chen, J.Q.; Liang, J.; Zhang, Z.; et al. From WHOIS to WHOWAS: A Large-Scale Measurement Study of Domain Registration Privacy under the GDPR. In Proceedings of the 2021 Network and Distributed System Security Symposium, Virtually, 21–25 February 2021. [Google Scholar]
Huang, L.S.; Rice, A.; Ellingsen, E.; Jackson, C. Analyzing Forged SSL Certificates in the Wild. In Proceedings of the 2014 IEEE Symposium on Security and Privacy; IEEE: Piscataway, NJ, USA, 2014; pp. 83–97. [Google Scholar] [CrossRef]
Akram, M.; Barker, W.C.; Clatterbuck, R.; Dodson, D.; Everhart, B.; Gilbert, J.; Haag, W.; Johnson, B.; Kapasouris, A.; Lam, D.; et al. Securing Web Transactions: TLS Server Certificate Management; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2020. [Google Scholar]
Cloudflare. What Is an SSL Certificate? 2025. Available online: https://www.cloudflare.com/learning/ssl/what-is-an-ssl-certificate/ (accessed on 5 May 2025).
Browserling. Online Cross-Browser Testing. 2025. Available online: https://www.browserling.com/ (accessed on 5 May 2025).
BrowserStack. Login Automation Using Selenium Webdriver: Tutorial. 2024. Available online: https://www.browserstack.com/guide/login-automation-using-selenium-webdriver (accessed on 5 May 2025).

Figure 1. Comparison of desired input and destination versus squatting attack iterations.

Figure 2. Overview of experimental design and process.

Figure 3. Overview of triage system process and steps.

Figure 4. Example triage system output for faxebook.com domain.

Figure 5. Example VirusTotal API output for faxebook.com.

Figure 6. Fake ID generation overview.

Figure 7. Example signup engine form for ID number 25520 and domain www.microwoft.com.

Figure 8. Comparison of number of domains at key experimental points.

Figure 9. Proportion of active domains and signup domains by domain category.

Figure 10. Server location for registered domains.

Figure 11. Server location for malicious domain registrations.

Figure 12. SSL certificate issuer map.

Figure 13. VirusTotal scores for top 20 most malicious domains.

Figure 14. ImmigrationDirect landing page: a website to which typos of the USCIS domain redirected.

Figure 15. Joyacasino landing page: an online cryptocurrency casino to which multiple typosquatted domains redirected.

Figure 16. Cointiply landing page: a cryptocurrency rewards website that was the most frequent email sender.

Figure 17. Email newsletter solicitation from Alyssa @ Cointiply.

Figure 18. Promotional email from Joya.Casino.

Table 1. Overview of squatting techniques and corresponding sources.

Technique	Description	Summary	Citations
Typosquatting	Domain variations created using the concept of “fat-finger distance”, the likelihood that a user mistypes a letter when accessing a domain.	Most prevalent, targets typing errors	[21,22,24,25]
Combosquatting	Technique that combines common legitimate domains with believable keyword extensions.	Targets well-known brands	[22,26]
Sound Squatting	Technique that generates domains using similar-sounding words, otherwise known as homophones.	Targets audio and voice-to-text applications	[28,29,30,31]
Homograph Squatting	Technique that takes advantage of visually similar characters that have different ASCII values in order to visually deceive users.	Aims to visually deceive users	[32,33]
Bitsquatting	Technique that relies on random memory bitflips occurring in data transmission, resulting in users being redirected to the wrong domains.	Targets faulty hardware	[9,20]
Email Squatting	A technique that aims to take advantage of typos in email addresses or email client settings.	Targets email communications and common email domains	[24]
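
As a rough illustration of two rows in Table 1, the sketch below generates single-character typosquatting candidates from keyboard adjacency and bitsquatting candidates from single bit flips. The adjacency map `QWERTY_NEIGHBORS` is a deliberately partial, hypothetical stand-in for a full "fat-finger distance" model, and the function names are illustrative assumptions; this is not the alias generator used in the study.

```python
# Partial QWERTY adjacency map (an assumption for illustration only;
# a real fat-finger model would cover every key and weight each neighbor).
QWERTY_NEIGHBORS = {
    "a": "qwsz",
    "c": "xdfv",
    "e": "wsdr",
    "o": "iklp",
    "s": "awedxz",
}

# Characters permitted in a DNS hostname label (letters, digits, hyphen).
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def typo_variants(label):
    """Return labels that replace one character with an adjacent key."""
    variants = set()
    for i, ch in enumerate(label):
        for neighbor in QWERTY_NEIGHBORS.get(ch, ""):
            variants.add(label[:i] + neighbor + label[i + 1:])
    return variants


def bitflip_variants(label):
    """Return valid labels that differ from `label` by exactly one bit."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in ALLOWED:
                continue  # the flip produced a character DNS would reject
            candidate = label[:i] + flipped + label[i + 1:]
            if candidate.startswith("-") or candidate.endswith("-"):
                continue  # labels may not begin or end with a hyphen
            variants.add(candidate)
    return variants
```

With this partial map, `typo_variants("facebook")` includes `faxebook` and `typo_variants("microsoft")` includes `microwoft`, the two example domains that appear in the captions of Figures 4 and 7.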

Table 2. Overview of squatting countermeasures.

Countermeasure	Description	Citations
Defensive Registrations	Organizations preemptively register the identified set of domains likely to be targeted	[15]
ML Algorithm	Creation of an ML algorithm to detect squatted domains	[35,36]
Browser Extension	Extension that automatically limits users from accessing potentially squatted domains	[8]
Gamified User Training	Training to educate users on squatting attacks and how to avoid them	[23]
Policy-Based Controls	Reform to registration policies and enforcement of standards	[47,48]

Table 3. Overview of tools used in triage script.

Tool/Library	Purpose
Python Requests	Domain connectivity and availability testing [66]
Python whois	Domain registration data collection [67]
Python SSL	SSL certificate verification [68]
Selenium	Web scraping and HTML content analysis [69]
MongoDB	Data storage and analysis [70]
GoDaddy API	Domain availability and sale status [71]
PhishTank API	Known phishing domain lookup [72]
VirusTotal API	Malware and malicious activity scanning [73]
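
To make the certificate row of Table 3 more concrete, the following is a minimal sketch of how a certificate issuer could be collected with Python's standard `ssl` and `socket` modules. The function names `fetch_certificate` and `issuer_organization` are hypothetical, and the error handling is a simplifying assumption; the study's actual triage script is not reproduced here.

```python
import socket
import ssl


def fetch_certificate(host, port=443, timeout=5.0):
    """Return the validated peer certificate as a dict, or None on failure."""
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return tls.getpeercert()
    except (ssl.SSLError, OSError):
        # Covers verification failures, timeouts, and unreachable hosts.
        return None


def issuer_organization(cert):
    """Extract the issuer's organizationName from a getpeercert() dict."""
    # The issuer is a tuple of RDNs, each a tuple of (name, value) pairs.
    for rdn in cert.get("issuer", ()):
        for name, value in rdn:
            if name == "organizationName":
                return value
    return None
```

A caller would check the result of `fetch_certificate` for `None` before passing it to `issuer_organization`. Tallying the returned organization names across domains would yield issuer counts of the kind reported in Table 5.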

Table 4. Top 10 registrars and their counts.

Registrar	Count	Registrar (Malicious > 3)	Count
GoDaddy.com, LLC. (Tempe, AZ, USA)	203	Above.com Pty Ltd. (Victoria, Australia)	13
CSC Corporate Domains, Inc. (Wilmington, DE, USA)	183	GoDaddy.com, LLC. (Tempe, AZ, USA)	12
MarkMonitor, Inc. (Meridian, ID, USA)	96	DYNADOT LLC. (San Mateo, CA, USA)	9
DYNADOT LLC. (San Mateo, CA, USA)	70	PublicDomainRegistry.com (Jacksonville, FL, USA)	7
Above.com Pty Ltd. (Victoria, Australia)	69	Internet Domain Service BS Corp (Nassau, The Bahamas)	7
Media Elite Holdings Limited (Panama City, Panama)	44	Media Elite Holdings Limited (Panama City, Panama)	6
Network Solutions, LLC. (Herndon, VA, USA)	38	Dynadot Inc. (San Mateo, CA, USA)	5
Corsearch Domains LLC. (London, UK)	35	Porkbun LLC. (Sherwood, OR, USA)	4
Internet Domain Service BS Corp (Nassau, The Bahamas)	31	Sea Wasp, LLC. (Metairie, LA, USA)	3
PublicDomainRegistry.com (Jacksonville, FL, USA)	31	CSC Corporate Domains, Inc. (Wilmington, DE, USA)	3
Total Domain Registrations	1882	Total Malicious Registrations	105

Table 5. Top SSL certificate issuers (non-malicious vs. malicious).

Issuer	Count	Issuer (Malicious > 3)	Count
Let’s Encrypt (San Francisco, CA, USA)	677	Let’s Encrypt (San Francisco, CA, USA)	66
GoDaddy.com, Inc. (Tempe, AZ, USA)	109	GoDaddy.com, Inc. (Tempe, AZ, USA)	7
DigiCert Inc. (Lehi, UT, USA)	72	DigiCert Inc. (Lehi, UT, USA)	5
Google Trust Services (Mountain View, CA, USA)	60	ZeroSSL (Vienna, Austria)	3
Amazon (Seattle, WA, USA)	24	Google Trust Services (Mountain View, CA, USA)	1
ZeroSSL (Vienna, Austria)	9
GlobalSign nv-sa (Portsmouth, NH, USA)	8
Sectigo Limited (Salford, UK)	8
cPanel, LLC. (Houston, TX, USA)	3
Apple Inc. (Cupertino, CA, USA)	3

Table 6. Top providers by number of domains marked as malicious.

Provider	Domains Marked Malicious
Fortinet	454
CRDF	321
Seclookup	236
Bfore.Ai PreCrime	79
alphaMountain.ai	65
CyRadar	60
G-Data	48
BitDefender	46
Trustwave	45
Webroot	29

Table 7. Counts per domain category for the 527 domains considered for signups (Ordered by Frequency).

Category	Count
Legitimate Website	177
Alternate Company	107
For Sale	69
Advertisement	52
Alternate Website	48
Error Message	37
Registered	18
Gambling	7
Cryptocurrency	5
Contact Us	4
Download	2
Login Page	1

