Review

A Survey of IPv6 Address Scanning Technologies

1
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
2
Nanjing Lexbell Information Technology Co., Ltd., Nanjing 210007, China
*
Author to whom correspondence should be addressed.
Information 2025, 16(9), 727; https://doi.org/10.3390/info16090727
Submission received: 22 June 2025 / Revised: 22 July 2025 / Accepted: 29 July 2025 / Published: 25 August 2025

Abstract

Due to the huge address space of IPv6, the overhead of IPv6 space detection is very large, so improving detection efficiency is an active research topic. This survey first illustrates the basic process of IPv6 network space detection and then groups the detection algorithms into two primary categories: optimized scanning-based detection algorithms and address generation-based detection algorithms. The ideas behind these algorithms are discussed in the chronological order in which they were proposed. Finally, the detection algorithms are compared and analyzed along multiple dimensions. Future research directions are discussed, including the collection of active IPv6 addresses and allocation patterns, the discovery of IPv6 address patterns based on machine learning, improvement of the detection response rate, and the ethical issues raised by network detection.

1. Introduction

Rich Internet Applications not only support massive public information services and industrial production, but have also become an indispensable part of people’s daily lives, such as shopping, transportation, entertainment, finance, and social interaction. With the arrival of the Internet of Things (IoT) and the intelligent era, a challenge has been created of connecting an enormous number of IoT devices. The number of IoT devices worldwide is forecast to more than double, from 19.8 billion in 2025 to more than 40.6 billion by 2034 [1], far exceeding the size of the entire IPv4 address space [2,3]. Additionally, with the adoption of new technologies like the Industrial Internet and mobile networks, the IPv4 address shortage problem has become more severe. On November 25, 2019, the Réseaux IP Européens Network Coordination Centre (RIPE NCC) announced that the last IPv4 address block with a /22 prefix had been allocated, making large-scale IPv6 network deployment inevitable [4,5,6].
To gain influence in the next-generation network field, governments and other organizations worldwide are striving to promote the implementation of IPv6 technology. In November 2017, the General Office of the CPC Central Committee and the General Office of the State Council issued the Action Plan for the Large-scale Deployment of Internet Protocol Version 6 (IPv6) to accelerate the deployment of IPv6 [7]. By the end of 2018, the IPv6 traffic of 24 countries accessing Google had already exceeded 15%, while China’s traffic remained below 5% [8]. According to statistics from the China Internet Network Information Center (CNNIC), as of January 2025 [9], China had a total of 69,148 IPv6 address blocks of /32, and the number of active IPv6 users had reached 822 million, ranking first globally. With the upsurge in IPv6 network construction, the effective management of IPv6 networks has become an important current issue. The first step is to understand the deployment status of IPv6 through network space detection technology. IPv6 detection not only supports cyberspace regulation but can also serve as the first step of a network attack [10].
IPv6 address space detection has gone through two stages. In the first stage, researchers attempted to achieve IPv6 detection by enhancing detection capabilities, such as improving detection methods, multi-threading, and constructing distributed detection systems. However, compared with the vast IPv6 address space, these remedies were not significant. In the second stage, IPv6 detection was accelerated by generating or inferring active IP address ranges. These algorithms can be roughly divided into two categories: Domain Name System (DNS)-based detection address generation algorithms and pattern-based detection set generation algorithms. The former obtains currently used IPv6 addresses by accessing or traversing DNS records. Obviously, this method can only acquire IPv6 addresses of hosts or devices with registered domain names. The latter collects a set of active IPv6 addresses through various methods and then uses machine learning and artificial intelligence techniques to infer the network segments where active IPv6 addresses are located, thereby reducing measurement overhead.
This paper aims to systematically describe the latest domestic and international research progress on rapid IPv6 address space detection and provide researchers with directions for further study. The subsequent section structure is as follows: Section 2 overviews rapid address space detection technologies; Section 3 reviews optimized scanning-based detection algorithms; Section 4 analyzes address generation-based detection algorithms; Section 5 comprehensively compares various IPv6 detection algorithms and identifies future research directions; Section 6 comprises conclusions.

2. Overview of IPv6 Address Space Scanning

This section first explains the encoding and allocation rules of IPv6 addresses, which lay the foundation for scanning in cyberspace, and then outlines the IPv6 scanning process and constructs a generalized model.

2.1. IPv6 Address Space Encoding and Allocation

  • Overview of IPv6 Addresses
The IPv6 address range (0~2^128) is vastly larger than the IPv4 address range (0~2^32); even a grain of sand could be allocated an IPv6 address. Because IPv6 addresses are long and therefore inconvenient to write and read, their representation has been standardized in multiple RFC documents, such as RFC1884 [11], RFC2373 [12], and RFC3513 [13]. Among the IPv6 addressing schemes, a representative example is RFC2373, which defines the IEEE EUI-64 format. Currently, RFC4291 [14], which recommends representing IPv6 addresses in colon-hexadecimal notation, is more widely accepted.
There are three common representation methods for IPv6 addresses. The first is the colon-hexadecimal notation, which represents an IPv6 address as X:X:X:X:X:X:X:X, where each X represents 16 bits and is displayed in hexadecimal format [15]. The second is zero-compression notation: when there are multiple consecutive “0”s in an IPv6 address, the consecutive “0”s are compressed and represented as “::”, with the constraint that this symbol can only appear once to ensure unique address resolution. The third is embedded IPv4 address notation, which divides a 128-bit IPv6 address into the first 96 bits and the last 32 bits, represented in colon-hexadecimal and dotted-decimal notations, respectively (e.g., X:X:X:X:X:X:d.d.d.d), thus achieving IPv4-IPv6 compatibility. In addition, various countries and operators have formulated addressing schemes. These IPv6 addressing schemes provide the possibility for designing heuristic detection methods and effectively reduce the detection range to improve detection efficiency.
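The three notations can be illustrated with Python's standard `ipaddress` module. This is only a small sketch; the addresses are taken from the documentation ranges (2001:db8::/32 and 192.0.2.0/24) and are chosen here for illustration, not drawn from the surveyed work.

```python
import ipaddress

# Full colon-hexadecimal form: eight 16-bit groups written in hexadecimal.
addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(addr.exploded)    # all eight groups, zero-padded

# Zero-compression: the run of zero groups collapses into a single "::".
print(addr.compressed)

# Embedded IPv4 notation: the last 32 bits are written in dotted decimal.
mapped = ipaddress.IPv6Address("::ffff:192.0.2.1")
embedded_v4 = mapped.ipv4_mapped
print(embedded_v4)

# The low 32 bits of the IPv6 address are exactly the IPv4 address.
low32 = int(mapped) & 0xFFFFFFFF
print(ipaddress.IPv4Address(low32))
```

All three forms denote the same 128-bit value, which is why a scanner normally works on the integer or the exploded form internally and uses the compressed form only for display.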
IPv6 addresses are divided into three categories: unicast address, multicast address, and anycast address. In unicast addressing mode, when a network switch or router receives a unicast IPv6 packet destined for a host, it sends the packet to a single output interface. The working mechanism of IPv6 multicast mode is the same as that of IPv4 multicast mode. All hosts interested in the multicast information need to first join the corresponding multicast group. The hosts joining the group receive the multicast data and process it. Anycast addressing is a new addressing method proposed by IPv6. Multiple hosts or devices have the same anycast IP address configured for their interfaces. The sender of the message sends a unicast message to the network, and the anycast protocol sends the packet to the host closest to the sender.
  • Analysis of Address Patterns
The huge address space of IPv6 and the sparse active nodes in the network space have created difficulties in management. Consequently, numerous studies on IPv6 address pattern allocation have been conducted, proposing various allocation strategies. From the perspective of address composition, these include automatic random allocation and allocation methods based on specific encoding rules. From the perspective of configuration methods, they can be divided into automatic allocation and manual allocation [16].
Within a subnet identified by a 32-bit prefix, there are theoretically 2^96 selectable addresses. In reality, the addresses of each Internet Service Provider (ISP) have different spatial and temporal characteristics. Stable addresses contain pseudo-random numbers in the interface identifier, and even in the network identifier [17]. After a newly connected host sends a request message to the local router, the router returns the IPv6 prefix information to it. Then, the host combines the locally generated Interface IDentifier (IID) with the prefix to generate an IPv6 address. According to different IID generation strategies, IPv6 addresses can be divided into six categories: EUI-64 addresses, temporary addresses, IPv4 embedded addresses, port embedded addresses, low-order addresses, and IPv6 transition technology addresses.
From the above allocation strategies, it can be seen that IPv6 addresses may include IPv4 addresses, geographical locations, ISP and user information, etc. [18], which reduces the spatial range of probes. For example, when an IPv6 address adopts the EUI-64 format, the address scanning range can be reduced from 0~2^64 to 0~2^48. In addition, if the 24-bit organization identifiers of all MAC addresses in this network are the same, the scanning range can be further reduced to 0~2^24 [18]. Meanwhile, research shows that low-order addresses account for 70% of the IPv6 addresses configured in routers [19]. When detecting the backbone of a network, a low-order address scanning strategy can be adopted to improve the scanning efficiency. Moreover, the domain names of some networks follow certain patterns. Active IPv6 addresses can be obtained by guessing domain names and traversing DNS query records. Therefore, studying the address allocation patterns of IPv6 to improve the efficiency of IPv6 network address detection has become an important research direction [15,16,17,18,19,20].
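The EUI-64 reduction can be made concrete with a short sketch. It builds the modified EUI-64 interface identifier from a MAC address (flip the universal/local bit, insert `ff:fe` in the middle) and enumerates candidates for one organization identifier. The function names, the prefix 2001:db8::/64, and the OUI 00:11:22 are illustrative assumptions, not part of the surveyed methods.

```python
import ipaddress

def eui64_iid(mac: str) -> int:
    """Modified EUI-64 interface identifier derived from a 48-bit MAC address."""
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(b) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit and insert ff:fe between OUI and NIC part.
    iid = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    return int.from_bytes(iid, "big")

def candidates_for_oui(prefix: str, oui: str, count: int):
    """Yield the first `count` EUI-64 candidates in `prefix` for one OUI."""
    net = ipaddress.IPv6Network(prefix)
    for nic in range(count):
        mac = f"{oui}:{(nic >> 16) & 0xff:02x}:{(nic >> 8) & 0xff:02x}:{nic & 0xff:02x}"
        yield net.network_address + eui64_iid(mac)

# Size of the search space under each assumption about the IID.
full_iid = 2 ** 64          # arbitrary 64-bit interface identifier
eui64_only = 2 ** 48        # 16 bits are fixed to ff:fe
single_oui = 2 ** 24        # the 24-bit OUI is also known

example = list(candidates_for_oui("2001:db8::/64", "00:11:22", 3))
for a in example:
    print(a)
print(full_iid // single_oui)  # reduction factor when the OUI is known
```

Even the 2^24 case is about 16.8 million probes per /64, so in practice such enumeration is combined with knowledge of which vendors are present in the target network.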

2.2. Detection Process and Challenges

Address space detection is the foundation for generating network topology views, user views, routing views, and security views, and it is of great significance for cyberspace modeling, network management, network defense, network attacks, etc. From the perspective of management and defense, address space detection is the starting point for finding out the basic situation and formulating management and defense plans. From the perspective of attacks, address space detection is the first step in discovering potential attack targets and collecting target information [21]. However, with the accelerating deployment of IPv6 networks, the effective management of the network has become more urgent. Thus the detection or discovery of active addresses in IPv6 networks has become the most pressing issue.
In the IPv4 cyberspace, researchers have proposed a variety of detection methods and tools that can achieve rapid detection of the cyberspace, such as Zmap [22] (University of Michigan, Ann Arbor, MI, USA), MASSCAN 1.3.2 [23] (Robert D. G., Washington, DC, USA), etc. Due to the vast IPv6 address space and the extremely sparse distribution of active IP addresses in the cyberspace, the abovementioned tools cannot meet the needs of IPv6 space detection. In order to improve the detection efficiency and reduce the disturbance, it is necessary to optimize the detection process. The general process of IPv6 address space detection is shown in Figure 1. Firstly, the IPv6 basic database is obtained from third-party databases, Internet data, DNS domain name databases, filing data, traffic extraction, etc., which contain active IPv6 address data. Secondly, based on these original address data, methods such as machine learning and pattern analysis are used to infer the allocation rules of the IPv6 address space. Thirdly, the target set of detection addresses is generated. Finally, detection tools such as Ping6 or Traceroute6 are used to quickly scan the IPv6 target address space to obtain new active IPv6 addresses. This process is iterated continuously to discover more active IPv6 addresses, and the detection results are displayed and analyzed.
Since the IPv6 address space is extremely vast, simply relying on methods such as increasing detection nodes and using multi-threading cannot effectively shorten the detection time. The core issue is how, in the face of the sparse IPv6 space, to reduce the spatial scope of the detection targets, so as to effectively shorten the detection time. In cyberspace detection, it is assumed that the target IPv6 address space is represented as T, the set of actually allocated IPv6 addresses is A, the set of online IP addresses is O, and the set of known active IP addresses is OA. Obviously, OA ⊆ O ⊆ A ⊆ T, as shown in Figure 2.
If the set of detection addresses is Os, the cyberspace detection is represented as Scan(•), and the detection target generation algorithm is represented as Hlist(•), then the cyberspace detection can be modeled as an inference process from the known active IPv6 addresses OA to the actually allocated addresses A or the set of online IP addresses O, which is expressed as Equation (1).
Scan(Os) = Scan(Hlist(OA)) → A or O    (1)
In the above model, the detection is completed in accordance with the standard network communication protocols. Generally, ICMPv6-based tools such as ping6 and traceroute6, the Simple Network Management Protocol (SNMP), or other application layer protocols are used for detection. Therefore, the core of the IPv6 address space detection is Hlist(•), which is how to generate the detection address set Os from the known set of active IPv6 addresses OA, as shown in Equation (2).
Os = Hlist(OA)    (2)
Obviously, an optimal detection algorithm should satisfy Os = O. If O ⊆ Os and |Os| ≪ |T|, the algorithm can also be regarded as a good detection algorithm. In fact, most detection algorithms can only achieve Oc = Os ∩ O ≠ ∅, with |Oc| ≪ |T|.
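The set model lends itself to a toy illustration with Python sets. In the sketch below, addresses are plain integers, `hlist` is a deliberately naive stand-in for Hlist(•) that proposes the numeric neighbours of each known address, and `scan` is a stand-in for Scan(•); both functions and all values are assumptions made for illustration only.

```python
def hlist(known_active, radius=2):
    """Hypothetical Hlist(.): propose the numeric neighbours of each known address."""
    return {a + d for a in known_active for d in range(-radius, radius + 1)}

def scan(targets, online):
    """Stand-in for Scan(.): the probed targets that actually respond."""
    return targets & online

T_SIZE = 2 ** 16                      # toy target space |T|
O = {10, 11, 12, 500, 501, 9000}      # online addresses (unknown to the scanner)
OA = {10, 500}                        # known active seeds, a subset of O

Os = hlist(OA)                        # detection address set
Oc = scan(Os, O)                      # addresses confirmed online

hit_rate = len(Oc) / len(Os)          # fraction of probes that found a host
coverage = len(Oc) / len(O)           # fraction of online hosts discovered
probed_fraction = len(Os) / T_SIZE    # how small Os is relative to T

print(sorted(Oc), hit_rate, coverage, probed_fraction)
```

Hit rate and coverage pull in opposite directions: enlarging `radius` raises coverage but lowers the hit rate, which is the trade-off that the address generation algorithms in Section 4 try to improve.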
As can be seen from the above model, in IPv6 space detection, research needs to be carried out on two aspects. On the one hand, the requirement is to improve the detection capability. On the other hand, it is about how to reduce the detection space, so as to shorten the detection time. There are many detection algorithms for the IPv6 address space, which can be classified from different perspectives. In reference [24], all of the IPv6 space detection algorithms are mainly classified into two kinds: address internal structure analysis and seed address fingerprint discovery by clustering analysis. Cheng X. et al. classify the detection algorithms into two classes: generating IPv6 addresses based on the structural information of seed addresses and generating IPv6 addresses based on the semantic information of seed addresses [25].
However, these classifications do not consider the whole evolution of the IPv6 detection algorithms. Thus, this paper classifies all IPv6 detection algorithms into two different types: optimized scanning-based detection algorithms and address generation-based detection algorithms.

3. Optimized Scanning-Based Detection Algorithms

Faced with the vast IPv6 address space, an intuitive approach is to increase detection speed to shorten detection time. Research in this area primarily focuses on two aspects: enhancing the detection capabilities of a single detection node and improving the detection capabilities of distributed detection systems.

3.1. Single-Point Detection Optimization Algorithms

This type of method mainly focuses on studying the composition patterns of certain types of IPv6 addresses or improving the detection methods.
Colitti et al. [26] introduced a series of techniques to detect and collect information about IPv6 in IPv4 tunnels and demonstrated how to use known tunnels to initiate third-party tunnels and expand the discovery process. Ahmed et al. [27] described the technical specifications of the Neighbor Discovery Protocol and demonstrated its components, functionalities, and workflow, which can be used for discovering IPv6 addresses. Chown T. [28] analyzed IPv6 addresses using multiple datasets and conducted statistical analysis on a certain number of addresses. The results showed that more than 70% of hosts used stateless address autoconfiguration (SLAAC) address types and embedded IPv4 address types, while 70% of routers used low-order address types. This knowledge greatly reduces the scanning range of hosts in IPv6 networks, providing an important reference for IPv6 network scanning. According to the rules of IPv6 address distribution, Gont F. et al. [18] listed many methods that use address distribution characteristics to narrow down the scanning range of IPv6 addresses, such as scanning methods for virtual machine addresses, low-order addresses, and EUI-64 addresses.
Junyi L. et al. [29] studied and experimented with three specific IPv6 scanning strategies (forging RA messages, multicast ping6, and error detection messages) and analyzed the advantages and disadvantages of these three strategies through latency and simulation experiments. The experimental results showed that the two scanning strategies of forging RA messages and multicast ping6 can detect active hosts faster than error detection messages. Su F. et al. [30] constructed a Next Generation Internet worm model (NGIWM). This model adopted a three-tier architecture, mainly including the domain hosts locating layer (DHL), local propagation layer (LPL) and distributed information sharing layer (DIS), enabling it to have large-scale popularity and achieve the effect of fast scanning of IPv6 networks.
In addition, some scanning tools such as Zmap also use multi-threading to improve the single-point detection ability. However, none of the above methods can solve the problems caused by the vast IPv6 space.

3.2. Distributed Detection Algorithms

Currently, several international open measurement organizations and commercial entities, such as CAIDA [31] and Rapid7 [32], have been conducting long-term monitoring and tracking of IPv6 networks. These institutions have amassed a substantial volume of IPv6 address data through their respective tools and methodologies.

3.2.1. CAIDA Archipelago

Since 2008, CAIDA has leveraged its globally deployed Archipelago monitoring infrastructure to collect IPv6 data, generating both IP-level and AS-level topological insights [33]. Specifically designed for active network measurement, Archipelago (Ark) [34] simplifies operations compared to general-purpose distributed experimental platforms. Beyond initiatives like spoofed address discovery, TCP behavior inference, and Domain Name System (DNS) health assessment, Ark has collaborated with researchers from the Simula Research Laboratory to study the stability and performance of IPv4 and IPv6 [35]. Based on the dual-stack Ark monitoring nodes, Ark conducts high-frequency Ping and Traceroute measurements of IPv4 and IPv6 on the dual-stack servers in the Alexa list [36], aiming to compare the reachability and performance of dual-stack targets on IPv4 and IPv6. Moreover, Ark has also carried out topological detection work in the IPv6 address space. For each detection path, Ark collects the IP addresses, Round-Trip Time (RTT), response Time-To-Live (TTL), and Internet Control Message Protocol (ICMP) responses of all hops—including intermediate hops. Every 48 h, each monitoring node scans all announced /48 or shorter IPv6 prefixes [33]. This process, termed a “cycle”, involves randomly selecting one target per prefix. Successive cycles use independent random orderings for prefix selection and target addresses within prefixes, ensuring that no prefix is scanned within a 16-h window across cycle boundaries. The resulting IPv6 topology dataset is publicly available [31].
In addition, for the IPv6 addresses in the topology database, CAIDA uses a customized batch DNS lookup service to centrally perform DNS resolution. That is, after collecting the topological information, DNS resolution is quickly executed so that the results can better match the Internet status when the traces are collected. This service executes millions of DNS resolutions every day. However, in order to avoid excessive requests to DNS servers, Ark has taken some measures to reduce the query frequency, and established a DNS naming database [33].

3.2.2. Rapid7 Nexpose

In order to address the increasingly severe IPv6 security issues, Rapid7, a provider of security risk information solutions, added discovery and scanning capabilities for IPv6 addresses to its flagship product Nexpose in 2012. The main reason for adding this capability is the accelerated transition from IPv4 to IPv6. Although most organizations claim they have not deployed IPv6, many devices enable IPv6 settings by default. This introduces security risks without the user’s knowledge, as many existing security products lack consideration for IPv6 security. Attackers can exploit this gap by targeting the IPv6 protocol to gain tunnel access to IPv4 devices and leverage unrecognized/untreated IPv6 vulnerabilities for attacks [34].
The main functions of the Nexpose platform include the following steps: (1) Executes IPv6 discovery through the IPv4 network, enabling organizations to disable IPv6 devices in the IPv4 network and thus avoid potential security risks; (2) Displays the devices with IPv6 enabled; (3) Conducts scans to discover vulnerabilities in IPv6 devices; (4) Exports the data to Rapid7’s Metasploit platform, and then runs a risk assessment to verify the risks based on vulnerability exploitation.
The above algorithms are all aimed at enhancing the detection capability of the detection system, rather than narrowing the detection target space. These methods and strategies have certain effects on specific networks or address segments, but compared to the huge detection target space, this research route has little effect on the detection of the entire IPv6 space.

4. Address Generation-Based Detection Algorithms

Scanning a 64-bit IPv6 subnet space at a rate of one address per second would take approximately 5 billion years [28]. Given the typically low host density (total number of hosts/total number of addresses) in IPv6 networks, reducing the detection space has gradually become a research direction for IPv6 network space detection. IPv6 address generation algorithms can be roughly divided into two categories: DNS-based detection address generation algorithms and pattern-based detection address generation algorithms.

4.1. DNS-Based Detection Address Generation Algorithms

DNS servers provide a mapping between IP addresses and domain names. The IP addresses of websites can be obtained through DNS resolution. Based on this principle, references [37,38,39,40,41,42,43,44] proposed a series of detection algorithms for IPv6 addresses, which can be divided into three categories: algorithms based on website resolution, IPv4-to-IPv6 mapping methods, and traversal algorithms based on domain trees.

4.1.1. Algorithms Based on Website Resolution

This algorithm is the most straightforward IPv6 address space detection algorithm: it obtains IPv6 addresses directly by resolving the domain names of websites. Mehdi Nikkhah et al. [42] crawled the top one million sites from the Alexa website [36] to resolve IPv4 and IPv6 addresses, mainly to study the characteristics, progress, and influencing factors of the transition from IPv4 to IPv6 networks.
When querying a PTR record from a DNS server, a NOERROR response without PTR data indicates that the queried name exists in the tree but has no PTR record of its own, whereas an NXDOMAIN response signifies that the queried name itself is non-existent. Leveraging this principle, reference [43] proposed an IPv6 address detection method as follows: Firstly, obtain the information of the allocated network segments from the RIR website. Then, determine the query time interval according to a Poisson distribution with an average time of 1 s, and query the PTR record from the DNS server. Then, traverse the address space according to these two different responses to discover active network segments and addresses, as shown in Figure 3. When querying the PTR record, there are three common responses:
  • NOERROR means the queried name exists in the ip6.arpa domain, but it has no PTR record. The program appends a new nibble, initialized to 0, to the previous reverse query.
  • NXDOMAIN means there are no records at or below the queried name; the program increments the value of the current nibble and queries again.
  • If the response is a hostname, the program saves the result in the database.
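The three response cases above can be sketched as a pruned tree walk. To keep the example runnable offline, `MockResolver` is a hypothetical in-memory stand-in for a DNS server holding a few PTR records; it is an assumption for illustration, not a real resolver API, and the query pacing (the Poisson-distributed interval) is omitted.

```python
import ipaddress

HEX = "0123456789abcdef"

def nibbles(addr: str) -> str:
    """32 hexadecimal nibbles of an IPv6 address, most significant first."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")

def reverse_name(prefix: str) -> str:
    """ip6.arpa name for a (possibly partial) nibble prefix."""
    return ".".join(reversed(prefix)) + ".ip6.arpa"

class MockResolver:
    """Hypothetical stand-in for a DNS server with a few PTR records."""
    def __init__(self, addrs):
        self.records = {nibbles(a) for a in addrs}
        self.queries = 0

    def query(self, prefix: str) -> str:
        self.queries += 1
        if len(prefix) == 32 and prefix in self.records:
            return "PTR"        # full address with a hostname
        if any(r.startswith(prefix) for r in self.records):
            return "NOERROR"    # name exists, records live deeper in the tree
        return "NXDOMAIN"       # nothing at or below this name

def walk(resolver, start_prefix: str):
    """Enumerate PTR records below start_prefix, pruning NXDOMAIN subtrees."""
    found, stack = [], [start_prefix]
    while stack:
        prefix = stack.pop()
        for h in HEX:                       # try each value of the next nibble
            child = prefix + h
            response = resolver.query(child)
            if response == "PTR":
                found.append(ipaddress.IPv6Address(int(child, 16)))
            elif response == "NOERROR":
                stack.append(child)         # descend by one more nibble
    return found

zone = ["2001:db8::1", "2001:db8::2", "2001:db8:0:1::53"]
resolver = MockResolver(zone)
hosts = walk(resolver, "20010db8")
print(reverse_name("20010db8"))
print(sorted(hosts), resolver.queries)
```

Because every NXDOMAIN answer removes a whole subtree, the number of queries grows with the number of populated branches rather than with the 2^96 addresses beneath the starting /32.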

4.1.2. IPv4 vs. IPv6 Algorithm

Strowes [37] proposed an intuitive IPv6 address detection algorithm as shown in Figure 4. This algorithm performs reverse DNS resolution on the IPv4 addresses in the network traffic to obtain their domain names. Subsequently, it queries the AAAA record from the DNS server to retrieve the IPv6 address associated with the IPv4 device. This approach assumes that network service providers, aiming to reduce costs, often share physical devices and domain names, and that new IPv6 addresses are typically assigned to network infrastructure rather than end hosts or servers.
The primary advantage of this algorithm lies in its simplicity and feasibility. However, it is only effective for hosts or servers with registered domain names. Given that a significant number of network infrastructure devices and end hosts lack domain name registration, the comprehensiveness of its resolution is relatively limited.
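The two-step lookup described in this subsection (IPv4 address to hostname via reverse DNS, then hostname to IPv6 address via an AAAA query) can be sketched as follows. The lookup functions are passed in as parameters so that the example runs without network access; the dictionaries, hostnames, and addresses are hypothetical test data, not results from reference [37].

```python
def discover_ipv6(ipv4_addrs, ptr_lookup, aaaa_lookup):
    """Map each IPv4 address to (hostname, IPv6 addresses) where both lookups succeed."""
    found = {}
    for v4 in ipv4_addrs:
        name = ptr_lookup(v4)          # reverse DNS: IPv4 address -> hostname
        if name is None:
            continue                   # no registered name, nothing to query
        v6_addrs = aaaa_lookup(name)   # forward DNS: hostname -> AAAA records
        if v6_addrs:
            found[v4] = (name, v6_addrs)
    return found

# Hypothetical in-memory DNS data standing in for real PTR and AAAA answers.
PTR = {
    "192.0.2.10": "www.example.net",
    "192.0.2.20": "mail.example.net",
}
AAAA = {
    "www.example.net": ["2001:db8::10"],
}

observed_ipv4 = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]
result = discover_ipv6(observed_ipv4, PTR.get, lambda name: AAAA.get(name, []))
print(result)
```

Only the first address yields an IPv6 target here: the second has a name but no AAAA record, and the third has no name at all, which mirrors the coverage limitation noted above.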

4.1.3. Traversal Algorithm Based on Domain Name Tree

Fiebig T. et al. [38] designed a DNS query algorithm based on the “NXDOMAIN” response returned when querying domain names. This algorithm transforms the detection of IPv6 addresses into a traversal problem of a 128-bit domain name tree, as shown in Figure 5. In order to improve the traversal speed, the algorithm is divided into two stages:
  • Network Segment Traversal: When traversing the subnet addresses, a combination of depth-first and breadth-first traversal is adopted. Firstly, a depth-first traversal of the first 32-bit subnets is carried out, and then a breadth-first traversal. After that, for the subnets with existing domain names, a depth-first traversal is performed again to obtain 48-bit subnets, and then a breadth-first traversal is carried out. According to the above algorithm, the 64-bit subnets are traversed.
  • Host Traversal: For each domain name space where hosts exist, a direct traversal to 128 bits is carried out. In order to speed up the search, this algorithm takes the active IPv6 address segments published by the Route View project [44] and the RIPE project [45] as inputs. Additionally, it analyzes the prevalence of dynamic domain names in the current network by introducing the Damerau–Levenshtein edit distance to compute domain name similarity, thereby facilitating the detection of dynamically allocated domain names.
Borgolte K. et al. [39] further improved the algorithm in reference [38], taking into account the influence of the Domain Name System Security Extensions (DNSSEC). Network measurements show that 86% of root servers currently support DNSSEC, with the remaining servers expected to adopt this extension. In DNSSEC-enabled servers, domain name resolution returns not only “NXDOMAIN” but also the names of the two adjacent domain name tree nodes before and after the queried domain. For example, querying the non-existent domain “b.edu” on a traditional DNS server returns only “NXDOMAIN,” while a DNSSEC-enabled server returns (a.edu, g.edu) as neighboring nodes, as shown in Figure 6.
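The neighbor-returning behavior can be turned into a zone enumeration loop, sketched below. `MockSignedZone` is a hypothetical in-memory model of a DNSSEC-enabled server that answers a query for a non-existent label with its two sorted neighbors; labels are compared as plain strings, which simplifies the canonical DNS ordering, and none of this is the actual implementation from reference [39].

```python
import bisect

class MockSignedZone:
    """Hypothetical signed zone: denial answers reveal the neighboring labels."""
    def __init__(self, labels):
        self.labels = sorted(labels)
        self.queries = 0

    def denial(self, label):
        """Return (previous, next) existing labels around a non-existent label."""
        self.queries += 1
        i = bisect.bisect_left(self.labels, label)
        previous = self.labels[i - 1]               # index -1 wraps to the last label
        following = self.labels[i % len(self.labels)]  # past the end wraps to the first
        return previous, following

def walk_zone(zone):
    """Enumerate all labels by repeatedly asking for a name just after the last one found."""
    _, first = zone.denial("")          # the empty label sorts before everything
    names, current = [first], first
    while True:
        # current + "\x00" is the smallest string that sorts after `current`.
        _, following = zone.denial(current + "\x00")
        if following == first:          # the chain wrapped around: zone exhausted
            return names
        names.append(following)
        current = following

zone = MockSignedZone(["g", "a", "z", "m"])
print(zone.denial("b"))                 # neighbors of the non-existent label "b"
labels = walk_zone(zone)
print(labels, zone.queries)
```

The walk needs roughly one query per existing name, compared with the many speculative queries of a blind traversal, which is the efficiency gain that motivates using the security extension for enumeration.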
At present, DNS servers with security extensions include different versions. The initial versions do not encrypt domain names, while in DNSSEC3 and later versions, hash functions are calculated for domain names, which may not preserve monotonicity in hashed values. Consequently, the DNSSEC-based traversal tree method processes two server types—DNSSEC and DNSSEC3—separately. The algorithm proceeds by randomly selecting an initial domain name, recording resolution results at each step, modeling encrypted domains as a ring structure, and iteratively sampling the hash values within the ring until the entire interval is covered. Compared to the algorithm in reference [38], this approach enables exploration of distinct DNS regions and enhances traversal efficiency to some extent.
Fiebig T. et al. [46] conducted a comparative analysis of the IPv4 and IPv6 addresses collected by the rDNS method and the passively collected IP addresses, and pointed out that using “NXDOMAIN” for detection can detect 40% of the IPv6 addresses of each DNS server. Through data comparison, it is shown that using rDNS to discover active IP addresses is an effective method and can serve as a supplement to passive detection.
This type of algorithm effectively improves the efficiency of IPv6 address discovery in three respects. However, it is still ineffective for hosts that have no records in the DNS. At the same time, the principles of this type of algorithm have also been used by some researchers to detect IPv6 scans, as in reference [41], etc.

4.2. Pattern-Based Detection Address Generation Algorithms

After obtaining the allocation patterns of IPv6 addresses through analysis, generating a set of target IP addresses for detection based on this prior knowledge can effectively reduce the detection overhead. The following is an analysis of the algorithm for generating the set of target addresses for IPv6 detection.

4.2.1. Non-Neural Networks-Based Generation Algorithms

  • Algorithm Based on Seed Fingerprint
Ullrich J. et al. [47] proposed a recursive detection algorithm based on pattern generation, as shown in Algorithm 1. Firstly, the known addresses are converted into bit strings. The algorithm then recurses, refining the given pattern by one additional bit at each level. The bit chosen at each level is the one whose refined pattern covers the largest number of known addresses among all candidate patterns. Each recursion increases the number of determined bits by one, thereby reducing the number of undetermined bits by one. Once the number of undetermined bits falls below the given threshold, address generation begins, and all addresses matching the current pattern are generated in ascending order.
Algorithm 1. Recursion pattern-based IPv6 generation algorithm
Input: IP Bit Pattern (Marked with Determined and Undetermined Bits)
1: if count(undetermined bits) < threshold then
2:   iterateAddresses(pattern)
3:   return false
4: end if

5: rule = findBestRule(pattern)
6: pattern = apply(pattern,rule)
7: doRecursionWith(pattern)

8: alternativeRule = inverse(rule)
9: alternativePattern = apply(pattern, alternativeRule)
10: doRecursionWith(alternativePattern)
Obviously, this algorithm is influenced by the initial data set. Meanwhile, the setting of the threshold is also crucial as it affects the scope of detection. When the initial data set is small, the search space is still quite large, but the algorithm has better adaptability.
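A runnable reading of Algorithm 1 is sketched below on a toy 8-bit address width so that the output can be inspected. Several details are assumptions rather than part of reference [47]: the best rule is taken to be the (bit position, value) pair matched by the most seed addresses, the inverse rule is applied to the pattern as it was before the best rule, the threshold must be at least 1, and a generator is used so that the caller can stop after a probing budget.

```python
from itertools import islice, product

WIDTH = 8  # toy address width (assumption; real IPv6 addresses have 128 bits)

def bit(addr, pos):
    """Bit of `addr` at position `pos`, counting from the most significant bit."""
    return (addr >> (WIDTH - 1 - pos)) & 1

def matches(addr, pattern):
    return all(bit(addr, pos) == value for pos, value in pattern.items())

def find_best_rule(pattern, seeds):
    """Undetermined (position, value) matched by the most seeds under `pattern`."""
    live = [s for s in seeds if matches(s, pattern)]
    best, best_count = None, -1
    for pos in range(WIDTH):
        if pos in pattern:
            continue
        ones = sum(bit(s, pos) for s in live)
        for value, count in ((0, len(live) - ones), (1, ones)):
            if count > best_count:
                best, best_count = (pos, value), count
    return best

def iterate_addresses(pattern):
    """All addresses matching `pattern`, in ascending order."""
    free = [p for p in range(WIDTH) if p not in pattern]
    base = sum(v << (WIDTH - 1 - p) for p, v in pattern.items())
    for values in product((0, 1), repeat=len(free)):
        yield base + sum(v << (WIDTH - 1 - p) for p, v in zip(free, values))

def generate(pattern, seeds, threshold):
    """Yield candidate addresses, most seed-like patterns first (threshold >= 1)."""
    if WIDTH - len(pattern) < threshold:
        yield from iterate_addresses(pattern)
        return
    pos, value = find_best_rule(pattern, seeds)
    yield from generate({**pattern, pos: value}, seeds, threshold)
    yield from generate({**pattern, pos: 1 - value}, seeds, threshold)

seeds = [0b10100001, 0b10100010, 0b10100011, 0b10101111]
first_targets = list(islice(generate({}, seeds, threshold=3), 4))
print([format(a, "08b") for a in first_targets])
```

Left to run to completion the recursion visits the whole space, so the ordering, not the set, is what the seeds influence; a scanner would consume only as many candidates as its budget allows.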
Similar to this algorithm, Song G. et al. [48] proposed the DET algorithm. It identifies high-density regions in seed addresses by constructing a density space tree, and dynamically generates target addresses to improve the efficiency of IPv6 active address detection. This algorithm assumes that active IPv6 addresses are more likely to exist in high-density areas of seed addresses, and utilizes this characteristic to optimize the address generation process.
The process of this algorithm is shown in Figure 7, and the main procedures are as follows. First, the IPv6 seed addresses are converted into high-dimensional vectors for clustering analysis. Next, a density space tree is constructed from these vectors using top-down (divisive) hierarchical clustering. The root node of the tree represents the entire set of seed addresses, while the leaf nodes represent high-density regions. During construction, the algorithm selects the dimension with the smallest information entropy for node splitting, ensuring that high-density regions are concentrated on leaf nodes or on the same branch. Subsequently, the algorithm dynamically generates target addresses in the high-density regions of the density space tree, and adapts to the new address distribution by continuously expanding the seed address set and updating the tree structure. Finally, based on budget and address granularity, the algorithm adjusts the address generation strategy to improve detection efficiency. Experiments have shown that, for small budgets, a tighter generation method (such as β = 1) is more effective, whereas for large budgets a slightly looser generation method (such as β = 2) can be chosen.
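The splitting rule described above can be sketched as follows. This is a minimal illustration rather than the DET implementation: the function names are hypothetical, the vectors are shortened toy strings instead of full 32-nybble addresses, and dimensions with zero entropy are skipped on the assumption that a constant dimension cannot split a node.

```python
import math
from collections import Counter

def dimension_entropy(vectors, dim):
    """Shannon entropy (in bits) of the values observed in one dimension."""
    counts = Counter(v[dim] for v in vectors)
    n = len(vectors)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def pick_split_dimension(vectors):
    """Return the dimension with the smallest non-zero entropy, or None."""
    scored = [(dimension_entropy(vectors, d), d) for d in range(len(vectors[0]))]
    variable = [(h, d) for h, d in scored if h > 0]
    return min(variable)[1] if variable else None

def split(vectors):
    """Partition the vectors by their value in the chosen dimension."""
    dim = pick_split_dimension(vectors)
    children = {}
    if dim is None:
        return children
    for v in vectors:
        children.setdefault(v[dim], []).append(v)
    return children

# Toy vectors: dimensions 0 and 1 are constant, 2 has two values, 3 has four.
vectors = ["20a1", "20a2", "20b3", "20b4"]
children = split(vectors)
```

Splitting on the low-entropy dimension keeps the two dense groups together, whereas splitting on the last dimension would scatter the seeds into four singleton nodes.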
  2. Information Entropy-Based Generation Algorithm
Foremski et al. [16] utilized the entropy theory to conduct a comprehensive analysis of the IPv6 address structures of servers, routers, and clients, so as to construct the corresponding IPv6 address structure models and generate the IPv6 detection addresses. Entropy is an important indicator for measuring the unpredictability of information content. Generally, for a discrete random variable X with possible values {x1, x2, …, xk} and a probability mass function P(xi), it is defined as H(X):
$$H(X) = -\sum_{i=1}^{k} P(x_i) \log P(x_i)$$
In the formula, the higher the entropy, the greater the uncertainty of the value of X. If H(X) = 0, then X takes only one value; if H(X) reaches its maximum, then X is uniformly distributed over its possible values. Since IPv6 addresses can be written in several textual forms, Foremski et al. uniformly represented each IPv6 address as 32 hexadecimal characters without colons (one character per 4 bits of the 128-bit address) so that entropy could be used to analyze the address structure. The entropy is then computed for each character position. After that, Foremski et al. grouped adjacent positions with similar entropy values into segments, using a threshold-based segmentation algorithm. Then, the patterns in the non-random fields are mined and clustering analysis is carried out on the address segments. Finally, a Bayesian network is used to model the IPv6 address space. Because Bayesian networks can model complex phenomena involving many variables and provide a basic framework for learning and inference under uncertainty, they are well suited to structure learning and parameter learning in the massive and complex IPv6 address space.
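The first two steps (per-character entropy and entropy-based segmentation) can be sketched in Python as below. The function names and the simple rule of starting a new segment whenever adjacent entropies differ by more than a threshold are assumptions for illustration; the segmentation algorithm of Foremski et al. [16] is more elaborate. The entropy is computed as the sum of P(x) log2(1/P(x)), which is equivalent to the definition of H(X) above with a base-2 logarithm.

```python
import ipaddress
import math
from collections import Counter

def to_nybbles(addr):
    """Write an IPv6 address as 32 hexadecimal characters without colons."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")

def nybble_entropy(addrs):
    """Entropy in bits of each of the 32 character positions."""
    rows = [to_nybbles(a) for a in addrs]
    n = len(rows)
    entropies = []
    for pos in range(32):
        counts = Counter(row[pos] for row in rows)
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies

def segment(entropies, threshold):
    """Group adjacent positions whose entropies differ by at most threshold."""
    segments, start = [], 0
    for i in range(1, len(entropies)):
        if abs(entropies[i] - entropies[i - 1]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(entropies)))
    return segments

# Documentation-prefix addresses that differ only in the last character.
addrs = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"]
entropies = nybble_entropy(addrs)
segments = segment(entropies, threshold=0.5)
```

Here the first 31 positions are constant (entropy 0) and the last position takes four equally likely values (entropy 2 bits), so the address splits into a fixed segment and a variable one.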
  3. 6Gen Algorithm
Murdock A. et al. hypothesized that the dense seed regions are associated with the dense regions of active addresses, and modeled the seed addresses as independent and identically distributed random samples of the active addresses [49]. Based on this hypothesis, the 6Gen algorithm was proposed. It first clusters the initially obtained seed addresses, and then expands the detection range of each cluster for searching. After that, the active IP addresses obtained from the detection are reclustered. When the detection range reaches a certain threshold, that is, when it exceeds the range that the user can scan, the search stops. The 6Gen algorithm can be divided into three parts:
(1) Definition of Address Distance. 6Gen uses the Hamming distance to represent the distance between two IPv6 addresses. At the bit level, the calculation involves performing an XOR operation on the two IPv6 addresses and counting the number of 1s in the result. This value is the Hamming distance between the two addresses.
In practice, the algorithm uses characters with a length of 4 bits (i.e., half a byte) as the unit of comparison rather than single bits. There are two reasons for this. Firstly, IPv6 addresses may be allocated in half-byte units during address assignment. Secondly, if bits are used as the calculation unit, two IP addresses with low similarity may be mistakenly regarded as addresses with high similarity.
Therefore, when calculating the Hamming distance between IPv6 addresses, 6Gen compares characters in units of 4 bits, as specifically shown in Figure 8.
When calculating the Hamming distance, the pairs of IPv6 addresses with a distance of 1 are first analyzed. Then, the address range is expanded, and the different characters within it are replaced with wildcards, as shown in Figure 9.
(2) Address Clustering and Expansion. After the operation shown in Figure 9, a new address range 2::? can be obtained. Then, this address range is matched with other IPv6 addresses to obtain the number of existing addresses within this address range. Since the total number of addresses contained in this address range is 16, if there are other addresses in the original IPv6 address set that also belong to this address range and the number of such addresses is N, then the address distribution density of this address range is N/16. This distribution density serves as a representation of the maturity of the address range expansion.
After the address range expansion as shown in Figure 9, it continues to be compared with other IPv6 addresses, and the addresses are reaggregated with adjacent Hamming distances, as shown in Figure 10.
After the above operations, a new address range 2::?:? is generated. By using the aforementioned method and comparing this address range with the original IPv6 address set again, the address distribution density within the new address range can be obtained. Through continuously repeating the above operations and conducting iterative searches, it is possible to prioritize the scanning of the address space with the highest probability of potential IPv6 addresses, thus greatly improving the detection efficiency.
(3) Address Detection and Iteration. Guided by the obtained address distribution density, the highest-density address range is prioritized for detection. The target address set comprises addresses outside the original IPv6 address set but within this high-density range. This approach leverages IPv6’s block-based allocation pattern: calculating address distribution density via Hamming distance within address ranges reveals allocation rules, thereby enhancing detection hit rates.
Upon completing target address set detection, newly identified IPv6 addresses are merged into the original set. Address distribution densities across all ranges are then recalculated, and the process repeats, selecting the updated highest-density range for detection until the entire scan is complete.
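The distance, range expansion, and density calculations described in parts (1) and (2) can be sketched as follows. This is a simplified illustration, not the 6Gen implementation: the function names are hypothetical, and a range is represented as a 32-character string in which `?` marks a wildcard half byte.

```python
import ipaddress

def nybbles(addr):
    """Write an IPv6 address as 32 hexadecimal characters without colons."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")

def nybble_distance(a, b):
    """Hamming distance counted in 4-bit characters rather than bits."""
    return sum(x != y for x, y in zip(nybbles(a), nybbles(b)))

def widen(rng, addr):
    """Expand a range so that it also covers addr, adding wildcards as needed."""
    return "".join(r if r == n else "?" for r, n in zip(rng, nybbles(addr)))

def in_range(rng, addr):
    """Check whether addr falls inside the wildcard range."""
    return all(r == "?" or r == n for r, n in zip(rng, nybbles(addr)))

def density(rng, seeds):
    """Seeds inside the range divided by the number of addresses it spans."""
    size = 16 ** rng.count("?")
    return sum(in_range(rng, s) for s in seeds) / size

seeds = ["2::1", "2::3", "2::1:5"]
range_1 = widen(nybbles("2::1"), "2::3")   # corresponds to 2::? in the text
range_2 = widen(range_1, "2::1:5")         # corresponds to 2::?:? in the text
```

With these three seeds, `range_1` spans 16 addresses and contains two seeds, while `range_2` spans 256 addresses and contains three, so the narrower range has the higher density and would be probed first.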
Experiments have shown that the efficiency of the 6Gen algorithm is 1 to 8 times that of the Entropy/IP algorithm. Inspired by the 6Gen algorithm, Zheng G et al. [50] proposed the IDEC algorithm. In this algorithm, the learning of IP address seeds is measured by information entropy, and the clustering adopts the Ordering Points to Identify the Clustering Structure (OPTICS) in the density clustering algorithm. Experiments have demonstrated that this algorithm performs better than the 6Gen and 6Tree algorithms.
  4. 6Tree Algorithm
In reference [51], Liu et al. proposed the 6Tree algorithm, which analyzes the known active IPv6 addresses as seeds to discover their distribution characteristics and provides recommended search directions for IPv6 detection. The 6Tree algorithm employs the divisive (top-down) hierarchical clustering (DHC) algorithm, treating the obtained active IPv6 addresses as a set of high-dimensional vectors and generating a spatial tree data structure based on their similarity. This structure describes the changes in values in different dimensions. In addition, the 6Tree algorithm can dynamically adjust the search direction according to the real-time scanning results, and for the first time embeds alias detection into the search.
Liu et al. defined the concept of the IPv6 address space tree. With 2^m as the base (m denotes the dimension of the tree), the space tree is a 2^m-ary tree. Each node in the tree has an address vector sequence σ, and its child nodes form a partition of the sequence σ [51]. The distribution of active IPv6 address vectors is uneven: some dimensions have more variable values and higher information entropy than others. The IPv6 address detector preferentially detects end nodes with significant information changes. Newly discovered active address vectors vary greatly across different nodes but share the same values in parent nodes. The overall structure of the 6Tree is shown in Figure 11.
The 6Tree executes the DHC algorithm and generates a spatial tree based on the seeds. Then, the IPv6 address detector conducts an Internet-wide scan according to the search direction provided by the spatial tree and dynamically adjusts the direction according to the real-time feedback. If the IPv6 address detector identifies an area with active addresses, it will interrupt and start detecting the alias prefixes. The dynamic scanning will continue until the number of scanned addresses reaches a predetermined value. This algorithm mainly consists of three modules.
(1) Generation of the Spatial Tree. The spatial tree generation algorithm proposed by Liu et al. converts the seeds into address vectors based on the input radix β (for example, β = 16 for hexadecimal) and sorts them into a vector sequence according to the corresponding binary integers [51]. In the sequence, the corresponding addresses of the closer vectors are likely to belong to a longer prefix.
When DHC starts, the root node's sequence contains the vectors of all seeds. Clustering is then performed to divide the sequence into subsequences, one per child node. This process continues until the number of vectors in each leaf node does not exceed β.
During the clustering process, the DHC divides the sequence based on the leftmost variable dimension δ* in the sequence. Then, vectors in each subsequence share the same value in dimension δ*, and each subsequence is assigned to a specific child node.
(2) Dynamic Detection. Firstly, the 6Tree prepares the data structure for dynamic scanning and generates the initial target addresses on the leaf nodes. Then, the 6Tree conducts a predetection of all the initial targets. It sorts the leaf nodes into a sequence according to the ratio of the number of active addresses to the number of detected addresses.
After the predetection, it starts the iterative detection, which continues until the budget is exhausted or the entire address space is scanned. In each iteration, it takes the nodes with the highest ratio of active addresses to detected addresses (AAD) from the node sequence and scans more target addresses under them. Finally, it reorders the sequence according to the real-time results.
(3) Alias Detection. The next target region (TS) to be detected expands exponentially, one dimension at a time. This bottom-up process facilitates the timely perception of aliases and the detection of their scale, so as to avoid unnecessary impacts on the corresponding devices. If the TS is already large and the AAD remains high, the node is considered abnormal, and alias detection is triggered at this point.
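The spatial tree generation in module (1) can be sketched as below. This is an illustrative simplification rather than the 6Tree code: the function and key names are hypothetical, nodes are plain dictionaries, and the leaf-size limit is exposed as a `max_leaf` parameter (the text above uses β) so that a five-seed toy example still produces a split.

```python
import ipaddress

def to_vector(addr):
    """Expand an IPv6 address into a tuple of 32 nybble values (radix 16)."""
    text = ipaddress.IPv6Address(addr).exploded.replace(":", "")
    return tuple(int(ch, 16) for ch in text)

def dhc(vectors, max_leaf):
    """Split on the leftmost variable dimension until leaves are small enough."""
    vectors = sorted(vectors)
    node = {"vectors": vectors, "split_dim": None, "children": []}
    if len(vectors) <= max_leaf:
        return node
    dims = range(len(vectors[0]))
    split_dim = next((d for d in dims if len({v[d] for v in vectors}) > 1), None)
    if split_dim is None:  # only duplicate vectors remain; cannot split further
        return node
    groups = {}
    for v in vectors:
        groups.setdefault(v[split_dim], []).append(v)
    node["split_dim"] = split_dim
    node["children"] = [dhc(groups[key], max_leaf) for key in sorted(groups)]
    return node

seeds = ["2001:db8::1", "2001:db8::2",
         "2001:db8:1::1", "2001:db8:1::2", "2001:db8:1::3"]
tree = dhc([to_vector(s) for s in seeds], max_leaf=2)
```

In this example the root splits on dimension 11 (the last half byte of the third group), separating the `2001:db8::/48` seeds from the `2001:db8:1::/48` seeds; the larger group is then split again on the final half byte.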
Song G. et al. [24] integrated the Entropy/IP algorithm and the 6Tree algorithm and proposed the DET algorithm. Similar to the 6Tree algorithm, the DET algorithm also uses a clustering tree construction method. The difference is that, when a node generates child nodes, DET selects the splitting dimension by minimum information entropy. 6Gen is a bottom-up clustering algorithm, while 6Tree and DET are top-down clustering algorithms. Although they differ in how node distances are measured, all three algorithms share the characteristic that they cannot guarantee a global optimum.
  5. Anomaly Seed Detection Algorithms
Most mainstream IPv6 address generation technologies only partition the seed set spatially, without considering whether the resulting address regions contain abnormal seeds or patterns. Abnormal IPv6 seeds enlarge the address region, which prolongs scanning and detection time and reduces the effectiveness of the algorithm.
Therefore, references [52,53] proposed two new anomaly detection algorithms: one for IPv6 anomaly seed detection based on graph clustering and the other based on ensemble learning.
The IPv6 abnormal seed detection technology based on graph clustering adopts an unsupervised anomaly detection algorithm built on minimum spanning tree clustering. It can accurately eliminate abnormal addresses in a region. To further reduce the time complexity and improve the detection efficiency, an ensemble learning-based IPv6 abnormal seed detection technology was proposed, which introduces an improved isolation forest algorithm and uses ensemble learning to exclude abnormal addresses. Compared with the graph clustering method, this technology has linear time complexity and is more efficient in detecting abnormal seeds. Such methods have opened up new ideas for the generation of IPv6 detection addresses.
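The general idea of minimum spanning tree clustering for outlier removal can be sketched as follows. This is a generic illustration under stated assumptions, not the method of [52,53]: seeds are given as equal-length hexadecimal strings, distance is the character-level Hamming distance, and the `cut` and `min_cluster` parameters are hypothetical tuning values.

```python
from collections import Counter

def hamming(a, b):
    """Number of character positions in which two strings differ."""
    return sum(x != y for x, y in zip(a, b))

def mst_edges(points):
    """Prim's algorithm: return the MST as (distance, i, j) edges."""
    best = {j: (hamming(points[0], points[j]), 0) for j in range(1, len(points))}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        dist, i = best.pop(j)
        edges.append((dist, i, j))
        for k in best:  # relax remaining nodes against the newly added node
            d = hamming(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

def flag_outliers(points, cut, min_cluster):
    """Drop MST edges longer than cut; seeds in tiny components are outliers."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for dist, i, j in mst_edges(points):
        if dist <= cut:
            parent[find(i)] = find(j)
    sizes = Counter(find(i) for i in range(len(points)))
    return [p for i, p in enumerate(points) if sizes[find(i)] < min_cluster]

points = ["20010001", "20010002", "20010003", "20010004", "fe80abcd"]
outliers = flag_outliers(points, cut=2, min_cluster=2)
```

Removing the flagged seed before range expansion keeps the generated region tight, which is the motivation given above for anomaly detection.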

4.2.2. Neural Networks-Based Generation Algorithms

  1. Deep Neural Network-Based Generation Algorithms
Reference [54] first introduced deep neural networks into the generation of IPv6 detection addresses and proposed the 6GCVAE algorithm, as shown in Figure 12. This algorithm first converts the 128 bits of an IPv6 address into a 32-dimensional vector, one dimension per half byte. Then, it learns the seed features through a Gated Convolutional Neural Network (Gated CNN) and uses these features as the input of a Variational Autoencoder to reconstruct the 32-dimensional address vector. Finally, it converts the vector into the target IPv6 address range.
After that, the same authors introduced word embedding and a language model into IPv6 detection address generation to improve on 6GCVAE, and proposed the 6VecLM algorithm [55]. Unlike 6GCVAE, 6VecLM does not feed the 32-dimensional vector obtained from the 128-bit IPv6 address directly into neural network training. Instead, it adopts an address-to-sentence conversion technique named IPv62Vec. The specific process is as follows: an IPv6 address is first converted into 32 half-byte values, and then each position index is combined with its half-byte value to form a “word”, so that the IPv6 address is represented as a sentence.
These “sentences” will be used as the input of the Skip-gram algorithm to train the neural network. As a natural language vectorization technology, the Skip-gram algorithm can effectively overcome the problem of the loss of sentence word order information caused by disassembling natural language into words. The Transformer network can generate new sentences by learning from previous sentences, thus generating a new IPv6 detection space. The 6VecLM transplants the Transformer into the generation of IPv6 target addresses, transforming the problem of generating IPv6 target addresses into a text generation problem, which is a method based on the semantic information of seeds. The algorithms are tested on a daily updated public dataset IPv6 Hitlist and a measurement dataset CERN IPv6 2018. 6VecLM obtains better experimental results than the target generation algorithms entropy/IP and 6Gen.
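The address-to-sentence conversion and the construction of Skip-gram training pairs can be sketched as below. The word format (`position:value`), the function names, and the window size are assumptions for illustration; the exact encoding used by IPv62Vec in [55] may differ, and the neural network training itself is omitted.

```python
import ipaddress

def address_to_sentence(addr):
    """Turn an IPv6 address into 32 words, each joining a position and a nybble."""
    text = ipaddress.IPv6Address(addr).exploded.replace(":", "")
    return [f"{pos}:{val}" for pos, val in enumerate(text)]

def skipgram_pairs(sentence, window):
    """List (center, context) pairs for every word within the given window."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        pairs.extend((center, sentence[j]) for j in range(lo, hi) if j != i)
    return pairs

sentence = address_to_sentence("2001:db8::1")
pairs = skipgram_pairs(sentence, window=2)
```

Because each word carries its position, the same half-byte value at two different positions yields two different words, which is how the sentence retains the address structure.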
Overall, due to the relatively high cost of deep learning, the methods of generating IPv6 detection addresses based on deep neural networks take a long time to generate the target detection addresses. They are therefore not suitable for scenarios that require real-time and high-speed detection.
  2. Generative Adversarial Network-Based Generation Algorithm
The Generative Adversarial Network (GAN), first proposed in 2014, has been widely applied in fields such as image generation, video generation, style transfer, and text generation. The 6GAN algorithm proposed in reference [56] introduces generative adversarial networks into the field of IPv6 detection target address generation, as shown in Figure 13. Firstly, it uses a manual or automatic classifier to classify the seeds. Then, it employs a Long Short-Term Memory network (LSTM) as the generator and a Convolutional Neural Network (CNN) as the discriminator, continuously strengthening the generation of IPv6 targets that follow a fixed addressing pattern. The 6GAN algorithm uses three methods to classify the seed set: one based on the definitions in RFCs, another on custom information entropy, and the third on the IPv62Vec technology. Experiments on the three datasets show that the classifier based on the RFC definitions performs best at dividing the seed set and generates the largest number of candidate addresses. Another experiment on 50k active addresses and 50k source addresses indicates that 6GAN outperformed the other target generation algorithms, such as Entropy/IP, 6Gen, 6Tree, 6GCVAE and 6VecLM.
Similar to other neural network-based methods, this method has a relatively high computational complexity and low generation efficiency. Therefore, this technology is not suitable for the detection of large-scale IPv6 address spaces.
  3. Hybrid Neural Network-Based Generation Algorithm
Reference [56] divides address detection into two scenarios, a scenario without seed addresses and a scenario with seed addresses, and designs efficient detection algorithms for each scenario. For the seedless scenario, the 6EDL-N algorithm employs a neural network to model implicit relationships between Border Gateway Protocol (BGP) prefix information and address configuration patterns. This enables address migration from seeded to unseeded regions, expanding detection boundaries.
In the scenario with seed addresses, an active address detection method, 6EDL-S, based on the Generative Adversarial Network (GAN) is proposed. By analyzing the distribution rules of seed addresses and adopting an environmental feedback mechanism to alleviate the sampling bias of seed addresses, the hit rate is effectively improved.
Experiments show that both algorithms can improve the accuracy of detection. They improve on 6Gen and 6Tree, particularly by providing a detection algorithm for the seedless scenario and by applying transfer learning across the IPv6 address space.

5. Comprehensive Comparison and Future Direction

5.1. Comprehensive Comparison and Analysis of Algorithms

To gain a more comprehensive understanding of all IPv6 address space detection algorithms, the following compares in detail the above-mentioned algorithms from various dimensions, such as algorithm complexity, proposed time, principle, seed dependency, efficiency, and integrity, as shown in Table 1.
Complexity: The computational complexity of the algorithm.
Proposed Time: The time when the algorithm or system was publicly released.
Principle: The core idea of the algorithm, the main algorithms adopted, and the core steps.
Seed dependency: Whether the algorithm requires an initial set of IPv6 addresses as input for training.
Efficiency: The ratio of the target IP addresses for detection generated by the algorithm to the discovered valid IPv6 addresses. If the two are close, it indicates high algorithm efficiency; if there are few active IPs in the detection space, it indicates low efficiency.
Integrity: Whether the algorithm can infer all IPv6 addresses in the target space or the situation of the detection space covered by the detection addresses generated by the algorithm. If the generated target address space is comparable to the original address set, it is considered poor; if it is larger than the initial target address space, it is considered good; if it covers the entire active space, it is considered excellent.
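The efficiency dimension is commonly reported as a hit rate, that is, the fraction of generated targets that turn out to be active. A minimal sketch of this metric, together with the count of active addresses found beyond the seed set, is given below; the function names and the example address strings are hypothetical.

```python
def hit_rate(generated, active):
    """Fraction of generated target addresses that responded as active."""
    generated = set(generated)
    if not generated:
        return 0.0
    return len(generated & set(active)) / len(generated)

def newly_discovered(generated, active, seeds):
    """Active generated addresses that were not already in the seed set."""
    return (set(generated) & set(active)) - set(seeds)

generated = {"2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"}
active = {"2001:db8::1", "2001:db8::2", "2001:db8::ff"}
seeds = {"2001:db8::1"}
rate = hit_rate(generated, active)
new = newly_discovered(generated, active, seeds)
```

A hit rate close to 1 corresponds to the case described above in which the generated targets and the discovered active addresses are close in number.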
To further illustrate how the algorithms evolved and to clarify the research ideas behind them, this article summarizes their evolution in Figure 14. In this figure, the DNS-based detection address generation algorithms and the pattern generation-based algorithms are aligned on parallel X-axes. It can be seen that there has been no new progress in the former type of algorithms since 2018, while new research achievements keep emerging in the latter type. In terms of research ideas, most of the latter characterize IP address allocation patterns using information entropy and clustering techniques, so the research ideas tend to converge.

5.2. Future Research Directions

  1. Collection of Active IPv6 Addresses and Allocation Patterns
The base of active IPv6 addresses serves as the foundation for rapid detection. On the one hand, it can be used as the starting set for traceroute-like detection tools. On the other hand, it is also the basis for mining IP address allocation patterns. Since the collection efficiency of active measurement is relatively low, using passive traffic collection is a more efficient and feasible approach. However, Internet Service Providers (ISPs) and network management departments often do not disclose the IP addresses they use, in order to ensure network security.
Plonka and Berger have published their work on classifying IPv6 addresses based on Akamai server logs [17]. Over the course of a year, they recorded customers from 133 countries and more than 4000 Autonomous Systems (ASes), and classified them according to temporal stability and spatial density. They found that only 4% of the addresses remained stable for 4 days or longer. They also observed approximately 1% of EUI-64 addresses. Among the unstable EUI-64 addresses, about 62% roamed between /64 prefixes.
Therefore, it is crucial to conduct research on IPv6 address allocation patterns, encoding rules, and usage methods, which will provide basic data support for optimizing detection algorithms.
  2. Discovery and Detection of IPv6 Address Patterns Based on Machine Learning
Machine learning, especially deep learning, has achieved great success in many fields, including image recognition, speech processing, and natural language understanding. The reason behind this is that neural networks can better discover and fit the underlying models hidden in the data.
For the detection of the IPv6 address space, current methods such as the entropy model, semantic features, and statistical analysis are used for feature discovery. This involves a large amount of manual work, including model selection and threshold setting, and it is difficult to adjust dynamically with changes in the environment. In contrast, deep neural network methods do not require complex manual intervention and can adapt to various types of input data. They have the potential to better discover the usage patterns, allocation patterns, and detection patterns of IPv6 data sets.
Therefore, in the detection of IPv6 address space, selecting appropriate network architecture, clarifying the preprocessing method of the IPv6 address set, and using the discovered address set rules to guide the detection work are issues that need to be resolved.
  3. Improving the Detection Response Rate
Gasser et al. [59] collected active IPv6 addresses by deploying active and passive probes. The scanning results showed that 76% of the addresses obtained from active probes responded to ICMPv6, while only 13% of the addresses obtained from passive probes responded. This indicates that many routers, hosts, and servers do not respond to Ping6 type measurements. Therefore, the detection methods can be improved. For example, for devices providing HTTP or HTTPS services, corresponding protocol-based detection tools can be used, or detection can be carried out by checking the most commonly used ports, so as to increase the detection response rate.
  4. Moral Issues Caused by Network Detection
Current research aims to improve the efficiency and accuracy of detection algorithms, without considering the network overhead that detection causes or the intrusion it imposes on ordinary users. This runs against the original intention of giving IPv6 a very large address space in order to make network scanning impractical, which is also what makes active IPv6 address detection a hard problem in the first place. Thus, how to detect target address spaces rather than discover specific sets of addresses, while protecting user privacy, will become a new research direction. It may also bring new opportunities for improving the discovery of active IPv6 addresses.

6. Conclusions

With the deepening of the Internet of Things era, IPv6 addresses are more widely used in the Internet of Vehicles, industrial control networks, and other applications. IPv6 will gradually replace IPv4 as the main body of the Internet.
In order to provide researchers with a comprehensive and in-depth understanding of current IPv6 address space detection technologies, we systematically summarized the evolution of IPv6 detection algorithms, dividing them into two stages: optimization scanning-based detection algorithms and address generation-based detection algorithms. In addition, we pointed out that current research focuses on the address generation-based detection algorithms, which guess the active IPv6 address by studying the allocation pattern of IPv6. However, current algorithms are attempting to improve the efficiency and accuracy of detection algorithms, without considering the network intrusion and security threats that detection may cause, leading to ethical or abusive scanning issues. Therefore, it is proposed that, in future research, the discovery of active IPs can be transformed into the discovery of active network segments.

Author Contributions

Conceptualization, Y.M. and L.C.; methodology, Y.M.; software, Y.M. and Z.W.; formal analysis, Y.M. and L.C.; investigation, Y.M. and L.C.; resources, Y.M. and L.C.; data curation, L.C.; writing—original draft preparation, Y.M. and Z.W.; writing—review and editing, Y.M. and Z.W.; visualization, Y.M. and Z.W.; supervision, L.C.; project administration, L.C. and Z.W.; funding acquisition, Y.M., L.C. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Joint Fund Key Program of National Natural Science Foundation of China under grant No. U22B2026, the Special Fund for Key Program of Science and Technology of Jiangsu Province under grant No. BG2024042, the National Key Research and Development Program of China under grant No. 2022YFB3104002, and the National Key Research and Development Program of Jiangsu Province under grant No. BE2022081.

Conflicts of Interest

Author Zhanfeng Wang was employed by the Nanjing Lexbell Information Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ISP: Internet Service Provider
DNS: Domain Name System
ICMP: Internet Control Message Protocol
LSTM: Long Short-Term Memory network
CNN: Convolutional Neural Network
BGP: Border Gateway Protocol
GAN: Generative Adversarial Network
AS: Autonomous System

References

  1. Statista. Available online: https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/ (accessed on 26 May 2025).
  2. Ngo, T.; Yin, J.; Ge, Y.-F.; Wang, H. Optimizing IoT Intrusion Detection—A Graph Neural Network Approach with Attribute-Based Graph Construction. Information 2025, 16, 499.
  3. Alshehri, S.M.; Sharaf, S.A.; Molla, R.A. Systematic Review of Graph Neural Network for Malicious Attack Detection. Information 2025, 16, 470.
  4. Jain, V.K.; Aggrawal, J.; Dangi, R.; Prasad Shukla, S.S.; Yadav, A.K.; Choudhary, G. Unmasking the True Identity: Unveiling the Secrets of Virtual Private Networks and Proxies. Information 2025, 16, 126.
  5. Li, K.-H.; Wong, K.-Y. Empirical Analysis of IPv4 and IPv6 Networks through Dual-Stack Sites. Information 2021, 12, 246.
  6. The RIPE NCC Has Run Out of IPv4 Addresses. Available online: https://www.ripe.net/publications/news/about-ripe-ncc-and-ripe/the-ripe-ncc-has-run-out-of-ipv4-addresses (accessed on 21 July 2025).
  7. Chinese Government Website. Available online: http://www.gov.cn/zhengce/2017-11/26/content_5242389.htm (accessed on 21 July 2025).
  8. 2018 ISOC Report on IPv6 Deployment. Available online: https://www.internetsociety.org/wp-content/uploads/2018/06/2018-ISOC-Report-IPv6-Deployment.pdf (accessed on 21 July 2025).
  9. The 55th Statistical Report on China’s Internet Development. Available online: https://cnnic.cn/n4/2025/0117/c208-11228.html (accessed on 21 July 2025).
  10. Ubiedo, L.; O’Hara, T.; Erquiaga, M.J.; Garcia, S. Current state of IPv6 security in IoT. arXiv 2021, arXiv:2105.02710.
  11. Hinden, R.; Deering, S. IP Version 6 Addressing Architecture, RFC1884, December 1995. Available online: https://www.rfc-editor.org/rfc/rfc1884.html (accessed on 21 July 2025).
  12. Hinden, R.; Deering, S. IP Version 6 Addressing Architecture, RFC2373, July, 1998. Available online: https://www.rfc-editor.org/rfc/rfc2373.html (accessed on 21 July 2025).
  13. Hinden, R.; Deering, S. IP Version 6 Addressing Architecture, RFC3513, April 2003. Available online: https://www.scirp.org/reference/referencespapers?referenceid=246286 (accessed on 21 July 2025).
  14. Hinden, R.; Deering, S. IP Version 6 Addressing Architecture, RFC4291, February 2006. Available online: https://www.rfc-editor.org/rfc/rfc4291.html (accessed on 21 July 2025).
  15. Ehsan ul Haq, M.; Raza Perwaz, M.; Ahmed, K. Compact and user-friendly representation of IPv6 addressing approach and masking. In Proceedings of the 2009 International Conference for Internet Technology and Secured Transactions (ICITST), London, UK, 9–12 November 2009.
  16. Foremski, P.; Plonka, D.; Berger, A. Entropy/IP: Uncovering Structure in IPv6 Addresses. In Proceedings of the Internet Measurement Conference, ACM, Santa Monica, CA, USA, 27–29 October 2016.
  17. Plonka, D.; Berger, A. Temporal and Spatial Classification of Active IPv6 Addresses. In Proceedings of the Internet Measurement Conference, ACM, Tokyo, Japan, 28–30 October 2015.
  18. Gont, F.; Chown, T. Network Reconnaissance in IPv6 Networks; Technical Report; Internet Engineering Task Force (IETF): Montreal, QC, Canada, 2016.
  19. Malone, D. Observations of IPv6 addresses. In Proceedings of the International Conference on Passive and Active Network Measurement, Cleveland, OH, USA, 29–30 March 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 21–30.
  20. Fan, X.; Heidemann, J. Selecting representative IP addresses for Internet topology studies. In Proceedings of the Internet Measurement Conference, ACM, Melbourne, VIC, Australia, 1–3 November 2010; pp. 411–423.
  21. Bou-Harb, E.; Debbabi, M.; Assi, C. Cyber scanning: A comprehensive survey. IEEE Commun. Surv. Tut 2013, 16, 1496–1519.
  22. Durumeric, Z.; Wustrow, E.; Halderman, J.A. ZMap: Fast Internet-wide Scanning and Its Security Applications. In Proceedings of the 22nd USENIX Conference on Security, Washington, DC, USA, 14–16 August 2013.
  23. Masscan. Available online: https://github.com/robertdavidgraham/masscan/ (accessed on 21 July 2025).
  24. Song, G.; He, L.; Wang, Z.; Yang, J.; Jin, T.; Liu, J.; Li, G. Towards the Construction of Global IPv6 Hitlist and Efficient Probing of IPv6 Address Space. In Proceedings of the 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS), Hangzhou, China, 15–17 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  25. Cheng, X.; Jing, S.; Jiao, L.; Zhao, C.; Yang, H. A survey of IPv6 address scanning techniques. In Proceedings of the Second International Conference on Frontiers of Applied Optics and Computer Engineering (AOCE 2025), Guangzhou, China, 20–22 March 2025; SPIE: Bellingham, WA, USA, 2025; Volume 13564, pp. 143–152.
  26. Colitti, L.; Di Battista, G.; Patrignani, M. IPv6-in-IPv4 tunnel discovery: Methods and experimental results. IEEE Trans. Netw. Serv. Manag. 2004, 1, 30–38.
  27. Ahmed, A.S.A.M.S.; Hassan, R.; Othman, N.E. IPv6 neighbor discovery protocol specifications, threats and countermeasures: A survey. IEEE Access 2017, 5, 18187–18210.
  28. Chown, T. IPv6 Implications for Network Scanning; Technical Report; Internet Engineering Task Force (IETF): Montreal, QC, Canada, 2008.
  29. Li, J.; Su, F.; Lin, Z.; Yan, M. The research and analysis of worm scanning strategies in IPv6 network. In Proceedings of the 2011 13th Asia-Pacific Network Operations and Management Symposium, Jeju, Korea, 21–23 March 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–4.
  30. Su, F.; Xu, W.J.; Lin, Z.W.; Ma, Y. Internet Worm Modeling and Analysis in IPv6 Networks. Adv. Mater. Res. 2011, 268, 1514–1519.
  31. Zirngibl, J.; Steger, L.; Sattler, P.; Gasser, O.; Carle, G. Rusty Clusters? Dusting an IPv6 Research Foundation. In Proceedings of the 2022 Internet Measurement Conference, Nice, France, 25–27 October 2022.
  32. Rapid7 IPv6 Discovery Press Release. Available online: https://www.rapid7.com/about/press-releases/ (accessed on 21 July 2025).
  33. Clark, D.; Testart, C.; Luckie, M.; Claffy, K. A path forward: Improving Internet routing security by enabling zones of trust. J. Cybersecur. 2024, 10, tyae023.
  34. Alfroy, T.; Holterbach, T.; Krenc, T.; Claffy, K.C.; Pelsser, C. Next Generation of BGP Data Collection Platforms. In Proceedings of the ACM SIGCOMM Conference 2024, Sydney, NSW, Australia, 4–8 August 2024.
  35. Livadariu, I.; Elmokashfi, A.; Dhamdhere, A. An agent-based model of IPv6 adoption. In Proceedings of the 2020 IFIP Networking Conference (Networking), Paris, France, 22–26 June 2020.
  36. Alexa Top 1,000,000 Sites. Available online: http://www.alexa.com/topsites (accessed on 21 July 2025).
  37. Strowes, S.D. Bootstrapping Active IPv6 Measurement with IPv4 and Public DNS. arXiv 2017, arXiv:1701.00891.
  38. Fiebig, T.; Borgolte, K.; Hao, S.; Kruegel, C.; Vigna, G. Something from Nothing (There): Collecting Global IPv6 Datasets from DNS. In Passive and Active Measurement (PAM); Springer: Cham, Switzerland, 2017; pp. 30–43.
  39. Borgolte, K.; Hao, S.; Fiebig, T.; Vigna, G. Enumerating Active IPv6 Hosts for Large-Scale Security Scans via DNSSEC-Signed Reverse Zones. In Proceedings of the 2018 IEEE Symposium on Security & Privacy, San Francisco, CA, USA, 21–23 May 2018; IEEE Computer Society: Los Alamitos, CA, USA, 2018; pp. 770–784.
  40. Gasser, O.; Hof, B.; Helm, M.; Korczynski, M.; Holz, R.; Carle, G. In Log We Trust: Revealing Poor Security Practices with Certificate Transparency Logs and Internet Measurements. In Passive and Active Measurement Conference Proceedings; Springer: Cham, Switzerland, 2018; pp. 173–185.
  41. Fukuda, K.; Heidemann, J. Who Knocks at the IPv6 Door?: Detecting IPv6 Scanning. In Proceedings of the Internet Measurement Conference 2018, Boston, MA, USA, 31 October–2 November 2018; ACM: New York, NY, USA, 2018; pp. 1–7.
  42. Nikkhah, M.; Guérin, R.; Lee, Y.; Woundy, R. Assessing IPv6 Through Web Access: A Measurement Study and Its Findings. In Proceedings of the Seventh Conference on Emerging Networking Experiments and Technologies, Tokyo, Japan, 6–9 December 2011.
  43. Hu, Q.; Brownlee, N. IPv6 Host Address Usage Survey. Int. J. Future Comput. Commun. 2014, 3, 341–345.
  44. University of Oregon RouteViews Project. Available online: https://www.routeviews.org/routeviews/ (accessed on 20 July 2025).
  45. RIPE NCC. RIPE Routing Information Service (RIS). Available online: https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ (accessed on 21 July 2025).
  46. Fiebig, T.; Borgolte, K.; Hao, S.; Kruegel, C.; Vigna, G.; Feldmann, A. In rDNS We Trust: Revisiting a Common Data-Source’s Reliability. In International Conference on Passive & Active Network Measurement Proceedings; Springer: Cham, Switzerland, 2018; pp. 1–15.
  47. Ullrich, J.; Kieseberg, P.; Krombholz, K.; Weippl, E. On Reconnaissance with IPv6: A Pattern-Based Scanning Approach. In Proceedings of the 2015 10th International Conference on Availability, Reliability and Security (ARES), Toulouse, France, 24–27 August 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–8.
  48. Song, G.; Yang, J.; Wang, Z.; He, L.; Lin, J.; Pan, L.; Duan, C.; Quan, X. DET: Enabling efficient probing of IPv6 active addresses. IEEE/ACM Trans. Netw. 2022, 30, 1629–1643.
  49. Murdock, A.; Li, F.; Bramsen, P.; Durumeric, Z.; Paxson, V. Target generation for internet-wide IPv6 scanning. In Proceedings of the Internet Measurement Conference, London, UK, 1–3 November 2017; ACM: New York, NY, USA, 2017; pp. 242–253.
  50. Zheng, G.; Xu, X.; Wang, C. An Effective Target Address Generation Method for IPv6 Address Scan. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  51. Liu, Z.; Xiong, Y.; Liu, X.; Xie, W.; Zhu, P. 6Tree: Efficient dynamic discovery of active addresses in the IPv6 address space. Comput. Netw. 2019, 155, 31–46.
  52. Yang, T.; Hou, B.; Cai, Z.; Wu, K.; Zhou, T.; Wang, C. 6Graph: A graph-theoretic approach to address pattern mining for Internet-wide IPv6 scanning. Comput. Netw. 2022, 203, 108666.
  53. Yang, T.; Cai, Z.; Hou, B.; Zhou, T. 6Forest: An ensemble learning-based approach to target generation for internet-wide IPv6 scanning. In Proceedings of the IEEE INFOCOM 2022—IEEE Conference on Computer Communications, London, UK, 2–5 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1679–1688.
  54. Cui, T.; Gou, G.; Xiong, G. 6GCVAE: Gated Convolutional Variational Autoencoder for IPv6 Target Generation. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore, 11–14 May 2020; pp. 609–622.
  55. Cui, T.; Xiong, G.; Gou, G.; Shi, J.; Xia, W. 6VecLM: Language Modeling in Vector Space for IPv6 Target Generation. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Ghent, Belgium, 14–18 September 2020; pp. 192–207.
  56. Cui, T.; Gou, G.; Xiong, G.; Liu, C.; Fu, P.; Li, Z. 6GAN: IPv6 Multi-Pattern Target Generation via Generative Adversarial Nets with Reinforcement Learning. In Proceedings of the IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
  57. Gasser, O.; Scheitle, Q.; Foremski, P.; Lone, Q.; Korczyński, M.; Strowes, S.D.; Hendriks, L.; Carle, G. Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists. In Proceedings of the IMC ’18: Internet Measurement Conference, Boston, MA, USA, 31 October–2 November 2018; ACM: New York, NY, USA, 2018; pp. 364–378.
  58. Song, G.; He, L.; Zhu, F.; Lin, J.; Zhang, W.; Fan, L.; Li, C.; Wang, Z.; Yang, J. AddrMiner: A Fast, Efficient, and Comprehensive Global Active IPv6 Address Detection System. IEEE/ACM Trans. Netw. 2024, 32, 3870–3887.
  59. Gasser, O.; Scheitle, Q.; Gebhard, S.; Carle, G. IPv6 Hitlist Collection. Available online: http://www.net.in.tum.de/pub/ipv6-hitlist/ (accessed on 21 July 2025).
Figure 1. The process of IPv6 address space detection.
Figure 2. The relationships of IPv6 detection sets.
Figure 3. Activity diagram of the reverse search algorithm.
Figure 4. The procedure of the IPv4 Vs IPv6 algorithm.
Figure 5. The enumeration of PTR records.
Figure 6. The difference in query results between traditional DNS servers and DNSSEC servers.
Figure 7. Density-based efficient targeting algorithm.
Figure 8. The Hamming distance between IPv6 addresses.
Figure 9. Detecting address range expansion (1).
Figure 10. Detecting address range expansion (2).
Figure 11. The architecture of the 6Tree algorithm.
Figure 12. The architecture of the 6GCVAE algorithm.
Figure 13. The architecture of the 6Gan algorithm.
Figure 14. The evolution of IPv6 detection algorithms.
Table 1. Comparison of IPv6 detection algorithms.
| Algorithm | Complexity | Year | Key Techniques | Seed Dependent | Integrity |
|---|---|---|---|---|---|
| Alexa-based [42] | O(N) | 2011 | Alexa + rDNS | Yes | poor |
| RIR-based [43] | O(MN) | 2011 | RIR + rDNS | Yes | poor |
| IPv4 Vs IPv6 [37] | O(N) | 2017 | IPv4 + rDNS | Yes | poor |
| DNS-based [38] | O(N²) | 2017 | Levenshtein Distance + rDNS | Semi | excellent |
| DNS-based++ [39] | 0.5·O(N²) | 2018 | DNSSEC + rDNS | Semi | good |
| Pattern-based [47] | M·O(N²) | 2015 | Address pattern + recursive scanning | Yes | good |
| Pattern-based++ [18] | M·O(N²) | 2017 | Address pattern | Yes | good |
| Entropy/IP [16] | O(n log n) + O(N²k) | 2016 | Information entropy + Bayesian network | Yes | good |
| Balanced [57] | O(n log n) + O(n) | 2018 | Information entropy + k-means | Yes | good |
| IDEC [50] | O(n log n) + O(N²) | 2020 | Information entropy + OPTICS | Yes | good |
| 6Gen [49] | O(n log n) + O(N²) | 2017 | Hamming distance + density clustering | Yes | good |
| 6Tree [51] | O(n log n) + O(N³) | 2019 | Information entropy + Agglomerative Clustering | Yes | good |
| DET [48] | O(n log n) + O(N³) | 2020 | Information entropy + Divisive Hierarchical | Yes | good |
| 6GCVAE [54] | O(n·D²) | 2020 | Gated Convolutional Network | Yes | good |
| 6VecLM [55] | O(n·D²) | 2021 | Word Vector + Gated Convolutional Network | Yes | good |
| 6Gan [56] | O(2n·D²) | 2021 | Generative Adversarial Networks | Yes | good |
| AddrMiner [58] | O(2n·D²) | 2024 | BGP Allocation + Generative Adversarial Networks | No | excellent |

D denotes the eigenvector length.
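Two primitives recur in the Key Techniques column of Table 1: the Hamming distance between addresses and the information entropy of address positions. The following is a minimal sketch of both, assuming addresses are compared nybble by nybble (32 hexadecimal digits per address). It is not taken from any of the surveyed implementations. The function names `to_nybbles`, `hamming_distance`, and `nybble_entropy` are hypothetical, and the seed addresses come from the `2001:db8::/32` documentation prefix rather than from a real hitlist.

```python
import ipaddress
import math
from collections import Counter


def to_nybbles(addr: str) -> str:
    """Expand an IPv6 address into its 32 hexadecimal nybbles."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")


def hamming_distance(a: str, b: str) -> int:
    """Count the nybble positions at which two addresses differ."""
    return sum(x != y for x, y in zip(to_nybbles(a), to_nybbles(b)))


def nybble_entropy(seeds: list[str]) -> list[float]:
    """Shannon entropy (in bits) of each of the 32 nybble positions."""
    rows = [to_nybbles(s) for s in seeds]
    entropies = []
    for column in zip(*rows):  # one column per nybble position
        total = len(column)
        counts = Counter(column)
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


# Illustrative seed set (documentation prefix, not measured data).
seeds = ["2001:db8::1", "2001:db8::2", "2001:db8::a:1", "2001:db8::a:2"]

print(hamming_distance(seeds[0], seeds[3]))
entropy = nybble_entropy(seeds)
# Positions whose value varies across the seeds carry non-zero entropy.
print([i for i, h in enumerate(entropy) if h > 0])
```

Positions with zero entropy are constant across the seed set, so a generator can fix them and spend its probing budget on the positions that vary. Seeds separated by a small Hamming distance are natural candidates for the same cluster.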
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, Y.; Chen, L.; Wang, Z. A Survey of IPv6 Address Scanning Technologies. Information 2025, 16, 727. https://doi.org/10.3390/info16090727

