1. Introduction
In science, medicine and industry, voxelized data representations and the related voxel-based rendering have been used for decades, as have voxelized three-dimensional (3D) scenes in computer graphics and in augmented or virtual reality. In this context, Ref. [1] provides an excellent overview of compact data representation in GPU-based direct volume rendering (DVR). However, encoding the various aspects of voxelized 3D scenes (ranging from their geometry through colours to other material properties of voxels) as high-resolution grids requires massive amounts of memory and, consequently, memory bandwidth.
In its naive, uncompressed form, the geometry of a voxelized 3D scene, formed by a regular 3D voxel grid, may be represented by a regular 3D grid of 1-bit scalar values. Each voxel of the scene is thus assigned 1 bit of memory (1b/vox). A bit set to 0 represents a passive (empty) voxel; a bit set to 1 represents an active (occupied) voxel. This encoding may require significant amounts of data. For example, the binary representation of the geometry of a scene consisting of $4096^3$ (4K$^3$) voxels would require 8 GB of memory in its uncompressed form. At an extremely high resolution (such as 128K$^3$), this scene geometry representation would require 256 TB. This exceeds the capacity of off-the-shelf hardware resources used in 3D-scene processing and visualisation. Therefore, it is necessary to find a solution allowing a significantly more compact form of encoding.
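The memory figures quoted above follow directly from the 1b/vox encoding. A minimal sketch of the arithmetic is given below; the function name is only illustrative, and the sizes are expressed in binary units (GiB, TiB), which is an assumption about how the quoted GB and TB values were obtained.

```python
def uncompressed_grid_bytes(n: int) -> int:
    """Bytes needed for an n x n x n occupancy grid at 1 bit per voxel."""
    total_bits = n ** 3
    return (total_bits + 7) // 8  # round up to whole bytes


GIB = 2 ** 30
TIB = 2 ** 40

print(uncompressed_grid_bytes(4096) / GIB)        # 4K^3 grid, in GiB
print(uncompressed_grid_bytes(128 * 1024) / TIB)  # 128K^3 grid, in TiB
```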
A popular solution to this issue is the use of domain-specific hierarchical data structures (HDSs), both in the form of octant trees and directed acyclic graphs (DAGs). These HDSs decompose the 3D scene space, halving it in each of its three principal axes, thus creating eight subspaces called octants. Passive octants (homogeneously empty, i.e., containing only passive voxels) or active octants (homogeneously filled, i.e., containing only active voxels) can be encoded frugally, resulting in a more compact representation. Partially active octants—those containing both passive and active voxels—are then recursively decomposed into further octants.
Modern HDSs have focused on representing 3D scenes sparsely populated with active voxels (i.e., the proportion of passive voxels can reach even 99.999%), which is reflected in the use of the sparse attribute in their names. These also divide the available space into octants; however, only the passive ones—composed entirely of passive voxels—are represented in the frugal form. Active octants are those containing at least a single active voxel; these are then further recursively decomposed. Such HDSs include Sparse Voxel Octrees (SVOs)—whose main advantage is the aforementioned capability to provide a compact representation of passive octants—and other HDSs introducing the application of the Common Subtree Merge (CSM) technique—such as Sparse Voxel Directed Acyclic Graphs (SVDAGs)—and complementing the CSM technique with the use of reflective subtree transformation—as in the case of Symmetry-aware Sparse Voxel Directed Acyclic Graphs (SSVDAGs).
By implementing a binary representation of pointers to child nodes, these data structures enable fast traversal. However, pointers can represent a significant portion of the total size of the binary representation of an HDS. Therefore, researchers have been keen to optimize them by diversifying their lengths. This also allowed further optimization using Frequency-Based Compaction (FBC), in which pointers are assigned to child nodes so that the shortest pointers are assigned to the child nodes having the highest numbers of references and vice versa.
Data structures free of such pointers, such as Pointerless Sparse Voxel Octrees (PSVOs) or Pointerless Sparse Voxel Directed Acyclic Graphs (PSVDAGs), achieve high compression ratios by completely omitting the binary representation of pointers to child nodes. However, this degrades their traversability and makes them more suitable for streaming or archiving purposes. These HDSs are discussed in more detail in Section 2 and Section 3.
In this paper, we propose a novel hierarchical data structure, the Clustered Sparse Voxel Octree (CSVO), based on the structure of SVOs. Without losing fast traversability, CSVOs allow a significant number of pointers to child nodes to be omitted and the remaining ones to be shortened considerably by representing them using 8, 16 or 32 bits. These pointers do not represent addresses in the global address space of the data structure or offsets from the beginning of the representation of the respective node level in the tree, as is the case in other HDSs. Instead, the value of such a pointer represents the offset of the position of the child node measured from the end of the pointer array of its parent node. With this modification, the binary representation of CSVOs obtained in our tests was many times more compact than that of SVOs.
The contribution hereof lies in the following:
The design of a domain-specific hierarchical data structure, the CSVO, for a compact, lossless representation of the geometry of three-dimensional voxelized scenes sparsely populated with active voxels.
The design of a two-step out-of-core algorithm, aimed at constructing a CSVO from an ordered list of active scene voxels, represented by their Morton addresses.
The structure of this paper is as follows:
Section 2 discusses the related works in terms of linearization of multi-dimensional data using Space-Filling Curves (SFCs) and—notably—the representation of the geometry of 3D scenes using domain-specific HDSs. Due to the vast number of papers published in this field, this section of the paper focuses on the works more closely related to its contribution.
Section 3 introduces Sparse Voxel Octrees (SVOs), a domain-specific hierarchical data structure for representing the geometry of three-dimensional voxelized scenes, along with their pointerless version called Pointerless Sparse Voxel Octrees (PSVOs).
Section 4 represents the most important part of the contribution of our work. It presents CSVO, a domain-specific hierarchical data structure proposed herein, designed for representing the geometry of voxelized three-dimensional scenes.
Section 5 introduces a two-step out-of-core algorithm for the construction of the aforementioned CSVOs, proposed herein.
Section 6 presents the test results. The first part of the section presents three scenes at six different voxel resolutions, whose geometry was stored in multiple SVO versions and in the newly proposed CSVO. This is followed by the evaluation of the achieved results. The last part of the section discusses the compression sources allowing higher data compression rates within the CSVOs (in comparison with the SVOs).
Section 7 summarizes the conclusions drawn from the test results described in the preceding section hereof.
2. Related Works
To rearrange (linearize) the pixels of a regular 2D grid or the voxels of a regular 3D grid, one may use Space-Filling Curves (SFCs), introduced back in the 19th century [2]. In computer graphics, Morton Space-Filling Curves (MSFCs) [3] are frequently used to linearize multi-dimensional data. Hilbert Space-Filling Curves (HSFCs) [4] are also often used in computer science, as they preserve the locality of the data better. See Figure 1 for an illustration of MSFCs and HSFCs.
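As an illustration of Morton linearization, the sketch below interleaves the bits of three voxel coordinates into a single Morton address and reverses the operation. The function names are illustrative, and the axis order (x in the least significant bit of each 3-bit group) is an assumption; the convention used in the cited works may differ.

```python
def morton_encode_3d(x: int, y: int, z: int, bits: int) -> int:
    """Interleave the lowest `bits` bits of x, y and z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # assumed: x occupies the lowest bit
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode_3d(code: int, bits: int) -> tuple[int, int, int]:
    """Inverse of morton_encode_3d for the same axis order."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z
```

Sorting voxels by this code groups the eight voxels of each octant into a contiguous run, which is what makes a Morton-ordered voxel list a convenient input for octree construction.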
In their 2010 works [5,6], Laine and Karras presented Efficient Sparse Voxel Octrees (ESVOs), based on octant trees, along with an efficient ray casting algorithm using this HDS. The main advantage of this structure was the possibility of replacing entire subtrees if contour information could be used instead (or the situation could be interpreted as increasing the geometric resolution of the voxel using this contour information). At the binary level, contour information was represented using 32 bits: 24 were used to store the contour pointer and 8 to store the contour mask. This allowed for increasing the geometric resolution, compressing the binary representation of smooth surfaces and accelerating ray casting.
In their 2013 work [7,8], Baert and Dutré presented a two-step out-of-core algorithm transforming a mesh of triangles into an SVO. In the first step, an intermediate product (a list of active voxels) was created from the mesh of triangles. Each voxel was represented using its Morton coordinate, and the list items were sorted by this coordinate in ascending order. In the second step, an SVO was constructed from this intermediate product. The size of the input set of polygons, the intermediate product and even the resulting SVO could exceed the available amount of operating memory by far. In its first step, the aforementioned two-step algorithm produced an intermediate product whose representation could consume large amounts of data. In their 2015 work [9], Pätzold and Kolb presented an algorithm combining parallel voxelization on GPUs with an out-of-core approach that does not process an intermediate product (a voxel grid), but rather produces an SVO directly.
In 2013, Kämpe et al. introduced Sparse Voxel Directed Acyclic Graphs (SVDAGs) [10]. Compared to SVOs, this HDS allowed a significant increase in data compression, due to the possibility of using CSM. With the CSM technique, two or more identical subtrees of the HDS could be represented by fully expanding the binary representation of only a single copy of such a subtree, whose root node was then referenced multiple times, by multiple child node pointers from the respective parent node level. Thus, the other copies of the subtree could be omitted from the binary representation of this HDS. Using multiple references to nodes in the data structure also led to a change in terminology, as these data structures were no longer octant trees, but rather directed acyclic graphs (DAGs). All node parts of this data structure, the nodes themselves, and also the entire HDS were 32-bit aligned. Despite a more compact data representation, SVDAGs could be constructed so that the decompression overhead of their binary representation is identical to that of SVOs.
Besides the use of DAGs for the representation of the geometry of voxelized scenes, efforts are being made to add information about other attributes of voxels, both by integrating information into more complex SVDAGs and by creating separate data structures developed for this purpose. Williams presented the Moxel DAG HDS in [11], where an extended High Resolution SVDAG is used in connection with an external data structure called the Moxel Table, where the material information is stored. Dado et al. proposed in [12] the decoupling of geometry and voxel data, using a novel mapping scheme, to apply the DAG principle to encode the topology, while using a palette-based compression for the voxel attributes. Dolonius et al. presented in [13,14] a novel method for connecting each node in an SVDAG to its corresponding colors in a separate 1D array of colors using a small amount of additional information incorporated into the DAG. In connection with DAGs, attention is also paid to their use in the compact representation of voxelized shadows [15,16].
In 2016, Villanueva et al. introduced a hierarchical data structure, Symmetry-aware Sparse Voxel Directed Acyclic Graphs (SSVDAGs), in [17,18]. Like SVDAGs, this data structure also allowed using the CSM technique; however, it added the possibility of common subtree merging even if reflective transformations (i.e., mirroring) were required to make the subtrees identical. These transformations could be implemented independently in each of the principal axes of the represented 3D scene. To achieve this mirroring, an extra 3 bits had to be inserted into the child node pointers. The pointer could be either shorter (16b) or longer (32b). The shorter pointers were assigned to the most frequently referenced nodes, using frequency-based compaction (FBC). Due to the greater number of child node pointers of various lengths, 2-bit Header Tags (HTs) were used to form a 16b Child Node Mask (CHNM); both the HT and the CHNM were thus twice the size of those used in the aforementioned HDSs. In order to achieve a compact representation of the leaf node layer and to minimize the number of child node pointers, the leaf node layer of this HDS was made up of voxel grids having a size of $4^3$ voxels. Node components, nodes, grids and the whole data structure were aligned to 16 bits. While the mirroring itself did not increase the decompression overhead during rendering, compacting the child node pointers led to a 15% overhead.
In 2020, Pointerless Sparse Voxel Directed Acyclic Graphs (PSVDAGs) were introduced by Vokorokos et al. [19]. This HDS combined the advantages of PSVOs and SVDAGs. As in the case of PSVOs, this structure omitted child node pointers, and, similarly to SVDAGs, it allowed common subtree merging (CSM). To make this possible without implementing child node pointers, this HDS introduced the concept of labels and callers. Labels denoted subtrees serving as patterns referenceable by callers multiple times. In order to achieve a more compact representation of the data structure, both labels and callers had variable lengths, and FBC was applied so that the most frequently referenced subtrees were assigned the shortest labels and callers and vice versa. Due to the absence of pointers to child nodes, this data structure had the same drawback as PSVOs: a limited traversal rate. Therefore, in 2021, Madoš and Ádám presented an algorithm enabling fast transformation of these data structures into SVDAGs [20].
While the aforementioned works focused on lossless compression of scene geometries using hierarchical data structures, attention was also given to lossy compression. In 2020, van der Laan et al. introduced Lossy Sparse Voxel Directed Acyclic Graphs (LSVDAGs), based on SVDAGs [21]. Its construction process searched not only for absolutely identical subtrees, but also for more rarely occurring subtrees that required only minimal modification to become identical. This increased the number of subtrees to which the CSM technique could be applied. The achieved size reduction (compared to SVDAGs) ranged from 10% to 50% when modifying 1% to 5% of the active voxels.
The geometry representation of voxelized scenes using the aforementioned HDSs is suitable for static data. When the geometry changes, it is necessary to decompress the corresponding HDS and then re-compress it. Therefore, in [22], Careil et al. introduced a new data structure called HashDAG that enables interactive modifications of such compressed voxel geometry without requiring de- and recompression. This data structure is compatible with the attribute compression introduced in [13,14].
HDSs also find their application in the representation of time-variable voxelized scenes. In [23], Kämpe et al. presented a temporal DAG, which stores time-varying voxel data in one DAG, while special attention is also paid to the optimization of pointer lengths. In [24], Martinek et al. proposed the Motion DAG data structure, which interleaves a temporal interval binary tree for filtering time-consecutive data and a sparse voxel octree (SVO) which simplifies spatially nearby data. Zhang et al. in [25,26,27] dealt with an octree-based motion representation method that can be applied to compress animated geometric data.
3. Octree-Based Hierarchical Data Structures
Domain-specific hierarchical data structures designed to represent 3D scene geometry include octree-based SVOs and PSVOs. This section contains a brief introduction and formalization of these.
3.1. Sparse Voxel Octrees
An SVO represents the geometry of a voxelized 3D scene containing $N^3$ voxels, where $N = 2^m$ and $m$ is a positive integer. The nodes of the SVO are hierarchically arranged into $m$ layers, numbered from 0 to $m-1$. The root node, representing the whole 3D scene, forms layer 0, while the leaf nodes (LNODEs) form layer $m-1$. All nodes in layers 0 to $m-2$ are internal nodes (INODEs). The nodes represent specific octants of the 3D scene, and their child nodes represent the recursive decomposition of these octants into sub-octants.
Thus, INODEs can potentially have eight child nodes. A suboctant can be either passive, i.e., homogeneously filled with passive voxels (in this case, there is no child node associated with it in the HDS, which is a significant source of compression) or active, i.e., containing at least one active voxel (in this case, a child node exists). Information about the passive and active octants is stored in the node’s Child Node Mask (CHNM), composed of eight Header Tags (HTs), one for each potential child node. If the HT is set to ‘0’, the associated octant is passive, without any corresponding child node. If the HT is set to ‘1’, the octant is active and the child node exists.
Following the CHNM, there is a concatenated array of pointers (PTS) to the active child nodes; as their count (from 1 to 8) may vary, so does the total length of the binary representation of the PTS. A pointer (PT) may represent an address within the global address space of the SVO, pointing to the start of the binary representation of the corresponding child node. Alternatively, if each SVO level has its own separate address space, a PT may represent an offset of the start of the binary representation of the corresponding child node from the beginning of that address space.
The order of the PTs in the PTS array corresponds to the order of the HTs in the CHNM. In this paper, we used the Morton-order to determine this order.
The CHNM of LNODEs encodes individual voxels directly (there are no child nodes and therefore no pointers to these child nodes), i.e., HT = ‘0’ represents a passive voxel and HT = ‘1’ represents an active voxel, respectively. Again, their order is consistent with the Morton-order in this paper.
In order to formalize the binary representation of SVOs, we used the Backus–Naur Form (BNF):
where the following applies:
<SYM>—a mandatory non-terminal symbol SYM
“sym”—terminal symbol sym
(n)<SYM>—the SYM symbol, concatenated n-times
(n)*(m)<SYM>—the SYM symbol, concatenated n to m times
|—alternative
In this formal notation, the parameters p and q represent the number of reserved bits appended to the CHNM in the INODE and the LNODE, respectively. They are used to align the CHNM to the desired number of bits. The parameter r is then used to set the desired length of the binary representation of the child node pointers.
If we set the parameters to $p = q = 24$ and $r = 32$, all node parts, the entire nodes and even the whole data structure are aligned to 32 bits. The 8b child node mask (CHNM) is complemented by 24 reserved bits; this applies to both internal (INODE) and leaf (LNODE) nodes. Child node pointers are represented using 32b. For testing purposes, we denoted this version of the data structure SVO.
If we set the parameters to and , the binary representation of SVO is more compact, but not all parts of the data structure are aligned to 32b. For testing purposes, we denoted this version of the data structure SVO.
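The effect of the parameters p, q and r on node size can be sketched as follows. This is a minimal sketch assuming the layout described in the text (an 8b CHNM, followed by p or q reserved bits, followed in INODEs by one r-bit pointer per active child); the function names are illustrative only.

```python
def svo_inode_bits(active_children: int, p: int, r: int) -> int:
    """Size in bits of an SVO INODE: 8b CHNM + p reserved bits + r bits per pointer."""
    if not 1 <= active_children <= 8:
        raise ValueError("an INODE has between 1 and 8 active child nodes")
    return 8 + p + active_children * r


def svo_lnode_bits(q: int) -> int:
    """Size in bits of an SVO LNODE: 8b CHNM + q reserved bits, no pointers."""
    return 8 + q


# Fully 32b-aligned configuration: 24 reserved bits and 32b pointers.
print(svo_inode_bits(1, p=24, r=32), svo_inode_bits(8, p=24, r=32), svo_lnode_bits(q=24))
```

For the fully aligned configuration, an INODE therefore occupies between 64 and 288 bits, and an LNODE occupies 32 bits.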
An example of encoding a two-dimensional space (for simplicity and greater clarity, therefore using only the lower 4 HTs) into an SVO, with the parameters set to
and
, is depicted in
Figure 2. Here, the root node constructed as an INODE is shown, having three active child nodes, all of them represented as LNODEs. The addresses of the nodes and their components are given below them, in decimal notation.
3.2. Pointerless Sparse Voxel Octrees
In the case of the PSVO data structure, child node pointers are not present in the binary representation of the nodes. The nodes consist exclusively of 8b CHNMs—this applies to both INODEs and LNODEs. In order to represent the relationship between parent and child nodes, the binary representation of the child node is appended right to its HT in the parent node.
To formalize the binary representation of PSVOs, we used the BNF:
For testing purposes, we denoted this version of the data structure PSVO.
An example of encoding a two-dimensional space (for simplicity and greater clarity) into a quadrant tree analogous to a PSVO is depicted in Figure 3.
4. Clustered Sparse Voxel Octrees
The HDS proposed herein, the CSVO, is designed to represent the geometry of a voxelized 3D scene containing $N^3$ voxels, where $N = 2^m$, $m$ is an integer and $m \geq 2$. While the nodes of traditional SVOs are hierarchically arranged into $m$ layers, the nodes of the CSVOs are arranged into $m-1$ layers. The CSVO root node is stored in layer 0, and the CSVO leaf node layer, stored in layer $m-2$, is equivalent to the last two layers of the SVO. Thus, if $m = 8$, then $N = 256$ and hence the 3D scene comprises $256^3$ voxels. The SVO nodes are then stored in eight levels (numbered 0 to 7) and the CSVO nodes are stored in seven levels (numbered 0 to 6).
The CSVO consists of three kinds of nodes, denoted as follows:
Internal Nodes (INODEs), stored in layers 0 to $m-4$. Their child node masks, denoted as Long Child Node Masks (LCHNMs), require 16 bits, as each of the eight HTs uses 2 bits. These nodes support 8b, 16b, and 32b pointers to child nodes. HT = ‘01’ indicates an 8b pointer, HT = ‘10’ indicates a 16b pointer, and HT = ‘11’ indicates a 32b pointer. HT = ‘00’ indicates that there is no child node and therefore no pointer to this child node.
Pre-Leaf Nodes (PLNODEs), stored in layer $m-3$. Their CHNMs have 8 bits, one bit per HT. These nodes support only 8b pointers to child nodes; the presence of a child node is indicated in the CHNM by setting the HT to ‘1’. HT = ‘0’ indicates that there is no child node and therefore no pointer to this child node.
Leaf Nodes (LNODEs), stored in layer $m-2$. Their CHNMs have 8 bits, one bit per HT. They do not have pointers to their child nodes. HT = ‘1’ indicates that the corresponding child node (in the form of an 8b CHNM, where each HT represents the activity or passivity of a particular voxel) is appended in an array of child nodes following the CHNM of the LNODE.
If the dimension of the 3D scene is $N \geq 16$, i.e., $m \geq 4$, the CSVO root node is encoded as an INODE; if $N = 8$, i.e., $m = 3$, it is encoded as a PLNODE; and if $N = 4$, i.e., $m = 2$, it is encoded as an LNODE. The child node of an INODE must be either an INODE (if the parent node belongs to levels 0 to $m-5$) or a PLNODE (if the parent node belongs to level $m-4$). The child node of a PLNODE must be an LNODE.
4.1. Internal Node
Each CSVO internal node consists of a 16b “long” child node mask (LCHNM) followed by a concatenated array of child node pointers (PTS). Child nodes are then stored immediately following this parent node (they are further recursively decomposed here to form their own clusters of nodes; for the purposes of this HDS, a cluster is an encoded subtree of the CSVO, with the root node being the particular child node). The pointer represents the offset of the start of the child node (and its cluster) from the end of the PTS pointer array of its parent node (in bytes). This offset is represented by the pointer with the smallest possible number of bits. If the offset value is from the range $\langle 0, 2^{8}-1 \rangle$, it is represented by an 8b pointer and HT = ‘01’ is used in the LCHNM; if it is from the range $\langle 2^{8}, 2^{16}-1 \rangle$, it is represented by a 16b pointer and HT = ‘10’ is used in the LCHNM; and finally, if it is from the range $\langle 2^{16}, 2^{32}-1 \rangle$, it is represented by a 32b pointer and HT = ‘11’ is used in the LCHNM. The offset of the first child node in the sequence is always 0 and therefore it does not need to have a binary representation in the node, although its HT is set to ‘01’ in the LCHNM. The number of child nodes ranges from 1 to 8; the number of child node pointers in the PTS ranges from 0 to 7.
For example, the internal node depicted in Figure 4 has four active child nodes whose cluster sizes are 27B, 3450B, 72080B and 870B, respectively. Therefore, the first child node cluster has an offset of 0B from the end of the pointer array and its pointer PT0 is thus omitted from the pointer array (it is indicated in the figure only for illustrative purposes); however, its HT = ‘01’ is present in the LCHNM. The second child node cluster has an offset of 27B, so its pointer PT1 has an 8b binary representation and its HT is set to ‘01’. The third child node cluster has an offset of 3477B (27B + 3450B) and a 16b pointer PT2 with HT = ‘10’. Finally, the fourth child node cluster has an offset of 75557B (27B + 3450B + 72080B), a 32b pointer PT3 and HT = ‘11’. The size of this node amounts to 9B (a 2B LCHNM followed by 1B + 2B + 4B of pointers).
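The pointer-length selection for an INODE can be sketched as follows. This is a minimal sketch that reproduces the worked example above from the list of child cluster sizes; the function and variable names are illustrative, and the packing of HTs and pointers into actual bits (bit order, byte order) is deliberately left out because the text does not specify it.

```python
from itertools import accumulate

HT_FOR_BITS = {8: "01", 16: "10", 32: "11"}


def pointer_bits(offset: int) -> int:
    """Smallest supported pointer width (8, 16 or 32 bits) that can hold the offset."""
    if offset < 0 or offset >= 2 ** 32:
        raise ValueError("offset not representable by a CSVO pointer")
    if offset < 2 ** 8:
        return 8
    if offset < 2 ** 16:
        return 16
    return 32


def layout_inode(cluster_sizes: list[int]) -> tuple[list[int], list[str], int]:
    """Return (offsets, header tags, node size in bytes) for the active children.

    cluster_sizes lists the byte sizes of the child clusters in storage order.
    """
    if not 1 <= len(cluster_sizes) <= 8:
        raise ValueError("an INODE has between 1 and 8 active child nodes")
    # Each child starts where the previous clusters end; the first starts at 0.
    offsets = [0] + list(accumulate(cluster_sizes[:-1]))
    tags = [HT_FOR_BITS[pointer_bits(o)] for o in offsets]
    # The pointer of the first child (offset 0) is omitted from the PTS array.
    pointer_bytes = sum(pointer_bits(o) // 8 for o in offsets[1:])
    return offsets, tags, 2 + pointer_bytes  # 2B LCHNM + pointers


print(layout_inode([27, 3450, 72080, 870]))
```

Note that the size of the last cluster does not influence the node itself; it only matters for the offsets computed in the parent of this node.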
4.2. Pre-Leaf Node
Each CSVO pre-leaf node consists of an 8b child node mask (CHNM) followed by a concatenated array of child node pointers (PTS). The child nodes are then stored immediately after this parent node. The pointer represents the offset of the start of the child node from the end of its parent node’s pointer array. This offset is always represented by an 8b pointer and is always assigned HT = ‘1’ in the CHNM (since the size of a child node of a PLNODE cannot exceed 9B and the number of such child nodes is at most 8, the offset of the last child node can be at most 7 × 9B = 63B). By analogy with INODEs, the pointer to the first child node is not encoded, but its HT = ‘1’ is stored in the CHNM.
Figure 5 shows an example of a pre-leaf node having four active child nodes with cluster sizes of 5B, 7B, 4B and 9B, respectively. Therefore, the first child node cluster has an offset of 0B from the end of the pointer array and its pointer is thus not represented in the pointer array; however, its HT is present in the CHNM. The second child node cluster has an offset of 5B; the third child node cluster has an offset of 12B (5B + 7B). Finally, the fourth child node cluster has an offset of 16B (5B + 7B + 4B). The size of this node amounts to 4B (a 1B CHNM followed by three 1B pointers).
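The same calculation for a PLNODE is simpler, because every pointer is 8b long. The sketch below reproduces the example above; the function name is illustrative, and the lower bound of 2B on a child cluster size is an assumption derived from the fact that an active LNODE has at least one appended 1B node.

```python
def layout_plnode(cluster_sizes: list[int]) -> tuple[list[int], int]:
    """Return (child offsets, node size in bytes) for a PLNODE.

    cluster_sizes lists the byte sizes of the child LNODE clusters in storage order.
    """
    if not 1 <= len(cluster_sizes) <= 8:
        raise ValueError("a PLNODE has between 1 and 8 active child nodes")
    if any(not 2 <= s <= 9 for s in cluster_sizes):
        # Assumed bounds: 1B CHNM plus 1 to 8 appended 1B nodes.
        raise ValueError("an LNODE cluster occupies between 2 and 9 bytes")
    offsets = [0]
    for size in cluster_sizes[:-1]:
        offsets.append(offsets[-1] + size)
    # 1B CHNM + one 1B pointer per child except the first (offset 0 is implicit).
    node_bytes = 1 + (len(cluster_sizes) - 1)
    return offsets, node_bytes


print(layout_plnode([5, 7, 4, 9]))
```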
4.3. Leaf Node
Each CSVO leaf node consists of an 8b child node mask (CHNM), which is the equivalent of the CHNM of a node from SVO node layer $m-2$. However, this CSVO node no longer includes an array of child node pointers (PTS). If an HT is set to ‘1’ in this CHNM, an 8b node is appended after the CHNM; this node carries the information on the geometry of the voxels themselves (i.e., their passivity or activity) and is thus equivalent to the CHNM of a leaf node from SVO node layer $m-1$. Since these nodes have a constant size of 1B, their offset from the CHNM of the LNODE can be calculated from the CHNM itself, without using pointers. Therefore, these pointers are omitted from the LNODEs. If all HTs in the CHNM of an LNODE are set to ‘1’, eight 1B nodes are appended, so the maximum size of an LNODE is 9B.
An example of a CSVO leaf node is depicted in Figure 6, in which the CHNM size is 1B. Four HTs set to ‘1’ indicate the existence of four concatenated nodes, each 1B long. Thus, in total, the binary representation of this node cluster requires 5B of space.
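Because every appended node is exactly 1B long, both the size of an LNODE cluster and the position of each appended node follow from a population count over the CHNM. The sketch below illustrates this; the function names are illustrative, and the mapping of HT number i to bit i of the CHNM byte (least significant bit first) is an assumption, as the bit order is not stated in the text.

```python
def lnode_cluster_bytes(chnm: int) -> int:
    """Size of an LNODE cluster: 1B CHNM plus one 1B node per HT set to 1."""
    if not 0 <= chnm <= 0xFF:
        raise ValueError("CHNM must fit into 8 bits")
    return 1 + bin(chnm).count("1")


def lnode_child_offset(chnm: int, ht_index: int) -> int:
    """Offset (in bytes, counted from the byte after the CHNM) of the node
    appended for header tag ht_index. Assumes HT i is stored in bit i."""
    if not (chnm >> ht_index) & 1:
        raise ValueError("the requested octant is passive; no node is appended")
    lower_hts = chnm & ((1 << ht_index) - 1)  # HTs preceding ht_index
    return bin(lower_hts).count("1")


print(lnode_cluster_bytes(0b00001111))    # four active HTs
print(lnode_child_offset(0b00010110, 4))  # HTs 1 and 2 precede HT 4
```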
5. Out-of-Core Algorithm for CSVO Creation
The algorithm for constructing CSVOs, proposed herein, allows out-of-core construction of this data structure in two steps. The first step determines CHNMs of CSVO nodes; the second step determines child node pointers of CSVO INODEs.
The input of the algorithm is a list of active voxels read from a file, where each voxel is represented by its Morton address. In this file, the voxels are sorted in ascending order according to the aforementioned Morton address.
The first step of the CSVO construction algorithm is implemented using a modified version of Baert’s algorithm (see [7,8] for details).
Baert’s algorithm allows for compiling SVOs by writing each level of the generated tree into a separate file; in this file, the nodes are stored in the order in which they can be read from center to right at the time of rendering the graphical representation of the SVO. Each node is represented by both its child node mask and an array of child node pointers. Its implementation allows setting the parameters described in Section 3.1 of this paper, i.e., the CHNM alignment for both INODEs and LNODEs and the length of the binary representation of pointers to child nodes, which is constant for the entire HDS in the case of Baert’s algorithm. Using the original Baert’s algorithm, both SVO versions introduced in Section 3.1 were generated for the test scenes used in this paper (details of the scenes can be seen in Table 1). The obtained sizes of the binary representation of these HDSs can be seen in Table 2.
We made two modifications of Baert’s algorithm. The first modification is that only the second step of Baert’s algorithm is used because the input of this second step is the same as the input of our algorithm. The second modification is that pointers to child nodes are not created and written to the output files because they are determined in the second step of our algorithm.
Through the modification of Baert’s algorithm, only the 8b CHNMs of nodes are determined and written to the output files. This causes a significant reduction in the size of the node’s representation in the output files of intermediate product generated in the first step of our algorithm, compared to the outputs of the classical Baert’s algorithm. For example, in the case of INODEs of SVOs generated by the original Baert’s algorithm, where all node components and thus the entire SVOs are aligned to 32b, the size of the INODEs written to the output file is from 64b (one 8b child node mask aligned to 32b and one 32b pointer to the child node) up to 288b (one 8b child node mask aligned to 32b and eight 32b pointers to the child nodes). Therefore, there is an 8- to 36-fold compression of binary representation for each INODE in our intermediate output files compared to the original Baert’s algorithm. In the case of LNODEs, there are no pointers to child nodes, and the 32b LNODE is replaced by 8b, allowing 4-fold compression.
The reason for using Baert’s algorithm is its out-of-core nature and the ease with which it can be modified so that the generated intermediate product is significantly more compact than the SVO. This intermediate product represents the optimal input for the second step of our algorithm. The size of the binary representation of this product for a specific scene is the same as the size of its PSVO representation, so the obtained results for the test scenes can be viewed in more detail in Table 2. There, it is possible to compare the size of the resulting product of Baert’s algorithm (both SVO versions introduced in Section 3.1) and the size of the intermediate output from our modification (equal to the size of the PSVO). A comparison of the number and lengths of pointers in the SVO generated by Baert’s algorithm and in our final data structure, for each layer of nodes, can be seen in Table 3 and Table 4.
The intermediate output of the first step of our algorithm is composed of m node layers, each stored in a separate file and numbered from 0 to m − 1.
The example in Figure 7 shows the generated CHNMs of the SVO nodes, having four node levels. The root node is at level 0. This node and the nodes of levels 1 and 2 are INODEs of the SVO. Level 3 contains the SVO LNODEs. The red arrows, showing the association between the HT of the parent node and the child node, are added to the figure for illustrative purposes only; they are not included in the binary representation of this intermediate product. The relationship between a particular HT of a parent node and a particular child node is determined by the fact that the n-th HT set to 1 in the parent layer (counted across all CHNMs in that node layer, from its start) is associated with the n-th CHNM in the child layer. In the example, level 0 occupies 1B, level 1 occupies 3B, level 2 occupies 4B, and, finally, level 3 occupies 6B. In total, 14B are used.
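The layer-wise intermediate product can be illustrated with a small in-memory sketch. It is not the out-of-core, streaming procedure of the modified Baert’s algorithm; it only shows which CHNMs end up in which layer and why the n-th HT set to 1 in a parent layer matches the n-th CHNM of the child layer. The function name is illustrative, and two assumptions are made: voxels are given as 3m-bit Morton addresses, and HT number i is stored in bit i of the CHNM.

```python
def chnm_layers(morton_codes: list[int], m: int) -> list[list[int]]:
    """Return m layers of 8-bit CHNMs (layer 0 = root) for a scene of side 2**m."""
    layers = []
    current = sorted(set(morton_codes))  # active cells at the finest resolution
    for _ in range(m):
        masks: dict[int, int] = {}
        for code in current:
            parent, octant = code >> 3, code & 7
            masks[parent] = masks.get(parent, 0) | (1 << octant)
        parents = sorted(masks)  # Morton order of the nodes in this layer
        layers.append([masks[p] for p in parents])
        current = parents        # active cells one level coarser
    layers.reverse()             # built bottom-up, reported top-down
    return layers


layers = chnm_layers([0, 7, 63], m=2)
print(layers)
```

In every layer, the total number of HTs set to 1 equals the number of CHNMs in the following layer, which is the property the second step of the algorithm relies on when it pairs parents with children.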
Step 2 of the algorithm processes the obtained intermediate product in a bottom-up approach and finally generates the CSVO. In its first sub-step, it generates a layer of LNODEs; in the second, a layer of PLNODEs; while, in its third, it generates a layer of INODEs. The third sub-step is then repeated until layer 0 is processed.
In sub-step 1, the leaf node (LNODE) layer of the CSVO is generated. The first CHNM is loaded from node layer m − 2 (i.e., level 2 in the example depicted in Figure 8). The number of HTs set to ‘1’ in this CHNM determines the number n of nodes from node layer m − 1 (i.e., level 3 in the example depicted in Figure 8) that will be appended to the loaded CHNM. Then, SIZE is calculated as the size of the resulting LNODE and written to the output file result0 as a 32b value (marked in yellow in the example depicted in Figure 8). Next, the CHNM loaded from layer m − 2 and the n CHNMs from layer m − 1 are written to the output file. In this way, the algorithm creates the first LNODE. It then continues with the next LNODE until the last CHNM from layer m − 2 is processed. The result of this sub-step is the file result0 (an eponymous file is shown in the example depicted in Figure 8).
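Sub-step 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: both layers are held in memory rather than streamed from files, SIZE is taken to be the number of CHNM bytes in the LNODE (excluding the 32b SIZE field itself), and the SIZE field is stored little-endian. The function name `build_lnode_layer` is hypothetical.

```python
import struct

def build_lnode_layer(parent_layer: bytes, leaf_layer: bytes) -> bytes:
    """Merge the last two node layers into SIZE-prefixed LNODE records.

    Assumptions (not confirmed by the text): SIZE counts the CHNM bytes of
    the LNODE without the SIZE field, and is written as little-endian 32b.
    """
    out = bytearray()
    pos = 0  # read position in the leaf layer
    for chnm in parent_layer:
        n = bin(chnm).count("1")           # HTs set to 1 = number of children
        children = leaf_layer[pos:pos + n]
        if len(children) != n:
            raise ValueError("leaf layer is shorter than the HTs require")
        pos += n
        size = 1 + n                       # parent CHNM + n child CHNMs
        out += struct.pack("<I", size)     # 32b SIZE header
        out.append(chnm)                   # CHNM of the parent
        out += children                    # CHNMs of its children
    if pos != len(leaf_layer):
        raise ValueError("leaf layer is longer than the HTs require")
    return bytes(out)

# Hypothetical layers with 4 parent CHNMs (6 HTs set) and 6 leaf CHNMs.
result0 = build_lnode_layer(
    bytes([0b00000101, 0b00000001, 0b00010000, 0b00000110]),
    bytes([1, 2, 3, 4, 5, 6]),
)
```

In an out-of-core setting the same loop would read one CHNM at a time from each layer file, since every LNODE depends only on the next n bytes of the leaf layer.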
In sub-step 2, the pre-leaf node (PLNODE) layer of the CSVO is generated. The first CHNM is loaded from node layer m − 3 (i.e., level 1 in the example depicted in Figure 9). The number of HTs set to ‘1’ in this CHNM determines n, the number of clusters of result0 (result0 in the example depicted in Figure 9) that will be appended to the loaded CHNM. The SIZE values of these clusters are retrieved to calculate the SIZE value of the generated output node cluster, which is written to the output file result1 (result1 in the example depicted in Figure 9) as a 32b value. Then, the CHNM loaded from layer m − 3 and the child node pointers, which can be determined from the cluster SIZEs loaded from result0, are written to the output file. Subsequently, the n clusters read from result0 are written to the file (without their SIZE information, though). In this way, the first PLNODE cluster is created; the algorithm then continues with the next one, until the last CHNM of layer m − 3 is processed. The result of this sub-step is the result1 file (an eponymous file appears in the example depicted in Figure 9).
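The way child node pointers follow from the cluster SIZEs can be sketched as a prefix sum. The exact pointer semantics are not given in this section, so the sketch assumes that each pointer is the byte offset of a child cluster measured from the start of the first child cluster, that the first child therefore needs no pointer, and that SIZE is a little-endian 32b value that excludes itself. The helper names are hypothetical.

```python
import struct
from itertools import accumulate

def read_clusters(data: bytes):
    """Split a SIZE-prefixed stream into cluster payloads (SIZE stripped).

    Assumption: each record is a little-endian 32b SIZE followed by SIZE bytes.
    """
    clusters = []
    pos = 0
    while pos < len(data):
        (size,) = struct.unpack_from("<I", data, pos)
        pos += 4
        clusters.append(data[pos:pos + size])
        pos += size
    return clusters

def child_offsets(cluster_sizes):
    """Offsets of children 2..n from the start of the first child cluster.

    The first child sits at offset 0, so no pointer is produced for it.
    """
    return list(accumulate(cluster_sizes[:-1]))

# Three hypothetical clusters of 3, 2 and 2 bytes.
stream = (struct.pack("<I", 3) + bytes([5, 1, 2])
          + struct.pack("<I", 2) + bytes([1, 3])
          + struct.pack("<I", 2) + bytes([16, 4]))
clusters = read_clusters(stream)
offsets = child_offsets([len(c) for c in clusters])
```

Because the offsets depend only on the sizes of the preceding sibling clusters, they can be computed while the clusters are being copied, without revisiting already written data.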
In sub-step 3, the internal node (INODE) layer of the CSVO is generated. The first CHNM is loaded from node layer m − 4 (i.e., level 0 in the example depicted in Figure 10). From the number of HTs set to ‘1’, n is calculated as the number of clusters of result1 (result1 in the example depicted in Figure 10) that will be appended to the loaded CHNM. The SIZE values of these clusters are retrieved to calculate the SIZE of the output node cluster, which is written to the output file result2 (result2 in the example depicted in Figure 10) as a 32b value. Subsequently, the LCHNM is generated from the loaded CHNM and written to the output file; then, the child node pointers are generated and written to the output file. Finally, the n clusters read from result1 are appended to the file (without their SIZE information, though). In this way, the first INODE and its cluster are created; the algorithm then continues with the next one, until the last CHNM of layer m − 4 is processed. The result of this sub-step is the result2 file (an eponymous file appears in the example depicted in Figure 10).
If the intermediate file processed in sub-step 3 represents level 0 (i.e., it contains the root node), no SIZE information is added to the generated result file, and this result file contains the final CSVO. Otherwise, the algorithm repeats sub-step 3 to process the next node layer of the intermediate product together with the most recently generated result file, which produces another result file.
6. Results and Discussion
This section summarizes the results of the comparison of the proposed CSVO (compiled using the algorithm proposed herein) with two SVO versions and a single PSVO version. In the first part of this section, we present the test datasets used to obtain the results shown in the next part; in the final part, we discuss the sources of the higher compression ratio of CSVOs compared to other HDSs.
6.1. Datasets
The three-dimensional test scenes were created from 3D polygonal models, originally saved in the Wavefront Technologies OBJ geometry definition file format. These models include “Angel Lucy”, consisting of 488,880 triangles; “Skull”, containing 80,016 triangles; and “Porsche”, containing 22,011 triangles. The models were embedded into scenes, which were then voxelized at various resolutions, ranging from 128³ to 4096³ (4K) voxels. This resulted in 18 voxelized scenes.
Subsequently, we created a separate geometry representation of each scene. Every representation had the form of a regular 3D grid of scalar values, with the same grid dimensions as the corresponding voxelized scene, using a scalar value size of 1b. Thus, in this uncompressed form, describing the geometry of the scenes required 1b/vox. Passive (empty) voxels were represented as 0s, while active (filled) voxels were represented as 1s.
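As a quick check of what 1b/vox means in practice, the following Python sketch computes the size of the uncompressed grid for a cubic scene. The list of six power-of-two resolutions is an assumption inferred from the stated range (128 to 4096) and the count of 18 scenes for three models; the function name is hypothetical.

```python
def grid_bytes(resolution: int) -> int:
    """Bytes needed for a cubic 1b/vox grid with `resolution` voxels per axis."""
    bits = resolution ** 3          # one bit per voxel
    return bits // 8                # exact for resolutions divisible by 2

# Assumed resolutions: powers of two from 128 to 4096 (six per model).
resolutions = [128, 256, 512, 1024, 2048, 4096]
sizes = {r: grid_bytes(r) for r in resolutions}

for r, size in sizes.items():
    print(f"{r}^3 voxels: {size / 2**20:.2f} MiB")
```

Each doubling of the resolution multiplies the uncompressed size by eight, which is why the sparse representations discussed here become necessary at higher resolutions.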
The proportion of active voxels in the test scenes ranged from 3.53% (in the case of the “Skull” model, voxelized to a resolution of 128) to 0.03% (in case of the “Angel Lucy” model, voxelized to a resolution of 4096, i.e., 4K). In contrast, the absolute number of active voxels was the smallest with the lowest resolution (in this case, the “Angel Lucy” model, voxelized to a 128 resolution, consisted of active voxels) and the largest in the case of the Skull model, voxelized to a resolution of 4096 (4K), consisting of active voxels.
We used the Morton space-filling curve (MSFC) to linearize the data.
The detailed parameters of the respective 3D scenes are shown in Table 1; their visualizations are depicted in Figure 11.
Then, for each active voxel of a particular scene, we used its x, y, and z coordinates to calculate its Morton coordinate, which represents its location in the scene, as shown in the example depicted in Figure 12: here, a 24b Morton coordinate is constructed from three 8b coordinates. All active voxels of the scene were then sorted in ascending order of their Morton coordinates and stored in a file with the extension *.pts.
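The construction of a 24b Morton coordinate from three 8b coordinates can be sketched in Python by interleaving bits. The axis order within each bit triple is an assumption here (x in the least significant position, then y, then z); the order used in Figure 12 may differ. The function name `morton24` is hypothetical.

```python
def morton24(x: int, y: int, z: int) -> int:
    """Interleave three 8b coordinates into one 24b Morton coordinate.

    Assumption: bit i of x lands at bit 3*i, y at 3*i + 1, z at 3*i + 2.
    """
    for value in (x, y, z):
        if not 0 <= value < 256:
            raise ValueError("coordinates must fit into 8 bits")
    code = 0
    for i in range(8):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Sorting active voxels by Morton coordinate, as done before writing them out.
voxels = [(3, 0, 0), (0, 0, 1), (1, 1, 0), (0, 1, 0)]
ordered = sorted(voxels, key=lambda v: morton24(*v))
```

Sorting by this key places voxels that share an octant next to each other, which is what allows the octree layers to be built in a single sequential pass over the sorted list.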
6.2. Test Results
The test datasets represent the geometry of the respective scenes as lists of active voxels only, each given by its 64b Morton address (constructed as described in Section 6.1). The voxels in each dataset were sorted in ascending order of this address. Using the algorithm proposed by Baert et al., both SVO variants were constructed for each scene; the PSVO structure was then also created, as described in Section 3.1 and Section 3.2, respectively. The sizes of the binary representations of these HDSs were then compared with the size of the binary representation of the CSVO structure, described in Section 4 and compiled using the algorithm described in Section 5. The tests were performed on a computer with an Intel(R) Core(TM) i5-3470 CPU @ 3.20 GHz and 8 GB RAM, running Debian Linux version 4.19.0-6 and gcc version 8.3.0.
The achieved results, i.e., the size of the binary representation of the aforementioned HDSs and the achieved relative compression ratios between these HDSs and the CSVO proposed herein, are summarized in Table 2.
As is evident from Table 2, the binary representation of the CSVO data structure is larger than that of the PSVO data structure. The relative compression ratio (CR), measured as the ratio of the sizes of the PSVO and CSVO data structures (denoted as PSVO/CSVO CR in Table 2), ranges from 0.82 to 0.85. However, it should be noted that the PSVO data structure is not easily traversable, because it lacks child node pointers. Compared to the SVO variant that has all of its parts aligned to 32b, the CSVO data structure was 6.57 to 6.82 times more compact. Compared to the SVO variant that does not have all of its parts aligned to 32b, the CSVO data structure was 4.11 to 4.27 times more compact.
With the increasing voxel resolution of the model, the compression ratio of the CSVO, compared to the other HDSs, gradually decreased. This is due to the increasing size of the binary representation of the CSVO and the associated increasing offset of the child nodes from their parent nodes at higher levels of the HDS (i.e., closer to the root). The binary representations of the pointers to these child nodes are longer in this case. A deeper analysis of the number of pointers of various lengths at the respective tree levels of the two SVO variants and the CSVO, for the “Angel Lucy” model voxelized to a resolution of 128, is shown in Table 3, Table 4 and Table 5.
6.3. Compression Gains
The sources of compression of the binary representation of CSVO that allow for outperforming the compared SVOs include:
compression of the child node mask representation;
omitting a significant number of child node pointers;
shortening a significant number of 32b child node pointers to 8b or 16b.
One of the most significant sources of compression of the binary representation, when using the CSVO instead of the 32b-aligned SVO, is the removal of the reserved bits appended to the CHNMs: in both the INODE and the LNODE of this SVO, up to 24 reserved bits are appended to each 8b CHNM. In the CSVO, both the LNODEs and the PLNODEs contain only 8b CHNMs, which allows up to 4-fold compression of this part of the nodes. The INODE of the CSVO uses a 16b LCHNM, which leads to a 2-fold compression of this part of the node. The SVO variant that is not aligned to 32b is more compact than the aligned one precisely because it already includes this optimization by omitting the reserved bits. Against this unaligned variant, in contrast, the CSVO (using 16b LCHNMs) loses to the SVO (using 8b CHNMs).
The CSVO LNODE design encodes the CHNM of a parent node together with the CHNMs of its child nodes from the last two SVO levels, which makes the pointers to these child nodes unnecessary. In the case of the “Angel Lucy” 128 model, this allowed no fewer than 5291 of all 6832 pointers (77.44%) to be omitted from the binary representation of the HDS.
In the CSVO, the design of the INODEs and PLNODEs omits the binary representation of the pointer to the first child node in the sequence. Since each of these nodes must have at least one child node, a significant number of child node pointers can be omitted in this way. In the case of the “Angel Lucy” 128 model, this allowed an additional 349 pointers (5.11% of all pointers) to be omitted.
Finally, the wider range of lengths available for the binary representation of child node pointers is also an important source of compression: in the CSVO, the 32b pointers of SVO nodes can be represented not only as 32b pointers, but also as 8b or 16b pointers. In the case of the “Angel Lucy” 128 model, no fewer than 1192 pointers (17.45%) were replaced by shorter pointers: in the CSVO, 1150 were represented as 8b pointers and 42 as 16b pointers. Due to the low resolution of this model, 32b pointers were not used at all.
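The pointer-shortening idea and the pointer counts reported above can be illustrated with a short Python sketch. The selection rule (use the smallest of 8b, 16b or 32b that can hold the offset as an unsigned value) is an assumption, as the actual encoding of pointer lengths is not described in this section. The byte totals further assume that every SVO pointer occupies 32b.

```python
def pointer_width_bits(offset: int) -> int:
    """Smallest of the 8b/16b/32b pointer widths that can hold `offset`.

    Assumption: offsets are stored as unsigned integers.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for width in (8, 16, 32):
        if offset < (1 << width):
            return width
    raise ValueError("offset does not fit into a 32b pointer")

# Pointer counts reported in the text for the "Angel Lucy" 128 model.
total_svo_pointers = 6832
omitted_in_lnodes = 5291      # pointers made unnecessary by the LNODE design
omitted_first_child = 349     # first-child pointers of INODEs and PLNODEs
kept_8b = 1150
kept_16b = 42

accounted = omitted_in_lnodes + omitted_first_child + kept_8b + kept_16b
csvo_pointer_bytes = kept_8b * 1 + kept_16b * 2
svo_pointer_bytes = total_svo_pointers * 4   # assuming 32b per SVO pointer
```

The three categories account for all 6832 pointers, so under these assumptions the pointer payload of this model shrinks from 27,328 bytes to 1234 bytes.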
7. Conclusions
This paper discussed domain-specific hierarchical data structures designed for representing the geometry of voxelized 3D scenes, sparsely populated with active voxels. The aim of the paper was to investigate the potential of using the information on the distance of the child nodes of a hierarchical data structure from their parent nodes when linearizing the structure and encoding this information into the child node pointers. This, together with optimizing the count and length of the binary representation of these pointers, allowed us to design a new way of HDS encoding—CSVO—together with a new out-of-core construction algorithm.
Compared to the SVO variant that has all of its parts aligned to 32b, the CSVO data structure was 6.57 to 6.82 times more compact. Compared to the SVO variant that does not have all of its parts aligned to 32b, the CSVO data structure was 4.11 to 4.27 times more compact. The CSVO came significantly closer to the size of the PSVO, which does not implement any child node pointers: in the tests performed on our test datasets, the CSVO was larger by only 17% to 22% (the relative compression ratio ranged from 0.82 to 0.85). With the increasing voxel resolution of the model, the compression ratio of the CSVO, compared to the PSVO and the SVO, decreased slightly.
In the context of the proposed HDS, the potential of common subtree merging has not yet been explored. Future research employing the principles presented herein could use it to construct an even more compact HDS in the form of a directed acyclic graph.