Fast overlap detection between hard-core colloidal cuboids and spheres. The OCSI algorithm

Collision between rigid three-dimensional objects is a very common modelling problem in a wide spectrum of scientific disciplines, including Computer Science and Physics. It spans from realistic animation of polyhedral shapes for computer vision to the description of thermodynamic and dynamic properties in simple and complex fluids. For instance, colloidal particles of especially exotic shapes are commonly modelled as hard-core objects, whose collision test is key to correctly determine their phase and aggregation behaviour. In this work, we propose the OpenMP Cuboid Sphere Intersection (OCSI) algorithm to detect collisions between prolate or oblate cuboids and spheres. We investigate OCSI's performance by bench-marking it against a number of algorithms commonly employed in computer graphics and colloidal science: Quick Rejection First (QRI), Quick Rejection Intertwined (QRF) and SIMD Streaming Extensions (SSE). We observed that QRI and QRF significantly depend on the specific cuboid anisotropy and sphere radius, while SSE and OCSI maintain their speed independently of the objects' geometry. While OCSI and SSE, both based on SIMD parallelization, show excellent and very similar performance, the former provides a more accessible coding and user-friendly implementation as it exploits OpenMP directives for automatic vectorization.


Introduction
Employing computer programs and algorithms to generate 2D or 3D images is referred to as rendering. Rendering is a topic of striking relevance in computer graphics with practical impact on many heterogeneous disciplines, spanning engineering, simulators, video games and movie special effects. Collision detection and collision determination are key elements of rendering as they determine the distance between two objects and their possible intersection [1]. Due to their widespread use in video representation of time-evolving systems, with tens of frames displayed per second, algorithms for rendering are expected to be very efficient [2,3]. Generally, to assess whether two complex objects collide, the distance between their respective bounding volumes is evaluated first. Common bounding volumes are cuboidal boxes, whose axes might or might not be aligned, or spheres. Due to their simple geometry, the collision between cuboids and/or spheres is computationally easier [4,5,6,7], thus enhancing the speed and efficiency of the overall rendering process [2]. Collision detection algorithms are of utmost relevance in many heterogeneous applications spanning computer graphics for shape modelling and video games [8,9], robotics to prevent potential collisions in manrobot interactions [10] or machining of sculptured surfaces [11], and simulations of molecular or particle systems to estimate their thermodynamics properties [12,13].
Collision algorithms have also been key to address the thermodynamics of liquid and solid phases and their phase transition by early molecular simulation studies that employed the hard-sphere model [14,15,16]. More recently, and following the seminal theory by Onsager on the isotropic-to-nematic transition of hard rods [17], they were fundamental to confirm the crucial role of excluded volume effects in the formation of colloidal liquid crystal phases of anisotropic particles [12]. Realising the practical impact of the particle shape on the design of nanomaterials triggered the blooming of biosynthetic [18], chemical [19] and physical [20] experimental routes to manufacture precise building blocks with ad hoc properties, including lock-and-key particles [21], fused spheres [22], superballs [23] and cuboids [24,25,26,27]. The appearance of these exotic shapes unveiled a realm of novel opportunities in nanomaterials science by offering an increasingly varied selection of morphologies for state-of-the-art applications spanning medicine (controlled drug delivery), smart materials (self-healing coatings) and photonics (light detection), among others. Often anticipating experimental evidence, computer simulations have significantly contributed to our comprehension of the effect of particle shape and interaction at the nanoscale on the material properties at the macroscale [28,29,30,31]. Understanding the fundamentals of such a complex correlation, which develops over orders of magnitude in length and time scales, dramatically depends on the existence of reliable force fields mimicking the interactions between particles. This is not always the case for most exotic particle shapes, whose force field is assumed to be described by mere excluded volume effects and thus only incorporates a hard-core interaction potential. Consequently, efficient and robust algorithms able to detect collisions and intersections between objects become essential to extract structural, thermodynamic and dynamic properties of such systems from a molecular simulation. In colloid science, cuboids are especially intriguing building blocks that can form a rich variety of liquid crystal phases [32,33,34,35,36]. Incorporating guest spherical particles in these phases is relevant to understand phenomena of diffusion in crowded environments that display a significant degree of ordering.
In the light of these considerations, highlighting the harmonious interdisciplinary convergence of computer graphics and colloid science, here we report on the specific case of cuboid-sphere collision detection. In particular, we propose our own OpenMP [37] Cuboid Sphere Intersection (OCSI) algorithm to detect collisions in monodisperse systems of cuboids and spheres oriented in a 3D space. OCSI is found to be especially efficient when compared to the Quick Rejection First (QRI) and the Quick Rejection Intertwined (QRF) algorithms, and more user-friendly and easier to implement than the SIMD Streaming Extensions (SSE) algorithm. This paper is organised as follows. In Section 2, we detail the theoretical framework of the cuboid-sphere intersection problem and the state-of-the-art in software implementation. In Section 3, we describe the code that we have specifically developed to test each of the above-mentioned algorithms' efficiency for cuboids of different shape and spheres of different size. The comparison between the algorithms is then discussed in Section 4, while, in Section 5, we draw our conclusions.

Algorithms
In geometry, a sphere S is identified by its radius, R, and the position of its centre, r S , with respect to a reference point. Similarly, a cuboid C can be defined by the extension of its thickness, 2c T , length, 2c L , and width, 2c W , the position of its centre of mass, r C , and the unit vectorsê T ,ê L andê W that indicate the orientation of its three orthogonal axes. As a result, all the points within the volume occupied by the cuboid can be indicated by a vector C that reads where T , L and W indicate, respectively, the cuboid thickness, length and width, whereas α = − 1, 1 is a scalar interval. With these essential definitions, the minimum distance, d min , between the surface of a randomly oriented cuboid and the center of a sphere can be calculated as follows: where r SC = r S − r C and Θ is the Heaviside step function: The interested reader is referred to Appendix A for a formal derivation of Eq. 2. To the best of our knowledge, Arvo was the first to propose an algorithm detecting the intersection between a sphere and an axis-aligned cuboid, that is a cuboid whose orientation matches that of the reference axes [5]. For this specific case, we assume that the cuboid thickness is aligned with the x axis, i.e.ê T =x, its length with the y axis, i.e.ê L =ŷ, and its width with the z axis, i.e.ê W =ẑ. Following this assumption, Eq. 1 can be rewritten as whereî =x,ŷ,ẑ are the reference axes for T , L and W , respectively, and Therefore, for an axis-aligned cuboid, d min can be calculated as By using the infimum and supremum of B i , the terms in the above sum can be easily calculated: Consequently, the algorithm proposed by Arvo only requires the extreme values of B x , B y , B z along with the sphere radius and position and detects cuboid-sphere collisions if d min ≤ R. An illustrative example of a pseudocode describing its main steps is reported in Algorithm 1.
end for 10: if (d ≤ R 2 ) return true checking overlap 11: return f alse 12: end function The alignment of the cuboid unit vectors with the reference axes is a particular case of a more general scenario with the cuboid randomly oriented. Eventually, Arvo's algorithm can also be applied to randomly oriented cuboids by performing a translation of the vectors involved in the calculation of d min in the reference frame of C. Rokne and Ratschek proposed to estimate d min by employing interval analysis and reported a test to determine whether a point P ∈ C is within a sphere delimited by four peripheral points [6]. The algorithms proposed by Larsson and co-workers employ quick rejection and vectorized overlap tests to enhance the efficiency of collision detection between a sphere and either an aligned or a randomly oriented cuboid [7]. The pseudocode of these algorithms, referred to as Quick Reference Intertwined (QRI) and Quick Reference First (QRF), are reported in Algorithm 2 and Algorithm 3, respectively. Both QRI and QRF are based on the implementation of a quick rejection test that immediately excludes an overlap if at least one of the summands in Eq. 2 or Eq. 5 is larger than R 2 . For the general case of a randomly oriented cuboid, this condition reads A geometrical representation of this condition is provided in Fig. 1, where a sphere S of radius R and centered at r S is at the distance r SC ·ê L from the center of mass of a cuboid C that is centered at r C . For this specific arrangement, the left-hand side of Eq. 6 measures the distance of S from the face of C delimited by T and W and schematically identified by the vertical solid line of Fig. 1. Figure 1: Schematic representation of a sphere S and a cuboid C at relative distance r SC ·ê L . Sphere and cuboid are centered, respectively, at r S and r C , and c L is half of the cuboid length. If r SC ·ê L > c L + R, then S and C do not overlap.
QRI applies this rejection criterion within the loop that evaluates the minimum distance, precisely at lines 6 and 9 of Algorithm 2. By contrast, QRF performs the three quick rejection tests, one for each summand of Eq. 2, before the computation of the minimum distance, between lines 3 and 6 of Algorithm 3. In this case, the scalar products r SC ·ê i are stored in line 4 and eventually employed to compute d = d 2 min in the following loop.
if (l > r) return f alse quick rejection test 10: end for 13: if (d ≤ r 2 ) return true checking overlap 14: return f alse 15: end function The different location of the quick rejection tests in QRI and QRF is expected to determine a difference in the efficiency of the two algorithms, which is analysed in detail in Section 4. Additionally, QRI and QRF quick rejection tests depend on both c i and R, so these algorithms' efficiency are expected to be influenced also by sphere and cuboid geometry. Finally, keeping in mind the potential application in computational colloid science, where crowded systems are usually simulated, the efficiency of QRI and QRF is also influenced by the system packing, which determines the probability for an attempted move to produce an overlap.
A parallel version of Algorithm 1, generalised for randomly oriented cuboids and using Streaming SIMD Extension (SSE), was also proposed [7]. SSE is an instruction set that makes use of specific CPU registers to perform simple operations in parallel [38]. By substituting the if statements in lines 8 and 11 of Algorithm 3 to compute the minimum distance, with the max and min functions available in SSE, the computation of the minimum distance can be vectorized. This algorithm, running in parallel and thus significantly faster than QRI and QRF, does not need quick rejection tests. A pseudocode if (a i < −c i ) then if (d ≤ R 2 ) return true checking overlap 17: return f alse 18: end function for this algorithm, here named after the SSE instruction set, is presented in Algorithm 4 for the general case of randomly oriented cuboids. if (l T + l L + l W ≤ R 2 ) return true checking overlap 10: return f alse 11: end function Finally, we present our own algorithm, which incorporates a number of elements providing additional efficiency when compared to Algorithms 1, 2 and 3, and versatility when compared to Algorithm 4. A new element that significantly simplifies the algorithm is the use of the absolute value to estimate the minimum distance. In addition, we employed OpenMP directives for an SIMD parallelization of the cycle [39] without using SSE intrinsic instructions. The advantage of avoiding SIMD intrinsics is that one can vectorize a code written in Fortran, for which these instructions are not available. Given the heterogeneous nature of the communities using collision-detection algorithms and their preference for likely different programming languages, an user-friendly code is a crucial advantage. Our algorithm, referred to as OpenMP Cuboid Sphere Intersection (OCSI), proved to be efficient and functional for both C and Fortran 90 (F90). Its pseudocode is presented in Algorithm 5. if (l T + l L + l W ≤ R 2 ) return true checking ovlerlap 7: return f alse 8: end function

Computational Details
To test the relative performance of the above algorithms, we have developed two versions of the same program in C and in F90 that detect collision between one cuboid and one sphere. The dimensions of the cuboid and sphere are given in units of the cuboid thickness T , which is our unit length, and do not change within the same detection-collision test. In particular, the colloid length and width are L * ≡ L/T and width W * ≡ W/T , respectively, whereas the sphere radius is R * ≡ R/T . For each of the cuboid shapes analysed, we have pondered the impact on the algorithms' efficiency of changing the sphere radius between R * = 0.05 and R * = 5. To control the value of the acceptance ratio, i.e. the percentage of random configurations that do not produce overlaps, the sphere S is generated within a spherocuboid. This spherocuboid, centred and oriented as C, results from the Minkowski addition [40] of C and a sphere larger than S and whose diameter is optimized to obtain the desired acceptance rate. A dedicated program optimises the volume of the spherocuboid according to the target value of the acceptance ratio and the dimensions of both C and S, which are specified as input parameters. To generate a configuration, C is initially aligned with the reference axes and its centre is set as origin, while the centre of S is randomly positioned within the volume of the spherocuboid. Then, the reference system is randomly rotated and the cuboid-sphere overlap checked. For consistency, the section of the code that calls the overlap function is the same as that proposed by Larsson et al [7]. The time spent by each algorithm to detect collisions is computed for 3 independent sets of N c = 2 × 10 6 configurations and then averaged out. The efficiency of the algorithms has been assessed on Intel Core i5-8500 CPU @ 3.00GHz (Coffee Lake) CPU using Intel Compilers. In this work, configurations of cuboids with L * = [1,20], W * = [1,20] and spheres with R * = {0.05, 0.5, 5} have been tested.

Results and discussion
The relative performance of each algorithm is assessed in Fig. 2 and Fig.  3 for codes written in C and F90, respectively. Fig. 2  between QRI, QRF, SSE and OCSI, while Fig. 3 only for QRI, QRF and OCSI, being the SSE algorithm unsuitable for Fortran. Both figures report the run-times for detecting collisions between one cuboid of 1 ≤ L * ≤ 20 and 1 ≤ W * ≤ 20 and one sphere of radius R * = 0.05 (frames a), 0.5 (frames b) and 5 (frames c). The N c random configurations tested per run have been produced at the constant acceptance ratio of 40%, which is within the usual range of values employed in Metropolis Monte Carlo simulations of hardcore particles [41]. It is evident that SSE and OCSI perform significantly better than QRI and QRF under the conditions specified here, although we have also tested cuboids of larger length and width (up to 100T ) with the same acceptance ratio and observed very similar tendencies. The difference in performance is especially evident at R * = 5 as SSE and OCSI run-times are up to 5 and 6 times faster than QRF and QRI, respectively. In general, C codes show a better performance than F90 codes, although this difference is not substantial. Interestingly enough, SSE and OCSI do not present any relevant dependence on the cuboid and sphere geometry, being the run-times basically constant across the whole range of dimensions. This is probably due to the SIMD parallelism implemented, different from QRI and QRF, which have to run in serial for their use of quick rejection tests (see lines 6 and 9 in QRI and line 5 in QRF). Basically, if the quick rejection test is true for the first dot product, the algorithms exit the loop with negative result (C and S do not overlap) with no need to compute the remaining two. The geometry of both cuboid and sphere exhibits a very intriguing effect on the performance of QRI and QRF as the shape of the run-time surfaces in the L * W * plane suggests. More specifically, for spheres with R * = 0.5 (frames b in Figs. 2 and 3) the time required for the collision detection decreases upon increasing the cuboid dimensions, with the shortest time detected at L * = W * = 20 (disk-like cuboid). Larger spheres, with R * = 5 (frames c in Figs. 2 and 3), induce a different performance resulting in an opposite concavity of the run-time surface as compared to that observed for smaller spheres. In this case, the slowest detection is measured at (L * , W * ) = (4,8) and (3,4) for QRI and QRF in C program, respectively, and (3,9) for both QRI and QRF in F90 program. In general, QRF is faster than QRI, except when the spheres are especially small (R * = 0.05) and F90 is used. The difference in performance between QRI and QRF might be due not only to how data are stored and read by C and F90 compilers, but also to the value set for the acceptance ratio. As a matter of fact, Larsson and coworkers had already noticed that the run-times of QRI and QRF were very similar for acceptance ratios of approximately 50%, although their collision-detection method tests run-times for sets of configurations with cuboids and sphere of random dimensions [7].

offers a benchmark
To more easily compare the efficiency of the algorithms tested, the runtimes reported in the frames of Figs. 2 and 3 have been averaged out for each value of R * . The resulting average run-times are reported in Tables 1 and 2. We stress that these average run-times should be taken as indicative values for QRI and QRF as their speed strongly depends on the cuboid geometry. We observe that QRI and QRF average run-times tend to be longer for larger spheres, with no significant difference between C and F90. By contrast, both SSE and OCSI are completely insensible to the sphere radius as no relevant change in their average run-time is detected between R * = 0.05 and 0.5.
We also notice that the performance of OCSI and SSE is very similar, with our algorithm between 0.2 and 0.4 ms faster than SSE. This difference, per se, would not be especially significant if the overlap checks were limited to 2×10 6 configurations. However, the typical number of configurations generated in Monte Carlo simulations of colloids is usually a few millions per particle, which are rarely less than a few thousands. Moreover, because colloids can be especially dense systems, one random configuration might generate more than a single collision. Consequently, it is reasonable to assume that, within a single Monte Carlo simulation, a collision-detection algorithm might be Table 1: Average run-times of the C version of algorithms that detect collisions between one cuboid of 1 ≤ L * ≤ 20 and 1 ≤ W * ≤ 20 and one sphere of radius R * over 2 × 10 6 configurations with 40% of acceptance ratio.  called between 10 3 and 10 5 times the configurations explored here. This would produce a difference of no more than a few tens of seconds between OCSI and SSE, which is still irrelevant. The main advantage of using OCSI is that it is based on automatic vectorization and employs OpenMP libraries to be parallelized, making it a very user-friendly algorithm. Also for this reason, OCSI can be directly implemented in Fortran, with no need to compile mixed C/Fortran codes.

Conclusions
In summary, we have benchmarked four different collision-detection algorithms that check the occurrence of overlaps between one cuboid and one sphere. Our analysis focused on a specific acceptance ratio, which is within the usual range applied to efficiently sample the configuration space of hardcore systems in Monte Carlo simulations [41]. We notice that SSE has been previously tested for different acceptance ratios and did not show relevant changes in performance [7]. A similar tendency is also expected for OCSI, but should be confirmed by further tests. While QRI and QRF are observed to be geometry-dependent, SSE and OCSI are basically insensible to the cuboid anisotropy and sphere radius and, thanks to automatic vectorization, they are also significantly faster. In particular, the OCSI algorithm proved to be especially valuable in terms of performance and simplicity of implementation in both C and F90. It should be stressed that the method applied to generate the sphere around the cuboid is crucial to provide a robust comparison between different algorithms. The choice of the spherocuboid as a sampling volume allows to precisely set the desired acceptance ratio and guarantees that the algorithms are tested for all the possible positions of the sphere around or inside the cuboid. This is especially relevant to fairly assess the performance of QRI and QRF, due to their use of quick rejection tests. In Monte Carlo simulations, where the generation of configurations follows a different procedure, the performance of collision-detection algorithms, most likely affected by the degree of system order and packing, should be tested. Finally, it is important to mention that the OCSI algorithm allows for the calculation of the cuboid-sphere minimum distance, hence suggesting future study to determine a suitable interaction potential beyond mere hard-core interactions. The formal proof reported here can also be useful to test the intersection of cuboids with particles of different shape. The last term in Eq. A.4 has been obtained considering thatê i ·ê j = δ ij and that every member of the sum depends on just one value α i , hence they are all independent. It is sufficient to calculate only one term of this sum as all dimensions are equivalent. In particular, this term equals zero if the following conditions are met: Because α i = [−1, 1], this implies that If |r SC ·ê i | > c i , then (r SC ·ê i ) > c i or (r SC ·ê i ) < −c i . The former inequality implies that Therefore, the minimum distance of a point r S from the surface of a cuboid C reads min Finally, a cuboid C overlaps with a sphere of radius R and centre in r S , if the following inequality is satisfied: