Geometric Shape Characterisation Based on a Multi-Sweeping Paradigm

Abstract: The characterisation of geometric shapes produces their concise description and is, therefore, important for subsequent analyses, for example in Computer Vision, Machine Learning, or shape matching. A new method for extracting characterisation vectors of 2D geometric shapes is proposed in this paper. The shape of interest, embedded into a raster space, is swept several times by sweep-lines having different slopes. The interior points of the shape that lie on the current sweep-line, midway between its boundary crossings, are identified at each stage of the sweeping process. The midpoints are then connected iteratively into chains. The chains are filtered, vectorised, and normalised. The polylines obtained from the vectorisation step are used to design the shape's characterisation vector for further application-specific analyses. The proposed method was verified on numerous shapes, where single- and multi-threaded implementations were compared. Finally, characterisation vectors were determined for these shapes, some of which were rotated and scaled. The proposed method demonstrated a good rotation- and scaling-invariant identification of equal shapes.


Introduction
Dealing with geometric data has become one of the main issues of many modern computer applications. There are countless solutions using geometric data in manufacturing, robotics, traffic, medicine, engineering, chemistry, cultural heritage, art, security, and defence. Unfortunately, answering questions about geometric shapes, which are treated easily by humans, frequently represents a considerable challenge to computers. Among such tasks are finding (almost) identical shapes, extracting those shapes that expose some kind of symmetry, finding the desired objects in point clouds obtained by remote sensing scanners, discovering pathological structures in medical images, or matching biometric data. Namely, the internal data structures storing the information about geometric shapes are designed primarily to represent the shapes unambiguously [1,2] and do not support querying the shapes' characteristics directly.
In this paper, the shape characteristic corresponds to the description of the shape's geometrical and/or topological properties in a countable way. We will refer to it as a characterisation in the continuation (terms such as attributes, properties, or features are also used [3]). This approach is based on geometric shapes' local symmetries and the multi-sweeping paradigm [4] and works in 2D. The proposed method works in three steps:
• Initialisation, where a shape is inserted into a grid of equally sized cells;
• Processing, where the shape is swept several times with sweep-lines having different slopes; as a result of each sweep, the interior midpoints with respect to the shape boundary are determined and linked into the chains of midpoints;
• Finalisation, where the obtained chains are filtered, vectorised, and normalised.
A shape's characterisation vector is then formed from the polylines obtained by the vectorisation.
The main benefits of this approach are the following:
• The obtained set of polylines enables the construction of various, application-specific characterisation vectors;
• It handles free-form shapes;
• It processes the shapes containing holes without any modifications in the algorithm;
• It can be parallelised.
The paper consists of five sections: Section 2 contains a summary of the previous works; Section 3 introduces the new shape characterisation approach; Section 4 presents the experimental results; Section 5 concludes the paper.

Related Works
The sweeping paradigm is explained first in this section. The shape characterisation methods most similar to the introduced approach are explained briefly after that.

Sweeping Paradigm
Sweeping, proposed by Shamos and Hoey [5], is an algorithmic paradigm used to solve various geometric problems. The idea is straightforward. Let $s$ be a sweeping element (typically, a line in 2D or a plane in 3D), which glides continuously through the Euclidean space populated by geometric objects. When the geometric object of interest is hit by $s$, the sweeping element stops for a while, works out the considered problem locally, and updates an internal data structure. The stop is called a sweep event, and the data structure a sweep status. In this way, the problem is solved completely behind $s$ and remains unsolved in front of it. When all the geometric objects have been passed by $s$, the sweep status contains the final solution of the considered problem. In practice, however, $s$ does not glide continuously, but jumps from event to event. For this reason, the geometric objects should be sorted with regard to the movement of $s$ before the sweeping is started. This is why $s$ typically moves along one of the coordinate axes.
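As a minimal illustration of the event/status vocabulary used above, the following Python sketch sweeps a point along one axis over a set of intervals and reports the largest number of intervals covering any position. The problem and the function name `max_overlap` are chosen here purely for illustration and are not part of the proposed method.

```python
def max_overlap(intervals):
    """Classical 1D sweep: largest number of half-open intervals [lo, hi) covering a point."""
    # Sweep events: +1 when an interval starts, -1 when it ends.
    events = []
    for lo, hi in intervals:
        events.append((lo, 1))
        events.append((hi, -1))
    # Objects are sorted along the sweep direction before sweeping starts;
    # at equal coordinates, end events (-1) are handled before start events (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    status = 0  # sweep status: number of intervals currently crossed
    best = 0
    for _, delta in events:  # the sweep element jumps from event to event
        status += delta
        best = max(best, status)
    return best


print(max_overlap([(0, 3), (1, 4), (2, 5), (6, 7)]))
```

The sweep never inspects positions between events, which is exactly why the sorting step is required beforehand.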

Characterisation Methods and Skeletons
The characterisation of geometric shapes has attracted much research, which has culminated in various reviews [15,16,17] and has been considered in books [18,19,20]. In general, the characterisation of shapes results either in a numerical value or in an alternative shape representation. The first group of methods parses the shape boundary and applies various transforms on it, while the second group stays in the space domain and produces another shape representation, from which a vector of values is derived (i.e., a characterisation vector). In the continuation, we briefly review the latter group, among which the most well-known is the medial axis transform or topological skeleton. Different terminologies are in use [16]; for the purposes of this overview, we consider them the same and shall use the term skeleton in the continuation.
The skeleton (the concept was introduced by Blum [21]) is the set of all points inside the shape that have more than one closest point on its boundary. In this way, a reduced version of the shape is obtained, which contains enough information to reconstruct the shape. The skeleton captures the geometrical and topological characteristics of the shape and represents them internally with a graph, from which the characterisation information, such as the connectivity, lengths, directions, and widths, can be obtained directly. This information can then be used in the characterisation process. The main problem of the skeleton is its sensitivity to noise, as even a small change in the shape's boundary can cause a considerable change in the graph's topology. A different solution was proposed to mitigate this problem [22].
A simple polygon can be represented by a straight skeleton [23]. As the name suggests, it consists only of line segments, in contrast to the topological skeleton, which may contain parabolic arcs. Its generalisation to general polygons was introduced shortly after that [24]. An algorithm for constructing an approximate straight skeleton using Steiner points was suggested in [25].
A scale axis transform, another type of skeleton, was proposed in [26]. It is defined by multiplicative scaling operations, with the aim to eliminate small local features of the shape. The points belonging to the skeleton are considered the centres of balls touching at least two boundary points. By the gradual scaling of the shape, some balls become covered entirely by other balls. These covered balls are removed, and a hierarchical skeleton is obtained as a result. The skeleton is simplified most at the topmost level.
A β-skeleton was suggested in [27]. It is an undirected graph, defined on a set of points on the plane. The boundary points $p_i$ and $p_j$ are connected by an edge if there exists a point $q$ whose angle $\angle p_i q p_j$ is greater than the user-defined parameter β. The undirected graph obtained in this way is not always connected.
The most recent studies in the field of shape characterisation rely heavily on neural networks and deep learning. These state-of-the-art techniques have been applied successfully in numerous research domains, such as medicine [34,35], remote sensing [36], and physics [37]. Unfortunately, the downside of these methods is the requirement for large training sets in order to achieve high characterisation accuracy.

Materials and Methods
Let $\pi$ be a rasterised plane consisting of equally sized square cells $c_{i,j}$, $0 \le i < n$, $0 \le j < m$, where $n$ and $m$ define the horizontal and vertical resolutions of $\pi$. Each cell $c_{i,j}$ is associated with an attribute $a_{i,j} \in \{I, B, E\}$, where $I$ stands for interior, $B$ for border, and $E$ for exterior. Let $S$ be a subset of $\pi$, such that $S = \{c_{i,j} : a_{i,j} \in \{B, I\}\}$. In addition, let us introduce sweep-line $s(\alpha)$ with the slope $\alpha \in [0^\circ, 180^\circ)$. $s(\alpha)$ investigates $\pi$ by gliding through it. The sweeping is repeated for different slopes $\alpha$; this is why the method is referred to as the Multi-Sweep Characterisation Algorithm (MSCA) in the continuation. It works in three main steps: initialisation, multi-sweeping, and finalisation. These are discussed in the following subsections.

Initialisation
The MSCA accepts $S$ either in a vector or a discrete form. The task of the initialisation is to unify these two possibilities for uniform processing. The bounding box $BBox(S) = (x_{min}, y_{min}, x_{max}, y_{max})$ is determined first in both cases, where $(x_{min}, y_{min})$ and $(x_{max}, y_{max})$ represent its bottom-left and top-right corners, respectively. $BBox$ is then moved to the origin and becomes our rasterised plane $\pi$. If $S$ is given in the discrete form (for example, by one of the known chain codes exposing four-connectivity [38,39,40,41]), the cell size is 1, and the size of the bounding box is obtained as $n = x_{max} - x_{min}$ and $m = y_{max} - y_{min}$. Otherwise, when $S$ is given in the vector form, suitable heuristics should be applied to determine the size and the number of cells $n$ and $m$. $S$ is rasterised by the four-connected rasteriser [42,43], and the shape's boundary cells are obtained.
Having $\pi$ and the boundary cells determined, the interior cells are marked by setting $a_{i,j} = I$ by one of the shape-filling algorithms, while all the remaining cells are marked by setting $a_{i,j} = E$. Figure 1 shows the result of the initialisation for the demonstration shape, which has been given in the vector form at the input.
Figure 1. Result of the initialisation for the demonstration shape plotted in orange, where cells with $a_{i,j} = E$ are white, cells with $a_{i,j} = B$ are black, and the grey cells indicate $a_{i,j} = I$.
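The cell classification described above can be sketched in Python as follows. This is only an illustrative sketch: the text does not say which shape-filling algorithm is used, so a breadth-first flood fill of the exterior from the frontier cells is assumed here, and the function name `classify_cells` and the `grid[i][j]` layout are hypothetical. Cells enclosed by a hole's boundary would be labelled as interior by this simplified version.

```python
from collections import deque


def classify_cells(boundary, n, m):
    """Label the cells of an n x m raster as 'B' (border), 'I' (interior) or 'E' (exterior).

    boundary: iterable of (i, j) border cells produced by a four-connected rasteriser.
    Assumption: the exterior is found by a flood fill that starts at the frontier cells.
    """
    # Start by assuming every cell is interior; grid[i][j] with i horizontal, j vertical.
    grid = [['I'] * m for _ in range(n)]
    for i, j in boundary:
        grid[i][j] = 'B'

    # Seed the fill with every frontier cell that is not a border cell.
    queue = deque()
    for i in range(n):
        for j in range(m):
            on_frontier = i in (0, n - 1) or j in (0, m - 1)
            if on_frontier and grid[i][j] == 'I':
                grid[i][j] = 'E'
                queue.append((i, j))

    # Four-connected flood fill; every cell is enqueued at most once.
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < m and grid[a][b] == 'I':
                grid[a][b] = 'E'
                queue.append((a, b))
    return grid


# Usage: a square ring of border cells inside a 7 x 7 raster.
ring = [(i, j) for i in range(1, 6) for j in range(1, 6) if i in (1, 5) or j in (1, 5)]
labels = classify_cells(ring, 7, 7)
```

Because each cell is visited a bounded number of times, the sketch runs in time proportional to the number of cells.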

Multi-Sweeping
Because π is discrete, some changes to the classical sweep-line paradigm (explained in Section 2.1) are needed in the MSCA:

• Sorting of geometric objects is not needed as the cells in $\pi$ are organised clearly;
• $\pi$ is not infinite, but bounded by its frontier cells, i.e., $c_{i,0}$, $c_{0,j}$, $c_{n-1,j}$, $c_{i,m-1}$, $0 \le i < n$, $0 \le j < m$;
• $s(\alpha)$ does not move from an event to an event, but advances through the consecutive frontier cells.
The multi-sweep part of the MSCA is explained by the pseudocode shown in Algorithm 1. An initialisation of variables is performed in Lines 8-10. The function in Line 14 (considered later in Algorithm 2) returns the endpoints $(x_1, y_1)$ and $(x_2, y_2)$ of the sweep-line segment $s(\alpha)$. The function also sets the flag, indicating whether the whole $\pi$ has been swept. If it has not, the intersections between $s(\alpha)$ and cells with $a_{i,j} = B$ are calculated by the function in Line 16. The midpoints $t_{i,j}$ between these intersections, which are inside $S$, are calculated and returned in sequence $T = \langle t_{i,j} \rangle$ by the function in Line 17. They are appended to previously determined midpoints to form a set of chains $L = \{L_i\}$, $L_i = \langle t_{i,j} \rangle$. The chains are controlled by two sweep-line events as follows:
• Chain $L_i$ is created when the local shape feature is met by $s(\alpha)$ (Sweep-lines a and b in Figure 2);
• Chain $L_i$ is terminated when the local shape feature is swept completely (Sweep-line c in Figure 2).
In this context, the local shape feature is any concave part of $S$ (if $S$ is convex, only one chain is obtained during each sweep). These two events can, however, appear simultaneously at any position of the actual $s(\alpha)$. For example, the chain (or more of them) can be terminated, and another one (or more of them) can be created at the same time (see Sweep-line c in Figure 2). The opposite case is shown for Sweep-line d in Figure 2, where one chain is terminated and three new chains are born. The obtained chains are stored in the sweep-line status $SLS = \{L_i\}$. The whole process is repeated by increasing $\alpha$ in Line 22 by the user-defined parameter step. The MSCA terminates when $\alpha \ge 180^\circ$. It returns terminated chains, stored in SLS, for further processing.
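The creation and termination of chains can be illustrated by a deliberately simplified Python sketch for the single slope α = 0°, where the sweep-line coincides with one raster row at a time. Several assumptions are made here that are not taken from Algorithm 1: midpoints are computed as centres of maximal runs of shape cells instead of exact boundary intersections, and a change in the number of midpoints between consecutive rows is treated as the event that terminates all open chains and creates new ones. The names `row_midpoints` and `sweep_chains` are hypothetical.

```python
def row_midpoints(grid, j):
    """Midpoints of the maximal runs of shape cells ('B' or 'I') in row j of grid[i][j]."""
    n = len(grid)
    mids, start = [], None
    for i in range(n + 1):
        inside = i < n and grid[i][j] != 'E'
        if inside and start is None:
            start = i                      # a run of shape cells begins
        elif not inside and start is not None:
            mids.append(((start + i - 1) / 2.0, float(j)))  # centre of run start..i-1
            start = None
    return mids


def sweep_chains(grid):
    """Horizontal sweep (alpha = 0): link row midpoints into chains."""
    m = len(grid[0])
    finished, open_chains = [], []
    for j in range(m):                     # the sweep-line advances row by row
        mids = row_midpoints(grid, j)
        if len(mids) == len(open_chains):
            # Same local features as in the previous row: extend chains in order.
            for chain, t in zip(open_chains, mids):
                chain.append(t)
        else:
            # Assumed event rule: terminate all open chains and create new ones.
            finished.extend(open_chains)
            open_chains = [[t] for t in mids]
    finished.extend(open_chains)
    return finished


# Usage: a U-like shape given row by row (row 0 is the base of the U).
rows = ["BBBBBBB", "BBEEEBB", "BBEEEBB"]
grid = [[rows[j][i] for j in range(len(rows))] for i in range(len(rows[0]))]
chains = sweep_chains(grid)
```

For the U-like example, one chain is produced for the base and one for each of the two prongs, which mirrors the case where one chain is terminated and several new ones are created at the same sweep-line position.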
It is obvious that the cardinality of SLS depends on the local shape's features and the value of the parameter step. Although its actual value is not critical, some reasonable guidelines should be considered:
• Too small values result in many similar (or even equal) chains, which do not contribute additional information to the shape characterisation and slow down the whole process.
• Large values may cause some local feature to be missed if the filtering process, as described in Section 3.3, is applied.
• It is practical that step is an integer divisor of $180^\circ$.
Various values of step were evaluated in our experiments; the values $step = i \cdot 15^\circ$, $i = 1, 2, \ldots, 11$, yielded the best results.

SLS ← ∅ ▷ Sweep-line status is empty at the beginning
T ← CalculateMidPoints(borderPixels)

The pseudocode in Algorithm 2 determines the endpoints of $s(\alpha)$. The function in Line 7 returns the necessary frontier cells one by one until all of them are used. The returned frontier cells depend on $\alpha$ as shown in Figure 3, where these cells are coloured in blue. The frontier cell $(x_1, y_1)$, obtained by this function, is the first sweep-line coordinate. The second one is obtained by the function in Line 9. This function calculates the point at which the line passing through cell $(x_1, y_1)$ with slope $\alpha$ pierces $BBox(S) = (0, 0, n, m)$.

Algorithm 2 Algorithm returns the sweep-line's endpoints.
n, m: the resolution of π
4: $x_1$, $y_1$, $x_2$, $y_2$: the endpoints of the sweep-line, returned by the function

The results of multi-sweeping for slopes $\alpha = i \cdot step$, $i \in \{0, 1, 2, 3, 4, 5\}$, and $step = 30^\circ$ are shown in Figure 3. Midpoints that have already been determined by $s(\alpha)$ are coloured in red. Meanwhile, midpoints that lie in front of $s(\alpha)$ and are yet to be discovered are plotted in grey.
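The computation of the second sweep-line endpoint can be sketched in Python as below. This is a hedged illustration rather than the function from Algorithm 2: the name `second_endpoint` is hypothetical, the line is followed from $(x_1, y_1)$ in the direction $(\cos\alpha, \sin\alpha)$, and the bounding box is assumed to be $(0, 0, n, m)$.

```python
import math


def second_endpoint(x1, y1, alpha_deg, n, m):
    """Point where the ray from (x1, y1) with slope alpha leaves the box (0, 0, n, m).

    Assumes (x1, y1) lies inside or on the box and 0 <= alpha_deg < 180.
    """
    a = math.radians(alpha_deg)
    dx, dy = math.cos(a), math.sin(a)
    eps = 1e-12                      # treat tiny components (e.g. cos 90 deg) as zero
    t_max = math.inf
    # Largest parameter t that keeps x1 + t*dx inside [0, n].
    if dx > eps:
        t_max = min(t_max, (n - x1) / dx)
    elif dx < -eps:
        t_max = min(t_max, (0 - x1) / dx)
    # Largest parameter t that keeps y1 + t*dy inside [0, m].
    if dy > eps:
        t_max = min(t_max, (m - y1) / dy)
    elif dy < -eps:
        t_max = min(t_max, (0 - y1) / dy)
    return (x1 + t_max * dx, y1 + t_max * dy)


# Usage: a sweep-line of slope 45 degrees starting at the origin of a 10 x 5 box.
x2, y2 = second_endpoint(0, 0, 45, 10, 5)
```

The cost is constant per sweep-line, since only the four box sides are examined.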

Finalisation
Finalisation consists of three parts: chain filtering, chain vectorisation, and normalisation.
Chain filtering: Figure 4 shows the remaining chains in SLS after filtering.
Chain vectorisation: Round-off errors in the raster space $\pi$ are, unfortunately, unavoidable. Therefore, it is favourable to vectorise $L_i \in SLS$ to minimise the effect of the round-off errors in the further characterisation process. The well-known Douglas-Peucker algorithm [44] was applied on $L_i \in SLS$. The set of polylines $PL = \{PL_i\}$ was obtained, which replaced SLS in the further steps of the algorithm.
Normalisation: The normalisation is performed to make the characterisation of $S$ insensitive to scaling or rotation. $BBox(S) = BBox(0, 0, n, m)$ is transformed into a normalised bounding box $BBox^*(S)$ according to (3), and after that, the polylines $PL_i \in PL$ are transformed similarly.
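For readers unfamiliar with the vectorisation step, a compact recursive Python version of the Douglas-Peucker algorithm is sketched below. The tolerance `epsilon` is an assumed parameter, since the value used for the chains is not stated in the text, and the function names are chosen for illustration only.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate case: a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify a chain of points into a polyline with tolerance epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior point farthest from the line through the endpoints.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]                # all interior points are close enough: drop them
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right         # avoid duplicating the shared point


# Usage: a slightly noisy chain followed by a sharp turn.
chain = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 3)]
polyline = douglas_peucker(chain, 0.5)
```

Small deviations caused by round-off in the raster are absorbed by the tolerance, while genuine direction changes of a chain survive as polyline vertices.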

Time Complexity of the Algorithm
The MSCA operates in discrete space $\pi$, which consists of equally sized cells $c_{i,j}$, $0 \le i < n$, $0 \le j < m$. There are, altogether, $k = n \times m$ cells. The forming of $\pi$ with all $k$ cells is performed in linear time $O(k)$. $S$ is then embedded into $\pi$ to determine boundary cells, and after that, the remaining cells are classified as being either inside or outside of the shape. Each cell is visited only once, and therefore, the classification of all cells is performed in $O(k)$. It can, therefore, be concluded that the initialisation is performed in linear time $O(k)$.
The main part of the MSCA is multi-sweeping. Let us consider the whole sweep-line process for the given slope $\alpha$. Sweep-lines are sent through $m + n$ frontier cells. The first coordinate of each $s$ is determined in this way, while the second is calculated in constant time $O(1)$ by determining the intersection of the bounding box and $s$. $s$ is then rasterised, and the exact intersection points are calculated for cells with the attribute $a_{i,j} = B$. The number of boundary cells on $s$ is considerably smaller than $k$, and as the calculation of the intersection points is performed in constant time, all intersection points on a sweep-line are obtained in $O(1)$. The midpoints $t_{i,j}$ between the obtained intersection points being inside $S$ are calculated after that in $O(1)$. The sequence of midpoints $T$ is obtained in this way. Midpoints from $T$ are then concatenated to chains $L$. However, as the number of chains is significantly smaller than $k$, this task is also completed in $O(1)$. We have already stated that the count of all sweep-lines at an arbitrary angle $\alpha$ is at most $m + n \ll k$. However, all cells that form $\pi$ are visited during one sweep-line process, and therefore, the whole $\pi$ is swept in $O(k)$. The sweeping is repeated multiple times at various slopes. The number of slopes is considerably smaller than $k$; therefore, the time complexity of all different slopes remains $O(k)$.
Finalisation consists of three steps and operates only on obtained chains consisting of midpoints stored in SLS. As the number of midpoints in SLS is significantly smaller than $k$, it can be accepted that the finalisation is performed in constant time $O(1)$. It can, therefore, be concluded that the proposed MSCA works in linear time $O(k)$, where $k$ is the number of cells defining the raster space $\pi$.
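The argument above can be condensed into a single expression. The symbol $q$ for the number of slopes is introduced here only for this summary, under the stated assumption that it is a small constant compared with $k$:

```latex
T_{\mathrm{MSCA}}(k)
  = \underbrace{O(k)}_{\text{initialisation}}
  + \underbrace{q \cdot O(k)}_{\text{multi-sweeping}}
  + \underbrace{O(1)}_{\text{finalisation}}
  = O(k),
\qquad q = \left\lceil \frac{180^\circ}{step} \right\rceil \ll k .
```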

Experiments
This section consists of two parts. The information about 12 testing shapes is given first, and the results of the MSCA are presented on them. The efficiency of the method was evaluated after that by measuring the CPU time spent on single- and multi-threaded implementations. In the second part, the MSCA was applied to find equal shapes, some of which were rotated and scaled. For this, a characterisation vector $V(S)$ was constructed for each shape and compared against the characterisation vectors of the other shapes.

Demonstration of MSCA on Testing Shapes
Twelve shapes, shown in Figure 5, were used in the experiments. Their borders were described by the Freeman chain code in eight directions [38]. The properties of these shapes and the number of detected chains are collected in Table 1. In the continuation, the results obtained by the MSCA for two shapes, Circle and Cupid, are shown in Figure 6 and Figure 7, respectively. Circle is the simplest shape, where the chains are in the form of straight lines. Cupid, in contrast, is a challenging shape containing holes and many concave parts. As can be seen, the MSCA handled both shapes successfully. It should be noted that the characterisation of these shapes can be performed equally successfully with other values of step as long as the guidelines given in Section 3.2 are followed.
The CPU times spent by the MSCA are shown in Table 2. A personal computer was used with an Intel i9-12900K CPU and 64 GB of DDR5 RAM running Windows 11. The MSVC compiler for C++, along with Microsoft Visual Studio 2022, was used for development and compilation. Two versions of the MSCA were implemented: a single-threaded one and a multi-threaded one using 12 threads. As shown, the multi-threaded implementation reduced the processing time considerably only for shapes with the larger BBox.

Recognition of Equal Objects
Arbitrarily selected shapes from Figure 5 were used for this experiment. Some of them were rotated by a multiple of $90^\circ$, and some of them were enlarged by a factor of two, while the remaining shapes were just copied. The set of shapes obtained in this way is shown in Figure 8. The aim of the experiment was to find equal objects, regardless of whether they were rotated, scaled, or just duplicated. For this, characterisation vector $V_i(PL)$ for shape $S_i$ was constructed using the set of polylines $PL$, produced by the MSCA. Various characteristics can, of course, be designed. $V_i(PL)$ was formed in this experiment as follows:

• For each polyline $PL_k \in PL$, $0 \le k < |PL|$, its length was calculated according to (4) as the sum of $D$ over the polyline's segments, where $D$ denotes the Euclidean distance between consecutive polyline points.
• Components $V_i^k$ of an individual vector $V_i$ were sorted after that in decreasing order.
Two shapes $S_i$ and $S_j$ are considered equal when $|V_i| = |V_j|$, where $|\cdot|$ denotes the cardinality of vectors, and, if this condition is true, $V_i^k \approx V_j^k$ for every $k$, where $\approx$ corresponds to a user-defined 5% tolerance. This tolerance was determined experimentally as the best compromise between tolerating the rounding errors and discriminating similar, yet different, shapes (e.g., a circle and an ellipse). The rounding errors, unfortunately, cannot be avoided during the sweep-line rasterisation process and geometric transformations of the shapes. Table 3 reports the results of these experiments. The MSCA, with the proposed characterisation vector, found equal shapes successfully in all cases, regardless of their rotation and/or scaling. The proposed approach can also be adjusted to detect shapes that are not perfectly equal by softening the above two conditions. For example, the cardinalities $|V_i|$ and $|V_j|$ can be considered the same by allowing some variation, and a tolerance larger than 5% can be used. However, these parameters should be determined by the user according to the specific application.
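The construction of the characterisation vector and the two equality conditions can be sketched in Python as follows. The polylines are assumed to be already normalised, the function names are hypothetical, and the 5% tolerance is interpreted here as a relative difference with respect to the larger of the two compared lengths, which is an assumption of this sketch.

```python
import math


def characterisation_vector(polylines):
    """Polyline lengths (sums of Euclidean segment lengths), sorted in decreasing order."""
    lengths = [
        sum(math.dist(p, q) for p, q in zip(pl, pl[1:]))
        for pl in polylines
    ]
    return sorted(lengths, reverse=True)


def shapes_equal(v_i, v_j, tol=0.05):
    """Equality test: same cardinality, then component-wise agreement within tol."""
    if len(v_i) != len(v_j):
        return False
    # Assumed interpretation of the tolerance: relative to the larger component.
    return all(abs(a - b) <= tol * max(a, b) for a, b in zip(v_i, v_j))


# Usage: a shape and its copy rotated by 90 degrees give the same vector.
shape_a = [[(0, 0), (3, 4)], [(0, 0), (1, 0)]]
shape_b = [[(0, 0), (-4, 3)], [(0, 0), (0, 1)]]
same = shapes_equal(characterisation_vector(shape_a), characterisation_vector(shape_b))
```

Sorting the lengths makes the comparison independent of the order in which the sweeps produced the polylines; softer matching can be obtained by raising `tol`.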
Table 3. Equality of the objects, where letters refer to the shapes from Figure 8. 2 Non-equality of a pair of shapes. 3 Equality of a pair of shapes.

Conclusions
A new method for the characterisation of geometric shapes was presented in the paper. It was based on the sweeping paradigm, used frequently in traditional Computational Geometry. However, in this work, it was adapted for the raster space. The geometric shape was swept by following the frontier cells of the rasterised plane. The interior points of the shape lying on the sweep-line, midway between its boundary crossings, were determined during each sweep step and connected in chains. Their construction was controlled by two sweep-line events, which used the local characteristics of the considered shape to determine the beginning and ending of each chain. The sweeping process was then repeated using different slopes of the sweep-line. The chains were filtered, vectorised, and normalised after that. As a result, a set of polylines was obtained, and various characterisation vectors can be extracted from it. The proposed approach utilised the local symmetry of the geometric shapes to recognise their possible similarity, without the need to detect the symmetry explicitly. If the extraction of the local symmetrical features was the goal, the algorithm could also be generalised to produce such an output.
For the proof of concept, the method was implemented within the Multi-Sweep Characterisation Algorithm (MSCA). Its correctness and computational load were demonstrated on twelve testing shapes of different sizes and complexities (from the simplest circle to shapes with many concave parts and holes). Single- and multi-threaded implementations of the MSCA were tested, where the multi-threaded implementation was considerably faster for larger shapes. Finally, the results of the MSCA were used for finding equal shapes in the scene. A simple characterisation vector consisting of normalised polylines' lengths was constructed for each shape. The proposed approach reliably identified equal shapes in all cases, regardless of their rotation or scaling.
The MSCA offers many new challenges for further research. Although its theoretical time complexity is linear with regard to the number of cells of the raster space $\pi$, it turned out to be rather slow for a large number of cells and for shapes containing holes and many concave parts. Therefore, it would be worth investigating whether hierarchical spatial data structures, such as quadtrees/octrees or kd-trees, would accelerate the algorithm. For the proof of concept, the MSCA was implemented in 2D. Theoretically, it should also work in higher dimensions. Therefore, a 3D implementation will be performed in the future. The parameters of the MSCA were determined empirically in this research. It would be important to determine theoretically how to set these parameters optimally. New characterisation features can be constructed from the obtained chains, e.g., features that measure the waviness or count the junctions that occur when joining chains from sweep-lines with various angles. In addition, the result of the MSCA could be combined with other characterisation methods, for example with shape skeletons. Finally, the sweep-line itself could be replaced by different investigating elements; conics (i.e., a circle or circular arcs) should be the first choice.

Algorithm 1 The multi-sweep-line part of MSCA.
1: function MULTI-SWEEP(step, size, n, m, π)
2: step: an increment of the sweep-line slope
3: size: the size of the cell
4: n, m: the resolution of π
5: π: rasterised plane with embedded geometric shape S

Figure 2. Common sweep-line events (the arrow denotes the sweep-line moving direction, while the sequence of red dots belongs to the chains $L_i$; characteristic positions of the sweep-line are marked with characters a-e).

Figure 8. Shapes used for detection of equality; the shapes are denoted by letters A-P.
Chain filtering:
(a) If $|L_i| \le \Delta$, then chain $L_i$ is removed from SLS. $|L_i|$ denotes the number of points in $L_i$, while $\Delta$ is a threshold denoting the minimal number of midpoints in the chain. It may be determined by users or by heuristics. An example of such a heuristic used in the experiments presented in Section 4 is given in (1).
(b) The average angle $\bar{\alpha}$ of the lines determined by the sequential pairs of midpoints $t_{x,y} \in L_i$ is calculated. $L_i$ is accepted if $\bar{\alpha}$ is close to being perpendicular to $\alpha$, i.e., if the heuristic given in (2) is valid:
$90^\circ - 0.25 \cdot step \le |\bar{\alpha} - \alpha| \le 90^\circ + 0.25 \cdot step$. (2)
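The two filtering criteria can be sketched in Python as below. The function name `keep_chain` is hypothetical, the threshold `delta` is passed in directly because heuristic (1) is not reproduced here, and the average angle is taken as a plain arithmetic mean of segment directions folded into [0°, 180°), which is an assumption that can misbehave for chains whose directions straddle the 0°/180° wrap-around.

```python
import math


def keep_chain(chain, alpha_deg, step_deg, delta):
    """Return True if the chain passes the length test (a) and the angle test (b)."""
    # (a) Chains with at most delta midpoints are removed; a single point has no direction.
    if len(chain) <= delta or len(chain) < 2:
        return False
    # (b) Direction of every segment between sequential midpoints, folded into [0, 180).
    angles = [
        math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        for (x0, y0), (x1, y1) in zip(chain, chain[1:])
    ]
    mean_angle = sum(angles) / len(angles)   # assumed: plain arithmetic mean
    diff = abs(mean_angle - alpha_deg)
    # The chain is accepted if it is nearly perpendicular to the sweep-line.
    return 90.0 - 0.25 * step_deg <= diff <= 90.0 + 0.25 * step_deg


# Usage: a vertical chain found by a horizontal sweep-line (alpha = 0, step = 30).
accepted = keep_chain([(3, 0), (3, 1), (3, 2), (3, 3)], 0.0, 30.0, 2)
```

With step = 30°, the accepted window is 82.5° to 97.5°, so chains running roughly along the sweep-line itself are discarded.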

Table 1. Properties of S.

Table 2. CPU time of the MSCA spent for different shapes.