1. Introduction
Tornado-induced treefall patterns provide a physically based proxy for estimating tornado intensity under the Enhanced Fujita (EF) scale and for characterizing near-surface wind fields in forested regions [1,2,3]. The orientation, density, and spatial organization of fallen trees have been used to infer predominant wind direction, assess relative wind intensity, and support EF damage assessments. This is particularly important in non-urban areas, where anthropogenic damage indicators are scarce and the current EF scale relies primarily on natural features such as trees. These treefall-based approaches will be formally adopted in an upcoming ASCE Standard for Tornado Wind Speed Estimation [4].
In such methods, the number of fallen trees, their spatial density, and their fall directions directly inform analytical estimates of wind speed and hazard. Consequently, improving instance-level detection accuracy is essential for producing reliable treefall-derived wind-field reconstructions.
To this end, a preceding study automated the analysis of tornado-induced treefall using high-resolution uncrewed aerial system (UAS) imagery and deep-learning techniques [5]. That work introduced a YOLO11 instance segmentation pipeline to identify fallen trees and root balls, followed by a geometry-driven estimation of fall direction through tree taper rate analysis and the generation of wind-direction maps. Although the YOLO11x-seg model achieved strong overall accuracy, two well-known issues in instance segmentation remained especially pronounced in tornado-affected forest imagery, where fallen trees often lay close together and root balls were partially hidden [6,7,8].
- (1)
Fragmented masks: single objects split into multiple parts
In some cases, the instance segmentation model split a single fallen tree or root ball into two or more disconnected mask regions because of heavy overlap between objects, the irregular shape of fallen trees, shadows, and local pixel distortions. Figure 1a shows a tree trunk represented by separate mask fragments within the same detection from YOLO11x-seg. Tree-level delineation in Figure 1 was derived directly from YOLO11 instance segmentation masks, rather than bounding boxes, using polygon-based mask extraction and subsequent geometric analysis. In Figure 1a,c, blue polygons are the treefall predictions and green polygons are the root ball predictions generated by YOLO11x-seg before any post-processing, shown with their corresponding confidence scores. YOLO/COCO-style mask polygon exporters do not allow holes in the masks. Therefore, when the model returned a fragmented mask for a single object, the export may have retained only one polygon (often the first or largest-area fragment) and discarded the remaining sections. This led to an underestimation of tree length and propagated errors into subsequent feature-based analyses. Figure 1b illustrates how only one fragment (the one with the lowest Y coordinates) was recorded, resulting in inaccurate length and orientation. In Figure 1b,d, the blue polygons represent the retained part of the instance mask, and the red arrows represent the orientation vectors computed after taper analysis [5].
- (2)
Duplicate detections: multiple predictions for the same object
In other cases, the model detected a single tree or root ball more than once, producing overlapping predictions, as shown for a different instance in Figure 1c. Duplicate predictions occurred frequently in natural imagery, particularly for elongated fallen trees in overlapping arrangements. These duplicates inflated the total object count and introduced errors in density-based analyses, where knowing the exact number of fallen trees was critical [3]. Figure 1d shows duplicated predictions that resulted in overlapping treefall vectors in the analysis. These limitations motivated the development of the shape-aware post-processing framework introduced later in this study.
Need for Geometry-Based Post-Processing
Although YOLO11 applied built-in Non-Maximum Suppression (NMS) to filter redundant predictions, this method was optimized for isolated objects [9]. The main limitation of NMS was that it operated at the bounding-box level [10]: it kept the highest-confidence detection for an object and removed any other detection whose Intersection over Union (IoU) with it exceeded a specified threshold. This approach was not optimal in crowded regions where bounding boxes frequently overlapped [11].
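The box-level behavior described above can be illustrated with a minimal sketch of greedy NMS. The function names, the example boxes, and the 0.5 threshold are assumptions chosen for illustration and do not reproduce the YOLO11 implementation.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr=0.5):
    """Return indices of the boxes kept by greedy box-level NMS."""
    # Visit detections from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(box_iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-identical boxes and one distant box.
boxes = [(0, 0, 100, 20), (2, 1, 102, 21), (0, 200, 100, 220)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_thr=0.5)
print(kept)
```

In this toy case the second box is suppressed because its IoU with the first exceeds the threshold. The same rule would also suppress a distinct neighboring trunk whose bounding box happened to overlap strongly.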
In practice, tornadoes produced extreme damage, with many fallen trees lying parallel and clustered, and in some cases, root balls located close to one another. Under these conditions, bounding-box-based NMS often dropped fragments of a single tree or failed to suppress duplicates effectively. At the same time, root balls and fallen trees had irregular geometry: fallen trees were elongated, while root balls were approximately elliptical in shape. Therefore, due to differences in their spatial and physical structures, applying the same post-processing strategy for both classes was not optimal.
Accurate treefall geometry is required to estimate fall direction and generate wind patterns for reliable storm-intensity assessment. Therefore, improving mask quality is a critical step in the workflow. The next section reviews previous approaches to these instance-segmentation problems, including the post-processing and merging strategies used to improve mask quality. These prior studies provided the foundation for the shape-aware post-processing method described in Section 3 (Methodology), which accounts for the actual mask geometry rather than relying solely on bounding-box overlap.
2. Previous Work
The effectiveness of a deep-learning segmentation model depends heavily on post-processing refinement after inference. In tornado-damaged forests, where trunks, branches, and root balls often overlap, the baseline YOLO11 detections frequently contained fragmented masks and duplicate objects [6,12]. These problems arose from both the complex scene geometry and the limited geometric awareness of most segmentation frameworks.
Over the past two decades, a variety of post-processing methods have been developed to improve the integrity of segmentation outputs. These methods can be broadly grouped into three methodological categories:
- (1)
Overlap-based filtering, which eliminates redundant detections through spatial overlap analysis.
- (2)
Region-growing and watershed-based merging, which reconstructs fragmented areas based on local similarity and continuity.
- (3)
Geometry-based reconstruction, which recovers the physical shape of natural objects through mathematical boundary modeling, such as the α-shape.
Each method category addresses a distinct aspect of the overall problem, yet the focus has remained on computational efficiency and the preservation of geometric details. The following sections describe how these developments collectively motivated the shape-aware post-processing framework proposed in this study.
2.1. Overlap-Based Methods
Among the earliest and most widely adopted strategies for improving mask quality in instance segmentation were Non-Maximum Suppression (NMS) and Non-Maximum Merging (NMM), both designed to handle overlapping predictions. These methods evaluated the degree of overlap between instances using the Intersection-over-Union (IoU) metric. When the overlap exceeded a defined threshold, NMS removed the detection with the lower confidence, while NMM combined the two into a single, more complete mask [12].
At the core of early post-processing pipelines was a simple idea: if two detections substantially overlapped, the strongest detection was retained and the others were suppressed. The IoU ratio between bounding boxes, which quantifies the shared area relative to their union, was used to detect duplicate detections efficiently [10,13]. However, the box-level convenience introduced a geometric compromise. For elongated or curved objects, such as fallen trees, rectangles misrepresented the real boundary, causing NMS to mistakenly identify distinct trees as duplicates. This limitation was frequently highlighted in studies on high-resolution scenes, where rigid IoU thresholds could not follow irregular natural shapes [12,13].
To reduce that mismatch, researchers moved from boxes to masks. Rather than comparing rectangles, they compared the pixel-level overlap of the predicted instance masks. This mask-area formulation preserved irregular or non-rectangular objects better than box-based comparison, although the computational cost increased [11]. The trade-off was clear: box-IoU NMS remained extremely fast because it used only four coordinates per bounding box, making it suitable for real-time analysis, whereas mask-level comparison captured finer geometric detail at a higher computational cost. To accelerate the analysis, R-tree spatial indexing was used to skip comparisons between far-apart detections and reduce redundant IoU checks on tiled orthomosaics; yet box-IoU still could not handle non-rectangular shapes [7].
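The contrast between box-level and mask-level overlap can be made concrete with a small sketch. Two thin, parallel, diagonal "trunks" are rasterized as pixel sets; their bounding boxes overlap heavily even though the masks share no pixels. The trunk geometry and all function names are hypothetical and serve only to illustrate the effect.

```python
def trunk_pixels(offset, length=100, width=3):
    """Pixel set of a thin diagonal trunk along y = x + offset (illustrative)."""
    return {(x, x + offset + w) for x in range(length) for w in range(width)}


def bounding_box(pixels):
    """Axis-aligned box (x1, y1, x2, y2) enclosing a pixel set."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def box_iou(a, b):
    """IoU of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def mask_iou(a, b):
    """IoU of two masks represented as sets of (x, y) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


trunk_a = trunk_pixels(offset=0)
trunk_b = trunk_pixels(offset=10)   # parallel trunk, 10 pixels away

iou_box = box_iou(bounding_box(trunk_a), bounding_box(trunk_b))
iou_mask = mask_iou(trunk_a, trunk_b)
print(f"box IoU = {iou_box:.2f}, mask IoU = {iou_mask:.2f}")
```

With a typical box-IoU threshold, these two distinct trunks would be treated as duplicates, whereas the mask-level comparison correctly reports no overlap.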
Later studies combined the efficiency of IoU-based filtering with adaptive, context-aware reasoning. Adaptive thresholds allowed overlap tolerance to vary with local detection density [14], while other approaches scaled bounding boxes according to curvature to better fit irregular outlines [15]. Another approach was the development of mask-aware NMS methods, which preserved the natural contours of complex shapes by combining the speed of box-level IoU with the precision of pixel-level comparison [11]. Collectively, these refinements made overlap filtering more flexible and more faithful to object geometry without sacrificing computational efficiency.
Even so, overlap rules remained limited for elongated or fragmented natural forms. In natural scenes, overlapping boxes could wrongly eliminate valid trees as duplicates, and none of these methods addressed the fragmented-mask problem, a critical issue for applications such as the present study that require accurate object geometry [13].
2.2. Region-Growing Watershed Methods
In high-resolution damage mapping, region-growing and watershed-based methods have long served as practical tools for refining the fragmented outputs of instance segmentation models. For treefall and root ball detection, these techniques were particularly effective for connecting partial mask fragments that represented different sections of the same tree. The main idea was straightforward: when nearby pixels had similar spectral or spatial properties, such as tone, color, or surface texture, they were progressively merged into continuous regions. As a result, scattered or incomplete detections were transformed into coherent shapes that better traced fallen trunks or uprooted root balls [16].
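The merging idea can be sketched with a minimal seed-based region-growing routine. This is an illustrative variant that compares each neighbor with the seed value on a single-band toy image; the function name, the tolerance values, and the image itself are assumptions rather than any cited method.

```python
from collections import deque


def region_grow(image, seed, tol=10):
    """Grow a 4-connected region from `seed`.

    A neighbouring pixel joins the region when its value differs from the
    seed value by at most `tol`.
    """
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_val) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy image: a bright horizontal trunk (row 1) with one shadowed pixel (120).
image = [
    [20, 20, 20, 20, 20, 20],
    [200, 205, 198, 120, 202, 199],
    [20, 20, 20, 20, 20, 20],
]
strict = region_grow(image, seed=(1, 0), tol=10)   # stops at the shadow
loose = region_grow(image, seed=(1, 0), tol=90)    # bridges the shadow
print(len(strict), len(loose))
```

The strict tolerance leaves the trunk split at the shadowed pixel, while the loose tolerance reconnects it. This is the dependence on local contrast that makes such methods sensitive to lighting and shadows.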
Early studies in remote sensing demonstrated the potential of this approach by applying spectral consistency checks to avoid over-merging adjacent regions [16]. Although these methods successfully joined fragmented regions in ultra-high-resolution scenes and produced clear boundaries, they remained highly dependent on local contrast [17]. In post-tornado forest imagery, lighting variation, exposed soil, and shadows often caused pixels from the same fallen tree to appear with different brightness values. As a result, the algorithm sometimes failed to reconnect fragments or produced incorrect merges.
To reduce these limitations, later research integrated geometric rules into region-growing and watershed algorithms. Rather than relying only on tonal or spectral similarity, newer approaches considered the physical structure of the object itself. A shape-oriented watershed model that combined spectral homogeneity with geometric descriptors was introduced, allowing the process to follow curved and irregular boundaries [18]. In a similar effort, the method was refined by incorporating compactness and length criteria so that the merged regions better preserved the proportions of elongated objects, such as fallen tree trunks [17]. These advances shifted the focus toward hybrid models, where spectral similarity still guided pixel grouping, but spatial geometry helped maintain realistic natural continuity.
Even with these improvements, region-growing frameworks continued to struggle in low-contrast conditions where faint or irregular boundaries produced inconsistent and incorrect results [17]. To address this issue, boundary-continuity rules were added to guide adjacent segments to align with strong linear features such as tree lines or roads [19]. While this adjustment allowed partially visible objects to reconnect more smoothly and accurately, shadows, fallen tree crowns, and scattered debris still made edges difficult to follow. For treefall and root ball detection, such conditions demonstrated that relying on pixel similarity, even with added geometric adjustments, could not fully capture the spatial continuity of natural forms.
Research in recent years has shifted toward blending radiometric and geometric reasoning within a single framework. The goal has been to ensure that the resulting object conforms to its expected physical form. This involves using shape or directional cues to decide how fragments should connect to produce smoother, more complete masks [19,20]. Aligning fragments along major linear landscape features, such as tree lines, paths, or corridors, has made merged regions more realistic and continuous and noticeably improved the overall quality of reconstructed objects. The gradual move away from color-based merging has therefore led to what can be described as geometry-aware refinement, in which the merging process is influenced by the object’s overall form and spatial context. In treefall and root ball reconstruction, this shift determines whether the merged masks follow the actual outline of the trunks and the ellipsoidal structure of the root balls, or simply blend nearby pixels with similar tones.
2.3. Geometry-Based Methods
The α-shape approach represented a significant step toward exploiting object geometry. The method expanded on the convex hull concept, allowing the boundary to curve inward and follow the object’s actual shape. The α parameter controlled boundary detail: larger values produced smoother contours, while smaller ones traced fine features. This adaptability made α-shapes particularly useful for mapping natural forms such as tree crowns, downed tree trunks, and root ball cavities, where shapes rarely fit regular geometric patterns.
The idea was formalized by researchers who demonstrated that α-shapes captured the fine structure of an object far more accurately than a convex hull [21]. Later studies applied it to reconstruct building footprints from airborne-laser scans, achieving sub-meter accuracy while preserving small cavities [22]. Although these studies focused on human-made structures, the same geometric flexibility proved critical in post-storm landscapes where debris, vegetation, and soil formed uneven boundaries. In many cases, α-shapes were among the few tools capable of describing mixed natural and artificial forms without oversimplification.
Because the α parameter is tunable, it has been exploited to reconnect fragmented edges and restore object continuity [23]. A variable α has also been applied to 3D point clouds for canopy modeling, where dense regions received smaller α values to retain fine branch structures, while sparse regions required larger α values to avoid noise amplification [24]. These results showed that α controlled the balance between preserving geometric detail and smoothing the boundary. This flexibility was particularly valuable when merging fragmented detections in UAS-derived treefall masks with variable trunk curvature and debris density.
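The effect of α on boundary detail can be illustrated with a brute-force sketch. Here α is interpreted as the radius of a probing disk (an assumption about the convention), and an edge between two points is kept when exactly one of the two radius-α disks passing through both points contains no other point, so the edge separates the inside of the shape from the outside. The function name and the L-shaped point set are hypothetical, and the cubic running time makes this suitable only for small examples; practical implementations typically filter a Delaunay triangulation instead.

```python
import math
from itertools import combinations


def alpha_boundary_edges(points, alpha, eps=1e-9):
    """Brute-force regular boundary edges of a 2D alpha-shape (alpha = radius)."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        (px, py), (qx, qy) = points[i], points[j]
        d = math.hypot(qx - px, qy - py)
        if d == 0 or d > 2 * alpha:
            continue  # no disk of radius alpha passes through both points
        mx, my = (px + qx) / 2, (py + qy) / 2
        h = math.sqrt(max(alpha * alpha - (d / 2) ** 2, 0.0))
        ux, uy = -(qy - py) / d, (qx - px) / d  # unit normal to the edge
        empty_disks = 0
        for sign in (1, -1):
            cx, cy = mx + sign * h * ux, my + sign * h * uy
            if all(math.hypot(x - cx, y - cy) >= alpha - eps
                   for k, (x, y) in enumerate(points) if k not in (i, j)):
                empty_disks += 1
        if empty_disks == 1:
            edges.add((points[i], points[j]))
    return edges


def has_edge(edges, p, q):
    return (p, q) in edges or (q, p) in edges


# L-shaped set of grid points: a horizontal strip plus a vertical strip.
l_shape = ([(x, y) for x in range(5) for y in range(2)]
           + [(x, y) for x in range(2) for y in range(2, 5)])

tight = alpha_boundary_edges(l_shape, alpha=0.8)    # follows the concavity
loose = alpha_boundary_edges(l_shape, alpha=100.0)  # approaches the convex hull

# The long edge bridging the two arms of the L appears only for large alpha.
print(has_edge(tight, (4, 1), (1, 4)), has_edge(loose, (4, 1), (1, 4)))
```

With the small radius the boundary stays close to the two arms of the L, whereas the large radius adds the long bridging edge that a convex hull would contain.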
Because the α-shape adapted to local geometry, it accurately represented elongated and curved forms that box- or mask-based techniques tended to oversimplify. In forest applications, this adaptability captured the complex concavities and narrow vegetative gaps [22,23,24]. For post-storm mapping, it helped describe the asymmetric outlines of fallen trees and exposed root balls. By following the natural curvature rather than a rectangular mask boundary, α-shape reconstruction produced a geometry that more closely reflected the physical arrangement of debris on the ground.
Recent computational geometry research has expanded α-shape concepts to handle large, noisy datasets more efficiently. One influential study combined the α-shape method with a Delaunay-based concave-hull filter, making it possible to control the balance between boundary detail and smoothness [20]. In parallel, a graph-theoretic method, m-LCuts, was proposed to merge over-segmented objects using node-collinearity constraints, similar in logic to α-shapes yet more scalable for dense imagery [25]. Related advances demonstrated the benefit of geometry-aware connectivity, where directed-acyclic-graph segmentation maintained spatial cohesion among tree-crown components without sacrificing detail [26]. These studies showed how α-shape reasoning has continued to evolve through multi-scale and graph-based formulations for shape-aware refinement of complex natural scenes.
In tornado-damaged forest mapping, such geometric criteria became indispensable. The α-shape approach helped rebuild fragmented detections while retaining the natural curvature of fallen trunks and the ellipsoidal form of root balls. These geometric features were essential for estimating the treefall orientation and the root ball dimension and for interpreting wind direction and damage intensity. Within the overall post-processing workflow, the α-shape acted as a geometric bridge that linked small fragments into a single, physically consistent mask that matched the true structure of the damaged vegetation rather than being guided only by color or brightness similarity.
2.4. Knowledge Gap
The limitations identified across existing post-processing methods directly shaped the motivation for developing the shape-aware non-maximum suppression (SA-NMS) algorithm presented in this study. Each category of methods contributed useful ideas yet left critical gaps when applied to complex, tornado-damaged forest imagery. Table 1 summarizes the main works in each post-processing category, along with their strengths and limitations.
Overlap-based approaches, including NMS and NMM, remained computationally efficient [11,12,13]. As a result, they became the default choice in real-time deep-learning pipelines. However, they were insensitive to object geometry since they depended only on bounding-box or mask-level overlap, and often removed valid elongated detections in natural scenes. More importantly, these approaches only addressed the overlap problem and failed to merge fragmented masks. In forest imagery, this meant that two neighboring parallel fallen tree trunks could be misidentified as duplicates and one suppressed, even though both represented distinct objects.
Region-growing and watershed techniques reconnected fragmented masks and improved local continuity. However, because they relied heavily on pixel contrast, these methods often failed when lighting, shadows, debris, exposed soil, uprooted mounds, reflective surfaces, or local distortions altered pixel tone. For this reason, these algorithms did not generalize well across events. They could merge unrelated debris or miss faint sections, limiting their use for large-scale, multi-scene assessments.
Geometry-based models, such as the α-shape and graph-based variants, helped correct many geometric gaps by rebuilding concave and uneven boundaries [19,25,26]. The main limitation was the need for careful parameter adjustment and high computational cost. For large aerial surveys covering thousands of detections, these requirements made direct application impractical. Taken together, these findings highlighted the need for an approach that balances efficiency with geometric sensitivity and adaptability. This goal motivated the development of the shape-aware algorithm introduced here.
The proposed shape-aware post-processing method repaired and refined detection results instead of simply filtering them. By merging fragmented pieces using local α-shape boundaries and correlation analysis, it reconnected trunks or root balls into single objects. It also removed overlapping duplicate predictions, taking object geometry and prediction confidence into account rather than box overlap alone. The outcome was a cleaner, continuous map that preserved the true geometry, retaining curved trunks and irregular outlines that box-based suppression would otherwise distort or discard.
The method also had limitations. It required tuning of several thresholds, including the α value controlling boundary flexibility, the correlation coefficient r linking fragments, and the overlap tolerance τ determining when merging should occur. These settings varied slightly with image scale or canopy density and might require minor adjustment for different datasets. The process was also computationally heavier than regular IoU-based filtering because it performed local geometric fitting and correlation checks instead of one-step suppression. Even so, the added processing time was moderate, and the improvement in mask accuracy and interpretability generally outweighed the extra computation, especially for high-resolution forest-damage assessments.
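To make the role of the correlation coefficient r more concrete, the following is a minimal sketch of one plausible collinearity check between two fragments. It pools the points of both fragments and computes the Pearson correlation of their x and y coordinates. The pooling strategy, the function names, and the threshold of 0.95 are assumptions for illustration only; the exact formulation used by the method may differ.

```python
import math


def pearson_r(points):
    """Pearson correlation between the x and y coordinates of a point list."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    if sxx == 0 or syy == 0:
        # Axis-aligned point sets are degenerate for this measure;
        # this sketch does not handle them.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def likely_same_trunk(frag_a, frag_b, r_thr=0.95):
    """Hypothetical test: pooled points of both fragments lie near one line."""
    return abs(pearson_r(frag_a + frag_b)) >= r_thr


frag_1 = [(0, 0), (10, 5), (20, 10)]
frag_2 = [(40, 20), (50, 25), (60, 30)]     # continues the line y = x / 2
frag_3 = [(40, -20), (50, -25), (60, -30)]  # bends away from frag_1

print(likely_same_trunk(frag_1, frag_2), likely_same_trunk(frag_1, frag_3))
```

A known weakness of this simple measure is that nearly horizontal or vertical trunks have almost no variance in one coordinate, so a practical implementation would need to rotate the points or use a principal-axis fit instead.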
3. Methodology
The segmentation models were trained using image tiles generated from the orthomosaics, each with a spatial dimension of 1024 × 1024 pixels. The training dataset consisted of 5400 image tiles with manually annotated instances across two classes: treefall and root ball. Pixel-level annotations were created to capture object geometry under varying lighting conditions, weather effects, and tree species. Annotation quality was reviewed manually (using an annotator and a separate reviewer for every image), and incomplete or ambiguous polygons were corrected prior to training. Preprocessing and data augmentation included random horizontal and vertical flips, small rotations, Gaussian blur, and minor geometric distortions to improve model robustness to variations in image resolution and illumination.
The work presented in this study focused on developing a shape-aware post-processing framework intended to refine the outputs of a YOLO11x instance-segmentation model. During the qualitative assessment of the results, it was observed that while the model performed well in identifying fallen trees and root balls, the presence of fragmented and duplicated detections in the baseline YOLO11 predictions often created unexpected errors in the analysis. These problems were especially visible along tile boundaries, where cross-tile continuity was lost, or within dense debris zones, where multiple treefalls overlapped. The proposed framework was designed to address these issues by introducing a geometry-based evaluation step that filtered duplicates, reconnected partial fragments, and merged spatially continuous detections into single, coherent objects.
A primary goal of this design was to balance mask integrity with correctness. Rather than merging masks solely based on overlap, each detection was evaluated according to how well its shape aligned with neighboring fragments. The goal was to determine whether the combined outline still followed the real edge of the same tree or root ball. If the geometry deviated excessively or stretched unnaturally, the merge was rejected. In this way, the refined masks remained close to what was actually visible in the imagery and preserved the size and shape of the original object instead of expanding or warping after combination. In summary, the process aimed to correct errors without distorting the natural shape of the object, which is essential for accurate orientation and root ball dimension analysis later in the workflow.
Finally, the framework was structured to support both treefall and root ball detections through class-specific post-processing criteria. The fragmented treefalls and root balls exhibited different geometric cues and spatial dimensions, which meant they could not be refined using the same rule set. In most images, the fallen trunks appeared elongated and slightly curved, often crossing tile boundaries or overlapping with other debris, while the root balls were more compact and roughly ellipsoidal. Because of this, the framework employed separate merging thresholds and geometric filters for each class. For the treefall detections, the main goal was to reconnect the broken trunk sections and remove duplicated masks that represented the same object. In contrast, the root ball refinements focused on keeping the area and ellipsoidal shape consistent after merging so that the refined masks retained the physical appearance visible in the orthomosaic. These small but class-specific steps ensured that the results appeared more natural and physically accurate within the overall map of tornado-damaged trees.
The proposed post-processing framework integrated deep learning segmentation outputs with sequential geometry-based refinement steps to improve the spatial consistency and completeness of the detected treefall and root ball instances. Figure 2 presents a conceptual overview of this workflow, illustrating how the baseline predictions generated by the YOLO11x segmentation model were progressively transformed into geometry-consistent and analysis-ready outputs. The flowchart follows a top-to-bottom progression in which each stage performs a distinct operation: detection, refinement, vectorization, and georeferencing.
At the start of the process, the YOLO11x model performed instance segmentation on the orthomosaic tiles and generated pixel-level masks for both classes. Immediately after this step, the Shape-Aware Non-Maximum Suppression (SA-NMS) module was applied. It operated directly after the YOLO11 baseline detection and before the predictions were stored, removing duplicate detections and merging fragmented instances that belonged to the same tree or root ball. Once these refined masks were saved and georeferenced, the next stage, tree-instance aggregation, linked detections that were disconnected within the same tile to form complete trees represented as geospatial vectors.
The final refinement step, edge-effect reduction, could be executed at either of two positions within the workflow, depending on the dataset. It was typically applied after the filtering stage or after the aggregation stage to ensure that the reconstructed trees remained continuous across adjacent tiles. Taken as a whole, the modules functioned as a single pipeline linking the deep-learning results to geometry-based rules. This integration produced outputs that were spatially consistent and physically realistic, suitable for later analyses such as orientation or root ball characterization.
3.1. Input Data and Model Output
The dataset used in this study was derived from high-resolution 2D orthomosaic imagery collected by uncrewed aerial systems (UAS) over tornado-affected forested regions. These large orthomosaic images were subdivided into smaller orthophotos to maintain clarity and resolution while ensuring computational efficiency. Each image tile had uniform dimensions and minimal distortion, consistent with the source’s nadir perspective. All image tiles were expressed in two coordinate systems:
- (1)
State Plane coordinates, used for local mapping and metric measurements.
- (2)
Pixel coordinates, which defined each object’s position within the 1024 × 1024-pixel tile and were derived from the State Plane coordinate system.
The goal of generating these images was to produce manageable input data before performing inference. This dual-coordinate representation enabled each detection to be expressed both in image space (used for model training and inference as well as geometric analysis) and in real-world coordinates (used for spatial analysis and map production).
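The dual-coordinate representation can be sketched as a simple affine mapping between pixel and map coordinates. The example below assumes a north-up tile with square pixels and no rotation; the class name, the origin values, and the 0.02 map-unit ground sampling distance are hypothetical placeholders rather than values from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileGeoref:
    """North-up georeference for one tile (all values hypothetical)."""
    x_origin: float  # map easting of the tile's upper-left corner
    y_origin: float  # map northing of the tile's upper-left corner
    gsd: float       # ground sampling distance, map units per pixel

    def pixel_to_map(self, col, row):
        # Columns increase eastward; rows increase southward.
        return (self.x_origin + col * self.gsd,
                self.y_origin - row * self.gsd)

    def map_to_pixel(self, x, y):
        return ((x - self.x_origin) / self.gsd,
                (self.y_origin - y) / self.gsd)


tile = TileGeoref(x_origin=500000.0, y_origin=200000.0, gsd=0.02)

# Lower-right corner of a 1024 x 1024 tile in map coordinates.
x, y = tile.pixel_to_map(1024, 1024)
col, row = tile.map_to_pixel(x, y)
print(x, y, col, row)
```

Keeping both directions of the transform makes it possible to run geometric analysis in image space and then export the refined polygons in map coordinates.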
Standard approaches to handling large images involved dividing them into smaller sections (tiling) or downscaling the full input. Downscaling reduced image clarity and blurred fine details. In contrast, tiling preserved the original ground sampling distance and allowed efficient processing on a standard workstation [27].
Accordingly, the orthomosaics were divided into tiles measuring 1024 × 1024 pixels with no overlap (0%), for both technical and practical reasons. This tile size was sufficient to capture small features such as root balls while providing adequate context for larger features like fallen trunks. At the same time, it satisfied the input size requirements of the YOLO11 architecture. Using smaller tiles (e.g., 512 × 512 pixels) would have hindered the model’s ability to recognize elongated features or clusters of nearby objects, whereas using larger tiles (e.g., 2048 × 2048 pixels) would have increased GPU memory requirements and slowed the detection process.
A small overlap of 10–20% was considered but ultimately not adopted due to its negative impact on both computational efficiency and detection consistency. Overlapping tiles would still split elongated fallen trees or root balls across adjacent tiles, while also introducing additional duplicate predictions in the overlapping regions. These duplications in overlapping regions would also require an additional refinement step, consequently increasing the processing time of the workflow. For these reasons, a non-overlapping tiling was selected to avoid edge-related duplication along tile borders.
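A non-overlapping tiling scheme of this kind can be sketched as a generator of pixel windows. The function name and the example orthomosaic size are assumptions; how partial tiles at the right and bottom edges were handled in practice (for example, by padding) is not specified here, so the sketch simply reports their reduced size.

```python
def tile_windows(width, height, tile=1024):
    """Yield (col_off, row_off, win_w, win_h) for a non-overlapping tile grid."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (col_off, row_off,
                   min(tile, width - col_off),    # clipped at the right edge
                   min(tile, height - row_off))   # clipped at the bottom edge


# Hypothetical orthomosaic of 2500 x 2048 pixels.
windows = list(tile_windows(2500, 2048))
covered = sum(w * h for _, _, w, h in windows)
print(len(windows), windows[2], covered == 2500 * 2048)
```

Because the windows neither overlap nor leave gaps, every pixel belongs to exactly one tile, which is why duplicates along tile borders cannot arise from the tiling itself.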
The ground sampling distance (GSD) for input data ranged from approximately 1.1 to 4.7 cm per pixel, depending on flight altitude and camera payload. Each tile thus represented an on-ground area of approximately 121–2500 m². The recommended size of 1024 × 1024 pixels was selected for consistency across the deep-learning model and post-processing stages. However, the algorithm can accommodate other input sizes when necessary.
The model outputs were stored in a YOLO-compatible text file format consistent with common dataset standards for object detection models. Each record included the class label (treefall or root ball), polygon vertices defining the mask outline, and the model-assigned confidence score. Each processing run was also accompanied by a metadata document that recorded the threshold values used during inference, the model mode (local or cloud), and the input image path for post-processing. The purpose of this metadata was to ensure reproducibility. Although bounding box parameters (center, width, and height) were also recorded, they were not used directly in this study, since post-processing operated on polygon-level masks. The shape-aware post-processing algorithm utilized these polygonal masks to assess geometric relationships, merge fragments, and eliminate duplicate detections.
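As an illustration of how such records might be read, the sketch below parses a single line under an assumed layout: a class index, followed by polygon vertices as normalized x y pairs, followed by the confidence score. This layout, the function name, and the sample record are assumptions; the files described above also contained bounding-box parameters, which this sketch omits, so the actual field order may differ.

```python
def parse_seg_record(line, tile_size=1024):
    """Parse one record assumed to be: class x1 y1 ... xn yn confidence.

    Coordinates are assumed to be normalised to [0, 1] and are converted
    to pixel units of a square tile.
    """
    tokens = line.split()
    class_id = int(tokens[0])
    values = [float(t) for t in tokens[1:]]
    # At least three (x, y) pairs plus one confidence gives an odd count >= 7.
    if len(values) < 7 or len(values) % 2 == 0:
        raise ValueError("expected >= 3 coordinate pairs plus a confidence")
    confidence = values[-1]
    coords = values[:-1]
    polygon = [(coords[i] * tile_size, coords[i + 1] * tile_size)
               for i in range(0, len(coords), 2)]
    return {"class_id": class_id, "polygon": polygon, "confidence": confidence}


# Hypothetical record: class 0, a four-vertex polygon, confidence 0.87.
record = parse_seg_record("0 0.10 0.20 0.50 0.20 0.50 0.25 0.10 0.25 0.87")
print(record["class_id"], len(record["polygon"]), record["confidence"])
```

Converting to pixel units at load time keeps the later geometric checks (areas, overlaps, boundary fitting) in a single, consistent coordinate space.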
3.2. Shape-Aware Non-Maximum Suppression (SA-NMS)
The Shape-Aware Non-Maximum Suppression (SA-NMS) framework was developed to address limitations in post-processing instance-segmentation predictions within crowded and natural scenes. Conventional IoU-based suppression struggled to represent the irregular and overlapping geometries that characterize natural environments, such as tornado-impacted forests. Because standard NMS relied on bounding-box overlap, these features often produced errors in detection and suppression.
Existing methods, such as mask-level NMS and Soft-NMS, improved upon the traditional approach by applying mask-IoU instead of box-IoU. However, because they relied on a single threshold value for the suppression criterion, they failed to capture the complexity of object geometry. Additionally, these methods focused solely on suppression and did not enhance the geometric integrity of the disconnected or fragmented masks. Therefore, they were unable to interpret the true spatial relationship between objects, particularly when shapes were elongated, fragmented, or concave.
In this study, a prediction refers to a single instance output generated by the YOLO11x-seg model within an image tile. Each prediction corresponds to one detected object and may consist of one or more disconnected mask polygons, hereafter referred to as fragments. Fragmentation occurs when elongated or irregular objects, such as fallen trees, are partially segmented into multiple components due to occlusion, shadows, or local image artifacts. The Shape-Aware Non-Maximum Suppression (SA-NMS) algorithm operates on the set of YOLO11 instance predictions rather than on raw image tiles. For each prediction, SA-NMS evaluates the geometry and spatial relationships of its associated fragments to determine whether fragments should be merged, separated into independent instances, or discarded. Duplicate instance predictions are then suppressed using mask-level overlap criteria. The complete SA-NMS workflow is described step-by-step in Algorithm 1.
| Algorithm 1. Shape-Aware Non-Maximum Suppression (SA-NMS). |
| Input: YOLO11x-seg predictions P = {Mi, Ci, CLi} |
| Where Mi = mask polygon, Ci = class label, CLi = confidence level |
| Parameters: |
| Amin= minimum segment area, α = alpha parameter, |
| rthr= Correlation threshold, Oxy= projected overlap threshold, |
| τ = mask overlap, |
| Aindep = minimum segment size to separate as an independent mask, |
| SubsamplingValue = contour subsampling interval, |
| minPoints = minimum number of contour points, |
| Output: Refined mask set P’ = {Mi, Ci, CLi} |
| // (1) Remove small or noise-like detections |
| 1: For each instance Mi ∈ P: |
| 2: if area(Mi) < Amin: |
| 3: discard Mi |
| // (2) Merge multi-segment predictions |
| 4: For each remaining instance Mi ∈ P: |
| 5: if num_segments(Mi) == 1: |
| 6: keep Mi |
| 7: else: |
| 8: if Ci == “root ball”: |
| 9: Mi ← α_shape_merge (Mi,α) |
| 10: else if Ci == “treefall”: |
| 11: r ← pearson_corr(segment_positions(Mi)) |
| 12: overlap ← compute_overlap_xy(Mi) |
| 13: if r > rthr and overlap < Oxy: |
| 14: Mi ← α_shape_merge (Mi,α) |
| 15: else if area(Mi) > Aindep: |
| 16: retain_individual_segments(Mi) |
| 17: else: |
| 18: discard Mi |
| // (3) Suppress duplicate detections using mask-level overlap |
| 19: For each mask pair (Mi, Mj) in P: |
| 20: if mask_overlap(Mi, Mj) > τ: |
| 21: keep ← larger_area(Mi, Mj) or higher_confidence |
| 22: discard(other) |
| 23: For each mask Mi ∈ P: |
| 24: Polygoni ← find_contour(Mi) |
| 25: if Ci == “treefall”: |
| 26: Polygoni ← subsample(Polygoni, SubsamplingValue) |
| 27: if num_points(Polygoni) ≥ minPoints: |
| 28: add {Mi, Ci, CLi} to P’ |
| 29: Return refined set P’ |
The first step identified the number of fragments contained within one prediction. This was a critical operation because instances consisting of a single unified mask were excluded from further geometric processing. This exclusion reduced computation time and prevented unnecessary evaluations. However, when a prediction contained more than one disconnected segment, the segment areas were calculated, and extremely small segments were treated as noise and removed. Subsequently, class-specific processing was applied according to the assigned label. For the treefall class, the workflow began by computing the Pearson correlation coefficient between mask fragments. This measure was appropriate for treefall detections because fallen trunks typically exhibited strong linear alignment.
- (1)
If the fragments were collinear (r > 0.8), an additional geometric criterion was applied before merging. High correlation could also occur when two parallel trees lay in close proximity to one another. To avoid false merging, each mask was projected onto the x- and y-axes, and the mask overlap was evaluated. Segments representing the same tree were expected to show limited overlap. When both conditions were satisfied, sufficient collinearity and acceptable overlap, the fragments were merged using the α-shape algorithm, which reconstructed a continuous boundary between them. This selective merging minimized unnecessary α-shape operations and improved computational efficiency.
- (2)
If the fragments were not collinear or exceeded the overlap threshold, their individual areas were evaluated. Only segments with sufficient area were retained as separate detections, ensuring that very small or noise-like fragments were excluded, while the preserved segments maintained adequate geometric detail after processing.
For the root ball class, a different rule was applied. Because root balls are compact and approximately circular, they did not require correlation analysis. Instead, their fragments were merged directly using the α-shape method, which reconstructed the full circular boundary without any additional geometric conditions.
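The class-specific decision described above can be sketched as follows for a prediction with two fragments. This is a simplified illustration under stated assumptions, not the original implementation: the correlation is computed on the pooled vertex coordinates of both fragments, its absolute value is used so that descending trunks are treated like ascending ones, and the area check applied to separated fragments is omitted. The default thresholds (0.8 and 50 pixels) are the values quoted in the text; all function names are hypothetical.

```python
import math


def pearson_r(points):
    """Pearson correlation between the x and y coordinates of a point list."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    syy = sum((p[1] - mean_y) ** 2 for p in points)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate axis-aligned case; would need a separate rule
    return sxy / math.sqrt(sxx * syy)


def interval_overlap(a, b):
    """Length of the overlap between the ranges spanned by two value lists."""
    return max(0.0, min(max(a), max(b)) - max(min(a), min(b)))


def decide(label, fragments, r_thr=0.8, overlap_thr=50.0):
    """Return 'merge' or 'separate' for a two-fragment prediction."""
    if label == "root ball":
        return "merge"  # compact objects: merged without a collinearity test
    frag_a, frag_b = fragments
    r = abs(pearson_r(frag_a + frag_b))
    ox = interval_overlap([p[0] for p in frag_a], [p[0] for p in frag_b])
    oy = interval_overlap([p[1] for p in frag_a], [p[1] for p in frag_b])
    if r > r_thr and max(ox, oy) < overlap_thr:
        return "merge"
    return "separate"


# Hypothetical fragments (pixel coordinates) along a diagonal trunk.
part_1 = [(0, 0), (100, 100), (105, 95), (5, -5)]
part_2 = [(150, 150), (250, 250), (255, 245), (155, 145)]   # same trunk, farther along
neighbor = [(40, -40), (140, 60), (145, 55), (45, -45)]     # parallel tree beside it

same_tree = decide("treefall", [part_1, part_2])
parallel_trees = decide("treefall", [part_1, neighbor])
```

In the second case the x-projections of the two fragments overlap by 65 pixels, so the fragments are kept apart regardless of how well they correlate.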
Once the merging and separation steps were completed, the framework performed a final cleanup to eliminate duplicate detections. During this stage, if the masks of two predictions overlapped beyond the chosen threshold (τ = 0.6), a non-maximum-suppression check was executed that considered both mask area and confidence score. Rather than automatically retaining the detection with the highest confidence score, the algorithm evaluated which mask represented the larger and more complete shape. In practice, this meant that a slightly lower-confidence mask was preserved if it captured the object’s full outline, while small or partial fragments were discarded. This approach maintained detections consistent with what was visible in the imagery and prevented incomplete segments from replacing full tree or root ball boundaries.
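A minimal sketch of an area-first suppression rule is given below. Several details are assumptions made for illustration: masks are stored as sets of pixel coordinates, overlap is measured as intersection over union, and confidence is used only to break ties when two areas differ by less than a hypothetical tolerance `area_tol`. The text does not specify how area and confidence were combined in the original implementation.

```python
def mask_iou(a, b):
    """Intersection over union of two masks stored as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def suppress_duplicates(preds, tau=0.6, area_tol=0.10):
    """Greedy duplicate removal that prefers the larger mask.

    preds: list of dicts with 'mask' (set of pixels) and 'conf' (float).
    Confidence decides only when the two areas differ by less than area_tol.
    """
    kept = []
    for cand in preds:
        for i, other in enumerate(kept):
            if mask_iou(cand["mask"], other["mask"]) > tau:
                a_c, a_o = len(cand["mask"]), len(other["mask"])
                similar = abs(a_c - a_o) <= area_tol * max(a_c, a_o)
                cand_wins = cand["conf"] > other["conf"] if similar else a_c > a_o
                if cand_wins:
                    kept[i] = cand
                break  # cand was a duplicate of an already kept mask
        else:
            kept.append(cand)
    return kept


# Hypothetical example: a partial mask with high confidence, the full mask of
# the same trunk with lower confidence, and an unrelated mask elsewhere.
full = {(r, c) for r in range(10) for c in range(100)}
partial = {(r, c) for r in range(10) for c in range(70)}
elsewhere = {(r, c) for r in range(200, 210) for c in range(50)}
result = suppress_duplicates([
    {"mask": partial, "conf": 0.90},
    {"mask": full, "conf": 0.70},
    {"mask": elsewhere, "conf": 0.80},
])
```

Here the full mask replaces the partial one despite its lower confidence, which mirrors the behavior described in the paragraph above.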
The goal in developing this workflow was to achieve a practical balance between geometric accuracy and processing efficiency. By counting fragments, the algorithm restricted computation to the detections that required refinement. In the next step, it verified that fragments being merged aligned along the same trunk direction, rather than forcing unrelated segments together. Using the α-shape function only when necessary reconnected broken outlines without applying heavy geometric operations to every mask. In the final stage, comparing both mask area and confidence ensured that the model did not discard a complete but lower-confidence detection in favor of a smaller fragment. Collectively, these design choices enabled the method to handle very large, high-resolution orthomosaics while maintaining refined masks that accurately represented the true shape of trees and root balls observed on the ground after a tornado.
Among the three stages of SA-NMS, the geometry-based decision step (Stage 2) is the primary contributor to performance improvement. This step evaluates fragment alignment through the correlation check and projected-overlap test, and then applies α-shape merging only when the fragments are collinear; otherwise, it separates them into independent masks. By reconstructing a continuous mask from disconnected segments, this step reduces false positives caused by fragmentation, and by retaining sufficiently large non-collinear fragments as independent masks, it increases true positives. The final suppression stage selects the instance with the larger and more complete shape rather than relying only on confidence levels, which improves the accuracy of the detections. These two geometry refinement steps explain the observed improvements in precision, recall, mask continuity, and orientation stability.
3.2.1. Parameter Definition and Thresholds
The proposed SA-NMS refinement process relies on several geometric and area-based parameters that control how fragmented masks are evaluated, merged, or suppressed. All thresholds were selected based on a sensitivity analysis performed over the two validation datasets (NT1 and NT7). The analysis followed a one-at-a-time parameter-variation strategy, in which each parameter was varied across a prescribed range while all other parameters were held constant at their current optimal values. For each configuration, the number of matched instances, missed matches, mismatches, and orientation alignment accuracy were evaluated. In total, 52 full processing runs were conducted across both validation zones, covering systematic variations in minimum segment area, α value, correlation coefficient, overlap thresholds, and contour-related constraints. The results demonstrated stable performance across moderate parameter changes, with no abrupt degradation in orientation alignment or instance matching within the tested ranges. Based on this analysis, the final parameter values were selected near the center of the stable performance regions rather than at boundary values, reducing sensitivity to dataset-specific effects.
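The one-at-a-time strategy can be sketched as a generator that changes a single parameter per configuration while holding the others at their baseline values. The baseline values below are those quoted in this section; the candidate lists and key names are illustrative only and do not reproduce the ranges tested in the study.

```python
def one_at_a_time(baseline, ranges):
    """Yield (name, value, config) with exactly one parameter changed."""
    for name, values in ranges.items():
        for value in values:
            if value == baseline[name]:
                continue  # the baseline itself is evaluated once, separately
            config = dict(baseline)
            config[name] = value
            yield name, value, config


# Baseline values from the text; candidate lists are hypothetical.
baseline = {"a_min": 1500, "alpha": 0.05, "r_thr": 0.8, "overlap_px": 50, "tau": 0.6}
ranges = {"r_thr": [0.7, 0.8, 0.9], "tau": [0.5, 0.6, 0.7]}
configs = list(one_at_a_time(baseline, ranges))
```

Each configuration would then be passed through the full processing run and scored on matched instances, missed matches, mismatches, and orientation alignment.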
Table 2 summarizes the tested parameter ranges and the selected values used in the final implementation. The parameter values that produced the highest number of correctly matched instances, while maintaining stable orientation alignment, were prioritized and are reported in the final column.
Although several parameters in the SA-NMS framework are defined as fixed values for these study areas, their effectiveness was evaluated through the parameter-sensitivity analysis described above. The NT1 and NT7 validation zones were selected to represent different tree species, damage densities, and environmental conditions. Threshold values were tested across orthophotos with ground sampling distances ranging from approximately 2 to 10 cm/px and across varying levels of canopy density and damage severity.
The sensitivity analysis demonstrated stable performance around the selected threshold values, with no abrupt degradation in instance matching or orientation alignment across the tested ranges. This behavior is expected because the SA-NMS framework relies primarily on geometric relationships, such as segment alignment, mask area, and projected overlap, rather than radiometric or site-specific attributes. These geometric criteria scale consistently with object size and are therefore relatively insensitive to moderate variations in spatial resolution, forest type, or terrain conditions within the tested ranges. While minor adjustments may be required for substantially different imaging resolutions or forest structures, the results indicate that the proposed parameter set generalizes well across the conditions evaluated in this study.
The first control variable was the minimum segment size, which defined the smallest area a mask fragment had to occupy to be retained. In this study, segments smaller than 1500 pixels² were discarded early in the process because they typically represented noise. This filtering step removed non-informative fragments and reduced computational load before the start of the geometric analysis. The α (alpha) parameter, which controlled how closely the α-shape connected separate fragments within a single detection, was set to 0.05 in this study. Lower α values preserved fine curvatures along fallen trunks and irregular root ball edges.
To evaluate whether fragmented masks belonged to the same tree, the Pearson correlation coefficient (r) was calculated between the mask segments. A threshold of r > 0.8 was adopted to reflect the strong linear alignment typical of fallen trunks. Thresholds below 0.7 merged parallel trees, whereas thresholds very close to 1 failed to merge slightly bent trees. Only fragments meeting this correlation condition were considered eligible for geometric merging using α-shape. However, to prevent two adjacent and nearly parallel trees from being incorrectly merged, an additional test examined the projected overlap along both image axes. If the horizontal or vertical overlap exceeded approximately 50 pixels, the fragments were retained as separate detections even when correlation was high.
For all classes, the mask-level overlap threshold (τ) determined when two detections were classified as duplicates. A value near 0.6 proved effective, allowing minor intersection between neighboring masks while preventing excessive suppression. When overlap exceeded this limit, one instance was removed according to a dual criterion that considered both mask area and confidence score. In most cases, the algorithm retained the instance with the larger, more complete geometric coverage instead of the one with the slightly higher confidence. In this context, noise refers to segmentation artifacts that do not represent a complete tree or root ball, including small or partial mask fragments, isolated trunk segments, detached boundary patches, and fragments caused by occlusion from overlapping canopy, shadows, or truncation at tile boundaries. These noisy fragments can still receive relatively high confidence scores from the model despite lacking physical completeness. Prioritizing geometric completeness over confidence alone therefore provides a more reliable representation of true treefall instances. Two additional area-based parameters further refined how fragmented detections were handled. During multi-region processing, small disconnected fragments under 2500 pixels² were removed, while sufficiently large non-collinear fragments with an area greater than 1500 pixels² were preserved as independent masks. These thresholds ensured that distinct neighboring trees remained properly separated.
Finally, geometric precision was maintained through two structural constraints. To streamline storage, the boundary coordinates of treefall masks were subsampled every five points, reducing file size without compromising spatial accuracy. In addition, for both treefall and root ball classes, each polygon was required to contain at least four distinct coordinate points along its contour to be accepted as a valid mask, ensuring that the resulting geometry defined a closed shape.
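The two structural constraints can be sketched as a small helper. The function name and the representation of a contour as a list of (x, y) tuples are assumptions; the step of five and the minimum of four distinct points are the values stated above.

```python
def finalize_contour(points, label, step=5, min_points=4):
    """Subsample treefall contours and reject polygons with too few vertices.

    Returns the accepted vertex list, or None if the polygon is not valid.
    """
    if label == "treefall":
        points = points[::step]  # keep every fifth contour point
    distinct = list(dict.fromkeys(points))  # drop repeated vertices, keep order
    return distinct if len(distinct) >= min_points else None


# Hypothetical contours given as (x, y) tuples.
long_trunk = [(i, i * i) for i in range(40)]
short_trunk = [(i, i * i) for i in range(15)]
kept = finalize_contour(long_trunk, "treefall")       # 8 vertices remain
rejected = finalize_contour(short_trunk, "treefall")  # only 3 remain, so None
```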
These thresholds enabled the SA-NMS procedure to adapt to different object sizes and the varying complexity of each scene. Using area-, correlation-, and overlap-based rules together preserved the geometry of detected objects without introducing heavy computational demands. During the final suppression stage, both mask size and confidence were jointly weighted, producing results that remained geometrically realistic and reliable for analyzing treefall and root ball patterns.
The performance of the shape-aware post-processing framework is illustrated in Figure 3, which summarizes the refinement results and their improvements. The black polygons represent the boundaries of the tree trunks, and the red vectors indicate the derived fall directions, both generated automatically. The yellow polygons, manually added for clarity, highlight areas where noticeable changes occurred after refinement.
Figure 3a shows a representative fragmented prediction in which a single fallen tree was divided into several disconnected mask sections due to an obstacle (e.g., another tree). It also illustrates multiple predictions for the same object. After applying the SA-NMS procedure, the algorithm identified the fragments belonging to the same object and reconstructed the prediction boundary while preserving the natural curvature of the trunk, as shown in Figure 3b. The refinements also improved the accuracy of the estimated orientation due to the enhanced geometric integrity of the prediction polygon.
In another case, as shown in Figure 3c, the model produced two overlapping detections for a single instance. The final retained instance, shown in Figure 3d, represented a single, well-defined object whose boundaries followed the true edge of the fallen tree. After refinement, two fallen trees that were not detected in the baseline predictions were recovered. This occurred because the YOLO11 instance-segmentation framework records only a single retained mask polygon per prediction, even when the underlying object is represented by multiple disconnected mask fragments. These trees were likely initially missed due to a combination of factors common in tornado-affected forest imagery, including partial occlusion by overlapping canopy, low contrast between fallen trunks and background vegetation, and fragmentation of elongated tree structures across multiple segments. In some cases, YOLO-based segmentation models assign a single mask to complex scenes containing multiple adjacent or crossing trees, which can suppress individual detections when objects are not well separated. The proposed post-processing approach addresses these cases by separating non-collinear mask fragments into independent detections, allowing previously merged or suppressed treefall instances to be recovered. These examples demonstrate that the proposed refinement improved both completeness and accuracy, as duplicates were removed and the resulting dataset became cleaner and spatially consistent, suitable for subsequent analyses such as treefall orientation estimation.
Fragmentation in the prediction masks also had a negative impact on the computed detection metrics. When a single fallen tree was predicted as multiple disconnected mask fragments, each fragment appeared in the evaluation as an independent detection. This produced additional small-area false positives, while the corresponding ground truth tree frequently remained unmatched, which increased the number of false negatives. As a result, precision and recall were reduced for the treefall class. The SA-NMS refinement reduced this issue by reconstructing a single continuous mask when appropriate, allowing the evaluation metrics to better represent the actual scene.
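The effect on the metrics can be made concrete with a small worked example. The counts below are hypothetical and chosen only to illustrate the mechanism; they are not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from TP, FP, and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical scene with 10 ground-truth trees. Two trees are each split into
# three fragments that fail to match: 6 false positives and 2 false negatives.
before = precision_recall(tp=8, fp=6, fn=2)
# After merging, each of the two trees is matched by one reconstructed mask.
after = precision_recall(tp=10, fp=0, fn=0)
```

In this illustrative case a single fragmented tree is penalized twice, once through its unmatched fragments and once through the unmatched ground truth, which is why merging improves precision and recall together.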
3.2.2. Methodological Contributions and Innovations
The proposed Shape-Aware Non-Maximum Suppression (SA-NMS) introduced several methodological improvements over traditional post-processing approaches.
- (1)
The algorithm integrated α-shape boundary reconstruction with mask-area overlap analysis. This combination enabled the recovery of fragmented detections while preserving the realistic geometry of each object. The result was a refined mask that more accurately followed the true outline of fallen trees and root balls.
- (2)
The method applied a feature-based linear correlation coefficient (r) to determine when α-shape merging should occur. By evaluating the spatial alignment of mask fragments, the process activated only for segments exhibiting meaningful geometric relationships. This ensured that α-shape operations were applied selectively rather than indiscriminately to all predictions.
- (3)
SA-NMS treated all valid fragments meeting the minimum area requirement as independent instances when they did not satisfy the collinearity or overlap checks. This step preserved smaller but meaningful detections, improving output completeness and reducing undercounting in complex forest scenes.
- (4)
Suppression decisions were based on a dual-criterion strategy that considered both mask area and detection confidence. This approach prevented the loss of larger or more complete detections that might otherwise have been replaced by smaller, high-confidence, but incomplete masks, a limitation common to IoU-based NMS methods.
Together, these design choices made the SA-NMS framework geometry-aware and class-specific. The resulting masks were cleaner, spatially consistent, and better suited for the feature-based analyses of tornado-induced forest damage at large scales. To further illustrate differences in mask continuity across post-processing methods, a zoomed comparison is presented in Figure 4. This figure provides a qualitative comparison of post-processing strategies applied to the same YOLO11x-seg predictions, including the baseline output, standard Non-Maximum Suppression (NMS), Non-Maximum Merging (NMM), and the proposed SA-NMS approach. The baseline predictions exhibit frequent mask fragmentation and discontinuities caused by complex debris patterns in root ball and treefall objects. While standard NMS and NMM reduce duplicate detections, these methods are not designed to reconstruct fragmented object geometry. In contrast, SA-NMS improves object representation by preserving geometric continuity across fragmented mask segments for both treefall and root ball instances. This results in more complete and continuous representations of fallen tree trunks and root balls, which is critical for geometry-based feature analyses. In Figure 4, green polygons represent root ball instances and blue polygons represent treefall instances predicted by YOLO11x-seg. Yellow circles were added manually to highlight locations where instances behave differently across post-processing methods, while the enlarged yellow circles show the corresponding zoomed-in views of these regions.
3.3. Tree Instance Aggregation
Even after the shape-aware refinement stage, some trees were still represented by multiple masks or orientation vectors. These artifacts arose primarily from illumination variations, occlusions, or local image distortions, which could lead to multiple detections for the same tree when no overlap existed. SA-NMS was designed to reconnect fragmented masks within the same detection and to suppress duplicated detections with an overlap greater than τ, so it could not resolve duplicates that did not overlap. If left uncorrected, such duplications increased the number of predicted trees and introduced errors in subsequent analyses, particularly in treefall methods that depend on treefall density [3]. The aggregation procedure reviewed all predicted polygons within each tiled orthophoto and evaluated whether they should be combined into a single continuous outline of a fallen tree. Five parameters were defined to guide aggregation through the Region of Interest (ROI) around each detected tree polygon [28].
- (1)
Length of Extension (Lextension):
This parameter defined the outward extension from both ends of a treefall vector, aligned with the trunk’s orientation. Its purpose was to cover the potential gaps between nearby segments that represent parts of the same tree but appeared fragmented. Without this extension, trunks separated by root balls or intersecting debris could remain disconnected. By enlarging the vector endpoints by Lextension, the ROI accounted for these discontinuities and improved the likelihood of reconstructing a single continuous tree.
- (2)
Tree Thickness Multiplier (N):
The ROI width was scaled proportionally to the predicted tree thickness (t) using a multiplier N. The resulting ROI width was 2Nt, extending laterally on both sides of the centerline. The combined use of a multiplier and the tree thickness ensured that thicker trees, which naturally occupied larger image sections, were assigned wider search areas for potential segment reconstructions. Conversely, thinner trunks were limited to narrower regions, reducing interference from surrounding features.
- (3)
Overlap Range (τ):
To prevent erroneous merging of distinct but parallel trees, the projections of candidate vectors were compared along the x and y axes. If the overlap on either axis exceeded the threshold τ, the segments were classified as parts of separate, parallel trees. This condition was particularly critical in dense forest stands where adjacent, similarly oriented trees often fell in parallel. By constraining overlap tolerance, τ reduced false-positive merges in the aggregation process.
- (4)
Angle Difference Threshold (θ):
Directional agreement was required before candidate treefall vectors could be aggregated. The orientation difference was calculated, and vectors with angular separation smaller than θ were retained as possible fragments of a single tree. This criterion reduced the likelihood of merging unrelated objects, such as perpendicular stems that fell within the same ROI.
- (5)
Percent of Interior Extension (PIE):
In addition to outward extensions, ROIs were extended inward by a fraction of the length extension (Lextension), defined by the percentage parameter PIE. This inward extension captured predictions that were missed during the shape-aware post-processing (SA-NMS) due to minimal overlap, which otherwise would have been excluded if only exterior extensions had been applied. For example, a value of PIE = 0.5 extended the ROI inward by half of Lextension. This adjustment improved robustness in cases where predictions slightly overlapped but still represented the same fallen tree, as shown in Equation (1).
The candidate vectors that fell within these ROIs were evaluated for similarity based on angle and overlap criteria.
Figure 5 illustrates the parameters used to generate the ROI for the candidate treefall vectors in the aggregation evaluation.
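One possible reading of the ROI geometry is sketched below. It assumes that the ROI consists of a search region at each end of the treefall vector, reaching Lextension outward and PIE × Lextension inward, with a half-width of N·t on each side of the centerline. It also assumes that the angle test compares undirected trunk axes, since fragments of the same tree may point in opposite directions. The function names and numeric values are hypothetical, and the actual ROI construction may differ from this interpretation.

```python
import math


def in_end_roi(vector, thickness, point, l_ext, n_mult, pie):
    """True if point lies in the search region around either end of the vector.

    vector: ((x0, y0), (x1, y1)) trunk centerline; thickness: trunk width t.
    """
    (x0, y0), (x1, y1) = vector
    length = math.hypot(x1 - x0, y1 - y0)
    ux, uy = (x1 - x0) / length, (y1 - y0) / length
    dx, dy = point[0] - x0, point[1] - y0
    along = dx * ux + dy * uy          # position along the trunk axis
    lateral = abs(dx * uy - dy * ux)   # perpendicular distance from the axis
    if lateral > n_mult * thickness:   # half-width N * t, full width 2Nt
        return False
    inner = pie * l_ext                # interior extension
    near_start = -l_ext <= along <= inner
    near_end = length - inner <= along <= length + l_ext
    return near_start or near_end


def axis_angle_diff(theta_a, theta_b):
    """Smallest angle in degrees between two undirected trunk axes."""
    d = abs(theta_a - theta_b) % 180.0
    return min(d, 180.0 - d)


# Hypothetical trunk of length 100 and thickness 4 lying along the x-axis.
trunk = ((0.0, 0.0), (100.0, 0.0))
params = dict(l_ext=30.0, n_mult=2.0, pie=0.5)
beyond_tip = in_end_roi(trunk, 4.0, (120.0, 3.0), **params)   # inside the end ROI
mid_trunk = in_end_roi(trunk, 4.0, (50.0, 0.0), **params)     # outside both end ROIs
too_far_aside = in_end_roi(trunk, 4.0, (120.0, 20.0), **params)
```

A candidate that falls inside the ROI would then still need to pass the angle-difference and projected-overlap tests before being aggregated.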
Orientation Estimation of Aggregated Treefall Instances
After the fragmented predictions of the same fallen tree were grouped for aggregation, the next step was to assign a single representative orientation vector to the aggregated treefall instance. The individual fragments could have exhibited different orientations, including opposite or slightly deviated directions, caused by noise, occlusion, or segmentation inconsistencies. To resolve these discrepancies, the algorithm employed two complementary approaches to estimate treefall orientation. The first approach used a weighted average of fragment directions, based on confidence level, segment length, and taper rate. The second approach relied on geometric analysis of the aggregated mask. Together, these approaches ensured that the final orientation captured both the reliability of weighted aggregation and the geometric consistency of the reconstructed mask.
- (1)
Weighted Orientation Averaging
The process of computing the weighted average began by extracting the horizontal orientation (θi) of each predicted treefall segment grouped for the aggregation. Within each aggregation group, the first segment was treated as the reference direction (θhref) and assigned a directional sign of +1. The direction of each remaining segment was compared to the reference direction. If the angular deviation relative to θhref was within 45°, the segment was considered directionally consistent and assigned +1; otherwise, it was assigned −1. This sign assignment removed directional bias and maintained consistency when aggregating vectors representing the same tree. Equation (2) defines the directional sign rule used in this assessment.
The overall Horizontal Angle Sign (HAS) for the group was then calculated as a weighted average of all contributing segments, incorporating four factors: prediction confidence (Cli), trunk segment length (Li), taper rate (TRi), and the directional sign defined above. Equation (3) presents the weighted average formula.
where Cli is the model confidence score, Li is the segment length, and TRi is the taper rate for segment i. This weighting ensured that longer, thicker, and more confidently predicted segments exerted greater influence on the aggregated orientation regardless of directional magnitude.
Finally, the aggregated treefall orientation θfinal was determined from the HAS. If HAS ≥ 0, the mean of all positively signed orientations was selected; if HAS < 0, the mean of all negatively signed orientations was selected. Equation (4) defines the final decision rule.
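The sign rule, the weighted HAS, and the final decision can be sketched together as follows. The form of the weighting is an assumption based on the verbal description: each segment's weight is taken as the product of confidence, length, and taper rate, and HAS is normalized by the sum of the weights (only its sign affects the decision). A circular mean is used for the final orientation to avoid wrap-around problems near 0°/360°; the text only states that a mean was taken. All names and sample values are hypothetical.

```python
import math


def angular_deviation(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def circular_mean(angles):
    """Mean direction in degrees, in the range [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def aggregate_orientation(segments, tol=45.0):
    """segments: list of (theta_deg, confidence, length, taper_rate) tuples.

    The first segment is the reference direction and always receives sign +1.
    Weights are assumed to be positive.
    """
    ref = segments[0][0]
    signs = [1 if angular_deviation(t, ref) <= tol else -1 for t, *_ in segments]
    weights = [cl * length * tr for _, cl, length, tr in segments]
    has = sum(w * s for w, s in zip(weights, signs)) / sum(weights)
    chosen = 1 if has >= 0 else -1
    thetas = [seg[0] for seg, s in zip(segments, signs) if s == chosen]
    return has, circular_mean(thetas)


# Hypothetical group: two long, confident fragments near 45 degrees and one
# short, low-confidence fragment pointing the opposite way.
group = [(40.0, 0.9, 12.0, 0.02), (50.0, 0.8, 10.0, 0.02), (225.0, 0.5, 3.0, 0.01)]
has, theta_final = aggregate_orientation(group)
```

In this example the reversed fragment carries little weight, so HAS stays positive and the final orientation is the mean of the two consistent fragments.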
- (2)
Orientation Using Calculation of Taper Rate
In addition to the weighted averaging procedure, the aggregated mask geometry was also analyzed directly to estimate treefall orientation [5]. As with the mask boundaries recorded during the deep-learning model segmentation, the aggregated mask boundary was used to recalculate the taper rate and infer the treefall orientation. Applying this approach enabled estimation of the reconstructed fallen tree’s orientation without relying solely on fragment-level statistics.
This procedure reduced duplication errors while preserving geometric fidelity, ensuring that each fallen tree was represented by a single polygon and orientation vector. The resulting orientations were both representative and robust, providing reliable inputs for subsequent treefall analyses.
Figure 6 illustrates the effect of tree instance aggregation and its improvements. The black polygons represent tree trunk boundaries derived from detection, and the red vectors show the corresponding fall directions, both generated automatically. The yellow boxes, added manually for clarity, highlight areas where noticeable changes occurred after aggregation.
Figure 6a shows two tree trunks represented by several disconnected predictions with no overlap due to occlusions. After performing the tree instance aggregation, the algorithm identified the tree vectors belonging to the same object and reconstructed the prediction polygon, updating its vector, as seen in Figure 6b. The refinements also enhanced the accuracy of the estimated orientation due to improved geometric definition of the prediction polygon.
The influence of each parameter was analyzed to understand its effect on aggregation behavior. Increasing Lextension improved reconnection of fragmented trees but occasionally caused nearby parallel trunks to merge unintentionally in dense regions. The width multiplier N required tuning according to expected trunk diameter; large values widened the ROI excessively, while smaller values missed adjacent fragments. The overlap threshold (τ) controlled the proximity at which two fragments were treated as separate trees; values near 0.5 produced the most stable results. The interior extension fraction (PIE), typically between 0.4 and 0.6, helped reconnect trees that the SA-NMS stage had failed to merge because of minimal overlap. These empirical adjustments established a balanced parameter set that performed consistently across multiple tornado tracks.
3.4. Edge Effect Reduction
As demonstrated in Section 3.1, the input data used for processing in the research were generated by tiling large orthomosaics. Although tiling improved processing efficiency, it introduced the challenge of the edge effect. When a single fallen tree spanned adjacent tiles, the model detected it multiple times in consecutive images, producing duplicated or fragmented polygons and vectors. The goal of edge effect reduction was to identify treefall segments that crossed tile boundaries and merge them into a single continuous treefall vector, since this duplication could increase the number of predictions and affect treefall analyses that depend on damage density [5]. The procedure was applied only to georeferenced inputs (i.e., GeoTIFF).
To address this issue, a post-processing step was implemented to identify and correct edge effects. This procedure was based on ROI logic established in the Tree Instance Aggregation step but extended to operate across tile boundaries. Vectors located near tile boundaries were analyzed using similar geometric and positional criteria, including ROI extension, orientation agreement, and overlap evaluation. In this boundary analysis, two key modifications distinguished the method from that used within a single orthophoto.
- (1)
Boundary Size [m]
A buffer region was defined along the edges of each tile. Only treefall vectors intersecting this region were evaluated for cross-tile merging. This prevented unnecessary computation for vectors located well within the tile interior, where edge effects could not occur.
- (2)
Exclusion of Interior Extension
Unlike the tree instance aggregation procedure, where interior extensions (PIE) were used to capture fragmented predictions within the same image, this approach excluded interior extensions. Because minimal overlap was not expected between cross-tile treefall vectors, the procedure relied solely on outward ROI extensions and boundary alignment. This adjustment reflected the fact that tiling discontinuities only occurred at image boundaries.
The remaining parameters, Length of Extension (L_extension), Tree Thickness Multiplier (N), Overlap Range (τ), and Angle Difference (θ), were applied with the same definitions and roles as described in
Section 3.3. For clarity, their formulations are not repeated here, but the same reasoning was applied to define the ROI around each candidate vector and determine whether separate fragments represented a single fallen tree.
Figure 7 illustrates parameters used to generate the ROI for candidate treefall vectors during the boundary evaluation.
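To make the boundary evaluation concrete, the sketch below tests whether one vector falls in the outward ROI of another. It is an illustration under stated assumptions rather than the formulation of Section 3.3: the ROI is assumed to be a rectangle extending L_extension beyond each end of the vector, with a width of N times the trunk thickness, and all function names and example values are hypothetical. The overlap range (τ) is not modelled here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Vector = Tuple[Point, Point]  # (start, end) in metres


def direction_deg(start: Point, end: Point) -> float:
    """Direction of a vector in degrees, in [0, 360)."""
    return math.degrees(math.atan2(end[1] - start[1], end[0] - start[0])) % 360.0


def angle_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in [0, 180]."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def in_outward_roi(p: Point, start: Point, end: Point,
                   l_extension: float, half_width: float) -> bool:
    """True if `p` lies in the ROI strip beyond either end of the vector.

    The span between `start` and `end` is deliberately excluded, mirroring
    the exclusion of interior extensions in the boundary analysis.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return False
    ux, uy = dx / length, dy / length
    px, py = p[0] - start[0], p[1] - start[1]
    along = px * ux + py * uy        # signed distance along the vector axis
    across = abs(px * uy - py * ux)  # perpendicular distance from the axis
    beyond_start = -l_extension <= along < 0.0
    beyond_end = length < along <= length + l_extension
    return across <= half_width and (beyond_start or beyond_end)


def is_cross_tile_candidate(a: Vector, b: Vector, thickness: float,
                            l_extension: float, n: float, theta: float) -> bool:
    """Assumed merge test: orientation agreement plus an endpoint in the ROI."""
    (a0, a1), (b0, b1) = a, b
    if angle_difference(direction_deg(a0, a1), direction_deg(b0, b1)) > theta:
        return False
    half_width = n * thickness / 2.0
    return any(in_outward_roi(p, a0, a1, l_extension, half_width) for p in (b0, b1))


a = ((0.0, 0.0), (10.0, 0.0))
b = ((10.5, 0.1), (15.0, 0.2))  # continues just beyond the end of `a`
print(is_cross_tile_candidate(a, b, thickness=0.5, l_extension=2.0, n=2.0, theta=15.0))
```

Checking the orientation first is a cheap way to reject most pairs before any geometric projection is computed.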
Orientation Fusion and Weighted Averaging
When two vectors representing the same tree were detected across tiles, their orientation, length, and taper rate values were fused using a weighted average similar to that used in the aggregation step. Each orientation was weighted by its segment length and confidence score as defined in Equations (2)–(4). Consequently, longer and more reliable segments had greater influence on the merged result. After merging, the new vector retained both geometric and confidence attributes from its source detections, smoothing small directional deflections at tile boundaries.
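The fusion step can be illustrated with a weighted circular mean. Equations (2)–(4) are not reproduced in this section, so the weighting used below (segment length multiplied by confidence score) and the function names are assumptions chosen for illustration; the published equations may combine the two terms differently.

```python
import math
from typing import Sequence


def fuse_orientations(angles_deg: Sequence[float], lengths: Sequence[float],
                      confidences: Sequence[float]) -> float:
    """Weighted circular mean of fall directions, in degrees within [0, 360).

    Assumed weights: length * confidence for each segment. Averaging unit
    vectors avoids the wrap-around error of averaging raw angles
    (e.g., 350 and 10 degrees should fuse to about 0, not 180).
    """
    weights = [l * c for l, c in zip(lengths, confidences)]
    x = sum(w * math.cos(math.radians(a)) for w, a in zip(weights, angles_deg))
    y = sum(w * math.sin(math.radians(a)) for w, a in zip(weights, angles_deg))
    if math.hypot(x, y) < 1e-12:
        raise ValueError("orientations cancel out; fused direction is undefined")
    return math.degrees(math.atan2(y, x)) % 360.0


def fuse_scalar(values: Sequence[float], lengths: Sequence[float],
                confidences: Sequence[float]) -> float:
    """Weighted average of a scalar attribute such as taper rate."""
    weights = [l * c for l, c in zip(lengths, confidences)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Two hypothetical segments of the same trunk on either side of a tile edge.
fused = fuse_orientations([0.0, 90.0], lengths=[3.0, 1.0], confidences=[1.0, 1.0])
print(round(fused, 2))  # the longer segment dominates the fused direction
```

With these weights, a long, high-confidence segment pulls the merged direction toward itself, which matches the stated intent that longer and more reliable segments carry greater influence.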
Figure 8 illustrates the improvements produced by the edge-effect reduction. In this figure, white dashed lines mark the vertical and horizontal tile boundaries. Green vectors represent prediction vectors before correction, and blue vectors indicate merged treefall vectors after edge-effect reduction. In
Figure 8a, the original predictions appear fragmented along the tile edges, while in
Figure 8b, the same area shows continuous, merged vectors. The refinement eliminated duplicate detections, reduced orientation discontinuities, and ensured that each fallen tree was represented once within the georeferenced dataset.
Applying the edge-effect reduction procedure decreased the number of duplicated treefall vectors by approximately 17.4% while maintaining nearly identical overall accuracy. This result indicated that most removed vectors were false duplicates rather than true detections. Comparison between pre- and post-correction results showed that the majority of retained matches occurred among properly aligned trunk segments. The reduction in duplication improved both geometric continuity and dataset consistency across adjacent tiles. The overall processing outcome is discussed in greater detail in the next section.
4. Discussion and Results
The purpose of this section was to evaluate how each stage of the proposed post-processing framework contributed to improving the reliability and integrity of the treefall vectors. The analysis focused on determining whether the refinements produced measurable and observable improvements in the detection results. Specifically, the evaluation aimed to (1) quantify how geometry-based refinements enhanced orientation accuracy and reduced redundant or fragmented detections, and (2) qualitatively assess the integrity and continuity of the refined treefall vectors within the orthophotos and across tiled orthomosaic scenes.
In this assessment, quantitative comparisons were performed using manually tagged reference vectors from two validation zones, representing ground-truth treefall vectors. These vectors were created by the authors using QGIS (v3.30.1) software through manual identification of fallen trees in two survey zones. Together, these complementary analyses provided an overall understanding of how the proposed post-processing improved the completeness, spatial coherence, and interpretability of the final treefall vectors. The comparison workflow was evaluated in four main stages:
- (1)
Baseline YOLO11 Segmentation:
This stage represented the direct model output before any additional refinement. It provided the predictions that would have been obtained without post-processing.
- (2)
Shape-Aware Non-Maximum Suppression (SA-NMS):
A geometry-driven filtering process that reconnected fragmented detections and removed overlapping duplicates by analyzing boundary shapes and correlation.
- (3)
Tree Instance Aggregation:
This stage merged residual fragments that remained separate due to occlusion or lack of overlap, using region-of-interest (ROI) rules based on trunk thickness and directional consistency.
- (4)
Edge-Effect Correction:
The final step identified duplicate treefall vectors at tile boundaries and merged them to maintain cross-tile continuity in the dataset.
To evaluate the performance of each refinement stage, two study zones, referred to as NT1 and NT7, were selected for quantitative assessment. Both areas were located along the northern track (NT) of the December 2021 tornado in the Land Between the Lakes (LBL) region and represented forest damage with differing canopy densities and species composition. NT1, the westernmost 1.6 km of the damage in LBL, contained predominantly deciduous trees without foliage and covered an area of 2.55 km². The site was surveyed on 14 March 2022, and the resulting orthomosaic had an average ground sampling distance (GSD) of 1.87 cm. The image was divided into tiles measuring 1024 × 1024 pixels, generating 7275 tiles in total. Similarly, NT7, located approximately 6 km from the eastern end of LBL, was surveyed with the same equipment on 15 March 2022, with an average GSD of 1.86 cm. This zone covered 2.67 km² and contained both coniferous and deciduous trees. After tiling the zone into 1024 × 1024-pixel images, 7594 tiled orthophotos were generated. Each orthomosaic tile was processed using the same workflow, deep learning model, and threshold values. Manually tagged reference vectors served as benchmarks for orientation accuracy and duplicate detection.
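As a rough consistency check on the tiling figures above, the ground footprint of a tile can be derived from the tile size and the GSD. The helper names below are hypothetical, and the area-based tile count is only a lower bound that assumes non-overlapping tiles fully covered by imagery. The reported counts are somewhat higher, which would be consistent with partially filled tiles along an irregular survey footprint, although the source does not state the reason.

```python
def tile_side_m(tile_px: int, gsd_cm: float) -> float:
    """Ground length of one tile edge in metres."""
    return tile_px * gsd_cm / 100.0


def min_tiles_for_area(area_km2: float, tile_px: int, gsd_cm: float) -> float:
    """Lower bound on the tile count if tiles exactly covered the area."""
    side = tile_side_m(tile_px, gsd_cm)
    return area_km2 * 1e6 / (side * side)


for zone, area_km2, gsd_cm, reported in [("NT1", 2.55, 1.87, 7275),
                                         ("NT7", 2.67, 1.86, 7594)]:
    side = tile_side_m(1024, gsd_cm)
    bound = min_tiles_for_area(area_km2, 1024, gsd_cm)
    print(f"{zone}: tile side {side:.2f} m, area-based bound {bound:.0f}, reported {reported}")
```

A tile edge of roughly 19 m is short relative to a mature trunk, which helps explain why a noticeable share of fallen trees crosses a tile boundary.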
4.1. Quantitative Evaluation and Performance Metrics
- (1)
Detection performance
Detection performance was evaluated using four post-processing approaches (Baseline, NMS, NMM, and SA-NMS) based on precision, recall, and F1-score for both the treefall (TF) and root ball (RB) classes. All metrics were computed from YOLO11x-seg predictions evaluated at an intersection-over-union (IoU) threshold of 0.50, which provides a consistent basis for comparing detection performance across methods.
Table 3 summarizes the resulting performance metrics for both detection classes and serves as the basis for the comparative analysis.
For treefall detection, NMS achieves the highest precision (89.2%), reflecting strong suppression of false positives and duplicate detections. However, this gain in precision comes with reduced recall (48.6%), indicating that valid treefall instances, particularly fragmented or low-confidence detections, are removed. This behavior is consistent with the bounding box–based suppression strategy of NMS, which can be restrictive for elongated treefall objects in dense damage regions. In contrast, SA-NMS produces the highest recall (54.0%) and the highest F1-score (66.8%) for treefall detection. These results indicate a better balance between retaining true treefall instances and limiting false positives. Retaining true detections is particularly important for treefall methods that depend on the density of fallen trees, such as the Godfrey–Peterson treefall method [
3]. The improved recall suggests that the shape-aware strategy used in SA-NMS better preserves complete treefall representations that might otherwise be fragmented or suppressed by conventional NMS approaches based on bounding boxes and confidence scores. Notably, this increase in recall is achieved without a substantial reduction in precision, which is consistent with the intended design of the shape-aware post-processing framework.
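The trade-off between precision and recall can be made explicit through the F1-score, the harmonic mean of the two. The sketch below uses the NMS treefall precision and recall quoted above; the function names and the raw counts in the second example are hypothetical and are not taken from Table 3.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def prf1_from_counts(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from matched counts at a fixed IoU threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# NMS treefall values quoted in the text: precision 89.2%, recall 48.6%.
print(f"NMS F1 from quoted P and R: {100 * f1_score(0.892, 0.486):.1f}%")

# Hypothetical counts, for illustration only.
print(prf1_from_counts(tp=54, fp=8, fn=46))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low recall of NMS limits its F1-score despite its high precision, which is why the recall gain of SA-NMS translates into the higher F1-score reported above.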
For root-ball detection, the influence of post-processing choice is much smaller. Precision, recall, and F1-score values remain similar across NMS, NMM, and SA-NMS. For root balls, the analysis consists of merging fragments and suppressing duplicates. This outcome is therefore expected given the smaller number of root ball instances and their more compact geometry, which reduces the likelihood of fragmentation or overlap-related suppression. While SA-NMS performs comparably for root ball detection, additional annotated data and broader testing would be needed to more conclusively evaluate potential advantages for this class.
During evaluation of the validation dataset consisting of 1003 orthophotos (each 1024 × 1024 pixels), processing time was recorded for each post-processing method. Standard IoU-based NMS required 38 min to process the full dataset, while the complete SA-NMS pipeline required 46 min. This corresponds to an increase of approximately 21% in total processing time relative to the standard NMS method. At the scale of full-track processing, this additional computational cost was considered acceptable, as SA-NMS is applied only to detections requiring geometric mask refinement rather than uniformly to all predictions.
Two primary measures were used to quantify performance at the end of each refinement stage:
- (2)
Instance-level accuracy
Instance-level accuracy was evaluated by comparing automated detections with the manual tags in the NT1 and NT7 regions. The evaluation was performed at the instance level rather than the grid-cell level [
29]. Polygons were created around each treefall vector in both datasets, and overlaps were measured using the intersection over union (IoU) metric. When IoU values exceeded zero, the vectors were considered to represent the same tree, and their orientation difference was calculated.
Automated vectors were classified as correctly aligned with manual measurements if their orientation difference was within ±20°. For both validation zones, most automated detections aligned closely with manual references: approximately 77–79% differed by 0–10°, and an additional 6–7% fell within the 10–20° range. Minor deviations were likely caused by partial canopy coverage or irregularities in tree crowns. Only 2–4% exhibited larger differences (between 20° and 160°), typically due to unclear or noisy trunk boundaries. Approximately 10% showed near or complete directional reversals (≥170°), often related to snapped trunks or misaligned manual references.
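The orientation comparison can be sketched as below. The bin edges follow the ranges quoted in the text (0–10°, 10–20°, and reversals of 170° or more); the function names, the catch-all "other" bin, and the example angle pairs are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two fall directions, in [0, 180]."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def classify(diff_deg: float) -> str:
    """Assign an orientation difference to one of the reporting bins."""
    if diff_deg <= 10.0:
        return "0-10"
    if diff_deg <= 20.0:
        return "10-20"
    if diff_deg >= 170.0:
        return "reversal"
    return "other"


def summarize(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of matched (automated, manual) pairs falling in each bin."""
    labels = [classify(angular_difference(a, m)) for a, m in pairs]
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label in counts}


# Hypothetical matched pairs of (automated, manual) directions in degrees.
pairs = [(90.0, 85.0), (350.0, 10.0), (5.0, 185.0), (0.0, 90.0)]
shares = summarize(pairs)
aligned = shares.get("0-10", 0.0) + shares.get("10-20", 0.0)
print(shares, f"aligned within 20 degrees: {aligned:.0%}")
```

Wrapping the difference into [0, 180] matters for pairs that straddle north, such as 350° and 10°, which differ by 20° rather than 340°.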
When compared with earlier automated or semi-automated tornado damage assessments, the proposed method achieved higher accuracy. The previous semi-automated method reported only 59% of predictions within ±20° [
29], with a median angular deviation of 13.3° [
28]. The average accuracy of 78% across both sites therefore represented a substantial improvement over prior methods.
- (3)
Duplicate-rate reduction:
In addition to orientation alignment, instance-level segmentation performance was evaluated using standard detection metrics. Precision, Recall, and F1-score were computed for both treefall and root ball instances after each post-processing step.
Table 4 reports these metrics for the NT1 and NT7 study areas across baseline YOLO predictions and three post-processing stages (SA-NMS, aggregation, and edge-effect correction). Results show that applying SA-NMS increases Recall and F1-score relative to the YOLO11x-seg baseline in both study areas, indicating improved preservation of valid treefall and root ball instances. Precision remains nearly constant across processing stages, suggesting that the increase in Recall does not correspond to an increase in false detections. These results demonstrate that the proposed post-processing framework improves instance segmentation quality while supporting reliable orientation-based damage analysis.
The Shape-Aware NMS step yielded the largest performance improvement. Rather than causing large jumps in instance counts, SA-NMS produced a modest increase in the number of predictions (from 18,518 to 19,231 in NT1 and from 17,625 to 18,746 in NT7). More importantly, orientation agreement improved from 73.84% to 76.03% in NT1 and from 74.56% to 76.45% in NT7, indicating that reconstructed masks better captured the true trunk geometry. Tree Instance Aggregation followed SA-NMS to merge remaining fragments. Although the total prediction count decreased slightly (about 4% in both NT1 and NT7) and individual vector continuity improved, orientation accuracy relative to manual vectors changed very little (≤0.3 percentage points).
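The size of the SA-NMS changes can be read directly from the figures quoted above. The short calculation below uses only those numbers; the helper name `percent_change` is introduced here for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Instance counts before and after SA-NMS, as quoted in the text.
counts = {"NT1": (18518, 19231), "NT7": (17625, 18746)}
# Orientation agreement (%) before and after SA-NMS, as quoted in the text.
agreement = {"NT1": (73.84, 76.03), "NT7": (74.56, 76.45)}

for zone in counts:
    before_n, after_n = counts[zone]
    before_a, after_a = agreement[zone]
    print(f"{zone}: instances {percent_change(before_n, after_n):+.2f}%, "
          f"orientation agreement {after_a - before_a:+.2f} percentage points")
```

Under these figures, the instance count rises by roughly 4% in NT1 and 6% in NT7, while orientation agreement gains about 2 percentage points in both zones.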
Edge-Effect Reduction addressed duplications caused by tiling by merging duplicated vectors that crossed tile borders. This step applied only to treefall vectors, since orientation analysis depended on trunk geometry, while root ball shape was derived from best-fit ellipses. The number of predictions decreased by approximately 15–20% depending on the zone, showing effective removal of cross-tile duplications and improved continuity. Despite this reduction, orientation agreement remained essentially unchanged (within 0.3–0.4 percentage points), indicating that most eliminated vectors were indeed redundant.
While SA-NMS increases the total number of detected instances, this change reflects an improvement in recall rather than the introduction of false positives. As shown in
Table 4, recall increases substantially for both NT1 and NT7 following SA-NMS, with a slight improvement in precision. This indicates that many of the additional detections correspond to valid treefall or root ball instances that were previously fragmented in the baseline predictions. The separation of non-collinear fragments allows SA-NMS to recover distinct object instances that better align with the manually annotated ground truth. If the increase in instance count were dominated by false positives, a corresponding decrease in precision would be expected, which is not observed. In this table, the instance count (# Instance) value represents the final number of object instances retained in each processing stage.
4.2. Limitations and Future Work
This study, like most image-based analyses, was subject to several constraints that could affect consistency and accuracy. First, the method was tested on imagery with varying ground-sampling distances. Results showed that prediction reliability decreased as resolution became coarser. Therefore, the workflow should be applied primarily to high-resolution imagery with a GSD of 10 cm or finer. Second, although both deciduous and coniferous stands were included, training and testing were limited to the dominant species along the tornado path. Broader geographic and ecological testing is still required to verify parameter and model transferability across different forest compositions.
Seasonal conditions posed another limitation. Leaf-off conditions improved visibility of fallen trunks, while leaf-on conditions obscured them, potentially affecting orientation estimates. Accurate results also depended on image perspective and distortion. Nadir-view imagery with precise georeferencing was required to minimize distortion. Oblique imagery introduced parallax and geometric shifts, so maintaining a near-zero camera tilt was essential for reliable results. The geometry-based post-processing framework relied heavily on the quality of the initial YOLO11x segmentation. When the model failed to delineate a trunk correctly, later stages (i.e., SA-NMS, orientation estimation, aggregation, and edge-effect reduction) could not fully recover its true shape.
An additional set of limitations is related to the SA-NMS refinement process itself. Several geometric thresholds used in the method (e.g., α, minimum segment area, and correlation threshold) were tuned for the datasets analyzed in this study and may require minor adjustment for substantially different imaging resolutions or scene characteristics. More broadly, thresholds such as the overlap (τ), α-value, correlation (r), and ROI width were optimized for datasets with a GSD of 2 cm, so different forest densities or GSDs may require recalibration, limiting direct transfer to new datasets. In addition, SA-NMS is more computationally demanding than standard NMS because it evaluates fragment alignment and applies geometric merging operations, although these are applied only to segmented objects requiring refinement. Finally, as noted above, if a tree is not detected or is only partially segmented in the initial YOLO11x prediction, subsequent refinement steps cannot recover the missing geometry. These limitations motivate future work on adaptive parameter selection and further improvements in base model detection performance.
Figure 9 illustrates where the proposed method failed to estimate the correct fall direction due to irregular trunk shape or partial occlusion of the trunk.
5. Conclusions
This research presented a geometry-based post-processing framework designed to refine deep-learning detections of tornado-damaged trees and root balls. The work addressed two common issues observed in the YOLO11 segmentation results: overlapping masks that duplicated the same object and fragmented masks that divided a single fallen tree into multiple parts. These problems altered tree-count statistics and reduced the accuracy of orientation measurements, both of which are critical in wind-field analysis and damage interpretation in treefall methods for evaluating tornado intensity.
The Shape-Aware Non-Maximum Suppression (SA-NMS) method constituted the first stage of refinement. It applied α-shape reconstruction together with correlation and area-based reasoning to decide when fragments should be merged, separated or suppressed. By weighting each detection by both mask size and confidence, the algorithm preserved the natural geometry of elongated trunks and compact root ball forms while eliminating redundant predictions.
The second stage, treefall-instance aggregation, examined all treefall detections within each orthophoto to reconnect parts of a single trunk that had been divided by obstacles or illumination variations (e.g., local shadows). This process relied on region-of-interest (ROI) reasoning that compared angles, overlaps, and directions to ensure that only geometrically consistent fragments were merged. The third stage, edge-effect reduction, extended the same logic across neighboring tiles, resolving discontinuities created during image tiling and removing duplicate vectors along shared borders.
The proposed refinement methods were tested on high-resolution orthomosaics from a 2021 tornado in the Land Between the Lakes region, confirming their effectiveness. After refinement, the results achieved 76.4% and 77.1% orientation agreement accuracy for the two validation areas, an improvement of approximately 4% over the YOLO11x-seg baseline model. The number of duplicate or incomplete predictions decreased substantially, producing a more reliable and spatially coherent dataset for subsequent analyses of treefall direction and tornado intensity.