1. Introduction
Tailings ponds are typically formed by valley interception or surrounding embankments and serve as primary storage sites for solid waste from mineral resource extraction. Their structural integrity is directly linked to the safety of downstream communities and the health of surrounding ecosystems [1,2,3,4]. With the increasing scale of mining operations and the accumulation of tailings, effective stability monitoring and management of tailings ponds have become essential for geohazard prevention. The dry beach, which serves as the primary deposition zone, strongly influences pond stability. Its length determines the seepage line and provides a reliable indicator of dam safety [5,6,7]. Meanwhile, its slope gradient influences both the reservoir’s storage capacity and tailings deposition patterns [8,9,10]. Therefore, reliable delineation of the dry beach is necessary for comprehensive operational assessment and timely risk management. Accordingly, Chinese regulations have incorporated dry beach length and slope into routine monitoring requirements.
Traditional measurement methods rely on manual field surveys, which involve visual observation [11], laser-based measurements [12], or ultrasonic level meters [13,14]. Although economical to implement, these methods are affected by operator experience, weather conditions, and site accessibility, often resulting in low accuracy and limited spatial coverage [15]. With the advancement of remote sensing and computer vision, many studies have monitored tailings ponds using images obtained from cameras or satellites [16,17,18]. Two-dimensional (2D) imaging methods, including K-means clustering [11], the watershed algorithm [19], and edge detection [20], have been introduced for dry beach analysis. These techniques extract pixel coordinates at the waterline using visual features such as color, brightness, and texture and then convert them into geometric parameters with projection models. However, their performance heavily depends on threshold selection and remains sensitive to noise, illumination changes, and water surface reflections [21,22,23]. In recent years, semantic segmentation networks have shown strong performance in remote sensing image classification [24,25,26,27]. These networks automatically extract multi-scale features from images, are robust to complex textures, and have been applied to beach–water segmentation tasks [28,29,30,31]. However, image-based approaches cannot capture 3D terrain features, which limits their application to morphological analysis, slope estimation, and 3D visualization.
In contrast to 2D images, point clouds serve as high-accuracy digital terrain models, offering a more comprehensive and precise representation of surface morphology [32,33]. Large-scale, high-resolution point clouds can be acquired either through LiDAR, including terrestrial laser scanning (TLS) and airborne laser scanning (ALS), or through UAV photogrammetry. TLS can achieve very high geometric accuracy (millimeter level), but covering extensive tailings pond areas requires deploying multiple scanning stations, which is labor-intensive [34] and may still leave portions of the area unmeasured. ALS offers broader coverage but reduced accuracy (typically centimeter level), comparable to that achievable by UAV photogrammetry [35,36]. UAV photogrammetry not only generates dense 3D point clouds but also provides high-resolution 2D images, which can be used for rapid semantic segmentation. Considering that dry beach changes generally occur at the meter scale, centimeter-level point cloud accuracy is sufficient for practical engineering applications, making UAV photogrammetry the preferred approach for tailings pond monitoring. However, the presence of complex surrounding features such as dams, hillslopes, and vegetation poses significant challenges for precise dry beach extraction from heterogeneous, large-scale point clouds. Common point cloud segmentation methods include region growing (RG) [37], random sample consensus (RANSAC) [38], density-based clustering [39], and deep learning approaches [40]. These techniques have demonstrated effectiveness in landslide detection [41,42], discontinuity extraction [43,44], and urban modeling [45,46,47]. Nevertheless, their applicability to dry beach identification still faces considerable challenges. First, the interface between the dry beach and the water body forms a transitional area of low topographic contrast, making its boundary difficult to identify [30]. Second, the gently undulating, continuous surface morphology of dry beaches often leads region-growing algorithms either to merge distinct areas incorrectly or to fragment homogeneous zones. While RANSAC is effective in extracting regular geometric features, it encounters limitations when handling complex morphological boundaries. Density-based clustering methods are also susceptible to variations in point density and occlusions, often leading to unstable extraction results [48,49]. Deep learning approaches offer strong feature learning capacity but require large labeled point cloud datasets [50], which are scarce and costly to produce for tailings pond applications.
To overcome the limitations of 2D image methods in spatial quantification and of direct point cloud methods in robust semantic identification, this study develops a semi-automatic workflow for dry beach extraction in tailings ponds based on UAV photogrammetry and 3D reconstruction. The proposed framework establishes an integrated pipeline of “boundary image retrieval–semantic segmentation–3D back-projection–dry beach monitoring”. By utilizing the projection matrices derived from photogrammetric reconstruction, the 2D boundaries are accurately projected into 3D space, generating geometrically explicit and measurable 3D boundaries and the corresponding dry beach point clouds. This strategy fully exploits the maturity and high accuracy of image-based deep learning and the spatial expressiveness of point clouds, while compensating for the inherent lack of spatial information in 2D methods and avoiding the high cost of deep learning applied directly to point clouds. These capabilities support key applications in tailings pond management, including dam safety assessment and monitoring of tailings deposition.
This paper is organized as follows.
Section 2 details the overall workflow, including 3D reconstruction, dry beach boundary image retrieval, semantic segmentation, and 3D boundary extraction.
Section 3 validates the reliability and accuracy of the method using two phases of UAV images.
Section 4 discusses the generalization capability and advantages of the method, along with practical engineering applications.
Section 5 concludes the study and outlines directions for future work.
2. Method
The proposed method comprises four main steps (Figure 1): (1) 3D reconstruction. UAV images are automatically processed to generate a dense point cloud, and projection matrices are obtained to link 2D pixels with 3D points. (2) Boundary image retrieval. Image features are automatically extracted using AlexNet [51] and GoogLeNet [52]. Representative boundary images are manually selected, and images containing dry beach boundaries are then automatically retrieved based on similarity. (3) Boundary image segmentation. The retrieved images are automatically segmented with DeepLabv3+ [53], trained on manually labeled images, to delineate the dry beach regions at the pixel level. (4) Boundary extraction and point cloud segmentation. Dry beach boundaries are automatically extracted and back-projected into 3D space to obtain the corresponding dry beach point cloud.
2.1. 3D Reconstruction
To achieve 3D identification of the dry beach in the tailings pond, aerial images captured by UAVs are first processed through Structure-from-Motion (SfM) and Multi-View Stereo (MVS) algorithms to generate a dense point cloud. Simultaneously, a projection matrix, which records the camera pose in the 3D coordinate system, is calculated for each image. These matrices allow 2D boundary pixels to be transformed into 3D coordinates. The 3D reconstruction process mainly involves the following steps:
(1) Feature matching and fundamental matrix estimation.
Feature points and their local descriptors are extracted from all images using feature extraction algorithms such as the Scale Invariant Feature Transform (SIFT) [54]. Initial correspondences between two images are established by matching similar feature points. Let p1 = [u1, v1, 1] and p2 = [u2, v2, 1] denote the homogeneous pixel coordinates (written as column vectors) of a matched point in the two images. These coordinates satisfy the constraint defined by the fundamental matrix F:

$$ p_2^{T} F\, p_1 = 0 \qquad (1) $$

$$ F = K_2^{-T} [t]_{\times} R\, K_1^{-1} \qquad (2) $$

Here, the rotation matrix R and translation vector t represent the relative pose between the cameras corresponding to the two images, K1 and K2 denote the intrinsic parameter matrices of the respective cameras, and [t]× is the skew-symmetric matrix of t. The fundamental matrix F is estimated from at least eight pairs of matched feature points. Given the intrinsic parameters, the essential matrix E related to the fundamental matrix F is defined as follows:

$$ E = K_2^{T} F\, K_1 = [t]_{\times} R \qquad (3) $$
By performing singular value decomposition (SVD) on E, the rotation matrix R and translation vector t between the two cameras can be determined.
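To make the two-view initialization concrete, the following minimal Python sketch uses OpenCV, whose RANSAC-based estimators and SVD-based pose recovery implement Equations (1)–(3). The variable names (img1, img2) and the assumption of a single shared intrinsic matrix K are illustrative simplifications, not part of the original pipeline.

```python
# Minimal two-view initialization sketch (OpenCV), assuming both images share
# one intrinsic matrix K. Illustrative only; real SfM pipelines are more involved.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    # SIFT keypoints and descriptors (step 1 of Section 2.1)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Ratio-test matching of SIFT descriptors
    matches = []
    for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            matches.append(pair[0])
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Fundamental matrix from the epipolar constraint p2^T F p1 = 0 (Eq. (1))
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

    # Essential matrix (Eq. (3)) and its SVD-based decomposition into (R, t)
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return F, E, R, t
```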
(2) Camera pose estimation and triangulation.
The first camera is fixed as the world coordinate frame (R1 = I, t1 = 0), and the pose of the second camera (R2, t2) is recovered from the essential matrix E, as described above. Triangulation is then performed for the matched points between the first two images to calculate their 3D coordinates Xj. Starting from the third image, the camera poses (Ri, ti) of subsequent images are determined by solving the Perspective-n-Point (PnP) problem. Specifically, given a set of known 3D points Xj, their corresponding pixel coordinates pij in image Ii, and the camera intrinsics Ki, the camera pose (Ri, ti) can be estimated, as expressed in Equation (4):

$$ \lambda_{ij}\, p_{ij} = K_i \left[ R_i \mid t_i \right] \tilde{X}_j \qquad (4) $$

where λij is a projective scale factor and X̃j denotes Xj in homogeneous coordinates. By iteratively processing all images, the projection matrix Ki [Ri | ti] of each image and the corresponding sparse point cloud are obtained.
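The sketch below illustrates the triangulation and incremental registration step of Equation (4), again with OpenCV. The inputs (projection matrices P1 and P2, matched pixel arrays, and known 3D–2D correspondences) are assumed to come from the previous step; this is a sketch of the idea rather than the authors' implementation.

```python
# Sketch of triangulation and PnP-based registration (Equation (4)).
import cv2
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    # Homogeneous triangulation of matched pixels (N x 2 arrays) into 3D points X_j
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4 x N homogeneous
    return (X_h[:3] / X_h[3]).T                           # N x 3 Euclidean points

def register_image(X_known, pix_known, K):
    # Solve the PnP problem for a new image: find (R_i, t_i) such that
    # p_ij ~ K_i [R_i | t_i] X_j for the known 3D points visible in it.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(X_known, pix_known, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec, K @ np.hstack([R, tvec])               # pose and projection matrix
```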
(3) Bundle adjustment.
To correct errors in the initial estimates of camera poses and 3D point coordinates, bundle adjustment is applied to optimize all camera poses and 3D points by minimizing the overall reprojection error across all images. Specifically, the reprojection error is defined as the difference between the observed 2D pixel coordinates pij and the projected coordinates of the corresponding 3D points, as shown in Equation (5):

$$ \min_{\{R_i, t_i, X_j\}} \sum_{i} \sum_{j} \left\| p_{ij} - \pi\!\left( K_i \left( R_i X_j + t_i \right) \right) \right\|^{2} \qquad (5) $$

where π(·) denotes the perspective projection, i.e., division by the third homogeneous coordinate.
Through nonlinear least squares optimization, both camera poses and 3D points are simultaneously adjusted, resulting in a more accurate sparse point cloud.
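As a hedged illustration of Equation (5), the residual function below could be handed to SciPy's nonlinear least-squares solver to jointly refine camera poses (axis-angle rotation plus translation) and 3D points. The data layout (cam_idx, pt_idx, obs) and the shared intrinsic matrix K are assumptions made for this sketch only.

```python
# Illustrative bundle adjustment objective (Equation (5)) with SciPy.
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs, K):
    # params packs 6 values per camera (rvec, tvec) followed by 3 values per point.
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts3d = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for c, p, uv in zip(cam_idx, pt_idx, obs):
        proj, _ = cv2.projectPoints(pts3d[p:p + 1], cams[c, :3], cams[c, 3:], K, None)
        res.append(proj.ravel() - uv)                   # p_ij minus its reprojection
    return np.concatenate(res)

# Usage (x0 is the packed initial estimate from the incremental reconstruction):
# result = least_squares(reprojection_residuals, x0, method="trf",
#                        args=(n_cams, n_pts, cam_idx, pt_idx, obs, K))
```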
(4) Dense point cloud construction.
Dense point clouds are generated using the Multi-View Stereo (MVS) algorithm [55]. Given a pixel pij in an image, a 3D ray is defined by its corresponding projection matrix:

$$ X(d) = -R_i^{T} t_i + d\, \frac{R_i^{T} K_i^{-1} p_{ij}}{\left\| R_i^{T} K_i^{-1} p_{ij} \right\|} \qquad (6) $$

where d represents the depth to be estimated, −Ri^T ti is the camera center, and X(d) denotes the 3D coordinate of the pixel pij at depth d. The MVS algorithm evaluates the consistency of pij across multiple views at varying depths and selects the depth that achieves the best overall consistency with all observations as its 3D coordinate.
This process reconstructs a dense point cloud and determines the corresponding projection matrix for each image. These projection matrices define the relationship between 2D pixels and 3D coordinates, enabling the transfer of semantic information from images to the point cloud.
2.2. Boundary Image Retrieval
The dry beach in tailings ponds exhibits three distinct geomorphic interfaces: (1) the dry beach–dam, (2) the dry beach–hillside, and (3) the dry beach–reservoir water, as illustrated in Figure 2. Retrieving images containing these boundaries from a large UAV dataset is a challenging task, as manual inspection of thousands of images is time-consuming and susceptible to errors. To address this issue, the pretrained AlexNet and GoogLeNet models, both trained on the ImageNet dataset [56], are employed for feature extraction. Boundary images are then retrieved semi-automatically based on a weighted combination of cosine distances calculated from features extracted by both networks.
2.2.1. Image Feature Extraction
The architecture of AlexNet consists of an input layer, five convolutional layers, and three fully connected layers (Figure 3). All input images are standardized to 227 × 227 pixels before processing. In this study, features are extracted from the second fully connected layer, generating 4096-dimensional feature vectors. This layer captures high-level semantic information while preserving some spatial structural details. GoogLeNet, as shown in Figure 4, comprises nine Inception modules. Each module integrates 1 × 1, 3 × 3, and 5 × 5 convolutions with 3 × 3 max pooling to increase the network width and enhance multi-scale feature extraction capability [52]. In this work, the global average pooling layer is chosen as the output feature layer, generating 1024-dimensional feature vectors. Located at the end of the backbone, this layer aggregates multi-scale convolutional responses, enabling comprehensive capture of both spatial structure and semantic characteristics. To comply with the architectural requirements of GoogLeNet, input images are standardized to 224 × 224 pixels.
For each image, deep features are extracted using AlexNet and GoogLeNet. Specifically, each image is represented by two feature vectors: (1) a 4096-dimensional vector extracted by AlexNet and (2) a 1024-dimensional vector extracted by GoogLeNet. These two feature vectors jointly describe the visual characteristics of the input image.
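A minimal PyTorch/torchvision sketch of this dual-network feature extraction is given below. It assumes ImageNet-pretrained weights and follows torchvision's AlexNet layout, where truncating the classifier after its second fully connected layer yields the 4096-dimensional vector, and replacing GoogLeNet's final classifier with an identity yields the 1024-dimensional global-average-pooled feature; the preprocessing details are assumptions.

```python
# Hedged sketch of the dual-network feature extraction with torchvision.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

imagenet_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

alexnet = models.alexnet(weights="IMAGENET1K_V1").eval()
alexnet.classifier = alexnet.classifier[:6]        # stop at the second FC layer (4096-D)

googlenet = models.googlenet(weights="IMAGENET1K_V1").eval()
googlenet.fc = nn.Identity()                       # 1024-D global-average-pooled feature

def extract_features(img_path):
    img = Image.open(img_path).convert("RGB")
    x227 = imagenet_norm(transforms.functional.to_tensor(
        transforms.functional.resize(img, (227, 227)))).unsqueeze(0)
    x224 = imagenet_norm(transforms.functional.to_tensor(
        transforms.functional.resize(img, (224, 224)))).unsqueeze(0)
    with torch.no_grad():
        f_alex = alexnet(x227).squeeze(0)           # shape (4096,)
        f_goog = googlenet(x224).squeeze(0)         # shape (1024,)
    return f_alex, f_goog
```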
2.2.2. Image Feature Similarity Calculation
To quantify the similarity between two images, the cosine distance is adopted to measure the distance between their corresponding feature vectors. Given two images i and j, the cosine distances of their AlexNet features and GoogLeNet features are, respectively, defined as:

$$ d_{A}(i, j) = 1 - \frac{ f_i^{A} \cdot f_j^{A} }{ \left\| f_i^{A} \right\| \left\| f_j^{A} \right\| } \qquad (7) $$

$$ d_{G}(i, j) = 1 - \frac{ f_i^{G} \cdot f_j^{G} }{ \left\| f_i^{G} \right\| \left\| f_j^{G} \right\| } \qquad (8) $$

where f_i^A and f_j^A denote the AlexNet features of images i and j, respectively, and f_i^G and f_j^G denote the corresponding GoogLeNet features. A smaller cosine distance indicates a higher similarity between the two images.
To avoid relying on a single network for similarity evaluation, a weighted fusion strategy is adopted to combine the two distances:

$$ d(i, j) = r\, d_{A}(i, j) + (1 - r)\, d_{G}(i, j) \qquad (9) $$

where d(i, j) represents the fused distance between images i and j and ranges from 0 to 1, with 0 indicating high similarity and 1 indicating high dissimilarity, and r ∈ [0, 1] is a weighting factor controlling the relative contribution of the two networks.
For boundary image retrieval, a small set of n representative boundary images is manually selected as reference templates. For each candidate image i in the UAV dataset, the distance d(i, j) is calculated with respect to each template image j. To automatically identify boundary images, a similarity threshold K is applied: candidate images with d(i, j) < K are classified as boundary images, while those with d(i, j) ≥ K are discarded, as expressed in Equation (10):

$$ \text{label}(i) = \begin{cases} \text{boundary image}, & d(i, j) < K \\ \text{non-boundary image}, & d(i, j) \ge K \end{cases} \qquad (10) $$

The similarity threshold K was determined by examining the retrieval results to ensure high retrieval precision of boundary images. Specifically, K is chosen such that candidate images with distances below K are very likely to correspond to boundary images, while minimizing the inclusion of non-boundary images.
2.3. Boundary Image Segmentation
The dry beach in tailings ponds exhibits distinct geomorphological features, making it visually distinguishable from its surroundings. To extract dry beach boundaries, this study employs the DeepLabv3+ semantic segmentation network. Built on an encoder–decoder architecture, DeepLabv3+ captures high-level semantic information while preserving fine details, making it particularly suited for blurred or complex boundaries [53].
2.3.1. Network Architecture
To balance segmentation accuracy and computational efficiency, the standard Xception [57] backbone of DeepLabv3+ is replaced with MobileNetV2 in this study. Although MobileNetV2 achieves slightly lower classification accuracy than Xception, it significantly reduces computational cost and model parameters, effectively decreasing network complexity while maintaining strong structural perception of the image [58]. The modified architecture (Figure 5) retains the core components of DeepLabv3+, including the Atrous Spatial Pyramid Pooling (ASPP) module with dilation rates of 6, 12, and 18 for multi-scale context aggregation, and the decoder structure for boundary refinement through feature fusion. The processing pipeline comprises feature extraction by MobileNetV2, multi-scale context aggregation by the ASPP module, and upsampling with detail restoration by the decoder. The network ultimately outputs a pixel-level segmentation map distinguishing “dry beach” from “background”, with the output size matching the input image.
2.3.2. Network Training
Due to the absence of available labeled datasets for dry beach boundary segmentation in tailings ponds, we manually constructed a labeled dataset. A total of 140 representative images were selected from the initial retrieved UAV images, comprising 30 dam-beach, 60 hillside-beach, and 50 water-beach cases. The selection was based on criteria of representativeness, boundary clarity, and structural integrity, ensuring dataset comprehensiveness and model generalization. The dataset was partitioned into training (120 images) and validation (20 images) sets.
The original images were standardized to 395 × 700 pixels to meet the network input requirements while reducing memory consumption and improving training efficiency. Annotations followed a strict protocol: dry beach regions were labeled with purple masks (dry beach), while regions outside the dry beach, including the tailings dam, hillside, and water, were labeled with green masks (background). Examples of the annotated masks for the three boundary types are shown in Figure 6.
Data augmentation was applied to enhance the model’s robustness against illumination variations, viewpoint changes, and boundary ambiguities. First, each training image underwent four random brightness and contrast adjustments, generating five variants (including the original). Each variant was then rotated by 90°, 180°, and 270° and horizontally flipped, yielding 25 augmented samples per original image. All geometric transformations were applied simultaneously to the corresponding pixel-level labels to maintain spatial correspondence. The augmented images and labels were resized to the original resolution to ensure pixel alignment.
Figure 7 illustrates typical augmented samples, showing the preservation of boundary structures and the increased sample diversity.
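For illustration, the augmentation scheme can be sketched with Pillow as follows; the brightness/contrast ranges and the resize order are assumptions, but the structure (four photometric variants plus the original, each rotated and flipped together with its mask) matches the 25-sample scheme described above.

```python
# Hedged sketch of the augmentation scheme: photometric jitter on the image only,
# identical geometric transforms on image and mask. Parameter ranges are illustrative.
import random
from PIL import Image, ImageEnhance

def photometric_variants(img, n=4):
    variants = [img]
    for _ in range(n):
        out = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
        out = ImageEnhance.Contrast(out).enhance(random.uniform(0.7, 1.3))
        variants.append(out)
    return variants                                   # 5 variants incl. the original

def geometric_variants(img, mask):
    pairs = [(img, mask)]
    for angle in (Image.ROTATE_90, Image.ROTATE_180, Image.ROTATE_270):
        pairs.append((img.transpose(angle), mask.transpose(angle)))
    pairs.append((img.transpose(Image.FLIP_LEFT_RIGHT),
                  mask.transpose(Image.FLIP_LEFT_RIGHT)))
    return pairs                                      # 5 orientations per variant

def augment(img, mask, size=(700, 395)):              # (width, height); assumed order
    samples = []
    for variant in photometric_variants(img):
        for im, mk in geometric_variants(variant, mask):
            samples.append((im.resize(size), mk.resize(size, Image.NEAREST)))
    return samples                                    # 25 (image, label) pairs
```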
In this study, the DeepLabv3+ model was trained on a desktop equipped with an Intel Core i5-12600KF CPU, 32 GB RAM, and an NVIDIA RTX 3060Ti GPU. The training parameters were set with a mini-batch size of 8, a maximum of 50 epochs, and an initial learning rate of 0.001, and the Adam optimizer was employed with adaptive learning rate adjustment. Upon convergence, the trained model was applied to segment all retrieved dry beach boundary images.
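A hedged training-setup sketch under these hyperparameters is given below. Because torchvision does not ship a DeepLabv3+ variant with a MobileNetV2 backbone, its MobileNetV3 model is used here as a stand-in, and the data loader is assumed to yield normalized image tensors with integer class masks (0 = background, 1 = dry beach).

```python
# Illustrative training loop (batch 8, 50 epochs, Adam with lr = 0.001);
# MobileNetV3 backbone used as a stand-in for the MobileNetV2 variant in the text.
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

device = "cuda" if torch.cuda.is_available() else "cpu"
model = deeplabv3_mobilenet_v3_large(num_classes=2).to(device)   # dry beach vs. background
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

def train(loader, epochs=50):
    # loader yields (images, masks): images (B, 3, H, W) float, masks (B, H, W) int64
    model.train()
    for _ in range(epochs):
        for images, masks in loader:
            optimizer.zero_grad()
            logits = model(images.to(device))["out"]              # (B, 2, H, W)
            loss = criterion(logits, masks.to(device))
            loss.backward()
            optimizer.step()
```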
2.3.3. Lightweight Incremental Training Strategy
To mitigate performance degradation caused by variations in environmental conditions across different acquisition phases, a lightweight incremental training strategy is introduced to enhance the generalization capability of the segmentation model with a small additional labeling cost.
This strategy assumes that the segmentation network has been pre-trained using an initial annotated dataset. Specifically, the baseline DeepLabv3+ model is first trained on the annotated images from the initial acquisition (referred to as ‘phase I’ in Section 3.1), as described in Section 2.3.2. When the trained model is applied to newly acquired imagery and performance degradation is observed for certain boundary types, a small number of representative samples from the new-phase dataset are selectively chosen for incremental annotation. These samples are preferentially selected from regions exhibiting significant missegmentation, such as areas affected by strong illumination changes, surface cover transitions, or complex reflection conditions.
The newly annotated samples are then merged with the existing training dataset to construct an updated training set. To compensate for the limited number of incremental samples, data augmentation techniques (rotation, flipping, and brightness and contrast adjustments) are applied to enrich feature diversity and reduce the risk of overfitting. Instead of retraining the network from scratch, the pre-trained model is subsequently fine-tuned using this combined dataset, enabling the network to retain previously learned structural features while adapting to the newly introduced appearance variations.
This lightweight incremental training strategy allows efficient cross-phase adaptation with low annotation effort and computational cost, while maintaining the overall efficiency and practical applicability. It provides a viable solution for long-term dry beach monitoring under continuously evolving environmental conditions.
2.4. Boundary Extraction and Point Cloud Segmentation
To accurately map dry beach boundaries from 2D semantic masks to 3D space, a boundary pixel extraction and back-projection method is employed in this study. First, the Canny edge detection algorithm is applied to the semantic segmentation results to identify 2D boundary pixels between the “dry beach” and “non-dry beach” regions, as illustrated in Figure 8. Each boundary pixel is then back-projected into 3D space using the known projection matrix obtained from the 3D reconstruction, forming a ray along which the nearest point in the reconstructed dense point cloud is taken as its 3D coordinate (Equation (6)). Once all boundary pixels are processed, the resulting 3D boundary points are projected onto the XOY plane and fitted with a closed polygon. Points inside the polygon are classified as dry beach, while those outside are considered non-target features (e.g., dams, hillslopes, water). This approach integrates 2D semantic information with 3D geometric constraints, enabling efficient extraction of the dry beach from large-scale point clouds.
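The following Python sketch outlines this 2D-to-3D mapping. The per-ray nearest-neighbor search is written naively for clarity, the pose convention (camera center −R^T t) matches Section 2.1, and array names such as mask and cloud are illustrative assumptions.

```python
# Hedged sketch: Canny edges on the mask, back-projection along camera rays
# (Equation (6)), nearest-cloud-point selection, and XOY polygon clipping.
import cv2
import numpy as np
from matplotlib.path import Path

def boundary_pixels(mask):
    # mask: uint8 image, 255 = dry beach, 0 = background
    edges = cv2.Canny(mask, 100, 200)
    v, u = np.nonzero(edges)
    return np.stack([u, v], axis=1)                     # (M, 2) pixel coordinates

def backproject_to_cloud(pixels, K, R, t, cloud):
    # K: (3, 3) intrinsics; R: (3, 3); t: (3,) translation; cloud: (N, 3) dense points
    center = -R.T @ t                                   # camera center C = -R^T t
    dirs = (R.T @ np.linalg.inv(K) @ np.c_[pixels, np.ones(len(pixels))].T).T
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    pts3d = []
    for d in dirs:
        rel = cloud - center
        dist = np.linalg.norm(rel - (rel @ d)[:, None] * d, axis=1)  # distance to the ray
        pts3d.append(cloud[np.argmin(dist)])            # nearest dense point along the ray
    return np.array(pts3d)

def clip_dry_beach(cloud, boundary3d):
    polygon = Path(boundary3d[:, :2])                   # closed polygon on the XOY plane
    inside = polygon.contains_points(cloud[:, :2])
    return cloud[inside]                                # dry beach point cloud
```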
3. Experiments and Results
3.1. Data Acquisition
The study area is a valley-type tailings pond located in Yunnan Province, China, at an altitude of approximately 3300–3560 m (Figure 9a). The pond extends roughly 4–6 km along a northwest–southeast-trending valley, with a floor width of about 100 m and side slopes of 25–35°, representing a typical morphology of mountainous tailings ponds. The narrow, elongated valley-type morphology of the pond reflects the typical spatial arrangement of dry beaches. The lithology is dominated by weakly metamorphosed slate, with well-developed shear joints and cleavage. Surface water is primarily supplied by the Gecan River system, and groundwater is primarily fissure water within the slate, supplemented by scattered pore water in loose deposits. The region experiences pronounced seasonal precipitation, with nearly 90% of annual rainfall occurring during the wet season from May to September, while winter months are cold and mining activities are typically limited.
To validate the applicability and generalization capability of the proposed method, UAV data were collected in two phases: May 2024 (phase I) and January 2025 (phase II). Phase I was acquired during the wet season, when precipitation is concentrated, and the reservoir water level is relatively high, resulting in partial inundation of the dry beach and a relatively shorter exposed beach length. In contrast, phase II was conducted during the cold and dry winter period at a high altitude. During this period, mining activities were suspended, tailings discharge was significantly reduced, and parts of the reservoir surface were frozen, leading to changes in surface reflectance and enhanced brightness contrast between the water and dry beach boundaries. Meanwhile, snow cover occurred on portions of the hillslopes, and strong surface reflections were introduced due to low solar elevation. In addition, the seasonal replacement of black tarpaulins with white ones altered the surface appearance. Therefore, the two phases represent contrasting acquisition conditions in terms of hydrology, surface cover, and illumination, providing a suitable multi-temporal dataset for evaluating the robustness of boundary extraction. Both datasets cover the tailings dam, dry beach, reservoir water, and surrounding hills, providing typical samples with clear boundary structures and diverse terrain.
For high-precision modeling and boundary identification of the tailings pond, multi-view images were acquired using a DJI M300 RTK drone with a Zenmuse P1 camera. As phase I was the initial survey, the flight route was designed to ensure full coverage of the entire study area, with a flight altitude of approximately 145 m and a coverage area of about 3.85 km². In phase II, the flight route was optimized to improve acquisition efficiency while still ensuring complete coverage of the target area, resulting in a reduced flight altitude of approximately 127 m and a smaller coverage area of about 1.23 km². Accordingly, four sub-area flights were conducted in phase I and two in phase II, each with at least 80% forward overlap and 70% side overlap to ensure reliable 3D reconstruction (Figure 9b). Thirteen targets were deployed throughout the study area (Figure 9d), and their coordinates were precisely measured using RTK. In phase I, nine of them were used as Ground Control Points (GCPs) to constrain the bundle adjustment, while the remaining four were reserved as independent Check Points (CKPs) for accuracy assessment. In phase II, due to the optimized flight route, one target (Point 11) was not captured, resulting in twelve valid targets, among which eight were used as GCPs and four as CKPs to maintain consistent validation.
A total of 4016 and 3985 original images were acquired for phase I and phase II, respectively, each with a resolution of 8192 × 5460 pixels. SfM and MVS techniques were applied to generate approximately 3.97 million sparse points and 240.8 million dense points in phase I, and 1.71 million sparse points and 177.5 million dense points in phase II, as shown in Figure 9c,d. The accuracy assessment is summarized in Table 1. The root mean square error (RMSE) of the CKPs was ±3.05 cm (horizontal, xy) and ±4.49 cm (vertical, z) in phase I, and ±2.12 cm (horizontal, xy) and ±3.40 cm (vertical, z) in phase II. The errors of the CKPs were consistent in magnitude with the residuals of the GCPs, demonstrating the high reliability of the photogrammetric reconstructions. This level of accuracy is sufficient to meet the engineering requirements of the present study.
Both datasets effectively reconstructed the 3D structure of the tailings dam, dry beach, and reservoir water area. Projection matrices for all images were also obtained during the reconstruction process, enabling subsequent 3D back-projection of semantic boundaries.
3.2. Network Comparison
To evaluate the suitability of different semantic segmentation networks for dry beach identification, three classical networks (DeepLabv3+, SegNet [59], and UNet [60]) were compared using identical datasets, augmentation strategies, and training parameters. Figure 10 illustrates the training processes of the three networks. With the same number of iterations, DeepLabv3+ (7.5 h) and SegNet (8 h) required far less training time than UNet (54.5 h). More importantly, the learning curves indicate that DeepLabv3+ reached near-convergence significantly earlier than both SegNet and UNet, while consistently maintaining higher accuracy. These results demonstrate the clear advantages of DeepLabv3+ in both training efficiency and performance.
To quantitatively evaluate model performance, three commonly used evaluation metrics are adopted. Accuracy is the ratio of correctly predicted pixels to the total ground truth pixels for a given class. Intersection over Union (IoU) is the ratio of correctly classified pixels to the total number of ground truth and predicted pixels in that class. Boundary F1 Score (BF) quantifies boundary extraction accuracy by comparing predicted boundaries with ground truth, with higher scores indicating better matching.
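A minimal sketch of how these metrics can be computed from binary masks is given below; the boundary-matching tolerance used for the BF score is an assumption, as the value used in this study is not stated.

```python
# Hedged sketch of per-class accuracy, IoU, and a tolerance-based Boundary F1 score.
import numpy as np
import cv2

def accuracy_iou(pred, gt, cls=1):
    p, g = (pred == cls), (gt == cls)
    tp = np.logical_and(p, g).sum()
    acc = tp / max(g.sum(), 1)                         # correct pixels / ground-truth pixels
    iou = tp / max(np.logical_or(p, g).sum(), 1)
    return acc, iou

def boundary_f1(pred, gt, cls=1, tol=3):
    # Boundaries extracted from the binary masks; matches counted within `tol` pixels.
    pb = cv2.Canny((pred == cls).astype(np.uint8) * 255, 100, 200) > 0
    gb = cv2.Canny((gt == cls).astype(np.uint8) * 255, 100, 200) > 0
    kernel = np.ones((2 * tol + 1, 2 * tol + 1), np.uint8)
    gb_dil = cv2.dilate(gb.astype(np.uint8), kernel) > 0
    pb_dil = cv2.dilate(pb.astype(np.uint8), kernel) > 0
    precision = np.logical_and(pb, gb_dil).sum() / max(pb.sum(), 1)
    recall = np.logical_and(gb, pb_dil).sum() / max(gb.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```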
Table 2 summarizes the mean values of the three evaluation metrics across all classes on the test set for the three networks. Figure 11 shows representative segmentation results compared with the ground truth. DeepLabv3+ achieves the highest scores across all indices, producing precise boundaries and structural continuity. SegNet exhibits moderate accuracy but poor boundary delineation (BF = 0.5133), which is visually manifested as blurred, discontinuous boundaries deviating from the true edges (Figure 11). UNet shows relatively lower overall accuracy, although its mean BF (0.5791) is slightly higher than that of SegNet; moreover, UNet produced noticeable misclassifications in several cases. Overall, DeepLabv3+ demonstrates optimal performance in both training efficiency and segmentation quality, establishing it as the preferred network for dry beach boundary identification in this study.
3.3. Results of Phase I
Four representative boundary images in Figure 12 were selected as templates. The similarity between all images and the templates was calculated using Equations (7)–(9). The weighting coefficient r = 0.5 was set to assign equal importance to the features extracted by AlexNet and GoogLeNet. With a threshold of K = 0.35, the retrieval process identified 152 dry beach-dam boundary images from the template in Figure 12a, 248 waterline images from Figure 12b, and 211 and 214 dry beach-hillside boundary images from Figure 12c and Figure 12d, respectively. Examples of the retrieved images are shown in Figure 13.
The retrieved images were then segmented into “dry beach” and “non-dry beach” regions using the trained DeepLabv3+ model (Section 2.3.2). Figure 14 shows segmentation examples corresponding to the images in Figure 13. To quantitatively evaluate the segmentation performance of the proposed method on phase I images, the images shown in Figure 13 were manually annotated and compared with the automatic segmentation results. Table 3 lists the three evaluation metrics: accuracy, Intersection over Union (IoU), and Boundary F1 Score (BF).
For the dam boundary and hillside boundary, both the mean accuracy and IoU consistently exceeded 0.95, indicating that the segmented dry beach regions exhibit high regional completeness without obvious misclassification. In addition, the mean BF for these boundaries reached values above 0.85, demonstrating high consistency between the extracted boundaries and the manually interpreted ones. A comparison between dry beach-hillside 1 and hillside 2 shows that the performance for hillside 2 is slightly lower. This discrepancy is mainly attributed to shadow effects induced by vegetation projected onto the dry beach under oblique illumination conditions. The shadowed areas exhibit spectral characteristics similar to the black tarpaulins on the hillside, resulting in partial misclassification of dry beach pixels as background (Figure 14d).
For the water boundary, the evaluation metrics are relatively lower, with mean accuracy, IoU, and BF of approximately 0.9101, 0.8477, and 0.7059, respectively. This is mainly because the transition zone between water and dry beach is inherently ambiguous, where water and deposited materials are intermingled. Consequently, both manual annotation and automatic segmentation suffer from unavoidable uncertainties near the transition zone.
Canny edge detection was subsequently applied to the segmented images to extract 2D boundary pixels, with results illustrated in Figure 15. Next, the 2D boundary pixels were back-projected into 3D space using the projection matrices obtained from the SfM reconstruction. For each pixel, a ray was constructed along the back-projection direction, and the nearest neighbor in the dense point cloud was selected as its 3D coordinate, producing a set of 3D boundary points (Figure 16a). These 3D boundary points were projected onto the XOY plane and fitted with a closed 2D polygon. The dense point cloud was then clipped through a spatial inclusion test to isolate the dry beach points (Figure 16b). The resulting point cloud exhibits clear coverage and structural continuity, providing a reliable basis for further geometric analyses.
3.4. Results of Phase II
To assess the temporal adaptability of the proposed method, the semantic segmentation model trained on phase I was directly applied to the phase II dataset. Differences in imaging conditions and environmental states between the two phases affected the stability and generalization performance of the model. From the 3985 images in phase II, 653 boundary images were identified, including 62 dry beach-dam boundaries, 343 dry beach-hillslope boundaries, and 152 waterline boundaries. Typical examples are shown in Figure 17, with the first image serving as the template.
The segmentation performance on phase II images was evaluated using manually annotated boundaries in the same manner as in phase I (Table 4). The results indicate that the model maintained good performance in certain regions of phase II. For the dam boundary, the evaluation metrics remain comparable to those of phase I, owing to the clear structural outlines (Figure 17a), indicating that the proposed model maintains stable performance.
The hillside boundary (Figure 17c) was also segmented accurately under varying illumination and shadows, reflecting reasonable generalization. However, the evaluation metrics decreased, with mean accuracy, IoU, and BF dropping to approximately 0.9314, 0.9186, and 0.8043, respectively. This degradation is mainly caused by surface changes between the two periods. Some hillside boundary images of phase II included white coverings such as snow and white tarpaulins, whereas phase I images featured black tarpaulins partially obscuring the hillside. These variations in color and texture degrade the model’s generalization, leading to misclassifications in the affected regions (Figure 17c).
For the water boundary in phase II, the performance also exhibits a certain degree of degradation (Figure 17b). In phase I, the uncertainty mainly originates from the inherent mixing characteristics of the transition zone, which makes the boundary ambiguous. In phase II, however, the performance degradation is additionally and closely related to changes in illumination conditions. Compared with the dam and hillslope regions, the water surface is more sensitive to illumination variations, and changes in optical reflection properties cause the visual features of part of the dry beach-reservoir water transition zone to deviate from the feature distribution learned from the phase I training data. As a result, when the model trained only on phase I images is directly applied to the phase II dataset, a decrease in segmentation robustness is observed in the water boundary regions.
5. Conclusions
In this study, we proposed a deep learning–based framework that integrates image semantic segmentation with 3D reconstruction to accurately extract dry beach boundaries from UAV datasets, establishing a complete workflow from boundary image retrieval and semantic segmentation to 3D back-projection and dry beach point cloud extraction.
The proposed method was first applied to phase I images, where DeepLabv3+, SegNet, and UNet were trained on a limited number of annotated images. Comparative experiments indicate that DeepLabv3+ achieved higher segmentation accuracy, Intersection over Union (IoU), and Boundary F1 score (BF), making it a suitable choice for this study. The results of phase I demonstrate that the proposed framework can effectively extract continuous boundaries and point clouds of the dry beach.
When the model trained on phase I images is directly applied to the phase II dataset, the performance remains generally stable for dam and partial hillslope boundaries, while certain degradation occurs in regions affected by surface cover changes and illumination variations. These issues can be effectively mitigated through fine-tuning with a small number of additional samples, confirming the feasibility of the lightweight incremental training strategy.
Unlike traditional point-cloud segmentation that relies on geometric conditions, the proposed approach effectively handles irregular boundaries and complex textures. The extracted 3D boundaries and dry beach point cloud enable the quantitative analysis of length and morphology changes, overcoming the limitations of purely 2D image analyses.
In this work, AlexNet and GoogLeNet were employed for image feature extraction, and DeepLabv3+ was used for semantic segmentation. Although these are classical network architectures, the experimental results demonstrate that the extracted boundary images and segmentation outputs are sufficient for practical engineering applications. Future work will explore the integration of more recent network architectures and more challenging image scenarios to further improve the robustness and efficiency of the proposed framework. Overall, the proposed framework provides a reliable and scalable solution for automated 3D monitoring of tailings beaches, supporting safety assessment and multi-temporal deposition analysis in engineering applications.