Article

A Method of Simplified Synthetic Objects Creation for Detection of Underwater Objects from Remote Sensing Data Using YOLO Networks

Navigational Faculty, Geoinformatics and Remote Sensing Department, Maritime University of Szczecin, Wały Chrobrego 1-2, 70-500 Szczecin, Poland
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(15), 2707; https://doi.org/10.3390/rs17152707
Submission received: 20 June 2025 / Revised: 17 July 2025 / Accepted: 31 July 2025 / Published: 5 August 2025

Abstract

The number of CNN application areas is growing, which increases the need for training data. The research conducted in this work aimed to obtain effective detection models trained only on simplified synthetic objects (SSOs). The research was conducted on inland shallow water areas, and images of bottom objects were obtained using a UAV platform. The work consisted of preparing SSOs, from which composite images were created. On such training data, 120 models based on the YOLO (You Only Look Once) network were obtained. The study confirmed the effectiveness of models created using YOLOv3, YOLOv5, YOLOv8, YOLOv9, and YOLOv10. A comparison was made between YOLO versions, and the influence of the amount of training data, the SSO type, and the augmentation parameters used in the training process was analyzed. The main measure of model performance was the F1-score. The calculated statistics of the individual models indicate that the most effective networks used partial augmentation and were trained on sets of 2000 SSOs. In addition, the increased transparency of SSOs increased the diversity of the training data and improved model performance. This research is developmental, and further work should improve the processes of obtaining detection models using deep networks.

1. Introduction

The increasing progress in the development of deep networks has resulted in numerous publications in recent years relating to the use of machine learning [1]. One of the environments in which such work is carried out is shallow water areas. In particular, convolutional neural networks (CNNs) allow for the development of research related to the habitats of marine animals and plants [2,3,4,5,6,7,8]. The use of deep learning allows for the development of techniques related to object detection, as well as for improving image quality [4,5,9,10]. The ocean is still largely unmapped, and although only about 1% of the ocean is shallow water less than 10 m deep, it remains an important area of research [11,12].
Among interesting applications, it is worth mentioning the acquisition of knowledge regarding the monitoring of animal populations [2,13,14], detection of their habitat areas [3,4,8], monitoring of underwater plant vegetation [15,16,17], underwater boulder mapping [18,19], and control of water pollution [20,21,22]. When working with artificial intelligence-based networks, one of the leading problems is the provision of large training sets [8,23,24]. The large number of labeled images that must be provided to a deep network has led to the search for alternative solutions. One such solution is the use of synthetic data. This type of data can be obtained in many ways, which can be divided into rule-based simulation and generative AI simulation [25,26,27,28,29,30,31,32].
Many publications refer to the use of synthetic data to obtain neural networks for detection purposes. The obtained synthetic data are created as a result of simple image processing as well as complex processes of generating 3D objects [26,33]. They can also contain generated backgrounds or real images [34]. Examples of works in which extremely realistic synthetic objects are generated are all kinds of research related to the inspection of mechanical objects, where the search focuses on details and defects [33,35]. In these studies, the researchers focused on developing a robust system for creating synthetic or composite images. Objects created using computer programs such as Blender allowed for the creation of objects nearly identical to real-world objects. The automation process allowed for automatic labeling and the introduction of changes related to lighting, positioning, and texture. The primary goal of the solutions being developed is to create reliable datasets.
In contrast to this approach, works on objects in shallow water reservoirs often contain elements related to improving the quality of images. The aquatic environment is an extremely variable and difficult area to work in. The use of remote sensing images is limited by the range of electromagnetic radiation penetration; hence, there are works focusing on the detection of objects from sonar images [12,18,19,27,36,37]. Such images do not allow for obtaining natural object colors and have limitations in the case of flat objects, and the use of sonar in shallow water areas can be problematic. In the case of traditional photos taken by an unmanned aerial vehicle (UAV), as well as underwater images, we are dealing with factors that make it difficult to obtain remote sensing data, such as reflections, refraction and dispersion of light, pollution occurring in water, and currents that dynamically change the environmental conditions [3,9,10,38,39,40,41,42]. A number of factors, more or less locally diverse, must be taken into account in such studies. In these studies, the influence of water refers to factors such as water waves and depth, which affect the color and shape of the object. Additionally, the images show reflections and bottom sediments obscuring the objects.
In this article, attention was focused on obtaining images using unmanned aerial vehicles (UAVs) and detecting objects in shallow inland waters. A small amount of training data led to the use of synthetic data. The influence of the previously mentioned factors, i.e., the dynamics of changes in the aquatic environment and its unpredictability, creates a big problem in generating a realistic synthetic object. In this work, the spatial resolution of remote sensing images was also a limitation that influenced the visualization of underwater objects observed from an aerial platform. This is mainly caused by image pixelization and depends on the altitude of the remote sensing platform and the specifications of the optical sensor. These factors led to the proposal to use simplified synthetic objects. Such an SSO visually represents the simplified appearance of the object captured by a UAV camera rather than as a complex 3D model that realistically replicates the real one. Simplification allows for faster generation of this type of synthetic data, as well as easier adaptation to the environment in which the detection is carried out.
The main goal of this article was to develop a methodology for creating training datasets with simplified synthetic data of objects located at the bottom of inland water reservoirs for the purpose of creating detection models based on deep networks. This topic is related to certain challenges, limitations, and opportunities that are described later in this work. The partial goals achieved in the work were as follows:
  • Development of simplified synthetic objects;
  • Assessing the impact of type, quantity, and augmentation of SSOs on detection accuracy;
  • Testing and validating the neural network models using real data to ensure their efficiency;
  • Analyzing the effectiveness of the CNN model in locating ground control points in the orthoimage;
  • Determining the neural network creation process to prepare the model that performs best in detecting the real underwater objects in the shallow water area.
The objects researched in this work were measuring disks used as ground control points (GCPs). This type of object served only as experimental data to test the feasibility of the method of creating composite images with simplified synthetic objects. There are also works focusing on the detection of GCPs strictly for georeferencing aerial images for land surveying, which, however, was not the aim of this work. Additionally, those works used real images of the objects in the detection process [43,44,45].
In this research, YOLO (You Only Look Once) networks were utilized as the detection model. YOLO networks are widely adopted and well documented in the scientific literature. Since the first version was introduced in 2015, YOLO models have evolved rapidly, with YOLOv3 released in 2018 and the latest version, YOLO12, reflecting the continued advancement of this technology [46,47,48,49,50,51,52,53,54,55]. In order to use the YOLO network on wide-area image data registered in a spatial coordinate system, we used ArcGIS software in the first stage of the study. In particular, this allowed us to automate the preparation of the test set (subdivision into smaller image chips) and then detect objects in the domain of a uniform image. Since this software, due to licensing restrictions, implements YOLO version v3, we compared the detection performance of this model with models developed using YOLOv5, YOLOv6, YOLOv8, YOLOv9, YOLOv10, YOLO11, and YOLO12 in Python 3.10.12.
The YOLO series has become a leading framework for real-time object detection and has evolved over the years through successive improvements. Early YOLO releases established the framework for the series, primarily from a model design perspective. YOLOv4 and YOLOv5 introduced the CSPNet backbone, along with multi-scale feature fusion and extensive data augmentation. YOLOv6 further expanded on these solutions by introducing BiC and SimCSPSPPF modules for the backbone and neck, as well as anchor-aided training. Subsequently, YOLOv7 introduced E-ELAN (Efficient Layer Aggregation Networks) for improved gradient flow and various “bag-of-freebies” features. YOLOv8 integrates an efficient C2f block for better feature extraction, while YOLOv9 introduces GELAN for architecture optimization and PGI for improved training. YOLOv10 employs NMS-free training with dual assignment to improve performance, while YOLO11 further reduces latency and improves accuracy by using the C3K2 module and lightweight depthwise separable convolutions in the detection head. YOLO12 introduces an attention-centric architecture that departs from the traditional approach, aiming for a compromise between the best possible accuracy and real-time performance [54].
The research undertaken has relevance in terms of bridging the data availability gap for underwater object detection models, using simplified synthetic data as an alternative to fully synthetic and real data. The results may find applications in underwater monitoring, underwater archaeology, geographic research, environmental protection, and further development of neural detection methods based on this type of data.

2. Materials and Methods

In this section, various assumptions and parameters of the conducted research are discussed. The research problems resulting from the application of deep learning, discussed in the previous section, determined the course of the conducted research.

2.1. Research Methodology

In our research, a series of activities, presented in Figure 1, was carried out with the aim of obtaining ready-made deep learning models for use in searching for objects at the bottom of water reservoirs. The diagram presents the basic stages of data preparation and processing in the study. Seven main elements have been identified, which constitute the subsections that follow. These sections discuss the most important elements, concepts, and abbreviations used in this article.

2.2. Input Data

The following image data were used for this study:
  • Orthophotos from the resources of the Polish geoportal [56]: two images showing the same part of Dąbie Lake located in northwestern Poland. The images had a ground resolution of 5 cm and were taken in 2021 and 2022. They show the area of a shallow-water bay in natural colors. These data were the basis for fusion with simplified synthetic objects.
  • Set of 129 images from the UAV photogrammetric mission: photos taken in October 2022. These photos show the previously mentioned area of Dąbie Lake in natural colors. There are 152 real GCPs visible in the entire set of photos. These data served as a test set to check the effectiveness of the detection models.
  • Orthophoto from a UAV photogrammetric mission: an orthoimage showing the same bay on the lake with a visible arrangement of 5 ground control points (GCPs). This image, taken in October 2022, had a ground resolution of 4.5 cm and a composition of natural colors. The CNN-detected GCPs in this study were utilized to evaluate positional accuracy by calculating the distance differences between the actual GCP coordinates and the bounding boxes generated by the neural network. This method allowed for a precise measurement of accuracy, highlighting the effectiveness of the CNN model in locating ground control points in the orthoimage.
The orthophotos used in the study and an example photo from the test set are shown in Figure 2.

2.3. Simplified Synthetic Object Generation

In this study, the generated objects represented the real ground control points (GCPs), which are well-recognizable objects in aerial imagery. The GCP used was a circular disk with a diameter of 1.1 m, featuring a four-field checkerboard pattern in two colors: black and orange. This type of GCP was visible in the UAV images and served as the basis for SSO generation.
The ground control points shown in Figure 3 have a digit visible on one of the fields and a distinct triangular notch in the orange area within the black field. These features of the GCP were not represented in the SSOs (Figure 4). This was because they were created based on the UAV orthophoto image. Due to the resolution and the influence of the water layer, the aforementioned features of the GCP are not visible in the imagery (Figure 3c) and were not reproduced on the SSOs.
The process of generating synthetic ground control points involved using GIMP (2.10.34) and Inkscape (0.92.0 r15299). The object was initially created in Inkscape as four quarters of a circle, two orange and two black, forming a checkerboard pattern, using the Circle tool without an outline and with a colored fill. The colors were picked using the Color Picker tool from the GCP image obtained from the drone.
After exporting the graphic to PNG format, the GCP image resolution was reduced to 22 by 22 pixels, matching the size of the GCP in the UAV images.
The next stage was the preparation of three types of artificial ground control points, which were processed using opacity adjustments. Opacity was set to 80% for all object types. In this way, the Type A GCP was created.
Based on this foundation, a background layer was added under the synthetic GCP with a color similar to the reservoir bed visible in Figure 3c (HTML: a58953), using a circular gradient that brightens outward (HTML: c8b694). This background shows through from under the partially transparent Type A GCP. In this way, the Type B GCP was obtained, which has the lowest transparency.
The last type of GCP was created by using GCP Type A as a base and reducing the opacity of the black fields to 65%. After this adjustment, a blur was applied. The resulting objects are presented in Figure 4.
Despite the GCP being a highly characteristic object due to its regular circular shape and color scheme, simple graphic modifications were applied to enhance the quality of the training data and to illustrate the impact of the water layer above the object. The three SSO types are as follows:
  • GCP Type A: described with the abbreviation Type A or just letter “A”, has partial transparency;
  • GCP Type B: described with the abbreviation Type B or just letter “B”, has a gradient that reduces transparency and gives a sandy color;
  • GCP Type C: described with the abbreviation Type C or just letter “C”, an object with the highest transparency, especially increased in the black area, and blurred edges.
The simplified synthetic objects vary slightly in their circularity, which can be influenced in underwater objects by water impurities, vegetation, sediment layers, or light reflections that blur the object’s contour.
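To make the generation pipeline concrete, the sketch below shows how a simplified synthetic GCP of the kind described above could be produced programmatically with Pillow. This is an illustrative alternative to the Inkscape/GIMP workflow actually used by the authors; the hex colors, opacity values, blur radius, and file names are assumptions chosen for demonstration only.

```python
# Illustrative sketch (not the authors' exact Inkscape/GIMP workflow): programmatic
# generation of a simplified synthetic GCP with Pillow. Colors, opacities, and the
# blur radius are assumed approximations.
from PIL import Image, ImageDraw, ImageFilter

ORANGE = (230, 120, 40)   # assumed approximation of the orange field color
BLACK = (20, 20, 20)      # assumed approximation of the black field color

def make_sso(size=512, opacity=204, black_opacity=None, blur_radius=0.0):
    """Draw a four-field checkerboard disk and return it scaled to 22 x 22 px."""
    img = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    draw = ImageDraw.Draw(img)
    bbox = [0, 0, size - 1, size - 1]
    black_a = opacity if black_opacity is None else black_opacity
    # Four quarter-circle fields forming the checkerboard pattern.
    draw.pieslice(bbox, 0, 90, fill=BLACK + (black_a,))
    draw.pieslice(bbox, 90, 180, fill=ORANGE + (opacity,))
    draw.pieslice(bbox, 180, 270, fill=BLACK + (black_a,))
    draw.pieslice(bbox, 270, 360, fill=ORANGE + (opacity,))
    if blur_radius > 0:
        img = img.filter(ImageFilter.GaussianBlur(blur_radius))
    # Downscale to the size of a GCP in the UAV imagery (22 x 22 px).
    return img.resize((22, 22), Image.LANCZOS)

sso_type_a = make_sso(opacity=int(0.80 * 255))                        # Type A: 80% opacity
sso_type_c = make_sso(opacity=int(0.80 * 255),
                      black_opacity=int(0.65 * 255), blur_radius=4.0)  # Type C-like: more transparent black fields, blurred
sso_type_a.save("sso_type_a.png")
sso_type_c.save("sso_type_c.png")
```

A Type B-like object would additionally require compositing the disk over a sandy gradient background, which is omitted here for brevity.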

2.4. Creating Image Data with SSOs

The creation of images, combining the base layer of orthoimages from 2021 and 2022 with the three types of ground control points, was conducted using Python scripts. The choice of underlying images influenced the diversity of synthetic ground control points due to variations in lighting and color saturation characteristic of these images. The general aim was to generate composite images containing a specific quantity of one type of synthetic GCP. Given that deep learning requires sufficient training data, the study employed four quantities of simplified synthetic GCPs on composite images: 500, 1000, 2000, and 4000. These SSOs were distributed across two images and are summarized in Table 1.
An example composite image created from GCP Type C with 500 objects on the 2021 orthoimage is presented in Figure 5. Due to the large extent of the terrain, the SSOs are not clearly visible, hence the fragment showing a close-up view of the objects.

Composite Image Generation Principles

The algorithm used for creating composite images aimed to expedite the process while adhering to specific assumptions regarding the placement of synthetic GCPs on the image:
  • Non-overlapping objects: ground control points must not overlap each other. This rule ensures that the simplified synthetic GCPs mimic the arrangement observed in the UAV imageries, where each GCP is distinctly positioned without overlap.
  • Random rotation of objects: each GCP was randomly rotated to simulate real-world variability. This rotation adds diversity to the simplified synthetic GCPs, reflecting the natural variability in the orientation of objects in the environment.
  • Objects distributed in water areas: the simplified synthetic ground control points were placed specifically in the water areas of the lake. This condition replicates the detection of underwater objects, ensuring that the synthetic images accurately represent the environment where the GCPs would be detected.
  • Inclusion of transparency in synthetic ground control points: simplified synthetic objects were created with transparency, represented by an alpha channel. This transparency allows the underlying orthoimage to influence the appearance of the simplified synthetic GCP, resulting in variations in coloration depending on the underlying surface.
The assumptions adopted for generating simplified synthetic images were aimed at testing the impact of different types of artificial GCPs in the same environment, on the same background data, and taking into account the basic properties of the detected object.
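A minimal sketch of how these placement rules could be implemented is shown below. It is a conceptual illustration only, not the authors' script: the file names, the binary water mask, and the SSO loader are assumptions, and georeferencing of the output (which in practice would require a geospatial library such as rasterio) is ignored.

```python
# Conceptual sketch of the composite-image generation rules described above:
# non-overlapping placement, random rotation, water-area restriction, and alpha
# compositing. File names and the water mask are assumptions for illustration.
import random
from PIL import Image
import numpy as np

random.seed(42)

base = Image.open("orthophoto_2021.tif").convert("RGBA")        # background orthoimage (assumed file)
water = np.array(Image.open("water_mask.png").convert("L")) > 0  # True where water (assumed mask, same size as base)
sso = Image.open("sso_type_c.png").convert("RGBA")               # 22 x 22 px simplified synthetic GCP

placed = []   # (x, y, w, h) of already placed objects, for the non-overlap rule

def overlaps(box, boxes):
    x, y, w, h = box
    return any(x < bx + bw and bx < x + w and y < by + bh and by < y + h
               for bx, by, bw, bh in boxes)

n_objects, attempts = 500, 0
while len(placed) < n_objects and attempts < 100000:
    attempts += 1
    rotated = sso.rotate(random.uniform(0, 360), expand=True)   # random rotation
    w, h = rotated.size
    x = random.randint(0, base.width - w)
    y = random.randint(0, base.height - h)
    # Keep the object fully inside the water area and away from other objects.
    if not water[y:y + h, x:x + w].all() or overlaps((x, y, w, h), placed):
        continue
    base.paste(rotated, (x, y), rotated)    # alpha channel lets the background show through
    placed.append((x, y, w, h))

base.convert("RGB").save("composite_500_type_c.tif")
```

The recorded placements (x, y, w, h) are what make the subsequent automatic labeling possible.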

2.5. Preparation of Training Sequences

At the stage of preparing the training sequence, ArcGIS Pro 3.3.1 software was used along with the required extensions related to machine learning. The activities carried out were aimed at creating datasets for training deep networks on image chips, i.e., small images extracted from a larger source image. The source image in this case was the composite images obtained in the previous stage.
ArcGIS Pro includes a pre-defined tool called Export Training Data For Deep Learning, designed specifically for generating image chips. This tool allows users to define bounding boxes around the objects of interest and assign appropriate labels. In this context, the bounding box was configured as a circle with a diameter of 1.2 m, which encompassed the detected object along with a small margin. The marking process was automated using a script due to the large volume of data and the known coordinates of synthetic ground control points on the generated images.
The Export Training Data For Deep Learning tool in ArcGIS Pro facilitated the extraction of image chips with the specified bounding boxes and labels, providing datasets ready for further processing.
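Because the SSO coordinates are known from the compositing step, the labeling itself can be fully scripted. As an illustration only, the sketch below converts one known pixel placement into a YOLO-format label of the kind used in the later Python experiments (Section 3.7); the chip origin and coordinates are hypothetical, and the ArcGIS Pro tool handles its own label formats internally.

```python
# Minimal sketch of automatic labeling from the known placements (x, y, w, h)
# recorded during composite generation; YOLO expects normalized center coordinates.
def to_yolo_label(box, img_w, img_h, class_id=0):
    x, y, w, h = box
    xc = (x + w / 2) / img_w
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: one 22 x 22 px SSO placed at (1200, 850) on a 416 x 416 chip cut at (1100, 700).
chip_origin = (1100, 700)
box_in_chip = (1200 - chip_origin[0], 850 - chip_origin[1], 22, 22)
print(to_yolo_label(box_in_chip, 416, 416))   # -> "0 0.266827 0.387019 0.052885 0.052885"
```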

2.6. Network Training

The network training process was carried out with the Train Deep Learning Model tool. At this stage, it is possible to set a large number of hyperparameters related to learning, the type of model used, and the hardware.
For the purposes of the experiment, an object detection model based on the convolutional neural network (CNN) architecture called YOLOv3 was used. The reason for using YOLOv3 in this study was its implementation in ArcGIS Pro software, which was the primary platform for conducting the research. It allows easy preparation of training datasets and performs object detection on large images.
YOLOv3 is based on a CNN-type neural network architecture and is known for its speed and effectiveness in real-time object detection. The model divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
YOLOv3 consists of two main elements, i.e., a detection module and a deep base model (backbone model), which, in this case, is Darknet-53, consisting of 53 convolutional layers. The structure of this convolutional neural network is shown in Figure 6.

2.6.1. Description of Network Hyperparameters

In addition to selecting the detection model, the tool also has other settings available, including the number of epochs, batch size, learning rate, chip size, data augmentation, and validation set size. The models created had the “Stop when model stops improving” option applied, which automatically stopped the training process when the model stopped improving. In practice, all models achieved a range of 60–100 training epochs. The learning rate was not specified; hence, the software adjusted the learning rate from the learning curve during the training process. The value determined by the software varied depending on the model and was, on average, a learning-rate slice of (0.00006, 0.0006). The chip size for the models was set to 416 × 416 pixels, the validation set size was set to 10% (the default value), and the batch size was 16. The monitoring method used was validation loss. Some of these settings are due to hardware limitations and the computing power of the computer. The device on which the tests were conducted had one GeForce RTX 3070 DUAL 8 GB V2 LHR GPU, an Intel Core i7-13700K (3.40 GHz, 16 total cores, 24 total threads) CPU, and 32 GB (DDR5, 6000 MHz) RAM.

2.6.2. Research Series Based on Data Augmentation

In this work, 36 models using the YOLOv3 architecture were created. All models can be divided into three series, differing in augmentation parameters:
  • Series 1: a set of 12 models for which no additional augmentation parameters were set. This series is marked with the letter “N”.
  • Series 2: a set of 12 models in which some of the available augmentation parameters were used. These parameters were brightness, contrast, and zoom, all with default software settings. This series is marked with the letter “P”.
  • Series 3: a set of 12 models in which all available augmentation parameters were used. These parameters were brightness, contrast, zoom, rotate, and crop, all with default software settings. This series is marked with the letter “F”.
A total of 36 models were obtained in the study, differing in settings, type of artificial GCP, and the amount of training data. All obtained networks, along with a description of the elements on which they were based, are presented in Table 2.

2.7. Object Detection

The process of object detection was conducted using the Detect Objects Using Deep Learning tool. It utilized the 36 CNN-based models trained during the previous stages to detect objects on unmodified UAV images. The test set for all models was a set of 129 photos from a UAV mission (2022) and an orthophoto from another one (2022). There was a total of 152 GCPs to be detected in the set of images. The description of the test data is provided in Section 2.2. The results of this experiment are discussed in the results section and summarized in Table 3.
It should be noted that none of the data used to test the obtained models were part of a training set. The training set consisted only of composite images. Additionally, in the detection process, the area of each image was limited to the water area using masks. The aim of the research was to detect shallow-water objects, hence the limitation of detection to the lake area only.
In the detection process, it is possible to set parameters that determine how the created models work. The available and used options in the software are padding, threshold, nms overlap, batch size, non-maximum suppression, and exclude pad detections. In the study, padding was a quarter of the chip size value, i.e., 104, while the threshold value was set to 0.6. Nms overlap refers to the maximum overlap ratio of two objects and was set to 0.1. The batch size value was 4 due to hardware limitations. Non-maximum suppression limits the number of duplicate objects, and this setting was used in the study. The last parameter used was exclude pad detections, which limits detection on the edges of image chips.
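To clarify how these parameters interact when detecting on large images, the sketch below illustrates tiled inference with padding, a confidence threshold, and non-maximum suppression. It is a conceptual illustration rather than the ArcGIS Pro tool itself; detect_chip() is a hypothetical stand-in for a trained detector returning (x1, y1, x2, y2, score) boxes in chip coordinates, and the numeric values mirror those listed above.

```python
# Conceptual sketch of tiled detection on a large image, combining padding, a
# confidence threshold, and non-maximum suppression (NMS). detect_chip() is a
# hypothetical stand-in for the trained detector.
import torch
from torchvision.ops import nms

CHIP, PAD, THRESHOLD, NMS_OVERLAP = 416, 104, 0.6, 0.1

def detect_large_image(image_w, image_h, detect_chip):
    boxes, scores = [], []
    step = CHIP - 2 * PAD                      # chips overlap by the padding on each side
    for y0 in range(0, image_h, step):
        for x0 in range(0, image_w, step):
            for x1, y1, x2, y2, score in detect_chip(x0, y0, CHIP):
                if score < THRESHOLD:          # confidence threshold
                    continue
                boxes.append([x1 + x0, y1 + y0, x2 + x0, y2 + y0])  # to image coordinates
                scores.append(score)
    if not boxes:
        return []
    keep = nms(torch.tensor(boxes, dtype=torch.float32),
               torch.tensor(scores, dtype=torch.float32),
               NMS_OVERLAP)                    # suppress duplicates from overlapping chips
    return [(boxes[i], scores[i]) for i in keep.tolist()]
```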

3. Results

This section presents the obtained results of the detection models along with the basic method of testing their effectiveness. It also describes the method for evaluating the trained network and assessing its training level.

3.1. Validation and Testing

As mentioned earlier in this work, the Train Deep Learning Model tool allows for specifying the percentage of data allocated to the validation set, which is crucial for assessing the quality of network training. We used 10% of input data as validation datasets.
After completing the training process, the Train Deep Learning Model tool provides a model summary, which includes the following metrics:
  • Training loss: the average entropy loss function result computed across the training dataset;
  • Validation loss: the average entropy loss function result calculated on the validation dataset using the trained model at each epoch;
  • Average precision: the ratio of points in the validation data that were correctly classified by the model trained in a given epoch (true positives) to all points in the validation data.

3.2. Verification on Real Data

3.2.1. Performance Indicators of the Developed Models

The accuracy of the detected objects was measured using the Compute Accuracy For Object Detection tool. Based on the known location of the proper bounding box of each object and a minimum Intersection over Union (IOU) parameter, the tool counts correct detections, false detections, and missed detections. Based on these three values, the precision, recall, F1-score, and AP coefficients are calculated. These parameters can be described as follows:
1. Intersection over Union (IOU)—a fundamental metric in object detection, quantifying the overlap between the predicted and ground truth bounding boxes [57]:
$\mathrm{IOU} = \dfrac{\text{area of overlap}}{\text{area of union}};$
2. True positive (TP)—a correct detection of a ground-truth bounding box [57];
3. False positive (FP)—an incorrect detection of a nonexistent object or a misplaced detection of an existing object [57];
4. False negative (FN)—an undetected ground-truth bounding box [57];
5. Precision—the percentage of correct positive predictions [57]:
$\mathrm{Precision} = \dfrac{TP}{TP + FP} = \dfrac{\text{predictions actually positive}}{\text{total predicted positive}};$
6. Recall—the percentage of correct positive predictions among all given ground truths [57]:
$\mathrm{Recall} = \dfrac{TP}{TP + FN} = \dfrac{\text{predictions actually positive}}{\text{total actual positive}};$
7. F1-score—the harmonic mean of precision and recall [38]:
$F1\text{-}score = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}};$
8. AP—the average of the precision over all recall values between 0 and 1, interpreted as the area under the precision–recall curve.
The IOU parameter in the Compute Accuracy For Object Detection tool was set to 10%. This means that only 10% of the detection bounding box must overlap the actual bounding box of the object. The value was deliberately set low because the study focused on detection, and with higher IOU settings, some successful detections could have been missed.
In the rest of the article, the F1-score is used as the main measure of model performance, as it combines precision and recall. This parameter best represents the relationship between TP, FN, and FP.
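The following helper mirrors the definitions above: predictions are greedily matched to ground-truth boxes by IOU (using the 0.1 threshold adopted in this study), and precision, recall, and F1-score are computed from the resulting TP, FP, and FN counts. This is a minimal self-contained sketch, not the ArcGIS tool's implementation.

```python
# Helper functions mirroring the metric definitions above (IOU threshold of 0.1).
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def detection_metrics(predictions, ground_truths, iou_threshold=0.1):
    """Greedy matching of predictions to ground truths; returns precision, recall, F1."""
    matched = set()
    tp = 0
    for pred in predictions:
        best = max(((iou(pred, gt), i) for i, gt in enumerate(ground_truths) if i not in matched),
                   default=(0.0, None))
        if best[0] >= iou_threshold:
            matched.add(best[1])
            tp += 1
    fp = len(predictions) - tp
    fn = len(ground_truths) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```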

3.2.2. Evaluation of Model Effectiveness

The evaluation of the effectiveness of the models on real images was carried out on two sources of image data. The first was the set of real images consisting of 129 photos. A total of 152 GCP instances could be detected, corresponding to the 5 GCPs actually placed in the study field. The UAV photos provide diversity owing to the different viewing angles relative to the ground control points, as well as light reflections from the water and its ripples. Table 3 presents a summary of the study performed on the set of 129 photos from the UAV flight (2022).
Figure 7 shows one of the photos from the test set. The size of the detected object relative to the image is visible. To better demonstrate how the network frames the detection, a fragment of the image has been enlarged.

3.3. Analysis of the Accuracy of Determining the Location of Detected Objects

The orthophoto, which allows for the measurement of geospatial accuracy and includes five ground control points (GCPs), was used in this study primarily as a source of information on the average deviation of detections relative to points with known locations. Based on the obtained bounding boxes and calculations of their centers, the distances to the actual centers of the ground control points were determined. Information on the basic parameters of the detection results, together with the deviations for individual GCPs (dmean) and the average value of the total deviation of individual models (Δdmean), is included in Table 4 for the 36 YOLOv3 models. The true positive values indicate the number of ground control points detected by each model out of the five possible and, consequently, the number of results from which the average model deviation was calculated.
Table 4 shows that not all models were able to calculate the deviation, which is due to the lack of detection of these models. In total, the models detected 114 ground control points out of 180 possible to detect. The deviation values in the obtained results ranged from 1.03 to 43.26 cm, and the average value was 12.15 cm, while the median was 10.71 cm.
Figure 8 shows the bounding boxes obtained with the YC1000P model, as well as the arrangement of ground control points in the real image. The YC1000P model is an example of how the detection models work and was selected based on the data from Table 3.
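The deviation measure used here reduces to a simple computation: the center of each predicted bounding box is compared with the known GCP center, and the pixel distance is converted to centimeters using the ground sample distance of the orthophoto (4.5 cm per pixel). The sketch below illustrates this under the assumption of pixel coordinates; the example coordinates are purely illustrative.

```python
# Sketch of the deviation computation described above: distance between the center of a
# predicted bounding box and the known GCP center, converted to centimeters via the
# ground sample distance of the orthophoto (4.5 cm/pixel). Coordinates are illustrative.
import math

GSD_CM = 4.5  # orthophoto ground resolution in cm per pixel

def center_deviation_cm(bbox, gcp_center_px, gsd_cm=GSD_CM):
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return math.hypot(cx - gcp_center_px[0], cy - gcp_center_px[1]) * gsd_cm

# Example: a detection centered 2 pixels off in x and 1 pixel off in y.
print(round(center_deviation_cm((100, 100, 122, 122), (113.0, 112.0)), 2))  # ~10.06 cm
```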

3.3.1. Analysis of GCP Detection on Orthoimage

The number of TPs of individual ground control points in the entire series was compared. Such statistics allow for checking the influence of the applied augmentations on the detection of specific GCPs. It is possible to estimate whether certain GCPs are not detected significantly less often than others, which could be caused by the environment at the location of the objects or the influence of the water layer above the detected objects. This comparison is presented in the graph in Figure 9.
Based on the TP value graph of the model series, it can be seen that GCP#1, located closest to the lake shore, was detected most often in all series. Moving further into the lake, the number of TPs decreases, with one exception, namely the GCP#4 object. This ground control point was also detected much more often across all series. This may be due to the high contrast of the object against the background, because this GCP was located on the darkest lake bottom of all the objects. It is likely that the influence of the dark bottom makes this the only case where series 1 performed better than series 3.
It was also observed that partial augmentation proved to be the most effective in this scenario. This approach resulted in improved detection accuracy, particularly in challenging conditions.

3.3.2. Analysis of Deviations from Detected Objects

The next comparison of the results of the entire series of models was the average deviation obtained by them on individual GCPs. The results of this comparison are presented in Figure 10.
The values read from the series mean deviation graph are not as closely related as in the case of the TP values from Figure 9. Figure 10 indicates that the GCP#1 object has the lowest mean deviation values, which is consistent with the results of Figure 9. It is interesting that the least detected ground control point GCP#5 has the second-best mean deviation value.
As above, we also observed that partial augmentation proved to be the most effective in this scenario, resulting in smaller deviations and better localization of the detected objects.

3.4. Influence of the Size of the Training Sequence

The study showed an effect of training set size on the model's ability to detect ground control points, although the results differ across the series. Potentially, a larger training set means more data on which CNNs can improve their detection performance, but the results do not indicate such a correlation. Figure 11 shows the effect of training set size on the obtained F1-scores and is based on the data in Table 3.
The F1-score changes shown in Figure 11 vary across models. The calculated F1-score means showed that, on average, the models achieved the highest efficiency when the number of training data (SSOs) was 1000 or 2000 (0.76 and 0.75, respectively). Large fluctuations in network performance make it difficult to clearly assess the impact of training set size. The F1-score increases with the amount of training data, although this increase clearly diminishes with the largest amount of data (4000). This may be due to model overfitting, although the training loss and validation loss versus time plots provided by the software do not indicate this. Potential overfitting led to model oversensitivity: most models trained on 4000 simplified synthetic objects showed higher FP values than the corresponding models trained on the smaller sets.

3.5. Influence of the Augmentation Method

The comparison of all 36 CNN models from Table 3 was restricted by a filter requiring a value of at least 0.85 in all parameters (precision, recall, F1-score, and AP). The obtained results were sorted by F1-score value and are presented in Table 5.
The requirement of 0.85 efficiency in all parameters was met by seven YOLOv3 networks. These models come from two research series. The first four models were created by partial augmentation, while the remaining three were created with full augmentation available in ArcGIS Pro.
The seven distinguished models constitute 19.4% of all models created. Highly effective models were created in one out of five cases. The lack of any model obtained without additional augmentation is very noticeable here. The first non-augmented model takes only 11th place among all models.
Two conclusions can be drawn from this comparison regarding the impact of augmentation on training datasets with SSOs:
  • The augmentation of detection networks positively affects the effectiveness of the obtained models.
  • Partial augmentation gives better results for this study.
Comparing the same models across series (the same type of artificial GCP and training data size) gives very different results; in some cases, augmentation improved model performance by dozens of percentage points. The second conclusion may result from the fact that, in these studies, additional augmentation could lead to less efficient detection by the neural model. Such an unwanted effect of augmentation may explain the greater number of FP detections. The advantage of augmenting training data in the network training process is also indicated in Figure 9 and Figure 10. The number of TPs and the average deviation of the individual series calculated on the basis of the orthophoto clearly indicate that the performance of the partially augmented models is better than that of the remaining series. Series 2 detects objects most effectively, while series 1 performs worst in the comparison.

3.6. Influence of the Type of Simplified Synthetic Object

Studies on the effectiveness of the YOLOv3 models depending on the type of simplified synthetic object used provide unambiguous results. Figure 11 clearly shows that the F1-score values drop in the middle part of each of the three model series. The results of networks trained on GCP Type B are in the range of the lowest F1-score values. This is best seen in series 1, which was not augmented, where three out of four models trained on Type B have clearly low F1-score values. In Table 5, which presents only the seven best models, there is only one model trained on the basis of GCP Type B, and it takes the penultimate position in this ranking. The calculated F1-score averages of all 36 models with respect to the type of simplified synthetic object are as follows:
  • Type A: F1-score coefficient value equal to 0.76;
  • Type B: F1-score coefficient value equal to 0.50;
  • Type C: F1-score coefficient value equal to 0.82.
However, if the number of TPs detected by the 36 YOLOv3 models is summarized according to the artificial GCP type, then
  • Type A: 1270;
  • Type B: 759;
  • Type C: 1478.
The reported detection results in the form of F1-score coefficient and true positive (TP) values allow for the creation of a ranking of simplified synthetic objects, where Type C is the most effective, while Type B is the least effective. All synthetic objects were created with the aim of the most accurate possible representation of ground control points photographed in UAV photos. Despite this, the model results differ drastically depending on the simplified synthetic object. Research should be carried out on a much larger set of synthetic objects to better understand the relationships affecting the operation of ready-made detection networks.
However, referring to the process of preparing the Type B and C objects themselves, it should be noted that both simplified presentations are extensions of Type A. GCP Type A is in the middle position in terms of the effectiveness of trained models. The changes made when creating Type B worsened the effectiveness, while the modifications within Type C increased it. The main difference between these simplified synthetic objects is transparency: Type B has the lowest transparency, while Type C has the highest. Transparency can be crucial in achieving significant diversity of training data and significantly affect the subsequent performance of ready-made detection networks.

3.7. Test YOLO Models Outside the ArcGIS Environment

In order to check the effectiveness of YOLOv3 available in ArcGIS Pro against newer models, tests were carried out on subsequent versions of YOLO up to the current latest version YOLO12. The tests were carried out on versions YOLOv5, YOLOv6, YOLOv8, YOLOv9, YOLOv10, YOLO11, and YOLO12.
All models used in the Python environment were L-type models, except for YOLOv9, for which an L variant is not available; the C variant was used instead. The device on which the tests were conducted had one GeForce RTX 4090 GPU, an Intel Core i7-14900K CPU, and 128 GB (DDR5, 5400 MHz) RAM. The same training sets as in YOLOv3 were used in model training, also with 10% test and validation sets. In all cases, the chip size was 416 × 416 pixels. The hyperparameters of the model training process are listed in Table 6.
The hyperparameters were intended to reflect as closely as possible those used in series 2 (partial augmentation) in ArcGIS Pro.
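For orientation, the sketch below shows how such a configuration might be expressed with the Ultralytics API used for the higher YOLO versions. The dataset YAML name and the hyperparameter values are illustrative assumptions chosen to approximate a partial-augmentation setup (photometric and zoom-like augmentation only); the authors' actual values are those listed in Table 6.

```python
# Illustrative sketch of training a higher YOLO version with the Ultralytics API.
# The dataset definition and hyperparameter values are assumptions, not the exact
# settings from Table 6.
from ultralytics import YOLO

model = YOLO("yolov8l.pt")        # L-size variant, as used for most versions in this study
model.train(
    data="sso_type_c_2000.yaml",  # hypothetical dataset definition for one training set
    imgsz=416,                    # chip size used throughout the study
    epochs=100,
    batch=16,
    patience=20,                  # early stopping when validation stops improving
    hsv_v=0.4,                    # brightness-like augmentation
    scale=0.5,                    # zoom-like augmentation
    degrees=0.0, fliplr=0.0, mosaic=0.0,  # geometric augmentations disabled
)
metrics = model.val()             # evaluate on the validation split
```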
As in the case of the studies in the ArcGIS Pro environment, masks were used to limit the test only to the water area. The evaluation of the models' effectiveness was performed on the set of 129 real photos from the UAV flight. In total, it was possible to detect 152 GCPs in the set, of which 5 were distributed in the study area. In the case of the tests of the newer models, the effectiveness test was conducted on the same images as in the case of YOLOv3 but cropped to 416-pixel size, i.e., the same size on which the models were trained.
In total, the tests of the seven higher versions of YOLO resulted in 84 models, whose detection performance is presented in Table A1 in Appendix A. Figure 12 and Figure 13 were created based on the results from Table 3 and Table A1. The first of these figures, Figure 12, shows the heat map of the F1-scores of all 120 models created in this study.
The test shows that newer versions of YOLO are not necessarily characterized by much better SSO detection efficiency than YOLOv3. In the case of the best YOLOv3 model (YC2000P), the obtained F1-score was 0.96; among the 84 new models, only the YOLO12 4000C model exceeded this value (0.97), and four models were equally effective. Values of at least 0.95 were also obtained by YOLOv5 (range 0.95–0.96), YOLOv8 (0.96), YOLOv9 (0.95), and YOLOv10 (0.96).
The compiled results clearly indicate the highest effectiveness of SSO Type C, which shows much better detection in all YOLO versions except YOLOv6, where Type A is the most effective, as shown in Figure 13. It should be noted, however, that the results of YOLOv6 differ significantly from the rest of the YOLO models due to the high sensitivity of these models and the very large number of FPs, despite the use of a mask for the water area in the study. After calculating the average F1-score of the models trained on Type C SSOs, the three best-performing YOLO versions are YOLOv5 (0.95), YOLOv9 (0.94), and the partially augmented (P) YOLOv3 (0.91).
The prepared 84 models were also used to analyze the accuracy of object localization in a similar way as in Section 3.3. In this case, the location of the center of the bounding box relative to the GCP point was analyzed. This analysis was not performed on the orthophotomap but rather on its slices of 416 × 416 pixels. All obtained results are included in Appendix A, Table A2.
Table 7 is based on the data from Table A2 and presents five selected models representing the best-performing higher YOLO versions obtained in this work so far.
Based on Table 7, it can be seen that in the case of this method of determining the distance deviation, the determined distance is 2–3 times more accurate than the data obtained by the YOLOv3 models with partial augmentation. The calculated values represent the distance corresponding to less than one pixel on an orthophotomap with a spatial resolution of 4.5 cm.
The data from the table again confirm the advantage of SSO Type C. Models trained on other types of SSOs yield almost no results, except for models based on YOLOv6. In the case of the five images from the orthophotomap, YOLOv6 also shows very high recall but low precision due to the large number of FPs.
The resulting 84 models achieved very high detection rates very quickly during the training process and required only a small number of training epochs. An example of the parameters obtained during the training process is included in Appendix B, Figure A1.

4. Discussion

The research conducted presents a methodology for creating composite images with simplified synthetic objects (SSOs) designed for object detection purposes. Through the development of 36 detection models in ArcGIS Pro and 84 in the Python environment, it was confirmed that CNNs trained on these composite images with SSOs can effectively detect objects on the bottom of shallow inland water areas.
The study focused on recreating objects using a graphic form to create SSOs that closely resemble the appearance of objects in UAV images. This concept is well illustrated in Figure 3 and Figure 4, where the synthetic ground control points (GCPs) represent not actual GCPs but their appearance as captured in UAV images. The literature review discusses various methods researchers use to create synthetic objects, including generating 3D models [28,29,30] and adding detected objects from other images to background scenes [31,32,34]. Notable works by Hinterstoisser et al. and Huh et al. illustrate graphic modification techniques such as blurring, light adjustment, and object positioning, which increase training data volume and enhance detection performance across different environments. Unlike these studies, our research focused on creating SSOs that closely resemble the GCPs seen in UAV images, with backgrounds reflecting the actual operational area of the UAV. This approach presents an alternative method of data creation, allowing for quick, practical application without complex, realistic data simulations.
A clear advantage of the proposed method is its ability to precisely locate detected objects, supporting applications in object mapping. This precise localization capability enhances spatial analysis and broadens the use of SSOs. Deviations of bounding box centers from actual GCP centers were assessed using orthophotos, with the results of all 36 YOLOv3 models ranging from 1 to 43 cm. The top four models had deviations between 8 and 10 cm. The Python versions of YOLO performed better, with deviations ranging from 0 to 30 cm, and the average value for these 84 models was less than 4 cm. Further performance improvements in the CNN models could potentially improve these results, although at this stage, the offsets are only 1–3 pixels relative to the image resolution.
Additionally, the presented synthetic image generation process, automated with Python scripts, enables creating large amounts of specialized training data for specific objects and environments. Model performance may decrease when SSOs are detected in different terrains, yet the ease of background modification and synthetic object creation should allow for rapid adaptation to new study requirements. This methodology also facilitates the swift development of specialized detection models, which could be beneficial in emergency situations requiring object detection in aquatic environments, including even human detection.
The primary limitation of this method is the spatial resolution, impacting the minimum detectable object size. Objects smaller than the spatial resolution may not be captured, and slightly larger objects may lack sufficient pixel density for effective deep learning recognition. While UAV images enable relatively small object detection, higher resolutions also increase background complexity, potentially complicating detection model performance. Additionally, environmental factors, such as changing water and lighting conditions [58], can significantly reduce water transparency, hindering or partially obstructing object visibility to optical sensors. In extreme cases, the object may simply be invisible to the optical system.
During these studies, it was shown that YOLOv3, YOLOv5, YOLOv8, YOLOv9, and YOLOv10 are good detectors of the elaborated Type C SSOs. Taking overall performance into account, YOLOv5 achieved the highest average value (0.95) and performed very well in the orthophotomap GCP test, in particular detecting all five GCPs with the models trained on Type C SSOs. In this overview, however, it is worth noting that YOLOv3 may have a fundamental advantage related to its ability to be implemented within acceptable licensing terms of the software under development. The models used achieved high detection rates despite the very rapid training and network adaptation process. This is likely due to the low diversity of the datasets, the size of the object, and its graphical features. This issue should be addressed in future work.
This study specifically targeted the detection of a distinctive object, the ground control point, characterized by its round shape, color pattern, and uniform size. Based on the current observations, special attention should also be paid to the influence of the background; therefore, in future work, the training set should be expanded to include more diverse backgrounds in the images. Future research should address other objects, such as water pollution (including plastics), animal habitats, boulders, and various anthropogenic objects. A further challenge is to explore different configurations of training sets, e.g., consisting partly of simplified synthetic data and partly of real data.
Such work would also introduce new challenges related to using alternative types of simplified synthetic data. A promising research avenue could involve generating super-resolution images with generative adversarial networks [59,60,61] as foundational data for real object detection, particularly in shallow water areas, potentially applicable across large areas covered by satellite scenes.
Future research should also focus on developing models based on various YOLO versions, incorporating real-time detection, and using Internet of Things technology [62] for processing image data. In addition to models based on the YOLO architecture, it is possible to extend the research to other detection models. This would allow us to determine the effectiveness of the proposed methodology related to simplified synthetic objects with other CNNs and frameworks.
Future research should also investigate the impact of interferences resulting from the aquatic environment (turbidity, changes in water bio-optical properties, and changes in lighting) and the detected object (shape, regularity, texture, and light reflectivity). It will also be important to introduce multi-variant SSOs to increase training datasets and counteract dynamic environmental changes. Future research will allow for better identification of the limitations of the methodology presented in this work and areas for further development.

5. Conclusions

This article describes a method of generating simplified synthetic objects for training YOLO networks. The main focus is on the object's compliance with its appearance in UAV images. Data variability is obtained by using SSO transparency and the influence of the background. The data generation process is automated by a processing pipeline with automatic placement, rotation, and generation of object centers, which helps in their labeling. The efficiency and applicability of the method using GCPs are demonstrated. Synthetic data generated for object recognition were successfully used to train neural networks, taking into account the influence of augmentation.
Although not all models achieved satisfactory detection test results on the UAV image set, the top five models—YOLO12 4000C, YOLOv8 4000C, YOLOv8 2000C, YOLOv5 1000C, and YOLOv5 500C—exhibited very high precision, recall, and F1-score values, each exceeding 0.93. The results of the 120 models presented in this paper show that the best models were created based on SSO Type C, using partial augmentation and a training set of 2000 objects. In the full set of tests, the most effective version of YOLO was version 5. Particularly effective were the models obtained using simplified synthetic object Type C, which was most dependent on the background image due to its high transparency. This type statistically achieves twice the detection efficiency of the other SSO types. The research confirms the higher efficiency of detection models trained on SSOs that more closely reflect the real object, while stronger simplifications are less effective.
The research indicates the possibility of using simplified synthetic objects as an alternative to real or fully synthetic data when real images are lacking for creating a training dataset. In addition, the proposed method can be used when acquisition of real data is too time-consuming or when a detection model must be developed in a short time.

Author Contributions

Conceptualization, D.K. and J.L.; methodology, D.K. and J.L.; software, D.K. and P.A.; validation, D.K. and P.A.; formal analysis, D.K. and J.L.; investigation, D.K., J.L., and P.A.; resources, D.K., J.L., and P.A.; data curation, D.K. and J.L.; writing—original draft preparation, D.K.; writing—review and editing, D.K. and J.L.; visualization, D.K.; supervision, J.L.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research outcome was financed by a subsidy from the Polish Ministry of Education and Science for statutory activities at the Maritime University of Szczecin, grant no. 2/s/KGiH/25.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

The summary of the 84 models obtained in the Python environment is given in Table A1.
Table A1. Model detection results of newer versions of YOLO on the image set.

Short Name of the Model | TP | FP | FN | Precision | Recall | F1-Score
YOLOv5 500A | 33 | 3 | 119 | 0.92 | 0.22 | 0.35
YOLOv5 1000A | 20 | 0 | 132 | 1.00 | 0.13 | 0.23
YOLOv5 2000A | 24 | 3 | 128 | 0.89 | 0.16 | 0.27
YOLOv5 4000A | 8 | 0 | 144 | 1.00 | 0.05 | 0.10
YOLOv5 500B | 8 | 1 | 144 | 0.89 | 0.05 | 0.10
YOLOv5 1000B | 17 | 2 | 135 | 0.89 | 0.11 | 0.20
YOLOv5 2000B | 30 | 0 | 122 | 1.00 | 0.20 | 0.33
YOLOv5 4000B | 47 | 3 | 105 | 0.94 | 0.31 | 0.47
YOLOv5 500C | 142 | 2 | 10 | 0.99 | 0.93 | 0.96
YOLOv5 1000C | 144 | 4 | 8 | 0.97 | 0.95 | 0.96
YOLOv5 2000C | 138 | 2 | 14 | 0.99 | 0.91 | 0.95
YOLOv5 4000C | 142 | 6 | 10 | 0.96 | 0.93 | 0.95
YOLOv6 500A | 148 | 20 | 4 | 0.88 | 0.97 | 0.93
YOLOv6 1000A | 148 | 48 | 4 | 0.76 | 0.97 | 0.85
YOLOv6 2000A | 148 | 25 | 4 | 0.86 | 0.97 | 0.91
YOLOv6 4000A | 149 | 26 | 3 | 0.85 | 0.98 | 0.91
YOLOv6 500B | 151 | 423 | 1 | 0.26 | 0.99 | 0.42
YOLOv6 1000B | 151 | 3456 | 1 | 0.04 | 0.99 | 0.08
YOLOv6 2000B | 150 | 1562 | 2 | 0.09 | 0.99 | 0.16
YOLOv6 4000B | 151 | 1620 | 1 | 0.09 | 0.99 | 0.16
YOLOv6 500C | 151 | 781 | 1 | 0.16 | 0.99 | 0.28
YOLOv6 1000C | 149 | 2168 | 3 | 0.06 | 0.98 | 0.12
YOLOv6 2000C | 149 | 323 | 3 | 0.32 | 0.98 | 0.48
YOLOv6 4000C | 149 | 705 | 3 | 0.17 | 0.98 | 0.30
YOLOv8 500A | 7 | 0 | 145 | 1.00 | 0.05 | 0.09
YOLOv8 1000A | 19 | 0 | 133 | 1.00 | 0.13 | 0.22
YOLOv8 2000A | 8 | 1 | 144 | 0.89 | 0.05 | 0.10
YOLOv8 4000A | 4 | 1 | 148 | 0.80 | 0.03 | 0.05
YOLOv8 500B | 8 | 0 | 144 | 1.00 | 0.05 | 0.10
YOLOv8 1000B | 33 | 2 | 119 | 0.94 | 0.22 | 0.35
YOLOv8 2000B | 16 | 3 | 136 | 0.84 | 0.11 | 0.19
YOLOv8 4000B | 20 | 5 | 132 | 0.80 | 0.13 | 0.23
YOLOv8 500C | 110 | 6 | 42 | 0.95 | 0.72 | 0.82
YOLOv8 1000C | 124 | 3 | 28 | 0.98 | 0.82 | 0.89
YOLOv8 2000C | 146 | 5 | 6 | 0.97 | 0.96 | 0.96
YOLOv8 4000C | 147 | 6 | 5 | 0.96 | 0.97 | 0.96
YOLOv9 500A | 14 | 1 | 138 | 0.93 | 0.09 | 0.17
YOLOv9 1000A | 2 | 0 | 150 | 1.00 | 0.01 | 0.03
YOLOv9 2000A | 21 | 1 | 131 | 0.95 | 0.14 | 0.24
YOLOv9 4000A | 10 | 1 | 142 | 0.91 | 0.07 | 0.12
YOLOv9 500B | 30 | 2 | 122 | 0.94 | 0.20 | 0.33
YOLOv9 1000B | 48 | 2 | 104 | 0.96 | 0.32 | 0.48
YOLOv9 2000B | 18 | 8 | 134 | 0.69 | 0.12 | 0.20
YOLOv9 4000B | 30 | 4 | 122 | 0.88 | 0.20 | 0.32
YOLOv9 500C | 144 | 10 | 8 | 0.94 | 0.95 | 0.94
YOLOv9 1000C | 141 | 11 | 11 | 0.93 | 0.93 | 0.93
YOLOv9 2000C | 140 | 3 | 12 | 0.98 | 0.92 | 0.95
YOLOv9 4000C | 144 | 7 | 8 | 0.95 | 0.95 | 0.95
YOLOv10 500A | 34 | 4 | 118 | 0.89 | 0.22 | 0.36
YOLOv10 1000A | 4 | 1 | 148 | 0.80 | 0.03 | 0.05
YOLOv10 2000A | 3 | 2 | 149 | 0.60 | 0.02 | 0.04
YOLOv10 4000A | 1 | 0 | 151 | 1.00 | 0.01 | 0.01
YOLOv10 500B | 14 | 1 | 138 | 0.93 | 0.09 | 0.17
YOLOv10 1000B | 11 | 1 | 141 | 0.92 | 0.07 | 0.13
YOLOv10 2000B | 15 | 1 | 137 | 0.94 | 0.10 | 0.18
YOLOv10 4000B | 42 | 1 | 110 | 0.98 | 0.28 | 0.43
YOLOv10 500C | 118 | 8 | 34 | 0.94 | 0.78 | 0.85
YOLOv10 1000C | 123 | 3 | 29 | 0.98 | 0.81 | 0.88
YOLOv10 2000C | 144 | 5 | 8 | 0.97 | 0.95 | 0.96
YOLOv10 4000C | 139 | 9 | 13 | 0.94 | 0.91 | 0.93
YOLO11 500A | 2 | 1 | 150 | 0.67 | 0.01 | 0.03
YOLO11 1000A | 2 | 0 | 150 | 1.00 | 0.01 | 0.03
YOLO11 2000A | 3 | 0 | 149 | 1.00 | 0.02 | 0.04
YOLO11 4000A | 6 | 1 | 146 | 0.86 | 0.04 | 0.08
YOLO11 500B | 6 | 1 | 146 | 0.86 | 0.04 | 0.08
YOLO11 1000B | 6 | 1 | 146 | 0.86 | 0.04 | 0.08
YOLO11 2000B | 13 | 5 | 139 | 0.72 | 0.09 | 0.15
YOLO11 4000B | 10 | 7 | 142 | 0.59 | 0.07 | 0.12
YOLO11 500C | 118 | 2 | 34 | 0.98 | 0.78 | 0.87
YOLO11 1000C | 138 | 62 | 14 | 0.69 | 0.91 | 0.78
YOLO11 2000C | 119 | 4 | 33 | 0.97 | 0.78 | 0.87
YOLO11 4000C | 137 | 9 | 15 | 0.94 | 0.90 | 0.92
YOLO12 500A | 19 | 2 | 133 | 0.90 | 0.13 | 0.22
YOLO12 1000A | 10 | 2 | 142 | 0.83 | 0.07 | 0.12
YOLO12 2000A | 6 | 1 | 146 | 0.86 | 0.04 | 0.08
YOLO12 4000A | 8 | 0 | 144 | 1.00 | 0.05 | 0.10
YOLO12 500B | 4 | 0 | 148 | 1.00 | 0.03 | 0.05
YOLO12 1000B | 0 | 0 | 152 | 0.00 | 0.00 | 0.00
YOLO12 2000B | 19 | 1 | 133 | 0.95 | 0.13 | 0.22
YOLO12 4000B | 13 | 6 | 139 | 0.68 | 0.09 | 0.15
YOLO12 500C | 128 | 3 | 24 | 0.98 | 0.84 | 0.90
YOLO12 1000C | 121 | 5 | 31 | 0.96 | 0.80 | 0.87
YOLO12 2000C | 124 | 9 | 28 | 0.93 | 0.82 | 0.87
YOLO12 4000C | 147 | 4 | 5 | 0.97 | 0.97 | 0.97
The calculations of the deviations of the center position relative to the five GCPs from the orthophotomap obtained for the newer YOLO versions of the models are presented in Table A2.
Table A2. Network detection results on orthophotomap images along with deviations on individual GCPs and calculated average deviation of 84 YOLOv5–12 models.
Short Name of the Model | TP | dmean at detected GCPs [cm] (in GCP#1–#5 order) | Δdmean [cm]
YOLOv5 500A | 0 | – | –
YOLOv5 1000A | 1 | 30.27 | 30.27
YOLOv5 2000A | 0 | – | –
YOLOv5 4000A | 0 | – | –
YOLOv5 500B | 0 | – | –
YOLOv5 1000B | 0 | – | –
YOLOv5 2000B | 0 | – | –
YOLOv5 4000B | 3 | 2.25, 5.03 | 3.64
YOLOv5 500C | 5 | 3.18, 2.25, 5.03, 8.11, 3.18 | 4.35
YOLOv5 1000C | 5 | 2.25, 2.25, 3.18, 8.11, 7.12 | 4.58
YOLOv5 2000C | 5 | 3.18, 2.25, 3.18, 7.12, 2.25 | 3.60
YOLOv5 4000C | 5 | 3.18, 2.25, 3.18, 8.11, 2.25 | 3.80
YOLOv6 500A | 5 | 0.00, 0.00, 4.50, 2.25, 3.18 | 1.99
YOLOv6 1000A | 5 | 3.18, 2.25, 7.12, 5.03, 2.25 | 3.97
YOLOv6 2000A | 5 | 3.18, 2.25, 7.12, 5.03, 3.18 | 4.15
YOLOv6 4000A | 5 | 0.00, 2.25, 6.36, 6.75, 6.75 | 4.42
YOLOv6 500B | 5 | 2.25, 2.25, 2.25, 2.25, 2.25 | 2.25
YOLOv6 1000B | 5 | 3.18, 4.50, 3.18, 3.18, 2.25 | 3.26
YOLOv6 2000B | 5 | 3.18, 2.25, 5.03, 3.18, 2.25 | 3.18
YOLOv6 4000B | 5 | 3.18, 5.03, 4.50, 2.25, 3.18 | 3.63
YOLOv6 500C | 5 | 3.18, 6.36, 2.25, 3.18, 3.18 | 3.63
YOLOv6 1000C | 5 | 5.03, 3.18, 3.18, 3.18, 2.25 | 3.37
YOLOv6 2000C | 5 | 6.36, 0.00, 5.03, 5.03, 4.50 | 4.19
YOLOv6 4000C | 5 | 4.50, 2.25, 3.18, 5.03, 4.50 | 3.89
YOLOv8 500A | 0 | – | –
YOLOv8 1000A | 0 | – | –
YOLOv8 2000A | 0 | – | –
YOLOv8 4000A | 0 | – | –
YOLOv8 500B | 0 | – | –
YOLOv8 1000B | 0 | – | –
YOLOv8 2000B | 0 | – | –
YOLOv8 4000B | 0 | – | –
YOLOv8 500C | 3 | 3.18, 2.25, 4.50 | 3.31
YOLOv8 1000C | 4 | 3.18, 2.25, 2.25, 8.11 | 3.95
YOLOv8 2000C | 5 | 3.18, 2.25, 3.18, 7.12, 5.03 | 4.15
YOLOv8 4000C | 5 | 2.25, 2.25, 2.25, 5.03, 3.18 | 2.99
YOLOv9 500A | 0 | – | –
YOLOv9 1000A | 0 | – | –
YOLOv9 2000A | 0 | – | –
YOLOv9 4000A | 0 | – | –
YOLOv9 500B | 1 | 3.18 | 3.18
YOLOv9 1000B | 2 | 0.00, 4.50 | 2.25
YOLOv9 2000B | 0 | – | –
YOLOv9 4000B | 1 | 2.25 | 2.25
YOLOv9 500C | 4 | 2.25, 2.25, 3.18, 8.11 | 3.95
YOLOv9 1000C | 5 | 3.18, 2.25, 3.18, 8.11, 2.25 | 3.80
YOLOv9 2000C | 5 | 2.25, 2.25, 2.25, 7.12, 3.18 | 3.41
YOLOv9 4000C | 5 | 2.25, 2.25, 3.18, 7.12, 2.25 | 3.41
YOLOv10 500A | 0 | – | –
YOLOv10 1000A | 0 | – | –
YOLOv10 2000A | 0 | – | –
YOLOv10 4000A | 0 | – | –
YOLOv10 500B | 0 | – | –
YOLOv10 1000B | 0 | – | –
YOLOv10 2000B | 0 | – | –
YOLOv10 4000B | 0 | – | –
YOLOv10 500C | 4 | 2.25, 2.25, 3.18, 7.12 | 3.70
YOLOv10 1000C | 3 | 2.25, 2.25, 9.55 | 3.51
YOLOv10 2000C | 5 | 2.25, 2.25, 2.25, 5.03, 3.18 | 2.99
YOLOv10 4000C | 5 | 3.18, 2.25, 2.25, 8.11, 3.18 | 3.80
YOLO11 500A | 0 | – | –
YOLO11 1000A | 0 | – | –
YOLO11 2000A | 0 | – | –
YOLO11 4000A | 0 | – | –
YOLO11 500B | 0 | – | –
YOLO11 1000B | 0 | – | –
YOLO11 2000B | 0 | – | –
YOLO11 4000B | 0 | – | –
YOLO11 500C | 2 | 4.50, 4.50 | 4.50
YOLO11 1000C | 5 | 2.25, 2.25, 5.03, 6.36, 3.18 | 3.82
YOLO11 2000C | 5 | 3.18, 2.25, 3.18, 5.03, 2.25 | 3.18
YOLO11 4000C | 5 | 3.18, 2.25, 3.18, 7.12, 3.18 | 3.78
YOLO12 500A | 0 | – | –
YOLO12 1000A | 0 | – | –
YOLO12 2000A | 0 | – | –
YOLO12 4000A | 0 | – | –
YOLO12 500B | 0 | – | –
YOLO12 1000B | 0 | – | –
YOLO12 2000B | 0 | – | –
YOLO12 4000B | 0 | – | –
YOLO12 500C | 5 | 3.18, 4.50, 3.18, 7.12, 9.28 | 5.45
YOLO12 1000C | 4 | 2.25, 2.25, 3.18, 6.36 | 3.51
YOLO12 2000C | 5 | 3.18, 2.25, 3.18, 7.12, 5.03 | 4.15
YOLO12 4000C | 5 | 3.18, 2.25, 3.18, 5.03, 3.18 | 3.37
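The per-GCP deviations dmean and the average deviation Δdmean in Table A2 compare the centers of the detected bounding boxes with the reference GCP centers on the orthophotomap. The sketch below illustrates one way such values can be computed; it assumes georeferenced coordinates in meters and detections already matched to GCPs, and it is not the authors' processing code.

```python
import math

# Illustrative sketch only (not the authors' processing code): deviation of a
# detected bounding box center from the surveyed GCP center, computed in the
# georeferenced coordinates of the orthophotomap (assumed to be in meters).

def box_center(box):
    """Center of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def deviation_cm(box, gcp_xy):
    """Euclidean distance between the box center and the GCP center, in centimeters."""
    cx, cy = box_center(box)
    gx, gy = gcp_xy
    return math.hypot(cx - gx, cy - gy) * 100.0  # meters -> centimeters

def mean_deviation_cm(matched_pairs):
    """Average deviation over the detected GCPs of one model (the last column)."""
    deviations = [deviation_cm(box, gcp) for box, gcp in matched_pairs]
    return sum(deviations) / len(deviations) if deviations else float("nan")
```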

Appendix B

Example graphs from the training phase of the 84 models trained in the Python environment are shown in Figure A1.
Figure A1. YOLO12 1000C network training stage graphs: (a) change in box loss coefficient in subsequent training epochs; (b) change in class loss coefficient in subsequent training epochs; (c) change in distributed focal loss coefficient in subsequent training epochs; (d) change in box loss coefficient in subsequent validation epochs; (e) change in class loss coefficient in subsequent validation epochs; (f) change in distributed focal loss coefficient in subsequent validation epochs; (g) change in precision parameter in subsequent training epochs; (h) change in recall parameter in subsequent training epochs; (i) change in mean average precision parameter at IoU threshold 0.5 in subsequent training epochs; (j) change in mean average precision parameter at IoU threshold 0.95 in subsequent training epochs.
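Curves such as those in Figure A1 can be reproduced from the results.csv file that the training framework writes for each run. The sketch below assumes the default Ultralytics results.csv column names and a hypothetical run directory; it is provided only as an illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative sketch: plot training curves from a run's results.csv. The run
# path is hypothetical and the column names assume the default Ultralytics layout.
df = pd.read_csv("runs/detect/train/results.csv")
df.columns = df.columns.str.strip()  # some releases pad the column names with spaces

panels = [
    ("train/box_loss", "box loss (train)"),
    ("train/cls_loss", "class loss (train)"),
    ("train/dfl_loss", "distributed focal loss (train)"),
    ("val/box_loss", "box loss (val)"),
    ("val/cls_loss", "class loss (val)"),
    ("val/dfl_loss", "distributed focal loss (val)"),
    ("metrics/precision(B)", "precision"),
    ("metrics/recall(B)", "recall"),
    ("metrics/mAP50(B)", "mAP@0.5"),
    ("metrics/mAP50-95(B)", "mAP@0.5:0.95"),
]

fig, axes = plt.subplots(2, 5, figsize=(20, 7))
for ax, (column, title) in zip(axes.ravel(), panels):
    ax.plot(df["epoch"], df[column])
    ax.set_title(title)
    ax.set_xlabel("epoch")
fig.tight_layout()
plt.show()
```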

References

  1. Li, Z.; Wang, Y.; Zhang, N.; Zhang, Y.; Zhao, Z.; Xu, D.; Ben, G.; Gao, Y. Deep Learning-Based Object Detection Techniques for Remote Sensing Images: A Survey. Remote Sens. 2022, 14, 2385. [Google Scholar] [CrossRef]
  2. Dunstan, A.; Robertson, K.; Fitzpatrick, R.; Pickford, J.; Meager, J. Use of Unmanned Aerial Vehicles (UAVs) for Mark-Resight Nesting Population Estimation of Adult Female Green Sea Turtles at Raine Island. PLoS ONE 2020, 15, e0228524. [Google Scholar] [CrossRef]
  3. Feng, J.; Jin, T. CEH-YOLO: A Composite Enhanced YOLO-Based Model for Underwater Object Detection. Ecol. Inform. 2024, 82, 102758. [Google Scholar] [CrossRef]
  4. Sineglazov, V.; Savchenko, M. Comprehensive Framework for Underwater Object Detection Based on Improved YOLOv8. Electron. Control Syst. 2024, 1, 9–15. [Google Scholar] [CrossRef]
  5. Martin, J.; Eugenio, F.; Marcello, J.; Medina, A. Automatic Sun Glint Removal of Multispectral High-Resolution Worldview-2 Imagery for Retrieving Coastal Shallow Water Parameters. Remote Sens. 2016, 8, 37. [Google Scholar] [CrossRef]
  6. Cao, N. Small Object Detection Algorithm for Underwater Organisms Based on Improved Transformer. J. Phys. Conf. Ser. 2023, 2637, 012056. [Google Scholar] [CrossRef]
  7. Mathias, A.; Dhanalakshmi, S.; Kumar, R.; Narayanamoorthi, R. Deep Neural Network Driven Automated Underwater Object Detection. Comput. Mater. Contin. 2022, 70, 5251–5267. [Google Scholar] [CrossRef]
  8. Dakhil, R.A.; Khayeat, A.R.H. Review on Deep Learning Techniques for Underwater Object Detection. In Proceedings of the Data Science and Machine Learning, Academy and Industry Research Collaboration Center (AIRCC), Copenhagen, Denmark, 17–18 September 2022; pp. 49–63. [Google Scholar]
  9. Zheng, M.; Luo, W. Underwater Image Enhancement Using Improved CNN Based Defogging. Electronics 2022, 11, 150. [Google Scholar] [CrossRef]
  10. Hu, K.; Weng, C.; Zhang, Y.; Jin, J.; Xia, Q. An Overview of Underwater Vision Enhancement: From Traditional Methods to Recent Deep Learning. J. Mar. Sci. Eng. 2022, 10, 241. [Google Scholar] [CrossRef]
  11. Mogstad, A.A.; Johnsen, G.; Ludvigsen, M. Shallow-Water Habitat Mapping Using Underwater Hyperspectral Imaging from an Unmanned Surface Vehicle: A Pilot Study. Remote Sens. 2019, 11, 685. [Google Scholar] [CrossRef]
  12. Grządziel, A. Application of Remote Sensing Techniques to Identification of Underwater Airplane Wreck in Shallow Water Environment: Case Study of the Baltic Sea, Poland. Remote Sens. 2022, 14, 5195. [Google Scholar] [CrossRef]
  13. Goodwin, M.; Halvorsen, K.T.; Jiao, L.; Knausgård, K.M.; Martin, A.H.; Moyano, M.; Oomen, R.A.; Rasmussen, J.H.; Sørdalen, T.K.; Thorbjørnsen, S.H. Unlocking the Potential of Deep Learning for Marine Ecology: Overview, Applications, and Outlook. ICES J. Mar. Sci. 2022, 79, 319–336. [Google Scholar] [CrossRef]
  14. Liu, Y.; An, B.; Chen, S.; Zhao, D. Multi-target Detection and Tracking of Shallow Marine Organisms Based on Improved YOLO v5 and DeepSORT. IET Image Process. 2024, 18, 2273–2290. [Google Scholar] [CrossRef]
  15. Flynn, K.; Chapra, S. Remote Sensing of Submerged Aquatic Vegetation in a Shallow Non-Turbid River Using an Unmanned Aerial Vehicle. Remote Sens. 2014, 6, 12815–12836. [Google Scholar] [CrossRef]
  16. Taddia, Y.; Russo, P.; Lovo, S.; Pellegrinelli, A. Multispectral UAV Monitoring of Submerged Seaweed in Shallow Water. Appl. Geomat. 2020, 12, 19–34. [Google Scholar] [CrossRef]
  17. Chabot, D.; Dillon, C.; Shemrock, A.; Weissflog, N.; Sager, E.P.S. An Object-Based Image Analysis Workflow for Monitoring Shallow-Water Aquatic Vegetation in Multispectral Drone Imagery. ISPRS Int. J. Geo-Inf. 2018, 7, 294. [Google Scholar] [CrossRef]
  18. Feldens, P. Super Resolution by Deep Learning Improves Boulder Detection in Side Scan Sonar Backscatter Mosaics. Remote Sens. 2020, 12, 2284. [Google Scholar] [CrossRef]
  19. Von Rönn, G.; Schwarzer, K.; Reimers, H.-C.; Winter, C. Limitations of Boulder Detection in Shallow Water Habitats Using High-Resolution Sidescan Sonar Images. Geosciences 2019, 9, 390. [Google Scholar] [CrossRef]
  20. Román, A.; Tovar-Sánchez, A.; Gauci, A.; Deidun, A.; Caballero, I.; Colica, E.; D’Amico, S.; Navarro, G. Water-Quality Monitoring with a UAV-Mounted Multispectral Camera in Coastal Waters. Remote Sens. 2022, 15, 237. [Google Scholar] [CrossRef]
  21. Matsui, K.; Shirai, H.; Kageyama, Y.; Yokoyama, H. Improving the Resolution of UAV-Based Remote Sensing Data of Water Quality of Lake Hachiroko, Japan by Neural Networks. Ecol. Inform. 2021, 62, 101276. [Google Scholar] [CrossRef]
  22. Yan, Y.; Wang, Y.; Yu, C.; Zhang, Z. Multispectral Remote Sensing for Estimating Water Quality Parameters: A Comparative Study of Inversion Methods Using Unmanned Aerial Vehicles (UAVs). Sustainability 2023, 15, 10298. [Google Scholar] [CrossRef]
  23. Rajpura, P.S.; Bojinov, H.; Hegde, R.S. Object Detection Using Deep CNNs Trained on Synthetic Images. arXiv 2017, arXiv:1706.06782v2. [Google Scholar]
  24. Ayachi, R.; Afif, M.; Said, Y.; Atri, M. Traffic Signs Detection for Real-World Application of an Advanced Driving Assisting System Using Deep Learning. Neural Process Lett. 2020, 51, 837–851. [Google Scholar] [CrossRef]
  25. Singh, K.; Navaratnam, T.; Holmer, J.; Schaub-Meyer, S.; Roth, S. Is Synthetic Data All We Need? Benchmarking the Robustness of Models Trained with Synthetic Images. arXiv 2024, arXiv:2405.20469v2. [Google Scholar]
  26. Greff, K.; Belletti, F.; Beyer, L.; Doersch, C.; Du, Y.; Duckworth, D.; Fleet, D.J.; Gnanapragasam, D.; Golemo, F.; Herrmann, C.; et al. Kubric: A Scalable Dataset Generator. arXiv 2022, arXiv:2203.03570. [Google Scholar]
  27. Guerneve, T.; Mignotte, P. Expect the unexpected: A man-made object detection algorithm for underwater operations in unknown environments. In Proceedings of the International Conference on Underwater Acoustics 2024, Institute of Acoustics, Bath, UK, 31 May 2024. [Google Scholar]
  28. Hinterstoisser, S.; Pauly, O.; Heibel, H.; Marek, M.; Bokeloh, M. An Annotation Saved Is an Annotation Earned: Using Fully Synthetic Training for Object Instance Detection. arXiv 2019, arXiv:1902.09967. [Google Scholar]
  29. Josifovski, J.; Kerzel, M.; Pregizer, C.; Posniak, L.; Wermter, S. Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: New York, NY, USA, 2018; pp. 6269–6276. [Google Scholar]
  30. Fabbri, M.; Braso, G.; Maugeri, G.; Cetintas, O.; Gasparini, R.; Osep, A.; Calderara, S.; Leal-Taixe, L.; Cucchiara, R. MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking? In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 10829–10839. [Google Scholar]
  31. Lin, S.; Wang, K.; Zeng, X.; Zhao, R. Explore the Power of Synthetic Data on Few-Shot Object Detection. arXiv 2023, arXiv:2303.13221. [Google Scholar]
  32. Huh, J.; Lee, K.; Lee, I.; Lee, S. A Simple Method on Generating Synthetic Data for Training Real-Time Object Detection Networks. In Proceedings of the 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Honolulu, HI, USA, 12–15 November 2018; IEEE: New York, NY, USA, 2018; pp. 1518–1522. [Google Scholar]
  33. Seiler, F.; Eichinger, V.; Effenberger, I. Synthetic Data Generation for AI-Based Machine Vision Applications. Electron. Imaging 2024, 36, IRIACV-276. [Google Scholar] [CrossRef]
  34. Andulkar, M.; Hodapp, J.; Reichling, T.; Reichenbach, M.; Berger, U. Training CNNs from Synthetic Data for Part Handling in Industrial Environments. In Proceedings of the 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), Munich, Germany, 20–24 August 2018; IEEE: New York, NY, USA, 2018; pp. 624–629. [Google Scholar]
  35. Wang, R.; Hoppe, S.; Monari, E.; Huber, M.F. Defect Transfer GAN: Diverse Defect Synthesis for Data Augmentation. arXiv 2023, arXiv:2302.08366. [Google Scholar]
  36. Ma, Q.; Jin, S.; Bian, G.; Cui, Y. Multi-Scale Marine Object Detection in Side-Scan Sonar Images Based on BES-YOLO. Sensors 2024, 24, 4428. [Google Scholar] [CrossRef]
  37. Zhang, X.; Yang, P.; Cao, D. Synthetic aperture image enhancement with near-coinciding Nonuniform sampling case. Comput. Electr. Eng. 2024, 120, 109818. [Google Scholar] [CrossRef]
  38. Mohamed, H.; Nadaoka, K.; Nakamura, T. Automatic Semantic Segmentation of Benthic Habitats Using Images from Towed Underwater Camera in a Complex Shallow Water Environment. Remote Sens. 2022, 14, 1818. [Google Scholar] [CrossRef]
  39. Mohamed, H.; Nadaoka, K.; Nakamura, T. Semiautomated Mapping of Benthic Habitats and Seagrass Species Using a Convolutional Neural Network Framework in Shallow Water Environments. Remote Sens. 2020, 12, 4002. [Google Scholar] [CrossRef]
  40. Villon, S.; Mouillot, D.; Chaumont, M.; Darling, E.S.; Subsol, G.; Claverie, T.; Villéger, S. A Deep Learning Method for Accurate and Fast Identification of Coral Reef Fishes in Underwater Images. Ecol. Inform. 2018, 48, 238–244. [Google Scholar] [CrossRef]
  41. Han, F.; Yao, J.; Zhu, H.; Wang, C. Underwater Image Processing and Object Detection Based on Deep CNN Method. J. Sens. 2020, 2020, 6707328. [Google Scholar] [CrossRef]
  42. Jin, L.; Liang, H. Deep Learning for Underwater Image Recognition in Small Sample Size Situations. In Proceedings of the OCEANS 2017—Aberdeen, Aberdeen, UK, 19–22 June 2017; IEEE: New York, NY, USA, 2017; pp. 1–4. [Google Scholar]
  43. Jain, A.; Mahajan, M.; Saraf, R. Standardization of the Shape of Ground Control Point (GCP) and the Methodology for Its Detection in Images for UAV-Based Mapping Applications. In Advances in Computer Vision; Arai, K., Kapoor, S., Eds.; Advances in Intelligent Systems and Computing; Springer International Publishing: Cham, Switzerland, 2020; Volume 943, pp. 459–476. ISBN 978-3-030-17794-2. [Google Scholar]
  44. Chuanxiang, C.; Jia, Y.; Chao, W.; Zhi, Z.; Xiaopeng, L.; Di, D.; Mengxia, C.; Zhiheng, Z. Automatic Detection of Aerial Survey Ground Control Points Based on Yolov5-OBB. arXiv 2023, arXiv:2303.03041. [Google Scholar]
  45. Muradás Odriozola, G.; Pauly, K.; Oswald, S.; Raymaekers, D. Automating Ground Control Point Detection in Drone Imagery: From Computer Vision to Deep Learning. Remote Sens. 2024, 16, 794. [Google Scholar] [CrossRef]
  46. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 779–788. [Google Scholar]
  47. Wang, C.; Wang, Q.; Wu, H.; Zhao, C.; Teng, G.; Li, J. Low-Altitude Remote Sensing Opium Poppy Image Detection Based on Modified YOLOv3. Remote Sens. 2021, 13, 2130. [Google Scholar] [CrossRef]
  48. Wang, Q.; Shen, F.; Cheng, L.; Jiang, J.; He, G.; Sheng, W.; Jing, N.; Mao, Z. Ship Detection Based on Fused Features and Rebuilt YOLOv3 Networks in Optical Remote-Sensing Images. Int. J. Remote Sens. 2021, 42, 520–536. [Google Scholar] [CrossRef]
  49. Hong, Z.; Yang, T.; Tong, X.; Zhang, Y.; Jiang, S.; Zhou, R.; Han, Y.; Wang, J.; Yang, S.; Liu, S. Multi-Scale Ship Detection From SAR and Optical Imagery Via A More Accurate YOLOv3. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6083–6101. [Google Scholar] [CrossRef]
  50. Chen, L.; Shi, W.; Deng, D. Improved YOLOv3 Based on Attention Mechanism for Fast and Accurate Ship Detection in Optical Remote Sensing Images. Remote Sens. 2021, 13, 660. [Google Scholar] [CrossRef]
  51. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  52. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677. [Google Scholar] [CrossRef]
  53. Yeerjiang, A.; Wang, Z.; Huang, X.; Zhang, J.; Chen, Q.; Qin, Y.; He, J. YOLOv1 to YOLOv10: A Comprehensive Review of YOLO Variants and Their Application in Medical Image Detection. J. Artif. Intell. Pract. 2024, 7, 112–122. [Google Scholar] [CrossRef]
  54. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
  55. Sapkota, R.; Meng, Z.; Karkee, M. Synthetic Meets Authentic: Leveraging LLM Generated Datasets for YOLO11 and YOLOv10-Based Apple Detection through Machine Vision Sensors. Smart Agric. Technol. 2024, 9, 100614. [Google Scholar] [CrossRef]
  56. Polish National Geoportal. Available online: https://mapy.geoportal.gov.pl/imap/Imgp_2.html (accessed on 30 July 2025).
  57. Padilla, R.; Netto, S.L.; Da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niterói, Brazil, 1–3 July 2020; IEEE: New York, NY, USA, 2020; pp. 237–242. [Google Scholar]
  58. McCoy, R.M. Field Methods in Remote Sensing; Guilford Press: Guilford, UK, 2004; pp. 101–110. [Google Scholar]
  59. Karwowska, K.; Wierzbicki, D. Improving Spatial Resolution of Satellite Imagery Using Generative Adversarial Networks and Window Functions. Remote Sens. 2022, 14, 6285. [Google Scholar] [CrossRef]
  60. Karwowska, K.; Wierzbicki, D. Using Super-Resolution Algorithms for Small Satellite Imagery: A Systematic Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3292–3312. [Google Scholar] [CrossRef]
  61. Lu, T.; Wang, J.; Zhang, Y.; Wang, Z.; Jiang, J. Satellite Image Super-Resolution via Multi-Scale Residual Deep Neural Network. Remote Sens. 2019, 11, 1588. [Google Scholar] [CrossRef]
  62. Elhanashi, A.; Dini, P.; Saponara, S.; Zheng, Q. Integration of Deep Learning into the IoT: A Survey of Techniques and Challenges for Real-World Applications. Electronics 2023, 12, 4925. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the process for obtaining underwater object detection models.
Figure 2. Raster data compilation used in the research: (a) aerial orthophoto (2021); (b) aerial orthophoto (2022); (c) UAV orthophoto (2022); (d) example of single UAV image (2022).
Figure 3. Presentation of a GCP in the field and in a photo from a UAV: (a) GCP located on land; (b) GCP at the bottom of the reservoir (photo taken with a camera); (c) GCP visible in the UAV aerial image (altitude 80 m).
Figure 4. Three types of simplified synthetic objects used in the study: (a) Type A GCP; (b) Type B GCP; (c) Type C GCP.
Figure 5. Composite image with zoomed-in area showing random distribution of Type C GCPs.
Figure 6. YOLOv3 network framework. Based on figure from “Multi-Scale Ship Detection from SAR and Optical Imagery via A More Accurate YOLOv3” [49].
Figure 7. Example of GCP detection in an image from a UAV survey.
Figure 8. Five ground control points on a UAV image with bounding boxes. The nomenclature used is consistent with Table 4: (1) GCP#1; (2) GCP#2; (3) GCP#3; (4) GCP#4; (5) GCP#5.
Figure 9. The influence of the location of ground control points on the number of true positives (TPs). Graph of the curves of three measurement series: series 1(N)—not augmented; series 2(P)—partial augmentation; series 3(F)—full augmentation.
Figure 10. Influence of the position of ground control points on the value of the mean detection deviation of a series of models. Graph of the curves of three measurement series: series 1(N)—not augmented; series 2(P)—partial augmentation; series 3(F)—full augmentation.
Figure 11. Chart depicting the F1-score of detection for individual models relative to the length of their training sets, based on data from Table 3.
Figure 12. Heat map of the F1-score; the range of parameter values is from 0 (red) to 1 (green).
Figure 13. F1-score averages of the eight YOLO versions, with values shown for the three best-performing model sets.
Table 1. Presentation of a method for creating synthetic images constituting training data.
GCP Type | Number of SSOs in the ortho2021 Image | Number of SSOs in the ortho2022 Image | Total Number of Simplified Synthetic GCPs in Training Data
Type A | 250 | 250 | 500
Type A | 500 | 500 | 1000
Type A | 1000 | 1000 | 2000
Type A | 2000 | 2000 | 4000
Type B | 250 | 250 | 500
Type B | 500 | 500 | 1000
Type B | 1000 | 1000 | 2000
Type B | 2000 | 2000 | 4000
Type C | 250 | 250 | 500
Type C | 500 | 500 | 1000
Type C | 1000 | 1000 | 2000
Type C | 2000 | 2000 | 4000
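Composite training images of the kind summarized in Table 1 can be produced by rule-based compositing: an SSO template with an alpha channel is pasted at random positions onto an orthophoto tile, and a YOLO-format label is written for every pasted object. The sketch below shows the general idea under assumed file names and sizes; it is not the authors' implementation.

```python
import random
from PIL import Image

# Rule-based compositing sketch (assumed file names; not the authors' implementation):
# paste a semi-transparent SSO template at random positions and write YOLO labels.
background = Image.open("ortho2022_tile.png").convert("RGB")
sso = Image.open("sso_type_c.png").convert("RGBA")  # alpha channel carries the SSO transparency

labels = []
for _ in range(10):  # number of SSOs pasted into this tile
    x = random.randint(0, background.width - sso.width)
    y = random.randint(0, background.height - sso.height)
    background.paste(sso, (x, y), mask=sso)  # alpha-blend the synthetic GCP into the tile

    # YOLO label line: class x_center y_center width height, normalized to [0, 1]
    xc = (x + sso.width / 2) / background.width
    yc = (y + sso.height / 2) / background.height
    labels.append(f"0 {xc:.6f} {yc:.6f} "
                  f"{sso.width / background.width:.6f} {sso.height / background.height:.6f}")

background.save("composite_0001.png")
with open("composite_0001.txt", "w") as f:
    f.write("\n".join(labels))
```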
Table 2. List of prepared detection models.
GCP Type | The Total Number of Training Data (SSOs) | The Detection Model Used | Series | The Short Name of the Trained Model
Type A | 500 | YOLOv3 | 1 | YA500N
Type A | 1000 | YOLOv3 | 1 | YA1000N
Type A | 2000 | YOLOv3 | 1 | YA2000N
Type A | 4000 | YOLOv3 | 1 | YA4000N
Type B | 500 | YOLOv3 | 1 | YB500N
Type B | 1000 | YOLOv3 | 1 | YB1000N
Type B | 2000 | YOLOv3 | 1 | YB2000N
Type B | 4000 | YOLOv3 | 1 | YB4000N
Type C | 500 | YOLOv3 | 1 | YC500N
Type C | 1000 | YOLOv3 | 1 | YC1000N
Type C | 2000 | YOLOv3 | 1 | YC2000N
Type C | 4000 | YOLOv3 | 1 | YC4000N
Type A | 500 | YOLOv3 | 2 | YA500P
Type A | 1000 | YOLOv3 | 2 | YA1000P
Type A | 2000 | YOLOv3 | 2 | YA2000P
Type A | 4000 | YOLOv3 | 2 | YA4000P
Type B | 500 | YOLOv3 | 2 | YB500P
Type B | 1000 | YOLOv3 | 2 | YB1000P
Type B | 2000 | YOLOv3 | 2 | YB2000P
Type B | 4000 | YOLOv3 | 2 | YB4000P
Type C | 500 | YOLOv3 | 2 | YC500P
Type C | 1000 | YOLOv3 | 2 | YC1000P
Type C | 2000 | YOLOv3 | 2 | YC2000P
Type C | 4000 | YOLOv3 | 2 | YC4000P
Type A | 500 | YOLOv3 | 3 | YA500F
Type A | 1000 | YOLOv3 | 3 | YA1000F
Type A | 2000 | YOLOv3 | 3 | YA2000F
Type A | 4000 | YOLOv3 | 3 | YA4000F
Type B | 500 | YOLOv3 | 3 | YB500F
Type B | 1000 | YOLOv3 | 3 | YB1000F
Type B | 2000 | YOLOv3 | 3 | YB2000F
Type B | 4000 | YOLOv3 | 3 | YB4000F
Type C | 500 | YOLOv3 | 3 | YC500F
Type C | 1000 | YOLOv3 | 3 | YC1000F
Type C | 2000 | YOLOv3 | 3 | YC2000F
Type C | 4000 | YOLOv3 | 3 | YC4000F
Table 3. Network detection results on the image set.
Short Name of the Model | TP | FP | FN | Precision | Recall | F1-Score | AP
YA500N | 65 | 0 | 87 | 1.00 | 0.43 | 0.60 | 0.43
YA1000N | 111 | 3 | 41 | 0.97 | 0.73 | 0.83 | 0.73
YA2000N | 117 | 8 | 35 | 0.94 | 0.77 | 0.84 | 0.77
YA4000N | 127 | 28 | 25 | 0.82 | 0.84 | 0.83 | 0.83
YB500N | 20 | 1 | 132 | 0.95 | 0.13 | 0.23 | 0.13
YB1000N | 8 | 1 | 144 | 0.89 | 0.05 | 0.10 | 0.05
YB2000N | 123 | 42 | 29 | 0.75 | 0.81 | 0.78 | 0.79
YB4000N | 35 | 34 | 117 | 0.51 | 0.23 | 0.32 | 0.16
YC500N | 117 | 1 | 35 | 0.99 | 0.77 | 0.87 | 0.77
YC1000N | 84 | 0 | 68 | 1.00 | 0.55 | 0.71 | 0.56
YC2000N | 108 | 4 | 44 | 0.96 | 0.71 | 0.82 | 0.72
YC4000N | 55 | 1 | 97 | 0.98 | 0.36 | 0.53 | 0.37
YA500P | 26 | 0 | 126 | 1.00 | 0.17 | 0.29 | 0.17
YA1000P | 143 | 7 | 9 | 0.95 | 0.94 | 0.95 | 0.93
YA2000P | 127 | 2 | 25 | 0.98 | 0.84 | 0.90 | 0.84
YA4000P | 79 | 1 | 73 | 0.99 | 0.52 | 0.68 | 0.52
YB500P | 11 | 2 | 141 | 0.85 | 0.07 | 0.13 | 0.06
YB1000P | 88 | 1 | 64 | 0.99 | 0.58 | 0.73 | 0.58
YB2000P | 38 | 0 | 114 | 1.00 | 0.25 | 0.40 | 0.26
YB4000P | 125 | 6 | 27 | 0.95 | 0.82 | 0.88 | 0.82
YC500P | 112 | 4 | 40 | 0.97 | 0.74 | 0.84 | 0.74
YC1000P | 137 | 0 | 15 | 1.00 | 0.90 | 0.95 | 0.91
YC2000P | 151 | 13 | 1 | 0.92 | 0.99 | 0.96 | 0.99
YC4000P | 144 | 17 | 8 | 0.89 | 0.95 | 0.92 | 0.94
YA500F | 125 | 30 | 27 | 0.81 | 0.82 | 0.81 | 0.79
YA1000F | 144 | 35 | 8 | 0.80 | 0.95 | 0.87 | 0.91
YA2000F | 88 | 45 | 64 | 0.66 | 0.58 | 0.62 | 0.42
YA4000F | 118 | 3 | 34 | 0.98 | 0.78 | 0.86 | 0.77
YB500F | 23 | 1 | 129 | 0.96 | 0.15 | 0.26 | 0.15
YB1000F | 132 | 6 | 20 | 0.96 | 0.87 | 0.91 | 0.86
YB2000F | 110 | 5 | 42 | 0.96 | 0.72 | 0.82 | 0.72
YB4000F | 46 | 15 | 106 | 0.75 | 0.30 | 0.43 | 0.23
YC500F | 130 | 5 | 22 | 0.96 | 0.86 | 0.91 | 0.85
YC1000F | 145 | 63 | 7 | 0.70 | 0.95 | 0.81 | 0.94
YC2000F | 152 | 158 | 0 | 0.49 | 1.00 | 0.66 | 0.98
YC4000F | 143 | 17 | 9 | 0.89 | 0.94 | 0.92 | 0.90
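The AP column in Table 3 is the average precision obtained from the confidence-ranked detections [57]. The function below is a generic all-point interpolation sketch of that computation with illustrative inputs; it is not the evaluation script used in the study.

```python
import numpy as np

# Generic average-precision sketch (all-point interpolation, cf. [57]); the inputs
# are illustrative and this is not the evaluation script used in the study.
def average_precision(is_tp, n_ground_truth):
    """is_tp: 1/0 flags of detections sorted by descending confidence."""
    is_tp = np.asarray(is_tp, dtype=float)
    tp_cum = np.cumsum(is_tp)
    fp_cum = np.cumsum(1.0 - is_tp)
    recall = tp_cum / n_ground_truth
    precision = tp_cum / (tp_cum + fp_cum)

    # Build the precision envelope and integrate it over the recall steps.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    changed = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[changed + 1] - mrec[changed]) * mpre[changed + 1]))

# Toy example: four detections (three correct) against five ground-truth GCPs.
print(average_precision([1, 1, 0, 1], n_ground_truth=5))  # ~0.55
```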
Table 4. Network detection results on the orthophoto along with deviations on individual GCPs and the calculated average deviation of the 36 YOLOv3 models.
Short Name of the Model | TP | dmean at detected GCPs [cm] (in GCP#1–#5 order) | Δdmean [cm]
YA500N | 0 | – | –
YA1000N | 2 | 10.10, 11.85 | 10.98
YA2000N | 2 | 21.49, 13.65 | 17.57
YA4000N | 1 | 18.40 | 18.40
YB500N | 1 | 6.80 | 6.80
YB1000N | 2 | 7.16, 7.45 | 7.30
YB2000N | 3 | 6.40, 19.10, 9.72 | 11.74
YB4000N | 2 | 9.32, 10.71 | 10.02
YC500N | 4 | 6.40, 24.76, 12.13, 4.79 | 12.02
YC1000N | 4 | 6.40, 23.20, 10.29, 7.05 | 11.73
YC2000N | 5 | 6.40, 22.39, 20.62, 12.33, 8.71 | 14.09
YC4000N | 4 | 5.05, 24.73, 10.72, 13.69 | 13.55
YA500P | 1 | 6.80 | 6.80
YA1000P | 5 | 7.10, 10.62, 4.56, 11.74, 5.81 | 7.97
YA2000P | 4 | 7.16, 13.49, 19.23, 13.09 | 13.24
YA4000P | 2 | 13.55, 13.09 | 13.32
YB500P | 3 | 9.05, 13.66, 4.72 | 9.14
YB1000P | 5 | 7.16, 11.31, 1.03, 10.16, 7.95 | 7.52
YB2000P | 4 | 10.10, 14.31, 1.97, 17.03 | 10.85
YB4000P | 5 | 5.07, 2.31, 2.13, 7.31, 7.05 | 4.77
YC500P | 5 | 5.07, 16.32, 5.45, 15.73, 8.84 | 10.28
YC1000P | 5 | 5.07, 10.49, 16.07, 12.33, 7.28 | 10.25
YC2000P | 5 | 9.58, 17.92, 1.86, 10.71, 10.60 | 10.14
YC4000P | 5 | 5.05, 12.06, 16.68, 7.28, 11.87 | 10.59
YA500F | 3 | 7.79, 25.00, 17.34 | 16.71
YA1000F | 5 | 3.21, 8.20, 2.38, 7.31, 19.32 | 8.08
YA2000F | 0 | – | –
YA4000F | 2 | 4.65, 27.96 | 16.30
YB500F | 0 | – | –
YB1000F | 2 | 6.87, 11.83 | 9.35
YB2000F | 1 | 3.53 | 3.53
YB4000F | 2 | 14.56, 43.26 | 28.91
YC500F | 5 | 20.48, 29.55, 21.49, 14.56, 11.28 | 19.47
YC1000F | 5 | 14.27, 17.98, 18.12, 18.08, 29.56 | 19.60
YC2000F | 5 | 14.43, 33.38, 14.68, 20.99, 2.95 | 17.28
YC4000F | 5 | 14.96, 15.63, 16.93, 13.23, 3.26 | 12.80
Table 5. List of the most effective detection networks.
Short Name of the Model | TP | FP | FN | Precision | Recall | F1-Score | AP
YC2000P | 151 | 13 | 1 | 0.92 | 0.99 | 0.96 | 0.99
YC1000P | 137 | 0 | 15 | 1.00 | 0.90 | 0.95 | 0.91
YA1000P | 143 | 7 | 9 | 0.95 | 0.94 | 0.95 | 0.93
YC4000P | 144 | 17 | 8 | 0.89 | 0.95 | 0.92 | 0.94
YC4000F | 143 | 17 | 9 | 0.89 | 0.94 | 0.92 | 0.90
YB1000F | 132 | 6 | 20 | 0.96 | 0.87 | 0.91 | 0.86
YC500F | 130 | 5 | 22 | 0.96 | 0.86 | 0.91 | 0.85
Table 6. Hyperparameters of YOLO models.
Parameter | Value
imgsz | 416
workers | 8
batch | 16
epochs | 300
patience | 50
hsv_h | 0.015
hsv_s | 0.7
hsv_v | 0.4
degrees | 0.0
translate | 0.1
scale | 0.5
shear | 0.0
perspective | 0.0
flipud | 0.0
fliplr | 0.5
bgr | 0.0
mosaic | 1.0
mixup | 0.0
copy_paste | 0.5
copy_paste_mode | flip
auto_augment | randaugment
erasing | 0.4
crop_fraction | 1.0
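The parameters in Table 6 correspond to training arguments of the YOLO framework. The call below is a sketch of how such a configuration could be passed to the Ultralytics Python API; the dataset YAML and the base weights file are hypothetical placeholders, and the snippet is not the authors' training script.

```python
from ultralytics import YOLO

# Sketch of a training call using the hyperparameters from Table 6 (assuming the
# Ultralytics Python API); "sso_gcp.yaml" and "yolov8n.pt" are hypothetical placeholders.
model = YOLO("yolov8n.pt")
model.train(
    data="sso_gcp.yaml",   # dataset definition pointing to the composite images and labels
    imgsz=416, workers=8, batch=16, epochs=300, patience=50,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
    degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0,
    flipud=0.0, fliplr=0.5, bgr=0.0,
    mosaic=1.0, mixup=0.0,
    copy_paste=0.5, copy_paste_mode="flip",
    auto_augment="randaugment", erasing=0.4, crop_fraction=1.0,
)
```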
Table 7. Analysis of the deviations of the bounding box marking from the GCP center for the five models determined with respect to the F1-score parameter.
Short Name of the Model | TP | GCP#1 dmean [cm] | GCP#2 dmean [cm] | GCP#3 dmean [cm] | GCP#4 dmean [cm] | GCP#5 dmean [cm] | Δdmean [cm]
YOLOv5 1000C | 5 | 2.25 | 2.25 | 3.18 | 8.11 | 7.12 | 4.58
YOLOv8 2000C | 5 | 3.18 | 2.25 | 3.18 | 7.12 | 5.03 | 4.15
YOLOv9 2000C | 5 | 2.25 | 2.25 | 2.25 | 7.12 | 3.18 | 3.41
YOLOv10 2000C | 5 | 2.25 | 2.25 | 2.25 | 5.03 | 3.18 | 2.99
YOLO12 4000C | 5 | 3.18 | 2.25 | 3.18 | 5.03 | 3.18 | 3.37
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
