Search Results (69)

Search Parameters:
Keywords = multiple camera stereo

17 pages, 5356 KiB  
Article
A Study on the Features for Multi-Target Dual-Camera Tracking and Re-Identification in a Comparatively Small Environment
by Jong-Chen Chen, Po-Sheng Chang and Yu-Ming Huang
Electronics 2025, 14(10), 1984; https://doi.org/10.3390/electronics14101984 - 13 May 2025
Viewed by 532
Abstract
Tracking across multiple cameras is a complex problem in computer vision. Its main challenges include camera calibration, occlusion handling, camera overlap and field of view, person re-identification, and data association. In this study, we designed a laboratory as a research environment that facilitates our exploration of some of these challenging issues. This study uses stereo camera calibration and key point detection to reconstruct the three-dimensional key points of the person being tracked, thereby performing person-tracking tasks. The results show that dual-camera 3D spatial tracking provides more continuous monitoring than a single camera alone. This study adopts four ways of evaluating person similarity, which effectively reduce the generation of spurious person identities. However, using all four methods simultaneously may not produce better results than a specific assessment method alone, owing to differences in people’s activity patterns. Full article
(This article belongs to the Collection Computer Vision and Pattern Recognition Techniques)
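The 3D reconstruction step this abstract describes is, at its core, two-view triangulation. Below is a minimal sketch of that step using OpenCV; it is an illustration under assumed inputs (3x4 projection matrices from stereo calibration and matched 2D keypoints), not the authors' implementation.

```python
# Minimal two-view triangulation sketch: lift matched 2D keypoints from a
# calibrated dual-camera setup into 3D. P1 and P2 are assumed 3x4 projection
# matrices from stereo calibration; pts1/pts2 are (N, 2) arrays of
# corresponding keypoints (e.g., body joints of the tracked person).
import numpy as np
import cv2

def triangulate_keypoints(P1, P2, pts1, pts2):
    homog = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(np.float64),
                                  pts2.T.astype(np.float64))  # (4, N) homogeneous
    return (homog[:3] / homog[3]).T  # (N, 3) Euclidean 3D points
```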

15 pages, 11293 KiB  
Article
An Assessment of the Stereo and Near-Infrared Camera Calibration Technique Using a Novel Real-Time Approach in the Context of Resource Efficiency
by Larisa Ivascu, Vlad-Florin Vinatu and Mihail Gaianu
Processes 2025, 13(4), 1198; https://doi.org/10.3390/pr13041198 - 15 Apr 2025
Viewed by 567
Abstract
This paper provides a comparative analysis of calibration techniques applicable to stereo and near-infrared (NIR) camera systems, with a specific emphasis on the Intel RealSense SR300 alongside a standard 2-megapixel NIR camera. This study investigates the pivotal function of calibration within both stereo vision and NIR imaging applications, which are essential across various domains, including robotics, augmented reality, and low-light imaging. For stereo systems, we scrutinise the conventional method involving a 9 × 6 chessboard pattern utilised to ascertain the intrinsic and extrinsic camera parameters. The proposed methodology consists of three main steps: (1) real-time calibration error classification for stereo cameras, (2) NIR-specific calibration techniques, and (3) a comprehensive evaluation framework. This research introduces a novel real-time evaluation methodology that classifies calibration errors predicated on the pixel offsets between corresponding points in the left and right images. In turn, NIR camera calibration techniques are modified to address the distinctive properties of near-infrared light. We deliberate on the difficulties encountered in devising NIR–visible calibration patterns and the imperative to consider the spectral response and temperature sensitivity within the calibration procedure. The paper also puts forth an innovative calibration assessment application that is relevant to both systems. For stereo cameras, the application evaluates corner detection accuracy in real time across multiple image pairs, whereas for NIR cameras it concentrates on assessing distortion correction and intrinsic parameter accuracy under varying lighting conditions. Our experiments validate the necessity of routine calibration assessment, as environmental factors may compromise the calibration quality over time. We conclude by underscoring the disparities in the calibration requirements between stereo and NIR systems, thereby emphasising the need for specialised approaches tailored to each domain to guarantee optimal performance in their respective applications. Full article
(This article belongs to the Special Issue Circular Economy and Efficient Use of Resources (Volume II))
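The real-time error classification described in step (1) reduces to measuring pixel offsets between corresponding points in the two views. Below is a hedged sketch of that idea, assuming an already-rectified stereo pair and the 9 × 6 chessboard the abstract mentions; the classification thresholds are illustrative assumptions, not values from the paper.

```python
# Sketch: estimate residual calibration error from the vertical pixel offset
# between corresponding chessboard corners in a rectified stereo pair.
import cv2
import numpy as np

PATTERN = (9, 6)  # inner-corner grid of the chessboard

def calibration_offset(rect_left, rect_right):
    ok_l, c_l = cv2.findChessboardCorners(rect_left, PATTERN)
    ok_r, c_r = cv2.findChessboardCorners(rect_right, PATTERN)
    if not (ok_l and ok_r):
        return None
    # In a well-rectified pair, corresponding corners share an image row,
    # so the mean |dy| approximates the residual rectification error.
    return float(np.mean(np.abs(c_l[:, 0, 1] - c_r[:, 0, 1])))

def classify(offset, good=0.5, fair=1.5):  # thresholds are assumptions
    if offset is None:
        return "pattern not found"
    return "good" if offset < good else "fair" if offset < fair else "recalibrate"
```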

25 pages, 6410 KiB  
Article
Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume
by Zongcheng Zuo, Yuanxiang Li, Yu Zhou and Fan Mo
Sensors 2025, 25(7), 2233; https://doi.org/10.3390/s25072233 - 2 Apr 2025
Viewed by 955
Abstract
Feature matching is pivotal when using multi-view stereo (MVS) to reconstruct dense 3D models from calibrated images. This paper proposes PAC-MVSNet, which integrates perspective-aware convolution (PAC) and metadata-enhanced cost volumes to address the challenges in reflective and texture-less regions. PAC dynamically aligns convolutional kernels with scene perspective lines, while the use of metadata (e.g., camera pose distance) enables geometric reasoning during cost aggregation. In PAC-MVSNet, we introduce feature matching with long-range tracking that utilizes both internal and external focus to integrate extensive contextual information within individual images as well as across multiple images. To enhance the performance of the feature matching with long-range tracking, we also propose a perspective-aware convolution module that directs the convolutional kernel to capture features along the perspective lines. This enables the module to extract perspective-aware features from images, improving feature matching. Finally, we crafted a specific 2D CNN that fuses image priors, thereby integrating keyframes and geometric metadata within the cost volume to evaluate depth planes. Our method represents the first attempt to embed existing physical-model knowledge into a network for MVS tasks, and it achieved the best performance on multiple benchmark datasets. Full article
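For readers unfamiliar with cost volumes, the sketch below shows the generic plane-sweep construction that metadata-enhanced volumes such as PAC-MVSNet's build upon. The perspective-aware convolution and metadata fusion themselves are not reproduced here, and the fronto-parallel sweep planes are an assumption.

```python
# Generic plane-sweep cost volume: warp the source view onto the reference
# view for each hypothesized depth and record the photometric difference.
import numpy as np
import cv2

def plane_sweep_cost(ref, src, K, R, t, depths):
    """ref/src: grayscale images; K: shared 3x3 intrinsics; R, t: pose taking
    reference-camera coordinates to source-camera coordinates.
    Returns a (D, H, W) volume of absolute photometric differences."""
    H_img, W_img = ref.shape
    n = np.array([0.0, 0.0, 1.0])  # fronto-parallel sweep planes (assumption)
    vol = np.empty((len(depths), H_img, W_img), np.float32)
    for i, d in enumerate(depths):
        # Homography induced by the plane z = d, mapping ref pixels to src pixels.
        Hmat = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)
        warped = cv2.warpPerspective(src.astype(np.float32), Hmat,
                                     (W_img, H_img),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        vol[i] = np.abs(ref.astype(np.float32) - warped)
    return vol
```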

19 pages, 39933 KiB  
Article
SIFT-Based Depth Estimation for Accurate 3D Reconstruction in Cultural Heritage Preservation
by Porawat Visutsak, Xiabi Liu, Chalothon Choothong and Fuangfar Pensiri
Appl. Syst. Innov. 2025, 8(2), 43; https://doi.org/10.3390/asi8020043 - 24 Mar 2025
Viewed by 1611
Abstract
This paper describes a proposed method for preserving tangible cultural heritage by reconstructing a 3D model of cultural heritage using 2D captured images. The input data represent a set of multiple 2D images captured using different views around the object. An image registration technique is applied to align the overlapping images, and the depth of the images is computed to construct the 3D model. The automatic 3D reconstruction system consists of three steps: (1) Image registration for managing the overlapping of 2D input images; (2) Depth computation for managing image orientation and calibration; and (3) 3D reconstruction using point cloud and stereo-dense matching. We collected and recorded 2D images of tangible cultural heritage objects, such as high-relief and round-relief sculptures, using a low-cost digital camera. The performance analysis of the proposed method, in conjunction with the generation of 3D models of tangible cultural heritage, demonstrates significantly improved accuracy in depth information. This process effectively creates point cloud locations, particularly in high-contrast backgrounds. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)
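Step (1) of the pipeline, registering overlapping 2D images, is commonly done with SIFT matching and a RANSAC homography. The sketch below shows that standard OpenCV recipe; the 0.75 ratio-test threshold and the homography model are conventional assumptions, not details taken from the paper.

```python
# SIFT-based registration of an overlapping image pair: detect and match
# features, filter with Lowe's ratio test, then fit a RANSAC homography.
import cv2
import numpy as np

def register_pair(img1, img2):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
            if m.distance < 0.75 * n.distance]  # Lowe's ratio test
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, good  # H warps img1 onto img2's frame
```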

21 pages, 16376 KiB  
Article
Triple-Camera Rectification for Depth Estimation Sensor
by Minkyung Jeon, Jinhong Park, Jin-Woo Kim and Sungmin Woo
Sensors 2024, 24(18), 6100; https://doi.org/10.3390/s24186100 - 20 Sep 2024
Cited by 1 | Viewed by 1466
Abstract
In this study, we propose a novel rectification method for three cameras using a single image for depth estimation. Stereo rectification serves as a fundamental preprocessing step for disparity estimation in stereoscopic cameras. However, off-the-shelf depth cameras often include an additional RGB camera for creating 3D point clouds. Existing rectification methods only align two cameras, necessitating an additional rectification and remapping process to align the third camera. Moreover, these methods require multiple reference checkerboard images for calibration and aim to minimize alignment errors, but often result in rotated images when there is significant misalignment between two cameras. In contrast, the proposed method simultaneously rectifies three cameras in a single shot without unnecessary rotation. To achieve this, we designed a lab environment with checkerboard settings and obtained multiple sample images from the cameras. The optimization function, designed specifically for rectification in stereo matching, enables the simultaneous alignment of all three cameras while ensuring performance comparable to traditional methods. Experimental results with real camera samples demonstrate the benefits of the proposed method and provide a detailed analysis of unnecessary rotations in the rectified images. Full article
(This article belongs to the Collection 3D Imaging and Sensing System)
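As context for the contribution, the sketch below shows the conventional two-camera rectification (OpenCV's stereoRectify) that existing methods rely on and that the paper extends to three cameras; the three-camera, single-shot optimization itself is not reproduced here.

```python
# Baseline pairwise rectification: compute rectifying rotations/projections
# and the remap tables used to warp each view into a row-aligned pair.
import cv2

def rectify_pair(K1, D1, K2, D2, size, R, T):
    # R, T: rotation/translation from the first to the second camera,
    # as returned by cv2.stereoCalibrate; size is (width, height).
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2,
                                                      size, R, T, alpha=0)
    maps_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    maps_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    return maps_l, maps_r, Q  # cv2.remap() with these maps rectifies each view
```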

16 pages, 2676 KiB  
Article
Point Cloud Densification Algorithm for Multiple Cameras and Lidars Data Fusion
by Jakub Winter and Robert Nowak
Sensors 2024, 24(17), 5786; https://doi.org/10.3390/s24175786 - 5 Sep 2024
Viewed by 2228
Abstract
Fusing data from many sources helps to achieve improved analysis and results. In this work, we present a new algorithm to fuse data from multiple cameras with data from multiple lidars. This algorithm was developed to increase the sensitivity and specificity of autonomous vehicle perception systems, where the most accurate sensors measuring the vehicle’s surroundings are cameras and lidar devices. Perception systems based on data from one type of sensor do not use complete information and are of lower quality. The camera provides two-dimensional images; lidar produces three-dimensional point clouds. We developed a method for matching pixels on a pair of stereoscopic images using dynamic programming, inspired by an algorithm for matching sequences of amino acids used in bioinformatics. We improve the quality of the basic algorithm using additional data from edge detectors. Furthermore, we improve the algorithm’s performance by reducing the set of candidate pixel matches based on attainable car speeds. We perform point cloud densification in the final step of our method, fusing lidar output data with stereo vision output. We implemented our algorithm in C++ with a Python API and provide the open-source library named Stereo PCD, which efficiently fuses data from multiple cameras and multiple lidars. In the article, we present the results of our approach on benchmark datasets in terms of quality and performance and compare our algorithm with other popular methods. Full article
(This article belongs to the Section Sensing and Imaging)
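The pixel-matching idea borrowed from bioinformatics is a Needleman-Wunsch-style alignment of rectified scanlines. A toy version is sketched below; the gap penalty and absolute-difference cost are illustrative assumptions, and the edge-detector and speed-based optimizations from the paper are omitted.

```python
# Toy scanline alignment by dynamic programming, in the spirit of sequence
# alignment: pixels either match (photometric cost) or take a gap (occlusion).
import numpy as np

def match_scanline(left, right, gap=10.0):
    """left/right: 1D intensity arrays from the same rectified image row."""
    n, m = len(left), len(right)
    D = np.zeros((n + 1, m + 1), np.float32)
    D[:, 0] = np.arange(n + 1) * gap
    D[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(float(left[i - 1]) - float(right[j - 1]))
            D[i, j] = min(D[i - 1, j - 1] + cost,  # match the two pixels
                          D[i - 1, j] + gap,       # pixel occluded in right
                          D[i, j - 1] + gap)       # pixel occluded in left
    return D  # backtracking from D[n, m] yields the pixel correspondence
```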

16 pages, 3639 KiB  
Article
Time-of-Flight Camera Intensity Image Reconstruction Based on an Untrained Convolutional Neural Network
by Tian-Long Wang, Lin Ao, Na Han, Fu Zheng, Yan-Qiu Wang and Zhi-Bin Sun
Photonics 2024, 11(9), 821; https://doi.org/10.3390/photonics11090821 - 30 Aug 2024
Cited by 3 | Viewed by 2053
Abstract
With the continuous development of science and technology, laser ranging technology has become more efficient, convenient, and widespread, and it is now widely used in the fields of medicine, engineering, video games, and three-dimensional imaging. A time-of-flight (ToF) camera is a three-dimensional stereo imaging device with the advantages of small size, small measurement error, and strong anti-interference ability. However, compared to traditional sensors, ToF cameras typically exhibit lower resolution and signal-to-noise ratio due to inevitable noise from multipath interference and mixed pixels during usage. Additionally, in environments with scattering media, light from objects is scattered multiple times, making it challenging for ToF cameras to obtain effective object information. To address these issues, we propose a solution that combines ToF cameras with single-pixel imaging theory. Leveraging the intensity information acquired by ToF cameras, we apply various reconstruction algorithms to reconstruct the object’s image. Under undersampling conditions, our reconstruction approach yields a higher peak signal-to-noise ratio than the raw camera image, significantly improving the quality of the target object’s image. Furthermore, when ToF cameras fail in environments with scattering media, our proposed approach successfully reconstructs the object’s image through the scattering medium. This experimental demonstration effectively suppresses the noise and direct ambient light affecting the ToF camera, while opening up the potential application of ToF cameras in challenging environments, such as scattering media or underwater. Full article
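The combination with single-pixel imaging theory rests on correlating patterned illumination with recorded intensities. The sketch below shows a basic differential correlation estimator of that kind, assuming random binary patterns; the specific reconstruction algorithms evaluated in the paper are not reproduced.

```python
# Basic single-pixel-imaging estimator: correlate mean-subtracted pattern
# stacks with mean-subtracted total-intensity measurements.
import numpy as np

def spi_reconstruct(patterns, measurements):
    """patterns: (M, H, W) binary illumination patterns;
    measurements: (M,) total intensities recorded for each pattern."""
    m = measurements - measurements.mean()
    p = patterns - patterns.mean(axis=0)
    return np.tensordot(m, p, axes=1) / len(m)  # (H, W) image estimate
```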

13 pages, 320 KiB  
Review
Stereo-Photogrammetry for Impression of Full-Arch Fixed Dental Prosthesis—An Update of the Reviews
by Paulo Ribeiro, Carmen María Díaz-Castro, Blanca Ríos-Carrasco, José Vicente Ríos-Santos and Mariano Herrero-Climent
Prosthesis 2024, 6(4), 939-951; https://doi.org/10.3390/prosthesis6040068 - 15 Aug 2024
Cited by 6 | Viewed by 2973
Abstract
Photogrammetry (PG) appeared as an alternative for multiple implant impressions. Stereo-photogrammetry is a more sophisticated alternative to PG, which estimates the 3D coordinates of the points of an object, making the process quicker and more precise. A search in PubMed MEDLINE, PMC, and Google Scholar was conducted to find systematic reviews published in the last 10 years. The PICdental® camera (IDITEC NORTH WEST, SL; Torrelodones, Spain) is a stereocamera that records implant positions in the mouth by means of photogrammetry with the objective of registering and obtaining a viable, reliable, and direct digital impression of the positions of the multiple implants. The use of photogrammetry via the PICdental® camera as an alternative to digital impression for multiple implants is an easy and trustworthy technique that permits an adequate fit without prosthetic complications. Full article
(This article belongs to the Collection Oral Implantology: Current Aspects and Future Perspectives)
19 pages, 10016 KiB  
Article
LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction
by Weiming Luo, Zongqing Lu and Qingmin Liao
Sensors 2024, 24(8), 2400; https://doi.org/10.3390/s24082400 - 9 Apr 2024
Cited by 6 | Viewed by 2515
Abstract
With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments. Full article
(This article belongs to the Special Issue 3D Reconstruction with RGB-D Cameras and Multi-sensors)
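The abstract's "fuse features across different scales" suggests a standard coarse-to-fine fusion block. The sketch below is one illustrative PyTorch formulation with arbitrary channel sizes; it is an assumption about the general pattern, not LNMVSNet's architecture.

```python
# Illustrative cross-scale fusion: project coarse features, upsample them to
# the fine resolution, concatenate, and fuse with a 3x3 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleFusion(nn.Module):
    def __init__(self, c_fine=32, c_coarse=64):  # channel sizes are assumptions
        super().__init__()
        self.proj = nn.Conv2d(c_coarse, c_fine, 1)        # align channel counts
        self.fuse = nn.Conv2d(2 * c_fine, c_fine, 3, padding=1)

    def forward(self, fine, coarse):
        up = F.interpolate(self.proj(coarse), size=fine.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([fine, up], dim=1))
```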

22 pages, 13859 KiB  
Article
Stereo Vision for Plant Detection in Dense Scenes
by Thijs Ruigrok, Eldert J. van Henten and Gert Kootstra
Sensors 2024, 24(6), 1942; https://doi.org/10.3390/s24061942 - 18 Mar 2024
Cited by 2 | Viewed by 2128
Abstract
Automated precision weed control requires visual methods to discriminate between crops and weeds. State-of-the-art plant detection methods fail to reliably detect weeds, especially in dense and occluded scenes. In the past, using hand-crafted detection models, both color (RGB) and depth (D) data were used for plant detection in dense scenes. Remarkably, the combination of color and depth data is not widely used in current deep learning-based vision systems in agriculture. Therefore, we collected an RGB-D dataset using a stereo vision camera. The dataset contains sugar beet crops in multiple growth stages with varying weed densities. This dataset was made publicly available and was used to evaluate two novel plant detection models, the D-model, using the depth data as the input, and the CD-model, using both the color and depth data as inputs. For ease of use with existing 2D deep learning architectures, the depth data were transformed into a 2D image using color encoding. As a reference model, the C-model, which uses only color data as the input, was included. The limited availability of suitable training data for depth images demands the use of data augmentation and transfer learning. Using our three detection models, we studied the effectiveness of data augmentation and transfer learning for depth data transformed to 2D images. It was found that geometric data augmentation and transfer learning were equally effective for both the reference model and the novel models using the depth data. This demonstrates that combining color-encoded depth data with geometric data augmentation and transfer learning can improve the RGB-D detection model. However, when testing our detection models on the use case of volunteer potato detection in sugar beet farming, it was found that the addition of depth data did not improve plant detection at high vegetation densities. Full article
(This article belongs to the Special Issue Intelligent Sensing and Machine Vision in Precision Agriculture)
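The color encoding used to feed depth into 2D architectures can be as simple as normalizing depth and applying a colormap. A minimal sketch follows; the JET colormap and min-max normalization are assumed choices, not necessarily those of the paper.

```python
# Encode a single-channel depth image as a 3-channel color image so that
# standard 2D detection networks can consume it.
import cv2
import numpy as np

def encode_depth(depth, d_min=None, d_max=None):
    d_min = np.nanmin(depth) if d_min is None else d_min
    d_max = np.nanmax(depth) if d_max is None else d_max
    norm = np.clip((depth - d_min) / (d_max - d_min + 1e-9), 0, 1)
    return cv2.applyColorMap((norm * 255).astype(np.uint8), cv2.COLORMAP_JET)
```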

28 pages, 14831 KiB  
Article
Multipath-Closure Calibration of Stereo Camera and 3D LiDAR Combined with Multiple Constraints
by Jianqiao Duan, Yuchun Huang, Yuyan Wang, Xi Ye and He Yang
Remote Sens. 2024, 16(2), 258; https://doi.org/10.3390/rs16020258 - 9 Jan 2024
Cited by 5 | Viewed by 2175
Abstract
Stereo cameras can capture the rich image textures of a scene, while LiDAR can obtain accurate 3D coordinates of point clouds of a scene. They complement each other and can achieve comprehensive and accurate environment perception through data fusion. The primary step in data fusion is to establish the relative positional relationship between the stereo cameras and the 3D LiDAR, known as extrinsic calibration. Existing methods establish the camera–LiDAR relationship by constraints of the correspondence between different planes in the images and point clouds. However, these methods depend on the planes and ignore the multipath-closure constraint among the camera–LiDAR–camera sensors, resulting in poor robustness and accuracy of the extrinsic calibration. This paper proposes a trihedron as the calibration object to effectively establish various coplanar and collinear constraints between stereo cameras and 3D LiDAR. With the various constraints, the multipath-closure constraint between the three sensors is further formulated for the extrinsic calibration. Firstly, the coplanar and collinear constraints between the camera–LiDAR–camera are built using the trihedron calibration object. Then, robust and accurate coplanar constraint information is extracted through iterative maximum a posteriori (MAP) estimation. Finally, a multipath-closure extrinsic calibration method for multi-sensor systems is developed with structurally mutual validation between the cameras and the LiDAR. Extensive experiments are conducted on simulation data with different noise levels and a large amount of real data to validate the accuracy and robustness of the proposed calibration algorithm. Full article
(This article belongs to the Section Engineering Remote Sensing)
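The multipath-closure constraint can be illustrated compactly: composing the estimated extrinsics around the camera–LiDAR–camera loop should yield the identity, and the residual measures calibration consistency. The sketch below checks that closure; it is a simplified diagnostic, not the paper's MAP-based calibration.

```python
# Loop-closure consistency check for camera1 -> LiDAR -> camera2 -> camera1.
import numpy as np

def loop_residual(T_l_c1, T_c2_l, T_c1_c2):
    """Each argument is a 4x4 rigid transform T_a_b mapping frame-b points
    into frame a. A consistent calibration makes the loop the identity."""
    loop = T_c1_c2 @ T_c2_l @ T_l_c1
    R, t = loop[:3, :3], loop[:3, 3]
    # Rotation residual via the trace formula; translation residual as a norm.
    angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1) / 2, -1, 1)))
    return angle, np.linalg.norm(t)  # (degrees, same units as t)
```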

21 pages, 52303 KiB  
Article
Imitation Learning of Complex Behaviors for Multiple Drones with Limited Vision
by Yu Wan, Jun Tang and Zipeng Zhao
Drones 2023, 7(12), 704; https://doi.org/10.3390/drones7120704 - 13 Dec 2023
Cited by 3 | Viewed by 3493
Abstract
Navigating multiple drones autonomously in complex and unpredictable environments, such as forests, poses a significant challenge typically addressed by wireless communication for coordination. However, this approach falls short in situations with limited central control or blocked communications. Addressing this gap, our paper explores the learning of complex behaviors by multiple drones with limited vision. Drones in a swarm rely on onboard sensors, primarily forward-facing stereo cameras, for environmental perception and neighbor detection. They learn complex maneuvers through the imitation of a privileged expert system, which involves finding the optimal set of neural network parameters to enable the most effective mapping from sensory perception to control commands. The training process adopts the Dagger algorithm, employing the framework of centralized training with decentralized execution. Using this technique, drones rapidly learn complex behaviors, such as avoiding obstacles, coordinating movements, and navigating to specified targets, all in the absence of wireless communication. This paper details the construction of a distributed multi-UAV cooperative motion model under limited vision, emphasizing the autonomy of each drone in achieving coordinated flight and obstacle avoidance. Our methodological approach and experimental results validate the effectiveness of the proposed vision-based end-to-end controller, paving the way for more sophisticated applications of multi-UAV systems in intricate, real-world scenarios. Full article
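The training scheme named in the abstract, DAgger-style imitation of a privileged expert, follows a simple aggregate-and-retrain loop. The schematic below assumes placeholder callables (train, expert, rollout) standing in for the paper's network fitting, expert system, and simulator rollouts.

```python
# Schematic DAgger loop: roll out the current student policy, have the
# privileged expert relabel the visited observations, aggregate, retrain.
def dagger(train, expert, rollout, rounds=10):
    """train(dataset) -> policy; expert(obs) -> action; rollout(policy) ->
    iterable of observations. All three callables are assumptions."""
    dataset = []
    policy = train(dataset)          # initial policy (e.g., on expert demos)
    for _ in range(rounds):
        for obs in rollout(policy):  # states actually visited by the student
            dataset.append((obs, expert(obs)))  # expert action labels
        policy = train(dataset)      # retrain on the aggregated dataset
    return policy
```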

17 pages, 8119 KiB  
Article
Unsupervised Deep Learning for Advanced Forming Limit Analysis in Sheet Metal: A Tensile Test-Based Approach
by Aleksandra Thamm, Florian Thamm, Annette Sawodny, Sally Zeitler, Marion Merklein and Andreas Maier
Materials 2023, 16(21), 7001; https://doi.org/10.3390/ma16217001 - 1 Nov 2023
Cited by 2 | Viewed by 2073
Abstract
An accurate description of the formability and failure behavior of sheet metal materials is essential for an optimal forming process design. In this respect, the forming limit curve (FLC) based on the Nakajima test, which is determined in accordance with DIN EN ISO 12004-2, is a widespread procedure for evaluating the formability of sheet metal materials. The FLC is, however, affected by influences originating from intrinsic factors of the Nakajima test setup, such as friction, which leads to deviations from the linear strain path, biaxial prestress, and bending superposition. These disadvantages can be circumvented by an alternative combination of the uniaxial tensile test and the hydraulic bulge test. In addition, the forming limit capacity of many lightweight materials is underestimated using the cross-section method according to DIN EN ISO 12004-2, due to the material-dependent occurrence of multiple strain maxima during forming or sudden cracking without prior necking. In this regard, machine learning approaches have a high potential for a more accurate determination of the forming limit curve due to the inclusion of other parameters influencing formability. This work presents a machine learning approach focused on uniaxial tensile tests to define the forming limit of lightweight materials and high-strength steels. The transferability of an existing weakly supervised convolutional neural network (CNN) approach, originally designed for Nakajima tests, to uniaxial tensile tests was examined. Additionally, a stereo camera-based method for this purpose was developed. In our evaluation, we train and test on materials including AA6016, DX54D, and DP800 through iterative data composition, using cross-validation. With our stereo camera-based approach, strains for different materials and thicknesses were predicted. In these cases, our method successfully predicted the major strains in close agreement with ISO standards. For DX54D, with a thickness of 0.8 mm, the prediction was 0.659 (compared to ISO’s 0.664). Similarly, for DX54D at 2.0 mm thickness, the predicted major strain was 0.780 (compared to ISO’s 0.705), and for AA6016, at 1.0 mm thickness, a major strain of 0.314 (in line with ISO’s 0.309) was estimated. However, for DP800 with a thickness of 1.0 mm, the prediction yielded a major strain of 0.478 (as compared to ISO’s 0.289), indicating a divergence from the ISO standard in this particular case. Overall, these results, generated with the stereo camera-based CNN approach, underline its quantitative alignment with the cross-section method. Full article

18 pages, 4544 KiB  
Article
FishSeg: 3D Fish Tracking Using Mask R-CNN in Large Ethohydraulic Flumes
by Fan Yang, Anita Moldenhauer-Roth, Robert M. Boes, Yuhong Zeng and Ismail Albayrak
Water 2023, 15(17), 3107; https://doi.org/10.3390/w15173107 - 30 Aug 2023
Cited by 3 | Viewed by 2424
Abstract
To study the fish behavioral response to up- and downstream fish passage structures, live-fish tests are conducted in large flumes in various laboratories around the world. The main challenges for video-based fish tracking are the use of multiple fisheye cameras to cover the full width and length of a flume, low color contrast between fish and the flume bottom, non-uniform illumination leading to fish shadows, air bubbles wrongly identified as fish, and fish being partially hidden behind each other. This study improves an existing open-source fish tracking code to better address these issues by using a modified Mask Regional-Convolutional Neural Network (Mask R-CNN) as the tracking method. The developed workflow, FishSeg, consists of four parts: (1) stereo camera calibration, (2) background subtraction, (3) multi-fish tracking using Mask R-CNN, and (4) 3D conversion to flume coordinates. The Mask R-CNN model was trained and validated with datasets manually annotated from background-subtracted videos from the live-fish tests. Brown trout and European eel were selected as target fish species to evaluate the performance of FishSeg with different types of body shapes and sizes. Comparison with the previous method illustrates that the tracks generated by FishSeg are about three times more continuous, with higher accuracy. Furthermore, the code runs more stably, since fish shadows and air bubbles are not misidentified as fish. The trout and eel models produced from FishSeg have mean Average Precisions (mAPs) of 0.837 and 0.876, respectively. Comparisons of mAPs with other R-CNN-based models show the reliability of FishSeg with a small training dataset. FishSeg is a ready-to-use open-source code for tracking any fish species with body shapes similar to trout and eel, and further fish shapes can be added with moderate effort. The generated fish tracks allow researchers to analyze fish behavior in detail, even in large experimental facilities. Full article
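Step (2) of the FishSeg workflow, background subtraction, can be sketched with OpenCV's MOG2 model as below; the parameter values and shadow thresholding are illustrative assumptions rather than the project's actual settings.

```python
# Background subtraction over a flume video: yield foreground-only frames.
import cv2

def foreground_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    mog = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = mog.apply(frame)
        # Drop shadow pixels (marked 127 by MOG2), keep confident foreground.
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
        yield cv2.bitwise_and(frame, frame, mask=mask)
    cap.release()
```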

28 pages, 24071 KiB  
Article
Three-Dimensional Reconstruction and Geometric Morphology Analysis of Lunar Small Craters within the Patrol Range of the Yutu-2 Rover
by Xinchao Xu, Xiaotian Fu, Hanguang Zhao, Mingyue Liu, Aigong Xu and Youqing Ma
Remote Sens. 2023, 15(17), 4251; https://doi.org/10.3390/rs15174251 - 30 Aug 2023
Cited by 50 | Viewed by 2280
Abstract
Craters on the lunar surface provide the most direct record of geological processes and are of great significance to the study of lunar evolution. In order to fill the research gap on small craters (diameter less than 3 m), we focus on the small craters around the moving path of the Yutu-2 lunar rover and carry out a 3D reconstruction and geometric morphology analysis of them. First, a self-calibration model with multiple feature constraints is used to calibrate the navigation camera and obtain the internal and external parameters. Then, the sequence images with overlapping regions from neighboring stations are used to obtain the precise position of the rover through the bundle adjustment (BA) method. After that, a cross-scale cost-aggregation stereo matching network is proposed to obtain a disparity map, from which 3D point clouds of the lunar surface are obtained. Finally, the indexes of the craters are extracted (diameter D, depth d, and depth–diameter ratio dr), and the different indicators are fitted and analyzed. The results suggest that CscaNet achieves an anomaly percentage of 1.73% on the KITTI2015 dataset and an EPE of 0.74 px on the SceneFlow dataset, both superior to GC-Net, DispNet, and PSMNet, with higher reconstruction accuracy. D and d are strongly positively correlated, while the correlation between D and dr is low. The geometric morphology expressions of small craters fitted using D and d differ significantly from the expressions proposed by other scholars for large craters. This study provides a priori knowledge for the subsequent Von Kármán crater survey mission in the SPA Basin. Full article
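Fitting the extracted indexes (D, d) is a standard regression; a power law d = a·D^b, fitted in log space, is one common choice, sketched below as an assumption since the abstract does not state the functional form used.

```python
# Illustrative depth-diameter fit for small craters: least-squares power law
# d = a * D**b in log space. D and d are assumed arrays of crater diameters
# and depths in metres; the paper's actual fitted form may differ.
import numpy as np

def fit_depth_diameter(D, d):
    b, log_a = np.polyfit(np.log(D), np.log(d), 1)  # slope, intercept
    a = np.exp(log_a)
    return a, b  # d ~ a * D**b, so dr = d / D ~ a * D**(b - 1)
```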
