Figure 1.
Examples of two rendering methods. (a,b) Data simulation models, which are limited by data collection accuracy and may not represent the real situation. (c) Our proposed 3D realistic visualization, which displays real-time personnel location relationships without relying on high-precision data acquisition.
Figure 2.
Applications of real-time 3D visualization. (a) Video Flashlights for immersive model visualization. (b) 3D Immersive View in Google Street View, created from texture registration of real-world imagery.
Figure 3.
Failure cases of traditional layout estimation in terminal scenes. (a,b) Non-box structures. (c,d) Reflective and texture-less areas.
Figure 4.
Failure of deep learning-based layout estimation. (a) Erroneous depth map in a reflective scene; different colors indicate relative depth. (b) Incorrectly fitted planar graph resulting from the flawed depth map; different colors represent distinct planar instances recovered from the noisy data.
Figure 5.
The pipeline of our proposed real-time registration method.
Figure 6.
The relationship between the four coordinate systems: World, Camera, Image, and Pixel.
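
As an illustration of how these four coordinate systems are usually chained, here is a minimal sketch of the standard distortion-free pinhole model. It is not taken from the paper: the function name and the parameters `R`, `t`, `f`, `dx`, `dy`, `u0`, and `v0` are assumed names for the rotation, translation, focal length, pixel pitch, and principal point.

```python
def world_to_pixel(p_world, R, t, f, dx, dy, u0, v0):
    """Map a 3D world point to pixel coordinates (assumed pinhole model).

    R: 3x3 rotation (nested lists), t: translation (length 3),
    f: focal length, dx/dy: physical pixel size, (u0, v0): principal point.
    All names are illustrative assumptions, not the paper's notation.
    """
    # World -> Camera: rigid transform p_c = R * p_w + t
    xc, yc, zc = (
        sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")

    # Camera -> Image: perspective division onto the image plane
    x = f * xc / zc
    y = f * yc / zc

    # Image -> Pixel: scale by pixel size and shift to the principal point
    u = x / dx + u0
    v = y / dy + v0
    return u, v


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u, v = world_to_pixel((1.0, 0.5, 2.0), identity, (0.0, 0.0, 0.0),
                      f=2.0, dx=0.5, dy=0.5, u0=100.0, v0=50.0)
print(u, v)
```

With an identity pose, a point on the optical axis lands exactly on the principal point, which is a quick sanity check for any calibration that follows this chain.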
Figure 7.
(a) Top view showing how the FOV of a single camera differs between two situations (after rotation). (b) 3D view showing the relationship between the 3D structure and the camera’s FOV. We compute the intersection of the 3D structure’s edges with the camera’s FOV. The labeled line segment represents an edge of the 3D structure (within the camera’s FOV), and the two labeled points are its start point and end point.
Figure 8.
Illustration of re-projection inconsistency. (a) Input image with two texture regions. (b) The re-projection template mesh. (c) The final re-projected mesh in 3D space, showing misalignment at the edge.
Figure 9.
The pixel scale-invariant regions defined along a line segment for descriptor computation. Arrows represent local gradient orientations.
Figure 10.
Sample data from different perspectives. (a) Normal enclosed spaces. (b) High-dynamic-range environments. (c) Scenes with slight reflections.
Figure 11.
Two surveillance cameras within the terminal environment.
Figure 12.
The positioning of camera i within the 3D model.
Figure 13.
The resulting 3D layout for camera i.
Figure 14.
An overhead view of the local 3D layout.
Figure 15.
Registration result using only the initial layout estimation (the baseline GP component), showing inconsistencies at texture boundaries.
Figure 16.
Visualization of the gradient descent process for refining the 2D layout estimation. The (u,v) plane represents the image coordinates, and the z-axis represents the evaluation score, which the process seeks to minimize.
Figure 17.
Empirical analysis of kernel size impact on (a) visual registration quality and (b) computational latency.
Figure 18.
Test results of our method: (a) experimental scenarios, (b) layout estimation results, (c) registration results.
Figure 19.
Real-time 3D visualization effects of a central international airport terminal hall involving six synchronized cameras.
Figure 20.
Comparison of layout estimation results for four different scenes. From left to right: input image, traditional method, deep learning method, our method. (a–d) Layout estimation results for scenes 1–4, respectively.
Figure 21.
Comparison of 3D visualization effects for a normal scene. The inaccurate layouts from traditional and deep learning methods lead to distorted registrations.
Figure 22.
Comparison of 3D visualization effects for a challenging scene with reflections. The deep learning method produces severe deformations, while our method remains robust.
Table 1.
Scores of the layout hypotheses.
| s/(u,v) | 1250 | 1200 | 1170 | 1150 | 1130 | 1110 | … |
|---|---|---|---|---|---|---|---|
| 390 | 0.4595 | 0.4424 | 0.4456 | 0.4547 | 0.4977 | 0.4781 | … |
| 380 | 0.4848 | 0.3688 | 0.3187 | 0.3031 | 0.2091 | 0.1960 | … |
| 375 | 0.5922 | 0.4417 | 0.3382 | 0.2313 | 0.1668 | 0.1322 | … |
| 370 | 0.5387 | 0.5388 | 0.4039 | 0.2970 | 0.2293 | 0.2032 | … |
| 365 | 0.5726 | 0.5237 | 0.3660 | 0.3497 | 0.3627 | 0.3510 | … |
| 360 | 0.6371 | 0.5315 | 0.3822 | 0.4356 | 0.3549 | 0.3855 | … |
| 350 | 0.5888 | 0.4070 | 0.4337 | 0.4422 | 0.5015 | 0.4996 | … |
| … | … | … | … | … | … | … | … |
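
To make the search over these scores concrete, here is a minimal sketch that walks the visible excerpt of Table 1 toward a lower score. It uses a discrete greedy neighbor descent as a stand-in, not the authors' gradient descent, and it covers only the printed cells. The names `ROW_LABELS`, `COL_LABELS`, `SCORES`, and `greedy_descent` are hypothetical.

```python
# Visible excerpt of Table 1; elided ("…") cells are deliberately left out.
ROW_LABELS = [390, 380, 375, 370, 365, 360, 350]
COL_LABELS = [1250, 1200, 1170, 1150, 1130, 1110]
SCORES = [
    [0.4595, 0.4424, 0.4456, 0.4547, 0.4977, 0.4781],
    [0.4848, 0.3688, 0.3187, 0.3031, 0.2091, 0.1960],
    [0.5922, 0.4417, 0.3382, 0.2313, 0.1668, 0.1322],
    [0.5387, 0.5388, 0.4039, 0.2970, 0.2293, 0.2032],
    [0.5726, 0.5237, 0.3660, 0.3497, 0.3627, 0.3510],
    [0.6371, 0.5315, 0.3822, 0.4356, 0.3549, 0.3855],
    [0.5888, 0.4070, 0.4337, 0.4422, 0.5015, 0.4996],
]


def greedy_descent(scores, start=(0, 0)):
    """Move to the lowest-scoring 4-neighbor until no neighbor is lower."""
    r, c = start
    while True:
        best_r, best_c = r, c
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < len(scores) and 0 <= nc < len(scores[0])
            if in_bounds and scores[nr][nc] < scores[best_r][best_c]:
                best_r, best_c = nr, nc
        if (best_r, best_c) == (r, c):
            # Local minimum on the grid: every neighbor scores at least as high.
            return r, c, scores[r][c]
        r, c = best_r, best_c


r, c, s = greedy_descent(SCORES)
print(ROW_LABELS[r], COL_LABELS[c], s)
```

Starting from the top-left cell of the excerpt, the walk stops at the entry 0.1322 (row 375, column 1110), which is also the smallest visible score. A greedy walk like this can stall in a local minimum, so the result depends on the starting hypothesis.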
Table 2.
Computational timings of framework components.
| Processing Stage | Subprocedure | Time (ms) |
|---|---|---|
| Real-time Update and Processing | Overall processing and rendering | <50 |
| | Data update and GPU uploading | <20 |
| | Texture mapping and transformation | <20 |
| Layout Inference | Initial estimation | <10 |
| | Optimal solution | 20–30 |
Table 3.
Comparison of computational performance and registration accuracy using different floating-point precisions.
| Precision | RMSE (px) | MAE (px) | Average Time (ms) | Peak Memory (GB) |
|---|---|---|---|---|
| 32-bit (Single) | 10.3 ± 2.2 | 9.8 ± 1.8 | 21.5 | 1.2 |
| 64-bit (Double) | 10.1 ± 2.1 | 9.6 ± 1.7 | 34.8 | 2.1 |
Table 4.
Performance under calibrated parameter perturbations.
| Perturbation Level | RMSE (px) | Success Rate | Degradation |
|---|---|---|---|
| No perturbation | 10.3 | 98% | Baseline |
| +3% position, +1.5° orientation | 11.9 | 94% | Minor |
| +5% position, +2° orientation | 13.5 | 87% | Moderate |
| +8% position, +3° orientation | 16.8 | 72% | Significant |
Table 5.
Quantitative comparison on regular layout scenes from the Airport Terminal Collection dataset, evaluated by reprojection error.
| Method | RMSE (px) ↓ | MAE (px) ↓ | Average Time (ms) ↓ | FPS ↑ |
|---|---|---|---|---|
| ORB-SLAM2 a | / | / | / | / |
| LSD-SLAM a | / | / | / | / |
| LSD | 476.50 ± 3.5 | 403.22 ± 3.2 | ∼ 20 | 50 |
| PlaneRCNN | 18.7 ± 3.8 | 14.5 ± 2.8 | ∼ 300 | 3.3 |
| RoomNet | 31.5 ± 4.2 | 27.2 ± 3.5 | ∼ 100 | 10.5 |
| RTAB-Map (RGB-D) | 16.5 ± 3.1 | 12.8 ± 2.4 | ∼ 38.7 | 25.8 |
| DCL | 14.2 ± 2.8 | 11.3 ± 2.1 | | 22.1 |
| SIR | 19.5 ± 3.5 | 15.7 ± 2.9 | | 3.6 |
| Ours | 10.3 ± 2.2 | 9.8 ± 1.8 | ∼ 21.5 | 46.5 |
Table 6.
Performance of the Layout-Based Analysis and Rendering Stage.
| #Streams | Avg. Processing Time/Frame (ms) | Throughput (FPS) |
|---|---|---|
| 1 | ∼3 | ∼330 |
| 4 | ∼8 | ∼131.6 |
| 8 | ∼17 | ∼58.6 |
| 16 | ∼40 | ∼25.4 |
Table 7.
Ablation study of framework components. GP: geometric projection, CF: coordinate framework, MF: multi-stage refinement. Score denotes the geometric consistency loss, with lower values being better.
| Configuration | Score ↓ | Time (ms) ↓ | Improvement |
|---|---|---|---|
| GP | 0.451 | 1.4 | Baseline |
| + CF | 0.362 | 4.2 | +19.7% |
| + CF + MF (Ours) | 0.184 | 21.5 | +59.2% |
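
The Improvement column is consistent with the relative reduction of the score against the GP baseline. The short check below reproduces both percentages from the tabulated scores; the function name `relative_improvement` is a hypothetical helper, not something from the paper.

```python
def relative_improvement(baseline, score):
    """Percentage reduction of a lower-is-better score relative to a baseline."""
    return 100.0 * (baseline - score) / baseline


GP_SCORE = 0.451  # baseline score from Table 7
print(round(relative_improvement(GP_SCORE, 0.362), 1))  # + CF
print(round(relative_improvement(GP_SCORE, 0.184), 1))  # + CF + MF (Ours)
```

Both values are measured against the GP baseline rather than against the previous row, which is why the last row reports the cumulative gain.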