Article

Comparative Analysis of Correction Methods for Multi-Camera 3D Image Processing System and Its Application Design in Safety Improvement on Hot-Working Production Line

by
Joanna Gąbka
Department of Advanced Manufacturing Technologies, Faculty of Mechanical Engineering, Wroclaw University of Science and Technology, 27 Wybrzeże Stanisława Wyspiańskiego, 50-370 Wrocław, Poland
Appl. Sci. 2025, 15(16), 9136; https://doi.org/10.3390/app15169136
Submission received: 14 June 2025 / Revised: 17 July 2025 / Accepted: 17 July 2025 / Published: 19 August 2025

Featured Application

The article presents an efficient method of multi-camera image processing for real-time Virtual Reality display, intended for both viewers and professional training applications.

Abstract

The paper presents the results of research focused on configuring a system for stereoscopic view capturing and processing. The system is being developed for use in staff training scenarios based on Virtual Reality (VR), where high-quality, distortion-free imagery is essential. This research addresses key challenges in image distortion, including the fish-eye effect and other aberrations. In addition, it considers the computational and bandwidth efficiency required for effective and economical streaming and real-time display of recorded content. Measurements and calculations were performed using a selected set of cameras, adapters, and lenses, chosen based on predefined criteria. A comparative analysis was conducted between the nearest-neighbour linear interpolation method and a third-order polynomial interpolation (ABCD polynomial). These methods were tested and evaluated using three different computational approaches, each aimed at optimizing data processing efficiency critical for real-time image correction. Images captured during real-time video transmission—processed using the developed correction techniques—are presented. In the final sections, the paper describes the configuration of an innovative VR-based training system incorporating an edge computing device. A case study involving a factory producing wheel rims is also presented to demonstrate the practical application of the system.

1. Introduction

Recent trends in vision systems are closely connected to the development of Virtual Reality (VR) solutions [1,2]. These systems require the efficient implementation of multi-camera setups to generate high-quality, immersive 3D experiences for end users. The number of VR-based products is steadily increasing [3,4]. A key advantage of such content is the enhanced sense of spatial depth, which significantly increases viewer engagement.
In professional training applications, this immersive quality offers a major benefit, as it effectively places the viewer within a simulated working environment, providing a realistic sense of distance and movement. In industrial contexts, there is growing interest in VR systems supporting applications such as maintenance management training, production process analysis, and operational training. The realism achieved in VR-based professional training heavily depends on the quality of the 3D video presented. In many cases, the precision required for such applications drives ongoing efforts to improve image quality.
Various categories of image distortion have been thoroughly described and analyzed in [5]. One well-known issue that significantly affects stereoscopic video processing is the fish-eye effect, which causes visible distortions due to the characteristics of wide-angle lenses. This problem is especially relevant for wide field-of-view (FoV) cameras—commonly referred to as fish-eye cameras—particularly those with an FoV greater than 110 degrees (~120–180°), which are often used in stereoscopic VR video production [6]. The most prominent form of distortion encountered with such devices is radial distortion, the effect of which is illustrated in Figure 1.
There is no single universal method recommended for correcting the displacement of image points that occurs non-linearly relative to their ideal positions, defined with respect to the center of distortion (COD). While numerous sophisticated mathematical models have been developed to address this issue and are used in practice, their implementation remains challenging for VR due to the complexity of these models and the associated bandwidth consumption. This becomes particularly problematic during real-time correction of the large volumes of data typically generated during 3D image recording and streaming [7,8,9,10]. A schematic representation of the ideal model used to measure deviation from the norm is presented in Figure 2, while an example of radial distortion measurement is shown in Figure 3.
Despite extensive research on lens distortion models—including polynomial models, division models, and newer machine learning-based correction techniques—there remains a significant gap in methods that balance real-time performance, accuracy, and low computational cost for multi-camera VR systems. Many proposed solutions either prioritize accuracy at the expense of processing time [13,14] or simplify models to achieve speed but at the cost of visual artifacts, such as residual warping or stitching errors [15].
Furthermore, while deep learning approaches for distortion correction have shown promise (e.g., convolutional neural networks trained to undistort wide-angle images), these methods often require large, annotated datasets and are rarely optimized for the unique geometry and calibration demands of multi-camera VR systems [16]. Another critical gap is the lack of standardized evaluation metrics that consider not only geometric accuracy but also perceptual quality in immersive VR contexts [17,18].
Additionally, most distortion correction methods do not fully account for dynamic factors in industrial VR applications, such as vibration, camera drift, or thermal effects on lens calibration. These issues lead to cumulative errors over time, which are particularly detrimental in high-precision VR training systems. There is thus a pressing need for adaptive, lightweight correction algorithms that can operate robustly in dynamic environments while meeting the low-latency requirements of VR streaming.
Given the wide range of industrial applications for fish-eye camera-based systems, there is an urgent need to address the associated image distortion issues. This paper presents a comparative analysis of two distortion correction methods applied to selected camera sets. The research forms part of a broader effort to develop an efficient multi-camera system for stereoscopic video production and streaming, with particular emphasis on image quality and computational efficiency. High-resolution, detailed, and sharp imagery is not only desirable for aesthetic reasons—such as viewing comfort and immersion—but is also critical for professional applications, including object detection, recognition, and classification. This need becomes especially apparent when data is collected from multiple camera devices. Distortion correction methods typically demand considerable CPU resources and bandwidth. This paper begins with a review of available image distortion correction techniques, followed by the presentation of experimental results comparing two selected approaches. The evaluation focuses on their effectiveness in configuring multi-camera systems for stereoscopic video production in VR-based staff training, with special attention to data processing efficiency. Exemplary recorded images and the corresponding processing stages are discussed. The final section of the paper describes the practical implementation of the proposed VR system in a case study involving a hot-working production line.

2. State of the Art: Methods of Geometric Distortion Compensation and VR Content Streaming

The work by Hughes et al. [19] classifies radial distortion correction methods into two main groups: (1) polynomial models, including the odd polynomial model, the division model, and the polynomial fish-eye transform, and (2) non-polynomial models, such as the fish-eye transform, the field-of-view model, and the perspective model. To cover the whole spectrum of solutions presented in the literature, a third group of interpolation methods, studied widely in medical image applications, must be added; it can be divided into nearest-neighbour interpolation, bilinear interpolation, bicubic interpolation, Lanczos interpolation, and edge-preserving interpolation. The first group of methods proposed for radial distortion correction are the polynomial models. Their main drawback is the lack of an analytical inversion, which would be needed to display the processed view. The most standard approach from this group is presented in [20,21,22,23]. The odd polynomial model has the following expression:
r_d = r_u + \sum_{n=1}^{\infty} k_n r_u^{2n+1} = r_u + k_1 r_u^3 + \dots + k_n r_u^{2n+1} + \dots    (1)
where r_d is the distorted radius, r_u is the undistorted radius, and the k_n terms are the polynomial coefficients. Its approximate inversion to the fifth order was presented in [22] as follows:
r_u = r_d - \frac{r_d \left( k_1 r_d^2 + k_2 r_d^4 + k_1^2 r_d^4 + k_2^2 r_d^8 + 2 k_1 k_2 r_d^6 \right)}{1 + 4 k_1 r_d^2 + 6 k_2 r_d^4}    (2)
The second well-known approach is the division model [24,25]:
r_u = \frac{r_d}{1 + \sum_{n=1}^{\infty} k_n r_d^{2n}} = \frac{r_d}{1 + k_1 r_d^2 + \dots + k_n r_d^{2n} + \dots}    (3)
The division model is inherently inverted, which means that the undistorted radial distance of the point is derived as a function of the distorted radial distance of the point. Both calculation methods presented above are approximations of the real camera’s distortion curve rather than approximations of the benchmark, which is the standard polynomial model.
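To make the contrast between the two families concrete, the following minimal C++ sketch (with purely illustrative coefficient values, not calibration data from this study) evaluates the forward odd polynomial model (1), which maps an undistorted radius to a distorted one, and the inherently inverted division model (3), which maps back directly:

#include <cmath>
#include <cstdio>

// Odd polynomial model (1), truncated to two coefficients:
// r_d = r_u + k1*r_u^3 + k2*r_u^5
double distortOddPolynomial(double ru, double k1, double k2) {
    return ru + k1 * std::pow(ru, 3) + k2 * std::pow(ru, 5);
}

// Division model (3), truncated to one coefficient:
// r_u = r_d / (1 + k1*r_d^2); it yields the undistorted radius directly.
double undistortDivisionModel(double rd, double k1) {
    return rd / (1.0 + k1 * rd * rd);
}

int main() {
    // Illustrative coefficients only; real values come from lens calibration.
    const double k1 = -0.12, k2 = 0.015;
    for (double ru = 0.2; ru <= 1.0; ru += 0.2) {
        std::printf("r_u = %.2f -> r_d = %.4f\n", ru, distortOddPolynomial(ru, k1, k2));
    }
    std::printf("division model: r_d = 0.80 -> r_u = %.4f\n", undistortDivisionModel(0.80, k1));
    return 0;
}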
The last calculation procedure from the polynomial group is the polynomial fish-eye transform (PFET). It is suggested in [25] that polynomial models are insufficient for fish-eye lenses, even at high orders; it is better to include both odd and even coefficients in the same formula instead of choosing one or the other. This idea is reflected in [25,26]:
r_d = \sum_{n=1}^{\infty} k_n r_u^n = k_1 r_u + \dots + k_n r_u^n + \dots    (4)
A similar formula with an additional 0th-order term was introduced in [27]:
r_d = k_0 + k_1 r_u + \dots + k_n r_u^n + \dots    (5)
The other group of solutions to the distortion issue is the non-polynomial models, in which an analytical approach is implemented instead of an approximation of the distortion curve. The first representative of this approach is the fish-eye transform (FET), which takes into account the fact that the distorted image has lower resolution in peripheral areas and higher resolution in foveal areas [27]:
r_d = s \ln(1 + \lambda r_u)    (6)
where s is a scalar and λ expresses the amount of distortion across the image. The inverse form of this model is as follows:
r_u = \frac{e^{r_d / s} - 1}{\lambda}    (7)
The second example from the non-polynomial models is the field-of-view model (FOV), based on a simplified optical model of the fish-eye lens [28]:
r_d = \frac{1}{\omega} \arctan\left( 2 r_u \tan\frac{\omega}{2} \right)    (8)
where ω is the angular field of view of the ideal fish-eye camera. It is important to note that this value may not reflect the actual field of view, because the considered device may not precisely fit the proposed model. The disadvantage is that the model is not applicable to complex distortion causes. In such cases, a hybrid two-step calculation is recommended: Formula (1) with k_1 = 0, followed by Equation (8) from the FOV model.
The inverse of the FOV model is:
r_u = \frac{\tan(r_d \omega)}{2 \tan\left( \frac{\omega}{2} \right)}    (9)
The third method is the perspective model described in [29]:
r_d = f \arctan\left( \frac{r_u}{f} \right)    (10)
where f is the apparent focal length. The inverse version of the formula is as follows:
r_u = f \tan\left( \frac{r_d}{f} \right)    (11)
It is important to underline that the apparent focal length does not have to be the actual value since fish-eye optics often include several different lenses that all influence the value to different degrees.
In the third group of interpolation models, the first approach (nearest-neighbour interpolation) assumes that the value of a given pixel is assigned based on the colour of the nearest neighbouring pixel (in the sense of Euclidean distance). The flaw of this method is that it introduces aliasing artefacts due to the lack of smoothing, which causes problems during the extraction of features for classification purposes. The second approach (bilinear interpolation) is more appealing visually and is implemented by taking the weighted average of pixels in a 2 × 2 neighbourhood; however, it is still imperfect. Bicubic interpolation is similar to the previous method but more precise, taking a 4 × 4 pixel neighbourhood for the weighted average computation, which results in fewer artefacts than bilinear interpolation. The fourth method from this group is the so-called Lanczos filter [11], which applies an 8 × 8 neighbourhood for quite effective reconstruction, reducing artefacts, preserving frequencies, and avoiding aliasing.
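As a practical illustration of the difference between the first two schemes, the short C++ sketch below (a hypothetical grayscale Image structure with row-major storage, not code from the described system) samples an image at a non-integer source coordinate, which is exactly what a distortion-correction remap must do for every output pixel:

#include <algorithm>
#include <cmath>
#include <vector>

// Minimal grayscale image: row-major pixel values in [0, 255].
struct Image {
    int width = 0, height = 0;
    std::vector<unsigned char> pixels;
    unsigned char at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return pixels[static_cast<size_t>(y) * width + x];
    }
};

// Nearest-neighbour: take the value of the closest pixel (fast, aliasing-prone).
unsigned char sampleNearest(const Image& img, double x, double y) {
    return img.at(static_cast<int>(std::lround(x)), static_cast<int>(std::lround(y)));
}

// Bilinear: weighted average of the 2x2 neighbourhood (smoother, slightly slower).
unsigned char sampleBilinear(const Image& img, double x, double y) {
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const double fx = x - x0, fy = y - y0;
    const double top = img.at(x0, y0) * (1.0 - fx) + img.at(x0 + 1, y0) * fx;
    const double bot = img.at(x0, y0 + 1) * (1.0 - fx) + img.at(x0 + 1, y0 + 1) * fx;
    return static_cast<unsigned char>(std::lround(top * (1.0 - fy) + bot * fy));
}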
Another extensively researched and developed group of methods in recent years is neural network-based approaches to image distortion correction. Among these, convolutional neural networks (CNNs) have emerged as the dominant algorithms for image classification and processing. CNN-based methods can generally be categorized into three main types [30]: (i) pre-trained CNNs [31], (ii) fine-tuned pre-trained CNNs adapted to specific datasets [32,33], and (iii) CNNs trained from scratch with randomly initialized weights [34]. A more detailed taxonomy of CNN architectures includes the following groups:
  • Classical CNN models, such as LeNet, AlexNet, VGGNet, and Network in Network (NIN) [35,36,37,38].
  • GoogLeNet/Inception models, from Inception V1 to V4 [39].
  • Residual learning networks, including ResNet, improved versions of ResNet, ResNet combined with Inception, and DenseNet [40,41,42,43,44].
  • Attention-enhanced CNNs, such as Residual Attention Neural Network, SENet, BAM, CBAM, GENet, SKNet, GSoP-Net, ECA-Net, and Coordinate Attention [45,46,47,48,49,50,51,52].
  • Lightweight or efficient networks, including SqueezeNet, MobileNet V1–V3, ShuffleNet V1–V2, PeleeNet, MnasNet, and other backbone networks designed for real-time vision tasks [53,54,55,56].
The increasing possibilities resulting from the use of 3D panoramic VR cameras are underlined in [1], together with the importance of the gap between live streaming and recorded content: online, the correction of optical flow from multi-camera rigs is expensive and technically challenging. The content captured by Google’s Jump VR camera, for example, requires 60 s of processing per frame [9]. The approaches to this issue can be divided into the following three groups:
(1) Panoramic and ODS imaging: panoramic stitching via image capture and alignment [6,57,58], and special concentric mosaics that use a pair of panoramic images and encoding algorithms [59,60].
(2) Spinning cameras and displays: rotating camera systems with sensors [61,62,63].
(3) Dynamic omni-directional stereo imaging: synchronized multi-camera array systems proposed in [9,58], applying interpolation for view adjustment. This category of solutions is currently the most common and is the one analyzed in this article. The biggest challenge related to these systems, namely the massive amount of data with high processing requirements (Facebook’s 360 cameras record 30 frames per second, generating 17 Gb/s of raw data streamed for offline processing), was tackled in the current work, with noticeable improvements over previously reported results. The improvement achieved by using the chosen method with special adjustments and modifications is confirmed by the images recorded during real-time video transmission of an event (a paragliding competition).

3. The Test and Analysis of Different Approaches to Distortion Correction Considering Effectiveness and Data Processing Efficiency

The testing process was preceded by a preliminary study aimed at selecting the optimal combination of camera, adapter, and lens from the following five kits:
  • Sony A7 III (ILCE-7M3) + adapter Metabones MB-EF-E-BT5 without optics + Canon EF 8–15 mm f/4L Fisheye USM.
  • Panasonic Lumix DC-GH5S + adapter Metabones Canon EF to Micro Four Thirds T Speed Booster XL 0.64z with additional optics + Canon EF 8–15 mm f/4L Fisheye USM.
  • Panasonic Lumix DC-GH5S + Fujinon FE185C086HA-1 with additional passive adapter.
  • Pixelink PL-D755CU-BL-08051 + Sunex DSL-315B with additional passive adapter.
  • Pixelink PL-D775CU-BL-08051 + Fujinon FE185C086HA-1.
The camera sets listed above were evaluated according to the following acceptance criteria: minimum tonal range: 60 dB/10 f-stops; minimum image resolution: 1000 × 1000 pixels; minimum frame rate: 30 frames per second; minimum bit rate for stereoscopic image transmission: 8000 kbps; and permissible distortion for stereoscopic half-spherical 180° view capture (camera, adapter, lens set): ±20% from the F-theta model.
This phase of the research involved verifying the optical accuracy of each set in the context of stereoscopic 180° image capture, identifying and correcting image artifacts, and determining the maximum achievable optical resolution under various lighting conditions. Based on the evaluation criteria, the optimal configuration selected was the Sony A7 III (ILCE-7M3) camera paired with the Metabones MB-EF-E-BT5 adapter (without additional optics) and the Canon EF 8–15 mm f/4L Fisheye USM lens.
The mechanical fitting of the adapter demonstrated no noticeable backlash and allowed for easy assembly and disassembly with both the camera and the lens. There was no observable impact of the adapter on image quality. Electronic functions, including diaphragm control and autofocus settings, operated flawlessly. The adapter ensured proper bidirectional communication with the camera, as confirmed by the accurate reading of the set focal length. An additional advantage of the adapter is the inclusion of a ¼″ photographic screw socket located on its underside, which enables easy and secure mounting to a camera rig near the system’s center of gravity. The visual field of view was measured at a focal length of 8 mm. The angle between the optical axis and the horizontal edge of the visual range was approximately 89°, resulting in a total circular field of view of 178° for the entire lens. Figure 4 illustrates the image circle produced by the Canon lens.
Field-of-view tests were conducted using the camera’s maximum resolution of 6000 × 4000 pixels in photographic mode. In video mode, the actual frame coverage varies depending on the selected aspect ratio (4:3 or 16:9). The camera does not support internal recording of the full sensor area in 4:3 format; however, an external video signal—used as a live preview—can be transmitted via HDMI at a resolution of 4096 × 2160 pixels (30 fps), providing a near-complete image circle. At a focal length of 8 mm, the image circle fully fits within the sensor, with an average diameter of D = 3911 px. Assuming an ideal circular field of view, the number of utilized pixels can be estimated as follows:
Number of pixels used = π(D/2)² = π(3911/2)² ≈ 12.0139 Mpix.
The resolution of the Sony A7 III sensor equals 6000 × 4000 px = 24 Mpix.
The sensor coverage in percentage:
Number of pixels used / sensor resolution = 12.0139/24 ≈ 50.06%.
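The same coverage figure can be reproduced with a few lines of code, using the values measured above:

#include <cmath>
#include <cstdio>

int main() {
    const double pi = std::acos(-1.0);
    const double circleDiameterPx = 3911.0;        // average image-circle diameter [px]
    const double sensorPixels = 6000.0 * 4000.0;   // 24 Mpix sensor
    const double usedPixels = pi * std::pow(circleDiameterPx / 2.0, 2.0);
    std::printf("used: %.4f Mpix, coverage: %.2f%%\n",
                usedPixels / 1e6, 100.0 * usedPixels / sensorPixels);
    return 0;                                      // prints ~12.01 Mpix, ~50.06% coverage
}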
The deviation from the F-theta model was measured at the minimum focal length of 8 mm to assess the wide-angle distortion characteristics. The highest deviation observed was 6%, occurring at viewing angles of 86° and 87.5°, as illustrated in Figure 5. The detailed results are presented in Table 1. Chromatic aberration values were also measured across selected focal lengths within the tested setup (see Table 1). As the Metabones adapter used in this configuration does not include any optical elements, it was assumed to have no impact on the measured aberration.
The results obtained across all tested configurations showed chromatic aberration levels consistently below 0.06%. A correlation between aberration and focal length was observed, while the influence of the screen on aberration proved negligible. These findings aligned with subjective assessments based on visual analysis of the images. To correct image distortion—a necessary step for effective rear projection—two primary approaches were analyzed:
  • Third-order polynomial correction (ABCD polynomial);
  • Nearest-neighbor interpolation (polyline method) with interpolation between experimentally derived points.
In the first method, a lookup table (LUT) containing precomputed pixel coordinates was used to improve computational efficiency. The LUT was generated once, and the stored values were used repeatedly. However, initial implementation revealed two issues: (1) performance inefficiency and (2) visual artifacts due to aliasing, indicating insufficient sampling frequency. The performance bottleneck was addressed by leveraging AVX and AVX2 processor enhancements, which significantly improved efficiency. The aliasing problem, however, required a more sophisticated image sampling method. Implementing this on the CPU would have resulted in unacceptable processing delays, so the entire solution was transferred to the GPU using C++ AMP. This approach took advantage of high-quality built-in GPU samplers, which can accurately fetch colors from non-integer input texture coordinates, thereby eliminating aliasing artifacts and improving overall processing speed.
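A minimal CPU-side sketch of the LUT idea is given below. It assumes a radial third-order polynomial r_d = A + B·r_u + C·r_u² + D·r_u³ in normalized coordinates and hypothetical type names; in the actual implementation the table was computed once and then sampled on the GPU through C++ AMP texture samplers:

#include <cmath>
#include <vector>

// One LUT entry: the (distorted) source coordinate to sample for an output pixel.
struct SourceCoord { float x, y; };

// Build a lookup table of source coordinates for an output grid of lutW x lutH,
// using a third-order radial polynomial with coefficients from calibration.
// Coordinates are normalized to [-1, 1] around the centre of distortion.
std::vector<SourceCoord> buildCorrectionLut(int lutW, int lutH,
                                            double A, double B, double C, double D) {
    std::vector<SourceCoord> lut(static_cast<size_t>(lutW) * lutH);
    for (int v = 0; v < lutH; ++v) {
        for (int u = 0; u < lutW; ++u) {
            const double x = 2.0 * (u + 0.5) / lutW - 1.0;   // undistorted, normalized
            const double y = 2.0 * (v + 0.5) / lutH - 1.0;
            const double ru = std::sqrt(x * x + y * y);
            const double rd = A + B * ru + C * ru * ru + D * ru * ru * ru;
            const double scale = (ru > 1e-9) ? rd / ru : 1.0; // move the point radially
            lut[static_cast<size_t>(v) * lutW + u] = {
                static_cast<float>(x * scale), static_cast<float>(y * scale) };
        }
    }
    return lut;
}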
To further optimize performance, an alternative method was explored: instead of using a LUT, each pixel was projected and corrected individually on the GPU. Although highly precise, implementing F-theta correction in this framework proved challenging. The second approach, based on interpolating between experimental correction points using a polyline, was not directly feasible. Instead, the polyline was encoded as a 1D texture from which correction data were sampled. This method was later refined by replacing the polyline with a third-order polynomial fitted to experimental data, as second-order approximations did not yield sufficient precision.
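The third-order (ABCD) coefficients mentioned above can be obtained from the experimentally derived correction points with a standard least-squares fit. The sketch below (not the project’s calibration code) solves the 4 × 4 normal equations directly and assumes at least four measured (r_u, r_d) pairs:

#include <array>
#include <cmath>
#include <utility>
#include <vector>

// Fit r_d = A + B*r_u + C*r_u^2 + D*r_u^3 to measured (r_u, r_d) pairs by
// ordinary least squares: build the 4x4 normal equations and solve them
// with Gauss-Jordan elimination and partial pivoting.
std::array<double, 4> fitAbcd(const std::vector<double>& ru, const std::vector<double>& rd) {
    double M[4][5] = {};                              // augmented normal-equation matrix
    for (size_t i = 0; i < ru.size(); ++i) {
        const double basis[4] = {1.0, ru[i], ru[i] * ru[i], ru[i] * ru[i] * ru[i]};
        for (int r = 0; r < 4; ++r) {
            for (int c = 0; c < 4; ++c) M[r][c] += basis[r] * basis[c];
            M[r][4] += basis[r] * rd[i];
        }
    }
    for (int p = 0; p < 4; ++p) {
        int best = p;                                 // partial pivoting
        for (int r = p + 1; r < 4; ++r)
            if (std::fabs(M[r][p]) > std::fabs(M[best][p])) best = r;
        for (int c = 0; c < 5; ++c) std::swap(M[p][c], M[best][c]);
        for (int r = 0; r < 4; ++r) {                 // eliminate column p everywhere else
            if (r == p) continue;
            const double f = M[r][p] / M[p][p];
            for (int c = p; c < 5; ++c) M[r][c] -= f * M[p][c];
        }
    }
    return {M[0][4] / M[0][0], M[1][4] / M[1][1], M[2][4] / M[2][2], M[3][4] / M[3][3]};
}

Once fitted, only the four coefficients A–D have to be passed to the correction routine, which is the configuration advantage of the ABCD method noted later in this section.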
To evaluate the trade-off between efficiency and precision—critical for real-time projection—four methods were tested using NV12 color space on GPU:
  • LUT—Precomputed ABCD polynomial values stored in a temporary texture; tested at nine resolutions.
  • realtime-lutcorr—Frame-by-frame correction using polyline data stored in a 1D texture.
  • realtime-abcd—Polynomial ABCD computed for each frame independently.
  • realtime—No correction applied.
A key requirement for the projection module is real-time frame processing, favoring approximation methods due to their faster execution. However, these methods typically reduce image precision, so minimizing quality degradation relative to performance gains is essential. A practical way to improve speed was to reduce LUT resolution. While lower-resolution tables were faster, they also introduced some image deformation. To identify an optimal balance, several LUT resolutions were tested, and the corresponding image deformation ratio was analyzed (Figure 6). The results showed a clear inflection point: reducing LUT resolution to 256 × 256 preserved acceptable image precision. Resolutions above this threshold did not result in perceptible quality improvements, confirming 256 × 256 as the optimal resolution for real-time processing.
An additional issue encountered during the research was the occurrence of artifacts introduced by GPU-based floating-point calculations, particularly during the computation of the unit vector V used in the equirectangular projection. The precision limitations of floating-point arithmetic led to deviations in the calculated vector’s length, which, in some cases, significantly diverged from the expected unit length of 1. This inaccuracy resulted in visible graphical artifacts, most notably affecting several to a dozen pixels. Depending on the resolution, the artifacts appeared as rectangular shapes near the center of the image. Figure 7 illustrates these issues, showing the graphical artifacts produced by the fast but less precise realtime-lutcorr method on the left, and those from the more accurate realtime-abcd method on the right. Additionally, Figure 7 includes an error map, where pixel-wise inaccuracies are visualized. The map reveals a distinct pattern of computational errors, which is superimposed with imperfections caused by the reduced resolution of the LUT, further contributing to the observed image degradation.
To solve the problem shown in Figure 7, two solutions were available, with the unmodified variant kept as a baseline:
(1) Using the precise calculation functions from the concurrency::precise_math family, described further as the precise method.
(2) Vector normalization in the appropriate place, significantly reducing artifacts, described as the fast + fix method.
(3) The method without any changes, described as the fast method.
Figure 8 shows the differences between coordinate calculations for the fast method (left), the fast + fix method (middle), and the precise method (right). The second row shows a zoomed-in copy of the middle part of the view.
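The essence of the fast + fix variant can be shown in a short CPU-side sketch (the real computation ran on the GPU via C++ AMP, and the function names here are illustrative): the viewing direction V derived from the equirectangular pixel coordinates is explicitly re-normalized before use, which suppresses most of the precision-related artifacts without resorting to concurrency::precise_math:

#include <cmath>

struct Vec3 { float x, y, z; };

// Direction vector for an equirectangular output pixel (longitude/latitude in radians).
// In single precision the length of V can drift away from 1, which is what produced
// the rectangular artifacts near the image centre in the "fast" variant.
Vec3 directionFromEquirect(float lon, float lat) {
    return { std::cos(lat) * std::sin(lon), std::sin(lat), std::cos(lat) * std::cos(lon) };
}

// The "fast + fix" step: re-normalize V before it is used for the projection.
Vec3 normalized(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}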
The method utilizing an appropriately selected third-order polynomial (ABCD) demonstrated very high precision, with deviations from the polyline method remaining below one per mil across the entire image. Nevertheless, processing speed, combined with minimal image distortion, remains the most critical factor when selecting the optimal correction method. To identify the most suitable approach, efficiency tests were conducted and compared against image quality metrics. The results, presented in Figure 9, show that for the LUT-based method, resolutions up to 512 × 512 can be employed without incurring significant computational overhead. Conversely, for methods that involve per-frame calculations, issues related to precision and normalization of the unit vector V must be carefully addressed. In such cases, highly precise methods should be avoided, as they tend to introduce computational bottlenecks without offering a proportional benefit in visual quality.
The best-performing variants of both the LUT-based method and the precise per-frame calculation method demonstrated comparable processing speeds. Considering this, the per-frame calculation approach is recommended due to its flexibility, simpler reconfiguration, and ease of use. While the ABCD and polyline correction methods offer similar levels of efficiency, the ABCD method holds an additional advantage: greater simplicity in configuration. Instead of relying on a lookup table derived from multiple measurement points, the ABCD method requires only four precomputed coefficients, streamlining the calibration process. The algorithms developed during the study were further optimized for real-time execution on a dedicated edge device designed for capturing and transmitting video streams (DfCT). Integrating the DfCT device with the selected algorithms enables compression of 5–10 GB of raw multi-camera data into approximately 100 MB of corrected and compressed imagery, ready for efficient streaming.
Streaming tests using the DfCT device confirmed the viability of the solution under various conditions. Key results include the following profiles (also summarized in the configuration sketch after this list):
  • 3840 × 2160 resolution (1920 × 2160 per eye), 30 FPS, 13 Mbps bitrate—suitable for standard high-definition VR transmission.
  • 1920 × 1080 resolution (960 × 1080 per eye), 60 FPS, 20 Mbps bitrate—optimized for dynamic content such as sports events.
  • 1920 × 1080 resolution, 25 FPS, 10 Mbps bitrate—used in low-light conditions requiring longer exposure times.
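As noted above, these three profiles can be captured in a small configuration structure (the field names are illustrative and do not reflect the DfCT’s actual configuration interface):

// Summary of the validated transmission profiles (illustrative data structure).
struct StreamProfile {
    int width, height;     // total frame size (both eyes side by side)
    int fps;               // frames per second
    int bitrateMbps;       // target transmission bitrate
    const char* useCase;
};

const StreamProfile kTestedProfiles[] = {
    {3840, 2160, 30, 13, "standard high-definition VR transmission"},
    {1920, 1080, 60, 20, "dynamic content such as sports events"},
    {1920, 1080, 25, 10, "low-light scenes requiring longer exposure"},
};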
Each test session lasted 90 min and was repeated ten times to validate reliability and reproducibility. The results indicate that the computational and bandwidth challenges identified in [9] were effectively addressed through the combination of optimized correction algorithms and the DfCT edge processing unit. An example of the raw image output from the real-time video stream is shown in Figure 10, displaying two circular fish-eye projections, each corresponding to a separate eye. These circular views were transformed into equirectangular hemispherical projections (180° × 180°) and merged side-by-side. In some instances, minor gaps may appear between the hemispherical segments due to optical limitations. To mitigate this, cropping the inner or outer image edges (e.g., the right side of the left eye and the left side of the right eye) is a viable solution. This adjustment does not affect the viewer’s experience, as the occluded areas typically correspond to regions naturally blocked by the viewer’s nose or the overlapping lenses. As a result, immersion remains uncompromised.
The color profile of the raw images is corrected in real time. Initially, the images are captured using a so-called “flat” profile, characterized by a modified gamma curve. This approach preserves the maximum tonal range by sacrificing color gradient smoothness, without requiring increased bit depth or metadata for HDR encoding. During real-time processing, this flat profile is transformed into a low dynamic range (LDR) representation that aligns with the Rec.709 standard. The transformation enhances perceived contrast and color saturation, while also performing tonal remapping—specifically, brightening shadow regions and darkening highlights. This adjustment optimizes the image for standard display environments and ensures visual fidelity across typical viewing devices. An example of this transformation process is illustrated in Figure 11.
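A strongly simplified per-channel sketch of such a flat-to-Rec.709 transformation is shown below; the curve parameters are illustrative stand-ins, since the production pipeline applies a calibrated 3D LUT rather than an analytic formula:

#include <algorithm>
#include <cmath>

// Illustrative tonal remap from a "flat" capture profile to a Rec.709-style
// display signal: shadows are lifted, highlights are rolled off, and the result
// is encoded with the Rec.709 transfer function. The curve below is a stand-in,
// not the production 3D LUT.
float flatToRec709(float v /* flat-profile value in [0, 1] */) {
    v = std::clamp(v, 0.0f, 1.0f);
    const float shadowsLifted = std::pow(v, 0.85f);                            // brighten dark regions
    const float toneMapped = shadowsLifted / (1.0f + 0.15f * shadowsLifted);   // roll off highlights
    const float t = std::clamp(toneMapped, 0.0f, 1.0f);
    // Rec.709 OETF: linear toe near black, power law elsewhere.
    return (t < 0.018f) ? 4.5f * t : 1.099f * std::pow(t, 0.45f) - 0.099f;
}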
Figure 12 presents the same scene as viewed within the VR player, where the content is displayed following an equirectangular-to-rectilinear transformation. This transformation enables the viewer to experience a real-time interactive perspective, allowing them to look around freely within the scene. The displayed fragment represents the portion of the visual sphere currently in the user’s field of view, dynamically adjusted based on head movement or input from a control device.
Figure 13 displays the image as seen in the player used for live transmissions, where the viewer can freely explore any part of the hemisphere in real time. The figure illustrates the result obtained using the implemented method, which effectively balances image quality and compression efficiency. The solution enables smooth, uninterrupted streaming while maintaining low bandwidth usage, demonstrating the method’s capability for stable real-time performance during live events.
It is important to note that all the images presented in Figure 10, Figure 11, Figure 12 and Figure 13 are not shown in full quality, as their size and resolution were reduced for publication purposes, which may result in visible imperfections not present in the original output. The measurements and implementation tests clearly demonstrate that high-quality, real-time image correction for VR applications is achievable. The successful case of 3D video streaming from a sports competition further encouraged exploring opportunities to apply the developed solution in professional engineering contexts.

4. The Design of a VR Application for Safety Improvement Training System

The development of Industry 4.0 and 5.0 is closely linked to increasing production system autonomy, reliability, and, above all, employee safety. This was the key motivation for designing a VR-based system aimed at improving safety in industrial environments with elevated levels of risk and danger. The manufacturing process at a wheel rim production facility begins with the acquisition of aluminum alloy. The initial stage involves two large furnaces, each capable of holding several tons of raw material. Upon exiting these furnaces, the alloy reaches temperatures between 500 °C and 700 °C. The molten material is extracted in 500 kg batches, which are then transported to the next phase of production. This subsequent phase is carried out on three parallel lines, each equipped with 16 smaller furnaces. These smaller units are fitted with dispensers that precisely administer the alloy into injection molding machines, where further processing takes place. A schematic representation of this production workflow is shown in Figure 14.
The transportation of molten aluminum alloy between the aggregate furnaces and the smaller furnaces is carried out by operators using forklifts equipped with specialized ladles. Given the continuous-flow nature of this production process, operations are maintained across three shifts. The workflow can be broken down into the following stages:
  • Accurate positioning of the forklift near the main furnace;
  • Loading the ladle with molten aluminum alloy;
  • Transporting the load to one of the 48 smaller furnaces spread across three production lines;
  • Approaching the assigned workstation;
  • Depositing the alloy into the smaller furnace.
To ensure smooth and safe execution, the process demands at least twenty highly trained and experienced operators. Strict adherence to safety protocols is essential, as the molten alloy reaches hazardous temperatures, and any deviation from procedures can lead to severe or fatal injuries. Particular attention must be paid to manoeuvring the ladle during the filling, transport, and pouring stages. Newly recruited personnel undergo both theoretical and hands-on training, guided by seasoned operators who highlight high-risk procedures and critical control points. Despite these efforts, safety violations continue to be observed—especially among workers with less than four months of experience—highlighting the need for more immersive and effective training methods. A Virtual Reality (VR)-based training system is proposed to enhance learning outcomes and reduce supervisory costs. Compared to traditional video recordings, VR technology, particularly with stereoscopic video, enables safe yet realistic simulations of hazardous scenarios, improving the trainee’s situational awareness and decision-making.
The training system will comprise two key components:
  • Stereoscopic 3D Video Training Module—This module will deliver live-streamed, high-resolution 3D video focusing on tasks prone to procedural errors, such as step omissions, incorrect sequences, or lack of concentration. The video feed will provide multi-angle views and include visual annotations (e.g., arrows, numbering, and safety tips) to enhance instructional clarity. The processed recordings will also be used for on-demand training and procedural reviews.
  • VR Simulation Module—A fully interactive simulator replicating the workstation environment will be used for practical training and skill assessment. This will allow new hires to practice and demonstrate competence before being deployed on the factory floor.
This paper focuses on the development of the first module—the real-time 3D video training. The critical area to be covered during the training is the load/unload zone of the furnace, where most hazardous operations occur. For this purpose, a multi-camera rig will be deployed around the station, comprising six cameras arranged in two rigs, each configured for stereoscopic video capture using three synchronized cameras. A schematic of the custom-designed camera rig is shown in Figure 15.
In this application, the system will be configured using two camera rigs, positioned on either side of the furnace’s load/unload zone. This strategic placement ensures optimal coverage of the operator’s actions without interfering with the forklift’s movement path. An example of the installation setup is presented in Figure 16. Each rig accommodates three synchronized cameras, enabling stereoscopic video capture and offering the flexibility to present critical details from multiple angles and perspectives. This arrangement ensures that the training footage captures both the procedural flow and the nuances of operator movement, contributing to a more immersive and instructive VR experience.
As demonstrated in Section 3, delivering an undistorted image free from visual artifacts requires extensive computation and must be achieved in near real-time to avoid noticeable transmission delays. To meet these requirements, the core component of the system is a dedicated device for capturing and transmitting video streams (DfCT), which functions as an edge computing unit (Figure 17). This specialized hardware is responsible for processing input signals from the cameras using optimized algorithms that reduce latency by a factor of four compared to conventional solutions currently available on the market. The system has been designed to maintain a maximum latency threshold equivalent to 15 frames per second.
The DfCT is engineered not only for high-speed data capture and transmission but also for superior visual quality. It supports 4K resolution (3840 × 2160) at a minimum of 30 frames per second, with an expected spatial resolution of 21.3 pixels per degree. The integrated algorithms perform geometric corrections, colour grading via 3D LUT, and compensation for optical and camera-related imperfections. Additionally, the DfCT ensures compatibility with media streaming protocols required by the VR training platform. An essential feature of the DfCT is its data storage capability. All captured video streams are archived, enabling offline post-production, including replay functionality or slow-motion playback of critical training segments. This makes the DfCT not only a real-time transmission unit but also a versatile tool for detailed training analysis and review. A comprehensive description of the DfCT can be found in [63].
The presented case study illustrates the potential of Virtual Reality (VR) technologies in supporting innovative work environments within the context of Industry 4.0. This example demonstrates how immersive solutions can significantly enhance training processes and workplace safety. It is easy to envision similar implementations across various sectors where precision, complexity, and safety-critical operations are integral to the workflow. The motivation for adopting such systems extends beyond mitigating health and safety risks, as in the scenario described. In many modern manufacturing environments, the demand for highly accurate, technically advanced tasks is growing, while at the same time, there is a noticeable shortage of adequately trained personnel. This skills gap imposes additional costs on employers in the form of extended onboarding, supervision, and potential operational inefficiencies. By incorporating intelligent, immersive training platforms such as VR-based simulations, organizations can reduce these costs while simultaneously improving training quality and employee preparedness. These technologies offer an effective and scalable solution to meet the evolving demands of the digital industrial landscape.

5. Discussion and Conclusions

The tests and calculations conducted for the two selected image distortion correction methods yielded highly promising results, indicating their viability for integration into the designed VR-based staff training system. The comparative analysis revealed no substantial differences in either visual output or numerical precision between the nearest-neighbour interpolation and the ABCD polynomial approaches. Given their comparable performance, the polynomial method was favored due to its simpler implementation, making it more suitable for practical deployment. The developed correction algorithm demonstrated strong performance during stereoscopic video streaming of a paragliding competition, which served as a final validation test under real-world conditions. This is particularly significant considering that high data volume and bandwidth requirements are often cited as major challenges hindering the widespread adoption of VR technology.
Regardless of the specific correction method used, the ability to process large datasets in near real time is critical. To address this, a DfCT edge computing unit was employed. The specifications and operational characteristics of the proposed edge device were detailed in this study. Repeatable transmission tests using various video materials confirmed measurable performance improvements over previous research. Although the DfCT unit demonstrated clear benefits, scaling such solutions for large industrial installations, where dozens of cameras may be deployed simultaneously, remains an open challenge.
Notably, the implemented correction techniques rely on static calibration parameters and do not dynamically adapt to operational changes such as camera drift, thermal expansion effects, or vibration-induced misalignments, factors that are increasingly significant in industrial and outdoor VR environments. There is a clear opportunity to explore hybrid correction frameworks that combine classical models with machine learning techniques capable of learning distortion characteristics on the fly, thereby reducing calibration time and improving robustness in dynamic settings [64]. Future research on appropriate correction methods for VR applications should also consider lightweight neural networks and deep learning architectures specifically designed for real-time vision tasks. In conclusion, the results presented in this paper highlight the considerable potential of VR technologies, not only for industrial training, as shown in this case study, but also across a broader range of applications where immersive, data-intensive environments can offer tangible value.

Funding

This research was funded by the European Union and financed by the European Regional Development Fund under the Smart Growth Operational Programme. The project was implemented as part of the ‘Szybka Ścieżka’ competition organized by the National Centre for Research and Development, grant No. POIR.01.01.01-00-1111/17.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study and due to technical limitations. Requests to access the datasets should be directed to [support@bivrost360.com].

Acknowledgments

The author would like to express her sincere gratitude to the Bivrost company for their valuable support and collaboration throughout the preparation of this work. She also wishes to acknowledge the individual contributions of Tomasz Gawlik, Paweł Surgiel, and Krzysztof Bociurko, whose expertise, insightful discussions, and assistance were instrumental in the development and completion of this paper.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Konrad, R.; Dansereau, D.G.; Masood, A.; Wetzstein, G. SpinVR: Towards Live-Streaming 3D Virtual Reality Video. ACM Trans. Graph. 2017, 36, 209. [Google Scholar] [CrossRef]
  2. Luhmann, T.; Robson, S.; Kyle, S.; Boehm, J. Close-Range Photogrammetry and 3D Imaging; Walter de Gruyter GmbH & Co. KG: Berlin, Germany, 2020. [Google Scholar]
  3. Fanini, B.; Ferdani, D.; Demetrescu, E. Temporal Lensing: An Interactive and Scalable Technique for Web3D/WebXR Applications in Cultural Heritage. YOCOCU2020 Hands Herit. Exp. Conserv. Mastering Manag. Herit. 2021, 4, 710–724. [Google Scholar] [CrossRef]
  4. Vieri, C.; Lee, G.; Balram, N.; Jung, S.; Yang, J.; Yoon, S.; Kang, I. An 18 megapixel 4.3" 1443 ppi 120 Hz OLED display for wide field of view high acuity head mounted displays. J. Soc. Inf. Disp. 2018, 26, 314–324. [Google Scholar] [CrossRef]
  5. Gao, Z.; Hwang, A.; Zhai, G.; Peli, E. Correcting geometric distortions in stereoscopic 3D imaging. PLoS ONE 2018, 13, e0205032. [Google Scholar] [CrossRef] [PubMed]
  6. Dansereau, D.G.; Schuster, G.; Ford, J.; Wetzstein, G. A Wide-Field-of-View Monocentric Light Field Camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5048–5057. [Google Scholar]
  7. Hoffman, D.; Meraz, Z.; Turner, E. Limits of peripheral acuity and implications for VR system design. J. Soc. Inf. Disp. 2018, 26, 483–495. [Google Scholar] [CrossRef]
  8. Hoffman, D.M.; Lee, G. Temporal Requirements for VR Displays to Create a More Comfortable and Immersive Visual Experience. Inf. Disp. 2020, 35, 9–39. [Google Scholar] [CrossRef]
  9. Anderson, R.; Gallup, D.; Barron, J.T.; Kontkanen, J.; Snavely, N.; Hernandez, C.; Agarwal, S.; Seitz, S.M. Jump: Virtual Reality Video. ACM Trans. Graph. (SIGGRAPH Asia) 2016, 35, 397–407. [Google Scholar] [CrossRef]
  10. Arza-García, M.; Núñez-Temes, C.; Lorenzana, J.A. Evaluation of a low-cost approach to 2-D digital image correlation vs. a commercial stereo-DIC system in Brazilian testing of soil specimens. Arch. Civ. Mech. Eng. 2022, 22, 4. [Google Scholar] [CrossRef]
  11. Hughes, C.; Jones, E.; Glavin, M.; Denny, P. Validation of Polynomial-based Equidistance Fish-Eye Models. In IET Irish Signals and Systems Conference (ISSC); IET: Stevenage, UK, 2009. [Google Scholar]
  12. Kumlera, J.; Martin, B. Fisheye lens designs and their relative performance. Proc. SPIE 2000, 4093, 360–369. [Google Scholar]
  13. Kannala, J.; Brandt, S.S. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1335–1340. [Google Scholar] [CrossRef]
  14. Hughes, C.J.; Glavin, M.; Jones, E.; Trivedi, M.M. Review of geometric distortion compensation in automotive camera systems. IEEE Trans. Intell. Transp. Syst. 2010, 11, 377–384. [Google Scholar]
  15. Scaramuzza, D.; Martinelli, A.; Siegwart, R. A flexible technique for accurate omnidirectional camera calibration and structure from motion. In Proceedings of the IEEE International Conference on Computer Vision Systems, New York, NY, USA, 4–7 January 2006; pp. 45–51. [Google Scholar]
  16. Rong, Y.; Su, S.; Yan, S. Radial distortion correction with deep convolutional neural networks. In Proceedings of the IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016; pp. 1067–1071. [Google Scholar]
  17. Yu, X.; Ramalingam, S. New calibration methods and datasets for omnidirectional cameras. Int. J. Comput. Vis. 2018, 126, 1113–1135. [Google Scholar]
  18. Madejski, G.; Zbytniewski, S.; Kurowski, M.; Gradolewski, D.; Kaoka, W.; Kulesza, W.J. Lens Distortion Measurement and Correction for Stereovision Multi-Camera System. Eng. Proc. 2024, 82, 85. [Google Scholar] [CrossRef]
  19. Hughes, C.; Glavin, M.; Jones, E.; Denny, P. Review of Geometric Distortion Compensation in Fish-Eye Cameras. In Proceedings of the IET Irish Signals and Systems Conference (ISSC), Galway, Ireland, 18–19 June 2008. [Google Scholar]
  20. Slama, C. (Ed.) Manual of Photogrammetry. In American Society of Photogrammetry, 4th ed.; American Society of Photogrammetry: Falls Church, VA, USA, 1980. [Google Scholar]
  21. Tsai, R. A versatile camera calibration technique for high accuracy 3d machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robot. Autom. 1987, 4, 323–344. [Google Scholar] [CrossRef]
  22. Mallon, J.; Whelan, P.F. Precise radial un-distortion of images. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26–26 August 2004; Volume 1, pp. 18–21. [Google Scholar]
  23. Ahmed, M.; Farag, A. Nonmetric calibration of camera lens distortion: Differential methods and robust estimation. IEEE Trans. Image Process. 2005, 14, 1215–1230. [Google Scholar] [CrossRef]
  24. Fitzgibbon, A.W. Simultaneous linear estimation of multiple view geometry and lens distortion. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. 125–132. [Google Scholar]
  25. Asari, K.V. Design of an efficient VLSI architecture for non-linear spatial warping of wide-angle camera images. J. Sys. Arch. 2004, 50, 743–755. [Google Scholar] [CrossRef]
  26. Shah, S.; Aggarwal, J.K. A simple calibration procedure for fisheye (high distortion) lens camera. In Proceedings of the 1994 IEEE International Conference on Robotics and Automation, San Diego, CA, USA, 8–13 May 1994; Volume 4, pp. 3422–3427. [Google Scholar]
  27. Basu, A.; Licardie, S. Alternative models for fish-eye lenses. Elsevier Pattern Recognit. Lett. 1995, 16, 433–441. [Google Scholar] [CrossRef]
  28. Devernay, F.; Faugeras, O. Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Springer-Verl. J. Mach. Vis. Appl. 2001, 13, 14–24. [Google Scholar] [CrossRef]
  29. Ishii, C.; Sudo, Y.; Hashimoto, H. An image conversion algorithm from fisheye image to perspective image for human eyes. In Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Kobe, Japan, 20–24 July 2003; Volume 2, pp. 1009–1014. [Google Scholar]
  30. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
  31. Hu, F.; Xia, G.S.; Hu, J.; Zhang, L. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens. 2015, 7, 14680–14707. [Google Scholar] [CrossRef]
  32. Minetto, R.; Segundo, M.P.; Sarkar, S. Hydra: An ensemble of convolutional neural networks for geospatial land classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6530–6541. [Google Scholar] [CrossRef]
  33. Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene classification with recurrent attention of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1155–1167. [Google Scholar] [CrossRef]
  34. Chen, G.; Zhang, X.; Tan, X.; Cheng, Y.; Dai, F.; Zhu, K.; Gong, Y.; Wang, Q. Training small networks for scene classification of remote sensing images via knowledge distillation. Remote Sens. 2018, 10, 719. [Google Scholar] [CrossRef]
  35. Mangasarian, O.L.; Musicant, D.R. Data Discrimination via Nonlinear Generalized Support Vector Machines. In Complementarity: Applications, Algorithms and Extensions; Springer: Boston, MA, USA, 2001; pp. 233–251. [Google Scholar]
  36. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  37. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  38. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef]
  39. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 15 October 2015; pp. 1–9. [Google Scholar]
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
  41. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. arXiv 2016, arXiv:1605.07146. [Google Scholar] [CrossRef]
  42. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  43. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  44. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 640–651. [Google Scholar]
  45. Hu, J.; Shen, L.; Sun, G.; Albanie, S. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  46. Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514. [Google Scholar] [CrossRef]
  47. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Vedaldi, A. Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks; NIPS’18; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 9423–9433. [Google Scholar]
  48. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  49. Ionescu, C.; Vantzos, O.; Sminchisescu, C. Matrix Backpropagation for Deep Networks with Structured Layers. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2965–2973. [Google Scholar]
  50. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv 2019, arXiv:1910.03151. [Google Scholar]
  51. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. arXiv 2021, arXiv:2103.02907. [Google Scholar] [CrossRef]
  52. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  53. Howard, A.; Pang, R.; Adam, H.; Le, Q.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  54. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar]
  55. Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. arXiv 2018, arXiv:1807.11626. [Google Scholar]
  56. Brown, M.; Lowe, D.G. Automatic Panoramic Image Stitching Using Invariant Features. Int. J. Comput. Vis. 2007, 74, 59–73. [Google Scholar] [CrossRef]
  57. Matzen, K.; Cohen, M.F.; Evans, B.; Kopf, J.; Szeliski, R. Low-cost 360 Stereo Photography and Video Capture. ACM Trans. Graph. 2017, 36, 148:1–148:12. [Google Scholar] [CrossRef]
  58. Richardt, C.; Pritch, Y.; Zimmer, H.; Sorkine-Hornung, A. Megastereo: Constructing High-Resolution Stereo Panoramas. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 1256–1263. [Google Scholar]
  59. Bourke, P. Capturing omni-directional stereoscopic spherical projections with a single camera. In Proceedings of the 2010 16th International Conference on Virtual Systems and Multimedia, Seoul, Republic of Korea, 20–23 October 2010; pp. 179–183. [Google Scholar]
  60. Jones, A.; McDowall, I.; Yamada, H.; Bolas, M.; Debevec, P. Rendering for an Interactive 360 Light Field Display. ACM Trans. Graph. (SIGGRAPH) 2007, 26, 3. [Google Scholar] [CrossRef]
  61. Peleg, S.; Ben-Ezra, M.; Pritch, Y. Omnistereo: Panoramic stereo imaging. IEEE Trans. PAMI 2001, 23, 279–290. [Google Scholar] [CrossRef]
  62. Tanaka, K.; Hayashi, J.; Inami, M.; Tachi, S. TWISTER: An immersive autostereoscopic display. In Proceedings of the IEEE Virtual Reality 2004, Chicago, IL, USA, 27–31 March 2004; pp. 59–66. [Google Scholar]
  63. Gąbka, J. Edge computing technologies as a crucial factor of successful Industry 4.0 growth. The case of live video data streaming. In Advances in Manufacturing II, Volume 1: Solutions for Industry 4.0; Lecture Notes in Mechanical Engineering; Trojanowska, J., Ciszak, O., Machado, J.M., Pavlenko, I., Eds.; Springer International Publishing: Cham, Switzerland, 2019. [Google Scholar]
  64. Zhu, X.; Yu, S. A review of distortion correction in wide-angle vision systems: Challenges and opportunities. Appl. Sci. 2021, 11, 3001. [Google Scholar]
Figure 1. Effect of the radial distortion occurring in fish-eye cameras.
Figure 2. Principle of equidistant projection for lensed cameras [11].
Figure 3. Measurements of deviation from the F-theta model [12].
Figure 4. Image circle of the Canon 8–15 with the A7SIII.
Figure 5. Chromatic aberration of the A7III + Canon 8–15.
Figure 6. Correlation between the number of pixels per mil in the correct place and the resolution used in the LUT method.
Figure 7. The graphical artifact for the fast method (left) and the precise method (right).
Figure 8. Visual effect of coordinate calculations for the fast (left), fast + fix (middle), and precise (right) methods.
Figure 9. Efficiency measurement [milliseconds, mean].
Figure 10. The raw image captured during video streaming.
Figure 11. Colour improvement effects.
Figure 12. Equirectangular-to-rectilinear transformation of the captured view.
Figure 13. Captured video played on the goggles or display, enabling looking around (final effect).
Figure 14. Schematic view of the production plant layout (fragment of the process—heat treatment).
Figure 15. Design of the dedicated camera rig element (3D view and vertical section).
Figure 16. Exemplary installation of the multi-camera system around the unload area of the furnace.
Figure 17. DfCT view from the side. Copyright © BIVROST Sp. z o.o.
Table 1. Chromatic aberration values [%] of the A7III + Canon 8–15.

Aperture | 8 mm  | 12 mm | 15 mm
f/4      | 0.058 | 0.047 | 0.039
f/4.5    | 0.056 | 0.047 | 0.041
f/5      | 0.052 | 0.049 | 0.044
f/5.6    | 0.055 | 0.046 | 0.046
f/6.3    | 0.052 | 0.041 | 0.041
f/7.1    | 0.053 | 0.049 | 0.049
f/8      | 0.052 | 0.047 | 0.047
f/9      | 0.052 | 0.049 | 0.049
f/10     | 0.055 | 0.044 | 0.053
f/11     | 0.056 | 0.044 | 0.049
f/13     | 0.051 | 0.048 | 0.05
f/14     | 0.055 | 0.048 | 0.049
f/16     | 0.057 | 0.046 | 0.049
f/18     | 0.055 | 0.048 | 0.049
f/20     | 0.057 | 0.049 | 0.052
f/22     | 0.056 | 0.05  | 0.053