Advancing the Robotic Vision Revolution: Development and Evaluation of a Bionic Binocular System for Enhanced Robotic Vision

This paper describes a novel bionic eye binocular vision system designed to mimic the natural movements of the human eye. The system provides a broader field of view and enhances visual perception in complex environments. Compared with similar bionic binocular cameras, the JEWXON BC200 bionic binocular camera developed in this study is more compact and consumes only 2.8 W of power, which makes it ideal for mobile robots. Combining axis and camera rotation enables more seamless panoramic image synthesis, making the approach suitable for self-rotating bionic binocular cameras. In addition, combined with the YOLO-V8 model, the camera can accurately recognize objects such as clocks and keyboards. This research provides new ideas for the development of robotic vision systems.


Introduction
In the context of the new technological revolution and industrial transformation, the deep integration of artificial intelligence and robotics is essential for unleashing human productivity and accelerating the implementation of AI technologies. This profound empowerment of traditional industries effectively drives their transformation and upgrading [1][2][3]. As critical components of robots, sensors perceive the external world and provide essential information to support AI's accurate decision-making [4,5]. Among these, vision sensors play a crucial role in the deep integration of AI and robotics. They are widely used, enabling robots equipped with vision systems to perceive and interpret their surroundings, thus allowing them to interact with the world and efficiently perform various tasks.
As one of the most crucial human organs, the eye plays a vital role in shaping our understanding of the world [6]. Given its significance, robots with vision systems can perceive and interpret their surroundings, enabling them to interact with the world and perform various tasks effectively [7][8][9]. The development of applications that combine machine vision as the eyes and artificial intelligence as the brain has been a focal point of research. In the 1990s, Yann LeCun pioneered the convolutional neural network (CNN) with his creation of LeNet-5, which endowed computers with the ability to learn and recognize image data [10]. ImageNet, a large-scale image dataset, has acted as a catalyst for the development of computer vision, showcasing the impressive performance of deep learning in image recognition and laying the foundation for further research into vision-intelligent robots [11][12][13][14][15][16][17][18]. However, these applications typically utilize a single camera and focus on analyzing two-dimensional images on a flat plane. The material world humans inhabit is three-dimensional. The technology to capture and process three-dimensional information reflects a significant grasp of the target world and is a hallmark of intelligence [19]. Consequently, bionic eyes enable robots to obtain three-dimensional coordinates by mimicking the structure of the human eye, propelling artificial intelligence into a new phase of development.
Monocular vision, binocular stereoscopic vision, and structured light technology have all been extensively researched; for more details, see Table 1. Monocular vision systems are known for their simple setup, fast response times, low power consumption, and cost efficiency. However, their main disadvantage is lower recognition accuracy. Representative companies in this field include Cognex, Honda, and Keyence. Instruments capable of three-dimensional measurement include binocular stereoscopic cameras [20], Time-of-Flight (TOF) cameras [21,22], structured light imaging cameras [23,24], holographic interferometry cameras [25], and more. Among these, the structure of binocular-vision cameras most closely resembles that of the human eye. Binocular stereoscopic vision technology uses two cameras to simulate the disparity principle of human eyes, obtaining planar and depth information from the real environment through triangulation. Companies like Luster (Bumblebee) and Fuayun (A100) represent the early stages of vision processing technology, when algorithm and device precision were not very high; binocular-vision cameras do not emit light actively but rely entirely on two captured images to calculate depth, utilizing passive binocular imaging techniques. Consequently, structured light cameras were developed, using an infrared projection mode and binocular cameras to calculate the features projected onto object surfaces, thus significantly enhancing recognition precision compared to traditional stereoscopic cameras. However, this method requires good ambient light conditions and has a limited projection range, failing to function properly under reflective lighting conditions. Companies like Intel (RealSense), ORBBEC, and Microsoft (Kinect) represent this product category.
Binocular-vision cameras are more cost-effective than other 3D optical devices. They can also adapt to complex lighting conditions and capture dynamic scenes and moving targets. Consequently, binocular cameras are often used as bionic eyes and installed on mobile robots, where they have found widespread application across various fields. In service robots, Qian implemented a binocular-vision system combined with the AdaBoost algorithm to detect facial regions in real time, facilitating tasks such as conversing with people, tracking faces, and coordinating mechanical arms to grasp objects [26]. In coal mine rescue robots, robots equipped with binocular-vision systems accurately collect information about collapses and obstacles within mines, providing critical feedback to rescue personnel to prevent secondary accidents [27]. Underwater rescue robots equipped with binocular stereovision, despite challenging and variable lighting conditions, demonstrate exceptional distance measurement capabilities and can perform underwater search and rescue tasks [28]. In an essential area of robotics, robotic arm vision, Sheng utilized the SURF algorithm to extract and match image feature points based on binocular vision, accomplishing the mechanical arm's grasping of targets
[29]. In research on autonomous forestry robots for trunk distance measurement, Zhao developed a trunk-measuring system based on binocular-vision theory using TI's DaVinci DM37x platform, calculating three-dimensional information from images captured by the binocular cameras to obtain accurate target locations and distance measurements. To enhance the accuracy of obstacle detection algorithms in autonomous vehicles, Liang developed a binocular vision-based obstacle detection algorithm, which minimizes recognition errors and accurately measures the relative position between the vehicle and obstacles.
Binocular-vision technologies have developed rapidly in recent years. Bionic binocular-vision technology mimics the visual mechanisms of humans and animals. However, most binocular cameras cannot rotate autonomously and thus lack flexibility. This paper focuses on designing a binocular-vision system that imitates human eye mechanisms, achieving flexibility similar to human eyes and automatically tracking targets for dynamic recognition, thereby providing a broader field of view, as shown in Figure 1. Eye movement is crucial for visual perception. Collecting more information, achieving a more comprehensive view, and eliminating blind spots all depend on the rotation of the eyes [30,31]. Eyes with degrees of freedom can smoothly track and locate targets, keeping the area of interest at the center of the frame and allowing quick responses to changes in the tracked target [32,33]. Particularly for robots performing target-tracking tasks, when the target swiftly changes direction and the robot body cannot quickly turn, a binocular-vision system fixed on the mobile robot body is highly likely to lose track of the target. Therefore, researching a binocular-vision system with degrees of freedom is essential. Basic binocular-vision systems have inherent limitations in wide-angle coverage and image quality, making it difficult for robots to meet performance requirements in complex and challenging environments. Therefore, this paper develops two models of binocular-vision systems with degrees of freedom: one in which only the camera rotates and one in which the axis and camera rotate together. The model in which only the camera rotates is considered the ideal model. The proposed models can effectively improve robotic vision performance, enhancing the system's capabilities in challenging and complex environments, increasing its adaptability, and expanding its applications. Additionally, it is essential to note that traditional binocular-vision depth calculation methods, which rely on the principles of similar triangles, are no longer applicable when there are degrees of freedom in the binocular system. Thus, this paper also presents the depth calculation formulas corresponding to the two models.


Structural Design of Bionic Binocular Camera
To mimic the visual motion functions of human eyes, this study developed the JEWXON BC200 bionic binocular camera, as shown in Figure 2. This device consists mainly of precise miniature components, including the left eyeball and its associated motors. The motor for the up-and-down movement of the left eyeball is responsible for its vertical movement, simulating the up-and-down observation function of the human eye. In contrast, the left eyeball left/right movement motor handles horizontal movements, allowing the left eyeball to move from side to side, thus expanding the field of view. The right eyeball and its motors coordinate with the left eyeball to achieve binocular vision. The motor for the up-and-down movement of the right eyeball and the right eyeball left/right movement motor control the vertical and horizontal movements of the right eyeball, respectively. Both eyeballs are equipped with an IMU, which is responsible for collecting the attitude data of the eyeballs. The micro camera, embedded within the left and right bionic eyeballs, captures images and, along with the IMU data, transmits them in real time to the data acquisition sensors. These sensors are connected to the PCB via a cable, which transmits the data to the MCU for real-time processing. This lets the camera respond quickly in dynamic environments, capturing clear images and accurate eyeball attitude angles. After processing the data, the MCU communicates with the drive unit to perform the necessary operations. The drive unit is connected to the motors, controlling the multiple movement motors and allowing the left and right eyeballs to move up and down and left and right independently, significantly expanding the camera's field of view. The PCB is designed for low power consumption, with a power usage of only 2.8 W, which is particularly important for mobile robots, as low power consumption translates to longer operational times and higher efficiency. The PCB is mounted on the back of the bionic eyeballs and integrates data input/output interfaces. Through the coordinated work of these components, the JEWXON BC200 bionic binocular camera operates efficiently in dynamic environments, capturing high-quality images and providing precise eyeball attitude data, thus offering reliable visual perception capabilities for mobile robots.
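The data path described above (IMU attitude in, motor commands out) can be summarised as a schematic control loop. The sketch below is purely illustrative: the class and function names are invented for this example and do not correspond to any published BC200 interface, and a simple proportional update stands in for the real motor control.

```python
from dataclasses import dataclass

@dataclass
class EyeAttitude:
    """Hypothetical IMU attitude of one eyeball, in degrees."""
    pitch: float  # up/down rotation
    yaw: float    # left/right rotation

def clamp(value: float, limit: float) -> float:
    """Limit a commanded angle to the eyeball's mechanical range."""
    return max(-limit, min(limit, value))

def track_step(current: EyeAttitude, target_pitch: float, target_yaw: float,
               gain: float = 0.5, limit: float = 60.0) -> EyeAttitude:
    """One proportional control step: move each axis by a fraction of the
    remaining error, mimicking smooth pursuit toward the target."""
    new_pitch = clamp(current.pitch + gain * (target_pitch - current.pitch), limit)
    new_yaw = clamp(current.yaw + gain * (target_yaw - current.yaw), limit)
    return EyeAttitude(new_pitch, new_yaw)

# Converge from rest toward a target at (10 deg, -20 deg).
att = EyeAttitude(0.0, 0.0)
for _ in range(10):
    att = track_step(att, 10.0, -20.0)
print(round(att.pitch, 2), round(att.yaw, 2))  # → 9.99 -19.98
```

In a real system the target angles would come from the MCU's image processing and the updates would be sent to the motor drive; the proportional step here only illustrates the loop structure.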
Biomimetics 2024, 9, x FOR PEER REVIEW


Basis of Binocular Vision
Monocular cameras can capture two-dimensional images or videos but lack the ability to directly perceive depth information. As shown in Figure 3, on the same line, three different targets at varying distances project onto the same position in camera A. Therefore, camera A cannot distinguish which point is farther or closer based on the formed images. Due to the lack of a human eye-like stereoscopic sensing system and of parallax for the target, monocular cameras are unable to directly acquire depth information. Thanks to studies of the human visual system, researchers found that the two eyes have different views of the same observation target. The brain can infer the depth of the target by comparing the differences between these two views.

The development of camera depth perception technology can be traced back to the 1980s. In 1998, based on the human eye's mechanism, the Massachusetts Institute of Technology (MIT) started developing robot bionic eyes and created the Kismet robot [34]. The robot could interact by capturing facial expressions using two charge-coupled device (CCD) cameras. After that, an increasing number of researchers focused on developing robot vision [35][36][37], primarily centered on bionic eyes composed of binocular cameras. Utilizing the parallax between the left and right cameras for the same observation target, binocular imaging techniques can calculate the distance from the camera to the target. As the target moves away from the camera, the parallax between the images seen by the left and right cameras decreases, and it increases as the target moves closer. To accurately calculate the distance to a target, it is necessary to calibrate the left and right cameras to determine their geometric relationship and distortion parameters. Subsequently, the feature points in the left and right images are matched, and the depth information of the target is obtained by utilizing the parallax of the images and the principle of triangulation.
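The match-then-triangulate idea described above can be illustrated with a toy example. The following is a minimal sketch, not the system used in this paper: it fakes a stereo pair by shifting a random scanline and recovers the disparity by brute-force block matching; the block size, disparity range, and synthetic data are all assumptions made for illustration.

```python
import numpy as np

def block_match_row(left_row, right_row, block=5, max_disp=16):
    """Brute-force 1-D block matching along one scanline: for each
    left-image position, find the horizontal shift d that minimises
    the sum of absolute differences (SAD) against the right image."""
    n = len(left_row)
    disp = np.zeros(n, dtype=np.int64)
    half = block // 2
    for x in range(half + max_disp, n - half):
        patch = left_row[x - half:x + half + 1]
        costs = [
            np.abs(patch - right_row[x - d - half:x - d + half + 1]).sum()
            for d in range(max_disp)
        ]
        disp[x] = int(np.argmin(costs))
    return disp

# Synthetic scanline pair: the right view is the left view shifted
# 6 px to the left, i.e. a uniform disparity of 6 px.
rng = np.random.default_rng(1)
left = rng.integers(0, 256, 200).astype(np.int64)
right = np.roll(left, -6)

d = block_match_row(left, right, block=5, max_disp=16)
print(int(np.median(d[30:-30])))  # → 6
```

Real systems replace this brute-force search with calibrated, rectified 2-D matching, but the principle is the same: the recovered disparity feeds the triangulation step that yields depth.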
A schematic diagram of the binocular-vision system is shown in Figure 4. In the ideal scenario of a robot bionic eye, two identical cameras are placed in parallel and simultaneously capture images. Due to the different positions of the two cameras (one left and one right), the image points formed by the same observation target in the left and right cameras are not in the same position, resulting in a particular parallax. Based on the parallax between the left and right cameras, the depth of the target from the binocular camera can be calculated by utilizing the principle of similar triangles.
p is the center of the observed target. L1 and L2 are the projection planes of the left and right cameras, respectively. L′1 is the plane after rotating L1, and L′2 is the plane after rotating L2. o′l and o′r are the optical centers of the two cameras, respectively. The installation height of the two cameras is A, the installation distance is B, and both cameras have a focal length of f. The horizontal length of the camera's projection plane is C. To facilitate the understanding of the calculation in the presence of degrees of freedom, the two planes L′1 and L′2 are chosen as the reference planes. According to the triangle similarity theorem, it is easy to prove △p x′l x′r ∼ △p o′l o′r. The distance from the fixed point p to o′l o′r is labeled Z′. With N1 and N2 denoting the horizontal image coordinates of the matched point in the left and right images, there is the following relation:

(B − (N1 − N2)) / B = (Z′ − f) / Z′

The equation simplifies to:

Z′ · (N1 − N2) = f · B

From this, the target depth distance can be measured as:

Z′ = f · B / (N1 − N2)

f, A, and B can be obtained through measurement; thus, they are known parameters. By multiplying the pixel disparity of matched points by the real size of each pixel, the value of N1 − N2 can be obtained. This is the depth calculation method employed for basic binocular vision.
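The basic binocular depth calculation, in which depth is f·B divided by the metric disparity obtained from N1 − N2, can be sketched as a small function. The numeric values below (focal length, baseline, pixel pitch) are hypothetical, chosen only for illustration, and are not parameters of the BC200.

```python
def stereo_depth(f_mm, baseline_mm, disparity_px, pixel_size_mm):
    """Basic binocular depth: Z' = f * B / (N1 - N2), where the pixel
    disparity is converted to millimetres via the physical pixel pitch."""
    disparity_mm = disparity_px * pixel_size_mm
    if disparity_mm <= 0:
        raise ValueError("disparity must be positive for a target in front of both cameras")
    return f_mm * baseline_mm / disparity_mm

# Hypothetical example: f = 4 mm, baseline B = 60 mm,
# disparity of 30 px at a pixel pitch of 0.003 mm/px.
z = stereo_depth(4.0, 60.0, 30.0, 0.003)
print(round(z, 1))  # → 2666.7 (depth in mm)
```

The inverse relationship is visible here: halving the disparity doubles the computed depth, which is why distant targets (small disparity) have larger depth uncertainty.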


Binocular Vision with Degrees of Freedom
However, basic binocular-vision systems often fail to achieve the desired results in many specialized and complex environments, for example, scenes that require wider viewing angles and higher image quality. Therefore, this paper builds two models of binocular-vision systems with degrees of freedom: a model in which only the camera rotates, and a model in which the axis and camera rotate together. The corresponding depth calculation formulas for both models are also provided.

The Model with Only the Camera Rotating
Ideal binocular vision is shown in Figure 5. The binocular camera is free to rotate while the connecting axis is immobile, perpendicular to the substrate. A simple coordinate system was constructed in order to calculate the distance from the center p of the observed target to the substrate. α represents the angle of camera rotation, f denotes the camera focal length, A corresponds to the distance from the fixed point o1 to the projection plane L1, and C represents the horizontal length of the projection plane. ol was chosen as the origin coordinate (0, 0). Clockwise rotation was considered positive, counterclockwise rotation negative, and |α| < 60°. The coordinates of the points of the left camera are shown in Table 2. Let the line containing the points xl and p be g1(x) = p1x + q1. The slope p1 of g1(x) could be calculated from the points x′l and xl, and by substituting the point o′l into g1(x), the bias term q1 was obtained. Similarly, let the points xr and p on the right lie on the line g2(x) = p2x + q2; the slope p2 of the right camera follows in the same way. The coordinates of or could be calculated as (sin α · f + B, cos α · f), and the bias term q2 was obtained by substituting or into g2(x). The coordinates of the intersection of g1 and g2 are the coordinates of point p.
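Once the slopes and bias terms of g1 and g2 are known, the target point p is their intersection; a minimal sketch of that final step (the slope and intercept values below are hypothetical, standing in for the quantities derived from Table 2):

```python
def intersect_lines(p1, q1, p2, q2):
    """Intersection of g1(x) = p1*x + q1 and g2(x) = p2*x + q2.
    Returns (x, y); in this model the y-coordinate is the target depth."""
    if p1 == p2:
        raise ValueError("parallel lines: no unique intersection")
    x = (q2 - q1) / (p1 - p2)
    y = p1 * x + q1
    return x, y

# Hypothetical left/right ray parameters
x, y = intersect_lines(2.0, 0.0, -2.0, 8.0)
print(x, y)  # -> 2.0 4.0
```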

The Model of Co-Rotation of Axis and Camera
However, in practical applications, the camera often needs to be fixed on top of a substrate with a certain thickness. The model in which the camera rotates alone while the axis does not is therefore an idealization; the model of co-rotation of the axis and camera is more common and realizable. The schematic of this model is shown in Figure 6. A simple coordinate system was constructed in order to calculate the distance from the center p of the observed target to the substrate. α represents the angle of camera rotation, f denotes the camera focal length, A corresponds to the distance from the rotation point o1 to the projection plane L1, and C represents the horizontal length of the projection plane. o1 was chosen as the origin coordinate (0, 0), so the coordinates of o2 were (B, 0). Clockwise rotation was considered positive, counterclockwise rotation negative, and |α| < 60°. When the projection plane was parallel to the mounting substrate, α was taken as 0°. Since the projection plane keeps the same length whether it is parallel or perpendicular to the mounting substrate, the coordinates of the points on the left can be readily determined, as shown in Table 3.

Table 3. The coordinates of the points of the left camera.

Let the line through p and o′l be denoted f1(x) = a1x + b1. Since the points p, o′l, xl, and x′l were collinear, any two of their coordinates could be substituted into the line equation to determine its slope a1. By substituting a1 and the point o′l into f1(x), the bias term b1 could be determined. When α = 0°, the expression for f1(x) simplifies accordingly. Similarly, the coordinates of each point of the right camera could be obtained, as shown in Table 4.

Table 4. The coordinates of the points of the right camera.

Compared to the left camera, the right camera differs mainly in the extra bias term B in the horizontal coordinate. Therefore, the slope a2 in the corresponding function f2(x) = a2x + b2 was calculated in a similar way, and since there was a bias term B, b2 followed by substitution. Similarly, when α = 0°, the function f2(x) simplifies accordingly. Point p was the intersection of the functions f1(x) and f2(x), and the y-coordinate of point p represented the depth value of the observed target. With the fixed point p at coordinates (x, y), f1(x) = a1x + b1, and f2(x) = a2x + b2, the following relationship held:

y = a1x + b1 = a2x + b2

When the camera performed up and down sweeps, there was an impact on depth perception. However, the vertical coordinates of the two cameras remained consistent, so the main impact was on the relative height, as shown in Figure 7.
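Setting f1(x) = f2(x) gives the intersection, and hence the depth of p, in closed form (a standard two-line intersection, written out here for completeness):

```latex
a_1 x + b_1 = a_2 x + b_2
\;\Longrightarrow\;
x = \frac{b_2 - b_1}{a_1 - a_2},
\qquad
y = a_1 x + b_1 = \frac{a_1 b_2 - a_2 b_1}{a_1 - a_2},
\qquad (a_1 \neq a_2).
```

The y-coordinate is the depth of the observed target in this model.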

Obviously, when there was a rotation angle along the vertical axis and the two cameras were parallel, the distance A relative to the mounting base plate ensured that the vertical rotation had little impact on the measured distance; its effect was on the relative height of points o1 and o2. This relative height could be derived by constructing the function in the same way, after which the distance between point p and the mounting base plate could be obtained.

Experiments
To assess the technological advantages of the bionic eye among next-generation vision devices, this project procured mainstream industrial depth cameras for experimental comparison: the Intel RealSense SR300 depth camera, the ORBBEC Dabai DW depth camera, the SXHDR 300 binocular camera, and the Fuayun A100 binocular camera. During the preliminary operational tests, it was found that the Intel SR300 depth camera and the ORBBEC Dabai DW depth camera had similar specifications, leading to the decision to phase out the SR300. Additionally, the SXHDR 300 binocular camera proved inferior to the Fuayun A100 binocular camera in resolution, viewing angle, and power consumption, and was likewise eliminated from further consideration.

Experimental Platform
To achieve optimal performance of the cameras, this experiment connects each camera to a different system platform, as detailed in Figure 9. The JEWXON BC200 bionic eye binocular camera is connected via USB 3.0 to NVIDIA's Jetson Orin NX 16 GB embedded vision development board. The operating system used is Ubuntu 20.04, with CUDA version 11.4.19. The AI performance of the setup reaches 100 TOPS.
The Fuayun A100 binocular camera is connected via USB 3.0 to the Allwinnertech H6 embedded development board, which features a quad-core Cortex A53 processor, Mali-T720 GPU, and 2 GB LPDDR3 memory.The operating system used is Android 7.0, as shown in Figure 10.
The ORBBEC Dabai DW depth camera is connected via USB 3.0 to an ASUS FX50 laptop, which features an Intel Core i7-12700H CPU at 2.30 GHz, 16 GB of memory, and an RTX 3070 GPU. The operating system used is Windows 11, as shown in Figure 11.


Binocular Ranging Accuracy Experiment
Accurately obtaining distances in the world coordinate system is a major feature of binocular stereovision. For this experiment, a high-precision DELIXI laser rangefinder, model D100, with a maximum measuring distance of 120 m, was selected, as shown in Figure 12. This device was used to measure the pose of the cameras, allowing a comparison of the actual measurement accuracy of the cameras. In this study, the laser rangefinder and the cameras under test were aligned with a reference target (a chessboard) to perform depth imaging measurements. The lenses of the three cameras and the emitter of the laser rangefinder were kept level. The measured distances were as follows: the Fuayun A100 binocular camera was 1.328 m from the chessboard; the ORBBEC Dabai DW depth camera was 1.339 m; and the JEWXON BC200 bionic eye binocular camera was 1.338 m, as shown in Figure 13.

Camera Parameters and Error Testing
Since the ORBBEC Dabai DW depth camera comes factory-calibrated with built-in direct ranging capabilities, as shown in Figure 14, this experiment compares only the two binocular cameras. Table 5 presents the basic parameters of the two experimental cameras obtained through the MATLAB Stereo Camera Calibrator tool. The BC200 may deliver images with less distortion, which is beneficial for applications that require high precision and image fidelity. From the camera error experiments shown in Figures 15 and 16, the average error for the Fuayun A100 binocular camera is 3.53 pixels, while the JEWXON BC200 bionic eye binocular camera has an average error of 0.08 pixels.
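For reference, the per-image "average error" reported by such calibrators is a mean reprojection error: known 3D points are projected through the estimated camera model and compared with the detected pixel locations. A minimal numpy sketch with a hypothetical pinhole camera (synthetic data, not the paper's calibration results):

```python
import numpy as np

def mean_reprojection_error(K, points_3d, observed_px):
    """Project 3D points (camera frame, z > 0) through pinhole matrix K
    and return the mean Euclidean distance to the observed pixels."""
    proj = (K @ points_3d.T).T            # shape (N, 3)
    proj = proj[:, :2] / proj[:, 2:3]     # perspective divide -> (N, 2)
    return float(np.mean(np.linalg.norm(proj - observed_px, axis=1)))

# Hypothetical intrinsics: 800 px focal length, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.1, 0.0, 2.0], [0.0, 0.1, 2.5], [-0.1, -0.1, 3.0]])
obs = (K @ pts.T).T
obs = obs[:, :2] / obs[:, 2:3]
obs += 0.1                                # simulate 0.1 px detection noise
err = mean_reprojection_error(K, pts, obs)
print(round(err, 3))                      # roughly 0.1 * sqrt(2)
```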

Distance Testing
Each of the three cameras acquired images on its respective platform, and after shooting, calibration, and correction, the following data were obtained. ORBBEC Dabai DW depth camera: actual distance to the target 1.339 m, measured 1.269 m. Fuayun A100 binocular camera: actual distance 1.328 m, measured 1.015 m. JEWXON BC200 bionic eye binocular camera: actual distance 1.338 m, measured 1.299 m. The experimental results indicate that the JEWXON BC200's measurements are closer to the world coordinate distance than those of the ORBBEC Dabai DW camera, while the Fuayun A100's reading of 1.015 m deviates significantly from the actual distance, as depicted in Figure 17.
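The deviations above can be summarized as absolute and relative errors; a short script over the values reported in this section:

```python
# (actual, measured) distances in meters, as reported in the distance test
results = {
    "ORBBEC Dabai DW": (1.339, 1.269),
    "Fuayun A100": (1.328, 1.015),
    "JEWXON BC200": (1.338, 1.299),
}

errors = {}
for name, (actual, measured) in results.items():
    abs_err = abs(actual - measured)
    errors[name] = (abs_err, 100.0 * abs_err / actual)
    print(f"{name}: {abs_err:.3f} m ({errors[name][1]:.1f}%)")

# The BC200 shows the smallest deviation of the three cameras
best = min(errors, key=lambda n: errors[n][0])
print("lowest error:", best)
```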


Bionic Binocular Camera Dynamic Viewing Angle Test
Due to the JEWXON BC200 bionic eye binocular camera's capability to rotate freely, similar to human eyeballs, it maintains a wider field of view compared to other fixed-depth and binocular cameras, even when the camera itself is in a fixed position.This feature allows for more comprehensive visual coverage, as depicted in Figure 18.
Figure 18. Multi-angle testing of the JEWXON BC200 bionic eye binocular camera after fixed installation. This test evaluates the performance of the camera in a fixed installation, focusing on its ability to cover multiple angles via internal mechanisms that simulate human eye movements. The testing aims to demonstrate how the camera maintains a comprehensive field of vision across various orientations and conditions.
In dynamic environments, objects do not always align perfectly with the camera's field of view, making it crucial to measure depth data at various angles. However, binocular-vision systems inherently have installation errors that inevitably affect the calculation of depth values. Additionally, the rotation angles of the left and right cameras cannot be guaranteed to be perfectly equal, which is another source of error. As the bionic binocular camera's rotation angle increases, the distance error caused by pixel differences at each point also increases. Minor pixel position errors can lead to significant distance measurement errors, and these errors grow with the target distance. A basic model alone therefore cannot achieve dynamic visual measurement acquisition.
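This growth of range error with distance can be made quantitative by differentiating the basic depth relation Z = fB/d: a disparity error Δd produces a first-order depth error |ΔZ| ≈ Z²Δd/(fB), so the range error grows with the square of the distance. A quick numerical check with hypothetical rig parameters (not the BC200's):

```python
def depth_error(Z, f, B, delta_d):
    """First-order depth error for a disparity error delta_d,
    from Z = f*B/d  =>  |dZ| ~ Z**2 * delta_d / (f * B).
    All lengths must be in the same units."""
    return Z ** 2 * delta_d / (f * B)

# Hypothetical rig: f = 4 mm, B = 60 mm, 1 um disparity error
for Z_m in (1.0, 2.0, 4.0):
    Z_mm = Z_m * 1000.0
    # the error roughly quadruples each time the distance doubles
    print(Z_m, "m ->", round(depth_error(Z_mm, 4.0, 60.0, 0.001), 1), "mm")
```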
The bionic binocular camera conducts real-time online calibration of images, which allows for the rapid acquisition of camera pose and image data.After obtaining left and right disparity images, the system corrects and processes them into images with depth information.However, during the rotation process of the binocular camera, using only the camera rotation model (Moda1) can cause image distortion during multi-angle motion shooting.This leads to incorrect pixel alignment, causing errors in the entire panoramic image composition.
On the other hand, the model where both the axis and camera rotate together (Moda2) results in images that are excessively smoothed during the JEWXON BC200 bionic eye binocular camera's motion, reducing distortion and errors.The resulting panoramic images are more detailed, and the degree of image warping is significantly improved.
Experiments show that traditional binocular cameras experience increasing errors as the rotation angle increases. Furthermore, the improved model demonstrates smaller errors than the traditional model, proving the stability and effectiveness of the new model, as seen in Table 6. The comparison of these models and their impact on image quality and accuracy can be seen in Figure 19. Combined with YOLO-V8, the camera can also accurately identify objects such as clocks, laptops, keyboards, mice, cups, and pedestrians in the real world. This demonstrates that the bionic eye binocular camera not only provides a wider field of view and more stable images but also achieves high precision in autonomous target tracking, recognition, and ranging.

Comparison of Existing Robotic Bionic Eye Devices
Binocular cameras are discussed in the article "Robot Bionic Vision Technologies: A Review" [40], as shown in Table 7. The bionic eye developed by Zou Wei et al. [41] consists of two CCD cameras and stepper motors, which essentially achieve the movement functions of the human eye. However, the large stepper motors make the device excessively large and power-hungry, which is detrimental to mobility and portability. The bionic eye developed by Chen et al. [42] incorporates both a long-focus and a short-focus lens in each eye, along with a three-degree-of-freedom neck mechanism integrated with an IMU. Although this design enhances perceptual capabilities, the overall size of the device and the high power consumption of the neck mechanism's motor make it unsuitable for mobile applications. The JEWXON BC200, an improvement over previous generations of bionic eyes [43], features 4K HD micro-cameras and IMUs in a more compact form. It also utilizes the latest magnetic levitation conduction technology, allowing the eyeballs to rotate without cable interference and thus increasing the range of motion. Despite the significant reduction in overall power consumption, further improvements are needed to achieve additional functions such as head and neck movement.


Zou Wei et al. R&D Team [44]
Features: the device consists of two CCD cameras and stepping motors, which can basically realize the movement function of the human eye.
Disadvantages: the model uses larger stepper motors, resulting in an excessively large product; running multiple large motors and loads inevitably leads to higher overall operating power consumption, which is not conducive to mobility and portability.

Chen et al. R&D Team [42]
Features: each eye contains two cameras (a long-focus lens and a short-focus lens) to simulate the perception of human eye features, and a 3-degree-of-freedom neck mechanism is designed with an integrated IMU.
Disadvantages: although the motor controlling the rotation of the eyeball is smaller, the overall bionic eye design is too large and a larger motor is used for the neck mechanism, which leads to higher power consumption and makes it unsuitable for mobile devices.

JEWXON BC200
Features: a 4K HD mini-camera and IMU are integrated in each eye; thanks to the latest levitation conduction technology used in the data collector, the eye rotates without cable interference, giving a larger angle of eye rotation.
Disadvantages: although the use of mini cameras and mini motors reduces overall power consumption considerably, there is still room for improvement, such as realizing head and neck movement.

Experimental Summary
The bionic eye can mimic the natural movements of the human eye and provide a wider field of view. When applied to humanoid robots, this enhanced vision technology significantly improves the robot's visual perception capabilities, making it easier to track targets and make autonomous decisions in complex environments. Avoiding repetitive head-turning motions allows quicker responses in complex and dynamic scenarios. This paper introduces an innovative design for a bionic binocular-vision system aimed at overcoming the limitations of existing technology by incorporating models of binocular-vision systems with degrees of freedom, thereby enhancing the robot's visual performance. Two models were developed: one involving only camera rotation and the other involving co-rotation of the axis and the camera. Various depth vision cameras were selected for experiments on measurement accuracy and image quality. These experiments demonstrate that the designed bionic binocular-vision system and models not only simulate the flexible movement of human eyes but also significantly enhance visual processing capabilities in dynamic environments by introducing degrees of freedom. The contributions of the proposed degree-of-freedom binocular-vision system models to the fields of machine vision and bionic eyes are as follows:
1. Camera specifications: the JEWXON BC200 bionic eye binocular camera features a maximum resolution of 3296 × 2512 at 30 fps, a static angle FOV of H65° and V51°, and an optimal dynamic field of view of H206° and V192°.

2. Vision system combination: the JEWXON BC200, paired with the Jetson Orin NX, achieves an AI performance of 100 TOPS. This combination represents one of the more advanced embedded vision systems currently available and is better suited to mobile robots due to its lower power consumption compared with PCs.

3. Error experiment: the JEWXON BC200 bionic eye binocular camera has the best distortion values, with an average error of 0.08 pixels, performing better than the Fuayun A100 standard binocular camera.

4. Vision ranging: the measurement accuracy of the JEWXON BC200 is closer to the world coordinate system distance than that of the ORBBEC Dabai DW depth camera. Using only the camera rotation model during OpenCV panoramic image synthesis can lead to image precision errors, resulting in distortions and pixel misalignments that cause parts of the composite image to be missing. The combined rotation model of the axis and camera significantly reduces image distortion, enhancing image precision and producing a more seamless composite image.

5. Enhancing robotic vision: to make robotic vision more akin to human vision, YOLO-V8 is introduced to improve the autonomous recognition ability and accuracy of the bionic eye. The BC200 can accurately identify objects such as clocks, laptops, keyboards, mice, cups, and pedestrians in the real world. The bionic eye binocular camera not only provides a broader field of view and more stable images but also achieves autonomous target tracking and precise identification, surpassing the capabilities of fixed binocular and structured-light cameras.

Conclusions
In this study, we developed the JEWXON BC200 bionic binocular camera and validated its effectiveness and innovation through comparative experiments with commonly used depth cameras and bionic binocular cameras. The JEWXON BC200 features a maximum resolution of 3296 × 2512 at 30 fps, a static angle FOV of H65° and V51°, and an optimal dynamic field of view of H206° and V192°. It integrates an IMU, operates within a distance of 0.2 m to 10 m, and consumes only 2.8 W, outperforming the other cameras in resolution, viewing angle, working distance, and energy efficiency. Combined with the Jetson Orin NX, which offers AI performance of up to 100 TOPS, the JEWXON BC200 forms one of the most advanced vision systems for embedded platforms, particularly suited to mobile robots. The camera demonstrated superior distortion performance with an average error of 0.08 pixels and achieved higher measurement accuracy than the ORBBEC Dabai DW depth camera. When using the model that rotates both the axis and the camera, the image distortion and precision errors observed in fixed installations were significantly reduced, resulting in more accurate and seamless composite images. Additionally, incorporating YOLO-V8 enhanced the camera's ability to accurately identify objects such as clocks, laptops, and pedestrians, making the BC200 not only smaller and more energy-efficient but also more capable in autonomous target tracking and precise recognition, surpassing the capabilities of fixed binocular and structured-light cameras.

Future Work Focus
A. The hardware platform for the binocular-vision system with degrees of freedom requires improvement, including adopting more advanced visual platforms, increasing computational power while lowering operational power consumption, and improving both installation and control precision. This will reduce the introduction of errors and enhance the accuracy of depth computation.
B. While it is feasible to calculate distances using functional methods when the camera has degrees of freedom, doing so introduces a considerably larger computational model that increases processing time; there is still room for optimization.
C. A solution is needed for the problem of absolute error growing with distance. One potential way to reduce errors is to replace the existing YOLO-V8 with a more precise and advanced target detection model.
The experiments conducted demonstrate that the designed bionic eye binocular-vision system offers better target-depth calculation performance and a broader field of view than traditional binocular-vision systems. The results confirm that the system achieves flexible movements similar to those of human eyes while improving image quality and depth-calculation accuracy during motion. This advancement not only enhances humanoid robots' adaptability and range of application in complex environments but also provides a crucial theoretical basis and technical pathway for the future development of robotic vision systems.

Figure 1.
Figure 1. Bionic binocular-vision system designed based on the principles of simulating human eye movements.
Biomimetics 2024, 9, x FOR PEER REVIEW
The difference between the images seen by the left and right cameras decreases as the target moves away and increases as the target moves closer to the camera. To accurately calculate the distance to a target, it is necessary to calibrate the left and right cameras to determine their geometric relationship and distortion parameters. The feature points in the left and right images are then matched, and the depth information of the target is obtained using the parallax of the images and the principle of triangulation.
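For an ideal rectified stereo pair, the parallax-and-triangulation principle described above reduces to Z = f·B/d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity (pixel shift of the matched feature between the left and right images). The numbers below are illustrative assumptions, not the BC200's calibration values.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for an ideal rectified stereo pair.

    The disparity d shrinks as the target moves away and grows as it
    approaches, exactly the relationship described in the text.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 1000 px focal length, 60 mm baseline
z_near = depth_from_disparity(1000.0, 0.06, 120.0)  # large disparity -> 0.5 m
z_far = depth_from_disparity(1000.0, 0.06, 12.0)    # small disparity -> 5.0 m
```

In practice, calibration (intrinsics, extrinsics, and distortion) and feature matching must be performed first so that f, B, and d are all expressed consistently.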

Figure 3.
Figure 3. The schematic of the inability of the monocular camera to determine depth.
A schematic diagram of the binocular-vision system is shown in Figure 4. In the ideal scenario of a robot bionic eye, two identical cameras are placed in parallel and simultaneously capture images. Because the two cameras occupy different positions (one left and one right), the image points formed by the same observation target in the left and right images differ.

Figure 4.
Figure 4. The schematic diagram of the binocular-vision system.

Figure 5.
Figure 5. The schematic diagram of the model with only the camera rotating.

Figure 6.
Figure 6. The schematic of the model of co-rotation of axis and camera.

Figure 7.
Figure 7. The relationship between the eyeball and the observation point p during vertical movement.
The camera connects via USB 3.0 to Nvidia's Jetson Orin NX 16 GB embedded vision development board. The operating system used is Ubuntu 20.04, with CUDA version 11.4.19. The AI performance of the setup reaches 100 TOPS.

Figure 13.
Figure 13. Reference to the actual distance from the target to the test camera; field experimental environment diagram.

Figure 17.
Figure 17. Display of measurement accuracy for each camera.

Figure 20.
Figure 20. Bionic eye binocular camera combined with YOLO-V8 target detection and recognition effect.

Table 7.
Table 7. Comparison of existing robotic bionic eye devices.
Zou Wei et al. R&D Team [44]. Features: the device consists of two CCD cameras and stepping motors, which can basically realize the movement function of the human eye. Disadvantages: although the motor
Chen et al. R&D Team [42]. Features: each eye contains two cameras (a long-focus lens and a short-focus lens) to simulate the perception characteristics of the human eye, and a 3-degree-of-freedom neck mechanism is designed with an integrated IMU.
JEWXON BC200. Features: a 4K HD mini-camera is integrated in each eye, together with an IMU; because the data collector uses the latest levitation conduction technology, the eyes rotate without interference from cables, allowing a larger angle of eye rotation.

Table 1.
Performance comparison table of mainstream visual technology.

Table 2.
The coordinates of the points.

Table 5.
Test group stereo camera data table.

Static angle FOV: H65°, V51°; best dynamic angle FOV: H206°, V192°; integrated IMU; a working distance of 0.2–10 m; and a power consumption of 2.8 W. This camera surpasses the control group's other cameras in resolution, maximum viewing angle, effective working distance, and energy efficiency.
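If one assumes the dynamic FOV (H206°, V192°) is simply the static FOV (H65°, V51°) widened symmetrically by eye rotation, the rotation range each eye needs can be back-calculated from the figures above. That additive relationship is an assumption introduced here for illustration, not a model stated in the paper.

```python
def required_rotation_deg(static_fov_deg, dynamic_fov_deg):
    """Half-angle the eye must rotate to each side, assuming the
    dynamic FOV equals the static FOV plus twice the rotation angle."""
    return (dynamic_fov_deg - static_fov_deg) / 2.0

# BC200 figures from the paper: static H65° x V51°, dynamic H206° x V192°
h_rot = required_rotation_deg(65.0, 206.0)  # 70.5° of rotation each side
v_rot = required_rotation_deg(51.0, 192.0)  # 70.5° of rotation each side
```

Under this assumption both axes imply roughly ±70.5° of eye rotation, consistent with the paper's emphasis on the large rotation angle enabled by the cable-free levitation conduction design.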