Retina-like Imaging and Its Applications: A Brief Review

Abstract: The properties of the human retina, including space-variant resolution and gaze characteristics, provide many advantages for applications that simultaneously require a large field of view, high resolution, and real-time performance. Therefore, retina-like mechanisms and sensors have received considerable attention in recent years. This paper reviews state-of-the-art retina-like imaging techniques and applications. First, we introduce the principle and the implementation methods, both software and hardware, and compare them. Then, we present typical applications of retina-like imaging, including three-dimensional acquisition and reconstruction, target tracking, deep learning, and ghost imaging. Finally, the challenges and outlook toward practical use are discussed. The results are beneficial for a better understanding of retina-like imaging.


Introduction
Many applications benefit from bioinspired optical methods; some typical examples are given here for illustration. Artificial compound eyes, inspired by insects, offer an extremely large field of view (FOV), low aberration and distortion, high temporal resolution, and an infinite depth of field [1][2][3]. Many artificial compound-eye imaging systems have been proposed and applied, for example, in medical diagnosis [4], navigation [5], and egomotion estimation [6]. Optical systems inspired by the lobster eye provide several advantages over traditional systems, such as a wide field and the ability to collect energy in high-energy radiation fields, owing to their remarkable spherical microchannel structure [7,8]. The lobster-eye optical system can therefore be used in imaging applications spanning X-rays, visible light, and infrared bands [9].
The eye is the primary organ through which humans perceive their environment. We can focus on near or far scenes, that is, on different depths, by means of the crystalline lens. We can also fixate on interesting targets in our FOV, exploiting the space-variant resolution of the retina. Liquid lenses, inspired by the human eye, simulate the properties of the crystalline lens [10]. Likewise, imaging sensors with space-variant pixels simulate the properties of the retina. Imaging sensors, whether two-dimensional (2D) or three-dimensional (3D), play an important role in national defense and civilian use [11][12][13], such as smart monitoring, robotic navigation, mobile phones, AR/VR, and autonomous driving. Imaging sensors based on retina-like mechanisms are therefore suitable for applications that simultaneously require a large FOV, high resolution, and real-time performance. The development of computing and electronics has moved the study of imaging systems based on bioinspired vision from theory toward practical use.
The human eye differs from the compound eye mainly in the space-variant distribution of photoreceptor cells in the retina [14]. The region of interest is assigned high resolution, while the remaining region is assigned low resolution. Meanwhile, the mapping from the retina to the visual cortex approximately follows a log-polar law, which yields scaling and rotation invariance. Many researchers have explored these retinal properties and developed retina-like imaging theory and systems. According to the use of light, retina-like imaging systems fall into two main kinds: active and passive. This review focuses on state-of-the-art studies on retina-like imaging, including implementations, applications, and challenges. The results are beneficial for understanding the properties of retina-like imaging and for improving the performance of imaging systems. The rest of the review is organized as follows. In Section 2, we start from the principle of retina-like imaging and describe its mathematical model and advantages; we then trace the development of retina-like imaging and discuss it in detail from the two aspects of software and hardware. In Section 3, we discuss in detail the advantages of retina-like imaging in practice through four representative applications. In Section 4, we further discuss retina-like imaging and point out the future development trends and the strong application potential of this method.

Principle and Properties
The principle of retina-like imaging includes two aspects, namely, space-variant sampling and log-polar transformation (LPT) [15], as shown in Figure 1. The LPT is written as Equation (1) [16]:

ξ = log_q(√(x² + y²)), θ = arctan(y/x), (1)

where (x, y) are the Cartesian coordinates and (ξ, θ) are the log-polar coordinates. Some optimizations are added on the basis of log-polar coordinates to ensure coverage of the FOV: circular spots are used to sample the image, and the spots of adjacent rings are tangent. Accordingly, the space-variant retina sampling model is obtained, whose mathematical model is written as Equation (2) [17]:

r_1 = r_0 + R_1, r_i = r_1 q^(i−1), R_i = r_i sin(π/N), q = (1 + sin(π/N)) / (1 − sin(π/N)), r_max = r_M + R_M, (2)

where M and N are the number of rings and the number of sectors per ring, respectively, r_0 is the radius of the blind hole, q is the growth rate between rings, r_1 is the radius of the first ring, R_1 is the radius of the spots in the first ring, and r_max is the maximum radius of the FOV, which is tangent to the spots in the last ring, as shown in Figure 2.

Retina-like imaging has three properties. First, it balances high resolution, a large FOV, and real-time performance. The two axes of the log-polar coordinates represent the distance ξ from a point to the origin and the angle θ from the point to the polar axis, respectively. At low ring numbers, near the center of the FOV, the resolution density is higher than in Cartesian coordinates, which preserves target details; at large ring numbers, the sampling becomes sparse and represents a lower-resolution background area. Therefore, the retina-like structure efficiently suppresses redundant data. However, this suppression makes the pixel positions uncertain, so a fuzzy edge detector may be necessary [18,19]. Second, rotation and scaling invariance are considered.
The rotation and scaling of a target in the Cartesian coordinate system only shift the axes θ and ξ, respectively, as represented by Equation (3) [20]:

θ' = θ + α, ξ' = ξ + log_q(k), (3)

where α is the rotation angle, k is the scale factor, and θ and ξ are the two axes of the LPT. In this way, rotation and scaling are rewritten as shifts along the θ and ξ directions; this is called rotation and scale invariance [20]. Third, the adjustable optical axis deserves attention. On the basis of the space-variant resolution, we always pay attention to the most interesting region, and we can move the region of interest according to the environment. With the development of retina-like imaging theory, retina-like imaging models no longer always strictly conform to the LPT, but space-variant resolution remains their typical common feature. Therefore, space-variant resolution is the core characteristic and is illustrated in the following methods (Section 2.2) and applications (Section 3).
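The two building blocks above, the log-polar mapping with its shift property and the tangent-spot sampling grid, can be checked numerically. The following sketch is illustrative only; the growth rate q = 1.1 and the grid parameters (M, N, r_1) are arbitrary example values, not those of any cited system.

```python
import numpy as np

# --- Log-polar transform, cf. Equation (1) ---
def log_polar(x, y, q=1.1):
    """Map Cartesian (x, y) to log-polar (xi, theta); q is the ring growth rate."""
    xi = np.log(np.hypot(x, y)) / np.log(q)   # log base q of the radius
    theta = np.arctan2(y, x)
    return xi, theta

# Rotating by alpha and scaling by k become pure shifts, cf. Equation (3).
x, y = 3.0, 4.0
alpha, k = 0.5, 2.0
xr = k * (x * np.cos(alpha) - y * np.sin(alpha))
yr = k * (x * np.sin(alpha) + y * np.cos(alpha))
xi0, th0 = log_polar(x, y)
xi1, th1 = log_polar(xr, yr)
# th1 - th0 equals alpha, and xi1 - xi0 equals log_q(k)

# --- Space-variant sampling grid, cf. Equation (2) ---
# N tangent spots per ring; tangency between adjacent rings fixes q.
M, N, r1 = 5, 64, 1.0
s = np.sin(np.pi / N)
q_ring = (1 + s) / (1 - s)            # growth rate between rings
r = r1 * q_ring ** np.arange(M)       # ring radii r_1 ... r_M
R = r * s                             # spot radii, tangent within a ring
r_max = r[-1] + R[-1]                 # FOV radius, tangent to the last ring
```

Running the first part confirms that rotation and scaling in Cartesian coordinates appear as pure translations of the log-polar coordinates, which is the property exploited throughout Section 3.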

Methods
Many researchers have studied the realization of retina-like imaging in hardware and software. Hardware realizations can be divided into two approaches: passive and active. Passive retina-like imaging methods selectively collect target information, whereas active methods probe the target information purposively; both realize retina-like imaging with specially designed sensors. Software-based methods instead apply the LPT algorithm directly to conventional Cartesian imaging sensors, which simplifies the implementation.
The passive approach to retina-like imaging has been used since the 1990s. Tistarelli et al. [21] designed a retina-like CCD sensor to search for targets in a wide FOV; it has 30 rings with 64 sectors per ring, and the size of its 1920 pixels increases from 30 µm to 412 µm. In 2000, Sandini et al. [22] realized a retina-like CMOS sensor with 8013 pixels arranged according to the log-polar law and 845 pixels in the fovea. Compared with CCD, CMOS offers more controllable data storage and easier interfacing with microprocessors, making it more suitable for simulating the acquisition of retinal information. In 2014, our group [23] used a nonuniform lens array to collect light onto light-sensitive materials in a log-polar configuration. This method shifts the difficulty to the lens array design, which makes it more flexible to adjust to different situations. In 2016, a camera array [24] and a dual-channel foveated imaging system [25] were proposed by Santacana et al. and Xu et al., respectively. The camera array, formed by prisms and cameras, combines the advantages of compound eyes and realizes a high-resolution fovea with a low-redundancy periphery; it reconstructs a super-resolved central image from the repeated overlap of multiple cameras. The all-reflective, dual-channel foveated imaging system realizes retina-like variable-resolution imaging with double barrels and double sensors; it combines two imaging systems of different focal lengths that share a common light channel. Compared with a camera array, it has a more compact structure, but subsequent image processing is needed to form the final image. This approach was improved in 2017 by Guillem et al. [26], who designed an imaging system with dual-aperture optics that superimposes two images on a single sensor, enabling arbitrary magnification ratios with a relatively simple system.
This method fully utilizes the sensor's pixels, which matters in applications where the cost per pixel is high. In the same year, a micro-lens array was used to form foveated imaging [27]: a 3D-printed micro-lens array with different focal lengths in a 2 × 2 arrangement was placed directly on the CMOS sensor. It achieves a full FOV of 70° with an angular resolution increasing up to two cycles/deg in the center of the image. Differently from the methods above, a foveated imaging system with a reflective spatial light modulator was proposed by Wang et al. [28]. It corrects the aberrations of the region of interest (ROI) while the resolution of the other areas remains low, and it adjusts the fovea location with different phase patterns. In 2019, Wang et al. [29] proposed a foveated imaging system with a liquid crystal lens, which modulates light selectively with a variable-focal-length liquid crystal lens. In 2020, Cao et al. [30] realized bioinspired, zoom-eye-enabled, variable-focus imaging with a deformable polydimethylsiloxane lens array. Similar to the lens of the human eye, it clearly images targets at different distances; its FOV reaches 180°, and its focal length can be tuned from 3.03 mm to infinity with an angular resolution of 3.86 × 10⁻⁴ rad.
Although the passive methods effectively select and receive target information, the efficiency can be further improved if the required information is extracted directly. Therefore, active methods of retina-like imaging have been proposed. In 2017, adaptive foveated single-pixel imaging was proposed by Phillips et al. [31]. Given that traditional single-pixel imaging requires a large number of measurements for image reconstruction, the target of interest is reconstructed in high definition with the help of a human-eye-like gaze function, which improves reconstruction efficiency. Our group combined the retina-like structure with light detection and ranging (LiDAR): space-variant scanning based on a micro-electro-mechanical system (MEMS) [17] and on an optical phased array [32] was proposed. Compared with traditional LiDAR imaging, this retina-like structure maintains high resolution in the fovea while enlarging the FOV. Importantly, these methods are simply an improvement of the scanning strategy and need no additional devices.
For software-based methods, the construction of algorithm models inspired by the human retina has attracted the attention of many researchers. In 2008, Paolo et al. [33] proposed a 3D reconstruction method based on the human retina that relies on the rotation and scaling invariance of the LPT to rebuild the view of a room. In the same year, Wong et al. [34] used the LPT to realize efficient panoramic imaging on an FPGA, decomposing a 256 × 256 omnidirectional image into a 128 × 128 panoramic image in log-polar coordinates. In 2016, Cheung et al. [35] used the visual attention mechanism of the human eye to solve the problem of high-precision recognition of small targets; compared with a fixed sampling model, this method halves the recognition error rate. Figure 3 summarizes the retina-like imaging methods. The software methods are the least difficult to implement and the most tightly integrated with other features, such as attention mechanisms or omnidirectional imaging. However, they incur a higher computational cost because they further process already-redundant information. In the hardware methods, both passive and active, the information is instead screened during the imaging process itself; imaging thus has low redundancy at the price of increased technical difficulty. A vertical comparison shows that more functional retina-like imaging has become the central issue of bionic imaging: especially after 2017, fovea gaze control, lens-like longitudinal focus adjustment, and multi-center imaging have been combined into retina-like imaging with good effect. The comparison of these methods is shown in Table 1. Passive methods, which require more modification of the imaging devices, are the most difficult. Active methods also compress the data volume, and traditional devices remain suitable for retina-like imaging.
The difficulty of the space-variant scanning strategy therefore lies in the tighter cooperation required between imaging elements. The software methods, which post-process the acquired image, have a simple structure and can be used flexibly in different situations; their disadvantage of poor real-time performance will be alleviated as computing power improves.

Applications
The above-mentioned methods have been used in many applications. For example, 2D or 3D imaging methods are the direct applications of retina-like sampling. With the development of deep learning and novel imaging methods, retina-like mechanisms are combined and used to obtain better performances than traditional methods. Here, we select several typical applications combined with retina-like imaging for illustration.

Three-Dimensional Acquisition and Reconstruction
The technology of 3D image acquisition and reconstruction converts a real scene into a mathematical model suited to computation, and such models play an auxiliary role in many research fields, including historical preservation [36], game development [37], architectural design [38], and clinical medicine [39,40]. The emphasis of 3D reconstruction technology is to obtain the depth and texture information of the target scene or object. Retina-like methods have attracted considerable interest because they fully combine the advantages of variable resolution and invariance to rotation and scale. Figure 4 shows a retina-like LiDAR; it obtains the 3D image using the time-of-flight principle, and the retina-like adaptive space-variant scanning method makes it more flexible. In [33], 3D reconstruction is conducted by means of the LPT: the original input image (Figure 5a) is divided into different regions (Figure 5b) according to line detection in log-polar coordinates. The main lines of Figure 5a are found through image processing, and the scene is divided into front wall, left wall, right wall, ceiling, and floor. The detected lines are then mapped back through the inverse LPT and fused with the original image. The optical center of the image is found via the vanishing point (i.e., the intersection of the lines). Finally, the depth information of the image is extracted according to the position of the optical center, and the 3D reconstruction is realized. Lee et al. [41] proposed a robust adaptive focus measurement operator based on the biological inspiration, data selection, and edge-invariance characteristics of the LPT, which is widely used in 3D shape recovery.
The signal-to-noise ratio (SNR) of the image is improved by the LPT, and a better corresponding focus plane is obtained, which determines 3D shapes more accurately. Experiments on simulated and real target image sequences indicated that the robust adaptive focus measurement operator is effective in the presence of various types of noise, including high noise variance and strong noise density. Image registration is a necessary part of 3D reconstruction. Masuma et al. [42] studied a multi-modal image registration algorithm based on the LPT, which handles in-plane translation, rotation, and out-of-plane translation; large-initial-displacement registration of 3D CT images to 2D single-plane fluoroscopic images is successfully performed. Ravi et al. [43] built a scale-, rotation-, and translation-invariant descriptor based on the LPT of Gabor filter derivatives for the automatic co-registration of 3D multi-sensor point clouds. The co-registration process, using data from a study area in Toronto, Ontario, Canada, consisting of a building surrounded by vegetation, bare land, paved roads, and parking lots, is shown in Figure 6. The source and target point clouds were collected from an unmanned aerial vehicle and mobile laser scanning, respectively. After multi-scale keypoints are extracted from the 3D point cloud data, scale-, rotation-, and translation-invariant keypoint descriptors are generated on the basis of the scaling and rotation invariance of the log-polar strategy and of derivative maps computed from local height patches around the keypoints. The keypoints of the height maps are then matched, and the registration results are obtained. Takeshi Masuda [44] used a log-polar height map (LPHM) to establish correspondences for the coarse registration of multiple range images; this map represents the shape of a local surface as a height mapping in log-polar coordinates relative to the tangent plane.
Figure 7 illustrates an overview of the LPHM-based registration method. For the input range images, the corresponding log-polar depth maps are established via the LPHM. Subsequently, the paired range images are robustly registered by the RANSAC algorithm, with incorrect correspondences eliminated as outliers. Finally, fine registration is completed on the basis of the coarse registration results. In skeleton-based 3D object reconstruction, a retina-like descriptor algorithm has been constructed on the basis of the attention mechanism of human eyes and their ability to deal with complex visual scenes; it improves the performance of the artificial bee colony algorithm in 3D object reconstruction. The retina-like descriptor also cooperates with the digital elevation model (DEM); the purpose is to use the variable-resolution characteristic of the human retina to achieve high resolution while keeping the 3D reconstruction as efficient as possible. Liu et al. [45] presented a continuative variable-resolution DEM (cvrDEM) for the representation of 3D terrain, which has retina-like high and variable resolution. The final cvrDEM product, with resolution varying from 0.004 m to 0.067 m, is displayed in Figure 8a; Figure 8c is the grid DEM used for performance comparison, with a resolution of 0.03 m per pixel. Figure 8b,d are enlarged views of the areas outlined by the rectangles in Figure 8a,c, respectively.

Target Tracking
The retina-like mechanism has a notable tracking effect and is also well suited to the field of computer vision [46,47]. Li et al. [48] introduced the LPT into target tracking, using its scale-transformation property for scale estimation without requiring multi-resolution pyramids. This method solves the problem of poor tracking performance for objects with obvious scale or appearance changes under Cartesian coordinates. Similarly, the human-eye-like high resolution of the fovea has been widely used in target-grasping tasks. Yamaguchi et al. [49] combined high resolution in the fovea with low resolution in the periphery; this separates the target to be grasped from the complex background and better eliminates the effect of rotation [50].
Several works have also been conducted to enhance robustness in target tracking; the general approach is correlation filter tracking. Conventional correlation filter-based tracking approaches only track targets with boxes parallel to the coordinate axes, and similarity transformation for rotated targets has rarely been addressed. Unlike conventional methods, Li et al. [51] converted the 4-degree-of-freedom (DoF) target tracking problem into a 2-DoF problem: they applied the LPT in correlation filter tracking and proposed a target tracking algorithm for large displacements. This method improves the robustness of target tracking.
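The core trick exploited by these trackers, that a rotation in the image becomes a circular shift along the θ axis of its log-polar representation, can be demonstrated with a toy experiment. The sketch below is not the implementation of [51]; it uses nearest-neighbor resampling on a random test image, and the grid sizes are arbitrary, but it recovers a known rotation angle by correlating the two log-polar maps along θ.

```python
import numpy as np

def to_log_polar(img, n_xi=64, n_theta=180):
    """Nearest-neighbour log-polar resampling about the image centre."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = min(cy, cx)
    xi = np.linspace(0, np.log(r_max), n_xi)
    th = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rho = np.exp(xi)[:, None]
    ys = np.clip(np.round(cy + rho * np.sin(th)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rho * np.cos(th)).astype(int), 0, w - 1)
    return img[ys, xs]

def rotate_nn(img, alpha):
    """Rotate by alpha (rad) about the centre, nearest neighbour, same size."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    ys = np.round(cy + (yy - cy) * np.cos(alpha) + (xx - cx) * np.sin(alpha)).astype(int)
    xs = np.round(cx - (yy - cy) * np.sin(alpha) + (xx - cx) * np.cos(alpha)).astype(int)
    ok = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out = np.zeros_like(img)
    out[ok] = img[ys[ok], xs[ok]]
    return out

rng = np.random.default_rng(0)
img = rng.random((129, 129))
lp0 = to_log_polar(img)
lp1 = to_log_polar(rotate_nn(img, np.deg2rad(30)))

# Rotation is now a circular shift along theta; recover it by correlation.
scores = [np.sum(lp0 * np.roll(lp1, s, axis=1)) for s in range(lp1.shape[1])]
alpha_est = np.argmax(scores) * 360 / lp1.shape[1]   # estimated angle, degrees
```

In the 2-DoF formulation, the same one-dimensional correlation replaces an exhaustive search over rotated bounding boxes, which is what makes the log-polar reparametrization attractive for correlation filter trackers.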
Sharif et al. [52] introduced a novel approach combining the scale-invariant feature transform (SIFT) and the LPT: SIFT solves the keypoint matching between frames, and the LPT eliminates the error caused by rotation. This kind of stability is also important in traffic sign recognition. Ellahyani et al. [53] used the LPT to recognize traffic signs: they first preprocessed the road sign images with mean-shift clustering, then classified the target with a random forest, and finally used a combination of LPT and cross-correlation for identification. The addition of the LPT effectively reduces the detection error for road signs and mitigates the accuracy loss caused by rotation. The approach proposed by Gudigar et al. [54] is similar: they first select the ROI, then apply the LPT to the target in the ROI, and finally apply SVM classification to the foveated target. An overview of this approach is shown in Figure 9. The retina-like mechanism also plays an important role for fast-moving objects. High-speed target detection is required in various scenarios, such as sports events and traffic monitoring. Conventional cameras record visual information over the entire exposure time through the shutter. By contrast, Zhao et al. [55] proposed a spike camera method based on the retina-like mechanism, which continuously monitors the light intensity of fast-moving targets and records the continuous spikes emitted by each pixel. By processing each spike sequence independently, they mitigated the frame loss caused by high-speed moving targets within the exposure time. On this basis, Zhu et al. [56] proposed a different spike camera that combines dynamic and static information, using the retina-like principle to record continuous spike data and reconstruct images.
This structure not only accurately reconstructs static targets but also has a good capture effect on high-speed moving targets. They also constructed a new spike dataset to provide a basis for subsequent research.

Deep Learning
The retina-like mechanism has also been fully applied to the field of deep learning. When humans need to pay attention to a certain area, they turn their eyes so that light from that area is concentrated onto the fovea, obtaining a high-resolution image. Several works based on this mechanism have been used to solve various problems. Itti et al. [57] proposed a model of the human visual attention mechanism, which introduced attention into the computer vision field. In 2017, Cheung et al. illustrated the foveal sampling grid of the biological primate retina through a deep learning computational model. This is contrary to the previous approach of designing machine learning models based on biological findings. The model was trained on a classification task to use the fewest fixations in a visual scene under background disturbance, and the tiling properties that emerged in the trained model's retinal sampling grid were examined. They found that this lattice resembles the primate retina: the sampling lattice depends on eccentricity, with a high-resolution foveal area surrounded by a low-resolution periphery, and under certain conditions these emergent characteristics are amplified or eliminated.
Many state-of-the-art attention-based classification methods are available for existing deep learning networks. However, they share a general limitation: a large amount of training data is required to achieve high accuracy. Dai et al. [58] proposed a guided attention recurrent network (GARN) based on retina-like scanning, as shown in Figure 10, to solve this problem. Multiple ROIs are obtained by scanning one image, and the instructive ROI selection jointly determines the label category; this guided multi-attention can be trained on a small dataset and still achieve high accuracy. In 2020, Xia et al. [59] proposed a novel periphery-fovea multi-resolution driving model based on the retina-like structure, which predicts the speed of a car from dashcam video. An overview of the model is shown in Figure 11. The peripheral vision module processes complete video frames at low resolution, while the foveal vision module selects subregions and uses high-resolution input from these regions to improve driving performance. The fovea selection module was trained under the supervision of the driver's gaze; adding high-resolution input from the predicted gaze position of a human driver greatly improves the driving accuracy of the model. The multi-resolution fovea model outperforms a single-resolution peripheral model with the same number of floating-point operations, and the periphery-fovea multi-resolution driving model achieves higher performance than conventional methods. The retina-like attention mechanism has also shown unique advantages in medical image processing. Hayashi et al. [60] used a retina-like sequential scanning method to segment medical images, shifting the center point of the retina to process at higher resolution the subregions of the medical image that are difficult to classify.
The accuracy of the overall segmentation is improved by increasing the segmentation accuracy of each subregion.
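The periphery-fovea decomposition used by these models can be sketched in a few lines: the full frame is downsampled for the peripheral stream, and a full-resolution crop around a gaze position feeds the foveal stream. The function below is a minimal illustration, not the architecture of [59]; the gaze position, crop size, and stride are hypothetical example values.

```python
import numpy as np

def foveate(frame, cy, cx, fovea=64, periph_stride=4):
    """Split a frame into a low-res peripheral view and a high-res foveal crop.

    (cy, cx) is a hypothetical predicted gaze position; systems like the
    driving model in [59] learn it from recorded driver gaze.
    """
    periph = frame[::periph_stride, ::periph_stride]   # coarse view of full FOV
    h, w = frame.shape[:2]
    y0 = int(np.clip(cy - fovea // 2, 0, h - fovea))
    x0 = int(np.clip(cx - fovea // 2, 0, w - fovea))
    fov = frame[y0:y0 + fovea, x0:x0 + fovea]          # full-resolution ROI
    return periph, fov

frame = np.zeros((256, 512))
periph, fov = foveate(frame, cy=128, cx=300)
# The two streams together carry ~9% of the pixels of the full frame.
```

The pixel budget illustrates the appeal of the scheme: the two streams together are far cheaper to process than the full-resolution frame, while the foveal crop preserves full detail where it matters.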
Different from target tracking, the task of object detection and recognition is not only to find the target but also to classify it. Retina-like mechanisms have made object detection and recognition markedly more effective.
Emmanuel et al. [61] defined retina-like search as a strategy for pursuing target detection accuracy. They divided the target detection task into two parts, as shown in Figure 12: searching for the location of the target and determining whether the target is at the searched location. They first transformed the image into log-polar coordinates to obtain the rough position of the target and then trained on the high-resolution foveal image to obtain accurate information about the target type. In 2019, Azevedo et al. [62] proposed a human-eye-inspired focusing algorithm named the augmented-range vehicle detection system (ARVDS), which uses a deep convolutional neural network (DCNN) to detect vehicles at different image scales. The image is captured by the front camera of a self-driving car, and slices of the image are obtained according to the projection of the car's waypoints into the image, simulating the way humans look ahead while driving. These image slices are enlarged and fed to the DCNN, which focuses on them and detects vehicles at long distance. Compared with detecting the entire image at multiple scales, this method requires less processing power. The ARVDS algorithm improves the average accuracy of long-distance vehicle detection from 29.51% on a single complete image to 63.15%. In 2020, Kim et al. [63] introduced the spiking neural network (SNN), which mimics the human visual system, into target detection and recognition and proposed the Spiking-YOLO network. Unlike the approach of Cao et al. [64], which converts a CNN model into an SNN, this method directly designs the SNN structure. Two optimization techniques are proposed at the same time: channel normalization and signed neurons with an imbalance threshold, both of which provide fast and accurate information transmission in deep SNNs.
The aforementioned method achieves performance equivalent to Tiny-YOLO, but its power consumption is extremely low, and it reaches an accuracy of 51.83% on the pattern analysis, statistical modeling, and computational learning visual object classes (PASCAL VOC) dataset.
Although existing deep learning methods excel at image classification tasks, they still have limitations for rotated objects; such problems are solved by the LPT. Esteves et al. [65] introduced the LPT into the structure of CNNs and proposed the polar transformer network. As shown in Figure 13, this structure differs from a conventional CNN: a polar origin predictor and a polar transformer module are added before the classification network, and these two modules eliminate the effect of rotation and scale transformation on accuracy.

Ghost Imaging
Ghost imaging (GI), a novel imaging technology, has attracted much attention due to its advantages, such as low cost, wide spectral range, and robustness to light scattering [66,67]. GI recovers scene information by correlating the modulated light patterns, generated by a pseudo-thermal source, a digital micromirror device (DMD), or another type of spatial light modulator [68], with the intensity of the light collected from the target scene. However, GI using traditional modulation patterns, such as random ones, cannot balance imaging efficiency and quality [4,69]. Inspired by the human eye, some researchers have therefore studied retina-like patterns for GI to improve its performance.
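The correlation at the heart of GI can be reproduced in a small simulation: the scene is never imaged directly; only the total intensity ("bucket" value) transmitted through each modulation pattern is recorded, and the image emerges from the covariance between bucket values and patterns. The sketch below uses random binary patterns and arbitrary sizes for illustration; it is not the retina-like variant, which replaces the uniform pattern grid with a space-variant one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16                                           # scene is n x n pixels
scene = np.zeros((n, n))
scene[5:11, 5:11] = 1.0                          # simple square target

m = 4000                                         # number of modulation patterns
patterns = rng.integers(0, 2, size=(m, n, n)).astype(float)  # random binary

# Single-pixel ("bucket") measurement for each pattern: total transmitted light.
bucket = patterns.reshape(m, -1) @ scene.ravel()

# Second-order correlation reconstruction: G = <B * P> - <B><P>.
G = (bucket[:, None, None] * patterns).mean(axis=0) \
    - bucket.mean() * patterns.mean(axis=0)
# G is proportional to the scene (here, roughly 0.25 * scene for binary patterns).
```

With independent binary patterns, the covariance between the bucket signal and pattern pixel (x, y) is proportional to the scene reflectance at (x, y), which is why the target square stands out in G while the background stays near zero. The trade-off that motivates retina-like patterns is visible here too: resolving the scene uniformly requires many measurements m.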
A strategy that exploits spatiotemporal redundancy was presented by Phillips et al. [31]. This strategy rapidly records the details of quickly moving features in the scene and accumulates the details of slowly evolving areas over several consecutive frames. The retina-like patterns are created by reformatting each row of a Hadamard matrix into a 2D grid of spatially variant pixel size, as shown in Figure 14D,E. Figure 14C,F show the imaging results of traditional and retina-like patterns; the retina-like structure enhances the detail in the foveal region at the expense of lower resolution in the periphery. They also demonstrated dynamic imaging with single and dual fovea areas experimentally. This method is a novel approach to compressive sensing and improves the performance of GI. A model of 3D GI combined with a retina-like structure (R-3DCGI) was described by our group to improve imaging efficiency, as shown in Figure 15 [70]. A signal generator (SG) triggers a pulsed laser (PL); random speckles are produced by the DMD illuminated by the PL and are projected onto the target. The one-dimensional time-resolved total intensity of the light reflected or scattered from the target is collected by a receiving lens (RL) and a time-resolved bucket detector (TBD), and the 3D image is obtained by combining images from different depth slices. The retina-like structure of R-3DCGI is based on log-polar coordinates, as shown in Figure 16, so retina-like properties such as scaling and rotation invariance are realized in R-3DCGI; a further advantage is a higher imaging efficiency than traditional 3D GI owing to the retina-like structure. A foveal GI based on deep learning, which realizes intelligent selection of the ROI for foveal imaging, was proposed by Zhai et al. [71].
By applying generative adversarial networks based on the SSD architecture to select the ROI intelligently, the imaging quality is improved, and a higher PSNR in the ROI than that of uniform-resolution GI can be achieved.
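The space-variant pattern construction used in the studies above, reformatting each Hadamard row into a grid of spatially variant pixel size, can be sketched roughly as follows. The fovea location, cell sizes, and image size here are illustrative assumptions, not the published parameters:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    return h

# Hypothetical 64x64 pattern: a 16x16 single-pixel fovea and 4x4
# super-pixels in the periphery (block-aligned for simplicity).
size, fov0, fov1, block = 64, 24, 40, 4

# Assign each image pixel to a measurement cell: fine cells inside
# the fovea, one coarse cell per block outside it.
cell = -np.ones((size, size), dtype=int)
next_id = 0
for y in range(fov0, fov1):
    for x in range(fov0, fov1):
        cell[y, x] = next_id
        next_id += 1
for by in range(0, size, block):
    for bx in range(0, size, block):
        patch = cell[by:by + block, bx:bx + block]
        if (patch < 0).all():            # peripheral block
            cell[by:by + block, bx:bx + block] = next_id
            next_id += 1

n_cells = next_id                        # 496 cells with these choices
H = hadamard(512)                        # next power of two >= n_cells

def retina_pattern(k):
    """Reformat Hadamard row k into a space-variant 2D pattern."""
    return H[k][cell]
```

Displaying `retina_pattern(k)` for successive `k` yields a Hadamard basis whose sampling density is concentrated in the fovea, which is the essence of the foveated single-pixel schemes discussed above.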
Gao et al. [72] proposed a novel compressive GI, called R-CSGI, inspired by the way human consciousness controls the eyes to acquire the ROI. This method uses prior imaging information from fast Fourier single-pixel imaging to achieve a better visual effect and higher imaging quality. The principle of R-CSGI is shown in Figure 17. A sequence of fast Fourier basis patterns is projected onto the object, and an image is reconstructed from the light intensities collected by a single-pixel detector. The ROI is then derived from this reconstructed image, and the real-valued random patterns used for compressive sensing reconstruction are generated according to the ROI. The advantage of this method is that the imaging quality obtained by the compressive sensing technique can be effectively enhanced because the ROI is located beforehand. A method based on a parallel architecture with a retina-like structure was proposed in our previous work [73], as shown in Figure 18. The retina-like patterns are divided into blocks, in contrast to previous work that applied whole retina-like patterns for sampling. This method processes the data of each block rather than the whole image, which improves the efficiency of the reconstruction algorithm, while the retina-like patterns enhance the imaging quality of the ROI at the same time. The above-mentioned studies were mostly based on the structural characteristics of the retina. In contrast, Qiu et al. [74] exploited the fact that human eyes have a poorer spatial resolution for blue than for red and green. They proposed using an ultra-low sampling ratio to sample the blue component of color images. Their results showed that 95% of the measurements can be saved in the acquisition of the blue component of natural images with a size of 256 × 256 pixels. This method is an alternative approach to realizing real-time, full-color imaging.
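The block-parallel idea can be illustrated with a toy simulation: the image is split into blocks, each block is reconstructed independently by correlation GI, and the ROI block receives more measurements than the periphery. The block layout and measurement counts below are illustrative assumptions, not the parameters of [73]:

```python
import numpy as np

rng = np.random.default_rng(1)

def gi_reconstruct(scene_block, n_meas):
    """Correlation GI over one block (sketch of the per-block step)."""
    h, w = scene_block.shape
    pats = rng.integers(0, 2, size=(n_meas, h, w)).astype(float)
    bucket = np.einsum("nij,ij->n", pats, scene_block)
    return (np.einsum("n,nij->ij", bucket, pats) / n_meas
            - bucket.mean() * pats.mean(axis=0))

# Hypothetical 64x64 scene split into a 2x2 grid of 32x32 blocks;
# the ROI block (top left) gets more measurements than the others.
scene = rng.random((64, 64))
n_meas = {(0, 0): 4000, (0, 1): 500, (1, 0): 500, (1, 1): 500}

recon = np.zeros_like(scene)
for (by, bx), m in n_meas.items():
    sl = (slice(by * 32, by * 32 + 32), slice(bx * 32, bx * 32 + 32))
    recon[sl] = gi_reconstruct(scene[sl], m)
```

Because each block is reconstructed from patterns of its own size, the per-block problems are small and independent, which is what makes a parallel implementation and a non-uniform measurement budget straightforward.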

Discussion
Section 2 described the methods of retina-like imaging, including hardware and software, and the overview of practical use in Section 3 showed that various applications exploit the properties of retina-like imaging. Although the theory of retina-like imaging is relatively mature, some challenges remain in both methods and applications. Here, we select several typical challenges for discussion. The first is the optimization of the multiple parameters of retina-like imaging. According to Equation (2), these parameters include the number of rings (M), the number of sectors (N), the radius of the blind hole (r_0), and the maximum radius of the outermost ring (r_max). These parameters are correlated and must be designed for the practical application. For example, from the viewpoint of electro-based methods, the fill factor (FF) should be given more consideration to achieve high optical efficiency. Meanwhile, pixel crosstalk noise should be considered because the space-variant size of the photosurfaces leads to a low SNR. A space-variant lens array has been proposed to increase the distance between neighboring pixels [23] and solve the above-mentioned issues. As a result, the size of each pixel is the same, which is beneficial for fabrication. However, the optical aperture is supposed to be circular, so the maximum FF based on a space-variant lens array is π/4 owing to the physical limitations of lenses. Other lens-array structures have therefore been proposed to increase the FF [75]. In summary, optimizing or balancing the retina-like parameters is important for improving performance in different practical applications.
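As a concrete illustration of how these parameters interact, one common layout is geometrically spaced rings between r_0 and r_max, the standard log-polar choice; this sketch assumes that form and is not necessarily the exact expression of Equation (2):

```python
import numpy as np

def ring_radii(M, r0, rmax):
    """Geometrically spaced ring boundaries, a common log-polar choice."""
    q = (rmax / r0) ** (1.0 / M)      # constant ratio between rings
    return r0 * q ** np.arange(M + 1)

def pixel_centers(M, N, r0, rmax):
    """Polar centres (r, theta) of the M*N space-variant pixels."""
    radii = ring_radii(M, r0, rmax)
    r_mid = np.sqrt(radii[:-1] * radii[1:])       # geometric mid-radius
    theta = (np.arange(N) + 0.5) * 2 * np.pi / N  # sector mid-angles
    return np.meshgrid(r_mid, theta, indexing="ij")

# Illustrative values: 32 rings, 64 sectors, blind hole of radius 2,
# outermost ring of radius 128.
R, T = pixel_centers(M=32, N=64, r0=2.0, rmax=128.0)
```

The constant ring ratio `q` makes pixel area grow with eccentricity, which is exactly why M, N, r_0, and r_max cannot be chosen independently: together they fix both the foveal resolution and the peripheral fill factor.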
Second, improving the performance of retina-like GI is also a challenge. From the viewpoint of the GI mechanism, methods based on retina-like patterns offer a potential way to balance the trade-off between the resolution and imaging efficiency of GI. Compared with space-invariant patterns, retina-like GI is efficient for improving imaging quality and shortening imaging time. For example, super-resolution reconstruction methods, such as sub-pixel shifting, are effective in improving the spatial resolution limited by the hardware (e.g., the micromirror size of the DMD). Meanwhile, the time cost is greatly reduced when a parallel GI structure is adopted. Rotation and scaling invariance, the main features of the retina-like structure, can be used in GI systems to track moving objects, especially objects whose size and form of movement vary within the same FOV. With retina-like patterns, changes in the target need not be considered during image reconstruction. The complexity of data processing is also reduced, which improves the efficiency of GI with moving objects.
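The rotation invariance mentioned above follows directly from the log-polar mapping: a rotation of the scene about the optical axis becomes a circular shift along the angular axis of the log-polar image. A small numerical sketch, using a hypothetical analytic scene rather than any real sensor data, illustrates this:

```python
import numpy as np

def to_log_polar(img_fn, M=64, N=128, r0=1.0, rmax=100.0):
    """Sample a continuous image function on an M x N log-polar grid."""
    k = np.arange(M)
    r = r0 * (rmax / r0) ** (k / (M - 1))       # log-spaced radii
    th = np.arange(N) * 2 * np.pi / N           # uniform angles
    R, TH = np.meshgrid(r, th, indexing="ij")
    return img_fn(R * np.cos(TH), R * np.sin(TH))

# Hypothetical scene defined as a continuous function of (x, y).
def scene(x, y):
    return np.cos(0.1 * x) * np.sin(0.13 * y)

def rotated(x, y, a):
    # The same scene rotated by angle a about the origin.
    c, s = np.cos(a), np.sin(a)
    return scene(c * x - s * y, s * x + c * y)

lp0 = to_log_polar(scene)
# Rotating the scene by exactly one angular bin (2*pi/128) circularly
# shifts the log-polar image by one column along the theta axis.
lp1 = to_log_polar(lambda x, y: rotated(x, y, 2 * np.pi / 128))
```

Because rotation reduces to a column shift (and, analogously, scaling reduces to a row shift), a tracker working in log-polar coordinates need not re-estimate the target's orientation or size at every frame, which is the efficiency argument made above.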
The third challenge is combining deep learning with retina-like features. Although the retina-like mechanism has been widely used in deep learning, it still has some limitations. For example, deep learning networks depend strongly on datasets, which restricts their application in many practical scenarios, such as defect detection, military target identification, and medical image processing. The retina-like imaging mechanism mitigates the effect of redundant data; however, how to select the information that needs to be focused on remains a challenge for deep learning. The application of the retina-like attention mechanism to spiking neural networks (SNNs) is also a challenging topic. SNNs originated in neuroscience and offer important guidance for the design of retina-like deep learning networks. Given their lightweight network parameters, SNNs are more suitable for deployment in real scenarios, and the combination of retina-like architectures and SNNs is promising.

Conclusions
The properties of the retina offer clear advantages in applications that simultaneously require a large FOV, high resolution, and real-time performance. We have reviewed the methods and applications that use retina-like mechanisms. On the one hand, retina-like imaging can be realized in hardware or software; software-based methods are easier to implement than hardware ones but are less efficient. On the other hand, applications including 3D acquisition and reconstruction, target tracking, deep learning, and GI have been introduced and their practical uses illustrated in detail. Compared with traditional 3D acquisition and reconstruction technologies, technologies that exploit the characteristics of retina-like imaging have achieved better results, and corresponding methods continue to be proposed. In target tracking, the value of retina-like imaging mainly lies in the reduction of image noise; moreover, for rotating and scaling targets, tracking with retina-like imaging is more robust. In deep learning applications, the retina-like mechanism is mainly reflected in three aspects. First, the retina-like gaze can be treated as an attention mechanism to enhance features. Second, retina-like rotation and scale invariance work well for multi-scale object detection and recognition. Finally, the accuracy of image recognition with retina-like imaging has been greatly improved. Research on retina-like GI (RGI) has focused mainly on adjusting the location and size of the region of interest and on its different imaging applications. Most current retina-like structures are applied to the design of illumination patterns for GI, and investigating how to realize non-uniform sampling by designing devices may further enhance the performance of RGI.
The discussion of these challenges and potential approaches clarifies the direction of retina-like imaging research, which will help in designing imaging systems with good performance.

Conflicts of Interest:
The authors declare no conflict of interest.