Modeling the Stereoscopic Features of Mountainous Forest Landscapes for the Extraction of Forest Heights from Stereo Imagery

Spaceborne stereoscopic systems have been developing rapidly in recent years, and the point clouds extracted from spaceborne stereo imagery have been used to measure forest spatial structures. These systems operate at different viewing angles and image spatial resolutions, which are two critical factors determining the quality of the derived point cloud. In addition, complex terrain is also a great challenge for the regional mapping of forest spatial structures using spaceborne stereo imagery. Although several theoretical models for simulating the multi-view spectral features of forest canopies have been developed, there is hardly any report of a stereoscopic analysis using these models, due to the limited size of the simulated forest scenes and the lack of a geometric sensor model (i.e., the physical relationship between two-dimensional image coordinates and three-dimensional georeferenced coordinates). The stereoscopic features (i.e., parallax), which are as important as the spectral features contained in the multi-view images of a targeted area, are the basis for the extraction of a point cloud. In this study, a new model, referred to as the LandStereo model, is proposed, which is capable of simulating the stereoscopic features of forest canopies over mountainous areas at landscape scales. The model comprises five parts: defining the mountainous forest landscapes, setting the sun-sensor observation geometry, simulating images, generating ground control points, and building the geometric sensor model. The LandStereo model was validated over three different scenes, including flat forest landscapes, bare mountain landscapes, and mountainous forest landscapes. The results clearly demonstrate that the LandStereo model performs well in simulating the stereoscopic features of both terrain and forest canopies at landscape scales.
The height of the forest canopy top extracted from the simulated stereo imagery was highly correlated with the truth over flat terrain (R2 = 0.96, RMSE = 0.99 m) and over mountainous areas (R2 = 0.92, RMSE = 1.15 m). The LandStereo model provides a powerful tool to further our understanding of the relationships between forest spatial structures and the point clouds extracted from stereo imagery acquired at different view angles and spatial resolutions under complex terrain conditions.


Introduction
The forest spatial structure is a key determinant of the forest ecosystem functions due to the height-structured competitions for light [1,2]. High spatial resolution maps of three-dimensional (3D) structures of forests are a necessary prerequisite to characterize the impact of human and natural

Descriptions of the LandStereo Model
Ray-tracing is a typical technique for generating optical images in computer graphics [30] and is capable of producing visually realistic scenes. The Persistence of Vision Raytracer (POV-Ray) is an open-source ray-tracing program [31]. Some researchers have employed POV-Ray in the analysis of remote sensing datasets. For example, a model representing a crop canopy was developed using POV-Ray for the retrieval of leaf area index (LAI) [32]. POV-Ray simulations were also used in the analysis of the directional anisotropy of brightness surface temperature over vineyards [33]. Images of the Toulouse city center simulated by POV-Ray were used in modeling daytime thermal infrared directional anisotropy [34]. Most importantly, POV-Ray includes a scripting language, i.e., the scene description language (SDL). Using the SDL, users are freed from the details of the ray-tracing technique and can focus on the definition of the desired scenes, as well as the lighting and observation geometry.

Features of the Mountainous Forest Landscapes
A relative terrain model is automatically generated by default in POV-Ray, based on an input 8-bit or 16-bit gray image in a general format such as GIF, JPEG, PNG, or TIFF. However, images simulated based on the default relative terrain model can only be used for visualization purposes. They cannot be used for stereoscopic analysis like real remote sensing images, due to the lack of geographical information. It is therefore necessary to first define a geographical coordinate system and then build a georeferenced terrain model for the LandStereo model. The default coordinate system in POV-Ray is left-handed. A right-handed coordinate system can be built by rotating −90° along the x-axis and inverting the y-axis. The coordinates and height ranges are initially set from 0.0 to 1.0. The georeferenced terrain model is defined in the LandStereo model by the transformation of the relative terrain model, as

x = x_0,nw + x_i * n_c * ∆x (1)
y = y_0,nw − y_i * n_r * ∆y (2)
z = h_min + z_i * (h_max − h_min) (3)

where (x_i, y_i, z_i) and (x, y, z) are the coordinates of one point in the relative and georeferenced models, respectively; n_c and n_r are the numbers of samples and lines of the input gray image file, respectively; ∆x and ∆y are the pixel sizes along the x and y directions, respectively; (x_0,nw, y_0,nw) is the georeferenced coordinate of the northwest (upper left) corner of the input gray image file; and h_max and h_min are the maximum and minimum elevations, respectively.

In the LandStereo model, each tree is defined by its coordinates, diameter at breast height, tree height, crown width, and crown length. The crown shape of each tree could be described vividly using the SDL (https://f-lohmueller.de/pov_tut/plants/plants_410e.htm). In order to minimize the computational load, the crown shape model proposed by Christoph Hormann (the code is provided in the file ..\scenes\advanced\landscape.pov) is adopted in the LandStereo model [35].
A tree crown is described by a randomly carved ellipsoid, specified by the crown width and crown length. In order to create unique optical textures for each tree, two smaller randomly carved concentric ellipsoids with random sizes are placed inside the tree crown. The three concentric randomly carved ellipsoids are randomly rotated along their common vertical axis to generate different gap patterns within the tree crown and different shadow patterns. Figure 1 shows an example of an image simulated by describing the tree crowns using three embedded carved concentric ellipsoids. It is clear that different crowns exhibit different optical textures and shadows.
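The transformation from the relative terrain model to the georeferenced one, following Equations (1) and (2) together with the corresponding elevation scaling, can be sketched in Python; the corner coordinates below are hypothetical placeholders, not values from the paper:

```python
# Hypothetical metadata of the input gray heightfield image:
n_c, n_r = 9559, 8813                 # samples and lines
dx, dy = 1.0, 1.0                     # pixel sizes (m)
x0_nw, y0_nw = 500000.0, 4500000.0    # NW (upper left) corner, georeferenced (assumed)
h_min, h_max = 416.0, 1162.0          # elevation range (m)

def relative_to_georeferenced(xi, yi, zi):
    """Transform relative POV-Ray terrain coordinates (each in 0..1)
    into georeferenced coordinates: x/y by scaling with the image
    dimensions and pixel size, z by scaling with the elevation range."""
    x = x0_nw + xi * n_c * dx
    y = y0_nw - yi * n_r * dy
    z = h_min + zi * (h_max - h_min)
    return x, y, z
```

A point at the relative origin maps to the northwest corner at the minimum elevation, and (1, 1, 1) maps to the opposite corner at the maximum elevation.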
Spectral features of the mountainous forest landscapes also need to be defined for the simulation of vivid images. POV-Ray provides users with many options/tools to define the desired complex visual effects. Appendix A gives a brief description of how the spectral features of the forest landscapes are defined in the LandStereo model using the tools of POV-Ray. Please refer to the help documents of the software for more details.


Settings of the Observation Geometry
The georeferenced mountainous forest landscape can be defined in the LandStereo model by a terrain image and a tree list, as described in the last section. The next important step is to set the parameters of the observation system. POV-Ray provides example code to simulate photos by setting the desired photo size (i.e., the numbers of lines and samples) and the position and orientation of a frame camera. However, the images simulated by these code examples are for visualization purposes and are insufficient for stereoscopic analysis like real remote sensing images. First, the typical optical sensor onboard a satellite is a linear pushbroom camera rather than a frame camera. Second, the spatial resolution of the simulated images cannot be accurately calculated if the camera is defined as in the POV-Ray code examples, because critical physical variables, such as the focal length, are ignored. In the LandStereo model, the observation system is defined by two parts, i.e., the camera parameters and the observation geometry. Typically, a spaceborne stereoscopic mission works in pushbroom mode using a linear charge-coupled device (CCD) composed of small detector elements [36]. The field of view of a linear camera can be defined by three parameters: the focal length f, the dimension of a detector element ∆_c, and the number of detector elements n_c [36].
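From pinhole geometry, the full across-track field of view of such a linear camera is FOV = 2·arctan(n_c·∆_c / (2f)). A minimal sketch, with illustrative (not the paper's) parameter values:

```python
import math

def linear_camera_fov(f, delta_c, n_c):
    """Full across-track field of view (radians) of a linear pushbroom
    camera with focal length f (m), detector element size delta_c (m),
    and n_c detector elements, assuming simple pinhole geometry."""
    return 2.0 * math.atan(n_c * delta_c / (2.0 * f))

# Illustrative values: f = 1.0 m, 10-micrometer detectors, 3840 elements
fov_rad = linear_camera_fov(1.0, 10e-6, 3840)
```

The field of view, together with the orbit height, then fixes the swath width of the simulated image line.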
The observation geometry is defined by the positions of the light source and the view-point, as shown in Figure 2. The x-axis and the y-axis point to the east and north, respectively. The sun as a light source is defined by the elevation angle ϕ and the azimuth angle β. β is defined relative to the north direction and increases in the clockwise direction. Therefore, the azimuth angles for the sun in the east, south, and west should be 90°, 180°, and 270°, respectively. The view-point is defined by the orbit height h, the orbit heading angle γ, and the off-nadir view angle θ along the orbit, as shown in Figure 2. A positive value of θ means looking forward along the flying direction. The starting point of the observation is set by the coordinates of the central point of the first image line, i.e., (x_c,0, y_c,0, z_c,0). The coordinates of the view-point for each image line (x_c,j, y_c,j) can be calculated from the observation geometry as

x_c,j = x_c,0 + j * ∆ * sin γ (4)
y_c,j = y_c,0 + j * ∆ * cos γ (5)
∆ = ∆_c * h / (f * cos θ) (6)

where ∆ is the pixel size at ground level (or the ground sampling distance) and j is the line sequential number.

Figure 2. Schematic diagram of the observation geometry. The x-axis and y-axis point to the east and north, respectively; ϕ and β are the elevation angle and the azimuth angle defining the geometry of the light source; γ and θ are the orbit heading angle and the off-nadir view angle, defining the observation geometry; h is the orbit height.
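The ground sampling distance relation of Equation (6) and the per-line advance of the view-point can be sketched as follows; the heading convention is assumed to be clockwise from north, and the numeric values in the comments are illustrative only:

```python
import math

def ground_sampling_distance(delta_c, h, f, theta_deg):
    """Pixel size at ground level, Equation (6):
    GSD = delta_c * h / (f * cos(theta))."""
    return delta_c * h / (f * math.cos(math.radians(theta_deg)))

def viewpoint_xy(x_c0, y_c0, j, gsd, gamma_deg):
    """Center of image line j along an orbit with heading gamma
    (assumed measured clockwise from north): the platform advances
    one ground sampling distance per line."""
    g = math.radians(gamma_deg)
    return (x_c0 + j * gsd * math.sin(g), y_c0 + j * gsd * math.cos(g))

# e.g., delta_c = 1e-5 m, h = 500 km, f = 5 m gives a 1.0 m nadir GSD
gsd_nadir = ground_sampling_distance(1e-5, 500000.0, 5.0, 0.0)
```

For an off-nadir view the same detector covers a larger ground distance, which is why the detector size in the simulations is adjusted by Equation (6) to hold the image resolution fixed.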

Description of the Geometric Sensor Model
Once the mountainous forest scene is set up and the observation geometry is defined by user-provided parameters, multi-view optical images can be simulated by the LandStereo model using the ray-tracing technique. The next essential step for the stereo matching of the simulated multi-view optical images is to build the geometric sensor model. Building the geometric sensor model is one of the main tasks in photogrammetry. Most optical models for the simulation of the spectral features of forest canopies do not include this step, since these models are not intended to simulate stereoscopic images. A geometric sensor model describes the functional relationship between the two-dimensional image space (line, sample) and the three-dimensional georeferenced object space (latitude, longitude, elevation) [37]. The rational function model (RFM) is a kind of geometric sensor model that uses ratios of polynomials to establish the relationship between the image space and the georeferenced object space. Appendix B gives the detailed form of the RFM.
The RFM is sensor-independent: only the rational polynomial coefficients (RPC) need to be updated for images acquired by different satellites. Therefore, the RFM is broadly used in state-of-the-art digital photogrammetric software packages. Typically, the RPC comprise 90 coefficients, including the 10 parameters used in the coordinate normalization and the 80 polynomial coefficients. For the stereo images simulated by the LandStereo model, the 10 normalization parameters can easily be determined from the simulated image size, the spatial coverage, and the elevation range of the imaging area, while the 80 polynomial coefficients need to be estimated. The iterative least-squares solution proposed by Tao and Hu [37] is used in the LandStereo model to solve for the 80 polynomial coefficients using ground control points (GCP), whose image coordinates and object (georeferenced) coordinates are known.
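The core of the RFM is the evaluation of one normalized image coordinate as a ratio of two third-order polynomials in the normalized object coordinates. A minimal sketch of that evaluation is given below; note that the ordering of the 20 monomial terms varies between conventions, so the ordering here is illustrative rather than the one used in any particular RPC format:

```python
import numpy as np

def rfm_terms(P, L, H):
    """The 20 monomials of a third-order RFM polynomial in normalized
    latitude P, longitude L, and height H (term ordering is illustrative)."""
    return np.array([1.0, L, P, H, L*P, L*H, P*H, L*L, P*P, H*H,
                     P*L*H, L**3, L*P*P, L*H*H, L*L*P, P**3, P*H*H,
                     L*L*H, P*P*H, H**3])

def rfm_project(num, den, P, L, H):
    """Evaluate one normalized image coordinate as the ratio of two
    20-term polynomials, the defining operation of the RFM."""
    t = rfm_terms(P, L, H)
    return float(np.dot(num, t) / np.dot(den, t))
```

Fitting the RPC then amounts to solving for `num` and `den` (with the constant denominator term fixed at 1) that minimize the residuals at the GCP, which is what the iterative least-squares procedure does.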

Generation of the Ground Control Points
In order to calculate the RPC of the geometric sensor model for a simulated image, the GCP need to be generated through the physical imaging process. The image coordinates of a ground point can be accurately calculated using the parameters of the observation geometry and its georeferenced coordinates, i.e., deriving (r_i, c_i) from (x_i, y_i, z_i) with known variables, including the central point of the first image line (x_c,0, y_c,0, z_c,0), the view angle along the orbit θ, the orbit heading angle γ, the focal length f, the ground sampling distance ∆, and so on. In the theory of photogrammetry, the GCP can be calculated exactly by the collinearity equations using the observation geometry defined in the last section. For the convenience of understanding, the geometrical relationship between a ground object and its corresponding image pixel is provided in Appendix C. It is suggested that the GCP be evenly scattered within the spatial coverage of the simulated image and its elevation range [37].
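The idea of computing image coordinates from ground coordinates can be illustrated with a deliberately simplified case, a nadir-looking pushbroom camera flying due south over flat terrain; this is only a sketch of the principle, not the model's full collinearity equations from Appendix C:

```python
def project_nadir_pushbroom(x, y, x_c0, y_c0, gsd, n_c):
    """Project a ground point (x, y) to image (line, sample) for a
    nadir-looking pushbroom sensor flying due south: the line index
    follows the along-track (southward) offset from the first line,
    and the sample index follows the across-track offset from the
    swath center. A simplified illustration only."""
    line = (y_c0 - y) / gsd
    sample = n_c / 2.0 + (x - x_c0) / gsd
    return line, sample
```

In the LandStereo model the same mapping additionally accounts for the off-nadir view angle, the orbit heading, and the elevation of the ground point.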
Once the multi-view optical images are simulated and the corresponding geometric sensor model expressed by the RPC is built, the stereo matching can be carried out using general commercial software, such as PCI, ERDAS, ENVI, and so on. The common points, i.e., the projections of one ground point on the images of different views, are first identified on the basis of texture matching among the multi-view optical images. The parallax is then measured from the identified common points. The elevation of a ground point can be calculated using the measured parallax and the geometric sensor model, i.e., the RFM.
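The relation between parallax and height can be illustrated with the textbook flat-terrain case: a point at height h above the reference plane is displaced along track by h·tan θ in a view with off-nadir angle θ, so the ground parallax between two views is p = h·(tan θ1 − tan θ2). This is only the idealized relation; the software actually triangulates through the RFM:

```python
import math

def height_from_parallax(p, theta1_deg, theta2_deg):
    """Height above a reference plane from the along-track ground
    parallax p (m) between two views with off-nadir angles theta1 and
    theta2 (deg); the textbook flat-terrain relation, not the
    RFM-based triangulation used by commercial packages."""
    return p / (math.tan(math.radians(theta1_deg)) -
                math.tan(math.radians(theta2_deg)))
```

For the ±20° forward/backward pair used later in this paper, one meter of canopy height produces roughly 0.73 m of ground parallax under this idealization.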

Simulated Landscapes
The LandStereo model has the capability to simulate the stereoscopic features of forest canopies over mountainous areas at landscape scales. In this paper, the model was verified on three scenes: (a) forest landscapes on flat terrain (case 1); (b) bare mountains without any vegetation (case 2); and (c) mountainous forest landscapes (case 3). Appendix D shows the detailed workflow of the model validation. The first scene examines the model's performance in simulating the stereoscopic features of forest canopies without terrain influences. The second scene validates the capability of the model to simulate the stereoscopic features of the terrain. The third scene is the most complicated and the closest to a real forest scene among the three.
The third forest scene was built using a digital terrain model (DTM) and a tree list. The DTM used in this study was extracted from an airborne LiDAR point cloud. The point cloud was classified into ground and non-ground points. The DTM, with a resolution of 1 m, was produced by the rasterization of the ground points. The size of the DTM was 9.559 km by 8.813 km. The maximum and minimum elevations were 1,162 m and 416 m, respectively, as shown in Figure 3.
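The rasterization step can be sketched as gridding the classified ground returns and averaging the elevations that fall in each cell; this is a minimal illustration (a production DTM workflow would also interpolate empty cells):

```python
import numpy as np

def rasterize_ground_points(points, x0, y0, res, n_cols, n_rows):
    """Rasterize classified ground points (x, y, z) into a DTM grid by
    averaging the elevations falling in each cell; cells that receive
    no ground return are left as NaN."""
    dtm_sum = np.zeros((n_rows, n_cols))
    count = np.zeros((n_rows, n_cols))
    for x, y, z in points:
        col = int((x - x0) / res)
        row = int((y0 - y) / res)   # y0 is the north (top) edge
        if 0 <= row < n_rows and 0 <= col < n_cols:
            dtm_sum[row, col] += z
            count[row, col] += 1
    return np.where(count > 0, dtm_sum / np.maximum(count, 1), np.nan)
```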
The tree list provided the coordinates, diameter at breast height (DBH), tree height, crown width, and crown length of each tree. It is difficult to measure each tree over such a large area. As pointed out in [38], the tree parameters can be generated using forest growth models, such as the SIBBORK model, which is a type of spatially explicit gap model [39]. The coordinates, species, and DBH of each tree are direct outputs of the forest growth model [39,40]. Regression relationships developed from field data are used to estimate the tree height, crown depth, and crown width from the DBH [41]. In total, 842,121 trees were generated, with a maximum tree height of 30.0 m and a minimum tree height of 5.0 m. The first scene was built by keeping the tree list but setting the elevations to a constant 772.0 m. The second scene was built by keeping the DTM but removing all trees. Table 1 lists the parameters used in the model simulation.
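The regression step, deriving the remaining tree attributes from the DBH output by the growth model, can be sketched with simple power-law allometries; the coefficients below are made up purely for illustration, since the paper's regressions are fitted to field data [41]:

```python
def tree_from_dbh(dbh_cm):
    """Estimate tree height, crown width, and crown depth (m) from
    DBH (cm) with power-law regressions; the coefficients here are
    hypothetical placeholders, not the paper's fitted values."""
    height = 1.3 + 1.2 * dbh_cm ** 0.75
    crown_width = 0.4 * dbh_cm ** 0.6
    crown_depth = 0.45 * height
    return height, crown_width, crown_depth
```

Applying such relationships to every stem in the growth-model output yields the complete tree list (coordinates, DBH, height, crown width, crown length) that drives the scene construction.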

Simulation Parameters
The first four lines are the metadata of the DTM, including the data dimensions (n_c, n_r), the spatial resolutions (∆x, ∆y), the elevation dynamic range (h_max, h_min), and the absolute georeferenced (UTM 51N) coordinates of the upper left corner (x_0,nw, y_0,nw). The fifth line defines the direction of the light source (ϕ, β). The sixth and seventh lines describe the camera parameters, including the focal length (f), the size of the sensing elements (∆_c), the number of sensing elements of the linear CCD (m_c), and the number of lines (m_r) to be simulated. The eighth to tenth lines depict the observation geometry, including the coordinates of the starting observation point (x_c,0, y_c,0), the flying height (h), the flying direction (γ), the view angle (θ), and the image resolution (∆). The elevation of the starting point (z_c,0) is read from the DTM at the position of the starting point. The size of the sensing elements in Table 1 is for the nadir view. For the non-nadir views, it was adjusted by Equation (6), according to the view angle, to achieve the specified image resolution.

The geometric sensor model was developed using the terrain-independent method [37]. The GCP were evenly distributed across the full extent of the simulated image, in a grid of 20 columns by 30 rows, with six elevation layers ranging from 500 m to 1200 m. The image coordinates and georeferenced coordinates of the GCP were calculated through the physical imaging process described in Section 3.2. The RPC were solved by fitting the RFM to the GCP through iterative least-squares estimation. The checking points were generated in a similar way, but with a higher density in each dimension, using 37 columns by 59 rows with 11 elevation layers. Therefore, in total, 3600 GCP were generated for the calculation of the RPC, and 24,013 checking points were generated for each simulated image to evaluate the accuracy of the geometric sensor model expressed by the RPC.
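The terrain-independent control-point scheme is simply an evenly spaced 3D grid over the image extent and elevation range; the extent values below are taken from the simulation settings (3.84 km by 6.0 km), while the use of image-plane units is an assumption for illustration:

```python
import numpy as np
from itertools import product

def grid_points(n_cols, n_rows, n_layers, x_range, y_range, z_range):
    """Generate an evenly spaced 3D grid of control points across the
    image extent and elevation range (terrain-independent scheme)."""
    xs = np.linspace(*x_range, n_cols)
    ys = np.linspace(*y_range, n_rows)
    zs = np.linspace(*z_range, n_layers)
    return [(x, y, z) for x, y, z in product(xs, ys, zs)]

gcp = grid_points(20, 30, 6, (0.0, 3840.0), (0.0, 6000.0), (500.0, 1200.0))
check = grid_points(37, 59, 11, (0.0, 3840.0), (0.0, 6000.0), (500.0, 1200.0))
# 20*30*6 = 3600 GCP; 37*59*11 = 24013 checking points
```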

Validation Results of the LandStereo Model
The spatial coverage of the simulated image was 3.84 km in width and 6.0 km in length, according to the simulation parameter settings in Table 1. The DTM within the simulated area is shown in Figure 4a. The canopy height model (CHM) produced from the tree list is shown in Figure 4b.

Accuracy of the Sensor Model
Figure 5 shows the accuracy of the geometric sensor model described by the derived RPC. Figure 5a-c show the maximum error of the geometric sensor model for the images simulated with view angles of 0°, 20°, and −20°, respectively, i.e., the maximum difference between the image coordinates of the checking points calculated by the physical imaging process and those predicted by the RPC. The horizontal axis is the number of iterations in the iterative least-squares estimation of the RPC. One set of RPC was produced in each iteration and used as the initial value of the next iteration. It can be seen that the accuracy of the RPC converged after 5-6 iterations. Figure 5d-g show the root mean square error (RMSE) of the geometric sensor model for the images simulated with view angles of 0°, 20°, and −20°, respectively. The maximum error of the GCP and the checking points converged at around 0.07 pixel in the Y direction and around 0.04 pixel in the X direction for all three view angles. The RMSE converged at around 0.03 pixel in the Y direction and around 0.02 pixel in the X direction for all three view angles. The accuracy was comparable with that reported in [37].

Figure 6 shows the simulated image of the first scene (i.e., the flat forest landscape). Figure 6a is the entire simulated nadir image in true color. Figure 6b is a subset of the nadir image marked by the black rectangle in Figure 6a. The tree crowns, shadows, and ground surface are clearly visible in Figure 6b.
Figure 6c-e show enlarged gray images (i.e., the summation of the red, green, and blue channels) simulated with view angles of 0° (nadir), 20° (forward), and −20° (backward), respectively. The red crosses in Figure 6c-e indicate the same ground position. It can be seen that the tree shadows (darker pixels) remained at the same position in all three view angles, but the positions of the tree crowns (pixels with medium gray values) relative to their shadows changed with the view angle along the vertical direction. Figure 6f is a composite of Figure 6c-e in the red, green, and blue channels, respectively. The displacement caused by the different view angles is more obvious in Figure 6f. The displacement exhibited in Figure 6c-f clearly demonstrates that the LandStereo model is capable of simulating the stereoscopic features (i.e., parallax) of forest canopies.
The simulated images were stereo matched using the digital elevation model extraction module provided in the commercial software ENVI. Figure 7 shows the spatial distribution of the point cloud (common points) extracted using the different view combinations. Figure 7a is a subset of the nadir image. Figure 7b is the point cloud identified from the stereo image pair of the nadir and forward views (NF), over the same area as Figure 7a.
The white pixels indicate that the pixel matching was successful between the image pair, while the black pixels indicate a failure in pixel matching. Figure 7c,d are similar to Figure 7b but were extracted from the angle combinations of NB (nadir and backward) and FB (forward and backward), respectively. Figure 7e is the composite of Figure 7b-d in the red, green, and blue channels, respectively.
It can be seen that more points were identified in the image pairs of NF and NB than in FB. This could be attributed to the changes of the image textures between the different combinations of observation angles. Stereo matching is easier on images with less change in texture, and the texture changes are more severe for images with larger differences in observation angles. The angle difference within NF and NB was only 20°, while that within FB was 40°. Therefore, it could be anticipated that fewer points would be identified in FB through automatic image matching than in NF and NB. Figure 7e shows the complementary effect between the different view angle combinations. The white pixels indicate that the common pixel was successfully identified in all three combinations, while the black pixels indicate that no common pixel could be identified in any of the view combinations. The colorful pixels indicate that a common pixel was identified in one or two view combinations. Comparing Figure 7e with Figure 7a, it is easily observed that the white pixels were mostly located within the forest gaps, while the colorful pixels were located on the tree crowns. Therefore, it could be concluded that the synergy of the point clouds extracted from different view combinations is helpful for increasing the point cloud density and further improving the description of the forest spatial structures.
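The dependence of parallax on the angle difference can be made concrete with the standard flat-terrain stereogrammetry relation; the following Python sketch is illustrative and not part of the LandStereo model itself. For two views on opposite sides of nadir (or one at nadir), a point at height h above the reference plane produces an along-track parallax of roughly h·(tan|θ1| + tan|θ2|).

```python
import math

def parallax_px(height_m, angle1_deg, angle2_deg, gsd_m=1.0):
    """Approximate along-track parallax (in pixels) of a point height_m
    above the reference plane, for two views tilted at angle1 and angle2
    (degrees) on opposite sides of nadir, or one view at nadir:
    p = h * (tan|a1| + tan|a2|), divided by the ground sample distance."""
    p_ground = height_m * (math.tan(math.radians(abs(angle1_deg)))
                           + math.tan(math.radians(abs(angle2_deg))))
    return p_ground / gsd_m

# A hypothetical 20 m tree top at 1 m GSD (the resolution used in this study):
p_nf = parallax_px(20.0, 0.0, 20.0)    # nadir-forward pair
p_fb = parallax_px(20.0, 20.0, -20.0)  # forward-backward pair
# The FB pair sees twice the parallax of the NF pair (40° vs. 20° convergence),
# consistent with its stronger texture change and harder automatic matching.
```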
Figure 8a shows the vertical distribution of the point cloud extracted over a taller forest stand. It could be seen that the point cloud was mainly located near the canopy top. This is reasonable because taller trees have larger crowns and higher canopy coverage, as described in the building of the forest scenes. Figure 8b shows the vertical distribution of the point cloud extracted over a forest stand with medium heights. The point cloud could capture the ground surface through forest gaps; therefore, the points were distributed between the ground surface and the canopy tops. Figure 8c shows the vertical distribution of the point cloud extracted over a shorter forest stand. More points were located on the ground surface than on the forest canopy top.

Accuracy of the Flat Forest Landscapes
In Figure 8, the blue line was the vertical distribution of the forest canopy tops calculated from the tree list. Figure 9 shows the forest height extracted from the synergy of multi-view stereo images simulated from the first scene (i.e., the flat forest landscape). Figure 9a is an image of the number of common points within a resolution cell of 30 m. As depicted previously, there should be 2700 points in total in each 30 m pixel, since the point cloud had a pixel size of 1 m. The results showed that the maximum number of common points identified was 2137, and a shorter forest had more common points than a taller forest. Figure 9b shows the image of the height of 90% of the points (H90), i.e., the number of common points below this height was 90% of the total number of points within a pixel. It could be observed from Figure 8 that the upper boundary of the vertical histogram of the common points was located near the forest canopy top, while some false common points occasionally appeared due to incorrect matches between the stereo image pairs, as shown at the elevation of 799.0 m in Figure 8a. Therefore, H90 was selected as an index for describing the forest height.
Figure 9c shows the image of H90 produced from the tree list. The canopy height model (CHM) with a resolution of 1.0 m was first generated according to the tree list. Then, the vertical histogram of the CHM pixels within each 30 m by 30 m forest stand was calculated, and the height of H90 was determined in the same way as that of Figure 9b. Figure 9d shows the scatter plot of Figure 9b against Figure 9c. They were highly correlated with each other (R² = 0.96 and RMSE = 0.99 m).
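An H90 index of this kind can be sketched in a few lines of Python (the sample heights below are hypothetical, for illustration only): a percentile of the point heights within a cell is robust to the few falsely matched points that occasionally appear above the canopy top.

```python
import numpy as np

def h90(point_heights):
    """H90: the height below which 90% of the common points within a
    cell fall; a percentile is robust to occasional false matches
    located well above the canopy top."""
    return float(np.percentile(np.asarray(point_heights, dtype=float), 90))

# Hypothetical 30 m cell: a few ground returns, a canopy around 18-20 m,
# and one false match far above the canopy top.
heights = [0.5, 1.0] + [18.0 + 0.1 * i for i in range(20)] + [35.0]
h = h90(heights)  # close to the canopy top, unaffected by the 35 m outlier
```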
Figure 10 shows the stereoscopic features of the bare terrain, simulated using the LandStereo model and the parameters of the second scene in Table 1 (i.e., bare mountains without any vegetation). The geometric sensor model was the same as that of the first scene, because the simulation parameters were not changed. Figure 10a shows the simulated gray nadir image. Figure 10b shows the stereoscopic image composed by taking the epipolar image of the forward view (in red) and that of the backward view (in green and blue). Stereo vision can be seen when Figure 10b is observed through red-blue 3D glasses. The colorful pixels were caused by the parallax. Figure 10c is the scatter plot of the elevations extracted from the simulated stereo imagery against the input DTM shown in Figure 4a. It could be seen that all pixels scattered along the 1:1 diagonal line; they were highly correlated, with R² = 0.9997 and RMSE = 1.32 m. Figure 10d shows the histogram of the elevation differences between the estimates and the true values. There is no doubt that the LandStereo model performed well in the simulation of the stereoscopic features of the terrain. Figure 11 shows the stereoscopic features of the mountainous forest landscape (the third scene) simulated using the LandStereo model and the parameters in Table 1.
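The two accuracy statistics used throughout the validation can be sketched as follows. Note that R² is computed here as the squared Pearson correlation, a common convention for such scatter plots; the text does not state its exact definition, so this is an assumption.

```python
import numpy as np

def r2_rmse(estimated, truth):
    """RMSE of the differences, and R^2 taken as the squared Pearson
    correlation (an assumed definition; see lead-in)."""
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(truth, dtype=float)
    rmse = float(np.sqrt(np.mean((est - tru) ** 2)))
    r2 = float(np.corrcoef(est, tru)[0, 1] ** 2)
    return r2, rmse

# A perfectly shifted estimate: correlation stays 1 while RMSE shows the bias.
r2, rmse = r2_rmse([601.0, 702.0, 803.0], [600.0, 701.0, 802.0])
```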
The geometric sensor model was the same as that of the first scene, because the simulation parameters had not changed. Figure 11a shows the simulated nadir image in true color. Comparing Figure 6a with Figure 11a, it can be seen that the terrain features are clearly exhibited in the latter. Figure 11b shows the stereoscopic image composed in the same way as Figure 10b. The subset of Figure 11b over the area covered by the white rectangle is shown in Appendix E. More colorful pixels could be observed, due to the complicated parallax of the forest canopies. Figure 11c shows the image of the number of common points in each forest stand (i.e., 30 m pixel). It could be seen that more common points were identified over the flat areas or gentle terrain, but fewer common points could be identified over the steep mountains. This should be attributed to the more severe changes of image textures and larger parallax over the steep mountains than over the flat terrain. Figure 11d shows the scatter plot of the heights of the forest canopy tops extracted from the multi-view stereo imagery against the heights of the forest canopy tops calculated from the tree list. The extracted forest heights and their measurements were highly correlated, with R² = 0.92 and RMSE = 1.15 m. Figures 6-11 show the results from the simulated stereo images of the three scenes, i.e., the flat forest landscape, the bare mountains without any vegetation, and the mountainous forest landscape. These results clearly showed the good performance of the LandStereo model in the simulation of stereoscopic features of forest canopies over mountainous areas at landscape scales.

Discussion
Generally, for remote sensing models that simulate the multispectral features of vegetation, comparing the simulated results with real data is an essential way to validate the model.
The inputs of such models are the forest scenes and the spectral information of the vegetation components, such as leaves and branches, while the output is the spectral data of a forest stand. We do not know what the simulated multispectral features of the forest stand should be if no real remote sensing data are used; therefore, a comparison of the simulated data with real data is necessary for the validation of such models. The LandStereo model proposed in this study was used to simulate stereoscopic features rather than multispectral features. The stereoscopic features of remote sensing images are directly determined by the ground surface elevation, the tree sizes, and the observation geometry. For a given observation geometry (i.e., the geometric sensor model expressed by the RPCs), the ground surface elevation and tree sizes should be correctly extractable from the simulated stereoscopic features if the proposed model is valid; if the model is invalid, it is impossible to extract them. The ground surface elevation (i.e., the digital surface model) and the tree sizes are the inputs of the proposed model, and these inputs are the "real data". Therefore, comparing the ground surface elevation and canopy height extracted from the simulated stereoscopic features with the model inputs is sufficient for the validation of the proposed model.
The development of the LandStereo model and the analysis of the model simulations were interdisciplinary. The simulation of the stereoscopic images belongs to the subject of computer graphics; the development of a geometric sensor model is a task of photogrammetry; and the extraction of the parallax by image matching is the work of computer vision. LandStereo combines the knowledge of these subjects to simulate stereoscopic images for the monitoring of forest spatial structures. The LandStereo model is not a simple collection of existing tools from different subjects; rather, it is a creative unification of interdisciplinary knowledge. Without this integration, it would be impossible to simulate images applicable to stereo matching. The LandStereo model can work as a virtual satellite to capture stereo imagery of predefined landscapes with the given system parameters shown in Table 1. This model can provide an important set of image simulations that help examine the impact of the viewing geometry on the measurement of forest structure from spaceborne stereogrammetry.
In this study, the crown shapes were modeled using three embedded carved concentric ellipsoids. This was not the only choice. POV-Ray has great power to simulate complicated and vivid forest scenes (http://hof.povray.org/). Many studies have worked on the simulation of realistic trees using POV-Ray, and many tree models have been developed (such as http://povrayscrapbook.awardspace.com/index.html). These trees could easily be incorporated into the simulations of the LandStereo model.
The stereo imagery simulated in this study was optical imagery, so the effect of the atmosphere on image quality is unavoidable. POV-Ray is effective at simulating atmospheric conditions such as fog, dust, haze, or visible gas (http://www.povray.org/documentation/view/3.7.0/252/#l129). Therefore, the LandStereo model could potentially be used to analyze the effects of atmospheric conditions on the measurement of forest spatial structures using stereo imagery.
Multi-view observations can produce both stereoscopic information and multi-angle spectral information (i.e., the bidirectional reflectance distribution function, BRDF). The LandStereo model simulates images using the ray tracing embedded in POV-Ray and thus has the potential to simulate the BRDF of a forest scene in the visible spectrum. This paper focused on the simulation of stereoscopic features, as indicated by the title; the model's performance on the simulation of the BRDF will be explored in future research.
There are two modes for the acquisition of stereo imagery by spaceborne stereoscopic systems. The first is along-track stereoscopic observation, i.e., the parallax information is extracted using images acquired by different cameras onboard the same satellite but pointing at different angles along the track. The ALOS/PRISM, the Chinese ZY-3, and ASTER all work in this mode. The second is stereoscopic observation by across-track satellite stereo pairs. Spaceborne stereoscopic systems with very high spatial resolutions, such as the WorldView imagers, work in this mode, because it is difficult to mount two cameras with large telescopes on the same satellite. It can be seen from Equations (4)-(6) that only the along-track angle θ was considered in determining the view direction. Therefore, the current version of the LandStereo model can only be used to simulate along-track stereoscopic observations; it would need to be modified to simulate stereoscopic observations from arbitrary viewing directions.
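The generalization amounts to adding an across-track angle to the view direction. The sketch below is an illustrative assumption: the local frame, sign conventions, and the decomposition d = (tan θ, tan φ, −1) normalized are choices made here, not the paper's Equations (4)-(6).

```python
import math

def view_direction(theta_deg, phi_deg=0.0):
    """Unit view vector in an assumed local frame (x: along track,
    y: across track, z: up), built from an along-track tilt theta and an
    across-track tilt phi via d = (tan(theta), tan(phi), -1), normalized.
    The current LandStereo model corresponds to phi = 0."""
    tt = math.tan(math.radians(theta_deg))
    tp = math.tan(math.radians(phi_deg))
    n = math.sqrt(tt * tt + tp * tp + 1.0)
    return (tt / n, tp / n, -1.0 / n)

fwd = view_direction(20.0)         # along-track only, as in the current model
side = view_direction(20.0, 10.0)  # across-track component for the extension
```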
Two types of cameras are typically used in the acquisition of remote sensing images, i.e., pushbroom and frame cameras. Pushbroom cameras are widely used on satellites, while frame cameras are widely used on airborne platforms, such as manned or unmanned aerial vehicles (UAVs). POV-Ray has the capability to simulate optical images acquired by both types of cameras. However, the LandStereo model proposed in this study only accurately defines the observation geometry of along-track pushbroom cameras; therefore, the current version cannot be used to simulate frame cameras. For the theoretical analysis of stereo imagery acquired by frame cameras, the LandStereo model will be enhanced by modeling the observation geometry of frame cameras in future work.
There are two scenarios for the calculation of RPCs, i.e., terrain-independent and terrain-dependent. Tao, C.V., and Hu, Y. (2001) reported that 80 GCPs were enough for the terrain-dependent scenario, because the accuracy would not improve much if more GCPs were used, and there was no need to spend more effort collecting additional GCPs considering the cost [37]. In this study, the physical sensor parameters were known, as shown in Table 1; therefore, the terrain-independent scenario was applied. In this case, the GCPs should be determined by the image grid and the elevation layers, so that the GCPs are evenly distributed across the full extent of the image. The number of GCPs can be freely determined, as long as it exceeds the minimum requirement.
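A minimal sketch of such a virtual GCP layout follows; the grid sizes, image dimensions, and elevation range used here are illustrative choices, not the paper's values. Each (sample, line, elevation) triple would then be projected through the known physical sensor model to obtain its ground coordinates before fitting the RPCs.

```python
import itertools
import numpy as np

def virtual_gcp_grid(n_samples, n_lines, elev_min, elev_max,
                     n_grid=10, n_layers=5):
    """Virtual GCPs for the terrain-independent RPC scenario: an even
    image-space grid crossed with several elevation layers spanning the
    expected elevation range, so the GCPs cover the full image extent."""
    samples = np.linspace(0, n_samples - 1, n_grid)
    lines = np.linspace(0, n_lines - 1, n_grid)
    elevs = np.linspace(elev_min, elev_max, n_layers)
    return [(s, l, z) for s, l, z in itertools.product(samples, lines, elevs)]

gcps = virtual_gcp_grid(3000, 3000, 600.0, 900.0)  # 10 x 10 x 5 = 500 GCPs
```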
The performance of the stereo matching algorithm is an important factor affecting the quality of the point cloud extracted from stereo imagery. Many different stereo matching algorithms have been developed in computer vision as well as in photogrammetry [42,43]. This study only used the one adopted by the commercial software ENVI in its digital elevation model extraction module. The accuracy of the canopy heights extracted from the point cloud might be improved if a more suitable stereo matching algorithm were used. It is worthwhile to evaluate the performance of different stereo matching algorithms on the extraction of forest canopy heights.
The parameters used in the model validation, as shown in Table 1, were not set for any specific satellite; rather, they approximate several on-orbit satellites. For example, the orbit height of 500 km approximates that of the Chinese ZY-3, i.e., 505.984 km [44]. The combination of the focal length, element size, and orbit height produced a spatial resolution of 1.0 m, which approximates that of the IKONOS satellite sensor [45]; the view angle was the same as that of SPOT5, i.e., ±20° [46].
One interesting phenomenon observed in Figure 8b,c was that the point cloud extracted from the combination of the forward and backward views was mostly located on the ground surface rather than on the forest canopy tops. Theoretically, the common part that could be seen from both the forward and backward views should have been the forest canopy tops, if the tops were relatively flat, so that the canopy tops would be the matching points. However, the results showed that it was difficult to identify common points on the forest canopy tops with an automatic image matching algorithm, due to the severe changes of the image textures of the forest canopy tops between the forward and backward views. Therefore, more common points were unexpectedly identified on the ground surface than on the forest canopy tops.

Conclusions
The number of spaceborne stereoscopic systems has been growing in recent years. However, these systems work at different viewing angles and image spatial resolutions, and a theoretical model is needed for the systematic analysis of their performance in depicting forest canopy structures, especially over complicated mountainous landscapes. Considering that most existing optical remote sensing models focus on the simulation of multispectral features, this study proposed a new model, referred to as the LandStereo model, which uses the ray tracing technique to simulate the stereoscopic features of mountainous forest landscapes. The model was validated by simulations over flat forest landscapes, bare terrain landscapes, and mountainous forest landscapes. For the flat forest landscape, the heights of the forest canopy tops were extracted from the simulated stereo imagery with R² = 0.96 and RMSE = 0.99 m; the elevations of the ground surface were extracted from the simulated stereo imagery of bare ground with R² = 0.9997 and RMSE = 1.32 m. Given the elevation of the ground surface, the heights of the forest canopy tops were extracted from the simulated stereo imagery of the mountainous forest landscapes with R² = 0.92 and RMSE = 1.15 m. It can be seen that the simulated stereo imagery can be used for the extraction of ground surface elevations and forest canopy top heights. These results clearly demonstrate that the LandStereo model works well in simulating the stereoscopic features of both terrain and forest canopies at landscape scales.
The LandStereo model can work as a virtual satellite to capture stereo imagery of predefined landscapes with given system parameters. It is capable of creating more simulations to understand the influences of view angles, image resolutions, terrain conditions, and temporal changes on the extraction of forest heights using spaceborne stereo imagery over different forest types. Such studies will be carried out in our future research.