3.1. Optical Design
Our design aims to develop a lightweight, compact OST-HMD that delivers both a wide virtual-display FoV and a wide see-through FoV. As the image source, we use a 0.49-inch OLED micro-display with 1920 × 1080 resolution and 8 μm pixel pitch. Based on the spectral characteristics of the display, the optical system is designed to operate at three discrete wavelengths, 490, 590, and 660 nm, with 590 nm serving as the central wavelength.
In near-eye optical systems, especially those requiring binocular fusion, a large exit pupil offers critical advantages: it prevents FoV loss during eye rotation and accommodates a broader inter-pupillary distance (IPD) range, eliminating the need for mechanical IPD-adjustment mechanisms that would add complexity. Yet enlarging the exit pupil typically increases system volume and weight while making a large FoV harder to achieve. Balancing these trade-offs, we set the exit pupil size to 8 mm.
During optimization, we iteratively adjusted parameters such as the focal length, ultimately settling on 17 mm, which corresponds to an F-number of 2.1. The physical size of the image source is directly related to the field of view and the focal length of the optical system, which can be expressed as

$$\mathrm{FoV} = 2\arctan\left(\frac{L}{2f}\right)$$

where $L$ is the physical length of the image source and $f$ is the effective focal length of the optical system. Based on this relationship, the virtual-display field of view is determined to be 39° diagonally, with horizontal and vertical fields of view of 30° × 24°. This corresponds to a resolution of approximately 51 pixels per degree (PPD), which is close to the human visual acuity limit of approximately 60 PPD and is therefore capable of delivering highly detailed virtual imagery.
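The paraxial relation FoV = 2 arctan(L / 2f) and the F-number f / D can be checked numerically. The following is a minimal sketch, assuming that the active area of the display equals its nominal 0.49-inch diagonal; the function and variable names are illustrative and do not come from the design files.

```python
import math


def fov_deg(image_size_mm: float, focal_length_mm: float) -> float:
    """Full field of view in degrees from the paraxial relation 2*atan(L / (2 f))."""
    return 2.0 * math.degrees(math.atan(image_size_mm / (2.0 * focal_length_mm)))


def f_number(focal_length_mm: float, pupil_diameter_mm: float) -> float:
    """F-number defined as focal length divided by exit-pupil diameter."""
    return focal_length_mm / pupil_diameter_mm


FOCAL_LENGTH_MM = 17.0     # effective focal length stated in the text
EXIT_PUPIL_MM = 8.0        # exit-pupil diameter stated in the text
DIAGONAL_MM = 0.49 * 25.4  # assumption: active diagonal equals the nominal 0.49 inch

f_num = f_number(FOCAL_LENGTH_MM, EXIT_PUPIL_MM)
diag_fov = fov_deg(DIAGONAL_MM, FOCAL_LENGTH_MM)

print(f"F-number: {f_num:.3f}")
print(f"Diagonal FoV: {diag_fov:.1f} deg")
```

Under this assumption the F-number evaluates to 2.125 and the diagonal FoV to roughly 40°, close to the reported values; the relation ignores distortion, so it is only a first-order check.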
To accommodate users with refractive errors, we set the eye relief to 18.5 mm. Key system specifications are summarized in Table 1. To ensure high-quality imaging performance, the MTF at half the Nyquist frequency of the display is required to reach at least 10%. Because distortion in the virtual-imaging path does not degrade image sharpness, and because mature image-processing and display technologies can compensate for geometric distortion electronically, strict constraints on optical distortion are not imposed in this design.
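The spatial frequency at which the MTF requirement applies follows directly from the pixel pitch. A short sketch, assuming the usual definition of the Nyquist frequency as one line pair per two pixels (the helper name is illustrative):

```python
PIXEL_PITCH_MM = 0.008  # 8 um pixel pitch stated in the text


def nyquist_lp_per_mm(pitch_mm: float) -> float:
    """Display Nyquist frequency in line pairs per mm: one cycle spans two pixels."""
    return 1.0 / (2.0 * pitch_mm)


nyquist = nyquist_lp_per_mm(PIXEL_PITCH_MM)
half_nyquist = nyquist / 2.0  # frequency at which the 10% MTF target is evaluated

print(f"Nyquist: {nyquist:.2f} lp/mm, half Nyquist: {half_nyquist:.2f} lp/mm")
```

With an 8 μm pitch this gives a Nyquist frequency of about 62.5 lp/mm, so the MTF criterion is evaluated near 31 lp/mm.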
Given that such optical systems are typically mass-produced through injection molding, the prism module in this design is fabricated using an optical-grade resin. In this work, we employ APL5514ML from Mitsui Chemicals (Tokyo, Japan), a material known for its excellent optical transparency, low birefringence, high rigidity, and strong chemical stability. However, because all refractive elements share the same material, chromatic aberration cannot be fully corrected through refractive design alone. This limitation is one of the key motivations for introducing a metalens into the system.
As shown in Figure 2, the system consists of primary prism E1, auxiliary prism E2, and secondary lens L1. The virtual image is projected into the human eye through E1 and L1, while E1 and E2 jointly provide distortion-free optical see-through functionality. The primary prism E1 supplies the main optical power of the system and magnifies the image displayed by the micro-display. The secondary lens L1, placed between the micro-display and E1, is a plano-convex lens whose planar surface serves as the substrate for the metalens. On the opposite side of the primary prism, the auxiliary prism E2 is employed to correct distortion in the see-through optical path.
In this system, the primary prism E1 is designed with three optically effective surfaces to realize the folded optical path (TIR, reflection, transmission). This configuration is the minimal surface set required to fold the optical path within a 12 mm thickness while keeping the virtual and see-through channels separate. In the Zemax OpticStudio model, the virtual-image optical path is constructed using reverse ray tracing. Rays are launched from the pupil position of the human eye and propagate through surfaces S1–S3 of the primary prism and surfaces S4 and S5 of the secondary lens L1 before reaching the micro-display.
For the see-through optical path, forward ray tracing is employed. Rays originating from the real environment sequentially pass through surfaces S6 and S2′ of the auxiliary prism followed by surfaces S2 and S1 of the primary prism before finally entering the human eye.
In the virtual-display optical path, the prism geometry plays a critical role in determining both the optical performance and the manufacturability of the final design. It is therefore essential to control the structure of the primary freeform prism E1, ensuring that rays from all fields of view propagate correctly through the system and reach the user’s eye. Since global coordinates cannot be used directly in Zemax OpticStudio for such geometries, optimization requires the use of operands to read ray coordinates and constrain the ray paths, thereby guaranteeing correct ray propagation across the entire field.
For the freeform surfaces in Zemax OpticStudio, the extended polynomial representation is employed. The optimal freeform shape is obtained through a combination of least-squares optimization and ray-tracing-based evaluation. Because such off-axis systems impose strict constraints on the total optical path length, geometric distance alone is insufficient for describing the system’s behavior. Therefore, optical-path-related operands are typically used during optimization to constrain the ray path and consequently regulate the overall system length.
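For reference, an extended polynomial freeform surface of this kind is commonly written as a conic base term plus a polynomial expansion in the surface coordinates. The expression below is a sketch of that standard form; the symbols $c$ (vertex curvature), $k$ (conic constant), $A_i$ (polynomial coefficients), and $E_i$ (monomials in $x$ and $y$) are generic and are not taken from the prescription of this design.

```latex
z(x, y) = \frac{c\,r^{2}}{1 + \sqrt{1 - (1 + k)\,c^{2} r^{2}}}
          + \sum_{i=1}^{N} A_i\,E_i(x, y),
\qquad r^{2} = x^{2} + y^{2}
```

The coefficients $A_i$ are the variables adjusted during the least-squares optimization.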
For the see-through optical path, the auxiliary prism E2 is primarily responsible for compensating distortion. Its thickness must also be constrained to meet engineering and fabrication requirements. In the Zemax OpticStudio model, the see-through path is constructed using forward ray tracing. Rays originating from real-world objects sequentially pass through surfaces S6, S2′, and S1 before entering the human eye. For objects located at infinity in the real environment, the combination of the primary prism and auxiliary prism forms an afocal system. Because an afocal system generates parallel output rays, an ideal lens is inserted at the exit pupil position to simulate the human eye and enable image formation. The effective focal length of the ideal eye lens is set to 20 mm, and image quality is evaluated at the corresponding focal plane. The schematic diagram of the see-through optical system optimization is presented in Figure 3.
During the optimization, constraining the system thickness is a key consideration, and both distortion and chromatic aberration must be carefully balanced to achieve the desired performance. All prism optics are made of the optical resin APL5514ML. To ensure good augmented reality functionality, appropriate coatings are applied to the prism surfaces. As shown in Figure 2, surface S2 is coated with a partial-reflection film. After the light emerges from the micro-display, it first reaches surface S1, where the total internal reflection (TIR) condition is satisfied; therefore, no reflective coating is required on S1. Ray-tracing analysis confirms that the incident angles at surface S1 remain larger than the critical angle of the resin material (approximately 40.5°) across the entire 39° FoV, ensuring that the TIR condition is strictly maintained without leakage even for marginal rays. While folded systems are typically prone to ghosting due to multiple surface interactions, this design mitigates such artifacts by relying on strict TIR conditions at S1 rather than a semi-reflective coating, which prevents secondary reflections. Additionally, high-efficiency anti-reflection (AR) coatings are applied to the remaining prism surfaces to suppress stray light. After completing all optimization steps, the final system configuration is obtained, as illustrated in Figure 4.
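The TIR check at S1 follows from Snell's law, with the critical angle given by arcsin(n_outside / n_medium). The sketch below assumes a resin-air interface and derives the refractive index implied by the quoted 40.5° critical angle; that index is a back-calculated value, not a datasheet figure, and the incidence angles used in the example are hypothetical.

```python
import math


def critical_angle_deg(n_medium: float, n_outside: float = 1.0) -> float:
    """Critical angle (degrees) for light travelling from n_medium into n_outside."""
    return math.degrees(math.asin(n_outside / n_medium))


def implied_index(critical_deg: float, n_outside: float = 1.0) -> float:
    """Refractive index implied by a given critical angle against n_outside."""
    return n_outside / math.sin(math.radians(critical_deg))


def is_tir(incidence_deg: float, n_medium: float, n_outside: float = 1.0) -> bool:
    """True if the internal incidence angle exceeds the critical angle."""
    return incidence_deg > critical_angle_deg(n_medium, n_outside)


# Index back-calculated from the ~40.5 deg critical angle quoted in the text (assumption).
n_resin = implied_index(40.5)
print(f"Implied resin index: {n_resin:.3f}")

# Hypothetical incidence angles at S1, for illustration only.
for angle in (35.0, 45.0, 55.0):
    print(f"{angle:.1f} deg -> TIR: {is_tir(angle, n_resin)}")
```

In the actual design the incidence angles would be read from the ray trace for every field point and pupil position, and the smallest one compared against the critical angle.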
3.2. Metalens Design and Analysis
Next, a discrete multi-wavelength achromatic metalens design is employed. The primary goal is to identify suitable meta-atom geometries and spatial arrangements that together enable achromatic performance across multiple wavelengths. To achieve this, we select silicon nitride nanopillars (positive-type structures) along with their corresponding Babinet-inverted hollow counterparts (negative-type structures). The combination of these complementary unit-cell designs allows the resulting metalens to achieve high focusing efficiency while meeting the multi-wavelength achromatic constraints. Schematics of the 12 kinds of meta-unit architectures, including the nanopillars and their Babinet-inverted hollow structures, are presented in Figure 5.
In this design, we selected SiNx nanopillars on a SiO2 substrate for their high refractive index and fabrication compatibility. To ensure full phase coverage, we constructed a library of 12 symmetric unit-cell geometries (including square, circular, cross, and their Babinet-inverted counterparts) with a standard 350 nm period. The geometric parameters were optimized with specific constraints: First, the height was fixed at 1 μm to streamline fabrication and simulation, as transmission sweeps confirmed optimal efficiency at this value. Second, lateral dimensions were restricted (50–320 nm) to maintain sufficient air gaps from the cell boundaries, thereby preventing near-field coupling between adjacent meta-atoms.
Electromagnetic simulations were performed using Lumerical FDTD Solutions, producing the electric-field distribution of each unit cell. The phase response was extracted from the complex transmitted field using the angle function, and only structures with transmission above 95% were retained to form the initial phase library.
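The extraction and filtering step can be sketched as follows. This is an illustration only: it assumes that the 95% criterion refers to power transmission |t|² and must hold at every design wavelength, and the candidate names and complex coefficients are hypothetical placeholders rather than simulation results. Python's `cmath.phase` plays the role of the angle function.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def phase_and_transmission(t_complex: complex) -> tuple:
    """Return (phase in [0, 2*pi), power transmission |t|^2) for a complex coefficient."""
    phase = cmath.phase(t_complex) % TWO_PI
    transmission = abs(t_complex) ** 2
    return phase, transmission


def build_library(candidates: dict, min_transmission: float = 0.95) -> dict:
    """Keep candidates whose transmission exceeds the threshold at every wavelength."""
    library = {}
    for name, coefficients in candidates.items():
        results = [phase_and_transmission(t) for t in coefficients]
        if all(tr > min_transmission for _, tr in results):
            library[name] = [ph for ph, _ in results]
    return library


# Hypothetical complex transmission coefficients at 490, 590, and 660 nm.
candidates = {
    "pillar_w120": [cmath.rect(0.99, 0.40), cmath.rect(0.985, 0.30), cmath.rect(0.98, 0.25)],
    "pillar_w200": [cmath.rect(0.99, 2.10), cmath.rect(0.90, 1.70), cmath.rect(0.98, 1.50)],
}

library = build_library(candidates)
print(sorted(library))
```

Here the second candidate is rejected because its transmission at the middle wavelength falls below the threshold, even though it passes at the other two.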
To analyze and compare the multi-wavelength phase responses of the different geometries, an automated data-processing and visualization script was developed. The script imports simulation results for all solid and hollow structures at multiple wavelengths, normalizes the phase into a single $2\pi$ interval, and constructs a comprehensive phase response database. For visualization, kernel density estimation (KDE) with Gaussian smoothing is applied to obtain noise-suppressed phase distribution curves. As shown in Figure 6, solid structures (circle, ring, ellipse, rectangle, cross) are plotted in the upper panel and their Babinet-inverted counterparts below, using matched colors with different line styles for ease of comparison. This visualization provides quantitative insight into the phase behavior of the various meta-atom geometries and supports subsequent optimization of the achromatic metalens design.
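The normalization and smoothing steps can be illustrated with a small standard-library sketch. It assumes the phase is wrapped into [0, 2π) and uses a plain Gaussian kernel with a hand-picked bandwidth; the sample phases are hypothetical, and the original script may differ in range convention, bandwidth selection, and boundary handling.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap_phase(phi: float) -> float:
    """Wrap a phase in radians into [0, 2*pi)."""
    return phi % TWO_PI


def gaussian_kde(samples: list, grid: list, bandwidth: float) -> list:
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(TWO_PI))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical raw phases (radians) for one geometry at one wavelength.
raw_phases = [-3.0, -1.2, 0.1, 0.3, 2.9, 6.5, 7.0]
wrapped = [wrap_phase(p) for p in raw_phases]

step = TWO_PI / 200
grid = [i * step for i in range(201)]
density = gaussian_kde(wrapped, grid, bandwidth=0.3)

# Riemann-sum estimate of the density mass that falls inside [0, 2*pi].
mass_inside = sum(density) * step
print(f"Mass inside interval: {mass_inside:.2f}")
```

Because this kernel is not periodic, some density leaks past the ends of the interval; a wrapped kernel would be needed if the curves must be continuous across the 0 and 2π boundary.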
Based on the phase profile obtained from the binary diffractive surface, the metalens can be designed on the S4 planar substrate. Following the folded hybrid-optics design scheme, the phase distribution of the binary surface is shown in Figure 7, and the corresponding parameters are summarized in Table 2.
With the phase library established, the macroscopic metalens layout was generated by matching, at each spatial coordinate, the most suitable meta-atom from the library to the target phase profile shown in Figure 8.
To verify the achromatic performance and light utilization, a representative central sub-array of the constructed metalens was extracted and simulated at the three discrete design wavelengths (490, 590, and 660 nm). As shown in Figure 9, the phase distributions of this sub-array remain nearly identical across the three wavelengths, indicating a stable phase response with minimal wavelength dependence. Furthermore, consistent with the high-transmission selection criterion (>95%) described in the design process, the constructed device inherently ensures high light throughput and efficient wavefront modulation within the selected spectral range.
The optimization was governed by a specific figure of merit (FoM), defined as the weighted sum of the phase errors at the three discrete design wavelengths:

$$\mathrm{FoM} = \sum_{i=1}^{3} w_i \left| \varphi_{\mathrm{req}}(\lambda_i) - \varphi_{\mathrm{lib}}(\lambda_i) \right|$$

where $\varphi_{\mathrm{req}}(\lambda_i)$ and $\varphi_{\mathrm{lib}}(\lambda_i)$ denote the required hyperboloidal phase and the actual phase provided by the meta-atom library, respectively, at design wavelength $\lambda_i$, and $w_i$ is the corresponding weight. Considering the need for balanced color performance, we assigned equal optimization weights to all three wavelengths.
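The selection of a meta-atom at a single lens position can be sketched as a minimization of this weighted phase error over the library. The example below is a minimal illustration under stated assumptions: the error is taken as the absolute phase difference wrapped onto the circle, the equal weights are normalized to sum to one, and the library entries and target phases are hypothetical values rather than simulated data.

```python
import math

TWO_PI = 2.0 * math.pi


def wrapped_error(target: float, actual: float) -> float:
    """Smallest absolute phase difference on the circle, in [0, pi]."""
    d = (target - actual) % TWO_PI
    return min(d, TWO_PI - d)


def figure_of_merit(target_phases, atom_phases, weights) -> float:
    """Weighted sum of wrapped phase errors over the design wavelengths."""
    return sum(
        w * wrapped_error(t, a)
        for w, t, a in zip(weights, target_phases, atom_phases)
    )


def best_atom(target_phases, library: dict, weights) -> str:
    """Name of the library entry with the lowest figure of merit."""
    return min(
        library,
        key=lambda name: figure_of_merit(target_phases, library[name], weights),
    )


# Equal weights for 490, 590, and 660 nm; normalization to 1 is an assumption.
WEIGHTS = (1 / 3, 1 / 3, 1 / 3)

# Hypothetical library phases (radians) at the three wavelengths.
library = {
    "pillar_a": (0.50, 0.40, 0.35),
    "pillar_b": (3.10, 2.60, 2.30),
    "hole_c": (6.20, 5.90, 5.70),
}

# Hypothetical target phases at one lens position.
target = (0.10, 0.05, 0.02)

choice = best_atom(target, library, WEIGHTS)
print(choice, round(figure_of_merit(target, library[choice], WEIGHTS), 3))
```

Repeating this selection at every spatial coordinate of the target phase profile yields the full layout. Wrapping the error matters: an atom whose phase is close to 2π is a good match for a target near zero, which a naive subtraction would miss.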
Regarding efficiency, performing a full-wave simulation for the entire macroscopic metalens is computationally prohibitive. Therefore, we adopted a localized optimization strategy. By strictly constraining the transmission of selected meta-atoms to exceed 95%, we minimized absorption and scattering losses at the element level. This high local transmission, combined with the low phase error ensured by the FoM, guaranteed effective wavefront modulation, which was macroscopically validated by the high MTF values presented in Section 3.3.