6.1. Design of Cascaded DOEs for Focusing Different Incident Beams to Different Regions
Let in the input plane of the cascaded DOE, four input distributions
,
be defined, which correspond to Gaussian beams with the radius at the
level equal to
and the wavelength
, incident on this plane from different directions. Let the vectors defining the propagation directions of the beams
and
lie in the plane
and make angles
with the
z axis, and the corresponding vectors of the beams
and
lie in the plane
and also make angles
with the
z axis. The complex amplitudes of these beams in the plane
have the form
Let us consider the calculation of cascaded DOEs generating in the output plane
different uniform intensity distributions
,
for the incident beams of Equation (23). The four output distributions are centered at the origin of coordinates and correspond to a circle with a diameter of 2.3 mm, the contour of a square with a side of 2.3 mm, a cross consisting of two perpendicular segments 2.3 mm long, and a “rotated cross” consisting of the two diagonals of a square with a side of 2.3 mm (
Figure 2). The lines of the required output intensity distributions are 0.2 mm thick.
We will consider three design examples: a single DOE (located in the plane ) and cascaded DOEs consisting of two DOEs (located at and ) and three DOEs (located at , , and ). We will define the phase functions in the DOE planes on 512 × 512 grids with the step of (these parameters correspond to some of the available spatial light modulators, which can be used as DOEs). In this case, the side length of the square aperture of each DOE amounts to 9.216 mm.
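The grid parameters quoted above fix the sampling step: a 512-node grid spanning a 9.216 mm aperture implies a step of 18 μm. The short sketch below only reproduces this arithmetic and builds the node coordinates; centering the grid on the optical axis is an assumed convention, and the variable names are illustrative.

```python
import numpy as np

# Grid parameters quoted in the text
n_samples = 512        # grid nodes per side of each DOE
aperture_mm = 9.216    # side length of the square DOE aperture, mm

# Sampling step implied by these two numbers, in micrometres
step_um = aperture_mm / n_samples * 1e3

# Node coordinates in mm, centred on the optical axis (assumed convention)
x_mm = (np.arange(n_samples) - n_samples / 2) * step_um * 1e-3
```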
Let us note that at the chosen parameters, the incident beams strongly overlap in the planes of the DOEs. For example, after the propagation to the plane , the centers of the beams are displaced from the optical axis (the z axis) only by , which is significantly smaller than the radius of the beams. This overlap of the incident beams significantly complicates the problem of calculating the cascaded DOE.
The calculation of the phase functions of the DOE was carried out using the gradient method described above. As the error functional, the sum of functionals (14) was used, where the functionals
representing the difference between the required distributions and the ones generated at the input fields
were defined as
At each iteration, the derivatives of the error functional were calculated, which, according to Equation (17), correspond to the sum of derivatives of the functionals
. The calculation of the derivatives of the functionals
was carried out using Equation (12), where
is the complex amplitude of the field incident on the
m-th DOE in the case of the direct propagation of the incident beam
, and the function
is calculated through the backpropagation of the field
where
is the complex amplitude of the field in the output plane. In the optimization, the calculation of the functions
and
featured in the expressions for the derivatives of the functionals was based on the numerical calculation of the Fresnel–Kirchhoff integrals using the fast Fourier transform routine.
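As an illustration of the FFT-based propagation step, the following minimal sketch applies the paraxial (Fresnel) transfer function in the spatial-frequency domain. It is not the authors' implementation: the function name, the wavelength, and the propagation distance are assumptions chosen only for the demonstration, and no zero-padding is applied, so the FFT wraps the field around the grid edges. Backpropagation is obtained here by propagating over the negative distance, which applies the complex-conjugate transfer function.

```python
import numpy as np

def fresnel_propagate(field, wavelength, step, distance):
    """Propagate a sampled complex field over `distance` using the
    Fresnel transfer function (all lengths in the same unit)."""
    n_rows, n_cols = field.shape
    fx = np.fft.fftfreq(n_cols, d=step)   # spatial frequencies along x
    fy = np.fft.fftfreq(n_rows, d=step)   # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)
    k = 2 * np.pi / wavelength
    transfer = np.exp(1j * k * distance) * np.exp(
        -1j * np.pi * wavelength * distance * (FX**2 + FY**2)
    )
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Demonstration with ASSUMED values (mm): they are not taken from the paper
step = 0.018
wavelength = 0.633e-3
distance = 100.0
x = (np.arange(128) - 64) * step
X, Y = np.meshgrid(x, x)
beam = np.exp(-(X**2 + Y**2) / 0.3**2).astype(complex)

forward = fresnel_propagate(beam, wavelength, step, distance)
backward = fresnel_propagate(forward, wavelength, step, -distance)
```

Because the transfer function has unit modulus, the propagated field carries the same energy as the input, and the backward pass over the same distance recovers the input field up to rounding error.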
Figure 3 shows the calculated phase functions of one, two, and three DOEs. For each example, 8000 iterations with an exponentially decreasing step were performed (this number of iterations turned out to be sufficient for the method to converge). As initial values, phases equal to zero over the whole aperture were used. The calculation time on a standard PC (Intel Core i9 10920X CPU, 3.50 GHz) ranged from 30 min for the single DOE to approximately one hour for the cascade of three DOEs.
One can see that the calculated phase functions of the single DOE and of the first DOEs in the cascaded structures are close to zero (to the initial phase value) near the edges of the aperture. This is caused by the fact that the amplitude of the fields generated in the plane of the first DOE in the case of the input beams of Equation (23) is close to zero in the peripheral regions of the aperture. Since the derivatives of the error functional are close to zero in the regions with a small amplitude of the field, the phase functions changed only weakly in these regions and remained close to the initial zero value.
Figure 4 shows the calculated intensity distributions generated by the calculated single and cascaded DOEs at different incident beams of Equation (23). In order to characterize the quality of the generated distributions, let us use the energy efficiencies
and root-mean-square errors
. The energy efficiencies
describe the fraction of the energy
of the
j-th incident beam that arrives at the required region
. The root-mean-square errors
describe the root-mean-square deviation of the distribution
generated for the
j-th incident beam from the required distribution
in the region
covering all the required regions
and corresponding to a square with the side of 3 mm centered at the origin of coordinates. Here,
is the area of the region
and
is the average intensity in this region. The values of the energy efficiencies and root-mean-square errors for the designed DOE examples are presented in
Figure 4 above each of the calculated intensity distributions.
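The two quality measures can be sketched on a sampled intensity distribution as follows. The function names and arguments are illustrative. The efficiency follows the verbal definition given above (energy inside the required region divided by the energy of the incident beam). The normalization of the root-mean-square error is an assumption: the deviation is divided here by the mean required intensity in the evaluation region, which may differ from the exact expression used in the paper.

```python
import numpy as np

def energy_efficiency(intensity, target_mask, incident_energy, cell_area):
    """Fraction of the incident-beam energy arriving in the required region."""
    return intensity[target_mask].sum() * cell_area / incident_energy

def normalized_rmse(intensity, required, eval_mask):
    """RMS deviation inside the evaluation region, divided by the mean
    required intensity in that region (ASSUMED normalization)."""
    diff = intensity[eval_mask] - required[eval_mask]
    return np.sqrt(np.mean(diff**2)) / required[eval_mask].mean()

# Toy 4 x 4 example: the required pattern is a 2 x 2 block of unit intensity
required = np.zeros((4, 4))
required[1:3, 1:3] = 1.0
target_mask = required > 0
eval_mask = np.ones((4, 4), dtype=bool)   # evaluation region: whole grid

perfect = required.copy()                 # ideal output
uniform = np.full((4, 4), 0.25)           # same total energy, spread evenly

eff_perfect = energy_efficiency(perfect, target_mask, 4.0, 1.0)
eff_uniform = energy_efficiency(uniform, target_mask, 4.0, 1.0)
err_perfect = normalized_rmse(perfect, required, eval_mask)
```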
From
Figure 4, it is evident that the quality of the generated distributions increases with an increase in the number of DOEs. In particular, for the single DOE, the required distributions are generated with extremely large root-mean-square errors (being close to or even exceeding 100%) and at relatively low energy efficiencies (less than 54%). For a cascaded structure containing three DOEs, the root-mean-square error significantly decreases (the maximum error, which corresponds to the distribution
, amounts to 9.8%), and the energy efficiency exceeds 87%.
Thus, the presented examples demonstrate the advantages of cascaded DOEs over single ones in the problem of generating different required intensity distributions for different incident beams and confirm the high performance of the proposed design method.
6.2. Design of Cascaded DOEs for Classifying Handwritten Digits
In this subsection, we will consider the design of DOEs for classifying handwritten digits from the MNIST database [
25]. Let us start by considering the case of a single DOE. In the calculations, the input images of the digits were defined on a 56 × 56 grid with the step of
. The phase function of the DOE was defined on a 512 × 512 grid with the same step. Let the DOE and the output plane be located at
and
, respectively. Let us note that at the design wavelength
, the diffraction angle at a pixel of the input distribution amounts to
. In this case, the diffraction pattern from the pixel (with respect to the first minimum) at the distance
roughly covers the DOE aperture. In this regard, the chosen parameters ensure the “connection” of each pixel of the input image with all the pixels (grid nodes), at which the phase function of the DOE is defined.
In accordance with the design method, in the output plane, 10 spatially separated square regions
with the side length of 0.5 mm were defined, in which maximum energies for different input images of different digits have to be generated (see
Figure 5).
In the calculation, a training set containing 60,000 images of digits from the MNIST database was used. The DOE was calculated using batch training, with each batch containing 60 randomly chosen digits. As the error functionals, the quadratic error (QE) functional of Equations (14) and (17) and the softmax cross entropy (SCE) functional of Equations (14) and (20) were used. As the initial approximation for the DOE phase function, a random phase from the range
was chosen. In the DOE calculation, 10 epochs were performed, which took approximately 7 min on an NVIDIA GTX 1070 graphics card with 8 GB of memory. By an epoch, we mean training the DOE on 1000 batches that together contain all the images from the training set. The phase functions of the DOEs calculated using the QE and SCE criteria are shown in
Figure 6.
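The batching scheme described above (60,000 training images, 60 images per batch, hence 1000 batches per epoch) can be sketched as follows. This is only an illustration of the bookkeeping: the helper name and the fixed random seed are assumptions, and the sketch requires the number of images to be divisible by the batch size.

```python
import numpy as np

n_train, batch_size = 60_000, 60
rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility

def epoch_batches(n_items, size, rng):
    """Split a random permutation of the image indices into equal batches,
    so that one epoch visits every image exactly once."""
    order = rng.permutation(n_items)
    return order.reshape(-1, size)

batches = epoch_batches(n_train, batch_size, rng)   # one row per batch
```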
After training, “blind” testing of the performance of the calculated DOEs was performed using a test set consisting of 10,000 images not included in the training set. For each image from the test set, the generated intensity distribution was simulated, the energies (16) in the regions
were calculated, and then the input digit was determined using the maximum energy value. The testing results, in the form of confusion matrices and energy distribution matrices, are presented in
Figure 7. The element (
i,
j) of the confusion matrix contains the percentage of cases, in which an input image of the digit
j was recognized as the digit
i. Accordingly, the diagonal elements of these matrices contain the percentage of the correct classifications. Similarly, the element (
i,
j) of the energy distribution matrix contains the averaged energy (in percent) in the region
at an input image of the digit
j. The diagonal elements of this matrix correspond to mean energies (in percent) in the “correct” regions corresponding to each digit.
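The decision rule and the two matrices described above can be sketched as follows, given an array of region energies with one row per test image and one column per region. The function names are illustrative. Expressing the energies as a percentage of the total energy in all target regions is an assumption, since the text only states that the averaged energies are given in percent.

```python
import numpy as np

def classify(energies):
    """Recognized digit = index of the region with the maximum energy."""
    return np.argmax(energies, axis=1)

def confusion_matrix_percent(true_digits, predicted_digits, n_classes=10):
    """Element (i, j): percentage of images of digit j recognized as digit i."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (predicted_digits, true_digits), 1)
    col_totals = counts.sum(axis=0, keepdims=True)
    return 100.0 * counts / np.maximum(col_totals, 1)

def energy_distribution_percent(energies, true_digits, n_classes=10):
    """Element (i, j): mean energy share (percent) in region i for digit j.
    Shares are relative to the total energy in all regions (ASSUMPTION)."""
    share = 100.0 * energies / energies.sum(axis=1, keepdims=True)
    matrix = np.zeros((n_classes, n_classes))
    for j in range(n_classes):
        selected = true_digits == j
        if selected.any():
            matrix[:, j] = share[selected].mean(axis=0)
    return matrix

# Toy example with three classes and four test images
energies = np.array([[6.0, 3.0, 1.0],
                     [1.0, 8.0, 1.0],
                     [5.0, 4.0, 1.0],
                     [0.0, 1.0, 9.0]])
true_digits = np.array([0, 1, 1, 2])

predicted = classify(energies)
confusion = confusion_matrix_percent(true_digits, predicted, n_classes=3)
energy_matrix = energy_distribution_percent(energies, true_digits, n_classes=3)
overall_accuracy = 100.0 * np.mean(predicted == true_digits)
```

In this toy example, one of the two images of the digit "1" is misclassified as "0", so the corresponding column of the confusion matrix is split evenly between rows 0 and 1.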
For the DOE calculated using the QE criterion (
Figure 6a), the digit recognition accuracy varies from 93.9% for the digit “9” to 99.2% for the digit “1”. The overall classification accuracy (i.e., the ratio of the number of correctly recognized digits to the total number of digits in the test set) amounts to 97.2%. For the DOE calculated using the SCE criterion (
Figure 6b), the accuracy varies from 91.9% for the digit “8” to 99.5% for the digit “0”, and the overall accuracy equals 96.8%. Let us note that the achieved classification accuracy values are quite high for single DOEs. For the sake of comparison, the overall classification accuracies in Refs. [
3,
5,
21] achieved using cascaded structures containing 5–10 DOEs vary from 91.8% to 93.4%.
As noted above, for the DOE calculated using the SCE criterion, the overall classification accuracy turned out to be 0.4 percentage points lower (96.8% versus 97.2%). At the same time, the energy distribution matrix for this DOE is better. Indeed, from a practical point of view, an important parameter is the contrast, which shows how much the energy in the required region exceeds the energies in the other regions. Let us define the contrast for the digit
i as
where
are the elements of the energy distribution matrix. For robust determination of the “true maxima”, it is necessary for the contrast values
to exceed 0.1. According to the energy distribution matrix shown in
Figure 7b and corresponding to the DOE calculated using the QE criterion, the minimum contrast is achieved for the digit “9” and amounts to
. For the energy distribution matrix of
Figure 7d corresponding to the DOE calculated using the SCE criterion, the minimum contrast is also achieved for the digit “9” but is somewhat greater:
.
As an example,
Figure 8 shows a typical input image of the digit “3” and the corresponding energy distribution demonstrating a correct digit recognition.
Then, using the QE and SCE criteria, we designed cascaded DOEs comprising two DOEs located in the planes
and
. The output plane was located at
. All the other parameters (discretization, wavelength, and aperture sizes) coincide with the parameters of the examples considered above. The phase functions of the cascaded DOEs calculated after 10 epochs are shown in
Figure 9.
The confusion matrices and the energy distribution matrices for the designed cascaded DOEs are presented in
Figure 10. As before, the DOE performance was evaluated on a test set containing 10,000 images not included in the training set. By comparing the confusion matrices for single and cascaded DOEs (
Figure 7a,c and
Figure 10a,c), one can see an increase in the classification accuracy. The overall accuracy values for the cascaded DOEs calculated using the QE and SCE criteria amount to 98.0% and 97.6%, respectively. Thus, for the considered example, using a cascaded structure of two DOEs increases the classification accuracy by 0.8 percentage points for both criteria (from 97.2% and 96.8%, respectively). The energy distribution matrices for the cascaded DOEs (
Figure 10b,d) are also improved. In particular, minimum contrast values for the cascaded DOEs, which are also achieved for the digit “9”, amount to 0.19 and 0.31 for the QE and SCE criteria, respectively. These contrast values are more than 1.7 times greater than those for single DOEs.
Let us note that a further increase in the number of DOEs leads to only a marginal increase in the classification accuracy but enables improving the contrast values. In particular, for a cascaded structure consisting of three DOEs calculated using the SCE criterion, the minimum contrast amounts to 0.55, which is significantly greater than the value of 0.31 provided by the cascaded structure of two DOEs.
Another way to increase the DOE performance consists in increasing the number of the optimized parameters, which can be achieved by decreasing the step of the grid, at which the phase functions of the DOEs are defined. For example, a single DOE with the step size
(and the rest of the parameters coinciding with those of the single DOE examples considered above) calculated using the QE criterion provides the overall accuracy of 97.9% and minimum contrast of 0.16, which is considerably better than in the case of a single DOE with the larger step size of
[see
Figure 7a,b]. It is worth noting that this result is comparable with the performance of the cascaded structure of two DOEs with the
step size [see
Figure 10a,b].
From a practical point of view, it is important to discuss the misalignment issues that will inevitably occur when implementing cascaded DOEs (DNNs). It is known that alignment errors smaller than the neuron (DOE pixel) size have only a minor influence on the DNN performance [
3,
21]. Once the alignment error exceeds the neuron size, even slightly, the classification accuracy can be drastically reduced. It should also be noted that the longitudinal misalignment usually influences the performance of a DNN much less than the lateral (transverse) one [
21].
The cascaded DOEs studied in this work are no exception. In order to estimate the influence of DOE misalignment, as an example, let us consider the cascaded DOE comprising two DOEs and designed using the SCE criterion (
Figure 9c,d). The simulation results demonstrate that when the first DOE is laterally displaced by the vectors
(in the case of a fixed position of the second DOE), the overall classification accuracy remains greater than 95% (i.e., the decrease in the overall accuracy does not exceed 3 percentage points). The minimum contrast in this case also stays acceptable and exceeds 0.12. With a further increase in the lateral displacement, the accuracy and contrast decrease more significantly: for example, at the lateral displacement
, the overall accuracy and minimum contrast amount to 90.9% and 0.09, respectively. The lateral displacement of the second DOE influences the performance somewhat less, e.g., at the
displacement, the overall accuracy equals 96.3%, whereas the minimum contrast is 0.13. Similar to the results presented in [
21], the longitudinal misalignment is much less critical: for example, displacing each of the DOEs along the optical axis by 200 μm decreases the overall accuracy by no more than 0.1% while leaving the contrast virtually unchanged.
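A lateral misalignment test of this kind can be emulated numerically by shifting the phase function of one DOE on its grid before simulating the propagation. The sketch below is a hypothetical helper, not the procedure used in the paper: it handles only displacements by a whole number of grid steps and assumes that the region uncovered by the shift has zero phase. Sub-pixel displacements would require interpolation.

```python
import numpy as np

def shift_mask(phase, shift_rows, shift_cols):
    """Shift a phase mask by whole pixels; pixels uncovered by the shift
    are filled with zero phase (ASSUMED boundary handling)."""
    n_rows, n_cols = phase.shape
    shifted = np.zeros_like(phase)
    src_r = slice(max(0, -shift_rows), min(n_rows, n_rows - shift_rows))
    dst_r = slice(max(0, shift_rows), min(n_rows, n_rows + shift_rows))
    src_c = slice(max(0, -shift_cols), min(n_cols, n_cols - shift_cols))
    dst_c = slice(max(0, shift_cols), min(n_cols, n_cols + shift_cols))
    shifted[dst_r, dst_c] = phase[src_r, src_c]
    return shifted

# Toy example: displace a 4 x 4 mask by one pixel along the row index
phase = np.arange(16, dtype=float).reshape(4, 4)
displaced = shift_mask(phase, 1, 0)
```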