Article

Robust Diffractive Optical Neuromorphic System Created via Sharpness-Aware and Immune Training

1 School of Integrated Circuits, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
2 Shenzhen Engineering Laboratory of Aerospace Detection and Imaging, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Photonics 2026, 13(2), 139; https://doi.org/10.3390/photonics13020139
Submission received: 19 September 2025 / Revised: 27 October 2025 / Accepted: 28 October 2025 / Published: 31 January 2026
(This article belongs to the Section Optical Communication and Network)

Abstract

Diffractive deep neural networks (D2NNs) have garnered significant attention for their ultra-low energy consumption and parallel optical computing capabilities. However, their practical deployment is hindered by the “model–reality” gap caused by fabrication inaccuracy, device fluctuation, assembly misalignment, environmental perturbation, etc. Here, we propose a combined framework that integrates sharpness-aware minimization (SAM) and aberration-immune learning (AIL), enabling joint immunity against both stochastic noise and systematic deviations from the theoretical model used in offline training. Specifically, we show that under multiple perturbations such as salt-and-pepper noise, Gaussian noise, and wavefront aberration, the SAM–AIL framework achieves significant classification accuracy improvements on MNIST and Fashion-MNIST compared to conventional offline training approaches. D2NNs trained with the SAM–AIL scheme exhibited significant accuracy enhancements under moderate salt-and-pepper noise, Gaussian noise, and X-axis and Y-axis tilting perturbations. Our work provides an efficient solution for offline training and deploying high-robustness D2NNs on realistic physical systems that are resilient to a variety of imperfections, significantly enhancing model transferability and reliability for optical computing tasks.

1. Introduction

Optically implemented artificial neural networks are attracting growing interest due to their ultra-fast parallel processing capabilities and ultra-low energy consumption. Unlike their electronic counterparts, light propagation and interference inherently enable large-scale matrix operations and convolution computations, offering a novel hardware platform for deep learning inference [1,2,3,4]. Beyond spatial optics-based computing, other physical computing paradigms (microfluidic and micropneumatic computing) have emerged for specialized needs. For microfluidics, Clouse et al. developed a dragonfly-wing-inspired reservoir system (with 91% pattern classification accuracy via RGB dyes) [5], while Ahrar et al. realized electronics-free microfluidic control using pneumatic state machines and microvalves [6]. For micropneumatics, Picella et al. designed coding blocks for soft robot programming, and Hoang et al. created a < USD 1 fault detector—both without electronics [7,8].
While these microfluidic and micropneumatic approaches excel in scenarios like bio-compatible computing, harsh-environment deployment (no electronics), or low-cost fabrication, spatial optics-based computing remains the most promising for high-speed, large-scale parallel inference—particularly in deep learning tasks implemented via architectures like diffractive deep neural networks (D2NNs). D2NNs leverage multi-layer phase or amplitude modulation to sequentially transform free-space propagating optical fields, achieving automated inference tasks such as recognition and classification of complex targets with near-zero latency and minimal power consumption [9,10,11,12,13,14,15]. Figure 1 illustrates two representative hardware routes for realizing D2NNs. Panel (a) shows a free-space diffractive deep neural network (S-D2NN) based on multi-plane light conversion (MPLC), where a phase-only SLM and a mirror (M) implement multiple reprogrammable diffractive planes; the assembly is mounted on a multi-axis manual translation stage (MS) for alignment, and the output field is captured by a CCD. The advent of MPLC has given rise to pixel-level reprogrammable devices and very high effective neuron densities [16,17,18]. Panel (b) depicts an integrated diffractive deep neural network (I-D2NN), in which a DOE/photonic chip comprising stacked diffractive layers is placed between two microscope objectives and imaged on a CMOS sensor, consistent with recent on-chip demonstrations [19,20,21]. However, to train such D2NNs, existing approaches predominantly rely on in silico digital twins, where one needs to construct a physical model that accurately describes the light-wave dynamics, and the controllable parameters of the model are trained by data-driven backpropagation of a predefined loss function [3]. The optimized phase/amplitude profiles are subsequently loaded onto optical devices such as spatial light modulators (SLMs) or realized by delicately fabricated diffractive optical elements (DOEs) or metasurfaces to satisfy the desired optical forward propagation [22]. However, practical deployment is hindered by a persistent “model–reality” gap between the in silico-modeled expectations and the realistic experimental setup. This gap is primarily a product of fabrication error, device fluctuation, optical system assembly misalignment, wavefront distortion, environmental perturbations, etc. Such imperfections may lead to significant performance (e.g., inference accuracy) degradation simply because the digital twin model cannot be perfectly matched with—and represents only an approximation (which may be good enough) to—the real system [23,24].
To bridge this gap, researchers have essentially adopted several kinds of strategies: (1) incorporating noise or device mismatch/imperfection into the theoretical and simulation models during offline in silico training in order to enhance error resilience [25,26,27]; (2) implementing online or hybrid-model training by integrating real optical measurements into the forward and backward propagation loop for the learning process, thereby reducing dependency on precise hardware modeling [28,29]; and (3) substantially simplifying the architecture of the D2NN by reducing its depth and modulation discretization level. For example, Spall et al. demonstrated that a hardware-in-the-loop (HIL) training scheme can significantly improve robustness against steady-state noise by aligning error characteristics with the physical system [30]. Nonetheless, these HIL-involving methods often require iterative hardware access for sequential updating and/or multiple feed-forward measurements for gradient or loss evaluation. This may result in high computational costs and limited model portability across apparently similar optical devices and systems [21,31,32]. Non-gradient training schemes are possible but also heavily rely on iterative interrogation of the real system. Therefore, both the design space dimension and the speed of the feed-forward process impose strong limitations on their application [2].
To address these critical challenges in transitioning D2NNs from simulation to real-system deployment, here we propose a co-optimization framework that synergistically integrates sharpness-aware minimization (SAM) and aberration-immune learning (AIL) to achieve joint immunity against multi-type perturbations and imperfections, thereby achieving what we call “dual robustness.” Xu et al. were the first to establish a connection between the loss landscape and a physical neural network’s robustness; they also verified that this connection applies universally to different physical computing systems subject to both deterministic and non-deterministic errors [33,34]. Motivated by these findings, we also utilize AIL to dynamically inject Legendre aberrations (e.g., tilt and translation) into the input plane during the training stage, emulating fabrication inaccuracy and optical assembly errors to enhance the artificial neural network’s robustness with respect to optical misalignment [35]. The numerical results demonstrate that under certain perturbations, including salt-and-pepper noise ($p_{sp} = 0.20$), Gaussian noise ($\sigma = 1.5$), and/or X-axis and Y-axis tilting perturbations (∼1.0° rotation), the SAM–AIL training scheme achieves significant classification accuracy improvements on the MNIST dataset of handwritten digits [36] relative to common backpropagation methods. Substantial accuracy improvement was also observed for Fashion-MNIST [37] for D2NNs with one hidden layer. Similar results were obtained for three hidden layers (see Supplementary Document S2 for details). Our work provides practical, efficient, and computationally economical guidance for training and deploying high-robustness D2NNs in real-world applications.

2. The Proposed Training Method

2.1. Modeling Optical Forward Propagation in a D2NN

Let us first consider an image classification task that is implemented via a phase-only modulated optical diffractive network (see inset in Figure 2a). Following the scalar Huygens–Fresnel principle, the continuous propagation from a source point on layer l to an observation point on layer l + 1 is expressed as a surface integral. For pixelated modulators (SLM/DOE), this integral over the aperture is approximated by a Riemann sum, yielding a discrete “neuron-to-neuron” weighted summation. This discrete formulation directly leads to Equations (1)–(5). The light wave propagation process within D2NN is mathematically formulated as [22,26]
$$t_i^l = a_i^l \exp\!\left(j\phi_i^l\right), \tag{1}$$

$$n_{i,v}^l = w_{i,v}^l \cdot t_i^l \cdot m_i^l, \tag{2}$$

$$m_i^l = \sum_{k} n_{k,i}^{l-1}, \tag{3}$$

$$w_{i,v}^l = \frac{z_v - z_i}{r^2}\left(\frac{1}{2\pi r} + \frac{1}{j\lambda}\right)\exp\!\left(\frac{j 2\pi r}{\lambda}\right), \tag{4}$$

$$r = \sqrt{(x_v - x_i)^2 + (y_v - y_i)^2 + (z_v - z_i)^2}, \tag{5}$$
where $t_i^l$ is the transmission coefficient of the i-th neuron in layer l, composed of an amplitude term $a_i^l$ and a phase term $\phi_i^l$. In this work, we adopt phase-only modulation by default (i.e., $a_i^l = 1$, and $\phi_i^l$ is trainable), while the factorization in Equation (1) also covers amplitude–phase devices if needed. Here, i refers to a neuron of the l-th layer, and v refers to a neuron of the next layer, connected to neuron i by optical diffraction. Additionally, $m_i^l$ represents the input wave to the i-th neuron in layer l, formed by the summation of outputs from all neurons in the previous layer; $n_{i,v}^l$ denotes the output contribution from the i-th neuron in layer l to the v-th neuron in layer l+1; $w_{i,v}^l$ is the diffraction weight characterizing wave propagation between these neurons; $\lambda$ is the wavelength; and $j = \sqrt{-1}$.
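To make the discrete formulation concrete, the following NumPy sketch evaluates Equations (1)–(5) for a single phase-only layer. It is a minimal illustration rather than the authors' implementation: the 32 × 32 grid, the 8 µm pixel pitch, and all variable names are our own assumptions, whereas the 8 cm layer spacing and 532 nm wavelength follow Section 3.

```python
import numpy as np

# Minimal sketch of Eqs. (1)-(5): "neuron-to-neuron" free-space propagation.
# A small 32x32 grid is used purely for illustration; the paper uses 256x256 neurons.
wavelength = 532e-9          # light-source wavelength (m), as in Section 3
pitch      = 8e-6            # neuron (pixel) pitch (m) -- hypothetical value
distance   = 0.08            # layer spacing z_v - z_i (m), 8 cm as in Section 3
N          = 32              # neurons per side (illustrative)

# Lateral coordinates of neurons on two consecutive layers (same grid on both).
coords = (np.arange(N) - N / 2) * pitch
X, Y = np.meshgrid(coords, coords)
x = X.ravel(); y = Y.ravel()                     # flattened neuron positions

# Eq. (5): pairwise distances r between neuron i (layer l) and neuron v (layer l+1).
dx = x[None, :] - x[:, None]
dy = y[None, :] - y[:, None]
r  = np.sqrt(dx**2 + dy**2 + distance**2)

# Eq. (4): diffraction weights w_{i,v}^l (here z_v - z_i equals the layer spacing).
w = (distance / r**2) * (1.0 / (2 * np.pi * r) + 1.0 / (1j * wavelength)) \
    * np.exp(1j * 2 * np.pi * r / wavelength)

def propagate_layer(m_in, phi):
    """One diffractive layer: Eq. (1) phase-only modulation, then Eqs. (2)-(3) summation.

    m_in : complex field arriving at the layer, shape (N*N,)
    phi  : trainable phase profile of the layer,  shape (N*N,)
    """
    t = np.exp(1j * phi)          # Eq. (1) with a_i^l = 1 (phase-only)
    modulated = t * m_in          # field right after the modulator
    return w.T @ modulated        # Eqs. (2)-(3): each output neuron sums weighted contributions

# Usage: a plane wave through one randomly initialized phase layer.
m0   = np.ones(N * N, dtype=complex)
phi0 = np.random.uniform(0, 2 * np.pi, N * N)
m1   = propagate_layer(m0, phi0)
intensity = np.abs(m1.reshape(N, N))**2
```

For the 256 × 256 neuron resolution used in the paper, the dense weight matrix built above becomes impractical, and one would typically evaluate the same propagation with FFT-based diffraction operators instead.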
We note that most implementations of D2NNs, as modeled in Equations (1)–(5), are grounded in the Huygens–Fresnel principle and predominantly rely on computer-aided backpropagation to optimize the trainable parameters (i.e., the amplitude and phase profiles across layers), enabling ultra-low-energy classification or imaging via multi-layer diffractive phase plates. However, as illustrated in Figure 2a, discrepancies between theoretical models and physical systems may introduce substantial training errors [3].

2.2. Enhancing Device-Level Robustness Through Sharpness-Aware Minimization

The optical modulation at each layer, determined through computational simulations or in situ optical training, is physically realized using either active (e.g., spatial light modulator (SLM) [38,39,40]) or passive (e.g., diffractive optical element (DOE) [41,42,43] or metasurface [44,45,46]) devices. These optical devices often exhibit random phase errors that can be statistically modeled as a combination of continuous Gaussian noise and discrete salt-and-pepper noise. These phase deviations stem from multiple fabrication and environmental mechanisms. For SLMs, experimental deployment is often hindered by parameter drift due to material aging, calibration complexity, and operational instabilities. For DOEs and metasurfaces, systematic inaccuracies arise from fabrication limitations (e.g., quantization errors in digital-to-analog conversion and distortions induced by manufacturing tools) and environmental factors (e.g., temperature fluctuations, humidity variations, material expansion/contraction, and exposure-dose inconsistencies in recording materials). These issues are collectively categorized as device-level imperfections [19,20,47]. As illustrated in Figure 3a, such imperfections are modeled as Gaussian or salt-and-pepper noise in this study.
In the Gaussian noise model, the phase error $\eta_G$ at each pixel is treated as a Gaussian random variable

$$\eta_G \sim \mathcal{N}(0, \sigma^2), \tag{6}$$

meaning it has a mean of zero and a variance of $\sigma^2$; its probability density function is

$$f_{\eta_G}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right), \tag{7}$$
characterizing small fluctuations about the ideal phase. Such a model is justified because fabrication tolerances—for example, slight errors in pixel height $\Delta h$ or refractive index $\Delta n_r$—can be assumed to be Gaussian-distributed, and these induce Gaussian-distributed phase deviations via the linearized phase error formula:

$$\Delta\psi_{\mathrm{device}} = \frac{2\pi}{\lambda}\left(n_r\,\Delta h + h\,\Delta n_r\right). \tag{8}$$

Here, $\lambda$ is the optical wavelength, $n_r$ is the nominal refractive index of the material the device is composed of, and h is its design thickness; $\Delta h$ and $\Delta n_r$ are the stochastic fabrication errors in thickness and refractive index, respectively; and $\Delta\psi_{\mathrm{device}}$ denotes the device-induced phase error.
In contrast, the salt-and-pepper noise model accounts for occasional large phase glitches caused by localized defects or malfunctions (e.g., dust on a DOE or a dead/stuck SLM pixel). In this model, each pixel has a probability $p_{sp}$ of being corrupted to an extreme phase value—either a fixed maximum phase $\phi_{\max}$ or a minimum phase $\phi_{\min}$ (for instance, saturating to $2\pi$ or 0)—typically with equal likelihood. Formally, one can write

$$P\!\left(\eta_{\mathrm{SP}} = \phi_{\min}\right) = P\!\left(\eta_{\mathrm{SP}} = \phi_{\max}\right) = \frac{p_{sp}}{2}, \tag{9}$$

$$P\!\left(\eta_{\mathrm{SP}} = 0\right) = 1 - p_{sp}, \tag{10}$$

meaning that a fraction $p_{sp}/2$ of pixels are randomly set to $\phi_{\min}$ and another fraction $p_{sp}/2$ are set to $\phi_{\max}$ (while the rest remain unperturbed) [19,20,47]. These two noise models thus capture different error mechanisms in phase devices: the Gaussian term describes the aggregate effect of many small independent fabrication errors (yielding a bell-shaped distribution of phase noise), whereas the salt-and-pepper term captures rare, discrete phase jumps due to isolated defects, which manifest as pixels stuck at high or low phase values. We note that camera-related optoelectronic detection noise is weaker and is not discussed in this work [48].
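As an illustration of how these two device-level error models can be injected at training or test time, the short PyTorch sketch below perturbs a trained phase mask with the Gaussian term of Equations (6) and (7) and the salt-and-pepper term of Equations (9) and (10). The function name, the default saturation values $\phi_{\min} = 0$ and $\phi_{\max} = 2\pi$, and the calling convention are our own assumptions.

```python
import math
import torch

def add_device_noise(phi, sigma=0.0, p_sp=0.0, phi_min=0.0, phi_max=2 * math.pi):
    """Emulate device-level phase errors on a phase mask `phi` (radians).

    sigma : std of the Gaussian phase error, Eqs. (6)-(7)
    p_sp  : salt-and-pepper corruption probability, Eqs. (9)-(10)
    """
    noisy = phi + sigma * torch.randn_like(phi)          # Gaussian term eta_G ~ N(0, sigma^2)

    # Salt-and-pepper term: a fraction p_sp/2 of pixels stuck at phi_min, p_sp/2 at phi_max.
    u = torch.rand_like(phi)
    noisy = torch.where(u < p_sp / 2, torch.full_like(phi, phi_min), noisy)
    noisy = torch.where((u >= p_sp / 2) & (u < p_sp), torch.full_like(phi, phi_max), noisy)
    return noisy

# Usage: perturb a 256x256 phase mask, as in the robustness tests of Section 3.
phi = torch.rand(256, 256) * 2 * math.pi
phi_test = add_device_noise(phi, sigma=1.5, p_sp=0.2)
```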
To improve the inference resistance of D2NNs with pure phase modulation, we employed the SAM algorithm for optical parameter training [33,49,50,51]. Traditional phase optimization focuses solely on minimizing the output error $L_S(\Theta)$, where $\Theta = \{\phi_i^l\}$ denotes the adjustable phase parameters. However, this method tends to converge to sharp local minima (as schematically shown in Figure 2c), which are extremely sensitive to phase perturbations, leading to a dramatic decline in inference accuracy. SAM, by jointly minimizing the error value and the “sharpness” of the loss landscape, guides the optimization towards “flat” minima (as shown in Figure 2d), ensuring diffraction stability under perturbations. The core idea of SAM originates from robust optimization theory. It not only optimizes the design space parameter configuration $\Theta$ but also ensures that the loss remains low within its neighborhood regime. This is especially important for training optical computing systems, as their parameters are susceptible to perturbations embodied by the phase drift $\delta\phi_i^l$. SAM searches for parameters $\Theta$ that satisfy the following condition: within a neighborhood of radius $\rho$ (defined by the p-norm), the worst-case loss is minimized. This naturally leads to a minimax problem [33]:
$$\min_{\Theta}\;\max_{\|\epsilon\|_p \le \rho} L_S(\Theta + \epsilon) + \gamma \|\Theta\|_2^2. \tag{11}$$

Here, the inner maximization $\max_{\|\epsilon\|_p \le \rho}$ identifies the worst perturbation $\epsilon$ that maximizes the loss, thereby quantifying the local sharpness of the error surface. The outer minimization $\min_{\Theta}$ adjusts $\Theta$ to suppress sensitivity.
To minimize the SAM objective, an efficient approximation of its gradient $\nabla_{\Theta} L_S^{\mathrm{SAM}}(\Theta)$ needs to be derived, enabling the application of gradient descent to the SAM optimization objective. The gradient is approximately

$$\nabla_{\Theta} L_S^{\mathrm{SAM}}(\Theta) \approx \left.\nabla_{\Theta} L_S(\Theta)\right|_{\Theta + \epsilon^*}, \tag{12}$$

where $\epsilon^*$ represents the worst-case perturbation associated with the model weights $\Theta$,

$$\epsilon^* = \rho \cdot \frac{\operatorname{sign}\!\left(\nabla_{\Theta} L_S(\Theta)\right)\left|\nabla_{\Theta} L_S(\Theta)\right|^{q-1}}{\left(\|\nabla_{\Theta} L_S(\Theta)\|_q^q\right)^{1/p}}, \tag{13}$$

where $1/p + 1/q = 1$, the p-norm constrains the perturbation range, and the q-norm normalizes the gradient magnitude.
Figure 2b schematically illustrates a single step of SAM parameter updating.
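For reference, a single SAM update on the phase parameters can be sketched as the usual two-pass procedure: compute the gradient at $\Theta$, form $\epsilon^*$ from Equation (13) (here specialized to p = q = 2, so that $\epsilon^*$ reduces to a scaled gradient), and take a descent step using the gradient evaluated at $\Theta + \epsilon^*$ as in Equation (12). The sketch below assumes a single tensor of trainable phases and a user-supplied loss function; it is not the authors' training pipeline.

```python
import torch

def sam_step(phi, loss_fn, rho=1.0, lr=0.01, weight_decay=0.0):
    """One SAM update on the phase parameters `phi` (Eqs. (11)-(13), p = q = 2).

    A minimal sketch using plain gradient descent for the outer update; the paper's
    full pipeline uses PyTorch optimizers and mini-batches.
    """
    # First pass: gradient at the current parameters.
    phi.requires_grad_(True)
    loss = loss_fn(phi)
    grad = torch.autograd.grad(loss, phi)[0]

    # Eq. (13) with p = q = 2: worst-case perturbation on the rho-ball.
    eps = rho * grad / (grad.norm() + 1e-12)

    # Second pass: gradient of the SAM objective, evaluated at phi + eps (Eq. (12)).
    phi_adv = (phi + eps).detach().requires_grad_(True)
    loss_adv = loss_fn(phi_adv)
    grad_sam = torch.autograd.grad(loss_adv, phi_adv)[0]

    # Outer descent step (with an optional L2 term, cf. gamma * ||Theta||^2 in Eq. (11)).
    with torch.no_grad():
        phi_new = phi - lr * (grad_sam + weight_decay * phi)
    return phi_new.detach()
```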

2.3. Enhancing System-Level Robustness Through Aberration-Immune Learning

As illustrated in Figure 3b, in addition to intrinsic errors caused by constituent optical devices, misalignments of these optical components (e.g., planar shifts along the X/Y-axis or minor tilts with respect to the different axes) in experimental systems would also introduce unpredictable wavefront distortions, which we term system-level environmental aberrations in this work. Such imperfections can be represented by a combination of low-order Legendre polynomials, characterized by gradient-continuous phase profiles [52]. Conventional methods, such as those proposed by Hoshi et al. [53], enhance robustness by superimposing Legendre distortions on phase modulation layers [35,54]. While partially mitigating fabrication errors, these approaches remain sensitive to stochastic salt-and-pepper or Gaussian noise because random noise exhibits high-frequency oscillations, whereas Legendre distortions feature relatively smooth variations.
To accommodate both device-level and system-level inaccuracy, we propose an AIL strategy. Instead of superimposing distortions on the phase modulation layers, AIL injects Legendre-based phase aberrations into the input plane. As illustrated in Figure 2e, the injected phase is constructed from the first ten two-dimensional Legendre polynomials $L_1$–$L_{10}$. Low-order two-dimensional Legendre polynomials provide an orthonormal basis for modeling systematic misalignment-induced wavefront aberrations over a rectangular aperture [55]. Each Legendre mode corresponds to a specific misalignment pattern and generates a characteristic smooth phase profile across the aperture. For example, as shown in Figure 3b, the mode $L_2(x, y)$ with coefficient $\alpha$ (representing the tilt-induced phase amplitude) models a slight tilt about the X-axis, whereas $L_3(x, y)$ with coefficient $\gamma$ models a tilt about the Y-axis.
A small tilt by an angle $\Delta\theta$ introduces an additional linear phase ramp, whose peak-to-center phase difference is

$$\Delta\psi_t = \frac{N}{2}\, d\, \frac{2\pi}{\lambda}\, \Delta\theta, \tag{14}$$

where N is the number of diffractive pixels along one axis, d is the pixel pitch, and $\lambda$ is the operating wavelength. Under this small-angle approximation, the tilt-induced phase coefficient satisfies $\alpha \propto \Delta\psi_t$ (and likewise $\gamma \propto \Delta\psi_t$ for $L_3$), establishing a direct link between the physical tilt and its Legendre weight. Higher-order Legendre modes $L_n(x, y)$ with coefficients $\beta_n$ capture more complex or subtle distortions, enabling representation of high-order alignment errors or smoothly varying airflow-induced phase patterns. The overall aberration phase is therefore expressed as

$$\Delta\psi_c(x, y) = \sum_{n} \beta_n L_n(x, y), \tag{15}$$

where $\Delta\psi_c(x, y)$ is the aberration phase (in radians) and $L_n(x, y)$ denotes the orthonormal Legendre basis functions defined over the square aperture. During D2NN training, AIL dynamically incorporates wavefront aberrations generated from the Legendre polynomials and random perturbations into each training batch. By multiplying these aberrations by the complex amplitude of the input optical field, AIL emulates the cumulative effects of device misalignments and assembly errors [55].
This approach enables D2NNs to adaptively compensate for diverse systematic errors while preserving SAM’s noise immunity. The simulated results demonstrate that AIL significantly reduces performance degradation caused by cumulative wavefront distortions without compromising robustness against random noise inherited from the optical devices (see Figure 4; more details are provided in Section 3).
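A minimal sketch of the AIL aberration generator is given below. It builds a smooth input-plane phase from ten low-order two-dimensional Legendre products, draws the coefficients $\beta_n$ uniformly from [−5, 5], and normalizes the composite phase to [−π, π], following the procedure described in Section 3. The specific mode ordering and the helper name are our own choices and may differ from the exact $L_1$–$L_{10}$ ordering used by the authors.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def legendre_aberration(n_modes=10, size=256, beta_range=5.0, rng=None):
    """Build the AIL input-plane aberration of Eq. (15) from 2D Legendre modes."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(-1, 1, size)
    X, Y = np.meshgrid(x, x)

    # Enumerate low-order 2D modes P_m(x) * P_n(y), ordered by total order m + n.
    modes = []
    order = 0
    while len(modes) < n_modes:
        for m in range(order + 1):
            n = order - m
            modes.append(Legendre.basis(m)(X) * Legendre.basis(n)(Y))
            if len(modes) == n_modes:
                break
        order += 1

    # Random coefficients beta_n, then linear normalization of the phase to [-pi, pi].
    beta = rng.uniform(-beta_range, beta_range, size=n_modes)
    phase = sum(b * L for b, L in zip(beta, modes))
    phase = phase / (np.abs(phase).max() + 1e-12) * np.pi
    return phase

# Usage: aberrate an input field before the first diffractive layer.
field = np.ones((256, 256), dtype=complex)
aberrated = field * np.exp(1j * legendre_aberration())
```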

2.4. Joint SAM–AIL Co-Optimization for Phase-Only D2NNs

SAM promotes flat minima of the loss landscape by optimizing within a small parameter neighborhood, thereby reducing the sensitivity of phase-only D2NNs to device-level stochastic phase errors (e.g., Gaussian and salt-and-pepper noise). AIL improves tolerance to systematic, low-order, physically grounded wavefront aberrations by injecting Legendre-mode distortions in the input plane during training, encouraging the network to learn misalignment compensation. Building on these complementary strengths, rather than applying SAM and AIL sequentially (e.g., “SAM then AIL” or vice versa), we couple them within a single training loop. For each mini-batch, AIL first injects physically grounded aberrations in the input plane, which reshapes the loss landscape observable by the optical phase parameters; then, SAM explicitly searches for flat minima under these aberration-induced distributions by perturbing only the phase variables. This realizes a joint objective—minimizing the worst-case loss in a SAM neighborhood while marginalizing over AIL aberrations—which empirically behaves differently from a naive combination. The complete training flow is summarized in Algorithm 1, which details the batch-wise integration of AIL’s aberration injection and SAM’s perturbation-based phase optimization for phase-only D2NNs.
Algorithm 1 Joint SAM–AIL training for D2NNs
Require: Training set $S = \{(x_i, y_i)\}_{i=1}^{n}$; network parameters $\Theta = \{\phi_i^l\}$ (amplitudes fixed, $a_i^l = 1$); SAM radius $\rho$; step size $\eta$; Legendre basis $\{L_n\}$ and coefficients $\{\beta_n\}$; loss $L(\cdot)$.
Ensure: D2NN trained with the joint SAM–AIL strategy
  1: Initialize parameters $\Theta_0$; $t \leftarrow 0$
  2: while not converged do
  3:       Sample a mini-batch $B = \{(x_j, y_j)\}_{j=1}^{b}$
  4:       # AIL injection (input plane, per batch)
  5:       Sample $\{\beta_n\}$; build $\Delta\psi_c(x, y) = \sum_n \beta_n L_n(x, y)$
  6:       Aberrate inputs: $\tilde{x}_j \leftarrow x_j \cdot \exp\{j\,\Delta\psi_c(x, y)\}$ for all $j \in B$
  7:       # Forward/backward at current parameters
  8:       Compute $\nabla_{\Theta} L_B(\Theta)$ using $\{\tilde{x}_j\}$
  9:       # SAM: worst-case perturbation (optical specialization)
 10:       Compute $\epsilon^*$ via Equation (13); $\epsilon^*$ has the same shape as $\Theta$
 11:       Apply perturbation: $\Theta^{+} \leftarrow \Theta + \epsilon^*$
 12:       # Gradient of SAM objective at $\Theta^{+}$ under the same AIL batch
 13:       $g \leftarrow \left.\nabla_{\Theta} L_B(\Theta)\right|_{\Theta^{+}}$
 14:       # Parameter update
 15:       Update $\Theta \leftarrow \Theta - \eta\, g$
 16:       $t \leftarrow t + 1$
 17: end while
 18: return $\Theta$
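To show how the pieces of Algorithm 1 fit together in code, the self-contained PyTorch sketch below runs the batch-wise SAM–AIL loop on a deliberately tiny stand-in model: a single 16 × 16 phase-only layer sandwiched between fixed random complex "diffraction" matrices, trained on random data. Everything outside the AIL injection and the two-pass SAM update (the toy operators, dataset, separable polynomial modes, and hyperparameters) is a placeholder of our own, not the authors' setup.

```python
import math
import torch

torch.manual_seed(0)
N, C = 16, 10                                  # toy grid and number of classes
n_pix = N * N

# Toy "diffraction" operators: fixed random complex matrices standing in for Eq. (4).
W_in  = torch.randn(n_pix, n_pix, dtype=torch.cfloat) / n_pix
W_out = torch.randn(C,     n_pix, dtype=torch.cfloat) / n_pix

# Toy dataset: random complex input fields with random labels.
X_all = torch.randn(256, n_pix, dtype=torch.cfloat)
y_all = torch.randint(0, C, (256,))

# Ten toy "Legendre-like" smooth modes (separable low-order polynomials in x and y).
xs = torch.linspace(-1, 1, N)
gx, gy = torch.meshgrid(xs, xs, indexing="ij")
modes = torch.stack([gx**m * gy**n for m in range(4) for n in range(4)][:10]).reshape(10, -1)

phi = torch.zeros(n_pix, requires_grad=True)   # trainable phase mask, a_i^l = 1
rho, lr = 1.0, 0.05

def forward(fields, phase):
    """Field -> phase layer (analogue of Eqs. (1)-(3)) -> detector intensities as logits."""
    hidden = (fields @ W_in) * torch.exp(1j * phase)
    return (hidden @ W_out.T).abs() ** 2

for step in range(100):
    idx = torch.randint(0, 256, (32,))
    xb, yb = X_all[idx], y_all[idx]

    # --- AIL: per-batch aberration injected in the input plane (Alg. 1, lines 4-6) ---
    beta = torch.empty(10).uniform_(-5, 5)
    aberr = modes.T @ beta
    aberr = aberr / (aberr.abs().max() + 1e-12) * math.pi     # normalize to [-pi, pi]
    xb = xb * torch.exp(1j * aberr)

    # --- SAM first pass: gradient at the current phases (lines 7-8) ---
    loss = torch.nn.functional.cross_entropy(forward(xb, phi), yb)
    grad = torch.autograd.grad(loss, phi)[0]

    # --- Worst-case perturbation, Eq. (13) with p = q = 2 (lines 9-11) ---
    eps = rho * grad / (grad.norm() + 1e-12)

    # --- SAM second pass under the same AIL batch (lines 12-13) ---
    loss_adv = torch.nn.functional.cross_entropy(forward(xb, phi + eps), yb)
    g = torch.autograd.grad(loss_adv, phi)[0]

    # --- Parameter update (lines 14-15) ---
    with torch.no_grad():
        phi -= lr * g
```

In an actual implementation, the toy forward model would be replaced by the diffractive propagation of Equations (1)–(5) and the random data by the MNIST loader, following the configuration described in Section 3.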

3. Results and Discussion

Based on the scheme outlined in the preceding section, we systematically validated the synergistic advantages of integrating SAM and AIL under scenarios with both device-level noise and system-level distortion. All training procedures were conducted on a computational platform equipped with an Intel Xeon Gold 5115 CPU (2.40 GHz), 32 GB of RAM, and an NVIDIA V100 GPU, implemented using PyTorch (v1.13.0). The network was optimized using 60,000 training images from the MNIST handwritten digit dataset over 20 training epochs. We further evaluated a single-layer D2NN on the Fashion-MNIST dataset. The optical diffractive neural network classifier uses a hidden-layer spacing of 8 cm, a light-source wavelength of 532 nm, and a neuron resolution of 256 × 256 [56]. During training, the cross-entropy loss function was employed to optimize the network parameters. For SAM, the perturbation radius hyperparameter was set to $\rho = 1$. In the AIL strategy, each training iteration dynamically generated a combination of the first 10 Legendre polynomials, with coefficients $\beta_n$ initially sampled from a uniform distribution (range: −5 to 5) and subsequently linearly normalized so that the resulting phase distribution lay within $[-\pi, \pi]$. This normalization ensured that the simulated aberration magnitudes were physically consistent with experimentally reported ranges of coefficients in optical calibration studies [57,58].
Network performance was evaluated via numerical simulations on the MNIST test set (10,000 images). The simulations included salt-and-pepper noise ($p_{sp}$: 0.05–0.2, step: 0.05), Gaussian noise ($\sigma$: 0.5–2.0, $\mu = 0$, step: 0.5), and input-plane tilt (X/Y-axis offsets: 0.25°–1.0°, step: 0.25°). For details on the parameter selection rationale, please refer to the cited literature [26,40,59,60]. The noise parameters were configured based on experimental noise data measured from the actual devices utilized in typical experiments [26,40,41]. Results were averaged over 10 independent runs in which the trained phase masks were verified on the 10,000 test images; error bars (±0.5–4.8%) reflect post-convergence stability, i.e., the variability across the 10 test runs, corresponding to the range between the minimum and maximum accuracy obtained across epochs.
Figure 4a,b show that when solely using the Adam optimizer [56], classification accuracy drops sharply from 90.56% ($p_{sp} = 0.05$) to 75.78% under salt-and-pepper noise ($p_{sp} = 0.20$). For Gaussian noise ($\sigma = 1.5$), accuracy plummets from 91.77% ($\sigma = 0.5$) to 30.57%, highlighting the extreme noise sensitivity of traditional training methods like Adam. In contrast, with the SAM strategy, the classification accuracy under salt-and-pepper noise ($p_{sp} = 0.20$) increases to 88.56%, representing a 12.78% gain over Adam. For Gaussian noise ($\sigma = 1.5$), the classification accuracy reaches 60.55% (around a 29.98% improvement), showing SAM’s effectiveness against stochastic noise.
We note that under Adam optimization, input-plane tilt degrades classification accuracy severely, as illustrated in Figure 4c,d. For example, an X-axis tilt (1.0°) reduces the total accuracy from 90.74% (0.25°) to 43.16% ($\Delta = 47.58\%$), while a Y-axis tilt causes a drop from 89.81% to 33.13% ($\Delta = 56.68\%$). With AIL alone (e.g., Adam with AIL), the accuracy under a 1.0° X-axis tilt improves by 46.85% (90.01% vs. 43.16%), and that under a 1.0° Y-axis tilt improves by 47.72% (80.85% vs. 33.13%), demonstrating robust compensation for systematic errors.
Noticeably, the combined SAM–AIL strategy retains the individual advantages of both methods without much compromise. More specifically, SAM seeks final results with a flat loss landscape to suppress phase fluctuation sensitivity, while AIL injects wavefront aberrations during training to emulate assembly misalignments. The confusion matrix analysis in Figure 5 and Figure 6a–d on the MNIST test set (10,000 images) shows that SAM–AIL reduces misclassification rates across all digit classes (0–9) under extreme perturbations. The Hessian matrix eigenvalue analysis further quantifies the robustness: the maximum eigenvalue ($\lambda_{\max}$) decreases from 1250.47 (Adam) to 35.78 (SAM), confirming the role of flat minima in perturbation immunity. Furthermore, we also compared the training time of the standard Adam optimizer and the proposed SAM–AIL method on both the MNIST and Fashion-MNIST datasets in Supplementary Document S1.
The results indicate that SAM–AIL decouples noise resilience and misalignment compensation, enabling mutual reinforcement. Accordingly, we believe that this framework provides a theoretical and experimental foundation for deploying photonic neural networks in dynamic environments, particularly for high-precision optical computing tasks that are quite sensitive to device imperfection and environmental perturbations.
Figure 7 compares the performance of the conventional Adam algorithm and the SAM–AIL strategy for the handwritten digit “1” under both perfect conditions and tilt perturbations. Figure 7a–c show the input digit, the predefined detector area layout, and the optimized phase masks obtained with the respective training schemes used for this test example. As shown in Figure 7d–f, the Adam-trained D2NN fails when the digit is tilted by 1° about the X- and Y-axes, misclassifying the sample as “4” and “5”, respectively, whereas when tested under additive Gaussian phase noise ($\sigma = 1$), it predicts “4” and “6”. By contrast, the SAM–AIL-trained network correctly recognizes the digit in every case.
To systematically validate the synergistic advantages of the SAM and AIL approaches in scenarios with various types of device noise and input distortion, we extended the numerical experiments from MNIST to the Fashion-MNIST dataset, which represents a more challenging yet computationally comparable classification task. Figure 8a shows that with SAM, the classification accuracy reached 76.55% under salt-and-pepper noise ($p_{sp} = 0.15$), representing a 13.55% improvement over Adam. For Gaussian noise ($\sigma = 1.0$), SAM achieved a classification accuracy of 66.05% (Figure 8b), surpassing Adam by 22.78% and showing its strong robustness against stochastic noise. With standalone AIL (Figure 8c), accuracy under a 0.5° tilt of the input plane about the X-axis improved by 40.20% (80.45% vs. 40.25%), while a 0.5° tilt about the Y-axis yielded a 31.59% increase in accuracy (81.02% vs. 49.43%), demonstrating its exceptional ability to compensate for systematic misalignments (see Figure 8d).
Given the relatively higher complexity of Fashion-MNIST compared to MNIST, we conducted a confusion matrix analysis on the 10,000-image test set to compare pure Adam and the SAM–AIL strategy. The results are shown in Figure 6e–h and Figure 9. The Adam approach exhibits a maximum Hessian eigenvalue of $\lambda_{\max} = 978.24$, while SAM reduces this to $\lambda_{\max} = 69.36$. These results align with the MNIST case, suggesting that the SAM–AIL synergy maintains noise robustness and misalignment compensation in classification tasks of much higher complexity.

4. Conclusions

In this work, we demonstrate a robust and transferable diffractive optical neuromorphic system powered by a combined sharpness-aware method and aberration-immune learning strategy. The core contribution lies in leveraging SAM to mitigate device-specific variances, thereby allowing an offline-trained D2NN to retain high accuracy across different hardware instances. In addition, by promoting flatter loss landscapes and incorporating aberration-immune training, our approach yields D2NN models that remain resilient to optical noise, alignment errors, and other post-deployment perturbations without the need for delicate system calibration. This enhanced robustness against system noise and errors underpins reliable real-world deployment of diffractive neural networks.
Beyond algorithmic improvements, the proposed SAM–AIL framework opens pathways to a wide range of practical applications, including optical computing for edge-AI devices, intelligent visual inspection in industrial settings, and medical-image-based decision-support systems, where robustness and power efficiency are crucial. The demonstrated transferability across device variations makes it suitable for hardware reuse and scalable production of optical neural modules.
Nevertheless, the present study has certain limitations. Our experiments were conducted primarily on the MNIST and Fashion-MNIST datasets, which, while standard, are relatively simple compared to those used in complex optical imaging tasks. Future work will therefore extend the joint SAM–AIL strategy to more challenging datasets (e.g., CIFAR and ImageNet) and to physical experimental prototypes. Furthermore, although the proposed SAM–AIL framework has been numerically validated under a wide range of device- and system-level perturbations, these evaluations were conducted within simulation environments. Future work will therefore focus on implementing and experimentally validating the algorithm on physical diffractive prototypes to confirm its real-world robustness and exploring engineering aspects such as fabrication scalability and integration feasibility in order to develop deployable optical-AI systems.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/photonics13020139/s1, Figure S1: Comparison of training time per epoch between the standard Adam optimizer and the proposed SAM–AIL method over the first 20 epochs. (a) Results on the MNIST dataset. (b) Results on the Fashion-MNIST dataset; Figure S2: Phase masks trained via two distinct methods for MNIST classification. The upper row shows the three diffractive layers trained using the Adam algorithm, while the lower row shows the three diffractive layers trained using the combined SAM–AIL algorithm; Figure S3: The upper and lower rows show the confusion matrices produced by the Adam algorithm alone and by the combined SAM–AIL algorithm, respectively, for the MNIST dataset under three conditions: (i) no perturbation, (ii) a 0.25° rotation of the input plane about the X-axis, and (iii) a 0.25° rotation of the input plane about the Y-axis; Figure S4: The upper and lower rows show the confusion matrices of the Adam algorithm alone and the SAM–AIL joint algorithm, respectively, for the MNIST dataset under different noise conditions: salt-and-pepper noise ( p sp = 0.2 ) and Gaussian noise ( σ = 1.2 ); Figure S5: Phase masks trained via two distinct methods for Fashion-MNIST classification. The upper row shows the three diffractive layers trained using the Adam algorithm, while the lower row shows the three diffractive layers trained using the combined SAM–AIL algorithm; Figure S6: The upper and lower rows show the confusion matrices produced by the Adam algorithm alone and by the combined SAM–AIL algorithm, respectively, for the Fashion-MNIST dataset under three conditions: (i) no perturbation, (ii) a 0.25° rotation of the input plane about the X-axis, and (iii) a 0.25° rotation of the input plane about the Y-axis; Figure S7: The upper and lower rows show the confusion matrices of the Adam algorithm alone and the SAM–AIL joint algorithm, respectively, for the Fashion-MNIST dataset under different noise conditions: salt-and-pepper noise ( p sp = 0.15 ) and Gaussian noise ( σ = 1 ).

Author Contributions

Methodology, F.L.; Software, F.L.; Writing—original draft, F.L.; Writing—review & editing, K.Y.; Supervision, K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  2. Momeni, A.; Rahmani, B.; Malléjac, M.; Del Hougne, P.; Fleury, R. Backpropagation-free training of deep physical neural networks. Science 2023, 382, 1297–1303. [Google Scholar] [CrossRef] [PubMed]
  3. Wright, L.G.; Onodera, T.; Stein, M.M.; Wang, T.; Schachter, D.T.; Hu, Z.; McMahon, P.L. Deep physical neural networks trained with backpropagation. Nature 2022, 601, 549–555. [Google Scholar] [CrossRef] [PubMed]
  4. Grollier, J.; Querlioz, D.; Camsari, K.; Everschor-Sitte, K.; Fukami, S.; Stiles, M.D. Neuromorphic spintronics. Nat. Electron. 2020, 3, 360–370. [Google Scholar] [CrossRef]
  5. Clouse, J.; Ramsey, T.; Somathilaka, S.; Kleinsasser, N.; Ryu, S.; Balasubramaniam, S. Insect-Wing Structured Microfluidic System for Reservoir Computing. arXiv 2025, arXiv:2508.10915. [Google Scholar] [CrossRef]
  6. Ahrar, S.; Raje, M.; Lee, I.C.; Hui, E.E. Pneumatic computers for embedded control of microfluidics. Sci. Adv. 2023, 9, eadg0201. [Google Scholar] [CrossRef]
  7. Picella, S.; van Riet, C.M.; Overvelde, J.T.B. Pneumatic coding blocks enable programmability of electronics-free fluidic soft robots. Sci. Adv. 2024, 10, eadr2433. [Google Scholar] [CrossRef]
  8. Hoang, S.; Shehada, M.; Patel, Z.; Tran, M.H.; Karydis, K.; Brisk, P.; Grover, W.H. Air-powered logic circuits for error detection in pneumatic systems. Device 2024, 2, 100507. [Google Scholar] [CrossRef]
  9. Wetzstein, G.; Ozcan, A.; Gigan, S.; Fan, S.; Englund, D.; Soljačić, M.; Denz, C.; Miller, D.A.; Psaltis, D. Inference in artificial intelligence with deep optics and photonics. Nature 2020, 588, 39–47. [Google Scholar] [CrossRef]
  10. Zhou, J.; Qian, H.; Chen, C.F.; Zhao, J.; Li, G.; Wu, Q.; Luo, H.; Wen, S.; Liu, Z. Optical edge detection based on high-efficiency dielectric metasurface. Proc. Natl. Acad. Sci. USA 2019, 116, 11137–11140. [Google Scholar] [CrossRef]
  11. Yildirim, M.; Dinc, N.U.; Oguz, I.; Psaltis, D.; Moser, C. Nonlinear processing with linear optics. Nat. Photonics 2024, 18, 1076–1082. [Google Scholar] [CrossRef] [PubMed]
  12. Fu, T.; Zhang, J.; Sun, R.; Huang, Y.; Xu, W.; Yang, S.; Zhu, Z.; Chen, H. Optical neural networks: Progress and challenges. Light. Sci. Appl. 2024, 13, 263. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, Y.; Nazhamaiti, M.; Xu, H.; Meng, Y.; Zhou, T.; Li, G.; Fan, J.; Wei, Q.; Wu, J.; Qiao, F.; et al. All-analog photoelectronic chip for high-speed vision tasks. Nature 2023, 623, 48–57. [Google Scholar] [CrossRef] [PubMed]
  14. Shastri, B.J.; Tait, A.N.; Ferreira de Lima, T.; Pernice, W.H.; Bhaskaran, H.; Wright, C.D.; Prucnal, P.R. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics 2021, 15, 102–114. [Google Scholar] [CrossRef]
  15. Shen, Y.; Harris, N.C.; Skirlo, S.; Prabhu, M.; Baehr-Jones, T.; Hochberg, M.; Sun, X.; Zhao, S.; Larochelle, H.; Englund, D.; et al. Deep learning with coherent nanophotonic circuits. Nat. Photonics 2017, 11, 441–446. [Google Scholar] [CrossRef]
  16. Fontaine, N.K.; Ryf, R.; Chen, H.; Neilson, D.T.; Kim, K.; Carpenter, J. Laguerre–Gaussian mode sorter. Nat. Commun. 2019, 10, 1865. [Google Scholar] [CrossRef]
  17. Lib, O.; Shekel, R.; Bromberg, Y. Building and aligning a 10-plane light converter. J. Phys. Photonics 2025, 7, 033001. [Google Scholar] [CrossRef]
  18. Mididoddi, C.; Kilpatrick, R.J.; Sharp, C.; del Hougne, P.; Horsley, S.A.R.; Phillips, D.B. Threading light through dynamic complex media. Nat. Photonics 2025, 19, 434–440. [Google Scholar] [CrossRef]
  19. Goi, E.; Schoenhardt, S.; Gu, M. Direct retrieval of Zernike-based pupil functions using integrated diffractive deep neural networks. Nat. Commun. 2022, 13, 7531. [Google Scholar] [CrossRef]
  20. Luo, X.; Hu, Y.; Ou, X.; Li, X.; Lai, J.; Liu, N.; Cheng, X.; Pan, A.; Duan, H. Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible. Light. Sci. Appl. 2022, 11, 158. [Google Scholar] [CrossRef]
  21. Bandyopadhyay, S.; Sludds, A.; Krastanov, S.; Hamerly, R.; Harris, N.; Bunandar, D.; Streshinsky, M.; Hochberg, M.; Englund, D. Single-chip photonic deep neural network with forward-only training. Nat. Photonics 2024, 18, 1335–1343. [Google Scholar] [CrossRef]
  22. Lin, X.; Rivenson, Y.; Yardimci, N.T.; Veli, M.; Luo, Y.; Jarrahi, M.; Ozcan, A. All-optical machine learning using diffractive deep neural networks. Science 2018, 361, 1004–1008. [Google Scholar] [CrossRef] [PubMed]
  23. Vadlamani, S.K.; Englund, D.; Hamerly, R. Transferable learning on analog hardware. Sci. Adv. 2023, 9, eadh3436. [Google Scholar] [CrossRef] [PubMed]
  24. Xu, T.; Zhang, W.; Zhang, J.; Luo, Z.; Xiao, Q.; Wang, B.; Luo, M.; Xu, X.; Shastri, B.J.; Prucnal, P.R.; et al. Control-free and efficient integrated photonic neural networks via hardware-aware training and pruning. Optica 2024, 11, 1039–1049. [Google Scholar] [CrossRef]
  25. Mengu, D.; Rivenson, Y.; Ozcan, A. Scale-, shift-, and rotation-invariant diffractive optical networks. ACS Photonics 2020, 8, 324–334. [Google Scholar] [CrossRef]
  26. Shi, J.; Chen, M.; Wei, D.; Hu, C.; Luo, J.; Wang, H.; Zhang, X.; Xie, C. Anti-noise diffractive neural network for constructing an intelligent imaging detector array. Opt. Express 2020, 28, 37686–37699. [Google Scholar] [CrossRef]
  27. Li, K.; Jia, Y.; Gu, M.; Fang, X. Robust occlusion-aware orbital angular momentum feature extraction via all-optical diffractive processing systems. Opt. Express 2025, 33, 23053–23064. [Google Scholar] [CrossRef]
  28. Wang, Y.; Chen, M.; Yao, C.; Ma, J.; Yan, T.; Penty, R.; Cheng, Q. Asymmetrical Estimator for Training Encapsulated Deep Photonic Neural Networks. Nat. Commun. 2025, 16, 2143. [Google Scholar] [CrossRef]
  29. Sunada, S.; Niiyama, T.; Kanno, K.; Nogami, R.; Röhm, A.; Awano, T.; Uchida, A. Blending Optimal Control and Biologically Plausible Learning for Noise-Robust Physical Neural Networks. Phys. Rev. Lett. 2025, 134, 017301. [Google Scholar] [CrossRef]
  30. Spall, J.; Guo, X.; Lvovsky, A.I. Hybrid training of optical neural networks. Optica 2022, 9, 803–811. [Google Scholar] [CrossRef]
  31. Zhao, G.; Shu, X.; Zhou, R. High-performance real-world optical computing trained by in situ gradient-based model-free optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 7194–7205. [Google Scholar] [CrossRef]
  32. Nakajima, M.; Inoue, K.; Tanaka, K.; Kuniyoshi, Y.; Hashimoto, T.; Nakajima, K. Physical deep learning with biologically inspired training method: Gradient-free approach for physical hardware. Nat. Commun. 2022, 13, 7847. [Google Scholar] [CrossRef] [PubMed]
  33. Foret, P.; Kleiner, A.; Mobahi, H.; Neyshabur, B. Sharpness-Aware Minimization for Efficiently Improving Generalization. In Proceedings of the 9th International Conference on Learning Representations (ICLR 2021), Virtual, 3–7 May 2021. [Google Scholar]
  34. Xu, T.; Luo, Z.; Liu, S.; Fan, L.; Xiao, Q.; Wang, B.; Wang, D.; Huang, C. Perfecting Imperfect Physical Neural Networks with Transferable Robustness using Sharpness-Aware Training. arXiv 2024, arXiv:2411.12352. [Google Scholar] [CrossRef]
  35. Mengu, D.; Zhao, Y.; Yardimci, N.T.; Rivenson, Y.; Jarrahi, M.; Ozcan, A. Misalignment resilient diffractive optical networks. Nanophotonics 2020, 9, 4207–4219. [Google Scholar] [CrossRef]
  36. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  37. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar] [CrossRef]
  38. Tong, Y.; Pivnenko, M.; Chu, D. Effects of phase flicker in digitally driven phase-only LCOS devices on holographic reconstructed images. Appl. Opt. 2022, 61, B25–B33. [Google Scholar] [CrossRef]
  39. García-Márquez, J.; López, V.; González-Vega, A.; Noé, E. Flicker minimization in an LCoS spatial light modulator. Opt. Express 2012, 20, 8431–8441. [Google Scholar] [CrossRef]
  40. Engström, D.; Persson, M.; Bengtsson, J.; Goksör, M. Calibration of spatial light modulators suffering from spatially varying phase response. Opt. Express 2013, 21, 16086–16103. [Google Scholar] [CrossRef]
  41. Fan, D.; Smith, C.S.; Unnithan, R.R.; Kim, S. 3D printed diffractive optical elements for rapid prototyping. Micro Nano Eng. 2024, 24, 100270. [Google Scholar] [CrossRef]
  42. Bengtsson, J.; Johansson, M. Fan-out diffractive optical elements designed for increased fabrication tolerances to linear relief depth errors. Appl. Opt. 2002, 41, 281–289. [Google Scholar] [CrossRef] [PubMed]
  43. Banerji, S.; Sensale-Rodriguez, B. A computational design framework for efficient, fabrication error-tolerant, planar THz diffractive optical elements. Sci. Rep. 2019, 9, 5801. [Google Scholar] [CrossRef] [PubMed]
  44. Jenkins, R.P.; Campbell, S.D.; Werner, D.H. Establishing exhaustive metasurface robustness against fabrication uncertainties through deep learning. Nanophotonics 2021, 10, 4497–4509. [Google Scholar] [CrossRef]
  45. Zhang, X.; Jin, J.; Wang, Y.; Pu, M.; Li, X.; Zhao, Z.; Gao, P.; Wang, C.; Luo, X. Metasurface-based broadband hologram with high tolerance to fabrication errors. Sci. Rep. 2016, 6, 19856. [Google Scholar] [CrossRef]
  46. Patoux, A.; Agez, G.; Girard, C.; Paillard, V.; Wiecha, P.R.; Lecestre, A.; Carcenac, F.; Larrieu, G.; Arbouet, A. Challenges in nanofabrication for efficient optical metasurfaces. Sci. Rep. 2021, 11, 5620. [Google Scholar] [CrossRef]
  47. Chen, H.; Feng, J.; Jiang, M.; Wang, Y.; Lin, J.; Tan, J.; Jin, P. Diffractive deep neural networks at visible wavelengths. Engineering 2021, 7, 1483–1491. [Google Scholar] [CrossRef]
  48. Kim, J.; Yu, N.; Yu, Z. Compute-First Optical Detection for Noise-Resilient Visual Perception. ACS Photonics 2025, 12, 1137–1145. [Google Scholar] [CrossRef]
  49. Yao, Z.; Gholami, A.; Keutzer, K.; Mahoney, M.W. Pyhessian: Neural networks through the lens of the hessian. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 581–590. [Google Scholar]
  50. Izmailov, P.; Podoprikhin, D.; Garipov, T.; Vetrov, D.; Wilson, A.G. Averaging Weights Leads to Wider Optima and Better Generalization. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI 2018), Monterey, CA, USA, 6–10 August 2018; pp. 876–885. [Google Scholar]
  51. Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017. [Google Scholar]
  52. Kewei, E.; Zhang, C.; Li, M.; Xiong, Z.; Li, D. Wavefront reconstruction algorithm based on Legendre polynomials for radial shearing interferometry over a square area and error analysis. Opt. Express 2015, 23, 20267–20279. [Google Scholar] [CrossRef]
  53. Hoshi, I.; Wakunami, K.; Ichihashi, Y.; Oi, R. Wavefront-aberration-tolerant diffractive deep neural networks using volume holographic optical elements. Sci. Rep. 2025, 15, 1104. [Google Scholar] [CrossRef]
  54. Li, Y.; Zheng, Z.; Li, R.; Chen, Q.; Luan, H.; Yang, H.; Zhang, Q.; Gu, M. Multiscale diffractive U-Net: A robust all-optical deep learning framework modeled with sampling and skip connections. Opt. Express 2022, 30, 36700–36710. [Google Scholar] [CrossRef]
  55. Li, Y.; Guo, J.; Liu, R. Exploring Wavefront Detection in Imaging Systems with Rectangular Apertures Using Phase Diversity. Sensors 2024, 24, 1191. [Google Scholar] [CrossRef]
  56. Liu, X.; Zhang, D.; Wang, L.; Ma, T.; Liu, Z.; Xiao, J.J. Parallelized and Cascadable Optical Logic Operations by Few-Layer Diffractive Optical Neural Network. Photonics 2023, 10, 503. [Google Scholar] [CrossRef]
  57. Long, X.; Gao, Y.; Yuan, Z.; Yan, W.; Ren, Z.C.; Wang, X.L.; Ding, J.; Wang, H.T. In-Situ Wavefront Correction via Physics-Informed Neural Network. Laser Photonics Rev. 2024, 18, 2300833. [Google Scholar] [CrossRef]
  58. Jiang, M.; Zhang, X.; Puliafito, C.A.; Zhang, H.F.; Jiao, S. Adaptive Optics Photoacoustic Microscopy. Opt. Express 2010, 18, 21770–21776. [Google Scholar] [CrossRef]
  59. Abdelazeem, R.M.; Agour, M.; Elnaby, S.H. Interferometric Surface Analysis of a Phase-Only Spatial Light Modulator for Surface Deformation Compensation. Photonics 2025, 12, 285. [Google Scholar] [CrossRef]
  60. Whitworth, G.L.; Francone, A.; Sotomayor-Torres, C.M.; Kehagias, N. Real-time optical dimensional metrology via diffractometry for nanofabrication. Sci. Rep. 2020, 10, 5371. [Google Scholar] [CrossRef]
Figure 1. (a) MPLC (multi-plane light conversion)-based free-space diffractive neural network, denoted as S-D2NN. The input beam (red arrow) illuminates a phase-only SLM, the optical path is folded by a mirror (M), the module is mounted on manual translation stages (MS) for precise alignment, and the output is recorded by a CCD sensor. (b) Integrated diffractive deep neural network (I-D2NN): a compact on-chip diffractive stack is placed between two microscope objectives (Obj. 1/2) and imaged on a CMOS sensor. Abbreviations: M—mirror; SLM—spatial light modulator; CCD/CMOS—image sensors; Obj.—objective lens; MS—manual translation stages; S-D2NN—free-space D2NN; I-D2NN—integrated on-chip D2NN.
Figure 2. Scheme of the proposed D2NN training approach. (a) When transferring the in silico training results to a real D2NN system, the unavoidable mismatched setup and various noises can be detrimental to the final performance. (b) SAM parameter update mechanism. (c) A sharp minimum toward which an MNIST-classifying D2NN trained with Adam converges. (d) A flat minimum basin toward which the same D2NN trained with SAM converges. (e) Aberration-adaptive training for different wavefront aberrations.
Figure 3. Modeling the error sources affecting D2NN performance. (a) Device-level random phase noise (Gaussian and salt-and-pepper) imposed on the designed phase mask. (b) System-level wavefront aberrations represented by the Legendre polynomials: tilt modes $L_2$ and $L_3$, and a composite higher-order profile.
Figure 4. Classification accuracy of one-hidden-layer network trained via the four optimization strategies with various perturbations or imperfections. (a) Salt-and-pepper noise perturbation affecting the diffractive network weights, and (b) Gaussian noise perturbation affecting diffractive network weights. Network trained for (c) X-axis misalignment of the optical input plane and (d) Y-axis misalignment of the optical input plane.
Figure 5. Comparison between conventional Adam optimization and the proposed SAM–AIL strategy for disturbance-free and tilt perturbation situations. (a–d) Results obtained using the Adam algorithm. (e–h) Results obtained using the proposed SAM–AIL strategy. (a,e) The phase after training and the eigenvalues of the Hessian matrix. (b–d) Confusion matrices obtained from 10,000 MNIST test images with (b) no perturbation, (c) a 1° tilt of the input plane about the X-axis, and (d) a 1° tilt about the Y-axis. (f–h) Corresponding confusion matrices for the SAM–AIL-trained D2NN.
Figure 6. Performance comparison between conventional Adam optimization and the proposed SAM–AIL strategy under representative noise for the MNIST and Fashion-MNIST datasets. (a,b) Confusion matrices of the Adam-optimized D2NN evaluated under salt-and-pepper noise ($p_{sp} = 0.2$) and additive Gaussian phase noise ($\sigma = 1.5$). (c,d) Corresponding confusion matrices for the SAM–AIL-optimized D2NN under the same noise conditions. (e–h) The comparison repeated for the Fashion-MNIST dataset, with (e,f) depicting the Adam results and (g,h) depicting the SAM–AIL results under identical noise parameters ($p_{sp} = 0.15$, $\sigma = 1.0$).
Figure 7. Classification inference example for a D2NN trained with the conventional Adam scheme and the proposed SAM–AIL strategy under disturbance-free conditions and various perturbations. (a) A typical handwritten digit image of “1” chosen from the MNIST dataset. (b) Layout of the ten predefined detector regions in the output plane. (c) Phase mask obtained (identical to Figure 5e). (d,e) Normalized output-plane intensity maps obtained with the Adam-trained (upper rows) and SAM–AIL-trained (lower rows) D2NN under five conditions: (I) no perturbation; (II) a 1° input-plane tilt about the X-axis; (III) a 1° tilt about the Y-axis; (IV) salt-and-pepper noise ($p_{sp} = 0.2$); and (V) additive Gaussian phase noise ($\sigma = 1.5$). (f) Light-energy distribution across the ten predefined detector regions, comparing results for the Adam and SAM–AIL models.
Figure 8. Classification accuracy of one-hidden-layer network trained via the four optimization strategies with various perturbations or imperfections on the Fashion-MNIST dataset. (a) Salt-and-pepper noise perturbation affecting the diffractive network weights, and (b) Gaussian noise perturbation affecting diffractive network weights. Network trained for (c) X-axis misalignment of the optical input plane and (d) Y-axis misalignment of the optical input plane.
Figure 9. Comparison between conventional Adam optimization and the proposed SAM–AIL strategy for disturbance-free and tilt perturbation situations on the Fashion-MNIST dataset. (a–d) Results obtained using the Adam algorithm. (e–h) Results obtained using the proposed SAM–AIL strategy. (a,e) The phase after training and the eigenvalues of the Hessian matrix. (b–d) Confusion matrices obtained from 10,000 Fashion-MNIST test images with (b) no perturbation, (c) a 1° tilt of the input plane about the X-axis, and (d) a 1° tilt about the Y-axis. (f–h) Corresponding confusion matrices for the SAM–AIL-trained D2NN.
