
Open Access Article

Peer-Review Record

A Conceptual Study of Rapidly Reconfigurable and Scalable Optical Convolutional Neural Networks Based on Free-Space Optics Using a Smart Pixel Light Modulator

Computers 2025, 14(3), 111; https://doi.org/10.3390/computers14030111

by Young-Gu Ju

Reviewer 1:

Jitender Deogun

Reviewer 2: Anonymous

Reviewer 3: Anonymous


Submission received: 18 February 2025 / Revised: 11 March 2025 / Accepted: 18 March 2025 / Published: 20 March 2025

(This article belongs to the Special Issue Emerging Trends in Machine Learning and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This is an interesting paper and will be useful to many researchers. There are several strong aspects:

Clearly articulates the computational limitations of traditional electronic CNNs.

Effectively highlights the limitations of SLMs and justifies the use of SPLMs for Optical CNN.

Provides well-referenced background and related work.

Presents a coherent progression from existing OCNN architectures to the proposed SPOCNN and SPBOCNN.

Figures enhance understanding of the proposed system's architecture and functionality.

However, presentation can be and must be improved to improve the readability of the paper. Please, follow the suggestions given below.

––Please, make the section titles more descriptive and informative.

––The paper would benefit by the introduction of subsections to break down large sections into better structured presentation.

––Properly designed Subsections would improve the organization and readability of the paper.

––Reformatting the paper will not only help improve the readability but also help improve the impact of the paper.

These changes must be made

Author Response

Comments 1:
This is an interesting paper and will be useful to many researchers. There are several strong aspects:
Clearly articulates the computational limitations of traditional electronic CNNs. Effectively highlights the limitations of SLMs and justifies the use of SPLMs for Optical CNN.
Provides well-referenced background and related work.
Presents a coherent progression from existing OCNN architectures to the proposed SPOCNN and SPBOCNN.
Figures enhance understanding of the proposed system's architecture and functionality.
However, presentation can be and must be improved to improve the readability of the paper. Please, follow the suggestions given below.

Please, make the section titles more descriptive and informative.
The paper would benefit by the introduction of subsections to break down large sections into better structured presentation.
Properly designed Subsections would improve the organization and readability of the paper.
Reformatting the paper will not only help improve the readability but also help improve the impact of the paper.

These changes must be made
--------------------------

Response 1:
Thank you very much for taking the time to review this manuscript.
We have divided the sections into the following subsections to address the reviewer's comment.

2.1. Fundamental Concepts of SPOCNN
2.2. Simplifying SPOCNN with Electrical Fan-In and Fan-Out

3.1. Smart Pixel-Based Bidirectional Optical Convolutional Neural Network (SPBOCNN)
3.2. Simplifying SPBOCNN with Electrical Fan-In and Fan-Out
3.3. Application of SPBOCNN in Difference Mode and Multiple Kernel Sets

4.1. Scalability of SPOCNN
4.2. A Design Example of SPOCNN
4.3. Performance Analysis of SPOCNN Throughput
4.4. Transverse Scaling of SPBOCNN Using Smart Pixel Memory
4.5. Longitudinal Scaling of a Two-Mirror-Like SPBOCNN
4.6. Application of SPBOCNN in Solving Partial Differential Equations

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents a theoretical study on implementing CNNs using optical systems. The contribution is incremental - it consists in introducing a smart pixel light modulator into the scalable optical convolutional neural network, proposed in an earlier work by the same author.

In order to make the study more convincing, it should be enriched with some experimental evidence. In the current shape, it is limited to theoretical analyses which are difficult to verify. Alternatively, the author should discuss thoroughly which parts of the proposed system have already been studied experimentally and provide some arguments to justify the feasibility.

CNNs are commonly deployed relying on GPUs which the author briefly mentions in line 42 - the shortcomings of this implementation should be discussed in more detail and some comparisons (theoretical and experimental at best) must be provided.

While the authors clearly identify their contribution in the introduction, it is quite vague in the abstract - it is not clear that SPOCNN is the main contribution of this paper. Also, avoid using abbreviations in the abstract.

Author Response

Comments 1:
The paper presents a theoretical study on implementing CNNs using optical systems. The contribution is incremental - it consists in introducing a smart pixel light modulator into the scalable optical convolutional neural network, proposed in an earlier work by the same author.
In order to make the study more convincing, it should be enriched with some experimental evidence. In the current shape, it is limited to theoretical analyses which are difficult to verify. Alternatively, the author should discuss thoroughly which parts of the proposed system have already been studied experimentally and provide some arguments to justify the feasibility.
---------

Response 1:
Thank you very much for taking the time to review this manuscript.

We added the following paragraphs to give a design example.
>> p15, line 489
A design example of Lens2 for SPOCNN is illustrated in Figure 6. The design was carried out using the optical design software CodeV for an 850 nm wavelength, with optimization starting from a Cooke triplet. The entrance pupil diameter and magnification are 0.4 mm and 5.0, respectively. The kernel array size is 5×5, with dimensions of 0.200×0.200 mm², allowing for a square pixel pitch and size of 40 µm and 10 µm, respectively. The distance from the center to the corner of the kernel array is 0.28 mm, which is why the three field points are set at 0.0, 0.2, and 0.3 mm.
Figure 6(b) shows the spot diagram with an Airy disk, indicating that the RMS spot size is much smaller than the diffraction spread. The combined effects of diffraction and geometric aberration are represented in the encircled energy diagram in Figure 6(c), where 80% of the total energy falls within a circle with a diameter of 68 µm. Given that the kernel array image size or detector pitch is 200 µm, the diffraction and geometric aberration spot size accounts for 34% of the detector pitch, while the geometric image size of a single pixel is 50 µm (25%). The remaining 41% can be used for alignment tolerance in either the detector plane or the SPLM plane. In the SPLM plane, a 41% duty cycle corresponds to 16 µm, providing practical alignment tolerance using modern optical assembly techniques.
To examine the effect of increasing the kernel size, we increased the magnification to 11 by scaling up and re-optimizing the design. With this higher magnification, 80% of the total energy now falls within a 281 µm diameter, while the detector pitch increases to 440 µm. Since the pixel image size is 110 µm, the alignment tolerance is 49 µm at the detector plane and 4.5 µm at the SPLM plane. As magnification increases, greater design effort is required to maintain alignment tolerance, either by reducing the pixel duty cycle or applying different optimization rules.
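
The tolerance figures quoted in this response can be re-derived with simple arithmetic. The sketch below is only an illustration of that bookkeeping, not the author's design procedure: the function name `alignment_budget` and its parameters are hypothetical, and it assumes the leftover detector pitch (after subtracting the 80% encircled-energy diameter and the geometric pixel image) is the alignment tolerance, which maps back to the SPLM plane by dividing by the magnification.

```python
def alignment_budget(detector_pitch_um, spot_diameter_um, pixel_size_um, magnification):
    """Split the detector pitch into spot, pixel image, and leftover tolerance.

    All names here are illustrative; the input values come from the response text.
    """
    # Geometric image of one SPLM pixel at the detector plane.
    pixel_image_um = pixel_size_um * magnification
    # Whatever is left of the pitch is treated as alignment tolerance (assumption).
    tolerance_detector_um = detector_pitch_um - spot_diameter_um - pixel_image_um
    return {
        "spot_fraction": spot_diameter_um / detector_pitch_um,
        "pixel_image_fraction": pixel_image_um / detector_pitch_um,
        "tolerance_fraction": tolerance_detector_um / detector_pitch_um,
        "tolerance_detector_um": tolerance_detector_um,
        # Referring the tolerance back to the SPLM plane divides by the magnification.
        "tolerance_splm_um": tolerance_detector_um / magnification,
    }


# Magnification 5 case: 200 um pitch, 68 um encircled-energy diameter, 10 um pixel.
low = alignment_budget(200.0, 68.0, 10.0, 5.0)
# Magnification 11 case: 440 um pitch, 281 um encircled-energy diameter, 10 um pixel.
high = alignment_budget(440.0, 281.0, 10.0, 11.0)

for label, budget in (("M = 5", low), ("M = 11", high)):
    print(label, {key: round(value, 3) for key, value in budget.items()})
```

Under these assumptions the two cases reproduce the quoted split (34%, 25%, 41%) and the SPLM-plane tolerances of roughly 16 µm and 4.5 µm.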

---------------
---------------

Comments 2:
CNNs are commonly deployed relying on GPUs which the author briefly mentions in line 42 - the shortcomings of this implementation should be discussed in more detail and some comparisons (theoretical and experimental at best) must be provided.
----------

Response 2:
We have revised the manuscript to address the reviewer's comment.
--
Old version: While graphics processing units help mitigate some latency challenges, real-time processing remains a bottleneck for large-scale systems [4].
>> p2, line 44
New version: While graphics processing units (GPUs) are widely used to accelerate CNN-based image recognition, they exhibit several limitations that hinder real-time processing [4]. One major drawback is high power consumption, making GPUs inefficient for edge computing and mobile applications [5]. Additionally, memory bandwidth limitations create bottlenecks when handling large-scale models, as CNNs require frequent memory access that exceeds GPU bandwidth capabilities [6]. Latency issues also arise due to kernel launch overhead, memory transfers, and synchronization delays, reducing the efficiency of real-time inference [7]. Furthermore, scaling large CNN models across multiple GPUs is challenging due to interconnect bottlenecks and increasing data transfer overhead, making real-time processing impractical for large-scale systems [8]. While GPUs excel in training deep learning models, they are not fully optimized for inference, leading to inefficiencies compared to dedicated artificial intelligence accelerators such as tensor processing units, field-programmable gate arrays, and emerging optical computing systems. These limitations highlight the need for alternative hardware architectures to achieve truly real-time, scalable, and power-efficient deep learning inference.

[4] Chetlur S.; Woolley C.; Vandermersch P.; Cohen J.; Tran J.; Catanzaro B.; Shelhamer E. cuDNN: efficient primitives for deep learning, arXiv:1410.0759v3 (2014).
[5] Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems, 28 (2015).
[6] Rhu, M.; Gimelshein, N.; Clemons, J.; Zulfiqar, A.; Keckler, S.W. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 1-13 (2016).
[7] Chen, T.; Moreau, T.; Jiang, Z.; Zheng, L.; Yan, E.; Shen, H.; Cowan, M.; Wang, L.; Hu, Y.; Ceze, L.; Guestrin, C. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 578-594 (2018).
[8] Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; Boyle, R. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, 1-12 (2017).
---------------
---------------

Comments 3:
While the authors clearly identify their contribution in the introduction, it is quite vague in the abstract - it is not clear that SPOCNN is the main contribution of this paper. Also, avoid using abbreviations in the abstract.
--------
Response 3:
We have revised the abstract to address the reviewer's comment.
---
Old version: Smart-pixel-based optical convolutional neural network (SPOCNN) was proposed to improve kernel refresh rates in scalable optical convolutional neural networks by replacing the spatial light modulator with a smart pixel light modulator (SPLM), while maintaining benefits such as unlimited input node size, cascadability, and direct kernel representation. The fast updating capability and memory of SPLM enable real-time applications, including convolution with multiple kernel sets and difference mode. Simplifications using electrical fan-out reduced hardware complexity and costs. Smart-pixel-based bidirectional optical convolutional neural network (SPBOCNN), an evolution of SPOCNN, adopted bidirectional architecture and single lens-array optics, achieving a computational throughput of 8.3 × 10¹⁴ MAC/s with an SPLM resolution of 3840 × 2160. Further development led to two-mirror-like SPBOCNN (TML-SPBOCNN), which can emulate 2n layers using 2 physical layers, offering significant hardware savings despite increased time delay. TML-SPBOCNN was demonstrated for solving partial differential equations (PDEs), leveraging local interactions represented as a sequence of convolutions. These advancements establish SPOCNN and its derivatives as promising solutions for future convolutional neural network applications.

>> p1, line 9
New version: The smart-pixel-based optical convolutional neural network was proposed to improve kernel refresh rates in scalable optical convolutional neural networks (CNNs) by replacing the spatial light modulator with a smart pixel light modulator while preserving benefits such as an unlimited input node size, cascadability, and direct kernel representation. The smart pixel light modulator enhances weight update speed, enabling rapid reconfigurability. Its fast updating capability and memory expand the application scope of scalable optical CNNs, supporting operations like convolution with multiple kernel sets and difference mode. Simplifications using electrical fan-out reduce hardware complexity and costs. An evolution of this system, the smart-pixel-based bidirectional optical CNN, employs a bidirectional architecture and single lens-array optics, achieving a computational throughput of 8.3 × 10¹⁴ MAC/s with a smart pixel light modulator resolution of 3840 × 2160. Further advancements led to the two-mirror-like smart-pixel-based bidirectional optical CNN, which emulates 2n layers using only two physical layers, significantly reducing hardware requirements despite increased time delay. This architecture was demonstrated for solving partial differential equations by leveraging local interactions as a sequence of convolutions. These advancements position smart-pixel-based optical CNNs and their derivatives as promising solutions for future CNN applications.
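
As a rough sanity check on the throughput quoted in the abstract, one can divide it by the number of modulator pixels. This record does not state how the 8.3 × 10¹⁴ MAC/s figure was derived, so the sketch below rests on an assumption: each smart pixel contributes one multiply-accumulate per update. The variable names are illustrative only.

```python
# Values quoted in the abstract.
splm_width, splm_height = 3840, 2160
throughput_mac_per_s = 8.3e14

# Assumption (not stated in this record): one MAC per smart pixel per update.
pixels = splm_width * splm_height
implied_rate_hz = throughput_mac_per_s / pixels

print(f"pixels: {pixels}")
print(f"implied per-pixel update rate: {implied_rate_hz:.3e} Hz")
```

Under that assumption the implied per-pixel update rate is on the order of 10⁸ per second; a different mapping of MACs to pixels would change this estimate.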

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents a smart-pixel-based optical convolutional neural network that enhances existing optical CNN architectures by replacing the spatial light modulator with a smart pixel light modulator.

A few comments which are meant to improve the paper:

The kernel size scalability is limited by Lens2, restricting flexibility for certain CNN architectures. A more detailed discussion on lens array optimizations could help address this issue.
The work is primarily conceptual and theoretical, lacking practical experimental demonstrations. Providing real-world performance benchmarks (for instance, image classification tasks) would significantly improve the paper.
Although SPLMs improve speed, frequent weight reconfiguration in SPBOCNN still introduces sequential delays when processing multiple kernels. Future work could explore non-volatile optical memory solutions to further minimize update time.
The study assumes an ideal optical setup, but crosstalk and diffraction effects may degrade performance in practical implementations. More quantitative optical simulations would help assess these effects.

Author Response

Comments 1:
The paper presents a smart-pixel-based optical convolutional neural network that enhances existing optical CNN architectures by replacing the spatial light modulator with a smart pixel light modulator.

A few comments which are meant to improve the paper:

The kernel size scalability is limited by Lens2, restricting flexibility for certain CNN architectures. A more detailed discussion on lens array optimizations could help address this issue.

---
Response 1:
Thank you very much for taking the time to review this manuscript.
We have added the following paragraphs to address the reviewer's comment.
>> p15, line 479
The array size limit of 66×66 is based on the assumption that the f/# of Lens2 is 2 and its angular aberration is approximately 3 mrad. However, this theoretical limit can be surpassed by using LCOE with clustering methods [13] or SPONN with software-based scaling [18]. While SPOCNN establishes only partial connections between the input and output arrays, LCOE and SPONN provide full connections. Therefore, the connections in SPOCNN can be considered a subset of those in LCOE or SPONN. This means that SPONN can emulate SPOCNN with the same number of inputs and outputs. Since fully connected ONNs are scalable through clustering and smart pixel memory, the limitations of SPOCNN can be effectively overcome.
---------------
---------------
Comments 2:

The work is primarily conceptual and theoretical, lacking practical experimental demonstrations. Providing real-world performance benchmarks (for instance, image classification tasks) would significantly improve the paper.
----
Response 2:
We have revised the manuscript to address the reviewer's comment.

Old version: An intriguing application of TML-SPBOCNN is solving partial differential equations (PDEs).
>> p20, line 643
New version: An ideal demonstration of SPBOCNN is its application in image classification, where instruction steps are analyzed based on the given hardware architecture, and throughput is measured along with processing delay. However, image classification requires not only the convolution process but also a fully connected process, along with data transfer control between these stages. This will be the focus of future research. Meanwhile, an intriguing and immediate application of TML-SPBOCNN is solving partial differential equations (PDEs).
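
To make the idea of solving a PDE as a sequence of convolutions concrete, the sketch below steps the 2D heat equation with an explicit finite-difference scheme, where every time step is a single 3×3 convolution. This is a standard textbook illustration, not the scheme used in the manuscript (which this record does not describe). The function names, the grid size, and the diffusion number `alpha` are all assumptions chosen for the example.

```python
import numpy as np


def convolve2d_same(field, kernel):
    """3x3 'same' convolution with zero padding.

    The kernel used below is symmetric, so flipping it is unnecessary.
    """
    padded = np.pad(field, 1)  # zero padding acts as a fixed-zero boundary
    out = np.zeros_like(field, dtype=float)
    rows, cols = field.shape
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def heat_step_kernel(alpha):
    """Kernel for one explicit Euler step: u <- u + alpha * Laplacian(u)."""
    return np.array([
        [0.0, alpha, 0.0],
        [alpha, 1.0 - 4.0 * alpha, alpha],
        [0.0, alpha, 0.0],
    ])


def diffuse(field, alpha=0.2, steps=50):
    """Apply the same convolution repeatedly; alpha <= 0.25 keeps the scheme stable."""
    kernel = heat_step_kernel(alpha)
    for _ in range(steps):
        field = convolve2d_same(field, kernel)
    return field


# A single hot spot in the middle of a 33x33 grid spreads out over time.
u0 = np.zeros((33, 33))
u0[16, 16] = 1.0
u = diffuse(u0)
print("peak value after diffusion:", u.max())
print("total heat remaining:", u.sum())
```

Each loop iteration corresponds to one convolution layer with a fixed kernel, which is why an architecture that reuses a small number of physical layers many times is a natural fit for this kind of iteration.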

--------------------
--------------------
Comments 3:
Although SPLMs improve speed, frequent weight reconfiguration in SPBOCNN still introduces sequential delays when processing multiple kernels. Future work could explore non-volatile optical memory solutions to further minimize update time.
---
Response 3:
>>
We thank the reviewer for the suggestions. We will consider a non-volatile optical memory solution in our next paper; however, we currently do not have sufficient knowledge about non-volatile optical memory. Therefore, we are unable to include related comments in this manuscript. We hope the reviewer understands this point.
--------------------
--------------------
Comments 4:
The study assumes an ideal optical setup, but crosstalk and diffraction effects may degrade performance in practical implementations. More quantitative optical simulations would help assess these effects.
---
Response 4:
We have added the following paragraphs to give a design example.

>> p15, line 489
A design example of Lens2 for SPOCNN is illustrated in Figure 6. The design was carried out using the optical design software CodeV for an 850 nm wavelength, with optimization starting from a Cooke triplet. The entrance pupil diameter and magnification are 0.4 mm and 5.0, respectively. The kernel array size is 5×5, with dimensions of 0.200×0.200 mm², allowing for a square pixel pitch and size of 40 µm and 10 µm, respectively. The distance from the center to the corner of the kernel array is 0.28 mm, which is why the three field points are set at 0.0, 0.2, and 0.3 mm.
Figure 6(b) shows the spot diagram with an Airy disk, indicating that the RMS spot size is much smaller than the diffraction spread. The combined effects of diffraction and geometric aberration are represented in the encircled energy diagram in Figure 6(c), where 80% of the total energy falls within a circle with a diameter of 68 µm. Given that the kernel array image size or detector pitch is 200 µm, the diffraction and geometric aberration spot size accounts for 34% of the detector pitch, while the geometric image size of a single pixel is 50 µm (25%). The remaining 41% can be used for alignment tolerance in either the detector plane or the SPLM plane. In the SPLM plane, a 41% duty cycle corresponds to 16 µm, providing practical alignment tolerance using modern optical assembly techniques.
To examine the effect of increasing the kernel size, we increased the magnification to 11 by scaling up and re-optimizing the design. With this higher magnification, 80% of the total energy now falls within a 281 µm diameter, while the detector pitch increases to 440 µm. Since the pixel image size is 110 µm, the alignment tolerance is 49 µm at the detector plane and 4.5 µm at the SPLM plane. As magnification increases, greater design effort is required to maintain alignment tolerance, either by reducing the pixel duty cycle or applying different optimization rules.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have corrected the paper following my comments and I do not have any further requests.
