Review Reports - A Conceptual Study of Rapidly Reconfigurable and Scalable Bidirectional Optical Neural Networks Leveraging a Smart Pixel Light Modulator

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this work, Ju et al. have introduced the concept of integrating smart pixel light modulators (SPLMs) into bidirectional optical neural networks (BONNs). This conceptual approach is abbreviated as SPBONN. However, I have several concerns regarding the current manuscript:

1. The presentation style of this work largely resembles that of a conceptual proposal, relying primarily on diverse parameters gathered from various literature sources to support the idea. Consequently, it is recommended to revise the title of the manuscript to better reflect its conceptual nature and emphasize its idea-driven framework.

2. As a research article, it is strongly advised to include numerical results simulated using the proposed model to provide concrete evidence and validation for the results discussed in the manuscript.

Author Response

Thank you for your valuable comments. We have made every effort to incorporate the necessary revisions into the manuscript accordingly.

Comments 1:

The presentation style of this work largely resembles that of a conceptual proposal, relying primarily on diverse parameters gathered from various literature sources to support the idea. Consequently, it is recommended to revise the title of the manuscript to better reflect its conceptual nature and emphasize its idea-driven framework.

Response 1:

We changed the title from 'Rapid-Reconfigurable and Flexible Optical Neural Network Based on Free-Space Optics Using Lens Arrays and a Smart Pixel Light Modulator' to 'A Conceptual Study of Rapidly Reconfigurable and Scalable Bidirectional Optical Neural Networks Leveraging a Smart Pixel Light Modulator'.

Comments 2: As a research article, it is strongly advised to include numerical results simulated using the proposed model to provide concrete evidence and validation for the results discussed in the manuscript.

Response 2:

In the 'Results' and 'Discussion' sections, we analyzed its scalability, throughput, and thermal dissipation. However, we did not apply any specific AI algorithm to obtain more precise numerical results, as this would require additional time and space. Such an analysis could be addressed in a future study using a specific AI algorithm.

page 9, line 252:

"Consequently, the maximum input or output array size is limited to 96 × 96 under the op-tical constraints outlined in reference [17]. This array size enables approximately (96 × 96)², or 8.5 × 10⁷, parallel multiply-and-accumulate (MAC) operations per instruction cycle. Assuming a delay of approximately 10 ns in the SPLM and an additional 10-ns delay in the detector plane of the second substrate [6], the parallel throughput of a single layer is estimated to be 4.3 × 10¹⁵ MAC/s."

We added a further comparative analysis between the cases using SLM and SPLM.

page 12, line 352:

If typical SLMs are used instead of SPLMs to handle positive and negative weights separately by refreshing weight values, the delay increases to at least 10 μs, compared to just 10 ns for SPLMs. This excludes the time required for serialization and loading input data into the SLM array. Consequently, replacing SLMs with SPLMs results in a throughput improvement of over 10,000 times. A similar advantage is observed in applications where the same input data is applied to different sets of weight values. The throughput gain becomes particularly evident with SPLMs in ONN systems, as they significantly reduce the delay when switching between different weight sets.

Furthermore, scaling input and output nodes falls under the multiple weight set scenario. Doubling the input and output nodes causes the number of interconnections to increase fourfold, while the number of calculation steps rises ninefold. If a calculation step using SPLMs takes 10 ns [6], then doubling the scale requires 90 ns, whereas the same calculation using SLMs takes 900 μs. Consequently, the parallel throughput of a single layer, as described at the end of the previous section, is estimated to decrease to approximately 1.9 × 10¹⁵ MAC/s and 1.9 × 10¹¹ MAC/s for SPLMs and SLMs, respectively. Therefore, SPBONN does not significantly degrade parallel throughput while doubling the input and output nodes, thereby providing greater flexibility for ONN.
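
The scaling arithmetic in the paragraph above can be checked with a short sketch. It assumes that each of the nine calculation steps is charged a full 20 ns cycle (10 ns in the SPLM plus 10 ns in the detector plane). This is one possible reading rather than something the response states, but it is consistent with the quoted SPLM figure of about 1.9 × 10¹⁵ MAC/s. The function and variable names are illustrative only.

```python
# Sketch of the throughput estimate after doubling the input and output nodes.
# Assumption (not stated in the response): each calculation step costs one
# full cycle of SPLM delay plus detector delay.

BASE_SIDE = 96            # 96 x 96 array, from the quoted manuscript text
SPLM_DELAY_S = 10e-9      # quoted SPLM delay
DETECTOR_DELAY_S = 10e-9  # quoted detector-plane delay
STEPS_AFTER_DOUBLING = 9  # "the number of calculation steps rises ninefold"


def doubled_throughput(side, steps, step_time_s):
    """Return (MACs, total time in s, MAC/s) for a layer with doubled nodes."""
    base_nodes = side * side
    doubled_nodes = 2 * base_nodes
    macs = doubled_nodes ** 2  # full connection: four times the base count
    total_time_s = steps * step_time_s
    return macs, total_time_s, macs / total_time_s


macs, total_time_s, rate = doubled_throughput(
    BASE_SIDE, STEPS_AFTER_DOUBLING, SPLM_DELAY_S + DETECTOR_DELAY_S
)
print(f"{macs:.2e} MAC in {total_time_s * 1e9:.0f} ns -> {rate:.2e} MAC/s")
```

Under this assumption the doubled layer performs about 3.4 × 10⁸ MAC in 180 ns, which gives roughly 1.9 × 10¹⁵ MAC/s. The SLM case is not modeled here.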

Reviewer 2 Report

Comments and Suggestions for Authors

This work, “Rapid-reconfigurable and flexible optical neural network based on free-space optics using lens arrays and a smart pixel light modulator” by Young-Gu Ju of Kyungpook National University, proposes a free-space optical computing architecture that uses smart pixel light modulators (SPLMs) for bidirectional optical neural networks (BONNs) and highlights their advantages over traditional spatial light modulators (SLMs). Overall, it is an interesting work that provides some new insight for the community, especially for researchers in the field of optical computing. However, before a recommendation can be made, some concerns should be addressed:

1) Although this work only discusses the computing architecture and experiments are absent, some additional discussion of the practical implementation of the SPLM would add value to the paper.

2) The proposed SPLM is critical because it replaces the slow spatial light modulators with relatively fast LEDs, so the limitation is still the speed of the LED. Hence, some information about the development of LED modulation speed would be helpful. There should also be a trade-off between heat dissipation and modulation speed, and this issue might limit the performance of the SPLMs; the author should give further discussion or evidence to persuade the reviewer.

3) Regarding the fabrication of SPLMs, a prism and a grating are employed in the SPLM architecture (e.g., see Figure 4b in the manuscript), so, as a free-space architecture, the assembly and alignment of the prism with the device should be critical for fabrication. Is there any further discussion of the fabrication-error limitations on implementing SPLMs?

4) In the introduction, the discussion of the development of optical computing should be expanded; in particular, besides the free-space routes, the discussion of the PIC (photonic integrated circuit) routes should be expanded. Adding some references on this aspect would be helpful.

Author Response

Thank you very much for your valuable comments. We have made every effort to incorporate the necessary revisions into the manuscript accordingly.

Comments 1: Although this work only discusses the computing architecture and experiments are absent, some additional discussion of the practical implementation of the SPLM would add value to the paper.

Response 1:

We added the following paragraphs.

page 13, line 412:

Implementing a practical SPBONN system requires consideration of many aspects, such as the design and fabrication of optoelectronic chips, lens arrays, and the optical alignment between them. Although smart pixel technology is an established and mature field, it needs to be adapted to accommodate SPBONN using advanced 3D chip packaging techniques, which have become more widely used recently due to the development of AI chips.

A more thorough tolerance analysis of the optical alignment is necessary to ensure the system's feasibility, even though a basic analysis was conducted in the scalability study [17]. The optical elements used in SPBONN, as shown in Figure 5, are simplified and can be implemented with a single diffractive optical element (DOE) of high precision, significantly simplifying the alignment process. In this scheme, the critical factor is the alignment between the smart pixels on the first substrate and the DOE, which can be maintained within a tolerance of less than 5 μm—a feasible target [26]. The same logic applies to the light source for the backward direction, as shown in Figure 4(b).

Comments 2: The proposed SPLM is critical because it replaces the slow spatial light modulators with relatively fast LEDs, so the limitation is still the speed of the LED. Hence, some information about the development of LED modulation speed would be helpful. There should also be a trade-off between heat dissipation and modulation speed, and this issue might limit the performance of the SPLMs; the author should give further discussion or evidence to persuade the reviewer.

Response 2:

We have added the following sentence and corresponding references.

page 3, line 121:

"Since LEDs have some limitations in modulation speed, they can be replaced with multi-mode vertical-cavity surface-emitting lasers for higher modulation speeds [19,22,23]."

Comments 3: Regarding the fabrication of SPLMs, a prism and a grating are employed in the SPLM architecture (e.g., see Figure 4b in the manuscript), so, as a free-space architecture, the assembly and alignment of the prism with the device should be critical for fabrication. Is there any further discussion of the fabrication-error limitations on implementing SPLMs?

Response 3:

We added the following paragraphs.

page 13, line 418:

A more thorough tolerance analysis of the optical alignment is necessary to ensure the system's feasibility, even though a basic analysis was conducted in the scalability study [17]. The optical elements used in SPBONN, as shown in Figure 5, are simplified and can be implemented with a single DOE of high precision, significantly simplifying the alignment process. In this scheme, the critical factor is the alignment between the smart pixels on the first substrate and the DOE, which can be maintained within a tolerance of less than 5 μm—a feasible target [26]. The same logic applies to the light source for the backward direction, as shown in Figure 4(b).

Comments 4: In the introduction, the discussion of the development of optical computing should be expanded; in particular, besides the free-space routes, the discussion of the PIC (photonic integrated circuit) routes should be expanded. Adding some references on this aspect would be helpful.

Response 4:

We added a reference to PIC in addition to [9], along with the following sentence in the introduction.

page 1, line 43:

Some ONNs based on photonic integrated circuits [9,10] have been successful. However, they use physical waveguides that cannot cross each other, resulting in lower space efficiency and reduced parallel throughput for a given connection density.

[10] Ashtiani, F.; Geers, A.J.; Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 2022, 606, 501–506.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper explores the application of smart pixel light modulators (SPLMs) in bidirectional optical neural networks, which improves modulation speed and system performance compared with traditional spatial light modulators. The system offers the advantages of high throughput, high scalability, and hardware simplicity while solving power consumption issues through thermal management.

1. In lines 60–68, the paper is somewhat brief when discussing the innovative aspects of the study. Please provide more specificity about what makes this study truly innovative.

2. The SPLM mentioned in the paper features a high refresh rate and excellent expandability. Could these capabilities be demonstrated in specific tasks? Do the performance evaluations thoroughly account for varying input/output node sizes and network scales?

3. In line 227, the paper mentions a 96×96 node limit. Could this become a performance bottleneck? Are there any potential ways to overcome this limitation?

4. The discussion in the Figure 6 section of the article is cumbersome and confusing. Adding a logic diagram could help clarify the process of doubling the input and output nodes.

5. The description of the multilayer neural network data flow in the paper is rather conceptual. Please include more detailed mathematical modeling or simulation results to better support its feasibility.

6. Multiple conversions between optical and electronic signals introduce delays, which can accumulate in deep neural networks as information is transferred from layer to layer. In line 230, the paper mentions a delay of 10 ns for both the SPLM and the detector. Is this delay caused by the optoelectronic conversion? Please provide the source of this data.

7. The paper emphasizes the superior performance of the proposed neural network but appears to lack sufficient experimental or simulation evidence. Could the authors provide performance validation for specific training tasks?

Author Response

Thank you very much for your valuable comments. We have made every effort to incorporate the necessary revisions into the manuscript accordingly.

Comments 1: In lines 60–68, the paper is somewhat brief when discussing the innovative aspects of the study. Please provide more specificity about what makes this study truly innovative.

Response 1:

We revised the paragraph as follows.

The previous one:

"To address these issues, we propose an ONN based on free-space optics that uses lens ar-rays and a smart pixel light modulator (SPLM). The ONNs proposed herein are a varia-tion of the ONNs previously reported, known as a linear combination optical engine (LCOE) [16] and BONN [18]. We replace the SLM in the previous systems with the SPLM to achieve higher modulation speeds, resulting in a higher refresh rate of weights in the ONN. Furthermore, we investigate the performance of the SPLM-based ONN (SPONN) and examine the effect of memory usage in the SPLM on the scaling of the ONN, without relying on the clustering techniques used in previous systems to increase the number of input and output nodes."

page 2, line 64:

"To address these challenges, we propose an optical neural network (ONN) based on free-space optics that incorporates lens arrays and a smart pixel light modulator (SPLM). The ONNs presented here are advancements of previously reported architectures, such as the linear combination optical engine (LCOE) [17] and BONN [19]. By replacing the SLM in earlier systems with the SPLM, we achieve significantly higher modulation speeds, resulting in a faster refresh rate for weights in the ONN. This enhanced refresh rate makes BONN and TMLBONN more practical for real-world applications. BONN enables backward data flow, which is critical for learning algorithms, while TMLBONN saves significantly more hardware resources by emulating a multi-layer neural network. Thus, integrating SPLM into ONN technology may pave the way for developing more versatile and practical ONNs using current smart-pixel technologies.

Additionally, we analyze the performance of the SPLM-based ONN (SPONN) and explore how memory usage in the SPLM influences the scalability of the ONN. Unlike previous systems, this approach eliminates the need for clustering techniques to increase the number of input and output nodes, offering a more streamlined and efficient solution."

Comments 2: The SPLM mentioned in the paper features a high refresh rate and excellent expandability. Could these capabilities be demonstrated in specific tasks? Do the performance evaluations thoroughly account for varying input/output node sizes and network scales?

Response 2:

We added the following two paragraphs to explain the advantages of the SPLM.

page 12, line 352:

If typical SLMs are used instead of SPLMs to handle positive and negative weights separately by refreshing weight values, the delay increases to at least 10 μs, compared to just 10 ns for SPLMs. This excludes the time required for serialization and loading input data into the SLM array. Consequently, replacing SLMs with SPLMs results in a throughput improvement of over 10,000 times. A similar advantage is observed in applications where the same input data is applied to different sets of weight values. The throughput gain becomes particularly evident with SPLMs in ONN systems, as they significantly reduce the delay when switching between different weight sets.

Furthermore, scaling input and output nodes falls under the multiple weight set scenario. Doubling the input and output nodes causes the number of interconnections to increase fourfold, while the number of calculation steps rises ninefold. If a calculation step using SPLMs takes 10 ns [6], then doubling the scale requires 90 ns, whereas the same calculation using SLMs takes 900 μs. Consequently, the parallel throughput of a single layer, as described at the end of the previous section, is estimated to decrease to approximately 1.9 × 10¹⁵ MAC/s and 1.9 × 10¹¹ MAC/s for SPLMs and SLMs, respectively. Therefore, SPBONN does not significantly degrade parallel throughput while doubling the input and output nodes, thereby providing greater flexibility for ONN.

Comments 3: In line 227, the paper mentions a 96×96 node limit. Could this become a performance bottleneck? Are there any potential ways to overcome this limitation?

Response 3:

It could be considered a bottleneck to some extent for LCOE-like ONNs. However, this issue can, in principle, be addressed through the clustering technique, as explained in reference [16], and through scaling using SPLMs, as discussed at the end of the Discussion section of this paper.

page 10, line 297:

Previously, the scaling limits of ONNs were overcome by using clustering techniques in hardware, where multiple ONN modules were stacked in layers to redistribute extended input data and make full connections between the doubled input and output nodes [17].

page 10, line 302:

The process of doubling the input and output nodes is illustrated in Figure 6.

page 13, line 402:

In this sense, SPTMLBONN, with memory embedded in smart pixels, enables scaling of the neural network in the direction perpendicular to the layer.

Comments 4: The discussion in the Figure 6 section of the article is cumbersome and confusing. Adding a logic diagram could help clarify the process of doubling the input and output nodes.

Response 4:

We have added a logic diagram in Figure 6(e).

page 10, line 314:

"The process of doubling the input and output nodes is summarized in the form of a logic diagram, as shown in Figure 6(e)."

Comments 5: The description of the multilayer neural network data flow in the paper is rather conceptual. Please include more detailed mathematical modeling or simulation results to better support its feasibility.

Response 5:

We added an example of a multilayer SPBONN displayed in Figure 5(c).

page 8, line 226:

"An example of a multilayer SPBONN is displayed in Figure 5(c), which uses the cascading feature of SPBONN. The number of layers can be expanded further to increase parallel throughput for continuous input data."

Comments 6: Multiple conversions between optical and electronic signals introduce delays, which can accumulate in deep neural networks as information is transferred from layer to layer. In line 230, the paper mentions a delay of 10 ns for both the SPLM and the detector. Is this delay caused by the optoelectronic conversion? Please provide the source of this data.

Response 6:

The SPLM shown in Figure 5 has an electric fan-out connected to the EP. The signal from the EP is sent to the LED or VCSEL on each smart pixel. This process involves a single EO conversion, which is assumed to take approximately 1 ns, considering that modern VCSEL drivers operate at frequencies well above a few GHz. A similar assumption applies to the detector on the second substrate. Therefore, we believe that the total 20 ns delay is not an exaggeration, given modern optoelectronic technology. Additionally, when calculating the parallel throughput of a single layer, the delay is considered only within that layer. If a multilayer ONN operates with continuous input data, the multiple layers function simultaneously with the same delay.

According to reference [6], the smart pixel array (SPA) achieved approximately 1 Gb/s per channel in 1995 (A. L. Lentine et al.). This is why we assumed the delay to be on the order of 10 ns, as mentioned in line 115. Therefore, we added reference [6] to the sentences in lines 133 and 255, as follows:

page 3, line 133:

Although the use of SPLM introduces two additional steps of conversion between optical and electronic signals, the delay from this process is less than a few nanoseconds [6].

page 9, line 255:

"Assuming a delay of approximately 10 ns in the SPLM and an additional 10-ns delay in the detector plane of the second substrate, the parallel throughput of a single layer is estimated to be 4.3 × 10¹⁵ MAC/s."

-> "Assuming a delay of approximately 10 ns in the SPLM and an additional 10-ns delay in the detector plane of the second substrate [6], the parallel throughput of a single layer is estimated to be 4.3 × 10¹⁵ MAC/s."

A. L. Lentine, R. A. Novotny, D. J. Reiley, R. L. Morrison, J. M. Sasian, G. Beckman, D. B. Buchholz, S. J. Hinterlong, T. J. Cloonan, G. W. Richards, and F. B. McCormick, “Demonstration of an experimental single chip optoelectronic switching system,” presented at the IEEE LEOS Annual Meeting, San Francisco, CA, Nov. 1995, post-deadline paper.

Comments 7: The paper emphasizes the superior performance of the proposed neural network but appears to lack sufficient experimental or simulation evidence. Could the authors provide performance validation for specific training tasks?

Response 7:

We conducted an analysis of its scalability, throughput, and thermal dissipation in the 'Results' and 'Discussion' sections. However, we did not apply any specific AI algorithm to obtain more precise numerical results, as this would require additional time and paper space. Such an analysis using a specific AI algorithm could be addressed in a future study.

page 9, line 252: "Consequently, the maximum input or output array size is limited to 96 × 96 under the optical constraints outlined in reference [17]. This array size enables approximately (96 × 96)², or 8.5 × 10⁷, parallel multiply-and-accumulate (MAC) operations per instruction cycle. Assuming a delay of approximately 10 ns in the SPLM and an additional 10-ns delay in the detector plane of the second substrate [6], the parallel throughput of a single layer is estimated to be 4.3 × 10¹⁵ MAC/s. This throughput can be further increased as the number of layers increases, provided the data flow remains continuous, as in inference applications, similar to pipelining in a digital computer, with multiple layers operating simultaneously. For example, if there are 10 layers, the throughput could reach 4.3 × 10¹⁶ MAC/s, surpassing the throughput of a tensor processing unit by nearly 100 times [24]."
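
The single-layer and pipelined estimates in the passage above follow from simple arithmetic, sketched below. The array size, delays, and layer count are taken from the quoted text; the tensor-processor figure of 5 × 10¹⁴ MAC/s is the value the authors quote in their Response 3 to Reviewer 4. The helper names are illustrative, and the sketch assumes ideal pipelining in which every layer is busy on every cycle.

```python
# Sketch of the single-layer and pipelined throughput estimates.
# Assumption: ideal pipelining, so total throughput scales with layer count.

ARRAY_SIDE = 96        # 96 x 96 input/output array
CYCLE_TIME_S = 20e-9   # 10 ns in the SPLM + 10 ns in the detector plane
LAYERS = 10            # example layer count from the quoted text
TPU_MAC_PER_S = 5e14   # tensor-processor figure quoted by the authors


def layer_throughput(side, cycle_time_s):
    """MAC/s of one fully connected layer with side*side inputs and outputs."""
    nodes = side * side
    macs_per_cycle = nodes ** 2  # every input connects to every output
    return macs_per_cycle / cycle_time_s


def pipelined_throughput(per_layer_rate, layers):
    """MAC/s when all layers operate simultaneously on continuous input."""
    return per_layer_rate * layers


single = layer_throughput(ARRAY_SIDE, CYCLE_TIME_S)
pipelined = pipelined_throughput(single, LAYERS)
print(f"single layer: {single:.2e} MAC/s")
print(f"{LAYERS} layers:   {pipelined:.2e} MAC/s")
print(f"ratio to tensor processor: {pipelined / TPU_MAC_PER_S:.0f}x")
```

Without intermediate rounding, the single-layer value comes out near 4.2 × 10¹⁵ MAC/s, in line with the quoted 4.3 × 10¹⁵ MAC/s, and the ten-layer ratio to the tensor processor is about 85, which the text describes as nearly 100 times.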

Reviewer 4 Report

Comments and Suggestions for Authors

The authors explore the integration of smart pixel light modulators (SPLMs) into bidirectional optical neural networks (BONNs), highlighting their advantages over traditional spatial light modulators (SLMs). This manuscript shows significant advantages. However, I would like to ask the following questions:

1 In the proposed SPBONN architecture, how does the performance of the bidirectional modulation using SPLMs compare to other existing methods for implementing bidirectional optical neural networks in terms of accuracy and computational efficiency?

2 The paper mentions that the scalability of the SPBONN is limited by the projection system. What are the potential technological advancements that could overcome these limitations in the future, and how might they impact the overall architecture and performance of the network?

3 While the energy consumption of the SPLM array is discussed, how does the energy efficiency of the proposed system compare to that of all-electronic neural networks with similar computational capabilities? Are there any strategies to further optimize the energy consumption of the SPBONN?

4 How sensitive is the performance of the SPONN to variations in the optical components (such as lens quality, alignment accuracy, etc.)? Have any tolerance analyses been conducted in this regard?

5 In the process of doubling the input and output nodes using smart pixel memory, the manuscript mentions a reduction in throughput. Can you provide a more detailed analysis of how this reduction affects the overall performance of the network in different application scenarios, and what trade-offs need to be considered?

6 The manuscript focuses on the advantages of SPLMs over SLMs, but are there any potential drawbacks or challenges associated with SPLMs that could limit their widespread adoption in optical neural networks? How could these be addressed?

7 In the context of the proposed architectures, what role could optical nonlinearities play in enhancing the computational capabilities of the ONN and BONN? Have any investigations been made in this direction?

8 How does the proposed SPONN architecture compare to other emerging optical computing architectures in terms of programmability and flexibility? Are there any opportunities for hybrid architectures that combine the strengths of different approaches?

9 The manuscript mentions that the SPLM-based TMLBONN architecture can conserve hardware resources. Can you provide a more detailed breakdown of the hardware savings compared to traditional multilayer ONN architectures, and how does this impact the cost and complexity of implementing the system?

10 In the absence of experimental results, what are the main assumptions and uncertainties in the theoretical performance analysis presented in the manuscript? How could these be addressed in future research to validate the proposed concepts?

11 For "5. Conclusions", why is it so long? For the conclusion part, in general, we just need to write some good experimental results and conclusions. We don't need to be so verbose.

Author Response

Thank you very much for your valuable comments. We have made every effort to incorporate the necessary revisions into the manuscript accordingly.

Comments 1: In the proposed SPBONN architecture, how does the performance of the bidirectional modulation using SPLMs compare to other existing methods for implementing bidirectional optical neural networks in terms of accuracy and computational efficiency?

Response 1:

We added the following two paragraphs to explain the advantages of the SPLM.

page 12, line 352:

If typical SLMs are used instead of SPLMs to handle positive and negative weights separately by refreshing weight values, the delay increases to at least 10 μs, compared to just 10 ns for SPLMs. This excludes the time required for serialization and loading input data into the SLM array. Consequently, replacing SLMs with SPLMs results in a throughput improvement of over 10,000 times. A similar advantage is observed in applications where the same input data is applied to different sets of weight values. The throughput gain becomes particularly evident with SPLMs in ONN systems, as they significantly reduce the delay when switching between different weight sets.

Furthermore, scaling input and output nodes falls under the multiple weight set scenario. Doubling the input and output nodes causes the number of interconnections to increase fourfold, while the number of calculation steps rises ninefold. If a calculation step using SPLMs takes 10 ns [6], then doubling the scale requires 90 ns, whereas the same calculation using SLMs takes 900 μs. Consequently, the parallel throughput of a single layer, as described at the end of the previous section, is estimated to decrease to approximately 1.9 × 10¹⁵ MAC/s and 1.9 × 10¹¹ MAC/s for SPLMs and SLMs, respectively. Therefore, SPBONN does not significantly degrade parallel throughput while doubling the input and output nodes, thereby providing greater flexibility for ONN.

In addition, at the end of Section 4, 'Results,' the comparison of SPONN with the previous ONN is carried out.

page 12, line 369:

Compared to previous ONNs [15-17,19], the SPONN architecture offers significant advantages in terms of flexibility and reconfigurability. Earlier ONNs, such as hologram-based ONNs [11] and diffractive deep neural networks [15,16], relied on DOEs to represent both the linear and non-linear components of a neural network. While these diffractive optics provided a fast, fully optical implementation, they were neither reconfigurable nor programmable, limiting their applications. Furthermore, hologram-based ONNs [11] lack cascading capability, which critically hinders the implementation of multilayer neural networks. Other ONNs, such as LCOE [17] and BONN [19], used SLMs, making them reconfigurable, but the slow speed of current SLMs hinders real-time weight updates. To maintain optical parallelism with slow SLMs, multiple layers need to be applied in a cascading manner, which becomes space-inefficient as modern deep neural networks require hundreds of layers.

In contrast, the SPONN architecture uses SPLMs to reconfigure network weights at speeds of just a few nanoseconds. This allows weight updates to occur while the previous layer’s computations are still being processed, ensuring real-time reconfigurability without sacrificing optical parallelism. This fast reconfigurability brings greater flexibility in both hardware and software. For hardware, SPBONN can adopt the TMLBONN [19] architecture, significantly reducing space requirements by facilitating data flow between two layers.

Comments 2: The paper mentions that the scalability of the SPBONN is limited by the projection system. What are the potential technological advancements that could overcome these limitations in the future, and how might they impact the overall architecture and performance of the network?

Response 2:

It could be considered a bottleneck to some extent for LCOE-like ONNs. However, this issue can, in principle, be addressed through the clustering technique, as explained in reference [16], and through scaling with SPLM, as discussed at the end of the Discussion section of this paper.

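page 10, line 297:

Previously, the scaling limits of ONNs were overcome by using clustering techniques in hardware, where multiple ONN modules were stacked in layers to redistribute extended input data and make full connections between the doubled input and output nodes [17].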

page 10, line 302:

The process of doubling the input and output nodes is illustrated in Figure 6.

page 13, line 402:

In this sense, SPTMLBONN, with memory embedded in smart pixels, enables scaling of the neural network in the direction perpendicular to the layer.

Comments 3: While the energy consumption of the SPLM array is discussed, how does the energy efficiency of the proposed system compare to that of all-electronic neural networks with similar computational capabilities? Are there any strategies to further optimize the energy consumption of the SPBONN?

Response 3:

We do not have information about the thermal properties of the competing electronic neural network. The tensor processor in reference [2] achieves 5 × 10¹⁴ MAC/s, which is 1/100th the throughput of a 10-layer SPONN or 1/10th the throughput of a single-layer SPONN. If the SPONN operates at the same throughput as the tensor processor, its power dissipation would be only 17 W, which is very low. We added the following sentence on page 10.

page 10, line 291:

Considering that the throughput of a single-layer SPBONN decreases to the level of a tensor processing unit [24], the power consumption drops to only 17 W, which is very low compared to that of an electronic neural network.

page 9, line 260:

For example, if there are 10 layers, the throughput could reach 4.3 × 10¹⁶ MAC/s, surpassing the throughput of a tensor processing unit by nearly 100 times [24].

page 10, line 286:

The total electrical power consumption would be approximately 170 W.
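
The 17 W figure in this response can be related to the 170 W total with a one-line scaling, sketched below. The sketch assumes that electrical power scales linearly with throughput, which the response implies but does not state, and it uses the authors' own ratio of one tenth between the tensor processor and a single-layer SPONN. The function name is illustrative.

```python
# Sketch of the power estimate at reduced throughput.
# Assumption (implied, not stated in the response): power consumption
# scales linearly with the operating throughput.

FULL_POWER_W = 170.0          # quoted total electrical power at full throughput
THROUGHPUT_FRACTION = 1 / 10  # tensor processor vs. single-layer SPONN, as quoted


def scaled_power(full_power_w, throughput_fraction):
    """Power in watts when running at a fraction of the full throughput."""
    if not 0.0 <= throughput_fraction <= 1.0:
        raise ValueError("throughput_fraction must be between 0 and 1")
    return full_power_w * throughput_fraction


power_w = scaled_power(FULL_POWER_W, THROUGHPUT_FRACTION)
print(f"estimated power at reduced throughput: {power_w:.0f} W")
```

If the unrounded ratio 5 × 10¹⁴ / 4.3 × 10¹⁵ were used instead of one tenth, the same linear scaling would give a value closer to 20 W.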

Comments 4: How sensitive is the performance of the SPONN to variations in the optical components (such as lens quality, alignment accuracy, etc.)? Have any tolerance analyses been conducted in this regard?

Response 4:

We added the following paragraphs.

page 13, line 418:

A more thorough tolerance analysis of the optical alignment is necessary to ensure the system's feasibility, even though a basic analysis was conducted in the scalability study [17]. The optical elements used in SPBONN, as shown in Figure 5, are simplified and can be implemented with a single DOE of high precision, significantly simplifying the alignment process. In this scheme, the critical factor is the alignment between the smart pixels on the first substrate and the DOE, which can be maintained within a tolerance of less than 5 μm—a feasible target [26]. The same logic applies to the light source for the backward direction, as shown in Figure 4(b).

Comments 5: In the process of doubling the input and output nodes using smart pixel memory, the manuscript mentions a reduction in throughput. Can you provide a more detailed analysis of how this reduction affects the overall performance of the network in different application scenarios, and what trade-offs need to be considered?

Response 5:

We added the following paragraphs.

page 12, line 360:

Furthermore, scaling input and output nodes falls under the multiple weight set scenario. Doubling the input and output nodes causes the number of interconnections to increase fourfold, while the number of calculation steps rises ninefold. If a calculation step using SPLMs takes 10 ns [6], then doubling the scale requires 90 ns, whereas the same calculation using SLMs takes 900 μs. Consequently, the parallel throughput of a single layer, as described at the end of the previous section, is estimated to decrease to approximately 1.9 × 10¹⁵ MAC/s and 1.9 × 10¹¹ MAC/s for SPLMs and SLMs, respectively. Therefore, SPBONN does not significantly degrade parallel throughput while doubling the input and output nodes, thereby providing greater flexibility for ONN.
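The throughput estimates in this paragraph can be reproduced with a short Python sketch: four times the interconnections processed in nine times the steps gives a factor of 4/9. The SLM baseline of 4.3 × 10¹¹ MAC/s and the 100 μs SLM step time are not stated in this paragraph. They are inferred here from the 10,000-fold ratio quoted in Response 9 and from 900 μs divided by nine steps, and the function names are hypothetical.

```python
# Figures quoted in Response 5; the two SLM values are inferred, as noted above.
BASE_SPLM = 4.3e15    # MAC/s, single layer with SPLMs
BASE_SLM = 4.3e11     # MAC/s, inferred: 10,000 times slower than the SPLM case
STEP_SPLM_S = 10e-9   # s per calculation step with SPLMs [6]
STEP_SLM_S = 100e-6   # s per calculation step with SLMs, inferred from 900 us / 9


def doubled_node_throughput(base, interconnect_factor=4, step_factor=9):
    """Throughput after doubling the nodes: 4x the work done in 9x the steps."""
    return base * interconnect_factor / step_factor


def doubled_node_time(step_time_s, step_factor=9):
    """Total time for the nine calculation steps of the doubled network."""
    return step_time_s * step_factor


for name, base, step in (("SPLM", BASE_SPLM, STEP_SPLM_S),
                         ("SLM", BASE_SLM, STEP_SLM_S)):
    print(f"{name}: {doubled_node_throughput(base):.2e} MAC/s, "
          f"{doubled_node_time(step):.2e} s per doubled-scale calculation")
```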

Comments 6: The manuscript focuses on the advantages of SPLMs over SLMs, but are there any potential drawbacks or challenges associated with SPLMs that could limit their widespread adoption in optical neural networks? How could these be addressed?

Response 6:

The main obstacle seems to be the cost of designing and fabricating the optoelectronic chips using advanced 3D chip packaging.

We added the following paragraphs.

page 13, line 412:

Comments 7: In the context of the proposed architectures, what role could optical nonlinearities play in enhancing the computational capabilities of the ONN and BONN? Have any investigations been made in this direction?

Response 7:

We added the following sentences.

page 2, line 96:

Although optical nonlinearities enable faster computing without electronic delays [14], they are difficult to control and reconfigure precisely in large arrays with small form factors. In this paper, this role is performed by the EP or smart pixel. This reflects the direction of LCOE in designing ONNs to be general-purpose and programmable.

Comments 8: How does the proposed SPONN architecture compare to other emerging optical computing architectures in terms of programmability and flexibility? Are there any opportunities for hybrid architectures that combine the strengths of different approaches?

Response 8:

At the end of Section 4, 'Results,' a comparison of SPONN with the previous ONN is presented.

Regarding hybrid architectures, we believe that the features of SPLM can be applied to other architectures to enhance rapid reconfigurability and programmability.

page 12, line 369:

Comments 9: The manuscript mentions that the SPLM-based TMLBONN architecture can conserve hardware resources. Can you provide a more detailed breakdown of the hardware savings compared to traditional multilayer ONN architectures, and how does this impact the cost and complexity of implementing the system?

Response 9:

We added Figure 7 and two more paragraphs as follows.

page 13, line 393:

The operation of a smart pixel-based TMLBONN (SPTMLBONN) is illustrated in Figure 7. Data flows between two layers as light bounces back and forth between two mirrors, emulating 2n layers of a neural network. This approach reduces the hardware requirements from 2n layers to just 2 layers, saving hardware space and cost while increasing the delay by 2n times for continuous input compared to a 2n-layer multilayer SPBONN.

For example, if a 10-layer SPBONN achieves 4.3×10¹⁶ MAC/s, a 2-layer SPTMLBONN can achieve 4.3×10¹⁵ MAC/s, using five times less space and hardware resources. Therefore, SPTMLBONN offers significant advantages during the initial development stage, as its architecture requires much less hardware while still emulating an arbitrary number of layers, depending on the memory capacity of the smart pixels. In this sense, SPTMLBONN, with memory embedded in smart pixels, enables scaling of the neural network in the direction perpendicular to the layer. In contrast, if an SLM replaces the SPLM in the TMLBONN, the parallel throughput decreases by at least 10,000 times, resulting in 4.3×10¹¹ MAC/s. Thus, the advantage of the SPLM with TMLBONN is obvious.
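The comparison in these two paragraphs can be summarised in a small Python sketch. It assumes that a pipelined multilayer SPBONN delivers the single-layer throughput once per layer, and that the two-layer SPTMLBONN delivers the single-layer throughput because only one layer computes at a time. Both assumptions are read off the quoted figures rather than stated explicitly, and the function name is hypothetical.

```python
def tml_comparison(emulated_layers=10, single_layer=4.3e15, slm_slowdown=1e4):
    """Compare a pipelined multilayer SPBONN with a two-layer TMLBONN.

    single_layer: single-layer SPLM throughput in MAC/s (quoted value).
    slm_slowdown: throughput penalty of an SLM relative to an SPLM
                  (the response quotes "at least 10,000 times").
    """
    return {
        # Assumption: all layers of the pipelined network work in parallel.
        "pipelined_splm": emulated_layers * single_layer,
        # Assumption: only one layer of the TMLBONN is active at a time.
        "tml_splm": single_layer,
        "tml_slm": single_layer / slm_slowdown,
        # Physical layers needed: emulated_layers versus 2.
        "hardware_ratio": emulated_layers / 2,
    }


result = tml_comparison()
for key, value in result.items():
    print(f"{key}: {value:.3g}")
```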

Comments 10: In the absence of experimental results, what are the main assumptions and uncertainties in the theoretical performance analysis presented in the manuscript? How could these be addressed in future research to validate the proposed concepts?

Response 10:

The performance of SPBONN primarily depends on that of the smart pixel, which is assumed to have a delay of tens of nanoseconds. This technical goal seems feasible with the current level of semiconductor fabrication and optoelectronic chip packaging. According to reference [6], the smart pixel array (SPA) achieved approximately 1 Gb/s per channel in 1995 (A. L. Lentine et al.). In the future, it should be demonstrated that the smart pixel light modulator can function within the SPBONN architecture in practice, starting with a small-sized array and medium modulation speed, and gradually scaling up to a larger array size and higher refresh rates to address the uncertainties.

A. L. Lentine, R. A. Novotny, D. J. Reiley, R. L. Morrison, J. M. Sasian, G. Beckman, D. B. Buchholz, S. J. Hinterlong, T. J. Cloonan, G. W. Richards, and F. B. McCormick, “Demonstration of an experimental single chip optoelectronic switching system,” presented at the IEEE LEOS Annual Meeting, San Francisco, CA, Nov. 1995, post-deadline paper.

We added reference [6] to the relevant sentences and mentioned the development of smart pixels using 3D chip packaging in the future.

page 3, line 133:

Although the use of SPLM introduces two additional steps of conversion between optical and electronic signals, the delay from this process is less than a few nanoseconds [6].

page 9, line 255:

page 13, line 412:

Comments 11: For "5. Conclusions", why is it so long? For the conclusion part, in general, we just need to write some good experimental results and conclusions. We don't need to be so verbose.

Response 11:

We have shortened some parts of the conclusion as follows:

"Scalability is a critical aspect of the SPBONN architecture. The core of SPBONN lies in its projection system, which faces scalability limitations due to geometrical aberrations and image magnification. These constraints limit the input and output array size to 96 × 96, enabling approximately 8.5 × 10⁷ parallel MAC operations per instruction cycle. Despite these limitations, the parallel throughput of a single layer can reach 4.3 × 10¹⁵ MAC/s, which can be further increased with additional layers, similar to pipelining in digital computers. For instance, with 10 layers, the throughput could reach 4.3 × 10¹⁶ MAC/s. The increase in throughput by using multiple layers does not apply to the backpropagation algorithm, where calculations are performed one layer at a time. Because multiple layers are not activated simultaneously, the need for high optical parallelism across multiple layers is reduced, along with the hardware space requirements. The SPLM-based TMLBONN architecture addresses these challenges by effectively managing multilayer sequential algorithms with high weight-refresh rates, conserving hardware resources, and ensuring efficient data flow between layers."

page 13, line 439:

"SPBONN with an array size of 96 × 96 enables approximately 8.5 × 10⁷ parallel MAC operations per instruction cycle. The parallel throughput of a single layer can reach 4.3 × 10^15 MAC/s, which can be further increased with additional layers, similar to pipelining in digital computers. For instance, with 10 layers, the throughput could reach 4.3 × 10¹⁶ MAC/s. The SPLM-based TMLBONN can emulate a multilayer ONN using the high weight-refresh rates of the SPLM, saving hardware resources and ensuring data flow be-tween two layers at much higher speeds than an SLM-based BONN.."

"An assessment of the energy consumption of SPBONN was conducted, given that SPLMs may consume more power than SLMs. Unlike SLMs, such as LCDs, which are en-ergy efficient and produce minimal heat, SPLMs use LEDs or LDs that emit both light and heat, particularly in large arrays. For instance, a 96 × 96 SPLM array could consume ap-proximately 170 W. However, effective heat dissipation via convection is feasible if the SPLM array is sufficiently large."

page 14, line 446:

"An assessment of the energy consumption of SPBONN was conducted, given that SPLMs may consume more power than SLMs. For instance, a 96 × 96 SPLM array could consume approximately 170 W. However, effective heat dissipation via convection is fea-sible if the SPLM array is sufficiently large."

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed the queries raised by the reviewer. I recommend that the paper be published.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper is only a conceptual introduction of a BONN based on a smart pixel light modulator. The key problem of this paper is the lack of simulation or experiment to demonstrate this model and make it convincing. I suggest that the authors add some simulations of the proposed BONN for a specific task. Then the paper can be reconsidered.

Reviewer 4 Report

Comments and Suggestions for Authors

I appreciate the author's effort in incorporating the suggestions and revising the manuscript accordingly. All the raised concerns have been effectively addressed. Therefore, I recommend its publication in Photonics in its current form.