# FatNet: High-Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks


## Abstract


## 1. Introduction

## 2. Materials

## 3. Methods

1. FatNet preserves the same number of layers as the original network, so that the number of non-linear activation functions is unchanged.
2. FatNet keeps exactly the same architecture as the original network in the shallow layers, up to the point where the feature maps pool down to a shape whose number of elements is less than or equal to the number of classes.
3. At the output of each layer, FatNet has the same total number of feature-map pixels as the original network. Since the feature-map shape then stays constant and no pooling is used, a new number of output channels must be calculated, which will be smaller than in the original network.
4. FatNet has the same number of trainable parameters per layer as the original network. Because the third rule reduces the number of output channels, a new, larger kernel size must be calculated to preserve the parameter count.
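The third and fourth rules above can be sketched as a small calculation. The function name and signature below are ours, as is the assumption (consistent with Table 1) that FatNet feature maps are held at a resolution whose pixel count equals the number of classes (10 × 10 for CIFAR-100):

```python
import math

def fatnet_layer(fatnet_in, orig_in, orig_out, kernel, out_feat_pixels,
                 num_classes=100):
    """Derive the FatNet version of a convolutional layer.

    fatnet_in       -- input channels of the FatNet layer (output channels
                       of the previous FatNet layer)
    orig_in, orig_out, kernel -- the original layer's channels and kernel side
    out_feat_pixels -- total output feature-map pixels of the original layer
                       (channels x height x width)
    Returns (new output channels, new kernel side).
    """
    # Rule 4: the FatNet layer must keep the original parameter count.
    params = orig_in * orig_out * kernel ** 2
    # Rule 3: total output pixels stay the same, at num_classes pixels
    # per channel, so the channel count shrinks.
    new_out = math.ceil(out_feat_pixels / num_classes)
    # Solve params = fatnet_in * new_out * k^2 for the new kernel side.
    new_kernel = round(math.sqrt(params / (fatnet_in * new_out)))
    return new_out, new_kernel
```

For example, the first converted layer of Table 1 (64 × 128, k = 3 × 3, with 8192 output pixels) maps to 82 output channels with a 4 × 4 kernel: `fatnet_layer(64, 64, 128, 3, 8192)` returns `(82, 4)`.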

#### Experiments

## 4. Results

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

CIFAR | Canadian Institute For Advanced Research |
---|---|
CNN | Convolutional Neural Network |
ASIC | Application-Specific Integrated Circuit |
ELU | Exponential Linear Unit |
SGD | Stochastic Gradient Descent |
FFT | Fast Fourier Transform |
TPU | Tensor Processing Unit |
MZI | Mach–Zehnder Interferometer |
SLM | Spatial Light Modulator |

## References

1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM **2017**, 60, 84–90.
2. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv **2016**, arXiv:1506.02640.
3. Tompson, J.; Goroshin, R.; Jain, A.; LeCun, Y.; Bregler, C. Efficient Object Localization Using Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 648–656.
4. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv **2015**, arXiv:1505.04597.
5. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Proceedings of the Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 525–542.
6. Sunny, F.P.; Taheri, E.; Nikdast, M.; Pasricha, S. A Survey on Silicon Photonics for Deep Learning. ACM J. Emerg. Technol. Comput. Syst. **2021**, 17, 1–57.
7. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-Datacenter Performance Analysis of a Tensor Processing Unit. arXiv **2017**, arXiv:1704.04760.
8. Davies, M.; Srinivasa, N.; Lin, T.H.; Chinya, G.; Cao, Y.; Choday, S.H.; Dimou, G.; Joshi, P.; Imam, N.; Jain, S.; et al. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning. IEEE Micro **2018**, 38, 82–99.
9. DeBole, M.V.; Taba, B.; Amir, A.; Akopyan, F.; Andreopoulos, A.; Risk, W.P.; Kusnitz, J.; Ortega Otero, C.; Nayak, T.K.; Appuswamy, R.; et al. TrueNorth: Accelerating From Zero to 64 Million Neurons in 10 Years. Computer **2019**, 52, 20–29.
10. Waldrop, M.M. The chips are down for Moore’s law. Nat. News **2016**, 530, 144.
11. Li, X.; Shao, Z.; Zhu, M.; Yang, J. Fundamentals of Optical Computing Technology: Forward the Next Generation Supercomputer, 1st ed.; Springer: New York, NY, USA, 2018.
12. Lin, X.; Rivenson, Y.; Yardimci, N.T.; Veli, M.; Luo, Y.; Jarrahi, M.; Ozcan, A. All-optical machine learning using diffractive deep neural networks. Science **2018**, 361, 1004–1008.
13. Li, S.; Miscuglio, M.; Sorger, V.; Gupta, P. Channel Tiling for Improved Performance and Accuracy of Optical Neural Network Accelerators. arXiv **2020**, arXiv:2011.07391.
14. Chang, J.; Sitzmann, V.; Dun, X.; Heidrich, W.; Wetzstein, G. Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci. Rep. **2018**, 8, 12324.
15. Shen, Y.; Harris, N.C.; Skirlo, S.; Prabhu, M.; Baehr-Jones, T.; Hochberg, M.; Sun, X.; Zhao, S.; Larochelle, H.; Englund, D.; et al. Deep learning with coherent nanophotonic circuits. Nat. Photonics **2017**, 11, 441–446.
16. Hughes, T.W.; Minkov, M.; Shi, Y.; Fan, S. Training of photonic neural networks through in situ backpropagation and gradient measurement. Optica **2018**, 5, 864–871.
17. Sui, X.; Wu, Q.; Liu, J.; Chen, Q.; Gu, G. A Review of Optical Neural Networks. IEEE Access **2020**, 8, 70773–70783.
18. Bracewell, R.N. The Fourier Transform and Its Applications, 3rd ed.; McGraw-Hill Series in Electrical and Computer Engineering Circuits and Systems; McGraw-Hill: Boston, MA, USA, 2000.
19. Gaskill, J.D. Linear Systems, Fourier Transforms, and Optics, 1st ed.; Wiley-Interscience: New York, NY, USA, 1978.
20. Cooley, J.W.; Tukey, J.W. An Algorithm for the Machine Calculation of Complex Fourier Series. Math. Comput. **1965**, 19, 297–301.
21. Colburn, S.; Chu, Y.; Shilzerman, E.; Majumdar, A. Optical frontend for a convolutional neural network. Appl. Opt. **2019**, 58, 3179–3186.
22. Jutamulia, S.; Asakura, T. Fourier transform property of lens based on geometrical optics. In Proceedings of the Optical Information Processing Technology, Shanghai, China, 14–18 October 2002; Volume 4929, pp. 80–85.
23. Culshaw, B. The Fourier Transform Properties of Lenses. In Introducing Photonics; Cambridge University Press: Cambridge, UK, 2020; pp. 132–135.
24. Weaver, C.S.; Goodman, J.W. A Technique for Optically Convolving Two Functions. Appl. Opt. **1966**, 5, 1248–1249.
25. Jutamulia, S.; Yu, F.T.S. Overview of hybrid optical neural networks. Opt. Laser Technol. **1996**, 28, 59–72.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
27. Géron, A. Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017.
28. Peng, C.; Zhang, X.; Yu, G.; Luo, G.; Sun, J. Large Kernel Matters—Improve Semantic Segmentation by Global Convolutional Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4353–4361.
29. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 23 March 2023).
30. Shah, A.; Kadam, E.; Shah, H.; Shinde, S.; Shingade, S. Deep Residual Networks with Exponential Linear Unit. In Proceedings of the Third International Symposium on Computer Vision and the Internet, Jaipur, India, 21–24 September 2016; pp. 59–65.
31. Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv **2016**, arXiv:1511.07289.
32. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv **2020**, arXiv:1905.11946.
33. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
34. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates Inc.: Vancouver, BC, Canada, 10–12 December 2019; Volume 32.
35. Miscuglio, M.; Hu, Z.; Li, S.; George, J.K.; Capanna, R.; Dalir, H.; Bardet, P.M.; Gupta, P.; Sorger, V.J. Massively parallel amplitude-only Fourier neural network. Optica **2020**, 7, 1812–1819.
36. Li, J.; Peng, Z.; Fu, Y. Diffraction transfer function and its calculation of classic diffraction formula. Opt. Commun. **2007**, 280, 243–248.
37. Voelz, D.G. Computational Fourier Optics: A MATLAB® Tutorial; SPIE: Bellingham, WA, USA, 2011.
38. Mizusawa, S.; Sei, Y. Interlayer Augmentation in a Classification Task. In Proceedings of the 2021 International Conference on Computing, Electronics & Communications Engineering (iCCECE), Southend, UK, 16–17 August 2021; pp. 59–64.
39. Luo, J.H.; Wu, J.; Lin, W. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5058–5066.

**Figure 1.** Graphical representation of the 4f system performing the convolution operation. It consists of the input plane (laser), a convex lens, the Fourier plane (modulator or phase mask), a second convex lens, and the camera, each separated from the next by one focal length of the lens. When light passes through the first lens, it forms a 2D Fourier transform at the Fourier plane, where it can be multiplied by the kernel in the frequency domain. The light then passes through the second lens, which converts it back into the spatial domain, where the output is read by the camera.
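Numerically, the operation Figure 1 describes reduces to a circular convolution computed through the 2D FFT: the first lens corresponds to `fft2`, the Fourier-plane mask to an elementwise multiplication, and the second lens to the inverse transform. A minimal NumPy sketch (the function name and array sizes are ours):

```python
import numpy as np

def conv4f(image, kernel):
    """Circular 2D convolution of `image` with `kernel`, computed the way a
    4f correlator performs it: transform, multiply in the frequency domain,
    transform back. The kernel is zero-padded to the image size."""
    k = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    k[:kh, :kw] = kernel
    # fft2 ~ first lens; elementwise product ~ Fourier-plane mask;
    # ifft2 ~ second lens bringing the field back to the spatial domain.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(k)))
```

A convenient sanity check is a delta-function kernel: a delta at the origin reproduces the input, and a shifted delta circularly shifts it, exactly as the convolution theorem predicts.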

**Figure 2.** Illustration of CIFAR-100 dataset examples. CIFAR-100 contains tiny images of 100 classes, with a resolution of 32 × 32.

**Figure 3.** Architecture comparison of our modified ResNet-18 used to train CIFAR-100 and the FatNet constructed from ResNet-18 specifically for CIFAR-100 classification. (**a**) ResNet-18 architecture, slightly modified from the original. Our version does not use strides, since optics cannot perform strided convolutions. We also skipped the second non-residual convolutional layer to make it more compatible with CIFAR-100. (**b**) FatNet derived from ResNet-18 for CIFAR-100. Compared with ResNet-18, this architecture contains fewer channels but larger resolutions. Kernel resolutions can go up to 73 × 73, while feature maps are not pooled below 10 × 10. The last layer is a 10 × 10 matrix flattened to form a vector of 100 elements, each representing a class of CIFAR-100.

**Figure 4.** Training and validation accuracy for each experimented network at every epoch. (**a**) Training accuracy of ResNet-18, FatNet, and the optical simulation of FatNet. All networks achieved an accuracy of 99%, although ResNet-18 required fewer epochs; the optical simulation took longer to train since it uses a more extended computation graph to simulate light propagation. (**b**) Validation accuracy of ResNet-18, FatNet, and the optical simulation of FatNet. ResNet-18 trained up to 66%, while FatNet did not achieve validation and test accuracy higher than 60%, although it performed fewer convolution operations.

**Table 1.** Conversion of the ResNet-18 convolutional layers into FatNet layers. For each original layer, the number of weights and output feature-map pixels determine the FatNet layer's channel count and kernel size.

Layer | Number of Weights | Feature Pixels | FatNet Layer |
---|---|---|---|
64 × 128, k = (3 × 3) | 73,728 | 8192 | 64 × 82, k = (4 × 4) |
128 × 128, k = (3 × 3) | 147,456 | 8192 | 82 × 82, k = (5 × 5) |
128 × 128, k = (3 × 3) | 147,456 | 8192 | 82 × 82, k = (5 × 5) |
128 × 128, k = (3 × 3) | 147,456 | 8192 | 82 × 82, k = (5 × 5) |
128 × 256, k = (3 × 3) | 294,912 | 4096 | 82 × 41, k = (9 × 9) |
256 × 256, k = (3 × 3) | 589,824 | 4096 | 41 × 41, k = (19 × 19) |
256 × 256, k = (3 × 3) | 589,824 | 4096 | 41 × 41, k = (19 × 19) |
256 × 256, k = (3 × 3) | 589,824 | 4096 | 41 × 41, k = (19 × 19) |
256 × 512, k = (3 × 3) | 1,179,648 | 2048 | 41 × 21, k = (37 × 37) |
512 × 512, k = (3 × 3) | 2,359,296 | 2048 | 21 × 21, k = (73 × 73) |
512 × 512, k = (3 × 3) | 2,359,296 | 2048 | 21 × 21, k = (73 × 73) |
512 × 512, k = (3 × 3) | 2,359,296 | 2048 | 21 × 21, k = (73 × 73) |
FC (512, 100) | 51,200 | 100 | 21 × 1, k = (49 × 49) |

**Table 2.** Comparison of the test accuracy and number of convolution operations used in each tested network.

Architecture | Test Accuracy (mean ± std) | Number of Conv Operations | Ratio to Baseline |
---|---|---|---|
ResNet-18 | $66\pm 1.4\%$ | 1,220,800 | 1 (baseline) |
FatNet | $60\pm 1.4\%$ | 148,637 | 0.12 |
Optical simulation of FatNet | 60% | 148,637 | 0.12 |
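The ratio column of Table 2 follows directly from the operation counts; as a quick arithmetic check:

```python
# Operation counts taken from Table 2.
resnet18_conv_ops = 1_220_800
fatnet_conv_ops = 148_637

# 148,637 / 1,220,800 ~ 0.12: FatNet performs roughly 8x fewer
# convolution operations than the ResNet-18 baseline.
ratio = fatnet_conv_ops / resnet18_conv_ops
```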

**Table 3.** Inference time in seconds per input for ResNet-18 and FatNet, on GPU and on optics, with batch sizes of 64 and 3136; the latter corresponds to full utilization of the 4K resolution of the 4f device. The frame rate of the 4f device is approximated at 2 MHz [13].

Architecture | Batch 64 | Batch 3136 |
---|---|---|
ResNet-18 (GPU) | $1.350\times 10^{-4}$ | $1.167\times 10^{-4}$ |
FatNet (GPU) | $4.565\times 10^{-4}$ | $7.942\times 10^{-4}$ |
ResNet-18 (Optics) | $3.815\times 10^{-2}$ | $7.786\times 10^{-4}$ |
FatNet (Optics) | $4.645\times 10^{-3}$ | $9.479\times 10^{-5}$ |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ibadulla, R.; Chen, T.M.; Reyes-Aldasoro, C.C.
FatNet: High-Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks. *AI* **2023**, *4*, 361-374.
https://doi.org/10.3390/ai4020018
