Article

A Deep Learning-Enhanced MIMO C-OOK Scheme for Optical Camera Communication in Internet of Things Networks

1
Faculty of Engineering and Technology, Quy Nhon University, Quy Nhon 600000, Vietnam
2
Hanoi School of Business & Management (HSB), Vietnam National University (VNU), Hanoi 100000, Vietnam
3
HuePress JSC, Hanoi 100000, Vietnam
4
Institute of Research and Technology, Duy Tan University, Da Nang 550000, Vietnam
5
School of Engineering & Technology, Duy Tan University, Da Nang 550000, Vietnam
*
Author to whom correspondence should be addressed.
Photonics 2026, 13(2), 163; https://doi.org/10.3390/photonics13020163
Submission received: 7 January 2026 / Revised: 2 February 2026 / Accepted: 6 February 2026 / Published: 8 February 2026
(This article belongs to the Special Issue Optical Wireless Communications (OWC) for Internet-of-Things (IoT))

Abstract

Wireless communication systems based on radio frequencies (RFs) are widely used in applications such as mobile communications, radio frequency identification, marine networks, smart farms, and smart homes. Owing to their ease of installation, wireless systems offer clear advantages over wired alternatives. However, the deployment of high-frequency radio waves for communication can pose potential health risks. To address these concerns, many researchers have explored visible light as a safer alternative to radio frequency communication, and in this context optical camera communication has emerged as a strong candidate to complement RF systems. Meanwhile, artificial intelligence (AI) is reshaping industry and daily life by solving complex problems, enabling intelligent automation, and driving advances in technologies such as smart farms, smart homes, and future Internet of Things systems. In this study, we propose a Multiple-Input Multiple-Output Camera On–Off Keying (MIMO C-OOK) scheme that integrates YOLOv11 for light source detection and tracking with a deep learning-based decoder, optimized for long-range and mobile communication scenarios. The proposed approach enhances the conventional C-OOK system by increasing the data rate and transmission range while reducing errors at the receiver. Implementation results show that the proposed approach achieves reliable communication up to 10 m with minimal errors, even under mobility conditions (3 m/s, equivalent to walking speed), by optimizing camera parameters and employing forward error correction (FEC).

1. Introduction

Wireless technologies offer several advantages over wired communication, such as easier installation, higher flexibility, and the ability to broadcast data without physical connections. Radio frequency systems are widely used in many communication networks. However, as wireless technologies advance, the RF spectrum faces limitations due to spectrum scarcity and high energy consumption. Achieving higher data rates requires operating in higher frequency bands, which has prompted researchers and organizations to plan a sixth-generation (6G) cellular system within the sub-terahertz (sub-THz) bands, targeting data rates between 1 and 10 Tbps [1]. Nevertheless, RF systems rely on electromagnetic waves that may negatively affect human health [2], particularly in settings such as hospitals, schools, and nursing homes, where vulnerable populations such as the elderly, children, and patients are present [3].
Motivated by these concerns, researchers around the world have been investigating alternative technologies that can substitute for RF systems. By using light waves instead of RF waves, the electromagnetic effect on human health can be reduced. Visible Light Communication (VLC), Light Fidelity (LiFi), and Optical Camera Communication (OCC) are three candidate Optical Wireless Communication (OWC) technologies. These optical methods offer several advantages over RF-based systems:
  • Safety: Visible light is harmless to humans [4] and does not cause electromagnetic interference (EMI).
  • Bandwidth: The visible light spectrum offers a bandwidth more than 1000 times wider than the RF spectrum.
  • Efficiency: Visible light enables safer and more efficient data transmission.
Due to these benefits, OWC has gained significant attention and research funding. The IEEE first standardized OWC through IEEE 802.15.7-2011 [5] and later released updated versions, IEEE 802.15.7-2018 [6] and IEEE 802.15.7a-2024 [7].
LEDs have become next-generation light sources for communication systems due to their long lifespan, high energy efficiency, low cost, and design flexibility. In addition, LEDs are compatible with high-speed OWC systems because they can switch ON and OFF quickly [8,9,10]. VLC and LiFi systems typically use photodiodes to detect light intensity in real time, while OCC systems employ cameras as receivers to capture images and extract data from LED signals. Both global-shutter and rolling-shutter cameras are used in OCC systems, enabling various modulation schemes depending on the shutter mechanism.
Although RF-based systems are still widely applied in communications, monitoring, and mobile networks, the EMI they emit can affect both system performance and human neurological function [11]. The RF-related health risk is often regarded as indirect rather than biological [12]: EMI caused by RF signals may degrade or interfere with the functioning of medical devices, potentially resulting in hazardous clinical situations and patient injury. In contrast, OWC technologies are EMI-free and are therefore being actively studied as potential RF alternatives [13]. In VLC and LiFi, photodiodes detect the light intensity corresponding to the ON/OFF states of light sources [14]. Several studies [15,16] have proposed ultra-high-speed pulse-density modulation using photodiodes, achieving excellent spectral efficiency. MIMO (Multiple-Input Multiple-Output) technology [17,18] has also been introduced to further increase data throughput by using multiple high-speed channels, and artificial intelligence (AI) models have been applied to enhance the performance of VLC-MIMO systems, as reported in [19,20]. In these studies, deep learning techniques were used to investigate large-scale MIMO configurations in VLC systems.
Nevertheless, photodiode-based systems have limitations. LiFi systems are typically suitable only for short-range communication and are highly sensitive to mobility and channel conditions. In contrast, OCC systems, which use image sensors instead of photodiodes, can achieve a significantly longer communication range of up to 200 m [21]. The authors in [22] analyzed how camera characteristics, such as shutter type, frame rate, focal length, and exposure time, affect OCC performance. For instance, global-shutter cameras depend almost entirely on the frame rate, as dictated by Nyquist's theorem, while rolling-shutter cameras require consideration of both frame rate and shutter speed. Currently, LiFi systems can achieve up to 10 m of communication using a photodiode with special lenses [23].
Region of Interest (RoI) signaling algorithms enable the tracking of multiple light sources in OCC systems; however, RoIs are difficult to detect when the rolling-shutter effect renders LEDs as black/white stripes. The C-OOK scheme, also standardized in IEEE 802.15.7-2018 [6], offers a high data rate but suffers from a short transmission range and high bit error rate (BER). A MIMO C-OOK approach [24,25] was later proposed to extend the distance and lower the BER through matched filtering. However, it cannot achieve good performance in mobile environments, as explained in more detail in Section 3.

2. Contributions

In this study, we propose a deep learning network that both detects and tracks the light sources and decodes the data in a MIMO C-OOK system, enhancing the data rate while reducing the bit error rate (BER) under challenging conditions involving long-distance communication and mobility. The proposed scheme offers several key advantages, summarized as follows:
  • Frame rate variation support: Camera frame rate variation causes packet loss at the receiver side. Although many systems assume a constant frame rate (e.g., 30 or 1000 fps), the actual rate differs across devices, making synchronization between LEDs and the camera challenging. To address this issue, we insert a sequence number (SN) that identifies mismatches between the camera frame rate and the transmitter packet rate, thereby improving communication performance.
  • Missing packet detection: To identify missing packets, the system compares the SNs of consecutive image frames. When the difference between two SNs exceeds one, the system flags the corresponding data as missing.
  • Merging data algorithm: A data-merging process reconstructs the correct sequence of received frames. If the SNs of consecutive image frames are consecutive, the system merges the two packets.
  • Enhanced bit error rate and throughput: Leveraging deep learning enables the system to detect multiple LEDs with higher accuracy under long-range and mobile conditions than conventional Region of Interest (RoI)-based light source detection, resulting in an improved bit error rate and throughput.
  • Mobility support: A C-OOK system utilizing the rolling-shutter effect is typically sensitive to motion, making multi-LED detection challenging for object detection algorithms. By applying a deep learning model to LED detection, this approach significantly improves detection robustness in dynamic environments.

3. System Architecture

OCC systems operate by capturing the optical power of light sources to transmit and receive data, while enhancing communication performance through effective modulation techniques. The simplest and most widely used amplitude-shift keying method is on–off keying (OOK), which encodes data using two levels, "on" and "off", corresponding to binary "1" and "0", respectively.
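As a minimal sketch of this mapping (the function names and 8-bit intensity levels are illustrative assumptions, not part of the paper's implementation), OOK modulation and threshold demodulation reduce to:

```python
def ook_modulate(bits, on_level=255, off_level=0):
    """Map binary data to ON/OFF intensity levels (on-off keying):
    '1' -> high intensity, '0' -> low intensity."""
    return [on_level if b else off_level for b in bits]

def ook_demodulate(samples, threshold=128):
    """Recover bits by comparing each sample against a decision threshold."""
    return [1 if s >= threshold else 0 for s in samples]

# Round trip: modulating then demodulating recovers the original bits.
payload = [1, 0, 1, 1, 0]
assert ook_demodulate(ook_modulate(payload)) == payload
```

As discussed later in Section 3.2, the fixed threshold used here is exactly what becomes unreliable at low SNR.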
In our approach, we present a detailed MIMO C-OOK system that leverages DL for both LED tracking and decoding. By integrating DL, the proposed approach enhances OCC system performance compared to conventional methods. The architecture of the proposed system is shown in Figure 1.

3.1. DL-Based LED Detection and Tracking

In OCC systems, Region of Interest (RoI) algorithms are widely utilized and well established [26,27,28]. Most real-time RoI-based object detection techniques deploy both object-level and feature-based detection methods. However, due to the rolling-shutter effect, LEDs are captured in images as alternating black/white stripes that represent binary "0" and "1" bits. Each frame typically contains multiple black/white stripes, making RoI-based detection challenging, especially under mobility conditions where motion blur and displacement occur.
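The stripe formation described above can be illustrated with a toy simulation (all names and parameter values here are illustrative assumptions): each sensor row is exposed slightly later than the previous one, so an LED blinking faster than the frame rate appears as horizontal bands.

```python
def rolling_shutter_capture(led_bits, num_rows, row_time_us, bit_time_us):
    """Simulate the rolling-shutter effect for a blinking LED.

    Each image row is exposed row_time_us later than the previous row,
    so it samples the LED state at a different instant; a fast ON/OFF
    waveform therefore appears as black/white stripes in one frame.
    """
    rows = []
    for r in range(num_rows):
        # Which bit of the waveform is active when this row is exposed?
        bit_index = (r * row_time_us // bit_time_us) % len(led_bits)
        rows.append(255 if led_bits[bit_index] else 0)
    return rows

# A 10 kHz OOK waveform (100 us per bit) read out at 20 us per row:
stripes = rolling_shutter_capture([1, 0, 1, 1], num_rows=20,
                                  row_time_us=20, bit_time_us=100)
```

With these values, each transmitted bit spans five image rows, producing the alternating bands that RoI detectors confuse with multiple separate light sources.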
Deep learning-based neural networks have proven highly effective in computer vision tasks such as object tracking, classification, localization, and image reconstruction. Among these, Convolutional Neural Networks (CNNs) demonstrate strong performance and are particularly suitable for vision-based applications. The YOLO (You Only Look Once) family of CNN-based algorithms represents one of the most advanced real-time object detection and tracking frameworks available.
In this study, we customize and train YOLOv11 [29,30,31,32] models specifically for LED detection and tracking in OCC systems, taking into account the effects of the rolling shutter and mobility. To evaluate the proposed LED detection performance, we constructed a dataset from real-world traffic scenes. Both daytime and nighttime video footage was captured under mobile conditions, yielding a total of 1000 images that include both clear and motion-blurred samples with varying exposure times, velocities, and other conditions. These images were manually labeled and used to train a modified YOLOv11 network consisting of 5 to 7 convolutional layers, configured for a single detection class; the final convolutional layer employs 40 filters to optimize feature extraction and classification accuracy. Figure 2 compares deep learning-based detection and tracking with RoI-based detection and tracking. RoI algorithms return many white stripes rather than the single LED position produced by the deep learning model. Detection accuracy above 95% is achieved in mobile environments, so applying the DL model for detection and tracking improves OCC performance compared with RoI methods under mobility.

3.2. Deep Learning-Based Data Decoder

In an OCC system, the SNR decreases as the transmission distance increases, making it more difficult for the receiver to accurately distinguish between ON and OFF signal levels and raising the bit error rate. In our previous work [25], we introduced a matched filter to improve the SNR and extend the range. However, as discussed in Section 4, the conventional decoder suffers a high bit error rate in mobile environments, which motivates applying a deep learning decoder [33,34,35,36] in future OCC systems.
The preamble position must be identified carefully during decoding because it marks the start of the frame; if it cannot be found, all packets are decoded incorrectly. At long communication distances, the preamble is difficult to detect, as shown in Figure 3a. We therefore propose deep learning for preamble detection, as shown in Figure 4. The threshold level proposed in [24] splits the ON/OFF signal levels in the image but is difficult to define at low SNR. This method performs well at short distances (high-SNR conditions, Figure 3a), but as the distance increases, distinguishing between ON and OFF states becomes ambiguous because the SNR drops, as shown in Figure 3b. A matched filter can be applied to maximize the SNR under long-distance conditions; although it works well at low SNR, it performs poorly under the blur caused by mobility. In addition, when the LED and camera move relative to each other, image blurring causes inter-symbol interference (ISI), which degrades overall communication performance in OCC. To address this issue, we adopt a DL-based approach that detects the preamble and decodes the data while explicitly accounting for the mobility effect. For performance evaluation, the Root Mean Square Error (RMSE) was employed to assess the prediction accuracy of the DL model. After training for 200 epochs, the proposed model achieved RMSE values below 0.1, indicating strong prediction and decoding performance under mobile communication scenarios.
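For contrast with the DL decoder, the matched-filter-style preamble search used as a baseline can be sketched as a sliding correlation (a simplified illustration; the function names and the ±1 signal normalization are our assumptions, not the paper's implementation):

```python
def correlate_preamble(samples, preamble):
    """Slide the known preamble over the sample stream and return the
    offset with the highest correlation score (matched-filter style).

    Samples and preamble are assumed normalized to +1 (ON) / -1 (OFF).
    """
    best_offset, best_score = 0, float("-inf")
    for i in range(len(samples) - len(preamble) + 1):
        score = sum(s * p for s, p in zip(samples[i:i + len(preamble)], preamble))
        if score > best_score:
            best_offset, best_score = i, score
    return best_offset

preamble = [1, -1, 1, 1, -1]
stream = [-1, -1] + preamble + [1, -1, 1]
assert correlate_preamble(stream, preamble) == 2  # preamble starts at index 2
```

On clean samples the correlation peak pinpoints the frame start; under motion blur the ±1 samples are smeared and the peak flattens, which is exactly the failure mode the DL-based preamble detector is trained to handle.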
After LED recognition, OCC signals are extracted from the identified LED areas using a down-sampling algorithm. The C-OOK waveform is obtained by analyzing the center area of each LED, while the start of frame and the decision threshold for each LED are determined by the deep learning decoder. A total of 10,000 raw samples were gathered using a rolling-shutter camera at several distances (1 m, 5 m, 10 m, 15 m, 20 m, and 25 m) and under varying motion velocities; each sample includes both the preamble and payload segments. To prevent overfitting, a simple deep learning architecture with two hidden layers was adopted; when the number of hidden layers exceeded six, the model showed signs of overfitting and accuracy on the test dataset decreased. After successful preamble detection, the proposed system accurately identifies the start of each C-OOK waveform, enhancing communication performance compared to conventional methods, particularly in mobile environments. Figure 4 shows the deep learning preamble detection and decoder applied to improve OCC performance under mobility.

4. Results

4.1. Fundamentals of OOK Modulation

The pixel noise in cameras can be modeled as in [24]:

n ~ N(0, δ²(s))

where s is the pixel intensity and the noise variance is δ²(s) = s·α + β, with a denoting the mark/space intensity and α, β fitting parameters obtained experimentally. These model-fitting coefficients are applied in our implementation and can be estimated empirically, as described in [24]. Equation (2) then gives the pixel-level Eb/N0 at the camera for one bit:

Pixel Eb/N0 = Es²/En² = a²·Δ / (a·α·Δ + β)

where Eb is the energy per bit, N0 is the noise power spectral density, and Δ = Texposure/Tbit is the ratio of the camera exposure time to the bit duration. The parameters α and β are the experimentally derived fitting constants. Figure 5 displays the relationship between pixel intensity and pixel Eb/N0.
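Equation (2) can be evaluated with a small helper (the function name and the example values below are illustrative, not measurements from the paper):

```python
def pixel_eb_n0(a, delta, alpha, beta):
    """Pixel-level Eb/N0 per Equation (2): a^2 * Delta / (a * alpha * Delta + beta).

    a           : mark/space pixel intensity
    delta       : Delta = T_exposure / T_bit (exposure time over bit duration)
    alpha, beta : experimentally fitted noise-model coefficients [24]
    """
    return (a ** 2 * delta) / (a * alpha * delta + beta)

# For fixed noise coefficients, a brighter pixel (larger a) yields a higher Eb/N0,
# matching the monotonic trend plotted in Figure 5.
```

This also makes the long-distance behavior explicit: as distance grows the received intensity a shrinks, and the constant floor β in the denominator dominates, pulling Eb/N0 down.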

4.2. Proposed Modulation

When the camera frame rate is twice the transmitter's packet rate, each packet is captured multiple times, leading to an oversampling effect; packet-merging issues may also arise at the camera. To mitigate these problems, an SN is embedded in the data structure (DS), enabling the receiver to handle frame rate variation. If the receiver finds the same SN value in the DSs of different packets, the duplicated packets are removed. In Figure 6a, the Rx side discards consecutive data carrying the same sequence number and merges data with consecutive SNs (n − 1, n, n + 1).
Undersampling occurs when the camera frame rate is lower than the transmitter's packet rate, so payload data may be dropped. Figure 6b shows an example in which a missing packet is detected using the SN. The SN length in this scenario is sufficient for the receiver to identify the missing payload; specifically, the SN is incremented with each payload. Missing payloads are found by linking the SNs of consecutive DSs: a discontinuity indicates an error. The SN length determines the number of detectable states; for example, a 2-bit SN allows identification of up to four missing payloads. An error is detected when two consecutive packets carry non-sequential SNs (n and n + 2), as shown in Figure 6b.
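The duplicate-removal and gap-detection logic described above can be sketched as follows (the data layout and helper name are our assumptions; the paper does not specify an implementation):

```python
def process_packets(packets, sn_bits=2):
    """Deduplicate and order received packets using their sequence numbers.

    packets: list of (sn, payload) tuples in arrival order.
    Duplicated SNs (oversampling) are discarded; a gap between
    consecutive SNs (undersampling) counts as missing payloads.
    SN arithmetic is modulo 2**sn_bits, matching the 2-bit SN example.
    """
    modulus = 2 ** sn_bits
    merged, missing = [], 0
    prev_sn = None
    for sn, payload in packets:
        if prev_sn is None:
            merged.append(payload)            # first packet: always keep
        elif sn == prev_sn:
            continue                          # duplicate capture: discard
        else:
            gap = (sn - prev_sn) % modulus
            missing += gap - 1                # non-sequential SNs -> lost packets
            merged.append(payload)
        prev_sn = sn
    return merged, missing

# Oversampling repeats SN 0; undersampling jumps from SN 1 to SN 3.
pkts = [(0, "A"), (0, "A"), (1, "B"), (3, "D")]
merged, missing = process_packets(pkts)
assert merged == ["A", "B", "D"] and missing == 1
```

The modulo step is what lets a short SN field wrap around without falsely flagging an error at the wrap boundary.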

4.3. Demonstration Results

In this study, the neural network was deployed multiple times for the MIMO C-OOK scheme using different image sensors to evaluate the impact of camera parameters. The SN length should also be optimized to improve system performance. Figure 7 illustrates the experimental setup, and Figure 8 shows the quantized intensity profile of the MIMO C-OOK signal at the receiver. Figure 9 shows the receiver side, displaying the LEDs in the camera view, the signals from the two LEDs, and the text output after decoding. Comparison results between the conventional decoder and the deep learning-based decoder are shown in Figure 10.
For LED detection, after 7000 training cycles the neural network achieved an average loss of about 0.12. The trained model was further evaluated under day and night conditions, at several ranges, and in 3 m/s mobility scenarios to confirm that the approach works in realistic settings. Figure 10 shows the BER performance with and without the deep learning decoder at different distances for a velocity of 3 m/s.
The OCC system uses an optical clock rate of 10 kHz with 4B6B coding, combined with Reed–Solomon (15, 11) forward error correction and four LEDs (12 V, 3 W each). The transmitter sends data at a packet rate of 60 packets per second. The receiver is a rolling-shutter camera (FL3-U3-13S2C-CS, Edmund Optics, Singapore) operating at 60 fps. The achieved uncoded data rate was up to 7.2 kbps, and the coded data rate up to 5.28 kbps.
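The reported rates follow directly from these parameters. The per-LED payload size is implied rather than stated, so it is labeled as an assumption below:

```python
# Throughput arithmetic for the reported setup (Section 4.3 parameters).
packet_rate = 60          # packets per second (transmitter side)
leds = 4                  # MIMO channels
bits_per_led = 30         # payload bits per LED per packet
                          # (ASSUMPTION: implied by 7200 / 60 / 4, not stated)
rs_n, rs_k = 15, 11       # Reed-Solomon (15, 11) forward error correction

uncoded_bps = packet_rate * leds * bits_per_led
coded_bps = uncoded_bps * rs_k / rs_n     # RS code rate k/n = 11/15

assert uncoded_bps == 7200                # 7.2 kbps, as reported
assert abs(coded_bps - 5280) < 1e-9       # 5.28 kbps, as reported
```

Note the coded figure accounts only for the RS(15, 11) overhead; the 4B6B line coding shapes the optical clock rather than this payload arithmetic.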
Figure 10 shows the BER versus distance with and without the deep learning decoder under the same camera parameters. Under identical environments and distances, the deep learning decoder yields a significant performance improvement over the conventional approach. The results also show that higher data rates can be attained by increasing the packet length and the number of LEDs; however, as discussed previously, the optical clock rate and packet rate must be matched to the camera parameters, communication distance, and image resolution to ensure optimal performance. The proposed deep learning network plays a dual role, facilitating accurate LED detection and assisting in data decoding, thereby enhancing the robustness of the OCC system in long-range and mobile environments. Using YOLOv11 for detection, we achieve higher than 95% accuracy with two LEDs at instantaneous movement speeds of up to 3 m/s. To evaluate mobility effects in practical indoor environments, the system was tested at 3 m/s, corresponding to an average walking speed. OCC performance could be further improved by collecting larger datasets covering more mobility conditions and retraining the models. A visual demonstration of the proposed approach employing two light sources at a range of 2 m is provided in the Supplementary Materials.

5. Conclusions

In this study, we presented a MIMO C-OOK scheme incorporating a DL model for multi-LED recognition and tracking. Due to the rolling-shutter effect, the light sources appear in captured images as alternating black/white stripes, which complicates accurate detection with conventional RoI-based methods. Additionally, the DL model is employed for data decoding, providing improved performance over long distances, especially under mobility conditions. Finally, the BER was evaluated across various transmission distances and compared with that of conventional decoders.

Supplementary Materials

The supplementary materials have been uploaded at https://zenodo.org/records/18170307 (accessed on 6 January 2026). The video demonstrates the MIMO C-OOK scheme incorporating deep learning at a transmission distance of 2 m.

Author Contributions

Methodology, D.T.N.; Validation, D.T.N.; Formal analysis, M.D.T.; Investigation, T.N.; Resources, T.N.; Writing—original draft, D.T.N.; Writing—review & editing, M.D.T. and H.N.; Visualization, T.N., M.D.T. and H.N.; Project administration, H.N.; Funding acquisition, H.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Vietnam Ministry of Education and Training (MOET) under grant number B2026-DQN-03.

Data Availability Statement

The datasets presented in this article are not readily available due to project policy.

Conflicts of Interest

Author Minh Duc Thieu was employed by the company HuePress JSC. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Pan, Z.; Xu, Z.; Miao, R.; Zhao, T.; Wang, J. Prospects of 6G Technology Framework: A Big-Lite Multi-RATs Concept. IEEE Commun. Mag. 2025, 63, 174–180. [Google Scholar] [CrossRef]
  2. Kim, J.H.; Lee, J.K.; Kim, H.G.; Kim, K.B.; Kim, H.R. Possible effects of radiofrequency electromagnetic field exposure on central nerve system. Biomol. Ther. 2019, 27, 265–275. [Google Scholar] [CrossRef]
  3. Vienne-Jumeau, A.; Tafani, C.; Ricard, D. Environmental Risk Factors of Primary Brain Tumors: A Review. Rev. Neurol. 2019, 175, 664–678. [Google Scholar] [CrossRef] [PubMed]
  4. Sridhar, R.; Richard, D.; Kyu, L.S. IEEE 802.15.7 visible light communication: Modulation and dimming support. IEEE Commun. Mag. 2012, 50, 72–82. [Google Scholar] [CrossRef]
  5. IEEE Std 802.15.7-2011; IEEE Standard for Local and Metropolitan Area Networks—Part 15.7: Short-Range Wireless Optical Communication Using Visible Light. IEEE-SA: Piscataway, NJ, USA, 2011.
  6. IEEE Std 802.15.7-2018; IEEE Standard for Local and Metropolitan Area Networks—Part 15.7: Short-Range Optical Wireless Communications. IEEE-SA: Piscataway, NJ, USA, 2018.
  7. IEEE Std 802.15.7a-2024; IEEE Standard for Local and Metropolitan Area Networks—Part 15.7: Short-Range Optical Wireless Communications Amendment 1: Higher Rate, Longer Range Optical Camera Communication (OCC). IEEE-SA: Piscataway, NJ, USA, 2024.
  8. Luo, P.; Zhang, M.; Ghassemlooy, Z.; Le Minh, H.; Tsai, H.-M.; Tang, X.; Png, L.C.; Han, D. Experimental Demonstration of RGB LED-Based Optical Camera Communications. IEEE Photonics J. 2015, 7, 7904212. [Google Scholar] [CrossRef]
  9. Ong, Z.; Rachim, V.P.; Chung, W.-Y. Novel Electromagnetic-Interference-Free Indoor Environment Monitoring System by Mobile Camera Image Sensor based VLC. IEEE Photonics J. 2017, 9, 7907111. [Google Scholar] [CrossRef]
  10. Zhou, H.; Zhang, M.; Ren, X. Design and Implementation of Wireless Optical Access System for VLC-IoT Networks. J. Light. Technol. 2023, 41, 2369–2380. [Google Scholar] [CrossRef]
  11. Tan, K.S.; Hinberg, I.; Wadhwani, J. Electromagnetic interference in medical devices: Health Canada’s past current perspectives and activities. In Proceedings of the IEEE International Symposium Electromagnetic Compatibility, Montreal, QC, Canada, 13–17 August 2001; pp. 1283–1284. [Google Scholar]
  12. Ghatge, V.; Vanoost, D.; Kleihorst, R.; Pissoort, D. How to Assess EMI-Risk Acceptability Criteria in Medical Device EMC Risk Management. IEEE Lett. Electromagn. Compat. Pract. Appl. 2025, 7, 35–43. [Google Scholar] [CrossRef]
  13. Haas, H.; Yin, L.; Wang, Y.; Chen, C. What is LiFi? J. Light. Technol. 2015, 34, 1533–1544. [Google Scholar] [CrossRef]
  14. Ali, A.Y.; Zhang, Z.; Zong, B. Pulse position and shape modulation for visible light communication system. In Proceedings of the International Conferences Electromagnetics Advanced Application, Palm Beach, FL, USA, 3–8 August 2014; pp. 546–549. [Google Scholar]
  15. Videv, S.; Haas, H. Practical space shift keying VLC system. In Proceedings of the IEEE Wireless Communication Networking Conferences, Istanbul, Turkey, 6–9 April 2014; pp. 405–409. [Google Scholar]
  16. Deng, P.; Kavehrad, M. Real-time software-defined single-carrier QAM MIMO visible light communication system. In Proceedings of the Integrated Communications Navigation and Surveillance (ICNS), Herndon, VA, USA, 19–21 April 2016; pp. 5A3-1–5A3-11. [Google Scholar]
  17. Cai, H.B.; Zhang, J.; Zhu, Y.J.; Zhang, J.K.; Yang, X. Optimal constellation design for Indoor MIMO visible light communications. IEEE Commun. Lett. 2016, 20, 264–267. [Google Scholar] [CrossRef]
  18. Zia-Ul-Mustafa, R.; Le Minh, H.; Ghassemlooy, Z.; Zvánovec, S.; Younus, O.I.; Li, X.; Pham, A.T. A Novel Uplink Positioning and SVD-Based Physical Layer Security Scheme for VLC Systems. IEEE J. Sel. Areas Commun. 2025, 43, 1706–1720. [Google Scholar] [CrossRef]
  19. Sejan, M.A.S.; Rahman, M.H.; Aziz, M.A.; Kim, D.-S.; You, Y.-H.; Song, H.-K. A Comprehensive Survey on MIMO Visible Light Communication: Current Research, Machine Learning and Future Trends. Sensors 2023, 23, 739. [Google Scholar] [CrossRef]
  20. Palitharathna, K.W.S.; Skouroumounis, C.; Krikidis, I. Liquid Lens-Based Imaging Receiver for MIMO VLC Systems. IEEE Trans. Commun. 2025, 73, 11663–11678. [Google Scholar] [CrossRef]
  21. Nguyen, H.; Utama, I.B.K.Y.; Jang, Y.M. Enabling Technologies and New Challenges in IEEE 802.15.7 Optical Camera Communications Standard. IEEE Commun. Mag. 2023, 62, 90–95. [Google Scholar] [CrossRef]
  22. Nguyen, H.; Thieu, M.D.; Pham, T.L.; Nguyen, H.; Jang, Y.M. The Impact of Camera Parameters on Optical Camera Communication. In Proceedings of the 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Okinawa, Japan, 11–13 February 2019; pp. 526–529. [Google Scholar]
  23. Ayyash, M.; Elgala, H.; Khreishah, A.; Jungnickel, V.; Little, T.; Shao, S.; Rahaim, M.; Schulz, D.; Hilt, J.; Freund, R. Coexistence of WiFi and LiFi toward 5G: Concepts, opportunities, and challenges. IEEE Commun. Mag. 2016, 54, 64–71. [Google Scholar] [CrossRef]
  24. Nguyen, V.H.; Thieu, M.D.; Nguyen, H.; Jang, Y.M. Design and Implementation of the MIMO–COOK Scheme Using an Image Sensor for Long-Range Communication. Sensors 2020, 20, 2258. [Google Scholar] [CrossRef]
  25. Nguyen, H.; Jang, Y.M. Design of MIMO C-OOK using Matched filter for Optical Camera Communication System. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021. [Google Scholar]
  26. Yu, Q.; Wang, B.; Su, Y. Object Detection-Tracking Algorithm for Unmanned Surface Vehicles Based on a Radar-Photoelectric System. IEEE Access 2021, 9, 57529–57541. [Google Scholar] [CrossRef]
  27. Lin, H.; Si, J.; Abousleman, G.P. Region-of-interest detection and its application to image segmentation and compression. In Proceedings of the 2007 International Conference on Integration of Knowledge Intensive Multi-Agent Systems, Waltham, MA, USA, 30 April–3 May 2007. [Google Scholar]
  28. Yan, C.; Chen, W.; Chen, P.C.Y.; Kendrick, A.S.; Wu, X. A new two-stage object detection network without RoI-Pooling. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 1680–1685. [Google Scholar]
  29. Zhang, H.; Gao, L.; Gong, Y.; Liu, H.; Zhu, Y.; Yang, Y. RTF-SAW-YOLOv11: A Bolt Defect Detection Model for Power Transmission Lines Under Low-Light Conditions. IEEE Access 2025, 13, 138640–138659. [Google Scholar] [CrossRef]
  30. Luo, C.; Tang, H.; Li, S.; Wan, G.; Chen, W.; Guan, J. YOLOv11s-CD: An Improved YOLOv11s Method for Catenary Dropper Fault Detection. IEEE Trans. Instrum. Meas. 2025, 74, 5043410. [Google Scholar] [CrossRef]
  31. Xue, Z.; Kong, L.; Wu, H.; Chen, J. Fire and Smoke Detection Based on Improved YOLOV11. IEEE Access 2025, 13, 73022–73040. [Google Scholar] [CrossRef]
  32. Zhang, L.; Zheng, A.; Sun, X.; Sun, Z. Enhanced YOLOv11-Based River Aerial Image Detection Research. IEEE Geosci. Remote Sens. Lett. 2025, 22, 8002405. [Google Scholar] [CrossRef]
  33. Lee, H.; Lee, S.H.; Quek, T.Q.S.; Lee, I. Deep Learning Framework for Wireless Systems: Applications to Optical Wireless Communications. IEEE Commun. Mag. 2019, 57, 35–41. [Google Scholar] [CrossRef]
  34. Wu, H.; Chen, Z.; Geng, X.; Zhao, Y.; Liu, Z. CRS-Based Joint CFO and Channel Estimation Using Deep Learning in OFDM-Based Vehicular Communication Systems. IEEE Trans. Wirel. Commun. 2025, 24, 3882–3893. [Google Scholar] [CrossRef]
  35. Kong, M.; Pan, Y.; Zhou, H.; Yu, R.; Le, X.; Yuan, H.; Wang, R.; Yang, Q. Deep Learning-Based Acquisition Pointing and Tracking for Underwater Wireless Optical Communication. IEEE Photonics Technol. Lett. 2024, 37, 555–558. [Google Scholar] [CrossRef]
  36. Jia, B.; Ge, W.; Cheng, J.; Du, Z.; Wang, R.; Song, G.; Zhang, Y.; Cai, C.; Qin, S.; Xu, J. Deep Learning-Based Cascaded Light Source Detection for Link Alignment in Underwater Wireless Optical Communication. IEEE Photonics J. 2024, 16, 7801512. [Google Scholar] [CrossRef]
Figure 1. System architecture of proposed scheme based on deep learning.
Figure 2. (a) Deep learning detection and tracking; (b) RoI detection and tracking.
Figure 3. C-OOK waveforms with: (a) short distance and (b) long distance.
Figure 4. Deep learning for decoder.
Figure 5. The relationship between pixel intensity and Pixel Eb/N0.
Figure 6. (a) Merging packet algorithm. (b) Missing packet detection algorithm.
Figure 7. The demonstration setup.
Figure 8. Quantized intensity profile of MIMO-COOK signals.
Figure 9. The receiver interfaces.
Figure 10. BER values versus distances performance of the proposed modulation with/without DL-based decoder for a velocity of 3 m/s.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nguyen, D.T.; Nguyen, T.; Thieu, M.D.; Nguyen, H. A Deep Learning-Enhanced MIMO C-OOK Scheme for Optical Camera Communication in Internet of Things Networks. Photonics 2026, 13, 163. https://doi.org/10.3390/photonics13020163

