Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Deep Learning-Based Optical Camera Communication with a 2D MIMO-OOK Scheme for IoT Networks

Electronics 2025, 14(15), 3011; https://doi.org/10.3390/electronics14153011

by Huy Nguyen

and Yeng Min Jang^*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Electronics 2025, 14(15), 3011; https://doi.org/10.3390/electronics14153011

Submission received: 24 June 2025 / Revised: 23 July 2025 / Accepted: 24 July 2025 / Published: 29 July 2025

(This article belongs to the Special Issue Advances in Optical Communications and Optical Networks)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents an Optical Camera Communication (OCC) system for Internet of Things (IoT) applications using a 2D MIMO-OOK scheme. The system leverages a compact LED matrix as a transmitter and supports both global and rolling shutter cameras—including CCTV cameras—as receivers. It features key mechanisms such as Region of Interest (RoI)-based anchor detection, preamble signaling, and a sequence number (SN)-based data structure to handle frame-rate variation. A Deep Learning-based decoder is used to improve decoding accuracy. The experimental results demonstrate communication across distances up to 20 meters with up to 20 simultaneous links using 8×8 and 16×16 LED arrays. The work is timely and relevant, particularly for RF-free environments.

Despite the paper’s strengths, there are several critical issues that must be addressed:

The Deep Learning component is under-described. There is no mention of the model architecture, number of layers, activation functions, training procedure, dataset size, loss function, or accuracy metrics. Without this, the method cannot be replicated or validated.
Experimental results are limited. BER is only presented for a single distance (8 meters), and the impact of different distances or lighting conditions on communication quality is not provided.
Several figures lack essential information. Specifically, Figures 3 and 6 lack axis labels, units, and statistical representation (e.g., variance, multiple trials). Equation (1), which defines SNR, is not rendered correctly and is unreadable in its current form.
Some system components lack justification. For example, the choice of the preamble pattern “011100” and the method for rotation handling using matrix transpose are not empirically validated or compared with alternatives.
Language and formatting issues persist throughout the manuscript, including hyphenation artifacts (“Us-ing”, “expo-sure”, etc.) and informal expressions (“tweak a few parameters”, etc.), which reduce clarity and professionalism.

Here’s some specific comments:

Abstract:

Reword to include specific performance outcomes (e.g., “20 communication links over 20 meters with CCTV-compatible cameras”).
Fix formatting artifacts (e.g., “Us-ing”, “com-pact”).

Section 1 (Introduction):

Background is generally well-structured.
The claim that OCC can reach 150 meters (Lines 80–82) should be clearly attributed to the cited reference [17], and the conditions (e.g., type of camera, modulation method) should be briefly described.

Section 2 (Contributions):

Clearly lists the system’s advantages. However, the contributions should be more formally structured, using complete sentences instead of bullet fragments.

Section 3 (System Architecture):

The RoI anchor detection is explained correctly.
Figure 1 provides a useful schematic, but each block (e.g., SN inserting, OOK mapping, Deep Learning decoder) should be briefly explained in the accompanying text.
The preamble “011100” is defined, but its selection rationale is not provided. Is it optimal in terms of detection rate or synchronization?

Section 4 (Implementation Results):

Figure 3: Add axis labels, units (e.g., SNR in dB), and mention whether the values are averaged or based on single captures.
Equation (1) is unreadable due to formatting. Reformat clearly using inline math or standard notation.
Figure 9: Shows BER with and without Deep Learning at 8 meters. Add similar plots at different distances to validate long-range performance.
Table 1: Consider expanding this to include test conditions, actual throughput, number of links, and exposure times used.

Section 5 (Conclusion):

The conclusion summarizes the work clearly.
Recommend adding a short note on limitations (e.g., indoor testing, limited distance range) and a sentence on future extensions (e.g., mobile receivers, real-time OCC streaming).

To sum up, the manuscript presents a practical and valuable contribution to the OCC field for IoT, combining known elements in a system-oriented implementation that supports multi-link and mobility. However, the paper requires:

Complete Deep Learning model specification
Expanded and statistically supported experimental results
Corrected figure formatting and equation rendering
Justification of design choices
Full proofreading and language polishing

Once these revisions are made, the paper would merit strong consideration for publication.

Comments on the Quality of English Language

The manuscript is understandable overall, but the English language requires significant improvement to meet publication standards. There are several recurring issues, including:

Incorrect hyphenation from line breaks (e.g., “Us-ing,” “expo-sure,” “com-pact”)
Grammatical errors and sentence structure problems
Informal or unclear expressions (e.g., “tweak a few parameters,” “peoples assume”)
Inconsistent verb tenses and phrasing
Occasional lack of clarity in technical descriptions

A thorough professional language edit is strongly recommended to improve readability, academic tone, and clarity throughout the manuscript.

Author Response

Reply to the Review Report (Reviewer 1)

First of all, we are grateful for the consideration of the review of our work. All the comments, we received on this study have been taken into account in improving the quality of the article, and we present our reply to each of them separately as follows. We hope the replies are detailed to explain our research.

Despite the paper’s strengths, there are several critical issues that must be addressed:

The Deep Learning component is under-described. There is no mention of the model architecture, number of layers, activation functions, training procedure, dataset size, loss function, or accuracy metrics. Without this, the method cannot be replicated or validated.

Thank you for your suggestions. We updated more detail about DL model based on your suggestion.

“

This SN helps synchronize the packet transmission rate with the camera's frame capture rate, and its length is adjusted accordingly based on this relationship. After identifying the objects, OCC signal was designated after the LED region using a down-sampling. 2D MIMO-OOK data was then distinguished by extracting the central intensity of each LED by YOLOv11 [31-34]. A dataset of 10,000 samples, including both preamble and payload segments, was collected using global-shutter and rolling-shutter cameras at varying distances (ranging from 1 to 20 meters) and speeds. To prevent overfitting, we applied a simple deep learning neural network as revealed in Figure 5. The accuracy of this model achieved more than 95%. Once the preamble was detected, we were able to accurately decode the 2D-MIMO signals, enhancing the performance of the OCC system in mobile scenarios compared to traditional methods. It was observed that using six or more hidden layers could lead to overfitting, reducing the accuracy on test data. Besides that, we apply deep learning model for decoding data. By apply DL for detecting the preamble position and threshold value between bits “0” and “1” as shown in Figure 5, we may improve OCC performance. For deep learning based on decoder, We used a basic deep learning neural network model with two hidden layers to avoid overfitting the model. To reduce the interference between two LEDs in LED array, the distance between two LEDs and the communication distance should be considered. Due to the camera process based on the pixel in image, then to clarify multiple LEDs in array, the distance between each LED should higher than 1 pixel in image.

”

Experimental results are limited. BER is only presented for a single distance (8 meters), and the impact of different distances or lighting conditions on communication quality is not provided.

Thank you for your suggestions. In Figure 3, we demonstrated SNR measurement considering different distance and exposure time. From that, we can understand the relationship among SNR, distance, and exposure time. In Figure 9, we measured Bit error rate considering the different distances with/without Deep Learning model.

“

Figure 4 illustrations the frame structure of the 2D MIMO-OOK system. The four outermost corners of the LED matrix serve as anchor points for corner recognition. Using their coordinates, the positions of all LEDs in the array can be determined through perspective transformation. In this system, an 8×8 LED array is applied, with 40 LEDs allocated for information communication. As illustrated in Figure 4, four corner LEDs act as anchors, with anchor position 3 specifically enhancing rotational robustness at the receiver. Anchor 4 will be the special anchor for support rotation effect. With anchor 1-3, we just use only single LED with ON status, and three LEDs is OFF status. However, with anchor 4, we use with 4 LEDs is ON status. It is the special corner to support rotation effect. Moreover, 16 LEDs at the anchor positions function as training signals, enabling the camera to distinguish between the ON and OFF states. Each frame includes a preamble to help the receiver identify the frame's starting point. To accommodate variations in frame rate, SN is incorporated in each packet. This SN helps synchronize the packet transmission rate with the camera's frame capture rate, and its length is adjusted accordingly based on this relationship. After identifying the objects, OCC signal was designated after the LED region using a down-sampling. 2D MIMO-OOK data was then distinguished by extracting the central intensity of each LED by YOLOv11 [31-34]. A dataset of 10,000 samples, including both preamble and payload segments, was collected using global-shutter and rolling-shutter cameras at varying distances (ranging from 1 to 20 meters) and speeds. To prevent overfitting, we applied a simple deep learning neural network as revealed in Figure 5. The accuracy of this model achieved more than 95%. Once the preamble was detected, we were able to accurately decode the 2D-MIMO signals, enhancing the performance of the OCC system in mobile scenarios compared to traditional methods. It was observed that using six or more hidden layers could lead to overfitting, reducing the accuracy on test data. Besides that, we apply deep learning model for decoding data. By apply DL for detecting the preamble position and threshold value between bits “0” and “1” as shown in Figure 5, we may improve OCC performance. For deep learning based on decoder, We used a basic deep learning neural network model with two hidden layers to avoid overfitting the model. To reduce the interference between two LEDs in LED array, the distance between two LEDs and the communication distance should be considered. Due to the camera process based on the pixel in image, then to clarify multiple LEDs in array, the distance between each LED should higher than 1 pixel in image.

…………………….

Figure 9 presents the implementation results using a PointGrey camera across various distances and with/without Deep Learning. The demonstration shows that deep learning decoder plays a critical role in the performance of proposed schemes. Even under identical communication distances and noise conditions, the Bit Error Rate (BER) with Deep learning has good performance compared non-deep learning decoder. To reduce BER or extend communication range, channel coding techniques should be applied to enhance system reliability. Main system parameters are summarized in Table 1. As show in Figure 9, we can see that, we can achieve BER of with deep learning model considering processing time of 0.025s and achieve BER of without deep learning model at distance of 8 m considering processing time of 0.02s. Then the Deep Learning model is good candidate to improve OCC performance. In the future, intelligent multi-modal sensing-communication integration [35, 36] will be a feasible path that can facilitate the design of the OCC system and the 6G and beyond 6G system. By applying communication integrated with sensing model, we may improve the performance of the next generation communication system. The implementation, which employs a 16×16 LED matrix with 20 links and 8×8 LED matrix with 10 links considering mobility environment, successfully achieves communication at a distance of 20 meters, as demonstrated in the Supplementary Materials.

”

Several figures lack essential information. Specifically, Figures 3 and 6 lack axis labels, units, and statistical representation (e.g., variance, multiple trials). Equation (1), which defines SNR, is not rendered correctly and is unreadable in its current form.

Thank you for your suggestions. We updated based on your suggestion

Figure 3 with the axis SNR [dB] and Amplitude of pixel. Figure 6 with the axis X, Y, Z, and Intensity of pixel.

Equation (1) shown the equation to measure SNR values. A denotes the signal power from the LED, corresponding to the pixel amplitudes collected when the LED is “on”. B denotes the background noise, measured as the pixel amplitude when the LED is “off”.

“

SNR measurements were conducted at distances of 20, 15, 10, and 5 meters, as shown in Figures 3. The SNR was considered based on the two operating states of the LED. When the LED is ON, the SNR reflects the signal power emitted by the LED. Equally, when the LED is OFF, the SNR represents the background noise level. The SNR (dB) can be determined as equation:

(1)

From that, A denotes the signal power from the LED, corresponding to the pixel amplitudes collected when the LED is “on”. B denotes the background noise, measured as the pixel amplitude when the LED is “off”. The variable n indicates the number of samples collected. At short distances, the pixel amplitudes are higher due to stronger signal strength, whereas at greater distances, the amplitudes decrease as the signal weakens.

”

Figure 3. SNR measurement considering different distance and exposure time

Figure 6. The quantized intensity profile of 8x8 LED array at 2 m and 4 m with different exposure time setting

Some system components lack justification. For example, the choice of the preamble pattern “011100” and the method for rotation handling using matrix transpose are not empirically validated or compared with alternatives.

Thank you for your suggestions. We revised based on your comment.

As shown in Figure 4, Anchor 4 will be the special anchor for support rotation effect. With anchor 1-3, we just use only single LED with ON status, and three LEDs is OFF status. However, with anchor 4, we use with 4 LEDs is ON status. It is the special corner to support rotation effect.It is the special corner to support rotation effect. With 2D decoder, the LED matrix will be rotation 30, 90, or 270 degree. If we did not support rotation effect, the data will be decoded wrong.

Example of rotation effect

“

”

Language and formatting issues persist throughout the manuscript, including hyphenation artifacts (“Us-ing”, “expo-sure”, etc.) and informal expressions (“tweak a few parameters”, etc.), which reduce clarity and professionalism.

“Us-ing”, “expo-sure”, etc. were shown in manuscript based on the format of Electronics Journal then we can not revise.

“

When it comes to rolling shutter cameras, we need to take into account both the rolling shutter speed and the frame rate. It is also important to control several parameters, such as exposure time, camera focal length, and signal-to-noise ratio (SNR), in order to increase the communication range. Currently, Li-Fi technology can achieve a maximum transmission distance of approximately 10 meters by incorporating photodiode lenses [19].

”

Here’s some specific comments:

Abstract:

Reword to include specific performance outcomes (e.g., “20 communication links over 20 meters with CCTV-compatible cameras”).

“

Radio frequency (RF)-based wireless systems are broadly used in communication systems such as mobile networks, satellite links, and monitoring applications. These systems offer outstanding advantages over wired systems, particularly in terms of ease of installation. However, researchers are looking on safer alternatives as a result of worries about the possible health problems connected to high-frequency radiofrequency transmission. Using the visible light spectrum is one promising approach; three cutting-edge technologies are emerging in this regard: Optical Camera Communication (OCC), Light Fidelity (Li-Fi), and Visible Light Communication (VLC). In this paper, we propose a Multiple Input Multiple Output (MIMO) modulation technology for Internet of Things (IoT) applications, utilizing an LED array and time-domain on-off keying (OOK). The proposed system is compatible with both rolling shutter and global shutter cameras, including commercially available models such as CCTV, webcam, smart cammera cameras commonly deployed in buildings and industrial environments. Despite the compact size of the LED array, we demonstrate that, by optimizing parameters such as exposure time, camera focal length, and channel coding, our system can achieve up to 20 communication links over a 20-meter distance with low bit error rate.

”

Fix formatting artifacts (e.g., “Us-ing”, “com-pact”).

Section 1 (Introduction):

Background is generally well-structured.

The claim that OCC can reach 150 meters (Lines 80–82) should be clearly attributed to the cited reference [17], and the conditions (e.g., type of camera, modulation method) should be briefly described.

“

The communication range is quite short, making it more suitable for indoor applications. In outdoor environments, channel distortions hinder the photodiode's ability to effectively receive LED signals. Besides that, OCC, utilizes image sensors as receivers, can operate over significantly longer distances up to 150 meters [17]. Research [18] has highlighted the impact of different types of cameras on OCC system. In

”

Section 2 (Contributions):

Clearly lists the system’s advantages. However, the contributions should be more formally structured, using complete sentences instead of bullet fragments.

“

In this study, authors recommend an Optical Camera Communication (OCC) system for IoT applications using a MIMO technology with an LED matrix. This approach is well-suited with most commercial cameras existing on the market. The key advantages of the proposed scheme are outlined below:

Compatibility with Various Camera Types: The system supports well-suited with most commercial cameras existing on the market by appropriately adjusting the exposure time.
Rotation Support: The scheme ensures full 360-degree rotation support by applying a matrix transpose. By anchor positioning the four corners of the LED matrix, cameras can accurately distinguish and process rotation.
Frame Rate Variation Handling and Data Merging Algorithm: Frame rate variation is a significant challenge in OCC systems, it often leading to packet loss during data decoding at the receiver. While many peoples assume that a camera’s labeled frame rate (e.g., 30fps, 60fps) remains non-change, fluctuations can cause synchronization issues between the transmitter and receiver. The system embeds a sequence number (SN) in each sub-packet, which indicates it’s the sequence within the data stream. By adjusting the SN length according to the data packet size, OCC system can be optimized.
Missing Packet Detection: To efficiently reconstruct images from consecutive packet transmissions, the system employs SN within each packet. By comparing SNs across successive images, the receiver can easily identify and compensate for almost missing packets.
Multiple link process: by applying Deep learning, we can process multiple users in same time with high speed and long distance. With Deep Learning for object detection and decode data, we can achieve very good performances (data rate: 15.360 kbps, distance up to 25m, and 20 links support considering mobility environment).

”

Section 3 (System Architecture):

The RoI anchor detection is explained correctly.

Figure 1 provides a useful schematic, but each block (e.g., SN inserting, OOK mapping, Deep Learning decoder) should be briefly explained in the accompanying text.

The preamble “011100” is defined, but its selection rationale is not provided. Is it optimal in terms of detection rate or synchronization?

“

”

Section 4 (Implementation Results):

Figure 3: Add axis labels, units (e.g., SNR in dB), and mention whether the values are averaged or based on single captures.

Equation (1) is unreadable due to formatting. Reformat clearly using inline math or standard notation.

Figure 9: Shows BER with and without Deep Learning at 8 meters. Add similar plots at different distances to validate long-range performance.

Table 1: Consider expanding this to include test conditions, actual throughput, number of links, and exposure times used.

“

Figure 3. SNR measurement considering different distance and exposure time

(1)

4.2. 2D MIMO-OOK technology

Table 1. Characteristic parameters of 2D MIMO-OOK scheme in lab environment

Transmitter
LED types	8x8	16x16
FEC	Reed Solomon (15,11)
Receiver
Camera	PointGrey camera
Frame Rate	60 fps
Data rate (kbps)
Uncode bit rate	3.840	15.360
Code bit rate	2.816	11.264

”

Section 5 (Conclusion):

The conclusion summarizes the work clearly.

Recommend adding a short note on limitations (e.g., indoor testing, limited distance range) and a sentence on future extensions (e.g., mobile receivers, real-time OCC streaming).

“

This paper proposes a monitoring system for Optical Camera Communication (OCC) based on the 2D MIMO-OOK scheme. The system utilizes a custom-designed spatial frame format with an LED array, allowing for rotation support—an essential capability for accurate 2D code recognition. To address frame rate variation and enable the reconstruction of large data packets from multiple images, each transmitted packet includes a SN. SNR measurements were conducted under different distances and exposure times in the indoor environment to examine the interdependence among some key parameters: distance, exposure time, and SNR. While increased exposure time can improve SNR, this leads to a trade-off with reduced bandwidth, requiring precise calibration. Besides that, by applying deep learning model, our proposed scheme can support for mobility environment with YOLOv11 to LED matrix detection and low BER with Deep Learning decoder considering long distance and multiple users.

”

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

In this manuscript, the authors propose and demonstrate a 20-meter MIMO optical camera communication system. The following comments are provided for improvement:

The structure of the manuscript could be improved:
1. The introduction about VLC systems can be shortened, as there are some repeated contents, particularly between Line 46 and Line 72
2. In Line 109, the phrase “In this study” should begin a new paragraph to clearly separate the contributions of this work from other literature.
3. In Section 2, titled "Contribution," but only the advantages of the proposed scheme are discussed. It is essential to include specific performance metrics, such as data rate and transmission distance, to support the claims made.
The authors mention parameters like exposure time, camera focal length, channel coding, and SNR in the abstract and Section 1, but no results regarding focal length are presented. Is the 'focal length' actually ‘transmission distance’?
In the MIMO OCC system, it should be clarified whether the modulation signals for each LED are independent and how interference from adjacent channels is managed.
Further details about the deep-learning method used in the study should be provided.
Abbreviations, such as CCTV mentioned in Line 19, should be explained upon their first appearance.
There are some grammatical errors to address, such as in Line 64, where it states, “LEDs are a promising next-generation for light system”.

Author Response

Reply to the Review Report (Reviewer 2)

In this manuscript, the authors propose and demonstrate a 20-meter MIMO optical camera communication system. The following comments are provided for improvement:

The structure of the manuscript could be improved:
The introduction about VLC systems can be shortened, as there are some repeated contents, particularly between Line 46 and Line 72
In Line 109, the phrase “In this study” should begin a new paragraph to clearly separate the contributions of this work from other literature.
In Section 2, titled "Contribution," but only the advantages of the proposed scheme are discussed. It is essential to include specific performance metrics, such as data rate and transmission distance, to support the claims made.

Thank you for your suggestions. We will update based on your suggestions

“

Compared to RF technologies, OWC offers some advantages:

Safety: Unlike RF waves, visible light waves do not pose health risks to humans [5]. RF waveform, however, can have harmful effects and may also cause system performance issues due to interference of electromagnetic.
Higher bandwidth: The bandwidth of light waveform is over 1,000 times greater than that of RF, making it a more efficient for data transmission.
Secure and efficient transmission: Visible light waves offer high safety and more efficient communication.

Recognizing the potential of OWC, many companies have invested significant resources in research and development to enhance this technology. The IEEE 802.15.7-2011 standard [6] presented OWC considering a simple method in 2011, primarily focusing on Visible Light Communication (VLC). After that, the IEEE 802.15.7-2018 standard [7] was proposed as an updated version, incorporating additional advancements in OWC. The IEEE 802.15.7a-2024 standard follows on enhancements to the physical (PHY) layer [4]. Due to advancements in manufacturing technology, LEDs are a promising next-generation for light system, offering several benefits such as long lifespan, low power consumption, and availability in various sizes and operational modes. Moreover, one key feature which makes LEDs ideal for OWC systems is their ability to switch on and off speedily [8], allowing high-speed data communication.

……………..

In [23], a color intensity modulation-MIMO technology was introduced, achieving data using a global shutter camera operating at 330 frames per second with a maximum distance of 1.4 meters. However, the global shutter camera is expensive and not broadly available. Moreover, applying colors for signal transmission introduces challenges compared to the On-Off Keying (OOK) scheme, such as limited transmission range and higher bit error rates. To address these issues, the authors in [24-26] developed MIMO scheme using LED array. Discrete Hartley Transform (DHT) IV algorithm is a mathematical transformation that, like the Discrete Fourier Transform. It is applied for image and video processing, and computer vision with low latency. While this method mitigates flicker effects, it has several disadvantages, including a short communication range (1.4 meters) and a lack of rotational support, which is critical for OCC-based IoT systems.

In this study, we suggest a monitoring system utilizing an OCC framework that integrates RoI signaling and MIMO techniques considering an LED array. This OCC scheme is compatible with both rolling shutter camera and global shutter camera, then it is particularly suitable for deployment with Closed-Circuit Television (CCTV) cameras, which are widely available. The system's adaptability to existing CCTV infrastructure enhances its applicability in environmental monitoring. By applying Deep Learning for 2D MIMO-OOK scheme, we achieve a communication range of up to 20 meters with 20 links simultaneously.

The remainder of study is presented as follows: Section 2 highlights the contributions of proposed approach. Section 3 presents the system architecture of Deep Learning-Based Optical Camera Communication. In section 4, the implementation results are shown with long distance and low data rate. Finally, section 5 will conclude this study.

………………

Contributions

Compatibility with Various Camera Types: The system supports well-suited with most commercial cameras existing on the market by appropriately adjusting the exposure time.
Rotation Support: The scheme ensures full 360-degree rotation support by applying a matrix transpose. By anchor positioning the four corners of the LED matrix, cameras can accurately distinguish and process rotation.
Frame Rate Variation Handling and Data Merging Algorithm: Frame rate variation is a significant challenge in OCC systems, it often leading to packet loss during data decoding at the receiver. While many peoples assume that a camera’s labeled frame rate (e.g., 30fps, 60fps) remains non-change, fluctuations can cause synchronization issues between the transmitter and receiver. The system embeds a sequence number (SN) in each sub-packet, which indicates it’s the sequence within the data stream. By adjusting the SN length according to the data packet size, OCC system can be optimized.
Missing Packet Detection: To efficiently reconstruct images from consecutive packet transmissions, the system employs SN within each packet. By comparing SNs across successive images, the receiver can easily identify and compensate for almost missing packets.
Multiple link process: by applying Deep learning, we can process multiple users in same time with high speed and long distance. With Deep Learning for object detection and decode data, we can achieve very good performances (data rate: 15.360 kbps, distance up to 25m, and 20 links support considering mobility environment)

”

The authors mention parameters like exposure time, camera focal length, channel coding, and SNR in the abstract and Section 1, but no results regarding focal length are presented. Is the 'focal length' actually ‘transmission distance’?

Thank you for your comment.

Focal length is the distance between a lens and its focal point, where light rays converge or diverge. It's a key property of lenses that determines magnification and angle of view in optical systems like cameras and the human eye.

Below figure shows the relationship between focal length and distance. With higher focal length, we may increase the distance. In my implementation, we just use focal length of 35mm, then the limitation distance of 25m. We already test with another focal length of 70mm, the distance up to 50m. In this paper, we just use 35mm for IoT environment with lower 20m, then we did not mention a lot related to focal length, but we also mention generally for readers can easily understand.

Relationship between the camera focal length and communication distance

In the MIMO OCC system, it should be clarified whether the modulation signals for each LED are independent and how interference from adjacent channels is managed.

Thank you for your suggestions. We will update based on your suggestion between two adgacent LEDs

“

Once the preamble was detected, we were able to accurately decode the 2D-MIMO signals, enhancing the performance of the OCC system in mobile scenarios compared to traditional methods. It was observed that using six or more hidden layers could lead to overfitting, reducing the accuracy on test data. Besides that, we apply deep learning model for decoding data. By apply DL for detecting the preamble position and threshold value between bits “0” and “1” as shown in Figure 5, we may improve OCC performance a lot. To reduce the interference between two LEDs in LED array, the distance between two LEDs and the communication distance should be considered. Due to the camera process based on the pixel in image, then to clarify multiple LEDs in array, the distance between each LED should higher than 1 pixel in image.

”

Further details about the deep-learning method used in the study should be provided.

Thank you for your suggestions. We will update based on your suggestion.

“

”

Abbreviations, such as CCTV mentioned in Line 19, should be explained upon their first appearance.

Thank you for your suggestions. We will update based on your suggestion.

“

The proposed system is compatible with both rolling shutter and global shutter cameras, including commercially available models such as CCTV, webcam, smart cammera cameras commonly deployed in buildings and industrial environments. Despite the compact size of the LED array, we demonstrate that, by optimizing parameters such as exposure time, camera focal length, and channel coding, our system can achieve up to 20 communication links over a 20-meter distance with high reliability.

“

There are some grammatical errors to address, such as in Line 64, where it states, “LEDs are a promising next-generation for light system”.

Thank you for your suggestions. We will update based on your suggestion.

“

Due to advancements in manufacturing technology, LEDs offering several benefits such as long lifespan, low power consumption, and availability in various sizes and operational modes compared the conventional light, then it is one candidate for the future lighting system instead of normal light in the market. Moreover, one key feature which makes LEDs ideal for OWC systems is their ability to switch on and off speedily [8], allowing high-speed data communication. At present, RF systems dominate various applications, including communication networks, monitoring systems, and radar technology. Nevertheless, RF signals produce electromagnetic interference (EMI) [9], which has been connected to possible health hazards, especially with regard to brain function [10]. In contrast, OWC technologies are emerging as a good solution, EMI-free substitute to RF-based communication systems [11].

”

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript presents a deep-learning based decoder method to decode optical camera signals contained in a 2D LED array. The following questions and comments should be addressed during revision:

Line 16, MIMO should be defined in the abstract.
Figure 3, data points for 5m seem to be missing for 250 and 300us exposure times, can the authors explain why?
Line 204, why use 4 LEDs at each corner as the anchor? Is using only one LED per corner not sufficient/increases the decoding error rate?
Both global and rolling shutter cameras were used in the experiments, did the authors notice any difference in the performance of the proposed deep learning decoding model for the two types of video inputs?
Section 4.2, the authors should provide a more detailed description on how the deep learning model is used to decode the signal, based on the current description, it seems like it is used to detect the preamble as well as to predict a proper threshold value for the thresholding algorithm to determine the on/off state for each LED, is this correct?
The authors should present some examples on a rotated LED array and a LED array pictured at a different view angle to show the robustness of the proposed algorithm. These two aspects were mentioned but no examples were given.
Figure 9, the decoding time for both schemes should also be reported.

Comments on the Quality of English Language

Line 184: "contained of" should be "consists of"

Author Response

Reply to the Review Report (Reviewer 3)

This manuscript presents a deep-learning based decoder method to decode optical camera signals contained in a 2D LED array. The following questions and comments should be addressed during revision:

Line 16, MIMO should be defined in the abstract.

Thank you for your suggestions. We will update based on your suggestions

“

In this paper, we propose a Multiple Input Multiple Output (MIMO) modulation technology for Internet of Things (IoT) applications, utilizing an LED array and time-domain on-off keying (OOK).

”

Figure 3, data points for 5m seem to be missing for 250 and 300us exposure times, can the authors explain why?

Thank you for your suggestions. With high exposure time, we did not measure because if the exposure time is high with short distance, the SNR will very high then we can not show in Figure 3. With 250 and 300us, the SNR values is high with 10m and 20 m. In this Figure, we want to explain that the relationship between distance, SNR, and exposure time.

Line 204, why use 4 LEDs at each corner as the anchor? Is using only one LED per corner not sufficient/increases the decoding error rate?

Yes we just apply 4 LEDs in the corner as the anchor. As QR code, we apply 4 corner as the anchor, from that, we can know the position of all LEDs in LED array. From that, we can decode data correctly.

About the decoding error rate, we have a lot of algorithms to reduce the bit error rate: good decoder, forward error correction, etc, Then we can achieve good performance with mobility environment.

Both global and rolling shutter cameras were used in the experiments, did the authors notice any difference in the performance of the proposed deep learning decoding model for the two types of video inputs?

Thank you for your comment.

Rolling and Global Shutter modes describe two distinct sequences through which the image may be read off an sCMOS sensor. In rolling shutter mode, different lines of the array are exposed at different times as the read out 'wave' sweeps through the sensor, whereas in global shutter mode each pixel in the sensor begins and ends the exposure simultaneously, analogous to the exposure mechanism of an interline CCD. However, absolute lowest noise and fastest non-synchronized frame rates are achieved from rolling shutter mode.

We already demonstrate with both of camera in the lab environment. By controlling exposure time, we may demonstrate the our work with same deep learning decoding model with both of cameras.

Section 4.2, the authors should provide a more detailed description on how the deep learning model is used to decode the signal, based on the current description, it seems like it is used to detect the preamble as well as to predict a proper threshold value for the thresholding algorithm to determine the on/off state for each LED, is this correct?

Thank you for your suggestions. We will update based on your suggestions

“

2D MIMO-OOK data was then distinguished by extracting the central intensity of each LED by YOLOv11 [31-34]. A dataset of 10,000 samples, including both preamble and payload segments, was collected using global-shutter and rolling-shutter cameras at varying distances (ranging from 1 to 20 meters) and speeds. To prevent overfitting, we applied a simple deep learning neural network as revealed in Figure 5. Once the preamble was detected, we were able to accurately decode the 2D-MIMO signals, enhancing the performance of the OCC system in mobile scenarios compared to traditional methods. It was observed that using six or more hidden layers could lead to overfitting, reducing the accuracy on test data. Besides that, we apply deep learning model for decoding data. By apply DL for detecting the preamble position and threshold value between bits “0” and “1” as shown in Figure 5, we may improve OCC performance a lot.

”

The authors should present some examples on a rotated LED array and a LED array pictured at a different view angle to show the robustness of the proposed algorithm. These two aspects were mentioned but no examples were given.

Thank you for your suggestions.

We will update to make easier for reader.

“

”

Example of rotation effect

Figure 9, the decoding time for both schemes should also be reported.

Thank you for your suggestions.

“

”

Author Response File: Author Response.docx

Reviewer 4 Report

Comments and Suggestions for Authors

The reviewer still has two minor comments.

The advantage of exploiting the DHT-IV algorithm should be highlighted.
In the future, intelligent multi-modal sensing-communication integration [R1, R2] seems to be a feasible path that can facilitate the design of the OCC system. The authors are recommended to discuss how to incorporate intelligent multi-modal sensing-communication integration into the design of the OCC system for IoT scenarios based on the proposed method.

[R1] "Real-time digital twins: Vision and research directions for 6G and beyond," IEEE Communications Magazine, vol. 61, no. 11, pp. 128-134, Nov. 2023

[R2] “A LiDAR-aided channel model for vehicular intelligent sensing-communication integration,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 12, pp. 20105-20119, Dec. 2024.

Author Response

Reply to the Review Report (Reviewer 3)

This manuscript presents a deep-learning based decoder method to decode optical camera signals contained in a 2D LED array. The following questions and comments should be addressed during revision:

Line 16, MIMO should be defined in the abstract.

Thank you for your suggestions. We will update based on your suggestions

“

In this paper, we propose a Multiple Input Multiple Output (MIMO) modulation technology for Internet of Things (IoT) applications, utilizing an LED array and time-domain on-off keying (OOK).

”

Figure 3, data points for 5m seem to be missing for 250 and 300us exposure times, can the authors explain why?

Line 204, why use 4 LEDs at each corner as the anchor? Is using only one LED per corner not sufficient/increases the decoding error rate?

About the decoding error rate, we have a lot of algorithms to reduce the bit error rate: good decoder, forward error correction, etc, Then we can achieve good performance with mobility environment.

Both global and rolling shutter cameras were used in the experiments, did the authors notice any difference in the performance of the proposed deep learning decoding model for the two types of video inputs?

Thank you for your comment.

We already demonstrate with both of camera in the lab environment. By controlling exposure time, we may demonstrate the our work with same deep learning decoding model with both of cameras.

Section 4.2, the authors should provide a more detailed description on how the deep learning model is used to decode the signal, based on the current description, it seems like it is used to detect the preamble as well as to predict a proper threshold value for the thresholding algorithm to determine the on/off state for each LED, is this correct?

Thank you for your suggestions. We will update based on your suggestions

“

”

The authors should present some examples on a rotated LED array and a LED array pictured at a different view angle to show the robustness of the proposed algorithm. These two aspects were mentioned but no examples were given.

Thank you for your suggestions.

We will update to make easier for reader.

“

”

Example of rotation effect

Figure 9, the decoding time for both schemes should also be reported.

Thank you for your suggestions.

“

”

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Great job and good luck with your ongoing research

Author Response

Thanks for your valuable comments.

Reviewer 2 Report

Comments and Suggestions for Authors

In the revised version, most of my questions have been addressed. However, there is still some minor issues.

The discussion about focal length is not included in the main context.
Some typo issues like 'cammera' in Line 20.

Author Response

Reply to the Review Report (Reviewer 2)

In the revised version, most of my questions have been addressed. However, there is still some minor issues.

The discussion about focal length is not included in the main context.

Thank you for your suggestions. We will update based on your comment.

“

Figure 7 depicts the experimental setup used to validate the 2D MIMO-OOK system, and Figure 8 presents the results obtained using 20 links. By using focal length 35mm, we may achieve 20 links at 25m distance with data rate up to 15 kbps. Focal length refers to the distance between a lens and its focal point, where light rays converge or diverge. It is a fundamental property of lenses that determines both magnification and the angle of view in optical systems, such as cameras and the human eye. To increase the communication distance, we may use higher focal length. The implementation outcomes are providing in Table 1. Deep Learning decoder was applied to improve OCC performance. To apply for mobility environment, YOLOv11 was applied to detect LED matrix. After receiving data, camera will distinguish the preamble and threshold prediction based on the Deep Learning model. In this study, we proposed Deep Learning for 2D MIMO-OOK for long range and mobility situation.

”

Some typo issues like 'cammera' in Line 20.

Thank you for your suggestions. We will update based on your comment.

“

Radio frequency (RF)-based wireless systems are broadly used in communication systems such as mobile networks, satellite links, and monitoring applications. These systems offer outstanding advantages over wired systems, particularly in terms of ease of installation. However, researchers are looking on safer alternatives as a result of worries about the possible health problems connected to high-frequency radiofrequency transmission. Using the visible light spectrum is one promising approach; three cutting-edge technologies are emerging in this regard: Optical Camera Communication (OCC), Light Fidelity (Li-Fi), and Visible Light Communication (VLC). In this paper, we propose a Multiple Input Multiple Output (MIMO) modulation technology for Internet of Things (IoT) applications, utilizing an LED array and time-domain on-off keying (OOK). The proposed system is compatible with both rolling shutter and global shutter cameras, including commercially available models such as CCTV, webcam, smart cameras commonly deployed in buildings and industrial environments. Despite the compact size of the LED array, we demonstrate that, by optimizing parameters such as exposure time, camera focal length, and channel coding, our system can achieve up to 20 communication links over a 20-meter distance with low bit error rate.

”

Author Response File: Author Response.docx

Article Menu

Deep Learning-Based Optical Camera Communication with a 2D MIMO-OOK Scheme for IoT Networks

Further Information

Guidelines

MDPI Initiatives

Follow MDPI