Automated and Intelligent System for Monitoring Swimming Pool Safety Based on the IoT and Transfer Learning

Abstract: Recently, the Internet of Things (IoT) and computer vision have been integrated into automated swimming pool surveillance systems. Several studies have addressed drowning incidents during off-time surveillance by using a sequence of video frames to track human motion and position. This paper proposes an efficient and reliable detection system that utilizes a single image to detect and classify drowning objects, in order to prevent drowning incidents. The proposed system utilizes the IoT and transfer learning to provide an intelligent and automated solution for off-time monitoring of swimming pool safety. In addition, a specialized transfer-learning-based model is proposed; it utilizes a model pretrained on “ImageNet” to extract the most useful and complex features of the captured image and to differentiate between humans, animals, and other objects. The proposed system aims to reduce human intervention by processing the image and sending the classification results to the owner’s mobile device. The performance of the specialized model is evaluated in a prototype experiment, in which it achieves higher accuracy, sensitivity, and precision than other deep learning algorithms.


Introduction
The rapid development in both Internet of Things technologies and artificial intelligence algorithms has attracted the attention of many researchers seeking to design and develop intelligent systems. The Internet of Things enables millions of different objects and devices to communicate and exchange information over the internet. A Cisco report predicts that 50% of the 29.1 billion network devices will support various IoT applications by 2023 [1,2]. These devices and objects will generate large amounts of data that can be utilized to provide a better solution and enhance both the efficiency and sustainability. Thus, Internet of Things technologies are developed to address and solve challenging tasks in our environments, industries, cities, homes, and society by processing this massive amount of data collected from potentially millions of internet-connected devices and sensors [3]. Recently, machine-learning algorithms that can extract hidden and complex features from collected information have been widely exploited [4], to enhance the performance of various IoT systems [5]. These technologies can be merged together to model, integrate, and deploy intelligent solutions to serve human needs and enhance the quality of life. A number of IoT applications and services have utilized machine learning (ML) techniques to enrich their intelligent decision-making capabilities in different domains, e.g., smart cities, smart healthcare, smart homes, and smart transportation [4,6]. A smart application based on IoT and ML can be designed to provide an intelligent swimming pool management system because most swimming pool management is based on the traditional method of hiring a lifeguard to observe the water level and drowning incidents. This is a challenging issue for swimming-pool owners with limited budgets. In 2016, approximately 230,000 people died from drowning worldwide, which makes drowning the third leading cause of unintentional injury and death. 
Therefore, integrating IoT and ML technologies can play a huge role in automating the monitoring of swimming pool management systems, to reduce the occurrence of these incidents [7,8]. To fully automate a smart monitoring system without human involvement, computer vision techniques, which allow machines to see and understand the world through image capturing and processing, should be utilized and implemented. Image processing has received great attention in the computer vision field due to its ability to solve various problems, e.g., image segmentation, image detection, image enhancement, and image recognition. In recent years, machine learning (ML) has led to significant improvements in image processing, by achieving an impressive performance on detection and classification tasks that sometimes exceeds human performance [9]. Machine learning algorithms require sufficiently labeled training data to perform well on a particular vision task, because limited labeled data may lead to low performance.
In this study, only a small amount of labeled data is available; thus, transfer learning is utilized to enhance the model performance by starting from a model pretrained on ImageNet. The data collected from the different physical devices are processed by using ML techniques, to generate an action value. This value is transferred back to the system, in order to make a decision or to take an action. Thus, this study utilizes transfer learning (TL) algorithms to analyze and classify the data, in order to prevent drowning incidents and to measure the water level.
The main contributions of this study are summarized as follows:
• We introduce a novel intelligent system for off-time monitoring of swimming pools based on IoT and transfer learning.
• We propose a specialized neural network based on the ResNet50 (residual network with 50 layers) architecture that can utilize a single image to detect and classify drowning humans from other objects.
• The method achieves an accuracy of 99% on a collected dataset and outperforms existing transfer learning algorithms, such as the DCNN (deep convolution neural network), Xception (Extreme Inception), VGG16 (Visual Geometry Group), and VGG19.
• The efficiency and robustness of the proposed system are evaluated through several experimental analyses.
The main contribution of this study is that it develops an automated and intelligent system for monitoring swimming pools, to prevent drowning incidents. The remainder of this paper is organized as follows: Section 2 discusses the previous studies related to the proposed system, and Section 3 introduces the hardware requirements and transfer-learning background. The proposed methodology and system architecture are presented in Section 4. Section 5 demonstrates and discusses the experimental results. Finally, Section 6 concludes this study and discusses future works.

Related Work
Electronics 2020, 9, 2082
The existing drowning detection methods can be generally divided into vision-based detection systems and wearable sensor-based detection systems. Vision-based detection systems can be further characterized based on the image acquisition location: underwater cameras and overhead water cameras. In this section, only vision-based detection systems are explored and discussed in detail. Zhang et al. [10] proposed a camera-based drowning detection framework utilizing video sequences acquired through underwater cameras. A swimmer is detected by using the background subtraction technique, which is followed by an interframe-based denoising scheme to remove the detection noise. Alshbatat et al. [11] proposed an automated vision-based surveillance system to prevent drowning accidents. Their system consists of a Raspberry Pi, two Pixy cameras, an Arduino Nano board, stepper motors, an alarm system, and motor drivers. They used two cameras to detect and track the swimmers by calculating the swimmers' positions, and the swimmers were required to wear passive yellow vests. In Reference [12], the authors introduced a near-drowning early prediction technique, called NEPTUNE, using novel equations. It combines statistical image processing and k-means clustering to process image frames and extract the segmentation of the drowning object. In Reference [13], the authors proposed an improved VIBE drowning-person-detection algorithm that processes a sequence of video images captured through a camera installed above the swimming pool's surface. The swimmer's motion and position are tracked by using an improved visual background extraction algorithm. Wong et al. [14] introduced an off-time swimming pool surveillance system to detect moving humans and water activity, using a thermal imaging system.
This system consists of two sub-algorithms used to effectively detect an intruder inside and outside the swimming pool by dividing the images into two regions to perform the detection: head detection in both regions and water activity in only the second region. Fei et al. [15] presented a drowning detection system utilizing a set of video frames obtained through an underwater camera. The authors employed the background subtraction method to detect moving objects. Recently, Fazanes et al. [16] studied and analyzed the visible behavior of drowning persons, using drowning videos. These videos are observed and analyzed by both international water-safety experts and the Lince observation software to identify drowning persons and provide early behavioral detection. Claesson et al. [17] used drones and online machine learning to recognize drowning victims in the ocean.

Background
The typical IoT system architecture consists of three layers: (i) a perception layer or physical layer, (ii) a network layer, and (iii) an application layer [18]. The physical layer, which contains the sensors, is responsible for sensing and gathering useful information/data from the surrounding things or environments. In addition, it transforms the collected data into digital form. The network layer is the core layer of the IoT that ensures unique addressing, identification, and routing between different devices. It also provides secure data transmission and communication between the physical layer and the application layer. The application layer is the topmost layer of the IoT architecture and delivers a specific service to the end user. The proposed system is composed of hardware nodes and of software tools and algorithms that monitor the swimming pool, utilizing both the IoT layered architecture and machine learning techniques. The hardware components, which comprise the sensors, detect objects that fall into a swimming pool and capture images of them. The software algorithms are applied to detect and classify the captured objects.

Hardware
The entire system consists of different hardware components, including sensing hardware, IoT devices, and wireless technology, as shown in Table 1. The hardware nodes are responsible for collecting the data from different sensors, in order to transmit the data, using wireless technology. These components are explained as follows. The sensing hardware is an essential component of the proposed system. It aims to detect objects near the swimming pool and collect data through sensor nodes. The system uses the following hardware-based sensors, as shown in Figure 1: a motion-detection sensor and a camera sensor. The motion-detection sensor is an electronic device that senses any motion surrounding the area. Then, the camera sensor is used to capture a 2D image after receiving a signal from the motion-detection sensor.

Control Unit
The control unit is used as a "hub" for controlling and automating the system. In this study, a Raspberry Pi 3 was used as the controller that connects and controls the system components, as shown in Figure 1c. A Raspberry Pi 3 is a small single-board computer with the following specifications: 1 GB of RAM, multitasking, USB ports, and an operating system.

Communications
Several wireless communication protocols have been proposed in recent years, due to the rapid advancement in IoT and wireless technologies. In this study, Wi-Fi is used to keep the IoT devices connected to the internet and to transmit the collected data [8], because it has a higher data rate and covers less physical area, as compared to other wireless communication technologies, as shown in Table 2.

Transfer Learning
The software components utilize machine learning to efficiently process the captured images and make decisions that keep the swimming-pool owner informed about falling objects. Machine learning algorithms have been extensively used in computer vision, but they usually require large training samples to perform well. In our case, the available dataset is small, which may cause overfitting problems. Thus, transfer learning with a pretrained model is utilized to enhance the proposed system's performance. The lack of available training image samples is a well-known issue in the machine learning and deep learning communities and is one of the main motivations for the transfer learning approach. TL refers to the transfer of knowledge between domains: a model is first trained on a large source dataset (such as ImageNet), and the learned knowledge is then transferred to a different target domain [21][22][23]. TL utilizes the pretrained model to provide a better generalization ability and to reduce the computational cost. The following three common pretrained deep CNN networks are explained briefly: (1) ResNet, (2) VGGNet, and (3) Xception.

1) ResNet [24]:
ResNet is a deep residual network based on a deep CNN architecture with up to 152 layers that was introduced by Microsoft Research in 2015 and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It introduces a so-called residual block to address the vanishing/exploding gradient problem associated with training deep models, using the identity shortcut connection, as shown in Figure 2. The shortcut connection skips one or more layers and adds the input of those layers to their output, after which a rectified linear unit (ReLU) function is applied. ResNet has achieved great performance and state-of-the-art results on many visual recognition tasks, especially on the ImageNet database that contains 1000 classes, with a top-5 accuracy of 96.43%. ResNet50 is a variant of the deep residual network that consists of 50 layers and is built from residual blocks with identity shortcut connections [25].
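To make the identity shortcut concrete, the following is a minimal pure-Python sketch of a single residual block, y = ReLU(F(x) + x). It is an illustration only and not the ResNet implementation: the names `residual_block`, `zero_branch`, and `half_branch` are hypothetical, and the residual branch F is a stand-in for the stacked convolution layers of a real block.

```python
from typing import Callable, List

Vector = List[float]


def relu(values: Vector) -> Vector:
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Compute ReLU(F(x) + x), where F is the residual branch.

    The identity shortcut adds the block input x to the branch output,
    so the branch only has to learn a correction to the identity mapping.
    """
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs matching input/output sizes")
    return relu([a + b for a, b in zip(fx, x)])


def zero_branch(values: Vector) -> Vector:
    """Hypothetical branch that has learned nothing (outputs zeros)."""
    return [0.0 for _ in values]


def half_branch(values: Vector) -> Vector:
    """Hypothetical branch that scales its input by 0.5."""
    return [0.5 * v for v in values]


# With a zero branch, the block reduces to ReLU(x): the input passes through.
print(residual_block([1.0, -2.0, 3.0], zero_branch))  # [1.0, 0.0, 3.0]
```

Because the input is added back unchanged, the gradient also has a direct path around the branch, which is the property the residual block relies on when many such blocks are stacked.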

2) VGGNet [26]:
VGGNet is a very deep convolution network proposed in 2014 by the Visual Geometry Group (VGG) of Oxford University. VGG16 consists of 16 weight layers (13 convolution layers and 3 fully connected layers) with a small 3 × 3 receptive field and a max pooling layer size of 2 × 2. The ReLU activation function is applied to all the hidden layers, and softmax is applied to the last layer. Similarly, VGG19 consists of 19 weight layers (16 convolution layers and 3 fully connected layers). VGGNet was developed for the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) and is still one of the most widely used architectures in visual recognition tasks.

3) Xception [26]:
Xception is an extension of Google's Inception model and is based on deep convolution networks. The Xception architecture has 36 convolution layers that are structured into 14 modules. It replaces the Inception modules with a linear stack of depth-wise separable convolutions, each of which is a depth-wise convolution followed by a pointwise (1 × 1) convolution. The Xception model outperforms the reported results of other methods such as ResNet50 and VGG16 on ImageNet.
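The benefit of a depth-wise separable convolution can be illustrated by counting weights. The sketch below compares a standard convolution with its separable counterpart; the function names and the example layer size (3 × 3 kernel, 128 input channels, 256 output channels) are illustrative assumptions, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution.

    Depth-wise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable)  # 294912 33920
```

For this assumed layer size, the separable form needs roughly one-ninth of the weights of the standard convolution.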

Proposed Methodology
The proposed automated and intelligent system for monitoring swimming-pool safety based on the IoT and transfer learning is illustrated in Figure 3. The system utilizes a single image captured through two steps: (1) the motion sensor detects any objects around the swimming pool and sends a signal to the camera, and (2) the camera, which is installed above the swimming pool, as shown in Figure 3c, captures a single image. The Raspberry Pi 3, which is connected to the camera, then sends the image via wireless communication to the server station, for processing and classification. Transfer learning is utilized to extract the most significant features and classify the detected object. During off-time, the swimming-pool owner receives a notification through a mobile application that reports one of only three results: human, animal, or object. This study utilizes the ResNet50 model pretrained on "ImageNet" to extract the most distinctive features in order to detect and recognize the objects, as shown in Figure 4.
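The flow described above (motion trigger, single-image capture, classification on the server, notification) can be sketched as one monitoring step. This is a minimal illustration under stated assumptions and not the deployed software: `monitor_once`, `Notification`, and the four callables are hypothetical stand-ins for the motion sensor, the camera, the server-side classifier, and the mobile notification service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CLASSES = ("human", "animal", "object")  # the three results reported to the owner


@dataclass
class Notification:
    label: str
    message: str


def monitor_once(
    motion_detected: Callable[[], bool],
    capture_image: Callable[[], bytes],
    classify: Callable[[bytes], str],
    notify: Callable[[Notification], None],
) -> Optional[Notification]:
    """Run one monitoring step: sense, capture, classify, notify."""
    # Step 1: the motion sensor decides whether the camera is triggered at all.
    if not motion_detected():
        return None
    # Step 2: the camera captures a single image, which is sent for classification.
    image = capture_image()
    label = classify(image)
    if label not in CLASSES:
        raise ValueError(f"unexpected class label: {label!r}")
    # Step 3: the owner is notified with one of the three possible results.
    note = Notification(label=label, message=f"Pool alert: {label} detected")
    notify(note)
    return note


# Usage with stand-in callables (all hypothetical).
sent = []
result = monitor_once(lambda: True, lambda: b"raw-image", lambda img: "human", sent.append)
print(result.message)  # Pool alert: human detected
```

Passing the sensor, camera, classifier, and notifier as callables keeps the step independent of any particular hardware driver or messaging service.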

Proposed Specialized ResNet50
The proposed model is based on a modified ResNet50 architecture [24]. The input layer takes a single image with a size of (100, 100, 3), where each pixel value is normalized between zero and one. Three fully connected layers were added toward the end, with sizes of 1000, 300, and 32 nodes. Each of the three added layers is followed by dropout regularization (dropout ratio of 0.05). The original output layer of ResNet50, which has 1000 classes, was removed by setting include_top to false and was replaced by an output layer with 3 classes. The specialized model uses stochastic gradient descent (SGD) to optimize the cost function and minimize the training error. Table 3 lists the rest of the specialized ResNet50 hyperparameters.
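A minimal Keras sketch of such a head is given below. It follows the sizes stated above (input of (100, 100, 3), dense layers of 1000, 300, and 32 nodes, dropout of 0.05, 3 output classes, SGD), but several details are assumptions that the text does not specify: global average pooling after the backbone, ReLU activations in the added layers, a softmax output, the categorical cross-entropy loss, and the learning rate. The function names are hypothetical. TensorFlow is imported inside the builder, so the parameter-count helper can be used without it.

```python
from typing import Sequence

INPUT_SHAPE = (100, 100, 3)    # image size stated in the text
HEAD_UNITS = (1000, 300, 32)   # sizes of the three added fully connected layers
DROPOUT_RATE = 0.05            # dropout ratio stated in the text
NUM_CLASSES = 3                # human, animal, object


def head_parameter_count(feature_dim: int, hidden_units: Sequence[int], num_classes: int) -> int:
    """Count weights and biases of the dense head placed on top of the backbone."""
    sizes = [feature_dim, *hidden_units, num_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def build_specialized_resnet50(learning_rate: float = 0.01, weights: str = "imagenet"):
    """Build a ResNet50 backbone with the dense head described in the text.

    Assumptions (not given in the text): pooling="avg", ReLU in the head,
    softmax output, categorical cross-entropy loss, and the learning rate.
    """
    from tensorflow import keras
    from tensorflow.keras import layers

    # include_top=False drops the original 1000-class ImageNet classifier.
    base = keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=INPUT_SHAPE, pooling="avg"
    )
    x = base.output
    for units in HEAD_UNITS:
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(DROPOUT_RATE)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# With average pooling, ResNet50 yields a 2048-dimensional feature vector.
print(head_parameter_count(2048, HEAD_UNITS, NUM_CLASSES))  # 2359031
```

Under these assumptions, the added head contributes about 2.36 million trainable parameters on top of the pretrained backbone.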

Discussion and Experimental Results
The aim of this section is to demonstrate the robustness and effectiveness of the proposed system based on transfer learning and the Internet of Things. The system utilizes only a single image to detect fallen objects and classify them into three types: humans, animals, and objects. The transfer learning model pretrained on ImageNet enhances the convergence of the proposed system. This section is divided into four subsections: (1) Dataset, (2) Metrics, (3) Performance evaluation, and (4) Discussion.

Dataset
The dataset used in this experiment was collected in a prototype swimming pool, using a Pixy camera, and contains 300 image samples. As shown in Figure 5, the collected image samples are 2D images that are classified into three types: human class, animal class, and object class. The entire dataset is split into two parts, a training set and a test set, where approximately two-thirds (66%) of the image samples are used for training and the remaining third (33%) is used for testing.
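A split of this kind can be sketched as follows. The per-class counts are not reported in the text, so the balanced 100/100/100 distribution below is an assumption, as are the function name `stratified_split`, the stratification itself, and the random seed.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], train_fraction: float = 0.66, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into train and test sets, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Assumed balanced dataset of 300 images (class counts are hypothetical).
labels = ["human"] * 100 + ["animal"] * 100 + ["object"] * 100
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 198 102
```

Splitting class by class keeps the class proportions of the training and test sets equal, which matters when the dataset is as small as the one used here.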

Metrics
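To measure and analyze the performance of the proposed system, specific evaluation metrics are computed, and they are explained as follows. TP and TN denote the number of correctly identified humans and correctly identified nonhumans (including animals and objects), respectively. FP and FN denote the number of nonhumans incorrectly classified as humans and the number of humans incorrectly classified as nonhumans, respectively. Based on these counts, the following metrics are computed:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$$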
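As a small worked example, the metrics named in this paper (accuracy, sensitivity, specificity, precision, and F1-score) can be computed from the four counts using their standard definitions. The function name and the example counts below are illustrative assumptions and are not results from the experiment.

```python
from typing import Dict


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard metrics for the human (positive) vs. nonhuman (negative) task."""
    sensitivity = safe_div(tp, tp + fn)   # share of humans that were detected
    specificity = safe_div(tn, tn + fp)   # share of nonhumans correctly rejected
    precision = safe_div(tp, tp + fp)     # share of human alerts that were right
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
scores = binary_metrics(tp=8, tn=9, fp=1, fn=2)
print({name: round(value, 4) for name, value in scores.items()})
```

In a drowning-detection setting, sensitivity is the critical figure, because a false negative corresponds to a human in the pool who goes unreported.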


Performance Evaluation
To explore the efficiency of the proposed intelligent system, different deep learning models were evaluated on the collected dataset. First, the DCNN, one of the most widely used deep learning models in computer vision, was evaluated and achieved an accuracy of 95%. ResNet50 without any further modification was also assessed on the same dataset and achieved an accuracy of 97%. Although these accuracies are high, they are not acceptable in a system that deals directly with human life. The experimental results presented in Table 4 show that the proposed intelligent system outperforms other deep-learning-based algorithms such as VGG16, VGG19, ResNet50, and Xception. Furthermore, performance metrics (sensitivity, specificity, precision, and F1-score) and confusion matrices are used to further evaluate the multi-class classification performance. The pretrained deep learning approach detects human beings more consistently, with high values on these metrics, as shown in Table 5. To further analyze the classification of the three classes, the proposed system was compared with a deep convolutional neural network. The proposed system detects humans and animals with high sensitivity, precision, and F-scores, as shown in Figures 6 and 7; however, it has lower sensitivity when detecting objects, which is acceptable in real-life applications. The proposed system misclassified one object as an animal, as shown in the confusion matrix in Figure 8a. In contrast, the deep convolutional network missed one human being (a false negative), as shown in the confusion matrix in Figure 8b, which is not acceptable in real-life applications.
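The per-class reading of a confusion matrix can be sketched as follows. The single error (one object predicted as an animal) follows the description above, but the per-class test counts of 34 images each are an assumption, since the text does not report them; the function name is also hypothetical.

```python
from typing import Dict, List

CLASSES = ("human", "animal", "object")


def per_class_metrics(matrix: List[List[int]]) -> Dict[str, object]:
    """One-vs-rest sensitivity and precision; rows are true classes, columns predictions."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report: Dict[str, object] = {"accuracy": correct / total}
    for i, name in enumerate(CLASSES):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                        # true class i, predicted as another
        fp = sum(matrix[r][i] for r in range(n)) - tp   # other classes predicted as i
        report[name] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return report


# Assumed test set of 34 images per class with one object predicted as an animal.
matrix = [
    [34, 0, 0],   # true human
    [0, 34, 0],   # true animal
    [0, 1, 33],   # true object
]
report = per_class_metrics(matrix)
print(round(report["accuracy"], 4))  # 0.9902
```

With these assumed counts, the single object-to-animal error lowers the object sensitivity and the animal precision, while the human class keeps a sensitivity and precision of 1.0.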

Discussion
In this subsection, the efficiency and reliability of the intelligent system are discussed and analyzed based on three aspects: reliability, cost efficiency, and system complexity. In addition, the proposed system is compared with previous vision-based detection systems, as shown in Table 6. First, the proposed intelligent system was compared with existing deep learning algorithms such as the DNN, DCNN, and ResNet50, and it outperformed all of them, as shown in Table 4. The proposed system has the ability to automatically capture a drowning object and extract the underlying and complex features from a single image. It consistently yields better results than the existing deep learning algorithms. The proposed system achieves an accuracy of 99% on three classes and an accuracy of 100% on two classes ((1) human class and (2) animal and object class), which makes the system reliable. Replacing the last layer of ResNet50 with our proposed layers helps our specialized model to avoid the overfitting and vanishing gradient problems. In addition, the model pretrained on ImageNet is utilized to initialize the weights, which expedites the convergence and enhances the performance during training, as compared to the CNN without the pretrained model, as shown in Figures 9 and 10. Second, the only requirements to design and deploy the proposed system are a motion-detection sensor, cameras, a router, and a computer station, which are cheaper than other systems that require video cameras and other computer resources to process image frames in order to detect drowning objects in a manageable time. Third, the complexity of the proposed system is evaluated by calculating the computational time required to detect a drowning object, which is approximately 0.56 s/image. The system only utilizes a single captured image, which means that it requires less computational time than other systems utilizing video sequences.
The proposed intelligent system and the various transfer learning algorithms were implemented on Windows 10 with an Intel i7 CPU and 32 GB of RAM, using TensorFlow, NumPy, scikit-learn, matplotlib, and the Keras library. Although the design achieves a high performance on automated drowning detection, the proposed system was only tested on images that were captured during the daytime and contain only one object. In the future, both daytime and nighttime images will be investigated. In addition, the system will be tested on images that contain two or more objects, to validate its robustness.

Conclusions and Future Work
Drowning incidents are increasing and are considered the third leading cause of unintentional injury and death. Several studies have explored and utilized IoT technologies to prevent drowning incidents. This paper proposes an efficient and reliable system that utilizes IoT technologies and transfer learning to prevent the occurrence of these incidents. A specialized deep learning model was proposed and developed, utilizing only a single image to detect and classify the drowning object into three categories: human, animal, and object. The system processes the image and notifies the swimming-pool owner through a mobile application, to address drowning incidents during off-time surveillance. A prototype experiment was designed to evaluate the performance of the proposed system, and the system obtained an overall classification accuracy of 99% and, for human detection, a precision of 100% and a sensitivity of 100%. Thus, the specialized model has outperformed other deep learning algorithms and can achieve impressive results in human drowning incident detection. In the future, a generative adversarial network will be applied to generate synthetic data, in order to increase the size of the training dataset. In addition, more classes will be added to further investigate the efficiency of the proposed system.
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: