Performance Evaluation of Deep Learning Models for Classifying Cybersecurity Attacks in IoT Networks

Abstract: The Internet of Things (IoT) presents great potential in fields such as home automation, healthcare, and industry, but its infrastructure, its reliance on open source code, and a lack of software updates make it vulnerable to cyberattacks that can compromise access to data and services, making it an attractive target for hackers. The complexity of cyberattacks has increased, posing a greater threat to public and private organizations. This study evaluated the performance of deep learning models for classifying cybersecurity attacks in IoT networks, using the CICIoT2023 dataset. Three architectures based on DNN, LSTM, and CNN were compared, highlighting their differences in layers and activation functions. The results show that the CNN architecture outperformed the others in accuracy and computational efficiency, with an accuracy rate of 99.10% for multiclass classification and 99.40% for binary classification. The importance of data standardization and proper hyperparameter selection is emphasized. These results indicate that the CNN-based model is a promising option for detecting cyber threats in IoT environments, supporting the relevance of deep learning in IoT network security.


Introduction
The term "Internet of Things" (IoT), also known as "Internet of Everything" or sometimes referred to as "industrial internet" [1], refers to a communication infrastructure in which multiple tangible devices are able to connect and communicate over the global network known as the internet. These devices, commonly referred to as "smart objects", incorporate all kinds of sensors, software, and electronic components that allow them to capture information, store it, analyze it, process it, and share it with other devices and systems, making the surrounding environment smarter [2][3][4]. These smart objects range from household devices to complex industrial machinery and transportation systems. It is estimated that by the year 2030, the number of devices connected to the internet could exceed 29 billion [5].
The potential applications of IoT encompass a wide and diverse range of fields, and its impact is manifested in various industries, such as tourism [6], home healthcare [7], agriculture [8,9], and finance [10], among others. As the number of devices connected to the global network continues to grow, new security concerns have emerged, stemming from the vulnerabilities of IoT devices in areas such as authentication, access control, device security, and heterogeneity, among others [2,11-13]. These vulnerabilities require defense strategies against potential attacks.
The proliferation and expansion of physical devices connected to the network make them attractive targets to be hijacked into botnets and used in attacks such as phishing and distributed denial of service (DDoS) [14]. Malware attacks like Mirai, Hajime, and Bashlite, among others, also pose a significant challenge to IoT security [15,16], along with web-based attacks [17]. To achieve the necessary level of protection, solutions based on traditional approaches such as traffic protection systems, firewalls, and managed security services have been implemented. However, these measures are not sufficient to counter such attacks because of their complexity, and because the measures are rule-based, which limits protection against attacks circulating in the network [18].
Given the growing threat of cyberattacks on IoT networks, it is necessary to evaluate the performance of deep learning models for efficiently and accurately classifying these attacks. The effectiveness of these models in detecting and mitigating threats can significantly contribute to improving security in IoT environments, protecting both devices and the sensitive data transmitted through them.
To address these challenges, researchers and practitioners have turned to deep learning models, which have demonstrated acceptable run times and high detection accuracy for classifying and mitigating threats in IoT networks. In reference [18], three deep learning models based on deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) were implemented to detect cyberattacks from the CICIoT2023 dataset, where the RNN model achieved the highest accuracy, 96.56%, for multiclass classification. Akgun et al. [19] developed a hybrid model based on DNN, CNN, and long short-term memory (LSTM) to classify DDoS attacks from the CIC-DDoS2019 dataset, achieving an accuracy of 99.30% for multiclass classification. Wang et al. [20] proposed a lightweight method called DL-BiLSTM, which combines bidirectional long short-term memory networks (BiLSTMs) and DNN. This method was evaluated using a subset of data from CICIDS2017, N-BaIoT, and CICIoT2023, achieving accuracy rates of 93.13%, 99.98%, and 99.67%, respectively. In reference [14], gated recurrent units (GRU), a variant of LSTM, were implemented to detect multi-vector DDoS attacks, achieving accuracies of 99.82% and 99.85% on the CICDDoS2019 and CICIoT2023 datasets, respectively. Despite the favorable results, the training and testing time exceeds 60 min, which should also be considered.
In research [21], a DNN was implemented that achieved an accuracy of 99.14% on the CICIoT2023 dataset, although its precision was only 67.6%. In [22], a new model for feature selection from the CICIoT2023 dataset based on an extra tree classifier was proposed and implemented with an LSTM, achieving a multiclass classification accuracy rate of 92%. In reference [23], a federated learning approach based on deep learning was employed to predict attacks using the CICIoT2023 dataset, reaching an experimental accuracy of 99%. Classical machine learning algorithms have also been implemented. In the study by Le et al. [24], a blending model was developed as a combination of three classifiers: gradient boosting, random forest, and decision tree. This method achieved accuracy rates of 99.51% and 100% on the CICIoT2023 and IoTID20 datasets, respectively. In [25], a comparative analysis of different machine learning approaches was conducted and evaluated with the ToN-IoT and BoT-IoT databases, showing that the neural network-based model achieved the best result of 99.9% accuracy. Although the results obtained for the BoT-IoT dataset are not detailed, it appears to be a good option for predicting threats in IoT networks.
These models have shown high detection rates and accuracy in identifying threats, making them strong candidates for protecting IoT devices against malicious attacks. The main objective of this study is to evaluate the performance of different deep learning models, including DNN, LSTM, and CNN, for classifying cybersecurity attacks in IoT networks related to DDoS, denial of service (DoS), recon, web, brute force, spoofing, and Mirai. The CICIoT2023 dataset was used, which underwent data preprocessing and feature selection, resulting in a new simplified dataset. The performance of these models was evaluated in terms of accuracy, precision, recall, and F1 score in threat identification, with the purpose of providing an effective tool for protecting IoT devices against malicious attacks.

Materials Dataset
This study used the CICIoT2023 dataset [26], a recent and extensive dataset designed to evaluate large-scale attacks in the Internet of Things (IoT) ecosystem. CICIoT2023 was created by the Canadian Institute for Cybersecurity and provides a realistic representation of attacks in an IoT topology composed of 105 devices. The dataset includes 33 types of attacks, classified into seven categories: DDoS, DoS, recon, web, brute force, spoofing, and Mirai. In total, CICIoT2023 contains 46,686,579 records described by 46 features plus a class label (47 columns). Table 1 provides a detailed description of the CICIoT2023 dataset. This table includes the type of attack, the request target, the total number of records, the percentage of records used for training and validation of the proposed models, and the percentage distribution of classes. The total number of records refers to the quantity of feature tuples extracted from the original pcap files, which are summarized within a fixed-size packet window. These features are derived from a sequence of packets carrying information between two hosts [26].
The deep learning models were implemented in Google Colab, an online platform that provides free access to computing resources such as GPUs and TPUs. This tool was essential for the research, as it allowed for the efficient and scalable execution of deep learning models, significantly reducing training times and facilitating experimentation with different architectures and parameters.

Preprocessing, Feature Selection, and Data Standardization
Data preprocessing is one of the most important phases in supervised learning. During this stage, a series of transformations and adjustments is applied to the data with the purpose of improving both the data quality and the results of any machine learning model [27]. The database contains 46 features and 46,686,579 records covering different attacks on IoT devices.
To simplify the database, shorten processing time, and reduce the required memory, the following data cleaning and selection steps were performed. Invalid records containing null values, duplicates, empty values, positive infinity, or negative infinity were removed, eliminating a total of 34 records. Because of the large number of remaining records, only 1% of the total was used, equivalent to 466,866 records randomly selected within each attack category, as described in Table 1. This random sampling strategy reduced the computational load while maintaining data representativeness in the subsequent analysis. Additionally, six features that contained only zero values in all records were removed: "ece_flag_number", "cwr_flag_number", "Telnet", "SMTP", "IRC", and "DHCP". The result is a new dataset with 40 features plus a "label" column containing the name of each attack category. The details of these features, including their minimum and maximum values, are presented in Table 2. Given the different scales of the descriptors, the dataset was standardized using the StandardScaler() method so that each feature has a mean of zero and a standard deviation of one. This transformation subtracts the feature mean from each observation and divides the result by the feature's standard deviation, as shown in Equation (1).
$$z = \frac{x - \mu}{\sigma} \tag{1}$$
where z is the transformed value of the feature, x is the original value of each descriptor, µ is the mean, and σ is the standard deviation of the feature in the dataset.
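The cleaning and standardization steps described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `clean_and_standardize` and the toy array are hypothetical, and the standard deviation is the population one (ddof = 0), which is also what scikit-learn's StandardScaler computes.

```python
import numpy as np

def clean_and_standardize(X):
    """Remove invalid rows and all-zero columns, then z-score each column."""
    X = np.asarray(X, dtype=float)
    # Keep only rows in which every value is finite (drops NaN, +inf, -inf).
    X = X[np.isfinite(X).all(axis=1)]
    # Drop exact duplicate rows (np.unique also sorts the rows).
    X = np.unique(X, axis=0)
    # Drop features that are zero in every remaining record.
    keep = (X != 0).any(axis=0)
    X = X[:, keep]
    # z = (x - mu) / sigma, computed per feature (column).
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard: constant columns are mapped to zero
    return (X - mu) / sigma, keep

# Hypothetical toy data: one row with +inf, one duplicate row, one all-zero column.
X_toy = np.array([
    [1.0, 0.0, 10.0],
    [3.0, 0.0, 30.0],
    [np.inf, 0.0, 5.0],
    [1.0, 0.0, 10.0],
    [5.0, 0.0, 50.0],
])
Z, kept_columns = clean_and_standardize(X_toy)
print(Z.shape, kept_columns)
```

In practice the mean and standard deviation are usually estimated on the training split only and then reused for the other splits; the text does not state which records were used to fit the scaler.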

Proposed Models of Deep Learning Architecture
Deep learning has emerged as an efficient and high-performance machine learning method for solving very complex classification and prediction problems.
The performance of two reference deep learning architectures was compared using the new dataset obtained from CICIoT2023, and a new model was proposed based on these two. Figure 1 illustrates the principle of operation of the system proposed in this research. The new dataset is divided into training, testing, and validation (tuning) subsets through random selection. For each model, the hyperparameters to be selected include network size, types of layers, activation functions, and optimizers. Therefore, we experimented with different types of layers for each proposed model: dense layers, LSTM layers, and convolutional layers.
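The sampling and splitting procedure can be sketched as below. This is an illustrative sketch, assuming that the 1% sample is drawn independently within each class and that the 60/20/20 training/testing/validation proportions reported in the Conclusions apply; the function name `sample_and_split`, the seed, and the toy labels are hypothetical.

```python
import numpy as np

def sample_and_split(labels, fraction=0.01, seed=0):
    """Sample a fraction of each class, then split the indices 60/20/20."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        # Keep at least one record per class, even for very rare classes.
        n = max(1, int(round(fraction * idx.size)))
        picked.append(rng.choice(idx, size=n, replace=False))
    # Shuffle the sampled indices before cutting them into three parts.
    picked = rng.permutation(np.concatenate(picked))
    n_train = picked.size * 60 // 100
    n_test = picked.size * 20 // 100
    train = picked[:n_train]
    test = picked[n_train:n_train + n_test]
    val = picked[n_train + n_test:]
    return train, test, val

# Hypothetical toy labels; a 10% fraction keeps the example small.
toy_labels = np.repeat(np.array(["benign", "ddos", "mirai"]), [1000, 500, 500])
train_idx, test_idx, val_idx = sample_and_split(toy_labels, fraction=0.1, seed=1)
print(len(train_idx), len(test_idx), len(val_idx))
```

Sampling within each class preserves the class proportions of the full dataset, so the class imbalance of the original data carries over to the simplified dataset.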
The first architecture is based on a DNN, as shown in Figure 2a, consisting of a stack of densely connected layers. It begins with an input layer whose size equals the number of descriptors in the training data. It is followed by several dense layers, each with a "ReLU" activation function, introducing nonlinearities into the network. The dense layers have 64, 128, 256, 512, 256, and 128 neurons, respectively, progressively increasing the complexity and learning capacity of the model. A dropout layer with a rate of 0.2 was added to regularize the model and reduce overfitting. Then, a batch normalization layer was incorporated to normalize the activations and accelerate training. Finally, the output layer has a number of neurons equal to the number of classes in the simplified dataset, with a "softmax" activation function for multiclass classification.
The second architecture is based on a long short-term memory (LSTM) recurrent neural network model, as shown in Figure 2b. This architecture begins with an LSTM layer with 32 units and "return_sequences=True", meaning it returns complete sequences instead of just the last output. The LSTM layer has an input shape equal to the number of descriptors in the training dataset; next is another LSTM layer with 16 units, followed by a dense layer with 32 units and the ReLU activation function. A dropout layer with a rate of 0.2 is included. Finally, the output layer has a number of units equal to the number of classes in the simplified dataset.
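As a rough check on the size of the LSTM architecture, its trainable parameters can be counted by hand. The sketch below uses the standard LSTM formula (four gates, each with input weights, recurrent weights, and a bias). It assumes, without confirmation from the text, that each of the 40 descriptors is fed as one time step with a single feature and that there are eight output classes (seven attack categories plus benign); the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Trainable weights of one LSTM layer: 4 gates x (input + recurrent + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Trainable weights of one dense layer: weight matrix plus bias vector."""
    return input_dim * units + units

N_CLASSES = 8  # assumption: seven attack categories plus the benign class

layers = [
    ("lstm_32", lstm_params(1, 32)),    # assumption: one feature per time step
    ("lstm_16", lstm_params(32, 16)),   # consumes the 32-dim sequence output
    ("dense_32", dense_params(16, 32)),
    ("output", dense_params(32, N_CLASSES)),
]  # the dropout layer has no trainable weights

total_params = sum(count for _, count in layers)
size_kib = total_params * 4 / 1024  # 4 bytes per float32 weight

for name, count in layers:
    print(f"{name:>9}: {count}")
print(f"total: {total_params} parameters, about {size_kib:.2f} KiB")
```

Under these assumptions the count is 8,296 parameters, or roughly 32.4 KiB in float32, which is in line with the LSTM model size reported in the Results. The agreement is suggestive only and does not confirm the exact configuration.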
The third architecture, as detailed in Figure 3, is based on a CNN consisting of several layers that process one-dimensional data. The input layer, as in the previous models, has a size equal to the number of descriptors in the training dataset. The first section of the network consists of three parallel convolutional branches, each followed by a ReLU activation function. Each branch has a convolutional layer with a different kernel size (3, 5, and 11) and the same number of filters (64), allowing the network to capture patterns at different scales. The outputs of these branches are then concatenated along the second-to-last axis (axis = −2), meaning they are combined along the feature dimension. After concatenation, an additional convolutional layer with 72 filters and a kernel size of 7 was added, followed by a MaxPooling layer to reduce dimensionality. Next, the output was flattened to feed into dense layers, which are used to learn more abstract representations of the data. A dense layer with 256 neurons and a ReLU activation function was added, followed by a dropout layer with a rate of 0.2 to regularize the network and prevent overfitting. Finally, the output layer has a number of units equal to the number of classes in the dataset and uses the softmax activation function for multiclass classification.
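To make the data flow through the CNN concrete, the following sketch traces tensor shapes (steps, channels) and counts parameters with plain arithmetic. Several settings are assumptions because the text does not give them: the padding mode of the three branches, "valid" padding for the 72-filter convolution, a pooling size of 2, a single input channel, and eight output classes. The function names are hypothetical.

```python
def conv1d_len(n_steps, kernel, padding):
    """Output length of a stride-1 one-dimensional convolution."""
    return n_steps if padding == "same" else n_steps - kernel + 1

def trace_cnn(branch_padding, n_features=40, n_classes=8, pool_size=2):
    """Trace shapes and parameter counts for the multi-branch CNN sketch."""
    kernels, filters = (3, 5, 11), 64
    branch_lens = [conv1d_len(n_features, k, branch_padding) for k in kernels]
    # Concatenating along axis=-2 stacks the step axis; channels stay at 64.
    concat = (sum(branch_lens), filters)
    conv = (conv1d_len(concat[0], 7, "valid"), 72)   # assumption: valid padding
    pool = (conv[0] // pool_size, 72)                # assumption: pool size 2
    flatten = pool[0] * pool[1]
    params = (
        sum(k * 1 * filters + filters for k in kernels)  # three branches
        + (7 * filters * 72 + 72)                        # 72-filter convolution
        + (flatten * 256 + 256)                          # dense layer
        + (256 * n_classes + n_classes)                  # softmax output
    )
    return {"branch_lens": branch_lens, "concat": concat, "conv": conv,
            "pool": pool, "flatten": flatten, "params": params}

for padding in ("same", "valid"):
    info = trace_cnn(padding)
    mib = info["params"] * 4 / 2**20  # float32 weights
    print(padding, info, f"{mib:.2f} MiB")
```

With "valid" padding the three branches produce different numbers of steps, so they could not be joined along the channel axis; joining along axis = −2 works in either mode because every branch has 64 channels. Under the "same" padding assumption the float32 weights come to about 4.15 MiB, close to the CNN model size reported in the Results, although other configurations could give a similar figure.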

Performance Evaluations
A detection is counted as a true positive (TP) or a true negative (TN) when the model's prediction is correct. Conversely, it is counted as a false positive (FP) or a false negative (FN) when the prediction is incorrect.
To evaluate the performance of the proposed models, metrics based on the confusion matrix were used to assess the models' ability to classify the different classes in the simplified dataset.
The metrics evaluated in this study were as follows:
Precision (P): Measures the quality of the predictions, that is, the proportion of samples assigned to a class that truly belong to it; a high value means few false positives. It can be calculated using Formula (2).
$$P = \frac{TP}{TP + FP} \tag{2}$$
Recall (R): Assesses the proportion of the elements of a given class that are correctly classified.
$$R = \frac{TP}{TP + FN} \tag{3}$$
F1 Score (F): This indicator provides a balance between precision and recall (their harmonic mean), allowing for a better comparison of combined performance.
$$F = 2 \cdot \frac{P \cdot R}{P + R} \tag{4}$$
Accuracy (Acc): Measures the proportion of all predictions that the algorithm classifies correctly.
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
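The four metrics defined above can be computed directly from a multiclass confusion matrix, as in the following sketch. Averaging the per-class values by class support is an assumption, since the text does not state the averaging scheme; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Support-weighted precision, recall, F1, and accuracy.

    Rows of `cm` are true labels and columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp      # belonging to the class but missed
    support = cm.sum(axis=1)      # number of true samples per class
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    weights = support / support.sum()
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(weights @ precision),
        "recall": float(weights @ recall),
        "f1": float(weights @ f1),
    }

# Hypothetical 3-class confusion matrix.
cm_toy = [[5, 1, 0],
          [1, 3, 1],
          [0, 0, 4]]
scores = metrics_from_confusion(cm_toy)
print(scores)
```

With support weighting, the averaged recall is always equal to the accuracy, because both reduce to the sum of the diagonal divided by the total number of samples. This is why the F1 score and precision are the more informative figures on an imbalanced dataset.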

Results
The proposed deep learning models were configured as follows. Each model was trained for 50 epochs using the Adam optimizer with a learning rate of 0.001. Adam is a popular choice in deep learning because it adapts the step size of each parameter dynamically during training, which can improve model convergence. Additionally, the "sparse_categorical_crossentropy" loss function was employed, which is suitable for classification problems with multiple classes and integer-encoded labels. During this phase, callbacks were included to enhance the model's generalization ability and prevent overfitting.
The callbacks used in the model training include ModelCheckpoint, which saves the model with the best "val_accuracy" score on the validation set, and ReduceLROnPlateau, which monitors "val_loss" and reduces the learning rate if the loss stops improving, with a reduction factor of 0.1, a patience of 3, and a minimum learning rate (min_lr) of 1 × 10⁻⁷. Additionally, CSVLogger was used to log the training progress to a CSV file, TensorBoard to visualize training and validation metrics, and EarlyStopping to stop training if "val_loss" did not improve after six epochs. These callbacks were used to improve the performance and generalization of each model during training and validation.
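The interaction between the learning rate schedule and early stopping can be illustrated with a small framework-free simulation. This is a simplified sketch of the logic, not Keras's implementation: it ignores the min_delta and cooldown options, and the function name and the validation loss values are hypothetical. The default arguments mirror the settings reported above.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.1, lr_patience=3,
                       min_lr=1e-7, stop_patience=6):
    """Return (epoch, learning rate after the epoch) until training stops."""
    best = float("inf")
    lr_wait = 0      # epochs without improvement since the last LR change
    stop_wait = 0    # epochs without improvement since the best val_loss
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                # Plateau detected: shrink the rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        history.append((epoch, lr))
        if stop_wait >= stop_patience:
            break  # early stopping: no improvement for stop_patience epochs
    return history

# Hypothetical validation losses that stop improving after epoch 2.
toy_losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.82, 0.83, 0.84, 0.9, 0.9]
for epoch, rate in simulate_callbacks(toy_losses):
    print(epoch, rate)
```

Because the early stopping patience (six) is twice the plateau patience (three), the learning rate is lowered twice before training is halted, giving the optimizer two chances to recover at a smaller step size.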
In Figures 4-6, the training and validation results for the accuracy and loss metrics of each proposed model are presented. In all three models, accuracy increases on both the training and validation sets, indicating that the models learned to classify the data correctly. Additionally, the loss decreases steadily in each epoch, indicating that the models learned to minimize the error between their predictions and the actual labels.
The CNN-based model shows the highest validation accuracy, with a val_loss of 0.0279 and a val_accuracy of 0.9901 after 17 epochs, followed by the DNN-based model, which achieved a val_loss of 0.02771 and a val_accuracy of 0.9896 after 25 epochs. In contrast, the LSTM model required more training and validation epochs but did not surpass 0.9 in the val_accuracy metric.
To evaluate the effectiveness of the proposed models in classifying threats in an IoT network, the confusion matrices are shown in Figures 7-9. In each matrix, the rows represent the true labels, and the columns represent the predictions of the deep learning model. The darker the color of a square on the diagonal of the matrix, the higher the number of samples correctly predicted for that category.
In Table 3, the evaluation of the proposed models using confusion matrix metrics is presented, highlighting the superior performance of the CNN model compared to the DNN and LSTM models. The CNN model achieved the highest accuracy, 99.10%, indicating its ability to correctly classify threats in IoT environments. Additionally, its precision (99.08%) and recall (99.10%) further support its effectiveness in minimizing false positives and false negatives, respectively.
Although precision and accuracy are important metrics for evaluating a model's performance, the F1 score provides a more comprehensive assessment, particularly when classifying multiple classes of attacks in IoT networks. By combining precision and recall into a single metric, the F1 score helps to demonstrate the predictive capability and effectiveness of the models in this specific context. Given the class imbalance in the dataset, as evidenced in Table 1, the F1 score becomes a crucial metric, as this imbalance could significantly affect the interpretation of precision and accuracy. The CNN-based model achieved the best F1 score, 99.05%, supporting its superior performance despite the class imbalance.
Another important aspect to consider is the inference time of each model. First, the CNN model is considerably larger than the other two models, at 4.15 megabytes, while the LSTM model is much smaller, at only 32.41 kilobytes. However, despite its size, the CNN model requires less time for both training and inference than the DNN and LSTM models. The CNN model showed the shortest training time, 767 s, and the fastest inference time, just 6 s. This suggests that, despite its greater complexity and size, the CNN model maintains notable efficiency in terms of processing time compared to the other models.
Regarding binary classification, the last layer of each proposed architecture consists of a dense layer with a single output that uses the "sigmoid" function to distinguish between the two classes. In this case, all threats from the preprocessed dataset were grouped into the "Attack" class, while the other class remained named "Benign". According to the results in Table 4, the CNN-based model achieved the highest accuracy score, at 99.40%, closely followed by the other models. This model also had the shortest training time.
Table 5 presents a comparative analysis of various models used in threat detection in IoT networks, including the present study, which utilized the CICIoT2023 dataset. Each entry describes the bibliographic reference, model architecture, number of descriptors, evaluation metrics, and temporal metrics regarding training and inference times. Compared to the previously mentioned studies, the proposed CNN model exhibited equal or superior accuracy. It also exhibited an equal or superior F1 score, except in the case of the study [24], which outperformed our results by 0.02% in the F1 metric. However, it is worth noting that the study [24] was based solely on six features, namely "IAT", "Magnitude", "Total size", "Minimum", "Flow duration", and "Total sum", implying lower data complexity, which could explain its superior F1 result. It is important to highlight the need to consider more features that were not included, especially given the threshold restriction used after applying the MDI method. Additionally, the training and inference times of the proposed model are significantly better, suggesting its effectiveness. The CNN architecture used in this study achieves comparable or superior results with reduced computational burden, positioning it as a promising approach for cybersecurity threat detection in a realistic IoT environment.

Conclusions
This article evaluated the performance of different deep learning architectures (DNN, LSTM, and CNN) for classifying attacks in the IoT ecosystem. A recent and realistic dataset, CICIoT2023, containing seven attack categories and a class of benign records, was used. Outliers and features irrelevant to the models were analyzed and removed, resulting in a new dataset with 40 features. Due to the large number of records in CICIoT2023, 1% of each class was randomly selected and divided into 60% for training, 20% for testing, and 20% for validation. The results show that the proposed CNN-based model achieved an accuracy of 99.10% in multiclass classification and 99.40% in binary classification, outperforming the other models in the study and those from other studies. Additionally, the inference time of the proposed model is reasonable compared to reference models, demonstrating its effectiveness in detecting various types of attacks in IoT networks.
The results obtained with the LSTM-based model were unfavorable. Although this architecture is highly effective for modeling long temporal sequences and capturing long-term dependencies in data, it does not suit the context of attack detection in IoT networks. This is because the 40 descriptors do not form a sequence with significant long-term dependencies for classification; instead, they are local attributes that are independent of time. Additionally, the LSTM network tends to overfit data that exhibit little temporal sequentiality, as is the case with the preprocessed database used in this study. Despite employing 50 training epochs, for a total training time of 3701 s, the results were not favorable. This suggests a limited capacity of the LSTM network to progressively adapt to the dataset.
The DNN model yielded results slightly inferior to those of the convolutional neural network. This can be attributed to its simpler architecture, which limits its ability to capture the characteristics of complex data such as attack patterns in IoT environments and thereby affects its classification accuracy. On the other hand, the CNN-based architecture proved to be more efficient in extracting relevant features from attack data in IoT networks. It was observed that the CNN adapts effectively to complex data, thanks to the three convolutional layers included in its architecture, with kernel sizes of 3, 5, and 11, respectively, and 64 filters per layer, allowing it to capture patterns at different scales of the input data. The outputs of these layers were then concatenated, and an additional convolutional layer with 72 filters and a kernel size of 7 was added, contributing to its high performance in the classification of attacks in IoT environments.
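To make the multi-scale block more concrete, the following sketch works out output shapes and trainable parameter counts for the convolutional layers described above. It is bookkeeping only, not the authors' implementation, and it rests on several assumptions that the text does not state: the 40 features are fed as a length-40 sequence with one channel, the three kernels are applied in parallel to the same input, all convolutions are one-dimensional with stride 1, "same" padding, and bias terms, and concatenation is along the channel axis.

```python
def conv1d_params(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a 1D convolution layer."""
    weights_per_filter = kernel_size * in_channels
    return (weights_per_filter + (1 if use_bias else 0)) * filters

def same_padding_length(length, stride=1):
    """Output length of a 1D convolution with 'same' padding."""
    return -(-length // stride)  # ceiling division

# Assumed input layout: 40 features as a sequence of length 40, 1 channel.
seq_len, in_channels = 40, 1

# Three parallel branches (kernel sizes and filter count from the text).
branch_kernels = (3, 5, 11)
branch_filters = 64
branch_params = [conv1d_params(k, in_channels, branch_filters)
                 for k in branch_kernels]
branch_len = same_padding_length(seq_len)

# Concatenating the branch outputs along the channel axis (assumed).
concat_channels = branch_filters * len(branch_kernels)

# Additional convolution applied to the concatenated feature maps.
merge_filters, merge_kernel = 72, 7
merge_params = conv1d_params(merge_kernel, concat_channels, merge_filters)
merge_len = same_padding_length(branch_len)

print("branch parameters:", branch_params)
print("concatenated shape:", (branch_len, concat_channels))
print("merge layer parameters:", merge_params)
print("merge layer output shape:", (merge_len, merge_filters))
```

Under these assumptions, most of the convolutional parameters sit in the layer that follows the concatenation, since it sees 192 input channels rather than one.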
Based on the results obtained, the CNN architecture is the most suitable of those evaluated for classifying attacks in IoT networks, surpassing the DNN and LSTM architectures in the various evaluated metrics and in inference time. To expand knowledge in this field, future work could explore other deep learning architectures with potential for classifying attacks in IoT networks, evaluate the performance of the proposed models on diverse IoT attack datasets, and develop real-time intrusion detection systems for IoT networks. It is expected that the results of this study will contribute to the advancement of IoT network security, paving the way for the development of more effective and efficient intrusion detection systems.

Figure 1. General Scheme of the Proposed Method.

Figure 2. Reference Models of Deep Learning. In (a), the architecture based on a DNN is presented, and in (b), the architecture based on LSTM is presented.

Figure 3. Architecture Model based on CNN.

Figure 4. Training and validation results for the DNN-based architecture.

Figure 5. Training and validation results for the LSTM-based architecture.

Figure 6. Training and validation results for the CNN-based architecture.

Informatics 2024 ,
11, x 10 of 14 validating its superior performance despite the class imbalance.

Figure 7. Confusion matrix results for the DNN architecture.

Figure 8. Confusion matrix results for the LSTM architecture.

Figure 9. Confusion matrix results for the CNN architecture.

Table 1. Description of the CICIoT2023 Dataset.

Table 2. Final features of the new simplified dataset.

Table 3. Performance of deep learning architectures for multiclass classification.

Table 4. Performance of deep learning architectures for binary classification.

Table 5. Comparison of results obtained with other studies.