Sensors Data Processing Using Machine Learning

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (25 August 2023) | Viewed by 31446

Special Issue Editors


Guest Editor
Faculty of Electrical Engineering and Information Technology, University of Žilina, Žilina, Slovakia
Interests: acoustic attenuation measurements; ion-conductive glasses; relaxation processes

Guest Editor
Department of Multimedia and Information-Communication Technologies, FEIT, University of Zilina, Univerzitna 8215/1, 01026 Zilina, Slovakia
Interests: neural network; machine learning; deep learning; computer vision; image processing

Guest Editor
Department of Multimedia and Information-Communication Technologies, University of Zilina, 010 26 Zilina, Slovakia
Interests: image segmentation; image analysis; feature extraction; computer vision; pattern recognition; digital image processing; object recognition; classification algorithms; image processing

Special Issue Information

Dear Colleagues,

Sensors estimate measured variables using computational models, and the resulting data must then be processed. Data processing converts data from a given form into a more usable form that is more meaningful and informative. Machine learning (ML), deep learning (DL), and artificial intelligence (AI) are proving to be effective for this purpose. With machine learning algorithms, mathematical modeling, or statistical methods, the whole process can be automated.

The main aim of this Special Issue is to collect research focusing on data processing using machine learning and deep learning. We invite investigators to contribute both original and review articles, covering the research and development in the areas of data processing using machine learning (ML) and deep learning (DL). These areas include solutions that are designed for smart devices. Potential topics include, but are not limited to, the following:

  • Machine-learning-based deblurring/denoising;
  • Machine-learning-based computer vision;
  • Machine-learning-based depth estimation;
  • Evaluation of 3D models using machine learning and deep learning;
  • Recognition of 3D models using machine learning and deep learning;
  • New trends and applications for systems based on machine learning;
  • Pattern recognition using machine learning and deep learning;
  • Machine-learning-based segmentation, shape detection;
  • Machine-learning-based object detection, object tracking, object localization.

Prof. Dr. Peter Hockicko
Prof. Dr. Róbert Hudec
Dr. Patrik Kamencay
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data processing
  • machine learning
  • deep learning
  • pattern recognition
  • computer vision
  • depth estimation
  • 3D reconstruction

Published Papers (14 papers)

Editorial

5 pages, 163 KiB  
Editorial
Sensors Data Processing Using Machine Learning
by Patrik Kamencay, Peter Hockicko and Robert Hudec
Sensors 2024, 24(5), 1694; https://doi.org/10.3390/s24051694 - 06 Mar 2024
Viewed by 594
Abstract
Various sensors utilize computational models to estimate measured variables, and the generated data require processing [...] Full article
(This article belongs to the Special Issue Sensors Data Processing Using Machine Learning)

Research

12 pages, 7170 KiB  
Article
Comparing Three Methods of Selecting Training Samples in Supervised Classification of Multispectral Remote Sensing Images
by Hongying Zhang, Jinxin He, Shengbo Chen, Ye Zhan, Yanyan Bai and Yujia Qin
Sensors 2023, 23(20), 8530; https://doi.org/10.3390/s23208530 - 17 Oct 2023
Cited by 2 | Viewed by 1372
Abstract
Selecting training samples is crucial in remote sensing image classification. In this paper, we selected three images—Sentinel-2, GF-1, and Landsat 8—and employed three methods for selecting training samples: grouping selection, entropy-based selection, and direct selection. We then used the selected training samples to train three supervised classification models—random forest (RF), support-vector machine (SVM), and k-nearest neighbor (KNN)—and evaluated the classification results of the three images. According to the experimental results, the three classification models performed similarly. Compared with the entropy-based method, the grouping selection method achieved higher classification accuracy using fewer samples. In addition, the grouping selection method outperformed the direct selection method with the same number of samples. Therefore, the grouping selection method performed the best. When using the grouping selection method, the image classification accuracy increased with the increase in the number of samples within a certain sample size range. Full article
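
The entropy-based selection named in this abstract can be illustrated with a minimal sketch. It assumes each candidate sample already has predicted class probabilities and that the samples with the highest Shannon entropy are kept; the function names, the sample ids, and that ranking rule are illustrative assumptions, not the authors' procedure.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_by_entropy(candidates, k):
    """Return the k sample ids whose class probabilities have the highest entropy.

    candidates: dict mapping a (hypothetical) sample id to its class probabilities.
    """
    ranked = sorted(
        candidates,
        key=lambda sample_id: shannon_entropy(candidates[sample_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical pixels with made-up class probabilities for three land-cover classes.
candidates = {
    "px_a": [0.98, 0.01, 0.01],
    "px_b": [0.40, 0.35, 0.25],
    "px_c": [0.70, 0.20, 0.10],
}
selected = select_by_entropy(candidates, 2)
print(selected)
```

Ranking by entropy favors samples whose class membership is least certain, which is one common reading of "entropy-based" selection; other definitions (for example, entropy of local spectral values) are equally possible.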

14 pages, 5685 KiB  
Article
A Self-Supervised Model Based on CutPaste-Mix for Ductile Cast Iron Pipe Surface Defect Classification
by Hanxin Zhang, Qian Sun and Ke Xu
Sensors 2023, 23(19), 8243; https://doi.org/10.3390/s23198243 - 04 Oct 2023
Cited by 1 | Viewed by 741
Abstract
Online surface inspection systems have gradually found applications in industrial settings. However, the manual effort required to sift through a vast amount of data to identify defect images remains costly. This study delves into a self-supervised binary classification algorithm for addressing the task of defect image classification within ductile cast iron pipe (DCIP) images. Leveraging the CutPaste-Mix data augmentation strategy, we combine defect-free data with enhanced data to input into a deep convolutional neural network. Through Gaussian Density Estimation, we compute anomaly scores to achieve the classification of abnormal regions. Our approach has been implemented in real-world scenarios, involving equipment installation, data collection, and experimentation. The results demonstrate the robust performance of our method, in both the DCIP image dataset and practical field application, achieving an impressive 99.5 AUC (Area Under Curve). This presents a cost-effective means of providing data support for subsequent DCIP surface inspection model training. Full article

25 pages, 2151 KiB  
Article
Student Learning Behavior Recognition Incorporating Data Augmentation with Learning Feature Representation in Smart Classrooms
by Zhifeng Wang, Longlong Li, Chunyan Zeng and Jialong Yao
Sensors 2023, 23(19), 8190; https://doi.org/10.3390/s23198190 - 30 Sep 2023
Cited by 4 | Viewed by 1105
Abstract
A robust and scientifically grounded teaching evaluation system holds significant importance in modern education, serving as a crucial metric that reflects the quality of classroom instruction. However, current methodologies within smart classroom environments have distinct limitations. These include accommodating a substantial student population, grappling with object detection challenges due to obstructions, and encountering accuracy issues in recognition stemming from varying observation angles. To address these limitations, this paper proposes an innovative data augmentation approach designed to detect distinct student behaviors by leveraging focused behavioral attributes. The primary objective is to alleviate the pedagogical workload. The process begins with assembling a concise dataset tailored for discerning student learning behaviors, followed by the application of data augmentation techniques to significantly expand its size. Additionally, the architectural prowess of the Extended-efficient Layer Aggregation Networks (E-ELAN) is harnessed to effectively extract a diverse array of learning behavior features. Of particular note is the integration of the Channel-wise Attention Module (CBAM) focal mechanism into the feature detection network. This integration plays a pivotal role, enhancing the network’s ability to detect key cues relevant to student learning behaviors and thereby heightening feature identification precision. The culmination of this methodological journey involves the classification of the extracted features through a dual-pronged conduit: the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN). Empirical evidence vividly demonstrates the potency of the proposed methodology, yielding a mean average precision (mAP) of 96.7%. This achievement surpasses comparable methodologies by a substantial margin of at least 11.9%, conclusively highlighting the method’s superior recognition capabilities. 
This research has an important impact on the field of teaching evaluation systems: it helps to reduce the burden on educators and makes teaching evaluation more objective and accurate. Full article

18 pages, 2791 KiB  
Article
Comparison of the Usability of Apple M2 and M1 Processors for Various Machine Learning Tasks
by David Kasperek, Pawel Antonowicz, Marek Baranowski, Marta Sokolowska and Michal Podpora
Sensors 2023, 23(12), 5424; https://doi.org/10.3390/s23125424 - 08 Jun 2023
Cited by 1 | Viewed by 7898
Abstract
This paper compares the usability of various Apple MacBook Pro laptops for basic machine learning research applications, including text-based, vision-based, and tabular data. Four tests/benchmarks were conducted using four different MacBook Pro models: M1, M1 Pro, M2, and M2 Pro. A script written in Swift was used to train and evaluate four machine learning models using the Create ML framework, and the process was repeated three times. The script also measured performance metrics, including time results. The results were presented in tables, allowing for a comparison of the performance of each device and the impact of their hardware architectures. Full article

20 pages, 4750 KiB  
Article
A Development of an IoT-Based Connected University System: Progress Report
by Slavomir Matuska, Juraj Machaj, Miroslav Hutar and Peter Brida
Sensors 2023, 23(6), 2875; https://doi.org/10.3390/s23062875 - 07 Mar 2023
Cited by 1 | Viewed by 2099
Abstract
In this paper, a report on the development of an Internet of Things (IoT)-based connected university system is presented. There have been multiple smart solutions developed at the university over recent years. However, the user base of these systems is limited. The IoT-based connected university system allows for integration of multiple subsystems without the need to implement all of them in the same environment, thus enabling end-users to access multiple solutions through a single common interface. The implementation is based on microservice architecture, with the focus mainly on system robustness, scalability, and universality. In the system design, four subsystems are currently implemented, i.e., the subsystem for indoor navigation, the subsystem for parking assistants, the subsystem for smart classrooms or offices, and the subsystem for news aggregation from university life. The principles of all implemented subsystems, as well as the implementation of the system as a web interface and a mobile application, are presented in the paper. Moreover, the implementation of the indoor navigation subsystem that uses signals from Bluetooth beacons is described in detail. The paper also presents results proving the concept of the Bluetooth-based indoor navigation, taking into account different placements of nodes. The tests were performed in a real-world environment to evaluate the feasibility of the navigation module that utilizes deterministic fingerprinting algorithms to estimate the positions of users’ devices. Full article
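
Deterministic fingerprinting, as mentioned in this abstract, is often implemented as a nearest-neighbor search in signal space. The sketch below assumes a radio map of reference points with stored Bluetooth RSSI values and returns the centroid of the k closest fingerprints; the data layout, beacon names, and RSSI values are hypothetical and not taken from the paper.

```python
import math


def estimate_position(radio_map, measured, k=1):
    """Estimate a 2D position from an RSSI measurement.

    radio_map: list of {"pos": (x, y), "rssi": {beacon_id: dBm}} reference entries.
    measured:  {beacon_id: dBm} observed on the user's device.
    Returns the centroid of the k reference points whose fingerprints are
    closest to the measurement in Euclidean distance.
    """
    def signal_distance(fingerprint):
        return math.sqrt(
            sum((fingerprint[b] - measured[b]) ** 2 for b in measured)
        )

    nearest = sorted(radio_map, key=lambda e: signal_distance(e["rssi"]))[:k]
    x = sum(e["pos"][0] for e in nearest) / len(nearest)
    y = sum(e["pos"][1] for e in nearest) / len(nearest)
    return (x, y)


# Hypothetical radio map along a corridor with two beacons.
radio_map = [
    {"pos": (0.0, 0.0), "rssi": {"b1": -50, "b2": -80}},
    {"pos": (5.0, 0.0), "rssi": {"b1": -65, "b2": -65}},
    {"pos": (10.0, 0.0), "rssi": {"b1": -80, "b2": -50}},
]
measured = {"b1": -78, "b2": -53}
print(estimate_position(radio_map, measured))       # nearest neighbor
print(estimate_position(radio_map, measured, k=2))  # centroid of two nearest
```

The sketch assumes every reference fingerprint contains all beacons seen in the measurement; a real system would need a rule for missing beacons.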

17 pages, 6236 KiB  
Article
A New Deep-Learning Method for Human Activity Recognition
by Roberta Vrskova, Patrik Kamencay, Robert Hudec and Peter Sykora
Sensors 2023, 23(5), 2816; https://doi.org/10.3390/s23052816 - 04 Mar 2023
Cited by 18 | Viewed by 2471
Abstract
Currently, three-dimensional convolutional neural networks (3DCNNs) are a popular approach in the field of human activity recognition. However, due to the variety of methods used for human activity recognition, we propose a new deep-learning model in this paper. The main objective of our work is to optimize the traditional 3DCNN and propose a new model that combines 3DCNN with Convolutional Long Short-Term Memory (ConvLSTM) layers. Our experimental results, which were obtained using the LoDVP Abnormal Activities dataset, UCF50 dataset, and MOD20 dataset, demonstrate the superiority of the 3DCNN + ConvLSTM combination for recognizing human activities. Furthermore, our proposed model is well-suited for real-time human activity recognition applications and can be further enhanced by incorporating additional sensor data. To provide a comprehensive comparison of our proposed 3DCNN + ConvLSTM architecture, we compared our experimental results on these datasets. We achieved a precision of 89.12% when using the LoDVP Abnormal Activities dataset. Meanwhile, the precision we obtained using the modified UCF50 dataset (UCF50mini) and MOD20 dataset was 83.89% and 87.76%, respectively. Overall, our work demonstrates that the combination of 3DCNN and ConvLSTM layers can improve the accuracy of human activity recognition tasks, and our proposed model shows promise for real-time applications. Full article

17 pages, 33271 KiB  
Article
Impact of Packet Loss Rate on Quality of Compressed High Resolution Videos
by Juraj Bienik, Miroslav Uhrina, Lukas Sevcik and Anna Holesova
Sensors 2023, 23(5), 2744; https://doi.org/10.3390/s23052744 - 02 Mar 2023
Cited by 4 | Viewed by 1532
Abstract
Video delivered over IP networks in real-time applications such as videotelephony or live streaming, which use the RTP protocol over unreliable UDP, is often prone to degradation caused by multiple sources. The most significant is the combined effect of video compression and its transmission over the communication channel. This paper analyzes the adverse impact of packet loss on the quality of video encoded with various combinations of compression parameters and resolutions. For the purposes of the research, a dataset containing 11,200 full HD and ultra HD video sequences encoded to H.264 and H.265 formats at five bit rates was compiled with a simulated packet loss rate (PLR) ranging from 0 to 1%. Objective assessment was conducted by using peak signal-to-noise ratio (PSNR) and Structural Similarity Index (SSIM) metrics, whereas the well-known absolute category rating (ACR) was used for subjective evaluation. Analysis of the results confirmed the presumption that video quality decreases as the packet loss rate rises, regardless of compression parameters. The experiments further led to a finding that the quality of sequences affected by PLR declines with increasing bit rate. Additionally, the paper includes recommendations of compression parameters for use under various network conditions. Full article
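
For readers unfamiliar with the PSNR metric named in this abstract, the following minimal sketch computes it from the standard definition, PSNR = 10 log10(MAX^2 / MSE), for two equally sized sequences of 8-bit sample values. The function name and the toy pixel values are assumptions for illustration; the paper's own tooling is not described here.

```python
import math


def psnr(reference, degraded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the degraded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: four luma samples from a reference frame and a degraded frame.
reference = [52, 55, 61, 59]
degraded = [54, 55, 60, 57]
print(round(psnr(reference, degraded), 2))
```

In practice the metric is computed per frame over all pixels and then averaged over the sequence; higher values indicate less distortion.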

16 pages, 731 KiB  
Article
Enhancing the Generalization for Text Classification through Fusion of Backward Features
by Dewen Seng and Xin Wu
Sensors 2023, 23(3), 1287; https://doi.org/10.3390/s23031287 - 23 Jan 2023
Cited by 1 | Viewed by 1341
Abstract
Generalization has always been a keyword in deep learning. Pretrained models and domain adaptation technology have received widespread attention in solving the problem of generalization. They are all focused on finding features in data to improve the generalization ability and to prevent overfitting. Although they have achieved good results in various tasks, those models are unstable when classifying a sentence whose label is positive but still contains negative phrases. In this article, we analyzed the attention heat map of the benchmarks and found that previous models pay more attention to the phrase rather than to the semantic information of the whole sentence. Moreover, we proposed a method to scatter the attention away from opposite sentiment words to avoid a one-sided judgment. We designed a two-stream network and stacked the gradient reversal layer and feature projection layer within the auxiliary network. The gradient reversal layer can reverse the gradient of features in the training stage so that the parameters are optimized following the reversed gradient in the backpropagation stage. We utilized an auxiliary network to extract the backward features and then fed them into the main network to merge them with normal features extracted by the main network. We applied this method to the three baselines of TextCNN, BERT, and RoBERTa using sentiment analysis and sarcasm detection datasets. The results show that our method can improve the sentiment analysis datasets by 0.5% and the sarcasm detection datasets by 2.1%. Full article

15 pages, 576 KiB  
Article
Multi-Delay Identification of Rare Earth Extraction Process Based on Improved Time-Correlation Analysis
by Rongxiu Lu, Hongliang Liu, Hui Yang, Jianyong Zhu and Wenhao Dai
Sensors 2023, 23(3), 1102; https://doi.org/10.3390/s23031102 - 18 Jan 2023
Cited by 2 | Viewed by 933
Abstract
The rare earth extraction process has significant time delay characteristics, making it challenging to identify the time delay and establish an accurate mathematical model. This paper proposes a multi-delay identification method based on improved time-correlation analysis. Firstly, the data are preprocessed by grey relational analysis, and the time delay sequence and time-correlation data matrix are constructed. The time-correlation analysis matrix is defined, and the H norm quantifies the correlation degree of the data sequence. Thus the multi-delay identification problem is transformed into an integer optimization problem. Secondly, an improved discrete state transition algorithm is used for optimization to obtain the multiple delays. Finally, based on a neodymium (Nd) component content model constructed by a wavelet neural network, the performance of the proposed method is compared with the unimproved time delay identification method and the model without an identification method. The results show that the proposed algorithm improves optimization accuracy, convergence speed, and stability. The performance of the component content model after time delay identification is significantly improved using the proposed method, which verifies its effectiveness in the time delay identification of the rare earth extraction process. Full article

16 pages, 553 KiB  
Article
Towards Transfer Learning Techniques—BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study
by Rafael Silva Barbon and Ademar Takeo Akabane
Sensors 2022, 22(21), 8184; https://doi.org/10.3390/s22218184 - 26 Oct 2022
Cited by 8 | Viewed by 3047
Abstract
The Internet of Things is a paradigm that interconnects several smart devices through the internet to provide ubiquitous services to users. This paradigm and Web 2.0 platforms generate countless amounts of textual data. Thus, a significant challenge in this context is automatically performing text classification. State-of-the-art outcomes have recently been obtained by employing language models trained from scratch on corpora of online news to handle text classification better. One language model that we can highlight is BERT (Bidirectional Encoder Representations from Transformers); another is DistilBERT, a smaller pre-trained general-purpose language representation model. In this context, through a case study, we propose performing the text classification task with the two previously mentioned models for two languages (English and Brazilian Portuguese) on different datasets. The results show that DistilBERT’s training time for English and Brazilian Portuguese was about 45% faster than that of its larger counterpart; it was also 40% smaller and preserved about 96% of language comprehension skills for balanced datasets. Full article

17 pages, 1216 KiB  
Article
An Improved IoT-Based System for Detecting the Number of People and Their Distribution in a Classroom
by Slavomir Matuska, Juraj Machaj, Robert Hudec and Patrik Kamencay
Sensors 2022, 22(20), 7912; https://doi.org/10.3390/s22207912 - 18 Oct 2022
Cited by 5 | Viewed by 1877
Abstract
This paper presents an improved IoT-based system designed to help teachers handle lessons in the classroom in line with COVID-19 restrictions. The system counts the number of people in the classroom as well as their distribution within the classroom. The proposed IoT system consists of three parts: a Gate node, IoT nodes, and server. The Gate node, installed at the door, can provide information about the number of persons entering or leaving the room using door crossing detection. The Arduino-based module NodeMCU was used as an IoT node and sets of ultrasonic distance sensors were used to obtain information about seat occupancy. The system server runs locally on a Raspberry Pi and the teacher can connect to it using a web application from the computer in the classroom or a smartphone. The teacher is able to set up and change the settings of the system through its GUI. A simple algorithm was designed to check the distance between occupied seats and evaluate the accordance with imposed restrictions. This system can provide high privacy, unlike camera-based systems. Full article
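
The "simple algorithm" for checking the distance between occupied seats is not specified in this abstract. One plausible form, sketched below under stated assumptions, compares every pair of occupied seats against a minimum distance. The seat labels, coordinates in meters, and the 1.5 m threshold are all hypothetical.

```python
import math
from itertools import combinations


def find_violations(occupied_seats, min_distance_m):
    """Return pairs of occupied seats that are closer than min_distance_m.

    occupied_seats: dict mapping a seat label to its (x, y) position in meters.
    """
    violations = []
    # Sort by label so the output order is deterministic.
    for (seat_a, pos_a), (seat_b, pos_b) in combinations(
        sorted(occupied_seats.items()), 2
    ):
        if math.dist(pos_a, pos_b) < min_distance_m:
            violations.append((seat_a, seat_b))
    return violations


# Hypothetical occupancy reported by the seat sensors.
occupied = {"A1": (0.0, 0.0), "A2": (0.8, 0.0), "B3": (1.6, 1.2)}
violations = find_violations(occupied, 1.5)
print(violations)
```

A pairwise check is quadratic in the number of occupied seats, which is unproblematic at classroom scale.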

28 pages, 11112 KiB  
Article
Physical and Digital Infrastructure Readiness Index for Connected and Automated Vehicles
by Boris Cucor, Tibor Petrov, Patrik Kamencay, Ghadir Pourhashem and Milan Dado
Sensors 2022, 22(19), 7315; https://doi.org/10.3390/s22197315 - 27 Sep 2022
Cited by 6 | Viewed by 2042
Abstract
In this paper, we present an assessment framework that can be used to score segments of physical and digital infrastructure based on their features and readiness to expedite the deployment of Connected and Automated Vehicles (CAVs). We discuss the equipment and methodology applied for the collection and analysis of required data to score the infrastructure segments in an automated way. Moreover, we demonstrate how the proposed framework can be applied using data collected on a public transport route in the city of Zilina, Slovakia. We use two types of data to demonstrate the methodology of the assessment: connectivity and positioning data to assess the connectivity and localization performance provided by the infrastructure, and image data for road signage detection using a Convolutional Neural Network (CNN). The core of the research is a dataset that can be used for further research work. We collected and analyzed data in two settings: an urban and a suburban area. Although the connectivity and positioning data were collected on different days and at different times, we found highly underserved areas along the investigated route. The main communication problem in the investigated area is latency, which is associated mainly with infrastructure segments located at intersections with heavy traffic or near various points of interest. Low localization accuracy was observed mainly in dense areas with large buildings and trees, which decrease the number of visible localization satellites. To address the problem of automated assessment of the traffic sign recognition precision, we proposed a CNN that achieved 99.7% precision. Full article

17 pages, 3299 KiB  
Article
Machine Learning and Lexicon Approach to Texts Processing in the Detection of Degrees of Toxicity in Online Discussions
by Kristína Machová, Marián Mach and Kamil Adamišín
Sensors 2022, 22(17), 6468; https://doi.org/10.3390/s22176468 - 27 Aug 2022
Cited by 6 | Viewed by 2090
Abstract
This article focuses on the problem of detecting toxicity in online discussions. Toxicity is currently a serious problem, as people are largely influenced by opinions on social networks. We offer a solution based on classification models using machine learning methods to classify short texts on social networks into multiple degrees of toxicity. The classification models used both classic methods of machine learning, such as naïve Bayes and SVM (support vector machine), as well as ensemble methods, such as bagging and RF (random forest). The models were created using text data, which we extracted from social networks in the Slovak language. The labelling of our dataset of short texts into multiple classes (the degrees of toxicity) was provided automatically by our method based on the lexicon approach to text processing. This lexicon method required creating a dictionary of toxic words in the Slovak language, which is another contribution of the work. Finally, an application was created based on the learned machine learning models, which can be used to detect the degree of toxicity of new social network comments as well as for experimentation with various machine learning methods. We achieved the best results using an SVM, with an average value of accuracy = 0.89 and F1 = 0.79. This model also outperformed the ensemble learning by the RF and bagging methods; however, the ensemble learning methods achieved better results than the naïve Bayes method. Full article
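
A lexicon approach to labelling, as described in this abstract, can be sketched as a dictionary lookup followed by thresholding. The example below is only an illustration: the lexicon entries are English placeholders rather than the authors' Slovak dictionary, and the weights, thresholds, and three-level scale are assumptions.

```python
import re

# Hypothetical lexicon: word -> toxicity weight (placeholder entries only).
TOXIC_LEXICON = {"idiot": 2, "stupid": 1, "hate": 1}


def toxicity_degree(text, lexicon=TOXIC_LEXICON, thresholds=(1, 3)):
    """Map a short text to a toxicity degree: 0 (none), 1 (mild), 2 (strong).

    The score is the summed weight of lexicon words found in the text;
    the degree is the number of thresholds the score reaches.
    """
    tokens = re.findall(r"\w+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    return sum(score >= t for t in thresholds)


for comment in ["Nice talk, thanks", "I hate this stupid idea", "You stupid idiot"]:
    print(comment, "->", toxicity_degree(comment))
```

Labels produced this way can then serve as training targets for classifiers such as the SVM or random forest models named in the abstract.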
