Search Results (32)

Search Parameters:
Keywords = Image2Vec

19 pages, 7222 KB  
Article
Multi-Channel Spectro-Temporal Representations for Speech-Based Parkinson’s Disease Detection
by Hadi Sedigh Malekroodi, Nuwan Madusanka, Byeong-il Lee and Myunggi Yi
J. Imaging 2025, 11(10), 341; https://doi.org/10.3390/jimaging11100341 - 1 Oct 2025
Viewed by 198
Abstract
Early, non-invasive detection of Parkinson’s Disease (PD) using speech analysis offers promise for scalable screening. In this work, we propose a multi-channel spectro-temporal deep-learning approach for PD detection from sentence-level speech, a clinically relevant yet underexplored modality. We extract and fuse three complementary time–frequency representations—mel spectrogram, constant-Q transform (CQT), and gammatone spectrogram—into a three-channel input analogous to an RGB image. This fused representation is evaluated across CNNs (ResNet, DenseNet, and EfficientNet) and a Vision Transformer on the PC-GITA dataset, under 10-fold subject-independent cross-validation for robust assessment. Results show that fusion consistently improves performance over single representations across architectures. EfficientNet-B2 achieves the highest accuracy (84.39% ± 5.19%) and F1-score (84.35% ± 5.52%), outperforming recent methods using handcrafted features or pretrained models (e.g., Wav2Vec2.0, HuBERT) on the same task and dataset. Performance varies with sentence type: emotionally salient and prosodically emphasized utterances yield higher AUC, suggesting that richer prosody enhances discriminability. Our findings indicate that multi-channel fusion enhances sensitivity to subtle speech impairments in PD by integrating complementary spectral information, and suggest that fused inputs could strengthen the detection of discriminative acoustic biomarkers, potentially offering a more robust framework for speech-based PD screening, though further validation is needed before clinical application.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
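As a rough illustration of the fused input described in the abstract, the sketch below stacks mel, CQT, and a third filter-bank spectrogram into a three-channel array. It is a hedged reconstruction, not the authors' code; librosa has no gammatone front end, so that channel is a stand-in, and all parameters are assumptions.

```python
# Hedged sketch (not the authors' implementation): build an RGB-like
# three-channel time-frequency input from one utterance.
import numpy as np
import librosa

def fused_input(wav_path, sr=16000, n_bins=84, n_frames=256):
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins))
    cqt = librosa.amplitude_to_db(
        np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins)))
    # Stand-in for the gammatone spectrogram: a dedicated gammatone
    # package would be used in practice; an HTK-style mel bank differs.
    gam = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins, htk=True))

    def prep(S):  # min-max normalize, then sample a fixed number of frames
        S = (S - S.min()) / (S.max() - S.min() + 1e-8)
        idx = np.linspace(0, S.shape[1] - 1, n_frames).astype(int)
        return S[:, idx]

    return np.stack([prep(mel), prep(cqt), prep(gam)])  # shape (3, 84, 256)
```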

18 pages, 16540 KB  
Article
E-CMCA and LSTM-Enhanced Framework for Cross-Modal MRI-TRUS Registration in Prostate Cancer
by Ciliang Shao, Ruijin Xue and Lixu Gu
J. Imaging 2025, 11(9), 292; https://doi.org/10.3390/jimaging11090292 - 27 Aug 2025
Viewed by 516
Abstract
Accurate registration of MRI and TRUS images is crucial for effective prostate cancer diagnosis and biopsy guidance, yet modality differences and non-rigid deformations pose significant challenges, especially in dynamic imaging. This study presents a novel cross-modal MRI-TRUS registration framework, leveraging a dual-encoder architecture with an Enhanced Cross-Modal Channel Attention (E-CMCA) module and an LSTM-Based Spatial Deformation Modeling Module. The E-CMCA module efficiently extracts and integrates multi-scale cross-modal features, while the LSTM-Based Spatial Deformation Modeling Module captures temporal dynamics by processing depth-sliced 3D deformation fields as sequential data. A VecInt operation ensures smooth, diffeomorphic transformations, and a FuseConv layer enhances feature integration for precise alignment. Experiments on the μ-RegPro dataset from the MICCAI 2023 Challenge demonstrate that our model achieves a DSC of 0.865, an RDSC of 0.898, a TRE of 2.278 mm, and an RTRE of 1.293, surpassing state-of-the-art methods and performing robustly in both static 3D and dynamic 4D registration tasks.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
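The VecInt operation named in the abstract is commonly implemented (e.g., in VoxelMorph) by scaling and squaring of a stationary velocity field. The 2D sketch below illustrates that idea under this assumption; it is not the paper's implementation, and the channel ordering is assumed.

```python
# Hedged 2D sketch of a VecInt-style layer: integrate a stationary
# velocity field by scaling and squaring to get a smooth displacement.
import torch
import torch.nn.functional as F

def vecint(velocity: torch.Tensor, steps: int = 7) -> torch.Tensor:
    """velocity: (B, 2, H, W); channels assumed (x, y), in pixels."""
    disp = velocity / (2 ** steps)  # scale down so each step is small
    B, _, H, W = disp.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack((xs, ys), -1).unsqueeze(0).expand(B, H, W, 2)
    scale = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)])
    for _ in range(steps):  # repeated composition: phi <- phi o phi
        offset = disp.permute(0, 2, 3, 1) * scale  # pixels -> [-1, 1] units
        disp = disp + F.grid_sample(disp, grid + offset,
                                    align_corners=True,
                                    padding_mode="border")
    return disp  # final displacement field, (B, 2, H, W)
```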

28 pages, 24868 KB  
Article
Deep Meta-Connectivity Representation for Optically-Active Water Quality Parameters Estimation Through Remote Sensing
by Fangling Pu, Ziang Luo, Yiming Yang, Hongjia Chen, Yue Dai and Xin Xu
Remote Sens. 2025, 17(16), 2782; https://doi.org/10.3390/rs17162782 - 11 Aug 2025
Viewed by 412
Abstract
Monitoring optically-active water quality (OAWQ) parameters faces key challenges, primarily due to limited in situ measurements and the restricted availability of high-resolution multispectral remote sensing imagery. While deep learning has shown promise for OAWQ estimation, existing approaches such as GeoTile2Vec, which relies on geographic proximity, and SimCLR, a domain-agnostic contrastive learning method, fail to capture land cover-driven water quality patterns, limiting their generalizability. To address this, we present deep meta-connectivity representation (DMCR), which integrates multispectral remote sensing imagery with limited in situ measurements to estimate OAWQ parameters. Our approach constructs meta-feature vectors from land cover images to represent the water quality characteristics of each multispectral remote sensing image tile. We introduce the meta-connectivity concept to quantify the OAWQ similarity between different tiles. Building on this concept, we design a contrastive self-supervised learning framework that uses sets of quadruple tiles extracted from Sentinel-2 imagery based on their meta-connectivity to learn DMCR vectors. After the core neural network is trained, we apply a random forest model to estimate parameters such as chlorophyll-a (Chl-a) and turbidity using matched in situ measurements and DMCR vectors across time and space. We evaluate DMCR on Lake Erie and Lake Ontario, generating a series of Chl-a and turbidity distribution maps. Performance is assessed using the R² and RMSE metrics. Results show that meta-connectivity more effectively captures water quality similarities between tiles than widely utilized geographic proximity approaches such as those used in GeoTile2Vec. Furthermore, DMCR outperforms baseline models such as SimCLR with randomly cropped tiles. The resulting distribution maps align well with known factors influencing Chl-a and turbidity levels, confirming the method’s reliability. Overall, DMCR demonstrates strong potential for large-scale OAWQ estimation and contributes to improved monitoring of inland water bodies with limited in situ measurements through meta-connectivity-informed deep learning. The resulting spatio-temporal water quality maps can support large-scale inland water monitoring and early warning of harmful algal blooms.
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)
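One plausible, minimal reading of the meta-feature and meta-connectivity ideas above: summarize each tile's land cover as a class histogram and score tile similarity by the cosine of those histograms. Function names and the similarity choice are assumptions, not the paper's exact definition.

```python
# Hedged sketch: land-cover histogram as a "meta-feature" and cosine
# similarity as a stand-in for meta-connectivity between two tiles.
import numpy as np

def meta_feature(landcover_tile: np.ndarray, n_classes: int) -> np.ndarray:
    """landcover_tile: 2D array of integer class labels for one tile."""
    hist = np.bincount(landcover_tile.ravel(), minlength=n_classes)
    return hist / max(hist.sum(), 1)  # normalized class frequencies

def meta_connectivity(tile_a, tile_b, n_classes=10) -> float:
    fa = meta_feature(tile_a, n_classes)
    fb = meta_feature(tile_b, n_classes)
    denom = np.linalg.norm(fa) * np.linalg.norm(fb)
    return float(fa @ fb / denom) if denom else 0.0
```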

14 pages, 2138 KB  
Article
Comparison Between Bond Strengths of a Resin Cement on Traditional Prosthetic Substrates and a 3D-Printed Resin for Permanent Restorations
by Alessandro Vichi, Hanan Al-Johani, Dario Balestra and Chris Louca
Coatings 2025, 15(8), 896; https://doi.org/10.3390/coatings15080896 - 1 Aug 2025
Viewed by 891
Abstract
Recently, 3D-printed resins have been introduced as materials for definitive indirect restorations. Herein, a comparative assessment of the bond strengths of 3D-printed resins to a resin cement was performed. Methods: Four definitive restorative materials were selected, i.e., a feldspathic ceramic (VITA Mark II, VM), a polymer-infiltrated ceramic network (VITA Enamic, VE), a nanohybrid resin composite (Grandio Bloc, GB), and one 3D-printed resin (Crown Permanent, CP). VM and VE were etched and silanized, GB was sandblasted, and CP was glass bead blasted; for one further experimental group, this was followed by sandblasting (CPs). A resin cement (RelyX Unicem) was then used for bonding, and a notched shear bond strength test (nSBS) was performed. Failure modes were observed and classified as adhesive, cohesive, or mixed, and representative SEM images were taken. Data were statistically analyzed with one-way ANOVA, Tukey, and Chi-square tests. Significant differences in nSBS were detected among materials (p < 0.001). The highest nSBS was found for VM (30.3 ± 1.8 MPa; group a), followed by CP (group b), GB (group bc), CPs (group bc), and VE (group c), where shared letters indicate statistically homogeneous groups. Failure modes also differed significantly (p < 0.001), with different prevalent failure modes across materials. The bond strength of the 3D-printed permanent resin was shown to be lower than that of the feldspathic ceramic but comparable to that of the resin composite block and PICN substrates.
(This article belongs to the Special Issue Advanced Polymer Coatings: Materials, Methods, and Applications)
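The statistical pipeline reported above (one-way ANOVA followed by Tukey's post hoc grouping) can be sketched as follows; the sample values are placeholders, not the study's measurements.

```python
# Hedged sketch of the reported statistics: one-way ANOVA across material
# groups, then Tukey's HSD post hoc test (tukey_hsd needs a recent SciPy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = {  # hypothetical nSBS samples (MPa) per substrate
    "VM": rng.normal(30.3, 1.8, 10),
    "VE": rng.normal(22.0, 2.0, 10),
    "GB": rng.normal(24.0, 2.0, 10),
    "CP": rng.normal(25.0, 2.0, 10),
}
f_stat, p_val = stats.f_oneway(*groups.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_val:.4f}")
print(stats.tukey_hsd(*groups.values()))  # pairwise group comparisons
```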

7 pages, 11708 KB  
Proceeding Paper
Urban Functional Zone Mapping by Integrating Multi-Source Data and Spatial Relationship Characteristics
by Daoyou Zhu, Xu Dang, Wenjia Shi, Yixiang Chen and Wenmei Li
Proceedings 2024, 110(1), 17; https://doi.org/10.3390/proceedings2024110017 - 4 Dec 2024
Cited by 1 | Viewed by 1090
Abstract
Timely and precise acquisition of urban functional zone (UFZ) information is crucial for effective urban planning, management, and resource allocation. However, current UFZ mapping approaches primarily focus on individual functional units’ visual and semantic characteristics, often overlooking the crucial spatial relationships between them, resulting in classification inaccuracies. To address this limitation, our study presents a novel framework for UFZ classification that integrates visual image features, Points of Interest (POI) semantic attributes, and spatial relationship information. This framework leverages the OpenStreetMap (OSM) road network to partition the study area into functional units, employs a graph model to represent urban functional nodes and their spatial topological relationships, and uses a Graph Convolutional Network (GCN) to fuse these multi-dimensional features through end-to-end learning for accurate urban function discrimination. Experimental evaluations utilizing Gaofen-2 (GF-2) satellite imagery, POI data, and OSM road network information from Shenzhen, China, showed significant improvements in classification accuracy across all functional categories over approaches that rely solely on visual or semantic features. Notably, the overall classification accuracy reached 87.92%, a 2.08% increase over methods that disregard spatial relationship features. Furthermore, our method outperformed similar techniques, underscoring its effectiveness and potential for widespread application in UFZ classification.
(This article belongs to the Proceedings of The 31st International Conference on Geoinformatics)
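To make the graph-fusion step concrete, the sketch below shows a single generic GCN propagation over functional-unit nodes with symmetric normalization; the shapes and toy adjacency are assumptions, not the paper's network.

```python
# Hedged sketch: one GCN propagation step over functional-unit nodes.
# x: (N, F) per-unit fused features (visual + POI); adj: (N, N) 0/1 matrix.
import torch

def gcn_layer(x: torch.Tensor, adj: torch.Tensor,
              weight: torch.Tensor) -> torch.Tensor:
    a_hat = adj + torch.eye(adj.size(0))      # add self-loops
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))    # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return torch.relu(a_norm @ x @ weight)    # propagate, then transform

# Usage with hypothetical sizes: 100 units, 64-d features, 16-d output.
x = torch.randn(100, 64)
adj = (torch.rand(100, 100) < 0.05).float()
adj = ((adj + adj.t()) > 0).float()           # symmetrize the toy graph
out = gcn_layer(x, adj, torch.randn(64, 16))
```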

31 pages, 4733 KB  
Article
Enhanced Network Intrusion Detection System for Internet of Things Security Using Multimodal Big Data Representation with Transfer Learning and Game Theory
by Farhan Ullah, Ali Turab, Shamsher Ullah, Diletta Cacciagrano and Yue Zhao
Sensors 2024, 24(13), 4152; https://doi.org/10.3390/s24134152 - 26 Jun 2024
Cited by 22 | Viewed by 7099
Abstract
Internet of Things (IoT) applications and resources are highly vulnerable to flood attacks, including Distributed Denial of Service (DDoS) attacks. These attacks overwhelm the targeted device with numerous network packets, making its resources inaccessible to authorized users. Such attacks may comprise attack references, attack types, sub-categories, host information, malicious scripts, etc. These details assist security professionals in identifying weaknesses, tailoring defense measures, and responding rapidly to possible threats, thereby improving the overall security posture of IoT devices. Developing an intelligent Intrusion Detection System (IDS) is highly complex due to the large number of network features involved. This study presents an improved IDS for IoT security that employs multimodal big data representation and transfer learning. First, the Packet Capture (PCAP) files are crawled to retrieve the necessary attacks and bytes. Second, Spark-based big data optimization algorithms handle the huge volumes of data. Third, a transfer learning approach such as word2vec retrieves semantically informed features from the observed data. Fourth, an algorithm converts network bytes into images, and texture features are extracted by configuring an attention-based Residual Network (ResNet). Finally, the trained text and texture features are combined and used as multimodal features to classify various attacks. The proposed method is thoroughly evaluated on three widely used IoT-based datasets: CIC-IoT 2022, CIC-IoT 2023, and Edge-IIoT, and achieves excellent classification performance, with an accuracy of 98.2%. In addition, we present a game theory-based process to validate the proposed approach formally.
(This article belongs to the Section Internet of Things)
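The byte-to-image step lends itself to a short sketch: raw packet bytes are reshaped into a fixed-size grayscale image that a ResNet can consume. The image side and zero-padding choice below are assumptions.

```python
# Hedged sketch: render raw network bytes as a square grayscale image.
import numpy as np

def bytes_to_image(payload: bytes, side: int = 64) -> np.ndarray:
    buf = np.frombuffer(payload, dtype=np.uint8)[: side * side]
    buf = np.pad(buf, (0, side * side - buf.size))  # zero-pad short flows
    return buf.reshape(side, side)                  # (64, 64) uint8 image

img = bytes_to_image(b"\x16\x03\x01" * 2000)        # toy TLS-like payload
```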

20 pages, 4414 KB  
Article
Uncertainty in Visual Generative AI
by Kara Combs, Adam Moyer and Trevor J. Bihl
Algorithms 2024, 17(4), 136; https://doi.org/10.3390/a17040136 - 27 Mar 2024
Cited by 6 | Viewed by 4269
Abstract
Recently, generative artificial intelligence (GAI) has impressed the world with its ability to create text, images, and videos. However, there are still areas in which GAI produces undesirable or unintended results due to being “uncertain”. Before wider use of AI-generated content, it is important to identify concepts where GAI is uncertain, to ensure the usage thereof is ethical and to direct efforts for improvement. This study proposes a general pipeline to automatically quantify uncertainty within GAI. To measure uncertainty, the textual prompt to a text-to-image model is compared to captions supplied by four image-to-text models (GIT, BLIP, BLIP-2, and InstructBLIP). The evaluation is based on machine translation metrics (BLEU, ROUGE, METEOR, and SPICE) and on the cosine similarity of text embeddings (Word2Vec, GloVe, FastText, DistilRoBERTa, MiniLM-6, and MiniLM-12). The generative AI models performed consistently across the metrics; however, the vector-space models yielded the highest average similarity, close to 80%, which suggests more ideal and “certain” results. Suggested future work includes identifying metrics that best align with a human baseline to ensure quality, and considering more GAI models. This work can be used to automatically identify concepts in which GAI is “uncertain” to drive research aimed at increasing confidence in these areas.
(This article belongs to the Special Issue Artificial Intelligence in Modeling and Simulation)
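A minimal sketch of the embedding-based comparison described above: average pretrained word vectors for the prompt and for a generated caption, then take their cosine similarity. The GloVe model name is one of several provided by gensim's downloader and is an assumption here, not the study's exact setup.

```python
# Hedged sketch: prompt-vs-caption similarity via averaged word vectors.
import numpy as np
import gensim.downloader as api

vecs = api.load("glove-wiki-gigaword-100")  # pretrained KeyedVectors

def avg_vector(text: str) -> np.ndarray:
    words = [w for w in text.lower().split() if w in vecs]
    return np.mean([vecs[w] for w in words], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(avg_vector("a red car on a street"),
             avg_vector("a crimson automobile parked on a road")))
```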

22 pages, 5939 KB  
Article
Risevi: A Disease Risk Prediction Model Based on Vision Transformer Applied to Nursing Homes
by Feng Zhou, Shijing Hu, Xiaoli Wan, Zhihui Lu and Jie Wu
Electronics 2023, 12(15), 3206; https://doi.org/10.3390/electronics12153206 - 25 Jul 2023
Cited by 8 | Viewed by 2093
Abstract
The intensification of population aging has put pressure on public medical care. To help reduce this pressure, we combined image classification methods from computer vision with audio data that are easy to collect in nursing homes. Based on MelGAN, transfer learning, and the Vision Transformer, we propose Risevi, a disease risk prediction model for nursing homes. We first design a sample generation method based on MelGAN, then draw on Mel frequency cepstral coefficients and the Wav2vec2 model to design the sample feature extraction method, performing floating-point operations on the tensor of extracted features and converting it into a waveform. We then design a sample feature classification method based on transfer learning and the Vision Transformer, yielding the Risevi model. In this paper, we use public datasets and subject data as sample data. The experimental results show that the Risevi model achieves an accuracy of 98.5%, a precision of 96.38%, a recall of 98.17%, and an F1 score of 97.15%, indicating that it can provide practical support for reducing public medical pressure.
(This article belongs to the Topic Computer Vision and Image Processing)
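The feature-extraction step referencing Mel frequency cepstral coefficients can be sketched generically with librosa; the parameters below are assumptions, not the Risevi configuration.

```python
# Hedged sketch: MFCC-style features from a nursing-home audio clip,
# one ingredient of the pipeline described above.
import librosa

def mfcc_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13):
    y, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (13, T)
```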

24 pages, 7714 KB  
Article
Vector Road Map Updating from High-Resolution Remote-Sensing Images with the Guidance of Road Intersection Change Detection and Directed Road Tracing
by Haigang Sui, Ning Zhou, Mingting Zhou and Liang Ge
Remote Sens. 2023, 15(7), 1840; https://doi.org/10.3390/rs15071840 - 30 Mar 2023
Cited by 5 | Viewed by 4357
Abstract
Updating vector road maps from current remote-sensing images provides fundamental data for applications such as smart transportation and autonomous driving. Updating historical road vector maps involves verifying unchanged roads, extracting newly built roads, and removing disappeared roads. Prior work extracted roads from a current remote-sensing image to build a new road vector map, yielding inaccurate results and redundant processing procedures. In this paper, we argue that changes in roads are closely related to changes in road intersections. Hence, a novel changed-road-intersection-guided vector road map updating framework (VecRoadUpd) is proposed to update road vector maps with high efficiency and accuracy. Road-intersection changes include the detection of newly built or disappeared road junctions and the discovery of road branch changes at each road junction. A CNN-based intersection-detection network (CINet) is adopted to extract road intersections from a current image and an old road vector map to discover newly built or disappeared road junctions. A road branch detection network (RoadBranchNet) is used to detect the direction of road branches for each road junction to find road branch changes. Based on the discovery of direction-changed road branches, the VecRoadUpd framework extracts newly built roads and removes disappeared roads through directed road tracing, thus updating the whole road vector map. Extensive experiments on the public MUNO21 dataset demonstrate that the proposed VecRoadUpd framework exceeds the comparative methods by 11.01% in pixel-level Qual-improvement and 13.85% in graph-level F1-score.
(This article belongs to the Section Urban Remote Sensing)
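A minimal sketch of the junction-change idea above: match detected intersections against those in the old vector map by distance, and flag unmatched points as newly built or disappeared. The coordinate format and tolerance are assumptions, not the paper's matching rule.

```python
# Hedged sketch: flag newly built / disappeared road junctions by
# nearest-neighbor matching between old-map and detected intersections.
import numpy as np

def match_intersections(old_pts: np.ndarray, new_pts: np.ndarray,
                        tol: float = 10.0):
    """old_pts, new_pts: (N, 2) arrays of junction coordinates (pixels)."""
    def unmatched(a, b):
        if len(a) == 0:
            return []
        if len(b) == 0:
            return list(range(len(a)))
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return [i for i in range(len(a)) if d[i].min() > tol]

    disappeared = unmatched(old_pts, new_pts)  # in old map, not detected
    newly_built = unmatched(new_pts, old_pts)  # detected, not in old map
    return newly_built, disappeared
```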

26 pages, 2453 KB  
Article
AQSA: Aspect-Based Quality Sentiment Analysis for Multi-Labeling with Improved ResNet Hybrid Algorithm
by Muhammad Irfan, Nasir Ayub, Qazi Arbab Ahmed, Saifur Rahman, Muhammad Salman Bashir, Grzegorz Nowakowski, Samar M. Alqhtani and Marek Sieja
Electronics 2023, 12(6), 1298; https://doi.org/10.3390/electronics12061298 - 8 Mar 2023
Cited by 10 | Viewed by 3498
Abstract
Sentiment analysis (SA) is an area of study currently being investigated in text mining. SA is the computational handling of a text’s views, emotions, subjectivity, and subjective nature. Researchers realized that generating generic sentiment from textual material was inadequate, so SA was developed to extract specific expressions from textual information. The problem of extracting emotional aspects through multi-labeling based on aspect-specific data can thereby be addressed. This article proposes the swarm-based hybrid model of residual networks with sand cat swarm optimization (ResNet-SCSO), a novel method for increasing the precision and variation of learning text with the multi-labeling method. Contrary to existing multi-label training approaches, ResNet-SCSO emphasizes the diversity and accuracy of multi-labeling-based methodologies. Five distinct datasets were analyzed (movies, research articles, medical, birds, and proteins). To achieve accurate and improved data, we initially applied preprocessing. Secondly, we used GloVe and TF-IDF to extract features. Thirdly, a word association is created using the word2vec method. Additionally, the enhanced data are utilized for training and validating the ResNet model (tuned with SCSO). We tested the accuracy of ResNet-SCSO on the research article, medical, birds, movie, and protein datasets using the aspect-based multi-labeling method. The accuracy was 95%, 96%, 97%, 92%, and 96%, respectively. With multi-label datasets of varying dimensions, our proposed model shows that ResNet-SCSO is significantly better than other commonly used techniques. Experimental findings confirm the implemented strategy’s success compared to existing benchmark methods.
(This article belongs to the Special Issue Artificial Intelligence Technologies and Applications)
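The TF-IDF feature-extraction step mentioned above can be sketched with scikit-learn; the corpus and parameters below are placeholders, not the study's data.

```python
# Hedged sketch: TF-IDF features over a toy corpus mimicking the
# multi-domain datasets (movies, articles, medical, birds, proteins).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the film was moving and well acted",
        "the protein structure paper is dense",
        "bird songs vary across regions"]
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(docs)  # sparse matrix (n_docs, n_terms)
print(X.shape, vectorizer.get_feature_names_out()[:5])
```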

12 pages, 7154 KB  
Article
Vertebral Endplate Concavity in Lateral Lumbar Interbody Fusion: Tapered 3D-Printed Porous Titanium Cage versus Squared PEEK Cage
by Naoki Segi, Hiroaki Nakashima, Ryuichi Shinjo, Yujiro Kagami, Masaaki Machino, Sadayuki Ito, Jun Ouchida, Kazuaki Morishita, Ryotaro Oishi, Ippei Yamauchi and Shiro Imagama
Medicina 2023, 59(2), 372; https://doi.org/10.3390/medicina59020372 - 15 Feb 2023
Cited by 7 | Viewed by 3240
Abstract
Background and Objectives: To prevent postoperative problems in extreme lateral interbody fusion (XLIF), it is critical that the vertebral endplate not be injured. Unintentional endplate injuries may depend on the cage. A novel porous titanium cage for XLIF has improved geometry with a tapered tip and smooth surface. We hypothesized that this new cage should lead to fewer endplate injuries. Materials and Methods: This retrospective study included 32 patients (mean age 74.1 ± 6.7 years; 22 female) who underwent combined anterior and posterior surgery with XLIF for lumbar degenerative disease or adult spinal deformity from January 2018 to June 2022. A tapered 3D-printed porous titanium cage (3DTi; 11 patients) and a squared PEEK cage (sPEEK; 21 patients) were used. Spinal alignment values were measured on X-ray images. Vertebral endplate concavity (VEC) was defined as concavity ≥ 1 mm of the endplate on computed tomography (CT) images, which were evaluated preoperatively and at 1 week and 3 months postoperatively. Results: There were no significant differences in patient demographic data or in preoperative and 3-month postoperative spinal alignment between the groups. A 3DTi was used for 25 levels and an sPEEK for 38 levels. Preoperative local lordotic angles were 4.3° for 3DTi vs. 4.7° for sPEEK (p = 0.90), corrected to 12.3° and 9.1° (p = 0.029), respectively. At 3 months postoperatively, the angles were 11.6° for 3DTi and 8.2° for sPEEK (p = 0.013). VEC was present in 2 levels (8.0%) for 3DTi vs. 17 levels (45%) for sPEEK (p = 0.002). By 3 months postoperatively, no 3DTi level showed VEC progression, whereas eight sPEEK levels (21%) did (p = 0.019). Conclusions: The novel 3DTi cage reduced endplate injuries by lowering the endplate load during cage insertion.

26 pages, 3801 KB  
Article
Geo-Spatial Mapping of Hate Speech Prediction in Roman Urdu
by Samia Aziz, Muhammad Shahzad Sarfraz, Muhammad Usman, Muhammad Umar Aftab and Hafiz Tayyab Rauf
Mathematics 2023, 11(4), 969; https://doi.org/10.3390/math11040969 - 14 Feb 2023
Cited by 12 | Viewed by 5197 | Correction
Abstract
Social media has transformed into a crucial channel for political expression. Twitter, especially, is a vital platform used to exchange political hate in Pakistan. Political hate speech affects the public image of politicians, targets their supporters, and hurts public sentiments. Hate speech is a controversial public speech that promotes violence toward a person or group based on specific characteristics. Although studies have been conducted to identify hate speech in European languages, Roman-script languages such as Roman Urdu have yet to receive much attention. In this research work, we present the automatic detection of political hate speech in Roman Urdu. An exclusive labeled political hate speech dataset (RU-PHS) containing 5002 instances with city-level information has been developed. To overcome the vast lexical structure of Roman Urdu, we propose an algorithm for the lexical unification of Roman Urdu. Three vectorization techniques are applied: TF-IDF, word2vec, and fastText. A comparative analysis of the accuracy and time complexity of conventional machine learning models and fine-tuned neural networks using dense word representations is presented for classifying and predicting political hate speech. The results show that a random forest and the proposed feed-forward neural network achieve an accuracy of 93% using fastText word embeddings to distinguish between neutral and politically offensive speech. The statistical information helps identify trends and patterns, and the hotspot and cluster analysis pinpoint Punjab as the area of Pakistan most susceptible to the generation of political hate tweets.
(This article belongs to the Special Issue New Insights in Machine Learning and Deep Neural Networks)
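A minimal sketch of the fastText vectorization step described above, using gensim on toy Roman Urdu tokens; the training corpus and hyperparameters are placeholders, not the study's settings.

```python
# Hedged sketch: train fastText embeddings on tokenized tweets, then
# average per-token vectors into one dense tweet representation.
import numpy as np
from gensim.models import FastText

tweets = [["siyasat", "buri", "hai"],
          ["awam", "ke", "liye", "kaam", "karo"]]
model = FastText(sentences=tweets, vector_size=100, window=3,
                 min_count=1, epochs=10)

def tweet_vector(tokens):  # dense feature fed to the classifier
    return np.mean([model.wv[t] for t in tokens], axis=0)

print(tweet_vector(["siyasat", "kaam"]).shape)  # (100,)
```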

13 pages, 3147 KB  
Article
Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms
by Shihan Ma and Jidong J. Yang
Eng 2023, 4(1), 444-456; https://doi.org/10.3390/eng4010027 - 1 Feb 2023
Cited by 6 | Viewed by 2421
Abstract
This paper introduces a novel approach that leverages features learned from both supervised and self-supervised paradigms to improve image classification tasks, specifically vehicle classification. Two state-of-the-art self-supervised learning methods, DINO and data2vec, were evaluated and compared for their representation learning of vehicle images. The former contrasts local and global views, while the latter uses masked prediction on multiple layered representations. In addition, supervised learning is employed to finetune a pretrained YOLOR object detector for detecting vehicle wheels, from which definitive wheel positional features are retrieved. The representations learned from the self-supervised methods were combined with the wheel positional features for the vehicle classification task. In particular, a random wheel-masking strategy was utilized to finetune the previously learned representations in harmony with the wheel positional features during classifier training. Our experiments show that the data2vec-distilled representations, which are consistent with our wheel-masking strategy, outperformed the DINO counterpart, yielding a Top-1 classification accuracy of 97.2% for the 13 vehicle classes defined by the Federal Highway Administration.
(This article belongs to the Special Issue Feature Papers in Eng 2022)
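One way to read the random wheel-masking strategy is to zero out detected wheel boxes with some probability during training; the sketch below assumes pixel-coordinate boxes and a masking probability of 0.5, neither of which is specified here.

```python
# Hedged sketch: randomly hide detected wheel regions in an image.
import numpy as np

def random_wheel_mask(image: np.ndarray, wheel_boxes, p: float = 0.5,
                      rng=np.random.default_rng()):
    """image: (H, W, 3); wheel_boxes: list of (x1, y1, x2, y2) pixels."""
    out = image.copy()
    for x1, y1, x2, y2 in wheel_boxes:
        if rng.random() < p:
            out[y1:y2, x1:x2] = 0  # mask this wheel with zeros
    return out
```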

18 pages, 3434 KB  
Article
Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images
by Agus Nursikuwagus, Rinaldi Munir and Masayu Leylia Khodra
J. Imaging 2022, 8(11), 294; https://doi.org/10.3390/jimaging8110294 - 22 Oct 2022
Cited by 5 | Viewed by 4154
Abstract
Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: the background object and the foreground object. In contrast to previous image-captioning research, generating captions for geological images of rocks focuses more on the background of the images. This study proposes image captioning using a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec to generate words from the image. The proposed model combines a CNN, LSTM, and word2vec and gives a dense output of 256 units. To make the output properly grammatical, the sequence of predicted words is reconstructed into a sentence by the beam search algorithm with K = 3. The pretrained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with N-gram BLEU scores. The BLEU-1 scores achieved with these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively; BLEU-2 scores were 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578; BLEU-3 scores were 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307; and BLEU-4 scores were 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537. Our CNN-C model outperformed the other models, especially the baseline model. Several challenges remain for future caption studies, such as geological sentence structure, geological sentence phrases, and constructing words with a geological tagger.
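The beam search with K = 3 used to assemble sentences can be sketched generically; step_fn, which stands in for the CNN-LSTM decoder's next-word log-probabilities, is an assumption, not the paper's interface.

```python
# Hedged sketch of beam search (K = 3) over a next-word model.
import heapq

def beam_search(step_fn, start, end, k=3, max_len=20):
    beams = [(0.0, [start])]                 # (log-prob, word sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end:               # finished beams carry over
                candidates.append((score, seq))
                continue
            for word, logp in step_fn(seq):  # expand with next words
                candidates.append((score + logp, seq + [word]))
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])[1] # best-scoring sequence
```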

17 pages, 4611 KB  
Article
The Application of Artificial Intelligence to Automate Sensory Assessments Combining Pretrained Transformers with Word Embedding Based on the Online Sensory Marketing Index
by Kevin Hamacher and Rüdiger Buchkremer
Computers 2022, 11(9), 129; https://doi.org/10.3390/computers11090129 - 26 Aug 2022
Cited by 2 | Viewed by 5019
Abstract
We present how artificial intelligence (AI)-based technologies create new opportunities to capture and assess sensory marketing elements. Based on the Online Sensory Marketing Index (OSMI), a sensory assessment framework designed to evaluate e-commerce websites manually, the goal is to offer an alternative procedure to assess sensory elements such as text and images automatically. This approach aims to provide marketing managers with valuable insights and potential for sensory marketing improvements. To accomplish the task, we initially reviewed 469 related peer-reviewed scientific publications; in this process, manual reading was complemented by a validated AI methodology. We identify relevant topics and check whether they exhibit a coherent distribution over recent years. We recognize and discuss similar approaches from machine learning and the big data environment. For the principal analysis, we apply state-of-the-art methods from the natural language processing domain, such as the word embedding techniques GloVe and Word2Vec, and leverage transformers such as BERT. To validate the performance of our newly developed AI approach, we compare results with manually collected parameters from previous studies and observe similar findings in both procedures. Our results reveal a functional and scalable AI approach for determining the OSMI for industries, companies, or even individual (sub-)websites. In addition, the new AI selection and assessment procedures are extremely fast, with only a small loss in performance compared to a manual evaluation, representing an efficient way to evaluate sensory marketing efforts.
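A hedged sketch of an automated sensory scoring step in this spirit: embed page text and sensory anchor words with a transformer sentence encoder and compare by cosine similarity. The model choice and anchor words are assumptions, not the paper's setup.

```python
# Hedged sketch: score website copy against sensory channels using a
# pretrained sentence-transformers encoder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
page_text = "Feel the soft, hand-woven fabric; notes of cedar and citrus."
senses = ["touch", "smell", "taste", "sight", "sound"]
emb_page = model.encode(page_text, convert_to_tensor=True)
emb_senses = model.encode(senses, convert_to_tensor=True)
print(util.cos_sim(emb_page, emb_senses))  # similarity per sensory channel
```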
