Search Results (7)

Search Parameters:
Keywords = video chunk selection

25 pages, 9913 KB  
Article
Video-Based CSwin Transformer Using Selective Filtering Technique for Interstitial Syndrome Detection
by Khalid Moafa, Maria Antico, Christopher Edwards, Marian Steffens, Jason Dowling, David Canty and Davide Fontanarosa
Appl. Sci. 2025, 15(16), 9126; https://doi.org/10.3390/app15169126 - 19 Aug 2025
Viewed by 880
Abstract
Interstitial lung diseases (ILD) significantly impact health and mortality, affecting millions of individuals worldwide. During the COVID-19 pandemic, lung ultrasonography (LUS) became an indispensable diagnostic and management tool for lung disorders. However, utilising LUS to diagnose ILD requires significant expertise. This research aims to develop an automated and efficient approach for diagnosing ILD from LUS videos using AI to support clinicians in their diagnostic procedures. We developed a binary classifier based on a state-of-the-art CSwin Transformer to discriminate between LUS videos from healthy and non-healthy patients. We used a multi-centric dataset from the Royal Melbourne Hospital (Australia) and the ULTRa Lab at the University of Trento (Italy), comprising 60 LUS videos, one per patient (30 healthy individuals and 30 patients with ILD), with frame counts ranging from 96 to 300 per video. Each video was annotated using the corresponding medical report as ground truth. The training datasets underwent selective frame filtering, including a reduction in frame numbers to eliminate potentially misleading frames in non-healthy videos. This step was crucial because some ILD videos included segments of normal frames, which could be confused with the pathological features and mislead the model. To address this, we eliminated frames with a healthy appearance, such as frames without B-lines, thereby ensuring that training focused on diagnostically relevant features. The trained model was assessed on an unseen, separate dataset of 12 videos (3 healthy and 9 ILD) with frame counts ranging from 96 to 300 per video. The model achieved an average classification accuracy of 91%, calculated as the mean of three testing methods: Random Sampling (92%), Key Featuring (92%), and Chunk Averaging (89%).
In Random Sampling (RS), 32 frames were randomly selected from each of the 12 videos, yielding 92% classification accuracy, with specificity, precision, recall, and F1-score of 100%, 100%, 90%, and 95%, respectively. Similarly, Key Featuring (KF), which involved manually selecting 32 representative key frames from each of the 12 videos, achieved 92% accuracy with the same specificity, precision, recall, and F1-score of 100%, 100%, 90%, and 95%. In contrast, the Chunk Averaging (CA) method, in which the 12 videos were divided into segments (chunks) of 32 consecutive frames, 82 segments in total, achieved an 89% classification accuracy (73 out of 82 segments). Among the 9 misclassified segments in the CA method, 6 were false positives and 3 were false negatives, corresponding to an 11% misclassification rate. The accuracy differences observed between the three testing scenarios were confirmed to be statistically significant via inferential analysis: a one-way ANOVA conducted on the 10-fold cross-validation accuracies yielded a large F-statistic of 2135.67 and a small p-value of 6.7 × 10⁻²⁶, indicating highly significant differences in model performance. The proposed approach is a valid solution for fully automating LUS disease detection, aligning with clinical diagnostic practices that integrate dynamic LUS videos. In conclusion, introducing the selective frame filtering technique to refine the training dataset reduced the effort required for labelling.
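The Chunk Averaging evaluation can be sketched in a few lines. `classify_chunk` below is a hypothetical stand-in for the trained CSwin Transformer, and the majority vote over chunk labels is an assumed aggregation rule (the paper reports per-segment accuracy rather than a voted video label):

```python
def chunk_average(frames, chunk_size=32):
    """Split a frame sequence into consecutive non-overlapping chunks,
    dropping any trailing partial chunk."""
    return [frames[i:i + chunk_size]
            for i in range(0, len(frames) - chunk_size + 1, chunk_size)]

def classify_video(frames, classify_chunk, chunk_size=32):
    """Classify each chunk independently, then majority-vote the chunk
    labels into a video-level label (assumed aggregation)."""
    votes = [classify_chunk(chunk) for chunk in chunk_average(frames, chunk_size)]
    return max(set(votes), key=votes.count), votes
```

A 96-frame video, the minimum length in the dataset, yields exactly three 32-frame chunks under this scheme.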

24 pages, 19576 KB  
Article
Evaluating HAS and Low-Latency Streaming Algorithms for Enhanced QoE
by Syed Uddin, Michał Grega, Mikołaj Leszczuk and Waqas ur Rahman
Electronics 2025, 14(13), 2587; https://doi.org/10.3390/electronics14132587 - 26 Jun 2025
Cited by 2 | Viewed by 5043
Abstract
The demand for multimedia traffic over the Internet is growing exponentially. HTTP adaptive streaming (HAS) is the leading video delivery system for delivering high-quality video to the end user. The adaptive bitrate (ABR) algorithms running on the HTTP client select the highest feasible video quality by adjusting the quality according to fluctuating network conditions. Recently, low-latency ABR algorithms have been introduced to reduce the end-to-end latency commonly experienced in HAS. However, comprehensive studies of low-latency algorithms remain limited. This paper investigates the effectiveness of low-latency streaming algorithms in maintaining a high quality of experience (QoE) while minimizing playback delay. We evaluate these algorithms in the context of both Dynamic Adaptive Streaming over HTTP (DASH) and the Common Media Application Format (CMAF), with a particular focus on the impact of chunked encoding and transfer mechanisms on the QoE. We perform both objective and subjective evaluations of low-latency algorithms and compare their performance with traditional DASH-based ABR algorithms across multiple QoE metrics, various network conditions, and diverse content types. The results demonstrate that low-latency algorithms consistently deliver high video quality across various content types and network conditions, whereas traditional ABR algorithms exhibit performance variability under fluctuating network conditions and diverse content characteristics. Although traditional ABR algorithms download higher-quality segments in stable network environments, their effectiveness declines significantly under unstable conditions. Furthermore, the low-latency algorithms maintained a high user experience regardless of segment duration. In contrast, the performance of traditional algorithms varied significantly with changes in segment duration.
In summary, the results underscore that no single algorithm consistently achieves optimal performance across all experimental conditions. Performance varies depending on network stability, content characteristics, and segment duration, highlighting the need for adaptive strategies that can dynamically respond to varying streaming environments.
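As context for the comparison above, the quality-selection step that every ABR client performs can be sketched with a minimal throughput rule. This is a generic illustration, not any of the evaluated algorithms; the `safety` margin is an assumed parameter:

```python
def select_bitrate(ladder_kbps, est_throughput_kbps, safety=0.8):
    """Throughput-rule ABR sketch: choose the highest bitrate rung that
    fits within a safety fraction of the estimated throughput; fall back
    to the lowest rung when nothing fits."""
    budget = est_throughput_kbps * safety
    feasible = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    return feasible[-1] if feasible else min(ladder_kbps)
```

Real low-latency algorithms replace the simple throughput estimate with chunk-level bandwidth measurements over CMAF chunked transfers, which is precisely what the paper evaluates.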
(This article belongs to the Special Issue Video Streaming Service Solutions)

17 pages, 771 KB  
Article
PaCs: Playing Time-Aware Chunk Selection in Short Video Preloading
by Sen Fu, Guanyu Yang, Zhengjun Yao, Shuxin Tan, Yongxin Shan and Wanchun Jiang
Electronics 2024, 13(24), 4864; https://doi.org/10.3390/electronics13244864 - 10 Dec 2024
Cited by 1 | Viewed by 1410
Abstract
Short video applications are popular nowadays. Typically, a short video may be played for a few seconds or swiped away by users at any time. Due to this uncertain user behavior, video chunks should be preloaded to ensure a smooth viewing process for users. The short video preloading scheme is therefore crucial to users' quality of experience (QoE) and to revenue. Specifically, the preloading scheme must determine which video chunk to download, when to download it, and at what bitrate. Existing schemes either fail to consider all these factors together or struggle to make the best decision. In this work, we argue that the selection of downloaded video chunks is the foremost task, due to uncertain user behaviors and the correspondingly huge number of possible playing sequences of video chunks. Accordingly, we propose the playing time-aware chunk selection (PaCs) scheme, which downloads the video chunk with the smallest expected playing time first. After the chunk is selected, the bitrate is chosen according to the classic MPC algorithm, and PaCs then decides whether downloading is paused or proceeds. Overall, PaCs improves a score combining the QoE of downloaded video chunks and the utilized network bandwidth under different conditions. Simulations confirm that PaCs achieves a higher score than Dashlet, an existing scheme presented at NSDI 2024.
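The core PaCs selection rule, download the chunk with the smallest expected playing time first, can be sketched as follows. How expected playing times are derived from user-behavior statistics is the paper's contribution and is not reproduced here; the chunk ids and values below are purely illustrative:

```python
def select_next_chunk(expected_playing_time):
    """PaCs core rule (sketch): among candidate chunks, download first the
    one with the smallest expected playing time, i.e. the chunk expected
    to be needed soonest. `expected_playing_time` maps chunk id -> seconds."""
    return min(expected_playing_time, key=expected_playing_time.get)

def download_order(expected_playing_time):
    """Full preload order implied by the rule above: ascending expected
    playing time."""
    return sorted(expected_playing_time, key=expected_playing_time.get)
```

In a real player, bitrate selection (via MPC, per the abstract) and the pause/download decision would follow each call to `select_next_chunk`.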

18 pages, 7120 KB  
Article
Enhancing Crowd-Sourced Video Sharing through P2P-Assisted HTTP Video Streaming
by Jieran Geng and Satoshi Fujita
Electronics 2024, 13(7), 1270; https://doi.org/10.3390/electronics13071270 - 29 Mar 2024
Cited by 6 | Viewed by 2931
Abstract
This paper introduces a decentralized architecture designed for the sharing and distribution of user-generated video streams. The proposed system employs HTTP Live Streaming (HLS) as the delivery method for these video streams. In this architecture, a creator who captures a video stream with a smartphone camera transcodes it into a sequence of video chunks called HLS segments. These chunks are then stored in a distributed manner across the worker network, which forms the core of the proposed architecture. Although a coordinator handles bootstrapping within the worker network, the selection of worker nodes for storing generated video chunks and the autonomous load balancing among worker nodes are conducted in a decentralized fashion, eliminating the need for central servers. The worker network is implemented using kubo, the Golang-based IPFS (InterPlanetary File System) client, leveraging essential IPFS functionalities such as node identification through Kademlia-DHT and message exchange using Bitswap. Beyond merely delivering stored video streams, the worker network can also amalgamate multiple streams into a new composite stream. This bundling of multiple video streams into a unified stream is executed on the worker nodes, making effective use of the FFmpeg library. To enhance download efficiency, parallel downloading with multiple threads is employed when retrieving a video stream from the worker network, thereby reducing download time. Experiments conducted on the prototype system indicate that, compared with a server-based system using AWS, the proposed architecture offers a significant advantage in the transmission time of requested video streams; this is particularly evident for low-resolution video streams and becomes more pronounced as the stream length increases.
Furthermore, the system demonstrates a clear advantage in scenarios with a substantial volume of viewing requests.
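The multi-threaded parallel retrieval of HLS segments described above can be sketched with a thread pool. `fetch_one` is a hypothetical single-segment downloader standing in for the IPFS/Bitswap retrieval path:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_segments(fetch_one, segment_ids, workers=4):
    """Retrieve HLS segments in parallel with a thread pool (sketch).
    `pool.map` preserves the playlist order of `segment_ids` even though
    the downloads themselves run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, segment_ids))
```

Order preservation matters here: the player must concatenate segments in playlist order regardless of which download finishes first.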
(This article belongs to the Special Issue Signal, Image and Video Processing: Development and Applications)

16 pages, 1237 KB  
Article
Secure Video Communication Using Multi-Equation Multi-Key Hybrid Cryptography
by Youcef Fouzar, Ahmed Lakhssassi and Ramakrishna Mundugar
Future Internet 2023, 15(12), 387; https://doi.org/10.3390/fi15120387 - 29 Nov 2023
Cited by 2 | Viewed by 3200
Abstract
The safeguarding of intellectual property and the maintenance of privacy for video content are closely linked to the effectiveness of the security protocols employed by internet streaming platforms. Inadequate implementation of security measures by content providers has led to security breaches within entertainment applications, reducing the client base. This research aimed to enhance the security measures employed for video content by implementing a multi-key approach for encryption and decryption. This objective was accomplished through hybrid methodologies, the generation of dynamic keys, and user-attribute-based techniques, with the overall aim of improving the security of video content production. The proposed methodology integrates a system of mathematical equations and a pseudorandom key within its execution, significantly augmenting the security provided by the encryption mechanism. The equations are selected at random to perform encryption, and this random selection procedure further enhances the system's security. The video is divided into smaller entities known as chunks, and each chunk is encrypted with unique keys generated dynamically in real time. The methodology is implemented on Android platforms: the transmitter application streams the video content, whereas the receiver application presents the video to the user. A careful comparative study against similar existing methods strongly supports the security and reliability of the proposed procedure.
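The chunk-wise encryption with dynamically generated per-chunk keys can be illustrated with a toy sketch. The SHA-256 key derivation and XOR cipher below are stand-ins for the paper's equation-based hybrid scheme, chosen only to show the chunk/key structure; they are not secure for real use:

```python
import hashlib

def chunk_bytes(data, size=32):
    """Divide the video byte stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def derive_chunk_key(master_secret, index):
    """Derive a distinct key per chunk. Hashing (secret, chunk index) is a
    stand-in for the paper's dynamic, equation-based key generation."""
    return hashlib.sha256(master_secret + index.to_bytes(4, 'big')).digest()

def xor_cipher(chunk, key):
    """Toy symmetric cipher for illustration only; NOT secure. Applying it
    twice with the same key recovers the plaintext."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(chunk))
```

Because every chunk gets its own key, compromising one chunk's key exposes only that chunk, which is the structural benefit the multi-key approach targets.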

20 pages, 7735 KB  
Article
Human Interaction Classification in Sliding Video Windows Using Skeleton Data Tracking and Feature Extraction
by Sebastian Puchała, Włodzimierz Kasprzak and Paweł Piwowarski
Sensors 2023, 23(14), 6279; https://doi.org/10.3390/s23146279 - 10 Jul 2023
Cited by 7 | Viewed by 3278
Abstract
A "long short-term memory" (LSTM)-based human activity classifier is presented for skeleton data estimated in video frames. A strong feature engineering step precedes the deep neural network processing. The video was analyzed in short-time chunks created by a sliding window. A fixed number of video frames was selected for every chunk, and human skeletons were estimated using dedicated software such as OpenPose or HRNet. The skeleton data for a given window were collected, analyzed, and eventually corrected. A knowledge-aware feature extraction from the corrected skeletons was performed. A deep network model was trained and applied for two-person interaction classification. Three network architectures were developed (single-, double-, and triple-channel LSTM networks) and were experimentally evaluated on the interaction subset of the "NTU RGB+D" data set. The most efficient model achieved an interaction classification accuracy of 96%. This performance was compared with the best reported solutions for this set, based on "adaptive graph convolutional networks" (AGCN) and "3D convolutional networks" (e.g., OpenConv3D). The sliding-window strategy was cross-validated on the "UT-Interaction" data set, which contains long video clips with many changing interactions. We conclude that a two-step approach to skeleton-based human activity classification (a skeleton feature engineering step followed by a deep neural network model) represents a practical tradeoff between accuracy and computational complexity, due to the early correction of imperfect skeleton data and the knowledge-aware extraction of relational features from the skeletons.
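The sliding-window chunking of video frames described above can be sketched as follows; the `window` and `stride` defaults are assumptions, not the paper's settings:

```python
def sliding_windows(frames, window=32, stride=16):
    """Cut a frame sequence into overlapping short-time chunks with a
    sliding window; windows that would run past the end are dropped."""
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, stride)]
```

Each window would then go through skeleton estimation, correction, and feature extraction before reaching the LSTM classifier.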
(This article belongs to the Section Intelligent Sensors)

17 pages, 1519 KB  
Article
Super-Resolution-Empowered Adaptive Medical Video Streaming in Telemedicine Systems
by Hangcheng Han and Jian Lv
Electronics 2022, 11(18), 2944; https://doi.org/10.3390/electronics11182944 - 16 Sep 2022
Cited by 8 | Viewed by 3287
Abstract
Due to the influence of COVID-19, telemedicine is becoming increasingly important. High-quality medical videos can provide a physician with a better visual experience and increase the accuracy of disease diagnosis, but they require dramatically more bandwidth than regular videos. Existing adaptive video-streaming approaches cannot provide high-resolution video-streaming services under poor or fluctuating network conditions with limited bandwidth. In this paper, we propose a super-resolution-empowered adaptive medical video streaming system for telemedicine (named SR-Telemedicine) to provide high quality of experience (QoE) videos for the physician while saving network bandwidth. In SR-Telemedicine, very low-resolution video chunks are first transmitted from the patient to an edge computing node near the physician. A video super-resolution (VSR) model at the edge then reconstructs the low-resolution video chunks into high-resolution ones at an appropriate level (such as 720p or 1080p). Furthermore, the neural network of the VSR model is designed to be scalable, and its scale can be determined dynamically. Based on the time-varying computational capability of the edge computing node and the network condition, a double deep Q-Network (DDQN)-based algorithm is proposed to jointly select the optimal reconstructed high-resolution level and the scale of the VSR model. Finally, extensive experiments based on real-world traces are carried out; the results illustrate that the proposed SR-Telemedicine system improves the QoE of medical videos by 17–79% compared to three baseline algorithms.
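The joint choice of reconstructed resolution and VSR model scale can be illustrated with a greedy stand-in for the paper's DDQN policy. `est_qoe` and `cost` are hypothetical lookup functions; the real system learns this mapping from experience rather than enumerating it:

```python
def select_action(levels, scales, est_qoe, cost, compute_budget):
    """Greedy sketch of the joint decision: among (resolution, model scale)
    pairs whose reconstruction cost fits the edge node's compute budget,
    pick the pair with the highest estimated QoE; None if nothing fits."""
    feasible = [(lvl, s) for lvl in levels for s in scales
                if cost(lvl, s) <= compute_budget]
    return max(feasible, key=lambda a: est_qoe(*a)) if feasible else None
```

The DDQN replaces this one-shot greedy choice with a value function learned over time-varying compute capability and network conditions.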
(This article belongs to the Special Issue Advances in Multi-Media Network Transmission)
