Search Results (93)

Search Parameters:
Keywords = visual field manipulation

31 pages, 11649 KB  
Article
Development of Shunt Connection Communication and Bimanual Coordination-Based Smart Orchard Robot
by Bin Yan and Xiameng Li
Agronomy 2025, 15(8), 1801; https://doi.org/10.3390/agronomy15081801 - 25 Jul 2025
Viewed by 297
Abstract
This research addresses the enhancement of operational efficiency in apple-picking robots through the design of a bimanual spatial configuration enabling obstacle avoidance in contemporary orchard environments. A parallel coordinated harvesting paradigm for dual-arm systems was introduced, leading to the construction and validation of a six-degree-of-freedom bimanual apple-harvesting robot. Leveraging the kinematic architecture of the AUBO-i5 manipulator, three spatial layout configurations for dual-arm systems were evaluated, culminating in the adoption of a “workspace-overlapping Type B” arrangement. A functional prototype of the bimanual apple-harvesting system was subsequently fabricated. The study further involved developing control architectures for two end-effector types: a compliant gripper and a vacuum-based suction mechanism, with corresponding operational protocols established. A networked communication framework for parallel arm coordination was implemented via Ethernet switching technology, enabling both independent and synchronized bimanual operation. Additionally, an intersystem communication protocol was formulated to integrate the robotic vision system with the dual-arm control architecture, establishing a modular parallel execution model between visual perception and motion control modules. A coordinated bimanual harvesting strategy was formulated, incorporating real-time trajectory and pose monitoring of the manipulators. Kinematic simulations were executed to validate the feasibility of this strategy. Field evaluations in modern Red Fuji apple orchards assessed multidimensional harvesting performance, revealing 85.6% and 80% success rates for the suction and gripper-based arms, respectively. Single-fruit retrieval averaged 7.5 s per arm, yielding an overall system efficiency of 3.75 s per fruit. 
These findings advance the technological foundation for intelligent apple-harvesting systems, offering methodologies for the evolution of precision agronomic automation. Full article
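The reported system efficiency follows directly from the parallel dual-arm design: with each arm averaging 7.5 s per fruit and both arms operating concurrently, the effective per-fruit cycle time halves. A quick arithmetic check:

```python
# Effective cycle time of the dual-arm system: two arms working in
# parallel, each averaging 7.5 s per fruit, yield 7.5 / 2 = 3.75 s
# per fruit overall (figures taken from the abstract above).

single_arm_time = 7.5      # seconds per fruit, one arm
num_arms = 2
system_time = single_arm_time / num_arms
print(system_time)  # 3.75
```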
(This article belongs to the Special Issue Smart Farming: Advancing Techniques for High-Value Crops)

23 pages, 4047 KB  
Article
Dataset Dependency in CNN-Based Copy-Move Forgery Detection: A Multi-Dataset Comparative Analysis
by Potito Valle Dell’Olmo, Oleksandr Kuznetsov, Emanuele Frontoni, Marco Arnesano, Christian Napoli and Cristian Randieri
Mach. Learn. Knowl. Extr. 2025, 7(2), 54; https://doi.org/10.3390/make7020054 - 13 Jun 2025
Cited by 1 | Viewed by 905
Abstract
Convolutional neural networks (CNNs) have established themselves over time as a fundamental tool in the field of copy-move forgery detection due to their ability to effectively identify and analyze manipulated images. Unfortunately, copy-move forgeries still represent a persistent challenge in digital image forensics, underlining the importance of ensuring the integrity of digital visual content. In this study, we present a systematic evaluation of the performance of a convolutional neural network (CNN) specifically designed for copy-move manipulation detection, applied to three datasets widely used in the literature in the context of digital forensics: CoMoFoD, Coverage, and CASIA v2. Our experimental analysis highlighted a significant variability of the results, with an accuracy ranging from 95.90% on CoMoFoD to 27.50% on Coverage. This inhomogeneity has been attributed to specific structural factors of the datasets used, such as the sample size, the degree of imbalance between classes, and the intrinsic complexity of the manipulations. We also investigated different regularization techniques and data augmentation strategies to understand their impact on the network performance, finding that adopting the L2 penalty and reducing the learning rate led to an accuracy increase of up to 2.5% for CASIA v2, while on CoMoFoD we recorded a much more modest impact (1.3%). Similarly, we observed that data augmentation was able to improve performance on large datasets but was ineffective on smaller ones. Our results challenge the idea of universal generalizability of CNN architectures in the context of copy-move forgery detection, highlighting instead how performance is strictly dependent on the intrinsic characteristics of the dataset under consideration. 
Finally, we propose a series of operational recommendations for optimizing the training process, the choice of the dataset, and the definition of robust evaluation protocols aimed at guiding the development of detection systems that are more reliable and generalizable. Full article
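The regularization result above can be illustrated with a plain gradient-descent update carrying an L2 penalty (weight decay). This is a generic sketch of the technique, not the authors' training code; the learning rate and decay coefficient are arbitrary:

```python
# Gradient-descent step with an L2 penalty (weight decay): the penalty adds
# l2 * w to each gradient, pulling every weight toward zero in proportion
# to its magnitude, which discourages large weights and reduces overfitting.

def sgd_step_l2(w, grad, lr=0.01, l2=0.001):
    """One update: w <- w - lr * (grad + l2 * w)."""
    return [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad)]

w = [1.0, -2.0]          # illustrative weights
g = [0.5, 0.5]           # illustrative gradients
w_new = sgd_step_l2(w, g, lr=0.1, l2=0.01)
print(w_new)             # the larger-magnitude weight is decayed more
```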
(This article belongs to the Special Issue Deep Learning in Image Analysis and Pattern Recognition, 2nd Edition)

19 pages, 2154 KB  
Article
A New Method for Inducing Mental Fatigue: A High Mental Workload Task Paradigm Based on Complex Cognitive Abilities and Time Pressure
by Lei Ren, Lin Wu, Tingwei Feng and Xufeng Liu
Brain Sci. 2025, 15(6), 541; https://doi.org/10.3390/brainsci15060541 - 22 May 2025
Cited by 2 | Viewed by 1350
Abstract
Objectives: With the advancement of modern society, people in cognitively demanding jobs are increasingly exposed to occupational stress. Prolonged and high-intensity cognitive activities are prone to inducing mental fatigue (MF), which adversely affects both psychological and physiological well-being, as well as task performance. Existing methods for inducing MF often demonstrate limited effectiveness due to insufficient cognitive load from overly simplistic tasks and the potential emotional disturbance caused by prolonged task duration. This study aims to explore a comprehensive cognitive task paradigm that integrates task complexity and time pressure, thereby developing a novel and effective method for inducing MF based on high mental workload (HMW) and the effects of time on task (ToT). Methods: Using convenience sampling, university students from a medical college were recruited as participants. The study was conducted in three steps. In the first step, we constructed a 1-back Stroop (BS) task paradigm by designing tasks with varying levels of complexity and incorporating time pressure through experimental manipulation. In the second step, the efficacy of the BS task paradigm was validated by comparing it with the traditional 2-back cognitive task in inducing HMW. In the third step, an MF induction protocol was established by combining the BS task paradigm with the ToT effect (i.e., a continuous 30 min task). Effectiveness was assessed using validated subjective measures (NASA Task Load Index [NASA-TLX] and Visual Analog Scale [VAS]) and objective behavioral metrics (reaction time and accuracy). Statistical analyses were performed using analysis of variance (ANOVA) and t-tests. 
Results: The BS task paradigm, which integrates complex cognitive abilities such as attention, working memory, inhibitory control, cognitive flexibility, and time pressure, demonstrated significantly higher NASA-TLX total scores, as well as elevated scores in mental demand, temporal demand, performance, and frustration scales, compared to the 2-back task. Additionally, the BS task paradigm resulted in longer reaction times and lower accuracy. As the BS task progressed, participants exhibited significant increases in mental fatigue (MF), mental effort (ME), mental stress (MS), and subjective feelings of fatigue, while the overall number of correct trials and accuracy showed a significant decline. Furthermore, reaction times in the psychomotor vigilance test (PVT) were significantly prolonged, and the number of lapses significantly increased between pre- and post-task assessments. Conclusions: The BS task paradigm based on complex cognitive abilities and time pressure could effectively induce an HMW state. Combined with the ToT effect, the BS paradigm demonstrated effective MF induction capabilities. This study provides a novel and reliable method for inducing HMW and MF, offering a valuable tool for future research in related fields. Full article
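As a concrete (and deliberately simplified) illustration of the paradigm, scoring 1-back matching under a response deadline might look like the sketch below. All names, trial values, and the 800 ms deadline are hypothetical assumptions, not the authors' protocol, and the Stroop word/ink conflict component is omitted:

```python
# Simplified 1-back trial scoring with time pressure: a response is correct
# only if it matches the 1-back comparison AND beats the deadline.
# Stimuli, responses, and the deadline are invented for illustration.

def score_trials(ink_colors, responses, deadline_ms, rts_ms):
    """Return accuracy over trials 2..N of a 1-back ink-color match task."""
    correct = 0
    total = 0
    for i in range(1, len(ink_colors)):
        target = ink_colors[i] == ink_colors[i - 1]   # 1-back match?
        on_time = rts_ms[i] <= deadline_ms            # beat the deadline?
        if responses[i] == target and on_time:
            correct += 1
        total += 1
    return correct / total

inks = ["red", "red", "blue", "blue", "green"]
resp = [None, True, False, True, False]   # no response on the first trial
rts  = [0, 450, 510, 480, 900]            # ms; last response misses deadline
print(score_trials(inks, resp, deadline_ms=800, rts_ms=rts))  # 0.75
```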
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)

18 pages, 2001 KB  
Review
Depth Perception Based on the Interaction of Binocular Disparity and Motion Parallax Cues in Three-Dimensional Space
by Shuai Li, Shufang He, Yuanrui Dong, Caihong Dai, Jinyuan Liu, Yanfei Wang and Hiroaki Shigemasu
Sensors 2025, 25(10), 3171; https://doi.org/10.3390/s25103171 - 17 May 2025
Viewed by 1498
Abstract
Background and Objectives: Depth perception of the human visual system in three-dimensional (3D) space plays an important role in human–computer interaction and artificial intelligence (AI) areas. It mainly employs binocular disparity and motion parallax cues. This study aims to systemically summarize the related studies about depth perception specified by these two cues. Materials and Methods: We conducted a literature investigation on related studies and summarized them from aspects like motivations, research trends, mechanisms, and interaction models of depth perception specified by these two cues. Results: Development trends show that depth perception research has gradually evolved from early studies based on a single cue to quantitative studies based on the interaction between these two cues. Mechanisms of these two cues reveal that depth perception specified by the binocular disparity cue is mainly influenced by factors like spatial variation in disparity, viewing distance, the position of visual field (or retinal image) used, and interaction with other cues; whereas that specified by the motion parallax cue is affected by head movement and retinal image motion, interaction with other cues, and the observer’s age. By integrating these two cues, several types of models for depth perception are summarized: the weak fusion (WF) model, the modified weak fusion (MWF) model, the strong fusion (SF) model, and the intrinsic constraint (IC) model. The merits and limitations of each model are analyzed and compared. Conclusions: Based on this review, a clear picture of the study on depth perception specified by binocular disparity and motion parallax cues can be seen. Open research challenges and future directions are presented. 
In the future, it will be necessary to explore methods for more easily manipulating depth-cue signals in stereoscopic images and to adopt deep learning-based methods for constructing models and predicting depth, to meet the increasing demands of human–computer interaction in complex 3D scenarios. Full article
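The weak fusion (WF) model mentioned above combines the two cues as a reliability-weighted linear average. A minimal sketch with illustrative numbers (the inverse-variance weighting follows the standard WF formulation; the depth and variance values are invented):

```python
# Weak-fusion (WF) depth combination: a weighted linear average of the
# depth estimates from binocular disparity and motion parallax, with
# weights proportional to each cue's reliability (inverse variance).

def weak_fusion(d_disparity, d_parallax, var_disparity, var_parallax):
    w_d = 1.0 / var_disparity
    w_m = 1.0 / var_parallax
    return (w_d * d_disparity + w_m * d_parallax) / (w_d + w_m)

# Disparity says 1.0 m, parallax says 1.4 m; disparity has the smaller
# variance (is more reliable), so the estimate leans toward 1.0 m.
d = weak_fusion(1.0, 1.4, var_disparity=0.01, var_parallax=0.04)
print(round(d, 2))
```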
(This article belongs to the Section Sensing and Imaging)

23 pages, 6376 KB  
Article
Deep Reinforcement Learning-Based Uncalibrated Visual Servoing Control of Manipulators with FOV Constraints
by Xungao Zhong, Qiao Zhou, Yuan Sun, Shaobo Kang and Huosheng Hu
Appl. Sci. 2025, 15(8), 4447; https://doi.org/10.3390/app15084447 - 17 Apr 2025
Cited by 1 | Viewed by 1003
Abstract
In this article, we put forward a new uncalibrated image-based visual servoing (IBVS) method for monocular hand–eye manipulators with Field-of-View (FOV) feature constraints, built on a deep reinforcement learning (DRL) approach. First, the IBVS and its feature-loss problems are introduced. Then, an uncalibrated IBVS method is presented to address the feature-loss issue and improve servo efficiency with DRL. Specifically, the uncalibrated IBVS is integrated into the deep Q-network (DQN) control framework to ensure analytical stability. Additionally, a feature-constrained Q-network based on offline camera FOV environment feature mapping is designed and trained to adaptively output compensation for the IBVS controller, which helps maintain the feature within the camera’s FOV and improve servo performance. Finally, to further demonstrate the effectiveness and practicality of the proposed DQN-based uncalibrated IBVS method, experiments are conducted on a 6-DOF manipulator, and the results validate the proposed approach. Full article
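For context, the classical IBVS baseline that a compensation module acts on drives the image-feature error to zero with a proportional law (commanded velocity proportional to the negative feature error). A minimal sketch, assuming an identity interaction matrix for illustration only, not the authors' controller:

```python
# Proportional image-based visual servoing (IBVS) step: with an identity
# interaction matrix, the velocity command v = -gain * error makes the
# image-feature error decay geometrically toward zero.

def ibvs_step(features, target, gain=0.5):
    """One proportional servo step on image-feature coordinates."""
    error = [f - t for f, t in zip(features, target)]
    velocity = [-gain * e for e in error]
    return [f + v for f, v in zip(features, velocity)], error

feat = [10.0, -4.0]   # current feature position (pixels), illustrative
goal = [0.0, 0.0]     # desired feature position
for _ in range(5):
    feat, err = ibvs_step(feat, goal)
print(feat)           # error has shrunk by a factor of 2**5
```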
(This article belongs to the Special Issue Robotics and Intelligent Systems: Technologies and Applications)

24 pages, 4969 KB  
Article
Adrenergic Modulation of Cortical Gain and Sensory Processing in the Mouse Visual Cortex
by Ricardo Medina-Coss y León, Elí Lezama, Inmaculada Márquez and Mario Treviño
Brain Sci. 2025, 15(4), 406; https://doi.org/10.3390/brainsci15040406 - 17 Apr 2025
Viewed by 851
Abstract
Background/Objectives: Sensory perception is influenced by internal neuronal variability and external noise. Neuromodulators such as norepinephrine (NE) regulate this variability by modulating excitation–inhibition balance, oscillatory dynamics, and interlaminar connectivity. While NE is known to modulate cortical gain, it remains unclear how it shapes sensory processing under noisy conditions. This study investigates how adrenergic modulation affects signal-to-noise processing and perceptual decision-making in the primary visual cortex (V1) of mice exposed to varying levels of visual noise. Methods: We performed in vivo local field potential (LFP) recordings from layers 2/3 and 4 of V1 in sedated mice to assess the impact of visual noise and systemic administration of atomoxetine, a NE reuptake inhibitor, on cortical signal processing. In a separate group of freely moving mice, we used a two-alternative forced-choice task to evaluate the behavioral effects of systemic and intracortical adrenergic manipulations on visual discrimination. Results: Moderate visual noise enhanced cortical signal processing and visual choices, consistent with stochastic resonance. High noise levels impaired both. Systemic atomoxetine administration flattened the cortical signal-to-noise ratio function, suggesting disrupted gain control. Behaviorally, clonidine impaired accuracy at moderate noise levels, while atomoxetine reduced discrimination performance and increased response variability. Intracortical NE infusions produced similar effects. Conclusions: Our findings demonstrate that NE regulates the balance between signal amplification and noise suppression in a noise- and context-dependent manner. These results extend existing models of neuromodulatory function by linking interlaminar communication and cortical variability to perceptual decision-making. Full article
(This article belongs to the Special Issue Perceptual Learning and Cortical Plasticity)

31 pages, 412 KB  
Review
Visual Function After Schlemm’s Canal-Based MIGS
by Masayuki Kasahara and Nobuyuki Shoji
J. Clin. Med. 2025, 14(7), 2531; https://doi.org/10.3390/jcm14072531 - 7 Apr 2025
Viewed by 1086
Abstract
Filtration surgery is highly effective in lowering intraocular pressure; however, it is associated with a higher risk of severe complications. Visual dysfunction may persist in relatively uneventful cases because of induced astigmatism or worsening optical aberrations. Therefore, for early- to moderate-stage glaucoma, an increasing number of surgeons are prioritizing surgical safety and preserving postoperative visual function by opting for minimally invasive glaucoma surgery (MIGS). Among the various MIGS techniques, canal-opening surgery—targeting aqueous outflow through the Schlemm’s canal (Schlemm’s canal-based MIGS, CB-MIGS)—has gained increasing popularity. Unlike filtration surgery, CB-MIGS does not require creating an aqueous outflow pathway between the intraocular and extraocular spaces. Consequently, it is considered a minimally invasive procedure with a reduced risk of severe complications and is increasingly being chosen for suitable cases. Although this surgical technique has limitations in lowering intraocular pressure, it avoids the manipulation of the conjunctiva or sclera and is primarily performed through a small corneal incision. Therefore, a minimal impact on induced astigmatism or postoperative refractive changes is expected. However, few reviews comprehensively summarize postoperative changes in visual function. Therefore, this study reviews the literature on visual function after CB-MIGS, focusing on changes in best-corrected visual acuity (BCVA), refraction, astigmatism, and the effectiveness of visual field preservation to assess the extent of these postoperative changes. Hyphema is the primary cause of early postoperative vision loss and is often transient in cases in which other complications would have led to visual impairment. Severe complications that threaten vision are rare. 
Additionally, compared with filtration surgery, postoperative visual recovery tends to be faster, and the degree of induced astigmatism is comparable to that of standalone cataract surgery. When combined with cataract surgery, the refractive error is at the same level as that of cataract surgery alone. However, in some cases, mild hyperopic shifts may occur because of axial length shortening, depending on the extent of intraocular pressure reduction. This possibility has been highlighted in several studies. Regarding the effectiveness of slowing the progression of visual field defects, most studies have focused on short- to medium-term postoperative outcomes. Many of these studies have reported the sufficient suppression of progression rates. However, studies with large sample sizes and long-term prospective designs are limited. To establish more robust evidence, future research should focus on conducting larger-scale, long-term investigations. Full article
(This article belongs to the Special Issue Clinical Debates in Minimally Invasive Glaucoma Surgery (MIGS))
23 pages, 3543 KB  
Article
Learning from Demonstrations via Deformable Residual Multi-Attention Domain-Adaptive Meta-Learning
by Zeyu Yan, Zhongxue Gan, Gaoxiong Lu, Junxiu Liu and Wei Li
Biomimetics 2025, 10(2), 103; https://doi.org/10.3390/biomimetics10020103 - 11 Feb 2025
Viewed by 1014
Abstract
In recent years, the fields of one-shot and few-shot object detection and classification have garnered significant attention. However, the rapid adaptation of robots to previously unencountered or novel environments remains a formidable challenge. Inspired by biological learning processes, meta-learning seeks to replicate the way humans and animals quickly adapt to new tasks by leveraging prior knowledge and generalizing across experiences. Despite this, traditional meta-learning methods that rely on deepening or widening neural networks offer only marginal improvements in model performance. To address this, we propose a novel framework termed Deformable Residual Multi-Attention Domain-Adaptive Meta-Learning (DRMA-DAML). Our framework, motivated by biological principles like the human visual system’s concurrent handling of global and local details for enhanced perception and decision making, empowers the model to significantly enhance performance without augmenting the depth of the neural network, thus avoiding the overfitting and vanishing gradient problems typical of deeper architectures. Empirical evidence from both simulated environments and real-world applications demonstrates that DRMA-DAML achieves state-of-the-art performance. Specifically, it improves adaptation accuracy by 11.18% on benchmark tasks and achieves a 97.64% success rate in real-world object manipulation, surpassing existing methods. These results validate the effectiveness of our approach in rapid adaptation for robotic systems. Full article

24 pages, 8881 KB  
Article
Research on Multimodal Control Method for Prosthetic Hands Based on Visuo-Tactile and Arm Motion Measurement
by Jianwei Cui and Bingyan Yan
Biomimetics 2024, 9(12), 775; https://doi.org/10.3390/biomimetics9120775 - 19 Dec 2024
Viewed by 1338
Abstract
The realization of hand function reengineering using a manipulator is a research hotspot in the field of robotics. In this paper, we propose a multimodal perception and control method for a robotic hand to assist the disabled. The movement of the human hand can be divided into two parts: the coordination of the posture of the fingers, and the coordination of the timing of grasping and releasing objects. Therefore, we first used a pinhole camera to construct a visual device suitable for finger mounting, and preclassified the shape of the object based on YOLOv8; then, we proposed a filtering process for multi-frame synthesized point-cloud data from a miniature 2D LiDAR, using the DBSCAN algorithm to cluster objects and the DTW algorithm to further identify the cross-sectional shape and size of the grasped part of the object and realize control of the robot’s grasping gesture; finally, a multimodal perception and control method for prosthetic hands was proposed. To control the grasping attitude, a fusion algorithm based on upper-limb motion state, hand position, and lesser-toe haptic information was proposed to realize control of the robotic grasping process with a human in the loop. The device designed in this paper does not contact the human skin and does not produce discomfort, and the completion rate in the grasping experiment reached 91.63%, indicating that the proposed control method is feasible and applicable. Full article
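The DTW step of the pipeline can be illustrated with a textbook dynamic time warping distance on 1-D contour profiles (a generic sketch of the matching idea, not the authors' implementation):

```python
# Dynamic time warping (DTW) distance between two 1-D profiles, e.g.
# cross-sectional contours: points are matched elastically, so a shifted
# but similar profile stays close where pointwise distance would not.

def dtw(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

print(dtw([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))  # 0.0 (identical profiles)
print(dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1]))  # 1.0 (shifted profile)
```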
(This article belongs to the Special Issue Bionic Technology—Robotic Exoskeletons and Prostheses: 2nd Edition)

20 pages, 868 KB  
Essay
Untangling Photographic Manipulation: Exploring a Dual Concept and Its Societal Implications
by Liv Hausken
Journal. Media 2024, 5(4), 1881-1900; https://doi.org/10.3390/journalmedia5040114 - 12 Dec 2024
Cited by 1 | Viewed by 2054
Abstract
In recent years, the pervasive presence of visual disinformation in the media and visual culture, propelled by technological advancements, has become an escalating concern. This article asserts the urgent need to revise the current conceptual framework for addressing this challenge. A significant hurdle is the ambiguity surrounding the very concept of manipulation. Two distinct concepts of manipulation coexist—one with moral implications and the other without. This article examines this conceptual discrepancy across academic cultures, identifying them as anchored, respectively, in the social sciences and humanities and in the natural sciences and medicine. It then analyzes how these two concepts are used in white papers and other policy documents that guide responses to visual disinformation from 2018 to 2021. The article further investigates the complexities of these manipulation concepts within photography and visual expression. By elucidating and questioning them, the article aims to enhance the framework for addressing visual manipulation, foster interdisciplinary collaboration, and enrich theories of camera-based imaging across various fields. Overall, this article highlights deficiencies in the current framework and strives to improve it, thereby aiding in tackling visual disinformation and fostering effective collaboration among stakeholders. Full article

17 pages, 7503 KB  
Article
Integrating Historical Learning and Multi-View Attention with Hierarchical Feature Fusion for Robotic Manipulation
by Gaoxiong Lu, Zeyu Yan, Jianing Luo and Wei Li
Biomimetics 2024, 9(11), 712; https://doi.org/10.3390/biomimetics9110712 - 20 Nov 2024
Cited by 1 | Viewed by 1430
Abstract
Humans typically make decisions based on past experiences and observations, while in the field of robotic manipulation, the robot’s action prediction often relies solely on current observations, which tends to make robots overlook environmental changes or become ineffective when current observations are suboptimal. To address this pivotal challenge in robotics, inspired by human cognitive processes, we propose our method which integrates historical learning and multi-view attention to improve the performance of robotic manipulation. Based on a spatio-temporal attention mechanism, our method not only combines observations from current and past steps but also integrates historical actions to better perceive changes in robots’ behaviours and their impacts on the environment. We also employ a mutual information-based multi-view attention module to automatically focus on valuable perspectives, thereby incorporating more effective information for decision-making. Furthermore, inspired by human visual system which processes both global context and local texture details, we have devised a method that merges semantic and texture features, aiding robots in understanding the task and enhancing their capability to handle fine-grained tasks. Extensive experiments in RLBench and real-world scenarios demonstrate that our method effectively handles various tasks and exhibits notable robustness and adaptability. Full article
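The multi-view attention idea, weighting more informative camera views more heavily, can be sketched as softmax-weighted feature fusion. The scores and features below are invented; the paper's module derives its view scores from mutual information rather than the raw scalars used here:

```python
import math

# Attention-weighted fusion over camera views: each view gets a scalar
# score, scores are softmax-normalised into weights, and the per-view
# feature vectors are combined with those weights.

def softmax(scores):
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse_views(view_features, scores):
    w = softmax(scores)
    dim = len(view_features[0])
    return [sum(w[v] * view_features[v][d] for v in range(len(w)))
            for d in range(dim)]

views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # features per view
scores = [2.0, 0.0, 0.0]                        # view 0 scores highest
fused = fuse_views(views, scores)
print([round(x, 3) for x in fused])             # dominated by view 0
```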

17 pages, 2983 KB  
Article
Pose Estimation of a Cobot Implemented on a Small AI-Powered Computing System and a Stereo Camera for Precision Evaluation
by Marco-Antonio Cabrera-Rufino, Juan-Manuel Ramos-Arreguín, Marco-Antonio Aceves-Fernandez, Efren Gorrostieta-Hurtado, Jesus-Carlos Pedraza-Ortega and Juvenal Rodríguez-Resendiz
Biomimetics 2024, 9(10), 610; https://doi.org/10.3390/biomimetics9100610 - 9 Oct 2024
Viewed by 1308
Abstract
The precision of robotic manipulators in the industrial or medical field is very important, especially when it comes to repetitive or exhaustive tasks. Geometric deformations are the most common in this field. For this reason, new robotic vision techniques have been proposed, including 3D methods that made it possible to determine the geometric distances between the parts of a robotic manipulator. The aim of this work is to measure the angular position of a robotic arm with six degrees of freedom. For this purpose, a stereo camera and a convolutional neural network algorithm are used to reduce the degradation of precision caused by geometric errors. This method is not intended to replace encoders, but to enhance accuracy by compensating for degradation through an intelligent visual measurement system. The camera is tested and the accuracy is about one millimeter. The implementation of this method leads to better results than traditional and simple neural network methods. Full article

39 pages, 9734 KB  
Review
A Survey of Robot Intelligence with Large Language Models
by Hyeongyo Jeong, Haechan Lee, Changwon Kim and Sungtae Shin
Appl. Sci. 2024, 14(19), 8868; https://doi.org/10.3390/app14198868 - 2 Oct 2024
Cited by 15 | Viewed by 13638
Abstract
Since the emergence of ChatGPT, research on large language models (LLMs) has actively progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited exceptional abilities in understanding natural language and planning tasks. These abilities of LLMs are promising in robotics. In general, traditional supervised learning-based robot intelligence systems have a significant lack of adaptability to dynamically changing environments. However, LLMs help a robot intelligence system to improve its generalization ability in dynamic and complex real-world environments. Indeed, findings from ongoing robotics studies indicate that LLMs can significantly improve robots’ behavior planning and execution capabilities. Additionally, vision-language models (VLMs), trained on extensive visual and linguistic data for the vision question answering (VQA) problem, excel at integrating computer vision with natural language processing. VLMs can comprehend visual contexts and execute actions through natural language. They also provide descriptions of scenes in natural language. Several studies have explored the enhancement of robot intelligence using multimodal data, including object recognition and description by VLMs, along with the execution of language-driven commands integrated with visual information. This review paper thoroughly investigates how foundation models such as LLMs and VLMs have been employed to boost robot intelligence. For clarity, the research areas are categorized into five topics: reward design in reinforcement learning, low-level control, high-level planning, manipulation, and scene understanding. 
This review also summarizes studies showing how specific foundation models have improved robot intelligence: Eureka, which automates reward function design in reinforcement learning; RT-2, which integrates visual data, language, and robot actions in a vision-language-action model; and AutoRT, which generates feasible tasks and executes robot behavior policies via LLMs. Full article
14 pages, 2422 KB  
Article
Moderating Effects of Visual Order in Graphical Symbol Complexity: The Practical Implications for Design
by Nuowen Zhang, Jing Zhang, Shangsong Jiang, Xingcheng Di and Weijun Li
Appl. Sci. 2024, 14(17), 7592; https://doi.org/10.3390/app14177592 - 28 Aug 2024
Cited by 1 | Viewed by 1262
Abstract
In the field of visual graphic design, complexity plays a crucial role in visual information processing and is typically treated as an absolute quantity based on the number of presented features and components. However, it remains unclear whether the visual order of the constituent elements of a graphical symbol affects cognitive processing, especially memory processing. Our research generated four groups of novel, meaningless graphical symbols (complex and ordered, complex and disordered, simple and ordered, and simple and disordered) and experimentally manipulated the levels of complexity and order in these stimuli. Before the formal experiment, a five-point scale was used to rule out differences between objective and subjective definitions of these graphical symbols on ratings of complexity, order, concreteness, and familiarity. We then used a cue-recall task to compare subjects’ memory performance across the four groups. The results showed a significant interaction between visual order and graphical symbol complexity: the complexity effect appeared only when the stimuli were disordered and disappeared once the stimuli were ordered. In addition, a practical application validation confirmed that increasing the level of visual order is an effective way to improve user experience while maintaining the same level of complexity. The findings can serve as a reference for graphical symbol design, graphic design, and visual communication design. Full article
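The reported interaction in this 2 (complexity) × 2 (order) design can be illustrated as a difference of simple effects. The sketch below uses entirely hypothetical cell means (not the paper's data) to show how "the complexity effect appears only in the disordered condition" would manifest numerically:

```python
# Hypothetical cell means (proportion correct in a cue-recall task) for a
# 2 (complexity: simple/complex) x 2 (order: ordered/disordered) design.
# The numbers are illustrative only, not taken from the study.
recall = {
    ("simple", "ordered"): 0.82,
    ("complex", "ordered"): 0.80,    # ordered: complexity effect negligible
    ("simple", "disordered"): 0.74,
    ("complex", "disordered"): 0.58, # disordered: complexity hurts recall
}

def complexity_effect(order):
    """Simple-minus-complex recall difference at one level of visual order."""
    return recall[("simple", order)] - recall[("complex", order)]

# The interaction is the difference between these simple effects: a value far
# from zero means the complexity effect depends on visual order.
effect_ordered = round(complexity_effect("ordered"), 2)        # 0.02
effect_disordered = round(complexity_effect("disordered"), 2)  # 0.16
interaction = round(effect_disordered - effect_ordered, 2)     # 0.14
print(effect_ordered, effect_disordered, interaction)
```

In a full analysis this contrast would be tested with a two-way ANOVA on the trial-level data; the arithmetic above only conveys the shape of the reported interaction.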
16 pages, 5014 KB  
Article
Leveraging Virtual Reality for the Visualization of Non-Observable Electrical Circuit Principles in Engineering Education
by Elliott Wolbach, Michael Hempel and Hamid Sharif
Virtual Worlds 2024, 3(3), 303-318; https://doi.org/10.3390/virtualworlds3030016 - 2 Aug 2024
Viewed by 2209
Abstract
As technology advances, the field of electrical and computer engineering continuously demands innovative tools and methodologies to facilitate effective learning and comprehension of fundamental concepts. This research addresses an identified gap in technology-augmented education and investigates the integration of virtual reality (VR) technology with real-time electronic circuit simulation to visualize non-observable quantities such as voltage distribution and current flow within circuits. In this paper, we describe the development of our immersive educational platform, which makes understanding these abstract concepts intuitive and engaging, including the design and development of a VR-based circuit simulation environment. By leveraging VR’s immersive capabilities, our system enables users to physically interact with electronic components, observe the flow of electrical signals, and manipulate circuit parameters in real time. Through this immersive experience, learners can gain a deeper understanding of fundamental electronic principles, transcending the limitations of traditional two-dimensional diagrams and equations. Furthermore, this research implements advanced and novel visualization techniques within the VR environment for non-observable electrical and electromagnetic properties, such as color-coded pathways for current flow and dynamic voltage gradient visualization. Additionally, real-time data representation and graphical overlays give users insight into the dynamic behavior of circuits, supporting analysis and troubleshooting. Full article
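A voltage gradient visualization of the kind described typically maps each node voltage onto a color scale. The following is a minimal sketch of one such mapping, assuming a blue-to-red gradient over a 0–5 V range; the range and color choices are illustrative defaults, not details from the paper:

```python
def voltage_to_rgb(v, v_min=0.0, v_max=5.0):
    """Map a node voltage onto a blue-to-red gradient for rendering.

    Low voltages render blue, high voltages red, mid-range a purple blend.
    The 0-5 V range is an assumed default, not taken from the paper.
    """
    # Normalize the voltage to [0, 1], clamping out-of-range values.
    t = max(0.0, min(1.0, (v - v_min) / (v_max - v_min)))
    # Linearly interpolate between blue (0, 0, 255) and red (255, 0, 0).
    return (round(255 * t), 0, round(255 * (1 - t)))

print(voltage_to_rgb(0.0))  # lowest voltage: pure blue
print(voltage_to_rgb(5.0))  # highest voltage: pure red
print(voltage_to_rgb(7.0))  # above range: clamped to red
```

In a VR engine this function would run each simulation tick, recoloring wire meshes so that learners see the otherwise invisible voltage distribution update as they manipulate the circuit.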