- Article
Integrating Vision–Language–Action Models and RGB-D Sensing for Robotic Waste Sorting on KUKA LBR iiwa
Teresa Sinico, Daniele Businaro and Giovanni Boschetti
Robotic waste sorting is challenging because of object variability and cluttered environments, and because prevailing approaches rely on deep learning and traditional computer vision techniques that typically demand extensive datasets and task-specific training. This paper introduces a robotic waste sorting system that integrates the Gemini Vision–Language–Action (VLA) model with a KUKA LBR iiwa collaborative robot and an RGB-D camera. Our approach leverages the reasoning capabilities of large, pre-trained VLA models to perform waste sorting without requiring explicit training or dataset collection. Key contributions include the development of effective prompt engineering strategies for waste object identification, the assessment of the VLA’s performance in terms of inference time and accuracy, and the development of different grasping strategies for operation in cluttered scenarios. In our experiments, the system’s inference time was between 2 and 4 s, which is suitable for collaborative robotic applications, and its overall classification accuracy was 89.64%. Crucially, we demonstrated that integrating RGB-D sensing enhanced the model’s ability to perceive object heights, resolve occlusions, and make informed grasping decisions in realistic, three-dimensional settings. We further validated multiple real-world grasping strategies, demonstrating trade-offs between system efficiency and safety in heavily cluttered scenarios. This work establishes a practical and adaptable framework for deploying VLA-driven intelligence on commercial robotic platforms, highlighting the potential of VLAs for complex manipulation tasks beyond waste sorting.
18 May 2026







