Article

Robotic Motion Intelligence Using Vector Symbolic Architectures and Blockchain-Based Smart Contracts

1 Centre for Data Analytics and Cognition, La Trobe University, Bundoora, VIC 3086, Australia
2 Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
* Author to whom correspondence should be addressed.
Robotics 2025, 14(4), 38; https://doi.org/10.3390/robotics14040038
Submission received: 31 December 2024 / Revised: 4 March 2025 / Accepted: 25 March 2025 / Published: 28 March 2025
(This article belongs to the Section AI in Robotics)

Abstract

The rapid adoption of artificial intelligence (AI) systems, such as predictive AI, generative AI, and explainable AI, is in contrast to the slower development and uptake of robotic AI systems. Dynamic environments, sensory processing, mechanical movements, power management, and safety are inherent complexities of robotic intelligence capabilities that can be addressed using novel AI approaches. The current AI landscape is dominated by machine learning techniques, specifically deep learning algorithms, that have been effective in addressing some of these challenges. However, these algorithms are subject to computationally complex processing and operational needs such as high data dependency. In this paper, we propose a computation-efficient and data-efficient framework for robotic motion intelligence (RMI) based on vector symbolic architectures (VSAs) and blockchain-based smart contracts. The capabilities of VSAs are leveraged for computationally efficient learning and noise suppression during perception, motion, movement, and decision-making tasks. As a distributed ledger technology, smart contracts address data dependency through a decentralized, distributed, and secure transactions ledger that satisfies contractual conditions. An empirical evaluation of the framework confirms its value and contribution towards addressing the practical challenges of robotic motion intelligence by significantly reducing the learnable parameters by 10 times while preserving sufficient accuracy compared to existing deep learning solutions.

1. Introduction

Robotic learning lies at the intersection of conventional robotics and machine learning capabilities. It has typically been limited to end-to-end tasks that optimise narrowly defined objectives in a singular problem setting, such as moving a robotic arm to grasp an object or directing a humanoid robot to a specific location [1,2]. However, with the rise of generative AI capabilities, robotic learning has transitioned into general-purpose models trained on real-world, large-volume datasets of robotic movements [3,4]. Gato [5] was one of the first such generalist robotic models, which learns a generalist agent policy. With multi-modal, multi-task, multi-embodiment generalist policies, this model was able to play video games, derive text captions for images, operate as a chatbot, and conduct robotic arm movements [5]. A recent collaboration between 21 institutions provides direction for ‘X-embodiment robotic learning’, where a high-capacity model can be trained using datasets collected from multiple different robots conducting numerous skills and tasks. They demonstrate this using their robotics transformer model, RT-X, which exhibits significant positive transfer between the 22 different robot datasets for 527 skills and 160,266 tasks [6].
Despite these developments from specific to generalist robotic capabilities and applications, the challenges of computationally complex processing and the operational needs of high data dependency continue to affect the speed, agility, and safety of robots. For example, behavior reflex methods try to construct a direct map from the sensory input to the motion action. For behavior reflex approaches, early works such as [7,8] laid the foundation for using a neural network to map images directly to steering control. More recent work [9] proposes a three-camera system that trains end-to-end via imitation learning. In [10], the authors try to improve the explainability of these end-to-end black boxes by training an imitation learning agent with an attention model. Similarly, in surgical robotic settings, where safety and agility are critical factors, the need for risk-avoidant behaviors; human-in-the-loop, in-control approaches; data integration and integrity across healthcare systems; and safety improvements beyond demonstration data have been highlighted as major gaps [11]. Another study [12] evaluated robustness and zero-shot generalisability in robotic surgery, specifically using the segment anything model (SAM) [13], trained on more than a billion object masks of bounding boxes and points. The study reports significant performance degradation under domain shift, data corruption, perturbations, and complex scenarios, which further evidences the challenges mentioned. Not limited to the precision required of robotic surgery [14], several other sectors have highlighted the need for robotic intelligence in addressing complex physical operations and tasks, such as digital health [15,16], autonomous vehicles [17,18], and smart grids [19,20].
Given the RMI focus of our work, we reviewed several related frameworks to establish a foundational context for our study. In [21], the authors have utilised deep reinforcement learning, where the input is a state vector that has information about the robot’s current position, orientation, relative target position, and the previous action. The motion intelligence system has two neural networks: the actor network and the critic network. The actor network takes the state vector as the input and provides angular and linear speeds adhering to the predefined speed limits. The critic network predicts the cumulative reward that the robot can expect to obtain if it acts according to the current policy starting from the current state. Conceptually, this work is also similar to [22]. Another noteworthy study is [23], which employs an end-to-end deep learning approach, integrating perception with motion command generation. It concatenates the depth information and corresponding RGB image channel-wise to form an RGBD structure and feeds it into an encoder. The encoder, which is based on ResNet-V2, is used to convert that RGBD input into a feature vector, and it is passed to a branched fully connected network called a driving policy network with a manually selected navigational command. The driving policy network outputs a steering angle and longitudinal speed. While these systems are highly capable, a key challenge lies in their lack of explainability regarding how they select actions from the action space based on sensor readings since they use deep neural networks to obtain them.
In order to address the aforementioned challenges of computation, explainability, and data dependency, we propose a novel framework for robotic motion intelligence (RMI) based on vector symbolic architectures (VSAs) and blockchain-based smart contracts. VSAs provide computationally efficient learning and noise suppression during perception, motion, movement, and decision-making tasks, with smart contracts addressing data dependency through decentralized, distributed, and secure transactions. The rest of the paper is organised as follows. Section 2 provides a review of related work on robotic learning, VSAs, and blockchain-based smart contracts, which is followed by Section 3 that presents the proposed RMI framework in terms of its perception, motion, and decisional intelligence, as well as the capabilities of the VSAs and smart contracts in terms of providing computationally efficient and data-efficient RMI. Section 4 presents the experiments and results that validate the capabilities of the proposed framework, and Section 5 concludes the paper.

2. Related Work

The proposed framework draws on recent work in robotic learning, VSAs, and blockchain-based smart contracts. In this section, we focus on a subset of work that has a direct association with the proposed framework and its capabilities.

2.1. Robotic Learning

Robotic learning is primarily based on two forms of learning algorithms: reinforcement learning (RL) and imitation learning (IL) [24,25,26]. Although RL can be more effective than IL, developing a real-world RL model is challenging because of the need to identify a suitable reward function for complex manipulations, to ensure the safety of the operational environment, and to manage the training time required as the state space grows. In contrast, an IL model observes an expert completing the tasks that the robot is required to perform. This observation-based approach reduces the learning time, unlike RL, which has to explore and assess many different scenarios.
IL, or learning from demonstration (LfD), can take three forms: behavioral cloning, inverse reinforcement learning, and direct policy learning. Behavioral cloning is fundamentally a supervised learning approach that uses offline expert demonstrations for training. It can be further split three ways into end-to-end control prediction, direct perception, and uncertainty quantification [27], which involve a direct mapping of sensory inputs to control signals [9,10,28,29]. Inverse reinforcement learning (IRL) is the inference of the hidden preferences of another agent from its observed behaviour; this removes the need to hand-design the reward function required in standard RL [30,31]. IRL has been further extended as feature-based IRL [32,33], entropy-based IRL [34,35], and generative adversarial imitation learning [36]. Direct policy learning (DPL) addresses the limitations of behavioral cloning by querying an expert at training time [37,38] to access information across all states in real time. The DAgger algorithm is a pioneering DPL method [39]; as an online imitation learning algorithm it was computationally intensive, and [40] therefore proposed SafeDAgger to minimize the number of queries to the expert.
Most DPL has been conducted as simulations to ensure access to experts [27], such as observational imitation learning (OIL) [41] and Sim4CV [42], a simulation environment used to evaluate OIL.
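The expert-querying loop that distinguishes DAgger from plain behavioral cloning can be sketched as follows. This is a deliberately minimal illustration under assumed interfaces: the one-dimensional state, the threshold learner, and the random rollout routine are illustrative stand-ins (a faithful DAgger rollout would visit states under the current learner's policy), not the algorithm's original setting.

```python
import numpy as np

rng = np.random.default_rng(3)

def expert_policy(s):
    """Toy expert for a 1-D task: steer right (1) if the state is positive."""
    return 1 if s > 0 else 0

def train(states, labels):
    """Fit a simple threshold classifier on the aggregated dataset."""
    xs, ys = np.array(states), np.array(labels)
    if (ys == 1).any() and (ys == 0).any():
        thr = (xs[ys == 1].min() + xs[ys == 0].max()) / 2
    else:
        thr = 0.0
    return lambda s: 1 if s > thr else 0

def rollout(policy, k=20):
    # Stand-in for executing the learner in the environment; a faithful
    # DAgger rollout would collect the states visited under `policy`.
    return list(rng.uniform(-1, 1, size=k))

def dagger(n_iters=5):
    """Aggregate expert-labelled states from the learner's own rollouts
    and retrain on the growing dataset each iteration."""
    states, labels, policy = [], [], None
    for _ in range(n_iters):
        visited = rollout(policy)
        states += visited
        labels += [expert_policy(s) for s in visited]  # online expert queries
        policy = train(states, labels)
    return policy

learner = dagger()
assert learner(0.5) == 1 and learner(-0.5) == 0
```

The cost SafeDAgger targets is visible here: every visited state triggers an expert query, which is expensive outside simulation.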
Deep learning is associated with robotic learning, but it is primarily focused on robotic vision, through scene representation, spatiotemporal vision, visual place recognition, semantic segmentation, and object detection. Diverse challenges have been solved using deep learning models, from encoding scene information to performing navigation [43]. Deep convolutional neural network (CNN) receptive-field-based deep features have been used to perform image recognition [44]. Another study proposed a CNN-based appearance-invariant image encoding system that encodes each image into a 128-dimensional vector such that images of the same location under severe weather changes remain in proximity [45]. The closed-set limitation in computer vision has been addressed with a CNN coupled with one-vs-all classifiers [46], and scene representation through object-level information using CNNs has also been demonstrated [47]. Despite these capabilities, deep learning approaches suffer from high resource consumption in terms of processing power and memory [43], which is further amplified in industrial cyber–physical environments [48]. Recent studies, such as [49,50], have introduced multi-objective optimization models as an alternative to deep learning approaches for disassembly line balancing, focusing on minimizing idle rate, cost, and energy consumption.
To develop robust and efficient robot motion intelligence systems, various approaches have been proposed. The NEAT system [51] consists of three primary components: the encoder, the neural attention field, and the decoder. The encoder extracts image features from multiple camera inputs while incorporating the robot’s velocity, utilizing a ResNet-based convolutional neural network (CNN) and a transformer. The neural attention field, conditioned on a given target location and time, identifies the most relevant visual features for trajectory planning. The decoder processes the output of the neural attention field to perform two key functions: semantic understanding and motion planning. It detects objects and their spatial locations within the environment, referred to as semantics, and determines the appropriate motion trajectory by predicting offsets in the bird’s-eye-view (BEV) plane, represented as x and y displacements. Both the neural attention field and the decoder are implemented using deep neural networks.
The learning from all vehicles (LAV) [52] system has three key modules: the perception module, the motion planner, and the low-level controller. The perception module processes input from RGB camera images and LiDAR sensor data to construct a robust and generalizable semantic map of the environment. The motion planner utilizes this semantic map, along with a high-level command (e.g., “turn left” or “go straight”) and GNSS (global navigation satellite system) goal coordinates, to generate future waypoints that define the vehicle’s trajectory. This is achieved through a sequential architecture employing two GRU (gated recurrent unit) networks. Finally, the low-level controller translates the planned trajectory into driving commands while avoiding collisions.
The unified autonomous driving (UniAD) [53] framework consists of four key stages: backbone, perception, prediction, and planning. The backbone module processes multi-view camera images and converts them into a unified BEV feature representation using the BEV encoder from BEVFormer [54]. The perception stage comprises two parallel components, TrackFormer and MapFormer, both of which take the BEV feature set as input. TrackFormer, inspired by MUTR3D [55], performs 3D object detection and multi-object tracking to identify and track objects across frames. MapFormer is responsible for modeling both the ego vehicle (the self-driving car itself) and surrounding objects, while also performing panoptic segmentation to identify lanes, other vehicles, and crossings. The prediction stage includes MotionFormer and OccFormer. MotionFormer takes the outputs from TrackFormer, MapFormer, and BEVFormer to predict future trajectories of all objects, including the ego vehicle. It generates possible travel paths for each identified object within the BEV space, primarily leveraging transformer networks. The OccFormer encodes object-specific motion information from the MotionFormer into the scene representation learned by the BEVFormer and predicts future environmental occupancy. The output is a BEV grid where each grid cell indicates whether it will be occupied in the future and, if so, which object will occupy it. Finally, the planning stage generates the vehicle’s future waypoints using a trajectory planner. It integrates BEV features with OccFormer outputs to generate an optimized 2D BEV trajectory, ensuring a collision-free and feasible path. Every component in the UniAD system is implemented using transformers.

2.2. Vector Symbolic Architectures

Symbolic computing with large vectors to solve computational problems is the foundational notion of vector symbolic architectures [56]. These large vectors are known as hypervectors, and the paradigm is also referred to as hyperdimensional computing; the dimensionality is much larger than needed to distinguish between the number of representations. The type of VSA influences the nature of the hypervectors in use. For example, in a multiply-add-permute bipolar architecture, the hypervectors have components from {−1, +1} [57]. The storage of compositional structures comprising multiple hypervectors in a single hypervector is facilitated by the distribution of information across many dimensions [57]. Randomly chosen hypervectors are also quasi-orthogonal due to their high dimensionality [58], and carefully curated operations manipulate hypervectors to produce new hypervectors in the same VSA space. The operations of a typical VSA are similarity, binding, unbinding, and superposition.
  • Similarity ⊙ is the operation to measure how close two hypervectors are in the vector space they occupy. For two randomly selected vectors u, v with positive and negative components, the similarity is expected to be close to zero: u ⊙ v ≈ 0.
  • Binding ⊗ operates on two (or more) vectors and associates them together: w = u ⊗ v. Binding is commonly used to associate concepts such as role–filler pairs. Bound vectors can later be retrieved using the unbinding operation.
  • Unbinding ⊘ reverses the binding operation, i.e., if w = u ⊗ v, then w ⊘ v ≈ u.
  • Superposition ⊕ operates on two hypervectors and combines them: u ⊕ v = w. The purpose of superposition is to accumulate concepts of a certain entity to generate a holistic representational hypervector.
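As a concrete illustration, the four operations can be realized in a bipolar multiply-add-permute (MAP) style architecture. This is a minimal sketch with an assumed dimensionality: binding is element-wise multiplication (self-inverse for bipolar vectors), superposition is the sign of the element-wise sum, and similarity is the normalized dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality (assumed for the example)

def random_hv():
    """Random bipolar hypervector with components from {-1, +1}."""
    return rng.choice([-1, 1], size=D)

def similarity(u, v):
    """Normalized dot product; close to zero for unrelated random vectors."""
    return u @ v / D

def bind(u, v):
    """Element-wise multiplication; self-inverse in the bipolar model."""
    return u * v

def unbind(w, v):
    """Binding is its own inverse for bipolar vectors: (u * v) * v = u."""
    return w * v

def superpose(*vs):
    """Element-wise majority (sign of the sum) of the inputs."""
    return np.sign(np.sum(vs, axis=0)).astype(int)

u, v = random_hv(), random_hv()
w = bind(u, v)
assert abs(similarity(u, v)) < 0.05        # random vectors are quasi-orthogonal
assert similarity(unbind(w, v), u) == 1.0  # exact recovery after unbinding
s = superpose(u, v)
# the superposition remains similar to each of its constituents
assert similarity(s, u) > 0.3 and similarity(s, v) > 0.3
```

Note that the bound vector w is itself quasi-orthogonal to both u and v, which is what allows multiple role–filler pairs to be superposed without interference.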
VSAs have been developed into diverse AI models and applications, such as in-memory computing [59], hand gesture recognition [60], and multi-channel time series classification [61]. VSA-based learning algorithms have also been proposed, such as the hyperseed algorithm for unsupervised learning [62], the generalization of tri-state learning [63], and incremental learning [64]. The distributed nature of representations in VSAs enables them to be highly noise tolerant and, as a result, suitable for cognitive processing tasks like decision making and adaptation. Some of these tasks are command, control, communications, computers, cyber, intelligence, surveillance, and reconnaissance (C5ISR) and the observe, orient, decide, and act (OODA) loop [65]. As discussed in [65], in C5ISR and OODA contexts, VSAs can simulate cognitive processes at the network edge and can enhance autonomy and decision superiority by integrating heterogeneous data streams. OODA frameworks using VSAs [65] can handle multi-sensor data when performing complex and time-sensitive operations. They encode heterogeneous sensor inputs into high-dimensional semantic representations. This simplifies the complex computations used in sensor fusion techniques, since VSA operations can be used to combine different sensor vectors. As time-efficient sensor fusion methods, VSAs enhance the performance of cognitive models for the observation and orientation phases of the OODA loop [65,66]. Another advantage of the semantic vector space generated by VSAs is in applications such as neuromorphic event-based cameras [67], where frame-based CMOS sensors are integrated with VSAs to offer bandwidth-efficient scene understanding capabilities.

2.3. Blockchain-Based Smart Contracts

Distributed ledger technologies (DLTs) encompass a range of capabilities that facilitate the recording, verification, and management of transactions across a distributed network. Unlike traditional centralized databases, where a single entity has control over the data, DLTs rely on a network of distributed nodes that collectively maintain and update the ledger. Blockchain, a specific implementation of a DLT, organizes data into a series of linked blocks, each containing a set of transactions [68]. This structure creates a chronological chain of records that is resistant to tampering and fraud. The decentralized nature of DLTs and blockchain ensures that no single participant has control over the entire network, which enhances security, transparency, and trust in the recorded data. Overall, blockchain technology has developed from theoretical concepts to practical applications that enhance data integrity, transparency, and security. Its ability to immutably record transactions and manage digital assets has created a robust infrastructure for verifying ownership and preventing unauthorized modifications.
Nick Szabo introduced the notion of smart contracts [69], which are self-executing contracts with encoded rules. This concept later influenced Bit Gold [70], a precursor to modern cryptocurrencies based on decentralized hashing algorithms. More recently, Ethereum has materialised Szabo’s concept of smart contracts. Ethereum’s protocol allows for the deployment of self-executing computer programs on the blockchain, expanding the technology’s use beyond simple transactions to a wide array of decentralized applications. This advancement enabled the recording and management of complex activities on-chain, including the verification and enforcement of digital tokens and intellectual property. The working mechanism of a smart contract involves writing the contract’s terms in code, typically using a programming language like Solidity for platforms such as Ethereum. Once the contract is coded, it is deployed on a blockchain, where it becomes a decentralized application that can automatically execute predefined actions. When the contract’s conditions are met, such as receiving a specific payment or event trigger, it self-executes without any intermediary, carrying out the agreed-upon actions. The transaction is then recorded on the blockchain, ensuring transparency, security, and immutability. This process eliminates the need for third parties, reduces errors, and ensures integrity.
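The execution flow just described can be illustrated with a deliberately simplified, in-memory stand-in. Real smart contracts are written in languages such as Solidity and run on a consensus-backed blockchain; the hash-linked ledger, escrow terms, and party names below are purely hypothetical, and the sketch only mirrors the "condition met, then self-execute, then record on chain" sequence.

```python
import hashlib, json, time

chain = []  # in-memory stand-in for a blockchain ledger

def append_block(transactions):
    """Link a new block to the previous one via its SHA-256 hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "time": time.time(),
             "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    chain.append(block)
    return block

class EscrowContract:
    """Terms are code: payment is released only when the condition is met."""
    def __init__(self, seller, price):
        self.seller, self.price, self.settled = seller, price, False

    def deposit(self, buyer, amount):
        # self-executing clause: no intermediary checks the condition
        if not self.settled and amount >= self.price:
            self.settled = True
            append_block([{"from": buyer, "to": self.seller,
                           "amount": self.price}])

contract = EscrowContract(seller="robot_A", price=10)
contract.deposit("robot_B", amount=4)    # condition unmet: nothing recorded
contract.deposit("robot_B", amount=10)   # condition met: transfer is recorded
assert contract.settled and len(chain) == 1
```

Because each block embeds the hash of its predecessor, retroactively altering a recorded transaction would invalidate every subsequent hash, which is the tamper-resistance property the surrounding text relies on.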
Smart contracts can improve the security, efficiency, and coordination of robotic systems, such as securing robot swarms against malicious Byzantine robots that can disrupt swarm behaviour [71,72]. Smart contracts serve as “meta-controllers” that govern the swarm’s collective decision-making and sensing tasks to ensure resilience against Byzantine failures. Smart contracts further enable secure and efficient multi-robot collaboration and coordination [73,74]. By recording robot interactions and status on the blockchain, smart contracts can facilitate tasks such as secure docking, resource sharing, and task allocation among heterogeneous robots. This enhances the security and reliability of industrial robotic applications. Furthermore, blockchain technology can support the integration of robotics with other Industry 4.0 technologies, such as the Internet of Things (IoT) and cloud computing [75]. Smart contracts can automate payments, data management, and other interactions between robots, IoT devices, and cloud services in a decentralized manner [76].

3. The Proposed RMI Framework

Robotic motion systems can be broadly classified into mediated perception and behavior reflex. Mediated perception [77] identifies the motion task as a combination of elements; for instance, autonomous driving is dissected into several visual elements of lanes, vehicles, pedestrians, traffic lights, etc., using an AI-based decision-maker [78]. The recognition of these visual elements is combined to generate a representation of the world so that the AI-based decision-maker can generate an informed steering decision. These systems are modular and usually decouple the path planning and control from the initial perception step [79]. Most autonomous navigation is based on mediated perception approaches [80].
The proposed RMI framework is illustrated in Figure 1. It is composed of three primary components—perception, motion control, and the agent. By abstracting the intricacies of robotic motion in this manner, we gain a clearer understanding of its internal modularity. More importantly, this abstraction provides a structured framework, highlighting specific areas where artificial intelligence can be integrated and further developed to enhance the capabilities of the system. The main components of the robotic motion intelligence are as follows: (1) perception intelligence, (2) motion control intelligence, and (3) decisional intelligence.
Perception intelligence is responsible for state estimation, a multifaceted process that encompasses the recognition of scenes and objects, precise localization, and meaningful interaction with humans. This capability enables robots to integrate and operate within dynamic environments. Perception intelligence extracts and processes data from the environment through sensors embedded within the robot. Perception intelligence can be unpacked based on the constituent sensors as follows:
  • Visual sensors for still images and video footage, facilitating dynamic vision capabilities. Through these sensors, robots can discern and interpret their surroundings, recognizing patterns, obstacles, and entities.
  • Depth sensors for spatial understanding. These sensors aid in tasks like simultaneous localization and mapping (SLAM) and 3D mapping. By gauging depth, robots can navigate complex terrains and avoid potential obstructions.
  • Audio sensors—besides sound detection, these sensors are equipped for voice and speech recognition, natural language processing (NLP), and noise suppression. This not only enables robots to understand human commands but also to interact in noisy environments.
  • Touch sensors to detect tactile feedback, such as pressure variations and collisions. By sensing touch, robots can navigate safely, avoiding undue pressure on objects and preventing potential damage during interactions.
Motion control intelligence is the second foundational abstraction of RMI, and focuses on the robot’s ability to move and interact with its environment in a purposeful and efficient manner. Motion control intelligence ensures that movements are not just reactive but are proactive, predictive, and purpose-driven, allowing for optimal performance in diverse settings. It is composed of the following:
  • Grasp Detection: The robot’s ability to hold an object using optimal force and positioning. This involves identifying and reaching the proposed pose of the final actuator with the highest likelihood of achieving a successful grasp. This ensures that the robot can interact with various objects securely and effectively.
  • Path Planning: Determining a route is a dynamic and responsive task. It requires the robot to continuously adapt its path based on real-time changes in the environment, ensuring safe and efficient navigation in unpredictable settings.
  • Trajectory Planning: While path planning determines the route, trajectory planning determines how the robot will traverse that path, specifying the sequence of movements, speeds, and orientations the robot should adopt to reach its destination.
Decisional intelligence is the cognitive layer of robotic AI that bridges perception and action. While perception intelligence interprets the environment and motion control intelligence dictates how a robot moves within it, decisional intelligence determines the ‘why’ and ‘when’ behind those movements, based on its understanding of the environment. Central to decisional intelligence is the training of the underlying intelligent agent. This training leverages advanced techniques such as the following:
  • Reinforcement Learning (RL): A robotic learning technique for learning by interacting with the environment. Through a system of rewards and penalties, the agent learns to make decisions that maximize a certain objective over time. It is akin to teaching a robot through trial and error, allowing it to understand the consequences of its actions and refine its decision-making process accordingly.
  • Imitation Learning: A robotic learning technique where robots learn from demonstrations, often by humans or other trained agents. Instead of learning from scratch, they leverage existing knowledge, adapting and refining it based on their own experiences.
The primary responsibility of decisional intelligence is the mapping of perception to motion signals, by translating what the robot ‘sees’ and ‘understands’ into actionable movements. For instance, if perception intelligence identifies an obstacle, decisional intelligence determines whether to navigate around it, jump over it, or stop and reassess. It is this decision-making capability that ensures robots move and interact with purpose, strategy, and adaptability.
This RMI framework encapsulates the synergies of perception intelligence, motion control intelligence, and decisional intelligence. It represents these three foundational concepts, orchestrating their interactions to achieve effective robotic functionalities. RMI leverages AI capabilities to initiate, manage, action, and optimise perception, motion, and the intelligent agency of a robot. Functioning as a central hub, RMI controls the flow of information. (1) From perception to agent, RMI ensures that the data and insights from the environment through perception intelligence are effectively channeled to the decision-making agent. This ensures that the agent’s decisions are always informed by the most recent and relevant environmental data. (2) From agent to motion control—once the agent makes a decision based on the perceived environment, RMI oversees the translation of these decisions into actionable motion commands, ensuring they align with the robot’s capabilities and safety protocols. Besides task completion, RMI provides a safety layer, vetting the decisions made by the agent. By doing so, it ensures that the robot’s actions are not only efficient but also safe, mitigating potential risks. RMI also maintains an internal representation of the environment. This dynamic model allows the robot to anticipate changes and adapt accordingly. For instance, if there’s a drastic alteration in the environment, RMI has the authority to enforce modifications to the perception module, ensuring the robot’s understanding remains accurate. Furthermore, the bidirectional communication between motion control and RMI is crucial. Supplemented with this information, RMI can make informed decisions about which control signals from the agent should be filtered or modified before being relayed to the actuators. 
By orchestrating the interactions between perception, decision-making, and motion, RMI ensures that robots are proactive and adaptive in response to the complexities of diverse environments.
This RMI framework becomes computation-efficient through VSA capabilities and data-efficient through blockchain-based smart contracts. As depicted in Figure 2, perception intelligence, motion control intelligence, and decisional intelligence are consolidated into computation-efficient VSA representations, which then feed into RMI actions. These RMI actions become data-efficient within the blockchain-based smart contracts in the top layer. The following subsections delineate these two capabilities in the context of a navigational robot, equipped with a camera, depth and perception sensors, and motor-powered wheels for motion. Certain movement restrictions are assumed in the actuators that are imposed for the safety of the hardware or due to the nature of the actuators.

3.1. Perception Intelligence

Given the large diversity of sensors used in robotic perception, several VSA methods have been proposed to encode those sensor inputs into hypervectors [81,82,83]. Multiple sensors can be fused together by mapping all sensor output vectors to the same dimension by a random projection and combining them using VSA operations to generate a single sensor vector [65].
s′(m×1) = R(m×n) s(n×1)    (1)
As shown in Equation (1), an n-dimensional sensor reading s can be mapped to an m-dimensional hypervector s′ using a random matrix R. Each sensor is given a label hypervector, which is used to associate each sensor output with its sensor during the fusion of data. If the sensor data is a scalar (e.g., temperature or humidity), the label vector is scaled with the sensor reading, or different random hypervectors can be assigned for different bins of the readings. Once all the sensor readings are mapped to hypervectors, the sensor output is fused by superimposing the bound vectors of each label and sensor reading hypervector as follows:
$$\mathbf{v}_{\text{perception}} = \sum_{i} \mathbf{s}_{i} \otimes \mathbf{b}_{i} \tag{2}$$
where b i is the label vector for sensor i. We call the resulting hypervector v perception the "perception vector". For example, consider the sensors in Table 1, which are typical of most robots.
In Table 1, all sensor label hypervectors are d-dimensional random hypervectors chosen from the target hyperdimensional space to represent each sensor. At any given time instance, the readings from the sensors lie in spaces of different dimensionality. Therefore, as per Equation (1), the sensor readings have to be mapped to the target d dimensions as follows:
$$\mathbf{s}^{(\cdot)}_{\text{camera},\, d \times 1} = \mathbf{R}^{(1)}_{d \times n}\, \mathbf{s}^{(\cdot)}_{\text{camera},\, n \times 1} \tag{3}$$
$$\mathbf{s}^{(\cdot)}_{\text{lidar},\, d \times 1} = \mathbf{R}^{(2)}_{d \times m}\, \mathbf{s}^{(\cdot)}_{\text{lidar},\, m \times 1} \tag{4}$$
$$\mathbf{b}'_{\text{sonar},\, d \times 1} = d_{\text{sonar}} \cdot \mathbf{b}_{\text{sonar},\, d \times 1} \tag{5}$$
$$\mathbf{b}'^{(\cdot)}_{\text{acc},\, d \times 1} = a^{(\cdot)} \cdot \mathbf{b}^{(\cdot)}_{\text{acc},\, d \times 1} \tag{6}$$
R ( 1 ) and R ( 2 ) are random matrices chosen to project the raw sensor readings into the d-dimensional hypervector space. These matrices are kept constant for each sensor so that subsequent readings are projected consistently. Scalar sensor readings are used to scale their label vectors. After the projections are complete, the resulting hypervectors can be bundled as per Equation (2) to generate the perception vector as follows:
$$\mathbf{v}_{\text{perception}} = \sum_{i \in \{R,G,B\}} \mathbf{s}^{(i)}_{\text{camera}} \otimes \mathbf{b}^{(i)}_{\text{camera}} + \sum_{i \in \{X,Y,Z\}} \mathbf{s}^{(i)}_{\text{lidar}} \otimes \mathbf{b}^{(i)}_{\text{lidar}} + \mathbf{b}'_{\text{sonar}} + \sum_{i \in \{X,Y,Z\}} \mathbf{b}'^{(i)}_{\text{acc}} \tag{7}$$
Similarly, we could integrate further sensors into the perception intelligence with minimal changes to the framework, enhancing adaptability for different robotic frameworks and tasks.
The perception vector encodes knowledge of the surrounding environment and is the main output from the perception module to the central RMI module. The RMI has the option to engage with this representation or pass it directly to the agent.
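As a concrete illustration of Equations (1)-(7), the following Python sketch builds a perception vector with NumPy. The sensor dimensions, random seed, and channel names are illustrative assumptions, and binding is implemented as circular convolution in the style of HRRs (the real-valued VSA used in Section 4), not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000  # target hypervector dimensionality

def rand_hv(dim=d):
    # HRR-style hypervector with elements ~ N(0, 1/dim), so its norm is ~1
    return rng.normal(0.0, 1.0 / np.sqrt(dim), dim)

def bind(a, b):
    # HRR binding: circular convolution, computed via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Fixed Gaussian random projection matrix (Equation (1)); one per vector-valued sensor
n_cam = 64                                          # raw camera feature dimension (illustrative)
R_cam = rng.normal(0.0, 1.0 / np.sqrt(n_cam), (d, n_cam))

# Label hypervectors, one per sensor channel
b_cam, b_sonar = rand_hv(), rand_hv()

# One reading per sensor: a camera feature vector and a scalar sonar range
s_cam = R_cam @ rng.normal(size=n_cam)              # projected to d dimensions
sonar_range = 2.7                                   # scalar reading scales its label vector

# Equation (7): superimpose the bound (reading, label) pairs into one perception vector
v_perception = bind(s_cam, b_cam) + sonar_range * b_sonar
assert v_perception.shape == (d,)
```

Additional sensors would simply contribute further bound terms to the superposition, which is what makes the fusion step extensible.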

3.2. Motion Control Intelligence

The motion control module is unique to each type of robot. Defining a high-level action space is important for the agent as this is used as an action space to provide instructions through the RMI module. The motion control module uses the given instruction and performs motion planning and execution. As an example, for wheeled or legged robots, forward-backward or left-turn right-turn motions are typical actions, whereas for aerial robots, vertical, horizontal, and diagonal movement are typical actions. Therefore, in a navigational robot setting, slow movement, fast movement, turns, and stopping are suitably defined as the action space. Each such action can be represented as a random hypervector and be used for further computations.
Certain robots have limitations on the action space. For example, turning a navigational robot traveling at high speed can be unsafe for the robot and its surroundings because of the increased risk of slipping. Another example is a robot arm moving through its own body space, where it might collide with itself. To encode this type of restriction in a VSA, where a pair of actions is restricted in both orders, we can use the following VSA operations, which result in the "motion intelligence" vector.
$$\mathbf{v}_{\text{motion}} = \sum_{(i,j)} \mathbf{a}_{i} \otimes \mathbf{a}_{j} \tag{8}$$
where a i and a j are two hypervectors representing actions from the action space that are prohibited from being performed in sequence, in either order: the i-th action cannot be followed by the j-th action, and vice versa. Note that Equation (8) can be adapted as required; here we encode only a basic constraint for illustration.
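A minimal sketch of Equation (8), together with the similarity check the RMI later uses to flag invalid transitions, might look as follows. The action names and dimensionality are illustrative assumptions, and binding is HRR circular convolution; since circular convolution is commutative, one bound pair rules out the transition in both orders:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10_000

def rand_hv():
    # HRR-style hypervector with elements ~ N(0, 1/d)
    return rng.normal(0.0, 1.0 / np.sqrt(d), d)

def bind(a, b):
    # HRR binding: circular convolution via FFT (commutative)
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

actions = {name: rand_hv() for name in ("slow", "fast", "turn", "stop")}

# Encode the restricted pair (fast, stop) as in Equation (8)
v_motion = bind(actions["fast"], actions["stop"])

# A candidate transition is flagged as invalid when its similarity to v_motion is high
assert cos_sim(bind(actions["fast"], actions["stop"]), v_motion) > 0.9
assert abs(cos_sim(bind(actions["slow"], actions["turn"]), v_motion)) < 0.1
```

Further restricted pairs would be superimposed onto the same motion intelligence vector.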

3.3. RMI Agent and Decisional Intelligence

The RMI agent makes decisions regarding the motions of the robot. It takes inputs from the perception module and provides actions to the motion control module via the RMI. The agent receives the perception vector and is tasked with determining which action the robot should perform. To learn which positions correspond to which actions, we employ the following methodology, inspired by [84]: for each action, we bind the action vector with the perception vector in the following manner and superimpose all the results over the complete task.
$$\mathbf{v}_{\text{agent}} = \sum_{t'=1}^{T} \mathbf{v}^{(t=t')}_{\text{perception}} \otimes \mathbf{a}^{(t=t')} \tag{9}$$
where T is the total number of simulation timesteps and a^(t=t') is the action assigned from the action space at timestep t = t'. We call v agent the "decisional intelligence" vector. This vector can be seen as the robot's knowledge of the task.
During inference, the perception module captures the environment, creates the perception vector, and passes it to the agent. This vector is then unbound from the decisional intelligence vector, followed by a clean-up memory step to recover the action corresponding to the current location.
$$\mathbf{v}_{\text{agent}} \oslash \mathbf{v}^{(t=t')}_{\text{perception}} = \mathbf{a}^{(t=t')} + \mathbf{R} \approx \mathbf{a}^{(t=t')} \tag{10}$$
where R is a noisy hypervector. The result of the unbinding operation can be cleaned by calculating its similarity to each action vector and choosing the best match. Once the action is recovered from the decisional intelligence vector, the action vector is passed to the RMI.
This module further facilitates communication between the other modules while utilizing high-level planning to ensure the goal is achieved and the safety of the robot is preserved. Robots are susceptible to safety threats originating from their operational environment or from internal control instructions. Threats from the operational environment can be mitigated by maintaining accurate and near-real-time communication between the perception module and the agent, so that the agent can determine the best sequence of actions to preserve the safety of the robot. Even for a well-trained agent, the action can be ambiguous when an unfamiliar location is encountered. VSA operations can provide a confidence measure for the action proposed by the agent, using the similarity of the recovered result of the unbinding operation in Equation (10). If the action has low confidence, the RMI has the option to seek an alternative approach to recover an action or fall back to a safer action. One alternative when the recovered action has low confidence (i.e., a low similarity to any of the action vectors) is to recover the average perception vector for each action in the action space and compare it against the current location.
$$\mathbf{v}^{(i)}_{\text{perception(avg)}} = \mathbf{v}_{\text{agent}} \oslash \mathbf{a}_{i} \tag{11}$$
Once the average perception vector for each action is calculated using Equation (11), its cosine similarity to the current perception vector can be computed and the action with the best match identified as an appropriate action.
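The training and inference steps of Equations (9)-(11) can be sketched as follows. The timestep count, seed, and action labels are illustrative assumptions; binding is HRR circular convolution and unbinding is its approximate inverse, circular correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10_000

def rand_hv():
    return rng.normal(0.0, 1.0 / np.sqrt(d), d)

def bind(a, b):
    # HRR binding: circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, b):
    # Approximate HRR inverse: circular correlation (conjugate in the Fourier domain)
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(b))))

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

actions = {name: rand_hv() for name in ("slow", "fast", "stop")}

# Training (Equation (9)): bind each timestep's perception to its labelled action
perceptions = [rand_hv() for _ in range(5)]
labels = ["slow", "fast", "fast", "slow", "stop"]
v_agent = sum(bind(p, actions[a]) for p, a in zip(perceptions, labels))

# Inference (Equation (10)): unbind the current perception, then clean-up memory
noisy = unbind(v_agent, perceptions[2])
recovered = max(actions, key=lambda a: cos_sim(noisy, actions[a]))
assert recovered == "fast"

# Fallback (Equation (11)): recover the average perception per action and
# compare it against the current perception vector
avg_sims = {a: cos_sim(unbind(v_agent, actions[a]), perceptions[2]) for a in actions}
assert max(avg_sims, key=avg_sims.get) == "fast"
```

The clean-up step works because the unbound result is the correct action vector plus noise whose similarity to every stored action is near zero at this dimensionality.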
The actions generated by the agent are based on the robot's perception. This introduces the risk of producing infeasible or harmful action sequences, as the agent has no knowledge of the limitations of motion control. Here, the RMI acts as a mediator: actions generated by the agent are matched against the motion intelligence vector, ensuring that only safe action sequences are permitted for execution. The RMI stores and records all actions given to the motion control module in its blockchain ledger. As soon as a new action is generated by the agent, the RMI binds the previous action to the new action and compares the result's similarity to the motion intelligence vector. A high similarity indicates that the sequence of actions is invalid or cannot be handled by the actuators; in that case, the RMI can investigate an appropriate alternative or fall back to a safer default action.
$$\text{sim}\left( \mathbf{a}^{(t=t')} \otimes \mathbf{a}^{(t=t'-1)},\ \mathbf{v}_{\text{motion}} \right) \tag{12}$$

3.4. Blockchain-Based Smart Contracts

Smart contracts provide an immutable record of perception intelligence, motion control intelligence, and decisional intelligence within the proposed RMI framework. The smart contract maintains a local blockchain that records the progression of the three VSA vectors delineated above: perception ( v perception ), motion control ( v motion ), and decisional intelligence ( v agent ). In a single-robot setting, the smart contract ensures an immutable record of how intelligence vectors evolve over time and the blockchain creates a verifiable history of the robot’s learning progression. This action-based record becomes important for analyzing the development of understanding and decision-making capabilities. Through smart contracts, the blockchain also enforces safety constraints by validating motion sequences before execution, ensuring that potentially harmful actions are prevented before they occur. Furthermore, the permanent recording of each decision’s context, confidence level, and outcome enables comprehensive analysis for continuous improvement of the robot’s performance. For instance, Figure 3 presents the role of the smart contract in enforcing safety constraints of the RMI framework. The RMI module is programmed to generate a transaction in the blockchain ledger before a motion vector is passed to the motion control module. Each block generated by such transactions has a perception vector, which is passed to the agent; a decisional vector, which is provided by the agent using the above perception vector; and a motion control vector passed by the RMI module to the motion control module based on that decisional vector. It also includes a timestamp to record when the motion control vector was passed to the motion control module. 
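As an illustrative sketch (not the implementation used in the paper), the contents of one such block can be modelled as a hash-chained record; the class and field names are assumptions:

```python
from dataclasses import dataclass
import hashlib
import json

@dataclass(frozen=True)
class RMIBlock:
    index: int
    prev_hash: str           # hash of the preceding block, forming the chain
    perception: list[float]  # perception vector passed to the agent
    decision: list[float]    # decisional vector provided by the agent
    motion: list[float]      # motion control vector sent to the motion control module
    timestamp: float         # when the motion control vector was passed on

    def block_hash(self) -> str:
        # SHA-256 over the serialized block contents; any change alters the hash
        payload = json.dumps([self.index, self.prev_hash, self.perception,
                              self.decision, self.motion, self.timestamp])
        return hashlib.sha256(payload.encode()).hexdigest()

genesis = RMIBlock(0, "0" * 64, [], [], [], 0.0)
block1 = RMIBlock(1, genesis.block_hash(), [0.1], [0.2], [0.3], 1.0)

# Each block commits to its predecessor, so tampering with an earlier
# record invalidates every later hash in the chain
assert block1.prev_hash == genesis.block_hash()
```

In practice, the high-dimensional vectors would be stored in compressed or hashed form rather than as raw lists.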
As illustrated under robot location C in Figure 3, the RMI module will change the decision vector taken by the agent as explained in Section 3.3 since executing the ‘stop’ action right after the ‘travel faster’ action is defined as an invalid sequence of actions. It uses the latest block in the ledger to get the previous motion control vector. Smart contracts are used in the RMI module to prevent the execution of invalid action sequences and to ensure the immutability of the safety constraints. Furthermore, in multi-robot scenarios, where robots operate in swarms, smart contracts are effective at transforming individual action outcomes into collective/shared knowledge. Effective and safe perceptions, motion sequences, and decision strategies discovered by one robot become available to other robots in the swarm, which then accelerates the learning process for the entire swarm. This includes safety-related discoveries, such as unsafe navigational actions or environmental hazards that are propagated immediately through the smart contract network to improve the safety of the entire swarm. The knowledge sharing process occurs through blockchain synchronization when robots are within communication range, allowing each robot to benefit from the collective intelligence while maintaining its autonomy. This creates a form of distributed robotic intelligence in which:
$$\mathbf{v}_{\text{shared}} = \sum_{r \in C(t)} \mathbf{v}_{r} \tag{13}$$
where v shared represents the collective RMI vector formed by superimposing individual RMI vectors v r within the cluster C ( t ) . This mechanism ensures that learning experiences propagate through the swarm while maintaining the computational efficiency of VSA operations. The combination of VSAs’ efficient vector operations with blockchain’s secure, distributed record-keeping creates a robust framework for both individual robot intelligence and swarm learning.
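The superposition in Equation (13) can be sketched directly; the cluster size, seed, and dimensionality below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10_000

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Individual RMI vectors for the robots currently in cluster C(t)
cluster = [rng.normal(0.0, 1.0 / np.sqrt(d), d) for _ in range(5)]

# Equation (13): the shared vector is the superposition of the cluster members
v_shared = np.sum(cluster, axis=0)

# Each contribution remains recoverable: for k superimposed random vectors,
# the similarity of a member to the shared vector is roughly 1/sqrt(k),
# well above the near-zero similarity of an unrelated vector
for v in cluster:
    assert cos_sim(v_shared, v) > 0.3
```

This graceful degradation with cluster size is what lets the swarm pool knowledge without growing the representation.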

4. Empirical Evaluation of the RMI Framework

We conducted two distinct experiments to evaluate the proposed RMI framework: a single navigational robot for a visual place recognition task and multiple navigational robots for a landmark discovery task. The following subsections present the experimental setups and results that confirm the effectiveness of the proposed RMI framework.

4.1. Experiment 1—Single Navigational Robot for Visual Place Recognition

This experiment is based on a single navigational robot that is required to navigate and stop at a specific location upon recognizing it. The agent has three possible actions: traveling more slowly, traveling faster, and stopping. Certain movement restrictions are imposed to prevent sudden movements, such as the robot being unable to stop while moving quickly and unable to accelerate excessively from a stationary position. The visual place recognition (VPR) dataset GardensPointWalking forms the basis for this experiment; it contains 200 images of the same walking path taken from different viewpoints and at different times of day. The robot is initially trained on the day-left images of the dataset, with labelled actions from the action space for each image, using Equation (9) to train a decisional intelligence vector. To represent the images in the perception vector, we use the HDC-DELF approach [82] to extract image features from camera images. From each image, 200 DELF local descriptors [85] are extracted and combined using the method proposed in [82]. The final result is a 1024-dimensional holistic vector combining all local descriptors for each image. This vector is projected to 10,000 dimensions using a Gaussian random matrix as in Equation (1). Since the perception vector contains real-valued elements, we use the holographic reduced representation (HRR) [86] VSA for our RMI implementation. The binding, unbinding, superimposing, and similarity operations are implemented as per [57]. Following the completion of task training, we determined the effectiveness of the RMI by querying with the same day-left dataset to check the retention of information with and without the RMI. We then integrated the capability of the RMI to identify low-confidence actions, followed by the integration of the motion intelligence capability into the RMI and the filtering of action sequences. The results are summarized in Figure 4.
From the results, we can observe how the similarities of all three actions decrease at approximately frame 100, meaning the agent was not able to recover a sufficiently confident action. At this point, the RMI intervenes and tries to find a higher-confidence action by following Equation (11), which in this case was successful. If the agent cannot find a sufficiently high-confidence action after this intervention, it falls back to the defined safe default action, which is slower motion. Around frames 100 and 135, the agent produces unsafe control sequences of stopping while traveling quickly. This triggers the unsafe action check explained in Equation (12), and the RMI falls back to the safe default action. Across all results, it is evident that, compared to the agent's isolated actions, the RMI-initiated actions are safer and more adaptive.
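The intervention logic described above can be sketched as follows. The threshold value and the helper functions are illustrative assumptions (the paper tunes its thresholds empirically), with unbinding again implemented as HRR circular correlation:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 10_000

def rand_hv():
    return rng.normal(0.0, 1.0 / np.sqrt(d), d)

def unbind(c, b):
    # Approximate HRR inverse: circular correlation via FFT
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(b))))

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_action(noisy, actions, v_agent, v_perception, threshold=0.2):
    """Clean-up memory, then the Equation (11) fallback, then the safe default."""
    sims = {a: cos_sim(noisy, v) for a, v in actions.items()}
    best = max(sims, key=sims.get)
    if sims[best] >= threshold:
        return best
    # Low confidence: compare the average perception per action with the current one
    sims = {a: cos_sim(unbind(v_agent, v), v_perception) for a, v in actions.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else "slow"  # safe default action

# With an unfamiliar (unrelated) perception, every similarity is near zero
# and the RMI falls back to the safe default
actions = {name: rand_hv() for name in ("slow", "fast", "stop")}
assert select_action(rand_hv(), actions, rand_hv(), rand_hv()) == "slow"
```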

Parameter Selection

The selection of key parameters in this experiment is based on a balance between computational efficiency and task performance. The dimensionality of the hypervectors in the perception vector was set to 10,000 to ensure sufficient capacity for encoding image features while maintaining manageable computational complexity. The choice of 200 DELF local descriptors was made to strike a balance between feature diversity and redundancy, ensuring that the perception vector captures enough scene information without excessive noise. The Gaussian random projection matrix used for dimensionality reduction was initialized with a fixed random seed to ensure the reproducibility of results across trials. Additionally, we evaluated different similarity thresholds for the retrieval of actions from the decisional intelligence vector, selecting an optimal threshold that minimizes incorrect actions while maintaining adaptive behavior. The selection of action confidence thresholds was also tuned through empirical testing to prevent false positives when filtering out low-confidence actions. Lastly, movement constraints, such as preventing abrupt stops from high-speed motion, were incorporated through predefined rules to ensure safe and realistic robotic behavior. These parameters collectively shape the effectiveness of the RMI framework in navigating the robot while adhering to real-world constraints.
In comparison to similar work such as [51,52,53,87,88], we use only 30,000 parameters for training on the GardensPointWalking dataset: three 10,000-dimensional VSA vectors for the agent, perception, and motion intelligence. We demonstrate the few-shot learning capability of VSAs, achieving the same outcomes as computationally intensive deep learning methods that require 10 times more parameters. See Table 2 for a detailed comparison.

4.2. Experiment 2—Multiple Navigational Robots for Landmark Discovery

This experiment is based on a multi-robot landmark discovery scenario implemented using the ARGoS (Autonomous Robots Go Swarming) [89] robot simulator. While the complete RMI framework incorporates VSA-based perception and motion intelligence, this initial experiment focuses on validating the blockchain-based coordination layer. ARGoS, a multi-physics robot simulator designed specifically for robot swarm experiments, provides realistic physics models and customizable robot controllers, allowing for the efficient simulation of large-scale robot swarms. The experiment involves five robots tasked with identifying and mapping five landmarks distributed across unexplored terrain. Each robot is equipped with proximity sensors for landmark detection and wireless communication capabilities for peer-to-peer networking. All of the robots have the same action space. While robots use basic random walking exploration strategies, ARGoS’s modular architecture enables us to integrate custom control software and external modules, making it ideal for our blockchain-based coordination experiments. Figure 5 depicts this experimental setup. Toychain is a blockchain implementation platform [90]. Originally designed for robot swarm research, Toychain implements core blockchain functionalities while maintaining integration with robotic systems. In our experiment, each simulated robot runs a Toychain node that maintains a local blockchain copy and participates in the consensus process. Toychain’s proof of authority consensus mechanism is particularly suitable for our robot swarm scenario as it provides deterministic block production without the computational overhead of traditional consensus mechanisms like proof of work. The integration between ARGoS and Toychain occurs through a Python-based interface where each robot controller maintains both its physical state in ARGoS and its blockchain state in Toychain. 
The robots’ movement and landmark detection are handled by ARGoS, while discovery sharing and consensus are managed by Toychain. When a robot detects a landmark in the ARGoS simulation, it creates a transaction in its Toychain node, which then propagates through the blockchain network based on the robots’ physical proximity. This setup allows us to study both the physical dynamics of swarm exploration and the logical aspects of a distributed consensus in a unified experimental framework.
The results are visualised in Figure 6 which depicts how five robots maintained their local blockchains and memory pools throughout the experiment. Starting with a genesis block, each robot (R1 to R5) was assigned block generation authority in a round-robin manner. As robots moved around the arena, they formed dynamic clusters that enabled blockchain and memory pool synchronization. The discovery process unfolded across seven blocks, demonstrating how our blockchain implementation handled distributed landmark discovery. It began when R5 found Landmark 5 (L5) and stored it in its memory pool. At this point, R2, who had the authority to create Block 2, was not connected to R5’s cluster and thus generated an empty block. The L5 discovery spread through the network as robots moved and formed new clusters—R2 synced with R3, R3 with R5, and R1 with R5. When R3’s turn came to generate Block 3, it included the L5 discovery and this block was eventually synchronized across most robots. More landmarks were discovered as the robots continued exploring. R2 found L4 and L1, while R3 discovered L3 and L2. These discoveries spread through the network as the robots formed and dissolved clusters based on their movements. R4’s Block 4 included L4 and L1, showing how the system handled multiple discoveries and removed duplicates based on timestamps. At this stage, network partitioning led to a temporary fork—R4 and R5 had the longest chain with four blocks, while others maintained shorter chains. The final stages of discovery showed how the system recovered from network partitions. R5 and R1 generated Blocks 5 and 6, respectively, both empty since their memory pools had no new discoveries. The process was completed when R2 reconnected with R3, received the missing L3 and L2 discoveries, and included them in Block 7. This final block, containing all five landmark locations, eventually propagated to all robots through cluster synchronization.
These results confirm how the RMI framework ensures efficient memory pool management with effective transaction de-duplication, robust handling of temporary network partitions, and successful maintenance of consensus through the proof of authority mechanism. The dynamic formation and dissolution of robot clusters based on physical proximity enabled efficient information propagation without requiring constant connectivity between all robots. The complete process of discovering and recording all five landmarks required seven blocks, with the final blockchain providing a consistent, tamper-proof record of all discoveries across the robot swarm.
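The discovery and synchronization mechanics described above can be sketched as a simplified model. This is not the Toychain API; the class, its methods, and the scenario below are assumptions that mirror the round-robin proof of authority, memory pools, and cluster-based synchronization used in the experiment:

```python
class SwarmNode:
    """Simplified proof-of-authority node with a local chain and memory pool."""

    def __init__(self, name: str):
        self.name = name
        self.chain = [frozenset()]   # genesis block
        self.mempool = set()         # pending landmark discoveries

    def recorded(self) -> set:
        # Union of all discoveries already committed to the chain
        return set().union(*self.chain)

    def discover(self, landmark: str):
        self.mempool.add(landmark)

    def sync(self, other: "SwarmNode"):
        # Cluster synchronization: adopt the longer chain, then merge memory
        # pools, de-duplicating anything already recorded on-chain
        if len(other.chain) > len(self.chain):
            self.chain = list(other.chain)
        elif len(self.chain) > len(other.chain):
            other.chain = list(self.chain)
        pool = (self.mempool | other.mempool) - self.recorded()
        self.mempool, other.mempool = set(pool), set(pool)

    def produce_block(self):
        # The node holding authority packages pending discoveries (possibly empty)
        self.chain.append(frozenset(self.mempool))
        self.mempool.clear()

r1, r2, r3, r4, r5 = (SwarmNode(f"R{i}") for i in range(1, 6))

r5.discover("L5")    # R5 finds Landmark 5 and stores it in its memory pool
r2.produce_block()   # R2 holds authority but is not in R5's cluster: empty block
r5.sync(r2)          # a cluster forms; the pending discovery propagates
r2.produce_block()   # R2's next block records L5
r2.sync(r1)          # R1 catches up through cluster synchronization
assert "L5" in r1.recorded()
```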

4.3. Scalability Analysis

The experiments conducted with five robots demonstrate the feasibility of the blockchain-based coordination layer at small to medium scale. To understand system performance with increasing numbers of robots, we analysed network communication complexity as a primary scaling factor. In a fully connected network where every robot communicates directly with every other robot, the communication overhead would scale quadratically:
$$C_{\text{full}} = O(n^{2}) \tag{14}$$
where n is the number of robots in the swarm. This reflects the fact that, in the worst case, each robot would need to maintain n 1 connections, and across the entire network, there would be n ( n 1 ) / 2 connections. Our dynamic clustering approach significantly reduces this complexity by limiting synchronization to proximity-based groups. If we denote the average cluster size as k, which is relatively stable regardless of total swarm size due to physical proximity constraints, the communication complexity can be approximated as:
$$C_{\text{cluster}} = O(k \cdot n) \tag{15}$$
This represents a substantial improvement in scalability, transitioning from quadratic to linear growth as the swarm size increases. In realistic deployments, k is bounded by communication range limitations and by practical considerations regarding the number of simultaneous connections each robot can efficiently maintain. For example, robots with a communication range of 10 meters deployed across a large area will only communicate with others within that 10 meter radius. If the spatial density remains consistent, each robot will communicate with roughly the same number of neighbors (k), regardless of total swarm size.
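The two regimes can be compared with a small helper; the swarm size and cluster size below are illustrative values:

```python
def links_full(n: int) -> int:
    # Fully connected swarm: n(n-1)/2 pairwise links, i.e. O(n^2)
    return n * (n - 1) // 2

def links_clustered(n: int, k: int) -> int:
    # Proximity clustering: each robot maintains ~k neighbours,
    # giving k*n/2 links in total, i.e. O(k*n)
    return (k * n) // 2

# For a 100-robot swarm with clusters of about 6 neighbours per robot
assert links_full(100) == 4950
assert links_clustered(100, 6) == 300
```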
For comparison, in a centralized architecture, where all robots communicate exclusively with a central server rather than directly with each other, the communication complexity would be:
$$C_{\text{centralized}} = O(n) \tag{16}$$
Although a centralized architecture also scales linearly, matching our clustered approach, it has further limitations: a single point of failure, a bandwidth bottleneck at the server, a restricted operational range, and increased communication latency between robots.

4.4. Limitations and Future Work

In this section, we discuss the limitations of the proposed framework. Since the proposed framework for RMI relies on VSAs rather than deep learning, the number of learnable parameters is significantly lower. This can lead to reduced accuracy in certain complex motion tasks compared to state-of-the-art deep learning-based models. However, this trade-off is expected, as we focus on explainability in decision-making, which is a key advantage over black-box neural networks. While blockchain-based smart contracts provide a decentralized and secure mechanism for data integrity and decision validation, they introduce computational and network overhead [91]. As the number of robots increases, maintaining real-time consensus and synchronizing motion intelligence records may become challenging, requiring the optimization of blockchain protocols. Blockchain operations introduce computational overhead that affects resource-constrained robotic systems [92]. In our implementation, the proof of authority consensus mechanism was selected to minimize computational demands compared to alternatives such as proof of work. However, this approach still requires dedicated processing capacity for block creation, verification, and chain maintenance. Similarly, energy consumption represents a significant constraint for mobile robotic systems. Each blockchain transaction consumes power for computation, data transmission, and storage operations. In resource-limited scenarios, this consumption can reduce operational duration. Our dynamic clustering approach partially addresses this concern by reducing unnecessary communications and synchronizations when robots operate independently. Latency considerations also emerge when implementing blockchain solutions in robotic systems that require real-time responsiveness. In our simulation, block creation is configured to occur every 5 s, which can introduce delays across the robot network.
While acceptable for landmark discovery tasks, such delays may be prohibitive for applications requiring immediate response to environmental changes or high-frequency coordination. The framework has been validated in specific robotic settings (e.g., navigation and landmark discovery). However, its adaptability to other robotic tasks, such as manipulation or industrial automation, requires further investigation.
To improve accuracy while maintaining explainability, we aim to integrate hybrid approaches that combine VSAs with lightweight deep learning models. This can enable the selective use of deep learning where necessary while leveraging VSAs for interpretable decision-making. Furthermore, we plan to apply and evaluate the RMI framework across additional robotic domains, including robotic arms for industrial automation and drones for aerial navigation, to assess its robustness on contrasting applications. We also plan on investigating the robustness of the framework in dynamic environments.
Another significant area for development involves extending the framework to support heterogeneous robotic systems. The current implementation, as demonstrated in Experiment 2, utilizes a homogeneous set of robots with identical mobility capabilities, perception sensors, and control parameters. This uniformity facilitates straightforward knowledge sharing since all robots interpret and execute actions in a consistent manner. Extending the framework to accommodate robots with diverse capabilities and different action repertoires represents an important challenge. For instance, a swarm might include aerial drones with six degrees of freedom alongside ground-based robots limited to two-dimensional movement. In such scenarios, the VSA representations would need to incorporate mechanisms for translating between different action spaces while preserving semantic meaning.
The current distributed architecture operates without dependence on a centralized infrastructure, using direct robot-to-robot communication within proximity-based clusters. For large-scale deployments across extended operational areas, incorporating edge computing nodes could enhance connectivity while maintaining robustness to individual failures. Performance optimization for larger swarms presents another avenue for future work. While our mathematical analysis indicates reasonable scaling to dozens of robots, practical deployments of hundreds or thousands of robots would benefit from hierarchical consensus mechanisms. Such architectures could organize robots into functional clusters with local consensus processes that periodically synchronize at a higher level, reducing overall communication overhead while maintaining coordination benefits.

5. Conclusions

Dynamic environments, sensory processing, mechanical movements, power management, and safety are some of the technical complexities of robotic intelligence that still require novel AI approaches as we transition from task-oriented robots to more generalist capabilities. In this paper, we have proposed the design and development of a computation-efficient and data-efficient framework for robotic motion intelligence (RMI) based on VSAs and blockchain-based smart contracts. VSAs drive computationally efficient learning and noise suppression during perception, motion, and decision-making, while smart contracts provide decentralized, distributed, and secure transaction ledgers of these actions and tasks. The framework is empirically evaluated in two distinct settings: a single navigational robot for visual place recognition and multiple navigational robots for landmark discovery. The results of both experiments confirm the effectiveness and validity of the proposed RMI framework, with VSA operations for computational efficiency and blockchain-based smart contracts for data efficiency.

Author Contributions

Conceptualization, D.D.S. and M.M.; Methodology, D.D.S., S.W., V.S., H.M., N.M. and M.M.; Software, S.W. and L.G.; Validation, S.W. and L.G.; Formal analysis, S.W., V.S., H.M. and N.M.; Investigation, D.D.S., V.S., L.G., H.M. and M.M.; Resources, L.G., H.M. and N.M.; Writing— original draft, D.D.S., S.W., V.S., L.G., H.M., N.M. and M.M.; Visualization, V.S. and N.M.; Supervision, D.D.S. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Department of Climate Change, Energy, the Environment and Water of the Australian Federal Government, as part of the International Clean Innovation Researcher Networks (ICIRN) program, grant number ICIRN000077.

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cleary, K.; Nguyen, C. State of the art in surgical robotics: Clinical applications and technology challenges. Comput. Aided Surg. 2001, 6, 312–328. [Google Scholar] [PubMed]
  2. Manti, M.; Cacucciolo, V.; Cianchetti, M. Stiffening in soft robotics: A review of the state of the art. IEEE Robot. Autom. Mag. 2016, 23, 93–106. [Google Scholar]
  3. Brohan, A.; Brown, N.; Carbajal, J.; Chebotar, Y.; Chen, X.; Choromanski, K.; Ding, T.; Driess, D.; Dubey, A.; Finn, C.; et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. arXiv 2023, arXiv:2307.15818. [Google Scholar]
  4. Brohan, A.; Brown, N.; Carbajal, J.; Chebotar, Y.; Dabis, J.; Finn, C.; Gopalakrishnan, K.; Hausman, K.; Herzog, A.; Hsu, J.; et al. Rt-1: Robotics transformer for real-world control at scale. arXiv 2022, arXiv:2212.06817. [Google Scholar]
  5. Reed, S.; Zolna, K.; Parisotto, E.; Colmenarejo, S.G.; Novikov, A.; Barth-Maron, G.; Gimenez, M.; Sulsky, Y.; Kay, J.; Springenberg, J.T.; et al. A generalist agent. arXiv 2022, arXiv:2205.06175. [Google Scholar]
  6. O’Neill, A.; Rehman, A.; Gupta, A.; Maddukuri, A.; Gupta, A.; Padalkar, A.; Lee, A.; Pooley, A.; Gupta, A.; Mandlekar, A.; et al. Open x-embodiment: Robotic learning datasets and rt-x models. arXiv 2023, arXiv:2310.08864. [Google Scholar]
  7. Pomerleau, D.A. Alvinn: An autonomous land vehicle in a neural network. Adv. Neural Inf. Process. Syst. 1988, 1. [Google Scholar]
  8. Pomerleau, D.A. Neural Network Perception for Mobile Robot Guidance; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 239. [Google Scholar]
  9. Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316. [Google Scholar]
  10. Cultrera, L.; Seidenari, L.; Becattini, F.; Pala, P.; Del Bimbo, A. Explaining autonomous driving by learning end-to-end visual attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 340–341. [Google Scholar]
  11. Schmidgall, S.; Kim, J.W.; Kuntz, A.; Ghazi, A.E.; Krieger, A. General-purpose foundation models for increased autonomy in robot-assisted surgery. Nat. Mach. Intell. 2024, 6, 1275–1283. [Google Scholar]
  12. Wang, A.; Islam, M.; Xu, M.; Zhang, Y.; Ren, H. Sam meets robotic surgery: An empirical study on generalization, robustness and adaptation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, BC, Canada, 8–12 October 2023; Springer: Cham, Switzerland, 2023; pp. 234–244. [Google Scholar]
  13. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4015–4026. [Google Scholar]
  14. Adikari, A.; De Silva, D.; Ranasinghe, W.K.; Bandaragoda, T.; Alahakoon, O.; Persad, R.; Lawrentschuk, N.; Alahakoon, D.; Bolton, D. Can online support groups address psychological morbidity of cancer patients? An artificial intelligence based investigation of prostate cancer trajectories. PLoS ONE 2020, 15, e0229361. [Google Scholar]
  15. Yang, G.; Pang, Z.; Deen, M.J.; Dong, M.; Zhang, Y.T.; Lovell, N.; Rahmani, A.M. Homecare robotic systems for healthcare 4.0: Visions and enabling technologies. IEEE J. Biomed. Health Inform. 2020, 24, 2535–2549. [Google Scholar] [PubMed]
  16. De Silva, D.; Burstein, F.; Jelinek, H.F.; Stranieri, A. Addressing the complexities of big data analytics in healthcare: The diabetes screening case. Australas. J. Inf. Syst. 2015, 19. [Google Scholar]
  17. Rao, Q.; Frtunikj, J. Deep learning for self-driving cars: Chances and challenges. In Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems, Gothenburg, Sweden, 28 May 2018; pp. 35–38. [Google Scholar]
  18. Nallaperuma, D.; De Silva, D.; Alahakoon, D.; Yu, X. Intelligent detection of driver behavior changes for effective coordination between autonomous and human driven vehicles. In Proceedings of the IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 3120–3125. [Google Scholar]
  19. Chehri, A.; Jeon, G.; Fofana, I.; Imran, A.; Saadane, R. Accelerating power grid monitoring with flying robots and artificial intelligence. IEEE Commun. Stand. Mag. 2021, 5, 48–54. [Google Scholar]
  20. De Silva, D.; Yu, X.; Alahakoon, D.; Holmes, G. Semi-supervised classification of characterized patterns for demand forecasting using smart electricity meters. In Proceedings of the 2011 International Conference on Electrical Machines and Systems, Beijing, China, 20–23 August 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–6. [Google Scholar]
  21. Taheri, H.; Hosseini, S.R.; Nekoui, M.A. Deep Reinforcement Learning with Enhanced PPO for Safe Mobile Robot Navigation. arXiv 2024, arXiv:2405.16266. [Google Scholar] [CrossRef]
  22. de Moraes, L.D.; Kich, V.A.; Kolling, A.H.; Bottega, J.A.; Grando, R.B.; Cukla, A.R.; Gamarra, D.F.T. Enhanced Low-Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots Using Double Deep Reinforcement Learning Techniques. arXiv 2023, arXiv:2310.13809. [Google Scholar] [CrossRef]
  23. Huang, Z.; Lv, C.; Xing, Y.; Wu, J. Multi-Modal Sensor Fusion-Based Deep Neural Network for End-to-End Autonomous Driving with Scene Understanding. IEEE Sensors J. 2021, 21, 11781–11790. [Google Scholar] [CrossRef]
  24. Kober, J.; Peters, J. Imitation and reinforcement learning. IEEE Robot. Autom. Mag. 2010, 17, 55–62. [Google Scholar]
  25. Rothmann, M.; Porrmann, M. A survey of domain-specific architectures for reinforcement learning. IEEE Access 2022, 10, 13753–13767. [Google Scholar]
  26. Nguyen, N.D.; Nguyen, T.; Nahavandi, S. System design perspective for human-level agents using deep reinforcement learning: A survey. IEEE Access 2017, 5, 27091–27102. [Google Scholar]
  27. Le Mero, L.; Yi, D.; Dianati, M.; Mouzakitis, A. A survey on imitation learning techniques for end-to-end autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14128–14147. [Google Scholar]
  28. Haavaldsen, H.; Aasboe, M.; Lindseth, F. Autonomous vehicle control: End-to-end learning in simulated urban environments. In Proceedings of the Nordic Artificial Intelligence Research and Development: Third Symposium of the Norwegian AI Society, NAIS 2019, Trondheim, Norway, 27–28 May 2019; Proceedings 3. Springer: Cham, Switzerland, 2019; pp. 40–51. [Google Scholar]
  29. Codevilla, F.; Santana, E.; López, A.M.; Gaidon, A. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9329–9338. [Google Scholar]
  30. Russell, S. Learning agents for uncertain environments. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, Madison, WI, USA, 24–26 July 1998; pp. 101–103. [Google Scholar]
  31. Ng, A.Y.; Russell, S. Algorithms for inverse reinforcement learning. In Proceedings of the ICML, Stanford, CA, USA, 29 June–2 July 2000; Volume 1, p. 2. [Google Scholar]
  32. Abbeel, P.; Ng, A.Y. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 1. [Google Scholar]
  33. Sadigh, D.; Sastry, S.; Seshia, S.A.; Dragan, A.D. Planning for autonomous cars that leverage effects on human actions. In Proceedings of the Robotics: Science and Systems, Ann Arbor, MI, USA, 18–22 June 2016; Volume 2, pp. 1–9. [Google Scholar]
  34. Ziebart, B.D.; Maas, A.; Bagnell, J.A.; Dey, A.K. Maximum Entropy Inverse Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Chicago, IL, USA, 13–17 July 2008; pp. 1433–1438. [Google Scholar]
  35. Wulfmeier, M.; Ondruska, P.; Posner, I. Maximum entropy deep inverse reinforcement learning. arXiv 2015, arXiv:1507.04888. [Google Scholar]
  36. Ho, J.; Ermon, S. Generative adversarial imitation learning. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  37. Yang, C.; Liang, P.; Ajoudani, A.; Li, Z.; Bicchi, A. Development of a robotic teaching interface for human to human skill transfer. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 710–716. [Google Scholar] [CrossRef]
  38. Luo, J.; Dong, X.; Yang, H. Session search by direct policy learning. In Proceedings of the 2015 International Conference on the Theory of Information Retrieval, Northampton, MA, USA, 27–30 September 2015; pp. 261–270. [Google Scholar]
  39. Ross, S.; Gordon, G.; Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 627–635. [Google Scholar]
  40. Zhang, J.; Cho, K. Query-efficient imitation learning for end-to-end autonomous driving. arXiv 2016, arXiv:1605.06450. [Google Scholar]
  41. Li, G.; Mueller, M.; Casser, V.; Smith, N.; Michels, D.L.; Ghanem, B. Oil: Observational imitation learning. arXiv 2018, arXiv:1803.01129. [Google Scholar]
  42. Müller, M.; Casser, V.; Lahoud, J.; Smith, N.; Ghanem, B. Sim4cv: A photo-realistic simulator for computer vision applications. Int. J. Comput. Vis. 2018, 126, 902–919. [Google Scholar] [CrossRef]
  43. Ruiz-del-Solar, J.; Loncomilla, P.; Soto, N. A survey on deep learning methods for robot vision. arXiv 2018, arXiv:1803.10862. [Google Scholar]
  44. Zhou, B.; Lapedriza, A.; Xiao, J.; Torralba, A.; Oliva, A. Learning deep features for scene recognition using places database. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
  45. Gomez-Ojeda, R.; Lopez-Antequera, M.; Petkov, N.; Gonzalez-Jimenez, J. Training a convolutional neural network for appearance-invariant place recognition. arXiv 2015, arXiv:1505.07428. [Google Scholar]
  46. Sünderhauf, N.; Dayoub, F.; McMahon, S.; Talbot, B.; Schulz, R.; Corke, P.; Wyeth, G.; Upcroft, B.; Milford, M. Place categorization and semantic mapping on a mobile robot. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 5729–5736. [Google Scholar]
  47. Liao, Y.; Kodagoda, S.; Wang, Y.; Shi, L.; Liu, Y. Understand scene categories by objects: A semantic regularized scene classifier using convolutional neural networks. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 2318–2325. [Google Scholar]
  48. Madhavi, I.; Chamishka, S.; Nawaratne, R.; Nanayakkara, V.; Alahakoon, D.; De Silva, D. A deep learning approach for work related stress detection from audio streams in cyber physical environments. In Proceedings of the 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vienna, Austria, 8–11 September 2020; IEEE: Piscataway, NJ, USA, 2020; Volume 1, pp. 929–936. [Google Scholar]
  49. Tian, G.; Zhang, C.; Zhang, X.; Feng, Y.; Yuan, G.; Peng, T.; Pham, D.T. Multi-objective evolutionary algorithm with machine learning and local search for an energy-efficient disassembly line balancing problem in remanufacturing. J. Manuf. Sci. Eng. 2023, 145, 051002. [Google Scholar]
  50. Zhang, X.; Tian, G.; Fathollahi-Fard, A.M.; Pham, D.T.; Li, Z.; Pu, Y.; Zhang, T. A chance-constraint programming approach for a disassembly line balancing problem under uncertainty. J. Manuf. Syst. 2024, 74, 346–366. [Google Scholar]
  51. Chitta, K.; Prakash, A.; Geiger, A. NEAT: Neural Attention Fields for End-to-End Autonomous Driving. arXiv 2021, arXiv:2109.04456. [Google Scholar] [CrossRef]
  52. Chen, D.; Krähenbühl, P. Learning from All Vehicles. arXiv 2022, arXiv:2203.11934. [Google Scholar] [CrossRef]
  53. Hu, Y.; Yang, J.; Chen, L.; Li, K.; Sima, C.; Zhu, X.; Chai, S.; Du, S.; Lin, T.; Wang, W.; et al. Planning-oriented Autonomous Driving. arXiv 2023, arXiv:2212.10156. [Google Scholar] [CrossRef]
  54. Li, Z.; Wang, W.; Li, H.; Xie, E.; Sima, C.; Lu, T.; Yu, Q.; Dai, J. BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. arXiv 2022, arXiv:2203.17270. [Google Scholar] [CrossRef]
  55. Zhang, T.; Chen, X.; Wang, Y.; Wang, Y.; Zhao, H. MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries. arXiv 2022, arXiv:2205.00613. [Google Scholar] [CrossRef]
  56. Gayler, R.W. Vector symbolic architectures answer Jackendoff’s challenges for cognitive neuroscience. arXiv 2004, arXiv:cs/0412059. [Google Scholar]
  57. Schlegel, K.; Neubert, P.; Protzel, P. A comparison of vector symbolic architectures. Artif. Intell. Rev. 2022, 55, 4523–4555. [Google Scholar] [CrossRef]
  58. Kanerva, P. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cogn. Comput. 2009, 1, 139–159. [Google Scholar]
  59. Karunaratne, G.; Le Gallo, M.; Cherubini, G.; Benini, L.; Rahimi, A.; Sebastian, A. In-memory hyperdimensional computing. Nat. Electron. 2020, 3, 327–337. [Google Scholar] [CrossRef]
  60. Moin, A.; Zhou, A.; Rahimi, A.; Menon, A.; Benatti, S.; Alexandrov, G.; Tamakloe, S.; Ting, J.; Yamamoto, N.; Khan, Y.; et al. A wearable biosensing system with in-sensor adaptive machine learning for hand gesture recognition. Nat. Electron. 2021, 4, 54–63. [Google Scholar]
  61. Schlegel, K.; Rachkovskij, D.A.; Osipov, E.; Protzel, P.; Neubert, P. Learnable Weighted Superposition in HDC and its Application to Multi-channel Time Series Classification. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7. [Google Scholar]
  62. Osipov, E.; Kahawala, S.; Haputhanthri, D.; Kempitiya, T.; De Silva, D.; Alahakoon, D.; Kleyko, D. Hyperseed: Unsupervised learning with vector symbolic architectures. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 6583–6597. [Google Scholar]
  63. Kleyko, D.; Osipov, E.; De Silva, D.; Wiklund, U.; Alahakoon, D. Integer self-organizing maps for digital hardware. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. [Google Scholar]
  64. Hersche, M.; Karunaratne, G.; Cherubini, G.; Benini, L.; Sebastian, A.; Rahimi, A. Constrained few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9057–9067. [Google Scholar]
  65. Bent, G.; Davies, C.; Vilamala, M.R.; Li, Y.; Preece, A.; Sola, A.V.; Di Caterina, G.; Kirkland, P.; Tutcher, B.; Pearson, G. The transformative potential of vector symbolic architecture for cognitive processing at the network edge. In Proceedings of the Artificial Intelligence for Security and Defence Applications II, Edinburgh, UK, 16–20 September 2024; SPIE: Bellingham, WA, USA, 2024; Volume 13206, pp. 404–420. [Google Scholar]
  66. Fung, M.L.; Chen, M.Z.; Chen, Y.H. Sensor fusion: A review of methods and applications. In Proceedings of the 2017 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3853–3860. [Google Scholar]
  67. Chandrasekaran, B.; Gangadhar, S.; Conrad, J.M. A survey of multisensor fusion techniques, architectures and methodologies. In Proceedings of the SoutheastCon 2017, Concord, NC, USA, 30 March–2 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8. [Google Scholar]
  68. Zheng, Z.; Xie, S.; Dai, H.; Chen, X.; Wang, H. An Overview of Blockchain Technology: Architecture, Consensus, and Future Trends. In Proceedings of the 2017 IEEE International Congress on Big Data (BigData Congress), Honolulu, HI, USA, 25–30 June 2017; pp. 557–564. [Google Scholar] [CrossRef]
  69. Szabo, N. Smart Contracts. 1994. Available online: https://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart.contracts.html (accessed on 31 December 2024).
  70. Novak, I. A Systematic Analysis of Cryptocurrencies. Ph.D. Thesis, Faculty of Economics and Business, University of Zagreb, Zagreb, Croatia, 2023. [Google Scholar]
  71. Strobel, V.; Pacheco, A.; Dorigo, M. Robot Swarms Neutralize Harmful Byzantine Robots Using a Blockchain-Based Token Economy. Sci. Robot. 2023, 8, eabm4636. [Google Scholar] [CrossRef] [PubMed]
  72. Ferrer, E.C.; Jiménez, E.; López-Presa, J.L.; Martín-Rueda, J. Following Leaders in Byzantine Multirobot Systems by Using Blockchain Technology. IEEE Trans. Robot. 2022, 38, 1101–1117. [Google Scholar] [CrossRef]
  73. Wang, T. PrivShieldROS: An Extended Robot Operating System Integrating Ethereum and Interplanetary File System for Enhanced Sensor Data Privacy. Sensors 2024, 24, 3241. [Google Scholar] [CrossRef]
  74. Salimi, S.; Morón, P.T.; Queralta, J.P.; Westerlund, T. Secure Heterogeneous Multi-Robot Collaboration and Docking with Hyperledger Fabric Blockchain. In Proceedings of the 2022 IEEE 8th World Forum on Internet of Things (WF-IoT), Yokohama, Japan, 26 October–11 November 2022; pp. 1–7. [Google Scholar] [CrossRef]
  75. Xiong, Z.; Zhang, Y.; Niyato, D.; Wang, P.; Han, Z. When Mobile Blockchain Meets Edge Computing. IEEE Commun. Mag. 2018, 56, 33–39. [Google Scholar] [CrossRef]
  76. Ge, X. Smart Payment Contract Mechanism Based on Blockchain Smart Contract Mechanism. Sci. Program. 2021, 2021, 3988070. [Google Scholar] [CrossRef]
  77. Ullman, S. Against direct perception. Behav. Brain Sci. 1980, 3, 373–381. [Google Scholar]
  78. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar]
  79. Ölsner, F.; Milz, S. Catch me, if you can! a mediated perception approach towards fully autonomous drone racing. In Proceedings of the NeurIPS 2019 Competition and Demonstration Track, PMLR, Vancouver, BC, Canada, 8–14 December 2019; pp. 90–99. [Google Scholar]
  80. Chen, C.; Seff, A.; Kornhauser, A.; Xiao, J. DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2722–2730. [Google Scholar] [CrossRef]
  81. Yuan, D.; Fermüller, C.; Rabbani, T.; Huang, F.; Aloimonos, Y. A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM). arXiv 2024, arXiv:2404.01568. [Google Scholar]
  82. Neubert, P.; Schubert, S. Hyperdimensional computing as a framework for systematic aggregation of image descriptors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16938–16947. [Google Scholar]
  83. Imani, M.; Kong, D.; Rahimi, A.; Rosing, T. Voicehd: Hyperdimensional computing for efficient speech recognition. In Proceedings of the 2017 IEEE International Conference on Rebooting Computing (ICRC), Washington, DC, USA, 8–9 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8. [Google Scholar]
  84. Neubert, P.; Schubert, S.; Protzel, P. Learning Vector Symbolic Architectures for Reactive Robot Behaviours. 2017. Available online: https://d-nb.info/1214377416/34 (accessed on 31 December 2024).
  85. Noh, H.; Araujo, A.; Sim, J.; Weyand, T.; Han, B. Large-scale image retrieval with attentive deep local features. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3456–3465. [Google Scholar]
  86. Plate, T.A. Holographic reduced representations. IEEE Trans. Neural Netw. 1995, 6, 623–641. [Google Scholar]
  87. Wu, P.; Jia, X.; Chen, L.; Yan, J.; Li, H.; Qiao, Y. Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline. arXiv 2022, arXiv:2206.08129. [Google Scholar] [CrossRef]
  88. Hu, S.; Chen, L.; Wu, P.; Li, H.; Yan, J.; Tao, D. ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning. arXiv 2022, arXiv:2207.07601. [Google Scholar] [CrossRef]
  89. Pinciroli, C.; Trianni, V.; O’Grady, R.; Pini, G.; Brutschy, A.; Brambilla, M.; Mathews, N.; Ferrante, E.; Di Caro, G.; Ducatelle, F.; et al. ARGoS: A Modular, Parallel, Multi-Engine Simulator for Multi-Robot Systems. Swarm Intell. 2012, 6, 271–295. [Google Scholar] [CrossRef]
  90. Pacheco, A.; Denis, U.; Zakir, R.; Strobel, V.; Reina, A.; Dorigo, M. Toychain: A Simple Blockchain for Research in Swarm Robotics. arXiv 2024, arXiv:2407.06630. [Google Scholar] [CrossRef]
  91. Lopes, V.; Alexandre, L.A. An overview of blockchain integration with robotics and artificial intelligence. arXiv 2018, arXiv:1810.00329. [Google Scholar] [CrossRef]
  92. Aditya, U.S.; Singh, R.; Singh, P.K.; Kalla, A. A survey on blockchain in robotics: Issues, opportunities, challenges and future directions. J. Netw. Comput. Appl. 2021, 196, 103245. [Google Scholar] [CrossRef]
Figure 1. Proposed Framework for Robotic Motion Intelligence.
Figure 2. RMI Framework Composition: VSA and Smart Contracts.
Figure 3. A smart contract for enforcing the safety constraints of the RMI framework.
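To make the idea behind Figure 3 concrete, the following is a minimal, illustrative sketch of a safety-constraint contract; it is not the paper's actual contract code, and all class names and thresholds are assumptions. The key property is that the check is deterministic, so every ledger node can re-run it and reach the same accept/reject decision.

```python
from dataclasses import dataclass

@dataclass
class SafetyConstraints:
    min_obstacle_distance: float  # metres; hypothetical threshold
    max_speed: float              # m/s; hypothetical threshold

class SafetyContract:
    """Contract-like check each node re-executes before an action
    transaction is committed to the ledger."""
    def __init__(self, constraints: SafetyConstraints):
        self.constraints = constraints

    def validate(self, proposed_speed: float, obstacle_distance: float) -> bool:
        # The transaction is accepted only if all safety conditions hold.
        return (obstacle_distance >= self.constraints.min_obstacle_distance
                and 0.0 <= proposed_speed <= self.constraints.max_speed)

contract = SafetyContract(SafetyConstraints(min_obstacle_distance=0.5,
                                            max_speed=1.2))
print(contract.validate(proposed_speed=0.8, obstacle_distance=1.5))  # True
print(contract.validate(proposed_speed=0.8, obstacle_distance=0.2))  # False
```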
Figure 4. Results of Experiment 1—a single navigational robot for visual place recognition. The cosine similarities of the recovered action versus the other actions are shown in the top graph, while the bottom graph presents the ground truth actions, the original agent-generated actions, and the RMI-adjusted actions that account for the confidence of the agent's actions and safety.
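The cosine-similarity recovery shown in Figure 4 follows the standard VSA bind/unbind/clean-up pattern. The sketch below illustrates it under stated assumptions (bipolar hypervectors and elementwise-multiply binding; the action names and the 10,000-dimensional setting mirror the paper, but the exact encoding is not specified here): the true action scores near 1 against the recovered vector, while all other codebook entries score near 0.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality, as in the 10,000-d vectors of Table 2

# Random bipolar hypervectors; binding = elementwise multiply (self-inverse).
actions = {name: rng.choice([-1, 1], size=D)
           for name in ["forward", "left", "right", "stop"]}
context = rng.choice([-1, 1], size=D)

# Encode: bind the chosen action to the perceptual context.
encoded = context * actions["left"]

# Recover: unbind with the context, then "clean up" by cosine similarity
# against the action codebook.
recovered = encoded * context

def cos(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = {name: cos(recovered, v) for name, v in actions.items()}
best = max(scores, key=scores.get)
print(best)  # "left"
```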
Figure 5. ARGoS Setup for Experiment 2—multiple navigational robots for landmark discovery.
Figure 6. Results of Experiment 2—blockchain and memory of five navigational robots R1–R5. The colors represent blocks from Genesis through to Block 7.
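The block structure behind Figure 6 can be sketched as a minimal hash-linked chain. This is an illustration of the general mechanism only (the field names are assumptions, not the Toychain [90] API): each block commits the robots' shared observations and links to its predecessor by hash, so tampering with an earlier block invalidates every later link.

```python
import hashlib
import json

def make_block(index, prev_hash, transactions):
    """Build a minimal hash-linked block; fields are illustrative."""
    header = {"index": index, "prev_hash": prev_hash,
              "transactions": transactions}
    block_hash = hashlib.sha256(
        json.dumps(header, sort_keys=True).encode()).hexdigest()
    return {**header, "hash": block_hash}

def chain_is_valid(chain):
    """Re-derive each hash and check every prev_hash link."""
    for prev, block in zip(chain, chain[1:]):
        header = {k: block[k] for k in ("index", "prev_hash", "transactions")}
        recomputed = hashlib.sha256(
            json.dumps(header, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != prev["hash"] or block["hash"] != recomputed:
            return False
    return True

# Genesis through Block 2: robots commit landmark observations to the ledger.
genesis = make_block(0, "0" * 64, [])
b1 = make_block(1, genesis["hash"], [{"robot": "R1", "landmark": "L1"}])
b2 = make_block(2, b1["hash"], [{"robot": "R3", "landmark": "L2"}])
print(chain_is_valid([genesis, b1, b2]))  # True
```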
Table 1. Sensor Hypervector Representation.

Sensor        | Label Hypervector       | Raw Reading          | Example Reading
RGB Camera    | Channel-R: b_camera^(R) | s_camera^(R) (n × 1) | [12, 15, 34, …]
              | Channel-G: b_camera^(G) | s_camera^(G) (n × 1) | [12, 15, 34, …]
              | Channel-B: b_camera^(B) | s_camera^(B) (n × 1) | [12, 15, 34, …]
LiDAR         | Channel-X: b_lidar^(X)  | s_lidar^(X) (m × 1)  | [2.3, 1.2, 0.5, …]
              | Channel-Y: b_lidar^(Y)  | s_lidar^(Y) (m × 1)  | [2.3, 1.2, 0.5, …]
              | Channel-Z: b_lidar^(Z)  | s_lidar^(Z) (m × 1)  | [2.3, 1.2, 0.5, …]
Sonar         | b_sonar                 | d (scalar)           | 1.50 m
Accelerometer | Channel-X: b_acc^(X)    | a_X                  | 0.00 m/s²
              | Channel-Y: b_acc^(Y)    | a_Y                  | 0.00 m/s²
              | Channel-Z: b_acc^(Z)    | a_Z                  | 9.81 m/s²
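The channel-wise representation of Table 1 can be sketched as follows: each channel's label hypervector is bound to an encoding of its raw reading, and the bound channels are bundled into one sensor-state hypervector. The projection-and-sign encoding below is an assumption for illustration, not necessarily the paper's exact method; variable names mirror the table's notation.

```python
import numpy as np

rng = np.random.default_rng(42)
D = 10_000  # hypervector dimensionality

# Random bipolar label hypervectors b_camera^(c) per channel (Table 1).
b_camera = {c: rng.choice([-1, 1], size=D) for c in "RGB"}

# One fixed random projection per channel maps a raw reading to D dims.
n = 3  # length of the truncated example readings shown in Table 1
proj = {c: rng.standard_normal((D, n)) for c in "RGB"}

def encode_reading(channel, values):
    # Project the raw reading and binarize with sign() to stay bipolar.
    return np.sign(proj[channel] @ np.asarray(values, dtype=float))

# Bind each channel's label to its encoded reading (elementwise multiply),
# then bundle (sum) all channels into a single sensor-state hypervector.
reading = {"R": [12, 15, 34], "G": [12, 15, 34], "B": [12, 15, 34]}
state = np.sign(sum(b_camera[c] * encode_reading(c, v)
                    for c, v in reading.items()))
print(state.shape)  # (10000,)
```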
Table 2. A comparison of the parameter counts and datasets used to train different motion modules.

Method         | Navigational Method              | Parameter Count      | Dataset
UniAD [53]     | MotionFormer                     | 2,628,352            | nuScenes, whose training set has 28,130 labeled keyframes with 1.4 million bounding boxes
TCP [87]       | Trajectory + Multi-step Control  | 3,422,127            | 420k data points generated using the CARLA simulator
LAV [52]       | Motion Planner                   | 4,884,864            | 400k data points generated using the CARLA simulator
NEAT [51]      | Neural Attention Field + Decoder | 492,359              | 130k data points generated using the CARLA simulator
ST-P3 [88]     | High-level Planner               | 269,438              | The nuScenes dataset
VSA-RMI (ours) | VSA operations                   | 30,000 (10,000 × 3)  | 200 data points from GardensPointWalking
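The reduction factors implied by Table 2 can be computed directly from its parameter counts; the VSA figure corresponds to three learnable 10,000-dimensional hypervectors.

```python
# Parameter counts copied from Table 2.
baselines = {
    "UniAD": 2_628_352,
    "TCP": 3_422_127,
    "LAV": 4_884_864,
    "NEAT": 492_359,
    "ST-P3": 269_438,
}
vsa_params = 3 * 10_000  # three 10,000-dimensional hypervectors

# Reduction factor of the VSA module relative to each baseline.
for name, count in baselines.items():
    print(f"{name}: {count / vsa_params:.1f}x more parameters")
```

Even against the smallest baseline (ST-P3), the reduction is roughly 9x, consistent with the order-of-magnitude saving reported in the abstract.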
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

De Silva, D.; Withanage, S.; Sumanasena, V.; Gunasekara, L.; Moraliyage, H.; Mills, N.; Manic, M. Robotic Motion Intelligence Using Vector Symbolic Architectures and Blockchain-Based Smart Contracts. Robotics 2025, 14, 38. https://doi.org/10.3390/robotics14040038

