1. Introduction
The construction industry stands at a critical juncture, both in terms of its economic significance and its environmental impact. Currently, it is responsible for approximately 35% [1] of global energy-related emissions, highlighting its substantial carbon footprint. Economically, the sector contributes around 14% to global GDP [2] and is projected to grow significantly, reaching an estimated value of USD 17.5 trillion by 2030 [3].
Despite this growth potential, the industry faces several major challenges. A global survey conducted by ABB in 2021, involving 1900 construction businesses across Europe, the US, and China, revealed a widespread skills shortage, with 91% of respondents anticipating a labor crisis within the next decade. Nearly half (44%) of these businesses are already struggling to recruit qualified workers [4]. In parallel, improving health and safety on construction sites and addressing environmental concerns were each identified as priorities by 42% of respondents, suggesting that workforce well-being and sustainability are becoming central to future industry strategies [5].
In 2017, the McKinsey Global Institute (MGI) published a well-known report titled Reinventing Construction: A Route to Higher Productivity [6]. This report highlights the significant productivity gap in the construction industry compared to other sectors (The Economist has elegantly represented these metrics, as shown in Figure 1). Productivity growth in construction has been stagnant for decades, with labor productivity increasing by just 1% per year over the past 20 years, far below the 2.8% annual growth observed in other industries [7]. From 1950 to 2010, in the United States, labor productivity in construction declined, particularly after 1968, in contrast to its rise in other sectors, such as agriculture, manufacturing, and retail [6].
In response to these challenges, there is growing interest in the adoption of robotics and automation [7,8,9]. While currently only 55% of construction companies report using robotic technologies (compared with 84% in the automotive industry and 79% in manufacturing), 81% of respondents indicated that they plan to either introduce or expand automation in the coming years [5].
The adoption of automation and robotics technologies is, nonetheless, highly dependent on the scale of the construction site [
8]. Advanced tools enable better planning, real-time monitoring, and precise execution, which are important requirements of major infrastructure projects, but are less likely to deliver the same levels of advancement for limited-scale, bespoke interventions.
In fact, unlike large-scale, repetitive industrial environments, where processes and tasks are highly standardized, small construction projects are characterized by low repeatability and high variability in the work to be performed. These construction sites usually involve minor building works such as residential renovations, commercial fit-outs, small home builds, extensions, or repairs. The authors take them as a benchmark because they present a unique set of challenges that make the barrier to automation extremely difficult to overcome.
Apart from reducing efficiency gains and increasing operational complexity, traditional automation is cost-prohibitive and demands workers who are highly skilled in automation engineering and robotics. Whether the solution integrates high-payload industrial robots with teleoperated or autonomously guided vehicles [10,11,12], as shown in Figure 2, places them on platforms and linear tracks to improve calibration accuracy [13], as shown in Figure 3, or automates repetitive high-load tasks, such systems are difficult to use in small-scale operations without extensive and costly logistical infrastructure [14].
Interesting precursors of robotic assembly on the construction site, among others, date from 1998 [15] (Figure 4) and 1994 [16] (Figure 5) and are centered on the idea of robotic clay block (or cellular concrete block) wall construction as a natural advancement enabled by the CAD representation of the building. The computerization of the building process and the integration of a computer-based system for work preparation and quality control demonstrate the possibilities of flexible automation of masonry work [16].
Despite significant efforts, most of these solutions remained at the stage of technical descriptions, with only a handful advancing to the prototype level [14]. This is probably because the weight and scale of their supporting infrastructure are generally disproportionate to the scale of the construction site, and because, more often than not, it is the design of the intervention that must follow the constraints of the machine, and not vice versa (Figure 6).
Many tasks on small sites require manual dexterity, problem-solving, and on-the-spot adaptation, which are hard to automate without a highly skilled workforce. Automation technologies help manage large-scale infrastructural projects efficiently by streamlining workflows (as in the placement of structural arches in tunnels [18], as shown in Figure 7, or rebar in bridges [19], as shown in Figure 8), improving project coordination, and reducing operational delays. By contrast, the non-repetitive nature of tasks undertaken in small construction sites means workers must be skilled across multiple disciplines, yet training for this kind of versatility is both time-consuming and very costly.
To maintain the same productivity levels while remaining adaptable to each assignment and compliant with safety standards, machines and robots must constantly adjust to new conditions, layouts, and obstacles, which significantly complicates job automation [20,21]. In practice, every project requires tailored adjustments, and automated equipment must be capable of understanding and navigating complex, unpredictable scenarios. This is quite unlike a factory assembly line, where robots perform the same movements repeatedly in a controlled, predictable setting. Unlike a smooth factory floor, these sites often feature uneven terrain, debris, variable lighting, and constantly changing conditions due to ongoing construction activities.
For these reasons, construction companies generally employ a skilled manual workforce of around 15 workers or fewer and often prioritize productivity over safety requirements in order to achieve cost-effectiveness and meet shorter timelines (weeks to a few months) for job completion. A large portion of the current construction workforce in Europe is nearing retirement age, and there is a lack of younger workers entering the trades to replace them. Physical demands, safety risks [22], and the consequent lack of gender, age, and disability inclusivity are increasingly deterring younger generations from considering construction as a viable career path.
Moreover, musculoskeletal disorders (MSDs) are the most common work-related health issue in the EU, affecting workers across all sectors and causing significant costs to enterprises and society [23]. Jobs with higher MSD risks, the so-called “3-D jobs” (dirty, dangerous, and demanding), are found particularly in the agriculture, horticulture, construction, healthcare, household, transport, and food sectors.
In response, EU-OSHA launched a 4-year research initiative in 2017 aimed at addressing MSDs. The initiative’s goals include improving policy instruments at both EU and national levels, enhancing MSD prevention and management in workplaces, sharing successful practices, supporting national measures for workplace prevention, promoting the reintegration of workers with MSDs, and identifying research priorities to better understand the causes of MSDs.
One of the study’s conclusions was that the integration of ergonomic principles into product and process design plays a pivotal role in fostering effective and safe human–machine cooperation. Ergonomic design seeks to adapt systems to human capabilities and limitations, ensuring that workers interact with machines in ways that promote safety, comfort, and efficiency “by design”.
Bric-a-Brick leverages ergonomics and user-friendliness as key strategies to automate physically demanding and repetitive tasks within small construction sites.
The remainder of this paper is organized as follows. Section 2 outlines the project background and core assumptions, establishing the context and guiding principles for this work. Section 3 describes the materials, detailing both the hardware and software components used. Section 4 presents the methods, focusing on the implementation process and key technical approaches. Section 5 reports the results, highlighting the main findings and timing metrics for each operation. Finally, Section 6 offers a discussion of the outcomes, addressing the project’s implications, limitations, and potential directions for future work.
2. Project Background and Core Assumptions
The precursor of the Bric-a-Brick project is the 2024 BRIX [24] system (Figure 9), a mobile manufacturing system prototyped by INDEXLAB and Sigma Ingegneria that integrates autonomous vehicles, robotic clay-block laying, and algorithmic programming. First shown at the 2023 SAIE Construction Fair in Bari, Italy, it leverages the adaptability of off-site parametric modeling and, employing a vision system and calibration markers, constructs a clay-block wall on-site.
Clay blocks align perfectly with Europe’s decarbonization goals by offering a sustainable, low-carbon solution for modern construction. These blocks are manufactured through a vacuum extrusion process that compacts the clay mixture, resulting in a dense and uniform brick with consistent properties. This method not only enhances the strength and durability of Porotherm blocks compared to other brick types but also contributes to their excellent thermal insulation properties, making clay-block wall construction an energy-efficient choice. Additionally, the ribbed sides of the blocks improve adhesion with other blocks, plaster, or stucco finishes.
Prefabricated systems using clay blocks allow for efficient building processes, reducing waste and energy consumption on-site. Their lightweight design and compatibility with thin-joint mortar speed up construction and reduce labor costs. Made from natural materials, they support eco-friendly building practices with low embodied energy and help regulate indoor humidity for healthier environments. The chosen type of clay block for these experiments is the Wienerberger Pth BIO MOD 30-25/19, which is a modular block for load-bearing seismic-resistant walls measuring 300 × 250 × 190 mm [25] and weighing 12.9 kg.
Construction with Porotherm blocks achieves 25–30 m² per person per day, which is significantly faster than the 12–15 m² typical of traditional methods. It reduces water usage by 95%, requiring just 72 L for mortar in a 212 m² building compared to 1060 L with traditional methods. The blocks have a compressive strength of 10 N/mm², offering better durability than standard concrete blocks. They are lighter, reducing the risk of injuries, and are non-combustible, enhancing safety. The clay’s natural properties ensure long-term sustainability, offering low-carbon manufacturing processes and providing buildings with a lifespan of over 150 years with minimal maintenance.
Clay blocks also provide excellent thermal and acoustic properties, and roughly 30% of the block material comes from alternative, recycled, or secondary sources. The system’s engineering results in minimal waste and no moisture shrinkage, ensuring cleaner sites and reduced costs. Overall, Porotherm is a fast, efficient, and sustainable building system that offers several advantages over traditional masonry construction [26].
In BRIX, an algorithm takes a BIM parametric wall of given dimensions, divides it into sections, and derives the number of elements to be handled for each portion of the wall. The rover receives a set of waypoints that let it position itself consistently near the wall, and the cobot receives the instructions to pick each clay block from a pallet and place it on the wall.
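As an illustration of this discretization step, the following minimal Python sketch splits a wall into fixed-length sections and counts the blocks in each one. It is not the BRIX algorithm: the function and field names are hypothetical, the 250 mm block length and 190 mm course height are assumed from the block designation, and mortar joints, bond pattern, and openings are ignored.

```python
import math


def split_wall(wall_length_mm, wall_height_mm, section_length_mm,
               block_length_mm=250.0, block_height_mm=190.0):
    """Split a wall into sections and count the blocks per section.

    Returns one dict per section with the number of blocks and a single
    rover waypoint placed at the section midpoint (an assumption).
    """
    courses = math.floor(wall_height_mm / block_height_mm)
    n_sections = math.ceil(wall_length_mm / section_length_mm)
    sections = []
    for i in range(n_sections):
        start = i * section_length_mm
        # The last section may be shorter than the nominal section length.
        span = min(section_length_mm, wall_length_mm - start)
        per_course = math.floor(span / block_length_mm)
        sections.append({
            "index": i,
            "courses": courses,
            "blocks_per_course": per_course,
            "blocks": per_course * courses,
            "waypoint_x_mm": start + span / 2,
        })
    return sections


if __name__ == "__main__":
    for section in split_wall(3000, 1900, 1000):
        print(section)
```

A real implementation would read the wall from the BIM model and account for joints and staggered courses; the sketch only shows how section counts and waypoints can follow from the wall dimensions.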
First, the movements for each clay block are programmed offline (the visual sequence is shown in Figure 10), starting with a linear approach to the gripping point, followed by the alignment of the robot over the gripping point. The gripper then tightens, and the robot withdraws from the gripping point in a linear motion. It proceeds with joint movement toward the release point, followed by another linear approach, placing the robot at the release point, opening the gripper, and finally withdrawing from the release point. The program is transferred to the robot controller, and the second phase begins: positioning the rover at the first predefined point, integrating the vision system, and establishing communication between the rover and the robot. At the designated waypoint, the rover sends a digital signal to the robot, indicating the start of the subsequent operations. However, before executing any commands, the system ensures that the rover’s positioning is validated by the vision algorithm.
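The per-block motion sequence described above can be written down as an ordered list of steps. The sketch below is a hedged illustration in Python, not the program transferred to the robot controller: the `Step` type, the `pick_and_place` function, and the 100 mm vertical clearance used for approach and withdrawal are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Step:
    kind: str                  # "linear", "joint", or "gripper"
    target: Union[Point, str]  # a point for motions, "close"/"open" for the gripper


def pick_and_place(grip: Point, release: Point,
                   clearance_mm: float = 100.0) -> List[Step]:
    """Build the eight-step sequence for one clay block."""

    def above(p: Point) -> Point:
        # Hypothetical approach point: directly above the target.
        return (p[0], p[1], p[2] + clearance_mm)

    return [
        Step("linear", above(grip)),     # linear approach toward the gripping point
        Step("linear", grip),            # alignment on the gripping point
        Step("gripper", "close"),        # gripper tightens
        Step("linear", above(grip)),     # linear withdrawal
        Step("joint", above(release)),   # joint movement toward the release point
        Step("linear", release),         # linear approach to the release point
        Step("gripper", "open"),         # gripper opens
        Step("linear", above(release)),  # withdrawal from the release point
    ]


if __name__ == "__main__":
    for step in pick_and_place((0.0, 0.0, 0.0), (500.0, 200.0, 190.0)):
        print(step)
```

Keeping the sequence as data rather than as hard-coded robot calls makes it easy to regenerate for every block of a wall section before translating it into the controller's own language.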
This validation process involves adjusting the camera height to target a specific marker, a grid-based point recognized by the algorithm. This marker, located near the pallet’s corner, helps redefine the shared reference system between the robot, pallet, and wall, as initially defined during offline programming. Once the marker’s position is verified, indicating the correction of any rover placement error, the robot proceeds to execute the programmed sequence for the designated portion of the wall. Upon completion, the robot sends a digital signal to the rover, instructing it to move toward the next station.
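The rover–robot handshake at each station can be summarized as a small event loop. The following Python sketch is only a schematic reading of the description above: the event names are invented for illustration, `marker_found` is a hypothetical callback standing in for the vision check, and halting when the marker is not recognized is an assumption, since the text does not specify the failure behavior.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str]


def run_stations(n_stations: int,
                 marker_found: Callable[[int], bool]) -> List[Event]:
    """Simulate the station-by-station handshake and return the event log."""
    events: List[Event] = []
    for station in range(n_stations):
        # Rover -> robot: digital signal at the designated waypoint.
        events.append((station, "rover_signals_arrival"))
        # The vision algorithm must validate the rover position first.
        if not marker_found(station):
            events.append((station, "halt_marker_not_found"))
            break
        # Marker verified: the shared reference system is redefined.
        events.append((station, "reference_frame_corrected"))
        events.append((station, "wall_section_built"))
        # Robot -> rover: digital signal to move to the next station.
        events.append((station, "robot_signals_done"))
    return events


if __name__ == "__main__":
    for event in run_stations(2, lambda station: True):
        print(event)
```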
While the BRIX system constitutes a leap forward in the automation of clay-block construction, it does not match the dexterity of a bricklayer in placing a clay block to build a wall. Errors such as a misplaced clay block on the pallet, a faulty pick or place operation, or a misaligned rover (particularly when the built-in vision system fails to recognize the markers) require debugging and problem-solving that can only be done by a skilled professional. Since the bricklayer has no intuitive way to interact with the end-effector, operating the machine requires extensive training, which is a burden for someone who is not necessarily willing to learn robot control and programming. Additionally, even though the system is lightweight (weighing around 650 kg), the rover is too large to pass through regular doors and does not include a mechanism to build the higher clay-block rows of the wall.
The experience of BRIX shifted the focus of the Bric-a-Brick prototype from automating clay-block wall construction toward relieving the majority of manual work strains for regular bricklayers while still ensuring human–machine collaboration. In this way, the project promotes broader inclusivity for operators of different ages, genders, and abilities, nudging present practices toward an inclusive approach to make construction work more equitable and accessible, and emphasizing the importance of creating technological solutions that respect diversity and value individual capabilities (Figure 11).
Therefore, to meet TRL 9 by 31 December 2026, some key assumptions have been established to enable a more immediate dive into the core characteristics of the machine.
The first set of assumptions concerns the replicability of the job type: in this use case, doors and passages can be as narrow as 800 mm, so the focus is on clay-block wall construction (including load-bearing walls) up to around 3 m in height. This is because modular yet cost-effective solutions for small-scale residential buildings in Italy normally employ clay blocks, and a typical residential structure withstands live loads up to 2 kN/m² [27]. It is mandatory to work within these limits (Figure 12), along with another important factor: foundations are already in place.
These requirements are met if the prototyped system is compact (its width is small enough to allow the machine to easily pass through doors; Figure 13), lightweight (it is easily transportable on-site using a simple van; Figure 14), and directly movable next to a wall by a bricklayer, thereby reducing the majority of the system’s transportation costs. It must also include a lifting mechanism for the robot arm to lift the clay block to the required height to finish the wall construction.
The second set of assumptions concerns the features of the clay blocks to be used, specifically why the same model—the Wienerberger Pth BIO MOD 30-25/19—of the BRIX prototype was employed in this case.
Since this phase involves an innovative approach to problem-solving, the priority was to eliminate non-essential elements that, while potentially elegant, would not necessarily contribute to long-term objectives. A Porotherm clay block with rectangular apertures at the center was assumed to be the easiest to use straight away, as the fingers of a standard industrial gripper can immediately conform to its surface without additional customization. It was deemed non-essential to focus on gripping flexibility in this phase, with the plan to address other shapes of clay blocks in the future using a custom-designed mechanical gripper. In the long term, a vacuum-based gripper adhering to the lateral surface could offer much greater resilience for a plethora of other shapes [
15].
This method has some drawbacks because gripping the clay block from the top prevents Bric-a-Brick from placing the last row of wall elements directly under an existing beam. Nonetheless, as described above, since many challenges remain with lateral mechanical gripping, and the prototyping cost and effort are largely unjustified compared to its long-term prospects, it was decided to adopt the cheapest solution available and focus on other critical issues.
The perks of working with Porotherm blocks, among a great variety of block-based construction options, include their future sustainability prospects and their highly recognizable shape. In fact, if the geometry of the piece, including its distinctive hole pattern or silhouette, is simple enough to extract from the surrounding environment, a computer vision system can detect “seen” objects—i.e., supplied with the CAD model or 3D mesh in advance—without the need for real-time shape acquisition and reconstruction. The hardware setup can therefore be cheaper in this phase, mainly in terms of the GPU used, and the focus can be on optimizing the pipeline on the software side.
Using a collaborative robot (cobot) with a high payload capacity of 25 kg is also a strategic decision for the future development of Bric-a-Brick. This choice ensures that if the type, shape, or weight of the clay block evolves (e.g., if more insulating and finishing layers are incorporated into the wall to lower its thermal transmittance), the cobot will be capable of lifting up to roughly twice the current weight of the clay block.
Working from these premises, the solution could be effectively implemented within approximately one year of research and development, paving the way for future advancements and extensive testing of the Bric-a-Brick prototype. Despite its compact design and the fact that it weighs just 150 kg, Bric-a-Brick delivers an impressive range of motion through its telescopic and retractable features (Figure 15).
3. Materials: Hardware and Software Overview
At the heart of Bric-a-Brick lies the integration of a mobile base and a robotic arm through a lifting column and a manually operated stabilizing system (Figure 16). Following a strategic evaluation, it was decided at this stage of development to use mature, off-the-shelf technologies currently available on the market. Since the focus is on delivering a first working version rather than on optimizing for efficiency, core hardware components were not reinvented, and the complexity of interoperability and integration was delegated to the software layer. This approach quickly unblocks critical issues, keeps development moving, and concentrates efforts on the “missing parts”.
The wheelbarrow is a lightweight and cost-effective mobile base, increasingly used for transporting materials like sand, gravel, cement, and bricks across construction sites. It reduces physical strain, speeds up work processes, and improves safety by lowering the likelihood of worker fatigue or injury, and it is versatile enough to handle difficult terrain and steep slopes. Using the three-layer laminated spruce panel typically found on construction sites for formwork reinforcement (which normally measures 27 mm × 500 mm × 3000 mm) as a ramp, it is possible to manually load and unload Bric-a-Brick from a van without the need for a tail lift (Figure 17). Wheelbarrows are not only simple to maneuver and very useful for solving logistical issues, but they are also very cost-effective and compact compared to small cranes or other semi-movable vehicles like skid-steer loaders, mini dumpers, or even forklifts.
The BEACH 270-E [28] is a tracked electric wheelbarrow with a load capacity of up to 450 kg, designed to offer high-performance load transport on sloped surfaces of up to 35%. This makes it especially suitable for use in construction, agriculture, and other labor-intensive sectors where transporting heavy items is routine, and rough, uneven ground usually requires machines with tracks. Equipped with an 800 W motor and powered by rechargeable 60 V–20 Ah batteries, the BEACH 270-E can reach speeds of up to 5 km/h, ensuring emission-free operations and great adaptability.
Its removable loading bed, measuring 1320 × 1000 × 480 mm, was removed to install a custom-bent 5 mm steel sheet interface, serving as a connection between the wheelbarrow and the robotic arm. This metal piece’s geometry integrates a stabilizing system made of retractable square-section bars, with manual jacks at the end to balance the wheelbarrow, and a lifting column with a cobot on top. Provision was made for additional clearance at the front of the unit to accommodate the future integration of a supplementary robotic arm and, in the meantime, to allow for the easy attachment and transport of objects.
The Bosch GCL 2-15 Professional Combi Laser [29] offers straightforward visual help to the operator to stabilize the machine. Featuring two centered plumb points, it has a working range of 15 m and laser class 2. Additionally, the included laser target plate enhances accuracy in various lighting conditions.
In order to lift the robot (and reach up to 4.5 m with the fully extended arm), the Linak LC3E0008DK Elevate Easy [30,31] is mounted on top of the metal piece; it is a robust lifting column weighing just 29 kg and measuring 163 mm × 163 mm × 730 mm when fully retracted. Designed with a three-stage telescopic mechanism, it is engineered for vertical actuation in collaborative robotic and industrial automation settings, enabling a long stroke of 900 mm with a payload capacity of up to 1000 N. It is powered by a 24 V DC supply and incorporates a brushless motor, which ensures high efficiency, low maintenance, and quiet operation. Depending on the spindle pitch, the unit can achieve a maximum speed of up to 100 mm/s, making it responsive enough for dynamic positioning tasks. The connection cables provide the necessary infrastructure for TCP master–slave functionality, ensuring communication with the robot controller.
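The column figures quoted above lend themselves to a quick sanity check. The short Python sketch below uses only the stroke, maximum speed, and rated load stated in the text; the function names are hypothetical, acceleration and deceleration are neglected, and the mass carried by the column must be supplied by the reader because the cobot and gripper masses are not given here.

```python
G = 9.81  # standard gravity, m/s^2

STROKE_MM = 900.0       # stroke stated for the lifting column
MAX_SPEED_MM_S = 100.0  # maximum speed stated for the lifting column
MAX_LOAD_N = 1000.0     # payload capacity stated for the lifting column


def min_full_stroke_time_s(stroke_mm=STROKE_MM, speed_mm_s=MAX_SPEED_MM_S):
    """Lower bound on the time for a full stroke at constant maximum speed."""
    return stroke_mm / speed_mm_s


def max_supported_mass_kg(max_load_n=MAX_LOAD_N):
    """Mass whose weight equals the rated load (static case only)."""
    return max_load_n / G


def load_margin_n(carried_mass_kg, max_load_n=MAX_LOAD_N):
    """Remaining static load margin for a given carried mass."""
    return max_load_n - carried_mass_kg * G


if __name__ == "__main__":
    print(min_full_stroke_time_s())           # seconds for 900 mm at 100 mm/s
    print(round(max_supported_mass_kg(), 1))  # kg equivalent of 1000 N
```

Under these assumptions, a full 900 mm stroke takes at least 9 s, and the 1000 N rating corresponds to roughly 102 kg of static load, which is the budget shared by the cobot, the gripper, the interface, and the clay block.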
Bric-a-Brick operations are carried out with a collaborative robotic arm (Doosan model H2515 [32]) mounted at the center top to ensure optimal balance and reach.
It is useful to point out that choosing a cobot over an industrial robot goes beyond payload and weight advantages. While benefits like reduced energy demand and integration flexibility support easy prototyping, cobots are part of a broader strategy to blend robotic productivity with human flexibility in safe collaboration zones. Cobots feature advanced safety sensors and built-in limits to ensure secure operations. Their intuitive interfaces and low-code setups, combined with computer vision, dramatically reduce deployment time, especially in dynamic, unstructured environments.
The H2515 is part of the H-SERIES and is a collaborative 6-DoF robot with a payload capacity of 25 kg and an operating radius of 1500 mm. It features six torque sensors and operates efficiently with low electrical power consumption. The maximum linear TCP speed is 1 m/s, and it has a repeatability of ±0.1 mm.
Although more affordable robotic arm options, with higher reach but a lower payload, were considered for the application, the decision was made to prioritize the collaborative robot with the highest payload available on the market. This choice ensures that, even with an extended arm reach—and with a heavier clay block than the one chosen for the demonstration—the robotic system can maintain optimal joint load distribution, preventing excessive strain on the actuators and ensuring stable movements without compromising mechanical integrity.
The system’s control box, comprising an inverter to convert the battery voltage of 48 V DC into an alternating current at 220 V AC and 50 Hz, the Doosan controller, its Teach Pendant, and a notebook host machine, is housed on top of a trailer measuring 680 × 540 × 320 mm. This trailer is designed for easy detachment and transportation on-site. This configuration is particularly critical for the future integration of the wheelbarrow’s battery pack as a power source, an objective that has not yet been tested but will be the next step in the integration process.
As a gripper, Schunk’s EGU 60-MB-M-B [33] was chosen, as it is a universal electric gripper designed for handling objects weighing less than 5 kg. It features a 60 mm stroke per jaw, with gripping forces ranging from 325 N to 1300 N, and these characteristics were selected as being the most suitable to mechanically grasp the clay block by inserting the fingers inside its central holes, considering a weight range of the piece from 10 to 18 kg. Powered at 24 V, it ensures a secure hold via grip-force maintenance. Operating temperatures range from 5 to 55 °C, which is normally fine in temperate environments, particularly in the context of construction sites where the horizontal partitions are already in place and exposure to harsh outside climate conditions is not common. The gripper interfaces with the notebook controller via a Modbus RTU-to-USB interface.
It is important to note that, even if most of the hardware components are IP65-rated, should the prototype be required to endure significant rainfall, it would be prudent to choose hardware components specifically designed to be waterproof—and to provide accessory casing and jacketing for both the column and the cobot—ensuring optimal performance and durability under such conditions.
As the primary computer vision module for depth perception and object detection, an Intel RealSense D457 [34] stereo depth camera with global-shutter sensors is mounted on Bric-a-Brick, offering depth sensing from ~0.52 to 6 m with <2% error at 4 m. This range is well suited to the application, considering that depth errors and disparity artifacts begin to appear only at almost double the span of the robot’s working range. It delivers depth at up to 1280 × 720 pixels at 90 fps and RGB at 1280 × 800 pixels at 30 fps. The depth field of view is 87° × 58°, which is optimized for mid- to long-range perception. IP65-rated, it is suitable for dusty industrial environments like the average construction site and can withstand projected water.
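To get a feeling for what the stated depth accuracy means at working distance, the sketch below scales the quoted bound (<2% at 4 m) to other distances. It assumes that the depth error grows with the square of the distance, as is typical for stereo triangulation; this scaling law, the function name, and the resulting numbers are illustrative assumptions and not manufacturer data or measurements from this work.

```python
MIN_RANGE_M = 0.52         # lower end of the stated depth range
MAX_RANGE_M = 6.0          # upper end of the stated depth range
REF_DISTANCE_M = 4.0       # distance at which the error bound is quoted
REF_RELATIVE_ERROR = 0.02  # <2% error at the reference distance


def depth_error_bound_m(z_m):
    """Rough upper bound on the depth error at distance z_m.

    Assumes quadratic growth of the error with distance (stereo
    triangulation); real behavior also depends on resolution,
    calibration, and scene texture.
    """
    if not MIN_RANGE_M <= z_m <= MAX_RANGE_M:
        raise ValueError("distance outside the stated depth range")
    ref_error_m = REF_RELATIVE_ERROR * REF_DISTANCE_M
    return ref_error_m * (z_m / REF_DISTANCE_M) ** 2


if __name__ == "__main__":
    for z in (1.0, 1.5, 4.0):
        print(z, round(depth_error_bound_m(z) * 1000, 2), "mm")
```

Under this assumption, the bound at the 1.5 m reach of the arm would be on the order of 11 mm, compared with 80 mm at 4 m, which is consistent with keeping the camera close to the pallet during block recognition.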
A custom 3D-printed robot flange–gripper interface (Figure 18) was designed and engineered to add handles to the end-effector and easily manipulate the robot arm, to position and secure the camera in an optimal orientation for image capture, and to include four buttons for both taking direct control of the Doosan robot while running the program and releasing the clay block in the desired position. Overall, the interface serves as a backup if the camera is not able to recognize the clay block because of poor lighting conditions, but it also prospectively provides a comfortable remote control for enhanced data acquisition of the cobot’s motion and its joint sensors.
The ergonomics of this piece, printed in PLA using a Bambu Lab X1C 3D Printer (Bambu Lab, Shenzhen, China) [35], allow for easy grasping by the operator. The piece also embeds a safety interlock: the clay block is released only when both buttons are pressed and the gripper is inside the collaborative volume. The weight compensation of the cobot gives the bricklayer more overall maneuverability when carefully positioning the gripped clay block using the handles.
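The release condition described above is a simple conjunction, which the following Python sketch makes explicit. It is a minimal illustration and not the firmware of the interface: the `Box` type, the function name, and the modeling of the collaborative volume as an axis-aligned box in the robot base frame are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned volume; an assumed model of the collaborative volume."""
    min_xyz: Point
    max_xyz: Point

    def contains(self, p: Point) -> bool:
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.min_xyz, p, self.max_xyz))


def release_allowed(button_a: bool, button_b: bool,
                    tool_xyz: Point, collaborative_volume: Box) -> bool:
    """Allow release only with both buttons pressed inside the volume."""
    return button_a and button_b and collaborative_volume.contains(tool_xyz)


if __name__ == "__main__":
    volume = Box((0.0, -500.0, 0.0), (1500.0, 500.0, 2000.0))
    print(release_allowed(True, True, (800.0, 0.0, 1000.0), volume))
    print(release_allowed(True, False, (800.0, 0.0, 1000.0), volume))
```

Requiring two simultaneous button presses keeps both of the operator's hands on the handles at the moment of release, which is the usual rationale for two-hand controls.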
The hardware for the Bric-a-Brick application is integrated using a Dell Inspiron 7577 (Dell Inc., Round Rock, TX, USA) [36] as the host machine. This computer has 16 GB of RAM and features an Intel Core i7-7700HQ processor (Intel Corp., Santa Clara, CA, USA) with four cores (eight threads) and a dedicated NVIDIA GeForce GTX 1060 (NVIDIA Corp., Santa Clara, CA, USA) [37] graphics card.
Serving as a base for the major part of the pipeline is an infrastructure of Docker microservices running Ubuntu 24.04.1 LTS, 64-bit, with a firmware version of 1.17.0 and a kernel version of Linux 6.8.0-41-generic. The streaming pipeline, image segmentation, object recognition, and retrieval of the clay-block gripping-plane coordinates are handled by a combination of Python 3.11.8 scripts that output a coordinate file. The use of Docker containers for these Python scripts ensures that each service is isolated, making the infrastructure more scalable, maintainable, and portable for future use cases.
The Segment Anything Model (SAM 2) [38] is a state-of-the-art segmentation tool designed to identify and delineate objects within images. It leverages deep learning and is trained on a large, diverse dataset to enhance its generalization capabilities. SAM 2 is a promptable model: users specify regions of interest (ROIs), for example with point or box prompts, and the model generates the corresponding segmentation masks. This flexibility enables it to adapt to various applications, from medical imaging to autonomous driving. The model relies on a transformer-based image encoder and attention mechanisms to capture object boundaries and details. By analyzing pixel-level information, SAM 2 can distinguish between overlapping objects and complex backgrounds.
To retrieve the 6-DoF pose, NVlabs’s FoundationPose [39,40] pipeline was used. FoundationPose is a unified model for object pose estimation and tracking that supports both model-based and model-free approaches. It can be applied to new objects at test time without fine-tuning, requiring only a CAD model or a few reference images. The model uses a neural implicit representation for novel view synthesis, ensuring consistent pose estimation across different setups. It achieves strong generalization through large-scale synthetic training, has a novel transformer-based architecture, and uses contrastive learning [41].
Given a CAD model of the clay block and its SAM 2-segmented mask superimposed on the RGB image taken from the camera, FoundationPose recognizes the pose of the clay block and supplies the 6-DoF coordinates of the mesh plane in the frame’s coordinate space (Figure 19).
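A hedged sketch of how such a pose could be turned into a gripping plane in the robot base frame is given below. It assumes that the pose is available as a 4 × 4 homogeneous matrix in the camera frame and that a camera-to-base transform is known from calibration; the function names and the JSON layout of the coordinate record are hypothetical and do not reproduce the actual file format used in the pipeline.

```python
import json


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose_to_plane(t):
    """Extract origin and x/y axes of a plane from a 4x4 pose matrix."""
    return {
        "origin": [t[0][3], t[1][3], t[2][3]],
        "x_axis": [t[0][0], t[1][0], t[2][0]],
        "y_axis": [t[0][1], t[1][1], t[2][1]],
    }


def gripping_plane_json(t_base_cam, t_cam_obj):
    """Express the object pose in the base frame and serialize it."""
    t_base_obj = matmul4(t_base_cam, t_cam_obj)
    return json.dumps(pose_to_plane(t_base_obj))


if __name__ == "__main__":
    # Illustrative values only: camera offset from the base by (1, 2, 3) m,
    # object rotated 90 degrees about z and 0.5 m along the camera x axis.
    t_base_cam = [[1, 0, 0, 1.0], [0, 1, 0, 2.0], [0, 0, 1, 3.0], [0, 0, 0, 1]]
    t_cam_obj = [[0, -1, 0, 0.5], [1, 0, 0, 0.0], [0, 0, 1, 0.0], [0, 0, 0, 1]]
    print(gripping_plane_json(t_base_cam, t_cam_obj))
```

An origin with two axes is a convenient representation because it maps directly onto the plane primitive used by most CAD and robot simulation environments.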
On Windows 10, this coordinate file is imported into a simulation environment in Grasshopper3D [42], a popular visual programming environment for 3D modeling and analysis built on Rhinoceros. This Grasshopper3D script interprets the coordinate file, simulates the motion of the robot using the visose/Robots library [43], and plans the optimal path for the robot arm to grip the clay block, move it safely to the collaborative volume, interact with the bricklayer, and return to pick up another clay block.
4. Methods: Implementation
The tests for the
Bric-a-Brick project were undertaken at the INDEXLAB laboratory in Lecco, Italy, in a covered outdoor space (
Figure 20) adjacent to the parking area, where the lighting is uniformly distributed at a low intensity, creating consistent yet subdued illumination throughout. This lighting configuration, which prioritizes energy efficiency while avoiding any harsh contrasts or shadows, should be the minimum standard for testing the camera vision system. While no direct light sources were aimed at or near the testing area, in the context of a construction site, direct sources of light can be added at convenience to ensure the clay block is recognized in most conditions.
In the first phase, a proof of concept for loading and unloading
Bric-a-Brick from a van was successfully performed using two standard three-layer laminated spruce panels, commonly used for formwork reinforcement in construction, placed one on top of the other. With the panels serving as a ramp for the wheelbarrow, and without the assistance of a tail lift (
Figure 17), a single operator could handle these tasks intuitively and without supervision.
In the second phase, the operator positioned the electrical wheelbarrow within the 1500 mm reach of the cobot arm (usually 1200–1300 mm away), adjacent to the clay-block pallet and the wall to be built. The operator then manually extended the stabilizing system, consisting of simple retractable square-section bars, and adjusted the jacks at the end. In order to ensure the wheelbarrow was level, the operator used the combi laser. After completing this quick setup, the operator started the program on the notebook machine (
Figure 20).
The operator knows where to manually guide the wheelbarrow from a 3D visualization on the notebook screen, indicating how the clay-block wall is clustered (
Figure 21) and in what area—approximately—the robot and the pallet should be placed to pick and place every clay-block cluster correctly. In these visualizations, the bricklayer can orbit the view using the mouse and point and click to see annotations of distances and measurements.
This visualization is prepared beforehand by another operator, who designed and planned the construction of the wall for the chosen clay block using a Grasshopper3D script that proposes the building process according to defined clustering rules and specified parameters. If a wall has a defined axis, width, and height, it can be divided according to the clay block and an appropriate mortar-joint thickness. Subsequently, it is divided into “macro-rows”, which span the width of the wall and are generally about to of the height, depending on the specific dimensions of the wall and the robot’s reachability.
These macro-rows are then divided as well using clusters of a certain number of blocks—generally between 8 and 11—that constitute the batch of blocks that Bric-a-Brick will build at a time. After having picked and placed this batch, the wheelbarrow is moved, and the routine continues to the next batch of blocks. The parameters that control this process are the mortar thickness, dimensions of the clay block, and batch geometry. The output is a simulation of robot reachability, a visualization and count of blocks in each batch, and the number of batches within a macro-row.
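The counting part of this subdivision can be sketched as follows. This is a simplification under stated assumptions: it treats the wall as a plain rectangle, ignores the macro-row and parallelogram batch geometry, and uses hypothetical block dimensions, joint thickness, and function names that are not taken from the Grasshopper3D script.

```python
import math


def wall_layout(length_mm, height_mm, block_len_mm, block_h_mm, joint_mm, batch_size):
    """Count blocks per course, courses, total blocks, and batches for a wall."""
    # n blocks in a row need n * block + (n - 1) * joint <= available length
    per_course = math.floor((length_mm + joint_mm) / (block_len_mm + joint_mm))
    courses = math.floor((height_mm + joint_mm) / (block_h_mm + joint_mm))
    total = per_course * courses
    return {
        "blocks_per_course": per_course,
        "courses": courses,
        "total_blocks": total,
        "batches": math.ceil(total / batch_size),
    }


# Hypothetical 5 m long, 4 m high wall with 373 x 238 mm block faces,
# 10 mm joints, and batches of 10 blocks
layout = wall_layout(5000, 4000, 373, 238, 10, 10)
```

Changing the mortar thickness or the block dimensions changes the number of courses and therefore the batch count, which is why these values are exposed as parameters.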
It should be noted that the very nature of traditional clay-block wall construction—the process by which the bricklayer places the blocks to build the first row, then the second, then the third, until the bottom of the structural beam or ceiling—is completely different in
Bric-a-Brick. It would be inefficient to move the wheelbarrow multiple times to build the wall row by row, since the stabilizing system would need to be repositioned and readjusted repeatedly, leading to losses in productivity and possibly more calibration errors. If the wall is built in macro-rows, the batches of blocks could all have the same parallelogram shape (apart from the batches at the ends, which have a triangular,
peacock-tail shape to conform to the constraints of vertical structures), allowing for faster construction (
Figure 22).
When the Grasshopper3D script supplies the visualization on the
Rhinoceros canvas of the wall divided into portions—in convenient clay-block batches according to the reachability of the cobot—the bricklayer is prompted to permit the next phase to begin (see
Figure 23 for a graphical representation of the pipeline).
The cobot begins a scanning routine with the camera. It first waits for the user to hand-guide the end-effector by its handles until it points at the pallet of bricks, registers that position, and then moves in spiral mode (a built-in function in the Doosan movement library) outwards and inwards while saving RGB and depth frames using the best available streaming configuration. Specifically, an
OpenCV [
44] window opens on the notebook, and the operator sees the live camera stream framed by the current end-effector pose. When the clay-block pallet is approximately in the center of the scene, the operator presses a key to advance to the next stage or another key to redo this step.
If the frame is satisfactory, the stream begins and, with the pallet well visible, the operator is prompted to sketch—on the screen using the mouse—to define the ROI of the pallet relative to the background of the construction site. Other modes are available to segment the pallet of bricks, such as text mode and point mode, but they were not extensively tested since it was deemed more intuitive to draw with the mouse directly on the picture instead of relying on a point-and-click solution.
A bounding box is thus derived from the bounding rectangle of the sketch, and the SAM 2 model starts to segment the bricks, refining the segmentation in real time while the stream is running and the script is saving frames to the dataset folder on the host machine. The segmentation is not always correct, and further refinement may be needed (e.g., the contour of the clay block is pixelated, some portions are missing, or some of the bricks are excluded because lighting conditions are poor), but the purpose of this step is only to isolate the relevant portion of the frame from the background. This ensures computing resources are not wasted by running the algorithm across the entire image.
An automatic offset of roughly 30 pixels applied to the ROI ensures the majority of the blocks are included within the scene. A second bounding box is then derived, and the RGB and depth frames are cropped and saved in another folder.
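The padded bounding box and crop can be sketched in a few lines. The version below is a minimal illustration that assumes the ROI is given as a list of (x, y) pixel coordinates and the frame as a row-major nested list; the function names are hypothetical, and in practice the same slicing would normally be applied to image arrays.

```python
def padded_bbox(roi_points, width, height, offset=30):
    """Axis-aligned bounding box of the ROI, grown by `offset` pixels and clamped."""
    xs = [x for x, _ in roi_points]
    ys = [y for _, y in roi_points]
    x0 = max(min(xs) - offset, 0)
    y0 = max(min(ys) - offset, 0)
    x1 = min(max(xs) + offset, width - 1)
    y1 = min(max(ys) + offset, height - 1)
    return x0, y0, x1, y1


def crop(frame, bbox):
    """Crop a row-major frame (list of rows) to the inclusive bounding box."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]


# Illustrative 640 x 480 frame whose pixels store their own coordinates
frame = [[(x, y) for x in range(640)] for y in range(480)]
bbox = padded_bbox([(100, 120), (300, 90), (280, 400)], 640, 480)
patch = crop(frame, bbox)
```

Clamping to the frame size matters when the pallet sits near the image border, since the offset would otherwise produce negative or out-of-range indices. The same box is applied to both the RGB and the depth frame so that the two crops stay aligned.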
The program waits and prompts the user to supply a CAD model to identify within the ROI. In this case, the 3D mesh of the clay block was made readily available—alongside a plethora of other clay-block mesh options—through a Grasshopper3D script. This script takes Porotherm datasheets in .PDF as input, converts the plan of the clay block to vectors, extrudes and caps it, textures it with a solid orange color, and exports it in .STL format. When the user selects the appropriate mesh from the clay-block list, the subsequent part of the script begins.
In the last part of the pipeline, FoundationPose compares the segmented portions of the scene and reconstructs—from the cropped RGB and depth frames of the scene—the 6-DoF pose of the recognized blocks. Another OpenCV window opens, showing the video result of the operation and overlaying a bounding box of the supplied mesh and the x, y, z axes of the plane onto the items it recognizes in the image stream. It then closes and outputs the coordinate files of the plane for each detected clay block in every frame of the RGB-cropped stream.
A Python function calculates, among these planes representing the coordinates of each recognized clay block, which ones are on the top of the pallet, dividing the top row from the others. It then excludes the outliers from these coordinate values—i.e., those describing plane coordinates that are not contained within the convex hull of the majority of the other planes, or those with a completely different orientation—and outputs a single file with one unambiguous gripping-plane coordinate for each detected clay block.
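A simplified version of this filtering step is sketched below. It assumes that each plane is expressed in a frame whose Z axis points upward, selects the top row with a height tolerance, and rejects orientation outliers by comparing each normal with the component-wise median normal. The convex-hull test and the merging of repeated detections across frames are omitted, and all names and thresholds are hypothetical.

```python
import math
from statistics import median


def _unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in vector))
    return tuple(c / norm for c in vector)


def filter_top_row(planes, height_tol, max_angle_deg=15.0):
    """Keep planes on the top layer whose normals agree with the median normal."""
    if not planes:
        return []
    top_z = max(p["origin"][2] for p in planes)
    top = [p for p in planes if top_z - p["origin"][2] <= height_tol]

    # Reference orientation: component-wise median of the top-row normals
    reference = _unit(tuple(median(p["normal"][i] for p in top) for i in range(3)))

    kept = []
    for plane in top:
        cos_angle = sum(a * b for a, b in zip(_unit(plane["normal"]), reference))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= max_angle_deg:
            kept.append(plane)
    return kept


# Illustrative detections: three consistent top blocks, one tilted, one lower row
detections = [
    {"origin": (0.0, 0.0, 1.00), "normal": (0.0, 0.0, 1.0)},
    {"origin": (0.3, 0.0, 1.01), "normal": (0.02, 0.0, 1.0)},
    {"origin": (0.6, 0.0, 0.99), "normal": (0.0, 0.01, 1.0)},
    {"origin": (0.9, 0.0, 1.00), "normal": (1.0, 0.0, 0.2)},
    {"origin": (0.0, 0.0, 0.76), "normal": (0.0, 0.0, 1.0)},
]
top_row = filter_top_row(detections, height_tol=0.05)
```

The median is used as the reference because it is not pulled toward a single badly oriented detection, unlike the mean.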
A Grasshopper3D script automatically opens, interpreting the coordinates and plotting them in the simulation of the corresponding environment in the robot world. Each clay block is rendered oriented according to the coordinates of the plane in the Grasshopper3D canvas, and the top row is the first to be de-palletized, starting with the pieces nearest to the robot. It then calculates the movements to pick and place all the blocks needed to construct the first portion of the wall with the first calculated clay-block batch, generally using the top row of the pallet and part of the row underneath.
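The de-palletizing order described here (top row first, nearest pieces first) amounts to a two-level sort. The sketch below assumes block origins expressed in a robot-centered frame with Z pointing upward and a known layer height; the function name and the example values are hypothetical and do not reproduce the Grasshopper3D logic.

```python
import math


def pick_order(origins, layer_height, robot_base=(0.0, 0.0)):
    """Sort block origins by layer (highest first), then by horizontal distance."""
    def sort_key(origin):
        x, y, z = origin
        layer = round(z / layer_height)  # integer layer index, tolerant of small noise
        return (-layer, math.dist((x, y), robot_base))

    return sorted(origins, key=sort_key)


# Illustrative origins in meters: three blocks on the top layer, one below
blocks = [(0.9, 0.0, 1.0), (0.3, 0.1, 1.01), (0.2, 0.0, 0.76), (0.6, 0.0, 1.0)]
ordered = pick_order(blocks, layer_height=0.24)
```

Rounding the height to a layer index keeps small pose-estimation noise in Z from reordering blocks within the same row.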
The Grasshopper3D script outputs a program, employing a custom post-processor, that orchestrates linear motions at a speed varying between 50 and 250 mm/s to move the clay block outside the pallet area and bring it inside the collaborative volume for the operator to take control of the end-effector in joint motion. After placing the mortar on the clay block, the operator is prompted to take control of the cobot via manual hand-guiding by pressing the two buttons on the handle of the 3D-printed gripper interface.
The operator places the block in the desired position, applying the required pressure on the mortar bed to make it fully adhere, then presses the two buttons on the top of the gripper interface to release the clay block, and presses the two buttons on the handles again to exit manual hand-guiding mode. The cobot registers the clay block’s placement coordinates and moves back, in linear motion up in Z, to the collaborative volume before gripping the next clay block from the pallet. This operation continues until the first portion of the wall is completed; the wheelbarrow is then repositioned according to the initial 3D simulation, and another cycle begins.
When the last macro-rows have reached a certain height, a lifting mechanism for the bricklayer is also required to continue collaborating with the robot comfortably at that height (
Figure 24) and finish the wall.
5. Results
Overall, 12 full de-palletizing cycles were tested for the
Bric-a-Brick project (
Table 1 and
Figure 25). The loading and unloading operations, each performed by a single operator, were completed in 560 and 670 s, respectively, demonstrating the ease of use and efficiency of the system. Packing the robot for transport and unpacking it contributed roughly 150 s of downtime to the loading and unloading operations.
When the electrical wheelbarrow had been unloaded and guided adjacent to the clay-block pallet and the wall to be built, the operator extended the stabilizing system and adjusted the jacks at the end using the combi laser. The operator took from 400 to 500 s on average to complete these tasks. Nevertheless, during the first four tests, the full de-palletizing cycle had to be aborted, and the operation of extending the stabilizing bars was repeated with another configuration because it was not stable enough when two bars on one side were “too parallel” to the wheelbarrow’s length.
From the hand-guiding step to the end of the spiral-mode movement, the scanning routine took no more than 200 s in all tests, with a minimum of 140 s. The operation had to be repeated five times, with the clay-block pallet approximately in the center of the scene, to ensure that the spiral-mode motion would capture the majority of the first row of blocks in a single operation. Lighting variability and pallet misalignment influence segmentation quality: excessive direct light or cast shadows on the blocks could impair SAM’s ability to segment the object of interest in the point cloud extracted from the RGB-feature-rich depth camera. Safe deployment on real-world sites would require better control of lighting variability to ensure that the segmentation results are consistent.
The sketching stage failed the first two times, but then the operator learned how to define the ROI in the most straightforward way: drawing a simple sketched circle. This step took no more than 11 s.
When the user had selected the appropriate mesh from the clay-block list, FoundationPose began the recognition routine. While in an industrial application the model would be trained specifically on the clay block in question, the pretrained model already worked out of the box in 9 of the 12 tests conducted by our team. On the notebook used, this operation took from around 250 s (for a 480 × 360-pixel resolution and a 30 fps frame rate) to a maximum of 520 s (when streaming at a 640 × 480-pixel resolution and a 30 fps frame rate), depending on the stream configuration. Higher resolutions were tested outside the Bric-a-Brick testing pipeline, but they took substantially longer to recognize more than one block with the current hardware setup.
When the coordinates had successfully been selected, the outliers excluded, and the sifted coordinates exported, the Grasshopper3D simulation opened and rendered the scene and the robot trajectories. In these tests, this step took between 39 and 100 s, depending on the time Grasshopper3D needed to open when launched via the Python subprocess library. The trajectories were then visualized, and the program was generated. The operator was prompted to send the program to the robot, and the motions began.
It is important to highlight that the motion timings used as benchmarks in all the metrics, charts, and tables discussed were based on the lower bound (50 mm/s) of the speed range tested for the application (50–250 mm/s). Performance was thus evaluated at the slowest speed tested, which is a fairly conservative basis for assessing the robotic movements in this study and leaves clear room for optimization. The cycle speed of these operations can therefore be improved without much difficulty, allowing for enhanced efficiency and productivity in robotic tasks.
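To give a rough sense of this margin, the sketch below rescales a motion duration measured at one speed to another speed in the tested range. It assumes ideal constant-velocity motion, with no acceleration ramps, blending, or safety slow-downs, so it can only indicate an optimistic bound. The function name and the example durations are illustrative.

```python
def rescale_duration(duration_s, measured_speed_mm_s, target_speed_mm_s):
    """Estimate the duration of the same path at a different constant speed."""
    if measured_speed_mm_s <= 0 or target_speed_mm_s <= 0:
        raise ValueError("speeds must be positive")
    distance_mm = duration_s * measured_speed_mm_s  # path length implied by the timing
    return distance_mm / target_speed_mm_s


# Illustrative durations measured at 50 mm/s, rescaled to 250 mm/s
fast_short = rescale_duration(15.0, 50.0, 250.0)
fast_long = rescale_duration(40.0, 50.0, 250.0)
```

Under this idealization the duration scales inversely with speed, so a fivefold speed increase shortens a linear motion by a factor of five; real gains would be smaller once acceleration limits and collaborative-mode speed restrictions are taken into account.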
The linear motions (moving the clay block outside of the pallet area and bringing it inside the collaborative volume) were performed correctly seven times without any interruption, three times with interruptions (the cobot blocked the motion when extracting the clay block too abruptly from the top batch, prompting the user to unblock the joints and resume the motion), and twice caused an adjacent clay block to fall. These results are promising but show the difficulty of extracting an element that is tightly squeezed between other clay elements on the pallet. The robot took between 15 and 40 s to complete these linear motions.
It should be noted that these errors can occur because of the non-structured way these blocks—depending on the producer—are placed on the pallet. When the wrapping paper around the pallet of bricks is ripped apart, several issues can arise: general misalignment, the presence of broken pieces in the central holes of the clay block, inconsistencies in the positioning of the clay-block rows in one pallet compared with another pallet from the same producer, and other issues that the bricklayer generally has to deal with using dexterity. In Bric-a-Brick, such issues are still dealt with in the same way: by using human “help” to facilitate automation. This is why, at any moment within the execution of the script, there is an idling function that can be activated by a double-tap on the Doosan arm cockpit to abort the pipeline, take manual control via the handles, and use the system as a traditional lifter, counterbalancing the clay block’s self-weight.
When the operator took control of the end-effector in joint motion, the mortar was successfully placed (the bricklayer took about 8 to 10 s to do this), and the hand-guiding of the clay block began by pressing the two buttons on the handle of the 3D-printed gripper interface. Depending on the placement position of the clay block, this operation took from 15 to 30 s. The cobot then registered the clay block’s placement coordinates, moved upward in Z in linear motion, and proceeded to the collaborative volume in about 35 to 45 s.
None of the 12 tests conducted implemented a full pick-and-place cycle of a clay block using the lifting column. While all the programmed “automatic” cobot movements that use the lifting column were tested, no interaction test with the bricklayer at height was performed. Further advancements of the Bric-a-Brick prototype will ensure that these tests are conducted on a real-world construction site with all the safety requirements and standards in place.
6. Discussion
Overall, the timings of the performed operations were considered satisfactory compared to traditional wall-building methods—not necessarily in terms of optimizing the total cycle time but rather in highlighting the potential for more enriched human–machine collaboration.
Construction with Porotherm blocks achieved 25–30 m² per person per day, compared to the 12–15 m² typical of traditional methods [
26]. Using a median of 20 m² (a 4 × 5 m wall) for a regular bricklayer’s 8 h day shift as a benchmark, and considering the chosen Porotherm clay block, it was calculated that the wall would be composed of roughly 230 blocks, resulting in an overall productivity of 28.75 blocks per hour.
Summing up all the partial timings of the operations carried out, Bric-a-Brick took from 1029 to 1334 s on average to complete the pick-and-place task per clay-block batch. Let us consider that, for a 4 × 5 m wall of roughly 230 blocks, around 23 batches of pick-and-place operations would be carried out. If of them required the lifting column to a certain height, the time increase for these batches would need to be calculated, and an additional 20 s per batch (10 for the column to extend and bring the cobot up, and 10 for it to get back down) would need to be accounted for.
Adding 20 s to the last eight tests would account for the use of the column, while the cycle timings, on average, would decrease, reflecting the bricklayer’s learning curve and improved operational efficiency over time.
The median of the first 4 batches (1289 s) was applied to the first 7 batches of the wall (corresponding to the first macro-row on the ground), and the median of the last 8 batches (1119 s) was applied to the last 16 batches (corresponding to the macro-rows above the first one). This yielded 9023 s (around 150 min) to complete the first macro-row and 17896 s (around 298 min) to complete the other ones. Adding the loading and unloading times of the robot, around 9 and 11 min, respectively, gives a rough estimate of 468 min, or 7.8 h, to complete the wall. A more in-depth analysis is presented in
Table 2.
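The arithmetic behind this estimate can be reproduced with a short script. The sketch below takes the reported figures as inputs: 7 ground-level batches at a median of 1289 s, the reported 17896 s subtotal for the 16 upper batches, and the 560 s and 670 s loading and unloading times. The function and variable names are hypothetical.

```python
def estimate_wall_time_s(ground_batches, ground_median_s, upper_total_s,
                         loading_s, unloading_s):
    """Sum the batch timings and the loading/unloading overhead, in seconds."""
    ground_total_s = ground_batches * ground_median_s
    return ground_total_s + upper_total_s + loading_s + unloading_s


total_s = estimate_wall_time_s(
    ground_batches=7,
    ground_median_s=1289,
    upper_total_s=17896,  # reported subtotal for the 16 upper batches
    loading_s=560,
    unloading_s=670,
)
total_min = total_s / 60
total_h = total_s / 3600
print(f"{total_s} s = {total_min:.0f} min = {total_h:.1f} h")
```

Working in unrounded seconds gives about 469 min, within a minute of the 468 min obtained by summing the rounded addends, and the same 7.8 h overall.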
While these tests are not to be taken as real-world implementation metrics, they show promising results. Since the cycles were performed by constructing the same clay-block batch over and over, their timings inevitably lack generalization for a real-case scenario of building a 20 m² wall on a construction site.
In terms of efficiency, at this stage of development,
Bric-a-Brick’s collaboration strategy does not drastically reduce the overall cycle time of building a clay-block wall. However, it reduces physical strain and provides the foundation for replicability and automation in this use case. Even if it were to maintain the productivity level typical of skilled labor,
Bric-a-Brick could greatly relieve the bricklayer from heavy physical loads and open up the profession to more human–machine collaboration (
Figure 11) without reducing safety or the overall quality of construction work.
This project has the potential to inspire applications in other sectors as well, such as logistics and manufacturing, where challenges related to ergonomics and efficiency are also critical. Future research will explore several additional topics, and the team is actively seeking investors to support ongoing development and advance to the next phases of the project.
The current mechanical gripper should be replaced with a vacuum-based gripping system to improve adaptability, reduce mechanical complexity, and enhance reliability in handling various clay-block geometries. The space in the front of Bric-a-Brick can accommodate a 10-bar compressor, ensuring that if electric vacuum-based gripping systems are not strong enough to pick up the clay block, a pneumatic one can easily be implemented.
The system’s mechanical stability can be significantly improved by incorporating an auto-leveling control system technology [
45] instead of a fully manual one, reducing undesired movements or vibrations during precision tasks.
The manually operated electric wheelbarrow can be replaced with a custom-designed Automated Guided Vehicle (AGV), providing navigation (and also material transport and monitoring) within the construction site. While this step involves re-thinking the vehicle from scratch, the integration of the existing wheelbarrow’s battery pack as a power source for all of Bric-a-Brick’s equipment could be faster and more straightforward to implement right away.
SAM can be further optimized through user-defined segmentation cues, such as specifying inclusion/exclusion zones or reference points. Additionally, enhanced debugging tools and operator feedback mechanisms should be implemented to enable iterative segmentation corrections, such as point-and-click exclusion or inclusion areas.
The current notebook-based user interface can be replaced with either a native application on the Doosan Teach Pendant or a browser-based web application developed using Flask that runs on a tablet, improving usability and deployment flexibility.
The gripper currently interfaces with the notebook controller using a Modbus RTU protocol over a USB interface. A planned upgrade will transition this communication to Modbus TCP over Ethernet, enabling more robust, faster, and scalable integration with the broader automation network.
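The difference between the two framings can be illustrated with the standard library alone. In the sketch below, the same request (a Write Single Register PDU, function code 0x06) is wrapped once as an RTU frame with a trailing CRC-16 and once as a TCP frame with an MBAP header. The unit ID, register address, and value are hypothetical and do not describe the actual gripper's register map.

```python
import struct


def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def rtu_frame(unit_id: int, pdu: bytes) -> bytes:
    """Serial framing: unit id + PDU + CRC (low byte transmitted first)."""
    body = bytes([unit_id]) + pdu
    return body + struct.pack("<H", modbus_crc16(body))


def tcp_frame(transaction_id: int, unit_id: int, pdu: bytes) -> bytes:
    """TCP framing: MBAP header (transaction id, protocol id 0, length, unit id) + PDU."""
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


# Hypothetical request: write value 1 to register 0 of unit 1
pdu = struct.pack(">BHH", 0x06, 0x0000, 0x0001)
serial_request = rtu_frame(1, pdu)
ethernet_request = tcp_frame(1, 1, pdu)
```

The PDU is identical in both cases, so the register-level logic of the gripper driver could in principle be kept as it is. Only the framing changes: the CRC is dropped because TCP already provides integrity checking, and the transaction identifier allows requests and responses to be matched.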
The process of mortar placement can be fully automated through the integration of a mortar-dispensing system, synchronized with the bricklaying cycle, to ensure consistent and efficient material application.
There is significant potential for enhanced integration between building information models (BIMs) and physical construction, enabling a continuous BIM-to-built workflow with minimal manual intervention. The robot’s flange–gripper interface serves a dual role: executing tasks and acquiring data. While manipulating the clay block, trajectory data can be captured and stored in a spatial database. This dataset can then be used to train a large language model (LLM) for autonomous trajectory generation. As the dataset grows and the AI model improves, more of the robotic process can be automated by comparing real-time wall construction with a parameterized BIM and generating future actions accordingly.
Interesting solutions in this respect are provided by the introduction and use of Deep Reinforcement Learning (DRL) routines. However, existing neural network policies for collision avoidance lack theoretical safety guarantees and struggle with unexpected situations [
22].
Using an augmented reality (AR) visor integrated with a building information model (BIM), the system can project the robot’s planned operations into the operator’s field of view. This would enable real-time spatial awareness [
46], allowing the operator to see from the robot’s perspective—similar to the capabilities offered by the
BRIX system. A wearable sensor (e.g., RFID, UWB, or IMU-enabled device) can be employed to detect when the operator enters the clay-block loading zone, enabling the robot to adjust its behavior accordingly for improved safety and coordination.
Overall, the application architecture could evolve toward real-time operation by leveraging real-time operating systems (RTOSs), time-synchronized fieldbus protocols (e.g., EtherCAT, TSN-based Ethernet), and low-latency edge computing. This would enable deterministic task execution, such as closed-loop motion control, real-time feedback from vision or force sensors, and dynamic trajectory adjustments based on environmental changes or operator input, significantly enhancing system responsiveness and reliability.