1. Introduction
Robotics simulation is now a key tool for both engineering education and early-stage system testing. It lets users try out new ideas safely, iterate quickly, and work with complex robotic scenarios that would otherwise require specialized hardware and supervised laboratory time. Recent evaluations indicate that educational robotics simulators are increasingly used to support STEM (Science, Technology, Engineering, and Mathematics) education, programming practice, and hands-on exploration of intelligent behaviors, particularly when they offer visual interaction, prompt feedback, and user-friendly programming workflows [1,2,3]. At the same time, contemporary robotics research and industrial development continue to use virtual environments and simulator-based digital twins for algorithm prototyping, performance assessment, and the mitigation of risks associated with physical implementation [4,5].
Recent research indicates that digital twin and immersive virtual platforms are increasingly employed to facilitate active learning, human–robot collaboration, and enhanced safety in manipulator training scenarios, thereby underscoring the significance of simulation as both a technical and educational resource [6,7,8].
An attractive feature of CoppeliaSim in this regard is that it integrates external control software, sensors, end-effectors, and robot models into a unified workspace while supporting physics-based interaction [9].
CoppeliaSim continues to be used for modeling, control, and validation in recent studies, which indicates its continued value for both researchers and educators [5,10]. Its value extends beyond visualization; it can also serve as a programmable experimental platform for the controlled and reproducible testing of robotic manipulation, object motion, and system-level behavior. This is especially pertinent for motion-based studies, since trajectory quality remains an important aspect of an industrial robot’s performance. A recent study found that geometric continuity criteria such as C1, C2, and C3 can significantly affect the smoothness and characteristics of an industrial robot’s trajectory [11]. In this context, C1 continuity refers to continuity of velocity, C2 continuity refers to continuity of acceleration, and C3 continuity extends smoothness to higher-order motion variation associated with jerk-sensitive trajectory quality. Alongside trajectory smoothness, data-driven modeling, and AI-enhanced control, the robotics literature increasingly discusses improved robot adaptability and decision support in structured tasks [12,13].
Compatibility with third-party software is a major advantage of simulators of this kind. In academic and institutional contexts, this is crucial because it enables the simulation layer to interact with externally executed machine learning models, data processing pipelines, and higher-level control logic. In this work, the connection is established by a Python program that communicates with CoppeliaSim over the ZeroMQ Remote API v2.0.4, which allows an external script to access scene objects, robot targets, grasping logic, and bin locations in real time. The robotic workcell is therefore controlled through an open, modifiable software layer rather than a locked internal routine, which suits both classwork and hands-on experimentation. The educational value of open simulator-based robotic laboratories has recently been highlighted in research on robotics digital twins and user-friendly online robotics platforms [4,14].
Python is well suited to this role because of its intuitive syntax and its extensive support for scientific computing, data processing, machine learning, and rapid interface building. According to a frequently cited survey of machine learning in Python, Python is the leading choice for data science and AI (artificial intelligence) owing to its productivity, robust library support, and the ease of creating high-level workflows from interoperable components [15]. Rather than treating data collection, preprocessing, learning, prediction, and user interaction as isolated concepts, this approach allows students to understand, modify, and contribute to whole pipelines [2,14,15], which is particularly useful for robotics education. This educational value is enhanced when Python-based workflows retain interpretability, since recent research on explainable AI has highlighted the importance of transparent model behavior and intelligible decision-making processes in applied intelligent systems [16,17].
In intelligent robotics, data-driven control and machine learning are becoming more prominent. According to evaluations in industrial robotics, adaptive systems that can learn, sense, and make decisions independently are replacing inflexible automation, enabling more autonomous operation and more diverse manufacturing [18,19].
Educators can prepare students for this shift by teaching basic robotic motions and by explaining how a system learns new tasks from examples or structured data. According to recent research in robotics education and ML-enhanced education, robotic platforms can help students understand ML by integrating data, classification rules, and observable robot behavior into a single learning experience [20,21].
This view is consistent with a growing movement toward robotics-oriented curricula and practical educational frameworks that emphasize hands-on learning, algorithmic reasoning, and the assessment of various computational methods [22,23].
Against this background, the present study proposes a virtual object-sorting framework that combines robotics simulation, learning-from-demonstration, and lightweight machine learning within a single Python-controlled workcell. The study is intentionally positioned at the simulator level, where reproducibility, controlled experimentation, and transparent observation of the complete workflow take priority over immediate hardware deployment.
Related Work
Prior studies related to the present work may be grouped into three main directions, complemented by surveys on imitation learning and learning-from-demonstration in robotics that examine interactive teaching, trajectory imitation, dexterous manipulation, and contact-rich tasks [24,25,26]. The first direction includes educational robotics simulators and user-friendly virtual platforms designed to support programming practice, STEM learning, and interactive exploration of robotic behavior [1,2,3,14].
The second includes simulator-based robotic validation and digital-twin-oriented environments, where virtual workcells are used for prototyping, safety-oriented experimentation, and system-level testing [4,5,6,7,8,10].
The third includes machine-learning-supported robotics, where learning components are integrated into robotic pipelines for classification, decision support, or adaptive behavior [12,13,18,19,24,25,26].
The present study is positioned at the intersection of these three directions. Unlike works focused mainly on simulator interaction or on robotic validation in isolation, the proposed framework exposes a complete workflow linking minimal demonstrations, rule inference, structured dataset relabeling, comparative classifier training, autonomous execution, and offline inference benchmarking within a single Python–CoppeliaSim environment.
Despite the growing literature on educational robotics simulators, simulator-based digital twins, and machine-learning-supported robotic platforms, several gaps remain. First, many existing studies emphasize simulation, interaction, or control, but do not expose a complete data-to-action workflow linking minimal demonstrations, training-data generation, model training, and autonomous robotic execution within the same reproducible environment. Second, comparative studies often focus either on task performance or on algorithmic aspects in isolation, without clearly separating end-to-end robotic execution from classifier-dependent inference cost. Third, relatively few simulator-based studies present a compact framework in which a user-defined sorting rule can be inferred from minimal demonstrations, transferred to a larger structured dataset, and then evaluated both online in the robotic workcell and offline through prediction benchmarking, even though recent surveys show that imitation-learning and demonstration-based robotics research increasingly emphasizes richer task classes, interaction modes, and manipulation settings [24,25,26].
The main aim of this study is to develop and evaluate an integrated Python–CoppeliaSim framework for learning-from-demonstration in a virtual robotic object-sorting task. More specifically, the present framework follows the learning-from-demonstration paradigm at the level of task-rule acquisition: the user demonstrates the desired object-to-bin assignment, the system infers the corresponding symbolic mapping, and the robot then applies that learned mapping autonomously. The study does not address trajectory imitation or low-level kinesthetic motion learning. To achieve this aim, the study pursues three objectives: first, to implement a complete workflow that links manual demonstrations, rule inference, structured dataset generation, classifier training, and autonomous robotic replication; second, to compare the behavior of several lightweight classical classifiers under identical simulated conditions; and third, to distinguish between task-level correctness during online robotic execution and computational efficiency during offline inference benchmarking.
The main contribution of this work is not a new classifier or a new data-processing algorithm for shape discrimination, but an integrated, transparent simulator-based framework that links minimal human demonstrations, automatic rule transfer to a larger structured dataset, comparative training of lightweight classifiers, online robotic replication, offline inference benchmarking, and perturbation-aware robustness analysis. Its distinctive value lies in separating task-level correctness from computational efficiency within the same Python–CoppeliaSim workcell. In addition, the framework includes a perturbation-aware robustness layer that complements the nominal online and offline evaluation. This layer introduces controlled positional perturbation, systematic bias, controlled shape corruption, repeated perturbation voting, and stability-aware scoring, thereby extending the comparison beyond nominal correctness and raw inference time.
2. Materials and Methods
2.1. General Research Methodology
The research methodology adopted in this study combined software development, virtual robotic experimentation, comparative machine learning evaluation, and quantitative performance analysis within a single Python–CoppeliaSim framework. The methodological workflow consisted of five main stages:
A virtual robotic workcell for object sorting was created in CoppeliaSim, including the robot manipulator, gripper, worktable, objects, and destination bins.
An external Python application was developed to manage simulator communication, user interaction, data collection, classifier training, autonomous execution, and benchmark logging.
The learning-from-demonstration procedure was implemented by allowing the user to provide minimal manual examples, from which a shape-to-bin rule could be inferred and then transferred to a larger structured dataset of object positions.
Five lightweight classifiers were trained and compared under identical conditions using both manual and file-based samples.
The framework was evaluated at three complementary levels: through online autonomous replication in the virtual workcell, through offline inference benchmarking designed to isolate prediction cost from robot motion, and through perturbation-aware robustness evaluation under controlled uncertainty.
This methodological structure was adopted in order to ensure reproducibility, transparency, and a clear separation between robotic execution behavior and classifier-dependent computational performance.
2.2. Hardware-Independent Virtual Workcell
The proposed system is composed of two tightly coupled components: a virtual robotic workcell implemented in CoppeliaSim Edu 4.9.0 and an external control application written in Python 3.12 and executed from PyCharm 2024.3.5. The simulated scene contains an ABB IRB140 industrial robot, a Baxter vacuum gripper mounted on the robot end-effector, a rectangular support table, two destination bins, and eight manipulable objects placed on the table, namely four cubes and four cylinders. The two bins, referenced in the application as Bin_1 and Bin_2, are positioned on the left and right sides of the robot and represent the two output classes of the sorting task. The scene used in the experiments is shown in Figure 1. Its composition follows the object handles defined in the Python program, where the gripper is accessed as /IRB140/BaxterVacuumCup, the motion reference as /IRB140/target, the bins as /Cos_1 (first bin) and /Cos_2 (second bin), and the objects as /Cub_1 to /Cub_4 (four cubes) and /Cilindru_1 to /Cilindru_4 (four cylinders).
The system was designed as a compact educational workcell in which the classification problem remains simple, but the robotic pipeline remains complete. This simulator-based design was also selected to provide a controlled and reproducible experimental environment before any transfer to physical robotic hardware. Accordingly, the present study should be interpreted as a virtual validation of the framework rather than as a real-robot deployment study. The user interacts with the Python application, while the robot motion and object transport are executed in the simulation scene. This separation makes it possible to observe the entire chain from user action to robotic execution without modifying the internal logic of the CoppeliaSim scene itself.
2.3. Remote Connection, Scene Initialization, and Motion Control
The Python program connects to CoppeliaSim through the RemoteAPIClient from the coppeliasim_zmqremoteapi_client package and obtains the main simulation object through client.getObject('sim'). After the connection is established, the application retrieves the handles of the gripper, target dummy, bins, and manipulable objects. It also stores the initial position of the target dummy, the initial orientation of the target, and the world coordinates of the two bins, which are later used during sorting to return the robot to a known reference configuration.
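The initialization step described above can be sketched as follows. This is a minimal illustration rather than the application's actual code: the function names connect() and initialize_scene(), the dictionary keys, and the argument order of the position and orientation calls are assumptions, while the scene paths and the client.getObject('sim') accessor are the ones named in the text.

```python
# Scene paths as listed in the text; the dictionary keys are illustrative names.
SCENE_PATHS = {
    "gripper": "/IRB140/BaxterVacuumCup",
    "target": "/IRB140/target",
    "bin_1": "/Cos_1",
    "bin_2": "/Cos_2",
}
OBJECT_PATHS = [f"/Cub_{i}" for i in range(1, 5)] + [
    f"/Cilindru_{i}" for i in range(1, 5)
]


def connect():
    """Open the ZeroMQ remote API session (requires a running CoppeliaSim)."""
    # Imported lazily so that initialize_scene() can be tried without the simulator.
    from coppeliasim_zmqremoteapi_client import RemoteAPIClient

    client = RemoteAPIClient()
    return client.getObject("sim")  # accessor named in the text


def initialize_scene(sim):
    """Resolve all handles and store the reference data reused during sorting."""
    handles = {name: sim.getObject(path) for name, path in SCENE_PATHS.items()}
    objects = {path: sim.getObject(path) for path in OBJECT_PATHS}
    world = sim.handle_world
    reference = {
        # Initial pose of the target dummy, used to return to a known configuration.
        "target_position": sim.getObjectPosition(handles["target"], world),
        "target_orientation": sim.getObjectOrientation(handles["target"], world),
        # World coordinates of the two bins.
        "bin_positions": {
            name: sim.getObjectPosition(handles[name], world)
            for name in ("bin_1", "bin_2")
        },
    }
    return handles, objects, reference
```

With the simulator running, a call such as initialize_scene(connect()) would return the three dictionaries; any object exposing the same methods can stand in for sim when the logic is checked offline.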
Robot motion is implemented through the target object /IRB140/target. Instead of commanding joints directly, the application updates the position of this target in Cartesian space, and the robot follows it in the simulated environment. The function move_gripper_to() performs this motion by linear interpolation between the current target position and the destination point over a predefined number of steps. In the current implementation, the default motion uses 20 interpolation steps and a delay of 0.05 s between successive updates, while the final descent during grasping is performed with 30 steps and a delay of 0.03 s. This approach provides smooth and reproducible motion without requiring explicit trajectory planning at the joint level.
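A minimal sketch of this interpolation scheme is given below. The helper interpolate_positions() and the signature of move_gripper_to() are assumptions made for illustration; only the function name, the step counts, and the delays come from the description above. The position calls assume the argument order (handle, position, reference frame).

```python
import time


def interpolate_positions(start, goal, steps):
    """Return `steps` evenly spaced Cartesian waypoints, the last one being `goal`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    waypoints = []
    for k in range(1, steps + 1):
        t = k / steps  # interpolation parameter in (0, 1]
        waypoints.append([(1.0 - t) * s + t * g for s, g in zip(start, goal)])
    return waypoints


def move_gripper_to(sim, target_handle, goal, steps=20, delay=0.05):
    """Move the target dummy along a straight line; the robot follows the dummy."""
    start = sim.getObjectPosition(target_handle, sim.handle_world)
    for waypoint in interpolate_positions(start, goal, steps):
        sim.setObjectPosition(target_handle, waypoint, sim.handle_world)
        time.sleep(delay)  # pause between successive target updates
```

Under these assumptions, the default call reproduces the 20-step, 0.05 s motion, and the final grasping descent corresponds to move_gripper_to(sim, handle, goal, steps=30, delay=0.03).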
The pick-and-place routine is implemented in the function pick_and_place(). For each selected object, the application first reads the object position and checks that its vertical coordinate is above the minimum valid threshold. A safe transport height of 0.8 m is then used to define an approach point above the object and a second point above the destination bin. The robot first moves above the object, then descends to a pickup point located 0.035 m above the object center, attaches the object, rises again to the safe height, moves above the destination bin, descends to the destination position, releases the object, and finally returns to the initial target pose and orientation.
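The sequence of one sorting cycle can be written as an ordered plan of waypoints and gripper actions. The sketch below is an illustration under stated assumptions: plan_pick_and_place() is a hypothetical helper, the minimum height of 0.05 is assumed to match the threshold used for object selection, and the destination point is taken to be the stored bin position.

```python
SAFE_HEIGHT = 0.8    # safe transport height in metres, from the text
PICK_OFFSET = 0.035  # pickup point above the object centre in metres, from the text


def plan_pick_and_place(object_pos, bin_pos, home_pos, min_z=0.05):
    """Return the ordered (action, position) steps of one pick-and-place cycle."""
    ox, oy, oz = object_pos
    bx, by, bz = bin_pos
    if oz < min_z:  # assumed validity check on the vertical coordinate
        raise ValueError("object is below the minimum valid height")
    return [
        ("move", [ox, oy, SAFE_HEIGHT]),       # approach point above the object
        ("move", [ox, oy, oz + PICK_OFFSET]),  # final descent to the pickup point
        ("attach", None),                      # object becomes a child of the gripper
        ("move", [ox, oy, SAFE_HEIGHT]),       # rise to the safe height
        ("move", [bx, by, SAFE_HEIGHT]),       # travel above the destination bin
        ("move", [bx, by, bz]),                # descend to the destination position
        ("detach", None),                      # release the object
        ("move", list(home_pos)),              # return to the initial target pose
    ]
```

Expressing the routine as data makes the order of operations easy to inspect before any motion command is sent to the simulator.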
Object attachment is simulated through hierarchy reassignment. The function attach_object() first detaches the object from its current parent, sets it to static and non-respondable, and then reparents it to the gripper. The function returns True only if the gripper becomes the effective parent of the object. The reverse procedure is performed by detach_object(), which reassigns the object to the world and restores its dynamic and respondable state. In order to avoid invalid selections during execution, the function get_first_free_object() excludes objects that are already attached to the gripper, already placed inside one of the bins, below the height threshold (z < 0.05), or outside the admissible workspace. Among the remaining candidates, the function selects the nearest object in the horizontal plane.
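The selection logic can be illustrated with a simulator-independent sketch. The signature below is an assumption: the real function reads positions from the scene, whereas this version receives them as a dictionary. The reference point for the distance and the rectangular workspace bounds are also hypothetical, since the text does not specify them.

```python
import math


def get_first_free_object(candidates, reference_xy, attached, placed,
                          workspace, min_z=0.05):
    """Return the name of the nearest admissible object, or None if none is left.

    candidates: {name: (x, y, z)} world positions of objects of the chosen shape
    attached, placed: sets of names already held by the gripper or inside a bin
    workspace: ((x_min, x_max), (y_min, y_max)), assumed rectangular bounds
    """
    (x_min, x_max), (y_min, y_max) = workspace
    best_name, best_dist = None, math.inf
    for name, (x, y, z) in candidates.items():
        if name in attached or name in placed:
            continue  # already handled
        if z < min_z:
            continue  # below the height threshold
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # outside the admissible workspace
        dist = math.hypot(x - reference_xy[0], y - reference_xy[1])
        if dist < best_dist:  # keep the closest candidate in the horizontal plane
            best_name, best_dist = name, dist
    return best_name
```

Returning None when no candidate remains gives the caller a simple way to reject a manual action or to stop the autonomous loop.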
2.4. Python Application and Software Workflow
The software workflow of the application is summarized in Figure 2.
The execution starts by launching the CoppeliaSim scene and then running the Python script. During initialization, the program establishes the simulator connection, loads the machine learning model registry, initializes the GUI, creates the label encoder for the two object classes, and prepares the internal variables used for the manual samples, training statistics, and model persistence across GUI events.
After initialization, the application enters an event-driven loop implemented through window.read(). From this point onward, the user can choose one of five operating modes. The first mode is manual execution, in which the user selects an object type and a destination bin and commands a single pick-and-place action. The second mode is model training, in which the application loads a selected number of samples from the CSV file, determines the labeling rule, optionally appends manual examples, and trains the selected classifier. The third mode is replication with machine learning, in which the trained model predicts the bin for each remaining object and the robot automatically performs the corresponding placements. The fourth mode is the offline prediction benchmark, which evaluates inference throughput without moving the robot. The fifth mode is perturbation-aware robust replication, in which the trained classifier is evaluated under controlled feature-level uncertainty by combining perturbed voting, shape corruption, and stability-aware scoring during autonomous execution. The process ends when the user exits the application. The branch structure visible in Figure 2 matches the event-handling logic in the source code, including the optional path for collecting manual training samples and the return to the operation-selection state after each completed action.
This software structure was selected because it clearly separates demonstration, training, autonomous execution, and benchmarking. As a result, the same software can serve both as a tutorial on human rule definition and as a platform for comparing several lightweight classifiers in the same scenario.
2.5. Graphical User Interface and Function of Each Control
The Python interface, shown in Figure 3, was implemented in PySimpleGUI and directly mirrors the internal software workflow defined in the source code.
The upper part of the window is dedicated to manual execution, while the central and lower sections group the controls for training, offline benchmarking, and perturbation-aware robust replication. The field Shape is a drop-down selector with the two available object classes, Cub and Cilindru, and determines which object type will be selected from the scene. The field Bin is a second drop-down selector that allows the user to choose the destination class, either Bin_1 or Bin_2. The checkbox Enable ML Learning determines whether the current manual action will also be stored as a supervised training example. When this option is active, pressing Execute not only moves the selected object to the chosen bin, but also records one feature vector and one class label in the internal lists manual_X and manual_y. The text field Manual samples collected displays in real time the number of such stored demonstrations and is updated after each recorded execution.
The button Execute performs a manual pick-and-place operation using the values currently selected in the Shape and Bin controls. If the user does not provide both fields, the application stops the action and shows a warning message. If no valid free object of the selected shape remains in the scene, the action is also rejected. The button Replicate with ML starts the autonomous sorting routine, but only if a model has already been trained and a valid rule mapping is available. Otherwise, the application prevents execution and prompts the user to train the model first.
The lower robustness-oriented section of the GUI is dedicated to perturbation-aware replication. The field Scenario allows the user to select one of four controlled uncertainty settings: Low noise, Medium noise, High noise, and Bias + medium noise. The field Votes per object defines how many perturbed predictions are aggregated for each evaluated object. The button Run Robust Replication executes the perturbation-aware protocol, applies repeated voting under the selected scenario, computes stability-aware results, and stores the corresponding output in a dedicated CSV log.
The central section of the interface controls model training and offline-capable configuration. The field Training dataset (CSV positions) specifies the path to the CSV file used for loading position samples, with ml_training_random_positions.csv as the default. The associated browse button allows the user to replace the file interactively. The field Training rule defines how class labels are assigned to the file-based samples. Three options are available in the current implementation: Manual demo (infer rule from manual moves), Fixed: Cube Bin_1, Cylinder Bin_2, and Fixed: Cube Bin_2, Cylinder Bin_1. In the first case, the program infers the shape-to-bin rule from the majority of manual demonstrations stored for each encoded object class. In the other two cases, the mapping is predefined and does not depend on user demonstrations.
The checkbox Include manual samples in training controls whether the examples collected through manual execution are appended to the file-based training set. The field File samples for training selects the number of rows loaded from the CSV file. The current implementation provides the values 2, 4, 8, 20, 1000, and 1600, with 1000 as the default. The field ML Model selects the classifier to be trained. The available models are decision tree, kNN with k = 3, logistic regression, naive Bayes, and linear SVM, exactly as defined by the model registry in the source code. The button Train ML Model triggers the complete training procedure: rule creation or retrieval, stratified dataset loading, automatic relabeling of file samples, optional manual-sample inclusion, classifier fitting, and display of the final training statistics.
The lower section of the interface is dedicated to the prediction benchmark. The field Predict probes selects the number of prediction inputs used in the benchmark, with the available values 1600 and 10,000. The field Timing report selects the statistical format used to summarize inference time, either median (50 runs) or mean ± std (30 runs). The button Run Predict Benchmark performs repeated predictions on the selected number of probes without commanding the robot and saves the results to a dedicated CSV file. A single interface window thus supports manual demonstration, nominal model training, nominal offline benchmarking, and perturbation-aware robust evaluation without requiring a separate execution environment. Finally, the Exit button closes the GUI and terminates the application.
2.6. Data Representation, Rule Generation, and Model Training
The ML layer uses two categories of data. The first category consists of manually demonstrated examples stored during GUI interaction. Each time the user performs a manual action with “Enable ML Learning” active, the application reads the current position of the selected object directly from CoppeliaSim, using the simulator object handle to obtain its Cartesian coordinates relative to the world reference frame. These values provide the spatial components x, y, and z of the input vector. The object type, identified in the scene as Cub (for the cube) or Cilindru (for the cylinder), is encoded using a LabelEncoder fitted to these two labels and provides the fourth input component, encoded_forma. As a result, each manually collected sample is represented by a four-dimensional feature vector of the form [encoded_forma, x, y, z], together with the destination bin selected by the user as the class label. This compact feature representation was intentionally selected to keep the baseline workflow transparent and reproducible. At the same time, because it remains idealized and does not include sensor-derived uncertainty or vision-based perception, the framework also evaluates these same structured features under controlled synthetic perturbations during the robustness-oriented stage. The second category consists of file-based samples read from the CSV dataset. The loader requires the presence of the columns encoded_forma, x, y, and z, and rejects the file if one of these fields is missing or if one of the two encoded classes is absent.
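The two data sources described above can be sketched with the standard library alone. The helper names build_feature_vector() and load_position_rows() are hypothetical. The shape codes are assumed to follow the alphabetical ordering that a scikit-learn LabelEncoder fitted on the two labels would produce (Cilindru as 0, Cub as 1).

```python
import csv

SHAPES = ["Cub", "Cilindru"]
# Codes in sorted label order, mirroring a LabelEncoder: Cilindru -> 0, Cub -> 1.
SHAPE_CODES = {label: code for code, label in enumerate(sorted(SHAPES))}
REQUIRED_COLUMNS = ("encoded_forma", "x", "y", "z")


def build_feature_vector(shape, position):
    """Build the four-dimensional sample [encoded_forma, x, y, z]."""
    x, y, z = position
    return [SHAPE_CODES[shape], x, y, z]


def load_position_rows(csv_file):
    """Read file-based samples and reject files that violate the stated checks."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = [
        [int(float(r["encoded_forma"])), float(r["x"]), float(r["y"]), float(r["z"])]
        for r in reader
    ]
    if not {0, 1} <= {row[0] for row in rows}:
        raise ValueError("both encoded classes must be present in the file")
    return rows
```

A manual demonstration would then append build_feature_vector(shape, position) to manual_X and the chosen bin to manual_y, while load_position_rows() accepts any open text file or file-like object.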
For training, the file-based dataset is loaded either in full or in a stratified subset. The function load_positions_from_csv_stratified() splits the samples by encoded object class and attempts to construct a balanced subset with respect to cubes and cylinders. A fixed random seed of 42 is used for reproducibility. Once the subset is obtained, class labels are generated by the function label_file_samples_from_mapping(), which applies the currently active shape-to-bin rule to each file sample. If the user selected the manual-rule mode, the mapping is inferred automatically by infer_shape_to_bin_mapping() from the majority destination assigned during manual demonstrations for encoded class 0 and encoded class 1. The application requires at least one manually demonstrated cube and one manually demonstrated cylinder before allowing this option to proceed.
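The rule inference and relabeling steps can be sketched as follows. The two function names come from the text, but their signatures are assumptions, as is the tie-breaking behavior (the first bin reached among equally frequent ones). The helper stratified_subset() is a hypothetical stand-in for the balanced sampling performed by the CSV loader.

```python
import random
from collections import Counter


def infer_shape_to_bin_mapping(manual_X, manual_y):
    """Map each encoded shape (0 or 1) to its most frequently demonstrated bin."""
    votes = {0: Counter(), 1: Counter()}
    for features, bin_label in zip(manual_X, manual_y):
        votes[int(features[0])][bin_label] += 1
    if not votes[0] or not votes[1]:
        raise ValueError("at least one demonstration per shape is required")
    return {code: counts.most_common(1)[0][0] for code, counts in votes.items()}


def label_file_samples_from_mapping(file_X, mapping):
    """Apply the active shape-to-bin rule to every file sample."""
    return [mapping[int(row[0])] for row in file_X]


def stratified_subset(file_X, n_samples, seed=42):
    """Draw an approximately balanced subset with a fixed seed."""
    rng = random.Random(seed)
    per_class = n_samples // 2
    subset = []
    for code in (0, 1):
        rows = [row for row in file_X if int(row[0]) == code]
        subset.extend(rng.sample(rows, min(per_class, len(rows))))
    rng.shuffle(subset)
    return subset
```

With a single demonstration per shape, the majority vote reduces to copying the demonstrated bin, which is how two manual moves suffice to label the whole file-based dataset.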
Reason for Choosing Classifiers
The selection of the investigated classifiers was guided by four criteria: suitability for relatively small, structured datasets derived from manual demonstrations and file-based position samples; support for rapid training and prediction within an interactive simulator-based workflow; interpretability and methodological transparency, which are important in an educational setting; and diversity of classification principles rather than minor variants of the same approach. For this reason, the study included a decision tree as a rule-based model, k-nearest neighbors as an instance-based model, logistic regression as a linear probabilistic model, Gaussian naive Bayes as a generative probabilistic model, and linear SVM as a margin-based discriminative model.
In the context of the present study, these five classifiers also serve as internal comparative baselines. The purpose was not to compete with state-of-the-art deep-learning or imitation-learning systems, nor to claim that all five models provide the same degree of interpretability. Instead, the selected classifiers were used as lightweight comparative baselines representing different classification principles under identical task, data, and execution conditions, including a robustness-oriented evaluation layer added to the nominal workflow.
The training function clones the selected classifier from the internal registry and fits it on the final dataset, which may include only relabeled file samples or both file samples and manual samples, depending on the GUI setting. The models currently implemented are decision tree, kNN with three neighbors, logistic regression with max_iter = 1000, Gaussian naive Bayes, and a linear SVM using SVC(kernel="linear"). After fitting, the application stores the number of file samples used, the number of manual samples used, the total number of training examples, the training time in milliseconds, and the effective rule applied to cubes and cylinders.
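A registry of this kind can be sketched with scikit-learn as shown below. The registry keys, the function name train_model(), and the statistics dictionary are illustrative assumptions. Only the hyperparameters stated in the text are set explicitly, and all other settings are left at the library defaults, which may differ from the configuration actually used.

```python
import time

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Unfitted prototypes; a fresh copy is cloned for every training request.
MODEL_REGISTRY = {
    "Decision Tree": DecisionTreeClassifier(),
    "kNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Linear SVM": SVC(kernel="linear"),
}


def train_model(name, file_X, file_y, manual_X=(), manual_y=()):
    """Fit a clone of the selected classifier and report training statistics."""
    X = list(file_X) + list(manual_X)
    y = list(file_y) + list(manual_y)
    model = clone(MODEL_REGISTRY[name])
    start = time.perf_counter()
    model.fit(X, y)
    train_ms = (time.perf_counter() - start) * 1000.0
    stats = {
        "file_samples": len(file_X),
        "manual_samples": len(manual_X),
        "total_samples": len(X),
        "train_time_ms": train_ms,
    }
    return model, stats
```

Cloning keeps the registry entries unfitted, so repeated training runs with different sample counts or rules never reuse state from an earlier fit.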
The main hyperparameters of the investigated classifiers, together with the principal implementation and experimental settings affecting reproducibility, are summarized in Table 1.
2.7. Autonomous Replication and Offline Benchmark
After training, the function replicate_with_ml() performs autonomous execution inside the simulated workcell. For each object type, the program repeatedly selects the first valid free object, reads its position, builds the feature vector, predicts the destination bin using the trained classifier, and calls pick_and_place() to transport the object to the predicted bin. During this process, the predicted bin is compared with the bin defined by the current mapping rule, and the program counts the number of correct and wrong placements. At the end of the run, the application computes the total number of moves, total execution time, and classification accuracy, then appends the results to a model-specific CSV file named replication_results_<model>.csv.
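The bookkeeping at the end of a replication run can be sketched as follows. The helper names and the CSV column names are hypothetical; only the file name pattern replication_results_<model>.csv and the recorded quantities come from the description above.

```python
import csv
import os


def summarize_replication(model_name, predicted_bins, rule_bins, total_time_s):
    """Compare predicted bins with the bins given by the active rule."""
    moves = len(predicted_bins)
    correct = sum(p == r for p, r in zip(predicted_bins, rule_bins))
    return {
        "model": model_name,
        "moves": moves,
        "correct": correct,
        "wrong": moves - correct,
        "accuracy": correct / moves if moves else 0.0,
        "total_time_s": total_time_s,
    }


def append_replication_result(row, directory="."):
    """Append one result row to the model-specific CSV log."""
    path = os.path.join(directory, f"replication_results_{row['model']}.csv")
    write_header = not os.path.exists(path)  # header only for a new file
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return path
```

Because the total time includes robot motion, this log reflects end-to-end task behavior rather than classifier cost alone.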
In addition to robot-based replication, the application includes an offline inference benchmark implemented by predict_benchmark(). This benchmark uses the CSV-derived feature matrix only and therefore excludes robot motion from the measured time. If the selected number of probes is 1600, the full dataset is used directly. If the selected number of probes is 10,000, the 1600-sample matrix is replicated and then truncated so that exactly 10,000 feature vectors are obtained. Prediction time is then measured repeatedly. For the median (50 runs) option, the program executes 50 repeated prediction runs and reports the median time. For the mean ± std (30 runs) option, the program executes 30 repeated runs and reports the mean and standard deviation. The results are saved to a second CSV log, predict_benchmark_<model>.csv, together with the model name, number of prediction probes, reporting mode, and timing statistics.
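The probe construction and timing procedure can be sketched with the standard library. The helper names build_probes() and time_predictions() are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is reported.

```python
import statistics
import time
from itertools import cycle, islice


def build_probes(base_rows, n_probes):
    """Repeat the base matrix cyclically and cut it to exactly n_probes rows."""
    return list(islice(cycle(base_rows), n_probes))


def time_predictions(predict, probes, mode="median"):
    """Time repeated calls of predict(probes) without any robot motion.

    mode="median" uses 50 runs; any other value uses 30 runs (mean and std).
    """
    runs = 50 if mode == "median" else 30
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(probes)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    if mode == "median":
        return {"runs": runs, "median_ms": statistics.median(times_ms)}
    return {
        "runs": runs,
        "mean_ms": statistics.mean(times_ms),
        "std_ms": statistics.stdev(times_ms),  # sample standard deviation (assumed)
    }
```

For a fitted classifier, a call such as time_predictions(model.predict, build_probes(rows, 10000)) would report the median over 50 runs.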
Because the nominal task remains highly structured and quickly reaches saturated accuracy, the experimental workflow also includes a perturbation-aware robustness evaluation designed to expose classifier behavior under controlled uncertainty.
2.8. Perturbation-Aware Robustness Evaluation
The perturbation-aware robustness evaluation was introduced to complement the nominal deterministic experiments with a more informative uncertainty-oriented analysis. Since the baseline representation uses only object type and Cartesian position, the robustness protocol was designed to expose these structured features to controlled synthetic perturbations during evaluation.
More specifically, the protocol applies Gaussian positional perturbation to the Cartesian coordinates, optional systematic bias in the horizontal plane, controlled shape corruption at the feature level, repeated perturbation voting for each evaluated object, and stability-aware scoring. For each object, multiple perturbed predictions are generated, the final bin is determined by majority vote, and the corresponding stability is computed as the fraction of votes supporting the selected output class. Thus, the robustness protocol perturbs the nominal Cartesian position through the additive model defined in Equation (1) while retaining the same downstream classification interface.
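$$\tilde{\mathbf{p}} = \mathbf{p} + \boldsymbol{\epsilon} + \mathbf{b} \qquad (1)$$
where $\mathbf{p}$ is the nominal object position, $\tilde{\mathbf{p}}$ is the perturbed position used during robustness evaluation, $\boldsymbol{\epsilon}$ is the Gaussian positional perturbation, and $\mathbf{b}$ is an optional horizontal bias term used only in the biased scenario.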
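A minimal sketch of the additive perturbation model is given below. The function name perturb_position, the isotropic standard deviation sigma, and all numeric values are assumptions chosen for illustration; the amplitudes actually used in the four scenarios are not reproduced here.

```python
import random


def perturb_position(position, sigma, bias_xy=(0.0, 0.0), rng=random):
    """Nominal position + zero-mean Gaussian noise + optional horizontal bias."""
    x, y, z = position
    bx, by = bias_xy
    return (
        x + rng.gauss(0.0, sigma) + bx,  # bias acts only in the horizontal plane
        y + rng.gauss(0.0, sigma) + by,
        z + rng.gauss(0.0, sigma),       # vertical axis receives noise but no bias
    )


rng = random.Random(0)            # fixed seed so the perturbation is reproducible
nominal = (0.40, -0.10, 0.05)     # hypothetical object position
noisy = perturb_position(nominal, sigma=0.01, rng=rng)
biased_only = perturb_position(nominal, sigma=0.0, bias_xy=(0.02, -0.01), rng=rng)
```

Seeding the generator keeps the perturbed evaluation repeatable across runs, in line with the reproducibility aim of the protocol.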
The final voted class and the associated decision stability were computed according to Equation (2).
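$$\hat{y} = \arg\max_{c} \sum_{k=1}^{K} \mathbb{1}[y_k = c], \qquad S = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}[y_k = \hat{y}] \qquad (2)$$
where $K$ is the number of perturbation votes per object, $y_k$ is the class predicted at vote $k$, $\hat{y}$ is the final voted output class, and $S$ is the decision stability score.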
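The voting and stability computation can be illustrated with a short sketch. The function name vote_with_stability and the example votes are hypothetical, and the tie-breaking behavior (the first-seen class wins) is an assumption, since the source does not specify how ties are resolved.

```python
from collections import Counter


def vote_with_stability(votes):
    """Return (majority class, fraction of votes that agree with it)."""
    if not votes:
        raise ValueError("at least one vote is required")
    # most_common(1) returns the class with the highest count;
    # on a tie, the class encountered first is returned.
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(votes)


# Five hypothetical perturbed predictions for one object.
final_bin, stability = vote_with_stability(
    ["Bin_1", "Bin_1", "Bin_2", "Bin_1", "Bin_1"]
)
```

In this example, four of the five votes agree, so the final decision is still correct while the stability drops below 1. This is the pattern the protocol is designed to expose: unchanged task correctness with a reduced decision margin.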
Four perturbation scenarios were considered: Low noise, Medium noise, High noise, and Bias + medium noise. These scenarios differ in perturbation amplitude, optional systematic bias, and shape-corruption intensity. In addition to final task correctness, the protocol records decision stability under perturbation, thereby extending the comparison beyond nominal correctness and raw inference time alone.
Although this protocol does not replace a real perception pipeline with sensors, occlusion, calibration errors, or hardware-level uncertainty, it provides a substantially less idealized simulator-based evaluation than the original nominal setup. It also allows the study to incorporate several destabilizing factors in a controlled and reproducible way, including positional noise, systematic bias, and feature-level shape corruption.
3. Results
The proposed framework was evaluated from three complementary perspectives. First, an online replication experiment was conducted in the CoppeliaSim workcell, where the trained model was used to drive the autonomous pick-and-place of the remaining objects. Second, an offline prediction benchmark was performed to isolate inference time from robot motion and compare the computational behavior of the five implemented classifiers. Third, a perturbation-aware robustness evaluation was introduced to examine classifier behavior under controlled uncertainty. This evaluation design was chosen deliberately to distinguish robotic task-level performance from classifier-dependent computational behavior under otherwise identical experimental conditions.
3.1. Autonomous Replication in the Virtual Workcell
For the robot-based experiments, five classifiers were tested: decision tree, k-nearest neighbors with k = 3, logistic regression, naive Bayes, and linear SVM. Each model was trained under the same sorting rule, with two manually collected samples and file-based training subsets of size 2, 4, 8, 20, and 1000, resulting in total training sizes of 4, 6, 10, 22, and 1002 samples, respectively. In all cases, the inferred sorting rule remained identical, namely Cube → Bin_2 and Cylinder → Bin_1.
The online replication results indicate that all five models achieved perfect task performance across all tested training sizes. In every recorded run, the robot executed six autonomous placements, all of which were correct, resulting in 100% classification accuracy and zero wrong moves. This indicates that for the considered rule-based sorting task, the problem was sufficiently structured for all tested classifiers to learn a consistent decision boundary even from very small training sets. The inclusion of multiple training-set sizes was intended to provide a basic sensitivity check with respect to sample availability. Under the present deterministic rule, the online task remained saturated in terms of accuracy across all tested sizes, while the main observable variation appeared in replication time rather than in classification correctness. This confirms that in the full-data nominal regime, raw accuracy alone offers limited discriminative value for classifier comparison and therefore needs to be complemented by an additional robustness-oriented perspective.
Although the classification outcome was identical for all models, small differences were observed in total replication time. However, these differences remained modest, with all values concentrated in a narrow interval around 46–49 s. This suggests that in the online robotic experiment, the overall execution time was dominated by the physical manipulation sequence implemented in the simulator rather than by the computational cost of the classifier itself.
To facilitate comparison at the model level, the replication times were summarized across the five tested training sizes using the mean, standard deviation, minimum, and maximum values (see Table 2). Linear SVM and decision tree yielded the lowest mean replication times, while kNN showed the highest average duration. Even so, the spread between the fastest and slowest model remained small relative to the total cycle time of the robotic task.
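The model-level summary can be sketched as below. The function name summarize and the replication times are placeholders chosen for illustration and are not the measurements reported in Table 2; the sample standard deviation is also an assumption.

```python
import statistics


def summarize(times_s):
    """Mean, sample standard deviation, minimum, and maximum of replication times."""
    return {
        "mean": statistics.mean(times_s),
        "std": statistics.stdev(times_s),
        "min": min(times_s),
        "max": max(times_s),
    }


# Placeholder values (seconds), one per training size. NOT the data behind Table 2.
example_times = {
    "model_a": [46.0, 47.0, 48.0, 47.0, 47.0],
    "model_b": [47.0, 48.0, 49.0, 48.0, 48.0],
}
summary = {name: summarize(times) for name, times in example_times.items()}
```

Summarizing across training sizes in this way gives one row per classifier, which makes a small between-model spread easy to compare against the total cycle time.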
Figure 4 confirms that the online replication times remained tightly grouped across all tested models, with only minor dispersion. This supports the observation that the total execution time is dominated mainly by the simulated pick-and-place sequence rather than by classifier-dependent prediction cost.
3.2. Offline Prediction Benchmark
In order to compare the computational efficiency of the trained classifiers independently of robot motion, an offline benchmark was performed on 10,000 prediction probes. Two reporting modes were used: median over 50 runs and mean ± standard deviation over 30 runs.
The results (see Table 3) showed a clear separation between the models. Decision tree and logistic regression were the fastest in terms of median inference time, with 2.6271 ms and 2.6722 ms, respectively. Naive Bayes followed at 3.3622 ms, while linear SVM required 4.6750 ms. The slowest model by a large margin was kNN, which reached 15.6193 ms in the median-based evaluation. Therefore, while all models solved the sorting task correctly, their inference cost differed substantially when measured independently of the robotic execution layer.
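To make the separation concrete, the reported medians can be converted into relative slowdowns and approximate per-probe costs. The sketch below uses only the median values stated above; the per-probe figure assumes that each reported median covers one full batch of 10,000 predictions, which is an interpretation of the benchmark description rather than a stated fact.

```python
# Median inference times in milliseconds, as reported in the text.
median_ms = {
    "decision tree": 2.6271,
    "logistic regression": 2.6722,
    "naive Bayes": 3.3622,
    "linear SVM": 4.6750,
    "kNN": 15.6193,
}
N_PROBES = 10_000  # assumed batch size behind each timed run

fastest = min(median_ms.values())
slowdown = {name: t / fastest for name, t in median_ms.items()}
per_probe_us = {name: t * 1000.0 / N_PROBES for name, t in median_ms.items()}

for name in sorted(median_ms, key=median_ms.get):
    print(
        f"{name:20s} {median_ms[name]:8.4f} ms  "
        f"x{slowdown[name]:.2f}  {per_probe_us[name]:.3f} us/probe"
    )
```

Under this reading, kNN is roughly 5.9 times slower than the decision tree, whereas logistic regression is within about 2% of it.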
A similar ranking was observed in the mean ± standard deviation condition. Decision tree again remained among the most efficient models, followed by naive Bayes and logistic regression, whereas linear SVM showed higher average times and kNN remained the slowest. The higher dispersion measured for kNN also suggests less stable inference-time behavior than for the other models under the tested conditions.
These results confirm that in the present application, the choice of classifier had little effect on end-to-end robotic sorting accuracy but had a measurable effect on raw inference throughput. Since all models achieved perfect online classification in the tested scenario, the offline benchmark becomes particularly useful for highlighting deployment-oriented trade-offs. In other words, once accuracy saturates on a deterministic task of this type, the main remaining differences between models concern computational efficiency rather than correctness.
Figure 5 confirms that the largest separation between models appeared when inference time was evaluated independently of robot motion. In this setting, kNN was clearly the slowest classifier, whereas decision tree and logistic regression remained among the fastest.
These findings indicate that in the present deterministic sorting task, online evaluation is sufficient to validate task-level correctness, whereas offline benchmarking is required to expose classifier-dependent computational differences. A third evaluation layer was therefore introduced to examine how the same classifiers behave when the nominal feature space is exposed to controlled perturbation.
3.3. Results of Perturbation-Aware Robustness Evaluation
Because the nominal binary sorting task remained saturated in the full-data regime, an additional perturbation-aware robustness analysis was performed to obtain a more informative comparison. This complementary evaluation was especially relevant under reduced-data settings, where the same deterministic rule was learned from much smaller training subsets and then tested under controlled uncertainty. The corresponding results are summarized in Table 4.
As shown in Table 4, the perturbation-aware experiments did not produce wrong final placements in the reduced-data setting, but they did reduce the mean decision stability in a consistent manner across scenarios. Under Low noise, the mean stability remained 1.0000 for all five classifiers, indicating fully consistent voted decisions. Under Medium noise and Bias + medium noise, the mean stability decreased to 0.8182, while under High noise, it decreased further to 0.7273, representing the strongest degradation. These findings indicate that in the present structured binary sorting task, controlled perturbations affected the internal robustness of the decision process more clearly than final task correctness. In this way, the study partially accounts for destabilizing factors rather than reserving them only for future work. At the same time, the nearly identical stability values observed across classifiers suggest that the task remains relatively simple even under perturbation, so classifier separation is still more limited than in more complex learning settings.
To complement Table 4 with a model-level visual comparison, Figure 6 presents the replication time recorded for all five classifiers across the four perturbation scenarios in the reduced-data setting. Unlike mean decision stability, which remained identical across classifiers for each scenario, replication time preserved modest model-dependent differences, thereby offering a clearer graphical comparison at the classifier level.
Figure 6 confirms that the perturbation-aware protocol did not substantially separate the classifiers in terms of final task correctness or stability, but it preserved modest execution-time differences across models. In particular, kNN remained slower than the other tested classifiers across all four perturbation scenarios, whereas decision tree and linear SVM remained among the most time-efficient models in the reduced-data robustness setting. Therefore, the additional robustness-oriented evaluation extends the comparison beyond timing alone, even though the present compact task still does not generate strong separation in final accuracy.
In the full-data regime, final replication accuracy remained saturated even under the perturbation-aware protocol, confirming that the task structure remained highly learnable for all tested classifiers. In contrast, the reduced-data regime provided a more meaningful robustness perspective, because perturbations affected decision stability even when the final task accuracy remained unchanged.
These findings show that even when the final sorting decision remained correct, the robustness margin of the prediction process became smaller under controlled perturbation. Therefore, the comparison between classifiers is no longer restricted to throughput differences alone, but also includes a stability-oriented perspective under uncertainty, even if the present compact task still does not generate large classifier separation in the final accuracy.
4. Discussion
The results confirm that the proposed framework is effective as a virtual platform for studying learning-from-demonstration and lightweight machine learning in robotic object sorting, in line with the recent survey literature showing continued interest in imitation-learning-based robotic training across multiple task settings [24,25,26]. In the online tests, all five classifiers achieved perfect replication performance across all tested training sizes, with no wrong moves and 100% accuracy in every recorded run. This shows that, for the binary sorting task considered, the learned decision rule was simple and structured enough to be reliably captured even from small training sets. In practice, the virtual workcell was sufficient to demonstrate the whole process, from human-defined sorting logic to robotic execution. In this sense, the learning-from-demonstration component of the framework should be interpreted as demonstration-based acquisition of a task rule rather than as demonstration-based reproduction of motion trajectories.
The findings are consistent with the research reviewed in the Introduction, which describes simulation systems as helpful for safe experimentation, iterative testing, and robotics teaching. Recent literature reviews have shown that educational robotics simulators, especially those with visual feedback and user-friendly interfaces, are effective tools for fostering hands-on inquiry, programming practice, and the observation of intelligent behaviors in a controlled setting [1,2,3]. Since the proposed software lets users choose the sorting criteria by hand, train a model outside of CoppeliaSim in Python, and then observe the resulting autonomous behavior in CoppeliaSim, the present results are in line with that perspective. Thus, the framework serves as an instructional tool as well as a technical demonstration by making the link between data, model, and robot behavior immediately apparent.
The results also support the role of simulator-based platforms and digital-twin-oriented environments highlighted in the literature [4,5,6,7,8]. Previous work has shown that virtual and immersive robotic platforms are useful for active learning, safer experimentation, and accessible system validation. The present study extends that educational value by integrating not only simulation and robotic motion, but also supervised learning, rule inference from demonstrations, and comparative model evaluation in a single workflow. This is important because many simulator-oriented educational studies focus mainly on interaction with the virtual scene, whereas the current framework makes the machine learning layer explicit and experimentally inspectable by exposing the complete data-to-action loop, including demonstration capture, rule inference, dataset relabeling, classifier training, autonomous execution, and inference benchmarking.
From a robotics perspective, the findings also fit well with the literature describing CoppeliaSim as a flexible environment for robotic modeling, control, and validation [5,9,10]. The application confirmed that CoppeliaSim can support not only scene visualization, but also reproducible robotic experiments in which manipulation, object state management, and external software control are tightly integrated. The software architecture based on Python and the ZeroMQ Remote API was especially useful in this respect, because it allowed the classification logic, GUI supervision, logging, and benchmarking procedures to remain external, transparent, and easy to modify, while the simulator handled the physical layer of motion and object transport.
An important point emerging from the results is the difference between task correctness and computational efficiency. In the online replication experiments, the classifiers behaved equivalently in terms of accuracy, and the observed differences in total replication time remained relatively small, with mean values between 46.397 s and 47.445 s across models. This suggests that under the present conditions, the total cycle time was dominated mainly by the repeated pick-and-place sequence rather than by classifier inference. This finding is in line with what is found in the literature on motion-oriented robotics, which states that, in addition to decision logic, trajectory execution and motion characteristics have a significant impact on the overall behavior of robots [11]. Put simply, once the model’s predictions are accurate, the robotic transport process takes up the most time in the system.
This distinction highlights one of the practical strengths of the proposed framework, namely its multi-level evaluation strategy. By combining online replication with offline inference benchmarking, the framework makes it possible to separate motion-dominated execution time from classifier-dependent computational cost, which would be much harder to observe from end-to-end robotic experiments alone. In addition to this nominal online/offline distinction, the study also benefits from a third evaluation layer based on perturbation-aware robustness analysis, which makes it possible to examine classifier behavior not only through end-to-end correctness and inference throughput, but also through decision stability under controlled uncertainty.
This is further shown by the offline benchmark. The differences in computational cost among models became much more apparent once robot motion was excluded from the measurement. Decision tree and logistic regression led the median-based benchmark with 10,000 probes, followed by naive Bayes and linear SVM, while kNN lagged far behind. This ranking is consistent with the computational character of the competing approaches. While kNN is instance-based and must compute distances to the stored training samples at prediction time, decision trees and logistic regression use compact learned decision structures. The higher dispersion observed for kNN in the repeated benchmark runs supports the same interpretation, showing that classifier choice matters more when computational efficiency is analyzed independently of robotic motion. At the same time, the robustness-oriented experiments show that classifier comparison should not be reduced to timing alone. Even in cases where task-level accuracy remained unchanged, the perturbation-aware protocol made it possible to observe reduced decision stability under stronger uncertainty, especially in the reduced-data regime.
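The difference in prediction-time cost can be illustrated with a simplified sketch. It contrasts a brute-force nearest-neighbor vote with a single linear decision function; it is not the implementation used in the study, and the feature encoding (shape code, x, y, z), the weights, and the training samples are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Instance-based prediction: one distance per stored training sample."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def linear_predict(weights, bias, query):
    """Compact model: one dot product, independent of the training-set size."""
    score = sum(w * q for w, q in zip(weights, query)) + bias
    return 1 if score > 0 else 2


# Hypothetical features: (shape code, x, y, z), with 0 = cube and 1 = cylinder.
train_x = [
    (0, 0.10, 0.20, 0.05), (0, 0.30, 0.10, 0.05), (0, 0.20, 0.40, 0.05),
    (1, 0.20, 0.30, 0.05), (1, 0.40, 0.20, 0.05), (1, 0.10, 0.10, 0.05),
]
train_y = [2, 2, 2, 1, 1, 1]  # Cube -> Bin_2, Cylinder -> Bin_1

cylinder = (1, 0.25, 0.25, 0.05)
cube = (0, 0.25, 0.25, 0.05)
knn_bins = [knn_predict(train_x, train_y, q) for q in (cylinder, cube)]
linear_bins = [linear_predict((1.0, 0.0, 0.0, 0.0), -0.5, q) for q in (cylinder, cube)]
```

Both functions return the same bins here, but knn_predict touches every stored sample for each query, so its cost grows with the training set, while linear_predict performs a fixed amount of work per query. Library implementations may use tree-based neighbor search, so the measured gap depends on the configuration.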
These observations are also relevant in relation to the literature on AI-enhanced robotics and data-driven control [12,13,18,19]. Recent studies have emphasized that modern robotic systems increasingly combine control, sensing, and learning in order to improve adaptability and decision support. The present work does not claim to address complex adaptive robotics, but it does provide a simplified and transparent example of how data-driven decision rules can be embedded into a robotic pipeline and evaluated comparatively, which is consistent with the broader imitation-learning literature that spans interactive learning, trajectory imitation, dexterous manipulation, and contact-rich skill acquisition [24,25,26]. In that sense, the contribution is more methodological and educational than performance-driven: the framework makes visible the distinction between “a model that is accurate enough for the task” and “a model that is computationally preferable for deployment”. Therefore, the manuscript should not be interpreted as claiming a novel classifier or an improved generic data-processing method for binary shape recognition. Its contribution is the creation and structured evaluation of a reproducible simulator-based framework that exposes the full path from demonstration to robotic execution under both nominal and perturbed conditions.
The educational significance of the framework is also supported by the literature on robotics-based AI teaching and accessible learning environments [14,20,21,22,23]. Prior work has argued that robotics can make machine learning concepts more concrete by connecting training data, learned rules, and observable system behavior. The current results confirm this point: the user can define the sorting logic through demonstrations, trigger training through the GUI, observe autonomous sorting in the virtual workcell, and compare models through an explicit benchmark. This makes the framework suitable not only for demonstrating robotic sorting, but also for introducing students to model comparison, dataset size effects, and the distinction between online system performance and offline inference cost. A formal student-centered usability study was not included in the present work, but the current GUI-based structure and simulator-based workflow make such an educational evaluation feasible in future classroom-oriented research.
Limitations
Several limitations of the present study should be mentioned. First, to keep the setup simple, the work was limited to two classes of geometric shapes and a binary shape-to-bin mapping. This resulted in a well-structured decision problem, and all models achieved perfect correctness in online validation. This weakens the discriminative power of the online trials, but it remains acceptable as a proof-of-concept setting for educational and methodological purposes.
To partially reduce this limitation, the study also included a perturbation-aware robustness analysis based on positional noise, systematic bias, controlled shape corruption, repeated perturbation voting, and stability-aware scoring. However, these perturbations remain simulation-based and do not replace a full perception pipeline with real sensors, visual occlusion, calibration uncertainty, or hardware-level manipulation variability.
Second, the feature space was intentionally restricted to the object type and Cartesian position. Object occlusion, sensor noise, grasp variability, localization uncertainty, and camera-based perception were not modeled. Therefore, the approach evaluates the efficiency of learning and robotic execution rather than the system’s ability to manage real perception or manipulation errors.
Third, all experiments were conducted in simulation, and additional research is required before the reported results can be transferred to a physical robotic prototype. The main sim-to-real challenges concern the reliable estimation of object pose under real sensing conditions, the calibration of robot, gripper, and workspace coordinates, the repeatability of suction-based grasping, and the effect of communication and control delays during execution. Further difficulties may arise from discrepancies between simulated and physical contact behavior, including object slip, imperfect placement, and variation in grasp success across repeated trials. For this reason, future physical validation should not only replicate the present sorting task on hardware, but also examine the robustness of the learned decision pipeline under real perception noise, calibration errors, and actuation variability.
Finally, the benchmark considered only small classical classifiers. It does not yet include more sophisticated models such as ensemble approaches, neural networks, or online adaptive learning methods, although this restriction keeps the comparison simple and transparent.
5. Conclusions
This study presented an integrated Python–CoppeliaSim framework for robotic object sorting that combines learning-from-demonstration, comparative training of lightweight classifiers, autonomous pick-and-place execution, and offline inference benchmarking within a single virtual workcell. Within this framework, learning-from-demonstration is realized at the level of sorting-rule acquisition from minimal examples, and not at the level of continuous trajectory teaching.
The experimental results showed that all five tested classifiers—decision tree, k-nearest neighbors, logistic regression, naive Bayes, and linear SVM—achieved 100% online sorting accuracy in the considered virtual task, with zero wrong moves in all recorded runs. At the model level, the mean replication times remained in a narrow interval between 46.397 s and 47.445 s, confirming that end-to-end execution time was dominated mainly by the simulated pick-and-place sequence. In contrast, the offline benchmark on 10,000 probes revealed clearer computational differences: decision tree and logistic regression showed the lowest median inference times (2.6271 ms and 2.6722 ms, respectively), while kNN was the slowest model (15.6193 ms in the median-based evaluation). These quantitative results reinforce the distinction between task-level correctness and classifier-dependent computational efficiency in the proposed framework. At the same time, the perturbation-aware robustness analysis showed that classifier comparison can also be extended beyond nominal accuracy and raw inference speed by examining decision stability under controlled uncertainty, particularly in reduced-data conditions.
From an educational perspective, the main strength of the proposed framework lies in its transparency and integrative character. The novelty therefore lies primarily in the framework-level integration and evaluation strategy rather than in the invention of a new classification algorithm. Rather than isolating machine learning from robotic execution, the application connects demonstrations, data generation, training, prediction, and robot behavior within a single observable workflow. This makes it suitable for teaching how learned decision rules can be embedded into robotic systems and how different classifiers may behave similarly in terms of correctness while still differing in computational cost.
Although the approach is currently limited to a basic binary sorting scenario, it provides a solid foundation for future extensions. More varied object classes, more detailed feature representations, vision-based perception that handles uncertain or noisy inputs, online adaptation, and validation on physical robotic platforms are all potential areas for improvement. Even in its current state, the framework provides a user-friendly and accessible testbed for teaching and learning about the relationship between robotic manipulation and machine learning in a simulated setting.
Future work will focus on extending the present simulator-based robustness analysis toward more complex object classes, richer feature representations, camera-based perception, and physical robotic validation under real sensing and actuation variability.