Assessing and Benchmarking Learning Outcomes of Robotics-Enabled STEM Education

Abstract: Experienced middle school mathematics and science teachers were recruited for a pilot study. The teachers separately responded to a survey on determining expected learning outcomes based on their traditional teaching, classroom experiences and observations, and self-brainstorming. The teachers then received training on how to design, develop, and implement robotics-enabled lessons under a design-based research approach for experiential learning, and taught robotics-enabled lessons to a selected student population in classroom settings. The teachers then responded to the survey again for the robotics-enabled teaching. For each case (traditional and robotics-enabled), the survey responses were analyzed, and a set of expected learning outcomes of math and science lessons was derived separately. The thematic analysis results showed that the expected learning outcomes for the robotics-enabled lessons were related not only to the educational gains (content knowledge) observed in traditional teaching, but also to improvements in the behavioral, social, scientific, cognitive, and intellectual aptitudes of the students. Then, a set of metrics and methods was proposed for assessing the learning outcomes separately. To validate the assessment metrics and methods, teachers from different schools taught two selected robotics-enabled lessons (one math, one science) to students of the same grade, and separately assessed the learning outcomes of each student using the proposed metrics and methods. The learning outcomes were then compared and benchmarked between schools and subjects. The results of a user study with the teachers showed user acceptance, effectiveness, and suitability of the assessment metrics and methods.
The proposed scheme of assessing learning outcomes can be used to assess and justify the benefits and advantages of robotics-enabled STEM education, benchmark the outcomes, help improve teaching preparations, motivate decision-makers to confer on robotics-enabled STEM education and curricula development, and promote robotics-enabled STEM education.


Introduction
Students at K-12 levels need to learn STEM concepts that are fully or partly abstract in nature [1,2]. Learning abstract STEM topics at a young age may limit the learners' comprehension [1][2][3][4][5][6]; in particular, such abstract practices may increase the cognitive workload of the learners and decrease their computational thinking abilities [2]. This problem can become more severe when lower grade students (e.g., K-8) attempt to learn the abstract concepts [3]. It can also persist with college students, especially freshmen or other college students who attempt to learn new concepts for the first time. It is assumed that suitably designed tangible learning platforms may provide active and experiential learning opportunities that give students kinesthetic learning experiences and make the abstract STEM concepts easier to comprehend [2]. In this regard, it is posited that applying robotic and mechatronic devices as tangible learning platforms may be a pragmatic choice to illustrate abstract STEM concepts to students [1][2][3][4][5][6][7]. The reasons behind parents, and communities [2][17][18][19][20][21][22][23][24][25][26][45][46][47][48][49]. If so, it requires an appropriately designed and customized assessment scheme for robotics-enabled STEM education [50][51][52][53][54][55][56][57][58][59]. In addition, there may be differences in the assessment schemes between student grades, subjects, etc. Furthermore, the assessment scheme needs to be validated in actual classroom settings for its practicality and generalization. However, an appropriate, customized, comprehensive, and validated assessment scheme comprising appropriate methods and metrics for assessing the learning outcomes of robotics-enabled STEM education has yet to be proposed.
Hence, the objective of this article is to propose an appropriate, customized, and comprehensive assessment scheme comprising appropriate methods and metrics for assessing the learning outcomes of robotics-enabled STEM education for K-12 levels, and to validate the assessment scheme for its practicality and generalization. For simplicity, the efforts in this article are limited to middle school math and science lessons. However, the results may, in principle, be applicable to robotics-enabled STEM lessons for K-12 and collegiate levels as well. The vision is that the derived assessment methods and metrics can guide robotics-enabled lesson designers to design and implement lessons and predict their expected learning outcomes, which can help them distinguish the benefits of robotics-enabled lessons from those of traditional lessons, and thus highlight the importance of robotics-enabled lessons [60][61][62][63][64]. The vision is also that the overall efforts can enhance the learning outcomes significantly.

Analysis of Related Works
A plethora of research reports in the existing literature clearly show the growing interest in research on robotics-enabled STEM education [1][2][3][4][5][6][7][13]. A repeated evaluation and feedback approach was proposed and verified to assess and optimize the design, development, and implementation performance of a professional development program for in-service middle school teachers for teaching K-12 STEM lessons using robotics-enabled illustrations [1]. The prerequisites (the expected qualifications, attitudes, and aptitudes) of K-12 students interested in attending robotics-enabled STEM lessons were determined [2]. The prerequisites were meant to be the qualifications, attitudes, and aptitudes that prospective students would need in order to obtain optimum benefits from robotics-enabled STEM lessons. For this purpose, the computational thinking ability of the students was identified as one of the key requirements that the students would need to have before they could attend robotics-enabled STEM lessons. It was found that the computational thinking ability of the students might also increase if the students could participate in robotics-enabled STEM lessons [2].
A teaching framework called technological, pedagogical, and content knowledge (TPACK) was applied to instruct robotics-enabled middle school mathematics and science lessons [3]. The variations in the applications of TPACK and the impact of the TPACK framework on teaching robotics-enabled STEM lessons of varying difficulty were investigated, and the outcomes of the robotics-enabled lessons were compared with those of the traditionally taught lessons. The results showed the superiority of the robotics-enabled teaching over the traditional teaching. The dynamic behaviors of the TPACK framework for teaching robotics-enabled STEM lessons in middle schools were explored [4]. The results showed significant variations in the effectiveness of TPACK with variations in subjects, grades, and teachers. The factors affecting the trust of students and teachers in robots for robotics-enabled middle school STEM lessons were determined [5]. It was found that the trust of the students in the robots had significant impacts on their learning outcomes in their robotics-enabled STEM lessons. A systems approach to analyzing the design-based research strategy in robotics-enabled middle school STEM lessons was proposed, and its effectiveness was justified [6]. The effectiveness of the cognitive apprenticeship approach in conjunction with systematic design-based research was confirmed.
It was found that robots were applied to enhance the effectiveness of English language learning in elementary school students [7]. As was reported, a framework utilizing LEGO robots was developed to enhance problem solving ability in students [8]. The authors found that the robotics-enabled teaching enhanced student engagement in classrooms. The authors found that the use of LEGO robots was effective in creating interest among high school students in their STEM lessons [9]. The review results showed that social robots could be useful in education as they could be used as robot tutors or robotic peer learners [10]. It was argued that social robots had proved effective in improving the cognitive and affective abilities of students. It was reported that the learning outcomes were similar to those of human teachers tutoring similar lessons. This might be because of the interactive embodiment and physical presence of the social robots, which the traditional non-robotic teaching and learning technologies and facilities could not provide. A review study on the applications of robotics in STEM education, especially in the education of young children, was conducted [11]. It was shown that there was a strong trend in the effectiveness of robotics-enabled education on children. A systematic survey to explore the educational potentials of robots and robotics-enabled lessons in the school environment was conducted, and strong learning potentials were found [12]. It was found that the creativity of students in higher education significantly increased through robotics-enabled STEM lessons [13].
Instructing a mechatronics course to undergraduate engineering students in colleges following the TPACK framework was proven efficacious [14]. An all-in-one robotic platform was used to instruct mechatronics fundamentals such as actuators and sensors to the students. It was found that instructing mechatronics concepts using the robotic platform seemed to enhance learning outcomes and learners' satisfaction. The 7E instructional model with the design-based research (DBR) method was proposed to design and instruct a mechatronics course for undergraduate engineering students [15]. Some robotics devices such as actuator and sensor systems were used as the pedagogical tools to instruct the mechatronics concepts, especially the fundamental concepts of actuation and sensing. It was found that the implementation of mechatronics lessons following the 7E instructional model along with the DBR method enhanced the teaching and learning outcomes and effectiveness. A few mechanical engineering concepts such as additive manufacturing (3D printing), pneumatics principles, fine machining (such as laser engraving), etc. were instructed through the applications of a robotic platform. It was found that the experiential kinesthetic learning through the applications of a robotic platform enhanced the teaching and learning outcomes and effectiveness significantly [16].
In all of the above examples, different aspects of robotics-enabled STEM education were addressed. Enhancing overall learning outcomes was considered the main objective of implementing robotics-enabled STEM education [7][8][9][10][11][12][13][17][18][19][20][21][22][23]. A few examples such as [14][15][16] attempted to present the impacts of robotics-enabled STEM education on the learning outcomes. The literature shows that researchers are very active in proposing different approaches to assess the learning outcomes [17][18][19][20][21][22][23][24][25][26], including the SOLO model. The authors expressed the learning outcomes in terms of critical thinking [17]. The authors explained the learning outcomes from the perspective of the students or the learners [18]. The authors described the importance of assessing learning outcomes of students in higher education [19]. Other issues related to assessing the learning outcomes, such as the definitions, thresholds, roles, integration, student perception, and sustainability of the learning outcomes, were presented in various ways [20][21][22][23][24][25][26]. However, no scheme seems to be holistic and comprehensive; rather, each of those schemes focused on some part of the learning outcomes. So far, the assessment methods and metrics have not been generalized, which hinders benchmarking the learning outcomes among students of different grades, subjects, and schools. Most importantly, those state-of-the-art works did not consider robotics-enabled STEM education. It is assumed that a new paradigm in studying the learning outcomes is necessary for the lessons instructed using robotics as pedagogical tools.
It is believed that a comprehensive assessment scheme could capture all the possible and relevant learning outcomes of robotics-enabled education, and thus could help use the integrated results for various purposes such as curriculum development, benchmarking, student and teacher awarding, education related policy planning and decision-making, etc.
However, the current initiatives in the literature do not focus on developing such a customized, comprehensive, and holistic assessment scheme for assessing and benchmarking the learning outcomes of robotics-enabled STEM education.
Based on the aforementioned literature reviews representing the state-of-the-art research and development activities in this field, it can be posited that there is a big gap in the state-of-the-art works regarding assessing the learning outcomes of robotics-enabled STEM education, especially in the K-12 classes. Thus, an appropriately designed comprehensive assessment and benchmarking scheme to assess and benchmark the learning outcomes of robotics-enabled STEM education is still a future work. This paper aims to contribute to this direction, and bridge the gaps in the state-of-the-art knowledge and practices of assessing the learning outcomes of robotics-enabled STEM education.

Research Questions
Considering the gaps in the state-of-the-art research related to learning outcomes of robotics-enabled STEM lessons (education) as above [17][18][19][20][21][22][23][24][25][26], answers to the following research questions will be sought in this paper:
Q1: Are the methods and metrics used to assess the learning outcomes of robotics-enabled STEM lessons different for teaching different subjects (e.g., math and science) and students of different grades (e.g., 6-8)?
Q2: How are the methods and metrics used to assess the learning outcomes of robotics-enabled STEM lessons different from or similar to those of the traditionally taught non-robotics-based STEM lessons?

Materials and Resources
In total, 20 math and 20 science teachers from 20 middle schools were recruited to participate in the pilot study. To sample the teachers, we contacted the selected schools and circulated a recruitment notice. The notice stated that teachers who had good experience in teaching math and science following traditional methods, but no experience in robotics-enabled lessons, could apply. We then interviewed each teacher separately and conveyed the information regarding the duties and responsibilities of the teachers for the study. We also considered the years of experience of each teacher in teaching math or science in middle schools. We then selected the teachers who were found to be the most promising and interested in the proposed study. For students, we randomly selected students from each class of each selected teacher to participate in the pilot study.
We obtained consent from the students and teachers and kept records of their consent. The study was conducted following local ethical standards and principles for human subjects, and we were aware of the privacy and security principles for human subjects (students and teachers) mentioned in the ethical standards. We then trained the teachers on how to develop and implement robotics-enabled math and science lessons. Together with the trained teachers, we developed 10 math and 10 science lessons using robotics as a pedagogical tool. All lessons were planned to meet the state standards for middle school science and math based on the Next Generation Science Standards (NGSS) and the Common Core State Standards for Math (CCSSM) [44]. As an example, a math lesson is described as follows. The teachers used LEGO robots to create illustrations to teach the number line to grade 6 middle school students in their math lessons, as exhibited in Figure 1. In this example, a number line was drawn on the classroom floor. The number line was divided into positive and negative digits. The space between two adjacent digits had a value of |1|. A LEGO robot vehicle was programmed to move along the number line. The touch buttons were used to give addition and subtraction commands to the robot. The robot illustrated the addition or subtraction results through its movement along the number line. For example, if it was commanded to subtract 3 from 2 (i.e., 2 − 3), the robot started to move from '2' towards '0', moved for 3 spaces, and stopped at '−1'. Thus, the robot illustrated that 2 − 3 = −1 (see Figure 1). Lesson materials such as lesson descriptions, activity sheets, instruction procedures, etc. were developed for the lesson [1][2][3][4][5][6].
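The number line illustration described above can be sketched in a few lines of Python. This is only a minimal simulation of the robot's stepping behavior, not the program the teachers ran on the LEGO robots; the function name `move_on_number_line` and its parameters are hypothetical and introduced here purely for illustration.

```python
from typing import List


def move_on_number_line(start: int, operation: str, amount: int) -> List[int]:
    """Return every position visited by a simulated robot on an integer number line.

    Assumption (not from the lessons themselves): the robot moves one unit
    space per step, to the right for addition and to the left for subtraction.
    """
    if operation not in ("add", "subtract"):
        raise ValueError("operation must be 'add' or 'subtract'")
    if amount < 0:
        raise ValueError("amount must be non-negative")

    step = 1 if operation == "add" else -1
    positions = [start]
    for _ in range(amount):
        # Each step covers one space of value |1| between adjacent digits.
        positions.append(positions[-1] + step)
    return positions


# The example from the lesson: subtract 3 from 2.
path = move_on_number_line(2, "subtract", 3)
print(path)       # [2, 1, 0, -1]
print(path[-1])   # -1, i.e., 2 - 3 = -1
```

Returning the full path rather than only the final position mirrors what the students observe: the robot passing through '1' and '0' before stopping at '−1'.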
In another example, teachers put the LEGO robot vehicles at different locations on a sliding surface so that the robots could come down from higher positions to lower positions along the sliding path, as illustrated in Figure 2a. Similarly, a robot was programmed to move a block on the floor, as illustrated in Figure 2b. Those illustrations were used to teach the students the fundamental concepts of mass, force, friction, displacement, velocity, speed, acceleration, momentum, etc. in their science lessons. Lesson materials such as lesson descriptions, activity sheets, instruction procedures, etc. were developed for the lessons. Similar examples of robotics-enabled STEM lesson design and development can be found in [1][2][3][4][5][6].
The trained teachers separately implemented the robotics-enabled math and science lessons in actual classroom settings. Students were divided into teams, and they observed robot activities for the lessons preprogrammed by their teachers, interacted with the robots, performed lesson activities, completed activity sheets, etc. as instructed by their teachers. The robotics-enabled math and science lessons, trained teachers, selected students, and the classroom settings were used as the materials and resources for the research presented herein.

Research Methods and Procedures
The research methods presented herein were based on surveys with the teachers [50,51], and observations of students and their classroom activities [52,62]. The research procedures included two phases (steps): (i) development of the assessment methods and metrics for the learning outcomes of students for their robotics-enabled lessons, and (ii) validation and generalization of the assessment methods and metrics in an actual classroom environment. For the first phase, a survey was conducted with the math and science teachers separately [50,51]. The survey questionnaires are given in Appendix A. The survey was conducted with each teacher twice: (i) before their training on robotics-enabled lessons (treated as the traditional or non-robotics-based teaching), and (ii) after their training on and implementation of robotics-enabled lessons (treated as the robotics-based/enabled teaching). The participating math and science teachers responded to the surveys separately based on their classroom experiences and observations of student activities [52,62]. They also conducted self-brainstorming to fill out the survey questionnaires [53]. The survey questionnaires were given to the responding teachers, who were allowed two days to think individually and respond. Hence, the responses received from the teachers were treated as their well-considered opinions based on their teaching and classroom experiences. The name of each responding teacher was coded so that the true identity of the respondent could not be identified while processing the response data, as per the ethical standards. This phase of research was conducted to develop a set of assessment methods and metrics for assessing expected learning outcomes of students for their math and science lessons for both traditional and robotics-enabled scenarios.
For the second phase, 20 math teachers taught the same topic/lesson (e.g., number line) to students of the same grade (e.g., grade 6) in their schools using robotics. Similarly, 20 science teachers taught the same topic/lesson (e.g., force/friction) to students of the same grade (e.g., grade 6) in their schools using robotics. Then, each teacher assessed the learning outcomes of the lesson that he or she taught using the assessment methods and metrics developed in the first phase. The assessment was performed during the class, in a 1-hour extra session with the participating students after the class, and during a 1-week follow-up period so that the different criteria could be assessed properly. The learning outcomes between schools and subjects were compared and benchmarked. Then, another user study survey was conducted with the teachers to collect their opinions about the usability, practicability, and reliability of the assessment methods and metrics for assessing the learning outcomes separately [54]. The survey was based on a 7-point Likert scale where +3 was the most positive (highest) and −3 was the most negative (lowest) response [27]. The Likert scale is exhibited in Figure 3.
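As a rough illustration of how responses on the 7-point Likert scale (−3 to +3) described above might be summarized, the following Python sketch computes simple descriptive statistics. The function name `summarize_likert` and the sample responses are hypothetical; they are not taken from the study's data, and the choice of mean, median, and distribution is an assumption rather than the analysis the authors performed.

```python
from collections import Counter
from statistics import mean, median

LIKERT_MIN, LIKERT_MAX = -3, 3  # 7-point scale: -3 (most negative) to +3 (most positive)


def summarize_likert(responses):
    """Validate and summarize integer Likert responses on the -3..+3 scale."""
    if not responses:
        raise ValueError("at least one response is required")
    for r in responses:
        if not (isinstance(r, int) and LIKERT_MIN <= r <= LIKERT_MAX):
            raise ValueError(f"response {r!r} is outside the -3..+3 scale")

    counts = Counter(responses)
    return {
        "n": len(responses),
        "mean": mean(responses),
        "median": median(responses),
        # Include every scale point, even those nobody selected.
        "distribution": {v: counts.get(v, 0)
                         for v in range(LIKERT_MIN, LIKERT_MAX + 1)},
    }


# Hypothetical usability ratings from seven teachers (illustrative only).
hypothetical_usability = [3, 2, 2, 1, 3, 0, 2]
summary = summarize_likert(hypothetical_usability)
print(summary["n"], summary["median"])
print(summary["distribution"])
```

Because Likert data are ordinal, the median and the full distribution are often reported alongside (or instead of) the mean.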

Determining the Assessment Criteria and Metrics
The responses to the questionnaires in Appendix A were analyzed. The responses with similar meanings for the first question for math and science lessons/teachers were tallied under different key terms separately, as Tables 1 and 2 show for the math and science lessons, respectively. Here, the key terms can be considered as the assessment criteria for learning outcomes, and the criteria together can be called the assessment metrics. The tables compare the different criteria (key terms) proposed by the responding teachers for assessing the learning outcomes of their students for the math and science lessons between traditional and robotics-enabled teaching methods [30]. Here, the traditionally taught lessons and participants could serve as the control group for the robotics-enabled lessons group when we compared the perceived learning outcomes between the traditionally taught and the robotics-enabled lessons. The tables also show the frequencies of the responses. For example, "Problem solving ability (9)" in Table 1 for the traditional teaching of the math lesson means that, out of 20 responding math teachers, 9 mentioned in their responses to question 1 in Appendix A that the problem solving ability of the students should be considered as a criterion to assess the learning outcomes of the students for their math lessons. In other words, problem solving ability should be an outcome of the math lesson in the opinion of 9 teachers out of 20.
Table 1. Comparison of the criteria proposed by the responding math teachers for assessing the learning outcomes of the students for the math lessons between traditional and robotics-enabled teaching.
Tables 1 and 2 show that, in general, the responding teachers expected better learning outcomes in terms of assessment criteria and their frequencies for the robotics-enabled teaching over the traditional teaching for both math and science lessons.
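The tallying of teacher responses under key terms, with frequencies reported in the "criterion (count)" form used in the tables, can be sketched as follows. This is a minimal illustration and not the authors' thematic analysis procedure: it assumes the free-text responses have already been coded into key terms, and the function names, teacher codes, and sample data are hypothetical.

```python
from collections import Counter


def tally_criteria(coded_responses):
    """Count how many teachers mentioned each key term.

    `coded_responses` maps a coded teacher ID to the list of key terms
    assigned to that teacher's answer (coding is assumed to be done already).
    """
    counts = Counter()
    for key_terms in coded_responses.values():
        # A teacher counts at most once per key term, even if it was repeated.
        counts.update(set(key_terms))
    return counts


def format_tally(counts):
    """Render counts as 'criterion (frequency)', most frequent first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{term} ({n})" for term, n in ordered]


# Hypothetical coded responses from three teachers (illustrative only).
hypothetical_responses = {
    "T01": ["Problem solving ability", "Content knowledge"],
    "T02": ["Problem solving ability"],
    "T03": ["Content knowledge", "Problem solving ability",
            "Problem solving ability"],
}
tally = tally_criteria(hypothetical_responses)
print(format_tally(tally))
# ['Problem solving ability (3)', 'Content knowledge (2)']
```

Counting each teacher once per criterion matches the interpretation given in the text, where a frequency of 9 means that 9 of the 20 teachers mentioned the criterion.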
It is assumed that the teachers' expectations of learning outcomes for the robotics-enabled teaching increased because they realized higher monetary investment, high-tech kinesthetic teaching and learning artifacts, intense classroom activities and better pedagogical clarity and transparency associated with the robotics-enabled teaching over the traditional teaching [1][2][3][4][5][6]. For example, intense classroom activities centering round the robots occurred in the classrooms for robotics-enabled teaching of math and science lessons. The students and teachers together needed to implement the lessons using robotics in the classroom environment, and the students needed to manage and complete such activities in teams within a specified timeframe. As a result, teachers might have expectations that the performed activities would create higher abilities and skills in the students related to content knowledge, leadership, social responsibility, time management, punctuality, teamwork, decision making, interpersonal relationship, classroom engagement, problem solving, critical thinking, professional ethics, communications, basic engineering, ICT, practical work, experimentation, research formu-lation, organization and planning, troubleshooting, contingency management, adaptation to changes, creativity, innovation, etc. The well-developed robotics-based learning systems and devices might create entrepreneurial thinking in the students. The robotic device as an experiential learning tool might be itself a source of intrinsic and extrinsic motivation to the students that might engage students with their lessons, stimulate continuous and life-long learning, build trust in the learning devices, etc. The tangible and visible robotics learning tools might reduce cognitive workload of students while learning because such tools might reduce the mental demand, temporal demand, frustration, and efforts, and increase the learning performance simultaneously [28]. 
Students needed to utilize different multidisciplinary and interdisciplinary concepts to work with robotics-enabled lessons and to complete the lesson activities. Students from different cultures and races needed to work in teams to learn their lessons using robotics as kinesthetic learning tools. All of these might enhance their interdisciplinary and multidisciplinary skills, and inculcate an inclusive, diverse, and multicultural mentality in the students.

Table 2. Comparison of the criteria proposed by the responding science teachers for assessing the learning outcomes of the students for the science lessons between traditional and robotics-enabled teaching.

The robotics-enabled teaching was usually more student-centered while the traditional teaching was more teacher-centered [45]. As the results in Tables 1 and 2 suggest, this paradigm shift in the centering of the classroom activities (from teacher-centered to student-centered) might create higher expectations among the teachers about the learning outcomes of their students for the student-centered robotics-enabled teaching [45]. The results might also indirectly reveal that the robotics-enabled teaching should produce better learning outcomes of students in order to be appreciated by their teachers, parents, school administration, and school districts.

The results in Tables 1 and 2 show slight differences in the expected learning outcomes between math and science lessons for both traditional and robotics-enabled teaching methods. For example, the teachers expected computational thinking ability as a learning outcome for the math lessons but not for the science lessons. Instead, the imagination ability of students was expected for the science lessons. The reason may be that computational thinking is more related to math than to science, whereas students need more imagination ability to picture science concepts and comprehend them through developing or using tangible learning tools such as the robotic devices. It is further observed that the frequencies of responses from the teachers for different learning outcomes for the science lessons were greater than those for the math lessons. The reason may be that kinesthetic learning using robotics was expected to influence science learning more intensely than math learning because math is more abstract than science, as far as could be understood from observing the classroom activities associated with the lessons.
For the second question in Appendix A, 18 teachers out of 20 for the math lessons opined that they did not expect different learning outcomes for different grades of middle school students. Similarly, 19 teachers out of 20 for the science lessons did not expect different learning outcomes for different grades of middle school students. This might be because the syllabi, standards, and depth of the education for different grades of students in middle schools are not sufficiently different for the teachers to perceive different learning outcomes for different grade students. Hence, it is posited that the same or similar assessment criteria of learning outcomes may be used for different grades of students in middle schools. However, differences may become apparent if the expected learning outcomes of middle school grades (e.g., grade 6) and high school grades (e.g., grade 10) are compared.
Then, the responses (the proposed criteria of learning outcomes) in Tables 1 and 2 for the robotics-enabled teaching were grouped under different themes separately through thematic analysis [2,55]. Table 3 shows the themes of learning outcomes for the robotics-enabled math lessons (based on Table 1) as an example. Then, the frequencies for all the criteria of each theme were added separately, as Table 4 shows. Figure 4 shows the relative contribution of each theme in the total contribution. Results in Figure 4 show that the improvement in the behavioral characteristics of the students through their robotics-enabled math lessons is the most expected learning outcome. Based on the results, it is realized that the robot is not simply a pedagogical tool that can help learn the subject matter (or content knowledge), which is called here the educational outcome. Instead, the robot should generate intrinsic and extrinsic motivation in the students, enhance their trust in the robot as a learning tool, improve their physical and mental engagement with the learning platform such as the robotic platform, motivate them to attend the school regularly and timely, create a life-long learning aspiration in the students based on their long term relationship with the tangible interactive robotic platform, and finally enhance their teamwork ability through the activities they perform in teams during the lessons centered on the robotic platform. The results show that improvements in scientific/technical, managerial/leadership, intellectual, cognitive, and social abilities of the students are also the expected outcomes of learning math lessons through a robotics-enabled teaching method. Similar results were obtained for robotics-enabled science lessons. The results in general mean that the learning outcomes of robotics-enabled math and science lessons can be treated as satisfactory if the assessment results for the mentioned criteria of learning outcomes (Tables 1 and 2) are satisfactory, and/or the assessment results for each of the outcome themes (Table 3) are satisfactory.

Table 3. Determination of different themes of criteria of assessing the learning outcomes of the students for robotics-enabled math lessons.

Assessment Criteria (Expected Learning Outcomes) Themes
Test results (20)  Educational Life-long learning aspiration (7)

Now, the question is what metrics are to be used to assess the mentioned criteria of learning outcomes, and how. The answer to this question is as follows. The nature of each assessment criterion in Table 1 for the robotics-enabled math lessons was critically analyzed with respect to the scenarios where the students performed the robotics-enabled activities. Then, the assessment metric for each criterion was proposed, being inspired by the existing body of knowledge on each criterion found in the literature, considering the nature of each criterion with respect to the activity scenario, and conducting brainstorming with the concerned teachers and education experts. The results are given in Table A1 (in Appendix B). Similar results were found for the robotics-enabled science lessons.
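The theme-level aggregation mentioned earlier (adding the frequencies of all criteria under each theme, then expressing each theme as a share of the total) can be sketched as follows. This is a minimal illustration under stated assumptions: only the "Test results (20)" entry under the Educational theme is taken from the text, and the remaining criteria, themes, and frequencies are hypothetical placeholders.

```python
from collections import defaultdict

# criterion -> (theme, frequency)
# Only "Test results" (Educational, 20) comes from the text; the other rows
# are hypothetical placeholders used to show the arithmetic.
criteria = {
    "Test results": ("Educational", 20),
    "Criterion A": ("Behavioral", 12),
    "Criterion B": ("Behavioral", 8),
    "Criterion C": ("Social", 10),
}

def theme_contributions(criteria):
    """Return {theme: (summed frequency, percentage share of the grand total)}."""
    totals = defaultdict(int)
    for theme, freq in criteria.values():
        totals[theme] += freq
    grand_total = sum(totals.values())
    return {
        theme: (total, 100.0 * total / grand_total)
        for theme, total in totals.items()
    }

contributions = theme_contributions(criteria)
for theme, (total, share) in contributions.items():
    print(f"{theme}: total frequency {total}, relative contribution {share:.1f}%")
```

The summed frequencies correspond to the kind of totals reported per theme, and the percentage shares to a relative-contribution chart.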

The results in Table A1 show that in some cases, the exact assessment metrics were not proposed. Those were kept open for mainly two reasons: (i) there might be multiple options for the metrics to assess those criteria of learning outcomes depending on situations and scenarios, and (ii) it was difficult to decide the metrics in general unless the actual scenario was known. In such cases, the assessment metrics would need to be determined by teachers and/or education researchers implementing robotics-enabled lessons based on their experiences, knowledge, understanding, and observations. On the other hand, the assessment methods may also be influenced by the assessment metrics, and vice versa. For example, a Likert scale is to be used as an assessment metric for a learning outcome if the outcome needs to be assessed subjectively and quantitatively [27]. For the criteria where tests/quizzes and surveys were proposed as the assessment metrics, special quizzes/tests and surveys might need to be designed and administered. The NASA TLX and work sampling should follow the standard NASA TLX and work sampling implementation methods and materials, respectively [27,28]. For the criteria of qualitative observations, the teachers and/or education researchers will need to observe the classroom scenarios and activities, assess the learning outcomes qualitatively, and prepare a qualitative report on the assessment of each specific assessment criterion of learning outcome. Note that in actual implementation scenarios, the learning outcomes may not be favorable for all assessment criteria, which may open a road to improvements.
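Two of the quantitative metrics mentioned above lend themselves to a short worked sketch: the work-sampling engagement percentage and a mean subjective rating. The code below assumes that engagement is the share of sampled observations in which a student was found engaged, and that the 7-point Likert scale runs from -3 to +3 (consistent with the maximum score of +3 reported for the Likert-based results). The function names and the sample data are hypothetical.

```python
from statistics import mean

def engagement_percentage(observations):
    """Work-sampling engagement: 100 * (engaged observations) / (total observations).

    observations: list of booleans, True if the student (or team) was found
    engaged at that sampling instant.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    total = len(observations)      # total number of observations in the class
    engaged = sum(observations)    # observations in which engagement was seen
    return 100.0 * engaged / total

def mean_likert(ratings, low=-3, high=3):
    """Mean of subjective ratings on an assumed 7-point scale from low to high."""
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside [{low}, {high}]")
    return mean(ratings)

# Hypothetical example: a 45-minute class sampled every 5 minutes (9 samples),
# with the student found engaged in 7 of them.
samples = [True, True, False, True, True, True, False, True, True]
print(f"Engagement: {engagement_percentage(samples):.1f}% of sampled class time")

# Hypothetical ratings given by one teacher to four students.
print(f"Mean rating: {mean_likert([2, 3, 1, 2])}")
```

Shorter sampling intervals give more observations per class and therefore a less noisy estimate, at the cost of more observation effort for the teacher.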

Validation of the Learning Outcome Assessment Methods and Metrics
For the second phase of the research, Tables A2 and A3 (in Appendix B) compare the learning outcomes assessed using the criteria and metrics proposed earlier (see Table A1) between robotics-enabled math and science lessons for different participating schools. The results show that the proposed assessment criteria and metrics of learning outcomes (Table A1) can be implemented successfully to understand the status of learning outcomes of robotics-enabled math and science lessons. The results also show that the robotics-enabled science lessons produced slightly better outcomes. However, based on the t-test results between the subjects (math and science), the differences were not statistically significant (p > 0.05) for any criterion of learning outcome. The slightly better science outcomes might arise because science concepts might be less abstract and more related to real-life scenarios, so that the tangible robotic platform as a learning tool might impact the science learning outcomes more intensely than the math learning outcomes. Figure 5 further exhibits the slight differences in the learning outcomes for different assessment criteria between math and science lessons.
Figure 5a compares the mean assessment scores of learning outcomes assessed based on the Likert scale (max. score +3), and Figure 5b compares the mean assessment scores of learning outcomes assessed as the percentages of the total obtainable scores (max. score 100%) for different assessment criteria between the math and science lessons for school#1 as an example. These results as a whole validate the effectiveness and prove the practicality of the proposed assessment criteria and metrics of learning outcomes for the robotics-enabled math and science lessons. Therefore, the metrics can be used to compare and benchmark the learning outcomes between students, student grades, subjects, schools, and school districts.
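The between-subject comparison for a single criterion can be illustrated with a small t-test sketch. The text does not state which variant of the t-test was used, so the example below assumes an independent-samples Welch test, and the ratings are hypothetical values on a -3 to +3 scale, not the study data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical ratings for one criterion (not the study data).
math_scores = [2, 1, 2, 3, 2]
science_scores = [2, 3, 2, 3, 3]

t_stat, dof = welch_t(math_scores, science_scores)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The resulting statistic would then be compared against a critical value from a t-distribution table at the chosen significance level, or a p-value could be obtained with a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`). A paired test would be more appropriate if the same students were rated in both subjects.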
Based on the user study (teachers' opinions) results, Figure 6 compares the usability, practicability, and reliability of the assessment criteria and metrics of learning outcomes between robotics-enabled math and science lessons. The results in Figure 6 show that the users (teachers) judged the assessment scheme to be usable, practical, and reliable. The scheme was rated slightly better in terms of usability, practicability, and reliability for the science lesson than for the math lesson. The reasons may be similar to those explained earlier. These results support the effectiveness and practicality of the proposed assessment criteria and metrics for assessing the learning outcomes of robotics-enabled math and science lessons.

Discussion
The results of the presented study are limited in several ways. A few of the limitations can be summarized as follows: (i) the results are limited to grade 6-8 students in middle schools only, and may not be readily applicable to the elementary and high school grades and college levels, (ii) the study was conducted using LEGO (Mindstorms) robots, and the effectiveness of the results with other robotic platforms is yet to be investigated, (iii) the study was conducted with a limited number of lesson scenarios, and the results may change or may need to be adjusted if more lessons with different scenarios, or the same lessons with different and multiple scenarios, are implemented, (iv) the study was conducted with a limited number of teachers and students, and the results may need to be adjusted if a greater number of teachers and students are recruited, (v) the teaching experience of teachers and the previous experience of teachers and students with robotics, which were not considered in the presented study, may also impact the results, and (vi) the study considered only a few representative lessons from math and science, but lessons from engineering and technology need to be considered to have a clear picture of the expected learning outcomes of robotics-enabled lessons. However, it is possible to address all of these limitations properly. Despite these limitations, this study conveys preliminary information about assessing and benchmarking the expected learning outcomes of robotics-enabled STEM lessons, which is significant. The results are in line with what was found for the state-of-the-art traditional teaching and learning methods [17][18][19][20][21][22][23][24][25][26]30,46]. However, the results obtained herein augment the scope of the state-of-the-art initiatives, and extend the existing results to make them suitable for teaching and learning robotics-enabled lessons.

In the integrative model of interdisciplinary learning, knowledge, modes of inquiry, and pedagogies from multiple disciplines (multiple disciplines may mean multiple majors, subjects, topics, ideas, solutions, concepts, etc.) can be brought together within the context of a single course or program or practice [56]. Students learning in this model are able to apply the knowledge gained in one discipline or subject area to other disciplines, subject areas, or concepts to deepen overall learning experiences [56]. On the other hand, the active learning method asks learners to fully participate in their learning by thinking, discussing, investigating, and creating. In active learning, students/learners are asked to practice skills, solve problems, struggle with complex questions, propose solutions, and explain ideas in their own words through speaking, writing, and discussing [57]. Research shows that active learning methods are more effective than traditional lecturing for student learning [57]. Experiential learning is another form of education closely related to active learning where students can learn through experiences [58]. It may be hypothesized that experiential learning and active learning are complementary to each other; they can be integrated and implemented with interdisciplinary learning concepts, and such an integration may be more effective and impactful than active learning, experiential learning, or interdisciplinary learning alone. Robotics can be used as a pedagogical and learning tool that can integrate and foster active learning, experiential learning, and interdisciplinary learning [56][57][58]. However, effective applications of such an integrated model with a high impact on STEM education are rarely reported in the literature, and the expected learning outcomes of such an integration are yet to be known. The results presented herein may inspire this multimodal integrative model of education.

Conclusions and Future Works
Based on a survey conducted with 40 middle school math and science teachers with experience of developing and implementing robotics-enabled lessons, a set of expected learning outcomes of robotics-enabled STEM education (here, only the math and science education) was derived, and the metrics and methods to evaluate each outcome were proposed. The survey results showed that the expected learning outcomes were not only related to the educational gains (content knowledge), but also to the improvements in the behavioral, social, scientific, cognitive, and intellectual attitudes and aptitudes of the students. The results showed clear differences in the expected learning outcomes between the traditional and robotics-enabled experiential methods of teaching. The reasons might be the higher level of investment of cognitive resources and artifacts in robotics-enabled lessons. However, the differences in the expected learning outcomes between the math and the science lessons were not significant. The results (the set of learning outcomes, assessment metrics and methods) were then validated through actual classroom applications, and the effectiveness of the assessment methods and metrics was evaluated based on a user study with the participating teachers. The user study results supported the effectiveness of the proposed methods and metrics of learning outcomes of the robotics-enabled lessons. The main contribution of this article is the determination of the assessment and benchmarking criteria, metrics, and methods for assessing learning outcomes of robotics-enabled STEM lessons, which is novel, practical, and useful to advance robotics-enabled kinesthetic K-12 STEM education in particular, and the college-level STEM education in general. The results uphold the significance of active learning and experiential learning.
The proposed evaluation scheme of learning outcomes can be used to justify the benefits and advantages of robotics-enabled STEM education, benchmark the outcomes, help improve preparations of instructors and teaching institutions and develop more effective robotic systems and demonstrations under design-based research, motivate education decision-makers to confer on robotics-enabled STEM education and curricula development, and thus can promote robotics-enabled K-16 STEM education practices. All these can help meet the learning vision of enhancing the learning outcomes of STEM lessons taught through the application of robotics as a kinesthetic experiential pedagogical tool.
In the future, the survey will be conducted with a larger number of STEM teachers and learners to enhance the generality of the results. The expected learning outcomes for other grades of students will be investigated. The results will be verified and validated using other robotic platforms for teaching more STEM lessons to K-12 and college students.

Acknowledgments: The research was partly conducted in collaboration with teachers of different middle schools under the New York City Department of Education. The author thanks the teachers and students who attended the studies and responded to the surveys and interviews.

Conflicts of Interest:
The author declares no conflict of interest.

Ethics Statements:
The study was guided by ethics. The study was conducted following local ethical standards and principles for human subjects.

Appendix A
Funding: This particular research presented herein received no external/additional funding.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Available from the author on request.


Appendix B

Table A1. Proposed metric for assessing each criterion of learning outcomes of students for robotics-enabled math lessons.

Test results
Test scores on selected math topics can be used to assess this criterion. Quizzes/tests can be arranged by concerned teachers. In addition, the Dimension of Success (DoS) observation tool can be used to assess math knowledge and practices [31].

Percentage (%) of test scores obtained

Computational thinking ability [2]
Computational thinking can be assessed based on custom-developed specific problem-solving scenarios developed and implemented by the teachers. For example, a specific scenario can be developed where students need to solve a particular problem that reflects students' computational thinking abilities. The teachers can observe the students and assess the computational thinking ability of each student separately or of the team as a whole. The teachers can use a 7-point Likert scale to rate the computational thinking ability subjectively based on observations (see note 1). In addition, the computational thinking can be assessed taking inspiration from the methods proposed by Kong [32].
Subjective rating score (see note 2)

Intrinsic and extrinsic motivation
Intrinsic and extrinsic motivation expressed through students' interest in math and their awareness levels for their math-related careers can be assessed directly using a subjective rating scale (e.g., a 7-point Likert scale) based on observations and interviews with the participating students administered by concerned teachers [4]. In addition, the Intrinsic Motivation Instrument (i.e., Self-Determination [33]) may be used to assess the motivation levels of the students for their career path in math. STEM Career Awareness tool may be used to assess their math-related career awareness levels [34]. The PEAR Institute's Common Instrument Suite Student (CIS-S) survey may be used to assess students' math-related attitudes in terms of math engagement, identity, career interest, and career knowledge and activity participation [35]. The DoS can be used to assess math activity engagement, math practices (inquiry and reflection), and youth development in math [31].
Subjective rating score

Trust in robotics
Trust of students in robotics as a pedagogical tool expressed through students' interest to rely on or to believe in the math-related solutions provided by the robotic system can be assessed directly by concerned teachers using a subjective rating scale (e.g., a 7-point Likert scale) based on observations and interviews with the participating students administered by the concerned teachers [5]. See note 3 for more.

Engagement in class activities
The work sampling method may be used to assess students' engagement in their robotics-enabled lessons [36]. In this method, the teachers may observe each student separately or the team as a whole after a specified time interval (e.g., after every 5 min) during the class, and mark whether they are engaged in their lessons or not. At the end of the observations, the percentage of total class time the students are engaged (or not engaged) can be determined. This is a probabilistic but quantitative assessment method. The following formula may be used to assess student engagement ($E$) using work sampling, where $O_t$ is the total number of observations in a class and $O_e$ is the total number of observations in that class when the student(s) was/were found engaged: $E = \frac{O_e}{O_t} \times 100\%$.
Percentage (%) of total class time students are engaged in the lesson

Class attendance and punctuality
Attendance record can be used to assess each student's attendance and punctuality (e.g., timely attendance or late attendance) in the class. Percentage (%) of attendance in a specific time period can be calculated. In addition, percentage of timely or late attendance in a specific time period may also be calculated. The objective is to check if student attendance in regular classes increases after participating in the robotics-enabled lessons or being inspired by the robotics-enabled lessons.

Interpersonal relationship
The teachers can observe the students during their robotics-enabled lessons, identify a few cues related to their interpersonal relationships (e.g., how a student addresses his/her team members, reacts to his/her team members' opinions, etc.), and assess each student or the team as a whole using a 7-point Likert scale for their interpersonal relationships. Alternatively, the assessment may be recorded as satisfactory or unsatisfactory. In addition, the CIS-S survey can be used to assess the 21st century skills or the socio-emotional learning (SEL) of the students, e.g., relationships with peer students and teachers [35].
Subjective rating score Engineering and ICT skills Tests/quizzes administered by the teachers on students' engineering and ICT skills can be used to assess this criterion.

Percentage (%) of test scores obtained

Life-long learning aspiration
The teachers can observe the students during their robotics-enabled lessons, interview each student to learn about their future plans and goals for their math learning and applications, and assess each student or the team as a whole using a 7-point Likert scale for their life-long learning aspiration.
Subjective rating score

Hands-on and practical ability
Teachers' observations of students' hands-on practical work during a robotics-enabled lesson can be used to assess this criterion. The teachers can observe the class activities performed by the students and rate the hands-on and practical ability of each student or of the team using a 7-point Likert scale.
Subjective rating score

Lab skills and experiment ability
Teachers' observations of students' lab skills and experiment ability during an experiment conducted by the students as part of a robotics-enabled lesson can be used to assess this criterion. The teachers can observe the class activities and rate the lab skills and experiment ability of each student or of the team using a 7-point Likert scale.
Subjective rating score

Problem solving ability
Teachers' observations of students' problem solving ability during a robotics-enabled lesson can be used to assess this criterion. Assume there is a problem related to a real-world situation in a robotics-enabled lesson that the students need to solve using math. The students should identify the problem, formulate the problem, and determine the strategies to solve the problem using math knowledge and skills. The teachers can observe the ability of each student or of the team in these efforts and rate their abilities using a 7-point Likert scale. The CIS-S survey can also be used to assess the 21st century skills or the socio-emotional learning (SEL) of students, e.g., problem solving/perseverance [35].
Subjective rating score

Formulation of research strategy
Teachers' observations of students' formulation of research strategy during a robotics-enabled lesson can be used to assess this criterion. Assume there is a problem in a robotics-enabled lesson that the students need to solve using math. The students should identify the problem, formulate the problem, identify the objective, determine hypotheses and research questions, determine the experimental methods and procedures, and analyze the results with future directions. The teachers can observe the ability of each student or of the team in these efforts and rate their abilities using a 7-point Likert scale.

Teamwork ability
The youth teamwork skills survey can be used to assess the teamwork ability [37]. In addition, the teachers can observe the students during their robotics-enabled lessons, identify a few cues related to their teamwork ability (e.g., how the students split the activities of the lesson and assign them to different team members), and assess each student or the team as a whole using a 7-point Likert scale for their teamwork ability.
Subjective rating score

Cognitive workload in learning
NASA TLX can be administered by the teachers to the participating students at the end of each robotics-enabled lesson [28]. Note that a lower cognitive workload is better [29].
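A total workload percentage from NASA TLX responses could be computed as in the sketch below. It assumes the unweighted ("raw") TLX variant, in which the six subscale ratings, each on a 0 to 100 scale, are averaged; the text does not state whether the raw or the weighted variant is intended. The function name and the sample ratings are hypothetical.

```python
# The six NASA TLX subscales.
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the unweighted mean of the six subscale ratings (0-100).

    Assumption: raw TLX is used; the weighted variant would also need
    pairwise-comparison weights, which are not modeled here.
    """
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must lie between 0 and 100")
    return sum(ratings[name] for name in TLX_SUBSCALES) / len(TLX_SUBSCALES)


# Hypothetical responses of one student after a lesson.
student = {
    "mental_demand": 60,
    "physical_demand": 20,
    "temporal_demand": 40,
    "performance": 30,
    "effort": 50,
    "frustration": 40,
}
print(f"Total cognitive workload = {raw_tlx(student):.1f}%")
```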
Percentage (%) total cognitive workload

Adapting to new situations and changes
The teachers can observe the students during their robotics-enabled lessons, identify a few cues relevant to adapting to new situations and changes (e.g., whether a student can adjust if he/she is transferred to a new team or if a sudden change occurs in the lesson activities), and assess each student or the team as a whole using a 7-point Likert scale for their ability to adapt to new situations and changes.
Subjective rating score

Respect for diversity and multiculturality
The teachers can observe the students during their robotics-enabled lessons, identify a few cues relevant to respect for diversity and multiculturality (e.g., whether a student can adjust to another team member who has a different nationality, color, ethnicity, food habits, etc.), and assess each student or the team as a whole using a 7-point Likert scale for their respect for diversity and multiculturality.
Subjective rating score

Professional ethics
The teachers can observe the students for their robotics-enabled lessons, identify a few ethical cues relevant to the class events (e.g., whether a student captures and records true data and does not manipulate the data) and assess each student or the team as a whole using a 7-point Likert scale for their professional ethics.
Subjective rating score

Troubleshooting and contingency
The teachers can observe the students for their robotics-enabled lessons, identify a few cues related to troubleshooting and contingency (e.g., how a student or a team troubleshoots in case the robotics-based experimental system does not work temporarily), and assess each student or the team as a whole using a 7-point Likert scale for their ability for troubleshooting and contingency.
Subjective rating score

Interdisciplinary/multidisciplinary abilities
The teachers can observe the students for their robotics-enabled lessons and assess each student or the team as a whole using a 7-point Likert scale for their ability to learn and use interdisciplinary and multidisciplinary knowledge and skills (e.g., math content knowledge combined with engineering and computer programming skills to solve a math problem).
Subjective rating score

Reflexive analysis
The teachers can observe the students during their robotics-enabled lessons, interview them, and assess each student or the team as a whole using a 7-point Likert scale for their ability to summarize what they learn during the lesson, identify their limitations, and develop action plans for improvements in the next lessons.
Subjective rating score

Critical thinking ability
The teachers can observe the students during their robotics-enabled lessons, identify a few cues related to their critical thinking ability (e.g., how the students analyze and compare different alternative possibilities of experimental procedures based on prior findings), and assess each student or the team as a whole using a 7-point Likert scale for their critical thinking ability. In addition, the CIS-S survey can be used to assess the 21st century skills or the socio-emotional learning (SEL) of the students, e.g., critical thinking [35].
Subjective rating score

Decision making ability
The teachers can observe the students during their robotics-enabled lessons, identify a few cues related to their decision-making ability (e.g., how the students make a decision based on the experimental findings, and how they decide the next experiments based on prior findings), and assess each student or the team as a whole using a 7-point Likert scale for their decision-making ability. In addition, the DORA tool can be used to assess reasoning and decision-making abilities of the students [38].
Subjective rating score

Creativity and innovation
The teachers can observe the students for their robotics-enabled lessons, identify a few cues related to their creativity and innovation (e.g., how the students propose a new configuration of the robotic device to solve a particular math problem), and assess each student or the team as a whole using a 7-point Likert scale for their creativity and innovation. In addition, the creativity and innovation can be assessed by the approach proposed by Barbot, Besançon, and Lubart [39].

Entrepreneurial ability
The students build a robotic device and verify its suitability to learn math and solve real-world problems using math. Such building practices may inculcate entrepreneurial aspiration in the students, which may direct them towards starting new business ventures to market their ideas in the future. The teachers can observe the students during their robotics-enabled lessons, interview each student to learn about their business plans, if any, and assess each student or the team as a whole using a 7-point Likert scale for their entrepreneurial aspiration or ability. In addition, the entrepreneurial ability of the students can be assessed taking inspiration from the methods proposed by Bejinaru [40], and Coduras, Alvarez and Ruiz [41].
Subjective rating score

Communication skills
The teachers can observe the students for their robotics-enabled lessons, identify a few cues related to their communication skills (e.g., how the students communicate the findings of the experiments during their robotics-enabled lessons to their team leader, teachers and each team member), and assess each student or the team as a whole using a 7-point Likert scale for their communication skills. In addition, the CIS-S survey can be used to assess the 21st century skills or the socio-emotional learning (SEL) of the students, e.g., communication skills [35].
Subjective rating score

Leadership ability
Based on specific tasks and scenarios during students' engagement with the robotics-enabled lesson, the surveys proposed by Mazzetto [42] and Chapman and Giri [43] can be used to assess leadership skills of the students. Alternatively, the teachers can observe the students for their robotics-enabled lessons, identify a few cues related to their leadership ability (e.g., how the students decide their leader for a lesson, how the leader directs the team members towards the goal of the lesson, and how the student members follow the directions of the leader), and assess each student or the team as a whole using a 7-point Likert scale for their leadership ability.
Subjective rating score

Organizational and planning ability
The teachers can observe the students for their robotics-enabled lessons, identify a few issues related to organization and planning of the robotics-enabled lesson (e.g., how the students split the responsibility of each team member and determine and ensure the required resources for each member in each step/phase of the entire lesson), and assess each student or the team as a whole using a 7-point Likert scale for their organizational and planning ability.
Subjective rating score

Social responsibility
The teachers can observe the students during their robotics-enabled lessons, identify a few social cues relevant to the class events (e.g., whether a student greets another student on his/her birthday if it falls on the day of a robotics-enabled lesson, or how a student reacts on learning that another student of the team is sick), and assess each student or the team as a whole using a 7-point Likert scale for their social responsibility.
Subjective rating score

Note 1: In the 7-point Likert scale, −3 is the least or worst, 0 is the neutral, and +3 is the highest or the best response.
Note 2: The subjective rating score is expressed as a score value between −3 and +3 with a possible difference of |1| between two adjacent scores.
Note 3: For some learning outcomes, in addition to the proposed assessment metrics, the assessment may be qualitatively performed as satisfactory or unsatisfactory. Furthermore, teachers can qualitatively assess each outcome and prepare a short qualitative report on each outcome criterion. These can be cross-checked/triangulated with the proposed quantitative metrics under mixed method analyses [2].
Note 4: The mean score of test results for School#1 was 93. Its meaning is as follows: assume 10 students participated in the robotics-enabled math lesson. The teacher determined the mean of the test scores obtained by all 10 students in the math test after the math lesson, and found 93 (rounded) as the mean score. Other scores were calculated in a similar way.
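The scoring conventions in Notes 1, 2 and 4 can be illustrated with a short sketch. The function names and all sample values below are hypothetical; in particular, the ten test scores are invented so that their mean equals 93 and are not the scores recorded in the study.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = -3, 3   # Note 1: -3 is worst, 0 neutral, +3 best


def mean_likert(ratings):
    """Mean subjective rating over students; each rating must be an
    integer from -3 to +3 (Note 2: adjacent scores differ by 1)."""
    for rating in ratings:
        if not isinstance(rating, int) or not LIKERT_MIN <= rating <= LIKERT_MAX:
            raise ValueError(f"invalid 7-point Likert rating: {rating!r}")
    return mean(ratings)


def mean_test_score(scores):
    """Class mean of test scores, rounded to a whole number (Note 4)."""
    return round(mean(scores))


# Hypothetical teacher ratings of five students on one criterion.
print(mean_likert([2, 3, 1, 2, 0]))

# Hypothetical test scores (%) of ten students after a math lesson.
scores = [95, 90, 92, 96, 88, 94, 93, 97, 91, 94]
print(mean_test_score(scores))
```

Keeping the per-student ratings rather than only the class mean also makes it possible to report the spread of scores when outcomes are compared between schools.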