Exploring the Features of Educational Robotics and STEM Research in Primary Education: A Systematic Literature Review

STEM education programs with educational robotics are frequently used in formal and informal education, with participants ranging from kindergarten children to university students. The widespread implementation of these programs in schools and the growing interest of researchers in the field have led several authors to review and summarize the characteristics of STEM research. However, the literature on the features of STEM research in primary education (kindergarten and primary school) is limited. Therefore, this article presents a systematic literature review that tries to enrich the STEM agenda by answering the following questions: (a) which study designs are commonly used in STEM interventions, (b) what the characteristics of the sample are (number/age of the students), (c) which equipment and user interfaces (tangible/graphical) are used, and (d) what the characteristics of the studies are (duration, intervention objectives, activities) and how the studies' data were recorded. For this review, 36 out of 337 articles, which emerged from eight databases, three search keywords and six exclusion criteria, were analyzed. The examination of the reviewed articles showed, inter alia, that a non-experimental design is usually used, that written evaluations are used in half of the cases, and that the sample is split almost equally between girls and boys. Finally, long-term research is scarce; therefore, it is not safe to generalize the findings of these studies.


Introduction
The foundations of educational robotics (ER) and science, technology, engineering and mathematics (STEM) lie in the learning theory of constructivism, by Piaget. According to Piaget, knowledge is an experience constructed by interaction with the environment [1]. Papert extended the theory of constructivism, stating that when real-world content is used, learning is more effective [2]. Moreover, some researchers claim that the use of robots in STEM teaching and learning might help students to understand related topics in more depth and engage them in complex problem solving [3].
Over the past several years, ER and STEM activities have entered many schools around the globe. This is because some studies have shown that ER activities appear to increase students' interest and motivation, fostering the learning process [4]. Moreover, ER activities might support teachers in their effort to make lessons easier and more enjoyable [5,6]. The variety of ER activities seems to attract students and, in some cases, might have cognitive, social and metacognitive benefits at all levels of education [7][8][9][10]. Similarly, other studies argue that students might develop several skills such as computational thinking (CT), problem solving, collaboration and self-efficacy through ER [11]. These skills are essential, as they will help children cope with the challenges of their adult life [12].
By contrast, there are studies, such as that of Konijn and Hoorn [13], in which students who participated in ER activities did not show statistically significant differences compared to students who attended the standard curriculum. In other words, ER did not have a positive effect compared to classical teaching.
Based on the above, it becomes clear that in order to be able to draw safer conclusions about the value of using robots in STEM teaching, it is necessary to perform an in-depth analysis that will present the characteristics of the studies that have been undertaken in the specific scientific domain. Consequently, this article aims to summarize the features of STEM research in primary education (kindergarten and primary school). In this way, it provides researchers and educators with useful information for the implementation of STEM programs with ER.

Background
Many articles have been written about STEM education and its effect on the educational process. Hence, there are several reviews that have examined and analyzed the impact and benefits of STEM education. For instance, some authors focused on a specific topic (e.g., special education), while others dealt with specific features (e.g., CT assessments).
Benitti [14] used six databases from which he selected 10 articles, and he aimed at primary, middle and high schools. Benitti's review considered that an article would be excluded if it presented only qualitative learning assessment. His research questions were: "(a) What topics (subjects) are taught through robotics in schools? (b) Is robotics an effective tool for teaching?". The results showed that most studies focused on mathematics and physics topics. In addition, ER usually enhances learning, though there are several factors that can affect the outcome, such as the pedagogical approaches, the activities and the working group size.
Xia and Zhong [15] used 22 articles from one database and three rounds of snowballing (a snowballing approach uses the reference list of an article to find other articles), and they aimed at K-12 education. Their research questions were: "(a) How have robotics been incorporated into K-12? (b) What implications for teaching are indicated by these empirical studies?". The results showed that most of the studies were conducted in elementary schools, lasted 8 weeks, involved up to 40 students and used Lego. Finally, the design of most studies was non-experimental, while observations and questionnaires were used as measurement instruments.
Anwar et al. [16] used five databases from which they selected 147 articles published between 2000 and 2018. The authors' aim was to determine the specific benefits of K-12 STEM education for students and teachers. Thus, they categorized the studies as "(a) general benefits of educational robots, (b) learning and transfer skills, (c) creativity and motivation, (d) diversity and broadening participation, (e) teachers' professional development". Their findings show that ER can be used as a learning tool even with students who do not show much interest in science and technology. In addition, ER enables a multidisciplinary approach, which helps students develop connections between STEM concepts. Students' creativity and motivation can be enhanced with activities related to everyday life. Moreover, groups such as minorities can benefit from ER by developing knowledge and skills. Lastly, the findings show that teachers rely on Massive Open Online Courses (MOOCs) for professional development.
Cutumisu et al. [17] used seven databases from which they selected 39 articles published between 2014 and 2018, and they aimed at K-12 education. The review focused on CT, and especially on the feature classification of CT assessments. The results showed that most assessments target algorithmic skills, problem decomposition and logical thinking. In addition, assessments were aligned with STEM subjects, while most studies adopted a quasi-experimental design. Finally, the findings showed that most studies used selected-response items such as multiple-choice questions.
Gao et al. [18] aimed to review the assessment of student learning in STEM education. More specifically, they used 49 articles published between 2000 and 2019 and performed a twofold analysis: (a) they categorized the assessments as mono/inter/trans-disciplinary; (b) they categorized the learning objectives as knowledge, skill, practice or affective domain. Their research showed that although several programs aimed at interdisciplinary education, most assessments were monodisciplinary and targeted knowledge.
Tlili et al. [19], based on activity theory [20], aimed at robot-assisted special education. The authors used eight databases from which 30 articles were selected for their analysis. In particular, they categorized aspects of the studies such as disability types, use of robots, learning domains, activity types and types of performance measures. Their analysis showed the necessity for a stronger link between design/implementation and the needs of students with disabilities. On this basis, recommendations are provided to minimize omissions in future designs.
As the above shows, the previous reviews dealt with ER and STEM research without focusing on primary education. Specifically, they include studies with participants ranging from kindergarten children up to university students. In addition, some reviews used articles from a specific time period, or examined a specific topic such as CT assessments. Therefore, the purpose of our article is to deliver a systematic literature review of STEM research for young students in primary education. Unlike those reviews, we do not restrict ourselves to articles with specific topics (such as special education) or features (such as including quantitative evaluations).
Subsequently, this review tries to deepen the existing STEM literature providing useful information on current trends for educators and researchers. This way, a more detailed view will arise for the features of STEM research for students under 12 years old. To achieve this, multiple aspects of studies that were used in our review have been recorded. In more detail, we have recorded: the sample characteristics (age of students, number of girls/boys, prior experience), the school, the country, the subject of study, the intervention objectives, the study design, the duration of the intervention, the activities/tasks, the size of groups in collaborative learning, the user interface, the data type, the data source, the equipment type and the results of the intervention.
The rest of the paper is organized as follows: first we list the methodology and the process of searches in databases. Following this, with the appropriate criteria, we are led to the articles that we use. Finally, our results are grouped and the findings of STEM research in primary education are discussed.

Methods
According to Okoli [21], there are four major stages in conducting any kind of systematic literature review: planning, selection, extraction and execution.
The articles were examined on the basis of their title, abstract and content. The articles had to be written in English and published in a journal. In total, 255 out of 337 articles were excluded. From the 82 included journal articles, 46 were duplicates. Therefore, 36 unique/selected articles dealing with STEM research in primary education emerged, as shown in Table 2.
Table 2.
No.     Found   Excluded per criterion (1-6)      Excluded   Included   Duplicates   Unique
1       69      8    12   7    5    10   0        42         27         12           15
2       65      7    12   21   5    3    4        52         13         7            6
3       203     26   35   49   23   10   18       161        42         27           15
Total   337     41   59   77   33   23   22       255        82         46           36
In detail, the search results for each database are shown in Table 3.
In order to organize and analyze the findings, we categorized some aspects of the studies. In detail, the subject of the study refers to the content to be taught, such as: physics, technology, mathematics, bioengineering, history, computer science or robotics (combination of computer science and engineering). The intervention objectives refer to the offered knowledge (such as programming), skills (such as CT, collaboration) and attitudes (such as motivation) that students can acquire. The activities refer to the nature of the tasks students are called to complete, which can be: mathematics, programming and/or engineering. The robotic system is programmed via a tangible user interface (TUI) or a graphical user interface (GUI). The type of data is qualitative and/or quantitative. Quantitative data emerged from written evaluations such as multiple-choice tests, whereas qualitative data emerged from observation and/or interviews. The data source refers to how the data were recorded, for example through videos, observations, interviews and/or questionnaires. The study design, according to Campbell and Stanley [22], can be represented with X for treatment/intervention, R for randomized assignment and O for observation/measurement.
However, in order for qualitative and quantitative research to be presented in the same table, we propose that the representation for observation/measurement be made as follows:
• O = measurement through questionnaires (quantitative)
• Oq = observation through video, interview or observation by researchers (qualitative)
• Om = mixed measurement/observation through quantitative and qualitative data
In addition, according to the taxonomy of Trochim and Donnelly [23] and Benitti [14], we classified the study design into three categories:
• (true) experimental: a design with random assignment to groups
• quasi-experimental: a design with no random assignment to groups
• non-experimental: a design without groups
Finally, Table 4 shows three examples for each study design category and its representation.

Findings
Execution (stage 4): The results of the systematic literature review were sorted into tables, so that we could analyze and discuss them more efficiently. In particular, Table 5 lists the 36 studies, their category and design. Twenty-eight studies (78%) were published from 2018 onwards, while only one study was published before 2016. Twenty-four studies (67%) use a non-experimental design. Eleven studies (30%) use an experimental design, while a quasi-experimental design is used in only one case (3%), as shown in Figure 1. In the non-experimental category, the researchers in 16 out of 24 studies (67%) preferred a design that contained a test and/or observation before and after the intervention. In the experimental category, the researchers in 8 out of 11 studies (73%) preferred a design of this kind. The quasi-experimental category contains only one study, so it is not safe to draw conclusions from it.
Table 6 lists the sample characteristics for each study. In particular, we recorded the continent, the country, the school and the students' age, gender and prior experience. The studies were conducted in primary schools (64%), kindergartens (14%) or a combination of both (22%). A total of 49% of the studies were conducted in North America, 37% in Europe and 14% in Asia. The dispersion of the studies on the world map is shown in Figure 2. The age of the students who took part in the studies is shown in Figure 3. We can observe that 66% of the participants were older than 7, while 20% were kindergarten children. Of the total number of participants in all studies, 45% were girls and 55% were boys. Additionally, in seven studies (19%) students had previous experience with ER. The distribution of the sample size is shown in Figure 4, from which we can see that most of the studies had about 60 students.
Table 7 lists the materials and user interface used in each study. It is noteworthy that the equipment type was wheeled in 18 studies (50%) and modular in 12 studies (33.3%), while a software application was used in 2 studies (5.6%). In the remaining four studies, different equipment types were used. In addition, Lego robotic systems were used in 10 studies (27.8%), Bee-Bot in 6 studies (16.7%), and Cubelets, KIBO and Dash in 2 studies each (5.6%). In the remaining 14 studies, different systems were used.
Furthermore, GUIs were used in 50% of the studies, TUIs in 47% and a combination of the two in 3%, as shown in Figure 5.
Table 8 lists the study characteristics: subject, intervention objectives, study duration, activities and group size. The subject in 15 studies (42%) was robotics, in 11 studies (31%) computer science and in 4 studies (11%) mathematics.
The intervention objective in 11 studies (31%) was knowledge, in 6 studies (17%) attitudes and in 5 studies (14%) skills. In six studies (17%), there was a combination of skills and attitudes, in five studies (14%) a combination of knowledge and skills and in three studies (8%) a combination of knowledge and attitudes.
The distribution of the study duration is shown in Figure 6. The distribution peaks below 500 min, while only seven studies (19%) exceed 1000 min.
For the study activities, in 20 studies (56%) the activities were a combination of engineering and programming, in 12 studies (33%) they were programming, in 2 studies (6%) mathematics and in the remaining 2 studies (6%) programming and mathematics.
Moreover, Figure 7 shows the number of participants within a group. It can be observed that the children worked in groups of 3-4 in half of the studies.
Table 9 presents the intervention objectives of each study. Counting every study in which an objective appears, alone or in combination, the objective was knowledge in 19 cases (53%), skills in 16 cases (44%) and attitudes in 15 cases (42%).
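The per-objective totals follow from the breakdown of single and combined objectives reported earlier in this section. The sketch below is a minimal check of that arithmetic, not the authors' procedure; the names `combo_counts`, `objective_totals` and `percent` are hypothetical, and only the counts themselves come from the text.

```python
from collections import Counter

TOTAL_STUDIES = 36

# Number of studies per objective combination, as reported in the text.
combo_counts = {
    ("knowledge",): 11,
    ("attitudes",): 6,
    ("skills",): 5,
    ("skills", "attitudes"): 6,
    ("knowledge", "skills"): 5,
    ("knowledge", "attitudes"): 3,
}


def objective_totals(counts):
    """Count each study once for every objective it pursues."""
    totals = Counter()
    for objectives, n in counts.items():
        for objective in objectives:
            totals[objective] += n
    return dict(totals)


def percent(n, total=TOTAL_STUDIES):
    """Share of all reviewed studies, rounded to a whole percentage."""
    return round(100 * n / total)


totals = objective_totals(combo_counts)
for objective in ("knowledge", "skills", "attitudes"):
    print(objective, totals[objective], f"{percent(totals[objective])}%")
# knowledge 19 53%
# skills 16 44%
# attitudes 15 42%
```

Because a study with two objectives is counted under both, the three percentages add up to more than 100%, whereas the six combination counts add up to exactly 36.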

Discussion
In this section, the results of the systematic literature review are examined, in an effort to find answers to our RQs.
Research Question 1 (RQ1). Which study design is commonly used in the STEM interventions? The majority of the researchers (67%) chose to use a non-experimental design. The review of Xia and Zhong [15] reached similar conclusions, as in their findings 59% of the studies on ER in K-12 education were non-experimental. This may be related to the characteristics of the sample, since various difficulties and problems arise in a study with young students. For example, young students are easily distracted; thus, the use of a simpler design can facilitate research and lead to the successful completion of a study. Conversely, the use of a more sophisticated and complex study design may lead to misleading results and study failure.
Based on our analysis, we have found that 23 out of the 36 studies included measurements and/or observations before and after the intervention. In these cases, the state of the students before and after the intervention can be compared directly, which adds reliability to the findings. However, 13 out of the 36 study designs included measurements and/or observations only after the intervention. Therefore, in these studies, the effect of the intervention cannot be easily justified and supported, since there are no corresponding data from before the intervention.
Based on the above, it seems that the researchers avoided experimental design by not dividing the participants into groups. This prevented them from investigating the effects of the intervention on each group. To overcome this gap, they used pre/post-test/observation in order to give credibility to the results.
Research Question 2 (RQ2). What are the characteristics of the sample? None of the examined studies took place in South America, Africa or Oceania. In addition, Asia plays a small part, with only three countries. In contrast, most studies were implemented in North America. For several years, the USA has been investing in new technologies and education [57], and perhaps that is a reason for the flourishing of STEM programs in that country. However, this finding may be related to the keywords we used in our searches. In other words, new searches with different keywords in the same databases might have surfaced articles from other countries.
From the findings, it can be concluded that 2/3 of the participants were 7 years old or older. This is expected, as according to the intellectual developmental stages of Piaget, children from the age of 7 enter the concrete operational stage and can think in a more organized and logical way [58]. Consequently, STEM activities seem to be more suitable for these students. In any case, it looks like more research is needed to clarify this assumption.
Furthermore, the sample size did not exceed 50 students in the majority of the studies. The number of boys and girls shows a similar distribution with small variations. Although in some cases "boys reported significantly higher motivation than girls" [42], according to Sullivan and Bers [51] "the robotics curriculum impacted girls' interest in engineering enough that they were just as interested as boys by the end of the intervention". In addition, according to Zviel-Girshin et al. [34] "the majority of both boys and girls consider robotics education as fun and want to continue their robotic education in the next school year". Therefore, it is important that both genders have the same opportunities in STEM education, without exclusion due to prejudices or stereotypes. However, the distribution of the sample size raises concerns about the reliability and depth of the studies, since a small sample size cannot lead to generalized conclusions. The review of Xia and Zhong [15] reached similar conclusions, as in their findings up to 40 students participated in the interventions.
Finally, in about one out of five studies the students had previous experience with ER, which indicates the penetration of such technologies in education.
Research Question 3 (RQ3). Which equipment and user interface are used? Our findings show that GUIs and TUIs were used almost equally. Nevertheless, recent studies show that students express a preference for and a positive attitude towards TUIs, as GUIs might create boundaries in their cooperation [59,60]. Based on our results, both interfaces were used in only one article. In this unique case, using the creative hybrid environment for robotic programming (CHERP), children created tangible physical programs using interlocking wooden blocks and were simultaneously able to create programs onscreen using the same icons that represent commands to control their robots [56]. In the literature, studies combining interfaces are quite limited, yet it may be worthwhile to include such approaches in future designs, to gain a deeper knowledge of the effect of different interfaces on the STEM field.
In addition, from the results, we can observe a preference for the use of wheeled robotic systems. This is probably because cars or moving mechanisms might easily attract students' interest and motivate them to engage with STEM topics.
Research Question 4 (RQ4). What are the characteristics of the studies and how were the studies' data recorded? According to the findings, the main subjects of the studies are robotics and computer science. This finding contradicts Benitti [14], as his results "show that most of the studies (80%) explore topics related to the fields of physics and mathematics". In addition, the use of robotic systems for teaching subjects that are not traditionally related to STEM topics (science, technology, engineering, mathematics), such as history (e.g., [32]), is noteworthy. This finding agrees with Anwar et al. [16], who note that ER can be used as a learning tool for other sciences and domains such as language and argumentation thinking.
Moreover, an important element of any educational process is the cooperation and learning of students in groups. In the majority of the studies, the students worked in groups, although in a significant proportion of studies (24%), the children participated in individual activities. As a result, they may have missed the opportunity to gain the benefits of collaborative learning [61].
In more than half of the cases, the researchers preferred to record quantitative instead of qualitative data. Questionnaires were used as a measurement tool, although in some cases they might not have been validated. In this way, researchers collected sufficient data to perform statistical analyses. These analyses showed that in most cases, students who attended the ER activities had significantly improved results compared to students who participated in traditional activities. Therefore, it seems that the use of robots in short-term studies can offer significant benefits, at least in the first stage of the educational process. This conclusion might not be valid in long-term research. We argue that the sample size and duration of the examined studies are too limited to safely generalize the results, so long-term research is recommended in order to collect results with more depth and quality.
Finally, we observe that the three intervention objectives (knowledge, attitudes and skills) are investigated by researchers almost equally. A meta-analysis would be an interesting and challenging future research proposal, to find out which intervention objective is the most beneficial with the use of ER.

Conclusions
STEM research has been part of education research for years. Each researcher aims to explore specific topics or features. Thus, from the plethora of studies, several reviews have emerged. However, there is limited knowledge about STEM research with students under the age of 12, so the purpose of this article was to explore the features of STEM research in primary education. Therefore, a systematic literature review was conducted, and our findings showed that the study design usually contained pre/post-intervention evaluation. Researchers also seem to prefer non-experimental designs. Most studies were conducted in primary schools, and the numbers of girls and boys were similar. However, the overall sample was quite limited in size. Likewise, the duration of the studies was limited. GUIs and TUIs were used almost equally, while wheeled robotic systems were preferred. The researchers also preferred to form groups of 3-4 students; however, in several cases individual activities were used. Finally, in half of the cases, the data were recorded with questionnaires.
The findings of our work can be used in future research designs. More specifically, all of the above should be taken into account by a researcher to prepare their own intervention. Additionally, educators can use ER as an educational tool based on this article, enriching their teaching approaches.
Finally, although studies on STEM research in primary education have increased in recent years, there is still the need to increase the sample size and study duration, so that the findings can more easily be generalized.