Impact and Classification of Augmented Reality in Science Experiments in Teaching—A Review

Abstract: Augmented reality (AR) improves science education by facilitating teaching and experiments in schools and universities. Studies show positive effects, such as increased motivation and improved concept connections, but there is a lack of consistency in the implementation and investigation of AR in science experiments. This review examines AR usage, criteria, design parameters for the development of AR applications, and validation methods, taking into account the PRISMA guidelines. A Web of Science database search using “Publish or Perish” software (version 23.4.0) identified 247 potentially relevant articles from 2000 to March 2024 in international, peer-reviewed journals. After removing duplicates and reports with inaccessible abstracts, and after applying inclusion and exclusion criteria, 40 studies were selected for in-depth analysis. Physics had the most AR applications, primarily for visualizing invisible properties. Most studies used either quantitative or qualitative methods; only a few used both or conducted no empirical research. Research questions varied, but common drawbacks included small sample sizes and low use of AR design parameters such as interactivity, adaptivity, realistic representation, and game elements. This review identifies opportunities for improvement in the implementation and investigation of AR in science education experiments and emphasizes consistent and rigorous approaches to fully exploit the benefits of AR in science education.


Introduction
Experiments play a central role in science education as they provide students with a unique opportunity to engage with science (chemistry, biology and physics) through multisensory experiences. These experiments not only promote hands-on and inquiry skills, but also familiarize students with various instruments, equipment, and substances, thus promoting deeper conceptual understanding [1]. However, historical debates have questioned the value of experiments in science education. In the 1970s and 1980s, researchers raised doubts about the efficacy of experiments, due to a lack of concrete evidence of their benefits [2][3][4]. This skepticism extended to concerns about the costs and time investment associated with traditional experiments, which led to investigations into more cost-effective alternatives [2,5]. In addition, issues such as students' lack of understanding when conducting experiments and safety concerns further fueled the debate. Despite these reservations, Carnduff and Reid [6] highlighted the unique skills fostered through experimentation, particularly practical, observational, analytical, and problem-solving skills that are difficult to cultivate through traditional teaching methods.
The complex nature of experiments presents a challenge for science education, as students must bridge the macro-level of immediate sensory perception and the interpretation of sub-microscopic phenomena, while simultaneously capturing and representing these experiences [7]. Navigating these levels requires intricate cognitive processes that must
Educ. Sci. 2024, 14, 760
mental purpose and methodology [28]. Safety concerns are another deterrent, especially for novice students, who might lack familiarity with laboratory settings and protocols for instrument operation and chemical handling. Conversely, Carnduff and Reid [6] argue that experiments foster practical, analytical, and problem-solving skills that are hard to cultivate through other methods.
To address these concerns, the integration of AR offers potential benefits. AR enables students to interact with the learning material in novel ways, which can enhance the traditional learning experience. Studies have shown that AR has positive effects on motivation, self-efficacy, attitudes, and learning performance [29][30][31]. AR supports the understanding of abstract concepts [21][22][23][24][25], as demonstrated by Radu et al. [14], where AR users exhibited better understanding of electromagnetism and faster knowledge transfer.

Defining Augmented Reality
Milgram defines AR as an intermediate step in the range from the real to the virtual environment: the reality-virtuality (RV) continuum [41]. It can be described as a fluid transition from the real environment to AR (real environment enhanced with virtual objects), then to augmented virtuality (AV), and finally to a completely virtual environment, where an entire world is synthetically created. The middle of this continuum, where virtual and real objects are displayed side by side, comprises the forms of mixed reality [41]. Azuma [42] refines the definition of AR as a variation of virtual reality (VR). Virtual environments allow users to completely immerse themselves in a synthetically created world while shutting off their real surroundings. The virtual environment can, but does not have to, mimic the characteristics of the real environment. AR does not replace the surroundings with a virtual one, but rather enhances the real environment with virtual objects that complement the user's actual surroundings [42]. Azuma defines AR as a combination of reality and virtuality that works interactively in real time. AR should be registered in 3D so that devices other than head mounted displays (HMDs) can also be used [42].

Design Parameters to AR-Learning Environments
To select and compare the literature, we used a set of seven design parameters for AR learning environments developed by Krug et al. [43,44]. These parameters are adaptivity, interactivity, immersion, congruence with reality, content proximity to reality, game elements, and complexity.
Each parameter has different levels or indicators that allow the comparison of different AR applications. In the following sections, the parameters and their levels and indicators are explained individually.

Adaptivity
According to Krug et al. [43,44], adaptivity describes the capacity of the user or the program itself to adapt the application to different situations. It is the ability of a program to react to activities, events, or changes in situations, or a combination of them. This definition results from the authors combining the definitions of Paramythis and Loidl-Reisinger [45] and Söldner [46]. A reaction could be a dynamic adjustment of the software elements or service. The indicators of this parameter are split into four levels and defined in Table S1 in the Supplementary Material. The table shows the different levels of "adaptivity" and describes how the program can adapt to the user's needs, from no adaptability (level 1) to fully automatic adaptation based on sensor input and user activity (level 4).

Interactivity
Interactivity was defined by Krug et al. [43,44], following Schulmeister [47], as the intended interaction with an object, device, or content of a digital media component. There are six levels of interactivity (see Table S2 for details). The levels range from the observation and reproduction/reflection/reporting of an object (level 1) to the generation of processes, as well as construction and manipulation with situational feedback (level 6). Higher levels enable more active participation and greater influence of the user on the content.

Immersion
Slater and Wilbur [48] defined immersion as a description of a technology in which a digital medium can influence the senses of a human user, particularly how it composes an "inclusive, extensive, surrounding and vivid illusion" for the user. To determine the level of immersion, Krug et al. [43] constructed indicators around the human senses (visual, auditory, haptic, olfactory, gustatory). The degree of immersion increases as more senses are involved.

Congruence with Reality
This parameter is based on the definition of congruence with reality by Krug et al. [43,44], following McMahan [49]. It is divided into social reality and perceptual reality. The former describes how plausibly and true to life events and social interactions are depicted. Perceptual reality specifies how real objects and events appear outwardly, that is, how realistic the appearance and acoustics of an object are. The more indicators an AR application has, the higher its degree of congruence with reality. The various indicators are described in Table S3 in the Supplementary Material.

Content Proximity to Reality
According to Krug et al. [43,44], this parameter defines the plausibility of AR content in terms of causal, local, and temporal factors, as well as the plausible use and depiction of the tracking method. To avoid overlap with the previous parameter, the outer appearance of the object is not investigated with the indicators of content proximity to reality (see Table S4 for details). With each additional indicator, the degree of the parameter increases.

Game Elements
Gamification, or the use of game elements in education, can enhance interactivity and motivation [50]. Krug et al. [43] established eight indicators for this parameter. The level of this parameter increases with the inclusion of each additional indicator.
The indicators include clear rules/goals, conflicts/challenges for users, opportunities for control and manipulation, scoring systems such as points or feedback, opportunities for interaction between user and program, social interaction between users, independence from a particular environment, and the presence of an overarching story or narrative (see Table S5 for details).

Complexity
This parameter illustrates the content and cognitive structures of the AR functions according to Krug et al. [43,44]. The user's knowledge performance increases with increasing complexity [43,44]. Complexity was formulated according to the ESNAS model [51] (evaluation of standards in science, technology, engineering, art, and mathematics (STE(A)M) education in secondary school). For this parameter, five levels are defined (see Table S6 for details) [43,44].
At the lowest level (1), individual facts such as technical terms or properties are described. Level 2 comprises two unrelated facts. At level 3, functional relationships between facts are placed in context. Level 4 links different contexts to form complex relationships such as cycles. The highest level (5) describes higher-level, situation-independent concepts.
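The parameter scheme described above mixes two kinds of scales: level-based parameters (adaptivity with four levels, interactivity with six, complexity with five) and count-based parameters whose degree grows with each indicator present (immersion, game elements). The following Python sketch shows one possible way to record such a profile for a single application. It is an illustration only, not the coding instrument of Krug et al.; the class name `ARProfile`, the identifier spellings, and the example application are assumptions, and congruence with reality and content proximity to reality are omitted because their indicators are defined in the supplementary tables.

```python
from dataclasses import dataclass, field

# Highest level of each level-based parameter, as stated in the text.
MAX_LEVEL = {"adaptivity": 4, "interactivity": 6, "complexity": 5}

# Indicator sets named in the text; the identifier spellings are illustrative.
SENSES = frozenset({"visual", "auditory", "haptic", "olfactory", "gustatory"})
GAME_ELEMENTS = frozenset({
    "rules_goals", "conflict_challenge", "control", "scoring_feedback",
    "user_program_interaction", "social_interaction",
    "environment_independence", "story",
})


@dataclass
class ARProfile:
    """Hypothetical record of one AR application's design parameters."""
    name: str
    adaptivity: int
    interactivity: int
    complexity: int
    senses: frozenset = field(default_factory=frozenset)
    game_elements: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Level-based parameters must lie between 1 and their top level.
        for parameter, top in MAX_LEVEL.items():
            level = getattr(self, parameter)
            if not 1 <= level <= top:
                raise ValueError(f"{parameter} must be in 1..{top}, got {level}")
        # Count-based parameters may only contain known indicators.
        self.senses = frozenset(self.senses)
        self.game_elements = frozenset(self.game_elements)
        unknown = (self.senses - SENSES) | (self.game_elements - GAME_ELEMENTS)
        if unknown:
            raise ValueError(f"unknown indicators: {sorted(unknown)}")

    @property
    def immersion(self) -> int:
        """Degree of immersion: number of senses addressed."""
        return len(self.senses)

    @property
    def game_element_count(self) -> int:
        """Degree of the game-elements parameter: number of indicators present."""
        return len(self.game_elements)


# Invented example application, used only to exercise the sketch.
app = ARProfile("hypothetical circuit overlay", adaptivity=1, interactivity=4,
                complexity=3, senses={"visual"}, game_elements={"control"})
print(app.immersion, app.game_element_count)
```

Keeping the level-based and count-based parameters in separate fields avoids comparing a level with an indicator count, which measure different things.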

Methodology
To determine the implementation of AR in science experiments in an educational context, we followed the four stages of the PRISMA standard [27] for the external structure of the selection process (see Figure 1). First, the literature entries from the database were identified by means of keywords, which are described in detail in the subsection "Literature search". The next step was to check for duplicates. Then, in a combined screening phase, in which first the abstracts and then the articles were screened, non-qualifying papers were excluded. The exclusion criteria used are explained in more detail in Section 3.2. The study of Krug et al. [43,44] was used to evaluate the AR applications with regard to the design parameters.

Selection of the Papers
To find appropriate literature, a search of the Web of Science database was conducted using the Publish or Perish software [52]. Only quality journals that meet the criteria of being international, peer-reviewed, and recognized in the scientific community can be found in the Web of Science database. Keywords were selected to filter publications related to AR in science education with an experimental context. The following keywords were used: "Augmented Reality AND Chemistry OR Biology OR Physics OR STEM OR STEAM AND experiment OR experiment*". Only studies published between 2000 and March 2024 were chosen. As "Publish or Perish" only indicates 200 publications, the search was split into two searches: 2000-2021 and 2022-2024. To verify the results, a search for 2021-2022 was also run as a comparison, and these searches matched. In April 2024, we identified 247 articles that may be suitable. Checking for duplicates showed only two articles that needed to be removed.
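Splitting a search into year ranges and then checking for duplicates can be sketched as a simple merge step. The Python example below is a minimal illustration under stated assumptions: each search result is represented as a dictionary with hypothetical `title`, `doi`, and `year` fields, and the records themselves are invented placeholders. It does not reproduce the export format or any interface of Publish or Perish or Web of Science.

```python
import re


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_key(record: dict) -> str:
    """Prefer the DOI as identity; fall back to the normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    return f"doi:{doi}" if doi else f"title:{normalize_title(record['title'])}"


def merge_batches(*batches):
    """Merge search batches, returning (unique records, removed duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for batch in batches:
        for record in batch:
            key = record_key(record)
            if key in seen:
                duplicates.append(record)
            else:
                seen.add(key)
                unique.append(record)
    return unique, duplicates


# Invented placeholder records for two year-range searches.
batch_2000_2021 = [
    {"title": "AR in the Physics Lab", "doi": "10.0000/example.1", "year": 2020},
    {"title": "Titration with AR Support", "doi": "", "year": 2021},
]
batch_2022_2024 = [
    {"title": "Titration with AR support", "doi": "", "year": 2021},
    {"title": "AR Safety Assistant", "doi": "10.0000/example.3", "year": 2023},
]

unique, duplicates = merge_batches(batch_2000_2021, batch_2022_2024)
print(len(unique), "unique records,", len(duplicates), "duplicate(s) removed")
```

A limitation of this sketch is that a record listed with a DOI in one batch and without one in another would not be matched, so a manual check remains advisable.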

Criteria Screening
In this phase, the title and abstract of each potential paper were examined, and the full-text papers were retrieved. Five abstracts could not be retrieved, so we removed these reports. The next steps were carried out using inclusion and exclusion criteria. Each criterion was worded as a "yes" or "no" question, meaning that an article could have a hit or a miss for each criterion. Reports were first excluded if they had no reference to school or university science education (n = 62). If the reference to science education was given, we focused on the experiments and excluded 87 references that contained no experiments or were reviews without experiments of their own. We identified nine references that did not use AR in their experiments and 40 references that used VR, which we also excluded.
We also excluded a reference with a remote experiment. If a paper could not be classified after reading the abstract, the full text was read. Only the article by Martinez et al. [53] was excluded at this stage of the analysis, as the corresponding article could not be obtained. After this selection process, 40 studies were chosen to be analyzed in detail for the present review.
The selection and sorting of abstracts and papers was deliberated through dialogue between the authors.
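The exclusion counts reported above can be checked with simple arithmetic. The short Python sketch below subtracts each reported exclusion from the 247 identified articles in the order given in the text. The reason labels are paraphrases of the text, and the stage order is taken as an assumption, since the exact flow is shown in Figure 1.

```python
# Number of articles identified by the database search.
identified = 247

# (reason, number excluded), in the order the text reports them.
exclusions = [
    ("duplicates", 2),
    ("abstract not retrievable", 5),
    ("no reference to school or university science education", 62),
    ("no experiments, or review without own experiments", 87),
    ("no AR used in the experiment", 9),
    ("VR used instead of AR", 40),
    ("remote experiment", 1),
    ("full text not obtainable", 1),
]

remaining = identified
for reason, count in exclusions:
    remaining -= count
    print(f"{reason:<55} -{count:>3}  remaining: {remaining}")

total_excluded = sum(count for _, count in exclusions)
print("excluded in total:", total_excluded)
print("included studies:", remaining)
```

The running total ends at 40 included studies (207 excluded), which matches the number of studies analyzed in the review.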

Extraction of Study Content
The information needed to analyze the literature with regard to the research questions was extracted, and the AR applications developed were compared. The objective of each AR application and the point in the experimentation process at which it was used were also recorded. The subjects, topics, and experiments were compared. The literature was organized according to the type of research conducted, and the different research interests and outcomes were analyzed. The different design indicators were also assessed [43]. To examine the design parameters used, the corresponding parameters and indicators were identified.

Results
The distribution of publications over a period of 14 years shows a significant increase in the integration of AR technology in science education experiments between 2016 and 2020. However, the number of new publications per year related to AR in science education with experiments has slightly declined since then (Figure 2). The trend between 2016 and 2020 aligns with the findings of Garzón et al. [54] and Sırakaya and Sırakaya [55], which report a steady increase in the utilization of AR in education. The growing popularity of mobile devices in academic settings [56] and the progressive enhancement of these devices' processing capabilities [36] could potentially account for this surge in AR implementation. Mazzuco et al. [26] also demonstrated this for AR applications in the field of chemistry.
and 2020 aligns with the findings of Garzón et al. [54] and Sırakaya and Sırakaya which report a steady increase in the utilization of AR in education.The growing po larity of mobile devices in academic settings [56] and the progressive enhancemen these devices' processing capabilities [36] could potentially account for this surge in implementation.Mazzuco et al. [26] also demonstrated this for AR applications in the of chemistry.S7) [1,16,17,19,22,[34][35][36][37][38][39].One tool (number 18) was reworked significantly between the publications, so it was differentiated in two parts (a and b).Most of the 28 applications were used in physics (17) or chemistry (7), while five tools were not designated to a specific science subject as shown in Figure 3.The topics for the corresponding applications were electricity (6), thermodynamics (5), mechanics (4), magnetism (3), radiation (2), redox-reactions (2), safety (2), general operation and handling of instruments and chemicals (1), biochemistry (1), titrations (1), packed bed column (1), and the attributes of water for primary school (1), as depicted in Figure 4. 
Most of the applications were aimed at high school or secondary school students (11), followed by undergraduates in university (10), or for high school students as well as undergraduates (2).Only two applications considered primary school or lower secondary school students.The remaining applications were not specifically designated for a particular age group.The 40 selected papers presented 29 different AR applications used for STE(A)M experiments (refer to Table S7) [1,16,17,19,22,[34][35][36][37][38][39].One tool (number 18) was reworked significantly between the publications, so it was differentiated in two parts (a and b).Most of the 28 applications were used in physics (17) or chemistry (7), while five tools were not designated to a specific science subject as shown in Figure 3.The topics for the corresponding applications were electricity (6), thermodynamics (5), mechanics (4), magnetism (3), radiation (2), redox-reactions (2), safety (2), general operation and handling of instruments and chemicals (1), biochemistry (1), titrations (1), packed bed column (1), and the attributes of water for primary school (1), as depicted in Figure 4. 
Most of the applications were aimed at high school or secondary school students (11), followed by undergraduates in university (10), or for high school students as well as undergraduates (2).Only two applications considered primary school or lower secondary school students.The remaining applications were not specifically designated for a particular age group.The AR applications were mainly used during the experiment to support the experimentation process.The remaining applications were used before the experiment for preparation purposes, and afterwards for revision purposes, or as optional tools when assistance was required.The majority of the devices used for the AR applications were mobile devices, such as smartphones or tablets, sixteen devices in total.However, one specific application required a modified iPad with an additional sensor.For HoloLens or other head mounted devices, eight AR applications were developed.Additionally, a completely new device called A-Cube, equipped with a projector and camera, was developed for one case.The primary software development kit (SDK) used for the development of these applications was Vuforia.The Unity Game Engine was also frequently utilized.The remaining Educ.Sci.2024, 14, 760 8 of 20 applications were developed using various other programs or programming platforms.In six instances, the programs used to develop the applications were not explicitly specified.The AR applications were mainly used during the experiment to support the experimentation process.The remaining applications were used before the experiment for preparation purposes, and afterwards for revision purposes, or as optional tools when assistance was required.The majority of the devices used for the AR applications were mobile devices, such as smartphones or tablets, sixteen devices in total.However, one specific application required a modified iPad with an additional sensor.For HoloLens or other head mounted devices, eight AR applications were 
developed.Additionally, a completely new device called A-Cube, equipped with a projector and camera, was developed for one case.The primary software development kit (SDK) used for the development of these applications was Vuforia.The Unity Game Engine was also frequently utilized.The Physics; 17 Chemistry; 7 The intended purposes for the AR application differentiated significantly among the studies examined (refer to Figure 5).Out of the total number of applications, nine were designed to provide instructions or manuals to assist with the execution of the experiment.These included tools such as action-mapping tools, AR-enhanced manuals, instruction videos, and schematic-circuit overlays.Additionally, six applications were developed to provide information for preparation or revision, establishing connections between the experiment and the theoretical background.Two applications focused on enhancing safety precautions.The first of these tools served as an alert system to notify users of potentially occurring accidents, while the other served as a safety-training assistant to educate students prior to conducting the experiment.Furthermore, nine applications were developed to visualize measurements or parameters.Most of the applications were utilized for the visualizing properties that are otherwise invisible, contributing to a better understanding of general concepts (fourteen applications).While the majority of applications only fulfilled a single purpose described above, there were ten that could be used for multiple purposes.
Educ.Sci.2024, 14, x FOR PEER REVIEW 9 of 21 remaining applications were developed using various other programs or programming platforms.In six instances, the programs used to develop the applications were not explicitly specified.
The intended purposes for the AR application differentiated significantly among the studies examined (refer to Figure 5).Out of the total number of applications, nine were designed to provide instructions or manuals to assist with the execution of the experiment.These included tools such as action-mapping tools, AR-enhanced manuals, instruction videos, and schematic-circuit overlays.Additionally, six applications were developed to provide information for preparation or revision, establishing connections between the experiment and the theoretical background.Two applications focused on enhancing safety precautions.The first of these tools served as an alert system to notify users of potentially occurring accidents, while the other served as a safety-training assistant to educate students prior to conducting the experiment.Furthermore, nine applications were developed to visualize measurements or parameters.Most of the applications were utilized for the visualizing properties that are otherwise invisible, contributing to a better understanding of general concepts (fourteen applications).While the majority of applications only fulfilled a single purpose described above, there were ten that could be used for multiple purposes.From the selected 40 papers, six performed qualitative and quantitative research, eight only performed qualitive research and 15 focused on quantitative research.However, eleven articles did not conduct any empirical research at all (refer to Figure 6).The sample size of the qualitive studies ranged from 3 to 42 students and the quantitative studies from 8 to 280 students.The sample sizes ranged from 12 to 110 in the studies that used both qualitative and quantitative research methods (see in detail Table S8).The main focus of the studies was usability of the application, concept connection, cognitive load, attitude, usage of AR, performance, and knowledge gain.Also, design, skill gain, motivation, usefulness, and others were examined.The frequency 
of each research focus within the analyzed articles is shown in Figure 7.The results of different research interests are given in Table 1.Most of the research conducted with regard to concept connections or knowledge gain showed an increase in the corresponding aspects.For cognitive load there was either no difference detected between AR-and control-Group or the AR-Group had a lower cognitive load.The attitude to the experiments enhanced through AR was always found to be positive.

Interest
Result Number of Studies The main focus of the studies was usability of the application, concept connection, cognitive load, attitude, usage of AR, performance, and knowledge gain.Also, design, skill gain, motivation, usefulness, and others were examined.The frequency of each research focus within the analyzed articles is shown in Figure 7.The main focus of the studies was usability of the application, concept connection, cognitive load, attitude, usage of AR, performance, and knowledge gain.Also, design, skill gain, motivation, usefulness, and others were examined.The frequency of each research focus within the analyzed articles is shown in Figure 7.The results of different research interests are given in Table 1.Most of the research conducted with regard to concept connections or knowledge gain showed an increase in the corresponding aspects.For cognitive load there was either no difference detected between AR-and control-Group or the AR-Group had a lower cognitive load.The attitude to the experiments enhanced through AR was always found to be positive.

Interest
Result Number of Studies Higher 5 The results of different research interests are given in Table 1.Most of the research conducted with regard to concept connections or knowledge gain showed an increase in the corresponding aspects.For cognitive load there was either no difference detected between AR-and control-Group or the AR-Group had a lower cognitive load.The attitude to the experiments enhanced through AR was always found to be positive.Almost every AR application stimulated only the visual senses, for an immersion score of 1.Only three also had auditive immersion, as shown in Figure 8, and were specific to literature, Table S9 in the Supplementary Material.Almost every AR application stimulated only the visual senses, for an immersion score of 1.Only three also had auditive immersion, as shown in Figure 8, and were specific to literature, Table S9 in the Supplementary Material.The measuring of different physical or chemistry parameters results in high interactivity, which resulted in some applications earning a high interactivity level.Nine applications reached the fourth level on this design parameter, and two the third level.Five applications had an Interactivity level of 2, because the user could switch between different representations or similar interactions.The rest had an interactivity level of 1 (refer to Figure 9 and Table S9).Nearly all applications showed hits on the indicators "3D-Registration" and "Proportions" within the "Congruence with Reality" parameter.The frequency of these indicators is presented in Figure 10.However, only two applications displayed the "Photorealism" indicator, such as using videos.Additionally, only seven applications demonstrated the "Plausibility" indicator, also seven had "Light effects", five presented "Proximity to Life", and another five displayed "Shadow effects".The distribution of indicator numbers is depicted in Figure 11.Most applications exhibited two indicators (9).The measuring of different physical or 
chemistry parameters results in high interactivity, which resulted in some applications earning a high interactivity level.Nine applications reached the fourth level on this design parameter, and two the third level.Five applications had an Interactivity level of 2, because the user could switch between different representations or similar interactions.The rest had an interactivity level of 1 (refer to Figure 9 and Table S9).Almost every AR application stimulated only the visual senses, for an immersion score of 1.Only three also had auditive immersion, as shown in Figure 8, and were specific to literature, Table S9 in the Supplementary Material.The measuring of different physical or chemistry parameters results in high interactivity, which resulted in some applications earning a high interactivity level.Nine applications reached the fourth level on this design parameter, and two the third level.Five applications had an Interactivity level of 2, because the user could switch between different representations or similar interactions.The rest had an interactivity level of 1 (refer to Figure 9 and Table S9).Nearly all applications showed hits on the indicators "3D-Registration" and "Proportions" within the "Congruence with Reality" parameter.The frequency of these indicators is presented in Figure 10.However, only two applications displayed the "Photorealism" indicator, such as using videos.Additionally, only seven applications demonstrated the "Plausibility" indicator, also seven had "Light effects", five presented "Proximity to Life", and another five displayed "Shadow effects".The distribution of indicator numbers is depicted in Figure 11.Most applications exhibited two indicators (9).Nearly all applications showed hits on the indicators "3D-Registration" and "Proportions" within the "Congruence with Reality" parameter.The frequency of these indicators is presented in Figure 10.However, only two applications displayed the "Photorealism" indicator, such as using 
videos.Additionally, only seven applications demonstrated the "Plausibility" indicator, also seven had "Light effects", five presented "Proximity to Life", and another five displayed "Shadow effects".The distribution of indicator numbers is depicted in Figure 11.Most applications exhibited two indicators (9).In thirteen applications all five of the "Content Proximity to Reality" indicators were detected.Seven applications only had four indicators.Just a few applications had fewer indicators (refer to Figure 12).All the tracking methods were found to be appropriate and just eight of the tracking depictions did not match the theme of the AR subject.Temporal, causal, and local plausibility was observed in all, but five to seven applications, as shown in Figure 13.In thirteen applications all five of the "Content Proximity to Reality" indicators were detected.Seven applications only had four indicators.Just a few applications had fewer indicators (refer to Figure 12).All the tracking methods were found to be appropriate and just eight of the tracking depictions did not match the theme of the AR subject.Temporal, causal, and local plausibility was observed in all, but five to seven applications, as shown in Figure 13.In thirteen applications all five of the "Content Proximity to Reality" indicators were detected.Seven applications only had four indicators.Just a few applications had fewer indicators (refer to Figure 12).All the tracking methods were found to be appropriate and just eight of the tracking depictions did not match the theme of the AR subject.Temporal, causal, and local plausibility was observed in all, but five to seven applications, as shown in Figure 13.In thirteen applications all five of the "Content Proximity to Reality" indicators were detected.Seven applications only had four indicators.Just a few applications had fewer indicators (refer to Figure 12).All the tracking methods were found to be appropriate and just eight of the tracking depictions did 
not match the theme of the AR subject.Temporal, causal, and local plausibility was observed in all, but five to seven applications, as shown in Figure 13.The level of "Adaptivity" was mostly very low.Twenty applications had the first level, while three were classified in the second, and four in the third level.Only one application exhibited a high level (4) of adaptivity (refer to Figure 14).
Mainly, the analyzed applications showed a low level of "Gamification".The maximum number of game elements found was three, and this was only the case for two applications (Hanafi's Safety-Assistant with a Quiz-Character and Bakri's learning App to Heat-Experiments with different functions and control-mechanism).Three applications had two game elements, while two had three elements.Eight applications had only one.Most applications (15) did not have any game elements at all (refer to Figure 15).The level of "Adaptivity" was mostly very low.Twenty applications had the first level, while three were classified in the second, and four in the third level.Only one application exhibited a high level (4) of adaptivity (refer to Figure 14).
The main game element found in the investigated applications was the "Control" indicator (12). Two tools offered right and wrong answers, so the "Rules/Goals" indicator was assigned in those cases. The flexible "Environment" indicator was assigned if the app could be used at any location: if the marker could be printed and moved to any place, the conditions for the indicator were satisfied, whereas if other instruments or modifications of the mobile device were needed, the requirements were not met. Under these conditions, only six applications showed the "Environment" element, as shown in Figure 16.
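The assignment rule for the "Environment" indicator can be read as a simple decision procedure. The following is a minimal sketch of one possible reading, not the coding scheme actually used in the review: the class `ArApp`, its field names, and the function `has_flexible_environment` are hypothetical names introduced only for illustration, and the example records are invented.

```python
from dataclasses import dataclass


@dataclass
class ArApp:
    """Hypothetical description of one reviewed AR application."""
    name: str
    usable_anywhere: bool            # app is not tied to a fixed location
    marker_printable: bool           # marker can be printed and moved freely
    needs_extra_instruments: bool    # other instruments are required
    needs_device_modification: bool  # the mobile device must be modified


def has_flexible_environment(app: ArApp) -> bool:
    """Return True if the 'Environment' indicator would be assigned.

    Assumed reading of the rule: extra instruments or device
    modifications rule the indicator out; otherwise the app must be
    usable at any location or rely on a printable, movable marker.
    """
    if app.needs_extra_instruments or app.needs_device_modification:
        return False
    return app.usable_anywhere or app.marker_printable


# Invented example records, for illustration only.
marker_app = ArApp("printable-marker app", False, True, False, False)
sensor_app = ArApp("app with external sensor", True, False, True, False)

for app in (marker_app, sensor_app):
    print(app.name, has_flexible_environment(app))
```

Writing the rule down in this form makes the exclusion criteria explicit, which can help when several coders need to apply the same indicator consistently.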
The "Complexity" of the applications ranged across all levels, but most had a low level of complexity. The distribution of the levels is shown in Figure 17. Four applications reached level 4, and just one reached level 5 and thus a high level of complexity.

RQ1 AR-Enhanced Science Experiments and Implementation Approaches
Most of the articles focused on physics, particularly the topics of electricity, magnetism, or thermodynamics. AR for physics experiments is most effective in visualizing invisible processes such as magnetic field lines and can be easily implemented for a static system or setup. It might be more difficult to develop an AR tool that helps to visualize the invisible processes in a vessel in which a chemical reaction occurs. No AR application was introduced specifically for the subject of biology. AR tools could be used here, for example, to show three-dimensional models of microscopy experiments. An interesting application of AR is the replacement of real chemicals, substances, or liquids to reduce safety risks, costs, and other resources [35,39]. In general, laboratory risk reduction and accident prevention is an area of potential opened up by new technologies such as AR. Applications could be used in the future to detect accidents before they happen [37,38] or to reduce risks through better preparation. Many groups have developed AR applications to prepare for experiments via enhanced instructions or manuals, which could help reduce anxiety and increase confidence in handling chemicals or laboratory instruments [66]. Applications were also used to show measurements, rather than using additional instruments which may be unfamiliar to students.
The most frequently used devices were smartphones or tablets, which would make the AR applications widely available to many teachers and students, as no new or dedicated equipment is needed. Special devices like the HoloLens are expensive, and not every school or institution has access to this technology. The A-Cube requires specially programmed software, cameras, and a projector, which means that it requires a complex and expensive set-up. However, with future development, and perhaps in combination with devices that are cheap and widely available, it could help teachers to conduct more experiments at school with students. Ensuring the safety of students can take away the fear of experimenting. Most of the AR tools in these studies had only one specific function and purpose, but with further development, AR applications for science education could become more complex and serve different goals.

RQ2 Empirical Evaluations of AR-Enhanced Laboratory Environments
Eleven AR applications presented in the articles were still prototypes and had not yet been validated. The research was more quantitative than qualitative. Some articles had only small sample sizes. It is questionable how meaningful the results of a study are if the sample includes only eight individuals, for example. Also, not all the studies conducted a test with a control group. In these cases, the results cannot be entirely attributed to the AR application itself, and further studies need to be conducted. The focus of most of the empirical research was on the usability of the applications. This makes sense because low usability could affect the results for a variety of other research interests. Overall, the positive effects of AR applications in fostering concept connections, minimizing cognitive load, and increasing knowledge were shown in different studies. It was also shown that students perceive experiments enhanced with AR in a positive way. This could be interpreted as initial evidence of the value of AR for science experiments. An interesting research focus is the pattern in which AR applications are used and which information and functions are favored by the user. Such research would allow future applications to focus on these functions of AR tools.

RQ3 Implementation of Design Parameters for AR in Educational Experiments
The level of immersion was very low in all the applications. Increasing the number of senses involved can help with the learning process [86]. However, it can be debated whether greater immersion is needed, because the other senses are addressed by the experiment itself. The students receive haptic feedback while handling different objects, instruments, and other equipment. The olfactory sense might be used as well, especially in chemistry experiments. One area of potential for the applications could be auditory immersion, with regard to the inclusion of students with impairments.
The interactivity in the examined applications was mostly low. Interactivity by itself does not guarantee a productive learning environment [87], but it can support the learning process by guiding the user and framing the student's thinking [88].
The indicators of the parameter "Congruence with Reality" were found only partially in the different applications. Here, 3D registration and appropriate proportions in particular were considered. On the other hand, the indicators for realistic inclusion, such as light and shadow effects or a lifelike depiction, were found almost exclusively in videos portrayed with AR. This is not surprising, as such graphic designs would require considerably more time, resources, and skills. It could be debated whether the output justifies an overly complex and expensive development process. The same advantages for science education or experiments might be achieved with a less complicated and cheaper tool that is also easier to produce.
The "Content proximity to Life" was rated as high for most of the AR applications. The tracking methods were all found to be reasonable, and the various types of plausibility were considered. Without this parameter, a successful tool for science education would not be possible. The understanding of concepts and the gain in knowledge could benefit from a high level of this design parameter.
Most of the applications were not adaptive. Neither the user nor the program itself could adapt the application to new situations or needs. It could be helpful to be able to change the difficulty or the form of presentation. For students with impairments, this would offer the possibility of using the application in another way. It would also make the tool useful to a wider range of students, for example across different grades or educational institutions.
The number of game elements was likewise found to be very low in the studies reviewed. Most of the AR tools had no gamification at all. This could be a particularly interesting and fruitful area for further research, because studies have shown a positive effect of game elements on student engagement and motivation [89,90].
Most of the AR applications depicted only one fact or invisible process. Future AR tools could potentially become more complex and depict more facts, connections, and different representations. On the one hand, the value of the application for the learning process could thereby increase. On the other hand, such applications could become confusing, difficult to operate, or overwhelming.

Limitations
Restriction to the Web of Science database resulted in the exclusion of articles not indexed there, potentially excluding valuable research. This bias towards including only high-quality, published research may limit the comprehensiveness of this study. Furthermore, the parameters selected for this review's literature analysis could be subject to subjective bias. Some of the analyzed articles lacked sufficiently detailed descriptions of the AR tools, making it challenging to evaluate them objectively without interpretation. Data coding was not subjected to double coding; instead, articles were discussed and assigned by two researchers through dialogue. This approach could introduce subjectivity into the data assignment process. Due to inadequate descriptions in some papers, there may have been instances where AR applications were misclassified. For example, papers may have been categorized as involving VR or lacking experimentation when they in fact featured AR. If we had had the opportunity to test all AR applications ourselves, we could have presented less interpretative data. The study did not consider the specific year in which papers were submitted, as it was not feasible to integrate this information into the analysis within the given timeframe. This limitation may affect the temporal context of the findings.
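Where independent double coding is carried out, agreement between the two coders is commonly summarized with Cohen's kappa. The following is a minimal sketch of that calculation and was not part of this review: the function name `cohens_kappa` and the two lists of adaptivity levels are hypothetical and serve only to illustrate the formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty item list.")
    n = len(rater_a)
    # Observed agreement: share of items with identical codes.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal code frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical code; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented adaptivity levels assigned by two hypothetical coders.
coder_1 = [1, 1, 2, 1, 3, 4, 1, 2]
coder_2 = [1, 1, 2, 2, 3, 4, 1, 1]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

Reporting such a coefficient alongside the consensus-based assignment would give readers a way to judge how much interpretation the classification involved.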

Conclusions
This review has revealed an abundance of developed and partially empirically validated AR applications. Various application domains were explored, highlighting gaps in the distribution of subjects and topics. Most of these applications were centered on physics, with a specific focus on physics education topics like electricity. However, there is an opportunity to develop more AR applications for other science subjects to enrich experiments.
In general, this review demonstrated that numerous research groups are actively creating innovative programs aimed at enhancing science experiments in education. Further empirical research on these applications is warranted, as many of them are still in the prototype stage and have not yet been validated. In addition, existing applications could benefit from enhancements, including the addition of new features and improved visual representations.
A wide range of potential new applications remains unexplored, offering opportunities to depict various invisible properties, states of matter, and chemical reactions, or to serve as a cost-effective alternative to traditional materials. If these new applications are developed with a focus on widely available and commonly used devices and platforms, they could serve as catalysts for the advancement of AR-enhanced education.
Moreover, the development of subject-specific and interdisciplinary AR applications is crucial to facilitate their individual use in schools and universities. Furthermore, it is recommended to make AR applications more readily available through downloads, in order to promote their acceptance and enable the use of existing AR applications.
The empirical validation of these applications should also be given more attention. Due to the small sample sizes, only initial insights into the value and effectiveness of AR tools were possible. Studies with larger samples and an expansion of the research landscape to compare different design parameters in AR extensions for the same experiment would be desirable. Moreover, the underlying design parameters of these AR applications were often overlooked or insufficiently researched, especially the parameters "Interactivity", "Adaptivity", and "Game Elements", which were scarcely found in most AR applications. Given the importance of adaptivity in achieving individualized learning outcomes, future developments should consider these parameters and prioritize the validation of AR applications. Emphasizing these parameters in AR design could lead to higher engagement among science students. Additionally, increased immersion and a higher degree of "Interactivity" and "Adaptivity" could offer inclusive learning opportunities for students with disabilities.
Further development of a variety of suitable AR-supported experiments in science subjects that cover a wide range of learning objectives is therefore necessary. Moreover, these should be made publicly accessible to educators, for instance, through platforms or teacher training programs.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/educsci14070760/s1, Table S1: Levels of the parameter "Adaptivity"; Table S2: Levels of the parameter "Interactivity"; Table S3: Indicators of the parameter "Congruence with Reality"; Table S4: Indicators of the parameter "Content proximity to Reality"; Table S5: Indicators of the parameter "Game Elements"; Table S6: Levels of the parameter "Complexity"; Table S7

Funding:
The APC was funded by the University of Konstanz.

Figure 1. Identification of relevant papers via the database.

4.1. RQ1: Which Experiments in Science Education Become Enhanced through AR? How Does AR Become Implemented in Experiments in Science Education?
The 40 selected papers presented 29 different AR applications used for STE(A)M experiments (refer to Table

Figure 3. Subject distribution of the AR applications.

Figure 5. Purpose of the AR application.

4.2. RQ2: How Does the Empirical Research Examine the AR Enhanced Laboratories in an Educational Context?
From the selected 40 papers, six performed qualitative and quantitative research, eight only performed qualitative research, and 15 focused on quantitative research.

Figure 6. Distribution of empirical research from selected literature.

Figure 7. Research interests from selected literature.

Figure 11. Distribution of indicators for "Congruence with Reality".

Figure 12. Distribution of indicators for "Content proximity to Life".

Figure 13. Quantity of "Content proximity to Life" indicators.

Table 1. Results of different research interests.

How Are the Designated Design Parameters for AR in Educational Experiments Implemented?