Peer Assessment in Physical Education: A Systematic Review of the Last Five Years

Abstract: Purpose: To conduct a systematic review of the use of peer assessment in Physical Education over the last five years (2016–2020). Method: Four databases were searched for articles that included information on peer assessment in Physical Education at the different educational stages. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, including the PICO (participants, intervention, comparators, and outcomes) strategy, and after applying the exclusion criteria, 13 articles were fully assessed against seven criteria: (1) year and author; (2) country; (3) educational stage; (4) type of paper; (5) purpose; (6) content; and (7) outcomes. Results: The research was geographically dispersed, although Spain and the USA accounted for about half of the articles reviewed. Research was carried out at all educational stages, although with a greater focus on higher education than on primary and secondary education. Quantitative, qualitative, and mixed research was almost equally represented and dealt mainly with sports and games. The goals of the studies were so diverse that the literature on the subject lacks continuity and coherence. The research results on the use of peer assessment showed an increase in the level of motivation, perceived teaching confidence and competence, and teaching self-efficacy. More research is needed on the benefits of peer assessment for the self-regulation of learning and the critical thinking of students.


Introduction
The role of assessment has increased in recent years in the field of education [1]. There have been many publications, scientific conferences, and training sessions on it, but the transfer to educational practice is not always easy, and certain conceptual errors persist, such as the indiscriminate use of the concepts of assessment and scoring as if they were one and the same thing [2]. One of the skills expected of a teacher is the ability to evaluate the teaching-learning process [3,4], necessarily relating assessment activities to planned activities in order to give meaning to the whole pedagogical process [5]. Such is the importance given to evaluation that some authors consider the first and most important methodological change for a global transformation of the teaching-learning process to be the implementation of formative assessment [6,7] in place of the more traditional assessment with a summative approach.

Formative Assessment: A Key Element for Learning
Formative assessment is any evaluative process that aims to improve the teaching-learning process in its three lines of intervention: improving learning and evidence of student learning, improving the teaching process of the teacher, and, finally, improving the teaching-learning process in a progressive way, correcting and refining the procedures carried out in it [8][9][10]. Brown and Pickford [11] define it as the process used to recognize and respond to student learning and thus reinforce it during the teaching-learning process. In this way, assessment acts as a tool for student self-knowledge and the improvement of all educational processes [12]. However, assessment in itself does not produce beneficial effects if it is not given special treatment, so it has to be approached in an intentional pedagogical way in the classroom in order to generate competences in the students [13]. To do this, and in order to incorporate assessment into action structures based on the motivation of the students, it is essential to involve them actively in the assessment procedure [14]. In this sense, the implementation of formative and shared assessment processes has shown a greater awareness of what students learn, as well as a greater capacity to self-regulate their tasks over time [15]. Here the concept of triadic assessment arises, understood as a triple assessment approach which combines self-assessment, peer assessment, and teacher assessment, in the same instrument, before a final grade, and on a given assessment procedure [16].
The formative assessment processes are shown to be ideal for the use of assessment as a tool which favors self-regulation and awareness of learning by the student and the extrapolation of learning to a variety of contexts [17]. Hortigüela-Alcalá, Pérez-Pueyo, and González-Calvo [2] highlight five reasons for applying formative assessment: Firstly, the student improves his/her awareness of learning when he/she participates in the assessment process. Secondly, it favors the self-regulation of learning by influencing the organizational capacities of the students, as already pointed out by Meusen-Beekman, Joosten-ten Brinke, and Boshuizen [18]. Thirdly, it makes it possible to apply learning to other different contexts outside the classroom since, as Joughin, Dawson, and Boud [19] comment, knowing in depth personal limitations and possibilities facilitates the transfer of learning from knowledge to know-how. Fourthly, there is an increase in feedback channels, which enriches the information that reaches the student as it comes from different sources and not only from the teacher [20]. Finally, the use of formative assessment improves teaching practice, since, as Wei [21] points out, the teacher reflects on the impact that teaching is having on students and, therefore, what they should do to improve the process.

Peer Assessment: Giving Students Responsibility in the Assessment Process
In addition to the need for evaluation to be formative, many authors point out that it should be shared [22][23][24][25], thus encouraging student participation throughout the process and increasing their awareness of what they are learning. As is the case with educational assessment in general, peer assessment only makes sense when it is learning-oriented; hence authors such as Kepell et al. [26] use the concept of learning-oriented peer assessment. Assessment must stop being an individual and imposed process and become a dialogue in which students play an important role from the point of view of decision-making [27]. According to these authors, the three most common techniques for carrying out shared assessment processes are self-assessment, dialogue evaluation, and peer assessment. Peer assessment is a very useful learning strategy to improve the feedback process in students [28], encouraging critical thinking [29]. This concept, coined at the end of the last century [30], gives students the role of evaluator and advisor at the same time as the peers are carrying out the proposed activities.
Numerous articles have been published in recent years showing some of the advantages of this evaluation strategy. Studies show the effectiveness of peer assessment in increasing students' active participation, motivation level, and improvement in learning attitudes [31,32]. Furthermore, the use of peer assessment improves students' capacity for reflection and commitment and reduces the teacher's burden, thus allowing teachers to pay more attention to other important factors [33], as well as facilitating a better use of time when students participate in assessment processes in large classes [34]. According to Chetcuti and Cutajar [35], it also favors processes of self-assessment and self-government and enhances higher-level thinking skills. Authors such as Nicol, Thomson, and Breslin [36] add that providing feedback to classmates generates even greater benefit than just receiving it, as it triggers higher-order cognitive processes, such as diagnosing problems and suggesting solutions. A recent systematic review concludes that the use of peer assessment has a positive impact on academic performance, over and above traditional teacher assessment, although at levels similar to self-assessment [37]. However, another recent meta-analysis [38] shows that, from a learning point of view, peer assessment yields significant benefits only when both teachers and students have been trained beforehand, so that the essential procedures and mechanisms of this type of assessment are known.

Peer Assessment in Physical Education: A Gap in Literature
In recent years, successful practices on the use of formative and shared assessment in the area of Physical Education (PE) have been disseminated [10] within the framework of the international concept of alternative assessment [23], as opposed to traditional assessment. The emphasis is on assessment which contributes to the generation of significant learning by allowing the participation of students throughout the teaching-learning process [7].
Nevertheless, little research in the area of PE has dealt with peer assessment as an evaluation procedure integrated within formative assessment. Peer assessment is presented as a shared evaluation mechanism that encourages student participation and promotes learning by allowing greater awareness of the evaluation criteria and even participation in their elaboration [2]. Students evaluate their peers, acquiring a role of observer and evaluator that broadens their vision of the teaching-learning process. Guidelines have been proposed for the implementation of peer assessment in the subject of PE in the educational context [39,40], and field research has obtained positive results, such as an increase in motivation for content [41], in the level of confidence in secondary education students [42], and in the initial training of future teachers [43]. However, the vast majority of articles published in the last twenty years deal with formative and shared assessment together, integrating peer-assessment processes within other more general ones and coexisting in almost all cases with self-assessment [44][45][46][47][48][49], so that it is difficult to establish which specific assessment procedure accounts for the results obtained in this research. For this reason, we believe it is essential to analyze the real impact that the use of peer assessment has on the learning of PE students, isolating this type of shared assessment from other associated mechanisms such as self-assessment or teacher assessment.
Some reviews on assessment in the general educational context have been published in recent years [50][51][52][53][54], including a systematic review on the use of alternative assessment in PE [23]. To date, however, as far as we have been able to ascertain, there is no systematic review addressing peer assessment in the context of PE. The aim of this paper is therefore to review the scientific literature published over the last five years (2016-2020) on peer assessment in PE. This research thus focuses specifically on student evaluation and student involvement in it, contributing directly both to the concept of sustainability of the journal itself and to the thematic line of the Special Issue on evaluation in education from a sustainability perspective.

Search Sources
The present paper consists of a systematic review of articles published in the last five years on the use of peer assessment in PE. Papers published between January 2016 and September 2020 were searched in four electronic databases: SCOPUS, ERIC, Web of Science, and Taylor and Francis. The descriptors "Peer assessment" and "Physical education" were used with the search operator AND.

Exclusion Criteria
The exclusion criteria used were as follows: (1) Duplicated articles, (2) Articles not published in journals indexed in the Journal Citation Report (JCR) or the Scimago Journal Rank (SJR), (3) Articles in languages other than English or Spanish, and (4) Articles using peer assessment in contexts other than PE.
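The four exclusion criteria above can be read as a sequence of filters over the search results. The following is a minimal sketch under stated assumptions, not the procedure the authors ran: the `Record` fields, the title-based duplicate check, and the function name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# Hypothetical record layout; the field names are illustrative assumptions,
# not a schema taken from the review.
@dataclass(frozen=True)
class Record:
    title: str
    language: str              # e.g. "English" or "Spanish"
    indexed_jcr_or_sjr: bool   # journal indexed in JCR or SJR
    context_is_pe: bool        # peer assessment applied in a PE context


ALLOWED_LANGUAGES = {"English", "Spanish"}


def apply_exclusion_criteria(records):
    """Return the records that survive exclusion criteria (1) to (4)."""
    seen_titles = set()
    kept = []
    for rec in records:
        # (1) Duplicates: assumed here to be detectable by normalized title.
        key = rec.title.strip().lower()
        if key in seen_titles:
            continue
        seen_titles.add(key)
        # (2) Journal not indexed in JCR or SJR.
        if not rec.indexed_jcr_or_sjr:
            continue
        # (3) Language other than English or Spanish.
        if rec.language not in ALLOWED_LANGUAGES:
            continue
        # (4) Peer assessment used in a context other than PE.
        if not rec.context_is_pe:
            continue
        kept.append(rec)
    return kept
```

In practice, duplicate detection across databases would probably rely on a DOI or a reference manager rather than on titles; the title comparison is used here only to keep the sketch self-contained.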

Limits and Methodology of the Search
The search was conducted following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [55], including the PICO strategy: Participants (e.g., primary, secondary, country), Intervention (e.g., content, type of research), Comparators (e.g., Peer Assessment, Physical Education), and Outcomes.

Procedure
The research began on 10 September 2020 and ended on 30 September 2020. Firstly, the criteria for selecting the articles that could be part of the review were drawn up, together with the exclusion criteria and the databases in which to carry out the bibliographic search. As for the inclusion criteria, a review of the scientific literature on the subject showed that the term "peer assessment" is fully accepted and widespread. Since the focus of the review was not on formative assessment procedures in general, but on peer assessment, it was decided to use this term. To this was added the term "physical education" to limit the search to that context only. Articles that dealt with peer assessment in areas other than physical education were discarded. Once the inclusion and exclusion criteria had been defined, the databases for the bibliographic search were selected. Four databases were chosen for the following reasons. ERIC was chosen because it is the online database with the most articles in the field of education. SCOPUS and Web of Science are the two most important citation databases in the world and are highly regarded by the scientific community, so the researchers considered it essential to include them in the review. Taylor and Francis was selected for its strong worldwide presence and for having over 2600 journals in its database.
All articles were extracted from the databases and analyzed using the Mendeley software. Applying the inclusion criteria, 104 publications were initially found using the descriptors mentioned: 68 articles from Taylor and Francis, 20 articles from ERIC, 7 articles from SCOPUS, and 9 articles from Web of Science (Figure 1). The articles were analyzed by two researchers, who worked independently, respecting the inclusion and exclusion criteria, and shared their results at the end of the work. After the second phase of exclusion, in which articles that dealt in a general way with formative or shared assessment, but without explicitly naming peer assessment, were discarded, only 13 articles remained. This second phase was the most complex, since several articles in the databases contained the concept of peer assessment in their abstract or keywords but, after a careful reading of the whole text, were found not to refer to this type of assessment in particular; rather, they dealt with assessment for learning, formative assessment, or self-assessment in a generic way, without going into detail or citing peer assessment specifically. For this reason, 18 other articles were discarded, leaving only 13.
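The counts reported in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the two derived values at the end are not reported by the authors and rest on the assumption that the 18 discarded articles were all removed in the second exclusion phase.

```python
# Initial hits per database, as reported in the text.
initial_hits = {
    "Taylor and Francis": 68,
    "ERIC": 20,
    "SCOPUS": 7,
    "Web of Science": 9,
}
total_found = sum(initial_hits.values())

final_included = 13          # stated in the text
discarded_second_phase = 18  # stated in the text

# Derived values (assumption: the 18 discards all belong to the second phase).
entering_second_phase = final_included + discarded_second_phase
removed_first_phase = total_found - entering_second_phase

print(total_found)            # 104
print(entering_second_phase)  # 31
print(removed_first_phase)    # 73
```

Under that reading, 73 records would have been removed in the first phase (duplicates, indexing, language, and context), although the text does not give this number explicitly.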

Quality Assessment
To ensure that the articles selected under the inclusion and exclusion criteria were of sufficient quality to be considered in the present review, three procedures were carried out. First, the review was registered in the Prospective Register of Systematic Reviews (PROSPERO), an international database for systematic reviews that permanently records and maintains the key features of the review protocol. Second, the PRISMA guidelines [55] were used to assess the quality of this systematic review. PRISMA includes an evidence-based set of items to report the quality of meta-analyses and systematic reviews. In addition, the A MeaSurement Tool to Assess systematic Reviews (AMSTAR) 2 critical appraisal tool for systematic reviews [58] was used. Third, the criteria for assessing the quality of the selected studies were based on the Consolidated Standards of Reporting Trials Statement [59], the Checklist for Measuring Study Quality [60], and the Strengthening the Reporting of Observational Studies Statement [61].

Table 1 was drawn up with the 13 final articles selected, following a systematic and thorough review process, in which each was described on the basis of the following categories, taken from previous systematic reviews [56,57]. (1) Author and year of publication: this category shows information about the authors of each article and the years in which the publications were made, over the last five years. (2) Country of application of the model: it shows the countries in which the research was carried out, regardless of the country of origin of the authors or the place where the publisher of the journal in which the article is published is located. (3) Educational stage: this category details whether the article is contextualized for the primary education, secondary education, or higher education stage, or whether it has a general orientation for any stage. (4) Type of paper: includes information about the type of article, as it can be an article with a theoretical approach or field research in which the peer assessment process has been implemented and concrete results are obtained. (5) Purpose: the objective of the study. (6) Content: it details the curricular content around which the research is developed when it has a more limited duration, or it states that the research covers a complete school year in which many contents are developed. (7) Outcomes: this last category describes the main results of the research. It should be noted that this last category does not apply to articles with a theoretical focus, as they do not report research results.

Table 1 (continued).

Content: Several games and sports
Outcomes: Both groups showed the same significant improvement in teaching competence and confidence, as well as in the perception of self-efficacy.

Purpose: To learn the importance and functionality that assessment rubrics used in written group tasks have for teachers in initial training.
Content: Written group assignments
Outcomes: (a) it is easier to perform the task in a better way when the assessment criteria are known in advance; (b) there were significant differences in the students' previous experiences of peer assessment; and (c) students showed their will to use formative assessment in the future.

Author and year: Macken, MacPhail, and Calderon (2020)
Country: Ireland
Educational stage: Primary
Type of paper: Research paper: qualitative approach. Field notes, reflective journals, and interviews
Purpose: To examine the extent that primary PSTs demonstrate assessment literacy in their enactment of AfL while teaching PE.
Content: All the curricular contents, in general
Outcomes: The use of teacher educator modelling, mentoring, and scaffolding with primary school students, during upskill sessions and in-situ during the PST school placements, enhanced the PSTs' assessment literacy in the enactment of AfL in primary PE to a greater extent than when implemented during the module with their PST peer.

Author and year: Martos-García, Usabiaga, and Valencia-Peris (2018)
Country: Spain
Educational stage: Higher Education
Type of paper: Research paper: mixed approach. Questionnaire and test
Purpose: To analyze the differences of perception between two groups of students when undergoing a formative and peer assessment process through the use of the blogosphere.
Content: Basque pelota and Valencian pilota
Outcomes: Basque students were more satisfied with the assessment tool used than the Valencian students. In both groups they point to the motivating and functional component of the blogosphere in contrast to other more traditional evaluation systems.

Author and year: Michael and Webster (2020)
Country: USA
Educational stage: Primary and Secondary
Type of paper: Theoretical
Purpose: To introduce the Pickleball Assessment of Skill and Tactics (PAST).
Content: Pickleball
Outcomes: No research results as it is an article with a theoretical approach.

Author and year: Soytürk (2019)
Country: Turkey
Educational stage: Higher Education
Type of paper: Research paper: quantitative approach. Observation forms
Purpose: To analyze efficiency of teacher candidates in movement analysis, self-evaluation, and peer evaluation for four basic volleyball skills.
Content: Volleyball
Outcomes: The teacher candidates' scores for self-evaluation of their skills and their peers' scores were found to be correlated.

Results and Discussion
The 13 articles selected between January 2016 and September 2020 are discussed around the seven elements used in the categorization set out in Table 1. The year is not included in the discussion as they are all from the last five years.

Country
This category was included in the analysis to gauge how widely peer assessment in PE has spread throughout the world. The results show a variety of countries in which research on the use of peer assessment in PE has been carried out. Four continents are represented, although more than half of the publications come from the USA (three articles) and Spain (four articles). The research from these two countries is notable because it comes from different research groups at different universities across each country. However, the volume of articles on this subject published in Spain and in the USA is not surprising, as both countries have been working on educational assessment in PE for more than twenty years. In 2007, an article was published describing the path followed in Spain since the end of the 20th century towards the construction of quality formative assessment in PE [62], and in the USA there is a long tradition of the use of assessment for learning, influenced to a great extent by the policies of the Welsh government exported to the other side of the ocean [63]. Australia has two publications by the same research team with a very similar focus, published only two years apart. Norway, Taiwan, Ireland, and Turkey account for the remaining four articles.
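The country distribution described above can be tallied as follows. This is a small sketch that uses only the per-country counts given in the paragraph; the variable names are illustrative.

```python
from collections import Counter

# Articles per country, as stated in the paragraph above.
articles_by_country = Counter({
    "Spain": 4,
    "USA": 3,
    "Australia": 2,
    "Norway": 1,
    "Taiwan": 1,
    "Ireland": 1,
    "Turkey": 1,
})

total_articles = sum(articles_by_country.values())
spain_usa = articles_by_country["Spain"] + articles_by_country["USA"]
spain_usa_share = spain_usa / total_articles

print(total_articles)             # 13
print(spain_usa)                  # 7
print(round(spain_usa_share, 3))  # 0.538
```

Seven of the thirteen articles, about 54%, come from Spain and the USA, which is the basis for the statement that these two countries account for more than half of the publications.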

Educational Stage
In terms of the educational stage on which the publications reviewed focus, the results are heterogeneous. Except for the article by Flynn, Duell, Dehaven, and Heidorn [64], whose focus is on swimmers of any age, the publications explicitly state the educational stage to which they refer. Seven of the thirteen articles are contextualized in higher education, geared towards future PE teachers or sports coaches. This result is the consequence of a wide dissemination of research on formative assessment in higher education in recent years, both online [65,66] and face-to-face [67][68][69], and of the greater ease with which researchers can investigate the context in which they work on a daily basis. The remaining five articles focus on PE practice at the primary (6-11 years old) and secondary (12-18 years old) stages. Traditionally, assessment methods that allow for objective measurement, such as tests and physical protocols, have been very present in PE, showing a lack of understanding of the learning objective that any evaluation process should have [11,12] and generating a certain reluctance on the part of teachers to apply assessment procedures that are less simple to quantify or measure [70,71], although in recent years a move towards alternative methods has been observed [72]. The inclusion of this category of analysis supports the argument that research on the use of peer assessment in the school context needs to be broadened and deepened, as there is little research on the early stages of education.

Type of Paper
In the inclusion and exclusion criteria, it was decided not to limit the review to research articles, as it was considered interesting to explore the methodological orientation of publications on peer assessment in PE. The results show that three of the thirteen articles have a theoretical approach, so they do not use a sample on which an experiment is carried out and from which results are extracted for analysis. It is surprising how few theoretical articles exist given the importance of the subject matter and its direct influence on learning [12]. The article by Aarskog [73] deals with student participation in shared assessment processes, based on the theory of Black and Wiliam [74] and comparing it with the educational reality of Norway. Michael and Webster [75] propose a shared assessment instrument for Pickleball content (Pickleball Assessment of Skill and Tactics (PAST)), while Flynn, Duell, Dehaven, and Heidorn [64] present a program called Kick, Stroke, and Swim (KSS) for teaching swimming, giving practical ideas for assessing learning in a shared way. As for the ten research articles included in the review, six of them have a qualitative approach, using various data collection instruments such as questionnaires [76][77][78], semi-structured interviews [78,79], self-reports and reflective journals [79,80], or video-recording [81]. Three articles use a quantitative methodology through the application of tests [28], observation forms [82], and self-reports whose data were treated quantitatively in a two-arm randomized trial design [83]. Martos-García, Usabiaga, and Valencia-Peris [84] propose a mixed design for which they use both questionnaires and tests.
The fact that there are more qualitative articles than quantitative ones points to a trend towards less positivist approaches in educational research, a field traditionally dominated by quantitative approaches [85,86] in which the aim is to find evidence rather than to understand the phenomena that take place in the educational context.
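As a quick consistency check on the breakdown by type of paper, the category counts given in this section (three theoretical, six qualitative, three quantitative, and one mixed) can be summed. The sketch below uses only those counts; the dictionary keys are illustrative labels.

```python
# Counts by type of paper, as given in this section.
papers_by_type = {
    "theoretical": 3,
    "qualitative": 6,
    "quantitative": 3,
    "mixed": 1,
}

total_papers = sum(papers_by_type.values())
field_research = total_papers - papers_by_type["theoretical"]

print(total_papers)    # 13
print(field_research)  # 10
```

The categories add up to the 13 articles reviewed, and the qualitative, quantitative, and mixed studies together give ten field studies, which matches the figure stated in the Conclusions.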

Purpose and Content
The heterogeneity present in the articles included in the review is also shown in the purpose they pursue and the content in which they develop the discourse. On the one hand, we find several articles in which the central content is a sport or a set of sports, pursuing for each of them very different purposes. The article by Michael and Webster [75] aims to present an assessment instrument, to be used among students, of the technical-tactical aspects of Pickleball. Flynn, Duell, Dehaven, and Heidorn [64] focus on providing strategies and techniques to increase swimmers' commitment using the Kick, Stroke, and Swim program, as the benefits of using shared assessment on motivation have been demonstrated [41]. Kuo, Chen, Chu, Yang, and Chen [28] have as their main objective to develop a mobile learning system for a Kung Fu Tai-Chi PE course through a peer-assessment mobile PE approach. Martos-García, Usabiaga, and Valencia-Peris [84] compare the perception of two groups of students from two different universities about the use of formative and shared assessment strategies through the blogosphere to evaluate the learning of the essential aspects of two traditional sports in the cities where the research was carried out: Basque pelota and Valencian pilota. Soytürk's paper [82] analyzes the efficiency of the use of formative assessment and peer assessment by future PE teachers to evaluate the learning of four volleyball techniques. Research by Eather, Riley, Miller, and Bradley [76] seeks to explore the benefits of using peer dialogue assessment in different invasion games. Two years later, these same authors [83] compare the effects of the use of peer dialogue assessment with those of dialogical feedback provided by an academic in different sports games. 
On the other hand, the works of Asun-Dieste, Romero-Martín, Aparicio-Herguedas, and Fraile-Aranda [80], of Aarskog [73], and of Macken, MacPhail, and Calderón [79] are not based on a specific content, but deal with different contents of the curriculum, with different purposes: while the first article seeks to detect the difficulties, from a proxemic point of view, that occur when leading physical activity classes, the other two focus on how students participate in the assessment of their own learning as a starting point, since only a correct use of shared assessment can produce beneficial effects on learning [38]. The fact that some articles deal with the use of peer assessment in all curricular content shows the great transversality and applicability of the use of formative assessment [2].

Outcomes
Although the thirteen articles included in the review address peer assessment in PE in one form or another, not all of them have assessment as their main subject. This is the case with the article by Asun-Dieste, Romero-Martín, Aparicio-Herguedas, and Fraile-Aranda [80], whose purpose is to identify the spatial difficulties that arise when leading a physical activity session, categorizing the results into four groups: teacher orientation and position, group position and organization, teacher movement, and physical and affective distance-immediacy established between teacher and students. In this paper, therefore, peer assessment is used as an instrument to achieve the objectives of the research, not as an end in itself. The three articles with a theoretical focus have not been included in this category, since they do not generate results from field research. The other nine articles do generate results related to formative assessment in general and to peer assessment in particular. Two articles [28,84] show among their results an increase in student motivation after the use of peer assessment processes, in line with Santana, Bedoya, and Robles [41]. The two works from Australia [76,83] show that the use of peer assessment produces an improvement in perceived teaching confidence and competence, and in teaching self-efficacy, coinciding with the results of previous research [42,43]. As reflected in the scientific literature, peer assessment produces an improvement in learning awareness [2,19]. Similar results were obtained in the article by Canadas, Castejón, and Santos-Pastor [77], whose participants were able to culminate the peer-assessment process with the rating of their peers.
In the research by López-Pastor, Pérez-Pueyo, Barba, and Lorente-Catalán [78], it is concluded that previous knowledge of the assessment instrument is essential, since, although the use of rubrics or other instruments favors the development of the process, the previous experiences of the students with the formative and shared assessment are decisive, in line with the work of Li, Xiong, Hunter, Guo, and Tywoniw [38], as the students expand their awareness of learning by being responsible for it [2]. This is precisely the conclusion reached by Alstot [81] in his work, showing that the students are capable of making a correct evaluation of the learning of their peers after a sufficiently long process of training in the formative and shared-assessment procedures. Soytürk's [82] research shows a high correlation between the results obtained through peer assessment and through self-assessment, coinciding with Krause, O'Neil, and Dauenhauer [49] and with Chróinín and Cosgrave [48], since both procedures depend largely on previous training and experience [38]. This result is particularly favorable for time management by the teacher, since, if students participate in the assessment process, it can facilitate the teacher's work and allow more time to be spent on the most pressing issues [33,34]. Another result of the application of peer assessment is the increase of socialization, as detailed in two of the research studies reviewed [28,84], since dialogue between students is required [27] and interaction between all participants in the process is increased [25]. No results have been found that refer to the self-regulation of learning through the use of peer assessment [15,17,18] or to the development of critical thinking [29].

Conclusions
The review shows that little research has been done in recent years on the use of peer assessment in PE. The existing bibliography on formative or alternative assessment is extensive and includes the shared-assessment approaches within which peer assessment is found. However, as seen in this review, few articles address peer assessment in PE specifically, which makes it difficult to identify which of the formative assessment processes is responsible for the results reported in the research. Only thirteen articles have been published, and only ten of them are field research. The two countries that have done most research on the subject are the USA and Spain, with more than half of the total publications in the last five years. This shows the need to further internationalize research on formative and shared assessment. There is an effort to investigate the benefits of shared assessment at the university level, but much more research is needed in the primary and secondary school context. It is essential that universities engage with the school context in order to obtain the most reliable information possible on what is happening in the teaching-learning process. Only through a real transfer from theory to educational practice can the existing dynamics be changed as desired, and it is essential that research efforts are directed towards a transformation of educational reality throughout all stages. The methodological approach is quite heterogeneous, with qualitative, quantitative, and mixed research, and with much diversity in the purposes of the studies. This lack of continuity in research produces a great variety of results that cannot be compared with those of similar studies, and it leaves gaps in knowledge that have not yet been covered, such as the effects of peer assessment on the self-regulation of learning and on the development of critical thinking or motor skills. There are studies that show these benefits in other areas of knowledge, but there is a lack of scientific evidence applied to PE.
The main contribution of this work is to provide the scientific literature with the first review on the use of peer assessment in PE, since until now none existed. Furthermore, it offers comprehensive information, divided into different categories, which can serve as an aid for future reviews on assessment in PE.
As a line of future research, and given the large number of unaddressed aspects in the scientific literature that this review has brought to light, it would be interesting to carry out research in schools to test the benefits of peer assessment for self-regulation of learning, critical thinking, and student learning. It is essential that teachers can research and reflect on their own practice in order to increase scientific and practical evidence on formative assessment in general and peer assessment in particular in the context of PE. Research on formative and shared assessment in general, and on peer assessment in particular, shows the great impact that these processes have on learning, so it is essential to explore the extent to which these benefits occur when peer assessment is applied in school PE. Students cannot simply be passive recipients of content; they must be part of the teaching-learning process in order to self-regulate their progress, to know the reason for the activities they carry out, and to understand the evaluation criteria that will verify their learning. In this way they can contribute to improving the process and achieve an optimal development of their physical, cognitive, affective, and social potential.

Conflicts of Interest:
The authors declare no conflict of interest.