Article

A Study on the Consistency and Efficiency of Student Performance Evaluation Methods: A Mathematical Framework and Comparative Simulation Results

by Cecilia Leal-Ramírez 1, Héctor Alonso Echavarría-Heras 1,*, Enrique Villa-Diharce 2 and Horacio Haro-Avalos 3

1 Centro de Investigación Científica y de Estudios Superiores de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, Baja California, Mexico
2 Centro de Investigación en Matemáticas, A.C. Jalisco s/n, Mineral Valenciana, Guanajuato 36240, Guanajuato, Mexico
3 Instituto Mexicano de Investigación en Pesca y Acuacultura Sustentables (IMIPAS), Km 97.5 Carretera Tijuana-Ensenada, s/n, El Sauzal de Rodríguez, Ensenada 22760, Baja California, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 6014; https://doi.org/10.3390/app15116014
Submission received: 7 April 2025 / Revised: 9 May 2025 / Accepted: 16 May 2025 / Published: 27 May 2025

Abstract:
Background: Consistent evaluation methods foster fairness, reduce bias, and enhance student understanding and motivation. Notably, mathematical inconsistencies, such as improper weighting, flawed averaging, and unsound scaling, can undermine the accuracy and reliability of assigned grades. This paper addresses the critical need for consistent student evaluation methods, with a primary focus on ensuring mathematical consistency within grading systems. Methods: We propose a scheme aimed at identifying inconsistencies in student evaluation related to the mathematical framework of the grading method used. To explain the functioning of our construct, we provide mathematical representations of conventional grading methods, including summative assessments, rubrics, and the Systematic Task-Based Assessment Method (STBAM) that we have recently developed, which incorporates both traditional and fuzzy logic-based grading modules. We introduce a Consistency Index ($CI_M$) depending on the Mean Absolute Deviation of assigned scores ($MAD_M$) and a method's Complexity Pointer ($CP_M$). We also propose a method's Efficiency Index ($\beta_M$) expressed in terms of the Consistency Index and the Complexity Pointer. We compared the mathematical consistency and efficiency of the methods addressed in this study through simulation runs. Results: We demonstrated how the proposed indices can reveal the strengths and weaknesses of each grading scheme analysed. Conclusions: The fuzzy logic-based modulus of the STBAM yielded the highest values of $CI_M$ and $\beta_M$. However, performing a pending analysis of scalability, teacher training, and cultural adaptability would be essential to strengthen the potential of the STBAM to be adopted as a reliable grading alternative to conventional grading approaches. In the meantime, our approach could provide a clear, logical, and defensible framework for testing the mathematical consistency of student assessment methods.

1. Introduction

Consistent performance evaluation methods ensure the use of the same criteria and standards for assessing all students [1,2]. While this does not guarantee the complete elimination of bias, particularly that arising from teacher subjectivity, it contributes to ensuring that all students within a group have an equal opportunity to demonstrate their learning. In this context, the mathematical consistency of a grading method fosters fairness and equity at the group level [3].
It is worth adding that, generally, when evaluation methods are stable, students understand how they will be assessed and what they are expected to achieve. As a result, transparency is ensured, which further sustains fairness and reduces anxiety [4]. We can therefore state with confidence that consistent evaluation methods lead to a more accurate assessment of student learning. Conversely, if evaluation methods vary, comparing student performance and tracking progress over time becomes difficult [5].
Consistent methods produce more reliable student scoring results. This means that when the same assessment is applied at different times, the results are expected to be similar [6]. In particular, assessing mathematical consistency builds confidence in evaluation methods, as educators can identify areas where there may be conflict or bias. This enables the improvement and accountability of assessment methods by suggesting adjustments and advancements that ensure more accurate and fair assessments [7,8]. Establishing consistent evaluation methods holds educators accountable for their grading practices. This encourages them to use fair and practical protocols to assess student learning [9]. Moreover, consistent evaluation methods allow students to understand what is expected of them. This can increase their motivation to learn and achieve [10]. Implementing consistent evaluation methods enables students to understand their strengths and weaknesses. This helps them focus on areas for improvement, facilitating targeted feedback [11]. In summary, we can state that assessing the mathematical consistency of a student evaluation method not only promotes student success but also improves teaching practices while enhancing the overall quality of the learning experience.
Our primary concern here is not just general consistency in evaluation, but also the mathematical consistency underlying how we assign grades. For instance, when relying on a summative approach, grading often involves combining scores from different learning assignment sections. A mathematically inconsistent weighting of these components could unintentionally bias the final grade. For example, if a small section carries the same weight as a major one, a student might be unfairly penalized for a minor slip-up [12]. Therefore, a quest for the mathematical consistency of a grading method begins by preventing unintended bias tied to improper weighting [13]. How we average scores can also introduce bias. A simple average might not accurately reflect a student's overall progress [14]. We specifically refer to the lack of information in plain averages, which makes it challenging to infer whether performance has improved significantly over time. More sophisticated methods, such as weighted averages or trend analysis, may be necessary for a more accurate and fair assessment. Hence, inconveniences tied to averaging should also be avoided for the sake of consistency [15]. If scores are scaled or transformed (e.g., converting raw scores to percentages), the scaling method needs to be mathematically sound. Otherwise, it can distort the meaning of the scores and misrepresent a student's actual understanding [16]. Accordingly, attention to the consistency of a grading method ensures that scaling leads to an accurate representation of learning. If grade distribution is not based on sound mathematical principles, it can lead to unfair comparisons between students and create unnecessary competition. Therefore, how grades are distributed also matters in assessing the consistency of an assessment method [17,18]. When the mathematical setup of grading is clear and well defined, students can recognize how their grades are determined. This transparency fosters trust and alleviates concerns about the grading process [19]. Thus, testing the consistency of a scoring scheme should examine to what extent the involved mathematical setup enhances clarity, leading to transparency and trust. A mathematically consistent grading system is reproducible. This means that if the same student were evaluated by different teachers using the same system, they would likely receive similar grades [20]. In this way, consistency enhances the reliability and validity of the grades. A mathematically sound grading protocol allows for meaningful analysis of student performance data. Instructors can utilize these data to identify areas where students are struggling; to evaluate the effectiveness of teaching methods; and to make informed decisions about curriculum and instruction. Hence, in our view, a consistent grading scheme requires mathematical coherence to facilitate data-driven improvement [21]. Consistent mathematical frameworks enable comparisons of student performance across different classes, schools, or even districts. This can help to identify best practices and areas for improvement at a larger scale. Therefore, mathematical consistency also supports sound comparisons of student performance [22,23]. In essence, we have attempted to highlight that a mathematically consistent approach to grading ensures that student evaluations are not arbitrary or subjective. On the contrary, consistency in the sense that we are dealing with sustains clear, logical, and defensible evaluation practices. This is crucial for promoting fairness, accuracy, and meaningfulness in student assessment.
This paper proposes a framework for evaluating the consistency of the formal setup of designated student performance assessment protocols. In order to explore our construct's performance, we will consider the Summative assessment (S) and the Rubric (R) as conventional evaluation approaches. These methods rely on standardized tests that assess specific knowledge or skills through multiple-choice, true/false, or fill-in-the-blank formats, where correct answers are quantified and the final score is a straightforward sum. Additionally, for comparison, based on the integrated instruction method presented in [24], we built what we refer to as the Systematic Task-Based Assessment Method (STBAM), which comprises both a direct grading modulus (ST), bearing a conventional structure, and its fuzzy logic-based counterpart (ST-FIS). To explore the mathematical consistency of each one of the grading schemes addressed here, we first conceived what we consider to be its fundamental mathematical formalization. Moreover, we adapted a Consistency Index ($CI_M$) depending on the Mean Absolute Deviation ($MAD_M$) of the grades entailed by the output of the considered grading method $M$, plus a Complexity Pointer ($CP_M$) expressed as a weighted sum of the number of components involved, the number of distinct elements contributing to the final grade, and the number of mathematical operations required to compute it. We also combined the $CP_M$, $CI_M$, and $MAD_M$ values to offer what we conceive as a grading method's efficiency index ($\beta_M$). To explore the consistency of the formal structure of the conventional S and R methods compared to those deriving from the STBAM advanced here, we will rely on simulation runs to obtain the values of $MAD_M$, $CI_M$, and $\beta_M$ associated with each grading method and use the values of these indices to stress the benefits and shortcomings projected for each method.
Section 2 provides a summary of the Summative, Rubric, ST, and ST-FIS methods, along with their mathematical descriptions. Section 3 presents the results and discussion. Section 4 includes a discussion of the projections emanating from our construct. Section 5 concerns the conclusions of this study. We also provide five appendices that explain the specific details of the implementation of our consistency and efficiency rating device. Executable code for the calculations involved is provided in the Supplementary Materials section.

2. Materials and Method

Section 2.1 provides a summary of the summative assessment and its mathematical representation, which allows exploration of the consistency and efficiency of the scheme. Section 2.2 deals with generalities of the Rubric and offers its mathematical setup. Section 2.3 concerns the ST arrangement. Section 2.4 gives a note on the concept of objective grades, and Section 2.5 deals with the formalities of the ST-FIS.
We rely on simulation procedures to explore the consistency and efficiency of the grading protocols considered here. Generally, within a mathematical framework, simulations serve as a form of empirical investigation. This is particularly relevant when evaluating constructs such as grading methods. These can be conceptualized as algorithmic or rule-based systems. Simulation allows us to systematically explore the behavior of a grading method under controlled variations in input parameters. For example, we can examine how these schemes influence final grades by applying different weighting schemes to simulated student performance profiles. Such analyses reveal whether certain methods disproportionately penalize specific performance patterns (e.g., when a single poorly performed high-stakes task overrides consistent performance in other areas), thus exposing potential inconsistencies or biases in the method's mathematical design. Agreement measures such as the Mean Absolute Deviation or the Mean Squared Deviation are used to assess the stability and sensitivity of grading outcomes under these scenarios. While we do not use actual student data, the simulations show how grading rules function in practice by exposing their structural implications consistently and repeatably.
Furthermore, well-designed simulations can yield meaningful insights about a method’s fairness, robustness, and internal coherence. If simulations show that a method produces predictable and proportionate outcomes across diverse scenarios, this supports its mathematical consistency. Conversely, recurrent anomalies or biases in simulated results can signal structural shortcomings. In this sense, simulation serves as a bridge between theoretical analysis and empirical evaluation. It allows us to investigate the practical implications of mathematical constructs within a controlled environment, contributing to our epistemological understanding of their behavior. We therefore consider simulation a valid and informative tool for the scope and objectives of our study.

2.1. Summative Evaluation

Summative evaluation can be described as an assessment assembly that takes place at the end of a learning period. By this, we mean, for instance, the conclusion of a learning activity, a course, or a semester. The summative evaluation scheme targets measuring the extent to which the learning objectives have been achieved. To achieve its aims, this assessment approach often involves (1) final exams, which serve as comprehensive assessments covering all course material; (2) culminating tasks, described as projects and presentations designed to demonstrate knowledge and the display of skills; and (3) standardized tests, described as uniform examinations that aim to compare student performance across different groups.

2.1.1. Formalization of the Summative Assessment Protocol

When the S method of evaluation, based on traditional logic, is used to characterize $E$, it is necessary to first establish the different sections or components of the learning to be evaluated. For this explanation, we will consider that the symbol $s_i$ represents each section and that we have a number $m$ of them; that is, we formally define the $(m \times 1)$ matrix $A$ of these assignment sections,

$$ A = \begin{bmatrix} s_1 & s_2 & \cdots & s_{m-1} & s_m \end{bmatrix}^T. \qquad (1) $$

Each assignment section $s_i$, for $i = 1, 2, \ldots, m$, will have an associated weight, symbolized by $w_i$. The weight determines the relative importance that the development of the learning section $s_i$ bears in determining the value of the global evaluation $es_j$ of the $j$-th student's performance. Then, we will also have to consider a matrix of weights $W$ with the same number of entries as the matrix $A$ of learning sections. Let us say

$$ W = \begin{bmatrix} w_1 & w_2 & \cdots & w_{m-1} & w_m \end{bmatrix}^T, \qquad (2) $$

where the following condition must hold:

$$ \sum_{i=1}^{m} w_i = 1. \qquad (3) $$

In addition, for $j = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, m$, we will use the symbol $p_{ji}$ to represent the number, expressed as a percentage, standing for the score or grade assigned by the teacher to the $j$-th student $x_j$ for the performance shown while carrying out the activities contemplated in the assignment section $s_i$. Therefore, with the student group $G$ (formally considered below), we associate a global matrix of scores $P$, by which we mean

$$ P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{bmatrix}. \qquad (4) $$

In particular, for the $j$-th student $x_j$, we count on a matrix $P_{j,A}$ of a $1 \times m$ dimension, which represents the evaluations obtained for performance in carrying out the activities associated with the learning sections $s_i$. To be precise,

$$ P_{j,A} = \begin{bmatrix} p_{j1} & p_{j2} & \cdots & p_{j(m-1)} & p_{jm} \end{bmatrix}^T. \qquad (5) $$

Consider now a group $G$ which includes a number $n$ of students. Let us represent through $es(j, A, W)$, for $j = 1, 2, \ldots, n$, the total assessment or grade that the Summative Assessment (SA) method produces in correspondence to a student $x_j$ for the displayed performance. Let us symbolize by $E_S$ the $(n \times 1)$ matrix of grades that are assigned to the $G$ group, that is,

$$ E_S = \begin{bmatrix} es(1, A, W) & es(2, A, W) & \cdots & es(n-1, A, W) & es(n, A, W) \end{bmatrix}^T. \qquad (6) $$

Therefore, the global assessment matrix $E_S$ of the group $G$ is obtained by multiplying the matrices $P$ and $W$, by which we mean

$$ E_S = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix}. \qquad (7) $$

Specifically, the teacher's grade $es(j, A, W)$ assigned through the summative scheme is given by the scalar product of $P_{j,A}$ and $W$, that is,

$$ es(j, A, W) = \sum_{i=1}^{m} p_{ji} w_i. \qquad (8) $$
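For illustration, a minimal MATLAB sketch of Equations (7) and (8), with hypothetical normalized scores and weights for n = 3 students and m = 3 sections, could read as follows:

% Hypothetical normalized section scores for n = 3 students and m = 3 sections (Equation (4))
P = [0.80 0.90 0.70;
     0.60 0.75 0.95;
     0.85 0.65 0.80];
W = [0.5; 0.3; 0.2];    % section weights summing to 1 (Equations (2) and (3))
ES = P * W;             % global assessment matrix of the group (Equation (7))
es1 = P(1,:) * W;       % scalar-product grade of the first student (Equation (8))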
To introduce a criterion to assess the consistency of the mathematical scheme backing the SA method, we will assume that, independently of the teacher's grading, the student's actual abilities determine a matrix $PO_{j,A}$ of objective scores, namely

$$ PO_{j,A} = \begin{bmatrix} po_{j1} & po_{j2} & \cdots & po_{j(m-1)} & po_{jm} \end{bmatrix}^T. \qquad (9) $$

We can consider the associated objective grade $eso(j, A, W)$ given by the scalar product of $PO_{j,A}$ and $W$, that is,

$$ eso(j, A, W) = \sum_{i=1}^{m} po_{ji} w_i. \qquad (10) $$

The absolute deviation between $es(j, A, W)$ and $eso(j, A, W)$ is denoted through the symbol $\Delta es(j, es, eso)$ and is defined by

$$ \Delta es(j, es, eso) = \left| eso(j, A, W) - es(j, A, W) \right|. \qquad (11) $$

Continuing with the building of the aforementioned consistency evaluation of the SA method, for the score $p_{ji}$ assigned by the teacher, we introduce a sequence labelling index $q = 1, 2, \ldots, \beta_m$, so we can consider simulated replicates $p(q)_{ji}$ for the score $p_{ji}$. Then, we can consider deviations given by

$$ \Delta es(j, es_q, eso) = \left| eso(j, A, W) - es_q(j, A, W) \right|, \qquad (12) $$

where the term $es_q(j, A, W)$ stands for the $q$-th simulated replicate of the teacher's assigned grades $es(j, A, W)$ that agree with Equation (8), as defined by

$$ es_q(j, A, W) = \sum_{i=1}^{m} p(q)_{ji} w_i. \qquad (13) $$

Let $\overline{eso}(j, A, W)$ stand for the average value of the objective grades $eso(j, A, W)$. We can then obtain the Summative method-determined Mean Absolute Deviation of $es_q(j, A, W)$ relative to $\overline{eso}(j, A, W)$. We denote this statistic by means of $\overline{es}(j, es_q, \overline{eso})$ and define it through

$$ \overline{es}(j, es_q, \overline{eso}) = \frac{\sum_{q=1}^{\beta_m} \left| \overline{eso}(j, A, W) - es_q(j, A, W) \right|}{\beta_m}. \qquad (14) $$

We set the convention that $eso(j, A, W)$ lies in the interval $(0, 1]$. Then, we can set $\overline{eso}(j, A, W) = 1/2$. In what follows, we adopt $\overline{es}(j, es_q, \overline{eso})$ as a proxy for the average rating estimation error of the summative method granting grade $es(j, A, W)$, relative to the standardized mean passing grade $\overline{eso}(j, A, W)$.

2.1.2. Simulation Procedure for the Summative Scheme

In our simulation, we consider that the grades assigned by the teacher may vary within the range allowed by the educational system (0 to 100). Therefore, each simulated replicate $p(q)_{ji}$ is generated as a random integer between 0 and 100 using the function randi(100) in MATLAB 2018b, which is then divided by 100 for normalization purposes, or using the function rand() in MATLAB 2018b, which produces numbers between 0 and 1. This approach models a possible subjective variation in the teacher's judgment, maintaining pedagogical validity by avoiding negative or unrealistic grades. Each set of simulated learning section scores is then combined with the matrix of weights $W$ to obtain a final score $es_q(j, A, W)$. This is subtracted from the mean score, and the absolute error between the two is calculated to obtain the corresponding $MAD_S$ value. Then, this value is combined with the complexity pointer $CP_S$ to calculate $CI_S$ and $\beta_S$. Appendix A provides a thorough explanation of the implementation of the Summative grading method.
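As an illustration, a minimal MATLAB sketch of this replication loop, assuming m = 3 sections with hypothetical weights and beta_m = 100 replicates, and stopping at the MAD_S computation (the CI_S and beta_S steps are detailed in Appendix A), could read as follows:

% Monte Carlo replicates of the summative grade (Equations (12)-(14))
m = 3; beta_m = 100;
W = [0.5; 0.3; 0.2];               % hypothetical weights summing to 1 (Equation (3))
eso_bar = 0.5;                     % standardized mean objective grade
es_q = zeros(beta_m, 1);
for q = 1:beta_m
    p_q = randi(100, 1, m) / 100;  % simulated replicate scores p(q)_ji
    es_q(q) = p_q * W;             % replicate grade es_q(j, A, W) (Equation (13))
end
MAD_S = mean(abs(eso_bar - es_q)); % Mean Absolute Deviation (Equation (14))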

2.2. The Rubric

Rubrics are regarded as valuable tools in educational evaluation. These constructs are often praised for providing clear, consistent student performance assessment and structured feedback that supports effective learning. As a scoring tool, a rubric outlines specific criteria for evaluating performance on an assignment or task, defining various levels of achievement across these criteria. This structure helps to establish clear expectations for students and ensures consistent standards for grading. The elements comprising a rubric include the following. (1) Criteria, which specify what will be assessed. Along with the criteria, the rubric scheme considers weights for each criterion. Weights designate the relevance of the associated criterion in the determination of the overall student's assessment. (2) Performance Levels, which define stages of performance for each criterion. (3) Descriptors, which help clarify what is expected at each performance level. Descriptors provide detailed explanations of the requirements to meet each level of performance for every criterion. (4) Scoring, which generally assigns evaluations based on a summative scheme. This allows for converting quantitative assessments into qualitative scores.

2.2.1. Mathematical Setup for the Rubric

There is no single universal formula for all rubrics, as they vary according to the context and evaluation criteria. We offer here a possible adaptation of a qualitative rubric. This method is usually represented by a table, which we will refer to concisely using the symbol $T(R)$ (see Table A2).
Usually, the criteria are accommodated in the rows of $T(R)$. These indicate the specific aspects of the task to be assessed. They are placed starting from the second cell of the first column of $T(R)$. We will assume that we have a number $m$ of them, that is, $i = 1, 2, \ldots, m$. In this way, we introduce the matrix $H$ of the criteria:

$$ H = \begin{bmatrix} h_1 & h_2 & \cdots & h_{m-1} & h_m \end{bmatrix}^T. \qquad (15) $$

In what follows, the symbol $ph(j)_i$ will be used to denote the performance of student $x_j$ relative to achieving the tasks linked to criterion $h_i$.
Typically, a simple rubric $R$ includes relative weights assigned to its criteria. These determine the degree of importance in assessing each student's overall performance. Let $\rho_i$ be the relative weight of the criterion $h_i$, $i = 1, 2, \ldots, m$, in the global assessment. By this, we mean that $\rho_i$ represents the percentage of the overall student's assessment which is fixed by the criterion $h_i$. Let us represent through $\mathrm{P}$ the $(m \times 1)$ matrix of weights $\rho_i$, $i = 1, 2, \ldots, m$. This matrix has the same number of entries as the matrix $H$ of criteria, let us say

$$ \mathrm{P} = \begin{bmatrix} \rho_1 & \rho_2 & \cdots & \rho_{m-1} & \rho_m \end{bmatrix}^T, \qquad (16) $$

where the condition

$$ \sum_{i=1}^{m} \rho_i = 1 \qquad (17) $$

must be met.
Suppose $T(R)$ represents the table of a simple rubric $R$ expressed in its general form, which is obtained by considering a number $m$ of evaluation criteria $h_i$ according to Equation (15). We will use the symbol $cr_{is}$ to represent the cell in $T(R)$ that is determined by the intersection of the $i$-th row and the $s$-th column of $T(R)$. From the second cell of the first line of $T(R)$, the general identifiers of the possible levels of performance exhibited by students when carrying out the activities stipulated in the different criteria appear. These generally refer to the various qualities that the expected achievement can have for each criterion. A linguistic descriptor will identify each level of performance, denoted by a symbol $l_s$, where $s = 1, 2, \ldots, k$. Then, the formalization of $R$ must include a matrix $PL$ given by

$$ PL = \begin{bmatrix} l_1 & l_2 & \cdots & l_{k-1} & l_k \end{bmatrix}^T. \qquad (18) $$

Additionally, we assign to each level of performance $l_s$ a rating $\alpha(l_s)$ according to the hierarchical ordering set by the inequality

$$ \alpha(l_1) < \alpha(l_2) < \cdots < \alpha(l_{k-1}) < \alpha(l_k). \qquad (19) $$

Usually, the Rubric formulation sets

$$ \alpha(l_s) = s. \qquad (20) $$

Then, symbolically, for a simple rubric, we will have a matrix $POL$ of ordered level indicators, namely,

$$ POL = \begin{bmatrix} \alpha(l_1) & \alpha(l_2) & \cdots & \alpha(l_{k-1}) & \alpha(l_k) \end{bmatrix}^T. \qquad (21) $$
In addition, for $j = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, m$, we will represent using $\vartheta(j)_{is}$ the numerical ranking that, according to the performance ordering of Inequality (19), was obtained by student $x_j$ for achievement shown in carrying out the activities contemplated in the criterion $h_i$. Then, we have

$$ \vartheta(j)_{is} = I(ph(j)_i \to l_s)\, \alpha(l_s), \qquad (22) $$

where through the expression $ph(j)_i \to l_s$ we point to the condition that the $j$-th student has obtained an assessment $l_s$ for work displayed in the activities aligning with the criterion $h_i$, and $I(ph(j)_i \to l_s)$ is the indicator function of the condition $(ph(j)_i \to l_s)$, by which we mean

$$ I(ph(j)_i \to l_s) = \begin{cases} 1 & \text{if the } j\text{-th student performed } l_s \text{ in criterion } h_i \\ 0 & \text{if the } j\text{-th student did not perform } l_s \text{ in criterion } h_i. \end{cases} \qquad (23) $$

Moreover, in the general case, the cells $cr_{is}$ of $T(R)$ implicitly contain information associated with the evaluation criteria $h_i$ and with the performance levels $l_s$. In fact, the data carried by each cell $cr_{is}$ can be conceived as a ternary association of the linguistic descriptors $u_{is}$, $l_s$ and the numerical hierarchy indicator $\alpha(l_s)$. The linguistic descriptor $u_{is}$ specifies the characteristics or skills that the student must demonstrate to opt for the $s$-th performance level $l_s$ while carrying out the provisions of the $i$-th criterion. Then, we will have an association

$$ cr_{is} \to \left( u_{is}, l_s, \vartheta(j)_{is} \right). \qquad (24) $$

Even more so, in the general case of a regular $R$, in addition to the above triad which typifies the cells $cr_{is}$ of $T(R)$, for each evaluated student $x_j$, with $j = 1, 2, \ldots, n$, the rubric's formal setup includes a matrix $A(j, H, PL)$ of dimension $(m \times k)$, with entries $a(j)_{is}$ for $j = 1, 2, \ldots, n$, $i = 1, 2, \ldots, m$, and $s = 1, 2, \ldots, k$, which are given by

$$ a(j)_{is} = \vartheta(j)_{is}\, \rho_i. \qquad (25) $$

We rely on the symbol $R_i A(j, H, PL)$ to designate the $i$-th row of $A(j, H, PL)$, that is,

$$ R_i A(j, H, PL) = \left( a(j)_{i1}, a(j)_{i2}, \ldots, a(j)_{i(k-1)}, a(j)_{ik} \right). \qquad (26) $$

Moreover, for the $j$-th student, we also let $TA(j, H, PL)$ represent the total sum of the entries of $A(j, H, PL)$, let us say

$$ TA(j, H, PL) = \sum_{i=1}^{m} \sum_{s=1}^{k} a(j)_{is}. \qquad (27) $$

Now, for $ph(j)_i$ expressing the quality of accomplishment of the duties tied to the $i$-th criterion by the $j$-th student, the rubric's arrangement assigns one and only one performance level. We will denote this performance level using the symbol $l_{s(i)}$. We also convene that $\alpha(l_{s(i)})$ stands for the numerical hierarchy indicator of $l_{s(i)}$. Thus, we will have that all the entries of $R_i A(j, H, PL)$ but one will vanish. We denote such a non-vanishing entry through the symbol $a(j)_{is(i)}$ which, using Equation (25), transforms into

$$ a(j)_{is(i)} = I(ph(j)_i \to l_{s(i)})\, \alpha(l_{s(i)})\, \rho_i, \qquad (28) $$
provided that for $i = 1, 2, \ldots, m$, we have

$$ \alpha(l_{s(i)}) \in \{1, 2, \ldots, k\}. \qquad (29) $$

Likewise, we let the symbol $S(j, H, PL)$ represent the $m$-tuple

$$ S(j, H, PL) = \left( a(j)_{1s(1)}, a(j)_{2s(2)}, \ldots, a(j)_{(m-1)s(m-1)}, a(j)_{ms(m)} \right). \qquad (30) $$

Then, we can define

$$ TS(j, H, PL) = \sum_{i=1}^{m} a(j)_{is(i)}. \qquad (31) $$

From Equations (27) and (31), we would have the equivalence

$$ TA(j, H, PL) = TS(j, H, PL). \qquad (32) $$

Therefore, we set the total direct quantitative grade $er(j, H, PL)$ assigned by the teacher to the $j$-th student to be

$$ er(j, H, PL) = \frac{TS(j, H, PL)}{k} = \frac{TA(j, H, PL)}{k}. \qquad (33) $$
Consider a group $G$ that includes a number $n$ of students. Let $er(j, H, PL)$, for $j = 1, 2, \ldots, n$, be the global evaluation or grade that $R$ assigns to a student $x_j$ in a particular assignment. We will then have an $(n \times 1)$ matrix $E_R$ of global grades assigned to $G$, namely,

$$ E_R = \begin{bmatrix} er(1, H, PL) & er(2, H, PL) & \cdots & er(n-1, H, PL) & er(n, H, PL) \end{bmatrix}^T. \qquad (34) $$

Additionally, $er(j, H, PL)$ provides a criterion for assigning a qualitative assessment for the overall performance of the $j$-th student. This is symbolized by $erQ(j, H, PL)$. Moreover, we could propose the following inference rule:

$$ erQ(j, H, PL) = \begin{cases} Deficient & \text{if } 0 \le er(j, H, PL) < 1/k \\ Low & \text{if } 1/k \le er(j, H, PL) < 2/k \\ Acceptable & \text{if } 2/k \le er(j, H, PL) < 3/k \\ Good & \text{if } 3/k \le er(j, H, PL) < 4/k \\ Outstanding & \text{if } 4/k \le er(j, H, PL) \le 1. \end{cases} \qquad (35) $$
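As an illustration of the inference rule in Equation (35), a small MATLAB sketch, assuming k = 5 performance levels and a hypothetical grade er = 0.62, might read as follows:

% Map a normalized rubric grade er in (0, 1] to a qualitative label (Equation (35))
k = 5; er = 0.62;
edges = (0:4) / k;                            % thresholds 0, 1/k, 2/k, 3/k, 4/k
labels = {'Deficient','Low','Acceptable','Good','Outstanding'};
erQ = labels{find(er >= edges, 1, 'last')};   % yields 'Good' for er = 0.62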
In a similar manner to the way we elaborated for the SA method, we consider that, separately from the teacher's assignation of the vector $S(j, H, PL)$ as given by Equation (30), the student's abilities determine an objective performance in achieving the various activities outlined in the criteria. Then, analogously, we in turn represent by means of the symbol $SO(j, H, PL)$ the objective performance rating vector counterpart to $S(j, H, PL)$, say

$$ SO(j, H, PL) = \left( ao(j)_{1s(1)}, ao(j)_{2s(2)}, \ldots, ao(j)_{(m-1)s(m-1)}, ao(j)_{ms(m)} \right). \qquad (36) $$

Correspondingly, we associate an objective scoring sum counterpart $TSO(j, H, PL)$ with the teacher-produced one $TS(j, H, PL)$, as given by Equation (31). Specifically,

$$ TSO(j, H, PL) = \sum_{i=1}^{m} ao(j)_{is(i)}. \qquad (37) $$

Therefore, we set the objective grade $ero(j, H, PL)$ determined by the $j$-th student's proficiency to be

$$ ero(j, H, PL) = \frac{TSO(j, H, PL)}{k}. \qquad (38) $$

Thus, we can consider the deviation of $er(j, H, PL)$ relative to $ero(j, H, PL)$, denoted through $\Delta er(j, S, SO)$ and defined by

$$ \Delta er(j, S, SO) = \left| ero(j, H, PL) - er(j, H, PL) \right|. \qquad (39) $$
In a similar manner as we did for the SA method, for a replicate labelling index $q = 1, 2, \ldots, \delta_m$, and keeping in mind Equation (30), we will consider simulated replicates $S_q(j, H, PL)$ given by

$$ S_q(j, H, PL) = \left( a_q(j)_{1s(1)}, a_q(j)_{2s(2)}, \ldots, a_q(j)_{ms(m)} \right). \qquad (40) $$

Then, in turn, we can consider replicate absolute deviations $\Delta er(j, er_q, ero)$, given by

$$ \Delta er(j, er_q, ero) = \left| ero(j, H, PL) - er_q(j, H, PL) \right|, \qquad (41) $$

where, agreeing with Equation (33), we have

$$ er_q(j, H, PL) = \frac{TS_q(j, H, PL)}{k}, \qquad (42) $$

with

$$ TS_q(j, H, PL) = \sum_{i=1}^{m} a_q(j)_{is(i)}, \qquad (43) $$

where $a_q(j)_{is(i)}$ is the $q$-th simulated value of $a(j)_{is(i)}$ (cf. Equation (28)).
We now set $\overline{ero}(j, H, PL)$ to represent the mean value of the objective grades $ero(j, H, PL)$. Then, the rubric's method determines a Mean Absolute Deviation of $er_q(j, H, PL)$ relative to $\overline{ero}(j, H, PL)$. This is denoted by $\overline{er}(j, H, PL)$ and given by

$$ \overline{er}(j, H, PL) = \frac{\sum_{q=1}^{\delta_m} \left| \overline{ero}(j, H, PL) - er_q(j, H, PL) \right|}{\delta_m}. \qquad (44) $$

As we did for the summative method, we adopt the convention that $ero(j, H, PL)$ lies in the interval $(0, 1]$. Then, in the simulation runs that yield $\overline{er}(j, H, PL)$, we take $\overline{ero}(j, H, PL) = 1/2$. In what follows, we embrace $\overline{er}(j, H, PL)$ as a proxy for the rating estimation error of the rubric's method conferring grade $er(j, H, PL)$, relative to $\overline{ero}(j, H, PL)$.

2.2.2. Simulation Procedure for the Rubric Protocol

The generation of the $q$-th simulated value of $a(j)_{is(i)}$ is carried out by assigning to each criterion $h_i$ a randomly chosen performance level $l_s$, $s \in \{1, 2, 3, \ldots, k\}$. This mapping is implemented by the MATLAB 2018b function randi([1, k]), which generates random integer values with uniform distribution across the set of possible performance levels. Each simulation run produces a complete vector of assigned levels for the $m$ criteria, representing a plausible configuration of performance. The level selected for each criterion is converted to a hierarchical numerical value through the function $\alpha(l_s) = s$, following the usual convention that performance levels are ordered increasingly. These hierarchical values are weighted using the relative weights $\rho_i$ of each criterion, and the total weighted sum across all criteria generates a rating $er_q(j, H, PL)$ according to Equation (42). The use of a uniform distribution for the selection of performance levels is justified by its methodological neutrality; it assumes that all levels have the same probability of being selected, in the absence of prior information on the student's tendency. This approach avoids introducing bias into the simulation and is consistent with the principles of Monte Carlo methods, which seek to explore the sensitivity of the assessment system to possible variability in student achievement patterns. Each simulated score $er_q(j, H, PL)$ is subtracted from the mean score, and the absolute error between the two is calculated to obtain the corresponding $MAD_R$ value. This value is then combined with the complexity pointer $CP_R$ to calculate $CI_R$ and $\beta_R$. Appendix B provides a detailed description of the implementation of the Rubric grading procedure.
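A condensed MATLAB sketch of this procedure, assuming m = 4 criteria with hypothetical weights, k = 5 performance levels, and delta_m = 100 runs (the CI_R and beta_R steps are detailed in Appendix B), could read as follows:

% Monte Carlo replicates of the rubric grade (Equations (40)-(44))
m = 4; k = 5; delta_m = 100;
rho = [0.4; 0.3; 0.2; 0.1];         % criterion weights summing to 1 (Equation (17))
ero_bar = 0.5;                      % standardized mean objective grade
er_q = zeros(delta_m, 1);
for q = 1:delta_m
    s_q = randi([1, k], m, 1);      % random level per criterion, with alpha(l_s) = s
    er_q(q) = (rho' * s_q) / k;     % replicate grade er_q(j, H, PL) (Equation (42))
end
MAD_R = mean(abs(ero_bar - er_q));  % Mean Absolute Deviation (Equation (44))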

2.3. The Systematic Task-Based Assessment Method (STBAM)

This section describes the formalities of the Systematic Task-Based Assessment Method (STBAM). This protocol generalizes the Integrated Instruction Method (IIM) introduced by [24]. The STBAM device currently being dealt with builds upon a formal structure that echoes the summative and rubric alternatives. However, like the IIM, it also contemplates a Fuzzy Inference System (FIS) module to assign a competence-based evaluation of a student’s performance.
Unlike the rubric method, the STBAM does not include evaluation criteria or assign individual weights. Instead, adhering to the IIM approach, the STBAM primarily relies on explicitly displaying the guidelines or indications that students must follow while completing a learning activity. The STBAM also includes specific indicators that, once evaluated, determine whether the activity was carried out successfully following a basic set of skill-building attributes. Furthermore, unlike in the rubric method, the indicators in the STBAM are not directly associated with performance levels. These indicators are grouped into three main traits corresponding to key aspects of competence acquisition (Learning, Procedure, and Attitude), categories that can be represented through fuzzy sets. In Section 2.3.1, we present the formalities of the conventional module of the STBAM. The output of this component allows comparison with the one produced by its FIS-driven integration presented in Section 2.3.2, which helps corroborate the advantages of the ensuing fuzzy logic-based alternative.

2.3.1. Formalization of ST, the Direct Modulus of the Systematic Task-Based Assessment Method

The ST protocol’s primary objective is to evaluate the acquisition of competencies required by a student to complete a specified learning activity. The structure of the ST protocol is based on a set of essential competence-related attributes that students must demonstrate while completing a given assignment. These skill-building inputs are Learning (L), Procedure (P), and Attitude (A).
The STBAM is represented by means of a table that we will refer to concisely as $T(ST)$. Unlike $T(R)$, $T(ST)$ only has four columns (cf. Table A4). In short, the tasks to be performed by students to carry out a learning activity will be referred to as directions. We assume that the learning activity comprises a number $m$ of these directions. We will denote a particular direction using the symbol $d_i$; therefore, we have $i = 1, 2, \ldots, m$. The directions $d_i$ are placed starting from the second cell of the first column of $T(ST)$. This way, formally, we conceive a matrix $D$ of the directions composing the STBAM, namely

$$ D = \begin{bmatrix} d_1 & d_2 & \cdots & d_{m-1} & d_m \end{bmatrix}^T. \qquad (45) $$
Each direction $d_i$ must describe exactly what the student must do. Each one of these directions must be designed so that the development of one or more skills can be demonstrated while completing the part of the learning activity with which it is associated.
Given a particular direction $d_i$, we will denote using the symbol $L_i$ the set of attributes $L_{ki}$, $k = 1, 2, \ldots, \pi(L, i)$, that a student must display to demonstrate the acquisition of the skills sheltered by the abilities-building factor $L$. In this regard, the student must exhibit that the extent of their cognitive pool is suitable for achieving the specific commands of direction $d_i$. As an example, an attribute $L_{ki}$ could stand for data, terms, facts, concepts, principles, theories, definitions, formulas, or key foundations related to the given direction $d_i$.
Correspondingly, $P_i$ will stand for the pool of attributes $P_{ki}$, $k = 1, 2, \ldots, \pi(P, i)$, that allow the student's performance to be aligned with the $P$ competence-building factor. In this regard, the application of knowledge must be demonstrated through practical activities. For example, $P_{ki}$ can include the ability to identify, explain, recall, apply, solve, analyze, synthesize, program, evaluate, justify, relate, calculate, correct, review, or choose.
Sequentially, $A_i$ will denote the set of attributes $A_{ki}$, $k = 1, 2, \ldots, \pi(A, i)$, that rate performance aligned with an $A$ capability. In this performance-qualifying segment, what is expected to be verified during the development of the learning activity is specifically the behavior that the student adopts to perform one or a set of actions. Then, the attribute $A_{ki}$ must be aligned with the objectives of the units, topics, and subtopics of a curriculum, as well as with the overarching competencies. For example, an attribute $A_{ki}$ could include critical, logical, analytical, abstract, creative, descriptive, or collaborative thinking, or the ability to solve a problem. Therefore, within the ST structure, we also accommodate the matrices

$$ L_i = \begin{bmatrix} L_{1i} & L_{2i} & \cdots & L_{\pi(L,i)i} \end{bmatrix}, \quad P_i = \begin{bmatrix} P_{1i} & P_{2i} & \cdots & P_{\pi(P,i)i} \end{bmatrix}, \quad A_i = \begin{bmatrix} A_{1i} & A_{2i} & \cdots & A_{\pi(A,i)i} \end{bmatrix}. \qquad (46) $$

To verify that the $j$-th student successfully carried out, according to the competence-tied attribute triad $(L, P, A)$, each instruction $d_i$ that $T(ST)$ comprises, we arrange a set of specific attainment indicators that are placed starting from the second cell of the second, third, and fourth columns of $T(ST)$, respectively.
Accordingly, for the $j$-th student, we will let the symbol $l(j)_{ki}$, with $k = 1, 2, \ldots, \pi(L, i)$, stand for an attainment indicator associated with the $L_{ki}$ attribute required to accomplish $d_i$ for $i = 1, 2, \ldots, m$. Similarly, $p(j)_{ki}$, for $k = 1, 2, \ldots, \pi(P, i)$, will represent the corresponding success indicator for the $P_{ki}$ attribute linked to performing direction $d_i$. In turn, $a(j)_{ki}$, where $k = 1, 2, \ldots, \pi(A, i)$, will represent the success indicator of the $A_{ki}$ attribute while performing direction $d_i$. Therefore, formally, for direction $d_i$, the structure of the ST protocol also includes the indicator matrices $IL(j)_i$, $IP(j)_i$, and $IA(j)_i$ with $1 \times \pi(L, i)$, $1 \times \pi(P, i)$, and $1 \times \pi(A, i)$ entries one-to-one, namely,

$$ IL(j)_i = \begin{bmatrix} l(j)_{1i} & l(j)_{2i} & \cdots & l(j)_{\pi(L,i)i} \end{bmatrix}, \quad IP(j)_i = \begin{bmatrix} p(j)_{1i} & p(j)_{2i} & \cdots & p(j)_{\pi(P,i)i} \end{bmatrix}, \quad IA(j)_i = \begin{bmatrix} a(j)_{1i} & a(j)_{2i} & \cdots & a(j)_{\pi(A,i)i} \end{bmatrix}. \qquad (47) $$
Each one of the indicators $l(j)_{ki}$, $p(j)_{ki}$, and $a(j)_{ki}$ can bear one of the binary values ✓ or X. The (✓) pointer will appear whenever the student's performance exhibits a particular manifestation of the general attributes Learning ($L_{ki}$), Procedure ($P_{ki}$), or Attitude ($A_{ki}$). Correspondingly, if one of these attributes is not displayed, then the (X) marker will show up. An instruction with indicators graded (✓) in one or more competence-bearing attributes ensures that the student not only “knows” something ($L$) but can also “do it” ($P$) and embraces the task with the right disposition ($A$). This approach guarantees holistic development, promoting learning beyond memorization.
In what follows, the symbolic statement $(l(j)_{ki} \to ✓)$ will signal that the $j$-th student indicator $l(j)_{ki}$ is linked to a success pointer. Similarly, $(p(j)_{ki} \to ✓)$ will signify that $p(j)_{ki}$ bears an achievement pointer, and $(a(j)_{ki} \to ✓)$ will be interpreted similarly.
The total number of indicators linked to direction $d_i$ that display a binary pointer ✓ for student $x_j$ will be denoted using the symbol $z(j)_{Li}$ and defined through

$$ z(j)_{Li} = \sum_{k=1}^{\pi(L,i)} I(l(j)_{ki} \to ✓), \qquad (48) $$

with $I(l(j)_{ki} \to ✓)$ being the indicator function of the statement that $l(j)_{ki}$ ties to a binary value ✓, namely,

$$ I(l(j)_{ki} \to ✓) = \begin{cases} 1 & \text{if } l(j)_{ki} \text{ ties to a binary pointer value ✓} \\ 0 & \text{if } l(j)_{ki} \text{ ties to a binary pointer value X.} \end{cases} \qquad (49) $$

This pool rates the student's commitment to adequately completing the tasks specified in direction $d_i$ according to the $L$ attribute. Many indicators marked ✓ indicate a successful display of learning competence.
Correspondingly, the total number of $p(j)_{ki}$ indicators associated with $d_i$ that display a binary pointer ✓ will be denoted using the symbol $z(j)_{Pi}$ and given by

$$ z(j)_{Pi} = \sum_{k=1}^{\pi(P,i)} I(p(j)_{ki} \to ✓), \qquad (50) $$

where $I(p(j)_{ki} \to ✓)$ stands for the indicator function of the statement that $p(j)_{ki}$ takes a binary value ✓, namely,

$$ I(p(j)_{ki} \to ✓) = \begin{cases} 1 & \text{if } p(j)_{ki} \text{ associates to a binary pointer value ✓} \\ 0 & \text{if } p(j)_{ki} \text{ associates to a binary pointer value X.} \end{cases} \qquad (51) $$

Then, the size of $z(j)_{Pi}$ rates the $j$-th student's dedication to adequately completing the tasks specified in direction $d_i$ according to the $P$ attribute. Many $p(j)_{ki}$ indicators bearing ✓ imply a successful display of procedural competence.
In turn, the total number of $a(j)_{ki}$ indicators pooling for $d_i$ that display a binary pointer ✓ will be denoted using the symbol $z(j)_{Ai}$ and given by

$$ z(j)_{Ai} = \sum_{k=1}^{\pi(A,i)} I(a(j)_{ki} \to ✓), \qquad (52) $$

where

$$ I(a(j)_{ki} \to ✓) = \begin{cases} 1 & \text{if } a(j)_{ki} \text{ takes a binary pointer value ✓} \\ 0 & \text{if } a(j)_{ki} \text{ takes a binary pointer value X.} \end{cases} \qquad (53) $$

Then, the size of $z(j)_{Ai}$ values the $j$-th student's enthusiasm in effectively completing the tasks specified in direction $d_i$ according to the $A$ attribute. Many $a(j)_{ki}$ indicators bearing ✓ imply an effective display of attitudinal competence.
In summary, the indicators $l(j)_{ki}$, $p(j)_{ki}$, and $a(j)_{ki}$ allow the assessment of the $j$-th student's performance while completing a training or developmental activity to be aligned according to the Learning, Procedure, and Attitude attributes. Teachers must design educational tasks in such a way that learning, the application of procedure, and the attitudes adopted while completing an academic chore are clearly demonstrated. This is the unique endeavor that teachers need to undertake to ensure that the table $T(ST)$ represents what they aim to evaluate and what will demonstrate the achievement of the unit, topic, or subtopic objectives, as well as the development of skills and competencies.
For the elaboration ahead, it is essential to consider the entries $z(j)_L$, $z(j)_P$, and $z(j)_A$ that, respectively, give the number of indicators corresponding to $L$, $P$, and $A$ which show a binary pointer ✓, namely,

$$ z(j)_L = \sum_{i=1}^{m} z(j)_{Li}, \quad z(j)_P = \sum_{i=1}^{m} z(j)_{Pi} \quad \text{and} \quad z(j)_A = \sum_{i=1}^{m} z(j)_{Ai}, \qquad (54) $$

where $z(j)_{Li}$, $z(j)_{Pi}$, and $z(j)_{Ai}$ are given by Equations (48), (50), and (52), respectively.
Likewise, formally, for the ST structure, the total numbers of indicators in $L$, $P$, and $A$ are, respectively, given by

$$ TL = \sum_{i=1}^{m} \pi(L, i), \quad TP = \sum_{i=1}^{m} \pi(P, i), \quad TA = \sum_{i=1}^{m} \pi(A, i). \qquad (55) $$

Also, for the ST structure, the total number of indicators composing the ST scheme is denoted through $TI$ and given by

$$ TI = TL + TP + TA. \qquad (56) $$

We introduce the vector $z(j, D)$ of direction-determined grading inputs for the $j$-th student, namely,

$$ z(j, D) = \left( z(j)_L, z(j)_P, z(j)_A \right). \qquad (57) $$

We can then express the grade $est(j, z, D)$ directly assigned by the teacher to the $j$-th student, for $j = 1, 2, \ldots, n$, using the ST scheme through

$$ est(j, z, D) = \frac{z(j)_L + z(j)_P + z(j)_A}{TI}, \qquad (58) $$

with $TI$ given by Equation (56). Then, $est(j, z, D)$ can be interpreted as the score assigned by the teacher to the $j$-th student using the conventional modulus of the ST protocol given by Equations (45)–(58).
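A direct MATLAB rendering of Equation (58), with hypothetical indicator counts, could read as follows:

% Hypothetical counts of indicators marked with a check for one student (Equation (54))
zL = 7; zP = 5; zA = 4;
TI = 10 + 8 + 6;              % TI = TL + TP + TA (Equations (55) and (56))
est = (zL + zP + zA) / TI;    % direct ST grade est(j, z, D) (Equation (58))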
Consider a group $G$ that includes a number $n$ of students. Suppose that the conventional modulus of the ST scheme is used to assign grades tied to a learning activity that has been assigned to $G$. Let $est(j, z, D)$, for $j = 1, 2, \ldots, n$, denote the global evaluation or grade obtained by student $x_j$. We will then have an $(n \times 1)$ matrix $E_{ST}$ of global grades assigned to $G$, namely,

$$ E_{ST} = \begin{bmatrix} est(1, z, D) & est(2, z, D) & \cdots & est(n-1, z, D) & est(n, z, D) \end{bmatrix}^T. \qquad (59) $$

As we elaborated for the summative and the rubric methods, we consider that, independently of the teacher's assessment $est(j, z, D)$ given by Equation (58), the student's pool of abilities determines an objective performance at achieving the different activities contemplated in the directions composing the ST construct. Accordingly, we represent the associated objective performance ratings through the symbols $zo(j)_L$, $zo(j)_P$, and $zo(j)_A$, which, recalling Equation (54), can be expressed by

$$ zo(j)_L = \sum_{i=1}^{m} zo(j)_{Li}, \quad zo(j)_P = \sum_{i=1}^{m} zo(j)_{Pi} \quad \text{and} \quad zo(j)_A = \sum_{i=1}^{m} zo(j)_{Ai}, \qquad (60) $$

respectively. In a similar way as we set for Equation (57), we consider the vector $zo(j, D)$ of objective grading inputs

$$ zo(j, D) = \left( zo(j)_L, zo(j)_P, zo(j)_A \right). \qquad (61) $$

Therefore, based on Equation (58), we can consider an objective grade $est(j, zo, D)$ to be given by

$$ est(j, zo, D) = \frac{zo(j)_L + zo(j)_P + zo(j)_A}{TI}, \qquad (62) $$

where $TI$ is given by Equation (56).
Therefore, the direct arrangement of the ST scheme entails an absolute deviation $\Delta est(j, z, zo, D)$, formally given by

$$ \Delta est(j, z, zo, D) = \left| est(j, zo, D) - est(j, z, D) \right|. \qquad (63) $$
Moreover, the direct $est(j, z, D)$ assessments provide a device to estimate the uncertainty conferred to the grade assigned to the $j$-th student, as mainly determined by the teacher's subjectivity while assigning the achievement pointers (✓) to the involved $l(j)_{ki}$, $p(j)_{ki}$, and $a(j)_{ki}$ indicators.
Recalling the simulation procedures we arranged for the SA and R methods, we now introduce a labelling index $q = 1, 2, \ldots, \theta_m$, and, as set by Equation (54), we can consider simulated replicate values $z(q, j)_L$, $z(q, j)_P$, and $z(q, j)_A$, given by

$$ z(q, j)_L = \sum_{i=1}^{m} z(q, j)_{Li}, \quad z(q, j)_P = \sum_{i=1}^{m} z(q, j)_{Pi}, \quad z(q, j)_A = \sum_{i=1}^{m} z(q, j)_{Ai}, \qquad (64) $$

where $z(q, j)_{Li}$, $z(q, j)_{Pi}$, and $z(q, j)_{Ai}$ are simulated versions of $z(j)_{Li}$, $z(j)_{Pi}$, and $z(j)_{Ai}$ (cf. Equations (48)–(52)), which along with Equation (57) yield the simulated rating triplet

$$ z_q(j, D) = \left( z(q, j)_L, z(q, j)_P, z(q, j)_A \right). \qquad (65) $$

The last equation concurrently determines the simulated grade $est(j, z_q, D)$:

$$ est(j, z_q, D) = \frac{z(q, j)_L + z(q, j)_P + z(q, j)_A}{TI}. \qquad (66) $$

Then, agreeing with Equations (62) and (66), we can consider replicate absolute deviations $\Delta est(j, z_q, zo, D)$, given by

$$ \Delta est(j, z_q, zo, D) = \left| \overline{est}(j, zo, D) - est(j, z_q, D) \right|. \qquad (67) $$

Then, we can introduce the Mean Absolute Deviation $\overline{est}(j, z_q, zo, D)$, given by

$$ \overline{est}(j, z_q, zo, D) = \frac{\sum_{q=1}^{\theta_m} \left| \overline{est}(j, zo, D) - est(j, z_q, D) \right|}{\theta_m}. \qquad (68) $$

We in turn set the convention that $est(j, zo, D)$ lies in the interval $(0, 1]$. Therefore, for the considered simulation runs, we set $\overline{est}(j, zo, D) = 1/2$. Then, we can offer $\overline{est}(j, z_q, zo, D)$ as a proxy for the average rating error of $est(j, z, D)$ relative to $\overline{est}(j, zo, D)$.

Simulation Procedures for the ST Device

The generation of replicates of student grades assigned using the ST scheme is based on the random assignment of binary indicators to the evaluation attributes Learning (L), Procedure (P), and Attitude (A). For each direction in an activity, a set of specific indicators is generated per attribute, and each of them is assigned a value of 0 or 1 using the function randi([0, 1]) in MATLAB 2018b. A value of 1 indicates that the student showed evidence of the attribute being assessed, while a value of 0 indicates no evidence of displaying the referred trait. This procedure is repeated a defined number $\theta_m = 100$ of times to generate an equal number of possible performance configurations $z_q(j, D)$ (cf. Equation (65)). Each configuration $z_q(j, D)$ allows the calculation of a grade replicate $est(j, z_q, D)$ by means of Equation (66). Each score replicate $est(j, z_q, D)$ is subtracted from the mean score $\overline{est}(j, zo, D) = 1/2$, and the absolute deviation between the two is calculated to obtain the corresponding $MAD_{ST}$ value. This value is then combined with the complexity pointer $CP_{ST}$ to calculate $CI_{ST}$ and $\beta_{ST}$. The use of a uniform distribution (0 or 1 with equal probability) models uncertainty about the teacher's judgment or the student's interpretation of achievement. This approach seeks to explore the sensitivity of the ST method to reasonable and objectively possible variations in performance, without introducing bias. Appendix C offers a detailed explanation of the functioning of the ST grading scheme.
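A condensed MATLAB sketch of this procedure, assuming hypothetical totals TL, TP, and TA of indicators aggregated over all directions (the CI_ST and beta_ST steps are detailed in Appendix C), could read as follows:

% Monte Carlo replicates of the direct ST grade (Equations (64)-(68))
TL = 10; TP = 8; TA = 6;                % hypothetical totals of indicators
TI = TL + TP + TA;                      % Equation (56)
theta_m = 100; est_bar = 0.5;           % number of replicates and mean objective grade
est_q = zeros(theta_m, 1);
for q = 1:theta_m
    zL = sum(randi([0, 1], TL, 1));     % simulated checked indicators for L
    zP = sum(randi([0, 1], TP, 1));     % simulated checked indicators for P
    zA = sum(randi([0, 1], TA, 1));     % simulated checked indicators for A
    est_q(q) = (zL + zP + zA) / TI;     % replicate grade (Equation (66))
end
MAD_ST = mean(abs(est_bar - est_q));    % Mean Absolute Deviation (Equation (68))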

2.3.2. The ST-FIS, a Fuzzy Inference System-Based Modulus for the STBAM Scheme

A Fuzzy Inference System (FIS) in the ST setting refers to a framework or method used to evaluate a student's performance by applying human-like reasoning based on vague or subjective inputs. An FIS is constructed with key elements and utilizes fundamental processes to make inferences, aiming to produce as output $E$ the result of the evaluation process, which reflects the student's performance in a clear and precise manner. An FIS in the ST setting allows uncertainty to be handled by combining the L, P, and A inputs to produce a fair and balanced performance assessment.

Key Elements of the Mathematical Representation of an ST-Fuzzy Inference System (ST-FIS)

1. Fuzzy sets
Fuzzy sets represent the foundation of the ST-FIS. A fuzzy set is defined by its membership function, which expresses the degree to which a value belongs to it.
Let $F$ stand for a fuzzy set concurring in the ST-FIS, and let $\mu_F(z)$ be its membership function, which maps a crisp input $z$ belonging to a domain $D$ to a number in the interval $[0, 1]$, namely,

$$ \mu_F(z): D \to [0, 1]. \qquad (69) $$

This mapping reflects how strongly $z$ belongs to the fuzzy set $F$.
2. Input variables
Generally, an FIS considers a set

$$ Z = \{ z_1, z_2, \ldots, z_k, \ldots, z_{n-1}, z_n \} \qquad (70) $$

of input values $z_k$, where each one represents a crisp value in a domain $D \subseteq \mathbb{R}$. For each input $z_k$, there is an associated pool

$$ F = \{ B_1^F, B_2^F, \ldots, B_{m(F)-1}^F, B_{m(F)}^F \}, \qquad (71) $$

where $F$ is a fuzzy set, and $B_k^F$ represents the $k$-th linguistic term, with $k = 1, 2, \ldots, m(F)$, which qualitatively interprets the input $z_k$. These fuzzy sets collectively partition the range of $z_k$ into overlapping regions, enabling smooth transitions between categories. Each linguistic term $B_k^F$, for $k = 1, 2, \ldots, m(F)$, entails a fuzzy set defined by its membership function $\mu_{B_k^F}(z_k)$.
Particularly, for the ST settings, given a direction $d_i$ for $i = 1, 2, \ldots, m$, adapting an FIS takes up as input variables $F_L$, which encompasses the learning of theoretical background; $F_P$, leading to the display of practical skills; and $F_A$, covering the student's disposition to complete tasks. These input variables are defined in the domains $D_L \subseteq \mathbb{R}$, $D_P \subseteq \mathbb{R}$, and $D_A \subseteq \mathbb{R}$, respectively. Moreover, for the input variable $L$, we associate linguistic terms

$$ L = \{ B_1^L, B_2^L, \ldots, B_{m(L)-1}^L, B_{m(L)}^L \}. \qquad (72) $$

Correspondingly, for $P$, we arrange

$$ P = \{ B_1^P, B_2^P, \ldots, B_{m(P)-1}^P, B_{m(P)}^P \}. \qquad (73) $$

And for the input variable $A$,

$$ A = \{ B_1^A, B_2^A, \ldots, B_{m(A)-1}^A, B_{m(A)}^A \}. \qquad (74) $$
3. Output variables
Likewise, adapting an FIS for the ST protocol involves an output variable $E$, which corresponds to an Enlightenment or Proficiency skill demonstrated by the student. We have

$$ E = \{ B_1^E, B_2^E, \ldots, B_{m(E)-1}^E, B_{m(E)}^E \}. \qquad (75) $$
4. Rule base
In the FIS joining the ST scheme, the rule base is composed of a set of "IF-THEN" rules that connect the input variables $L$, $P$, $A$ with the output variable $E$ in linguistic terms. The ongoing FIS has a set of fuzzy rules that allow it to generate a comprehensive assessment of the student's performance. The rules capture the diversity of possible combinations between the inputs, adapting to different types of students and contexts. Each rule incorporates the three dimensions (cognitive, practical, and attitudinal), ensuring that learning is not assessed in isolation. These rules reflect real-life situations that occur during the development of any activity, where performance depends on a balanced interaction among Knowledge, Procedure, and Attitude. Rules are denoted using the symbol $R_r$, for $r = 1, \ldots, n$, and acquire a generic form such as

$$ R_r: \text{IF } \mu_L(z(j)_L) \vee/\wedge\ \mu_P(z(j)_P) \vee/\wedge\ \mu_A(z(j)_A) \text{ THEN } y_r, \qquad (76) $$

where $\vee$ and $\wedge$ represent the 'or' and 'and' logic operators; $z(j)_L$, $z(j)_P$, and $z(j)_A$, as given by Equation (54), are input values for $L$, $P$, and $A$, respectively; and $y_r$ is an output value associated with $E$ for the rule $r$. These terms define the qualitative levels in the fuzzy logic-based system.

2.3.3. Fundamental Processes of an ST-Fuzzy Inference System (ST-FIS)

Fuzzification

In the general FIS structure, the input values are transformed into fuzzy values through the fuzzification process, based on fuzzy sets predefined by their corresponding membership functions. This step allows the FIS to handle uncertainty and subjective reasoning by mapping exact inputs to degrees of membership in linguistic terms.
Adapting an FIS for the ST device involves transforming the crisp evaluation metrics $z(j)_L$, $z(j)_P$, and $z(j)_A$, as given by Equation (54), into a fuzzy-based decision framework. This allows for subtle grading, especially in scenarios where binary values (✓ or X) might not fully capture the degree of student performance.
Keeping up with Equations (72)–(74), the linguistic terms $B_k^L$, $B_k^P$, and $B_k^A$ yield fuzzy sets defined by their membership functions, which map the crisp inputs $z(j)_L$, $z(j)_P$, and $z(j)_A$ to a degree of membership in the interval $[0, 1]$. Moreover, for $L$, we can consider membership functions

$$ \mu_{B_1^L}(z(j)_L), \mu_{B_2^L}(z(j)_L), \ldots, \mu_{B_{m(L)-1}^L}(z(j)_L), \mu_{B_{m(L)}^L}(z(j)_L). \qquad (77) $$

Similarly, for $P$, we take membership functions

$$ \mu_{B_1^P}(z(j)_P), \mu_{B_2^P}(z(j)_P), \ldots, \mu_{B_{m(P)-1}^P}(z(j)_P), \mu_{B_{m(P)}^P}(z(j)_P). \qquad (78) $$

And for $A$,

$$ \mu_{B_1^A}(z(j)_A), \mu_{B_2^A}(z(j)_A), \ldots, \mu_{B_{m(A)-1}^A}(z(j)_A), \mu_{B_{m(A)}^A}(z(j)_A). \qquad (79) $$

The output of the fuzzification step is a fuzzy representation of each input variable across its relevant fuzzy sets. This representation serves as the basis for subsequent processing, including the application of fuzzy rules and aggregation procedures.
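To fix ideas, a minimal MATLAB sketch of the fuzzification of a single normalized input, assuming three illustrative triangular membership functions (the actual shapes and partitions used by the ST-FIS are not fixed by this description), could read as follows:

% Fuzzification of a normalized crisp input, e.g., z(j)_L / TL, over three
% illustrative triangular fuzzy sets (Low, Medium, High); the shapes below
% are assumptions for illustration only
tri = @(z, a, b, c) max(min((z - a) / (b - a + eps), (c - z) / (c - b + eps)), 0);
zL = 0.7;                            % hypothetical crisp input in [0, 1]
mu_low  = tri(zL, 0.0, 0.0, 0.5);    % degree of membership in 'Low'
mu_med  = tri(zL, 0.0, 0.5, 1.0);    % degree of membership in 'Medium'
mu_high = tri(zL, 0.5, 1.0, 1.0);    % degree of membership in 'High'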

Rule Evaluation Procedure

The FIS of the ST scheme operates the rules to compute a degree of activation $\alpha_r$ for each output evaluation $y_r$. The inference engine evaluates the rules in the knowledge base using fuzzy logic operators to combine membership degrees. For a given input set $Z = \left( z(j)_L, z(j)_P, z(j)_A \right)$ (cf. Equation (70)), the FIS calculates the activation level $\alpha_r$ of each rule $R_r$ using the 'and' (min) or 'or' (max) operators, let us say

$$ \alpha_r = \min\left( \mu_{B_k^L}(z(j)_L), \mu_{B_k^P}(z(j)_P), \mu_{B_k^A}(z(j)_A) \right) \qquad (80) $$

or

$$ \alpha_r = \max\left( \mu_{B_k^L}(z(j)_L), \mu_{B_k^P}(z(j)_P), \mu_{B_k^A}(z(j)_A) \right). \qquad (81) $$

Aggregation Process

The aggregation process combines the outputs y r of all activated rules to produce an overall fuzzy set Y . This procedure relies on the THEN clauses composing the inference rules R r that jointly specify the output fuzzy set Y :
Y = B 1 Y , B 2 Y , , B j Y , , B m ( Y ) Y .
Each output fuzzy set $B_j^Y$ in $Y$ is defined by its membership function $\mu_{B_j^Y}(y)$. For each rule $R_r$, the degree of activation $\alpha_r$ (calculated during the rule evaluation step) is used to scale the membership function of the corresponding output fuzzy set $B_r^Y$. This scaling modifies the membership function values to reflect the strength of the contribution of rule $R_r$ to the global output $Y$. Then, the aggregation process combines these scaled membership functions $\alpha_r\, \mu_{B_r^Y}(y)$ from all rules using the $\max$ operator. This operator ensures that the highest degree of membership across all rules is retained for each value of $y$. The resulting aggregated membership function is expressed as
$$\mu_Y(y) = \max_r \left(\alpha_r\, \mu_{B_r^Y}(y)\right).$$
This aggregated fuzzy set represents the combined output of the FIS, capturing all relevant contributions from the activated rules.
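A discretized version of this max-aggregation can be sketched as follows; the output membership functions and activation degrees are hypothetical examples, not those of the ST-FIS rule base.

```python
import numpy as np

# Universe of discourse for the output grade, discretized for aggregation.
y = np.linspace(0.0, 1.0, 101)

# Hypothetical output membership functions for two activated rules
# (triangles peaked at 0.3 and 0.8) and their activation degrees alpha_r.
mu_out = [np.clip(1 - 2 * np.abs(y - 0.3), 0, 1),
          np.clip(1 - 2 * np.abs(y - 0.8), 0, 1)]
alphas = [0.4, 0.7]

# Scale each output set by its rule's activation and keep, for every y,
# the largest membership across rules (the max operator above).
mu_Y = np.max([a * mu for a, mu in zip(alphas, mu_out)], axis=0)
```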

Defuzzification Process

To convert the fuzzy output into a crisp grade, the FIS uses a defuzzification method. For the FIS adapted to the ST scheme, we rely on a centroid approach; then, denoting the defuzzified value of $Y$ by $estfis(j, z, D)$, we have
$$estfis(j, z, D) = \frac{\sum_{s=1}^{n(R)} y_s\, \mu_Y(y_s)}{\sum_{s=1}^{n(R)} \mu_Y(y_s)},$$
where $n(R)$ stands for the number of rules composing the ST-FIS. We can then introduce the grade matrix $E_{STFIS}$, with entries $estfis(j, z, D)$ representing the evaluations produced by the ST-FIS protocol, namely,
$$E_{STFIS} = \left[\ estfis(1, z, D)\ \ estfis(2, z, D)\ \ \cdots\ \ estfis(n-1, z, D)\ \ estfis(n, z, D)\ \right]^T.$$
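The centroid computation itself reduces to a weighted average over the discretized output universe; the following sketch continues the hypothetical aggregation example above.

```python
import numpy as np

def centroid_defuzzify(y, mu_Y):
    """Crisp grade as the centroid of the aggregated fuzzy output."""
    total = mu_Y.sum()
    if total == 0.0:
        return 0.0  # no rule fired; a neutral fallback
    return float((y * mu_Y).sum() / total)

# Continuing the hypothetical aggregation sketch:
y = np.linspace(0.0, 1.0, 101)
mu_Y = np.maximum(0.4 * np.clip(1 - 2 * np.abs(y - 0.3), 0, 1),
                  0.7 * np.clip(1 - 2 * np.abs(y - 0.8), 0, 1))
print(round(centroid_defuzzify(y, mu_Y), 2))  # about 0.58
```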
Again, as we elaborated for the summative method, the rubric, and also for the conventional section of the ST scheme, we conjecture that, independently of the teacher's assessment of the input vector $z(j, D)$, the student's pool of abilities determines true or objective ratings $z_o(j)_L$, $z_o(j)_P$, and $z_o(j)_A$ given by Equation (60), which in turn establish the vector $z_o(j, D)$ of objective grading inputs with components specified by Equation (61). Therefore, as was explained for the direct ST device, the ST-FIS scheme entails an absolute rating deviation of $estfis(j, z, D)$ relative to $estfis(j, z_o, D)$, which is denoted through $\Delta estfis(j, z_o, z, D)$ and is formally given by
$$\Delta estfis(j, z_o, z, D) = \left| estfis(j, z_o, D) - estfis(j, z, D) \right|.$$
We again set a labeling index $q = 1, 2, \ldots, \theta_m$ and consider simulated replicates $z(q, j)_L$, $z(q, j)_P$, and $z(q, j)_A$, given by Equation (64). These replicates define the simulated input triplet $z_q(j, D)$ given by Equation (65). This yields the simulation-based grade $estfis(j, z_q, D)$ (cf. Equation (84)). This way, we can obtain the replicate deviations $\Delta estfis(j, z_o, z_q, D)$ given by
$$\Delta estfis(j, z_o, z_q, D) = \left| estfis(j, z_o, D) - estfis(j, z_q, D) \right|.$$
Then, we can conceive the Mean Absolute Deviation $\overline{\Delta estfis}(j, z_q, z_o, D)$ given by
$$\overline{\Delta estfis}(j, z_q, z_o, D) = \frac{\sum_{q=1}^{\theta_m} \left| \overline{estfis}(j, z_o, D) - estfis(j, z_q, D) \right|}{\theta_m},$$
where the symbol $\overline{estfis}(j, z_o, D)$ stands for the average value of the $estfis(j, z_o, D)$ grades assigned by the ST-FIS method, which lie in the interval $[0, 1]$. Therefore, for the involved simulation runs, we set $\overline{estfis}(j, z_o, D) = 1/2$. In what follows, we will consider that $\overline{\Delta estfis}(j, z_q, z_o, D)$ provides a reasonable proxy for the average rating error of the ST-FIS grade $estfis(j, z, D)$.

2.3.4. Simulation Procedure for the ST-FIS Arrangement

The simulation procedure replicates the structure of the Systematic Task-Based Assessment Method (STBAM), incorporating a Fuzzy Inference System (FIS) as an alternative grading modulus. In each simulation, values representing the total number of indicators marked with ✓ for the attributes Learning ($L$), Procedure ($P$), and Attitude ($A$) are generated from uniform distributions over predefined domains. These totals correspond to the variables $z(q, j)_L$, $z(q, j)_P$, and $z(q, j)_A$ in Equation (64), which form the input triplet $z_q(j, D)$ used by the ST-FIS to produce a simulated grade $estfis(j, z_q, D)$ following Equation (84). Each simulated score $estfis(j, z_q, D)$ is subtracted from the mean score $\overline{estfis}(j, z_o, D) = 1/2$, and the absolute deviation between the two is calculated to obtain the corresponding $MAD_{STFIS}$ value. This value is then combined with the complexity pointer $CP_{STFIS}$ to calculate $CI_{STFIS}$ and $\beta_{STFIS}$. Appendix D presents a detailed description of the implementation of the ST-FIS grading protocol. A minimal sketch of this loop appears below.
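The grading function in the following Python sketch is a deliberately simplified stand-in for the full fuzzify-infer-defuzzify pipeline, so the resulting MAD value is illustrative only.

```python
import random

random.seed(1)
THETA_M = 100  # number of simulated replicates

def grade_stub(z_L, z_P, z_A):
    """Stand-in for the ST-FIS pipeline (fuzzification, rule evaluation,
    aggregation, centroid defuzzification); a plain average keeps the
    sketch self-contained."""
    return (z_L + z_P + z_A) / 3.0

mean_objective = 0.5  # the benchmark average objective grade of 1/2

deviations = []
for _ in range(THETA_M):
    # Uniform draws over the normalized indicator-total domains.
    z_L, z_P, z_A = (random.uniform(0, 1) for _ in range(3))
    deviations.append(abs(mean_objective - grade_stub(z_L, z_P, z_A)))

mad_stfis = sum(deviations) / THETA_M
print(round(mad_stfis, 3))
```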

2.4. A Note on the Concept of an Objective Grade

We recognize that the concept of an objective grade may require further clarification, especially regarding its role within our model framework. Indeed, in our formulation, the objective grade (denoted as $eso(j, A, W)$ in the Summative assessment method, $ero(j, H, PL)$ in the Rubric method, and $est(j, z_o, D)$ or $estfis(j, z_o, D)$ in the STBAM) is not derived from actual empirical data, but rather represents a theoretical benchmark. It is included in our formal settings to distinguish the rating determined by the student's "true" performance from that resulting from the evaluator's judgment. This enables us to consider the absolute deviation between teacher-assigned scores and a hypothetical ideal evaluation, thereby offering a symbolic device or proxy formally representing the subjective bias introduced by human evaluators.

2.5. Formalization of the Consistency and Efficiency Indicators

In what follows, we rely on an $M$ subscript to generically refer to a student grading method $M$. This way, we take $M = S$ to refer to the Summative protocol, $M = R$ for the Rubric, $M = ST$ for the ST scheme, and $M = STFIS$ to refer to the ST-FIS arrangement. To assess the suitability of the addressed methods, we rely on a mathematical Consistency Index ($CI_M$) defined in terms of the Mean Absolute Deviation ($MAD_M$) and a Complexity Pointer ($CP_M$). This last one is conceived as a numerical indicator of the relative complexity of the formal structure of a grading method $M$. Setting its form requires primarily considering (1) the number $N_M$ of components, (2) the number $E_M$ of distinct elements, and (3) the number $Q_M$ of mathematical operations involved in $M$ to compute a final grade.
For the addressed consistency comparison among different methods, we consider pondered values $P_N(N_M)$, $P_E(E_M)$, and $P_Q(Q_M)$, associated with the factors $N_M$, $E_M$, and $Q_M$, respectively. To obtain these entries, we consider the strengths of the latter indicators relative to their maximum values across all methods $M$, that is,
$$P_N(N_M) = \frac{N_M}{\max(N_M)},$$
$$P_E(E_M) = \frac{E_M}{\max(E_M)},$$
$$P_Q(Q_M) = \frac{Q_M}{\max(Q_M)},$$
where
$$\max(N_M) = \max\left\{N_S,\ N_R,\ N_{ST},\ N_{STFIS}\right\},$$
$$\max(E_M) = \max\left\{E_S,\ E_R,\ E_{ST},\ E_{STFIS}\right\},$$
$$\max(Q_M) = \max\left\{Q_S,\ Q_R,\ Q_{ST},\ Q_{STFIS}\right\}.$$
Then, the indicators $P_N(N_M)$, $P_E(E_M)$, and $P_Q(Q_M)$, as given by Equations (89)–(91), will take values within the interval $[0, 1]$ and will thereby remain normalized entries.
Accordingly, the Complexity Pointer $CP_M$ for method $M$ is defined as the weighted sum of the $P_N(N_M)$, $P_E(E_M)$, and $P_Q(Q_M)$ factors, namely,
$$CP_M = a\, P_N(N_M) + b\, P_E(E_M) + c\, P_Q(Q_M),$$
where $a$, $b$, and $c$ are weighting coefficients reflecting the relative importance that the factors $N_M$, $E_M$, and $Q_M$, respectively, bear in the determination of the complexity pointer $CP_M$. Moreover, we set the condition
$$a + b + c = 1.$$
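The mechanics of Equations (89)–(96) can be condensed into a few lines; in this sketch only the Summative and Rubric counts from the worked appendix examples are listed, so the maxima, and hence the resulting pointer, differ from the Table 6 values, which span all four methods.

```python
# Structural counts (N_M, E_M, Q_M) taken from the worked examples in
# Appendices A and B; Table 1 lists the counts for all four methods.
counts = {
    "Summative": {"N": 3, "E": 15, "Q": 19},
    "Rubric":    {"N": 6, "E": 47, "Q": 60},
}

def complexity_pointer(method, counts, a=0.4, b=0.3, c=0.3):
    """CP_M as the weighted sum of pondered-normalized counts."""
    assert abs(a + b + c - 1.0) < 1e-9  # the constraint a + b + c = 1
    maxima = {k: max(m[k] for m in counts.values()) for k in ("N", "E", "Q")}
    m = counts[method]
    return (a * m["N"] / maxima["N"]
            + b * m["E"] / maxima["E"]
            + c * m["Q"] / maxima["Q"])

print(round(complexity_pointer("Summative", counts), 2))
```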
Then, defining
$$\rho = \max\left\{P_N(N_M),\ P_E(E_M),\ P_Q(Q_M)\right\}$$
and
$$\theta = \min\left\{P_N(N_M),\ P_E(E_M),\ P_Q(Q_M)\right\},$$
we have that Equations (97) and (98) imply $CP_M$ lying within the range
$$\theta \le CP_M \le \rho.$$
Then, we define $CI_M$ as the ratio
$$CI_M = \frac{1 + CP_M\left(1 + MAD_M \cdot CP_M\right)}{1 + MAD_M \cdot CP_M} = \frac{1}{1 + MAD_M \cdot CP_M} + CP_M,$$
where $CP_M$ is given by Equation (95). According to the evaluation method $M$ we are dealing with, the $MAD_M$ entry takes one of the forms given by Equations (14), (44), (68), or (88).
Correspondingly, we set the $\beta_M$ efficiency rating index to take the form
$$\beta_M = \frac{CI_M \cdot CP_M}{\left(MAD_M + 1\right)\left(CP_M + 1\right)}.$$
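Both indices are direct to compute once $MAD_M$ and $CP_M$ are available; the following Python sketch reproduces, up to rounding, the Summative figures quoted later in the text.

```python
def consistency_index(mad, cp):
    """CI_M of Equation (100), written in its equivalent additive form."""
    return 1.0 / (1.0 + mad * cp) + cp

def efficiency_index(mad, cp):
    """beta_M of Equation (101)."""
    ci = consistency_index(mad, cp)
    return ci * cp / ((mad + 1.0) * (cp + 1.0))

# With MAD_S = 0.11 and CP_S = 0.23:
print(consistency_index(0.11, 0.23))  # ~1.205, reported as 1.20
print(efficiency_index(0.11, 0.23))   # ~0.203, reported as 0.20
```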
Appendix E provides a glossary of the terms used throughout this manuscript.

3. Results

The intent behind student performance evaluation is to determine the extent to which the objectives of a specific learning activity were accomplished [25]. Another essential aim of students’ evaluation is to rate the quality of instructional procedures and facilitate feedback for course enhancement [7,26]. The assessment also helps instructors determine the appropriateness of their capabilities from the perspective of their particular and professional requirements [27]. Moreover, the suitability of the students’ evaluation helps in detecting their learning difficulties [28].
A primary challenge in evaluating student performance is ensuring that the mathematical structures of the methods used for grading are consistent and reliable. This means finding a way to accurately and fairly measure student achievements using these methods. Nevertheless, few studies address this vital subject (e.g., [29,30,31,32]).

3.1. Variation Ranges for the $MAD_M$, $CI_M$, and $\beta_M$ Indicators

From the perspective above, we proposed the $CI_M$ index, defined by Equation (100), as a device for comparing the consistency of the formal structure supporting the presently addressed grading methods. Exploration of its variability begins by noting that, according to Equation (95), the involved term $CP_M$ is constrained as set by Inequality (99). Concerning its second determining factor, $MAD_M$, a few algebraic steps are required to obtain its variation range. Moreover, let us denote by $G_M^q$ the $q$-th simulated student grade for the method $M$. Then, since we are dealing with normalized scores, we have
$$0 < G_M^q \le 1.$$
Let $\overline{GO}_M$ stand for the average objective grade that $M$ assigns. Then, since usually the joining score lies in the interval $(0, 1]$, we may set
$$\overline{GO}_M = 1/2.$$
Now, let us consider the function $U(G_M^q)$ defined through
$$U(G_M^q) = G_M^q - 1/2.$$
Then, for $0 < G_M^q \le 1/2$, we have that $U(G_M^q)$ lies in the interval $(-1/2, 0]$; that is, $U(G_M^q)$ remains non-positive and is bounded below by $-1/2$. Correspondingly, in the interval $1/2 < G_M^q \le 1$, $U(G_M^q)$ takes values within $(0, 1/2]$; that is, $U(G_M^q)$ remains positive and attains a maximum value of $1/2$. Therefore, altogether, for $0 < G_M^q \le 1$, we have
$$0 \le \left| G_M^q - \overline{GO}_M \right| \le 1/2.$$
Now, for all the simulation procedures addressed here, we set $q = 1, 2, \ldots, 100$ and $\overline{GO}_M = 1/2$, and we also define
$$MAD_M = \frac{\sum_{q=1}^{100} \left| G_M^q - \overline{GO}_M \right|}{100}.$$
Thus, we have
$$0 < MAD_M \le 1/2.$$
Considering this last inequality and noting that the term $CP_M$ takes values as indicated by Inequality (99), we have
$$\frac{2}{\rho + 2} \le \left(1 + MAD_M \cdot CP_M\right)^{-1} < 1.$$
This result, along with Inequality (99) and Equation (100), sets the ratio $CI_M$ staying in the range
$$\frac{(\rho + 2)\,\theta + 2}{\rho + 2} \le CI_M < \rho + 1.$$
Then, considering Inequalities (99) and (107), we obtain
$$\frac{2}{3(\rho + 1)} \le \left(CP_M + 1\right)^{-1}\left(MAD_M + 1\right)^{-1} < \frac{1}{\theta + 1}.$$
On the other hand, using Inequalities (99) and (109), we obtain
$$\frac{(\rho + 2)\,\theta^2 + 2\theta}{\rho + 2} \le CI_M \cdot CP_M < \rho\,(\rho + 1).$$
Then, Inequalities (110) and (111), along with Equation (101), determine for $\beta_M$ the variation range
$$\frac{2\left((\rho + 2)\,\theta^2 + 2\theta\right)}{3(\rho + 2)(\rho + 1)} < \beta_M < \frac{\rho\,(\rho + 1)}{\theta + 1}.$$

$CI_M$- and $\beta_M$-Based Method Performance Comparison Criteria

To judge the mathematical consistency of two grading methods, we suggest the following general $CI_M$-based comparison criterion: for two grading methods $A$ and $B$, with their respective Consistency Indices $CI_A$ and $CI_B$, we say that method $A$ is more consistent than method $B$ if $CI_A > CI_B$. We can interpret this by stating that the method with the higher Consistency Index ($CI_M$) achieves a better balance between low grading dispersion ($MAD_M$) and reasonable grading complexity ($CP_M$). If the $CI_A$ and $CI_B$ values are close, the methods $A$ and $B$ have similar consistency levels, and additional considerations (e.g., fairness and efficiency) may be needed to determine the preferable approach. If the $CI_M$ values are very different, the method with the higher $CI_M$ is preferable in terms of grading consistency.
We can further elaborate on the presently offered general comparison criteria by adding a threshold-based ranking approach. This provides a structured way to further classify methods. In brief, we can propose benchmark thresholds: (1) One for low consistency, enforced whenever $0 < CI_M \le 1$ ($M$ produces unstable grading or excessive complexity). (2) A moderate consistency range sustained through $1 < CI_M \le 3/2$ ($M$ shows acceptable consistency, but it could be improved). (3) A high consistency range tied to the condition $3/2 < CI_M \le 2$ ($M$ provides a reliable grading output).
The $\beta_M$ index rates a method $M$'s efficiency by measuring how consistently it performs relative to its complexity. Certainly, a high value of $\beta_M$ indicates that the grading method maintains high consistency despite its complexity, meaning it efficiently balances fairness and structure. Likewise, a low value of $\beta_M$ suggests that the grading method is either too complex for its level of consistency or too inconsistent in relation to its complexity. Relating to the variation of $\beta_M$, we ought to consider several key cases. (1) Whenever $CP_M$ is high, a low $\beta_M$ value means that the grading method is unnecessarily complex relative to its ability to assign stable grades. Conversely, a high $\beta_M$ value could signal that, despite high complexity, the grading method is very consistent, which may justify the complexity. (2) If $CP_M$ is very low, a high value of $\beta_M$ may suggest that the grading method is efficient; in other words, it achieves strong consistency with minimal complexity. Furthermore, a low value of $\beta_M$ at low $CP_M$ would indicate that, perhaps due to its simplicity, the grading method still lacks consistency, meaning it might need better-defined criteria.
Accordingly, if two grading methods have similar $CI_M$ values, the one entailing a higher value of $\beta_M$ is the more efficient one, as it achieves similar consistency with less complexity. Moreover, $\beta_M$ sustains the identification of overly complex methods, as a very low value of this ratio suggests that the complexity added to the grading system does not significantly improve its consistency, meaning the method might be unnecessarily convoluted.
We can expand on the $\beta_M$-based comparison criterion by suggesting a threshold-based ranking approach. This provides a structured approach to further categorizing methods based on their efficiency. We propose arranging the following benchmark thresholds: (1) One for inefficiency, associated with the condition $\beta_M \le 1/6$ (where the method is either highly inconsistent or extremely complex, necessitating a significant adjustment). (2) A low efficiency range, $1/6 < \beta_M \le 2/6$ (the method struggles with consistency or exhibits unnecessary complexity; enhancement is needed). (3) A moderate efficiency range, $2/6 < \beta_M \le 3/6$ (the method is reasonably fair, but optimizing complexity or consistency could be necessary). (4) A highly efficient domain, $\beta_M > 3/6$ (the grading method achieves strong consistency with reasonable complexity). Both threshold schemes can be applied mechanically, as in the sketch below.
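As an illustration, the two threshold schemes reduce to a pair of small lookup functions; this is a minimal Python sketch with the band labels paraphrased from the text.

```python
def classify_consistency(ci):
    """Threshold bands for CI_M proposed above."""
    if ci <= 1.0:
        return "low consistency"
    if ci <= 1.5:
        return "moderate consistency"
    return "high consistency"  # 3/2 < CI_M <= 2

def classify_efficiency(beta):
    """Threshold bands for beta_M proposed above."""
    if beta <= 1 / 6:
        return "inefficient"
    if beta <= 2 / 6:
        return "low efficiency"
    if beta <= 3 / 6:
        return "moderate efficiency"
    return "highly efficient"

# The ST-FIS figures reported in Section 3.3:
print(classify_consistency(1.91), "/", classify_efficiency(0.88))
```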

3.2. Calculations Related to the $MAD_M$, $CI_M$, and $\beta_M$ Indicators Based on Simulated Grade Data and Pondered–Normalized Complexity Pointer $CP_M$ Values

The results of the subsequent analysis are available in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. Table 1 displays the Components ($N_M$), Elements ($E_M$), and Operations ($Q_M$) for the different methods considered in this study ($M$ stands for Summative, Rubric, ST, and ST-FIS). Table 2 presents the minimum and maximum values of the $N_M$, $E_M$, and $Q_M$ entries across the addressed methods. Table 3 includes the pondered-normalized values for Components ($P_N(N_M)$), Elements ($P_E(E_M)$), and Operations ($P_Q(Q_M)$). These items were obtained by using Equations (89)–(91). Table 4 gives the values of the pondering-normalizing parameters $\theta$ and $\rho$ (cf. Inequality (99)). It also sets values for the weighting coefficients $a$, $b$, $c$, reflecting the relative importance of the factors $N_M$, $E_M$, and $Q_M$, respectively, in the determination of the complexity pointer $CP_M$ (cf. Equation (95)). The parameter values $a$, $b$, $c$ in this table were chosen as an illustrative device. The generality of the method allows for choosing them differently, provided they satisfy the condition specified by Equation (96). In turn, Table 5 gives upper bounds (UBs) and lower bounds (LBs) for the variation ranges of the Complexity Pointer ($CP_M$), Consistency Index ($CI_M$), and Efficiency Index ($\beta_M$) (cf. Equations (95), (100) and (101)). These are calculated by setting $\theta = 0.13$ and $\rho = 1.0$ (cf. Inequality (99)). Table 6 presents values of the indicators for Mean Absolute Deviation ($MAD_M$), Complexity ($CP_M$), Consistency ($CI_M$), and Efficiency ($\beta_M$) associated with the analyzed grading methods (cf. Equations (106), (95), (100), and (101)). Finally, Table 7 reports the standard errors linked to the indicators of Agreement ($MAD_M$), Consistency ($CI_M$), and Efficiency ($\beta_M$) for the four grading methods analyzed. These values, obtained from simulation processes, offer insight into the statistical precision and reliability of each metric. Smaller standard errors reflect higher stability and reduced variability across simulation runs. According to the present exploration, among the methods, ST showed the least variability in the $MAD_M$ indicator, suggesting higher precision in that dimension. The Summative approach achieved low standard errors for $CI_M$ and $\beta_M$. Conversely, the ST-FIS method showed the greatest variability across all indicators, pointing to lower statistical robustness. Nevertheless, despite these differences, the variations in standard error values were minimal and did not significantly affect the overall comparative assessment of the methods.
Figure 1 depicts the variation of $CI_M$ (Figure 1a) and $\beta_M$ (Figure 1b) when each of these indicators is set as a function of the $MAD_M$ index while the complexity pointer $CP_M$ holds different fixed values.
We have $\theta > 0$ and $\rho \le 1$ (cf. Inequality (99)). Moreover, entries in Table 1 set $0.13 < CP_M \le 1$. We also have $0 < MAD_M \le 1/2$ (cf. Inequality (107)). Then, as underlined by Inequality (109), we must have $0.79 \le CI_M < 2$. Figure 1a shows that $CI_M$ remains within its theoretically determined range. If $MAD_M$ or $CP_M$ approach their largest values, $CI_M$ approaches the lower bound value of $0.79$, reflecting poor mathematical consistency of the grading method $M$. If $MAD_M$ is small and $CP_M$ approaches its upper boundary, $CI_M$ approaches 2, showing $M$ to be a well-balanced grading protocol. Moreover, the values $\theta = 0.13$ and $\rho = 1$ determine the range $0.0345 < \beta_M < 1.769$ (cf. Inequality (112)); Figure 1b illustrates this condition. Likewise, a low value of $\beta_M$ suggests that the grading method is either too complex for its level of consistency or too inconsistent in relation to its complexity.

3.3. Ordering Relationships for the $MAD_M$, $CP_M$, $CI_M$, and $\beta_M$ Indicators

The obtained $MAD_M$ values fell within the variation range stipulated by Inequality (107). They ranged between 0.06 (for $M = STFIS$) and 0.13 (for $M = Rubric$) (Table 6), thereby satisfying the relative ordering
$$MAD_{STFIS} < MAD_{ST} < MAD_{S} < MAD_{R}.$$
In agreement with the statement of Inequality (99), the calculated $CP_M$ values (Table 1, Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7) range between 0.23 ($M = Summative$) and 0.96 ($M = STFIS$), also sustaining the order
$$CP_{S} < CP_{R} < CP_{ST} < CP_{STFIS}.$$
Correspondingly, entries in Table 6 corroborate that the calculated $CI_M$ values fell within the variation range set by $\theta = 0.13$ and $\rho = 1.0$. Indeed, the calculated $CI_M$ entries ranged between 1.20 ($M = Summative$) and 1.91 ($M = STFIS$). Accordingly, the calculated $CI_M$ values sustain the following order:
$$CI_{S} < CI_{R} < CI_{ST} < CI_{STFIS}.$$
It can also be asserted that all addressed methods entail well-balanced grading protocols. Moreover, according to entries in Table 6, among the presently analyzed protocols, the ST-FIS method attained the highest $CI_M$, suggesting that despite its complexity, it maintains high coherence. Conversely, the Summative method obtained the lowest $CI_M$, implying that its simplicity might render it unable to capture more nuanced evaluations.
Therefore, compared to the Summative, ST, and Rubric counterparts, we can assert that the ST-FIS arrangement produced a higher Consistency Index $CI_M$; it achieved a better balance between low grading dispersion ($MAD_M$) and reasonable grading complexity ($CP_M$). According to the proposed threshold-based ranking approach, the entries in Table 6 verify that the Summative and Rubric methods fell within the moderate consistency range, $1 < CI_M \le 3/2$, thereby showing acceptable consistency but the potential to be improved. Concurrently, both the ST and ST-FIS methods were placed within the high consistency domain, $3/2 < CI_M \le 2$, which indicates reliable and efficient grading.
Concerning the $\beta_M$ values, the methods were placed in the following order:
$$\beta_{S} < \beta_{R} < \beta_{ST} < \beta_{STFIS}.$$
Moreover, according to entries in Table 6, the calculated $\beta_M$ value for $M = Summative$ placed within the low efficiency range ($1/6 < \beta_M \le 2/6$). Correspondingly, $\beta_M$ for $M = Rubric$ accommodated within the moderate efficiency range, namely, $2/6 < \beta_M \le 3/6$. Accordingly, optimizing the ensuing complexity or consistency could be necessary for this scheme. On the other hand, both the ST and ST-FIS protocols operated in the highly efficient domain, $\beta_M > 3/6$, which suggests that these grading methods achieved strong consistency with reasonable complexity.

3.4. Assessing the Distribution and Spread of Simulated Grades

Commonly, in assessing the consistency of an assessment method, it is asserted that individual student grades follow a normal distribution [17,18]. Figure 2 displays the histograms and fitting results of a normal distribution applied to grading data simulated using the Summative, Rubric, ST, and ST-FIS methods, while Table 8 includes the fitting statistics. Entries in Table 8 show consistent fittings in all cases.
Table 8 presents the results of normal distribution fits to student grades generated using different assessment methods: Summative, Rubric, ST, and ST-FIS. Regarding the mean and standard deviation, the Rubric method had the highest mean score (0.5874), indicating that students tend to score higher under this method compared to the others. The Summative method yielded a mean of 0.5214, which is slightly lower than that of the Rubric method. The ST method yielded a similar mean value (0.5044), indicating grading outcomes comparable to those of the Summative. The ST-FIS method had a slightly lower mean of 0.4982. Comparing standard deviation values, the smallest one was linked to the ST method (0.0697), indicating that this scheme has the lowest variability, meaning that the produced students' grades are more clustered around the mean. The ST-FIS method reported a standard deviation of 0.2463, implying that this scheme shows a wider variability in student grades around the mean. The Summative approach reported a value of 0.1412, only slightly different from the one for the Rubric scheme (0.1410). Therefore, the spreads of grades produced by these last methods around the mean could be considered equivalent.
Regarding goodness of fit, we relied on Chi-squared statistics that evaluate how well the normal distribution fits the data. Lower values indicate a better fit. The degrees of freedom (df) associated with each $\chi^2$ test included in Table 8 allow for full interpretation of the p-values. Reported p-values reflect the probability of observing the test statistics under the null hypothesis that the data follow a normal distribution. While the simulated grade data generated by the proposed assessment methods generally fit a normal distribution well (as the high p-values indicate), as we have emphasized, real-world grading data may not always exhibit this behavior. Therefore, the assumption of normality in this context serves as a controlled benchmark for evaluating whether our simulation procedures produce well-centered and symmetric grade outcomes under ideal conditions. It is not intended to generalize to all educational contexts but to support internal consistency analysis among the compared methods.
The Summative method output ($\chi^2 = 1.0663$, $p = 0.5868$) suggests a good fit to a normal distribution, as its $\chi^2$ statistic was the lowest of all methods, while its p-value is relatively high. The ST method, as indicated by its $\chi^2$ value ($\chi^2 = 1.3905$, $p = 0.7078$), also yielded a good fit to a normal distribution due to its relatively low $\chi^2$ value, which, along with the highest p-value, exhibited the best fit among the four methods. Correspondingly, the Rubric method was associated with a $\chi^2$ value of 3.1181 ($p = 0.3738$), which, despite a relatively high p-value, yielded a higher Chi-squared statistic, indicating a weaker fit to normality compared to the Summative and ST methods. Finally, the ST-FIS method yielded $\chi^2 = 3.5090$, $p = 0.1730$, thereby holding the highest $\chi^2$ value along with the lowest p-value, suggesting that its grade distribution deviates more from normality compared to the other methods but still does not show strong evidence of deviation from normality. To summarize, the Rubric resulted in the highest average student grades, but its fit to the normal distribution was relatively less consistent than those of other methods. Compared to the Rubric method, the Summative and ST methods produced similar average grades with moderate variability and a better fit to normality. The ST-FIS, while having a reasonable mean score, exhibited the widest variation in grades and a relatively weak fit to a normal distribution.
Despite the commonly held assumption of normality of a grade distribution, we acknowledge that grade distributions in real-world educational settings may deviate from normality due to several factors (such as grade inflation, institutional policies, or specific assessment practices). Our use of normal distribution fitting here is not intended to imply that real grades always follow a normal law; rather, it serves as a diagnostic tool with which to assess whether the simulated methods produce distributions that are balanced and unbiased under controlled conditions. This provides an additional layer of internal validation for the proposed simulation methods.
To fulfill the need for detailed statistical visualizations, we performed 100 simulation runs for each method to obtain distributions of the Consistency indicator ($CI_M$) and the Efficiency indicator ($\beta_M$). Figure 3 presents boxplots of the simulated $CI_M$ values across methods. These plots provide a detailed view of variability, medians, and overall distributional behavior. The ST-FIS method exhibits the highest median consistency and a relatively narrow spread, indicating strong central performance and good stability. Although the Summative method shows a slightly narrower interquartile range, its median value is notably lower, and the overall dispersion is comparable. Similarly, Figure 4 displays the distribution of simulated $\beta_M$ values. The ST-FIS method again stands out by achieving both the highest median efficiency and one of the smallest ranges, suggesting better and more consistent performance. These figures offer a comprehensive view of each method's behavior under uncertainty and help identify those with the most robust and reliable outcomes.

4. Discussion

The assessment/evaluation foundation theory refers to the principles and conceptual frameworks that guide how assessment and evaluation are understood and implemented, particularly in education, social sciences, and program development. At its core, this theory addresses the purpose, process, and criteria used to judge the quality, effectiveness, or value of a subject, be it a student’s performance, a policy, or a program [33,34]. The assessment/evaluation foundation theory encompasses several essential aspects [35,36]. These include philosophical, methodological, and ethical principles that guide the way learning is measured and interpreted [37,38,39]. It involves defining what constitutes knowledge or competence. It also contributes to ensuring that assessments are valid, reliable, and fair, and leads to the interpretation of results in ways that support sound educational decisions. This theory also distinguishes between formative and summative purposes, as well as norm- and criterion-referenced standards [40]; it emphasizes the social and ethical implications of grading practices; and ultimately, it provides a framework for evaluating not only student performance but also the quality and consequences of the assessment methods themselves [41].
In particular, our currently offered agreement, consistency, and efficiency rating scheme could lead to the interpretation of results from comparisons of the performance of grading methods in ways that support sound educational decisions at the teacher's level. We will now provide further explanation. Firstly, we conceptualize grading methods as algorithmic or rule-based systems; thus, they can be placed within a mathematical framework. Secondly, given a grading method $M$, our theoretical arrangement builds on measures of agreement (Mean Absolute Deviation, $MAD_M$), Complexity ($CP_M$), Consistency ($CI_M$), and Efficiency ($\beta_M$) indicators. In our view, the listed elements readily typify what we conceive as a protocol for assessing the mathematical consistency of a grading protocol. By adhering to it, we can systematically explore the effects on the output of a grading method under controlled variations in input parameters. For instance, by applying different weighting schemes to conforming learning activities, we can examine how changes influence final grades. Moreover, analytical explorations concurrent with our theoretical arrangement could also reveal whether certain methods disproportionately penalize specific performance patterns, for example, when a single poorly performed high-stakes task overrides consistent performance in other areas, thus exposing potential inconsistencies or biases in the method's mathematical design. In doing so, our work contributes to aspects of an assessment/evaluation foundation theory by offering an analytical framework through which to assess the mathematical and evaluative soundness of grading practices in a controlled, interpretable manner.
Summative evaluation is well known for delivering quantifiable measures of student achievement [1]. Defenders of this approach assert that it offers the advantage of providing precise, quantifiable measures of student achievement [8]. Summative evaluation is also regularly praised for providing effective benchmarking [42]. This enables teachers to assess student performance against established standards or compare it with the accomplishments of their peers [43]. Benchmarks also help to identify areas where students excel or may require additional support. This, in turn, facilitates targeted instructional improvements and supports overall student growth [2,44]. Despite the signaled advantages of the summative evaluation scheme, it also faces challenges [45,46,47]. It often focuses on rote memorization rather than deep understanding [48,49]. It also bears a high-stakes nature that can induce stress and not truly reflect a student's abilities [50]; and, even though in the present evaluation, along with the ST scheme, the Summative method yielded the smallest value for the complexity pointer $CP_M$, its reported $MAD_M$ value ($MAD_S = 0.11$) was among the largest of the addressed methods. Moreover, according to the reported values for the $CI_M$ and $\beta_M$ rankings (Table 6), the suitability of this method generally leaves room for improvement.
Rubrics are often praised for providing clear, consistent, and structured feedback that supports effective learning [51]. As we described in Section 2, a Rubric is a scoring tool that outlines specific criteria for evaluating performance on an assignment or task, defining various levels of achievement across these criteria. Each contemplated criterion is associated with a weight that sets its influence on the overall grade. However, the particular weights and performance levels are assigned by each teacher, which renders these choices essentially individual [52,53]. Furthermore, it is argued that the Rubric appears to adhere to predefined categories or judgments, thereby limiting its effectiveness [54,55]. Therefore, the Rubric method can lead to biased, imprecise, or subjective evaluations [56]. Moreover, although in the present evaluation the $MAD_M$ indicator for the Rubric ($MAD_R = 0.13$) was found to be larger than that corresponding to the Summative method ($MAD_S = 0.11$), its Complexity value ($CP_R = 0.61$) was found to be considerably larger than that obtained for the former ($CP_S = 0.23$). Moreover, the Rubric's consistency indicator ($CI_R = 1.54$) was greater than the matching value for the Summative protocol ($CI_S = 1.20$); still, it was found to be smaller than the ones corresponding to the ST ($CI_{ST} = 1.73$) and ST-FIS ($CI_{STFIS} = 1.91$) arrangements. Moreover, the Rubric's $\beta_M$ efficiency ranking ($\beta_R = 0.52$) placed well behind those corresponding to ST ($\beta_{ST} = 0.71$) and ST-FIS ($\beta_{STFIS} = 0.88$). Therefore, comparing the relationship $CP_R = 0.63\, CP_{STFIS}$ to the circumscribing one $\beta_R = 0.59\, \beta_{STFIS}$, we can assert that, despite having a complexity reasonably close to that of the ST-FIS scheme, the Rubric's efficiency placed well behind that of its ST-FIS counterpart. This suggests that, compared to this last scheme, the Rubric's performance also has room for improvement.
Additionally, overall, conventional assessment approaches in education, such as Summative evaluation and the Rubric, are not ideally suited for assessing the development of competencies in students [24]. To address this, conventional assessment methods could be adapted to operate on an ordinal scale. This category includes protocols that attempt to evaluate attributes not suited to being represented numerically. These extensions rely on linguistic categories that could produce qualitative response levels. Nevertheless, this approach to assessment does not allow for measuring skills through precise mathematical operations, nor does it quantify the exact difference between levels. An alternative to conducting a practical assessment on an ordinal scale may be to address a fuzzy logic setup. Fuzzy sets provide a basis for generalizing classical set theory. Associated with a fuzzy set is the notion of a membership function taking values within the interval $[0, 1]$; namely, a membership function determines the degree of belonging of an element to a fuzzy set [57]. Fuzzy logic models knowledge through IF-THEN propositions, where one or more fuzzy sets constitute the condition and the implication, and both establish a relationship through associated mathematical rules. Fuzzy logic enables the incorporation of qualitative criteria, such as creativity, critical thinking, or collaboration, thereby quantifying subjective judgments more realistically [58]. In educational environments, fuzzy logic allows us to capture nuances in student learning with greater flexibility and precision [59].
Furthermore, fuzzy logic facilitates the transition between different performance levels by providing a more equitable assessment tailored to the student’s progress [60]. In short, fuzzy logic introduces a more flexible and adaptable approach to educational assessment, enabling a more comprehensive and nuanced analysis of student performance. Hegazi and coworkers discussed its wide application in decision-making, classification, prediction, analysis, and evaluation of student performance [61]. This supports the idea of using a fuzzy logic approach as a powerful tool to overcome the limitations of conventional assessment methods in education, particularly when attempting to rate competencies.
Particularly, the Integrated Instruction scheme, as described by [24], is a fuzzy logic-based approach that serves as an alternative to traditional Summative and Rubric-based assessments. This method explicitly outlines guidelines for conducting a learning activity and includes specific indicators with which to determine whether the activity has been completed. The indicators within Integrated Instruction are grouped into three main attributes, aligned with essential aspects of competence acquisition: Knowledge, Procedure, and Attitude. A fuzzy set represents each attribute.
In this contribution, we presented the Systematic Task-Based Assessment Method, which is intended to be a generalized version of the Integrated Instruction scheme [24]. This offers a dual operational slant, including the conventional ST modulus and its Fuzzy Inference System-based counterpart, ST-FIS. The ST-FIS relies on indicators that could be aligned with competence acquisition according to the attributes contemplated in the Integrated Instruction method. Then, the operational foundation of the ST-FIS allows for the straightforward incorporation of linguistic typifiers into grading. Again, each one of these attributes is represented by a fuzzy set. A Fuzzy Inference System considers the valuations of these indicators as inputs and generates a precise evaluation output. According to the present analysis, we argue that through consolidation of the operational stage, the ST-FIS scheme could provide an effective means of assessing student-acquired competencies.
Concerning the relationship between the accuracy and consistency of a grading method, it is necessary to point out that the presently offered $CI_M$ index measures consistency by integrating complexity penalization and method stability. We observe that $CI_M$ increases with $CP_M$, indicating that more complex methods tend to show higher consistency. The ST-FIS method reported the highest value, $CI_{STFIS} = 1.91$, suggesting that despite its greater complexity, it maintains high coherence in evaluation. Conversely, the Summative method had the lowest value, $CI_S = 1.20$, implying that its complexity might not be sufficient to capture more nuanced evaluations.
The Summative method attained the lowest $CP_M$ value with $CP_S = 0.23$, indicating that it is the least complex, likely employing a more direct approach with fewer elements and operations. The ST-FIS scheme had the highest value, $CP_{STFIS} = 0.96$, meaning it is the most structured and detailed method in terms of elements and operations. The relationship between $CP_M$ and $CI_M$ that is implicitly enforced by our formulation suggests that methods with higher complexity tend to achieve better consistency values.
Moreover, the ratio $\beta_M$ assesses a method's ability to maintain a well-structured approach without being excessively penalized for its complexity. In that regard, the ST-FIS achieved the highest efficiency ($\beta_{STFIS} = 0.88$), indicating that despite having the highest complexity in the present evaluation, it remained efficient, mainly due to its well-structured design that promoted noticeable accuracy. In contrast, the Summative had the lowest $\beta_M$ value with $\beta_S = 0.20$, indicating that although it possesses a desirable simplicity attribute, it lacks the robustness necessary to compensate for its limited elements and operations.
It is also pertinent to emphasize the relevance of elucidating the relationship between Accuracy ($MAD_M$) and Complexity ($CP_M$) that implicitly holds in our approach. The $MAD_M$ values indicate that the most complex methods (ST and ST-FIS) exhibited the lowest absolute deviations, suggesting they are more precise. Moreover, the ST-FIS scheme attained the lowest value, $MAD_{STFIS} = 0.06$, making it the most precise method among the four. The Rubric method, with $MAD_R = 0.13$, was the least precise, suggesting it may produce more inconsistent evaluations compared to more structured methods.
Table 7 reports the standard errors for the $MAD_M$, $CI_M$, and $\beta_M$ indicators for the four grading methods analyzed. $MAD_{ST}$ resulted in the least variability, suggesting better precision of ST in that dimension. $CI_S$ and $\beta_S$ achieved lower standard error values. Conversely, the ST-FIS method showed the greatest variability across all indicators, pointing to lower statistical robustness regarding the precision of its calculated values. Nevertheless, this should be taken in a relative sense. It is worth pointing out that the reported differences in standard error values among indicators are minimal and consequently do not significantly affect the overall comparative assessment of the methods.
According to the present results, a shift to STBAM is not merely a matter of notation or structure; it represents a methodological transformation through which to address the structural shortcomings of the traditional Summative and Rubric approaches. Table 9 provides a brief scheme that allows for the comparison of STBAM's features with those of conventional methods. Specifically, while Summative assessment operates through the linear aggregation of teacher-assigned grades across predefined content sections, STBAM evaluates student performance through explicit binary indicators linked to core competencies, namely Learning (L), Procedure (P), and Attitude (A), which are tied to specific instructions within a learning activity. Moreover, STBAM is not simply a variant of conventional grading tools but a comprehensive framework designed to address key limitations found in both the Summative and Rubric approaches. Summative assessment relies on weighted averages of teacher-assigned grades, and the Rubric method introduces structure through descriptive performance levels and weighted criteria. Both methods assume evaluative precision and still involve significant subjectivity.
Furthermore, STBAM enables holistic evaluation, ensuring that students not only “know” (L) but can “apply” (P) and “engage constructively” (A) with the task. This is achieved by requiring that each instruction be associated with observable, assessable indicators, making the process more transparent and less susceptible to evaluator bias. STBAM evaluates competencies through observable, binary success indicators distributed across Learning, Procedure, and Attitude attributes linked to task-specific directions. This reduces ambiguity and promotes a more objective, traceable, holistic evaluation form. However, despite the performance shown, the reliance of STBAM on fuzzy logic requires computational resources and training, which may limit its adoption in resource-constrained educational settings. Moreover, performing a pending analysis of scalability, teacher training, and cultural adaptability would be essential to strengthen the potential of the STBAM to be adopted as a reliable grading alternative to the conventional grading approaches.

5. Conclusions

According to the present analysis, the most balanced method in terms of the assessed Complexity, Consistency, and Accuracy indicators is the ST-FIS. It produced the highest values of $CI_M$ and $\beta_M$. This suggests that this approach is indeed complex yet well-structured and stable. Its low value of the $MAD_M$ index suggests that it provides precise evaluations. The present exploration also suggests that simpler methods, like the Summative, may not be sufficient in terms of consistency and accuracy. Despite its low complexity, its $MAD_M$ was among the highest, indicating that its ability to capture variability in data is limited. Its low $CI_M$ and $\beta_M$ suggest it may not be the best choice for robust evaluations.
Our results reveal that the Rubric and ST balance simplicity and accuracy. The Rubric displayed good Consistency ($CI_R = 1.54$) and a reasonably attenuated penalization for Complexity ($\beta_R = 0.52$), which raises it as a suitable intermediate choice. ST (with its higher $CP_{ST} = 0.79$ and $\beta_{ST} = 0.71$) nears the ST-FIS in quality without being as complex. ST-FIS could be the best option if the goal is to minimize variability and maximize evaluation robustness. If a more straightforward yet effective method is required, ST or the Rubric may be suitable choices. According to the present results, the Summative method could be the least recommended if precision and structural rigor are essential. ST-FIS could be the best alternative for evaluations requiring precise and structured measurement. However, if a balance between simplicity and accuracy is needed, ST and the Rubric could be viable options. Our exploration also suggests that the Summative scheme is best recommended for cases where simplicity is prioritized over accuracy and consistency in the sense we are dealing with here. It is also worth emphasizing that, alongside the outstanding performance shown in the present study, both the ST and ST-FIS could be straightforwardly adapted for the evaluation of students' acquisition of competence [24]. However, a last comment is necessary. As stated in the Discussion section, performing a pending analysis of scalability, teacher training, and cultural adaptability will be essential to strengthen the potential of the ST-FIS to be adopted as a dependable alternative to conventional grading approaches.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app15116014/s1. We provide Supplementary Materials containing the codes utilized to produce simulation results and perform Consistency and Efficiency assessments.

Author Contributions

Conceptualization, C.L.-R. and H.A.E.-H.; Methodology, C.L.-R. and H.A.E.-H.; Software, C.L.-R.; Validation, H.A.E.-H. and E.V.-D.; Formal analysis, H.A.E.-H. and E.V.-D.; Investigation, H.A.E.-H.; Resources, H.A.E.-H.; Writing—original draft, C.L.-R. and H.A.E.-H.; Writing—review & editing, C.L.-R., H.A.E.-H., E.V.-D. and H.H.-A.; Supervision, H.A.E.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data produced by simulations are available from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. The Summative Assessment Method

In this appendix, we illustrate the formalities of the Summative grade assignation procedure.

Appendix A.1. Defining the Sections $s_i$ That Make Up the Activity $A$ to Be Evaluated

To illustrate the formalities of the summative grade assignment procedure according to the statement of Equation (1), we arrange a matrix $A$ hosting a number $m = 5$ of sections. Therefore, we introduce the circumscribing $(5 \times 1)$ matrix
$$A = \left[\ s_1\ \ s_2\ \ s_3\ \ s_4\ \ s_5\ \right]^T.$$

Appendix A.2. Defining the Weights $w_i$ of Each of the Learning Sections $s_i$

Recalling Equation (2), we likewise adopt a matrix of weights $W$ with the same number of entries as the matrix $A$ of learning sections, namely,
$$W = \left[\ w_1\ \ w_2\ \ w_3\ \ w_4\ \ w_5\ \right]^T \equiv \left[\ 0.2\ \ 0.3\ \ 0.1\ \ 0.2\ \ 0.2\ \right]^T.$$

Appendix A.3. Specifying the Scores $p_{ji}$ Assigned to the $j$-th Student in the Different Sections $s_i$

Considering Equation (5), we set the percentage scores or grades $p_{10\,i}$ assigned by the teacher and obtained by the 10th student for their performance in carrying out the activities specified in each assignment section $s_i$. Hence, for student $x_{10}$, we compose the matrix $P(10, A)$, namely,
$$P(10, A) = \left[\ p_{10\,1}\ \ p_{10\,2}\ \ p_{10\,3}\ \ p_{10\,4}\ \ p_{10\,5}\ \right]^T \equiv \left[\ 0.85\ \ 0.90\ \ 0.75\ \ 0.80\ \ 0.95\ \right]^T.$$
Then, it follows from Equation (8) that the grade $es(10, A, W)$ assigned by the teacher to student $x_{10}$ is the dot product of $P(10, A)$ and $W$. Then, Equations (A2) and (A3) produce
$$es(10, A, W) = \left[0.85,\ 0.90,\ 0.75,\ 0.80,\ 0.95\right] \cdot \left[0.2,\ 0.3,\ 0.1,\ 0.2,\ 0.2\right] = 0.86.$$

Appendix A.4. Assigning the Matrix $PO(10, A)$ of Objective Scores for the 10th Student in Different Sections $s_i$

For the $S$ method, Equation (9) sets the associated matrix $PO(10, A)$ of objective scores for student $x_{10}$. Thus, we arranged the matrix $PO(10, A)$ given by
$$PO(10, A) = \left[\ po_{10\,1}\ \ po_{10\,2}\ \ po_{10\,3}\ \ po_{10\,4}\ \ po_{10\,5}\ \right]^T \equiv \left[\ 0.66\ \ 0.78\ \ 0.87\ \ 0.79\ \ 0.94\ \right]^T.$$
As set by Equation (10), we have an objective grade $eso(10, A, W)$ given by the scalar product of $PO(10, A)$ and $W$. Then, Equations (A2) and (A6) yield
$$eso(10, A, W) = \left[0.66,\ 0.78,\ 0.87,\ 0.79,\ 0.94\right] \cdot \left[0.2,\ 0.3,\ 0.1,\ 0.2,\ 0.2\right] = 0.80.$$

Appendix A.5. Calculation Example of the Rating Estimation Error Relative to $\overline{eso}(10, A, W)$

According to Equation (11), the grade $es(10, A, W)$ assigned by the teacher to student $x_{10}$ and given by the scalar product $P(10, A) \cdot W$ is $es(10, A, W) = 0.86$. Then, the associated absolute deviation of $es(10, A, W)$ relative to $\overline{eso}(10, A, W)$ becomes
$$\Delta es(10, es, \overline{eso}) = \left| \overline{eso}(10, A, W) - es(10, A, W) \right| = \left| 4/5 - 0.86 \right| = 0.06.$$

Appendix A.6. Obtaining $MAD_S$ as Determined by the Cumulative Estimation Error $\overline{\Delta es}(j, es_q, \overline{eso})$

As stated around Equation (12), we consider $q$-simulated replicates $p(q)_{ji}$ of the score $p_{ji}$ assigned by the teacher to student $x_j$, where $q = 1, 2, \ldots, 100$. We defined the $MAD_S$ indicator by setting the average grade $\overline{eso}(j, A, W)$ to a value of $1/2$ in the Cumulative Estimation Error $\overline{\Delta es}(j, es_q, \overline{eso})$ of Equation (14). Then we have
$$MAD_S = \frac{\sum_{q=1}^{100} \left| 1/2 - es_q(j, A, W) \right|}{100} = 0.11.$$

Appendix A.7. Obtaining the Complexity ($CP_S$), Consistency ($CI_S$), and Efficiency ($\beta_S$) Index Values

For this example, we set the numbers of components $N_S = 3$, elements $E_S = 15$, and operations $Q_S = 19$ that the Summative method involves (cf. Table A1). The minimum and maximum values of the $N_M$, $E_M$, and $Q_M$ entries across the methods addressed in this study are given in Table 2. Then, according to Equations (89)–(91), for the Summative method we take pondered-normalized indicator values of $P_N(N_S) = 0.23$, $P_E(E_S) = 0.13$, and $P_Q(Q_S) = 0.31$ (cf. Table 3). Also, according to Equation (96), we assign the values $a = 0.4$, $b = 0.3$, and $c = 0.3$ (cf. Table 4), reflecting the relative importance of each of the aforementioned pondered-normalized factors in the determination of $CP_S$. Then, as stated by Equation (95), we have
$$CP_S = 0.4\,(0.23) + 0.3\,(0.13) + 0.3\,(0.31) = 0.23.$$
And, corresponding to Equations (100), (A8), and (A9), we have a $CI_S$ value of
$$CI_S = \frac{1}{1 + 0.11 \times 0.23} + CP_S = 1.20.$$
In turn, Equation (101) sets
$$\beta_S = \frac{(1.20)(0.23)}{(0.11 + 1)(0.23 + 1)} = 0.20.$$
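The arithmetic of this appendix can be checked with a few lines of Python; the vectors and the $MAD_S$ and $CP_S$ values are copied from the example above, and small discrepancies with the printed figures are due to rounding.

```python
# Checking the Appendix A example end to end.
W    = [0.2, 0.3, 0.1, 0.2, 0.2]
P10  = [0.85, 0.90, 0.75, 0.80, 0.95]
PO10 = [0.66, 0.78, 0.87, 0.79, 0.94]

es  = sum(p * w for p, w in zip(P10, W))   # ~0.865, reported as 0.86
eso = sum(p * w for p, w in zip(PO10, W))  # ~0.799, reported as 0.80

MAD_S, CP_S = 0.11, 0.23                   # as computed above
CI_S   = 1 / (1 + MAD_S * CP_S) + CP_S               # ~1.205, reported as 1.20
beta_S = CI_S * CP_S / ((MAD_S + 1) * (CP_S + 1))    # ~0.203, reported as 0.20
print(es, eso, CI_S, beta_S)
```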
Table A1. Components, elements, and operations for the Summative scheme.

| Equation | Component | Elements | Operations |
|---|---|---|---|
| (A2) | $W = \left[\ 0.2\ \ 0.3\ \ 0.1\ \ 0.2\ \ 0.2\ \right]^T$ | 5 | 5 |
| (A3) | $P(10, A) = \left[\ 0.85\ \ 0.90\ \ 0.75\ \ 0.80\ \ 0.95\ \right]^T$ | 5 | 5 |
| (A7) | $es(10, A, W) = \left[0.90,\ 0.85,\ 0.88,\ 0.92,\ 0.87\right] \times \left[0.2,\ 0.3,\ 0.1,\ 0.2,\ 0.2\right]$ | 5 | 9 |

Appendix B. The Rubric Method

In this appendix, we first explain the procedures for grade assignation using the Rubric method.

Appendix B.1. Establishing $T(R)$ and the $H$, $P$, and $PL$ Matrices

The rows of $T(R)$ accommodate the criteria $h_i$ (Table A2). These indicate the specific aspects of the task to be assessed. We will assume that we have a number $m = 5$ of them. We also arrange for $k = 5$ performance levels $l_s$. In the cells of $T(R)$, we describe the 25 joining linguistic descriptors $u_{is}$ that specify the characteristics or skills that the student must demonstrate to opt for the $s$-th performance level $l_s$ while carrying out the provisions of the $i$-th criterion.
Table A2. Example of a Rubric table $T(R)$.

| Criterion | Deficient | Low | Acceptable | Good | Excellent |
|---|---|---|---|---|---|
| Clarity and coherence | Confusing information and lacking coherence. | Information barely improved, still lacking coherence. | Information understandable, but several inconsistencies. | Information is clear and coherent with some exceptions. | Information is clear and coherent at all times. |
| Depth of analysis | The analysis is very superficial or non-existent. | The analysis has scarcely any depth and is still superficial. | The analysis is superficial and shows a basic understanding of the topic. | The analysis is adequate. It shows a good understanding of the topic. | The analysis is thorough. It demonstrates a complete understanding. |
| Use of sources and evidence | Does not use reliable sources or present evidence. | Mentions sources but lacks evidence. | Uses some sources, not all of them reliable or well integrated. | Uses reliable sources, not always integrated effectively. | Uses trusted sources and integrates them effectively. |
| Originality and creativity | Lacks originality and creativity. | Requires more originality; shows poor creativeness. | Shows originality and creativity. | Displays a good level of originality and creativity. | Demonstrates a high level of originality and creativity. |
| Presentation and format | Poor presentation and inconsistent, inappropriate formatting. | Presentation acceptable but still inappropriate formatting. | Acceptable presentation but with several errors in the format. | Good presentation, with minor errors in formatting. | Presentation is professional and formatting is consistent and adequate. |
According to the composition of $T(R)$ above, the resulting characterization of the matrix $H$ defined by Equation (15) with $m = 5$ becomes
$$H = \left[\ h_1\ \ h_2\ \ h_3\ \ h_4\ \ h_5\ \right]^T \equiv \left[\ \text{Clarity and coherence}\ \ \text{Depth of analysis}\ \ \text{Use of sources and evidence}\ \ \text{Originality and creativity}\ \ \text{Presentation and format}\ \right]^T.$$
Next, let us assign the relative weight $\rho_i$ of each criterion $h_i$ in the matrix $H$. This typifies the matrix $P$ given by Equation (16); particularly, it is set to be
$$P = \left[\ \rho_1\ \ \rho_2\ \ \rho_3\ \ \rho_4\ \ \rho_5\ \right]^T \equiv \left[\ 0.2\ \ 0.3\ \ 0.1\ \ 0.2\ \ 0.2\ \right]^T.$$
To tie Equation (18) to the present venues, as determined by the given conformation of $T(R)$, we must consider five levels of performance, each identified by a linguistic descriptor denoted through a symbol $l_s$ for $s = 1, 2, \ldots, 5$; to be precise, we have
$$PL = \left[\ l_1\ \ l_2\ \ l_3\ \ l_4\ \ l_5\ \right]^T \equiv \left[\ \text{Deficient}\ \ \text{Low}\ \ \text{Acceptable}\ \ \text{Good}\ \ \text{Excellent}\ \right]^T.$$

Appendix B.2. Defining the $POL$ Matrix

To characterize the $POL$ matrix of Equation (21), we must consider entries $\alpha(l_s)$ that are numerical assignations to the linguistic terms $l_s$. Furthermore, these numbers $\alpha(l_s)$ are assigned to the performance levels $l_s$, which are rated according to the hierarchical scale specified by Inequality (19). Then, we arrange
$$POL = \left[\ \alpha(l_1)\ \ \alpha(l_2)\ \ \alpha(l_3)\ \ \alpha(l_4)\ \ \alpha(l_5)\ \right]^T \equiv \left[\ 1\ \ 2\ \ 3\ \ 4\ \ 5\ \right]^T.$$

Appendix B.3. Defining the Matrix $A(j, H, PL)$ and the Performance Vector $S(j, H, PL)$

At this point, the explanation requires referring to a particular student, so let us choose student $x_{10}$ for that aim. Then, in the adapted rubric, we must consider a matrix $A(10, H, PL)$ with entries $a(j)_{is}$ given by Equation (25) for $i = 1, 2, \ldots, 5$ and $s = 1, 2, \ldots, 5$. Particularly, as stated by Equation (26), for the named student $x_{10}$, the $i$-th row $R_i A(10, H, PL)$ of the $A(10, H, PL)$ matrix becomes
$$R_i A(10, H, PL) = \left(a(10)_{i1},\ a(10)_{i2},\ a(10)_{i3},\ a(10)_{i4},\ a(10)_{i5}\right).$$
As we explained around Equation (28), all the entries of $R_i A(10, H, PL)$ but one will vanish. Such an entry is denoted $a(10)_{i s(i)}$ and defined by Equation (28). Correspondingly, as stated by Equation (30), for student $x_{10}$ we should conceive of a performance vector $S(10, H, PL)$ bearing components $a(10)_{i s(i)}$ for $i = 1, 2, \ldots, 5$:
$$S(10, H, PL) = \left(a(10)_{1 s(1)},\ a(10)_{2 s(2)},\ a(10)_{3 s(3)},\ a(10)_{4 s(4)},\ a(10)_{5 s(5)}\right).$$
Therefore, for student $x_{10}$, using Equation (28), we can assign a performance vector $S(10, H, PL)$ with the following components:
$$S(10, H, PL) = \left[\ 3\rho_1\ \ 5\rho_2\ \ 2\rho_3\ \ 3\rho_4\ \ 2\rho_5\ \right].$$
Then, $S(10, H, PL)$ provides a compact representation of $A(10, H, PL)$, a matrix bearing the following form:
$$A(10, H, PL) = \begin{pmatrix} 0 & 0 & 3\rho_1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 5\rho_2 \\ 0 & 2\rho_3 & 0 & 0 & 0 \\ 0 & 0 & 3\rho_4 & 0 & 0 \\ 0 & 2\rho_5 & 0 & 0 & 0 \end{pmatrix}.$$
For the addressed Rubric grade-assignation example, the sum of the entries of S(10, H, P_L), denoted T_S(10, H, P_L) and referred to in Equations (31) and (28), becomes
T_S(10, H, P_L) = Σ_{i=1}^{5} α(l_{s(i)}) ρ_i = 3ρ_1 + 5ρ_2 + 2ρ_3 + 3ρ_4 + 2ρ_5.

Appendix B.4. Calculate the Grade e r ( 10 , H , P L )

It turns out that, agreeing with Equation (33), the total direct quantitative grade e_r(10, H, P_L) assigned by the teacher to the 10th student using the Rubric's scheme becomes
e_r(10, H, P_L) = T_S(10, H, P_L)/5.
Substituting Equation (A20) yields
e_r(10, H, P_L) = (3ρ_1 + 5ρ_2 + 2ρ_3 + 3ρ_4 + 2ρ_5)/5,
and, with the weights of Equation (A13),
e_r(10, H, P_L) = (0.24 + 0.80 + 0.24 + 0.84 + 0.72)/5 = 2.84/5 ≈ 0.57.
Moreover, the ratio e_r(10, H, P_L) provides a criterion through which to assign a qualitative assessment e_rQ(10, H, P_L) (cf. Equation (35)) for the overall performance of the 10th student. Accordingly, to assign such a qualitative grade, we adopt the following inference rule:
e_rQ(10, H, P_L) = Deficient if 0 ≤ e_r(10, H, P_L) < 0.2; Low if 0.2 ≤ e_r(10, H, P_L) < 0.4; Acceptable if 0.4 ≤ e_r(10, H, P_L) < 0.6; Good if 0.6 ≤ e_r(10, H, P_L) < 0.8; Excellent if 0.8 ≤ e_r(10, H, P_L) ≤ 1.
Therefore, as implied by Equation (A24), for student x_10 we assign the qualitative grade
e_rQ(10, H, P_L) ≐ Acceptable.
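To make the arithmetic above easy to audit, the following Python sketch reproduces the Rubric computation for student x_10. The weights and attained levels are those of Equations (A13) and (A18); the function names are illustrative only, not part of the method's formal notation.

```python
# Rubric grade for student x_10 (cf. Equations (A20)-(A24)).
WEIGHTS = [0.08, 0.16, 0.12, 0.28, 0.36]   # rho_i per criterion (Equation (A13))
LEVELS = [3, 5, 2, 3, 2]                   # attained level s(i) per criterion (Equation (A18))
M = 5                                      # number of performance levels

def rubric_grade(levels, weights, m=M):
    """Direct quantitative grade e_r = (sum_i s(i) * rho_i) / m."""
    return sum(s * w for s, w in zip(levels, weights)) / m

def qualitative(e_r):
    """Qualitative assessment via the inference rule of Equation (A24)."""
    bands = [(0.2, "Deficient"), (0.4, "Low"), (0.6, "Acceptable"),
             (0.8, "Good"), (1.0001, "Excellent")]
    for upper, label in bands:
        if e_r < upper:
            return label

e_r = rubric_grade(LEVELS, WEIGHTS)
print(round(e_r, 2), qualitative(e_r))     # 0.57 Acceptable
```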

Appendix B.5. The Rating Estimation Error in the Rubric’s Evaluation of Student x 10

To illustrate the calculations that obtain the rating estimation error in the Rubric method, we take S(10, H, P_L) as given by Equation (A18), while the objective performance vector is fixed as
S_O(10, H, P_L) = (5ρ_1, 5ρ_2, 5ρ_3, 4ρ_4, 1ρ_5).
Therefore, as stated by Equation (38), the objective grade e_ro(10, H, P_L) assigned to the 10th student becomes
e_ro(10, H, P_L) = (5(0.08) + 5(0.16) + 5(0.12) + 4(0.28) + 1(0.36))/5 = 3.28/5 ≈ 0.66.
Then, from Equations (39) and (A23), the subsequent absolute deviation becomes
Δe_r(10, S, S_O) = |0.57 − 0.66| = 0.09.

Appendix B.6. Estimation of the Mean Absolute Deviation e r ¯ j , H , P L

Now, as explained around Equation (40), for q = 1, …, n_k, we consider the following replicates:
S_q(j, H, P_L) = (a(q)_{1s(1)}^j, a(q)_{2s(2)}^j, a(q)_{3s(3)}^j, a(q)_{4s(4)}^j, a(q)_{5s(5)}^j).
Then, letting ē_ro(j, H, P_L) stand for the average value of the objective grades e_ro(j, H, P_L) (cf. Equation (38)), we can consider the replicate absolute deviations
Δe_r(j, e_rq, ē_ro) = |ē_ro(j, H, P_L) − e_rq(j, H, P_L)|,
with e_rq(j, H, P_L) given by Equation (42), and
T_{S_q}(j, H, P_L) = Σ_{i=1}^{m} a(q)_{i s(i)}^j.
Then, setting δ_m = 100 and letting ē_r(j, H, P_L) denote the Mean Absolute Deviation of the simulated values e_rq(j, H, P_L) relative to ē_ro(j, H, P_L), we have
ē_r(j, H, P_L) = (1/100) Σ_{q=1}^{δ_m} |ē_ro(j, H, P_L) − e_rq(j, H, P_L)|,
which, under the convention ē_ro(j, H, P_L) = 1/2, yields
ē_r(j, H, P_L) = 0.13.
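The Mean Absolute Deviation reported above can be approximated with a short Monte Carlo sketch. The replicate-generation mechanism is only referenced here (Equation (40)), so the sampling scheme below, which draws each criterion level uniformly from {1, …, 5}, is our assumption; under it, the simulated MAD lands near the reported 0.13.

```python
import random

random.seed(7)
WEIGHTS = [0.08, 0.16, 0.12, 0.28, 0.36]   # rho_i (Equation (A13))
REFERENCE = 0.5                            # convention for the mean objective grade
DELTA_M = 100                              # number of replicates (delta_m)

def simulated_grade():
    # Draw a level s(i) in {1,...,5} per criterion and apply Equation (A21).
    return sum(random.randint(1, 5) * w for w in WEIGHTS) / 5

mad = sum(abs(REFERENCE - simulated_grade()) for _ in range(DELTA_M)) / DELTA_M
print(round(mad, 2))                       # typically near the reported 0.13
```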

Appendix B.7. Obtaining the Complexity CP_R, Consistency CI_R, and Efficiency β_R Index Values

For this example, the Rubric method involves N_R = 6 components, E_R = 47 elements, and Q_R = 60 operations (cf. Table A3). The minimum and maximum values of the N_M, E_M, and Q_M entries across the methods addressed in this study are given in Table 2. Then, according to Equations (89)–(91), for the Rubric method we take the Pondered–Normalized indicator values P_N(N_R) = 0.46, P_E(E_R) = 0.41, and P_Q(Q_R) = 1 (cf. Table 3). Also, according to Equation (96), we assign the values a = 0.4, b = 0.3, and c = 0.3 (cf. Table 4), reflecting the relative importance of each of the aforementioned Pondered–Normalized factors in the determination of CP_R. Then, as stated by Equation (95), we have
CP_R = 0.4(0.46) + 0.3(0.41) + 0.3(1) ≈ 0.61.
Correspondingly, Equation (A33) sets the MAD_R indicator at
MAD_R = 0.13.
Then, according to Equations (100) and (A34)–(A35), we have a CI_R value of
CI_R = 1/(1 + 0.13 × 0.61) + 0.61 ≈ 1.54.
In turn, Equations (101) and (A34)–(A36) set
β_R = (1.54)(0.61)/((0.13 + 1)(0.61 + 1)) ≈ 0.52.
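The full chain from structural counts to the three indices can be verified with the sketch below, which reproduces the CP_M, CI_M, and β_M columns of Table 6 within rounding for all four methods. The counts come from Table 1 and the MAD values from Table 6; normalizing each count by its column maximum is our reading of Equations (89)–(91), inferred from Tables 2 and 3 rather than stated here.

```python
A, B, C = 0.4, 0.3, 0.3   # weighting coefficients a, b, c of Equation (96) (Table 4)

# Per method: (N_M, E_M, Q_M) from Table 1 and MAD_M from Table 6.
METHODS = {
    "Summative": (3, 15, 19, 0.11),
    "Rubric":    (6, 47, 60, 0.13),
    "ST":        (8, 112, 48, 0.07),
    "ST-FIS":    (13, 110, 53, 0.06),
}
N_MAX = max(v[0] for v in METHODS.values())
E_MAX = max(v[1] for v in METHODS.values())
Q_MAX = max(v[2] for v in METHODS.values())

for name, (n, e, q, mad) in METHODS.items():
    cp = A * n / N_MAX + B * e / E_MAX + C * q / Q_MAX   # Equation (95)
    ci = 1.0 / (1.0 + mad * cp) + cp                     # Equation (100)
    beta = ci * cp / ((mad + 1.0) * (cp + 1.0))          # Equation (101)
    print(f"{name}: CP={cp:.2f} CI={ci:.2f} beta={beta:.2f}")
```

Running the loop yields (CP, CI, β) values of (0.23, 1.20, 0.20) for the Summative method, (0.61, 1.54, 0.52) for the Rubric, (0.79, 1.73, 0.71) for the ST, and (0.96, 1.91, 0.88) for the ST-FIS, matching Table 6.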
Table A3. Components, elements, and operations for the Rubric scheme.
Equation | Component | Elements | Operations
(A13) | P = (0.08, 0.16, 0.12, 0.28, 0.36)^T | 5 | -
(A15) | O_PL = (1, 2, 3, 4, 5)^T | 5 | -
(A17) | S(10, H, P_L) = (a(10)_{1s(1)}, a(10)_{2s(2)}, a(10)_{3s(3)}, a(10)_{4s(4)}, a(10)_{5s(5)}) | 5 | 25
(25) and (A19) | A(10, H, P_L), the 5 × 5 matrix of Equation (A19) | 25 | 25
(A20) | T_S(10, H, P_L) = 3ρ_1 + 5ρ_2 + 2ρ_3 + 3ρ_4 + 2ρ_5 | 5 | 9
(A21) | e_r(10, H, P_L) = T_S(10, H, P_L)/5 | 2 | 1
Totals | N_R = 6 components | E_R = 47 | Q_R = 60

Appendix C. The ST Method

In this appendix, we illustrate the formalities of the ST grade assignation procedure.

Appendix C.1. Agreeing with Equation (45), We Arrange the Matrix D Hosting a Number n = 5 of Directions

D = (d_1, d_2, d_3, d_4, d_5)^T ≐ (Comprehension and Concept Identification, Problem-Solving Application, Research and Evidence Integration, Practical Implementation or Experimentation, Reflection and Synthesis)^T.
Now, as described in Equation (46), to shape the configuration of the ST structure, we need to combine the matrices L_i, P_i, and A_i. We do so by referring to each direction d_i.
Direction d_1: Comprehension and Concept Identification (task: read and summarize key concepts from the material provided). For the assemblage of the matrices L_1, P_1, and A_1 (cf. Equation (46)), we let π_{L,1} = 3, π_{P,1} = 5, and π_{A,1} = 2; thus, we obtain
  • L_1 (Cognitive Attributes, knowledge required for the task):
    L_1 = (L_1^1, L_2^1, L_3^1)^T ≐ (Definition of key terms, Understanding of fundamental principles, Recognition of essential facts)^T
  • P_1 (Practical Application Attributes, how knowledge is applied):
    P_1 = (P_1^1, P_2^1, P_3^1, P_4^1, P_5^1)^T ≐ (Identifying key points in a text, Summarizing concepts concisely, Relating ideas to prior knowledge, Categorizing information logically, Distinguishing essential vs. nonessential detail)^T
  • A_1 (Performance and Behavioral Attributes, cognitive engagement):
    A_1 = (A_1^1, A_2^1)^T ≐ (Critical thinking, Logical reasoning)^T
Direction d_2: Problem-Solving Application (task: apply identified concepts to solve a given problem). To obtain the matrices L_2, P_2, and A_2, we let π_{L,2} = 2, π_{P,2} = 2, and π_{A,2} = 5 (cf. Equation (46)); thus, we have
  • L_2 (Cognitive Attributes):
    L_2 = (L_1^2, L_2^2)^T ≐ (Knowledge of problem-solving techniques, Understanding of relevant formulas)^T
  • P_2 (Practical Application Attributes):
    P_2 = (P_1^2, P_2^2)^T ≐ (Applying concepts to structured problems, Identifying appropriate methods for solutions)^T
  • A_2 (Performance and Behavioral Attributes):
    A_2 = (A_1^2, A_2^2, A_3^2, A_4^2, A_5^2)^T ≐ (Logical structuring of thought processes, Abstract reasoning, Detail-oriented problem solving, Justification of chosen methodology, Persistence in finding solutions)^T
Direction d_3: Research and Evidence Integration (task: find and integrate external sources into the analysis). For building up the matrices L_3, P_3, and A_3, we let π_{L,3} = 2, π_{P,3} = 1, and π_{A,3} = 4, respectively (cf. Equation (46)); then, we have
  • L_3 (Cognitive Attributes):
    L_3 = (L_1^3, L_2^3)^T ≐ (Understanding of credible sources, Awareness of different perspectives)^T
  • P_3 (Practical Application Attributes):
    P_3 = (P_1^3) ≐ (Identifying relevant academic sources)
  • A_3 (Performance and Behavioral Attributes):
    A_3 = (A_1^3, A_2^3, A_3^3, A_4^3)^T ≐ (Analytical reasoning, Critical evaluation of sources, Justifying research choices, Ethical use of information)^T
Direction d_4: Practical Implementation or Experimentation (task: conduct an experiment or case study, record observations, and analyze results). To acquire the matrices L_4, P_4, and A_4 (cf. Equation (46)), we let π_{L,4} = 3, π_{P,4} = 2, and π_{A,4} = 4, respectively; then, we have
  • L_4 (Cognitive Attributes):
    L_4 = (L_1^4, L_2^4, L_3^4)^T ≐ (Understanding of experimental methods, Awareness of key variables, Familiarity with data collection techniques)^T
  • P_4 (Practical Application Attributes):
    P_4 = (P_1^4, P_2^4)^T ≐ (Designing an experiment or case study, Implementing a step-by-step procedure)^T
  • A_4 (Performance and Behavioral Attributes):
    A_4 = (A_1^4, A_2^4, A_3^4, A_4^4)^T ≐ (Systematic approach to investigation, Attention to detail in observations, Analytical thinking in data interpretation, Accuracy in reporting results)^T
Direction d_5: Reflection and Synthesis (task: reflect on the learning experience and articulate personal insights). To set the form of the matrices L_5, P_5, and A_5, we let π_{L,5} = 5, π_{P,5} = 2, and π_{A,5} = 3, respectively (cf. Equation (46)); in this way, we obtain
  • L_5 (Cognitive Attributes):
    L_5 = (L_1^5, L_2^5, L_3^5, L_4^5, L_5^5)^T ≐ (Awareness of personal learning process, Understanding of strengths and weaknesses, Ability to relate learning to real-world applications, Recognition of interdisciplinary connections, Capacity to synthesize acquired knowledge)^T
  • P_5 (Practical Application Attributes):
    P_5 = (P_1^5, P_2^5)^T ≐ (Writing a structured reflection, Providing examples of personal learning experiences)^T
  • A_5 (Performance and Behavioral Attributes):
    A_5 = (A_1^5, A_2^5, A_3^5)^T ≐ (Self-assessment skills, Critical reflection on experiences, Logical structuring of personal insights)^T
Furthermore, as stated by Equation (47), to qualify the effort by the student while addressing direction d_i, we must consider an attainment indicator l(j)_k^i associated with the trait L_k^i. Similarly, p(j)_k^i represents the corresponding success indicator for the attribute P_k^i, and a(j)_k^i that for the attribute A_k^i. Therefore, formally, for i = 1, 2, …, 5, the structure of the ST protocol also includes, for direction d_i, the indicator matrices I_L(j)^i, I_P(j)^i, and I_A(j)^i:
I_L(j)^1 = (l(j)_1^1, l(j)_2^1, l(j)_3^1)^T,  I_P(j)^1 = (p(j)_1^1, p(j)_2^1, p(j)_3^1, p(j)_4^1, p(j)_5^1)^T,  I_A(j)^1 = (a(j)_1^1, a(j)_2^1)^T
I_L(j)^2 = (l(j)_1^2, l(j)_2^2)^T,  I_P(j)^2 = (p(j)_1^2),  I_A(j)^2 = (a(j)_1^2, a(j)_2^2, a(j)_3^2, a(j)_4^2, a(j)_5^2)^T
I_L(j)^3 = (l(j)_1^3, l(j)_2^3, l(j)_3^3)^T,  I_P(j)^3 = (p(j)_1^3),  I_A(j)^3 = (a(j)_1^3, a(j)_2^3, a(j)_3^3, a(j)_4^3)^T
I_L(j)^4 = (l(j)_1^4, l(j)_2^4, l(j)_3^4)^T,  I_P(j)^4 = (p(j)_1^4, p(j)_2^4)^T,  I_A(j)^4 = (a(j)_1^4, a(j)_2^4, a(j)_3^4, a(j)_4^4)^T
I_L(j)^5 = (l(j)_1^5, l(j)_2^5, l(j)_3^5, l(j)_4^5, l(j)_5^5)^T,  I_P(j)^5 = (p(j)_1^5, p(j)_2^5)^T,  I_A(j)^5 = (a(j)_1^5, a(j)_2^5, a(j)_3^5)^T

Appendix C.2. Assigning Performance Marks (✓ or 🗴)

Equations (48)–(58) establish the steps yielding the grade e_st(j, z, D) assigned directly by the teacher using the conventional modulus of the ST scheme. Firstly, we record performance marks (✓ for an attained indicator, 🗴 otherwise) for the indicators l(j)_k^i, p(j)_k^i, and a(j)_k^i composing Equations (A54) through (A58); for the student of this example (cf. Table A4), these are
I_L(j)^1 = (🗴, 🗴, ✓)^T,  I_P(j)^1 = (✓, ✓, ✓, ✓, ✓)^T,  I_A(j)^1 = (✓, 🗴)^T
I_L(j)^2 = (🗴, ✓)^T,  I_P(j)^2 = (🗴),  I_A(j)^2 = (✓, ✓, ✓, 🗴, ✓)^T
I_L(j)^3 = (🗴, 🗴, 🗴)^T,  I_P(j)^3 = (✓),  I_A(j)^3 = (✓, ✓, 🗴, 🗴)^T
I_L(j)^4 = (🗴, ✓, 🗴)^T,  I_P(j)^4 = (✓, 🗴)^T,  I_A(j)^4 = (✓, ✓, 🗴, ✓)^T
I_L(j)^5 = (✓, ✓, ✓, ✓, 🗴)^T,  I_P(j)^5 = (✓, ✓)^T,  I_A(j)^5 = (🗴, 🗴, 🗴)^T

Appendix C.3. Total Number of Indicators for d i

According to Equations (48)–(52), we obtain, for each direction d_i, the total numbers of indicators that display a binary pointer ✓, namely,
z(j)_L^1 = 1,  z(j)_P^1 = 5,  z(j)_A^1 = 1
z(j)_L^2 = 1,  z(j)_P^2 = 0,  z(j)_A^2 = 4
z(j)_L^3 = 0,  z(j)_P^3 = 1,  z(j)_A^3 = 2
z(j)_L^4 = 1,  z(j)_P^4 = 1,  z(j)_A^4 = 3
z(j)_L^5 = 4,  z(j)_P^5 = 2,  z(j)_A^5 = 0

Appendix C.4. Numbers of Indicators Showing a Binary Pointer

As stated by Equation (54), the numbers of indicators corresponding to L, P, and A that show a binary pointer ✓ are, respectively,
z(j)_L = 7,  z(j)_P = 9,  z(j)_A = 10.
The steps described through Equations (A38)–(A69) can be summarized in the following T(ST) table.
Table A4. Directions, attributes, indicators, and performance pointers composing the ST scheme.
Direction (d_i) | Learning (L) | Procedure (P) | Attitude (A)
d_1 | 🗴 L_1^1, 🗴 L_2^1, ✓ L_3^1 | ✓ P_1^1, ✓ P_2^1, ✓ P_3^1, ✓ P_4^1, ✓ P_5^1 | ✓ A_1^1, 🗴 A_2^1
d_2 | 🗴 L_1^2, ✓ L_2^2 | 🗴 P_1^2 | ✓ A_1^2, ✓ A_2^2, ✓ A_3^2, 🗴 A_4^2, ✓ A_5^2
d_3 | 🗴 L_1^3, 🗴 L_2^3, 🗴 L_3^3 | ✓ P_1^3 | ✓ A_1^3, ✓ A_2^3, 🗴 A_3^3, 🗴 A_4^3
d_4 | 🗴 L_1^4, ✓ L_2^4, 🗴 L_3^4 | ✓ P_1^4, 🗴 P_2^4 | ✓ A_1^4, ✓ A_2^4, 🗴 A_3^4, ✓ A_4^4
d_5 | ✓ L_1^5, ✓ L_2^5, ✓ L_3^5, ✓ L_4^5, 🗴 L_5^5 | ✓ P_1^5, ✓ P_2^5 | 🗴 A_1^5, 🗴 A_2^5, 🗴 A_3^5
Totals | z(j)_L = 7 | z(j)_P = 9 | z(j)_A = 10

Appendix C.5. Numbers of Indicators in L , P , and A

Correspondingly, agreeing with Equation (55), the total numbers of indicators in L, P, and A are, respectively, given by
T_L = 16,  T_P = 11,  T_A = 18.

Appendix C.6. Total Number of Indicators Composing the ST Scheme

Relating to Equation (56), the total number T_I of indicators composing the ST scheme becomes
T_I = T_L + T_P + T_A = 45.

Appendix C.7. Vector z ( j , D ) of Inputs

Now, going along with Equation (57), the vector z(j, D) of grading inputs (z(j)_L, z(j)_P, z(j)_A) becomes
z(j, D) = (z(j)_L, z(j)_P, z(j)_A) = (7, 9, 10).

Appendix C.8. Assigning the e s t ( j , z , D ) Grade

Then, using Equation (58), the grade e_st(j, z, D) directly assigned by the teacher to the j-th student by relying on the direct ST scheme is
e_st(j, z, D) = (z(j)_L + z(j)_P + z(j)_A)/T_I = (7 + 9 + 10)/45 = 26/45 ≈ 0.57.
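The tally of Appendices C.2–C.8 can be condensed into a few lines of Python; the nested lists mirror the ✓/🗴 marks of Table A4 (1 standing for ✓), and the data layout is illustrative.

```python
# Marks per direction d_1..d_5 for the categories L, P, A (cf. Table A4); 1 = checkmark.
MARKS = {
    "L": [[0, 0, 1], [0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 1, 1, 0]],
    "P": [[1, 1, 1, 1, 1], [0], [1], [1, 0], [1, 1]],
    "A": [[1, 0], [1, 1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0]],
}

z = {cat: sum(map(sum, rows)) for cat, rows in MARKS.items()}   # checkmark counts (Equation (54))
T_I = sum(len(r) for rows in MARKS.values() for r in rows)      # total indicators (Equation (56))

print(z, T_I)                    # {'L': 7, 'P': 9, 'A': 10} 45
print(sum(z.values()) / T_I)     # 26/45 = 0.5777..., reported above truncated to 0.57
```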

Appendix C.9. Estimating the Absolute Deviation e s t ( j , z , z o , D )

As stated by Equation (62), we consider an objective performance rate e_st(j, z_o, D) of 4/5. Since, agreeing with Equation (A56), T_I = 45, any configuration of the vector z_o(j, D) (cf. Equation (61)) that yields z_o(j)_L + z_o(j)_P + z_o(j)_A = 36 will suffice to achieve e_st(j, z_o, D) = 4/5. Thus, we could set
z_o(j, D) = (11, 10, 15).
Then, the grade e_st(j, z_o, D) will indeed be
e_st(j, z_o, D) = (11 + 10 + 15)/45 = 36/45 = 0.80.
Therefore, agreeing with Equation (63) and using the values obtained in Equations (A73) and (A75), the absolute deviation Δe_st(j, z, z_o, D) becomes
Δe_st(j, z, z_o, D) = |0.80 − 0.57| = 0.23.

Appendix C.10. Estimating the Mean Absolute Deviation e s t ¯ ( j , z q , z o , D )

We now set a labeling index q = 1, 2, …, θ_m, with θ_m = 100, aimed at labeling the simulated replicates z(q,j)_L, z(q,j)_P, and z(q,j)_A (cf. Equation (64)). In this way, we obtained the vector z_q(j, D) defined by Equation (65), along with the simulated grades e_st(j, z_q, D) defined by Equation (66). Additionally, we considered a value ē_st(j, z_o, D) = 1/2 to stand for the average value of the e_st(j, z_o, D) grades assigned through the ST method. This in turn produced the absolute deviations Δe_st(j, z_q, z_o, D) specified by Equation (67). Then, the Mean Absolute Deviation of Equation (68) resulted in
ē_st(j, z_q, z_o, D) = (1/100) Σ_{q=1}^{100} |1/2 − e_st(j, z_q, D)| = 0.07.
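As with the Rubric, the replicate-generation details live elsewhere in the main text, so the sketch below adopts an explicit assumption: each of the 45 indicators is marked ✓ independently with probability 1/2. This centers the simulated grades on the 1/2 convention used above and yields a MAD close to the reported 0.07.

```python
import random

random.seed(11)
T_L, T_P, T_A = 16, 11, 18     # totals from Equation (55)
T_I = T_L + T_P + T_A          # 45 indicators overall
THETA_M = 100                  # number of replicates (theta_m)

def replicate_grade():
    # Assumed sampling: every indicator is an independent Bernoulli(1/2) mark.
    hits = sum(random.random() < 0.5 for _ in range(T_I))
    return hits / T_I

mad = sum(abs(0.5 - replicate_grade()) for _ in range(THETA_M)) / THETA_M
print(round(mad, 2))           # close to the reported 0.07
```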

Appendix C.11. Obtaining the Complexity CP_ST, Consistency CI_ST, and Efficiency β_ST Index Values

For this example, the ST method involves N_ST = 8 components, E_ST = 112 elements, and Q_ST = 48 operations (cf. Table A5). The minimum and maximum values of the N_M, E_M, and Q_M entries across the methods addressed in this study are given in Table 2. Then, according to Equations (89)–(91), for the ST method we take the Pondered–Normalized indicator values P_N(N_ST) = 0.61, P_E(E_ST) = 1.0, and P_Q(Q_ST) = 0.80 (cf. Table 3). Also, according to Equation (96), we assign the values a = 0.4, b = 0.3, and c = 0.3 (cf. Table 4), reflecting the relative importance of each of the aforementioned Pondered–Normalized factors in the determination of CP_ST, which we compute after the table below.
Table A5. Components, elements, and operations for the ST scheme.
Equation | Component | Elements | Operations
(A38) | D = (d_1, d_2, d_3, d_4, d_5)^T | 5 | -
(A41) | I_L(2)^1, I_P(2)^1, I_A(2)^1 (marks for d_1) | 10 | -
(A41) | I_L(2)^2, I_P(2)^2, I_A(2)^2 (marks for d_2) | 8 | -
(A41) | I_L(2)^3, I_P(2)^3, I_A(2)^3 (marks for d_3) | 8 | -
(A41) | I_L(2)^4, I_P(2)^4, I_A(2)^4 (marks for d_4) | 9 | -
(A41) | I_L(2)^5, I_P(2)^5, I_A(2)^5 (marks for d_5) | 10 | -
(A42) | z(2)_L^i, z(2)_P^i, z(2)_A^i for i = 1, …, 5 | 26 | 23
(A43) | z(2)_L = 7, z(2)_P = 9, z(2)_A = 10 | 15 | 12
(A44) | T_L = 16, T_P = 11, T_A = 18 | 15 | 12
(A45) | T_I = 45 | 1 | -
(A46) | z(2, D) = (7, 9, 10) | 3 | -
(A47) | e_st(2, z, D) = (7 + 9 + 10)/45 = 26/45 ≈ 0.57 | 2 | 1
The complexity pointer, as stated by Equation (95) and using the unrounded value P_N(N_ST) = 8/13 ≈ 0.615, resulted in
CP_ST = 0.4(0.615) + 0.3(1.0) + 0.3(0.80) ≈ 0.79.
And, according to Equation (A77), we have a MAD_ST value of 0.07. Then, agreeing with Equation (100), we obtain a CI_ST value of
CI_ST = 1/(1 + 0.07 × 0.79) + 0.79 ≈ 1.73.
Correspondingly, as set by Equation (101), we get
β_ST = (1.73)(0.79)/((0.07 + 1)(0.79 + 1)) ≈ 0.71.

Appendix D. The ST-FIS Method

In this appendix, we illustrate the formalities of the ST-FIS grade assignation procedure.

Appendix D.1. Acquiring Input Variables

According to Equation (70), the ST-FIS considers a set of three input values
Z = (z_1, z_2, z_3),
where each z_k, for k = 1, 2, 3, represents a crisp value. As indicated around Equation (71), for each input we must consider an associated pool of fuzzy sets, generically represented by F and defined through
F = {B_1^F, B_2^F, …, B_{m(F)−1}^F, B_{m(F)}^F},
where B_k^F, for k = 1, 2, …, m(F), represents a linguistic term that qualitatively interprets the input. Particularly, given a direction d_i for i = 1, 2, …, 5, the ST-FIS takes up as input variables the fuzzy sets L, P, and A: L covers the learning of theoretical background, P the display of practical skills, and A the student's disposition to complete tasks. Thus, according to Equations (72)–(74), we associate linguistic terms typifying the fuzzy sets L, P, and A; explicitly,
L = {B_1^L ≐ Little, B_2^L ≐ Enough, B_3^L ≐ Much}
P = {B_1^P ≐ Unrealized, B_2^P ≐ Incomplete, B_3^P ≐ Realized}
A = {B_1^A ≐ Unexpected, B_2^A ≐ Expected}
Similarly, the L, P, and A input fuzzy sets ((A83)–(A85)) can be represented using the binary input value–membership function notation, namely,
L = {(z_1, μ_{B_1^L}(z_1)), (z_1, μ_{B_2^L}(z_1)), (z_1, μ_{B_3^L}(z_1)) | z_1 ∈ U},
P = {(z_2, μ_{B_1^P}(z_2)), (z_2, μ_{B_2^P}(z_2)), (z_2, μ_{B_3^P}(z_2)) | z_2 ∈ U},
A = {(z_3, μ_{B_1^A}(z_3)), (z_3, μ_{B_2^A}(z_3)) | z_3 ∈ U},
where U is the universe that contains all the possible elements belonging to the particular context. For the present aims, we arrange membership functions associated with the fuzzy sets L and P that assign to each element of U a number within the interval [0, 1].
Writing, for brevity, a′_{L1} = a_{L1}T_L/10 and b′_{L1} = b_{L1}T_L/10 (and similarly for the remaining scaled parameters), we set
μ_{B_1^L}(z_1) =
  1, if z_1 ≤ a′_{L1};
  1 − 2((z_1 − a′_{L1})/(b′_{L1} − a′_{L1}))², if a′_{L1} ≤ z_1 ≤ (a′_{L1} + b′_{L1})/2;
  2((z_1 − b′_{L1})/(b′_{L1} − a′_{L1}))², if (a′_{L1} + b′_{L1})/2 ≤ z_1 ≤ b′_{L1};
  0, if z_1 ≥ b′_{L1};
for 0 ≤ z_1 ≤ T_L,
μ_{B_2^L}(z_1) = exp(−(z_1 − m′_{L2})²/(2σ′_{L2}²)), for 0 ≤ z_1 ≤ T_L,
μ_{B_3^L}(z_1) =
  0, if z_1 ≤ a′_{L3};
  2((z_1 − a′_{L3})/(b′_{L3} − a′_{L3}))², if a′_{L3} ≤ z_1 ≤ m′_{L3};
  1 − 2((z_1 − b′_{L3})/(b′_{L3} − a′_{L3}))², if m′_{L3} < z_1 < b′_{L3};
  1, if z_1 ≥ b′_{L3};
for 0 ≤ z_1 ≤ T_L.
The membership functions μ_{B_1^P}(z_2), μ_{B_2^P}(z_2), and μ_{B_3^P}(z_2) of Equations (A92)–(A94) follow the same three templates, with the L parameters replaced by their P counterparts and T_L replaced by T_P, for 0 ≤ z_2 ≤ T_P. Here, m_*, σ_*, a_*, and b_* are real numbers (see Table A6), and the parameters T_L and T_P are positive integers specified by Equation (55). The membership functions expressed in Equations (A89) and (A92) are of z-type; those expressed in Equations (A90) and (A93) are Gaussian; and those expressed in Equations (A91) and (A94) are of s-type.
Table A6. Values of the parameters of the membership functions (MFs) that characterize the fuzzy sets L, P, A, and E.
Set | MF | a_* | b_* | c_* | d_* | m_* | σ_* | Equation
L | μ_{B_1^L}(z_1) | a_{L1} = 0.8 | b_{L1} = 7.2 | - | - | - | - | (A89)
L | μ_{B_2^L}(z_1) | - | - | - | - | m_{L2} = 8 | σ_{L2} = 1.6 | (A90)
L | μ_{B_3^L}(z_1) | a_{L3} = 8.8 | b_{L3} = 15.2 | - | - | m_{L3} = (a_{L3} + b_{L3})/2 | - | (A91)
P | μ_{B_1^P}(z_2) | a_{P1} = 0 | b_{P1} = 3.96 | - | - | - | - | (A92)
P | μ_{B_2^P}(z_2) | - | - | - | - | m_{P2} = 5.5 | σ_{P2} = 1.40 | (A93)
P | μ_{B_3^P}(z_2) | a_{P3} = 7.04 | b_{P3} = 11 | - | - | m_{P3} = (a_{P3} + b_{P3})/2 | - | (A94)
A | μ_{B_1^A}(z_3) | a_{A1} = 0 | b_{A1} = 0 | c_{A1} = 3 | d_{A1} = 9 | - | - | (A95)
A | μ_{B_2^A}(z_3) | a_{A2} = 9 | b_{A2} = 15 | c_{A2} = 18 | d_{A2} = 18 | - | - | (A96)
E | μ_{B_1^E}(y) | a_E = 0 | b_E = 0.001 | - | - | m_E = 0 | - | (A99)
E | μ_{B_2^E}(y) | a_E = 0 | b_E = 1.18 | - | - | m_E = 0.59 | - | (A99)
E | μ_{B_3^E}(y) | a_E = 0.59 | b_E = 1.76 | - | - | m_E = 1.18 | - | (A99)
E | μ_{B_4^E}(y) | a_E = 1.18 | b_E = 2.35 | - | - | m_E = 1.76 | - | (A99)
E | μ_{B_5^E}(y) | a_E = 1.76 | b_E = 2.94 | - | - | m_E = 2.35 | - | (A99)
E | μ_{B_6^E}(y) | a_E = 2.35 | b_E = 3.53 | - | - | m_E = 2.94 | - | (A99)
E | μ_{B_7^E}(y) | a_E = 2.94 | b_E = 4.12 | - | - | m_E = 3.53 | - | (A99)
E | μ_{B_8^E}(y) | a_E = 3.53 | b_E = 4.71 | - | - | m_E = 4.12 | - | (A99)
E | μ_{B_9^E}(y) | a_E = 4.12 | b_E = 5.29 | - | - | m_E = 4.71 | - | (A99)
E | μ_{B_10^E}(y) | a_E = 4.71 | b_E = 5.88 | - | - | m_E = 5.29 | - | (A99)
E | μ_{B_11^E}(y) | a_E = 5.29 | b_E = 6.47 | - | - | m_E = 5.88 | - | (A99)
E | μ_{B_12^E}(y) | a_E = 5.88 | b_E = 7.06 | - | - | m_E = 6.47 | - | (A99)
E | μ_{B_13^E}(y) | a_E = 6.47 | b_E = 7.65 | - | - | m_E = 7.06 | - | (A99)
E | μ_{B_14^E}(y) | a_E = 7.06 | b_E = 8.24 | - | - | m_E = 7.65 | - | (A99)
E | μ_{B_15^E}(y) | a_E = 7.65 | b_E = 8.82 | - | - | m_E = 8.24 | - | (A99)
E | μ_{B_16^E}(y) | a_E = 8.24 | b_E = 9.41 | - | - | m_E = 8.82 | - | (A99)
E | μ_{B_17^E}(y) | a_E = 8.82 | b_E = 10 | - | - | m_E = 9.41 | - | (A99)
E | μ_{B_18^E}(y) | a_E = 9.99 | b_E = 10 | - | - | m_E = 10 | - | (A99)
Correspondingly, we set the membership functions associated with the fuzzy set A to be of trapezoidal type; with the scaled parameters a′_{A*} = a_{A*}T_A/10 (and similarly b′, c′, d′), they are defined as follows:
μ_{B_1^A}(z_3) =
  0, if z_3 < a′_{A1};
  (z_3 − a′_{A1})/(b′_{A1} − a′_{A1}), if a′_{A1} ≤ z_3 ≤ b′_{A1};
  1, if b′_{A1} < z_3 < c′_{A1};
  (d′_{A1} − z_3)/(d′_{A1} − c′_{A1}), if c′_{A1} ≤ z_3 ≤ d′_{A1};
  0, if z_3 > d′_{A1};
for 0 ≤ z_3 ≤ T_A,
μ_{B_2^A}(z_3) =
  0, if z_3 < a′_{A2};
  (z_3 − a′_{A2})/(b′_{A2} − a′_{A2}), if a′_{A2} ≤ z_3 ≤ b′_{A2};
  1, if b′_{A2} < z_3 < c′_{A2};
  (d′_{A2} − z_3)/(d′_{A2} − c′_{A2}), if c′_{A2} ≤ z_3 ≤ d′_{A2};
  0, if z_3 > d′_{A2};
for 0 ≤ z_3 ≤ T_A,
where a_*, b_*, c_*, and d_* are real numbers (see Table A6), and the parameter T_A is a positive integer (cf. Equation (55)).

Appendix D.2. Output Variable

Likewise, looking forward to adapting an FIS for the ST protocol and according to Equation (75), we take up an output variable E, which links to an enlightenment or proficiency skill shown by the student and whose variability is qualitatively interpreted by the linguistic terms
E = {B_1^E ≐ Inadequate, B_2^E ≐ Deficient, B_3^E ≐ Low, B_4^E ≐ Moderate, B_5^E ≐ Reasonable, B_6^E ≐ Promising, B_7^E ≐ Acceptable, B_8^E ≐ Satisfactory, B_9^E ≐ Good, B_10^E ≐ Elevated, B_11^E ≐ High, B_12^E ≐ Distinguished, B_13^E ≐ Exceptional, B_14^E ≐ Superior, B_15^E ≐ Admirable, B_16^E ≐ Extraordinary, B_17^E ≐ Unparalleled, B_18^E ≐ Magnificent}.
Correspondingly, we define the fuzzy set that represents the E output variable (A97):
E = {(y, μ_{B_s^E}(y)), s = 1, 2, …, 18 | y ∈ U}.
The membership functions that we link with the fuzzy set E are of triangular type and are defined as follows:
μ_{B_*^E}(y) =
  0, if y ≤ a_E;
  (y − a_E)/(m_E − a_E), if a_E ≤ y ≤ m_E;
  (b_E − y)/(b_E − m_E), if m_E ≤ y ≤ b_E;
  0, otherwise;
for 0 ≤ y ≤ q,
where q = 10.5 and a_E, b_E, and m_E are real numbers (see Table A6).

Appendix D.3. Rule Base

Agreeing with Equation (76), the adapted ST-FIS is composed of a total of 18 fuzzy rules, which we express using the membership functions defined in Equations (A89)–(A96) and (A99), namely,
R_1: IF μ_{B_1^L}(z(j)_1) ∧ μ_{B_1^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_1^E}(y_1)
R_2: IF μ_{B_1^L}(z(j)_1) ∧ μ_{B_1^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_2^E}(y_2)
R_3: IF μ_{B_1^L}(z(j)_1) ∧ μ_{B_2^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_3^E}(y_3)
R_4: IF μ_{B_1^L}(z(j)_1) ∧ μ_{B_2^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_4^E}(y_4)
R_5: IF μ_{B_1^L}(z(j)_1) ∧ μ_{B_3^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_5^E}(y_5)
R_6: IF μ_{B_1^L}(z(j)_1) ∧ μ_{B_3^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_6^E}(y_6)
R_7: IF μ_{B_2^L}(z(j)_1) ∧ μ_{B_1^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_7^E}(y_7)
R_8: IF μ_{B_2^L}(z(j)_1) ∧ μ_{B_1^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_8^E}(y_8)
R_9: IF μ_{B_2^L}(z(j)_1) ∧ μ_{B_2^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_9^E}(y_9)
R_10: IF μ_{B_2^L}(z(j)_1) ∧ μ_{B_2^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_10^E}(y_10)
R_11: IF μ_{B_2^L}(z(j)_1) ∧ μ_{B_3^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_11^E}(y_11)
R_12: IF μ_{B_2^L}(z(j)_1) ∧ μ_{B_3^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_12^E}(y_12)
R_13: IF μ_{B_3^L}(z(j)_1) ∧ μ_{B_1^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_13^E}(y_13)
R_14: IF μ_{B_3^L}(z(j)_1) ∧ μ_{B_1^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_14^E}(y_14)
R_15: IF μ_{B_3^L}(z(j)_1) ∧ μ_{B_2^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_15^E}(y_15)
R_16: IF μ_{B_3^L}(z(j)_1) ∧ μ_{B_2^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_16^E}(y_16)
R_17: IF μ_{B_3^L}(z(j)_1) ∧ μ_{B_3^P}(z(j)_2) ∧ μ_{B_1^A}(z(j)_3) THEN μ_{B_17^E}(y_17)
R_18: IF μ_{B_3^L}(z(j)_1) ∧ μ_{B_3^P}(z(j)_2) ∧ μ_{B_2^A}(z(j)_3) THEN μ_{B_18^E}(y_18)

Appendix D.4. Fuzzification

In the presently adapted ST-FIS structure, the fuzzification process transfers the input vector (z(j)_L, z(j)_P, z(j)_A) (cf. Equation (57)) into fuzzy values through Equation (76), based on the fuzzy sets defined according to Equations (A86)–(A88) and (A98), which are predefined by their membership functions given by Equations (A89)–(A96) and (A99). That is, for z_1 we consider the membership values
μ_{B_1^L}(z(j)_1), μ_{B_2^L}(z(j)_1), μ_{B_3^L}(z(j)_1);
similarly, for z_2 we arrange
μ_{B_1^P}(z(j)_2), μ_{B_2^P}(z(j)_2), μ_{B_3^P}(z(j)_2);
and, for z_3,
μ_{B_1^A}(z(j)_3), μ_{B_2^A}(z(j)_3).
The output of the fuzzification step is a fuzzy representation of each input variable across its relevant fuzzy sets. This representation provides the basis for subsequent processing, namely rule evaluation and aggregation.

Appendix D.5. Rule Evaluation

The inference engine evaluates the rules in the knowledge base using fuzzy logic operators to combine membership degrees. For a given input set Z = (z(j)_1, z(j)_2, z(j)_3), the adapted ST-FIS calculates the activation level α_r of each rule R_r using the 'and' (min) operator; matching the antecedents of rules R_1 through R_18, this gives
α_1 = min(μ_{B_1^L}(z(j)_1), μ_{B_1^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_2 = min(μ_{B_1^L}(z(j)_1), μ_{B_1^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_3 = min(μ_{B_1^L}(z(j)_1), μ_{B_2^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_4 = min(μ_{B_1^L}(z(j)_1), μ_{B_2^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_5 = min(μ_{B_1^L}(z(j)_1), μ_{B_3^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_6 = min(μ_{B_1^L}(z(j)_1), μ_{B_3^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_7 = min(μ_{B_2^L}(z(j)_1), μ_{B_1^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_8 = min(μ_{B_2^L}(z(j)_1), μ_{B_1^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_9 = min(μ_{B_2^L}(z(j)_1), μ_{B_2^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_10 = min(μ_{B_2^L}(z(j)_1), μ_{B_2^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_11 = min(μ_{B_2^L}(z(j)_1), μ_{B_3^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_12 = min(μ_{B_2^L}(z(j)_1), μ_{B_3^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_13 = min(μ_{B_3^L}(z(j)_1), μ_{B_1^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_14 = min(μ_{B_3^L}(z(j)_1), μ_{B_1^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_15 = min(μ_{B_3^L}(z(j)_1), μ_{B_2^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_16 = min(μ_{B_3^L}(z(j)_1), μ_{B_2^P}(z(j)_2), μ_{B_2^A}(z(j)_3))
α_17 = min(μ_{B_3^L}(z(j)_1), μ_{B_3^P}(z(j)_2), μ_{B_1^A}(z(j)_3))
α_18 = min(μ_{B_3^L}(z(j)_1), μ_{B_3^P}(z(j)_2), μ_{B_2^A}(z(j)_3))

Appendix D.6. Aggregation Process

The aggregation process combines the outputs of all activated rules to produce an overall fuzzy set. This procedure relies on the THEN clauses composing the inference rules R_r, which jointly specify the output fuzzy set
E = {B_1^E, B_2^E, …, B_18^E},
where each output fuzzy set B_r^E is defined by its membership function μ_{B_r^E}(y). For each rule R_r, the degree of activation α_r (calculated during the rule-evaluation step) is used to scale the membership function of the corresponding output fuzzy set B_r^E. This scaling modifies the membership function values to reflect the strength of the contribution of rule R_r to the global output E. Then, the aggregation process combines these scaled membership functions α_r μ_{B_r^E}(y) from all rules using the max operator, which ensures that the highest degree of membership across all rules is retained for each output value y. The resulting aggregated membership function is expressed as
μ_E(y) = max_{r=1,…,18} { α_r μ_{B_r^E}(y) }.
This aggregated fuzzy set represents the combined output of the FIS capturing all relevant contributions from the activated rules.

Appendix D.7. Defuzzification Process

To convert the fuzzy output into a crisp grade, the FIS uses a defuzzification method. For the FIS adapted for the ST scheme, we rely on a centroid approach; then, setting E = e_stfis(j, z, D), we have
e_stfis(j, z, D) = Σ_{s=1}^{18} y_s μ_E(y_s) / Σ_{s=1}^{18} μ_E(y_s).
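The whole Mamdani pipeline of Appendices D.4–D.7 can be condensed into a single sketch. The membership parameters follow Table A6 (scaled by T/10 as in Equations (A89)–(A96)), and the rule order follows Appendix D.3; the defuzzification step approximates Equation (A123) by evaluating the aggregated output only at the 18 triangle peaks m_E. This is our compact reading of the scheme, not the authors' implementation, so its crisp output need not match the 0.79 reported below for z = (7, 9, 10).

```python
import math
from itertools import product

T_L, T_P, T_A = 16, 11, 18                  # totals from Equation (55)
kL, kP, kA = T_L / 10, T_P / 10, T_A / 10   # Table A6 parameters are scaled by T/10

def z_mf(x, a, b):                  # z-type (Equations (A89)/(A92))
    m = (a + b) / 2
    if x <= a: return 1.0
    if x <= m: return 1 - 2 * ((x - a) / (b - a)) ** 2
    if x <= b: return 2 * ((x - b) / (b - a)) ** 2
    return 0.0

def s_mf(x, a, b):                  # s-type (Equations (A91)/(A94))
    return 1.0 - z_mf(x, a, b)

def g_mf(x, m, s):                  # Gaussian (Equations (A90)/(A93))
    return math.exp(-((x - m) ** 2) / (2 * s ** 2))

def trap_mf(x, a, b, c, d):         # trapezoidal (Equations (A95)/(A96))
    if x < a or x > d: return 0.0
    if b <= x <= c: return 1.0
    if x < b: return (x - a) / (b - a) if b > a else 1.0
    return (d - x) / (d - c) if d > c else 1.0

MU_L = [lambda z: z_mf(z, 0.8 * kL, 7.2 * kL),
        lambda z: g_mf(z, 8 * kL, 1.6 * kL),
        lambda z: s_mf(z, 8.8 * kL, 15.2 * kL)]
MU_P = [lambda z: z_mf(z, 0.0 * kP, 3.96 * kP),
        lambda z: g_mf(z, 5.5 * kP, 1.40 * kP),
        lambda z: s_mf(z, 7.04 * kP, 11 * kP)]
MU_A = [lambda z: trap_mf(z, 0 * kA, 0 * kA, 3 * kA, 9 * kA),
        lambda z: trap_mf(z, 9 * kA, 15 * kA, 18 * kA, 18 * kA)]

# Output triangle peaks m_E (Table A6); rules R_1..R_18 enumerate L, then P, then A.
PEAKS = [0, 0.59, 1.18, 1.76, 2.35, 2.94, 3.53, 4.12, 4.71,
         5.29, 5.88, 6.47, 7.06, 7.65, 8.24, 8.82, 9.41, 10]

def st_fis(z1, z2, z3):
    alphas = [min(MU_L[i](z1), MU_P[j](z2), MU_A[k](z3))   # rule activations (min)
              for i, j, k in product(range(3), range(3), range(2))]
    den = sum(alphas)
    return sum(a * y for a, y in zip(alphas, PEAKS)) / den if den else 0.0

print(round(st_fis(7, 9, 10), 2))   # crisp score on the 0-10 output scale
```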

Appendix D.8. Assigning an Objective Grade

We take the vector z(2, D) = (z(2)_1, z(2)_2, z(2)_3) = (7, 9, 10) given by Equation (A72) and then use Equation (A123) to obtain the grade assigned by the teacher to the 2nd student by relying on the ST-FIS scheme. This becomes
e_stfis(2, z, D) = 0.79.
In a similar form, to obtain the associated objective performance rate, we consider the matrices I_L(j)^i, I_P(j)^i, and I_A(j)^i given by Equation (47), with indicator pointers l(j)_k^i, p(j)_k^i, and a(j)_k^i that produce the values z_o(j)_L = 11, z_o(j)_P = 10, and z_o(j)_A = 15, just as Equation (60) states. In particular, for the 2nd student, the objective grade produced by Equation (62) turns into
e_stfis(2, z_o, D) = 0.80.

Appendix D.9. Rating Error Relative to e s t f i s j , z o , D

Again, for the 2nd student, according to Equation (63), we have the absolute deviation
Δe_stfis(2, z, z_o, D) = |0.80 − 0.79| = 0.01.

Appendix D.10. Mean Absolute Deviation

We then arrange a labeling index q = 1, 2, …, 100 to simulate replicates z(q,j)_1, z(q,j)_2, and z(q,j)_3. For the 2nd student, this produces simulated grades e_stfis(2, z_q, D) (cf. Equation (84)). Taking the objective performance rate e_stfis(2, z_o, D) as a proxy for the mean value of the teacher-assigned grades e_stfis(j, z, D), we calculate the Mean Absolute Deviation ē_stfis(j, z_q, z_o, D) (cf. Equation (88)), which estimates the average rating error of e_stfis(j, z, D) relative to ē_stfis(j, z_o, D). This is given by
ē_stfis(j, z_q, z_o, D) = 0.06.

Appendix D.11. Obtaining the Complexity CP_ST-FIS, Consistency CI_ST-FIS, and Efficiency β_ST-FIS Index Values

For this case, the ST-FIS method involves N_ST-FIS = 13 components, E_ST-FIS = 110 elements, and Q_ST-FIS = 53 operations (cf. Table A7). The minimum and maximum values of the N_M, E_M, and Q_M entries across the methods addressed in this study are given in Table 2. Then, according to Equations (89)–(91), for the ST-FIS method we take the Pondered–Normalized indicator values P_N(N_ST-FIS) = 1, P_E(E_ST-FIS) = 0.98, and P_Q(Q_ST-FIS) = 0.88 (cf. Table 3). Also, according to Equation (96), we assign the values a = 0.4, b = 0.3, and c = 0.3 (cf. Table 4), reflecting the relative importance of each of the aforementioned Pondered–Normalized factors in the determination of CP_ST-FIS, which we compute after the table below.
Table A7. Components, elements, and operations for the ST-FIS scheme.
Equation | Component | Elements | Operations
(A38) | D = (d_1, d_2, d_3, d_4, d_5)^T | 5 | -
(A41) | I_L(2)^1, I_P(2)^1, I_A(2)^1 (marks for d_1) | 10 | -
(A41) | I_L(2)^2, I_P(2)^2, I_A(2)^2 (marks for d_2) | 8 | -
(A41) | I_L(2)^3, I_P(2)^3, I_A(2)^3 (marks for d_3) | 8 | -
(A41) | I_L(2)^4, I_P(2)^4, I_A(2)^4 (marks for d_4) | 9 | -
(A41) | I_L(2)^5, I_P(2)^5, I_A(2)^5 (marks for d_5) | 10 | -
(A42) | z(2)_L^i, z(2)_P^i, z(2)_A^i for i = 1, …, 5 | 26 | 23
(A43) | z(2)_L = 7, z(2)_P = 9, z(2)_A = 10 | 15 | 12
(A44) | T_L = 16, T_P = 11, T_A = 18 | 15 | 12
(A45) | T_I = 45 | 1 | -
(A46) | z(2, D) = (7, 9, 10) | 3 | -
(A107) | Fuzzification: μ_{B_1^L}(z(j)_1), μ_{B_2^L}(z(j)_1), μ_{B_3^L}(z(j)_1) | - | 1
(A108) | Fuzzification: μ_{B_1^P}(z(j)_2), μ_{B_2^P}(z(j)_2), μ_{B_3^P}(z(j)_2) | - | 1
(A109) | Fuzzification: μ_{B_1^A}(z(j)_3), μ_{B_2^A}(z(j)_3) | - | 1
(A103)–(A120) | Rule evaluation: α_r = min(μ_{B^L}(z(j)_1), μ_{B^P}(z(j)_2), μ_{B^A}(z(j)_3)), r = 1, …, 18 | - | 1
(A122) | Aggregation: μ_E(y) = max_r α_r μ_{B_r^E}(y) | - | 1
(A123) | Defuzzification: e_stfis(j, z, D) = Σ_s y_s μ_E(y_s) / Σ_s μ_E(y_s) | - | 2
The complexity pointer as given by Equation (95) acquires the value
CP_ST-FIS = 0.4(1) + 0.3(0.98) + 0.3(0.88) ≈ 0.96.
In turn, as stated by Equation (68), the MAD_ST-FIS entry attains a value of
MAD_ST-FIS = (1/θ_m) Σ_{q=1}^{θ_m} |2.5/5 − e_stfis(j, z_q, D)| = 0.06.
Then, according to Equation (100), we have a CI_ST-FIS value of
CI_ST-FIS = 1/(1 + 0.06 × 0.96) + 0.96 ≈ 1.91.
And correspondingly, as stated by Equation (101), we have
β_ST-FIS = (1.91)(0.96)/((0.06 + 1)(0.96 + 1)) ≈ 0.88.

Appendix E. Glossary

Table A8. Definition of terms and symbols.
Term/Symbol | Definition
N_M, E_M, Q_M | Numbers of components, elements, and operations in the assessment method's structure.
Normalization | Rescaling of values to the range [0.1, 1] to ensure comparability across methods.
CP_M | Complexity indicator.
MAD_M | Mean Absolute Deviation: average deviation between simulated and objective scores.
CI_M | Consistency indicator: measures reliability by combining MAD_M and CP_M.
β_M | Efficiency indicator: evaluates the trade-off between consistency and complexity.
I(C) | Indicator function that returns 1 if a condition C is true and 0 otherwise.
FIS | Fuzzy Inference System: a reasoning method that handles gradual or imprecise information.
Summative | Traditional assessment method based on assigning weights to learning sections and calculating weighted averages of scores. It does not model uncertainty or competence dimensions.
Rubric | Evaluation tool with structured criteria and performance levels for qualitative scoring.
STBAM or ST | Systematic Task-Based Assessment Method that evaluates students through specific tasks and indicators grouped into Learning, Procedure, and Attitude categories, emphasizing observable competencies and instructional alignment.
ST-FIS | Systematic Task-Based Assessment Method with an integrated Fuzzy Inference System, aimed at modeling complex performance and reducing subjectivity.

References

  1. CSAI. Valid and Reliable Assessments. Update, CSAI, The Center on Standards and Assessment Implementation. 2018. Available online: https://files.eric.ed.gov/fulltext/ED588476.pdf (accessed on 24 March 2025).
  2. Palmer, N. Tools to Improve Efficiency and Consistency in Assessment Practices Whilst Delivering Meaningful Feedback. In Proceedings of the ICERI2022 Proceedings, Seville, Spain, 7–9 November 2022; pp. 1069–1078. [Google Scholar]
  3. Scott, S.; Webber, C.F.; Lupart, J.L.; Aitken, N.; Scott, D.E. Fair and equitable assessment practices for all students. Assess. Educ. Princ. Policy Pract. 2013, 21, 52–70. [Google Scholar] [CrossRef]
  4. Hull, K.; Lawford, H.; Hood, S.; Oliveira, V.; Murray, M.; Trempe, M.; Jensen, M. Student anxiety and evaluation. Coll. Essays Learn. Teach. 2019, 12, 23–35. [Google Scholar] [CrossRef]
  5. Guevara Hidalgo, E. Impact of evaluation method shifts on student performance: An analysis of irregular improvement in passing percentages during COVID-19 at an Ecuadorian institution. Int. J. Educ. Integr. 2025, 21, 4. [Google Scholar] [CrossRef]
  6. Glaser, R.; Chudowsky, N.; Pellegrino, J.W. (Eds.) Knowing What Students Know: The Science and Design of Educational Assessment; National Academies Press: Washington, DC, USA, 2001. [Google Scholar]
  7. Tutunaru, T. Improving Assessment and Feedback in the Learning Process: Directions and Best Practices. Res. Educ. 2023, 8, 38–60. [Google Scholar] [CrossRef]
  8. Bhat, B.A.; Bhat, G.J. Formative and summative evaluation techniques for improvement of learning process. Eur. J. Bus. Soc. Sci. 2019, 7, 776–785. [Google Scholar]
  9. Guskey, T.R. Addressing inconsistencies in grading practices. Phi Delta Kappan 2024, 105, 52–57. [Google Scholar] [CrossRef]
  10. Cambridge International Examinations. Developing Your School with Cambridge: A Guide for School Leaders; Director Education: Hong Kong, China, 2016. [Google Scholar]
  11. Ahea, M.M.A.B.; Ahea, M.R.K.; Rahman, I. The Value and Effectiveness of Feedback in Improving Students’ Learning and Professionalizing Teaching in Higher Education. J. Educ. Pract. 2016, 7, 38–41. [Google Scholar]
  12. Tinkelman, D.; Venuti, E.; Schain, L. Disparate methods of combining test and assignment scores into course grades. Glob. Perspect. Account. Educ. 2013, 10, 61. [Google Scholar]
  13. Malouff, J.M.; Thorsteinsson, E.B. Bias in grading: A meta-analysis of experimental research findings. Aust. J. Educ. 2016, 60, 245–256. [Google Scholar] [CrossRef]
  14. Newman, D.; Lazarev, V. How Teacher Evaluation is affected by Class Characteristics: Are Observations Biased. Empower. Educ. Evid.-Based Decis. 2015, 1–11. [Google Scholar]
  15. Anderson, L.W. A Critique of Grading: Policies, Practices, and Technical Matters. Educ. Policy Anal. Arch. 2018, 26, 49. [Google Scholar] [CrossRef]
  16. Von Hippel, P.T.; Hamrock, C. Do test score gaps grow before, during, or between the school years? Measurement artifacts and what we can know in spite of them. Sociol. Sci. 2019, 6, 43–80. [Google Scholar] [CrossRef] [PubMed]
  17. Lauck, L.V. Grade Distribution and Perceptions of School Culture and Climate In a New Secondary School. Ph.D. Thesis, Rockhurst University, Kansas City, MO, USA, 2019. [Google Scholar]
  18. Royal, K.D.; Guskey, T.R. The Perils of Prescribed Grade Distributions: What Every Medical Educator Should Know. Online Submiss. 2014, 2, 240–241. [Google Scholar]
  19. Harris, D. Let’s Talk About Grading, Maybe: Using Transparency About the Grading Process To Aid in Student Learning. Seattle UL Rev. 2021, 45, 805. [Google Scholar]
  20. Dubois, P.; Lhotte, R. Consistency and Reproducibility of Grades in Higher Education: A Case Study in Deep Learning. arXiv 2023, arXiv:2305.07492. Available online: https://arxiv.org/abs/2305.07492 (accessed on 24 March 2025).
  21. Tyler, J.H.; Taylor, E.S.; Kane, T.J.; Wooten, A.L. Using student performance data to identify effective classroom practices. Am. Econ. Rev. 2010, 100, 256–260. [Google Scholar] [CrossRef]
  22. Muñoz, M.A.; Guskey, T.R. Standards-based grading and reporting will improve education. Phi Delta Kappan 2015, 96, 64–68. [Google Scholar] [CrossRef]
  23. Lamarino, D.L. The benefits of standards-based grading: A critical evaluation of modern grading practices. Curr. Issues Educ. 2014, 17, 2. [Google Scholar]
  24. Leal-Ramírez, C.; Echavarría-Heras, H.A. An Integrated Instruction and a Dynamic Fuzzy Inference System for Evaluating the Acquirement of Skills through Learning Activities by Higher Middle Education Students in Mexico. Mathematics 2024, 12, 1015. [Google Scholar] [CrossRef]
  25. Gronlund, N.E. Assessment of Student Achievement, 8th ed.; Allyn & Bacon Publishing, Longwood Division: Boston, MA, USA, 1998; pp. 1–300. [Google Scholar]
  26. Yambi, T.D.A.C.; Yambi, C. Assessment and Evaluation in Education; University Federal do Rio de Janeiro: Rio de Janeiro, Brazil, 2018. [Google Scholar]
  27. Ghaicha, A. Theoretical Framework for Educational Assessment: A Synoptic Review. J. Educ. Pract. 2016, 7, 212–231. [Google Scholar]
  28. Struyven, K.; Dochy, F.; Janssens, S. Students’ perceptions about evaluation and assessment in higher education: A review. Assess. Eval. High. Educ. 2005, 30, 325–341. [Google Scholar] [CrossRef]
  29. Royal, K.D.; Guskey, T.R. Does mathematical precision ensure valid grades? What every veterinary medical educator should know. J. Vet. Med. Educ. 2015, 42, 242–244. [Google Scholar] [CrossRef] [PubMed]
  30. Sikora, A.S. Mathematical Theory of Student Assessment Through Grading. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=085fd7628dc0d7e7c0e22dc458c81d4e50c44dee (accessed on 24 March 2025).
  31. Benton, T.; Elliott, G. The reliability of setting grade boundaries using comparative judgement. Res. Pap. Educ. 2016, 31, 352–376. [Google Scholar] [CrossRef]
  32. Hanania, M.I. Mathematical Methods Applicable to the Standardization of Examinations. Master’s Thesis, American University of Beirut, Beirut, Lebanon, 1947. [Google Scholar]
  33. Fitzpatrick, J.L.; Sanders, J.R.; Worthen, B.R.; Wingate, L.A. Program Evaluation: Alternative Approaches and Practical Guidelines; Pearson: Boston, MA, USA, 2012. [Google Scholar]
  34. Shepard, L.A. The role of assessment in a learning culture. Educ. Res. 2000, 29, 4–14. [Google Scholar] [CrossRef]
  35. Stufflebeam, D.L.; Coryn, C.L. Evaluation Theory, Models, and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  36. Taras, M. Assessing Assessment Theories. Online Educ. Res. J. 2012, 3, 1–13. [Google Scholar]
  37. Astin, A.W. Assessment for Excellence: The Philosophy and Practice of Assessment and Evaluation in Higher Education; Rowman & Littlefield Publishers: Lanham, MD, USA, 2012. [Google Scholar]
  38. Chen, H.T. Theory-Driven Evaluation: Conceptual Framework, Application and Advancement; Springer Fachmedien Wiesbaden: Wiesbaden, Germany, 2012; pp. 17–40. [Google Scholar]
  39. Gipps, C. Beyond Testing (Classic Edition): Towards a Theory of Educational Assessment; Routledge: London, UK, 2011. [Google Scholar]
  40. Wiliam, D. Integrating formative and summative functions of assessment. In Working Group; King’s College London: London, UK, 2000; Volume 10. [Google Scholar]
  41. Hurskaya, V.; Mykhaylenko, S.; Kartashova, Z.; Kushevska, N.; Zaverukha, Y. Assessment and evaluation methods for 21st century education: Measuring what matters. Futur. Educ. 2024, 4, 4–17. [Google Scholar] [CrossRef]
  42. Galamison, T.J. Benchmarking: A Study of the Perceptions Surrounding Accountability, Instructional Leadership, School Culture, Formative Assessments and Student Success. Ph.D. Thesis, University of Houston, Houston, TX, USA, 2014. [Google Scholar]
  43. Goldberger, S.; Keough, R.; Almeida, C. Benchmarks for Success in High School Education. Putting Data to Work in School to Career Education Reform; LAB at Brown University: Boston, MA, USA, 2000. [Google Scholar]
  44. Dunn, D.S.; McCarthy, M.A.; Baker, S.C.; Halonen, J.S. Using Quality Benchmarks for Assessing and Developing Undergraduate Programs; John Wiley & Sons: Hoboken, NJ, USA, 2010. [Google Scholar]
  45. Knight, P.T. Summative Assessment in Higher Education: Practices in Disarray. Stud. High. Educ. 2002, 27, 275–286. [Google Scholar] [CrossRef]
  46. Yorke, M. Summative Assessment: Dealing with the ‘Measurement Fallacy’. Stud. High. Educ. 2011, 36, 251–273. [Google Scholar] [CrossRef]
  47. Harlen, W.; Crick, R.D.; Broadfoot, P.; Daugherty, R.; Gardner, J.; James, M.; Stobart, G. A Systematic Review of the Impact of Summative Assessment and Tests on Students’ Motivation for Learning; University of Bristol, Evidence-Based Practice Unit: Bristol, UK, 2002. [Google Scholar]
  48. Ishaq, K.; Rana, A.M.K.; Zin, N.A.M. Exploring Summative Assessment and Effects: Primary to Higher Education. Bull. Educ. Res. 2020, 42, 23–50. [Google Scholar]
  49. Bijsterbosch, H. Professional Development of Geography Teachers with Regard to Summative Assessment Practices. Ph.D. Thesis, Utrecht University, Utrecht, The Netherlands, 2018. [Google Scholar]
  50. Ekwue, U.N. A Hybrid Exploration of the Impact of Summative Assessment on A-level Students’ Motivation and Depth of Learning and the Extent to Which This Is a Reflection of the Self. Ph.D. Thesis, King’s College London, London, UK, 2015. [Google Scholar]
  51. Stevens, D.D. Introduction to Rubrics: An Assessment Tool to Save Grading Time, Convey Effective Feedback, and Promote Student Learning; Routledge: New York, NY, USA, 2023. [Google Scholar]
  52. Steinberg, M.P.; Kraft, M.A. The sensitivity of teacher performance ratings to the design of teacher evaluation systems. Educ. Res. 2017, 46, 378–396. [Google Scholar] [CrossRef]
  53. Thompson, M.K.; Clemmensen, L.K.H.; Ahn, B.U. The effect of rubric rating scale on the evaluation of engineering design projects. Int. J. Eng. Educ. 2013, 29, 1490–1502. [Google Scholar]
  54. Humphry, S.M.; Heldsinger, S.A. Common structural design features of rubrics may represent a threat to validity. Educ. Res. 2014, 43, 253–263. [Google Scholar] [CrossRef]
  55. Sadler, D.R. Indeterminacy in the use of preset criteria for assessment and grading. Assess. Eval. High. Educ. 2009, 34, 159–179. [Google Scholar] [CrossRef]
  56. Brookhart, S.M.; Chen, F. The quality and effectiveness of descriptive rubrics. Educ. Rev. 2015, 67, 343–368. [Google Scholar] [CrossRef]
  57. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
  58. Mousse, M.A.; Almufti, S.M.; García, D.S.; Jebbor, I.; Aljarbouh, A.; Tsarev, R. Application of fuzzy logic for evaluating student learning outcomes in E-learning. In Proceedings of the Computational Methods in Systems and Software; Springer: Cham, Switzerland, 2023; pp. 175–183. [Google Scholar]
  59. Ballester, L.; Colom, A.J. Lógica difusa: Una nueva epistemología para las Ciencias de la Educación. Rev. Educ. 2006, 340, 995–1008. [Google Scholar]
  60. Sripan, R.; Suksawat, B. Propose of fuzzy logic-based students’ learning assessment. In Proceedings of the ICCAS 2010, Gyeonggi-do, Republic of Korea, 27–30 October 2010; pp. 414–417. [Google Scholar]
  61. Hegazi, M.O.; Almaslukh, B.; Siddig, K. A fuzzy model for reasoning and predicting student’s academic performance. Appl. Sci. 2023, 13, 5140. [Google Scholar] [CrossRef]
Figure 1. Consistency and efficiency indices as functions of a varying dispersion index M A D M and a fixed complexity pointer C P M . Behavior of the consistency index C I M (panel (a)) and efficiency index β M (panel (b)) as a function of the dispersion index M A D M while keeping different fixed values of the complexity pointer C P M .
Figure 2. Normal distribution fitting to simulated grade data. Panel (a) shows the Summative method, in which the fit of a normal distribution to simulated grades is acceptable (failing to reject H0). Panel (b) fits a normal distribution to a Rubric method’s simulated scores; the fit is appropriate (failing to reject H0). Panel (c) shows the fit of a normal distribution to simulated grade data using the ST scheme. The fit is satisfactory (failing to reject H0). Panel (d) shows the fit of a normal distribution to grading data simulated according to the ST-FIS method; the fit is also acceptable (i.e., we fail to reject H0).
Figure 3. Boxplot of simulated values of the Consistency indicator CI_M across grading methods M. Panel (a) Summative. Panel (b) Rubric. Panel (c) ST. Panel (d) ST-FIS. Each boxplot summarizes 100 simulation runs per method and displays the median, interquartile range, and potential outliers. The ST-FIS method demonstrates the highest median and a relatively narrow spread, indicating strong and stable performance. While the Summative method has a slightly narrower interquartile range, its lower median and similar overall dispersion suggest less consistent central performance.
Figure 4. Boxplot of simulated values of the Efficiency indicator β M across grading methods M . Panel (a) Summative. Panel (b) Rubric. Panel (c) ST. Panel (d) ST-FIS. The ST-FIS method stands out by achieving both the highest median efficiency and one of the smallest ranges, suggesting better and more consistent performance.
Table 1. Components (N_M), Elements (E_M), and Operations (Q_M) for the different methods contemplated in this study.
Method | N_M | E_M | Q_M
Summative | 3 | 15 | 19
Rubric | 6 | 47 | 60
ST | 8 | 112 | 48
ST-FIS | 13 | 110 | 53
Table 2. Minimum and maximum values for the N_M, E_M, and Q_M entries across the addressed methods.
 | N_M | E_M | Q_M
Minimum | 3 | 15 | 19
Maximum | 13 | 112 | 60
Table 3. Pondered–normalized values for Components ($P_N(N_M)$), Elements ($P_E(E_M)$), and Operations ($P_Q(Q_M)$) for the presently addressed grading methods (cf. Equations (89)–(91)).

Method    | $P_N(N_M)$ | $P_E(E_M)$ | $P_Q(Q_M)$
----------|------------|------------|-----------
Summative | 0.23       | 0.13       | 0.31
Rubric    | 0.46       | 0.41       | 1.00
ST        | 0.61       | 1.00       | 0.80
ST-FIS    | 1.00       | 0.98       | 0.88
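The entries of Table 3 can be checked directly against Tables 1 and 2: every value equals the raw count divided by its column maximum, truncated to two decimals. The following minimal Python sketch performs that check, under the assumption that Equations (89)–(91) reduce to this ratio-to-maximum form (the equations themselves are not reproduced in this section).

```python
# Sketch: pondered-normalized values as ratio-to-maximum normalization.
# Assumption: Equations (89)-(91) reduce to P(X_M) = X_M / max over methods,
# truncated to two decimals; under that reading, this reproduces Table 3
# from the raw counts of Table 1.
import math

counts = {  # (N_M, E_M, Q_M) from Table 1
    "Summative": (3, 15, 19),
    "Rubric":    (6, 47, 60),
    "ST":        (8, 112, 48),
    "ST-FIS":    (13, 110, 53),
}
maxima = [max(row[i] for row in counts.values()) for i in range(3)]  # 13, 112, 60

for method, row in counts.items():
    p = [math.floor(100 * x / m) / 100 for x, m in zip(row, maxima)]
    print(method, p)  # e.g., Summative [0.23, 0.13, 0.31]
```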
Table 4. Weighting coefficients $a$, $b$, $c$ reflecting the relative importance of factors $N_M$, $E_M$, and $Q_M$, respectively, in the determination of the complexity pointer $CP_M$, together with scaling parameters $\theta$ and $\rho$ for the normalized–pondered values $P_N(N_M)$, $P_E(E_M)$, and $P_Q(Q_M)$ (cf. Equation (96) and Inequality (99)).

$a$ | $b$ | $c$ | $\theta$ | $\rho$
----|-----|-----|----------|-------
0.4 | 0.3 | 0.3 | 0.13     | 1.0
Table 5. Upper bounds (UBs) and lower bounds (LBs) for the indicators of Complexity ($CP_M$), Consistency ($CI_M$), and Efficiency ($\beta_M$) (cf. Equations (95), (100), and (101), respectively), calculated by setting $\theta = 0.13$ and $\rho = 1.0$ (cf. Inequality (99)).

   | $CP_M$ | $CI_M$ | $\beta_M$
---|--------|--------|----------
LB | 0.13   | 0.79   | 0.0345
UB | 1.00   | 2.0    | 1.769
Table 6. Indicators of Mean Absolute Deviation ($MAD_M$), Complexity ($CP_M$), Consistency ($CI_M$), and Efficiency ($\beta_M$) associated with the analyzed grading methods (cf. Equations (106), (95), (100), and (101)).

Method    | $MAD_M$ | $CP_M$ | $CI_M$ | $\beta_M$
----------|---------|--------|--------|----------
Summative | 0.11    | 0.23   | 1.20   | 0.20
Rubric    | 0.13    | 0.61   | 1.54   | 0.52
ST        | 0.07    | 0.79   | 1.73   | 0.71
ST-FIS    | 0.06    | 0.96   | 1.91   | 0.88
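The $CP_M$ column of Table 6 is consistent with a weighted sum of the ratio-to-maximum normalized counts using the Table 4 weights. The following minimal sketch assumes Equation (96) takes that linear form; the $CI_M$ and $\beta_M$ columns additionally require Equations (100) and (101), which are not reproduced here, so they are not recomputed.

```python
# Sketch: complexity pointer as a weighted sum of pondered-normalized values.
# Assumption: Equation (96) has the linear form CP_M = a*P_N + b*P_E + c*P_Q.
# With Table 1 counts, Table 2 maxima, and Table 4 weights, this reproduces
# the CP_M column of Table 6 (to two decimals).
a, b, c = 0.4, 0.3, 0.3            # weights from Table 4
n_max, e_max, q_max = 13, 112, 60  # column maxima from Table 2

counts = {  # (N_M, E_M, Q_M) from Table 1
    "Summative": (3, 15, 19),
    "Rubric":    (6, 47, 60),
    "ST":        (8, 112, 48),
    "ST-FIS":    (13, 110, 53),
}

for method, (n, e, q) in counts.items():
    cp = a * n / n_max + b * e / e_max + c * q / q_max
    print(f"{method}: CP = {cp:.2f}")
# Summative: 0.23, Rubric: 0.61, ST: 0.79, ST-FIS: 0.96
```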
Table 7. Standard errors SE(M) corresponding to the indicators of Mean Absolute Deviation ($MAD_M$), Consistency ($CI_M$), and Efficiency ($\beta_M$) for each grading method. These values are based on simulation runs and quantify the statistical variability of each metric.

Method    | SE($MAD_M$) | SE($CI_M$) | SE($\beta_M$)
----------|-------------|------------|--------------
Summative | 0.00086     | 0.00018    | 0.00018
Rubric    | 0.00096     | 0.00050    | 0.00060
ST        | 0.00051     | 0.00036    | 0.00050
ST-FIS    | 0.0010      | 0.00089    | 0.0012
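A minimal sketch of how entries such as those in Table 7 can be obtained follows, assuming each SE(M) is estimated as the sample standard deviation of the indicator over the simulation runs divided by the square root of the number of runs; the per-run values below are placeholders, not the paper's data.

```python
# Sketch: standard error of a simulated indicator across runs.
# Assumption: SE(M) = s / sqrt(n_runs), with s the sample standard deviation
# of the indicator over the simulation runs; the values below are placeholders
# (the paper uses 100 runs per method).
import math
import statistics

def standard_error(values):
    # Sample standard deviation over the square root of the number of runs.
    return statistics.stdev(values) / math.sqrt(len(values))

ci_runs = [1.91, 1.89, 1.92, 1.90, 1.93]  # placeholder per-run CI values
print(f"SE(CI) = {standard_error(ci_runs):.5f}")
```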
Table 8. Results of normal distribution fitting to simulated student grades, based on the provided mathematical setups, incorporating the Summative, Rubric, ST, and ST-FIS methods.

Method    | Mean   | Standard Deviation | Chi-Squared Statistic | Degrees of Freedom | p-Value
----------|--------|--------------------|-----------------------|--------------------|--------
Summative | 0.5214 | 0.1412             | 1.0663                | 2                  | 0.5868
Rubric    | 0.5874 | 0.1410             | 3.1181                | 3                  | 0.3738
ST        | 0.5044 | 0.0697             | 1.3905                | 3                  | 0.7078
ST-FIS    | 0.4982 | 0.2463             | 3.5090                | 2                  | 0.1730
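A minimal sketch of the kind of chi-squared goodness-of-fit computation summarized in Table 8 follows. It assumes equiprobable bins under the fitted normal and degrees of freedom equal to the number of bins minus one minus the two estimated parameters; the paper's exact binning scheme is not specified here, so the sketch is illustrative rather than a reproduction of the reported statistics.

```python
# Sketch: chi-squared goodness-of-fit test of a normal model on simulated grades.
# Assumptions: bins are equiprobable under the fitted normal, and dof equals
# (number of bins - 1 - 2) because the mean and standard deviation are
# estimated from the data; the paper's exact binning scheme is not stated.
import numpy as np
from scipy import stats

def chi2_normal_fit(grades, n_bins=6):
    mu = np.mean(grades)
    sigma = np.std(grades, ddof=1)
    # Interior bin edges at equiprobable quantiles of the fitted normal.
    interior = stats.norm.ppf(np.linspace(0, 1, n_bins + 1)[1:-1],
                              loc=mu, scale=sigma)
    observed = np.bincount(np.searchsorted(interior, grades), minlength=n_bins)
    expected = np.full(n_bins, len(grades) / n_bins)
    chi2_stat = np.sum((observed - expected) ** 2 / expected)
    dof = n_bins - 1 - 2
    p_value = stats.chi2.sf(chi2_stat, dof)
    return chi2_stat, dof, p_value

rng = np.random.default_rng(0)
grades = rng.normal(0.52, 0.14, size=100)  # placeholder simulated grades
print(chi2_normal_fit(grades))  # we fail to reject H0 when p > 0.05
```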
Table 9. Comparative description of the Summative, Rubric, and STBAM methods.

Feature | Summative | Rubric | STBAM
--------|-----------|--------|------
Underlying logic | Weighted arithmetic | Weighted criteria + qualitative levels | Competency assessment structure with binary indicators
Evaluation units | Learning sections (content areas) | Evaluation criteria | Instructions or “directions” within a task
Evaluation of attributes | Implicit | Explicit, but mainly qualitative | Explicit: Learning (L), Procedure (P), Attitude (A)
Scoring scale | Continuous, percentage | Discrete, ordinal scale (e.g., 1–5 or 1–4) | Ratio of ✓ marks over total indicators
Weighting mechanism | Predefined weights | Predefined weights per criterion | Implicit via the number of indicators per attribute
Subjectivity | High | Moderate (due to descriptive guidance) | Reduced: teacher marks presence/absence of observable attributes (✓ or ✗ per observable indicator)
Final grade computation | Scalar product of scores and weights | Weighted sum of performance levels | Sum of success indicators divided by total number of indicators
Alignment to competencies | Limited | Partial (depends on design) | High (mapped directly to competency-building indicators)
Empirical validation feasibility | Low | Moderate | High (traceable indicator-level evaluation)
Holistic assessment | No | Partially | Yes (includes knowledge, skills, and attitude dimensions)
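The final-grade rules contrasted in Table 9 can be made concrete with a short sketch. The scores, weights, and indicator lists below are illustrative only; they show the Summative scalar product and the STBAM ratio of fulfilled indicators to total indicators, with the latter recorded as booleans (True standing for a ✓ mark).

```python
# Sketch: the two final-grade rules contrasted in Table 9.
# Assumptions: STBAM indicators across the L, P, and A attributes are recorded
# as booleans (True = check mark); scores and weights are illustrative only.

def summative_grade(scores, weights):
    """Scalar product of section scores and predefined weights."""
    return sum(s * w for s, w in zip(scores, weights))

def stbam_grade(indicators):
    """Fraction of fulfilled indicators over the total number of indicators."""
    return sum(indicators) / len(indicators)

# Illustrative use:
print(summative_grade([0.8, 0.6, 0.9], [0.4, 0.3, 0.3]))  # 0.77
# Learning, Procedure, and Attitude indicators flattened into one list:
print(stbam_grade([True, True, False, True, True, True, False, True]))  # 0.75
```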
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
