# Probabilistic Models for Competence Assessment in Education


## Abstract


## 1. Introduction

#### 1.1. Related Work

#### 1.2. Bayesian Networks

#### 1.3. Representation

- The vertices represent the random variables that we model.
- For each vertex ${X}_{i}$, there is a conditional probability distribution $\pi \left({X}_{i}\right|{\mathbf{pa}}_{i})$.

**pa**${}_{i}$ stands for parent nodes and denotes the set of variables with a direct edge towards ${X}_{i}$. The symbol ${\mathsf{\Theta}}_{i}$ represents the parameters of the probability distribution associated with that same vertex. If a node has no parents, then the probability distribution associated with it is marginal. As a side note, when depicting a Bayesian network, observed nodes will be shaded.

- Given a node X in a Bayesian network, its parent nodes are the set of nodes with a direct edge towards X.
- Given a node X in a Bayesian network, its child nodes are the set of nodes that receive a direct edge from X.
- Given a node X in a Bayesian network, its ancestors are given by the set of all variables from which we can reach X through a directed, arbitrarily long path.
- Given the set of all the variables modeled in a Bayesian network, $X=\{{X}_{1},{X}_{2},\dots ,{X}_{N}\}$, an ancestral ordering of the variables is followed when traversing the network if, every time we reach a variable X, we have already visited all of its ancestors.
- Given a node X in a Bayesian network, its Markov blanket is given by its parents, its children, and the parents of its children.
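The graph notions above translate directly into code. The following sketch (our own illustration with a hypothetical four-node network, not part of the paper's implementation) stores a DAG as a map from each node to its parents and derives children, ancestors, and the Markov blanket from it:

```python
# Hypothetical example DAG: each node maps to the list of its parents.
parents = {
    "A": [],          # root node: its associated distribution is marginal
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["C"],
}

def children(g, x):
    """Nodes that receive a direct edge from x."""
    return [v for v, ps in g.items() if x in ps]

def ancestors(g, x):
    """All nodes from which x can be reached through a directed path."""
    result = set()
    frontier = list(g[x])
    while frontier:
        p = frontier.pop()
        if p not in result:
            result.add(p)
            frontier.extend(g[p])
    return result

def markov_blanket(g, x):
    """Parents, children, and the children's other parents."""
    ch = children(g, x)
    blanket = set(g[x]) | set(ch)
    for c in ch:
        blanket |= set(g[c])
    blanket.discard(x)
    return blanket
```

For instance, in the example network the Markov blanket of B is {A, C}: its parent A, its child C, and C's other parent (A again).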

#### 1.4. Flow of Probabilistic Influence

- If W is an intermediate node and all the edges go in the same direction (Figure 3), then an update in X will be reflected in Y if and only if W is not an observed variable, and vice versa: an update in Y will be reflected in X if and only if W is not an observed variable.
- The same applies if W is a parent of two children X and Y (Figure 4). Again, there will be a flow of probabilistic influence from X to Y if and only if W is not observed.
- Finally, if X and Y are parents of W (v-structure, Figure 5), then the situation reverses, and there is a flow of probabilistic influence from X to Y if and only if W is observed.

- Let $\mathcal{G}$ be a DAG.
- Let ${X}_{1}\rightleftharpoons \dots \rightleftharpoons {X}_{N}$ be a trail in $\mathcal{G}$.
- A trail is active given a set of observed variables W if:
    - whenever there is a v-structure ${X}_{i-1}\to {X}_{i}\leftarrow {X}_{i+1}$, ${X}_{i}$ or one of its descendants is in W; and
    - no other node along the trail is in W.
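As a minimal illustration of the three local rules (our own sketch, with hypothetical names), the activation test at an intermediate node W of a trail can be written as:

```python
def is_active_triple(structure, w_observed, descendants_observed=False):
    """Local activation rule at an intermediate node W.

    structure: 'chain' (X -> W -> Y), 'fork' (X <- W -> Y),
    or 'collider' (the v-structure X -> W <- Y).
    Returns True if probabilistic influence can flow from X to Y
    through W given the observations."""
    if structure in ("chain", "fork"):
        # blocked exactly when W is observed
        return not w_observed
    if structure == "collider":
        # v-structure: active only if W or one of its descendants is observed
        return w_observed or descendants_observed
    raise ValueError(f"unknown structure: {structure}")
```

A full d-separation test would apply this rule at every intermediate node of a trail and declare the trail active only if all triples are active.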

Given a variable ${X}_{j}$ with Markov blanket **Mb**${}_{j}$, for any set of variables ${\mathbf{X}}_{c}\subseteq (\mathbf{V}\backslash {\mathbf{Mb}}_{j})$ not containing ${X}_{j}$, the variable ${X}_{j}$ is conditionally independent of ${\mathbf{X}}_{c}$ given its Markov blanket: $\mathbb{P}({X}_{j}\mid {\mathbf{Mb}}_{j},{\mathbf{X}}_{c})=\mathbb{P}({X}_{j}\mid {\mathbf{Mb}}_{j})$.

## 2. Materials and Methods

## 3. Results

#### 3.1. Probabilistic Models of Peer Assessment

#### 3.1.1. Personalized Automated Assessments

#### Model

#### Direct Trust

#### Indirect Trust

**Definition**

**1.**

**Definition**

**2.**

#### Incremental Updates

- Initially, the default direct trust distribution ${T}_{i,j}$ between any two peers i and j is the one describing ignorance (i.e., the flat equiprobable distribution $\mathbb{F}$). When j evaluates an object $\alpha $ that was already assessed by i, ${T}_{i,j}$ is updated as follows:
- Let $\mathbb{P}({X}_{u}=x)$ for $x=\mathit{diff}(i,j)$ be the probability distribution of the assessment difference between i and j. The new assessment must be reflected in a change in the probability distribution. In particular, $\mathbb{P}({X}_{u}=x)$ is increased by a fraction of the probability of X not being equal to x:$$\mathbb{P}({X}_{u}=x)=\mathbb{P}({X}_{u}=x)+\gamma \cdot (1-\mathbb{P}({X}_{u}=x))$$For instance, if the probability of x is 0.6 and $\gamma $ is 0.1, then the new probability of x becomes $0.6+0.1\times (1-0.6)=0.64$. As in the example, the value of $\gamma $ must be closer to 0 than to 1, since considerable changes can only result from the information accumulated over many assessments.
- The resulting ${T}_{i,j}$ is then normalized by computing the distribution that respects the newly computed value and has minimal relative entropy with respect to the previous probability distribution:$$\mathbb{T}_{i,j}(X)=\operatorname{argmin}_{\mathbb{P}^{\prime}}\sum_{x^{\prime}}p(X^{\alpha}=x^{\prime})\log\frac{p(X^{\alpha}=x^{\prime})}{p^{\prime}(X^{\alpha}=x^{\prime})}\quad \text{such that}\quad p(X^{\alpha}=x)=p^{\prime}(X^{\alpha}=x)$$These direct trust distributions between peers are stored in a matrix $\mathcal{C}$.
- To encode the decrease in the integrity of information with time (information decays: I may be sure today about your high competence at chess, but if our interactions stop, in five years I may no longer be sure, since you might have lost your abilities in that period), the direct trust distributions in $\mathcal{C}$ are decayed towards a decay limit distribution after a certain grace period. In our case, the limit distribution is the flat equiprobable $\mathbb{F}$. When a new evaluation updates a direct trust distribution ${T}_{i,j}$, ${T}_{i,j}$ is first decayed before it is modified.
- The indirect trust distributions between $\epsilon $ and each peer are stored in a distributions vector $\overrightarrow{{t}_{\epsilon}}$. Initially, $\overrightarrow{{t}_{\epsilon}}$ contains the probability distributions describing ignorance, $\mathbb{F}$. When matrix $\mathcal{C}$ is updated, $\overrightarrow{{t}_{\epsilon}}$ is also updated as a product of its former version times matrix $\mathcal{C}$:$${t}_{\epsilon ,j}^{k+1}=\underset{0<i\le n}{\bigoplus}{\mathbb{T}}_{i,j}\otimes {\mathbb{T}}_{\epsilon ,i}^{k}$$
- If a direct trust distribution ${T}_{\epsilon ,j}$ exists between $\epsilon $ and j, the indirect trust distribution ${t}_{\epsilon ,j}$ is overwritten with ${T}_{\epsilon ,j}$ after the update of the indirect trust distributions.
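The update and decay steps above can be sketched as follows (an illustrative implementation, not the authors' code; `gamma`, `nu`, and the function names are our own). Rescaling the remaining entries proportionally is the minimum relative-entropy solution under the constraint that the new value of $\mathbb{P}({X}_{u}=x)$ is preserved:

```python
import numpy as np

def update_direct_trust(t_ij, x, gamma=0.1):
    """Increase the probability of the observed difference x by a fraction
    gamma of the remaining mass, then rescale the other entries so the
    distribution still sums to one while keeping their relative
    proportions (the minimum relative-entropy normalization).
    Assumes 0 < t_ij[x] < 1 and a small gamma."""
    t = np.asarray(t_ij, dtype=float).copy()
    new_px = t[x] + gamma * (1.0 - t[x])
    rest = 1.0 - t[x]
    t = t * (1.0 - new_px) / rest   # proportionally rescale the other entries
    t[x] = new_px
    return t

def decay(t_ij, limit, nu=0.05):
    """Move the distribution one step towards the decay limit distribution
    (here the flat equiprobable F), modelling the loss of information
    integrity with time; nu is a hypothetical decay rate."""
    t = np.asarray(t_ij, dtype=float)
    return (1.0 - nu) * t + nu * np.asarray(limit, dtype=float)
```

Running the worked example from the text, `update_direct_trust([0.6, 0.3, 0.1], 0, gamma=0.1)` raises the first entry to 0.64 and rescales the others so the result still sums to one.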

#### 3.1.2. Tuned Models of Peer Assessment in MOOCs

- The assignment’s true score, ${s}_{u}\in \mathbb{R}$. In the case of the implementation presented here, this is the teacher’s grade.
- The grader’s bias, ${b}_{v}\in \mathbb{R}$. This bias reflects a grader’s tendency to either inflate or deflate their assessment by a certain number of percentage points. The lower these biases, the more accurate the grades will be.
- The grader’s reliability, $\tau \in \mathbb{R}$, reflecting how close, on average, a grader’s peer assessments land to the corresponding assignment’s true score after correcting for bias. In this context, reliability is a synonym for precision, i.e., the inverse variance of a normal distribution. Notice that the reliability of every grader is fixed to the same value.
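A small generative sketch of these three components (our own illustration; the sizes and hyperparameters are hypothetical) assumes, as is standard in this family of models, that an observed peer grade is normally distributed around the true score shifted by the grader's bias, with the shared precision $\tau$:

```python
import numpy as np

rng = np.random.default_rng(0)

n_students, n_graders = 4, 3
tau = 4.0                       # shared reliability (precision) of all graders
sigma = 1.0 / np.sqrt(tau)      # corresponding noise standard deviation

s = rng.normal(70.0, 10.0, size=n_students)   # true scores s_u (teacher grades)
b = rng.normal(0.0, 2.0, size=n_graders)      # grader biases b_v (in points)

# observed grade of grader v for student u: z_uv ~ N(s_u + b_v, 1/tau)
z = s[:, None] + b[None, :] + rng.normal(0.0, sigma, size=(n_students, n_graders))
```

In the full model, inference runs the other way: given the observed grades `z`, the posterior over `s`, `b`, and (in richer variants) per-grader reliabilities is estimated.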

#### Partially Known Parameters

#### 3.1.3. PG-Bivariate: A Bayesian Model of Grading Similarity

- The location vector, ${\overrightarrow{\mu}}_{{v}_{1}{v}_{2}}=({\mu}_{{v}_{1}},{\mu}_{{v}_{2}})$, is composed of each of the peers’ location parameters separately.
- The covariance matrix ${\mathsf{\Sigma}}_{{v}_{1}{v}_{2}}$ contains the individual variances in its diagonal. The off-diagonal components codify the correlations between ${v}_{1}$ and ${v}_{2}$ when grading.
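The bivariate construction above can be illustrated as follows (our own sketch with hypothetical numbers; `rho` is the assumed grading correlation between the two peers):

```python
import numpy as np

# Hypothetical location and scale parameters for two peers v1 and v2.
mu_v1, mu_v2 = 72.0, 68.0
sd_v1, sd_v2 = 5.0, 8.0
rho = 0.6                       # assumed correlation between their grades

# Location vector: the two peers' location parameters stacked together.
mu = np.array([mu_v1, mu_v2])

# Covariance matrix: individual variances on the diagonal, the
# correlation-derived covariance off the diagonal.
cov = np.array([
    [sd_v1**2,            rho * sd_v1 * sd_v2],
    [rho * sd_v1 * sd_v2, sd_v2**2],
])

rng = np.random.default_rng(0)
grades = rng.multivariate_normal(mu, cov, size=1000)
```

Sampling many grade pairs from this bivariate normal and checking their empirical correlation recovers a value close to `rho`, which is the sense in which the off-diagonal terms codify grading similarity.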

#### Partially Known Parameters

#### 3.2. Experiments

#### 3.2.1. Experiments on Bayesian Networks

#### Posterior Predictive Sampling

#### Studying the Error Evolution

- Random choice (baseline): The next observed ground truth is chosen randomly.
- Total $RMSE$-decreasing policy: At each iteration, we pick and observe the true grade (i.e., the teacher’s grade) of the student whose current assessment introduces the highest root mean squared error.
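The two ground-truth injection policies can be sketched as follows (our own illustration; the array layout and function names are assumptions, not the paper's data structures):

```python
import numpy as np

def next_random(observed, rng):
    """Baseline policy: reveal a uniformly random, still-hidden ground truth."""
    hidden = np.flatnonzero(~observed)
    return int(rng.choice(hidden))

def next_rmse_decreasing(estimates, truths, observed):
    """RMSE-decreasing policy: reveal the student whose current estimate
    contributes the largest squared error, i.e. the one driving the
    total RMSE up the most."""
    sq_err = (estimates - truths) ** 2
    sq_err[observed] = -np.inf        # already-revealed students not eligible
    return int(np.argmax(sq_err))
```

In practice the true per-student errors are not known before revealing the teacher's grades, so the greedy policy serves as an oracle-style upper bound on how fast the error curve can drop.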

- The red line shows the evolution of the estimations’ $RMSE$ as we introduce new ground truths following a random policy.
- The yellow, dashed line shows the evolution of the estimations’ $RMSE$ without considering the known ground truths, to correct for overly optimistic low error values. In this case too, a random policy for ground truth injection is followed.
- The blue line shows the evolution of the estimations’ $RMSE$ as we introduce new ground truths following an $RMSE$ decreasing policy.
- The violet, dashed line shows the same information as the yellow line for the case of the $RMSE$ decreasing policy.

#### 3.2.2. Experiments on PAAS

#### 3.2.3. Comparison of the Three Models

## 4. Discussion

- The small number (5) of peers assessing each assignment.
- The fact that these graders are chosen randomly.
- The fact that the process is entirely anonymous concerning the students (the graders do not know whom they are assessing, and the gradees do not know the identity of their graders).

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Expansion |
| --- | --- |
| AI | Artificial intelligence |
| BN | Bayesian network |
| DAG | Directed acyclic graph |
| MOOC | Massive open online course |
| NLP | Natural language processing |
| PAAS | Personalized automated assessments |
| PGM | Probabilistic graphical model |
| RMSE | Root mean squared error |

## References

- Schön, D.A. The Design Studio: An Exploration of Its Traditions and Potentials; International Specialized Book Service Incorporated: London, UK, 1985.
- Tinapple, D.; Olson, L.; Sadauskas, J. CritViz: Web-based software supporting peer critique in large creative classrooms. Bull. IEEE Tech. Comm. Learn. Technol. **2013**, 15, 29.
- Kulkarni, C.; Wei, K.P.; Le, H.; Chia, D.; Papadopoulos, K.; Cheng, J.; Koller, D.; Klemmer, S.R. Peer and self assessment in massive online classes. ACM Trans. Comput.-Hum. Interact. (TOCHI) **2013**, 20, 1–31.
- Piech, C.; Huang, J.; Chen, Z.; Do, C.; Ng, A.; Koller, D. Tuned models of peer assessment in MOOCs. arXiv **2013**, arXiv:1307.2579.
- Sterbini, A.; Temperini, M. Correcting open-answer questionnaires through a Bayesian-network model of peer-based assessment. In Proceedings of the 2012 International Conference on Information Technology Based Higher Education and Training (ITHET), Istanbul, Turkey, 21–23 June 2012; pp. 1–6.
- Bachrach, Y.; Graepel, T.; Minka, T.; Guiver, J. How to grade a test without knowing the answers: A Bayesian graphical model for adaptive crowdsourcing and aptitude testing. arXiv **2012**, arXiv:1206.6386.
- Mi, F.; Yeung, D.Y. Probabilistic graphical models for boosting cardinal and ordinal peer grading in MOOCs. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
- Gutierrez, P.; Osman, N.; Roig, C.; Sierra, C. Personalised Automated Assessments. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Singapore, 9–13 May 2016; Jonker, C.M., Marsella, S., Thangarajah, J., Tuyls, K., Eds.; ACM: New York, NY, USA, 2016; pp. 1115–1123.
- De Alfaro, L.; Shavlovsky, M. CrowdGrader: A tool for crowdsourcing the evaluation of homework assignments. In Proceedings of the 45th ACM Technical Symposium on Computer Science Education, Atlanta, GA, USA, 5–8 March 2014; pp. 415–420.
- Ashley, K.; Goldin, I. Toward AI-enhanced computer-supported peer review in legal education. In Legal Knowledge and Information Systems; IOS Press: Amsterdam, The Netherlands, 2011; pp. 3–12.
- Balfour, S.P. Assessing Writing in MOOCs: Automated Essay Scoring and Calibrated Peer Review™. Res. Pract. Assess. **2013**, 8, 40–48.
- Admiraal, W.; Huisman, B.; Pilli, O. Assessment in Massive Open Online Courses. Electron. J. E-Learn. **2015**, 13, 207–216.
- The Earth Mover’s Distance (EMD); Stanford University: Stanford, CA, USA, 1999.
- Swain, M.J.; Ballard, D.H. Color indexing. Int. J. Comput. Vis. **1991**, 7, 11–32.
- Stan Development Team. PyStan: The Python Interface to Stan. 2021. Available online: http://mc-stan.org/2 (accessed on 14 December 2021).

**Figure 1.** Iterative addition of variables to a generative model. This methodology constructs a joint probability distribution from intermediate conditional probability distributions.

**Figure 2.** Reverse construction of a joint probability distribution. In this case, we begin with a marginal distribution over a few variables and then increase the complexity by conditioning this distribution on a new set of (ancestor) nodes.

**Figure 12.** Percentage error using random (red line) and entropy-decreasing (violet, dashed line) assessment order as a function of the number of observed teacher’s grades for classes of 200 and 50 students using synthetic data. The **upper** figure represents the original implementation by Gutierrez et al. [8]; **below**, our results are shown.

**Figure 13.** Percentage error as a function of the number of observed ground truth grades reported by the three studied models.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

López de Aberasturi Gómez, A.; Sabater-Mir, J.; Sierra, C.
Probabilistic Models for Competence Assessment in Education. *Appl. Sci.* **2022**, *12*, 2368.
https://doi.org/10.3390/app12052368
