Article

Information-Theoretic Measures of Metacognitive Efficiency: Empirical Validation with the Face Matching Task

Department of Psychology, Ariel University, Ariel 40700, Israel
Entropy 2025, 27(4), 353; https://doi.org/10.3390/e27040353
Submission received: 2 February 2025 / Revised: 5 March 2025 / Accepted: 24 March 2025 / Published: 28 March 2025
(This article belongs to the Special Issue Information-Theoretic Principles in Cognitive Systems)

Abstract

The ability of participants to monitor the correctness of their own decisions by rating their confidence is a form of metacognition. This introspective act is crucial for many aspects of cognition, including perception, memory, learning, emotion regulation, and social interaction. Researchers assess the quality of confidence ratings according to bias, sensitivity, and efficiency. To do so, they deploy quantities such as meta-d′ − d′ or the M-ratio. These measures compute the expected accuracy level of performance in the primary task (Type 1) from the secondary confidence rating task (Type 2). However, these measures have several limitations. For example, they are based on unwarranted parametric assumptions, and they fall short of accommodating the granularity of confidence ratings. Two recent papers, by Dayan and by Fitousi, have proposed information-theoretic measures of metacognitive efficiency that can address some of these problems. Dayan suggested meta-I, and Fitousi proposed meta-U, meta-KL, and meta-J. These authors demonstrated the convergence of their measures on the notion of metacognitive efficiency using simulations, but did not apply their measures to real empirical data. The present study set out to test the construct validity of these measures in a concrete behavioral task, the face-matching task. The results supported the viability of these novel indexes of metacognitive efficiency, and provided substantial empirical evidence for their convergence. The results also adduce considerable evidence that participants in the face-matching task acquire valuable metaknowledge about the correctness of their own decisions in the task.

1. Introduction

Assessing the validity of our own decisions is a self-reflective act of immense survival value [1,2,3,4]. For example, while learning for a test, we have to constantly monitor the level of our knowledge. If we feel that a satisfactory level of knowledge has been accomplished, we can halt our preparation; otherwise, we have to continue our learning [5,6]. This self-reflective ability is a form of metacognition [7,8,9,10,11,12,13]. Metacognition plays a central role in various domains, including learning [14], memory [15,16], self-awareness [17], action [18,19], psychiatric disorders [20], and social interaction [21]. Several neural substrates are involved in metacognitive assessments and monitoring of behavior [22]. These include the right anterior dorsolateral prefrontal cortex and the bilateral parahippocampal cortex [11]. A prominent method of assessing metacognition behaviorally is based on collecting confidence judgments from participants [13]. Measuring the quality of these confidence ratings is consequential for understanding the underlying mechanisms of metacognition. The goals of the present study are threefold: (a) to introduce several recently developed information-theoretic measures of metacognitive efficiency that are based on confidence judgments [3,4], (b) to demonstrate how these measures can be applied to a concrete behavioral task of face matching [23], and (c) to show that these information-theoretic measures converge on a single theoretical construct in this empirical context [24]. Previous studies by Dayan [3] and Fitousi [4] demonstrated the viability of these measures by using synthetic data from simulations. However, these measures should be computed and tested with real data from genuine empirical tasks. If indeed those information-based measures gauge the same psychological construct, metacognitive efficiency, they should correlate with other canonical measures, as well as with one another, in real data.
A primary objective of research on metacognition is that of measuring the quality of confidence judgments. Three quantities stand out as crucial aspects of measurement and modeling: bias, sensitivity, and efficiency. To motivate our understanding of these concepts, consider a simple task in which an observer is asked to detect whether a signal is present or absent by emitting one of two possible responses; immediately afterwards, the observer is asked to rate their confidence in the correctness of their decision. It is often customary to call the first task a type 1 task, and the second a type 2 task [25,26]. It is also theoretically convenient to separate the actor, who performs the type 1 task, from the rater, who engages in the type 2 task, although the two entities might be the same person [27]. Sensitivity plays a role in both the actor's and the rater's decisions. The actor's sensitivity is reflected in her ability to separate between noise and noise plus signal. In the language of signal detection theory (SDT, [28,29]), this quantity is captured by the parameter d′. The rater's sensitivity reflects their ability to distinguish between the actor's correct and incorrect decisions. A metacognitively sensitive rater tends to provide high confidence ratings for the actor's correct decisions, and low confidence ratings for the actor's incorrect decisions. However, the sensitivity of the rater cannot serve as a viable measure of the rater's quality of decision. Trivially, a rater can be highly sensitive if the actor is giving correct decisions most of the time. In other words, the rater's job becomes much easier as the actor becomes more sensitive. Thus, a genuine measure of metacognitive efficiency should correct the rater's sensitivity for the actor's performance. Moreover, metacognitive efficiency can be affected by bias. An overconfident rater, one who gives high confidence ratings even when the actor's decisions are incorrect, is effectively both insensitive and inefficient.
Model-free [30] and model-based [1,2,26,27,31] approaches have been developed to quantify metacognitive bias, sensitivity, and efficiency. One of the most influential model-based frameworks in the field is called the meta-d′ model [1,2]. The model relies on principles of SDT [28,29], and as such, incorporates parametric assumptions [25,26,32]. The model's objective is to predict the rater's hypothetical sensitivity in the type 1 task based on the rater's performance in the type 2 task. The value of this theoretical quantity is dubbed meta-d′. Figure 1A–D illustrates the theoretical and empirical components of this model and their interrelations. Figure 1A presents the underlying model of the type 1 task. This is the traditional equal-variance signal detection model [29], depicted by two overlapping Gaussian distributions, one for the noise and one for the signal + noise. The normalized distance between the two distributions is captured by the parameter d′, which gauges the sensitivity of the actor in the type 1 task. Figure 1B illustrates the underlying model behind the rater's type 2 confidence judgments. It assumes the rater's decisions are also based on an equal-variance signal detection process. However, the Gaussian distributions reflect the actor's incorrect and correct responses, and are generated from the type 1 distributions by adding noise. Figure 1D depicts the hypothetical Receiver Operating Characteristic (ROC) curve of the rater's type 2 performance. The parameters of the theoretical models can be found by fitting them to the data of the accuracy–confidence histograms that are illustrated in Figure 1C. These histograms reflect the empirical normalized probabilities of the rater's confidence ratings conditioned on the accuracy of the actor's response. The distributions for the correct responses appear in green, whereas the distributions for the incorrect responses appear in red. The probabilities in each distribution sum up to 1.
Meta-d′ reflects the expected d′ of the rater in the type 1 task based on her performance in the type 2 task. Because this quantity and the actor's type 1 d′ are expressed in the same measurement units [2], they can be directly compared. Metacognitive efficiency can then be assessed in several possible ways: as an M-ratio (meta-d′/d′) or as a difference (meta-d′ − d′). The rater's efficiency is optimal when M-ratio = 1, or when meta-d′ − d′ = 0. The rater is hyposensitive when M-ratio < 1 or when meta-d′ − d′ < 0. There are situations in which the rater is hypersensitive, namely when M-ratio > 1, or when meta-d′ − d′ > 0 [1]. This can occur when the rater has access to additional information accumulated after the actor's decision. Such a scenario can be modeled, for example, by the second-order model [27]. According to this model, the actor and the rater are sampling evidence from two separate distributions. Hypersensitivity in this model can be modeled by reducing the noise (i.e., variance) in the rater's distribution to a much lower level than the noise (i.e., variance) in the actor's distribution.
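For concreteness, here is a minimal sketch of how the two efficiency indexes are computed once d′ and meta-d′ have been fitted; the numeric values below are hypothetical and serve only to illustrate the arithmetic.

```python
# Hypothetical fitted values; in practice they come from fitting the
# meta-d' model to a participant's confidence-accuracy data.
d_prime = 2.0       # actor's type 1 sensitivity
meta_d_prime = 1.5  # rater's sensitivity expressed in type 1 units

m_ratio = meta_d_prime / d_prime  # efficiency as a ratio
m_diff = meta_d_prime - d_prime   # efficiency as a difference

print(m_ratio)  # 0.75 -> M-ratio < 1: the rater is hyposensitive
print(m_diff)   # -0.5 -> meta-d' - d' < 0: the same conclusion
```

Both indexes lead to the same qualitative classification here; they differ in how they scale with the actor's sensitivity.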
Measures emanating from the meta-d′ model [1,2], such as meta-d′ − d′ and the M-ratio, have been used extensively by researchers to evaluate the quality of confidence judgments [34]. However, these measures are not without their limitations [3]. First, it is not immediately clear why empirical confidence ratings should necessarily conform to a type 1 decision process [3]. Moreover, these measures require distributional assumptions, are not bias-free, are applicable only to ratio-scale variables, do not scale with the number of stimuli, responses, or confidence rating alternatives, do not have definite upper and lower bounds, and are applicable only to experimental designs with two stimuli and two responses. Dayan [3] and Fitousi [4] have recently proposed information-theoretic indexes of metacognitive efficiency that can ameliorate most of these problems and serve as additional means of assessment.
Dayan [3] has succinctly summarized the utility of an information-theoretic approach to judgments of confidence: “the problem of confidence is inherently one of information—that the actor has about the true state of the stimulus; and that the rater has about the same quantity and about what the actor used. It therefore seems appropriate to use the methods of information theory to judge the relationship between the stimulus, the actor and the rater” (p. 11). Indeed, the information-theoretic measures have many desirable characteristics: they are relative measures, bias-free (e.g., they do not depend on criterion shifts), do not require any parametric assumptions, scale with the number of confidence rating points, do not require the assumption of ratio or order of confidence ratings, and are relatively easy to compute. Both Dayan [3] and Fitousi [4] performed simulations with canonical models to support the validity of the information-theoretic measures they developed. Both authors demonstrated convergence of their information-theoretic measures based on synthetic data. However, neither of them demonstrated the viability of these measures with real empirical datasets, nor did they examine whether the construct validity of those measures was achieved in a genuine empirical setting. The present study aims to address these objectives.

2. An Information-Theoretic Approach to Metacognitive Efficiency

Recent years have seen a renewed interest [35] in information-theoretic approaches to perception and cognition [36,37,38,39,40,41,42,43,44]. Many researchers have begun to realize the potential strength of information theory [45,46] in providing powerful tools for addressing the uncertainty that characterizes brain and behavior [47,48]. In two recent studies, Dayan [3] and Fitousi [4] have developed complementary information-theoretic approaches to metacognition that build on the idea of confidence judgments as a communication system. An illustration of such a system is presented as a Venn diagram in Figure 2. This Venn diagram depicts a communication model that consists of three entities: Stimuli, an Actor, and a Rater. The uncertainty associated with each variable is denoted by a circle in a different color. This diagram can guide rudimentary insights into the informational quantities that affect the actor’s and rater’s accuracy. For example, the discriminability of the actor’s response can be quantified by the mutual information shared between the stimuli and the actor. The rater’s accuracy depends on the amount of mutual information they share with the actor.

2.1. Meta-I

Dayan [3] outlined an information-theoretic measure of metacognitive efficiency he called meta-I. Consider an actor who performs in a two-choice signal detection task with two alternatives, a = 1 or a = −1. The actor's response is r; they are correct when r = a and incorrect when r ≠ a. The confidence–accuracy discrete distributions can be used to assess the metacognitive efficiency of the rater as a difference between two entropies:
meta-I = H_2(r | a = 1) − H_2(r | c, a = 1)
where
H_2(r | a = 1) = h_2[P(r | a = 1)]
and
H_2(r | c, a = 1) = Σ_c P(c | a = 1) · h_2[P(r | c, a = 1)]
where
h_2[P(x)] = −Σ_{x ∈ {0,1}} P(x) log_2 P(x)
is the entropy of a Bernoulli random variable. The first entropy, H_2(r | a = 1), is the overall uncertainty about the actor's accuracy. In the SDT model it is determined by the value of the parameter d′. When the actor is perfectly accurate (d′ → ∞) or perfectly wrong, this entropy amounts to 0 bits; when the actor's response is completely uncorrelated with a, which occurs when d′ → 0, this entropy amounts to 1 bit. The second entropy, H_2(r | c, a = 1), is the weighted average uncertainty about the residual accuracy after observing the confidence rating c, where the weights derive from the probability of observing that rating c.
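As an illustration, the following sketch computes meta-I from a toy confidence–accuracy table for trials with a given stimulus (e.g., a = 1); the counts and function names are invented for demonstration.

```python
import numpy as np

def h2(p):
    """Binary entropy, in bits, of a Bernoulli probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def meta_I(counts):
    """counts[k] = (n_correct, n_incorrect) at confidence level k,
    tallied over trials that share one stimulus (e.g., a = 1)."""
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()
    p_correct = counts[:, 0].sum() / n_total        # P(r = a)
    H_prior = h2(p_correct)                         # H2(r | a = 1)
    p_conf = counts.sum(axis=1) / n_total           # P(c | a = 1)
    p_correct_given_c = counts[:, 0] / counts.sum(axis=1)
    H_post = np.sum(p_conf * h2(p_correct_given_c)) # H2(r | c, a = 1)
    return H_prior - H_post

# Toy table: rows = confidence 1..4, columns = (correct, incorrect).
# Higher confidence is associated with more correct responses.
table = [(10, 30), (20, 20), (30, 10), (40, 5)]
print(meta_I(table))  # positive: confidence carries information about accuracy
```

With these toy counts the measure is positive, reflecting that the rater's confidence reduces uncertainty about the actor's accuracy.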
The meta-I measure has many desirable characteristics: it is a relative measure, does not require any parametric assumptions, is bias-free, scales with the number of confidence rating points, does not require the assumption of ratio or order of confidence ratings, and offers definite bounds from above and from below.
Meta-I is an ingenious measure in its own right. However, it can be modified to become a more general index of metacognitive efficiency. Note that the measure is contingent on accuracy (i.e., it depends on a specific partition, r = a vs. r ≠ a), while there is evidence that participants do not necessarily rate confidence based on the accuracy of the response [49]. For example, Koriat [50,51] has demonstrated the operation of a consensuality principle in judgments of confidence, whereby participants give the highest confidence rating to the most popular option, not necessarily to the one that is objectively correct. Moreover, meta-I was originally developed to assess metacognitive efficiency in the two-choice SDT task (i.e., it assumes two mutually exclusive choices, a = 1 or a = −1). However, meta-I does not apply to tasks of multi-alternative choice [49]. For example, assume an experimental design with four stimuli that are categorized into four responses, and are then rated on a four-point confidence scale. In that case, meta-I is not readily applicable. Thus, a more general version of meta-I, which I call meta-I*, is needed. This version is not contingent on accuracy, and makes no assumptions about the structure of the experimental design (i.e., the number of stimuli and/or responses). The new meta-I* proposed here meets these requirements. It is computed as the mutual information between the response and the joint confidence–stimuli variable:
meta-I* = I(r; c, s) = H(r | s) − H(r | s, c) = H(r) + H(c, s) − H(s, r, c)
This quantity amounts to 0 bits when the actor shares no information with the stimuli and rater; it is maximal when the actor shares all the information with the stimuli and the rater. An upper bound on meta-I* is determined by the variable with the lowest number of alternatives. For example, if the stimuli variable has the lowest number of alternatives, say four, then the upper bound is 2 bits. In the case of a two-choice SDT task, the upper bound is 1 bit. Decomposition of meta-I* into its entropy terms can provide more insights. According to one possible decomposition, it reflects the degree to which the uncertainty in the actor's response to the stimuli, H(r | s), is reduced after observing the rater's confidence judgments, H(r | s, c). According to another decomposition, meta-I* reflects the reduction in the combined uncertainty in the actor's response and the confidence–stimuli variables, H(r) + H(c, s), after removing the uncertainty in the joint stimuli–response–confidence variable, H(s, r, c). In its new form, meta-I* can be applied to complex experimental designs, with several stimuli and responses, and without the need to define the correctness of the response.
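To make the computation concrete, here is a minimal sketch that estimates the generalized measure from a list of raw (stimulus, response, confidence) trials, using the conditional-entropy decomposition H(r|s) − H(r|s,c). The variable names are illustrative, and the toy trials assume a rater whose confidence perfectly tracks accuracy.

```python
import numpy as np
from collections import Counter

def entropy_from_counts(counter):
    """Shannon entropy, in bits, of the empirical distribution in a Counter."""
    n = sum(counter.values())
    p = np.array([v / n for v in counter.values()])
    return -np.sum(p * np.log2(p))

def meta_I_star(trials):
    """trials: list of (stimulus, response, confidence) tuples.
    Computes H(r|s) - H(r|s,c) from the empirical frequencies."""
    H_s = entropy_from_counts(Counter(s for s, r, c in trials))
    H_sr = entropy_from_counts(Counter((s, r) for s, r, c in trials))
    H_sc = entropy_from_counts(Counter((s, c) for s, r, c in trials))
    H_src = entropy_from_counts(Counter(trials))
    return (H_sr - H_s) - (H_src - H_sc)

# Toy data: two stimuli, correct on 4/6 trials per stimulus; confidence
# c = 1 whenever the response is correct, c = 0 otherwise.
trials = ([(0, 0, 1)] * 4 + [(0, 1, 0)] * 2 +
          [(1, 1, 1)] * 4 + [(1, 0, 0)] * 2)
print(round(meta_I_star(trials), 3))  # 0.918: confidence fully resolves H(r|s)
```

Because confidence here is perfectly diagnostic, H(r|s,c) collapses to zero and meta-I* equals H(r|s), the binary entropy of the two-thirds accuracy rate.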

2.2. Meta-U, Meta-KL, and Meta-J

Fitousi [4] has recently developed three complementary information-theoretic measures of metacognitive efficiency, which he called meta-U, meta-KL, and meta-J. The first measure bears close affinity with Dayan's meta-I and its generalized form meta-I*. The measure quantifies metacognitive efficiency as the mutual information exchanged between the rater's confidence judgments and the stimuli–response variable:
meta-U = I(c; s, r) = H(c | s) − H(c | s, r) = H(c) + H(s, r) − H(s, r, c)
The index expresses the amount of information gained about confidence from knowing the actor's response, after knowing the stimuli. When the rater's metacognitive efficiency is null, meta-U amounts to zero bits, setting a lower bound on efficiency. An upper bound on meta-U is placed by the variable with the lowest number of alternatives in the experiment. Decomposition of meta-U into entropy terms can explicate the logic of this measure. In the first decomposition, meta-U reflects the degree to which the conditional uncertainty of confidence after knowing the stimulus is reduced after observing the actor's response. In the second decomposition, we note that meta-U gauges the degree to which the uncertainty in the confidence and stimulus–response variables, H(c) + H(s, r), is reduced after removing the uncertainty in the joint stimulus–response–confidence variable, H(s, r, c). This measure corrects for type 1 performance by taking the difference between the type 1 uncertainty, H(s, r), and the joint type 1 and type 2 uncertainty, H(s, r, c), in a similar way to meta-d′ − d′.
Importantly, it can be shown that meta-U equals meta-I* due to the symmetry of mutual information:
meta-U = meta-I*, since I(c; s, r) = I(r; c, s)
In the case of a two-choice SDT task, meta-U and meta-I* are equal to Dayan's meta-I.
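As a minimal numerical check of this equality, the sketch below evaluates the two conditional-entropy decompositions given above on an arbitrary toy joint distribution over (stimulus, response, confidence); the numbers are illustrative, not data.

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a (possibly multidimensional) probability table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary toy joint distribution; axes are (stimulus, response, confidence).
joint = np.array([[[1, 4], [3, 1]],
                  [[3, 1], [1, 4]]], dtype=float)
joint /= joint.sum()

H_s = entropy(joint.sum(axis=(1, 2)))    # H(s)
H_sr = entropy(joint.sum(axis=2))        # H(s, r)
H_sc = entropy(joint.sum(axis=1))        # H(s, c)
H_src = entropy(joint)                   # H(s, r, c)

meta_I_star = (H_sr - H_s) - (H_src - H_sc)  # H(r|s) - H(r|s,c)
meta_U = (H_sc - H_s) - (H_src - H_sr)       # H(c|s) - H(c|s,r)

print(np.isclose(meta_I_star, meta_U))  # True: the two measures coincide
```

Algebraically, both expressions reduce to H(s,r) + H(s,c) − H(s) − H(s,r,c), so the equality holds for any joint table, not just this one.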
The other two measures advanced by Fitousi [4], meta-KL and meta-J, are based on the Kullback–Leibler (KL) divergence [52] and the Jeffreys divergence [53]. These divergences are applied to the normalized confidence–accuracy distributions P(confidence = y | Correct) and Q(confidence = y | Incorrect). It should be noted, though, that in principle these divergence measures can be derived for any required partition of the stimuli–response joint variable, irrespective of accuracy. Accuracy is defined by labels (i.e., “correct”, “incorrect”) placed by the experimenter on the actor's responses. However, information-theoretic quantities are agnostic to such labels. A rater can be metacognitively efficient even when they flip the “correct” and “incorrect” labels on the actor's responses.
The new measure proposed by Fitousi, meta-KL, is based on the KL divergence D_KL(P ‖ Q), which measures the degree of separation between the correct and incorrect accuracy–confidence distributions:
meta-KL = D_KL(P ‖ Q) = Σ_y P(c = y | Correct) log_2 [P(c = y | Correct) / Q(c = y | Incorrect)]
It amounts to 0 bits when the rater cannot distinguish between correct and incorrect responses of the actor, and increases as the rater becomes more metacognitively efficient. An upper bound is met when the two distributions are at their maximal separability. An analogous expression of meta-KL gauges the difference between the cross-entropy of the correct and incorrect distributions and the entropy of the correct distribution:
meta-KL = D_KL(P ‖ Q) = H(P, Q) − H(P)
where H(P, Q) is the cross-entropy between the correct and incorrect confidence–accuracy distributions, and H(P) is the entropy of the correct distribution.
Metacognitive efficiency is null when uncertainty in the joint correct–incorrect confidence–accuracy distribution is equal to the uncertainty in the correct confidence–accuracy distribution. Metacognitive efficiency increases as the uncertainty in the correct distribution decreases. It reaches an upper bound when the latter is 0 bits; namely, the rater eliminates all uncertainty in the actor’s correct (incorrect) responses to the stimuli.
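A minimal sketch of the computation, using invented four-point confidence distributions (an efficient rater is more confident when correct than when incorrect):

```python
import numpy as np

def meta_KL(p_correct, q_incorrect, eps=1e-12):
    """KL divergence D_KL(P || Q), in bits, between the confidence
    distributions on correct (P) and incorrect (Q) trials."""
    p = np.asarray(p_correct, dtype=float)
    p /= p.sum()
    q = np.asarray(q_incorrect, dtype=float)
    q /= q.sum()
    # eps guards against zero cells in empirical histograms
    return np.sum(p * np.log2((p + eps) / (q + eps)))

# Toy distributions over confidence levels 1..4.
P = [0.10, 0.20, 0.30, 0.40]  # P(c | correct)
Q = [0.40, 0.30, 0.20, 0.10]  # Q(c | incorrect)
print(meta_KL(P, Q))  # > 0 bits: the two distributions are separable
print(meta_KL(P, P))  # 0.0: an undiscriminating rater
```

The second call illustrates the lower bound: when the correct and incorrect confidence distributions coincide, the divergence is exactly zero.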
The meta-KL measure does not satisfy the triangle inequality, and it is also not symmetric in the two distributions, since D_KL(P ‖ Q) ≠ D_KL(Q ‖ P). In this sense, meta-KL does not qualify as a metric [54]. To address these constraints, Fitousi [4] has also proposed meta-J, based on the Jeffreys divergence [53,55], which offers a symmetric measure of divergence:
meta-J = J(P, Q) = D_KL(P ‖ Q) + D_KL(Q ‖ P) = Σ_y P(c = y | Correct) log_2 [P(c = y | Correct) / Q(c = y | Incorrect)] + Σ_y Q(c = y | Incorrect) log_2 [Q(c = y | Incorrect) / P(c = y | Correct)]
This measure functions in a similar way to meta-KL. It equals 0 bits when the rater cannot distinguish between the correct and incorrect accuracy–confidence distributions, and increases as the rater becomes more efficient. However, unlike meta-KL, it takes into consideration the two possible KL divergences, D_KL(P ‖ Q) and D_KL(Q ‖ P), between the correct and incorrect confidence–accuracy distributions. Another way of expressing the same measure:
meta-J = J(P, Q) = H(P, Q) + H(Q, P) − H(P) − H(Q)
suggests that metacognitive efficiency gauges the degree to which the rater reduces uncertainty in the actor's response by eliminating uncertainty in the correct accuracy–confidence distribution, and/or in the incorrect accuracy–confidence distribution. Meta-J takes into consideration two sources of uncertainty, weighting them as two independent contributions to metacognitive efficiency. This is in contrast to meta-KL, which gives all the weight to the uncertainty in the correct accuracy–confidence distribution.
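The symmetry of the measure is easy to verify numerically; the sketch below, with invented toy distributions, shows that swapping the correct and incorrect distributions leaves meta-J unchanged.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence in bits; eps guards against zero cells."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.sum(p * np.log2((p + eps) / (q + eps)))

def meta_J(p, q):
    """Jeffreys divergence: the symmetric sum of the two KL divergences."""
    return kl(p, q) + kl(q, p)

P = [0.10, 0.20, 0.30, 0.40]  # confidence given correct (toy values)
Q = [0.40, 0.30, 0.20, 0.10]  # confidence given incorrect
print(meta_J(P, Q) == meta_J(Q, P))  # True: symmetric by construction
```

Unlike meta-KL, flipping the “correct” and “incorrect” labels cannot change the value of meta-J.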
To validate these novel measures, Fitousi [4] performed simulations with the meta-d′ model [1,33], using the free code published by Maniscalco and Lau [1]. He varied the values of d′ and meta-d′ from 0.01 to 4, incrementally in equal steps of 0.21 and in an orthogonal fashion. For each combination of d′ and meta-d′, he generated the corresponding confidence–accuracy distribution by simulating 10,000 trials. The Type 1 bias (the SDT parameter c) was set to 0. The number of ratings was fixed at 4. The information-theoretic measures meta-U, meta-KL, and meta-J were computed for each confidence–accuracy distribution, and plotted in three separate heatmaps, with d′ on the abscissa and meta-d′ on the ordinate. The code for the meta-d′ model was downloaded on 10 August 2018 from http://www.columbia.edu/~bsm2105/type2sdt/archive/index.html. The other codes can be found at https://osf.io/hsvfm/. Figure 3 presents the results of these simulations. As can be noted, all three heatmaps reveal one of the most important properties of an efficiency measure, namely that the rater's sensitivity is corrected for Type 1 performance. To see why, note that the highest values of metacognitive efficiency are achieved when the actor's sensitivity (d′) is low and the rater's sensitivity (meta-d′) is high. In all three heatmaps, as the actor's sensitivity increases, there is less and less room for the rater to accomplish high metacognitive efficiency. Also notable are the differences between the mutual information measure (meta-U) and the divergence measures (meta-KL, meta-J) with respect to the maximal values they can take and their overall dynamics.
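A greatly simplified stand-in for this simulation pipeline can convey its logic. The sketch below is not the meta-d′ fitting code; the generative model and all names are my own illustrative assumptions: an equal-variance SDT actor with sensitivity d′, and a rater who sees a noise-corrupted copy of the actor's evidence and bins its magnitude into confidence ratings.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(counts):
    """Shannon entropy, in bits, of a table of counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def simulate_meta_U(d_prime, rater_noise, n=20000, n_ratings=4):
    """Simulate n trials and estimate meta-U = H(c) + H(s,r) - H(s,r,c)."""
    s = rng.integers(0, 2, n)                       # stimulus: 0 or 1
    x = rng.normal(s * d_prime - d_prime / 2, 1.0)  # actor's evidence
    r = (x > 0).astype(int)                         # actor's response
    y = x + rng.normal(0.0, rater_noise, n)         # rater's noisy copy
    c = np.clip(np.floor(np.abs(y)), 0, n_ratings - 1).astype(int)
    joint = np.zeros((2, 2, n_ratings))
    np.add.at(joint, (s, r, c), 1)                  # counts over (s, r, c)
    return (entropy(joint.sum(axis=(0, 1)))         # H(c)
            + entropy(joint.sum(axis=2))            # H(s, r)
            - entropy(joint))                       # H(s, r, c)

# A noisier rater should be less metacognitively efficient:
print(simulate_meta_U(1.0, 0.1) > simulate_meta_U(1.0, 3.0))  # True
```

Sweeping d′ and the rater noise over a grid and plotting the resulting values would produce heatmaps analogous in spirit to those described above, though with a different rater model than the meta-d′ framework.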
The information-theoretic measures have many valuable characteristics. These are discussed in detail in the papers by Dayan [3] and Fitousi [4]. Here, I give a brief summary of the most important aspects: (a) relative measures, (b) bias-free, (c) do not require any parametric assumptions, (d) scale with the number of confidence rating points, (e) do not require the assumption of ratio or order of confidence ratings, (f) easy to compute, (g) offer definite lower and upper bounds on performance, and (h) can be related to other entropy measures at the neuronal level.
One important question that remains to be addressed is whether these measures of metacognitive efficiency converge on the same psychological construct. Previous efforts by Dayan [3] and Fitousi [4] have used various simulations of theoretical models to show that the measures capture the same psychological construct. However, there is still a need to assess the construct validity of these measures with real datasets. If these different indexes capture the same psychological construct, they should correlate in real empirical data. This hypothesis is subjected to scrutiny in the next section.

3. Empirical Validation

3.1. The Face-Matching Task

Here, I provide an empirical validation of the information-theoretic measures of metacognitive efficiency. To do so, I harness the face-matching task [56], which has become an important tool in assessing sundry aspects of face recognition [57,58,59,60]. In this task, participants are presented with two face images taken under various lighting conditions, with different cameras, and at different times. Participants are asked to decide whether a given pair of face images belongs to the same identity, or alternatively to two different identities. Figure 4 presents examples of pairs of face images from the Glasgow Face Matching Task (GFMT, [56]) and the Kent Face Matching Task (KFMT, [61]), two of the most popular standard tests, which were also deployed in the present investigation. It turns out that with unfamiliar facial identities, the face-matching task poses a considerable challenge for both humans [23,56,57,58,59,60] and machines [62]. This is mainly because under variable ambient conditions, images of the same personal identity can look completely different, whereas images of different identities may appear similar. Despite extensive research on the face-matching task [63,64], the underlying mechanisms are still not well understood. For example, many researchers believe that match and mismatch face pairs are governed by two separate mechanisms [58,59]. However, a recent paper by Fitousi [23] has challenged this position by introducing an unequal-variance signal detection (UVSD) model of the face-matching task. This model postulates a single unified mechanism that can account for performance with both match and mismatch stimuli. The model can also explain many related aspects of face matching, including Type 1 confidence judgments, similarity ratings, and the relations between Type 1 accuracy and confidence.

3.2. Metacognition in the Face-Matching Task

Notwithstanding the wealth of research on the face-matching task [57,65,66,67], little is known about the underlying metacognitive capabilities with which participants perform this task [68,69,70,71,72]. The majority of studies that investigated metacognition in face-recognition tasks [69,70,71,72] found moderate levels of metacognitive abilities. These studies deployed face recognition tasks and questionnaire-based self-estimates. Other studies [73,74] showed that participants can provide confidence judgments that differentiate between correct and incorrect Type 1 performance. However, the measures deployed by those studies are suboptimal, mainly because they are subject to the confounds mentioned earlier, namely bias and the inability to correct for Type 1 performance. The application of the M-ratio index, along with the novel information-theoretic measures meta-U, meta-KL, and meta-J, to the face-matching task can ameliorate most of these shortcomings and shed new light on the underlying mechanisms that govern face recognition. Most importantly, the joint application of these measures is needed to garner strong evidence for their construct validity and convergence.
The face-matching task is ideal for our present purposes. It is challenging enough to produce below-ceiling accuracy for all participants [23], enabling us to compute SDT and associated measures. Furthermore, it does not require an additional staircase procedure, which is known to inflate both model-based and model-free measures of metacognitive sensitivity [75]. Moreover, the task is implemented in various well-established face-matching tests [56,61], for which the normative type 1 performance characteristics are well known. A signal-detection model of the task has been recently advanced [23]. However, little is known about the metacognitive efficiency of observers in this task. Taken together, these factors make the face-matching task a compelling testing platform.
Two experiments were conducted. In Experiment 1, participants engaged in the long version of the GFMT [56]. In Experiment 2, a new sample of participants were tested with the short version of the KFMT [61].

4. Method

4.1. Participants

A total of 140 participants took part in Experiment 1 and Experiment 2. In total, 70 participants (14 men, 56 women, mean age = 23.03, sd = 2.14) participated in Experiment 1; and a second sample of 70 participants (16 men, 54 women, mean age = 22.7, sd = 3.1) participated in Experiment 2. The participants were recruited from the pool of participants of Ariel University. The study was approved by the Ariel University Ethical Committee (AU-SOC-DF-20210411). All participants gave their informed consent.

4.2. Stimuli and Apparatus

The set of stimuli in Experiment 1 consisted of the long version of the GFMT [56]. This includes 168 pairs of Caucasian, unfamiliar faces. The faces were high-quality grayscale photos taken in full view with different cameras, conveying a neutral expression and pose. Each image was approximately 12 × 7 cm. Half of the face pairs (84) were “identity-match”, and the other half were “identity-mismatch”. The set of stimuli in Experiment 2 consisted of the short version of the KFMT [61]. There were 40 pairs of unfamiliar faces, with each pair composed of two color photos: a frontal image on the right and a student ID image on the left. The frontal image was a high-quality photo of approximately 283 × 332 pixels, with a neutral pose and expression. The student ID image was a color image of approximately 142 × 192 pixels. The ID images varied in pose, expression, and lighting. Half of the face pairs (20) were “identity-match”, and the other half were “identity-mismatch”.

4.3. Procedure and Design

The general method in both experiments was identical. Participants sat in front of a computer screen. In each trial, they were presented with a randomly selected face pair and were asked to decide, as accurately and as speedily as they could, whether the pair constituted a “match” or a “mismatch” by pressing one of two computer keys. Immediately after responding, the images were removed from the screen, and a four-point confidence scale appeared, with 1 = “not confident” and 4 = “highly confident”. In Experiment 1, participants completed two blocks of trials, amounting to a total of 336 trials. In Experiment 2, participants completed four blocks, resulting in a total of 160 trials. The experiments were run with the Macromedia Authorware 7 software.

5. Results

5.1. Face Matching

All data, code, and supporting materials can be downloaded from https://osf.io/hsvfm/. The measures were computed for each participant in each experiment. Of the available parametric measures, I focus on the M-ratio, because it is considered to be the most reliable [76,77]. To assess the M-ratio, I fit the meta-d′ and d′ parameters to the data of each participant in each experiment using the code published by Maniscalco and Lau (downloaded 10 August 2018 from: http://www.columbia.edu/~bsm2105/type2sdt/archive/index.html). For five participants in each experiment, the fitting procedure yielded negative meta-d′ values, so they were removed from the remaining analyses. The information-theoretic measures do not require any fitting; they are computed directly from the empirical data using dedicated code that I developed, which can be reproduced from https://osf.io/hsvfm/.
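To make the contrast with the fitting-based approach concrete, the following minimal sketch (in Python; not the author's published code, which lives at https://osf.io/hsvfm/) shows how an information-theoretic quantity can be computed directly from raw trial data, with no parametric fitting. The mutual information between response correctness and the confidence rating is used here purely as an illustrative stand-in for measures of this family:

```python
# Illustrative sketch, not the published analysis code: a plug-in
# estimate of I(A;C), the mutual information (in bits) between response
# correctness A and confidence rating C, computed straight from trials.
import numpy as np

def mutual_information(x, y):
    """I(X;Y) in bits from two aligned 1-D arrays of discrete labels."""
    x = np.asarray(x); y = np.asarray(y)
    xs, ys = np.unique(x), np.unique(y)
    joint = np.zeros((len(xs), len(ys)))
    for i, xv in enumerate(xs):
        for j, yv in enumerate(ys):
            joint[i, j] = np.mean((x == xv) & (y == yv))
    px = joint.sum(axis=1, keepdims=True)   # marginal of X
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y
    nz = joint > 0                          # skip empty cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# toy data: 1 = correct, 0 = error; confidence on a four-point scale
accuracy   = [1, 1, 0, 1, 0, 1, 1, 0]
confidence = [4, 3, 1, 4, 2, 3, 4, 1]
print(round(mutual_information(accuracy, confidence), 3))
```

Because the estimate is a plug-in functional of the empirical joint distribution, nothing beyond the trial-by-trial accuracy and confidence vectors is needed.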
First, it is valuable to look at the underlying individual differences in the Type 1 and Type 2 measures. Figure 5 and Figure 6 present the empirical frequency distributions of those variables in Experiment 1 and Experiment 2, respectively. These distributions provide insights into the nature of individual differences in both Type 1 and Type 2 performance across the two experiments, and also inform us about the sampling characteristics of the parametric and non-parametric measures. One key observation is that the distributions are not necessarily symmetric. Those associated with the information-theoretic measures are mostly skewed and asymmetric, with the majority of observations located at the lower end of the data range. The M-ratio shows this trend only in Experiment 2, but not in Experiment 1.
The GFMT administered in Experiment 1 was relatively easier (d′ = 2.51, sd = 0.65) than the KFMT in Experiment 2 (d′ = 1.01, sd = 0.41), an expected outcome [23]. However, the results from both experiments pointed to the conclusion that, on average, participants have metaknowledge about their face-matching performance. The mean M-ratio amounted to 1.44 (sd = 0.66) in Experiment 1 and 2.83 (sd = 3.25) in Experiment 2. The M-ratio in Experiment 2 seems larger than often found in other domains [78]. This might be due to the relatively small number of trials in the short version of the KFMT. Recall that an M-ratio greater than 1 entails hypersensitivity. One-sided Student’s t-tests supported these conclusions, with all t-statistics significant at the 0.05 level. The M-ratio results strongly suggest that participants can monitor the level of their performance in the face-matching task, irrespective of their Type 1 level of performance.
Next, I turn to the novel information-theoretic measures: meta-U, meta-KL, and meta-J. To test whether participants acquired metaknowledge regarding their face-matching performance, I compared those values to 0 bits of information. The null hypothesis is that participants have no metaknowledge whatsoever, such that the information-based measures equal 0 bits. The average meta-U was greater than 0 bits in both experiments. In Experiment 1, it amounted to 0.22 bits (sd = 0.17) [t(65) = 12.45, p < 0.001]. In Experiment 2, it was 0.37 bits (sd = 0.21) [t(65) = 14.25, p < 0.001]. Recall that in the two-choice task meta-U is quantitatively comparable to Dayan’s [3] meta-I. The average meta-KL value in Experiment 1 amounted to 1.54 bits (sd = 0.95) [t(65) = 13.39, p < 0.001]. In Experiment 2, it amounted to 1.11 bits (sd = 0.74) [t(65) = 12.08, p < 0.001]. The average meta-J in Experiment 1 amounted to 4.15 bits (sd = 2.37) [t(65) = 14.43, p < 0.001], and 2.54 bits (sd = 1.74) [t(65) = 11.84, p < 0.001] in Experiment 2. The information-theoretic measures are consistent with the M-ratio results, supporting the conclusion that participants in Experiment 1 and Experiment 2 acquired genuine metaknowledge about the correctness of their own decisions.
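As an illustration of how divergence-based quantities of this kind can be obtained from the data, the sketch below assumes, as a simplification and not necessarily the exact published definitions in [4], that a KL-type measure compares the confidence distributions conditional on correct versus incorrect responses, and that a J-type measure is its symmetrized (Jeffreys) counterpart:

```python
# Hedged sketch: divergences between the conditional confidence
# distributions given correct vs. incorrect responses. The mapping to
# meta-KL and meta-J is an assumption for illustration; see [4] and
# https://osf.io/hsvfm/ for the exact formulas and code.
import numpy as np

def conf_dist(confidence, mask, n_levels=4, eps=1e-9):
    """Empirical distribution of confidence ratings over the masked trials."""
    counts = np.bincount(np.asarray(confidence)[mask] - 1, minlength=n_levels)
    p = counts + eps                       # small eps avoids log(0)
    return p / p.sum()

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return float(np.sum(p * np.log2(p / q)))

accuracy   = np.array([1, 1, 0, 1, 0, 1, 1, 0])   # toy data
confidence = np.array([4, 3, 1, 4, 2, 3, 4, 1])

p_correct = conf_dist(confidence, accuracy == 1)
p_error   = conf_dist(confidence, accuracy == 0)

meta_kl_like = kl(p_correct, p_error)                       # one-directional
meta_j_like  = kl(p_correct, p_error) + kl(p_error, p_correct)  # Jeffreys
print(meta_kl_like, meta_j_like)
```

Note that the Jeffreys quantity is necessarily at least as large as the one-directional KL, since it adds a second non-negative term.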

5.2. Testing for Construct Validity

A viable approach for testing construct validity [24] is to examine whether different measures of the same putative construct—metacognitive efficiency—correlate with one another. I therefore performed a series of linear regressions on all possible pairs of measures (M-ratio, meta-U, meta-KL, and meta-J) in order to quantify the corresponding coefficient of determination R² and Pearson’s correlation coefficient r. Figure 7 and Figure 8 present the results of these analyses for Experiment 1 and Experiment 2, respectively.
As can be seen in Figure 7 and Figure 8, significant, strong, positive relations ensued between each pair of measures in both experiments. Overall, higher R² values were obtained for the KFMT (Experiment 2) than for the GFMT (Experiment 1). In the former, all coefficients of determination were larger than 0.47, or a Pearson correlation of r = 0.68. In the latter, all those values exceeded 0.29, or a Pearson correlation of r = 0.53, all significant at the p < 0.05 level. The strongest relation was observed between meta-KL and meta-J (R² = 0.94), which is expected given their mathematical affinity. Taken together, these results support the convergence of the information-theoretic measures on the common psychological construct of metacognitive efficiency. The convergence holds both among the information measures themselves and between the SDT- and information-based measures, strengthening our belief in their construct validity.
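The pairwise analysis itself is straightforward; the following sketch reproduces its logic on synthetic per-participant scores (the variable names and the toy data generator are placeholders, not the empirical values reported above):

```python
# Toy reproduction of the construct-validity logic: four noisy readouts
# of one latent "efficiency" factor should correlate pairwise. All data
# here are synthetic placeholders, not the reported empirical results.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
latent = rng.normal(size=60)                 # shared latent factor (toy)
measures = {
    "M-ratio": latent + 0.5 * rng.normal(size=60),
    "meta-U":  latent + 0.5 * rng.normal(size=60),
    "meta-KL": latent + 0.5 * rng.normal(size=60),
    "meta-J":  latent + 0.5 * rng.normal(size=60),
}

for (a, pa), (b, pb) in combinations(measures.items(), 2):
    r = np.corrcoef(pa, pb)[0, 1]            # Pearson's r
    print(f"{a} vs {b}: r = {r:.2f}, R^2 = {r**2:.2f}")
```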

6. General Discussion

I have introduced several information-theoretic measures of metacognitive efficiency recently developed by Dayan [3] and Fitousi [4]. These indexes consist of meta-I [3] and meta-U, meta-KL, and meta-J [4]. The measures were originally developed within a common theoretical framework that assumes a communication system with stimuli (experimenter), an actor, and a rater serving as encoders/decoders. I provided a succinct account of each measure, along with its possible interpretations. I then applied these information indexes, along with the M-ratio, to data from two face-matching experiments [56]. The results provided considerable support for the convergence of these measures on a single psychological construct—metacognitive efficiency. The measures exhibited strong correlations both among themselves and with the parametric M-ratio. Another important result, particularly relevant to research on face recognition, is that participants in the face-matching task exhibit significant metaknowledge about their performance in the task. This finding was supported by both the parametric M-ratio and the information-based measures.
One source of motivation for developing information-theoretic markers of metacognitive efficiency has to do with the limitations of existing model-based measures, such as meta-d′ and the M-ratio. These indexes are built on the dubious assumption that the type 2 (confidence) process should fit exactly a type 1 decision process [3]. They are subject to decisional bias [1,2]. They cannot accommodate changes in the granularity of type 2 confidence ratings [3], nor do they scale with the number of type 1 stimuli or responses [4]. Moreover, they cannot be applied in designs with more than two stimuli and two responses, namely in multi-attribute classification tasks [79]. Information-theoretic measures address most of these limitations. First, because these measures do not require parametric assumptions, they can be computed relatively easily. Second, because they do not assume a particular number of stimuli or responses, they can be applied to tasks which incorporate multi-alternative choices. Third, they can adapt to changes in the granularity of stimuli, response, and confidence alternatives. It is important to note that the deployment of information-based measures does not exclude the testing of parametric, hierarchical models [31,76]. As a matter of fact, these measures can serve as an important yardstick in the development of such models, mainly because they are expressed in terms of the most general quantity—entropy—and as such can transcend different assumptions concerning distributions, parameters, and psychological mechanisms. This fact is reflected in the present findings, which show that these measures are compatible with the M-ratio. Researchers can also take advantage of information-based measures to relate confidence at the behavioral and neuronal levels, since uncertainty at both levels can be quantified in bits.
Another reason for advancing the information-theoretic approach to metacognition is the suitability of such a theory for characterizing capacity limitations [45], rate-distortion trade-offs [80,81], information bottlenecks [82,83], and other quantities that are not readily available in current approaches (e.g., SDT). Future work on confidence and metacognition should seek to expand our theoretical understanding by building on these powerful notions.

6.1. Dependency on Type 1 Performance

Applied metacognition researchers would ideally like a measure that captures the underlying construct of metacognitive efficiency in each individual without contamination by type 1 performance [76]. In practice, this ideal is rarely met. Using synthetic data from simulations with state-of-the-art parametric models, both Guggenmos [76] and Rausch and colleagues [77] have shown that the M-ratio is not completely independent of type 1 performance. Notably, though, Guggenmos [76] found that, when tested with empirical data, the M-ratio exhibited only negligible dependency on type 1 performance, making it an almost ideal measure. Thus, a question that remains to be answered concerns the extent to which the information-theoretic measures depend on type 1 performance. This is an empirical question that can be tested. In any event, one may rightly argue that in some sense a parametric approach (e.g., meta-d′) is superior to the information-theoretic approach advanced here, primarily because the former allows us to know the underlying generating process, whereas the latter is agnostic to this process. Indeed, several researchers [76,77,84,85] have argued that measures of metacognition should ideally be derived by capturing the hierarchical model of type 1 and type 2 performance and making inferences on the critical parameters. If so, why should one advance information-theoretic measures in the first place? The answer to this question is threefold.
First, processing models of metacognition are difficult to develop for many cases. For example, at present, there are only a few process models for multi-alternative tasks [79]. The information-theoretic measures presented here can be readily applied to a wide scope of applications, including the multi-alternative case. Second, the deployment of information-theoretic measures does not exclude the development and testing of hierarchical parametric models [86]. On the contrary, because information-based measures are largely invariant to the underlying parametric distributions, they can be computed and compared across a range of parametric models [86]. In this way, information-based measures can serve as a yardstick for model-building and testing. Third, entropy is ubiquitous across all of a system’s levels [42], including the behavioral (reaction times, eye movements), neuronal (single-cell spikes, cell-ensemble signals), and chemical (ion concentration) levels. This domain-generality makes information-based measures highly instrumental in bridging remote levels of processing. For example, one can gain new insights by relating quantities of behavioral metacognitive efficiency to signatures of entropy generated by neuronal and chemical processing in the brain [41,86].

6.2. Practical Advice to Practitioners of Information-Based Measures

The convergence of the information-theoretic measures suggests that they can be used interchangeably. However, in practical applications, meta-J, for example, is preferable to meta-KL for two reasons. First, it is a metric, as it complies with the triangle inequality [53]; second, it takes into consideration the reduction in entropy from both the correct and incorrect accuracy–confidence distributions. The meta-KL measure, in contrast, is not a metric [54]; it also does not consider the reduction in entropy in the incorrect accuracy–confidence distribution. This matters if one is interested in dissociating two alternative mechanisms by which raters weight their evidence. This difference can account for the positive evidence bias [33,87,88,89,90], whereby raters overweight evidence in favor of the actor’s decision and underweight evidence against it. Moreover, meta-U and meta-I are of broader applicability than meta-KL and meta-J, mainly because they are not contingent on the accuracy of the actor’s response. As such, they can be applied to tasks in which the correctness of the actor’s response is not defined [91]. Another prominent advantage of all the information-theoretic measures is that they can be computed for tasks that require categorization of multi-attribute stimuli [49], which is currently impossible with the meta-d′ approach. A related advantage of information-based measures is that type 1 performance can be measured by I(S;R)—the mutual information between stimuli and responses [4]. In two-choice SDT designs, I(S;R) is monotonically related to d′ [38]. This allows researchers to quantify both type 1 and type 2 performance in terms of information-theoretic quantities.
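A minimal sketch of the I(S;R) computation, assuming a standard 2 × 2 stimulus-by-response count table from a match/mismatch design (the toy tables below are illustrative, not empirical data):

```python
# Sketch: Type 1 performance as I(S;R), the mutual information between
# stimulus and response, from a 2x2 confusion table. In the two-choice
# SDT design this quantity rises monotonically with d' [38].
import numpy as np

def i_sr(confusion):
    """Mutual information (bits) from a 2x2 count table [stimulus x response]."""
    joint = confusion / confusion.sum()
    ps = joint.sum(axis=1, keepdims=True)   # stimulus marginal
    pr = joint.sum(axis=0, keepdims=True)   # response marginal
    nz = joint > 0                          # skip empty cells
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (ps @ pr)[nz])))

# rows: actual match / mismatch; columns: "match" / "mismatch" responses
weak   = np.array([[60, 40], [40, 60]])     # low accuracy -> low d'
strong = np.array([[90, 10], [10, 90]])     # high accuracy -> high d'
print(i_sr(weak), i_sr(strong))             # the second exceeds the first
```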
Another relevant issue that emerges from the application of information theory to metacognition concerns the granularity of the confidence judgment scale. If raters use every point of their scale equally often, then a rater operating with an eight-point scale can be more sensitive than a rater operating with a four-point scale, because the former can transmit more information than the latter [92]. Both Dayan [3] and Fitousi [4] have shown via simulations that the information-theoretic measures accommodate the granularity of the confidence scale; that is, metacognitive efficiency can potentially improve as the number of rating alternatives increases. SDT-related measures do not exhibit this important characteristic. Moreover, the information-theoretic measures also accommodate the granularity of the stimulus and response alternatives. Finally, I recommend that researchers deploy all three major information-theoretic measures together, if possible, to enhance the validity of their conclusions.
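The granularity point can be illustrated with a toy upper-bound argument: a k-point scale used with roughly equal frequencies carries at most log2(k) bits, so an eight-point scale caps at 3 bits where a four-point scale caps at 2 (the uniform-evidence rater below is an assumption made purely for illustration):

```python
# Toy demonstration of the scale-granularity bound: the entropy of the
# rating distribution upper-bounds the information the ratings can carry
# about anything, and that bound grows with the number of scale points.
import numpy as np

def scale_entropy(ratings, k):
    """Entropy (bits) of the empirical distribution over a k-point scale."""
    p = np.bincount(ratings - 1, minlength=k) / len(ratings)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

rng = np.random.default_rng(1)
evidence = rng.uniform(size=10_000)                             # internal evidence
four  = np.digitize(evidence, np.linspace(0, 1, 5)[1:-1]) + 1   # 4-point rater
eight = np.digitize(evidence, np.linspace(0, 1, 9)[1:-1]) + 1   # 8-point rater
print(scale_entropy(four, 4), scale_entropy(eight, 8))          # ~2 vs ~3 bits
```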

6.3. Implications for Face Recognition

The present effort bears important implications for research on face recognition and individual differences in metacognitive abilities. Do people have insight into their face-recognition capabilities? The majority of studies that tested this question [69,70,71,72,73,74] found medium or moderate levels of metacognitive ability. These studies employed self-reports, questionnaires, and confidence judgments. However, these tools are susceptible to the vagaries and biases mentioned at the outset, such as response bias or differences in Type 1 performance. These problems are particularly acute in the face-recognition domain, where the presence of large-scale individual differences in both Type 1 and Type 2 performance is well known [93]. For example, at one end of the spectrum one can find participants with prosopagnosia [94], who exhibit very poor face-recognition performance; at the other end are super-recognizers [69]—participants with extraordinary face-recognition abilities. There are good reasons to believe that these two populations also differ in their metaknowledge concerning their type 1 performance. The upshot is that in order to study metacognitive abilities in the domain of face recognition, one must employ measures that correct for type 1 performance and for the possibility of response biases. The present study provides what I believe to be the first genuine evidence for high metacognitive sensitivity of observers in their face-recognition abilities, evidence that is not confounded by type 1 performance or bias. A related approach that can be tremendously instrumental in characterizing individual differences in both type 1 and type 2 performance is the development of hierarchical Bayesian models [31] of face-recognition tasks. Such models can assist in specifying the parameters of individual participants as resulting from higher-level hyper-parameters of the population.
Although the information-theoretic measures are not parametric in nature, they can still be computed in such hierarchical models.

Funding

This research was funded by the ISRAEL SCIENCE FOUNDATION (grant No. 1498/21).

Institutional Review Board Statement

The study was approved by Ariel University Ethical Committee (AU-SOC-DF-20210411). Approval date 7 April 2024.

Data Availability Statement

Data can be found at https://osf.io/hsvfm/.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Maniscalco, B.; Lau, H. A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Conscious. Cogn. 2012, 21, 422–430. [Google Scholar] [CrossRef] [PubMed]
  2. Maniscalco, B.; Lau, H. Signal detection theory analysis of type 1 and type 2 data: Meta-d, response-specific meta-d, and the unequal variance SDT model. In The Cognitive Neuroscience of Metacognition; Springer: Berlin/Heidelberg, Germany, 2014; pp. 25–66. [Google Scholar]
  3. Dayan, P. Metacognitive information theory. Open Mind 2023, 7, 392–411. [Google Scholar] [PubMed]
  4. Fitousi, D. Bits of Confidence: Metacognition as Uncertainty Reduction. 2025, Submitted. Available online: https://osf.io/z5swc/.
  5. Schulz, L.; Rollwage, M.; Dolan, R.J.; Fleming, S.M. Dogmatism manifests in lowered information search under uncertainty. Proc. Natl. Acad. Sci. USA 2020, 117, 31527–31534. [Google Scholar] [CrossRef]
  6. Schulz, L.; Fleming, S.M.; Dayan, P. Metacognitive computations for information search: Confidence in control. Psychol. Rev. 2023, 130, 604–639. [Google Scholar]
  7. Peirce, C.S.; Jastrow, J. On small differences in sensation. Mem. Natl. Acad. Sci. 1884, 3–11, 73–83. [Google Scholar]
  8. Henmon, V.A.C. The relation of the time of a judgment to its accuracy. Psychol. Rev. 1911, 18, 186–201. [Google Scholar] [CrossRef]
  9. Lau, H. In Consciousness We Trust: The Cognitive Neuroscience of Subjective Experience; Oxford University Press: Oxford, UK, 2022. [Google Scholar]
  10. Metcalfe, J.; Shimamura, A.P. Metacognition: Knowing About Knowing; MIT Press: Cambridge, MA, USA, 1994. [Google Scholar]
  11. Fleming, S.M.; Dolan, R.J. The neural basis of metacognitive ability. Philos. Trans. R. Soc. B Biol. Sci. 2012, 367, 1338–1349. [Google Scholar] [CrossRef]
  12. Fleming, S.M.; Weil, R.S.; Nagy, Z.; Dolan, R.J.; Rees, G. Relating introspective accuracy to individual differences in brain structure. Science 2010, 329, 1541–1543. [Google Scholar] [CrossRef] [PubMed]
  13. Fleming, S.M. Metacognition and confidence: A review and synthesis. Annu. Rev. Psychol. 2024, 75, 241–268. [Google Scholar] [CrossRef]
  14. Meyniel, F.; Schlunegger, D.; Dehaene, S. The sense of confidence during probabilistic learning: A normative account. PLoS Comput. Biol. 2015, 11, e1004305. [Google Scholar] [CrossRef]
  15. Koriat, A.; Goldsmith, M. Monitoring and control processes in the strategic regulation of memory accuracy. Psychol. Rev. 1996, 103, 490–517. [Google Scholar] [PubMed]
  16. Koriat, A. Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. J. Exp. Psychol. Gen. 1997, 126, 349–370. [Google Scholar] [CrossRef]
  17. Koriat, A. The feeling of knowing: Some metatheoretical implications for consciousness and control. Conscious. Cogn. 2000, 9, 149–171. [Google Scholar]
  18. Persaud, N.; McLeod, P.; Cowey, A. Post-decision wagering objectively measures awareness. Nat. Neurosci. 2007, 10, 257–261. [Google Scholar]
  19. Van den Berg, R.; Zylberberg, A.; Kiani, R.; Shadlen, M.N.; Wolpert, D.M. Confidence is the bridge between multi-stage decisions. Curr. Biol. 2016, 26, 3157–3168. [Google Scholar] [PubMed]
  20. Hoven, M.; Lebreton, M.; Engelmann, J.B.; Denys, D.; Luigjes, J.; van Holst, R.J. Abnormalities of confidence in psychiatry: An overview and future perspectives. Transl. Psychiatry 2019, 9, 268. [Google Scholar] [CrossRef]
  21. Bahrami, B.; Olsen, K.; Latham, P.E.; Roepstorff, A.; Rees, G.; Frith, C.D. Optimally interacting minds. Science 2010, 329, 1081–1085. [Google Scholar] [PubMed]
  22. Vaccaro, A.G.; Fleming, S.M. Thinking about thinking: A coordinate-based meta-analysis of neuroimaging studies of metacognitive judgements. Brain Neurosci. Adv. 2018, 2, 2398212818810591. [Google Scholar]
  23. Fitousi, D. A Signal-detection based confidence-similarity model of face-matching. Psychol. Rev. 2024, 131, 625–663. [Google Scholar]
  24. Garner, W.R.; Hake, H.W.; Eriksen, C.W. Operationism and the concept of perception. Psychol. Rev. 1956, 63, 149–159. [Google Scholar]
  25. Clarke, F.R.; Birdsall, T.G.; Tanner, W.P., Jr. Two types of ROC curves and definitions of parameters. J. Acoust. Soc. Am. 1959, 31, 629–630. [Google Scholar] [CrossRef]
  26. Galvin, S.J.; Podd, J.V.; Drga, V.; Whitmore, J. Type 2 tasks in the theory of signal detectability: Discrimination between correct and incorrect decisions. Psychon. Bull. Rev. 2003, 10, 843–876. [Google Scholar] [CrossRef]
  27. Fleming, S.M.; Daw, N.D. Self-evaluation of decision-making: A general Bayesian framework for metacognitive computation. Psychol. Rev. 2017, 124, 91–114. [Google Scholar] [CrossRef]
  28. Green, D.M.; Swets, J.A. Signal Detection Theory and Psychophysics; Wiley: New York, NY, USA, 1966; Volume 1. [Google Scholar]
  29. Macmillan, N.A.; Creelman, C.D. Detection Theory: A User’s Guide; Psychology Press: East Sussex, UK, 2004. [Google Scholar]
  30. Nelson, T.O. A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychol. Bull. 1984, 95, 109–133. [Google Scholar] [CrossRef] [PubMed]
  31. Fleming, S.M. HMeta-d: Hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neurosci. Conscious. 2017, 2017, nix007. [Google Scholar] [CrossRef]
  32. Kunimoto, C.; Miller, J.; Pashler, H. Confidence and accuracy of near-threshold discrimination responses. Conscious. Cogn. 2001, 10, 294–340. [Google Scholar] [CrossRef] [PubMed]
  33. Maniscalco, B.; Peters, M.A.; Lau, H. Heuristic use of perceptual evidence leads to dissociation between performance and metacognitive sensitivity. Atten. Percept. Psychophys. 2016, 78, 923–937. [Google Scholar] [CrossRef] [PubMed]
  34. Shekhar, M.; Rahnev, D. Sources of metacognitive inefficiency. Trends Cogn. Sci. 2021, 25, 12–23. [Google Scholar] [CrossRef]
  35. Luce, R.D. Whatever happened to information theory in psychology? Rev. Gen. Psychol. 2003, 7, 183–188. [Google Scholar]
  36. Knill, D.C.; Pouget, A. The Bayesian brain: The role of uncertainty in neural coding and computation. Trends Neurosci. 2004, 27, 712–719. [Google Scholar] [CrossRef]
  37. Feldman, J. Information-theoretic signal detection theory. Psychol. Rev. 2021, 128, 976–987. [Google Scholar] [PubMed]
  38. Killeen, P.; Taylor, T. Bits of the ROC: Signal detection as information transmission. 2001; unpublished work. [Google Scholar]
  39. Fitousi, D. Mutual information, perceptual independence, and holistic face perception. Atten. Percept. Psychophys. 2013, 75, 983–1000. [Google Scholar]
  40. Fitousi, D.; Algom, D. The quest for psychological symmetry through figural goodness, randomness, and complexity: A selective review. i-Perception 2024, 15, 1–24. [Google Scholar]
  41. Fitousi, D. Quantifying Entropy in Response Times (RT) Distributions Using the Cumulative Residual Entropy (CRE) Function. Entropy 2023, 25, 1239. [Google Scholar] [CrossRef]
  42. Friston, K. The free-energy principle: A unified brain theory? Nat. Rev. Neurosci. 2010, 11, 127–138. [Google Scholar]
  43. Norwich, K.H. Information, Sensation, and Perception; Academic Press: San Diego, CA, USA, 1993. [Google Scholar]
  44. Hirsh, J.B.; Mar, R.A.; Peterson, J.B. Psychological entropy: A framework for understanding uncertainty-related anxiety. Psychol. Rev. 2012, 119, 304–320. [Google Scholar] [PubMed]
  45. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar]
  46. Shannon, C.; Weaver, W. The Mathematical Theory of Communication; First Published in 1949; Urbana University of Illinois Press: Champaign, IL, USA, 1949. [Google Scholar]
  47. Friston, K. The free-energy principle: A rough guide to the brain? Trends Cogn. Sci. 2009, 13, 293–301. [Google Scholar]
  48. Pouget, A.; Drugowitsch, J.; Kepecs, A. Confidence and certainty: Distinct probabilistic quantities for different goals. Nat. Neurosci. 2016, 19, 366–374. [Google Scholar]
  49. Li, H.H.; Ma, W.J. Confidence reports in decision-making with multiple alternatives violate the Bayesian confidence hypothesis. Nat. Commun. 2020, 11, 2004. [Google Scholar]
  50. Koriat, A. Subjective confidence in one’s answers: The consensuality principle. J. Exp. Psychol. Learn. Mem. Cogn. 2008, 34, 945–959. [Google Scholar] [CrossRef] [PubMed]
  51. Koriat, A. The self-consistency model of subjective confidence. Psychol. Rev. 2012, 119, 80–113. [Google Scholar] [PubMed]
  52. Kullback, S. Information Theory and Statistics; Courier Corporation: North Chelmsford, MA, USA, 1959. [Google Scholar]
  53. Jeffrey, R.C. Probability and the Art of Judgment; Cambridge University Press: Cambridge, UK, 1992. [Google Scholar]
  54. Cover, T.; Thomas, J. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1991. [Google Scholar]
  55. Pearl, J. Jeffrey’s rule, passage of experience, and neo-Bayesianism. In Knowledge Representation and Defeasible Reasoning; Springer: Berlin/Heidelberg, Germany, 1990; pp. 245–265. [Google Scholar]
  56. Burton, A.M.; White, D.; McNeill, A. The Glasgow face matching test. Behav. Res. Methods 2010, 42, 286–291. [Google Scholar] [CrossRef] [PubMed]
  57. White, D.; Towler, A.; Kemp, R. Understanding professional expertise in unfamiliar face matching. In Forensic Face Matching: Research and Practice; Oxford University Press: Oxford, UK, 2021; pp. 62–88. [Google Scholar]
  58. Megreya, A.M.; Burton, A.M. Unfamiliar faces are not faces: Evidence from a matching task. Mem. Cogn. 2006, 34, 865–876. [Google Scholar]
  59. Megreya, A.M.; Burton, A.M. Hits and false positives in face matching: A familiarity-based dissociation. Percept. Psychophys. 2007, 69, 1175–1184. [Google Scholar] [CrossRef]
  60. Kemp, R.; Towell, N.; Pike, G. When seeing should not be believing: Photographs, credit cards and fraud. Appl. Cogn. Psychol. Off. J. Soc. Appl. Res. Mem. Cogn. 1997, 11, 211–222. [Google Scholar] [CrossRef]
  61. Fysh, M.C.; Bindemann, M. The Kent face matching test. Br. J. Psychol. 2018, 109, 219–231. [Google Scholar]
  62. O’Toole, A.J.; An, X.; Dunlop, J.; Natu, V.; Phillips, P.J. Comparing face recognition algorithms to humans on challenging tasks. ACM Trans. Appl. Percept. (TAP) 2012, 9, 1–13. [Google Scholar]
  63. Bate, S.; Mestry, N.; Portch, E. Individual differences between observers in face matching. In Forensic Face Matching: Research and Practice; Oxford University Press: Oxford, UK, 2021; pp. 117–145. [Google Scholar]
  64. Towler, A.; White, D.; Kemp, R.I. Evaluating training methods for facial image comparison: The face shape strategy does not work. Perception 2014, 43, 214–218. [Google Scholar] [CrossRef]
  65. Jenkins, R.; White, D.; Van Montfort, X.; Burton, A.M. Variability in photos of the same face. Cognition 2011, 121, 313–323. [Google Scholar] [CrossRef]
  66. Bindemann, M.; Attard, J.; Leach, A.; Johnston, R.A. The effect of image pixelation on unfamiliar-face matching. Appl. Cogn. Psychol. 2013, 27, 707–717. [Google Scholar] [CrossRef]
  67. Megreya, A.M.; Bindemann, M. Feature instructions improve face-matching accuracy. PLoS ONE 2018, 13, e0193455. [Google Scholar]
  68. Kramer, R.S. Face matching and metacognition: Investigating individual differences and a training intervention. PeerJ 2023, 11, e14821. [Google Scholar]
  69. Bobak, A.K.; Hancock, P.J.; Bate, S. Super-recognisers in action: Evidence from face-matching and face memory tasks. Appl. Cogn. Psychol. 2016, 30, 81–91. [Google Scholar] [CrossRef] [PubMed]
  70. Gray, K.L.; Bird, G.; Cook, R. Robust associations between the 20-item prosopagnosia index and the Cambridge Face Memory Test in the general population. R. Soc. Open Sci. 2017, 4, 160923. [Google Scholar] [CrossRef]
  71. Livingston, L.A.; Shah, P. People with and without prosopagnosia have insight into their face recognition ability. Q. J. Exp. Psychol. 2018, 71, 1260–1262.
  72. Ventura, P.; Livingston, L.A.; Shah, P. Adults have moderate-to-good insight into their face recognition ability: Further validation of the 20-item Prosopagnosia Index in a Portuguese sample. Q. J. Exp. Psychol. 2018, 71, 2677–2679.
  73. Bruce, V.; Henderson, Z.; Greenwood, K.; Hancock, P.J.; Burton, A.M.; Miller, P. Verification of face identities from images captured on video. J. Exp. Psychol. Appl. 1999, 5, 339–360.
  74. Hopkins, R.F.; Lyle, K.B. Image-size disparity reduces difference detection in face matching. Appl. Cogn. Psychol. 2020, 34, 39–49.
  75. Rahnev, D.; Fleming, S.M. How experimental procedures influence estimates of metacognitive ability. Neurosci. Conscious. 2019, 2019, niz009.
  76. Guggenmos, M. Measuring metacognitive performance: Type 1 performance dependence and test-retest reliability. Neurosci. Conscious. 2021, 2021, niab040.
  77. Rausch, M.; Hellmann, S.; Zehetleitner, M. Measures of metacognitive efficiency across cognitive models of decision confidence. Psychol. Methods 2023, 1–20.
  78. Guggenmos, M.; Wilbertz, G.; Hebart, M.N.; Sterzer, P. Mesolimbic confidence signals guide perceptual learning in the absence of external feedback. Elife 2016, 5, e13388.
  79. Shekhar, M.; Rahnev, D. The nature of metacognitive inefficiency in perceptual decision making. Psychol. Rev. 2021, 128, 45–70.
  80. Berger, T. Rate-Distortion Theory; Wiley Encyclopedia of Telecommunications: New York, NY, USA, 2003.
  81. Sims, C.R. Rate–distortion theory and human perception. Cognition 2016, 152, 181–198.
  82. Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. arXiv 2000, arXiv:physics/0004057.
  83. Fitousi, D. Complexity versus accuracy in metacognitive decision making: An information-bottleneck model. 2025; manuscript in preparation.
  84. Mamassian, P.; de Gardelle, V. Modeling perceptual confidence and the confidence forced-choice paradigm. Psychol. Rev. 2022, 129, 976.
  85. Rahnev, D. Visual metacognition: Measures, models, and neural correlates. Am. Psychol. 2021, 76, 1445–1453.
  86. Tishby, N.; Zaslavsky, N. Deep learning and the information bottleneck principle. In Proceedings of the 2015 IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5.
  87. Samaha, J.; Barrett, J.J.; Sheldon, A.D.; LaRocque, J.J.; Postle, B.R. Dissociating perceptual confidence from discrimination accuracy reveals no influence of metacognitive awareness on working memory. Front. Psychol. 2016, 7, 851.
  88. Peters, M.A.; Thesen, T.; Ko, Y.D.; Maniscalco, B.; Carlson, C.; Davidson, M.; Doyle, W.; Kuzniecky, R.; Devinsky, O.; Halgren, E.; et al. Perceptual confidence neglects decision-incongruent evidence in the brain. Nat. Hum. Behav. 2017, 1, 0139.
  89. Zylberberg, A.; Barttfeld, P.; Sigman, M. The construction of confidence in a perceptual decision. Front. Integr. Neurosci. 2012, 6, 79.
  90. Koizumi, A.; Maniscalco, B.; Lau, H. Does perceptual confidence facilitate cognitive control? Atten. Percept. Psychophys. 2015, 77, 1295–1306.
  91. Thurstone, L.L. Three psychophysical laws. Psychol. Rev. 1927, 34, 424–432.
  92. Garner, W.R. Rating scales, discriminability, and information transmission. Psychol. Rev. 1960, 67, 343–352.
  93. Kramer, R.S.; Jones, A.; Fitousi, D. Face familiarity and similarity: Within- and between-identity representations are altered by learning. J. Exp. Psychol. Hum. Percept. Perform. 2025; in press.
  94. Damasio, A.R.; Damasio, H.; Van Hoesen, G.W. Prosopagnosia: Anatomic basis and behavioral mechanisms. Neurology 1982, 32, 331–341.
Figure 1. The meta-d′ model [1]. (A) A signal detection theory (SDT) representation of the actor’s decisions in the type 1 task. (B) An SDT representation of the rater’s decisions in the type 2 task. (C) Empirical confidence-rating distributions conditional on type 1 correct and incorrect responses. A rater with high metacognitive sensitivity provides high confidence ratings when the actor is correct, and low confidence ratings when the actor is incorrect. A rater with low metacognitive sensitivity does not distinguish between the actor’s correct and incorrect responses. (D) A theoretical ROC curve of the type 2 task is generated by assuming a hypothetical equal-variance SDT model of confidence. The meta-d′ model [1,31,33] expresses the type 2 hit rate p(Hit) and false-alarm rate p(FA) in the divergence units of the type 1 d′, and then searches for the sensitivity parameter that best fits the empirical confidence data. The resulting parameter is called meta-d′.
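As a concrete illustration of the type 2 quantities in panel (D), the type 2 hit and false-alarm rates can be estimated from trial-level data by conditioning a high/low confidence split on type 1 accuracy. A minimal sketch in Python; the simulated arrays and the confidence criterion are hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trial-level data: type 1 correctness and a 4-level
# confidence rating (1 = guessing, 4 = certain).
correct = rng.integers(0, 2, size=1000).astype(bool)
confidence = rng.integers(1, 5, size=1000)

def type2_rates(correct, confidence, criterion=3):
    """Type 2 hit rate p(Hit) = P(high conf | correct) and
    type 2 false-alarm rate p(FA) = P(high conf | error),
    splitting confidence at `criterion`."""
    high = confidence >= criterion
    return high[correct].mean(), high[~correct].mean()

hit, fa = type2_rates(correct, confidence)
```

Sweeping `criterion` over the rating levels traces out the empirical type 2 ROC curve to which the meta-d′ parameter is fitted.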
Figure 2. A Venn diagram illustrating a communication system between stimuli, actor, and rater. Uncertainty in Stimuli is depicted by a pink circle, uncertainty in an Actor’s response is illustrated by a blue circle, and uncertainty in a Rater’s confidence ratings is represented by a green circle. In this example, there are two stimuli. The actor responds by emitting one of two possible responses, and the rater expresses their confidence in the actor’s response by sending one of four confidence-rating levels.
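The overlaps in the Venn diagram correspond to mutual-information terms between the three variables. A plug-in estimate of the information transmitted between two discrete sequences — for example, between the actor’s accuracy and the rater’s confidence — can be sketched as follows (the estimator and the example data are illustrative, not the paper’s exact definitions):

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from two discrete sequences."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xs.size, ys.size))
    np.add.at(joint, (xi, yi), 1.0)        # joint frequency table
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)      # marginal of X
    py = p.sum(axis=0, keepdims=True)      # marginal of Y
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px * py)[nz])).sum())

# Toy example: a rater whose 4-level confidence perfectly separates the
# actor's correct from incorrect trials transmits H(accuracy) = 1 bit.
accuracy   = [1, 1, 1, 0, 0, 0, 1, 0]
confidence = [4, 3, 4, 1, 2, 1, 3, 2]
mi = mutual_information(accuracy, confidence)
```

With more trials than shown here, the same function estimates each pairwise overlap in the diagram (stimuli–actor, stimuli–rater, actor–rater).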
Figure 3. Theoretical validation of the information measures of metacognitive efficiency using the meta-d′ model [1,33]. Data are generated by incrementally and orthogonally varying the parameters d′ and meta-d′, and then computing the corresponding values of meta-U, meta-KL, and meta-J. The resulting values are presented in a heatmap against the values of d′ and meta-d′. (A) meta-U. (B) meta-KL. (C) meta-J.
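The divergence-based measures in panels (B,C) build on the Kullback–Leibler divergence between the confidence distributions on correct versus error trials. A minimal sketch of that core computation — the histogram counts are made up, and the published meta-KL and meta-J definitions may involve normalization steps not shown here:

```python
import numpy as np

def kl_bits(p, q, eps=1e-12):
    """KL(p || q) in bits for two discrete distributions (smoothed)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float((p * np.log2(p / q)).sum())

# Hypothetical confidence histograms (levels 1..4) conditional on accuracy:
# a sensitive rater is more confident when correct than when wrong.
conf_correct = [5, 10, 30, 55]
conf_error   = [40, 30, 20, 10]

divergence   = kl_bits(conf_correct, conf_error)                 # directed KL
j_divergence = divergence + kl_bits(conf_error, conf_correct)    # symmetrized (Jeffreys) divergence
```

An insensitive rater produces nearly identical conditional histograms, driving both quantities toward zero; the symmetrized form is the classical Jeffreys J divergence underlying the meta-J construction.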
Figure 4. Examples of face pairs from the Glasgow Face Matching Task (GFMT, panels (A,B)) and from the Kent Face Matching Task (KFMT, panels (C,D)). Panels (A,C) depict the same identity (i.e., “match”), whereas panels (B,D) depict different identities (i.e., “mismatch”).
Figure 5. Experiment 1: Frequency distributions of the major quantities recorded in the GFMT (Glasgow Face Matching Task): (A) d′, (B) meta-d′, (C) M-ratio, (D) meta-U, (E) meta-KL, and (F) meta-J.
Figure 6. Experiment 2: Frequency distributions of the major quantities recorded in the KFMT (Kent Face Matching Task): (A) d′, (B) meta-d′, (C) M-ratio, (D) meta-U, (E) meta-KL, and (F) meta-J.
Figure 7. Experiment 1: Results from the GFMT (Glasgow Face Matching Task). (A–C) M-ratio as a function of meta-U, meta-KL, and meta-J. (D–F) Relations between meta-U, meta-KL, and meta-J. Blue circles are data points, the solid red line is a regression line, and red dashed lines mark the 95% confidence interval.
Figure 8. Experiment 2: Results from the KFMT (Kent Face Matching Task). (A–C) M-ratio as a function of meta-U, meta-KL, and meta-J. (D–F) Relations between meta-U, meta-KL, and meta-J. Blue circles are data points, the solid red line is a regression line, and red dashed lines mark the 95% confidence interval.