Should Medical Experts Giving Evidence in Criminal Trials Adhere to EFNSI Forensic Guidelines in Evaluative Reporting

Munro, Neil Allan Robertson

doi:10.3390/forensicsci5010013

by

Neil Allan Robertson Munro

Sleep Disorders Centre, Nuffield House, 3rd Floor, Guy’s Hospital, Great Maze Pond, London SE1 9RT, UK

Forensic Sci. 2025, 5(1), 13; https://doi.org/10.3390/forensicsci5010013

Submission received: 10 December 2024 / Revised: 10 January 2025 / Accepted: 6 March 2025 / Published: 17 March 2025

Abstract

Miscarriages of justice led to concerns that forensic science reports were prosecution-biassed and led to elementary errors of probability. The European Network of Forensic Science Institutes (ENFSI) and other institutes developed standards requiring reporting of the probability of the evidence under all hypotheses (usually prosecution and defence hypotheses), summarised by the likelihood ratio (LR):

LR = \frac{p(E \mid H_p)}{p(E \mid H_d)},

values > 1 being probative for a prosecution hypothesis. In elementary two-variable conditional probability theory (Bayes’ theorem), the LR is also an updating factor which multiplies the odds of guilt for each item of evidence considered. Although this is not true for multiple-variable probability theory, the value of the LR as a valid measure of evidential probity remains. Forensic scientists are experts in evidence and should not stray into the role of the Court to consider the probability of the hypotheses given the totality of the evidence:

p(H_p, H_d \mid E_1, E_2, \dots, E_n).

Medical experts may be required to assist the court with diagnoses (the hypothesis), but this privilege is balanced by vigilance that experts do not stray beyond their expertise. A narrow interpretation of expertise hinders the evaluation of the evidence under hypotheses adjacent to the area of expertise. This paradox may be overcome by experts declaring competence in areas adjacent to their main area of expertise. Regulatory bodies do not currently require medical experts to adhere to ENFSI guidelines in evaluative reporting. Legal opinion is divided on whether probability theory can be applied to cases requiring medical expertise. Medical experts should, in their reports, clearly separate evaluating the probability of the evidence (where evaluative reporting should apply) and evaluating the probability of hypotheses where methodology should be prioritised over opinion. The reckless misapplication of elementary probability theory, typically transposing conditional probabilities or neglecting prior odds, may lead to the jury being misled into believing posterior odds of guilt are many orders of magnitude greater than reality. Medical experts should declare training in elementary probability theory. Inaccurate probabilities are a joint enterprise between all who inform or advise the jury, so all must be trained in elementary probability theory.

Keywords:

evaluative reporting; Bayes; Bayesian nets; forensic science; likelihood ratio

1. Introduction

In the latter half of the last century, there were miscarriages of justice, including in the UK, Australia, and the United States [1,2,3,4]. Many of these cases were attributed to unreliable forensic science, and there was said to be a crisis in forensic science [2]. Medical expert evidence was also tainted; Sally Clark was jailed for the murder of two of her children and exonerated after two appeals [5,6]. In Australia, Kathleen Folbigg was jailed for the murder of four of her children because of the same flawed statistical evidence and was exonerated in 2023 after 21 years in jail [4]. Poor forensic standards and an asymmetric, prosecution-orientated presentation of forensic evidence in court, focused on opinion rather than evidence, were blamed [5,6]. Ian Evett, who had championed the use of Bayesian statistics in forensic science since the 1980s [7], led the development of the European Network of Forensic Science Institutes (ENFSI) standards, requiring forensic scientists to consider all possible hypotheses formally. Specifically, where there are two hypotheses and one variable, the ratio of the posterior odds to the prior odds of the hypotheses, the Bayes factor, is identical to the likelihood ratio, the ratio of the probabilities of finding the evidence under each hypothesis [7,8]. The requirement to quote these statistics fulfilled the requirement of balance and independence. Similar guidelines were developed by the Academy Standards Board (ASB) of the American Academy of Forensic Sciences.

In 2002, Daniel Kahneman was given the Nobel Prize in Economics for work conducted over 40 years with the late Amos Tversky “for having integrated insights from psychological research into economic science, especially concerning human judgement and decision-making under uncertainty” [9]. They conducted a series of experiments on the nature of heuristic thinking and cognitive biases [10,11,12]. They identified two modes of reasoning. System 1 thinking is intuitive, heuristic thinking, often subconscious, characterised by pattern recognition, and is performed rapidly by the brain. System 1 thinking is biassed towards minimising risk and has served us well in evolution, but it is flawed and prone to errors and biases. System 2 thinking is slower, analytic reasoning; in its fully developed form, the ability to reason logically and mathematically, it has only emerged in the last few thousand years. In System 1 thinking, we can readily recognise, for example, the association between intoxication and violent behaviour, but it requires System 2, analytical thinking, to distinguish between the likelihood of violent behaviour in an intoxicated person and the likelihood of intoxication in a violent person. More formally, this is the distinction between the probability of finding the evidence, given a hypothesis, and the probability of a hypothesis, given the evidence.

Evidence commonly found in one circumstance or under one hypothesis, but rarely found in an alternative circumstance or under an alternative hypothesis, can distinguish between the two hypotheses. So, the ratio of the conditional probabilities, the likelihood ratio, is given by the following:

LR = \frac{\text{probability}(E \text{ given } H_{\text{prosecution}})}{\text{probability}(E \text{ given } H_{\text{defence}})}

which is a measure of the usefulness or probity of the evidence [8]. It is a measure of how well the evidence can distinguish between the two hypotheses of guilt and innocence. The requirement by standard guidelines in forensic evaluation that the LR is quoted with each item of evidence meets the requirement that forensic evidence is not biased by a prosecution or defence perspective [13,14,15,16,17,18,19]. It also meets a second requirement by assisting the court in evaluating the evidential probity of individual items of evidence, without encroaching on the court’s role in evaluating the hypothesis of guilt or innocence in the light of the totality of the evidence. So, it is the forensic scientist’s role first to source and process items of evidence and then to evaluate and report the evidential probity of each item of evidence under all hypotheses. It is the court’s role (specifically, in a criminal case, the jury’s) to evaluate the hypotheses, given the totality of the evidence. So, the forensic scientist evaluates and reports the probability of the evidence, while the court evaluates the probability of the hypotheses given the totality of the evidence.

In summary, and somewhat formally, using “|” to denote “given”:

The forensic scientist evaluates p(Ei|Hp) and p(Ei|Hd) for each item of evidence Ei.

The court evaluates (heuristically) p(Hp|sum of Ei) and p(Hd|sum of Ei).

Miscarriages of justice have occurred following problematic medical evidence [6]. The medical expert is a diagnostician, and the court requires assistance in diagnosis, so the medical expert is required by the court to venture into the area of evaluating a hypothesis (or at least assisting the jury in evaluating a hypothesis) rather than merely the probity of the evidence [19,20]. Furthermore, doctors are permitted to rely on evidence given in clinical interviews and by relatives, evidence that might otherwise contravene rules of hearsay. No wonder Charleton, a judge of the Supreme Court of Ireland, states, “Judges should always be aware of how dangerous expert evidence is” [21]. “So, forensic clinicians must make it clear if or when a matter or issue falls outside their area or areas of expertise” [22,23]. This is the making of a potential paradox: the probability of the evidence under one hypothesis may be within the area of expertise, but the probability of the evidence under an alternative hypothesis may be outside the primary area of expertise. The paradox may be resolved by considering both depth and breadth of expertise and the reality that an expert in one area of medicine must be competent in related branches of medicine that may give alternative diagnoses. So, experts must explain not only their areas of expertise but also the areas of competence on which they rely.

Professor Sir Roy Meadow famously asserted in the case of Sally Clark that the probability of two child deaths occurring in a middle-class family from sudden infant death syndrome (SIDS) was 1 in 73,000,000 [6]. He assumed that SIDS did not run in families, and that assumption was challenged; correcting it would reduce the figure to around 1 in 7,000,000. If this was an error, it was a relatively minor error. It was this error that was the focus of interest in the first appeal, which did not succeed. The trial was conducted with little or no evidence to support a hypothesis of murder, so “two deaths” was the totality of the evidence. The court mistook this figure, the probability of the evidence given a hypothesis of innocence, for the probability of innocence given the evidence. Had Meadow, as current ENFSI guidelines require, also quoted the probability of the evidence given a hypothesis of murder, p(E|H_p) (the probability of double murder), estimated to be of order 1 in 5,000,000 as well, the likelihood ratio would have been of order 1. So, the mere fact of the case, “double infant death”, was not probative evidence. Meadow was criticised (and struck off the medical register before being reinstated on appeal) for going beyond his expertise in quoting statistics. Should he have been criticised for using bad statistics? Arguably, had he adhered to current recommendations and quoted the prevalence of double murder, p(E|H_p), he could have been criticised for venturing outside his expertise into the field of criminology. Is a clear understanding of statistics and probability part of the core competence of any doctor?
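The arithmetic behind "of order 1" can be sketched in a few lines. This is only an illustration using the order-of-magnitude figures quoted above (1 in 7,000,000 for two SIDS deaths once familial clustering is allowed for, and 1 in 5,000,000 for double murder); the function and variable names are assumptions made for this sketch, and neither figure is a precise estimate.

```python
from fractions import Fraction


def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = p(E|Hp) / p(E|Hd)."""
    return p_e_given_hp / p_e_given_hd


# Order-of-magnitude figures quoted in the text, not precise estimates.
p_two_deaths_given_murder = Fraction(1, 5_000_000)  # p(E|Hp)
p_two_deaths_given_sids = Fraction(1, 7_000_000)    # p(E|Hd), revised figure

lr = likelihood_ratio(p_two_deaths_given_murder, p_two_deaths_given_sids)
print(float(lr))  # 1.4
```

An LR this close to 1 means the bare fact of two deaths barely shifts the odds in either direction, which is the point made above.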

2. Two Cognitive Systems

Kahneman and Tversky’s work on the two cognitive systems, system 1 and system 2, was described in Kahneman’s Nobel Prize lecture, in his book, “Thinking, Fast and Slow”, and in numerous academic publications [9,10,11]. System 1 is a mode of thinking present in all mammals of high social interaction. It has evolved for survival in the face of threat. It works rapidly and is usually effective in identifying danger when the urgency for fast decisions overrides the need for precision.

An early paper in Kahneman and Tversky’s work, quoted in the Nobel Prize citation, was on baseline neglect in computing risk [11]. The example given was this: if we know that librarians are more introverted than farmers, can we say that if we meet an introvert, he is more likely to be a librarian than a farmer? We evaluate the likelihood of an event on the immediate evidence but ignore the background prevalence. In their paper, they observed that, knowing that farmers are in general more extrovert than librarians, we tend to believe that if we encounter an introvert, he is more likely to be a librarian than a farmer. This neglects the baseline statistic that farmers are more prevalent than librarians. So, if we meet anyone randomly without knowing their personality, we are more likely to encounter a farmer than a librarian. An alternative way of understanding this is that baseline neglect (ignoring prior prevalence or probability) is linked to conflating the probability of a librarian being an introvert, prob(introvert given librarian), with the probability of an introvert being a librarian, prob(librarian given introvert). Put generally, system 1 cannot solve simple Bayesian problems, and unless we are trained to understand Bayesian logic, we do not have the capacity for system 2 to override this.

Discussion of these problems lists a whole host of fallacies: baseline neglect, “prosecutor’s fallacy”, and transposition of conditional probabilities. This makes the whole field more incomprehensible than it needs to be; all these errors reduce to failure to comprehend simple two-variable probability theory, leaving system 1 to substitute a simple problem for a more difficult problem. As seen below, these substitutions are only approximately valid if the probability of the evidence and the probability of the hypothesis are of a similar order of magnitude. In many cases, p(E)/p(H)~10⁶, and the jury is then given odds of guilt that are inflated by a factor of 10⁶.

Once system 2 is disabled in this way, system 1 can run amok, and availability and anchoring biases can persuade the jury that a familiar narrative of motive, intent, and action, without considering the less familiar narrative of a sleep disorder or a psychosis, is convincing evidence of guilt. Causal narratives trump statistics. If a prosecution hypothesis portrays a more convincing mechanism than an obscure medical narrative, it may, under system 1, be more persuasive. We have evolved to fear danger and organised threats and to perceive causality in random fluctuation (regression to the mean).

3. Probability and the Law [24,25,26,27,28,29]

“The chances of something happening in the future may be expressed in terms of percentage. Epidemiological evidence may enable doctors to say that on average smokers increase their risk of lung cancer by X%. But you cannot properly say that there is a 25% chance that something has happened … Either it has, or it has not.”
Toulson LJ in Nulty v Milton Keynes Borough Council, Court of Appeal [24].

“The introduction of Bayes’ theorem into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity deflecting them from their proper task.”
R v Adams, 1996, Court of Appeal [25].

Classical probability theory (or Bayesian probability theory) defines probability as a reasonable expectation representing a state of knowledge or as quantification of a personal belief. This may be termed epistemic probability because it depends on the knowledge of the person calculating the probability. Classical probability theory began with the mathematics of games of chance in the writings of Fermat (1607–1665), Pascal (1623–1662), and Bernoulli (1655–1705). Bayes (1701–1761) left some notes found in his personal effects after his death, which showed a simple algorithm for updating probability from new information which has been highly influential in practical inference and artificial intelligence. Probability theory was made mathematically rigorous by Laplace (1814), who wrote the following:

“The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.”

A frequentist perspective on probability defines the probability of an event as the limit of its relative frequency in infinitely many trials.

It is outside the scope of this paper to address the different philosophical and historical approaches to probability, which Andrey Kolmogorov unified and resolved mathematically in the 1930s.

Toulson LJ, in Nulty, was describing a frequentist perspective of probability. The frequentist definition of probability is a limitation that suggests probability cannot be used to infer cause from evidence (probabilistic inference), nor is it a perspective of probability used by scientists to predict experimental outcomes from theory. In the 1920s, there was a scientific belief that light could be both understood as a wave (obeying a wave equation) and a particle subject to scientific models of classical and relativistic mechanics, the amplitude of the wave equations defining a probability density. Classical and relativistic quantum theories, not conceptually “common sense”, have been tested to high degrees of accuracy. Scientific theories are not induced from scientific data; rather, scientific theories represent a belief system of how the world functions, which is then formulated into a theoretical model from which results are predicted. The scientific model is then tested by whether predicted results match experimental results.

If we throw a die, why do we believe that the probability of the “number 3” is 1/6? The frequentist might say if we throw a die 120 times, we find approximately 20 throws will deliver a “number 3”; if we throw the die 12,000 times, we find that about 2000 throws will deliver a “number 3”. The more we throw the die, the closer the actual proportion of “number 3” will get to 1/6.
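The frequentist claim in the paragraph above can be illustrated with a short simulation. This is a sketch only: it uses a pseudo-random generator in place of a physical die, and the function name, seed, and largest throw count are assumptions chosen for the illustration.

```python
import random


def relative_frequency_of_three(n_throws, seed=0):
    """Simulate n_throws of a fair six-sided die and return the
    proportion of throws that show a 3."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    threes = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == 3)
    return threes / n_throws


# The proportion should drift towards 1/6 (about 0.1667) as n grows.
for n in (120, 12_000, 1_200_000):
    print(n, relative_frequency_of_three(n))
```

The simulation presupposes a fair die, which is exactly the belief that the classical account in the next paragraph sets out to justify.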

The classical probability theorist says that we believe that dice resemble perfect cubes. The perfect symmetry of a perfect cube cannot allow for nature to distinguish between one face and another. We believe, considering our understanding of Newtonian mechanics and air resistance, that the subtle markings that distinguish the faces are unlikely to make a significant difference to that symmetry. As any poker player will know, the probability estimation depends on the estimator’s knowledge at the time of evaluation. It does not matter whether ignorance is because the event is in the future or whether ignorance is because the result of the event has been concealed from the estimator.

Toulson’s narrow understanding of probability effectively precludes the use of probability in Court when events are in the past, and it precludes the use of probability based on verified scientific models of the world and how it works.

Does Bayes’ theorem plunge the jury into the realms of theory? Consider the prevalence of assault in American colleges [30], where after parties and social dating (“encounters”), severe alcohol intoxication (about 10% have a blood alcohol > 200 mg/dL) and sexual assault are common (about 1%). See Figure 1.

From Figure 1.

p(\text{assault}) = \frac{10}{1000} = 1\%

(1)

p(\text{intoxication}) = \frac{100}{1000} = 10\%

(2)

Of those who assault, half are intoxicated:

p(\text{intoxication given assault}) = \frac{5}{10} = \frac{p(\text{both assault and intoxication})}{p(\text{assault})} = \frac{0.5\%}{1\%}

(3)

Of those who are intoxicated, 5% commit assault:

p(\text{assault given intoxication}) = \frac{5}{100} = \frac{p(\text{both assault and intoxication})}{p(\text{intoxication})} = \frac{0.5\%}{10\%}

(4)

It is plain from inspecting Figure 1 that p(intoxication given assault) is not the same as p(assault given intoxication). This simple truth does not require anybody to be “plunged into realms of unnecessary complexity”, but it does require care and attention to detail.

Dividing Equation (4) by Equation (3), we obtain the relationship between conditional probabilities:

\frac{p(\text{assault given intoxication})}{p(\text{intoxication given assault})} = \frac{p(\text{assault})}{p(\text{intoxication})}

(5)

This is Bayes’ theorem, and we have proved it using elementary school mathematics by inspecting a Venn diagram and being clear-headed about classifying information. We did not need to use Bayes’ theorem to understand the relationship between conditional probabilities; it is merely necessary to pay attention to detail.
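The counting argument in Equations (1) to (5) can be reproduced directly from the Figure 1 counts (1000 encounters, 10 assaults, 100 intoxications, 5 with both). The sketch below uses exact fractions; the variable names are chosen for this illustration only.

```python
from fractions import Fraction

# Counts read from Figure 1.
encounters = 1000
assault = 10
intoxication = 100
both = 5

p_assault = Fraction(assault, encounters)      # Equation (1): 1%
p_intox = Fraction(intoxication, encounters)   # Equation (2): 10%
p_both = Fraction(both, encounters)            # 0.5%

p_intox_given_assault = p_both / p_assault     # Equation (3)
p_assault_given_intox = p_both / p_intox       # Equation (4)

print(p_intox_given_assault)  # 1/2
print(p_assault_given_intox)  # 1/20

# Equation (5): the ratio of the two conditionals equals the ratio
# of the two unconditional probabilities.
ratio_of_conditionals = p_assault_given_intox / p_intox_given_assault
print(ratio_of_conditionals == p_assault / p_intox)  # True
```

Transposing the conditional here, that is, reading 1/2 where 1/20 is meant, overstates the probability of assault by a factor of p(intoxication)/p(assault) = 10.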

In an assault case, “assault” is the prosecution hypothesis, H_p and “intoxication” might be an item of evidence, E, so we can write Equation (5) more generally as follows:

p(H_p \text{ given } E) = p(E \text{ given } H_p) \times \frac{p(H_p)}{p(E)}

(6)

The same argument applies to the 99% of encounters with no assault:

p(H_d \text{ given } E) = p(E \text{ given } H_d) \times \frac{p(H_d)}{p(E)}

(7)

Dividing Equation (6) by Equation (7), and noting that p(guilt)/p(innocence) is the odds of guilt, gives the following:

\text{posterior odds of guilt} = \frac{p(E \text{ given } H_p)}{p(E \text{ given } H_d)} \times \frac{p(H_p)}{p(H_d)} = LR \times \text{prior odds of guilt}

(8)

The jury do not need to be plunged into realms of complexity, nor do they need to use a formula. Rather, it needs to be explained that the likelihood ratio is a measure of how useful the evidence is in discriminating between guilt and innocence. If the judge is minded to instruct them, they may need to know that the likelihood ratio is an updating factor, in simple cases, updating the odds of guilt before the evidence to new odds after the evidence. Should expert evidence always be delivered by a statistician, as recommended by the president of the Royal Statistical Society after Sally Clark’s trial? Such a proposal would effectively prevent any expert from giving quantitative evidence. Anyone involved in assisting the jury to evaluate the hypotheses must be trained in elementary conditional probability theory.
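As a check on Equation (8), the odds form of the update can be applied to the Figure 1 counts, with "assault" as H_p, "no assault" as H_d, and "intoxication" as the evidence E. This is a sketch under those assumptions: the count of 95 intoxicated non-assaulters is derived from the figure (100 intoxicated minus 5 who assault), and the function names are chosen for this illustration.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio):
    """Equation (8): posterior odds = LR x prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1 + odds)


# Figure 1 counts: 1000 encounters, 10 assaults, 100 intoxicated, 5 both.
p_e_given_hp = Fraction(5, 10)         # intoxicated among the 10 assaults
p_e_given_hd = Fraction(100 - 5, 990)  # intoxicated among the 990 non-assaults
lr = p_e_given_hp / p_e_given_hd       # 99/19, roughly 5.2

prior = Fraction(10, 990)              # odds of assault before the evidence
posterior = posterior_odds(prior, lr)  # 1/19

print(float(odds_to_probability(posterior)))  # 0.05
```

The result, 5%, agrees with p(assault given intoxication) in Equation (4), so the odds form and the direct count give the same answer.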

In the case of R v T [31], discussed by Berger and Hamer et al. [32,33], T was found guilty of murder. We are not told how T came to the attention of the police but are told that the only piece of evidence was a footprint made by a size 10 Nike trainer. The murderer could have been any one of thousands of men with that size and model of Nike trainer. The appeal court judges had no difficulty in quashing the verdict. How could he have been found guilty in the first place? The forensic scientist estimated that 1 in 50 shoes in the community might be size 10 Nike trainers of that particular style. As to the wear, it was greater on the defendant’s shoe than on the footprint, possibly because of subsequent wear. The forensic scientist estimated that the wear was only twice as likely to be found on a defendant’s shoe, giving a likelihood ratio of 100. But that was not the odds of guilt; that was the likelihood ratio. The odds of guilt are the LR × the prior odds of guilt, but there was no other evidence, so the prior odds of guilt must have been extremely small. The presumed error in the original trial was to mistake the LR for the odds of guilt, a common error. That is “prior neglect” or “baseline neglect”, as described by Kahneman, an error that is prominent in cases with a single item of evidence [34]. Prior neglect is equivalent to setting the prior odds at 1, or believing that before the shoe evidence was considered, with no evidence of guilt or hypothesis to support guilt, T had a 50% chance of being the murderer. The appeal court accepted that the expert had used standard methodology but was sceptical that the methodology was statistically valid and ruled:

“We are satisfied that in the area of footwear evidence, no attempt can realistically be made in the generality of cases to use a formula to calculate the probabilities. The practice has no sound basis. It is clear that outside the field of DNA (and possibly other areas where there is a firm statistical basis) this court has made it clear that Bayes’ theorem and likelihood ratios should not be used.”

It was a judgement of profound significance, setting back the cause of evaluative reporting of forensic evidence and making it difficult or impossible to use probabilistic inference in court. In their judgement, which was about 50 pages long, the judges finally concluded that the likelihood ratio was invalid and that the whole methodology was wrong. The phrase “likelihood ratio” (or ratios) appeared 81 times, while the word “prior” appeared just once, in a formula quoting Bayes’ theorem from a publication on likelihood ratios. The irony of this case is that they failed to identify the problem: by failing to consider what the prior odds were, or what the trial judge thought the prior odds were, the appeal court made the same error as the original trial, namely baseline neglect (or prior odds neglect). So, this landmark ruling was founded on a misunderstanding of elementary two-variable probability.
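The effect of prior neglect in R v T can be sketched numerically. The likelihood ratio of 100 is the figure reported in the case as described above; the pool of 10,000 possible wearers is a purely hypothetical number introduced here to stand in for "thousands of men", and the function and constant names are likewise assumptions of this sketch.

```python
from fractions import Fraction


def posterior_probability(prior_odds, likelihood_ratio):
    """Posterior odds = LR x prior odds, returned as a probability."""
    odds = likelihood_ratio * prior_odds
    return odds / (1 + odds)


LR_FOOTWEAR = 100  # likelihood ratio reported for the footwear mark

# Prior neglect: implicitly treating the prior odds as 1 (a 50% prior).
p_neglect = posterior_probability(Fraction(1, 1), LR_FOOTWEAR)

# HYPOTHETICAL prior: one of 10,000 equally likely wearers, so the
# prior odds are 1 to 9,999. This pool size is an illustrative assumption.
HYPOTHETICAL_POOL = 10_000
prior_odds = Fraction(1, HYPOTHETICAL_POOL - 1)
p_with_prior = posterior_probability(prior_odds, LR_FOOTWEAR)

print(f"prior neglected:      {float(p_neglect):.3f}")     # 0.990
print(f"prior of 1 in 10,000: {float(p_with_prior):.4f}")  # 0.0099
```

Under this hypothetical prior, the same evidence that appears to give a 99% probability of guilt gives about 1%. The likelihood ratio is unchanged; only the neglected prior differs.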

These simple concepts apply to two variables, a single item of evidence and a hypothesis. An iterative approach to evaluating a hypothesis for the totality of evidence by multiplying the likelihood ratios for individual items of evidence is only valid if the items of evidence are independent of each other. The reality of multiple interacting items of evidence rapidly generates numerous variables, and the calculation using conventional algebra rapidly becomes intractable. Such problems can only be solved by laying out the relationship between the items of evidence using a directed acyclic graph (DAG), a network of evidential arrangements that cannot be circular, sometimes referred to as Bayesian nets. This theory enables algorithmic pruning of multiple variables and their solution by a computer. The professional statistician (unlike the forensic scientist) is then required to encroach on the role of the court and the finder of facts to evaluate the probability of the hypothesis (rather than the probability of the evidence). Fenton’s solution to this problem is that juries should trust the computer algorithm that solves the problem of probabilistic inference as much as we trust a calculator to perform long division. But it is not the same; even if we trust the algorithm, understanding DAGs and Bayesian nets requires postgraduate training in mathematics. Aside from the logistical problem of requiring every forensic scientist and every medical witness to be accompanied by a qualified statistician, the role of the judge and jury would be usurped by a computer algorithm. This results in the unacceptable situation of a defendant asking why he was found guilty of charges and being told, “Because the computer found you so”.
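The independence condition stated at the start of the paragraph above can be illustrated with a toy calculation. All the likelihood ratios below are hypothetical values chosen for the illustration, and `combined_lr` is a name introduced here; the sketch shows only the simple product rule and one way it fails, and is not a Bayesian-net calculation.

```python
from math import prod


def combined_lr(likelihood_ratios):
    """Product of the individual LRs. Valid only if the items of
    evidence are independent of each other under each hypothesis."""
    return prod(likelihood_ratios)


# Three HYPOTHETICAL independent items: two favour the prosecution
# hypothesis (LR > 1) and one favours the defence hypothesis (LR < 1).
independent_items = [100, 4, 0.5]
print(combined_lr(independent_items))  # 200.0

# Failure case: the same item of evidence reported twice is fully
# dependent, so multiplying the two reports double-counts it.
same_item_twice = [100, 100]
naive = combined_lr(same_item_twice)  # 10000
correct = 100                         # one item, however often reported
print(naive / correct)                # 100.0
```

Partial dependence between items falls between these two extremes, which is where the product rule stops being a safe shortcut.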

Using the likelihood ratio as a marker for evidential probity provides a measure of the strength of each item of evidence, which supports either a prosecution or a defence hypothesis. Combining reliable likelihood ratios heuristically may not result in an accurate figure for the odds of guilt, but it is reliable in excluding certainty. Miscarriages of justice result from miscalculation of two-variable conditional probabilities: neglecting the prior odds (usually very low odds of guilt) or transposing a conditional, where the error factor is p(E)/p(H), often enormous, ~10⁶ or more (Sally Clark). Errors introduced by combining likelihood ratios heuristically can be compensated by biassing heuristic evaluation in favour of the defence. Where an item of evidence supports both prosecution and defence hypotheses, it can be simplified to support only a defence hypothesis. Where a multivariate probabilistic (computer-algorithmic) analysis changes an outcome in favour of the prosecution hypothesis, it could not be admitted as evidence. What must be avoided are basic errors of elementary probability: transposing conditionals and neglecting the prior odds of guilt.

In this section, I have highlighted three examples of rulings which indicate a misunderstanding of elementary probability theory. Toulson LJ stated that probability could only comment on the probability of future events (rather than unknown events), contradicting mathematically established probability theory (and contradicting generations of poker players). The elementary probability required to use likelihood ratios only requires the jury to understand a notion of evidential probity and, perhaps, the concept of iteratively revising the odds of guilt as new evidence is presented. For lawyers and experts, two-variable elementary probability theory needs to be taught and understood, and the algebra required is elementary. Yet, in R v Adams, the following was ruled:

“The introduction of Bayes’ theorem into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity”

In R v T, a landmark ruling that challenges or forbids the use of ENFSI guidelines in many cases was the result of an error of baseline neglect by the trial judge and the prosecution and defence barristers in the original trial, and by three judges in the Court of Appeal.

4. A Problematic Approach to the Forensic Evaluation of Sexual Behaviours in the Sleep Period

Holoyda and colleagues [35] give an excellent review of the challenges of the psychiatrist evaluating the differential diagnosis of sexual behaviours in sleep of the defendant and the wakeful perpetration of sexual assault on the sleeping victim. They highlight the need for a forensic psychiatrist to evaluate malingered psychopathology, including underlying paraphilias such as paedophilia and somnophilia. Paedophilia is the sexual interest in prepubescent children. Somnophilia is the sexual interest in the sleeping person. They emphasise the need for collateral history from previous bedpartners who may be able to assist in identifying paraphilias and propensity for sleep-related behaviours. The authors are diligent in pointing out the difficulties:

“There are no objective tests designed to assess for feigned sexsomnia”.

On paraphilia:

“It is unknown whether individuals’ sleep-related sexual behaviours reflect their sexual orientation”,

and

“The paucity of studies examining the prevalence of paraphilic disorders in cases of sexsomnia or studies applying sexual offender risk-assessment tools in sexsomnia makes it difficult to conceptualize whether paraphilias and sexual offending risk assessments relate to sexsomnia.”

The table (Table 1) is headed “Potential clues to feigned sexsomnia”. These are reasonably argued and valid “red flags” that might alert a clinician, seeing a patient in a clinic, to look deeply, but do they pass muster as forensic evidence? If presented to a jury by a medical forensic witness, evaluating the probability of the hypothesis, might they be accepted at face value, and would the jury “rubber stamp” a conclusion of guilt or innocence by the medical expert? Has the evidential probity of the individual items of evidence been presented so the jury can evaluate the probability (or the legal certainty) of guilt?

The notion that concealing behaviour might be evidence of criminality has a long history in forensic sleep. Bonkalo [36], in a clinical series of violence in sleep, including homicide, cited covering up the evidence of homicide as evidence of guilt. But the argument was entirely circular; those who covered up their deeds were usually found guilty. It was a narrative-based conclusion based on theories of how a reasonable person might behave. It may well be that the horror of waking up to find one has killed one’s spouse would drive the reasonable person to call the police. It is a nostrum passed down from forensic sleep expert to forensic sleep expert as a matter of faith. What is the probability of concealment among the innocent? What is the likelihood ratio of “concealment”? A man waking on the sofa next to his child to be told by his child that he has assaulted her should certainly inform the police, but the consequences might tempt the innocent to conceal.

The danger of the cognitive bias of “anchoring” should remind the forensic expert to use neutral language. “Complainant” is more neutral than “victim”. Repeated episodes of sexsomnia do occur. In Bjorvatn and colleagues’ [37] telephone survey, the prevalence of sexsomnia was 7.1% in a lifetime but 2.7% in the last three months. Repeated behaviours in sleep are behaviours performed while intent is absent, and an inability to resist the behaviour is not abuse. It may or may not be perceived as abuse by the recipient of the behaviour and is usually not perceived as abuse by bed-partners who acknowledge that it is a sleep-related behaviour, so use of the phrase “repeated episodes of sexual abuse” and the word “perpetrated” is prosecution-biassed language that should be avoided.

Parasomnias present to doctors infrequently. If the Bjorvatn study [37] and the Chung study [38] (the latter published in abstract form only, on a non-random population) are correct, then the probability that sleep-related sexual behaviours present in the clinic is low by several orders of magnitude. Embarrassment and a reasonable fear of being disbelieved may be the reasons. The notion that an “individual genuinely concerned about the effect of his sleep-related sexual behaviour” would seek medical help is not evidence-based and may be incompatible with the observation that when sexsomnia does present in the clinic, it often presents late. It is a “system 1” narrative opinion that is not based on analysis.

The most definitive study on the recollection of parasomnias, published in 2024, three years after this paper, challenges the notion that recollection of parasomnias is uncommon [39]. The papers quoted in support of that notion are collections of case reports. Patients who present to the clinic with sexsomnia are not common and may not be representative of patients in the community.

New-onset sexsomnia presenting as a sole parasomnic behaviour is problematic. Again, the “research studies” referenced are collections of case studies of patients attending clinics [40], whose characteristics are known to differ from those of survey respondents [37,38]. The frequency of sexsomnia in the non-randomised survey, published in abstract, was 7.6% (11% of males), and only 6% of those had current (other) parasomnias [38]. In the Bjorvatn study [37], current sexsomnia was more common than current sleepwalking, but the overlap with other parasomnias was not reported. In the self-reporting internet survey [40], 46% had other parasomnias. The prevalence of intercurrent parasomnias in case reports is much higher [41].

Finally, this paper recommends that patients charged with sexual behaviours with children be assessed with objective measures (penile plethysmography or pupillary responses) to determine whether they have a sexual interest in children. This may or may not be a reliable marker of sexual interest. Of more importance (as is acknowledged by the authors of this paper) is whether sexual behaviour in sleep occurs by failure to inhibit subconscious drives due to local sleep in parasomnias, so-called directed behaviours [41]. The presumption is that sexsomnia is more common among those who are sexually attracted to the object of their behaviour, and this may explain the high incidence of sexsomnia in surveys among the normal population. So, whether the defendant is or is not, consciously or subconsciously, sexually interested in children may have no evidential probity between a sleep-related or wake-related hypothesis.

The correct approach is to evaluate the probability of finding these items of evidence among the guilty population and the probability of finding the evidence among the innocent. Raising suspicion as a substitute for evidence empowers the jury to form a narrative opinion based on their experience. They are much more likely to be familiar with a narrative of assault.
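
A minimal sketch of that calculation in Python, assuming counts are available from two reference groups. The function name, its parameters and the counts are hypothetical and are not taken from any of the cited studies.

```python
from fractions import Fraction


def likelihood_ratio(e_and_hp: int, n_hp: int, e_and_hd: int, n_hd: int) -> Fraction:
    """Estimate LR = p(E|Hp) / p(E|Hd) from counts in two reference groups.

    e_and_hp / n_hp: cases showing the evidence / all cases under Hp
    e_and_hd / n_hd: cases showing the evidence / all cases under Hd
    """
    if n_hp <= 0 or n_hd <= 0:
        raise ValueError("each reference group must contain at least one case")
    if not (0 <= e_and_hp <= n_hp and 0 <= e_and_hd <= n_hd):
        raise ValueError("counts with the evidence must lie between 0 and the group size")
    if e_and_hd == 0:
        raise ValueError("p(E|Hd) is estimated as zero, so the LR is undefined")
    p_e_given_hp = Fraction(e_and_hp, n_hp)
    p_e_given_hd = Fraction(e_and_hd, n_hd)
    return p_e_given_hp / p_e_given_hd


# Hypothetical counts, for illustration only:
# the item of evidence is seen in 60 of 100 wake-related cases
# and in 30 of 100 sleep-related cases.
lr = likelihood_ratio(60, 100, 30, 100)
print(lr)  # 2
```

An item found equally often in both groups returns an LR of 1 and has no evidential probity, however suspicious it may appear in a narrative.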

5. Conclusions

We live with the behemoth of our own system 1 thinking and, with it, all the heuristic biases. Unlike forensic experts, medical experts are required to advise the Court on diagnosis and venture into the domain of the probability of the hypothesis (see Figure 2). Like the forensic scientist, the medical expert must be objective, and the guidelines of evaluative reporting provide the necessary discipline to be impartial and avoid serious probabilistic errors in ascribing and evaluating the probity of the evidence. They provide the discipline to reflect and to inform the Court where the scientific theory or the evidence base cannot provide an accurate estimate of evidential probity. They provide the framework of thinking to ensure the opinion is objective. Reports should be structured so that a clear distinction is made between evaluating the probability of the evidence under each hypothesis and advising the Court on how the medical expert makes a diagnosis and assesses the probability of the hypotheses given the evidence. ENFSI guidelines should, therefore, apply in the domain of the probability of the evidence, p(E|H) (see Figure 2).
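
The distinction can be stated compactly in the odds form of Bayes' theorem for the elementary two-hypothesis case. This is a sketch using the abstract's symbols (E for the evidence, H_p and H_d for the prosecution and defence hypotheses); the labels under the braces are added here for illustration.

```latex
\underbrace{\frac{p(H_{p} \mid E)}{p(H_{d} \mid E)}}_{\text{posterior odds}}
= \underbrace{\frac{p(E \mid H_{p})}{p(E \mid H_{d})}}_{LR}
\times \underbrace{\frac{p(H_{p})}{p(H_{d})}}_{\text{prior odds}}
```

Evaluative reporting concerns only the middle term, the LR. The prior and posterior odds lie in the domain of the probability of the hypothesis. As the abstract notes, this simple multiplicative updating does not hold in multiple-variable probability theory.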

This process cannot be delegated to an expert in probability who is not in a position to evaluate specialist medical evidence. Rather, competence in elementary probability theory is mandatory for anyone who advises the Court on any matter. In complex cases, where probative evidence is interrelated, it may be appropriate to advise the Court that a specialist in risk analysis and Bayesian nets might be able to assist with the evaluation of the evidence.

Science represents and describes our current understanding of the world, and probability is the expression of its quantification. Elementary errors in probability have resulted in very serious errors in the estimation of evidential probity, with victims of miscarriage of justice serving many years. These probabilistic fallacies continue to occur with serious consequences [26]. Two-variable elementary conditional probability is taught to school children, yet competence among expert witnesses and lawyers is not universal.
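
The size of such an error can be illustrated with a short Python sketch that contrasts the transposed conditional with a two-hypothesis Bayes update that keeps the prior odds. All numbers, and the function name, are hypothetical and chosen only for illustration.

```python
def posterior_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Two-hypothesis Bayes update in odds form: posterior odds = LR * prior odds."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs, for illustration only.
p_e_given_hp = 1.0   # evidence is certain if the prosecution hypothesis is true
p_e_given_hd = 1e-4  # evidence occurs in 1 in 10,000 if the defence hypothesis is true
prior = 1e-5         # prior probability of the prosecution hypothesis (1 in 100,000)

lr = p_e_given_hp / p_e_given_hd

# Transposed conditional: p(E|Hd) is wrongly read as p(Hd|E), ignoring the prior.
fallacy = 1.0 - p_e_given_hd

# Correct two-hypothesis update, with the prior odds included.
correct = posterior_probability(prior, lr)

print(f"LR = {lr:.0f}")
print(f"fallacious 'probability of guilt' = {fallacy:.4f}")
print(f"posterior with prior odds included = {correct:.4f}")
```

With these assumed inputs, the fallacious figure is close to certainty while the posterior probability is roughly 0.09, a difference of several orders of magnitude in the odds.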

Rulings by senior lawyers, whose declarations may be binding in Court, do not always reveal competence in probability and are not always mathematically valid. The ruling in R v T is tantamount to an error of baseline neglect. The ruling by Toulson LJ is a denial of the role of epistemic probability. There must be limits to judicial fiat, and mathematically incorrect rulings must be addressed. If miscarriages of justice are to be avoided, catastrophically inaccurate estimates of evidential probity must stop. Everyone who has a role in informing the jury must demonstrate competence in elementary two-variable, school-level probability. Miscarriages of justice arising from probabilistic fallacies are a joint enterprise.

Funding

There was no external funding for this contribution.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Samuels, A. Forensic Science and Miscarriages of Justice. Med. Sci. Law 1994, 34, 148–154.
2. Roberts, P.; Willmore, C.; Davis, G. The Role of Forensic Science Evidence in Criminal Proceedings; H.M. Stationery Office: London, UK, 1993.
3. Hackman, L. Miscarriages of Justice and the Role of the Expert Witness. In The Expert Witness, Forensic Science, and the Criminal Justice Systems of the UK; CRC Press: Boca Raton, FL, USA, 2019; pp. 29–44.
4. Milne, R.; Poyser, S.; Williamson, T.; Savage, S.P. Miscarriages of justice: What can we learn. In Forensic Psychology; Willan: London, UK, 2010; pp. 49–69.
5. O’Grady, C. Mothers damned by statistics. Science 2023, 379, 232.
6. Glazebrook, S. Miscarriage by expert. Vic. Univ. Wellingt. Law Rev. 2018, 49, 245.
7. Evett, I.W. Bayesian inference and forensic science: Problems and perspectives. J. R. Stat. Soc. Ser. D (Stat.) 1987, 36, 99–105.
8. Evett, I. The logical foundations of forensic science: Towards reliable knowledge. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2015, 370, 20140263.
9. Kahneman, D. Maps of Bounded Rationality: A Perspective on Intuitive Judgement and Choice; Nobel Prize Lecture; Aula Magna: Stockholm, Sweden, 2002.
10. Kahneman, D. Fast and Slow Thinking; Penguin Books, Random House: London, UK, 2011; pp. 125–228.
11. Tversky, A.; Kahneman, D. Judgment under Uncertainty: Heuristics and Biases. Science 1974, 185, 1124–1131.
12. Gilovich, T.; Griffin, D.; Kahneman, D. (Eds.) Heuristics and Biases: The Psychology of Intuitive Judgment; Cambridge University Press: Cambridge, UK, 2002.
13. Berger, C.E.; Buckleton, J.; Champod, C.; Evett, I.W.; Jackson, G. Expressing evaluative opinions: A position statement. Sci. Justice 2011, 51, 1–2.
14. Biedermann, A.; Champod, C.; Willis, S. Development of European standards for evaluative reporting in forensic science: The gap between intentions and perceptions. Int. J. Evid. Proof 2017, 21, 14–29.
15. Biedermann, A.; Kotsoglou, K.N. Decisional dimensions in expert witness Testimony—A Structural Analysis. Front. Psychol. 2018, 9, 2073.
16. Catoggio, D.; Bunford, J.; Taylor, D.; Wevers, G.; Ballantyne, K.; Morgan, R. An introductory guide to evaluative reporting in forensic science. Aust. J. Forensic Sci. 2019, 51, S247–S251.
17. Gittelson, S. Evolving from Inferences to Decisions in the Interpretation of Scientific Evidence. Ph.D. Thesis, Université de Lausanne, Faculté de droit et des Sciences Criminelles, Lausanne, Switzerland, 2013.
18. McKenna, L.; McDermott, S.; O’Donell, G.; Barrett, A.; Rasmusson, B.; Nordgaard, A. ENFSI Guideline for Evaluative Reporting in Forensic Science: Strengthening the Evaluation of Forensic Results Across Europe (STEOFRAE); European Network of Forensic Science Institutes: Wiesbaden, Germany, 2015; pp. 30–41.
19. Williams, G.A.; Maskell, P.D. Embracing likelihood ratios and highlighting the principles of forensic interpretation. Forensic Sci. Int. Rep. 2021, 3, 100209.
20. General Medical Council, Acting as a Witness in Legal Proceedings. Available online: https://www.gmc-uk.org/-/media/documents/gmc-guidance-for-doctors---acting-as-a-witness-in-legal-proceedings_pdf-58832681.pdf (accessed on 7 December 2024).
21. Charleton, P.; Rakhmanin, I. Expert evidence: Dangers and the enhancement of reasoning. BJPsych Adv. 2024, 30, 338–345.
22. Rix, K. Forensic Clinicians (Physicians, Nurses and Paramedics) as Witnesses in Criminal Proceedings. Available online: https://fflm.ac.uk/wp-content/uploads/2020/12/Forensic-clinicians-as-witnesses-in-criminal-proceedings-Prof-K-Rix-Oct-2020.pdf (accessed on 7 December 2024).
23. Rix, K. The Code of Practice on Expert Evidence. Faculty of Forensic and Legal Medicine of the Royal College of Physicians. Available online: https://fflm.ac.uk/wp-content/uploads/2022/01/Code-of-Practice-on-Expert-Evidence-Jan-2022-Prof-K-Rix.pdf (accessed on 7 December 2024).
24. Nulty v Milton Keynes Borough Council [2013] EWCA Civ 15. Available online: https://vlex.co.uk/vid/michael-nulty-deceased-and-793699865 (accessed on 5 March 2025).
25. R v Adams, [1996] EWCA Crim 222. Available online: http://www.bailii.org/ew/cases/EWCA/Crim/1996/222.html (accessed on 5 March 2025).
26. Fenton, N.; Neil, M.; Berger, D. Bayes and the law. Annu. Rev. Stat. Its Appl. 2016, 3, 51–77.
27. Fenton, N.; Neil, M.; Lagnado, D.A. A general structure for legal arguments about evidence using Bayesian networks. Cogn. Sci. 2013, 37, 61–102.
28. Fenton, N.; Neil, M. Avoiding probabilistic reasoning fallacies in legal practice using Bayesian networks. Austl. J. Leg. Phil. 2011, 36, 114.
29. Fenton, N.; Neil, M. Risk Assessment and Decision Analysis with Bayesian Networks; CRC Press: Boca Raton, FL, USA, 2018.
30. Munro, N.A. Alcohol and parasomnias: The statistical evaluation of the parasomnia defense in sexual assault, where alcohol is involved. J. Forensic Sci. 2020, 65, 1235–1241.
31. R v T [2010] EWCA Crim 2439. Available online: http://www.bailii.org/ew/cases/EWCA/Crim/2010/2439.pdf (accessed on 10 March 2025).
32. Hamer, D. Discussion paper: The R v T controversy: Forensic evidence, law and logic. Law Probab. Risk 2012, 11, 331–345.
33. Berger, C.E.; Buckleton, J.; Champod, C.; Evett, I.W.; Jackson, G. Evidence evaluation: A response to the court of appeal judgment in R v T. Sci. Justice 2011, 51, 43–49.
34. Sangero, B.; Halpert, M. Why a conviction should not be based on a single piece of evidence: A proposal for reform. Jurimetrics 2007, 48, 43.
35. Holoyda, B.J.; Sorrentino, R.M.; Mohebbi, A.; Fernando, A.T.; Friedman, S.H. Forensic Evaluation of Sexsomnia. J. Am. Acad. Psychiatry Law 2021, 49, 202–210.
36. Bonkalo, A. Impulsive acts and confusional states during incomplete arousal from sleep: Criminological and forensic implications. Psychiatr. Q. 1974, 48, 400–409.
37. Bjorvatn, B.; Grønli, J.; Pallesen, S. Prevalence of different parasomnias in the general population. Sleep Med. 2010, 11, 1031–1034.
38. Chung, S.A.; Yegneswaran, B.; Natarajan, A.; Trajanovic, N.; Shapiro, C.M. Frequency of sexomnia in sleep clinic patients. Sleep 2010, 33, A226.
39. Siclari, F. Consciousness in non-REM-parasomnia episodes. J. Sleep Res. 2025, 34, e14275.
40. Mangan, M.A.; Reips, U.D. Sleep, sex, and the Web: Surveying the difficult-to-reach clinical population suffering from sexsomnia. Behav. Res. Methods 2007, 39, 233–236.
41. Schenck, C.H. Update on sexsomnia, sleep-related sexual seizures, and forensic implications. NeuroQuantology 2015, 13, 518–541.

Figure 1. Prevalence of assault and intoxication in American Colleges [28].

Figure 2. The medical expert must distinguish between evaluating the probability of the evidence and the probability of the hypothesis.

Table 1. Potential Clues to Feigned Sexsomnia reported in Holoyda et al. may not be valid.

Element	Explanation
Efforts to conceal behaviour	Efforts to conceal sexual acts allegedly committed while asleep demonstrate knowledge of the acts
Repeated episodes of sexual abuse perpetrated after being aware of the behaviour	An individual genuinely concerned about the effect of his sleep-related sexual behaviour would be more likely to try to reduce the risk of recurrence.
Recollection of the episode	Sexsomnia occurs during slow-wave sleep, a time when an individual is typically not conscious. Research demonstrates that full or patchy recall of alleged events occurs in a minority of cases.
New-onset sexsomnia presenting as a sole parasomnic behaviour	One-tenth to one-third of patients presenting with sexsomnia in research studies have no history of current or non-sexual parasomnia behaviour. New-onset sexsomnia with no history of other parasomnia behaviours in an individual charged with a sex offence may raise an evaluator’s suspicion.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
