Time-Weighted Result-Based Strength Indicators from Head-to-Head Outcomes: An Application to Trotter (Harness) Racing

Ligero-Acosta, Manuel; Muñoz-Pichardo, Juan M.; Gómez, María Dolores; Ripollés-Lobo, María; Valera, Mercedes

doi:10.3390/math14010167

by Manuel Ligero-Acosta ¹, Juan M. Muñoz-Pichardo ²,*, María Dolores Gómez ¹, María Ripollés-Lobo ¹ and Mercedes Valera ¹

¹ Departamento de Agronomía, Escuela Técnica Superior de Ingeniería Agronómica (ETSIA), Universidad de Sevilla, Ctr. Utrera Km 1, 41013 Sevilla, Spain

² Departamento de Estadística e Investigación Operativa, Facultad de Matemáticas, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Sevilla, Spain

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(1), 167; https://doi.org/10.3390/math14010167

Submission received: 4 November 2025 / Revised: 26 December 2025 / Accepted: 29 December 2025 / Published: 1 January 2026

(This article belongs to the Special Issue Advances in Statistical Approaches with Applications for Multivariate Data Analysis)


Abstract

We propose a general methodology for constructing dynamic performance indicators (or strength metrics) in any sport that relies on comparative outcomes among competitors, using chronological positional data. Specifically, we develop a family of strength indicators for harness trotting races based on time-weighted, head-to-head results. Using the official Balearic trotting records (1990–2023), we construct win, draw, and confrontation matrices up to each event and apply a triweight kernel to reduce the influence of older results. From these matrices, we derive a family of five bounded, interpretable indicators on the interval $[0, 1]$: an overall average win rate, a category-adjusted version, and three distance-specific versions (short, medium, and long). Indicator validation is performed via predictive validation, employing regularized logistic regression models (Elastic Net) based on indicator differences between horse pairs. Standard metrics (accuracy, calibration, discrimination, and Brier score) are used for the validation analysis. The results confirm that the indicators are coherent, stable, and interpretable, demonstrating that the generic construction procedure yields robust outcomes. We conclude that these indicators establish a solid and easily updatable foundation for developing dynamic ranking systems and practical selection/handicap procedures in trotting.

Keywords: ability indicators; category and distance adjustment; horse racing; kernel weighting; logistic regression; performance; predictive validation

MSC: 62H30; 62J12; 62P99

1. Introduction

As noted by Aldous [1], in any study that seeks to model the outcomes of a sports competition, the probability that A (a player or team) defeats B is assumed to be a specific function of the difference in their underlying ‘strength’. This premise naturally raises several questions including the following: How should a competitor’s “strength” be defined? How can this “strength” be measured?

This line of reasoning is also pursued by McHale and Morton [2], who model tennis match outcomes under the assumption that the probability of victory is a function of each player’s latent abilities or ‘strength’. The authors explicitly remark that ‘It is of interest to use the player abilities… to produce an alternative ranking system and to compare the new rankings with the official ATP rankings’. Similarly, Ricard and Legarra [3] address this idea by developing models that interpret observed rankings as the empirical outcome of an underlying hierarchy of latent, normally distributed performances of horses in competition.

Within the field of horse racing, numerous studies provide evidence that betting markets yield reasonably accurate forecasts. According to Stekler et al. [4], empirical results indicate that these markets are generally able to discriminate among horses’ potential and quality, producing probability estimates that are well aligned with the observed frequencies of victory. However, the authors also confirm a systematic deviation at the extremes of the probability distribution, commonly referred to as the ‘longshot bias’. This bias manifests as excessive wagering on highly favored horses and disproportionate betting on low-probability outcomes, motivated by the prospect of achieving higher returns. Such behavioral patterns distort the implied probabilities and thus compromise the reliability of market-based predictions when betting data are used as explanatory or predictive variables. This adverse effect of betting behavior has been extensively documented in the literature (see Lo and Bacon-Shone [5] and Snyder [6]). Consequently, betting data cannot be considered suitable for estimating intrinsic performance.

In light of the preceding arguments, if the aim is to develop indicators of horse performance or ‘strength’ in races without incorporating the horses’ phenotypic or genetic characteristics, it is reasonable to restrict the analysis to the results of previous races. The use of competitive outcomes constitutes the basis of Elo-type rating systems and standard ranking procedures (e.g., in chess, tennis, car and motorcycle racing, etc.). However, for performance indicators to provide timely information, historical data must be temporally weighted so that more recent results carry greater importance. In particular, weighting schemes based on kernel-type functions offer an objective criterion for assigning weights and align with the principle that an athlete’s (or horse’s) current form has a temporal component that diminishes over time, analogously to the ‘decay factor’ used in dynamic Elo ratings. This weighting strategy has been employed in various studies aimed at constructing models for forecasting sports outcomes (see McHale and Morton [2]). Aldous [1] further reinforces this line of reasoning by arguing that, in sporting contexts, the estimates of a player’s or team’s strength or performance must be updated after every match or competition.

Furthermore, in many sports, competitions take place under heterogeneous conditions and environments that exert a substantial influence on outcomes. For example, in tennis, the surface on which each tournament is played (clay, grass, etc.) is a decisive factor. In horse racing, race distance or specific modalities can play an important role. In trotting, variables such as race distance or start type (autostart or handicap) can significantly affect the time per kilometer, and thus must be incorporated into the performance indicators. Clearly, such aspects must be taken into account when constructing measures of performance or strength. For instance, McHale and Morton [2] explicitly account for playing surface in the model they propose for forecasting tennis match outcomes.

On the other hand, the competition category must also be taken into account when defining an indicator of sporting strength or skill. Undoubtedly, in any sport—whether individual or team-based—the category of the tournament influences athletes’ performances and the outcomes they achieve. The quality of opponents, the psychological effects on each athlete, environmental and media pressure, the suitability of the conditions under which the event is held, and even the associated sporting and financial rewards are all relevant factors that shape results. This provides a clear justification for incorporating the competition category into the definition of sporting strength.

In many sports, events or tournaments are classified into categories that determine the points awarded for official rankings. For example, tennis tournaments are categorized as ATP Grand Slam, ATP 1000, ATP 500, ATP 250, and so on. In horse racing, events with a higher competitive level (e.g., those offering larger prize purses or greater prestige) tend to attract higher-quality competitors; consequently, the results obtained in such contexts provide greater discriminative information regarding a horse’s relative strength.

The objective of this study is to develop a statistical methodology for estimating quantitative strength indicators in trotter horses, based exclusively on historical race results that are temporally weighted and adjusted for event category and distance. The definition of such indicators must incorporate the aspects discussed above, namely, the time elapsed between the event and the moment at which performance is assessed, the category of the event, and the quality of the participating rivals. It is also important to emphasize that the goal is not to predict race outcomes, as a predictive approach would require additional information on the horse’s morphological characteristics as well as specific data on the driver. Betting data are likewise not considered. Although such data are used in many outcome-prediction models, they do not necessarily convey information about a horse’s inherent potential or performance. Moreover, as reported in the literature, they may introduce undesirable noise into the construction of animal strength indicators. Although the objective is not direct prediction, validation of the proposed indicators is carried out through their predictive validity. These indicators are suitable as inputs to an Elo-type system, analogous to those used in chess. Finally, the proposed framework and general procedure are adaptable to any sporting discipline that produces comparative results among competitors, provided that chronological ranking information is available.

The remainder of this paper is structured as follows. Section 2 describes the dataset used to address the objective of the study. In Section 3, we define and outline the parameters, measures, and statistical quantities that form the basis for constructing the performance or strength indicators. Section 4 introduces the proposed performance measures or indicators and presents an analysis of their theoretical underpinnings. In Section 5, we assess the proposed indicators using the dataset. Finally, Section 6 summarizes the main conclusions and outlines directions for future research.

2. Dataset Description

For this study, performance data from the Balearic Trotting Federation [7] collected between 1990 and 2023 were used. This database consists of a set of horse–event records, meaning that each entry contains:

Horse identifier;
Driver identifier;
Event identifier;
Associated characteristics for both the horse (sex, date of birth, time per kilometer, position, and earnings) and the event (date, racetrack, distance, type, and starting mode).

In this study, only harness (trotter) races were considered, as they constitute the majority of trotting competitions in Spain. The starting modes are either autostart or handicap; further details are provided in Gómez et al. [8]. Race distances were grouped following the same categorization used by Gómez et al. [8]:

Short: $\mathrm{distance} < 2000$ m.
Medium: $2000 \leq \mathrm{distance} \leq 2200$ m.
Long: $\mathrm{distance} > 2200$ m.

Similarly, a new variable, race category, was created by classifying races into five groups based on the total prize money awarded:

Category A: prize money exceeding EUR 10,000
Category B: prize money between EUR 6000 and EUR 9999
Category C: prize money between EUR 3000 and EUR 5999
Category D: prize money between EUR 800 and EUR 2999
Category E: prize money less than EUR 800.

This classification follows the prize-money distribution observed in the database and reflects the practical competition levels recognized by the Balearic Trotting Federation. Table 1 provides a summary of the dataset records by distance and category.

Regarding race position, it should be noted that horses exceeding a time limit in the races are recorded as ‘A’, and their time per kilometer is not recorded. Furthermore, horses that were disqualified or withdrawn from the race were excluded from the analysis, as the interruption of their participation prevents an accurate measurement of their performance. These cases accounted for approximately 21.77% of the total records and were excluded to ensure comparability.

It is important to emphasize that the aim of this study is to quantify the intrinsic performance or “strength” of each horse, rather than to predict official finishing positions. In trotting competitions, particularly in handicap events, horses may cover different distances, meaning that the finishing order is not an objective or directly comparable measure of performance. For this reason, Time per Kilometer ($TpK$) is used as the primary performance metric, in line with previous studies such as Ekiz and Kocak [9], Gómez et al. [8], and Ricard and Legarra [3].

This procedure follows the general principle that a win/loss outcome can be assigned only when the performance measures of both horses are objectively comparable.

Based on the available $TpK$, the pairwise outcome between horses i and j in a race is defined as follows:

i defeats j if $TpK_i < TpK_j$;
i loses to j if $TpK_i > TpK_j$.

Difficulties arise when one or both horses are assigned the position ‘A’, which indicates that the horse exceeded the time limit and, consequently, no $TpK$ is recorded. If both horses receive ‘A’, or if one horse receives ‘A’ in a handicap race (where distances differ), there is insufficient information to determine which horse performed better. These cases represent a limited subset of all pairwise comparisons, and treating them as draws prevents the introduction of arbitrary and unverifiable outcomes.

Therefore, all pairwise outcomes for which the relative performance could not be objectively established were classified as draws. Consequently, each race yields a confrontation matrix with three possible outcomes:

$+1$ if i clearly outperforms j (i defeats j);
$-1$ if i is clearly outperformed by j (i loses to j);
$0$ when the comparison is not interpretable based on observable performance (‘draw’).
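As a minimal sketch of the pairwise rule above, the following Python function derives the three-valued outcome for every pair of horses in a single race. The function name, the input layout (a dictionary from a horse identifier to its TpK, with `None` standing for position ‘A’), and the sample values are hypothetical. Two points are assumptions rather than statements from the text: a horse with a recorded TpK is taken to defeat an ‘A’ horse in a non-handicap race, and exactly equal TpK values are treated as a draw.

```python
from itertools import combinations


def pairwise_outcomes(tpk, handicap):
    """Return {(i, j): outcome} for all ordered pairs of horses in one race.

    tpk      -- dict: horse id -> time per kilometre, or None for position 'A'
    handicap -- True if horses may cover different distances in this race
    outcome  -- +1 (i defeats j), -1 (i loses to j), 0 (draw)
    """
    outcomes = {}
    for i, j in combinations(sorted(tpk), 2):
        ti, tj = tpk[i], tpk[j]
        if ti is None and tj is None:
            result = 0                      # both 'A': not comparable
        elif ti is None or tj is None:
            if handicap:
                result = 0                  # one 'A', unequal distances: draw
            else:
                # Assumption: at equal distance the timed horse is ahead.
                result = 1 if tj is None else -1
        elif ti < tj:
            result = 1                      # lower TpK means faster
        elif ti > tj:
            result = -1
        else:
            result = 0                      # equal TpK: assumed draw
        outcomes[(i, j)] = result
        outcomes[(j, i)] = -result          # antisymmetric by construction
    return outcomes


race = {"h1": 78.4, "h2": 79.1, "h3": None}   # hypothetical TpK values
autostart = pairwise_outcomes(race, handicap=False)
with_handicap = pairwise_outcomes(race, handicap=True)
print(autostart[("h1", "h2")], autostart[("h3", "h1")], with_handicap[("h1", "h3")])
# prints: 1 -1 0
```

Storing both orderings of each pair keeps the later matrix accumulation symmetric without extra bookkeeping.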

In this manner, after data cleaning and exclusion of incomplete records, the final dataset comprises 421,329 records, among which there are 9271 animals, 2140 drivers, and 56,882 races or events.

3. Notation and Definitions of Parameters and Quantities

The notation and statistical quantities required for constructing the strength indicators are defined below.

N: Number of horses in the database.
The collection of events is denoted by $T = \{t_h : h = 1, \dots, H\}$, where H represents the total number of events (or competitions), with distinct editions of the same tournament considered as separate events.
The available data cover a time interval denoted by $[0, T_G]$. Thus, an event or tournament $t \in T$ occurs at a specific time instant $\tau(t) \in [0, T_G]$. In practice, $\tau(t)$ corresponds to the calendar date of event t, expressed as the number of days elapsed since the first recorded race.
Given an event $t_0$, the collection of events held prior to it, that is, within the time interval $[0, \tau(t_0)]$, is denoted by the following:

$T[t_0] = \{t \in T : \tau(t) \leq \tau(t_0)\} \qquad (1)$
Each event or competition (for each distinct edition) is assigned a category based on an ordinal variable with five levels: $A > B > C > D > E$. As discussed in the Introduction, victories or high placements in top-category events (A or B) are more significant than successes in lower-category events and should therefore contribute more to indicators of horse quality. Thus, given an event $t_0 \in T$ with $\tau(t_0) = \tau_0$, we consider the collection of events held prior to it, $T[t_0]$. Each event in this collection can be assigned a category correction coefficient, such as

$\text{for } t \in T[t_0]: \quad \omega(t) = \begin{cases} 1.0 & \text{if the category of } t \text{ is A} \\ 0.8 & \text{if the category of } t \text{ is B} \\ 0.6 & \text{if the category of } t \text{ is C} \\ 0.4 & \text{if the category of } t \text{ is D} \\ 0.2 & \text{if the category of } t \text{ is E} \end{cases} \qquad (2)$

Thus, the application of these coefficients weights the outcome of a Category C event at 60% of the value assigned to an identical outcome in a Category A event. The coefficients were assigned according to a monotonically decreasing scheme, ensuring that higher-level events contribute proportionally more to the final indicator. This design preserves ordinal consistency and allows for future calibration through data-driven optimization.
Each event (for each distinct edition) is associated with a distance (in meters), coded as an ordinal categorical variable with levels short, medium, and long, according to the categorization described in the preceding section. It is denoted by

$\forall t \in T: \quad \mathrm{dist}(t) = s \text{ (short)},\ m \text{ (medium)}, \text{ or } l \text{ (long)}. \qquad (3)$

Given that a horse’s performance potential may vary with distance, this factor must be considered in the quality indicators designed to characterize the horses. Accordingly, correction coefficients can be assigned based on the similarity to the distance of the event $t_0$ under consideration, analogous to the approach used for event categories. For $t \in T[t_0]$:

$\begin{array}{ll} \text{if } \mathrm{dist}(t_0) = s: & \delta^{[s]}(t) = \begin{cases} 1.0 & \text{if } \mathrm{dist}(t) = s \\ 0.6 & \text{if } \mathrm{dist}(t) = m \\ 0.4 & \text{if } \mathrm{dist}(t) = l \end{cases} \\[2ex] \text{if } \mathrm{dist}(t_0) = m: & \delta^{[m]}(t) = \begin{cases} 1.0 & \text{if } \mathrm{dist}(t) = m \\ 0.5 & \text{if } \mathrm{dist}(t) = s \text{ or } l \end{cases} \\[2ex] \text{if } \mathrm{dist}(t_0) = l: & \delta^{[l]}(t) = \begin{cases} 0.4 & \text{if } \mathrm{dist}(t) = s \\ 0.6 & \text{if } \mathrm{dist}(t) = m \\ 1.0 & \text{if } \mathrm{dist}(t) = l \end{cases} \end{array} \qquad (4)$

As in the preceding case, the selection of these coefficients follows a criterion consistent with the study objective, though they may be adjusted if necessary. Specifically, higher similarity in distance to the target event results in higher weights, reflecting a logical and coherent weighting scheme.
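The coefficient schemes in Equations (2) and (4) amount to two small lookup tables. The sketch below encodes them in Python; the names `CATEGORY_WEIGHT`, `DISTANCE_WEIGHT`, `distance_class`, `omega`, and `delta` are hypothetical, and the distance thresholds are the ones given in Section 2.

```python
# Eq. (2): category correction coefficient omega(t)
CATEGORY_WEIGHT = {"A": 1.0, "B": 0.8, "C": 0.6, "D": 0.4, "E": 0.2}

# Eq. (4): DISTANCE_WEIGHT[g][d] is delta^{[g]}(t) when dist(t0) = g, dist(t) = d
DISTANCE_WEIGHT = {
    "s": {"s": 1.0, "m": 0.6, "l": 0.4},
    "m": {"s": 0.5, "m": 1.0, "l": 0.5},
    "l": {"s": 0.4, "m": 0.6, "l": 1.0},
}


def distance_class(metres):
    """Map a race distance in metres to 's', 'm' or 'l' (Section 2 grouping)."""
    if metres < 2000:
        return "s"
    if metres <= 2200:
        return "m"
    return "l"


def omega(category):
    """Category coefficient of a past event."""
    return CATEGORY_WEIGHT[category]


def delta(target_metres, past_metres):
    """Distance coefficient of a past event relative to the target event t0."""
    g = distance_class(target_metres)
    return DISTANCE_WEIGHT[g][distance_class(past_metres)]


print(omega("C"), delta(2100, 1600), delta(1600, 2400))
# prints: 0.6 0.5 0.4
```

Keeping the coefficients in plain tables makes it straightforward to swap in recalibrated values later, as the text anticipates.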

Quantities or measures related to horse outcomes must be referenced either to a time instant $\tau \in [0, T_G]$ or to an event or tournament $t \in T$. For any pair of horses $(i, j)$, the notation ‘$i \gg_{[t]} j$’ indicates that horse i defeated horse j in event t.

Thus, given an event $t_0$ (or an instant $\tau_0$), we can define the matrix of prior victories to that event or instant, $W[t_0]$ or $W[\tau_0]$, as an $N \times N$ victory indicator matrix. Each element counts the number of victories of one horse over another, defined for each pair of horses $(i, j)$, $1 \leq i, j \leq N$, as follows:

$W[t_0](i, j) = \#\{t \in T[t_0] : i \gg_{[t]} j\} \qquad (5)$

and, for $i = 1, \dots, N$,

$W[t_0](i, i) = 0. \qquad (6)$

This matrix, $W[t_0] = \{W[t_0](i, j)\}$, is not symmetric and does not account for possible draws between two horses. Furthermore, the following considerations can be made regarding it:

Given horse i, the number of ‘victories over other horses’ prior to event $t_0$ is given by

$W[t_0][i] = \sum_{j=1}^{N} W[t_0](i, j) \qquad (7)$
Given a pair of horses $(i, j)$ and an event or tournament $t \in T$, we write $(i, j) \in t$ if both horses participated in event t. Thus,

$W[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} I_t(i \gg j) \qquad (8)$

where $I_t(i \gg j)$ is the indicator function:

$I_t(i \gg j) = \begin{cases} 1 & \text{if } i \gg_{[t]} j \\ 0 & \text{otherwise} \end{cases} \qquad (9)$
$W[t_0](j, i)$ represents the number of events in which horse j defeated horse i prior to event $t_0$, which accounts for the non-symmetry of $W[t_0]$. The matrix $W[t_0] + W[t_0]^{\top}$ is symmetric and represents the total number of pairwise confrontations (excluding ties) between each pair of horses over the time interval $[0, \tau(t_0)]$.

Analogously, the draw matrix $W_E[t_0] = \{W_E[t_0](i, j)\}$ can be defined. A draw between horses i and j in an event t is denoted by $i \equiv_{[t]} j$, and the corresponding draw indicator function is defined as

$I_t(i \equiv j) = \begin{cases} 1 & \text{if } i \equiv_{[t]} j \\ 0 & \text{otherwise} \end{cases} \qquad (10)$

Then,

$W_E[t_0](i, j) = \#\{t \in T[t_0] : i \equiv_{[t]} j\} = \sum_{t \in T[t_0] : (i, j) \in t} I_t(i \equiv j) \qquad (11)$

and the diagonal elements are null, $W_E[t_0](i, i) = 0$.

Finally, the confrontation matrix up to event $t_0$ is defined as

$W_F[t_0] = W_E[t_0] + W[t_0] + W[t_0]^{\top}, \qquad (12)$

which is to say,

$W_F[t_0](i, j) = W_E[t_0](i, j) + W[t_0](i, j) + W[t_0](j, i), \qquad (13)$

where the element $(i, j)$ of $W_F[t_0]$ represents the total number of confrontations between the pair of horses $(i, j)$ prior to the instant $\tau(t_0)$. This is a symmetric matrix.
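The counting definitions in Equations (5) to (13) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: horses are indexed from 0 to N − 1, each event is a dictionary that lists every unordered pair once with an outcome in {+1, −1, 0}, and the caller has already restricted the list to the events in $T[t_0]$. The function name and sample data are hypothetical.

```python
def count_matrices(n_horses, events):
    """Build the victory (W), draw (W_E) and confrontation (W_F) matrices.

    events -- list of dicts {(i, j): outcome}, one dict per prior event,
              each unordered pair listed once; outcome +1 means i defeated j,
              -1 means j defeated i, 0 means a draw
    """
    W = [[0] * n_horses for _ in range(n_horses)]
    W_E = [[0] * n_horses for _ in range(n_horses)]
    for outcomes in events:
        for (i, j), result in outcomes.items():
            if result == 1:
                W[i][j] += 1            # Eq. (5): i defeated j
            elif result == -1:
                W[j][i] += 1            # j defeated i
            else:
                W_E[i][j] += 1          # Eq. (11): draws are symmetric
                W_E[j][i] += 1
    # Eq. (13): W_F(i, j) = W_E(i, j) + W(i, j) + W(j, i)
    W_F = [[W_E[i][j] + W[i][j] + W[j][i] for j in range(n_horses)]
           for i in range(n_horses)]
    return W, W_E, W_F


events = [
    {(0, 1): 1, (0, 2): 1, (1, 2): 0},   # horse 0 beats 1 and 2; 1 and 2 draw
    {(0, 1): -1},                        # later event: horse 1 beats horse 0
]
W, W_E, W_F = count_matrices(3, events)
print(W_F)
# prints: [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

The printed confrontation matrix is symmetric with a zero diagonal, as the definitions require.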

These matrices summarize the number of victories and confrontations between each pair of horses over the entire period preceding the construction of the matrix, i.e., prior to event $t_0$. When these matrices are used to assess the likely outcome of a future confrontation between two horses, particularly in event $t_0$, they capture the historical performance of each animal but conflate its current form with its average performance over the entire past. To address this, a weighting scheme must be incorporated to assign greater importance to more recent results. One approach is to apply kernel-type functions, which assign higher weights to events closer in time to the prediction or adjustment point, with the weight gradually decreasing for events further away. In this context, temporal proximity is defined by the elapsed time until the event under consideration for prediction or adjustment.

A kernel function suitable for this purpose is the so-called Triweight Kernel (Gramacki [10]):

$\kappa(z) = \begin{cases} \frac{35}{32} \left(1 - (z / p_b)^2\right)^3 & z \in (-p_b, p_b) \\ 0 & \text{otherwise} \end{cases} \qquad (14)$

where $p_b$ is the base period, or bandwidth, which must be fixed a priori. Note that when weighting with $\kappa(z)$, events occurring $p_b$ days or more before the reference event receive zero weight and are therefore excluded from the projection. Conversely, events very close in time to the reference event receive a weight close to $\frac{35}{32}$. In the present study, the bandwidth is set to four years, with the unit of time being days, so that $p_b = 1460$. This window was selected based on expert knowledge regarding the competitive lifespan and performance maturation of trotting horses, which typically evolve over multi-year periods. As such, a four-year span provides a biologically and contextually meaningful scale for temporal weighting in this discipline.

Kernel weighting ensures a smooth temporal decay of past results. Among several possible choices (e.g., Gaussian, Epanechnikov, or Triweight), the Triweight kernel offers compact support and finite influence, making it particularly suitable for dynamic performance modeling (Gramacki [10]).
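To make the decay concrete, the kernel of Equation (14) can be written as a short Python function. The function name and the default argument are hypothetical; the constant $35/32$ and the bandwidth $p_b = 1460$ days are taken from the text. As printed, the kernel is not rescaled by $1/p_b$, so it is a weight profile rather than a density over days; the constant cancels in any ratio of weighted sums.

```python
P_B = 1460.0   # bandwidth in days (four years), as stated in the text


def triweight(z, p_b=P_B):
    """Triweight kernel of Eq. (14); z is a time difference in days."""
    if abs(z) >= p_b:
        return 0.0                      # outside (-p_b, p_b): no influence
    return 35.0 / 32.0 * (1.0 - (z / p_b) ** 2) ** 3


# Weight of a result observed today, one, two and four years before t0
for days_before in (0, 365, 730, 1460):
    print(days_before, round(triweight(-days_before), 4))
# prints:
# 0 1.0938
# 365 0.9012
# 730 0.4614
# 1460 0.0
```

A result two years old thus carries less than half the weight of a result from the day of the reference event, and anything four or more years old is ignored.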

Thus, for an event $t_0$, the elements of the time-weighted matrix of prior victories are defined as follows:

$M[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} \psi(t; t_0)\, I_t(i \gg j) \qquad (15)$

where $\psi(t; t_0)$ is the weight determined by the kernel function for event t relative to event $t_0$, that is,

$\psi(t; t_0) = \kappa(\tau(t) - \tau(t_0)) \qquad (16)$

with $\tau(t)$ being the time instant at which event t occurs. Thus, $|\tau(t) - \tau(t_0)|$ represents the time elapsed between the two events; because $\kappa$ is an even function, the sign of the difference does not affect the weight.

Analogously, the time-weighted draw matrices $M_E[t_0]$ can be defined, with elements

$M_E[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} \psi(t; t_0)\, I_t(i \equiv j), \qquad (17)$

and the time-weighted confrontation matrices

$M_F[t_0] = M_E[t_0] + M[t_0] + M[t_0]^{\top}. \qquad (18)$

Using the preceding matrices, the time-weighted victory ratio matrix, $M_R[t_0] = \{M_R[t_0](i, j)\}$, can be defined, with the elements given by

$M_R[t_0](i, j) = \begin{cases} \dfrac{M[t_0](i, j)}{M_F[t_0](i, j)} & \text{for } 1 \leq i \neq j \leq N \text{ and } M_F[t_0](i, j) \neq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (19)$

That is to say, $M_R[t_0](i, j)$ represents the proportion of the confrontations between horses i and j that were won by horse i, time-weighted through the Triweight Kernel function.

These matrices collectively capture the historical head-to-head relationships among horses and provide the mathematical foundation for constructing time-weighted indicators of relative strength. This framework can be readily generalized to other sports that involve pairwise comparisons.
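A compact sketch of Equations (15) to (19) is given below, assuming a hypothetical layout in which each event is a pair `(tau, outcomes)`, with `tau` in days and `outcomes` mapping each unordered pair of 0-based horse indices to +1, −1, or 0. Events are filtered with $\tau(t) \leq \tau(t_0)$, exactly as Equation (1) is written; a caller who needs to keep the result of $t_0$ itself out of the matrices would have to drop that event beforehand.

```python
def triweight(z, p_b=1460.0):
    """Triweight kernel of Eq. (14)."""
    if abs(z) >= p_b:
        return 0.0
    return 35.0 / 32.0 * (1.0 - (z / p_b) ** 2) ** 3


def time_weighted_matrices(n_horses, events, tau0, p_b=1460.0):
    """Return (M, M_F, M_R) of Eqs. (15), (18) and (19) for reference time tau0."""
    M = [[0.0] * n_horses for _ in range(n_horses)]
    M_E = [[0.0] * n_horses for _ in range(n_horses)]
    for tau, outcomes in events:
        if tau > tau0:
            continue                        # only events in T[t0], Eq. (1)
        psi = triweight(tau - tau0, p_b)    # Eq. (16)
        for (i, j), result in outcomes.items():
            if result == 1:
                M[i][j] += psi              # Eq. (15)
            elif result == -1:
                M[j][i] += psi
            else:
                M_E[i][j] += psi            # Eq. (17)
                M_E[j][i] += psi
    M_F = [[M_E[i][j] + M[i][j] + M[j][i] for j in range(n_horses)]
           for i in range(n_horses)]
    M_R = [[M[i][j] / M_F[i][j] if i != j and M_F[i][j] > 0 else 0.0
            for j in range(n_horses)] for i in range(n_horses)]
    return M, M_F, M_R


# Horse 0 won an old race (day 0); horse 1 won a recent one (day 1000).
events = [(0, {(0, 1): 1}), (1000, {(0, 1): -1})]
M, M_F, M_R = time_weighted_matrices(2, events, tau0=1100)
print(round(M_R[0][1], 3), round(M_R[1][0], 3))
```

Unweighted counts would give each horse a ratio of 0.5 here. With the kernel, the old win is weighted far less than the recent defeat, so the ratio for horse 0 falls well below 0.5.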

3.1. Corrections According to Event Category

As noted previously, when defining a strength indicator for each horse, it is important to account for the categories of the events in which the horse has participated. To this end, we construct analogous matrices that incorporate event category through the coefficients defined in Equation (2) and the prize-money levels (A–E) described in Section 2.

Each event or competition (for each distinct edition) is associated with a category represented by an ordinal variable with five levels: $A > B > C > D > E$. The correction coefficients specified in Equation (2) are applied accordingly. Based on this information, the following matrices and derived quantities can be constructed:

Time-weighted and category-corrected victory matrix:

$M^{[\omega]}[t_0]: \quad M^{[\omega]}[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} \omega(t)\, \psi(t; t_0)\, I_t(i \gg j) \qquad (20)$
Time-weighted and category-corrected draw matrix:

$M_E^{[\omega]}[t_0]: \quad M_E^{[\omega]}[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} \omega(t)\, \psi(t; t_0)\, I_t(i \equiv j) \qquad (21)$
Time-weighted and category-corrected confrontation matrix:

$M_F^{[\omega]}[t_0] = M_E^{[\omega]}[t_0] + M^{[\omega]}[t_0] + \left(M^{[\omega]}[t_0]\right)^{\top} \qquad (22)$

These matrices, respectively, represent victories, draws, and confrontations, all weighted by time and corrected according to the coefficients corresponding to event categories. Analogous to the previous procedure, the time-weighted and category-corrected victory ratio matrices can be defined as follows:

$M_R^{[\omega]}[t_0](i, j) = \begin{cases} \dfrac{M^{[\omega]}[t_0](i, j)}{M_F[t_0](i, j)} & \text{for } 1 \leq i \neq j \leq N \text{ and } M_F^{[\omega]}[t_0](i, j) \neq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (23)$

The category coefficients $\omega(t)$ follow a monotonically decreasing scheme (see Equation (2)), ensuring ordinal coherence: higher-level events contribute proportionally more to the indicator. This design is theoretically consistent and may be calibrated using data in future work, for instance, via constrained cross-validation under monotonicity.

Note that in defining the ratio associated with the pair $(i, j)$, the corresponding element from the time-weighted confrontation matrix, $M_F[t_0]$, is retained as the denominator. In other words, the ratio quantifies the time-weighted and category-corrected victories relative to the total time-weighted confrontations between the two horses. By construction, $M_R^{[\omega]}[t_0](i, j) \in [0, 1]$ and inherits the time decay induced by $\psi(t; t_0)$. Using the uncorrected denominator $M_F[t_0](i, j)$ preserves the head-to-head frequency scale.
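The effect of keeping the uncorrected denominator can be illustrated for a single pair of horses. The sketch below computes both the plain ratio of Equation (19) and the category-corrected ratio of Equation (23) from a hypothetical head-to-head history; the function name, the tuple layout `(tau, category, outcome)`, and the sample histories are assumptions made for illustration.

```python
OMEGA = {"A": 1.0, "B": 0.8, "C": 0.6, "D": 0.4, "E": 0.2}   # Eq. (2)


def triweight(z, p_b=1460.0):
    """Triweight kernel of Eq. (14)."""
    if abs(z) >= p_b:
        return 0.0
    return 35.0 / 32.0 * (1.0 - (z / p_b) ** 2) ** 3


def pair_ratios(history, tau0, p_b=1460.0):
    """Return (plain ratio, category-corrected ratio) of horse i against horse j.

    history -- list of (tau, category, outcome) for their shared prior events;
               outcome is +1 (i won), -1 (j won) or 0 (draw)
    """
    wins = wins_cat = confrontations = 0.0
    for tau, category, outcome in history:
        psi = triweight(tau - tau0, p_b)
        confrontations += psi                   # element of M_F, uncorrected
        if outcome == 1:
            wins += psi                         # element of M
            wins_cat += OMEGA[category] * psi   # element of M^[omega]
    if confrontations == 0.0:
        return 0.0, 0.0
    return wins / confrontations, wins_cat / confrontations


only_e = [(900, "E", 1), (950, "E", 1)]   # two wins, both in category E
only_a = [(900, "A", 1), (950, "A", 1)]   # two wins, both in category A
print(pair_ratios(only_e, tau0=1000))
print(pair_ratios(only_a, tau0=1000))
```

With the formulas as written, an unbeaten head-to-head record earned only in category E events yields a corrected ratio of about 0.2, whereas the same record in category A events yields 1.0. The corrected ratio therefore never exceeds the plain one.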

3.2. Corrections According to Event Distance

Analogous matrices can be constructed by incorporating event distance. Each event or competition (for each distinct edition) has a distance, measured in meters, represented by an ordinal categorical variable: short, medium, and long distance (see Section 2 for operational definitions: short < 2000 m; medium [2000, 2200] and long > 2200 m). In Equation (4), correction coefficients are defined based on the similarity between the distance of the event under consideration and that of the event being adjusted. Accordingly, the following matrices and derived quantities are defined:

Time-weighted and distance-corrected victory matrix: if $\mathrm{dist}(t_0) = g$ with $g = s \text{ (short)},\ m \text{ (medium)},\ l \text{ (long)}$, then

$M^{[\delta]}[t_0]: \quad M^{[\delta]}[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} \delta^{[g]}(t)\, \psi(t; t_0)\, I_t(i \gg j) \qquad (24)$
Time-weighted and distance-corrected draw matrix: if $\mathrm{dist}(t_0) = g$, then

$M_E^{[\delta]}[t_0]: \quad M_E^{[\delta]}[t_0](i, j) = \sum_{t \in T[t_0] : (i, j) \in t} \delta^{[g]}(t)\, \psi(t; t_0)\, I_t(i \equiv j) \qquad (25)$
Time-weighted and distance-corrected confrontation matrix:

$M_F^{[\delta]}[t_0] = M_E^{[\delta]}[t_0] + M^{[\delta]}[t_0] + \left(M^{[\delta]}[t_0]\right)^{\top} \qquad (26)$

Analogous to the preceding procedure, the ratio matrices can be defined. That is, the time-weighted and distance-corrected victory ratio matrix is given by

$M_R^{[\delta]}[t_0](i, j) = \begin{cases} \dfrac{M^{[\delta]}[t_0](i, j)}{M_F[t_0](i, j)} & \text{for } 1 \leq i \neq j \leq N \text{ and } M_F^{[\delta]}[t_0](i, j) \neq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (27)$

As in the definition of the time-weighted and category-corrected victory ratio matrix, the uncorrected denominator $M_F[t_0](i, j)$ is also used in this case, preserving the head-to-head frequency scale.

4. Strength Indicators

Performance and strength indicators are fundamental metrics for assessing and evaluating athletic performance in sports science. These indicators provide quantifiable data that facilitate the evaluation of an athlete’s or team’s efficiency and effectiveness. They are essential for understanding how results vary as a function of different factors and can inform the adjustment of training programs to enhance performance. Such indicators can capture multiple dimensions, including biomechanical, psychological, technical–tactical, biological–functional, biochemical, and anthropometric–morphological aspects (see Urdampilleta et al. [11]). Extensive research has addressed this topic in team sports such as soccer (Herold et al. [12]), basketball (García et al. [13]), and baseball (Mercier et al. [14]), as well as in individual sports such as athletics (Johns et al. [15]) and cycling (Phillips and Hopkins [16]). However, when the objective is to construct an Elo-type system, indicators based on the physical or psychological characteristics of the athlete are less suitable, as such systems are primarily result-driven, as exemplified by the original Elo system in chess. Accordingly, in our study, we rely on the horse’s historical performance results to construct these indicators.

Clearly, a rigorous mathematical foundation and the application of appropriate statistical techniques and models are necessary, in line with the assertion of Lames and McGarry [17]: ‘performance analysis for purposes of theoretical advancement must make use of mathematical modeling and simulation techniques’.

Based on the matrices defined in the previous section, strength or performance measures can be constructed for each horse, incorporating information from their recent past performance.

First, we can consider the victory ratio matrix $M_R[t_0]$, in which kernel weighting is applied to assign greater weight to recent events. The $(i, j)$-th element of this matrix, $M_R[t_0](i, j)$, represents the time-weighted victory ratio of horse i over horse j. Accordingly, the i-th row contains the victory ratios of the i-th horse against all other horses in the study. Thus, the time-weighted mean victory ratio of the i-th horse, defined as the mean of the values in the i-th row of $M_R[t_0]$ taken over the opponents it has actually faced, is given by

$M_R[t_0][i] = \frac{1}{m[t_0](i)} \sum_{j=1}^{N} M_R[t_0](i, j) \qquad (28)$

where

$m[t_0](i) = \#\{j \in \{1, \dots, N\} : M_F[t_0](i, j) > 0\} \qquad (29)$

is the number of opponents against which horse i has a non-zero time-weighted confrontation count. This ratio represents a measure of the horse’s prior performance.

It is worth highlighting several key aspects of this strength indicator, as outlined in the following comments:

1. It takes values in the interval $[0, 1]$ and represents an index of the horse’s overall winning ability at the instant $τ (t_{0})$ , with more recent events (closer to that instant) carrying greater weight.
2. It only considers the victory ratios achieved against horses with which the horse has competed in at least one event prior to the instant $τ (t_{0})$ . If the i-th horse has not participated in any event before $t_{0}$ (i.e., $m [t_{0}] (i) = 0$ ), then the indicator $M_{R} [t_{0}] [i]$ is not defined. In other words, this indicator is assigned only to horses that have previously participated in at least one event.
3. The value 0 is assigned to any horse that, across all events in which it has competed during the considered time window, has never finished ahead of another horse.
4. The value 1 is assigned to any horse that has won all events in which it has competed during the considered time window, meaning it has not been defeated by any other horse.
5. If a horse has participated in only a few events within the considered time window, the indicator may be unstable and provide limited information about its true strength. Consequently, for young horses at the beginning of their competitive careers, this measure may be unreliable. However, as the horse competes in more events, the indicator becomes increasingly stable and emerges as a reliable measure of its strength.
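
As a minimal sketch of Equations (28) and (29), the row mean can be computed as follows. The function name and the list-based representation of the rows of $M_{R} [t_{0}]$ and $M_{F} [t_{0}]$ are illustrative assumptions, as is the convention that entries of the ratio matrix for opponents never faced are zero (so that summing over faced opponents equals summing over all j).

```python
from typing import Optional, Sequence


def mean_victory_ratio(ratio_row: Sequence[float],
                       confrontation_row: Sequence[float]) -> Optional[float]:
    """Row mean of the victory ratio matrix over opponents actually faced.

    ratio_row         : i-th row of M_R[t0] (time-weighted victory ratios)
    confrontation_row : i-th row of M_F[t0] (time-weighted confrontations)
    """
    # m[t0](i): number of opponents j with a positive confrontation weight
    faced = [j for j, f in enumerate(confrontation_row) if f > 0]
    if not faced:
        # No prior head-to-head record: the indicator is not defined
        return None
    # Assumption: ratios for opponents never faced are zero, so restricting
    # the sum to faced opponents matches the sum over j = 1..N in Eq. (28)
    return sum(ratio_row[j] for j in faced) / len(faced)


# Toy rows for one horse against four competitors (illustrative values only)
example = mean_victory_ratio([0.0, 0.75, 0.25, 0.0], [0.0, 2.0, 1.0, 0.0])
```

Returning `None` when no opponent has been faced mirrors the remark that the indicator is assigned only to horses with at least one prior event.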

To illustrate the indicator’s adequacy, values were obtained for five randomly selected horses from the database, specifically among those that have participated in at least 80 races or events and have competed across all three race distance categories, ensuring that the following conditions are met:

Horse 1: Has finished in first position two or more times in category A races.
Horse 2: Has finished in first position two or more times in category B races, but has not won any category A event.
Horse 3: Has finished in first position two or more times in category C races, but has not won any events in categories A or B.
Horse 4: Has finished in first position two or more times in category D races, but has not won any events in categories A, B, or C.
Horse 5: Has finished in first position two or more times in category E races, but has not won any events in categories A, B, C, or D.

The evolution of the indicators for these horses over time is represented in Figure 1.

The behavior pattern of the mean victory ratio supports comment 5. A slight initial increase in the indicator can be observed, reflecting the horse’s accumulation of experience and gradual stabilization of performance, followed by a decrease in the final phase of its competitive lifespan. Additionally, the indicator generally preserves the intuitive ranking of the horses’ potential implied by the selection criteria employed (e.g., Horse 1 outperforming Horse 2, Horse 2 outperforming Horse 3, etc.). However, the differentiation based on performance in higher-category events is not explicitly captured by this indicator, as its definition does not account for event categories; this limitation is addressed by the category-corrected indicator described below.

Given the potential influence of event categories on the strength measure, an analogous indicator can be defined using the time-weighted and category-corrected victory ratio matrix

M_{R}^{[ω]} [t_{0}]

. Accordingly, the time-weighted and category-corrected mean victory ratio for horse i is defined as

M_{R}^{[ω]} [t_{0}] [i] = \frac{1}{m [t_{0}] (i)} \sum_{j = 1}^{N} M_{R}^{[ω]} [t_{0}] (i, j)

(30)

where

m [t_{0}] (i)

was defined in Equation (29). Only the numerator is category-modified (see Equation (23)), preserving the head-to-head frequency scale in the denominator.

This quantity can be interpreted as a measure of the winning capacity or potential of horse i in race or event

t_{0}

, or at the instant

τ (t_{0})

.

The definition of this indicator incorporates both the horse’s recent performance history (through kernel weighting within the considered time window) and the relevance of the events in which the horse has participated during that period (through the correction coefficients associated with event categories).

Consequently, it takes values in the interval

[0, 1]

and serves as an index of the horse’s overall strength or winning capacity at the instant

τ (t_{0})

, giving greater weight to more recent events and to results achieved in higher-category competitions. Furthermore, comments 2, 3, and 5 regarding

M_{R} [t_{0}] [i]

are also applicable to

M_{R}^{[ω]} [t_{0}] [i]

.

With regard to comment 4, for this ratio, a value of 1 is assigned to any horse that has participated exclusively in Category A events during the considered time window and has won all of them. Conversely, a horse competing solely in Category E events, even if it wins all of them, will not exceed a value of 0.2 for this indicator.
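
The bounds just described can be illustrated with a small sketch. It assumes, hypothetically, that the category correction multiplies each time-weighted win by a category coefficient while the denominator keeps the uncorrected confrontation weights, and that the coefficients are 1.0, 0.8, 0.6, 0.4 and 0.2 for categories A to E. Only the end points (1 for A, 0.2 for E) are suggested by the text; the intermediate values and all names below are illustrative, and Equation (23) remains the authoritative definition.

```python
# Hypothetical category coefficients; only A -> 1.0 and E -> 0.2 are
# suggested by the bounds quoted in the text.
OMEGA = {"A": 1.0, "B": 0.8, "C": 0.6, "D": 0.4, "E": 0.2}


def corrected_pair_ratio(meetings):
    """Category-corrected victory ratio of horse i over horse j.

    meetings: list of (kernel_weight, category, i_beat_j) tuples, one per
    past event in which both horses took part.
    """
    # Numerator: kernel-weighted wins, scaled by the category coefficient
    numerator = sum(w * OMEGA[cat] for w, cat, won in meetings if won)
    # Denominator: kernel-weighted confrontations, not category-modified
    denominator = sum(w for w, _, _ in meetings)
    return numerator / denominator if denominator > 0 else None


# A horse that wins every meeting, first only in category E, then only in A
only_e = corrected_pair_ratio([(1.0, "E", True), (0.5, "E", True)])
only_a = corrected_pair_ratio([(1.0, "A", True), (0.5, "A", True)])
```

Under these assumptions every pairwise ratio of a horse that races only in category E is at most 0.2, so its row mean cannot exceed 0.2 either, while an unbeaten category A horse reaches 1.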

Following an approach analogous to that used for

M_{R} [t_{0}] [i]

, the values of this indicator,

M_{R}^{[ω]} [t_{0}] [i]

, for the five selected horses are shown in Figure 2. It is noteworthy that the differentiation of horses based on performance in higher-category events is clearly captured, providing a more accurate representation of the intuitive ranking among the selected horses.

Finally, the effect of race distance must be taken into account. To this end, we utilize the time-weighted and distance-corrected victory ratio matrices. As defined in the previous section, prior to the instant

τ_{0} = τ (t_{0})

of an event

t_{0}

, three time-weighted and distance-corrected victory ratio matrices are constructed, corresponding to the distance associated with that event,

dist (t_{0})

, with the weightings specified in Equation (4):

For short distance ( $dist (t_{0}) = s$ , short), at the instant $τ_{0} = τ (t_{0})$ , we consider the time-weighted and distance-corrected victory ratio matrix for short distances:

$M_{R}^{[δ, s]} [τ_{0}] = \{M_{R}^{[δ, s]} [τ_{0}] (i, j)\} : M_{R}^{[δ, s]} [τ_{0}] (i, j) = M_{R}^{[δ^{[s]}]} [t_{0}] (i, j) .$

(31)
For medium distance ( $dist (t_{0}) = m$ , medium), at the instant $τ_{0} = τ (t_{0})$ , we consider the time-weighted and distance-corrected victory ratio matrix for medium distances:

$M_{R}^{[δ, m]} [τ_{0}] = \{M_{R}^{[δ, m]} [τ_{0}] (i, j)\} : M_{R}^{[δ, m]} [τ_{0}] (i, j) = M_{R}^{[δ^{[m]}]} [t_{0}] (i, j) .$

(32)
For long distance ( $dist (t_{0}) = l$ , long), at the instant $τ_{0} = τ (t_{0})$ , we consider the time-weighted and distance-corrected victory ratio matrix for long distances:

$M_{R}^{[δ, l]} [τ_{0}] = \{M_{R}^{[δ, l]} [τ_{0}] (i, j)\} : M_{R}^{[δ, l]} [τ_{0}] (i, j) = M_{R}^{[δ^{[l]}]} [t_{0}] (i, j) .$

(33)

In the preceding notation, each matrix can be considered as corresponding to the time instant

τ_{0}

, or equivalently, the instant immediately preceding event

t_{0}

. Using these matrices, mean ratios can be calculated for each horse, serving as strength measures specific to each distance. Accordingly, the time-weighted and distance-corrected mean victory ratio for short distance for horse i is defined as

M_{R}^{[δ, s]} [τ_{0}] [i] = \frac{1}{m [t_{0}] (i)} \sum_{j = 1}^{N} M_{R}^{[δ, s]} [τ_{0}] (i, j)

(34)

where

m [t_{0}] (i)

was defined in Equation (29).

Analogously, the time-weighted and distance-corrected mean victory ratio for medium distance and the time-weighted and distance-corrected mean victory ratio for long distance for horse i are defined, respectively, as:

M_{R}^{[δ, m]} [τ_{0}] [i] = \frac{1}{m [t_{0}] (i)} \sum_{j = 1}^{N} M_{R}^{[δ, m]} [τ_{0}] (i, j)

(35)

and

M_{R}^{[δ, l]} [τ_{0}] [i] = \frac{1}{m [t_{0}] (i)} \sum_{j = 1}^{N} M_{R}^{[δ, l]} [τ_{0}] (i, j) .

(36)

For each horse, these three quantities can be interpreted as measures of the winning potential or capacity of the horse in the race or event

t_{0}

, or at the instant

τ (t_{0})

. As with the previously defined indicators, the definitions of

{M_{R}^{[δ, s]} [τ_{0}] [i], M_{R}^{[δ, m]} [τ_{0}] [i], M_{R}^{[δ, l]} [τ_{0}] [i]}

incorporate both the horse’s recent performance history (via kernel weighting within the considered time window) and the distances of the events in which the horse has participated (via the correction coefficients associated with the three distance categories). Consequently, these measures take values in the interval

[0, 1]

and, for a given distance, serve as indices of the horse’s strength or overall winning capacity at

τ (t_{0})

for events of that distance. These indices give greater weight to more recent events and to results achieved in competitions of similar distances. Additionally, comments 2, 3, and 5 regarding

M_{R} [t_{0}] [i]

also apply to these distance-corrected indicators, while comment 4 should be adapted analogously to the category-corrected ratio. Thus, each horse is assigned a separate ratio for each of the three distance modalities.

Analogous graphs were obtained for these indicators (see Figure 3), and similar comments apply regarding the evolution of the horses’ performance.

In summary, five strength indicators have been proposed for each horse (see Table 2).

$M_{R} [t_{0}] [i]$ : time-weighted mean victory ratio, which serves as a general strength indicator and accounts solely for results within the recent past as defined by the considered time window ( $p_{b}$ ).
$M_{R}^{[ω]} [t_{0}] [i]$ : time-weighted and category-corrected mean victory ratio, which serves as a general strength indicator that accounts for results in the recent past based on the time window considered, providing greater relevance to results obtained in high-category events.
${M_{R}^{[δ, s]} [τ_{0}] [i], M_{R}^{[δ, m]} [τ_{0}] [i], M_{R}^{[δ, l]} [τ_{0}] [i]}$ : time-weighted and distance-corrected mean victory ratio for short, medium, and long distance, respectively. These serve as specific strength indicators for each of the three distance modalities, taking into account results in the recent past based on the time window considered.

5. Validation of the Strength Indicators

The previously defined indicators can be regarded as measures of a horse’s strength and performance, as they are fundamentally based on results obtained in prior events. While additional analysis is not strictly necessary to establish them as strength indicators, a validation study is recommended to confirm their appropriateness with greater certainty. Validation ensures that the proposed indicators accurately measure the intended construct and provide consistent results across different occasions. The reliability of these measures has already been illustrated graphically in Figure 1, Figure 2 and Figure 3, which show that, when evaluated over time for a given horse, the indicator values progressively stabilize, remain consistent, and generally do not exhibit abrupt or anomalous fluctuations.

There is no exact or universally accepted measure of a horse’s strength, and data on the horse’s physical or biological characteristics are not available to directly assess the indicators’ ability to capture that strength. Consequently, we focus on a form of criterion validation, evaluating the relationship between the indicator values and other related measures or proxies of the horse’s strength. In particular, we assess the indicators’ ability to predict each horse’s performance in events or tournaments, i.e., their ‘predictive validity’ (see Cronbach and Meehl [18] and Clemens et al. [19]). Specifically, for an event t in which a collection of horses

C [t]

participates, we use the strength indicators of each horse to predict the victory outcome for every pair

(i, j) \in t

(

i > >_{[t]} j

or

j > >_{[t]} i

), and then validate these predictions against the real available data. All validation is performed under a strict out-of-time protocol: for each test event

t_{0}

, the indicators and matrices are computed using only data available prior to

τ (t_{0})

.

Given a pair of horses

(i, j)

scheduled to run a specific event or tournament

t_{0}

, denote the probability that “

i > >_{[t_{0}]} j

” by

μ [t_{0}] (i, j)

. Throughout, we denote

τ_{0} = τ (t_{0})

. Let

ψ (t; t_{0}) = κ (τ (t) - τ (t_{0}))

(defined in Equation (14)) and

d_{0} = dist (t_{0})

. The objective is to estimate this probability using the strength indicators: the general strength indicators,

M_{R} [t_{0}] [i]

and

M_{R} [t_{0}] [j]

, the general strength indicators corrected by event category,

M_{R}^{[ω]} [t_{0}] [i]

and

M_{R}^{[ω]} [t_{0}] [j]

, and the strength indicators specific to the event distance

d_{0} = dist (t_{0})

(

d_{0} = s, m

or l),

M_{R}^{[δ, d_{0}]} [τ_{0}] [i]

(or

M_{R}^{[δ, d_{0}]} [t_{0}] [i]

) and

M_{R}^{[δ, d_{0}]} [τ_{0}] [j]

(or

M_{R}^{[δ, d_{0}]} [t_{0}] [j]

).

Using these indicators, the following differences are defined:

\begin{matrix} D [t_{0}] [i, j] & = M_{R} [t_{0}] [i] - M_{R} [t_{0}] [j] \\ D^{[ω]} [t_{0}] [i, j] & = M_{R}^{[ω]} [t_{0}] [i] - M_{R}^{[ω]} [t_{0}] [j] \\ D^{[δ]} [t_{0}] [i, j] & = M_{R}^{[δ, d_{0}]} [t_{0}] [i] - M_{R}^{[δ, d_{0}]} [t_{0}] [j] \end{matrix}

(37)

The probability

μ [t_{0}] (i, j)

is estimated through a logistic regression model (see McCullagh and Nelder [20]), with the three differences in strength indicators as predictor variables. That is, we consider the model

μ [t_{0}] (i, j) = \frac{exp \{β_{1} D [t_{0}] [i, j] + β_{2} D^{[ω]} [t_{0}] [i, j] + β_{3} D^{[δ]} [t_{0}] [i, j]\}}{1 + exp \{β_{1} D [t_{0}] [i, j] + β_{2} D^{[ω]} [t_{0}] [i, j] + β_{3} D^{[δ]} [t_{0}] [i, j]\}}

(38)

or

log \frac{μ [t_{0}] (i, j)}{1 - μ [t_{0}] (i, j)} = β_{1} D [t_{0}] [i, j] + β_{2} D^{[ω]} [t_{0}] [i, j] + β_{3} D^{[δ]} [t_{0}] [i, j] .

(39)
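
The model in Equations (38) and (39) can be sketched as a plain function. The function name and the coefficient values used below are placeholders for illustration, not the estimates reported in the paper.

```python
import math


def win_probability(d_general, d_category, d_distance, beta):
    """Logistic model of Eq. (38): no intercept, three indicator differences.

    d_general  : D[t0][i, j], difference of the general indicators
    d_category : category-corrected difference
    d_distance : distance-specific difference for the distance of the event
    beta       : (beta_1, beta_2, beta_3)
    """
    # Linear predictor of Eq. (39), i.e. the log-odds that i beats j
    eta = beta[0] * d_general + beta[1] * d_category + beta[2] * d_distance
    # exp(eta) / (1 + exp(eta)), written in the equivalent sigmoid form
    return 1.0 / (1.0 + math.exp(-eta))


# Placeholder coefficients, chosen only to exercise the function
beta_demo = (1.0, 2.0, 3.0)
p_ij = win_probability(0.10, 0.05, -0.02, beta_demo)
p_ji = win_probability(-0.10, -0.05, 0.02, beta_demo)
```

Because the linear predictor has no intercept and the predictors are differences, swapping i and j flips its sign, so the two probabilities sum to one and two horses with identical indicators receive a probability of 0.5.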

For the estimation of the logistic regression model, a random sample or collection of mutually independent sample data is necessary. Therefore, from the collection of horses included in the database starting in 2005, a random sample was selected following the procedure described in Appendix A, for each of the three distances associated with the events.

It should be noted that the selection of pairs was performed randomly such that each horse appears at most once in the collection of pairs, and from each race or event only a single pair of horses was considered at most. Thus, the collection of records or cases included in

M [m]

can be considered mutually independent and, consequently, can be used for the adjustment of the logistic regression model.

The same procedure is applied for the other two distances, resulting in three samples

M [s]

,

M [m]

and

M [l]

, each with its corresponding dataset. The sample sizes obtained are 1823, 2244, and 1995 records, respectively. The logistic regression model is then fitted to each of these datasets. Given the potential for multicollinearity among the explanatory variables, the ‘elastic net’ regularized logistic regression model (Zou and Hastie [21]) is applied, utilizing the “glmnet” library in R (see Friedman et al. [22] and Tay et al. [23]). The regularization criterion implemented in the aforementioned library can be selected as a mixture of ridge and LASSO regularization, with a parameter

α

that takes values in the interval

[0, 1]

(

α = 0

for ridge regularization;

α = 1

for LASSO regularization). The regularization parameter

λ

is selected through cross-validation using the ‘cv.glmnet()’ function of the library, specifically to achieve the maximum value of the ‘area under the curve’ (AUC) criterion of the associated classification rule.
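
For reference, the mixture of ridge and LASSO penalties described above can be written out explicitly. The sketch below uses the parameterization documented for glmnet, in which the ridge term carries a factor of one half; the function name and the coefficient vector are illustrative only.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    l1_norm = sum(abs(b) for b in beta)        # LASSO part
    l2_squared = sum(b * b for b in beta)      # ridge part
    return lam * ((1.0 - alpha) * 0.5 * l2_squared + alpha * l1_norm)


beta_demo = (1.0, -2.0, 0.5)                   # illustrative coefficients
ridge_only = elastic_net_penalty(beta_demo, lam=0.1, alpha=0.0)
lasso_only = elastic_net_penalty(beta_demo, lam=0.1, alpha=1.0)
mixture = elastic_net_penalty(beta_demo, lam=0.1, alpha=0.3)
```

This quantity is added to the negative log-likelihood of the logistic model during fitting, so larger values of $λ$ shrink the coefficients more strongly, and values of $α$ closer to 1 favor setting some coefficients exactly to zero.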

Thus, for each of the event distances (short, medium, long), models were obtained for the values of

α \in \{0, 0.1, 0.2, \dots, 1\}

, with the

λ

parameter determined via cross-validation. Therefore, the fitted models were obtained for each distance. Table 3 presents the results obtained for LASSO regularization, ridge regularization, and the optimal

α

value according to the AUC criterion. The AUC values for all tested

α

levels across the three event distances are illustrated in Figure 4. In addition, the table includes the parameters for the regularization fitting (

λ

values) and the estimators of the model parameters for each event distance:

\{({\hat{β}}_{1}^{[d]}, {\hat{β}}_{2}^{[d]}, {\hat{β}}_{3}^{[d]}), d = s, m, l\}

.

In order to compare these results with those obtained using other classification methods, we applied the following techniques to each of the samples using the caret library (Kuhn [24]): Support Vector Machines with Linear Kernel, Support Vector Machines with Polynomial Kernel, Neural Networks, Bagging (Bagged CART), Random Forest, and eXtreme Gradient Boosting. The results, according to the AUC criterion, are comparable to those achieved through regularized logistic regression (see Appendix B). Since the latter approach enables a more straightforward interpretation of the role of each predictor based on the strength indicators, via the estimated odds ratios and the coefficients in the linear predictor, we opted to continue the validation of these indicators using the logistic regression model. Nevertheless, the validation process could be carried out using any of the aforementioned techniques, yielding virtually identical results.

Based on the coefficient estimates reported in Table 3, the probabilities outlined in Equation (38) can be estimated; that is:

\hat{μ} [t_{0}] (i, j) = \frac{exp \{{\hat{β}}_{1}^{[d_{0}]} D [t_{0}] [i, j] + {\hat{β}}_{2}^{[d_{0}]} D^{[ω]} [t_{0}] [i, j] + {\hat{β}}_{3}^{[d_{0}]} D^{[δ]} [t_{0}] [i, j]\}}{1 + exp \{{\hat{β}}_{1}^{[d_{0}]} D [t_{0}] [i, j] + {\hat{β}}_{2}^{[d_{0}]} D^{[ω]} [t_{0}] [i, j] + {\hat{β}}_{3}^{[d_{0}]} D^{[δ]} [t_{0}] [i, j]\}}

(40)

with

d_{0} = dist (t_{0})

.

Thus, the validation of each model will be based on comparing the actual results registered in the binary variable Y with the values fitted by the model, according to the rule:

For the pair

(i, j)

in event

t_{0}

:

If $\hat{μ} [t_{0}] (i, j) > 0.5$ then $\hat{Y} = 1$ ,

If $\hat{μ} [t_{0}] (i, j) \leq 0.5$ then $\hat{Y} = 0$ .

This classification rule is applied to all possible pairs of competing horses in each event from the year 2005 onward.

The total number of records, or horse pairs, for which the probabilities were estimated was 676,047. This large dataset enables a comprehensive evaluation of the forecasting performance based on the horses’ strength indicators. For this assessment, we follow the framework proposed by Williams et al. [25] in the context of ATP tennis match forecasting. Specifically, the performance measures considered include prediction accuracy, calibration, model discrimination, and the Brier Score [26]. The rationale for using these measures is that, while accuracy is often viewed as the most desirable property in predictive modeling, sensitivity to potential bias is also critical (Irons et al. [27]). Definitions of the aforementioned model performance measures are provided below:

Prediction accuracy ( $Acc$ ) measures how well a model’s predicted outcomes match reality, calculated as the ratio of correct predictions to total predictions.
The calibration ratio C is calculated as the sum of the victory probabilities of the horse with the highest probability, divided by the number of pairs or records in which this horse actually wins. For an event $t \in T$ , let the collections of pairs or records be:

$\begin{matrix} P_{t} = \{(i, j) \in t : \hat{μ} [t] (i, j) > 0.5\} \\ W_{t} = \{(i, j) \in t : i > >_{[t]} j and \hat{μ} [t] (i, j) > 0.5\} \end{matrix}$

(41)

Obviously, $W_{t} \subseteq P_{t}$ . We denote the collections for all events considered in the validation set as

$W = ⋃_{t \in T} W_{t} and P = ⋃_{t \in T} P_{t}$

(42)

Thus, the calibration ratio C is defined as

$C = \frac{1}{# (W)} \sum_{t \in T} \sum_{(i, j) \in P_{t}} \hat{μ} [t] (i, j)$

(43)

As noted by Williams et al. [25], the closer the value is to one, the better calibrated and less biased the prediction method and, consequently, the more representative of reality the estimated probabilities. A ratio greater than one indicates that the model overestimates the win probabilities of the horses it favors; conversely, a ratio less than one indicates that it underestimates them.
The discrimination D metric is calculated as the mean of the estimated probabilities of the pairs where the horse with the higher probability won, minus the mean of the estimated probabilities of the pairs where that horse lost (surprises). For an event $t \in T$ , let the collections of pairs be $W_{t}$ and

$W_{t}^{*} = \{(i, j) \in t : j > >_{[t]} i and \hat{μ} [t] (i, j) > 0.5\} .$

(44)

Respectively, let the unions for the collection of all events or tournaments considered be $W$ and

$W^{*} = ⋃_{t \in T} W_{t}^{*},$

(45)

with their mean estimated probabilities:

$\hat{μ} (W) = \frac{1}{# (W)} \sum_{t \in T} \sum_{(i, j) \in W_{t}} \hat{μ} [t] (i, j) and \hat{μ} (W^{*}) = \frac{1}{# (W^{*})} \sum_{t \in T} \sum_{(i, j) \in W_{t}^{*}} \hat{μ} [t] (i, j) .$

(46)

If $# (W^{*}) = 0$ , we consider $\hat{μ} (W^{*}) = 0$ . Thus, the discrimination metric is defined as

$D = \hat{μ} (W) - \hat{μ} (W^{*}) .$

(47)

Obviously, high values of this measure reflect greater discriminatory power.
The Brier score [26] is defined as the average of the squared differences between the predicted probability and the actual outcome in matches between two horses,

$S_{B} = \frac{1}{D_{B}} \sum_{t \in T} \sum_{(i, j) \in t} {\{\hat{μ} [t] (i, j) - I_{t} (i > > j)\}}^{2}$

(48)

where the total number of pairs or records analyzed, denoted as $D_{B}$ , is given by

$D_{B} = \sum_{t \in T} n [t] (n [t] - 1)$

(49)

where $n [t]$ is the number of horses that participated in event $t \in T$ . Obviously, a Brier score of 0 means perfect accuracy, and a Brier score of 1 means perfect inaccuracy. If the decision rule is practically random, i.e., with victory (or defeat) probabilities close to $0.5$ , $S_{B}$ takes values close to $0.25$ . Consequently, values less than $0.25$ will indicate ‘good accuracy’.
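
The four measures defined above can be computed together from a list of (estimated probability, outcome) records. The sketch below is illustrative: the function name and the toy records are assumptions, outcomes are taken to be binary (draws are not handled), and returning 0 for the mean over an empty collection of favored winners extends the convention the text states only for $W^{*}$ .

```python
def validation_metrics(records):
    """records: iterable of (mu_hat, y), with y = 1 if i beat j and 0 otherwise."""
    records = list(records)
    n = len(records)
    # Accuracy: the rule predicts Y = 1 when mu_hat > 0.5 and Y = 0 otherwise
    accuracy = sum((mu > 0.5) == (y == 1) for mu, y in records) / n
    favored = [(mu, y) for mu, y in records if mu > 0.5]   # collection P
    wins = [mu for mu, y in favored if y == 1]             # collection W
    upsets = [mu for mu, y in favored if y == 0]           # collection W*
    # Calibration ratio, Eq. (43): summed favored probabilities over #(W)
    calibration = (sum(mu for mu, _ in favored) / len(wins)
                   if wins else float("nan"))
    # Discrimination, Eq. (47); a mean over an empty collection is set to 0
    mean_wins = sum(wins) / len(wins) if wins else 0.0
    mean_upsets = sum(upsets) / len(upsets) if upsets else 0.0
    # Brier score, Eq. (48): mean squared error of the probabilities
    brier = sum((mu - y) ** 2 for mu, y in records) / n
    return {"accuracy": accuracy, "calibration": calibration,
            "discrimination": mean_wins - mean_upsets, "brier": brier}


# Two head-to-head results listed as ordered pairs: the favorite wins the
# first and loses the second (toy values chosen to be exact in binary)
toy = [(0.75, 1), (0.25, 0), (0.75, 0), (0.25, 1)]
metrics = validation_metrics(toy)
```

Listing both orderings of each pair matches the count $n [t] (n [t] - 1)$ in Equation (49), while the condition on the estimated probability ensures that each pair contributes to the calibration and discrimination measures only through its favored ordering.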

Table 4 presents the results of the four performance measures for the Elastic Net regularized models across each race distance. For short distances, given the equivalent performance in model fitting according to the

AUC

criterion (see Table 3), the equality in the

Acc

measure, and better values for the C, D and

S_{B}

metrics, the model with ridge-LASSO mixture regularization (

α = 0.3

) is selected, as highlighted in bold in the table. Similarly, for medium and long distances, ridge regularization (

α = 0

) and LASSO regularization (

α = 1

) are selected, respectively.

The last row of Table 4 reports the global validation measures across all records combined. Globally, the results show good accuracy and Brier scores, very good calibration, and low discriminatory capacity. Nevertheless, these findings indicate that the horses’ strength or performance indicators enable the prediction of sporting outcomes with acceptable accuracy. In other words, the association between a horse’s indicators at a given time and its results in an event occurring at that time is confirmed.

To analyze these results, it must be taken into account that the models based on the horse strength indicators estimate the victory probabilities

\hat{μ} [t_{0}] (i, j)

(defined in Equation (38)) of one horse over another in each of the races or events. Furthermore, the models do not include as covariates any characteristics describing the horses’ morphology or genetics, nor variables describing the specific event or the conditions under which it takes place. Pre-race betting data, which are widely used in sports predictions, are not included either. Finally, no traits or characteristics associated with the drivers steering each horse have been considered. An alternative approach might have involved constructing strength indicators for the combined ‘horse–driver’ pair, but such metrics would not accurately represent the horse’s intrinsic strength and were therefore excluded from the analysis. Therefore, reported performance should be considered a lower bound relative to models including betting or morpho-genetic covariates.

To establish a frame of reference for the values achieved by these metrics, we can consider some studies where prediction models are applied to sporting competitions.

In the work by McHale and Morton [2], various models are proposed to study the outcomes of tennis matches in ATP tournaments. The accuracy or proportion of matches in which the higher ranked player won, for rankings derived from the official ATP rankings and five different models, takes values in the interval $[0.64, 0.66]$ . The prediction is based on the ATP rankings, which can be considered a measure of the tennis player’s ‘strength’.
Spann and Skiera [28] compare the forecasting accuracy of different methods and evaluate their ability to systematically generate profits in a betting market. They report the results of an empirical study using match data from three seasons of the German Bundesliga (first division football). The accuracy or hit rate of the proposed models does not exceed 55%.
Gifford and Bayrak [29] develop predictive analytics models to forecast NFL game outcomes in a season using decision trees and logistic regression. Using 2002–2018 NFL data, this study focuses on developing and constructing predictive models to quantify the influence of team statistics on the 2018 NFL regular season wins. In addition to the teams’ numbers of previous wins and losses, the models include as predictors variables describing the course of the games (total yards gained by rushing, total yards gained by offense, team turnovers lost…). The presence of these variables makes it possible to predict the outcomes of NFL games with high accuracy. The misclassification rate for the decision tree model is approximately 0.216, while that of the logistic regression model is 0.169.
In the previously cited work by Vaughan Williams et al. [25], the purpose is to examine the performance of different forecasting methodologies for both men’s and women’s professional tennis matches. The authors utilize various variables of prior athlete performance (the official men’s tennis and women’s tennis rankings, the standard Elo ratings or the surface-specific Elo ratings). The authors apply their methodologies to several of the most relevant ATP tournaments (Wimbledon, US Open, Australian Open and French Open), thus featuring the world’s best tennis players. The following measures of predictive capacity and fit for match outcome prediction were achieved across their models: accuracy, $Acc \in [0.66, 0.74]$ ; calibration ratio, $C \in [0.71, 0.77]$ ; discrimination, $D \in [0.6, 0.11]$ ; Brier score, $S_{B} \in [0.18, 0.21]$ .

The analysis of the preceding references allows us to conclude that the results obtained by the models fitted in this work should be considered good, given the inherent difficulty of predicting outcomes in equine sporting events, and considering that the sole variables included in the models are indicators based purely on prior results. Consequently, the validation analysis, based on the principle of predictive validation, confirms that the defined indicators are valid and adequate for providing information on the strength of the horses; that is, they ‘measure what they intend to measure’ and offer consistent and reliable results.

6. Discussion and Conclusions

This study introduces a family of five purely result-based strength indicators for trotter horses, incorporating (i) time-decay via kernel weighting, (ii) competition level through category coefficients, and (iii) event–distance adjustments. These indicators are straightforward to compute, interpretable on the

[0, 1]

scale, and can be updated after each race; they provide the building blocks for an Elo–type ranking system.

It is important to emphasize that this work does not aim to address the prediction of race or sporting event outcomes. While this problem has attracted considerable attention from researchers and betting agencies, success has not always been achieved. Numerous studies have been published in this domain, and many more presumably remain unpublished. Wunderlich and Memmert [30] provide an overview of key topics in sports outcome forecasting, highlighting the central role of ratings as an intermediate step in predictive models and discussing the challenges associated with evaluating the quality of ratings-based forecasts. Accordingly, the construction of strength or performance indicators can serve as a first step toward (i) facilitating the development of Elo-type ranking or classification systems and, potentially, (ii) enabling more accurate outcome prediction through these systems. The present study focuses exclusively on the first step: the construction of the indicators.

The proposed methodology relies on the horse’s historical outcomes, emphasizing the critical importance of up-to-date information and the need for performance or strength measures to be updated after each event (Aldous [1]). To this end, a weighting system is applied to previous results, assigning greater importance to more recent outcomes through the use of kernel functions. Additionally, two other factors play a central role in the methodology: a categorization scheme for sporting events based on their sporting or economic significance, and the conditions under which the events take place, which can influence the performance of the athlete, team, or animal. In the context of this study, these factors correspond to the race category, determined by prize money, and the race distance.

Based on a collection of historical data regarding events or competitions and athletes or teams, the general methodology proposed can be summarized as follows:

Phase 0 Prerequisites:
- Define the temporal weighting scheme.
- Define the categories of the events or competitions and the corresponding weighting scheme based on category.
- Define the collection $Δ$ of event development conditions (e.g., track type, race distance…).
Phase 1 Construction of Ratio Matrices.
- Construction of the time-weighted victory ratio matrices, $M_{R} [τ]$ , for each instant $τ$ of interest, or $M_{R} [t]$ , for the instant immediately preceding an event or competition t.
- Construction of the time-weighted and category-corrected victory ratio matrices, $M_{R}^{[ω]} [τ]$ , for each instant $τ$ of interest, or $M_{R}^{[ω]} [t]$ , for the instant immediately preceding an event or competition t.
- For each condition $δ \in Δ$ , construction of the time-weighted and condition-corrected victory ratio matrices, $M_{R}^{[δ]} [τ]$ , for each instant $τ$ of interest, or $M_{R}^{[δ]} [t]$ for the instant immediately preceding an event or competition t.
Phase 2 Construction of Individual Strength Indicators (Athlete, Team, Horse, etc.).
- For each individual i, construction of the general strength indicator: time-weighted mean victory ratio $M_{R} [τ] [i]$ for each instant $τ$ of interest, or $M_{R} [t] [i]$ for the instant immediately preceding an event or competition t, defined as the average of the i-th row of the matrix $M_{R} [τ]$ or $M_{R} [t]$ , respectively.
- For each individual i, construction of the general strength indicator: time-weighted and category-corrected mean victory ratio $M_{R}^{[ω]} [τ] [i]$ for each instant $τ$ of interest, or $M_{R}^{[ω]} [t] [i]$ for the instant immediately preceding an event or competition t, defined as the average of the i-th row of the matrix $M_{R}^{[ω]} [τ]$ or $M_{R}^{[ω]} [t]$ , respectively.
- For each individual i and for each condition $δ \in Δ$ , construction of the specific strength indicator: time-weighted and condition-corrected mean victory ratio $M_{R}^{[δ]} [τ] [i]$ for each instant $τ$ of interest, or $M_{R}^{[δ]} [t] [i]$ for the instant immediately preceding an event or competition t, defined as the average of the i-th row of the matrix $M_{R}^{[δ]} [τ]$ or $M_{R}^{[δ]} [t]$ , respectively.
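
Phases 1 and 2 can be sketched for the general (uncorrected) indicator as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the triweight kernel is applied to the time lag divided by a hypothetical window length, each pairwise ratio is taken as weighted wins over weighted confrontations, draws are ignored, and all function and variable names are invented for the example.

```python
from collections import defaultdict


def triweight(u):
    """Triweight kernel (35/32) * (1 - u^2)^3 on |u| <= 1, zero elsewhere."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def strength_indicators(events, tau0, window):
    """Time-weighted mean victory ratio of every competitor at instant tau0.

    events : list of (tau, finishing_order), where finishing_order lists the
             competitor ids from first to last place
    window : assumed length of the look-back window, in the units of tau
    """
    wins = defaultdict(float)     # weighted wins of i over j
    faced = defaultdict(float)    # weighted confrontations between i and j
    for tau, order in events:
        if tau >= tau0:
            continue              # use only events strictly before tau0
        weight = triweight((tau0 - tau) / window)
        if weight == 0.0:
            continue              # event falls outside the time window
        # Phase 1: accumulate head-to-head results for every pair in the event
        for position, winner in enumerate(order):
            for loser in order[position + 1:]:
                wins[(winner, loser)] += weight
                faced[(winner, loser)] += weight
                faced[(loser, winner)] += weight
    # Phase 2: average the pairwise ratios over the opponents actually faced
    totals, counts = defaultdict(float), defaultdict(int)
    for (i, j), confrontations in faced.items():
        totals[i] += wins.get((i, j), 0.0) / confrontations
        counts[i] += 1
    return {i: totals[i] / counts[i] for i in totals}


# One past race (a first, b second, c third) and one future race that must
# be ignored when evaluating the indicators at tau0 = 1.0
races = [(0.0, ["a", "b", "c"]), (2.0, ["c", "b", "a"])]
indicators = strength_indicators(races, tau0=1.0, window=2.0)
```

The category-corrected and condition-corrected versions would follow the same pattern, with the win weights additionally multiplied by the corresponding correction coefficients.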

The procedure described above for developing and constructing the indicators must be complemented by a validation study. Such a step enables the use of these indicators as the foundation for a monitoring and evaluation system to track the progression of athletes’ strength and performance. The validation procedure may vary depending on the availability of relevant information, but it generally involves applying a criterion validation approach, which assesses whether the indicators correlate with a recognized “gold standard” or established outcome. More specifically, a predictive validity framework is employed, evaluating the extent to which the new indicators can predict future outcomes. This framework is adopted in the present study and can be summarized as follows:

For each sporting event $t$ included in the historical data collection $T$,
- Let $τ = τ (t)$ be the instant at which the event was held. Obtain the collection of performance indicators for the participants at that instant:

$\{M_{R} [τ] [i], M_{R}^{[ω]} [τ] [i], M_{R}^{[δ]} [τ] [i] : i \in t, δ \in Δ\}$ (50)

- The prediction model or technique is applied using these strength indicators, and the measure of success or failure in that event is recorded.
The predictive capacity is evaluated through the goodness-of-fit measures and performance metrics of the applied prediction model or technique.
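
As a rough illustration of this loop for a single pair of competitors, the following Python sketch turns indicator differences into a win probability with a logistic model and scores the predictions by accuracy and Brier score. The function names are hypothetical, the intercept is assumed to be zero, and the coefficients are borrowed from the short-distance row with $α = 0.3$ of Table 3 purely as an example; the indicator differences and outcomes are made-up toy values, not data from the study.

```python
import math

def predict_win_probability(diffs, betas, intercept=0.0):
    """Logistic model on indicator differences between two competitors (intercept assumed 0)."""
    z = intercept + sum(b * d for b, d in zip(betas, diffs))
    return 1.0 / (1.0 + math.exp(-z))

def accuracy_and_brier(probs, outcomes):
    """Share of correct calls at threshold 0.5 and mean squared probability error."""
    n = len(probs)
    acc = sum((p >= 0.5) == bool(y) for p, y in zip(probs, outcomes)) / n
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    return acc, brier

# Illustrative coefficients (Table 3, short distance, alpha = 0.3).
betas = (2.2066, 2.5698, 0.3832)
# Toy differences (general, category-corrected, distance-specific) and toy outcomes.
toy_diffs = [(0.20, 0.15, 0.10), (-0.05, -0.10, 0.00), (0.02, 0.01, -0.03)]
toy_outcomes = [1, 0, 0]

probs = [predict_win_probability(d, betas) for d in toy_diffs]
print(accuracy_and_brier(probs, toy_outcomes))
```

Calibration and discrimination are not computed here because their exact definitions are given in the validation section rather than in this summary.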

The results obtained by applying this methodology to harness racing are encouraging: five indicators of each horse’s strength or performance were constructed from prior outcomes alone. These indicators are computationally straightforward, easily interpretable, useful for comparing performance among horses, and adequately stable, which makes them reliable measures of a horse’s strength. Their predictive capacity is also strong (global accuracy of 0.6626 and Brier score of 0.2133 in Table 4, with AUC values between 0.71 and 0.73 in Table 3), which supports their validity under the ‘predictive validity’ criterion.

In summary, the practical implications, limitations, and future directions are as follows:

Practical implications. The proposed indicators enable within-season monitoring of form, transparent comparisons across competition levels, and distance-specific profiling; they can also serve as inputs to Elo-type rankings and to simple selection/handicap rules.
Limitations. Indicators are purely result-based and ignore morphology/genetics, driver effects, track/weather and betting information; pairwise observations are event-clustered; and the category/distance weights were fixed a priori (monotone but not data-calibrated). Performance in Section 5 (see Table 4) should therefore be viewed as a transparent baseline.
Future work. We plan (i) data-driven calibration of $ω$ and $δ$ under monotonic constraints, (ii) hierarchical/Bayesian formulations to propagate uncertainty and separate horse/driver effects, (iii) sensitivity analyses of the kernel and bandwidth $p_{b}$ , (iv) integration of track/driver/betting covariates for full forecasting systems, and (v) external validation across other trotting circuits and gallop racing.

Author Contributions

Conceptualization: M.L.-A., J.M.M.-P. and M.V.; Data curation: M.L.-A., M.D.G., M.R.-L. and M.V.; Investigation, resources and visualization: M.L.-A., J.M.M.-P. and M.V.; Methodology and formal analysis: M.L.-A. and J.M.M.-P.; Supervision: all authors; Writing (original draft preparation): M.L.-A., J.M.M.-P. and M.V.; Project administration and funding acquisition: M.V.; Writing (review and editing): M.L.-A., J.M.M.-P. and M.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the PRJ202003868 project, “Genetic evaluation of the Spanish Trotter improvement program: estimation of genetic parameters and genetic evaluation of breeding stock for trotting functional aptitude,” through an agreement between the Research Foundation of the University of Seville (FIUS) and the Association of Breeders and Owners of Trotter Horses (ASTROT).

Data Availability Statement

The dataset supporting the results of this study was supplied by the Balearic Trotting Federation. The dataset analyzed during the current study is available at https://doi.org/10.5281/zenodo.17464691.

Acknowledgments

The authors wish to thank the Association of Breeders and Owners of Trotter Horses (ASTROT) for providing the data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Sample-selection procedure for the mutually independent records used in fitting the logistic regression models for each race distance.

The procedure is described for the medium distance ($m$); it is identical for the other distances. The sample $M [m]$ is generated as follows:

Step 0. Selection of the set of races and the set of horses.
- Let $T [m]$ denote the set of events held at distance $m$, and let $C [m]$ denote the corresponding set of participating horses.
- Initialize: $C_{s} [m] = \emptyset$, $T_{s} [m] = \emptyset$, and $M [m] = \emptyset$.
Step p ($p = 1, 2, \dots$). Selection and recording of the $p$-th sample case.
- Selection:
  ∗ $t_{p} \in T [m] - T_{s} [m]$ is randomly selected.
  ∗ Let $P [t_{p}] = \{(i, j) : i, j \in C [m] - C_{s} [m], (i, j) \in t_{p}, i \not\equiv_{t_{p}} j\}$ denote the collection of records or pairs.
  ∗ If $P [t_{p}] = \emptyset$, go directly to the update of the subcollections in Step p.
  ∗ If $P [t_{p}] \neq \emptyset$, randomly select a pair $(h_{p}, k_{p}) \in P [t_{p}]$ and include it in the sample:

$M [m] = M [m] \cup \{(h_{p}, k_{p})\}$

- Registration. For the pair or case $(h_{p}, k_{p})$, the explanatory variables are registered,

$x_{1, p} = D [t_{p}] [h_{p}, k_{p}], \quad x_{2, p} = D^{[ω]} [t_{p}] [h_{p}, k_{p}], \quad x_{3, p} = D^{[δ]} [t_{p}] [h_{p}, k_{p}],$

and the binary target variable is registered,

$y_{p} = \begin{cases} 1 & \text{if } h_{p} \gg_{[t_{p}]} k_{p} \\ 0 & \text{otherwise} \end{cases}$

- Update of the subcollections:
  ∗ $T_{s} [m] = T_{s} [m] \cup \{t_{p}\}$.
  ∗ $C_{s} [m] = C_{s} [m] \cup \{h_{p}, k_{p}\}$.
  ∗ If $T [m] - T_{s} [m] = \emptyset$ or $C [m] - C_{s} [m] = \emptyset$, STOP.
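
The selection steps above can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the authors' code: the function name `sample_independent_pairs` and the input layout (a dictionary mapping each race to the finishing positions of its horses, with tied horses sharing a position) are assumptions, pairs are treated as ordered, and the explanatory variables $x_{1,p}, x_{2,p}, x_{3,p}$ are left out because they depend on the indicator differences defined in the main text.

```python
import random

def sample_independent_pairs(races, seed=0):
    """Draw at most one pair per race and use each horse at most once.

    races: dict race_id -> dict horse_id -> finishing position (ties share a position).
    Returns a list of (race_id, h, k, y) with y = 1 if h finished ahead of k, else 0.
    """
    rng = random.Random(seed)
    race_order = list(races)
    rng.shuffle(race_order)  # equivalent to repeatedly drawing an unused race at random
    all_horses = {h for field in races.values() for h in field}
    used_horses = set()      # plays the role of C_s[m]
    sample = []              # plays the role of M[m]

    for race_id in race_order:            # each race is visited once, so T_s[m] is implicit
        if used_horses >= all_horses:     # C[m] - C_s[m] is empty: STOP
            break
        field = races[race_id]
        free = sorted(h for h in field if h not in used_horses)
        # Ordered pairs of still-unused horses that did not tie in this race.
        pairs = [(i, j) for i in free for j in free
                 if i != j and field[i] != field[j]]
        if pairs:
            h, k = rng.choice(pairs)
            sample.append((race_id, h, k, int(field[h] < field[k])))
            used_horses.update((h, k))
    return sample

# Toy usage with hypothetical races; "C" and "D" tie in r3, so r3 can never contribute a pair.
toy_races = {
    "r1": {"A": 1, "B": 2, "C": 3},
    "r2": {"A": 1, "B": 2},
    "r3": {"C": 1, "D": 1},
    "r4": {"D": 1, "E": 2},
}
print(sample_independent_pairs(toy_races, seed=1))
```

Because no race and no horse appears twice, the resulting records can be treated as mutually independent when the logistic regression models are fitted, at the cost of discarding most of the available pairs.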

Appendix B

This appendix presents the results obtained from applying several supervised learning techniques to validate the strength indicators proposed in this study. The following methods were implemented using the caret library (Kuhn [24]): Support Vector Machines with a Linear Kernel, Support Vector Machines with a Polynomial Kernel, Neural Networks, Bagging (Bagged CART), Random Forest, and eXtreme Gradient Boosting. Additionally, the elastic net–regularized logistic regression model, fitted using the optimal $λ$ and $α$ parameters, was included to compare its performance with that of these supervised learning methods.

First, these methods were applied to construct classification models using the independent record samples obtained according to the procedure described in Appendix A. These models were then employed for the predictive validation of the indicators. The results are summarized in Table A1.

Table A1. Fitting models and predictive validation for several supervised learning techniques.

Dist.	Method	Fitting Model		Predictive Validation
		$N$	$AUC$	$N$	$Acc$	$C$	$D$	$S_{B}$
Short	Logistic Reg.	1491	0.7312	129,914	0.6514	0.9836	0.0492	0.2158
	SVM linear		0.7292		0.6519	0.9851	0.0491	0.2158
	SVM polynom.		0.7284		0.6521	0.9852	0.0487	0.2159
	Neural net.		0.7271		0.6512	0.9983	0.0499	0.2159
	Bagging		0.6675		0.6033	1.1944	0.0469	0.2512
	Random forest		0.6819		0.6156	1.1393	0.0484	0.2386
	XGBoost		0.7185		0.6447	1.0209	0.0483	0.2189
Med.	Logistic Reg.	1697	0.7285	321,768	0.6702	0.9459	0.0462	0.2111
	SVM linear		0.7269		0.6690	0.9362	0.0467	0.2106
	SVM polynom.		0.7279		0.6690	0.9312	0.0461	0.2107
	Neural net.		0.7269		0.6688	0.9595	0.0425	0.2105
	Bagging		0.6534		0.6209	1.1263	0.0491	0.2403
	Random forest		0.6650		0.6365	1.0751	0.0470	0.2287
	XGBoost		0.7132		0.6657	0.9698	0.0459	0.2122
Long	Logistic Reg.	1470	0.7169	224,365	0.6581	0.9637	0.0453	0.2150
	SVM linear		0.7163		0.6579	0.9611	0.0461	0.2148
	SVM polynom.		0.7165		0.6129	1.3083	0.0747	0.2419
	Neural net.		0.7147		0.6561	0.9726	0.0450	0.2149
	Bagging		0.6551		0.6065	1.1829	0.0533	0.2494
	Random forest		0.6627		0.6167	1.1212	0.0560	0.23723
	XGBoost		0.7083		0.6498	0.9895	0.0428	0.21804

$N$: sample size; $AUC$: area under the curve; $Acc$: accuracy; $C$: calibration; $D$: discrimination; $S_{B}$: Brier score.

References

1. Aldous, D. Elo Ratings and the Sports Model: A Neglected Topic in Applied Probability? Stat. Sci. 2017, 32, 616–629.
2. McHale, I.; Morton, A. A Bradley-Terry type model for forecasting tennis match results. Int. J. Forecast. 2011, 27, 619–630.
3. Ricard, A.; Legarra, A. Validation of models for analysis of ranks in horse breeding evaluation. Genet. Sel. Evol. 2010, 42, 3.
4. Stekler, H.O.; Sendor, D.; Verlander, R. Issues in sports forecasting. Int. J. Forecast. 2010, 26, 606–621.
5. Lo, V.S.; Bacon-Shone, J. Probability and Statistical Models for Racing. J. Quant. Anal. Sport. 2008, 4, 11.
6. Snyder, W.W. Horse racing: Testing the efficient markets model. J. Financ. 1978, 33, 1109–1118.
7. Federación Balear de Trote. Available online: https://federaciobaleardetrot.com/index.php (accessed on 18 September 2025).
8. Gómez, M.D.; Menéndez-Buxadera, A.; Valera, M.; Molina, A. Estimation of genetic parameters for racing speed at different distances in young and adult Spanish Trotter horses using the random regression model. J. Anim. Breed. Genet. 2010, 127, 385–394.
9. Ekiz, B.; Kocak, O. Phenotypic and genetic parameter estimates for racing traits of Arabian horses in Turkey. J. Anim. Breed. Genet. 2005, 122, 349–356.
10. Gramacki, A. Nonparametric Kernel Density Estimation and Its Computational Aspects; Springer International Publishing: Cham, Switzerland, 2018.
11. Urdampilleta, A.; Martínez-Sanz, J.M.; Cejuela, R. Indicadores del rendimiento deportivo: Aspectos psicológicos, fisiológicos, bioquímicos y antropométricos. EFDeportes.com Rev. Digit. 2012, 17. Available online: https://www.efdeportes.com/efd173/indicadores-del-rendimiento-deportivo.htm (accessed on 18 September 2025).
12. Herold, M.; Kempe, M.; Bauer, P.; Meyer, T. Attacking Key Performance Indicators in Soccer: Current Practice and Perceptions from the Elite to Youth Academy Level. J. Sport. Sci. Med. 2021, 20, 158–169.
13. García, J.; Ibáñez, S.J.; De Santos, R.M.; Leite, N.; Sampaio, J. Identifying basketball performance indicators in regular season and playoff games. J. Hum. Kinet. 2013, 36, 161–168.
14. Mercier, M.; Tremblay, M.; Daneau, C.; Descarreaux, M. Individual factors associated with baseball pitching performance: Scoping review. BMJ Open Sport Exerc. Med. 2020, 6, e000704.
15. Johns, K.L.; Philipson, P.M.; Hayes, P.R. Planning for optimal performance—What happens before the taper? Int. J. Sport. Sci. Coach. 2019, 14, 749–764.
16. Phillips, K.E.; Hopkins, W.G. Determinants of Cycling Performance: A Review of the Dimensions and Features Regulating Performance in Elite Cycling Competitions. Sports Med.-Open 2020, 6, 23.
17. Lames, M.; McGarry, T. On the search for reliable performance indicators in game sports. Int. J. Perform. Anal. Sport 2007, 7, 62–79.
18. Cronbach, L.J.; Meehl, P.E. Construct validity for psychological tests. Psychol. Bull. 1955, 52, 281–302.
19. Clemens, N.; Ragan, K.; Prickett, C. Predictive validity. In The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2018; Volume 4, p. 1289.
20. McCullagh, P.; Nelder, J. Generalized Linear Models, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 1989.
21. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
22. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
23. Tay, J.K.; Narasimhan, B.; Hastie, T. Elastic Net Regularization Paths for All Generalized Linear Models. J. Stat. Softw. 2023, 106, 1–31.
24. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26.
25. Vaughan Williams, L.; Liu, C.; Dixon, L.; Gerrard, H. How well do Elo-based ratings predict professional tennis matches? J. Quant. Anal. Sport. 2021, 17, 91–105.
26. Brier, G.W. Verification of Forecasts Expressed in Terms of Probability. Mon. Weather. Rev. 1950, 78, 1–3.
27. Irons, D.J.; Buckley, S.; Paulden, T. Developing an Improved Tennis Ranking System. J. Quant. Anal. Sport. 2014, 10, 109–118.
28. Spann, M.; Skiera, B. Sports forecasting: A comparison of the forecast accuracy of prediction markets, betting odds and tipsters. J. Forecast. 2009, 28, 55–72.
29. Gifford, M.; Bayrak, T. A predictive analytics model for forecasting outcomes in the National Football League games using decision tree and logistic regression. Decis. Anal. J. 2023, 8, 100296.
30. Wunderlich, F.; Memmert, D. Forecasting the outcomes of sports events: A review. Eur. J. Sport Sci. 2021, 21, 944–957.

Figure 1. Time-weighted mean victory ratio $M_{R} [t_{0}] [i]$ (see Equation (28)) for five reference horses (≥80 events; selection described in text). Y-axis in $[0, 1]$. Kernel: Triweight; bandwidth $p_{b} = 1460$ days.

Figure 2. Category-adjusted time-weighted mean victory ratio $M_{R}^{[ω]} [t_{0}] [i]$ (defined in Equation (30)). Same horses and color mapping as Figure 1; same Y-axis range. Higher-category results have larger impact (see Equation (2)).

Figure 3. Distance-specific indicators $M_{R}^{[δ, g]} [τ_{0}] [i]$ with $g = s$ (short), $m$ (medium) or $l$ (long). (A) Distance-short indicator $M_{R}^{[δ, s]} [τ_{0}] [i]$ (see Equation (34)); (B) Distance-medium indicator $M_{R}^{[δ, m]} [τ_{0}] [i]$ (see Equation (35)); and (C) Distance-long indicator $M_{R}^{[δ, l]} [τ_{0}] [i]$ (see Equation (36)). Same horses, colors, and Y-axis as Figures 1 and 2. Bandwidth $p_{b} = 1460$ days; distance weights per Equation (4).

Figure 4. AUC performance across $α$ levels for short, medium, and long distance models, based on a random sample of mutually independent observations.

Table 1. Summary of horse-event records by event distance and category.

Horse-Event Records	Distance			Category
	Short	Medium	Long	A	B	C	D	E
N	107,888	190,370	123,075	2404	10,372	20,823	82,368	305,366
%	25.61	45.18	29.21	0.57	2.46	4.94	19.55	72.48

Table 2. Summary of strength indicators.

			Strength Indicators
Weighting Scheme	Symbol	Equation	$M_{R} [t_{0}]$	$M_{R}^{[ω]} [t_{0}]$	$M_{R}^{[δ, s]} [t_{0}]$	$M_{R}^{[δ, m]} [t_{0}]$	$M_{R}^{[δ, l]} [t_{0}]$
Time	$ψ$	(16)	x	x	x	x	x
Category	$ω$	(2)		x
Short distance	$δ [s]$	(4)			x
Medium distance	$δ [m]$	(4)				x
Long distance	$δ [l]$	(4)					x

Table 3. Fitting logistic regression models according to event distance.

	Regularization Fitting				Parameter Estimates
Distance	$α$	$λ_{\min}$	$AUC$	$se (AUC)$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{3}$
Short $(N = 1823)$	0.0	1.2853	0.7303	0.0126	0.3557	0.8099	0.4336
	0.3	0.0032	0.7312	0.0073	2.2066	2.5698	0.3832
	1.0	0.0030	0.7300	0.0148	2.5190	2.3353	0.0159
Medium $(N = 2244)$	0.0	0.0250	0.7285	0.0110	1.4360	2.2269	0.5974
	0.7	0.0166	0.7292	0.0088	1.9450	1.9283	0.0000
	1.0	0.0168	0.7289	0.0071	2.0109	1.6219	0.0000
Long $(N = 1995)$	0.0	0.1872	0.7136	0.0074	0.8689	1.5634	1.0379
	0.4	0.0212	0.7181	0.0151	1.5800	1.8752	0.7328
	1.0	0.0033	0.7169	0.0088	2.2705	1.9802	0.0000

Table 4. Predictive validation by distance class and overall.

Distance	$α$	Accuracy ($Acc$)	Calibration ($C$)	Discrimination ($D$)	Brier score ($S_{B}$)	Selected
Short ($N$ = 129,914)	0.0	0.6512	0.8336	0.0167	0.2326
	0.3	0.6514	0.9836	0.0492	0.2158	(*)
	1.0	0.6513	0.9829	0.0490	0.2158
Medium ($N$ = 321,768)	0.0	0.6702	0.9459	0.0462	0.2111	(*)
	0.7	0.6701	0.9396	0.0451	0.2115
	1.0	0.6697	0.9362	0.0443	0.2118
Long ($N$ = 224,365)	0.0	0.6580	0.9154	0.0361	0.2183
	0.4	0.6578	0.9523	0.0435	0.2155
	1.0	0.6581	0.9637	0.0453	0.2150	(*)
Global		0.6626	0.9589	0.0464	0.2133

The asterisk (*) marks the $α$ selected according to the values of $Acc$, $C$, $D$ and $S_{B}$.
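
As a quick consistency check on Table 4, accuracy and the Brier score are both averages over individual predictions, so a pooled value over the three distance classes should equal the $N$-weighted average of the per-class values. The short Python sketch below applies this to the rows marked with an asterisk. Treating the Global row as the pooled result of the selected models is an assumption made here, not something the table states; under that assumption the computation reproduces the reported global accuracy (0.6626) and Brier score (0.2133) after rounding to four decimals.

```python
# (N, accuracy, Brier score) of the selected model for each distance class, from Table 4.
selected = {
    "short": (129_914, 0.6514, 0.2158),
    "medium": (321_768, 0.6702, 0.2111),
    "long": (224_365, 0.6581, 0.2150),
}

total = sum(n for n, _, _ in selected.values())
global_acc = sum(n * acc for n, acc, _ in selected.values()) / total
global_brier = sum(n * sb for n, _, sb in selected.values()) / total

print(total, round(global_acc, 4), round(global_brier, 4))
```

The same shortcut does not carry over to calibration and discrimination, which are not simple per-observation means and would have to be recomputed on the pooled predictions.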

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

