Big Data and Data-Driven Research in Sports

A special issue of Data (ISSN 2306-5729).

Deadline for manuscript submissions: 31 December 2026 | Viewed by 25013

Special Issue Editor


Dr. Oliver Ramos Álvarez
Guest Editor
Department of Education. Area of Physical Education and Sports, University of Cantabria, Los Castros Avenue, 50, 39005 Santander, Spain
Interests: big data; sports performance; physiology; technologies; physical education; individual sports; team sports; health

Special Issue Information

Dear Colleagues,

The application of big data, through a variety of techniques and strategies, has evolved in recent years and brought numerous advantages to the sports field. It has driven research and interventions aimed at athletes, including performance measurement, physiological and biomechanical enhancement, and tactical and strategic improvements in team sports. New research and methodological improvements in these fields are being produced continuously.

This Special Issue aims to compile scientific evidence regarding the use of big data and decision making for the improvement of sports performance, athlete safety, and other sports-related areas.

In more detail, this Special Issue welcomes original research articles, dataset descriptors, communications, and systematic reviews. The scope of this Special Issue includes, but is not limited to, the following topics:

  • Biomechanics;
  • Physiology;
  • Injury prevention;
  • Tactics and strategies in team sports;
  • Nutrition;
  • New technology applied to big data.

Dr. Oliver Ramos Álvarez
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • performance
  • analysis
  • technology
  • biomechanics
  • physiology
  • optimization
  • team sports

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research


31 pages, 2038 KB  
Article
Quantifying the Key Performance Indicators of Success: An Exploratory Analysis of Champion Teams in Europe’s Top Football Leagues
by José Gama, Gonçalo Dias, Rodrigo Mendes, Fernando Martins, Rui Sousa Mendes and Vasco Vaz
Data 2026, 11(5), 102; https://doi.org/10.3390/data11050102 - 2 May 2026
Viewed by 669
Abstract
This study quantified performance indicators associated with match outcomes among champion teams from the five major European football leagues during the 2023–2024 season. Ordinal logistic regression with robust standard errors clustered by team was employed, with analyses stratified by match location (home/away) and opponent quality (high/medium/low). Data from 182 matches were sourced from Wyscout® and included offensive indicators (possession, passes, shots, shots on target, expected goals) and defensive indicators (interceptions, fouls, shots conceded, yellow and red cards). Spearman correlations showed that goals scored (ρ = 0.523) and shots on target (ρ = 0.243) were positively associated with match outcomes, whereas goals conceded (ρ = 0.441) and fouls (ρ = 0.255) were negatively associated. Ordinal regression revealed context-dependent effects. Offensively, shots on target increased the odds of a better outcome at home (OR = 3.76) and against high-quality opponents (OR = 5.24), while expected goals (xG) was the key predictor in away matches (OR = 2.09). Defensively, interceptions were crucial against high-quality opponents (OR = 1.76), while fouls (OR = 0.53) and yellow cards (OR = 0.61) were detrimental against medium-quality opponents. Against low-quality opponents, shots on target conceded (OR = 0.22) and red cards (OR = 66.58) were critical. Volume-based indicators did not retain significant independent effects. For elite champion teams, competitive success is predominantly determined by efficiency-based indicators, shot accuracy, expected goals, and defensive organisation, whose relevance varies systematically with context. These findings provide exploratory insights and a context-sensitive benchmark for performance analysis at the highest level of European football, warranting further validation in future studies.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)
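
As a reading aid for the odds ratios quoted in the abstract above: in a logistic-type model an odds ratio is the exponential of the fitted coefficient, so OR > 1 corresponds to a positive coefficient and OR < 1 to a negative one. The sketch below only converts the reported values. The dictionary labels and function names are illustrative assumptions, and the scaling of each predictor (raw or standardised units) is not stated in the abstract.

```python
from math import exp, log

# Odds ratios as quoted in the abstract; the labels are illustrative only.
odds_ratios = {
    "shots on target (home)": 3.76,
    "xG (away)": 2.09,
    "fouls (vs medium-quality opponents)": 0.53,
}


def log_odds_coefficient(odds_ratio):
    """Coefficient on the log-odds scale that corresponds to an odds ratio."""
    return log(odds_ratio)


def odds_multiplier(odds_ratio, units):
    """Multiplicative change in the odds for `units` one-unit steps of a predictor."""
    return exp(units * log(odds_ratio))


coefficients = {name: log_odds_coefficient(value) for name, value in odds_ratios.items()}
for name, beta in coefficients.items():
    direction = "raises" if beta > 0 else "lowers"
    print(f"{name}: coefficient {beta:+.3f} ({direction} the odds of a better outcome)")
```

Under the proportional-odds assumption, the same multiplier applies at every cut-point between outcome categories, which is why a single odds ratio is reported per predictor.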

10 pages, 1011 KB  
Article
The Role of Shot Velocity in Advanced Post-Shot Metrics: Evidence from the UEFA European Football Championships
by Blanca De-la-Cruz-Torres, Anselmo Ruiz-de-Alarcón-Quintero and Miguel Navarro-Castro
Data 2026, 11(2), 39; https://doi.org/10.3390/data11020039 - 13 Feb 2026
Viewed by 815
Abstract
Introduction: Ball velocity is a critical determinant of shot effectiveness in football, yet its influence on advanced post-shot metrics, such as expected shot impact timing (xSIT) and expected goals on target (xGOT), remains poorly understood, particularly in the context of sex-specific differences. This study examined the relationship between ball velocity and these metrics in men’s and women’s elite European tournaments. Methods: A total of 2174 shots were analyzed from all matches of the 2024 UEFA Men’s EURO (n = 1305) and 2025 UEFA Women’s EURO (n = 869), classified as goal shots on target, non-goal shots on target, and shots off target. Ball velocity was measured for each shot, and its associations with xSIT, our own xGOT model and the StatsBomb xGOT model were quantified using correlation coefficients. Results: Ball velocity differed significantly between sexes (p < 0.001), with higher values in men, and goal shots on target exhibited lower velocities than non-goal or off-target shots, indicating a speed–accuracy trade-off. Only xSIT and our own xGOT model were sensitive to ball velocity, reflecting sex-specific differences (p < 0.001). When comparing shot types across advanced metrics, a consistent trend was observed in both tournaments: xSIT showed no significant differences between goal and non-goal shots, whereas both xGOT models were higher for goal shots on target. Correlations indicated a moderate positive relationship between xSIT and ball velocity, and moderate negative correlations for both xGOT models, slightly stronger in men. Conclusions: Ball velocity is a critical factor influencing shot performance and advanced post-shot metrics, with notable sex-specific differences.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)

9 pages, 616 KB  
Article
Expected Shot Impact Timing (xSIT) and Other Advanced Metrics as Indicators of Performance in English Men’s and Women’s Professional Football
by Blanca De-la-Cruz-Torres, Miguel Navarro-Castro and Anselmo Ruiz-de-Alarcón-Quintero
Data 2025, 10(10), 159; https://doi.org/10.3390/data10100159 - 2 Oct 2025
Cited by 1 | Viewed by 1979
Abstract
Background: Football performance analysis has grown rapidly in recent years, with increasing interest in advanced metrics to more accurately evaluate both individual and team performance. The aim of this study was to examine the utility of the Expected Shot Impact Timing (xSIT) metric as an indicator of shooting performance in English professional football, specifically in the men’s Premier League (PL) and the Women’s Super League (WSL). Methods: A total of 9831 shots from the PL (2015/16 season) and 3219 shots from the WSL (2020/21 season) were analyzed. Data were obtained from publicly accessible football databases. The variables examined included goals, Possession Value (PV), Expected Goals (xG), Expected Goals on Target (xGOT), and xSIT. All variables were normalized per match (90 min). Descriptive statistics, correlational analyses, and comparative analyses between leagues were performed. Results: The WSL exhibited a significantly higher PV than the PL (p < 0.001), whereas the remaining metrics showed no significant differences between leagues (p > 0.05). Moreover, in the WSL, all performance indicators displayed very strong correlations with goals, while in the PL, similarly strong associations were observed, except for PV, which showed only a weak relationship. Conclusions: The xSIT metric, as an indicator of shooting performance, may be regarded as an influential factor in determining match outcomes across both leagues.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)

17 pages, 3841 KB  
Article
Sliding Performance Evaluation with Machine Learning-Based Trajectory Analysis for Skeleton
by Ting Yu, Zhen Peng, Zining Wang, Weiya Chen and Bo Huo
Data 2025, 10(10), 153; https://doi.org/10.3390/data10100153 - 24 Sep 2025
Viewed by 1651
Abstract
Skeleton is an extreme sliding sport in the Winter Olympics, where formulating targeted sliding strategies, based on training videos to navigate complex tracks, is particularly important. To make in-depth use of training video records, this study proposes an analytical method based on Mixture of Gaussians (MoG) and K-means clustering to extract and analyze trajectories from recorded videos for sliding performance evaluation and strategy development. A case study was conducted using data from the Chinese national skeleton team at the Yanqing Sliding Center, obtaining 741, 834, and 726 sliding trajectories from three representative curves. These trajectories were divided into groups based on sliding completion time (fast, medium, and slow groups). The consistency of trajectories within each group was calculated to evaluate sliding stability, while trajectory patterns in the fast group were clustered and described based on the average values of multiple features (starting position, ending position, and apex orthogonal offset). The results showed that more skilled athletes exhibited greater sliding stability (lower ρC-values), and on each curve, there were sliding patterns that performed significantly better than others. This research quantifies the characteristics of athletes’ sliding trajectories on curves, facilitating the visual tracking of training effects and the development of personalized strategies. It provides coaches and athletes with scientific decision-making support and clear directions for improvement, ultimately enabling precise enhancements in training efficiency and competitive performance, while also laying a technical foundation for the future development of intelligent training systems.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)

29 pages, 2211 KB  
Article
Big Data Analytics Framework for Decision-Making in Sports Performance Optimization
by Dan Cristian Mănescu
Data 2025, 10(7), 116; https://doi.org/10.3390/data10070116 - 14 Jul 2025
Cited by 23 | Viewed by 16594
Abstract
The rapid proliferation of wearable sensors and advanced tracking technologies has revolutionized data collection in elite sports, enabling continuous monitoring of athletes’ physiological and biomechanical states. This study proposes a comprehensive big data analytics framework that integrates data acquisition, processing, analytics, and decision support, demonstrated through synthetic datasets in football, basketball, and athletics case scenarios, modeled to represent typical data patterns and decision-making workflows observed in elite sport environments. Analytical methods, including gradient boosting classifiers, logistic regression, and multilayer perceptron models, were employed to predict injury risk, optimize in-game tactical decisions, and personalize sprint mechanics training. Key results include a 12% reduction in hamstring injury rates in football, a 16% improvement in clutch decision-making accuracy in basketball, and an 8% decrease in 100 m sprint times among athletes. The framework’s visualization tools and alert systems supported actionable insights for coaches and medical staff. Challenges such as data quality, privacy compliance, and model interpretability are addressed, with future research focusing on edge computing, federated learning, and augmented reality integration for enhanced real-time feedback. This study demonstrates the potential of integrated big data analytics to transform sports performance optimization, offering a reproducible and ethically sound platform for advancing personalized, data-driven athlete management.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)

Other


17 pages, 735 KB  
Data Descriptor
Daily and Accumulated Training-to-Match Load Ratios in Professional Soccer: The Influence of Starting Status and Playing Position Across a Full Competitive Season
by Alejandro Sierra-Casas, Daniel Castillo, Filipe Manuel Clemente and Alejandro Rodríguez-Fernández
Data 2026, 11(4), 84; https://doi.org/10.3390/data11040084 - 14 Apr 2026
Viewed by 720
Abstract
Introduction: Monitoring training load is essential in elite soccer to optimize performance and reduce injury risk. The training-to-match load ratio (TMr) has emerged as a useful metric to contextualize training demands relative to competitive match exposure. The objective of this study was to compare daily and accumulated TMr between starters and non-starters over a professional season, considering microcycle day and playing position. Methods: Twenty players (Tier 3) from a professional team were monitored during a full competitive season (30 microcycles; 144 training sessions; 30 matches). External load variables, namely total distance (TD), high-speed distance (HSD), sprint distance (SPD), high metabolic load distance (HMLD), acceleration (ACC) and deceleration (DCC), were collected using 10 Hz GPS devices (STATSports). Daily and microcycle TMr were calculated relative to each player’s maximal match value registered during a full competitive period. Linear mixed-effects models examined the effects of starting status, microcycle day, and playing position. Results: Linear mixed models revealed significant three-way interactions (status × day × position) for locomotor variables: TD (F = 3.36, p < 0.001), HSD (F = 2.49, p < 0.001), and SPD (F = 3.37, p < 0.001). Starters accumulated higher loads on match day, whereas non-starters showed higher TMr on MD + 1 and MD + 2. Position-specific differences emerged during acquisition sessions (i.e., MD − 5 to MD − 3), particularly for wide midfielders (WMs) and central defenders (CDs). No significant three-way interactions were observed for ACC, DCC, or HMLD absolute loads (p > 0.05), nor for any accumulated microcycle TMr metrics (p > 0.05). Conclusions: TMr effectively differentiates preparation strategies between starters and non-starters. Although “top-up conditioning” sessions increase early-week relative loads for non-starters, position-specific variations, particularly in mechanical variables during acquisition sessions, highlight the need for individualized load prescription.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)
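
The abstract above states that daily and microcycle TMr were calculated relative to each player's maximal match value. A minimal sketch of that ratio follows, assuming it is the training load divided by the maximum match load and that the accumulated value sums the sessions of a microcycle before dividing. The function name, the day labels, and every number are hypothetical and are not taken from the dataset.

```python
def training_to_match_ratio(training_load, match_loads):
    """Training load divided by the player's maximal match value (assumed definition)."""
    reference = max(match_loads)
    if reference <= 0:
        raise ValueError("maximal match value must be positive")
    return training_load / reference


# Hypothetical total distance (m) for one player; not values from the dataset.
match_td = [10250.0, 9800.0, 10900.0]
daily_td = {"MD-4": 6540.0, "MD-3": 5450.0, "MD-1": 3270.0}

# Daily TMr: one ratio per training session.
daily_tmr = {day: training_to_match_ratio(td, match_td) for day, td in daily_td.items()}

# Accumulated TMr: summed microcycle load over the same reference (an assumption).
accumulated_tmr = training_to_match_ratio(sum(daily_td.values()), match_td)

for day, ratio in daily_tmr.items():
    print(f"{day}: {ratio:.2f}")
print(f"microcycle: {accumulated_tmr:.2f}")
```

A ratio of 1.0 means the training exposure equals the player's most demanding match for that variable. The same function could be applied per variable (TD, HSD, SPD, and so on) if each has its own match reference.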

3 pages, 157 KB  
Data Descriptor
Normative Physical Fitness Profiles and Sex Differences in University Students of Sport Sciences: An Open Dataset of Anthropometrics, Flexibility, Strength, and Jump Performance
by Julio Martín-Ruiz and Laura Ruiz-Sanchis
Data 2026, 11(2), 34; https://doi.org/10.3390/data11020034 - 7 Feb 2026
Viewed by 710
Abstract
This Data Descriptor provides an open, anonymized dataset describing anthropometric and physical fitness outcomes in undergraduate students enrolled in a Physical Activity and Sport Sciences degree program. The dataset included 156 participants (28 females and 128 males) and reported sex, age, body mass, stature, and body mass index, alongside standardized field-based tests covering flexibility, muscular endurance, strength, and jump performance. Hip flexibility was assessed using the Thomas test on both sides. Trunk extensor endurance was measured using the Biering–Sørensen test, and upper-body strength–endurance was assessed using a dead-hang test. Upper limb strength was recorded as elbow flexion strength. Lower limb power was evaluated using vertical jump tests, including Abalakov, squat jump, and countermovement jump, and a derived indicator (IE) was provided to facilitate comparisons across jump modalities. The data are distributed as a machine-readable CSV file accompanied by a detailed data dictionary describing the variables, units, and missingness. The dataset is intended to support the reproducible reporting of normative fitness profiles in sports science students, facilitate teaching and benchmarking in exercise science contexts, and enable secondary analyses exploring associations between anthropometry and physical performance. For reproducible inferential comparisons, users may apply Welch’s two-sample t-test for sex-based differences.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)
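
The abstract above suggests Welch's two-sample t-test for sex-based comparisons. A minimal standard-library sketch of the test statistic and the Welch–Satterthwaite degrees of freedom follows. The function name and the jump heights are made up for illustration and do not come from the dataset; the CSV column names are not given in the abstract, so no file loading is shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up countermovement jump heights (cm); NOT values from the dataset.
females = [24.1, 26.3, 25.0, 27.8, 23.9]
males = [33.5, 36.2, 31.8, 35.0, 34.4, 32.9]

t_stat, dof = welch_t(females, males)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Welch's variant is a sensible default here because the groups are unbalanced (28 females, 128 males) and equal variances cannot be assumed. If SciPy is available, `scipy.stats.ttest_ind(females, males, equal_var=False)` gives the same statistic together with a p-value.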