A Multi-Domain Collaborative Framework for Practical Application of Causal Knowledge Discovery from Public Data in Elite Sports
Abstract
1. Introduction
- An Actionable Framework: We introduce a structured methodology that orchestrates collaboration among three key roles: the ‘elite sports team’, the ‘sport science expert’, and the ‘causal inference expert’. Through a nine-step workflow, a practical problem is translated into a causal computing problem, and the computational outputs are eventually translated back into practical strategies. We also propose a dual-dimensional field evaluation for the practical application of causal discovery in elite sports, encompassing both outcome evaluation and process evaluation.
- Demonstrated Real-World Application: We showcase the framework’s adaptability with an exploratory case study in which the framework was used to discover the causal impact of ambient temperature on 20 km race-walking performance. We present the real-life collaboration among experts from the three domains in each of the nine steps, which provided one additional evidence-informed input for the national team’s cooling strategy at the Paris 2024 Games. As of December 2025, no original article applying causal discovery algorithms to sport could be identified in Web of Science, IEEE Xplore, ACM DL or Scopus (keywords: ‘causal discovery’ and ‘sport*’).
2. Related Work
- (1) Constraint-based methods searching for “difference making”: PC treats causality as a set of conditional independence facts implied by d-separation [35], while FCI generalizes PC by allowing unknown hidden common causes, denoted as bidirected edges [36]. They need only reliable independence tests to return a Markov equivalence class, so they scale to tens of thousands of genes in sparse regulatory networks.
- (2) Score-based approaches to maximize “explanatory factors”: GES recasts the task as greedy maximization of a score over equivalence classes, so humans cannot easily trace the “why”; when used alone, these methods assume no confounders [37]. Its speed advantage outweighs the slight loss of interpretability, making GES the default first-cut tool in high-dimensional genomics.
- (3) Functional causal models exploiting the causal asymmetry proposed by Hume, Russell, and Lewis [38,39,40]: LiNGAM, ANM, and PNL encode the physical generative process (Y = f(X,ε) with ε⊥⊥X [41]); the ensuing invertibility constraint lets them orient every edge and output a DAG, making them ideal for small protein-signaling pathways or fMRI effective connectivity graphs where directionality is crucial.
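The causal asymmetry exploited by functional causal models can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic (uniform noise, true direction X → Y), the helper names `pearson` and `residual_dependence` are hypothetical, and the dependence score (correlation of squared residual with squared regressor) is a crude stand-in, not the independence measure used by LiNGAM, ANM, or PNL.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residual_dependence(regressor, target):
    """Regress target on regressor by least squares, then score how much
    the residual still depends on the regressor. The two are uncorrelated
    by construction, so their squares are correlated instead
    (an assumed, simplified dependence proxy)."""
    n = len(regressor)
    mean_r, mean_t = sum(regressor) / n, sum(target) / n
    centered = [r - mean_r for r in regressor]
    slope = (sum(c * (t - mean_t) for c, t in zip(centered, target))
             / sum(c * c for c in centered))
    residual = [(t - mean_t) - slope * c for c, t in zip(centered, target)]
    return pearson([e * e for e in residual], [c * c for c in centered])

# Synthetic data: X causes Y through a linear model with non-Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(5000)]
y = [xi + rng.uniform(-1, 1) for xi in x]

forward = residual_dependence(x, y)   # fit in the true direction X -> Y
backward = residual_dependence(y, x)  # fit in the wrong direction Y -> X
print(f"forward dependence:  {forward:+.3f}")
print(f"backward dependence: {backward:+.3f}")
```

In the true direction the residual behaves like independent noise, so the score stays near zero; in the reverse direction the residual remains visibly dependent on the regressor, which is the asymmetry these methods use to orient an edge.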
3. The Framework
- (1) The Elite Sports Team: The coach, the athletes, and the managers are the practice and problem owners. They raise practical problems about unknown knowledge, yet refuse to trust experimental data from other populations or computations whose rationale remains unclear.
- (2) The Sport Science (SS) Expert: The sports biomechanics expert, the sports physiology expert, and so on, are the holders of known domain knowledge. They can translate practical problems into scientific problems, yet lack clues to the new knowledge required by the team.
- (3) The Causal Inference (CI) Expert: They are the owners of the computational methods, yet they struggle with insufficient understanding of application problems and domain knowledge. This is also the primary reason why most data science research fails to be practically applied in elite sports.
- (1) Process-Based Evaluation: Although causal discovery lacks the statistical evaluation measures of traditional methods, it offers greater interpretability in its computational process. Referencing the criteria coaches and athletes use to adopt knowledge before a competition, cross-domain teams can evaluate the trustworthiness of knowledge at each step and ultimately determine the overall trustworthiness of the entire knowledge discovery process.
- (2) Outcome-Based Evaluation: Without ground truth from RCTs or standard datasets, practice serves as a source of practical feedback for evaluating knowledge utility. Referencing how coaches update beliefs based on post hoc performance, positive outcomes indicate contextual consistency (compatibility with successful performance under specific conditions) rather than universal causal validation. If adopting causally discovered knowledge yields positive outcomes, the knowledge’s practical utility score should increase; if it yields negative outcomes, the score should decrease.
- (1) Process-Based Trustworthiness Score (Tp): This component quantifies the intrinsic trustworthiness of the knowledge discovery process itself, independent of its practical outcomes. si represents the trustworthiness score for each of the nine steps in the framework. It is assessed by the collaborative team and ranges from −2 (indicating a significant flaw or untrustworthy action) to +2 (indicating a highly rigorous and transparent action), with a default value of 0 (step completed but without special validation). The sum of these scores, ∑si, ranges from −18 to +18. Dividing this sum by 36 normalizes it to a scale of −0.5 to +0.5, with a default of 0. This normalized value is then added to a baseline of 0.5, ensuring that Tp is bounded between 0 (a fundamentally flawed process) and 1 (a maximally trustworthy process), with a default of 0.5, as shown in Equation (1):

  $$T_p = 0.5 + \frac{1}{36}\sum_{i=1}^{9} s_i \qquad (1)$$

- (2) Outcome experience factor (A0) dynamically adjusts the score based on empirical results. This factor is calculated by accumulating the number of positive and negative outcomes. Each positive outcome increases trustworthiness but never raises it above 1, while each negative outcome decreases trustworthiness but never lowers it below 0. Thus, the outcome trustworthiness metric serves as an expansion factor for the process trustworthiness score, keeping the score’s values between 0 and 1. In addition, to control the speed of this adjustment, we introduce γ (gamma), a sensitivity parameter (a small positive constant with a default of 1) that controls how quickly the metric responds to new outcomes. A smaller γ makes the model more conservative, requiring more evidence for significant adjustments, while a larger γ allows for a more rapid response.
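The process-based score follows directly from its verbal definition (nine step scores in [−2, +2], summed, divided by 36, and added to a baseline of 0.5), so it can be sketched in a few lines. The function name `process_trust` is hypothetical; the step scores are those reported for the case study in the step-score table of this article. The outcome experience factor is not sketched because its exact formula is not recoverable from the text.

```python
def process_trust(step_scores):
    """Process-based trustworthiness: 0.5 + (sum of nine step scores) / 36.

    Each score must lie in [-2, +2], so the result is bounded in [0, 1].
    """
    if len(step_scores) != 9:
        raise ValueError("expected one score per step (nine steps)")
    if any(not -2 <= s <= 2 for s in step_scores):
        raise ValueError("each step score must lie between -2 and +2")
    return 0.5 + sum(step_scores) / 36

# Step scores for Steps 1-9 as reported in the case-study table (total +16).
case_scores = [2, 2, 2, 1, 2, 2, 1, 2, 2]
case_tp = process_trust(case_scores)
print(f"Tp for the case study: {case_tp:.3f}")
```

With all steps at the default score of 0, the function returns the default of 0.5; the extremes of −18 and +18 map to 0 and 1.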
4. A Case of Olympic Application
- (1) Elite Sports Team: This role was filled by the national race walking team, including the head coach, athletes, and administrative staff. They are the practitioners and the originators of the problem, possessing the most direct on-field experience and the most pressing practical needs. They had observed the team’s performance decline and even instances of heatstroke during competitions in hot weather (such as the 2023 Budapest World Championships), leading them to pose the core practical challenge: “How can we scientifically adjust our tactics to maintain competitiveness in high-temperature environments?”
- (2) Sport Science Expert: This role was undertaken by experts in exercise physiology and sport training from the National Institute of Sport Science. They are the holders of domain expertise, capable of translating the coaching staff’s vague “feelings” and practical issues into a researchable scientific problem. They understand the energy metabolism, thermoregulation mechanisms, and technical-tactical characteristics of race walking, enabling them to provide theoretical support and interpretation for data analysis from physiological and biomechanical perspectives.
- (3) Causal Inference Expert: This role was filled by computational scientists from the National Institute of Sport Science and the Institute of Artificial Intelligence in Sports at Capital University of Physical Education and Sports. They are the experts in computational methods, proficient in data science, statistics, and causal discovery algorithms. They were responsible for processing complex data and for selecting and implementing the most appropriate computational models to uncover the hidden causal relationships within the data, though they had limited professional knowledge of race walking itself.
- Executor: Elite sports team.
- Specific Work: With only a few months left before the Paris Olympics, the national race walking team observed a trend of rising ambient temperatures in recent major competitions and noted that the team’s performance in hot races was unsatisfactory. They urgently needed a data-driven, scientific basis that went beyond individual coaching experience to guide athletes on how to implement more refined pacing, replenishment, and cooling strategies in the anticipated high temperatures of Paris to strive for the best possible results.
- Executor: Sport science expert.
- Specific Work: Upon receiving the practical problem, the sport science expert translated it into a clear scientific research question. Based on knowledge of exercise physiology (e.g., that core body temperature, skin temperature, and metabolic rate are key endogenous variables affecting endurance performance), they proposed the core scientific hypothesis: “Mediated by endogenous variables such as core temperature, skin temperature, and metabolic rate, ambient temperature and humidity have a dynamic and evolving impact on pacing strategy across different segments of a race walk.” This question shifted the research focus from vague “tactical optimization” to the precise “dynamic causal relationship between environmental conditions and split times.”
- Executors: Sport science expert and causal inference expert.
- Specific Work: The two experts collaborated to convert the scientific problem into a computable and operational data science problem.
  - Data Requirements Definition: They determined the need to collect two core types of data: (1) Performance Data: Final times and the most granular split times possible (10 km, 5 km, and 1 km) for elite athletes (top eight finishers) from past major competitions (the Olympics and World Championships) and (2) Environmental Data: Pre-race and post-race ambient temperature and humidity for the corresponding competitions.
  - Analysis Pathway Planning: They planned a three-part analysis: (1) correlation analysis, to initially verify whether a relationship exists between temperature/humidity and performance; (2) causal discovery, to uncover the causal graph among the variables; and (3) regression modeling, to quantify the magnitude of the effects.
- Executors: Sport science expert and causal inference expert.
- Specific Work: Official result lists and on-site weather reports for seven championships (2015–2023) were downloaded from the World Athletics portal; missing meteorological values were linearly interpolated from Weather Spark (0.5° grid, center ≤ 35 km from the venue) to the local official start time (≤15 min discrepancy). Manual splits were transcribed independently by two researchers from the official on-screen timing displayed in China Central Television (CCTV) race feeds; discrepancies were resolved by a third reviewer, yielding perfect inter-rater agreement (ICC = 1.00). Measurement uncertainties (Weather Spark ± 0.6 °C, ±4% RH; timer ± 0.01 s) were recorded for later propagation. Ultimately, a dataset containing the performance and corresponding environmental data for 56 elite female and 56 elite male athletes was collected, as shown in Table 2. In Table 2, WC denotes World Championships, OC denotes Olympic Games, IAAF denotes International Association of Athletics Federations, CCTV denotes China Central Television, WS denotes Weather Spark, and NA indicates missing data. BT denotes ambient temperature before the event, AT denotes ambient temperature after the event, BH denotes ambient humidity before the event, and AH denotes ambient humidity after the event.
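The linear interpolation of a meteorological reading to the official start time can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `interpolate_at`, the minutes-since-midnight time encoding, and the example readings are all hypothetical.

```python
def interpolate_at(t0, v0, t1, v1, t):
    """Linearly interpolate a reading at time t between two bracketing
    observations (t0, v0) and (t1, v1). Times are minutes since midnight."""
    if t0 >= t1:
        raise ValueError("observations must be ordered in time")
    if not t0 <= t <= t1:
        raise ValueError("target time must lie between the two observations")
    weight = (t - t0) / (t1 - t0)
    return v0 + weight * (v1 - v0)

# Hypothetical hourly temperatures (deg C) bracketing an 08:30 start.
temp_at_start = interpolate_at(t0=480, v0=24.0, t1=540, v1=26.0, t=510)
print(f"interpolated start temperature: {temp_at_start:.1f} deg C")
```

Refusing to extrapolate outside the bracketing observations keeps the estimate within the range of the recorded values, which matters when the stated source uncertainty is later propagated.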
- Executor: Causal inference expert.
- Specific Work: The causal inference expert conducted rigorous quality control on the collected raw data. A critical step was to verify the consistency of the environmental data from different sources (World Athletics vs. Weather Spark). By calculating the intraclass correlation coefficient (ICC), they confirmed that the pre-race temperature (ICC = 0.963), post-race temperature (ICC = 0.960), pre-race humidity (ICC = 0.867), and post-race humidity (ICC = 0.865) from both sources were highly consistent (p < 0.001), allowing the data to be merged for analysis. This step ensured the reliability and validity of the subsequent analyses.
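A source-consistency check of this kind can be sketched with a hand-written intraclass correlation. The text does not state which ICC form was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measure); the function name `icc_2_1` and the paired temperature readings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per measured occasion and one
    column per data source (assumed form; the article does not specify it)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-occasion mean square
    ms_cols = ss_cols / (k - 1)                 # between-source mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical pre-race temperatures (deg C): (official report, Weather Spark).
paired_temps = [
    (22.0, 22.4), (27.5, 27.1), (31.0, 31.6), (18.5, 18.9),
    (29.0, 28.6), (24.0, 24.3), (33.0, 32.5),
]
print(f"ICC(2,1) = {icc_2_1(paired_temps):.3f}")
```

An absolute-agreement form is a natural choice when two sources are to be merged, because a constant offset between sources lowers the coefficient instead of being ignored.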
- Executor: Causal inference expert.
- Specific Work: Based on the computational problem and data characteristics, the causal inference expert selected a series of computational methods. First, Pearson correlation (two-tailed, α = 0.05) was used for exploratory analysis. For the core causal discovery phase, considering the potential for unobserved confounding factors in a race walk (such as an athlete’s individual heat acclimatization or in-race cooling interventions), the expert ruled out the PC algorithm, which assumes no latent variables. Instead, they chose the FCI (fast causal inference) algorithm (gCastle 1.2.2, Fisher-Z conditional independence test, α = 0.01, maxP = 3, complete-rule-set), which is capable of handling potential latent variables. This choice reflected a deep understanding of the problem’s complexity and significantly enhanced the robustness of the conclusions.
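The Fisher-Z conditional independence test that drives constraint-based algorithms such as FCI can be sketched with the standard library alone. This is an illustration of the test itself, not of the FCI implementation used in the study: the helper names are hypothetical, the data are a synthetic chain X → Z → Y, and only a single conditioning variable is handled.

```python
import math
import random
from statistics import NormalDist

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: (partial) correlation = 0, using the
    Fisher z-transform with n samples and cond_size conditioning variables."""
    stat = math.sqrt(n - cond_size - 3) * abs(math.atanh(r))
    return 2 * (1 - NormalDist().cdf(stat))

# Synthetic chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

p_marginal = fisher_z_pvalue(pearson(x, y), len(x), 0)
p_given_z = fisher_z_pvalue(partial_corr(x, y, z), len(x), 1)
print(f"p(X indep Y)     = {p_marginal:.3g}")
print(f"p(X indep Y | Z) = {p_given_z:.3g}")
```

An edge is removed when the p-value exceeds the chosen α (0.01 in the configuration described above), so a stricter α removes edges more readily and yields a sparser graph.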
- Executor: Causal inference expert.
- Specific Work: The pipeline was executed in Python 3.11. The correlation analysis revealed a significant positive correlation between ambient temperature and split times, especially in the first half of the women’s race (r(54) = 0.782, p < 0.001). Subsequently, the FCI algorithm was applied to the 10 km split data to allow latent confounders; edges present in ≥80% of 100 bootstrap runs were retained. As a sensitivity check, the same pipeline was re-run after removing the fastest and slowest split sequences from each race (top six), yielding ≥92% edge overlap with the original top eight PAG, confirming stable structure. The computational results (as shown in Figure 2) visually displayed the causal arrows between variables and the potential presence of latent variables (indicated by bi-directed arrows).
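The bootstrap stability filter and the edge-overlap check can be sketched independently of the discovery algorithm. In the sketch below, `toy_discover` is a deliberately simple stand-in (it links variable pairs with |r| > 0.9) and is not FCI; the variable names, the synthetic rows, and the overlap definition (share of reference edges that reappear) are assumptions made for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

NAMES = ("temp", "split_1", "noise")  # hypothetical variable names

def toy_discover(rows):
    """Stand-in for a causal discovery run: returns undirected edges
    between strongly correlated columns (NOT the FCI algorithm)."""
    cols = list(zip(*rows))
    return {
        (NAMES[i], NAMES[j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if abs(pearson(cols[i], cols[j])) > 0.9
    }

def bootstrap_edge_frequency(rows, discover, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each edge is discovered."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        for edge in discover(sample):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}

def stable_edges(freq, threshold=0.8):
    """Keep edges found in at least `threshold` of the resamples."""
    return {edge for edge, f in freq.items() if f >= threshold}

def edge_overlap(reference, other):
    """Share of reference edges that also appear in `other` (assumed metric)."""
    return len(reference & other) / len(reference) if reference else 1.0

# Synthetic rows: split time tracks temperature; the third column alternates.
rows = [(20 + 0.2 * i, 2400 + 3 * i, (-1) ** i) for i in range(56)]
stable = stable_edges(bootstrap_edge_frequency(rows, toy_discover))
print(sorted(stable))
```

Replacing `toy_discover` with a function that wraps an actual FCI run and returns its edge set would reproduce the structure of the procedure described above: 100 resamples, an 80% retention threshold, and an overlap comparison against a trimmed data set.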
- Executors: Sport science expert and causal inference expert.
- Specific Work: This was the crucial step for transforming data into knowledge.
  - The causal inference expert first interpreted the causal graph from a technical perspective: “The results show that pre-race temperature is a direct cause of the first-half split time, which in turn directly affects the second-half split time for female athletes. Furthermore, the bi-directed arrow between pre-race temperature and the first-half split time indicates the presence of an unobserved confounder that affects both.”
  - The sport science expert then translated this technical conclusion into the language of sport science: “This means that high temperature primarily impacts the final result by influencing the pacing during the first half of the race, rather than simply depleting energy in the second half as commonly believed. That ‘confounder’ likely represents individualized interventions, such as pre-race ice vests or in-race water dousing for cooling, which may modulate the body’s response to external heat and thereby influence the chosen pace for the first half.”
- Executors: The entire team, with the final delivery to the elite sports team.
- Specific Work: The team translated the interpreted causal insights into direct, actionable preparation advice for the national team. The final report highlighted: (1) Core Insight: The impact of high temperature on performance primarily manifests in the first half of the race. Therefore, cooling and replenishment strategies must be actively implemented from the very beginning, not just when fatigue sets in during the second half. (2) Tactical Recommendation: It was recommended that when formulating tactics for the Paris Olympics, the team should pre-plan their first-half pacing strategy and cooling plan based on the forecasted starting temperature in order to conserve energy and create favorable conditions for the final push. This report was submitted to the national team one week before the race, providing a crucial quantitative reference for their final decisions.
- In the context of real-world applications in elite sports, we often lack a “gold standard” for evaluation (such as a pre-known ground-truth causal graph or parallel randomized controlled trials, RCTs). Therefore, we adopt the “field evaluation” method defined in the framework. This method comprehensively assesses the trustworthiness of the knowledge discovery from two dimensions: process and outcome.
- Process-Based Trustworthiness
- Outcome-Based Evaluation
5. Discussion
5.1. Principal Findings
- (1) Quality-controlled, time-synchronized public data can yield actionable causal hypotheses.
- (2) The field evaluation trustworthiness score heuristic offers transparent, Delphi-anchored trust quantification in the absence of RCT ground truth.
- (3) Multi-domain collaboration and co-interpretation are essential to turn causal graph edges into coach-ready language.
5.2. Methodological Rigor and Limitations
- (1) Evaluation Heuristic: The process-based trustworthiness score is intentionally a decision-support tool, not a validated psychometric index. We anchored it to the Delphi consensus protocol and the mean opinion score; however, recruiting independent raters was not feasible within the Olympic timeline. Future work should invite external panels and compare the score against experimental benchmarks.
- (2) FCI Robustness: The FCI parameters (Fisher-Z, α = 0.01, maxP = 3, complete-rule-set) and bootstrap stability settings (≥80%, B = 100) are fully disclosed. A sensitivity analysis that trimmed the fastest and slowest split sequences per race (top six) yielded ≥92% edge overlap, indicating that the temperature → first-half pacing pathway is robust to athlete-selection perturbations. Nevertheless, the observational nature of the data precludes definitive causal claims; the causal graph should be treated as hypothesis-generating.
- (3) Bias Mitigation: We adopted (i) double-blind problem translation, (ii) mandatory “uncertainty edge” review, and (iii) bootstrap stability thresholds. These reduce but do not eliminate the confirmation bias inherent when the same team scores its own process, which we acknowledge as a limitation.
- (4) Outcome-Validation Decoupling: We explicitly distinguish between the competitive success observed in the case study and the validation of causal claims. The Olympic outcome demonstrates that the framework-generated insights were contextually consistent with successful performance under specific environmental conditions, not that the causal discovery algorithm ‘caused’ the victory. This aligns with the framework’s purpose as a decision-support tool rather than a deterministic predictor. Elite sport performance is influenced by numerous uncontrolled factors including training regimens, athlete preparation, competition dynamics, and chance; our framework provided one evidence-informed input among many preparation strategies.
5.3. Applicability Boundaries
5.4. Future Directions
- (1) Experimental Validation: Crossover heat-chamber trials to test the temperature → pacing hypothesis under controlled conditions.
- (2) Multi-Modal Challenges: The integration of multi-modal data (IMU, video, HRV, and wearables) implies more variables and longer pathways, necessitating the introduction of causal discovery for time series, a topic not yet addressed in this paper.
- (3) Semi-Automation: A web-based “nine-step wizard” that guides coaches through problem translation, data upload, bootstrap FCI, and auto-generation of action cards.
- (4) Repository: An open repository of anonymized case studies (rowing, swimming, and cycling) to serve as a practical playbook.
5.5. Concluding Statement
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
1. Rangasamy, K.; As’ari, M.A.; Rahmad, N.A.; Ghazali, N.F.; Ismail, S. Deep learning in sport video analysis: A review. TELKOMNIKA Telecommun. Comput. Electron. Control 2020, 18, 1926–1933.
2. Caprioli, L.; Romagnoli, C.; Campoli, F.; Edriss, S.; Padua, E.; Bonaiuto, V.; Annino, G. Reliability of an Inertial Measurement System Applied to the Technical Assessment of Forehand and Serve in Amateur Tennis Players. Bioengineering 2025, 12, 30.
3. Rajšp, A.; Fister, I. A systematic literature review of intelligent data analysis methods for smart sport training. Appl. Sci. 2020, 10, 3013.
4. Wu, F.; Wang, Q.; Bian, J.; Ding, N.; Lu, F.; Cheng, J.; Dou, D.; Xiong, H. A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications. IEEE Trans. Multimed. 2023, 25, 7943–7966.
5. Cochrane, A. Effectiveness and Efficiency: Random Reflections on Health Services; Nuffield Provincial Hospitals Trust: London, UK, 1972; Available online: https://www.nuffieldtrust.org.uk/research/effectiveness-and-efficiency-random-reflections-on-health-services (accessed on 5 June 2025).
6. Périard, J.D.; Wilson, M.G.; Tebeck, S.T.; Gilmore, J.B.; Stanley, J.; Girard, O. Influence of the Thermal Environment on Work Rate and Physiological Strain during a UCI World Tour Multistage Cycling Race. Med. Sci. Sports Exerc. 2022, 55, 32–45.
7. Liu, X.; Ornelas, E.; Shi, H. Was COVID-19 a Game Changer for the Tokyo and Beijing Olympics? J. Sports Econ. 2024, 25, 866–886.
8. Schlembach, C.; Schmidt, S.L.; Schreyer, D.; Wunderlich, L. Forecasting the Olympic medal distribution–A socioeconomic machine learning model. Technol. Forecast. Soc. Change 2022, 175, 121314.
9. Franklin, J.M.; Patorno, E.; Desai, R.J.; Glynn, R.J.; Martin, D.; Quinto, K.; Pawar, A.; Bessette, L.G.; Lee, H.; Garry, E.M.; et al. Emulating randomized clinical trials with nonrandomized real-world evidence studies: First results from the RCT DUPLICATE initiative. Circulation 2021, 143, 1002–1013.
10. Knechtle, B.; Weiss, K.; Valero, D.; Scheer, V.; Villiger, E.; Nikolaidis, P.T.; Andrade, M.; Cuk, I.; Gajda, R.; Rosemann, T.; et al. Change in elevation predicts 100 km ultra marathon performance. Sci. Rep. 2025, 15, 25176.
11. Racinais, S.; Havenith, G.; Aylwin, P.; Ihsan, M.; Taylor, L.; Adami, P.E.; Adamuz, M.-C.; Alhammoud, M.; Alonso, J.M.; Bouscaren, N.; et al. Association between thermal responses, medical events, performance, heat acclimation and health status in male and female elite athletes during the 2019 Doha World Athletics Championships. Br. J. Sports Med. 2022, 56, 439–445.
12. Bullock, G.S.; Hughes, T.; Arundale, A.H.; Ward, P.; Collins, G.S.; Kluzek, S. Black Box Prediction Methods in Sports Medicine Deserve a Red Card for Reckless Practice: A Change of Tactics is Needed to Advance Athlete Care. Sports Med. 2022, 52, 1729–1735.
13. Glymour, C.; Zhang, K.; Spirtes, P. Review of causal discovery methods based on graphical models. Front. Genet. 2019, 10, 524.
14. Shrier, I.; Platt, R.W. Reducing bias through directed acyclic graphs. BMC Med. Res. Methodol. 2008, 8, 70.
15. Nuzzo, J.L.; Finn, H.T.; Herbert, R.D. Causal mediation analysis could resolve whether training-induced increases in muscle strength are mediated by muscle hypertrophy. Sports Med. 2019, 49, 1309–1315.
16. Kalkhoven, J.T. Athletic Injury Research: Frameworks, Models and the Need for Causal Knowledge. Sports Med. 2024, 54, 1121–1137.
17. Miliani, M.; Auriemma, S.; Bondielli, A.; Chersoni, E.; Passaro, L.; Sucameli, I.; Lenci, A. ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025; Che, W., Nabende, J., Shutova, E., Eds.; Association for Computational Linguistics: Vienna, Austria, 2025; pp. 17335–17355.
18. Aristotle. Aristotle Physics; Public Domain, 350 BC; Peripatetic Press: Merrimack, NH, USA, 1980.
19. Aristotle. Aristotle Metaphysics; Public Domain, 350 BC; Penguin Classics: New York, NY, USA, 1999.
20. Bacon, F. Novum Organum; Public Domain; Bottom of the Hill Publishing: San Francisco, CA, USA, 1620.
21. Galilei, G. Dialogues Concerning Two New Sciences; Public Domain; Louis Elzevir: Leiden, The Netherlands, 1638.
22. Newton, I. Mathematical Principles of Natural Philosophy; Royal Society: London, UK, 1687.
23. Abdelbaky, F.M. Impacts of Mental Toughness Program on 20 Km Race Walking. Ovidius Univ. Ann. Ser. Phys. Educ. Sport/Sci. Mov. Health 2012, 12, 67–71.
24. Silva Gde, O.S.; Bredt Sda, G.T.; Praça, G.M. Does experience mitigate the deleterious effect of mental fatigue on tactical performance? A study in youth soccer academies. Int. J. Sports Sci. Coach. 2025, 21, 154–169.
25. Angrist, J.D.; Pischke, J.S. Mostly Harmless Econometrics: An Empiricist’s Companion; Princeton University Press: Princeton, NJ, USA, 2009.
26. Goldberger, A.S. Structural equation methods in the social sciences. Econom. J. Econom. Soc. 1972, 40, 979–1001.
27. Guo, C.; Hu, X.; Xu, C.; Zheng, X. Association between Olympic Games and children’s growth: Evidence from China. Br. J. Sports Med. 2022, 56, 1110–1114.
28. Imbens, G.W.; Rubin, D.B. Causal Inference in Statistics, Social, and Biomedical Sciences; Cambridge University Press: Cambridge, UK, 2015; Available online: https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB (accessed on 4 February 2026).
29. Gibbs, C.P.; Elmore, R.; Fosdick, B.K. The causal effect of a timeout at stopping an opposing run in the NBA. Ann. Appl. Stat. 2022, 16, 1359–1379.
30. Pearl, J. Causality; Cambridge University Press: Cambridge, UK, 2009.
31. Landes, J.; Osimani, B.; Poellinger, R. Epistemology of causal inference in pharmacology. Eur. J. Philos. Sci. 2018, 8, 3–49.
32. Spirtes, P.; Zhang, K. Causal discovery and inference: Concepts and recent methodological advances. Appl. Inform. 2016, 3, 3.
33. Malinsky, D.; Danks, D. Causal discovery algorithms: A practical guide. Philos. Compass 2018, 13, e12470.
34. Vowels, M.J.; Camgoz, N.C.; Bowden, R. D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery. ACM Comput. Surv. 2023, 55, 1–36.
35. Spirtes, P.; Glymour, C. An Algorithm for Fast Recovery of Sparse Causal Graphs. Soc. Sci. Comput. Rev. 1991, 9, 62–72.
36. Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search; MIT Press: Cambridge, MA, USA, 2001; Available online: https://books.google.com/books?hl=zh-CN&lr=&id=OZ0vEAAAQBAJ&oi=fnd&pg=PR9&dq=causation+prediction+and+search&ots=08GOsAn13b&sig=PwZlgAVd0GVZEjMTvwtl1FnJKlg (accessed on 15 March 2025).
37. Chickering, D.M. Optimal structure identification with greedy search. J. Mach. Learn. Res. 2002, 3, 507–554.
38. Hume, D. A Treatise of Human Nature; Public Domain; International Relations and Security Network: Zürich, Switzerland, 1739; Available online: https://www.gutenberg.org/ebooks/4705 (accessed on 4 February 2026).
39. Russell, B. Our Knowledge of the External World; Routledge: Oxfordshire, UK, 1914; Available online: https://archive.org/details/bub_gb_FC0djL2CDNgC/page/n21/mode/2up (accessed on 4 February 2026).
40. Lewis, D. Counterfactuals; Blackwell Publishers: Oxford, UK, 1973; Available online: https://philpapers.org/rec/LEWC-2 (accessed on 4 February 2026).
41. Shimizu, S.; Hoyer, P.O.; Hyvärinen, A.; Kerminen, A. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 2006, 7, 2003–2030.
42. Amar, D.; Sinnott-Armstrong, N.; Ashley, E.A.; Rivas, M.A. Graphical analysis for phenome-wide causal discovery in genotyped population-scale biobanks. Nat. Commun. 2021, 12, 350.
43. Marmarelis, M.G.; Hasan, A.; Azizzadenesheli, K.; Alvarez, M.; Anandkumar, A. Off-policy Predictive Control with Causal Sensitivity Analysis. In Proceedings of the Forty-First Conference on Uncertainty in Artificial Intelligence, PMLR, Rio de Janeiro, Brazil, 21–25 July 2025; pp. 2958–2972.
44. Wang, Z.; Ma, P.; Xue, Z.; Dai, Y.; Ji, Z.; Wang, S. Privacy-preserving and Verifiable Causal Prescriptive Analytics. Proc. ACM Manag. Data 2025, 3, 1–27.
45. Carvalho, D.D.; Goethel, M.F.; Erblang, M.; Vilas-Boas, J.P.; Pyne, D.B.; Fernandes, R.J.; Lopes, P. Impact of an Overload Period on Heart Rate Variability, Sleep Quality, Motivation, and Performance in High-level Swimmers: Use of Explainable Artificial Intelligence (XAI) to Assess Training Load Variations. Sports Med. 2025, 1–20.
46. Wang, M.; Yang, Y.; Li, F.; Liu, L.; Cao, W. A novel robust mixture-of-experts model with causal priors for interpretable water quality diagnosis. Adv. Eng. Inform. 2026, 71, 104267.
47. Hasson, F.; Keeney, S.; Mckenna, H. Research guidelines for the Delphi survey technique. J. Adv. Nurs. 2000, 32, 1008–1015.
48. Viswanathan, M.; Viswanathan, M. Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale. Comput. Speech Lang. 2005, 19, 55–83.


| Class | Representative Algorithms | Philosophical Logic | Mathematical Foundation | Computational Output | Genomics Use-Scenario |
|---|---|---|---|---|---|
| Constraint-based | PC, FCI | Difference making | Non-parametric CI tests; Causal Markov + Faithfulness | CPDAG/PAG | Sparse regulatory networks |
| Score-based | GES, GFCI | Explanatory factors | Gaussian/multinomial likelihood + sparsity penalty | CPDAG/PAG | High-dimensional genomics |
| Functional causal models | LiNGAM, ANM, PNL | Causal asymmetry | Structural equation Y = f(X,e), e independent of X | DAG + noise-residual function | Where directionality is crucial |

| Year | Game | 20 km Mark | 10 km Mark | 5 km Mark | 1 km Mark | BT | AT | BH | AH |
|---|---|---|---|---|---|---|---|---|---|
| 2015 | WC | IAAF | IAAF | IAAF | NA | IAAF | WS | IAAF | WS |
| 2016 | OC | IAAF | CCTV | NA | NA | IAAF | WS | WS | WS |
| 2017 | WC | IAAF | IAAF | IAAF | NA | IAAF | IAAF | IAAF | IAAF |
| 2019 | WC | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF |
| 2021 | OC | IAAF | CCTV | CCTV | CCTV | IAAF | WS | WS | WS |
| 2022 | WC | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF |
| 2023 | WC | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF | IAAF |

| Step | Score (sᵢ) | Justification |
|---|---|---|
| Step 1: Formulate practical problem | +2 | Highly relevant and urgent: the problem was raised directly by the national team and directly linked to Olympic preparations, giving it immense practical value and timeliness. |
| Step 2: Define scientific problem | +2 | Precise translation: sports scientists successfully translated the coach’s vague concerns into a clear, testable scientific hypothesis focusing on “dynamic impact.” |
| Step 3: Design computational problem | +2 | Efficient collaboration: sports science and causal inference experts collaborated closely to jointly define clear data requirements and a multi-stage analysis pathway. |
| Step 4: Collect data | +1 | Resourceful but challenging: the team successfully collected data from multiple sources, but the need for manual entry indicated that the data were not readily available. |
| Step 5: Validate data | +2 | Rigorous methodology: the consistency of data from multiple sources was rigorously validated using the ICC, laying a solid foundation for the reliability of later analyses. |
| Step 6: Select computational method | +2 | Prudent selection: considering potential confounders, the FCI algorithm, capable of handling latent variables, was chosen over the simpler PC algorithm, reflecting a deep understanding of the problem’s complexity. |
| Step 7: Execute computation | +1 | Standard execution: the computation was executed smoothly as planned, producing the expected correlation matrices and causal graph, thus completing the technical task. |
| Step 8: Interpret computational results | +2 | Key to interdisciplinary fusion: this was a critical step for value realization. Tri-domain experts jointly interpreted the results, translated the abstract causal graph into a meaningful concept of potential interventions, and made the conclusion understandable. |
| Step 9: Solve practical problem | +2 | Highly actionable: the final conclusion was distilled into a direct, clear tactical recommendation rather than remaining at a theoretical level, successfully closing the loop. |
| Total Score | +16 | The overall process is highly trustworthy. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cui, D.; Jiang, Z.; Zhang, X.; Yang, W.; He, Z. A Multi-Domain Collaborative Framework for Practical Application of Causal Knowledge Discovery from Public Data in Elite Sports. Appl. Syst. Innov. 2026, 9, 43. https://doi.org/10.3390/asi9020043

