1. Introduction and Background
Professional basketball, particularly within the National Basketball Association (NBA), is a high-stakes environment where the competition is fierce, and the performance of players under pressure can determine the outcome of games [1]. Among the variety of factors influencing the success of an NBA team, the concept of “clutch performance” stands out as a critical element [2]. Clutch performance refers to a player’s ability to excel in the crucial final minutes of closely contested games. The NBA introduced the Clutch Player of the Year award, also known as the Jerry West Award, in the 2022–2023 season to recognize the player who excels most in high-pressure situations. This award is determined by a media panel, including sports journalists and broadcasters, based on nominations from NBA head coaches. The panel votes to select the player who has performed best in clutch moments throughout the season [3,4].
This research investigates clutch performance in the NBA, focusing on the final minutes of tightly contested games. Utilizing a novel approach that combines advanced data science techniques with in-depth statistical analysis, we aim to uncover the key determinants of successful outcomes during these critical moments. By introducing the Estimation of Clutch Competency (EoCC) metric, we provide a comprehensive assessment of a player’s impact in clutch scenarios. This study examines player performance statistics over 20 seasons, addressing a significant gap in the literature regarding the quantification of clutch moments and challenging conventional wisdom in basketball analytics.
The essence of professional basketball lies in the strategic execution of offensive possessions, each capped at 24 s, contributing to the game’s brisk rhythm. On average, NBA teams engage in approximately 100 ball possessions per game, with each successful basket incrementally increasing the team’s score [5]. However, not all possessions hold equal value; those occurring in the final tense minutes of a closely contested game carry amplified significance. It is during these critical junctures that the term “clutch” emerges, defining a player’s ability to excel under immense pressure [6]. Clutch moments, often unfolding in the fourth quarter or overtime, demand more than technical skill. They require a blend of mental fortitude, including confidence, resilience, and experience, alongside the willingness to shoulder the outcome of crucial plays. These attributes, while pivotal, elude direct quantification, posing a challenge in evaluating a player’s composure and confidence in high-stakes scenarios [7].
The concept of clutch performance examines the variance between a player’s regular performance and their ability to rise to the occasion during critical game phases. Notably, existing evidence contradicts the common belief that performance is consistently elevated in clutch situations: players’ shooting percentages do not typically increase during clutch moments compared to regular game statistics [8]. This finding is corroborated by a growing body of research, which suggests that significant improvements in late-game situations are not as prevalent as previously assumed. These insights guided this research to adopt a metric that emphasizes the absolute clutch performance of players, rather than their performance relative to non-clutch situations, providing a more objective assessment of their impact on the game during its most critical junctures [9].
“Basketball on Paper” by D. Oliver [10] contributes to quantifying basketball’s complex elements, much as statistical approaches revolutionized baseball. Oliver’s methodology, underlined by a detailed analysis of a 1997 NBA Finals game, extends to evaluating team strategies, player contributions, and coaching impacts through advanced statistics. The work introduces metrics, such as individual win–loss records, showing how statistical evaluations can inform more nuanced personnel decisions. This synthesis is targeted not only at basketball enthusiasts but also at sports professionals exploring the potential of quantitative analysis for strategic enhancements.
A study found that the performance of NBA teams is significantly influenced by the scheduling of games, particularly with regard to back-to-back fixtures [11]. The investigation revealed that an increase in the probability of winning is significantly associated with having at least one day of rest as opposed to playing games back-to-back. Furthermore, it was observed that shooting efficiency, along with other performance metrics, varied significantly across different scheduling conditions. This implies that game schedules and rest periods are pivotal factors affecting the performance outcomes of NBA teams. It suggests that modifications in game scheduling to ensure adequate rest could lead to enhancements in performance and a potential reduction in injury risks, providing valuable insights for optimizing game strategies and training loads. Another study analyzed the effectiveness of different offensive strategies in the final moments of close NBA games. It found that transition, inbound, and complex team plays were the most effective for scoring, emphasizing the value of quick motion and cooperative actions. The research suggests that strategic play choices, particularly those that enhance spatial dynamics, significantly impact success in crunch time scenarios [12].
This research aims to bridge the analytical gap in understanding clutch performance in basketball through a dual approach. Firstly, it employs data mining (DM) techniques to identify the key factors that influence winning outcomes in tightly contested games. Secondly, it leverages these findings, along with relevant literature, to propose new metrics for ranking the NBA’s most skilled clutch performers over the last two decades, utilizing player statistics from the 1996–1997 through 2017–2018 seasons.
The motivation behind this study stems from a noticeable gap in the scientific literature on basketball players’ clutch performance, which is compounded by a scarcity of pertinent data. Advanced basketball analytics such as the Player Efficiency Rating (PER) face implementation challenges in the context of last-minute statistics, and more nuanced metrics, such as Value Over Replacement Player (VORP) and Usage Percentage (USG%), are either unmeasured or inapplicable in these crucial moments. Despite these hurdles, our research is grounded in a rigorous statistical approach, aiming to deliver insights that are not only highly relevant to understanding clutch performance but also resonate with the analytical framework of the NBA.
The novelty and contribution of this study to the state of the art in the domain of Sports Analytics, specifically in the context of basketball and the NBA, are multifaceted and significantly enrich the existing literature and analytical practices. It introduces innovative DM techniques to dissect the intricacies of clutch performance in basketball, focusing on the last-minute plays that often decide the outcomes of closely contested games. This approach is novel in its application of a comprehensive array of statistical methods and machine learning (ML) algorithms to identify and quantify the key factors that contribute to winning in high-pressure situations. The research methodically filters and analyzes data spanning over two decades (1997 to 2018), encompassing a vast array of player performances in clutch moments, which itself is a considerable accomplishment given the traditional challenges associated with accessing and interpreting such specific datasets.
The comprehensive analysis, grounded in DM and ML, not only challenges conventional wisdom but also sets a new benchmark for future research in sports analytics. By interpreting clutch performance and introducing new evaluative metrics, the research contributes to both theoretical and practical aspects of basketball analytics, offering valuable insights for coaches, analysts, and the broader sports analytics community. In summary, the novelty lies in the sophisticated analysis of clutch moments through advanced DM techniques, and the contribution is the development of a new formula, Formula (1), for clutch performance evaluation, filling a critical gap in sports analytics literature. This work not only advances our understanding of key performance indicators in basketball but also enhances the analytical tools available for strategic decision making in sports.
Related Work
The journey of sports analytics is a transformative narrative, tracing its impact across diverse sporting domains before homing in on its noteworthy influence within basketball, notably within the NBA. This exploration underscores both the evolution of analytical methodologies and the innovative application of data science in enhancing our comprehension of sports performance. This study builds on that evolution by introducing the EoCC metric.
Clutch performances in competitive sports are described as exceptional performances occurring in critical competition situations [13]. The nature, conditions, and components of clutch performances are examined, and comparisons are made with other types of performances [14,15]. The implications for sports psychology and coaching are discussed as well. Another study discussed the concept of clutch performance from a positive psychology perspective, which focuses on the strengths and virtues that enable individuals to thrive [5]. The paper argues that clutch performance is a manifestation of positive emotions, flow, resilience, and self-efficacy. The review establishes clutch performance as a real and measurable phenomenon associated with positive psychological states, coping skills, and motivational factors [13,16].
A study on the clutch player phenomenon examined factors impacting scoring probabilities under pressure in basketball, including player performance during clutch times and the “hot player” phenomenon. This study utilized data from NBA seasons between 1996/97 and 2020/21, employing methodologies like LASSO logistic regression to evaluate player-selection policies based on these properties. The findings demonstrated a significantly higher success rate when the shot was taken by the highest-ranked player compared to commonly taken clutch shots [17]. That work centers on scoring; our EoCC metric complements it by incorporating both offensive and defensive contributions, providing a more balanced evaluation of a player’s clutch capabilities.
Another investigation employs a structural equation model to predict sport performance under pressure, considering personality traits, anxiety, self-focus, perceived control, and implicit knowledge. The findings suggest that reinvesting attention in the task enhances performance, while anxiety and self-focus impair it, emphasizing the role of perceived control [9]. Our work extends these findings by quantifying performance based on data analysis over 20 NBA seasons.
The relationship between clutch performance and team cohesion in basketball is explored through a mixed-methods approach, analyzing data from 16 university basketball teams. The study identifies a positive correlation between clutch performance and team cohesion, which is influenced by factors such as leadership, communication, and confidence [18].
In a qualitative study, the role of self-regulation in clutch performance among elite athletes is investigated through semi-structured interviews. The analysis reveals the critical role of self-regulation, with athletes employing various strategies to regulate cognition, emotion, motivation, and behavior [19].
Clutch performance has been scrutinized through various methodological lenses across different sports, including golf, baseball, soccer, and tennis. In professional golf, a multilevel modeling analysis of performance data from 1029 golfers across 1143 tournaments between 2010 and 2019 suggested that individual, situational, and contextual factors such as experience, skill, personality, pressure, and competition influence clutch performance [20]. In contrast, our research is specific to basketball, and our use of advanced DM and ML techniques allows us to decode the complex interactions that define clutch performance in the NBA.
In baseball, researchers used a Bayesian framework to estimate the clutch ability of batters and pitchers based on their 2019 MLB season performance data. Significant variation in clutch performance among players is observed without a strong relation to overall performance or salary [21].
In tennis, researchers used a network analysis of point-by-point data from the 2019 Grand Slam tournaments. The study reveals that clutch performance is a complex and dynamic phenomenon affected by the network structure and properties of the players, such as centrality, prestige, and reciprocity [22].
In soccer, the impact of clutch performance on team success was studied using a data-driven approach that measures the clutch performance of players and teams based on goal-scoring data from the 2018–2019 English Premier League season [8,23]. A positive relationship between clutch performance and team success was found, which was influenced by team and player characteristics such as quality, style, and position [16,24].
The evolution of sports analytics is marked by the increasing application of data analytics across different sports, aiming to address long-standing questions and draw insightful conclusions. A notable area of interest has been the intersection of genetics and training in determining athletes’ potential [25,26]. The advent of analytics has empowered researchers to explore beyond conventional wisdom, utilizing databases of athletes’ physical characteristics and employing DM to predict young athletes’ potential in various sports [27,28].
Performance analysis research has been categorized into three distinct types: fundamental theory-driven, practice-oriented applied science, and atheoretical over-simplistic research. Integrating theoretical frameworks with advanced methodological tools is crucial for advancing beyond simple descriptive statistics toward predictive and actionable insights in sports science [29].
A deep learning model was designed to optimize basketball plays by predicting the best shot-taker and identifying the most strategic lineup based on game conditions. By leveraging data on players and games, the approach was aimed at enhancing strategic decision making in real time, increasing the likelihood of scoring [30].
Another deep learning model, Multiple bidirectional Encoder Transformers for Injury Classification (METIC), was developed to forecast injuries in NBA basketball by analyzing the longitudinal data of NBA player injuries and incorporating game activity and player statistics. This model, which surpasses traditional ML methods in performance, employs feature learning to generate interactive features that become significant in combination with each other, offering practical insights for athlete management to potentially reduce injury incidence [31].
Using Classification and Regression Trees (CART) to assess shooting performances in basketball, researchers analyzed play-by-play data from both the Basketball Champions League and the NBA. The aim was to calculate shooting probabilities under various pressures and develop an index to measure individual shooting performance, taking shot difficulty into account. This analysis revealed how shooting effectiveness is influenced by factors such as the remaining game time, the score gap, and the outcomes of previous shots, demonstrating the significant role of big data in enhancing sports analytics and team strategy through an understanding of player performance in critical situations [32].
A study discussing the significant role of data analytics in enhancing decision making in team sports emphasized the shift from questioning the utility of analytics to focusing on how deeply and rapidly it should be integrated into sports strategies. It highlighted how analytics has become pivotal in player recruitment, game planning, and health management, driven by league initiatives, technological advancements, and successful case studies. The study also underscored the critical balance between analytics-driven strategies and traditional methods, suggesting a more nuanced approach to incorporating analytics into sports management and decision making [33]. Our research bridges this gap by applying ML algorithms specifically to clutch scenarios, developing the EoCC metric, which evaluates and ranks players based on their impact in high-pressure moments.
Evaluations of player statistics have expanded to include advanced metrics like Adjusted Plus–Minus (APM), shedding light on a player’s impact on the game beyond basic scorelines [34,35]. Research has also explored the challenges of quantifying defensive efforts, with innovations such as the “Dwight Effect” illustrating the complexity of defensive analytics [36].
The authors of [8] reviewed, synthesized, and evaluated existing research on clutch performance, which is defined as improved performance under pressure in sport and exercise. The main characteristics, antecedents, and consequences of clutch performance are identified alongside the gaps and limitations in the current literature. Recommendations for future research and practice are also provided.
The exploration extends to DM techniques that have been applied in sports analytics, from the Apriori algorithm for uncovering relationships between variables to neural networks and linear regression for performance prediction and evaluation. These methodologies have opened new avenues for understanding sports dynamics, offering tools for predicting outcomes like Hall of Fame inductions based on statistical data [14].
In this context, this research contributes by employing DM and ML to decode clutch performance in the NBA. Through an analysis of player performance during high-pressure, last-minute plays, this study introduces novel metrics for evaluating clutch performance, addressing gaps in traditional analytics. By integrating advanced analytical techniques with a comprehensive review of existing literature, this study enhances our understanding of clutch moments in basketball and establishes a benchmark for future research in sports analytics.
2. Data and Methods
This section presents the key components of our study of clutch performance in the high-stakes environment of NBA basketball. It is structured into four subsections: Aim and Objectives, Methodology, Data Engineering, and Machine Learning Algorithms and Methodologies. Each subsection builds upon the previous one, covering in turn the objectives we set out to achieve, the methodological approach we adopt, the data engineering processes we undertake, and the data analysis we perform.
2.1. Aim and Objectives
The primary aim of this study is to employ advanced DM techniques to dissect the pivotal moments of NBA games, focusing on clutch performance in high-pressure situations. Specifically, the objectives are twofold:
First, we aimed to identify key factors that contribute to successful outcomes in tightly contested NBA games, focusing on the critical last minutes of gameplay. Utilizing advanced data analytics, we examined individual player statistics as predictors to determine which actions, such as scoring, offensive rebounds, steals, or blocks, most significantly influence the probability of winning these clutch moments. This approach not only highlights the specific contributions of players but also addresses the complex dynamics of basketball where individual performances can pivotally impact game results.
Second, building on this analysis, we developed a novel metric, namely EoCC, to evaluate and rank players based on their clutch performances over the last two decades. The EoCC, given by Formula (1), incorporates a comprehensive array of performance indicators, integrating both offensive and defensive statistics to provide a balanced and nuanced view of player impact in clutch situations. Formula (1) has been designed to appreciate the multifaceted nature of basketball performance, recognizing not just the actions themselves but also the context in which they occur, offering a sophisticated tool for assessing the clutch capability of NBA players.
2.2. Methodology
The cornerstone of the study, data acquisition, is a multifaceted endeavor that presents both opportunities and challenges in the context of modern data science. We navigated these complexities by sourcing basketball analytics from accredited online platforms, combining direct downloads with Python-based web scraping techniques. This dual-method approach was instrumental in compiling a dataset that spans the NBA seasons from 1996 to 2018, ensuring both breadth and depth in our analysis. Recognizing the importance of data quality, we then carried out exhaustive preprocessing. This critical phase involved cleansing, variable filtering, and feature transformation to refine the dataset. Our rigorous efforts in preprocessing laid a solid foundation for accurate and reliable analysis [37,38,39].
The preprocessing addressed data inconsistencies, errors, and outliers, ensuring the dataset’s integrity. This process was essential for maintaining the study’s analytical rigor, with a particular focus on refining variables crucial for clutch performance analysis. By establishing a statistical threshold for game participation, we focused our analysis on players with a significant presence in clutch moments. This approach not only enhanced the study’s validity but also illuminated the nuances of clutch performance. The dataset was streamlined by eliminating irrelevant variables and transforming key attributes to align closely with our research objectives. This targeted refinement was pivotal in focusing the analysis on the most impactful factors in clutch performance.
Figure 1 illustrates the data preprocessing procedure for the NBA clutch performance analysis, outlining a structured workflow that begins with ‘Data Cleansing’. In this initial phase, outliers (players with inconsistently high statistics) are identified and addressed to ensure data quality. Additionally, to achieve uniformity across datasets, only the top 50 players are retained, enhancing homogeneity. The next step, ‘Variable Filtering’, involves the removal of certain variables such as ‘Team’, ‘Age’, ‘Fantasy Points’, ‘DD2’, ‘TD3’, and ‘Plus/Minus’, deemed extraneous for the study’s focus, streamlining the dataset. The final stage, ‘Data Transformation’, includes adding a ‘Relevant Year’ attribute to distinguish performance across different seasons for the same player and introducing a ‘Winning Percentage’ as the target variable to measure the impact of players’ performance on game outcomes.
In addition, we used the True Shooting Percentage (TS%), a vital metric for evaluating basketball players that combines field goals, 3-point field goals, and free throws into a single efficiency measure. Its formula, which divides the total points scored by a weighted combination of shooting attempts, offers a comprehensive assessment of a player’s scoring prowess. TS% is particularly useful for assessing clutch performance, reflecting how effectively a player converts scoring opportunities during critical moments of a game.
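As a minimal sketch, TS% can be computed with the widely used definition TS% = PTS / (2 × (FGA + 0.44 × FTA)). The source does not print the formula, so the 0.44 free-throw weight and the function and argument names below are assumptions based on that standard definition, not the study's own code.

```python
def true_shooting_pct(points, fga, fta):
    """Return TS% as a fraction, or None when there are no shooting attempts.

    Assumed standard definition: PTS / (2 * (FGA + 0.44 * FTA)), where
    FGA counts all field goal attempts (2-point and 3-point) and FTA
    counts free throw attempts.
    """
    weighted_attempts = 2 * (fga + 0.44 * fta)
    if weighted_attempts == 0:
        return None  # undefined for a player with no attempts
    return points / weighted_attempts


# Hypothetical clutch line: 30 points on 20 field goal and 10 free throw attempts.
example_ts = true_shooting_pct(points=30, fga=20, fta=10)
```

Because three-pointers raise the numerator without adding extra attempts to the denominator, the measure rewards efficient scoring from any range.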
2.3. Data Engineering
The process of data collection, embodying the systematic gathering of information on pertinent variables from a multitude of sources, is undertaken with the objective of addressing data-centric inquiries or facilitating data-driven projects. It encompasses the integration of diverse data types and sources, ranging from observational data to digital formats such as .csv and text files. The web scraping technique is used for the automated extraction of online data. For the acquisition of data in this project, traditional basketball statistics spanning from the NBA season of 1996–1997 through to the 2017–2018 season were utilized, resulting in the compilation of 44 .csv files, with each season providing separate files for the regular season and the playoffs [37,38]. The methodological approach to data collection was twofold: firstly, the direct download of accessible Excel workbooks was employed wherever feasible; secondly, data-scraping techniques were applied, utilizing the Python programming language and its associated libraries to gather the required datasets.
An initial filtering of data at the source was conducted to include only those games where the point differential was five or less, so that the analysis would focus solely on very close games, and to limit the statistics to the last three minutes of gameplay. This duration was deemed a reliable basis for distinguishing between random occurrences and consistent behaviors, and it was sufficiently close to the game’s conclusion to ensure the engagement of the best players in the pursuit of victory.
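The clutch window described above (last three minutes, point differential of five or less) can be expressed as a simple predicate. This is only an illustrative sketch: the study applied the filter at the data source, and the field names, the period encoding, and the inclusion of overtime periods are assumptions made here.

```python
def is_clutch(period, seconds_remaining, score_margin,
              max_seconds=180, max_margin=5):
    """Return True if a game state falls inside the clutch window.

    Assumptions (not from the source): `period` is 4 for the fourth
    quarter and 5 or more for overtime, `seconds_remaining` is the time
    left in that period, and `score_margin` is the signed point
    differential between the two teams.
    """
    in_final_period = period >= 4
    in_time_window = seconds_remaining <= max_seconds
    is_close_game = abs(score_margin) <= max_margin
    return in_final_period and in_time_window and is_close_game


# Hypothetical game states: (period, seconds remaining, score margin).
states = [(4, 150, -3), (4, 200, 2), (3, 100, 1), (5, 60, 5), (4, 60, 6)]
clutch_states = [s for s in states if is_clutch(*s)]
```

Keeping the thresholds as parameters makes it easy to test how sensitive downstream results are to the chosen definition of a clutch moment.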
Data cleansing was undertaken as a critical step in the research process, which was aimed at identifying and rectifying errors, irregularities, and anomalies within the dataset. This process was essential for ensuring that the data adhered to previously outlined qualities such as accuracy, completeness, validity, consistency, relevance, and security, facilitating the analytical process by rendering the data more comprehensible, easier to process, and scrutinize. The objective of data cleaning was to enhance the overall quality of the datasets and address any issues related to collection errors, missing values, outliers, or irrelevant information that could impede the analytical process.
The challenge posed by outliers, notably players with a minimal number of games yet with exceptional performances, could be addressed either by establishing a minimum number of games as a criterion for player eligibility or by employing the number of games as a negatively weighted variable in the formulas. The former approach was preferred, as the latter was deemed to unnecessarily complicate the algorithms without offering tangible benefits. The establishment of this threshold was grounded in statistical justification, which was aimed at ensuring the sample represented clutch ability rather than random performance incidents. ANOVA tests were conducted on a subset of datasets to identify an optimized threshold for the “Games Played” (GP) attribute, dividing the datasets into smaller groups based on GP and employing ANOVA and Tukey’s HSD tests to discern statistically significant differences between the groups. This analytical process illuminated the statistical disparities across different game participation levels, ultimately setting a threshold of 20 games for the regular season and 5 games for the playoffs, balancing statistical rigor with empirical insights.
For the establishment of optimal game participation thresholds in Table 1, the null hypothesis, which assumed no significant difference in mean clutch statistics between the groups, was rejected for groups comparing 0–4 games to those with higher game participation (25–29 and 35–39 games), as indicated by the p-adjusted values and the “reject” column. Accordingly, we established a threshold of 20 games for the regular season, aligning with our findings where significant statistical differences were observed in groups beyond this point. This analytical decision was supported by practical basketball reasoning, which suggests that players who appear in a higher number of games are more likely to contribute consistently to team success. The set threshold for the playoffs was adjusted to 5 games, acknowledging the condensed nature of postseason play. These thresholds were verified for statistical significance, ensuring that our analysis focused on players with substantial game involvement, reducing the potential distortion from outlier performances and enhancing the reliability of our overall results.
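To make the grouping step concrete, the following sketch bins players by games played in groups of five (0–4, 5–9, and so on) and computes the one-way ANOVA F statistic across the bins. It is an assumption-laden illustration, not the study's code: the row layout, the bin width, and the function names are hypothetical, and the Tukey post hoc comparison (which yields the adjusted p-values and the “reject” flags) is left to a statistics library.

```python
from statistics import mean


def gp_bin(games_played, width=5):
    """Map a games-played count to its (low, high) bin, e.g. 27 -> (25, 29)."""
    low = (games_played // width) * width
    return (low, low + width - 1)


def group_by_gp(rows, stat):
    """Collect the values of `stat` per GP bin from hypothetical player rows."""
    groups = {}
    for row in rows:
        groups.setdefault(gp_bin(row["GP"]), []).append(row[stat])
    return groups


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

A large F value indicates that mean clutch statistics differ between participation levels, which is the signal used to justify a minimum-games threshold.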
Feature selection is employed to pinpoint essential features enhancing model accuracy. This process simplifies data analysis, yielding more precise and interpretable models. A summary of the features excluded from further consideration is outlined below:
TEAM: Over a timeframe of 22 years, team composition, coaching staff, and directors undergo significant changes, rendering team comparison irrelevant for a study focused on player analysis.
AGE: While the age of a player could potentially influence clutch decisions through experience, this study opts not to incorporate age as a parameter, acknowledging the complexity of analyzing players as time series.
REB/DREB: Despite rebounds being a critical basketball statistic, defensive rebounds were deemed not as impactful in clutch moments, where offensive rebounds present a more significant interest.
FP (Fantasy Points), DD2/TD3 (Double-Doubles and Triple-Doubles), and Plus/Minus: These were excluded due to their lack of direct relevance to the clutch period under study.
In the preprocessing phase, we carefully selected variables that directly impact clutch moments, which led to the exclusion of cumulative statistics like DD2, TD3, and Plus/Minus. These metrics, while valuable, do not specifically align with our focused definition of clutch performance: critical plays within the last three minutes of a game with a point differential of five or less. Defensive rebounds were also omitted, as they typically result from team defense rather than isolated individual efforts in clutch situations. Our methodology aimed to ensure clarity in this study’s ML analysis by focusing on the most influential clutch actions. In a further or alternative study, we are open to reassessing the importance of these variables in the broader context of the game, potentially utilizing more sophisticated models that can account for the complexity they introduce.
A final step involved limiting the dataset to the top 50 players per season for regular season datasets, which was a decision informed by previous data-cleaning steps. This uniformity across datasets aimed to enhance data homogeneity and facilitate subsequent algorithmic analysis.
Data transformation processes were implemented for greater relevance to clutch performance analysis, and we introduced specific attributes to enhance the contextual understanding of the data. A critical addition was the ‘YEAR’ attribute, which delineates the season and stage (regular season vs. playoffs) for each player’s statistics. This distinction is vital for differentiating performance across years; for example, it allows us to discern between Kobe Bryant’s performance in 2001 versus 2008. This granularity is crucial as we aggregate and compare datasets to draw meaningful conclusions about players’ clutch abilities over time.
Additionally, we identified a need for a target variable suitable for our supervised learning models. After consideration, we chose the Win/Games Played (W/GP) ratio as a proxy for assessing the impact of a player’s performance on game outcomes. While acknowledging its limitations, this statistic serves as a practical solution in the absence of game-by-game results. Consequently, we replaced the ‘W’ and ‘L’ columns with a ‘WinPercentage’ attribute that encapsulates the essence of these variables into a single, comprehensive metric.
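The two transformations described above, adding a ‘YEAR’ label and replacing ‘W’ and ‘L’ with ‘WinPercentage’, can be sketched as follows. The row layout, the label format, and the function name are hypothetical and serve only to illustrate the W/GP computation.

```python
def transform_row(row, season, stage):
    """Return a copy of a player row with 'YEAR' and 'WinPercentage' added.

    `row` is a hypothetical dict holding at least 'GP', 'W', and 'L';
    `season` and `stage` (for example "2008" and "regular") are
    illustrative labels, not the study's actual encoding.
    """
    # Drop the raw win and loss counts; they are folded into one ratio.
    out = {key: value for key, value in row.items() if key not in ("W", "L")}
    out["YEAR"] = f"{season}-{stage}"
    out["WinPercentage"] = row["W"] / row["GP"] if row["GP"] else None
    return out


# Hypothetical player season: 27 wins in 36 clutch games.
example = transform_row(
    {"PLAYER": "Player A", "GP": 36, "W": 27, "L": 9, "PTS": 3.1},
    season="2008",
    stage="regular",
)
```

Because the ‘YEAR’ label travels with each row, seasons of the same player remain distinguishable after the per-season files are concatenated.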
In the current aggregated dataset, the statistics we present are average values calculated based on each player’s participation in clutch moments rather than aggregating totals for the entire season. To illustrate, the points statistic (PTS) for a player is computed from the average points that the player scored in the last three minutes of clutch games, which are games they played with a point differential of five or less. If a player participated in 36 such clutch games in a season, the PTS value in our dataset would specifically represent the average points scored during those critical final minutes across the 36 games. This method of normalization is applied consistently across all the statistical metrics we have included. By focusing on these average values during clutch periods, we aim to accurately capture and evaluate a player’s impact when it matters most.
Figure 2 shows the final structure of our datasets after preprocessing, with all pertinent statistics present alongside the newly added ‘YEAR’ (indicating the season and whether it was regular season or playoffs) and ‘WinPercentage’ (the ratio of wins to games played) attributes. This refined dataset laid the groundwork for the algorithmic exploration in the subsequent phases of the project, enabling a detailed analysis of clutch performance across different seasons and playoff contexts.
2.4. Machine Learning Algorithms and Methodologies
The LASSO technique, known for its utility in variable selection and regularization, adds a penalty to the linear regression objective: it minimizes the squared residuals while keeping the sum of the absolute coefficient values below a specific constant. The strength of this penalty, set by the user through the Alpha parameter, shrinks the coefficients of less significant features and mitigates overfitting. The optimal Alpha value for the dataset was determined through 10-fold cross-validation, which identified Alpha = 0.0006 as ideal. The dataset was then divided into training and test sets in an 80–20% split, and the model was fitted with this Alpha value, revealing blocks as the most influential factor, followed by offensive rebounds and turnovers.
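For reference, the penalized form of the LASSO objective can be written as below. The notation is an assumption: it follows a common convention in which Alpha ($\alpha$) scales the L1 penalty and the squared-error term is averaged over the $n$ observations; the text does not state which scaling was used.

```latex
\hat{\beta}^{\mathrm{LASSO}}
  = \arg\min_{\beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2
    + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Here $y_i$ would be the WinPercentage target and $x_i$ the vector of clutch statistics for record $i$. Larger values of $\alpha$ drive more coefficients exactly to zero, and this penalized form corresponds to the constrained form (sum of absolute coefficients bounded by a constant) for a matching choice of the constant.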
Elastic Net is a regression method akin to LASSO that incorporates both L1 and L2 regularization: the L1 term can shrink the coefficients of certain variables to zero, while the L2 term addresses collinearity by penalizing the sum of squared coefficients. It features two hyperparameters: Alpha, determining the penalty strength, and L1_ratio, balancing the L1 and L2 effects. Through 10-fold cross-validation, the optimal values for these hyperparameters were identified as Alpha = 0.0007 and L1_ratio = 0.9, suggesting a close approximation to the LASSO model.
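The role of the two hyperparameters can be made explicit with the Elastic Net objective. The parameterization below is an assumption based on a widely used convention, with $\alpha$ for Alpha and $\rho$ for L1_ratio; the text does not specify the exact scaling.

```latex
\hat{\beta}^{\mathrm{EN}}
  = \arg\min_{\beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2
    + \alpha \rho \sum_{j=1}^{p} \lvert \beta_j \rvert
    + \frac{\alpha (1 - \rho)}{2} \sum_{j=1}^{p} \beta_j^{2}
```

Under this convention, $\rho = 1$ reduces the penalty to a pure L1 term, so the reported $\rho = 0.9$ places most of the penalty weight on the L1 component, which is consistent with the fitted model behaving much like LASSO.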
Simple linear regression, foundational to both LASSO and Elastic Net, was not expected to yield significantly different results. Following an 80–20% data split, the outcomes aligned with expectations, although the negative coefficient for turnovers was notably larger in magnitude, emphasizing how costly errors are in the final moments of a game.
The exploration extended to decision trees and random forest, which are non-parametric methods that make no assumptions about the data distribution. Employing an 80–20% split, the random forest model was refined to n = 100 estimators, revealing a unique data insight: the TS% variable exhibited a stronger relationship with win percentage than any other variable, a finding that the previous algorithms had not produced.
XGBoost, part of the gradient boosting family, constructs models iteratively to correct predecessors’ errors, incorporating L1 and L2 regularization terms. Extensive testing yielded significant insights, with feature importance analysis before and after hyperparameter tuning indicating shifts in variable significance, marking this model as particularly successful in encapsulating data characteristics.
The overarching goal of this extensive modeling was to identify key individual performance factors influencing team win rates in close games. While not aimed at precise game outcome prediction, the analysis revealed statistical traits of players that could enhance a team’s chances in clutch situations. A later section introduces a bespoke clutch performance metric, derived from these insights, to assess and rank NBA clutch performances from 1997 to 2018.
3. Results
In the Results section, we dissect the impact of key performance indicators on clutch moments in NBA games through ML algorithms and the EoCC. This analysis illuminates the statistical features that distinguish winning efforts in the closing minutes and reveals insights into player performance under pressure. Through a comparison of various ML approaches and a detailed look at EoCC across two decades, we unveil patterns of success and the critical role of individual contributions in high-stakes scenarios.
3.1. Machine Learning Algorithms Application
The first set of results is based on ML methodologies. To identify key performance indicators for winning close basketball games, we explored various techniques to sift through player statistics and derive actionable insights. This exploration encompasses a diverse array of ML approaches, each with its unique strengths and application scenarios.
Table 2 encompasses a comparative overview of several ML methods evaluated in the study. LASSO and Elastic Net are both utilized for variable selection and regularization with specific parameters like Optimal Alpha and L1_ratio crucial for their performance. Simple linear regression served as a baseline for comparison, emphasizing the significant impact of turnovers. Decision trees and random forest, being non-parametric, captured unique data peculiarities, showing a strong relationship between True Shooting Percentage (TS%), blocks, and winning. XGBoost, a gradient boosting method, demonstrated significant improvements after hyperparameter tuning, notably altering the significance of certain features. Each method provided unique insights, highlighting the importance of feature selection and model tuning in predicting outcomes in close basketball games. The specific hyperparameters were determined using a 10-fold cross-validation approach designed to optimize model performance, and these were directly integrated into the model functions during the training phase. Although the primary focus of our models was to gauge the impact of significant predictors on the outcome rather than prediction accuracy, we acknowledge that these aspects are interrelated.
The application of XGBoost, with and without hyperparameter tuning, demonstrated an enhanced performance in variable importance measures, as evidenced by the 10-fold cross-validation results. As presented in Table 3, there is an improvement in the feature importances after the tuning.
3.2. Introduction of Clutch
The EoCC was formulated with the intention of quantifying a player’s ability to perform effectively under pressure, with an emphasis on scoring, defensive contributions, offensive rebounds, assists, and a penalty for turnovers. The introduced formula of EoCC is the following:
The presented Formula (1) captures clutch performance by integrating basketball knowledge with statistical insights. Scoring, represented by TS%, is fundamental, as effective scoring is often linked with victory in tight games. The parameters (1.2, 1.4, 1.75, etc.) are weighted linearly, reflecting the importance of each action as determined by algorithmic results and basketball expertise. ASTs are included, as they represent valuable decisions even if they were not highlighted as top predictors; they still contribute to the overall clutch moment. TOVs are appropriately penalized, acknowledging their critical impact on game outcomes. This blend of data-driven analysis with a significant understanding of basketball dynamics aims to provide a robust metric for clutch performance, acknowledging areas for future research and refinement. The acronyms of Equation (1) are explained in Table A1.
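Since TS% is the scoring component of the metric, its computation can be sketched as follows. The formula is the commonly published definition of True Shooting Percentage; whether the TS% column in the dataset was computed in exactly this way is an assumption, and the `true_shooting_pct` helper and sample numbers are hypothetical.

```python
def true_shooting_pct(points, fga, fta):
    """True Shooting Percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = fga + 0.44 * fta
    if attempts == 0:
        return 0.0  # no shooting possessions recorded
    return points / (2 * attempts)


# Hypothetical clutch line: 10 points on 6 field-goal and 4 free-throw attempts.
print(round(true_shooting_pct(10, 6, 4), 3))
```

Unlike field-goal percentage, this measure credits three-pointers and free throws, which is why it is often used as a single-number summary of scoring efficiency.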
Through EoCC metric (1), a detailed analysis was conducted on datasets spanning from 1997 to 2018. It was found that significant performances often occurred in contexts where teams relied on a singular, exceptional player during crucial game moments. This reliance on a “lone superstar” presented challenges in correlating team success with championship victories. Notably, LeBron James’s appearances in the top-20 performances were exclusively with the Cleveland team, and they were absent during his tenure with top-tier teammates like Kyrie Irving. Similarly, performances by Allen Iverson, Russell Westbrook, and Dirk Nowitzki underscored their roles as primary options on their respective teams without significant support (Table 4).
An analysis aimed at identifying consistent top clutch performers revealed that Michael Jordan and Kobe Bryant emerged as leaders, with DeMar DeRozan, LeBron James, and Dwyane Wade following closely. This investigation into consistency highlighted a preference for mid-range shots in clutch situations, suggesting a potential reevaluation of shot selection strategies in late-game scenarios (Table 5).
In addition, we examined the consistency of top clutch players and their frequency of appearance in the compiled lists. Lists were generated featuring the top three clutch players based on the Estimation of Clutch Competency (EoCC) rating for each instance, resulting in 44 lists comprising three players each: 22 for playoff years and 22 for the regular season. Subsequently, the frequency of individual player appearances in these top-three lists was calculated as a proportion of their total appearances in the dataset. This yields a percentage indicating how often these prominent players attained top performance, which serves as a measure of their consistency.
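The consistency calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the `top_three_rates` helper, the input layout (one ordered list of players per season or playoff year), and the placeholder names are hypothetical, and the `min_appearances` threshold reflects the exclusion of players with only a single record.

```python
from collections import Counter


def top_three_rates(rankings, min_appearances=2):
    """Share of a player's dataset appearances that were top-three finishes.

    rankings maps a list label (e.g. a season) to players ordered by rating,
    best first.
    """
    appearances = Counter()
    top_three = Counter()
    for ordered_players in rankings.values():
        appearances.update(ordered_players)
        top_three.update(ordered_players[:3])
    return {
        player: top_three[player] / count
        for player, count in appearances.items()
        if count >= min_appearances
    }


# Hypothetical rankings for three seasons (placeholder names, not real data).
rankings = {
    "S1": ["A", "B", "C", "D"],
    "S2": ["B", "A", "D", "C"],
    "S3": ["D", "A", "B", "E"],
}
print(top_three_rates(rankings))
```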
Table 6 and Table 7 display the outcomes derived from regular season and playoff data, respectively. It was observed that the highest percentages tended to correlate with a lower number of appearances within the regular season dataset. However, the performances of LeBron James, Kevin Durant, Dwyane Wade, Kobe Bryant, and Stephen Curry were particularly noteworthy. LeBron James, for example, consistently ranked among the top three clutch performers in approximately half of his extensive career.
The compilation of regular season leaders predominantly featured recurrent names, suggesting a level of exclusivity. This phenomenon is contrasted with the playoff context, where rotation players occasionally exceed their standard statistical outputs, achieving figures comparable to those of superstars due to the notably tougher defensive environment targeting the opposing team’s stars. An illustrative case is the 1997 finals, where Michael Jordan’s decision to pass the ball to an unguarded Steve Kerr led to a historic game-winning shot.
The playoff data revealed additional insights, including a broader diversity of players in the top-three lists and the notable clutch performances of lower-profile players such as Peja Stojakovic and Kyle Lowry. Kobe Bryant’s playoff performances stood out, with exceptional numbers in six of his eight playoff appearances, making him the era’s most dominant clutch playoff player. This finding underscores Kobe’s increased playoff performance compared to the regular season, with his top-three appearance rate jumping from 33% to 75%, which was a stark contrast to LeBron James’ decrease from 46% to 27%. This comparison highlights Kobe Bryant’s exceptional determination and winning mentality.
It is important to note that players with only a single record in either category were excluded to focus on assessing consistency. Moreover, the calculated percentages represent the frequency of top-three appearances relative to appearances in the clutch player datasets, acknowledging that all featured players have participated in many more seasons but did not qualify for the datasets due to insufficient involvement in close games.
4. Discussion
In this study, an extensive analysis was conducted using ML algorithms to identify key performance indicators that contribute to winning in clutch moments of NBA games. It was demonstrated that variables such as blocks, offensive rebounds, and turnovers play significant roles with different algorithms highlighting various aspects of game dynamics. For instance, LASSO and Elastic Net highlighted the importance of regularization in managing collinearity, while XGBoost’s performance, especially after hyperparameter tuning, underlined the efficacy of gradient boosting methods in enhancing prediction accuracy.
The introduction and application of Equation (1) provided a novel metric for assessing player performance under pressure. The analysis revealed a trend where teams often depend on exceptional individual performances in critical game moments. However, this reliance on a singular superstar did not consistently correlate with team success in securing championships, as evidenced by the varying success rates of teams led by LeBron James, Allen Iverson, and Dirk Nowitzki.
Furthermore, the study highlighted the evolving nature of clutch performances with a noticeable preference for mid-range shots in tight game situations. This finding prompts a reevaluation of prevailing shot selection strategies during crucial game moments, suggesting a potential shift toward more efficient scoring options.
The examination of “lone superstar” scenarios elucidated the difficulties in achieving championship success under such conditions, with LeBron James’s and Kobe Bryant’s performances serving as prime examples. However, the study also illuminated the critical role of defensive contributions and efficient scoring in clutch situations. The trend analysis conducted through the Mann–Kendall trend test revealed a significant increase in player statistics over the years, suggesting an evolution in playing styles, strategies, and possibly the influence of “stat-padding”.
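For readers unfamiliar with the Mann–Kendall test, a minimal sketch of the statistic is given below. It is an illustration only, not the implementation used in the study: it uses the normal approximation, omits the correction for tied values, and the `mann_kendall` helper and sample series are hypothetical.

```python
import math


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction).

    Returns (S, Z, p_value).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value


# Hypothetical yearly averages of a clutch statistic (placeholder values).
series = [2.1, 2.3, 2.2, 2.6, 2.8, 2.7, 3.0, 3.1]
print(mann_kendall(series))
```

A positive S with a small p-value indicates a monotonic upward trend, which is the kind of evidence behind a statement that a statistic has increased over the years.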
The correlation between individual clutch performance and team success was scrutinized, revealing that while individual prowess in clutch moments is invaluable, a comprehensive team strategy encompassing depth, coaching, and overall team performance is essential for championship success. The playoffs data suggested a more diversified representation in top clutch performances, indicating the importance of team dynamics and the role of less prominent players in clutch situations.
The EoCC metric provides a specialized evaluation of clutch performance, addressing the unique demands of high-pressure situations. Traditional metrics like PER, GS, plus–minus, and USG% offer valuable insights but are designed for broader contexts and do not account for the specific characteristics of clutch moments. For instance, PER offers a comprehensive measure of overall statistical performance but relies heavily on league-wide averages, complicating its application during clutch moments due to the variability in game context. Similarly, GS aggregates various box score statistics into a single value, which is useful for general performance assessment but lacks the context-specific weighting necessary for clutch scenarios.
Plus–minus measures the point differential when a player is on the court, reflecting team performance rather than individual contributions. This metric can be skewed by the performance of teammates and overall team strategy especially during critical plays where the same players are typically involved. USG% indicates a player’s involvement in team plays but does not necessarily correlate with effectiveness in clutch situations. In contrast, EoCC focuses on the significance of key statistical categories during clutch time and their direct impact on winning probabilities. By employing machine learning algorithms to tailor the weights of these statistics, EoCC provides a more accurate and comprehensive evaluation of player performance under pressure.
4.1. EoCC Case Studies
To validate the EoCC metric, we conducted a benchmarking analysis by comparing our EoCC results with the official NBA Clutch Player of the Year voting results. By examining the overlap between the top EoCC performers and the NBA’s clutch player nominees and award winners, we can assess the accuracy and relevance of our metric in identifying true clutch performers.
2022–2023 Season Analysis: For the 2022–2023 season, our EoCC results in Table 8 highlighted Jalen Brunson, De’Aaron Fox, and Nikola Jokic as the top three clutch performers. When compared to the NBA Clutch Player of the Year voting results, we observed that both Jalen Brunson and De’Aaron Fox were recognized as nominees with De’Aaron Fox winning the award. Specifically, two out of the top three players in our EoCC rankings were also recognized in the NBA voting (Table 9). Expanding this comparison to the top five EoCC results, which included Jimmy Butler, we found that four out of the top five players were also NBA nominees. This significant overlap extends to the top 10 EoCC results, where 7 out of 10 players matched the NBA voting results. This substantial overlap between our EoCC rankings and the official NBA awards indicates that our metric effectively identifies players who perform exceptionally well in clutch situations.
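The top-k comparison used in this benchmarking can be sketched as a simple overlap count. The `top_k_overlap` helper is hypothetical, and the labels below are placeholders rather than the actual EoCC rankings or voting results.

```python
def top_k_overlap(ranked_players, recognized, k):
    """Count how many of the first k ranked players are in the recognized set."""
    return sum(1 for player in ranked_players[:k] if player in recognized)


# Placeholder labels, not actual EoCC rankings or voting results.
eocc_ranking = ["P1", "P2", "P3", "P4", "P5", "P6"]
vote_getters = {"P2", "P3", "P5", "P9"}
for k in (3, 5):
    print(k, top_k_overlap(eocc_ranking, vote_getters, k))
```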
2023–2024 Season Analysis: For the 2023–2024 season, our EoCC results in Table 10 ranked Shai Gilgeous-Alexander, DeMar DeRozan, and Nikola Jokic as the top three clutch performers. Based on current performance trends and media discussions, it is reasonable to assume that these players would be strong candidates for the NBA Clutch Player of the Year award (Table 11). Assuming the hypothetical nominees include these players, our analysis would show a very good alignment for the top three EoCC results. Additionally, for the top five EoCC results, which include Stephen Curry and Kawhi Leonard, we expect a significant overlap, further validating our metric. Even within the top 10, the inclusion of consistently high-performing players like Damian Lillard and Luka Doncic suggests that our EoCC metric aligns well with the players likely to be recognized by the NBA for their clutch performance.
The Total Points column (Table 11) for the NBA Clutch Player of the Year is calculated by assigning point values to each player’s votes: 5 points for each 1st place vote, 3 points for each 2nd place vote, and 1 point for each 3rd place vote. The total points for each player are then determined by multiplying their votes in each category by the corresponding point value and summing the results.
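The point calculation described above can be expressed directly. The `total_points` helper and the ballot counts in the example are hypothetical; only the 5/3/1 weights come from the text.

```python
def total_points(first, second, third):
    """Weighted vote total: 5 per 1st-place, 3 per 2nd-place, 1 per 3rd-place vote."""
    return 5 * first + 3 * second + 1 * third


# Hypothetical ballot counts for one player.
print(total_points(first=10, second=4, third=2))  # 5*10 + 3*4 + 1*2 = 64
```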
In conclusion, while the EoCC (1) differs from the NBA voting scheme (a statistical versus an intuition- and narrative-driven approach), we support its use for predictive purposes. The award is often influenced by narratives, personal observations, and subjective assessments rather than relying solely on statistical performances and winning results; consequently, it can be seen as partly opinion-based rather than strictly scientific or data-driven. Future work can explore incorporating additional factors to create a version more targeted toward predicting the NBA Clutch Player of the Year. Given that this specific award has been active for only two years, more data will be needed to refine and enhance the metric’s effectiveness.
4.2. Threats to Validity and Limitations
One of the primary limitations of our study is the availability and granularity of the data. While we incorporated as many relevant statistics as possible, the lack of detailed defensive data such as successful shot contests, forced passes, and other important defensive contributions limits the comprehensiveness of our analysis. These elements are crucial for a thorough evaluation of clutch performance, as they significantly impact game outcomes during critical moments. The absence of that data restricts our ability to fully capture the defensive ability and overall impact of players in clutch situations.
Our analysis primarily focuses on individual player statistics, which inherently assumes some level of independence in player performance during clutch moments. However, basketball is a team sport where player interactions, strategies, and dynamics play a significant role in determining outcomes. The independent assumption overlooks the synergistic effects and the influence of teammates and opponents, which can lead to an overestimation or underestimation of a player’s true clutch ability. Future studies should aim to integrate models that account for these interactions to provide a more accurate depiction of clutch performance.
The definition of a clutch player in our study relies heavily on win percentage and individual performance metrics. While win percentage offers a practical measure, it does not fully encapsulate the complexities and multifaceted nature of clutch performance. Some players may exhibit strong individual performances in clutch situations without significantly affecting the win/loss outcome due to various factors, including team strength and the quality of the opposition. This limitation highlights the need for a more nuanced definition that incorporates additional contextual factors and broader performance indicators.
Our current model does not adequately consider the strength of the team and the supporting cast, which are critical factors influencing clutch performance. Players surrounded by strong teammates may benefit from better spacing, less defensive attention, and more effective overall team strategies, potentially inflating their individual clutch statistics. Conversely, players on weaker teams may face more defensive pressure and less support, adversely affecting their performance metrics. This oversight poses a threat to the validity of our findings, as it fails to isolate the individual contributions from the team context. Incorporating metrics that account for team strength and the role of supporting players is essential for a more balanced and accurate assessment.
The study spans over two decades, from 1997 to 2018, during which the game of basketball has evolved significantly. Changes in playing style, rules, training methods, and overall league dynamics can influence player performance and the interpretation of clutch statistics. Comparing players across different eras without accounting for these changes introduces a temporal bias, potentially affecting the validity of the conclusions drawn. Future research should consider developing era-adjusted metrics to mitigate these effects and provide a more consistent comparison across different periods.
In conclusion, while our study provides valuable insights into clutch performance in the NBA, it is essential to acknowledge these limitations and threats to validity. Addressing these issues in future research will enhance the robustness and accuracy of clutch performance evaluations, ultimately contributing to a more comprehensive understanding of this critical aspect of basketball analytics.
5. Conclusions and Future Work
The findings from this research underscore the multifaceted nature of clutch dynamics in basketball, emphasizing the significance of both individual performances and strategic game elements. The application of ML algorithms elucidated the critical factors contributing to success in clutch moments, offering valuable insights for teams and coaches in optimizing strategies for tight game situations.
The introduction of the EoCC metric marks a significant advancement in quantifying clutch performance, highlighting the disproportionate impact of certain players in high-pressure scenarios. This metric serves as a foundation for further research into the psychological and tactical aspects of clutch performance, potentially guiding more nuanced coaching strategies and player development programs.
Moreover, the study’s exploration into shot selection strategies during clutch moments challenges conventional wisdom, suggesting that teams might benefit from a diversified approach to scoring in the closing stages of games. This could lead to a broader tactical evolution in late-game scenarios with implications for scouting, player training, and in-game decision making.
In this research, the investigation was centered on the performance of NBA players during the critical final moments of close games. The concept of ‘clutch’ was prominently featured throughout the literature review and empirical analysis, highlighting its pivotal role. The elusive nature of ‘clutchness’ within the sports domain was acknowledged, with some related work suggesting it represents the ability to surpass typical performance under pressure, while others question its consistency and view it as a moment of exceptional performance.
An exploration into the statistics of the final minutes from the NBA was conducted, aiming to rank players in this crucial category. The study’s focus was on identifying key factors that enhance a team’s chances of securing victory in the game’s closing moments. A novel metric for clutch performance (EoCC) was introduced and applied to evaluate various players under comparable conditions. The comparison of players’ clutch statistics with their average performance during the rest of the game was not included in our analysis for specific reasons. Firstly, this aspect has been addressed in previous studies, and secondly, averaging statistics from last-minute plays and contrasting them with the game’s remaining 45 min was considered to potentially introduce biases. Teams often alter their strategies significantly when the outcome is critical, with some coaches reserving special plays for these moments to surprise opponents. Hence, the research focused exclusively on the final 3-minute period.
Our study challenges the traditional view that clutch performance solely reflects improvement under late-game pressure. Influenced by findings [5, 6, 9, 13] which indicate that players typically do not enhance their performance during clutch moments compared to regular play, we developed a metric focused on absolute clutch performance. This approach assesses how players maintain strong performance levels in critical game situations, identifying those who effectively lead their team in tightly contested scenarios. It highlights maintaining performance under pressure as a key indicator of clutch ability, providing a more objective assessment of a player’s impact during crucial moments.
Findings from the study revealed interesting insights into player efficiency during the last minutes and its influence on the likelihood of achieving a positive outcome. It was found that the most effective clutch players were predominantly mid-range specialists with contributions from big men being less frequent. This observation is consistent with tactical approaches where perimeter players initiate offense before considering passes inside. However, a significant correlation was observed between blocks, offensive rebounds, and winning percentages, indicating the crucial, though sometimes overshadowed, role of centers and power forwards in securing victory.
During the research, questions arose regarding the prevalence of mid-range shots in late-game scenarios, where a preference for secure decisions might deter teams from attempting the more popular but riskier 3-point shots. Our findings indicated that top clutch performers excelled in the mid-range game, suggesting an avenue for further investigation. The phenomenon of “stat padding”, where players might attempt to artificially boost their performance metrics, was also observed. A noticeable increase in statistical performance over the years was noted, warranting a detailed examination beyond the assumption of stat padding [40, 41, 42].
The benchmarking analysis of the EoCC case studies demonstrates a significant overlap between our EoCC results and the NBA Clutch Player of the Year voting outcomes, particularly within the top 3, top 5, and top 10 rankings. This alignment validates the effectiveness and accuracy of our EoCC metric in identifying true clutch performers. The EoCC metric not only complements existing performance evaluations but also provides a reliable tool for assessing player impact in high-pressure situations, enhancing our understanding of clutch performance in the NBA.
A further observation concerned the relationship between top clutch performance and seasonal team success. It was discovered that teams relying heavily on a single superstar for clutch scoring often produced the most impressive individual statistics. However, teams with multiple reliable clutch players achieved higher success rates and winning percentages. For instance, while the most notable individual performances were attributed to players in teams with a singular clutch focus, teams like Miami, with multiple go-to clutch players, including LeBron James and Dwyane Wade, saw greater success, including winning NBA championships. This suggests that while individual performance is critical in games, it does not solely dictate overall team success.
In conclusion, this research contributes to a deeper understanding of clutch dynamics in basketball, offering a comprehensive framework for analyzing key performance indicators and their impact on game outcomes. The insights gained from this study have the potential to influence coaching strategies, player evaluation, and the broader discourse on basketball analytics.
Future Work
We contemplated several ideas for expanding our work in the future, which are contingent upon access to large-scale datasets. An intriguing concept that was considered but ultimately not implemented involved the assignment of biometrics to performance metrics based on the timing of in-court actions. Specifically, the hypothesis that a shot taken in the last 20 s of a game holds greater significance than one taken earlier was explored for its potential mathematical formulation.
A particularly compelling area for future research is the differentiation and comparison between team plays and isolation plays in the game’s final crucial moments. This analysis could shed light on the importance of having strong clutch performers on a team, potentially guiding coaches toward a more balanced roster capable of collectively addressing critical situations rather than relying solely on a superstar. The exploration of individual plus–minus ratings, especially between specific offensive and defensive player pairings, presents another intriguing research opportunity. This could reveal whether certain players influence their opponents’ decision making in game-deciding moments. Additionally, the strategy of allocating the ball to the player with the best performance in a specific game, irrespective of their usual role, is a topic of debate among basketball enthusiasts. This intuitively sensible approach lacks empirical backing but merits investigation.
In future iterations of our research, we plan to incorporate a two-step analytical approach to deepen our understanding of clutch performance in NBA games. Initially, we will focus on team-level achievements, analyzing collective strategies and dynamics that contribute to success in tightly contested games. This phase will allow us to discern the broader tactical patterns and team behaviors that are most effective during critical game moments. Following this, we will conduct an individual-level analysis to identify specific players whose actions significantly impact the team’s chances of winning. By distinguishing between the contributions of different players, we can pinpoint which skills and decisions are most valuable in high-pressure situations. This two-pronged approach will provide a holistic view of both team and individual performances, substantially enriching our comprehension of the elements that define clutch success in professional basketball.
In conclusion, the domain of sports analytics, from psychological factors to straightforward basketball statistics, remains largely unexplored. A consensus is yet to be reached among researchers on whether clutch performance can be developed or is merely coincidental. Nonetheless, with the advent of advanced data collection systems and sufficient resources, there is significant scope for sports analysts to contribute to this field, enhancing coaching strategies and broadening our comprehension of the basketball universe.