Article

Dynamic Financial Valuation of Football Players: A Machine Learning Approach Across Career Stages

1 Business School, Holy Spirit University of Kaslik, Jounieh P.O. Box 446, Lebanon
2 College of Engineering and Technology, American University of the Middle East, Egaila 54200, Kuwait
* Author to whom correspondence should be addressed.
Int. J. Financial Stud. 2025, 13(2), 111; https://doi.org/10.3390/ijfs13020111
Submission received: 7 March 2025 / Revised: 6 May 2025 / Accepted: 26 May 2025 / Published: 17 June 2025
(This article belongs to the Special Issue Sports Finance (2nd Edition))

Abstract

The financial valuation of professional football players is influenced by multiple factors that evolve throughout a player’s career. This study examines these determinants using Gradient Boosting Machine Learning models, segmented by three age categories and three playing positions to capture the dynamic nature of player valuation. K-fold cross-validation is applied to measure accuracy, with results indicating that incorporating a player’s projected future potential raises average model accuracy from 74% to 84%. The findings reveal that the relevance of valuation factors diminishes with age, and that the most influential features vary by position: shooting for attackers, passing for midfielders, and defensive skills for defenders. The study adopts a dynamic segmentation approach, providing financial insights relevant to club managers, investors, and stakeholders in sports finance. The results contribute to sports analytics and financial modeling in sports, with applications in contract negotiations, talent scouting, and transfer market decisions.

1. Introduction

A market is where buyers and sellers meet to exchange goods and services. In professional football, the transfer market represents a dynamic arena where clubs buy and sell players to strengthen their squads. In the early days, transfers occurred informally, with players moving between clubs without binding agreements (Hoey et al., 2021). However, as the game evolved, formal contracts became essential to protect the rights of players and clubs. The first documented transfer fees emerged in the 1890s, but it was not until the Bosman ruling in 1995 that the structure of the transfer market fundamentally changed (O’Leary & Caiger, 2000). Sparked by Jean-Marc Bosman’s legal battle against RFC Liège, the ruling granted players the right to move freely at the end of their contracts without a transfer fee. This decision shifted the balance of power, increased player wages, and made top talent more accessible to wealthier clubs (Radoman, 2017).
Although several studies have examined the financial valuation of football players using regression analysis, option pricing models, or crowd-sourced estimates (such as Patnaik et al., 2019; Müller et al., 2017; Felipe et al., 2020), limitations persist in the literature. Many models treat all players uniformly, failing to differentiate valuation drivers across playing positions or career stages. Others acknowledge the importance of future potential, yet do not quantify its relative impact or examine how it varies over time. Furthermore, most traditional approaches assume linear relationships and may struggle to capture the complex, high-dimensional nature of player attributes. These gaps call for a more flexible, data-driven approach that can adapt to the heterogeneity and dynamic progression inherent in football careers.
Today’s football transfer market is highly complex and data-intensive, with over 20,000 players moving across more than 5000 clubs annually. Football’s significance within the broader entertainment sector is growing, with measurable ties to national economic indicators such as GDP per capita (Gásquez & Royuela, 2014).
In addition, the momentum of the global football transfer market demonstrates its increasing complexity and strategic significance. In January 2023 alone, 4387 international transfers were recorded, amounting to USD 1.57 billion, a record figure that marked a 49.4% year-over-year increase. Major global events and investments have contributed to this trend. For instance, Qatar’s preparation for the 2022 FIFA World Cup involved infrastructure spending of approximately USD 220 billion, fifteen times more than what Russia spent in 2018 (Subathra et al., 2022). Similarly, Saudi Arabia has launched an aggressive football development strategy as part of its Vision 2030 initiative, aiming to place the Saudi league among the world’s top ten. This effort has included high-profile transfers of global stars like Cristiano Ronaldo, Karim Benzema, and N’Golo Kanté (Dazi-Héni, 2021). These developments reflect a broader push toward leveraging football as a tool for economic diversification and global impact. As the financial stakes continue to escalate, accurate player valuation becomes even more vital, not just for clubs and agents but also for national leagues and economic policymakers.
Despite a growing body of research in this area, existing valuation models often fall short by treating all players uniformly, failing to account for how valuation drivers differ across positions and stages of a player’s career. Additionally, while many models acknowledge future potential as an influencing factor, they rarely quantify how its impact varies over time.
To address these gaps, this study proposes the use of machine learning (ML), specifically the Gradient Boosting algorithm, to develop predictive models that estimate the market value of football players based on a combination of physical, technical, and contextual attributes. ML offers a compelling advantage over traditional statistical methods due to its ability to handle large datasets, capture complex nonlinear relationships, and improve predictive accuracy (Samak et al., 2019). This study aims to answer the following research questions:
  • How do the factors influencing a football player’s market value vary across different playing positions?
  • How does the importance of these factors evolve with age, reflecting a player’s career trajectory?
  • What is the quantitative role of future potential in determining a player’s value at different stages of their career?
By answering these questions, the study aims to develop a robust, data-driven framework that supports more nuanced and dynamic player valuation. This has significant practical implications for multiple stakeholders: clubs can make smarter investment decisions, agents can negotiate contracts more effectively, coaches can guide development strategies, scouts can assess talent more accurately, and fans and analysts can gain deeper insights into player performance and value.

2. Literature Review

2.1. Mainstream Literature

A growing body of literature has explored the financial valuation of football players using traditional econometric and modern machine learning (ML) approaches. While these studies offer valuable insights, many present findings in isolation, lacking a unified framework that reflects how valuation drivers differ by player position, age group, or career trajectory. This study addresses these limitations by applying a segmented Gradient Boosting model that systematically analyzes how key variables vary in importance across different subgroups. The following subsections synthesize the literature into four central themes, each aligned with one of the study’s core research questions and modeling strategies.

2.1.1. Valuation Determinants Across Player Positions

Several studies have stressed the importance of the playing position in determining market value. Shen et al. (2023) demonstrated that key influencing factors vary significantly by position, with age and future potential ranking among the top drivers. Likewise, Behravan and Razavi (2021) found that different skill sets contribute to valuation depending on the player’s role, suggesting that position-specific models are more appropriate than generalized ones. Metelski (2021), while focusing on Poland’s top league, supported this view by identifying higher valuation trends among younger forwards.
However, most existing studies treat players as a homogeneous group, failing to reflect how distinct positions require different technical proficiencies. This study addresses this shortcoming by constructing dedicated ML models for attackers, midfielders, and defenders, thereby revealing position-specific valuation patterns and identifying the most influential attributes for each category.

2.1.2. Age and the Evolving Effect of Performance Indicators

Many studies have examined age-related value trends. For instance, Herm et al. (2014) found a nonlinear relationship between age and market value, while Liu (2025) found that player value typically declines over time. Metelski (2021) and Felipe et al. (2020) showed that transfer activity and market value cluster around peak ages between 21 and 24.
While age is widely acknowledged as a valuation factor, few studies have examined how the relative importance of other variables, such as physical attributes, skills, and reputation, shifts across age brackets. To address this gap, the current study implements ML models segmented into three career phases—early (16–23), mid (24–29), and late (30+)—enabling the tracking of how predictors of value evolve with player aging.

2.1.3. Potential, Popularity, and Intangible Impacts

Beyond skills and age, intangible qualities like potential and visibility have been found to influence player value. Müller et al. (2017) and Al-Asadi and Tasdemir (2022) revealed that market value is increasingly impacted by future potential and visibility, and that fame and projected future contribution outweigh current performance, especially for top-tier players. Likewise, Yaldo and Shamir (2017) showed that non-skill attributes such as crowd-pulling power and social media presence often play a disproportionate role in salary determination.
Despite their recognized influence, these attributes remain difficult to quantify, and their weight across career stages is underexplored. This study incorporates potential as a predictor and compares model performance with and without it. This design allows for the empirical estimation of its relative importance and supports practical discussions on talent investment and forecasting.

2.1.4. Comparison Between Valuation Models

Previous studies have employed various methodologies, ranging from traditional statistical models to modern AI techniques. Regression-based approaches (such as Kiefer, 2012; Felipe et al., 2020; Patnaik et al., 2019) have yielded valuable insights but are often constrained by assumptions of linearity and variable independence. Patnaik et al. (2019), for instance, compared multilevel regression, crowd estimates, and option pricing, concluding that while regression was the most effective of the three, each had clear limitations in capturing market dynamics. Recent studies have turned to ML algorithms, which allow for nonlinear, high-dimensional modeling. Singh and Lamba (2019), Li et al. (2022), and Shen et al. (2023) demonstrated that among several tested models, Gradient Boosting consistently yielded the most accurate predictions. Gradient Boosting is particularly well-suited to modeling football player value due to its robustness to outliers, adaptability to changing data patterns, and ability to handle complex variable interactions, all of which are critical in evaluating heterogeneous player profiles. This study builds upon these advances by tailoring ML models to specific subgroups and testing the value of future potential as a differentiating variable.

2.2. Theoretical Rationale and Modeling Perspective

This study adopts a predictive ML approach to assess how the determinants of football players’ market value vary by playing position and career stage. While prior research has applied both economic theory and behavioral models to interpret player valuation, such as wage determination, contract theory, and decision-making under uncertainty, this study refrains from drawing causal inferences or invoking marginal effects. Instead, the theoretical rationale for the current work is grounded in the principles of predictive modeling and applied sports analytics, where the goal is to identify patterns and generate actionable insights from data rather than to estimate underlying structural relationships.
Rather than rely on traditional economic frameworks such as marginal productivity theory, which requires the computation of partial derivatives (e.g., ∂Value/∂Attribute), this study focuses on feature importance as derived from Gradient Boosting algorithms. These importance scores reflect how frequently and effectively a variable is used to split decision trees and optimize prediction, thus capturing predictive relevance, not economic contribution in a marginal sense.
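The difference between split-based importance and a marginal effect can be made concrete with a small sketch. It assumes that each fitted tree has already been summarised as a list of (feature, loss reduction) pairs, one per split; the trees and numbers are hypothetical toy values, not results from this study. Two normalised scores are computed: a split-count score (analogous to XGBoost’s "weight" importance type) and a total loss-reduction score (analogous to XGBoost’s "total_gain"). Neither is a partial derivative of value with respect to an attribute.

```python
from collections import Counter, defaultdict


def split_importance(trees):
    """Return normalised split-count and loss-reduction importances.

    `trees` is a hypothetical summary format: one list per tree, holding a
    (feature_name, loss_reduction) pair for every internal split node.
    """
    counts = Counter()            # how often each feature is used to split
    gains = defaultdict(float)    # total loss reduction credited to each feature
    for tree in trees:
        for feature, loss_reduction in tree:
            counts[feature] += 1
            gains[feature] += loss_reduction
    total_count = sum(counts.values())
    total_gain = sum(gains.values())
    weight = {f: c / total_count for f, c in counts.items()}
    gain = {f: g / total_gain for f, g in gains.items()}
    return weight, gain


# Hypothetical two-tree ensemble.
toy_trees = [
    [("potential", 8.0), ("reputation", 3.0)],
    [("potential", 2.0), ("dribbling", 1.0)],
]
weight, gain = split_importance(toy_trees)
```

In this toy case, "potential" accounts for half of the splits but about 71% of the loss reduction, so the two measures can differ in magnitude even when they agree on the top feature.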
In the same vein, concepts from behavioral economics, such as framing effects or recency bias, are acknowledged as relevant phenomena in football valuation, but are not tested or operationalized in this study. Since no proxies for psychological bias (e.g., recent performance spikes, media visibility indicators, or nationality framing) are included in the dataset, behavioral explanations are omitted from the interpretation of results to maintain empirical integrity.
The justification for using ML instead of theory-driven regression models rests on three factors. First, footballers’ market value depends on a multidimensional set of interacting variables that are technical, physical, reputational, and temporal. ML techniques like Gradient Boosting can capture such non-linear relationships more effectively than parametric models. Second, by segmenting the data into player positions and age groups, this study acknowledges that valuation drivers are context-dependent. ML models can accommodate this variation without requiring separate parametric specifications. Finally, clubs, agents, and scouts benefit from data-driven forecasts that emphasize predictive accuracy over interpretability. The current approach offers actionable insights into which features are most relevant for valuing players at different stages of their careers.
Therefore, this study is grounded in a data-centric modeling paradigm that prioritizes generalizability and predictive utility over theoretical abstraction. While economic theories of value and behavior may inform the broader discourse on player valuation, this study does not claim to test or validate such theories empirically. The insights presented should be interpreted as predictive associations, not causal relationships.

2.3. Hypotheses Development

In light of the reviewed literature, several important research gaps emerge. While many studies highlight the influence of player skills, fame, or age, few have explored how these factors interact differently across positions and stages of a player’s career. Moreover, the impact of potential remains under-quantified despite its recognized importance in modern transfers. To address these gaps, three hypotheses are developed, grounded in the critical synthesis of empirical findings.
Several studies have emphasized that different playing positions require distinct skill sets and contribute differently to team dynamics. Shen et al. (2023) and Behravan and Razavi (2021) found that valuation determinants vary significantly by position. Majewski (2016) found that goals and assists are critical for forwards, while Metelski (2021) noted that younger attacking players are often transferred at higher values. These findings highlight the need for a position-specific valuation approach. Thus, the first hypothesis of the study is as follows:
H1. Different positions are valued based on different sets of skills.
Figure 1 shows the categorization of players based on their skills. Defenders are primarily expected to excel defensively, even if their shooting accuracy is lower. Midfielders play a pivotal role in ensuring a seamless transition between defense and attack, but they do not need the speed of attackers to be considered exceptional performers. Forwards are entrusted with finishing attacks with great precision, but they are generally not assigned extensive defensive responsibilities.
Age is another well-established factor in player valuation (Hill et al., 2025; Lorincz, 2022). Herm et al. (2014) identified a nonlinear relationship between age and market value, while Liu (2025) noted a steady decline in value over time. Felipe et al. (2020) and Metelski (2021) highlighted peak valuation ages between 21 and 24. Additionally, continued representation in a national team or top-tier club at an older age can still signal strong market value (Idson & Kahane, 2000). However, few studies have explored how the predictive influence of specific attributes, such as physical condition, passing, or reputation, changes with age. By applying ML models to distinct age segments, this study tests whether predictive patterns shift over a player’s career. Thus, the second hypothesis of this study is as follows:
H2. As players age, the impact of specific factors on their valuation fluctuates.
A third underexplored area involves the role of future potential, typically assessed by expert evaluations. Many previous studies confirm the growing significance of potential and intangible attributes. For instance, Müller et al. (2017), Yaldo and Shamir (2017), and Al-Asadi and Tasdemir (2022) all found that fame, crowd-pulling power, and perceived growth potential often outweigh actual performance, especially for younger players. Similarly, Vroonen et al. (2017) stressed the strategic significance of potential in determining a player’s future impact on team success. Most predictive models do not quantify how much potential contributes to valuation across stages of a player’s development. This study directly incorporates potential as a feature and measures its predictive contribution. Thus, the third hypothesis of the study is as follows:
H3. A significant portion of a player’s value is based on his future potential.
Thus, this study aims to reveal how feature relevance varies across these dimensions, thereby improving our understanding of dynamic valuation mechanisms in professional football. Importantly, these hypotheses are grounded in predictive objectives, not causal claims, and are meant to uncover data-driven patterns that can inform real-world decision-making.

3. Results and Discussion

Two sets of quantitative results were obtained. The first comes from nine models that include players’ future potential, and the second from the same nine models with this feature omitted. Potential was removed in the second set because most models rely heavily on it, and because it is itself largely based on experts’ opinions about a player’s future. The results of the models that include potential are shown in Table 1.
Six of the nine trained models had an average adjusted R-squared above 90%, and only one fell slightly below 60% (59%). The normalized Root-Mean-Squared Error (RMSE) of the nine models ranged between 2% and 5%. The feature importances show that most of the valuation is explained by two variables: international reputation and future potential. The influence of future potential also tends to increase as players gain years of experience, whereas, consistently across all positions, the impact of international reputation on a player’s value decreases with experience. Beyond potential and international reputation, a few other variables are influential during the early career stages, such as dribbling for attackers, passing for midfielders, and defending for defenders. Age was excluded from most models because players were already separated into age groups, but it still affected the value of defenders in the last stage of their careers. As explained above, the same analysis was then repeated without future potential among the model features, because this variable is difficult to quantify objectively. The goal was to examine more closely the dynamics of all other features across positions and age categories.
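Because model quality is reported as adjusted R-squared and normalized RMSE, a minimal sketch of both metrics follows. The adjusted R-squared formula is the standard one; the text does not state how RMSE was normalized, so dividing by the range of observed market values is an assumption here (dividing by the mean is another common choice).

```python
import math


def adjusted_r2(y_true, y_pred, n_features):
    """R-squared penalised for the number of predictors used by the model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of observed values (assumed normalisation)."""
    n = len(y_true)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))
```

For instance, observed values of 0 and 10 predicted as 1 and 9 give an RMSE of 1 and a range-normalized RMSE of 0.1, that is, 10%.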
Although removing potential decreased the accuracy of the models, their average R-squared remained substantial (above 60% in all cases). The RMSE increased slightly but stayed within an acceptable range of 3% to 6%. At the same time, the number of features affecting player valuation increased, making it easier to understand how much each variable contributes to a player’s price. International reputation still had a substantial impact in every model, and Table 2 confirms that its importance decreases with age across all positions. A new observation from Table 2 is that, as players gain experience, their valuation depends on a larger number of abilities (measured as the percentage of variables with an effect above 5%). For attackers, the main drivers of valuation are dribbling and shooting skills, together with pace, which matters for forwards aged 24 to 29. For attackers older than 29, still playing for the national team also considerably increases their transfer value. National-team participation influences the market value of midfielders above 23 years, and for the same age bracket, passing skills significantly influence valuation, with a particularly strong influence on players above 29. In the mature stage of their careers, midfielders’ market value is also significantly affected by defending skills and age. For defenders above 23 years, passing and defending skills play a crucial role, and at a later stage (above 29 years), pace and age also become relevant.
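The measure of how many abilities a valuation relies on ("% of variables with an effect of more than 5%") can be computed from a model’s feature importances. A minimal sketch, assuming the importances are first normalised to sum to one before the 5% threshold is applied; the feature names and values in the usage line are hypothetical.

```python
def share_of_influential_features(importances, threshold=0.05):
    """Fraction of features whose normalised importance exceeds `threshold`.

    Returns the fraction and the sorted names of the influential features.
    """
    total = sum(importances.values())
    shares = {name: value / total for name, value in importances.items()}
    influential = [name for name, share in shares.items() if share > threshold]
    return len(influential) / len(importances), sorted(influential)


# Hypothetical importances for one basket of players.
fraction, names = share_of_influential_features(
    {"reputation": 0.6, "passing": 0.3, "pace": 0.06, "height": 0.04}
)
```

Here three of the four features clear the 5% bar, so the share is 75%.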

Validation of the Hypotheses

The variety of the approach and the diverse range of results, covering multiple types of players, supported all three hypotheses.
The results show that the prices of attackers depend on their shooting and dribbling abilities. For midfielders, passing appears to be an important determinant. For defenders, transfer fees can fluctuate with their defending talent and passing proficiency. These observations support the first hypothesis, which states that skills important for valuing one type of player can be much less relevant for another.
In the trial in which potential was removed from the feature set, the factors driving transfer price changed as players gained experience, even among players occupying the same position on the field. This supports the second hypothesis, which states that the importance of certain features varies as a player grows older. For defenders, for example, the importance of pace rises from 0.08% in the 16–23 age category to 19.44% for players above 29 years of age.
Finally, the results show that future potential is crucial when assessing the market value of certain players: after potential was removed from the independent features used to train the machine learning models, the normalized RMSE increased slightly for all models, and the average accuracy decreased from 88% to 74%. These results support the third hypothesis, as the models became less reliable without future potential.
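Since accuracy is measured with k-fold cross-validation, the with/without-potential comparison can be sketched as scoring the same model on two feature sets over identical folds. Assumptions: numpy is available, folds are random with a fixed seed, and an ordinary least-squares fit stands in for the Gradient Boosting model purely to keep the example short (any fit function returning a predictor can be swapped in). The data, and the placement of "potential" in column 0, are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Stand-in model: least squares with an intercept; returns a predictor."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def cv_score(X, y, fit, score, k=5, seed=0):
    """Mean out-of-fold score over k random folds (same folds for a given seed)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        predict = fit(X[train_idx], y[train_idx])
        scores.append(score(y[test_idx], predict(X[test_idx])))
    return float(np.mean(scores))


# Hypothetical data: column 0 plays the role of "potential".
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)
with_potential = cv_score(X, y, fit_ols, r_squared)
without_potential = cv_score(X[:, 1:], y, fit_ols, r_squared)
```

Reusing the same seed keeps the folds identical across both runs, so the drop in cross-validated R-squared reflects the removed feature rather than a different data split.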
This study examined how the determinants of football player valuation vary across playing positions and career stages using a predictive ML approach. The findings offer several contributions to the existing body of research.
First, the results confirm that valuation determinants are position-specific. Forwards’ market values were found to be strongly influenced by shooting and dribbling abilities, midfielders by passing skills and international reputation, and defenders by defensive attributes and, at later career stages, physical pace. These findings are consistent with prior studies such as those by Shen et al. (2023) and Behravan and Razavi (2021), which emphasized the need to account for different skill sets in different playing positions. They also support Majewski (2016), who found that offensive contributions like goals and assists have a greater impact on the value of attacking players.
Second, the study demonstrates that the importance of valuation factors evolves with player age. Intangible attributes, such as international reputation and future potential, have a greater impact on younger players’ valuations, while tangible skill-based attributes, such as defending and passing, become more critical as players age. This finding is in line with Herm et al. (2014), who identified a nonlinear relationship between age and market value, and Liu (2025), who observed that player value declines over time, and it complements the insights of Felipe et al. (2020) and Metelski (2021) on peak valuation ages between 21 and 24 years.
Third, the role of future potential in determining player valuation was confirmed. The removal of the “potential” variable from the models significantly reduced predictive accuracy across all groups. This result supports previous findings by Müller et al. (2017), Yaldo and Shamir (2017), and Al-Asadi and Tasdemir (2022), who argued that perceived future contributions often outweigh current performance, particularly among younger players.
Additionally, the study found that defenders’ valuations increasingly depend on physical attributes such as pace and age after the age of 30, which partially differs from previous research that often emphasized technical skills across all career stages. This suggests that physical durability becomes a more critical differentiator for defenders at later stages, an area that has received less emphasis in earlier valuation models.
By adopting a segmented ML approach rather than traditional parametric models, the study offers a new perspective on dynamic player valuation patterns. Unlike some earlier studies that assume homogeneity across players or rely on linear modeling assumptions (e.g., Patnaik et al., 2019; Kiefer, 2012), the present study captures the nonlinear and context-specific nature of player valuation, thus offering more nuanced and flexible insights into market dynamics.
Thus, the findings contribute to the literature by reinforcing the position-specific and age-specific nature of football player valuation and highlighting the strategic role of potential in shaping transfer market values.

4. Methodology

4.1. Data Collection and Sampling

The data were gathered from sofifa.com, which is based on the official FIFA video game and backed by the International Federation of Association Football (FIFA). It contains records for every player in all licensed leagues worldwide, covering more than 19,000 players. EA Sports, which develops the FIFA game, employs 25 producers and 400 outside data contributors to keep players’ data up to date. Moreover, more than 6000 data reviewers or talent scouts around the globe continuously propose modifications to each player’s scores. For this study, the data as of the end of 2022 were downloaded and filtered in multiple stages. Because it was hypothesized that valuation factors differ with a player’s position, separate models were built for attackers, midfielders, and defenders. Substitutes and goalkeepers are not considered.
For every player, more than 80 variables could affect market value (Figure 2). The skill-related parameters are organized in three layers. The first layer is composed of highly detailed features, such as heading accuracy, short passing, long passing, jumping power, and shooting power, amounting to more than 30 variables. The second layer aggregates these first-layer scores into broader attributes such as passing, shooting, and defending. The third layer contains a single variable, the player’s overall score. The models were fed features from the second layer, which makes it possible to identify positions that depend on particular skills without increasing the number of features to the point of overfitting. Based on previous studies, the variables were first reduced to 17; removing the overall score and three insignificant variables then left 13. These 13 features include skill-related attributes as well as variables related to fame, the duration of the current contract, and future potential.
The study’s goal is not only to build a model that predicts the value of the players, but also to monitor the influence of each variable on the player’s value. To reach this objective, the dataset was divided into three categories of age (16–23; 24–29; 30 years and above) and three classes of players (attackers, midfielders, and defenders), which resulted in nine baskets of data. Figure 3 shows the number of players in each basket.
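The segmentation into nine baskets can be sketched as follows. The position codes follow the FIFA game’s abbreviations, but the mapping of codes to the three groups is a hypothetical illustration, since the study’s exact mapping is not given here. Records whose position is not mapped (for example goalkeepers, or the assumed substitute and reserve labels "SUB" and "RES") are dropped, mirroring the exclusion of substitutes and goalkeepers.

```python
from collections import defaultdict

# Hypothetical mapping from FIFA-style position codes to the three groups.
POSITION_GROUPS = {
    "attacker": {"ST", "CF", "LW", "RW", "LF", "RF"},
    "midfielder": {"CAM", "CM", "CDM", "LM", "RM", "LAM", "RAM", "LCM", "RCM", "LDM", "RDM"},
    "defender": {"CB", "LCB", "RCB", "LB", "RB", "LWB", "RWB"},
}


def age_bracket(age):
    """Map an age to one of the three career phases used in the study."""
    if 16 <= age <= 23:
        return "16-23"
    if 24 <= age <= 29:
        return "24-29"
    if age >= 30:
        return "30+"
    return None  # younger than 16: outside the study's brackets


def position_group(position):
    """Return the group for a position code, or None if it is excluded."""
    for group, codes in POSITION_GROUPS.items():
        if position in codes:
            return group
    return None  # e.g. "GK", "SUB", "RES"


def build_baskets(players):
    """Group (name, age, position) records into (group, age bracket) baskets."""
    baskets = defaultdict(list)
    for name, age, position in players:
        key = (position_group(position), age_bracket(age))
        if None not in key:
            baskets[key].append(name)
    return baskets
```

With three groups and three brackets there are at most nine keys, one per model; for example, a 21-year-old "ST" lands in the ("attacker", "16-23") basket and a goalkeeper in none.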
Next, for each basket of players, a dimensionality reduction step retained only the features relevant to that basket before it was fed to the machine learning model. To avoid multicollinearity, reduce overfitting, and validate the significance of the variables in each basket, the correlation matrix was computed and plotted. Features were selected according to the following rules:
  • If the correlation between the dependent variable and an independent variable was not statistically significant at the 95% confidence level, the independent variable was excluded;
  • If the correlation between the dependent variable and an independent variable was below 10%, the independent variable was eliminated;
  • If the correlation between two independent variables exceeded 80%, the one with the lower correlation to the player’s market value (the dependent variable) was disregarded.
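The three selection rules can be sketched with numpy as below. Assumptions, none of which are stated in the text: the 10% and 80% thresholds are applied to absolute correlations; significance is tested with the usual t-statistic for a Pearson correlation against 1.96, a large-sample approximation of the two-sided 95% critical value; and rule 3 is applied greedily, visiting features from most to least correlated with market value. The data in the usage lines are synthetic.

```python
import numpy as np


def select_features(X, y, names, min_abs_r=0.10, max_pair_r=0.80, t_crit=1.96):
    """Apply the three correlation-based selection rules to one basket.

    X: (n_players, n_features) array; y: market values; names: feature names.
    """
    n = len(y)
    r_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # t-statistic for H0: correlation = 0 (clip guards against |r| == 1)
    t_stat = r_y * np.sqrt(n - 2) / np.sqrt(np.clip(1.0 - r_y**2, 1e-12, None))
    # Rules 1 and 2: keep significant correlations of at least 10% in magnitude
    candidates = [j for j in range(X.shape[1])
                  if abs(t_stat[j]) > t_crit and abs(r_y[j]) >= min_abs_r]
    # Rule 3 (greedy): strongest correlation with value first; skip any feature
    # correlated above 80% with a feature that has already been kept
    candidates.sort(key=lambda j: abs(r_y[j]), reverse=True)
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_pair_r for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Synthetic basket echoing the attackers example: passing is highly correlated
# with dribbling but less correlated with value; height is unrelated to value.
rng = np.random.default_rng(0)
n = 2000
dribbling = rng.normal(size=n)
passing = 0.9 * dribbling + 0.3 * rng.normal(size=n)
height = rng.normal(size=n)
value = 2.0 * dribbling + rng.normal(size=n)
X = np.column_stack([dribbling, passing, height])
selected = select_features(X, value, ["dribbling", "passing", "height"])
```

With a large basket even tiny correlations become statistically significant, which is why the 10% floor in rule 2 does separate work from the significance test in rule 1.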
To illustrate the process, consider attackers aged between 16 and 23 (Figure 4). Their correlation matrix shows that the following features had a low correlation with market value (below 10%): preferred foot (left), preferred foot (right), weight in kg, height in cm, and age. These features were filtered out. As explained earlier, the overall score was also removed from the features fed to the model. The passing score shows a high correlation of 83% with dribbling, so one of these two variables had to be eliminated. Since market value correlated more strongly with dribbling (56%) than with passing (45%), passing was dropped for this group of players. The same approach was followed for feature selection in the other eight models.

4.2. Metrics and Scale of Measurement

The study incorporated 13 features across nine models, as outlined in Table 3, to derive the market value of players. Due to the diversity of the models, not all variables were applicable to every player category. Table 4 displays the relevant variables for each group of players with similar characteristics. These variables were determined through filtering based on the correlation matrix and statistical significance.

4.3. Empirical Framework

Given its superior performance in predicting the value of football players in previous studies, the Gradient Boosting Machine Learning Model, in its XGBoost implementation, was chosen. Gradient Boosting is used here in its regression form because the predicted variable (market value) is continuous. The algorithm is an ensemble learning model, since it is built from multiple weak learners: basic models that are not highly accurate on their own but can provide good results when aggregated (Malagón-Selma et al., 2023). Specifically, the model builds a sequence of decision trees, where each new tree is fitted to the residuals (errors) left by the previous trees, thereby improving the overall prediction. The model stops iterating once a maximum number of trees is reached. This process was repeated nine times to generate nine different models, one for each basket of players. These steps are visualized in Figure 5, and the mathematical rationale for the respective steps in the iteration loop is provided below.
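The iteration loop described above can be illustrated with a small from-scratch sketch. It is not the authors' XGBoost configuration: it assumes squared-error loss, uses depth-one trees (stumps) instead of deeper trees, and omits XGBoost's regularization. All function names and the toy data are hypothetical.

```python
from statistics import fmean


def fit_stump(X, residuals):
    """Fit a one-split regression tree to the residuals by least squares.

    Returns (feature_index, threshold, left_value, right_value); each leaf value is
    the mean residual of its region, the optimal leaf output under squared loss.
    """
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            g_left, g_right = fmean(left), fmean(right)
            sse = (sum((r - g_left) ** 2 for r in left)
                   + sum((r - g_right) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, g_left, g_right)
    if best is None:  # every row identical: fall back to a single leaf
        g = fmean(residuals)
        return (0, float("inf"), g, g)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, g_left, g_right = stump
    return g_left if row[j] <= threshold else g_right


def gradient_boost(X, y, n_trees=100, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    f0 = fmean(y)  # F_0: the constant that minimizes squared loss
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):  # stop when the maximum number of trees is reached
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient of (1/2)(y - F)^2
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return f0, learning_rate, stumps


def predict(model, row):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Toy data (hypothetical): one feature, two clearly separated value levels
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = gradient_boost(X, y)
print(predict(model, [2.0]), predict(model, [7.0]))  # close to 0 and 10
```

With a learning rate of 0.1, each tree removes only a tenth of the remaining error, which is why boosting relies on many trees rather than a few large corrections.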
$$\tilde{y}_{im} = -\left[\frac{\partial \Psi\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \tag{1}$$
Formula (1) is applied in the initial step of the iteration loop, wherein the pseudo-residuals $\tilde{y}_{im}$ are computed as the negative gradient of the loss function $\Psi$. The loss function is the squared error, $\Psi(y_i, F(x_i)) = \tfrac{1}{2}(y_i - F(x_i))^2$, i.e., one half of the squared difference between the observed and predicted values, where $y_i$ and $F(x_i)$ are the observed and predicted market values of player $i$, respectively. $x_i$ denotes the independent variables used to predict the player's value, and $m$ indexes the current decision tree (e.g., $m = 7$ refers to the seventh tree).
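Assuming the squared-error loss $\Psi(y, F) = \tfrac{1}{2}(y - F)^2$, Formula (1) can be evaluated in closed form. The short derivation below (a standard result, included only as an illustration) shows that the pseudo-residual is simply the observed market value minus the value predicted by the previous iteration:

```latex
\tilde{y}_{im}
  = -\left[\frac{\partial}{\partial F(x_i)}\,\tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2\right]_{F(x)=F_{m-1}(x)}
  = -\Bigl[-\bigl(y_i - F(x_i)\bigr)\Bigr]_{F(x)=F_{m-1}(x)}
  = y_i - F_{m-1}(x_i)
```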
$$\gamma_{lm} = \arg\min_{\gamma} \sum_{x_i \in R_{lm}} \Psi\bigl(y_i, F_{m-1}(x_i) + \gamma\bigr) \tag{2}$$
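In the second step, a regression tree is fitted to the pseudo-residuals, which partitions the players into terminal regions. In the third step of the iteration loop, the predicted residual $\gamma_{lm}$ of each region is calculated using Formula (2), where $R_{lm}$ denotes the $l$-th terminal region (leaf) of the $m$-th decision tree.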
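Under the same squared-error assumption, Formula (2) also has a closed form: setting the derivative with respect to $\gamma$ to zero shows that each leaf's output is the average pseudo-residual of the players falling in that leaf. This is a standard illustration rather than a step reported by the authors:

```latex
\frac{\partial}{\partial \gamma} \sum_{x_i \in R_{lm}} \tfrac{1}{2}\bigl(y_i - F_{m-1}(x_i) - \gamma\bigr)^2
  = -\sum_{x_i \in R_{lm}} \bigl(\tilde{y}_{im} - \gamma\bigr) = 0
\quad\Longrightarrow\quad
\gamma_{lm} = \frac{1}{|R_{lm}|} \sum_{x_i \in R_{lm}} \tilde{y}_{im}
```

XGBoost additionally applies an L2 penalty $\lambda$ to leaf weights (the `reg_lambda` parameter, 1 by default), which shrinks this leaf value to $\sum_{x_i \in R_{lm}} \tilde{y}_{im} / (|R_{lm}| + \lambda)$; the unpenalized form above is the classical gradient boosting case.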
$$F_m(x) = F_{m-1}(x) + \nu \sum_{l} \gamma_{lm}\, \mathbb{1}\left(x \in R_{lm}\right) \tag{3}$$
The last step in the iteration loop generates the updated predictions $F_m(x)$ using Formula (3). To obtain them, the model takes the prediction from the previous iteration, $F_{m-1}(x)$, and adds the current tree's predicted residual, multiplied by a learning rate $\nu$. Once trained, the model predicts the value of a given football player by following the corresponding branches of each decision tree down to a leaf and reading off its residual. Each residual obtained is then multiplied by the learning rate and added to the initial prediction to produce the final expected value of the player. The importance of each feature was also extracted to identify the independent variables with the highest impact on the predicted value. Note that this importance measure is based on the number of times a variable is used to split the decision trees inside the XGBoost model.
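The prediction and feature-importance steps can be sketched as below. The nested-dict tree format, the feature values, and the leaf values are hypothetical toy inputs, not the study's trained trees. In the xgboost library itself, split-count importance corresponds to `Booster.get_score(importance_type="weight")`; the sketch recomputes the same idea by hand and expresses it as a percentage of all splits, as in Tables 1 and 2.

```python
from collections import Counter

# Hypothetical tree format used only for this sketch: an internal node is a dict
# {"feature": name, "threshold": t, "left": node, "right": node}; a leaf is a float
# holding that leaf's predicted residual (gamma_lm).


def leaf_value(node, row):
    """Follow the branches of one tree until a leaf is reached and return its value."""
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def predict(f0, trees, row, learning_rate):
    """Initial prediction plus the learning-rate-scaled leaf values of every tree."""
    return f0 + learning_rate * sum(leaf_value(tree, row) for tree in trees)


def weight_importance(trees):
    """Share (%) of all splits that use each feature, i.e. split-count importance."""
    counts = Counter()
    stack = list(trees)
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            counts[node["feature"]] += 1
            stack.extend((node["left"], node["right"]))
    total = sum(counts.values())
    if total == 0:  # only single-leaf trees: no feature was ever used
        return {}
    return {name: 100 * count / total for name, count in counts.items()}


# Hypothetical toy ensemble of two trees
trees = [
    {"feature": "potential", "threshold": 80, "left": -2.0,
     "right": {"feature": "international_reputation", "threshold": 2,
               "left": 1.0, "right": 3.0}},
    {"feature": "potential", "threshold": 75, "left": -1.0, "right": 0.5},
]
player = {"potential": 85, "international_reputation": 3}
print(predict(10.0, trees, player, learning_rate=0.1))  # about 10.35
print(weight_importance(trees))  # potential ~66.7%, international_reputation ~33.3%
```

Because this measure only counts splits, a feature used in many shallow, low-gain splits can outrank one used rarely but decisively.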
To evaluate the accuracy of the models, the K-fold cross-validation technique was used (Figure 6). Instead of relying on a single training dataset and a single validation dataset, this method divides the whole dataset into K portions. In each trial, K−1 portions form the training sample and the remaining portion is used for validation. This train–test split is repeated K times so that each portion serves once as the validation set, and the accuracy of the model is taken as the average accuracy over the K trials. The adjusted R-squared and the Root-Mean-Squared Error (RMSE) are the metrics used to represent the accuracy of the models.
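A minimal stdlib sketch of K-fold evaluation with adjusted R-squared and RMSE follows. Several choices are assumptions, since the text does not specify them: folds are formed after a single seeded shuffle, the per-fold adjusted R-squared stands in for "accuracy", and RMSE is normalized by the range of observed values in each validation fold. The single-feature least-squares model `fit_line` is a hypothetical stand-in for the gradient boosting model.

```python
import math
import random
from statistics import fmean


def kfold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each of the K folds is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds do not follow row order
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def adjusted_r2(y, y_hat, p):
    """Adjusted R^2 for len(y) observations and p predictors."""
    n = len(y)
    mean_y = fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def rmse(y, y_hat):
    return math.sqrt(fmean((a - b) ** 2 for a, b in zip(y, y_hat)))


def cross_validate(X, y, fit, k=5):
    """Average adjusted R^2 and range-normalized RMSE (in %) over the K folds."""
    r2s, nrmses = [], []
    for train, test in kfold_splits(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        y_hat = [model(X[i]) for i in test]
        r2s.append(adjusted_r2(y_test, y_hat, p=len(X[0])))
        nrmses.append(100 * rmse(y_test, y_hat) / (max(y_test) - min(y_test)))
    return fmean(r2s), fmean(nrmses)


def fit_line(X_train, y_train):
    """Hypothetical stand-in model: ordinary least squares on a single feature."""
    xs = [row[0] for row in X_train]
    mx, my = fmean(xs), fmean(y_train)
    slope = (sum((x - mx) * (t - my) for x, t in zip(xs, y_train))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]


X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
print(cross_validate(X, y, fit_line, k=5))  # near (1.0, 0.0) for this noiseless line
```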
Since the variable “potential” relies heavily on the consensus of experts’ opinions and is much less quantifiable from the players’ numerical statistics, the empirical analysis was split into two scenarios. In the first, potential is kept among the independent variables and the results of the nine models are assessed. In the second, it is omitted from the variables fed to all the models, the correlation filtering steps are repeated, and the impact on the models’ accuracy is studied along with the importance of each feature.

5. Conclusions

The novelty of this study lies in its dynamic assessment of the factors affecting the transfer value of football players at different stages of their careers. The study is relevant to teams and players, who can use it to strengthen their bargaining position with counterparties during the negotiations that precede a transfer. For agents, this research can also provide numerical backing for the valuations proposed to potential bidders. When training the players of a team, coaches can focus on specific aspects of a footballer’s game in a way that increases both his contribution to the team and his estimated economic value. The study’s models can also help small and medium-sized clubs with limited budgets identify undervalued players who are nearing the end of their contracts and could become available on the market in the near future. As noted by Anjum and Fatima (2023), such models offer a more quantitative and objective means of assessing the value of players.
While this study offers valuable insights into the valuation of football players across varied positions and career stages using an ML approach, there remain several limitations. First, the dataset was segmented by player position and age group to capture subgroup-specific patterns. While this granularity provided important interpretive advantages, it also led to smaller training sets for certain subgroups. Specifically, some segments, such as younger defenders or older midfielders, contained relatively few observations, which may have reduced model reliability for those specific cases. Nonetheless, the Gradient Boosting algorithm employed demonstrated robust performance and retained acceptable levels of accuracy across subsets. Second, goalkeepers were excluded from the analysis due to both their low representation in the dataset and the distinct nature of their performance metrics. This exclusion was necessary to maintain model consistency and avoid statistical distortion. As highlighted in prior work (e.g., Asif et al., 2016; Fahlberg, 2024), the specialized nature of goalkeeper evaluation often requires a tailored modeling approach that is beyond the scope of the current study. Third, feature importance was measured using the frequency-based metric (“weight”) provided by XGBoost, which counts how often a variable is used in decision tree splits. While this method is commonly used and gives a practical sense of each feature’s predictive utility, it does not account for variable interaction effects or the directionality and magnitude of impact. More advanced interpretive tools, such as SHAP (Shapley Additive Explanations) values, could offer a more nuanced understanding, and are recommended for future research. Finally, the study’s generalizability is constrained by its cross-sectional design. 
The data reflect player attributes and valuations from a single season, which ensures temporal consistency but limits the ability to capture longer-term trends or market dynamics. Further, contextual variables such as team performance, league prestige, and macroeconomic conditions, factors known to influence transfer value, were purposely excluded to isolate individual-level player attributes. Incorporating longitudinal data and external contextual factors presents a promising direction for future studies aiming to enhance the model’s external validity and applicability in real-world decision-making contexts.
The takeaways from this study show that different positions on the pitch are valued based on different variables, and that these variables shift dynamically across the career stages a player goes through. Forwards, for example, are mostly valued on their shooting and dribbling skills, whereas midfielders’ value relies heavily on their passing, i.e., their ability to move the ball between the defenders and the attackers. Defending, passing, and pace account for most of the worth of defenders. Overall, the study found that international reputation is a major factor in determining the market value of younger players. However, as players age, their skill-related variables become more important. This suggests that players need to keep working on their skills as they age to preserve a higher market value.
Several promising directions for future studies can expand and deepen the insights offered in this study. First, incorporating goalkeepers and substitute players into the analysis would allow for a more comprehensive evaluation of player value across all playing positions. Given the unique nature and impact of goalkeepers, particularly in high-stakes tournaments where penalty shootouts are more frequent, dedicated modeling strategies and additional data collection would be required to accurately assess their contribution. Second, comparing the performance of different ML algorithms, such as Random Forest, Support Vector Machines, and Neural Networks, could help identify the most appropriate technique for football player valuation in various data environments. Ensemble or hybrid models may also be explored to balance predictive accuracy with interpretability. Third, a valuable extension would involve segmenting players by transaction value to examine whether different sets of attributes drive valuation in the high-, mid-, and low-range transfer categories. This would provide deeper market-level insights and could help clubs tailor recruitment strategies to their budget constraints. Fourth, incorporating longitudinal data across multiple seasons could enhance the robustness and generalizability of the findings. This would enable the modeling of long-term performance trends, the effects of injury and recovery, and changes in form over time, factors that are particularly relevant in dynamic valuation markets. Moreover, integrating contextual variables such as team success, league reputation, or macroeconomic shifts in the sports industry would offer a more holistic view of the valuation process. Finally, future research could use SHAP values to better interpret model outputs. SHAP values allow for a more granular understanding of feature contributions, capturing both interaction effects and the direction of influence on the predicted outcome.
Applying SHAP would strengthen the transparency and interpretability of machine learning models used in sports analytics. Pursuing these directions will enhance the refinement and applicability of football player valuation models, offering more actionable insights to clubs, agents, and analysts.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijfs13020111/s1.

Author Contributions

Conceptualization, D.K. and J.Y.; methodology, D.K.; software, E.C.; validation, E.C., D.K. and C.Z.; formal analysis, E.C.; investigation, E.C. and D.K.; resources, E.C.; data curation, E.C.; writing—original draft preparation, D.K., J.Y. and E.C.; writing—review and editing, N.J.A.M.; visualization, C.Z. and N.J.A.M.; supervision, D.K.; project administration, D.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data were gathered from sofifa.com, and are based on the official FIFA video game and backed by the International Federation of Association Football (FIFA).

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1
Variables that have been excluded from certain models based on the filtering process previously explained are noted as N/A.
2
The correlation matrices for the other samples of players are available in the Supplementary Files.

References

  1. Al-Asadi, M. A., & Tasdemir, S. (2022). Predict the value of football players using FIFA video game data and machine learning techniques. IEEE Access, 10, 22631–22645.
  2. Anjum, S., & Fatima, A. (2023). Predictive analytics for FIFA player prices: An ML approach. Journal of Scientific Research and Technology, 1(6), 204–212.
  3. Asif, R., Zaheer, M. T., Haque, S. I., & Hasan, M. A. (2016). Football (soccer) analytics: A case study on the availability and limitations of data for football analytics research. International Journal of Computer Science and Information Security, 14(11), 516.
  4. Behravan, I., & Razavi, S. M. (2021). A novel machine learning method for estimating football players’ value in the transfer market. Soft Computing, 25(3), 2499–2511.
  5. Dazi-Héni, F. (2021). Mohammed bin Salman’s gamble on youth (Report No. 80). Institut de Recherche Stratégique de l'École Militaire (IRSEM).
  6. Fahlberg, S. M. B. (2024). An ecological dynamics approach to skill development and performance in handball goalkeeper training: An evaluation of current training methodologies and ways forward [Master’s thesis, Brage NIH].
  7. Felipe, J. L., Fernandez-Luna, A., Burillo, P., de la Riva, L. E., Sanchez-Sanchez, J., & Garcia-Unanue, J. (2020). Money talks: Team variables and player positions that most influence the market value of professional male footballers in Europe. Sustainability, 12(9), 3709.
  8. Gásquez, R., & Royuela, V. (2014). Is football an indicator of development at the international level? Social Indicators Research, 117, 827–848.
  9. Herm, S., Callsen-Bracker, H. M., & Kreis, H. (2014). When the crowd evaluates soccer players’ market values: Accuracy and evaluation attributes of an online community. Sport Management Review, 17(4), 484–492.
  10. Hill, D. F., Skinner, J., & Grosman, A. (2025). A review of football player metrics and valuation methods: A typological framework of football player valuations. Managing Sport and Leisure, 1–24.
  11. Hoey, S., Peeters, T., & Principe, F. (2021). The transfer system in European football: A pro-competitive no-poaching agreement? International Journal of Industrial Organization, 75, 102695.
  12. Idson, T. L., & Kahane, L. H. (2000). Team effects on compensation: An application to salary determination in the National Hockey League. Economic Inquiry, 38(2), 345–357.
  13. Kiefer, S. (2012). The impact of the Euro 2012 on popularity and market value of football players (No. 11/2012). Diskussionspapier des Instituts für Organisationsökonomik.
  14. Li, C., Kampakis, S., & Treleaven, P. (2022). Machine learning modeling to evaluate the value of football players. arXiv, arXiv:2207.11361.
  15. Liu, J. (2025). Post-prime football player valuations: Depreciation difference between the English Premier League and the top European leagues. International Journal of Financial Studies, 13(1), 17.
  16. Lorincz, M. K. (2022). Estimating the market value of attacking football players using multiple linear regression [Master’s thesis, Universidade Catolica Portuguesa (Portugal)].
  17. Majewski, S. (2016). Identification of factors determining market value of the most valuable football players. Central European Management Journal, 24(3), 91–104.
  18. Malagón-Selma, P., Debón, A., & Domenech, J. (2023). Measuring the popularity of football players with Google trends. PLoS ONE, 18(8), e0289213.
  19. Metelski, A. (2021). Factors affecting the value of football players in the transfer market. Journal of Physical Education and Sport, 21, 1150–1155.
  20. Müller, O., Simons, A., & Weinmann, M. (2017). Beyond crowd judgments: Data-driven estimation of market value in association football. European Journal of Operational Research, 263(2), 611–624.
  21. O’Leary, J., & Caiger, A. (2000). Shifting power and control in English football. New Zealand Journal of Industrial Relations, 25(3), 259–276.
  22. Patnaik, D., Praharaj, H., Prakash, K., & Samdani, K. (2019, March 29–30). A study of prediction models for football player valuations by quantifying statistical and economic attributes for the global transfer market. 2019 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN) (pp. 1–7), Pondicherry, India.
  23. Radoman, M. (2017). Labor market implications of institutional changes in European football: The Bosman ruling and its effect on productivity and career duration of players. Journal of Sports Economics, 18(7), 651–672.
  24. Samak, A. T., Samak, B., & Kaya, T. (2019, July 23–25). Football player value assessment using machine learning techniques. In Intelligent and Fuzzy Techniques in Big Data Analytics and Decision Making: Proceedings of the INFUS 2019 Conference (Vol. 1029, p. 289), Istanbul, Turkey.
  25. Sayan, V. H., & Hançer, E. (2022). A survey on football player performance and value estimation using machine learning techniques. Scientific Journal of Mehmet Akif Ersoy University, 5(2), 57–62.
  26. Shen, M., Wang, S., Wang, M., & Chen, H. (2023). Mining analysis on the correlation between football player’s competency and value based on machine learning. International Journal of Research in Engineering and Science (IJRES), 11(1), 149–156.
  27. Singh, P., & Lamba, P. S. (2019). Influence of crowdsourcing, popularity and previous year statistics in market value estimation of football players. Journal of Discrete Mathematical Sciences and Cryptography, 22(2), 113–126.
  28. Subathra, S., Sivanesan, K., Narmadha, M., Senthilkumar, K., Alkitani, M., Essa, M. M., & Walid Qoronfleh, M. (2022). Economic implications of hosting FIFA World Cup–A study with special reference to South Africa, Brazil, Russia and Qatar. In FIFA (pp. 73–94). Nova Science Publishers.
  29. Vroonen, R., Decroos, T., Van Haaren, J., & Davis, J. (2017, September). Predicting the potential of professional soccer players. In Proceedings of the 4th workshop on machine learning and data mining for sports analytics (Vol. 1971, pp. 1–10). Springer.
  30. Yaldo, L., & Shamir, L. (2017). Computational estimation of football player wages. International Journal of Computer Science in Sport, 16(1), 18–38.
Figure 1. Relevant variables diagram. Source: Sayan and Hançer (2022).
Figure 2. Visualization of the filtering process. Source: authors’ work.
Figure 3. Number of players per basket. Source: authors’ work.
Figure 4. Correlation matrix for players aged between 16 and 23 years. Source: authors’ work.
Figure 5. Gradient Boosting model representation. Source: authors’ work.
Figure 6. K-fold cross-validation visualization. Source: authors’ work.
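Table 1. Results of models including potential.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| Age | 16–23 | 24–29 | 30+ | 16–23 | 24–29 | 30+ | 16–23 | 24–29 | 30+ |
| Position | Attacker | Attacker | Attacker | Midfielder | Midfielder | Midfielder | Defender | Defender | Defender |
| Average K-fold accuracy | 87% | 95% | 59% | 80% | 95% | 96% | 91% | 92% | 97% |
| Normalized RMSE | 3% | 2% | 5% | 3% | 2% | 2% | 2% | 2% | 2% |
| Feature importance | | | | | | | | | |
| international_reputation | 58.50% | 16.78% | 0.31% | 66.92% | 10.17% | 0.00% | 43.30% | 4.30% | 0.05% |
| Potential | 22.57% | 78.11% | 92.62% | 18.59% | 81.19% | 94.08% | 43.33% | 89.39% | 90.73% |
| dribbling | 11.04% | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.02% |
| shooting | 3.91% | 2.50% | N/A | 1.47% | 1.47% | 0.09% | 0.38% | 0.47% | 0.05% |
| club_contract_valid_until | 2.89% | 0.11% | 0.23% | 0.24% | 0.68% | 2.45% | 0.19% | 1.40% | 0.21% |
| skill_moves | 0.78% | 1.10% | 1.33% | 3.33% | 1.53% | 0.03% | 0.26% | 0.52% | 0.01% |
| physic | 0.17% | 0.22% | N/A | 0.38% | 0.57% | 0.41% | 0.46% | 0.60% | 0.01% |
| Pace | 0.09% | 0.43% | 2.08% | 0.68% | 0.26% | 0.30% | 0.99% | 0.40% | 0.51% |
| defending | 0.02% | 0.24% | 0.72% | 0.77% | 0.33% | 0.04% | 7.46% | N/A | N/A |
| In_the_national_team | 0.02% | 0.08% | 1.67% | 1.40% | 0.49% | 0.02% | 0.29% | 0.75% | 0.00% |
| passing | N/A | 0.43% | 1.04% | 6.23% | 3.30% | N/A | 3.35% | 2.17% | N/A |
| age | N/A | N/A | N/A | N/A | N/A | 2.58% | N/A | N/A | 7.89% |
| weight_kg | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.51% |
| # of factors in the model | 10 | 10 | 8 | 10 | 10 | 10 | 10 | 9 | 11 |

Source: authors’ own work.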
Table 2. Results of models excluding potential.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| Age | 16–23 | 24–29 | 30+ | 16–23 | 24–29 | 30+ | 16–23 | 24–29 | 30+ |
| Position | Attacker | Attacker | Attacker | Midfielder | Midfielder | Midfielder | Defender | Defender | Defender |
| Average K-fold accuracy | 78% | 86% | 60% | 69% | 77% | 66% | 73% | 85% | 73% |
| Normalized RMSE | 3% | 4% | 6% | 4% | 4% | 4% | 4% | 3% | 4% |
| Feature importance | | | | | | | | | |
| international_reputation | 59.41% | 23.80% | 7.65% | 90.69% | 77.95% | 6.57% | 94.56% | 58.19% | 3.98% |
| dribbling | 19.56% | 17.19% | 56.15% | N/A | N/A | N/A | N/A | N/A | N/A |
| shooting | 17.26% | 50.77% | 15.72% | 1.01% | 0.85% | 2.60% | 0.19% | 0.45% | 0.39% |
| club_contract_valid_until | 3.20% | 0.31% | 0.42% | 0.15% | 1.11% | 1.91% | 0.07% | 0.89% | 0.22% |
| skill_moves | 0.07% | 0.41% | 1.90% | 1.96% | 0.45% | 0.50% | 0.15% | 0.54% | 0.22% |
| physic | 0.06% | 0.51% | N/A | 0.10% | 1.70% | 0.42% | 0.19% | 0.41% | 0.23% |
| Pace | 0.10% | 6.27% | 1.44% | 0.67% | 0.91% | 1.56% | 0.08% | 1.33% | 19.44% |
| defending | 0.19% | 0.49% | 0.40% | 0.51% | 2.95% | 10.52% | 1.77% | 32.24% | 62.80% |
| In_the_national_team | 0.16% | 0.24% | 16.32% | 1.90% | 5.96% | 5.24% | 2.16% | 0.28% | 0.92% |
| passing | N/A | N/A | N/A | 3.01% | 8.11% | 63.50% | 0.83% | 5.67% | 5.65% |
| age | N/A | N/A | N/A | N/A | N/A | 7.18% | N/A | N/A | 5.12% |
| weight_kg | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 1.05% |
| % of variables with effect of more than 5% | 33% | 44% | 50% | 11% | 33% | 50% | 11% | 33% | 36% |
| # of factors in the model | 9 | 9 | 8 | 9 | 9 | 10 | 9 | 9 | 11 |

Source: authors’ work.
Table 3. Features and scales.

| Features | Type | Description | Unit/Scale |
|---|---|---|---|
| Dependent variable | | | |
| Value | | Value of the player | Euros |
| Independent variables (Ticker) | | | |
| International reputation (IR) | Popularity | The level of fame of the player | Scaled 1 (low) to 5 (high) |
| Potential (POT) | Potential | Expected future potential based on experts’ opinion | Scaled 0 (low) to 100 (high) |
| Dribbling (DRI) | Skill | Ability to go through the defense of the opposite team | Scaled 0 (low) to 100 (high) |
| Shooting (SHO) | Skill | Speed and accuracy of the shots delivered by the player | Scaled 0 (low) to 100 (high) |
| Club contract valid until (CCV) | Other | Duration of the player’s contract with his current team | Years |
| Skill moves (SKM) | Skill | Ball handling capabilities | Scaled 0 (low) to 100 (high) |
| Physic (PHY) | Skill | Physical condition and strength of the player | Scaled 0 (low) to 100 (high) |
| Pace (PAC) | Skill | Speed or agility in the movement | Scaled 0 (low) to 100 (high) |
| Defending (DEF) | Skill | Ability to block incoming attacks | Scaled 0 (low) to 100 (high) |
| In the national team (NAT) | Other | If the player is in his native country’s national team | Binary: Yes = 1, No = 0 |
| Passing (PAS) | Skill | Ability to spot teammates and deliver the ball to them | Scaled 0 (low) to 100 (high) |
| Age (AGE) | Personal | Age of the player | Years |
| Weight (WEI) | Personal | Weight of the player | Kilograms |

Source: authors’ work.
Table 4. Relevant features per basket.

| Model | Position | Age | Tickers of the Relevant Features | Dependent Variable |
|---|---|---|---|---|
| 1 | Attackers | 16–23 | IR; POT; DRI; SHO; CCV; SKM; PHY; PAC; DEF; NAT | Market value of players |
| 2 | Attackers | 24–29 | IR; POT; SHO; CCV; SKM; PHY; PAC; DEF; NAT; PAS | Market value of players |
| 3 | Attackers | 30+ | IR; POT; CCV; SKM; PAC; DEF; NAT; PAS | Market value of players |
| 4 | Midfielders | 16–23 | IR; POT; SHO; CCV; SKM; PHY; PAC; DEF; NAT; PAS | Market value of players |
| 5 | Midfielders | 24–29 | IR; POT; SHO; CCV; SKM; PHY; PAC; DEF; NAT; PAS | Market value of players |
| 6 | Midfielders | 30+ | IR; POT; SHO; CCV; SKM; PHY; PAC; DEF; NAT; AGE | Market value of players |
| 7 | Defenders | 16–23 | IR; POT; SHO; CCV; SKM; PHY; PAC; DEF; NAT; PAS | Market value of players |
| 8 | Defenders | 24–29 | IR; POT; SHO; CCV; SKM; PHY; PAC; NAT; PAS | Market value of players |
| 9 | Defenders | 30+ | IR; POT; DRI; SHO; CCV; SKM; PHY; PAC; NAT; AGE; WEI | Market value of players |

Source: authors’ work.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Khalife, D.; Yammine, J.; Chbat, E.; Zaki, C.; Jabbour Al Maalouf, N. Dynamic Financial Valuation of Football Players: A Machine Learning Approach Across Career Stages. Int. J. Financial Stud. 2025, 13, 111. https://doi.org/10.3390/ijfs13020111

AMA Style

Khalife D, Yammine J, Chbat E, Zaki C, Jabbour Al Maalouf N. Dynamic Financial Valuation of Football Players: A Machine Learning Approach Across Career Stages. International Journal of Financial Studies. 2025; 13(2):111. https://doi.org/10.3390/ijfs13020111

Chicago/Turabian Style

Khalife, Danielle, Jad Yammine, Elias Chbat, Chamseddine Zaki, and Nada Jabbour Al Maalouf. 2025. "Dynamic Financial Valuation of Football Players: A Machine Learning Approach Across Career Stages" International Journal of Financial Studies 13, no. 2: 111. https://doi.org/10.3390/ijfs13020111

APA Style

Khalife, D., Yammine, J., Chbat, E., Zaki, C., & Jabbour Al Maalouf, N. (2025). Dynamic Financial Valuation of Football Players: A Machine Learning Approach Across Career Stages. International Journal of Financial Studies, 13(2), 111. https://doi.org/10.3390/ijfs13020111

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
