Digital Asset Analytics for DeFi Protocol Valuation: An Explainable Optuna-Tuned Super Learner Ensemble Framework

Ali, Gihan M.

doi:10.3390/jrfm19010063

Open Access Article

Gihan M. Ali

Department of Accounting, College of Business Administration in Hawtat Bani Tamim, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia

J. Risk Financial Manag. 2026, 19(1), 63; https://doi.org/10.3390/jrfm19010063

Submission received: 9 December 2025 / Revised: 1 January 2026 / Accepted: 4 January 2026 / Published: 13 January 2026

(This article belongs to the Section Financial Technology and Innovation)

Abstract

Decentralized Finance (DeFi) has become a major component of digital asset markets, yet accurately valuing protocol performance remains difficult due to high volatility, nonlinear pricing dynamics, and persistent disclosure gaps that amplify valuation risk. This study develops an Optuna-tuned Super Learner stacked ensemble to improve risk-aware DeFi valuation, combining Extremely Randomized Trees (ETs), Support Vector Regression (SVR), and Categorical Boosting (CAT) as heterogeneous base learners, with a K-Nearest Neighbors (KNNs) meta-learner integrating their forecasts. Using an expanding-window panel time-series cross-validation design, the framework achieves significantly higher predictive accuracy than individual models, benchmark ensembles, and econometric baselines, obtaining RMSE = 0.085, MAE = 0.065, and R² = 0.97—representing a 25–36% reduction in valuation error. Wilcoxon tests confirm that these gains are statistically significant (p < 0.01). SHAP-based interpretability analysis identifies Gross Merchandise Volume (GMV) as the primary valuation determinant, followed by Total Value Locked (TVL) and key protocol design features such as Decentralized Exchange (DEX) classification, while revenue variables and inflation contribute secondary effects. The findings demonstrate how explainable ensemble learning can strengthen valuation accuracy, reduce information-driven uncertainty, and support risk-informed decision-making for investors, analysts, developers, and policymakers operating within rapidly evolving blockchain-based digital asset environments.

Keywords:

Decentralized Finance (DeFi); digital asset valuation; financial risk analysis; super learner stacked ensemble; Optuna hyperparameter optimization; explainable machine learning; SHAP interpretability; protocol disclosure indicators; market volatility; risk-informed decision-making

1. Introduction

Decentralized Finance (DeFi) has become one of the most dynamic and rapidly expanding sectors within the digital asset economy, reshaping how financial services are designed, accessed, and governed in online environments. Built on blockchain infrastructures and executed through smart contracts, DeFi protocols replicate and extend traditional financial activities, including trading, lending, borrowing, and liquidity provisioning, without relying on centralized intermediaries. This architecture strengthens transparency, auditability, and composability while enabling permissionless participation in global digital markets (Angeris et al., 2019; Werner et al., 2022). The sharp expansion of DeFi, reflected in the growth of Total Value Locked (TVL) across decentralized exchanges, lending platforms, and liquidity pools, underscores both its economic relevance and the increasing complexity of its valuation mechanisms (Kaal et al., 2024; Metelski & Sobieraj, 2022). Despite this growth, the absence of robust and reliable valuation frameworks remains a central challenge in assessing the fundamental value of DeFi protocols operating under highly nonlinear and volatile conditions.

Recent research highlights the growing importance of rigorous analytics in understanding blockchain-enabled financial ecosystems. Digital-asset markets exhibit pronounced volatility, interdependence, and structural complexity, which challenge the assumptions of traditional financial models and complicate valuation in decentralized, platform-based settings (Almeida et al., 2022; Loukil et al., 2025). These characteristics are especially salient in DeFi, where protocol activity, liquidity, and incentives evolve rapidly, reinforcing the need for advanced, data-driven valuation frameworks.

As with other blockchain-based innovations, DeFi offers significant benefits alongside considerable risks and analytical challenges. These benefits include continuous market access, decentralized governance, automated liquidity management, and cost-efficient execution, while users gain immediate token liquidity and access to diverse yield-generating opportunities (Werner et al., 2022). Yet these advantages coexist with vulnerabilities stemming from smart-contract exploits, governance failures, liquidity shocks, and token-inflation dynamics (Zmaznev, 2021). A persistent challenge is the high degree of information asymmetry between protocol developers and users, which complicates assessments of fundamental health, revenue sustainability, and risk exposure—factors that are central to protocol valuation (Kaal et al., 2024).

Given these complexities, there is a clear need for robust, data-driven valuation frameworks capable of capturing DeFi’s nonlinear market structure. Prior empirical work has relied primarily on econometric approaches—such as fixed-effects models and Granger causality tests—to examine the predictive value of disclosure variables including TVL, revenue, Gross Merchandise Volume (GMV), and inflation (Metelski & Sobieraj, 2022). However, these techniques impose restrictive assumptions regarding linearity and independence and often fail to capture the high-variance, interdependent nature of protocol activity. Empirical findings frequently reveal weak or unstable predictive relationships, suggesting that DeFi valuation is governed by nonlinear interactions that traditional models cannot fully represent.

Although machine learning (ML) offers clear advantages in this context, existing ML studies in the blockchain domain have predominantly focused on cryptocurrency price prediction, scam detection, or initial coin offering (ICO) performance (Kayikci & Khoshgoftaar, 2024), leaving DeFi protocol valuation relatively underexplored. Despite growing interest in DeFi analytics, the literature lacks a unified, explainable, ensemble-based valuation framework that is explicitly designed to capture the nonlinear, high-variance, and disclosure-driven dynamics of DeFi protocols while remaining robust to temporal instability and model uncertainty. To address this gap, this study adopts a Super Learner ensemble framework designed to capture complex nonlinear relationships while reducing model uncertainty and improving robustness.

The framework integrates three heterogeneous base learners—Extremely Randomized Trees (ET), Categorical Boosting (CAT), and Support Vector Regression (SVR)—and combines them through a K-Nearest Neighbors (KNNs) meta-learner. Using an expanding-window time-series cross-validation scheme to prevent information leakage, the proposed model consistently outperforms its constituent learners, competing ensemble strategies, and traditional econometric benchmarks across multiple performance metrics. Empirically, the Super Learner achieves an RMSE of 0.0854 ± 0.0259, an MAE of 0.0645 ± 0.0231, and an R² of 0.9731 ± 0.0202, substantially surpassing individual ML models and alternative ensemble approaches. Wilcoxon signed-rank tests confirm that these improvements are statistically significant (p < 0.01), highlighting the robustness and generalizability of the proposed framework.
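The paired Wilcoxon comparison can be illustrated with a short Python sketch. It assumes that per-fold errors (for example, RMSE on each expanding-window test block) are available for the Super Learner and for one competitor; the arrays below are hypothetical placeholders, not the study's results, and `compare_fold_errors` is an illustrative helper. SciPy's `scipy.stats.wilcoxon` performs the paired signed-rank test, and `alternative="less"` tests whether the first model's errors are systematically lower.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_fold_errors(err_model, err_baseline, alpha=0.01):
    """Paired one-sided Wilcoxon signed-rank test on per-fold errors.

    H1: the model's errors are systematically lower than the baseline's.
    Returns the test statistic, the p-value, and whether p < alpha.
    """
    res = wilcoxon(err_model, err_baseline, alternative="less")
    return float(res.statistic), float(res.pvalue), bool(res.pvalue < alpha)


# Hypothetical per-fold RMSE values for ten paired folds (illustrative only).
err_super = np.array([0.080, 0.091, 0.070, 0.102, 0.085,
                      0.077, 0.095, 0.088, 0.069, 0.090])
err_bench = err_super + np.linspace(0.01, 0.10, 10)

stat, p, significant = compare_fold_errors(err_super, err_bench)
print(f"W={stat:.1f}, p={p:.5f}, significant={significant}")
```

With n paired folds and no ties, the smallest attainable exact one-sided p-value is 1/2^n (about 0.031 for five folds and about 0.001 for ten), so the number of paired comparisons bounds how small a p-value the test can report.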

Beyond predictive performance, the study enhances transparency through SHAP-based interpretability. The results show that economic throughput (GMV), liquidity depth (TVL), and protocol classification—particularly Decentralized Exchange (DEX) status—are the dominant drivers of valuation. Other indicators such as total revenue (TR), protocol revenue (PR), and inflation (INF) play more moderate roles. These findings extend the work of Metelski and Sobieraj (2022), who identify GMV as the most consistent econometric predictor, and highlight the capacity of nonlinear ML models to extract more stable relationships from disclosure variables. They also reinforce broader insights that liquidity efficiency, fee generation, and user engagement are central components of DeFi value formation (Angeris et al., 2019; Werner et al., 2022).

Building on these insights, this study contributes to the literature along three main dimensions. First, it introduces a novel Super Learner stacked ensemble specifically tailored to the nonlinear and high-variance dynamics of DeFi protocol valuation. Second, it provides a comprehensive performance evaluation against econometric baselines, individual machine-learning models, and alternative ensemble strategies, demonstrating the superior accuracy and robustness of heterogeneous stacking. Third, it integrates SHAP-based interpretability to identify the key disclosure variables driving DeFi valuations, offering actionable economic insights for investors, developers, and regulators. Collectively, the study advances the analytical foundations of DeFi valuation by combining predictive accuracy with economic interpretability in a unified, data-driven framework.

The remainder of the paper is structured as follows: Section 2 reviews the literature on DeFi valuation; Section 3 details the dataset and methodological design; Section 4 presents the empirical results and discussion; and Section 5 concludes the study.

2. Related Work

Decentralized Finance (DeFi), a blockchain-based peer-to-peer financial system operating without traditional intermediaries, is fundamentally reshaping global finance by addressing critical gaps in financial inclusion. Its infrastructure relies on interconnected smart contracts deployed on blockchains like Ethereum, with key operational mechanisms including self-executing financial logic, oracles for off-chain data, incentivized keepers, and DAO-managed governance (John et al., 2023; Werner et al., 2022). DeFi provides permissionless access to services (e.g., savings, lending, payments) for the 1.4 billion unbanked adults worldwide—particularly in developing regions where traditional banking faces geographic, cost, and documentation barriers (Alamsyah & Salsabila, 2024; World Bank, 2023). By eliminating intermediaries through blockchain technology, DeFi enables peer-to-peer transactions that can substantially reduce transaction costs; for example, cross-border remittance fees average 6.3% in traditional systems but can fall to approximately 0.01% on platforms such as Celo (Foster et al., 2021; World Bank, 2023).

The ecosystem is characterized by four core properties: non-custodial control, permissionless access, open auditability, and composability (“Money Lego”) enabling seamless service integration (Werner et al., 2022). These facilitate enhanced financial inclusion (Cong & He, 2019), improved transparency via public ledgers, permissionless innovation (John et al., 2023), and potentially higher yields through disintermediation (Gudgeon et al., 2020). However, these advantages coexist with significant challenges, including the Oracle Problem, which requires trusted data feeds (John et al., 2023), Miner Extractable Value (MEV) exploitation (Daian et al., 2019), capital inefficiency from overcollateralization (Gudgeon et al., 2020), security vulnerabilities, regulatory uncertainty (Werner et al., 2022), and integration difficulties with traditional legal frameworks (John et al., 2023). Despite these hurdles, DeFi has experienced explosive growth, with TVL surging from $700 million in early 2020 to over $150 billion by April 2022 (Werner et al., 2022), reflecting its expansion beyond early experimental stages.

The DeFi ecosystem has rapidly diversified beyond its initial applications. The development of Automated Market Makers (AMMs) like Uniswap revolutionized trading by replacing order books with liquidity pools and constant product algorithms (Angeris et al., 2019; John et al., 2023). Protocols for Loanable Funds (PLFs) introduced innovative lending mechanisms, ranging from overcollateralized loans (e.g., Compound) to uncollateralized flash loans executable within single transactions (Gudgeon et al., 2020). Stablecoins became a particularly impactful sector, with collateralized models (e.g., DAI) and algorithmic designs (e.g., TerraUSD) playing significant roles, though the collapse of TerraUSD in 2022 underscored critical architectural vulnerabilities (Klages-Mundt et al., 2020). Further diversification included derivatives (synthetic assets, options, perpetual swaps), yield aggregators optimizing returns across protocols (Cousaert et al., 2022), and privacy-enhancing mixers (Werner et al., 2022). This rapid innovation attracted substantial investment, with TVL peaking above $250 billion (Bhambhwani & Huang, 2024), while simultaneously increasing systemic and security risks (Werner et al., 2022).
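The constant-product rule behind AMMs such as Uniswap can be made concrete with a minimal Python sketch. It assumes a two-token pool with reserves x and y, a swap that sells dx of the first token, and a 0.3% fee retained in the pool (the Uniswap v2 fee level); `constant_product_swap` is a hypothetical helper, not a protocol API.

```python
import math


def constant_product_swap(x_reserve, y_reserve, dx, fee=0.003):
    """Sell dx of token X into an x*y = k pool.

    Returns (dy_out, new_x_reserve, new_y_reserve). The fee is charged on the
    input and left in the pool, so the product of reserves weakly increases.
    """
    if min(x_reserve, y_reserve, dx) <= 0:
        raise ValueError("reserves and trade size must be positive")
    if not 0 <= fee < 1:
        raise ValueError("fee must lie in [0, 1)")
    dx_after_fee = dx * (1 - fee)
    # Solve (x + dx_after_fee) * (y - dy) = x * y for dy.
    dy = y_reserve * dx_after_fee / (x_reserve + dx_after_fee)
    return dy, x_reserve + dx, y_reserve - dy


dy, new_x, new_y = constant_product_swap(1_000.0, 1_000.0, 10.0, fee=0.0)
print(round(dy, 4))                              # 9.901
print(math.isclose(new_x * new_y, 1_000_000.0))  # True: k is preserved
```

Selling 10 units into a 1,000/1,000 pool returns slightly less than 10 units of the other token; this price impact grows with trade size relative to pool depth, which is one reason liquidity depth (TVL) matters for exchange protocols.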

2.1. Disclosure Determinants of Decentralized Finance (DeFi) Valuation

The valuation of DeFi protocols reflects a complex interaction of performance metrics, market dynamics, and external factors. Total Value Locked (TVL) is a widely used performance indicator; research indicates it significantly impacts valuations, although its explanatory power varies across protocol categories like decentralized exchanges, lending, and asset management (Metelski & Sobieraj, 2022). However, Şoiman et al. (2023) challenge the primacy of TVL-to-Market Capitalization (MC) ratios, finding broader cryptocurrency market exposure to be a stronger driver of DeFi token returns. Beyond TVL, protocol revenue and Gross Merchandise Volume (GMV) are critical financial indicators of economic activity. Metelski and Sobieraj (2022) demonstrate that GMV Granger-causes future valuations, while protocol revenue and inflation factors often exert negative effects.
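The Granger-causality idea (past GMV helping to forecast valuation beyond what valuation's own past already explains) can be sketched in Python as a nested-regression F-test using NumPy and SciPy. This is a minimal bivariate version with a fixed lag order p and a constant; it is not necessarily the specification used by Metelski and Sobieraj (2022), and `granger_f_test` and the simulated series are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def _rss(A, b):
    """Residual sum of squares from an OLS fit of b on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    resid = b - A @ coef
    return float(resid @ resid)


def granger_f_test(y, x, p=1):
    """Test H0: p lags of x add no predictive power for y.

    Restricted model:   y_t ~ const + y_{t-1}, ..., y_{t-p}
    Unrestricted model: y_t ~ const + y_{t-1}, ..., y_{t-p} + x_{t-1}, ..., x_{t-p}
    Returns (F statistic, p-value).
    """
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y)
    target = y[p:]
    y_lags = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    x_lags = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    const = np.ones((T - p, 1))
    X_r = np.hstack([const, y_lags])
    X_u = np.hstack([const, y_lags, x_lags])
    rss_r, rss_u = _rss(X_r, target), _rss(X_u, target)
    df_den = (T - p) - X_u.shape[1]
    F = ((rss_r - rss_u) / p) / (rss_u / df_den)
    return float(F), float(f_dist.sf(F, p, df_den))


# Simulated example: "val" responds to the previous period's "gmv".
rng = np.random.default_rng(42)
gmv = rng.normal(size=300)
val = np.zeros(300)
for t in range(1, 300):
    val[t] = 0.2 * val[t - 1] + 0.8 * gmv[t - 1] + 0.3 * rng.normal()

print(granger_f_test(val, gmv, p=2))  # does GMV Granger-cause valuation?
print(granger_f_test(gmv, val, p=2))  # reverse direction, for comparison
```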

Network effects constitute another important determinant of valuation. Studies by Liu and Tsyvinski (2021) emphasize that user adoption increases token utility and valuation, particularly in early development stages (Ante, 2022). Regulatory uncertainty represents a significant external risk, negatively impacting TVL in major categories like decentralized exchanges and lending, though derivatives and payments may show greater resilience (Zmaznev, 2021). Conversely, security assurances through smart contract audits positively influence valuation. Protocols audited by higher-quality firms exhibit higher TVL and market capitalization, as audits signal reliability and boost investor confidence (Bhambhwani & Huang, 2024). This is corroborated by findings that breaches cause significant TVL declines, while audit announcements increase asset commitments (Knechel et al., 2023).

Despite sustained growth, DeFi tokens frequently appear overvalued when benchmarked against traditional finance. Xu et al. (2023) document persistent overvaluation using valuation multiples and discounted cash flow (DCF) approaches, even during market downturns. Furthermore, self-generated bubbles, often catalyzed by movements in major DeFi tokens such as Chainlink (LINK) and Maker (MKR), have been identified using bubble detection methods such as the Supremum Augmented Dickey–Fuller (SADF) test (Corbet et al., 2023).
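The SADF procedure can be sketched in Python as the supremum of Dickey–Fuller t-statistics over forward-expanding samples that all start at the first observation. This simplified version omits the augmentation lags of the full ADF regression, uses an assumed minimum window fraction `r0`, and does not compute critical values (which are usually obtained by simulation); `df_tstat` and `sadf` are hypothetical helpers.

```python
import numpy as np


def df_tstat(y):
    """t-statistic on beta in: dy_t = a + beta * y_{t-1} + e_t (no lag terms)."""
    y = np.asarray(y, float)
    dy, y_lag = np.diff(y), y[:-1]
    A = np.column_stack([np.ones_like(y_lag), y_lag])
    coef, *_ = np.linalg.lstsq(A, dy, rcond=None)
    resid = dy - A @ coef
    n, k = A.shape
    sigma2 = resid @ resid / (n - k)
    se_beta = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[1, 1])
    return float(coef[1] / se_beta)


def sadf(y, r0=0.1, min_obs=10):
    """Supremum of DF statistics over windows y[0:end], end >= max(r0*T, min_obs)."""
    y = np.asarray(y, float)
    T = len(y)
    first_end = max(int(np.floor(r0 * T)), min_obs)
    return max(df_tstat(y[:end]) for end in range(first_end, T + 1))


rng = np.random.default_rng(7)
random_walk = np.cumsum(rng.normal(size=200))
bubble = np.empty(200)
bubble[0] = 1.0
for t in range(1, 200):
    # Explosive root (1.03 > 1) mimics a bubble-like price path.
    bubble[t] = 1.03 * bubble[t - 1] + 0.1 * rng.normal()

print(sadf(random_walk), sadf(bubble))
```

Large positive SADF values point toward explosive (bubble-like) behavior; in practice, the statistic is compared with simulated right-tail critical values.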

2.2. Machine Learning Applications in Valuation: Bridging the DeFi Gap

While machine learning (ML) has revolutionized traditional corporate valuation through data-driven precision, its application to DeFi protocol valuation remains limited. Existing DeFi valuation studies predominantly rely on econometric and time-series methods, including fixed-effects panel regressions (Metelski & Sobieraj, 2022), Granger causality tests, vector autoregressions, and regime-switching models (Corbet et al., 2023; Zmaznev, 2021).

Although informative, these econometric and traditional valuation approaches rely on assumptions that are frequently violated in DeFi contexts. Linear panel regressions and causality-based models typically assume stable data-generating processes, weak endogeneity, and time-invariant relationships. However, empirical evidence indicates that DeFi protocols are characterized by rapidly evolving governance rules, incentive redesigns, and endogenous liquidity feedback loops that generate strong nonlinear interactions between usage, rewards, and valuation metrics (Corbet et al., 2023; Şoiman et al., 2023; Zmaznev, 2021). Research documenting regime shifts, speculative behavior, and structural instability in DeFi token markets further highlights the fragility of these assumptions under decentralized market conditions (Almeida et al., 2022; Corbet et al., 2023).

Financial valuation techniques based on multiples or discounted cash flow (DCF) analysis additionally presume predictable cash-flow streams and consistent disclosure practices. In DeFi, these assumptions are weakened by token inflation mechanisms, protocol upgrades, composability-driven revenue reallocation, and heterogeneous reporting standards across projects (Kaal et al., 2024; Metelski & Sobieraj, 2022; Xu et al., 2023). Prior empirical studies show that valuation signals such as TVL, revenue, and token supply dynamics often exhibit unstable or context-dependent explanatory power across protocol categories and market regimes, limiting the robustness and generalizability of inference based on static or linear models (Metelski & Sobieraj, 2022; Zmaznev, 2021). As a result, valuation estimates derived from these approaches are highly sensitive to model specification and sample period, motivating the need for adaptive, nonlinear, and ensemble-based methodologies for DeFi protocol valuation.

In contrast, recent advances in ML demonstrate substantial advantages in capturing nonlinear relationships in valuation contexts. Geertsema and Lu (2019) demonstrated that ML algorithms using only historical accounting data achieved a median absolute percentage error of 17.2% in firm valuation, outperforming both finance students and professional analysts. Their subsequent work (Geertsema & Lu, 2023) revealed that decision-tree-based ML models could reduce valuation errors by 5.6 to 31.4 percentage points compared to traditional multiples-based approaches while identifying key drivers consistent with discounted cash flow theory. Similarly, Koklev (2022) found that Gradient Boosting Decision Trees (GBDTs) achieved high explanatory power (R² = 86.7%) in predicting market capitalization, substantially outperforming conventional econometric models.

Moreover, various ML techniques have proven effective for different valuation contexts. Tree-based methods like Random Forest (RF) and Extreme Gradient Boosting (XGBoost) have shown particular promise, with Koklev (2022) demonstrating that XGBoost models using 19 fundamental signals outperformed linear models by 27% in measuring firm quality. Neural networks have excelled in specific applications, as C. Zhang et al. (2020) showed that Artificial Neural Networks (ANNs) outperformed other models by 18–19% in valuing energy firms. For early-stage ventures, R. Zhang et al. (2023) found that neural networks optimized with differential evolution algorithms effectively predicted entrepreneurial firm valuations, revealing that VC investor syndicate size was more influential than patent portfolios. Cross-country studies like Cakici et al. (2023) have demonstrated ML’s global applicability, with models successfully predicting returns across 46 markets using 148 firm characteristics, though performance varied based on market size and idiosyncratic risk factors. Importantly, interpretability challenges associated with ML have been addressed through tools such as SHAP values, which enable identification of economically meaningful value drivers (Koklev, 2022).

Despite these advances, DeFi valuation research remains disproportionately reliant on traditional econometric and financial approaches (Metelski & Sobieraj, 2022; Xu et al., 2023). These methods are ill-equipped to address DeFi’s distinctive complexities, including data uncertainty, nonlinear protocol interactions, endogenous liquidity dynamics, and rapidly evolving market conditions, which results in persistent valuation gaps. Consequently, DeFi valuation requires models capable of capturing nonlinear dependencies, adapting to temporal instability, and mitigating model risk; single-model and purely econometric frameworks do not adequately meet these requirements.

To bridge this methodological divide, this study proposes an integrated framework that combines a Super Learner ensemble with Optuna hyperparameter optimization. The Super Learner integrates multiple base learners, capturing their complementary strengths and modeling complex, nonlinear relationships within DeFi data. Optuna tunes the hyperparameters of each component by searching for configurations that minimize validation error, which helps the ensemble remain accurate under volatile market conditions. Together, these components provide an adaptive and robust valuation approach that traditional econometric methods are not designed to deliver, offering a scalable response to DeFi’s core challenges while maintaining the interpretability essential for stakeholder adoption.

3. Proposed Framework

3.1. General Context

This study develops an advanced valuation framework for DeFi protocols using a Super Learner stacked ensemble. The methodological process is organized into two main stages: training and testing. As shown in Figure 1, data preprocessing includes structuring the time-series observations, imputing missing values, encoding categorical protocol characteristics, and standardizing numerical variables. Feature relevance is inferred through the performance contributions of the individual base learners. The predictive architecture integrates three heterogeneous models—ET, SVR, and CAT—as base learners, while a KNN model serves as the meta-learner to aggregate their outputs and enhance predictive stability. Model performance in the evaluation stage is assessed using RMSE, MAE, and R².
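For reference, the three evaluation metrics are assumed here to follow their standard definitions, with y_i the observed valuation, \hat{y}_i the model prediction, \bar{y} the mean of the observed values, and n the number of evaluated observations:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.
```

Lower RMSE and MAE and higher R² indicate better fit; because errors are squared before averaging, RMSE penalizes large valuation errors more heavily than MAE does.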

3.2. Phase 1: Data Preparation

Step 1: Dataset Overview

The study uses publicly accessible data collected from three major aggregation platforms—DefiLlama, TokenTerminal, and DappRadar—for 30 DeFi protocols. DefiLlama and TokenTerminal serve as the primary sources for protocol-level financial metrics and disclosure variables, including market capitalization, Total Value Locked (TVL), protocol revenue, total revenue, gross merchandise volume (GMV), and token inflation factors. The dataset consists of daily observations spanning the period from 11 January to 8 July 2022 (179 days), yielding 5370 protocol–day observations. Lagged and rolling-window features are constructed exclusively from past observations within each protocol, which reduces the effective sample available for estimation.
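Lagged and rolling features of this kind can be built per protocol with pandas, as in the minimal sketch below. The column names (`protocol`, `date`, `GMV`), the lag set (1 and 7 days), and the 7-day window are illustrative assumptions, since the exact lag structure is not stated here; `add_past_features` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def add_past_features(df, cols, lags=(1, 7), window=7,
                      group="protocol", time="date"):
    """Append lag and trailing-mean features built only from earlier rows.

    Each protocol is processed separately, so no value leaks across protocols,
    and the current day is never used to build its own features.
    """
    out = df.sort_values([group, time]).reset_index(drop=True)
    for col in cols:
        grouped = out.groupby(group, sort=False)[col]
        for k in lags:
            # Value observed k days earlier for the same protocol.
            out[f"{col}_lag{k}"] = grouped.shift(k)
        # Mean of the previous `window` days (shift(1) excludes today).
        out[f"{col}_roll{window}"] = grouped.transform(
            lambda s: s.shift(1).rolling(window, min_periods=window).mean()
        )
    # Early rows lack a full history and are dropped, shrinking the sample.
    return out.dropna().reset_index(drop=True)


dates = pd.date_range("2022-01-11", periods=10, freq="D")
panel = pd.DataFrame({
    "protocol": ["A"] * 10 + ["B"] * 10,
    "date": list(dates) * 2,
    "GMV": list(np.arange(10.0)) + list(np.arange(100.0, 110.0)),
})
features = add_past_features(panel, ["GMV"])
print(len(features))  # 6: only days 8 to 10 of each protocol have 7 prior days
```

The same mechanism explains the reduced effective sample: with 30 protocols and 179 days (5370 rows), every additional day of required history removes 30 rows.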

The sample selection follows the benchmark design of Metelski and Sobieraj (2022), employing the same set of DeFi protocols, disclosure variables, and observation window to ensure conceptual consistency and comparability with prior valuation studies. The included protocols represent the most economically significant DeFi platforms during the study period and collectively account for a substantial share of aggregate TVL, transaction volume, and protocol revenues within the DeFi ecosystem. Although the observation window is limited to early 2022, this period coincides with heightened volatility and market stress in digital-asset markets, providing a demanding empirical environment for evaluating valuation models under adverse and rapidly changing conditions.

The selection of explanatory variables is theoretically motivated and grounded in prior empirical research on DeFi valuation. TVL serves as a proxy for capital commitment, user adoption, and network trust and is conceptually analogous to assets under management in traditional finance (Metelski & Sobieraj, 2022; Zmaznev, 2021). Revenue-based variables—protocol revenue, total revenue, and GMV—capture economic performance, transaction intensity, and platform scale, consistent with valuation theory for digital platforms and marketplace-based business models (Metelski & Sobieraj, 2022; Yan et al., 2017). The inflation factor reflects token supply dynamics and dilution risk, which are central to crypto-asset valuation through scarcity and supply–demand mechanisms (Liu & Tsyvinski, 2021; Şoiman et al., 2023).

In addition, the study includes a categorical variable capturing DeFi protocol classification—Decentralized Exchanges, Lending Protocols, and Asset Management Protocols—to control for structural heterogeneity across business models. This extension is theoretically motivated by literature emphasizing that different DeFi protocol types rely on distinct value-generation mechanisms and exhibit heterogeneous sensitivities to liquidity, revenue, and transaction-based metrics (Schär, 2021; Werner et al., 2022). Table 1 summarizes the dependent variable and the theoretically motivated disclosure indicators used in the empirical analysis.

Step 2: Data Preprocessing

The dataset includes temporal information, protocol identifiers, protocol-level financial indicators, and the target variable. The preprocessing workflow followed several structured steps to prepare the data for modeling. The dataset was first sorted by protocol name and date to preserve the temporal ordering of observations. Within each protocol, missing values were handled using linear interpolation to maintain continuity in the time series, and any remaining missing entries (typically at the start or end of a series) were filled using forward and backward filling. Categorical variables, particularly the protocol type, were one-hot encoded through a scikit-learn ColumnTransformer, with one reference category dropped to avoid perfect multicollinearity among the dummy columns (the dummy-variable trap). Numerical features were standardized using StandardScaler to place all predictors on a comparable scale for model training. Finally, the target variable VAL was designated as the outcome of interest, while all remaining features served as predictors, with the protocol identifier retained separately to enable group-aware model evaluation.
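A minimal Python sketch of these preprocessing steps with pandas and scikit-learn follows. The column names (`protocol`, `date`, `DeFi-class`, `TVL`, `PR`, `TR`, `GMV`, `INF`, `VAL`) are assumptions based on the abbreviations used in this section, and `clean_panel` and `build_preprocessor` are hypothetical helpers. Wrapping the encoder and scaler in a `ColumnTransformer` also allows them to be refit inside each training fold rather than on the full sample, which is one way to keep scaling statistics from leaking across folds; the toy usage below fits on the whole sample only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["TVL", "PR", "TR", "GMV", "INF"]  # assumed feature names
CATEGORICAL = ["DeFi-class"]                  # protocol type
TARGET = "VAL"


def clean_panel(df):
    """Sort by protocol and date, then fill gaps within each protocol only."""
    df = df.sort_values(["protocol", "date"]).reset_index(drop=True)
    for col in NUMERIC + [TARGET]:
        df[col] = df.groupby("protocol")[col].transform(
            # Linear interpolation first, then forward/backward fill at the ends.
            lambda s: s.interpolate(method="linear").ffill().bfill()
        )
    return df


def build_preprocessor():
    """One-hot encode protocol type (dropping one reference level) and
    standardize the numeric features."""
    return ColumnTransformer(
        [
            ("cat", OneHotEncoder(drop="first"), CATEGORICAL),
            ("num", StandardScaler(), NUMERIC),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


raw = pd.DataFrame({
    "protocol": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "date": list(pd.date_range("2022-01-11", periods=3)) * 3,
    "DeFi-class": ["Exchanges"] * 3 + ["Lending"] * 3 + ["Asset Management"] * 3,
    "TVL": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, np.nan],
    "PR": np.arange(9.0), "TR": np.arange(9.0) * 2, "GMV": np.arange(9.0) ** 2,
    "INF": [0.01, 0.02, 0.03] * 3, "VAL": np.linspace(1.0, 2.0, 9),
})
data = clean_panel(raw)
X = build_preprocessor().fit_transform(data.drop(columns=[TARGET, "protocol", "date"]))
y = data[TARGET].to_numpy()
groups = data["protocol"].to_numpy()  # protocol IDs kept for group-aware evaluation
print(X.shape)  # (9, 7): 2 dummy columns + 5 standardized features
```

With three protocol types, dropping the first (alphabetically, "Asset Management") leaves two indicator columns, matching the DeFi-class_Exchanges and DeFi-class_Lending variables discussed in the feature-validation step.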

Step 3: Feature Validation

In this step, a correlation matrix is computed to assess pairwise relationships among the variables and identify potential multicollinearity issues. Figure 2 displays these correlations, which reveal several meaningful patterns. Overall, the correlations remain moderate, indicating that the predictors capture distinct aspects of DeFi protocol behavior without raising multicollinearity concerns. Protocol revenue (PR) is relatively strongly correlated with total revenue (TR) (0.62) and moderately correlated with TVL (0.52). This pattern reflects the economic structure of DeFi protocols: higher user activity and greater locked value tend to generate more fee-based revenue streams. TVL and GMV show a moderate positive correlation (0.51), consistent with the idea that protocols handling larger transaction volumes also tend to attract more deposited liquidity. TR also correlates positively with both TVL (0.48) and GMV (0.25), reinforcing these operational linkages.

The inflation factor (INF) shows weak correlations with all variables, suggesting that token supply changes are not directly driven by protocol usage or revenue. The encoded DeFi-class variables show notable relationships with operational metrics. For example, DeFi-class_Exchanges is moderately negatively correlated with GMV (−0.45), indicating that GMV tends to be higher among protocols outside the exchange category. Meanwhile, DeFi-class_Lending has a moderate positive correlation with GMV (0.38), suggesting lending protocols tend to process higher transaction volumes. The negative correlation between the two class indicators (−0.50) is expected due to their categorical and mutually exclusive nature. Crucially, none of the correlations exceed commonly accepted concern thresholds (e.g., |0.80|), supporting the suitability of the feature set for predictive modeling.
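The correlation screen can be reproduced with a short pandas sketch that computes the pairwise Pearson matrix and lists any pair whose absolute correlation exceeds the |0.80| threshold mentioned above. The toy data are simulated, and the helper `flag_high_correlations` and the column `TVL_dup` are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_high_correlations(X, threshold=0.80):
    """Return the Pearson correlation matrix and pairs with |r| > threshold."""
    corr = X.corr(method="pearson")
    cols = list(corr.columns)
    flagged = [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))  # upper triangle only
        if abs(corr.iloc[i, j]) > threshold
    ]
    return corr, flagged


rng = np.random.default_rng(0)
tvl = rng.normal(size=500)
toy = pd.DataFrame({
    "TVL": tvl,
    "PR": 0.6 * tvl + 0.8 * rng.normal(size=500),   # moderate link (r near 0.6)
    "TVL_dup": tvl + 0.05 * rng.normal(size=500),   # near-duplicate of TVL
    "INF": rng.normal(size=500),                    # unrelated
})
corr, flagged = flag_high_correlations(toy)
print(flagged)  # only the (TVL, TVL_dup) pair exceeds |0.80|
```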

3.3. Phase 2: Model Development

Step 1: Specification of the Super Learner Model

This study employs ensemble learning to construct a predictive framework for DeFi valuation. Ensemble learning strengthens supervised learning by integrating multiple algorithms whose combined output typically yields more accurate and stable predictions than any individual model—a concept often referred to as the “wisdom of the crowd” (Kunapuli, 2023; Mohammed & Kora, 2023). Prior empirical work consistently demonstrates that ensemble approaches enhance predictive accuracy and generalization performance (Bogaert & Delaere, 2023; Thabet et al., 2024).

Ensemble methods can generally be grouped into homogeneous and heterogeneous designs. Homogeneous ensembles rely on repeated instances of the same algorithm trained on different data subsets or with different randomization. Parallel ensembles such as RF (based on bagging) and ET train their trees independently and aggregate their predictions, whereas boosting methods (including AdaBoost, Gradient Boosted Trees, LightGBM (LGBM), CAT, and XGBoost) build models sequentially, with each new model fitted to reduce the residual errors of the current ensemble (Mienye & Sun, 2022; Mohammed & Kora, 2023). In the present study, several of these algorithms (e.g., RF, LGBM, CAT, AdaBoost (ADA), and ET) are included as benchmark comparison models, enabling a comprehensive performance comparison across methodological families.

Heterogeneous ensemble methods integrate multiple ML algorithms to enhance predictive accuracy and generalization. Instead of relying on a single modeling technique, this strategy trains a diverse set of models on the same dataset and then combines their outputs through structured aggregation mechanisms. These mechanisms typically include voting procedures, such as majority voting (for classification), simple averaging, and weighted averaging in which weights reflect relative model performance, or a meta-learning framework. In a meta-learning setup, a higher-level model learns how to optimally combine the outputs of the base learners, forming what is known as a stacked ensemble. Such approaches are especially effective for complex regression and classification settings because they exploit algorithmic diversity and data-driven combination rules (Porwik et al., 2019).

The Super Learner, proposed by Van der Laan et al. (2007), represents a formalized stacking-based ensemble algorithm designed to surpass the limitations of single-model predictive methods. By integrating predictions from multiple base learners through a meta-learning layer, the Super Learner achieves asymptotically optimal performance (Dey & Mathur, 2023; Naimi & Balzer, 2018; Phillips et al., 2023). Its strength lies in its ability to assign optimal weights to a diverse library of algorithms, enabling it to deliver high accuracy and robust predictive behavior across varying data environments. This makes the Super Learner particularly advantageous when no single algorithm consistently outperforms others.

Empirical evidence shows that the Super Learner frequently exceeds the performance of individual models across a wide range of applications. Its adaptability stems from its capacity to select and weight only the most informative algorithms for the prediction task, thereby reducing overfitting and improving generalization (Wong et al., 2019). The meta-learner plays a central role in this process by minimizing prediction error through the optimal combination of base learner outputs. Studies involving behavior classification demonstrate that this approach produces superior accuracy and lower variance compared to both standalone models and conventional ensemble methods (Ladds et al., 2017). Through meta-learning, the Super Learner harnesses the strengths of each contributing model while compensating for their individual weaknesses, resulting in more reliable predictions. Overall, the Super Learner offers substantial advantages in predictive modeling by merging a diverse set of algorithms into a unified predictive function. Grounded in theoretical guarantees, it asymptotically performs at least as well as the best model in its library, and has demonstrated superior results in domains such as spatial prediction and network security (Davies & Van Der Laan, 2016).
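The core Super Learner recipe can be sketched in Python with scikit-learn and SciPy: (1) generate V-fold out-of-fold predictions for every base learner (the "level-one" data), (2) fit a meta-learner on those predictions, and (3) refit the base learners on the full sample. In this sketch the meta-learner is non-negative least squares with weights rescaled to sum to one, a common choice in the Super Learner literature rather than the KNN meta-learner adopted in this study, and the shuffled folds assume exchangeable observations; `fit_super_learner` and `predict_super_learner` are hypothetical helpers.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.base import clone
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor


def fit_super_learner(learners, X, y, n_folds=5, seed=0):
    """Return refitted base learners and convex weights from NNLS stacking."""
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Level-one data: column m holds learner m's out-of-fold predictions.
    Z = np.column_stack(
        [cross_val_predict(clone(m), X, y, cv=folds) for m in learners]
    )
    w, _ = nnls(Z, y)  # non-negative weights minimizing ||Z w - y||
    w = w / w.sum() if w.sum() > 0 else np.full(len(learners), 1 / len(learners))
    fitted = [clone(m).fit(X, y) for m in learners]
    return fitted, w


def predict_super_learner(fitted, w, X):
    """Weighted combination of the refitted base learners' predictions."""
    return np.column_stack([m.predict(X) for m in fitted]) @ w


rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
y = 3 * X[:, 0] + 5 + 0.1 * rng.normal(size=300)
library = [LinearRegression(), KNeighborsRegressor(n_neighbors=10), DummyRegressor()]
fitted, weights = fit_super_learner(library, X, y)
print(np.round(weights, 3))  # weight concentrates on the linear model
```

Because the level-one predictions come from held-out folds, the weights reward learners that generalize rather than those that merely fit the training data; for time-ordered panels, the shuffled folds would need to be replaced by forward-only splits.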

Step 2: Model Selection Process

The selection of ML algorithms was performed using 10-fold cross-validation and evaluated using RMSE, MAE, and R². Among the candidates, ET, CAT, and SVR demonstrated the strongest and most consistent performance while also providing complementary modeling strengths, making them suitable base learners for the Super Learner framework. KNN was selected as the meta-learner due to its relatively strong standalone predictive performance and its ability to capture nonlinear relationships when integrating base-model outputs. Unlike linear stacking approaches, KNN combines base-model predictions based on similarity in prediction space, allowing the ensemble to adapt to protocol-specific and regime-dependent valuation patterns. This property is particularly relevant in DeFi markets, where valuation relationships vary across protocol types and market conditions. Optuna was employed to optimize the hyperparameters of all base learners as well as those of the KNN meta-learner. This architecture enables the framework to leverage the complementary strengths of heterogeneous algorithms, thereby producing more stable and accurate DeFi valuation estimates. The following section presents the individual base learners, the meta-learner structure, and the associated Optuna optimization procedure.

3.3.1. Extremely Randomized Trees (ETs)

The ET algorithm is an ensemble method that extends RF by introducing additional randomness, which increases model diversity and mitigates overfitting. It grows multiple decision trees. At each node, a random subset of features is considered, and a cut-point is drawn at random for each of these features. The best of the resulting random candidate splits is retained. Results are aggregated by averaging (regression) or majority voting (classification) (González et al., 2020; Schmid et al., 2023). Its main inputs are the training data and hyperparameters such as the number of trees, maximum depth, minimum samples per leaf, and number of features considered per split. Its outputs comprise predictions and feature importance scores. Its ability to reduce variance while maintaining predictive accuracy makes it effective for financial tasks such as DeFi valuation.
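The randomized node split that distinguishes ET from RF can be sketched in Python under two assumptions: splits are scored by reduction in squared error, and one uniform cut-point is drawn per candidate feature. The function names are hypothetical. The sketch illustrates the idea only and does not reproduce scikit-learn's `ExtraTreesRegressor` internals.

```python
import random

# Illustrative sketch; function names are hypothetical.


def sum_squared_error(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def extra_trees_split(X, y, n_candidate_features, rng):
    """Choose a node split in the Extremely Randomized Trees style.

    For each of `n_candidate_features` randomly chosen features, draw ONE cut-point
    uniformly between that feature's min and max in the node. Keep the candidate
    with the largest reduction in squared error. Returns (score, feature, threshold),
    or None if no sampled feature can split the node.
    """
    n_features = len(X[0])
    best = None
    for feature in rng.sample(range(n_features), n_candidate_features):
        column = [row[feature] for row in X]
        low, high = min(column), max(column)
        if low == high:
            continue  # a constant feature cannot split the node
        threshold = rng.uniform(low, high)  # random cut-point, not an optimized one
        left = [t for x, t in zip(column, y) if x <= threshold]
        right = [t for x, t in zip(column, y) if x > threshold]
        if not left or not right:
            continue  # guard against a degenerate draw at the upper bound
        score = sum_squared_error(y) - sum_squared_error(left) - sum_squared_error(right)
        if best is None or score > best[0]:
            best = (score, feature, threshold)
    return best


X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second feature is constant
y = [0.0, 0.0, 10.0, 10.0]
print(extra_trees_split(X, y, n_candidate_features=2, rng=random.Random(0)))
```

Because each cut-point is random rather than optimized, individual trees are weaker but less correlated with one another. Averaging many such trees reduces variance.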

3.3.2. Categorical Gradient Boosting (CAT)

CAT, developed by Dorogush et al. (2018), is a gradient boosting algorithm aimed at improving both predictive performance and computational efficiency. It builds balanced (oblivious) trees, in which the same split condition is applied to every node at a given depth. This level-wise structure speeds up training and helps reduce overfitting. Conventional boosting methods compute each example's gradient with a model that has already been fitted to that example. CAT instead applies random permutations to the training data (ordered boosting), so that the gradient for each instance is estimated from a model built only on the instances preceding it in the permutation. Combined with a novel method for calculating leaf values during tree construction, this addresses the biased gradient (prediction shift) problem often observed in standard boosting algorithms (Lee et al., 2023).
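The ordering principle behind CAT's permutation scheme can be shown with a toy Python example. Here a smoothed running mean stands in for the boosted trees, which is an assumption made purely for illustration. Each example's residual is computed from a "model" that has only seen the examples placed before it in the permutation. This is not CatBoost's implementation, and the function name is hypothetical.

```python
import random

# Toy illustration of ordered residuals; the function name is hypothetical.


def ordered_residuals(y, permutation, prior, prior_weight=1.0):
    """Residuals computed in permutation order, without target leakage.

    Each example's prediction uses only the examples placed BEFORE it in
    `permutation`. The stand-in model is a running mean of their targets,
    smoothed toward `prior`. An example's residual therefore never depends
    on its own target value.
    """
    residuals = [0.0] * len(y)
    seen_sum, seen_count = 0.0, 0
    for idx in permutation:
        prediction = (seen_sum + prior_weight * prior) / (seen_count + prior_weight)
        residuals[idx] = y[idx] - prediction
        seen_sum += y[idx]  # only now does this example enter the "training set"
        seen_count += 1
    return residuals


y = [1.0, 3.0, 5.0]
print(ordered_residuals(y, permutation=[2, 0, 1], prior=0.0))  # [-1.5, 1.0, 5.0]

# CatBoost draws random permutations; a seeded shuffle mimics that step here.
perm = list(range(len(y)))
random.Random(42).shuffle(perm)
print(ordered_residuals(y, perm, prior=sum(y) / len(y)))
```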

3.3.3. Support Vector Regression (SVR)

SVR is a regression technique grounded in statistical learning theory and is well suited for modeling nonlinear relationships. It operates by applying kernel-based support vector machines to project the inputs into a higher-dimensional feature space, enabling the algorithm to identify a hyperplane that best approximates the target function (Ince & Trafalis, 2008; C. Zhang et al., 2020).

Its optimization problem can be expressed as:

\min_{w,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i = 1}^{N} (\xi_{i} + \xi_{i}^{*})

(1)

Subject to:

y_{i} - f (x_{i}) \leq \varepsilon + \xi_{i}^{*}

(2)

f (x_{i}) - y_{i} \leq \varepsilon + \xi_{i}

(3)

\xi_{i}, \xi_{i}^{*} \geq 0, \quad i = 1, \ldots, N

(4)

Here, f(x) = w^{\top} \phi(x) + b is the regression function, with \phi(\cdot) the feature mapping implied by the kernel. C acts as a regularization parameter that controls the penalty applied to prediction errors, while the slack variables \xi_{i} and \xi_{i}^{*} quantify deviations beyond the ε-insensitive boundary for each observation. The observations lying on or outside this ε-tube, which define the model, are known as support vectors (Kim & Sohn, 2010). SVR has demonstrated strong predictive performance, achieving high accuracy in stock market forecasting (Chhajer et al., 2022; Dash et al., 2023; C. Zhang et al., 2020). It has also outperformed logistic regression, ANN, and RF in predicting cryptocurrency price movements (Akyildirim et al., 2021).
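The role of ε and C in Equations (1)–(4) can be made concrete with a short Python evaluation of the primal objective. The sketch assumes a linear regression function f(x) = w·x + b with no kernel. It evaluates the objective for a given (w, b) but does not solve the optimization problem. The function names are hypothetical.

```python
# Illustrative sketch for a linear (no-kernel) SVR; function names are hypothetical.


def epsilon_insensitive_loss(residual, epsilon):
    """Zero inside the epsilon-tube, linear outside it: max(0, |r| - epsilon)."""
    return max(0.0, abs(residual) - epsilon)


def linear_svr_objective(w, b, X, y, C, epsilon):
    """Primal SVR objective (1) for f(x) = w.x + b at a given (w, b).

    For fixed w and b, the smallest slacks satisfying constraints (2)-(4) make
    xi_i + xi_i* equal to the epsilon-insensitive loss of observation i, so the
    objective can be evaluated directly from the residuals.
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    slack_total = 0.0
    for row, target in zip(X, y):
        f_x = sum(wj * xj for wj, xj in zip(w, row)) + b
        slack_total += epsilon_insensitive_loss(target - f_x, epsilon)
    return regularizer + C * slack_total


X = [[1.0], [2.0], [3.0]]
y = [2.05, 4.5, 5.0]  # residuals under w=[2], b=0: 0.05 (inside tube), 0.5, -1.0
print(linear_svr_objective([2.0], 0.0, X, y, C=1.0, epsilon=0.1))  # approx. 3.3
```

Raising C increases the penalty on points outside the tube relative to the flatness term. Widening ε enlarges the region where errors cost nothing.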

3.3.4. K-Nearest Neighbors (KNNs)

The KNN algorithm estimates an outcome by locating the K most similar data points to a given input and computing the average of their observed values. It is based on the principle that observations with comparable characteristics tend to exhibit similar outcomes, making KNN a straightforward and effective regression method. The prediction for a new point z is defined as:

\hat{y} = \frac{1}{K} \sum_{j \in N (z)} y_{j}

where \hat{y} is the predicted value for z, y_{j} is the actual value of the j-th nearest neighbor, and N (z) represents the set of the K nearest neighbors of z. A small value of K makes the model highly sensitive to noise and increases the risk of overfitting, whereas a larger K produces smoother estimates but may overlook important local variations. KNN performs effectively when clear structural patterns exist in the data, as it infers outcomes by averaging similar historical states.
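The KNN prediction rule above, together with the Minkowski distance and distance weighting that the study later tunes with Optuna, can be sketched in plain Python. The inverse-distance variant and its exact-match convention are common choices assumed here for illustration. The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.


def minkowski(a, b, p=2.0):
    """Minkowski distance; p=2 gives Euclidean distance, p=1 Manhattan distance."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def knn_regress(X_train, y_train, z, k, p=2.0, weights="uniform"):
    """Predict for point z from its k nearest training points.

    weights="uniform" is the plain average given by the KNN equation.
    weights="distance" is the common inverse-distance variant, in which
    closer neighbors count more.
    """
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")
    neighbors = sorted(
        ((minkowski(x, z, p), target) for x, target in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )[:k]
    if weights == "uniform":
        return sum(target for _, target in neighbors) / len(neighbors)
    exact = [target for dist, target in neighbors if dist == 0.0]
    if exact:  # an exact match would get infinite weight; use the match(es) alone
        return sum(exact) / len(exact)
    inverse = [1.0 / dist for dist, _ in neighbors]
    return sum(w * target for w, (_, target) in zip(inverse, neighbors)) / sum(inverse)


X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 10.0]
print(knn_regress(X_train, y_train, [1.2], k=3))  # 1.0 (mean of 1.0, 2.0 and 0.0)
print(knn_regress(X_train, y_train, [1.2], k=3, weights="distance"))
```

When KNN is used as a meta-learner, the rows of `X_train` would hold base-learner predictions rather than the original features.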

As a meta-learner, KNN aggregates the predictions of the base learners by identifying observations with similar prediction profiles and averaging their realized outcomes. This similarity-based aggregation allows the Super Learner to flexibly combine heterogeneous base-model outputs without imposing linearity or additive structure. Unlike linear stacking approaches, KNN operates in the prediction space rather than the original feature space, enabling the ensemble to adapt locally to regime-dependent and protocol-specific valuation patterns. This property is particularly relevant in DeFi markets, where valuation relationships vary across protocol types, liquidity conditions, and market regimes. At the same time, KNN introduces limited additional model complexity at the meta-learning stage, reducing overfitting risk while preserving nonlinear expressiveness. These characteristics make KNN a suitable and parsimonious choice for stacking heterogeneous learners in high-variance, nonstationary financial environments such as decentralized finance.

3.3.5. Optuna Optimization

Given the high dimensionality and nonlinear structure of the proposed ensemble, careful hyperparameter tuning is essential for stable and reproducible performance. Manual tuning or exhaustive grid search becomes computationally infeasible as the number of learners and interaction effects increases. Optuna is therefore employed as a unified optimization framework to systematically identify well-regularized configurations for both the base learners and the meta-learner.

Optuna is a modern hyperparameter optimization framework designed to efficiently explore high-dimensional search spaces and identify high-performing parameter combinations (Akiba et al., 2019). Optuna uses advanced sampling strategies, including the Tree-structured Parzen Estimator (TPE), which targets promising regions of the search space while reducing unnecessary computations. It also integrates pruning mechanisms that terminate poorly performing trials early, thereby improving overall search efficiency (Srinivas & Katarya, 2022).
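Pruning can be illustrated with a simplified median-stopping rule. The rule is similar in spirit to Optuna's `MedianPruner`, which stops a trial whose best intermediate result is worse than the median of earlier trials at the same step. The sketch below is written from scratch and is not Optuna's code. The function name and the `n_startup_trials` threshold are illustrative assumptions, and lower values (e.g., RMSE) are taken to be better.

```python
import statistics

# Illustrative sketch, not Optuna's implementation; the function name is hypothetical.


def should_prune(current_history, completed_histories, n_startup_trials=2):
    """Simplified median-stopping rule for a metric that is minimized.

    current_history: intermediate values (e.g., RMSE after each fold) of the running trial.
    completed_histories: intermediate-value lists of earlier finished trials.
    The trial is stopped if its best value so far is worse (higher) than the
    median of earlier trials' values at the same step.
    """
    step = len(current_history) - 1
    best_so_far = min(current_history)
    peers = [h[step] for h in completed_histories if len(h) > step]
    if len(peers) < n_startup_trials:
        return False  # too little history to judge the trial fairly
    return best_so_far > statistics.median(peers)


finished = [[0.30, 0.20, 0.15], [0.28, 0.22, 0.18], [0.35, 0.25, 0.19]]
print(should_prune([0.50, 0.40], finished))  # True: 0.40 > median(0.20, 0.22, 0.25)
print(should_prune([0.25, 0.21], finished))  # False: 0.21 < 0.22
```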

Optuna has shown strong performance across numerous domains, including financial forecasting, cybersecurity analytics, and geoscience modeling. It has also been reported to outperform classical approaches in combined algorithm selection and hyperparameter optimization (CASH) tasks (Ali & Yahia, 2025; Almarzooq & bin Waheed, 2024; Parekh et al., 2024). These advantages make it well suited to tuning the base learners of the stacked ensemble (ET, SVR, and CAT) as well as the KNN meta-learner.

The Optuna optimization workflow, summarized conceptually in Figure 3, follows three main stages (Priyadarshi & Kumar, 2024):

1. Define the Objective Function

In this stage, each learner in the stacked ensemble (the base learners and the KNN meta-learner) is trained on preprocessed DeFi protocol data using a trial-specific configuration. Optuna samples hyperparameters such as the number of trees and maximum depth for ET and CAT, the learning rate for CAT, and the number of neighbors and distance metric for KNN. It also samples other parameters, such as minimum samples per split, subsampling ratios, and regularization terms. Each sampled configuration is then evaluated on the validation folds using RMSE as the metric to be minimized. RMSE was chosen as the primary objective because it is sensitive to large deviations, which are especially consequential in financial valuation contexts.

2. Create a Study Object

An Optuna Study object manages the optimization process by handling trial execution, logging results, and guiding the search toward promising hyperparameter configurations. The TPE sampler is used to iteratively refine the hyperparameter distributions and focus on regions associated with lower validation error.

3. Run the Optimization

The optimization is carried out with the study.optimize() function, which takes the objective function and the number of trials. Optuna then explores three groups of parameters. The first group is model-specific structural parameters, such as tree depth, number of estimators, and number of neighbors. The second is regularization and learning parameters, including learning rate and split criteria. The third is computational parameters, such as minimum samples per split, subsampling strategies, and leaf constraints. Through adaptive sampling and pruning, Optuna keeps the tuning process computationally efficient while converging toward well-performing model configurations.
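The three stages can be mirrored in a small, dependency-free Python sketch. It uses a toy forecaster with a single hyperparameter (a moving-average window) and scores configurations by RMSE over expanding-window folds. Uniform random sampling stands in for TPE, so the search loop is deliberately not Optuna. In Optuna itself, the study in stage 2 is created with `optuna.create_study(direction="minimize")`, whose default sampler for single-objective studies is TPE. Stage 3 then calls `study.optimize(objective, n_trials=...)`, and the objective draws hyperparameters through methods such as `trial.suggest_int`. All function names below are hypothetical.

```python
import math
import random

# Dependency-free stand-in for the objective/study/optimize workflow.
# Random sampling replaces TPE; all names are hypothetical.


def expanding_folds(n_periods, min_train):
    """Yield (training periods, test period): train on 0..t-1, test on t."""
    for t in range(min_train, n_periods):
        yield range(t), t


def moving_average_forecast(history, window):
    """Toy model with one hyperparameter: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def objective(series, window, min_train=3):
    """Stage 1: score one configuration by RMSE over expanding-window folds."""
    squared_errors = []
    for train_periods, test_t in expanding_folds(len(series), min_train):
        history = [series[i] for i in train_periods]
        forecast = moving_average_forecast(history, window)
        squared_errors.append((series[test_t] - forecast) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def run_search(series, n_trials, seed=0):
    """Stages 2-3 stand-in: record every trial and keep the best one.

    Configurations are drawn uniformly at random here; a TPE sampler would
    instead concentrate draws in regions that produced low RMSE so far.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {"window": rng.randint(1, 5)}
        trials.append((objective(series, params["window"]), params))
    best_value, best_params = min(trials, key=lambda trial: trial[0])
    return best_params, best_value


series = [float(t) for t in range(10)]  # a linear trend: the last value is the best forecast
print(objective(series, window=1))  # 1.0
print(run_search(series, n_trials=20))
```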

Step 3: Model Development

To ensure realistic forecasting conditions, this study employs an expanding-window panel time-series cross-validation (CV) scheme rather than random or shuffled splits. In each fold, models are trained exclusively on past observations and evaluated on subsequent periods, preserving temporal ordering and preventing look-ahead bias. This setup mirrors real-time valuation scenarios in which future protocol values must be predicted using only historically available information—a consideration that is especially important in volatile and rapidly evolving DeFi markets.
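An expanding-window split for panel data can be sketched in Python under the following assumptions: each row is a protocol-period observation with a sortable period label, and each test block covers whole periods. The parameter names `min_train_periods` and `test_size` are hypothetical. scikit-learn's `TimeSeriesSplit` produces similar expanding splits, but it splits by row position, so on its own it would not guarantee that all protocols observed in the same period fall on the same side of a split.

```python
# Illustrative sketch; names are hypothetical.


def expanding_window_splits(periods, min_train_periods, test_size=1):
    """Expanding-window splits for panel data.

    periods: the time period of each row (several protocols can share a period).
    Each fold trains on all rows from the earliest periods and tests on the next
    `test_size` periods. Every training row therefore precedes every test row in
    time, and all protocols observed in the same period land on the same side.
    """
    unique = sorted(set(periods))
    for start in range(min_train_periods, len(unique), test_size):
        train_periods = set(unique[:start])
        test_periods = set(unique[start:start + test_size])
        train_idx = [i for i, p in enumerate(periods) if p in train_periods]
        test_idx = [i for i, p in enumerate(periods) if p in test_periods]
        yield train_idx, test_idx


# Two hypothetical protocols observed over four periods (rows are protocol-period pairs).
periods = [1, 1, 2, 2, 3, 3, 4, 4]
for train_idx, test_idx in expanding_window_splits(periods, min_train_periods=2):
    print(train_idx, "->", test_idx)
# [0, 1, 2, 3] -> [4, 5]
# [0, 1, 2, 3, 4, 5] -> [6, 7]
```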

As shown in Figure 4, the Super Learner model integrates three Optuna-tuned base learners (ET, SVR, and CAT) with a KNN meta-learner whose hyperparameters were also optimized using Optuna. Optuna was employed to automatically search the hyperparameter space of each base learner across the cross-validation folds, minimizing the average validation RMSE to identify the best configuration. After tuning, the base learners were retrained on the entire leakage-free dataset using these optimal parameters before final ensemble training. This process was designed so that each base learner contributes complementary predictive strengths, enhancing the ensemble's robustness.

To construct the meta-learner, a strict panel time-series CV was followed to prevent data leakage. Each base model generated predictions on the corresponding hold-out segment after preprocessing and standardization. These out-of-fold (OOF) predictions were collected across all folds to form a new dataset representing the base learners’ predictive behavior. The KNN meta-learner was then trained on this OOF matrix, with Optuna tuning the number of neighbors, distance weighting scheme, and Minkowski metric. The best KNN configuration—determined by minimizing RMSE across folds—was trained on the complete OOF dataset to form the final ensemble. The predictive performance of the Super Learner was evaluated by comparing its OOF-based stacked predictions with those of other compared algorithms using RMSE, MAE, and R². Final hyperparameter settings for all models are summarized in Table 2. Statistical significance was assessed with Wilcoxon signed-rank tests. This structured approach ensures a robust, generalizable ensemble by combining complementary models while rigorously avoiding data leakage.
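The out-of-fold (OOF) stacking procedure can be sketched in plain Python under stated assumptions. Two toy base learners, a training-mean predictor and a one-feature least-squares line, stand in for the Optuna-tuned ET, SVR, and CAT. The KNN meta-learner uses Euclidean distance with uniform weights, and the standardization step is omitted. All names are hypothetical, and this is not the author's code.

```python
import math

# Illustrative OOF stacking sketch; all names are hypothetical.


def fit_mean(X, y):
    """Toy base learner 1: always predicts the training mean."""
    mean = sum(y) / len(y)
    return lambda row: mean


def fit_line(X, y):
    """Toy base learner 2: least-squares line on the first feature."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]


BASE_LEARNERS = [fit_mean, fit_line]  # stand-ins for the tuned ET, SVR and CAT


def expanding_splits(periods, min_train_periods):
    """Train on all earlier periods, predict the next one."""
    unique = sorted(set(periods))
    for cut in unique[min_train_periods:]:
        yield ([i for i, p in enumerate(periods) if p < cut],
               [i for i, p in enumerate(periods) if p == cut])


def build_oof(X, y, periods, min_train_periods):
    """Collect out-of-fold base-learner predictions as meta-features."""
    meta_X, meta_y = [], []
    for train, test in expanding_splits(periods, min_train_periods):
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in BASE_LEARNERS]
        for i in test:  # each row is predicted only by models that never saw it
            meta_X.append([model(X[i]) for model in models])
            meta_y.append(y[i])
    return meta_X, meta_y


def fit_super_learner(X, y, periods, min_train_periods, k):
    """Fit a KNN meta-learner on the OOF matrix; refit base learners on all rows."""
    meta_X, meta_y = build_oof(X, y, periods, min_train_periods)
    final_models = [fit(X, y) for fit in BASE_LEARNERS]

    def predict(row):
        z = [model(row) for model in final_models]  # a point in prediction space
        nearest = sorted(zip(meta_X, meta_y), key=lambda pair: math.dist(pair[0], z))[:k]
        return sum(target for _, target in nearest) / len(nearest)

    return predict


periods = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
X = [[1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0], [5.5]]
y = [2.0 * row[0] for row in X]
meta_X, meta_y = build_oof(X, y, periods, min_train_periods=2)
print(len(meta_X))  # 6: periods 1-2 serve only as the first training window
predict = fit_super_learner(X, y, periods, min_train_periods=2, k=3)
print(predict([3.2]))
```

Under an expanding window, the earliest `min_train_periods` periods never receive out-of-fold predictions, because they only ever serve as training data.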

Step 4: Performance Metrics

The proposed model was evaluated using three metrics: RMSE, MAE, and R² (Equations (5)–(7)) (Erdebilli & Devrim-İçtenbaş, 2022; Nguyen et al., 2021).

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - x_{i})}^{2}}

(5)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - x_{i}|

(6)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - x_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(7)

Here, x_{i} represents the forecasted value for the i-th observation, y_{i} denotes the actual value for the i-th observation, \bar{y} is the mean of the actual values, and n is the total number of observations. Models with higher R^{2} values capture more of the variability in the observed data and are therefore regarded as more effective predictors (Erdebilli & Devrim-İçtenbaş, 2022). Together, these metrics provide a consistent basis for comparing statistical, ML, and ensemble models.
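The three evaluation metrics can be computed with a few lines of Python. R² uses the conventional definition, in which the denominator is centered on the mean of the actual values (the same convention as scikit-learn's `r2_score`). The function names are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.


def rmse(actual, forecast):
    """Root mean squared error, Equation (5)."""
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error, Equation (6)."""
    return sum(abs(y - x) for y, x in zip(actual, forecast)) / len(actual)


def r_squared(actual, forecast):
    """Coefficient of determination, centered on the mean of the ACTUAL values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - x) ** 2 for y, x in zip(actual, forecast))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 5.0]
print(rmse(actual, forecast), mae(actual, forecast), r_squared(actual, forecast))  # 0.5 0.25 0.8
```

RMSE penalizes the single large miss more heavily than MAE does, which is why the study uses RMSE as its tuning objective.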

Step 5: Comparative Performance Analysis

To evaluate the effectiveness of the Super Learner ensemble, its performance is benchmarked against:

Its base learners (ET, CAT, and SVR);
Other tree-based and boosting homogeneous ensemble methods (RF, Bagging, LGBM, AdaBoost);
Average and weighted-voting heterogeneous ensembles composed of the same constituent algorithms;
Classical regressors (OLS, Ridge, Lasso, ElasticNet, DT, KNN).

This comparative analysis assesses whether the Super Learner architecture consistently outperforms individual learners and other ensemble techniques. Wilcoxon signed-rank tests on the RMSE values are used to determine whether the improvements over competing models are statistically significant.
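The significance test can be sketched in pure Python as an exact two-sided Wilcoxon signed-rank test for paired values, such as the RMSE of two models on the same validation folds. Zero differences are dropped, and tied absolute differences receive their average rank. The function name and the example RMSE values are hypothetical and are not results from this study. In practice, `scipy.stats.wilcoxon` provides a maintained implementation.

```python
# Illustrative sketch; the function name and example values are hypothetical.


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples a and b.

    Zero differences are dropped and tied |differences| receive average ranks.
    Returns (W_plus, p_value), where W_plus is the rank sum of positive differences.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to runs of tied absolute differences
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j + 2) / 2.0
        i = j + 1
    w_plus = float(sum(r for r, di in zip(ranks, d) if di > 0))

    # Exact null distribution: under H0 each rank is positive with probability 1/2.
    # Doubling the ranks makes them integers (average ranks can end in .5).
    doubled = [int(round(2 * r)) for r in ranks]
    counts = [0] * (sum(doubled) + 1)
    counts[0] = 1
    for r in doubled:  # subset-sum counting over all 2**n sign patterns
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    observed = int(round(2 * w_plus))
    total = 2 ** n
    p_lower = sum(counts[: observed + 1]) / total
    p_upper = sum(counts[observed:]) / total
    return w_plus, min(1.0, 2 * min(p_lower, p_upper))


super_learner = [0.080, 0.090, 0.070, 0.085, 0.095, 0.075, 0.088, 0.082]
competitor = [0.110, 0.125, 0.100, 0.118, 0.130, 0.105, 0.121, 0.112]
print(wilcoxon_signed_rank(super_learner, competitor))  # (0.0, 0.0078125)
```

With n nonzero paired differences, the smallest attainable exact two-sided p-value is 2/2^n. At least eight paired values are therefore needed before p < 0.01 becomes attainable.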

Step 6: Model Explanation

To enhance interpretability, SHAP analysis was performed using TreeExplainer for the tree-based base learners and KernelExplainer for the remaining models, including the KNN meta-learner. SHAP quantifies each feature's contribution to predictions at both the global and the individual-observation level. Out-of-fold predictions from the base learners were generated with leakage-free panel time-series cross-validation and served as training data for the tuned KNN meta-learner. The base learners were then retrained on the full dataset to obtain the final models for SHAP explanation. A representative holdout set from the last temporal split was used, with consistent feature encoding and standardization. SHAP values were computed separately for each Optuna-tuned base learner using its own prediction function and input feature space. The base learners' predictions on the holdout set served as inputs for the meta-learner's SHAP computation. Ensemble-level interpretability was obtained by aggregating these model-specific SHAP values according to each learner's contribution to the stacked prediction. The aggregated values therefore reflect how individual features influence the final Super Learner output through the meta-learning structure.

Base learner SHAP values—initially calculated on encoded features—were aggregated back to the original numeric features and dummy variables. Meta-learner SHAP contributions were then mapped onto these aggregated values, approximating each original feature’s overall influence on the ensemble’s output. The final SHAP values support both global feature importance analysis and detailed per-observation explanations. Visualizations like SHAP summary plots further clarify how each feature drives the Super Learner’s predictions, improving transparency and trust (Lundberg et al., 2020).
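The two aggregation steps can be sketched with plain dictionaries of SHAP values. The first step sums encoded columns back to original features. The second step forms a weighted combination across base learners, followed by global mean |SHAP| importance. The study does not specify its exact mapping or weights, so the column names (e.g., `DEX_0`, `DEX_1`), the learner weights, and the function names below are hypothetical assumptions.

```python
from collections import defaultdict

# Illustrative sketch; column names, weights and function names are hypothetical.


def aggregate_encoded(shap_row, column_to_feature):
    """Sum SHAP values of encoded columns that map to the same original feature.

    Columns missing from the mapping are treated as original features already.
    Summing the signed values (before taking absolute values) preserves SHAP additivity.
    """
    out = defaultdict(float)
    for column, value in shap_row.items():
        out[column_to_feature.get(column, column)] += value
    return dict(out)


def combine_learners(per_learner_rows, learner_weights):
    """Weighted sum of per-learner SHAP rows for one observation.

    learner_weights is an assumed input, e.g., each base learner's share of the
    meta-learner's attribution for this observation.
    """
    combined = defaultdict(float)
    for learner, row in per_learner_rows.items():
        for feature, value in row.items():
            combined[feature] += learner_weights[learner] * value
    return dict(combined)


def mean_abs_shap(rows):
    """Global importance: mean |SHAP| per feature over observations."""
    totals = defaultdict(float)
    for row in rows:
        for feature, value in row.items():
            totals[feature] += abs(value)
    return {feature: total / len(rows) for feature, total in totals.items()}


mapping = {"DEX_0": "DEX", "DEX_1": "DEX"}  # hypothetical two-column encoding of a dummy
row_et = aggregate_encoded({"GMV": 0.06, "DEX_1": 0.03, "DEX_0": -0.01}, mapping)
row_svr = aggregate_encoded({"GMV": 0.04, "TVL": 0.02}, mapping)
combined = combine_learners({"ET": row_et, "SVR": row_svr}, {"ET": 0.6, "SVR": 0.4})
print(combined)  # approx. GMV 0.052, DEX 0.012, TVL 0.008
```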

4. Experimental Results and Discussion

This section first evaluates the predictive performance of the proposed Super Learner by comparing it with a range of regression and ML alternatives using multiple evaluation metrics. Additionally, SHAP values are employed to provide a deeper understanding of the model’s internal mechanisms and the influence of key predictors on DeFi valuation. The section concludes with a discussion of the main results, their implications, and how they relate to prior research.

4.1. Analysis of Experimental Results

To assess the effectiveness of the Super Learner model in forecasting DeFi valuations, its performance is compared against a range of alternative regressors. These benchmark models fall into three groups: (1) the model’s individual base learners; (2) alternative ensemble approaches; (3) regression-based and standalone ML models.

4.1.1. Performance Comparison Between the Super Learner and Its Base Learners

As shown in Table 3 and Figure 5, the Super Learner consistently outperforms each of its base learners across all regression metrics. The ensemble achieves the lowest RMSE (0.0854 ± 0.0259) and MAE (0.0645 ± 0.0231), indicating substantially higher predictive accuracy. Furthermore, the Super Learner attains the highest R² value (0.9731 ± 0.0202), capturing considerably more variance in protocol valuation than ET, CAT, or SVR.

Compared with the strongest base learner, ET, which reports an RMSE of 0.1165 ± 0.0519 and an R² of 0.9424 ± 0.0628, the Super Learner reduces RMSE by approximately 27% and raises R² by about 3.1 percentage points. CAT and SVR perform noticeably worse, with RMSE values of 0.1318 ± 0.0603 and 0.1572 ± 0.0649, and R² values of 0.9235 ± 0.0789 and 0.8960 ± 0.1001, respectively. These numerical gaps further underscore the ensemble's advantage over its constituent models. The statistical significance tests reported in the final column reinforce these findings: Wilcoxon signed-rank tests indicate that the Super Learner significantly outperforms all base learners (p < 0.01). Overall, these results demonstrate that combining heterogeneous learners through stacking yields a more accurate, stable, and generalizable model than any individual constituent model.

4.1.2. Performance Comparison Between the Super Learner and Alternative Ensemble Strategies

Table 4 and Figure 6 compare the Super Learner with alternative ensemble strategies, including Weighted Voting (WV), Average Voting (AV), Bagging, and other homogeneous (single-algorithm) ensembles. The Super Learner outperforms all competing ensembles across every regression metric, achieving the lowest RMSE (0.0854 ± 0.0259) and MAE (0.0645 ± 0.0231). It also yields the highest R² (0.9731 ± 0.0202), demonstrating superior explanatory power relative to all other approaches.

Compared with the second-best heterogeneous ensemble, the Weighted Voting Ensemble, the Super Learner delivers a 31% reduction in RMSE (0.0854 vs. 0.1235) and a 36% reduction in MAE (0.0645 vs. 0.1012). Its R² is 4% higher (0.9731 vs. 0.9335), confirming consistent improvements in both accuracy and variance explanation. Lower-performing ensembles, such as Bagging and LGBM, show notably higher errors and reduced explanatory power. These differences are visually reinforced in RMSE and R² bar plots, where the Super Learner is clearly separated from all alternatives. Wilcoxon signed-rank tests indicate that the Super Learner significantly outperforms all alternative ensembles (p < 0.01), confirming that the observed gains are statistically meaningful. These results highlight the advantage of the Super Learner’s meta-learning mechanism over fixed-weight averaging or single-ensemble approaches.
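The relative changes quoted above can be recomputed from the reported means. The short Python check below uses only values stated in Sections 4.1.1 and 4.1.2. The helper name `relative_change` is hypothetical.

```python
# Illustrative arithmetic check; the helper name is hypothetical.


def relative_change(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# Super Learner vs. the Weighted Voting ensemble (Section 4.1.2 means).
print(round(-relative_change(0.0854, 0.1235)))  # 31 -> RMSE reduction (%)
print(round(-relative_change(0.0645, 0.1012)))  # 36 -> MAE reduction (%)
print(round(relative_change(0.9731, 0.9335)))   # 4  -> relative R² gain (%)

# Super Learner vs. ET, the strongest base learner (Section 4.1.1 means).
print(round(-relative_change(0.0854, 0.1165), 1))  # 26.7 -> RMSE reduction (%)
print(round(relative_change(0.9731, 0.9424), 1))   # 3.3  -> relative R² gain (%)
print(round((0.9731 - 0.9424) * 100, 1))           # 3.1  -> gain in percentage points
```

The check separates a relative gain in R² from a gain in percentage points. The two are easy to conflate when R² is already close to 1.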

4.1.3. Performance Comparison Between the Super Learner and Statistical and ML Models

Table 5 shows that all individual statistical and ML models perform substantially worse than the Super Learner across every regression metric. Among linear models, Ridge and ElasticNet are the strongest competitors, achieving RMSE values of 0.1757 ± 0.0901 and 0.1815 ± 0.0876, MAE of 0.1345 ± 0.0757 and 0.1394 ± 0.0674, and R² of 0.8561 ± 0.1455 and 0.8556 ± 0.1210, respectively. Although these models perform relatively well, their errors remain considerably higher, and their explained variance substantially lower, than those of the Super Learner.

Other linear models, including Lasso and OLS, also show limited predictive power (RMSE ≈ 0.188–0.190, R² ≈ 0.848), reflecting the difficulty of capturing nonlinear patterns in protocol valuation. Among the nonlinear baselines, KNN (RMSE = 0.1599 ± 0.0796, R² = 0.8811 ± 0.1242) performs better than all linear models. By contrast, the Decision Tree (RMSE = 0.2025 ± 0.0873, R² = 0.8231 ± 0.1463) records the highest RMSE among these baselines. Both remain well behind the Super Learner. Wilcoxon signed-rank tests confirm that the Super Learner significantly outperforms traditional statistical and ML models (p < 0.01), indicating that the observed improvements are robust and statistically meaningful. The RMSE and R² bar plots in Figure 7 further illustrate the clear performance gap between the Super Learner and all baseline models.

4.1.4. SHAP Analysis

The SHAP analysis provides insight into the relative contribution and directional influence of each predictor in the Super Learner's valuation model (Figure 8 and Figure 9). The global SHAP bar plot in Figure 8 indicates that GMV is the most influential feature (mean |SHAP| ≈ 0.062), followed by TVL (≈0.050) and the DEX classification indicator (≈0.039). This suggests that transaction throughput, liquidity depth, and exchange-type protocols are the strongest drivers of predicted DeFi valuation in this model. The summary plot in Figure 9 further shows that higher GMV and TVL values consistently increase predicted valuation, consistent with these variables' roles as indicators of economic scale, user engagement, and protocol robustness. Revenue- and activity-based variables exhibit moderate influence: total revenue (TR; ≈0.033) contributes positively to valuation, while protocol revenue (PR; ≈0.022) has the weakest overall effect, reflecting the limited and irregular nature of revenue distributions.

The inflation factor (INF; ≈0.034) shows mixed but predominantly negative SHAP values, indicating that higher token-supply growth tends to depress valuation through dilution. Among categorical variables, the lending-protocol indicator has a relatively small mean |SHAP| (≈0.026), with individual SHAP values concentrated near zero, suggesting minimal systematic effect relative to the baseline category. Overall, the SHAP analysis highlights that market activity (GMV), capital depth (TVL), and protocol type (DEX classification) are the primary determinants of valuation, while inflation exerts a modest downward influence and protocol revenue plays a minor role.

5. Discussion of Results

The Super Learner demonstrates consistently superior predictive performance across all evaluation metrics when compared with base learners, alternative ensemble strategies, and traditional econometric and machine learning models (Table 3, Table 4 and Table 5). Achieving an RMSE of 0.0854 ± 0.0259, an MAE of 0.0645 ± 0.0231, and an R² of 0.9731 ± 0.0202, the Super Learner substantially outperforms individual learners such as ET, CAT, and SVR, reducing RMSE by approximately 27% relative to the strongest base learner. It also delivers lower prediction errors and higher explanatory power than alternative ensemble approaches, including weighted voting, average voting, and bagging. The same holds against linear and nonlinear benchmarks such as OLS, Ridge, ElasticNet, KNN, and Decision Trees. Wilcoxon signed-rank tests confirm that these gains are statistically significant (p < 0.01). Several features of the design further support the robustness of these findings: consistent performance across expanding-window evaluation periods, strict out-of-fold stacking to eliminate information leakage, comparison against a broad set of econometric and machine-learning benchmarks, and nonparametric significance testing.

These methodological findings align with broader perspectives in the DeFi literature emphasizing the limitations of traditional econometric approaches in capturing value formation in decentralized financial systems. DeFi markets are characterized by high volatility, nonlinear dependencies, composable financial primitives, and rapidly evolving incentive structures (Kaal et al., 2024; Schär, 2021). Prior studies note that while DeFi assets exhibit distinct return profiles and diversification properties, coherent valuation frameworks remain underdeveloped (Yousaf & Yarovaya, 2022). Using the same dataset and core set of valuation variables, Metelski and Sobieraj (2022) employ a fixed-effects panel regression framework and document statistically significant relationships between valuation and variables such as TVL, GMV, revenues, and inflation; however, their model achieves a comparatively modest explanatory power (R² ≈ 0.45). This highlights the difficulty of capturing complex, nonlinear interactions and heterogeneous protocol dynamics using linear econometric specifications. In contrast, the substantially higher out-of-sample explanatory performance achieved by the Super Learner suggests that meta-learning frameworks are better suited to modeling the interaction-driven and structurally heterogeneous nature of DeFi valuations, consistent with recent evidence in crypto-asset valuation research (Xu et al., 2023).

Beyond predictive accuracy, the SHAP analyses provide economically meaningful insights into the drivers of DeFi protocol valuation. Gross merchandise volume (GMV) emerges as the most influential predictor (mean |SHAP| ≈ 0.062), followed by total value locked (TVL; ≈0.050) and decentralized exchange (DEX) classification (≈0.039). From an economic perspective, the dominance of GMV reflects the platform-based nature of DeFi protocols. Unlike TVL, which primarily captures capital commitment, GMV measures realized transaction demand and network utilization. Platform economics theory suggests that value creation in multi-sided markets is driven by transaction intensity and user interaction rather than static asset holdings. In DeFi ecosystems—particularly decentralized exchanges—higher GMV signals deeper liquidity usage, stronger network effects, and greater fee-generating capacity, which translate into more sustainable protocol valuations (Angeris et al., 2021; Schär, 2021).

TVL remains an important predictor, reflecting liquidity depth, protocol scale, and user trust, but its influence is weaker than that of GMV. This finding reinforces concerns raised in prior literature that TVL can be inflated by asset price appreciation or short-term liquidity incentives and may not fully reflect underlying economic activity or protocol health (Metelski & Sobieraj, 2022). The strong role of DEX classification further underscores the importance of protocol design, as decentralized exchanges tend to exhibit higher activity levels, stronger network effects, and more resilient value-generation mechanisms compared with other DeFi categories (Angeris et al., 2021; Werner et al., 2022).

Revenue-based variables—including total revenue (TR; ≈0.033) and protocol revenue (PR; ≈0.022)—also exert positive and economically meaningful effects on predicted valuations. These results support the distinction between liquidity provision and value capture mechanisms emphasized in prior studies. While TVL reflects the scale of deposited capital, revenue metrics represent realized economic rents generated through protocol usage and distributed across participants. Token valuation theory suggests that crypto-asset prices are more closely linked to expected cash-flow-like mechanisms and incentive structures than to raw capital accumulation alone (Cong et al., 2021; Liu & Tsyvinski, 2021). The SHAP results indicate that markets reward protocols that successfully convert transaction activity into revenue streams benefiting token holders.

In contrast, the weaker influence of inflation and lending-protocol classification, relative to GMV, TVL, and DEX classification, may reflect structural constraints rather than economic irrelevance. Token inflation introduces dilution effects that can offset usage-driven value gains, particularly during periods of declining market sentiment or excessive incentive issuance (Zetzsche et al., 2020). Similarly, lending protocols exhibit lower and more volatile valuation sensitivity due to their reliance on overcollateralization, liquidation mechanisms, and exposure to market downturns, consistent with documented fragilities in DeFi lending models (Zmaznev, 2021). These factors suggest that protocol design and incentive stability act as bounding conditions on valuation rather than primary growth drivers.

Overall, the findings indicate that DeFi valuations are shaped primarily by realized economic activity, liquidity utilization, and protocol design rather than by passive capital accumulation alone. Protocols that generate sustained transaction throughput and effectively convert usage into revenue exhibit stronger and more stable valuations, while excessive reliance on liquidity incentives or token inflation introduces fragility. By integrating high-accuracy meta-learning predictions with interpretable SHAP-based explanations, this study extends prior econometric evidence and provides a more comprehensive, economically grounded framework for understanding value formation in decentralized finance.

Beyond methodological performance, the results have direct economic relevance for key DeFi stakeholders. For investors, the superior out-of-sample accuracy of the Super Learner—combined with the dominance of activity-based drivers such as GMV and revenues—suggests that valuation signals grounded in realized protocol usage provide more reliable indicators of fundamental value than liquidity-based metrics alone. For protocol developers and designers, the findings imply that sustainable valuation growth is more closely linked to increasing transaction throughput and effective value-capture mechanisms than to short-term TVL expansion driven by incentive programs. From a policy and regulatory perspective, the results highlight that economically meaningful valuation in DeFi is tied to realized economic activity, incentive stability, and protocol design rather than to nominal liquidity measures, supporting a shift toward activity- and revenue-based indicators when assessing systemic relevance and market integrity.

6. Conclusions

This study develops a high-performing Super Learner ensemble model for predicting decentralized finance (DeFi) protocol valuations, offering substantial improvements over individual ML models, alternative ensemble techniques, and traditional econometric approaches. By integrating heterogeneous learners through a KNN meta-learner and employing strict, leakage-free expanding-window time-series validation, the model achieves the lowest RMSE and MAE and the highest R² across all evaluated models. SHAP-based interpretability analysis further reveals that economic throughput (GMV), liquidity depth (TVL), and protocol architecture (particularly DEX classification) constitute the primary drivers of DeFi valuation. Revenue-related indicators and inflation exhibit more moderate and context-dependent effects. These findings highlight the nonlinear and structural determinants of value formation in decentralized financial systems, adding nuance to and advancing prior research that often reports weaker or inconsistent relationships.

The results carry meaningful implications for practitioners, developers, and regulators operating in the DeFi ecosystem. For investors, the Super Learner offers a transparent and data-driven tool for assessing protocol fundamentals and identifying high-value opportunities based on observable characteristics such as GMV, TVL, and protocol type. For protocol designers, the findings underscore the economic importance of throughput efficiency, liquidity depth, and exchange-type mechanisms in shaping valuation outcomes. For regulators and policymakers, the interpretability of SHAP outputs provides a valuable framework for monitoring systemic risks, evaluating inflation dynamics, and understanding market concentration across protocol classes. Taken together, the study demonstrates how combining advanced ML techniques with interpretable modeling enhances transparency and supports data-informed decision-making in decentralized finance.

This study is subject to several limitations. First, the analysis relies on a fixed sample of 30 DeFi protocols observed over a specific historical period, which may limit generalizability across market regimes and newly emerging protocols. Second, the framework is based on protocol-level disclosure indicators and does not incorporate higher-frequency on-chain behavior, governance dynamics, or cross-protocol interdependencies. Finally, while the Super Learner demonstrates strong predictive performance, the analysis is correlational rather than causal, and the estimated relationships should be interpreted accordingly.

Moreover, DeFi markets evolve quickly, and shifts in liquidity incentives, protocol upgrades, and broader macroeconomic conditions may reshape valuation dynamics. These limitations and developments motivate several directions for future research. Incorporating high-frequency on-chain data could improve temporal sensitivity and reveal finer behavioral patterns. Methods for detecting structural breaks and regime shifts may better capture changing market conditions. Causal ML techniques could help distinguish causal mechanisms from correlations. Extending the framework across multi-chain ecosystems, such as Ethereum, BNB Chain, Avalanche, and Solana, would broaden its applicability. Real-time predictive systems that process continuous on-chain data are another promising avenue, with potential applications in risk management, trading, and regulatory analysis. Together, these extensions would deepen understanding of DeFi valuation dynamics and strengthen the analytical foundations of decentralized finance.

Funding

The author extends her appreciation to Prince Sattam bin Abdulaziz University for funding this research work under project number PSAU/2025/02/34358.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is available from the author upon reasonable request.

Conflicts of Interest

The author declares no conflict of interest.

References

Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019, August 4–8). Optuna: A next-generation hyperparameter optimization framework. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2623–2631), Anchorage, AK, USA. [Google Scholar]
Akyildirim, E., Goncu, A., & Sensoy, A. (2021). Prediction of cryptocurrency returns using machine learning. Annals of Operations Research, 297(1–2), 3–36. [Google Scholar] [CrossRef]
Alamsyah, A., & Salsabila, N. (2024, August 7–8). Exploring the mechanisms of decentralized finance (DeFi) using blockchain technology. 2024 3rd International Conference on Creative Communication and Innovative Technology (ICCIT) (pp. 1–8), Tangerang, Indonesia. [Google Scholar]
Ali, G., & Yahia, Z. (2025). Disclosure determinants of blockchain crowdfunding performance for sustainable smart city financing: An explainable Optuna-optimized machine learning approach. Journal of Posthumanism, 5(7), 802–826. [Google Scholar] [CrossRef]
Almarzooq, H., & bin Waheed, U. (2024). Automating hyperparameter optimization in geophysics with Optuna: A comparative study. Geophysical Prospecting, 72(5), 1778–1788. [Google Scholar] [CrossRef]
Almeida, D., Dionísio, A., Vieira, I., & Ferreira, P. (2022). Uncertainty and risk in the cryptocurrency market. Journal of Risk and Financial Management, 15(11), 532. [Google Scholar] [CrossRef]
Angeris, G., Kao, H.-T., Chiang, R., Noyes, C., & Chitra, T. (2019). An analysis of Uniswap markets. Cryptoeconomic Systems, 1(1). [Google Scholar] [CrossRef]
Angeris, G., Kao, H.-T., Chiang, R., Noyes, C., & Chitra, T. (2021). An analysis of Uniswap markets. arXiv, arXiv:1911.03380. [Google Scholar] [CrossRef]
Ante, L. (2022). The non-fungible token (NFT) market and its relationship with Bitcoin and Ethereum. FinTech, 1(3), 216–224. [Google Scholar] [CrossRef]
Bhambhwani, S. M., & Huang, A. H. (2024). Auditing decentralized finance. The British Accounting Review, 56(2), 101270. [Google Scholar] [CrossRef]
Bogaert, M., & Delaere, L. (2023). Ensemble methods in customer churn prediction: A comparative analysis of the state-of-the-art. Mathematics, 11(5), 1137. [Google Scholar] [CrossRef]
Cakici, N., Fieberg, C., Metko, D., & Zaremba, A. (2023). Machine learning goes global: Cross-sectional return predictability in international stock markets. Journal of Economic Dynamics and Control, 155, 104725. [Google Scholar] [CrossRef]
Chhajer, P., Shah, M., & Kshirsagar, A. (2022). The applications of artificial neural networks, support vector machines, and long–short term memory for stock market prediction. Decision Analytics Journal, 2, 100015. [Google Scholar] [CrossRef]
Cong, L. W., & He, Z. (2019). Blockchain disruption and smart contracts. The Review of Financial Studies, 32(5), 1754–1797. [Google Scholar] [CrossRef]
Cong, L. W., Li, Y., & Wang, N. (2021). Tokenomics: Dynamic adoption and valuation. The Review of Financial Studies, 34(3), 1105–1155. [Google Scholar] [CrossRef]
Corbet, S., Goodell, J. W., Gunay, S., & Kaskaloglu, K. (2023). Are DeFi tokens a separate asset class from conventional cryptocurrencies? Annals of Operations Research, 322(2), 609–630. [Google Scholar] [CrossRef]
Cousaert, S., Xu, J., & Matsui, T. (2022, May 2–5). Sok: Yield aggregators in defi. 2022 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) (pp. 1–14), Shanghai, China. [Google Scholar]
Daian, P., Goldfeder, S., Kell, T., Li, Y., Zhao, X., Bentov, I., Breidenbach, L., & Juels, A. (2019). Flash boys 2.0: Frontrunning, transaction reordering, and consensus instability in decentralized exchanges. arXiv, arXiv:1904.05234. [Google Scholar] [CrossRef]
Dash, R. K., Nguyen, T. N., Cengiz, K., & Sharma, A. (2023). Fine-tuned support vector regression model for stock predictions. Neural Computing and Applications, 35(32), 23295–23309. [Google Scholar] [CrossRef]
Davies, M. M., & Van Der Laan, M. J. (2016). Optimal spatial prediction using ensemble machine learning. International Journal of Biostatistics, 12(1), 179–201. [Google Scholar] [CrossRef]
Dey, R., & Mathur, R. (2023). Ensemble learning method using stacking with base learner, a comparison. In Proceedings of international conference on data analytics and insights, ICDAI 2023 (Volume 727 LNNS, pp. 159–169). Lecture Notes in Networks and Systems. Springer. [Google Scholar] [CrossRef]
Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. arXiv, arXiv:1810.11363. [Google Scholar] [CrossRef]
Erdebilli, B., & Devrim-İçtenbaş, B. (2022). Ensemble voting regression based on machine learning for predicting medical waste: A case from Turkey. Mathematics, 10(14), 2466. [Google Scholar] [CrossRef]
Foster, K., Blakstad, S., Gazi, S., & Bos, M. (2021). BigFintechs and their impacts on macroeconomic policies. Dialogue on Global Digital Finance Governance, Technical Paper No. 1.1B. United Nations Development Programme & United Nations Capital Development Fund. Available online: https://hub.hku.hk/handle/10722/301744 (accessed on 19 November 2025).
Geertsema, P., & Lu, H. (2019). Machine valuation. Working Paper. [Google Scholar] [CrossRef]
Geertsema, P., & Lu, H. (2023). Relative valuation with machine learning. Journal of Accounting Research, 61(1), 329–376. [Google Scholar] [CrossRef]
González, S., García, S., Del Ser, J., Rokach, L., & Herrera, F. (2020). A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities. Information Fusion, 64, 205–237. [Google Scholar] [CrossRef]
Gudgeon, L., Werner, S., Perez, D., & Knottenbelt, W. J. (2020, October 21–23). Defi protocols for loanable funds: Interest rates, liquidity and market efficiency. 2nd ACM Conference on Advances in Financial Technologies (pp. 92–112), New York, NY, USA. [Google Scholar]
Ince, H., & Trafalis, T. B. (2008). Short term forecasting with support vector machines and application to stock price prediction. International Journal of General Systems, 37(6), 677–687. [Google Scholar] [CrossRef]
John, K., Kogan, L., & Saleh, F. (2023). Smart contracts and decentralized finance. Annual Review of Financial Economics, 15(1), 523–542. [Google Scholar] [CrossRef]
Kaal, W. A., Evans, S. R., & Howe, H. A. (2024). Digital asset valuation. Richmond Journal of Law and Technology, 28, 657. [Google Scholar] [CrossRef]
Kayikci, S., & Khoshgoftaar, T. M. (2024). Blockchain meets machine learning: A survey. Journal of Big Data, 11(1), 9. [Google Scholar] [CrossRef]
Kim, H. S., & Sohn, S. Y. (2010). Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research, 201(3), 838–846. [Google Scholar] [CrossRef]
Klages-Mundt, A., Harz, D., Gudgeon, L., Liu, J.-Y., & Minca, A. (2020, October 21–23). Stablecoins 2.0: Economic foundations and risk-based models. 2nd ACM Conference on Advances in Financial Technologies (pp. 59–79), New York, NY, USA. [Google Scholar]
Knechel, W. R., Maex, S., & Park, H. J. (2023). Decentralized Finance (DeFi) and cybersecurity assurance. In Donald G. Costello College of Business at George Mason University research paper. George Mason University. [Google Scholar]
Koklev, P. S. (2022). Business valuation with machine learning. Finance: Theory and Practice, 26(5), 132–148. [Google Scholar] [CrossRef]
Kunapuli, G. (2023). Ensemble methods for machine learning. Simon and Schuster. [Google Scholar]
Ladds, M. A., Thompson, A. P., Kadar, J.-P., Slip, D. J., P Hocking, D., & G Harcourt, R. (2017). Super machine learning: Improving accuracy and reducing variance of behaviour classification from accelerometry. Animal Biotelemetry, 5, 8. [Google Scholar] [CrossRef]
Lee, S., Nguyen, N., Karamanli, A., Lee, J., & Vo, T. P. (2023). Super learner machine-learning algorithms for compressive strength prediction of high performance concrete. Structural Concrete, 24(2), 2208–2228. [Google Scholar] [CrossRef]
Liu, Y., & Tsyvinski, A. (2021). Risks and returns of cryptocurrency. The Review of Financial Studies, 34(6), 2689–2727. [Google Scholar] [CrossRef]
Loukil, S., Syed, A. A., Hamza, F., & Jeribi, A. (2025). Decoding the dynamic connectedness between traditional and digital assets under dynamic economic conditions. Journal of Theoretical and Applied Electronic Commerce Research, 20(2), 97. [Google Scholar] [CrossRef]
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67. [Google Scholar] [CrossRef]
Metelski, D., & Sobieraj, J. (2022). Decentralized finance (DeFi) projects: A study of key performance indicators in terms of DeFi protocols’ valuations. International Journal of Financial Studies, 10(4), 108. [Google Scholar] [CrossRef]
Mienye, I. D., & Sun, Y. (2022). A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129–99149. [Google Scholar] [CrossRef]
Mohammed, A., & Kora, R. (2023). A comprehensive review on ensemble deep learning: Opportunities and challenges. Journal of King Saud University—Computer and Information Sciences, 35(2), 757–774. [Google Scholar] [CrossRef]
Naimi, A. I., & Balzer, L. B. (2018). Stacked generalization: An introduction to super learning. European Journal of Epidemiology, 33, 459–464. [Google Scholar] [CrossRef]
Nguyen, X. C., Nguyen, T. T. H., La, D. D., Kumar, G., Rene, E. R., Nguyen, D. D., Chang, S. W., Chung, W. J., Nguyen, X. H., & Nguyen, V. K. (2021). Development of machine learning—Based models to forecast solid waste generation in residential areas: A case study from Vietnam. Resources, Conservation and Recycling, 167, 105381. [Google Scholar] [CrossRef]
Parekh, N., Sen, A., Rajasekaran, P., Jayaseeli, J. D. D., & Robert, P. (2024, December 17–18). Network intrusion detection system using Optuna. 2024 International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS) (pp. 312–318), Online. [Google Scholar]
Phillips, R. V., Van Der Laan, M. J., Lee, H., & Gruber, S. (2023). Practical considerations for specifying a super learner. International Journal of Epidemiology, 52(4), 1276–1285. [Google Scholar] [CrossRef]
Porwik, P., Doroz, R., & Wrobel, K. (2019). An ensemble learning approach to lip-based biometric verification, with a dynamic selection of classifiers. Expert Systems with Applications, 115, 673–683. [Google Scholar] [CrossRef]
Priyadarshi, P., & Kumar, P. (2024). Detecting insider trading in the indian stock market: An optimized deep learning approach. Computational Economics, 65, 3923–3943. [Google Scholar] [CrossRef]
Schär, F. (2021). Decentralized finance: On blockchain- and smart contract-based financial markets. Federal Reserve Bank of St. Louis Review, 103(2), 153–174. [Google Scholar] [CrossRef]
Schmid, L., Gerharz, A., Groll, A., & Pauly, M. (2023). Tree-based ensembles for multi-output regression: Comparing multivariate approaches with separate univariate ones. Computational Statistics & Data Analysis, 179, 107628. [Google Scholar] [CrossRef]
Srinivas, P., & Katarya, R. (2022). hyOPTXg: OPTUNA hyper-parameter optimization framework for predicting cardiovascular disease using XGBoost. Biomedical Signal Processing and Control, 73, 103456. [Google Scholar] [CrossRef]
Şoiman, F., Dumas, J.-G., & Jimenez-Garces, S. (2023). What drives DeFi market returns? Journal of International Financial Markets, Institutions and Money, 85, 101786. [Google Scholar] [CrossRef]
Thabet, H. H., Darwish, S. M., & Ali, G. M. (2024). Measuring the efficiency of banks using high-performance ensemble technique. Neural Computing and Applications, 36, 16797–16815. [Google Scholar] [CrossRef]
Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1), Art25. [Google Scholar] [CrossRef]
Werner, S., Perez, D., Gudgeon, L., Klages-Mundt, A., Harz, D., & Knottenbelt, W. (2022, September 19–21). Sok: Decentralized finance (DeFi). 4th ACM Conference on Advances in Financial Technologies (pp. 30–46), Cambridge, MA, USA. [Google Scholar]
Wong, J., Manderson, T., Abrahamowicz, M., Buckeridge, D. L., & Tamblyn, R. (2019). Can hyperparameter tuning improve the performance of a super learner?: A case study. Epidemiology, 30(4), 521–531. [Google Scholar] [CrossRef]
World Bank. (2023). Remittance prices worldwide. Available online: https://remittanceprices.worldbank.org (accessed on 19 November 2025).
Xu, J., Paruch, K., Cousaert, S., & Feng, Y. (2023). Sok: Decentralized exchanges (dex) with automated market maker (amm) protocols. ACM Computing Surveys, 55(11), 1–50. [Google Scholar] [CrossRef]
Yan, Y., Guo, W., Zhao, M., Hu, J., & Yan, W. P. (2017). Optimizing gross merchandise volume via DNN-MAB dynamic ranking paradigm. arXiv, arXiv:1708.03993. [Google Scholar] [CrossRef]
Yousaf, I., & Yarovaya, L. (2022). Static and dynamic connectedness between NFTs, Defi and other assets: Portfolio implication. Global Finance Journal, 53, 100719. [Google Scholar] [CrossRef]
Zetzsche, D. A., Arner, D. W., & Buckley, R. P. (2020). Decentralized finance. Journal of Financial Regulation, 6(2), 172–203. [Google Scholar] [CrossRef]
Zhang, C., Zhang, H., & Liu, D. (2020). A contrastive study of machine learning on energy firm value prediction. IEEE Access, 8, 11635–11643. [Google Scholar] [CrossRef]
Zhang, R., Tian, Z., McCarthy, K. J., Wang, X., & Zhang, K. (2023). Application of machine learning techniques to predict entrepreneurial firm valuation. Journal of Forecasting, 42(2), 402–417. [Google Scholar] [CrossRef]
Zmaznev, E. (2021). Measuring decentralized finance regulatory uncertainty (pp. 1–43). Norwegian School of Economics. [Google Scholar]

Figure 1. Schematic overview of the Super Learner–based DeFi valuation framework. The diagram illustrates the sequential workflow from data preprocessing to base-learner training, stacked meta-learning, and out-of-sample evaluation. Heterogeneous base learners generate complementary predictions that are aggregated by a KNN meta-learner to improve predictive accuracy and stability, with performance assessed using standard regression metrics. Model interpretation is conducted using SHAP analysis, providing explanations of feature contributions to valuation predictions.

Figure 2. Pairwise Pearson correlation matrix of the predictors used in the Super Learner valuation model. Color intensity reflects the strength and direction of linear relationships. Correlations are generally moderate, indicating limited multicollinearity and supporting the use of the feature set for predictive modeling. Stronger associations among activity, liquidity, and revenue variables reflect expected operational linkages in DeFi protocols, while inflation remains weakly correlated with usage-based metrics.
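The multicollinearity screen summarized in Figure 2 can be sketched as follows. This sketch assumes the predictors are held in a NumPy array with one column per variable. The 0.8 cut-off and the helper name `flag_correlated_pairs` are illustrative choices, not thresholds reported in the study, and the data are synthetic.

```python
import numpy as np

def flag_correlated_pairs(X, names, threshold=0.8):
    """Pearson correlation matrix plus predictor pairs whose |r| meets the threshold."""
    X = np.asarray(X, dtype=float)
    # rowvar=False: columns are variables, rows are observations
    corr = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return corr, flagged

# Synthetic illustration only; these are not the study's data
rng = np.random.default_rng(0)
gmv = rng.lognormal(size=200)
tvl = 0.9 * gmv + 0.1 * rng.lognormal(size=200)  # deliberately near-collinear with gmv
inflation = rng.normal(size=200)
corr, flagged = flag_correlated_pairs(np.column_stack([gmv, tvl, inflation]),
                                      ["GMV", "TVL", "INF"])
```

Pairwise correlations cannot reveal collinearity that involves three or more predictors jointly. Variance inflation factors are a common complement.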

Figure 3. Optuna optimization framework. Conceptual illustration of the Optuna-based hyperparameter optimization workflow used to tune the base learners and the meta-learner. The process consists of defining an objective function evaluated by RMSE on validation folds, managing trials through an Optuna study with a TPE sampler, and iteratively running the optimization to identify well-performing model configurations.
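The trial loop in Figure 3 can be sketched without Optuna itself, so the example runs with NumPy alone. The sketch makes two substitutions: a closed-form ridge model stands in for the tuned learners, and seeded random search stands in for the TPE sampler. In Optuna, `objective` would instead receive a `trial`, draw the parameter with `trial.suggest_float("alpha", 1e-3, 100.0, log=True)`, and be passed to `study.optimize(...)` on a study created with `optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler())`.

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_va, alpha):
    # Closed-form ridge regression; centering leaves the intercept unpenalized
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - y_mean))
    return (X_va - x_mean) @ w + y_mean

def objective(params, X, y, folds):
    """Mean validation RMSE across folds: the quantity each trial reports for minimization."""
    rmses = []
    for tr, va in folds:
        pred = ridge_fit_predict(X[tr], y[tr], X[va], params["alpha"])
        rmses.append(np.sqrt(np.mean((y[va] - pred) ** 2)))
    return float(np.mean(rmses))

def random_search(X, y, folds, n_trials=30, seed=0):
    """Stand-in for an Optuna study: log-uniform sampling of alpha, keep the best trial."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, np.inf
    for _ in range(n_trials):
        params = {"alpha": float(10 ** rng.uniform(-3, 2))}
        score = objective(params, X, y, folds)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic data with expanding-window folds over five consecutive time blocks
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)
blocks = np.array_split(np.arange(100), 5)
folds = [(np.concatenate(blocks[:k]), blocks[k]) for k in range(1, 5)]
best_params, best_rmse = random_search(X, y, folds)
```

Keeping the objective separate from the search loop means the same function can be handed to any sampler. Only the way trial parameters are proposed changes.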

Figure 4. Structural representation of the Super Learner ensemble used for DeFi valuation. Base learners (Extra Trees, Support Vector Regression, and CatBoost) are trained under an expanding-window panel time-series cross-validation scheme and generate out-of-fold predictions. These predictions are combined by a KNN meta-learner. For each vector of base-learner predictions, the meta-learner produces the final valuation forecast by averaging the observed valuations of its nearest neighbors in prediction space. Because the meta-learner is fitted only on out-of-fold predictions, the design aims to improve accuracy while preserving temporal integrity and avoiding information leakage.
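The out-of-fold step in Figure 4 can be sketched as follows. The sketch assumes that each panel row carries an integer time-period label. It also assumes that an expanding-window fold trains on all periods before t and tests on every protocol observed in period t. The fold construction, the `min_train_periods` setting, and the toy `ols` and `train_mean` learners are illustrative assumptions, not the study's code. The toy learners stand in for ET, SVR, and CatBoost.

```python
import numpy as np

def expanding_window_splits(periods, min_train_periods=2):
    """Yield (train_idx, test_idx): train on all periods before t, test on period t."""
    periods = np.asarray(periods)
    for t in np.unique(periods)[min_train_periods:]:  # np.unique returns sorted values
        yield np.where(periods < t)[0], np.where(periods == t)[0]

def out_of_fold_predictions(X, y, periods, learners, min_train_periods=2):
    """Level-one data for the meta-learner: column j holds learner j's out-of-fold forecasts."""
    Z = np.full((len(y), len(learners)), np.nan)
    for tr, te in expanding_window_splits(periods, min_train_periods):
        for j, fit_predict in enumerate(learners):
            Z[te, j] = fit_predict(X[tr], y[tr], X[te])
    return Z  # rows from the earliest periods stay NaN (never forecast out of sample)

def ols(X_tr, y_tr, X_te):
    # Toy stand-in learner: least squares with an intercept
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

def train_mean(X_tr, y_tr, X_te):
    # Toy stand-in learner: predicts the training-set mean
    return np.full(len(X_te), y_tr.mean())

# Toy panel: 30 protocols observed over 6 periods (synthetic, not the study's data)
rng = np.random.default_rng(2)
periods = np.repeat(np.arange(6), 30)
X = rng.normal(size=(180, 2))
y = X @ np.array([0.8, -0.3]) + 0.05 * rng.normal(size=180)
Z = out_of_fold_predictions(X, y, periods, [ols, train_mean])
```

Holding out a whole period at once keeps same-period information from other protocols out of the training set. Rows from the earliest periods never receive out-of-sample forecasts and would be excluded when the meta-learner is fitted.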

Figure 5. Cross-validated predictive performance of the Super Learner compared with its constituent base learners (Extra Trees, CatBoost, and Support Vector Regression). (A) reports root mean squared error (RMSE), where lower values indicate higher predictive accuracy, while (B) reports the coefficient of determination (R²), where higher values reflect greater explained variance. The Super Learner achieves the lowest RMSE and highest R² across all comparisons, demonstrating that stacking heterogeneous learners yields substantial gains in accuracy and explanatory power relative to any individual base model.
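The fold-level metrics plotted in Figures 5 to 7 are tabulated as mean ± SD in Tables 3 to 5. They can be sketched as below. This section does not state two of the conventions, so the sketch assumes them. First, R² is computed within each fold against that fold's own mean. Second, the SD uses the sample (ddof=1) convention.

```python
import numpy as np

def fold_metrics(y_true, y_pred):
    """RMSE, MAE, and R^2 for one validation fold."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    # Assumption: R^2 is measured against the fold's own mean
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, r2

def summarize(folds):
    """folds: list of (y_true, y_pred) pairs -> {metric: (mean, SD across folds)}."""
    scores = np.array([fold_metrics(t, p) for t, p in folds])
    # ddof=1 (sample SD) is an assumption; the tables do not state the convention
    return {name: (float(scores[:, k].mean()), float(scores[:, k].std(ddof=1)))
            for k, name in enumerate(["RMSE", "MAE", "R2"])}

example = summarize([([1, 2, 3], [1, 2, 3]), ([0, 2], [1, 1])])
```

Averaging per-fold R² can differ from computing one R² on pooled out-of-fold predictions, so the two should not be compared directly.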

Figure 6. Cross-validated performance comparison between the Super Learner and alternative ensemble strategies, including weighted voting, average voting, bagging, and boosting ensembles. (A) presents RMSE and (B) presents R². The Super Learner consistently outperforms all competing ensemble approaches, indicating that meta-learning with optimized base learners provides superior predictive accuracy and variance explanation compared with fixed-weight averaging or homogeneous ensemble methods.

Figure 7. Cross-validated predictive performance of the Super Learner compared with traditional econometric models and standalone machine-learning algorithms. (A) reports RMSE and (B) reports R². Linear models (OLS and the regularized Ridge, Lasso, and ElasticNet regressors) and nonlinear baselines (decision trees and KNN) all exhibit substantially higher prediction errors and lower explanatory power than the Super Learner. The figure highlights the limitations of single-model approaches in capturing the nonlinear and heterogeneous dynamics of DeFi valuation.

Figure 8. Mean absolute SHAP values for the Super Learner valuation model, ranking predictors by their average contribution to ensemble predictions across the evaluation sample. Higher values indicate greater overall influence on predicted DeFi protocol valuation. Gross merchandise volume (GMV) emerges as the dominant driver, followed by total value locked (TVL) and the decentralized exchange (DEX) classification indicator. This ranking highlights the primacy of transaction activity, liquidity depth, and protocol design in the model's valuation predictions. Revenue-based variables exhibit moderate influence, while token inflation and the lending-protocol classification contribute comparatively less.
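This section does not say which SHAP explainer was applied to the stacked ensemble. The sketch below therefore illustrates the concept behind a mean |SHAP| ranking rather than the study's computation. It computes exact Shapley values by enumerating feature subsets, with features outside a coalition replaced by a single baseline vector (one simple interventional definition). The shap library instead averages over a background sample and uses faster approximations. The cost doubles with each added feature, so exact enumeration is practical only for a handful of predictors. `toy_model` and the feature names `x1` to `x3` are hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        return f([x[k] if k in coalition else baseline[k] for k in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def mean_abs_shap(f, X, baseline, names):
    """Global importance as in a mean |SHAP| bar chart, sorted from largest to smallest."""
    rows = [exact_shapley(f, x, baseline) for x in X]
    importance = {name: sum(abs(r[k]) for r in rows) / len(rows) for k, name in enumerate(names)}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

def toy_model(z):
    # Hypothetical model with an x1 * x3 interaction (illustration only)
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]

ranking = mean_abs_shap(toy_model, [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]], [0.0, 0.0, 0.0],
                        ["x1", "x2", "x3"])
```

Summing one observation's values recovers f(x) - f(baseline), the additivity property that SHAP explanations satisfy. The interaction term is split equally between `x1` and `x3`.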

Figure 9. SHAP summary plot for the Super Learner model, illustrating both the magnitude and direction of each predictor’s impact on valuation predictions. Each point represents an individual observation, colored by that observation's feature value from low to high. Positive SHAP values correspond to upward contributions to predicted valuation. Higher levels of GMV and TVL consistently increase predicted protocol value, reflecting their roles as indicators of realized economic activity and liquidity depth. Revenue variables contribute positively but with greater dispersion. Higher token inflation is generally associated with lower predicted values, consistent with dilution. The plot highlights heterogeneity in feature effects across protocols and time, underscoring the nonlinear and interaction-driven nature of DeFi valuation dynamics.
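The color pattern in a SHAP summary plot can be condensed into one number per feature. That number is the correlation between each observation's feature value and its SHAP value. A positive value means higher feature values tend to push predictions up, and a negative value means they tend to push predictions down. The helper `shap_direction` below is a hypothetical sketch that assumes a precomputed SHAP matrix with the same shape as the feature matrix. The synthetic check uses a linear model with a zero baseline, for which each SHAP value is the coefficient times the feature value.

```python
import numpy as np

def shap_direction(X, phi, names):
    """Per feature, correlate observed feature values with their SHAP values.
    Positive: higher values tend to raise the prediction; negative: lower it."""
    X, phi = np.asarray(X, float), np.asarray(phi, float)
    direction = {}
    for k, name in enumerate(names):
        if X[:, k].std() == 0 or phi[:, k].std() == 0:
            direction[name] = 0.0  # constant column: direction undefined
        else:
            direction[name] = float(np.corrcoef(X[:, k], phi[:, k])[0, 1])
    return direction

# Synthetic check: linear model, zero baseline, so phi[:, k] = w_k * X[:, k]
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
phi = X * np.array([0.5, -0.2])
direction = shap_direction(X, phi, ["feature_up", "feature_down"])
```

A single correlation hides the dispersion and interaction effects that the plot shows, so it complements the visual rather than replacing it.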

Table 1. Disclosures of DeFi Protocols.

Variable	Description
The Dependent Variable
Valuation (VAL)	Market value of a DeFi protocol, calculated as circulating token supply multiplied by token price. Reflects overall protocol valuation within the digital asset ecosystem.
The Independent Variables
Total Value Locked (TVL)	Total amount of user funds deposited in a protocol’s smart contracts (e.g., lending, staking, liquidity pools). Indicates protocol scale, liquidity, and user trust.
Protocol Revenue (PR)	Revenue distributed directly to token holders, representing financial returns generated for protocol participants.
Total Revenue (TR)	Total user fees collected over a specified period. Includes both protocol revenue and supply-side revenue (e.g., liquidity providers).
Gross Merchandise Volume (GMV)	Total value of transactions processed by the protocol during a given period. Used to assess growth, activity levels, and competitive positioning.
Inflation Factor (INF)	Measure of the change in circulating token supply, reflecting dilution effects. Calculated as the percentage increase in token supply from one period to the next.
DeFi-class	Categorical variable indicating the protocol type. It includes three classes: Decentralized Exchanges (DEXs), which enable peer-to-peer trading; Lending Protocols, which support decentralized borrowing and lending; and Asset Management Protocols, which provide automated portfolio management and yield optimization services.
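The two constructed variables in Table 1 reduce to simple formulas. VAL = circulating supply × token price, and INF = 100 × (S_t - S_{t-1}) / S_{t-1}, where S_t is circulating supply in period t. The sketch below expresses INF in percentage points, which is an assumption; the study may store it as a fraction. The token figures are hypothetical.

```python
def valuation(circulating_supply, token_price):
    """VAL (Table 1): circulating token supply multiplied by token price."""
    return circulating_supply * token_price

def inflation_factor(supply_prev, supply_curr):
    """INF (Table 1): percentage increase in circulating supply from one period to the next."""
    if supply_prev <= 0:
        raise ValueError("previous-period supply must be positive")
    return 100.0 * (supply_curr - supply_prev) / supply_prev

def inflation_series(supplies):
    """Period-over-period INF for a chronologically ordered list of supplies."""
    return [inflation_factor(a, b) for a, b in zip(supplies, supplies[1:])]

# Hypothetical token figures, for illustration only
val = valuation(50_000_000, 2.40)
inf_path = inflation_series([50_000_000, 52_000_000, 52_000_000])
```

Here a supply increase from 50 to 52 million tokens gives INF = 4.0, and an unchanged supply gives 0.0.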

Table 2. Tuned hyperparameters of the base learners and meta-learner in the proposed ensemble.

Model/Parameter	Definition	Selected Value
ET
n_estimators	Number of trees in the ensemble.	132
max_depth	Maximum depth of each tree.	9
CAT
iterations	Number of boosting iterations (trees).	394
max_depth	Maximum depth of each tree.	8
learning_rate	Shrinkage factor applied to the contribution of each new tree.	0.0206
SVR
kernel	Kernel function (e.g., linear, polynomial, or radial basis function, rbf) used to implicitly map inputs into a higher-dimensional feature space.	rbf
C	Penalty on training errors outside the ε-insensitive tube; larger values mean weaker regularization and a more complex fit.	1.0
KNN
n_neighbors	Number of nearest neighbors averaged for each prediction.	18
weights	Weighting of neighbor contributions (uniform: all neighbors count equally).	uniform
p	Power parameter of the Minkowski distance (p = 1 corresponds to Manhattan distance).	1
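The KNN settings in Table 2 are 18 neighbors, uniform weights, and Minkowski p = 1. Together they describe a meta-learner that averages the valuations of the 18 level-one rows closest, in Manhattan distance, to a new vector of base-learner predictions. A minimal NumPy sketch follows. The parameter names match scikit-learn's `KNeighborsRegressor`, but this section does not confirm that this class was used. The level-one rows and targets are hypothetical.

```python
import numpy as np

def knn_regress(Z_train, y_train, Z_query, n_neighbors=18, p=1):
    """KNN regression with uniform weights and Minkowski distance of order p (p=1: Manhattan)."""
    Z_train = np.asarray(Z_train, float)
    y_train = np.asarray(y_train, float)
    Z_query = np.atleast_2d(np.asarray(Z_query, float))
    k = min(n_neighbors, len(Z_train))
    preds = []
    for z in Z_query:
        dist = np.sum(np.abs(Z_train - z) ** p, axis=1) ** (1.0 / p)
        nearest = np.argsort(dist, kind="stable")[:k]
        preds.append(y_train[nearest].mean())  # uniform weights: plain average of neighbors
    return np.array(preds)

# Level-one rows: hypothetical out-of-fold predictions from [ET, CAT, SVR]
Z_train = np.array([[0.10, 0.12, 0.15], [0.50, 0.48, 0.40], [0.90, 0.85, 0.80]])
y_train = np.array([0.11, 0.47, 0.88])
forecast = knn_regress(Z_train, y_train, [[0.45, 0.50, 0.42]], n_neighbors=2)
```

Every forecast is an average of observed training targets, so the meta-learner cannot predict outside the range of valuations in its training set. This matters when protocols grow beyond historical levels.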

Table 3. Cross-validation results of the Super Learner compared with its base learners.

Model	RMSE (±SD)	MAE (±SD)	R² (±SD)	Wilcoxon Test
Super Learner	0.0854 ± 0.0259	0.0645 ± 0.0231	0.9731 ± 0.0202	—
ET	0.1165 ± 0.0519	0.0921 ± 0.0506	0.9424 ± 0.0628	**
CAT	0.1318 ± 0.0603	0.1087 ± 0.0590	0.9235 ± 0.0789	**
SVR	0.1572 ± 0.0649	0.1214 ± 0.0516	0.8960 ± 0.1001	**

Note: Values are reported as mean ± standard deviation. ** indicates that the Super Learner significantly outperforms the comparator based on the Wilcoxon signed-rank test (p < 0.01).
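The Wilcoxon signed-rank comparisons noted under Tables 3 to 5 can be sketched as an exact test on paired per-fold scores. This section does not state whether the author paired per-fold metrics or per-observation errors, nor which implementation was used. Pairing per-fold RMSEs is an assumption, and the fold scores below are hypothetical. For production use, `scipy.stats.wilcoxon` handles ties, zero differences, and the normal approximation.

```python
from itertools import product

def wilcoxon_exact(a, b):
    """Two-sided exact Wilcoxon signed-rank test for paired scores a[i], b[i].
    Sketch assumption: no zero differences and no tied |differences|."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    if any(v == 0 for v in d) or len({abs(v) for v in d}) != n:
        raise ValueError("zero or tied differences need the tie/zero-corrected test")
    # Rank the absolute differences 1..n
    ranks = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda i: abs(d[i])), start=1):
        ranks[i] = rank
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    total = n * (n + 1) // 2
    w_obs = min(w_plus, total - w_plus)
    # Under H0 each rank carries a + or - sign with probability 1/2: enumerate all 2**n patterns
    extreme = 0
    for signs in product((False, True), repeat=n):
        s = sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        if min(s, total - s) <= w_obs:
            extreme += 1
    return w_obs, extreme / 2 ** n

# Hypothetical per-fold RMSEs: the comparator is worse on every one of 8 folds
sl = [0.070, 0.080, 0.090, 0.100, 0.060, 0.110, 0.085, 0.075]
comparator = [x + 0.01 * (i + 1) for i, x in enumerate(sl)]
stat, p_value = wilcoxon_exact(sl, comparator)
```

The exact null distribution has 2^n equally likely sign patterns, so the smallest attainable two-sided p-value is 2/2^n. With 7 pairs that floor is 0.0156, so p < 0.01 requires at least 8 pairs.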

Table 4. Cross-validation results of the Super Learner compared with alternative ensembles.

Model	RMSE (±SD)	MAE (±SD)	R² (±SD)	Wilcoxon Test
Super Learner	0.0854 ± 0.0259	0.0645 ± 0.0231	0.9731 ± 0.0202	—
WV Ensemble	0.1235 ± 0.0571	0.1012 ± 0.0564	0.9335 ± 0.0736	**
AV Ensemble	0.1247 ± 0.0574	0.1021 ± 0.0565	0.9324 ± 0.0745	**
Bagging	0.1536 ± 0.0682	0.1119 ± 0.0582	0.8971 ± 0.0945	**
LGBM	0.1632 ± 0.0657	0.1146 ± 0.0576	0.8898 ± 0.0927	**
RF	0.1674 ± 0.0753	0.1264 ± 0.0672	0.8749 ± 0.1186	**
AdaBoost	0.1911 ± 0.0809	0.1562 ± 0.0695	0.8380 ± 0.1443	**

Note: ** indicates that the Super Learner significantly outperforms each alternative ensemble method based on the Wilcoxon signed-rank test (p < 0.01).
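The WV and AV ensembles in Table 4 differ only in how they weight base-learner forecasts. The section does not state their base learners or the WV weighting rule. The sketch therefore assumes the ET, CAT, and SVR learners with hypothetical inverse-RMSE weights, built from the Table 3 means.

```python
import numpy as np

def average_voting(preds):
    """AV ensemble: unweighted mean of base-learner predictions (rows: observations)."""
    return np.asarray(preds, float).mean(axis=1)

def weighted_voting(preds, weights):
    """WV ensemble: fixed convex combination, weights normalized to sum to one."""
    w = np.asarray(weights, float)
    return np.asarray(preds, float) @ (w / w.sum())

# Hypothetical weighting rule: inverse validation RMSE, using the base-learner
# RMSEs from Table 3 (ET, CAT, SVR); the study's actual WV weights are not given here
weights = 1.0 / np.array([0.1165, 0.1318, 0.1572])
preds = np.array([[1.0, 2.0, 3.0]])  # one observation, three base-learner forecasts
av = average_voting(preds)
wv = weighted_voting(preds, weights)
```

Both rules apply the same weights to every observation. The Super Learner's KNN meta-learner instead forms each forecast from the neighbors of that observation's prediction vector. The effective combination can therefore vary across the prediction space.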

Table 5. Cross-validation results of the Super Learner compared with individual statistical and ML models.

Model	RMSE (±SD)	MAE (±SD)	R² (±SD)	Wilcoxon Test
Super Learner	0.0854 ± 0.0259	0.0645 ± 0.0231	0.9731 ± 0.0202	—
KNN	0.1599 ± 0.0796	0.1268 ± 0.0735	0.8811 ± 0.1242	**
Ridge	0.1757 ± 0.0901	0.1345 ± 0.0757	0.8561 ± 0.1455	**
ElasticNet	0.1815 ± 0.0876	0.1394 ± 0.0674	0.8556 ± 0.1210	**
Lasso	0.1883 ± 0.0864	0.1441 ± 0.0632	0.8486 ± 0.1280	**
OLS	0.1899 ± 0.0793	0.1433 ± 0.0705	0.8482 ± 0.1299	**
DT	0.2025 ± 0.0873	0.1375 ± 0.0679	0.8231 ± 0.1463	**

Note: ** indicates that the Super Learner significantly outperforms each statistical and ML model based on the Wilcoxon signed-rank test (p < 0.01).
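The relative error reductions implied by Tables 3 to 5 can be recomputed directly from the reported values. The sketch computes each reduction from the mean RMSEs rather than fold by fold, which is an assumption.

```python
SUPER_LEARNER_RMSE = 0.0854

# Mean cross-validated RMSE values transcribed from Tables 3 to 5
COMPARATOR_RMSE = {
    "ET": 0.1165, "CAT": 0.1318, "SVR": 0.1572,
    "WV Ensemble": 0.1235, "AV Ensemble": 0.1247, "Bagging": 0.1536,
    "LGBM": 0.1632, "RF": 0.1674, "AdaBoost": 0.1911,
    "KNN": 0.1599, "Ridge": 0.1757, "ElasticNet": 0.1815,
    "Lasso": 0.1883, "OLS": 0.1899, "DT": 0.2025,
}

def relative_reduction(baseline, model=SUPER_LEARNER_RMSE):
    """Percentage reduction in RMSE of `model` relative to `baseline`."""
    return 100.0 * (baseline - model) / baseline

reductions = {name: round(relative_reduction(r), 1) for name, r in COMPARATOR_RMSE.items()}
for name, pct in sorted(reductions.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {pct:5.1f}%")
```

On these means, the reduction ranges from about 26.7% against ET, the strongest comparator, to about 57.8% against the single decision tree.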

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Ali, G.M. Digital Asset Analytics for DeFi Protocol Valuation: An Explainable Optuna-Tuned Super Learner Ensemble Framework. J. Risk Financial Manag. 2026, 19, 63. https://doi.org/10.3390/jrfm19010063
