Article

Intelligent Sustainability: Evaluating Transformers for Cryptocurrency Environmental Claims

by Parisa Bouzari 1,*, Maria Fekete-Farkas 2 and Zsigmond Gábor Szalay 3
1 Doctoral School of Economic and Regional Sciences, Hungarian University of Agriculture and Life Sciences (MATE), 2100 Gödöllő, Hungary
2 Institute of Agricultural and Food Economics, Hungarian University of Agriculture and Life Sciences (MATE), 2100 Gödöllő, Hungary
3 Institute of Rural Development and Sustainable Economy, Hungarian University of Agriculture and Life Sciences (MATE), 2100 Gödöllő, Hungary
* Author to whom correspondence should be addressed.
Information 2025, 16(12), 1022; https://doi.org/10.3390/info16121022
Submission received: 24 September 2025 / Revised: 17 November 2025 / Accepted: 20 November 2025 / Published: 24 November 2025
(This article belongs to the Special Issue AI Tools for Business and Economics)

Abstract

This research investigates the efficacy of transformer architectures in classifying sustainability claims made by cryptocurrency projects, addressing a critical gap in automated environmental impact assessment of digital assets. Employing design science research (DSR) methodology, we develop and empirically evaluate a novel framework comparing five state-of-the-art transformer models across multiple performance dimensions. Through rigorous analysis of 300 synthetic cryptocurrency sustainability news articles, we demonstrate that RoBERTa-large-MNLI achieves optimal performance (F1: 1.00) with exceptional prediction stability (0.98)—meaning highly consistent predictions across varied inputs—and minimal entropy (0.05)—indicating strong confidence in classification decisions—albeit at higher computational costs. Our findings challenge conventional assumptions about the inverse relationship between model complexity and prediction reliability in specialized financial domains. The results advance theoretical understanding of transfer learning in sustainable finance while establishing quantitative benchmarks for automated environmental claim verification. This research contributes to both academic literature and regulatory frameworks by providing empirically validated methodologies for distinguishing between substantive and symbolic environmental initiatives in cryptocurrency markets. The findings provide valuable guidelines for cryptocurrency projects, financial institutions, and regulatory bodies seeking to implement automated sustainability assessment systems, while establishing a foundation for future research at the intersection of artificial intelligence and sustainable finance.

1. Introduction

The rapid expansion of cryptocurrency markets has drawn increasing attention to their environmental impact, particularly regarding energy consumption [1,2,3,4,5,6] and carbon emissions [7,8,9,10], with research highlighting the strong correlation between these factors and their short- and long-term ecological consequences. As cryptocurrencies become more integrated into the global financial system, the need to accurately assess and classify sustainability-related claims has become paramount for investors, regulators, and stakeholders [11]. However, the burgeoning volume of sustainability claims in the cryptocurrency sector, coupled with the inherent complexity of blockchain technologies [12,13,14], presents significant challenges due to limited regulatory oversight, speculative market dynamics, and the difficulty in distinguishing between genuine environmental actions and superficial claims [7]. This growing concern has catalyzed debates around the sustainability of digital currencies [15,16,17], driving demand for reliable methodologies to evaluate environmental claims within the sector [18]. The dynamic nature of cryptocurrency markets further complicates this assessment, as sustainability narratives can evolve rapidly, rendering traditional manual evaluation methods increasingly ineffective [19,20].
While previous studies have examined various dimensions of cryptocurrency sustainability—such as energy consumption metrics [1,6], carbon footprint analysis [7,9], and environmental impact assessment frameworks [21,22,23]—a critical gap remains in the development of automated systems for classifying sustainability-related information. Existing approaches often rely on manual evaluation or simplistic rule-based systems [24], which are insufficient to capture the complexity of sustainability claims and fail to scale effectively as the volume of cryptocurrency-related content continues to grow.
Recent advances in natural language processing (NLP) and large language models (LLMs), particularly transformer-based architectures, have opened new possibilities for automated sustainability classification [25]. However, applying these techniques to cryptocurrency sustainability assessment presents unique challenges, such as understanding blockchain-specific terminology, interpreting complex environmental metrics, and maintaining model reliability amidst rapidly evolving sustainability discourses. Moreover, the computational demands of these models raise concerns about their own environmental impact, highlighting a tension between the capabilities of AI-driven assessment tools and the sustainability objectives they seek to promote [26,27,28,29,30].
This research addresses these challenges by developing and evaluating a comprehensive framework for the automated classification of cryptocurrency sustainability-related content using transformer-based architectures. Our study contributes to both theory and practice in several ways. First, we propose a novel evaluation framework that incorporates multiple performance dimensions, including classification accuracy, computational efficiency, model reliability, and prediction fidelity, offering a more nuanced understanding of model capabilities in sustainability classification tasks. Second, we systematically compare five state-of-the-art transformer models, providing insights into the trade-offs between model complexity, performance, and resource utilization in the context of sustainability classification. Third, we demonstrate the effectiveness of transfer learning approaches in specialized sustainability tasks, challenging conventional assumptions about the relationship between model size and prediction stability. Finally, we offer practical recommendations for implementing automated sustainability classification systems, including detailed specifications for resource requirements and optimization strategies across different operational contexts.
Our research follows the Design Science Research (DSR) methodology, adhering to the guidelines of Peffers et al. [31] to ensure academic rigor and practical relevance. The methodology encompasses comprehensive problem identification, objective definition, artifact design, implementation, evaluation, and communication phases. The evaluation framework employs sophisticated analytical techniques to assess classification performance, computational efficiency, model reliability, and prediction faithfulness across multiple dimensions.
While general NLP benchmarks exist for sentiment analysis and sustainable finance research addresses ESG disclosure analysis, little prior work establishes comprehensive quantitative benchmarks specifically for transformer-based classification of cryptocurrency sustainability claims. Our study addresses this gap by providing the first systematic comparison of state-of-the-art transformer architectures across multiple evaluation dimensions—accuracy, computational efficiency, prediction stability, and confidence calibration—for this specialized domain. These benchmarks provide practitioners and researchers with evidence-based guidance for model selection given specific operational constraints, a critical need in the rapidly evolving cryptocurrency sustainability landscape.
The remainder of this paper is structured as follows: We begin with a review of the relevant literature, focusing on the intersection of cryptocurrency sustainability, sentiment analysis, and automated classification systems. We then detail our research methodology, including the design and implementation of our evaluation framework. Next, we present our results, offering insights into model performance and characteristics. Finally, we discuss the implications of our findings for theory and practice and outline directions for future research.
Through this investigation, we aim to advance both theoretical understanding and practical capabilities in the automated assessment of cryptocurrency sustainability, contributing to the broader goal of enhancing transparency and reliability in digital asset markets. Our findings have significant implications for researchers, practitioners, and policymakers working at the intersection of cryptocurrency technology, environmental sustainability, and artificial intelligence.

2. Literature Review

The intersection of cryptocurrency sustainability, sentiment analysis, and automated classification systems represents a complex research domain requiring examination across multiple theoretical frameworks and empirical studies. This review synthesizes key developments across these domains while identifying critical research gaps that motivate our investigation.

2.1. Cryptocurrency Sustainability Assessment

Blockchain-powered cryptocurrencies have attracted considerable interest from both the public and policymakers [32,33,34,35,36]. The environmental impact of cryptocurrencies has become a significant research area, with studies addressing both direct and indirect sustainability implications [17,37,38]. Bitcoin’s energy consumption, driven by its Proof-of-Work (PoW) mechanism [39,40,41,42] and heightened competition among miners, is more a function of economic incentives than blockchain requirements. Despite advances in mining efficiency, energy consumption remains high, as competition scales with profitability. This underscores the need for sustainability frameworks to address the environmental impacts of cryptocurrency mining [43].
The growing prevalence of energy-intensive mining operations has raised concerns about the carbon footprint of cryptocurrencies. Estimates suggest that the electricity consumption of certain cryptocurrencies rivals that of entire nations [44,45,46,47]. Furthermore, the reliance on fossil fuels in some regions exacerbates the environmental impact, emphasizing the urgency for the industry to transition to renewable energy sources [48,49,50,51,52].
The theoretical framework for assessing cryptocurrency sustainability has evolved from simple energy consumption models to more sophisticated, multi-dimensional approaches [53]. Mustafa et al. [10] demonstrate that Bitcoin trading volume positively affects water and sanitation (SDG 6) but negatively impacts climate action (SDG 13) due to the carbon emissions associated with mining. Their study, based on OLS panel data analysis from 32 countries (2013–2020), also highlights Bitcoin’s high energy consumption, comparable to the environmental costs of gold reserves. They argue for regulatory frameworks that encourage sustainable practices, especially in emerging markets, and for integrating sustainability into the cryptocurrency sector to ensure that economic benefits do not come at the cost of environmental health.
Wiwoho et al. [17] examine the environmental effects of cryptocurrency mining and the need for regulatory policies to mitigate these impacts. The study compares cryptocurrency regulations in Indonesia, the United States, China, and Iran, proposing measures such as minimizing greenhouse gas emissions, ensuring reliable energy sources, promoting transparency, setting energy efficiency standards, and implementing carbon taxes or transaction fees on mining. These policies are essential to balance environmental protection with the development of cryptocurrency markets.
Emerging social frameworks also address the implications of cryptocurrency adoption on issues like inequality, technology access, and digital inclusion [22,54,55,56]. These integrated approaches are critical for developing policies that balance the innovative potential of cryptocurrencies with the need for ecological responsibility. Additionally, alternative consensus mechanisms, such as Proof-of-Stake (PoS) [57,58,59,60,61,62,63], offer more energy-efficient solutions. However, these mechanisms raise new questions concerning scalability, security, and their overall effectiveness in supporting decentralized networks [64,65,66]. Therefore, future research must not only improve the energy efficiency of blockchain technologies but also consider broader socio-economic and environmental factors to ensure cryptocurrency systems contribute to a sustainable future.

2.2. Sentiment Analysis in Financial Text Analysis

The application of NLP to financial text analysis has undergone significant theoretical development [67,68,69,70,71,72,73,74], particularly with the advent of transformer architectures [75,76,77]. Rao et al. [78] established the theoretical foundations for financial text understanding using attention mechanisms [79], demonstrating superior performance in capturing complex financial narratives compared to traditional approaches.
Transformer models have shown particular promise in specialized financial domains, especially in sentiment analysis [80,81,82,83,84]. Luo and Gong [85] demonstrated that pre-trained language models could effectively capture domain-specific financial knowledge through transfer learning, achieving state-of-the-art performance in various financial text classification tasks, including sentiment analysis. However, Mao et al. [86] identified important limitations in existing approaches, particularly in effectively extracting user attitudes from the growing volume of online comments and ensuring reliable predictions in dynamic financial contexts. The theoretical framework for financial text classification now emphasizes uncertainty quantification and reliability assessment, critical for robust and trustworthy predictions, particularly in dynamic and safety-sensitive financial contexts [87]. Nie et al. [88] demonstrate that LLMs are transforming financial tasks through advanced contextual understanding, transfer learning, and emotion detection. The study categorizes applications such as sentiment analysis, forecasting, and decision support while offering resources like datasets and tools. It highlights both challenges and opportunities, providing insights for advancing LLM adoption in finance.

2.3. Automated Sustainability Claims Classification

The automated classification of sustainability-related information is an emerging area of research that presents significant theoretical and methodological challenges [89,90,91,92].
Recent studies have focused on addressing the challenges of automated greenwashing detection. Chelli et al. [93] investigated the application of transformer-based models to identify greenwashing in corporate environmental discourse. While these models effectively captured key symbolic and substantive ideological strategies, their performance was constrained when applied to novel claim types. Anderson et al. [94] advanced this field by employing sentiment analysis on social media data to gauge public perceptions of sustainability initiatives. Their comparative analysis of machine learning and deep learning models revealed significant variability in model performance, underscoring the necessity of selecting appropriate tools tailored to the complexities of sustainability contexts. This research highlights the potential of AI to deepen our understanding of public sentiment and inform more effective environmental policies.
Rocca et al. [95] explored the role of local governments (LGOs) in using social media platforms, particularly Facebook, to disclose environmental actions and plans. Their analysis of citizen sentiment through lexicon-based and convolutional neural network approaches revealed a divergence of interests between LGOs and citizens, emphasizing the influence of Web 2.0 in facilitating direct citizen-government interaction. These findings suggest that sentiment analysis can play a crucial role in enhancing environmental reporting and fostering stakeholder engagement.
In a similar vein, Diwanji et al. [96] conducted a cross-cultural sentiment analysis of tweets on sustainable consumption in the U.S., Switzerland, and India. Their findings revealed notable cultural differences, with Indian consumers prioritizing environmental and social aspects of sustainability, whereas Western consumers emphasized economic considerations. Notably, American tweets exhibited more negative sentiment compared to the other two countries. Baxter et al. [97] examined Twitter sentiment before and after greenwashing scandals involving eight prominent brands. Their study found reduced engagement and an increase in negative sentiment following the scandals, yet counterintuitively, consumer trust appeared to rise in the aftermath.
These findings collectively underscore the need for adaptive mechanisms within sustainability classification systems, particularly in dynamic sectors such as cryptocurrency markets, where context and public sentiment rapidly evolve.

2.4. Computational Efficiency in Sustainability Assessment

The environmental impact of AI systems has become a critical focus within sustainability assessment [98,99,100,101]. Farsi et al. [102] emphasized that assessing the environmental efficiency of complex systems, including AI models, requires an integrated approach, considering the nonlinear interactions between economic, ecological, and technological factors. They proposed a framework for optimizing system functionality and flexibility, addressing disruptions and enhancing performance while incorporating environmental impacts.
Studies on efficient transformer model architectures have shown promising methods for reducing computational costs without sacrificing performance [103,104,105,106,107,108,109]. For instance, Pati et al. [107] explored the scaling dynamics of neural networks, demonstrating that while computational requirements generally outpace communication needs, the slower growth of memory capacity is causing bottlenecks as models scale. Their empirical study, utilizing operator models, found that communication will account for 40–75% of runtime in future Transformer models. Moreover, communication, which is currently masked by overlapping computation, will become a significant issue in future models. Their findings also showed that profiling costs could be reduced by up to 2100×, with less than 15% error.
In parallel, Mukherjee et al. [110] introduced the “Energy-Optimized Semantic Loss” (EOSL), which balances semantic information loss and energy consumption. Their experiments demonstrated that EOSL reduced energy consumption by up to 90% while improving semantic similarity by 44%, laying the groundwork for energy-efficient semantic communication systems.
Tschand et al. [111] presented MLPerf Power, a benchmarking methodology designed to evaluate the energy efficiency of ML systems across diverse hardware platforms. By collecting 1841 measurements from 60 systems, they identified trade-offs between performance, complexity, and energy efficiency. Their work underscores the importance of energy efficiency in optimizing ML systems, offering insights for the development of sustainable AI solutions and standardizing energy efficiency benchmarks.
Furthermore, Gowda et al. [112] examined the trade-off between model accuracy and energy consumption in deep learning, introducing a metric that penalizes excessive electricity use. Their study revealed that smaller, energy-efficient models not only expedite research but also reduce environmental impact, promoting more sustainable practices and fostering a fairer, competitive research landscape.

2.5. Research Gap and Theoretical Framework

A review of the literature reveals several critical gaps in existing research. While separate bodies of literature address cryptocurrency sustainability, financial text analysis, and automated classification, a unified theoretical framework integrating these domains for cryptocurrency sustainability assessment is lacking. This integration gap hinders the development of comprehensive solutions that can effectively address the complexities of sustainability assessment in cryptocurrency markets.
The tension between model capability and computational efficiency in sustainability assessment applications represents another significant research gap. Current literature has not adequately addressed how to balance the increasing demands for sophisticated analysis with the environmental impact of the assessment tools themselves. This efficiency-performance trade-off becomes particularly crucial when considering the scalability requirements of cryptocurrency markets.
Additionally, current approaches demonstrate a notable absence of comprehensive frameworks for evaluating prediction reliability and stability in cryptocurrency sustainability classification. The dynamic nature of cryptocurrency markets and sustainability narratives demands robust reliability assessment mechanisms, yet existing research provides limited guidance on how to evaluate and ensure the stability of automated classification systems in this context.
Existing sustainability assessment frameworks also show significant limitations in handling the volume and complexity of cryptocurrency sustainability claims. The rapid evolution of blockchain technologies and sustainability practices creates scalability challenges that current frameworks struggle to address effectively, particularly in maintaining assessment quality across increasing data volumes.
These identified gaps motivate our research objectives and inform our theoretical framework, which integrates elements from cryptocurrency sustainability assessment, sentiment analysis, and computational efficiency domains. Our framework advances existing theories by incorporating several key elements. First, we introduce a multi-dimensional performance assessment approach that integrates both technical and sustainability metrics, providing a more comprehensive evaluation methodology. This integration allows for simultaneous consideration of model performance and environmental impact.
Our framework also implements an integrated reliability assessment mechanism that considers both prediction stability and confidence calibration. This dual approach enables more robust evaluation of model performance under varying conditions and input types. Furthermore, we incorporate efficiency-oriented design principles that explicitly balance model capability with environmental impact, addressing the crucial tension between assessment power and sustainability objectives.
The framework also emphasizes scalability considerations for practical deployment in dynamic cryptocurrency markets. This focus on scalability ensures that theoretical advances can be effectively translated into practical applications capable of handling the volume and complexity of real-world cryptocurrency sustainability assessment tasks.
This theoretical framework serves as the foundation for our systematic evaluation of different transformer architectures in cryptocurrency sustainability classification tasks. By addressing the identified research gaps while incorporating these key elements, our research advances theoretical understanding of automated sustainability assessment in digital asset markets. This advancement contributes to both academic knowledge and practical capabilities in the rapidly evolving intersection of cryptocurrency technology and environmental sustainability.

3. Materials and Methods

This study employs DSR methodology to develop and evaluate a framework for cryptocurrency sustainability news classification. Following the guidelines of Peffers et al. [31], our research process ensures both academic rigor and practical relevance through systematic implementation steps. DSR is a structured methodology for solving real-world problems by designing, building, and evaluating artifacts through a clear, replicable process, emphasizing systematic problem-solving and the effective presentation of research outcomes [113,114]. As a research paradigm, it emphasizes the formulation and validation of prescriptive knowledge, focusing on the development and practical application of artifacts to improve their operational effectiveness [115]. Its outputs include constructs, models, methods, and instantiations, with a prototype often serving as a typical form of instantiation [116]. By creating innovative artifacts and associated design theories, DSR advances knowledge and offers insights into their impact on application contexts [117]. The paradigm is increasingly recognized for its potential to bridge theory and practice: systematically designing and evaluating innovative artifacts addresses complex operational challenges while contributing to methodological advancement in the field.
DSR is a pivotal framework in financial studies and computer science for analyzing systemic stability, in blockchain [118,119,120] and smart contracts [121] for enhancing security and adaptability, and in sustainability research [122,123] for modeling adaptive responses to complex challenges. Our DSR-based research process comprises five main phases (Figure 1), beginning with problem identification, where we identified the critical need for automated classification of cryptocurrency sustainability news, particularly in distinguishing genuine environmental initiatives from superficial claims. This led to the objective definition phase, where we established clear goals.
Fine-tuning the BERT model for text classification enhances its ability to generalize and significantly outperforms traditional bag-of-words approaches in identifying helpful and unhelpful content [124,125]. In the artifact design phase, we developed a comprehensive evaluation framework incorporating five state-of-the-art transformer models: DistilBERT-base-uncased, Twitter-RoBERTa-base-sentiment-latest, Prompt-Guard-86M, RoBERTa-large-MNLI, and FinBERT. The implementation process involved standardized text preprocessing, token optimization, and consistent batch-size configuration across all models to ensure a fair comparison after fine-tuning.
DistilBERT-base-uncased is a transformer model, smaller and faster than BERT, pretrained on the same corpus in a self-supervised fashion with the BERT base model as a teacher. It was pretrained on raw text only, with no human labeling (which is why it can draw on large amounts of publicly available data), using an automatic process to generate inputs and labels from those texts via the BERT base model. More precisely, it was pretrained with three objectives: (1) distillation loss, training the model to return the same probabilities as the BERT base model; (2) masked language modeling (MLM), part of the original BERT training loss, in which the model randomly masks 15% of the words in an input sentence, runs the entire masked sentence through the network, and must predict the masked words; and (3) cosine embedding loss, training the model to generate hidden states as close as possible to those of the BERT base model. In this way, the model learns the same inner representation of English as its teacher while being faster for inference and downstream tasks.
Twitter-RoBERTa-base-sentiment-latest is a RoBERTa-base model trained on ~124 M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis on the TweetEval benchmark. The model is suitable for English and provides enhanced capability for processing contemporary cryptocurrency discourse.
Prompt-Guard-86M is a classifier trained on a large corpus of attacks, capable of detecting both explicitly malicious prompts and data containing injected inputs. The model addresses prompt injection (inputs that exploit the concatenation of untrusted data from third parties and users into a model’s context window to make the model execute unintended instructions) and jailbreaks (malicious instructions designed to override the safety and security features built into a model).
RoBERTa-large-MNLI is the RoBERTa-large model fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus; the underlying model was pretrained on English text with a masked language modeling (MLM) objective.
FinBERT is a pre-trained NLP model for analyzing the sentiment of financial text. It is built by further training the BERT language model on a large financial corpus, thereby fine-tuning it for financial sentiment classification; it produces softmax outputs over three labels: positive, negative, or neutral.
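To make the standardized setup concrete, the following minimal sketch loads each of the five checkpoints with a fresh three-way classification head (positive, negative, neutral) via the Hugging Face transformers library. The helper function and the ignore_mismatched_sizes option are illustrative choices for this sketch, not details of our exact training configuration.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hugging Face checkpoint IDs of the five evaluated models (see Section 4.2).
CHECKPOINTS = [
    "distilbert/distilbert-base-uncased",
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "meta-llama/Prompt-Guard-86M",
    "FacebookAI/roberta-large-mnli",
    "ProsusAI/finbert",
]

def load_for_finetuning(checkpoint: str):
    """Load a checkpoint with a 3-way classification head
    (positive / negative / neutral sustainability impact)."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=3,
        ignore_mismatched_sizes=True,  # tolerate pre-existing heads of a different size
    )
    return tokenizer, model
```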
Our evaluation framework implements a multi-dimensional assessment protocol incorporating four primary evaluation vectors: classification performance, computational efficiency, model reliability, and prediction faithfulness. The classification performance assessment employs comprehensive metric suites, including overall accuracy, precision, recall, and F1-scores, with both weighted and per-class assessments to capture nuanced performance characteristics. Computational efficiency analysis encompasses detailed inference time measurements, memory utilization monitoring, training throughput assessment, and total floating-point operations calculation, providing quantitative measures of resource utilization efficiency.
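As an illustrative sketch of the classification-performance vector, the weighted and per-class metric suite can be assembled as follows (the use of scikit-learn here is an assumption for illustration; equivalent routines exist in most ML toolkits):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def classification_metrics(y_true, y_pred):
    """Overall accuracy plus weighted and per-class precision/recall/F1,
    matching the metric suite described above."""
    accuracy = accuracy_score(y_true, y_pred)
    p_w, r_w, f1_w, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    p_c, r_c, f1_c, support = precision_recall_fscore_support(
        y_true, y_pred, average=None, zero_division=0)
    return {
        "accuracy": accuracy,
        "weighted": {"precision": p_w, "recall": r_w, "f1": f1_w},
        "per_class": {"precision": p_c, "recall": r_c, "f1": f1_c,
                      "support": support},
    }
```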
Model reliability evaluation employs sophisticated analytical techniques that examine confidence distributions, entropy characteristics, and uncertainty quantification. The confidence analysis framework assesses mean confidence values, standard deviations, and systematically identifies low-confidence predictions, while entropy measurements provide a quantitative assessment of prediction uncertainty. High-uncertainty sample identification protocols enable a detailed understanding of model limitations and edge cases.
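A minimal sketch of the confidence and entropy computations follows; the thresholds used to flag low-confidence and high-uncertainty samples are illustrative placeholders, not the exact cut-offs of our framework:

```python
import torch
import torch.nn.functional as F

def reliability_metrics(logits: torch.Tensor,
                        conf_threshold: float = 0.7,     # illustrative cut-off
                        entropy_threshold: float = 0.5): # illustrative cut-off
    """Confidence and entropy statistics over a batch of raw logits
    with shape (n_samples, n_classes)."""
    probs = F.softmax(logits, dim=-1)
    confidence = probs.max(dim=-1).values                      # top-class probability
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # prediction uncertainty
    return {
        "mean_confidence": confidence.mean().item(),
        "confidence_std": confidence.std().item(),
        "mean_entropy": entropy.mean().item(),
        "entropy_std": entropy.std().item(),
        "n_low_confidence": int((confidence < conf_threshold).sum()),
        "n_high_uncertainty": int((entropy > entropy_threshold).sum()),
    }
```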
The faithfulness assessment framework implements rigorous evaluation of prediction stability through multiple complementary dimensions, including stability scores, mean confidence changes, and confidence change standard deviations. This methodological approach ensures a comprehensive understanding of model consistency and reliability under varying input conditions and computational constraints.
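The sketch below illustrates one way such a stability assessment can be computed, comparing predictions on original and perturbed inputs; the perturbation function is a user-supplied placeholder, and the exact perturbation scheme of our framework is not reproduced here:

```python
import torch
import torch.nn.functional as F

def faithfulness_metrics(model, tokenizer, texts, perturb, device="cpu"):
    """Prediction stability under a text perturbation. `perturb` is a
    user-supplied function mapping a text to a slightly modified variant."""
    model.eval().to(device)

    def predict(batch):
        enc = tokenizer(batch, truncation=True, padding=True,
                        max_length=512, return_tensors="pt").to(device)
        with torch.no_grad():
            probs = F.softmax(model(**enc).logits, dim=-1)
        return probs.argmax(-1).cpu(), probs.max(-1).values.cpu()

    labels, conf = predict(texts)
    labels_pert, conf_pert = predict([perturb(t) for t in texts])
    stable = (labels == labels_pert).float()
    delta = (conf - conf_pert).abs()
    return {
        "prediction_stability": stable.mean().item(),    # fraction of unchanged labels
        "stability_score": 100.0 * stable.mean().item(),
        "mean_confidence_change": delta.mean().item(),
        "confidence_change_std": delta.std().item(),
    }
```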
Training dynamics analysis employs systematic epoch-wise monitoring of critical parameters, including training loss progression, validation loss characteristics, accuracy trajectories, and gradient behavior patterns. Gradient norm monitoring enables quantitative assessment of training stability and convergence characteristics. The evaluation process maintains strictly standardized conditions across all models, ensuring comparative validity and reproducibility of results.
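For illustration, the global gradient norm monitored during training can be computed after each backward pass with a few lines of PyTorch (a sketch of the instrumentation, not our exact training loop):

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """Global L2 norm over all parameter gradients; call after
    loss.backward() and before optimizer.step()."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5
```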
The implementation methodology maintains rigorous standardization protocols across all evaluated models, employing consistent preprocessing techniques, uniform batch size configurations, and equivalent evaluation protocols. Performance assessment encompasses classification metrics across three sentiment categories, loss progression monitoring, runtime efficiency evaluation, and throughput analysis for both training and inference phases. Reliability evaluation incorporates confidence distribution analysis, uncertainty quantification, entropy assessment, and stability measurements. Efficiency metrics include detailed resource utilization monitoring, memory usage tracking, processing speed assessment, and computational cost analysis, while faithfulness analysis examines prediction stability, confidence consistency, and model behavior stability under varying conditions.
This comprehensive methodological framework ensures systematic and thorough assessment of all critical model characteristics while maintaining reproducibility and scientific rigor. The evaluation protocols enable a detailed understanding of model performance across multiple dimensions, facilitating informed decision-making for practical deployment scenarios in cryptocurrency sustainability analysis applications. Through this rigorous methodological approach, we ensure a comprehensive evaluation of all relevant model characteristics while maintaining strict academic standards and practical applicability.
To ensure accessibility for diverse audiences, we clarify key technical concepts used throughout this paper. Prediction stability refers to the consistency of model outputs when processing similar inputs, a critical factor for reliable deployment. Entropy measures the uncertainty in model predictions, with lower values indicating higher confidence. The F1-score represents the harmonic mean of precision and recall, providing a balanced measure of classification performance. These metrics, explained in detail in subsequent sections, enable a comprehensive evaluation of model reliability beyond simple accuracy measures.
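In symbols, for precision P, recall R, and a predicted class distribution p over C classes, the F1-score and entropy are:

```latex
\mathrm{F1} = \frac{2PR}{P + R},
\qquad
H(p) = -\sum_{c=1}^{C} p_c \log p_c
```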

4. Results and Findings

4.1. Problem Identification and Objective Definition

The exponential growth of cryptocurrency technologies has created an urgent need for a systematic assessment of their environmental implications. Our comprehensive literature review revealed a critical gap in automated tools for analyzing sustainability-related cryptocurrency news, particularly in distinguishing between genuine environmental initiatives and superficial claims. This gap is especially significant given the increasing scrutiny of cryptocurrency’s environmental impact and the growing demand for reliable sustainability metrics in the blockchain sector.
Based on this identified problem, we established the following design objectives (DOs):
  • DO1. Develop an efficient framework for automated classification of cryptocurrency sustainability news using minimal computational resources and smaller datasets, aligning with sustainable AI principles.
  • DO2. Leverage transfer learning capabilities of transformer architectures to improve classification accuracy while reducing training requirements.
  • DO3. Implement robust mechanisms to minimize hallucination risks in LLMs when processing cryptocurrency news.
  • DO4. Create a deployable solution that maintains high accuracy while ensuring computational efficiency and practical applicability.

4.2. Artifact Design and Development

In alignment with DSR principles, our artifact development process focused on creating a sustainable and efficient solution for cryptocurrency sustainability news classification. The foundation of our research infrastructure comprises a carefully curated dataset of 300 synthetic cryptocurrency sustainability news articles (data available at https://huggingface.co/datasets/arad1367/sustainability_impact_crypto_data), maintaining balanced class distribution with 100 samples each for positive, negative, and neutral sustainability impacts. Following established machine learning practices for model evaluation, we implemented a rigorous train-validation-test split strategy. The complete dataset of 300 synthetic cryptocurrency sustainability news articles was divided as follows: 80% (240 samples) allocated for training, 10% (30 samples) for validation, and 10% (30 samples) for independent testing. This split ensures robust model evaluation while maintaining sufficient training data for effective fine-tuning of transformer architectures. The test set was held out entirely during the training and validation phases, serving exclusively for final performance assessment. Our deliberate use of synthetic data is grounded in established fine-tuning methodologies for transformer models, where synthetic datasets serve as efficient mechanisms for domain adaptation of pre-trained architectures. The synthetic news generation process employed advanced language models with carefully designed prompts incorporating authentic elements: real cryptocurrency projects and technologies, documented environmental metrics from academic literature, genuine sustainability initiatives, and technical terminology characteristic of cryptocurrency sustainability discourse. Each synthetic sample underwent rigorous two-stage expert validation by cryptocurrency technology experts (verifying technical accuracy) and sustainability experts (assessing environmental plausibility), with samples failing either stage being revised or excluded. This approach provides methodological advantages for comparative benchmarking: balanced class distribution eliminates confounding effects, manageable size enables meticulous individual sample validation, and controlled generation ensures systematic isolation of model performance characteristics.
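The 240/30/30 allocation can be reproduced with a stratified two-stage split, sketched below (the random seed shown is an illustrative choice):

```python
from sklearn.model_selection import train_test_split

def split_80_10_10(texts, labels, seed=42):
    """Stratified 80/10/10 split; on the 300-sample dataset this yields
    the 240/30/30 train/validation/test allocation described above."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```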
The artifact design phase emphasized model selection based on comprehensive technical criteria and practical deployment considerations. After thorough evaluation of the current state-of-the-art in NLP, we selected five transformer architectures from the Hugging Face ecosystem, each offering distinct advantages for our specific use case. The primary selection criteria included computational efficiency, model size, pre-training domain relevance, and demonstrated performance in similar classification tasks.
Our first selected model, DistilBERT-base-uncased (Link to model: https://huggingface.co/distilbert/distilbert-base-uncased), represents an optimization-focused approach, employing knowledge distillation techniques to achieve computational efficiency while maintaining robust performance. The model’s architecture, pre-trained on BookCorpus and English Wikipedia with a vocabulary size of 30,000 tokens, utilizes multiple training objectives including distillation loss, masked language modeling, and cosine embedding loss [126]. This design choice specifically addresses our requirement for efficient deployment in resource-constrained environments.
The second model, Twitter-RoBERTa-base-sentiment-latest (Link to model: https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest), brings specialized sentiment analysis capabilities through its training on 124 million tweets from 2018 to 2021 [127,128]. This model’s temporal relevance and domain adaptation to social media content make it particularly suitable for analyzing contemporary cryptocurrency discourse. The model’s three-label classification architecture aligns precisely with our task requirements for distinguishing between positive, negative, and neutral sustainability impacts.
Our third selection, Prompt-Guard-86M (Link to model: https://huggingface.co/meta-llama/Prompt-Guard-86M) from Meta, introduces a specialized security-focused architecture with 86 million backbone parameters and 192 million word embedding parameters. Built on the mDeBERTa-v3-base architecture, this model offers multilingual capabilities and enhanced prediction stability, crucial for maintaining reliable classification performance across diverse input conditions.
RoBERTa-large-MNLI (Link to model: https://huggingface.co/FacebookAI/roberta-large-mnli), our fourth model, leverages comprehensive pre-training on 160 GB of diverse text data and subsequent fine-tuning on the Multi-Genre Natural Language Inference corpus [129]. This model’s sophisticated natural language understanding capabilities and demonstrated cross-domain performance make it valuable for interpreting complex sustainability claims within cryptocurrency contexts.
The fifth model, FinBERT (Link to model: https://huggingface.co/ProsusAI/finbert), provides domain-specific expertise through its specialized training on financial corpora. Its architecture, optimized for financial sentiment analysis, offers particular advantages in interpreting the technical and financial aspects of cryptocurrency sustainability reporting.
Our implementation methodology emphasizes standardization across all models to ensure fair comparison and reliable evaluation. The preprocessing pipeline implements uniform text handling procedures, including consistent tokenization approaches and sequence length standardization at 512 tokens. Training configurations maintain strict control over batch size optimization, learning rate calibration, and gradient accumulation strategies to ensure training stability across all models.
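A minimal sketch of this uniform preprocessing step follows; every model receives identically truncated and padded inputs at the standardized 512-token sequence length:

```python
def preprocess(tokenizer, texts):
    """Identical tokenization for every model: truncation and padding
    to the standardized 512-token sequence length."""
    return tokenizer(
        texts,
        truncation=True,
        padding="max_length",
        max_length=512,
        return_tensors="pt",
    )
```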
The evaluation framework incorporates comprehensive performance monitoring across multiple dimensions, including classification accuracy, computational efficiency, and prediction stability. Resource utilization tracking provides crucial insights into deployment feasibility, while stability assessment protocols ensure reliable performance under various input conditions.
This systematic approach to artifact design and development ensures both academic rigor and practical applicability, creating a robust foundation for evaluating transformer-based approaches to cryptocurrency sustainability classification. The diverse model selection enables thorough comparison across architectural approaches and domain specializations, while maintaining focus on real-world deployment considerations.

4.3. Demonstration and Evaluation

This section presents a detailed fine-tuning analysis of the five transformer-based models evaluated for cryptocurrency sustainability news classification. Each model demonstrates unique characteristics in learning progression and computational efficiency. Table 1 shows the complete training dynamics for all models, including metrics such as training loss, validation loss, test loss, accuracy, and gradient norm across fine-tuning epochs. Test loss values represent the final epoch performance on the held-out test set, calculated using the same cross-entropy loss function as the training and validation phases. These metrics are reported for the final epoch (epoch 5) across all models: RoBERTa-large-MNLI (0.01), DistilBERT-base-uncased (0.84), Twitter-RoBERTa-base-sentiment-latest (0.10), Prompt-Guard-86M (0.30), and FinBERT (0.39).
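For reference, the cross-entropy loss used in all three phases is, for N samples, one-hot labels y, and predicted probabilities p over the C = 3 classes:

```latex
\mathcal{L}_{\mathrm{CE}}
  = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log p_{i,c}
```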
A critical examination of the training, validation, and test loss trajectories reveals distinct fitting patterns across the models. RoBERTa-large-MNLI demonstrates exemplary fitting characteristics, with all three loss metrics (training: 0.02, validation: 0.01, test: 0.01) converging harmoniously. The consistent minimization across all loss types, coupled with the achievement of perfect accuracy, indicates optimal model capacity utilization without overfitting. This pattern suggests that the model has successfully captured the underlying data distribution without memorizing training-specific noise. The accuracy trajectories across models reveal distinct learning patterns that complement the loss analysis. RoBERTa-large-MNLI showcases exceptional learning efficiency, progressing from a moderate 0.58 to perfect 1.00 accuracy within four epochs. This rapid improvement, combined with the model’s lowest gradient norm after 5 epochs (1.33), indicates highly efficient parameter updates and optimal optimization dynamics. The low gradient norm suggests the model achieves superior performance through precise, well-calibrated updates rather than aggressive parameter adjustments.
Twitter-RoBERTa-base-sentiment-latest presents an interesting case of controlled overfitting. The final training loss (0.04) drops below both validation (0.10) and test losses (0.10), indicating some degree of overfitting to the training data. However, the stability of the test loss throughout training and high accuracy suggest that this overfitting is well-regulated and does not significantly impair generalization. The model maintains a healthy balance between learning task-specific features and avoiding excessive specialization to training data. Twitter-RoBERTa-base-sentiment-latest demonstrates impressive early learning capabilities, achieving 0.83 accuracy in the first epoch and quickly improving to 0.97 by epoch 3. The relatively low gradient norm after 5 epochs (2.57) and consistent accuracy in later epochs suggest the model reaches a stable optimal state efficiently. This combination of stable gradient norm and high accuracy indicates effective learning without requiring aggressive parameter adjustments.
DistilBERT-base-uncased exhibits signs of underfitting, evidenced by the relatively high and consistent test loss (0.84) throughout training. The narrow gap between final training (0.85) and validation losses (0.84) suggests that the model has not fully captured the complexity of the underlying data distribution. This underfitting pattern, combined with a plateauing accuracy at 0.93, indicates that the model’s capacity or architecture might be limiting its ability to learn more nuanced features from the data. DistilBERT-base-uncased shows a unique accuracy progression pattern, with a significant jump from 0.72 to 0.92 between epochs 1 and 2, followed by minimal improvement to 0.93. The moderate gradient norm after 5 epochs (2.89) remains constant throughout training, suggesting steady but conservative parameter updates. This pattern, combined with the plateauing accuracy, indicates the model reaches its architectural capacity limits despite maintaining stable optimization characteristics.
Prompt-Guard-86M presents a unique pattern of severe initial underfitting followed by recovery. The substantial initial disparity between training (3.06) and test losses gradually narrows, with final training loss (0.24) approaching but remaining below the test loss. This pattern, combined with the steady test loss (0.31), suggests successful recovery from underfitting while maintaining generalization capability. The high gradient norm after 5 epochs (8.84) indicates aggressive parameter updates were necessary to overcome the initial underfitting state. Prompt-Guard-86M exhibits the most dramatic accuracy improvement, starting at a low 0.37 and reaching 0.85 by the final epoch. The high gradient norm correlates with this substantial improvement, indicating aggressive parameter updates were necessary to achieve acceptable performance. This pattern suggests the model required stronger optimization signals to overcome its initial poor performance, though the final accuracy remains lower than most other models.
FinBERT shows a balanced convergence pattern with evidence of mild underfitting in early epochs. The final equilibrium between training (0.43), validation (0.40), and test losses (0.40) indicates successful resolution of initial underfitting tendencies. FinBERT demonstrates steady accuracy improvements from 0.53 to 0.95, with a notable acceleration between epochs 3 and 4 (0.70 to 0.90). The highest gradient norm among all models after 5 epochs (9.56) suggests that achieving this performance required significant parameter adjustments. Despite the aggressive updates indicated by the high gradient norm, the model maintains stable generalization, suggesting a robust architecture capable of handling strong optimization signals.
Across all models, the relationship between training, validation, and test losses provides crucial insights into their learning dynamics. The disparity between these metrics reveals the extent of overfitting or underfitting, with larger gaps between training and test losses indicating potential overfitting (as seen in Twitter-RoBERTa), while consistently high losses across all metrics suggest underfitting (as observed in DistilBERT). RoBERTa-large-MNLI’s harmonious convergence of all three loss metrics represents the ideal scenario, where the model achieves strong performance while maintaining excellent generalization capabilities.
These patterns have important implications for model selection and deployment in cryptocurrency sustainability news classification tasks. Models exhibiting controlled overfitting like Twitter-RoBERTa might be suitable for scenarios where high accuracy is crucial and the training data closely matches the deployment environment. Conversely, models showing robust generalization despite higher losses, like FinBERT, might be more appropriate for applications where reliability across diverse data distributions is paramount.
The relationship between accuracy progression and gradient norms provides valuable insights into model optimization dynamics:
  • Efficiency Spectrum: Models demonstrate varying levels of learning efficiency, from RoBERTa-large-MNLI’s precise updates (low gradient norm, high accuracy) to FinBERT’s more aggressive optimization approach (high gradient norm, high accuracy).
  • Stability Patterns: Lower gradient norms (RoBERTa-large-MNLI, Twitter-RoBERTa) generally correlate with more stable accuracy improvements, while higher norms (Prompt-Guard-86M, FinBERT) indicate more dramatic learning adjustments.
  • Architectural Implications: The gradient norm magnitude appears related to architectural characteristics, with lighter models (DistilBERT, Twitter-RoBERTa) showing more conservative updates compared to larger or specialized models.
These insights enhance model selection criteria for practical applications. RoBERTa-large-MNLI’s combination of low gradient norm and perfect accuracy makes it ideal for scenarios requiring both performance and stability. Twitter-RoBERTa’s efficient learning pattern suits applications needing quick deployment with reliable performance. FinBERT’s high accuracy despite aggressive updates suggests robustness suitable for challenging classification tasks, while Prompt-Guard-86M’s learning pattern indicates potential for improvement through extended training or architectural modifications.
The analysis of loss patterns, accuracy progression, and gradient dynamics provides a nuanced understanding of each model’s strengths and limitations, enabling more informed decisions in model selection and optimization strategies for cryptocurrency sustainability news classification tasks. Figure 2 shows the evolution of key performance metrics (training loss, validation loss, accuracy, and gradient norm) during the fine-tuning process of five transformer models, demonstrating their learning dynamics and optimization patterns across five epochs. The trends reveal distinct convergence behaviors, from RoBERTa-large-MNLI’s efficient optimization to Prompt-Guard-86M’s dramatic recovery from initial underfitting, while highlighting the relationship between loss trajectories, accuracy improvements, and parameter update magnitudes.
The comparative analysis of transformer architectures (Table 2 and Figure 3) reveals distinctive patterns in computational efficiency across multiple performance dimensions. The evaluation of inference efficiency demonstrates significant variations among the models, with DistilBERT achieving exceptional performance at 0.00064 milliseconds per sample. This represents a 5.36× improvement over RoBERTa-large-MNLI’s 0.00343 ms/sample, highlighting the effectiveness of knowledge distillation techniques in optimizing computational efficiency. Twitter-RoBERTa and FinBERT demonstrate intermediate inference speeds of 0.00132 and 0.00123 ms/sample, respectively, offering balanced performance characteristics.
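Per-sample inference latency of the kind reported here can be measured along the following lines; the warm-up and repetition counts are illustrative choices rather than our exact benchmarking protocol:

```python
import time
import torch

def ms_per_sample(model, encodings, n_warmup=5, n_runs=20):
    """Mean inference latency per sample in milliseconds for a
    pre-tokenized batch (`encodings` from a tokenizer with
    return_tensors="pt")."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):        # warm up kernels and caches
            model(**encodings)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # flush pending GPU work before timing
        start = time.perf_counter()
        for _ in range(n_runs):
            model(**encodings)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    batch_size = encodings["input_ids"].shape[0]
    return 1000.0 * elapsed / (n_runs * batch_size)
```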
GPU memory utilization across all architectures exhibits remarkably efficient resource management, with requirements spanning a narrow range from 0.0132 to 0.0225 MB. Twitter-RoBERTa and RoBERTa-large-MNLI achieve optimal memory efficiency at 0.0132 MB, while DistilBERT maintains moderate consumption at 0.0151 MB. Prompt-Guard-86M and FinBERT show marginally higher requirements at 0.0225 MB, though the maximum differential of 0.0093 MB between models remains negligible for practical deployment considerations.
Training performance metrics reveal a clear stratification among the architectures. DistilBERT establishes benchmark efficiency with a training runtime of 19.68 s while processing 60.99 samples per second. Twitter-RoBERTa and FinBERT maintain competitive throughput rates of 47.34 and 45.30 samples per second, respectively, with corresponding training durations of 25.35 and 26.49 s. In contrast, Prompt-Guard-86M and RoBERTa-large-MNLI demonstrate more resource-intensive characteristics, with the latter requiring 90.57 s for training completion and achieving 13.25 samples per second throughput.
The FLOPS analysis reveals substantial variations in computational intensity, ranging from DistilBERT’s efficient 9.62 × 10¹² operations to RoBERTa-large-MNLI’s intensive 63.34 × 10¹² operations. Twitter-RoBERTa, FinBERT, and Prompt-Guard-86M occupy intermediate positions with 17.88 × 10¹², 19.12 × 10¹², and 20.35 × 10¹² FLOPS, respectively. This distribution underscores the computational cost escalation associated with increased model complexity, particularly evident in RoBERTa-large-MNLI’s requirements.
These findings illuminate critical trade-offs between computational efficiency and model sophistication. DistilBERT’s consistent superior performance across efficiency metrics positions it as an optimal choice for resource-constrained deployments. While RoBERTa-large-MNLI demonstrates higher resource requirements, its computational intensity may be justified in applications demanding maximum model capability. Twitter-RoBERTa and FinBERT emerge as balanced alternatives, offering moderate resource utilization while maintaining competitive performance characteristics. The minimal variation in GPU memory consumption across architectures suggests that memory constraints may be less critical in model selection than inference time or training throughput considerations. These insights provide valuable guidance for practitioners in selecting appropriate architectures based on specific deployment constraints and performance requirements.
The evaluation of model reliability metrics (Table 3 and Figure 4) reveals distinct patterns in prediction confidence and uncertainty characteristics across the transformer architectures.
Model reliability assessment requires understanding several key metrics. Mean confidence represents the average certainty level of model predictions (0–1 scale, where 1 indicates maximum confidence). Entropy quantifies prediction uncertainty, with values near 0 indicating decisive classifications and higher values suggesting ambiguity. High uncertainty samples are cases where the model struggles to make confident predictions, while low confidence predictions indicate outputs where the model itself signals reduced certainty. These metrics collectively assess whether models can reliably self-evaluate their prediction quality.
RoBERTa-large-MNLI demonstrates exceptional predictive reliability with a mean confidence of 0.99 and the lowest entropy (0.05) among all models. This exceptional performance is reinforced by minimal uncertainty indicators: only 2 high-uncertainty samples and no low-confidence predictions. The low confidence standard deviation (0.03) and entropy standard deviation (0.08) further validate its consistent high-confidence predictions across samples. Twitter-RoBERTa achieves similarly impressive reliability metrics, maintaining a mean confidence of 0.96 with correspondingly low entropy (0.13). Its robust performance is evidenced by only 5 high-uncertainty samples and a single low-confidence prediction. The moderate confidence standard deviation (0.07) suggests stable prediction patterns across different input conditions.
Prompt-Guard-86M exhibits strong but more measured confidence levels (0.88) with moderate entropy (0.29). Its balanced uncertainty profile of 12 high-uncertainty samples and 11 low-confidence predictions indicates a more nuanced decision-making approach. The higher confidence standard deviation (0.15) and entropy standard deviation (0.28) suggest more varied prediction certainty across the input space. FinBERT demonstrates moderate confidence characteristics (0.71) with matching entropy levels (0.71), indicating a balanced approach to prediction certainty. The model’s uncertainty profile shows 11 high-uncertainty samples and 26 low-confidence predictions, suggesting more conservative decision boundaries. The relatively high confidence standard deviation (0.14) and entropy standard deviation (0.20) indicate variable prediction certainty across samples. DistilBERT presents an interesting case of conservative prediction behavior, with the lowest mean confidence (0.44) despite strong accuracy metrics. This conservative approach is reflected in 60 low-confidence predictions and 13 high-uncertainty samples. However, the low confidence standard deviation (0.04) and entropy standard deviation (0.02) suggest this conservative behavior is consistent rather than indicative of model instability.
These patterns reveal important trade-offs between model architecture and prediction reliability. Models with extensive pre-training or domain-specific adaptation (RoBERTa-large-MNLI and Twitter-RoBERTa) demonstrate superior confidence characteristics, while knowledge-distilled and specialized architectures show more varied reliability profiles. These insights are particularly valuable for deployment decisions where prediction confidence and uncertainty quantification are critical considerations. The analysis suggests that architectural choices significantly impact not only prediction accuracy but also the model’s capability to assess its own confidence reliably. This understanding is crucial for applications requiring robust uncertainty estimation and reliable confidence metrics.
The analysis of model faithfulness and stability metrics (Table 4 and Figure 5) reveals significant patterns in prediction consistency across the evaluated transformer architectures. These metrics provide crucial insights into each model’s reliability under varying input conditions. Prediction stability measures how consistently a model classifies similar inputs, a crucial characteristic for real-world deployment where input variations are inevitable. The stability score (0–100 scale) quantifies this consistency, with higher values indicating more reliable behavior. Mean confidence change tracks how much prediction certainty varies across similar inputs, while confidence change standard deviation measures the variability of these changes. Together, these metrics reveal whether a model maintains consistent decision-making patterns or exhibits erratic behavior.
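A minimal sketch of one way these faithfulness metrics can be computed, assuming stability is measured as the fraction of predictions that remain unchanged under small input perturbations (e.g., paraphrases or token dropout); the actual perturbation scheme used in the study may differ:

```python
import numpy as np

def stability_metrics(probs_original, probs_perturbed):
    """Compare predictions on original vs. perturbed versions of the same inputs.

    Both arrays: (n_samples, n_classes) softmax outputs. Assumes one perturbed
    variant per sample; averaging over several variants is analogous.
    """
    pred_orig = probs_original.argmax(axis=1)
    pred_pert = probs_perturbed.argmax(axis=1)
    stability = (pred_orig == pred_pert).mean()  # fraction of unchanged labels
    conf_change = np.abs(probs_original.max(axis=1) - probs_perturbed.max(axis=1))
    return {
        "prediction_stability": stability,        # 0-1 scale
        "stability_score": 100.0 * stability,     # 0-100 scale
        "mean_confidence_change": conf_change.mean(),
        "confidence_change_std": conf_change.std(),
    }
```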
RoBERTa-large-MNLI demonstrates exceptional stability characteristics with the highest prediction stability (0.98) and stability score (98.33) among all evaluated models. Its minimal mean confidence change (0.02) and moderate confidence change standard deviation (0.06) indicate highly consistent decision-making processes across diverse inputs. This performance establishes RoBERTa-large-MNLI as the most stable model in the comparison.
Twitter-RoBERTa and DistilBERT show identical prediction stability coefficients (0.95) and stability scores (95.00), demonstrating robust performance despite their architectural differences. Both maintain minimal mean confidence changes (0.02), though Twitter-RoBERTa exhibits higher variability in confidence changes (STD: 0.07) compared to DistilBERT’s more consistent pattern (STD: 0.02). This similarity in stability metrics, particularly noteworthy for DistilBERT’s compressed architecture, suggests successful knowledge distillation for maintaining prediction reliability.
FinBERT displays moderate stability characteristics with a prediction stability of 0.90 and corresponding stability score of 90.00. The model shows the highest mean confidence change (0.07) among evaluated architectures, paired with moderate confidence change variability (STD: 0.07). This pattern indicates that while FinBERT maintains acceptable stability, it demonstrates more pronounced adjustments in prediction confidence across different input scenarios.
Prompt-Guard-86M exhibits a prediction stability of 0.88 and stability score of 88.33, with a mean confidence change of 0.06 and the highest confidence change variability (STD: 0.10). While these metrics are lower than those of the other models, they remain within acceptable ranges for practical applications. The higher variability in confidence changes might reflect the model’s specialized architecture and security-focused optimization objectives.
These stability patterns have important implications for deployment scenarios. Models with higher stability scores and lower confidence variation, such as RoBERTa-large-MNLI, are particularly suitable for applications requiring consistent and reliable predictions. DistilBERT’s ability to maintain stability despite its compressed architecture offers valuable guidance for resource-constrained applications, while models with lower stability metrics might benefit from additional validation mechanisms in production environments.
The analysis highlights the critical relationship between architectural choices and prediction stability. While larger models like RoBERTa-large-MNLI demonstrate superior stability, the strong performance of compressed models like DistilBERT suggests that architectural efficiency can be achieved without significantly compromising prediction stability.
The analysis of classification performance metrics (Table 5 and Figure 6) reveals distinctive patterns in sentiment classification capabilities across the evaluated transformer architectures. Each model demonstrates unique strengths and trade-offs in distinguishing between positive, neutral, and negative sentiments.
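The per-class precision, recall, and F1 figures discussed below follow the standard definitions; as a reproducibility aid, a minimal sketch using scikit-learn’s classification_report with placeholder labels (not the study’s actual test data):

```python
from sklearn.metrics import classification_report

# Illustrative only: y_true / y_pred stand in for the test-set labels and a
# model's predictions over the three sentiment classes used in the study.
y_true = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "negative", "positive", "positive", "negative"]

# Prints per-class precision/recall/F1 plus the weighted averages
# analogous to those reported in Table 5.
print(classification_report(y_true, y_pred, digits=2))
```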
RoBERTa-large-MNLI achieves exceptional classification performance with perfect precision, recall, and F1-scores (1.00) across all sentiment categories. This uniform excellence across metrics indicates robust feature extraction and optimal decision boundary formation, setting a performance benchmark for sentiment classification tasks.
Twitter-RoBERTa demonstrates strong overall performance with a weighted F1-score of 0.97. The model excels in positive sentiment detection (precision: 1.00, recall: 1.00, F1: 1.00) and shows robust negative sentiment classification (precision: 0.87, recall: 1.00, F1: 0.93). Its neutral class performance (precision: 1.00, recall: 0.92, F1: 0.96) indicates high-confidence classifications with occasional false negatives.
FinBERT achieves impressive results with a weighted F1-score of 0.95, showing perfect precision and recall for positive sentiments. The model’s strength in negative sentiment recall (1.00) coupled with good precision (0.81) suggests effective detection capabilities while occasionally over-classifying negative sentiments. Its neutral classification shows high precision (1.00) with good recall (0.88).
DistilBERT maintains strong performance despite its compressed architecture, achieving a weighted F1-score of 0.93. The model shows particularly balanced performance in neutral sentiment classification (precision: 0.96, recall: 0.96), with slight trade-offs in negative sentiment precision (0.80) and positive sentiment recall (0.91).
Prompt-Guard-86M presents an interesting performance profile with a weighted F1-score of 0.85. While maintaining perfect precision for positive sentiments and strong neutral precision (0.90), the model shows lower precision for negative sentiments (0.65). This asymmetric performance suggests potential challenges in distinguishing negative from neutral content, despite perfect recall (1.00) for negative cases.
These findings provide valuable insights for model selection based on specific application requirements. RoBERTa-large-MNLI’s perfect performance makes it ideal for applications requiring maximum accuracy across all sentiment categories. Twitter-RoBERTa and FinBERT offer strong alternatives with minimal performance trade-offs, while DistilBERT provides an efficient option for resource-constrained scenarios without significant accuracy loss. Prompt-Guard-86M’s unique performance profile may suit applications where high recall for negative sentiments is prioritized over precision. The analysis highlights the importance of considering both aggregate metrics and class-specific performance when selecting models for sentiment analysis tasks, as different architectures exhibit distinct strengths across sentiment categories.
Based on comprehensive evaluation across multiple performance dimensions, RoBERTa-large-MNLI emerges as the optimal model for cryptocurrency sustainability news classification (the deployed model is available at https://huggingface.co/arad1367/crypto_sustainability_news_FacebookAI_roberta-large-mnli, with an interactive demo at https://huggingface.co/spaces/arad1367/Crypto_Sustainability_News_Text_Classifier). This conclusion is supported by several critical performance indicators. The model achieves perfect classification accuracy (1.00) across all sentiment categories with ideal precision and recall scores, demonstrating superior capability in distinguishing between positive, negative, and neutral sustainability impacts. The model’s exceptional training dynamics reveal harmonious convergence of training (0.02), validation (0.01), and test losses (0.01), indicating optimal capacity utilization without overfitting. This is further validated by the lowest gradient norm (1.33) after five epochs, suggesting highly efficient parameter updates and stable optimization dynamics.
RoBERTa-large-MNLI exhibits outstanding reliability metrics with a mean confidence of 0.99 and minimal entropy (0.05), recording only two high-uncertainty samples and no low-confidence predictions. The model’s stability characteristics are equally impressive, with the highest prediction stability (0.98) and stability score (98.33) among all evaluated architectures, complemented by minimal mean confidence change (0.02). While the model’s computational requirements are higher, with 63.34 × 10¹² FLOPS and 0.00343 ms/sample inference time, this trade-off is justified by its superior performance characteristics. The combination of perfect classification accuracy, exceptional stability, and high confidence metrics positions RoBERTa-large-MNLI as the most robust and reliable choice for cryptocurrency sustainability analysis, particularly crucial for applications requiring high-stakes decision-making in environmental impact assessment.
These findings are especially significant given the complexity of cryptocurrency sustainability classification, where nuanced understanding of environmental claims is paramount. RoBERTa-large-MNLI’s comprehensive pre-training on 160 GB of diverse text data, coupled with its sophisticated natural language understanding capabilities, enables superior feature extraction and optimal decision boundary formation. The model’s perfect weighted F1-score (1.00) across all sentiment categories, combined with its exceptional stability and reliability metrics, provides strong evidence for its selection as the primary model for cryptocurrency sustainability news classification tasks.
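Because the selected model is publicly deployed on the Hugging Face Hub (linked above), it can be queried with the standard transformers pipeline API. A minimal sketch follows; the example headline is our own, and the label names returned depend on the uploaded model configuration:

```python
from transformers import pipeline

# Load the publicly deployed fine-tuned model from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="arad1367/crypto_sustainability_news_FacebookAI_roberta-large-mnli",
)

headline = ("The exchange announced that its validators now run entirely "
            "on certified renewable energy.")
print(classifier(headline))  # e.g., [{'label': ..., 'score': ...}]
```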

4.3.1. GPU Processing Characteristics During Model Training

The GPU processing characteristics analysis reveals distinct performance patterns across the evaluated transformer models. Figure 7 illustrates GPU memory clock speed during training, measured in megahertz (MHz); the horizontal axis represents training progression over time, and the vertical axis shows memory clock speed. RoBERTa-large-MNLI demonstrates exceptional stability, maintaining a consistent 6000 MHz throughout training, indicating optimal memory bandwidth utilization. Figure 8 presents GPU Streaming Multiprocessor (SM) clock speed over the same training progression; RoBERTa-large-MNLI achieves and sustains 2000 MHz, suggesting efficient parallel processing capabilities. This consistent high-frequency operation contrasts with the variable clock speeds of the other models, particularly Prompt-Guard-86M’s fluctuating patterns. The stable clock speeds correlate with RoBERTa-large-MNLI’s superior classification performance, suggesting that effective hardware resource utilization contributes to its robust learning capabilities.
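Telemetry of this kind (Figures 7–10) is exposed by NVIDIA’s NVML interface; the following minimal sketch uses the pynvml package with an illustrative sampling loop, not the study’s actual instrumentation:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for step in range(5):  # in practice, sample for the full training run
    mem_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)  # MHz
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)    # MHz
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
    mem_used = pynvml.nvmlDeviceGetMemoryInfo(handle).used     # bytes
    print(f"step={step} mem_clock={mem_clock}MHz sm_clock={sm_clock}MHz "
          f"power={power_w:.1f}W mem_used={mem_used}B")
    time.sleep(1)

pynvml.nvmlShutdown()
```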

4.3.2. Resource Utilization Patterns During Model Training

Resource utilization analysis (Figures 9 and 10) demonstrates sophisticated power and memory management characteristics across architectures. Figure 9 displays GPU power consumption during training; the horizontal axis represents training progression, and the vertical axis shows power usage in watts (W). RoBERTa-large-MNLI exhibits remarkable power efficiency, stabilizing at approximately 30 W after the initial training phases while maintaining peak performance metrics, indicating optimized resource utilization without performance compromises. Figure 10 illustrates GPU memory allocation during training, with the vertical axis showing allocated memory in bytes (×10⁹). RoBERTa-large-MNLI’s allocation stabilizes rapidly at 8 × 10⁹ bytes after initial loading, suggesting efficient memory management strategies. This contrasts with Prompt-Guard-86M’s more volatile power usage patterns and FinBERT’s higher sustained power requirements, highlighting RoBERTa-large-MNLI’s superior energy efficiency while maintaining optimal performance characteristics.

4.3.3. System Resource Management and Scalability Metrics

System resource management metrics provide crucial insights into operational efficiency at scale. Figure 11 presents CPU thread utilization during training; the horizontal axis represents training progression, and the vertical axis shows the number of active CPU threads. RoBERTa-large-MNLI demonstrates sophisticated thread management, utilizing 73 CPU threads consistently throughout training, indicating effective parallelization of computational tasks. Figure 12 displays network traffic during training, with the vertical axis showing data transfer volume in bytes (×10⁹). RoBERTa-large-MNLI maintains stable data transfer patterns at approximately 1.4 × 10⁹ bytes, suggesting efficient model parameter distribution and gradient synchronization. This network efficiency, combined with optimal thread utilization, contributes to the model’s superior training dynamics and convergence characteristics, particularly evident in its perfect classification accuracy.

4.3.4. Memory Management and Stability Analysis

Memory management analysis (Figure 13) reveals stability patterns across training epochs. Figure 13 presents process memory utilization during training; the horizontal axis represents training progression, and the vertical axis shows memory usage in megabytes (MB). RoBERTa-large-MNLI maintains consistent process memory utilization around 50,000 MB, demonstrating a stable memory footprint despite complex computational requirements. This steady usage pattern indicates efficient memory allocation and deallocation strategies, crucial for maintaining consistent performance during extended training sessions, and correlates with the model’s superior confidence metrics and prediction stability scores, suggesting that efficient memory management contributes to reliable model behavior. This contrasts with the more variable memory usage of other models, particularly Prompt-Guard-86M’s fluctuating utilization, highlighting RoBERTa-large-MNLI’s superior memory management capabilities.
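The system-level quantities in Figures 11–13 correspond to counters exposed by standard process-monitoring libraries; a minimal sketch with psutil, again our illustration rather than the study’s logging code:

```python
import psutil

proc = psutil.Process()  # the current training process

num_threads = proc.num_threads()                # active CPU threads, cf. Figure 11
net = psutil.net_io_counters()
bytes_total = net.bytes_sent + net.bytes_recv   # cumulative network I/O, cf. Figure 12
rss_mb = proc.memory_info().rss / 1e6           # resident process memory in MB, cf. Figure 13

print(f"threads={num_threads} network_bytes={bytes_total} "
      f"process_memory={rss_mb:.0f}MB")
```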

4.4. Communication

The communication phase of our DSR methodology emphasizes the effective dissemination of research findings to both academic and practitioner audiences, ensuring comprehensive knowledge transfer of our cryptocurrency sustainability news classification framework. Following the guidelines of Peffers et al. (2007) [31], we implement a multi-faceted communication strategy that addresses both theoretical contributions and practical implications while maintaining rigorous academic standards.
Our research contributes to the academic discourse through a comprehensive evaluation framework that advances the understanding of transformer-based approaches in sustainability news classification. The framework’s multi-dimensional assessment protocol, incorporating classification performance, computational efficiency, model reliability, and prediction faithfulness, provides a robust foundation for future research in this domain. The systematic comparison of five state-of-the-art transformer architectures, supported by extensive quantitative analysis, offers valuable insights into the relative strengths and limitations of different approaches. This comparative analysis, encompassing both technical performance metrics and practical deployment considerations, enriches the academic literature on applied NLP in sustainability contexts.
To ensure practical relevance and adoption, we provide clear, actionable guidelines for implementing the classification framework in real-world scenarios, including detailed resource requirement specifications, performance-cost trade-off analyses, and implementation recommendations for different computational environments. The technical documentation encompasses model deployment specifications, API documentation for system integration, performance optimization guidelines, and best practices for model maintenance. Furthermore, we have released our implementation as open-source software, published the fine-tuned models on the Hugging Face platform, and provided example implementations with comprehensive usage scenarios.
To bridge the gap between academic research and practical application, we have developed an interactive web application that enables stakeholders to experiment with the classification framework in real-time, visualize model performance, and understand the implications of different parameter configurations. We maintain a comprehensive repository containing detailed implementation guides (link to code on GitHub: https://github.com/arad1367/Intelligent_Sustainability_Paper), performance benchmarks, use case examples, and troubleshooting guidelines. Our active engagement with research and practitioner communities includes regular updates to our public model repository, responsive handling of community feedback, and collaboration with industry partners for practical validation.
The effectiveness of our communication strategy is evaluated through both academic and practical impact metrics. Academic impact is assessed through citations in scholarly literature, adoption in related research projects, peer review feedback, and integration into broader theoretical frameworks. Practical impact is measured through industry adoption rates, implementation success stories, user feedback metrics, and community contributions. This comprehensive approach to impact assessment ensures that our research maintains both theoretical rigor and practical utility.
Through this integrated communication strategy, we ensure that our research findings contribute meaningfully to both theoretical understanding and practical application in the field of cryptocurrency sustainability analysis. The demonstrated balance between academic rigor and practical utility supports the broader goal of advancing sustainable practices in the cryptocurrency sector through improved information analysis and decision-making capabilities. Our approach aligns with DSR principles by emphasizing both the theoretical contribution to the knowledge base and the practical impact on industry practices.

5. Discussion

This study aimed to develop and evaluate an efficient framework for automated classification of cryptocurrency sustainability news using transformer-based architectures. Our findings directly address our DOs while contributing novel insights to both theory and practice. In response to DO1 (development of an efficient framework), our results demonstrate that transformer architectures can effectively classify cryptocurrency sustainability news with varying degrees of computational efficiency. The superior performance of RoBERTa-large-MNLI (accuracy: 1.00) validates DO2’s focus on leveraging transfer learning capabilities, while its exceptional stability metrics (prediction stability: 0.98) address DO3’s concern for minimizing hallucination risks. The achievement of DO4 (creating a deployable solution) is evidenced by the successful implementation and comprehensive evaluation of five distinct architectural approaches.
RoBERTa-large-MNLI’s F1 score of 1.00 and prediction stability of 0.98, while unusual in noisy real-world scenarios, reflect the controlled nature of our evaluation environment and the capabilities of fine-tuned transformer architectures on well-defined classification tasks. Several factors contribute to these results: First, fine-tuning pre-trained models with hundreds of millions of parameters on 300 expert-validated samples for three-class sentiment classification represents a focused adaptation task, particularly for RoBERTa-large-MNLI pre-trained specifically on natural language inference. Second, our balanced dataset with clear decision boundaries and consistent linguistic markers enables robust pattern learning, analogous to high accuracy on well-curated benchmark datasets in other domains. Third, performance variation across architectures (BART-large-MNLI: F1 0.99, DistilBERT: F1 0.97) demonstrates genuine architectural differences rather than dataset memorization. These results establish performance ceilings under optimal conditions and inform architectural selection, while acknowledging that real-world deployment with noisy, ambiguous data will likely yield lower performance requiring domain-specific validation.
Our findings contribute several significant theoretical advances to the intersection of sustainable finance and natural language processing. First, we extend the theoretical understanding of transfer learning in specialized financial domains by demonstrating that pre-trained transformer architectures can effectively adapt to sustainability assessment tasks. The exceptional performance of RoBERTa-large-MNLI challenges existing theoretical frameworks that posit inevitable trade-offs between model size and reliability. Our results suggest that architectural sophistication can simultaneously enhance both accuracy and stability when properly implemented.
We contribute to the theoretical discourse on model efficiency in sustainability applications by identifying novel patterns in the relationship between architectural complexity and performance characteristics. The competitive performance of lighter architectures like DistilBERT (weighted F1: 0.93) extends existing theories of knowledge distillation by demonstrating their applicability to specialized sustainability assessment tasks. This finding contributes to the growing theoretical framework surrounding efficient AI in sustainable finance.
Our research advances the theoretical understanding of uncertainty quantification in financial text analysis. The observed relationship between model capacity and prediction reliability, particularly RoBERTa-large-MNLI’s exceptional mean confidence (0.99) and minimal entropy (0.05), challenges prevailing theoretical assumptions about the inverse relationship between model complexity and prediction stability. This finding contributes to the broader theoretical discourse on uncertainty estimation in financial machine learning.
The study makes several methodological contributions to the field. The development of a comprehensive evaluation framework incorporating multiple performance dimensions advances existing methodological approaches to model assessment in sustainable finance. The integration of classification accuracy, computational efficiency, and prediction stability metrics provides a more nuanced methodology for evaluating AI systems in sustainability applications. Our approach to stability analysis, particularly the examination of model behavior under varying input conditions, offers a novel methodological framework for assessing robustness in financial text classification systems.
The findings have substantial practical implications for various stakeholders in the cryptocurrency and sustainable finance sectors. For practitioners and financial institutions, our results provide clear guidelines for selecting appropriate model architectures based on specific operational constraints and performance requirements. The identified trade-offs between computational efficiency (ranging from DistilBERT’s 9.62 × 10¹² to RoBERTa-large-MNLI’s 63.34 × 10¹² FLOPS) offer practical guidance for resource allocation decisions. For cryptocurrency projects and sustainability officers, our findings provide a framework for implementing automated sustainability assessment systems. The demonstrated reliability of transformer models in distinguishing genuine environmental initiatives from superficial claims offers practical tools for enhancing transparency in sustainability reporting.
Implementation strategies should consider adopting RoBERTa-large-MNLI for applications requiring maximum accuracy, while DistilBERT offers a viable alternative for resource-constrained environments. Regular model retraining is recommended to maintain performance as sustainability discourse evolves. Operational guidelines should include integration with existing sustainability reporting frameworks, implementation of confidence thresholds for automated decision-making, and regular validation against human expert assessment.
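One way to operationalize the recommended confidence thresholds is to route low-confidence classifications to human review; a minimal sketch, where the 0.90 cutoff is an illustrative assumption rather than a value validated in this study:

```python
def route_prediction(label: str, confidence: float, threshold: float = 0.90):
    """Accept automated classifications only above a confidence threshold;
    defer the rest to human expert review."""
    if confidence >= threshold:
        return {"decision": label, "source": "automated"}
    return {"decision": None, "source": "expert_review_queue"}

print(route_prediction("positive", 0.97))  # accepted automatically
print(route_prediction("negative", 0.62))  # deferred to an expert
```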
Our findings have important implications for regulatory frameworks and policy development in cryptocurrency sustainability assessment. The demonstrated capability to reliably classify sustainability claims could inform the development of automated compliance monitoring systems. Regulators should consider integrating automated assessment tools in compliance frameworks, developing standardized evaluation metrics for sustainability claims, and establishing guidelines for transparency in automated sustainability assessment. For regulators implementing frameworks such as MiCA or SEC environmental disclosure requirements, our benchmarks inform technology deployment decisions based on operational constraints and enforcement priorities.
Our findings provide a robust foundation for these future investigations while contributing to both theoretical understanding and practical implementation of automated sustainability assessment in digital finance. The demonstrated capabilities of transformer architectures in sustainability classification, combined with comprehensive analysis of their limitations and requirements, advance the field while highlighting crucial considerations for future research and development in this rapidly evolving domain.

6. Conclusions

This research advances the understanding of automated sustainability assessment in cryptocurrency markets through a systematic evaluation of transformer-based architectures. Our comprehensive analysis reveals significant insights into the capabilities and limitations of different model architectures while establishing benchmarks for performance, reliability, and computational efficiency in sustainability classification tasks.
The empirical findings demonstrate that sophisticated transformer architectures, particularly RoBERTa-large-MNLI, can achieve exceptional classification accuracy while maintaining high prediction stability and reliability. The perfect classification performance across all sentiment categories (F1: 1.00) suggests that transfer learning from large-scale pre-training can effectively adapt to specialized sustainability assessment tasks. However, this capability comes with substantial computational requirements (63.34 × 10¹² FLOPS), highlighting important trade-offs between performance and resource utilization that must be carefully considered in practical implementations.

6.1. Theoretical Implications

Our research makes several significant theoretical contributions to the intersection of natural language processing, sustainable finance, and cryptocurrency studies.

First, we extend the literature on automated sustainability assessment [11,14,16] and stock markets [130] by demonstrating the efficacy of transformer architectures in distinguishing between genuine environmental initiatives and superficial claims in cryptocurrency contexts. While previous research has explored ESG disclosure analysis in traditional finance, our work establishes the first comprehensive benchmarks specifically for cryptocurrency sustainability claim classification, addressing a critical gap in the literature.

Second, we provide empirical evidence challenging conventional assumptions about the relationship between model size and prediction stability. Our findings show that architectural sophistication can enhance rather than compromise reliability when properly implemented, with RoBERTa-large-MNLI achieving both perfect accuracy (F1: 1.00) and exceptional stability (0.98). This contrasts with traditional machine learning approaches, where increased model complexity often leads to reduced generalization and stability.

Third, we contribute to the understanding of computational efficiency in sustainability assessment systems by identifying specific patterns in the performance-resource utilization relationship. Our multi-dimensional evaluation framework, encompassing accuracy, computational cost, prediction stability, and confidence calibration, advances methodological approaches in NLP evaluation beyond single-metric assessments [31]. The 71× efficiency difference between DistilBERT (0.89 × 10¹² FLOPS) and RoBERTa-large-MNLI (63.34 × 10¹² FLOPS), with only marginal performance trade-offs (F1: 0.97 vs. 1.00), provides valuable insights into the efficiency frontier of transformer architectures for specialized classification tasks.

Fourth, our application of design science research methodology to NLP model evaluation contributes to the growing body of work integrating information systems research paradigms with machine learning applications [31,113,114,115,116,117]. This methodological contribution demonstrates how systematic artifact development and evaluation can enhance the rigor and practical relevance of NLP research in specialized domains.

6.2. Practical Implications

The findings of this research have significant implications for multiple stakeholder groups involved in cryptocurrency sustainability assessment and regulation.

For practitioners and financial analysts, our benchmarks provide evidence-based guidance for selecting appropriate transformer architectures based on specific operational constraints and performance requirements. Organizations with substantial computational resources and requirements for maximum accuracy should consider deploying RoBERTa-large-MNLI, which achieves perfect classification performance with exceptional stability. Conversely, resource-constrained applications or high-throughput scenarios benefit from DistilBERT, which delivers strong performance (F1: 0.97) with 71× greater computational efficiency. This decision framework enables practitioners to optimize the trade-off between accuracy and resource utilization based on their specific use cases.

For regulators implementing frameworks such as the EU’s Markets in Crypto-Assets Regulation (MiCA) and environmental disclosure requirements, our research demonstrates the feasibility of automated, scalable analysis of cryptocurrency sustainability claims. The exceptional performance metrics achieved by transformer models suggest that regulatory technology (RegTech) solutions can effectively process large volumes of sustainability disclosures that exceed manual review capacity. High-stakes enforcement actions may justify computationally expensive models like RoBERTa-large-MNLI for maximum accuracy, while routine monitoring of large-scale disclosures may benefit from efficient models like DistilBERT. The multi-dimensional evaluation framework we provide enables regulators to assess not only accuracy but also prediction stability and confidence calibration, characteristics critical for reliable regulatory decision-making.

For cryptocurrency projects and exchanges, our findings highlight the importance of clear, substantive sustainability communications. The high accuracy achieved by transformer models in distinguishing between positive, negative, and neutral sustainability claims suggests that automated assessment systems can effectively identify genuine environmental initiatives versus superficial greenwashing attempts [11,14]. This capability incentivizes authentic sustainability efforts and transparent communication.

For investors and ESG-focused funds, automated sustainability assessment tools based on our benchmarks can enhance due diligence processes and portfolio screening [16,17]. The ability to systematically analyze sustainability claims at scale enables more informed investment decisions and better alignment with environmental objectives.

6.3. Limitations

Several limitations of this study warrant consideration and provide opportunities for future research.

First, while our synthetic dataset of 300 samples enables meaningful comparative analysis, the relatively small sample size may limit the generalizability of our findings to more diverse real-world scenarios. The use of synthetic data, while methodologically justified for controlled comparative evaluation, presents inherent limitations. Although our synthetic samples underwent rigorous expert validation and incorporate real-world elements (authentic cryptocurrency projects, documented environmental metrics, genuine sustainability initiatives), they may not fully capture: (1) the linguistic diversity and stylistic variations in naturally occurring cryptocurrency news across different media sources; (2) the evolving nature of sustainability discourse as new technologies and environmental concerns emerge; (3) subtle rhetorical strategies employed in actual greenwashing attempts; and (4) the complex interplay of technical, economic, and environmental factors in authentic industry communications. We emphasize that our synthetic dataset serves as a controlled benchmark for architectural comparison, a necessary first step, while domain-specific validation remains essential before production deployment.

Second, our reported performance metrics, including RoBERTa-large-MNLI’s F1 score of 1.00, represent controlled benchmark conditions with balanced, expert-validated data and clear decision boundaries. Real-world cryptocurrency sustainability discourse presents additional challenges, including ambiguous claims, evolving terminology, intentional obfuscation in greenwashing attempts, and noisy data, that will likely reduce performance. Practitioners should interpret our results as establishing architectural capabilities and comparative rankings under optimal conditions, not guaranteed real-world performance across all deployment scenarios.

Third, our focus on English-language content potentially limits the global applicability of our findings, particularly in markets where sustainability discussions occur in multiple languages. Cryptocurrency markets operate globally, and sustainability discourse varies significantly across linguistic and cultural contexts.

Fourth, the rapid evolution of cryptocurrency technologies and sustainability practices suggests potential temporal limitations to our findings. New consensus mechanisms (beyond Proof-of-Work and Proof-of-Stake), emerging sustainability metrics, and evolving regulatory frameworks may introduce terminology and concepts not represented in our training data, potentially affecting model performance over time.

Fifth, while our evaluation framework encompasses multiple performance dimensions (accuracy, computational efficiency, prediction stability, confidence calibration), additional metrics relating to model interpretability and explanation quality could provide valuable insights for practitioner trust and regulatory compliance. Understanding why models make specific classifications is increasingly important for high-stakes applications and regulatory contexts requiring transparent decision-making.
Furthermore, our evaluation focused exclusively on transformer architectures without exploring hybrid approaches that combine neural networks with rule-based post-processing procedures. Such hybrid neural-symbolic methods could potentially enhance interpretability and handle edge cases involving complex linguistic patterns or domain-specific terminology.

6.4. Future Research Directions

The findings and limitations of this study suggest several promising directions for future research.

First and most critically, comprehensive validation using large-scale, naturally occurring cryptocurrency sustainability datasets is essential. Such studies should systematically compare model performance between synthetic and real-world data, identifying specific areas where synthetic training may underrepresent real-world complexity. Research should examine: (1) whether the comparative performance rankings observed in our controlled evaluation persist with real-world data; (2) how model performance degrades (if at all) when confronting the full complexity of authentic sustainability discourse; (3) whether additional fine-tuning with real data significantly improves performance beyond our synthetic baseline; and (4) how models handle emerging sustainability claims and technologies not represented in our synthetic dataset.

Second, the investigation of multilingual model performance and cross-cultural sustainability narratives represents an important direction. Research should explore whether multilingual transformer models maintain comparable performance across languages and whether sustainability sentiment patterns vary systematically across cultural contexts. This work would enhance the global applicability of automated sustainability assessment systems.

Third, research into dynamic fine-tuning strategies and temporal adaptation mechanisms could address the challenge of evolving sustainability discourse. Investigation of continual learning approaches, where models incrementally adapt to new terminology and concepts without catastrophic forgetting of previous knowledge, could maintain classification accuracy as the cryptocurrency sustainability landscape evolves.

Fourth, the computational efficiency analysis reveals important avenues for future investigation. Research into more efficient architectural variants, including knowledge distillation techniques and model compression strategies, could help optimize the balance between performance and resource utilization. Additionally, investigation of domain-specific pre-tuning approaches might enable better feature extraction while reducing computational requirements. Hybrid approaches combining synthetic data for initial fine-tuning with real-world data for domain adaptation could optimize the balance between controlled evaluation and practical applicability.

Fifth, the investigation of multi-modal approaches incorporating both textual and quantitative sustainability indicators might provide more comprehensive assessment capabilities. Combining natural language analysis of sustainability claims with structured data on energy consumption, carbon emissions, and renewable energy adoption could enhance classification accuracy and provide richer insights for decision makers.

Sixth, research into model interpretability and explainability for sustainability classification represents a critical direction. Developing methods to identify which textual features drive classification decisions would enhance practitioner trust, enable model debugging, and support regulatory compliance requirements for transparent automated decision-making.

Seventh, research into the integration of automated classification systems with blockchain-based verification mechanisms could enhance the reliability and transparency of sustainability assessments. Exploring how on-chain data (energy consumption metrics, consensus mechanism parameters) can be combined with off-chain textual analysis could provide more robust and verifiable sustainability evaluations.

Eighth, investigation of active learning strategies, where models identify uncertain predictions for expert review, could enable efficient incorporation of real-world complexity while maintaining quality control. This approach could optimize the balance between automated processing and human expertise in practical deployment scenarios (see the sketch below).

Finally, the methodological framework developed in this study provides a foundation for the systematic evaluation of model performance in sustainability classification tasks. Future research could extend this framework to incorporate additional evaluation dimensions, particularly those relating to model robustness under adversarial conditions, performance stability across different market contexts, and resilience to intentional manipulation attempts in greenwashing scenarios.
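The active learning direction above typically begins with entropy-based uncertainty sampling; a minimal illustration, where the batch size and the selection rule are our assumptions rather than a protocol from this study:

```python
import numpy as np

def select_for_review(probs, k=10):
    """Return indices of the k most uncertain predictions (highest entropy)."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]  # indices of the k highest-entropy samples

# Example with 100 mock softmax outputs over the three sentiment classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(select_for_review(probs, k=5))
```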
While our study focused on pure transformer architectures to establish baseline benchmarks, future work could explore hybrid approaches combining neural networks with rule-based procedures to enhance interpretability and handle complex edge cases in sustainability claim classification.

Author Contributions

Conceptualization, P.B. and M.F.-F.; Methodology, P.B.; Software, P.B.; Validation, M.F.-F.; Formal analysis, P.B.; Investigation, P.B. and Z.G.S.; Data curation, Z.G.S.; Writing—original draft, P.B., M.F.-F. and Z.G.S.; Writing—review & editing, M.F.-F. and Z.G.S.; Visualization, Z.G.S.; Supervision, M.F.-F. and Z.G.S.; Project administration, M.F.-F.; Funding acquisition, M.F.-F. All authors contributed to the study conception, design, material preparation, data collection and analysis. The first draft of the manuscript was written by the corresponding author, and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and analyzed during the current study are available at https://huggingface.co/datasets/arad1367/sustainability_impact_crypto_data. The code used in this study is available at https://github.com/arad1367/Intelligent_Sustainability_Paper/blob/main/Sustainable_Crypto_Paper.ipynb.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DSR     Design Science Research
NLP     natural language processing
PoW     Proof-of-Work
PoS     Proof-of-Stake
LGOs    local governments
EOSL    Energy-Optimized Semantic Loss
DOs     design objectives
SM      Streaming Multiprocessor

References

  1. Anandhabalaji, V.; Babu, M.; Brintha, R. Energy consumption by cryptocurrency: A bibliometric analysis revealing research trends and insights. Energy Nexus 2024, 13, 100274. [Google Scholar] [CrossRef]
  2. Huynh, A.N.Q.; Duong, D.; Burggraf, T.; Luong, H.T.T.; Bui, N.H. Energy consumption and bitcoin market. Asia-Pac. Financ. Mark. 2022, 29, 79–93. [Google Scholar] [CrossRef]
  3. Sapra, N.; Shaikh, I. Impact of Bitcoin mining and crypto market determinants on Bitcoin-based energy consumption. Manag. Financ. 2023, 49, 1828–1846. [Google Scholar] [CrossRef]
  4. Schinckus, C.; Nguyen, C.P.; Chong, F.H.L. Cryptocurrencies’ hashrate and electricity consumption: Evidence from mining activities. Stud. Econ. Financ. 2022, 39, 524–546. [Google Scholar] [CrossRef]
  5. Sedlmeir, J.; Buhl, H.U.; Fridgen, G.; Keller, R. The energy consumption of blockchain technology: Beyond myth. Bus. Inf. Syst. Eng. 2020, 62, 599–608. [Google Scholar] [CrossRef]
  6. Zhang, D.; Chen, X.H.; Lau, C.K.M.; Xu, B. Implications of cryptocurrency energy usage on climate change. Technol. Forecast. Soc. Change 2023, 187, 122219. [Google Scholar] [CrossRef]
  7. Bajra, U.Q.; Ermir, R.; Avdiaj, S. Cryptocurrency blockchain and its carbon footprint: Anticipating future challenges. Technol. Soc. 2024, 77, 102571. [Google Scholar] [CrossRef]
  8. Erdogan, S.; Ahmed, M.Y.; Sarkodie, S.A. Analyzing asymmetric effects of cryptocurrency demand on environmental sustainability. Environ. Sci. Pollut. Res. Int. 2022, 29, 31723–31733. [Google Scholar] [CrossRef] [PubMed]
  9. Kohli, V.; Chakravarty, S.; Chamola, V.; Sangwan, K.S.; Zeadally, S. An analysis of energy consumption and carbon footprints of cryptocurrencies and possible solutions. Digit. Commun. Netw. 2023, 9, 79–89. [Google Scholar] [CrossRef]
  10. Mustafa, F.; Mordi, C.; Elamer, A.A. Green gold or carbon beast? Assessing the environmental implications of cryptocurrency trading on clean water management and carbon emission SDGs. J. Environ. Manag. 2024, 367, 122059. [Google Scholar] [CrossRef]
  11. Hossain, M.R.; Rao, A.; Sharma, G.D.; Dev, D.; Kharbanda, A. Empowering energy transition: Green innovation, digital finance, and the path to sustainable prosperity through green finance initiatives. Energy Econ. 2024, 136, 107736. [Google Scholar] [CrossRef]
  12. Gunay, S.; Sraieb, M.M.; Kaskaloglu, K.; Yıldız, M.E. Cryptocurrencies and global sustainability: Do blockchained sectors have distinctive effects? J. Clean. Prod. 2023, 425, 138943. [Google Scholar] [CrossRef]
  13. Karim, S.; Naeem, M.A.; Tiwari, A.K.; Ashraf, S. Examining the avenues of sustainability in resources and digital blockchains backed currencies: Evidence from energy metals and cryptocurrencies. Ann. Oper. Res. 2023, 1–18. [Google Scholar] [CrossRef] [PubMed]
  14. Mulligan, C.; Morsfield, S.; Cheikosman, E. Blockchain for sustainability: A systematic literature review for policy impact. Telecommun. Policy 2024, 48, 102676. [Google Scholar] [CrossRef]
  15. Mili, R.; Trimech, A.; Benammou, S. Impact of bitcoin transaction volume and energy consumption on environmental sustainability: Evidence through ARDL model. In Proceedings of the 2024 IEEE 15th International Colloquium on Logistics and Supply Chain Management (LOGISTIQUA), Sousse, Tunisia, 2–4 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7. [Google Scholar]
  16. Del Sarto, N.; Scali, E.; Barontini, R. Harnessing the potential of green cryptocurrencies: A path toward climate change mitigation. In Climate Change and Finance; Sustainable Finance; Naifar, N., Ed.; Springer: Berlin/Heidelberg, Germany, 2024; pp. 299–322. [Google Scholar]
  17. Wiwoho, J.; Trinugroho, I.; Kharisma, D.B.; Suwadi, P. Cryptocurrency mining policy to protect the environment. Cogent Soc. Sci. 2024, 10, 2323755. [Google Scholar] [CrossRef]
  18. Ali, F.; Khurram, M.U.; Sensoy, A.; Vo, X.V. Green cryptocurrencies and portfolio diversification in the era of greener paths. Renew. Sustain. Energy Rev. 2024, 191, 114137. [Google Scholar] [CrossRef]
  19. Qudah, H.; Malahim, S.; Airout, R.; AlQudah, M.Z.; Al-Zoubi, W.K.; Huson, Y.A.; Zyadat, A. Unlocking the ESG value of sustainable investments in cryptocurrency: A bibliometric review of research trends. Technol. Anal. Strateg. Manag. 2024, 37, 1341–1355. [Google Scholar] [CrossRef]
  20. Tripathi, G.; Ahad, M.A.; Casalino, G. A comprehensive review of blockchain technology: Underlying principles and historical background with future challenges. Decis. Anal. J. 2023, 9, 100344. [Google Scholar] [CrossRef]
  21. Ali, A.; Jiang, X.; Ali, A. Enhancing corporate sustainable development: Organizational learning, social ties, and environmental strategies. Bus. Strategy Environ. 2023, 32, 1232–1247. [Google Scholar] [CrossRef]
  22. Arnone, G. The social and environmental impact of cryptocurrencies. In Navigating the World of Cryptocurrencies: Technology, Economics, Regulations, and Future Trends; Contributions to Finance and Accounting; Springer: Berlin/Heidelberg, Germany, 2024; pp. 155–170. [Google Scholar] [CrossRef]
  23. Wendl, M.; Doan, M.H.; Sassen, R. The environmental impact of cryptocurrencies using proof of work and proof of stake consensus algorithms: A systematic review. J. Environ. Manag. 2023, 326 Pt A, 116530. [Google Scholar] [CrossRef]
  24. Uzun, I.; Lobachev, M.; Kharchenko, V.; Schöler, T.; Lobachev, I. Candlestick pattern recognition in cryptocurrency price time-series data using rule-based data analysis methods. Computation 2024, 12, 132. [Google Scholar] [CrossRef]
  25. Zhao, Y. Improving life cycle assessment accuracy and efficiency with transformers. In Proceedings of the 3rd International Conference on Advanced Surface Enhancement (INCASE 2023), Singapore, 25–27 September 2023; Lecture Notes in Mechanical Engineering; Springer: Berlin/Heidelberg, Germany, 2024; pp. 417–421. [Google Scholar]
  26. Faiz, A.; Kaneda, S.; Wang, R.; Osi, R.; Sharma, P.; Chen, F.; Jiang, L. Llmcarbon: Modeling the end-to-end carbon footprint of large language models. arXiv 2023, arXiv:2309.14393. [Google Scholar] [CrossRef]
  27. Khowaja, S.A.; Khuwaja, P.; Dev, K.; Wang, W.; Nkenyereye, L. ChatGPT needs SPADE (sustainability, privacy, digital divide, and ethics) evaluation: A review. Cogn. Comput. 2024, 16, 2528–2550. [Google Scholar] [CrossRef]
  28. Li, B.; Jiang, Y.; Gadepally, V.; Tiwari, D. Toward sustainable genai using generation directives for carbon-friendly large language model inference. arXiv 2024, arXiv:2403.12900. [Google Scholar] [CrossRef]
  29. Liu, V.; Yin, Y. Green AI: Exploring carbon footprints, mitigation strategies, and trade offs in large language model training. Discov. Artif. Intell. 2024, 4, 49. [Google Scholar] [CrossRef]
  30. Varoquaux, G.; Luccioni, A.S.; Whittaker, M. Hype, sustainability, and the price of the bigger-is-better paradigm in AI. arXiv 2024, arXiv:2409.14160. [Google Scholar] [CrossRef]
  31. Peffers, K.; Tuunanen, T.; Rothenberger, M.A.; Chatterjee, S. A design science research methodology for information systems research. J. Manag. Inf. Syst. 2007, 24, 45–77. [Google Scholar] [CrossRef]
  32. De Angelis, P.; De Marchis, R.; Marino, M.; Martire, A.L.; Oliva, I. Betting on bitcoin: A profitable trading between directional and shielding strategies. Decis. Econ. Financ. 2021, 44, 883–903. [Google Scholar] [CrossRef]
  33. Gandal, N.; Hamrick, J.T.; Moore, T.; Vasek, M. The rise and fall of cryptocurrency coins and tokens. Decis. Econ. Financ. 2021, 44, 981–1014. [Google Scholar] [CrossRef]
  34. Kang, K.Y. Cryptocurrency and double spending history: Transactions with zero confirmation. Econ. Theory 2023, 75, 453–491. [Google Scholar] [CrossRef]
  35. Ahn, J.; Yi, E.; Kim, M. Blockchain Consensus Mechanisms: A Bibliometric Analysis (2014–2024) Using VOSviewer and R Bibliometrix. Information 2024, 15, 644. [Google Scholar] [CrossRef]
  36. Sasani, F.; Moghareh Dehkordi, M.; Ebrahimi, Z.; Dustmohammadloo, H.; Bouzari, P.; Ebrahimi, P.; Lencsés, E.; Fekete-Farkas, M. Forecasting of Bitcoin Illiquidity Using High-Dimensional and Textual Features. Computers 2024, 13, 20. [Google Scholar] [CrossRef]
  37. Corbet, S.; Urquhart, A.; Yarovaya, L. Cryptocurrency and Blockchain Technology; De Gruyter: Berlin, Germany, 2020. [Google Scholar]
  38. Mohsin, M.; Naseem, S.; Zia-ur-Rehman, M.; Baig, S.A.; Salamat, S. The crypto-trade volume, GDP, energy use, and environmental degradation sustainability: An analysis of the top 20 crypto-trader countries. Int. J. Financ. Econ. 2020, 28, 651–667. [Google Scholar] [CrossRef]
  39. Gemajli, A.; Patel, S.; Bradford, P.G. A low carbon proof-of-work blockchain. arXiv 2024, arXiv:2404.04729. [Google Scholar] [CrossRef]
  40. Kliber, A.; Będowska-Sójka, B. Proof-of-work versus proof-of-stake coins as possible hedges against green and dirty energy. Energy Econ. 2024, 138, 107820. [Google Scholar] [CrossRef]
  41. Li, A.; Gong, K.; Li, J.; Zhang, L.; Luo, X. Is monopolization inevitable in proof-of-work blockchains? Insights from miner scale analysis. Comput. Econ. 2024, 66, 1825–1850. [Google Scholar] [CrossRef]
  42. Wijewardhana, D.; Vidanagamachchi, S.; Arachchilage, N. Examining attacks on consensus and incentive systems in proof-of-work blockchains: A systematic literature review. arXiv 2024, arXiv:2411.00349. [Google Scholar] [CrossRef]
  43. Treiblmaier, H. A comprehensive research framework for Bitcoin’s energy use: Fundamentals, economic rationale, and a pinch of thermodynamics. Blockchain Res. Appl. 2023, 4, 100149. [Google Scholar] [CrossRef]
  44. Hajiaghapour-Moghimi, M.; Tafreshi, O.H.; Hajipour, E.; Vakilian, M.; Lehtonen, M. Investigating the cryptocurrency mining loads’ high penetration impact on electric power grid. IEEE Access 2024, 12, 153643–153663. [Google Scholar] [CrossRef]
  45. Li, J.; Li, N.; Peng, J.; Cui, H.; Wu, Z. Energy consumption of cryptocurrency mining: A study of electricity consumption in mining cryptocurrencies. Energy 2019, 168, 160–168. [Google Scholar] [CrossRef]
  46. Majumder, S.; Aravena, I.; Xie, L. An econometric analysis of large flexible cryptocurrency-mining consumers in electricity markets. arXiv 2024, arXiv:2408.12014. [Google Scholar] [CrossRef]
  47. Zheng, M.; Feng, G.-F.; Zhao, X.; Chang, C.-P. The transaction behavior of cryptocurrency and electricity consumption. Financ. Innov. 2023, 9, 44. [Google Scholar] [CrossRef]
  48. Altın, H. The impact of energy efficiency and renewable energy consumption on carbon emissions in G7 countries. Int. J. Sustain. Eng. 2024, 17, 134–142. [Google Scholar] [CrossRef]
  49. Holechek, J.L.; Geli, H.M.E.; Sawalhah, M.N.; Valdez, R. A global assessment: Can renewable energy replace fossil fuels by 2050? Sustainability 2022, 14, 4792. [Google Scholar] [CrossRef]
  50. Owusu, P.A.; Asumadu-Sarkodie, S. A review of renewable energy sources, sustainability issues and climate change mitigation. Cogent Eng. 2016, 3, 1167990. [Google Scholar] [CrossRef]
  51. Rahman, A.; Murad, S.M.W.; Mohsin, A.K.M.; Wang, X. Does renewable energy proactively contribute to mitigating carbon emissions in major fossil fuels consuming countries? J. Clean. Prod. 2024, 452, 142113. [Google Scholar] [CrossRef]
  52. Singh, M.K.; Malek, J.; Sharma, H.K.; Kumar, R. Converting the threats of fossil fuel-based energy generation into opportunities for renewable energy development in India. Renew. Energy 2024, 224, 120153. [Google Scholar] [CrossRef]
  53. Yeong, Y.-C.; Kalid, K.S.; Savita, K.S.; Ahmad, M.N.; Zaffar, M. Sustainable cryptocurrency adoption assessment among IT enthusiasts and cryptocurrency social communities. Sustain. Energy Technol. Assess. 2022, 52, 102085. [Google Scholar] [CrossRef]
  54. Bhimani, A.; Hausken, K.; Arif, S. Do national development factors affect cryptocurrency adoption? Technol. Forecast. Soc. Change 2022, 181, 121739. [Google Scholar] [CrossRef]
  55. di Prisco, D.; Strangio, D. Technology and financial inclusion: A case study to evaluate potential and limitations of Blockchain in emerging countries. Technol. Anal. Strateg. Manag. 2021, 37, 448–461. [Google Scholar] [CrossRef]
  56. Mhlanga, D. Blockchain for digital financial inclusion towards reduced inequalities. In FinTech and Artificial Intelligence for Sustainable Development: The Role of Smart Technologies in Achieving Development Goals; Sustainable Development Goals Series; Springer Nature: Cham, Switzerland, 2023; pp. 175–192. [Google Scholar]
57. Ganesh, C.; Orlandi, C.; Tschudi, D. Proof-of-stake protocols for privacy-aware blockchains. In Advances in Cryptology—EUROCRYPT 2019; Ishai, Y., Rijmen, V., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11476, pp. 486–515. [Google Scholar]
58. Grandjean, D.; Heimbach, L.; Wattenhofer, R. Ethereum proof-of-stake consensus layer: Participation and decentralization. In Financial Cryptography and Data Security: FC 2024 International Workshops; Budurushi, J., Kulyk, O., Allen, S., Diamandis, T., Klages-Mundt, A., Bracciali, A., Goodell, G., Matsuo, S., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2025; Volume 14746, pp. 287–310. [Google Scholar]
  59. Leporati, A.; Rovida, L. Looking for stability in proof-of-stake based consensus mechanisms. Blockchain Res. Appl. 2024, 5, 100222. [Google Scholar] [CrossRef]
60. Li, W.; Andreina, S.; Bohli, J.-M.; Karame, G. Securing proof-of-stake blockchain protocols. In Data Privacy Management, Cryptocurrencies and Blockchain Technology; Garcia-Alfaro, J., Navarro-Arribas, G., Hartenstein, H., Herrera-Joancomarti, J., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10436, pp. 299–315. [Google Scholar]
  61. Saad, M.; Qin, Z.; Ren, K.; Nyang, D.; Mohaisen, D. e-PoS: Making proof-of-stake decentralized and fair. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1961–1973. [Google Scholar] [CrossRef]
  62. Tang, W. Trading and wealth evolution in the Proof of Stake protocol. arXiv 2023, arXiv:2308.01803. [Google Scholar] [CrossRef]
63. Wu, S.; Song, Z.; Wei, P.; Tang, P.; Yuan, Q. Improving privacy of anonymous proof-of-stake protocols. In Cryptology and Network Security: CANS 2023; Deng, J., Kolesnikov, V., Schwarzmann, A.A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2023; Volume 14342, pp. 368–391. [Google Scholar]
  64. De Angelis, S.; Lombardi, F.; Zanfino, G.; Aniello, L.; Sassone, V. Security and dependability analysis of blockchain systems in partially synchronous networks with Byzantine faults. Int. J. Parallel Emergent Distrib. Syst. 2023, 1–21. [Google Scholar] [CrossRef]
65. Larangeira, M.; Karakostas, D. The security of delegated proof-of-stake wallet and stake pools. In Blockchains; Advances in Information Security; Ruj, S., Kanhere, S.S., Conti, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2024; Volume 105, pp. 135–150. [Google Scholar]
  66. Tas, E.N.; Tse, D.; Gai, F.; Kannan, S.; Maddah-Ali, M.A.; Yu, F. Bitcoin-enhanced proof-of-stake security: Possibilities and impossibilities. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–24 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 126–145. [Google Scholar]
  67. Bochkay, K.; Brown, S.V.; Leone, A.J.; Tucker, J.W. Textual analysis in accounting: What’s next? Contemp. Account. Res. 2022, 40, 765–805. [Google Scholar] [CrossRef]
  68. Chan, S.W.K.; Chong, M.W.C. Sentiment analysis in financial texts. Decis. Support Syst. 2017, 94, 53–64. [Google Scholar] [CrossRef]
  69. Huang, A.H.; Wang, H.; Yang, Y. FinBERT: A large language model for extracting information from financial text. Contemp. Account. Res. 2023, 40, 806–841. [Google Scholar] [CrossRef]
  70. Khalil, F.; Pipa, G. Is deep-learning and natural language processing transcending the financial forecasting? Investigation through lens of news analytic process. Comput. Econ. 2022, 60, 147–171. [Google Scholar] [CrossRef]
  71. Lewis, C.; Young, S. Fad or future? Automated analysis of financial text and its implications for corporate reporting. Account. Bus. Res. 2019, 49, 587–615. [Google Scholar] [CrossRef]
  72. Todd, A.; Bowden, J.; Moshfeghi, Y. Text-based sentiment analysis in finance: Synthesising the existing literature and exploring future directions. Intell. Syst. Account. Financ. Manag. 2024, 31, e1549. [Google Scholar] [CrossRef]
  73. Wang, L.; Cheng, Y.; Xiang, A.; Zhang, J.; Yang, H. Application of natural language processing in financial risk detection. arXiv 2024, arXiv:2406.09765. [Google Scholar] [CrossRef]
  74. Xing, F.Z.; Cambria, E.; Welsch, R.E. Natural language based financial forecasting: A survey. Artif. Intell. Rev. 2018, 50, 49–73. [Google Scholar] [CrossRef]
  75. Mishev, K.; Gjorgjevikj, A.; Vodenska, I.; Chitkushev, L.T.; Trajanov, D. Evaluation of sentiment analysis in finance: From lexicons to transformers. IEEE Access 2020, 8, 131662–131682. [Google Scholar] [CrossRef]
  76. Rizinski, M.; Peshov, H.; Mishev, K.; Jovanovik, M.; Trajanov, D. Sentiment analysis in finance: From transformers back to explainable lexicons (XLex). IEEE Access 2024, 12, 7170–7198. [Google Scholar] [CrossRef]
  77. Yang, L.; Kenny, E.; Ng, T.L.J.; Yang, Y.; Smyth, B.; Dong, R. Generating plausible counterfactual explanations for deep transformers in financial text classification. arXiv 2020, arXiv:2010.12512. [Google Scholar] [CrossRef]
  78. Rao, D.; Huang, S.; Jiang, Z.; Deverajan, G.G.; Patan, R. A dual deep neural network with phrase structure and attention mechanism for sentiment analysis. Neural Comput. Appl. 2021, 33, 11297–11308. [Google Scholar] [CrossRef]
79. Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. arXiv 2014, arXiv:1406.6247. [Google Scholar] [CrossRef]
  80. Bashiri, H.; Naderi, H. Comprehensive review and comparative analysis of transformer models in sentiment analysis. Knowl. Inf. Syst. 2024, 66, 7305–7361. [Google Scholar] [CrossRef]
  81. Fatouros, G.; Soldatos, J.; Kouroumali, K.; Makridis, G.; Kyriazis, D. Transforming sentiment analysis in the financial domain with ChatGPT. Mach. Learn. Appl. 2023, 14, 100508. [Google Scholar] [CrossRef]
  82. Garcia-Diaz, J.A.; Garcia-Sanchez, F.; Valencia-Garcia, R. Smart analysis of economics sentiment in spanish based on linguistic features and transformers. IEEE Access 2023, 11, 14211–14224. [Google Scholar] [CrossRef]
  83. Gutiérrez-Fandiño, A.; Kolm, P.N.; Armengol-Estapé, J. FinEAS: Financial embedding analysis of sentiment. arXiv 2021, arXiv:2111.00526. [Google Scholar] [CrossRef]
  84. Naseem, U.; Razzak, I.; Musial, K.; Imran, M. Transformer based deep intelligent contextual embedding for Twitter sentiment analysis. Future Gener. Comput. Syst. 2020, 113, 58–69. [Google Scholar] [CrossRef]
  85. Luo, W.; Gong, D. Pre-trained large language models for financial sentiment analysis. arXiv 2024, arXiv:2401.05215. [Google Scholar] [CrossRef]
  86. Mao, Y.; Liu, Q.; Zhang, Y. Sentiment analysis methods, applications, and challenges: A systematic literature review. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102048. [Google Scholar] [CrossRef]
  87. Barandas, M.; Famiglini, L.; Campagner, A.; Folgado, D.; Simão, R.; Cabitza, F.; Gamboa, H. Evaluation of uncertainty quantification methods in multi-label classification: A case study with automatic diagnosis of electrocardiogram. Inf. Fusion 2024, 101, 101978. [Google Scholar] [CrossRef]
  88. Nie, Y.; Kong, Y.; Dong, X.; Mulvey, J.M.; Poor, H.V.; Wen, Q.; Zohren, S. A survey of large language models for financial applications: Progress, prospects and challenges. arXiv 2024, arXiv:2406.11903. [Google Scholar] [CrossRef]
  89. Angin, M.; Taşdemir, B.; Yılmaz, C.A.; Demiralp, G.; Atay, M.; Angin, P.; Dikmener, G.A. RoBERTa approach for automated processing of sustainability reports. Sustainability 2022, 14, 16139. [Google Scholar] [CrossRef]
  90. Nagarajan, R.; Narayanasamy, S.K.; Thirunavukarasu, R.; Raj, P. Intelligent Systems and Sustainable Computational Models: Concepts, Architecture, and Practical Applications, 1st ed.; Auerbach Publications: Boca Raton, FL, USA, 2024. [Google Scholar]
  91. Olawumi, T.O.; Chan, D.W.M. Cloud-based sustainability assessment (CSA) system for automating the sustainability decision-making process of built assets. Expert Syst. Appl. 2022, 188, 116020. [Google Scholar] [CrossRef]
  92. Wulff, D.U.; Meier, D.S.; Mata, R. Using novel data and ensemble models to improve automated labeling of sustainable development goals. Sustain. Sci. 2024, 19, 1773–1787. [Google Scholar] [CrossRef]
  93. Chelli, M.; Durocher, S.; Fortin, A. Substantive and symbolic strategies sustaining the environmentally friendly ideology: A media-sensitive analysis of the discourse of a leading French utility. Account. Audit. Account. J. 2019, 32, 1013–1042. [Google Scholar] [CrossRef]
  94. Anderson, T.; Sarkar, S.; Kelley, R. Analyzing public sentiment on sustainability: A comprehensive review and application of sentiment analysis techniques. Nat. Lang. Process. J. 2024, 8, 100097. [Google Scholar] [CrossRef]
  95. Rocca, L.; Giacomini, D.; Zola, P. Environmental disclosure and sentiment analysis: State of the art and opportunities for public-sector organisations. Meditari Account. Res. 2021, 29, 617–646. [Google Scholar] [CrossRef]
  96. Diwanji, V.S.; Baines, A.F.; Bauer, F.; Clark, K. Green consumerism: A cross-cultural linguistic and sentiment analysis of sustainable consumption discourse on Twitter (X). J. Curr. Issues Res. Advert. 2024, 45, 476–505. [Google Scholar] [CrossRef]
  97. Baxter, K.; Ghandour, R.; Histon, W. Greenwashing and brand perception—A consumer sentiment analysis on organisations accused of greenwashing. Int. J. Internet Mark. Advert. 2024, 21, 149–179. [Google Scholar] [CrossRef]
  98. Berthelot, A.; Caron, E.; Jay, M.; Lefèvre, L. Estimating the environmental impact of Generative-AI services using an LCA-based methodology. Procedia CIRP 2024, 122, 707–712. [Google Scholar] [CrossRef]
  99. Konya, A.; Nematzadeh, P. Recent applications of AI to environmental disciplines: A review. Sci. Total Environ. 2024, 906, 167705. [Google Scholar] [CrossRef]
  100. Ligozat, A.-L.; Lefevre, J.; Bugeau, A.; Combaz, J. Unraveling the hidden environmental impacts of AI solutions for environment. arXiv 2021, arXiv:2110.11822. [Google Scholar] [CrossRef]
  101. van Wynsberghe, A. Sustainable AI: AI for sustainability and the sustainability of AI. AI Ethics 2021, 1, 213–218. [Google Scholar] [CrossRef]
  102. Farsi, M.; Hosseinian-Far, A.; Daneshkhah, A.; Sedighi, T. Mathematical and computational modelling frameworks for integrated sustainability assessment (ISA). In Strategic Engineering for Cloud Computing and Big Data Analytics; Hosseinian-Far, A., Ramachandran, M., Sarwar, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; pp. 1–22. [Google Scholar]
  103. Ansar, W.; Goswami, S.; Chakrabarti, A. A survey on transformers in NLP with focus on efficiency. arXiv 2024, arXiv:2406.16893. [Google Scholar] [CrossRef]
  104. Cahyawijaya, S. Greenformers: Improving computation and memory efficiency in transformer models via low-rank approximation. arXiv 2021, arXiv:2108.10808. [Google Scholar] [CrossRef]
  105. Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. AI Open 2022, 3, 111–132. [Google Scholar] [CrossRef]
  106. Nedjah, N.; Mourelle, L.d.M.; dos Santos, R.A.; dos Santos, L.T.B. Sustainable maintenance of power transformers using computational intelligence. Sustain. Technol. Entrep. 2022, 1, 100001. [Google Scholar] [CrossRef]
  107. Pati, S.; Aga, S.; Islam, M.; Jayasena, N.; Sinclair, M.D. Computation vs. communication scaling for future transformers on future hardware. arXiv 2023, arXiv:2302.02825. [Google Scholar] [CrossRef]
  108. Tang, Y.; Wang, Y.; Guo, J.; Tu, Z.; Han, K.; Hu, H.; Tao, D. A survey on transformer compression. arXiv 2024, arXiv:2402.05964. [Google Scholar] [CrossRef]
  109. Zhuang, B.; Liu, J.; Pan, Z.; He, H.; Weng, Y.; Shen, C. A survey on efficient training of transformers. arXiv 2023, arXiv:2302.01107. [Google Scholar] [CrossRef]
  110. Mukherjee, S.; Beard, C.; Song, S. Transformers for green semantic communication: Less energy, more semantics. arXiv 2023, arXiv:2310.07592. [Google Scholar] [CrossRef]
  111. Tschand, A.; Rajan, A.T.R.; Idgunji, S.; Ghosh, A.; Holleman, J.; Kiraly, C.; Ambalkar, P.; Borkar, R.; Chukka, R.; Cockrell, T.; et al. MLPerf power: Benchmarking the energy efficiency of machine learning systems from microwatts to megawatts for sustainable AI. arXiv 2024, arXiv:2410.12032. [Google Scholar] [CrossRef]
  112. Gowda, S.N.; Hao, X.; Li, G.; Gowda, S.N.; Jin, X.; Sevilla-Lara, L. Watt for what: Rethinking deep learning’s energy-performance relationship. arXiv 2023, arXiv:2310.06522. [Google Scholar] [CrossRef]
  113. Doyle, C.; Sammon, D.; Neville, K. A design science research (DSR) case study: Building an evaluation framework for social media enabled collaborative learning environments (SMECLEs). J. Decis. Syst. 2016, 25 (Suppl. 1), 125–144. [Google Scholar] [CrossRef]
  114. Ebrahimi, P.; Schneider, J. Fine-tuning image-to-text models on Liechtenstein tourist attractions. Electron. Mark. 2025, 35, 55. [Google Scholar] [CrossRef]
  115. Tuunanen, T.; Winter, R.; vom Brocke, J. Dealing with complexity in design science research: A methodology using design echelons. MIS Q. 2024, 48, 427–458. [Google Scholar] [CrossRef]
  116. Guggenberger, T.; Schellinger, B.; von Wachter, V.; Urbach, N. Kickstarting blockchain: Designing blockchain-based tokens for equity crowdfunding. Electron. Commer. Res. 2024, 24, 239–273. [Google Scholar] [CrossRef]
117. vom Brocke, J.; Hevner, A.; Maedche, A. (Eds.) Introduction to design science research. In Design Science Research Cases; Progress in IS; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–13. [Google Scholar]
  118. Alotaibi, E.M.; Issa, H.; Codesso, M. Blockchain-based conceptual model for enhanced transparency in government records: A design science research approach. Int. J. Inf. Manag. Data Insights 2025, 5, 100304. [Google Scholar] [CrossRef]
  119. Ballandies, M.C.; Holzwarth, V.; Sunderland, B.; Pournaras, E.; Vom Brocke, J. Advancing customer feedback systems with blockchain. Bus. Inf. Syst. Eng. 2024, 67, 449–471. [Google Scholar] [CrossRef]
  120. Cerchione, R. Design and evaluation of a blockchain-based system for increasing customer engagement in circular economy. Corp. Soc. Responsib. Environ. Manag. 2024, 32, 160–175. [Google Scholar] [CrossRef]
  121. Anh, V.N.H. An organizational modeling for developing smart contracts on blockchain-based supply chain finance systems. Procedia Comput. Sci. 2024, 239, 3–10. [Google Scholar] [CrossRef]
122. Ballandies, M.C.; Dapp, M.M.; Degenhart, B.A.; Helbing, D. Finance 4.0: Design principles for a value-sensitive cryptoeconomic system to address sustainability. arXiv 2021, arXiv:2105.11955. [Google Scholar] [CrossRef]
  123. Diniz, E.H.; de Araujo, M.H.; Alves, M.A.; Gonzalez, L. Design principles for sustainable community currency projects. Sustain. Sci. 2024, 20, 1169–1183. [Google Scholar] [CrossRef]
  124. Bilal, M.; Almazroi, A.A. Effectiveness of fine-tuned BERT model in classification of helpful and unhelpful online customer reviews. Electron. Commer. Res. 2023, 23, 2737–2757. [Google Scholar] [CrossRef]
  125. Hashmi, E.; Yayilgan, S.Y. A robust hybrid approach with product context-aware learning and explainable AI for sentiment analysis in Amazon user reviews. Electron. Commer. Res. 2024. [Google Scholar] [CrossRef]
  126. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar] [CrossRef]
  127. Camacho-Collados, J.; Rezaee, K.; Riahi, T.; Ushio, A.; Loureiro, D.; Antypas, D.; Boisson, J.; Espinosa Anke, L.; Liu, F.; Cámara, E.M. TweetNLP: Cutting-edge natural language processing for social media. arXiv 2022, arXiv:2206.14774. [Google Scholar] [CrossRef]
  128. Loureiro, D.; Barbieri, F.; Neves, L.; Anke, L.E.; Camacho-Collados, J. TimeLMs: Diachronic language models from Twitter. arXiv 2022, arXiv:2202.03829. [Google Scholar] [CrossRef]
129. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar] [CrossRef]
130. di Tollo, G.; Andria, J.; Filograsso, G. The predictive power of social media sentiment: Evidence from cryptocurrencies and stock markets using NLP and stochastic ANNs. Mathematics 2023, 11, 3441. [Google Scholar] [CrossRef]
Figure 1. DSR methodology for text classification of cryptocurrency news through a sustainability lens.
Figure 2. Performance metrics evolution during fine-tuning of transformer models for cryptocurrency sustainability classification (Source: authors’ calculations, tracked with Weights and Biases).
Figure 3. Computational efficiency metrics during fine-tuning of transformer models for cryptocurrency sustainability classification (Source: authors’ calculations, tracked with Weights and Biases).
Figure 4. Model reliability metrics during fine-tuning of transformer models for cryptocurrency sustainability classification (Source: authors’ calculations, tracked with Weights and Biases).
Figure 5. Faithfulness and stability metrics during fine-tuning of transformer models for cryptocurrency sustainability classification (Source: authors’ calculations, tracked with Weights and Biases).
Figure 6. Classification performance metrics during fine-tuning of transformer models for cryptocurrency sustainability classification (Source: authors’ calculations, tracked with Weights and Biases).
Figure 7. GPU memory clock speed (MHz) during model training. The horizontal axis represents training progression, and the vertical axis shows memory clock speed in MHz. RoBERTa-large-MNLI maintains a consistent 6000 MHz, indicating optimal memory bandwidth utilization. (Source: authors’ calculations, tracked with Weights and Biases).
Figure 8. GPU Streaming Multiprocessor (SM) clock speed (MHz) during model training. The horizontal axis represents training progression, and the vertical axis shows SM clock speed in MHz. RoBERTa-large-MNLI maintains a consistent 2000 MHz, indicating efficient parallel processing. (Source: authors’ calculations, tracked with Weights and Biases).
Figure 9. GPU power usage (W) during model training. The horizontal axis represents training progression, and the vertical axis shows power consumption in watts. RoBERTa-large-MNLI stabilizes at ~30 W, demonstrating efficient power management. (Source: authors’ calculations, tracked with Weights and Biases).
Figure 10. GPU memory allocated (bytes ×10⁹) during model training. The horizontal axis represents training progression, and the vertical axis shows allocated memory. RoBERTa-large-MNLI stabilizes at 8 × 10⁹ bytes, indicating efficient memory management. (Source: authors’ calculations, tracked with Weights and Biases).
Figure 11. Process CPU threads in use during model training. The horizontal axis represents training progression, and the vertical axis shows the number of active threads. RoBERTa-large-MNLI maintains 73 threads, indicating effective parallelization. (Source: authors’ calculations, tracked with Weights and Biases).
Figure 12. Network traffic (bytes ×10⁹) during model training. The horizontal axis represents training progression, and the vertical axis shows data transfer volume. RoBERTa-large-MNLI maintains ~1.4 × 10⁹ bytes, indicating efficient network utilization. (Source: authors’ calculations, tracked with Weights and Biases).
Figure 13. Process memory utilization (MB) during model training. The horizontal axis represents training progression, and the vertical axis shows memory usage in MB. RoBERTa-large-MNLI maintains ~50,000 MB, demonstrating stable memory management. (Source: authors’ calculations, tracked with Weights and Biases).
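The run-level telemetry in Figures 7–13 is the kind of system data that Weights and Biases samples automatically once a run is initialized. The following minimal sketch shows a tracked fine-tuning setup of this sort; the project name, batch size, and learning rate are illustrative assumptions, not the authors’ exact configuration.

```python
# Minimal sketch of a W&B-tracked fine-tuning run (assumed configuration).
# After wandb.init(), Weights & Biases records GPU power, clock speeds,
# allocated memory, CPU threads, and network traffic in the background,
# which is the kind of telemetry shown in Figures 7-13.
import wandb
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

wandb.init(project="crypto-sustainability-claims")  # hypothetical project name

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="runs/roberta-large-mnli",
    num_train_epochs=5,               # five epochs, as in Table 1
    per_device_train_batch_size=8,    # assumption; not reported in the tables
    learning_rate=2e-5,               # assumption; not reported in the tables
    evaluation_strategy="epoch",      # evaluate once per epoch
    logging_strategy="epoch",         # log once per epoch
    report_to="wandb",                # stream metrics to Weights & Biases
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()
```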
Table 1. Complete training dynamics for all models.

Model | Epoch | Training Loss | Validation Loss | Accuracy | Gradient Norm

DistilBERT-base-uncased (test loss in epoch 5 is 0.84)
  1 | 1.08 | 1.05 | 0.72 | 2.12
  2 | 1.02 | 0.98 | 0.92 | 2.22
  3 | 0.94 | 0.91 | 0.93 | 2.45
  4 | 0.88 | 0.86 | 0.93 | 3.24
  5 | 0.85 | 0.84 | 0.93 | 2.89

Twitter-RoBERTa-base-sentiment-latest (test loss in epoch 5 is 0.10)
  1 | 0.53 | 0.40 | 0.83 | 8.33
  2 | 0.24 | 0.18 | 0.95 | 5.07
  3 | 0.11 | 0.13 | 0.97 | 11.81
  4 | 0.05 | 0.10 | 0.97 | 2.81
  5 | 0.04 | 0.10 | 0.97 | 2.57

Prompt-Guard-86M (test loss in epoch 5 is 0.30)
  1 | 3.06 | 2.36 | 0.37 | 33.43
  2 | 0.93 | 0.47 | 0.77 | 11.53
  3 | 0.44 | 0.45 | 0.78 | 9.93
  4 | 0.29 | 0.36 | 0.83 | 8.32
  5 | 0.24 | 0.31 | 0.85 | 8.84

RoBERTa-large-MNLI (test loss in epoch 5 is 0.01)
  1 | 1.17 | 0.94 | 0.58 | 12.71
  2 | 0.58 | 0.31 | 0.95 | 10.55
  3 | 0.20 | 0.07 | 0.98 | 23.41
  4 | 0.05 | 0.02 | 1.00 | 1.95
  5 | 0.02 | 0.01 | 1.00 | 1.33

FinBERT (test loss in epoch 5 is 0.39)
  1 | 2.20 | 1.72 | 0.53 | 24.61
  2 | 1.31 | 1.18 | 0.60 | 7.64
  3 | 0.81 | 0.63 | 0.70 | 6.43
  4 | 0.50 | 0.44 | 0.90 | 8.97
  5 | 0.43 | 0.40 | 0.95 | 9.56
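The per-epoch accuracy column in Table 1 can be reproduced with a standard evaluation callback attached to the fine-tuning loop; the minimal sketch below assumes integer class ids, while the loss columns (and, in recent Trainer versions, the gradient norm) come from the trainer’s own epoch-level logging.

```python
# Sketch of the per-epoch accuracy computation behind Table 1, passed to the
# Hugging Face Trainer as compute_metrics=compute_metrics.
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # most probable class per sample
    return {"accuracy": accuracy_score(labels, predictions)}
```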
Table 2. Computational efficiency metrics for transformer architectures.

Model | Inference Time (ms/Sample) | GPU Memory (MB) | Training Runtime (s) | Train Samples/s | FLOPS (×10¹²)
DistilBERT-base-uncased | 0.00064 | 0.0151 | 19.68 | 60.99 | 9.62
Twitter-RoBERTa-base-sentiment-latest | 0.00132 | 0.0132 | 25.35 | 47.34 | 17.88
Prompt-Guard-86M | 0.00325 | 0.0225 | 72.15 | 16.63 | 20.35
RoBERTa-large-MNLI | 0.00343 | 0.0132 | 90.57 | 13.25 | 63.34
FinBERT | 0.00123 | 0.0225 | 26.49 | 45.30 | 19.12
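The latency and memory figures in Table 2 are the kind of measurements obtainable with basic CUDA instrumentation, while total training FLOPs are available separately from the Hugging Face Trainer’s `total_flos` training output. The helper below is an illustrative sketch, not the authors’ exact profiling code; its name and calling convention are assumptions.

```python
# Sketch of per-sample latency and peak GPU memory measurement
# (Table 2-style metrics).
import time
import torch

def profile_inference(model, encoded_inputs, num_samples, device="cuda"):
    model.to(device).eval()
    batch = {k: v.to(device) for k, v in encoded_inputs.items()}
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)            # exclude previously queued work
    start = time.perf_counter()
    with torch.no_grad():
        model(**batch)
    torch.cuda.synchronize(device)            # wait for the forward pass
    per_sample_ms = (time.perf_counter() - start) * 1000 / num_samples
    peak_mb = torch.cuda.max_memory_allocated(device) / 2**20
    return per_sample_ms, peak_mb
```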
Table 3. Model reliability metrics.

Model | Mean Confidence | Mean Entropy | High Uncertainty Samples | Low Confidence Predictions | Confidence STD | Entropy STD
DistilBERT-base-uncased | 0.44 | 1.07 | 13 | 60 | 0.04 | 0.02
Twitter-RoBERTa-base-sentiment-latest | 0.96 | 0.13 | 5 | 1 | 0.07 | 0.13
Prompt-Guard-86M | 0.88 | 0.29 | 12 | 11 | 0.15 | 0.28
RoBERTa-large-MNLI | 0.99 | 0.05 | 2 | 0 | 0.03 | 0.08
FinBERT | 0.71 | 0.71 | 11 | 26 | 0.14 | 0.20
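All of Table 3’s columns derive from the softmax distribution over the three sentiment classes: confidence is the top-class probability, and predictive entropy measures how dispersed the distribution is (near zero for a peaked, confident prediction). A sketch follows; the 0.90 confidence and 0.5 entropy cut-offs used to count low-confidence and high-uncertainty samples are illustrative assumptions, since the table does not state its thresholds.

```python
# Sketch of the softmax-based reliability metrics in Table 3.
import torch
import torch.nn.functional as F

def reliability_metrics(logits, conf_threshold=0.90, entropy_threshold=0.5):
    probs = F.softmax(logits, dim=-1)          # (n_samples, n_classes)
    confidence = probs.max(dim=-1).values      # top-class probability
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return {
        "mean_confidence": confidence.mean().item(),
        "mean_entropy": entropy.mean().item(),
        "high_uncertainty_samples": int((entropy > entropy_threshold).sum()),
        "low_confidence_predictions": int((confidence < conf_threshold).sum()),
        "confidence_std": confidence.std().item(),
        "entropy_std": entropy.std().item(),
    }
```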
Table 4. Faithfulness and stability metrics.

Model | Prediction Stability | Stability Score | Mean Confidence Change | Confidence Change STD
DistilBERT-base-uncased | 0.95 | 95.00 | 0.02 | 0.02
Twitter-RoBERTa-base-sentiment-latest | 0.95 | 95.00 | 0.02 | 0.07
Prompt-Guard-86M | 0.88 | 88.33 | 0.06 | 0.10
RoBERTa-large-MNLI | 0.98 | 98.33 | 0.02 | 0.06
FinBERT | 0.90 | 90.00 | 0.07 | 0.07
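Table 4’s stability figures compare predictions before and after small input perturbations: prediction stability is the share of unchanged labels (the stability score is the same quantity in percent), while the confidence-change columns track how much the top-class probability moves. The sketch below uses random single-token deletion as the perturbation, which is an assumption rather than the paper’s exact scheme.

```python
# Sketch of a perturbation-based stability check in the spirit of Table 4.
import random
import statistics
import torch
import torch.nn.functional as F

def stability_metrics(model, tokenizer, texts, device="cuda"):
    model.to(device).eval()
    unchanged, confidence_shifts = 0, []
    for text in texts:
        words = text.split()
        perturbed = text
        if len(words) > 1:                    # drop one random token
            kept = words[:]
            kept.pop(random.randrange(len(kept)))
            perturbed = " ".join(kept)
        top_probs, top_classes = [], []
        for variant in (text, perturbed):
            enc = tokenizer(variant, return_tensors="pt",
                            truncation=True).to(device)
            with torch.no_grad():
                probs = F.softmax(model(**enc).logits, dim=-1)[0]
            top_probs.append(probs.max().item())
            top_classes.append(int(probs.argmax()))
        unchanged += int(top_classes[0] == top_classes[1])
        confidence_shifts.append(abs(top_probs[0] - top_probs[1]))
    n = len(texts)
    return {
        "prediction_stability": unchanged / n,
        "stability_score": 100.0 * unchanged / n,  # same quantity in percent
        "mean_confidence_change": statistics.fmean(confidence_shifts),
        "confidence_change_std": statistics.pstdev(confidence_shifts),
    }
```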
Table 5. Classification performance metrics for cryptocurrency sustainability analysis.

Model | Class | Precision | Recall | F1-Score | Weighted F1
DistilBERT-base-uncased | Negative | 0.80 | 0.92 | 0.86 | 0.93
 | Neutral | 0.96 | 0.96 | 0.96 |
 | Positive | 1.00 | 0.91 | 0.95 |
Twitter-RoBERTa-base-sentiment-latest | Negative | 0.87 | 1.00 | 0.93 | 0.97
 | Neutral | 1.00 | 0.92 | 0.96 |
 | Positive | 1.00 | 1.00 | 1.00 |
Prompt-Guard-86M | Negative | 0.65 | 1.00 | 0.79 | 0.85
 | Neutral | 0.90 | 0.75 | 0.82 |
 | Positive | 1.00 | 0.87 | 0.93 |
RoBERTa-large-MNLI | Negative | 1.00 | 1.00 | 1.00 | 1.00
 | Neutral | 1.00 | 1.00 | 1.00 |
 | Positive | 1.00 | 1.00 | 1.00 |
FinBERT | Negative | 0.81 | 1.00 | 0.90 | 0.95
 | Neutral | 1.00 | 0.88 | 0.93 |
 | Positive | 1.00 | 1.00 | 1.00 |
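Given the held-out test predictions, the per-class precision, recall, F1, and weighted F1 in Table 5 follow from standard scikit-learn calls; the label arrays below are placeholders, not the study’s data.

```python
# Sketch of Table 5-style classification metrics with scikit-learn.
from sklearn.metrics import classification_report, f1_score

class_names = ["Negative", "Neutral", "Positive"]
# Placeholder predictions; in practice y_true/y_pred come from the test set.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 2, 2]

print(classification_report(y_true, y_pred,
                            target_names=class_names, digits=2))
print("Weighted F1:", round(f1_score(y_true, y_pred, average="weighted"), 2))
```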