1. Introduction
The world’s energy needs are evolving rapidly as we face the urgent challenges of climate change and strive to transition away from fossil fuels. The harmful impacts of burning fossil fuels—rising global temperatures, increasingly severe natural disasters, and long-term damage to ecosystems—have made it clear that we need cleaner and more sustainable energy sources [
1,
2]. Among the promising solutions is hydrogen, a clean energy carrier that can revolutionize key industries such as heavy manufacturing, long-distance transport, and energy storage [
1,
3]. For instance, hydrogen has the potential to reduce carbon emissions in steel production by up to 90% [
3]. Despite this promise, most hydrogen today is still produced using fossil fuels, a method known as “Gray hydrogen,” which does little to address carbon emissions. Shifting to “green hydrogen,” made from renewable energy sources like wind and solar, is essential but challenging, especially given the intermittency of these renewable energy supplies. In the face of these challenges, biomass emerges as a steady and reliable alternative for hydrogen production. Agricultural residues and food wastes, such as date seeds (DSs) and spent coffee grounds (SCGs), are often overlooked yet possess immense potential for creating sustainable energy. Not only does utilizing these resources reduce waste, but it also aligns with global efforts to transition toward cleaner energy systems. Pyrolysis provides a sustainable waste-to-resource solution, converting these biomass wastes into valuable products like bio-oil, biochar, and syngas, thereby reducing landfill burden and contributing to circular economy models. However, to fully unlock the potential of biomass for hydrogen production, it is crucial to optimize the processes involved and address the inherent complexities of biomass composition. This is where machine learning (ML) and its subsets, such as deep learning, come into play as game-changing tools. By analyzing large datasets, ML algorithms can uncover hidden patterns, predict outcomes, and optimize parameters that would be challenging to manage using traditional methods [
4]. For example, neural networks excel at modelling the intricate relationships found in thermogravimetric and kinetic data. This capability allows for precise predictions of how biomass behaves during pyrolysis and its efficiency in producing energy [
5]. By reducing the need for labour-intensive experiments, predictive models driven by ML streamline research processes, making bioenergy development faster and more cost-effective. Building on this foundation, the focus of this study is to explore the potential of underutilized biomass resources like SCG and DS for sustainable hydrogen production. Specifically, it aims to optimize the pyrolysis process as a proven method for producing hydrogen while evaluating the performance of these resources both individually and as blends. To gain deeper insights, the study employs a range of analyses, including compositional studies, thermogravimetric analysis (TGA) under pyrolysis conditions, kinetic and thermodynamic evaluations, and pyrolysis tests. These approaches work together to provide a detailed understanding of the pyrolysis process and an investigation to prove which biomass sample ensures a more efficient and practical hydrogen production. To complement this experimental work, the study incorporates predictive modelling powered by Long Short-Term Memory (LSTM) neural networks. These advanced models are used to forecast mass loss curves from TGA data, offering a time- and cost-efficient alternative to traditional methods. By optimizing the pyrolysis process through these models, the study demonstrates how artificial intelligence can accelerate and enhance bioenergy research.
Machine learning techniques, particularly neural networks, have been increasingly applied in thermal analysis and pyrolysis research due to their ability to accurately model complex relationships between parameters and outcomes. Studies using traditional approaches like Random Forest (RF) and Support Vector Regression (SVR) have demonstrated their effectiveness in predicting reaction kinetics and optimizing operational parameters [
6,
7,
8,
9]. However, artificial neural networks (ANNs) have gained prominence for achieving superior performance in predicting pyrolysis outcomes, such as product yields and reaction kinetics [
10,
11]. For example, Balsora et al. [
12] used ANNs to predict product yields with R
2 values around 0.97, and Kartal and Özveren [
13] applied ANNs to estimate kinetic parameters with R
2 values over 0.96. While most studies rely on multi-layer perceptrons (MLPs), which work well for static data, the sequential nature of TGA data makes Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks more suitable. LSTMs effectively capture temporal dependencies and address challenges like the vanishing gradient problem, making them ideal for modelling weight-loss patterns during pyrolysis [
14,
15].
Despite considerable advances in applying AI and machine learning techniques to thermogravimetric analysis (TGA) modelling of biomass pyrolysis [
6,
7,
8,
9,
10,
11,
12,
13,
14,
15], a critical research gap persists: few published studies have combined (i) spent coffee grounds and date seeds as co-pyrolysis feedstocks, (ii) comprehensive kinetic-thermodynamic characterization of their binary blends, and (iii) deep learning-based predictive modelling specifically tailored for these underutilized food waste biomasses. This triple knowledge gap represents a significant missed opportunity, as coffee grounds and date seeds are abundantly available in MENA (Middle East and North Africa) and Mediterranean regions, yet remain largely unexploited for hydrogen production despite their favourable lignocellulosic composition.
The present study addresses these gaps through four distinct novelties. First, it introduces for the first time a systematic blending strategy (three blend ratios: 75:25, 50:50, and 25:75 wt%) optimized to synergistically enhance hydrogen yield while minimizing activation energy barriers, a dual optimization criterion rarely investigated in the biomass co-pyrolysis literature. Second, it provides a comprehensive kinetic-thermodynamic dataset for SCG-DS blends, including activation energy (Ea), enthalpy (ΔH), entropy (ΔS), and Gibbs free energy (ΔG) profiles across the entire conversion range (α = 0.1–0.9) using three independent isoconversional methods (KAS, FWO, and Friedman). This multi-method validation rigorously establishes the thermochemical fingerprint of these blends, revealing that Blend 1 (75%DS-25%SCG) achieves an exceptionally low Ea of 161.75 kJ/mol, among the lowest reported for lignocellulosic biomass blends, while Blend 3 (25%DS-75% SCG) maximizes hydrogen production potential at the expense of higher energy requirements (Ea: 313.24 kJ/mol), thereby quantifying the energy-yield trade-off inherent to blend composition.
Third, the study pioneers the application of Long Short-Term Memory (LSTM) neural networks to predict TGA mass loss curves for food waste biomass blends. While previous works have successfully employed traditional machine learning algorithms (Random Forest, SVR) [
6,
7,
8,
9] or feed-forward artificial neural networks (ANNs) [
10,
11,
12,
13] for pyrolysis modelling, these approaches struggle to capture the temporal dependencies intrinsic to thermogravimetric data, where mass loss at time t depends on the entire thermal history preceding it. Our LSTM architecture, specifically designed to model sequential data, achieves unprecedented predictive accuracy (R
2 = 0.9996–0.9998) for both pure feedstocks and binary blends, surpassing the performance benchmarks reported in recent ANN-based studies (R
2 ≈ 0.96–0.97) [
12,
13]. Critically, this is the first demonstration that LSTM models trained on lignocellulosic data can generalize to unseen food waste blends, validating their transferability across diverse biomass matrices and opening a pathway for rapid, cost-effective screening of co-pyrolysis feedstock combinations without exhaustive experimental tests.
Fourth, this work uniquely integrates experimental validation with AI-driven optimization in a closed feedback loop: experimental TGA/Py-GC results inform LSTM model training, which in turn predicts optimal blending ratios and operating windows for maximizing hydrogen yield, with these predictions subsequently validated through targeted experiments. This bidirectional experimental-computational framework represents a paradigm shift from conventional sequential approaches (experiment → modelling) toward AI-accelerated discovery in bioenergy research, reducing the experimental burden by an estimated 60–70% compared to full factorial design exploration.
The overarching contribution of this study is therefore threefold: (1) it establishes SCG-DS blends as a high-potential, regionally abundant feedstock for green hydrogen production in circular economy contexts, with quantified kinetic-thermodynamic roadmaps guiding industrial implementation; (2) it demonstrates that deep learning (LSTM) can outperform conventional ML approaches for biomass pyrolysis modelling when temporal dynamics are critical, setting a new methodological standard for the field; and (3) it provides replicable AI-enhanced workflows that can be extended to other underutilized agricultural/food waste streams (e.g., olive pomace, citrus peels, and nut shells), thereby accelerating the global transition toward waste-to-energy circular systems. By bridging experimental thermochemistry, multi-scale kinetic analysis, and cutting-edge artificial intelligence, this work delivers actionable knowledge for both fundamental bioenergy research and applied hydrogen economy development.
2. Materials and Methods
2.1. Samples Preparation
The biomass materials utilized in this study were spent coffee grounds (SCGs) and date seeds (DSs) (
Figure 1). The SCG was sourced from a Tunisian coffee shop, while the DS was collected from an industry specializing in the processing of Phoenix dactylifera dates.
Before conducting any experiments, both biomass samples underwent a drying process to remove residual moisture. The SCG was dried using a Memmert UF55 drying oven set at 40 °C for a duration of 16 h. The dried SCG powder was subsequently processed using a pellet mill to produce homogeneous pellets with a diameter of 4 mm and a length ranging between 5 and 10 mm. The SCG pellets were then dried at 80 °C for 24 h and stored in airtight containers until use in pyrolysis experiments.
The DS was dried in an oven at 105 °C, then cooled and ground using a Retsch MM400 ball mill. Following grinding, the DS powder was similarly processed into pellets using the same pellet mill, yielding pellets of identical dimensions (diameter: 4 mm; length: 5–10 mm). The DS pellets were subsequently dried and stored in moisture-proof containers to ensure their preservation until further use.
To ensure the reliability and reproducibility of the experimental data, all TGA/DTG experiments were conducted in triplicate for each sample and heating rate condition (5, 10, and 15 °C/min). The mass loss profiles reported in the figures correspond to the averaged values of the three replicates. TGA is recognized as offering strong repeatability and high precision, making triplicate measurements a well-established protocol for ensuring data reliability in biomass pyrolysis studies. The maximum deviation between replicates did not exceed ±2% in any experimental condition, which is consistent with repeatability standards reported in the literature for equivalent thermogravimetric systems.
Similarly, all pyrolysis experiments were performed in triplicate under the reference conditions (600 °C, 10 °C/min), and the reported product yields correspond to mean values. The measurement uncertainties associated with each analytical method employed in this study were quantified and are summarized in
Table 1.
The Higher Heating Value (HHV) of both DS and SCG was determined experimentally using a Parr 6200 oxygen bomb calorimeter, in accordance with the ASTM D5865-13 standard. All measurements were performed in triplicate, and the reported values represent mean results with an associated measurement uncertainty of ±0.25% (
Table 1).
2.2. Experimental Facilities
Pyrolysis experiments were carried out in a fixed-bed tubular reactor operating under inert atmosphere. Prior to each experiment, a mass of 100 g of prepared biomass pellets (diameter: 4 mm; length: 5–10 mm) was loaded into the reactor vessel for each experiment. All experiments were performed in triplicate under the reference conditions to ensure reproducibility, and the reported product yields correspond to mean values with associated uncertainties as detailed in
Table 1.
Before initiating heating, an inert nitrogen (N2) atmosphere was established inside the reactor to prevent oxidative reactions during pyrolysis (50 mL/min for 15 min). A Tedlar gas sampling bag was connected to the gas outlet of the reactor to collect the non-condensable gas fraction produced during pyrolysis. The reactor outlet valve and the Tedlar bag were opened, and N2 was injected at a controlled flow rate through the reactor inlet, passing through a flow metre before entering the reactor and exiting through the condenser circuit. After 15 min of N2 purging, both the inlet and outlet valves were closed, and the system was considered ready for the pyrolysis run.
The heating rate was set to 10 °C/min on the temperature controller. It should be noted that the temperature displayed on the external controller does not directly correspond to the actual internal reactor temperature, due to inevitable thermal losses by conduction through the reactor walls, which are not perfectly insulated. To accurately track the real thermal profile inside the reactor, the internal temperature was continuously monitored throughout each experiment using a K-type thermocouple positioned inside the reactor vessel, connected to a PicoLog data acquisition system recording temperature in real time.
The non-condensable gases produced during pyrolysis were continuously directed into the Tedlar sampling bag throughout the heating phase. Once the internal reactor temperature reached 650 °C as confirmed by the PicoLog monitoring system, the reactor outlet valve was closed. Isothermal holding period was applied at the final temperature (20 min). After that, the reactor was allowed to cool naturally to ambient temperature under residual inert atmosphere before opening and recovering the solid biochar fraction.
The liquid bio-oil fraction was recovered from the condenser circuit after each experiment by washing with a known volume of solvent, and its mass was determined gravimetrically. The biochar mass was determined by weighing the reactor vessel before and after each experiment. Product yields (gas, liquid, and solid fractions) were calculated on a dry, ash-free basis and are reported as mean values ± standard deviation of the three replicates.
Ultimate analysis is performed to identify the elemental composition of the samples. It was performed employing a CHNS elemental analyser (Flash EA 1112 Series) that supplies percentage compositions for carbon, hydrogen, nitrogen, and sulphur, with oxygen being obtained by subtraction. The thermogravimetric study was performed from ambient temperature to 850 °C employing a SETARAM ThermysOne-TG-DSC, which is capable of reaching temperatures up to 1600 °C, under an inert atmosphere provided by a nitrogen gas flow rate of 50 mL/min. Finally, Micro gas chromatography (micro-GC), a miniaturized version of gas–liquid chromatography (GLC), is a highly efficient and precise technique for separating and analyzing volatile components in mixtures. When combined with pyrolysis (Py-GC), micro-GC becomes an invaluable tool for analyzing the volatile products generated by the thermal degradation of complex organic materials.
2.3. LSTM-Based Prediction of TGA Data
After discussion results on the physicochemical, thermal degradation and kinetic behaviours, the data outcomes are employed to feed the LSTM-based approach. To remind, it aims to predict TGA curves for biomass blends, leveraging their ability to process sequential data and capture complex thermal degradation trends. The model was developed and tested using Python 3, utilizing libraries such as TensorFlow and Keras for building the LSTM model, Scikit-learn for pre-processing, and Keras Tuner for hyper parameter optimization. Tools like Matplotlib were employed for result visualization, while RandomSearch was used to identify the optimal configuration for the LSTM architecture. By splitting the data into training, validation, and test subsets, the model’s performance was rigorously evaluated on unseen data, ensuring robust and generalizable predictions
2.3.1. Data Pre-Processing and Feature Engineering
The data obtained from TGA experiments on both pure and blended biomass samples are used to feed the deep learning procedure. Each sample was analyzed across a range of heating rates up to a final temperature of 850 °C. These extensive data provided insights into the thermal degradation patterns of both individual and blended samples. The specific datasets included:
Spent coffee grounds (SCGs): TGA data collected at heating rates of 5, 10, 15, and 20 °C/min;
Date seeds (DSs): TGA data collected at heating rates of 5, 10, 15, and 20 °C/min;
Blend 1 (75% DS, 25% SCG): data collected at heating rates of 5, 10, 15, and 20 °C/min;
Blend 2 (50% DS, 50% SCG): data collected at heating rates of 5, 10, and 15 °C/min;
Blend 3 (25% DS, 75% SCG): data collected at heating rates of 5, 10, and 15 °C/min.
These various datasets allowed us to observe and model thermal degradation trends across different compositions and heating rates, which was essential for building an accurate predictive model. The raw TGA data collected was complete and did not contain any missing values, as the TGA instrument generated a full dataset for each measurement. Furthermore, raw data were used without applying noise reduction or smoothing techniques to preserve reliability. This allowed the model to learn directly from the inherent variations within the data, which is expected to support generalization in prediction tasks. Concerning the model training, two dataset versions were created, each designed to capture different layers of detail:
Model 1: Primary Dataset: This simpler dataset included core features—DS %, SCG %, heating rate (°C/min), sample temperature (°C), and mass % (the target variable representing the sample’s remaining mass percentage during heating). These features were selected to provide a straightforward representation of the sample composition, heating conditions, and the resulting mass loss over time.
Model 2: Extended Dataset with Lignocellulosic Composition: To deepen the model’s understanding of biomass thermal behaviour, this dataset included additional features representing the three main lignocellulosic components:cellulose, hemicellulose, and lignin.
For Model 2, decomposition characteristics of lignocellulosic components over different temperature ranges were considered, each affecting mass loss differently as the temperature increases:
Cellulose % decomposes rapidly between 315 and 405 °C, causing significant mass loss in this range;
Hemicellulose % begins decomposing at lower temperatures (around 225–325 °C), contributing to initial mass loss;
Lignin % decomposes slowly over a wider temperature range (160–850 °C), resulting in gradual mass loss that extends throughout the TGA process.
Thus, to accurately reflect how each of these lignocellulosic components breaks down at different temperatures, the proportions of cellulose %, hemicellulose %, and lignin % were adjusted dynamically with temperature changes during the TGA tests. By incorporating these temperature-dependent adjustments in lignocellulosic composition, this extended dataset captured the dynamic nature of each biomass component’s decomposition. This allowed the model to learn more about the complex relationship between temperature, heating rate, composition, and mass loss, enhancing its ability to accurately predict TGA curves.
Given the sequential nature of TGA data, with mass loss occurring over a temperature range, LSTMs offer a robust solution by managing long-term dependencies and efficiently processing complex temporal patterns. By addressing gradient issues and incorporating memory cells and gates, LSTMs provide the capacity to capture the intricate relationship between sample composition, heating rate, and mass loss across varying temperatures, making them an ideal choice for predictive modelling in this study.
2.3.2. Model Training, and Hyperparameter Tuning
The dataset, consisting of 14,875 data points, was divided into three subsets. Seventy percent (70%) of the data was used for training the model, ensuring the model’s ability to generalize to new unseen data, while 15% was reserved for validation. This subset plays a critical role in monitoring the model’s performance during training. The remaining 15% of data was dedicated to testing. This leads to evaluating the model’s ability to generalize by providing feedback on how well it performs on data that were not used in the training process. Furthermore, the validation loss is tracked during training, and if the model starts overfitting, the training is stopped early, helping to prevent overfitting. Once the training and validation phases were complete, the model was evaluated using a completely unseen dataset, which was not part of either the training or validation sets. This test dataset consisted of data from Blend 1, evaluated under different heating rates: 15 °C/min, 10 °C/min, and a completely new heating rate of 25 °C/min. This final test phase assesses the model’s generalization capability and its performance on truly new data. The entire model training, validation, and testing process was conducted using Google Colab, which provides open access to computational resources such as GPUs, with rapid execution of deep learning tests.
To optimize the model, Keras Tuner was used for hyperparameter tuning. Hyperparameters such as the number of LSTM layers, number of units in each layer, dropout rates, activation functions, and the optimizer were explored systematically. Furthermore, in order to forecast the subsequent value in the time series, the applied LSTM model uses the input sequences of 20 prior TGA measurements. This look-back window was part of the hyperparameter tuning process and was considered as a hyperparameter. In addition, the RandomSearch method was employed, which searches through the hyperparameter space by randomly selecting combinations and evaluating their performance (
Table 2 and
Table 3).
This approach was chosen for its efficiency in finding well-performing configurations, especially with a large number of hyperparameters to explore. The hyperparameter tuning process involved training multiple models, each with a different combination of hyperparameters, as is mentioned in
Table 2 and
Table 3. Finally, the model that achieved the lowest validation loss was selected for further evaluation (
Figure 2).
The considered architecture of the neural network comprises (1) the input layer (R4) which processes the TGA data, followed by (2) two hidden layers (R9 and R12) comprising the LSTM cells which lead to learn and retain temporal sequences of biomass pyrolysis behaviour. The present architecture allows to capture both short-term and long-term dependencies during the thermal decomposition patterns. Finally, it comprises (3) an output layer (R1) to generate the predicted mass loss evolution. This layer serves to compare predictive data against experimental ones, thereby evaluating model’s prediction accuracy.
Figure 2 depicts the optimal hyperparameters for each model.
To evaluate the performance of the model, the following several error metrics were used: Mean Absolute Error (MAE) to quantify the average of the absolute differences between actual values and their corresponding predicted values; Root Mean Squared Error (RMSE) offering an indication of how closely the model’s predictions align with observed data; and finally, the R-Squared (R2) coefficient.
4. Conclusions and Perspectives
This research emphasizes the importance of shifting from fossil fuel-based “Gray hydrogen” to renewable “green hydrogen” using underutilized biomass resources and machine learning (specifically deep learning) applications. The study explored the thermal, kinetic, and thermodynamic behaviours of spent coffee grounds (SCGs) and date seeds (DSs), and their blends for hydrogen production through pyrolysis process.
The results demonstrate that DS, SCG, and their blends, hold significant promise for bioenergy production. Blend 3 (75% SCG–25% DS) emerged as the most favourable for hydrogen production, with highest volatile matter release. However, it requires highest energy input. Blend 1 (75% DS–25% SCG) demonstrated superior energy efficiency with lowest energy demand. Building on these experimental findings, the study explores the LSTM modelling approach to predict TGA mass loss patterns, significantly outperforming traditional methods, with exceptional accuracy (R2 > 0.999), while considering the lignocellulosic composition of pure biomasses and their blends. These findings demonstrate that combining experimental and predictive approaches can promote the bioenergy production while reducing reliance on time- intensive TGA experiments and pyrolysis reactors.
As perspectives, future work will focus on expanding the diversity of the dataset to other biomasses, and to predict blend behaviour. An attempt to eliminate the need for blend-specific training data is envisaged. Additional efforts will enhance dataset diversity in blend ratios and applied heating rates in order to improve model generalization.