1. Introduction
Crowdfunding has emerged as a transformative mechanism for early-stage financing, allowing individuals and startups to raise capital directly from the public through online platforms. It can be regarded as a specialized branch of digital commerce that combines traditional online retail practices with community-driven financing models. Platforms such as Kickstarter and Indiegogo operate as e-commerce channels when entrepreneurs raise funds by pre-selling products; however, unlike conventional e-commerce where products already exist, crowdfunding involves selling a promise, often for products still under development. In this setting, consumers effectively become co-investors in product development, blurring the traditional distinction between buyers and backers [1]. This expansion of the e-commerce paradigm allows entrepreneurs to secure advance commitments and small-scale investments from a large and distributed pool of supporters [2].
Since the 2008 Global Financial Crisis, crowdfunding has gained substantial traction as an alternative to traditional funding mechanisms such as venture capital and angel investment. Despite its growing popularity, a significant proportion of crowdfunding campaigns fail to reach their funding targets [3,4]. Campaign success is commonly driven by two interrelated factors: the number of backers and the average contribution per backer [5]. Maximizing these outcomes requires project initiators to attract broad attention while offering compelling value propositions and rewards [6]. Consequently, accurately predicting the likelihood of campaign success has become a critical challenge for improving strategic decision-making across the crowdfunding ecosystem.
Unlike traditional e-commerce forecasting, which primarily relies on structured historical data such as product categories, pricing information, and customer demographics [7], crowdfunding success prediction must account for heterogeneous and largely unstructured inputs. These include campaign narratives, visual media, and platform-generated interaction signals that reflect emotional appeal, trust, and perceived credibility [3,4]. The resulting data are high-dimensional and dynamic, making predictive modeling considerably more complex. This complexity highlights the need for artificial intelligence (AI)-driven, scalable decision-support systems capable of extracting actionable insights from rich textual content [8].
In practical deployment scenarios, many influential behavioral and interaction-based signals—such as funding velocity, update frequency, or social engagement—are unavailable or unreliable at the very early stages of a campaign. As a result, there is substantial value in predictive models that operate exclusively on information available at launch time, when creators seek early feedback and platforms conduct initial screening. In this context, the campaign blurb (short description) plays a central role, as it represents the primary medium through which creators communicate project intent, value propositions, and credibility to potential backers.
Recent advances in AI and natural language processing (NLP) have enabled increasingly effective analysis of campaign blurbs. Pre-trained language models such as BERT and FastText have demonstrated strong performance in capturing contextual semantics and sentiment, and have been widely adopted in crowdfunding analytics [9,10]. When combined with classifiers such as Long Short-Term Memory (LSTM) networks or Gradient Boosting Machines (GBM), these embeddings can yield competitive predictive performance. However, most existing approaches rely directly on high-dimensional embeddings, leading to substantial computational overhead, scalability limitations, and increased risk of overfitting.
To address these challenges, this study proposes a novel deep learning framework that integrates a CBAM-enhanced autoencoder to perform attention-guided semantic compression of high-dimensional BERT embeddings prior to feature selection and classification. By reducing redundancy while preserving salient linguistic information, the proposed autoencoder produces compact representations well-suited for early-stage prediction. To further refine these representations, meta-heuristic feature selection algorithms—specifically Genetic Algorithm (GA), Jaya, and Artificial Rabbit Optimization (ARO)—are applied to identify discriminative and efficient feature subsets.
The proposed model enriches the existing literature through the following contributions:
A comprehensive end-to-end framework is introduced, integrating contextual generative AI embeddings (BERT), attention-guided dimensionality reduction via a CBAM-enhanced autoencoder, meta-heuristic feature selection techniques (GA, Jaya, and ARO), and sequence-aware classification methods (LSTM and GBM). This architecture achieves a balance between scalability and predictive performance, enabling early-stage campaign evaluation in digital commerce settings.
This study presents the first application of a CBAM-enhanced autoencoder to compress transformer-based embeddings in the context of crowdfunding, bridging attention-based semantic modeling and scalable feature optimization.
A hybrid approach is developed that combines deep representation learning, meta-heuristic feature selection, and text-aware classification to improve both predictive accuracy and computational efficiency.
An extensive evaluation is conducted on a large-scale Kickstarter dataset, demonstrating that the proposed framework achieves state-of-the-art performance with over 95% feature dimensionality reduction and substantially reduced computational cost, while remaining suitable for deployment in real-world platform scenarios.
By explicitly addressing the dimensionality bottleneck of large language models and focusing on early-stage, text-driven prediction, this work contributes both a methodological advance and a practical decision-support tool for enhancing the strategic value and sustainability of crowdfunding ecosystems.
3. Proposed Methodology
Building upon the gaps identified in the literature, particularly the lack of attention-based embedding compression combined with meta-heuristic feature selection for crowdfunding success prediction, we propose a multi-stage AI framework tailored to the digital commerce crowdfunding domain. The framework is designed to handle the semantic richness, high-dimensionality, and noisy nature of campaign blurbs, while producing computationally efficient predictions suitable for integration into platform-level decision-support systems.
As illustrated in
Figure 1, the pipeline integrates contextual text representations from advanced NLP models, a novel autoencoder for deep embedding compression, meta-heuristic algorithms for dimensionality reduction, and two complementary classifiers: LSTM and GBM. The proposed framework consists of the following stages:
Data Preprocessing: Campaign blurbs are cleaned, normalized, and tokenized to ensure high-quality input for embedding models.
Feature Extraction: Contextual embeddings are generated using BERT to capture persuasive linguistic and stylistic patterns associated with backer decision-making.
Deep Feature Compression: A novel CBAM-enhanced autoencoder is introduced to compress high-dimensional BERT embeddings, retaining semantic richness while improving efficiency.
Feature Selection: Meta-heuristic optimization algorithms (GA, Jaya, and ARO) are employed to identify the most relevant subsets of compressed features, maximizing discrimination between successful and failed campaigns.
Classification: Reduced feature sets are used to train LSTM and GBM models, chosen for their ability to capture sequential linguistic dependencies and structured feature interactions, respectively.
Each component of the framework is elaborated in the following subsections, together with implementation details and theoretical motivations.
3.1. Dataset Description
The dataset used in this study was sourced from a publicly accessible GitHub repository
https://github.com/rajatj9/Kickstarter-projects (accessed on 1 July 2025), comprising data from 191,724 Kickstarter campaigns launched during 2018. Each record contains structured and unstructured features including project title, blurb, funding goal, campaign duration, launch and deadline dates, category, and the campaign’s final status (successful or failed).
In our study, the primary predictive signal is extracted from the blurb field, which typically contains a short, natural-language description provided by the campaign creator. This aligns with the hypothesis that the wording and sentiment of the campaign blurbs play a vital role in influencing backers’ decisions. The target variable is binary: campaigns are labeled as successful or failed.
Although collected in 2018, the dataset remains a reliable benchmark in crowdfunding research due to its large size, public availability, completeness, and frequent usage in recent NLP studies [10,41]. From a business standpoint, it reflects the enduring linguistic and structural patterns of crowdfunding blurbs, making it relevant for designing prediction tools that generalize to current campaigns. Class imbalance is inherent, with approximately 40,000 more failed campaigns than successful ones, warranting attention during training and evaluation to ensure balanced learning and avoid bias toward the majority class.
3.2. Text Preprocessing Pipeline
Text preprocessing is a critical step in any NLP pipeline. Poorly pre-processed text can negatively impact embedding quality and introduce noise into learning. In crowdfunding platforms, inconsistent capitalization, creative spelling, and marketing-heavy phrasing are common; preprocessing ensures that these stylistic variations do not degrade model performance. In this work, the following preprocessing steps were implemented using Python’s NLTK 3.9.1 and SpaCy 3.8.7 libraries:
Cleaning and Normalization: All campaign blurbs were stripped of punctuation, special symbols, digits, and URLs. Text was converted to lowercase to maintain uniformity and reduce vocabulary size.
Tokenization: Sentences were tokenized into word units using the SpaCy tokenizer, which handles contractions, punctuation, and common edge cases effectively.
Stop Word Removal: Words with minimal semantic value, such as ‘and’, ‘the’ and ‘to’, were removed using a standard stopword list.
Lemmatization: Each word was reduced to its base or dictionary form. For example, ‘running’ becomes ‘run’ and ‘better’ becomes ‘good’, allowing equivalent campaign blurbs to be matched despite superficial variation.
Padding and Truncation: Since neural models expect fixed-length inputs, each campaign description was padded (with special tokens) or truncated to 64 words. This length was chosen based on the distribution of blurb lengths, covering the majority without excessive padding.
These preprocessing steps ensure consistency in the input sequences, reduce the sparsity in the resulting vector representations, and normalize language patterns for fairer model comparisons.
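For illustration, a minimal sketch of this pipeline using SpaCy is shown below; the lemmatizer, stop-word list, and the `<pad>` placeholder token are illustrative choices rather than the exact configuration of our implementation.

```python
import re
import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
MAX_LEN = 64  # fixed sequence length used throughout the pipeline

def preprocess(blurb: str) -> list:
    """Clean, tokenize, remove stop words, lemmatize, and pad/truncate one blurb."""
    text = blurb.lower()
    text = re.sub(r"http\S+", " ", text)    # strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)   # strip punctuation, digits, special symbols
    doc = nlp(text)
    tokens = [tok.lemma_ for tok in doc if not tok.is_stop and not tok.is_space]
    tokens = tokens[:MAX_LEN]                        # truncate long blurbs
    tokens += ["<pad>"] * (MAX_LEN - len(tokens))    # pad short blurbs
    return tokens
```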
3.3. Feature Extraction with Contextual Embeddings
To extract meaningful features from campaign blurbs, we employed the BERT embedding model for its strength in capturing bidirectional semantic context. BERT is a deep contextual language representation model introduced by Devlin et al. [40], which has achieved state-of-the-art performance in a wide range of natural language understanding (NLU) tasks. Unlike conventional word embedding models such as Word2Vec or GloVe, which generate fixed word vectors independent of context, BERT produces context-sensitive embeddings by capturing bidirectional relationships between words in a sentence. This is achieved through the Transformer architecture [54], which uses self-attention mechanisms to model dependencies among all words in the input sequence, regardless of distance.
As shown in
Figure 2, BERT is built entirely from transformer encoder layers, each consisting of two subcomponents:
Multi-Head Self-Attention Mechanism: This mechanism computes attention weights between all token pairs in a sentence, allowing each token to be contextualized with respect to every other token. The multi-head structure enables the model to capture information from different representation subspaces simultaneously.
Position-Wise Feedforward Network: Each token’s attention output is passed through a fully connected feedforward network comprising two linear layers separated by a ReLU activation. This facilitates complex, non-linear transformations of contextual features.
The base BERT model used in this study comprises 12 encoder layers, each with 12 attention heads and 768 hidden units, resulting in rich and high-dimensional token-level outputs.
In our framework, BERT was selected because of its demonstrated superiority in capturing contextually rich representations of text. Given that crowdfunding campaign success often hinges on subtle linguistic cues and sentiment, BERT’s ability to encode such nuances is particularly advantageous.
The following configuration was employed in our study:
A pre-trained BERT-Base model from the Huggingface Transformers library was used as the base encoder. This model was initially trained on a large corpus comprising BooksCorpus and English Wikipedia, covering more than 3 billion words.
Tokenization: Campaign blurbs were first tokenized using the BERT tokenizer, which decomposes words into subword units to effectively handle rare or compound words. The tokenized sequences were padded or truncated to a maximum length of 64 tokens.
Fine-tuning Strategy: To adapt BERT to the domain of crowdfunding language, the model was fine-tuned using the campaign dataset. Specifically, the lower encoder layers were frozen to retain general linguistic knowledge, while the upper layers were updated to learn domain-specific semantics associated with campaign success narratives. A learning rate of and a batch size of 32 were used during fine-tuning.
Embedding Extraction: For each token in the input sequence, we extracted the hidden states from the last four encoder layers. These were concatenated to form a 3D tensor of shape 64 × 768 × 4, where 64 corresponds to the standardized sequence length (post-padding), 768 is the hidden dimension, and 4 represents the depth of stacked layers. This rich embedding tensor preserves hierarchical contextual information across layers.
Compression and Use: The extracted tensor was subsequently passed through our CBAM-enhanced autoencoder for dimensionality reduction. This step was crucial in minimizing computational costs and suppressing redundant features before applying feature selection and classification.
Using BERT’s fine-tuned embeddings, our framework benefits from deep semantic encoding of campaign blurbs, capturing not only surface-level token information but also underlying intent, sentiment, and stylistic characteristics relevant to predicting campaign success.
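As an illustrative sketch of the extraction step, the snippet below stacks the last four hidden layers of a BERT-Base encoder into the 64 × 768 × 4 tensor described above. For brevity it loads the public bert-base-uncased checkpoint; the actual pipeline uses the fine-tuned model instead.

```python
import tensorflow as tf
from transformers import BertTokenizerFast, TFBertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def blurb_to_tensor(blurb: str, max_len: int = 64) -> tf.Tensor:
    """Return a (64, 768, 4) tensor stacking the last four BERT encoder layers."""
    enc = tokenizer(blurb, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="tf")
    out = bert(enc, training=False)
    # out.hidden_states holds 13 tensors (input embeddings + 12 layers), each (1, 64, 768)
    last_four = tf.stack(out.hidden_states[-4:], axis=-1)   # (1, 64, 768, 4)
    return tf.squeeze(last_four, axis=0)                    # (64, 768, 4)
```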
3.4. Feature Selection with Hybrid Pipeline
The use of deep language models such as BERT for feature extraction introduces high-dimensional, semantically rich representations that often include redundant or non-discriminative components. This dimensionality poses significant challenges, including increased computational cost, overfitting risks, and diminished model interpretability. These concerns are particularly acute in resource-constrained environments or real-time inference scenarios.
To address these limitations, we propose a hybrid feature selection pipeline built on top of our CBAM-enhanced autoencoder, which serves as a semantic compression layer for BERT embeddings. This attention-guided module reduces the 3D contextual embeddings into dense latent representations while preserving salient spatial and channel-wise information. However, even after compression, the resulting feature vectors remain high-dimensional relative to traditional structured data, necessitating further refinement [53].
Accordingly, we investigate three complementary meta-heuristic algorithms for feature selection, each performing a global search to identify optimal feature subsets by maximizing classifier performance metrics such as the F1-score or the Matthews Correlation Coefficient (MCC).
By applying these selection strategies to the latent representations produced by the CBAM-encoded BERT features, we aim to construct a parsimonious yet highly expressive feature set. This hybrid strategy not only enhances the robustness of the model, but also significantly reduces training and inference costs. The following subsections detail the implementation and performance of each method in the context of crowdfunding campaign success prediction.
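As a minimal sketch of the wrapper-style objective driving these searches, the function below scores a binary feature mask by the cross-validated MCC of a GBM; the fold count and classifier settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef, make_scorer
from sklearn.model_selection import cross_val_score

mcc_scorer = make_scorer(matthews_corrcoef)

def fitness(mask: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Score a binary feature mask by the cross-validated MCC of a wrapped GBM."""
    if mask.sum() == 0:                 # empty subsets are invalid
        return -1.0
    X_sub = X[:, mask.astype(bool)]
    clf = GradientBoostingClassifier(random_state=0)
    return cross_val_score(clf, X_sub, y, cv=3, scoring=mcc_scorer).mean()
```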
3.4.1. CBAM-Enhanced Autoencoder for Embedding Compression
To address the computational and overfitting challenges associated with high-dimensional contextual embeddings, we propose a novel CBAM-enhanced symmetric autoencoder that compresses the output of the last four hidden layers of the fine-tuned BERT model. This component represents a key innovation of the study, serving as a deep semantic feature compressor before applying feature selection and classification.
Each campaign blurb is first encoded using the fine-tuned BERT model, producing a tensor of size 64 × 768 × 4, where 64 is the standardized sequence length, 768 is the embedding dimension, and 4 denotes the number of stacked BERT layers. This tensor is treated as a pseudo-image input to our convolutional autoencoder, which reduces the embedding dimensionality while preserving salient contextual information.
Convolutional Block Attention Module (CBAM)
The Convolutional Block Attention Module (CBAM), integrated into each convolutional layer of the autoencoder, is designed to refine feature representations by sequentially applying attention along two key dimensions: channel and spatial. CBAM enhances feature maps through a two-stage mechanism:
Channel Attention Module (CAM): This sub-module emphasizes informative feature channels while suppressing less relevant ones. It does so by aggregating spatial information using both global average pooling and global max pooling, followed by a shared multi-layer perceptron (MLP). The resulting attention map is applied via element-wise multiplication to the input feature map, selectively enhancing discriminative channels.
Spatial Attention Module (SAM): After refining channel-wise responses, the spatial attention module focuses on localizing salient regions in the spatial domain. It aggregates channel-wise information through average and max pooling operations along the channel axis and applies a convolutional layer to generate a 2D spatial attention map. This map is used to reweight spatial locations in the feature map.
Together, these attention mechanisms allow CBAM to adaptively highlight both “what” and “where” to emphasize in the feature tensor. This improves the network’s ability to learn compact and semantically enriched latent representations. The lightweight nature of CBAM enables its seamless integration into the encoder and decoder without significantly increasing computational overhead.
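The sketch below shows how such a CBAM block can be implemented as a Keras layer; the reduction ratio and the spatial kernel size are conventional defaults rather than values taken from our configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

class CBAM(layers.Layer):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 8, kernel_size: int = 7, **kwargs):
        super().__init__(**kwargs)
        # Shared MLP used by the Channel Attention Module
        self.mlp = tf.keras.Sequential([
            layers.Dense(channels // reduction, activation="relu"),
            layers.Dense(channels),
        ])
        # Convolution used by the Spatial Attention Module
        self.spatial_conv = layers.Conv2D(1, kernel_size, padding="same", activation="sigmoid")

    def call(self, x):
        # Channel attention: decide "what" to emphasize
        avg_pool = tf.reduce_mean(x, axis=[1, 2])                 # (B, C)
        max_pool = tf.reduce_max(x, axis=[1, 2])                  # (B, C)
        ch_attn = tf.nn.sigmoid(self.mlp(avg_pool) + self.mlp(max_pool))
        x = x * ch_attn[:, None, None, :]
        # Spatial attention: decide "where" to emphasize
        avg_sp = tf.reduce_mean(x, axis=-1, keepdims=True)        # (B, H, W, 1)
        max_sp = tf.reduce_max(x, axis=-1, keepdims=True)
        sp_attn = self.spatial_conv(tf.concat([avg_sp, max_sp], axis=-1))
        return x * sp_attn
```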
Encoder Architecture
The encoder consists of five sequential convolutional blocks, each comprising:
A 2D convolutional layer with an increasing number of filters: 8, 16, 24, 32, and 64;
Batch normalization and ReLU activation;
A Convolutional Block Attention Module (CBAM) to apply both channel-wise and spatial attention.
Each block downsamples the input through stride-2 convolutions, progressively reducing the spatial dimensions and producing a latent representation of shape 2 × 24 × 64 (i.e., 3072 values).
Decoder Architecture
The decoder mirrors the encoder with transposed convolutional layers, reversing the filter progression (64 → 8). Each upsampling layer restores spatial dimensions, and the final output is reconstructed to match the original 64 × 768 × 4 shape. A sigmoid activation is applied at the output layer to constrain values to [0, 1].
The autoencoder is trained in an unsupervised manner using Mean Squared Error (MSE) loss. During training, both encoder and decoder weights are optimized. After convergence, the decoder is discarded, and only the encoder is retained to extract low-dimensional latent features for the next stages of the pipeline. CBAM modules enhance the encoder’s ability to emphasize semantically important regions of the input by applying adaptive attention across feature channels and spatial locations. This makes compressed embeddings both compact and information rich. Unlike standard autoencoders or PCA, our design preserves deeper linguistic structures that are vital for downstream classification.
By reducing the input space for meta-heuristic selection and classification, the proposed framework improves training efficiency, reduces model complexity, and enhances generalization, as shown in the experimental results.
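A simplified construction of the encoder-decoder pair is sketched below. It reuses the CBAM layer outlined above; the 3 × 3 kernels and the L2 regularization strength are illustrative placeholders, and the decoder is shown without attention blocks for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_encoder(input_shape=(64, 768, 4), filters=(8, 16, 24, 32, 64)):
    inp = layers.Input(shape=input_shape)
    x = inp
    for f in filters:                                   # five stride-2 blocks
        x = layers.Conv2D(f, 3, strides=2, padding="same",
                          kernel_regularizer=regularizers.l2(1e-4))(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = CBAM(f)(x)                                  # attention-refined feature maps
    return models.Model(inp, x, name="cbam_encoder")    # latent shape: (2, 24, 64)

def build_decoder(latent_shape=(2, 24, 64), filters=(64, 32, 24, 16, 8)):
    inp = layers.Input(shape=latent_shape)
    x = inp
    for f in filters:                                   # five stride-2 upsampling blocks
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(4, 3, padding="same", activation="sigmoid")(x)  # back to (64, 768, 4)
    return models.Model(inp, out, name="decoder")

encoder = build_encoder()
decoder = build_decoder()
autoencoder = models.Model(encoder.input, decoder(encoder.output))
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=100, batch_size=64, validation_split=0.1,
#                 callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])
```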
3.4.2. Meta-Heuristic Based Methods
Genetic Algorithm (GA): The Genetic Algorithm (GA) [55,56] is a heuristic search approach inspired by the processes of natural selection and genetics, aiming to discover optimal solutions to complex problems. As a form of Evolutionary Algorithm, it uses essential mechanisms to evolve candidate solutions over generations: the Selection operator picks individuals for reproduction using methods such as roulette-wheel selection (selecting in proportion to fitness) or tournament selection (randomly sampling candidates and keeping the best); the Crossover operator merges the traits of parents to create offspring, while the Mutation operator preserves diversity by introducing random variations. Elitism retains the best solutions across generations, and a fitness function assesses each solution against the objectives of the problem.
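A compact sketch of this procedure applied to feature-subset selection is given below; the population size, crossover and mutation rates, and elitism count are illustrative values, and fitness() refers to the wrapper objective sketched earlier in this section.

```python
import numpy as np

rng = np.random.default_rng(42)

def tournament(pop, scores, k=3):
    """Tournament selection: sample k individuals and return the fittest."""
    idx = rng.choice(len(pop), k, replace=False)
    return pop[idx[np.argmax(scores[idx])]]

def ga_feature_selection(X, y, n_gen=50, pop_size=30,
                         cx_rate=0.8, mut_rate=0.02, n_elite=2):
    """Binary-encoded GA over feature masks, maximizing fitness()."""
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))
    for _ in range(n_gen):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(scores)[::-1]
        new_pop = [pop[i].copy() for i in order[:n_elite]]     # elitism
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = p1.copy()
            if rng.random() < cx_rate:                         # one-point crossover
                cut = rng.integers(1, n_feat)
                child[cut:] = p2[cut:]
            flips = rng.random(n_feat) < mut_rate              # bit-flip mutation
            child[flips] ^= 1
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)]                              # best feature mask found
```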
Jaya: The Jaya Algorithm is a powerful optimization strategy introduced in [57] that operates without algorithm-specific parameters such as crossover rates, mutation rates, or inertia weights. Its parameter-free design enhances robustness and simplifies implementation by removing the parameter-tuning burden common to other optimization methods. The algorithm balances exploration and exploitation by converging toward the best solution while steering clear of the worst solution in each iteration.
It functions through a cycle of iterative improvement:
Initialization: The procedure starts by creating a diverse set of initial candidate solutions randomly spread across the search space. This diversity is essential for comprehensive exploration and avoiding early convergence to suboptimal solutions.
Calculate Fitness: The fitness of each solution is assessed using the objective function. The algorithm identifies the best and worst solutions in the current population, which guide the optimization path.
Update Solutions: A key aspect of Jaya’s strength is its update mechanism for solutions. Each candidate solution $x_{i,j}^{k}$ (the $j$-th variable of the $i$-th solution) in iteration $k$ is updated according to Equation (1):

$$x_{i,j}^{k+1} = x_{i,j}^{k} + r_{1}\left(x_{\text{best},j}^{k} - \left|x_{i,j}^{k}\right|\right) - r_{2}\left(x_{\text{worst},j}^{k} - \left|x_{i,j}^{k}\right|\right) \qquad (1)$$

where $r_{1}$ and $r_{2}$ are random numbers between 0 and 1. This update method is crafted to pursue two objectives simultaneously: advancing toward the best solution ($x_{\text{best}}$) while retreating from the worst one ($x_{\text{worst}}$). The introduction of randomness with $r_{1}$ and $r_{2}$ ensures stochasticity, which preserves diversity and helps avoid premature convergence.
Iterative Refinement: This process repeats until a stopping criterion, such as reaching a set number of iterations or a satisfactory fitness level, is satisfied. Throughout, the population’s quality improves as more effective solutions emerge and inferior ones are discarded.
The algorithm’s strength lies in its balanced exploration-exploitation dynamic and its parameter-free structure, making it particularly suited for robust feature selection applications.
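A minimal NumPy sketch of the update in Equation (1) is given below; for feature selection, the continuous positions would additionally be thresholded into binary masks before fitness evaluation (an implementation detail not prescribed by the algorithm itself).

```python
import numpy as np

def jaya_step(pop: np.ndarray, scores: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One Jaya iteration: move toward the best solution and away from the worst."""
    best = pop[np.argmax(scores)]
    worst = pop[np.argmin(scores)]
    r1 = rng.random(pop.shape)
    r2 = rng.random(pop.shape)
    return pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))

# Example: masks = (jaya_step(pop, scores, rng) > 0.5) yields candidate feature subsets.
```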
The Artificial Rabbit Optimization (ARO) method, presented in [58], imitates the survival strategies of rabbits by integrating exploration (via detour foraging) and exploitation (through random hiding) alongside a dynamic energy reduction mechanism. Its position updates proceed as follows:
Exploration (Detour Foraging): each rabbit forages around the position of a randomly selected member of the population rather than its own, encouraging large, diversifying moves across the search space.
Exploitation (Random Hiding): each rabbit generates a set of candidate burrows around its current position and randomly moves toward one of them, refining the search locally.
By adaptively balancing exploration and exploitation through energy dynamics, ARO is proficient in escaping local minima in complex search landscapes.
3.5. Classification
For effective classification of crowdfunding project success, we utilized machine learning algorithms recognized for their reliability, adaptability, and precision across different applications. Due to the high dimensionality and complexity of the features derived from the project blurbs, machine learning models were chosen for their ability to handle noisy data, generate accurate predictions, and perform efficiently with high-dimensional feature sets [10]. The models selected for this research, Gradient Boosting Machines (GBM) and Long Short-Term Memory (LSTM) networks, are popular in NLP classification tasks, with each providing distinct benefits for our dataset [59]. The next sections detail the advantages and technical details of each model and explain their suitability for predicting the success of crowdfunding initiatives.
3.5.1. Long Short-Term Memory (LSTM)
LSTM networks are a type of recurrent neural network (RNN) designed to capture long-term dependencies in sequential data. They are particularly well-suited for textual data, where contextual relationships between words play a critical role.
The LSTM model consists of input, forget, and output gates that regulate the flow of information. These gates help the network retain relevant information while discarding irrelevant details. LSTM networks were trained on campaign blurbs encoded by BERT embeddings. Dropout regularization was applied to prevent overfitting, and hyperparameter tuning was conducted using grid search to optimize parameters such as learning rate, batch size, and the number of LSTM units. The ability of LSTM to preserve sequential dependencies enables it to effectively model the contextual flow of campaign blurbs, making it ideal for textual classification tasks.
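For illustration, a minimal Keras definition of such a classifier is given below. The number of LSTM units and the dropout rate are placeholders (the actual values were tuned via grid search), and when the input is a reduced feature subset rather than a full token sequence, the selected features are assumed to be reshaped into a short pseudo-sequence.

```python
from tensorflow.keras import layers, models

def build_lstm_classifier(seq_len=64, emb_dim=768, units=128, dropout=0.3):
    """LSTM binary classifier over a sequence of contextual token embeddings."""
    inp = layers.Input(shape=(seq_len, emb_dim))
    x = layers.LSTM(units)(inp)                       # sequence-aware encoding
    x = layers.Dropout(dropout)(x)                    # regularization against overfitting
    out = layers.Dense(1, activation="sigmoid")(x)    # P(campaign success)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```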
3.5.2. Gradient Boosting Machine (GBM)
GBM is an ensemble learning technique that builds a series of decision trees, where each tree corrects the errors of its predecessors. It is highly effective in handling structured data and is robust against overfitting when properly tuned.
GBM minimizes an objective function, such as classification error, by sequentially adding weak learners (decision trees) to the ensemble. The predictions of all trees are combined to make the final decision. GBM was trained on feature subsets selected by the proposed methods. Hyperparameters, such as the number of estimators, the learning rate, and the maximum depth of the tree, were optimized through grid search. This ensured that the model achieved high accuracy while maintaining generalization capabilities. GBM is robust in handling heterogeneous feature types and missing data. Its iterative nature allows it to effectively capture complex relationships between features and the target variable.
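A minimal sketch of this grid-search setup with scikit-learn is shown below; the parameter ranges are illustrative stand-ins for the ranges reported in Table 2.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef, make_scorer
from sklearn.model_selection import GridSearchCV

param_grid = {                       # illustrative ranges only
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid,
                      scoring=make_scorer(matthews_corrcoef), cv=5, n_jobs=-1)
# search.fit(X_selected, y)   # X_selected: feature subset chosen by GA / Jaya / ARO
```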
3.6. Performance Metrics
While accuracy is often suggested for evaluating performance, it fails to assess class discrimination. Alternatively, the Matthews Correlation Coefficient and the F-measure serve as valuable metrics for validating class-based model assessments.
Accuracy is defined as the ratio of correct predictions to the total number of cases. However, when the number of false positives (FP) significantly surpasses the number of false negatives (FN), it becomes necessary to employ the F-measure for performance assessment. The F-measure computes the harmonic mean of precision and recall, thus considering both false positive and false negative cases to evaluate class separation.
The Matthews Correlation Coefficient (MCC) provides a balanced evaluation of binary classification performance by incorporating true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Unlike accuracy, which may be overly optimistic under class imbalance, and unlike the F-measure, which primarily emphasizes the positive class, MCC captures the overall quality of predictions across both classes. It measures the correlation between predicted and actual labels, returning values between −1 and +1, where +1 indicates perfect prediction, 0 represents random guessing, and −1 corresponds to complete disagreement [10].
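For reference, the two class-sensitive metrics can be written in terms of the confusion-matrix entries using their standard definitions:

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$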
4. Experimental Results
All experiments were conducted in Python 3.11.3 using TensorFlow 2.4.1 and HuggingFace Transformers 4.5.1 on a system equipped with an Intel Core Ultra 7 265K CPU, 64 GB RAM, and an NVIDIA RTX 5070 Ti GPU.
To assess the impact of dimensionality reduction and intelligent feature selection strategies on campaign success prediction, we conducted a series of experiments in three progressive phases. Each phase incrementally enhanced the input representation of the campaign blurbs and evaluated the impact using GBM and LSTM classifiers. Hyperparameters for the LSTM and GBM classifiers were tuned using grid search, with parameter ranges shown in
Table 1 and
Table 2. The optimal configuration was selected based on 5–fold cross-validation performance.
To ensure reproducibility in the presence of stochastic optimization components, all experiments were executed with fixed random seeds at the level of NumPy, scikit-learn, and the meta-heuristic algorithms. In addition, each meta-heuristic configuration (GA, Jaya, and ARO) was evaluated across multiple independent runs within each cross-validation fold. We observed high overlap in the selected feature subsets across runs, indicating stable convergence rather than random fluctuation.
Given the class imbalance in the Kickstarter dataset, we report not only accuracy but also F1-score and the MCC, the latter being especially informative because it accounts for all entries of the confusion matrix. All results are reported as mean ± standard deviation across folds. Where appropriate, paired tests and confidence intervals are used to compare configurations.
Throughout our evaluation, we also consider the practical implications for digital commerce: whether a model’s prediction quality, dimensionality, and efficiency make it suitable for real-time platform curation, creator feedback systems, or backer personalization. Across the three phases of experimentation, our goal is to understand how different feature representations and selection strategies influence both predictive accuracy and operational feasibility.
4.1. Phase 1: Baseline Using BERT Embeddings (No Feature Selection)
As an initial baseline, each tokenized campaign blurb was processed with a fine-tuned BERT model. Token embeddings (768 dimensions) were averaged to produce a single vector representation per campaign, which was directly fed into the GBM and LSTM classifiers without any feature selection.
Table 3 presents the results.
These baselines confirm that textual blurbs alone capture meaningful predictive signals. However, the resulting 768-dimensional vectors remain computationally expensive for large-scale deployment, and while suitable for offline analytics, this representation is less appropriate for low-latency applications such as real-time campaign evaluation or creator-facing feedback tools.
4.2. Phase 2: Meta-Heuristic Feature Selection on Raw BERT Embeddings
To better exploit BERT’s deep contextual structure, we extracted the final hidden states from the last four transformer layers for each token. Layerwise averaging produced four 768-dimensional vectors, which were concatenated into a 3072-dimensional representation per campaign.
This high-dimensional representation formed the search space for three meta-heuristic optimizers: GA, Jaya, and ARO. Each algorithm was run for
100 epochs using two objectives—F1-score and MCC—and feature selection was performed separately for GBM and LSTM classifiers. The results in
Table 4 show that all optimizers improved MCC and accuracy relative to the Phase 1 baseline, with ARO generally offering the best trade-off between performance and compactness.
Compared to the baseline (Phase 1), the best configuration (LSTM + ARO, MCC objective) improves accuracy by +1.6% and MCC by +0.041, while reducing dimensionality by over 95%. These results demonstrate that meta-heuristic pruning effectively isolates the most discriminative textual cues and reduces computational cost, making the model more suitable for scalable deployment.
4.3. Phase 3: CBAM-Autoencoded BERT with Meta-Heuristic Feature Selection
To ensure a rigorous and reproducible evaluation protocol, all preprocessing and representation-learning steps were performed strictly within each training fold. For every cross-validation split, tokenization, BERT layer extraction, tensor stacking, normalization, autoencoder training, feature selection, and classifier fitting were learned exclusively from the training data. No vocabulary statistics, normalization coefficients, or feature-selection masks were transferred across folds, thereby eliminating any temporal or statistical information leakage.
In this phase, we developed a symmetric convolutional autoencoder augmented with Convolutional Block Attention Modules (CBAMs) to perform attention-guided semantic compression of contextual BERT embeddings. The CBAM-enhanced autoencoder was trained using the Adam optimizer with a batch size of 64. Training was conducted for a maximum of 100 epochs with early stopping based on validation reconstruction loss (patience = 10) to prevent overfitting. All convolutional and transposed convolutional layers employed L2 regularization on their kernel weights to improve generalization.
Table 5 and
Table 6 present the detailed layer-wise architectures of the encoder and decoder components, respectively.
The proposed autoencoder reduces the original 196,608-dimensional input (64 × 768 × 4) to a compact 3072-dimensional latent representation (2 × 24 × 64). This compact space becomes the new search domain for meta-heuristic feature selection using GA, Jaya, and ARO.
Table 7 summarizes the classification performance obtained from the compressed representations. Across both GBM and LSTM classifiers, ARO optimized for MCC consistently yields the best trade-off between predictive performance and compactness. The best configuration (CBAM + ARO + LSTM) achieves
77.8% accuracy and 0.515 MCC using only
117 features, corresponding to a
99.94% reduction relative to the original BERT embedding space while simultaneously improving predictive performance.
4.4. Time-Aware Validation
A common source of over-optimism in crowdfunding prediction studies arises from evaluating models using random cross-validation on temporally ordered data. Such protocols may inadvertently allow future linguistic patterns to influence training, thereby inflating performance estimates. To assess temporal robustness under realistic deployment conditions, we conducted an additional time-aware evaluation using a rolling forecasting-origin scheme.
Specifically, the best-performing configuration (CBAM + ARO + LSTM) was trained on campaign descriptions from the first four months of the dataset and evaluated on campaigns launched in the subsequent month. This process was repeated across all contiguous five-month windows within the 2018 Kickstarter dataset. Importantly, only blurbs available at launch time were used; no post-launch signals such as funding velocity, update frequency, or user interactions were included at any stage of training or evaluation.
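A minimal sketch of this rolling forecasting-origin split is shown below; the launched_at column name is an assumed label for the campaign launch date in the underlying data frame.

```python
import pandas as pd

def rolling_splits(df: pd.DataFrame, date_col: str = "launched_at", train_months: int = 4):
    """Yield (train_idx, test_idx) pairs: train on four months, test on the following month."""
    periods = df[date_col].dt.to_period("M")
    months = sorted(periods.unique())
    for i in range(len(months) - train_months):
        train_idx = df.index[periods.isin(months[i:i + train_months])]
        test_idx = df.index[periods == months[i + train_months]]
        yield train_idx, test_idx
```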
The results demonstrate that the model generalizes well under realistic temporal drift conditions. Across all rolling folds, the model achieved an accuracy of 0.756, an F1-score of 0.812, and an MCC of 0.465. These scores are only marginally lower than those obtained via random 5-fold cross-validation (0.778 accuracy and 0.515 MCC), indicating that predictive performance does not rely on temporal leakage and remains robust when trained exclusively on past data to forecast future campaigns; while this evaluation does not explicitly include post-2018 data, the stability observed across rolling temporal splits suggests that the linguistic signals captured by the model are not tied to a single snapshot in time, but rather reflect broader narrative patterns common to crowdfunding campaigns.
This time-aware evaluation provides two important insights. First, linguistic and narrative cues embedded in campaign blurbs appear sufficiently stable over short time horizons to support early-stage prediction. Second, the modest performance degradation relative to random cross-validation is consistent with natural shifts in campaign style and platform dynamics, suggesting that lightweight periodic retraining (e.g., weekly or monthly) would be sufficient to maintain accuracy in operational settings. Taken together, these findings confirm that the proposed framework is well-suited for real-world deployment scenarios where predictions must be made before or immediately after campaign launch, using textual information alone. Nonetheless, extending this evaluation to campaigns launched in later years and under evolving platform dynamics remains an important direction for future work, particularly to assess the impact of changing creator behaviors, platform policies, and language trends.
4.5. Cost–Benefit and Deployment Analysis
To evaluate scalability and deployment feasibility, we compared our final pipeline against widely used transformer baselines.
Parameter Efficiency. Fine-tuned BERT contains ∼110 M parameters, and DistilBERT contains ∼66 M. Our CBAM autoencoder has fewer than 3 M parameters and the downstream 117-dimensional LSTM classifier adds fewer than 0.2 M. Thus, the full pipeline is over 30× smaller than DistilBERT and over 35× smaller than BERT.
Inference Latency. On CPU, fine-tuned BERT requires 20–30 ms per sample; DistilBERT requires 10–12 ms. LSTM using full BERT embeddings requires ∼15–20 ms. In contrast, the compressed + ARO-selected 117-dimensional representation enables sub-millisecond inference (<1 ms), providing a 30× speedup over BERT.
Memory Footprint. Fine-tuned BERT occupies ∼420 MB; DistilBERT ∼265 MB; LSTM baselines 50–100 MB. The 117-dimensional LSTM classifier occupies less than 10 MB.
In summary, the attention-guided compression and meta-heuristic selection pipeline achieves best-in-class accuracy while offering major improvements in latency, memory, and parameter footprint. These characteristics make it highly suitable for real-time deployment in platform-scale settings such as success-aware recommendation engines, automated campaign evaluation dashboards, or creator-support interfaces.
5. Discussion
This study introduced a hybrid framework that integrates fine-tuned BERT embeddings, a CBAM-enhanced convolutional autoencoder, and meta-heuristic feature selection to predict the success of crowdfunding campaigns. By explicitly targeting the challenges posed by high-dimensional textual representations, the framework advances predictive modeling in digital commerce while remaining computationally efficient enough for platform-scale deployment. Empirically, semantic compression combined with biologically inspired optimization yields consistent gains in accuracy, F1-score, and MCC, while simultaneously shrinking the feature space by more than 95% relative to raw BERT embeddings.
Building on these results, it is important to position the contribution of the proposed approach within the broader literature on crowdfunding prediction and e-commerce analytics, and to understand how each architectural component shapes performance. The following subsections first examine ablation findings, then compare the proposed framework to internal and external baselines, and finally discuss practical, theoretical, and methodological implications alongside key limitations and future research directions.
5.1. Ablation Studies
To isolate the contribution of each architectural component, we conducted a set of controlled ablation experiments using the LSTM classifier for consistent comparison. The ablations examine four factors: (i) the role of CBAM attention within the autoencoder, (ii) the benefit of learned compression versus classical dimensionality reduction, (iii) the impact of meta-heuristic feature selection, and (iv) the effect of different optimization objectives for ARO under class imbalance.
The first ablation (
Table 8) shows that replacing CBAM modules with linear dense layers reduces both accuracy and MCC while keeping the number of selected features fixed. This indicates that attention-guided compression produces more discriminative latent representations than a purely convolutional encoder, and that CBAM is not merely redundant with subsequent feature selection.
Comparing CBAM-AE to Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) (
Table 9) highlights the advantage of task-adaptive compression over generic projections. Even when the latent dimensionality is matched (3072 features), the autoencoder yields markedly higher accuracy and MCC, suggesting that it preserves contextual and sequential cues that are critical for crowdfunding success prediction.
The effect of meta-heuristic feature selection is equally clear (
Table 10). Applying ARO on top of the CBAM-compressed embeddings improves accuracy by approximately +2.6 percentage points and MCC by +0.039, while reducing the dimensionality from 3072 to 117 features (around 97% reduction). This confirms that optimization-driven refinement is crucial for both performance and compactness, rather than acting only as a computational convenience.
Finally, the comparison of ARO objectives (
Table 11) shows that optimizing for MCC yields the strongest overall discrimination under class imbalance, even at the cost of a slightly larger feature subset. This supports the choice of MCC as a primary objective and evaluation metric in imbalanced crowdfunding settings.
Taken together, the ablation results demonstrate that each component of the CBAM + ARO + LSTM pipeline contributes tangible and complementary benefits: CBAM improves semantic selectivity; learned compression outperforms PCA and UMAP; meta-heuristic selection enhances both accuracy and compactness; and MCC-based optimization better aligns the search process with the underlying success/failure imbalance.
5.2. Significance in Relation to Prior Work
To contextualize the contribution of the proposed framework, it is useful to compare its performance against both internal baselines and prior studies in the literature. Classical text-mining pipelines based on TF–IDF and linear or tree-based classifiers provide a strong yet lightweight reference point. As summarized in
Table 12, TF–IDF + SVM achieves an accuracy of 0.690, outperforming TF–IDF + Logistic Regression (0.669) and TF–IDF + GBM (0.673). These results are broadly consistent with earlier crowdfunding research that relies on shallow lexical features and highlight the predictive power of simple text statistics.
5.3. Explainability and Actionable Insights
For predictive models deployed in real-world crowdfunding platforms, performance alone is insufficient; stakeholders require interpretable and actionable insights that explain not only whether a campaign is likely to succeed, but also why such an outcome is predicted and how it may be improved. Although the proposed framework is optimized for predictive accuracy and scalability, it incorporates multiple mechanisms that support explainability at different stages of the pipeline.
First, the Convolutional Block Attention Module (CBAM) embedded within the autoencoder provides intrinsic interpretability by assigning adaptive attention weights across both channel-wise (semantic dimensions) and spatial (token–layer interactions) representations. These attention maps implicitly highlight which semantic regions of the transformer embeddings contribute most strongly to the compressed representation, enabling qualitative inspection of salient linguistic patterns.
Second, the meta-heuristic feature selection stage produces sparse and compact feature subsets, which enhances interpretability by narrowing the decision process to a limited number of highly informative dimensions. Across cross-validation folds, we observe consistent convergence toward similar subsets, suggesting that the selected features capture stable and discriminative textual cues rather than random artifacts.
Third, the downstream classifiers (LSTM and GBM) support post hoc explainability through established techniques such as SHAP or LIME. When applied to the reduced feature space, these tools can be used to trace predictions back to influential embedding dimensions and, by extension, to underlying phrases or stylistic patterns in campaign blurbs. This enables actionable feedback, such as identifying overly vague language, lack of clarity in value propositions, or insufficient emphasis on product uniqueness.
From a practical standpoint, these explainability pathways allow platform operators to provide creator-facing recommendations (e.g., improving narrative clarity or emphasizing concrete deliverables), assist backers in understanding the rationale behind risk assessments, and support transparent decision-making processes within platform governance. As such, the proposed framework balances predictive power with interpretability, aligning technical performance with the practical requirements of digital commerce ecosystems.
5.4. Practical Implications
The findings of this study have several concrete implications for stakeholders in the crowdfunding ecosystem and the broader digital commerce domain.
For platform operators, the proposed framework enables scalable, early-stage screening of campaigns using blurbs available at launch. By combining high predictive accuracy with extreme feature compression, the model can be integrated into real-time moderation, recommendation, or curation pipelines without incurring prohibitive computational cost. This capability supports improved campaign visibility allocation, risk-aware promotion strategies, and more balanced platform governance.
For campaign creators, predictive scores and language-sensitive feedback can serve as a decision-support tool during the campaign design phase. By identifying linguistic patterns associated with successful outcomes, creators can iteratively refine project descriptions prior to launch, thereby reducing the likelihood of failure and improving funding efficiency.
For investors and backers, AI-driven success predictions offer an additional signal for risk assessment in environments characterized by high uncertainty and information asymmetry; while not intended to replace human judgment, such tools can complement existing heuristics and increase transparency in decision-making.
From a deployment perspective, the attention-guided compression and meta-heuristic feature selection strategy substantially reduces memory and inference requirements, making the framework suitable for real-world implementation at platform scale. Beyond crowdfunding, the proposed approach generalizes to other digital commerce scenarios involving textual persuasion and early-stage forecasting, such as product launch evaluation, content-driven marketing campaigns, and success-aware recommendation systems.
Overall, these findings demonstrate that carefully designed AI architectures can deliver not only predictive improvements but also operational and strategic value in data-intensive digital commerce environments.
5.5. Theoretical Implications
Beyond its practical and strategic relevance, this study also advances theoretical understanding in the domains of artificial intelligence and digital commerce. First, it demonstrates that integrating attention-based embedding compression with meta-heuristic feature selection offers a viable approach to addressing the persistent challenge of high-dimensionality in text-based predictive modeling. This contributes to the broader literature on representation learning by showing how semantic compression can be systematically aligned with optimization-driven feature refinement.
Second, the findings reinforce the importance of sequence-aware models in capturing narrative and linguistic structures that drive consumer and investor behavior, extending prior work on the role of textual cues in digital decision-making. Third, by framing crowdfunding as a specialized form of e-commerce—blending financial uncertainty with emotionally charged narratives—this study advances theoretical perspectives on crowdfunding as both a commerce and investment ecosystem.
Finally, the work underscores the need to link predictive modeling approaches with theories of trust, persuasion, and platform governance, opening pathways for interdisciplinary research at the intersection of computer science, information systems, and entrepreneurship studies.
5.6. Limitations
Despite the strong empirical and computational results, several limitations should be acknowledged when interpreting the findings.
First, the dataset is restricted to Kickstarter campaigns from a single year (2018), which constrains long-term temporal generalizability. Although a rolling time-aware validation was conducted to mitigate this concern, shifts in creator behavior, platform policies, or backer preferences over extended periods may still affect model robustness. Evaluating the framework on more recent datasets remains an important direction for future work.
Second, the analysis focuses exclusively on English-language campaign blurbs. This introduces linguistic and platform-specific biases, limiting applicability to non-English or culturally distinct crowdfunding ecosystems. Extending the approach to multilingual datasets and alternative platforms such as Indiegogo or GoFundMe represents a natural extension.
Third, the model relies solely on textual campaign blurbs; while this design choice aligns with early-stage prediction scenarios, it excludes multimodal signals such as images, videos, update frequency, and social interactions, which are known to influence trust and engagement. Consequently, the current framework captures the linguistic dimension of persuasion but not the full multimodal structure of campaign success.
Fourth, although the meta-heuristic algorithms identify compact and highly discriminative feature subsets, their stochastic nature may introduce variability in the selected features across runs. To mitigate this, fixed random seeds, repeated runs, and cross-validation averaging were employed; nevertheless, further work incorporating explicit stability analysis and richer post hoc interpretability tools (e.g., SHAP-based explanations) would enhance transparency.
Finally, category-level heterogeneity across Kickstarter—such as differing baseline success rates between Technology, Arts, and Games—raises potential fairness considerations, and while project category labels were not used as direct predictive features, indirect correlations may still emerge through linguistic patterns. A systematic fairness audit, including category-conditioned performance analysis and bias-aware evaluation metrics, constitutes an important avenue for future research to ensure that predictive accuracy does not disproportionately favor dominant or mainstream campaign types.
These limitations highlight promising directions for future work, including multimodal integration, cross-platform and multilingual validation, fairness-aware modeling, and enhanced interpretability analysis.
5.7. Future Directions
Future work could extend the framework along several dimensions:
Multimodal integration: Incorporating campaign images, videos, and social interactions to capture richer signals of persuasiveness.
Cross-platform validation: Testing on other crowdfunding platforms (e.g., Indiegogo, GoFundMe) to assess generalizability.
Real-time and continual learning: Developing lightweight, on-device models and adaptive retraining strategies to reflect evolving campaign patterns.
Explainable and responsible AI: Applying SHAP, LIME, and attention visualization to enhance transparency, while addressing fairness and bias in predictive outcomes.
These directions not only address current limitations but also pave the way for scalable, responsible deployment of predictive tools in digital commerce platforms.
6. Conclusions
This study presented a hybrid AI framework for predicting the success of crowdfunding campaigns by integrating fine-tuned BERT embeddings, a CBAM-enhanced convolutional autoencoder, and meta-heuristic feature selection. The proposed architecture directly addresses the challenge of high-dimensional textual representations, delivering strong predictive performance on an imbalanced Kickstarter dataset while substantially reducing computational footprint. In its best configuration, the model achieves 0.778 accuracy, 0.821 F1-score, and 0.515 MCC using only 117 features, and maintains robust performance under time-aware validation, indicating that its gains are not solely an artefact of random cross-validation.
The empirical analyses yield several key insights. First, CBAM-powered compression reduces the original BERT tensor to a 3072-dimensional latent space and, after feature selection, to 117-dimensional representations, while improving accuracy and MCC relative to both raw embeddings and classical dimensionality reduction (PCA, UMAP). Second, ARO consistently produces more compact and discriminative feature subsets than GA and Jaya, particularly when optimized with an imbalance-aware objective. Third, LSTM-based classifiers systematically outperform GBM across all phases, underscoring the importance of sequence-aware architectures for capturing narrative structure in campaign blurbs. Ablation studies further confirm that each component—CBAM, learned compression, meta-heuristic selection, and MCC-based optimization—contributes distinct and complementary gains.
Beyond methodological contributions, the framework demonstrates practical relevance for digital commerce. Its compact architecture and sub-millisecond inference times make it suitable for platform-scale deployment in real-time scoring pipelines, enabling creators to receive early feedback on campaign narratives, backers to better assess risk, and platforms to enhance curation, recommendation, and coaching services. At the same time, the model’s design is generic enough to be adapted to related tasks such as product launch forecasting, promotion optimization, and success-aware personalization in broader e-commerce contexts.
Overall, this work contributes both technical innovation and applied value: it shows how attention-guided compression and meta-heuristic feature selection can be combined to reconcile expressiveness, interpretive potential, and efficiency in text-based prediction, thereby supporting more informed and scalable decision-support tools in crowdfunding and beyond.