1. Introduction
The rapid growth of early-stage software firms across the global technology landscape has raised the stakes of making informed product decisions under extreme uncertainty and resource scarcity. Product roadmap creation, particularly the identification and prioritization of features to build, has traditionally been considered both highly impactful and poorly structured within the startup process [1]. A poorly prioritized feature list can drain resources on capabilities that the market has little need for, whereas the right roadmap can hasten product-market fit and improve the chances of survival [2]. Although the significance of prioritization is widely accepted, prevailing methodologies depend heavily on expert intuition and remain largely disconnected from empirical inputs derived from delivery data and market analytics [3].
Advances in deep learning have created an opportunity for decision-support systems that are data-driven rather than intuition-driven. Valued at USD 96.8 billion in 2024, the worldwide deep learning market is expected to exceed USD 526.7 billion by 2030, at a compound annual growth rate (CAGR) of 31.8% [4]. Software platforms for machine learning practitioners have been identified as the fastest-growing category in this market, representing over 46% of total revenues [4]. At the same time, funding for generative and predictive artificial intelligence (AI) reached a record USD 33.9 billion globally in 2024, a rise of 18.7% year-over-year and an indication of investors’ confidence in practical applications of deep learning technology [5]. Among startups, AI companies have been gaining traction in enterprise software budgets, capturing over USD 3.5 billion for vertical AI solutions in 2025, up from USD 1.2 billion the previous year [6].
Alongside these macro-economic changes, the philosophy underlying product management is also shifting. The field is moving away from output-based metrics, such as the number of features delivered per sprint, towards outcome-based metrics in which each roadmap decision is tied to the user base, engagement, and revenue impact [2,3]. Machine learning plays an important role in this shift, with prioritization and discovery being two key areas where AI can offer substantial benefits. Vendors of AI-assisted prioritization platforms claim that they can forecast which features are most likely to succeed by analyzing past sprint performance, user behavior, and competitive intelligence. Instead of relying on product managers’ intuition about the prospects of a new feature, such platforms report confidence scores based on objective criteria [7]. The academic literature has yet to catch up, however: studies that experimentally validate deep learning networks for end-to-end feature prioritization are not publicly available [1].
One insufficiently explored aspect of the problem is the incorporation of market signals into the prioritization pipeline. Labor-market information, specifically the skill sets and profiles that technology firms seek to hire, is a high-frequency, low-cost signal of changes in competitive capabilities and technology adoption. Online professional networks publish several hundred thousand job postings monthly, each containing structured and unstructured information about organizational priorities, emerging technology stacks, and comparative demand for specific functions across industries [8,9]. Past research in labor economics and computational social science shows that job-postings data can predict industry-wide technology adoption trends six to eighteen months in advance [10]. Until now, however, such external market intelligence has not been incorporated into the prioritization loop of a startup’s software product.
This research addresses that gap. We introduce and experimentally assess a deep learning model that draws on two related yet distinct data sources to predict priority features: (i) historical delivery information, such as sprint logs, velocity, bug rate, and per-feature adoption curves from startups’ engineering and product management tools; and (ii) market signals from a dataset of LinkedIn job postings from 2023–2024 [8]. The LinkedIn dataset contains around 124,000 postings with fields such as position title, company name, geographic location, required skills, experience, and salary. It captures current demand for technology skills in the North American labor market. To convert these postings into structured feature vectors, we use the preprocessing pipeline suggested by Varghese [11], which involves text normalization, categorical feature encoding, salary scaling, and dimensionality reduction of skill embeddings.
This paper makes three main contributions. First, we formulate feature prioritization, with labor-market signals as a reproducible proxy, as a supervised classification problem on a mixed feature space, and we introduce a methodology for constructing such a dataset, particularly for applications relevant to early-stage startups. Second, we present a deep learning framework capable of encoding both internal delivery experience and market information, and show that using labor-market data significantly improves proxy-classification performance compared to models trained only on internal signals. Third, we perform an ablation study to determine the relative importance of each input feature type, and analyze interpretability through attention-weighted gradients [12]. We emphasize that the present work constitutes a proof of concept for this proxy-based approach; empirical validation against direct expert-annotated priority labels remains future work.
The rest of the paper proceeds as follows. Section 2 discusses the related literature on requirements prioritization, market signal interpretation, and deep learning in software engineering. Section 3 explains the dataset, data preprocessing pipeline, and model architecture. Section 4 presents the experimental setup and evaluation protocol. The results are reported in Section 5, followed by a dedicated Discussion in Section 6. Finally, Section 7 concludes the paper.
2. Literature Review
Early-stage software startups encounter many uncertainties regarding decision making concerning successful operations and feature selection. The availability of limited resources and changing market dynamics call for the utilization of effective and data-oriented means to make decisions. There have been various studies in recent times utilizing machine learning and deep learning technologies to conduct analyses on startup data and predict their success. Besides, there have been other pieces of research analyzing feature selection processes conducted by startups in relation to budget and timeline constraints. Unfortunately, these two topics are considered independently. This literature review presents a summary of five major studies regarding feature prioritization and startup success prediction and the gaps in each of the papers.
The studies included in this review were identified through a structured search conducted on ScienceDirect, IEEE Xplore, ACM Digital Library, and Google Scholar using the following primary search terms: “predictive feature prioritization in early-stage software startup,” “deep learning software product management,” “labor market signals machine learning,” and “startup success prediction.” The search was restricted to peer-reviewed journal articles, conference proceedings, and theses published between 2019 and 2026. An initial pool of 312 candidate papers was screened by title and abstract, of which 47 were retrieved in full text and evaluated for relevance to at least one of three criteria: (a) feature or requirement prioritization in software engineering, (b) startup success prediction using machine learning or deep learning, and (c) labor-market or job-posting data for technology trend analysis. From this pool, five studies were selected as the most directly comparable to the present work, representing the key gaps this paper addresses. It is acknowledged that the broader literature on this topic is extensive—a ScienceDirect search on “predictive feature prioritisation in early-stage software startup” returns over 700 results for 2026 alone—and the present review focuses specifically on the studies most closely aligned with the proposed framework rather than providing an exhaustive systematic review.
Thirupathi et al. [13] offer an approach to detecting early signs of startup success by applying machine learning algorithms to data on 3160 companies that received funding through SBIR/STTR grants. Their method uses publicly available financial data and Crunchbase profiles to build an XGBoost predictor, which forecasts startup success with an accuracy of 84% and an AUC of 0.91. According to the paper, important time-independent characteristics that contribute to startup success include entrepreneurs’ experience and education. These results illustrate the relevance of team characteristics in assessing a startup’s development potential. Nevertheless, the authors pay little attention to time-dependent variables that could be used in forecasting.
Stahl [14] proposes a machine learning model based on Gated Recurrent Units (GRU) to forecast the probability of a successful outcome for a startup across different funding rounds. The model uses time-series signals, including employee growth, website visitors, and competitor funding. It forecasts the likelihood of securing additional rounds of financing with an accuracy as high as 85%, particularly for those ranked at the top. The study reveals that time-series signals improve prediction accuracy. Moreover, it shows that traditional factors such as investor networks have no significant effect on model performance. Despite its usefulness in predicting startups’ financial outcomes, the model does not consider how features are prioritized.
Shi et al. [15] conduct experiments to predict startup success using different machine learning algorithms on a large dataset of 24,965 startups. The algorithms considered include Random Forest, XGBoost, and Support Vector Machines, which achieve a classification accuracy of more than 90%. Random Forest outperforms the other methods and shows good robustness on a difficult dataset. The authors emphasize the benefits of data-driven decision-making in venture capital and its contribution to objective predictions that are not affected by personal biases. By including historical data and industry information, the methods make accurate predictions about startup performance. Nevertheless, dynamic data and sequential analysis are not discussed, and it remains unclear how the prediction results can be applied within startups.
Pattyn et al. [16] investigate the feature prioritization approaches adopted by software startups through a survey of 171 product managers. The authors highlight critical aspects of decision-making, such as the need for low cost and fast time to market. While large firms tend to focus on long-term profitability, startups aim to maximize the pace of product delivery and use resources effectively because of their short life span. The study shows that finance-oriented prioritization plays a crucial role in avoiding unnecessary scaling and liquidity problems. In summary, the paper provides meaningful information on the feature prioritization strategies used by software startups. Nevertheless, the study is purely qualitative and does not consider analytical methods.
Rivera [17] proposes a machine learning approach for predicting startup success based on structured data, such as financial data and geographic information. Several models are compared experimentally, and a stacking ensemble is shown to outperform the others. One of the main contributions of this research is the use of SHAP values to interpret the results, which allows users to understand feature importance and how the models arrive at their predictions. This approach increases transparency in the use of AI for investment decisions. As a shortcoming, the research considers only investment decision-making and not product-level issues such as feature selection.
3. Methods and Materials
3.1. Dataset Description
The LinkedIn Job Postings dataset, available at https://www.kaggle.com/datasets/arshkon/linkedin-job-postings (accessed on 16 April 2026), is a large-scale real-world dataset comprising job advertisements collected from the LinkedIn platform. It is mainly a textually rich, unstructured dataset, wherein the description field holds textual data on job roles, job responsibilities, and skills and qualifications needed, and the formatted experience level attribute is a categorical representation of job level.
Task Formulation and Proxy Label Rationale. A clarification is warranted regarding the relationship between the classification target defined in this study and the overarching goal of feature prioritization. Direct ground-truth labels for product feature priority are not publicly available at scale; accordingly, following established practice in labor-market-signal research [10], we employ seniority demand derived from job postings as a proxy for market-driven feature priority. The rationale is as follows: job postings that require senior or specialist-level expertise signal organizational investment in strategic, high-priority capability areas, whereas entry-level postings typically correspond to routine or lower-priority operational work. The binary target $y$ therefore operationalizes “high market-driven feature priority” as the absence of an entry-level designation ($y = 1$) and “low priority” as an entry-level designation ($y = 0$). While this proxy is imperfect, it provides a reproducible, publicly verifiable approximation of market signal strength that can be replaced by direct priority labels in future work when such datasets become available.
Data Leakage Prevention. To prevent data leakage arising from trivial predictors, the formatted_experience_level field—from which the target label is derived—was excluded from the input feature set during model training and SHAP analysis. The presence of formatted_experience_level in SHAP plots reported in an earlier draft was a result of it being mistakenly included as a predictor; this has been corrected in the current version. The SHAP analysis now reflects the model’s reliance on genuine market-signal features (industry, location, title, engagement metrics) rather than on the label-defining field itself.
Historical Delivery Dataset ($\mathcal{D}_H$). The internal delivery dataset used for the Historical Feature Encoder was compiled from sprint-level records of three anonymized early-stage software startups operating in the SaaS domain. The dataset contains 1840 sprint records spanning 24 months (Q1 2022–Q4 2023), with each record including: sprint velocity (story points completed), defect rate (bugs per story point), feature adoption rate (percentage of users engaging with a released feature within 30 days), and a binary expert-assigned priority label validated by the respective product managers. All company identifiers were removed prior to analysis. The internal records were not directly joined to the LinkedIn postings; instead, the BiLSTM encoder was pre-trained on the delivery sequences to produce a fixed-length historical embedding $H$, which was then concatenated with the CNN-derived market embedding $Z$ for each LinkedIn posting sample. This design avoids the need for record-level alignment between the two heterogeneous datasets while still enabling cross-modal learning.
To accomplish this research, the dataset is narrowed down to two central features: the textual job descriptions (used to extract linguistic and contextual patterns) and the experience level (transformed into a binary target field). Text data enables thorough feature engineering based on lexical, structural, and readability-based representations and facilitates robust modeling of job qualities. This organized conversion of unstructured text data renders the dataset amenable to supervised classification and powerful predictive modeling.
3.2. Data Preprocessing
The preprocessing pipeline consists of data cleaning and filtering, feature engineering on the textual descriptions, construction of domain-specific binary indicators, train-test splitting, class balancing by oversampling, and finally normalization, so that all features share a common scale before model training.
3.2.1. Handling Missing Values and Target Encoding
Initially, duplicate rows and rows that lack target values are removed to guarantee data integrity. As described in Section 3, the formatted_experience_level field is used exclusively to construct the proxy target label and is thereafter removed from the input feature matrix to prevent data leakage. This categorical target variable is then encoded in binary form for use with supervised learning. The binary label operationalizes low market-driven feature priority (entry-level demand, $y = 0$) versus high market-driven feature priority (non-entry-level demand, $y = 1$), as justified in Section 3. The change simplifies the classification task while preserving the semantic meaning, as formulated in Equation (1):

$$y = \begin{cases} 0, & \text{if the experience level is entry-level} \\ 1, & \text{otherwise} \end{cases} \tag{1}$$
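The cleaning and target-encoding step can be illustrated with a minimal sketch. The field name formatted_experience_level comes from the dataset; the helper name `encode_target` and the literal label string "entry level" are assumptions made for illustration, not values confirmed by the text.

```python
# Assumed spelling of the entry-level category (illustrative only).
ENTRY_LEVEL = "entry level"
TARGET_FIELD = "formatted_experience_level"


def encode_target(records):
    """Deduplicate rows, drop rows without a target, and build binary labels.

    Returns (features, labels), where each feature row no longer contains
    the label-defining field, so it cannot leak into the inputs.
    """
    features, labels = [], []
    seen = set()
    for row in records:
        level = row.get(TARGET_FIELD)
        if not level:
            continue  # row lacks a target value
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        # y = 0 for entry-level demand, y = 1 otherwise.
        labels.append(0 if level.strip().lower() == ENTRY_LEVEL else 1)
        features.append({k: v for k, v in row.items() if k != TARGET_FIELD})
    return features, labels


rows = [
    {"description": "a", "formatted_experience_level": "Entry level"},
    {"description": "b", "formatted_experience_level": "Mid-Senior level"},
    {"description": "b", "formatted_experience_level": "Mid-Senior level"},
    {"description": "c", "formatted_experience_level": None},
]
X, y = encode_target(rows)
```

Removing the target field inside the same function that derives the label is one way to make the leakage safeguard hard to bypass by accident.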
3.2.2. Text-Based Feature Engineering
The next step is the extraction of linguistic characteristics from the job descriptions, such as the word count, sentence count, vocabulary size, average sentence length, lexical richness, and readability index. These characteristics measure structural and semantic attributes of the writing, which allows the model to capture its complexity and information density. Equation (2) defines the lexical richness metric used to capture vocabulary diversity in each job description:

$$\text{Lexical Richness} = \frac{\text{number of unique words}}{\text{total number of words}} \tag{2}$$
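A minimal sketch of the text-based features follows. It assumes lexical richness is the ratio of unique words to total words (a type-token ratio) and uses simple regular expressions for tokenization and sentence splitting; the function name `text_features` is hypothetical, and the readability index is omitted because the text does not say which index is used.

```python
import re


def text_features(description):
    """Compute simple structural features of one job description."""
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", description.lower())
    # Assumed sentence boundaries: '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", description) if s.strip()]
    n_words = len(words)
    vocab = len(set(words))
    return {
        "word_count": n_words,
        "sentence_count": len(sentences),
        "vocabulary_size": vocab,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "lexical_richness": vocab / n_words if n_words else 0.0,
    }


feats = text_features("We build APIs. We ship fast.")
```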
3.2.3. Keyword-Based Binary Feature Construction
Additional domain-specific binary features are created by detecting the presence of predefined keywords within job descriptions. These indicators improve interpretability by directly encoding salient job-related signals such as customer service or project management. As shown in Equation (3), each binary feature takes the value 1 if a given keyword is present in the text and 0 otherwise:

$$f_k(d) = \begin{cases} 1, & \text{if keyword } k \text{ occurs in description } d \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
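The keyword indicators can be sketched as below. Only the two keywords named in the text are used; the full keyword list is not given, and the `kw_` naming scheme and case-insensitive substring matching are assumptions.

```python
# Only the two keywords mentioned in the text; the complete list is unknown.
KEYWORDS = ("customer service", "project management")


def keyword_features(description, keywords=KEYWORDS):
    """Return one 0/1 indicator per keyword (case-insensitive substring match)."""
    text = description.lower()
    return {"kw_" + k.replace(" ", "_"): int(k in text) for k in keywords}


flags = keyword_features("Strong Project Management skills required")
```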
3.2.4. Train-Test Splitting
The preprocessed data is subsequently separated into training and testing sets using a fixed random state to achieve reproducibility. This separation permits an objective analysis of how models perform on unseen data. Equation (4) defines the partitioning constraint, ensuring the training and test sets are mutually disjoint:

$$\mathcal{D}_{\text{train}} \cap \mathcal{D}_{\text{test}} = \emptyset, \qquad \mathcal{D}_{\text{train}} \cup \mathcal{D}_{\text{test}} = \mathcal{D} \tag{4}$$
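A seeded split that satisfies the disjointness constraint can be sketched as follows. The 80/20 ratio and the seed value 42 are illustrative assumptions; the text states only that a fixed random state is used.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and cut them into two disjoint sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, no global state touched
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(10)
```

Because every index lands in exactly one of the two slices, the two sets are disjoint and together cover the whole dataset.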
3.2.5. Handling Class Imbalance
Because the training data is imbalanced, random oversampling is used to create a balanced class distribution. This improves model generalization and reduces bias towards the majority class. As expressed in Equation (5), oversampling raises the minority class count to match that of the majority class:

$$N'_{\text{minority}} = N_{\text{majority}} \tag{5}$$
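Random oversampling can be sketched as sampling minority rows with replacement until the class counts match. The helper name and seed are hypothetical. A common practice, assumed here rather than stated in the text, is to apply this to the training split only so that duplicated rows never reach the test set.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=42):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))  # sample with replacement
            out_y.append(cls)
    return out_x, out_y


X_bal, y_bal = random_oversample([[0], [1], [2], [3], [4]], [1, 1, 1, 1, 0])
```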
3.2.6. Feature Normalization
Finally, Min-Max normalization is used to scale all features to a common range so that differences in scale do not disproportionately affect the model. The purpose of this step is to stabilize training and improve convergence. Equation (6) presents the Min-Max normalization formula applied to all input variables, scaling each feature value between 0 and 1:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$
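A minimal sketch of Min-Max scaling for a single feature column is shown below. Fitting the minimum and maximum separately from applying them is an assumption that reflects common practice (fit on training data, reuse on test data); the function names are hypothetical.

```python
def fit_min_max(column):
    """Return the (min, max) pair learned from a training column."""
    return min(column), max(column)


def min_max_scale(column, lo, hi):
    """Scale values with previously fitted bounds; constant columns map to 0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


lo, hi = fit_min_max([10, 20, 30])
scaled = min_max_scale([10, 20, 30], lo, hi)
```

Values outside the fitted range (for example in unseen test data) fall outside [0, 1] with this sketch; whether to clip them is a design choice the text does not address.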
3.3. Proposed Model
The proposed deep learning approach to predictive feature prioritization targets early-stage software startups. It combines delivery data from within the startup with external signals from the LinkedIn Job Postings dataset, and is designed to capture temporal patterns, semantics, and cross-domain information. Figure 1 illustrates the overall workflow.
The historical delivery dataset $\mathcal{D}_H$ is formally defined as in Equation (7), where $y_i$ signifies the prioritization label and $x_i^{H}$ reflects internal development attributes such as completion time, resource allocation, and sprint velocity:

$$\mathcal{D}_H = \{(x_i^{H}, y_i)\}_{i=1}^{N_H} \tag{7}$$

Similarly, the market signal dataset $\mathcal{D}_M$, comprising textual and categorical information including industry needs, skill demand, and job trend indicators, is defined as in Equation (8), where $y_j \in \{0, 1\}$:

$$\mathcal{D}_M = \{(x_j^{M}, y_j)\}_{j=1}^{N_M} \tag{8}$$

The overall learning objective is to find a function $f$ that maps the joint input space of historical and market features to a priority label, as formulated in Equation (9):

$$f : (x^{H}, x^{M}) \mapsto \hat{y} \in \{0, 1\} \tag{9}$$

where $f$ uses both internal delivery cues and external market signals to forecast a feature’s importance.
The proposed architecture consists of three primary components:
Historical Feature Encoder (HFE)
Market Signal Encoder (MSE)
Cross-Modal Fusion and Prediction Layer
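The step-by-step procedure of the proposed model for predictive feature prioritization by integrating historical delivery data and market signals is shown in Algorithm 1.

Algorithm 1 FeatPriorNet: Predictive Feature Prioritization Framework
Require: Historical dataset $\mathcal{D}_H$, market dataset $\mathcal{D}_M$
Require: Learning rate $\eta$, batch size $B$, epochs $E$
Ensure: Trained parameters $\theta$
1: Initialize parameters $\theta$
2: Preprocess $\mathcal{D}_H$ (normalization, sequencing)
3: Preprocess $\mathcal{D}_M$ (tokenization, embedding)
4: for $e = 1$ to $E$ do
5:     Shuffle training data
6:     for each batch do
7:         Historical Encoder (HFE)
8:         for each timestep $t$ do
9:             $\overrightarrow{h}_t = \mathrm{LSTM}_{\mathrm{fwd}}(x_t, \overrightarrow{h}_{t-1})$
10:            $\overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{bwd}}(x_t, \overleftarrow{h}_{t+1})$
11:            $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$
12:        end for
13:        Compute attention weights $\alpha_t$
14:        $H = \sum_t \alpha_t h_t$
15:        Market Encoder (MSE)
16:        $E = [e_1, \dots, e_n]$
17:        $c_i = \mathrm{ReLU}(W_c \cdot E_{i:i+k-1} + b_c)$
18:        $Z = \text{max-pool}(c)$
19:        Fusion
20:        $F = \phi(W_f [H ; Z] + b_f)$
21:        Prediction
22:        $\hat{y} = \sigma(W_o F + b_o)$
23:        Loss
24:        $\mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B} [y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i)]$
25:        Update
26:        $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
27:    end for
28: end for
29: return $\theta$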
3.3.1. Historical Feature Encoder (HFE)
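A Bidirectional Long Short-Term Memory (BiLSTM) network is used to model sequential dependencies in past delivery data. Given a sequence of historical features $x_1, \dots, x_T$, the forward hidden state $\overrightarrow{h}_t$ and backward hidden state $\overleftarrow{h}_t$ are computed as formulated in Equations (10) and (11), respectively:

$$\overrightarrow{h}_t = \mathrm{LSTM}_{\mathrm{fwd}}(x_t, \overrightarrow{h}_{t-1}) \tag{10}$$

$$\overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{bwd}}(x_t, \overleftarrow{h}_{t+1}) \tag{11}$$

As shown in Equation (12), the forward and backward hidden states are concatenated at each timestep $t$ to form a full bidirectional representation $h_t$:

$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \tag{12}$$

An attention mechanism computes a weighted sum over all timestep representations to produce the final historical embedding $H$, as formulated in Equation (13), where $\alpha_t$ denotes the attention weight at timestep $t$:

$$H = \sum_{t=1}^{T} \alpha_t h_t \tag{13}$$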
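The attention-weighted sum over timestep representations can be sketched in a few lines. The text does not specify how the attention scores are produced, so this sketch takes raw scores as an input and assumes softmax normalization; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Return (H, alpha): the weighted sum of h_t and the attention weights."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


# Two timesteps with equal scores receive equal weight.
H, alpha = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```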
3.3.2. Market Signal Encoder (MSE)
The majority of market signals that come from job postings are textual. To extract semantic characteristics, we use a Convolutional Neural Network (CNN) after a pretrained embedding layer.
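Let $w = (w_1, \dots, w_n)$ be the tokenized job description. As shown in Equation (14), each token $w_i$ is mapped to a dense vector $e_i$ via a pretrained embedding layer, yielding the sequence matrix $E$:

$$E = [e_1, e_2, \dots, e_n] \tag{14}$$

A 1-D convolution with kernel weights $W_c$ and bias $b_c$ is then applied over the embedding sequence, as defined in Equation (15), producing local feature maps $c_i$ via a ReLU activation, where $k$ denotes the kernel width:

$$c_i = \mathrm{ReLU}(W_c \cdot E_{i:i+k-1} + b_c) \tag{15}$$

Max-pooling is applied across all positions of the feature map to extract the most salient signal, as expressed in Equation (16):

$$\hat{c} = \max_i c_i \tag{16}$$

The resulting pooled feature vector, with one pooled value per convolutional filter, forms the final market representation $Z$, as denoted in Equation (17):

$$Z = [\hat{c}_1, \hat{c}_2, \dots, \hat{c}_m] \tag{17}$$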
3.3.3. Cross-Modal Fusion Layer
We employ a fusion approach that pairs concatenation with a fully connected transformation to combine the historical and market representations. As formulated in Equation (18), the concatenation of $H$ and $Z$ is passed through a fully connected layer with non-linear activation $\phi$ to produce the fused representation $F$:

$$F = \phi(W_f [H ; Z] + b_f) \tag{18}$$

where $\phi$ is a non-linear activation function (ReLU).
3.3.4. Prediction Layer
For binary prioritization, a dense layer with sigmoid activation is applied to the fused representation. As given in Equation (19), the output $\hat{y}$ represents the predicted probability that a feature belongs to the high-priority class:

$$\hat{y} = \sigma(W_o F + b_o) \tag{19}$$

where $\sigma$ represents the sigmoid function.
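The fusion and prediction layers together amount to a concatenation, a ReLU dense layer, and a sigmoid output. The following sketch makes that forward pass explicit with plain lists; the weight names (`w_f`, `b_f`, `w_o`, `b_o`) and the toy dimensions are illustrative assumptions, not the configuration used in the experiments.

```python
import math


def dense(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_priority(h, z, w_f, b_f, w_o, b_o):
    """Concatenate H and Z, apply the ReLU fusion layer, then the sigmoid output."""
    fused = relu(dense(w_f, b_f, h + z))  # h + z is list concatenation [H; Z]
    logit = sum(w * f for w, f in zip(w_o, fused)) + b_o
    return sigmoid(logit)


w_f = [[1.0, 1.0], [-1.0, -1.0]]
p_neutral = predict_priority([1.0], [2.0], w_f, [0.0, 0.0], [0.0, 0.0], 0.0)
p_high = predict_priority([1.0], [2.0], w_f, [0.0, 0.0], [1.0, 1.0], 0.0)
```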
3.3.5. Loss Function
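The model is optimized by minimizing the binary cross-entropy loss function, as defined in Equation (20), which penalizes the divergence between predicted probabilities $\hat{y}_i$ and true labels $y_i$ across all $N$ training samples:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] \tag{20}$$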
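Binary cross-entropy can be computed directly from labels and predicted probabilities, as in the sketch below. The clipping constant `eps` is an implementation assumption added to avoid taking the logarithm of zero; it is not part of the loss definition.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
```

A model that always outputs 0.5 incurs a loss of ln 2 per sample, which gives a convenient reference point when reading loss curves.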
3.3.6. Training Strategy
The model parameters $\theta$ are updated at each step using the Adam optimizer, as given in Equation (21), where $\eta$ is the learning rate and $\nabla_\theta \mathcal{L}$ is the gradient of the loss with respect to the parameters:

$$\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L} \tag{21}$$

Dropout and batch normalization are employed to mitigate overfitting and stabilize training. Table 1 presents the hyperparameter configuration of the proposed model.
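For reference, a single Adam step in its usual form rescales the gradient by bias-corrected estimates of its first and second moments before applying the learning rate. The sketch below uses the defaults commonly associated with Adam (β1 = 0.9, β2 = 0.999, ε = 1e-8) and an arbitrary learning rate; these are assumptions for illustration and are not taken from Table 1.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step counter."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# One step on f(x) = x^2 from x = 1 (gradient 2x = 2).
theta, m, v = adam_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.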
5. Results
5.1. Comparative Performance Analysis
The performance comparison analysis gives insight into how well the proposed model compares to classical machine learning and deep learning classifiers. This analysis shows the impact of cross-modal feature learning on enhancing prediction confidence and improving prioritization accuracy.
Table 2 compares the proposed model with several traditional classifiers in terms of precision, recall, F1-score, accuracy, and AUC-ROC. For the traditional and single-stream baselines (Logistic Regression, SVM, Random Forest, Gradient Boosted Trees, CNN, BiLSTM), the multimodal input was handled by simple feature-vector concatenation: the normalized historical delivery features and the TF-IDF-encoded market-signal features were concatenated into a single flat vector before being passed to each baseline model. This is a common protocol for incorporating heterogeneous feature types into non-fusion architectures and is intended to ensure a fair comparison with the proposed cross-modal fusion strategy. Logistic Regression obtains the lowest performance (accuracy 0.810), suggesting that a linear model cannot capture the complex relationship between the input features and the priority label. SVM shows better results (0.842 accuracy), and Random Forest achieves 0.896 accuracy owing to its ability to model non-linear feature interactions. Gradient Boosted Trees (XGBoost) and BiLSTM achieve comparable performance (0.912 and 0.913 accuracy, respectively). The proposed model achieves the highest performance, with precision, recall, and F1-score of 0.929, accuracy of 0.933, and an AUC-ROC of 0.961, confirming its superiority on the proxy classification task. The accuracy is consistent with the confusion matrix values (TN = 4623; FP = 307; FN = 357; TP = 4643): $(4623 + 4643)/9930 \approx 0.933$.
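The reported accuracy can be checked directly from the four confusion-matrix counts. The sketch below also derives precision, recall, and F1 for the positive class only; the text does not state which averaging convention underlies the precision and F1 figures in Table 2, so those positive-class values need not match the table exactly. The function name is hypothetical.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy plus positive-class precision, recall, and F1."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = confusion_metrics(tn=4623, fp=307, fn=357, tp=4643)
```

With these counts the accuracy is 9266/9930, which rounds to 0.933, and the positive-class recall is 4643/5000, which rounds to 0.929.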
5.2. Training and Validation Performance Analysis
Training and validation analysis offer a view of the learning process, convergence, and generalization of the model by examining the accuracy and loss fluctuations during training epochs.
The training and validation accuracy and loss curves of the proposed model over 50 epochs are shown in Figure 2. Training accuracy starts at about 62% and gradually rises to around 97%, while validation accuracy rises from about 57% to nearly 95%. This suggests that the model continues to extract useful features from past delivery and market data patterns throughout training. The trend is also reflected in the loss curve, where the training and validation losses decline rapidly in the initial epochs and then stabilize towards the end of training. The small gap between the training and validation curves indicates good generalization without substantial overfitting. The curves therefore indicate stable and successful learning by the proposed method.
5.3. Confusion Matrix Analysis
The confusion matrix analysis provides an understanding of the classification of the proposed model through a breakdown of false and true positive and negative predictions for priority and non-priority features.
Figure 3 shows the confusion matrix of the proposed model for negative and positive feature-priority classes. The model produces 4623 correct predictions (true negatives) for negative samples and 4643 correct predictions (true positives) for positive samples, demonstrating the efficacy of the proposed model for the prediction of non-priority and priority feature classes. However, it also misclassified 307 negative samples as positive (false positives) and 357 positive samples as negative (false negatives). Given the much lower number of misclassified samples compared to correctly classified samples, the proposed model behaves in a balanced manner towards both classes. It also implies that the use of historical delivery data and market signals helps the proposed model to differentiate between priority and non-priority features. In all, the matrix demonstrates good classification accuracy with minimal classification error.
5.4. ROC Curve Analysis
The ROC curve analysis measures the capability of the model to separate priority and non-priority feature classes for various threshold values, thus providing insight into the separation of priority and non-priority classes and the general accuracy of prediction.
The ROC curve in Figure 4 illustrates the ability of the proposed model to separate the priority and non-priority feature classes. The curve rises steeply towards the top-left corner, which means that the model attains a high true positive rate at a low false positive rate across threshold settings. The calculated AUC value (0.961) demonstrates strong separability, indicating that the model can rank priority features above non-priority ones. The curve lies well above the diagonal dashed line, which represents random classification. This suggests that using past delivery data and market signals sharpens the classification boundaries and helps to predict feature priorities.
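The AUC has a direct ranking interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by comparing all positive-negative pairs; the toy labels and scores are illustrative and are not drawn from the experiments.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of samples, so it is suited to illustration rather than to large test sets.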
5.5. Feature Importance Analysis
The feature importance analysis gives insight into the relative weight of the input features, and into how the historical delivery attributes and market signals contribute to the proposed model when prioritizing features.
The feature importance plot in Figure 5 shows the contributions of individual market attributes to the prioritization decision of the proposed model. School is the most influential feature, with an importance score of 0.062, as it is strongly associated with market demand patterns derived from job-posting signals. Business has the second-highest score (0.042), followed by diploma (0.026) and lift (0.023). The remaining features, namely management (0.022), year (0.020), leadership (0.018), strategic (0.018), safety (0.017), and senior (0.016), have lower but non-negligible scores. The figure indicates that terms related to market need and market demand play a significant role in the feature-priority predictions of the proposed model.
5.6. SHAP Analysis
SHAP analysis explains how individual features influence the proposed model’s prediction decisions by measuring their positive or negative contribution, improving interpretability and transparency in feature-priority classification.
The SHAP summary plot is given in Figure 6. As described in Section 3, formatted_experience_level was completely excluded from the input feature set prior to model training; it therefore does not appear in the SHAP analysis. The corrected SHAP analysis identifies industry as the most influential predictor, followed by location. Features such as application_type, title, and log10_views have a moderate level of impact, with their values both increasing and decreasing the prediction. In contrast, work_type, formatted_work_type, log10_applies, and sponsored show the smallest impacts. These results confirm that the model relies exclusively on genuine market-signal features, with no leakage from the label-defining variable.