Article

Predicting Patent Life Using Robust Ensemble Algorithm

1 Department of Computer Science, Graduate School, Yonsei University, Wonju 26493, Republic of Korea
2 Division of Software, Yonsei University, Wonju 26493, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(21), 9658; https://doi.org/10.3390/su17219658
Submission received: 2 September 2025 / Revised: 24 October 2025 / Accepted: 28 October 2025 / Published: 30 October 2025
(This article belongs to the Special Issue Innovation and Strategic Management in Business)

Abstract

Increasing macroeconomic uncertainty necessitates that firms optimize their R&D investment and commercialization strategies. Patents, as crucial outcomes of R&D with legal protection, impose significant costs due to progressively increasing maintenance fees. Accurately predicting patent life is therefore critical for effective patent management. Previous studies have primarily employed classification models for patent life prediction, whose coarse granularity limits practical utility. This study proposes a robust ensemble regression model combining multiple machine learning techniques, such as Random Forest and deep neural networks, to directly predict patent life. The proposed model surpassed the individual baseline models, recording a Mean Absolute Error (MAE) of approximately 852.81. Additional validation with active patents further demonstrated the model's practical feasibility, showing its potential to support sustainable intellectual property management by predicting longer life for high-quality patents that are currently maintained. Consequently, the proposed model provides established firms and startups with a decision support tool for strategic patent maintenance and commercialization decisions. By promoting efficient allocation of R&D resources and reducing unnecessary maintenance of low-value patents, the approach fosters sustainable management of innovation assets, enhancing predictive accuracy and long-term applicability.

1. Introduction

Since the fourth industrial revolution, rapidly transforming industrial ecosystems have continually demanded innovation capabilities from companies. Consequently, corporate investment in research and development (R&D) to secure competitiveness has been steadily increasing, alongside the growing demand to evaluate the appropriateness and innovativeness of R&D outcomes. In particular, patents are recognized as representative data that encompass information regarding a company's R&D activities and provide insights into both technological changes and sustainable innovation [1]. Patent quality is closely related to R&D activities and reflects their overall results [2]. Consequently, patents have been used as key metrics for evaluating R&D performance [3]. Thus, the evaluation of patent quality has emerged as a critical issue, with numerous studies focusing on utilizing the bibliographic information in patents [2,4,5].
Patents facilitate quantitative analysis of the extensive R&D activities conducted across various industries, rendering them crucial for understanding historical trends in technological development [6]. Recently, the purpose of utilizing patents has expanded from the mere analysis of R&D trends and transformation processes to active application in technology commercialization, guarantees, and investments within technology management and finance [7]. For rational decision-making by stakeholders with varying objectives, appropriate strategies must be constructed that quantitatively evaluate the distinctiveness and competitiveness of patents as intangible assets compared with similar patents within the industry [8]. Motivated by these research demands, academia has continuously conducted studies to evaluate patent quality based on patent information.
The maximum legal duration of a patent's validity is 20 years, and progressively increasing annual renewal fees, in addition to registration fees, impose a burden on assignees [9]. It therefore becomes increasingly difficult for an assignee to maintain a patent when the opportunities to generate revenue are few relative to the maintenance costs incurred [10]. Consequently, most patents expire before reaching their maximum lifespan owing to the assignee's waiver or other reasons. A longer maintenance period can thus indicate higher patent quality when associated with economic value, and the patent maintenance period (patent life) can be used as a proxy variable for evaluating patent quality [11,12,13]. Further, while many studies have attempted to evaluate patent quality using proxy variables such as patent life or bibliographic information, most have relied on single prediction models or individual analytical approaches, thereby limiting the robustness and generalizability of their results.
In contrast, this study proposes a differentiated approach that systematically compares a variety of machine learning and deep learning models and constructs an ensemble of the top-performing ones. This ensemble-based strategy addresses the limitations of single-model approaches and enhances predictive accuracy. Furthermore, the proposed model underwent a thorough feasibility test, which confirmed its generalized performance and practical applicability, thereby demonstrating its potential for real-world deployment in decision-making processes.
For this purpose, this study utilized patent life as the target variable in order to evaluate patent quality, and the analysis was conducted using a dataset consisting exclusively of expired Korean patents.
The principal contributions of this study are as follows:
I. Direct Prediction of Patent Life: This study proposes a novel approach by directly predicting patent life, distinguishing itself from prior research. By framing the problem as a regression task instead of classification, the proposed method provides a more precise assessment of patent quality. Furthermore, the directly predicted patent life serves as a critical variable for quantifying the intrinsic value of patents.
II. Robust Ensemble Modeling: To identify the optimal model for patent life prediction, this study compares the performance of various machine learning and deep learning models. By ensembling the best-performing models, the robustness and accuracy of predictions are enhanced. Unlike existing studies that focus on single-model approaches, the proposed ensemble method complements the limitations of individual models, leading to improved overall performance.
III. Support for Rapid and Precise Decision-Making in Patent Portfolio Management: As discussed in Section 3.1, the proportion of maintained patents gradually declines over time, largely influenced by the assignee's strategic intentions. In this context, the model's ability to directly predict patent life, which is closely linked to patent quality, offers valuable insights to support long-term decisions such as whether to maintain or abandon a patent. By enabling rapid and precise evaluation of the economic value of individual patents, the proposed model facilitates fast-track decision-making in critical contexts, including determining patent maintenance, assessing the feasibility of technology transfer, and prioritizing technology investments. This capability is particularly meaningful in practical environments involving the management of large-scale patent portfolios or the pursuit of technology commercialization. Ultimately, this approach enhances the strategic management of patent portfolios while aligning these decisions with the pursuit of sustainable technological innovation.
The remainder of this paper is organized as follows. Section 2 establishes the foundation of the research conducted based on a review of previous studies on patent quality assessment and patent life prediction. Section 3 introduces data collection and preprocessing methods as well as the research methodology based on machine learning models. Further, it outlines the research direction. Section 4 and Section 5 present the experimental results based on these methodologies and validate the effectiveness of the proposed model. Finally, Section 6 discusses the conclusions, limitations, and directions for future research.

2. Literature Review

2.1. Proxies for Patent Quality

Most studies that have evaluated patent quality have been based on specific quantitative indicators representing the technological superiority, distinctiveness, and competitiveness of patents. In particular, they have focused on predictions of, or relative advantages over, similar technologies.
First, studies have indicated that the size of the market wherein a patent is applied or the diversity of the technology fields covered by the patent can reflect the patent's quality, and these factors are also critical in promoting sustainable technological innovation in a competitive global environment. A study [14] utilized the analytic hierarchy process (AHP) to analyze the key factors for valuing patents and found that, among the four factors considered, a larger market size for patent applications correlated with higher patent quality. Furthermore, ref. [15] noted that a greater number of International Patent Classification (IPC) categories suggests a diversification of the technology types covered by the patent, which facilitates more advanced commercialization. In addition, refs. [16,17] demonstrated that the IPC can be a suitable substitute variable for capturing the scope of commercialization and evaluating excellence in patent quality. Further, ref. [18] highlighted that the family size of a patent is a crucial variable for determining its quality: as the number of countries wherein patent rights can be protected expands, the scope of future commercialization is expected to expand, and assignees are likely to perceive a higher utility of the patent.
Second, research has indicated that the more frequently a patent is cited by other patents, the higher its technological impact as a source technology; this is interpreted as a factor representing the quality of a patent. A study [19] proposed a point-process-based patent citation prediction model and confirmed that the number of patent citations can be utilized as a variable for patent evaluation. Similarly, refs. [20,21] noted a positive correlation between forward citations and patent quality. In addition, ref. [22] demonstrated, through network formation combining three types of citations (forward, backward, and simultaneous), the significant influence of citation structure on patent quality. Finally, considering the long-term input costs associated with patents (e.g., maintenance fees), several studies have interpreted the life of a patent as a factor representing its quality, which can also serve as a meaningful indicator for guiding sustainable technological innovation strategies. For instance, refs. [18,23,24,25,26,27,28,29,30,31,32] are representative examples of studies that have employed patent renewal data as a reference scale for evaluating patent quality.

2.2. Prediction of Patent Life

As discussed previously, patent quality is evaluated based on various proxies, of which patent life is one of the most commonly used. It has been utilized to evaluate patent quality through hazard-rate prediction for the termination of patent life or through classification of the patent lifecycle, based on diverse methodologies.
A study [26] applied survival analysis based on a Weibull distribution to predict the risk of early termination of patent life by exploiting intrinsic factors within patent bibliographic information (e.g., the number of claims and the distinctiveness of claims based on text data). Further, ref. [27] classified the intrinsic factors of patents into environmental, behavioral, and genetic factors and analyzed the ability of each factor to induce variations in the survival risk of patents via the Cox proportional hazards model. A study [28] extended this line of work and proposed a patent life prediction model based on the Arrhenius chemical reaction rate theory to evaluate patent quality using the intrinsic factors employed in previous studies; it achieved approximately 60% classification performance by categorizing each prediction into three classes: short-term, medium-term, and long-term. Similarly, ref. [29] applied survival risk prediction based on the Cox proportional hazards model by combining the types and financial characteristics of assignees with the intrinsic factors of patents, highlighting that the survival risk of patents can vary significantly depending on the type and business capability of the assignee. Building on this, ref. [32] demonstrated, using a time-dependent Cox regression model, that the intrinsic factors of patents, extrinsic factors reflecting the characteristics of similar technology groups, and industrial factors reflecting the characteristics of the applied industry exert a significant influence on the survival risk of patents.
However, with the continued accumulation of patents across various countries, maintaining technological competitiveness in a rapidly changing innovation environment has become increasingly important. While survival analysis models have been effective in predicting the risk of early patent termination, they are less suited to providing the precise, continuous value of patent life needed for detailed economic valuation and portfolio management. Reflecting this need, research has emerged that directly predicts patent life using complex nonlinear models, such as machine learning and deep learning. A study [13] categorized the life of patents registered with the USPTO into four classes based on the patent life cycle; using the intrinsic and extrinsic factors of patents as learning indicators, it performed classification with a gradient boosting model (GBM), resulting in an F1 score of approximately 0.63. Similarly, ref. [33] proposed a feed-forward neural network (FFNN) model to perform binary classification of whether each patent could maintain the maximum legal remaining term of 20 years by utilizing the intrinsic and extrinsic factors of the patents. The classification model ultimately achieved an F1 score of approximately 0.85, confirming the applicability of complex models, such as artificial neural networks, to patent indicators. Furthermore, ref. [33] extended the findings by establishing a 9-step classification system based on the derived classification probabilities, thereby presenting a direct patent evaluation system. Using the same learning indicators and objectives as those presented in [33], ref. [34] proposed a focal-loss-based LightGBM and several comparative models for patents registered with the USPTO. Despite being trained on the same set of variables, the model produced a lower F1 score of 0.77 compared with the FFNN, demonstrating the validity of neural-network-based models on patent data. Table 1 summarizes the core principles and methodological approaches of the studies reviewed above, as well as their similarities and differences.

2.3. Stacking Ensemble

The stacking technique is a meta-learning approach that combines predictions from multiple base learning models as new inputs, subsequently training a meta model to produce the final prediction [35]. The stacking technique has proven more effective in terms of predictive performance than single learning models. Ref. [35] proposed a stacking ensemble model using the SMOTETomek technique to predict major adverse cardiovascular events (MACE) in patients with acute coronary syndrome (ACS), concluding that the stacking ensemble achieved improved predictive performance compared with single machine learning models. In [36], four individual learning models (gradient boosted model, distributed random forest, generalized linear model, and deep neural network) were used as base-learners for breast cancer classification; their predictions were combined as inputs for a meta-learner, with results showing that the gradient boosted model as the meta-learner achieved the highest predictive performance. Ref. [37] proposed the Stacking Ensemble Learner (SEL) model to predict emergency readmissions for heart failure patients, using 13 classical machine learning models as base-learners and XGBoost as the meta-learner; this stacking model achieved superior predictive performance over single machine learning models. Meanwhile, the stacking ensemble technique has not been widely utilized in the patent domain. Ref. [38] noted the scarcity of studies utilizing stacking ensemble techniques in patent classification, presenting and comparing various stacking ensemble frameworks for this purpose. Ref. [39] proposed an ensemble method to address the issue of imbalanced patent data, partitioning the dataset by high- and low-frequency codes before training and applying a stacking ensemble to the results, which improved the accuracy of a single classification model by approximately 6%.
As described above, the stacking ensemble is widely utilized to improve the performance of individual learning models, yet few studies have applied this technique to evaluating patent quality. Therefore, this study utilizes the stacking ensemble technique with the expectation that it will yield improved performance over single machine learning models; by comparing the predictive performance of multiple machine learning models and their ensemble, we propose a robust ensemble model. Additionally, as observed in [36], while numerous studies have attempted to predict patent life using classification approaches, studies utilizing regression-based predictions are notably limited. To address this methodological gap, this study adopts a regression-based prediction approach to improve methodological diversity.

3. Data & Methodology

With the increasing emphasis on recognizing patent quality, numerous studies have focused on various proxy variables, among which patent life has been identified as the most commonly utilized. Therefore, building on this prior research, this study employed patent life as a proxy for patent quality. However, instead of subjectively defining classes of patent life as in previous studies, this study aimed to ensure the validity of the results by directly predicting patent life. In addition, considering that the prediction results ultimately serve as a tool for supporting managerial and financial decision-making, this study focused on maximizing the applicability of the model by applying an ensemble of multiple deep learning-based models to ensure the robustness of the results. To ensure generality and reproducibility across industries and assignees, we relied solely on publicly available patent records and excluded firm-level financial covariates (e.g., market size, revenue, assignee market value) that may be unavailable, difficult to compare, or influenced by the target.

3.1. Data

3.1.1. Data Collection & Preprocessing

In this study, patents were collected through the Korea Intellectual Property Rights Information Service (KIPRIS), a public patent information search service operated by the Korea Institute of Patent Information (KIPI). KIPRIS provides an online database of intellectual property rights managed by the Korean Intellectual Property Office (KIPO), comprising approximately three million Korean patents filed since the 1940s, along with intrinsic patent information [40]. Prior to preprocessing, we restricted the scope to invention patents only: we selected and used only invention patents from KIPO's public records and excluded utility models entirely, given the distributional heterogeneity arising from institutional differences such as statutory term and fee structures. For data preprocessing, this study extracted data based on the following axioms:
(i)
Patent data after 2000.
(ii)
Expired patents as of the data collection date.
(iii)
Patents without missing data.
First, patents filed since 2000 were selected from the KIPRIS database according to axiom 1. This is because KIPO commenced its digitalization in 1999 through the implementation of an online patent application system named 'Patents-net'. In contrast, records of patents filed prior to 1999 were compiled manually and were therefore more prone to data errors such as missing values or outliers. According to [41], replacing missing values (e.g., using the median or mean of the entire dataset) when the proportion of missing data is high may reduce overall data quality. Consequently, 397,871 patent records were excluded.
Second, according to axiom 2, only patents whose rights had expired were included, while those still valid were excluded. In supervised learning, a clearly defined target variable, such as patent life, is essential for prediction tasks. As of the data collection date, valid patents lack a definitive patent life and therefore a defined target variable. Accordingly, patents without a determined patent life were deemed unsuitable for the experiments in this study, leading to the exclusion of 1,102,295 patents.
Third, the integrity of the dataset was ensured by excluding patents with missing values, according to axiom 3, thereby enhancing the consistency and reliability of the analysis. This process resulted in the exclusion of 358,449 data points.
Finally, 235,777 patent data points were preprocessed for developing machine learning models, including deep neural networks (DNNs). The preprocessed data were split into training and test sets (see Section 3.2.2). The preprocessing steps for the collected data are illustrated in Figure 1.
Figure 2 presents the number of patents at each annual patent life of up to 20 years. The sharp decrease in the number of registered patents as patent age increases suggests a strong relationship between rapidly increasing annual maintenance fees and the number of maintained patents. This supports the hypothesis that maintaining a low-quality patent for a long duration is challenging.

3.1.2. Features

The composition of the learning variables was based on previous studies, and only 27 pieces of information that were deemed to have a direct or indirect impact on patent life were selected. These variables were broadly categorized into two types: intrinsic and extrinsic patent indicators [15,42,43]. Intrinsic patent indicators comprise information that can be quantitatively extracted from the patent bibliographic data for each patent. This category includes the resources invested in creating individual patents and the scope of rights that can be protected based on patents, which are considered to have a direct impact on patent life [42,44,45]. Extrinsic patent indicators comprise information that reflects the ecosystem of the technology group to which the patent belongs. The type of technology group to which a patent belongs is interpreted based on the IPC. Further, its purpose is to reflect factors (e.g., the intensity of competition and general life cycle of the technology group) in the learning process [46,47,48,49]. The detailed compositions of the variables are listed in Table 2.

3.2. Experiment Setting & Methodology

3.2.1. Experiment Setting

The experimental environment was established using Python version 3.10.11. This Python environment was built by installing libraries for creating and training machine-learning models, as well as libraries for data processing and preprocessing. The experiments were conducted on hardware equipped with a 16-core, 24-thread CPU, a GPU with 3584 CUDA cores, 12 GB of VRAM, and 64 GB of RAM. The detailed specifications of the experimental environment are listed in Table 3. Further, to evaluate the performance of the machine learning model used in the experiment, four evaluation metrics frequently used in regression models were utilized: the mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE). These metrics are calculated as shown in Equations (1)–(4), respectively.
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|\tag{1}$$
$$\mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|\times 100\tag{2}$$
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2\tag{3}$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}\tag{4}$$
where $n$ is the number of data points, $y_i$ denotes the true value, and $\hat{y}_i$ denotes the predicted value. Since all four metrics are error-based, lower values indicate better model performance.
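For reference, the four metrics can be computed directly from paired true and predicted values. The sketch below is a minimal NumPy implementation of Equations (1)–(4); the patent-life values (in days) are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the four error metrics of Equations (1)-(4)."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                  # Equation (1)
    mape = np.mean(np.abs(err / y_true)) * 100  # Equation (2)
    mse = np.mean(err ** 2)                     # Equation (3)
    rmse = np.sqrt(mse)                         # Equation (4)
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}

# Hypothetical patent-life values in days
y_true = np.array([3650.0, 5475.0, 7300.0])
y_pred = np.array([3400.0, 5900.0, 6900.0])
print(regression_metrics(y_true, y_pred))
```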

3.2.2. Methodology

For model construction, this study employed seven different machine learning models: (1) Random Forest (RF), (2) XGBoost (XGB), (3) light gradient boosting machine (LGBM), (4) DNN, (5) linear regression (LR), (6) support vector regression (SVR), and (7) autoencoder (AE). The predictive performance of each model was evaluated and compared to implement a multi-model ensemble. A brief description of these seven machine learning techniques is provided in Appendix A. These models were implemented using the machine learning libraries set up in the experimental environment.
Considering that the dataset used for training comprised 235,777 records, there was no guarantee that the dataset would be free of outliers. Thus, the RobustScaler function from the Scikit-learn library, which scales data based on the median and interquartile range, was used to reduce the sensitivity of the training data to outliers. In addition, to evaluate the performance of the seven machine learning models, the train_test_split function from the scikit-learn library was used to separate 30% of the entire dataset as test data.
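A minimal sketch of this preprocessing step is given below; the synthetic arrays stand in for the 27 patent indicators and the patent-life target in days, and the 30% hold-out split mirrors the setup described above (the random seed is illustrative).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 27))      # stand-in for the 27 patent indicators
y = rng.uniform(0, 7300, size=1000)  # stand-in for patent life in days

# Hold out 30% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# RobustScaler centers on the median and scales by the interquartile range,
# reducing sensitivity to outliers; it is fitted on the training data only
# to avoid leaking test-set statistics into training.
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```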
To achieve optimal hyperparameter tuning of the RF, XGB, LGBM, and SVR models, optimization techniques, such as Hyperopt, were employed. The DNN and Autoencoder (AE) models underwent hyperparameter tuning through the Keras-Tuner library, which utilizes the Hyperband search algorithm to efficiently find optimal settings. To ensure the statistical reliability of the hyperparameter optimization results, 5-Fold Cross Validation was uniformly applied to evaluate every candidate hyperparameter set for all tunable models. The LR model, having no adjustable parameters, was excluded from the hyperparameter tuning stage. The loss function for training all models was standardized to the Mean Absolute Error (MAE). The resulting final optimal hyperparameter sets and their respective search spaces are listed in Table 4.
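The tuning loop can be sketched as follows, using Hyperopt's TPE search with 5-fold cross validation and an MAE objective on an RF model. The search space and evaluation budget here are illustrative assumptions, not the exact settings listed in Table 4.

```python
import numpy as np
from hyperopt import Trials, fmin, hp, tpe
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 27))      # stand-in features
y_train = rng.uniform(0, 7300, size=500)  # stand-in patent life (days)

def objective(params):
    model = RandomForestRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=42,
    )
    # 5-fold cross validation with MAE as the loss, as in the tuning protocol
    scores = cross_val_score(model, X_train, y_train,
                             cv=5, scoring="neg_mean_absolute_error")
    return -scores.mean()  # Hyperopt minimizes, so return positive MAE

space = {  # illustrative ranges, not the exact search space of Table 4
    "n_estimators": hp.quniform("n_estimators", 100, 500, 50),
    "max_depth": hp.quniform("max_depth", 5, 30, 1),
}
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=20, trials=Trials())
print(best)
```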
Based on the experimental environment described above, this study conducted experiments as shown in Figure 3. After training each machine learning model $M_1, M_2, \ldots, M_n$ on the preprocessed Korean patent data, the predicted results $P_1, P_2, \ldots, P_n$ were used to evaluate model performance. The high-performance models were then extracted and compared in a final model evaluation using a multi-model ensemble. The multi-model ensemble was implemented using the stacking technique: the predictive results from each extracted machine learning model were used as inputs for a deep learning-based ensemble model that performs the final prediction.
The proposed ensemble model operated as follows. At Level-0, the patent data were used to train the individual models $M_1, M_2, \ldots, M_k$. Their predictive results $P_1, P_2, \ldots, P_k$ were then combined and relearned by a meta model at Level-1, which produced the final predicted result $P_f$ used in the performance test. To enhance predictive performance, $P_f$ was expressed in units of days.
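A minimal sketch of this two-level procedure is shown below. For brevity, the Level-0 base learners here are RF and XGB on synthetic stand-in data (the study's best combination was RF and a DNN), and the Level-1 meta-learner is a small Keras DNN. In practice, out-of-fold base predictions are preferable to the in-sample predictions used here, to limit leakage into the meta-learner.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(800, 27)), rng.uniform(0, 7300, 800)
X_te, y_te = rng.normal(size=(200, 27)), rng.uniform(0, 7300, 200)

# Level-0: train base models M_1..M_k and collect their predictions P_1..P_k
base_models = [RandomForestRegressor(random_state=0), XGBRegressor(random_state=0)]
for m in base_models:
    m.fit(X_tr, y_tr)
meta_X_tr = np.column_stack([m.predict(X_tr) for m in base_models])
meta_X_te = np.column_stack([m.predict(X_te) for m in base_models])

# Level-1: a small DNN meta-learner relearns from the stacked predictions
meta = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(len(base_models),)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
meta.compile(optimizer="adam", loss="mae")
meta.fit(meta_X_tr, y_tr, epochs=20, verbose=0)
p_final = meta.predict(meta_X_te).ravel()  # final prediction P_f, in days
```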

4. Experimental Results

The performance evaluation of each model revealed that the RF model demonstrated the best performance in terms of MAE, MSE, and RMSE, while the DNN model demonstrated the best performance in terms of MAPE. Additionally, the XGB model matched RF in MAPE and recorded similarly high performance in MAE, MSE, and RMSE. The detailed results for each model are presented in Table 5.
Based on these results, this study developed a stacking ensemble model utilizing RF, XGB, and DNN, the models that demonstrated comparatively high performance against the other machine learning models, as the Level-0 base models. To identify the optimal ensemble configuration, four combinations were systematically compared: (RF, XGB), (RF, DNN), (XGB, DNN), and (RF, XGB, DNN). The Level-1 meta-learner was configured as a DNN. Given its multi-layered architecture, a DNN is particularly well suited to capturing complex nonlinear relationships [50,51,52,53], making it advantageous for modeling the nonlinear interactions among the predictions of the heterogeneous base models. The Level-1 meta-learner used the same optimized hyperparameter settings as the Level-0 DNN to ensure consistency in the comparison criteria.
The performance evaluation results for the four combinations are summarized in Table 6, with the (RF, DNN) combination identified as the optimal configuration. This combination achieved errors of MAE (852.81), MAPE (0.35), MSE (1,193,663.09), and RMSE (1092.55), recording the best performance across three evaluation metrics (MAE, MSE, and RMSE). Notably, the MAE, MSE, and RMSE values were improved relative to the single models RF, XGB, and DNN, and the MAPE performance was recorded at a level between the two single models RF and DNN, demonstrating the superior robustness of the stacking ensemble model (RF, DNN) compared to the single models.
For clarity, all metrics in Table 5 and Table 6 were computed on the hold-out test set; all preprocessing and model selection (including K-Fold cross validation for hyperparameter tuning) were performed exclusively on the training set; and the test set was reserved for the final evaluation.
In this study, the training dataset utilized for machine learning includes multiple features, making it impossible to rule out the presence of multicollinearity among variables. According to [54,55], multicollinearity among variables can degrade the predictive performance of the model or lead to overfitting issues. Therefore, to assess the presence of multicollinearity, this study analyzed the feature correlation matrix, and the results are presented in Figure 4.
We experimentally identified multicollinearity, as evidenced by correlation coefficients exceeding 0.8 among pairs of independent variables such as IPC Activity and Size of IPC, as well as Number of IPC and Diversity Impact [56,57,58]. To address multicollinearity and further enhance model performance, this study applied dimensionality reduction using Principal Component Analysis (PCA). Prior to model training, all features were standardized and PCA was fitted. The explained variance ratio (EVR) of each principal component and the corresponding cumulative values are reported in Appendix B (Table A1).
For the purpose of testing, we set cumulative EVR thresholds of 99%, 95%, 90%, and 85%, which corresponded to retaining 20, 18, 16, and 14 principal components, respectively, in our data. Accordingly, in addition to the original feature space, we prepared four PCA-transformed datasets—denoted PC20, PC18, PC16, and PC14—based on these component counts. These PCA datasets were generated after feature scaling and were evaluated with the same training configuration and performance metrics as the baseline experiments, ensuring that any performance differences reflect the data representation (original vs. PCA) rather than changes in the training procedure.
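A sketch of this procedure is given below: standardize the features, fit PCA, and read off the number of components required for each cumulative-EVR threshold. The data are synthetic, so the component counts printed will differ from the 20/18/16/14 reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 27))  # stand-in for the 27 original features

# Standardize, then fit PCA on all components to inspect the EVR spectrum
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)
cum_evr = np.cumsum(pca.explained_variance_ratio_)

# Number of components needed to reach each cumulative-EVR threshold
for threshold in (0.99, 0.95, 0.90, 0.85):
    k = int(np.searchsorted(cum_evr, threshold) + 1)
    print(f"{threshold:.0%} EVR -> {k} components")

# e.g., build a PC18-style dataset by retaining the first 18 components
X_pc18 = PCA(n_components=18).fit_transform(X_std)
```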
Figure 5 shows that the model based on the original dataset outperformed all four PCA-based dimensionality-reduction models. Among the models trained on PCA-transformed datasets, the PC20 model achieved the highest performance; however, it failed to surpass the original-dataset model on any evaluation metric. Notably, its MAE was approximately 43 days higher than that of the original-dataset model. Given that the primary objective of the proposed model is high predictive performance, the model trained on the original dataset was deemed more suitable for this study's objective, and the proposed model was therefore trained on the original dataset.
Additionally, to verify the generalization performance of the ensemble model, K-Fold Cross Validation was conducted with k values of 3, 5, and 7. K-Fold Cross Validation splits the dataset into k folds and iteratively uses each fold once as an out-of-sample validation set while training on the remaining folds; this yields a more stable performance estimate, improves data utilization, and guards against overfitting by always evaluating on unseen data [59,60]. Among the tested values, 5-Fold Cross Validation yielded the best results across all evaluation metrics, and the ensemble model trained with it was identified as the most generalized. Accordingly, it was adopted as the final proposed model of this study. Table 7 reports the average performance and variability across folds, indicating the model's stability across data splits and whether overfitting occurred.
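A minimal sketch of this generalization check, comparing k = 3, 5, and 7 with an RF model on synthetic stand-in data (the study applied the same procedure to the ensemble model):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 27))      # stand-in features
y = rng.uniform(0, 7300, size=500)  # stand-in patent life (days)

for k in (3, 5, 7):
    fold_mae = []
    # Each fold serves once as the out-of-sample validation set
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=42).split(X):
        model = RandomForestRegressor(random_state=42)
        model.fit(X[train_idx], y[train_idx])
        fold_mae.append(mean_absolute_error(y[val_idx], model.predict(X[val_idx])))
    print(f"k={k}: mean MAE={np.mean(fold_mae):.2f} (+/- {np.std(fold_mae):.2f})")
```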
Taken together, the proposed model’s performance on the hold-out test sets (Table 6) was broadly consistent with the cross validation average (Table 7), suggesting stable generalization across data partitions, with no distinct signs of overfitting.
To further validate the feasibility of the proposed model, this study conducted additional experiments using a dataset of active patents that have not expired yet. The rationale behind this validation is that if the predicted patent life exceeds the actual maintenance duration of current active patents, it demonstrates both the feasibility in real cases and practical applicability of the proposed model.
In this supplementary experiment, a dataset comprising 2215 active patents was collected and used as input for the proposed model to predict patent life. The predicted patent life was then compared with the actual maintenance duration of these active patents to date. The results indicated that in approximately 71.2% of cases, the predicted patent life exceeded the current maintenance duration. Furthermore, the mean absolute error (MAE) was approximately 811.70 days, corresponding to an average error of approximately 2.22 years. Table 8 provides examples of prediction results for active patents.

5. Discussion

The annual patent registration fee can be used as a criterion for evaluating patent quality based on lifetime, and it also serves as a meaningful indicator for shaping long-term strategies that promote sustainable technological innovation. Several studies [13,29,61] noted that as the number of years of patent maintenance increases, the annual maintenance costs (patent registration fees) gradually increase, making it difficult to maintain low-value patents for an extended period. The annual registration fees for Korean patents are paid according to the patent fee guide provided by the Korean Intellectual Property Office (KIPO), detailed in Table 9, which shows that the payment cycle for annual registration fees is renewed every three years.
The prediction error (MAE) of 852.52 derived from the final ensemble model indicated a patent life prediction error of approximately 2.33 years. This level of precision is particularly valuable for promoting sustainable technological innovation, as it enables firms to make more informed decisions about which patents to maintain or abandon. By accurately predicting the longevity of patents, the model supports the efficient allocation of R&D resources and reduces the financial burden of maintaining low-value patents. This prevents the wasteful use of capital and human resources, which is a core principle of sustainability.
Furthermore, to validate the model’s potential for real-world sustainable intellectual property management, a comparative experiment was conducted using a dataset of active patents. The results showed that in 71.2% of cases, the predicted patent life exceeded the current life duration of these active patents. This indicates that the model is more likely to identify patents with high potential for long-term value, aligning with sustainable strategies that prioritize quality over quantity. The Mean Absolute Error (MAE) of 811 days (approximately 2.22 years) for this dataset, which falls within the range of patent registration fee payment cycles, further supports the model’s practical applicability.
These findings demonstrate the proposed model’s ability to effectively identify high-quality patents, thereby offering a decision support tool for assignees regarding patent maintenance and transaction decisions. Such capabilities are essential for fostering a sustainable innovation ecosystem where resources are concentrated on patents with genuine long-term value, ultimately contributing to sustainable technological innovation.

6. Conclusions and Further Studies

6.1. Conclusions

Patents enable quantitative analysis of extensive R&D activities, and studies evaluating patent quality based on patent information are crucial for understanding and analyzing trends in technological development. As discussed in Section 2.1, most studies utilize quantitative indicators as proxy variables for evaluating patent quality, and Section 2.2 discussed the use of patent life as such a proxy. In addition, most of the previous studies reviewed here utilized a single machine learning model, leaving the potential performance gains of multi-model prediction unexplored. Therefore, this study predicted patent life using various machine learning techniques and extracted the models that demonstrated the best performance on each evaluation metric. By combining these models, we constructed a multi-model ensemble with improved performance over single machine learning models. Consequently, the ensemble model achieved MAE (852.52) and MAPE (0.34) values lower than those of the best single model for each metric.
The experimental results show that the proposed ensemble model achieved superior performance compared with single machine learning models, and that a multi-model ensemble can improve patent life prediction by capturing aspects that single models might overlook. These results indicate that the ensemble model is more robust in predicting patent life, suggesting its high utility in future evaluations of patent quality based on patent life. Ultimately, this deep learning-based automated methodology for evaluating patent quality can serve as a decision-making tool for assignees regarding patent maintenance. It can also aid in determining the value of patents during patent transactions, thereby contributing to the pursuit of sustainable technological innovation.

6.2. Further Studies

However, this study had several limitations. First, the candidate learning variables were selected based on variables utilized in prior research, but their final adoption was informed by the researchers' judgment. Future research could improve model precision by using variable extraction methodologies (e.g., feature importance and recursive feature elimination) to identify variables significant for the target and utilize them in the learning process.
Second, the regression model in this study, which predicts patent life as a continuous value, ensures precision but has the limitation that errors in a single predicted value can introduce uncertainty into real-world decision-making. Future research could explore a complementary approach by re-framing the problem as a classification task using criteria from the patent fee schedule detailed in Table 9. Ultimately, a more sophisticated solution could involve a multi-task learning model that fuses the regression and classification problems to simultaneously ensure predictive precision and decisional clarity. In addition, linking patent information with firm-level market and financial data (e.g., market size, revenue, assignee valuation) could enhance decisional clarity beyond the fee-aligned framework.
Third, this study relied solely on quantitative patent information and did not incorporate other data types (e.g., patent titles and abstracts). Incorporating features extracted from titles or abstracts into the learning process could improve the patent life prediction model’s reliability. To this end, we will consider incorporating text-derived features by employing Transformer-based language models (e.g., domain-adapted BERT variants) to obtain semantic embeddings of titles and abstracts [62,63] and, where appropriate, lightweight 1D CNNs to capture local n-gram patterns [64,65,66]. Such enhancements may also broaden the model’s applicability to wider technology evaluation contexts, including studies aimed at promoting sustainable technological innovation.

Author Contributions

Conceptualization, S.-H.P. and M.-S.K.; Methodology, S.-H.P., M.-S.K. and J.R.; Software, S.-H.P., S.-H.L. and J.K.K.; Validation, S.-H.P. and S.-H.O.; Formal Analysis, S.-H.P. and S.-H.O.; Investigation, S.-H.P. and S.-H.L.; Resources, S.-H.L. and J.K.K.; Data Curation, S.-H.P., M.-S.K. and J.R.; Writing—Original Draft Preparation, S.-H.P. and M.-S.K.; Writing—Review and Editing, S.-H.P., M.-S.K., J.R. and T.-E.S.; Visualization, J.K.K. and S.-H.O.; Supervision, S.-H.P. and T.-E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available for direct sharing due to subscription and licensing restrictions from the Korean Intellectual Property Rights Information Service (KIPRIS).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. The Seven Machine Learning Techniques Used in This Study

Appendix A.1. Random Forest (RF)

Random Forest is an ensemble technique used for solving classification and regression problems and comprises multiple tree structures. Each tree is composed of independent, identically distributed random vectors. These trees are generated using multiple bootstrap samples from the original data and a random selection of the predictor variables. Further, the model performance can be enhanced by generating sufficient numbers of trees [67].
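A minimal scikit-learn sketch on synthetic data (hyperparameters are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=27, random_state=0)
# Each tree is grown on a bootstrap sample, with a random subset of features
# considered at each split; the forest's prediction averages the trees' outputs.
rf = RandomForestRegressor(n_estimators=300, max_features="sqrt", random_state=0)
rf.fit(X, y)
print(rf.predict(X[:3]))
```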

Appendix A.2. XGBoost (XGB)

XGB is an algorithm that combines classification and regression trees (CART) with a gradient boosting machine (GBM), capable of handling classification or regression problems rapidly and accurately for almost all types of data [68]. XGB sequentially trains multiple decision trees, each of which corrects the errors of the previous trees, thereby improving the model.
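A minimal sketch with the xgboost scikit-learn API (illustrative hyperparameters):

```python
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=27, random_state=0)
# Trees are added sequentially; each new tree fits the residual errors
# of the current ensemble.
xgb = XGBRegressor(n_estimators=300, learning_rate=0.1, max_depth=6)
xgb.fit(X, y)
```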

Appendix A.3. Light Gradient Boosting Machine (LGBM)

The LGBM enhances computational speed and memory efficiency by using histogram-based decision trees and adopts a leaf-wise tree growth method to significantly reduce loss. Further, it optimizes the training speed and performance through core techniques, such as gradient-based one-side sampling and exclusive feature bundling [69].
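A minimal sketch with the lightgbm scikit-learn API (illustrative hyperparameters):

```python
from sklearn.datasets import make_regression
from lightgbm import LGBMRegressor

X, y = make_regression(n_samples=500, n_features=27, random_state=0)
# Histogram-based splits with leaf-wise growth; num_leaves bounds tree complexity
lgbm = LGBMRegressor(n_estimators=300, num_leaves=31, learning_rate=0.1)
lgbm.fit(X, y)
```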

Appendix A.4. Deep Neural Network (DNN)

A DNN comprises multiple hidden layers and nodes designed to solve classification and regression problems. Each neural network operates based on activation functions using different weights. These networks collectively compute the most suitable direction as specified by the designer. A DNN includes numerous user-adjustable hyperparameters; common examples include optimization functions, the number of hidden layers, and loss functions [70].
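A minimal Keras sketch of such a regressor; the depth, width, activation, optimizer, and loss shown here are illustrative hyperparameters, not the study's tuned settings:

```python
import tensorflow as tf

dnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(27,)),            # the 27 patent indicators
    tf.keras.layers.Dense(64, activation="relu"),  # hidden layers
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                      # continuous output: patent life
])
dnn.compile(optimizer="adam", loss="mae")
```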

Appendix A.5. Support Vector Regression (SVR)

SVR, a regression analysis technique based on the support vector machine (SVM), aims to determine the optimal regression line for given training samples. For training samples $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$ and predicted value $f(x)$, the SVR is expressed as shown in Equation (A1) [71]:
$$f(x)=\sum_{i=1}^{m}\left(\hat{\alpha}_i-\alpha_i\right)x_i^{T}x+b\tag{A1}$$
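A minimal scikit-learn sketch (illustrative hyperparameters): C trades off flatness of f(x) against deviations larger than epsilon, the half-width of the tube within which errors are ignored.

```python
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=27, random_state=0)
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)  # illustrative settings
svr.fit(X, y)
```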

Appendix A.6. Linear Regression (LR)

LR is an algorithm that identifies the linear relationship between input and output variables, computes predictions as a weighted sum of the input variables, and aims to minimize the difference between the actual and predicted values. The LR model is expressed as shown in Equation (A2) [72]:
$$\hat{y}=\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_n x_n\tag{A2}$$
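A minimal scikit-learn sketch; the fitted intercept and coefficients correspond to the $\beta_0$ and $\beta_i$ of Equation (A2):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=27, random_state=0)
lr = LinearRegression().fit(X, y)
print(lr.intercept_, lr.coef_[:3])  # beta_0 and the first few beta_i
```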

Appendix A.7. Auto Encoder (AE)

An AE is a neural network that reduces the dimensionality of input data to a lower dimension and then reconstructs it back to the original data. It comprises an encoder and a decoder. The AE aims to learn the principal features of the data by minimizing the difference between the input and reconstructed data, typically using the mean squared error or cross-entropy to train the neural network [73].
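A minimal Keras sketch, assuming a 27-dimensional input compressed to an 8-dimensional code (both sizes illustrative):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(27,))
code = tf.keras.layers.Dense(8, activation="relu")(inputs)      # encoder
outputs = tf.keras.layers.Dense(27, activation="linear")(code)  # decoder
ae = tf.keras.Model(inputs, outputs)
# Trained to minimize mean squared reconstruction error, e.g. ae.fit(X, X, ...)
ae.compile(optimizer="adam", loss="mse")
```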

Appendix B. Detailed Principal Component Analysis (PCA) Results

Table A1. Explained Variance Ratio (EVR) and Cumulative Values of Principal Components.

Component    EVR         Cumulative EVR
PC1          0.179047    0.179047
PC2          0.097834    0.276881
PC3          0.075285    0.352166
PC4          0.074607    0.426773
PC5          0.056315    0.483088
PC6          0.052387    0.535474
PC7          0.048419    0.583893
PC8          0.041608    0.625501
PC9          0.040267    0.665768
PC10         0.039635    0.705402
PC11         0.039330    0.744732
PC12         0.038132    0.782864
PC13         0.036764    0.819628
PC14         0.035369    0.854997
PC15         0.033700    0.888697
PC16         0.030493    0.919191
PC17         0.025603    0.944794
PC18         0.022843    0.967637
PC19         0.014057    0.981694
PC20         0.009846    0.991541

References

  1. Hall, B.H.; Jaffe, A.B.; Trajtenberg, M. Market value and patent citations: A first look. Rand J. Econ. 2005, 36, 16–38. [Google Scholar]
  2. Squicciarini, M.; Dernis, H.; Criscuolo, C. Measuring patent quality: Indicators of technological and economic value. OECD Sci. Technol. Ind. Work. Pap. 2013. [CrossRef]
  3. Schankerman, M.; Pakes, A. Estimates of the value of patent rights in European countries during the post-1950 period. Econ. J. 1986, 96, 1052–1076. [Google Scholar] [CrossRef]
  4. Kyung, S.T. The quality of patents: A multilateral evaluation for Korea. J. Intellect. Prop. 2013, 8, 99–120. [Google Scholar] [CrossRef]
  5. Higham, K.; de Rassenfosse, G.; Jaffe, A.B. Patent quality: Towards a systematic framework for analysis and measurement. Res. Policy 2021, 50, 104215. [Google Scholar] [CrossRef]
  6. Griliches, Z. Patent statistics as economic indicators: A survey. In R&D and Productivity: The Econometric Evidence; University of Chicago Press: Chicago, IL, USA, 1998; pp. 287–343. [Google Scholar]
  7. Park, S.T.; Kim, Y.K. A study on patent valuation for the activation of IP finance. J. Digit. Converg. 2012, 10, 315–321. [Google Scholar]
  8. Girgin Kalıp, N.G.; Öcalan, Ö.; Aytekin, Ç. Qualitative and quantitative patent valuation methods: A systematic literature review. World Pat. Inf. 2022, 69, 102111. [Google Scholar] [CrossRef]
  9. Choi, Y.M.; Cho, D. A study on the effect of the renewal-fee payment cycle in the decision of patent right retention: Focusing on the sunk cost and endowment perspective. J. Digit. Converg. 2021, 19, 65–79. [Google Scholar]
  10. Hikkerova, L.; Doat, M.; Carayol, N. Patent life cycle: New evidence. Technol. Forecast. Soc. Change 2014, 88, 313–324. [Google Scholar] [CrossRef]
  11. Guellec, D.; van Pottelsberghe de la Potterie, B. Applications, grants, and the value of patent. Econ. Lett. 2000, 69, 109–114. [Google Scholar] [CrossRef]
  12. van Zeebroeck, N.; van Pottelsberghe de la Potterie, B. The vulnerability of patent value determinants. Econ. Innov. New Technol. 2011, 20, 283–308. [Google Scholar] [CrossRef]
  13. Kim, Y.; Kim, M.G.; Kim, Y.M. Prediction of patent lifespan and analysis of influencing factors using machine learning. J. Intell. Inf. Syst. 2022, 28, 147–170. [Google Scholar] [CrossRef]
  14. Chiu, Y.J.; Chen, Y.W. Using AHP in patent valuation. Math. Comput. Model. 2007, 46, 1054–1062. [Google Scholar] [CrossRef]
  15. Kim, M.S.; Lee, J.H.; Oh, E.-S.; Lee, C.-H.; Choi, J.-H.; Jang, Y.-J.; Lee, J.-H.; Sung, T.-E. A study on deep learning-based intelligent technology valuation: Focusing on the models of qualitative evaluation factors estimation using deep neural networks. J. Korea Technol. Innov. Soc. 2021, 24, 1141–1162. [Google Scholar] [CrossRef]
  16. Harhoff, D.; Narin, F.; Scherer, F.M.; Vopel, K. Citations, family size, opposition and the value of patent rights. Res. Policy 2003, 32, 1343–1363. [Google Scholar] [CrossRef]
  17. Trappey, A.J.; Wu, C.Y.; Lin, C.-W.; Trappey, C.V. A patent quality analysis for innovative technology and product development. Adv. Eng. Inform. 2012, 26, 26–34. [Google Scholar] [CrossRef]
  18. Lanjouw, J.O.; Pakes, A.; Putnam, J. How to count patents and value intellectual property: The uses of patent renewal and application data. J. Ind. Econ. 1998, 46, 405–432. [Google Scholar] [CrossRef]
  19. Liu, X.; Yan, J.; Xiao, S.; Wang, X.; Zha, H.; Chu, S.M. On predictive patent valuation: Forecasting patent citations and their types. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 1438–1444. [Google Scholar] [CrossRef]
  20. Carpenter, M.P.; Narin, F.; Woolf, P. Citation rates to technologically important patents. World Pat. Inf. 1981, 3, 160–163. [Google Scholar] [CrossRef]
  21. Albert, M.B.; Avery, D.; Narin, F.; McAllister, P. Direct validation of citation counts as indicators of industrially important patents. Res. Policy 1991, 20, 251–259. [Google Scholar] [CrossRef]
  22. Yang, G.C.; Li, G.; Li, C.Y.; Zhao, Y.H.; Zhang, J.; Liu, T.; Chen, D.-Z.; Huang, M.H. Using the comprehensive patent citation network (CPC) to evaluate patent value. Scientometrics 2015, 105, 1319–1346. [Google Scholar] [CrossRef]
  23. Thomas, P. The effect of technological impact upon patent renewal decisions. Technol. Anal. Strateg. Manag. 1999, 11, 181–197. [Google Scholar] [CrossRef]
  24. Baudry, M.; Dumont, B. Patent renewals as options: Improving the mechanism for weeding out lousy patents. Rev. Ind. Organ. 2006, 28, 41–62. [Google Scholar]
  25. Grönqvist, C. The private value of patents by patent characteristics: Evidence from Finland. J. Technol. Transf. 2009, 34, 159–168. [Google Scholar] [CrossRef]
  26. Han, E.J.; Sohn, S.Y. Patent valuation based on text mining and survival analysis. J. Technol. Transf. 2015, 40, 821–839. [Google Scholar] [CrossRef]
  27. Choi, Y.M.; Cho, D. A study on the time-dependent changes of the intensities of factors determining patent lifespan from a biological perspective. World Pat. Inf. 2018, 54, 1–17. [Google Scholar] [CrossRef]
  28. Choi, Y.M.; Kim, J.H.; Lee, H.S. A three-dimensional patent evaluation model that considers the factors for calculating the internal and external value of a patent: Arrhenius chemical reaction kinetics-based patent lifespan prediction. J. Digit. Converg. 2021, 19, 113–132. [Google Scholar]
  29. Choo, K.N.; Park, K.H. A study on the determinants of the economic value of patents using renewal data. Knowl. Manag. Res. 2010, 11, 65–81. [Google Scholar]
  30. Danish, M.S.; Yousaf, Z.; Ali, R.; Ahmad, T. Determinants of patent survival in emerging economies: Evidence from residential patents in India. J. Public Aff. 2021, 21, e2211. [Google Scholar] [CrossRef]
  31. Jang, K.Y.; Yang, D.W. The empirical study on determinants affecting patent life cycle: Using Korean patents renewal data. J. Intellect. Prop. 2014, 9, 79–108. [Google Scholar] [CrossRef]
  32. Kim, M.S.; Lee, J.-H.; Lee, S.-H.; Lee, S.-H.; Rhee, J.; Park, S.-H.; Sung, T.-E. A study on the effect of intrinsic and extrinsic factors on patent life: Focusing on medical device industry and macroeconomic conditions. J. Korea Technol. Innov. Soc. 2023, 26, 479–497. [Google Scholar] [CrossRef]
  33. Choi, J.; Jeong, B.; Yoon, J.; Coh, B.-Y.; Lee, J.-M. A novel approach to evaluating the business potential of intellectual properties: A machine learning-based predictive analysis of patent lifetime. Comput. Ind. Eng. 2020, 145, 106544. [Google Scholar] [CrossRef]
  34. Liu, J.; Li, P.; Liu, X. Patent lifetime prediction using LightGBM with a customized loss. PeerJ Comput. Sci. 2024, 10, e2044. [Google Scholar] [CrossRef]
  35. Okuno, S.; Aihara, K.; Hirata, Y. Forecasting high-dimensional dynamics exploiting suboptimal embeddings. Sci. Rep. 2020, 10, 664. [Google Scholar] [CrossRef]
  36. Kwon, H.; Park, J.; Lee, Y. Stacking ensemble technique for classifying breast cancer. Healthc. Inform. Res. 2019, 25, 283–288. [Google Scholar] [CrossRef]
  37. Rahman, M.S.; Rahman, H.R.; Prithula, J.; Chowdhury, M.E.H.; Ahmed, M.U.; Kumar, J.; Murugappan, M.; Khan, M.S. Heart failure emergency readmission prediction using stacking machine learning model. Diagnostics 2023, 13, 1948. [Google Scholar] [CrossRef] [PubMed]
  38. Kamateri, E.; Salampasis, M.; Diamantaras, K. An ensemble framework for patent classification. World Pat. Inf. 2023, 75, 102233. [Google Scholar] [CrossRef]
  39. Kamateri, E.; Salampasis, M. Ensemble method for classification in imbalanced patent data. In Proceedings of the PatentSemTech@SIGIR 2023, Taipeh, Taiwan, 23–27 July 2023; pp. 27–32. [Google Scholar]
  40. Korean Intellectual Property Rights Information Service. KIPRIS Main Page. Available online: https://www.kipris.or.kr/khome/main.do (accessed on 11 February 2025).
  41. Cokluk, O.; Kayri, M. The effects of methods of imputation for missing values on the validity and reliability of scales. Educ. Sci. Theory Pract. 2011, 11, 303–309. [Google Scholar]
  42. Lim, J.; Kim, C.; Gu, J. Analysis of causal relationship between patent indicators and firm performance. Korean Manag. Sci. Rev. 2011, 28, 63–74. [Google Scholar]
  43. Ko, N.; Kim, Y.; Park, J. An intellectual property evaluation model using patent transactions: A deep neural network approach. In Proceedings of the Korean Institute of Industrial Engineers Conference, Daejeon, Republic of Korea, 2–3 November 2017. [Google Scholar]
  44. Korean Intellectual Property Office (Patent System Division). Notice on the Application for Batch Examination of Patents, Utility Models, Trademarks, and Designs. Patent Office Notice No. 2023-22, 29 December 2023. Available online: https://www.law.go.kr/admRulLsInfoP.do?admRulId=44250&efYd=0 (accessed on 29 December 2023).
  45. Fischer, T.; Henkel, J. Patent trolls on markets for technology–An empirical analysis of NPEs’ patent acquisitions. Res. Policy 2012, 41, 1519–1533. [Google Scholar] [CrossRef]
  46. Harhoff, D.; Wagner, S. The duration of patent examination at the European patent office. Manag. Sci. 2009, 55, 1969–1984. [Google Scholar] [CrossRef]
  47. Chen, C. Using machine learning to forecast patent quality—Take ‘vehicle networking’ industry for example. In Transdisciplinary Engineering: A Paradigm Shift; IOS Press: Amsterdam, The Netherlands, 2015; pp. 993–1002. [Google Scholar]
  48. Cockburn, I.M.; MacGarvie, M. Entry and patenting in the software industry. Manag. Sci. 2011, 57, 915–933. [Google Scholar] [CrossRef]
  49. Oh, J.; Jung, T. Research on the relationship between technology cycle time and technology lifespan—Focusing on the patents from National R&D projects in Korea. Technol. Manag. 2021, 6, 57–75. [Google Scholar]
  50. Ebrahimighahnavieh, M.A.; Luo, S.; Chiong, R. Deep learning to detect Alzheimer’s disease from neuroimaging: A systematic literature review. Comput. Methods Programs Biomed. 2020, 187, 105242. [Google Scholar] [CrossRef] [PubMed]
  51. Singh, T.; Kalra, R.; Mishra, S.; Satakshi; Kumar, M. An efficient real-time stock prediction exploiting incremental learning and deep learning. Evol. Syst. 2023, 14, 919–937. [Google Scholar]
  52. Jain, S. A Comparative Study of Stock Market Prediction Models. In Deep Learning Tools for Predicting Stock Market Movements; Wiley: Hoboken, NJ, USA, 2024; pp. 249–269. [Google Scholar] [CrossRef]
  53. Palma, G.; Geraci, G.; Rizzo, A. Federated Learning and Neural Circuit Policies: A Novel Framework for Anomaly Detection in Energy-Intensive Machinery. Energies 2025, 18, 936. [Google Scholar] [CrossRef]
  54. Chan, J.Y.L.; Leow, S.M.H.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.W.; Chen, Y.L. Mitigating the multicollinearity problem and its machine learning approach: A review. Mathematics 2022, 10, 1283. [Google Scholar] [CrossRef]
  55. Sundus, K.I.; Hammo, B.H.; Al-Zoubi, M.B.; Al-Omari, A. Solving the multicollinearity problem to improve the stability of machine learning algorithms applied to a fully annotated breast cancer dataset. Inform. Med. Unlocked 2022, 33, 101088. [Google Scholar] [CrossRef]
  56. Alabduljabbar, H.; Khan, M.; Awan, H.H.; Eldin, S.M.; Alyousef, R.; Mohamed, A.M. Predicting ultra-high-performance concrete compressive strength using gene expression programming method. Case Stud. Constr. Mater. 2023, 18, e02074. [Google Scholar] [CrossRef]
  57. Khan, M.; Javed, M.F. Towards sustainable construction: Machine learning based predictive models for strength and durability characteristics of blended cement concrete. Mater. Today Commun. 2023, 37, 107428. [Google Scholar] [CrossRef]
  58. Javed, M.F.; Siddiq, B.; Onyelowe, K.; Khan, W.A.; Khan, M. Metaheuristic optimization algorithms-based prediction modeling for titanium dioxide-assisted photocatalytic degradation of air contaminants. Results Eng. 2024, 23, 102637. [Google Scholar] [CrossRef]
  59. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009. [Google Scholar]
  60. Duda, R.; Hart, P.; Stork, D. Pattern Classification, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2001. [Google Scholar]
  61. De Rassenfosse, G.; Jaffe, A.B. Are patent fees effective at weeding out low-quality patents? J. Econ. Manag. Strategy 2018, 27, 134–148. [Google Scholar] [CrossRef]
  62. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates: Red Hook, NY, USA, 2017; pp. 5998–6008. [Google Scholar]
  63. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  64. Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Doha, Qatar, 2014; pp. 1746–1751. [Google Scholar] [CrossRef]
  65. Liu, B.; Song, W.; Zheng, M.; Fu, C.; Chen, J.; Wang, X. Semantically enhanced selective image encryption scheme with parallel computing. Expert Syst. Appl. 2025, 279, 127404. [Google Scholar] [CrossRef]
  66. Soni, S.; Chouhan, S.S.; Rathore, S.S. TextConvoNet: A convolutional neural network based architecture for text classification. Appl. Intell. 2023, 53, 14249–14268. [Google Scholar] [CrossRef]
  67. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22. [Google Scholar]
  68. Shehadeh, A.; Alshboul, O.; Al Mamlook, R.E.; Hamedat, O. Machine learning models for predicting the residual value of heavy construction equipment: An evaluation of modified decision tree, LightGBM, and XGBoost regression. Autom. Constr. 2021, 129, 103827. [Google Scholar] [CrossRef]
  69. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  70. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.-R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
  71. Zhan, Y.; Zhang, H.; Liu, Y. Forecast of meteorological and hydrological features based on SVR model. In Proceedings of the 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Changsha, China, 26–28 March 2021; pp. 579–583. [Google Scholar] [CrossRef]
  72. Weisberg, S. Applied Linear Regression; John Wiley & Sons: Hoboken, NJ, USA, 2005. [Google Scholar]
  73. Wang, Y.; Yao, H.; Zhao, S. Auto-encoder based dimensionality reduction. Neurocomputing 2016, 184, 232–242. [Google Scholar] [CrossRef]
Figure 1. Stepwise Data Preprocessing Process.
Figure 2. Frequency of Patent Life.
Figure 3. Experimental Procedure.
Figure 4. Feature Correlation Matrix.
Figure 5. Performance Comparison of PCA-Transformed & Proposed Datasets.
Table 1. Summary of Reviewed Studies.

| Study (Ref.) | Core Principle & Method | Similarities | Differences |
|---|---|---|---|
| [26] | Survival analysis with Weibull distribution | Based on survival analysis; uses intrinsic factors | Early survival model; assumes a Weibull distribution |
| [27] | Cox proportional hazards model | Based on survival analysis; uses intrinsic factors | Applies the Cox proportional hazards model; categorizes factors into three groups |
| [28] | Model grounded in Arrhenius chemical reaction rate theory | Uses intrinsic factors | Applies reaction rate theory; discrete classification |
| [29] | Cox proportional hazards model | Based on survival analysis; uses intrinsic factors | Combines assignee (external) characteristics with intrinsic factors |
| [32] | Time-dependent Cox regression model | Based on survival analysis; uses various factors | Time-dependent Cox model; integrates three factor groups (intrinsic/extrinsic/industry) |
| [13] | Gradient Boosting Model | Uses intrinsic and extrinsic factors; discrete classification | Employs a machine learning model; four-class classification |
| [33] | Feed-Forward Neural Network (FFNN) | Uses intrinsic and extrinsic factors; discrete classification | Uses a tuned FFNN model; reports the highest performance (0.85); proposes a nine-stage evaluation system |
| [34] | LightGBM with focal loss | Uses intrinsic and extrinsic factors; discrete classification | Uses a machine learning model (LightGBM); demonstrates the usefulness of the neural-network model through comparison with an FFNN |
Table 2. Patent Feature Configuration.

| Feature | Explanation |
|---|---|
| Number of applicants | The number of applicants who filed the patent application |
| Number of agents | The number of agents (patent attorneys) appointed when filing the application |
| Number of families | The number of international patent applications connected through their subject matter and claims of priority |
| Number of IPC | The number of distinct IPC codes assigned to the patent application |
| Number of claims | The number of claims the patent makes |
| Ratio of independent claims | The percentage of independent claims among all claims |
| Period from application to grant | The number of days between filing the patent application and receiving the grant |
| IPC (A–H) | One-hot encoded binary variables for the eight IPC sections (A to H), yielding eight distinct features |
| Size of IPC | The number of patents registered in the main IPC at the time the patent was registered |
| IPC activity | The number of patents registered in the main IPC in the five years after the patent was registered |
| Average of IPC activity | IPC activity at the time of registration / 5 |
| Ratio of IPC activity | IPC activity at the time of registration / IPC size |
| IPC competitiveness | The number of applicants with patents registered in the main IPC at the time of filing |
| Number of patent right transfers | The number of legal events involving transfer of patent ownership |
| Duration of patent | The maximum remaining legal life of the patent |
| Number of citations | The number of times the patent has been cited in the literature or in other patents |
| Citation impact | The extent to which the patent has influenced technological innovation activities since its filing |
| TCT index | Technology Cycle Time index |
| Claim impact | Number of claims / average number of claims among patents with the same IPC and registration year |
| Diversity impact | Number of IPCs in the patent / average number of IPCs in the patent family with the same registration year and IPC |
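Several of the indicators in Table 2 are simple ratios over peer groups sharing the same IPC and registration year. As a minimal illustration (not the authors' code; column names such as `ipc_main` and `n_claims` are hypothetical), two of them can be derived in pandas:

```python
# Sketch: deriving the ratio of independent claims and the claim impact
# indicator from Table 2. Column names are hypothetical placeholders.
import pandas as pd

df = pd.DataFrame({
    "ipc_main": ["A61K", "A61K", "G06F"],
    "reg_year": [2014, 2014, 2018],
    "n_claims": [12, 8, 20],
    "n_independent_claims": [2, 1, 3],
})

# Ratio of independent claims: independent claims / all claims
df["indep_claim_ratio"] = df["n_independent_claims"] / df["n_claims"]

# Claim impact: claims / average claims among patents with the same
# main IPC and registration year
peer_mean = df.groupby(["ipc_main", "reg_year"])["n_claims"].transform("mean")
df["claim_impact"] = df["n_claims"] / peer_mean

print(df[["indep_claim_ratio", "claim_impact"]])
```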
Table 3. Experimental Settings.

Software

| Category | Library | Version |
|---|---|---|
| Data handling | pandas | 2.2.1 |
| Data handling | numpy | 1.26.4 |
| ML model | scikit-learn | 1.4.1.post1 |
| ML model | xgboost | 2.0.3 |
| ML model | tensorflow | 2.10.1 |
| ML model | keras | 2.10.0 |
| ML model | lightgbm | 4.3.0 |

Hardware

| Feature | Specification |
|---|---|
| CPU architecture & model | Intel Core i7-13700K |
| CPU cores | 16 |
| CPU threads | 24 |
| CPU base/max frequency (GHz) | 3.4 / 5.3 |
| GPU architecture & model | NVIDIA GeForce RTX 3060 |
| CUDA cores | 3584 |
| GPU memory (GB) | 12, GDDR6 |
| RAM (GB) | 64, DDR5-5600 |
| Operating system | Windows 11, version 23H2 |
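For reproducibility, the software stack of Table 3 can be pinned in a single install command; this is a convenience sketch, not something the authors publish:

```
pip install pandas==2.2.1 numpy==1.26.4 scikit-learn==1.4.1.post1 \
    xgboost==2.0.3 tensorflow==2.10.1 keras==2.10.0 lightgbm==4.3.0
```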
Table 4. Optimized Hyperparameters and Search Space for the Seven Machine Learning Models.

| No | Model | Hyperparameters | Search Space |
|---|---|---|---|
| 1 | RF | n_estimators: 280; max_depth: 30; min_samples_leaf: 1; min_samples_split: 2 | 50 to 300; 5 to 30; 1 to 5; 2 to 10 |
| 2 | XGB | n_estimators: 290; max_depth: 14; learning_rate: 0.03259162240984821; colsample_bytree: 0.8780425750115004; subsample: 0.9229737811346409 | 50 to 300; 3 to 15; 0.01 to 0.3; 0.5 to 1.0; 0.5 to 1.0 |
| 3 | LGBM | n_estimators: 280; max_depth: 15; learning_rate: 0.23326321741873107; colsample_bytree: 0.8787077808247973 | 50 to 300; 3 to 15; 0.01 to 0.3; 0.5 to 1.0 |
| 4 | DNN | epochs: 200; learning rate: 0.0012536297097257307; optimizer: Adam; activation: relu; loss function: Mean Absolute Error; batch size: 128 | Dense units 1–4: 8 to 256; learning rate: 0.0001 to 0.01; dropout rates 1–4: 0.1 to 0.5; –; –; [64, 128, 256] |
| 5 | SVR | C: 19.62581040283463; epsilon: 0.6967828360688615; kernel: 'rbf' | e^−3 (≈0.05) to e^3 (≈20.09); 0.01 to 1.0; ['linear', 'poly', 'rbf', 'sigmoid'] |
| 6 | LR | – | – |
| 7 | AE | epochs: 300; learning_rate: 0.001; optimizer: Adam; decoder_activation: relu; loss function: Mean Absolute Error; encoding_dim: 32 | –; [0.0001, 0.001, 0.01]; –; ['sigmoid', 'relu']; –; [16, 32, 64, 128] |
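The tuning procedure itself is not restated here; as a sketch, the RF search space in Table 4 could be explored with scikit-learn's RandomizedSearchCV (the choice of tuner is an assumption, and `X_train`/`y_train` are placeholders for the patent feature matrix and life in days):

```python
# Sketch: random search over the RF space from Table 4, scored by MAE
# as in the paper's evaluation. Not the authors' actual tuning code.
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "n_estimators": randint(50, 301),     # 50 to 300
    "max_depth": randint(5, 31),          # 5 to 30
    "min_samples_leaf": randint(1, 6),    # 1 to 5
    "min_samples_split": randint(2, 11),  # 2 to 10
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=50,
    scoring="neg_mean_absolute_error",
    cv=5,
    random_state=42,
)
# search.fit(X_train, y_train)
# print(search.best_params_)
```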
Table 5. Evaluation Results of the Seven Baseline Models.

| No | Model | MAE | MAPE | MSE | RMSE |
|---|---|---|---|---|---|
| 1 | RF | 863.87 | 0.36 | 1,206,533.66 | 1098.42 |
| 2 | XGB | 866.19 | 0.36 | 1,210,977.01 | 1100.44 |
| 3 | LGBM | 892.67 | 0.38 | 1,243,363.19 | 1115.06 |
| 4 | DNN | 881.48 | 0.34 | 1,337,037.92 | 1156.30 |
| 5 | SVR | 932.11 | 0.38 | 1,372,223.57 | 1171.42 |
| 6 | LR | 956.04 | 0.41 | 1,352,985.65 | 1163.18 |
| 7 | AE | 928.25 | 0.38 | 1,394,592.92 | 1180.93 |
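The four error columns in Tables 5–7 follow standard definitions; a minimal scikit-learn sketch is shown below (the toy arrays stand in for actual predictions). Note that scikit-learn's MAPE is a fraction, consistent with the 0.34–0.41 values reported:

```python
# Sketch: computing MAE, MAPE, MSE, and RMSE for regression predictions.
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error)

y_true = np.array([3650.0, 4200.0, 2900.0])  # actual patent life (days)
y_pred = np.array([3400.0, 4500.0, 3100.0])  # model predictions (days)

mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
print(f"MAE={mae:.2f}, MAPE={mape:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}")
```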
Table 6. Evaluation Results of the Stacking Ensemble Configurations.

| No | Configuration | MAE | MAPE | MSE | RMSE |
|---|---|---|---|---|---|
| 1 | RF, XGB | 860.80 | 0.35 | 1,194,991.14 | 1093.16 |
| 2 | RF, DNN | 852.81 | 0.35 | 1,193,663.10 | 1092.55 |
| 3 | XGB, DNN | 861.83 | 0.34 | 1,224,178.35 | 1106.43 |
| 4 | RF, XGB, DNN | 857.94 | 0.36 | 1,209,822.37 | 1099.92 |
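A minimal sketch of the best configuration in Table 6 (RF + DNN) using scikit-learn's StackingRegressor is given below. The linear meta-learner is an assumption, and MLPRegressor stands in for the Keras DNN; neither reproduces the paper's exact architecture:

```python
# Sketch: stacking ensemble with RF and a neural-network base learner.
# Out-of-fold base predictions (cv=5) train the meta-learner.
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=280, max_depth=30,
                                     random_state=42)),
        ("dnn", MLPRegressor(hidden_layer_sizes=(128, 64),
                             max_iter=500, random_state=42)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
# stack.fit(X_train, y_train); y_pred = stack.predict(X_test)
```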
Table 7. Performance Evaluation of the Proposed Model Based on K-Fold Cross Validation.

| K | MAE | MAPE | MSE | RMSE |
|---|---|---|---|---|
| 3 | 863.12 | 0.34 | 1,239,151.14 | 1113.16 |
| 5 | 852.52 | 0.34 | 1,222,869.60 | 1105.76 |
| 7 | 856.66 | 0.34 | 1,228,866.32 | 1108.51 |
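The K-fold protocol behind Table 7 (K = 3, 5, 7, scored by MAE) can be sketched as follows; the synthetic data and the RF stand-in for the full stacked model are placeholders:

```python
# Sketch: K-fold cross validation with MAE scoring for K in {3, 5, 7}.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))                               # placeholder features
y = 3000 + 500 * X[:, 0] + rng.normal(scale=300, size=300)   # placeholder life (days)

model = RandomForestRegressor(random_state=42)
for k in (3, 5, 7):
    cv = KFold(n_splits=k, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_absolute_error")
    print(f"K={k}: MAE={-scores.mean():.2f}")
```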
Table 8. Predicted vs. Actual Patent Life of Active Patents.

| Application Number | Filing Date | Current Duration (Days) | Predicted Life (Days) | Difference Between Duration and Prediction (Days) |
|---|---|---|---|---|
| 1020140029921 | 3 March 2014 | 3992 | 4979.67 | 987.67 |
| 1020180004470 | 12 January 2018 | 2591 | 3241.81 | 650.81 |
| 1020170091204 | 18 July 2017 | 2769 | 5470.13 | 2701.13 |
| 1020140130055 | 29 September 2014 | 3792 | 4444.85 | 652.85 |
| 1020140034255 | 24 March 2014 | 3981 | 5192.61 | 1211.61 |
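To interpret Table 8: for a patent still in force, predicted life minus the duration elapsed so far gives the expected remaining life, and all five differences are positive, i.e., the model expects these active patents to stay in force. A small pandas sketch using the first two rows:

```python
# Sketch: expected remaining life of active patents from Table 8.
import pandas as pd

active = pd.DataFrame({
    "application_no": ["1020140029921", "1020180004470"],
    "current_duration_days": [3992, 2591],
    "predicted_life_days": [4979.67, 3241.81],
})
active["expected_remaining_days"] = (active["predicted_life_days"]
                                     - active["current_duration_days"])
print(active)
```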
Table 9. Patent Registration Fees—KIPO.

| Right | Fee Type | 1–3 Years (SRF) | 4–6 Years (ARF) | 7–9 Years (ARF) | 10–12 Years (ARF) | 13–25 Years (ARF) |
|---|---|---|---|---|---|---|
| Patent | Base fee | ₩13,000 | ₩36,000 | ₩90,000 | ₩216,000 | ₩324,000 |
| Patent | Additional fee (per claim) | ₩12,000 | ₩20,000 | ₩34,000 | ₩49,000 | ₩49,000 |

SRF: Setup Registration Fee; ARF: Annual Registration Fee.
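The progressive schedule in Table 9 is what makes life prediction economically relevant. As a sketch, treating each listed amount as a per-year charge (an assumption about how the SRF and ARF bands are billed), the cumulative cost of keeping a patent in force can be computed as follows:

```python
# Sketch: cumulative maintenance cost under the Table 9 schedule,
# assuming each listed amount is charged once per year in its band.
FEE_BANDS = [  # (first year, last year, base KRW/yr, per-claim KRW/yr)
    (1, 3, 13_000, 12_000),
    (4, 6, 36_000, 20_000),
    (7, 9, 90_000, 34_000),
    (10, 12, 216_000, 49_000),
    (13, 25, 324_000, 49_000),
]

def cumulative_fee(years_held: int, n_claims: int) -> int:
    """Total KRW to keep a patent in force for `years_held` years."""
    total = 0
    for start, end, base, per_claim in FEE_BANDS:
        overlap = max(0, min(years_held, end) - start + 1)
        total += overlap * (base + per_claim * n_claims)
    return total

print(f"{cumulative_fee(12, 10):,} KRW")  # 10-claim patent held 12 years
```

Under this reading, holding a 10-claim patent for 12 years costs roughly ₩4.5 million, which illustrates why pruning low-value patents early matters for sustainable IP management.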