# Patent Keyword Analysis Using Regression Modeling Based on Quantile Cumulative Distribution Function

## Abstract

## 1. Introduction

## 2. Related Works

## 3. Proposed Method

#### 3.1. Patent–Keyword Matrix

- (TM.1) Searching for patent documents related to the target technology.
  - (1-1) Using a keyword search query, we collect the patent documents related to the target technology.
  - (1-2) By examining all the retrieved patents, we select the valid patents that can be used for analysis.
- (TM.2) Building structured patent data by text mining.
  - (2-1) Using tokenization and normalization, we preprocess the patent documents to create the corpus.
  - (2-2) By extracting keywords from the corpus, we construct a patent–keyword matrix.
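
The preprocessing and matrix-building steps in (TM.2) can be illustrated with a minimal Python sketch. This is a plain-Python stand-in under stated assumptions, not the authors' pipeline: the toy documents, the stop-word list, and the function names `preprocess` and `patent_keyword_matrix` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy abstracts standing in for the retrieved patent documents.
patents = [
    "A blockchain ledger records each bitcoin transaction on the network.",
    "Authentication of network access using a secret key.",
    "The ledger is shared; the blockchain network validates the ledger.",
]

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"a", "an", "the", "of", "on", "is", "each", "using"}


def preprocess(text):
    """Normalize to lowercase, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def patent_keyword_matrix(docs, keywords):
    """Return one row per patent holding the frequency of each keyword."""
    rows = []
    for doc in docs:
        counts = Counter(preprocess(doc))
        rows.append([counts[k] for k in keywords])
    return rows


keywords = ["blockchain", "ledger", "network", "bitcoin", "access"]
matrix = patent_keyword_matrix(patents, keywords)
for row in matrix:
    print(row)
```

Each row corresponds to one patent and each column to one keyword. Even in this toy case most cells are zero, which is the sparsity that motivates models for zero-inflated counts.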

#### 3.2. Quantile Regression Modeling Based on Cumulative Distribution Function for PKA

## 4. Experiments and Results

- $Y$: blockchain
- ${X}_{1},{X}_{2},\dots ,{X}_{10}$: access, authentication, bitcoin, cryptocurrency, databank, distributor, encash, ledger, network, secretkey

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

| Keyword | Min | Median | Mean | Max |
|---|---|---|---|---|
| blockchain | 0 | 2 | 4.4140 | 38 |
| access | 0 | 0 | 0.2606 | 11 |
| authentication | 0 | 0 | 0.4763 | 16 |
| bitcoin | 0 | 0 | 0.1701 | 20 |
| cryptocurrency | 0 | 0 | 0.2115 | 11 |
| databank | 0 | 0 | 1.7800 | 24 |
| distributor | 0 | 0 | 0.5626 | 14 |
| encash | 0 | 0 | 0.1058 | 4 |
| ledger | 0 | 0 | 0.8892 | 26 |
| network | 0 | 0 | 0.9332 | 16 |
| secretkey | 0 | 0 | 0.5144 | 10 |
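
In the summary above, every explanatory keyword has a median of 0 while its maximum is well above 0, which points to a large share of zero counts. A rough sketch of how one such summary row, plus the proportion of zeros, could be computed from a single keyword column follows. The counts are made up for illustration and the helper name `summarize` is hypothetical.

```python
from statistics import mean, median

# Hypothetical keyword counts across ten patents (not the paper's data).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 3, 0]


def summarize(column):
    """Min, median, mean, max, and the proportion of zero counts."""
    return {
        "min": min(column),
        "median": median(column),
        "mean": mean(column),
        "max": max(column),
        "zero_share": sum(1 for c in column if c == 0) / len(column),
    }


summary = summarize(counts)
print(summary)
```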

| Model (explanatory keyword) | Loglikelihood (QRM) | Loglikelihood (LRM) | Loglikelihood (ZIP) | AIC (QRM) | AIC (LRM) | AIC (ZIP) | BIC (QRM) | BIC (LRM) | BIC (ZIP) |
|---|---|---|---|---|---|---|---|---|---|
| access | 1370.29 | 549.57 | −3621.28 | −2734.59 | −1093.14 | 7250.55 | −2719.36 | −1077.91 | 7270.85 |
| authentication | 1370.81 | 548.26 | −3627.43 | −2735.61 | −1090.51 | 7262.86 | −2720.39 | −1075.29 | 7270.85 |
| bitcoin | 1397.82 | 553.43 | −3610.05 | −2789.63 | −1100.85 | 7228.10 | −2774.41 | −1085.63 | 7248.40 |
| cryptocurrency | 1402.22 | 557.10 | −3598.48 | −2812.43 | −1108.21 | 7204.97 | −2797.20 | −1092.98 | 7225.27 |
| databank | 1383.69 | 561.47 | −3599.97 | −2761.38 | −1116.94 | 7207.93 | −2746.16 | −1101.71 | 7228.23 |
| distributor | 1370.23 | 548.67 | −3625.33 | −2734.45 | −1091.35 | 7258.66 | −2719.23 | −1075.13 | 7278.96 |
| encash | 1370.55 | 549.89 | −3623.16 | −2735.11 | −1093.77 | 7254.32 | −2719.88 | −1078.55 | 7274.62 |
| ledger | 1390.69 | 573.09 | −3528.22 | −2775.38 | −1140.18 | 7064.44 | −2760.16 | −1124.95 | 7084.74 |
| network | 1374.97 | 555.71 | −3612.41 | −2743.93 | −1105.43 | 7232.82 | −2728.70 | −1090.20 | 7253.12 |
| secretkey | 1382.74 | 551.14 | −3622.10 | −2759.47 | −1096.27 | 7252.20 | −2744.25 | −1081.05 | 7272.50 |
| All keywords | 1486.48 | 610.57 | −3429.08 | −2948.96 | −1197.14 | 6902.16 | −2888.06 | −1136.24 | 7013.81 |
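
The AIC column is tied to the log-likelihood by the standard definition AIC = −2·loglik + 2k, where k is the number of estimated parameters. The sketch below recomputes the AIC for the "access" row of the comparison table above and picks the model with the lowest value. The parameter counts in `n_params` are assumptions (they are not stated in the text) chosen because they reproduce the reported values up to rounding.

```python
def aic(loglik, k):
    """Akaike information criterion: -2 * log-likelihood + 2 * k."""
    return -2.0 * loglik + 2.0 * k


# Log-likelihood and AIC values copied from the "access" row of the table.
loglik = {"QRM": 1370.29, "LRM": 549.57, "ZIP": -3621.28}
reported_aic = {"QRM": -2734.59, "LRM": -1093.14, "ZIP": 7250.55}

# Assumed numbers of estimated parameters per model (not given in the text).
n_params = {"QRM": 3, "LRM": 3, "ZIP": 4}

computed = {m: aic(loglik[m], n_params[m]) for m in loglik}
for m in computed:
    print(m, round(computed[m], 2), reported_aic[m])

# A lower AIC indicates the preferred model among the candidates.
best = min(reported_aic, key=reported_aic.get)
print("lowest AIC:", best)
```

With these assumed parameter counts the recomputed values agree with the reported ones to within rounding, and QRM has the lowest AIC for this keyword.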

| Keyword | Estimated Parameter | p-Value |
|---|---|---|
| access | 0.4714 | 0.3091 |
| authentication | −0.2966 | 0.3479 |
| bitcoin | −3.0515 | <0.0001 |
| cryptocurrency | −3.7862 | <0.0001 |
| databank | 0.4457 | <0.0001 |
| distributor | −0.5016 | 0.1500 |
| encash | 0.9106 | 0.3098 |
| ledger | 0.6810 | <0.0001 |
| network | 0.7086 | 0.0007 |
| secretkey | −2.6537 | <0.0001 |
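
One way to read the coefficient table above is to keep only the keywords whose p-value falls below a chosen significance level and then group them by the sign of the estimate. The sketch below does this with the listed values. The 0.05 threshold is an assumption, and entries reported as "<0.0001" are stored as the bound 0.0001.

```python
# Estimates and p-values copied from the table; "<0.0001" is stored as 0.0001.
results = {
    "access": (0.4714, 0.3091),
    "authentication": (-0.2966, 0.3479),
    "bitcoin": (-3.0515, 0.0001),
    "cryptocurrency": (-3.7862, 0.0001),
    "databank": (0.4457, 0.0001),
    "distributor": (-0.5016, 0.1500),
    "encash": (0.9106, 0.3098),
    "ledger": (0.6810, 0.0001),
    "network": (0.7086, 0.0007),
    "secretkey": (-2.6537, 0.0001),
}

ALPHA = 0.05  # assumed significance level, not taken from the text

# Keep keywords whose p-value is below the threshold.
significant = {k: est for k, (est, p) in results.items() if p < ALPHA}

# Split the significant keywords by the sign of the estimated parameter.
positive = sorted(k for k, est in significant.items() if est > 0)
negative = sorted(k for k, est in significant.items() if est < 0)

print("positive:", positive)
print("negative:", negative)
```

Under this threshold, databank, ledger, and network have significant positive estimates, while bitcoin, cryptocurrency, and secretkey have significant negative ones.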

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Park, S.; Jun, S.
Patent Keyword Analysis Using Regression Modeling Based on Quantile Cumulative Distribution Function. *Electronics* **2024**, *13*, 4247.
https://doi.org/10.3390/electronics13214247
