PatentInspector: An Open-Source Tool for Applied Patent Analysis and Information Extraction

Featured Application: This work concerns a fully functional and deployed framework for patent analysis. The potential applications of this work range from the exploratory analysis and scoping of patent technologies and themes to the discovery of key companies that invest in specific patent domains. In our study, we present an exploration of patents related to human and project management and demonstrate how the developed tool enables the rapid interpretation of the findings.
Abstract: Patent analysis is a field that concerns the analysis of patent records, for the purpose of extracting insights and trends, and it is widely used in various fields. Despite the abundance of proprietary software employed for this purpose, there is currently a lack of easy-to-use and publicly available software that can offer simple and intuitive visualizations, while advocating for open science and scientific software development. In this study, we attempt to fill this gap by offering PatentInspector, an open-source, public tool that, by leveraging patent data from the United States Patent and Trademark Office, is able to produce descriptive analytics, thematic axes and citation network analysis. The use and interpretability of PatentInspector is illustrated through a use case on human resource management-related patents, highlighting its functionalities. The results indicate that PatentInspector is a practical resource for conducting patent analytics and can be used by individuals with a limited or no background in coding and software development.


Introduction
In this era of technological and entrepreneurial progress, an increasing number of companies seek to safeguard their intellectual property. Specifically, the number of annual patent applications has almost tripled in the last two decades, according to a study conducted by the World Intellectual Property Organization (WIPO) [1], rendering patent documents more valuable than ever before. Patents are widely considered a safe choice for large companies and organizations to secure commercial rights, avoid litigation actions and retain their competitive advantage [2].
The scope and importance of patenting is made clear when considering the large number of patent offices around the world, responsible for receiving, evaluating and granting patent applications. Such offices, with the most prominent ones being the United States Patent and Trademark Office (USPTO), the European Patent Office (EPO) and the China National Intellectual Property Administration (CNIPA), handle the difficult task of processing and analyzing patent documents, examining their objectives and their validity. This wealth of information has led to the emergence of patent analysis (PA), as a promising scientific domain that leverages data from patent offices to extract valuable results [3].
In brief, PA is a field that covers the study of patent documents utilizing proven methodologies and techniques comprising text mining, machine learning and data visualization [4,5]. The results of PA have numerous applications that can be exploited in different sections within an organization or a business, including R&D management, human resources, mergers and acquisitions, company evaluation and competitive intelligence [6]. In addition, PA offers a plethora of opportunities for the extraction of meaningful insights through the application of advanced approaches, such as topic modeling, network analysis and machine learning.
While PA offers valuable insights, it is a time-consuming multi-stage process that requires specific skills to be conducted. Patent documents must be collected from various sources, leveraging APIs offered by the patent offices, if applicable, or by using high-level programming languages and databases. Once collected, the documents must be preprocessed and filtered to meet certain criteria depending on the research goals and examined domain and, finally, be analyzed using a set of methodologies. While this process may seem simple for a seasoned researcher or an individual with a background in programming, databases and data engineering, there are groups of users, such as industrial actors and business stakeholders, that may not possess these types of skills or knowledge and require PA to be streamlined, automated and usable without prior knowledge.
Hence, in recent years, tools that automate the process of PA have emerged and have been utilized within organizations [4], due to the excessive volume of patent documents and the inherent complexity in analyzing them. These tools frequently offer the possibility of identifying and collecting related documents, filtering them based on established criteria and applying PA methodologies. Some of these tools are also offered for advanced scientific purposes and enable researchers from multiple disciplines to overcome the obstacles of PA and easily process patent entries.
However, while PA tools do exist and are in use, to the best of our knowledge, very few of them are available as free, accessible and open-source solutions, with the majority of tools being either proprietary or requiring payment after a short free trial. In addition, the existing open-source PA tools are somewhat complex to navigate, requiring a level of scientific knowledge. Thus, the lack of a flexible, open-source and public PA tool that can cater to the needs of multiple target groups for research purposes is a clear gap in the domain of PA software. Particularly in recent years, and even more so during the COVID-19 pandemic, the programming community has greatly encouraged the principles of open science [7,8] and scientific software development [9,10]. These two concepts combine the need for transparency and openness in all scientific domains with the creation of accessible software that processes and analyzes data using scientific concepts and is used primarily for research, moving science forward.
Recognizing (a) the current lack of an intuitive, easy-to-use, public and practical tool for PA, in contrast with the multitude of enterprise solutions, and (b) the growing movement for open science and the development of scientific applications that open additional research avenues to scientists and practitioners that may not be familiar with programming concepts, in this study, we introduce PatentInspector, an extensible open-source tool for PA primarily implemented in Python and deployed publicly for wider use. PatentInspector recognizes the challenges associated with software deployment [11] and leverages containers to reduce them, while providing a collective framework for the retrieval, processing, filtering and analysis of patent records. The tool is designed to be user-friendly, requiring no computer programming knowledge and being accessible by a large range of interested parties. It provides a suite of analytical tools, encompassing descriptive and exploratory alongside topic and citation analysis.
The structure of our study is twofold. First, the PatentInspector public tool is introduced, and its architecture is described. Secondly, PatentInspector is used to perform a demonstrative analysis of patents focusing on administration and management. Specifically, the Cooperative Patent Classification (CPC) group "G06Q10/06" is used for the case study, which encompasses areas such as resource management, workflow optimization, human and project management and enterprise planning and modeling, to demonstrate the capabilities of PatentInspector as a scientific application that performs PA. While there is considerable activity on PA notebooks and applications on software repositories like GitHub [12], to the best of our knowledge, this is one of the few PA interface platforms that is easy-to-use and publicly distributed to the scientific and industrial communities.
The rest of the study is organized as follows. In Section 2, background information on PA is offered, focusing on the primary scientific methodologies used in our tool, while also presenting other similar tools. In Section 3, the objectives and key contributions of the study are highlighted. In Section 4, the architecture and development of PatentInspector are presented, while Section 5 serves as a case study of its functionalities. In Section 6, the findings of the case study are discussed, emphasizing the ease of interpretation that the tool provides, and, in Section 7, the main threats to the validity of the study are provided, along with conclusions and suggested future work directions in Section 8.

Background Information on Patent Analysis and Tools
The field of PA, generally, concerns the accumulation of patent records from one or multiple patent offices, with the aim of extracting insights and useful information via the application of scientific methodologies, text mining and statistics [5]. The various techniques of PA range from descriptive and exploratory analytics to topic modeling, complex citation networks and machine learning classifiers. In this section, some indicative studies on PA will be presented, focusing on the methodologies supported by PatentInspector, and then the most prominent tools for PA will be analyzed, highlighting their functionalities.

Patent Analysis Literature
Descriptive/Exploratory Analytics: Several studies have leveraged descriptive statistics to portray the temporal, geographical or technological development of patents in various fields. The results of these studies are either descriptive information about patents (e.g., most prominent organizations) or insights from multivariate methods that explain the relationships between multiple variables. Ardito et al. [13] focus on the IoT domain and explore its trends and dynamics on a country and assignee level, pinpointing the USA and China as prominent countries and Huawei and Qualcomm as the main assignees. Fujii et al. [14] and Tseng and Ting [15] explore the AI domain with knowledge-based methodologies and discover the main technologies and investors in AI trends. In the context of software engineering, Georgiou et al. [16] perform a large-scale analysis on patents from the USPTO to discover the geographical, organizational and technological distributions. Similar analyses have also been conducted in the fields of low-carbon technologies [17], RFID concepts [18], augmented reality [19], nanoscience [20] and photovoltaics [21], indicating that PA as a practice can be efficiently used in multiple application domains and yield practical results. Additional studies have also attempted to combine the use of PA with bibliometrics, enhancing the insights of PA with knowledge derived from the research literature and bibliometric indicators [22][23][24].
Topic Modeling: Apart from leveraging descriptive statistics and exploratory analysis on patent data, several studies have employed algorithms on patent data that extract topics and thematic axes, pinpointing promising technologies and objectives. Among them, the Latent Dirichlet Allocation (LDA) algorithm, proposed by Blei et al. [25], is by far the most popular when it comes to extracting topics in PA. Due to its efficiency in extracting topics from textual information, LDA has been widely employed in many fields, including vehicular technologies [26,27], where Zhang et al. [27] leveraged a variation of LDA, namely the structural topic modeling (STM) algorithm [28], which has also been employed in [29] for the profiling of hydrogen technologies. Other fields include smart manufacturing [30], sustainable city development [31], data-oriented software [32] and telecommunication patents [33], with the latter reviewing assignee hotspots, based on the extracted topics. Hotspots are particularly important as they emphasize prime investors and technologies and they have also been investigated in a plethora of studies [34][35][36][37].
Patent roadmaps, which comprise emerging or trending technologies that pave the road for future patent applications, are also an important part of topic modeling studies. Kim et al. [38] propose a patent development map with a case study in 3D printing, using LDA, while Ma et al. [39] apply the same process in solar cell technologies. Zhang et al. [40] explore the Blockchain sector to assess technological maturity and forecast trending topics, while a large case study of patents in Australia [41] presents a methodology with semantic information that estimates development for specific topics, with a tailored case study. Finally, Kim et al. [42] leverage CPC clusters in telemedicine patents to evaluate the development of the field.
It should be mentioned that topic modeling has also been employed in studies that explore the profiles of firms, along with their knowledge portfolios [43], and the identification of disruptive technologies that may alter the structure of the market [44], with a case study on photovoltaics.
Citation Networks: Patent citation networks have also been proven to be highly important, based on the related literature, as they portray the interrelations between patent records and uncover the most influential patents or technologies. The most common types of citation networks are the patent-to-patent network, which examines the citations between different patents, and the CPC-to-CPC network, which examines the citations between different patent classes. Patent citation networks have been found to be important indicators in the timely identification of notable patents [45], while their use contributes to the mapping of technological research and discovering deeper connections between different domains [46].
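To make the two network types concrete, the following minimal sketch derives both from the same list of citations. The patent identifiers, the CPC assignments and the function names are hypothetical and chosen only for illustration; this is not the implementation used in PatentInspector, and it assumes one CPC group per patent for simplicity.

```python
from collections import Counter

# Hypothetical toy data: one CPC group per patent and (citing, cited) pairs.
patent_cpc = {
    "P1": "G06Q10/06",
    "P2": "G06Q10/06",
    "P3": "G06F16/00",
    "P4": "H04L67/00",
}
citations = [
    ("P2", "P1"),
    ("P3", "P1"),
    ("P4", "P1"),
    ("P4", "P3"),
]


def most_cited(edges):
    """Count incoming citations per patent (in-degree in the patent-to-patent network)."""
    return Counter(cited for _, cited in edges)


def cpc_network(edges, cpc_of):
    """Aggregate patent-to-patent edges into weighted CPC-to-CPC edges."""
    weights = Counter()
    for citing, cited in edges:
        weights[(cpc_of[citing], cpc_of[cited])] += 1
    return weights


in_degree = most_cited(citations)
cpc_edges = cpc_network(citations, patent_cpc)

for (source, target), weight in sorted(cpc_edges.items()):
    print(f"{source} -> {target}: {weight}")
```

In this sketch, the patent-to-patent view highlights individual influential records through their in-degree, while the CPC-to-CPC view collapses the same edges into flows between classes.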
Patent citation analysis has been employed in multiple sectors to find prominent assignees and organizations, technologies and patent entries, including but not limited to vehicle batteries [47], mobile technologies [48], agricultural and natural case studies [49][50][51][52][53][54], printed electronics [55] and nanotechnology [56]. The diffusion of information in patent citation networks has also been studied [57,58], along with the identification of emerging technologies, their lifecycles [59][60][61] and the concept of open innovation [62] and whether it is reflected in patent citations.
Technological trajectories are also an aspect that is investigated in patent citations, which can be translated into the forecasting of the evolution of an emerging technology or an established practice based on its status in a citation network. This concept has been studied in patents regarding communication standards and energy devices [63,64], fuel cell research [65] and Blockchain [66]. Finally, several studies focus on assignees along with their associated technologies and their status on patent networks as a sign of competitive advantage, inventive prowess and the largest market share [67,68].
As PA has multiple applications, some studies have also proposed new approaches to exploring patent citations. More specifically, Hu et al. [69] introduce ego citation networks as an alternative means of exploring the citation of patents coupled with bibliographic references. Yang et al. [70] construct a comprehensive patent citation network leveraging direct, indirect, coupling and co-citation metrics, while Chakraborty et al. [71] use exponential random graph models to incorporate social parameters into a patent citation network. Finally, brokerage analysis [72], which exploits triadic relationships, has also been used in patent-to-patent networks [32,57,73].
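As a brief illustration of two of the metrics mentioned above, the sketch below computes bibliographic coupling (two patents share cited references) and co-citation (two patents are cited together by a third) using their standard textbook definitions. It is not the construction proposed in [70]; the identifiers and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (citing, cited) pairs.
citations = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Y")]


def references_by_patent(edges):
    """Map each citing patent to the set of patents it cites."""
    refs = defaultdict(set)
    for citing, cited in edges:
        refs[citing].add(cited)
    return refs


def coupling_strength(edges):
    """Bibliographic coupling: number of references shared by two citing patents."""
    refs = references_by_patent(edges)
    strengths = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


def co_citation_counts(edges):
    """Co-citation: number of patents that cite both members of a pair."""
    counts = defaultdict(int)
    for cited_set in references_by_patent(edges).values():
        for a, b in combinations(sorted(cited_set), 2):
            counts[(a, b)] += 1
    return dict(counts)


coupling = coupling_strength(citations)
co_cited = co_citation_counts(citations)
```

Coupling links patents that draw on the same prior art, whereas co-citation links patents that later work treats as related, so the two measures can surface different neighbourhoods in the same network.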

Patent Analysis Tools
As mentioned in the Introduction, there are several PA tools that allow the processing of patent records, which are widely used by enterprises and organizations. In Table 1, basic information about the most popular PA tools is presented, highlighting their key characteristics and operations. An inspection of the table reveals that the majority of the tools are, indeed, proprietary and owned by large organizations (e.g., PatSeer, Derwent Innovation, Orbit Intelligence), with most of them providing access to millions of patent records from multiple offices. However, the fact that they are proprietary means that they do not support a free trial (or may do so upon request) and typically require a subscription for their services. In addition, most of the proprietary tools focus on providing business indicators for patent growth (e.g., portfolio quality, investment value), which are often based on AI methodologies, while some of them also provide topic modeling or citation analysis functionalities.
Apart from proprietary PA tools, there are also several public tools that act as either PA suites or patent search databases. Among them, Patent2Net [74] is an educational suite that leverages data from EPO and focuses on citation networks and clustering. The suite also provides an interface [75] that allows users to explore its capabilities and export results in various graph formats. The main target groups of Patent2Net are the educational and scientific communities [74], while PatentInspector strives to include more target groups, such as industrial investors, developers, inexperienced researchers and HR representatives. UnifiedPatents is another partially public PA suite that mainly focuses on business indicators and differs from PatentInspector, as it can be primarily used by business owners and economists. The portal provides an intuitive interface and companies with smaller revenue can use it for free, although it introduces a pricing option for larger companies. Finally, PatentMiner [76] is a notable effort that was undertaken before PatentInspector and provided an interface that executed advanced PA with topic modeling.

The remaining free PA tools (PatZilla, FreePatents and GooglePatents) are not PA tools in the typical sense, as they mainly provide advanced search engines for the retrieval of patent documents. Thus, their PA capabilities are minimal and they cannot be considered similar to PatentInspector, which employs established scientific concepts and targets all types of users. GooglePatents [99] in particular stands as one of the most popular patent search engines, encompassing data from multiple patent offices and offering limited descriptive information (e.g., top inventors, top organizations).
The analysis of PA tools and suites reveals that, as stated in the Introduction, while there is a plethora of such tools in the market and in software repositories, few of them are suitable for users with limited coding or scientific backgrounds. PatentInspector emerges to cover this deficit, with results from the USPTO while also offering different methodologies, efficient visualizations and interpretable insights. In addition, PatentInspector introduces a novel perspective of PA for mainstream users and more advanced parties by including topic modeling methodologies that can profile the thematic axes of patent documents and aid users in making informed decisions.

Objectives and Contribution
The main objective of this study was to create PatentInspector, a user-friendly tool designed for both scientific research and everyday use. Our goal is to provide a resource that is open to any individual and simplifies the process of PA, offering scientific concepts in an easily digestible manner. The novelty of the tool that has been developed is that, in contrast to the plethora of proprietary software, it focuses on research aspects and semantic insights from patents by leveraging topic modeling and citation analysis methodologies and visualizations. Thus, it offers a new perspective on patent activity, and it can be utilized by mainstream users in combination with insights from other PA solutions.
Overall, the primary contributions of PatentInspector are the following.
C1. Provide accessible PA: The goal behind PatentInspector is to broaden the reach of PA, making it accessible to a wider audience without the need for expertise in legal frameworks, computer programming or data science. It is our belief that, as in many domains, the wider public is unable to extract insights and analyze patent data due to existing software being primarily proprietary and data retrieval pathways requiring coding knowledge. PatentInspector strives to address these problems by automating data retrieval and guiding the user through the analyses that it performs.
C2. Bridge the gap between PA complexity and knowledge: Complementary to C1, PatentInspector seeks to minimize the inherent complexity of the PA field and enable individuals with a limited or no programming background to use, even in an elementary fashion, a tool that can process patent records and extract results. The developed solution, while offering some additional functionalities for more experienced users, requires no advanced knowledge of PA, topic modeling or statistics, thus allowing anyone to use it effectively. We aspire for PatentInspector to become a valuable resource, enabling numerous individuals to gain insights into their areas of interest within the patent landscape. Based on its design, the developed platform can be applicable across various domains and accessible to individuals from diverse backgrounds.
C3. Flexible, open-source tool for PA: As mentioned in Section 2, a large number of existing PA software programs are proprietary and must be purchased or subscribed to. This, in turn, limits the pool of users that can utilize them, while the learning curve may be high. Hence, with respect to the rise of scientific software development and open science, we offer an extensible, public tool for PA that not only is publicly available regarding the usage and modification of the source code but is also flexible in its design and primary functions.
C4. Favor simplicity, encourage engagement: At its core, the proposed tool was designed to be simple and easy-to-use. The frontend component is composed of visualizations that do not contain complex information, while more sophisticated concepts are not forced on the regular user but can be leveraged by more experienced users. Thus, PatentInspector has the potential to achieve high engagement by any user due to its simple yet sophisticated nature.
The proposed tool has practical application value for various interested parties, who can use it for different objectives and purposes. The different implications and target groups are presented below. In addition, Figure 1 indicates the different ways that PatentInspector can be operated by various individuals in a concise manner. However, it is crucial to note that while PatentInspector provides insights into the patent landscape, it should not be the sole basis for important decisions. It also does not aim to replace manual PA, lacking certain features and the expertise of researchers. Finally, it must be noted that the current version of the tool only uses USPTO as its data source, thus limiting the results towards the US. In future versions of the tool, we plan to include more data sources.
Appl. Sci. 2023, 13, x FOR PEER REVIEW
I1. Developers: PatentInspector follows established architecture schemas, and it is fully open-source. Developers and programmers can utilize the source code, enrich and extend it with additional features and capabilities and further develop the application. The code has been structured so as to encourage novice programmers to enhance their software development skills but also experienced professionals to modify it according to their preferences.
I2. Patent inventors: The developed tool can certainly benefit individuals that have accomplished an invention and wish to patent it. Primarily, it serves as a practical way to identify frequently cited patents in their research field, offering valuable insights and trends while also revealing whether their invention is innovative and can potentially be granted a patent. It should be emphasized, though, that, currently, PatentInspector only supports patent grants from the USPTO, so any results would inevitably be skewed towards the US.
I3. Economists: Individuals that deal with the stock market, return on investment and economic deals can leverage PatentInspector to observe patent trends, focusing on specific organizations, scoping successful businesses and emerging patent fields to predict upcoming trends and make informed decisions.
I4. HR departments and policymakers: PatentInspector also has an important societal aspect in its functionalities, as it can be used in conjunction with other tools and software for business intelligence and skills analysis, to discover successful inventors. This in turn can lead HR departments and policymakers within organizations to extract insights for talent acquisition, by scouting active inventors and recruiting them or retaining active personnel in their own organizations.
I5. Researchers: Researchers are another important group that can use PatentInspector, as PA is a highly active field in research, with valuable insights [3,6]. Hence, researchers with a grasp of scientific methodologies can not only use PatentInspector as a validity check, when conducting manual PA, but they can also employ its various functionalities to accelerate their research and leverage the results for more complex algorithms. In addition, PatentInspector is an excellent alternative for harvesting patent records from a selected domain, with a variety of features. However, we should once again point out that PatentInspector cannot replace global patent databases, as, in its latest release, it only retrieves data from the USPTO.

Architecture and Workflow
PatentInspector has the structure of a standard web application, consisting of a frontend and a backend component, each with a distinct role. The backend component is developed in Python 3.11, using the Django framework [107], and is responsible for storing data, conducting computations and managing the application's core functionality. The efficient handling of the aforementioned operations is made possible by utilizing the Postgres relational database, which stores and retrieves data through complex SQL queries generated by Django's Object Relational Mapper (ORM). Additionally, the backend component provides management commands and performs necessary preprocessing procedures to streamline the tool administration. On the other hand, the frontend component, built using the JavaScript Vue framework [108], plays a crucial role in presenting the data to users in an intuitive and interactive manner. It is the part of the application that users directly interact with, providing an interface for accessing the information processed by the backend. In the frontend component, users can interact with the interface and generate PA reports while also being able to access previous reports that they have created. The overall architecture of the tool is presented in Figure 2.
The Python programming language was chosen for its widespread popularity, especially in the realm of scientific computing [109], as well as its flexible functionality and maintainability.

Data Collection, Preprocessing and Storage
PatentInspector operates on patent record data offered free of charge by the USPTO. More specifically, the database used in PatentInspector relies on bulk data available in the PatentsView platform of the USPTO [110], which serves as a repository of all USPTO-registered and granted patents and is updated regularly. This repository is organized in tables, with each table containing a different aspect of patent records (e.g., patent classes, patent inventors, etc.). The tool includes a management utility named "USPTO", showcased in Figure 3, that automates the process of downloading, decompressing, preprocessing and inserting the data into the database.
PatentInspector retrieves only targeted tables made available by the USPTO and, more specifically, only those regarding granted patents; it does not retrieve applications that have been filed with the USPTO but have not yet received a patent grant. This was a conscious decision based on the rationale that patent applications may be rejected by the patent office and would thus hold reduced importance in the collected data [111]. The different fields and tables retrieved are presented in Table 2. After the tables of interest, containing information about roughly eight million patents, are downloaded, an automated preprocessing procedure takes place. The preprocessing deployed in PatentInspector involves stop word removal and the lemmatization of text fields such as the patent's title or abstract, to facilitate and accelerate the text analysis performed in later stages. Additionally, in this phase, computations are performed in advance for optimization purposes and stored as additional columns, effectively constructing a long-term cache inside the database. Table 3 summarizes all the precomputed fields that aid the throughput of the application. After the preprocessing is finished, the entire set of patent records is inserted into the database. This is achieved through two different approaches depending on the size of the data. For small tables such as Location, the Django ORM is leveraged to insert the data. For larger tables such as Patent, the preprocessed chunks are stored in a bulk CSV in the file system, which is later loaded into the database using Postgres' COPY command, resulting in a significant performance boost. The schema of the database for computational-related and user-related tables is shown in Figures 4 and 5, respectively.
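The preprocessing and bulk-loading steps described above can be sketched as follows. This is a minimal illustration and not the tool's actual code: the stop word list, the function names, the `title_processed` column and the file path are all hypothetical, and lemmatization is omitted because it would require an external NLP library.

```python
import csv
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}


def remove_stop_words(text):
    """Lowercase the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)


def write_bulk_csv(rows, path):
    """Write preprocessed (id, title) rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "title_processed"])  # hypothetical column names
        for patent_id, title in rows:
            writer.writerow([patent_id, remove_stop_words(title)])


def copy_statement(table, columns, path):
    """Build the Postgres COPY statement that would load the CSV file.

    COPY ... FROM 'file' reads a path on the database server, so the CSV
    must be visible to the Postgres process.
    """
    column_list = ", ".join(columns)
    return f"COPY {table} ({column_list}) FROM '{path}' WITH (FORMAT csv, HEADER true)"
```

Loading one large file through COPY avoids issuing an INSERT per row, which is the usual reason bulk loading is faster than inserting through an ORM for tables with millions of records.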
It is important to highlight that more experienced individuals who wish to deploy PatentInspector on their local machines, rather than running the publicly deployed version, need not depend on the USPTO utility. PatentInspector provides an alternative tool known as the "Load Database" utility that retrieves a highly compressed and indexed dump from the cloud and loads it into the database. This typically reduces the waiting time from approximately ten hours to just one hour on a standard personal computer with a conventional network connection. The same utility is also employed in configuring containers for deployment purposes.
It should also be noted that expanding PatentInspector to incorporate data from other patent offices would involve developing a counterpart to the USPTO utility for each of the targeted patent offices. This utility would handle the downloading, preprocessing and data insertion tasks.
PatentInspector is specifically designed to support multiple users simultaneously. Consequently, the storage of user-specific data is imperative; hence, user authorization and password reset functionalities are integral features. Users can generate reports that contain PA insights and can access only their individual reports. These reports encompass the criteria for patent filtering and include metadata pertinent to the analysis, such as creation dates. The analysis results are stored in the file system, utilizing a combination of JSON and Excel file formats. Further information on PatentInspector reports can be found in the "Computation" subsection (Section 4.2).

Computation
The report entity serves as the central element in the user experience, driving computations. When a user initiates a new report or interacts with its results, triggering additional computations, they effectively add a new task to the task queue of PatentInspector. The use of a task queue, rather than executing computations immediately upon request, is essential because analyses can take up to twenty minutes on an average computer. The task queue of PatentInspector periodically polls for new tasks and executes them in the background when it has available resources. To relieve users of having to wait for their reports to be completed, once a task has finished, users are informed via email, based on their preference, that their analysis is complete.
In PatentInspector, the term "task" refers to functions and their corresponding arguments that are executed at a deferred point in time. These tasks primarily consist of functions integrated with Django ORM code, which ultimately generate complex SQL queries sent to the database. In certain instances, tasks may include code from computational libraries to handle computations that cannot be carried out within the database system, such as topic analysis.
PatentInspector currently implements two tasks, namely "process report" and "topic analysis". In the "process report" task, all computations are executed using default parameters. For instance, the default number of topics in topic analysis is ten. While users can effectively use the tool with the default parameters, the tool allows users to modify their queries (e.g., change the number of topics) and produce alternative results.
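The idea of a task as "a function plus its arguments, executed later" can be illustrated with a small standard-library sketch. It is not PatentInspector's task queue: the class, the `notify` callback (standing in for the email notification) and the stub `process_report` function are assumptions, and the worker here blocks on the queue instead of polling periodically.

```python
import queue
import threading


class TaskQueue:
    """Runs deferred tasks (a function plus its arguments) in a background thread."""

    def __init__(self, notify):
        self._tasks = queue.Queue()
        self._notify = notify  # called as notify(task_name, outcome) when a task ends
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name, func, *args, **kwargs):
        """Enqueue a task and return immediately, without waiting for its result."""
        self._tasks.put((name, func, args, kwargs))

    def _run(self):
        while True:
            name, func, args, kwargs = self._tasks.get()  # blocks until a task arrives
            try:
                try:
                    outcome = func(*args, **kwargs)
                except Exception as error:  # keep the worker alive if a task fails
                    outcome = error
                self._notify(name, outcome)
            finally:
                self._tasks.task_done()

    def wait(self):
        """Block until every submitted task has finished."""
        self._tasks.join()


def process_report(report_id, n_topics=10):
    """Stub computation; ten topics mirrors the default mentioned in the text."""
    return {"report": report_id, "n_topics": n_topics}
```

A caller would create `TaskQueue(notify=send_email)` with its own notification function, call `submit("process report", process_report, 7)` and continue immediately; the callback fires once the background computation ends.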

The results of these computations are saved in two files: a JSON file containing all computational outputs and an Excel file containing the patents and their associated information. Users can easily download the Excel file for further manual analysis. The "topic analysis" task executes the topic analysis methodologies with user-provided parameters and replaces the existing JSON result file. Extending PatentInspector programmatically to make the report results more interactive is straightforward: it requires developing supplementary sub-tasks such as "patent analysis".

API and Interconnection
PatentInspector employs a REST API, which is accessed by the frontend application for user and task management, as well as data retrieval. The tool provides Swagger documentation, which is automatically accessible from the local server when running in a development environment at the "/swagger" URI. The API endpoints are summarized in Table 4.

Features List and Users' Perspective
User management and verification is an integral part of PatentInspector. The tool allows users to register, authenticate and update their credentials and preferences. In Figure 6, we provide an overview of the user management-related windows of the application, indicating that users can change passwords, log in and register, and can also ask the tool to alert them via email when the analysis is completed, thus eliminating the need to leave the application open while it completes its computations.
PatentInspector allows users to create reports based on a comprehensive set of filters targeting various aspects of the patent ecosystem. The basic idea behind the tool is that users can select which patent records are to be analyzed using multiple criteria, ranging from the grant year or keywords in the patent title/abstract to inventor locations or names. Table 5 provides a concise overview of the available filters within the report construction form from a programmer's perspective, while Figure 7 provides the same information from a user's perspective. After users choose their PA criteria and submit the form, they are taken to their report list, presented in Figure 8. There, they can easily check report metadata, delve into previous report results or delete reports as needed.

Analysis Tabs
The analysis conducted by PatentInspector is organized into three primary tabs, namely the Descriptive Analysis Tab, the Thematic Analysis Tab and the Network Analysis Tab. In this section, we present each tab from the perspective of the user, analyzing the functionalities that they provide.
The Descriptive Analysis Tab of PatentInspector is organized into three distinct sections, each serving a specific purpose. Firstly, the "Basic Statistical Measures" section offers a table featuring statistical measures for a range of variables. Secondly, the "Variables Over Time" section provides insights through various time series representations. Lastly, the "Information for Each Entity" section presents data distributions tailored to different aspects of PA, ensuring an inclusive view of the data for individual entities. In Figure 9, the Descriptive Analysis Tab is presented, while Table 6 summarizes the derived statistics.

The Topic Analysis Tab consists of three main components. First, there is a form that allows users to adjust the criteria for topic analysis. These criteria include the choice of topic analysis method (the tool currently supports the Latent Dirichlet Allocation (LDA) method and the Nonnegative Matrix Factorization (NMF) method [112]), the number of topics, the words per topic, the date range for analysis and parameters like the removal of the most common words (for LDA) or the maximum document frequency (for NMF). The second component displays a scatter plot and its corresponding table, categorizing topics as "emerging," "dominant," "declining" or "saturated" based on the methodology outlined in [36], which relies on the patent share (the number of patents in each topic) and the Compound Annual Growth Rate (CAGR) of the patent share. Lastly, the third component presents topic details, including word weights and the number of patents in each topic. Figure 10 presents a detailed overview of the Topic Analysis Tab.
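To make the share/CAGR categorization concrete, the sketch below computes a CAGR and assigns one of the four labels. The CAGR formula is the standard one; the quadrant rule and its thresholds are assumptions made for illustration only, since the exact criteria of the methodology in [36] are not reproduced in this section.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def classify_topic(share, growth, share_threshold, growth_threshold=0.0):
    """Assign a label from patent share and CAGR.

    Assumed quadrant rule (not taken from the paper):
      growing and large share     -> "dominant"
      growing and small share     -> "emerging"
      not growing and large share -> "saturated"
      not growing and small share -> "declining"
    """
    large = share >= share_threshold
    growing = growth >= growth_threshold
    if growing:
        return "dominant" if large else "emerging"
    return "saturated" if large else "declining"
```

For example, a topic whose patent share doubles over five years has a CAGR of about 14.9% per year; with a share below the chosen threshold, this rule would label it "emerging".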
Finally, the tool includes a Network Analysis Tab that showcases the most cited patents across local and global networks. Furthermore, it provides an interactive 3D graph representation for the local citation network. These elements are presented in Figure 11.
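The difference between the two networks can be sketched in a few lines. Following the case study's description, the global count considers citations arriving from the entire database, while the local count is assumed here to consider only citations whose citing patent also belongs to the report. The function name and the input format (pairs of citing and cited patent identifiers) are hypothetical.

```python
from collections import Counter


def citation_counts(citations, report_patents):
    """Return (local, global) incoming-citation counters for the report's patents.

    citations: iterable of (citing_patent, cited_patent) pairs.
    report_patents: identifiers of the patents selected by the report filters.
    """
    report_patents = set(report_patents)
    local_counts, global_counts = Counter(), Counter()
    for citing, cited in citations:
        if cited not in report_patents:
            continue  # only patents inside the report are ranked
        global_counts[cited] += 1
        if citing in report_patents:
            local_counts[cited] += 1  # both ends of the citation are in the report
    return local_counts, global_counts
```

Calling `most_common(10)` on either counter yields a ranking of the most cited patents for the corresponding network.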
PatentInspector also features a Patents Tab (presented in Figure 12). This tab hosts a sortable table containing the patents that have been retrieved with the filters applied by the user, accompanied by the functionality to download a comprehensive Excel file. This file encompasses most of the pertinent information for the patents that match the criteria submitted when creating the report. The Patents Tab is highly important as it enables users with a richer background and knowledge to conduct a more in-depth analysis of the patents on their own terms. However, this does not prevent users with limited knowledge from experimenting with the tool or downloading the extracted patents for additional analysis.

Case Study and Validation
To effectively showcase its functionalities and usefulness, in this section, PatentInspector is employed to perform a PA focused on the CPC group "G06Q10/06". This group spans a wide array of domains, including resource management, workflow optimization and human and project management, as well as enterprise planning and modeling. To validate the findings, the results are compared with a replication of this case study using the Lens software [102], which is regarded as an established source both for patent data retrieval and for PA insights [113][114][115][116]. The descriptive insights and citation analysis extracted by PatentInspector largely correspond with the results from Lens, indicating that the constructed tool produces valid PA outputs. However, it should be taken into account that all comparisons were made with patents from the USPTO and not from the global patent landscape. The detailed description and comparison of the tool with Lens [102] can be found in the Supplementary Materials, along with three additional case studies conducted to further validate PatentInspector.
Our analysis covers 13,424 patents retrieved from the filtering system using the "CPC = G06Q10/06" filter with exact matching. It commences with a descriptive analysis, provided by the Descriptive Analysis Tab, starting with statistical measures, followed by an exploration of the variables over time and an investigation of entity-specific data. It continues with a topic analysis from the Topic Analysis Tab and concludes with a citation analysis from the Network Analysis Tab. In Table 7, we present basic statistical measures.

In this context, it is important to highlight that the distributions of applications and grants tend to fall around 2021-2023 because recent data are absent from PatentInspector. As previously detailed in Section 4.1, PatentInspector exclusively handles granted patents. It is noteworthy that the statistical table indicates an average pre-grant duration of 3.91 years. Consequently, it is reasonable to infer that patents submitted within the last three years are likely not included in the PatentInspector database. Additionally, the USPTO appears as the only patent office that granted patents for G06Q10/06 solely because PatentInspector currently contains only patents from the USPTO.
Upon an examination of the charts (Figures 13-20), it becomes evident that the domains associated with HRM present an upward trajectory, as the patent grants and applications are consistently on the rise. Regarding PCT status, while it presents an upward trend based on Figure 17, 12,517 patents (93.2%) have not applied for PCT, while only 907 (6.8%) of the total patents have been granted PCT status.
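The reported PCT shares follow directly from the counts given in the text. The short check below recomputes them; the variable and function names are illustrative.

```python
total_patents = 13_424      # patents matching "CPC = G06Q10/06"
pct_granted = 907           # patents with PCT status
pct_not_applied = 12_517    # patents that have not applied for PCT


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(share(pct_granted, total_patents), share(pct_not_applied, total_patents))
```

The two counts sum to the 13,424 patents of the report, so the two percentages cover the whole set.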
The most prolific inventors (Table 8) typically possess a minimum of 27 patents each within the domain of human and resource management. The majority of these inventors are situated in the United States, Japan, the United Kingdom, Ireland, Germany and Israel, as presented in Figure 21. It is important to note that there may be a notable bias in these statistics because the database of PatentInspector is limited to patents granted by the US patent office, thus skewing the results towards the US. However, the presence of inventors from different countries indicates that HRM is indeed a globally studied field, with multiple individuals interested in patenting their inventions.
The G06Q10/06 class is predominantly populated by Cooperation/Organization entities, making up 99.4% of the total. Among them, the leading assignees, presented in Table 9, tend to hold at least 124 patents each in the G06Q10/06 category. In total, the top 10 assignees collectively possess 3363 patents, accounting for approximately 25% of all patents within the G06Q10/06 category. The locations of the assignees, shown in Figure 22, closely mirror those of the inventors, but with a greater concentration in the prominent tech hubs across the United States, which is once again a result of the USPTO being the only source of data. Among the assignees are several reputable companies and organizations that employ a large pool of personnel and manage projects and resources on a complex scale, as well as companies involved in the Software and Informatics sectors, such as IBM (USA), Microsoft (USA), Oracle (USA) and Accenture (IRL).
PatentInspector was also employed to perform the LDA method for topic modeling, deleting the 20 most frequently appearing words in the document and using the default parameter of 10 topics. The execution of LDA yielded a coherence score of 0.523, meaning that the resulting topics were well-rounded and broad [25]. Subsequently, PatentInspector categorized these topics according to their patent share and the CAGR of their patent share during the period between 30 March 2015 and 28 March 2020, which covered the 5 years leading up to the last grant date, using the default settings of PatentInspector. Of course, users can modify any of these parameters according to their preference. It should be noted that this is an indicative, demonstrative execution of LDA, for the purposes of the case study, and not necessarily the optimal model. However, based on the outputs of the tool, we can provide some interpretations for the HRM field.
In Table 10, the extracted topics are presented, along with the top words of each topic and their patent share and CAGR classification. A title has also been assigned to each topic based on the words that characterize it and by inspecting the most representative patents that belong to it. In Table 11, the most representative patents for each topic are also included, as extracted by PatentInspector, helping the user to validate the produced topics and assign titles to each topic, in combination with the most probable words.

Table 11. Most representative patents for the extracted topics:
US10068020-Consumable data management
US10042904-System of centrally managing core reference data associated with an enterprise
US10360546-Method for supplying electrical power and billing for electrical power supplied using frequency regulation credits
US7310646-Data management system providing a data thesaurus for mapping between multiple data schemas or between multiple domains within a data schema
US6907381-System for aiding the preparation of operation and maintenance plans for a power-generation installation
US7885793-Method and system for developing a conceptual model to facilitate generating a business-aligned information technology solution
US8249756-Method, device and system for responsive load management using frequency regulation credits
US8204922-Master data management system for centrally managing core reference data associated with an enterprise
US9418046-Price-and-branch algorithm for mixed integer linear programming
US9021420-Deployment of business processes in service-oriented architecture environments
US9159099-Exception notification system and method
US8001166-Methods and apparatus for optimizing keyword data analysis
US9191277-Method of registering a device at a remote site featuring a client application capable of detecting the device and transmitting registration messages between the device and the remote site

Among the detected topics, Resource Allocation and Supply Chain Analysis (Topic 1) is observed, as well as Risk Assessment & Performance Evaluation metrics (Topic 9), Job Scheduling (Topic 6) and the Analysis of Risk in Decisions (Topic 4). Moreover, some niche topics are also present, such as Computing & IT Support (Topic 5), Hardware Maintenance (Topic 8) and Network-Client Communication (Topic 10). Topics related to software, such as Business Intelligence software (Topic 2), which accelerates and simplifies HRM tasks, as well as Interface Interaction & Electronic Records (Topic 3), are also present. Finally, a topic dealing with ways to automate logistics using autonomous vehicle technologies and tracking methods is also observed (Topic 7).
Based on the interpretation of the CAGR and patent share extracted by PatentInspector, several trends have emerged in the field of HRM. Notably, areas such as logistics, operational tracking, supply chain analysis, logic programming, interface interaction and IT support are witnessing growth. Domains like networking and client communication remain dominant, whereas job scheduling and hardware maintenance show signs of decline. Additionally, business intelligence as well as risk assessment and evaluation are currently exhibiting a state of saturation.
In the G06Q10/06 local citation network consisting of 27,702 citations (Table 12), the most highly cited patents have garnered between 90 and 142 citations each. These patents primarily focus on decision support, resource management systems and associated methodologies, while job scheduling, optimization and sales automation also make an appearance, linking the cited patents with the extracted topics from the topic analysis component. Meanwhile, within the G06Q10/06 global citation network, which measures the citations from the entire USPTO database, the most cited patents have a minimum of 780 citations, as highlighted in Table 13, with a predominant emphasis on networking, client communications, collaboration and resource management. We can see that while the locally cited patents are more specific and targeted in their objectives, the globally cited patents are more abstract in their purposes, which is expected when considering that patents from this class may be cited by patents of different domains and may hence concern other concepts.

Patent | Incoming Citations
US6151582 - Decision support system for the management of an agile supply chain | 142
US5953707 - Decision support system for the management of an agile supply chain | 138
US5630070 - Optimization of manufacturing resource planning | 125
US4937743 - Method and system for scheduling, monitoring and dynamically managing resources | 117
US6578005 - Method and apparatus for resource allocation when schedule changes are incorporated in real time | 110
US5369570 - Method and system for continuous integrated resource management | 109
US5826239 - Distributed workflow resource management system and method | 108
US5189606 - Totally integrated construction cost estimating, analysis, and reporting system | 107
US5111391 - System and method for making staff schedules as a function of available resources as well as employee skill level, availability and priority | 92
US5216612 - Intelligent computer integrated maintenance system and method | 90

The comparison of the constructed case study of PatentInspector with a replication in Lens (which can be found in the Supplementary Materials, Case Study #1) indicates that the extracted inventors, assignees and globally cited patents, along with the timeline of granted patents, correspond with the insights from PatentInspector. Hence, the alignment of the constructed tool with an established source of patent data and descriptive PA is an encouraging indicator of the validity of PatentInspector and its potential for scientific PA.

Discussion and Implications
Based on the insights derived from the case study of Section 5, we can deduce that the use of PatentInspector facilitated the interpretation of HRM patents and profiled HRM as an active field, with an abundance of patent applications and grants, particularly in the last 10 years. Based on the topics extracted by the Topic Analysis Tab and the inspection of the most representative documents provided by the tool, it appears that the emergence of software solutions and the constant provision of data have influenced the topics, objectives and purposes of patents in this field, and many reputable organizations have been granted patents related to HRM.
The use of PatentInspector showcases that HRM patents have a mean granting time of 3.6 years, among other useful statistics produced by the Descriptive Analysis Tab. The profiling of active inventors and assignees indicated that companies such as IBM, Microsoft and Amazon are interested in HRM patents, while PatentInspector provides an overview of their locations and the evolution of several variables over time (e.g., number of citations).
The analysis conducted by the Topic Analysis Tab profiled the primary objectives of HRM patents, presenting the status of the topics and the most representative patents. An observation of the topics extracted by the LDA methodology presents autonomous vehicles and logistics as emerging topics, along with hardware maintenance. In general, the Topic Analysis Tab provided an easy means of assessing the primary trends in HRM patents and whether these trends dominate or have saturated the market, using the implemented CAGR metric. Overall, the executed LDA model is well rounded, with a good coherence score, indicating that the extracted topics capture the semantics and objectives of HRM in a concise manner.
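A statistic such as the mean granting time can be illustrated with a short Python sketch. We assume here that the granting time of a patent is the interval between its filing date and its grant date; this definition, the function name and the toy dates are assumptions made for illustration and are not taken from the tool's codebase.

```python
from datetime import date
from statistics import mean


def mean_granting_time_years(records):
    """Average time from filing to grant, in years.

    records: iterable of (filing_date, grant_date) pairs (assumed input format).
    A year is approximated as 365.25 days.
    """
    durations = [(granted - filed).days / 365.25 for filed, granted in records]
    return mean(durations)  # raises StatisticsError if records is empty


# Two hypothetical patents, granted roughly three and four years after filing.
toy_records = [
    (date(2010, 1, 1), date(2013, 1, 1)),
    (date(2011, 1, 1), date(2015, 1, 1)),
]
print(round(mean_granting_time_years(toy_records), 2))
```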
Moreover, the Network Analysis Tab portrays the most cited (locally or globally) patent entries, allowing users to view which patents are more influential among the retrieved documents and examine which technologies or patent objectives may shape or influence subsequent patent applications.
Based on the results of the case study, we can deduce that PatentInspector is an easy-to-use and practical tool for PA, with the core insights produced by the tool providing the potential to assess the developments in a patent domain, with an emphasis on the US. The tool fully portrays the most prolific organizations, inventors and locations, while also being able to showcase the primary topics of patent objectives and their growth in a given time period. Moreover, the citation networks allow users to examine which patents are more popular among patent applications and are consistently used as reference points. This process is achieved via the use of streamlined visualizations that facilitate user understanding and require little or no scientific and coding background to be interpreted.
Evidently, PatentInspector serves as an easy-to-learn, public resource that, while not being able to replace more complex PA methodologies, can facilitate the carrying out of basic PA tasks, while also offering opportunities for some higher-level analysis, such as topic modeling with two established algorithms. The simplicity of the tool encourages users of different backgrounds, ranging from PA enthusiasts to seasoned researchers, to leverage its capabilities and perform a baseline analysis for a patent domain of their choice. Moreover, the tool is not proprietary and is already deployed and ready to use, while the codebase is fully open-source and extensible.

Threats to Validity
In this section, we present some threats to the validity of the proposed PA software (v.1), making the distinction between internal validity, i.e., limitations in the methodological design of PatentInspector, and external validity, i.e., factors that limit the generalization and applicability of PatentInspector to other domains or patent offices.
Regarding internal validity, one primary limitation is that PatentInspector only retrieves data from the USPTO and not from other major patent offices, such as the EPO or CNIPA. This automatically skews the results, as the illustrated plots, topics and citation networks will inevitably present a partial view of patent grants, with a focus on the US region. However, this threat is mitigated by the fact that the USPTO has been indicated in the literature to be a viable source that effectively captures global patent trends [111,117]. It should also be emphasized that the USPTO was chosen because it was the only patent office to include a "bulk data" endpoint that could allow the storage of the entire patent office in the database of PatentInspector, given the resource constraints and the limits on access and real-time data retrieval at the time the tool was developed. We recognize that this is a threat and plan to expand the tool to include more data sources in subsequent versions.
In addition, the developed tool leverages data from patents that have already received a grant and does not consider patents that have been applied for and are pending evaluation. While this may lead to data omission, it is a reasonable practice, as applied patents may be rejected, in contrast to granted patents that have been carefully examined. In addition, a minor threat to the developed tool is that we do not introduce new methodologies for PA or leverage advanced methodologies for strategic analysis, technology convergence or business scoping. However, as our primary goal was to introduce a PA tool accessible to multiple parties with various backgrounds, the features that were chosen and incorporated focused on scientific concepts that can be easily understood and interpreted by individuals of different levels.
Our reliance on user judgement and expertise is only relevant when experimenting with the implemented topic analysis algorithms. An experienced user can alter the values of topic modeling (number of models, keywords to be removed, time range) and experiment with different setups. However, we consider this a minor threat, as the topic modeling aspect of PatentInspector is fully supported to run on the default parameters and produce reliable results. Finally, the waiting time for an analysis to be conducted is indeed a limitation of PatentInspector, as the deployed server cannot support a large number of simultaneous users. To mitigate this, we have implemented an alert function that allows users to exit the tool, run the analysis in the background and receive an email once the report is generated.
Regarding the external validity of PatentInspector, the applicability of our data retrieval, preprocessing, storage and analysis could potentially extend to other patent offices, but it may be hindered both by closed and proprietary APIs and by possibly different patenting procedures, which may lead to different data being stored. Thus, any application of our tool to another office, such as the EPO, should be carefully structured, with proper adjustments to the database in order to accommodate potentially varied patent data. In addition, as PatentInspector relies on predownloaded data from the USPTO that are periodically updated, it cannot be used as a patent database that retrieves data "on the fly", but rather as a tool for analysis that utilizes the most recent snapshot of patent data from the USPTO. Finally, PatentInspector focuses only on the scientific aspect of PA, demonstrating useful statistics, topics and citations, and does not delve into the legal procedures of patenting, such as litigations, or refined economic indicators. However, this is not a major threat, as our primary goal was to offer an open-source scientific software application that mainly targets PA researchers and scientists, but which could also be used by industrial actors as a complementary tool for the analysis of trends, in conjunction with appropriate business intelligence suites.
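The general pattern behind such an alert function (submit a long-running job, let the user leave, notify on completion) can be sketched with the Python standard library. This is a hypothetical sketch and does not reflect how PatentInspector is implemented: `build_report` and `notify` are placeholder callables, and a production deployment would typically rely on a task queue and a mail service rather than an in-process thread pool.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def submit_report(executor, build_report, notify, user_email, filters) -> Future:
    """Run build_report(filters) in the background and notify the user when done.

    build_report and notify are placeholders supplied by the caller;
    notify(address, message) stands in for sending an email.
    """
    future = executor.submit(build_report, filters)

    def _on_done(done: Future) -> None:
        if done.cancelled():
            return
        error = done.exception()
        if error is None:
            notify(user_email, f"Your report is ready: {done.result()}")
        else:
            notify(user_email, f"Your report could not be generated: {error}")

    future.add_done_callback(_on_done)
    return future


# Usage with stand-in functions; no real email is sent.
outbox = []
with ThreadPoolExecutor(max_workers=1) as pool:
    submit_report(
        pool,
        build_report=lambda filters: f"report for {filters['ipc']}",
        notify=lambda address, message: outbox.append((address, message)),
        user_email="user@example.com",
        filters={"ipc": "G06Q10/06"},
    )
print(outbox)
```

The caller returns as soon as the job is submitted, so the request that triggered the analysis does not have to stay open while the report is being built.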

Contributions
In this study, our vision was to offer an application of this scope, creating a flexible tool for PA that is open-source, free to use and provides interpretable insights for multiple interested parties.
The developed tool can indeed be used to extract descriptive statistics, thematic axes and citation analysis, focusing on the USPTO and being capable of analyzing thousands of patent records. Its usability was demonstrated in a case study of HRM patents, where the extracted visualizations captured the landscape of the domain and allowed the rapid detection of active inventors, prolific organizations and emerging or dominant thematic axes.
Overall, the tool that we offer contributes to the current landscape of PA tools by (i) offering a publicly deployed, direct and easy-to-use solution for PA that can be used by users without coding or advanced PA knowledge; (ii) providing a Topic Modeling panel that can be used by researchers to extract thematic axes on patent data while also evaluating the growth or decline of each topic; (iii) producing flexible visualizations that can be easily interpreted by all users, without requiring advanced background knowledge of PA; and (iv) having the source code of the tool publicly available and open-source, to be modified or improved by any researcher that wishes to expand the tool's functionalities.
We believe that PatentInspector is a valuable resource for any individual that wishes to conduct a baseline PA study, without being limited by pricing or knowledge gaps.

Conclusions and Future Work
The field of PA is evolving rapidly, being applied to a plethora of domains for various objectives. The abundance of patent data and the constant need for analysis have led to a range of tools and software that facilitate this purpose. Especially due to the rise of open science and scientific software development, applications and tools that encourage scientists to openly engage with software and advance their research are more than necessary. We believe that bridging the field of PA with the open-source community can yield multiple benefits for all interested parties, advancing research and scientific maturity and promoting easy access to knowledge and learning. Some future work directions of this study include expanding our database to include patents from other offices, with the EPO being an important source, as well as configuring our patent database to be periodically updated and to also include patent families (single or extended), using the latest data from the USPTO. In addition, we plan to enhance the capabilities of PatentInspector by adding more advanced methodologies for topic modeling, along with a grid search function to find the optimal model, technological convergence (e.g., convergence networks) and co-word analysis [118]. Finally, linking PatentInspector with existing patent databases that could enable the faster retrieval of data would greatly accelerate the storage process and would elevate the user experience.

Figure 3. An overview of the USPTO management utility.

Figure 4. The ER diagram of the computational-related tables.

Figure 5. The ER diagram of the user-related tables.

Figure 7. The filters of the report construction form in PatentInspector from a user's perspective.

Figure 8. The report list user interface of PatentInspector.

Figure 9. The user interface of the Descriptive Analysis Tab of PatentInspector.

Figure 10. The user interface of the Topic Analysis Tab of PatentInspector. The colored dots correspond to the detected topics and each topic has been assigned a quadrant based on its Patent Share and CAGR values.

5 most popular IPC classes; 5 most popular IPC subclasses; 5 most popular IPC groups; 5 most popular IPC subgroups

Figure 11. The user interface of the Network Analysis Tab of PatentInspector. Red dots correspond to patents and grey lines connect patents that cite or are cited by other patents.

Figure 12. The user interface of the Patents Tab of PatentInspector.

Figure 21. Inventor location distribution for G06Q10/06. Different colors represent the number of inventors in a region, with red indicating more inventors and green indicating fewer inventors.

Figure 22. Assignee location distribution for G06Q10/06. Different colors represent the number of assignees in a region, with red indicating more assignees and green indicating fewer assignees.

Table 1. Prominent patent analysis tools.

Table 2. Fields retrieved from the USPTO.

Table 2, note 1: The USPTO does not keep track of current IPC codes, only of those at the time of the grant.

Table 4. The endpoints of the PatentInspector API. Note: used to search for and show the valid options that the user can select from a dropdown menu when creating the report.

Table 5. The filters of the report construction form in PatentInspector from a programmer's perspective.

Table 6. Features of the Descriptive Analysis Tab.

Table 10. The 10 topics and their classification for G06Q10/06.

Table 11. Most representative patents for each topic.

Table 12. Most cited patents on the local network for G06Q10/06.

Table 13. Most cited patents on the global network for G06Q10/06.