1. Introduction
Data, and the valuable knowledge they provide, are undeniably among the key aspects of the modern world and its financial, business and industrial services. As society strives to become technologically adept and interconnected, the production, processing and management of data have become a focal point for businesses that adapt their operations in order to function in this data-driven environment [
1,
2,
3]. Of course, these efforts would not be feasible without specialized software tools developed to facilitate this process. From this perspective, Data-Oriented Software Development (DOSD) is equally important for organizations that wish to analyse significant volumes of data, because the functionalities and opportunities that it offers in data processing can accelerate business growth [
4,
5,
6]. Particularly due to the Industry 4.0 [
7] and Industry 5.0 [
8] movements, data and software are treated as two intertwined entities that complement each other [
9,
10]. Thus, the analysis of these entities can serve as a reflection of industrial evolution and further highlight the importance of DOSD in this area. While there are several studies that discuss this concept based on empirical data [
11,
12,
13,
14], an important yet neglected source of information is industrial-granted patents.
In this regard, patents are an indispensable part of the industrial community, as they constitute a secure way of establishing the creation and ownership of a property [
15]. Individuals and organizations on a global scale strive to secure a patent grant that will allow them to own the commercial and scientific rights, enabling them to exploit their patent economically [
16,
17]. Over the years, patents have reportedly been granted in fields such as medical sciences [
18,
19,
20,
21], engineering [
22,
23], computer science [
24,
25,
26] and many other scientific domains. The increase in patent activity is not a surprising phenomenon, as patents are research indicators that highlight research development and activity [
27,
28,
29,
30].
Due to the rapid development of new technologies, which in turn affects the methodologies and objectives of patents [
31,
32], the innovation potential and promising practices in any domain are constantly evolving. Thus, the early discovery of innovative technologies and the forecasting of emerging trends ensure that the research and industrial communities adapt to the ever-changing technological environment and produce high-quality results. To that end, the analysis of patent data is a potent way of uncovering technological shifts and forecasting future technologies [
33,
34,
35,
36], as their objectives and methodologies capture potential developments [
37,
38,
39,
40].
Patents are an accepted and secure form of intellectual property protection with rising popularity in the industrial world, in the same way that DOSD, a subdomain of software engineering (SE), is a prominent domain of computer science with wide acceptance in the scientific community and considerable research [
41,
42]. Evidently, as the volume of available information increases, data generation, collection, processing and consumption are performed through software. Simultaneously, new challenges arise in integrating existing and future technologies in the evolving SE and DOSD domains [
43]. To tackle these challenges and to forecast future developments, evidence from patent analysis can facilitate the timely detection of new methodologies and highlight technological trends [
33,
39,
44,
45].
Patent activity in SE, and hence in DOSD, has followed an ever-increasing trajectory, from the dawn of the age of computers to the modern age of information and large-scale data processing [
46]. In a sense, this intense patent activity can be a potent measure of innovation and technological advancement [
47,
48].
In this regard, tracking the development and research value of patents would be a challenging task without the patent offices that contribute to the organization and storage of patent data. At the same time, these agencies serve as the pillars of patent applications on a global scale. Some indicative examples of popular and established patent offices are the European Patent Office (EPO, Munich, Germany), the United States Patent and Trademark Office (USPTO, Alexandria, VA, USA) and the Korean Intellectual Property Office (KIPO, Daejeon, Republic of Korea). While these offices cover regional patent applications, they also accept patents on a global scale, with academics and organizations from multiple countries filing their patents with the office of their choice.
Motivated by the rapid increase in DOSD patent applications, especially during the last twenty years, as well as the growing necessity for specialized software, the current study investigates the DOSD patent landscape with the aim of identifying technological development trends and innovation dynamics, covering a period from the infancy of DOSD patent activity up to the rising age of the fourth industrial revolution, known as Industry 4.0. Our findings highlight the dynamic technological shifts in DOSD patent activity, providing a roadmap of development and innovation. In addition, the empirical evidence provided serves as a point of reference for technological convergence in the DOSD domain, bridging past practices with future prospective innovations.
Currently, software business suites such as Orbit Intelligence, Derwent Innovation, PatSeer, AcclaimIP and others perform similar tasks, either by focusing on the legal aspects and litigation activities of patents or by analysing patent entries. However, these tools tend to focus mostly on business indicators and industrial growth metrics, hence having a larger effect on the business and economic landscapes. In contrast, our study serves as a methodological framework rather than a software tool, and as a primary investigation of the technological landscape and the innovative technologies and practices encompassed by granted patents.
The remainder of the study is organized as follows. In
Section 2, we provide some indicative related work in the field of patent analysis, while in
Section 3, we present our research methodology. In
Section 4, we discuss the results of our analysis, while in
Section 5, we present possible threats to the validity of the study. Finally,
Section 6 discusses our results and presents our conclusions.
2. Related Work
Research activity on patents is abundant, with a plethora of methodologies for analysis observed, frequently focusing on text mining and topic modelling. In this section, we present some indicative works that have a similar scope to this paper, highlighting their main points and merits. However, as our work is the first contained effort that covers data-related patents solely based on the DOSD sector, we present similar research conducted in adjacent domains (e.g., Artificial Intelligence, Blockchain).
Several studies perform exploratory research that aims to profile the main aspects of patents and visualize the data in engaging ways. Albino et al. [
49] performed a geographical and technological assessment of software used in low-carbon energy projects, highlighting the prominent contributors and leading countries. Kang et al. [
50] sought essential patents in the Korean and international markets and explored the correlation between the geographical distributions and essential patents. Kim et al. [
51] used clusters of patent classes in order to predict their evolution and the main objectives they represent. Moehrle et al. [
52] defined “technological speciation” as the emergence of specific technologies in patents and used textual patterns in camera-related software to examine its validity.
The field of artificial intelligence (AI) has grown considerably in recent years, and multiple works study patents in this domain while also attempting to predict future trends and technologies. Tseng and Ting [
53] conducted a quality evaluation study, focusing on patent agencies from several countries and introducing several metrics to evaluate the innovation and quality of AI patents. Fujii and Managi [
54] combined patent information from several offices and grouped AI patents based on their objectives. Their analysis indicates that AI patents focus on mathematical models and knowledge extraction. Several studies delve deeper into domains that rely on AI such as nanotechnology [
55] and autonomous vehicles [
56], utilizing bibliometrics, citation networks and data exploration to highlight the main characteristics of patents belonging to these fields. Finally, future trends are explored in the AI sector by providing classification schemas [
45], focusing on cooperation networks between the organizations that file the patents [
57] and by analysing interconnections between companies and technologies [
58].
Another sector that has been studied under the scope of patents is augmented reality (AR), which includes virtual devices and environments. Choi et al. [
59] exploited semantic patterns in augmented reality patents in order to provide directions for further innovation. They concluded that image rendering and processing are the most promising areas that require additional research and development. Jeong et al. [
60] worked in a similar spirit and extracted topics from AR patents retrieved from USPTO. Their results have a high degree of agreement with Choi et al. [
59], with topics relating to display techniques being highly dominant. Finally, Evangelista et al. [
61] conducted a rich exploratory analysis, dividing AR patents into five classes and uncovering geographical and organizational trends.
Similar studies have been conducted in domains relevant to security, such as blockchain, where research is focused on forecasting technologies [
62]. Daim et al. [
63] utilized patent classes relevant to blockchain and the Internet of Things (IoT) and produced clusters of patent objectives and types, forecasting future needs. Wustmans et al. [
64] combined patent data from USPTO and microtrends data from TRENDONE and used a hybrid methodology of semantic terms and topics to predict technological developments. Zhang et al. [
65] used advanced Latent Dirichlet Allocation techniques to extract patent topics over different periods in order to provide a roadmap of thematic axes and to forecast the evolution of the blockchain domain.
The last domain is IoT, which offers plenty of opportunities for research. Several works [
66,
67] exploit the citation networks of IoT patent families and technologies in order to construct clusters of patent communities that contain primary characteristics of IoT sectors. Similar research was performed by Mazlumi et al. [
68], exploiting social network analysis and graph metrics on patent classes. Trappey et al. [
69] provided a roadmap of assignees and patent classes in order to aid the manufacturing of products related to IoT patents and the logistic procedures. In another study [
70], valuable manufacturing standards are provided, depending on the country, that facilitate the validation and granting of IoT patents, in the context of Industry 4.0. Moreover, some studies shed light on innovation and trends in the IoT industry by measuring consumer satisfaction [
71], exploring temporal trends [
72] or using bibliometrics and graph metrics [
73].
4. Results
In this section, the results of the conducted analysis are presented in accordance with the posed RQs.
- [RQ1]
What is the landscape of DOSD patents?
This RQ aims to conduct a descriptive analysis of the collected patents by examining the temporal evolution of granted patents, mapping them geographically and pinpointing the most active organizations that seek to own a patent and possibly exploit its contents commercially.
The distribution of the granting year (
Figure 3) reveals that the majority of patent-granting activity takes place after 2010 and follows an increasing trend. This showcases the necessity for DOSD in the last decade, which rose into the spotlight as the volume and types of available data rendered their manipulation by traditional software a challenging task. Patent granting in the 1990s is low, although this can be explained by the fact that patent offices had not effectively digitized their services, and thus, the stored patents were limited in number.
Moreover, the joint distribution of the patent subclasses within each decade (
Table 5) showcases some interesting findings on the focus of DOSD in each chronological period. The 1980s and 1990s seem to emphasize the Transformation of Program Code (G06F8/40), while the prime class of the 2000s is Software Deployment (G06F8/60), with Transformation of Program Code (G06F8/40) and Creation/Generation of Source Code (G06F8/30) behind. Finally, Software Design (G06F8/20) and Requirements/Specifications (G06F8/10) present a steady trend throughout the examined period.
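A joint distribution of this kind can be sketched as a simple cross-tabulation of granting decade against CPC subclass. The snippet below is only an illustration: the records are hypothetical and are not drawn from the dataset of this study, and the helper names (`decade`, `joint_counts`, `top_subclass_per_decade`) are our own.

```python
from collections import Counter

# Hypothetical (grant_year, CPC subclass) pairs; NOT taken from the study's dataset.
patents = [
    (1988, "G06F8/40"),
    (1992, "G06F8/40"),
    (1995, "G06F8/40"),
    (1997, "G06F8/20"),
    (2004, "G06F8/60"),
    (2006, "G06F8/60"),
    (2008, "G06F8/30"),
    (2015, "G06F8/60"),
]

def decade(year):
    """Map a year to the first year of its decade, e.g. 1997 -> 1990."""
    return year - year % 10

def joint_counts(records):
    """Count patents per (decade, subclass) cell of the cross-tabulation."""
    return Counter((decade(year), subclass) for year, subclass in records)

def top_subclass_per_decade(records):
    """Return the most frequent subclass per decade (ties broken alphabetically)."""
    best = {}
    for (dec, subclass), n in sorted(joint_counts(records).items()):
        if dec not in best or n > best[dec][1]:
            best[dec] = (subclass, n)
    return {dec: subclass for dec, (subclass, _) in best.items()}

table = joint_counts(patents)
leaders = top_subclass_per_decade(patents)
```

Each cell of `table` corresponds to one entry of a decade-by-subclass table, and `leaders` picks out the prime class of each period.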
In terms of geographical mapping, the USA is the top country, with 378 granted patents. The large number of USA patents is possibly due to the fact that some filed patents are automatically assigned to this country when filed to the USPTO. However, the USA is still a leading player in patent ownership [
88], and its numbers are consequently high. European countries have a strong presence in patenting products and inventions related to this domain, with the United Kingdom (40 patents) and Germany (35 patents) having a clear advantage over the rest of the continent, followed by France (17 patents) and Italy (15 patents). In Asia, the top countries are Japan (58 patents), Korea (14 patents), India (8 patents) and China (4 patents), which is supported by the rapid development of their software industries in recent years [
89,
90,
91]. The limited number of patents belonging to Asian countries can be attributed to the fact that many Asian inventors prefer to file their patents in regional offices such as KIPO, JPO and CNIPA. Thus, the selection of USPTO, although a valuable source of information, is a minor threat to the validity of the study.
In
Table 6, we present the ten organizations that have the highest number of granted patents across the entirety of our dataset. The organization with the highest number of patents is IBM, which is hailed as one of the leading companies in computer hardware, personal computers and commercial software. IBM appears to have a very active research department that aims at owning a large number of patents, maintaining the status of the company as a torchbearer in DOSD, an effort that has been ongoing since the 1970s (
https://www.ibm.com/ibm/history/exhibits/dpd50/dpd50_intro.html, accessed on 18 September 2022). We can also observe that various companies are technology related and are active in the industries of electronics and devices (Samsung and Motorola), equipment (Siemens and Hitachi), as well as computer hardware and software products (HP and Intel). Ab Initio specializes in enterprise software facilitating data-related procedures in large companies. In terms of the countries of the top companies, the findings validate the geographical mapping, with the USA holding the lead.
Finally, the top assignees focus on a plethora of different DOSD areas: IBM, Samsung and Intel own patents related to Software Deployment (G06F8/60), equipment-focused companies turn their attention to Software Maintenance/Management (G06F8/70), and hardware-related companies exploit patents related to the Transformation of Program Code (G06F8/40), possibly for communication protocols and devices. An interesting exception is Ab Initio Technology, which focuses on patents for the Creation and Generation of Source Code (G06F8/30). Given that the services of this company are linked with developing data processing applications and business suites, it is apparent that the creation and delivery of high-quality code is the core of its activities.
- [RQ2]
Which thematic trends can be traced in DOSD patents?
While RQ1 aimed at performing an exploratory analysis of the identified patents, RQ2 directly leverages the linguistic traits of the patent titles in order to trace patterns. This process can provide insights into the areas around which DOSD patents revolve thematically and uncover prominent topics in patent activity.
Table 7 provides a summary of the results extracted by the LDA algorithm after setting the number-of-topics parameter to eight, along with the share and popularity metrics for each topic. The constructed model yielded a CS of 0.59, which is an indicator of a well-rounded model that produces balanced topics [
92]. Moreover, after carefully examining the key words that accompany each extracted topic in conjunction with the top five representative patents in terms of membership, we assigned a manual short title that better captures its general scope and purpose. An inspection of the topics showcases that they cover a wide range of DOSD tasks, with some of them being related to software that is used for handling memory issues (Topic 1) or being integrated in large scale systems (Topic 2) and others being closely related to dynamic frameworks that directly assist supporting business intelligence (Topic 7) or protocols that facilitate resource allocation and deployment and ensure proper knowledge transfer and data management (Topic 4). In addition, some topics cover facets of DOSD that have to do with version control and rollout of updates in software (Topic 8), while preserving software quality and the integration of processed data in interfaces and dashboards (Topic 5). Finally, two of the extracted topics are directly linked with parallel data processing, referencing the considerable amount of data produced in business and in software procedures along with specialized environments developed for this purpose (Topic 3), as well as large scale simulations of data processes, potentially for risk management and estimation (Topic 6).
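To make the notion of topic membership more concrete, the following toy sketch fits LDA to a handful of tokenised titles using a collapsed Gibbs sampler. It is a stand-in for illustration only: this section does not state which LDA implementation or hyperparameters were used, so the sampler, the values of `alpha` and `beta`, the two-topic setting and the example titles are all assumptions (the study itself uses eight topics).

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of token lists. Returns the document-topic membership matrix,
    one row per document, each row summing to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]           # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word w assigned to topic k
    topic_total = [0] * n_topics                         # tokens assigned to topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(row)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assignments[d][i], index[w]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed membership of each document in each topic.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

# Hypothetical tokenised titles, for illustration only.
titles = [
    ["memory", "allocation", "software"],
    ["memory", "cache", "management"],
    ["version", "control", "software", "update"],
    ["software", "update", "deployment", "version"],
]
theta = lda_gibbs(titles, n_topics=2)
```

Each row of `theta` is the membership of one title across the topics, which is the quantity that share and popularity metrics summarise.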
In terms of the topic membership metrics,
Table 7 indicates that all topics are evenly distributed across the patent documents. The topics with the highest share appear to be Topic 8 (Version control and software quality) and Topic 5 (Data integration, interfaces and updates). Both of these topics are directly related to the technical side of DOSD, with Topic 8 referencing the continuous need of companies to ensure that the proper versions of software are deployed in production routines and Topic 5 concerning the issues that can be raised by integrating data in different interfaces and updating software to accept new data inputs. Thus, given the importance of these issues in a business, it is unsurprising that these two topics have the highest share values. Apart from that, Topic 1 (Software for memory management) has the third highest share value, highlighting the need for software that has efficient memory handling for processing large and different forms of data. In contrast, the lowest share value is found in Topic 4 (Resource allocation and information transferring). However, this can be attributed to its more specific nature and to the coverage of similar patents by other topics that have higher share values, such as Topic 5.
On the other hand, the popularity metric indicates topics that are dominant in the distribution of patent documents. With this in mind, the most popular topics in patent objectives appear to be version control and software quality in data-related products (Topic 8) and the integration of data in interfaces along with the proper updates (Topic 5). The high popularity values of these topics correspond with their high share values and suggest that integration and software quality procedures are the pillars of efficient DOSD. In contrast, the topics with the lowest popularity scores are the creation of automated software to be used in complex systems (Topic 2) and the exploitation of parallel processing and specialized programming environments (Topic 3). However, their restrained popularity values can be explained by the more technical and domain-specific aspects of Topic 2 and the fact that many patents that reference parallel processing may be focused on other primary objectives and may thus belong to other topics.
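The two metrics can be illustrated with a small sketch. We assume here the commonly used definitions, namely that the share of a topic is its mean membership over all documents and that its popularity is the fraction of documents in which it is the dominant topic; the formal definitions given in the methodology of the study take precedence. The membership matrix below is hypothetical.

```python
# Hypothetical document-topic membership matrix: one row per patent title,
# one column per topic, each row summing to 1 (as produced by a fitted LDA model).
theta = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.50, 0.30, 0.20],
    [0.20, 0.20, 0.60],
]

def topic_share(theta):
    """Assumed definition: mean membership of each topic over all documents."""
    n_docs, n_topics = len(theta), len(theta[0])
    return [sum(row[k] for row in theta) / n_docs for k in range(n_topics)]

def topic_popularity(theta):
    """Assumed definition: fraction of documents whose dominant topic is k."""
    n_docs, n_topics = len(theta), len(theta[0])
    counts = [0] * n_topics
    for row in theta:
        dominant = max(range(n_topics), key=lambda k: row[k])
        counts[dominant] += 1
    return [c / n_docs for c in counts]

share = topic_share(theta)
popularity = topic_popularity(theta)
```

Under these definitions, a topic can have a moderate share yet a low popularity if it contributes a little to many documents without dominating any of them, which matches the explanation offered above for parallel processing.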
In addition,
Figure 4 serves as a visualization of the distances between the topics by projecting them onto a two-dimensional plane, utilizing the multidimensional scaling technique. In this figure, the size of each circle corresponds to the presence of the respective topic in the corpus of patent titles, while the circles are positioned based on the inter-topic distance. The exploration of
Figure 4 indicates a well-defined LDA model, since there are no overlapping circles, while topics represented by circles are located in every quadrant. In addition, the topics are well-distributed over the corpus of patent documents, as there is no clear dominant topic. This finding suggests that DOSD patents express multiple equally important objectives. Furthermore, topics closer to one another are thematically related, focusing on similar technological objectives. Topic 1 (Software for memory management) and Topic 2 (Automated software for large scale systems) seem to refer to the management of memory, which can be expanded in large scale systems. In addition, Topic 2 appears to be the farthest away from other topics, while the small radius of its circle indicates that it is dominant in the smallest number of patent entries. However, this is not surprising, as its objectives are very specific and are tackled by field experts. Topic 7 (Dynamic frameworks and business environments) also lies close to, and is thus similar to, Topic 1 (Software for memory management), which can be explained by the fact that dynamic interfaces and business environments usually handle advanced visualizations and need to efficiently distribute memory. Other distinct groups are Topic 5 (Data integration, interfaces and updates) and Topic 8 (Version control and software quality), which complement each other, as data integration and updates in software also require version control and quality routines to be deployed. These topics have the highest dominance score, as their circles are larger. Finally, in the lower right quadrant, the parallel processing architectures (Topic 3) are close to both simulations for advanced data processing (Topic 6) and the handling of resource allocation tasks (Topic 4).
- [RQ3]
How is technological innovation portrayed in the interconnection of DOSD patents?
The creation of the PCN and CCN directed networks reveals some very interesting findings about influential patents and patent classes that drive innovation and serve as guidelines that other patents follow when formulating their objectives and purposes. The top hub and authority patents extracted from the application of the HITS algorithm in the PCN network are presented in
Table 8. The PCN nodes tend to form communities of patents that cite each other, with a modularity score of 0.91, which is expected, given that each patent has its own set of forward and backward citations, even if some patents may cite the same patents.
The identified authorities are patents that, when present in the PCN, receive a large number of incoming edges from hubs, which essentially means that they are highly cited by other influential patents that shape the objectives of subsequent patents. It is apparent that the identified authorities concern patents with highly valuable objectives for DOSD, with the top authority being relevant to data integration. This finding is in line with the topics extracted in RQ2 and suggests that proper integration of data is crucial in organizations, as is its commercial exploitation. Other authorities are relevant to object-oriented programming architectures, distributed software and databases, which are all aspects of developing software primarily targeted for data manipulation and management.
In contrast, the hubs of the PCN represent patents that have a large number of outgoing edges to authorities, hence being patents that highly cite other important patents. Such patents directly reference other technological fields and may combine objectives and methodologies from different patents, thus creating an innovative result [
93,
94]. An interesting finding is that IBM is the sole assignee that has top hubs, which complements the fact that it ranks first when it comes to the highest number of granted patents. Among the top hubs, there are some quite promising objectives of GUI handling, concurrent and parallel processing, as well as debugging and processing data in applications.
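The mutual reinforcement between hubs and authorities can be sketched with a plain power-iteration version of HITS. This is a minimal illustration rather than the implementation used in the study: the patent identifiers and citations are hypothetical, edges point from the citing patent to the cited patent, and the fixed iteration count is an arbitrary choice.

```python
import math

def hits(edges, n_iter=50):
    """Power-iteration HITS on a directed citation graph.

    edges: iterable of (citing, cited) pairs.
    Returns (hub, authority) score dictionaries, each L2-normalised.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(n_iter):
        # Authority update: sum of hub scores of the patents citing this one.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of the patents this one cites.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# Hypothetical citations: P4, P5 and P6 cite the earlier patents P1, P2 and P3.
citations = [
    ("P4", "P1"), ("P4", "P2"),
    ("P5", "P1"), ("P5", "P2"), ("P5", "P3"),
    ("P6", "P1"),
]
hub, auth = hits(citations)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

In this toy graph, the patent cited by every other patent becomes the top authority, while the patent that cites the most authorities becomes the top hub.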
The other facet of RQ3 was the identification of bridge nodes (or CPC classes) that drive or control innovation and transfer of knowledge in patent objectives. In
Table 9, we present the top brokers of the CCN network for each brokerage role. Given that the patents of the CCN network contained only the citations of classes that were subclasses of G06F8, there were no itinerants detected. However, we present the top brokers for the remaining triad roles.
According to Gould et al. [
74], each brokerage role reveals different stages of innovation and knowledge transfer for the participating classes. Of course, a node (or class) can have multiple brokerage roles. In the case of the constructed CCN, coordinator classes facilitate the connection between internal classes of the same superclass, thus allowing knowledge and patent objectives to be transferred directly and without limitations. These classes essentially serve as “stopping points” for other classes that utilize them to reach other similar subclasses and are generally classes that define DOSD. Among them, we can find Compilation (G06F8/41), Software Deployment and Updates (G06F8/60, G06F8/65), Graphical Programming (G06F8/34) and Parallelism (G06F8/45). We can see that the coordinator classes also correspond to the identified topics of RQ2, further validating the prominent fields of DOSD.
Gatekeeper classes are quite different from coordinators, as they have increased authority. Essentially, the classes that belong to this category are “guarding” subclasses of the same superclass and decide whether to allow or deny access to them from other classes. Gatekeeper classes can be characterized as well-defined and robust aspects of DOSD that influence the objectives of a large number of patents in the network while simultaneously being intermediaries between different technological fields. New classes in this category are Version Control (G06F8/71) and Installation (G06F8/61), while the other classes have been described in the previous role.
Representatives play the opposite role to gatekeepers. Where a gatekeeper class controls access to a class of the same group, representative classes try to communicate with other classes and transfer knowledge. Representative nodes of the CCN are classes that actively cite other classes and are used in interdisciplinary patents of DOSD. An interesting class of this category is Software Design (G06F8/20), with the top representative class being Software Deployment (G06F8/60).
Finally, liaison classes are patent classes that link other classes that are unrelated to each other, in the sense that none of the nodes involved belong to the same class group. Liaison nodes act as mediators between technological fields and can be used to bridge different ideas and objectives of patents in elegant software solutions. Installation (G06F8/61) is the top liaison class, with Updates (G06F8/65) and Software Deployment (G06F8/60) closely following. An interesting addition in this category is Software Maintenance/Management (G06F8/70), which was also present in the main subclasses of the top organizations.
It is apparent that Software Deployment is an important broker, holding promoting (Coordinator, Representative), authoritative (Gatekeeper) and neutral (Liaison) roles. This is an excellent indicator of the importance of proper software deployment architectures for data processing that need to be carefully developed and have potential to be applied in multiple fields. Another major class is Updates, which is another key aspect of DOSD, while Version Control is a prominent gatekeeper, possibly due to the more technical nature of patents filed under this class. Finally, Software Design is a class that is actively used to promote innovation and define patent objectives with a Representative role, and Software Maintenance/Management acts as a mediator and necessary procedure for the development of software and the granting of patents that belong to other classes.
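The five brokerage roles discussed above can be sketched as a classification of directed triads a → b → c according to the group membership of the three nodes. The snippet follows the usual Gould-Fernandez convention that b brokers only when a does not cite c directly; the node names, group labels and edges are hypothetical, and the way classes are assigned to groups in the study is not reproduced here.

```python
from collections import defaultdict

def brokerage_roles(edges, group):
    """Count brokerage roles per broker node b over triads a -> b -> c.

    edges: iterable of directed (source, target) pairs.
    group: dict mapping each node to its group label.
    A triad counts only if there is no direct a -> c edge (assumed convention).
    """
    edge_set = set(edges)
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for src, dst in edge_set:
        if src != dst:
            successors[src].add(dst)
            predecessors[dst].add(src)
    counts = defaultdict(lambda: defaultdict(int))
    for b in list(successors):
        for a in predecessors.get(b, ()):
            for c in successors[b]:
                if a == c or (a, c) in edge_set:
                    continue
                ga, gb, gc = group[a], group[b], group[c]
                if ga == gb == gc:
                    role = "coordinator"     # all three in one group
                elif ga == gc:
                    role = "itinerant"       # broker is the outsider
                elif gb == gc:
                    role = "gatekeeper"      # broker admits an outside source
                elif ga == gb:
                    role = "representative"  # broker reaches an outside target
                else:
                    role = "liaison"         # three different groups
                counts[b][role] += 1
    return {b: dict(roles) for b, roles in counts.items()}

# Hypothetical classes and groups, for illustration only.
group = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C"}
edges = [
    ("a1", "a2"), ("a2", "a3"),  # a2 coordinates within group A
    ("b1", "a2"),                # a2 is a gatekeeper for b1 -> a3
    ("a2", "c1"),                # a2 represents a1 and is a liaison for b1
    ("a3", "b1"),                # a3 represents a2; b1 is itinerant for a3 -> a2
]
roles = brokerage_roles(edges, group)
```

As in the CCN, a single node can hold several roles at once: here `a2` acts as coordinator, gatekeeper, representative and liaison, depending on which pair of neighbours it connects.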
In addition, the results from the network analysis on the CCN are presented (
Table 10), where the most important nodes ranked by their centralities can be seen. Overall, the CCN has a more abstract community structure, with a modularity score of 0.39.
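For readers unfamiliar with the modularity score, the sketch below computes Newman's modularity for a given partition of a small graph. It is a simplification under stated assumptions: ties are treated as undirected and unweighted, without duplicates or self-loops, whereas the PCN and CCN are directed, and the graph and partition are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (D_c / (2 m))^2 ], where
    m is the number of edges, L_c the number of edges inside c and
    D_c the total degree of the nodes in c.
    """
    m = len(edges)
    intra = {}       # L_c per community
    degree_sum = {}  # D_c per community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            intra[ca] = intra.get(ca, 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Hypothetical graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {n: "x" for n in range(1, 7)}

q_two = modularity(edges, two_groups)  # 5/14, roughly 0.357
q_one = modularity(edges, one_group)   # 0.0
```

Values close to 1 indicate densely connected communities with few ties between them, whereas values near 0 indicate a structure no more clustered than expected by chance.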
As far as node centralities are concerned, the nodes that have a larger number of external and internal edges as citations (Degree Centrality) and the nodes that act as immediate connections between node paths (Betweenness Centrality) are similar to the results of the BA, with Software Deployment and Updates occupying the top spots. A more interesting finding lies in the nodes that are closer to every other node in the network (Closeness Centrality), thus being immediate or intermediate citations of other classes, with Requirements Analysis/Specifications (G06F8/10) and Software Design (G06F8/20) being present, indicating that proper requirement definition and design of software before the implementation are very important factors in patent objectives and innovation.
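The three centralities can be sketched on a toy graph as follows. For brevity, the snippet treats citations as undirected and unweighted and assumes a connected graph, whereas the CCN is directed; degree is the number of distinct neighbours, closeness is (n - 1) divided by the sum of shortest-path distances, and betweenness is the unnormalised count of shortest paths passing through a node (Brandes' algorithm). The node names and edges are hypothetical.

```python
from collections import deque

def to_undirected(edges):
    """Build an adjacency map that ignores edge direction."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj):
    """Number of distinct neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness(adj):
    """(reachable nodes - 1) / sum of BFS distances from each node."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Unnormalised betweenness via Brandes' algorithm (undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical graph: H links A, B and C; C additionally links D.
adj = to_undirected([("H", "A"), ("H", "B"), ("H", "C"), ("C", "D")])
deg, clo, btw = degree(adj), closeness(adj), betweenness(adj)
```

In this example `H` ranks first on all three measures, while `C` has a low degree but a notable betweenness because every path to `D` passes through it, which illustrates why the three rankings need not coincide.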