How Society 5.0 and Industry 4.0 Ideas Shape the Open Data Performance Expectancy

Open data (OD) performance expectancy is a critical factor in user technology acceptance models for the future implementation of OD in Industry 4.0 and for its impact in the area of Society 5.0. The purpose of this article is to identify trends and key words (leading terms) in the promotion of OD for use in Industry 4.0 and Society 5.0. We also look for the European leaders in promoting the use of OD in the context of Industry 4.0 and Society 5.0. The research methodology includes text mining, visualization techniques, and multidimensional cluster analyses with correlation analyses. The dataset covers 288 digital products and services based on OD. The timeframe is January 2018–January 2020, and the research focuses on European issues. The research concentrates on texts promoting digital OD products and services, the most popular of which are applications, websites and platforms. The benefits of their use are mainly presented by promoting them as tools that provide real-time information on public issues, primarily in areas such as transport, education, culture and sport, economics and finance, and health. The main types of OD are geodata and data specified as national and local. Additionally, the dominant countries in each geographical area of Europe were identified, along with the key terms promoting products and services in the context of OD performance expectancy in Western, Northern, Southern and Eastern Europe.


Introduction
The use of open data (OD) in Industry 4.0, together with the application of OD by a smart, sustainable Society 5.0, can develop performance expectancy in the new innovative economy. It is necessary to discover which aspects of OD promotion for future implementation in Industry 4.0 have an impact on acceptance models in the area of OD performance expectancy for Society 5.0. The openness of data, both public [1] and private [2], seems to be one of the crucial drivers [3] of the sustainable economy, and might have an impact on the information and communication technology (ICT) [4,5] innovation and creativity bridge in developing a new ecosystem in Industry 4.0 [6] and Society 5.0. The idea of sustainability with the usage of open data is based on some similarity to public goods, but their use enriches the common value of open data [7]. Open data, as a source of information and knowledge in a knowledge-based economy, might well be a free resource for end-users; however, their production, gathering and maintenance need to be secured [8,9] and carried out, at significant cost, by skilled staff [10][11][12], with appropriate AI and Big Data technologies [13], and through implemented systems with open standards. Open data help to create added value in Industry 4.0 and Society 5.0, and support the idea of greater openness and accountability in business and administration governance [14], e.g., through open initiatives like open government data (OGD) [15,16]. Some questions remain about the economics of sustainable open data usage, acceptance and development, e.g., how can open data projects be funded sustainably in the absence of a direct revenue stream, and how can a business account for revenue from open data or for their impact in terms of competitive advantage?
The purpose of the paper is divided into three parts: To achieve parts one and two of our purpose, we looked for currently dominant terms in open data performance expectancy creation and compared them with terms related to Society 5.0 and Industry 4.0. Our research was inspired by the work of Zuiderwijk et al. [17], in which "open data performance expectancy" is associated with presenting the benefits of using open data. Thus, this construct positively influences behavioural intention to use and accept open data technologies. The benefits of using open data are the subject of research by many scientists. We can find works describing the economic [18][19][20][21][22][23][24][25], social [26][27][28][29][30] and environmental [31][32][33] value of open data. Therefore, it is interesting to investigate the direction in which open data performance expectancy is shaped. In this way, we present the first research question: RQ1: How are the products and services created on the basis of open data presented? What terms are dominating in the descriptions of presented case studies?
To achieve part three of our purpose we look for the geographical areas of Europe that have made a dominant contribution in creating open data performance expectancy. We explore the links between the geographical areas of Europe in terms of open data performance expectancy creation. Our research was inspired by works related to open data maturity in Europe [34][35][36][37][38][39]. These works gave insights into the development achieved in the field of open data in Europe. However, they do not investigate the links between individual regions and countries. In this way, we present the second research question: RQ2: How are products and services created on the basis of open data promoted in particular parts of Europe? Are there similarities between those geographical areas?
Because data are at the center of open data investigations [34][35][36][37][38][39][40][41], we are interested not only in the descriptions of products and services based on open data, but also in the types of open data. Thus, we explored the most common types of open data used in the creation of products or services, and the links between the geographical areas of Europe in this respect. In this way, we present the third research question: RQ3: What kinds of open data are most frequently used? Are there similarities between the different geographical areas of Europe in this respect?
It is also worth adding that the paper continues and supplements the research into the "performance expectancy" construct for open data presented in [42]. This kind of research, in the long term, may give an interesting insight into the development of the values promoted in society, and will bring a new perspective to the literature on the study of "performance expectancy" in the context of open-data-based sustainable development, Society 5.0 and Industry 4.0.
Aiming to achieve the above purposes and answer the three questions, we proceed with the literature review presented in Section 1, dedicated to describing the relationship between open data, Society 5.0 and Industry 4.0. In Section 2, we present a research methodology based on the analysis of materials promoting the implementation of open data using text mining, visualization techniques, correlation and cluster analyses. Section 3 describes the research results, and in Section 4 we discuss the obtained results. The final section contains the conclusion.

Literature Review
The literature review showed that open data have been described in terms of different aspects of research. Theoretical and practical perspectives have addressed the definition and evaluation of open data and the basics of understanding them in the context of openness and innovation. Data sharing in digital formats, e.g., PDF, XLS, CSV, RDF, LOD [43], has become popular and, because of the pandemic, digital data sharing has become a necessary and important factor in the global electronic ecosystem. The history of open data development has created new terms and associated initiatives promoting linked open data in the web environment, e.g., the Global Open Data Index (GODI) [44] or The Linked Open Data Cloud (LODC) [45]. Previous research has shown the impact of linked open data on management processes in an e-society environment. Following linked open data principles, ontology standards were developed for open procurement data, e.g., the Open Contracting Data Standard (OCDS) [46], which is why the public sector has the opportunity to reuse terms from external vocabularies and ontologies when appropriate [47]. Linked open data have a big impact on public administration governance, but business informatics has not yet achieved sufficiently widespread reuse of the gathered linked open data [1,3]. Smart and innovative businesses have the opportunity to reuse big open datasets with smart technologies, web applications and web data analysis techniques.
Research studies regarding ideas and methods of open data promotion for sustainable development for Industry 4.0 and the implication of open data promotion for Society 5.0 are lacking.

Open Data, Society 5.0 and Industry 4.0 Relationship
Theories on the information society [48] changing into a sustainable information society are described by Fuchs [49] and Wątróbski et al. [50]. Nowadays, due to the Sustainable Development Goals (SDGs) and the dimensions of sustainability [51][52][53], information society theory is developing into a sustainable super-smart society paradigm [53,54]. In 2016, the Japan Business Federation "Keidanren" proposed the definition of a new smart society of the 21st century, responding to the needs of the new economy with Industry 4.0. The new society was named Society 5.0 and described as a super-smart and human-centered society which resolves various social and economic challenges by incorporating innovations such as robots, big data and artificial intelligence [51]. Society 5.0 is described by Fukuyama [55], p. 48, as a "society of imagination". Society 5.0 expands transparency and active participation in social issues, with equal opportunities for all people and an integration of innovative technologies and society. The integration of cyberspace and physical space in the ecosystem of business and public government is made possible by humans, artificially intelligent machines and robotics [51,55,56], advanced analytics and predictive decision support systems; it creates an environment that uses advanced IT in Industry 4.0 and provides the opportunity to develop the competence of knowledge workers as well.
The development of the Industry 4.0 idea has demonstrated the need for real-time data analysis and processing. This is described as a closed loop of relationships between physical and digital space: the physical-to-digital-to-physical loop, which forms a cyber-physical space. The Industry 4.0 cycle improves the sustainable development of production, manufacturing, logistics, marketing, sales, etc. In manufacturing, the McKinsey Report showed the main levers of digitization in Industry 4.0: value drivers such as labor, inventories, supply/demand matching, time to market, service/after sales, resources/processes, and asset utilization, which improve companies' performance. A cyber industry based on IT technologies, such as IoT, sensors, advanced analytics, cloud computing, cyber security, smart and mobile applications, artificial intelligence and augmented reality, has developed, and the Cyber Industry Network was created [57]. The practical implementation of Industry 4.0 involves innovative production, adaptive manufacturing, adaptive robotics, and logistics with radio-frequency identification (RFID) and real-time locating systems.
Sustainability 2021, 13, 917
By analyzing the key terms describing Society 5.0 and Industry 4.0, we can see that some of the issues overlap, i.e., modeling processes and simulation with open data usage. It also seems that the bridge between the two concepts is focused on technologies supporting the creation of physical-to-digital-to-physical loops. This loop is used to ensure the sustainable development of a human-centered society with open innovation as a driving force for new business and services [51]. Therefore, when looking for the previously mentioned relationships, we can distinguish three main issues: (1) human-oriented action; (2) sustainable development; (3) the physical-to-digital-to-physical loop. This conclusion prompted us to pose our first hypothesis, related to RQ1:
Hypothesis 1 (H1). The dominant terms in the descriptions of open data-based products and services indicate the relationship with Society 5.0 and Industry 4.0.

Open Data Performance Expectancy
Since the main idea of using open data is to generate economic value by using open data in different ways and by different users, it is worth noting that this idea is firmly anchored in the realization of the idea of creating a "society of imagination", which leads us directly to the idea of Society 5.0 [55] and Industry 4.0 [62]. In such a society, the imagination and creativity of different people lead to problem solution and value creation due to the technological transformation of Industry 4.0 [51,56]. As open data stem from the Society 5.0 concept, numerous researchers explore both the influence of open data on society functioning and factors that improve the use of open data. Among the papers dealing with open data, one can find works focusing on a selected field, e.g., medicine [82][83][84] or environmental issues [31][32][33]. However, there are also papers dealing with open data from a more general perspective. Such papers include those that describe the issue of economic value of open data [18][19][20][21][22][23][24][25]. There are also papers on the influence of open data on democratic processes through expanding transparency, citizens' active participation in decision-making processes, policy making and solving social issues [26][27][28][29][30]. In the literature on open data, one can also find papers focused on exploring the conditions that are favourable for open data utilization [85][86][87][88][89][90]. Many of them bring the issue of open data quality to the fore [40,41]. Among the papers concerning research on the conditions favourable for open data utilisation are those that consider the issue of the design of platforms dedicated to open data. Such papers focus on very sophisticated issues, such as a context-specific approach [91].
It is not difficult to notice that although the papers devoted to open data focus on various economic, social or technical issues, two main research trends can be indicated. The first explores the question of open data usability and value. The second focuses on analysing factors that foster or hinder the increasing use of open data by society. Both trends lead to the subject of the acceptance of new technology. Though there are several models describing the acceptance of technology (theory of reasoned action (TRA) [92], theory of planned behaviour (TPB) [93] and technology acceptance model (TAM) [94]), in this paper, one of the most important models describing the acceptance and use of new technology, the unified theory of acceptance and use of technology (UTAUT) model, was chosen as a background. This model is very often used to test the acceptance and use of Information Technology [95]. Open data acceptance has already been analysed with this model [17], as well as with the model resulting from social cognitive theory [96,97]. However, the observations made by the authors of this paper refer more to the constructs proposed in the UTAUT model. The key idea of the UTAUT is that a number of factors lead to the behavioural intention to accept and use technology, while this behavioural intention, in combination with facilitating conditions, leads to the actual use of this technology [98]. Thus, Behavioural Intention to use Information Technology is an important construct in the UTAUT model. It is defined as an individual's intention, prediction or plan to use a technology in the future. The model has four core constructs: Performance Expectancy, Effort Expectancy and Social Influence, which directly influence Behavioural Intention, and Facilitating Conditions, which directly influence actual use. In addition, there are four important moderators, i.e., Gender, Age, Experience and Voluntariness of Use.
The relationship between the trends surrounding open data research and the UTAUT model stems from the fact that, in order to use open data, it is necessary to use the related dedicated technology [17]. The first trend, which analyzes the usability and value of open data, is related to the UTAUT model construct described as "performance expectancy". The second trend, which explores factors that stimulate or inhibit the growing use of open data, is connected with the UTAUT model constructs called "Effort expectancy" and "Social influence". Research on the acceptance and use of open data technology with the UTAUT model was conducted by Zuiderwijk and co-workers [17]. They showed that "performance expectancy" and "social influence" are positively correlated with behavioural intention to use and accept open data technologies. On the other hand, "Effort expectancy" and "Voluntariness of use" are negatively correlated with the use intention and acceptance of open data technology [17]. Zuiderwijk and co-workers made a number of recommendations on how to develop individual constructs in order to increase behavioural intention to use and accept open data technologies. Each of these constructs is a broad research area with several subareas, which can be seen in the available literature on open data.
This article is part of the research area dedicated to the "performance expectancy" predictor. Performance expectancy in the UTAUT model is defined as "the degree to which an individual believes that using the system will help him or her to attain gains in job performance" [95], p. 447. Performance expectancy is a construct which, in terms of inducing motivation to act, has strong links to self-determination theory (SDT) [99], especially in terms of internal motivation. Moreover, Venkatesh [95] admits that obtaining valuable results, such as an increase in earnings or work efficiency, is a significant motivator to use a particular technology. In the context of open data technology, this means that the "availability of open data technologies, such as data platforms, software, tools and interfaces increases individual's or organization's expectancy to perform better" [95], p. 431, and that the availability of these technologies will lead to tangible financial benefits. Zuiderwijk and co-workers postulate that measures should be taken to raise awareness of the benefits of open data in order to increase social performance expectancy [17]. When analysing materials promoting open data on the Internet, one can see that this postulate is being strongly realised. Different countries are building open data portals and promoting the benefits of open data technologies. In this context, it seems interesting to examine the direction of activities promoting the expected benefits of using open data (performance expectancy). For this reason, we have formulated the following hypotheses related to RQ2 and RQ3:

Hypothesis 2 (H2). There is a similarity in promoting open-data-based products and services in particular parts of Europe.

Conceptual Scheme of Research Elaboration
In order to meet the main objective of the paper, the conceptual scheme of research elaboration was used (Figure 1).

Stage 1: Collecting Data
The research data were retrieved from the European Data Portal (https://www.europeandataportal.eu/). This portal implements the policy of promoting benefits resulting from using open data and collects descriptions of applications, websites, portals and services using open data (the so-called use cases). The collected use cases come from different countries. Files with descriptions of products/services created in the last two years, i.e., 2018 and 2019, and in January 2020 were retrieved. This period covered descriptions of 288 applications, websites, portals and services using open data. The last two years were selected as the retrieval period because only these data enabled us to use the text mining method properly. Data from earlier periods were incomplete and would distort the analysis.
The files were processed into a text format, organised into a data frame and further analysed using text mining, database and visualization techniques, cluster analysis and correlation analysis. Variables which are subject to analysis are presented in Table 2.

Stage 2: Processing of Text Variables
Although most of the variables presented are text data, some of them could only be analysed by means of text mining. These variables include "Description" and "Type of open data". They were analysed with R packages dedicated to text mining. The text data were tokenised into words and bigrams and cleared of so-called stop words, i.e., words that do not carry an important meaning (see Figure 2). The resulting sets of tokens (words or bigrams) were counted and visualized, usually in tables and bar charts. In this stage, hypothesis H1 was verified.
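The tokenisation step described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, whereas the study itself used R packages that the text does not name; the stop-word list, the function names and the two sample descriptions are hypothetical and serve only to show how word and bigram counts could be obtained.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real analysis
# would rely on a full stop-word lexicon.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "for",
              "is", "that", "with"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_words(descriptions):
    """Count non-stop-word tokens across all descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts

def count_bigrams(descriptions):
    """Count adjacent word pairs in which neither word is a stop word."""
    counts = Counter()
    for text in descriptions:
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return counts

# Hypothetical descriptions, invented for illustration only.
descriptions = [
    "The application provides real time information on public transport.",
    "A website that collects data on air quality in real time.",
]

word_counts = count_words(descriptions)
bigram_counts = count_bigrams(descriptions)
print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

In this sketch, bigrams are formed before stop words are filtered, so that two words separated by a stop word are not mistakenly joined into a phrase.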

Stage 3: Creating Subsets and Searching for Similarities between Them
Text variables were analysed in the context of the remaining variables, i.e., country and region of Europe (north, south, east, west). Thus, the above-mentioned text mining operations were also performed on the data subsets, which made it possible to carry out the analysis for the countries and regions of Europe. As a result of text mining, sets of tokens characteristic of a given subset (e.g., region of Europe) were obtained. In order to answer the question concerning the similarity of subsets, database techniques were used, which enabled us to link the created sets. The link was a field containing tokens selected at an earlier stage. This made it possible to use multidimensional cluster analysis and visualization, and to calculate the correlation coefficient for the subsets. In the cluster analysis, we used the 1-Pearson distance (one minus the Pearson correlation coefficient) as the distance function. The Ward method was used as the rule for joining clusters. The grouping of objects was performed with the hierarchical agglomeration method. This allowed us to generate hierarchically ordered clusters, which can be presented in the form of a dendrogram (hierarchical tree) showing the distances between the objects.
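The clustering procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the software used in the study (which the text does not specify): it computes the 1-Pearson distance between token-frequency profiles and merges clusters agglomeratively, updating distances with the Lance-Williams recurrence for Ward linkage. The profile values and the labels subset_a to subset_d are invented. Ward linkage is formally derived for Euclidean distances, so its use with 1-Pearson here should be read as a heuristic assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ward_clustering(profiles):
    """Agglomerative clustering with 1 - Pearson distance and Ward linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    i.e. the information a dendrogram would display.
    """
    names = list(profiles)
    members = {i: (name,) for i, name in enumerate(names)}  # active clusters
    dist = {
        (i, j): 1.0 - pearson(profiles[names[i]], profiles[names[j]])
        for i in range(len(names)) for j in range(i + 1, len(names))
    }
    merges = []
    next_id = len(names)
    while len(members) > 1:
        # Pick the closest pair of active clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(members[i]), len(members[j])
        new_dist = {}
        for k in members:
            if k in (i, j):
                continue
            n_k = len(members[k])
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Lance-Williams update for Ward linkage (on squared distances).
            squared = ((n_i + n_k) * d_ik ** 2 + (n_j + n_k) * d_jk ** 2
                       - n_k * d_ij ** 2) / (n_i + n_j + n_k)
            new_dist[(k, next_id)] = math.sqrt(max(squared, 0.0))
        merges.append((members[i], members[j], d_ij))
        # Drop distances involving the merged clusters, then add the new ones.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return merges

# Hypothetical token-frequency profiles over the same five tokens.
profiles = {
    "subset_a": [12, 9, 7, 3, 1],
    "subset_b": [11, 10, 6, 2, 2],
    "subset_c": [2, 3, 6, 9, 12],
    "subset_d": [1, 2, 5, 8, 9],
}

merges = ward_clustering(profiles)
for left, right, distance in merges:
    print(left, "+", right, f"at distance {distance:.3f}")
```

The merge history lists clusters in the order they were joined, which corresponds to reading a dendrogram from the leaves towards the root.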
Apart from calculating the correlation coefficient, the visualization of dependency between the subsets was performed on a scatter plot diagram. This visualization allowed for the identification of outliers that interfered with the correlation coefficient (see Figure 3). Thus, the outliers were rejected and the correlation coefficient was recalculated. The obtained results were visualized with the use of the following visualizations: bar chart, scatter plot, word clouds and map. For this purpose, R packages and data visualization tools, i.e., Mstr Desktop, were used.
In this stage, hypotheses H2 and H3 were verified.
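The effect of rejecting an outlier before recalculating the correlation coefficient can be illustrated with a short sketch. In the study, outliers were identified visually on a scatter plot; here the tokens to exclude are simply passed in as an argument. The function names and all counts are hypothetical, chosen only to show how a single dominant token can inflate the coefficient.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_on_shared_tokens(counts_a, counts_b, exclude=()):
    """Correlate two token-count dicts over the tokens they share.

    Tokens listed in `exclude` (e.g. visually identified outliers)
    are dropped before the coefficient is computed.
    """
    shared = sorted(t for t in counts_a.keys() & counts_b.keys()
                    if t not in exclude)
    return pearson([counts_a[t] for t in shared],
                   [counts_b[t] for t in shared])

# Hypothetical token counts for two subsets, invented for illustration.
counts_a = {"data": 120, "information": 14, "users": 9,
            "transport": 12, "public": 6, "map": 4}
counts_b = {"data": 95, "information": 8, "users": 11,
            "transport": 5, "public": 10, "city": 3}

r_all = correlation_on_shared_tokens(counts_a, counts_b)
r_trimmed = correlation_on_shared_tokens(counts_a, counts_b, exclude={"data"})
print(f"with outlier: {r_all:.2f}, without outlier: {r_trimmed:.2f}")
```

With these invented counts, the single high-frequency token produces a coefficient close to 1, while the remaining tokens on their own are not positively correlated, which is why recalculating after outlier rejection matters.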

Description of Research Sample
In order to present the research sample, i.e., the set of countries promoting products/services based on open data in 2018-2019, a bar chart and a map visualization were developed, as presented in Figure 4.
As shown in the figure, the United Kingdom played a dominant role in the period under analysis. Spain and the Netherlands completed the top three. The visualization on the map shows that Germany, Italy, France, Belgium and Ireland should also be included in the group of countries that dominate in terms of promoting products and services based on open data.
The identification of these countries is quite important because, in the following part of the study, the considerations were divided into Northern (70 cases), Southern (64 cases), Eastern (15 cases) and Western Europe (97 cases). There were 42 cases from outside Europe. Thus, it is worth noting that Northern Europe, which, in the analysed period, showed the highest activity in performance expectancy creation, was dominated by the United Kingdom and Ireland, with a small share of other countries from this part of Europe. The results were similar in Southern Europe, where Spain and Italy dominated, with a small share of other countries. This means that these parts of Europe feature large variation in the degree of activity of individual countries. The situation was different in Western Europe, where one can see the significant activities of several countries, i.e., the Netherlands, Germany, France and Belgium, which means that, with respect to activity in promoting products and services based on open data, this part of Europe was less diverse in the analysed period. Eastern Europe is characterised by similarly low diversity. However, countries from this part of Europe showed low activity in the analysed period.
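As a quick consistency check on the regional breakdown, the case counts quoted above can be summed and expressed as shares of the sample. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Case counts per region as quoted in the text.
cases = {
    "Northern Europe": 70,
    "Southern Europe": 64,
    "Eastern Europe": 15,
    "Western Europe": 97,
    "Outside Europe": 42,
}

total = sum(cases.values())

# Share of each region in the whole sample, in percent.
shares = {region: round(100 * n / total, 1) for region, n in cases.items()}

print("total cases:", total)
for region, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {share}%")
```

The total agrees with the 288 use cases reported in the data collection stage.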

Terms that Dominate in Descriptions of Products and Services Using Open Data
When answering the first research question about the dominant terms in products/services descriptions, based on the variable "Description", a set of words (Table 3) and bigrams (Table 4) was created. The analysis of words showed a very high frequency of the term "data", which dominated all other terms. This domination shows the leading role of data in the analysed products and services, which, of course, is not surprising. Therefore, in further analysis, this term was treated as an outlier, which could have a significant impact on the subsequent stages of the analysis of the obtained data. The terms "information", "users" and "public" also occurred frequently. This is evidence of the promotion of the benefits of providing user-oriented information linked to public objectives, which is, of course, closely linked to the nature and idea of open data. The focus on users is also highlighted by the presence of the following nouns further down the ranking: "people", "citizens" and "user", as well as verbs: "provide", "collects", "access", "enables", "helps", "offers", "create". The role of these terms will be discussed more broadly in the bigram analysis. The high frequency of terms such as "application", "website" and "platforms" is also worth noting. This ranking presents the available products/services created based on open data. The most popular are applications and websites, while platforms take the third position. As it was possible to compare the results of the text mining analysis (on the "Description" variable) with the visualization techniques applied to the "Products/services" variable (see Table 2), such a comparison was made. In this way, it was found that the most popular products/services in the analysed period were applications and websites, with a smaller share of platforms, companies and services.
Apart from the terms related to user benefits and those describing the kinds of created products/services, the ranking also presents a group of terms connected to a particular business sector. The dominant role, in this respect, is played by terms related to transport, i.e., "transport", "location", "traffic" and "parking". The domination of these terms is not surprising given the range of business sectors in the analysed period, in which transport is at the forefront (see Figure 5).
The analysis of bigrams (Table 4), similarly to the analysis of words, identified a term that has the role of an outlier. In the case of bigrams, it was the term "real time", which illustrates the main benefit of using open data. The bigrams also contain the outlier determined at the stage of word analysis, the term "data", which now appears with other terms, i.e., "collect (-s, -ed)/gathers data", "public data", "data sources", "data portal", "data visualization", "data driven", "weather data", "transport data", "traffic data", "data quality" and "geospatial data". The analysis of the bigrams also revealed further phrases that present the benefits of products/services based on open data. These are phrases containing verbs, such as "collects data", "enable people", "provide users", "helps users/people", "provide real" and "promotes transparency", as well as phrases containing adjectives, such as "user friendly", "easily accessible" and "additional information". Using these phrases in communication encourages users to use the new product/service. This also suggests a strong focus of the promotional message on the end user, which is characteristic of marketing communication. The second group of phrases consists of those related to the specificity of the sector for which the created product/service is dedicated. These include phrases such as "parking facilities", "public transport", "traffic information", "bike citizens", "arrival times", "alternative fuelling", "air quality", "public toilet", "real estate", "land insight" and "house price". With reference to Figure 5, which presents the dominant sectors during the period under examination, it can be seen that terms characteristic of the transport sector are the leading group among the bigrams. The third group of phrases present in the bigrams are those related to methods of open data processing and visualization: "machine learning", "artificial intelligence", "search engine", "predictive models", "data visualization", "interactive maps" and "graphic information". When comparing the most common words and bigrams with the terminology found in the Society 5.0 literature, we can find great similarities. Thus, the H1 hypothesis was accepted.
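To make the bigram step concrete, the following minimal sketch counts adjacent word pairs in a few invented descriptions. The function name `bigram_frequencies`, the simple regular-expression tokenizer and the sample texts are assumptions; the article does not state which tool or preprocessing was used.

```python
import re
from collections import Counter


def bigram_frequencies(descriptions):
    """Count adjacent word pairs (bigrams) within each description."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        # zip(tokens, tokens[1:]) yields every pair of neighbouring tokens.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented sample descriptions, used only to exercise the function.
sample = [
    "Real time traffic information for public transport",
    "The app shows real time arrival times",
    "User friendly access to public transport data",
]
bigrams = bigram_frequencies(sample)
top = [" ".join(pair) for pair, _ in bigrams.most_common(2)]
print(top)
```

Pairs are formed within a single description only, so the last word of one text is never paired with the first word of the next.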

Analysis of Terms in a Geographical Context-Finding Similarities between Regions
The answer to the question about similarities between geographical areas was sought by means of multidimensional cluster analysis and correlation coefficient analysis. First, sets of words characteristic of four geographical areas of Europe were created, i.e., Northern Europe, Southern Europe, Western Europe and Eastern Europe. Then, visualization and the relevant calculations were made. A correlation test was also performed, which showed that the correlation coefficients are significantly different from zero (Table 5). The results show that the most correlated sets of words are those for Northern Europe and Western Europe (0.867), Northern and Southern Europe (0.821) and Southern and Western Europe (0.811). Relatively lower correlation coefficients were observed for every pair that included Eastern Europe. In the dendrogram (Figure 6), Eastern Europe also forms a separate cluster.
It is also worth noting that the calculated correlation coefficients were influenced by a small number of words. This can clearly be seen in the word-cloud visualizations (see Figure 7). Those words were mainly "information", "users", "application", "website" and "public". Thus, in further analysis, those words were removed from the sets and the correlation coefficients were calculated again. Their values are also presented in Table 5. The recalculated correlation coefficients have much lower values; however, as before, the correlation between the sets of words from Northern, Western and Southern Europe is greater than for pairs of sets that contain words from descriptions of use cases from Eastern Europe. The general tendency observed previously therefore also holds after removing the outliers.
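The comparison of regional word sets, before and after removing the dominant words, can be sketched as follows. This is an illustration under stated assumptions: the article does not say which correlation coefficient was used, so Pearson's r over a shared vocabulary is assumed here, and the word counts for the two regions are invented. The names `pearson` and `region_correlation` are hypothetical helpers.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def region_correlation(freq_a, freq_b, drop=()):
    """Correlate two word-frequency tables over their shared vocabulary."""
    vocab = sorted((set(freq_a) | set(freq_b)) - set(drop))
    xs = [freq_a.get(word, 0) for word in vocab]
    ys = [freq_b.get(word, 0) for word in vocab]
    return pearson(xs, ys)


# Invented counts for two regions (not taken from Table 5).
north = Counter({"information": 30, "users": 25, "public": 20,
                 "transport": 6, "weather": 4, "map": 2})
west = Counter({"information": 28, "users": 27, "public": 18,
                "transport": 3, "health": 5, "map": 4})

r_all = region_correlation(north, west)
r_trimmed = region_correlation(
    north, west, drop=("information", "users", "public"))
print(round(r_all, 3), round(r_trimmed, 3))
```

With these invented counts the coefficient drops sharply once the shared high-frequency words are removed, which mirrors the direction of the effect reported above, although the magnitudes are not comparable with the article's values.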


Figure 7. Sets of words for descriptions of products/services created based on open data in Northern (on the left) and Western Europe (on the right). Note: the term "data" was removed as a major outlier. Source: own study.
In addition to the analysis of the relationship between the above-mentioned regions of Europe, a cluster analysis for selected countries was also performed. For this analysis, the countries with the largest number of products and services based on open data were selected. In Figure 8, we can distinguish a cluster consisting of the Netherlands, Germany, Great Britain, Ireland and Spain. France, Belgium and Italy each form a separate cluster.
Thus, taking into account the results of the performed analyses, we can conclude that hypothesis H2 was accepted.
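The cluster analysis behind a dendrogram can be sketched with a small agglomerative procedure. Several assumptions are made here and none of them come from the article: the distance between two word-frequency profiles is taken as 1 - r (Pearson), average linkage is used, and the four profiles labelled "A" to "D" are invented placeholders rather than real countries.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def agglomerate(vectors):
    """Average-linkage clustering on the distance 1 - r.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    dist = {
        frozenset((a, b)): 1 - pearson(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented word-frequency profiles over the same five terms.
profiles = {
    "A": [10, 8, 6, 1, 0],
    "B": [9, 9, 5, 2, 1],
    "C": [8, 7, 7, 0, 1],
    "D": [1, 0, 2, 9, 8],
}
merges = agglomerate(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

In this toy input, profile "D" joins the others only at the last and largest merge distance, which is how a unit that "exists as a separate cluster" appears in a dendrogram.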

Types of Open Data Used in Products and Services
When analysing the text variable describing the types of open data used for creating products/services, one can note a large number of occurrences of the word "data". This term dominated all other terms in the analysed word set. This mainly reflects the leading role of the term "data" for modern technology, which is indeed characteristic of the idea of a super-intelligent society. However, this term was treated as an outlier in further analysis, which resulted in its removal from the datasets at the stage of correlation coefficient calculation. Thus, the same procedure was used as for the variable "Description", discussed in the previous sections. Further examination of the results of the text-mining analysis performed on the variable "Type of open data" shows that data referred to as "national", "local" and "public" were present in large numbers. They occur more frequently than data referred to as "government" or "municipal" (Table 6). At the top of the ranking of the most common types of open data, there are also those referred to as "geodata", whose number would increase even further if they were combined with terms such as "geospatial" and "geographic(al)", which appear in lower positions in the ranking. "Transport" (data) is also at the top of the ranking. Data related to the transport sector can also be found in lower positions of the ranking, under terms such as "transportation", "traffic" and "parking". Table 6 also helps explain why the most frequently promoted products/services based on open data in the analysed period included sectors such as culture and education, and finance and economics. However, apart from the leading role of the transport sector, Table 6 does not directly reflect the ranking of the top sectors shown in Figure 5.
The words used to describe the type of data used were also analysed from the regional perspective, divided into Northern, Southern, Eastern and Western Europe. Thus, sets of words characteristic of the above regions were obtained. Then, the analysed regions were compared in pairs in order to answer the question "Are types of data used in particular regions similar?" To this end, correlation coefficients were calculated. They showed that the sets for Northern and Western Europe are the most similar (correlation coefficient = 0.472). The second highest correlation coefficient (0.425) was the one between the sets for Northern and Southern Europe. All correlation coefficients were statistically significant. The same conclusions can be drawn from the analysis of the dendrogram in Figure 9. Thus, Hypothesis H3 was accepted.
As the correlation between Northern and Western Europe turned out to be the largest, the word cloud visualization in Figure 10 was also made. We can note that, in Northern Europe, the most popular data types are geodata, public and transport data. In Western Europe, in addition to the above-mentioned types, the most frequent types are national, local, government, health and municipal data.
Figure 10. Words describing type of data for Northern (on the left) and Western Europe (on the right). Source: own study.

Discussion
In the article, we answered three research questions and verified three research hypotheses. The H1 hypothesis, which states that "The dominant terms in the descriptions of open data-based products and services indicate the relationship with Society 5.0 and Industry 4.0", was supported. Analysis of the number of words present in the descriptions of products and services showed a large presence of terms related to the words "data" and "information" (see Table 3). These terms are characteristic of the terminology present in the texts describing Society 5.0 [46]. We also noticed the leading role of the "real time" concept (see Table 4), which is characteristic of the idea of Industry 4.0 [58]. Comparing other words and bigrams describing products and services based on open data with key terms from the literature related to Society 5.0 and Industry 4.0, we noticed terms related to: (1) human-oriented action, (2) sustainable development, and (3) physical-to-digital-to-physical loop (see Table 7). The trend observed in our research is also visible in other works related to open data. Thus, human-oriented actions are noticeable in papers concerned with the political impact of open data [27][28][29][30] and their economic value [18][19][20][21][22][23][24][25]. Sustainable development, in turn, is present in articles related to environmental issues [31][32][33]. Additionally, the political, economic and environmental impacts are components of one of the dimensions of open data maturity indicated in [34]. Regarding the physical-to-digital-to-physical loop, we can point to papers related to data quality [40,41]. Thus, the trend observed in our research is consistent with trends reported by other researchers. In our opinion, the links responsible for this are the ideas of Society 5.0 and Industry 4.0. Using the text mining method, we extracted the most common terms from the descriptions of products and services based on open data.
Subsequently, using correlation analysis (see Table 5) and cluster analysis (Figure 6), we supported the H2 Hypothesis, which states that "there is a similarity in promoting open data-based products and services in particular parts of Europe." However, we should add that the correlations between Northern, Western and Southern Europe are moderate, and the correlations between Eastern Europe and the rest of Europe are somewhat weak. The development of "performance expectancy" was not evenly implemented in Europe over the years under study. The most active countries in this respect are the United Kingdom, the Netherlands, Spain, Germany, Italy, France, Belgium and Ireland. When comparing these results with analyses of open data maturity in Europe [34][35][36][37][38][39], one can see that these are the countries referred to as "trend setters" or "fast trackers". It is, therefore, not surprising that they are very active in promoting products and services based on open data, and therefore very active in performance expectancy. It is also worth mentioning that research by Zhong [58] indicated scientific centers from the United Kingdom, Germany, France and Italy as those publishing numerous papers devoted to the idea of "intelligent manufacturing", which is another important idea related to Industry 4.0. This gives rise to the assumption that the high positions of these countries in rankings related to open data use and in rankings of scientific publications on intelligent manufacturing can be linked. The similarity of the sets of words for Northern, Western and Southern Europe also reflects the economic relations between these parts of Europe.
Finally, by text mining, correlation and cluster analyses (see Table 8 and Figure 9), we supported the H3 Hypothesis, which states that "Northern, Southern, Western and Eastern Europe use similar types of open data." However, we should add that only the correlation between Northern and Western Europe is moderate, and the correlations between the remaining parts of Europe are weak. Our results also confirm the predominant role of Northern and Western Europe in the use of open data described in works [34][35][36][37][38][39]. We used the concept of open data performance expectancy, which comes from the theory of the acceptance model. This is a particularly important construct whose role is to arouse the need to use a given technology (in this case, open data technologies) by promoting the benefits of this technology. Performance expectancy is a construct that, in terms of inducing motivation to act, has strong links to the self-determination theory (SDT) of motivation [99], especially in terms of internal motivation. The main purpose of the "performance expectancy" construct is to present and promote the benefits of using open data technology. In the context of SDT, this means influencing the intention to use open data technology by presenting it as a technology that significantly determines the efficiency of its users. Analysis of the documents promoting the use of open data has also shown that the main direction in presenting the benefits of open data products and services is to promote them as tools that provide information on public issues in real time, primarily in areas such as transport, education, culture and sport, economics and finance, and health. The issues of collecting data and providing it to users, enabling them to access information, and ensuring transparency on public issues are highlighted.
It is worth noting that the transport sector is one of the sectors that used open data most frequently in the analysed period. This may be explained by a noticeable intensification of innovative activities in logistics and by the search for new and better ways of performing logistics tasks through the digitization of its processes [81]. This brings us to a subject related to the idea of Industry 4.0, where the digitization of logistics processes allows for sustainable development in the areas that Kayikci describes as the three dimensions of sustainability, i.e., the economic, environmental and social dimensions [53]. Therefore, it can be assumed that a large number of products and services that use open data in the transport sector fit into these dimensions. They allow for more reasonable fuel consumption (fuel savings), which implies financial savings (economic dimension), a reduction in environmental pollution and in the use of non-renewable resources (environmental dimension), and improved health through cleaner air and noise reduction (social dimension).
The main limitation of the conducted research is the short research period (January 2018-January 2020). A period of 10 years would be more suitable; however, data from such a long period are not yet available. The research sample was also highly unbalanced in the number of promoted solutions based on open data. For example, Eastern European products/services were few in number, while, at the country level, the United Kingdom dominated all others. In subsequent studies, with more use cases available for analysis, these limitations should diminish.

Conclusions
In the article, we fulfilled the three parts of the article's purpose and answered three research questions. Fulfilling part one of our purpose led us to the research findings described in the part on the relations between Society 5.0 and Industry 4.0. They showed that the bridge between the two concepts is focused on technologies supporting the creation of a physical-to-digital-to-physical loop to ensure the sustainable development of a human-centered society. When building the bridge between Society 5.0 and Industry 4.0, our investigation showed three main issues: (1) human-oriented action; (2) sustainable development; (3) physical-to-digital-to-physical loop. In this context, we considered the performance expectancy of open data.
The results obtained in the first part allowed us to fulfill part two of our purpose and to answer the first research question. The research results indicated that the most popular digital products and services based on open data are applications, websites and platforms. Analysis of the number of words present in the descriptions of products and services based on open data showed a high presence of terms related to the words "data" and "information". These terms are characteristic of the terminology present in the texts describing Society 4.0 and Society 5.0. Analysis of bigrams revealed terminology typical of the physical-to-digital-to-physical loop ("real time", "machine learning", "artificial intelligence", "search engine", "predictive models", "data visualization", "interactive maps", "graphic information", "data driven"). Thus, while the terms "data" and "information" have grown out of the concept of Society 4.0, the promotion of data processing by artificial intelligence and of its provision to physical space for human use shows that materials promoting the usefulness of open data contain ideas stemming from the concept of Society 5.0. This claim is reinforced by the fact that the analysis of both words and bigrams shows a significant user focus, which is directly connected with the postulates of human-oriented action ("enables people/users", "help people/users", "provide users", "easily accessible", "user friendly"). We also noticed the presence of terms related to sustainable development ("air quality", "bike citizens", "parking facilities/spaces", "public transport/transportation", "traffic data", "transport data").
Fulfilling part three of our purpose allowed us to answer the second research question. The most active countries in promoting open data performance expectancy are the United Kingdom, the Netherlands, Spain, Germany, Italy, France, Belgium and Ireland. The most uniform geographical area in terms of the creation of open data performance expectancy is Western Europe; in Northern and Southern Europe, dominating countries can be found, while Eastern Europe shows the smallest activity. The correlation between Northern, Western and Southern Europe is moderate, and the correlation between Eastern Europe and the rest of Europe is somewhat weak.
Correlation and clustering of word sets for particular parts of Europe were also performed at the stage of analysing the types of open data used. In this way, we answered the third research question. Northern, Southern, Western and Eastern Europe use similar types of open data. However, only the correlation between Northern and Western Europe is moderate, and the correlations between the remaining parts of Europe are weak. The main types of open data for the most active countries are those referred to as national, local and geodata.
The text-mining analysis of the variable describing the open data type also showed the presence of data related to sectors where products or services based on open data were most often promoted. The data obtained through the text mining technique did not allow a ranking of data by sector to be established, but they did reveal the leading role of data related to the transport sector. However, because text mining was not the only type of analysis performed, it was also possible to distinguish typical sectors based on the visualization of the "Sector" variable. This showed that the leading sectors in the analysed period were, apart from transport, the education and culture sector, the economic and finance sector, and the environment sector. Thus, these are the sectors where open data were most frequently used.