Article

Modeling the Knowledge Production Function Based on Bibliometric Information

by
Boris M. Dolgonosov
Independent Researcher, Haifa 3543424, Israel
Knowledge 2025, 5(2), 7; https://doi.org/10.3390/knowledge5020007
Submission received: 6 January 2025 / Revised: 24 March 2025 / Accepted: 25 March 2025 / Published: 3 April 2025

Abstract

An integral indicator of the development of society is the amount of knowledge, which can be measured by the number of accumulated publications in the form of patents, articles, and books. Knowledge production is examined on a global scale. We analyze existing econometric models and develop a generalized model that expresses the per capita knowledge production rate (called productivity) as a function of the amount of accumulated knowledge. The function interpolates two extreme cases, the first of which describes an underdeveloped society with very little knowledge and non-zero productivity, and the second, a highly developed society with a large amount of knowledge and productivity that grows according to a power law as knowledge accumulates. The model is calibrated using literature data on the number of patents, articles, and books. For comparison, we also consider the rapid growth in the global information storage capacity that has been observed since the 1980s. Based on the model developed, we can distinguish between two states of society: (1) a pre-information society, in which the knowledge amount is below a certain threshold and productivity is quite low, and (2) an information society with a super-threshold amount of knowledge and its rapid accumulation due to advanced computer technologies. An analysis shows that the transition to an information society occurred in the 1980s.

1. Introduction

The development of society is directly related to the accumulation of knowledge both in the field of science and technology, and in the cultural and humanitarian sphere. Knowledge is produced at a rate that depends on its amount and population size. In turn, knowledge production controls population growth. The corresponding dynamic equations were obtained by Dolgonosov and Naidenov [1]. In this approach, a crucial factor is the per capita productivity of knowledge $w(q, t)$, which depends on knowledge amount $q$ and time $t$. Total knowledge production $\dot{q} = dq/dt$ can be represented in general form as
$$\dot{q} = w(q, t)\,N + J(t) \tag{1}$$
where $N$ is the population size, and $J(t)$ is an external source of knowledge. We assume in (1) that the number of knowledge producers is proportional to the total population, as is usually the case in econometric models [2,3,4,5,6,7]. In our previous studies [8,9], we looked at the problem of knowledge production by assuming that productivity is constant. This assumption has reasonable grounds for a pre-information society with its undeveloped computing capabilities. However, by now, an information society has been formed in which the rapid progress of computer technologies and artificial intelligence leads to increased productivity, which should be reflected in the rate of knowledge accumulation and, as a consequence, in demographic dynamics. The problem is to figure out what the function $w(q, t)$ is, how justified the constant productivity approximation is, and under what conditions it can be applied. We will consider the problem in this work, which will allow us to more accurately determine the state of society in terms of the speed of information processing.
The further development of the theory requires studying the general case where productivity depends on accumulated knowledge. This problem has also been addressed in econometric models describing the relationship between technological development and population growth. In contrast to the sum of technologies, knowledge is understood more broadly: it includes all the components of human culture, which undoubtedly influence population growth to a certain extent. Nevertheless, econometric models capture the essential features of the phenomenon. First of all, it is worth mentioning Romer’s model [2,3], which was written for technology, but we will extend it to knowledge in general. Romer’s model can be presented as
$$\dot{q} = w(q)\,N_1^{\lambda} \tag{2}$$
with the only difference that Romer’s variable $q$ is the sum of technologies (although this is not all knowledge), $N_1$ is the number of only those people who work in science and technology, and per capita productivity is expressed as
$$w(q) = w_0\,q^{\varepsilon} \tag{3}$$
where $w_0$, $\lambda$, and $\varepsilon$ are parameters (everything is in our notation). Ultimately, Romer takes $\lambda$ and $\varepsilon$ to be equal to 1.
Kato [7] analyzes a model similar to (2) and (3), with the only difference being that the total population $N$ is used instead of $N_1$. The author expresses the following thought about the exponent $\varepsilon$ (in the original, it is designated as $\varphi$):
“When $\varphi > 1$, then the growth rate of technological progress would rise rapidly with increasing level of technology. However, such situations have not been observed in developed nations through postwar periods, so Barro and Sala-i-Martin [10] imposed the condition $\varphi \le 1$.”
We use this remark when constructing the productivity function.
Kremer’s model [4] can also be represented as Equation (1). Unlike Romer’s model (2), Kremer uses the total population $N$ instead of the number of S&T personnel $N_1$, but the parameters $\lambda$ and $\varepsilon$ are still equal to 1. Therefore, instead of (3), we have
$$w(q) = w_0\,q \tag{4}$$
A similar model of technology development was used by Collins et al. [11] in their evolutionary theory of long-run economic growth.
Jones [12,13] modified Romer’s model by setting $\varepsilon < 1$ in (3), which, after a series of transformations, led him to the equation
$$\frac{\dot{q}}{q} = \alpha\,\frac{\dot{N}_1}{N_1} \tag{5}$$
where $\alpha = \lambda/(1 - \varepsilon)$. The meaning of this equation can be clarified after integrating it, which yields
$$q = k\,N_1^{\alpha} \tag{6}$$
where $k$ is a constant. From (6), it follows that the technologies accumulated to date are only the output of currently working technology producers. However, this approach does not reflect the influence of previous generations, whose work also contributed to the development of technology. Obviously, the equation for $q$ must contain an integral term summing up the contribution of past generations.
The same problem was noted by Dong et al. [6], who, based on an analysis of well-known econometric models and extensive empirical material, showed that technological growth depends not only on the current generation of people, but also on the achievements of past generations. The authors found deviations from the proportionality law $N_1 \propto N$ between the number of technology producers and the total population when dealing with the long-term evolution of society over millennia.
Okuducu and Aral [14] suggested that productivity could be a constant, linear, quadratic, or exponential function of the knowledge amount, and used these representations to compute various hypothetical scenarios of knowledge dynamics.
There is a difference between the knowledge approach (1) and the econometric one, (3) and (4). Productivity $w(q)$ is the per capita knowledge product (different forms of publication, e.g., patents, articles, and books; cf. Abramo et al. [15]) in the first case and the per capita gross product in the second. Knowledge is measured in information units, while gross product is measured in monetary units.
The question arises [16]: is the information approach to demographic dynamics divorced from reality and is it possible to calibrate the corresponding model? The answer to this question is one of the objectives of this work. As for the reality and prospects of such an approach, we can refer to the work of Dolgonosov [9], in which a general global-scale model was proposed, including economic, environmental, demographic, and information components, and which was successfully calibrated using extensive empirical data.
In connection with the development of artificial intelligence, a dilemma has arisen about how to describe the presence of intelligent machines: whether to include them among knowledge producers, thereby expanding the producer population $N$, or to continue to believe that knowledge is produced by people, and the machine is still only a tool that helps them in knowledge production. Sadovnichy et al. [17] developed an approach in which intelligent machines can now be considered knowledge producers and, hence, are included in the number $N$ along with humans. This is a promising direction of research, especially given the rapid development of AI. But, for now, following the analysis of Akaev and Sadovnichii [18], we will remain with the traditional approach, according to which it is people who produce knowledge, while intelligent machines only help them in this matter. Then, the effect of AI is manifested through an increase in the knowledge amount and a corresponding increase in human productivity.
The above-mentioned productivity functions proposed by various authors require verification based on empirical material. To this end, we revisit the issue of productivity as a function of knowledge and verify the theoretical results using literature data.
Another nontrivial problem is how to determine the amount of knowledge. The most consistent approach is to estimate the memory capacity that knowledge takes up. However, at the moment, such information is unlikely to exist. Meanwhile, there is evidence [19] that digital memory has been rapidly increasing in recent times, which can be classified as a global information explosion that began somewhere between 1986 and 2007.
It should be expected that the total memory capacity far exceeds the knowledge capacity due to the repeated replication of useful information, especially in graphic and video formats. In this situation, it is necessary that we use data on different types of knowledge representation, such as patent applications, original articles, and books. These data have been largely cleared of duplication. Knowledge production should be assessed separately for each type.
Let us highlight the main points from the above literature overview and formulate the problem that is solved in this work.
Gap in existing knowledge. The literature cited shows that there is an understanding that the rate of knowledge production per capita (productivity) may depend on the accumulated knowledge, but there is insufficient clarity on the question of which class of functions is most natural for describing the dependence of productivity on the knowledge amount. In existing econometric models, the sum of technologies is a fuzzy concept, which is described by a qualitative variable with an uncertain measure. To quantify the knowledge amount (including the sum of technologies), it is necessary that we introduce an appropriate information measure. Then, it will become clear what empirical data should be used to test the proposed model.
Statement of the problem. The problem is to find the productivity $w$ as a function of knowledge amount $q$. To do this, it is necessary that we analyze the function $w(q)$ under various conditions corresponding to different areas of the parametric space; consider the application of this function to the description of global bibliometric data, which act as a measure of the amount of knowledge accumulated in the world; and study the modes of productivity growth characteristic of societies with little and much knowledge.
The importance of solving the problem lies in improving the model of knowledge production, which is part of the global model of civilization development; creating a tool for processing bibliometric information; and identifying the stages of civilization development in terms of the amount of knowledge and the rate of its production.
The sequence of steps to solve the problem is as follows:
(i) The construction of the per capita knowledge production rate (productivity) as a function of knowledge amount;
(ii) The collection of literature data on patent applications, scientific and technical articles, books of all genres, the information storage capacity, the population, and the GDP series over time;
(iii) The calibration of the developed model using the collected empirical data to find values of the model parameters and assess the model adequacy;
(iv) The interpretation of the results obtained, including an assessment of the threshold knowledge amount separating the pre-information society from the information one, and finding the transition point between them.

2. Model

2.1. Knowledge Production and Accumulation

The need to solve new problems that life poses to people encourages knowledge production (Figure 1). Civilization perceives information coming from the world and processes it using existing models extracted from the information storage. This process is knowledge activation. If the necessary models are missing, they are created. This process is knowledge production. The results obtained are stored in the information storage.
Knowledge is professionally produced only by a part of the population. As in many econometric models, we assume that this part is proportional to the population size. Dong et al. [6] found deviations from this law for individual countries, but there is reason to believe that the deviations are likely to be smoothed out when moving to a global scale, as usually happens when a statistical system is enlarged. Then, the overall rate of knowledge production will be equal to the average productivity multiplied by the population size, as expressed by Equation (1). However, for humanity as a global system, this equation can be simplified by keeping in mind the following fact. Human civilization does not have extraterrestrial contacts; hence, there are no external sources of knowledge, so, in (1), we must put $J = 0$. Due to this isolation, the system is autonomous, which means that productivity does not depend on time explicitly, but only through $q(t)$, so that Equation (1) reduces to the form
$$\dot{q} = w(q)\,N \tag{7}$$
Equation (7) can be written as
$$\frac{dq}{w(q)} = N(t)\,dt \tag{8}$$
Integrating (8) with the initial condition
$$t = t_0, \qquad q = q_0 \tag{9}$$
and introducing functions
$$F(q) = \int_{q_0}^{q} \frac{dq'}{w(q')} \tag{10}$$
$$S(t) = \int_{t_0}^{t} N(t')\,dt' \tag{11}$$
we come to the equation
$$F(q) = S(t) \tag{12}$$
which implicitly specifies $q$ as a function of the cumulative population size $S(t)$, thereby formalizing the accumulation of knowledge over time.
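Since $F'(q) = 1/w(q)$, Equation (12) is equivalent to $dq/dS = w(q)$ with $q = q_0$ at $S = 0$, so $q(S)$ can be obtained numerically for any productivity function. The following is a minimal sketch of that idea, not the procedure used in this work: the function name `accumulate_knowledge`, the Runge-Kutta scheme, and the numerical values in the demo are illustrative assumptions.

```python
import math
from typing import Callable


def accumulate_knowledge(w: Callable[[float], float], q0: float,
                         S_end: float, steps: int = 1000) -> float:
    """Integrate dq/dS = w(q) from S = 0 (where q = q0) to S = S_end.

    Uses the classical 4th-order Runge-Kutta scheme with a fixed step.
    `w` is any per capita productivity function of accumulated knowledge.
    """
    q = q0
    dS = S_end / steps
    for _ in range(steps):
        k1 = w(q)
        k2 = w(q + 0.5 * dS * k1)
        k3 = w(q + 0.5 * dS * k2)
        k4 = w(q + dS * k3)
        q += dS * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return q


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the integrator.
    w0, q0, S = 0.02, 1.0, 5.0
    # Kremer-type productivity (4), w = w0 * q, gives q = q0 * exp(w0 * S).
    numeric = accumulate_knowledge(lambda q: w0 * q, q0, S)
    print(numeric, q0 * math.exp(w0 * S))
```

Working in the variable $S$ rather than $t$ keeps the demographic input separate: the population series enters only through the cumulative sum that defines $S(t)$.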

2.2. Productivity Function

The Introduction used the concept of two states of society, differing in the amount of accumulated knowledge $q$. Let $q = q_h$ be the threshold separating these two states.
Definition: A society with a sub-threshold amount of knowledge $q < q_h$ is called a pre-information society, and a society with a super-threshold amount $q > q_h$ is called an information society.
To reveal the dependence of productivity on the knowledge amount, we will consider two opposite cases: an extremely undeveloped society and a highly developed information society. The productivity function should have the following properties:
  • In an extremely undeveloped society ($q \ll q_h$), knowledge has not yet been accumulated (formally $q = 0$), but knowledge is produced with a non-zero initial productivity $w(0) = w_0$;
  • In a highly developed information society ($q \gg q_h$), productivity increases slowly according to the power law $w(q) \propto q^{\varepsilon}$ with an exponent $\varepsilon$ not exceeding 1 (since an average knowledge producer uses a very limited amount of knowledge in his creative process; this is close to the opinion of Barro and Sala-i-Martin [10], mentioned in the Introduction).
The simplest interpolation formula with these properties is
$$w(q) = w_0\,(1 + hq)^{\varepsilon} \tag{13}$$
$$w_0 > 0, \qquad h \ge 0, \qquad 0 \le \varepsilon \le 1 \tag{14}$$
where $w_0$, $h$, $\varepsilon$ are parameters. The threshold value $q_h$ is related to the parameter $h$ as follows: $q_h = 1/h$. If $hq \ll 1$ (or $q \ll q_h$), we can use the constant productivity approximation as in our previous works.
The substitution of (13) into (10) yields
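$$F(q) = \frac{1}{h w_0}\left[\ln_{\varepsilon}(1 + hq) - \ln_{\varepsilon}(1 + hq_0)\right] \tag{15}$$
and, according to (12), we find
$$1 + hq = (1 + hq_0)\,\exp_{\varepsilon}\!\left[\frac{h w_0 S}{(1 + hq_0)^{1-\varepsilon}}\right] \tag{16}$$
where we use the deformed logarithm and deformed exponential [20], which are defined as
$$\ln_{\varepsilon} x = \frac{x^{1-\varepsilon} - 1}{1 - \varepsilon} \tag{17}$$
$$\exp_{\varepsilon} x = \left[1 + (1 - \varepsilon)\,x\right]^{1/(1-\varepsilon)} \tag{18}$$
In the limit $\varepsilon \to 1$, we obtain the natural logarithm and natural exponential:
$$\ln_{1} x = \ln x, \qquad \exp_{1} x = \exp x \tag{19}$$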
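The deformed functions (17) and (18) are straightforward to evaluate numerically. The sketch below is illustrative only: the names `ln_eps` and `exp_eps` and the tolerance used to switch to the natural functions near $\varepsilon = 1$ are assumptions, not part of the original work.

```python
import math


def ln_eps(x: float, eps: float) -> float:
    """Deformed logarithm, Eq. (17); falls back to math.log at eps = 1."""
    if x <= 0.0:
        raise ValueError("ln_eps is defined here for x > 0 only")
    if abs(1.0 - eps) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - eps) - 1.0) / (1.0 - eps)


def exp_eps(x: float, eps: float) -> float:
    """Deformed exponential, Eq. (18); falls back to math.exp at eps = 1."""
    if abs(1.0 - eps) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - eps) * x
    if base <= 0.0:
        raise ValueError("argument outside the domain of exp_eps")
    return base ** (1.0 / (1.0 - eps))


if __name__ == "__main__":
    # The two functions are mutually inverse for any eps in [0, 1].
    print(exp_eps(ln_eps(3.0, 0.5), 0.5))
    # Approaching eps = 1 recovers the natural logarithm, Eq. (19).
    print(ln_eps(2.0, 0.999999), math.log(2.0))
```

At $\varepsilon = 0$ the pair reduces to $\ln_0 x = x - 1$ and $\exp_0 x = 1 + x$, which is what turns (16) into a linear growth law.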
At the ends of the ε range, we have the following:
  • Constant productivity
    $$\varepsilon = 0, \qquad w = w_0, \qquad q = q_0 + w_0 S \tag{20}$$
  • Productivity as a linear function of knowledge
    $$\varepsilon = 1, \qquad w = w_0\,(1 + hq), \qquad 1 + hq = (1 + hq_0)\,\exp(h w_0 S) \tag{21}$$
In (20) and (21), accumulated knowledge is, respectively, a linear and exponential function of the total number of people $S$ over the observation period $[t_0, t]$. All people are taken into account here, not just the direct producers of knowledge, since the number of producers is assumed to be proportional to the population.
The presence of the integral quantity S in (16) describes the contribution of past generations to the accumulation of knowledge, as discussed by Dong et al. [6], in contrast to Formula (6), which refers only to the current population.
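Equation (16) can be evaluated directly by writing it as $(1 + hq)^{1-\varepsilon} = (1 + hq_0)^{1-\varepsilon} + (1 - \varepsilon)\,h w_0 S$, which follows from definition (18). The sketch below does this and handles the end cases (20) and (21) explicitly. The function names and the parameter values in the demo are illustrative assumptions rather than calibrated quantities.

```python
import math


def productivity(q: float, w0: float, h: float, eps: float) -> float:
    """Interpolation formula (13): w(q) = w0 * (1 + h*q)**eps."""
    return w0 * (1.0 + h * q) ** eps


def knowledge(S: float, q0: float, w0: float, h: float, eps: float) -> float:
    """Accumulated knowledge q(S) from Eq. (16)."""
    if h == 0.0:
        # No dependence on q at all: constant productivity, Eq. (20).
        return q0 + w0 * S
    u0 = 1.0 + h * q0
    if abs(1.0 - eps) < 1e-12:
        # Linear productivity, Eq. (21).
        u = u0 * math.exp(h * w0 * S)
    else:
        p = 1.0 - eps
        u = (u0 ** p + p * h * w0 * S) ** (1.0 / p)
    return (u - 1.0) / h


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    q0, w0, h, S = 2.0, 0.5, 0.3, 10.0
    for eps in (0.0, 0.5, 1.0):
        print(eps, knowledge(S, q0, w0, h, eps))
```

With $\varepsilon = 0$ the result is $q_0 + w_0 S$ regardless of $h$, and intermediate exponents give growth between the linear and exponential extremes.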

2.3. Asymptotics

Let us consider a situation where the most probable values of the parameters in Equation (16) correspond to the limit $h \to \infty$, which is typical for a highly developed information society with $q \gg q_h$. Minimizing the standard deviation of the model from data by varying $h$ causes $w_0$ to depend on $h$. The asymptotic form of Equation (16) is
$$q \approx \left[q_0^{1-\varepsilon} + (1 - \varepsilon)\,h^{\varepsilon} w_0 S\right]^{1/(1-\varepsilon)} \tag{22}$$
In the limit $h \to \infty$, expression (22) must be independent of $h$, which implies
$$w_0 \approx c\,h^{-\varepsilon} \tag{23}$$
and
$$q \approx q_0\left[1 + \frac{(1 - \varepsilon)\,c\,S}{q_0^{1-\varepsilon}}\right]^{1/(1-\varepsilon)} \tag{24}$$
where $c$ is a positive constant. Productivity (13) asymptotically obeys the power law
$$w(q) \approx c\,q^{\varepsilon} \tag{25}$$
Thus, the general productivity function (13) includes three special cases: a constant (20), linear (21), and power (25) function. There is another special case, which we consider in the next item.
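The convergence of the exact solution (16) to the asymptotic form (24) under the scaling (23) can be checked numerically. The following is a small sketch under stated assumptions: the function names are hypothetical, the values of $q_0$, $c$, $\varepsilon$, and $S$ are illustrative rather than fitted, and $\varepsilon < 1$ is required.

```python
def q_exact(S: float, q0: float, c: float, h: float, eps: float) -> float:
    """Eq. (16) with w0 tied to h through the scaling (23), w0 = c * h**(-eps)."""
    w0 = c * h ** (-eps)
    p = 1.0 - eps  # requires eps < 1
    u = ((1.0 + h * q0) ** p + p * h * w0 * S) ** (1.0 / p)
    return (u - 1.0) / h


def q_asymptotic(S: float, q0: float, c: float, eps: float) -> float:
    """Asymptotic form (24), valid for h -> infinity."""
    p = 1.0 - eps  # requires eps < 1
    return q0 * (1.0 + p * c * S / q0 ** p) ** (1.0 / p)


if __name__ == "__main__":
    # Illustrative values only; they are not calibration results.
    q0, c, eps, S = 20.0, 0.07, 0.75, 50.0
    target = q_asymptotic(S, q0, c, eps)
    for h in (1.0, 1e2, 1e4, 1e6):
        rel_err = abs(q_exact(S, q0, c, h, eps) - target) / target
        print(f"h = {h:g}: relative deviation from (24) = {rel_err:.2e}")
```

The deviation shrinks as $h$ grows, which is the sense in which (24) no longer depends on $h$.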

2.4. Exponential Productivity

Kato [7] expressed the opinion that the option $\varepsilon > 1$ in (3) and, accordingly, in (13) gives an unrealistically rapid increase in human productivity (see the quote in the Introduction). This option can even lead to a singularity; nevertheless, for the sake of completeness, we will consider it. In particular, let us look at the case where productivity is converted from (13) to an exponential function of knowledge. Previously, Okuducu and Aral [14] considered productivity as an exponential function of $q$ as an option.
Let us adopt the notion that, in (13), there is no upper limit for $\varepsilon$ and the coefficient $h$ decreases with increasing $\varepsilon$ according to the law $h = a/\varepsilon$, $a > 0$. Then, $w(q) = w_0\,(1 + aq/\varepsilon)^{\varepsilon}$, and, in the limit $\varepsilon \to \infty$, we obtain
$$w(q) = w_0\,e^{aq} \tag{26}$$
From (10)–(12), it is easy to find
$$q = q_0 + \frac{1}{a}\ln\frac{1}{1 - bS} \tag{27}$$
where $b = a w_0 e^{a q_0}$. The cumulative population size $S(t)$ increases with time; hence, at some point in time, it reaches the value $S = 1/b$, at which a singularity occurs. Thus, in a finite time, the accumulated knowledge $q$ becomes infinite, which now seems strange, but, if we bear in mind the rapid development of artificial intelligence, the singularity in (27) may be associated with the use of AI as an incredibly powerful tool for knowledge production, resulting in an exponential growth in productivity (26) as knowledge accumulates.
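For completeness, the step from (26) to (27) can be written out. This is a short worked derivation using only definitions (10) and (12); the primed integration variable is introduced here for clarity.

```latex
\begin{aligned}
F(q) &= \int_{q_0}^{q} \frac{dq'}{w_0\,e^{a q'}}
      = \frac{e^{-a q_0} - e^{-a q}}{a\,w_0} = S, \\
e^{-a q} &= e^{-a q_0}\left(1 - a\,w_0\,e^{a q_0} S\right)
          = e^{-a q_0}\,(1 - bS), \\
q &= q_0 + \frac{1}{a}\,\ln\frac{1}{1 - bS}, \qquad bS < 1 .
\end{aligned}
```

The logarithm diverges as $bS \to 1$, which is the singularity at $S = 1/b$.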

3. Model Calibration

3.1. From Continuous to Discrete

The productivity function (13) is calibrated by varying its parameters in order to minimize the standard deviation from data. Due to the annual discreteness of demographic data, integral (11) should be replaced by the sum of the population over the years $t_0$ to $t$:
$$S = \sum_{i=t_0}^{t} N_i \tag{28}$$
where $N_i$ is the population in the $i$th year, and $i$ is a year number.
According to Abramo et al. [15], knowledge can be measured through its publication. We will consider three forms of publication: patents, articles, and books. Each form accumulated up to and including a certain year $t$ is the sum
$$q = q_0 + \sum_{i=t_0}^{t} X_i \tag{29}$$
where $X$ is knowledge production measured on a case-by-case basis by the annual publication of patents, articles, or books (which is denoted as $\dot{q}$ in the basic Equation (7)), $X_i$ corresponds to the $i$th year, and $q_0$ is knowledge (number of patents, articles, or books) accumulated up to year $t_0$ (not including $t_0$ itself). This equality is also used to determine the information storage capacity.
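The discrete sums (28) and (29) are running totals over annual series. A minimal sketch, assuming the series are given as lists ordered by year starting at $t_0$ (the function names and the sample numbers in the demo are hypothetical):

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_population(N: Sequence[float]) -> List[float]:
    """Eq. (28): S for each year t, summing N_i from t0 up to and including t."""
    return list(accumulate(N))


def accumulated_knowledge(q0: float, X: Sequence[float]) -> List[float]:
    """Eq. (29): q for each year t, where q0 covers everything before t0."""
    return [q0 + running for running in accumulate(X)]


if __name__ == "__main__":
    # Made-up three-year series, used only to show the shapes involved.
    print(cumulative_population([6.1, 6.2, 6.3]))
    print(accumulated_knowledge(10.0, [1.0, 2.0, 3.0]))
```

Both outputs have one entry per year, so pairs $(S_t, q_t)$ can be formed directly for fitting (16) or (24).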

3.2. Bibliometric Data

To calibrate model Equations (16) and (24), we used the bibliometric data presented in Figure 2. Articles in scientific and technical journals (for 2000–2018) and patent applications (for 1985–2020) are represented by global data [21,22]. Data on new book titles (for 1950–1996) are selected for a group of 30 countries based on information provided by Fink-Jensen [23]. The group composition is indicated in the note to Table 1. The criterion for including a country in the group is the availability of data on books published for 1950–1996. For other countries, the data range is less than specified. There are gaps in the data for individual years that are filled by linear interpolation. When calibrating the model, we used the group population size in the case of books and the world population size in the cases of articles and patents (Figure 3).

3.3. Initial Amount of Knowledge

The information storage capacity at the beginning of the digitization period is known from the literature [19]. However, this cannot be said about the initial amount of knowledge $q_0$, represented in the form of patents, articles, and books. To find $q_0$, we use indirect estimation based on the relationships between annual knowledge production $X(t)$ (as denoted in (29)), gross domestic product $G(t)$, and population $N(t)$. All these quantities are provided with literature data (for links, see the captions to Figure 2 and Figure 3). The problem is that the time series $X(t)$ is usually very short, and, in order to find $q_0$, it is necessary that we sum $X(t)$ over a fairly long retrospective period. This can be carried out using the following algorithm:
1. Generate a function $N(t)$ based on demographic data;
2. Generate functions $X(G)$ and $G(N)$ on ranges provided with data;
3. Approximate $X(G)$ and $G(N)$ with suitable functions and continue the functions to the origin (where $G$ and $N$ are zero);
4. Make up a composition of functions $X(t) = X(G(N(t)))$, continuing it into the distant past, where $X$ tends to zero;
5. Take the sum of $X(t)$ for the entire previous period up to point $t_0$ (not including it), where the data for $X$ begin:
$$q_0 = \sum_{i=-\infty}^{t_0 - 1} X_i \tag{30}$$
Formally, the summation starts from $-\infty$, but, in fact, it is permissible to take a fairly distant point in the past, where $X(t)$ is very small. We took the year 1900 as such a point, when the production of patents, articles, and books was negligible compared to modern amounts;
6. Calculate $q(t)$ using Formula (29) in two ways: (i) using the available data for $X$, and (ii) using the results of model calculations according to item 4 (to compare the model with the data).
An example of applying this algorithm to finding the initial number of articles q 0 accumulated by the year t 0 = 2000 is shown in Figure 4. Data on articles are available in the year range 2000–2018. Despite such a short data range, the use of this algorithm allows us to estimate the accumulation of articles in a much wider year range: 1900–2020. The model fits the data well. This algorithm was also applied to patents and books (Figure 5).
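Steps 2 to 5 of the algorithm can be sketched in code. The text does not state which "suitable functions" were used for $X(G)$ and $G(N)$, so the sketch below assumes power laws $y = a x^{b}$, which pass through the origin as step 3 requires; this choice, the function names, and the dictionary-based data layout are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


def estimate_q0(pop: Dict[int, float], gdp: Dict[int, float],
                X: Dict[int, float], t0: int, t_start: int = 1900) -> float:
    """Estimate q0 as in Eq. (30), truncated at t_start.

    pop, gdp, X map year -> value; pop must cover t_start .. t0 - 1.
    """
    # Step 2-3: fit G(N) and X(G) on the years where both series exist.
    years_g = sorted(set(pop) & set(gdp))
    a_g, b_g = fit_power_law([pop[y] for y in years_g], [gdp[y] for y in years_g])
    years_x = sorted(set(gdp) & set(X))
    a_x, b_x = fit_power_law([gdp[y] for y in years_x], [X[y] for y in years_x])

    # Step 4: composition X(G(N(t))), extended to years without X data.
    def x_model(year: int) -> float:
        g = a_g * pop[year] ** b_g
        return a_x * g ** b_x

    # Step 5: sum over the retrospective period, excluding t0 itself.
    return sum(x_model(y) for y in range(t_start, t0))
```

A different functional form for either relationship only changes `fit_power_law` and `x_model`; the summation in step 5 stays the same.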

4. Results and Discussion

The parameter values found as a result of model calibration are presented in Table 1 and Figure 6. The accuracy of the model-to-data fit is very high, as evidenced by the determination coefficient $R^2$, the values of which are close to 1.

4.1. Storage Capacity

The best fit of Equation (16) to the data is achieved at $\varepsilon = 1$, when a linear productivity (21) is the case:
$$q = q_h\left(\rho\,e^{S/\sigma} - 1\right) \tag{31}$$
$$q_h = 2.053, \qquad \rho = 2.266, \qquad \sigma = 29.42 \tag{32}$$
where
$$q_h = \frac{1}{h}, \qquad \rho = 1 + hq_0, \qquad \sigma = \frac{1}{h w_0} \tag{33}$$
Here, $q$ is measured in Exabytes (only in this case), and $S$ and $\sigma$ are measured in billion people × year.

4.2. Patents

Kong et al. [26] found that newly created patents absorb much more knowledge from existing patents than from articles. Therefore, we can neglect the contribution of articles to the production of patents.
The number of patents is also best described by the linear case $\varepsilon = 1$ (see (21)) and obeys Equation (31) with parameters (33) having values
$$q_h = 20.41, \qquad \rho = 1.770, \qquad \sigma = 230.8 \tag{34}$$
Here, and further in (35), $q$ is measured in million texts.
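The relations (33) can be inverted to recover the original model parameters from the fitted triple $(q_h, \rho, \sigma)$: $h = 1/q_h$, $q_0 = (\rho - 1)\,q_h$, and $w_0 = q_h/\sigma$. The sketch below applies this to the values quoted in (32) and (34); the function names are hypothetical, and the derived numbers are simple consequences of (33), not additional results.

```python
import math
from typing import Tuple


def linear_case_parameters(q_h: float, rho: float, sigma: float) -> Tuple[float, float, float]:
    """Invert Eq. (33): return (h, q0, w0) for the linear case eps = 1."""
    h = 1.0 / q_h
    q0 = (rho - 1.0) * q_h
    w0 = q_h / sigma
    return h, q0, w0


def q_linear(S: float, q_h: float, rho: float, sigma: float) -> float:
    """Eq. (31): accumulated quantity as a function of cumulative population S."""
    return q_h * (rho * math.exp(S / sigma) - 1.0)


if __name__ == "__main__":
    fits = {
        "storage (Exabytes)": (2.053, 2.266, 29.42),       # Eq. (32)
        "patents (million texts)": (20.41, 1.770, 230.8),  # Eq. (34)
    }
    for name, (q_h, rho, sigma) in fits.items():
        h, q0, w0 = linear_case_parameters(q_h, rho, sigma)
        print(f"{name}: h = {h:.4f}, q0 = {q0:.3f}, w0 = {w0:.5f}")
```

In both cases $w_0$ comes out in units of $q$ per billion person-years, because $\sigma$ carries the units of $S$.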

4.3. Articles

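Equation (16), when applied to the number of articles in scientific and technical journals, gives the best result in the asymptotic limit $h \to \infty$, which corresponds to Equation (24) at $\varepsilon = 0.7580$ (Table 1). Equation (24) can be rewritten as
$$q = q_0\left(1 + \frac{S}{\sigma}\right)^{\tau} \tag{35}$$
$$q_0 = 20.04, \qquad \sigma = 114.5, \qquad \tau = 4.132 \tag{36}$$
where
$$\sigma = \frac{q_0^{1-\varepsilon}}{(1 - \varepsilon)\,c}, \qquad \tau = \frac{1}{1 - \varepsilon} \tag{37}$$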

4.4. Books

For the number of new book titles (in all genres of literature), the best result corresponds to the same asymptotic Formula (35) as for articles, with $\varepsilon = 0.5814$ and parameter values
$$q_0 = 8.749, \qquad \sigma = 46.74, \qquad \tau = 2.389 \tag{38}$$
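As a consistency check, the relations (37) can be inverted to recover $\varepsilon = 1 - 1/\tau$ and $c = \tau\,q_0^{1/\tau}/\sigma$ from the fitted values in (36) and (38). The sketch below does this; the function names are hypothetical, and the value of $c$ it prints is derived from (37) rather than quoted from the calibration table.

```python
from typing import Tuple


def power_case_parameters(q0: float, sigma: float, tau: float) -> Tuple[float, float]:
    """Invert Eq. (37): return (eps, c) for the asymptotic power-law case."""
    eps = 1.0 - 1.0 / tau
    c = tau * q0 ** (1.0 / tau) / sigma
    return eps, c


def q_power(S: float, q0: float, sigma: float, tau: float) -> float:
    """Eq. (35): accumulated texts as a function of cumulative population S."""
    return q0 * (1.0 + S / sigma) ** tau


if __name__ == "__main__":
    fits = {
        "articles": (20.04, 114.5, 4.132),  # Eq. (36)
        "books": (8.749, 46.74, 2.389),     # Eq. (38)
    }
    for name, (q0, sigma, tau) in fits.items():
        eps, c = power_case_parameters(q0, sigma, tau)
        print(f"{name}: eps = {eps:.4f}, c = {c:.5f}")
```

The recovered exponents agree with the quoted $\varepsilon = 0.7580$ and $\varepsilon = 0.5814$ to the precision given.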

4.5. Memory Capacity Assessment

To find the memory capacity (in bytes) occupied by patents, articles, and books, we use estimates of the average sizes of these texts. An analysis of samples of several hundred patents and articles yields an average size of approximately 1.5 Megabytes per patent or article. Similarly, for books, we obtain an average size of 14 Megabytes per book. The latest storage capacity value of 310 Exabytes dates back to 2007. Memory capacity estimates for various types of knowledge representation as of 2007 are shown in Table 2.
We see that the memory capacity occupied by each text type is six orders of magnitude less than the total storage capacity. The storage capacity is filled primarily with visual information (photos, films, archives of TV programs, video surveillance, digitized museum exhibits, etc.). It is also necessary that we consider the multiple duplication of visual and textual information copied by many interested users to their devices. The need to store such immense information causes an accelerated growth in the capacity of storage devices, which is what we are seeing in reality (Figure 6a).

4.6. Productivity Increase

According to the adopted model, productivity increases for all types of texts studied here (patents, articles, and books), as depicted in Figure 7. When knowledge increases fivefold ($q$ from 10 to 50 units), productivity increases by factors of 2.3, 2.5, and 3.4 for patents, books, and articles, respectively. For the same increase in storage capacity, productivity increases by a factor of 4.3. Thus, productivity grows more slowly than knowledge.
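These factors follow from the fitted productivity laws: $w \propto 1 + q/q_h$ in the linear case (21) and $w \propto q^{\varepsilon}$ in the power-law case (25). A short sketch that reproduces the arithmetic from the parameter values quoted in Section 4 (the function names are hypothetical):

```python
def ratio_linear(q1: float, q2: float, q_h: float) -> float:
    """Productivity ratio w(q2)/w(q1) for the linear law w = w0 * (1 + q/q_h)."""
    return (1.0 + q2 / q_h) / (1.0 + q1 / q_h)


def ratio_power(q1: float, q2: float, eps: float) -> float:
    """Productivity ratio w(q2)/w(q1) for the power law w = c * q**eps."""
    return (q2 / q1) ** eps


if __name__ == "__main__":
    q1, q2 = 10.0, 50.0
    print("patents :", round(ratio_linear(q1, q2, 20.41), 1))
    print("books   :", round(ratio_power(q1, q2, 0.5814), 1))
    print("articles:", round(ratio_power(q1, q2, 0.7580), 1))
    print("storage :", round(ratio_linear(q1, q2, 2.053), 1))
```

In every case the ratio stays below the fivefold increase in $q$, since $\varepsilon \le 1$ and the linear law has a non-zero offset.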
Table 3 shows that, during the observation period, productivity increases by 2–2.7 times. Unlike knowledge, the information storage stands apart: its capacity q increased over the observation period by 113 times, and its productivity w by 63 times. We see that memory is expanding much faster than new texts (patents, articles, and books) are created. Apparently, producing storage devices is a simpler process than creating new knowledge.

4.7. Constant Productivity Approximation

Consider the condition under which the constant productivity approximation may be acceptable. According to (13), this condition is $q \ll q_h$, where $q_h = 1/h$ is a threshold value. Referring to Table 1, we find $q_h = 2.053$ for storage and $q_h = 20.41$ for patents. The former corresponds to the year 1983, the latter to 1989.
For articles and books, their productivity $w$ and accumulated knowledge $q$ obey nonlinear laws (25) and (35). As shown above (see (20)), constant productivity causes a linear increase in knowledge. Equation (35) can be linearized if the condition $S \ll \sigma$ is satisfied; then, $w \approx c\,q_0^{\varepsilon}$. According to (36) and (38), $\sigma = 114.5$ for articles and $\sigma = 46.74$ for books. The threshold value $S = \sigma$ is reached in 2016 for articles and in 1982 for books.
Therefore, we can use the constant productivity approximation (20) as long as we do not get too close to the specified dates, staying in the range of $q$ where the condition $q \ll q_h$ for storage and patents or $S \ll \sigma$ for articles and books holds. To summarize, as we approach the 1980s, the constant productivity approximation loses its adequacy (for articles, it happens later).
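The linearization of (35) can be written out explicitly. This is a short worked step using only (35) and (37); the first-order binomial expansion, which strictly requires $\tau S/\sigma \ll 1$, is the only approximation introduced.

```latex
\begin{aligned}
q &= q_0\left(1 + \frac{S}{\sigma}\right)^{\tau}
   \approx q_0\left(1 + \tau\,\frac{S}{\sigma}\right)
   = q_0 + \frac{\tau\,q_0}{\sigma}\,S, \\
\frac{\tau\,q_0}{\sigma} &= \frac{q_0}{1-\varepsilon}\cdot
   \frac{(1-\varepsilon)\,c}{q_0^{\,1-\varepsilon}} = c\,q_0^{\varepsilon}, \\
q &\approx q_0 + c\,q_0^{\varepsilon}\,S .
\end{aligned}
```

The result has the form of (20) with the constant productivity $c\,q_0^{\varepsilon}$ in place of $w_0$.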
The dependence of knowledge production on population size (7), supplemented by the equation of knowledge dynamics, allows us to obtain the equation of demographic dynamics [8]. The constant productivity approximation $w = w_0$ leads to the well-known hyperbolic law of world population growth [27], which operated for over a thousand years. However, deviations from this law become increasingly evident as we approach the 1980s, due to the significant accumulation of knowledge and the growth of productivity $w$, which can no longer be considered constant. This fact is commonly referred to as a demographic and technological phase transition [28,29,30], and, at the same time, it can be interpreted as a transition from a pre-information society, where the constant productivity approximation operates, to a more developed information society with advanced computer technologies and growing rates of per capita knowledge production.
After the 1980s, personal computers became widespread and the information society continued to develop. Digital memory grew, reaching the level of analog memory and then surpassing it. The share of digital memory in total memory capacity increased as follows: 0.8% in 1986, 3% in 1993, 25% in 2000, and 94% in 2007 [31]. The capacities of both types of memory became equal in 2003. Thus, the early 2000s can be considered a milestone in the maturation of digital civilization. Currently, the majority of the world’s technological memory is organized in the most accessible and fastest digital format.

4.8. Model Limitations and Capabilities

Let us note the limitations and possibilities of application and development of the model.
  • The model is written for the world as a whole. This global system is closed in the sense that all knowledge is produced within the system, and there is no knowledge coming from outside.
  • The model is not applicable to individual countries because there is an exchange of knowledge between countries. To be applicable to individual countries, the model must be modified by including knowledge flows in Equation (1) that reflect the exchange between countries.
  • The available data series are not long enough, which reduces the model calibration accuracy. The series length in years is as follows: articles—19, information storage—22, patents—36, and books—47. In the case of books, systematic data are available only for a group of 30 countries. There are also quite large deviations from the trend (Figure 5c). These deviations can be smoothed out only by moving to an integral curve describing the accumulation of books (Figure 5d). Therefore, the calculation results for books should be considered rather approximate, especially when trying to extend them to the entire world.
  • The model can be used as a tool for predicting the development of civilization through the accumulation of knowledge. There are limitations on the forecast horizon, in particular, because the model does not account for the rapid development of artificial intelligence, which contributes to a significant increase in human productivity and the acceleration of knowledge production.
  • Knowledge production is present in the world system model [9], along with demographic, economic, and environmental factors. As shown there, this system is at the edge of losing stability. The model in [9] uses the constant productivity approximation, which is acceptable, as we have found out, for a pre-information society. However, in a developed information society, productivity $w$ is no longer constant, but grows as knowledge accumulates. Including the function $w(q)$ in global dynamics will allow us to clarify the behavior of the world system and improve the accuracy of forecasts.
  • Future work should study how two separate factors, population education and artificial intelligence, influence the knowledge production function.

5. Conclusions

The amount of knowledge correlates with the number of patents, articles, and books published in the world over the entire previous period, which allowed us to trace the dynamics of knowledge accumulation. The production of knowledge depends on its amount and population size. This dependence plays a crucial role in knowledge dynamics and related demographic dynamics. The goal of this work was to determine the form of this dependence and to check how well it agrees with real data.
We have proposed a model in which the total rate of knowledge production is expressed as the product of average human productivity and population size. Productivity increases as knowledge accumulates and information technology advances. At the early stage of a society’s development, knowledge is very scarce and productivity is low.
As knowledge grows, productivity gradually increases, reaching high values in a developed information society. In the asymptotic limit, when the knowledge amount q becomes large, productivity can be described by a power-law dependence on q. To combine the extreme cases of an undeveloped society and a highly developed one, we described the productivity using an interpolation function, which is a linear form of q raised to a certain power. This dependence generalizes important special cases where productivity can be a constant, linear, power, or exponential function of knowledge.
In a developed society, information is stored primarily in a digital format on various types of devices, which, together with analog memory, form the global information storage. With the development of digital technology, storage capacity is rapidly increasing. To describe this process, we used the proposed model.
The model was calibrated using literature data for the world as a whole (applied to patents, articles, and information storage) and for the group of 30 countries (applied to books, given the lack of data for many countries). Good agreement with the data was achieved. The general dependence of human productivity on the knowledge amount, as our analysis has shown, comes down to two special cases: a linear function of q for patents and storage capacity, and a power function of q for articles and books.
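The two special cases can be made concrete with a short sketch. It assumes that the interpolation function has the form w(q) = w_0 (1 + h q)^ε, whose large-q limit is c q^ε. This form is inferred from the verbal description and from the parameter names in Table 1; it is not quoted from Equation (13). The function names are illustrative, and the parameter values are those listed in Table 1.

```python
def productivity(q, w0, h, eps=1.0):
    """Assumed interpolation form w(q) = w0 * (1 + h*q)**eps.

    This form is an assumption inferred from the text, not a quotation
    of Equation (13). With eps = 1 it reduces to the linear case.
    """
    return w0 * (1.0 + h * q) ** eps


def productivity_asymptote(q, c, eps):
    """Power-law limit w(q) = c * q**eps for large q."""
    return c * q ** eps


# Parameter values from Table 1.
# Units: q in Exabytes for storage and in million texts otherwise;
# w per billion people per year.
STORAGE = {"w0": 0.06978, "h": 0.487, "eps": 1.0}   # linear case
ARTICLES = {"c": 0.01804, "eps": 0.7580}            # power-law case
BOOKS = {"c": 0.05304, "eps": 0.5814}               # power-law case

# Knowledge amounts at the start and end of the observation periods (Table 3).
w_storage_1986 = productivity(2.6, **STORAGE)
w_storage_2007 = productivity(292.8, **STORAGE)
w_articles_2000 = productivity_asymptote(20.04, **ARTICLES)
w_books_1950 = productivity_asymptote(8.749, **BOOKS)

print(f"storage: {w_storage_1986:.4f} -> {w_storage_2007:.2f}")
print(f"articles (2000): {w_articles_2000:.4f}")
print(f"books (1950): {w_books_1950:.4f}")
```

Under this assumption, the storage values come out close to the 1986 and 2007 productivities listed in Table 3 (0.1581 and 10.02), and the power-law values come out close to the Table 3 entries for articles in 2000 and books in 1950.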
The analysis showed that, in a pre-information society, with a relatively small amount of knowledge, the constant productivity approximation can be used. The transition to a developed information society occurred in the 1980s. After this transition, productivity can no longer be considered constant: it grows with the accumulation of knowledge according to a linear law in the case of patents, and according to a power law in the case of articles and books.
Digital memory surpassed analog memory after 2003. The population’s need for the repeated duplication of useful information led to a rapid increase in the number of storage devices and, consequently, to an increase in the total capacity of information storage, which, by 2007, exceeded the memory capacity occupied by patents, articles, and books by six orders of magnitude.
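The order-of-magnitude comparison can be checked directly from the total capacities in Table 2. The snippet below is only an arithmetic illustration; the variable names are illustrative, and the values are the Table 2 totals in petabytes.

```python
import math

# Total capacities as of 2007, in petabytes (Table 2).
storage_pb = 310_000
texts_pb = {"patents": 0.07, "articles": 0.05, "books": 0.30}

# Combined memory occupied by patents, articles, and books.
total_texts_pb = sum(texts_pb.values())

# Ratio of global storage capacity to the memory occupied by the texts.
ratio = storage_pb / total_texts_pb
orders = math.log10(ratio)

print(f"texts: {total_texts_pb:.2f} PB, ratio = {ratio:.3g}, log10 = {orders:.2f}")
```

The ratio is a little under 10^6, which is the sense in which storage capacity exceeds the memory occupied by the texts by about six orders of magnitude.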
The results obtained open up an opportunity to advance in describing the dynamics of various forms of knowledge and predicting their development in the future.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Dolgonosov, B.M.; Naidenov, V.I. An informational framework for human population dynamics. Ecol. Mod. 2006, 198, 375–386.
  2. Romer, P.M. Increasing returns and long-run growth. J. Polit. Econ. 1986, 94, 1002–1037.
  3. Romer, P.M. Endogenous technological change. J. Polit. Econ. 1990, 98, S71–S102.
  4. Kremer, M. Population, population growth and technological change: One million B.C. to 1990. Quart. J. Econ. 1993, 108, 681–716.
  5. Abdih, Y.; Joutz, F. Relating the knowledge production function to total factor productivity: An endogenous growth puzzle. IMF Staff Pap. 2006, 53, 242–271.
  6. Dong, J.; Li, W.; Cao, Y.; Fang, J. How does technology and population progress relate? An empirical study of the last 10,000 years. Technol. Forecast. Soc. Change 2016, 103, 57–70.
  7. Kato, H. Population Growth and Technological Progress—From a Historical View. In An Empirical Analysis of Population and Technological Progress; Springer Briefs in Population Studies; Springer: Tokyo, Japan, 2016.
  8. Dolgonosov, B.M. Knowledge production and world population dynamics. Technol. Forecast. Soc. Change 2016, 103, 127–141.
  9. Dolgonosov, B.M. Knowledge-induced dynamics of the global human-environment system: Between sustainability and collapse. Adv. Environ. Res. 2024, 103, 93–132.
  10. Barro, R.J.; Sala-i-Martin, X. Economic Growth, 2nd ed.; MIT Press: Cambridge, MA, USA, 2003.
  11. Collins, J.; Baer, B.; Weber, E.J. Population, Technological Progress and the Evolution of Innovative Potential; Economics Discussion Papers; University of Western Australia Business School: Crawley, Australia, 22 May 2013.
  12. Jones, C.I. R&D-based models of economic growth. J. Polit. Econ. 1995, 103, 759–784.
  13. Jones, C.I. Growth: With or without scale effects? Amer. Econ. Rev. 1999, 89, 139–144.
  14. Okuducu, M.B.; Aral, M.M. Knowledge based dynamic human population models. Technol. Forecast. Soc. Change 2017, 122, 1–11.
  15. Abramo, G.; D’Angelo, C.A.; Carloni, M. The balance of knowledge flows. J. Informetr. 2019, 13, 1–9.
  16. Court, V.; McIsaac, F. A representation of the world population dynamics for integrated assessment models. Envir. Mod. Assess. 2020, 25, 611–632.
  17. Sadovnichy, V.; Akaev, A.; Korotayev, A. A mathematical model for forecasting global demographic dynamics in the age of intelligent machines. arXiv 2022.
  18. Akaev, A.A.; Sadovnichii, V.A. The human component as a determining factor of labor productivity in the digital economy. Stud. Russ. Econ. Dev. 2021, 32, 29–36.
  19. Hilbert, M. How much of the global information and communication explosion is driven by more, and how much by better technology? J. Amer. Soc. Inform. Sci. Technol. 2014, 65, 856–861.
  20. Umarov, S.; Tsallis, C.; Steinberg, S. On a q-central limit theorem consistent with nonextensive statistical mechanics. Milan J. Math. 2008, 76, 307–328.
  21. OECD. Triadic Patent Families. 2022. Available online: https://data.oecd.org/rd/triadic-patent-families.htm#indicator-chart (accessed on 17 September 2022).
  22. World Bank. Scientific and Technical Journal Articles. 2022. Available online: https://data.worldbank.org/indicator/IP.JRN.ARTC.SC?yearlowdesc=true (accessed on 23 September 2022).
  23. Fink-Jensen, J. Book Titles per Capita. 2015. Available online: https://hdl.handle.net/10622/AOQMAZ (accessed on 3 June 2022).
  24. UN. World Population Prospects. 2022. Available online: https://population.un.org/wpp/ (accessed on 5 July 2022).
  25. Gapminder. Population. 2022. Available online: https://www.gapminder.org/data/documentation/gd003/ (accessed on 10 October 2022).
  26. Kong, J.; Zhang, J.; Deng, S.; Kang, L. Knowledge convergence of science and technology in patent inventions. J. Informetr. 2023, 17, 101435.
  27. von Foerster, H.; Mora, P.M.; Amiot, L.W. Doomsday: Friday, 13 November, A.D. 2026. Science 1960, 132, 1291–1295.
  28. Korotayev, A.; Goldstone, J.A.; Zinkina, J. Phases of global demographic transition correlate with phases of the great divergence and great convergence. Technol. Forecast. Soc. Change 2015, 95, 163–169.
  29. Grinin, L.; Grinin, A.; Korotayev, A. Dynamics of technological growth rate and the forthcoming singularity. In The 21st Century Singularity and Global Futures; Korotayev, A., LePoire, D., Eds.; World-Systems Evolution and Global Futures; Springer: Cham, Switzerland, 2020.
  30. Grinin, L.; Grinin, A.; Korotayev, A. A quantitative analysis of worldwide long-term technology growth: From 40,000 BCE to the early 22nd century. Technol. Forecast. Soc. Change 2020, 155, 119955.
  31. Hilbert, M.; López, P. The world’s technological capacity to store, communicate, and compute information. Science 2011, 332, 60–65.
Figure 1. Conceptual diagram of knowledge production and accumulation.
Figure 2. Cumulative sums of patents, articles, and books for the years of observation. Patents and articles represent global data, while books refer to the group of 30 countries listed in the note to Table 1. Data sources: number of patent applications—[21]; number of scientific and technological journal articles—[22]; and number of new book titles—[23].
Figure 3. World population and population of the group of 30 countries by year. See note to Table 1 for group composition. Data sources: [24,25].
Figure 4. Finding the number of articles q_0 accumulated over previous years (1900–1999) by the beginning of the observation period (2000–2018): (a) annual publication of articles vs. GDP, X(G); (b) GDP vs. population, G(N) (population N(t) over time is shown in Figure 3); (c) annual publication of articles over time, X(t) = X(G(N(t))); and, finally, (d) the accumulation of articles over time (since 1900), q = Σ X(t). Model calculations are compared with the data.
Figure 5. Finding the number q_0 of accumulated patents (a,b) and books (c,d). Here, in contrast to Figure 4, only the start and end charts are shown.
Figure 6. Information storage capacity (a) (four points correspond to 1986, 1993, 2000, and 2007) and the cumulative sums of patents (b), articles (c), and books (d) depending on the cumulative sum of population during the corresponding observation period, indicated at the top of the panels. Markers and solid lines are data, and dotted lines are model. See Table 1 for model parameters. Data source for storage capacity: [19]. Data sources for patents, articles, books, and population are indicated in the captions to Figure 2 and Figure 3.
Figure 7. Productivity as a function of knowledge amount for patents, articles, and books (see w on the left axis, where q is measured in millions of texts) and storage capacity (see w on the right axis, where q is measured in Exabytes).
Table 1. Optimal parameter values of the productivity function (13) and its asymptotics (25) for storage capacity and various types of knowledge representation *.

| Model Parameters | Storage, 1986–2007 | Patents, 1985–2020 | Articles, 2000–2018 | Books, 1950–1996 |
|---|---|---|---|---|
| q_0 | 2.6 | 15.70 | 20.04 | 8.75 |
| ε | 1 | 1 | 0.7580 | 0.5814 |
| h | 0.487 | 0.0490 | | |
| w_0 | 0.06978 | 0.08841 | | |
| c | | | 0.01804 | 0.05304 |
| R^2 | 0.9963 | 0.9991 | 0.9997 | 0.9977 |

* Notes: (1) The storage capacity and the number of texts (patents, articles, or books) accumulated by the beginning of the corresponding observation period are denoted as q_0. (2) System of units: q, Exabytes (Exa = 10^18) for storage capacity; q, million texts for patents, articles, and books; N, billion people; t, year. (3) The determination coefficient R^2 for articles and books is highest for the asymptotic Formula (24). (4) Data on books are given for a group of 30 countries for which data are available for the entire specified period 1950–1996 (gaps for individual years are filled by linear interpolation). The group includes the following countries: Argentina, Australia, Austria, Belgium, Bulgaria, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, India, Italy, Japan, Latvia, Lithuania, Netherlands, Norway, Poland, Portugal, Romania, Russian Federation, Spain, Sweden, Switzerland, Turkey, United Kingdom, and United States.
Table 2. Memory capacity of information storage and various types of knowledge representation as of 2007.

| Type | Number of Texts (in 2007), Million | Specific Capacity, Megabyte per Text | Total Capacity, Petabyte * |
|---|---|---|---|
| Storage (world) | | | 310,000 |
| Patents (world) | 44.0 | 1.5 | 0.07 |
| Articles (world) | 30.6 | 1.5 | 0.05 |
| Books (group) | 30.4 | 14 | 0.30 |

* 1 Petabyte = 10^15 bytes.
Table 3. The increase in productivity over the observation period.

| Type | Year | q | w | w_2/w_1 |
|---|---|---|---|---|
| Storage * | 1986 | 2.6 | 0.1581 | 63.4 |
| | 2007 | 292.8 | 10.02 | |
| Patents ** | 1985 | 15.92 | 0.1574 | 2.69 |
| | 2020 | 78.51 | 0.4234 | |
| Articles ** | 2000 | 20.04 | 0.1750 | 2.15 |
| | 2018 | 54.42 | 0.3757 | |
| Books ** | 1950 | 8.749 | 0.1872 | 2.03 |
| | 1996 | 28.94 | 0.3805 | |

* For storage: q, Exabytes; w, Exabytes per billion people per year. ** For patents, articles, and books: q, million texts; w, million texts per billion people per year.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dolgonosov, B.M. Modeling the Knowledge Production Function Based on Bibliometric Information. Knowledge 2025, 5, 7. https://doi.org/10.3390/knowledge5020007
