Gender-Related Differences in the Citation Impact of Scientific Publications and Improving the Authors’ Productivity

The article’s purpose is to analyze the citation impact of scientific publications by authors of different gender compositions. The PageRank method was chosen to calculate the citation impact of scientific publications, and the results were used to estimate the impact of scientific publications based on the number of citations. The normalized citation impact is calculated for nine subsets of scientific publications that correspond to patterns of different gender compositions of authors. These estimates were also calculated for each country with which the authors of the publications are affiliated. The Citation Network Dataset (ver. 13) was chosen for the scientometric analysis. The dataset includes more than 5 million scientific publications and 48 million citations. Most of the publications in the dataset are from the STEM field. The results indicate that, in this field, articles with a predominantly male composition of authors are cited more than articles with a mixed or female composition of authors. An analysis of the dynamics indicates that in the last decade, in developed countries, the connection between the citation impact of scientific publications and the gender composition of their authors has weakened. However, the results still confirm the presence of gender inequality in science, which may be related to socioeconomic and cultural characteristics, natural homophily, and other factors that contribute to gender gaps. An essential consequence of overcoming these gaps, including in science, is ensuring the rights of people in all their diversity.


Introduction
New knowledge, ideas, and innovations are created through the development of scientific cooperation. Scientific cooperation is the joint activity of scientists to create and verify new knowledge. Its results include the publication of scientific articles, the organization and implementation of joint scientific projects, and the organization of conferences, seminars, and other scientific events. Increasing the productivity of individual scientists and scientific teams is a factor that affects the development of innovations in a region and in the state as a whole. A current direction of scientometrics is to identify the influence of demographic, social, and gender differences on publishing productivity. Works [1,2] determined that the form and intensity of scientific cooperation affect publishing productivity and the creation of innovations [3]. This process is significantly influenced by how the social space in which scientific teams cooperate is constructed. It can be assumed that gender is one of the factors that shape patterns of scientific collaboration. The impact of gender differences on publication productivity and on the citation of scientific publications is described in [4]. Work [5] found that gender-heterogeneous working groups produce scientific results of higher quality. However, forming such groups is complicated by natural gender homophily [6]. Homophily also manifests itself in the citation of scientific publications: work [7] found that scientists tend to cite publications by authors of the same gender as themselves. Gender-related questions of homophily in research are discussed in [8,9].
Ensuring respect for human dignity, equality, and rights is a core value of the EU and other countries with a high human development index. An essential condition for upholding these values is a policy of gender equality and the elimination of gender gaps. In recent decades, there has been a growing trend to reduce the influence of gender differences among researchers when forming the teams of scientific projects. In particular, work [10] indicated that the influence of gender differences on scientific publication productivity is decreasing, especially among young scientists, and that gender differences in productivity have been disappearing recently. A few decades ago, the number of scientific publications by male authors significantly exceeded that by female authors, but this has since changed. For a long time, however, it was difficult for women to obtain positions in science, since the field was almost entirely male-dominated [11]. Moreover, even as the gender representativeness of STEM education and science increased, this process was accompanied by growing gender differences in productivity and impact [12].
Currently, there are fewer women than men in the higher ranks of academia. Work [13] indicates that highly productive women in a scientific group significantly influence the group's productivity, and work [14] attributes this to the higher emotional intelligence of women compared to men. Ensuring gender diversity in educational and scientific spaces is complex and multifaceted; some aspects of gender diversity policy in university networks are described in [15]. Gender representativeness can also differ between areas of science. Work [16] studied the results of 150,000 mathematicians and showed that women publish less early in their careers and leave research faster than men. As a result, top mathematics journals publish fewer articles authored by women. A similar trend can be observed in computer science; however, this is a separate research task.
Even though overcoming gender gaps is one of the priorities in developed countries, the question remains whether scientific publications with different gender compositions of authors are cited differently and, if so, what this could be connected with. Answering these questions requires a method that can effectively evaluate citation impact. Traditionally, citation impact is defined as the number of times subsequent publications cite a publication.
One of the methods that can be used to evaluate the publication productivity or citation impact of a scientist is the PageRank method [17]. Traditionally, PageRank is used to determine the influence of a user in a social network or to evaluate the importance of web pages. Each network user or page is assigned a real number that measures its importance or reputation; the larger this number, the higher the importance [18]. There are modifications of the PageRank method for calculating the productivity of scientific activity, the citation index, the reputation of scientific journals, etc. The classical PageRank method uses only pairwise (edge) relations and does not consider higher-order structures, in particular subgraphs. One way of modifying the PageRank method, described in [19], is to include such higher-order structures in the calculation; work [19] shows that this approach ranks social network users better. The approach is reasonable because citation networks tend to have a complex structure, which can be taken into account when assessing citation impact in practice. However, the method is difficult to use in real time: dynamic changes in the structure of the citation network require the scores to be recalculated, which is cumbersome.
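The classic PageRank idea described above can be illustrated with a minimal Python sketch. It uses power iteration with the textbook damping factor 0.85 and spreads the score of pages without outgoing links uniformly over all pages; these choices, the function name `pagerank`, and the toy link graph are illustrative assumptions, not the modification used in this study.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Classic PageRank computed by power iteration.

    links: dict mapping each node to the list of nodes it links to.
    Nodes without outgoing links spread their score uniformly over all nodes.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share  # u passes an equal share to each target
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Toy web graph: A links to B and C, B to C, C to A, D to C.
scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(max(scores, key=scores.get))  # C, the page that receives the most links
```

The scores always sum to 1, so a page's value is its share of importance relative to all other pages in the network.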
An iterative method for calculating PageRank that simplifies the calculation of ratings is proposed in [20]. In general, the PageRank method makes it possible to take into account information about all citations in the network when evaluating authors. In contrast, the h-index [21] and its analogs, such as the i10-index and g-index, disregard part of the citations outside the core when calculating the productivity of scientific activity. Work [22] describes a method for calculating the scientific productivity of collective subjects (universities, scientific institutes, departments, faculties, etc.) based on the Time-Weighted PageRank Method with Citation Intensity (TWPR-CI). It is shown that the advantage of the TWPR-CI method is the higher sensitivity of its productivity estimates for new collective subjects, which it averages over the first ten years of observation. This sensitivity is essential and can be used to evaluate citation impact, especially for recently published papers. However, new publications may have few citations, in which case this method will not differ from the classic PageRank method.
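To illustrate why the h-index and its analogs disregard part of the citations, the following sketch computes the h-, i10-, and g-indices from a list of per-publication citation counts. The function names and the sample counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    h = 0
    for position, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < position:
            break
        h = position
    return h


def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def g_index(citations):
    """Largest g such that the g most cited publications have at least g**2 citations in total."""
    g, total = 0, 0
    for position, count in enumerate(sorted(citations, reverse=True), start=1):
        total += count
        if total >= position * position:
            g = position
    return g


counts = [25, 8, 5, 3, 3, 0]  # hypothetical citation counts of one author's papers
print(h_index(counts), i10_index(counts), g_index(counts))  # 3 1 6
```

For example, `h_index([100, 100, 100])` and `h_index([3, 3, 3])` are both 3: citations beyond the h-core do not change the index, whereas a PageRank-style score takes them into account.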
An analysis of the continuity of research in intergender scientific cooperation [23] allows a better understanding of how scientists of different genders are involved in joint scientific projects. Well-known methods for studying patterns of scientific cooperation and selecting scientists for projects [23,24] can also be used to study the influence of gender on scientific interaction. The methods described in [25–30] can likewise be used to evaluate the productivity of scientific activity, for management, and for the competence-based selection of project executors using a gender approach. Work [12] describes a thorough study of the impact of gender inequality on scientific careers in different countries. It found that the increase in female participation in science over the past 60 years has been accompanied by a widening of the gender gap in both scientific productivity and impact. This article hypothesizes that the gender composition of the authors of scientific publications affects citation rates. If such an influence is detected, it may mean that the gender composition of teams working on joint research affects their publication productivity. This trend may differ between countries and areas of research and may change over time. Accordingly, the article's goal is to analyze the citation impact of scientific publications by authors with different gender compositions. This article does not suggest that any biases are conscious; they may depend on other socioeconomic and cultural factors. The analysis does, however, reveal existing inequalities. Identified differences in the citation of scientific publications are not a sign of gender discrimination but an indicator that captures the current state of publication activity.
A citation dataset of scientific publications was investigated. The Citation Network Dataset (ver. 13) consists of more than 5 million scientific publications and 48 million citations [31] collected from databases such as DBLP [32], ACM [33], Microsoft Academic Graph [34], and others. The construction of the database is described in more detail in [35]. The following research stages were implemented:
1. Calculate the citation impact of each scientific publication in the citation network. Citation impact was calculated both by counting the citations of each publication and with the PageRank method [36,37].
2. Divide all publications into eight classes according to the gender composition of their authors. The class of a publication is determined from the first names of its authors using a service that infers a person's gender from their first name.
3. Compare the results obtained for the eight classes to examine how the citation of scientific publications depends on the gender composition of their authors. Special attention is paid to the citation impact of publications by authors from different countries and to how it changes over time.
Research on the influence of gender differences on scientific publication productivity is relevant to the development of innovations and scientific production in general. Gender inequality identified in academia should be addressed both at the level of higher education and research institutions and at the state level. An increase in the publishing activity of authors contributes to the growth of the scientific productivity of the institutions with which they are affiliated. The described study continues the research published in previous works [22,37].

Basic Terms and Concepts
This section introduces the main terms and concepts used in the publication. Citation impact is determined by the number of times subsequent publications cite a publication. This study used the PageRank method to calculate the citation impact of scientific publications; the resulting score is called the PageRank citation impact. The traditional approach of counting the total number of citations was also used to evaluate the impact of scientific publications.
The work focuses on calculating the citation impact of scientific publications with different gender compositions of authors. It is important to understand the regional distribution by country and the change over time in the intensity of citation of scientific publications with male, female, and mixed gender compositions.
Patterns of the gender composition of authors were identified. Each pattern corresponds to a specific class of scientific publications, and each class is studied separately. The citation impact of scientific publications by authors from different countries was evaluated using open data collected over a long period. This makes it possible to investigate how the citation impact of publications in different classes changes over time. The volume of data is also sufficient to analyze citations and citation impact separately for different countries.
The work examines eight patterns of the gender composition of authors of scientific publications. It is assumed that each article corresponds to exactly one pattern and that the citation impact of articles may differ between patterns. Accordingly, all scientific publications are divided into eight classes (subsets), one for each pattern. Let $S = \{s_1, s_2, \ldots, s_n\}$ be the set of scientists, where $n$ is the number of scientists. Let $P = \{p_1, p_2, \ldots, p_m\}$ be the set of scientific publications published by scientists from $S$, where $m$ is the number of publications. Each publication $p_j$, $j = \overline{1, m}$, has one or more authors. Let the relation $F \subseteq S \times P$ be the set of pairs $(s_i, p_j)$, $i = \overline{1, n}$, $j = \overline{1, m}$, such that $s_i$ is an author of $p_j$. Let the function $g : S \to \{f, m\}$ determine the gender of each scientist from $S$. For a publication $p_k$ whose authors, in byline order, are $s_{i_1}, s_{i_2}, \ldots, s_{i_l}$, define the tuple

$$\Delta(p_k) = \left(g(s_{i_1}), g(s_{i_2}), \ldots, g(s_{i_l})\right).$$

If $\forall d \in \Delta(p_k),\ d = f$ and $\mathrm{card}(\Delta(p_k)) > 1$, then all authors of $p_k$ are women and the publication belongs to the pattern "Fff"; if $\mathrm{card}(\Delta(p_k)) = 1$, it belongs to the pattern "F". If $\forall d \in \Delta(p_k),\ d = m$ and $\mathrm{card}(\Delta(p_k)) > 1$, then all authors of $p_k$ are men and the publication belongs to the "Mmm" pattern; if $\mathrm{card}(\Delta(p_k)) = 1$, it belongs to the "M" pattern. The other patterns are described in Figure 1. A capital letter at the beginning of a pattern's name indicates the gender of the first author of the publication: F for female, M for male. Analyzing this number of classes of scientific publications is sufficient for the study. It should be noted that the gender composition of publications is determined using a service that checks the gender of their authors. Publications for which the service cannot identify the gender of at least one author with sufficient accuracy, of which there is a significant number, are considered separately as publications with an uncertain gender composition. The results may also contain some deviations, since a certain number of authors may identify as non-binary, which cannot be determined from a first name.
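The pattern definitions can be sketched as a function over the tuple of author genders $\Delta(p_k)$ in byline order. This is an illustration under assumptions: the function name is hypothetical, `None` marks an author whose gender could not be identified (the NA subset), and "Ffm"/"Mfm" are read as the cases in which the co-authors include both women and men, so that the eight patterns and NA form a partition.

```python
def gender_pattern(genders):
    """Map the author genders of one publication, in byline order, to a pattern.

    genders: sequence of 'f', 'm', or None (None = gender not identified).
    Returns one of 'Fff', 'Mmm', 'Fmm', 'Mff', 'Ffm', 'Mfm', 'F', 'M', or 'NA'.
    """
    if not genders or any(g not in ("f", "m") for g in genders):
        return "NA"  # at least one author's gender is unknown
    first, coauthors = genders[0], set(genders[1:])
    if not coauthors:
        return first.upper()  # single author: 'F' or 'M'
    if first == "f":
        if coauthors == {"f"}:
            return "Fff"
        if coauthors == {"m"}:
            return "Fmm"
        return "Ffm"  # female first author, co-authors of both genders
    if coauthors == {"m"}:
        return "Mmm"
    if coauthors == {"f"}:
        return "Mff"
    return "Mfm"  # male first author, co-authors of both genders


print(gender_pattern(["f", "m", "m"]), gender_pattern(["m", "f", "m"]))  # Fmm Mfm
```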

The Assessment of Citation Impact and PageRank Citation Impact of Scientific Publications
To calculate the citation impact of a scientific publication, the number of citations of this publication in other scientific publications is counted. This indicator shows the influence of a scientific publication: the higher its citation impact, the greater its influence. Let $Q_{CI} = \{q_1, q_2, \ldots, q_m\}$ be the citation impact scores, where $q_j$ is the number of citations of publication $p_j$, $j = \overline{1, m}$, so that $Q_{CI} : P \to \mathbb{N} \cup \{0\}$. This indicator shows only the total number of citations, which quantifies the interest in a publication among other authors in the field.
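Counting $q_j$ for every publication amounts to computing the in-degree of each node in the citation network. A minimal sketch, assuming the data are available as a hypothetical dictionary that maps each publication id to the ids of the publications it cites (the dataset's reference lists); each citing paper is counted once and self-references are ignored.

```python
from collections import Counter


def citation_counts(references):
    """Return q_j, the number of publications citing p_j, for every publication.

    references: dict mapping a publication id to the ids it cites.
    """
    counts = Counter()
    for citing, cited_ids in references.items():
        for cited in set(cited_ids):  # each citing publication counts once
            if cited != citing:  # self-references are ignored
                counts[cited] += 1
    # Publications that are never cited get q_j = 0.
    return {pub: counts[pub] for pub in set(references) | set(counts)}


refs = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": [], "p4": ["p3", "p3"]}
print(sorted(citation_counts(refs).items()))
# [('p1', 0), ('p2', 1), ('p3', 3), ('p4', 0)]
```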
The PageRank method was also used to evaluate the influence of scientific publications. This method determines the impact of a scientific publication relative to the other publications under consideration. According to the PageRank method, the scalar estimate of the citation impact of a scientific publication $p_j$, $j = \overline{1, m}$, is calculated by the formula

$$r_j = \sum_{y=1}^{m} \beta_{jy}\, \xi_y\, r_y, \quad j = \overline{1, m}, \qquad (1)$$

where $r_j$ is the PageRank citation impact score of publication $p_j$; $\beta_{jy}$, $j, y = \overline{1, m}$, is a coefficient that indicates whether publication $p_j$ is present in the list of references of publication $p_y$; and $\xi_y$ is a coefficient that ensures the existence of a non-trivial solution of the system of linear algebraic equations (1). Applying Formula (1) yields a homogeneous system of linear algebraic equations

$$B\,r = 0, \qquad (2)$$

where $B$ is the matrix of coefficients of the system,

$$B = \left(\beta_{jy}\, \xi_y\right)_{j,y=1}^{m} - E,$$

$E$ is the identity matrix, and $r = w^{T}$ is the column vector of unknown scores, $w = (r_1, r_2, \ldots, r_m)$.
For the system of algebraic equations (1) to have a non-trivial solution, the matrix $B$ must be singular, i.e., $\det(B) = 0$.
Let the subset of the Cartesian product $C \subset P \times P$ define the citations between publications, where $P \times P = \{(p_j, p_y) \mid p_j, p_y \in P,\ j \neq y\}$ and $(p_j, p_y) \in C$ means that $p_j$ cites $p_y$. The set of scientific publications cited by a given publication $p_j \in P$ is defined as

$$C(p_j) = \{p_y \in P \mid (p_j, p_y) \in C\}, \quad y = \overline{1, m}.$$

The coefficients of system (1) are determined by the formulas

$$\beta_{jy} = \begin{cases} 1, & p_j \in C(p_y), \\ 0, & p_j \notin C(p_y), \end{cases} \qquad (3)$$

$$\xi_y = \frac{1}{\mathrm{card}\, C(p_y)}, \qquad (4)$$

where $\beta_{jy}$ indicates the presence of publication $p_j$ in the list of references of publication $p_y$, and $\xi_y$ is the inverse of the total number of references in publication $p_y$. After the estimates are found, it is advisable to normalize them according to Formula (5), where $r_i$ is the PageRank citation impact score of publication $p_i$, $i = \overline{1, m}$, and $\bar{r}(p_i)$ is the normalized PageRank citation impact score of $p_i$. The more citations a scientific publication accumulates over time, the higher its citation impact; therefore, one way to evaluate citation impact is simply to count the citations of a publication. The advantage of the PageRank method is that it weighs each citation by the score of the citing publication, so the influence of a publication is assessed relative to the citations of other scientific publications.
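The role of $\xi_y$ can be checked on a toy example. The sketch below builds the matrix $A = (\beta_{jy}\xi_y)$ from Formulas (3) and (4) with exact fractions and confirms that $\det(A - E) = 0$, where $E$ is the identity matrix. The reason is that when every publication cites at least one other publication in the set, each column of $A$ sums to 1, so the rows of $A - E$ add up to the zero vector. Publications without references give zero columns, and then this argument no longer holds. The function names and the three-publication network are hypothetical.

```python
from fractions import Fraction


def coefficient_matrix(pubs, cites):
    """A[j][y] = beta_jy * xi_y from Formulas (3) and (4).

    pubs: publication ids in index order; cites: dict id -> ids it cites, C(p).
    beta_jy = 1 if p_y cites p_j; xi_y = 1 / card C(p_y).
    """
    index = {p: k for k, p in enumerate(pubs)}
    m = len(pubs)
    a = [[Fraction(0)] * m for _ in range(m)]
    for y, p_y in enumerate(pubs):
        refs = {p for p in cites.get(p_y, ()) if p in index and p != p_y}
        for p_j in refs:
            a[index[p_j]][y] = Fraction(1, len(refs))
    return a


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    mat = [row[:] for row in matrix]
    n, det = len(mat), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if mat[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            mat[col], mat[pivot] = mat[pivot], mat[col]
            det = -det
        det *= mat[col][col]
        for r in range(col + 1, n):
            factor = mat[r][col] / mat[col][col]
            for c in range(col, n):
                mat[r][c] -= factor * mat[col][c]
    return det


pubs = ["p1", "p2", "p3"]
cites = {"p1": {"p2", "p3"}, "p2": {"p3"}, "p3": {"p1"}}
A = coefficient_matrix(pubs, cites)
B = [[A[j][y] - (1 if j == y else 0) for y in range(3)] for j in range(3)]
print(determinant(B))  # 0
```

For this network, $r = (1, 1/2, 1)^{T}$ satisfies $r = A r$, so it is a non-trivial solution of the homogeneous system.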
The citation data of the Citation Network Dataset (ver. 13) were analyzed, and a citation network was built. Next, two scores were calculated for all scientific publications: the citation impact based on the number of citations and the PageRank citation impact. Finding the PageRank citation impact requires solving the large system of linear algebraic equations (2). The iterative Gauss-Seidel method is used to find an approximate solution of system (2). At step zero, the PageRank citation impact of every scientific publication is set to 1, $r_j^{0} = 1$. At the $k$-th step, each PageRank citation impact score is updated by the formula

$$r_j^{k} = \sum_{y=1}^{j-1} \beta_{jy}\, \xi_y\, r_y^{k} + \sum_{y=j+1}^{m} \beta_{jy}\, \xi_y\, r_y^{k-1}, \quad j = \overline{1, m},$$

where $r_j^{k}$ is the approximate PageRank citation impact of publication $p_j$ at the $k$-th step, $r_j^{k-1}$ is its approximate value at the $(k-1)$-th step, and the coefficients are calculated according to Formulas (3) and (4).
After each step, the maximum relative change in the PageRank citation impact scores was calculated by the formula

$$\Delta^{k} = \max_{j = \overline{1, m}} \frac{\left| r_j^{k} - r_j^{k-1} \right|}{r_j^{k}},$$

where $\Delta^{k}$ is the maximum relative change in the PageRank citation impact scores of the scientific publications at the $k$-th step. The iterative method stops when $\Delta^{k} < \varepsilon$, where $\varepsilon > 0$ is a small number specified in advance. After that, the values are normalized according to Formula (5).
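The iteration and stopping rule can be sketched as follows. Scores start at 1, each $r_j$ is overwritten in place with $\sum_y \beta_{jy}\xi_y r_y$ so that values already updated in the current sweep are reused (the Gauss-Seidel scheme), and the loop stops when the maximum relative change drops below $\varepsilon$. Assumptions: the relative change is measured against the new value, publications with a zero score are skipped in that maximum, and, because Formula (5) is not reproduced here, the final normalization simply divides by the sum of all scores. The function name and data layout are hypothetical, and convergence is not guaranteed for every network structure.

```python
def pagerank_gauss_seidel(pubs, cites, eps=1e-4, max_steps=1000):
    """Solve r_j = sum_y beta_jy * xi_y * r_y by Gauss-Seidel sweeps.

    pubs: publication ids in index order; cites: dict id -> ids it cites.
    Returns (normalized scores, number of steps, last maximum relative change).
    """
    known = set(pubs)
    refs = {p: {q for q in cites.get(p, ()) if q in known and q != p} for p in pubs}
    cited_by = {p: [] for p in pubs}  # (p_y, xi_y) for every p_y that cites p_j
    for p_y, targets in refs.items():
        for p_j in targets:
            cited_by[p_j].append((p_y, 1.0 / len(targets)))
    r = {p: 1.0 for p in pubs}  # step zero: every score equals 1
    step, delta = 0, float("inf")
    while step < max_steps and delta >= eps:
        step += 1
        delta = 0.0
        for p_j in pubs:
            old = r[p_j]
            # In-place update: scores already changed in this sweep are reused.
            r[p_j] = sum(xi * r[p_y] for p_y, xi in cited_by[p_j])
            if r[p_j] > 0:
                delta = max(delta, abs(r[p_j] - old) / r[p_j])
    total = sum(r.values())
    # Assumed normalization: scores divided by their sum.
    scores = {p: v / total for p, v in r.items()} if total > 0 else dict(r)
    return scores, step, delta


scores, steps, delta = pagerank_gauss_seidel(
    ["p1", "p2", "p3"], {"p1": {"p2", "p3"}, "p2": {"p3"}, "p3": {"p1"}}
)
print(steps, scores)  # 2 {'p1': 0.4, 'p2': 0.2, 'p3': 0.4}
```

On this toy network (p1 cites p2 and p3, p2 cites p3, p3 cites p1), the sweep reaches a fixed point at the second step. Storing only the incoming citations of each publication keeps the cost of a sweep proportional to the number of citations, which matters for a network with tens of millions of edges.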
A method for determining the gender composition of the authors of scientific publications is proposed. The conceptual diagram of the method is shown in Figure 1. The method consists of three stages.
At the preparatory stage, two citation impact scores are calculated for each scientific publication: the PageRank citation impact and the citation impact based on the number of citations.
In the first stage, the gender of each author is determined from their name using the genderize.io service [38]. This service determines, with a reported probability, whether a given first name belongs to a man or a woman. First, the first name of each author is extracted and its gender is determined. If, according to the genderize.io service, the name is male with a probability exceeding 0.9, the author is identified as a man. If the name is female with a probability exceeding 0.9, the author is identified as a woman. If the probability does not exceed 0.9, the author's gender is considered undetermined. The threshold was chosen empirically, since the gender of the author should be identified as accurately as possible. As already indicated, a small share of the authors whom the genderize.io service identifies as male or female may be non-binary; this cannot be determined from a first name.
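The threshold rule can be sketched as a small function. It assumes the genderize.io response has already been fetched and parsed into a dictionary with the service's `gender` ("male", "female", or null) and `probability` fields; no network call is made, the function name is hypothetical, and a probability of exactly 0.9 is treated as insufficient.

```python
def author_gender(record, threshold=0.9):
    """Return 'f', 'm', or None from a parsed genderize.io-style record.

    record: dict with "gender" ("male", "female", or None) and "probability".
    The gender counts as identified only if the probability exceeds the threshold.
    """
    if not record or record.get("gender") not in ("male", "female"):
        return None
    if record.get("probability", 0.0) <= threshold:
        return None  # not reliable enough: treated as undetermined
    return "m" if record["gender"] == "male" else "f"


print(author_gender({"name": "anna", "gender": "female", "probability": 0.98}))  # f
```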
In the second stage, the set of scientific publications whose authors' genders are known is divided into eight subsets (Table 1). If the gender of at least one author could not be determined, the article is assigned to the subset with an uncertain gender composition of authors. Each author of a scientific publication has a specific affiliation. Accordingly, a publication is assigned to each country in which at least one of its authors is affiliated with a higher education or scientific institution.
Table 1. Patterns of scientific publications by the gender composition of their authors.

| Pattern | Interpretation |
|---|---|
| Fff | All authors of the scientific publication are female (more than one author) |
| Mmm | All authors of the scientific publication are male (more than one author) |
| Fmm | All authors of the scientific publication are male except for the first author, who is female |
| Mff | All authors of the scientific publication are female except for the first author, who is male |
| Ffm | The authors of the scientific publication are both male and female; the first author is female |
| Mfm | The authors of the scientific publication are both male and female; the first author is male |
| F | The scientific publication has one female author |
| M | The scientific publication has one male author |

From the Citation Network Dataset, publications affiliated with countries that have different gender parity scores according to the Global Gender Gap Report 2022 [39] were selected. It is necessary to check whether the citation impact of scientific publications by authors from a given country correlates with that country's gender parity score in the Global Gender Gap Report 2022.
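The text does not specify which correlation measure is used. One option, sketched here as an assumption, is Spearman's rank correlation between a per-country citation statistic and the country's gender parity score, with tied values receiving averaged ranks. The function names are hypothetical, and the numbers in the example are invented for illustration; they are not results of this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented values for four hypothetical countries.
parity_score = [0.68, 0.72, 0.75, 0.80]  # gender parity score
impact_ratio = [1.30, 1.25, 1.10, 1.02]  # mean impact of Mmm / mean impact of Fff
print(round(spearman(parity_score, impact_ratio), 6))  # -1.0
```

A rank-based measure is a reasonable choice here because citation scores are heavily skewed and only a monotone relationship is of interest.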
To establish how the citation impact of scientific publications from different countries changes over time, the scores were also calculated for the two patterns with purely male and purely female authors.
The scientometric analysis and dataset processing were performed in Python in the Jupyter Notebook environment.

Collection of Data
The Citation Network Dataset (ver. 13), which contains 5,354,309 scientific publications and 48,227,950 citations [31] collected from DBLP [32], ACM [33], Microsoft Academic Graph [34], and other databases, was used for the scientometric analysis. This version contains publication citation data as of May 2021.
The research used data that had been partially pre-processed by other researchers. In particular, the dataset does not contain duplicate publications, and unique identifiers are assigned to each researcher and each publication. Only the authors' full names and their countries of affiliation were used in the study. The probability of spelling errors in these data is minimal. We also manually checked randomly selected data samples.
When determining the gender of an author, we avoided controversial cases: if the genderize.io service did not indicate the gender with sufficient probability, the author's gender was marked as unknown.
The patterns of the gender composition of authors are defined in Table 1; services for identifying male and female first names were used to assign publications to them. The genderize.io service was used to compile lists of male and female first names. The genderize.io service contains data on the potential gender of 114,541,298 first names from 242 countries. The authors of publications in the Citation Network Dataset have 451,052 unique first names, whose gender was determined using the genderize.io service. Among them, 86,792 names are female, 193,747 are male, and for 170,513 names the gender could not be established with a reliability of more than 90%. As a result, the gender of all authors was established for 76.6% of the publications in the selected dataset; for the remaining 23.4%, the gender of at least one author could not be established.
The Gender API service [40], which contains data on 6,084,389 first names from 191 countries, was also considered for determining the gender of authors, but it offers only 100 requests per month for free use. Therefore, it was used only as a control. Out of the 280,539 first names whose gender was determined with the genderize.io service, 100 were randomly selected and checked with the Gender API service. In all 100 cases, the identified genders coincided, which suggests that the proposed method is sufficiently reliable.
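The 100-name agreement check can be quantified. Assuming the 100 names were drawn independently and uniformly, observing 0 disagreements in $n$ checks gives an exact one-sided 95% upper bound of $1 - 0.05^{1/n}$ on the disagreement rate (the Clopper-Pearson bound for zero failures, close to the "rule of three" $3/n$). The function name is hypothetical.

```python
def zero_failure_upper_bound(n, confidence=0.95):
    """Exact one-sided upper confidence bound on a disagreement rate
    when no disagreements are observed in n independent checks."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


print(f"{zero_failure_upper_bound(100):.4f}")  # 0.0295
```

So a clean 100-name sample is consistent with a disagreement rate between the two services of up to about 3%.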
To extract an author's first name, the full name is split into words at space characters. Each word is then searched for in the list of names, ignoring letter case. If the author's first name is not in the genderize.io-based list of names, or only initials are given, the author's gender is considered undetermined. In addition to the subsets specified in Table 1, one more subset is constructed: it contains the remaining scientific publications, for which the gender of at least one author could not be established by this method (NA).
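The name lookup can be sketched as follows. This simplifies the described word-by-word search by taking only the first word of the full name as the given name (an assumption), treating a single letter, with or without a period, as an initial. The name sets stand in for the lists compiled from genderize.io; the function name and sample names are hypothetical.

```python
def first_name_gender(full_name, female_names, male_names):
    """Return 'f', 'm', or None for an author's full name.

    female_names / male_names: sets of lower-case first names that passed
    the 0.9 probability threshold.
    """
    words = full_name.split()
    if not words:
        return None
    first = words[0].rstrip(".").lower()  # case-insensitive lookup
    if len(first) <= 1:
        return None  # only an initial, e.g. "J. Smith"
    if first in female_names:
        return "f"
    if first in male_names:
        return "m"
    return None  # name absent from both lists: gender undetermined (NA)


print(first_name_gender("Anna Smith", {"anna"}, {"peter"}))  # f
```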
The author's affiliation was used to link each author to a specific country. A publication belongs to the subset of publications from a particular country if at least one of its authors is affiliated with a higher education institution in that country.
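Because a publication is assigned to every country of its authors' affiliations, one publication can appear in several country subsets. A minimal sketch, assuming hypothetical dictionaries that map publications to author ids and author ids to the country of their institution:

```python
from collections import defaultdict


def group_by_country(publications, affiliation_country):
    """Map each country to the set of publications assigned to it.

    publications: dict publication id -> list of author ids.
    affiliation_country: dict author id -> country of the author's institution.
    Authors with no known affiliation are skipped.
    """
    groups = defaultdict(set)
    for pub, authors in publications.items():
        countries = {affiliation_country.get(a) for a in authors} - {None}
        for country in countries:
            groups[country].add(pub)
    return dict(groups)


pubs = {"p1": ["a1", "a2"], "p2": ["a3"], "p3": ["a4"]}
affiliations = {"a1": "Ukraine", "a2": "Poland", "a3": "Poland"}
print(sorted(group_by_country(pubs, affiliations)["Poland"]))  # ['p1', 'p2']
```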

Dataset Features Research
For the scientometric analysis, the entire database of English-language scientific publications from 1815 to 2021 was used; however, the publications are unevenly distributed over time, and about 90% of them were published between 1998 and 2021. The number of publications in the Citation Network Dataset (ver. 13) by decade is shown in Figure 2.
The subject areas of the publications in this database were studied separately. The bulk of the publications belongs to subject areas such as computer science, artificial intelligence and artificial neural networks, mathematics and discrete mathematics, optimization and combinatorics, and software engineering. The cloud of subject areas is shown in Figure 3. This study analyzed the data as a whole, without splitting it by subject area. For the visualization, data on subject areas comprising more than 200,000 publications were selected. Membership in a subject area was determined by the FOS parameter of the Citation Network Dataset (Table 2). It should be noted that a scientific publication can belong to several subject areas simultaneously.
Publications 2023, 11, x FOR PEER REVIEW   It can be assumed that, depending on the subject area to which scientific publications belong, the gender composition of the authors of these publications may differ.In addition, citing such publications from various subject areas may have certain features.However, this is a separate research task requiring more data from other subjects.
The subject areas in this dataset had already been defined by the authors of the study published in [35]. Some of these subject areas may be part of other, more general subject areas; for example, artificial intelligence can be considered a subfield of computer science.

The Results of the Calculation of PageRank Citation Impact Index and Citation Impact Index by the Number of Citations
For the Citation Network Dataset, the citation impact of each publication was calculated both by the PageRank method and by the number of citations. The required accuracy of the iterative PageRank computation was set to $\varepsilon = 10^{-4}$. The maximum relative change in the PageRank citation impact of a scientific publication between successive iterations is taken as an upper estimate of the absolute error of the method. After six iterations of Equation (7), the absolute error was $\Delta_6 = 2.48 \times 10^{-5}$. Since $\Delta_6 < \varepsilon$, this accuracy was considered sufficient, and the calculation was stopped. A citation score was also calculated to assess the impact of scientific publications based on their citations in other publications. In this method, all scientific publications in the database are scanned, and the number of times each publication is cited by the others is recorded; this number determines the citation impact of the publication.
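The iteration and its stopping rule can be illustrated with a small sketch. Equation (7) itself is not reproduced in this section, so the code assumes the standard PageRank update with a damping factor d = 0.85 and uniform redistribution of the score of publications that cite nothing; both are assumptions, not necessarily the settings used in this study. The loop stops when the maximum relative change of any score falls below $\varepsilon = 10^{-4}$, mirroring the error estimate described above, and the citation score is simply the number of distinct publications citing a given one. The function names `pagerank` and `citation_counts` are hypothetical.

```python
from typing import Dict, List, Tuple

# Citation graph: each publication id maps to the ids of the publications it cites.
Graph = Dict[str, List[str]]


def pagerank(citations: Graph, d: float = 0.85, eps: float = 1e-4,
             max_iter: int = 100) -> Tuple[Dict[str, float], int, float]:
    """Power iteration for PageRank (assumed standard form, d = damping factor).

    Stops when the maximum relative change of any score between two
    iterations is below eps; returns (scores, iterations, last change).
    """
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    n = len(nodes)
    score = {p: 1.0 / n for p in nodes}
    delta = float("inf")
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Score of publications without references is spread uniformly.
        dangling = sum(score[p] for p in nodes if not citations.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in nodes}
        for citing, refs in citations.items():
            if refs:
                share = d * score[citing] / len(refs)
                for cited in refs:
                    new[cited] += share
        # Maximum relative change, used as the upper error estimate.
        delta = max(abs(new[p] - score[p]) / score[p] for p in nodes)
        score = new
        if delta < eps:
            break
    return score, iteration, delta


def citation_counts(citations: Graph) -> Dict[str, int]:
    """Citation score: number of distinct publications citing each one."""
    counts = {p: 0 for p in citations}
    for refs in citations.values():
        for cited in set(refs):
            counts[cited] = counts.get(cited, 0) + 1
    return counts


# Toy citation graph (illustrative only).
graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C", "A"]}
scores, iterations, error = pagerank(graph)
counts = citation_counts(graph)
print(iterations, error < 1e-4, counts)
```

Because the score of non-citing publications is redistributed, the PageRank scores always sum to 1, so they can be compared directly across subsets after normalization.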
After the citation scores and impact of all publications in the dataset were calculated, the publications from the countries for which the research hypothesis is tested were selected. Next, the gender of the authors of these publications was determined using the genderize.io service. As a result, the gender of all authors was established for 76.6% of publications; for the remaining 23.4%, the gender of at least one author could not be determined. For each country, publications were divided into subsets according to the patterns described in Table 2. Table 3 shows the number of scientific publications whose authors are affiliated with the 12 selected countries; data for all countries are given in Appendix A. Two countries with a small number of publications in the Citation Network Dataset were included in this table for comparison with countries that have a significantly higher number of publications.
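The assignment of a publication to a gender-composition subset can be sketched as follows. The exact pattern definitions are given elsewhere in the paper and are not repeated in this section, so the rule below is a hypothetical reading consistent with the pattern names used here: an upper-case letter for the first author's gender, followed by ff, mm, or fm when the co-authors are all female, all male, or mixed, and a single letter for single-author papers. The name lookup stands in for pre-fetched results of a name-to-gender service such as genderize.io; its structure and the 0.8 probability threshold are assumptions. Any author whose gender is missing or below the threshold places the publication in the N/A subset.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-fetched lookup: lower-cased first name -> (gender, probability).
NameLookup = Dict[str, Tuple[Optional[str], float]]


def author_gender(first_name: str, lookup: NameLookup,
                  threshold: float = 0.8) -> Optional[str]:
    """Return 'f' or 'm', or None if the gender is unknown or too uncertain."""
    gender, probability = lookup.get(first_name.lower(), (None, 0.0))
    if gender not in ("female", "male") or probability < threshold:
        return None
    return gender[0]


def gender_pattern(first_names: List[str], lookup: NameLookup,
                   threshold: float = 0.8) -> str:
    """Map an ordered author list to a pattern such as 'F', 'Mmm' or 'Ffm'."""
    genders = [author_gender(name, lookup, threshold) for name in first_names]
    if not genders or None in genders:
        return "N/A"  # at least one author's gender could not be established
    first = genders[0].upper()
    coauthors = set(genders[1:])
    if not coauthors:
        return first            # single-author publication
    if coauthors == {"f"}:
        return first + "ff"     # all co-authors female
    if coauthors == {"m"}:
        return first + "mm"     # all co-authors male
    return first + "fm"         # mixed co-authors


# Illustrative lookup values, not real service output.
lookup = {"anna": ("female", 0.98), "peter": ("male", 0.99),
          "andrea": ("female", 0.55)}
print(gender_pattern(["Peter", "Anna", "Peter"], lookup))  # Mfm
```

With these illustrative values, a paper with an author named Andrea falls into N/A because the probability is below the threshold, which is the situation discussed in the limitations.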
Statistical characteristics were calculated for the PageRank citation impact scores of the scientific publications: range (R), mean (M), variance (V), the number of publications outside 3σ (the number of outliers, NO), and the mean without outliers (MwO).
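These characteristics can be computed for one subset of scores with Python's standard `statistics` module. This is a minimal sketch, assuming the population variance and treating a publication as an outlier when its score lies more than 3σ from the mean; the text does not state which variance estimator was used, and the function name `describe` is hypothetical.

```python
import statistics
from typing import Dict, List


def describe(scores: List[float]) -> Dict[str, float]:
    """NP, R, M, V, NO and MwO for the PageRank scores of one subset."""
    mean = statistics.fmean(scores)
    variance = statistics.pvariance(scores, mu=mean)
    sigma = variance ** 0.5
    # Publications within 3 sigma of the mean are kept for MwO.
    inliers = [s for s in scores if abs(s - mean) <= 3 * sigma]
    return {
        "NP": len(scores),
        "R": max(scores) - min(scores),
        "M": mean,
        "V": variance,
        "NO": len(scores) - len(inliers),
        "MwO": statistics.fmean(inliers),
    }


# Illustrative scores: twenty typical values and one extreme value.
stats = describe([1.0] * 20 + [100.0])
print(stats["NO"], stats["MwO"])  # 1 1.0
```

By Chebyshev's inequality at most one ninth of the scores can lie outside 3σ, so the inlier list is never empty and MwO is always defined.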
The calculations showed that the number of outliers is small compared with the number of publications (NP) for each pattern. The mean without outliers is lower than the mean with outliers, but the ratios of the calculated values between different patterns are preserved. The dataset was also examined for how evenly publications are distributed across the subsets defined by the patterns. For this, the normalized Shannon entropy was calculated using the formula:

$$H = -\frac{1}{\ln W}\sum_{v=1}^{W} \frac{m_v}{m}\ln\frac{m_v}{m},$$

where $H$ is the normalized Shannon entropy, $m_v$ is the cardinality of the $v$-th subset of scientific publications (the subsets defined by the patterns in Table 2, plus the subset N/A for which the gender composition of the authors could not be determined), $v = 1, \dots, W$, $W = 9$, and $m$ is the total number of publications. For the Citation Network Dataset (ver. 13), $H = 0.7197$. This value indicates sufficient representativeness of the sample; measuring representativeness against the distribution of the overall population is beyond the scope of this research. The gender composition of the authors of scientific publications by country is given in Table 4. For most countries, the subsets defined by patterns Mmm and M contain more publications than the subsets defined by patterns Fff and F.
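The normalized entropy can be checked numerically with a short sketch. It assumes natural logarithms (the base cancels after normalization by $\ln W$) and treats empty subsets as contributing zero; the subset sizes in the usage line are made up for illustration. $H$ equals 1 when all $W$ subsets are equally large and 0 when all publications fall into a single subset.

```python
import math
from typing import Sequence


def normalized_shannon_entropy(subset_sizes: Sequence[int]) -> float:
    """H = -(1 / ln W) * sum_v (m_v / m) * ln(m_v / m)."""
    w = len(subset_sizes)   # W: number of subsets (8 patterns + N/A)
    m = sum(subset_sizes)   # m: total number of publications
    h = 0.0
    for m_v in subset_sizes:
        if m_v > 0:         # empty subsets contribute 0 (limit of p ln p)
            p = m_v / m
            h -= p * math.log(p)
    return h / math.log(w)


# Hypothetical subset sizes for the nine subsets.
print(normalized_shannon_entropy([500, 300, 100, 50, 20, 10, 10, 5, 5]))
```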
The requirements of the project under which the study was carried out included research data on Kazakhstan and Ukraine. The samples of articles for Kazakhstan and Ukraine are not representative, but the general trend in the gender composition of the authors is visible. For each subset corresponding to a pattern of gender composition, for the subset with an undetermined gender composition of authors, and for each selected country, the citation impact of scientific publications was calculated by the PageRank method and by the number of citations. The normalized citation impact scores are given in Tables 5 and 6. On average, the pairwise comparison of publications from the selected countries across the pattern-defined subsets indicates that scientific publications with a male first author or a predominantly male author composition have a higher citation impact than publications whose authors are primarily female (Table 7). This trend holds both for citation impact estimated by the number of citations and for citation impact estimated by the PageRank method. It was also found that, for most of the selected countries, the maximum number of citations of a publication in the subset with pattern Mmm is higher than in the subsets with other patterns of gender composition. A negative value in Table 7 indicates that the advantage between the two compared subsets is reversed. The closer a preference value in Table 7 is to zero, the smaller the bias in the citation estimates, i.e., the weaker the effect of gender composition; in that case, publications by male and female authors are evaluated approximately equally.
To understand how these preferences change over time, the relative PageRank citation impact scores were calculated separately for the period up to 2010 and for the period from 2010 to 2021. The preference value was defined as the difference between the average normalized scores of the two compared patterns divided by the larger of the two values. The trend in score changes was also examined, and the PageRank citation impact was determined for each pattern. Figure 4 shows how the preference estimates F ≺ M and Fff ≺ Mmm changed for different countries, computed from the four subsets defined by patterns F, M, Fff, and Mmm. These subsets were selected explicitly to highlight scientific publications with a purely male or purely female composition of authors. For the comparisons Ffm ≺ Mfm, Fmm ≺ Mff, Fmm ≺ Mmm, and Fff ≺ Mff, as can be seen from Table 7, the preferences vary across countries and also change between periods. Such results may be connected to many socioeconomic factors, such as the representation of women in science, cultural characteristics, etc. As can be seen from Figure 4, over the last decade the citation impact of publications with a purely male composition of authors decreased relative to that of publications with a purely female composition of authors. In most countries, the last decade has seen an increase in the influence of women in science and in the representation of women in scientific research published in leading scientific journals. However, a state of equilibrium, i.e., preference estimates approaching zero, has not yet been reached in any country.
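The preference estimate and its split into two periods can be sketched as below. The sign convention (positive when the second pattern's mean exceeds the first's, so that a negative value means the advantage is reversed) follows the description of Table 7; assigning the year 2010 itself to the later period and the record format `(year, pattern, score)` are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, str, float]  # (publication year, pattern, normalized score)


def preference(mean_a: float, mean_b: float) -> float:
    """Preference A ≺ B: (mean_b - mean_a) / max(mean_a, mean_b)."""
    top = max(mean_a, mean_b)
    return 0.0 if top == 0 else (mean_b - mean_a) / top


def mean_by_pattern(records: Iterable[Record]) -> Dict[str, float]:
    """Average normalized score of each pattern-defined subset."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for _, pattern, score in records:
        groups[pattern].append(score)
    return {pattern: fmean(scores) for pattern, scores in groups.items()}


def preference_by_period(records: List[Record], a: str, b: str,
                         split_year: int = 2010) -> Dict[str, float]:
    """Preference a ≺ b before split_year and from split_year onwards."""
    periods = {
        "before": [r for r in records if r[0] < split_year],
        "after": [r for r in records if r[0] >= split_year],
    }
    result = {}
    for name, subset in periods.items():
        means = mean_by_pattern(subset)
        result[name] = preference(means[a], means[b])
    return result


# Illustrative records only.
records = [(2005, "F", 0.2), (2005, "M", 0.4), (2015, "F", 0.3), (2015, "M", 0.4)]
print(preference_by_period(records, "F", "M"))
```

In this toy example the F ≺ M preference drops from 0.5 to about 0.25, the kind of movement toward zero that Figure 4 tracks.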
Preference estimates between subsets with different patterns, based on the calculated citation impact, can indicate whether females and males have equal opportunities to participate in scientific projects and publish high-quality scientific articles. It can be assumed that in developed countries the preference estimates F ≺ M and Fff ≺ Mmm will be close to zero. This would mean that publications by female and male authors are cited equally and, accordingly, that the representation of females and males in science is equally high.
Table 8 shows the pairwise comparison of relative PageRank citation impact scores of scientific publications from different research areas according to the defined patterns. The scores are given for the research areas represented by the largest number of publications in the dataset. The research hypothesis is confirmed for all of these areas.

Findings
The estimates of citation impact may, to some extent, reflect the productivity of the authors of these publications. The more an author's publications are cited, the more often the author publishes in the best scientific journals. Accordingly, such an author tends to advance faster in a scientific career and is more often invited to participate in scientific projects. This creates a self-reinforcing "closed circle" effect: if an author's publications are poorly cited, the author's career growth will be slower.
Since two citation impact assessment methods were used, the correlation coefficient between the two sets of estimates was calculated to compare them. The correlation coefficient between the PageRank estimates and the number of citations is 0.754; for non-zero scores only, it is 0.647. This suggests that the two methods provide related but not functionally dependent estimates. Because relative estimates are used for comparison, the different numbers of scientific publications in the subsets defined by different patterns affect the result.
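A sketch of this comparison follows. The text does not name the correlation measure, so Pearson's coefficient is assumed, and "non-zero scores" is interpreted here, as an assumption, as publications with at least one citation. The function names and the values in the usage line are hypothetical and illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_methods(pr_scores: List[float],
                    citations: List[int]) -> Tuple[float, float]:
    """Correlation over all publications and over cited publications only."""
    r_all = pearson(pr_scores, citations)
    cited = [(p, c) for p, c in zip(pr_scores, citations) if c > 0]
    r_nonzero = pearson([p for p, _ in cited], [c for _, c in cited])
    return r_all, r_nonzero


print(compare_methods([0.1, 0.2, 0.3, 0.05], [1, 2, 3, 0]))
```

A coefficient clearly below 1 on real data, as reported above, means that PageRank reorders publications relative to raw citation counts, because it also weights who cites them.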
Many studies, for example [41], indicate that the participation of females in science is complicated, mainly due to pregnancy, the need to devote more time to raising children, and the greater representation of males in the management of scientific projects. Even a short pause in scientific activity can affect career growth, the publication of high-quality scientific papers, participation in scientific projects, etc. These effects may be more acute in some cultures and depend on the socioeconomic status of a country. Accordingly, progress in this area depends on ensuring gender equality in the country.
Based on the results, it can be concluded that scientific publications by male authors are cited more; accordingly, the scientific publication productivity of these authors will be higher. The citation impact of a scientific publication was found to depend on the gender composition of its authors. This means that the gender composition of scientific teams working on joint research affects their scientific publication productivity. The advantage of publications with a male author composition over publications with a female author composition points to gender inequality: under these conditions, the scientific publication productivity of female authors will be lower than that of male authors. These results confirm those published in [42], where, using coarsened exact matching, the authors showed that publications by women are cited less by Wikipedia than expected and are less likely to be cited than publications by men.
However, the dynamics of the preference estimates for the pattern-defined subsets in the ten countries best represented in the Citation Network Dataset show an overall improvement in gender equality in science.
The citation impact scores of scientific publications by authors from the selected countries were also compared with the countries' gender parity scores according to the Global Gender Gap Report 2022 [39]. The correlation coefficient is −0.168, which indicates a weak negative correlation. This can be explained by the fact that the gender parity score covers all aspects that affect gender equality in a country, whereas this study considers only scientific activity, and specifically one of its components: publication activity and the citation of scientific publications. In addition, many other socioeconomic and cultural factors influence the equal representation of females and males in science and their scientific results.

Limitations and Future Research Lines
A limitation of the study is that most publications in the Citation Network Dataset belong to the natural sciences. Accordingly, the representation of scientific publications in the social sciences and humanities is limited. For publications in other subject areas, the estimates of citation impact may differ from those calculated in this research. Note also that the number of citations to scientific publications from some countries may influence the results. A small number of outliers was found in the dataset. However, the calculated statistical characteristics suggest that these outliers do not affect the PageRank scores for countries with a sufficiently large amount of data, although they can affect the PageRank scores of countries for which the database contains insufficient data.
The most common gender for a name may differ across countries. For example, Andrea is typically a female name in the U.S. and a male name in Italy. Depending on the probability threshold used for name-based gender identification, such authors may be classified as having an unknown gender. In future work, affiliation data will be combined with gender imputation to improve accuracy.
Another limitation is that non-binary authors cannot be identified, since each author was classified as male or female based on their first name.
The more citations an article receives over time, the higher its influence and the higher its authors' productivity. Accordingly, one direction of future research is to assess how the organization of project teams with different gender compositions affects the productivity of each team member and the results of the team as a whole. Another essential aspect of future research is tracking the dynamics of the preference estimates between subsets defined by the corresponding patterns. In addition, these patterns can be considered patterns of scientific collaboration, which could serve as a separate indicator for assessing gender equality in scientific activity in different countries, regions, universities, etc. The research aims to inform countries, universities, and scientific institutes about problems related to gender gaps in science and to find ways to overcome them.

Conclusions
This work analyzed the citation impact of scientific publications by authors with different gender compositions. The citation impact of scientific publications was evaluated by the PageRank method and by the number of citations. The citation impact was calculated for different countries for eight subsets of publications corresponding to the patterns of the gender composition of their authors. The citation score was also calculated for publications whose author gender composition could not be identified. Preference estimates between the subsets corresponding to different patterns were calculated.
Based on the Citation Network Dataset, the results of the citation impact evaluation indicate that the citation impact of publications with mostly male authors prevails over that of publications with a mixed gender composition, which in turn prevails over that of publications with an only female composition. That is, articles by mainly male authors are cited more than articles with a mixed or female composition of authors. The analysis of preferences over time indicates that in the last decade the influence of the authors' gender composition on the citation impact of their publications has decreased. This may be the result of gender equality policies in many countries. However, the obtained results still confirm the existence of gender inequality in science, which may result from cultural and socioeconomic factors or natural homophily.
The obtained results can be considered more broadly. Author groups are often stable, and the same groups publish multiple papers in their research area. This means that the citation scores obtained for publications with different gender compositions of authors can be interpreted as an assessment of the productivity of scientists with different gender patterns in scientific collaborations in different countries. This is important for intensifying the debate on ensuring gender equality and overcoming gender gaps in science. An increase in the publishing activity of authors contributes to the growth of the scientific productivity of the institutions with which they are affiliated. The obtained results do not by themselves indicate gender-based discrimination; rather, they reveal peculiarities in the citation of scientific publications with different gender compositions of authors. The intensity of citation of such publications can, however, be influenced by various socioeconomic, cultural, and other factors.
Appendix A (Tables A1-A3) presents the sizes of the subsets of publications that correspond to the patterns of their authors' gender composition, as well as the average normalized citation impact scores of scientific publications, calculated by the PageRank method and by the number of citations, for countries with more than 100 affiliated authors.

Figure 1 .
Figure 1.Conceptual diagram of the method of determining the gender composition of authors of scientific publications [22].

Figure 2 .
Figure 2. Number of publications by decade in the Citation Network Dataset.

Figure 3 .
Figure 3. Distribution of publications by subject area in the Citation Network Dataset.

Figure 4 .
Figure 4. Change in the values of the preference estimates F ≺ M and Fff ≺ Mmm for different countries.

Table 2 .
Number of scientific publications by subject area in the Citation Network Dataset (only subject areas with more than 200,000 publications are shown).

Table 3 .
Descriptive statistics for the PageRank citation impact scores of the scientific publications.

Table 4 .
Gender composition of authors of scientific publications by specified countries.

Table 5 .
Normalized relative citation impact scores of scientific publications, determined by the number of citations.

Table 6 .
Normalized relative PageRank citation impact scores of scientific publications.

Table 7 .
Pairwise comparison of relative PageRank citation impact scores of scientific publications from different subsets according to the defined patterns.

Table 8 .
Pairwise comparison of relative PageRank citation impact scores of scientific publications from different research areas according to the defined patterns.

Table A1 .
Sizes of the subsets of publications that match the patterns of their authors' gender composition (data for countries with more than 100 authors).

Table A2 .
Average normalized PageRank citation impact scores of scientific publications for countries with more than 100 affiliated authors.