Currently, the global economy is highly dependent on software, and it is becoming more common for companies whose end product is not software to carry out software development in support of their business [1]. Organizations, in general, are structured to meet the needs of the market and to adapt to innovations in technology.
The closed development model, in which a business owns the intellectual property of the software and is fully responsible for its development, was the predominant model in the software development industry in the early 2000s [1]. However, this predominance did not last long: within a short period of time, the open source software (OSS) development model, “where the company shares the software development code” [2], became part of the routine of software development companies.
Major companies in the information technology industry are opting to open source their strategic products in order to gain a competitive advantage. Among them, Amazon, Google, Facebook and Microsoft have embraced open source for their artificial intelligence solutions. The decision to open source this kind of software, which can give a company a considerable competitive advantage, may not seem to promise a financial return. However, what these companies likely expect in return is the opportunity to be the basis for future innovation [3].
Multiple studies show that women are underrepresented in almost all fields of Science, Technology, Engineering and Maths (STEM) [4]. This gender gap is also present at higher education institutions, in both student numbers and academic staff [5]. A growing number of studies address women in male-dominated fields. Women have entered many other previously male-dominated fields, including other STEM fields, but not computer science and engineering [6]. While social capital is beneficial to the long-term engagement of both sexes, women appear in small numbers on most software development teams [7]. Therefore, understanding the profile of women working in open source projects and identifying the factors that influence their permanence and engagement in teams can help pinpoint problems or bottlenecks that may be hindering women’s participation.
LinkedIn’s 2018 diversity report shows that women represent 42.9% of its workforce and 39.1% of leadership positions, a 12% increase in the last two years [8]. Despite this increase, women represent 21.8% of the technology area, while men represent 78.2%. Some organizations aim to change this underrepresentation by adopting programs committed to a set of goals to increase women’s workforce participation and create a more inclusive culture. For example, a recent Google diversity report shows that the hiring of women has grown to 33.2% [9], an increase of more than 1.9% over the previous report. In particular, the concentration of women in non-tech areas increased to 47.2%, an increase of more than 3.3 points. Although the hiring of women has increased, hiring for leadership positions has decreased by 25.9% (points) from the previous report [9].
In this context, this paper proposes a systematic literature review (SLR) of the main databases of scientific works in the computing field, with the aim of discovering factors that can help increase women’s interest in OSS projects and software development projects.
The main findings of this research were that women are underrepresented in software development projects and open source software projects. The main causes of this underrepresentation may be related to the workplace, where there is an implicit gender bias on the part of men towards women. In addition, women are more likely to abandon their careers in technology. We have not found in the literature a model for increasing women’s engagement in software development projects. An analysis of existing initiatives to increase women’s participation in STEM needs to be undertaken, as studies show that current solutions are not having a positive effect. Increasing the number of women in STEM may lead to an increase in the number of women in open source software projects and software development projects.
We see varying representation from men and women in different developer roles on our survey. All categories have dramatically more developers who identify as men than women but the ratio of men to women varies. Developer types above the line have respondents that are more likely than average to be men, and those below the line have respondents who are more likely than average to be women. Developers who are data scientists or academic researchers are about 10 times more likely to be men than women, while developers who are system admins or software development and information technology operations (DevOps) specialists are 25–30 times more likely to be men than women. Women have the highest representation as front-end developers, designers, data scientists, data analysts, quality assurance (QA) or test developers, scientists, and educators.
The remainder of this paper is organized as follows: Section 2 presents the concepts relevant to the research and related works. Section 3 presents the conduct of the systematic literature review. In Section 4, we present the results obtained from the systematic literature review. Section 5 presents an analysis and discussion of the results, as well as limitations and threats to validity. Lastly, in Section 6, we present our conclusion and future work.
According to UNESCO [10], gender equality exists when women and men enjoy the same status and have equal conditions, treatment, and opportunities to realize their full potential and human rights, and when they contribute to and benefit from economic, social, cultural and political development [10]. Gender diversity deals with the equal representation of men and women in the workplace.
Research has shown that diverse software development teams are directly associated with increased productivity in projects [11]. A diverse team can be described as one composed of members of different ages, genders, and levels of experience and knowledge in the area.
A more diverse software development team is closer to the end users of the software, as these users are typically quite diverse. Therefore, a more diverse development team is better able to understand and represent the end users’ needs, contributing to an appropriate alignment between the software produced and its possible users [13].
Team diversity is a factor that can be explored to a great degree in open source software development projects, given that a decentralized team working with the common goal of creating open source software is a characteristic of this kind of project. Generally, in this type of environment, participants show a considerable variety of cultures, languages and habits, and are often geographically dispersed [11].
The scope of the open source definition goes beyond plain access to the source code of the product or system being developed. According to the Open Source Initiative (OSI), it is a set of criteria to be followed for software distribution, such as [2]:
Free redistribution: The license must not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license must not require royalties or other fees for such sale;
Source code: The program must include the source code and must allow distribution in both source code and compiled form. If the product is not distributed with the source code, there must be a well-publicized means of obtaining the source code at no more than a reasonable reproduction cost, preferably by downloading it over the internet. The code must be readable and intelligible so that any programmer can modify the program;
Derived works: The license must allow modifications and derived works, and must allow them to be distributed under the same license terms as the original software;
Integrity of the author’s source code: The license must explicitly allow the distribution of programs built from modified source code, and may require that derived programs carry a different name or version number from the original software;
Non-discrimination against persons or groups: The license cannot discriminate against any person or group of persons;
Non-discrimination against areas of practice: The license should not restrict anyone from using the program in a specific field of activity;
Distribution of the license: The rights associated with the program must apply to all those to whom the program is redistributed, without the need for these parties to execute an additional license;
Non-specific license for a product: The rights associated with the program must not depend on the program being part of a specific software distribution. If the program is extracted from this distribution and used or distributed within the terms of the program license, all parties to whom the program is redistributed must have the same rights as those granted in connection with the distribution of the original software;
License not restricted to other software: The license must not place restrictions on other software that is distributed with the licensed software;
Technology-neutral license: No license clause may depend on any individual technology or style of interface.
When software products meet the distribution criteria and are classified as open source software, a number of advantages can be obtained, such as increased creativity in development, improvements in product quality, a decrease in production cost, and faster identification of failures [14].
3. Systematic Literature Review
This work presents the conduct of a systematic literature review (SLR). An SLR is a way of identifying, analyzing and interpreting the available evidence related to a particular research question, area or phenomenon of interest [24]. Studies that contribute to an SLR are called primary studies, and a systematic review is considered a form of secondary study. During the execution of this work, we followed the phases of Planning, Conduction, and Publication of Results [24]:
Planning: Aims to identify the real need for the SLR, in other words, the motivation for carrying out the research [25]. This phase comprises three main activities: defining the objective; preparing the protocol that will guide the SLR, in order to minimize researcher bias; and evaluating this protocol by testing it against the databases;
Conduction: Applies the search strategy in order to identify and select studies according to the protocol defined in the planning phase. From the set of selected studies, the data needed to compose the results of the work are extracted and synthesized [25];
Publication of Results: Prepares the final SLR documentation, containing the description of the results and the answers to the research questions defined in the work protocol. The results, where possible, should be disclosed to potential participants [26].
The StArt tool (State of the Art through Systematic Review) [28] assisted in the planning and conduction phases of the systematic literature review. The SLR was carried out with the objective of mapping the problems that cause women’s lack of interest and possible solutions to these problems, as well as identifying the profile of women who work on software development projects, both open source and otherwise.
3.1. Planning of the SLR
Table 1 presents the research questions (RQs) that guide this systematic literature review.
3.2. Research Strategy
The search strategy involved the use of Automatic Search [29], which consists of searching databases with a search string, followed by Manual Search [29], in which papers in the proceedings of conferences and congresses or in specific journals were searched. Automatic Search was performed in three databases, selected for their considerably high volume of papers published in journals and conferences in the area of Information and Communication Technology: the IEEE Digital Library, the ACM Digital Library, and the DBLP Computer Science Bibliography.
3.3. Selection Criteria (Inclusion and Exclusion)
The following selection criteria were defined for the selection of primary studies:
The work must be available in the previously defined digital libraries;
The study must have been written in English or Portuguese and published between 2007 and 2019;
The work must be related to the area of Information Technology or Computing;
The study must be related to open source software projects and gender diversity or software development project and gender diversity;
The work may be classified as gray literature, namely technical reports, preliminary studies, technical specifications, or official documents of specific organizations [25].
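The inclusion criteria above can be read as a single predicate applied to each candidate study. The following is a minimal sketch under stated assumptions: the `Study` record and its field names are hypothetical and are not part of the review protocol or of the StArt tool; only the accepted languages and the 2007 to 2019 window come from the criteria themselves.

```python
from dataclasses import dataclass

# Hypothetical record type: every field name here is an assumption
# made for illustration, not something defined by the protocol.
@dataclass
class Study:
    title: str
    language: str                # e.g. "English" or "Portuguese"
    year: int                    # year of publication
    in_selected_library: bool    # found in one of the chosen digital libraries
    is_computing: bool           # related to Information Technology or Computing
    covers_gender_diversity: bool
    covers_oss_or_software_dev: bool

ACCEPTED_LANGUAGES = {"English", "Portuguese"}
FIRST_YEAR, LAST_YEAR = 2007, 2019


def meets_inclusion_criteria(study: Study) -> bool:
    """Return True when a study satisfies every inclusion criterion.

    Gray literature is permitted by the criteria, so the type of
    publication is deliberately not used as a filter here.
    """
    return (
        study.in_selected_library
        and study.language in ACCEPTED_LANGUAGES
        and FIRST_YEAR <= study.year <= LAST_YEAR
        and study.is_computing
        and study.covers_gender_diversity
        and study.covers_oss_or_software_dev
    )
```

Expressing the criteria as a conjunction makes explicit that failing any one of them is enough to exclude a study.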
As exclusion criteria, failure to meet any of the inclusion criteria was considered, as well as:
Regarding the quality evaluation of the identified studies, after the search strategy is executed, the most relevant papers for the SLR are selected through the four study-selection steps defined by [29] and shown in Figure 1:
Step 1: Execution of the search strategy involving the automatic search. A preliminary list of studies should be generated for each database used, and the reference file (with the .bib extension) should be downloaded to enable SLR management with the aid of the StArt tool. Duplicate entries should be deleted by the tool while importing the file;
Step 2: Identification of potentially relevant studies, based on reading the title and abstract. In this step, it should be possible to rule out studies that are clearly irrelevant to the research. In case of doubt about whether a study should remain in the SLR, the next step may help to decide;
Step 3: Reading of the introduction, methodology and conclusion of the pre-selected works, again applying the inclusion and exclusion criteria;
Step 4: The papers selected in Step 3 should be read in full and the volume of papers resulting from this step should be used to compose the SLR and support the answers to the research questions.
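The duplicate removal mentioned in Step 1 can be illustrated with a short sketch. The source does not describe how the StArt tool detects duplicates, so matching on a normalized title is an assumption made here purely for illustration, and the entry dictionaries stand in for parsed .bib records.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace runs,
    so that trivially different spellings of the same title compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_duplicates(entries):
    """Keep the first entry seen for each normalized title.

    `entries` is a list of dicts with at least a "title" key
    (a hypothetical stand-in for parsed .bib records).
    """
    seen = set()
    unique = []
    for entry in entries:
        key = normalize_title(entry["title"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

Title matching alone can miss duplicates whose titles differ between databases, so a real import would likely also compare fields such as the year or the author list.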
3.4. Conduction of the SLR
The automatic search for primary studies, which constitutes Step 1 of the selection strategy, was executed in the previously defined databases with the string defined in the protocol. The automatic search on the three databases resulted in a total of 61 papers, of which 29 (47.5%) came from the IEEE Digital Library (http://ieeexplore.ieee.org), 19 (31.2%) from the ACM Digital Library (http://dl.acm.org/) and 13 (21.3%) from the DBLP Computer Science Bibliography (http://dblp.uni-trier.de/). It is important to note that five studies were identified as duplicates and were eliminated during the initial load into the StArt tool.
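The per-database shares can be recomputed directly from the reported counts. The sketch below uses only the counts given in the text; the dictionary keys are shortened labels chosen for illustration. Plain rounding of 19/61 to one decimal place gives 31.1% rather than the reported 31.2%, a difference that is presumably a rounding adjustment so that the three shares sum to 100%.

```python
# Number of papers returned by the automatic search, per database,
# as reported in the text.
counts = {
    "IEEE Digital Library": 29,
    "ACM Digital Library": 19,
    "DBLP": 13,
}

total = sum(counts.values())

# Share of each database as a percentage of all retrieved papers.
shares = {name: 100 * n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} papers ({share:.1f}%)")
print(f"Total: {total} papers")
```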
The execution of Step 2 of the selection strategy, which consisted of reading the title and abstract of the papers, reduced the volume of pre-selected papers to 34 articles. During this step, the information required by the StArt tool was completed, since only the IEEE Digital Library provided all of the required data (Author, Title, Keywords, Journal, Abstract and Year) in its BibTeX file.
After the execution of the fourth and last step of the selection strategy, which involved the complete reading of the articles, 24 primary studies were selected to be used in the data extraction.
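The reduction across the selection steps can be summarized as a simple funnel. This is a sketch that uses only the counts reported in the text (61, 34 and 24); the number of papers remaining after Step 3 is not reported, so that step is omitted rather than guessed, and the function name is hypothetical.

```python
# Counts reported in the text. The count after Step 3 is not given
# in the source, so it is left out of the funnel.
funnel = [
    ("Step 1: automatic search", 61),
    ("Step 2: title and abstract", 34),
    ("Step 4: full reading", 24),
]


def retention(steps):
    """For each step, return (name, count, % of initial, % of previous step)."""
    initial = steps[0][1]
    previous = initial
    rows = []
    for name, count in steps:
        rows.append((
            name,
            count,
            round(100 * count / initial, 1),
            round(100 * count / previous, 1),
        ))
        previous = count
    return rows


for name, count, of_initial, of_previous in retention(funnel):
    print(f"{name}: {count} papers, {of_initial}% of initial, {of_previous}% of previous")
```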
Figure 2 presents the number of works remaining at each step of the article selection strategy, accumulated per year of publication. Figure 2 shows that researchers’ interest in the participation of women in open source projects has been growing over the last 10 years, with an increase in the number of publications in the last 2 years.
3.5. Data Extraction
The primary studies selected in the SLR are presented in Table 2, which is composed of the following fields: Primary Study, a column identifying the selected studies; Reference, which identifies the author of each work; and RQ, which indicates the research question answered with the help of the identified work.
5. Comparison of Results with Existing Evidence
In order to analyze women’s participation in software projects, we make use of results from the Google Summer of Code (GSoC), a global program focused on introducing students to open source software development. Since its inception in 2005, the program has brought together more than 14,000 student participants and 24,000 mentors from over 118 countries worldwide [48]. Figure 3 presents a summary of GSoC participants in the last three years (2016, 2017 and 2018), organized by gender (male or female) and type of developer (mentor or student).
Analyzing only the data on the gender of GSoC participants from 2016 to 2018, as shown in Figure 4, there is minor variation between the years, but the share of women remains close to 11.13% of the total number of participants. This reaffirms the results found in the articles of the systematic literature review, in which the percentage of women contributing to software projects is less than 12%.
Considering only the type of GSoC participant (student or mentor) from 2016 to 2018, presented in Figure 5, there was minimal variation between the years, with mentors remaining close to 68% of the total number of participants in the last three editions. It is worth noting that women accounted for 8.43% of total mentors in 2016, 13.60% in 2017 and 12.25% in 2018, as can be seen in Figure 3.
Stack Overflow’s annual Developer Survey is the largest and most comprehensive survey of individuals who code throughout the world. Each year, Stack Overflow sends out a survey covering everything from developers’ preferred technologies to their job priorities [49]. From the survey results, it is possible to understand the profile of women who work with software development in several countries. The percentage of women in software development is higher only when their experience is between 5 and 9 years, 34.0%. When we look at differences in years since learning to code by gender, we see evidence for the shifting demographics of coding as a profession, as well as retention problems in the tech industry for underrepresented groups. Research shows, for example, that women leave jobs in tech at higher rates than men, as shown in Figure 6. Among our respondents, both in the United States and internationally, women are about twice as likely as men to have three years of coding experience or less. Companies interested in building a diverse developer workforce that is more reflective of society should focus on retention of their senior developers from underrepresented groups, along with thoughtful hiring from the population of more junior developers [49].
Figure 7 presents the results obtained from the survey regarding the gender of developers per country. The United States ranks first, with women making up 11.7% of software developers, followed by Canada with 10.7% and the United Kingdom with 8.6%. Brazil is second to last with 5.2%, and Italy is last with only 3.7% [49]. Overall, 91.7% of respondents are men, 7.9% are women, and 1.2% are non-binary or gender non-conforming. In addition, developers are usually young: 27.6% of participants are between 25 and 29 years old and 21.1% are between 20 and 24. Regarding contributions to open source, 36.3% of survey participants have never contributed, 28.1% contribute less than once a year, 23.1% less than once a month but more than once a year, and 12.4% once a month or more often [49].
Regarding the greatest challenges to productivity, developers most often cited a distracting work environment and meetings, as shown in Figure 8. The largest difference in the relative frequency of responses between women and men was for being tasked with non-development work (36.9% for men and 32.8% for women). Furthermore, women consider a toxic work environment a more pronounced issue (23.2%) than men do (20.5%).
The results from the GSoC dataset extraction [48] and from Stack Overflow’s annual Developer Survey [49] are summarized in Figure 9. A synthesis of the data presented in Figure 9 confirms the findings of the systematic literature review described in Section 3 and enables a more accurate description of women’s participation in software development projects and open source communities: although increasing rates have been reported, women remain underrepresented in the area, at less than 11% of the total number of developers.
Threats to Validity
We cannot guarantee that all relevant primary studies were selected. It is possible that relevant papers were not chosen. In order to mitigate this, we performed the automatic search, and complemented it by performing a manual search to try to collect all primary studies in this field. During the data extraction process, the primary studies were classified based on our judgment. In order to mitigate this threat, the classification process was performed using peer review.
Another threat to the validity of our results is the possibility that the first author might have introduced his bias in the data collection process. In this respect, the analysis process of collected data was performed along with another researcher. This researcher reviewed and analyzed all the intermediate results (primary studies and survey results). This iterative process was repeated until the end of data collection and data analysis processes. We also held meetings to validate the obtained results.
A third threat is that a literature review based on scientific papers may, in principle, provide only partial answers. To mitigate this threat, we also reviewed Stack Overflow’s annual Developer Survey [49]. This survey annually investigates the preferences of developers around the world and covers a variety of issues related to gender, personal characteristics, and techniques. The survey was answered by 90,000 developers, of whom only 7.9% are women.
Recommendations from research on open source software development projects strongly highlight the need to increase the diversity of participants’ personal characteristics, given the numerous benefits that this diversity can offer [13]. Nevertheless, engaging participants with different profiles and backgrounds in software development projects is not a simple task.
Academic and industry efforts are crucial to changing the level of female participation in this type of project. Through a systematic review of the literature in the main libraries of scientific works in the computing field, this work sought to help identify factors that may increase women’s interest in open source software development projects.
The main contributions of this work lie in the answers to the research questions of the SLR, which identify factors that cause women’s lack of interest in participating in open source software projects and software development. Possible solutions were suggested to increase the engagement of this group, for instance, the effort to retain senior developers from underrepresented groups, along with the thoughtful hiring of more junior developers [49]. As a result of this work, it was also possible to identify the profile of women who participate in Google Summer of Code, in terms of the number of students and mentors. This corroborates the underrepresentation of women found in the systematic literature review, as well as the results of Stack Overflow’s annual Developer Survey [49], in which only 7.90% of software developers in the world are women.
As a follow-up to this research, we expect to deepen the study on the aspects that may influence the participation of women in open source software projects and software development projects, as well as to propose ways of addressing the identified problems regarding issues of gender inequality in open source communities and software factories.