How Sexuality Education Programs Have Been Evaluated in Low- and Lower-Middle-Income Countries? A Systematic Review

Background: Complex sexual and reproductive health interventions, such as sexuality education (SE), contain multiple components and activities, which often require a comprehensive evaluation design and adaptation to a specific context. In this review, we synthesize available scientific literature on types of evaluation designs used for SE programs in low- and lower-middle-income countries. Methods: Two databases yielded 455 publications, of which 20 articles met the inclusion criteria. Narrative synthesis was used to summarize the findings. Evaluation approaches were compared to recommended evaluation frameworks. The quality of articles was assessed by using MMAT 2018. Results: A total of 15 interventions employed in 10 countries were evaluated in the 20 selected articles, with the quality of publications being moderate to high. Randomized controlled trial was the predominant study design, followed by quasi-experimental design. There were seven process evaluation studies, using mixed methods. Main outcomes reported were of public health or behavioral nature: condom use, sexual debut or delay, and number of sexual partners. When evaluation designs were compared to recommended frameworks, few studies fulfilled at least half of the criteria. Conclusions: Evaluations of SE are largely dominated by quantitative (quasi-)experimental designs and use of public health outcomes. To improve understanding of SE program effectiveness, it is important to assess the quality of the program development, its implementation, and its impact, using existing evaluation frameworks and recommendations.


Introduction
This paper studies the designs used to evaluate sexuality education interventions in low- and lower-middle-income countries (LMICs).

Study Aim
Despite the availability of multiple evaluation frameworks and methods suitable for complex interventions, as well as suggestions on assessment of quality and implementation of SE programs, little is known about their use and applicability in different settings. The aim of this review is to synthesize available scientific literature on evaluation designs used for SE programs and to assess the actual evidence base for SE in LMICs.
The review answers three research questions: (1) What are the most common evaluation designs used for sexuality education interventions? (2) How do these evaluations align with existing recommendations for the evaluation of complex interventions (European Expert Group on Sexuality Education and Realist Evaluation)? (3) What are the self-reported benefits and limitations of different evaluation designs?

Materials and Methods
We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for systematic reviews [14]. This review was registered in the PROSPERO database (CRD42020148735).

Search Strategy
We searched two main databases: PubMed and Web of Science. Search terms relevant to sexuality education, age groups and evaluation approaches were used. The study population of interest was adolescents and youth (10-24 years old). The UN defines adolescents as individuals aged 10-19 years and youth as persons between the ages of 15 and 24 years [15]. Only studies conducted in LMICs according to the World Bank classification were included [16]. Search terms are described in Table 1. The search was performed between April and August 2019. In addition, we completed a manual search of the reference lists of relevant articles. All records were exported into Mendeley, an online reference management program produced by Elsevier. After we removed the duplicates, titles and abstracts were screened for inclusion.

Study Selection
This review was limited to full-text original peer-reviewed articles published in English between January 2009, the year when UNESCO's International Technical Guidance on Sexuality Education was published [1], and January 2019. Articles were excluded if they: (1) provided insufficient information, for example letters, abstracts or conference papers; (2) had a narrow focus on HIV-related knowledge and outcomes; (3) focused exclusively on an abstinence approach to sexuality education without addressing broader topics such as contraception or other STIs; (4) evaluated only national or widely scaled-up programs, which may require a more complex approach to evaluation influenced by a number of factors such as region and type of school, rendering them incomparable with small-scale interventions; or (5) implemented interventions exclusively in health care facilities without school or community components. Details of the study selection are summarized in Figure 1. Titles of the 455 studies and abstracts of 131 records were screened. Full texts of articles that passed the title/abstract stage were obtained for full-text screening.

Data Extraction
We extracted data relevant to the review questions. Two authors independently read all included articles and extracted data in a predefined and pretested data extraction form in Excel. The following was extracted from each article: authors, year, study setting, main study objectives, study population, study design, limitations, and study findings.

Data Analysis
A descriptive narrative synthesis was chosen as the most relevant and suitable method of data synthesis for this review [17]. Additionally, we developed a framework to assess the comprehensiveness of the evaluation designs, based on realist evaluation components and recommendations for evaluating SE by the European Expert Group [10,18]. The following aspects were assessed:

• Use of a theory of change (ToC), log frame or middle-range theory (MRT);
• Use of mixed methods and data triangulation;
• Inclusion of key concepts of the realist framework: context, mechanism and outcome (CMO);
• Program evaluation: age appropriateness; gender sensitivity; cultural and social responsiveness; human rights-based approach; positive attitude towards sexuality; comprehensive content; involvement of children and youth in needs assessment and program development; quality and variety of educators' and students' manuals;
• Implementation evaluation: process of program development; teacher/educator training and support; linkages with relevant sexual and reproductive health services; and curriculum delivery (e.g., discrepancies in implementation);
• Outcome and impact evaluation: short-term outcomes (e.g., knowledge, reflection on norms and values etc.); evaluation by children and youth (e.g., curriculum appreciation); long-term outcomes (e.g., public health outcomes, including unintended pregnancies, and positive sexual self-perception).
Further details on definitions and description of these components are provided elsewhere [10,18]. To calculate and report overall scores for each criterion, we employed a conservative approach: we assigned a score of 1 only if the criterion was fully addressed and described in the article.
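The conservative scoring rule can be illustrated with a short sketch. Only the rule itself (one point for a fully addressed criterion, zero otherwise) comes from the description above. The rating labels "P" (partially addressed) and "N" (not addressed), the criterion names, and the example ratings are hypothetical and are not data from the included studies.

```python
# Illustrative sketch of the conservative scoring rule. The labels "P" and
# "N", the criterion names, and the ratings are assumptions for illustration.

def conservative_score(ratings):
    """Count criteria rated fully addressed ("Y").

    Partially addressed ("P") and not addressed ("N") ratings both
    contribute zero, which is what makes the rule conservative.
    """
    return sum(1 for rating in ratings.values() if rating == "Y")


# Hypothetical ratings for a single evaluation.
example_ratings = {
    "theory_of_change": "P",
    "mixed_methods": "Y",
    "cmo_configuration": "N",
    "short_term_outcomes": "Y",
}

score = conservative_score(example_ratings)
print(f"{score} of {len(example_ratings)} criteria fully met")
```

Under this rule a partially addressed criterion is treated the same as an unaddressed one, so overall scores may understate what an evaluation actually covered.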

Critical Appraisal
The quality of the included studies was assessed by using the updated mixed-methods appraisal tool (MMAT) [19]. The tool helps to examine the appropriateness of the study aim, adequacy and methodology, study design, data collection, study selection, data analysis, presentation of findings, discussions, and conclusions. For each of the included studies, the five quality questions corresponding to the study type (e.g., qualitative, quantitative randomized or non-randomized trial, or mixed methods) were asked. For instance, the questions addressed included the following: Is randomization appropriately performed? Is there an adequate rationale for using a mixed methods design to address the research question? Are the findings adequately derived from the data? Other questions were asked depending on the study design. The studies were scored by using percentages (0-100%), where 100% is the highest score. The scores helped to create an overview of the quality of studies, and no articles were excluded based on the quality score. Any discrepancies were discussed between two authors until a consensus was reached.
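One way to read the percentage scores is as the share of the five applicable questions that were met. The sketch below is a minimal illustration under that assumption. The answer labels ("yes", "no", "cannot tell"), the rule that only "yes" counts, and the example answers are assumptions, since the exact scoring rule is not spelled out here.

```python
# Sketch only: the answer labels and the "only yes counts" rule are
# assumptions, not a scoring rule confirmed by the review.
MMAT_QUESTIONS_PER_STUDY = 5


def mmat_percentage(answers):
    """Return the percentage of the five quality questions answered "yes"."""
    if len(answers) != MMAT_QUESTIONS_PER_STUDY:
        raise ValueError("expected exactly five answers")
    yes_count = sum(1 for answer in answers if answer == "yes")
    return 100 * yes_count // MMAT_QUESTIONS_PER_STUDY


# Hypothetical answers for one study: three of five questions met.
hypothetical_answers = ["yes", "yes", "yes", "no", "cannot tell"]
print(mmat_percentage(hypothetical_answers))  # prints 60
```

With five questions per study, the only possible scores under this reading are multiples of 20%.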

Critical Appraisal of Included Studies
All publications scored 60% and more; among them, nine studies received 60%, seven studies received 80% and four studies received 100% (see Table 2).

[Table 2. Included studies: author and year, country, study design, setting, target population, evaluation type, evaluation participants, evaluation objectives, data collection tools, and MMAT score.]

General Description of Included Interventions
Study details, methodology and the main objectives of the evaluations are presented in Table 2. Included studies were conducted in 10 countries in Africa and South America. Three interventions were multi-centered, including at least two countries. Two publications reported on quantitative evaluation of the same intervention at different time points [32,33], and seven other publications evaluated three interventions applying different evaluation designs [21,24,25,28,30,34,37]. Thus, the 20 articles included an assessment of 15 SE interventions. All evaluation studies were published between 2009 and 2019; however, almost half of the interventions (n = 7) were implemented before 2009.
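The step from 20 articles to 15 interventions follows from the overlaps described above and can be checked with a short calculation. The function and variable names are hypothetical; the counts restate the figures in the text.

```python
# Worked arithmetic; the counts restate the figures given in the text.
total_articles = 20

# (number of publications, number of distinct interventions they cover)
overlapping_groups = [
    (2, 1),  # two publications on the same intervention [32,33]
    (7, 3),  # seven publications covering three interventions
]


def count_interventions(articles, groups):
    """Subtract the surplus publications in each overlapping group."""
    surplus = sum(pubs - interventions for pubs, interventions in groups)
    return articles - surplus


print(count_interventions(total_articles, overlapping_groups))  # prints 15
```

The remaining 11 articles each report on a single intervention, which together with the 4 interventions in the overlapping groups gives 15.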
All interventions were delivered primarily in schools, with three having an additional community component. The sample size varied from 42 to 12,462 adolescents. The majority of interventions (n = 13) targeted adolescents 10-19 years old, and two also included youths 20-24 years old. Adolescents who benefited from sexuality education were of both sexes; however, three studies targeted only girls [29,35,39]. One program provided sexuality education to students with learning disabilities [27] and one to orphan adolescent girls [35].
The duration of SE programs varied. Programs were delivered via sessions, lectures or modules, which lasted from 35 min to 1.5 h and usually took place on a weekly basis. The number of sessions and weeks differed between studies, from six to 25 sessions and from five to 16 weeks. Sexuality education was taught by teachers, educators, peers, or volunteers (local or foreign). Lectures, discussions, workshops, home assignments, plays, drama, sport events, comics, and storytelling were used to teach SRH topics. The most frequently addressed topic was HIV/STIs, followed by contraception use, delay of sexual activity, decision-making and negotiation skills, pregnancy prevention, parental communication, prevention of gender-based and sexual violence, and gender norms.

Evaluation Designs
Almost half of the interventions (n = 7) used an RCT design, with pre- and post-implementation quantitative assessment comparing an intervention and a control group. Other interventions followed a quasi-experimental design, with or without a control group, using mixed-methods, quantitative, or qualitative approaches to data collection. The majority of publications reported outcome and effectiveness evaluation results, with less focus on implementation (process) evaluations (see Table 2). Seven publications reported findings from implementation evaluations, either incorporated in outcome assessment (n = 1) or as a stand-alone assessment (n = 6). Nine evaluations exclusively used questionnaires (self-administered, face-to-face interviews or Audio Computer Assisted Self Interviews (ACASI)) for data collection, while the rest of the studies used a combination of different tools: questionnaires, in-depth interviews (IDIs), focus-group discussions (FGDs), biological samples, observations, checklists, cost tracking, attendance lists, and feedback forms. Evaluations targeted primarily adolescents who participated in the SE programs; however, a number of assessments (n = 6) also included teachers/educators, parents/caregivers, social workers, and peer educators. Evaluation outcomes were mostly reported per arm (intervention vs. control), as the predominant design was an RCT. A handful of studies disaggregated outcomes by gender.

Comparison of Included Evaluations Using Realist Evaluation and Expert Group Consensus Criteria
We applied the criteria outlined in the Materials and Methods section to assess how the included studies incorporated them into their evaluation designs (see Table 3). While several publications reported on behavioral theories, Intervention Mapping, community engagement, and evidence used to develop study activities, only a handful of studies (n = 4; one partially and three fully) developed and published a theoretical framework to demonstrate the mechanisms by which their intervention activities aimed to address the expected outcomes and to illustrate the specific context. As described in the section above, half of the evaluations applied exclusively quantitative methods to assess the outcomes, while the other half applied a mixed-methods approach to data collection (n = 6).
A total of four evaluations (three partially and one fully) mentioned context and/or mechanisms and/or outcomes (CMO) to indicate how and which mechanisms were activated by implemented interventions, and under what conditions, to reach the desired outcomes. Program and implementation criteria, e.g., age appropriateness of the program, rights-based approach, and interactive teaching, were partially addressed by all evaluations. All studies measured short-term outcomes, e.g., improved SRH knowledge, self-esteem and skills developed, with almost half also addressing long-term impact, such as reduction in STIs and sexual violence. However, the majority of studies demonstrated short-term outcomes immediately after the implementation period and up to 24 months, and only one study looked at a longer period, 54 months post-intervention [32]. Main outcomes reported were of public health or behavioral nature: condom use, sexual debut or delay, number of sexual partners, STI incidence, number of unintended pregnancies, and use of services or HIV/STI testing. Some studies looked at the improvement in SRH knowledge and attitudes, while others looked at communication on SRH-related topics with parents or peers. Seven process (implementation) evaluations reported on design of the intervention, dose, fidelity, acceptance of the intervention, barriers and facilitators of implementation, and monitoring and evaluation processes.

Self-Reported Limitations and Benefits of Different Evaluation Designs
Publications addressed mostly limitations of the study designs. As RCT with a quantitative assessment was used in almost half of the interventions, the main limitations inherent to it were as follows:
• Loss to follow-up and low response rate;
• Recall and self-reporting bias;
• Contamination and systematic differences between intervention and control groups;
• Length of intervention: short with no long-term follow-up;
• Underestimation of the intervention effect due to provision of benefits to control group;
• Low statistical power to perform sub-analysis, e.g., gender or dose, and challenges to pair pre- and post-measurements due to missing data or intervention adherence issues;
• Questionnaire-related issues, e.g., language, terminology and scales used;
• Lack of data triangulation.
Generalizability of findings was also questioned by many authors, and non-randomized design was seen as a limitation per se. In the case of multicomponent interventions, e.g., Aninanya et al. 2015, it was impossible to determine from pre- and post-intervention surveys which component or components most influenced study outcomes [31]. Studies that used mixed-method or qualitative approaches reported researchers' bias and lack of representation from different groups, e.g., interviews only with educators and not students.
A handful of studies reported benefits of different evaluation designs and tools used. The strong points were mostly related to RCT design, such as randomization, retention and use of face-to-face interviews/ACASI; however, it was clear from the discussions that a mixed-method approach, involvement of various stakeholders, and contextualization of findings have the potential to strengthen and enrich any evaluation design.

Discussion
To our knowledge, this is the first systematic review to summarize available peer-reviewed evidence on evaluation designs used for complex SE interventions in LMICs. This review not only describes evaluation designs used with their limitations and benefits, but it also compares them to the recommended evaluation frameworks for complex interventions, such as realist evaluation and consensus on evaluation of SE programs.
Randomized controlled trial (RCT) and quasi-experimental designs with pre- and post-measurements were predominantly applied to interventions reported in this review. Similar reviews also demonstrated that these designs are still considered a "gold standard" for outcome and effectiveness evaluations [10,40]. However, the authors of the included studies mentioned multiple limitations related to these designs, such as issues with randomization and blinding, short-term follow-up, drop-out rates, and low external validity [6].
Another shortcoming highlighted is the need for a large sample size to demonstrate a desired effect, which is costly and requires a multi-region or national program implementation [41]. A further potential pitfall of using an RCT is the desire to fit the intervention into the "gold standard" and recommended evaluation design, instead of the other way around. Such an approach may compromise the quality of the intervention, hinder context adaptation in multi-center trials, and prevent the capture of other relevant outcomes besides biological or public health outcomes. Similar concerns were also raised by the European Expert Group on Sexuality Education [10].
Additionally, while experimental designs can provide estimates of SRH intervention effectiveness, they offer limited insights on how and why the intervention worked or not. An outcome evaluation result alone does not make it possible to distinguish how different components or content were adapted and delivered in practice. Such designs also provide little insight into the ways through which interventions lead to behavior change and what the facilitators and barriers in these processes were. As a result, the ability to generalize and compare findings from one study to a different context might be compromised. Studying the impact mechanisms by using, for example, program and process evaluations alongside trial designs provides valuable additions and a better understanding of planning, implementation, and monitoring of SRH interventions. The lack of such studies is demonstrated by findings from the current review, where only seven articles used process evaluation or reported on feasibility and acceptability of the intervention. Moreover, using qualitative methods alongside a quantitative approach offers more insights into behavioral change in young people receiving a sexuality education intervention.
This review also demonstrated that research on SE effectiveness is mostly focused on the reduction of risky behaviors and on public health outcomes, e.g., STIs or unwanted pregnancies. Secondary outcomes mostly describe a change in SRH knowledge and attitudes. There is very limited use of indicators that focus on positive aspects of sexuality. Although indicators such as self-efficacy are often used, they are usually only considered with respect to the desired behavior change, and not as stand-alone outcomes. Indicators measuring the ability to experience pleasurable and satisfying sexual relationships are seldom used [10]. The updated UNESCO International Technical Guidance on Sexuality Education also highlighted the limited number of rigorous studies assessing "non-health" outcomes to date [42].
A review by Lopez et al. 2016 found that trials do not always adequately report the content of interventions [40], and Hoffmann et al., in 2014, suggested that the overall quality of description of interventions in publications is notably poor [43]. We also faced this challenge when conducting our review, as only a handful of studies reported, in detail, the topics addressed and activities performed. This hindered the eligibility assessment for a number of studies. There is a need for a detailed description of the intervention, especially if the evaluation tries to identify the component that contributed the most to the success of the intervention.
Until around 2009, sexuality education was mainly focused on the issues of HIV infection, risk reduction, and abstinence. A slight shift in terminology, content and perspective on SE took place after the UNESCO technical guidance was published in 2009 [44]. However, half of the evaluated interventions in this review were implemented before the guidance became available; thus, the definition and components of sexuality education varied among the studies. We excluded the studies with a narrow focus on HIV and abstinence-only aspects; however, it was challenging to judge from the intervention descriptions to what extent other topics, e.g., decision-making skills and gender or rights, were equally integrated in the curriculum and delivered. To improve the reporting standards, tools such as the Template for Intervention Description and Replication (TIDieR) could be used [43]. In addition, only a handful of studies reported on development and use of a theory of change (ToC) or log frame, which helps to illustrate the activities and links to desirable outcomes and impact. This is an essential step for any outcome and impact evaluation, which guides the implementation process and assists in the design of the evaluation [45].
Few studies in this review conducted SE interventions in multiple contexts. Leveraging heterogeneity through testing an intervention in different settings and performing in-depth case studies might strengthen applicability of the findings [46]. At the same time, the heterogeneity of SE content, delivery, implementation, and evaluation is seen between world regions and countries. The majority of peer-reviewed evidence on SE comes from high-income countries (HICs). Thus, this review targeted sexuality education programs in LMICs, where adolescents' SRH indicators and social, cultural, and political contexts differ from those in HICs, such as the USA and the European Union member states. For example, in 2016, an estimated 68% of adolescent girls aged 15-19 in LMICs had completed seven or more years of education, with higher rates in Latin America and lower rates in Africa (51%) [47]. Thus, non-governmental organizations (NGOs) and out-of-school settings in these countries might play a stronger role in implementation of SE. At the same time, conservative opposition to SE and a lack of teacher training, political will, financing, and strong monitoring and evaluation mechanisms exist in many LMICs and HICs [48-50].
To summarize, based on the results of this review, we can demonstrate that SE programs describe short-term outcomes (n = 14) well; however, we cannot draw strong conclusions on whether the SE programs and their curricula were of good quality, nor whether they were implemented in a high-quality manner. Finally, we have little insight into how the included SE programs were meant to achieve their outcomes, as very few (n = 3) provided a ToC, log frame, or MRT.

Limitations
This systematic review has a number of limitations. Firstly, only studies published in English were considered, leading to the exclusion of studies published in other languages, such as Spanish, French, or Russian, which are widely spoken in many low- and lower-middle-income countries around the globe. Secondly, this review did not include grey literature, such as UN reports and studies conducted by NGOs, which do not often make it into the peer-reviewed literature and, potentially, use approaches other than RCTs. Thirdly, the MMAT appraisal tool was used to assess the quality of reporting in the studies, but more specialized quality assessment tools, such as the Cochrane Collaboration's tool for assessing risk of bias, could have provided more in-depth reviews of quality. Additionally, specific search terms yielded a moderate number of articles; thus, studies where "sexuality education" or "evaluation" were not specifically mentioned in the title/abstract or were substituted by broad terms, such as "school-based intervention", "SRH program", "HIV intervention", "design and implementation", etc., might have been missed. Furthermore, due to time constraints and workload, we searched only two databases, PubMed and Web of Science, which are the most often used search databases; however, we might have missed some relevant studies included in other databases, e.g., Global Health or EMBASE. Finally, we used a conservative approach to calculate overall scores in Table 3, counting only fully (Y) met criteria. Such an approach could misclassify some interventions, as it was not always clear from the information provided in the articles to what extent each criterion was addressed.

Conclusions
This review demonstrated a lack of mixed-methods, theory-driven, and comprehensive approaches in the evaluation of complex sexuality education programs. While randomized controlled trials and quasi-experimental designs are undoubtedly important to demonstrate intervention effectiveness, they are not sufficient to comprehensively evaluate complex interventions. There should be space for flexibility and adaptability of the evaluation designs to the intervention theory, content, and context. The need for quality assessment of the development, implementation, and effectiveness of sexuality education in different settings remains.
Author Contributions: O.I. designed the review, extracted and analyzed the data, and wrote the initial manuscript. M.R. performed the literature search, extracted the data, and participated in the data analysis and editing of the manuscript. S.D. and K.M. supervised the study and contributed to manuscript writing and editing. All authors read and approved the final manuscript.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.