Sustainable Education Starts in the Classroom

Abstract: Definitions of a sustainable higher education focus on the different factors that are critical to the continued existence of the institution, the people it serves, and the surrounding society. If higher education is assumed to be a conduit for the acquisition of knowledge and skills that can contribute to a healthy, ethical, and sustainable society, then it has to be able to induce lasting behavioral change in its primary beneficiaries (i.e., students). In the age of fake news, misrepresentation, and rejection of scientific principles and facts, we identified cognitive operations that are key to scientific reasoning (i.e., apply, analyze and evaluate), and offered sustainable practice to students enrolled in a course devoted to scientific writing. Students were classified as possessing an inclination towards a reproductive mode of learning, which could increase their vulnerability to absorb fabrications and distortions of information. The research first asked whether practice in applying, evaluating, and analyzing induces an information processing change (as measured by the content of scientific writing). Then, it asked whether environmental disruptions (e.g., shifting from face-to-face instruction, a mode familiar to students, to online instruction due to the COVID-19 pandemic) would affect the likelihood of change. We found that this type of practice was an effective propeller of change in students' scientific reasoning. A disposition towards reproductive learning did not impair scientific reasoning, whereas engagement and practice made a positive contribution. We concluded that behavioral change is blocked by neither the availability of technology, nor the learner's use, but rather by one's motivation to make use of opportunities for change. A sudden alteration in the learning environment may create uncertainty but does not substantially alter this motivation. The findings of the present study can be useful to the development of a sustainable education in the Middle East and beyond.


Introduction
The present research was motivated by a flurry of discussions among faculty and students at a university in the Kingdom of Saudi Arabia (KSA) about how to translate into action the theme that the administrative body of the university had recently included in its mission. "Sustainability" was an appealing call that immediately catapulted faculty and students into discussing plans for sustainable change on campus and beyond. Sustainable change was operationalized as a change that meets the necessities of the present without compromising the ability of future generations to meet their needs [1,2]. Among the topics that were discussed was that of a sustainable university, above and beyond the usual plans for a "green campus". It was agreed that the goal of higher education is to equip current and prospective students with the competencies (encompassing knowledge, skills, and ethical values) necessary to tackle social, economic, and environmental challenges in the present and the future [2][3][4] so that students can serve as both custodians and builders of sustainable societies. If this is the goal, a university's response must not be limited to alterations of the curriculum of particular academic programs to introduce concepts and activities defining sustainability in the domain served by such programs. For instance, in the field of engineering, broadly defined, course work may be introduced to encourage students to solve problems, such as how to expand recycling, reduce energy consumption or redirect it to greener sources, improve or preserve water quality, and so on. The basic idea is that sustainable education is not limited to a body of knowledge and related activities. It comprises the essential mental operations that produce knowledge and guide behavior. At its core, a sustainable education involves quality instruction that promotes scientific reasoning for two key reasons.
First, resistance to changing established but unsustainable practices of thinking and action is generally one of the main obstacles [5][6][7] to not only an inclusive and quality education (one of the sustainable development goals approved by the United Nations) [1], but also one that makes students lifelong learners capable of effectively tackling challenges in the present as well as in the future. A second obstacle is threats to evidence-based information processing in a world where the difference between what is false and true is progressively difficult to identify [8][9][10]. As falsehoods and distortions abound on a multitude of media platforms, scientific reasoning can be both the rescuer and the protector. Scientific reasoning entails a set of higher-order intellectual procedures for information gathering and problem-solving [11][12][13], which cannot be easily automated by machines [14]. Thus, at the university mentioned above, a consensus emerged that training in scientific reasoning would be a key goal for an education that does not merely prepare young people for the labor market but nurtures builders of a more sustainable world shifting from producing and competing to caring and conserving [2].
The question of whether higher education can improve scientific reasoning has provided some unsettling results [15,16]. For instance, through a large-scale longitudinal study of institutions in the United States of America (USA), Arum and Roksa [16] reported that students' analytical, critical, and communicative competencies show little improvement during the first two years of undergraduate education. Ding, Wei, and Mollohan [17] reported a similar finding in a large-scale cross-sectional study of the scientific reasoning skills of Chinese students across the entire four years of undergraduate education. Counter to this dismal outcome, in other studies, often of a smaller scale and including instructional interventions, evidence suggests that scientific-reasoning skills can be not only developed through training but also transferred across domains [11,18,19], and have a long-term impact on learners' academic success [11]. However, concerns regarding the development of scientific reasoning tend to focus primarily on students pursuing (or thinking of pursuing) science, technology, engineering, and mathematics (STEM) fields since scientific reasoning is considered of paramount importance for their successful handling of open-ended, real-world tasks in the professions they have chosen to pursue [20][21][22][23][24].
In our research, we focused on students who are at the beginning of their undergraduate educational journey without pre-selection based on their planned course of study. We asked whether sustained exposure to the principles of the scientific method, through practice with their varied artifacts in the scholarly literature of the social and behavioral sciences, neuroscience, and health sciences, has an impact on the development of scientific reasoning competencies. The scientific-reasoning competencies studied in this research encompass domain-general reasoning skills such as application, analysis, and evaluation [25]. They may include the understanding of how to systematically explore a problem, formulate and test hypotheses, select suitable methods, identify, manipulate and control variables, as well as examine and evaluate research results and their implications. We relied on a conservative approach to put our question to the test. Namely, we asked whether a course that focuses on scientific writing can be imbued with practice exercises that can foster information processing changes as measured by students' ability to produce a suitable research report. Our approach was informed by evidence from the extant literature that suggested that although students' scientific reasoning can be practiced and assessed through a variety of methods, all based on the recognition of the pedagogical importance of students' engagement with and reasoning from evidence, writing remains a key vehicle for practice and assessment [26][27][28][29][30].
Our field study specifically targeted young minds with a preference for replication of acquired information and a view of the instructor as "the sage on stage" who delivers contents to be absorbed (i.e., didactic approach) and preserved by the learners. In principle, this approach to learning may create an obstacle to the development of scientific reasoning skills since the latter are, in essence, constructive rather than reproductive. Namely, their ultimate goals are generating and appraising solutions to problems through knowledge application and information gathering, but not direct retrieval from memory [31], which is rendered ineffective by the mere fact that solutions have yet to exist in the present.
Students of Middle Eastern or Asian descent are often described as possessing this approach to learning, although their educational model has different epistemological and pedagogical roots. In the Middle East, the oral tradition of the early Arabic communities reinforced by Islamic epistemology [32] has fostered the practice of memorization and recitation of information directly imparted from "the sage on stage", which is an important aspect of the pedagogy of learning in schools of the Arab world [32][33][34][35]. Within this tradition, students are expected to commit large segments of text to memory and accurately reproduce them when the instructor demands it. However, in the oral tradition of early communities, which this pedagogy reflects, memorization and recitation were seen not only as means of preserving scriptures and remembering the past but also as activities conducive to knowledge acquisition, understanding, reasoning, as well as self-discipline [32][33][34][35][36]. In the East, Confucian epistemology and practices have fostered a similar approach to learning as knowledge acquisition resulting from recognized scholars rather than from self-directed inquiry [37,38]. However, even within this tradition, memorization is seen as the first step of learning, including understanding, application, and analysis [38,39].
For our field study, we selected Middle Eastern students because they are an often neglected population about whom misconceptions persist, namely that they are passive memorizers committed to the uncritical reproduction of information. However, evidence suggests that, if given the opportunity, learners can use memorization not as an end in itself, but as a conduit to understanding [40]. Memory is merely seen as a key cognitive process that preserves information for later use. We reasoned that a course in which replication is of no effective use and practice in scientific reasoning is sustained would offer such an opportunity to students who are accustomed to didactic instruction and perceive learning as the acquisition of information that is to be preserved for future use in virtually the same format. In this context, we investigated the extent to which these learners can exhibit an improvement in their scientific writing as measured by the content of the final assignment (a complete research report) compared with earlier assignments.
A sustainable education is, by definition, one that allows students to withstand or even take advantage of the changes that take place in the different ecosystems of the society to which they belong. For students, a relevant ecosystem is their habitual learning environment, which includes all the ways a given curriculum is translated into learning opportunities [41]. Thus, the ancillary question was whether a disruption of students' habitual learning environment, such as that produced by a switch from face-to-face to distance learning, could propel change, obstruct change [42][43][44][45], or have no effect at all on students' learning of scientific reasoning skills. A move to an unfamiliar instructional mode might add too many challenges to a course that makes reproductive acquisition futile, thereby disrupting learning. Alternatively, the added challenges of the online mode might spur students' curiosity and foster the adoption of a problem-solving approach to activities and materials, thereby ensuring learning. However, if the main pedagogical aspects of the two modes remain equivalent, the learning outcome might be expected to be equivalent.

Participants and Sampling
Participants were 209 undergraduate female students at a university located in the Eastern Province of KSA whose curriculum and instruction replicate a USA model of higher education (including English as the primary communication mode). All were full-time students (age range: 18-25) who were enrolled in a compulsory 3-credit-hour course of the core curriculum taught by an instructor in English. The course is typically taken by freshmen after a general course in writing. It consists of applications of the tenets of the scientific method to writing through the development of a common research project and its integration into a research report. Prior to the COVID-19 pandemic, the selected university did not offer online classes. Thus, unfamiliarity with the distance education mode characterized the scholastic background of the participants. Students reported Arabic as their first language and English as their second language. For admission, they had demonstrated English competency through standardized proficiency tests (i.e., TOEFL, IELTS, or Aptis). According to students' reports, exposure to English and Western culture included a variety of means: formal instruction (i.e., mandatory English courses in primary and secondary education), internet browsing and surfing, viewing of foreign TV channels, interactions with English-speaking expatriates, and trips abroad.
In our field study, which relied on actual students, enrolled in real classes, and assessed on actual assignments and tests, random assignment of participants was unfeasible. Thus, convenience sampling was used to select 9 classes taught by one instructor during a period of two semesters: 5 taught face-to-face and 4 taught synchronously online by means of Blackboard Collaborate. In both instructional modalities, classes ranged in size from 14 to 30 students. Students qualified for participation by virtue of being enrolled in the selected course. They participated for an entire semester, during which they were given the assignments and tests on which our data are based. No student was enrolled in more than one of the selected classes. Participation complied with the guidelines of the Office for Human Research Protections of the USA Department of Health and Human Services and with the ethical standards in the treatment of human subjects of the American Psychological Association (APA). Important to note here is that the specific research question that guided the present investigation was unavailable to the instructor during implementation.

Procedure and Materials
A few aspects differentiated the synchronous online classroom, conducted through Blackboard Collaborate as a reaction to the COVID-19 pandemic, from the face-to-face classroom to which students were accustomed. Blackboard Collaborate is a video conferencing tool equipped with audio, video, and application-sharing tools, a text-chat box, and a whiteboard, which allowed students to interact with the instructor and participate in lectures and class discussions. Blackboard gave students access to study materials and resources, such as study guides, practice exercises, textbooks, and videos, whereas the university's library offered access to online scholarly articles, book chapters, and books. In both the online and face-to-face classrooms, peer-observations described the instructor as maximizing interaction between (a) learners, (b) learner and content, (c) learner and instructor, which are the criteria set by Moore [45] as necessary for successful distance education. Both classrooms also relied on Blackboard Learn, a learning management system, mostly used for posting class materials and assignment submission. Thus, from a pedagogical viewpoint, the synchronous online classroom replicated many aspects of the face-to-face classroom, with one exception: although interactions in the online classroom could occur through typed or spoken messages, the camera function was disabled during meetings for cultural and religious reasons. As a result, students and faculty could not see each other.
As the goal of this course was to give students an introduction to research activities and principles, the instruction was highly structured. At the start of the semester, students participated in a common research project (i.e., a small-scale correlational study in the social and behavioral sciences) under the supervision of the instructor. Then, they spent the rest of the semester (a) deconstructing the project (understanding research questions, hypotheses, methods, data analyses, and results, as well as discussing varied interpretations of the results), (b) writing the different parts of a research report that described the project, and (c) learning different scientific methodologies and their implications, through practice sessions with examples (e.g., abstracts) taken from the scholarly literature. Practice sessions also served as preparation for midterm and final tests. The report was divided into 4 main parts: assignment 1 (i.e., introduction), assignment 2 (i.e., literature review), assignment 3 (i.e., method and result sections), and assignment 4 (i.e., the completed document, including the discussion section and the abstract). Each assignment was graded with a rubric assessing the proper inclusion of all expected components. Prior to use by the instructor, the content validity of each rubric had been estimated by two faculty members blind to the purpose of the current study. Every time an assignment was completed, it was to be added to a single APA-formatted document, which at the end of the semester was expected to be the full research report. As assignment 4 constituted the completed report, prior to submission, students were instructed to revise the content of the previous assignments based on the feedback they received during the semester to ensure a coherent narrative. 
Thus, this assignment illustrated the extent to which students' re-writing could benefit from the scientific knowledge and skills they had acquired through the instruction and materials of the course. Students were also questioned about their research report orally to assess whether they could clarify weak or omitted aspects of the work submitted (assignment 5: question-and-answer session). Thus, students' question-and-answer (Q and A) session could also be considered a demonstration of students' scientific knowledge and skills acquired from the course, albeit of a smaller scope than that of assignment 4 (e.g., what is the research question of your study? What is a correlation coefficient?). The number of questions answered correctly by a student over the number of questions asked by the instructor (multiplied by 100) constituted the student's grade in assignment 5.
To ensure a sustainable practice, three main activities, in addition to students' understanding of key concepts (as per Bloom's taxonomy) [22][23][24][25] were targeted: application (using information in new settings), analysis (drawing connections through examining, organizing, comparing, and contrasting), and evaluation (appraising, judging, critiquing, and arguing). Practice consisted of brief narratives (i.e., abstracts) of a variety of studies taken from the extant literature in the social and behavioral sciences, neuroscience, and health sciences. For each narrative, the students' main task was to apply key concepts (e.g., what is the method used?), identify key information (e.g., what is the research question/purpose of the study?), evaluate the reported results (e.g., are the results supported by the method used for data collection?), organize information (e.g., how would you rewrite this abstract to improve its readability?), and judge interpretations and conclusions (e.g., what are the strengths and weaknesses of the viewpoint expressed by the researcher in the discussion section?). Students also compared and contrasted abstracts to understand differences and similarities in research questions, hypotheses, methodologies, and results. Whenever possible, the information gathered from practical examples was linked to the research project that students had to translate into a research report by the end of the semester.
The midterm was described to students not only as an opportunity for self-assessment of one's research knowledge and skills but also as an opportunity to engage in further practice. It was explained to students that because tests challenge learners to retrieve known information (rehearsal practice) and use it to analyze and evaluate information in the test questions, they constitute practice as much as the exercises carried out in class prior to each test. Students were informed that rehearsal practice reinforces the links between items of information in the learner's long-term memory [46], whereas analysis and evaluation create new links and compel learners to develop a deeper understanding of research materials. Midterm and final tests contained short abstracts describing studies taken from the same scientific fields used for practice. Abstracts were followed by multi-part short-answer or multiple-choice questions that asked students to carry out the cognitive operations exercised during the practice sessions. As questions often engaged more than one operation (application, analysis, and evaluation), we were unable to categorize questions into mutually exclusive categories by the operations they activated. Thus, the students' scores on a test constituted our measure of performance. Midterm and final tests involved randomly presented sets of questions previously developed by the instructor and all pre-tested in earlier instances of the same class to ensure suitability to the target audience and medium difficulty. A conventional difficulty index [47] was used to operationalize difficulty as the ratio of the number of students that could answer a question correctly to the total number of the students who were given the question. Its values ranged from 0 to 1 [47]. A question of medium-level difficulty was defined as one that had an index greater than 0.3 and less than 0.8.
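The difficulty screening described above can be illustrated with a minimal sketch. It assumes only the definition given in the text (proportion of students answering correctly, with medium difficulty lying strictly between 0.3 and 0.8); the function names and the pre-test counts are hypothetical and are not taken from the study.

```python
def difficulty_index(n_correct: int, n_given: int) -> float:
    """Proportion of students who answered a question correctly (0 to 1)."""
    if n_given <= 0:
        raise ValueError("the question must have been given to at least one student")
    if not 0 <= n_correct <= n_given:
        raise ValueError("n_correct must lie between 0 and n_given")
    return n_correct / n_given


def is_medium_difficulty(index: float, lower: float = 0.3, upper: float = 0.8) -> bool:
    """Medium difficulty as defined in the text: greater than 0.3 and less than 0.8."""
    return lower < index < upper


# Hypothetical pre-test counts: question id -> (answered correctly, students given the question)
pretest = {"Q1": (27, 30), "Q2": (18, 30), "Q3": (6, 30)}

# Keep only the questions whose index falls in the medium range.
retained = [
    question
    for question, (n_correct, n_given) in pretest.items()
    if is_medium_difficulty(difficulty_index(n_correct, n_given))
]
print(retained)  # ['Q2']
```

In this made-up pool, Q1 (index 0.9) would be judged too easy and Q3 (index 0.2) too hard, so only Q2 would be retained.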
The curriculum of the course relied on a syllabus approved by the Texas International Education Consortium (TIEC) and textbooks written for a USA audience. Instruction by the selected instructor was judged by independent observers as meeting the criteria set by Richards et al. [48] for culturally relevant pedagogy (CRP). Namely, it (a) acknowledged learners' differences and similarities; (b) validated students' cultural identities in instructional modes and materials; (c) educated learners about diversity in the world; (d) fostered equity and respect; (e) nurtured relationships among all parties involved in the learning process; (f) motivated the use of active learning; (g) cultivated critical thinking skills; (h) focused learners' attention on excellence as defined by their potentials; (i) supported learners' efforts to comprehend social implications; and (j) relied on assessment suited to the cultural knowledge of the population being tested. For the latter criterion, it is important to note that although tests and assignments covered the content of textbooks written for a USA student population, they included examples that referred to Middle Eastern situations, practices, beliefs, and values.
During the entire semester, the instructor kept a checklist where questions spontaneously asked in class, via email, and during office hours by individual students were recorded. To ensure equity between the face-to-face and the online classrooms, both typed and spoken questions were included in the count. Written questions were those typed in the chat-box of the online class and in email messages by students attending either type of class. Unprompted questions were those that were not initiated by the instructor in a lecture, class discussion, or conversation during office hours. Unprompted questions that involved replication of learning were recorded as "R", those that concerned understanding, application, analysis, and evaluation were labeled "U", and those about class-management issues (e.g., deadlines for submission, scheduling of tests, etc.) were coded as "M". Examples of R questions might be "what is the minimum or maximum number of pages/words that the assignment requires?", "can I use the example, statement, or sentence that you mentioned in class?", "can we use your example exactly as you said it?", "do we need to paraphrase the information we take from articles?", "are the questions taken directly from the textbook?", and "shall I memorize definitions?". Examples of U questions might be "when variables are changed by the researcher, is it an experiment?", "if the independent variable is a demographic characteristic such as age, can I still call the study an experiment?", and "can you explain the method of this study?". Usually, this type of question was accompanied by additional information provided by the student regarding a particular research artifact that required clarification by the instructor.
Examples of M questions might be "what is the deadline for this assignment?", "will I get penalized for not submitting by the deadline?", "can we resubmit an assignment?", "how many multiple-choice questions are on the test?", and "are we required to participate?". Students' responses to queries by the instructor were not included in the count of R, U, and M questions. They were recorded separately as "P" (replies to prompted queries).
The main independent variables of our field study were time of assessment (assignment 1, 2, 3, and 4), and instructional mode (face-to-face and online). The class section was not included in the analyses reported below since there was no evidence that it interacted with either independent variable. The main dependent variables were grades of assignments 1-4 serving as performance measures and attendance (percentage of class meetings attended during a semester), as well as the number of prompted replies and unprompted questions articulated, both of which served as a rough measure of engagement. In this design, midterm grades were conceptualized as assessing the initial impact of the practice exercises that students received in the face-to-face and online classrooms. Due to institutional requirements, final test grades were not available.

Results
All scores related to students' performance were distributed on a scale from 0 to 100. Performance assessment was based on content that was deemed to reflect scientific reasoning competence. Results are considered significant at the 0.05 level [49]. Descriptive statistics are reported in Table 1. Analyses are organized by the questions they are intended to answer.

What Was the Participants' Orientation towards Instruction?
A 3 (type of question) × 2 (instructional mode) mixed factorial ANOVA was conducted on the unprompted questions asked by students in class, during office hours, and via email. A significant main effect of the type of question was uncovered, F(2, 414) = 395.24, MSE = 11.54, p < 0.001, ηp² = 0.656 (other Fs < 1). Tests of simple effects indicated that students asked considerably more R questions than either U or M questions, ts(208) ≥ 9.30, p < 0.001. Based on the number of questions that students asked that required replication of information presented by the instructor or displayed in textbooks, students were classified as possessing a predilection for reproductive learning irrespective of whether they were exposed to face-to-face or online instruction. A qualitative analysis of both prompted and unprompted questions made by students supported the assumption that students were also instructor centered. Namely, the instructor was seen as responsible for dispensing content (didactic role), whereas the students saw themselves as responsible for absorbing it. Information collected a year earlier from four focus groups comprised of students from the same population as well as informal interviews of faculty was consistent with the evidence collected from prompted and unprompted questions, offering further corroboration. These findings replicate those reported in the extant literature, suggesting that there is a link between education and culture [50,51].
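As a reading aid, the partial eta squared reported with an F ratio can be cross-checked from the F value and its degrees of freedom. The sketch below relies on the standard identity, partial eta squared = (F × df_effect) / (F × df_effect + df_error), which is a general property of the statistic and is not stated in the paper; the function name is hypothetical. It is applied to the main effect of question type reported above.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    scaled_f = f_value * df_effect
    return scaled_f / (scaled_f + df_error)


# Main effect of type of question: F(2, 414) = 395.24
eta = partial_eta_squared(395.24, 2, 414)
print(round(eta, 3))  # 0.656
```

The recovered value agrees with the effect size reported for this test, and the same check can be run on any other F ratio in this section.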

Was There Evidence of Learning (Assignment 1, 2, or 3 Versus Assignment 4)?
Grades that students received on assignment 1 (introduction) compared with those received on assignment 4 (full report) constituted a key measure of students' learning of scientific reasoning (range: 0-100). A 2 (time of assessment) × 2 (instructional mode) mixed factorial ANOVA was conducted on grades to assess learning and the impact of instructional change. This analysis yielded a main effect of time of assessment, F(1, 207) = 8.59, MSE = 119.76, p = 0.004, ηp² = 0.040 (other Fs ≤ 1.00, ns). As assignment 4 gave students the opportunity to revise their previous work before submission, the improvement observed in both the face-to-face and online classrooms signified a change in the learners' approach to scientific reasoning.
When grades that students received on assignment 2 (literature review) and assignment 4 (full report) were submitted to a 2 (time of assessment) × 2 (instructional mode) mixed factorial ANOVA, a similar pattern of results was uncovered. Namely, although there was no main effect of instructional mode or significant interaction, Fs ≤ 3.12, ns, a main effect of time of assessment was found, F(1, 207) = 44.32, MSE = 87.01, p < 0.001, ηp² = 0.176, indicating that performance improved in both the face-to-face and online classrooms from assignment 2 to assignment 4. However, when grades that students received on assignment 3 (method and result sections) and assignment 4 (full report) were submitted to the same 2 (time of assessment) × 2 (instructional mode) mixed factorial ANOVA, a slightly different pattern of results emerged. Time of assessment produced a main effect, F(1, 207) = 30.81, MSE = 260.20, p < 0.001, ηp² = 0.130, indicating that performance improved in both the face-to-face and online classrooms from assignment 3 to assignment 4. Although there was no main effect of instructional mode, F = 1.32, ns, the interaction between time of assessment and instructional mode was significant, F(1, 207) = 5.17, MSE = 260.20, p = 0.024, ηp² = 0.024. The performance improvement of students in the online classroom was much larger than that of students in the face-to-face classroom, t(207) = 2.27, p = 0.024. There was a numerical trend towards lower performance in assignment 3 by online students, but it failed to reach significance, t < 1, ns. Thus, the different magnitude of the performance improvement between online and face-to-face classrooms could not be directly attributed to online students' lower performance in assignment 3.
In sum, most of the assignments that preceded the writing of the completed research report suggested equity between face-to-face and online learning. However, the numerical trend towards lower performance in assignment 3 by online students replicated a significant difference in the impact of the practice exercises on students' learning measured by the midterm test, an assessment instrument that was temporally close to assignment 3. In fact, a one-way ANOVA on midterm grades with instructional mode as the independent variable illustrated that students enrolled in a face-to-face class (M = 79.63%, SEM = 1.95) performed at a higher level than those enrolled in an online class (M = 73.62%, SEM = 1.92), F(1, 207) = 4.83, MSE = 391.01, p = 0.029, ηp² = 0.023. Taken together, these findings indicate that although the instructional modes were largely similar in their impact on learning, particularly complex technical information (methodologies and results) was absorbed more slowly by the online students.
It is important to note that although the analyses involving assignment 4 performance did not suggest a difference between instructional modalities at the end of the course, the scores that students received in the question and answer session (assignment 5) illustrated superior performance by online students, F(1, 207) = 3.93, MSE = 840.79, p = 0.049, ηp² = 0.019. The instructor attributed this difference to the greater comfort that students experienced with physical distance when being questioned.
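The partial eta squared values reported in this section can be checked against the accompanying F ratios and degrees of freedom. The following is a minimal sketch, assuming the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error); the function name and the dictionary labels are illustrative choices rather than part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom exactly as reported in the text.
# The labels are shorthand introduced here for readability.
reported_effects = {
    "time of assessment (assignments 2 and 4)": (44.32, 1, 207),
    "time of assessment (assignments 3 and 4)": (30.81, 1, 207),
    "time x mode interaction (assignments 3 and 4)": (5.17, 1, 207),
    "instructional mode (midterm test)": (4.83, 1, 207),
    "instructional mode (assignment 5)": (3.93, 1, 207),
}

for label, (f_value, df_effect, df_error) in reported_effects.items():
    effect_size = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {effect_size:.3f}")
```

Under this identity, the recomputed values agree with the reported effect sizes to three decimal places.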

Did Engagement and Overall Performance (Class Grades) Differ between Instructional Modalities?
Attendance was computed by counting the number of classes attended, dividing it by the number of classes offered, and multiplying the obtained value by 100 (range: 0-100). Classes for which students were excused were counted as absences. Attendance was submitted to a one-way ANOVA with instructional mode serving as the independent variable. In this analysis, students enrolled in face-to-face classes had better attendance records than those enrolled in online classes, F(1, 207) = 6.80, MSE = 179.49, p = 0.010, ηp² = 0.032. The same analysis was applied to the number of students' replies to the instructor's queries across the entire semester, relative to the number of queries available. Participation scores for prompted inquiries, reported as percentages, again illustrated less engagement by the online students, F(1, 207) = 28.50, MSE = 152.93, p < 0.001, ηp² = 0.121. However, although students attending online classes were less engaged, they did not complete the course less successfully. In fact, there were no differences in overall performance (as measured by class grades), F = 1.26, ns.
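The two engagement measures described above can be sketched as follows. This is an illustration only: the status labels ("present", "absent", "excused") and the function names are hypothetical encodings introduced here, not the instrument used in the study.

```python
def attendance_percentage(class_records: list[str]) -> float:
    """Percentage of offered classes attended (range: 0-100).

    Each entry is one offered class, coded with a hypothetical label:
    "present", "absent", or "excused". Excused classes count as
    absences, as stated in the text.
    """
    if not class_records:
        raise ValueError("at least one class must have been offered")
    classes_attended = sum(1 for status in class_records if status == "present")
    return classes_attended / len(class_records) * 100


def participation_percentage(replies_given: int, queries_available: int) -> float:
    """Replies to the instructor's prompted queries, as a percentage."""
    if queries_available <= 0:
        raise ValueError("at least one query must have been available")
    if not 0 <= replies_given <= queries_available:
        raise ValueError("replies must lie between 0 and the number of queries")
    return replies_given / queries_available * 100


# Made-up records for a single student, for illustration only.
records = ["present", "excused", "absent", "present"]
print(attendance_percentage(records))
print(participation_percentage(replies_given=3, queries_available=12))
```

Treating excused classes as absences means that only classes coded as attended enter the numerator, so the score reflects actual exposure to instruction.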

What Was the Relative Contribution of Practice, Orientation Towards Reiteration, and Engagement to Performance?
A linear regression analysis was conducted within each learning modality to determine the contribution of practice exercises (as measured by midterm test performance), orientation towards repetition, and engagement (as indexed by participation records regarding prompted questions) to performance in assignment 4. Orientation towards repetition was computed by adding together unprompted R and M questions and subtracting U questions from this sum. In the regression analyses, performance on the completed research report (assignment 4) served as the outcome variable. Results (see Table 2) indicate that, at the end of the semester, the content of the scientific writing of students in the face-to-face classroom as well as of those online had benefited from both engagement and practice exercises.
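The orientation score described above is a simple signed sum of question counts. A minimal sketch follows, assuming that each student's unprompted questions have already been coded and counted as R, M, or U according to the study's classification; the function and parameter names are hypothetical.

```python
def orientation_towards_repetition(r_questions: int, m_questions: int,
                                   u_questions: int) -> int:
    """Orientation towards repetition: (R + M) - U.

    Inputs are counts of a student's unprompted R, M, and U questions.
    Higher scores indicate a stronger orientation towards repetition;
    negative scores arise when U questions outnumber R and M combined.
    """
    counts = (r_questions, m_questions, u_questions)
    if any(count < 0 for count in counts):
        raise ValueError("question counts cannot be negative")
    return r_questions + m_questions - u_questions


# Made-up counts for a single student, for illustration only.
score = orientation_towards_repetition(r_questions=5, m_questions=3, u_questions=2)
print(score)
```

In the regression analyses, a score of this kind would enter as one predictor alongside midterm test performance and participation, with assignment 4 performance as the outcome.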

Discussion
The findings of our field study regarding the outcome of sustained scientific reasoning practice can be summarized in three points. First, teaching scientific reasoning skills within a semester can have measurable effects on learners' scientific reasoning (as measured by the content of a research report written by students at the end of the semester). It is important to keep in mind that the starting point for the assessment of students' learning is an assignment whose completion has benefited from approximately three weeks of practice. Thus, the degree of improvement in scientific reasoning, as measured by the instructor's grades (approximately 3% from assignment 1 to assignment 4), may actually be larger than the reported value. Furthermore, if one considers the utility of the basic competencies that this 15-week course helps students develop for the educational journey on which they have embarked, the potential benefits of training can be significant. Second, abrupt alteration of the learning environment, such as an unexpected change from face-to-face instruction (familiar mode) to synchronous online instruction (an unfamiliar mode) [52], does not appear to impair learning. Understanding of methodologies and statistical analyses and their applications may merely occur at a slower pace in the online mode. Third, learning of scientific reasoning skills benefits from sustained practice with the application, analysis, and evaluation of artifacts of the scientific enterprise (e.g., abstracts), as well as students' engagement, even if students have a propensity for reproductive learning, a propensity that contrasts with the constructive and generative learning associated with scientific reasoning.
Our results are consistent with those of other studies in which focused training benefited students' acquisition of scientific reasoning skills [11,18,19]. Although there may be initial pedagogical differences between the West and the East or the Middle East concerning approaches to learning [5,32,37,38], students with a fondness for reiteration of information provided by the instructor or found in study materials demonstrate the ability to switch to a different mode of processing information when reiteration is recognized as unfeasible. Interestingly, in our field study, the instructor informally reported that the frequency of students' unprompted R questions had not declined from the first to the second part of the semester. The instructor remarked that students' ability to change when the circumstances demand it indicates that habits are not hardware. However, she added, as an afterthought, that ingrained habits leave an indentation to which students misguidedly return for comfort when they face uncertainties and increased workload.
The basic assumption that has guided our study is that scientific reasoning plays a key role in a sustainable education, defined as one in which students learn the knowledge and skills that meet "the needs of the present without compromising the ability of future generations to meet their own needs" [1] (p. 45). It is an education that does not prepare young minds to compete and consume, but rather to care and conserve [4]. We believe that scientific reasoning is at the core of sustainable education because it highlights its reliance on evidence-based knowledge and rigorous analysis of information to ensure validity and reliability before information is treated as facts, and facts are disseminated. At the present time, humanity is facing a multitude of concurrent challenges, including poverty, injustice, war, an ongoing pandemic, and extreme weather phenomena. Dissemination of misinformation and perpetuation of unchallenged misperceptions add to these burdens, rendering just and equitable solutions more difficult to uncover and implement. Our modest field study has demonstrated that scientific reasoning skills can be taught to a population that has a preference for reiteration of information gathered from authoritative sources, a population that is potentially an easy target for misinformation dissemination schemes. Thus, scientific reasoning skills can play a critical role in ensuring that, at the very least, there is an attempt at protection, if not a counteraction [53,54].
One of the limitations of our study and a topic of future research is our reliance on a sample of female participants. In the Middle East as much as in the Western world, women remain underrepresented in the fields of science, technology, engineering, and mathematics (STEM) [55–58]. However, in the Western world, a distinct pattern of gender differences shapes standardized tests and school marks. To wit, female students usually earn higher school marks regardless of the subject matter [59], whereas males display higher scores on standardized tests, especially when quantitative scientific competencies are measured [60]. Recently, declines in the magnitude of the gender gap in standardized tests have been observed [61]. In Saudi Arabia, on the other hand, evidence of gender differences tends to favor females on both standardized tests and high-school grade point average (GPA) [62]. This pattern may be explained by a difference in motivation [63]. Women have recently and rather suddenly been given opportunities that were previously available only to men. Thus, women are motivated to demonstrate that they can handle challenges. In our study, both engagement and practice have been found to contribute to the learning of scientific reasoning competencies. Thus, it is reasonable to expect female students to outperform male students if the latter are added to the original sample. Although we believe along with other researchers that a sustainable education is one in which both women and men make a contribution [64,65], due to gender segregation rules, a comparable sample of male students was unavailable for testing purposes.
Another limitation of our study is that the writing of the research report (our main outcome measure) was a compulsory activity, thereby potentially inflating the baseline of students' engagement (broadly defined to include attendance and all measures of participation). Nonetheless, students in the face-to-face and online classes differed on two measures of engagement (attendance, and participation as appraised by responses to prompted questions), suggesting that engagement was not at ceiling and that it was sensitive to the mode of instruction. Online students' informal remarks suggested that written responses in the text-chat box were perceived as lacking the immediacy and ease of oral responses in the face-to-face classroom, whereas coordination of oral responses in the online classroom was reported to be challenging because students could not see each other and thus could not easily determine when to interject. Both remarks were used by students to account for their reluctance to respond to the instructor's queries. Although qualitative in nature, these remarks suggest that a sustainable education resting on the online modality may need to rely on preparatory activities to facilitate students' classroom management activities. An additional limitation of our field study is that random assignment of participants to instructional modes was unfeasible, thereby preventing the researchers from exercising control over individual difference factors, such as prior experience with similar class activities. The instructor did not report demographic and prior experience differences between the face-to-face and the online students, but rather a pervasive unfamiliarity with the activities required by the course. However, if such differences existed between face-to-face and online conditions, they did not significantly shape performance on the first assignment treated as the baseline.