A Comprehensive Framework for Comparing Textbooks: Insights from the Literature and Experts

Abstract: Textbooks are essential components in the learning process. They assist in achieving educational learning outcomes and developing social and cultural values. However, limited studies provide comprehensive frameworks for comparing textbooks. Most have focused on a specific textbook perspective within a particular discipline. Therefore, this study used a triangulation method to develop a comprehensive framework for textbook comparison. Through a systematic literature review and a two-round Fuzzy Delphi method with 155 textbook experts, a textbook comparison framework with four indicators (structure, content, expectations, and language) was developed. Additionally, some of the developed framework indicators and sub-indicators could be relevant for comparing textbooks in a particular discipline. For example, the page count sub-indicator was proven to be useful for comparing humanities and social science textbooks but not natural science textbooks. The findings of this study could facilitate the process of comparing textbooks, hence promoting the understanding of knowledge design and acquisition in different contexts, such as when comparing textbooks from different countries.


Introduction
A textbook is defined as "a book that teaches a particular subject and that is used especially in schools and colleges" [1]. As is evident from this definition, textbooks are resources for teachers and students to perform teaching and learning activities aligned with a given curriculum. A typical textbook comprises organized units of knowledge in the form of text, tables, illustrations, and exercises for assessment. Textbooks are created by authors and publishers by adopting frameworks for how knowledge of a specific domain could be represented, like vocabulary analysis [2]. Chang and Windeatt [3] looked into the impact of a Moodle-based electronic textbook on collaborative writing skills and improvement of digital literacy skills, and reported better results with respect to pedagogical and affective characteristics from e-textbook adoption. Textbooks bridge educational planning and classroom activities, with a potentially significant influence on students' learning outcomes [4]. Masango, Van Ryneveld, and Graham [5] looked into the advantages and disadvantages of electronic textbooks in public schools of South Africa and revealed that most of them consider such textbooks useful. Open textbooks available as open educational resources have been revealed to have a positive influence on the perception of teacher educators [6].

Importance of Comparing Textbooks
Although educational systems vary among different countries, all of them rely on textbooks to provide students with the best learning opportunities [7,8]. Specifically, due to their differing historical, cultural, and financial backgrounds, each country has developed its own unique educational system, and this has led to different learning experience designs, including curriculum design and textbooks [9]. Therefore, analyzing and comparing textbooks from different countries could help to better understand various education systems worldwide as well as the pedagogical methodologies followed by each country [7,10-12]. For example, in a textbook comparison study investigating the developmental course of the concept of fractions, Yang, Reys, and Wu [13] claimed that, compared to those from Singapore and Taiwan, textbooks from the United States are more likely to clarify problems in real-life settings, which is highly recommended by professional organizations. Ryu, Jeon, and Paik [14] investigated the perceptions of pre-service teachers majoring in chemistry education through an analysis of chemistry textbooks and concluded that textbook descriptions require improvement so that teachers' ignorance of science concepts is reduced. Similarly, Takeuchi and Shinno [15] concluded that, with respect to symmetry and transformations demonstrated in the lower secondary mathematics textbooks, England's textbooks are better connected with either real-life or other applications, whereas Japan's textbooks put more emphasis on geometric proofs. Mogias, Boubonari, and Kevrekidis [16] compared science textbooks related to the Ocean Literacy Framework in relation to textual and pictorial materials and recommended cooperation between marine scientists and educators with textbook authors and curriculum designers. Zheng [17] carried out a comparative analysis of Chinese and American native language textbooks and reported that the integration of big data technology into Chinese and American mother-tongue teaching has a positive impact on the quality of teaching. Textbooks are a significant carrier of cultural perceptions.
Textbooks serve important purposes other than spreading knowledge [18]. They incorporate social and cultural values as well as political ideologies [7,19]. Textbooks may be designed based on each country's social, cultural, and political uniqueness [19]. For example, Lee and Collins [20] analyzed textbooks with respect to social roles. After reviewing 20 recommended English textbooks from Australia and Hong Kong, they noted that there was an unbalanced distribution of social and domestic roles between males and females depicted in those textbooks. Specifically, in textbooks from both Australia and Hong Kong, women are confined to limited and more traditional occupations (e.g., nurses, teachers, secretaries, and fashion designers) and are more likely to be involved in domestic chores (e.g., childcare). Men tend to be portrayed in both physically and mentally demanding occupations (e.g., farmers, hunters, investors, and doctors) [20]. Therefore, comparative studies of international textbooks could contribute to a better understanding of the society where the educational system is operating, in addition to prevailing stereotypes.
Additionally, to some extent, textbooks highlight what types of skills are highly valued by a society [12,21]. Moreover, the frequency and intensity required by textbooks for training skills partly determine the learning opportunities that the society provides for its students [22]. For instance, researchers suggested that textbooks would be beneficial to students if they could approach and learn from experimental raw data instead of educationally processed tables or figures [23]. Zhou [2] conducted a vocabulary analysis of English major textbooks and revealed that an increased frequency of a new word leads to an increase in learning opportunities for the students. Textbooks have been proven to be a great learning resource for students' ability and achievement [24]. Consequently, textbook comparison at the international level could be used to understand the possible reasons for students' differing performances in various countries [25-27].
Finally, through cross-national comparative studies, people can identify the advantages and disadvantages of textbooks of their own country [13,15], which could potentially set the vision for future improvements [25]. Furthermore, with the new wave of open education, several universities have begun publishing their textbooks as open educational resources (OER) to reduce financial costs and to facilitate knowledge sharing [27,28]. However, since educational curricula and textbooks differ from one country to another, it is difficult to reuse open textbooks in different contexts (and, by implication, countries) despite being open and free [28]. Therefore, analyzing and comparing textbooks across different countries could help to identify the similarities and differences of textbooks of those countries [29], which further contributes to designing universal open textbooks [30].

Research Gap and Study Focus
Due to the aforementioned reasons, comparing textbooks has gained vast research attention [15,31-33]. However, to the best of our knowledge, most of the extant research has evaluated and compared textbooks in terms of limited indicators of structure [34], content [23,32,35], learning expectations, and/or language analysis [20,36]. For example, to compare the presentation of fractions in textbooks of fifth and sixth grades, Yang et al. [31] mainly examined problems and exercises. Valverde et al. [7] claimed that textbooks should not only present knowledge in a well-structured way but also clarify what goals students are intended to achieve with this knowledge. Hence, they introduced three main indicators to evaluate a textbook: content, structure, and performance expectations [7]. While some indicators have been fully discussed, others have been paid less attention or have been neglected. For example, a body of research did not take factors important to modern society and education into account, such as technology. Finby et al. [23] treated videos embedded within two introductory biology textbooks as irrelevant items and, therefore, did not examine them. Furthermore, few researchers have explored the frequency with which their examined textbooks were updated, which implies that they have no understanding of whether the contents of textbooks are supported by the latest research. This insufficient research and lack of examination regarding significant indicators may have resulted in their comparative frameworks being incomplete and outdated. To contribute to the extant literature, this study aims to provide a comprehensive framework by considering several indicators for textbook comparison.
Additionally, most textbook comparison frameworks were predominantly based on the literature without any expert validation [31,32,37]. For instance, Kar et al. [22] developed a framework of problem analysis within textbooks by including indicators used in previous studies without conducting any expert validation of the framework. Similarly, in a study concerning two introductory biology textbooks, Finby et al. [23] classified figures and tables within textbooks without going through any validation process. Conversely, Sullivan and Benke [38] developed and validated their framework in a three-step process. Although they invited experts to offer comments during the validation process, less literature was consulted when determining indicators [38]. Therefore, this study applies the triangulation method by combining data from both experts and literature to provide a reliable framework for textbook comparison.
Furthermore, most frameworks designed for textbook comparison were concerned with specific disciplines, particularly mathematics. For instance, the Trends in International Mathematics and Science Study (TIMSS) textbook analysis only characterizes mathematics and science [7]. The analysis framework developed by Zhu and Fan [39] was also intended for examining problem types in mathematics textbooks. Conversely, the rest of the extant research paid attention to different subjects. For example, the framework built by Sullivan and Benke [38] was intended to compare introductory financial accounting textbooks. Moreover, Simon and Budke [33] created a task analysis framework targeted to geography textbooks to understand how comparison competency could be enhanced. To summarize, few established frameworks are suited for comparing textbooks of diverse subjects. To address this research gap, our study sought to construct a framework that could be validated in and applied to multiple disciplines.
Based on the above background, this study aimed to construct a comprehensive framework that could be applied to compare textbooks from various disciplines. To this end, a systematic literature review was first conducted to build the textbook comparison framework. A two-round Fuzzy Delphi method was then applied with international experts to validate this framework. Specifically, this study aims to answer the following research questions:
RQ1. What indicators are relevant for textbook comparisons?
RQ2. What indicators are best fitted to compare textbooks within a specific academic discipline?

Methodology
To construct and validate the theoretical framework for comparing textbooks, this study used two methods, namely a systematic literature review and the Fuzzy Delphi method. Specifically, the first step was to collect information about the possible textbook comparison frameworks through a comprehensive review to build our textbook framework. Then, a two-round Fuzzy Delphi method was used to further increase the validity of the built framework. In this context, several international textbook experts were invited to review and validate the textbook comparison framework. Figure 1 presents the research methodology followed in this paper. Each of the two methods is discussed in the following sections.

Systematic Literature Review
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to produce this systematic literature review. PRISMA provides a standard peer-accepted methodology that uses a guideline checklist, which was followed in this study.

Search Strategy and Selection Criteria
A systematic literature review was conducted to collect data about textbook comparison frameworks. To contend with the complex topic, an extensive search for research papers and articles was conducted in two well-known electronic databases: Web of Science and Scopus. The search was based on the following search strings:
Search string: ((textbook substring) AND (comparison substring) AND (framework substring))
textbook substring: textbook
comparison substring: comparison
framework substring: frameworks OR models OR guidelines
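The way the three substrings combine can be illustrated with a short Python sketch. The helper name `build_query` is hypothetical, and database-specific field tags are omitted because the source does not state them; this only shows the OR-within-group, AND-between-groups structure of the search string.

```python
def build_query(*groups):
    """Join terms with OR inside each group and AND between groups."""
    clauses = []
    for terms in groups:
        # Quote multi-word terms so they are searched as phrases.
        rendered = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(rendered) + ")")
    return " AND ".join(clauses)


# Substrings as listed in the search strategy.
textbook_terms = ["textbook"]
comparison_terms = ["comparison"]
framework_terms = ["frameworks", "models", "guidelines"]

query = build_query(textbook_terms, comparison_terms, framework_terms)
print(query)
```

Keeping each substring as its own list makes it easy to add synonyms to one group without touching the others.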
Additionally, since several significant international projects have already compared textbooks, such as TIMSS and PIRLS (Progress in International Reading Literacy Study), this study expanded its search strategy beyond the two scientific databases to identify the reports conducted by these projects. To ensure reliability, we only included published journals and books during the search.
After obtaining the relevant articles from the two databases, two researchers analyzed the search findings by their titles, abstracts, and full text based on pre-defined inclusion and exclusion criteria, as presented in Table 1. It should be noted that this study primarily focused on printed textbook comparison for two reasons. First, it is difficult to use one framework to compare both printed and electronic textbooks, as each has different formats, design requirements, and standards. Second, despite the fact that electronic textbooks have been adopted in some countries, most countries still view printed textbooks as essential and frequently used educational materials in their educational systems [40-42]. Therefore, understanding the similarities and differences of printed textbooks in different countries is a crucial task.

Table 1. Inclusion and exclusion criteria.
Inclusion Criteria | Exclusion Criteria
Papers written in English | Papers not written in English
Papers that discussed textbook comparison | Papers that discussed textbooks in general
Papers that focused on printed textbook comparison | Papers that focused on other textbooks (e.g., e-textbooks)
Papers that provided a detailed description of the indicators used to compare textbooks | Papers that did not provide enough details about the indicators used for textbook comparison

Selected Papers and Quality Assessment
This research corpus search generated a total of 657 articles, and 4 additional papers were further identified by the authors via screening the references of the identified articles. After removing duplicated papers, 453 papers remained. Then, 381 papers were removed based on a title and abstract screening. The remaining 72 papers were considered and assessed based on their full texts. Of these, 39 papers did not adhere to the inclusion criteria. Thus, a total of 33 publications were analyzed, which included 29 journal articles, 1 report from an international project led by TIMSS, and 3 books. Figure 2 presents the full results of the selection process using PRISMA guidelines [43].
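The screening counts above can be checked with simple arithmetic. The following Python sketch uses only the figures reported in this section; the variable names are illustrative.

```python
# Counts reported for the selection process.
after_deduplication = 453
excluded_on_title_abstract = 381
excluded_on_full_text = 39

# Papers that proceeded to full-text assessment.
full_text_assessed = after_deduplication - excluded_on_title_abstract

# Publications retained for analysis.
included = full_text_assessed - excluded_on_full_text

# Breakdown of the retained publications by type.
included_by_type = {
    "journal articles": 29,
    "international project report": 1,
    "books": 3,
}

print(full_text_assessed, included, sum(included_by_type.values()))
```

The three printed values confirm that the full-text pool, the final corpus, and the breakdown by publication type are mutually consistent.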
In this study, four criteria were used to evaluate the overall quality of each reviewed paper, each focusing on a different quality issue. Each quality criterion (QC) is a Yes/No question corresponding to a score of 1 or 0, respectively:
QC1: Did the study report the sources and details of the outcome assessment?
QC2: Did the study compare its reported results with previous results?
QC3: Did the study conduct validity or reliability tests during quantitative analysis?
QC4: Did the study involve a statistical analysis of significance during the quantitative assessment?
The strategy for assessing the quality of the selected papers was to find the average score of the four criteria, an approach used by many other studies in the reviewed literature [44-46]. The quality scores of the reviewed papers were as follows: (a) Six studies involved a statistical analysis of significance during quantitative assessment (QC4), accounting for 18.18% of the total studies. Although many papers used a statistical method for instrument validation, they primarily focused on descriptive data for the results [10,13,47]. (b) Ten studies conducted validity or reliability tests during their quantitative analysis (QC3), accounting for 30.30% of the total studies. These 10 studies validated their inter-rater coding process, but none of them involved a validation test of the framework that they used to compare the textbooks. (c) Nine studies compared their results with other results (QC2), accounting for 27.27% of the total studies. This indicated that although some articles mentioned the comparison between their findings and previous studies' results, there were few descriptions of the similarities or differences among those findings. (d) Twenty-four studies reported the sources and details of their outcome assessments (QC1), accounting for 72.73% of the total studies. Therefore, this study can contribute to existing literature by using both descriptive and statistical analysis of significance during quantitative assessment to draw conclusions. Furthermore, a validation test of the constructed framework to compare textbooks is conducted in this paper. We also compared the results of our paper to several previous studies in detail by describing the similarities and differences between what we discovered and their findings.
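As a minimal sketch of the scoring described above, the following Python snippet computes a per-paper quality score as the average of the four Yes/No criteria, and the share of the 33 reviewed papers satisfying each criterion. The counts come from this section; the example paper and the function name `quality_score` are hypothetical.

```python
TOTAL_PAPERS = 33

# Number of reviewed papers answering "Yes" to each criterion.
yes_counts = {"QC1": 24, "QC2": 9, "QC3": 10, "QC4": 6}

# Share of papers meeting each criterion, as a percentage rounded to two decimals.
shares = {qc: round(100 * n / TOTAL_PAPERS, 2) for qc, n in yes_counts.items()}


def quality_score(answers):
    """Average of the Yes (1) / No (0) answers for one paper."""
    return sum(answers.values()) / len(answers)


# Hypothetical paper that satisfies QC1 and QC3 only.
example_paper = {"QC1": 1, "QC2": 0, "QC3": 1, "QC4": 0}

print(shares)
print(quality_score(example_paper))
```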

Final Obtained Textbooks Comparison Framework
After reviewing the 33 studies, three researchers used a card sorting method to identify the indicators used for comparing textbooks, then developed a comprehensive framework for comparing textbooks of different countries. This method serves to organize and improve the information architecture by classifying different categories of collected information, and it has been widely employed in many fields, such as psychology, robotics, software engineering, and website design [45,48,49]. All divergences were discussed until an agreement was reached: the initial agreement ratio was approximately 90%, and consensus was reached after discussions. To further validate the developed framework, many international experts were invited to offer their insights about it, as discussed in the next section.

Fuzzy Delphi Method
In many studies, questionnaires are common instruments to gather individuals' perceptions and behaviors in relation to a specific topic [48]. In this context, a two-round Fuzzy Delphi survey with international textbook experts was conducted. The Fuzzy Delphi is a more advanced version of the Delphi method, in that it utilizes triangulation statistics to determine the distance between the levels of consensus within the expert panel. Inspired by many textbook evaluation checklists [50,51], a questionnaire for this textbook comparison framework validation was produced.
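To make the distance-based consensus idea concrete, the following Python sketch shows one common way a Fuzzy Delphi analysis is carried out. Everything specific here is an assumption, not taken from this study: the mapping of the 5-point scale to triangular fuzzy numbers, the distance threshold of 0.2, the 75% consensus share, the 0.5 cut-off for the defuzzified score, and the example ratings.

```python
from math import sqrt

# Assumed mapping from 5-point Likert ratings to triangular fuzzy numbers.
FUZZY_SCALE = {
    1: (0.0, 0.0, 0.2),
    2: (0.0, 0.2, 0.4),
    3: (0.2, 0.4, 0.6),
    4: (0.4, 0.6, 0.8),
    5: (0.6, 0.8, 1.0),
}


def fuzzy_delphi(ratings, d_max=0.2, min_share=0.75, alpha_cut=0.5):
    """Evaluate expert consensus on one item (thresholds are assumptions)."""
    fuzzy = [FUZZY_SCALE[r] for r in ratings]
    n = len(fuzzy)
    # Average fuzzy number across all experts.
    mean = tuple(sum(f[i] for f in fuzzy) / n for i in range(3))
    # Distance between each expert's fuzzy rating and the average.
    distances = [
        sqrt(sum((mean[i] - f[i]) ** 2 for i in range(3)) / 3) for f in fuzzy
    ]
    share_within = sum(d <= d_max for d in distances) / n
    # Defuzzified (crisp) score of the item.
    crisp = sum(mean) / 3
    accepted = share_within >= min_share and crisp >= alpha_cut
    return {"share_within": share_within, "crisp": crisp, "accepted": accepted}


# Hypothetical ratings from eight experts for a single indicator.
result = fuzzy_delphi([5, 4, 4, 5, 4, 3, 5, 4])
print(result)
```

In this hypothetical panel, seven of eight experts fall within the distance threshold, so the item would be retained under the assumed rules.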
In this study, 207 experts belonging to the authors' professional networks were approached, of whom 155 volunteered to participate in both rounds of the Delphi study. To ensure reliable answers that covered different perspectives of textbook comparison, these experts were chosen for their different involvements with textbooks, including teachers, publishers/editors, textbook inspectors, education administrators, principals, teacher trainers, and instructional designers. These experts were also from different regions, including Asia, Europe, Africa, and South and North America. Despite the experts being carefully chosen for this study, they were further asked to rate their familiarity with textbooks on a scale from 1 to 5 (where 1 is not familiar, and 5 very familiar). The obtained mean value was 4.16, demonstrating a high level of expertise and appropriateness for this study.
The data were collected from the Delphi survey over three months to give the experts enough time to complete their detailed responses. To ensure high-quality feedback from the Fuzzy Delphi method, experts were asked to keep certain factors in mind [45,52], including the readability of their feedback, the importance of disregarding personal political opinions, etc.
In the first round of the survey, the questionnaire asked the experts to (a) review and validate the four indicators and sub-indicators within each indicator on a scale from 1 to 5 (where 1 was "strongly disagree," and 5 "strongly agree"); (b) further enhance the name and definitions of each indicator and sub-indicator, if needed; and (c) add potential indicators and sub-indicators based on their expertise that had not been identified during the literature review. Specifically, the questions that the experts were asked included (a) "Based on your experience, please rate the indicators and sub-indicators of the framework;" (b) "Do you have any suggestions for the definitions of indicators that you think are inappropriate? Please elaborate;" and (c) "Are there any other indicators that you think are important to include in the textbook comparison framework? Please elaborate." In the second round, after obtaining the final textbook comparison framework (from the literature and the first round), the experts were asked to answer the following questions: (a) "Identify the comprehensiveness of the framework and its parts (i.e., overall, structure, content, language, and learning goal analysis) from 1 to 5, where 1 corresponds to 'strongly disagree' and 5 to 'strongly agree;'" and (b) "Select the most relevant indicators described in the framework to compare textbooks from a given academic discipline, among the five disciplines (i.e., humanities, social sciences, natural sciences, formal sciences, and professional and applied sciences) identified by Wu et al. [53]."

Results
The obtained results are structured according to each research question and are discussed in detail in the following sections.

Relevant Indicators for Textbook Comparison
Based on the analysis of the experts' inputs during the first round of the Fuzzy Delphi method, 82.6% of the experts agreed or strongly agreed with the comprehensiveness of the framework we built to compare textbooks. Specifically, 69.1%, 86.5%, 76.3%, and 85% of the experts agreed or strongly agreed that the four main indicators of the framework (structure, content, language, and learning goal analysis, respectively) were comprehensive. However, experts raised some concerns and suggestions in response to the open-ended questions of the questionnaire. For example, with respect to the structure analysis, experts suggested considering illustrations since textbooks might be figure- and/or table-oriented. Additionally, some experts highlighted the importance of qualitative analysis, particularly to consider how chapters are structured and connected with other chapters. They further proposed that lessons within textbooks should be measured not only by number but also in terms of their connectivity. Therefore, we added the indicator called "chapter/unit organization" to examine the same topic being discussed across multiple countries. Several experts further highlighted that the readability (i.e., the complexity of texts) of textbooks should also be examined in order to understand whether textbooks are language-appropriate for a specific group of students. Furthermore, experts recommended examining the gender sensitivity of textbooks, which reflects social perceptions of gender. Finally, several experts suggested renaming the indicator expectation analysis to learning goal analysis. Considering the responses provided by experts through the first-round Delphi questionnaire, the framework was revised.
After revising accordingly, the experts were again asked to review the comprehensiveness of the framework. The descriptive statistics (mean, median, and standard deviation (SD)) were calculated to depict the distribution and central tendency of our collected responses. Table 2 presents the means of the comprehensiveness of the framework overall and of its structure, content, language, and learning goal analysis, which were 4.28, 4.17, 4.27, 4.14, and 4.18, respectively. This demonstrates that the experts found the revised framework very comprehensive in the second round. Furthermore, the obtained SDs (0.689, 0.722, 0.696, 0.707, and 0.707) were small, which means that the experts had similar views about the comprehensiveness of this framework. The validated textbook framework with the definition of its indicators and sub-indicators is described in the following sections.

Structure Analysis
The structure of a textbook is defined as how various parts of the contents are combined and sequenced [7]. This not only influences the instructional sequences delivered in classrooms but also affects students' learning processes and effectiveness. Thus, through structure analysis, the underlying pedagogical techniques of a textbook could be examined. Readers of a textbook can understand its structure by examining both macrostructure (i.e., the universal features of textbooks) and microstructure (i.e., lessons intended for a topic) [7]. The "Structure Analysis" indicator covers the following sub-indicators:

Page Count and Word Count
The number of pages of a textbook is an essential feature since it depicts the scope of the text [7]. A higher number of pages reflects either additional content covered or a greater depth of coverage. However, this indicator is not comprehensive enough to capture the coverage of textbooks because it neglects variation in page characteristics, such as page size [7]. Therefore, another indicator, word count, is introduced as a confirmation [30].
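As a rough illustration of how page count and word count can complement each other, the following Python sketch computes both, plus words per page, from page-level plain text. The function name `text_metrics`, the input format, and the sample pages are hypothetical; the regular-expression tokenization is a simplification that suits space-delimited languages only.

```python
import re


def text_metrics(pages):
    """Return page count, word count, and average words per page."""
    # Count runs of word characters on each page (simplified tokenization).
    word_counts = [len(re.findall(r"\b\w+\b", page)) for page in pages]
    total_words = sum(word_counts)
    return {
        "page_count": len(pages),
        "word_count": total_words,
        "words_per_page": total_words / len(pages) if pages else 0.0,
    }


# Hypothetical two-page excerpt.
sample_pages = [
    "A fraction names part of a whole.",
    "Compare one half and one third.",
]
metrics = text_metrics(sample_pages)
print(metrics)
```

Words per page gives a simple way to notice when two textbooks with similar page counts differ in text density, for example because of page size.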

Illustration Count
In addition to written words, textbooks also utilize illustrations, which consist of figures and tables, to clarify their contents [45,54]. The illustration count reflects the degree to which a textbook relies on this kind of visual content [7].

Chapter/Unit Organization
Chapter/unit refers to a topic covered in textbooks to which several lessons could possibly be devoted [7]. To compare the same topic across different countries, we not only count how many lessons are designed for it but also attempt to understand coherence between lessons. Figure 3 summarizes the indicators and sub-indicators of the structure analysis.

Content Analysis
Content is central to textbooks, instructing teachers what is age-appropriate learning content and informing students of what they are expected to master [7].In this case, content analysis is identified as how various content is presented and how content-specific pedagogies are performed in the textbook [7].The "Content Analysis" indicator covers the following sub-indicators:

Complexity of Exercises
An exercise is defined as an activity that students need to complete either independently or with teachers and/or peers. In our study, exercises are measured not only by their quantity but also by their quality in terms of working strategy (i.e., individual, cooperative, or collaborative) and type of cognitive expectation. As students are expected to employ knowledge and various skills to solve exercises, these cognitive requirements can be categorized into six levels: "procedural knowledge, conceptual knowledge, representation, problem solving, reasoning, and problem posing" [22]. Exercises requiring procedural knowledge usually expect students to employ operations or algorithms, whereas exercises emphasizing conceptual knowledge may require students to clarify the meaning of a concept. In exercises concerned with representation, solutions are typically in the form of pictures, tables, or interpretations. Exercises involving problem solving are set in daily-life situations. In exercises related to reasoning, written explanations are generally required. In problem posing, students are expected to reformulate or model problems according to a given situation [22].

Motivational Factors
Motivational factors are concerned with how a textbook stimulates students' interest in learning. We listed four specific aspects: historical notes, biographical notes [55], examples, and storytelling. For example, students may be inspired by the biographies of scientists and consider them examples to follow. These four motivational factors are measured in terms of the frequency with which they appear.

Efficiency of Illustrations
Apart from written explanations of concepts, textbooks also present learning materials in another format: illustrations (i.e., figures and tables). Their relationships with the text are complex [54,56]. Accordingly, illustrations in the examined textbooks are coded according to their efficiency type: "representation, organization, interpretation, and/or transformation" [56]. Namely, if an illustration depicts the same thing that the text does or further explains the text, it is coded as "representation" or "interpretation," respectively. An illustration is coded as "organization" if it clarifies the procedures described in the text. Sometimes, illustrations are intended to enhance learners' memories, potentially by re-coding or relating pieces of information. Such illustrations are coded as "transformation" [56].

Technological Use
This aspect can be evaluated by two factors: technical use and technological support. Technical use is defined as the inclusion of material that helps with learning how to use specific tools, such as instructions for Microsoft Excel. Technological support, on the other hand, refers to technological resources introduced in the textbook, such as embedded website links or QR codes for educational purposes. Both are measured by their frequencies in a given textbook.

Depiction of Values
As mentioned earlier, other than spreading knowledge, textbooks also transmit social and cultural values [7,19]. In our examination, we focused on six types of values conveyed by textbooks: collectivistic, individualistic, traditional, religious, ethnic, and social role values. While collectivistic values are group-oriented and highly focused on social outcomes, individualistic values tend to be self-oriented and focused on self-outcomes [57]. Traditional values are defined as ideas that are considered momentous and worthy of being transmitted from one generation to another [58]. Religious values represent a set of beliefs regarding human behavior and the ultimate purpose of human beings' existence [59]. Ethnicity is defined by the National Education Association (n.d.) as "a socially constructed grouping of people based on culture, tribe, language, national heritage, and/or religion." Finally, social role values reflect a society's widely held perceptions of gender-related domestic and social roles. Following Lee and Collins's research [20], our study measures social role values by examining (a) the extent to which different genders are described in social settings and the professional areas in which they appear, and (b) the extent to which different genders are described in domestic settings.

Frequency of Textbook Updates
This indicator describes how up to date a textbook is, which potentially suggests whether its content is aligned with new research. It is measured by how many times the textbook has been updated within a given period. Figure 4 summarizes the indicators and sub-indicators of the content analysis.

Language Analysis
The linguistic features of a textbook not only imply the interpersonal attitudes of its author(s) toward its readers/learners, but also reflect perceptions that a society generally accepts [20,36]. The "Language Analysis" indicator covers the following sub-indicators:

Frequency of Different Personal Pronouns
Identifying the use of personal pronouns in the text is one of the most obvious ways to understand interpersonal relationships between authors and readers of textbooks [36]. This is because personal pronouns in writing generally represent the author's voice and position [60]. For example, the use of first-person pronouns (I and we) may indicate the author's personal involvement with the activity in the textbook [36]. Furthermore, the use of "I" may draw readers' attention to the activity and the authority of the author; the inclusive use of "we" serves to actively involve readers in the activity. Aside from first-person pronouns, second-person pronouns (you) were also explored in the framework. Authors typically use the pronoun "you" to address readers with a degree of authority and direct them towards the details presented in the text [36].
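As a rough illustration of how pronoun frequencies could be tallied, the following Python sketch counts "I", "we", and "you" in a passage. The grouping names and the tokenization rule are assumptions made for this example; they are not part of the validated framework, and the count does not distinguish inclusive from exclusive uses of "we."

```python
import re
from collections import Counter

# Illustrative pronoun groups (assumption: only the pronouns named in the text are counted).
PRONOUN_GROUPS = {
    "first_singular": {"i"},
    "first_plural": {"we"},
    "second": {"you"},
}

def pronoun_frequencies(text):
    """Count case-insensitive occurrences of each pronoun group in a passage."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(tokens)
    return {group: sum(counts[w] for w in words) for group, words in PRONOUN_GROUPS.items()}

sample = "We can now solve the equation. You should check your answer before we move on."
print(pronoun_frequencies(sample))
```

A simple token match such as this one would also count a lowercase Roman numeral "i" as a pronoun, so manual checking of the matches is advisable before comparing textbooks.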

Frequency of Imperatives Use
Sentences in the imperative form invite readers/students to get involved in tasks with diverse expectations. Rotman [61] distinguished two kinds of imperatives: exclusive and inclusive. While exclusive imperatives (e.g., "write" or "put") implicate readers as "scribblers," inclusive imperatives (e.g., "explain" or "prove") ask readers to be "thinkers" [36,61]. The examination of imperative usage (i.e., counting the number of each type) can reveal how the textbooks' authors expect their readers to master knowledge, either through actively reflecting or through being commanded to complete assignments.
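The counting step described above could be sketched in Python as follows. The two verb sets contain only the example verbs given in the text; they are placeholders, and treating a sentence as imperative whenever it begins with a listed verb is a simplifying assumption rather than the coding procedure used in this study.

```python
import re

# Seed verb lists taken from the examples in the text (assumption: a real study needs a fuller lexicon).
EXCLUSIVE = {"write", "put"}
INCLUSIVE = {"explain", "prove"}

def imperative_counts(text):
    """Count sentences that open with an exclusive or inclusive imperative verb."""
    counts = {"exclusive": 0, "inclusive": 0}
    # Naive sentence split on ., !, and ?
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z]+", sentence.lower())
        if not words:
            continue
        if words[0] in EXCLUSIVE:
            counts["exclusive"] += 1
        elif words[0] in INCLUSIVE:
            counts["inclusive"] += 1
    return counts

sample = ("Write the result in the table. Explain why the two angles are equal. "
          "Prove your claim. The sum is ten.")
print(imperative_counts(sample))
```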

Frequency of Sentence Types
This indicator investigates the semantic difficulty of sentences in textbooks. According to transformational grammar, the active-affirmative-declarative sentence type is the most basic type and is easier to understand and recall [62]. By contrast, other types, such as passive sentences, questions, and negative sentences, are transformations of the basic type and are more complicated [56].

Readability
This aspect relates to the complexity of the text and the ease of comprehension through reading [56]. To some extent, this reflects whether a textbook's language complexity is appropriate for a specific group of students [56]. Practically, Flesch Reading Ease and Flesch-Kincaid Grade Level [63] can be employed to detect a textbook's readability.
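For reference, the two readability measures named above can be computed from word, sentence, and syllable counts using their standard published formulas. The Python sketch below implements those formulas; the helper `rough_counts` is a hypothetical convenience function whose vowel-group syllable heuristic is an assumption and only approximates true syllable counts.

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def rough_counts(text):
    """Return (words, sentences, syllables) using crude heuristics (assumption)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    # Approximate syllables as the number of vowel groups, with at least one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return len(words), sentences, syllables

# Example with explicit counts: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))
print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Both functions assume a non-empty passage (at least one word and one sentence). For actual comparisons, a validated syllable counter would be preferable to the heuristic shown here.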

Gender Sensitivity
To understand social perceptions of gender, both "gender-inclusive constructions" and "order of appearance" are good measurements [20]. For example, some textbooks adopt masculine nouns (e.g., man) and pronouns (e.g., he and his) to refer to a person whose gender is unknown. Moreover, in using paired pronouns (e.g., he or she, and she/he), the order of the pronouns could reveal whether the society endorses conventional male supremacy, i.e., putting male pronouns first [20]. Figure 5 summarizes the indicators and sub-indicators of the language analysis.
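The "order of appearance" measurement could be approximated by counting paired-pronoun constructions by which pronoun comes first. The following Python sketch does this for a small set of patterns; the pattern lists are illustrative assumptions and do not reproduce the coding scheme of Lee and Collins [20].

```python
import re

# Illustrative paired-pronoun patterns (assumption: not an exhaustive list).
MALE_FIRST = re.compile(r"\b(?:he or she|he/she|his or her|him or her)\b", re.IGNORECASE)
FEMALE_FIRST = re.compile(r"\b(?:she or he|she/he|her or his|her or him)\b", re.IGNORECASE)

def pronoun_order_counts(text):
    """Count paired pronouns by which gender is mentioned first."""
    return {
        "male_first": len(MALE_FIRST.findall(text)),
        "female_first": len(FEMALE_FIRST.findall(text)),
    }

sample = ("Each student should bring his or her notebook. "
          "If he or she is absent, she/he must inform the teacher.")
print(pronoun_order_counts(sample))
```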

Learning Goal Analysis
Textbooks are sometimes written with learning goals, which state what students are expected to do with their content [7] and hence promote the likelihood of using these anticipated skills. These learning goals, based on the complexity of their demands on students' abilities, can be categorized into five levels: "(1) Understanding basic information/knowledge; (2) Understanding complex information/knowledge; (3) Theorizing, analyzing, and solving exercises; (4) Understanding the use of tools, routine procedures, and science processes and (5) Solving real-world problems" [7]. Figure 6 summarizes the indicators and sub-indicators of the learning goal analysis. To summarize the obtained findings, Figure 7 presents the final overall framework, with the indicators and sub-indicators for comparing textbooks.

Indicators Best Fitted to Compare Textbooks within a Specific Academic Discipline
Additionally, experts were asked to select the most relevant sub-indicators within the four main indicators (structure analysis, content analysis, language analysis, and learning goal analysis) described in the framework to compare textbooks from a given academic discipline (i.e., humanities, social sciences, natural sciences, formal sciences, and professional and applied sciences [53]). Figures 8-11 indicate how many responding experts were in favor of employing these sub-indicators as textbook comparison criteria for each discipline. For example, 98 and 79 experts considered page count to be a significant sub-indicator for comparing textbooks in the humanities and social sciences domains, respectively (see Figure 8). However, only 43 experts supported considering page count when comparing professional and applied sciences textbooks. These relative differences suggest which sub-indicators are more crucial in comparing textbooks from a given discipline.

Moreover, some sub-indicators within the content analysis indicator that were deemed significant in all five disciplines, such as frequency of textbook update (see Figure 9), were included in each personalized, subject-based framework.
Figure 10 indicates that all language analysis sub-indicators were significant in two disciplines in particular, namely humanities and social sciences.
Finally, the learning goal analysis indicator was significant (with slight differences) across all disciplines, as indicated in Figure 11.
Therefore, based on the experts' responses, the following sub-indicators should be considered for comparing textbooks from a given academic discipline:

Discussion
Teachers and students depend heavily on textbooks to achieve learning outcomes [7]. Therefore, analyzing textbooks can identify their unique features and effectiveness [64]. Additionally, cross-national or cross-cultural comparative studies of textbooks reflect the unique dimensions of different societies or cultures [15]. However, such analysis and comparison require a validated and comprehensive framework, which has been missing from previous literature [38]. Therefore, this study developed a framework for comparing textbooks through a systematic literature review and a two-round Fuzzy Delphi survey. Experts participating in this study validated the inclusion of four indicators (structure, content, language, and learning goal analysis).
Concerning structure, our findings highlighted several sub-indicators, including page count, word count, illustration count, and chapter/unit organization. Studies in the reviewed literature primarily focused on analyzing the first three sub-indicators [23,34,65]. However, no study has focused on analyzing chapter/unit organization when comparing textbooks. Kim [66] pointed out that the same topic could be organized in different ways across different countries. It is therefore necessary to analyze how different chapters/units, including lessons, might be organized within a given textbook.
Concerning content, our findings highlighted several sub-indicators, including motivational factors, efficiency of illustrations, technological use, complexity of exercises, depiction of values, and frequency of textbook update. These sub-indicators were discussed in the literature from perspectives both similar to and different from our findings. Regarding motivational factors in textbooks, Rivers [55] mainly focused on historical notes and scientists' biographies. In addition to these two motivational factors, the framework developed in this study also included other motivational aspects, such as storytelling and examples, which could be used to stimulate students' interest in learning. In terms of efficiency of illustrations, our framework incorporated Mikk's [56] four efficiency types (i.e., representation, organization, interpretation, and/or transformation) to examine illustrations' relationships with texts. With respect to the technological use sub-indicator, no reviewed study focused on the integration of technological support (e.g., embedded URLs or QR codes) within textbooks. Therefore, our framework examines technological use by considering both technical use and technological support. To examine exercises provided by textbooks, Yang, Reys, and Wu [13] counted the number of problems and exercises and analyzed the types of cognitive expectations. In addition to these two aspects, the framework developed in this study also investigates the working strategy (i.e., individual, cooperative, or collaborative) advocated by a textbook. As textbooks can also convey social and cultural values, previous studies included the sub-indicator depiction of values by focusing on traditional values [19] and social role values [20]. In addition to these two value types, the framework of this study also covers individualistic, collectivistic, religious, and ethnic values. Furthermore, although textbooks should be updated frequently to ensure that their contents are aligned with the latest research developments, no studies had yet covered the sub-indicator frequency of textbook update in their frameworks.
Concerning language, our findings revealed several sub-indicators, including frequency of different personal pronouns, frequency of imperative use, frequency of sentence types, readability, and gender sensitivity. All five sub-indicators have been examined in separate studies [20,36,56,61]. However, to the best of our knowledge, no research has integrated these five sub-indicators. Hence, the framework developed in this study provides a comprehensive way to examine the linguistic features of a textbook.
Finally, concerning learning goals, our findings are aligned with the five sub-indicators proposed by Valverde et al. [7]: "understanding basic information/knowledge; understanding complex information/knowledge; theorizing, analyzing, and solving exercises; understanding the use of tools, routines, procedures, and science processes; and solving real-world problems." These five sub-indicators, from basic to complicated, reflect what levels of knowledge and skills students are expected to achieve.
After validating the indicators and sub-indicators for comprehensively comparing textbooks, we further asked the experts to select the most relevant indicators and sub-indicators for comparing textbooks from a specific academic discipline (i.e., humanities, social sciences, natural sciences, formal sciences, and professional and applied sciences). Our results highlighted that while some indicators and sub-indicators, namely chapter/unit, frequency of textbook update, and learning goal analysis, were considered relevant to all five academic disciplines, others were best fitted to compare textbooks within a specific academic discipline (e.g., page count as a textbook comparison criterion for only the humanities and social sciences domains). Based on our results, five personalized frameworks, one for each academic discipline, were derived. Importantly, the development of these subject-based personalized frameworks complements and advances previous textbook comparison research, most of which was concerned with specific disciplines, such as mathematics [7,33,38,39].

Conclusions, Implications, and Future Directions
This study developed and validated a framework for comparing textbooks using the triangulation method, based on a systematic review and a two-round Fuzzy Delphi method with 155 experts. The findings of this study could contribute to a better understanding of textbooks from cross-national and cross-discipline perspectives. Specifically, the obtained comprehensive framework for comparing textbooks could help researchers and practitioners understand similarities and differences between textbooks from different countries or disciplines, hence contributing to a better design of textbooks for better learning experiences and outcomes. This can promote the design of open textbooks that could be used across different countries.
This study has several limitations that should be acknowledged and addressed in further research. For example, the studies obtained in the systematic review were limited by the search keywords and the electronic databases used. Despite these limitations, this study provides a solid foundation for textbook comparison and design. Future research should focus on choosing textbooks (for a specific discipline and grade) from different regions (e.g., Asia, Europe, and Africa) and comparing them based on the framework developed in this study for a better understanding of how these textbooks differ across regions.

Figure 1. Research methodology of this study.

Figure 2. Flowchart of the Comprehensive Selection Process.

Figure 3. Indicators and sub-indicators of the structure analysis.
, our study measures social role values by examining (a) to what extent are different genders described in social settings and in what areas are they professional, and (b) to what extent are different genders described in domestic settings.

Figure 4. Indicators and sub-indicators of the content analysis.

Figure 5. Indicators and sub-indicators of the language analysis.

Figure 6. Indicators and sub-indicators of the learning goal analysis.

Sustainability 2022

Figure 7. Final validated framework for comparing textbooks.

Figure 8. Distribution of experts' answers concerning the sub-indicators of structure analysis based on academic discipline.

Figure 9. Distribution of experts' answers concerning the sub-indicators of content analysis based on academic discipline.

Figure 10. Distribution of experts' answers concerning the sub-indicators of language analysis based on academic discipline.

Figure 11. Distribution of experts' answers concerning learning goal analysis based on academic discipline.

Table 2. Experts' identifications of the comprehensiveness of each part of the framework.

• Humanities: page count, word count, chapter/unit, motivational factors, depiction of values, frequency of textbook update, frequency of different personal pronouns, frequency of imperative use, frequency of sentence types, readability, gender sensitivity, and learning goal analysis.
• Social Sciences: page count, word count, chapter/unit, motivational factors, frequency of textbook update, frequency of different personal pronouns, frequency of imperative use, frequency of sentence types, readability, gender sensitivity, and learning goal analysis.
• Natural Sciences: illustration count, chapter/unit, complexity of exercises, efficiency of illustrations, frequency of textbook update, and learning goal analysis.
• Formal Sciences: chapter/unit, complexity of exercises, motivational factors, efficiency of illustrations, technological use, frequency of textbook update, and learning goal analysis.
• Professional and Applied Sciences: chapter/unit, technological use, frequency of textbook update, and learning goal analysis.
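For readers who wish to apply the discipline-specific lists programmatically, they could be encoded as a simple lookup structure. The Python sketch below transcribes the five lists above into sets and derives the sub-indicators shared by all disciplines; the names `FRAMEWORKS`, `common_sub_indicators`, and `disciplines_using` are hypothetical and introduced only for this illustration.

```python
# Sub-indicators per discipline, transcribed from the lists above.
LANGUAGE = {
    "frequency of different personal pronouns", "frequency of imperative use",
    "frequency of sentence types", "readability", "gender sensitivity",
}
SHARED = {"chapter/unit", "frequency of textbook update", "learning goal analysis"}

FRAMEWORKS = {
    "Humanities": SHARED | LANGUAGE | {
        "page count", "word count", "motivational factors", "depiction of values"},
    "Social Sciences": SHARED | LANGUAGE | {
        "page count", "word count", "motivational factors"},
    "Natural Sciences": SHARED | {
        "illustration count", "complexity of exercises", "efficiency of illustrations"},
    "Formal Sciences": SHARED | {
        "complexity of exercises", "motivational factors",
        "efficiency of illustrations", "technological use"},
    "Professional and Applied Sciences": SHARED | {"technological use"},
}

def common_sub_indicators():
    """Sub-indicators that appear in every discipline-specific framework."""
    return set.intersection(*FRAMEWORKS.values())

def disciplines_using(sub_indicator):
    """Disciplines whose framework includes the given sub-indicator."""
    return [name for name, items in FRAMEWORKS.items() if sub_indicator in items]

print(sorted(common_sub_indicators()))
print(disciplines_using("page count"))
```

The intersection computed here corresponds to the three sub-indicators reported as relevant to all five disciplines: chapter/unit, frequency of textbook update, and learning goal analysis.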