Chatbot Technology Use and Acceptance Using Educational Personas

Abstract: Chatbots are computer programs that mimic human conversation using text, voice, or both. Users' acceptance of chatbots is strongly influenced by their persona. As users interact with chatbots, they develop a sense of familiarity that makes the technology more approachable, fostering favorable opinions and encouraging further interaction. In this study, we examine the moderating effects of persona traits on students' acceptance and use of chatbot technology at higher education institutions in the UK, using an Extended Unified Theory of Acceptance and Use of Technology (Extended UTAUT2). Data were collected through a self-administered questionnaire from 431 undergraduate and postgraduate computer science students, with a Likert scale used to measure the variables associated with chatbot acceptance. The gathered data were evaluated using Structural Equation Modelling (SEM) combined with multi-group analysis (MGA) in SmartPLS3. The estimated Cronbach's alpha values supported the reliability and validity of the findings. The results showed that the key factors influencing students' adoption and use of chatbot technology were habit, effort expectancy, and performance expectancy. In addition, grades and educational level were found not to moderate the correlations in the Extended UTAUT2 model. These results are important for improving user experience and have implications for academics, researchers, and organizations, especially in the context of native chatbots.


Introduction
Chatbots (also known as Conversational Agents, bots, IM bots, Smartbots, or Talkbots) are computer programs designed to simulate an intelligent conversation with one or more human users via auditory or textual methods using natural language. Well-known examples of chatbots are Apple Siri and Amazon Alexa. Typical functionalities include providing information about the weather, scheduling meetings, tracking flights, giving up-to-date news, and finding restaurants, to name a few. Chatbots can also be used as a powerful tool in education. They can work as a language tutor, as in the example known as Sofia [1]. Chatbots can also assist with the teaching of mathematics and help users to solve algebra problems, as with the Pari and Mathematica chatbots. In the study of medicine, chatbots help medical students by simulating patients and providing responses during interviews; an example of this type of chatbot is the Virtual Patient bot (VPbot).
This paper reports on part of a study that comprised three stages or iterations (Figure 1). The first iteration identified student groups at Brunel University London by building data-driven persona development models for university students. The outcomes of this stage were the persona template, persona model, and proposed data-driven development method [2]. The second iteration identified acceptable chatbot features by evaluating an extended UTAUT2 model. The third iteration will evaluate the effectiveness of the persona modeling approach by designing and developing a chatbot instantiation (future work).
This paper uses a persona lens in the form of persona elements to create an Extended UTAUT2 model. Each iteration followed a build-and-evaluate cycle to produce artifacts and to design the cycle steps [3].

A Persona Lens
The persona concept was coined in 1999 by Alan Cooper in Chapter 9 of his book The Inmates are Running the Asylum [4]. Personas have become conventional design methods that are widely used. However, there is no standard definition for 'persona'. The literature presents personas as user-centered design (UCD) methods that represent a group of users who share common goals, attitudes, and behaviors during interaction with a product [5,6]. Initially, growing concern for the usability of systems, websites, and products [7,8] increased the popularity of customer-centered design, often known as 'human-centered design', a style of design that involves consumers or users in the design process [7,8]. Although UCD has expanded rapidly, there is still considerable dissatisfaction with the design of current items. Many businesses have neglected to prioritize customer needs as the most important part of the design process [9]. As a result, a large number of design processes have failed to reach the intended customers or consumers [8,10]. Usability concerns with goods, systems, and websites have been thoroughly documented, indicating that current product design procedures need to be improved. Many products are returned because they are difficult to operate or because the users are unable to use the features they want [8].

With current UCD methodology, personas provide a solution to some of these issues. Cooper [4] developed the 'persona' notion as a design process methodology [11]. Personas are made-up archetypes of real users, not real individuals [11]. "A precise portrayal of a hypothetical user and what s/he desires to accomplish" neatly summarizes personas ([12], p. 1). A 'target customer characterisation', 'profile', or 'user archetype' is another term for a persona [13]. Persona development is a different way of representing and communicating the demands of people [8]. A great deal of research has been conducted on persona templates [14,15], creating personas [13,16,17], and determining what they are good for [4,13,17]. Persona development is becoming more popular as a design technique as it identifies the fundamental characteristics of consumers, which can be exploited in product design and marketing [17]. It is also a cost-effective solution to enhance users' experiences with products and services [13]. Furthermore, personas provide powerful representations of target users to product designers [8].
Personas are used to represent common clusters of traits [18,19], for example, to identify groups of students with similar characteristics or features. Personas also provide other advantages, including (1) gaining a deeper understanding of users; (2) determining early design needs; (3) aiding design thinking; (4) ensuring a focus on users' goals, requirements, and characteristics; (5) facilitating stakeholders' communication; and (6) considering political and social concerns in design decisions [6]. However, according to [12], there are several issues with persona development. One of these is the development of personas that are not based on first-hand evidence [20], which is not the case in this study. Personas can be unreliable if they lack a clear connection to the facts, such as when they are created by a committee [20]. Several studies cover best practices with personas [20,21]. Creating a persona can be challenging when it is not based on first-hand customer data [12,22], and in some cases, the sample size is statistically insufficient [12,22]. Data-driven personas were proposed by [12,23], for example, based on clickstreams [23,24] or statistical data [12,23]. Machine learning methods, more specifically K-means clustering, are used to build personas [2,18,19,25].
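Clustering students into persona groups, as in the studies cited above, is commonly done with K-means. The following pure-Python sketch illustrates the idea; the feature names (attendance, VLE activity, grade) echo the persona attributes used later in this paper, but the data values and the two-cluster setup are invented for illustration (the study itself derived eight personas).

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the centroid closest to p (squared Euclidean distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # each centroid becomes the mean of its cluster (kept if cluster is empty)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Illustrative student records: (attendance %, VLE logins/week, grade %)
students = [(90, 12, 78), (85, 10, 74), (40, 2, 52), (35, 3, 48), (88, 11, 80), (30, 1, 50)]
centroids, clusters = kmeans(students, k=2)
```

On this toy data the algorithm separates a highly engaged group from a disengaged one; each resulting centroid is the trait profile of a candidate persona.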

The Extended Unified Theory of Acceptance and Use of Technology (UTAUT2)
The acceptance and use of technology is both a popular and practical subject, resulting in several models being developed from theories within sociology and psychology [26,27]. Venkatesh et al. [27] developed the Unified Theory of Acceptance and Use of Technology (UTAUT) to synthesize existing acceptance theories and models and to study user acceptance and use of technology in an organizational context. UTAUT was developed as a result of reviewing eight main theories and models of technology acceptance: the Theory of Reasoned Action (TRA), the Technology Acceptance Model (TAM), the Motivational Model (MM), the Theory of Planned Behavior (TPB), Combined TAM and TPB, the Model of PC Utilization, Diffusion of Innovation Theory (DoI), and Social Cognitive Theory (SCT). UTAUT consists of four constructs, as shown in Figure 2, namely performance expectancy, effort expectancy, social influence, and facilitating conditions. These constructs affect the behavioral intention (BI) to use technology and its usage, and they are in turn moderated by four factors: age, gender, experience, and voluntariness of use [27].

UTAUT constructs are similar to constructs in other models. For example, performance expectancy (PE) and effort expectancy (EE) are similar to the TAM constructs Perceived Usefulness (PU) and Perceived Ease of Use (PEOU), respectively. Moreover, social influence (SI) is similar to the Subjective Norm (SN) in TRA, and facilitating conditions (FC) are similar to perceived behavioral control (PBC) in TPB. UTAUT is used in multiple sectors, such as e-government [28], online banking [29,30], and health/hospital IT [30]. However, it has received less attention than other existing models, and there are also criticisms related to its explanatory power and parsimony [31].
The UTAUT was extended by Venkatesh et al. (2012) [26] and named the Extended Unified Theory of Acceptance and Use of Technology (UTAUT2). UTAUT and UTAUT2 were developed for different environments: the former for an organizational context and the latter for a consumer context [26]. As well as the four constructs found in UTAUT (PE, EE, FC, and SI), UTAUT2 has three additional constructs: hedonic motivation (HM), price value (PV), and habit (HT), as shown in Figure 3. UTAUT has four moderators (experience, gender, age, and voluntariness of use), while UTAUT2 contains only the first three, without voluntariness of use.

From Persona Elicitation to Technology Acceptance
Earlier work, iteration 1, used K-means clustering to identify trait clusters [32]. Eight personas resulted, with the persona features shown in Figure 4, including demographic data (i.e., age and gender), educational data (i.e., level of study), virtual engagement (i.e., engagement with Virtual Learning Environments), physical engagement (i.e., attendance), and performance data (i.e., grade). Further features were added from the literature review to build the persona model (Figure 5). These persona elements were incorporated into the survey questions (see Appendices A and B) as well as into the proposed Extended UTAUT2 model (Section 1.4, Figure 6). To explain this further, persona design typically names and describes user archetypes with a mix of visual and narrative content (as seen in the Top Student example in Figure 5). Our prior work [32] added a more analytical approach to persona design, using K-means to uncover clusters (personas) and unique characteristics. Subsequently, when constructing an extended UTAUT2 model, we were able to select either the composite persona name or the unique attributes of the persona. Unique attributes were chosen as they enable further exploration of attribute importance. This added detail may also inform the persona design with a prioritized focus on the narrative around important factors.
The objectives of this study are as follows:
To investigate how specific student groups (modeled as personas) differ in their use and adoption of chatbot technology.
To comprehensively examine the UTAUT model and its extension, UTAUT2, in a variety of contexts, with an emphasis on their structures, moderators, and applications.
To identify the main determinants of students' acceptance and use of chatbot technology using UTAUT2.
To improve understanding in the area of technological acceptance and to inform decision-making processes by elucidating the factors influencing technology adoption and usage.
These objectives are essential for improving adoption strategies, boosting decision-making processes, and comprehending user variation in technology adoption. By deepening our understanding of user preferences and demands, this study examines UTAUT and UTAUT2 in a variety of scenarios. The understanding generated will direct the creation of efficient technology solutions that meet user requirements and promote higher adoption and utilization. Furthermore, decision-makers across sectors can benefit from insights into user approval, which can help them to formulate effective implementation strategies. Determining the factors impacting adoption aids in the improvement of adoption strategies, resulting in more seamless integration. This addition to the corpus of knowledge on technological acceptability will stimulate new research and creative thinking in the development and application of technology.

The Proposed Conceptual Model and Hypotheses
This section covers the design of the proposed Extended UTAUT2 model. UTAUT2 [26] is employed to examine how students use technology, in this case, the chatbot. It explains behavioral intention (BI) through seven constructs: performance expectancy, effort expectancy, social influence, facilitating conditions, hedonic motivation, price value, and habit. The moderators in UTAUT2 are age, gender, and experience. However, in this study, price value was excluded as the proposed chatbot was free to use. An expanded list of moderators, including the UTAUT2 moderators, was also included in the proposed model and tested in the evaluation phase. The moderator of the proposed model is a persona; K-means clustering analysis was applied to the students' data to build personas. Further information is provided in [2,32]. The results of the data analysis of the first iteration in [2] showed that there are seven main attributes of personas: age, gender, experience, physical engagement (attendance), virtual engagement (level of engagement with VLEs), educational level, and performance (grade). Figure 6 shows the proposed conceptual framework (Extended UTAUT2). Further discussion and justification for the research hypotheses are provided in the following subsections.

Performance Expectancy
Performance expectancy (PE) is defined as "the degree to which an individual believes that using the system will help him or her to attain gains in job performance" [27]. Prior research has identified PE as a significant predictor of BI [27,28].
Hypothesis 1 (H1): PE will have a positive effect on students' BI to use chatbots.

Effort Expectancy
Effort expectancy (EE) is defined as "the degree of ease associated with the use of the system" ([27], p. 450). EE and its latent variable have been shown to be significant in many research studies and proven to work as a predictor of user intention to adopt new technology [26,34,35].
Hypothesis 2 (H2): EE will have a positive effect on students' BI to use chatbots.

Social Influence
Social influence (SI) is defined as "the degree to which an individual perceives that important others believe he or she should use the new system" ([27], p. 451). SI was shown to be significant for specifying user intention to use technology in many studies [34,36,37].
Hypothesis 3 (H3): SI will have a positive effect on students' BI to use chatbots.

Facilitating Condition
Facilitating condition (FC) is defined as "the degree to which an individual believes that an organizational and technical infrastructure exists to support the use of the system" ([27], p. 453).
Hypothesis 4 (H4): FC will have a positive effect on students' BI to use chatbots.

Hedonic Motivation
Hedonic motivation (HM) is defined as "the fun or pleasure derived from using technology" ([26], p. 8). Studies have proven that HM plays a decisive role in determining technology acceptance and the use of technology [26,34,38].
Hypothesis 5 (H5): HM will have a positive effect on students' BI to use chatbots.

Habit
Habit (HT) as a construct in UTAUT2 [26] is defined in the information systems and technology context as "the extent to which people tend to perform behaviours (use IS) automatically because of learning" [39]. HT can be described in two ways: as a prior behavior [39] or as an automatic behavior [39,40]. HT has a direct and indirect effect on technology use, according to the UTAUT2 model [26,34].
Hypothesis 6 (H6): HT will have a positive effect on students' BI to use chatbots.

Behavioral Intention
Behavioral intention (BI) has been defined in prior research as a "function of both attitudes and subjective norms about the target behaviour, predicting actual behaviour" [41]. The strength of an individual's commitment to engage in particular activities can be assessed by their BI [42].
Hypothesis 7 (H7): BI will have a positive effect on students' use of chatbots.

The Moderating Effects of Personas on Technology Acceptance and Its Use
This study extends the moderators with more elements, which are now part of the persona moderator. An explanation of each moderator is provided below:
(i) Age: This is a moderator in UTAUT and UTAUT2. It has an impact on all seven core constructs that affect users' intention to use and use of technology [43]. This study tests whether age moderates the effect of the determinants on BI and the use of technology.
(ii) Gender: Like age, gender is a moderator in UTAUT and UTAUT2 and also has an impact on all seven core constructs that affect users' intention and use of technology [43]. This study will also test whether gender moderates the effect of the determinants on BI and the use of technology.
(iii) Experience: This is a moderator in the UTAUT and UTAUT2 models. It is defined as mobile internet usage experience [43]. In this study, the term experience represents prior experience of using chatbots such as Siri or Amazon Alexa (as exemplars). This study will test whether experience moderates the effect of the determinants on BI and the use of chatbot technology.
(vi) Educational level (year of study): This is a new moderator that represents the year of study for undergraduate students at Brunel University London. It tests whether the year of study moderates the effect of the determinants on BI and the use of technology. This educational level moderator stemmed from our proposed model (Figure 4).
Hypothesis 13 (H13g1, g2, g3, g4, g5, g6): Educational level moderates the effects of PE, EE, SI, FC, HM, and HT on students' BI and use of chatbot technology.
(vii) Grade: This is a new moderator that represents the performance of the students, derived from our proposed model (Figure 4). It tests whether grade moderates the effect of the determinants on BI and the use of technology.
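The study itself tests these moderating effects with multi-group analysis in SmartPLS, as noted in the Abstract. As a conceptual illustration only, a moderation effect can also be probed by adding an interaction term to a regression: if the coefficient on x·g is nonzero, group membership g (e.g., a persona trait) changes the slope of a determinant x (e.g., PE) on intention y. The noise-free data and variable names below are invented; this is not the analysis the paper performed.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(c1, c2)) for c2 in cols] for c1 in cols]
    Xty = [sum(a * b for a, b in zip(c, y)) for c in cols]
    return solve(XtX, Xty)

# Design matrix rows: [intercept, x, g, x*g]; g is a binary group indicator
data = [(x, g) for g in (0, 1) for x in (1, 2, 3)]
X = [[1, x, g, x * g] for x, g in data]
y = [1 + 2 * x + 3 * x * g for x, g in data]  # noise-free: slope differs by group
b0, b_x, b_g, b_xg = ols(X, y)
```

A nonzero interaction coefficient b_xg indicates that the slope of x on y differs across groups, which is the essence of a moderating effect; MGA reaches a comparable conclusion by estimating the path separately per group and comparing the coefficients.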

Sampling and Survey Administration
Before conducting the data collection, ethical approval was obtained from the ethics committee at Brunel University London. A survey was designed to achieve the aim of this study [44]. The survey was developed after reviewing state-of-the-art literature. It is important to carry out a pilot study before the actual data collection in order to test the validity and reliability of the survey and to improve the format, questions, and scales [45]. A pilot study establishes the ability to answer the proposed research question and provides face validity [46][47][48]. The sample size for a pilot study should be relatively small, a maximum of 100, according to [49]. In this case, a pilot study was carried out with 99 randomly selected computer science students. Some questions were updated and simplified following the participants' comments. It took five months to design and build the final version of the survey; it was revised and reviewed by an expert and the researcher after the pilot study.
The adopted and extended model is referred to as the Extended UTAUT2 (Figure 6). The survey aimed to gather data on students' acceptance and use of chatbots at Higher Education Institutions (HEIs). The survey was divided into two sections. The first section contained questions related to demographic data and the moderators; it also included some questions about the type of chatbots used, how long the participants had been using them, and their level of experience (Appendix A, Table A1). The second section contained questions related to the main determinants/constructs of UTAUT2, as mentioned in previous sections (PE, EE, SI, HM, FC, and HT), as well as BI and USE (Appendix B, Table A2). The questions were supported by references.
The survey was created using the University of Bristol's Bristol Online Survey tool, a free web-based survey tool. Participation was purely voluntary, and the participants were informed of the study's purpose as well as their freedom to withdraw at any time. They were also assured that their data would be confidential and that their identities would not be revealed. The survey took less than 8 min to complete on average. The chance to win one of ten GBP 20 Amazon vouchers was offered as a participation prize to motivate people to fill out the survey. A total of 431 students answered the survey, and all of the responses were complete. The Teaching Program Office (TPO) of the College of Engineering, Design and Physical Sciences sent weekly email reminders to undergraduate and postgraduate computer science students to complete the survey. The survey was password-protected so that it could be accessed only by the targeted respondents. All of the important questions were set as mandatory in order to guarantee that there were no missing data that would affect the data analysis, especially the analysis using SEM.
The scales used in this study were adapted from prior UTAUT2 investigations, with all constructs measured using seven items on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The items for each construct were taken from a previous study [26].
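For multi-item Likert scales of this kind, internal consistency is conventionally assessed with Cronbach's alpha (reported in the Results below). A minimal sketch of the computation, using invented 7-point responses rather than the study's data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for one scale.
    `items` is a list of item-score lists, one list per item,
    each holding one score per respondent."""
    k = len(items)

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col) for col in zip(*items)]       # per-respondent scale totals
    item_var = sum(var(it) for it in items)           # sum of item variances
    return k / (k - 1) * (1 - item_var / var(totals))

# Illustrative 7-point Likert responses: 3 items, 5 respondents
pe_items = [
    [7, 6, 5, 6, 7],
    [6, 6, 5, 7, 7],
    [7, 5, 5, 6, 6],
]
alpha = cronbach_alpha(pe_items)
```

Alpha rises toward 1 as the items co-vary strongly relative to their individual variances; values around 0.84 to 0.96, as reported for the pilot constructs below, indicate high reliability.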

Results
This section covers the results of the analysis. The steps taken during the analysis are described in more detail in Appendix C (Table A6). It is important to mention that all the questions included in the survey were taken from the literature (Appendices A and B), where they have been tested and proven to be valid and reliable measures of the constructs they were intended to represent. More specifically, the survey items were adapted from UTAUT2 [26], which has been used in many studies to investigate user acceptance and use of different types of technology. The pilot study generated a few minor suggestions, concerning the survey layout and question wording, and this confirms the face validity. The pilot study survey data were analyzed to find any potential threats or drawbacks within the survey items, in order to decide whether to keep, delete, or amend each item. It took participants a maximum of 8 min to complete the survey, which was deemed reasonable, and this confirmed the content validity. Table 1 shows the results of the analysis of the pilot study data. Cronbach's alpha values range from 0.842 for HT to 0.956 for SI, showing that all constructs have outstanding reliability; all the measured variables used with each construct are positively correlated. The table also indicates two internal consistency reliability indicators: inter-item correlation and item-to-total correlation. According to [50], the value of inter-item correlation should exceed 0.3, while item-to-total correlation should exceed 0.5. The results show that all constructs exceed the cut-off value for inter-item correlation except for the USE construct. After examining each item of USE, it was found that USE-5 had a lower inter-item correlation (0.197); hence, USE-5 was excluded from the survey.
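The two consistency indicators and their cut-offs (mean inter-item correlation above 0.3, item-to-total correlation above 0.5) can be sketched as follows. The screening logic mirrors the USE-5 decision above, but the response data and the flagged item are invented for illustration:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def consistency_check(items, inter_cut=0.3, total_cut=0.5):
    """Flag item indices whose mean inter-item correlation falls below
    `inter_cut` or whose corrected item-to-total correlation (item vs.
    the total of the remaining items) falls below `total_cut`."""
    flagged = []
    for i, item in enumerate(items):
        others = [it for j, it in enumerate(items) if j != i]
        inter = sum(pearson(item, o) for o in others) / len(others)
        rest_total = [sum(col) for col in zip(*others)]  # total minus this item
        if inter < inter_cut or pearson(item, rest_total) < total_cut:
            flagged.append(i)
    return flagged

use_items = [
    [6, 5, 7, 6, 4, 5],
    [6, 6, 7, 5, 4, 5],
    [5, 5, 6, 6, 4, 4],
    [4, 5, 4, 5, 4, 5],  # weakly related item, akin to the dropped USE-5
]
drop = consistency_check(use_items)
```

Here only the last item fails both criteria and would be a candidate for removal, just as USE-5 was dropped after its low inter-item correlation.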

Preliminary Examination of the Main Study Data
This section provides an overview of the preliminary data analysis of the collected responses. A total of 431 responses were collected over five months from undergraduate and postgraduate computer science students. The analysis was performed using the Statistical Package for the Social Sciences (SPSS) version 25. The preliminary data analysis included data screening and dealing with missing data and outliers, in addition to testing for normality, homogeneity, and multicollinearity in the dataset. Moreover, it covered reliability analysis, descriptive analysis, and exploratory data analysis. The results of this analysis focus on understanding undergraduate and postgraduate students' acceptance and use of chatbots. The descriptive statistics are shown in Appendix B (Table A2).
(a) Data screening and missing data: The answers in all the questionnaires were screened for missing values using descriptive statistics for every measured item in the questionnaire. For greater accuracy, we compared the collected answers with the expected responses from the original questionnaire. In data analysis, missing values are a critical problem that affects the results of a study. The situation is even more complicated with SEM [51], as some tools such as AMOS cannot work properly with missing data. Furthermore, several statistical methods cannot be employed when there are missing values, such as Chi-Square, modification indices, and fit measures (e.g., the goodness-of-fit index). However, the initial screening in SPSS v 25 revealed that there were no missing data for the main elements of the model. (b) Outliers: An outlier is defined as "observations with a unique combination of characteristics identifiable as distinctly different from the other observations" ([52], p. 73). It is critical to detect and treat outliers, as they strongly bias statistical tests and may affect the normality of the data [53]. A study by [53] suggests deleting extreme outliers while keeping mild outliers. According to [52], there are two types of outliers: multivariate outliers and univariate outliers. For both types in this study, the results showed that there were no extreme outlier values in the dataset that needed to be removed, and a few mild outliers were kept in the database.
(c) Testing the normality assumption: In multivariate analysis, it is essential to examine the data for normality [50]. The reliability and validity of the data are affected when the data are not normally distributed. In this study, we used the Jarque-Bera (skewness-kurtosis) test to check the normality of the data. According to [54], the skewness value represents the symmetry of the data distribution: the data are shifted to the left with a negative skew value and to the right with a positive skew value. The kurtosis value represents the height of the data distribution [54]: peaked distributions have a positive value, while flatter distributions have a negative value [54].
Ref. [53] recommended a normal range for skewness and kurtosis of ±2.58. As can be seen in Table A3 (Appendix B), all items in the dataset were normally distributed, except for EE (EE1, EE3), FC (FC1, FC2), and USE (USE1, USE2, and USE3), whose kurtosis values ranged from 3.182 to 5.743. However, the skewness values were in the range of −2.470 to +0.412. Table A3 (Appendix B) shows the mean, standard deviation, skewness, and kurtosis value for each item.
Table A4 (Appendix B) shows the results for the normal distribution of the data using the Kolmogorov-Smirnov test in SPSS v 25. The results indicate that the p-values for all measured variables are 0.000 (p < 0.05), confirming that the data are not normally distributed. Therefore, PLS methods were used in the analysis, as they are robust to non-normally distributed data [55].
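As a hedged illustration of these normality checks (on synthetic seven-point Likert data, not the study's responses), skewness, excess kurtosis, and a Kolmogorov-Smirnov test can be computed with SciPy. Note that estimating the normal parameters from the sample, as below, is only an approximation; a Lilliefors correction would be stricter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# hypothetical 7-point Likert responses, piled up near "agree"
likert = np.clip(np.round(rng.normal(loc=6, scale=1.2, size=431)), 1, 7)

skew = stats.skew(likert)        # negative: tail extends to the left
kurt = stats.kurtosis(likert)    # excess kurtosis; > 0 means peaked
# Kolmogorov-Smirnov test against a normal with the sample's own moments
ks_stat, ks_p = stats.kstest(
    likert, 'norm', args=(likert.mean(), likert.std(ddof=1))
)
```

For ceiling-clipped Likert data like this, the skew is negative and the K-S p-value falls below 0.05, mirroring the non-normality found in Table A4 and motivating the switch to PLS.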
(d) Homogeneity of variance in the dataset: Homogeneity is defined as "the assumption of normality related with the supposition that dependent variable(s) display an equal variance across the number of an independent variable(s)" [53]. In multivariate analysis, it is critical to check for homogeneity of variance, because its absence can cause invalid estimation of the standard errors [50]. Therefore, Levene's test (SPSS v 25) was used to check for homogeneity of variance in the collected data, as shown in Table A5 (Appendix B). The results show that all constructs were significant (p < 0.05) when using gender as a non-metric variable in the independent sample t-test (Table A5, Appendix B). This result confirms the absence of homogeneity of variance in the collected data and suggests that variance is not equal in the proposed model for the two genders of the study cohort, i.e., male and female.
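Levene's test is also available outside SPSS. The sketch below uses synthetic group scores with deliberately unequal spread (not the study's data) to show how a significant result indicates the absence of homogeneity of variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# hypothetical construct scores for two groups with unequal variance
male = rng.normal(loc=5.0, scale=1.0, size=234)
female = rng.normal(loc=5.0, scale=1.8, size=197)

lev_stat, lev_p = stats.levene(male, female)
# p < 0.05 -> reject equal variances, i.e., homogeneity is absent
```

With group standard deviations this different and n > 190 per group, the test rejects equality of variances, which is the pattern reported for the study's constructs in Table A5.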
(e) Multicollinearity: Multicollinearity appears when two or more variables are highly correlated with each other [54]; different scholars suggest different values as satisfactory. For example, according to [54], a correlation value of 0.7 or higher is a reason for concern, while [53] state that a correlation value over 0.8 is highly problematic. Two values indicate multicollinearity: the Variance Inflation Factor (VIF) and tolerance [54]. Multicollinearity is absent when the VIF is less than 3.0 and the tolerance value is greater than 0.1. The multicollinearity check was performed on the dataset using SPSS (v 25) across all the independent constructs. The results show no serious multicollinearity in the data, because the tolerance value for all constructs is greater than 0.1 and the VIF values are less than 3.0, except for the PE construct (PE has VIF > 3.0 with all constructs except habit), with only a few values between three and five.
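The VIF and tolerance diagnostics can be reproduced by regressing each predictor on all the others: VIF = 1/(1 − R²) and tolerance = 1 − R². This sketch uses synthetic, moderately correlated predictors (not the study's constructs).

```python
import numpy as np

def vif_tolerance(X: np.ndarray):
    """VIF_j = 1 / (1 - R2_j), where R2_j regresses column j on the rest."""
    n, k = X.shape
    vifs, tols = [], []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        vifs.append(1 / (1 - r2))
        tols.append(1 - r2)
    return np.array(vifs), np.array(tols)

rng = np.random.default_rng(3)
z = rng.normal(size=(431, 1))                       # shared component
X = z + rng.normal(scale=1.0, size=(431, 6))        # correlated predictors
vif, tol = vif_tolerance(X)
```

For predictors correlated at roughly r = 0.5, as here, every VIF stays below 3.0 and every tolerance above 0.1, which is the "no serious multicollinearity" pattern reported for most constructs above.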

Profiles of Respondents
The descriptive analysis of the collected data using SPSS indicates that there were 233 (54.1%) male and 197 (45.7%) female participants. Participant ages were grouped into five levels, with 82.1% of the participants falling in the 18-21 and 22-25 years age groups, and only 10% in the 26-29 years age group. The minority age groups were <18 and ≥30, with 3% and 4.9%, respectively. The target participants were at either undergraduate or postgraduate level; the majority of respondents were undergraduate students (94.2%), while only 5.8% were Master's students. The majority of students (97.7%) were full-time students, while 2.3% were part-time students. Undergraduate students were classified as follows: year one students were level 1, year two students were level 2, and placement and year three students were level 3. Over half (60%) of the respondents were at educational levels 1 and 2, while 40% were on placement or at level 3. Regarding the distribution of students' grades, the results revealed that 51% and 27.4% had been awarded A and B grades, respectively, while the remaining 21.6% had C, D, or F grades or selected 'not applicable/prefer not to say'.
In relation to user experience with chatbots, it is necessary to consider the chatbots being used. The survey asked respondents which types of chatbot they had used (Siri by Apple, Alexa by Amazon, Cortana by Microsoft, and Google Assistant by Google) (Appendix A). The participants were allowed to select more than one answer. The results show that Siri and Google Assistant were the two most popular chatbots amongst the students, while Cortana was the least popular. The results also show that other chatbots used by the students included Bixby by Samsung, S Voice, and Tmall Genie.
In terms of chatbot usage and frequency of use, the chatbot usage category revealed that the majority of students (77.3%) used chatbots, while 22.7% did not. The data on frequency of use showed that 47.7% of students used chatbots daily or several times a day, while the rest (52.3%) used them weekly or once a month. The chatbot experience category shows that the majority of the participants had 1-3 years' or 3-5 years' experience with chatbots, with 59 (35.1%) and 35 (31.8%), respectively. Just 20.7% of the students had less than one year's experience of using chatbots, while 4% had more than five years' experience. Approximately 30% of the respondents had some level of experience of using chatbots (they had tried and used some basic functionalities), while only 5% of respondents were not experienced at all.

Descriptive Analysis of the Main Study
This section covers the descriptive analysis of the main constructs in Extended UTAUT2. Each was assessed using a seven-point Likert scale, as follows: (i) Performance expectancy: To assess the PE construct, four items were employed, all adapted from previous work on UTAUT2 [27,28], as shown in Table A2 (Appendix B). The means for the items associated with PE range between 4.58 (±1.882) and 5.21 (±1.734). According to the findings, the chatbots aided the students in meeting their performance goals.
(ii) Effort expectancy: Four items were employed to measure EE, all adapted from UTAUT2 [26,27], as shown in Table A2 (Appendix B). The means for each item linked to the EE construct range between 5.59 (±1.383) and 6.11 (±1.099), indicating that the majority of the participants in this study agreed that chatbots are simple to use.
(iii) Social influence: Three items taken from UTAUT2 [26,27] were used to measure SI. The means for each item connected to the SI construct range between 3.12 (±1.716) and 3.18 (±1.678), indicating that most participants agreed that significant others (friends and relatives) did not believe that they should use chatbots, as shown in Table A2 (Appendix B).
(iv) Facilitating conditions: FC was measured by four items adopted from the work of [27,56,57]. As can be seen from Table A2 (Appendix B), the means of the four items range between 5.41 (±1.569) and 6.07 (±1.216), revealing agreement on how important technological resources are to chatbot use.
(v) Hedonic motivation: HM was measured by three items adopted from the work of [26,27]. Table A2 (Appendix B) shows that the means for the three items measuring the HM construct range between 5.43 (±1.478) and 5.50 (±1.458), which shows that the majority of the respondents enjoyed using chatbots.
(vi) Habit: The HT construct was measured by three items adopted from the work of [26,27]. Table A2 (Appendix B) presents the descriptive statistics of the HT construct. The means of the three measured variables HT1, HT2, and HT3 ranged between 3.80 (±2.325) and 4.62 (±2.209), which indicates that using a chatbot was not a habit for the students.
(vii) Behavioral intention: The BI construct was measured by three items adopted from [58-61]. Table A2 (Appendix B) provides a descriptive analysis of the BI construct. The means of the measured variables of BI ranged between 4.57 (±2.024) and 5.13 (±1.822). The results show that the students had a good level of agreement on BI.
(viii) Use: USE is a dependent construct in the UTAUT2 model proposed by [26]. Nine items were adopted from [26,62,63]. The descriptive analysis of the USE construct (Appendix B, Table A2) shows that the means of the measured variables USE1 to USE9 ranged between 4.19 (±1.869) and 6.31 (±1.399). The majority of the mean values are greater than four, meaning that the students had a good level of agreement on this variable (Table A2, Appendix B).

Evaluating Sample Size
SPSS version 25 was used to conduct the analysis. The number of participants in this study is 431. To test whether this sample size is adequate for further analysis, two tests were undertaken. The first measured sampling adequacy using KMO. KMO values range between 0 and 1, and values higher than 0.6 indicate a satisfactory sample size [64,65]. Table 2 shows a KMO value of 0.924, indicating that the dataset is very suitable for further analysis (conceptual model). The second test was Bartlett's Test of Sphericity, which measures the relationships between the variables. In the Bartlett Test, a p-value less than 0.05 is satisfactory [65]; in this study, it is less than 0.001, which means that the data are suitable for further analysis [66].
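Both sampling-adequacy checks can be written out directly. The sketch below (synthetic correlated items, not the study's data) implements Bartlett's test of sphericity from its chi-square formula and the KMO measure from anti-image (partial) correlations.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X: np.ndarray):
    """chi2 = -(n - 1 - (2p + 5)/6) * ln|R|, with df = p(p-1)/2."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, stats.chi2.sf(chi2, df)

def kmo(X: np.ndarray) -> float:
    """KMO = sum r^2 / (sum r^2 + sum q^2) over off-diagonal entries,
    where q are partial correlations from the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(S), np.diag(S)))
    Q = -S / d                      # partial correlation matrix
    r2, q2 = R ** 2, Q ** 2
    np.fill_diagonal(r2, 0)
    np.fill_diagonal(q2, 0)
    return r2.sum() / (r2.sum() + q2.sum())

rng = np.random.default_rng(4)
factor = rng.normal(size=(431, 1))                  # shared latent factor
X = factor + rng.normal(scale=1.2, size=(431, 8))   # eight correlated items
chi2, p_value = bartlett_sphericity(X)
kmo_value = kmo(X)
```

With a strong common factor, as here, Bartlett's p-value is far below 0.001 and KMO exceeds the 0.6 threshold, the same qualitative pattern as the study's values (KMO = 0.924, p < 0.001).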

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: 0.924

Model Testing/Evaluation
The reflective measurement model assessment consists of several tests (Table 3), which include internal consistency reliability, indicator reliability, convergent validity, and discriminant validity [66]. In the first test, internal consistency reliability, a satisfactory value is higher than 0.7 [67].
(i) Internal consistency reliability and composite reliability: Usually, Cronbach's alpha is used to test the internal consistency reliability of the measurement model. However, in PLS-SEM, the internal consistency reliability of the measurement model is evaluated using CR instead of Cronbach's alpha [68]. Cronbach's alpha is less suitable for PLS-SEM because it is sensitive to the number of items in the scale and has been found to severely underestimate reliability when applied to PLS path models [68,69]. The composite reliability (CR) of PE, EE, SI, FC, HM, HT, BI, and USE is 0.934, 0.872, 0.952, 0.806, 0.934, 0.936, 0.938, and 0.696, respectively, indicating a high level of internal consistency reliability [68,70]. In exploratory research, satisfactory CR is achieved with a threshold of 0.50 or higher [71], but not exceeding 0.95 [72]. The AVE values are greater than 0.5, above the satisfactory criterion, for all constructs except USE. The overall results indicate that convergent validity is satisfactory. Table 3 also shows that the model meets the requirement for discriminant validity. Therefore, the model could be used to test the hypothesized causal relationships.
(ii) Indicator reliability: This is examined to ensure that the latent variables accurately represent the constructs, as indicator reliability is a condition for validity. The outer loading threshold is set at 0.4; any indicator with a value less than 0.4 is excluded from the model [68,72]. This was the case for USE2, USE3, USE6, and USE7. However, if the outer loading value ranges between 0.4 and 0.7, a loading relevance test is required to decide whether to retain or delete the indicator. Five measured variables were in the range between 0.4 and 0.7: EE1, FC2, USE1, USE4, and USE5, with values of 0.543, 0.461, 0.513, 0.535, and 0.681, respectively, as shown in Table 4 and Figure 7. The loading relevance test examines the effect of deletion on Cronbach's alpha, CR, and AVE: in a PLS model, weak indicators are deleted only if their removal raises the construct's AVE and CR above the threshold (0.5). All of our indicators were retained, as their outer loadings exceeded the threshold [72], except for USE9; although USE9 has a higher loading, deleting it improves the outer loadings of the other USE indicators.
(iii) Convergent Validity: The third test in the reflective measurement model is convergent validity. Convergent validity represents the model's ability to explain the variance of the indicators. According to [73], AVE confirms convergent validity, and it should be greater than 0.5 [67]. The AVE for the latent constructs BI, EE, FC, HM, HT, PE, SI, and USE was 0.834, 0.64, 0.523, 0.826, 0.83, 0.78, 0.872, and 0.374, respectively. All values are above the minimum threshold [68,71] except for USE. The CR for the latent constructs BI, EE, FC, HM, HT, PE, SI, and USE was 0.938, 0.872, 0.806, 0.934, 0.936, 0.934, 0.952, and 0.701, respectively. According to [50], the model confirms convergent validity when the AVE is greater than 0.5 and the CR is higher than the AVE for all constructs [50,74]. This applies to all the constructs in this model, confirming convergent validity, as shown in Table 5. (iv) Discriminant Validity: Discriminant validity is the last test in the measurement model. According to [67], each indicator's loading should be higher than all of its cross-loadings. As shown in Tables 4 and 5, all the indicator loadings are higher than their cross-loadings [66].
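CR and AVE follow directly from the standardized outer loadings. The sketch below uses hypothetical loadings for a four-item construct (not the loadings reported in Tables 4 and 5).

```python
import numpy as np

def composite_reliability(loadings: np.ndarray) -> float:
    """CR = (sum l)^2 / ((sum l)^2 + sum(1 - l^2)) for standardized loadings."""
    s = loadings.sum() ** 2
    e = (1 - loadings ** 2).sum()   # error variance per indicator
    return s / (s + e)

def average_variance_extracted(loadings: np.ndarray) -> float:
    """AVE = mean of the squared standardized outer loadings."""
    return (loadings ** 2).mean()

# hypothetical outer loadings for a well-behaved reflective construct
pe = np.array([0.88, 0.91, 0.86, 0.89])
cr_pe = composite_reliability(pe)
ave_pe = average_variance_extracted(pe)
```

With these illustrative loadings, CR is about 0.935 and AVE about 0.78, comfortably above the 0.7 and 0.5 thresholds applied in the text; a low-loading construct like USE would fall below the AVE cut-off in the same way.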


Formative Measurement
After completing the reflective measurement test, the next step was to perform a formative measurement to assess the weight and loading of the indicators. According to [75], the indicators in the measurement model have no errors associated with them; therefore, bootstrapping is used to estimate the significance of the indicators. In this study, SmartPLS3 used 5000 bootstrap samples before providing the report [66], which is shown in Figure 8. The structural model R-squared (R²) shows the ability of the model to explain the phenomena, as shown in Table 3 and Figure 6. The R² values for BI and USE are 0.917 and 0.114, respectively: BI explains 91% of the variance in the model, which is very strong, while USE explains only 11%, which is weak. An R² of 38% can be considered significant [67].
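The bootstrapping step can be sketched as follows, under the simplifying assumption of a single standardized HT → BI path estimated on synthetic data. SmartPLS re-estimates the whole PLS model on each resample; this toy keeps only the percentile-bootstrap resampling logic.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 431
# hypothetical standardized scores: BI partly driven by HT (true path ~0.4)
ht = rng.normal(size=n)
bi = 0.4 * ht + rng.normal(scale=0.9, size=n)

def path_coef(x: np.ndarray, y: np.ndarray) -> float:
    """Standardized simple-regression coefficient (the correlation here)."""
    return np.corrcoef(x, y)[0, 1]

# percentile bootstrap with 5000 resamples, as in the SmartPLS3 setup
boots = np.empty(5000)
for b in range(5000):
    idx = rng.integers(0, n, n)          # resample respondents with replacement
    boots[b] = path_coef(ht[idx], bi[idx])
lo, hi = np.percentile(boots, [2.5, 97.5])
significant = not (lo <= 0 <= hi)        # CI excluding zero -> p < 0.05
```

A 95% bootstrap confidence interval that excludes zero corresponds to the supported hypotheses in Table 6 (e.g., HT → BI with p = 0.00).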
(i) Hypothesis testing: Figure 8 shows the path coefficients after performing bootstrapping using SmartPLS3. As can be seen in Table 6, the results of the bootstrapping show that four hypotheses were supported, as follows: HT and BI (H6, p = 0.00); BI and USE (H7, p = 0.00); PE and BI (H1, p = 0.00); and EE and BI (H2, p = 0.018). However, three hypotheses were rejected: FC and BI (p = 0.071); HM and BI (p = 0.082); and SI and BI (p = 0.086). The four supported hypotheses will be used as the basis in iteration 3 to develop chatbot features.
(a) Multiple Group Analysis-age moderator: Age was separated into five groups in the questionnaire, as follows: <18, 18-21, 22-25, 26-29, and ≥30 years old. Before conducting the Multiple Group Analysis, age was divided into two levels: less than or equal to 21 years old, and greater than 21 years old. Of the 431 participants, 244 were in the low age group (LA), while 187 were in the high age group (HA). This section investigates whether age moderates the effects of EE, FC, HT, HM, PE, and SI on BI. To support the relationship, the p-value should be <0.05 or >0.95. Table 7 shows that age moderates the effect of some relationships: BI and USE (p = 0.959, supporting H8a7), and PE and BI (p = 0.959, supporting H8a1). However, age does not moderate the effects of the other relationships, as follows: EE and BI (p = 0.497, rejecting H8a2); FC and BI (p = 0.395, rejecting H8a4); HM and BI (p = 0.105, rejecting H8a5); HT and BI (p = 0.278, rejecting H8a6); and SI and BI (p = 0.307, rejecting H8a3).
(b) Multiple Group Analysis-gender moderator: Of the 431 respondents, there were 234 males and 197 females. As can be seen in Table 8, gender moderates the relationship between HT and BI (p = 0.978, supporting H9b6) and the relationship between EE and BI (p = 0.022, supporting H9b2). However, gender does not moderate the relationship between BI and USE (p = 0.766, rejecting H9b7); between FC and BI (p = 0.818, rejecting H9b4); between HM and BI (p = 0.508, rejecting H9b5); between PE and BI (p = 0.225, rejecting H9b1); or between SI and BI (p = 0.125, rejecting H9b3).
(c) Multiple Group Analysis-experience moderator: A descriptive analysis of the experience moderator shows four levels of experience. Two data categories were formed, as follows: (1) no or low experience (NLE), referring to participants with no experience or a low level of experience of using a chatbot, numbering 238 out of 431; and (2) experienced participants (E) with some experience or a high level of experience of using chatbots, numbering 168 participants. As shown in Table 9, experience moderates the effects of two relationships: BI and USE (p = 0.95, supporting H10c7), and SI and BI (p = 0.95, supporting H10c3). However, experience does not moderate the relationship between EE and BI (p = 0.40, rejecting H10c2); FC and BI (p = 0.80, rejecting H10c4); HM and BI (p = 0.30, rejecting H10c5); HT and BI (p = 0.10, rejecting H10c6); or PE and BI (p = 0.80, rejecting H10c1).
(d) Multiple Group Analysis-attendance: Descriptive analysis of attendance shows that the attendance rate is very high. Two groups were created: low attendance (LA) and high attendance (HA). Attendance significantly moderates the relationship between BI and USE (p = 0.048, supporting H11d7), as shown in Table 10. However, attendance does not moderate the relationship between EE and BI (p = 0.688, rejecting H11d2); FC and BI (p = 0.804, rejecting H11d4); HM and BI (p = 0.731, rejecting H11d5); HT and BI (p = 0.433, rejecting H11d6); PE and BI (p = 0.136, rejecting H11d1); or SI and BI (p = 0.718, rejecting H11d3). In the following table, HA refers to the high-attendance group and LA to the low-attendance group.
(e) Multiple Group Analysis-engagement with VLEs: A descriptive analysis of engagement with VLEs shows a high level of engagement, with a mean of 6.5 out of 7. Therefore, engagement with VLEs was divided into two groups: low engagement (<6), with only 57 participants, and high engagement (6-7), with 374 participants. The results of the multiple group analysis are presented in Table 11. Engagement with VLEs significantly moderates the relationship between FC and BI (p = 0.964, supporting H11e4). However, it does not moderate the relationship between BI and USE (p = 0.405, rejecting H11e7); EE and BI (p = 0.466, rejecting H11e2); HM and BI (p = 0.103, rejecting H11e5); HT and BI (p = 0.288, rejecting H11e6); PE and BI (p = 0.749, rejecting H11e1); or SI and BI (p = 0.124, rejecting H11e3).
(f) Multiple Group Analysis-educational level: Based on a descriptive analysis of educational level, two groups were created, as follows: (1) a low educational level, comprising level 1 and level 2, with 238 students; and (2) a high educational level, comprising placement/level 3 students, with 158 students. The remainder were Master's students. Table 12 presents the results of the multi-group analysis. The results show that educational level has no moderating effect on any relationship, so all hypotheses were rejected. The results were as follows: BI and USE (p = 0.87, rejecting H13F7); EE and BI (p = 0.71, rejecting H13F2); FC and BI (p = 0.81, rejecting H13F4); HM and BI (p = 0.11, rejecting H13F5); HT and BI (p = 0.81, rejecting H13F6); PE and BI (p = 0.36, rejecting H13F1); and SI and BI (p = 0.55, rejecting H13F3).
(g) Multiple Group Analysis-performance (grade): As indicated in Table 13, the results of the Multiple Group Analysis reveal that grade has no moderating influence on any relationship. This covers the relationships between BI and USE (p = 0.216, rejecting H14f7); EE and BI (p = 0.158, rejecting H14f2); FC and BI (p = 0.328, rejecting H14f4); HM and BI (p = 0.521, rejecting H14f5); HT and BI (p = 0.816, rejecting H14f6); and PE and BI (p = 0.336, rejecting H14f1).
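A multi-group comparison of a single path coefficient can be approximated with a permutation test, as sketched below on synthetic data: two hypothetical groups whose EE → BI slopes genuinely differ. The real analysis used SmartPLS3's MGA on the full model; this is only the group-comparison idea in isolation.

```python
import numpy as np

rng = np.random.default_rng(6)

def coef(x: np.ndarray, y: np.ndarray) -> float:
    """Standardized path coefficient (correlation for a single predictor)."""
    return np.corrcoef(x, y)[0, 1]

# hypothetical EE -> BI scores for two groups with different true slopes
n1, n2 = 234, 197
x1 = rng.normal(size=n1)
y1 = 0.5 * x1 + rng.normal(scale=0.8, size=n1)
x2 = rng.normal(size=n2)
y2 = 0.1 * x2 + rng.normal(scale=0.8, size=n2)

observed = abs(coef(x1, y1) - coef(x2, y2))

# permutation test: shuffle group membership and recompute the difference
x = np.concatenate([x1, x2])
y = np.concatenate([y1, y2])
count = 0
for _ in range(2000):
    perm = rng.permutation(len(x))
    a, b = perm[:n1], perm[n1:]
    if abs(coef(x[a], y[a]) - coef(x[b], y[b])) >= observed:
        count += 1
p_value = (count + 1) / 2001
# p < 0.05 -> the path differs between groups, i.e., moderation is present
```

A small p-value here plays the same role as the extreme MGA p-values (<0.05 or >0.95) used above to declare that a moderator changes a path.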

Discussion
UTAUT2 has been used to evaluate students' acceptance and use of technology in educational settings, with the technology in question including Learning Management Systems (LMSs) [76], mobile-based educational applications [77], lecture capture systems [78], MOOC platforms [79], Google Classroom [80], e-learning systems [81], mobile e-textbooks [74], and mobile learning [82].
From a theoretical standpoint, this study has added to the literature on technology adoption and acceptance models and theories by extending the UTAUT2 model to a new setting. It examines the applicability of UTAUT2 in a fresh context (chatbots), with a new consumer group (students), and in a new cultural setting (the United Kingdom), which is a significant step forward in the development of the theory. To our knowledge, no research has applied UTAUT2 to students' acceptance and use of chatbots in an educational setting, specifically in UK universities. This study aims to fill this gap by investigating the acceptance and use of chatbots by undergraduate and Master's students at a UK university.
According to some prior studies [27,34,83], performance expectancy is a crucial prerequisite for chatbot usage intention. Chatbots are used to collect information, and the best reason for students' future use of chatbots is that they fulfill the user's needs. Performance expectancy is the key predictor of user adoption of technology in both mandatory and voluntary settings, according to Morosan et al. [84]. HEIs should consider how to design and develop chatbots so as to provide students with a useful tool that will help them learn more successfully. This finding is in line with previous studies such as [26,27,34,35], which found that PE has a positive effect on behavioral intention to use chatbots, but contradicts studies such as [85], which found that PE has no effect on behavioral intention to use technology.
Effort expectancy is also an important requirement for chatbot usage intention [27]. Effort expectancy and its latent variables have been shown to be significant in many research studies and proven to act as a predictor of user intention to adopt new technology [26,34,35]. This finding is in line with previous studies such as [26,27,34], but contradicts studies such as [35,85], in which EE had no effect on behavioral intention to use chatbots.
A logical explanation for students' future usage of chatbots is the fact that chatbots provide answers to their questions quickly and easily. Habit is a vital requirement for chatbot usage intention. Students who are familiar with chatbot technology have the habit of asking chatbots for certain information; they will therefore be more willing to use chatbots to seek any type of information. This finding is in line with previous studies such as [26,27,85], but contradicts studies such as [34], in which habit had no effect on behavioral intention to use chatbots.
It is important to offer specific advice regarding the function of personas in influencing students' acceptance and use of chatbot technology in the context of online and multicultural teaching after COVID-19. Following the COVID-19 pandemic, there has been a paradigm shift in the educational landscape, resulting in an increase in online and multicultural teaching approaches. As researchers, we offer particular suggestions for how persona chatbots should be incorporated into this changing educational environment. First and foremost, by offering individualized help and attending to each student's unique needs, persona chatbots can improve the online learning experience, creating a more stimulating and encouraging virtual environment. The personas that chatbots embody should be carefully developed to reflect cultural diversity and to speak to the experiences and backgrounds of a global student body. Personas should also be incorporated in accordance with educational goals, accommodating different learning methods and preferences. The deployment of chatbots should be followed by thorough training and orientation sessions to encourage student acceptance. This multicultural approach to providing an inclusive learning environment will improve the relatability and efficacy of chatbots in a variety of educational contexts.
Secondly, interactive learning is important, as advocated by [86], whose study showed the important role that active learning plays. They emphasized the use of digital devices, particularly smartphones, as well as a range of technologies such as LMSs, simulations, and modeling. They also demonstrated that a coherent approach to student-led interactive learning should be put into practice in real-world engineering courses. This innovative method uses the power of digital tools to improve learning overall while fostering a collaborative and engaging classroom environment. Moreover, persona chatbots are a useful tool for sustaining the momentum of online learning beyond COVID-19. They provide ongoing assistance to a wide range of learners and enable a smooth transition between in-person and virtual learning settings. These chatbots can also adjust to the changing demands of students, which helps to provide an inclusive and cutting-edge learning environment. Ultimately, the effectiveness and adoption of chatbots in virtual and multicultural learning environments are greatly influenced by the comprehension and incorporation of varied personas in chatbot design.
In relation to user experience with chatbots, it is necessary to specify the chatbots being used and the types of interaction undertaken (active tutorship, adaptive learning, question and answer, and self-assessments). In this study, the first part was covered in the survey, but the second part can be considered for future work in a different context [87].
The authors of [87] suggest using an adaptive learning strategy to improve learning time and learner interest. This tactic involves tailoring learning routes according to each user's prior knowledge, using adaptive learning algorithms and an LMS platform. Through in vitro testing, the research seeks to validate the efficacy of this approach, with potential applications for businesses and organizations to maximize training [87].
A study by [88] investigates the integration of adaptive learning and data mining to improve e-learning, with an emphasis on incorporating adaptive technologies into an open-source LMS. By evaluating data and customizing content to individual learning preferences and strengths, it allows for personalized learning routes for students. To optimize training efficacy, the system automates the selection of training materials. Plans for practical testing are included, along with a discussion of the difficulties in choosing input variables and methods. Overall, the study provides insightful information and practical tips for enhancing e-learning with adaptive technology [88].

Conclusions
This study introduces a proposed extended UTAUT2 framework for understanding students' acceptance and use of chatbots. A pilot study ensured the reliability and clarity of the survey questions. The study's findings are twofold. Firstly, they elucidate the interactions between exogenous (PE, EE, FC, SI, HM, and HT) and endogenous (BI and USE) factors. Secondly, the role of moderators in influencing the proposed relationships is explored, encompassing age, gender, experience, educational level, grade, attendance, and interactions with VLEs. Overall, the research underscores the influence of social and organizational aspects on students' attitudes toward chatbot technology adoption and use. The results show that effort expectancy, performance expectancy, and habit emerged as pivotal predictors of student acceptance of and engagement with chatbot technology. Regarding the moderators, educational level and performance have no moderating effect on any relationship in the model. However, age, experience, and attendance have a moderating effect on the relationship between BI and USE, and they also moderate the effects of PE, EE, and SI. Moderator importance could also direct design through inclusion in the persona design process.
Certain limitations warrant consideration, such as this study's confined generalizability due to data collection being limited to a specific academic field (Computer Science) and geographic location (Brunel University London). To address this, future research could encompass diverse departments, universities, and global settings. Additionally, while this study predominantly utilized quantitative methods, incorporating qualitative approaches, such as interviews, could provide more comprehensive insights.
The predictive model remains open to refinement. Future investigations might incorporate additional constructs (security, trust, or system quality) and moderators (educational level or engagement level) to broaden the scope of chatbot utilization across various contexts. Embracing a mixed-methods approach, combining quantitative and qualitative methodologies, could enhance the depth of explanatory data gathered for research objectives.
Furthermore, with the inclusion of ChatGPT, chatbots could have a wide-ranging effect on online learning, bringing both possibilities and difficulties. ChatGPT will play an essential role in determining how students accept and use chatbots. Students can use it easily because of its interactive and user-friendly interface. It can answer questions, offer clarifications, and encourage participation from students. Educational institutions can run awareness campaigns emphasizing ChatGPT's advantages over more conventional teaching techniques in order to increase acceptance. At the same time, however, it is important to recognize ChatGPT's limitations. Its responses are produced using patterns inferred from data, which may include biases and errors. A further drawback is that ChatGPT may provide incorrectly contextualized responses due to a lack of true comprehension [89]. To avoid bias, regular updates, ongoing monitoring, and the integration of varied datasets are recommended. Integrating human oversight, combining the benefits of AI with human knowledge to guarantee accurate and contextually relevant information, is one possible way to address this issue. To create a helpful and ethically sound learning environment, it is crucial to strike a balance between utilizing ChatGPT's advantages and resolving its drawbacks.

Analysis Step Aim Description
Cronbach's alpha, inter-item correlation, and item-to-total correlation (pilot study). Aim: to measure the internal consistency of the variables within each construct and to ensure that all constructs have the required reliability. Description: the inter-item correlation should exceed 0.3, while the item-to-total correlation should exceed 0.5.
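For illustration, these reliability measures can be computed with a few lines of NumPy. This is a minimal sketch, not code from the study; the function names are ours, and the input is assumed to be a respondents-by-items matrix of Likert scores.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def item_total_correlations(items):
    """Corrected item-to-total correlation: each item vs. the sum of the rest."""
    items = np.asarray(items, dtype=float)
    return [
        np.corrcoef(items[:, j], np.delete(items, j, axis=1).sum(axis=1))[0, 1]
        for j in range(items.shape[1])
    ]
```

Items whose item-to-total correlation falls below 0.5 would then be flagged for review, in line with the thresholds above.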

Descriptive statistics. Aim: overview of the preliminary analysis of the collected data. Description: acquire further details about the collected data (descriptive frequencies).

(d) Homogeneity of variance in the dataset. Homogeneity of variance is the assumption that the dependent variable(s) display equal variance across the levels of the independent variable(s) [53]. In multivariate analysis, it is critical to establish homogeneity of variance, because its violation can cause invalid estimation of the standard errors [67].
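A common check for this assumption is Levene's test. The sketch below is an illustrative helper (not from the study) that computes the Levene W statistic for any number of groups, using absolute deviations from each group mean.

```python
import numpy as np

def levene_w(*groups):
    """Levene's W statistic for homogeneity of variance across groups.

    Each observation is replaced by its absolute deviation from its group
    mean; W is then the one-way ANOVA F statistic on those deviations.
    """
    z = [np.abs(np.asarray(g, dtype=float) - np.mean(g)) for g in groups]
    n = np.array([len(g) for g in z])
    N, k = n.sum(), len(z)
    zbar_i = np.array([g.mean() for g in z])   # per-group mean deviation
    zbar = np.concatenate(z).mean()            # grand mean deviation
    between = (n * (zbar_i - zbar) ** 2).sum()
    within = sum(((g, m)[0] - m).dot((g - m)) for g, m in zip(z, zbar_i))
    return ((N - k) / (k - 1)) * between / within
```

A large W (significant p-value) indicates that the equal-variance assumption is violated.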
(e) Multicollinearity. Multicollinearity arises when two or more variables are highly correlated with each other [54]. Different scholars have suggested different thresholds. For example, according to [54], a correlation of 0.7 or higher is a cause for concern, while [53] state that a correlation above 0.8 is highly problematic.
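Beyond pairwise correlations, variance inflation factors (VIFs) catch multicollinearity that involves combinations of variables; each variable's VIF can be read off the diagonal of the inverse correlation matrix. The helpers below are an illustrative sketch, not code from the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor per column of an (n_obs, n_vars) matrix.

    VIF_j = 1 / (1 - R_j^2), obtained here as the diagonal of the
    inverse of the correlation matrix.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

def high_correlations(X, cutoff=0.8):
    """Pairs of columns whose absolute correlation exceeds the cutoff."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    i, j = np.triu_indices_from(R, k=1)
    return [(a, b, R[a, b]) for a, b in zip(i, j) if abs(R[a, b]) > cutoff]
```

Note that a variable can have a large VIF even when no single pairwise correlation crosses the 0.8 cutoff, which is why both checks are worth running.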
Descriptive analysis of the main study. Aim: to provide a foundational understanding of the data at hand. Description: used to understand the data distribution and to summarize large datasets.

Evaluating sample size using KMO. Aim: to test whether the sample size is adequate for further analysis. Description: KMO values range between 0 and 1; values higher than 0.6 indicate a satisfactory sample size [64,65].
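The KMO measure compares observed correlations with partial correlations (derived from the inverse correlation matrix). A self-contained sketch, with an illustrative function name of our own choosing:

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy.

    KMO = sum of squared correlations /
          (sum of squared correlations + sum of squared partial correlations),
    taken over all off-diagonal pairs of variables.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    partial = -Rinv / d                       # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)     # mask for off-diagonal entries
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)
```

Data with a strong common factor yields a KMO well above the 0.6 threshold, whereas pure noise drives it toward 0.5 or below.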

Internal consistency reliability and composite reliability. In the partial least squares structural equation modeling (PLS-SEM) approach, the internal consistency reliability of the measurement model is evaluated using composite reliability (CR) instead of Cronbach's alpha [68]. In exploratory research, composite reliability of 0.60 or higher is considered satisfactory [71].

Indicator reliability. To ensure that the latent variables accurately represent their constructs, indicator reliability is examined as a condition for validity. The outer loading threshold is set at 0.4; any indicator with a loading below 0.4 is excluded from the model [68,72].

Convergent validity. Convergent validity reflects the model's ability to explain the variance of its indicators. As per [73], average variance extracted (AVE) confirms convergent validity and is satisfactory at values greater than 0.5 [67].

Discriminant validity. To ensure the measures truly reflect the unique constructs they are intended to assess, thereby supporting the reliability, accuracy, and theoretical integrity of the research findings. According to [67], each indicator's loading should be greater than all of its cross-loadings.
Structural model (formative measurement) using R². Aim: to show the ability of the model to explain the phenomenon. Description: R-squared (R²) is used to achieve this.
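R² for an endogenous construct is the share of its variance explained by its predictors. A minimal sketch (function name ours):

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    ss_res = ((y - yhat) ** 2).sum()           # unexplained variation
    ss_tot = ((y - y.mean()) ** 2).sum()       # total variation
    return 1.0 - ss_res / ss_tot
```

Perfect predictions give R² = 1, while predicting the mean of the observed values gives R² = 0.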

Multiple group analysis. Aim: to study the moderators' effects on the relationships in the proposed model. Description: these moderators are age, gender, experience, attendance, interaction with VLE, performance (grade), and educational level.
Venkatesh et al. [27] developed the Unified Theory of Acceptance and Use of Technology (UTAUT) to synthesize existing acceptance theories and models, as well as to study user acceptance and use of technology in an organizational context. UTAUT was developed by reviewing eight main theories and models of technology acceptance: the Theory of Reasoned Action (TRA), the Technology Acceptance Model (TAM), the Motivational Model (MM), the Theory of Planned Behavior (TPB), Combined TAM and TPB, the Model of PC Utilization, Diffusion of Innovation Theory (DoI), and Social Cognitive Theory (SCT). UTAUT consists of four constructs, as shown in Figure 2, namely performance expectancy, effort expectancy, social influence, and facilitating conditions. These constructs affect behavioral intention (BI) and the usage of technology, and their effects are in turn moderated by four variables: age, gender, experience, and voluntariness of use [27].

Figure 2. The Unified Theory of Acceptance and Use of Technology [27].

Figure 3. Extended Unified Theory of Acceptance and Use of Technology [26].

Hypothesis 10 (H10c1, c2, c3, c4, c5, c6): Experience moderates the effects of PE, EE, SI, FC, HM, and HT on students' BI and use of chatbot technology.

(iv) Physical engagement (represented by attendance): this is a new moderator stemming from our proposed persona template/model (Figure 4), as shown in the Introduction section. It is defined as an indicator of the participants' behavioral engagement with the course being studied. This study tests whether attendance moderates the effect of the determinants on BI and the use of technology.

Hypothesis 11 (H11e1, e2, e3, e4, e5, e6): Attendance moderates the effects of PE, EE, SI, FC, HM, and HT on students' BI and use of chatbot technology.

(v) Virtual engagement (represented by the level of engagement with VLEs): this is a new moderator stemming from our proposed persona template/model (Figure 4), as shown in the Introduction section. It is defined as an indicator of behavioral engagement with the computer science course. This study tests whether virtual engagement with VLEs moderates the effect of the determinants on BI and the use of technology.

Hypothesis 12 (H12f1, f2, f3, f4, f5, f6): Virtual engagement with VLEs moderates the effects of PE, EE, SI, FC, HM, and HT on students' BI and use of chatbot technology.
(a) Data screening and missing data. Aim: to ensure there are no missing values in the collected data. Description: missing values prove problematic when using SEM.
(b) Outliers. Aim: to identify any outlier values, as they bias statistical tests. Description: it is critical to detect and treat outliers, as they bias statistical tests and may affect the normality of the data [53].
(c) Testing the normality assumption. Aim: to ensure that the data are normally distributed. Description: the reliability and validity of the data are affected when the data are not normally distributed.
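A common screen for the normality assumption in step (c) is to inspect skewness and excess kurtosis, which should both be near zero for normally distributed data. The helpers below are an illustrative sketch using population moments (function names ours):

```python
import numpy as np

def skewness(x):
    """Population skewness: third standardized central moment."""
    x = np.asarray(x, dtype=float)
    m = x - x.mean()
    return (m ** 3).mean() / (m ** 2).mean() ** 1.5

def excess_kurtosis(x):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    x = np.asarray(x, dtype=float)
    m = x - x.mean()
    return (m ** 4).mean() / (m ** 2).mean() ** 2 - 3.0
```

Values far from zero (a common rule of thumb is beyond roughly ±2) would flag a variable for transformation or non-parametric treatment.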

Table 1 .
Cronbach's Alpha, inter-item correlation, and item-to-total correlation for the pilot study.

Table 6 .
Results for each hypothesis, path coefficient (B), T-value, significance (p-value) and hypothesis support.
(ii) Multiple group analysis: the following sections cover the moderators' effects on the relationships in the proposed model. These moderators are age, gender, experience, attendance, interaction with VLE, grade (performance), and educational level.

Table 7 .
Result of Multi-Group Analysis-age moderator.
HA refers to high age, LA refers to low age.

Table 8 .
Result of Multi-Group Analysis-gender moderator.

Table 9 .
Result of Multi-Group Analysis-experience moderator.

Table 10 .
Result of Multi-Group Analysis-attendance moderator.

Table 11 .
Result of Multi-Group Analysis-engagement with VLEs moderator.

Table 12 .
Result of Multi-Group Analysis-educational level moderator.

Table 13 .
Result of Multi-Group Analysis-grade moderator.

Table A4 .
Normality of data.