Assessing Efficiency of Prompts Based on Learner Characteristics

Personalized prompting research has shown the significant learning benefit of prompting. The current paper outlines and examines a personalized prompting approach aimed at eliminating performance differences on the basis of a number of learner characteristics (capturing learning strategies and traits). The learner characteristics of interest were the need for cognition, work effort, computer self-efficacy, the use of surface learning, and the learner’s confidence in their learning. The approach was tested in two e-modules, using similar assessment forms (experimental n = 413; control group n = 243). Several prompts which corresponded to the learner characteristics were implemented, including an explanation prompt, a motivation prompt, a strategy prompt, and an assessment prompt. All learning characteristics were significant correlates of at least one of the outcome measures (test performance, errors, and omissions). However, only the assessment prompt increased test performance. On this basis, and drawing upon the testing effect, this prompt may be a particularly promising option to increase performance in e-learning and similar personalized systems.


Introduction
One of the challenges in personalized e-learning initiatives pertains to the selection and consideration of tools based on learner characteristics that may increase performance for different types of learners [1]. To date, many user and learner models personalize the learning experience by implementing tutoring support based on perceived or self-reported learner characteristics [2,3]. Prompts can help learners to engage more with the material and to reflect on their learning progress [4]. They may also remind learners of appropriate learning and questioning strategies, which in turn improve performance in an online environment [5].
A variety of learner characteristics are relevant in self-regulation [6], specifically those that influence "cognitive, metacognitive, motivational, volitional and behavioral processes" ([6], p. 194). Several learner characteristics were considered important for self-regulation in the context of this study. These included five learner characteristics believed to be relevant for the implementation of prompts: willingness to commit to making an effort, need for cognition, processing strategies (surface strategy), computer self-efficacy, and confidence. All of these can be considered traits rather than states, since they are stable characteristics that are unlikely to change frequently. In the next section the five learner characteristics on which prompts are based are presented. The assessment of these learner characteristics is referred to as the "learner profile" in the following sections.
In order to facilitate learning, several different and very specific prompts may be employed that focus on problem-solving [15], general instruction [5], self-explanation [16], or encouraging recall, reasoning, or observation [5]. Prompts can serve multiple functions and can focus on cognitive learning processes, metacognitive self-regulation, and resource management [17]. Many of these prompts are essentially scaffolds. They enable learners to make sense of what is required of them when solving a problem. In the domain of science education, a number of reviews exist that provide further insight (e.g., [18]). One study is particularly relevant here. Moreno, Mayer, Spires, and Lester employed a prompt that required students to give feedback on how much the learning material presented to them helped them understand the link between plant design and the environment [19]. Students were subsequently prompted to recall what is required of them and monitor what they have learned [4], demonstrating how prompts may support self-regulation.
The central aim of the current research was to increase performance through the careful design and implementation of prompts, either by directly targeting maladaptive self-regulatory behavior or by increasing the accuracy of learners' self-assessments of their learning progress to date (via a confidence assessment). These aims are approached by implementing five prompts, which are presented next. Instructors and e-learning designers alike aim to increase performance among those learners who generally tend to perform less well, at least as noted in previous performance records. Two potential options are available. For example, a strategy prompt can explain strategies to participants, including examples of how they may succeed. This may also improve self-regulation and, thus, improve performance [20].
While strategy prompts are a useful means to influence the learning process, assessment prompts can prompt learners into action by having them provide evidence of learning, increasing self-evaluation, and by this means making use of the testing effect [21]. The testing effect describes the phenomenon that learners are able to recall information better if they have been tested on it [22]. Challenging students to recall what they have learned is important since many learners are not effective self-monitors and, therefore, often overestimate their learning success [23]. These activities change the learner's self-evaluation of progress made, leading to more accurate judgments [24] (experiment 4). Self-evaluations have been shown to increase performance in computer-supported learning [25].
Reducing overconfidence in those learners who are more likely to be inaccurate self-assessors is another challenge. Overconfidence is particularly an issue for individuals who actually have lower ability and performance [13,26], as such overconfidence may also affect progression and effort. A verification prompt can be useful here as it may reduce overconfidence or faking (reflected in extremely high scores on certain learning characteristics, such as work effort and need for cognition), as well as assist self-regulation in online settings [26,27]. In addition, confidence assessments during the learning process (in the form of a confidence prompt) may also reduce overly optimistic learner expectations regarding their learning progress [28]. On the other hand, alternative prompts can be used to address underconfidence in learners (even when they actually have the ability to do well). Underconfidence, like overconfidence, can have a significant effect, as it may lead participants to disengage earlier or drop out. An explanation prompt can be helpful to ensure that participants understand the purpose and relevance of the e-module. Such a prompt may potentially reduce concerns among those learners who are less confident and indirectly help them to persist and maintain their willingness to invest effort during the learning process [20]. In addition, a motivation prompt can provide learners with information designed for different levels of user competence to encourage completion, particularly when this process is both lengthy and effortful for learners [29].

Current Study Rationale
Outcomes of e-learning design are often based on comparing specific groups of recipients. We believe that more work is needed to evaluate prompt design and effectiveness by considering not just comparative samples, but also a matched participant design. Based on the learner profile, a matched case-control design was utilized. Participants of the experimental group were matched to those participants of the control group who had identical values on the learner profile. This methodological design makes it possible to evaluate whether subjects with equivalent scores on the learner profile perform differently when prompted.
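The matching step described above can be illustrated with a minimal sketch. It assumes that "identical values on the learner profile" means exact equality on a chosen set of profile variables; the record layout, field names, and example values are hypothetical and are not taken from the study data.

```python
from collections import defaultdict


def match_by_profile(experimental, control, keys):
    """Pair each experimental participant with all control participants
    whose values are identical on every profile variable in `keys`."""
    control_index = defaultdict(list)
    for person in control:
        control_index[tuple(person[k] for k in keys)].append(person)

    matches = []
    for person in experimental:
        profile = tuple(person[k] for k in keys)
        if profile in control_index:  # membership test does not create a key
            matches.append((person, control_index[profile]))
    return matches


# Hypothetical records, invented for illustration only.
experimental = [
    {"id": "e1", "need_for_cognition": 15, "gender": "f", "score": 12},
    {"id": "e2", "need_for_cognition": 22, "gender": "m", "score": 9},
]
control = [
    {"id": "c1", "need_for_cognition": 15, "gender": "f", "score": 10},
    {"id": "c2", "need_for_cognition": 18, "gender": "m", "score": 11},
]

pairs = match_by_profile(experimental, control, ["need_for_cognition", "gender"])
```

Here only participant e1 finds a counterpart (c1); unmatched participants simply drop out of the comparison.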
The current study will address two research questions (RQ):
• RQ1: Do participants perform better when prompted compared to participants with an identical learner profile who are not prompted?
• RQ2: In case the prompts do not lead to better performance, can more efficient thresholds be determined in terms of performance outcome?
By these means we intend to determine an individual prompting approach tailored to address characteristics relevant to learning and subsequently performance. Since the novelty of the work lies in the presentation of prompts based on cut-off values derived from self-reported data, explicit recommendations for threshold refinement are deduced. The latter is exploratory in nature since no research work addressing the efficiency of thresholds for a prompting approach based on a learner profile could be identified.

Design and Methods
The next two sections outline the design (including learning context, learning profile, and basis for prompts), followed by the methods (including a description of the recruitment procedure and participants and follow-up measures).

Design: Learning Context
The following sections describe the e-learning context and the implementation (as well as the content) of the prompts within that context.
The study involved two different e-modules of similar length, with similar readability indices, the same learning profile, the same number of test questions, and the same follow-up queries. Both e-modules covered topics related to health at work and teamwork. The five chapters on team development describe the four stages of forming, storming, norming, and performing [30], which are likely to occur when a new team is assigned a task. The five chapters on shiftwork taught about the health effects of shiftwork. Each chapter ended with a multiple-choice test question. The mean time spent on the e-module was 9.9 min (SD = 4.7 min) and 9.5 min (SD = 5.23 min) for the e-module on shiftwork and team development, respectively.
Descriptive statistics for the modules are presented in Table 1, including the Automated Readability Index (ARI) [31] and the Flesch Reading Ease Readability Formula (FRERF) [32]. The ARI and the FRERF are formulae to measure the understandability of texts. The ARI corresponds to the US grade level required to comprehend a text; an ARI score of "8" indicates that a normal seventh grader should have no difficulty understanding a text, whereas a value of "14" indicates that a text is most appropriate for college students. Higher values of the FRERF indicate that a text is easier to comprehend. Values from 0 to 30 indicate that a text is difficult and most appropriate for academics, whereas values close to "100" indicate that a text is very easy. The main difference between the indices is that the ARI is based on characters, words, and sentences, whereas the FRERF incorporates syllables. Since our main aim was to ensure comparability of text difficulty, we opted for these rather simple formulae [33]. The ARI varies between 8.73 and 11.70, which indicates that college students should not have any difficulty understanding the texts. The latter is confirmed by the FRERF.
Note. WC = word count; CH = characters; SEN = sentences; CH/WC = ratio of characters to words; WC/SEN = ratio of words to sentences; ARI = Automated Readability Index; FRE = Flesch Reading Ease Readability Score.
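For readers unfamiliar with the two indices, the sketch below writes out the standard published forms of the ARI and the Flesch Reading Ease formula. The text does not state which implementation was used to produce Table 1, so this is an illustration under that assumption; the counts in the example are invented and are not the module statistics.

```python
def automated_readability_index(characters, words, sentences):
    """ARI: based on characters per word and words per sentence."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: based on words per sentence and syllables per word."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Invented counts for a hypothetical 1000-word text.
ari = automated_readability_index(characters=5000, words=1000, sentences=50)
fre = flesch_reading_ease(words=1000, sentences=50, syllables=1500)
```

Because the ARI uses only characters, words, and sentences, it can be computed without a syllable counter, which is the practical difference between the two indices noted above.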

Learning Profile
The e-modules each featured one learning profile at the beginning.The following measures were utilized in the learning profile.

Work Effort
Work effort was measured using five items [34], the first three measuring persistence and the last two measuring work effort intensity. An example item is "When I start an assignment I pursue it to the end". The response options range from (1) "strongly disagree" to (5) "strongly agree".
Scale scores range from 5 to 25, with a score of 20 considered an extremely high work effort score. Scores of 12 or below are categorized as low work effort.

Computer Self-Efficacy
Computer self-efficacy was measured using three items [35]. An example item is "I feel confident troubleshooting computer problems". The response options range from (1) "strongly disagree" to (5) "strongly agree". The minimum score is three, the maximum score is fifteen. Scores of 12 or below are categorized as low computer self-efficacy.

Need for Cognition
Need for cognition was measured using five items [36]. An example item is "I would prefer complex to simple problems". The response scale ranged from (1) "extremely uncharacteristic" to (5) "extremely characteristic". The score ranges from 5 to 25. Scores of 15 or below for women (and 18 or below for men) are considered low need for cognition.

Surface Strategy
Surface strategy was measured using three items of the two-factor Study Process Questionnaire [9]. An example item is: "In order to understand something, I tend to study more than what may be necessary". To avoid socially desirable responses, all items were reverse-coded [37]. The response options for all items are on a five-point Likert scale ranging from (1) "never or only rarely true of me" to (5) "always or almost always true of me". The score ranges from 5 to 15 points maximum.
High surface strategy is associated with scores between 13 and 15.
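The scale scores in the learning profile are sums of Likert responses, with some items reverse-coded. A minimal sketch of such scoring is given below; the function name, the item order, and the example responses are hypothetical, and reverse-coding is assumed to follow the usual rule for a 1 to 5 scale (1 becomes 5, 2 becomes 4, and so on).

```python
def score_scale(responses, reverse_coded=(), scale_min=1, scale_max=5):
    """Sum Likert responses, flipping items whose index is in `reverse_coded`."""
    total = 0
    for index, value in enumerate(responses):
        if not scale_min <= value <= scale_max:
            raise ValueError(f"response {value} outside {scale_min}-{scale_max}")
        if index in reverse_coded:
            value = scale_min + scale_max - value  # e.g. 2 -> 4 on a 1-5 scale
        total += value
    return total


# Hypothetical responses: five work effort items, three reverse-coded items.
work_effort = score_scale([4, 5, 3, 4, 4])
surface = score_scale([2, 1, 2], reverse_coded={0, 1, 2})
```

With these invented responses, the work effort score is 20 and the surface strategy score is 13, both of which would fall in the "high" bands described above.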

Confidence Assessment
All participants were asked how confident they were that they could answer a test question about the previous sections. The response was given on a visual analogue scale ranging from 0% to 100%.

Prompts: Design, Thresholds and Implementation
The prompts and the specific instructions for each are presented next. The verification prompt instructed learners as follows: "You have reported an exceptionally high score on e.g., work effort. Please confirm this is the score you meant to indicate!" The explanation prompt stated: "This e-module is an important tool for you to familiarize yourself with the topic of team development/effects of shift work on health. Completing this e-module attentively and your own pace can provide you with the skills and knowledge to succeed at work and in subsequent learning units." The strategy prompt outlined: "The following strategies may help you succeed. Remember to check that you have understood the concepts covered so far. To help you, try to summarize, write down important concepts and re-visit difficult parts. When you have read and understood these instructions, please click "ok" to proceed." The motivation prompt reminded learners that "The e-module was designed for all kinds of learners with different skills and backgrounds. You can improve your performance by remembering your goals for taking this e-module. Focus on these goals for taking this e-module and how you have successfully learned in the past. Please feel assured that with appropriate effort, all participants should be able to complete the e-module successfully. Your hard work will pay off!" The assessment prompt told learners that "In order to assess progress, we would hereby like you to enter five key words that summarized what you have learned so far in this e-module. This small assessment serves to ensure that you as a learner can assess your own progress successfully."
Figure 1 shows how the prompts were positioned in each e-module in relation to the chapters. Table 2 presents an overview of prompts, the learning characteristics of interest, and the thresholds used to trigger prompts. Thresholds were derived from data obtained in previous studies by the authors [38][39][40]. We found that women tend to have lower judgment of learning scores overall. In order to determine when to trigger encouraging prompts, we decided to lower the confidence threshold for women by about 3%-5% compared to men, who rate their confidence on average as higher than women. Furthermore, we found significant age differences regarding judgment of learning in several of our preliminary studies. Older participants reported being more confident about their learning than any younger group of participants. After correcting their confidence downward, they nevertheless continued to be overconfident, as indicated by the (difference) scores produced between confidence and test items. Gender differences in self-rating have also been reported for need for cognition [41]. Consequently, thresholds were adapted to age and gender for these scales.
Note. f = female; m = male. The above learning characteristics were based on their relevance to learning. The thresholds were based on previous research conducted by the authors using the same learning characteristics.
All participants in the experimental group faced one to three prompts maximum. Prompts could be triggered in a specific order (cf. Figure 1). The verification prompt is triggered upon completion of the learning profile (when individuals report extremely high work effort and need for cognition that may be indicative of faking or disengagement). The explanation prompt explains the purpose/relevance of the e-module and is presented at the beginning of the first chapter to participants with low work effort scores. The confidence prompt is presented at the end of the first chapter, and constitutes the basis for two further prompts: a strategy prompt and a motivation prompt (process prompts). The strategy prompt, if triggered, is presented in response to high surface strategy and low need for cognition. Alternatively, the motivation prompt is triggered by low confidence in learning or low computer self-efficacy. The assessment prompt is triggered in reaction to extremely high confidence and high need for cognition reported in the learning profile. This prompt includes a request to participants to generate keywords and is presented in response to possible overconfidence.
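The triggering rules described above can be sketched as a small decision function. This is an illustration under several assumptions, not the implementation used in the e-modules: the cut-offs are the ones mentioned in the text (work effort of 20 or above and need for cognition of 25 for verification; work effort of 12 or below for explanation; surface strategy of 13 or above, or need for cognition of 15 or below for women and 18 or below for men, for strategy; computer self-efficacy of 12 or below for motivation; confidence of at least 91 or 94, depending on age, for assessment). Each pair of criteria is treated as "either one suffices", the low-confidence cut-off is a hypothetical placeholder because its value is not reported here, the need for cognition criterion of the assessment prompt is omitted for the same reason, and the cap of three prompts is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    gender: str                  # "f" or "m"
    age: int
    work_effort: int             # 5-25
    need_for_cognition: int      # 5-25
    surface_strategy: int        # up to 15
    computer_self_efficacy: int  # 3-15
    confidence: float            # 0-100, reported after the first chapter


# HYPOTHETICAL placeholder: the original low-confidence cut-off is not given.
LOW_CONFIDENCE_CUTOFF = 50.0


def triggered_prompts(p, low_confidence_cutoff=LOW_CONFIDENCE_CUTOFF):
    """Return the prompts a learner would receive, in presentation order."""
    prompts = []
    if p.work_effort >= 20 or p.need_for_cognition >= 25:
        prompts.append("verification")
    if p.work_effort <= 12:
        prompts.append("explanation")
    low_nfc = p.need_for_cognition <= (15 if p.gender == "f" else 18)
    if p.surface_strategy >= 13 or low_nfc:
        prompts.append("strategy")
    if p.confidence < low_confidence_cutoff or p.computer_self_efficacy <= 12:
        prompts.append("motivation")
    # Assumption: age 34 itself is grouped with the older participants.
    very_high_confidence = 91 if p.age < 34 else 94
    if p.confidence >= very_high_confidence:
        prompts.append("assessment")
    return prompts


# Hypothetical learner, invented for illustration.
learner = Profile("f", 21, work_effort=21, need_for_cognition=14,
                  surface_strategy=9, computer_self_efficacy=14, confidence=95)
result = triggered_prompts(learner)
```

For this invented learner the function returns the verification, strategy, and assessment prompts, in that order.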

Methods: Procedure and Participants
All participants were recruited via their instructors at two Midwestern universities in the USA, a private English-speaking university in Germany, and a university in the UK. Participation was voluntary. The mode of recruitment of participants was identical across both e-modules (announcement by the instructor in class). All participants were social science undergraduates who participated for extra credit. Once participants entered the online e-modules and read the study information, they could only proceed if they consented to participate. The learning profile was presented next (items assessed work effort, computer self-efficacy, need for cognition, and surface strategy). Upon completion of the profile, participants were randomly allocated to the control group (no prompts) or the experimental groups (prompts triggered by learning profile) at an allocation ratio of 1:1:1. All groups subsequently went through the e-module and completed five test sections, followed by a demographics section and a debrief statement.
As the primary goal was to determine if there were any module-independent effects of the prompts, the two datasets obtained from the two e-modules (n1 = 225; n2 = 273) were combined into one (N = 413). In total, 243 participants (37%) belonged to the control group (not prompted) and 413 (63%) to the experimental group (received a prompt based on their learning profile). In the control group, 50 participants were male (20.6%) and 193 female (79.4%), aged 17 to 46 years (M = 22.02, SD = 4.13). The experimental group included 107 male participants (25.9%) and 306 female participants (74.1%), aged 17 to 52 years (M = 20.63, SD = 3.16). To facilitate readability, the terms control group and experimental group are used throughout the following sections.

Methods: Outcome and Demographic Measures
A number of measures were collected during or at the end of the e-learning modules. These did not feature in the learning profile.

Test Performance
All participants were asked five multiple-choice questions embedded in the e-module; the maximum score participants could obtain was 15 points. Each chapter of the module featured one test question with one or more correct responses. Two test scores were created for each dataset. One value considered the overall performance of participants. The corrected test performance was computed by deducting the number of errors made by the participants from their overall performance score. This step was important as participants were able to select all response options; not correcting for test error could skew the data and make individuals who were guessing appear more successful. Test performance was thus measured by correct responses to multiple-choice questions, errors, and omissions. Omissions are defined as correct response options that participants did not select. Consequently, omissions are the complement of the test score.
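The relationship between the test score, errors, omissions, and the corrected test score can be made concrete with a short sketch. It assumes that every correct option is worth one point and that answers are recorded as sets of selected options per question; the answer key and selections below are invented and much shorter than the five-question, 15-point test.

```python
def score_test(selected, correct):
    """Score a multiple-response test.

    `selected` and `correct` are lists of sets, one set of option labels
    per question.
    """
    hits = sum(len(s & c) for s, c in zip(selected, correct))
    errors = sum(len(s - c) for s, c in zip(selected, correct))
    omissions = sum(len(c - s) for s, c in zip(selected, correct))
    return {
        "test_score": hits,
        "errors": errors,
        "omissions": omissions,
        "corrected": hits - errors,  # penalizes selecting every option
    }


# Invented two-question key: the learner ticks everything on question 1
# and skips question 2.
correct = [{"a", "c"}, {"b"}]
selected = [{"a", "b", "c", "d"}, set()]
result = score_test(selected, correct)
```

In this example the learner earns a test score of 2 but a corrected score of 0, and the test score plus the omissions equals the number of correct options, which is why omissions mirror the test score.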

Demographics
Demographic information included gender and age.

Results
The results section is split into three subsections. The subsections outline the general characteristics of the measures used, the evaluation of prompt effectiveness, and the specific prompt results.

Descriptives and Scale Performance for Both Conditions
Descriptives and scale descriptives for the control and experimental groups are reported separately (see Table 3). Scales performed similarly well in both subsamples. As noted in Table 3, there were no significant differences for any of the learning characteristics between the control and experimental groups. In all, 173 (41.9%) of the participants in the experimental group received the strategy prompt, 62 (15%) received the assessment prompt, 23 (5.6%) received the explanation prompt, 37 (9%) received the verification prompt, and 399 (96.6%) received the motivation prompt. The majority of learner characteristics correlated significantly with the test score in both conditions (Table 4).

Analyses to Assess Prompt Effectiveness
The main goal of the current analysis was to assess whether receiving prompts at specific levels of learner characteristics would be effective as predicted. To answer this question, two steps of analysis were conducted. The first step consisted of a confirmatory analysis to assess between-group comparisons for performance at the threshold level proposed (cf. Table 1). Group comparisons between the experimental and control group were conducted using an ANOVA bootstrap approach [42]. This analysis examined whether specific learner characteristics (such as low need for cognition) result in underachievement if no prompt is provided among matched learners (e.g., learners in the control group who had the same scores in the learner profile). The second step of analysis evaluated the effectiveness of existing prompts and considered the possibility of identifying more efficient thresholds. This step enabled us to examine whether the thresholds of the prompts used in our study may need to be adjusted, presenting a starting point for recommendations for future studies.
The second step of the analysis was only conducted when no significant between-group differences were found in the first step of the analysis.
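To give a sense of what a bootstrap-based group comparison involves, the following sketch computes a percentile bootstrap confidence interval for the difference in mean test scores between a prompted and an unprompted group. It is a simplified stand-in and not the ANOVA bootstrap procedure of [42]; the function name, the number of resamples, and the scores are assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_mean_difference(group_a, group_b, n_resamples=2000, seed=0):
    """Percentile bootstrap 95% interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(group_a, k=len(group_a))  # with replacement
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return mean(group_a) - mean(group_b), (lower, upper)


# Invented test scores for matched participants at one threshold level.
prompted = [12, 13, 11, 14, 12, 13, 12, 11]
unprompted = [10, 9, 11, 10, 12, 9, 10, 11]
observed, (low, high) = bootstrap_mean_difference(prompted, unprompted)
```

If the resulting interval excludes zero, the groups differ at roughly the 5% level under this simplified approach; in terms of the two-step procedure, that would correspond to stopping after the first step.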

Results for Different Prompts
The following section presents the results for every prompt. Findings are summarized in Table 5.

Strategy Prompt
In the first step of the analysis, participants of the control group were matched to those of the experimental group concerning their values for need for cognition. No significant performance differences for female participants with a value ≤15, and males with a value ≤18, were determined (thresholds determined for prompts; p > 0.05). In other words, the performance of individuals with identical scores in the control and experimental group was not significantly different. This contradicts our expectations and suggests that the threshold chosen was not efficient for the population tested. The strategy prompt based on surface strategy was only triggered two times. Consequently, no valid group comparisons could be made. A lower threshold for the prompt has to be discussed for future implementations. In the second step of the analysis, we evaluated whether better thresholds for this prompt could be determined. Women who scored 15 and men who scored 17 on the need for cognition scale did indeed perform better than participants with identical scores in the control group.

Assessment Prompt
The assessment prompt was triggered based on participants reporting very high judgment of learning levels (≥91 and 94, respectively, for participants aged below and older than 34, see Table 1). At these thresholds, significant differences emerged between the experimental and the control group in terms of the test score/omissions (both F(1, 54) = 10.92, p < 0.005) and the corrected test score (F(1, 57) = 6.100, p < 0.05). Participants in the experimental group significantly outperformed their counterparts in the control group who had the same scores. The performance results obtained are further visualized in Figure 2. Whereas the experimental group achieved an average test score of 12.23, a corrected test score of 9.23, and made only 2.77 omissions, the control group only achieved a test score of 10.29, a corrected test score of 7.71, and made 4.71 omissions. In this case, we did not run the second step of the analysis, as we did for the strategy prompt, since the significant results of the first step suggest that an efficient threshold level had been chosen.

Explanation Prompt
In the first step of the analysis, participants of the control group were matched to those of the experimental group concerning their values for work effort. No significant performance differences for female participants with a value ≤15 and males with a value ≤18 were determined (thresholds determined for prompts; p > 0.05). No more efficient threshold level could be determined in step 2 of the analysis.

Verification Prompt
For work effort (threshold 20), we observed no significant difference in confidence between participants of the two groups (p > 0.05). For need for cognition (threshold 25), we observed the same non-significant trend. Please note, however, that judgment of learning, similar to need for cognition and work effort, correlated positively with performance outcomes. This emphasizes the importance of this form of self-assessment in online learning. In other words, there was no evidence that participants were overconfident. Indeed, they tended to be very accurate self-assessors of their learning progress. This means the verification prompt did not actually address overconfidence, as overconfidence was not a concern with our participants. In response to this, we omitted the second step of the analysis, as optimal thresholds are only required when overconfidence indeed exists, which remains an issue for future research.
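One simple way to express overconfidence as a difference between confidence and test performance is sketched below. The text does not specify how the difference scores were scaled, so the rescaling of the test score to a percentage of the 15-point maximum is an assumption, as are the function name and the example values.

```python
def calibration_gap(confidence_percent, test_score, max_score=15):
    """Confidence minus percent correct.

    Positive values suggest overconfidence, negative values underconfidence.
    """
    return confidence_percent - 100.0 * test_score / max_score


# Invented learner: 90% confident, 12 of 15 points (80% correct).
gap = calibration_gap(confidence_percent=90, test_score=12)
```

A gap close to zero would correspond to the accurate self-assessment reported for the present sample.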

Motivation Prompt
No significant differences were found at the original threshold levels (p > 0.05). Subsequently, we proceeded to the second step of the analysis to investigate the possibility of improving thresholds. Indeed, significant performance differences were found when we excluded participants who had scored exactly 12 on the computer self-efficacy scale. Whereas the control group reached an average test score of 10.68 and made 4.32 omissions, the experimental group actually performed significantly better and obtained a test score of 11.16 while making only 3.84 omissions. This suggests better performance for participants with low computer self-efficacy in the experimental group when prompted. The threshold analysis for the judgment of learning score revealed a similar pattern. Performance differences emerge below the threshold of 75 for judgment of learning. Whereas the control group reached a test score of 10.78 and made 4.77 omissions, the corresponding participants of the experimental group reached a higher test score of 12.12 and made fewer (only 2.88) omissions.
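The exploratory second step, in which alternative cut-offs are examined, can be sketched as a scan over candidate thresholds. The exact procedure used in the study is not described in detail, so this is only one plausible reading: for each candidate cut-off, the mean test scores of experimental and control participants at or below that value are compared. The data are invented, and a significance test (such as a bootstrap comparison) would still be needed for each candidate.

```python
from statistics import mean


def scan_thresholds(experimental, control, candidates):
    """Mean score difference (experimental - control) per candidate cut-off.

    Each participant is a (characteristic_value, test_score) pair; only
    participants at or below the cut-off enter the comparison.
    """
    results = {}
    for cutoff in candidates:
        exp_scores = [score for value, score in experimental if value <= cutoff]
        ctl_scores = [score for value, score in control if value <= cutoff]
        if exp_scores and ctl_scores:  # skip cut-offs with an empty group
            results[cutoff] = mean(exp_scores) - mean(ctl_scores)
    return results


# Invented (computer self-efficacy, test score) pairs.
experimental = [(10, 12), (11, 11), (12, 10), (13, 10)]
control = [(10, 10), (11, 10), (12, 10), (13, 11)]
differences = scan_thresholds(experimental, control, candidates=[11, 12, 13])
```

In this invented example the advantage of the prompted group shrinks as the cut-off rises (1.5, 1, and 0.5 points), which mirrors the idea that a stricter threshold may target the prompt at the learners who benefit most.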

Discussion
The future of learning will require more and more engagement with online resources, such as e-modules. As a result, it becomes increasingly important that the online learning materials offer appropriate guidance to the learners and recognize the unique aspects of an individual participant. Such guidance may come in numerous forms, including prompts that enable the learner to regulate all of those (cognitive, motivational, or behavioral) processes that are particular to the individual learner's approach [6]. The present work attempted to outline the merits of personalized support during e-learning, based on a variety of learner characteristics and the work around self-regulation.
Prompting as utilized in this study provides designers with one way in which they may be able to increase the performance of those who may be more likely to underperform based on their characteristics. Different prompts may do so by either supporting recall or encouraging learners to become more cognizant of their actual performance in computer-supported learning systems [25]. Using existing personality and strategy evidence, we demonstrated one approach to personalization. Our work, therefore, builds on the previous research on the importance of considering different learning characteristics such as personality and learning strategies in online settings [1,2,3].
The focus of our study was to find answers to two research questions. Specifically, we asked whether participants perform better when prompted compared to participants with an identical learner profile who were not prompted (RQ1). This was examined using matched samples from both the control group (not prompted) and the experimental group. We found that prompting (RQ1) was only effective when the assessment prompt was presented. This finding builds on the ideas around the testing effect [21]: by having learners recall what they have learned, they will be able to recall information better [22].
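To make the idea of matched samples concrete, the following minimal sketch pairs each prompted participant with a not-yet-used control participant who has an identical learner profile. The profile keys (`low_effort`, `low_cse`), the record layout, and the simple first-available pairing rule are hypothetical and are not taken from the study's actual matching procedure.

```python
from collections import defaultdict


def match_by_profile(control, experimental, profile_keys):
    """Pair each experimental participant with one unused control
    participant whose values on all `profile_keys` are identical."""
    pool = defaultdict(list)
    for person in control:
        pool[tuple(person[k] for k in profile_keys)].append(person)

    pairs = []
    for person in experimental:
        key = tuple(person[k] for k in profile_keys)
        if pool[key]:  # an unused control with the same profile exists
            pairs.append((person, pool[key].pop()))
    return pairs


# Hypothetical participants for illustration only; NOT study data.
control = [
    {"id": "c1", "low_effort": True, "low_cse": False},
    {"id": "c2", "low_effort": False, "low_cse": False},
]
experimental = [
    {"id": "e1", "low_effort": True, "low_cse": False},
    {"id": "e2", "low_effort": True, "low_cse": False},
    {"id": "e3", "low_effort": False, "low_cse": True},
]

pairs = match_by_profile(control, experimental, ["low_effort", "low_cse"])
matched_ids = [(e["id"], c["id"]) for e, c in pairs]
```

In this toy example only one pair can be formed, which illustrates why exact matching on several characteristics tends to reduce the usable sample.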
A general summary provides starting points for recommendations. First, confidence was the most robust predictor of performance, as greater confidence correlated positively with performance. Good performance in previous assessments may raise future performance expectations. This means that collecting confidence reports may generate a helpful baseline measure on which to build subsequent prompts in other online systems. However, it may also be worthwhile to consider just how robust or fragile confidence is in online assessment (especially because instantaneous feedback can be integrated), as understanding this may provide further insight into the relationship to performance. Second, computer self-efficacy correlated negatively with test performance, an effect that we would expect to diminish over time as learners become more accustomed to new technologies. That we obtained these findings with younger participants (a student sample) rather than older participants (who may not have used e-learning during their studies) suggests that computer self-efficacy may not be merely an element of age or experience, but potentially a measure of perceived capability when assessment is moved from traditional paper to online systems. Given the number of different systems users may encounter online, personalization of a system may need to account for the heterogeneous background of users.
The lack of support for the other prompts may be due to the thresholds selected. This question was tackled as well: In case the prompts do not lead to better performance, can more efficient thresholds be determined in terms of performance outcome (RQ2)? Specifically, we considered whether our prompting thresholds were effective and explored possibilities to improve them. From a strict statistical point of view, our thresholds did not operate as effectively as planned. The strategy, explanation, and verification prompts did not perform satisfactorily, regardless of the threshold, which suggests they should either not be implemented in future studies or must be designed differently.

Limitations and Future Research
A number of potential limitations apply. First, many of the learner characteristics used for implementing prompts could be considered both affective states and more stable trait characteristics. Whether the variables of the learner profile should be regarded more as a state or a trait when used in an e-learning environment may be the focus of future studies. Second, employing a dichotomous prompting approach based on whether or not participants scored high on a specific learning characteristic reduces variance [43]. An alternative approach could be to simply use percentile, or even decile, rankings and create corresponding prompts, or to use an item response theoretical approach (for successful applications in online learning settings see work by [35,44,45]). Third, our information about participants was collected using self-report and was limited to a small set of variables. Fourth, prompts were kept relatively short and it was assumed that they were self-explanatory. This may have disadvantaged some participants with poorer comprehension skills. In addition, it would be interesting to examine additive or inhibitive effects between prompts. The latter was outside the remit of the current study, but Kauffman and colleagues [15] may be a good starting point. And fifth, relying on null hypothesis testing [46] and adherence to p-values as the determinants of what may be considered meaningful has not only been criticized by leading statisticians [47] but may also be a suboptimal approach to optimize prompting. For example, Reisslein [48] explored different prompt formats and presentation effects. They interpreted non-significant results as an indicator that the prompts were effective across all groups where they were employed (rather than ineffective). Krause and Stark [49] also observed no significant prompt difference in performance when they asked students to engage in active problem-solving with or without reflection prompts. In their case, the descriptive statistics showed that, purely numerically, performance was indeed higher for the prompted group, but not at p < 0.05 (they reported p = 0.11 for the difference).
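Returning to the second limitation, the contrast between a dichotomous threshold and a percentile or decile ranking can be sketched as follows. The function names, the rule that a prompt is triggered below the threshold, and the toy scale scores are assumptions for illustration; the sketch does not reproduce the study's prompt logic.

```python
def dichotomous_flag(score, threshold):
    """Binary decision: prompt only if the score is below the threshold
    (the direction of the comparison is an assumption)."""
    return score < threshold


def percentile_rank(score, all_scores):
    """Percentage of observed scores that lie strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def decile(score, all_scores):
    """Map a score to a decile from 1 (lowest) to 10 (highest)."""
    return min(int(percentile_rank(score, all_scores) // 10) + 1, 10)


# Hypothetical scale scores for illustration only; NOT study data.
scores = [10, 12, 15, 18, 20, 22, 25, 27, 30, 33]

flag = dichotomous_flag(20, 25)
rank = percentile_rank(20, scores)
band = decile(20, scores)
```

Whereas the flag collapses every score into two categories, the decile retains ten ordered levels, so prompts could in principle be graded by how far a learner sits from the rest of the sample.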
Based on these limitations, the authors would like to make five suggestions for future research. First, there was tentative evidence that the motivation prompt might be of use if thresholds are refined. This suggests that careful analysis of prompts may, even when prompts fail to generate the expected outcomes, provide starting points to optimize future implementations. Second, the readability indices used to ensure that the e-modules are fairly comparable are based on linguistic features. More elaborate approaches consider cognitive features and processes relevant to text processing [50]. Using Coh-Metrix as a readability index, for example, may shed more light on the interplay between text difficulty, prompting and learner characteristics. Third, more work is needed to understand the testing effect and how different prompts may come into play. For example, future studies may investigate whether the assessment prompt has to be based on the learner profile or may work for any student independent of their learner profile. Fourth, future longitudinal employment of learning profiles, such as the one we utilized, may provide insight into the stability and effectiveness of prompt interventions [51]. Fifth, and finally, the challenge with the design of thresholds will be to identify optimal points where performance will be sufficiently enhanced, and this will require iterative designs that improve on previous models. Sanabria and Killeen [52] suggested that replication statistics may be particularly helpful, and potentially more so than null hypothesis testing, in order to test replicated effects. Given the incremental work required to find optimal thresholds, we would support the additional use of these statistics in future work.

Conclusions
In line with previous work on personalized prompting, we tested a number of prompts, which we introduced together with an overview of how we operationalized personalized prompting. These were based on a number of learning and personality characteristics predicted to influence learning. The use of a matched case-control design enabled us to compare the effects. Unfortunately, the motivation, verification, strategy, and explanation prompts did not improve performance. Since we examined different thresholds in an exploratory manner, it is likely that these prompts were simply not effective in the way they were designed.
We observed a significant benefit of the assessment prompt, which is in line with the testing effect and provides an excellent starting point for future studies. A particular research question may address whether the assessment prompt has to be based on the learner profile or works independently. All learning characteristics were significant correlates of at least one of the performance variables. Additional thought was given to the role of confidence, computer self-efficacy, and opportunities for threshold optimization in the hope of providing further starting points for future work on personalization.

Figure 1. Positioning of prompts in both e-modules.

Figure 2. Mean differences between the control and experimental group. The bars represent the standard error. * p < 0.01, ** p < 0.005.

Table 2. Prompt set up.

Table 4. Correlation of scales and outcome measures for the experimental and the control group.
Note. t < 0.10; * p < 0.05; ** p < 0.001; N = 656. We used Spearman's rho to compute the correlation coefficients since data were not normally distributed. Results for the control group are in parentheses, e.g., 0.057 is the correlation coefficient between need for cognition and test score for the experimental group, and 0.128 is the correlation coefficient between need for cognition and test score for the control group.

Table 5. Prompt specific results and suggestions for future studies.