3.1. Observations
In this study, we propose a new approach that identifies difficult topics through data accumulation. The approach is based on the discrepancy between learners’ efforts and exam results by subject topic, where learners’ efforts and exam results correspond, in turn, to the formative and summative assessment results collected from the course. The topic that requires attention is the one with the highest discrepancy. Our approach is limited by the following two risks: (a) the exam questions do not follow the content taught in class, and (b) the test results do not match the learner’s performance, which might be described as “man proposes, God disposes”. In the following, we consider several assumptions about risk (a) and its effects to clarify the proposed solution; risk (b) will be discussed with the experimental results.
Assumption 1. When preparing a summative assessment, the lecturer will observe the statistical results of a formative assessment.
Exams that are too easy or too difficult do not differentiate the ability and effort of learners in a group. In other words, the exam questions need to match the ability of the vast majority of learners. The best way to gauge this suitability is to consider the results of, or feedback on, the acquisition level during the learning process.
Assumption 2. For every learning topic in the course, lecturers use the same set of courseware for all teaching classes.
Although many different teaching methods can be applied in the same classroom, this is not mandatory for all lecturers. Instead, the set of shared documents on the e-learning system, including lectures, interactive class exercises, and homework, is inherited and improved over the years and can be applied to all classes. Within this research, we call these documents the courseware of the subject.
Assumption 3. Most learners have stable and moderate learning behavior for all topics in the same subject.
Each course covers multiple topics rolled out weekly, with the same courseware for every learner. The exercise patterns (multiple-choice questions, fill-in-the-blank, programming, etc.) cover various levels, from easy to difficult. Learning behavior includes learning attitude, diligence, level of effort, and the self-discipline to work from home. If the courseware is engaging enough and accessible from the very first topic, studious learners who do not encounter unexpected situations will maintain consistent learning behavior throughout the course.
In this study, we focus on assessing the acquisition level of most learners (with the same set of lecturers and learners). If a metric that describes the acquisition level exists, it should follow the Observations and Definitions as follows:
Definition 1. The level of lecture acquisition is the metric through which it is assessed if the learner meets the learning outcome through the courseware and learning process.
Observation 1. If there exist two topics with the same level of lecture acquisition to the vast majority of learners (with the same set of lecturers and learners) and if there is a measuring tool for this, then the results returned from the measuring tool for those two topics need to be approximately the same.
From Assumptions 2 and 3, given that learners have stable learning behavior, well-designed courseware, and an appropriate teaching path, the learners should meet the required level of the learning outcome. Thus, the statistical results from the tool that measures the level of lecture acquisition must be approximately equivalent across such topics.
Observation 2. If there is a tool to measure the level of lecture acquisition, it should depend on the results collected from the summative assessment.
The summative assessment is a final assessment to officially determine the learner’s achievement level. Therefore, any measurement method needs to be based on this assessment.
Proposition 1. To achieve higher accuracy, the measuring tool for assessing the lecture acquisition level needs to integrate additional information gathered from the formative assessment.
Regarding timing, the formative assessment needs to be released and completed before the summative assessment. The latter is considered a method of post-testing, post-checking, or post-auditing. The information extracted from the summative assessment is typically enough to understand learners’ acquisition levels through statistical methods, such as examining whether the scores follow a Gaussian distribution. However, if an exam is designed without observing the information collected from the formative assessment, the results of the summative assessment are unlikely to reveal the performance and efforts of each learner.
Moreover, according to Assumption 1, the observation of the statistical results of the formative assessment when creating a summative assessment often depends on the subjective judgment of the individual or group in charge of making the exam. Therefore, consistency between formative and summative assessment results is not guaranteed. In other words, the results obtained from the summative assessment are not guaranteed to accurately describe the quantity the measuring tool is meant to observe.
We can obtain the following Propositions 2 and 3 based on Observation 1 and Proposition 1.
Proposition 2. If the tool to measure the level of lecture acquisition exists, then it should observe the information gathered from both the formative and summative assessments.
The discrepancy between the formative and summative assessments of a topic is meaningful and observable. To quantify it, we consider the following definition of discrepancy.
Definition 2. The discrepancy between formative and summative assessments for a topic is the average, over learners, of the difference between their formative and summative assessment results on that topic within the same subject.
Proposition 3. When comparing topics, the topic with the most significant discrepancy is the one that needs to be reviewed in the continuous improvement process.
When composing the exam, the topic with the largest discrepancy signals a possible problem. Therefore, the relevant summative assessment should be reviewed for future semesters; alternatively, if the person or group in charge of making the exam wants to keep the same assessment and difficulty level in the summative assessment, they need to add more exercise content to the courseware and possibly change the content of the formative assessment. Whatever the decision, it remains part of the continuous improvement process.
3.2. Proposed Courseware Improvement Process
Figure 1 illustrates the courseware improvement process that is the core of our proposed approach. In the Learning Management System (LMS) block, the courseware includes learning materials for learners, such as videos, slides, and exercises in the online environment. Meanwhile, the data accumulation records learners’ interactions with the courseware, such as time spent watching videos, opening/downloading lecture slides, doing exercises, auto-grading results of assignments, etc. The data related to assessment scores are used in the following steps:
Clustering and Noise Reduction: the data, including formative and summative assessments, are divided by topic; each topic’s data are clustered and the noise is removed. The data may contain particular learners with abnormal learning behaviors. These learners should be supported separately and removed from the data so that the general support of learners remains correct.
Discrepancy Calculation: the discrepancy is described by the average absolute difference between the learner’s formative and summative scores.
Difficult Topic Exploring: the topic with the highest discrepancy is focused on. The result may be sent to the courseware improvement process block, which collects information and sends suggestions to the lecturer. Finally, the lecturer reviews the topic and can adjust the courseware appropriately for the following semesters.
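The steps above can be sketched in code. The following is a minimal, hypothetical outline, assuming per-topic score tables have already been extracted from the LMS; the function and variable names are illustrative only, and noise removal is delegated to a pluggable step (e.g., OPTICS-based clustering, discussed below):

```python
def improvement_pipeline(formative, summative, remove_noise):
    """Pick the topic with the highest formative/summative discrepancy.

    formative, summative: dicts mapping topic -> {learner_id: score}.
    remove_noise: callable that drops outlier learners from both score
    tables (e.g., via a clustering method) and returns the cleaned pair.
    """
    discrepancies = {}
    for topic in formative:
        f, s = remove_noise(formative[topic], summative[topic])
        common = f.keys() & s.keys()  # learners with both scores
        if not common:
            continue
        # average absolute difference between formative and summative scores
        discrepancies[topic] = sum(abs(f[i] - s[i]) for i in common) / len(common)
    # Difficult Topic Exploring: the highest discrepancy is reviewed first
    difficult = max(discrepancies, key=discrepancies.get)
    return difficult, discrepancies
```

For example, with formative scores `{"loops": {1: 8, 2: 9}, "recursion": {1: 9, 2: 8}}` and summative scores `{"loops": {1: 7, 2: 9}, "recursion": {1: 4, 2: 5}}` (and a pass-through `remove_noise`), the pipeline flags "recursion" as the difficult topic, since its average absolute difference (4.0) exceeds that of "loops" (0.5).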
Figure 1.
Illustration of the proposed courseware improvement process.
3.2.1. Clustering Method and Noise Reduction
In a classroom, there will typically be students with the same learning behavior, and there will also be students whose learning behavior is unstable. This can happen because, although learners have different prior knowledge, they can study together in groups and form the same study habits. On the other hand, students with individual learning tendencies or abnormal learning behaviors will exhibit distinct patterns; we call these students outliers.
From the lecturer’s perspective, it would be ideal if they had a teaching method that is adaptable to each learning behavior. Therefore, lecturers strive to support as many learners as possible, leading to good support for large groups of people with similar learning behaviors and rare support for the outliers. Our study focused on observing the inconsistencies in most learners’ learning and assessing processes. As a result, the noise resulting from outliers will be excluded from our approach. We chose the OPTICS algorithm as a clustering and noise detection method to remove outliers.
Clusters are defined as dense regions separated by low-density regions. The algorithm starts with an arbitrary object in the dataset and checks the neighboring objects within a given radius (eps). If the number of neighbors within that eps reaches the minimum number of objects (minPts) required for a cluster, the object is marked as a core object. Otherwise, if the objects within the given eps are fewer than the required minPts, the object is marked as noise. The drawback of DBSCAN [34] is that it depends on one fixed eps for all clusters. Therefore, this method may produce bad predictions when encountering clusters with different densities. OPTICS is a model that improves on this weakness. Instead of deciding based on a fixed eps distance, OPTICS determines whether the distance between two points is appropriate by evaluating the distances between pairs of points in the local neighborhood.
Figure 2 demonstrates an example, which shows that OPTICS detects outliers better than DBSCAN.
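The eps/minPts neighborhood test described above can be illustrated directly. The sketch below is a simplified, hypothetical illustration of the per-point core-object/noise rule shared by DBSCAN and OPTICS; it is not a full OPTICS implementation (cluster expansion and reachability ordering are omitted):

```python
import math

def classify(points, eps, min_pts):
    """Mark each point 'core' if it has at least min_pts neighbors
    within radius eps (counting itself), otherwise 'noise'.
    This is only the per-point density test from the text."""
    labels = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        labels.append("core" if neighbors >= min_pts else "noise")
    return labels
```

For instance, with `points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]`, `eps=1.5`, and `min_pts=3`, the four clustered points are marked as core objects while the isolated point `(10, 10)` is marked as noise.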
3.2.2. Discrepancy Calculation
After applying the OPTICS algorithm for clustering, all noise points are removed, and the discrepancy between formative and summative assessments is calculated. To observe the average discrepancy between the formative and summative assessments of each student, we propose a metric that calculates their dissimilarity for each topic in a course. For each topic t in the set of topics T used for evaluation, the discrepancy metric d_t is determined by (1):

d_t = (1 / |N_F ∩ N_S|) · Σ_{i ∈ N_F ∩ N_S} |f_{t,i} − s_{t,i}|    (1)

where N_F is the set of learners participating in the formative assessments and N_S is the set of learners participating in the summative assessments. These two sets may differ; for example, a student may not take the final exam due to a personal issue such as an unexpected illness. Therefore, the intersection N_F ∩ N_S is used to calculate the discrepancy.
f_{t,i} is an element of the set of scores of the students in the formative assessment F, representing the formative score for topic t of student i.
s_{t,i} is an element of the set of scores of the students in the summative assessment S, representing the summative score for topic t of student i.
The discrepancy is described by the average absolute difference between the learners’ formative and summative scores. The difference between the formative and summative scores is the learner’s discrepancy, which can lean towards either of the two assessments. To avoid excluding either possibility, we use the absolute value of each difference when calculating the overall summation. The discrepancy d is calculated by averaging these differences across the number of learners. We chose a simple average, the arithmetic mean, for this formula because the learners are considered to have similar characteristics; however, we do not restrict the metric to mean-based measures. As the value of d approaches 0, the correspondence between the learning and testing procedures becomes stronger. Conversely, a larger d value indicates a more significant discrepancy between the two processes of learning and testing.
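The metric of Definition 2 translates directly into code. The following is a minimal sketch for a single topic, assuming each assessment's scores are stored as a dictionary keyed by learner ID (the names f_scores and s_scores are hypothetical); learners who took only one of the two assessments are excluded via the set intersection, as described above:

```python
def topic_discrepancy(f_scores, s_scores):
    """Average absolute difference between formative and summative
    scores for one topic, computed over the learners who took both
    assessments (the intersection of the two learner sets)."""
    both = f_scores.keys() & s_scores.keys()
    if not both:
        raise ValueError("no learner took both assessments")
    return sum(abs(f_scores[i] - s_scores[i]) for i in both) / len(both)
```

For example, `topic_discrepancy({1: 9, 2: 7, 3: 8}, {1: 6, 2: 7})` excludes learner 3 (no summative score) and returns (|9 − 6| + |7 − 7|) / 2 = 1.5.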