1. Introduction
Pakistan’s educational system is struggling to attain the UN Sustainable Development Goal on education, which aims, by 2030, to “Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all”. However, the latest education reports present alarming statistics in every aspect [1]. About 22 million children aged 5–16 have never been to school. The dropout rate is also very high: enrollment is reported at 16% for grade 3 and decreases to 4% by grade 10. In contrast, Islamabad, the capital of Pakistan, has 91% enrollment. These figures show that literacy is not uniformly distributed across the cities of the country. Moreover, these elementary students do not own any form of computer and do not have access to the internet either.
These students are mainly introduced to mathematics, literacy in English and Urdu, science, and social studies. The report does not contain any data about computer science studies. In 1998, some efforts were made to integrate computers into the everyday teaching curriculum in Pakistan [2]. However, even after twenty years, Computer Science is either not taught in many schools or has no course material available.
Unlike other primary subjects such as mathematics and science, computer science is given minimal attention in Pakistani elementary schools. Many schools still do not have computer labs. We surveyed public elementary schools in Islamabad and found that only 1 out of 20 schools had a computer lab. There is also an acute shortage of trained faculty to teach students the fundamental principles of computing, especially in rural areas.
These limitations amount to a lack of infrastructure and of skilled human resources for introducing computer science studies at the elementary level. However, committed to the mission of teaching problem-solving skills at a young age, we introduced Computer Science Unplugged (CS-Unplugged) in an elementary school in Pakistan to understand the feasibility and sustainability of our mission.
CS-Unplugged is a way of imparting computer science fundamentals through interactive activities that use simple educational materials like paper and cards. The activities are designed and performed in a way that helps students understand and develop the concepts of Computational thinking. Their main goal is to provide hands-on experience that makes Computer Science an interesting and fun course without introducing any programming language or computers.
Students have the greatest ability to develop such skills at a younger age [3]. The ability to learn and develop new skills is also formed in the early years. According to Piaget’s four stages of cognitive development, a child between ages 7 and 11 develops complex problem-solving skills [4]. Therefore, CS unplugged activities can greatly improve analytical thinking among elementary students.
Besides, the teacher who performs these activities with the students does not necessarily have to be a computer science major. Any Math or Science teacher can perform these activities with minimal training. CS unplugged removes the barrier of having to learn a programming language in order to understand how a computer solves problems. These activities can also help develop an interest in computers among students who may not own one or have access to one. The activities are simple yet powerful in delivering the concept.
CS-unplugged is a way to instill Computational thinking. Computational thinking (CT) was a term first coined by Papert [
5], who defined it as procedural thinking and programming. Later, Wing [
6] defined CT as a problem-solving skill set that everyone could learn. CT in academia refers to the processes that enable students to formulate problems and identify solutions that are presented in a form that could be conducted by information processing and programmable agents [
7].
CT can be applied to different kinds of problems that do not necessarily include programming [8]. The terms commonly used in CT are: (1) Sequencing: a cognitive ability to arrange objects or actions in the correct order [9]. (2) Conditionals: instructions to make decisions under given conditions [10]. (3) Iterations/loops: repeated processes in which a code segment is executed multiple times [11]. (4) Testing and debugging: the process of finding bugs and errors, and how learners correct the bugs found during testing [10]. (5) Pattern recognition: creating rules and observing patterns in data [12]. (6) Modularity: a divide-and-conquer skill to separate a problem into smaller problems through sub-programs/modules [13]. (7) Algorithm design: creating an ordered series of instructions to solve similar problems or to perform a task [13].
Researchers have utilized tools such as board games, toys, cards, puzzles, and paper [14,15]. Some researchers have conducted case studies by developing new unplugged games for CT training courses. In one study, three life-size board games were created to provide an unplugged, gamified, low-threshold introduction to CT for primary school children [16]. A few other researchers implemented CT through interaction with agents (e.g., robotics, objects in the Scratch program, and electronic toys); students can consider steps and use technical skills to manipulate the machines/agents to solve problems [13,17,18].
We are specifically interested in CT inclusion via CS-unplugged techniques because they do not require large infrastructural changes and yet are useful in developing the skills students need. Consequently, CS-unplugged activities are a strong substitute for resource-limited environments like the Pakistani public school system: these activities build skills with pen and paper, without computers, and can be led by a math or science teacher with minimal training.
We will now highlight use cases of CS-Unplugged worldwide in Section 2, Related Work. We will discuss our quest framework and project design in Section 3, Methods and Materials, and explain our machine-learning framework in Section 4, Supervised Machine Learning Algorithms. Later, Section 5, Results and Discussions, and Section 7, Conclusion, will wrap up the discussion.
2. Related Work
“Equity, rigor, and scale” are the fundamental principles for formulating the computing curriculum [
19]. That means irrespective of gender and social class, all students must be given equal learning opportunities; computer teachers must be provided proper training to ensure rigor, and the curriculum must not include expensive devices for wider dissemination of knowledge.
Though the curriculum in developed countries is already based on critical thinking, as far as Computer Science is concerned, these countries also face issues like high dropout rates. This is mainly because Computer Science is taught as a tool rather than a discipline like mathematics or physical sciences [
20,
21,
22]. Therefore, it was recommended to teach students Computer Science fundamentals gradually, starting from the elementary school level, to impart a deeper understanding of the discipline. The survey in [23] observed that students who lack problem-solving skills struggle in introductory programming classes.
Tim Bell also emphasized in their survey that many students in developed countries do not opt for Computer Science as a career mainly because the subject is taught as a tool, not a discipline. They suggested teaching Computer Science at K-12 in three levels, where during the first two levels, a basic understanding of Computer Science concepts should be conveyed through projects like CS Unplugged and Scratch. Later, in the third level, students should be familiarized with the technical details of using a particular tool or technology [
24].
Another more comprehensive work is based on incorporating fundamental concepts in the K-12 curriculum [
25]. They argue that teaching programming should not be the objective of a Computer Science curriculum; rather, teachers should be made aware of the broader picture of Computer Science. One drawback of the programming-focused approach is that technologies change rapidly, making it challenging for Computer Science teachers to keep pace with emerging programming languages. Since these foundational concepts are not specific to any technology, they are more suitable for teachers at the primary and secondary levels with no background in Computer Science. CS4FN (Computer Science for Fun), developed by [26], offers many engaging unplugged-style activities and approaches that emphasize kinaesthetic activity. A related assessment effort is the Computational Thinking test (CTt), a performance test developed by [27] for middle school students. It is an online test that evaluates students on all the CT terms through 28 questions. Another research group developed core CT content [28], commonly known as Bebras Tasks. These tasks can assist elementary students’ CT learning and skill development.
Scientists have coined the term “Big Ideas” in computer science based on the principles of big ideas in science: it represents ten computing principles that every 12th-grade student should be aware of. These ideas relate to questions such as: How does a computer store information? How is it processed? Who processes it? Can we simulate it? Can we automate it? What is the extent of automation? The ideas are: (1) Image representation in a computer, (2) Algorithms interact with data, (3) The performance of an algorithm can be evaluated, (4) Some computational problems cannot be solved by an algorithm, (5) Programs express algorithms and data in a computer-understandable form, (6) Humans design digital systems to serve humans, (7) Digital systems create virtual representations of phenomena, (8) Data protection is critical, (9) Time-dependent operations must be incorporated, (10) Digital systems communicate with each other [25].
Sentance et al. evaluated various challenges teachers face in imparting Computer Science fundamentals at the K-12 level [
29]. They mentioned that active learning via unplugged techniques is one fun way to incorporate these fundamentals. The Teaching London Computing Project is one successful example of recent work in Computer Science education [
30]. This resource provides many unplugged activities.
In another study [
21], the importance of Computer Science education was examined from an economic, social, and cultural perspective. They concluded that professional teacher training is the key factor in coping with the various challenges of implementing pedagogical approaches.
Hossein et al. applied a cost-effective game methodology, which does not require digital games, for imparting Computer Science fundamentals [
31]. Their focus was on introducing game-based learning in higher education, where they redesigned the traditional curriculum for core Computer Science courses such as “Data Structures and Algorithms” by introducing sorting and traversal algorithms through cards. Later, they compared “regular” and “game-based” teaching as per Bloom’s taxonomy of learning.
Recent work in Italy examined the effectiveness of developing critical thinking according to guidelines provided by the project K-12 Computer Science. The authors adapted the code.org approach for their study, where students design projects using drag-and-drop control structures [3].
Unfortunately, in developing countries such as Pakistan, public elementary schools are neither equipped with computers nor staffed with skilled teachers. Therefore, we decided to analyze the effectiveness of the CS-Unplugged approach [
32] in the context of developing countries. This is our pilot study in one school. However, in the future, we plan to extend our project to other schools and incorporate activities related to all 10 big ideas.
4. Supervised Machine Learning Algorithms
We employed supervised machine learning algorithms to determine the student’s performance for a given unplugged activity. The goal was to predict results and find the best prediction model [
33].
Supervised learning algorithms determine the labels for unseen data based on the patterns learned from labeled data by using a mapping function $f$. This mapping function produces an output $y$ for each input $x$ (or a probability distribution over $y$ conditioned on $x$, $P(y \mid x)$). Several forms of mapping functions exist, including decision trees, logistic regression, support vector machines, neural networks, etc. In this study, we perform a machine learning-based analysis to predict the students’ performance after introducing the CS-Unplugged concept, as discussed above. We address two questions using machine learning approaches.
- Q1.
Can our model predict whether a student will pass the CS-Unplugged activity after the training? We used the following features as input to our models: Age, gender, class, and Pre-Training Result. Our label for this question is the Post-training results, where a student can either pass or fail a test. This makes it a binary classification problem.
- Q2.
How useful is our CS-Unplugged training for introducing the computer science concept? We used the above-mentioned features with the following labeling scheme. The labels are also summarized in
Table 1:
- a.
If the student’s pre-training result is F and the post-training result is P, we consider it a positive response.
- b.
If the student’s pre-training result is P and the post-training result is P, we consider it a negative response.
- c.
If the student’s pre-training result is F and the post-training result is F, we consider it a negative response.
- d.
If the student’s pre-training result is P and the post-training result is F, we consider it a negative response.
Our setup is as follows. We have 360 students; among them, 157 are positive responses, and the rest are negative. We treat the latter as negative responses because these students did not benefit from the training: either they were already producing correct answers, or they failed to get the correct answer even after training.
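The four labeling cases listed above reduce to a single rule: only a fail-to-pass transition counts as positive. A minimal sketch of that rule follows; the function name `q2_label` and the sample records are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def q2_label(pre: str, post: str) -> str:
    """Return 'positive' only when a student moves from F (pre) to P (post)."""
    if pre not in {"P", "F"} or post not in {"P", "F"}:
        raise ValueError("results must be 'P' or 'F'")
    return "positive" if (pre, post) == ("F", "P") else "negative"


# Hypothetical (pre-training, post-training) records, for illustration only.
records = [("F", "P"), ("P", "P"), ("F", "F"), ("P", "F"), ("F", "P")]
labels = [q2_label(pre, post) for pre, post in records]
counts = Counter(labels)
print(counts)
```

Encoding the scheme this way makes it explicit that students who passed both tests are grouped with those who failed after training, which is a deliberate choice of the labeling rather than a property of the data.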
This is a binary classification problem: our label y is either positive (pass) or negative (fail). In our experiments, we applied Logistic Regression, Decision Tree, Support Vector Machine, K-Nearest Neighbors, Random Forest, and Bagging to analyze which algorithm would perform best for the given problem. The selection deliberately covers parametric, non-parametric, and ensemble algorithms. For model selection, we used stratified cross-validation with 10 splits. A summary of the methodology pipeline is provided in
Figure 4.
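To illustrate what stratified 10-fold splitting does, the sketch below distributes sample indices over folds so that each fold keeps roughly the overall class ratio. It uses the class counts reported above (157 positive and the remaining 203 of 360 negative); the function name `stratified_folds`, the round-robin assignment, and the seed are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=10, seed=0):
    """Assign sample indices to n_splits folds, class by class (round-robin)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_splits].append(idx)
        offset += len(indices)
    return folds


# 1 = positive response, 0 = negative response (ordering is synthetic).
labels = [1] * 157 + [0] * 203
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print([len(fold) for fold in folds], positives_per_fold)
```

Each fold then serves once as the test set while the other nine are used for training, so every student is evaluated exactly once.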
We will now briefly introduce the algorithms applied and the evaluation metrics used. Later, we will discuss the results, leading to the experiment’s conclusion.
4.1. Machine Learning Models
4.1.1. Logistic Regression
Logistic Regression is one of the simplest machine-learning techniques for classification [
34]. This model learns its parameters by estimating the class-posterior probabilities with the sigmoid function. The resulting probability is then thresholded, which yields a linear discriminant function that solves the classification problem.
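For reference, the standard textbook form of the logistic regression model can be written as follows. The symbols $w$ (weight vector), $b$ (bias), and the 0.5 decision threshold are the usual conventions and are assumed here rather than taken from the study.

```latex
P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } P(y = 1 \mid x) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
```

Because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, the decision rule is equivalent to checking the sign of the linear function $w^\top x + b$.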
4.1.2. Decision Tree
A decision tree uses a tree-like graph or model of choices and their attainable consequences [
35]. A decision tree is a flowchart-like structure. It begins with a root node that separates the classes based on the highest information gain, and every internal node represents a “test” on an attribute (such as whether the student has passed the test or not).
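A small sketch of how information gain can be computed for a candidate split is given below. The toy feature values and labels are hypothetical and chosen only to show a fully informative and a fully uninformative attribute; they are not the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four students.
post_result = ["F", "F", "P", "P"]
pre_result = ["F", "F", "P", "P"]   # separates the labels perfectly
gender = ["M", "F", "M", "F"]       # carries no information about the labels

gain_pre = information_gain(pre_result, post_result)
gain_gender = information_gain(gender, post_result)
print(gain_pre, gain_gender)
```

A tree builder would place the attribute with the larger gain at the root and repeat the computation on each resulting subset.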
4.1.3. Support Vector Machine
Support Vector Machines (SVM) are supervised linear classification models that separate classes with hyperplanes. A hyperplane acts as the decision boundary, and the margin is the distance between the hyperplane and the closest samples of each class. Classification is based on the hyperplane with the widest feasible margin. When training the model, the vector w and the bias b are estimated by solving a quadratic optimization problem [
36].
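As a point of reference, the commonly used soft-margin formulation of this optimization problem is shown below. The slack variables $\xi_i$, the label encoding $y_i \in \{-1, +1\}$, and the penalty parameter $C$ follow the standard textbook convention and are assumed here; the formulation in the cited reference may differ in detail.

```latex
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

Minimizing $\lVert w \rVert^2$ widens the margin, whose width is $2 / \lVert w \rVert$, while $C$ controls how strongly margin violations are penalized.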
4.1.4. K-Nearest Neighbor
K-nearest neighbor (KNN) is known as one of the simplest machine learning algorithms. The classification is done based on the distances between samples. The training data set contains observations in the form of pairs $(X, Y)$, where $X$ is the vector containing the feature values and $Y$ is the class label [37].
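The distance-based voting can be sketched in a few lines. The feature encoding (age, pre-training result as 0 or 1), the choice of Euclidean distance, `k=3`, and the toy samples are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_Y, query, k=3):
    """Predict the majority label among the k training samples closest to query."""
    neighbors = sorted(
        zip(train_X, train_Y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical samples: (age, pre-training result with 0 = F and 1 = P).
train_X = [(8, 0), (9, 0), (10, 1), (11, 1), (12, 1)]
train_Y = ["F", "F", "P", "P", "P"]

older_student = knn_predict(train_X, train_Y, (11, 1))
younger_student = knn_predict(train_X, train_Y, (8, 0))
print(older_student, younger_student)
```

In practice the features would usually be scaled first, since a feature with a larger numeric range (such as age) otherwise dominates the distance.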
4.1.5. Random Forest
Random Forest is essentially a collection of unpruned classification trees. This algorithm performs well on many practical problems, mainly because it is relatively robust to noise in the data set and less prone to overfitting than a single tree. A random forest is made up of independent decision trees, where each tree splits on the features differently. When a sample arrives, it is passed through all the trees, and each tree predicts a class label for the sample. In the end, the majority class is assigned to that sample. This algorithm works fast and generally outperforms many other tree-based algorithms [
38].
4.1.6. Bagging (Bootstrap Aggregating)
Bagging (bootstrap aggregating) trains several learners on bootstrap samples of the training data and combines their decisions into a single prediction. In the case of classification, the decisions are combined by voting, and all models in bagging receive the same weight. The experts are individual decision trees or decision stumps, which are united by making them vote on every test sample. Finally, the majority rule is applied for the classification [
39].
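A minimal sketch of bagging with equal-weight majority voting is shown below. The base learner is a deliberately simple one-feature stump, and the names `train_stump` and `bagging_predict`, the number of learners, and the toy data are hypothetical choices for illustration.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction (every vote has the same weight)."""
    return Counter(predictions).most_common(1)[0][0]


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """One-feature stump: map each feature value to its majority label."""
    overall = majority_vote([label for _, label in sample])
    by_value = {}
    for value, label in sample:
        by_value.setdefault(value, []).append(label)
    table = {value: majority_vote(labels) for value, labels in by_value.items()}
    # Fall back to the overall majority for feature values not seen in the sample.
    return lambda value: table.get(value, overall)


def bagging_predict(data, query, n_learners=25, seed=0):
    """Train stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    learners = [train_stump(bootstrap_sample(data, rng)) for _ in range(n_learners)]
    return majority_vote([learner(query) for learner in learners])


# Hypothetical data: (pre-training result, Q2-style label).
data = [("F", "positive")] * 8 + [("P", "negative")] * 8
prediction_f = bagging_predict(data, "F")
prediction_p = bagging_predict(data, "P")
print(prediction_f, prediction_p)
```

Because each learner sees a different resample of the data, their individual errors tend to differ, and the vote averages them out.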
4.2. Evaluation Metrics
The evaluation metrics used throughout the experiment were accuracy, precision, recall, F1-score, and the receiver operating characteristic (ROC) curve.
Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions:
$$\text{Accuracy} = \frac{\text{CorrectPredictions}}{\text{TotalPredictions}}$$
Precision is given as
$$\text{Precision} = \frac{TP}{TP + FP}$$
where TP stands for true positives, i.e., the number of positive examples correctly predicted by the classifier, and FP stands for false positives, i.e., the number of examples incorrectly predicted as positive.
Recall is given as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where FN stands for false negatives. Recall is also called sensitivity or the true positive rate: the ratio of correct positive predictions to the total number of positive examples.
The F measure combines precision and recall; a good F measure suggests low false positives and low false negatives. A perfect F1 score is 1, while a score of 0 indicates a total failure of the model. The formula for the F1 score is
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
The receiver operating characteristic (ROC) curve is a commonly used performance measure for classification problems at various thresholds. ROC is a probability curve, while the area under the curve (AUC) represents the degree of separability, i.e., how well the model can distinguish between the classes. A higher AUC score reflects better model performance. The ROC curve plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis. For all of these metrics, higher values indicate a better model; the performance measures increase as more samples are correctly classified.
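The metrics described in this section can be computed directly from predictions, as the following sketch shows. The label encoding (1 = positive, 0 = negative), the function names, and the toy predictions and scores are assumptions for illustration; AUC is computed here through its equivalent pairwise-ranking interpretation rather than by integrating the curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth, hard predictions, and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that accuracy, precision, recall, and F1 depend on a fixed decision threshold, whereas AUC summarizes the ranking quality of the scores across all thresholds.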
5. Results and Discussions
In this section, we will discuss the results of pre-training and post-training activities and also discuss our Machine learning models.
5.1. Pre-Training and Post-Training Results
Let us discuss a few sample results. These samples are reproduced for clarity purposes.
Figure 2b is one of the possible correct solutions. In
Figure 2c, we display the solution where the student could not recognize the question well and moved toward the closest object instead of finding the correct object.
Figure 2d does not follow directions and uses illegal moves.
Figure 2e is not the shortest path, and
Figure 2f uses an illegal move only one time.
Figure 2f was one of the common mistakes students made. The detailed results are shared in
Figure 2; according to the results, only 37% of the students passed the pre-training activity. For the post-training activity, the students were given new sheets of paper with a similar activity; this time, however, instructions regarding the basic algorithm for reaching the destination were given and demonstrated. The results were then compiled to see whether the CS-unplugged activity was effective. In the post-training activity, 75% of the students passed, reflecting the success of the CS-unplugged training. The complete results are shared in
Figure 5.
5.2. Gender Based Results
The bar plot in
Figure 6 depicts the pass and fail percentages among girls and boys. We observed that 32.3% of the boys and 43.6% of the girls passed the pre-training activity. In the post-training activity, 78.6% of the boys and 73.1% of the girls passed. The data show that the training was more useful for the boys than for the girls. Nevertheless, both genders found the training useful, with post-training pass rates above 70%. Notably, the post-training pass rate was lower for girls. Similar results can be found in [8], where girls scored 1 point lower than boys on the CTt. This provides an opportunity to improve the training for better inclusion.
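The size of the improvement for each gender follows directly from the reported pass rates. The short calculation below uses only the percentages given above; the variable names are illustrative.

```python
# Pass rates (in percent) as reported for the pre- and post-training activities.
pass_rates = {
    "boys": {"pre": 32.3, "post": 78.6},
    "girls": {"pre": 43.6, "post": 73.1},
}

# Improvement in percentage points for each group.
gains = {
    group: round(rates["post"] - rates["pre"], 1)
    for group, rates in pass_rates.items()
}
print(gains)
```

Expressed in percentage points, the gain is 46.3 for boys and 29.5 for girls, which is consistent with the observation that boys benefited more even though girls started from a higher pre-training pass rate.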
5.3. Age Based Results
The experiment was conducted on students aged 7–13, as shown in
Figure 7. We found that this activity was most easily understood by the 10–11 age group. We had one 14-year-old student who could not pass the activity even after the training; this might be due to an unrelated issue, and we excluded this student from our data set for further analysis. Overall, the training was effective for all ages.
5.4. Class Based Results
We had students from the 3rd, 4th, and 5th grades. We found the training more effective for the 5th grade, whose students were old enough to understand the concept. This also suggests that the suitable age group for this activity in Pakistan is 11–13. These results are shown in
Figure 8.
This analytical study helped us decide which activity is more easily understandable by age, class, and gender so that we can focus and modify our content accordingly. We found that students could understand the concept and were more confident after training. We found a linear relation with each variable (age, gender, and academic level), and our post-training results encouraged us to continue with CS-Unplugged activities.
5.5. Supervised Machine Learning
Now, we will discuss the results related to our machine-learning model.
Table 2 lists the tuned hyper-parameters for the machine learning algorithms. For logistic regression, we used the sigmoid function with an L2 penalty for regularization. For the decision tree, we used ID3 and limited the depth of the tree to 4. We also performed a grid search to optimize the C and gamma parameters of the SVM: for Q1, they were set to C = 0.001 and gamma = 100; for Q2, to C = 10 and gamma = 0.1.
Our first problem was to predict how the student would perform in the post-training activity, given the age, gender, class, and results of pre-training.
Table 3 presents the results for problem 1 for the selected algorithms:
Most of the selected algorithms achieve above 70% accuracy. The highest F-measure is achieved by the SVM, while logistic regression performs best at predicting which student will pass the test once the training is provided for the CS unplugged activity, as shown in
Figure 9. Regarding the ROC score, as shown in
Figure 10, we can see that logistic regression has the highest score. These results show the prediction of student performance based on past results and age, gender, and academic level.
Our major interest in applying machine learning was addressed in Q2. We examined how effective our training is by labeling an outcome as positive only when the student failed the pre-training activity and successfully completed the activity after training. The results of this experiment are summarized in
Table 4.
Support Vector Machine (SVM) provides the highest F-measure score. Looking at the accuracy and ROC score, we can conclude that logistic regression is the best classifier among the chosen algorithms, as shown in
Figure 11. The results showed that we were able to achieve more than 80% accuracy in 5 out of 6 selected algorithms. The highest ROC score is by Logistic regression, as shown in
Figure 12. Because our data set is small, we picked algorithms that work efficiently on smaller data sets. We also have a small feature vector. We expected the decision tree to readily provide general rules; however, we could not obtain a pure tree and had to perform pruning. KNN was also picked because it was expected to perform well given the small number of dimensions. However, KNN did not perform as expected, which suggests that the classes in our data overlap, possibly in a hyper-sphere structure. We can also observe that the ensemble methods performed better, which was expected given the nature of these algorithms.
We can observe that logistic regression performs slightly better than the other algorithms. This suggests that the CS-Unplugged activity is simple to use and will have good learning results, and that a simple model can predict these outcomes with high performance. This study is unique: to the best of our knowledge, no such work has been done on providing CS-Unplugged training to students within limited resources in Pakistan. In these circumstances, we validated our results by random sampling. We created another data set by sampling within the range of each feature, annotated this random data set using a Gaussian distribution, and then fitted a logistic regression model using the same parameters.
The idea was to reject the following null hypothesis: Null Hypothesis: When our target function is applied, the random data set generates a similar outcome to the original data set. Alternate Hypothesis: When our target function is applied, the random data set will have a different outcome from the original data set.
We performed 10-fold cross-validation on the random data set. To test whether our two data sets are significantly different, we performed Student’s t-test. Our p-value for accuracy results of both data sets for the first problem (given the attributes, can we predict who will pass after training) is , and for the second problem (given the training, whether there will be a positive outcome) is . This helps us reject the null hypothesis and validate the experiment.
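The comparison of per-fold accuracies can be sketched with a two-sample Student's t statistic. The sketch assumes an unpaired test with pooled variance, which may differ from the exact variant used in the study, and the fold accuracies below are hypothetical placeholders rather than reported results.

```python
from math import sqrt
from statistics import mean, variance


def student_t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance, plus its df."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


# Hypothetical 10-fold accuracies for the original and the random data set.
original_acc = [0.81, 0.78, 0.83, 0.80, 0.79, 0.82, 0.84, 0.77, 0.80, 0.81]
random_acc = [0.52, 0.49, 0.55, 0.50, 0.47, 0.53, 0.51, 0.48, 0.54, 0.50]

t_value, dof = student_t_statistic(original_acc, random_acc)
print(t_value, dof)
```

The p-value is then obtained from the t distribution with the returned degrees of freedom, for example with a statistics package such as SciPy's `scipy.stats.ttest_ind`, which performs both steps at once.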