The Comparison of Students’ Self-Assessment, Gender, and Programming-Oriented Spreadsheet Skills

Previous research proved that teaching spreadsheeting from a programming perspective is much more effective than the widely accepted tool-centered surface approach methods. Spreadsheeting as an introductory programming approach allows students to build up schemata leading to contextualized, concept-based problem-solving. Furthermore, it provides tools for real-world problem-solving in other disciplines, and supports knowledge-transfer to database management and “serious” programming. The present study provides the details of a nationwide testing of Grades 7–10 students on how they evaluate their spreadsheet knowledge, which classroom activities form their self-assessment values, and the results of three spreadsheet tasks evaluated by the SOLO categories of understanding. The comparison reveals that most students’ spreadsheet knowledge is pre-structural. On the other hand, they assess themselves much higher, which is primarily based on the number of activities carried out in classes. Traces of conscious problem-solving and knowledge-transfer within the scope of spreadsheeting are hardly detectable, while knowledge brought from mathematics is recognizable. In general, we found proof that the pieces of knowledge remain unconnected, not allowing students to reach the relational level of understanding and build up long-lasting knowledge.


Should We Teach Students to Program?
Hungary is one of the countries in the world where informatics as a school subject was introduced as early as the mid-'90s [1][2][3] in the first National Base Curriculum [4]. Various names and numbers of classes (from 2009) have been assigned to the subject in the meantime (Table 1) [4][5][6][7], with only minor changes in the content, detailed in the frame curricula. A thorough analysis of the 2013 issues [8,9] revealed, on the one hand, that the same material should be taught regardless of the name of the school subject, while on the other hand, that the same amount of material should be taught regardless of the number of classes [10][11][12][13][14] assigned to the subject. The research also revealed that the content of the frame curricula [8,9] is (1) oversized, even in the case of the largest number of classes, (2) loaded with a high number of ambiguous terms, (3) tool-centered, and (4) strictly divided into distinct subthemes. Furthermore, the research found that problem-solving is restricted to programming.
One consequence of the low number of lessons along with the overestimated, overplanned content is that informatics education has become a rather tool-oriented, boring, good-for-nothing subject [15][16][17][18] which does not support the development of computational thinking as a fundamental skill [19]. The widely accepted and popular low-mathability [20,21] teaching approaches commit all of the errors which were outlined as early as 1993 in Soloway's paper [22]. The author and his fellow researchers claimed that schools are inefficient in terms of teaching programming and developing students' algorithmic skills, which is one of the ultimate goals of teaching informatics. As a result of this, they determined that:
• Schools have made programming isolated and disconnected.
• Schools should adopt a "whole programming" approach.
Furthermore, they presented their ideas on how programming can be integrated into education. It was stated that programming is ubiquitous, and thus should be:
• expanded to end-user computing;
• socially sanctioned for intellectual advances for everyone; and
• embedded in a rich cognitive context.
They went further by claiming that creating a computational medium requires:
• making programming easier to learn and do; and
• expressiveness and usefulness.
Furthermore, the vision of teaching programming (and, indeed, informatics) to everyone, and giving meaning to this subject, is in complete accordance with the concept of Technological Pedagogical Content Knowledge (TPCK) [43,44], and also with the Meaning System Model (MSM) [26], which have been proven to be effective over time. The TPCK claims that all three aspects of the model should be present in the teaching-learning process, regardless of the subject. The MSM goes one step further and declares that beyond the built-up knowledge pieces, the belief in the incremental nature of sciences (e.g., dynamic contextual knowledge) also plays a crucial role in the process. These findings are in complete accordance with the research results of Hattie [40,45], who makes a distinction between experienced and expert teachers. The experts of Hattie are teachers with high teaching self-efficacy and a strong belief in the incremental nature of sciences. In light of these recently published results, teaching approaches must be revolutionized to carry out Soloway's vision.

Problems We Are Faced with
In spite of the relatively early introduction of the school subject, Hungary is faced with the low efficiency of teaching-learning informatics. The country, in both its education policy and its teachers of informatics, does not realize that teaching informatics is a lot more than using digital tools meaninglessly. The first warning came in the form of the PISA 2009: Students On Line assessment (Figure 1) [46]. The competence test revealed that in Hungary, there is an extremely negative correlation between the use of computers in school and computer problem-solving achievements. Another warning sign is the trend observable in the number of participating schools and students in the Nemes Tihamér programming [47] and application [48] contests [1]. The programming contest started as early as 1985, and the numbers of schools and students participating in 2018 were about the same as in 1990. In the application contest, which started in 2004, apart from the first two years, the lowest number of schools and students participated in 2018. In a similar way, low-level academic knowledge was found by testing first-year students of informatics at the start of their tertiary studies [40,49].
These independent measurements of students' academic abilities reveal that the solution does not lie exclusively in the introduction and the development of tools, including both hardware and software, but rather in the applied teaching approaches and methods [26,40,50,51]. This finding is in complete accordance with Soloway's suggestion from 1993 [22]. It has also been proven that Prensky's idea of the digital natives [52] is only a myth [53,54], and that its rapid spread does more harm than good. We have also found proof that the intensive use of mobile devices has a rather negative effect on the development of students' algorithmic skills [51].
One further reason for the low efficiency of teaching informatics in Hungary is the popularity of folk-pedagogy [50]. In the mid-1980s, universities, totally unprepared, especially in terms of didactics, started education programs for teachers of informatics. In these circumstances, ineffective teaching methods were presented, practiced, and set as examples, whether consciously or not. Preservice and post-graduate teachers left universities with a diploma and with the burden of having been taught how to behave and act like a folk-teacher. Furthermore, after leaving the shelter of their universities, these teachers were left alone. The most ambitious of them started developing their own approaches and methods, while the others were satisfied with not knowing and not wanting to know that students do nothing but play during classes [55].
The comparison of folk-medicine and folk-pedagogy in the influential paper of Lister reveals the negative and questionable effects of this approach [50], which include that folk-pedagogy:
• is a category of informal knowledge distinct from "scientific medicine [pedagogy]";
• is usually unwritten and transmitted orally;
• is not necessarily integrated into a coherent system, and may be contradictory;
• is sometimes associated with quackery;
• [but] . . . it may also preserve important knowledge and cultural tradition from the past.
The most distinguishing phrases of folk-teachers, according to Lister, are:
• "I've been teaching for N years, and . . . "
• "When I was a student . . . "
In general, teachers apply methods that they find satisfying for all of the participants of the teaching-learning process (students, teachers, schools, parents, education policy, ECDL, etc.) without relying on any proof of the effectiveness of these methods.

Mini Competence Tests
Being aware of the problems of informatics education in Hungary and the test results of first-year students of informatics [42,56–59], we designed mini competence tests to measure primary and high school students' academic abilities in informatics. The specialty of our mini competence tests is that they present tasks and questions with short descriptions accompanied by figures. The idea behind these short and visualized tasks is to lessen the cognitive load of reading, because highly demanding texts would distort the outcome of the tests. The latest PISA tests (2015, 2018) [60,61] reveal that in all three subjects (reading, mathematics, and sciences) the results of Hungarian students are below the OECD average, and the share of low achievers in all three subjects is significantly higher than the average. Furthermore, the PISA 2009: Students On Line assessment presents similar results, with an additional negative correlation between the results and the use of a computer in school. Considering these published results, we wanted to lessen (1) the cognitive load of reading, and (2) the negative effect of the tool-centered teaching approaches in the informatics subject [62]. To achieve our goals, the complexity of the tasks is cut back, the texts of the tasks are shortened compared to the PISA tasks, and all of the tasks are supported with informative pictures.
The purpose of the test is to investigate the effectiveness of informatics education, to identify the specific pieces of knowledge, and to measure the activation of knowledge transfer elements.
Our mini competence tests [12] are usually divided into three main parts:
• general questions;
• spreadsheet functions; and
• mini problems to solve.

General Section
The first section of the competence test poses questions on general information, including questions on how students evaluate their knowledge in the following topics: To fill in the form, a 0-5 Likert scale is offered. In connection with their self-assessment, students are also asked whether they study/studied these subjects in school or not. In the present paper, the focus is on the students' spreadsheet knowledge. Consequently, their spreadsheet self-assessment values will be taken into consideration.
Self-assessment and achievement in the tasks can be significantly influenced by the activities carried out in an informatics class. Consequently, the students are asked to select items from a predefined list of activities that characterize their informatics lessons. The following list presents those activities, which have a direct connection to teaching-learning spreadsheeting: • solving ECDL tests (ECDL); • solving maturation exam tests (maturation); • formulating algorithms (algorithm1); • writing and drawing algorithms (algorithm2); • creating multilevel functions (multilevel functions); • working alone, based on a list of tasks (alone); • programming; • typing spreadsheet tables (typing); • checking the correctness of outputs (output); and • playing.
The general section also poses questions on the students' computer and mobile phone use, both in schools and in general. The time spent using these devices was asked by selecting from a list with seven options. Four questions were set up considering the time spent on computers (desktop computer and laptop) and mobile devices (tablet, phone, and e-book reader) in classes other than informatics during a week (<1, 1, 2, 3, 4, 5, >5 h) and for daily personal use (<2, 2, 4, 6, 8, 10, >10 h).
Furthermore, questions were presented regarding the number of informatics lessons per week in Grades 1-12, the programming language(s) learned in school, the course and exercise books they use, and the frequency of using them.

Spreadsheet Functions
In the second part of the test, students were asked to list fifteen spreadsheet functions which they consider to be the most important ones. The names of the functions had to be provided without ranking them. One aim of this task is to see whether students can remember all of the functions presented in class, in course and exercise books, and/or in spreadsheet applications [62]. In the official coursebook at the time of the test [63], there are 41 spreadsheet functions listed, with hardly any problems to solve with them (Figures 2 and 3). Similar to these examples, primarily decontextualized, meaningless tasks are presented in the coursebook [63], leaving the principles of TPCK [43,44] completely unattended. Furthermore, one must call attention to the fact that students should be familiar with these operators and functions from their previous spreadsheet studies in primary school [8,9,62].
The other aims of the task listing the fifteen most important functions are to see whether schools:
• make students learn functions; and
• teach these functions at all or just let students navigate on the user interface.
From the results of this task, we can conclude how conscious the tested students are, and how fast thinking [64] can be applied in routine problem-solving activities.

Spreadsheet Problems and Their SOLO Categories
In part three, students solved real spreadsheet problems on paper without using a computer. Each section of the test contained multiple-choice and/or short-answer tasks. Most of these tasks could be completed by activating either knowledge transfer elements from other sciences or basic knowledge in spreadsheeting. As mentioned above, according to the principles of the mini competence test, the instructions were short and accompanied by figures to make them concise and easy to understand.
The items of the problems are recognizable pieces of the answers, which are detailed in the Chapters "Spreadsheet problems and their SOLO categories" and "Spreadsheet problems presented in the test". Furthermore, the items were converted into the SOLO (Structure of the Observed Learning Outcome) categories of understanding, which are widely accepted in teaching programming [49,50,65,66]. The usability of the SOLO categories on functional programming language problems has already been proven in previous studies [49,56–59]. Consequently, it was reasonable to apply the same method in these tasks. The modified SOLO categories of understanding have the following four levels [42]; the highest SOLO category, extended abstract, is not considered, due to the structure of the presented tasks.

• The task is ignored or the answer has no relation either to the task or the correct answer (pre-structural, P).
• One of the items is recognizable (uni-structural, U).
• Several items are correct, but the connection(s) between these items are not formulated (multi-structural, M).
• Both the items and the connection between them are clearly formulated (relational, R).

Formula Completion
In this task, students were asked to complete a given string into two syntactically correct spreadsheet formulae (Figure 4). Two samples were presented to provide places for two solutions: providing the range, i.e., A2:A20 (from A2 to A20), or two separate arguments, i.e., A2, A20 (A2 and A20). Furthermore, the last part of the task was to explain (name), in natural language, the domain to which the completed formulae refer. The English version of the task is shown in Figure 4, and the solutions are shown in Figure 5. In both solutions, the opening equal sign, the closing parenthesis, the reference operator, and the explanation of the domain are considered as separate items, adding up to a total of eight items. Table 2 presents the SOLO categories of understanding assigned to the items of the formula completion task.
Table 2. SOLO categories assigned to the items of the formula completion task.
Items | SOLO Category
… | uni-structural
3, if there is one reference operator, one closing parenthesis, and one explanation (only the equal sign is missing from one of the formulae) | uni-structural
4, if only two equal signs and two closing parentheses are given | pre-structural
3 | pre-structural
2 | pre-structural
1 | pre-structural
0 | pre-structural
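The item counting described for the formula completion task can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the SUM() function name is taken from the schema quoted later in the paper, the syntactic items are detected by simple string checks, and the explanation item is passed in as a rater's yes/no judgment because it cannot be checked mechanically.

```python
def count_items(formula: str, explanation_ok: bool) -> int:
    """Count the four scored items in one completed formula (hypothetical helper)."""
    text = formula.strip()
    items = 0
    if text.startswith("="):            # opening equal sign
        items += 1
    if text.endswith(")"):              # closing parenthesis
        items += 1
    if ":" in text or "," in text:      # reference operator (range or separate arguments)
        items += 1
    if explanation_ok:                  # explanation of the domain, judged by a rater
        items += 1
    return items


def formula_completion_score(answers) -> int:
    """Sum the items over both solutions; the maximum is eight."""
    return sum(count_items(formula, ok) for formula, ok in answers)


# Assumed sample answers, not taken from the test sheets.
full = formula_completion_score([("=SUM(A2:A20)", True), ("=SUM(A2,A20)", True)])
partial = formula_completion_score([("SUM(A2:A20)", False), ("SUM(A2", False)])
print(full, partial)
```

The sketch returns item counts only; assigning a SOLO category to a count follows Table 2.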

Order of Execution
To test how knowledge transfer elements brought from mathematics can be activated in informatics, students were asked to provide the order of the execution of the given formula. In addition to the formula, a table of five rows was added, which the students had to fill in with the steps (Figure 6). The purpose of the extra row was to see whether students recognize that there are only four steps, or if they feel obligated to fill in all of the rows. The solution to the execution order problem (Figure 6) is shown in Table 3. Using the SOLO categories, the scores are distributed as follows:
• Calculating the average is the first step, one item.
• Calculating the average, one item.
• Calculating the average is followed by the subtraction, one item.
• The subtraction is followed by the yes/no question, one item.
• The question is followed by calling the IF() function, one item.
• Calling the IF() function is the last step, one item.
Table 3. The solution of the sequence/order task.
Step 1 | calculating the average or calling the AVERAGE() function
Step 2 | subtracting 50 from the average or subtracting
Step 3 | asking a yes/no question: the difference is less than 50? or asking a question
Step 4 | calling the IF() function
Thus, a total of six items were scored. Table 4 presents the SOLO categories of understanding assigned to the items of the execution order task.
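The six items of the execution order task can be counted mechanically once each row of a student's answer has been coded as one of the four steps. The following is a minimal sketch under assumptions: the step labels and the function name are hypothetical, empty rows are ignored, and "followed by" is read as "directly followed by".

```python
EXPECTED = ["average", "subtract", "question", "if"]


def execution_order_items(steps) -> int:
    """Count the six scored items in a student's ordered list of coded steps."""
    steps = [s for s in steps if s]              # ignore rows left empty
    items = 0
    if steps and steps[0] == "average":          # averaging is the first step
        items += 1
    if "average" in steps:                       # averaging is mentioned at all
        items += 1
    pairs = list(zip(steps, steps[1:]))
    for before, after in zip(EXPECTED, EXPECTED[1:]):
        if (before, after) in pairs:             # a step directly followed by the next one
            items += 1
    if steps and steps[-1] == "if":              # calling IF() is the last step
        items += 1
    return items


# A fully correct answer with the fifth row left empty, and a reversed answer.
print(execution_order_items(["average", "subtract", "question", "if", ""]))
print(execution_order_items(["if", "question", "subtract", "average"]))
```

Under this reading, a student who names only the averaging step scores two items, and a fully reversed order scores one.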

Array Formula Task
In the third task of the test, a sample table and a multi-level formula were presented ( Figure 7). The table is created from the Social Blade webtable [67] with the WDC (webtable to datatable conversion) method [68,69]. The specialty of the table is that, despite the fact that the language of the webtable is set to Hungarian, based on the selected country, the separator characters are not changed according to the syntactical rules of the language. As a result, the English thousand separator characters are transformed into a decimal comma (cells B6, B7, C3:C9, C248:C251) or a comma character (cells C2, D2:D9, D248:D251), depending on the number of commas. The consequences of this data conversion are analyzed in the paper of Nagy [70]. The aim of this task is to see how students explain what the given formula does, or what output it produces. In this task, the students' previous knowledge in mathematics (concept of functions), spreadsheeting (especially recognizing functions, their arguments, and their output), and the ability to connect this task to the previous ones are tested.
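The separator problem described above can be illustrated with a simplified model of how a spreadsheet using Hungarian conventions might read the imported strings. This is a sketch only: the function name and the sample values are hypothetical, and real spreadsheet applications apply more elaborate parsing rules.

```python
import re


def interpret_hu(cell: str):
    """Simplified reading of a cell under decimal-comma conventions (assumed model)."""
    if re.fullmatch(r"\d+,\d+", cell):       # exactly one comma: read as a decimal comma
        return float(cell.replace(",", "."))
    if re.fullmatch(r"\d+", cell):           # plain integer
        return float(cell)
    return cell                              # several commas: the value stays text


# English thousand separators in hypothetical cell contents.
print(interpret_hu("1,234"))        # intended 1234, read as 1.234
print(interpret_hu("12,345,678"))   # cannot be read as a number, kept as text
```

In this model, a value with one thousand separator silently becomes a number a thousand times smaller, while a value with two separators remains a string containing comma characters.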
The solution to the problem is the natural-language sentence "The number of usernames starting with the character L." or any other expression with the same meaning. In the answer, both the lowercase and the capital letter L are accepted.
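The output described by the expected answer can be expressed in a few lines of Python. The formula itself is shown only in Figure 7 and is not reproduced here; the usernames below are invented, and the case-insensitive comparison is an assumption of this sketch.

```python
def count_starting_with(usernames, letter="L"):
    """Count the names whose first character matches the letter, ignoring case."""
    return sum(1 for name in usernames if name[:1].lower() == letter.lower())


# Hypothetical usernames, not taken from the Social Blade table.
sample = ["LegoFan", "luckyCat", "Alpha", "Lili", ""]
print(count_starting_with(sample))
```

The sketch makes the structure of the relational answer explicit: a condition is evaluated for every username, and the matching cases are counted.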
The spreadsheet SOLO categories were assigned to the task, similar to the previous tasks (Table 5).
Table 5. The answers to the array formula task with the points and the SOLO categories assigned to the answers.
Statement | Score | SOLO Category
the number of usernames starting with the character L (or l) | 4 | relational
counts | 3 | multi-structural

The Sample
The test was carried out at the end of the pre-COVID academic year of 2017/2018, when the use of digital tools was not affected by online teaching and testing. A total of 8880 pupils from 93 schools across Hungary participated in the measurement. Among them, 8517 are from Grades 7-10, forming the sample of this study. According to the self-assessment values, 6886 students claimed that they studied (Group Studied) spreadsheet management in school, 919 claimed that they did not (Group DidNotStudy), and 712 left the question unanswered (Group NoAnswer). Table 6 presents the number of students in Grades 7-10 participating in the test.
Table 6. The number of students who studied spreadsheet management in school (Studied), did not study spreadsheet management in school (DidNotStudy), or did not answer the question (NoAnswer).

Hypotheses
The hypotheses of the present paper are set up according to the structure of the test, the current national base curriculum and frame curricula, and the coursebooks based on these official documents. The aims of our research are to reveal connections and/or missing links between the aims and the requirements of the official documents and the performance of students. Furthermore, previously published research results are taken into consideration, which shed light on the misconceptions and the negative effects of stereotypes considering teaching-learning informatics in general, especially in spreadsheeting.
Previously conducted research has proven that teaching spreadsheeting from a programming point of view and using functional languages as introductory programming languages is significantly more effective than either the interface-centered or the imperative or object-oriented "classical" methods [27,28,71–77]. Being aware of these results and that school grade levels are often not good indicators of where students are in their cognitive development, the students' self-assessment values are compared to their results in the test.

Hypothesis 1 (H1).
There is no significant difference between the self-assessment values and the students' results in the test.

Hypothesis 2 (H2).
There is a strong correlation between classroom activities and the students' self-assessment.

Hypothesis 3 (H3).
There is a relationship between the learned functions, the students' self-assessment values, and the results in the test.

Hypothesis 4 (H4).
Students can apply knowledge from mathematics in solving programming-oriented spreadsheet problems.

Hypothesis 5 (H5).
There is no difference between the boys' and the girls' self-assessment and results in the tasks.

Students' Results in the Spreadsheet Tasks
From Grades 7-10, 8517 students completed the test (Table 6). Table 7 presents the average results in the three spreadsheet tasks. The averages of the scores are 1.32 (17%), 0.86 (14%), and 0.18 (5%) in the formula completion, the execution order, and the array formula tasks, respectively. The number of students who scored zero items is 5226 (61%), 6464 (76%), and 7918 (93%) in the three tasks. The number of students who scored the maximum number of items is 118 (1%), 629 (7%), and 118 (1%). Both the mode and the median are zero in all three tasks. According to the PISA evaluation system, 78% of students are able to perform tasks at Level 2, and 99.3% of students are able to perform tasks at Level 1b [61]. In the spreadsheet tasks of the test, the students' performance is much lower, which is a shocking result and cannot be ignored. It is devastating that an extremely high percentage of students scored zero, even in the simplest task.
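The percentages quoted above follow directly from the reported counts and averages, which the short check below reproduces. The maximum scores of eight and six items come from the task descriptions; the maximum of four for the array formula task is an assumption based on the highest score listed in Table 5.

```python
N_STUDENTS = 8517

# Reported figures per task; "max" for the array formula task is assumed from Table 5.
TASKS = {
    "formula completion": {"max": 8, "mean": 1.32, "zero": 5226, "full": 118},
    "execution order": {"max": 6, "mean": 0.86, "zero": 6464, "full": 629},
    "array formula": {"max": 4, "mean": 0.18, "zero": 7918, "full": 118},
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


for name, t in TASKS.items():
    print(f"{name}: mean {percent(t['mean'], t['max']):.1f}% of maximum, "
          f"zero {percent(t['zero'], N_STUDENTS):.1f}%, "
          f"full {percent(t['full'], N_STUDENTS):.1f}%")
```

The values printed with one decimal agree with the rounded percentages given in the text.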
The mean ranks of the three tasks calculated by the Friedman Test are 2.02, 2.08, and 1.90 in the formula completion, execution order, and array formula tasks, respectively. Furthermore, it is found that there is a significant difference between the averages of the results of the three tasks (p < 0.001).
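To make the notion of mean ranks concrete, the sketch below ranks the three task results within each student and averages the ranks per task, which is the descriptive part of the Friedman test. The student rows are invented, and whether the original analysis ranked item counts or SOLO values is not stated, so the input is an assumption.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                               # extend the run of tied values
        shared = (i + j) / 2 + 1                 # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def friedman_mean_ranks(rows):
    """Mean within-student rank of each task (column)."""
    per_row = [average_ranks(list(row)) for row in rows]
    return [sum(r[c] for r in per_row) / len(per_row) for c in range(len(rows[0]))]


# Hypothetical results (formula completion, execution order, array formula).
rows = [(0, 0, 0), (2, 1, 0), (3, 6, 0), (0, 0, 1)]
print(friedman_mean_ranks(rows))
```

A student who scores the same in all three tasks contributes a rank of 2 to each of them, so a sample dominated by all-zero results necessarily yields mean ranks close to 2.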
In the comparison of the three tasks (Spearman correlation), it is found that there is a strong correlation between the formula completion and the execution order tasks (R = 0.453). A weak correlation is found between the array formula and the other two tasks (R = 0.206 and R = 0.263 in the formula completion and execution order, respectively). The Chi-square crosstables reveal the details of the correlations between the pairs of the tasks. If the students' knowledge is pre-structural in the formula completion task, then there is a 91% probability that the execution order is also pre-structural. As the level of understanding increases in the formula completion task, the level of understanding is either pre-structural or relational in the execution order task. We can conclude, on the one hand, that syntactical awareness does not help the recognition of the execution order. On the other hand, better performance in the formula completion task allows knowledge-transfer activation, through which the execution order of expressions is deduced from mathematics. These mathematical knowledge pieces seem to be much more firmly solidified than the students' spreadsheet knowledge.
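The conditional statement drawn from the crosstables (for example, the share of students who are pre-structural in the execution order task among those who are pre-structural in the formula completion task) corresponds to row percentages of a contingency table. A minimal sketch with invented category pairs, using a hypothetical helper name:

```python
from collections import Counter


def row_shares(pairs):
    """Return the share of each second category within each first category."""
    cells = Counter(pairs)
    row_totals = Counter(first for first, _ in pairs)
    return {(a, b): n / row_totals[a] for (a, b), n in cells.items()}


# Hypothetical (formula completion, execution order) SOLO categories per student.
pairs = [("P", "P")] * 9 + [("P", "R")] + [("U", "P"), ("U", "R")]
shares = row_shares(pairs)
print(shares[("P", "P")])   # share of P in execution order given P in formula completion
```

With the real data, the cell for the pre-structural/pre-structural pair would hold the 91% reported above.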
The knowledge that parentheses always work in pairs, which also comes from mathematics, should be applied in the formula completion task. The comparison of the opening and closing parentheses in the two formulae reveals that there are 3106, 1985, and 1949 students who completed the first formula, the second formula, and both formulae with the closing parenthesis, respectively. A strong correlation can be found between the closing parentheses in the first and the second formulae (R = 0.707). In contrast, in the other pairs of items, except for the first closing parenthesis and the first reference operator (R = 0.82), medium or weak correlations are detectable (Table 8). One can conclude that the two strongest pieces of knowledge are transferred from a schema and mathematics. However, one must note here that the schema, SUM(A2:A21) or SUM(A2,A21), is accompanied neither by the semantics of the expressions (R = 0.323 or R = 0.324) nor by the reference operator of the other solution (R = 0.462, not in the table). In the comparison of the first two tasks vs. the array formula task, it is found that even when the SOLO categories increase in the first two tasks, up to the relational level, students only reach the multi-structural level in the array formula task. These results show that both the syntactical and the mathematical knowledge pieces remain unconnected, which leads to students not being able to cross the border of the multi-structural and the relational categories [78].
These findings partially prove Hypothesis H4. Knowledge-transfer activation from mathematics allows students to apply basic mathematical knowledge (parentheses), or to solve implicit math problems (execution order). However, when the mathematical background is only explicitly present, students are not able to apply it.

Gender Issue
In the comparison of the gender of students, a significant difference is revealed between boys and girls in the formula completion and the execution order tasks (p < 0.001 and p = 0.006), where girls' results are higher, while there is no difference between the two groups in the array formula task (p = 0.655) (Figure 8). This result is in complete accordance with previously published papers considering boys' and girls' programming skills [79][80][81], where the "informatics and computer sciences fit boys better than girls" statement is proven to be a misconception.
The means of the SOLO values in Grades 7-10 are presented in Figure 8. The effect of the given grade is different in the three tasks. In the graphs, the lines for the girls and the boys are not parallel. The statistical analysis reveals the following differences between the genders (Table 9):
• In Grade Seven, there is no significant difference in any of the tasks (p = 0.632, p = 0.069, and p = 0.704, respectively).
• In Grade Eight, there is no difference in the formula completion and the array formula task (p = 0.948 and p = 0.355). However, in the execution order task, the difference is significant (p = 0.01).
• In Grade Nine, there is a significant difference in all three tasks (p < 0.001, p < 0.001, and p = 0.014, respectively).
• In Grade Ten, there is a significant difference in the formula completion task (p = 0.005), while there is no difference in the other two tasks (p = 0.406 and p = 0.224).
Beyond considering the differences between the results of boys and girls, we cannot leave unnoticed that there is a significant difference between the self-assessment of the two gender groups (p = 0.022). Boys evaluated their knowledge higher than girls did, which is in complete accordance with previously published results [27,28,73].
Hypothesis H5 must be rejected, because we have found differences between the two gender groups in both the self-assessment and the results. Considering the self-assessment, boys rated their knowledge higher than girls did, while, on the contrary, the results of the girls are higher than those of the boys.

Studied vs. Results
In the next analysis, the results of Groups Studied, DidNotStudy, and NoAnswer are compared and analyzed. As mentioned in the Chapter "The Sample", the groups are formed on the self-assessment section of the test. Even though the averages are extremely low (Table 9), the question was whether there is any difference between Groups Studied and DidNotStudy (Tables 10 and 11). In the array formula task, the number of students whose results are greater than zero is so low that we could not separate them into two groups. Consequently, the task is left out from this analysis. The analysis reveals that 43.9% vs. 10.2% (formula completion) and 27.6% vs. 5.2% (execution order) of the students in Group Studied and DidNotStudy scored at least one item, respectively. The averages of the students of the different groups are presented in Tables 10 and 11.
In spite of the low averages, a significant difference is revealed between Groups Studied and DidNotStudy. Considering all of the different groups-Total (Grades 7-10 altogether), Grade Seven, Grade Eight, Grade Nine, and Grade Ten-the results of Group Studied are higher on both tasks than in DidNotStudy. Group NoAnswer scored between the two groups.
The percentages of students in Groups Studied and DidNotStudy scoring 0-8 and 0-6 items in the formula completion and the execution order tasks are presented in Figures 9 and 10.
In the formula completion task, leaving out those students who scored zero, the two most frequent item counts are three and two (872 and 855 students, respectively, from 3291 students). This means that most students' knowledge is pre-structural. The two most frequent item counts do not depend on whether the students studied spreadsheeting in school or not (813 and 780 from 3023 students in Group Studied, and 24 and 27 from 94 students in Group DidNotStudy) (Figure 11).
Figure 11. The results of those students who scored at least one item in the formula completion task.
In the execution order task, the two most frequent scores are six and one (629 and 450 students, respectively, from 2050 students). This means that most of the students, considering only those who scored at least one item, either recognized only one step or all of them in the correct order (Figure 10). In this task, the two most frequent scores depend on whether the students studied spreadsheeting in school or not. In Group Studied, the two most frequent scores are six and one (584 and 422 from 1901 students), while in Group DidNotStudy, the two most frequent scores are two and one (16 and 15 from 48 students) (Figure 12).
Figure 12. The results of those students who scored at least one item in the execution order task.
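The shares of students scoring at least one item in each group can be recomputed from the counts reported in this section and the group sizes given in the Chapter "The Sample". The sketch below uses only those reported numbers; the dictionary layout and the function name are illustrative.

```python
# Group sizes from the sample description.
GROUP_SIZE = {"Studied": 6886, "DidNotStudy": 919}

# Students who scored at least one item, as reported for each task and group.
SCORED = {
    "formula completion": {"Studied": 3023, "DidNotStudy": 94},
    "execution order": {"Studied": 1901, "DidNotStudy": 48},
}


def share(task: str, group: str) -> float:
    """Percentage of a group that scored at least one item in a task."""
    return 100 * SCORED[task][group] / GROUP_SIZE[group]


for task in SCORED:
    for group in GROUP_SIZE:
        print(f"{task}, {group}: {share(task, group):.1f}%")
```

The computed values match the 43.9% vs. 10.2% and 27.6% vs. 5.2% quoted for the two tasks.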

Self-Assessment
The self-assessment values on spreadsheet management and the responses to the studied/did not study in school questions were taken into consideration at this stage of the analysis. Similar to the evaluation of the students' results, the 0-5 self-assessment values were converted into SOLO categories (Table 12). These categories are in complete accordance with the levels of science proficiency in PISA 2018, where Level 2 stands for the minimum [61]. We must note here that 516 students did not answer this question. These categories of understanding are in accordance with the European Qualifications Framework (EQF) [82] and, based on this document, the proficiency levels of DigComp 2.1 [83]. Due to the administration process of the test, the first level of EQF and DigComp cannot be tested, where work or study is carried out under direct supervision in a structured context. In a similar way, Levels 7 and 8 cannot be identified due to the limits of the testing procedure.
According to the frame curricula [8,9], teaching-learning spreadsheet management is obligatory from Grades Seven to Ten (sample size: 8517). The question arises, then, why 10.79% (919 students) of the tested students claimed that they did not study spreadsheeting in school, either because they did not study it or because they did not realize that they studied it (Table 6), and why a further 8.36% (712) did not answer the question.
The comparison of Groups Studied and DidNotStudy revealed that students who studied spreadsheet management in school evaluated their knowledge higher than those who did not. Moreover, 65.1%, 19.7%, and 29.8% evaluated their knowledge at a four or five, in Groups Studied, DidNotStudy, and NoAnswer, respectively (Figure 13). According to the SOLO categories, their knowledge is multi-structural or relational. In Group Studied, 5% (310 from 6888 students) claimed that, in spite of studying spreadsheeting in school, their knowledge is non-existent, that is, at the lowest, pre-structural level of understanding. In the other two groups, 56% (519 from 919 students) and 40% (287 from 712 students) claimed that their knowledge is pre-structural (Groups DidNotStudy and NoAnswer, respectively). Altogether, 4.5% of the students of Group Studied rated their knowledge as pre-structural (2.5% with zero and 2% with one). In contrast, 49.8% of Group DidNotStudy rated their knowledge as zero, and 6.6% rated it as one. This means that 56.4% of the students of Group DidNotStudy rated their knowledge at the pre-structural level of understanding (Figure 13). Considering all students, 9.73% of them rated their spreadsheet knowledge pre-structural, 27.16% uni-structural, 24.62% multi-structural, 20.75% relational or extended abstract, and 17.74% did not answer (Figures 14 and 15). Evaluation of the self-assessment values revealed that there is a significant difference between the grades (p = 0.024). In Grade Seven, the SOLO categories are almost equally distributed, while in Grade Eight, the multi-structural and the relational levels dominate, and in Grades Nine and Ten, the uni- and multi-structural levels dominate. Grade Eight students are the most confident, which is in complete accordance with their behavior as senior students (Figure 14) [42,49]. Figure 15 presents the cumulative SOLO categories assigned to the self-assessment values.
Based on the categories, a firm spreadsheet knowledge can be expected.

Spreadsheet Functions
As mentioned in the Chapter "Spreadsheet functions", in the second section of the test, students were asked to list the most important spreadsheet functions, and fifteen cells were offered to fill in. This is much less than the number of functions listed in the official coursebook for Grades 9-10 [63].
Students wrote various strings which they considered to be function names. Along with the correct functions, they primarily wrote misspelled functions, operators, and fictitious names. Table 13 provides the number of students from Grades 7-10 listing 0-15 correct functions. Here, we must mention the responsibility of the translators. In Hungarian, in several cases, the translators did not follow the original naming conventions. Furthermore, the names occasionally change from version to version. Among these nonconventional naming practices, one of the most distracting features is a period character inserted into the middle of the function name. One can never anticipate whether the dot is part of a name or not, and it seems to appear arbitrarily. Consequently, the translation does not fulfill its original role of helping with the memorization of these function names. It should also be noted that, on average, students listed 3.2 strings, among which 2.8 were found correct. The mode and the median of the number of correct functions are zero, due to the high number of students (4477) who could not name a single correct function. One of the reasons for the high number of zero functions is that students did not recall any function name. In other cases, they only wrote incorrect names. It is found that among those students who wrote at least one string, 45% named only correct (existing) functions. Among the incorrect strings, the most common errors were, as mentioned above, mathematical operators (addition, subtraction, division, and multiplication operators, incorrectly named or drawn) and misspelled functions (Figure 16). In Figure 17, the students of Groups Studied and DidNotStudy are separated. The graph shows the percentage of those students who wrote at least one function in Groups Studied and DidNotStudy.
It is found that, leaving out those students who did not write anything, the mode of the number of correct functions is four and two in Groups Studied and DidNotStudy, respectively. Furthermore, 54% of Group Studied and 13% of Group DidNotStudy named only correct functions. Another difference between the two groups is the number of students listing more than seven functions. In Group DidNotStudy, there are hardly any students who wrote that number of functions. These results suggest that studying spreadsheeting in school trains more conscious end-users than self-study does (Figure 17). One might argue that there is no need for remembering the function names since wizards, tips, help, etc. would guide end-users in finding the correct function(s). However, this is not so, because browsing the interfaces and looking for ideas requires slow thinking [64], whose processes have been shown to be error-prone, leading to serious spreadsheet errors [39,80,84,85]. Instead, with a reasonable cognitive load [29,30,76], the names, the arguments, and the semantics of the functions must be memorized to allow end-users to activate fast thinking, which would lessen spreadsheet errors. Schema construction and applying the built schemata in solving novel problems is widely accepted and applied in teaching mathematics [86].
"Each problem that I solved became a rule which served afterwards to solve other problems" [87], declared Descartes. This idea plays a crucial role in Pólya's concept-based problem-solving approach [86]. Unfortunately, cognitive load [88], schema construction [88,89], fast and slow thinking [40,64,90], and concept-based problem-solving [86] are not well-known in informatics, which might be explained by the distracting role of the digital devices, both hardware and software, the dominance of folk-pedagogy [50], and the lack of expert teachers [45] in this novel science.

SOLO Categories: The Level of Understanding
Beyond calculating the score of the students, the results are converted into the spreadsheet SOLO categories [49]. Based on the categorization, the frequency and the relative frequency of the students are calculated. The analysis reveals that the most frequent SOLO category is pre-structural (P) in all three tasks (Figure 18). Furthermore, based on the SOLO categories of the three tasks, a cumulative SOLO category is assigned to the result of each student. Considering all of the possible outcomes (64), the number of triads forming the cumulative SOLO categories is presented in Table 14. The cumulative SOLO categories reveal the same results as the unique SOLO categories: the students' level of understanding is primarily pre-structural (Figure 19). The comparison of the SOLO categories of the students' self-assessment (Figure 15) and their results (Figure 19) reveals that the self-assessment values are much higher than the students' real knowledge. Table 15 presents the number of students in the comparison of the self-assessment (A-P, A-U, A-M, A-R) and the results (R-P, R-U, R-M, R-R). The following can be revealed:

• Diagonal of the matrix: the number of those students who gave a proper self-assessment value.
• Upper triangle of the matrix: the number of those students who overestimated their knowledge.
• Lower triangle of the matrix: the number of those students who underestimated their knowledge.

In general, 12% of the students (952) were able to estimate their spreadsheet knowledge correctly (Table 15, diagonal, white cells with bold font style), 86% of them (6916) gave higher self-assessment values than their actual knowledge (Table 15, upper triangle, white cells with normal style), and 2% (133) underestimated their knowledge (Table 15, lower triangle, gray cells).
In the comparison of the students' results and their self-assessment values, we found that there is no relationship between the achievement and self-assessment values. Thus, our Hypothesis H1 is rejected.
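The reading of Table 15 described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function and variable names are hypothetical, the toy pairs are invented for demonstration only, and just the three reported counts (952, 6916, and 133) are taken from the text.

```python
from collections import Counter

# SOLO levels in increasing order of understanding, as abbreviated in the text.
SOLO_ORDER = ["P", "U", "M", "R"]
RANK = {level: i for i, level in enumerate(SOLO_ORDER)}


def compare(self_assessed: str, result: str) -> str:
    """Classify one student by comparing self-assessed and measured SOLO level."""
    diff = RANK[self_assessed] - RANK[result]
    if diff == 0:
        return "accurate"        # diagonal of the matrix
    if diff > 0:
        return "overestimated"   # self-assessment above the measured level
    return "underestimated"      # self-assessment below the measured level


def summarize(pairs):
    """Count the three outcomes over (self_assessed, result) pairs."""
    return Counter(compare(a, r) for a, r in pairs)


# Hypothetical toy input, NOT data from the study.
toy_pairs = [("M", "P"), ("R", "P"), ("P", "P"), ("U", "M"), ("M", "U")]
counts = summarize(toy_pairs)

# Counts reported in the text for the diagonal and the two triangles.
reported = {"accurate": 952, "overestimated": 6916, "underestimated": 133}
total = sum(reported.values())
shares = {name: round(100 * n / total) for name, n in reported.items()}
```

The three reported counts sum to 8001, which matches the sample of 8517 students minus the 516 who did not give a self-assessment value, and the rounded shares reproduce the 12%, 86%, and 2% quoted above.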

Self-Assessment vs. Classroom Activities
Among the listed classroom activities, we examined what might influence self-assessment and the proficiency level of students' knowledge. In general, we were looking for activities that might have a direct influence on the students' awareness of the processes of studying and doing spreadsheet management. Linear regression revealed that there is a weak relationship between the classroom activities and the self-assessment values (R < 0.2). We can conclude that no single activity has a significant effect on the students' self-assessment values on its own. Furthermore, one must note here that the strongest relation is found between the activity of typing spreadsheet tables and the self-assessment values (R = 0.207). This finding is in complete accordance with previous results from analyzing the spreadsheet chapters of official coursebooks [10,11,12,62,70]. Coursebooks do not provide sample files with real-world data, but require students to type fictitious sample data in spreadsheet workbooks [63].
In a further examination, a cluster analysis identified two distinct groups with a low and a high number of activities. Those who marked a higher number of activities are more self-confident, and their results are also higher compared to the cluster with the low number of activities. We can conclude that the students' self-assessment values are based on the number of different activities, and not on the effectiveness of teaching.
Based on these findings, we cannot confirm Hypothesis H2.

Number of Functions vs. Self-Assessment
We investigated the relationship between the number of listed and correct functions and the self-evaluation values. A weak relationship is found between the examined parameters (R = 0.232 and R = 0.217, respectively).
Consequently, we cannot prove that the number of learned functions has any effect on the self-evaluation values. This finding is further supported by the frequency of correct functions in the groups of the self-assessment values (Figure 20). The three most frequent groups are those where the self-assessment values are 4, 3, and 5, and the number of correct functions is 0. Students are rather confident about their spreadsheet knowledge, but could not name any function.
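To make the term "weak relationship" more concrete, the following sketch computes a Pearson correlation coefficient and the share of variance that a given coefficient accounts for. It assumes that the reported R values are correlation coefficients from a simple (one-predictor) linear regression, which the text does not state explicitly; the function name pearson_r is hypothetical, and no data from the study are reproduced apart from the two R values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# R values reported in the text (number of listed / correct functions
# vs. self-evaluation values).
reported_r = {"listed functions": 0.232, "correct functions": 0.217}

# Assumption: with a single predictor, R squared is the share of variance
# in the self-evaluation values accounted for by the predictor.
explained = {name: round(r ** 2, 3) for name, r in reported_r.items()}
```

Under this assumption, R = 0.232 and R = 0.217 correspond to roughly 5.4% and 4.7% of the variance in the self-evaluation values, respectively.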

Number of Functions vs. Results
Based on the correct functions written by the students, a frequency list was created. The three tasks involve the SUM(), AVERAGE(), IF(), and LEFT() functions. Their positions in the list are 1, 2, 5, and 20, respectively.
A medium relation is revealed between the formula completion and the execution order tasks vs. the number of functions (Table 16). This can be explained by the high position of the functions in the frequency list, where the knowledge might originate from the classroom activities mentioned in the Chapter "General section".
In contrast, a weak relation is revealed between the array formula task and the number of listed and correct functions (Table 16). This finding is in complete accordance with the results detailed in the Chapter "Students' results in the spreadsheet tasks". Those results revealed that pieces of knowledge are recognizable, but students are not able to synthesize their knowledge. Consequently, they cannot reach the relational level of understanding. In this case, students are familiar with the names of the functions, at least with SUM() and IF(), but do not understand how these work together with the LEFT() function (Table 16). Hypothesis H3 can be partially confirmed. On the one hand, a medium correlation is found between the results of the formula completion and the execution order tasks vs. the learned functions. On the other hand, a weak correlation is found between the array formula task vs. the learned functions.

Conclusions
Previous research proved that teaching spreadsheeting from a programming aspect [27,28,73,74] is much more effective [71,75,76,77] than the widely accepted tool-centered low-mathability methods. The surface approach methods (focusing on teaching how the user interface works, typing spreadsheet tables, browsing wizards and help, coursebooks introducing as many functions as possible, and software offering ever newer functions and features) do not support schema construction. Without schemata built up in the long-term memory, fast and slow thinking [64] cannot be applied effectively [81], which leads to erroneous spreadsheet documents [81,84], and ultimately to serious financial losses, both in human and machine resources [85].
The present study provides the details of the spreadsheet section of a nationwide testing of Grades 7-10 students in Hungary. The test focuses on the students' problem-solving skills, how knowledge-transfer elements are activated, and which classroom activities form the students' self-assessment values. The results of three spreadsheet tasks are evaluated by the SOLO categories of understanding, which is a widely accepted method in teaching programming. In accordance with the programming nature of the tasks presented in the test, four levels of both spreadsheeting and self-assessment SOLO categories are set up.
The self-assessment values provided by the students reveal a strong confidence in their spreadsheet knowledge, especially with those students who studied spreadsheeting in school. On the contrary, most of the students' results based on the three tasks of the test are pre-structural (84.7%), which means that no recognizable pieces of knowledge are present. The students' results demonstrate that the decontextualized, tool-centered, low-mathability approaches that characterize computer science education are inadequate for developing students' problem-solving and computational thinking skills [28,41,42,59,62,73,74,75,76,77,78]. Furthermore, in complete accordance with the Dunning-Kruger effect [91], the pre-structural level of understanding (missing the point) prevents students from seeing what they do not know. It is also found that students form their self-assessment values based on the number of different activities carried out in classes, among which typing spreadsheet tables is the most frequent.
Considering the activation of knowledge-transfer items, the results reveal that some pieces from mathematics allow students to reach the relational level of understanding in the specific task. However, hardly any traces of spreadsheet knowledge-transfer are evident, which prevents students from transforming pieces of knowledge from one task to another, or, in other words, from classroom activities to problem-solving, and from knowing the names of functions to applying them at the relational level of understanding.
The results of the test proved that girls performed better than boys did in the programming-oriented spreadsheet problems. This finding, along with previously published similar results, calls attention to the negative effect of gender stereotypes, namely that tertiary informatics education suits boys better than girls. The high number of overconfident male students might explain the extremely high attrition rate in informatics education [42]. Based on these findings, the lack of informaticians in the labor market might be improved by encouraging girls to study informatics, as expressed in Soloway's [22] and Ben-Ari's [24] papers.
At present, studying spreadsheeting seems to be only for the sake of learning how the user interface works, without any meaningful further aims. We can conclude from the results of the test that spreadsheet education at this stage does not support the development of students' computational thinking and algorithmic skills. As a result, students cannot reach the relational level of understanding, not to mention the extended abstract category, which is one level above relational. One must also note that, contrary to Prensky's unproven claims considering digital generations [52,53], even digital children are in great need of educational support. To provide this support, we must have expert teachers [26,40,45] who can break away from the widely accepted tool-centered, low-mathability teaching and testing approaches, and switch to high-mathability problem-solving approaches [29,30,57,58,75,76].

Institutional Review Board Statement: Ethical review and approval were waived for this study, due to the structure of the test, where only the schools are recorded strictly and occasionally the grades.