Coping with Access Difficulties and Absenteeism through Data Visualization: A Case Study from a Rural Vocational School in Northern Greece

Abstract: Absenteeism and early school leaving (ESL) are two major problems in education with a significant impact on adolescents' lives. Early leavers from education and training may later face considerable difficulties in the labor market as adults. Motivated by the need to minimize this phenomenon, we developed a novel application that uses big data generated by a student attendance management system in a vocational senior high school in Greece. The application first automatically preprocesses and transforms the data and saves the processed data in a cloud data warehouse. Then, an online analytical processing (OLAP) analysis is performed, resulting in a real-time visualization that provides a variety of graphs. In this study, we demonstrate the need for real-time visualization of the analyzed data. This type of presentation provides on-the-spot information about the behavior of potential early leavers, giving school administrators time for prompt and effective action. Through data processing and analysis, the application provides instructors with continuous information in addition to that acquired from the formal student attendance management systems. Our research provides indicative evidence in favor of such applications: by reacting adequately to the observed patterns in real time, we observed a significant decrease in students' unnecessary absences and a reduction of the ESL phenomenon over a three-year school period.


Introduction
New competencies will be required in the years to come, and a way for future citizens to be well prepared for this is by "learning to know". According to Jacques Delors, "learning to know" means "learning to learn" to benefit from the opportunities that education provides throughout life [1].
In Greece, vocational education and training (VET) graduates gain a specialized degree in their field. Moreover, it has been observed that a large percentage of them seek to join the production workforce as specialized personnel. During the period 2015-2019, a highly targeted effort was made in Greece to support VET. The aim of this effort was effective institutional, educational, and pedagogical reforms in various fields [2,3].
In order to prevent and tackle early school leaving (ESL) in society, it is fundamental to detect the main factors causing it. The literature proposes a wide set of aspects, ranging from school-based explanations (school segregation) to individual, family, and socio-economic characteristics [4,5]. ESL is mainly observed among students of vocational education, many of whom are above 18 years old, among students who face financial problems in their families and are forced to work in parallel with their studies, and among students who come from a foreign country. The percentage of the latter can be expected to increase by 45 in the future due to the refugee crisis [6].
spot students in dropout danger, and generally come to conclusions that can be used to reduce absenteeism.

Related Work
A significant observation is that in Greece, vulnerable social groups, such as immigrants or other ethnic minorities, present a five times higher risk of dropping out of secondary education than the general population [22]. Other factors driving this phenomenon include cognitive ability, family composition, socioeconomic situation, or school location [23]. Interventions should not and cannot always be aimed exclusively at a person or his/her individual family. Instead, school policies can be proven to be effective. The same factors were acknowledged by teachers in a relevant study [24]. Additionally, several behavioral problems, such as depression, anxiety, or attention deficit hyperactivity disorder (ADHD) syndrome, increase the risk of dropping out of school at an early stage [25]. Health and family problems are also important reasons for absenteeism; in India, a biometric system in combination with home visits was proposed for the decrease of chronic absenteeism and, thus, of the dropout rate [26]. Socioeconomic factors were identified as a major predictor of expected ESL in the research in [27].
Another more recent study [28] makes a distinction between risk factors for absenteeism and ESL. The authors mention that the factors with large effects on absenteeism include a negative attitude towards school, substance abuse, externalizing and internalizing problems of the juvenile, as well as low parent-school involvement. As far as dropout is concerned, the risks that indicated a large effect include a history of grade retention, having a low IQ, or experiencing learning difficulties and low academic achievement.
A significant distinction is to understand whether a student is absent due to the presence of an insuperable barrier (illness, serious family reasons, etc.) or by choice. Truancy is a common reason for systematic absenteeism that can lead to significant loss of knowledge, poor grades, and even school dropout. A truant has a 3.4 times higher risk of leaving school than his/her peers [29]. Moreover, peer group effects are emerging. Students' social behavior relates to the tendency of missing classes for no important reason. Bembich [30] highlighted the dysfunctional relational structures that raise the risk within groups of students. Therefore, having the opportunity to detect patterns in students' interactions when the number of their absences is becoming high for no obvious reason becomes of crucial importance.
A different approach to this problem is to investigate the level of school leavers. In the study in [31], students were separated into four different groups (i.e., those without any qualification, those with low qualifications, apprentices, and those with full qualification). It was shown that these four groups revealed clear differences in the effects of different factors on the risk of ESL. Level-related factors were also recognized in the research in [32]. Higher levels of absenteeism appeared to be more closely related to lower achievement orientation, active-recreational orientation, cohesion, and expressiveness.
The common denominator in the research concerning absenteeism and ESL is the recognition of the importance of the early detection of students at risk. As Lyche [33] stated: "Early identification enables broader, less costly measures to be set up earlier and leaves the more costly one-on-one measures for later stages of education to the remaining at-risk students that have not yet been picked up". The idea of taking action in the shortest possible time is central in European Union policy as well. That is, in order to cultivate knowledgeable and successful teachers and schools, by emphasizing and empowering the elimination of social and educational exclusions, it needs to be an integral part of the pursuit of progress and development of the society and the economy [34]. Thus, predictive and prescriptive analytics should be established at every educational level to provide the necessary data for decision making. By leveraging school data, educators can act rapidly to prevent ESL. Early warning systems have been used in six European countries (i.e., Poland, Lithuania, Germany, Sweden, Ireland, and the United Kingdom) with positive results [35].
Appl. Sci. 2022, 12, 6946
In higher education, efforts are usually aimed at creating predictive systems that will notify tutors for on-time interventions concerning students at risk, based on the prediction of their academic achievements. Machine learning techniques provide an accurate prediction of dropping out, allowing educators prompt action to prevent students' failure. Lykourentzou et al. [36] compared three different techniques achieving high accuracy of prediction in e-learning course students. Márquez-Vera et al. [21] identified significant predictive factors for students at risk, including low academic performance, substance use (alcohol and smoking), hours of work, maternal education, and the number of students in each class, using machine learning techniques. A recent study [37] provided results of high accuracy to a time series prediction problem by proposing a model with an accuracy that can reach up to 84% in the final weeks of study, provided that the course demands students' involvement in online learning activities. Forum participation was found to be an indicator of higher grades, thus leading to a lower dropping-out risk. Clustering methods were also used to provide evidence of cheating in written assignments, resulting in poor final performance despite high grades during the semester [14]. Finally, Gontzis et al. [38] presented a predictive analytic tool based on participation-related attributes that can early predict students that are in danger of dropping out.
Delis et al. [39] presented the methodology and technological aspects of the application that is used in the Greek educational community to organize and manage students' data.

Materials and Methods
This section describes the methodology followed to create a multiple-visualization environment that is instantly updated with students' absence status and can easily be accessed by educators. In more detail, we first present an overview of the Greek educational system, focusing on vocational schools and the geographical area of this study. Subsequently, the research questions, the methodology, and finally the tools and the implementation design are explained.

Setting and Dataset Description
Education in Greece is compulsory for all children between the ages of six and fifteen. K12 education is divided into primary and secondary education. Primary education includes the K1 to K6 grades. Secondary education is divided into lower secondary (i.e., K7, K8, and K9 grades), which is compulsory, and upper secondary (i.e., K10, K11, and K12 grades), which is optional. During upper secondary education, students can attend either a general high school, which prepares them to enter higher education, or a vocational high school, where they acquire specific hard skills through vocational training.
This study focused on the vocational high school of Prosotsani in the rural area of the municipality of Drama that belongs to the administrative region of Eastern Macedonia and Thrace in Northern Greece. Drama is a nonprivileged area with limited career opportunities for young people. Eastern Macedonia and Thrace is the administrative region in Greece with the second-lowest gross domestic product (GDP) per capita and is one of the poorest regions in the EU [40]. The difficult financial conditions along with a significantly low birth rate are affecting the quality of life negatively and add even more importance to the role of education and training. Especially for adolescents, it was found that socioeconomic status inequalities influence their literacy abilities, leading to poorer educational attainment and working memory task results [41], proving the augmented need for quality public education in low privileged areas.
Teachers and educational stakeholders in the reference school have empirically pointed out the significance of the absenteeism problem. Therefore, since 2012, the vocational high school's director has been using an early warning system to track the exact number of students' absences in order to inform their legal guardians. It must be noted that this warning system was an initiative undertaken by the director, as there was no general strategy for using an early warning system in Greek education. However, tracking the number of absences is not sufficient to solve the ESL problem. Visualizing results in an interpretable way is also necessary.
Data over three successive years were retrieved from the vocational high school of Prosotsani. These data came from the database of the early warning system and included the absences of students along with the characteristics that affect absenteeism. This area was chosen because it reflects the challenges of attending a nonprivileged, public, provincial school with the background of the Greek economic crisis. Students from 13 small towns and villages in the wider area of Prosotsani attended courses in the reference school (Table 1). There were a total of 258 students in the high school over the three successive years. The majority of students were boys (64%). Students from the wider area during the school period used public transportation to attend their courses. The retrieved data also contained information concerning age, gender, personal information (address, phone number, etc.), and a detailed record of absences on an hourly basis.

Research Questions
Given the situation presented above at the vocational high school of Prosotsani since 2012 and over three successive years, and the need for a visualization of the data from this period, the research questions addressed were the following:
RQ1 Can we identify those factors that affect the absenteeism phenomenon so as to create a cube of important data for real-time visualization?
RQ2 Which types of visualization can provide a clear view of students' behavior in a way that even nonexperts in the field of learning analytics can easily draw conclusions for decision making?
RQ3 Does the strict policy of frequent parental briefing, based on the available visualization of data followed by this school unit, discourage students from enrolling, resulting in a decrease in the student population?
RQ4 Are there any patterns in students' absences concerning the day, the hour, or the class?
RQ5 Are there any correlations between students' absences and the other variables of the data warehouse?

System Implementation Design
Our team designed and developed a new educational application with the main goal of reducing ESL and school and university dropouts (Figure 1). It is a Windows desktop application and can be described as an Extract-Transform-Load (ETL) tool. It receives data from other educational applications, such as MySchool, Moodle, Leak, and Classter. In more detail, MySchool is an online application designed by the Greek Ministry of Education and used by all Greek secondary schools, whereas Moodle is a widely used web application for asynchronous education. Leakage is a Windows desktop application designed and developed by our team that offers telematics services to reduce ESL and educational dropout. Finally, Classter is a web application of the company Vertitech that aims to manage the educational process in the life cycle of an organization such as primary, high school tuition, and general or vocational high school.
Our application was programmed in C# in Microsoft Visual Studio 2017. It uses the Microsoft Virtual Machine .NET Framework 3.5 libraries so that it can work in all versions of Microsoft Windows. The application automatically sends training data to the warehouse database from Excel, Access, and SQL Server sources. It also sends information to parents via SMS, such as information about their children's attendance. An essential requirement of the application is that the data it sends to the cloud warehouse database be anonymous, not only to comply with the community privacy requirement but also because personal data do not help us make educational decisions.
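The anonymity requirement above can be illustrated with a minimal Python sketch. The source does not describe how records are anonymized, so the approach shown here (dropping personal fields and replacing the student identifier with a keyed hash, which is strictly speaking pseudonymization) is an assumption, as are all field names and the secret key.

```python
import hashlib
import hmac

# Hypothetical column names; the source does not list the actual fields.
PERSONAL_FIELDS = {"student_id", "name", "address", "phone"}


def pseudonymize(record, secret_key):
    """Drop personal fields and replace the student ID with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
    digest = hmac.new(secret_key, str(record["student_id"]).encode("utf-8"), hashlib.sha256)
    # Truncated for readability only; keep the full digest to minimize collisions.
    cleaned["student_key"] = digest.hexdigest()[:16]
    return cleaned


# Illustrative record with placeholder values (not real data).
sample = {
    "student_id": 1017,
    "name": "(withheld)",
    "address": "(withheld)",
    "phone": "(withheld)",
    "region": "reg.04",
    "absent_hours": 2,
}
row_for_warehouse = pseudonymize(sample, secret_key=b"assumed-local-secret")
print(sorted(row_for_warehouse))  # ['absent_hours', 'region', 'student_key']
```

Because the same identifier and key always give the same `student_key`, a student's absences can still be linked across days in the warehouse without storing the identity itself.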
Our application transfers data to the cloud warehouse database, which was developed in SQL Server 2017. The warehouse database is designed with a fact table and dimension tables in a snowflake schema. Each row of the fact table holds the attendance of one student per day, and the dimension tables hold most of the characteristics that affect the student's attendance, such as environment, schoolteacher, family, etc.
From the cloud warehouse database, we isolated the data that we needed to make educational decisions. We used business intelligence techniques, which companies with large volumes of data have applied for many years to determine the behavior of their customers and their organization. Thus, with CUBE and roll-up techniques, we succeeded in isolating an educational phenomenon and managing it per period, per area, per age of students, etc.
With the data we isolated with CUBE techniques and with the capabilities of the R Tools for Visual Studio (RTVS) tool, we could more easily visualize an educational phenomenon and look for predictions.
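To make the CUBE idea concrete, the following Python sketch sums a measure for every combination of the chosen dimensions, which is what a SQL `GROUP BY CUBE` produces. It is an illustration only, not the authors' SQL Server implementation; the dimension names, the measure name, and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dimensions, measure):
    """Sum `measure` over every subset of `dimensions`.

    Keys are tuples aligned with `dimensions`; None marks a dimension
    that has been aggregated away (rolled up).
    """
    totals = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            for row in rows:
                key = tuple(row[d] if d in kept else None for d in dimensions)
                totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact rows (not data from the study).
rows = [
    {"school_year": "2012-2013", "region": "reg.01", "absent_hours": 5},
    {"school_year": "2012-2013", "region": "reg.04", "absent_hours": 3},
    {"school_year": "2013-2014", "region": "reg.01", "absent_hours": 2},
]
result = cube(rows, ["school_year", "region"], "absent_hours")
print(result[(None, None)])         # grand total: 10
print(result[("2012-2013", None)])  # total for one school year: 8
print(result[(None, "reg.01")])     # total for one region: 7
```

A roll-up is the special case that keeps only the hierarchical prefixes of the dimension list, for example (school year, region), then (school year), then the grand total.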
The snowflake model is an extension of the star model in which each dimension table extends outward into multiple connected tables that hold supporting or explanatory details. The main purpose of these tables is to describe some dimensions of the fact table in detail, which can reduce the number of fact table properties and enhance query efficiency [42].
As shown in Figure 2, the cloud warehouse database was designed as a snowflake schema. The transaction fact table included the students' absences per hour in the daily school schedule, the total students' absences per school day, and the students' absences due to illness. The dimension tables included project characteristics and characteristics that affected students' attendance, such as weather, teachers, grades, and student characteristics (without including properties that reveal the student's identity).
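A reduced version of such a schema can be sketched as follows. The source database was built in SQL Server 2017; SQLite is used here only so that the example is self-contained, and every table and column name is an assumption rather than the schema of Figure 2. The snowflake aspect is that the student dimension points to a further region table instead of storing the region name itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
-- Snowflaking: the student dimension references a separate region table.
CREATE TABLE dim_student (
    student_key TEXT PRIMARY KEY,
    grade INTEGER,
    region_id INTEGER REFERENCES dim_region(region_id)
);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, school_year TEXT, weekday TEXT);
CREATE TABLE fact_absences (
    student_key TEXT REFERENCES dim_student(student_key),
    date_id INTEGER REFERENCES dim_date(date_id),
    school_hour INTEGER,
    absent INTEGER,          -- 1 if absent during that hour, else 0
    due_to_illness INTEGER   -- 1 if the absence was due to illness
);
""")

# Hypothetical rows for illustration only.
conn.executemany("INSERT INTO dim_region VALUES (?, ?)", [(1, "reg.01"), (4, "reg.04")])
conn.executemany("INSERT INTO dim_student VALUES (?, ?, ?)", [("s1", 2, 1), ("s2", 1, 4)])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, "2012-2013", "Mon")])
conn.executemany(
    "INSERT INTO fact_absences VALUES (?, ?, ?, ?, ?)",
    [("s1", 1, 1, 1, 0), ("s1", 1, 2, 1, 1), ("s2", 1, 1, 1, 0), ("s2", 1, 2, 0, 0)],
)

# Reaching the region requires two joins: fact -> student -> region.
absences_by_region = conn.execute("""
SELECT r.region_name, SUM(f.absent)
FROM fact_absences AS f
JOIN dim_student AS s ON s.student_key = f.student_key
JOIN dim_region AS r ON r.region_id = s.region_id
GROUP BY r.region_name
ORDER BY r.region_name
""").fetchall()
print(absences_by_region)  # [('reg.01', 2), ('reg.04', 1)]
```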
Figure 3 shows a snapshot of the cloud warehouse database. The "Absences Fact Table" contains information regarding the presence of each student per school hour, namely, whether the student is present or absent at a specific hour. If the student is absent, it also shows whether he/she is excused with a parents' note or with a doctor's note. Additionally, it records when a student has missed the class due to an exploitation. In the rare case that a group of students has a free hour between courses, the corresponding field in the fact table takes the value "No classes".
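The per-hour states described above could be modeled as follows. This is a sketch under stated assumptions: only the literal value "No classes" comes from the text, the other labels are hypothetical, and not every state recorded in the fact table is covered.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    # Only "No classes" is a value named in the text; the rest are assumed labels.
    PRESENT = "Present"
    ABSENT_UNEXCUSED = "Absent"
    EXCUSED_PARENT_NOTE = "Excused (parents' note)"
    EXCUSED_DOCTOR_NOTE = "Excused (doctor's note)"
    NO_CLASSES = "No classes"


def summarize(hourly_statuses):
    """Count scheduled, absent, and unexcused hours for one student-day."""
    counts = Counter(hourly_statuses)
    # Free hours are not scheduled teaching time, so they are excluded.
    scheduled = sum(n for status, n in counts.items() if status is not Status.NO_CLASSES)
    return {
        "scheduled_hours": scheduled,
        "absent_hours": scheduled - counts[Status.PRESENT],
        "unexcused_hours": counts[Status.ABSENT_UNEXCUSED],
    }


# A hypothetical seven-hour school day.
day = [
    Status.PRESENT,
    Status.ABSENT_UNEXCUSED,
    Status.EXCUSED_DOCTOR_NOTE,
    Status.NO_CLASSES,
    Status.PRESENT,
    Status.PRESENT,
    Status.EXCUSED_PARENT_NOTE,
]
print(summarize(day))
```

Excluding "No classes" hours from the denominator keeps a free hour from being counted either as attendance or as an absence.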
Enterprise data warehouses (EDW or simply, DW) are complex systems serving as a repository of an organization's data. Apart from their role as enterprise data storage facilities, they include tools to manage and retrieve metadata, tools to integrate and cleanse data, and, finally, business intelligence tools for performing analytical operations. Conceptually, data warehouses are used for the timely translation of enterprise data into information useful for analytical purposes. In doing so, they have to manage the flow of data from operational systems to decision support environments.
The process of gathering, cleansing, transforming, and loading data from the various operational systems that perform day-to-day transaction processing (hereafter, sources or source data stores) is assigned to the ETL processes [43]. However, when it comes to the environment of online analytical processing (OLAP), which is performed over simply but neatly organized cubes, these two tasks, along with profiling (i.e., data quality assurance), have already been completed, either by the organization's ETL workflow or by do-it-yourself data wrangling. The rest of the high-level tasks are too few and too high for our purpose here [44].

Results and Discussion
Simple and more complex graphs were created by our application (Figure 4) to investigate the most important factors that affect students' absenteeism. The graphs were chosen for their ability to show clear and interpretable relations in the data. The number of absences per student and in total is visualized per school hour (i.e., 1st to 7th hour), per day, per region, or per school year. Additionally, the use of maps can reveal problems related to distant or problematic regions. Possible relations between the measured variables can be examined with correlation analysis, whose results are accessible at a glance through correlation plots.
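The quantity behind each cell of a correlation plot can be sketched in a few lines of Python. The Pearson coefficient is used here as an assumption, since the text does not state which correlation measure the application computes, and the two variables and their values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-region values (not data from the study).
distance_km = [2, 5, 9, 14]
first_hour_absences = [1, 2, 4, 6]
r = pearson(distance_km, first_hour_absences)
print(round(r, 3))
```

A coefficient close to 1 or -1 indicates a strong linear association between the two variables; it does not by itself establish that one causes the other.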
To begin with, certain simple graphs allow educators to draw simple conclusions regarding students' data. The bar chart for the three-year period (Figure 5) indicated that students from certain regions (i.e., Mikropoli, Petroussa, Volakas, Kali Vrysi, and Anthochori) tended to miss courses more frequently. These areas are a long distance from the school, and students from these areas had many absences during the first year (2012-2013) of the three years in our research, but they were subsequently able to reduce them with the help of the early warning system (Table 1). Moreover, in every school year (between 2012 and 2015), the first hour of the daily schedule was the most likely to be missed (Figure 6). This is usually due to transport-related problems, bad weather, and bad sleeping habits.
Students in the second grade of the vocational high school tended to skip classes unexcused more than the other students (Figure 7). This happened mostly in the middle of the week.
As shown in Figure 8, the students who lived in region 4 were late for school more than their peers from other areas. Since reg.04 is not the most distant area, this might indicate problems in transport connection.
The pie chart in the left part of Figure 9 provides information about the percentage of students' enrollments during three successive school years. Additionally, the right part of Figure 9 shows the share of total absences for the same period. That is, over the three years 2012-2015 of our research, the students in the 1st year were 31.91%, in the 2nd year 31.14%, and in the 3rd year 36.6%. The percentage of absences decreased while the number of students during the same period either remained almost the same or increased. Vocational schools in Greece follow an inclusive policy, gathering many students from vulnerable groups. Practices that can be considered strict or demanding may discourage student admission by increasing the fear of failure. The eventual rise in student enrollment and the parallel drop in students' absences indicate that the implementation of our system was positively accepted rather than creating a feeling of restricted freedom.
Radar charts (or spider charts) are mainly used to compare categorical attributes. Each axis represents an area, so it is easy to rank them in ascending order. This is important because it allows teachers to instantly draw conclusions regarding the progression of absenteeism compared with other schools, indicating performance and improvement. The three-fold diagram of Figure 10 was created with Microsoft Radar Chart (ver.2.0.2.0, 2019, Microsoft Power BI, USA), and it indicated a change in the school's rank depending on the type of absenteeism. The absences per student are compared with the frequency of skipping the first class and of being absent during the last hour of school. Differences in the order of the areas between these three diagrams can be indicative of factors affecting absenteeism. The visualization is dynamic, allowing teachers to choose the school year and have a constantly updated view of the new incoming data.
To imprint the contribution of a variable in the overall flow, the flow diagram called Sankey was used. In particular, in Figure 11 the contribution of each region presented in Table 1 to the total number of absences during three successive school years is shown. Links are weighted to using the number of total absences so thicker lines indicate a greater contribution to the absenteeism phenomenon.
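As a minimal sketch of how link weights for a Sankey diagram such as Figure 11 could be derived, the Python snippet below sums absences per (region, school year) pair and expresses each link as a share of the total. The record layout, the function names, and the sample counts are hypothetical illustration values. They are not the school's data, and this is not the Power BI pipeline used in the study.

```python
from collections import defaultdict


def sankey_link_weights(records):
    """Sum absences per (region, school_year) pair.

    Each record is a (region, school_year, absences) tuple. The returned
    dict maps a (source, target) link to its weight, i.e., the total
    number of absences flowing from that region into that school year.
    """
    weights = defaultdict(int)
    for region, school_year, absences in records:
        weights[(region, school_year)] += absences
    return dict(weights)


def contribution_share(weights):
    """Express each link weight as a fraction of all absences."""
    total = sum(weights.values())
    return {link: weight / total for link, weight in weights.items()}


# Hypothetical attendance records: (region, school year, absences).
records = [
    ("reg.01", "2012-2013", 120),
    ("reg.01", "2012-2013", 30),
    ("reg.04", "2012-2013", 50),
    ("reg.04", "2013-2014", 200),
]

weights = sankey_link_weights(records)
shares = contribution_share(weights)
```

A thicker link in the diagram corresponds to a larger value in `weights`, and `shares` gives the relative contribution of each region and year to the overall absenteeism.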
Figure 11. Students' absences in each region during the three school years.
Figure 12 was compiled with our system and the ArcGIS application, which helps to visualize the data using interactive maps. The star schema storage allows us to choose between visualizing different schools or different school years. Figure 12 shows the total number of absences per student in each area. Different colors represent different years: the school year 2012-2013 is red, the year 2013-2014 is yellow, and the year 2014-2015 is blue. This visualization helps teachers to identify absenteeism problems among students from a specific area, to relate them to relevant factors such as transportation problems, and eventually to determine whether the problem was effectively managed over time.
Microsoft Power BI was used to create a dynamic visualization, with the same data format as in the previous graphs, to develop the online map in Figure 13.
To investigate the dependence between multiple variables at the same time and to highlight the most correlated variables in our data table, we used the correlation plot. The Pearson correlation coefficient was used, where a value of +1 (or −1) indicates a perfect correlation between two variables, with +1 indicating a positive correlation and −1 a negative (inverse) correlation; a value in the range from 0.6 to 1 (or from −0.6 to −1) indicates a strong correlation; a value between 0.4 and 0.6 (or between −0.4 and −0.6) indicates a moderate correlation; and a value in the range from 0 to 0.4 (or from 0 to −0.4) indicates a weak correlation [45]. The correlation coefficients are colored according to their value, presenting the results in a simple and interpretable way.
The students were divided into three groups according to their level of absenteeism. The first group contains students who had to repeat the class because they exceeded the limit of absences (N = 25). The second group contains students with high levels of absenteeism who successfully completed the class (N = 118), while the students in the third group had rarely been absent (N = 116). As expected given the criterion used to divide the students into groups, a one-way ANOVA revealed a statistically significant difference in the total number of absences between these three groups of students (F(between groups 2, within groups 256) = 234.35, p = 0.00). Apart from the correlation matrix for each group, the p-values are provided in Appendix A to evaluate the statistical significance of the results.
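The degrees of freedom reported for the one-way ANOVA follow directly from the group sizes: with k = 3 groups and N = 25 + 118 + 116 = 259 students, there are k − 1 = 2 degrees of freedom between groups and N − k = 256 within groups. The sketch below, assuming plain Python lists of per-student absence counts, shows how the F statistic is formed. The function names and the small sample groups are hypothetical and are not the study's data.

```python
def degrees_of_freedom(group_sizes):
    """Return (df_between, df_within) for a one-way ANOVA."""
    k = len(group_sizes)
    return k - 1, sum(group_sizes) - k


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )

    df_between, df_within = degrees_of_freedom([len(g) for g in groups])
    return (ss_between / df_between) / (ss_within / df_within)


# Group sizes reported in the text.
paper_df = degrees_of_freedom([25, 118, 116])

# Hypothetical absence counts for three tiny groups.
f_stat = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because the groups were formed by level of absenteeism, a large F value on the total number of absences is expected by construction, as the text notes.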
The correlation matrix in Figure 14, for the first group of students, shows that most of the parents did not contact the school to justify their children's absences. There was a weak negative correlation between the total number of students' absences and the number of absences that their parents justified. This result may seem contradictory. However, it could be indicative of a lack of interest on behalf of parents concerning their children's schooling or an attitude of low expectations.
The number of absences due to expulsion was positively but not significantly correlated with the distance (r(23) = 0.45, p = 0.16) but weakly and negatively correlated with the population (r(23) = 0.17, p = 0.02). Additionally, there was a positive correlation between the number of expulsion absences and the total number of absences (r(23) = 0.38, p = 0.00), showing that the tendency to skip classes regularly often comes with behavioral difficulties. In the study by Sara et al. [46], which was conducted in Denmark, four characteristics that influenced school dropout were highlighted: class size, school size, last month's absences, and average income per postal code.
In the second group of students (Figure 15), parents justified their children's absences. There was a strong positive correlation between the number of absences justified by the parents and the total number of absences (r(116) = 0.75, p = 0.00). Students who lived in more privileged areas tended to skip more classes: there was a very strong positive correlation between the market value of students' residence and the total number of absences (r(116) = 0.89, p = 0.00). There was a moderate positive correlation between the total number of absences and the distance that students had to travel to get to school (r(116) = 0.35, p = 0.00).
In the group of students who rarely skipped classes (Figure 16), the total number of absences was also highly correlated with the market value of their residence (r(114) = 0.063, p = 0.00). Additionally, there was a strong positive correlation between the total number of absences and the number of absences that were justified by parents (r(114) = 0.60, p = 0.00). Finally, there was a moderate positive correlation between the number of absences and the distance to the school from students' residence (r(114) = 0.46, p = 0.00). In their study, Nascimento et al. [47] used correlation tables to show that the level-age dispersion had the highest positive correlation with school dropout, while, on the other hand, the adequacy of teacher training had the highest negative correlation.
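The strength labels used in the correlation results above follow the scale cited from [45], and the number in parentheses after r is the degrees of freedom, N − 2 (for example, N = 25 gives r(23)). The following Python sketch computes a Pearson coefficient and maps it onto that scale. The function names are hypothetical, and because the stated ranges share their endpoints, assigning a boundary value such as 0.6 to the stronger class is an assumption of this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_strength(r):
    """Label |r| using the scale described in the text.

    Assumption: a value exactly on a boundary (0.4 or 0.6) is assigned
    to the stronger class.
    """
    magnitude = abs(r)
    if math.isclose(magnitude, 1.0):
        return "perfect"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"


def pearson_df(n):
    """Degrees of freedom reported with r for a sample of n pairs."""
    return n - 2


# Hypothetical, perfectly linear data: distance (km) vs. absences.
r_example = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
```

For instance, `correlation_strength(0.75)` yields "strong" and `correlation_strength(0.46)` yields "moderate", matching the labels attached to these coefficients in the text.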

Conclusions
In this paper, we presented a two-fold effort to diminish absenteeism that was implemented in the vocational high school of Prosotsani. The proposed system, combined with the early warning system that the school already uses, provided educational stakeholders with simple and complex graphs that can be easily interpreted. The results showed a significant impact in gradually reducing the number of students' absences during three successive school years. At the same time, there was a rise in students' enrollment, indicating that this policy was well received by the local community.
Concerning RQ1, it was shown that factors such as the hour, the region, or the day could provide tutors with useful knowledge about their students, so that they can control the absenteeism phenomenon. As shown in the previous section, there were significant correlations between students' absences and factors such as distance and the market value of the properties where they live. This information should be provided in simple graphs, such as bar and pie charts, to be straightforward and easy to read. Additionally, radar charts and maps containing rich information can be used by nonexperts for sense-making, such as identifying the areas in which absenteeism has decreased over the years. In addition, elaborate graphs, such as Sankey diagrams and correlation plots, can be used to present relationships between different factors (RQ2). Simple comparisons, such as those in Figure 9, answer RQ3, showing that the strict policy of frequent parental briefing did not discourage students from enrolling in the two following years, even when they were aware of this policy. Figures 5, 6, and 7 revealed information concerning patterns connected with the hour, the day, and the class that students attend (RQ4). There, the educators can spot which day has the most absences or whether students from a certain region tend to skip classes more often. Those patterns may vary from time to time. Thus, the proposed system's ability to load data daily and provide updated information to educators is one of its main benefits. This actionable knowledge can be used to introduce new policies when needed. For example, Figure 7 can lead tutors to investigate what led the students of the second grade to skip classes more often than their peers and to make an appropriate intervention to reduce these absences.
Finally, the correlation plots showed several statistically significant correlations, especially for students with a high number of absences (i.e., students at high risk of dropping out) and the students who eventually had to repeat their class (RQ5).
On a deeper level, the constantly updated visualization of students' absences combined with related variables (school year, hour in the daily school schedule, and region) provides teachers with the opportunity to spot and verify possible reasons for systematic absences, to identify students at risk, and to recognize social aspects of the problem. In this way, valuable time can be gained, allowing teachers to act and ask for the intervention of specialists (psychologists or social workers) when needed. The potential long-term positive contribution of our system to preventing absenteeism and ESL could be a robust argument for including it in the common policy of every high school aimed at eliminating these phenomena.

Acknowledgments:
We would like to thank Konstantinos Kabadais, Director of the Vocational School of Prosotsani, as well as all the members of the teaching staff for their devotion to the implementation of this new technology for the students' benefit. Their cooperation led to the successful completion of students' studies and also to an increase in the trust of students' guardians in public vocational schools.

Conflicts of Interest:
The authors declare no conflict of interest.
Table A1. p-Values for the correlation matrix in Figure 14.
Table A2. p-Values for the correlation matrix in Figure 15.
Table A3. p-Values for the correlation matrix in Figure 16.