Building Data Literacy for Sustainable Development: A Framework for Effective Training

Said, Raed A. T.; Mwitondi, Kassim S.; Benseddik, Leila; Chemlali, Laroussi

doi:10.3390/data10110188

¹ College of Social and Human Sciences, Mohamed Bin Zayed University for Humanities, Abu Dhabi P.O. Box 106621, United Arab Emirates

² Social and Economic Survey Research Institute (SESRI), Qatar University, Doha P.O. Box 2713, Qatar

³ School of Communication, Arts and Sciences, Canadian University Dubai, Dubai P.O. Box 117781, United Arab Emirates

⁴ College of Law, Ajman University, Ajman P.O. Box 346, United Arab Emirates

* Author to whom correspondence should be addressed.

Data 2025, 10(11), 188; https://doi.org/10.3390/data10110188

Submission received: 28 September 2025 / Revised: 27 October 2025 / Accepted: 30 October 2025 / Published: 11 November 2025

Abstract

As the transformative influence of novel technologies sweeps across industries, organisations are called upon to prepare their staff for an equally dynamic operational environment, which includes embedding technical and legal communication skills in their training programs. For many organisations, internal and external communication of data modelling and related concepts, reporting, and monitoring still poses major challenges. The aim of this research is to develop an effective data training framework for learners with or without mathematical or computational maturity. It also addresses subtle aspects such as the legal and ethical implications of dealing with organisational data. Data was collected from a training course in Python, delivered to government employees in different departments in the United Arab Emirates (UAE). A structured questionnaire was designed to measure the effectiveness of the training program, from the employees’ perspective, based on three key attributes: their personal characteristics, professional characteristics, and technical knowledge. A descriptive analysis of aggregations, deviations, and proportions was used to describe the data attributes gathered for the study. The main findings revealed a substantial knowledge gap across disciplines regarding the core skills of big data analytics. In addition, the findings highlighted that previous knowledge of statistical methods of data analysis, along with prior programming knowledge, made it easier for employees to gain skills in data analytics. While the results of this study showed that the training program was beneficial for the vast majority of participants, responses from the survey indicate that providing a solid knowledge of technical communication and of legal and ethical aspects would offer significant insights into the big data analytics field.
Based on the findings, we make recommendations for adapting conventional data analytics approaches to align with the complexity of attaining the non-orthogonal United Nations Sustainable Development Goals (SDGs). Associations of selected responses from the survey with some of the key data attributes highlight the vital roles that technology and data-driven skills will play in ensuring a more prosperous and sustainable future for all.

Keywords:

association rules; big data; correspondence analysis; legal and ethical awareness; sustainable development goals (SDGs); technical communication; training effectiveness

1. Introduction

Research shows that our capabilities to generate data are growing exponentially, fast outpacing our processing potential, and this trend will continue [1]. Traditional methods of storing, processing, and analysing digital data have become impractical due to the exponential rise of digital data created from diverse and divergent sources. Data has always been a key ingredient in the decision-making process, and accuracy, replicability, robustness and timeliness are crucial aspects of data analytics, as we now deal with multi-faceted data ranging from simple surveys and high-dimensional structured data to highly complex unstructured data. Such developments have sparked data-intensive research through applications aimed at addressing societal challenges and opportunities around big data modelling [2,3], a branch of data science that examines massive data for patterns, trends, and associations. For example, using association rules to analyse underlying patterns entails associating two or more data attributes based on a conditional probabilistic model comprising an antecedent (IF) and a consequent (THEN) [4,5].
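As a minimal sketch of the IF/THEN idea, the Python snippet below computes two standard association rule measures, support and confidence, for a single rule. The function name `rule_metrics`, the item labels and the toy responses are hypothetical and are not taken from the survey data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule IF antecedent THEN consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    rows = [set(t) for t in transactions]
    n_total = len(rows)
    # Records that satisfy the IF part.
    n_antecedent = sum(antecedent <= row for row in rows)
    # Records that satisfy both the IF and the THEN parts.
    n_both = sum((antecedent | consequent) <= row for row in rows)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy responses (illustrative only, not the study's data).
responses = [
    {"job=engineer", "bigdata=excellent"},
    {"job=engineer", "bigdata=excellent"},
    {"job=engineer", "bigdata=good"},
    {"job=teacher", "bigdata=good"},
]
support, confidence = rule_metrics(
    responses, {"job=engineer"}, {"bigdata=excellent"}
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here confidence plays the role of the conditional probability of the consequent given the antecedent, estimated from the records at hand.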

The foregoing inter-dependencies have been applied in a wide range of data-driven improvements of products and services, including education. That is, analysis of raw data potentially leads to enhanced products, artefacts and services, which, in turn, become “data inputs” for their own enhancement and for new inventions. Designed to run iteratively, such applications render any prior knowledge “updatable” as new posterior knowledge arrives, underpinning the massive developments in how we generate and consume data. In updating existing knowledge, we have to be mindful of the behaviour of a wide range of parameters and characteristics that affect the accuracy of our inputs because of data randomness, variation and interactiveness that cannot always be observed [2]. Conventional techniques rely on data completeness and/or fulfilled assumptions by design or modification. For instance, the Chi-square test [6,7] is impaired when marginal totals are fixed or when the expected frequencies are small (less than 5). Many applications would usually avoid applying the test when expected cell frequencies are low or below a particular threshold, for example when more than 10% of the two-way table cells have expected frequencies below 5. These decisions potentially lead to information loss, via masking and swamping [8,9], hence addressing them is imperative for attaining prosperous and sustainable societal development. A key aspect of addressing those issues relates to providing proper training of big data analysts. System users are a rich source of data for improving the system itself. Within a working environment, current employees are the decision makers of today, and they form the basis of future decisions. Analysing their experiences and perceptions in a spatio-temporal context generates highly valuable data for transforming their working environment, which constitutes the main motivation of this work, as described below.

1.1. Motivation

This work was motivated by a number of factors, such as the dynamics of modern-day data analytics, particularly the increasing demand for data practitioners and researchers to deliver timely and accurate decisions. It was also motivated by the digital divide at national, regional and global levels [10,11] as well as the need to address the shortage in data analytics skills [12]. Further, surveying end users who are learning a data analytics package, and evaluating their experiences and perceptions through visualisation, not only provides an alternative analytical approach under violated assumptions, but also provides insights into their capabilities and potential as decision makers. These capabilities and potential can be tracked in a spatio-temporal context to enhance societal products and services. Such an approach can help deliver data-driven solutions to the global socio-economic, cultural and technological challenges, which is the main agenda of the United Nations Sustainable Development Goals (SDGs) [2,3,13,14].

The deep and wide socio-economic, cultural and technological variations across the globe entail a unified understanding of the aspects that affect us locally, nationally, regionally and globally—aspects that are encompassed in the Sustainable Development Goals (SDG) project [13]. Future applications of SDG modelling are going to be flooded with increasingly larger amounts of data. It is therefore imperative that the data attributes that go into our decision support systems, be they from simple surveys and citizen science or from sophisticated satellite and earth data capturing equipment, align with those of other researchers in relevant fields. One of the main challenges derives from unseen, or rather ignored, gaps in knowledge that arise from working in silos or on a fixed agenda. It is, therefore, important that those responsible for data processing, modelling and dissemination of research findings at institutional, national, regional or global levels are well acquainted with the relevant tools and techniques. This is mainly because the variation in conclusions reached by different studies, as a result of data randomness, needs harmonisation.

One manifestation of big data is the SDG project, the complex interactions of which present both a challenge and an opportunity to researchers and decision makers across fields and sectors. The training for this research aligns with the 8th SDG, which focuses on sustained, inclusive, and sustainable economic growth, full and productive employment, and decent work for all. This goal is greatly advanced by professional development training programs in big data analytics. The abilities and knowledge gained from such programs are crucial for utilising data to power economic growth and job creation in a future that is increasingly data-driven. These programs give people the skills necessary to evaluate and draw conclusions from large datasets, enabling them to make valuable contributions to sectors that drive growth. Big data analytics may also support evidence-based decision-making, enhance resource allocation, and improve operational efficiency, all of which contribute to the sustainability of economic growth. Taken together, the 8th SDG and professional development in big data analytics highlight the vital role that technology and data-driven skills will play in ensuring a prosperous and sustainable future for all [15], which is the main idea of this study.

1.2. Background

Big Data and related concepts are becoming household terms across disciplines and sectors as we seek to investigate massive data using machine learning algorithms to learn rules and guide our decision-making processes [16,17]. While Big Data is defined as datasets larger than what can be handled by standard data management and analysis tools and procedures, Big Data Analytics refers to the methods and technologies that an organisation can use to analyse complex, large-scale data for a range of applications aimed at enhancing the organisation’s performance across the board [18]. Large-scale data production typically outpaces the capacity of analytical techniques and computing power, which is why big data analytics is posing both new challenges and possibilities across a range of industries. Numerous methods aim to alter corporate decision-making, enhance the effectiveness of processes, identify prospective areas for future innovation, and include citizens in the formulation, planning, and implementation of policies. However, some businesses are still reluctant to embrace the higher scale of data availability and analytics associated with big data, despite the fact that its benefits are becoming more widely acknowledged [17].

Research has indicated that a critical factor in deciding on a successful and long-lasting Big Data deployment is readiness, which hinges on a good understanding of the procedures, tools and methods to be adopted [19]. Readiness is impacted by various elements, such as organisational alignment with the Big Data strategy, organisational competences to effectively use Big Data, and organisational maturity level with regard to e-governance. One of the main obstacles affecting the implementation of big data analytics is the lack of skilled technologists who can handle the dynamism of big data. The presence of skilful managers with a good understanding of the processes, tools and techniques relating to big data analytics is critical to the performance of any organisation [20]. Consequently, a key outcome of any big data analytics course should be to deliver two-dimensional skills: soft skills (including decision-making, creative and communication skills) and hard skills (information skills, technical and analytical skills) [21,22].

The present study examines responses from a sample of trainees regarding their self-assessment and perceptions about a data analytics training course, and the study is designed to reflect the different roles employees within an organisation play in decision-making. We set off from the premises that (1) training and upskilling prepare future leaders and (2) training in data analytics has a particularly far-reaching impact across the SDG spectrum. It is in this context that we relate staff training to the entire SDG project, as the impact of training goes beyond SDG #8. We do so by assessing the effectiveness of data analytics training on employees at a corporate workplace and by identifying the key factors that impinge on knowledge acquisition capabilities among staff who routinely generate and consume corporate data.

1.3. Research Questions and Objectives

We investigate factors affecting the development and the effectiveness of a training program in Big Data Analytics using Python, one of the most popular and efficient open-source tools available today [23,24]. The investigation also seeks to fill knowledge gaps under constrained distributional assumptions. Training focused on Python and its final assessment was based on the following learning outcomes.

Understand and use Python data science libraries.
Write Python code for statistical analysis.
Create visualisation graphics using Python libraries.

This research is designed to engage trainees in identifying the key factors that potentially influence their performance in routine data analysis tasks in the workplace. It is predicated on the deviations between “observed” and “expected” data attributes (see Section 2.2.1 and Section 2.2.2) based on established associations between selected attributes of interest, such as the level of education and the trainees’ personal assessment of their experience with Big Data. The following three research questions are therefore set and examined in the foregoing context, and they focus on “training effectiveness”.

Do personal characteristics affect employees’ performance in data analytics training?
Do professional characteristics affect employees’ performance in data analytics training?
Do prior knowledge and skills affect employees’ performance in data analytics training?

Answers to the foregoing questions are likely to cast light on the coherence of the training lifecycle among the course participants. The main aim of the study aligns with these questions, which we seek to address via the following objectives.

To visualise the distributional behaviour of responses.
To assess visual objects with respect to questions.
To explore associations between variables.
To test and interpret the associations.
To highlight new directions for data visualisation.

2. Methods

This section describes the research process—an integrated systematic data collection and analysis approach. It is divided into two main sub-sections, data sources and modelling strategy. The strategy is to elicit trainees’ perceptions of the program, from a survey, and visually use them to assess the effectiveness of the training program. The choice of the exact analysis method is conditional on the research questions in Section 1.3.

2.1. Data Sources

Data was obtained from an online survey of attendees of a three-day training program in Big Data Analytics using Python, drawn from different government departments in the United Arab Emirates (UAE). A total of 87 responses were recorded on questions in three categories, as illustrated in Figure 1. The technical prowess of the trainees was assessed via a final exam, scored out of 60. A summary of the data attributes is presented in Table 1.

2.2. Modelling Strategy

The three research questions in Section 1.3 all allude to inter-dependencies between variables. Further, as only one variable in Table 1 is non-categorical, the modelling strategy can be confined to exploratory data analysis (EDA) and inter-variable associations. The latter deploys two closely related methods: the Chi-square test [6] and correspondence analysis [7]. As graphically illustrated in Figure 1, the strategy was to combine the three characteristics to elicit trainees’ perception of the program and use the data to assess the effectiveness of the training and hence address the research questions in Section 1.3. The statistical package R [25] was used as the main tool for the analyses.

2.2.1. Deviations Between Observed and Expected Values

One popular technique in survey research is the two-way contingency table—a matrix presentation of the frequency distribution of two variables, the independence of which is being measured. Table 2 exhibits the distribution of the trainees in accordance with the two variables—Education and Big Data, as described in Table 1. We can use the contents of the two-way contingency Table 2 to assess the association between the two variables.

Since $\sum_{i=1}^{k} n_{i} = N = 87$, where $k = 20$ is the combined number of groups/cells and $n_{i}$ is the number of cases in cell $i$, we can compute the expected value for each of the cells if we hypothesise probability values $p_{i}$ for each cell. If the hypothesis is true, then the deviation between each observed count and its expected value, $(n_{i} - N p_{i})$, should be small, which is why the deviations between the observed and expected values are of interest. A test statistic based on the squares of these deviations, weighted by the reciprocals of their expected values, is defined as

$$X^{2} = \sum_{i=1}^{k} \frac{[n_{i} - E(n_{i})]^{2}}{E(n_{i})} = \sum_{i=1}^{k} \frac{[n_{i} - N p_{i}]^{2}}{N p_{i}} \qquad (1)$$

It can be shown that as $N \to \infty$, $X^{2}$ approximately possesses a Chi-square probability distribution in repeated sampling [6,26], defined as the weighted average

$$\tilde{\chi}^{2} = \frac{1}{d} \sum_{k=1}^{n} \frac{(O_{k} - E_{k})^{2}}{E_{k}} \qquad (2)$$

where $O_{k}$ and $E_{k}$ are the observed and expected counts, respectively. Equation (2) represents a test of independence to determine whether two categorical variables are statistically related or not. Establishing an association between two attributes of interest, such as the level of education and the trainees’ personal assessment of their experience with Big Data, may contribute towards answers to the questions in Section 1.3. The test sorts the data according to the variables being tested and “tests the hypothesis that there is no relationship between them” by comparing the “actual counts” from the sample data with the “expected counts”, given that the null hypothesis of no relationship is true. That is, Pearson’s contribution to the Chi-square criterion for each cell $(i, j)$ is based on the standardised difference

$$d_{ij} = \frac{f_{ij} - e_{ij}}{\sqrt{e_{ij}}} \qquad (3)$$

where $f_{ij}$ and $e_{ij}$ are the observed and expected counts. The expected count in each cell is defined as

$$\text{Expected Count} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Total Count of the Sample}} \qquad (4)$$

Equation (4) is an “average” measure of the opinion of the trainees in the collected sample, and it generalises our findings on how the trainees felt about the course. Typically, the cell counts should not be too small for the Chi-square to provide an adequate approximation to the theoretical distribution. While most applications require that this number be 5 or higher, we can allow it to be as low as one [27], as we pay attention to issues of data randomness [2].
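The quantities in Equations (1), (3) and (4) can be sketched in a few lines of Python. This is an illustrative sketch only: the study itself used R, the function name `chi_square_summary` and the 2 × 2 counts are hypothetical, and the code assumes every row and column total is positive. It also reports the share of cells whose expected count falls below 5, the quantity on which the rule of thumb above depends.

```python
import math


def chi_square_summary(observed):
    """Expected counts (Eq. 4), Pearson residuals (Eq. 3) and X^2 (Eq. 1)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = row total * column total / sample size.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Signed standardised deviation of each observed count.
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    # The test statistic is the sum of squared residuals.
    statistic = sum(d * d for row in residuals for d in row)
    cells = [e for row in expected for e in row]
    small_share = sum(e < 5 for e in cells) / len(cells)
    return expected, residuals, statistic, small_share


# Hypothetical 2 x 2 counts, not taken from Table 2.
toy_table = [[10, 20], [30, 40]]
expected, residuals, statistic, small_share = chi_square_summary(toy_table)
print(expected, round(statistic, 4), small_share)
```

For an r × c table the statistic would normally be referred to a Chi-square distribution with (r − 1)(c − 1) degrees of freedom.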

2.2.2. Correspondence Analysis

We shall also use correspondence analysis [7] to visualise associations between different categories of selected data attributes in two-dimensional space. The goal, in this case, is to establish an association between some row elements and some column elements, an exercise that generates orthogonal components, with maximisation of variation in the data in mind. Let $M$ denote a matrix of dimension $r \times c$, from which we can compute a set of row and column weights, each with its associated diagonal scaling matrix,

$$\begin{aligned} w_{r} &= \frac{1}{c_{M}} M \mathbf{1}, &\quad \mathrm{diag}\left[\frac{1}{\sqrt{w_{r}}}\right] \\ w_{c} &= \frac{1}{c_{M}} \mathbf{1}^{T} M, &\quad \mathrm{diag}\left[\frac{1}{\sqrt{w_{c}}}\right] \end{aligned} \qquad (5)$$

where $\mathbf{1}$ is a vector of ones and $c_{M}$ is the sum of the contents of $M$ across rows and columns, defined as

$$c_{M} = \sum_{i=1}^{r} \sum_{j=1}^{c} M_{ij} \qquad (6)$$

From Equations (5) and (6), we can obtain the same two-way table transformed into proportions, $\Pi = \frac{M}{c_{M}}$. Correspondence analysis is an extension of principal component analysis [26] and, as defined above, it presents mechanics quite similar to those of the Chi-square, except that it provides a more intuitive graphical visualisation.
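One common way of carrying out these steps numerically is through a singular value decomposition of the standardised residuals of the proportions table. The NumPy sketch below follows that textbook route and is not necessarily the routine the authors ran in R. The function name `correspondence_analysis` and the 3 × 3 counts are hypothetical, and the code assumes a table with positive margins that departs from exact independence.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple correspondence analysis of a two-way table via SVD."""
    M = np.asarray(table, dtype=float)
    c_M = M.sum()                 # grand total, as in Equation (6)
    P = M / c_M                   # table of proportions
    w_r = P.sum(axis=1)           # row weights (masses)
    w_c = P.sum(axis=0)           # column weights (masses)
    # Standardised residuals: deviation from independence, scaled
    # by the square roots of the row and column weights.
    independence = np.outer(w_r, w_c)
    S = (P - independence) / np.sqrt(independence)
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sigma ** 2          # variation carried by each dimension
    row_coords = (U * sigma) / np.sqrt(w_r)[:, None]
    col_coords = (Vt.T * sigma) / np.sqrt(w_c)[:, None]
    share = inertia / inertia.sum()
    return row_coords, col_coords, inertia, share


# Hypothetical 3 x 3 counts (illustrative only, not the survey data).
toy = [[20, 5, 2], [6, 15, 4], [3, 7, 25]]
rows, cols, inertia, share = correspondence_analysis(toy)
print(np.round(share, 3))
```

The total inertia equals the Chi-square statistic divided by the sample size, which is one way of seeing the close link between the two methods. The entries of `share` correspond to the percentages of variation reported per dimension on correspondence plots.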

3. Findings and Analyses

In this section, we analyse, present and discuss descriptive and inferential findings from the survey based on the methods in Section 2. It is structured according to the objectives set in Section 1.3, and the results presented here have been selected to closely reflect those objectives, although not all combinations of attributes have been used.

3.1. Visualisation of Variables

The survey was conducted with the aim of uncovering answers to questions relating to how in-service employees respond to data analytics training. This section presents visual graphics of several variables used in framing the questionnaire. The results are based on the variables representing the survey questions; they are selected to reflect the study motivation, and they are strategically aligned with the objectives in Section 1.3.

The left-hand side bar plots in Figure 2 exhibit a disproportionately higher level of representation by teachers in the course than from any other sector—almost a third of the total. The same imbalance caused by those with a degree in the arts is exhibited on the right-hand side panel. With the data sampled from a public university, this is not necessarily surprising. The issue needing attention here relates to categorisation and it is imperative to raise some key questions. For instance, the job category of “Engineer” does not have anyone from the sector of “Education”. It is also likely that some of those categorised as “Accountant” might have links to education. The same applies to the categorisations of technology and sciences. These are fundamental issues that we need to pay attention to and address, as they hinge on the related concepts of masking and swamping [8,9] and may lead to information loss.

The left-hand side pie chart in Figure 3 exhibits the distribution, in percentage, of how trainees felt about the need for the course, and the one on the right-hand side shows how they felt upon completion of the course. The two plots indicate that trainees had similar feelings about the course before and after; that is, the proportion who felt they needed the course aligned almost perfectly with the proportion who felt they had benefited. Likewise, the two panels in Figure 4 provide visualisation of three variables: education level on the horizontal axis, the test results on the vertical, and the colour-coded responses on how the trainees felt before and after the training. The points are jittered, that is, plotted with random variation added to the location of each point, which is a very useful way of avoiding over-plotting that might arise from discreteness in smaller datasets. In R, the jitter is specified in both width and height, and it is added as ± the specified value in both directions, such that the total spread is twice the specified value. Scores of “excellent” and “very good” are almost equally distributed between high and low performers, which might be attributed to the low performers’ recognition that they had learnt something useful.
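The jittering described above can be illustrated with a short Python sketch. It mimics the ± behaviour by drawing a uniform offset for each point and is not the R routine used for the figures. The function name `jitter`, the `amount` parameter and the coded education levels are hypothetical.

```python
import random


def jitter(values, amount, seed=None):
    """Add uniform noise in [-amount, +amount] to each value."""
    rng = random.Random(seed)
    # Total spread around each original value is 2 * amount.
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical discrete education levels coded 1 to 3.
levels = [1, 1, 1, 2, 2, 3]
jittered = jitter(levels, amount=0.2, seed=42)
print([round(x, 3) for x in jittered])
```

Points that share the same discrete level are spread over a band of width 0.4 around it, so overlapping markers become distinguishable without changing their category.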

Figure 5 presents similar results to those in Figure 4, but with respect to prior exposure to coding and Big Data concepts. The left-hand side panel shows responses from different job categories in relation to exposure to coding and how they felt about the course. The right-hand side panel does the same with respect to exposure to Big Data concepts. Interestingly, the course appears to have attracted more teachers and administration staff than it did engineers, accountants and those in the services industry. Such patterns are interesting as they reflect potential gaps in data analytics skills within some fields and they can also be used as inputs to intervention programs. Each of the plots in this section has an animated version, providing a three-variable combination that helps visualise the relationships in a two-dimensional space, and they can provide data handling and modelling insights within the institution.

3.2. Assessing Variable Independence

The fact that the data in Table 1 were sampled from a single country implies a lack of representativeness, particularly given the categorisation issues raised above. This is a challenge that hinges on data randomness, and it can only be addressed through interdisciplinarity and unification of concepts [2], which means our survey data can be used as a baseline to measure and establish our benchmark for comparative analysis across different categories. Since the research questions in Section 1.3 allude to associations among data attributes, we can examine the associations among some of the key attributes among individuals. We can use the Cohen-Friendly association plots [28] to visually examine the deviations from independence between different aspects of the rows and columns of data in Table 1.

Figure 6, for instance, exhibits associations between different job categories and exposure to Big Data concepts. Each cell in the plots is represented by a rectangle whose signed height is proportional to the cell’s contribution to Pearson’s Chi-square, that is, the residual in Equation (3). The width is proportional to the square root of the expected frequency, such that the area of the box is proportional to the difference between the observed and expected frequencies. The rectangles are positioned relative to a baseline that indicates independence, where the residual is zero. If the observed frequency of a cell is greater than the expected one, the box rises above the baseline; otherwise, the box falls below the baseline and is shaded a different colour, depending on the specification in the code. The closer the observed and expected frequencies get, the smaller the residual, with zero implying that the two attributes are not related. In this case, engineers, programmers and researchers had an excellent exposure to Big Data concepts, with accountants and administrative staff exhibiting a “good” level of exposure, which hypothetically separates employees with strong “numerical literacy” from those without.
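The geometry of one rectangle in a Cohen-Friendly association plot can be checked with a small Python sketch. It follows the construction documented for R's `assocplot`, in which the signed height is the Pearson residual and the width is the square root of the expected frequency. The function name `assoc_rectangle` and the counts are hypothetical.

```python
import math


def assoc_rectangle(observed, expected):
    """Height, width and signed area of one association-plot rectangle."""
    height = (observed - expected) / math.sqrt(expected)  # Pearson residual
    width = math.sqrt(expected)
    # The area reduces to observed - expected.
    return height, width, height * width


# Hypothetical cell: 10 observed against 12 expected.
height, width, area = assoc_rectangle(10, 12)
print(round(height, 3), round(width, 3), round(area, 3))
```

A negative height places the box below the baseline, which is the case whenever the observed count is smaller than the expected one.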

Figure 7 shows similar associations but between job categories and test scores. As in Figure 6, the lowest variation is observed among engineers and quite a high variation is observed among administrative staff and programmers. The latter might be attributed to the quotient space theory of problem solving [29,30], suggesting that qualified engineers might be more rigorous in problem tackling than programmers. There have been conflicting findings on this stance, with previous research attesting that those with prior knowledge of programming tend to perform worse because of their predispositions that do not allow them to open up to new knowledge [31,32]. Overall, the study establishes a negative relationship between prior programming knowledge and performance.

3.3. Correspondence Analyses

As noted earlier, the strategy in Section 2.2.2 leads to visual associations between different data variables in two-dimensional space, generating orthogonal components and maximising variation in data. Figure 8 exhibits the correspondence between different job categories and their prior exposure to Big Data concepts. Both axes measure the levels of variation in the data, with the extreme left of the horizontal axis representing the negative measure and the extreme right, the most positive measure, similarly for the south and north directions of the vertical.

In this case, the highest variation (62.7%) is accounted for by the first component and it can be seen that engineering stands out here, in terms of mapping to prior exposure to Big Data concepts. The most interesting thing here is to know the contributing points to the solution provided by the method. Being farther away from the origin means being more closely associated with the factors in the proximity. In the figure, it can be seen that while all categories are farther away from the origin, none, other than engineering, is close to any factor. This implies that prior exposure to Big Data concepts does not correspond to any of the job categories, other than engineering.

Table 3 presents the row points most associated with the first principal dimension (PD), and Table 4 presents the column points associations. In both cases, the values are sorted by their coordinates. Secretaries appear to be most negatively associated with the principal components, at −0.8999352, while the association with engineers exhibited the highest positive value at 1.0839431, followed by programmers at 0.6301034. Research had the highest row points association 0.612429087 with the second PD, followed by engineering at 0.294461740. For the columns, it was, again, “excellent” followed by “acceptable” at 0.6403924 and 0.3799449, respectively. The third PD, not plotted, is most influenced by programmers at 0.25298405 on “very good” at 0.651543419. These are quite interesting patterns, for the organisation to investigate, as they potentially indicate commonalities between engineers, researchers and programmers.

Figure 9 presents the correspondence between job categories and decision roles held. It shows total variation due to the first PD as just over 46%, with engineering lying close to the top end of decision-making roles ‘good’ and ‘excellent’, and research, programming, and administration close to the origin and close to ‘good.’ Table 5 shows that secretaries have the highest negative influence on the first principal dimension (PD) at −1.562 followed by services at −0.807, while engineering presents the highest positive influence at 0.629 followed by administration at 0.505. The column associations presented in Table 6 show that “excellent” exerts the highest positive influence in the formation of the PD at 0.884, with the highest negative being “acceptable” at 0.705. For the 2nd PD row points, engineering exerts the highest influence (1.521), then secretaries (0.644), while “excellent” tops the column points associations at 1.007 followed by “acceptable” at 0.257. The third dimension (not plotted) row points for accountants was −0.635 and 0.503 for programmers, while the column points associations were “excellent” at −0.179 and “very good” at 0.433.

Figure 10 represents correspondence analysis between performance and job categories. Engineering and research are placed quite far from the origin but within close proximity of a high score of 42/60 = 70%.

Table 7 exhibits the row points’ associations with the first PD, with the score of 42/60 = 70% standing out as the highest contributor to its formation at 1.39091464 and 48/60 = 80%, representing the lowest negative at −0.46798738. The highest in the column points (Table 8) is the category of secretaries, at 2.8449866, followed by research at 0.7198203. The row points’ associations for the second PD are the scores 24/60 = 40% and 48/60 = 80%, representing the highest negative and positive at −1.09615651 and 0.50675403, respectively. The column point associations for the second PD are “Accountants” at −1.42723221 (the highest negative) and “Services” at 0.77067359 (the highest positive contributor).

Figure 11 presents a correspondence analysis between performance and decision roles held within the institution. The first and second PDs account for 80% of the total variation in the data. The decision roles individuals hold in an organisation are crucial to its performance, hence the importance of mapping them. Table 9 exhibits the row point associations with the first dimension, the highest being 0.711 in the positive direction and −0.505 in the negative direction. Table 10 provides similar data for the column point associations, with “excellent” at 0.986 and “weak” at −0.775, respectively. For the second dimension, the extremes for the row associations are the 60/60 = 100% and 30/60 = 50% scores, at 0.429 and −0.834 in the positive and negative directions, respectively. The third dimension, not plotted, accounts for 16.5%, with row points ranging from −0.324 for the maximum score of 60/60 = 100% to 0.338 for the lowest score of 18/60 = 30%.

In each of the plots in this section, a pair of data attributes from Table 1 is plotted simultaneously. Some of the relationships are positive, some are negative and some are non-existent. Values near zero are largely irrelevant to the interpretation, since they indicate little association. Because of its straightforward geometric interpretation, correspondence analysis is a very useful way of showing relationships between variables that other graphical methods cannot offer. It also helps to achieve the research objectives.
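
As a rough illustration of how principal dimensions and their shares of total variation (inertia) of the kind discussed in this section can be obtained, the following Python sketch computes a simple correspondence analysis through a singular value decomposition of the standardised residuals of a contingency table. The counts, the labels and the function name `correspondence_analysis` are hypothetical and are not taken from the study's data. The sketch assumes that NumPy is available and that no row or column of the table is empty. Software packages may scale coordinates differently (for example, standard rather than principal coordinates), so values produced this way need not match those reported in the tables.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple correspondence analysis of a two-way contingency table.

    Returns principal row coordinates, principal column coordinates and
    the inertia (squared singular value) of each dimension.
    """
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()                # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)                # proportions expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U / np.sqrt(r)[:, None]) * sv
    col_coords = (Vt.T / np.sqrt(c)[:, None]) * sv
    return row_coords, col_coords, sv ** 2


# Hypothetical counts: rows are job categories, columns are rating levels.
jobs = ["Engineering", "Administration", "Services"]
ratings = ["Weak", "Good", "Excellent"]
counts = [[4, 9, 17],
          [8, 15, 12],
          [20, 10, 5]]

row_coords, col_coords, inertia = correspondence_analysis(counts)
share = inertia / inertia.sum()              # proportion of total inertia per dimension

for k in range(2):                           # a 3 x 3 table has at most 2 non-trivial dimensions
    print(f"Dimension {k + 1}: {100 * share[k]:.1f}% of total inertia")
for name, xy in zip(jobs, row_coords[:, :2]):
    print(f"{name:15s} {xy[0]: .3f} {xy[1]: .3f}")
for name, xy in zip(ratings, col_coords[:, :2]):
    print(f"{name:15s} {xy[0]: .3f} {xy[1]: .3f}")
```

In this sketch the inertias sum to the Pearson chi-square statistic divided by the grand total, which is why coordinates close to zero signal little departure from independence.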

4. Discussion

The foregoing analyses revealed that scores of “excellent” and “very good” are almost equally distributed between high and low performers, which might be attributed to low performers’ recognition that they had learned something useful. This pattern is consistent with prior work showing that perceived learning gains may not always align with prior technical skill, particularly when employees without strong computing backgrounds feel they have acquired new and useful competencies [33]. The training course attracted more teachers and administrative staff than engineers, accountants, and those in the services industry. Similar occupational imbalances have been observed in other studies, where employees in technical and engineering domains typically possess higher baseline exposure to data concepts compared with administrative or services personnel [34,35]. It was observed in the survey that previous knowledge about statistical methods of data analytics along with prior programming knowledge makes it easier to adapt. The engineering effect observed within the correspondence analysis was an interesting one, as it implies discipline clusters that may be used for a wider benefit to other departments within organisations. The study also found that engineers, programmers, and researchers had excellent exposure to Big Data concepts, with accountants and administrative staff exhibiting a “good” level of exposure, which hypothetically separates employees with strong “numerical literacy” from those without. Research in vocational education supports this distinction, indicating that foundational numeracy and literacy skills underpin data comprehension and analytics learning effectiveness [36]. The diversity of experiences among the trainees, as exhibited by their own assessments, highlights the urgency of adopting an interdisciplinary approach in problem solving. While such a level of collaboration aligns with SDG #17, it is pivotal in addressing the non-orthogonality characteristic of SDGs, hence fulfilling objectives #4 and #5. Given that the results are derived from statistical distribution theory, it is essential to recognize that certain results are challenging to understand or might not align with “current perceptions”, partly due to data randomness and other aspects of concept drift [2].

Communicating findings to the target audience can be challenging due to knowledge gaps across disciplines and core skills. In this particular application, we focused more on the visual outputs than on the actual Chi-square tests, as some of the statistical test conditions, such as having at least five cases in each cell, were not fulfilled. We therefore highlighted the impact of training by visualising paired variables. The results show that the training was beneficial for the vast majority of participants, with or without prior exposure to technical aspects of data analytics. The factor identified for the development and efficacy of data analytics was adaptability to open-source tools for data analysis. It is important to note that our findings must align with those of other researchers in relevant fields, which underpins objectives #4 and #5. Sharing data and findings at all levels of research should remain a priority for the SDG project. It is expected that our findings will not only feed into long-term strategic plans for the organisation that provided the data but will also provide empirical evidence for future SDG studies to build on.
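
The cell-count condition mentioned above for the Chi-square test can be checked with a few lines of Python. The sketch below is illustrative only: it uses the four Education rows of Table 2 as printed in this article (any further rows of the table are not included, so the totals are not the study's full totals), and the helper name `expected_counts` is hypothetical. The rule of thumb is commonly stated for expected counts, whereas the wording above ("cases in each cell") could also be read as observed counts, so both are reported.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Four Education rows of Table 2 as printed in the article;
# columns: Acceptable, Excellent, Good, Very Good, Weak.
observed = [
    [0, 0, 0, 0, 1],      # Diploma
    [16, 3, 10, 4, 25],   # First Degree
    [1, 0, 2, 0, 4],      # High School
    [2, 1, 9, 2, 6],      # Masters
]

expected = expected_counts(observed)
n_cells = sum(len(row) for row in observed)
low_observed = sum(o < 5 for row in observed for o in row)
low_expected = sum(e < 5 for row in expected for e in row)

print(f"Cells with observed count below 5: {low_observed} of {n_cells}")
print(f"Cells with expected count below 5: {low_expected} of {n_cells}")
```

For these rows most cells fall below the threshold on either reading, which is in line with the decision to rely on visual outputs rather than on formal tests.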

It is imperative for organisations to train their employees on the importance of processing information in accordance with the relevant legal frameworks, the significance of understanding the implications of such practice as well as the ability to apply the appropriate measures when incidents happen. Furthermore, enforcing ethical principles in the workplace through training programs is key to promoting responsible data processing. Big Data Analytics requires the use of sophisticated machine learning algorithms in all applications. However, such algorithms, if not handled ethically, could harm the industry through bias and discriminatory practices [37,38]. As part of training agendas, ethical topics such as confidentiality, informed consent, and avoiding disruptions, disclosure, or modification of information relating to individuals or organisations are of paramount importance for any employee to perform their duties efficiently. All that can only be achieved if different categories of employees are assessed on various aspects of understanding of the underlying key aspects of their working environment—technical and legal, in particular.

In addition to using the survey as the main research method to collect the data for the present work, the researcher realized that it was important to note the observations and comments made by the participants about their training course. Indeed, some participants highlighted different issues, which were not the main point of discussion in this research; however, given the paramount importance of such feedback, collected randomly and without prior planning, two main issues were seen as alarming and can potentially affect the trainees’ performance in their jobs. These were related to awareness of technical communication and understanding of the legal and ethical considerations when dealing with organisational data. Therefore, the following section provides insights into the importance of technical communicative skills in the workplace along with the legal and ethical reflections needed to perform the job successfully.

4.1. The Importance of Technical Communicative Skills in the Workplace

Technical communication proficiency has become a necessity in every field, with data, information, communication and technology being no exception. The aim of this paper was to explore ways of offering effective training to employees dealing with Big Data, and one conclusion was that proficiency in technical communication is essential [39]. Indeed, the communicative style used in Big Data Analytics contains terminologies, specific jargon and technical concepts that could be challenging for employees lacking technical communication skills. This might lead employees to either misinterpret the data or refrain from undertaking tasks related to those concepts. Organisations are therefore advised to consider mastering technical communication as a key criterion in the recruitment process and to provide efficient communication training tailored to the needs of employees.

Quite often, dealing with Big Data Analytics in different industries involves not only solid mathematical and statistical skills but also an understanding of analytics software and equipment. The latter usually come with manuals and documentation written in a foreign language, typically English, which some users may not be fully acquainted with. Therefore, having strong communication skills is important for employees, as it gives them the combination of attributes (subject knowledge as well as technical linguistic abilities) needed to perform their roles effectively [40]. In the same vein, having the ability to understand and use technical language enables employees to communicate their data, elaborate on complex concepts and draft their reports accurately and effectively for their senior managers and stakeholders. The transformative influence of novel technologies in different industries hinges on identifying relevant skills for employees to drive changes, i.e., identifying crucial skills, analysing challenges and proposing solutions [21]. To provide employees with efficient practice in the workplace, organisations are strongly advised to embed technical communication in their training programs wherever applicable. This could be done through the following:

Conducting needs analysis assessment to understand the specific linguistic needs and weaknesses among employees. This could be performed through initial tests in the recruitment process or prior to starting their jobs. Furthermore, organisations can incorporate AI-assisted training tools to determine the employees’ level and identify their areas of improvement. The outcomes of such an assessment could be used to inform people involved in creating training content, to design discipline-specific training programs relevant to Big Data Analytics.
Evaluating and updating training programs regularly to meet the rapidly evolving technological advancement and requirements for Big Data Analytics.
Recruiting expert facilitators and trainers in technical communication. The facilitators should be able to tailor the content of their materials to the Big Data field and its associated practices.
Designing generic and inclusive training materials to support employees. These materials should be specifically focused on the technical linguistic skills needed to support employees in performing their jobs successfully.
Providing employees with regular training in technical communication. These could include multiple levels from basic to advanced ones, aiming at meeting the needs of the diverse communities, particularly in organisations where employees have different linguistic and cultural backgrounds.

4.2. Legal and Ethical Considerations

Besides the subject knowledge, skills and technical linguistic abilities required in dealing with Big Data Analytics, organisations are required to raise their employees’ awareness regarding the importance of the legal as well as the ethical considerations when handling and processing data. Essentially, in today’s digital age, huge amounts of data are collected, used, shared, and stored; therefore, understanding and complying with local and international standards for a safe and lawful use of data is a must. Failure to do this can result in organisations facing severe financial and ethical issues [41]. Initially, when performing their tasks, Big Data Analytics specialists are usually faced with a significant amount of data collected from multiple sources concerning individuals and organisations from different parts of the globe. Therefore, employees are expected to adhere to local data privacy laws and regulations as well as international ones, such as the EU General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) and to be mindful of any laws set by given industries or organisations [42,43]. In this line of thought, it is worth mentioning that some legal aspects must be integrated into the training programs offered to employees, aiming at raising their awareness and eventually preventing any major risks when working with massive amounts of data.

Effective training should raise the employees’ awareness regarding issues such as data breaches, which refer to unauthorized access to information, usually through unsecure devices, outdated software, or through cyber-attacks [44]. Data breaches can significantly harm organisations and companies, leading to financial loss and eventually affecting the organisation’s reputation. In addition, attention should also be given to data privacy measures as Big Data Analytics often implies the processing of sensitive personal data including names, email addresses, browsing history, etc.; organisations should train their employees on the importance of processing information in accordance with the relevant legal frameworks, the significance of understanding the implications of such practice as well as the ability to apply the appropriate measures when incidents happen. Big Data Analytics requires the use of sophisticated machine learning algorithms in all applications. However, such algorithms, if not handled ethically, could harm the industry through bias and discriminatory practices [45]. As part of training agendas, ethical topics, such as confidentiality, informed consent, and avoiding disruptions, disclosure or modification of information relating to either individuals or organisations, are of paramount importance for any employee to perform their duties efficiently. Thus, enforcing ethical principles in the workplace through training programs is key to promoting responsible data processing [46].

5. Research Contribution and Implications

The scientific contribution of this work lies in addressing the critical intersection between human capital development and data management within contemporary organisational contexts, particularly in the modern era marked by the complexities of Big Data and artificial intelligence integration. Such an intervention is crucial for demonstrating value, justifying investment, and influencing future learning and development decisions. Engaging employees in strategies on how to handle internal and external organisational data is fundamental to any organisation [47,48].

With today’s rapidly evolving digital landscape, organisations face historically unparalleled volumes of structured and unstructured data, creating both transformative opportunities for competitive advantage and significant operational challenges that require advanced management strategies. Since organisational success and innovation are rooted in human capital, continuous evolution of employees’ knowledge repository and technical competence proves to be not only beneficial but downright crucial for sustainable growth and adaptability. This research recognizes that strategic workforce development in data management and literacy is a pivotal investment for organisations to thrive in an increasingly data-driven economy. The intervention strategies explored in this work serve several critical functions: they provide measurable metrics for assessing tangible value to stakeholders, establish solid justification for continued investment in human resource growth, and create evidence-supported frameworks for guiding future learning and development initiatives. In addition, by actively engaging employees in comprehensive strategies for managing both internal operational data and external market intelligence, organisations can foster a culture of data-driven decision-making that permeates all levels of the enterprise, ultimately reshaping the way institutions leverage information assets for achieving strategic objectives and maintaining a competitive edge in their respective markets.

6. Concluding Remarks

This research assessed the impact of data analytics training on in-service trainees using the open-source tool, Python. It sought to identify the factors affecting the development and effectiveness of a training program and how to effectively measure its output and impact on organisations. Designed to engage trainees in identifying the key factors that potentially influence their performance at the workplace, the research was predicated on the three research questions in Section 1.3. Deviations between “observed” and “expected” data attributes were used to assess associations between selected attributes of interest, leading to answers to the three questions. The impact of personal characteristics on the employees’ performance in Data Analytics training was not noticeable, but the impact of professional characteristics was, as was the impact of prior knowledge skills. The study establishes a negative relationship between prior programming knowledge and performance. While many prior studies [49,50] report a positive correlation between prior programming knowledge and performance, this study’s inverse relationship may indicate that experienced programmers found the course content too basic or unengaging, leading to lower post-training scores.

The study revealed that prior exposure to Big Data concepts does not correspond to any of the job categories, other than engineering. This observation aligns with previous studies that reported that Big Data familiarity tends to be concentrated among technical professionals, while cross-disciplinary integration of data analytics skills remains limited [34,51]. The use of association rules and correspondence analysis in identifying the “readiness” of employees for intensive data analysis at the workplace opens novel avenues in organisational change [52,53] away from predominantly standard statistical analysis, as evidenced by many studies [54]. It is expected that this research will significantly contribute to the current interdisciplinary literature across the SDG spectrum, enhancing SDG attainment by feeding into fields such as data science, education and human resource management.
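
For readers unfamiliar with association rules, the following Python sketch shows how three commonly used measures (support, confidence and lift) can be computed for a single rule of the form "job category implies level of Big Data exposure". The records, the rule and the function name `rule_metrics` are hypothetical, and this excerpt does not state which measures or software the study used, so the sketch should be read as a generic illustration rather than as the study's procedure. It assumes a non-empty list of records.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    def matches(record, items):
        return all(record.get(key) == value for key, value in items.items())

    n = len(records)
    n_a = sum(matches(rec, antecedent) for rec in records)
    n_c = sum(matches(rec, consequent) for rec in records)
    n_ac = sum(matches(rec, antecedent) and matches(rec, consequent)
               for rec in records)

    support = n_ac / n                              # share of records matching both sides
    confidence = n_ac / n_a if n_a else 0.0         # share of antecedent records matching the consequent
    lift = confidence / (n_c / n) if n_c else 0.0   # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical trainee records (not the study's data).
records = (
    [{"Job": "Engineering", "BigData": "Excellent"}] * 3
    + [{"Job": "Engineering", "BigData": "Good"}]
    + [{"Job": "Administration", "BigData": "Good"}] * 3
    + [{"Job": "Administration", "BigData": "Excellent"}]
    + [{"Job": "Services", "BigData": "Weak"}] * 2
)

support, confidence, lift = rule_metrics(
    records, {"Job": "Engineering"}, {"BigData": "Excellent"}
)
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.3f}")
```

A lift above 1 indicates that the consequent occurs more often among records matching the antecedent than in the data as a whole.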

Author Contributions

R.A.T.S.: Conceptualisation, data collection and curation, investigation, resources; K.S.M.: methodology, software, writing original draft, formal analysis; L.B.: writing—review and editing, visualisation; L.C.: project administration, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki. However, the protocol did not require approval from the ethical board as the data used in this study was initially collected as standard course feedback following a training session, which is routine educational practice. The decision to analyse this feedback data for research purposes was made after the collection process. The data was completely anonymized, with no sensitive personal information collected. The study did not involve any interventions, and participants were not subjected to any procedures beyond normal educational practices. All feedback was voluntary and the analysis focused on improving educational delivery and understanding the experiences of the participants.

Informed Consent Statement

Prior to the feedback collection, participants were provided with verbal information about the voluntary nature of the process and their right to decline participation. Informed verbal consent was secured from all participants, who were explicitly made aware that their contribution to the feedback session was discretionary and contingent upon their willingness to share their perspectives about the training experience.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors are grateful to their respective institutions and departments for the time they were provided to accomplish this work. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflicts of Interest

The authors declare no conflicts of interest. There are no circumstances or interests that may be perceived as inappropriately influencing the representation or interpretation of reported research results.

References

1. Mwitondi, K.; Mak, H.W.L. Robust Machine Learning Algorithmic Rules for Detecting Air Pollution in the Lower Parts of the Atmosphere. Data Sci. J. 2025, 24, 27.
2. Mwitondi, K.S.; Said, R.A. Dealing with Randomness and Concept Drift in Large Datasets. Data 2021, 6, 77.
3. Mwitondi, K.; Munyakazi, I.; Gatsheni, B. A robust machine learning approach to SDG data segmentation. J. Big Data 2020, 7, 97.
4. Buneman, P.; Jajodia, S. (Eds.) SIGMOD ’93: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data; Association for Computing Machinery: New York, NY, USA, 1993.
5. Agrawal, R.; Imieliński, T.; Swami, A. Mining Association Rules Between Sets of Items in Large Databases. SIGMOD Rec. 1993, 22, 207–216.
6. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1900, 50, 157–175.
7. Hirschfeld, H.O. A Connection between Correlation and Contingency. Math. Proc. Camb. Philos. Soc. 1935, 31, 520–524.
8. Bendre, S.M. Masking and swamping effects on tests for multiple outliers in normal sample. Commun. Stat.—Theory Methods 1989, 18, 697–710.
9. Lawrence, A.J. Deletion Influence and Masking in Regression. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 181–189.
10. Van Dijk, J. The Digital Divide; John Wiley & Sons: Hoboken, NJ, USA, 2020.
11. Henry, L. Bridging the Urban-Rural Digital Divide and Mobilizing Technology for Poverty Eradication: Challenges and Gaps. GSM Assoc. 2019. Available online: https://www.un.org/development/desa/dspd/wp-content/uploads/sites/22/2019/03/Henry-Bridging-the-Digital-Divide-2019.pdf (accessed on 27 September 2025).
12. Restuccia, D.; Taska, B. Different skills, different gaps: Measuring and closing the skills gap. In Developing Skills in a Changing World of Work; Rainer Hampp Verlag: Baden-Baden, Germany, 2018; pp. 207–226.
13. SDG. Sustainable Development Goals. 2015. Available online: https://www.un.org/sustainabledevelopment/sustainable-development-goals/ (accessed on 15 September 2025).
14. SDGI. Sustainable Development Goals Indicators. 2017. Available online: https://unstats.un.org/sdgs/indicators/database/ (accessed on 15 September 2025).
15. Yadav, P.; Tudela, L.A.M.; Marco-Lajara, B. The role of AI in assessing and achieving the sustainable development goals (SDGs). In Issues of Sustainability in AI and New-Age Thematic Investing; IGI Global Scientific Publishing: Palmdale, PA, USA, 2024; pp. 1–17.
16. Monino, J.L. Data value, big data analytics, and decision-making. J. Knowl. Econ. 2021, 12, 256–267.
17. Li, L.; Lin, J.; Ouyang, Y.; Luo, X.R. Evaluating the impact of big data analytics usage on the decision-making quality of organizations. Technol. Forecast. Soc. Change 2022, 175, 121355.
18. Shi, Y. Advances in big data analytics. Adv. Big Data Anal. 2022, 10, 978–981.
19. Prakash, D. Data-driven management: The impact of big data analytics on organizational performance. Int. J. Glob. Acad. Sci. Res. 2024, 3, 12–23.
20. Franke, F.; Hiebl, M.R. Big data and decision quality: The role of management accountants’ data analytics skills. Int. J. Account. Inf. Manag. 2023, 31, 93–127.
21. Johnson, M.; Jain, R.; Brennan-Tonetta, P.; Swartz, E.; Silver, D.; Paolini, J.; Mamonov, S.; Hill, C. Impact of big data and artificial intelligence on industry: Developing a workforce roadmap for a data driven economy. Glob. J. Flex. Syst. Manag. 2021, 22, 197–217.
22. Xu, L.; Zhang, J.; Ding, Y.; Sun, G.; Zhang, W.; Philbin, S.P.; Guo, B.H. Assessing the impact of digital education and the role of the big data analytics course to enhance the skills and employability of engineering students. Front. Psychol. 2022, 13, 974574.
23. Navlani, A.; Fandango, A.; Idris, I. Python Data Analysis: Perform Data Collection, Data Processing, Wrangling, Visualization, and Model Building Using Python; Packt Publishing Ltd.: Birmingham, UK, 2021.
24. Congedo, L. Semi-Automatic Classification Plugin: A Python tool for the download and processing of remote sensing images in QGIS. J. Open Source Softw. 2021, 6, 3172.
25. The-R-Foundation. The R Project for Statistical Computing. 2022. Available online: https://www.r-project.org/ (accessed on 4 December 2024).
26. Pearson, K. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
27. Cochran, W.G. The χ² Test of Goodness of Fit. Ann. Math. Stat. 1952, 23, 315–345.
28. Cohen, A. On the graphical display of the significant components in a two-way contingency table. Commun. Stat.—Theory Methods 1980, A9, 1025–1041.
29. Zhang, R.; Jayawardene, V.; Indulska, M.; Sadiq, S.; Zhou, X. A Data Driven Approach for Discovering Data Quality Requirements. In Proceedings of the ICIS—Decision Analytics, Big Data and Visualisation, Auckland, New Zealand, 14–17 December 2014.
30. Zhang, P.; Xiong, F.; Gao, J.; Wang, J. Data quality in big data processing: Issues, solutions and open problems. In Proceedings of the 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), San Francisco, CA, USA, 4–8 August 2017; pp. 1–7.
31. Bringula, R.; Reguyal, J.J.; Tan, D.D.; Ulfa, S. Mathematics self-concept and challenges of learners in an online learning environment during COVID-19 pandemic. Smart Learn. Environ. 2021, 8, 22.
32. Alexandron, G.; Ruipérez-Valiente, J.A.; Lee, S.; Pritchard, D. Evaluating the Robustness of Learning Analytics Results Against Fake Learners. In European Conference on Technology Enhanced Learning; Springer International Publishing: Cham, Switzerland, 2018.
33. Lee, D.M.; Pliskin, N.; Kahn, B. The relationship between performance in a computer literacy course and students’ prior achievement and knowledge. J. Educ. Comput. Res. 1994, 10, 63–77.
34. Li, G.; Yuan, C.; Kamarthi, S.; Moghaddam, M.; Jin, X. Data science skills and domain knowledge requirements in the manufacturing industry: A gap analysis. J. Manuf. Syst. 2021, 60, 692–706.
35. Tambe, P.B. Reskilling the Workforce for AI: Domain Expertise and Algorithmic Literacy. Manag. Sci. 2025.
36. Falk, I.; Millar, P. Literacy and Numeracy in Vocational Education and Training: Review of Research; National Centre for Vocational Education Research: Adelaide, Australia, 2001.
37. Bengio, Y.; Hinton, G.; Yao, A.; Song, D.; Abbeel, P.; Darrell, T.; Harari, Y.N.; Zhang, Y.Q.; Xue, L.; Shalev-Shwartz, S.; et al. Managing extreme AI risks amid rapid progress. Science 2024, 384, 842–845.
38. Schuett, J. Three lines of defense against risks from AI. AI Soc. 2025, 40, 493–507.
39. Nikou, S.; De Reuver, M.; Mahboob Kanafi, M. Workplace literacy skills—How information and digital literacy affect adoption of digital technology. J. Doc. 2022, 78, 371–391.
40. Balusamy, B.; Kadry, S.; Gandomi, A.H. Big Data: Concepts, Technology, and Architecture; John Wiley & Sons: Hoboken, NJ, USA, 2021.
41. Maor, O. Bridging legal methodology and ethical considerations: A Novel Approach Applied to challenges of Data Harvesting. Digit. Soc. 2025, 4, 1.
42. Mallet, P. Comparative Analysis of Data Privacy Legislation: Convergence and Divergence Between the GDPR and CCPA. In Tech Fusion in Business and Society; Springer: Berlin/Heidelberg, Germany, 2025; pp. 465–475.
43. Huang, M.L. Digital Privacy in the Age of Surveillance: A Comparative Study of GDPR and CCPA. OTS Can. J. 2025, 4, 65–74.
44. Nicolás-Agustín, Á.; Jiménez-Jiménez, D.; Maeso Fernandez, F.; Di Prima, C. ICT training, digital transformation and company performance: An empirical study. Eur. J. Innov. Manag. 2025, 28, 1687–1708.
45. Koshiyama, A.; Kazim, E.; Treleaven, P.; Rai, P.; Szpruch, L.; Pavey, G.; Ahamat, G.; Leutner, F.; Goebel, R.; Knight, A.; et al. Towards algorithm auditing: Managing legal, ethical and technological risks of AI, ML and associated algorithms. R. Soc. Open Sci. 2024, 11, 230859.
46. Nguyen, A.; Ngo, H.N.; Hong, Y.; Dang, B.; Nguyen, B.P.T. Ethical principles for artificial intelligence in education. Educ. Inf. Technol. 2023, 28, 4221–4241.
47. Jeske, D.; Calvard, T. Big data: Lessons for employers and employees. Empl. Relations Int. J. 2020, 42, 248–261.
48. Griffin, R.W.; Phillips, J.M.; Gully, S.M.; Creed, A.; Gribble, L.; Watson, M. Organisational Behaviour: Engaging People and Organisations; Cengage AU: South Melbourne, Australia, 2023.
49. Smith IV, D.H.; Hao, Q.; Jagodzinski, F.; Liu, Y.; Gupta, V. Quantifying the effects of prior knowledge in entry-level programming courses. In Proceedings of the ACM Conference on Global Computing Education, Chengdu, China, 17–19 May 2019; pp. 30–36.
50. Stoenoiu, C.E.; Jäntschi, L. Connecting the Computer Skills with General Performance of Companies—An Eastern European Study. Sustainability 2024, 16, 10024.
51. Chen, C.h. Influence of employees’ intention to adopt AI applications and big data analytical capability on operational performance in the high-tech firms. J. Knowl. Econ. 2024, 15, 3946–3974.
52. Aakula, A.; Saini, V.; Ahmad, T. The Impact of AI on Organizational Change in Digital Transformation. Internet Things Edge Comput. J. 2024, 4, 75–115.
53. Schwaeke, J.; Gerlich, C.; Nguyen, H.L.; Kanbach, D.K.; Gast, J. Artificial intelligence (AI) for good? Enabling organizational change towards sustainability. Rev. Manag. Sci. 2025, 19, 3013–3038.
54. Sarker, I.H. Machine learning for intelligent data analysis and automation in cybersecurity: Current and future prospects. Ann. Data Sci. 2023, 10, 1473–1498.

Figure 1. Implementation framework.

Figure 2. Distribution of trainees based on different job categories, sector, work and education.

Figure 3. Percentage distribution of trainees on how they needed the training and how they felt afterwards.

Figure 4. Distribution of results based on course needs and benefit assessments.

Figure 5. Distribution of results based on coding and Big Data exposure.

Figure 6. Associations between job categories and exposure to Big Data concepts.

Figure 7. Associations between job categories and test score.

Figure 8. Correspondence analysis of the relationship of job categories to Big Data.

Figure 9. Correspondence analysis of the relationship between job categories and decision roles held.

Figure 10. Correspondence analysis of performance and job categories.

Figure 11. Correspondence analysis of performance and decision roles.

Table 1. Description and relevance of variables.

Variable	Description and Relevance
Age	Age (in years) of respondent
Gender	Gender of respondent
Nationality	Nationality of respondent, binarized to UAE and non-UAE (Expat)
Experience	Working experience of respondent (in years)
Education	Education level attained by the respondent at the time of the course
Major	Main area of study
Income	Income level of respondent (in UAE dirhams, AED); USD 1 ≈ AED 3.67
Marital	Marital status of respondent
Family	Number of people in the family, including respondent
Work	Sector in which respondent works
Job	Type of job of respondent
English	Respondent’s self-assessment of English technical communication proficiency
Computing	Respondent’s self-assessment of computing skills
Analytics	Respondent’s self-assessment of data analytics skills
Statistics	Respondent’s work-related statistical roles and responsibilities
StatSoftware	Respondent’s previous training in using any statistical package
Python	Respondent’s previous training in using machine learning tools such as Python or R
Big Data	Respondent’s experience with Big Data
Decision	Respondent’s contribution in key decision making at work
Coding	Respondent’s ability to write and/or understand computer code (programming)
ML	Respondent’s general understanding of Machine Learning
AI	Respondent’s general understanding of Artificial Intelligence
DL	Respondent’s general understanding of Deep Learning
Course Need	Respondent’s need to attend the Big Data Analytics course
Benefit	Respondent’s self-assessment of the benefit of attending the Big Data Analytics course
Score	Respondent’s performance on the final exam—scored out of 60 (categorical)
Results	Respondent’s performance on the final exam—scored out of 60 (numerical)

Table 2. A two-way contingency table for the Education and Big Data variables.

Education	Acceptable	Excellent	Good	Very Good	Weak	Total
Diploma	0	0	0	0	1	1
First Degree	16	3	10	4	25	58
High School	1	0	2	0	4	7
Masters	2	1	9	2	6	20
Total	19	4	21	6	37	87
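
One common way to examine a two-way table such as Table 2 is to compare each observed count with the count expected if the two variables were independent (row total × column total / n). The following is a minimal sketch of that comparison, not the authors' own code. It uses only the four education rows listed in Table 2, which sum to 86 respondents, whereas the Total row reports 87; the function names `expected_counts` and `chi_square` are illustrative.

```python
# Observed counts from Table 2 (rows: education; columns: Big Data rating).
# Assumption: only the four listed rows are used; together they sum to 86.
ratings = ["Acceptable", "Excellent", "Good", "Very Good", "Weak"]
observed = {
    "Diploma":      [0, 0, 0, 0, 1],
    "First Degree": [16, 3, 10, 4, 25],
    "High School":  [1, 0, 2, 0, 4],
    "Masters":      [2, 1, 9, 2, 6],
}


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    return {
        name: [rt * ct / n for ct in col_totals]
        for name, rt in zip(table, row_totals)
    }


def chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for name in table
        for o, e in zip(table[name], expected[name])
    )
    n_cols = len(next(iter(table.values())))
    dof = (len(table) - 1) * (n_cols - 1)
    return stat, dof


expected = expected_counts(observed)
stat, dof = chi_square(observed)

# Deviations (observed minus expected) per education level and rating.
for name in observed:
    deviations = [round(o - e, 2) for o, e in zip(observed[name], expected[name])]
    print(name, dict(zip(ratings, deviations)))
print(f"chi-square = {stat:.3f} on {dof} degrees of freedom")
```

Several cells in this table have small expected counts (the Diploma row holds a single respondent), so the chi-square approximation should be read with caution for data this sparse.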

Table 3. Row points vs. PD 1.

Profession	Value
Secretary	−0.899
Teacher	−0.462
Services	−0.437
Admin	0.100
Researcher	0.208
Accountant	0.338
Programmer	0.630
Engineer	1.083

Table 4. Columns vs. PD 1.

Category	Value
weak	−0.419
acceptable	−0.146
very good	0.431
good	0.542
excellent	1.083
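
Coordinates like those in Tables 3 and 4 come from a correspondence analysis of a contingency table. The sketch below shows how row and column coordinates on the first dimension can be computed, reading "PD 1" as the first principal dimension (an assumption). Because the job category tables are not reproduced here, it uses the Table 2 counts as stand-in data, so its output is not expected to match Tables 3 and 4. The function name `first_dimension` and the use of power iteration instead of a full singular value decomposition are illustrative choices, not the authors' method.

```python
import math

# Stand-in data: the four listed rows of Table 2 (education x Big Data rating).
row_labels = ["Diploma", "First Degree", "High School", "Masters"]
col_labels = ["Acceptable", "Excellent", "Good", "Very Good", "Weak"]
counts = [
    [0, 0, 0, 0, 1],
    [16, 3, 10, 4, 25],
    [1, 0, 2, 0, 4],
    [2, 1, 9, 2, 6],
]


def first_dimension(counts, iterations=500):
    """Principal coordinates of rows and columns on the first CA dimension."""
    n = sum(map(sum, counts))
    r = [sum(row) / n for row in counts]        # row masses
    c = [sum(col) / n for col in zip(*counts)]  # column masses
    n_rows, n_cols = len(r), len(c)

    # Standardised residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    S = [
        [(counts[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
         for j in range(n_cols)]
        for i in range(n_rows)
    ]

    # Power iteration on S^T S for the leading right singular vector.
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(S[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))    # leading singular value

    # Principal coordinates; the overall sign of a dimension is arbitrary.
    rows = [u[i] / math.sqrt(r[i]) for i in range(n_rows)]
    cols = [sigma * v[j] / math.sqrt(c[j]) for j in range(n_cols)]
    total_inertia = sum(x * x for row in S for x in row)
    return rows, cols, sigma ** 2, total_inertia


row_coords, col_coords, inertia1, total_inertia = first_dimension(counts)
for label, value in sorted(zip(row_labels, row_coords), key=lambda p: p[1]):
    print(f"{label}\t{value:.3f}")
for label, value in sorted(zip(col_labels, col_coords), key=lambda p: p[1]):
    print(f"{label}\t{value:.3f}")
print(f"share of inertia on dimension 1: {inertia1 / total_inertia:.1%}")
```

Sorting the categories by their coordinate, as the tables do, makes it easier to see which row and column categories fall on the same side of the dimension.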

Table 5. Row points vs. PD 1.

Profession	Value
Secretary	−1.562
Services	−0.807
Teacher	−0.319
Programmer	0.070
Researcher	0.079
Accountant	0.373
Admin	0.505
Engineer	0.629

Table 6. Columns vs. PD 1.

Category	Value
acceptable	−0.705
weak	−0.026
very good	0.073
good	0.221
excellent	0.884

Table 7. Row points vs. PD 1.

Score	Value
48/60	−0.467
24/60	−0.421
60/60	−0.189
36/60	−0.079
54/60	−0.062
30/60	−0.014
18/60	0.313
42/60	1.391

Table 8. Columns vs. PD 1.

Profession	Value
Services	−0.493
Programmer	−0.418
Accountant	−0.319
Teacher	−0.220
Admin	0.261
Engineer	0.591
Researcher	0.719
Secretary	2.844

Table 9. Row points vs. PD 1.

Score	Value
48/60	−0.505
30/60	−0.459
36/60	−0.409
60/60	0.052
54/60	0.101
24/60	0.551
18/60	0.699
42/60	0.711

Table 10. Columns vs. PD 1.

Category	Value
weak	−0.775
acceptable	−0.369
good	0.071
very good	0.355
excellent	0.986

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Said, R.A.T.; Mwitondi, K.S.; Benseddik, L.; Chemlali, L. Building Data Literacy for Sustainable Development: A Framework for Effective Training. Data 2025, 10, 188. https://doi.org/10.3390/data10110188

