Article

Integrating AI with Meta-Language: An Interdisciplinary Framework for Classifying Concepts in Mathematics and Computer Science

1 Department of Software Engineering, Braude College of Engineering, Karmiel 2161002, Israel
2 Department of Economical Informatics, Alexandru Ioan Cuza University, 700506 Iasi, Romania
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2025, 16(9), 735; https://doi.org/10.3390/info16090735
Submission received: 29 June 2025 / Revised: 13 August 2025 / Accepted: 19 August 2025 / Published: 26 August 2025
(This article belongs to the Special Issue Advancing Educational Innovation with Artificial Intelligence)

Abstract

Providing students with effective learning resources is essential for improving educational outcomes—especially in complex and conceptually diverse fields such as Mathematics and Computer Science. To better understand how these subjects are communicated, this study investigates the linguistic structures embedded in academic texts from selected subfields within both disciplines. In particular, we focus on meta-languages—the linguistic tools used to express definitions, axioms, intuitions, and heuristics within a discipline. The primary objective of this research is to identify which subfields of Mathematics and Computer Science share similar meta-languages. Identifying such correspondences may enable the rephrasing of content from less familiar subfields using styles that students already recognize from more familiar areas, thereby enhancing accessibility and comprehension. To pursue this aim, we compiled text corpora from multiple subfields across both disciplines. We compared their meta-languages using a combination of supervised (Neural Network) and unsupervised (clustering) learning methods. Specifically, we applied several clustering algorithms—K-means, Partitioning around Medoids (PAM), Density-Based Clustering, and Gaussian Mixture Models—to analyze inter-discipline similarities. To validate the resulting classifications, we used XLNet, a deep learning model known for its sensitivity to linguistic patterns. The model achieved an accuracy of 78% and an F1-score of 0.944. Our findings show that subfields can be meaningfully grouped based on meta-language similarity, offering valuable insights for tailoring educational content more effectively. To further verify these groupings and explore their pedagogical relevance, we conducted both quantitative and qualitative research involving student participation. 
This paper presents findings from the qualitative component—namely, a content analysis of semi-structured interviews with software engineering students and lecturers.

1. Introduction

This research investigates the relationship between Mathematical Thinking (MT) and Computational Thinking (CT)—two forms of reasoning that, while widely studied, remain open to multiple definitions and interpretations. Rather than focusing on abstract theoretical comparisons, this study adopts a novel empirical approach: a comparative analysis of the meta-languages used across subfields in Mathematics and Computer Science.
While research on the meta-language of Mathematics and Computer Science as whole disciplines is not new, this study introduces a novel approach by focusing on the meta-languages of specific subfields within Mathematics and Computer Science—what can be described as “local meta-languages”. We examine the hypothesis that different subfields possess distinct meta-languages, which may nonetheless exhibit structural similarities.
The central hypothesis guiding this research is that if certain Mathematical and Computer Science subfields exhibit similar meta-languages, this linguistic proximity may correlate with students’ ability to understand and succeed in those fields. In other words, shared meta-linguistic structures may support cognitive transfer and conceptual accessibility across disciplines.
To test this hypothesis, we first compiled and analyzed a corpus of academic texts drawn from key subfields within each discipline. The Mathematical subfields included Linear Algebra, Abstract Algebra, Combinatorics and Probability, Mathematical Analysis, Set Theory, and Logic. The Computer Science subfields comprised Functional Programming, Imperative Programming, Object-Oriented Programming, Data Structures and Algorithms, Automata and Formal Languages, and Operating Systems. These subfields represent foundational areas typically covered in undergraduate Computer Science and Software Engineering programs.
Our methodology relied on advanced Natural Language Processing (NLP) techniques—including lemmatization and tokenization—to systematically remove field-specific terminology from the texts. This preprocessing isolates the underlying meta-language: the general vocabulary and grammatical structures used to express abstract reasoning across disciplines.
To identify linguistic patterns, we applied several clustering algorithms—K-means, Partitioning around Medoids (PAM), Density-Based Clustering, and Gaussian Mixture Models—to classify subfields based on meta-language similarity. Recognizing that clustering results can vary significantly depending on distance metrics, we further validated our findings using XLNet, a Neural Network model designed to capture deep linguistic structures. This dual approach allowed us to assess the robustness of cluster assignments and to explore how linguistic similarity maps onto disciplinary boundaries.
To complement the computational analysis and further evaluate our hypothesis—that shared meta-languages facilitate mutual understanding across fields—we conducted both quantitative and qualitative studies involving student participants. This paper presents the results of the qualitative component based on a content analysis of semi-structured interviews with software engineering students and lecturers.
In the context of STEM education, where disciplines like Mathematics and Computer Science are often perceived as abstract and linguistically demanding, understanding how knowledge is communicated is critical for improving student learning outcomes. Many students struggle not only with core concepts but also with the specialized language—meta-language—that frames definitions, procedures, and problem-solving strategies. By analyzing and comparing the linguistic structures used across STEM subfields, this research contributes to the development of more accessible and pedagogically aligned learning resources. Identifying similarities in meta-language across domains can support an instructional design that bridges familiar and unfamiliar content, thereby fostering deeper comprehension, smoother knowledge transfer, and increased confidence among students navigating complex technical subjects.

2. Literature Review and Related Work

Computers and programming have transformed the modern world, making technological literacy an essential skill for academic and professional success in the digital age [1]. As a result, Computational Thinking (CT)—a problem-solving approach applicable across various domains—has gained increasing attention in education. To prepare students as future professionals capable of addressing complex business and societal challenges, it is necessary to develop their competence in both Mathematical Thinking and Computational Thinking.
Mathematical Thinking and Computational Thinking are essential cognitive frameworks that support problem solving across a wide range of disciplines. CT, as conceptualized by Wing [2], refers to a collection of mental strategies used to systematically analyze complex problems and devise solutions that can be implemented by computers. It encompasses core practices such as decomposition, pattern recognition, abstraction, and algorithmic thinking, all of which are vital for navigating the challenges of the digital age [1]. In contrast, MT is a long-standing educational construct characterized by logical reasoning, abstract thought, and the application of mathematical tools to both theoretical and practical problems [3].
A central challenge lies in finding effective ways to assess thinking itself. One promising direction involves examining the linguistic dimension, recognizing that thinking is shaped by language [4,5]. Noam Chomsky [6], often regarded as the founder of modern linguistics, emphasized the deep connection between language and thought. Since both Mathematics and Computer Science are structured around formal languages, and individuals rely on meta-language in their reasoning, exploring these meta-languages can provide insights into the nature of thinking. According to Alfred Tarski [7,8], meta-language refers to the language used to describe another language—discussing its meanings, sentence structure, usage, and truth conditions. Accordingly, the meta-language of any discipline refers to the linguistic framework used in a discipline to convey its definitions, axioms, intuitions, and heuristics.
What distinguishes our approach is its focus on comparing the meta-languages of specific subfields within Mathematics and Computer Science separately. However, the use of meta-language in Mathematics and Computer Science as general disciplines is not a novel concept. For instance, Florian Richter [9] offers a theoretical exploration of the often-neglected distinction between object-language—the language being analyzed—and meta-language within the context of Computer Science and formal logic. His work emphasizes the relevance of this distinction for understanding logical inference and computational reasoning. Another example is the work of Eugenio Moggi [10], which offers a foundational perspective on the design and use of formal meta-languages in programming language theory. His approach centers on the use of monads as a unifying structure for modeling computational effects and for defining modular, multi-stage meta-languages. This contribution is particularly significant for understanding the role of formal meta-representations in both Computer Science and Logic.
A comparative investigation of meta-languages in Mathematics and Computer Science is thus not only of theoretical interest but also of practical educational relevance. As noted in [11], students’ capacity to engage in computational thinking relies on their fluency with both the conceptual frameworks and the representational tools characteristic of the discipline. These tools can be understood as the field’s meta-languages. By exploring whether students demonstrate improved comprehension in subfields with overlapping or similar meta-linguistic structures, this study seeks to determine whether linguistic proximity enhances cognitive accessibility.
The remainder of this paper is structured as follows: First, it presents the process of extracting the meta-linguistic layer from texts. Next, the Data Mining stage of the research, including enhanced clustering and the implementation of a Neural Network Model (XLNet), is presented. This stage yields a clustering of related fields in Mathematics and Computer Science based on the similarity of their meta-languages. This paper then describes the qualitative research and content analysis of the interviews, and details the identified themes. Finally, this study concludes with a summary of the overall results and key conclusions drawn from the research process.

3. Proposed Approach

This study examines the relationship between Mathematical Thinking and Computational Thinking by focusing on their linguistic dimensions, drawing on the established connection between language and thought as a framework for evaluating modes of thinking. To better understand the relationship between competencies in different domains of Mathematics and Computer Science, a novel approach comparing the meta-languages of various subfields in each discipline is presented. This is denoted as
M = { m_Algebra, m_SetTheory, m_Analysis, m_FunctionalProg, … }
M denotes the set of meta-languages associated with various subfields in Mathematics—such as Algebra, Set Theory, and Analysis—and in Computer Science, including areas like Functional Programming. The specific subfields analyzed in this study are listed in the Introduction and will be referenced again later.
These meta-languages are hypothesized to satisfy the following properties:
  • Distinctiveness: Each pair of meta-languages (m_i, m_j), with m_i ≠ m_j, is distinguishable; i.e., there exists an algorithm capable of identifying the meta-language used in a given text.
  • Translatability: For any disciplinary pair of meta-languages, there exists a transformation T_ij : m_i → m_j that preserves semantic content.
The hypothesis was tested using a corpus of texts originating from a variety of disciplines. Only the portions of the corpus that corresponded to the meta-language were extracted (excluding domain-specific terms, formulas, etc.). The corpus was then classified using both supervised and unsupervised methods.
During these Data Mining phases [12,13,14,15,16], a collection of texts from key domains in Mathematics and Computer Science was used. The mathematical fields included Linear Algebra, Abstract Algebra, Combinatorics and Probability, Mathematical Analysis, Set Theory, and Logic. The Computer Science fields included Functional Programming, Imperative Programming, Object-Oriented Programming, Data Structures and Algorithms, Automata and Formal Languages, and Operating Systems. These are foundational topics in most Computer Science and engineering curricula.
Figure 1 presents the pipeline of the Data Mining stage.
Building on the pipeline illustrated in Figure 1, we now describe each step of the Data Mining process in detail.
  • Employing advanced NLP techniques (tokenization, lemmatization, POS tagging, and automatic extraction of discourse markers) to extract the meta-linguistic layer from texts [17].
  • Dividing the text into chunks and applying the Doc2Vec model [18] to convert them into vector representations. The advantages of using the Doc2Vec model will be discussed in the Classification Process section.
  • Calculating distances between vectors. Both cosine similarity and Pearson correlation (i.e., cosine similarity on centered vectors) were tested, and no difference in the results was observed.
  • Unsupervised classification—using various clustering algorithms (K-means, PAM (K-Medoids), DBSCAN, and Gaussian Mixture Models (GMMs)) [19]. As will be shown later, the results of the PAM algorithm were the most significant; therefore, we used its output for comparison with the XLNet model.
  • Supervised classification—using the XLNet model [20] to compare with the clustering results, as they are sensitive to the distance metrics chosen.
As a result, clusters of subfields within Mathematics and Computer Science were identified as sharing similar meta-linguistic structures. The practical significance of this finding lies in the observed correlation between students’ academic performance and the meta-language through which content is delivered. These results suggest that rephrasing or translating material into a more familiar meta-language may improve comprehension and support more effective learning outcomes.

Pipeline of the Whole Process

We now present an example illustrating the entire process based on the following input text from the analysis domain: “But then the sequence {f_n(x)} converges uniformly to f on the interval [a, b], and therefore the limit preserves each linear transformation”.
This example is visualized in Figure 2.

4. Text Preprocessing

The goal of the preprocessing pipeline is to isolate those fragments of a text that convey how scholars argue—connectives, discourse cues, and logical operators—while suppressing the discipline-specific terminology that encodes what is being discussed. The resulting “rhetorical backbone” serves as the input for embedding and clustering.
The objective at this stage is to isolate the meta-linguistic scaffold—connectives (such as but then, therefore, let) and the discourse markers that signal argumentative flow—while suppressing the disciplinary core (uniformly converges, natural isomorphism). The procedure consists of three macro-steps:
Corpus acquisition (Section 4.1);
NLP normalization and lexical cleaning (Section 4.2);
Segmenting into fixed-length windows for downstream models (Section 4.3).

4.1. Corpus Acquisition

This stage assembles parallel corpora that are sufficiently large and balanced to train statistical models. In this study, only open-access textbooks and lecture notes in Mathematics and Computer Science were examined, resulting in 7382 PDF files (Table 1).
Once the corpora were assembled, the next step was to extract their textual content from the source documents, a process described next.

From PDF to Raw Text

The objective at this stage is accurate text extraction. Born-digital PDFs—documents that originate in a digital format rather than being scanned from paper—can be parsed directly. In contrast, scanned documents require Optical Character Recognition (OCR) to convert images into machine-readable text. Because any extraction errors at this stage could affect downstream processing, a robust two-parser strategy with a fallback mechanism was implemented. The two-parser strategy refers to using the following two different parsing approaches depending on the type of PDF:
  • Born-digital PDFs. Three parsers were initially benchmarked: PyMuPDF, pdfminer.six, and PyPDF2. PyPDF2 demonstrated the best balance between speed and extraction fidelity, making it the primary candidate. Although pdfminer.six was retained as a fallback due to its robustness in handling exotic text encodings, further evaluation showed that PyPDF2 was sufficient for all documents in the dataset. Consequently, only PyPDF2 was used in the final pipeline.
  • Scanned PDFs. For image-only pages, Tesseract 4.1 was employed.

4.2. Employing NLP Techniques for Normalization and Lexical Cleaning

The normalization step standardizes orthography, removes layout artifacts, and strips formulae so that subsequent linguistic processing is not distracted by mathematical notation.
Step 1:
Re-joining hyphenated line breaks: (word)-\n(word) → wordword;
Step 2:
Removal of LaTeX blocks: $…$, \[…\], and any \begin{…}…\end{…} environment;
Step 3:
Deletion of numerals matching \d+(?:[.,]\d+)?%?;
Step 4:
Unicode NFKD normalization and whitespace squashing.
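Assuming the four steps above, the normalization pass can be sketched as a chain of regular-expression substitutions (an illustrative reconstruction, not the project's exact code):

```python
import re
import unicodedata

def normalize(text):
    # Step 1: re-join hyphenated line breaks: (word)-\n(word) -> wordword
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
    # Step 2: strip LaTeX blocks: $...$, \[...\], \begin{...}...\end{...}
    text = re.sub(r"\$[^$]*\$", " ", text)
    text = re.sub(r"\\\[.*?\\\]", " ", text, flags=re.S)
    text = re.sub(r"\\begin\{(\w+)\}.*?\\end\{\1\}", " ", text, flags=re.S)
    # Step 3: delete numerals, optionally with a decimal part and percent sign
    text = re.sub(r"\d+(?:[.,]\d+)?%?", " ", text)
    # Step 4: Unicode NFKD normalization and whitespace squash
    text = unicodedata.normalize("NFKD", text)
    return re.sub(r"\s+", " ", text).strip()
```

Running the steps in this order matters: formula stripping must precede numeral deletion, and whitespace squashing comes last.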
After these normalization steps, the material was tokenized and annotated, as outlined in Section 4.2.1.

4.2.1. Tokenization and Annotation

After normalization, each document is tokenized and enriched with POS tags and lemmata. These annotations enable frequency-based filtering and the construction of multi-word stop lists. Here, spaCy 3.7.4 with the model en_core_web_sm (NER and parser components disabled) was employed. For each token tok, the following attributes are recorded:
  • The original form tok.text;
  • Its lemma tok.lemma_;
  • Its part-of-speech tag tok.pos_;
  • A stop-word flag tok.is_stop.

4.2.2. Domain-Term Masking

Masking deletes high-TF–IDF n-grams that are strongly biased towards a single domain corpus, thereby neutralizing topical vocabulary while preserving discourse connectives.
Automatic Multi-Word Stop List
This phase builds a dynamic stop list that captures multi-word technical terms such as uniform convergence or AVL tree. The list is corpus-specific and therefore more precise than generic stop-word inventories.
Phase 1:
Extract every n-gram (n = 2, 3) with frequency ≥ 3 in D.
Phase 2:
Compute TF–IDF (min_df = 3, max_df = 0.8).
Phase 3:
Keep the top TOP_K = 3000 highest-ranked candidates.
Phase 4:
Discard phrases consisting only of functional POS (CCONJ, SCONJ, PART, ADP, PRON, ADV); this preserves markers such as but then.
Phase 5:
Compare relative frequencies in D and R; a phrase p enters the final stop list S if p_R < 5·10⁻⁶ and p_D / p_R > 80.
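Phases 1 and 5 can be sketched in plain Python (a simplified illustration: raw relative frequencies stand in for the TF–IDF ranking of Phases 2–3, the POS filter of Phase 4 is omitted, and the toy corpora are invented):

```python
from collections import Counter
import re

def ngram_counts(texts, nmin=2, nmax=3):
    """Phase 1: count every n-gram (n = 2, 3) in a corpus."""
    counts = Counter()
    for text in texts:
        toks = re.findall(r"[a-z]+", text.lower())
        for n in range(nmin, nmax + 1):
            for i in range(len(toks) - n + 1):
                counts[" ".join(toks[i:i + n])] += 1
    return counts

def build_stoplist(domain_texts, reference_texts,
                   min_count=3, ref_max=5e-6, ratio=80):
    """Phase 5: a phrase enters S if it is frequent in the domain corpus D
    but rare in the reference corpus R (p_R < ref_max and p_D / p_R > ratio)."""
    d, r = ngram_counts(domain_texts), ngram_counts(reference_texts)
    total_d = sum(d.values()) or 1
    total_r = sum(r.values()) or 1
    stop = set()
    for phrase, count in d.items():
        if count < min_count:
            continue
        p_d, p_r = count / total_d, r.get(phrase, 0) / total_r
        if p_r < ref_max and (p_r == 0 or p_d / p_r > ratio):
            stop.add(phrase)
    return stop
```

On a toy domain corpus repeating "uniform convergence", the phrase is captured while everyday bigrams that also appear in the reference corpus are not.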
Single-Token Filter
Tokens that survive the previous phase are subjected to a vocabulary-frequency threshold based on Zipf scores; rare content words are removed to further suppress topic signal. For all remaining tokens, the frequency threshold ZipfFreq(w) ≥ 4.0 (≈ 1/10,000 in general English; source: wordfreq) was applied. Depending on the desired strictness, the pipeline runs in one of the following three modes (Table 2):
Among the available modes—soft, moderate, and hard—the moderate setting was selected. The soft mode retained excessive domain-specific terminology, which diminished the separation between clusters. In contrast, the hard mode removed more technical jargon but also eliminated key discursive connectives, which slightly reduced clustering performance. The moderate mode offered a balanced trade-off, preserving enough linguistic structure to support meaningful clustering while filtering out unnecessary noise.
Algorithm per string s: delete every phrase p ∈ S, tokenize, apply the mode’s rule to each token, and re-join with spaces. Complexity: O(|s|).
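A minimal sketch of this per-string algorithm in Python (the stop phrases and Zipf scores below are illustrative assumptions, not the project's actual resources):

```python
import re

# Illustrative resources (assumptions, not the project's actual data):
STOP_PHRASES = {"uniform convergence", "avl tree"}   # multi-word stop list S
ZIPF = {"but": 6.7, "then": 6.3, "the": 7.7, "sequence": 4.6,
        "converges": 3.2, "uniformly": 3.1}          # cf. wordfreq Zipf scores
ZIPF_MIN = 4.0                                       # moderate-mode threshold

def mask(s):
    """Delete every phrase p in S, tokenize, apply the moderate rule, re-join."""
    s = s.lower()
    for phrase in STOP_PHRASES:                      # phrase deletion
        s = s.replace(phrase, " ")
    tokens = re.findall(r"[a-z']+", s)
    # moderate mode: drop tokens whose Zipf score falls below the threshold
    kept = [t for t in tokens if ZIPF.get(t, 0.0) >= ZIPF_MIN]
    return " ".join(kept)
```

On the running example, rare content words such as "converges" and "uniformly" are dropped while the discourse connector "but then" survives.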

4.2.3. Parameter Selection

The hyper-parameters shown in Table 3 were optimized on a held-out slice of the corpus to maximize the ratio between retained discourse markers and deleted technical terms.

4.2.4. Illustration

Table 4 demonstrates how increasingly strict cleaning modes affect a sample sentence: domain terminology is stripped while connective phrases remain intact, evidencing successful meta-language extraction.
The moderate mode removes the domain term converges uniformly while preserving the discourse connector but then, yielding an accurate meta-linguistic profile for model training.

4.3. Segmentation

After masking, text is split into non-overlapping windows of 256 tokens. Sensitivity tests with 128 and 512 tokens altered XLNet validation loss by <0.7%. The final working set contains 152 k segments.
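A minimal sketch of the windowing step (whether the trailing remainder shorter than 256 tokens is dropped or padded is not specified in the pipeline description; dropping it is an assumption here):

```python
def segment(token_ids, window=256):
    """Split a token-id sequence into non-overlapping fixed-length windows.
    The trailing remainder shorter than `window` is dropped (an assumption;
    the original pipeline may pad it instead)."""
    return [token_ids[i:i + window]
            for i in range(0, len(token_ids) - window + 1, window)]
```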

4.4. Effect of the Pipeline

Table 5 shows token counts at successive stages. Domain masking removes 42% of all tokens yet keeps 96.4% of discourse markers (hand-checked list of 70).
All scripts are published in clean_text.py and make_stoplist.py; experimental runs are tracked in MLflow (commit a1b2c3d). It is important to note that the method presented here for extracting meta-language from text is specific and focused. An earlier attempt was made to rely on term dictionaries from the fields of Computer Science and Mathematics, but this approach did not yield satisfactory results due to several significant challenges. First, a term may be meaningful within one discipline but not considered a distinct concept in another. Second, identifying multi-word expressions proved problematic, as the full phrase may constitute a meaningful term, whereas the individual words on their own do not. Finally, there is a practical limitation: comprehensive, high-quality dictionaries of domain-specific terminology in Mathematics and Computer Science are currently lacking, particularly for use in natural language processing (NLP) tasks.

5. Classification Process

This section describes how numerical text representations obtained in Section 4 were clustered and validated using a neural model, bridging the unsupervised and supervised components of the research framework. After removing domain-specific terminology and segmenting the documents, each text chunk was encoded numerically using two methods: TF-IDF vectors and Doc2Vec embeddings. Preliminary experiments revealed that Doc2Vec [18] produced more semantically coherent and stable clusters than TF-IDF, as it captures contextual relationships beyond mere word frequency. Therefore, Doc2Vec was selected as the primary representation, while TF-IDF was retained for comparison. A distance matrix was then computed using Pearson’s correlation to quantify similarities between the document vectors, forming the basis for applying various clustering algorithms (K-means, PAM (K-Medoids), DBSCAN, and Gaussian Mixture Models (GMMs)). Both cosine similarity and Pearson correlation (i.e., cosine similarity on centered vectors) were tested, and no difference in the results was observed.
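The equivalence invoked here, Pearson correlation as cosine similarity on mean-centered vectors, can be verified directly (a plain-Python sketch with arbitrary illustrative vectors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def pearson(u, v):
    """Pearson correlation computed from covariances."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

u = [0.8, 0.1, 0.5, 0.9, 0.3]
v = [0.7, 0.2, 0.4, 1.0, 0.1]
mu, mv = sum(u) / len(u), sum(v) / len(v)
centered = cosine([a - mu for a in u], [b - mv for b in v])
# pearson(u, v) and centered agree up to floating-point error
```

This explains why the two metrics produced identical clustering results: on centered document vectors they are the same quantity.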

5.1. Clustering

This section presents the application of various clustering methods.

5.1.1. K-Means Clustering

Clustering analysis is highly sensitive to the choice of distance metric. As an initial approach, we employed the K-means algorithm, a widely used method recognized for its simplicity and robust mathematical basis. This algorithm partitions the dataset into k clusters, each defined by a centroid, which represents the arithmetic mean of the data points assigned to that cluster. Table 6 summarizes the results obtained from applying K-means clustering.
Each row in the table corresponds to a specific field, from either Mathematics or Computer Science. Each column corresponds to a cluster. The numbers in each cell represent the percentage of data from a field that belongs to a particular cluster. Color coding is used to highlight significance, as follows:
  • Red: high concentration—more than 50% of the data from that domain fall into this cluster (indicating a strong association);
  • Green: medium association—between 16% and 50%;
  • Blue: weak association—15% or less.
The clustering outcomes are as follows: Operating Systems and Data Structures and Algorithms tend to belong to Cluster 1, Functional Programming to Cluster 3, and Set Theory to Cluster 4. For the remaining courses, no clear grouping emerged that would suggest meaningful clustering.

5.1.2. PAM Clustering

Table 7 summarizes the results obtained from applying PAM clustering.
The clustering outcomes are as follows:
  • Cluster 1: Operating Systems, Data Structures and Algorithms, Imperative Programming, OOP, Combinatorics and Probability;
  • Cluster 2: Linear Algebra, Abstract Algebra;
  • Cluster 3: Functional Programming, Automata and Computation Theory, Logic, Set Theory;
  • Cluster 4: Analysis.

5.1.3. Density Clustering

Density-based clustering identifies clusters as contiguous regions of high data point density separated by areas of lower density. This approach operates under the assumption that data points are sampled from an unknown probability distribution, with clusters corresponding to dense areas within the data space. The most prominent algorithm in this category is DBSCAN, which groups closely packed points and classifies those in sparse regions as outliers. In contrast with methods like K-means, density-based clustering does not require predefining the number of clusters and is capable of detecting clusters of arbitrary shapes and sizes. This flexibility makes it particularly well suited for uncovering complex data structures. Moreover, density-based techniques are robust to noise and excel at identifying outliers.
The motivation for testing this method stems from the limitations of the Euclidean distance metric and, by extension, the K-means algorithm. For instance, consider the three data points A, B, and C: A is close to B, and B is close to C, but A is not sufficiently close to C under Euclidean distance. As a result, A and C would be assigned to different clusters, even though—conceptually or contextually—they belong together, as in the case of related fields in Mathematics and Computer Science. In such scenarios, density-based clustering algorithms like DBSCAN are more appropriate, as they can capture transitive closeness and cluster structure more effectively. Table 8 summarizes the results obtained from applying density clustering.
The clustering outcomes are as follows: The last seven courses in the table tend to belong to Cluster 2. For the remaining courses, no clear grouping was observed that would indicate meaningful clustering.
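The transitive-closeness argument above (A near B, B near C, yet A far from C) can be illustrated with a minimal 1-D DBSCAN sketch (a toy implementation for illustration, not the library version used in the study):

```python
def dbscan_1d(points, eps, min_pts):
    """Minimal DBSCAN on 1-D points; returns one label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if abs(points[j] - points[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                  # provisional noise
            continue
        cluster += 1                        # i is a core point: start a cluster
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:             # noise reached from a core point
                labels[j] = cluster         # becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:
                seeds.extend(neighbors(j))  # j is also a core point: expand

    return labels

# A (0.0) is near B (1.0), B is near C (2.0), but A and C are 2.0 apart (> eps);
# density chaining still places all three in one cluster, while 10.0 is noise.
labels = dbscan_1d([0.0, 1.0, 2.0, 10.0], eps=1.2, min_pts=2)
```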

5.1.4. Gaussian Mixture Models (GMMs)

Gaussian Mixture Models (GMMs) approach clustering by assuming that the data are generated from a combination of several Gaussian (normal) distributions. A Gaussian distribution is characterized by a bell-shaped curve centered around a mean, with most data points concentrated near the mean and fewer occurring farther away. In GMMs, each Gaussian component represents a cluster, and data points are assigned to clusters based on the probability of their belonging to each distribution. These probabilities are determined by how well a point matches the parameters—mean and variance—of each Gaussian component.
In our study, the results showed that all the data were assigned to Cluster 1; therefore, it was deemed unnecessary to present a table with these data. The outcomes of the whole clustering process are presented and discussed in the Results section.

5.2. Transformer (XLNet) Implementation

Given the sensitivity of clustering results to the choice of distance metrics, a Neural Network Model—specifically XLNet—was introduced to compare and validate the resulting groupings. XLNet [20] was selected for its advanced capabilities in modeling long-range dependencies and bidirectional context, using a permutation-based training strategy that captures rich linguistic patterns often missed by traditional models [21]. Unlike models that rely on masked tokens or fixed word order, XLNet considers all possible word arrangements, making it particularly effective for tasks such as text classification and linguistic comparison [22,23,24]. These strengths, combined with its robustness to domain shifts and superior performance across benchmarks, made XLNet an ideal choice for extracting and comparing the meta-linguistic structures of texts from different disciplines. The advantages of XLNet over BERT are convincingly demonstrated in [20]. Furthermore, XLNet has also been shown to perform at a high level among the evaluated models BERT (baseline), DistilBERT, RoBERTa, XLNet, and ELECTRA [25]. The outcomes of integrating the XLNet model into the research process are presented and discussed in the Results section.
Building on the theoretical justification for employing XLNet to capture deep linguistic structures, the technical implementation is presented next, detailing the pipeline from data preprocessing and model fine-tuning to the construction of a symmetric error matrix that operationally defines the distance between discipline-specific meta-languages.

5.2.1. Data Preparation

  • Corpus layout. For every discipline c ∈ C, the cleaned texts (see Section 4) reside in Books_Clean.
  • Segmentation. The helper script prepare_dataset.py splits each file into non-overlapping 256-token XLNet chunks, yielding a dataset D = {(x_i, y_i)}_{i=1}^N.
  • Stratified split. D is divided 80/20 into train and test while preserving the empirical class distribution.

5.2.2. Classifier Training

  • Model. The process begins with the public checkpoint xlnet-base-cased; the classification head (sequence_summary + logits_proj) is randomly initialized.
  • Tokenizer. XLNetTokenizerFast (Rust tokenizers) was employed, removing the sentencepiece dependency.
  • Hyper-parameters.
    • Batch = 8 (fp16 on GPU/MPS);
    • Epochs = 6;
    • AdamW, η = 2 × 10⁻⁵, linear scheduler;
    • logging_steps = 10, evaluation and checkpoint at the end of every epoch.
  • Training. The loop is executed via Trainer; on an Apple M4 Max, 3.9 batch/s was reached.

5.2.3. Baseline Metric

After three epochs, the model attains a test accuracy of Acc_test = 0.783, confirming that the meta-language layer alone contains enough signal to discriminate between disciplines.

5.2.4. Computation Steps of Error Matrix and Meta-Language Distance

  • Predictions. For every test sample, the true label $y_i$ and the XLNet prediction $\hat{y}_i$ were recorded.
  • Confusion matrix. $\mathrm{CM}_{ab} = |\{\, i : y_i = a,\ \hat{y}_i = b \,\}|$ for all $a, b \in C$.
  • Row normalization
    $P(\hat{y} = b \mid y = a) = \dfrac{\mathrm{CM}_{ab}}{\sum_{b'} \mathrm{CM}_{ab'}}$.
  • Symmetrization
    $d(a,b) = \frac{1}{2}\bigl[P(\hat{y} = b \mid y = a) + P(\hat{y} = a \mid y = b)\bigr], \qquad d(a,a) = 0$.
    The larger $d(a,b)$, the more often XLNet confuses texts of the two disciplines, i.e., the "closer" their meta-languages.
  • Visualization. The matrix $D = [d(a,b)]$ is written to metalang_distance.csv; a heat map is shown in Figure 3 and explained in the Results section.
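The computation steps above can be sketched end-to-end in plain Python. The function and test values are illustrative; the published metalang_distance.csv is produced by the authors' own pipeline.

```python
def metalang_distance(y_true, y_pred, n_classes):
    """Confusion matrix -> row-normalized confusion rates -> symmetric distance matrix."""
    # Confusion matrix: cm[a][b] = |{i : y_i = a, yhat_i = b}|
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(y_true, y_pred):
        cm[a][b] += 1
    # Row normalization: p[a][b] estimates P(yhat = b | y = a)
    p = []
    for row in cm:
        total = sum(row)
        p.append([v / total if total else 0.0 for v in row])
    # Symmetrization: d(a, b) = (p[a][b] + p[b][a]) / 2, with d(a, a) = 0
    return [[0.0 if a == b else 0.5 * (p[a][b] + p[b][a])
             for b in range(n_classes)]
            for a in range(n_classes)]
```

By construction the returned matrix is symmetric with a zero diagonal, matching the definition of $d(a,b)$ given above.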

5.2.5. Interpretation

  • $d(a,b) \approx 0$: the model almost never confuses the classes; the meta-languages are well separated.
  • $d(a,b) \approx 0.5$: the discourse skeletons are so similar that, once domain terms are masked, the two corpora become nearly interchangeable.
Thus, the distance measure directly reflects the difficulty for a learner (XLNet) to distinguish two courses, lending the metric an immediate pedagogical interpretation. For qualitative analysis, a JSONL file misclassified.jsonl was saved; each record contains the following:
  • The original 256-token chunk;
  • Its true discipline;
  • The false prediction.
The first 20 examples are printed to the console after training.
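Producing such a JSONL file can be sketched as follows; the function name and field names are illustrative assumptions, since the paper only lists the three pieces of information each record contains.

```python
import json


def misclassified_records(chunks, y_true, y_pred, labels):
    """One JSON line per misclassified chunk: text, true discipline, false prediction."""
    lines = []
    for text, a, b in zip(chunks, y_true, y_pred):
        if a != b:  # keep only the errors
            record = {"chunk": text,
                      "true_discipline": labels[a],
                      "predicted_discipline": labels[b]}
            lines.append(json.dumps(record, ensure_ascii=False))
    return lines

# Writing the file is then a single call:
# open("misclassified.jsonl", "w", encoding="utf-8").write("\n".join(misclassified_records(...)))
```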

6. Content Analysis of Semi-Structured Interviews with Software Engineering Students and Lecturers

Following the text clustering results obtained in the previous stage of the research, the next step involved testing the hypothesis that students tend to exhibit similar levels of understanding in subject areas characterized by similar meta-languages. To explore this assumption, student input was gathered through a series of interviews.
The main objective of the interviews was to assess the core hypothesis of this study—that students are likely to perform at similar levels across Computer Science and Mathematics disciplines that share closely related meta-languages. Since individuals often pursue areas where they experience success, the interviews also aimed to explore whether students show a preference for fields with similar linguistic structures. Additionally, the interviews sought to gain insights into participants’ views on the connection between Mathematical and Computational Thinking and to explore the potential for strengthening these cognitive skills to enhance success in related domains.

6.1. Participant Profile

This study involved 10 final-year software engineering students from a college of engineering. These participants had completed the courses selected as representatives of the Mathematics and Computer Science domains and brought with them diverse educational backgrounds in these subjects from their prior schooling. In addition, 10 lecturers from the same institution participated in the research. These lecturers teach across both Mathematics and Computer Science disciplines and possess applied experience in information technology. All participants were provided with detailed information regarding the aims of the study. They were informed that participation was voluntary, anonymity would be preserved, and the research was approved by the college’s ethics committee.

6.2. Interview Guide

The interview questions were carefully designed to align with the research objectives and to elicit participants’ perspectives on the relationship between the two forms of thinking. Prior to their use, the interview guide underwent expert validation to ensure relevance and clarity. Originally developed as a semi-structured instrument, the guide was refined during both the validation process and the course of the interviews. These adjustments were informed by participants’ responses, although the majority of the questions retained their initial structured format. The final version of the interview guide is presented in Table 9.

6.3. Content Analysis

The content analysis in this study was guided by established methodologies outlined in seminal and contemporary sources, including [26,27,28,29,30]. The key stages of qualitative content analysis followed in this research were as follows:
  • Defining coding units and categories.
    Transcripts were segmented into meaningful units, which were then assigned to preliminary thematic categories based on the research objectives and prior theory.
  • Developing a coding framework.
    A coding framework was iteratively developed and refined during initial coding. This included labeling themes such as “computational thinking skills”, “mathematical thinking skills”, and “relationship between the two types of thinking”.
  • Systematic coding of the content.
    Each interview transcript was examined, and segments were coded according to the framework. Codes were applied consistently across the student and lecturer datasets.
  • Analyzing the coded data.
    The coded segments were analyzed for patterns and thematic relationships. Emerging themes were identified and supported by illustrative quotations from participants. These are presented in detail in Table 10, showing how categories, themes, and selected quotes interrelate.
Findings from the qualitative content analysis of semi-structured interviews are detailed in the Results section.

7. Results

This section presents the results of this study, including outcomes from the clustering analysis, the XLNet-based classification process, and the qualitative content analysis of semi-structured interviews. Together, these findings provide a comprehensive view of the linguistic patterns and meta-language structures across the examined disciplines.

7.1. Clustering Process Outcomes

Several clustering algorithms were subsequently tested on the corpora: K-means, PAM (K-Medoids), DBSCAN, and Gaussian Mixture Models (GMMs). Ultimately, the PAM algorithm produced the most coherent and expected results. A key challenge in applying K-means clustering lies in its dependence on the Euclidean distance metric, which is well suited for detecting spherical clusters. However, given our focus on correlations, we employed the Pearson distance instead. While K-means is appealing due to its simplicity and mathematical robustness, the results obtained in this case were neither fully interpretable nor consistent. This may be attributed to the algorithm’s inherent reliance on Euclidean geometry, as previously discussed.
Density-based clustering was also tested, but the resulting distributions lacked coherence and interpretability. This suggests that such methods may not be well suited to the structural characteristics of our dataset, underscoring the importance of matching clustering techniques to data properties.
Similarly, the Gaussian Mixture Model initially produced a division that was illogical and lacked a meaningful structure. In contrast, the PAM (Partitioning around Medoids) algorithm yielded a more coherent and interpretable clustering solution. As a result, the PAM-based division—comprising four final clusters—was selected for validation using the AI model. Given that the results of the clustering can vary substantially based on the distance metrics chosen, the XLNet Neural Network model was also used to allow comparative analysis, and the results are presented below.

7.2. The Outcomes of Using the XLNet Model

The confusion matrix resulting from the application of the XLNet model (as shown in Figure 3) is presented and analyzed below, offering insights into classification accuracy and highlighting areas where courses share overlapping meta-linguistic structures.
All disciplines were assigned numerical labels from 0 to 11, with Computer Science domains labeled from 0 to 5 (e.g., Operating Systems as 0) and Mathematics domains labeled from 6 to 11 (e.g., Abstract Algebra as 6), following the order presented in Table 6. Examining the first row of the results, which corresponds to the Operating Systems label, a total of 671 samples were observed. Of these, approximately 45% were correctly classified as Operating Systems, while the remaining samples were misclassified into domains labeled 1, 3, 4, and 8. This distribution indicates a linguistic similarity between Operating Systems and those specific domains, suggesting shared characteristics in their meta-languages. These findings are consistent with the previously identified cluster structure, reinforcing the credibility of the clustering approach.
A similar pattern was evident in other rows of the data, indicating a regularity in how XLNet recognized language structures. The model successfully captured deeper linguistic patterns across related disciplines. Importantly, the groupings identified by XLNet closely aligned with those produced by the clustering analysis. This strong correspondence supports the conclusion that the clustering results were meaningful rather than arbitrary. Together, the results of both methods provide reliable and insightful classifications of the disciplines based on their underlying language characteristics.
The combination of a cleaned corpus and an XLNet classifier not only reaches 78% accuracy and an F1-score of 0.944 (according to the calculation in Appendix A) but also supplies a quantitative "proximity" scale based on the symmetric error rate, suitable for clustering disciplines and for pinpointing pedagogical bottlenecks where two courses share an almost identical meta-linguistic frame.
Following the Data Mining stage, the next phase of the study involved direct human participation: a qualitative content analysis of semi-structured interviews conducted with software engineering students and lecturers.

7.3. Results of the Content Analysis

Understanding the demographic and experiential background of both students and lecturers provides essential context for interpreting the findings of this study.
Among the students interviewed, the group comprised 70% males and 30% females. A total of 50% of students reported having a high level of Mathematics education in school, while 30% indicated a strong background in Computer Science at that stage. A total of 60% of students had been exposed to programming from an early age, suggesting that a substantial proportion entered higher education with prior computational experience.
The lecturers included 50% females and 50% males. A total of 80% had many years of teaching experience, reflecting a high level of pedagogical expertise. Additionally, 40% of lecturers reported extensive professional experience in the high-tech industry, indicating that a portion of the teaching staff brings both academic and applied industry perspectives to their instruction.
Following the content analysis of semi-structured interviews with software engineering students and lecturers, a set of codes was developed, which were subsequently grouped into the following seven overarching categories:
  • Socio-demographic background;
  • Perceived similarities and differences among the studied courses;
  • Characteristics and definitions of Computational Thinking;
  • Characteristics and definitions of Mathematical Thinking;
  • Skills associated with Computational Thinking;
  • Skills associated with Mathematical Thinking;
  • Interrelationship between the two types of thinking.
All interviews were coded independently using a predefined coding scheme developed during the early stages of analysis. Thematic saturation was reached when no new themes emerged across subsequent interviews. The quotes selected for inclusion in the table are illustrative rather than exhaustive; they were chosen to exemplify common themes without redundancy. The analysis of the interview data revealed several key themes, as follows:
  • Students and lecturers classify courses based on different rationales; students tend to prioritize practical application.
  • Both groups categorize courses in a way that aligns closely with the clustering observed through meta-language similarity analysis, although they do not explicitly cite this as their reasoning.
  • Computational Thinking is viewed as heavily reliant on engineering and algorithmic problem-solving approaches, whereas Mathematical Thinking is associated with precision and the formulation of problems.
  • Participants suggested that Computational Thinking cannot develop independently of Mathematical Thinking. However, an overemphasis on Computer Science courses may reduce students’ interest in Mathematics, potentially resulting in academic difficulties.
  • There was a shared belief in the value of introducing programming at an early age, as early exposure tends to foster long-term interest and academic success in related fields.
Using illustrative quotations, the analysis explores the relationships between these categories and interprets their significance within the study’s framework. Table 10 presents the categories, themes, illustrative quotes, and relationships among them.
To ensure transparency and minimize bias in the selection and interpretation of interview quotes, we followed established practices in qualitative content analysis. The selection of quotes presented in Table 10 was guided by three main criteria: (1) relevance to the identified theme, (2) clarity in expressing a distinct viewpoint or pattern observed across multiple participants, and (3) representativeness of recurring ideas or contrasting opinions. A response was considered a shared theme if at least 60% of the students or lecturers provided the same or a closely related answer. This threshold was applied separately to each group.

8. Discussion

This section presents the major insights derived from the thematic content analysis, their interpretations, and the conclusions drawn in light of the research aims.
This study employed two complementary approaches to explore the relationship between subfields in Mathematics and Computer Science: (1) AI-based clustering using text classification and distance metrics applied to disciplinary corpora and (2) a qualitative content analysis of semi-structured interviews with students and lecturers. While both methods aimed to identify perceived or actual similarities between fields, they approach the question from fundamentally different angles.
AI-based clustering provided an objective, language-driven grouping of fields based on the similarity of their meta-languages. This method captures latent linguistic structures that may underlie instructional materials or conceptual formulations in each discipline. In contrast, the qualitative interviews explored participants’ subjective perceptions, preferences, and experiences—particularly their ability to recognize similarities and differences between courses based on practical, cognitive, or pedagogical reasoning.
Although the results from these methods are not identical, they are not necessarily contradictory. For example, both students and lecturers grouped certain courses in ways that coincided with the AI-generated clusters, even if they did not explicitly attribute their decisions to linguistic features. This alignment suggests that meta-language similarity may implicitly influence perceived similarity, even when participants are unaware of it. However, divergences between the two approaches also highlight areas where subjective perceptions are shaped more by utility, educational context, or professional relevance than by linguistic structure.
Therefore, rather than seeking a direct one-to-one correspondence between the two methods, this study positions them as complementary: AI-based clustering offers a theoretically grounded structural analysis, while interviews provide insight into how learners and educators engage with that structure in practice. This dual perspective enhances our understanding of how meta-linguistic similarity might be leveraged pedagogically to improve comprehension and curriculum design.
Several important findings emerged from the qualitative analysis, as follows:
  • Divergent Rationales for Course Categorization between Students and Lecturers
    When asked to group courses from the fields of Computer Science and Mathematics, students and lecturers applied different criteria. When limited to two groups, all lecturers classified courses based on disciplinary content and thematic coherence (see selected lecturer quotes in Theme 1, Table 10). In contrast, students tended to categorize Mathematics courses based on perceived utility—labeling them as either "useful" or "non-useful" in relation to their practical application in software engineering (see student quotes in Theme 1, Table 10). When allowed to create three or more groups, both groups arrived at similar divisions, albeit with different reasoning. This suggests that experienced lecturers, who possess comprehensive subject-matter expertise across both domains, classify courses based on academic structure and content, while students often prioritize applicability to real-world or programming contexts. This finding underscores the students' instrumental view of Mathematics subjects, often treating them as secondary or supportive tools in service of mastering computer science.
  • Unintentional Alignment with Meta-Language-Based Clustering
    Interestingly, both students and lecturers grouped courses in a way that largely corresponds to the clustering patterns previously identified based on meta-language similarity. However, participants did not explicitly recognize this rationale. This is not surprising, as individuals do not typically engage in a conscious comparison of meta-linguistic structures. Such comparisons require advanced computational tools, such as those applied in this study (e.g., Neural Networks). The alignment between human intuition and algorithmic grouping suggests that linguistic structures inherent in disciplinary texts may subconsciously influence how people perceive and relate academic content.
  • Defining Features of Computational and Mathematical Thinking
    Participants generally characterized Computational Thinking as involving engineering-oriented reasoning and algorithmic problem-solving strategies, while Mathematical Thinking was described as requiring precision and focusing more on problem formulation than on solution implementation. These observations are consistent with existing literature ([2,31,32]), which highlights the distinctive cognitive demands of each type of thinking. Notably, many participants found it difficult to clearly differentiate between the two, reflecting their conceptual overlap and shared foundations.
  • Interdependence of Thinking Abilities and Declining Mathematical Engagement
    Both students and lecturers acknowledged that computational thinking depends on a solid Mathematical foundation. Nevertheless, it was noted that some students show strong performance in Computer Science while underperforming in Mathematics. This discrepancy may result from students’ undervaluation of Mathematics, leading to reduced motivation and investment. This suggests that the problem may not stem from a lack of ability, but rather from an insufficient emphasis on mathematical involvement in educational trajectories focused heavily on computing.
  • Early Exposure to Programming Enhances Long-Term Success
    Participants strongly endorsed the idea of introducing programming concepts at an early age. Students who had early exposure reported sustained interest and higher achievement in related coursework throughout their studies. They also noted that this early familiarity allowed them to focus more deeply on advanced topics during their degree. The lecturers, drawing on their professional experience, similarly emphasized that early engagement with Computer Science tends to foster long-term motivation and academic success.
Another important methodological consideration concerns the initial objective of comparing textbooks in several languages. This goal proved challenging due to the scarcity of high-quality professional literature in some of these languages; the available materials were often of insufficient quality for reliable analysis. Given that the text-cleaning process acts as a bottleneck in the workflow, the quality of source data has a substantial impact on all subsequent stages. As a result, this research focused on languages for which adequate materials were available. The question of incorporating additional languages remains open, and future comparative studies with English will require source material of comparable quality and diversity.

9. Conclusions

This study explored the relationship between Mathematical Thinking and Computational Thinking through two complementary methodologies: AI-based clustering of meta-language in disciplinary texts and qualitative content analysis of semi-structured interviews with students and lecturers. While the findings reveal potentially meaningful patterns—such as overlaps in perceived course similarities and meta-linguistic structures—they may be viewed as hypotheses but not necessarily as definitive conclusions. Given the exploratory and interpretive nature of the research, the next logical step is to design a quantitative study to test these hypotheses in a controlled and statistically rigorous manner.
According to the interviews, one of the major conclusions is that students consistently place a higher priority on Computer Science courses than on Mathematics courses. This tendency reflects a pragmatic approach, where students perceive Computer Science as more directly relevant to their future careers, offering practical skills and immediate applicability. In contrast, Mathematics courses are often viewed as tools to be utilized rather than disciplines to be mastered. Despite this perception, the data gathered from the interviews clearly demonstrate that a strong mathematical foundation is necessary for excelling in many core areas of Computer Science, including algorithm design, data structures, cryptography, machine learning, and computational complexity.
This discrepancy between perceived utility and actual necessity reveals a significant educational challenge: how can educators bridge the motivational gap and help students appreciate the intrinsic value of Mathematics? One approach is to reframe the presentation of mathematical content through the lens of Computer Science, leveraging a meta-language that students are already comfortable with. For example, if a student is fluent in the conceptual framework of algorithmic thinking but unfamiliar with the structures of combinatorics, then presenting combinatorics in the form of algorithmic processes may help lower the cognitive barrier. This strategy aligns with recent advances in AI, which enable dynamic reformulation of course materials. By using AI tools to translate mathematical concepts into familiar linguistic and structural patterns, the learning experience can be personalized to promote deeper engagement.
The discussion with students and lecturers further highlighted that students are more likely to invest effort and exhibit curiosity when they feel confident in their understanding of the subject’s language and logic. Thus, making the meta-language of Mathematics more accessible—particularly by drawing parallels with well-understood Computer Science domains—could be a powerful strategy to improve motivation and learning outcomes. Lecturers also noted that when students succeed in one domain, they are more open to exploring adjacent fields, especially if they recognize a linguistic or conceptual bridge between them.
Based on these insights, several conclusions can be drawn. First, the disconnect between students’ preferences and their actual needs in interdisciplinary learning should be actively addressed through curriculum design. Second, promoting Mathematical Thinking should not rely solely on emphasizing its utility for Computer Science; rather, it should highlight its intellectual beauty and interconnectedness with computing disciplines. Third, the adaptation of teaching materials through AI-driven meta-language transformation represents a promising direction for educational innovation. Such an approach could not only enhance comprehension in challenging subjects but also foster a more integrated view of knowledge across disciplines.
Our ongoing research aims to empirically test this hypothesis by designing an experimental framework where mathematical topics are reformulated using the meta-languages of adjacent Computer Science domains. This method will be examined to determine whether it leads to improved student performance and engagement, especially in areas where students have previously struggled. If successful, this approach could serve as a scalable model for improving interdisciplinary education through adaptive language strategies and personalized learning technologies.
In future work, expanding the analysis to additional languages remains a priority. This will require securing high-quality, domain-relevant educational resources in those languages to ensure comparability with the English corpus and to maintain methodological rigor.
By treating the results of this study as a basis for further inquiry, we aim to contribute to the broader understanding of how meta-language, cognition, and disciplinary structure intersect—and how this intersection may be leveraged to improve STEM education.

Author Contributions

Conceptualization, E.K., D.L. and M.G.; methodology, E.K. and D.L.; software, E.K. and D.L.; validation, E.K. and D.L.; formal analysis, E.K. and D.L.; investigation, E.K. and D.L.; resources, E.K.; data curation, E.K.; writing—original draft preparation, E.K., D.L. and M.W.C.; writing—review and editing, E.K., D.L. and M.W.C.; visualization, E.K., D.L. and M.W.C.; project administration, E.K. and D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was approved by the Ethics Committee of the Braude College of Engineering, protocol code 2023-004 approved on 5 June 2023.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to confidentiality issues.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MT: Mathematical Thinking
CT: Computational Thinking
AI: Artificial Intelligence
CCONJ: Coordinating Conjunction
SCONJ: Subordinating Conjunction
PART: Particle
ADP: Adposition
PRON: Pronoun
ADV: Adverb
PAM: Partitioning around Medoids
OCR: Optical Character Recognition
NLP: Natural Language Processing
OOP: Object-Oriented Programming
GMMs: Gaussian Mixture Models
D: Domain
R: Reference
NFKD: Normalization Form Compatibility Decomposition
NER: Named Entity Recognition
POS: Part-of-Speech
TF-IDF: Term Frequency–Inverse Document Frequency

Appendix A. F1-Score Calculation

A comparative analysis of the two clustering approaches described above is performed as follows:
Define
  • TP (True Positive): A pair of objects $i, j$ belongs to the same cluster and $M[i,j] + M[j,i] \neq 0$.
  • FN (False Negative): A pair of objects $i, j$ belongs to the same cluster, but $M[i,j] + M[j,i] = 0$.
  • FP (False Positive): $M[i,j] + M[j,i] \neq 0$, but the objects $i$ and $j$ are in different clusters.
  • TN (True Negative): A pair of objects $i, j$ belongs to different clusters and $M[i,j] + M[j,i] = 0$.
Here, M denotes the confusion matrix.
Then, the standard evaluation metrics are defined as follows:
  • $\text{Precision} = \dfrac{TP}{TP + FP}$
  • $\text{Recall} = \dfrac{TP}{TP + FN}$
  • $F_1\text{-score} = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
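These pairwise definitions translate directly into code. The sketch below is illustrative: it assumes `clusters` is a list of cluster labels (one per object) and `M` is the matrix referenced above, indexed by object pairs.

```python
def pairwise_f1(clusters, M):
    """Pairwise TP/FN/FP/TN counts and the resulting precision, recall, and F1."""
    n = len(clusters)
    tp = fn = fp = tn = 0
    for i in range(n):
        for j in range(i + 1, n):
            same = clusters[i] == clusters[j]      # same cluster?
            linked = (M[i][j] + M[j][i]) != 0      # nonzero M-link between i and j?
            if same and linked:
                tp += 1
            elif same:
                fn += 1
            elif linked:
                fp += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

Each unordered pair of objects is counted exactly once, so the four counts partition all $\binom{n}{2}$ pairs, as the definitions require.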

References

  1. Shute, V.J.; Sun, C.; Asbell-Clarke, J. Demystifying Computational Thinking. Educ. Res. Rev. 2017, 22, 142–158. [Google Scholar] [CrossRef]
  2. Wing, J.M. Computational Thinking. Commun. ACM 2006, 49, 33–35. [Google Scholar] [CrossRef]
  3. Mason, J.; Burton, L.; Stacey, K. Thinking Mathematically, 2nd ed.; Pearson Higher Education: London, UK, 2011. [Google Scholar]
  4. De Saussure, F. Course in General Linguistics; Bally, C., Sechehaye, A., Eds.; Open Court Publishing: Chicago, IL, USA, 1916. [Google Scholar]
  5. Heidegger, M. Being and Time; Macquarrie, J.; Robinson, E., Translators; Harper & Row: New York, NY, USA, 1962. [Google Scholar]
  6. Chomsky, N. Language and Mind, 3rd ed.; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
  7. Tarski, A. The Semantic Conception of Truth and the Foundations of Semantics. Philos. Phenom. Res. 1944, 4, 341–376. [Google Scholar] [CrossRef]
  8. Gruber, M. Alfred Tarski and the “Concept of Truth in Formalized Languages”: A Running Commentary with Consideration of the Polish Original and the German Translation; Springer: Cham, Switzerland, 2016; Volume 39. [Google Scholar]
  9. Richter, F. Logic, Language, and Calculus. arXiv 2020, arXiv:2007.02484. [Google Scholar] [CrossRef]
  10. Moggi, E. Metalanguages and Applications. In Semantics and Logics of Computation; Pitts, A., Dybjer, P., Eds.; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
  11. Weintrop, D.; Beheshti, E.; Horn, M.; Orton, K.; Jona, K.; Trouille, L.; Wilensky, U. Defining Computational Thinking for Mathematics and Science Classrooms. J. Sci. Educ. Technol. 2016, 25, 127–147. [Google Scholar] [CrossRef]
  12. Cheng, J. Data-Mining Research in Education. arXiv 2017, arXiv:1703.10117. [Google Scholar] [CrossRef]
  13. Hand, D.J. Principles of Data Mining. Drug Saf. 2007, 30, 621–622. [Google Scholar] [CrossRef] [PubMed]
  14. Cohen, I.; Huang, Y.; Chen, J.; Benesty, J. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing; Kuo, T.-C., Ed.; Springer: New York, NY, USA, 2009; pp. 1–4. [Google Scholar]
15. Van Dongen, S.; Enright, A.J. Metric Distances Derived from Cosine Similarity and Pearson and Spearman Correlations. arXiv 2012, arXiv:1208.3145.
16. Broder, A.; Glassman, S.; Manasse, M.; Zweig, G. Syntactic Clustering of the Web. Comput. Netw. ISDN Syst. 1997, 29, 1157–1166.
17. Salton, G.; Buckley, C. Term-Weighting Approaches in Automatic Text Retrieval. Inf. Process. Manag. 1988, 24, 513–523.
18. Le, Q.V.; Mikolov, T. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 21–26 June 2014.
19. Kapp-Joswig, J.-O.F.; Keller, B.G. Clustering—Basic Concepts and Methods. arXiv 2022, arXiv:2212.01248.
20. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
21. Research Data Pod. Paper Reading: XLNet Explained. Available online: https://researchdatapod.com/paper-reading-xlnet-explained/ (accessed on 21 September 2024).
22. Vijayarani, S.; Janani, S. A Survey on Text Classification Algorithms. Int. J. Comput. Sci. Inf. Technol. 2016, 7, 480–483.
23. Hossain, M.I.; Khan, H.U.; Irfan, M.S.; Basir, M.R. A Survey on Text Classification Algorithms from Text to Predictions. Int. J. Innov. Res. Comput. Commun. Eng. 2019, 13, 83.
24. Roy, S. A Survey on Text Classification from Traditional to Deep Learning. J. Comput. Technol. 2020, 13, 31.
25. Cortiz, D. Exploring Transformers in Emotion Recognition: A Comparison of BERT, DistilBERT, RoBERTa, XLNet and ELECTRA. arXiv 2021, arXiv:2104.02041.
26. Berelson, B. Content Analysis in Communication Research; Free Press: Glencoe, IL, USA, 1952.
27. O’Connor, H.; Gibson, N. A Step-by-Step Guide to Qualitative Data Analysis. Pimatisiwin 2003, 1, 63–90.
28. Mayring, P. Qualitative Content Analysis. Companion Qual. Res. 2004, 1, 159–176.
29. Zhang, Y.; Wildemuth, B.M. Qualitative Analysis of Content. In Applications of Social Research Methods to Questions in Information and Library Science; Wildemuth, B.M., Ed.; Libraries Unlimited: Westport, CT, USA, 2009; pp. 1–12.
30. Creswell, J.W.; Plano Clark, V.L. Designing and Conducting Mixed Methods Research, 3rd ed.; SAGE Publications: Thousand Oaks, CA, USA, 2021.
31. Rambally, G. The Synergism of Mathematical Thinking and Computational Thinking. In Cases on Technology Integration in Mathematics Education; Polly, D., Ed.; IGI Global: Hershey, PA, USA, 2016; pp. 416–437.
32. Kaufmann, O.T.; Stenseth, B. Programming in Mathematics Education. Int. J. Math. Educ. Sci. Technol. 2020, 52, 1029–1048.
Figure 1. Pipeline of the Data Mining stage.
Figure 2. Pipeline of the whole process.
Figure 3. Confusion matrix of applying the XLNet model.
Table 1. Datasets used in preprocessing.
| Corpus | Contents | Size |
|---|---|---|
| D (Domain) | Domain textbooks/notes | 7382 docs, 4.8 M tokens |
| R (Reference) | 100,000 random English Wikipedia lines (20 May 2025 dump) | 3.4 M tokens |
Table 2. Modes of single-token filter.
| Mode | Frequency Check | Stored Form |
|---|---|---|
| soft | original word | original word |
| moderate | lemma | original word |
| hard | lemma | lemma |
Table 3. Hyper-parameters of text cleaning.
| Parameter | Value | Comment |
|---|---|---|
| top_k | 3000 | candidate n-grams |
| rel_boost | 80 | p_D / p_R threshold |
| abs_ref_max | 5 × 10⁻⁶ | max. frequency in R |
| common_thr | 4.0 | Zipf cut-off |
| wiki_lines | 300,000 | size of reference corpus |
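The parameters in Table 3 imply a simple decision rule for flagging domain-specific n-grams: rare in everyday English (Zipf frequency below common_thr), rare in the reference corpus R (relative frequency at most abs_ref_max), and over-represented in the domain corpus D by a factor of at least rel_boost. The paper does not publish its implementation; the Python sketch below is an assumption that merely wires these three checks together, with defaults taken from the table:

```python
# Hypothetical sketch of a domain-term filter driven by the Table 3
# parameters. The parameter names mirror the table; the selection
# logic itself is an assumption, not the authors' code.

def is_domain_term(p_d: float, p_r: float, zipf: float,
                   rel_boost: float = 80.0,
                   abs_ref_max: float = 5e-6,
                   common_thr: float = 4.0) -> bool:
    """Decide whether an n-gram counts as domain-specific.

    p_d  -- relative frequency in the domain corpus D
    p_r  -- relative frequency in the reference corpus R
    zipf -- Zipf frequency of the term in general English
    """
    if zipf >= common_thr:      # too common in everyday language
        return False
    if p_r > abs_ref_max:       # too frequent in the reference corpus
        return False
    # Over-representation ratio p_D / p_R; guard against p_r == 0.
    ratio = p_d / p_r if p_r > 0 else float("inf")
    return ratio >= rel_boost

# A rare, domain-heavy term passes; a common word does not.
print(is_domain_term(p_d=2e-4, p_r=1e-6, zipf=2.5))  # True
print(is_domain_term(p_d=2e-4, p_r=1e-4, zipf=5.1))  # False
```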
Table 4. Effect of the three cleaning modes.
Input: But then the sequence f_n(x) converges uniformly to f.

| Mode | Output |
|---|---|
| soft | but then the sequence converges uniformly |
| moderate | but then the sequence |
| hard | but then sequence |
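Read together, Tables 2 and 4 describe a per-token filter in which the frequency check runs on either the surface word or its lemma, and a surviving token is stored either verbatim or lemmatized. A minimal sketch, with a toy lemmatizer and a toy common-word vocabulary standing in for the real components (none of these names come from the paper):

```python
# Sketch of the three single-token filter modes from Tables 2 and 4.
# `lemma_of` and `common` are toy stand-ins for a real lemmatizer
# (e.g. spaCy or NLTK) and frequency test; the mode logic follows
# the tables: soft checks and stores the original word, moderate
# checks the lemma but stores the word, hard checks and stores the lemma.

def lemma_of(word: str) -> str:
    # Toy lemmatizer: strip a trailing plural/3rd-person "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def clean(tokens, mode, is_common):
    """Keep only tokens whose check-form passes `is_common`."""
    out = []
    for tok in tokens:
        check = tok if mode == "soft" else lemma_of(tok)   # frequency-check form
        store = lemma_of(tok) if mode == "hard" else tok   # stored form
        if is_common(check):
            out.append(store)
    return out

common = {"but", "then", "the", "sequence", "converges", "uniformly"}.__contains__
sent = ["but", "then", "the", "sequence", "converges", "uniformly", "to", "f"]
print(clean(sent, "soft", common))
# -> ['but', 'then', 'the', 'sequence', 'converges', 'uniformly']
```

With this toy vocabulary, soft mode reproduces the soft row of Table 4; the moderate and hard rows additionally depend on which lemmas the real frequency list contains.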
Table 5. Token statistics after each processing step.
| Stage | Tokens | % of Raw | % Markers Kept |
|---|---|---|---|
| Raw extraction | 4.80 M | 100 | 100 |
| Normalization | 4.24 M | 88.3 | 100 |
| Dictionary filter | 3.91 M | 81.5 | 99.8 |
| Domain masking | 2.80 M | 58.3 | 96.4 |
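The "% of Raw" column of Table 5 is simply each stage's token count divided by the raw count; a short script reproduces it from the table's own numbers:

```python
# Recompute the "% of Raw" column of Table 5 from the token counts.
stages = {
    "Raw extraction":    4.80e6,
    "Normalization":     4.24e6,
    "Dictionary filter": 3.91e6,
    "Domain masking":    2.80e6,
}
raw = stages["Raw extraction"]
for name, tokens in stages.items():
    # e.g. Normalization -> 88.3% of raw
    print(f"{name:18s} {100 * tokens / raw:5.1f}% of raw")
```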
Table 6. Outcomes of applying K-means clustering.
| Subject | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Operating Systems | 61 | 12 | 17 | 10 |
| Data Structures and Algorithms | 53 | 16 | 18 | 16 |
| Functional Programming | 18 | 26 | 52 | 4 |
| Imperative Programming | 35 | 17 | 38 | 10 |
| OOP | 32 | 26 | 22 | 21 |
| Automata and Computation Theory | 42 | 15 | 13 | 31 |
| Abstract Algebra | 17 | 42 | 16 | 25 |
| Analysis | 43 | 24 | 15 | 18 |
| Combinatorics and Probability Theory | 37 | 20 | 6 | 37 |
| Linear Algebra | 18 | 32 | 44 | 7 |
| Logic | 36 | 14 | 7 | 44 |
| Set Theory | 26 | 7 | 14 | 52 |
Table 7. Outcomes of applying PAM clustering.
| Subject | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Operating Systems | 63 | 23 | 9 | 5 |
| Data Structures and Algorithms | 59 | 10 | 30 | 2 |
| Functional Programming | 17 | 21 | 58 | 5 |
| Imperative Programming | 65 | 5 | 9 | 20 |
| OOP | 58 | 14 | 14 | 15 |
| Automata and Computation Theory | 20 | 7 | 63 | 10 |
| Abstract Algebra | 16 | 67 | 9 | 8 |
| Analysis | 8 | 9 | 8 | 75 |
| Combinatorics and Probability Theory | 62 | 12 | 14 | 12 |
| Linear Algebra | 15 | 65 | 10 | 10 |
| Logic | 17 | 10 | 52 | 21 |
| Set Theory | 30 | 9 | 51 | 10 |
Table 8. Outcomes of applying density clustering.
| Subject | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Operating Systems | 41 | 18 | 8 | 33 |
| Data Structures and Algorithms | 30 | 29 | 7 | 34 |
| Functional Programming | 18 | 49 | 2 | 31 |
| Imperative Programming | 27 | 34 | 8 | 31 |
| OOP | 31 | 35 | 5 | 29 |
| Automata and Computation Theory | 11 | 58 | 2 | 29 |
| Abstract Algebra | 4 | 79 | 9 | 9 |
| Analysis | 19 | 43 | 2 | 35 |
| Combinatorics and Probability Theory | 11 | 62 | 2 | 26 |
| Linear Algebra | 2 | 87 | 11 | 1 |
| Logic | 13 | 61 | 1 | 25 |
| Set Theory | 16 | 52 | 14 | 19 |
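Tables 6–8 report, for each subject, how its text samples distribute over four clusters. Given vector representations of the documents, a table of this shape can be produced with an off-the-shelf clusterer. The sketch below uses scikit-learn's KMeans over TF-IDF vectors purely as an illustration; the paper's actual features, corpus, and settings are not reproduced here, and the toy documents and labels are invented:

```python
# Sketch: cluster document vectors with K-means and tabulate, per
# subject, how many documents fall into each cluster (the shape of
# Tables 6-8). The corpus and subject labels are toy stand-ins.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "a group is a set with an associative operation and inverses",
    "the scheduler assigns each process a time slice on the cpu",
    "every vector space has a basis and a well defined dimension",
    "a mutex protects shared memory from concurrent access",
]
subjects = ["Abstract Algebra", "Operating Systems",
            "Linear Algebra", "Operating Systems"]

X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Count cluster membership per subject, as in Tables 6-8.
table = {}
for subj, lab in zip(subjects, labels):
    table.setdefault(subj, Counter())[lab] += 1
for subj, counts in table.items():
    print(subj, dict(counts))
```

Swapping in PAM, density-based, or Gaussian-mixture clustering only changes the `fit_predict` step; the per-subject tabulation stays the same.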
Table 9. Interview questions.
For lecturers only:
- Gender
- Educational background
- Teaching experience
- Applied IT experience

For students only:
- Gender
- Level and quality of prior mathematical knowledge at school
- Level and quality of prior computer science knowledge at school
- At what age were you first exposed to programming?
- Assess your interest in mathematics and computer science at the beginning of your college studies, and re-evaluate it as you approach the end of your studies.
1. From the following list of courses, which ones did you include in your specialization? If you had to divide them into exactly two categories, which courses would you place in each category? Are there any courses that seem “similar” to you? Please state explicitly why you consider them similar.
The list is Linear Algebra, Calculus 1, Combinatorics and Probability, Logic, Discrete Mathematics 1, Abstract Algebra, Introduction to System Programming, Data Structures and Algorithms, Java Programming, Automata and Computation Theory, Operating Systems, and Programming Languages.
2. If you can divide all the courses from the list into categories of “similar courses” (not necessarily only two categories), is the division different from the previous question? By what criteria did you divide? Could it be related to their meta-languages?
3. Consider two courses (not necessarily from the given list) that particularly interested you during your studies. What do you think about the differences and similarities between these courses?
4. How would you explain mathematical thinking to someone? What is specific to this type of thinking? And what about computational thinking?
5. What do you think are the unique properties of mathematical thinking and computational thinking that differentiate them/that make them similar?
6. Would you rather receive a description of a task to be performed as a list of requirements or as pseudocode? What is the reason?
7. Would you rather have a mathematical theorem's proof presented as a formal proof or as a textual explanation? What is the reason?
8. How does the development of mathematical thinking help develop computational thinking? How does the development of computational thinking help develop mathematical thinking? What age is appropriate to begin teaching these two types of thinking?
9. Does our college’s software engineering curriculum follow the proper sequence for the subjects of computer science and mathematics? If not, what can be improved to help students get the tools required to develop mathematical and computer science thinking?
Table 10. Themes and their relations to categories and selected quotes.
Table 10. Themes and their relations to categories and selected quotes.
Theme 1: Students and lecturers categorize courses for different reasons, with students emphasizing practicality.
Relation to categories:
Socio-demographics.
Similarities and differences between studied courses.
Selected quotes:
Students:
“I will divide the courses according to what is useful for work and less useful for work.”
“Courses that are more theoretical and I didn’t get to meet them at work.”
“I combine the logic course with computer science courses because it develops the type of thinking I require for my work.”
Lecturers:
“The third group includes courses that can be taught both mathematically and in the computer science style. For instance, a logic course.”
“These are fewer engineering courses, more mathematical.”
“All the courses in this group are actually from the field of discrete mathematics.”
Theme 2: Both students and lecturers divide courses into groups close to the clustering division based on meta-language similarity, but they do not regard this similarity as their reason for the division.
Relation to categories:
Similarities and differences between studied courses.
Selected quotes:
“I don’t think the courses I was interested in have a similar structure of their text.”
“Courses I put in this group differ in the structure of the proofs.”
Theme 3: The most important components of computational thinking are engineering thinking and algorithmic thinking for finding solutions. Mathematical thinking requires precision and is more about formulating problems than solving them.
Relation to categories:
The properties and definition of computational thinking.
The properties and definition of mathematical thinking.
Selected quotes:
Students:
“Computational thinking is the ability to solve problems by, sometimes, using mathematical tools.”
“Mathematical thinking involves the ability to translate a problem from one’s mind into formal, precise form.”
Lecturers:
“Mathematicians formulate problems.”
“Computational thinking is the solution of precisely formulated problems. And this is an engineering approach.”
“Mathematical thinking is characterized by a set of well-defined rules and definitions.”
“Computational thinking involves analytical calculations and the development of algorithms. It is also akin to engineering thinking.”
Theme 4: Computational thinking ability cannot exist without mathematical thinking ability, but it is possible that, due to excessive interest in computer science courses, interest in mathematics decreases, leading to academic failures.
Relation to categories:
Computational thinking skills.
Mathematical thinking skills.
The relationship between two types of thinking.
Selected quotes:
Students:
“I’m fine with math; I just didn’t have time to invest in it during my degree.”
“I feel that a solid mathematical foundation significantly has helped me to succeed in computer science courses.”
“Because I was deeply immersed in computer science, I ended up neglecting math.”
Lecturers:
“Based on my more than 20 years of experience as a lecturer, I’ve observed that students who excel in computer science tend to have mathematical thinking ability.”
“Computer science field is derived from mathematics, and it’s inconceivable that successful computer science students lack mathematical thinking.”
Theme 5: Children should be introduced to programming from a young age. For students who have been exposed to it early on, their interest and success tend to increase throughout their studies.
Relation to categories:
Socio-demographics.
Computational thinking skills.
Selected quotes:
Students:
“I was introduced to programming at age 7, and my interest grew during my studies.”
“I was introduced to programming during my school years. In college, I was able to tackle complex subjects that had previously sparked questions in my mind.”
Lecturers:
“Computational thinking should be cultivated from an early age, beginning in school. This approach shaped my educational journey, and by the time I pursued my degree, I had a clear understanding of my academic interests.”
“My son, who is 7 years old, is enrolled in enrichment classes focused on computational thinking at school. I observe that these classes are contributing positively to his development.”
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kramer, E.; Lamberg, D.; Georgescu, M.; Weiss Cohen, M. Integrating AI with Meta-Language: An Interdisciplinary Framework for Classifying Concepts in Mathematics and Computer Science. Information 2025, 16, 735. https://doi.org/10.3390/info16090735
