Article

Edge Convolutional Networks for Style Change Detection in Arabic Multi-Authored Text

by Abeer Saad Alsheddi 1,2,* and Mohamed El Bachir Menai 1
1 Department of Computer Science, King Saud University, Riyadh 11451, Saudi Arabia
2 Computer Science Department, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6633; https://doi.org/10.3390/app15126633
Submission received: 13 May 2025 / Revised: 3 June 2025 / Accepted: 6 June 2025 / Published: 12 June 2025
(This article belongs to the Special Issue New Trends in Natural Language Processing)

Abstract

The style change detection (SCD) task aims to find the positions of authors’ style changes within multi-authored texts. It has several application areas, such as forensics, cybercrime, and literary analysis. Since 2017, SCD solutions for English have been actively investigated. However, to the best of our knowledge, this task has not yet been investigated for Arabic text. Moreover, most existing SCD solutions represent the boundaries surrounding segments by concatenating those segments. This shallow concatenation may lose the style patterns within each segment and also increases input length, while several embedding models restrict input lengths. This study seeks to bridge these gaps by introducing an Edge Convolutional Neural Network for the Arabic SCD task (ECNN-ASCD). It represents boundaries as standalone learnable parameters across layers based on graph neural networks. ECNN-ASCD was trained on an Arabic dataset containing three classes of instances according to difficulty level: easy, medium, and hard. The results show that ECNN-ASCD achieved high F1 scores of 0.9945, 0.9381, and 0.9120 on easy, medium, and hard instances, respectively. The ablation experiments demonstrated the effectiveness of the ECNN-ASCD components. As the first publicly available solution for Arabic SCD, ECNN-ASCD opens the door to more active research on this task and contributes to boosting research in Arabic NLP.

1. Introduction

With the rapid development of Artificial Intelligence technologies and the availability of textual data, the demand has increased to differentiate between the styles of authors who collaboratively write a document. The style change detection (SCD) task focuses on finding the positions of writing style changes in multi-authored documents [1,2,3,4,5,6,7]. SCD plays a pivotal role in many practical applications. In the authorship attribution task, detecting the positions of style changes can help segment a document before mapping each segment to its author [8]. In legal contexts, the demand for forensic linguistics and law enforcement applications motivates work on this task [8,9]. Analyzing the linguistic cues of legal documents, such as threatening letters or ransom notes, can provide valuable insights into potential style changes. In security, analyzing writing styles and detecting changes in sensitive documents can help find unauthorized modifications. In plagiarism detection, SCD-based solutions can flag potential plagiarism cases by identifying changes in writing style without comparing the suspected and source documents. In historical documents, such solutions can help identify literary plagiarism [10]. In commerce, SCD-based solutions can help evaluate the consistency of writing assistance tools and guide proofreaders to improve the coherence of texts with the fewest possible breaks in their flow [8]. SCD-based solutions can also help institutions adhere to a specific style in their documents, maintaining credibility.
The SCD task is language-specific. For the English language, the annual organization of the PAN competitions (https://pan.webis.de, accessed on 30 April 2025) has become an essential event that encourages researchers to develop state-of-the-art (SOTA) solutions on English datasets [1,2,3,4,5,6,7,11]. To the best of our knowledge, there is no available SCD solution for the Arabic language.
Since 2020, most SCD solutions have detected style changes using pretrained models to embed inputs and Fully Connected (FC) layers for classification [4,5,6,7,11]. This approach represents any boundary surrounding two segments by concatenating these segments into a single input and subsequently classifying it. However, this approach may lose the contextual information and distinct style patterns present within each segment. The concatenation may also lead to potential confusion at style boundaries, where one style ends and another begins. Moreover, concatenating two segments produces a longer input, while most pretrained models restrict input lengths. Therefore, another approach that preserves the boundaries of each segment is worth investigating.
In addition, comprehending relationships between textual segments, such as words and sentences, would enhance the detection of writing styles. Graph-based solutions take a graph as input, trying to involve structural properties within the data. Graph Neural Networks (GNNs) extend the existing neural networks to operate on graph-structured data directly [12]. Recently, GNN models have achieved promising results for some Natural Language Processing (NLP) tasks, such as an authorship verification task that determines whether an unknown text was written by a specific author [13,14,15] and semantic relationship tasks that analyze semantic relations between textual segments [16,17,18,19,20,21,22,23,24,25,26,27,28]. To the best of our knowledge, no existing GNN-based solution is available for the SCD task.
In this paper, we introduce the Edge Convolutional Neural Network for the Arabic SCD task (ECNN-ASCD) based on GNNs to detect writing style changes in Arabic texts. ECNN-ASCD is considered the first solution to tackle the SCD task in Arabic texts to the best of our knowledge. It was trained particularly to learn the characteristics of Arabic writing styles. ECNN-ASCD serves as a baseline, encourages researchers to solve this task, and contributes to boosting research in Arabic NLP. Moreover, boundary representations in ECNN-ASCD help preserve the style within each segment surrounding these boundaries. The boundary representations are learnable and adjusted across layers. These representations are updated based on representations of surrounding segments and boundaries extracted from previous layers.
The remainder of this paper is organized as follows: Section 2 defines the SCD task and reviews the characteristics of the Arabic language related to the SCD task. Section 3 provides a detailed overview of the SOTA solutions. Section 4 describes the proposed solution. Section 5 presents the experimental evaluation and discusses the results. Section 6 summarizes the findings and suggests further directions.

2. Background

This section presents the definition of the SCD task based on the annual PAN competitions. It then briefly reviews the characteristics of Arabic texts regarding the SCD task.

2.1. Task Definition

The SCD task was introduced by PAN and belongs to the multi-author analysis tasks in NLP. It requires finding the locations of authors’ writing style changes in a multi-authored text. This task decomposes a multi-authored document into its authorial components by identifying the positions of style changes [1]. An authorial component contains one or more textual segments, such as one paragraph or consecutive paragraphs, written by the same author.
Table 1 shows examples of writing style changes and provides the expected output of the SCD task. Document 1 is a single-authored document, while Document 2 is a multi-authored document written by two authors with two style changes. The SCD outcome is a set of binary values associated with the boundary positions at a specific level. The outcomes in the table are given at the paragraph and sentence levels. A boundary with value 1 signifies a change between the textual segments around it, while value 0 signifies no change. As a result, the length of the output equals the number of textual segments minus one. PAN 2017 [1] required determining changes at the sentence level, while the paragraph level dominated the subsequent editions, PAN 2020 to PAN 2024 [4,5,6,7,11]. This study focuses on the paragraph level.
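For concreteness, the following minimal sketch (with hypothetical paragraphs, not taken from any dataset) illustrates the expected output format at the paragraph level:
```python
# Minimal illustration of the SCD output format at the paragraph level.
# The paragraphs and labels are hypothetical, not taken from any dataset.
document = [
    "First paragraph, written by author A.",
    "Second paragraph, also by author A.",
    "Third paragraph, written by author B.",
]

# One binary value per boundary between consecutive paragraphs:
# 1 = the writing style changes at this boundary, 0 = it does not.
expected_output = [0, 1]

assert len(expected_output) == len(document) - 1
```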

2.2. Arabic Language

Arabic is a Semitic language [29]. It differs from Indo-European languages alphabetically, morphologically, and syntactically. From an alphabetical perspective, there are twenty-eight Arabic letters. They have contextual variants that can be presented in different shapes depending on their position in words. For example, the letter “ع” (in this paper, an Arabic word is represented in some or all of three variants according to context: “Arabic word” ([Buckwalter Arabic transliteration [30]: English translation)) can be represented as “عـ” if the letter appears at the beginning of a word like “علم” ([Elm]: know), “ـعـ” if it appears in the middle like “العلم” ([AlElm]: the knowledge), “ـع” if it is at the end like “قطع” ([qTE]: cut), or “ع” if it appears independent of other letters like “باع” ([bAE]: sold). Moreover, Arabic does not support capitalization, which increases the difficulty of some information extraction and understanding tasks, including SCD.
From a morphology perspective, the Arabic templatic morphology interweaves patterns and affixes with roots [31]. For instance, the word “الطلاب” ([AlTlAb]: the students) was coined starting from the root “طلب” ([Tlb]: request). Then the pattern “فعال” ([fEAl]) is applied to produce “طلاب” ([TlAb]: students). Finally, the prefix “ال” ([Al]: the) precedes the pattern to obtain the final word “الطلاب.” This structure differs from that of concatenative languages such as English, where the phrase “the students” is constructed by concatenating “the” and the suffix “s” with the word “student”. In some cases, a single Arabic word represents an English sentence. For example, the Arabic word “فأسقيناكموه” ([f>sqynAkmwh]: then we give it to you to drink) requires eight English words to translate.
From a syntactic perspective, several Arabic grammatical rules differ from English ones. Arabic is a relatively free word-order language [29]. The order of a sentence can be verb–subject–object, subject–verb–object, or object–verb–subject. Unlike English, Arabic sentences enforce syntactic agreement: a noun and its modifiers must correspond in gender, number, and definiteness. In the case of two items, the quantifier and the noun must agree in gender. For example, “طالبتان اثنتان” ([TAlbtAn AvntAn]: two female students) is grammatically correct because the word “اثنتان” ([AvntAn]: two female) and the word “طالبتان” ([TAlbtAn]: two female students) are both feminine. In the case of three items, the quantifier and the noun must disagree in gender. For example, the word “ثلاث” ([vlAv]: three) and the word “طالبات” ([TAlbAt]: female students) in the phrase “ثلاث طالبات” ([vlAv TAlbAt]: three female students) disagree in gender. It can also be noted that Arabic supports dual forms through a dual suffix, as in “طالبتان”, whereas English has no dual form and expresses duality with separate words, as in “two students”.
These Arabic characteristics influence the way authors express their thoughts within linguistic structures. Each language has literary traditions that can inspire authors and influence their writing style. Arabic poetry, for instance, has a unique tradition with distinct meters that differ from those of English poetry; it is more symbolic, incorporates more complex metaphors, and follows more intricate rhyme patterns than English poetry [32]. Table 2 presents an Arabic example for the SCD task. The texts are shown in cursive script, written from right to left. There are significant differences between the two authors’ sentences in terms of length and structure, and the example contains style changes analogous to the English example in Table 1.

3. Related Work

Since 2017, several solutions have been proposed to address the SCD task, particularly for the English language. According to the literature review conducted in this study, four prediction methods have been adopted: statistical, classical machine learning (ML), Deep Neural Network (DNN), and hybrid methods. These methods are reviewed in the following subsections.

3.1. Statistical-Based Methods

These methods involve selecting handcrafted features from the given documents, such as n-grams, part-of-speech tags, and punctuation marks. The selected features are then processed to predict style changes. Khan [33,34] studied different features and defined a measure for each type, including the ratio of characters to spaces and a commonality index. A similarity score was then calculated by counting the common features between documents, assuming a style change if the score was less than a similarity threshold. Karas et al. [35] adopted a distribution test, the Wilcoxon Signed Rank Test [36], to predict style changes. This test indicates no change in style if two documents are derived from the same distribution; a change in style places the two documents in different distributions. However, its results were inaccurate for single-author documents because the test assumed that every document already included a style change [35].

3.2. ML-Based Methods

Classical ML methods fall into two classes: supervised and unsupervised. In supervised methods, different classifiers were investigated, including logistic regression [37], random forest [38], and Support Vector Machine [39] classifiers. Handcrafted-feature-based solutions were developed using one of these three classifiers [40,41,42,43] or a weighted sum of them [44]. Nath [42] observed that the PAN 2018 dataset documents with a long length (≥4786 characters) contain style changes. This observation was then used to train an additional classifier [42]. Despite this straightforward observation, the solution trailed the top solution in Nath’s experiments by only 0.05 in accuracy [42]. In addition, Zlatkova et al. [45] constructed a stacking ensemble of several ML algorithms to benefit from the characteristics of each algorithm [45] and trained logistic regression classifiers as a meta-learner. In unsupervised methods, clustering algorithms were adopted to group documents into clusters according to the similarity of their writing styles. Documents with close similarity scores were grouped into one cluster, indicating that the same author wrote them. The K-means clustering algorithm [46] was the most common algorithm used along with handcrafted features [10,47,48,49]. In contrast, Nath [50] and Castro-Castro et al. [51] designed their own clustering algorithms. Although different similarity functions have been studied, the cosine similarity function is used the most: it outperformed the Manhattan and Matusita functions [52] and the Jaccard and Dice functions [47].

3.3. DNN-Based Methods

Long Short-Term Memory (LSTM) [53] and Convolutional Neural Network (CNN) [54] models were investigated for the SCD task. Deibel and Löfflad [55] developed a BiLSTM-based solution to tackle the SCD task at the sentence level. Schaetti [56] and Müller [57] developed CNN-based solutions to extract features. Nath [42,52] investigated GloVe embeddings [58] and developed a BiLSTM-based Siamese NN solution. A Siamese NN contains two identical neural subnetworks working in parallel and connected at their outputs [59]. In Nath’s experiments, the BiLSTM-based solution outperformed BiGRU-based and logistic regression-based solutions. However, Nath observed that increasing the number of units or layers consumed additional running time without yielding better results [42,52]. Hosseinia and Mukherjee [60] developed an LSTM-based Siamese NN solution involving an attention mechanism. The difference between its subnetworks lies in how they receive the input: the first subnetwork receives the document in its original direction, while the second receives it in the reverse direction, similar to the backward pass of a BiLSTM architecture [60]. The similarity functions in Siamese NN-based solutions [42,52,60] measure the difference between the subnetwork outputs, followed by an FC layer to predict style changes. Using similarity functions instead of additional FC layers increased accuracy by 10% [60]. To further enhance performance, incorporating BERT [61] was investigated with CNN models [62] or with a BiLSTM followed by CNN models [63]. Zi and Zhou [63] claimed that the BiLSTM layer addresses the dependency between words in both directions, while the CNN layer reduces the features and avoids overfitting.

3.4. Hybrid-Based Methods

These methods incorporate two of the previous methods at different stages. For example, supervised and unsupervised methods were combined. Singh et al. [43] trained a logistic regression classifier to predict similarity scores and then applied hierarchical clustering algorithms to assign authors. Alvi et al. [40] trained a random forest classifier to predict the number of authors and then used the K-means clustering algorithm to assign authors. Liu et al. [64] relied on the K-means clustering algorithm to detect the positions of style changes and then adopted a BiGRU-based Siamese NN with attention mechanism layers to assign authors [64]. Moreover, embedding models for feature extraction were combined with statistical or ML methods to predict changes. Among statistical methods, Safin and Kuznetsova [65] vectorized documents using the Skip-Thoughts model [66]; a pairwise distance matrix was then constructed based on the average of the cosine similarity values [65]. Rodríguez-Losada and Castro-Castro [67] tried to capture semantic information at the sentence level instead of the word level by studying three Sentence-BERT-based pretrained models [68]. However, the average performance of these pretrained models did not exceed an accuracy of 63%; thus, they combined them with an additional representation based on handcrafted features and then measured similarity using the cosine similarity function [67]. Among ML methods, the investigated embedding models included FastText [69], BERT, DeBERTa [70], RoBERTa [71], ELECTRA [72], and mT0-xl [73]. Representing texts with BERT was common in 2020–2022 [74,75,76,77,78]. In 2021, Deibel and Löfflad [55] claimed that the information inferred from a few features may not be enough to develop their solution; thus, they averaged the representations of handcrafted features and the FastText model to tackle the SCD task at the document level before feeding them into an FC layer [55]. In 2022, Jiang et al. [79] investigated ELECTRA to address the missing masked language modeling task while fine-tuning BERT. In 2023, each of the DeBERTa-based [80,81], RoBERTa-based [82], and mT0-xl-based [83] solutions was compared in ablation experiments with a solution of similar architecture based on BERT. The results of these experiments showed that the BERT-based solutions performed worse than the other solutions [80,81,82,83].

4. Proposed Solution: ECNN-ASCD

This section briefly reviews GNNs and focuses on two relevant modules to this study, Graph Convolutional Networks (GCNs) and EdgeConv. The section then describes the development of ECNN-ASCD in two phases. The first phase is designing a representative graph to display relationships for the SCD task. The second phase is adapting an appropriate GNN module to overcome its existing limitations to solve the SCD task.

4.1. GNNs

In 2009, the GNN framework was introduced by applying a message-passing scheme to propagate information across nodes [12]. Figure 1 illustrates the general propagation framework, where the colors assigned to the nodes are utilized for visual separation to distinguish between individual nodes in the graph structure. The input graph represents different computation graphs. Each node defines its computation graph based on its neighborhoods. The computation graph of the target node A appears on the right side of Figure 1. The message-passing scheme can be summarized in three steps. First, each target node computes a message for its neighbors. The rectangles filled with slanted black lines in Figure 1 represent messages from one node to its neighbors within the computation graph. Second, the target node receives and aggregates the messages from its neighbors. Third, the target node updates its representation. The solid gray rectangle in Figure 1 symbolizes the aggregation operation in the GNNs. These steps are applied to all nodes iteratively through multiple computation layers. Each node in the first layer uses information received from its immediate neighbors. By stacking two layers, the subsequent layer incorporates information from the neighbors of neighbors (2-hop neighborhood) derived from the first layer. Thus, stacking N layers enables nodes to utilize information with the N-hop neighborhood. This iterative nature enhances the capturing of dependence patterns in the graph data.
Formally, let G(V, E) be a graph, where V and E are the sets of nodes and edges, respectively, and let X ∈ ℝ^{n×f} be a matrix containing representations of length f for the n nodes. Equation (1) shows the calculation of the representation of one node i ∈ V in the k-th layer. Different aggregation functions can be used, and choosing them is still critical and attracts significant attention from researchers [84]. For example, the first function, Agg1, can average the set of prior representations of the neighbors N(i) from the (k−1)-th layer, while the second function, Agg2, can sum the result of the first function and the representation of node i itself from the previous layer, k−1.
x_i^{(k)} = \mathrm{Agg}_2\left( x_i^{(k-1)},\ \mathrm{Agg}_1\left( x_{N(i)}^{(k-1)} \right) \right). (1)
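As an illustration of Equation (1), the following minimal sketch implements one generic message-passing step, assuming that Agg1 averages the neighbor messages and Agg2 adds the result to the node’s own previous representation; the toy graph and dimensions are illustrative only.
```python
import torch

def message_passing_layer(x_prev, neighbors):
    # x_prev:    (n, f) node representations from layer k-1
    # neighbors: dict mapping each node index to the list of its neighbor indices
    x_new = torch.empty_like(x_prev)
    for i in range(x_prev.size(0)):
        msgs = x_prev[neighbors[i]]      # messages from N(i) at layer k-1
        agg1 = msgs.mean(dim=0)          # Agg1: average the neighbor messages
        x_new[i] = x_prev[i] + agg1      # Agg2: combine with the node's own representation
    return x_new

# Toy path graph with three nodes: 0 - 1 - 2
x0 = torch.randn(3, 8)
nbrs = {0: [1], 1: [0, 2], 2: [1]}
x1 = message_passing_layer(x0, nbrs)     # representations at layer k
```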
Kipf and Welling [85] introduced GCNs as one of the early GNN models. Their message-passing scheme involves a symmetric-normalized aggregation and an element-wise nonlinear transformation, as shown in Equation (2). Initially, the messages from all the neighbors N(i) are symmetrically normalized by the degrees of both the neighbor node j ∈ N(i) and the target node i. Subsequently, these messages are aggregated using a weighted summation. Finally, the target node representations are updated by combining the aggregated messages with the current target node representations through a neural network transformation. While GCNs are widely used for diverse tasks [16,17,18,19,20,21,24,27,28], they encounter limitations that diminish their abilities. First, GCNs can handle edge weights within the adjacency matrices only as shallow edge representations, while edges can contain richer information. Second, the iterative propagation in deep GCN architectures results in nodes receiving similar messages. This similarity leads to shared representations across most nodes, less distinctiveness, and high convergence, known as the over-smoothing issue. Third, GCNs update node representations based on the degrees of the neighbor and target nodes; thus, GCNs are structure-dependent and may struggle with noisy graphs that have sparsely labeled edges.
x_i^{(k)} = \sum_{j \in N(i) \cup \{i\}} \frac{1}{\sqrt{\deg(i) \cdot \deg(j)}} \, W \, x_j^{(k-1)}. (2)
Several methodologies have been proposed to incorporate edge representations in GNNs. EdgeConv [86] emerged as one of the convolutional modules applied to three-dimensional data and point clouds. It attempts to benefit from both edge and node representations, as shown in Equation (3). The edge representations are passed in messages by measuring the difference between the representations of the target node i and the neighbor j ∈ N(i). These messages are summed with the target node representations via a neural network and subsequently aggregated. Three built-in aggregation operations are available in the EdgeConv module: “add”, “mean”, and “max”. The update step in EdgeConv is similar to that of GCNs, fusing the aggregated messages with the current target node representations. While EdgeConv deals with edge representations, it may still suffer from the over-smoothing issue. Table 3 summarizes the key symbols used in this paper.
x_i^{(k)} = \mathrm{Agg}_{j \in N(i)}\left( W \left( x_j^{(k-1)} - x_i^{(k-1)} \right) + W \, x_i^{(k-1)} \right). (3)
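For reference, the EdgeConv operation of Equation (3) is available in PyTorch Geometric; the following sketch instantiates it with an illustrative two-layer inner network and a toy path graph, and is not the exact configuration used in ECNN-ASCD.
```python
import torch
from torch import nn
from torch_geometric.nn import EdgeConv

in_dim, hidden_dim = 768, 384  # illustrative sizes

# The inner network is applied to [x_i, x_j - x_i] for every edge (i, j),
# and the resulting messages are aggregated over the neighbors of i.
edge_conv = EdgeConv(
    nn.Sequential(
        nn.Linear(2 * in_dim, hidden_dim),  # concatenated input, hence 2 * in_dim
        nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim),
    ),
    aggr="mean",  # built-in choices: "add", "mean", "max"
)

x = torch.randn(4, in_dim)                      # 4 nodes (e.g., paragraphs)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],  # undirected path 0-1-2-3
                           [1, 0, 2, 1, 3, 2]])  # (both directions listed)
x_out = edge_conv(x, edge_index)                # shape: (4, hidden_dim)
```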

4.2. Graph Design

This subsection describes how the documents are formulated as graphs in this study. A preprocessing step is applied to the input documents, as shown in Algorithm 1. This step includes segmenting texts into paragraphs to detect style changes at the paragraph level. The preprocessing also verifies document consistency by checking the paragraph counts against the labels. Algorithm 1 returns the embeddings of the input documents. This research uses pretrained models to represent paragraphs because they have yielded strong results in SOTA solutions. For example, 14 of the 15 solutions submitted to the latest edition, PAN 2024, were based on pretrained models and achieved a higher F1 score than the handcrafted-feature-based model submitted in the same year [11]. The SCD task deals with each document in the input data separately. In other words, the task does not look for a common writing style between two documents; thus, there is no direct relationship between any two documents, and each document can be treated as a disjoint subgraph. This separation helps decrease the complexity and sparsity of the graphs. Every subgraph corresponds to a document in the input data. Each node in a subgraph represents a paragraph in the document. This design preserves each paragraph’s individual representation, thereby preserving its contextual information. Edges connect preceding to succeeding paragraphs, capturing the sequential flow within the document. Each subgraph can thus be constructed as a path graph.
Algorithm 1 Preprocessing
Input: Docs: a dataset; B: boundaries’ list as ground-truth labels
Output: RepDoc: dataset representations at the paragraph level
1: RepDoc ← ∅
2: for all doc ∈ Docs do
3:     para ← segmenting_paragraphs(doc)
4:     para ← checkConsistency(para, B)        // compared to ground-truth information
5:     Rep_para ← pretrained_model(para)
6:     RepDoc ← RepDoc ∪ Rep_para
7: end for
8: Return RepDoc
Algorithm 2 shows the main design steps of the graph G(V, E). After applying the preprocessing step, all the nodes in a subgraph G_doc contain the representations of the paragraphs located in one document. Edges are added between every two consecutive nodes without initializing their representations. The ground-truth labels are assigned to the edges. The edges correspond to boundaries between paragraphs. Hence, classifying these edges determines whether the writing style changes between paragraphs.
Algorithm 2 Graph design
Input: Docs: a dataset; B: boundaries’ list as ground-truth labels
Output: G(V, E): a graph with nodes and edges
1: Docs ← Algorithm 1(Docs)
2: G ← ∅
3: for doc ← 1 to |Docs| do
4:     G_doc ← ∅
5:     Rep_doc ← extract_represent(doc)
6:     G_doc.add_nodes(Rep_doc)               // G_doc is updated with added nodes
7:     G_doc.add_consecutive_edges(doc)       // G_doc is updated with added edges
8:     for all e_ij ∈ E_doc do                // all edges in G_doc are mapped to labels
9:         if B[e_ij] = 0 then
10:            e_ij.label ← 0
11:        else
12:            e_ij.label ← 1
13:        end if
14:    end for
15:    G ← G ∪ G_doc
16: end for
17: Return G
It is noteworthy that the SCD task focuses on boundaries between consecutive paragraphs only. Other graph structures, such as a complete graph, connect all possible pairs of paragraphs, which does not accurately reflect the task. For example, Document 2 in Table 1 is represented in Figure 2 at the paragraph level using two graph structures: a path graph and a complete graph. The edges in the path graph are the original edges that express the ground-truth labels in a dataset. The additional edges in the complete graph are added only to illustrate the completeness concept. The bold edges hold trusted labels that can be determined directly from the ground-truth labels. However, the remaining edges in a complete graph may not hold trusted labels; these labels cannot be determined directly from the ground-truth labels, as shown by the dotted edges in Figure 2. The complete graph also raises an issue related to the label distribution: the distribution of binary labels in the complete graph may not correspond to the original distribution in the datasets. In Figure 2, the proportion of zero labels in the path graph is 1/3, while in the complete graph it is 1/6.
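A minimal sketch of this graph design, assuming PyTorch Geometric and placeholder inputs for the paragraph embeddings produced by Algorithm 1, is given below; it builds one path subgraph per document with the ground-truth labels attached to the edges.
```python
import torch
from torch_geometric.data import Data

def build_document_subgraph(paragraph_embeddings, boundary_labels):
    # paragraph_embeddings: (n_paragraphs, dim) tensor, one node per paragraph
    # boundary_labels:      list of n_paragraphs - 1 binary ground-truth labels
    n = paragraph_embeddings.size(0)
    src = torch.arange(0, n - 1)                 # edges connect consecutive paragraphs only,
    dst = torch.arange(1, n)                     # forming a path graph
    edge_index = torch.stack([src, dst], dim=0)
    edge_label = torch.tensor(boundary_labels, dtype=torch.float)
    return Data(x=paragraph_embeddings, edge_index=edge_index, edge_label=edge_label)

# Toy document with four paragraphs and one style change after the second paragraph.
emb = torch.randn(4, 768)
subgraph = build_document_subgraph(emb, [0, 1, 0])
```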

4.3. GNNs for the SCD Task

The graph edges proposed in Section 4.2 represent the writing styles, which need to be classified for the SCD task. Thus, EdgeConv was adapted to handle the edge representations. Figure 3 shows the ECNN-ASCD architecture. The input graph G^(0) has initial node representations X^(0) ∈ ℝ^{N×F^(0)} and an empty set of edge representations, where N×F^(0) is the shape of the node representation matrix. To extract more non-local features, the ECNN-ASCD architecture comprises four layers. They contain the same components, except for the first layer. Every layer k generates new edge representations E^(k) ∈ ℝ^{(N−1)×F^(k)}, where the number of edges equals the number of nodes minus one. Each layer’s node and edge representations are input to the subsequent layer. The symbol ⊕ indicates summation of the required information to obtain the edge representations. After passing through the four layers, the output graph G^(4) is obtained with node representations X^(4) ∈ ℝ^{N×F^(4)} and edge representations E^(4) ∈ ℝ^{(N−1)×F^(4)}. The ECNN-ASCD architecture ends with a single FC layer, which classifies the edge representations extracted from the last layer. Moreover, activation functions introduce non-linearities when mapping input values to the desired outputs during training, instead of being restricted to modeling only linear relationships. Two activation functions were adopted: ReLU and Sigmoid. ReLU involves a simple mathematical operation, a comparison to zero, which helps obtain faster training times in deep networks. Sigmoid is placed at the final layer, and a threshold of 0.5 rounds the outputs to 0 or 1.
The default EdgeConv module feeds the input node representations into a neural network. In ECNN-ASCD, two sequential linear layers are used as this neural network. The first layer concatenates node representations with those of their neighbors, which doubles the input size. The second layer maps the concatenated node representations to half of their size. Thus, the EdgeConv module reduces the input size. Therefore, FC layers are inserted in every ECNN-ASCD layer to apply a linear transformation to the representation vectors. The aggregation method in this EdgeConv module calculates the mean value of the transformed neighbor representations to update the target node representations. Algorithm 3 outlines the operations performed in every ECNN-ASCD layer. A layer k receives the graph G^(k−1) resulting from the previous layer. Its edge representations E^(k−1) are extracted, except that the first layer does not receive previous edge representations because the set is empty. The previous node representations X^(k−1) are fed into the EdgeConv module to generate the new node representations X^(k). The initial node representations X^(0) are added in each GNN layer to alleviate the over-smoothing issue. These representations are aggregated by summation. Thus, the edge representations in layer k are updated and adjusted across layers.
Algorithm 3 An ECNN-ASCD layer
Input: G^(k−1): the graph at layer k−1 with its node and edge representations; X^(0): the original node representations
Output: G^(k): the graph at layer k with updated node and edge representations
1: E^(k−1) ← extract_edge_representations(G^(k−1))
2: if E^(k−1) ≠ ∅ then
3:     E^(k−1) ← FC(E^(k−1))
4: end if
5: G^(k) ← EdgeConv(G^(k−1))
6: X^(k) ← extract_node_representations(G^(k))
7: X^(k) ← activation_fun(X^(k))            // G^(k) is updated with these node representations
8: X^(0) ← FC(X^(0))
9: for all e_ij ∈ E^(k−1) do
10:    e_ij^(k) ← e_ij^(k−1) + x_i^(k) + x_j^(k) + x_i^(0)
11: end for
12: E^(k) ← do_dropout(E^(k))               // G^(k) is updated with these edge representations
13: Return G^(k)
Equation (4) shows how the edge representations e_ij^(k) are updated, where σ(·) is a nonlinear activation function, Agg_k is the aggregation function of the k-th layer, x_i and x_j denote the representations of the end nodes i and j of edge e_ij (j ∈ N(i)), and W^k is a learnable parameter in layer k that adjusts the dimension of the output representation vectors. The edge representations in each layer are updated according to the new node representations in the same layer. The graph G^(k) with the node and edge representations of the current layer k is returned by Algorithm 3.
e_{ij}^{(k)} = \mathrm{Agg}_k\left( W_e^{k} \, e_{ij}^{(k-1)},\ \sigma\!\left( \mathrm{EdgeConv}_k\left( x_i^{(k-1)}, x_j^{(k-1)} \right) \right),\ W_0^{k} \, x_i^{(0)} \right). (4)
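The following sketch mirrors Algorithm 3 and Equation (4) under simplifying assumptions (summation as the aggregation function, equal input and output dimensions, and an illustrative dropout rate); it is not the reference implementation of ECNN-ASCD.
```python
import torch
from torch import nn
from torch_geometric.nn import EdgeConv

class ECNNLayerSketch(nn.Module):
    def __init__(self, dim, dropout=0.1):
        super().__init__()
        self.edge_fc = nn.Linear(dim, dim)   # W_e^k: transforms the previous edge representations
        self.init_fc = nn.Linear(dim, dim)   # W_0^k: transforms the initial node representations
        self.edge_conv = EdgeConv(
            nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)),
            aggr="mean",
        )
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x_prev, x_init, edge_index, edge_prev=None):
        # New node representations from EdgeConv followed by the activation function.
        x_new = self.act(self.edge_conv(x_prev, edge_index))
        src, dst = edge_index                       # end nodes i and j of every edge
        # Sum-aggregation of the terms in Equation (4).
        e_new = x_new[src] + x_new[dst] + self.init_fc(x_init)[src]
        if edge_prev is not None:                   # the first layer has no previous edge reps
            e_new = e_new + self.edge_fc(edge_prev)
        return x_new, self.dropout(e_new)
```
Stacking four such layers, re-feeding the original X^(0) at each layer, and classifying the final edge representations with an FC layer followed by a sigmoid and a 0.5 threshold reproduces the overall flow of Figure 3.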

5. ECNN-ASCD Evaluation

The evaluation of ECNN-ASCD was conducted on an Arabic SCD dataset. Ablation experiments were also conducted to validate the contribution of each component in ECNN-ASCD. The following subsections describe the settings of the experiments, discuss their results, and show how to adopt the proposed method to solve the SCD task for English.

5.1. Experiment Settings

The same settings were used for all the experiments. They are identified from different aspects and described in this subsection, including the datasets, models for comparison, setup of the environment, techniques, and evaluation metrics.
  • Datasets: AraSCD (https://github.com/abeersaad0/SCD/tree/main, accessed on 30 April 2025) is a large Arabic dataset for SCD. It holds 30,000 documents extracted from several publicly available Arabic linguistic resources. It contains three classes of instances according to difficulty level: hard, medium, and easy. The classification criteria focus on the categories of the texts and the time period during which they were written. AraSCD encompasses three categories of text, poetry, books, and newspapers, which exhibit diverse author backgrounds and may enrich AraSCD with varying writing styles. Hard instances were written by five poets during a single era; medium instances were authored by three poets of the same era together with two writers who wrote two books in a similar domain; and easy instances encompass texts written by two poets from two different eras, two writers of two books covering two distinct topics, and one newspaper writer. Table 4 provides an overview of its statistics. Each level has the same number of documents. The average document length is measured as the number of paragraphs per document, and the average paragraph length as the number of words per paragraph. The average number of style changes is measured per document. The percentage of style changes is the ratio of changes relative to all boundaries in the dataset.
  • Baseline models: Two baseline models were evaluated on each test set in AraSCD. First, Baseline-Predicting 0 (Baseline-Pr0) assigns the value 0 to all predicted labels, implying no style changes across all boundaries in the test sets. Hence, this baseline model predicts that all documents are single-authored. Second, Baseline-Predicting 1 (Baseline-Pr1), in contrast, assigns the value 1 to all predicted labels, suggesting that changes in writing styles occur across all test set boundaries. As a result, this baseline model suggests that all documents are written by multiple authors.
  • Other models for comparison: Since no prior Arabic work had been conducted, two models were developed in this study for comparison with ECNN-ASCD. First, a basic machine learning model named BERT-MLP was developed. It encodes the input using AraBERTv02 (https://huggingface.co/aubmindlab/bert-base-arabertv02, accessed on 30 April 2025), the same pretrained model selected for ECNN-ASCD. BERT-MLP classifies the input using two FC layers with 128 neurons each. It is worth noting that the classification in the FC layers maps one input sample to one predicted output. Thus, a sample for BERT-MLP is a boundary represented by concatenating two sequential paragraphs separated by a [SEP] token. Second, GCNN-ASCD was developed based on the GCN module. For a fair comparison, GCNN-ASCD maintains the same ECNN-ASCD architecture except that the EdgeConv module is replaced with the GCN module in all layers.
  • Environment setup: All the experiments were run on a computer with an Intel(R) i7 processor up to 5.60 GHz, 64-bit, an ASUS TUF RTX 4090 24 GB OC GAMING GPU, and 64 GB (2 × 32 GB) of DDR5 5600 MHz memory.
  • Technical setup: Python 3.12 and the PyTorch 2.4 framework were used to develop the models. Each input document is used as a single batch to update the weights, since, in a real scenario, the test phase is applied to at least one document. The hyperparameter settings used during training for model optimization are summarized in Table 5.
  • Evaluation metrics: The F1 score (https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.f1_score.html, accessed on 30 April 2025) is a common performance measure used in classification tasks [87]. The macro-averaged F1 score averages the per-class F1 scores of correctly predicted style change positions relative to the total number of positions, without considering the proportion of each class (change or no change) in the dataset. We recall that the SCD task considers the style change case as the positive class when evaluating models. The macro-averaged F1 score has been used to evaluate the solutions submitted to the PAN competitions from 2020 to 2024 [4,5,6,7,11]. Two further metrics, precision and recall, are derived from the confusion matrix, as shown in Equations (5)–(7). Precision measures how many correctly classified change positions are among all predicted change positions; high precision indicates that the classifier identifies correct change positions more often than it incorrectly labels positions as changes. Recall measures how many correctly classified change positions are among all ground-truth change positions; the higher the recall, the more change positions are classified correctly. The macro-averaged precision and recall metrics were used to compare ECNN-ASCD’s performance with that of the other models on AraSCD (a short computation sketch follows Equations (5)–(7)).
\mathrm{Precision} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalsePositives}}, (5)
\mathrm{Recall} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalseNegatives}}, (6)
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. (7)
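As referenced above, a minimal computation of the macro-averaged metrics with scikit-learn, using hypothetical boundary labels, is as follows:
```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical boundary labels for one test document (1 = style change).
y_true = [0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1]

precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")  # metric used in PAN 2020-2024
print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")
```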

5.2. Results and Discussion

The evaluation of ECNN-ASCD includes tuning its hyperparameters, examining its components through ablation experiments, and comparing its results to those of other developed models. These experiments are discussed below.

5.2.1. Hyperparameter Tuning

Extensive fine-tuning experiments were conducted on the test sets to further enhance the performance of ECNN-ASCD. This research focuses on fine-tuning four hyperparameters: the number of ECNN-ASCD layers, the pretrained model for encoding the input, the learning rate for model optimization, and the number of epochs. The F1 scores are reported in Table 6. The candidate values for each hyperparameter are derived from previous SCD works, and the values with the best performance were adopted and are given in bold. When considering the number of ECNN-ASCD layers, increasing the number of layers from two to five leads to a gradual improvement in performance across levels. Deeper ECNN-ASCD stacks increase the N-hop neighborhood and help capture long-range relationships between nodes. The maximum number of hops is related to the maximum number of nodes in a subgraph, where nodes represent paragraphs within a document. According to Table 4, the average number of paragraphs is around 14 per document. However, increasing the number of layers to 14 would increase the architecture’s complexity and learnable parameters without guaranteeing a better result. For example, increasing from four to five layers slightly enhances performance for one level, but the gain is marginal compared to the performance at four layers. Thus, this study stopped the investigation at five layers and adopted four layers for ECNN-ASCD.
In terms of encoders, the candidates were selected because their English versions had previously been used for SCD. BERT was the most used from 2020 to 2022 [62,63,75,76,77,88,89], and RoBERTa was the most used in 2023 and 2024 [90,91,92,93,94,95,96,97]. The candidate Arabic versions of BERT and RoBERTa are among the most downloaded on the Hugging Face platform. In this study, paragraphs were encoded using one of five candidate pretrained models. First, RoBERTa-Davlan (https://huggingface.co/Davlan/xlm-roberta-base-finetuned-arabic, accessed on 30 April 2025) is a fine-tuned version of xlm-roberta-base. Second, RoBERTa-jhu (https://huggingface.co/jhu-clsp/roberta-large-eng-ara-128k, accessed on 30 April 2025) is an English–Arabic bilingual encoder with the same size as XLM-R large. Third, mBERT (https://huggingface.co/google-bert/bert-base-multilingual-cased, accessed on 30 April 2025) is a multilingual base model trained on the Wikipedia pages of the 104 languages, including Arabic, with the largest content volume on the platform. Fourth, CamelBERT (https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa, accessed on 30 April 2025) is based on BERT for Modern Standard Arabic. Fifth, AraBERT is based on the BERT architecture and uses the same BERT-Base configuration. In this work, all the models were developed under the same settings. All the trainable parameters of the pretrained models were frozen; thus, no fine-tuning was performed on their parameters when extracting embeddings. This freezing allowed us to assess the models’ capabilities rigorously within the constraints of our experimental setup. The maximum available length for the pretrained models is 256 tokens, which may be less than the length of a single paragraph. Therefore, any paragraph exceeding this length is truncated using the default truncation strategy, “longest-first”, which removes one token at a time from the longest sequence until 256 tokens are reached. The special classification token [CLS] was used to represent each paragraph for classification instead of using all the input tokens [61]. Table 6 shows the results of the fine-tuning experiments. Both the CamelBERT-based and AraBERT-based models yielded high results, as both were trained specifically on Arabic texts using the BERT architecture. In particular, the superior performance of AraBERT can be attributed to its ability to capture the author’s style; the style patterns extracted by AraBERT are thus more effective for representation and learning. Adjusting the learning rate (LR) also plays a crucial role in optimizing model performance. Previous SCD works utilized a wide range of LR values, such as 0.001 [55] and 0.00002 [81,95,98,99,100]. Several experiments were conducted using the ADAM optimizer [101] with different LR values, and Table 6 reports two of them. Small LR values can lead to slower convergence with more stability due to making smaller updates at each iteration. Large LR values may help achieve fast convergence but risk overshooting the optimal value. According to the results, an LR of 0.001 yielded better results and strikes a balance between rapid convergence and stable training. Regarding the number of epochs, training continues until the model converges, i.e., until its parameters stabilize and further training does not significantly enhance performance. In this study, the number of training epochs was extended while applying the best hyperparameter values identified above.
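A minimal sketch of the frozen paragraph encoder described above is shown below; the checkpoint name matches the AraBERT model cited in this paper, while the surrounding pipeline is an illustrative assumption.
```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "aubmindlab/bert-base-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)
encoder.eval()
for p in encoder.parameters():        # freeze: no fine-tuning of the pretrained weights
    p.requires_grad = False

def embed_paragraphs(paragraphs):
    # Returns one [CLS] vector per paragraph, truncated to 256 tokens.
    batch = tokenizer(paragraphs, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0, :]   # the [CLS] token representation

node_features = embed_paragraphs(["فقرة أولى للتوضيح.", "فقرة ثانية للتوضيح."])
```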
The plots in Figure 4 showcase the training and test set scores across epochs from 1 to 50. The performance evolves over the epochs, where the curves exhibit a near-linear trajectory before reaching 50 epochs. This linearity indicates that 50 epochs provided sufficient training for ECNN-ASCD to converge across the different difficulty levels. Moreover, the learning curves in Figure 4 exhibit a continuous increase over epochs without notable fluctuations. This stable behavior indicates that ECNN-ASCD is learning without encountering significant underfitting or overfitting issues. The absence of increased gaps between the training and test curves demonstrates the robustness of ECNN-ASCD. Thus, it validates ECNN-ASCD’s ability to generalize to unseen data without relying on memorization.

5.2.2. Ablation Experiments

Several ablation experiments were conducted to demonstrate the effectiveness of the main components in ECNN-ASCD. The experiments involved cumulatively adding components until the proposed solution was obtained. The F1 results of these experiments are summarized in Table 7. Each row represents a model that contains the component mentioned in that row together with all previously added ones. The table shows the F1 scores for the three levels of instances with paired t-test results. The subscript values “±0.X” represent the difference between the current model’s score and the one directly above it. The symbols “S” and “NS” denote a statistically significant and an insignificant difference, respectively, after adding the current component. The symbol “⇐” indicates that the model containing the current component surpasses its predecessor. Conversely, the symbol “⇑” signifies the prior model’s dominance before introducing the current component. The last column shows the final decision, either adding or removing the current component according to the observed influence. Six main components were studied: the GNN module, a warmup mechanism, previous edge representations, initial node representations, an attention mechanism, and the convergent case. These components are discussed in detail below.
First, the base component is an appropriate GNN module, either GCN or EdgeConv. These models represent the baseline models without any additional components. Specifically, each model uses four GNN layers, either GCN or EdgeConv, where the layer count was determined during the tuning phase. The nodes in the path graphs represent paragraphs. Each edge was labeled to indicate whether there is a style change between its end nodes. Edge representations were classified by summing the representations of the two end nodes extracted from the fourth GNN layer. As can be seen, basic EdgeConv significantly outperforms basic GCN on two difficulty levels, showing that EdgeConv learned the edge representations for SCD better than the GCN-based model. Second, the warmup mechanism was added to optimize the models. This mechanism starts training with a very low LR, which is then linearly increased, helping the optimization reach the global minimum of the loss function. Kucukkaya et al. [80] noted that their SCD models reached minimum convergence levels; introducing warmup with a ratio of 0.1 enhanced their models significantly. In this study, the same ratio is used. As a result, including the warmup mechanism significantly improves performance across all difficulty levels. Third, the edge representations obtained from the previous model are static; they do not directly benefit from the deep layers. Thus, adding edge representations from a prior layer into the current layer enables continual learning and adjustment across layers. This incorporation also prevents the edge representations from relying only on node representations from the last layers. Table 7 shows the significant improvement after adopting dynamic edge representations. Gong and Cheng [102] and Zhou et al. [103] adjusted edge representations across layers within their GNN-based models, and the best performance was achieved when they used their proposed edge representation approaches. Fourth, the initial node information is aggregated while computing the edge representations. As shown in Table 7, fusing both the initial node and edge representations can mitigate the over-smoothing issue and contributes to further performance gains. Zhou et al. [103] developed a deep GNN architecture with 64 layers and obtained better results when aggregating the initial node representations. Fifth, incorporating the attention mechanism achieved promising results in previous SCD works [60,64]. In this study, the attention mechanism was incorporated as straightforward learnable weights to identify the relative importance of nodes; the weights are learned using an FC layer. However, this incorporation did not yield stable behavior. The paired t-test results revealed that the fluctuating attention scores led to inconclusive results. Moreover, incorporating attention increases the learnable parameters and the model complexity without clear benefits. This prompted us to exclude the attention component from the final model. Further research is recommended to investigate attention mechanisms along with GNN-based models for SCD. Sixth, the convergent case means training the best accumulated components until the convergence state is reached. This state yielded a substantial performance improvement. Developing a model with an acceptable complexity level enables continual learning over extended epochs within an acceptable time. As shown in Figure 4, the F1 scores increase across epochs until the curves flatten.
Therefore, the final version of ECNN-ASCD was obtained.
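To make the warmup setting above concrete, the following sketch uses a linear warmup over the first 10% of the optimization steps; the scheduler choice and the step counts are illustrative assumptions, not the exact training script.
```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(768, 1)                 # placeholder for the ECNN-ASCD model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

total_steps = 10_000 * 50                       # e.g., documents x epochs (illustrative)
warmup_steps = int(0.1 * total_steps)           # warmup ratio of 0.1
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

# Inside the training loop, call optimizer.step() and then scheduler.step() per batch.
```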

5.2.3. Performance Comparison

A performance comparison of ECNN-ASCD against the other developed models is presented in Table 8. The results demonstrate that ECNN-ASCD achieves the top results across all three instance levels. The paired t-test was conducted on each pair of the five models on all types of instances, as shown in Table 9, Table 10 and Table 11, where “⇐” indicates that the model in the row is significantly better than the model in the column, “⇑” signifies a significant superiority of the model in the column, and “˷” indicates no significant difference in the results. Since the results of the paired t-test are symmetric, there is no need to calculate both directions of the comparison; thus, the grey cells in Table 9 indicate redundant comparisons. The tables show that ECNN-ASCD performed statistically better than the others on all instances. These results suggest that GNNs, particularly EdgeConv, possess substantial potential for the SCD task.
Regarding the baseline models, the results clearly indicate that the proposed solution significantly surpasses their performance. The two baseline models gave almost identical results because of the balance of the label classes within AraSCD, as shown in Table 4. In addition, even when the same encoder is adopted in ECNN-ASCD and BERT-MLP, the deep learning in ECNN-ASCD helps the edge representations learn and update through the layers. Furthermore, comparing the GNN modules, ECNN-ASCD performs better than GCNN-ASCD, which is based on the GCN module. The EdgeConv module adopts multi-dimensional edge representations while passing its messages, whereas GCN excludes edge representations across layers. Therefore, adopting learnable edge representations significantly contributes to the performance improvement of GNNs for SCD, in contrast to the static edge representations prevalent in GCNs. Additionally, the nuances of style may vary according to different factors. Although AraSCD and the PAN datasets were designed to detect style changes, these datasets have not been explicitly annotated for physiological or psychological variables. Such variables may influence styles, for example the author’s mood and environmental factors [104]. The writing style of the same author can also evolve over time [105]. Hence, identifying the stages at which the same author wrote texts can be considered another research direction [105,106,107]. The scope of this paper does not include studying the impact of these factors; further study is recommended to incorporate them and explore how they shape writing style variations. Although the F1 score is the common evaluation metric in recent PAN editions [4,5,6,7,11], it is not well correlated with human perception and may not differentiate between writing style characteristics. Consequently, adopting other metrics in addition to the F1 score helps analyze performance from different perspectives. The effectiveness of ECNN-ASCD could also be evaluated from a human perspective. Human assessment can enhance the credibility and reliability of model predictions and is a common way of evaluating NLP models, such as chatbots [108]. In this research, time and resource constraints posed significant challenges to conducting human evaluations, although such evaluations can provide valuable insights into style consistency. For a more comprehensive evaluation framework, future research could explore these aspects in greater detail. Moreover, ECNN-ASCD is a deep learning model that operates as a black box, and the decision-making process of such models is challenging to interpret. Exploring interpretability techniques can enable the understanding of predictions and ensure their fairness [109]; applying these techniques may also facilitate model debugging and error analysis. While this paper has focused on developing ECNN-ASCD, the model’s interpretability is not the primary focus of the current study. Further work on interpretability is necessary to enhance the transparency and trustworthiness of ECNN-ASCD.

5.3. Proposed Solution for English: ECNN-ESCD

Since no SOTA solutions for Arabic SCD exist, we trained the same method on English datasets so that the model’s performance can be compared with English SOTA solutions. The Edge Convolutional Neural Network for the English SCD task (ECNN-ESCD) was developed following the same ECNN-ASCD architecture. Despite ECNN-ESCD not being tuned to the characteristics of English writing styles, this subsection validates that GNNs, particularly EdgeConv, possess substantial potential for the SCD task on English texts. The statistics of the selected PAN datasets are presented in Table 12. The PAN 2021 and PAN 2022 competitions provided multiple tasks within the SCD track; this study includes those that tackle detecting style changes, namely PAN 2021-Task 2 and PAN 2022-Task 3. Table 13 presents the comparison results of ECNN-ESCD and the SOTA solutions on these PAN datasets. The solutions are ordered from the highest to the lowest F1 score, where the F1 score is the metric used in PAN 2021 and PAN 2022 [5,6]. The SOTA solutions include participants in the two PAN competitions [10,40,43,52,55,62,63,67,75,76,77,78,79] and others who evaluated their models on the same datasets after these competitions closed [97,110]. The baseline models were developed by assigning authors randomly to paragraphs in a uniform way within a document.
As shown in the table, ECNN-ESCD exceeds the baseline models on the two datasets and can serve as a practical solution for SCD on English texts. ECNN-ESCD ranked third out of eight solutions on the PAN 2021 dataset. Even though ECNN-ESCD works on boundary representations and is subject to their limitations, it remains highly competitive, and its results are comparable to those of the SOTA solutions. Comparing the two datasets, the average paragraph length in the PAN 2021 dataset is longer than that of the PAN 2022 dataset. The difference in length arises because the PAN 2022 dataset was designed to tackle SCD at the sentence level and contains one sentence per paragraph. The longer paragraphs in the PAN 2021 dataset can provide ECNN-ESCD with richer semantic information and enhance its detection.

6. Conclusions and Future Work

This work introduced the ECNN-ASCD solution to detect writing style changes in multi-authored Arabic documents. ECNN-ASCD is based on the EdgeConv module, which integrates edge representations into the message-passing scheme. Edges in ECNN-ASCD indicate text boundaries, and their representations are adjusted across layers while preserving the representations of the texts surrounding these boundaries. The performance of ECNN-ASCD was evaluated on AraSCD, the first Arabic dataset for the SCD task (example predictions are shown in Appendix A). Through intensive experiments, ECNN-ASCD demonstrated significantly superior performance over the comparative models and achieved the best F1 scores of 0.9945, 0.9381, and 0.9120 on easy, medium, and hard instances, respectively. These results show that boundary representations are enhanced by aggregating the initial node representations with the learnable edge representations extracted from previous layers. The findings of the ablation experiments statistically validate the effectiveness of the ECNN-ASCD components in improving the performance of detecting style changes. As the first publicly available solution for Arabic SCD, ECNN-ASCD bridges the gap in the Arabic SCD task and paves the way for more active research.
Since the message-passing scheme is crucial in GNN modules, adjusting it in EdgeConv for SCD could help detect more diverse writing styles. Designing a richer graph, for example by incorporating dependency trees and knowledge graphs, could provide more representative edge features. Further enhancements could also be achieved by investigating attention mechanisms; a deep study of these mechanisms can improve detection by focusing on the relevant parts of the written texts. Future investigations are recommended to design specialized SCD datasets that incorporate both physiological and psychological aspects, such as sentiment annotations, to gain deeper insight into these aspects. Evaluating the models using human assessment can provide a comprehensive understanding of their performance from multiple perspectives and facilitate further improvements. Exploring model interpretability is another promising direction, which could enhance the applicability of ECNN-ASCD across various domains.

Author Contributions

Conceptualization, A.S.A.; methodology, A.S.A. and M.E.B.M.; software, A.S.A.; validation, A.S.A. and M.E.B.M.; formal analysis, A.S.A.; investigation, A.S.A.; resources, A.S.A.; data curation, A.S.A.; writing—original draft preparation, A.S.A.; writing—review and editing, A.S.A. and M.E.B.M.; visualization, A.S.A.; supervision, M.E.B.M.; project administration, A.S.A. and M.E.B.M.; funding acquisition, A.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is publicly available at https://github.com/abeersaad0/SCD/tree/main (accessed on 30 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Baseline-Pr0: Baseline-Predicting 0
Baseline-Pr1: Baseline-Predicting 1
CNN: Convolutional Neural Network
DNN: Deep Neural Network
ECNN-ASCD: Edge Convolutional Neural Network for the Arabic Style Change Detection
ECNN-ESCD: Edge Convolutional Neural Network for the English Style Change Detection
FC: Fully Connected
GCN: Graph Convolutional Network
GNN: Graph Neural Network
LR: Learning Rate
LSTM: Long Short-Term Memory
ML: Machine Learning
NLP: Natural Language Processing
SCD: Style Change Detection
SOTA: State of the Art

Appendix A. Examples of Predictions from ECNN-ASCD

Figures A1–A3 present the actual and predicted style changes of three instances: easy, medium, and hard, respectively. In each figure, the left side shows the actual style changes, whereas the right side shows the predictions obtained by ECNN-ASCD. A boundary is colored red when it separates two paragraphs written by different authors. The easy example illustrates the best case, in which the model predicts all the boundaries correctly. The medium example shows that the prediction was correct except for two mistakes. The hard example shows five mistakes: one false positive and four false negatives. Note that each figure presents only part of a document, not its full content.
Figure A1. Example of actual and predicted style changes in easy ASCD instances.
Figure A2. Example of actual and predicted style changes in medium ASCD instances.
Figure A3. Example of actual and predicted style changes in hard ASCD instances.
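The mistake counts discussed above can be reproduced from per-boundary change labels with a short helper such as the one below; the example label vectors are hypothetical illustrations, not the released AraSCD annotations.

def boundary_mistakes(actual, predicted):
    # Count false positives and false negatives on per-boundary labels (1 = style change).
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return fp, fn

# Hypothetical hard-instance labels with one false positive and four false negatives.
actual = [1, 1, 0, 1, 1, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 0, 0, 0]
print(boundary_mistakes(actual, predicted))  # (1, 4)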

References

  1. Tschuggnall, M.; Stamatatos, E.; Verhoeven, B.; Daelemans, W.; Specht, G.; Stein, B.; Potthast, M. Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering. In Working Notes of CLEF 2017—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Dublin, Ireland, 2017; Volume 1866, p. 22. [Google Scholar]
  2. Kestemont, M.; Tschuggnall, M.; Stamatatos, E.; Daelemans, W.; Specht, G.; Stein, B.; Potthast, M. Overview of the Author Identification Task at PAN-2018. In Working Notes of CLEF 2018—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Avignon, France, 2018; Volume 2125, p. 25. [Google Scholar]
  3. Zangerle, E.; Tschuggnall, M.; Specht, G.; Stein, B.; Potthast, M. Overview of the Style Change Detection Task at PAN 2019. In Working Notes of CLEF 2019—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Lugano, Switzerland, 2019; Volume 2380, p. 11. [Google Scholar]
  4. Zangerle, E.; Mayerl, M.; Specht, G.; Potthast, M.; Stein, B. Overview of the Style Change Detection Task at PAN 2020. In Working Notes of CLEF 2020—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2020; Volume 2696, p. 11. [Google Scholar]
  5. Zangerle, E.; Mayerl, M.; Potthast, M.; Stein, B. Overview of the Style Change Detection Task at PAN 2021. In Working Notes of CLEF 2021—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, pp. 1760–1771. [Google Scholar]
  6. Zangerle, E.; Mayerl, M.; Potthast, M.; Stein, B. Overview of the Style Change Detection Task at PAN 2022. In Working Notes of CLEF 2022—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, pp. 2344–2356. [Google Scholar]
  7. Zangerle, E.; Mayerl, M.; Potthast, M.; Stein, B. Overview of the Multi-Author Writing Style Analysis Task at PAN 2023. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2023; pp. 2513–2522. [Google Scholar]
  8. Rexha, A.; Kröll, M.; Ziak, H.; Kern, R. Authorship Identification of Documents with High Content Similarity. Scientometrics 2018, 115, 223–237. [Google Scholar] [CrossRef] [PubMed]
  9. Akiva, N.; Koppel, M. Identifying Distinct Components of a Multi-author Document. In Proceedings of the 2012 European Intelligence and Security Informatics Conference, Odense, Denmark, 22–24 August 2012; pp. 205–209. [Google Scholar]
  10. Alshamasi, S.; Menai, M.B. Ensemble-Based Clustering for Writing Style Change Detection in Multi-Authored Textual Documents. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 18. [Google Scholar]
  11. Zangerle, E.; Mayerl, M.; Potthast, M.; Stein, B. Overview of the Multi-Author Writing Style Analysis Task at PAN 2024. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2424–2431. [Google Scholar]
  12. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef]
  13. Embarcadero-Ruiz, D.; Gómez-Adorno, H.; Embarcadero-Ruiz, A.; Sierra, G. Graph-Based Siamese Network for Authorship Verification. In Working Notes of CLEF 21—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, p. 11. [Google Scholar]
  14. Embarcadero-Ruiz, D.; Gómez-Adorno, H.; Embarcadero-Ruiz, A.; Sierra, G. Graph-Based Siamese Network for Authorship Verification. In Working Notes of CLEF 22—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 277. [Google Scholar]
  15. Valdez-Valenzuela, A.; Martinez-Galicia, J.A.; Gomez-Adorno, H. Heterogeneous-Graph Convolutional Network for Authorship Verification. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2023; p. 8. [Google Scholar]
  16. Zhang, Y.; Qi, P.; Manning, C.D. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2205–2215. [Google Scholar]
  17. Yu, B.; Mengge, X.; Zhang, Z.; Liu, T.; Yubin, W.; Wang, B. Learning to Prune Dependency Trees with Rethinking for Neural Relation Extraction. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 3842–3852. [Google Scholar]
  18. Li, B.; Fan, Y.; Sataer, Y.; Gao, Z.; Gui, Y. Improving Semantic Dependency Parsing with Higher-Order Information Encoded by Graph Neural Networks. Appl. Sci. 2022, 12, 4089. [Google Scholar] [CrossRef]
  19. Hu, Y.; Shen, H.; Liu, W.; Min, F.; Qiao, X.; Jin, K. A Graph Convolutional Network With Multiple Dependency Representations for Relation Extraction. IEEE Access 2021, 9, 81575–81587. [Google Scholar] [CrossRef]
  20. Sun, K.; Zhang, R.; Mao, Y.; Mensah, S.; Liu, X. Relation Extraction with Convolutional Network over Learnable Syntax-Transport Graph. Proc. AAAI Conf. Artif. Intell. 2020, 34, 8928–8935. [Google Scholar] [CrossRef]
  21. Zhou, L.; Wang, T.; Qu, H.; Huang, L.; Liu, Y. A Weighted GCN with Logical Adjacency Matrix for Relation Extraction. In ECAI 2020; iOS Press: Amsterdam, The Netherlands, 2020; pp. 1–8. [Google Scholar]
  22. Mandya, A.; Bollegala, D.; Coenen, F. Graph Convolution over Multiple Dependency Sub-graphs for Relation Extraction. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 6424–6435. [Google Scholar]
  23. Jin, Z.; Yang, Y.; Qiu, X.; Zhang, Z. Relation of the Relations: A New Paradigm of the Relation Extraction Problem. arXiv 2020, arXiv:2006.03719. [Google Scholar]
  24. Tian, Y.; Chen, G.; Song, Y.; Wan, X. Dependency-driven Relation Extraction with Attentive Graph Convolutional Networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 4458–4471. [Google Scholar]
  25. Zhao, K.; Xu, H.; Cheng, Y.; Li, X.; Gao, K. Representation Iterative Fusion Based on Heterogeneous Graph Neural Network for Joint Entity and Relation Extraction. Knowl.-Based Syst. 2021, 219, 9. [Google Scholar] [CrossRef]
  26. Liu, P.; Wang, L.; Zhao, Q.; Chen, H.; Feng, Y.; Lin, X.; He, L. ECNU_ICA_1 SemEval-2021 Task 4: Leveraging Knowledge-enhanced Graph Attention Networks for Reading Comprehension of Abstract Meaning. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Online, 10–31 January 2021; pp. 183–188. [Google Scholar]
  27. Guo, Z.; Zhang, Y.; Lu, W. Attention Guided Graph Convolutional Networks for Relation Extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 241–251. [Google Scholar]
  28. Li, Z.; Sun, Y.; Zhu, J.; Tang, S.; Zhang, C.; Ma, H. Improve Relation Extraction with Dual Attention-Guided Graph Convolutional Networks. Neural Comput. Appl. 2021, 33, 1773–1784. [Google Scholar] [CrossRef]
  29. Farghaly, A.; Shaalan, K. Arabic Natural Language Processing: Challenges and Solutions. ACM Trans. Asian Lang. Inf. Process. 2009, 8, 1–22. [Google Scholar] [CrossRef]
  30. Habash, N.; Soudi, A.; Buckwalter, T. On Arabic transliteration. In Arabic Computational Morphology: Knowledge-Based and Empirical Methods; Springer: Dordrecht, The Netherlands, 2007; pp. 15–22. [Google Scholar]
  31. Wintner, S. Morphological processing of semitic languages. In Natural Language Processing of Semitic Languages; Zitouni, I., Ed.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 43–66. [Google Scholar]
  32. Ahmed, M.A.; Trausan-Matu, S. Using natural language processing for analyzing Arabic poetry rhythm. In Proceedings of the 2017 16th RoEduNet Conference: Networking in Education and Research (RoEduNet), Targu Mures, Romania, 21–23 September 2017; pp. 1–5. [Google Scholar]
  33. Khan, J.A. Style Breach Detection: An Unsupervised Detection Model. In CLEF 2017 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Dublin, Ireland, 2017; Volume 1866, p. 10. [Google Scholar]
  34. Khan, J.A. A Model for Style Change Detection at a Glance. In CLEF 2018 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Avignon, France, 2018; Volume 2125, p. 8. [Google Scholar]
  35. Karas, D.; Spiewak, M.; Piotr, S. OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection. In CLEF 2017 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Dublin, Ireland, 2017; Volume 1866, p. 12. [Google Scholar]
  36. Ramachandran, K.M.; Tsokos, C.P. Mathematical Statistics with Applications in R, 3rd ed.; Elsevier: Philadelphia, PA, USA, 2020. [Google Scholar]
  37. Cox, D.R. The Regression Analysis of Binary Sequences. J. R. Stat. Soc. Ser. Methodol. 1959, 21, 238. [Google Scholar] [CrossRef]
  38. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar]
  39. Cortes, C.; Vapnik, V. Support-vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  40. Alvi, F.; Algafri, H.; Alqahtani, N. Style Change Detection using Discourse Markers. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 6. [Google Scholar]
  41. Jacobo, G.X.; Dehesa-Corona, V.; Rojas-Reyes, A.; Gomez-Adorno, H. Authorship Verification Machine Learning Methods For Style Change Detection In Texts. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2023; p. 7. [Google Scholar]
  42. Nath, S. Style Change Detection. Ph.D. Thesis, Université de Neuchâtel, Neuchâtel, Switzerland, 2021. [Google Scholar]
  43. Singh, R.; Weerasinghe, J.; Greenstadt, R. Writing Style Change Detection on Multi-Author Documents. In CLEF 2021 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, p. 9. [Google Scholar]
  44. Safin, K.; Ogaltsov, A. Detecting a Change of Style Using Text Statistics. In CLEF 2018 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Avignon, France, 2018; Volume 2125, p. 6. [Google Scholar]
  45. Zlatkova, D.; Kopev, D.; Mitov, K.; Atanasov, A.; Hardalov, M.; Koychev, I.; Nakov, P. An Ensemble-Rich Multi-Aspect Approach for Robust Style Change Detection. In CLEF 2018 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Avignon, France, 2018; Volume 2125, p. 14. [Google Scholar]
  46. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. Berkeley Symp. Math. Stat. Probab. 1967, 5, 281–297. [Google Scholar]
  47. Elamine, M.; Mechti, S.; Belguith, L.H. An Unsupervised Method for Detecting Style Breaches in a Document. In Proceedings of the 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, 3–7 November 2019; pp. 1–6. [Google Scholar]
  48. Mandic, L.; Milkovic, F.; Doria, S. Combining the Powers of Clustering Affinities in Style Change Detection. Course Project Reports; University of Zagreb: Zagreb, Croatia, 2019. [Google Scholar]
  49. Zuo, C.; Zhao, Y.; Banerjee, R. Style Change Detection with Feed-forward Neural Networks. In CLEF 2019 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Lugano, Switzerland, 2019; Volume 2380, p. 9. [Google Scholar]
  50. Nath, S. Style Change Detection by Threshold Based and Window Merge Clustering Methods. In CLEF 2019 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Lugano, Switzerland, 2019; Volume 2380, p. 11. [Google Scholar]
  51. Castro-Castro, D.; Rodríguez-Losada, C.A.; Muñoz, R. Mixed Style Feature Representation and B0-maximal Clustering for Style Change Detection. In CLEF 2020 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2020; Volume 2696, p. 7. [Google Scholar]
  52. Nath, S. Style Change Detection using Siamese Neural Networks. In CLEF 2021 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, p. 11. [Google Scholar]
  53. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  54. LeCun, Y. Generalization and Network Design Strategies. In Connectionism in Perspective; Elsevier: Amsterdam, The Netherlands, 1989; Volume 19, p. 18. [Google Scholar]
  55. Deibel, R.; Löfflad, D. Style Change Detection on Real-World Data using an LSTM-powered Attribution Algorithm. In CLEF 2021 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, p. 11. [Google Scholar]
  56. Schaetti, N. Character-based Convolutional Neural Network for Style Change Detection. In CLEF 2018 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Avignon, France, 2018; Volume 2125, p. 6. [Google Scholar]
  57. Müller, P. Style Change Detection. Bachelor’s Thesis, ETH Zurich, Zürich, Switzerland, 2019. [Google Scholar]
  58. Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice Hall series in artificial intelligence; Prentice Hall: Upper Saddle River, NJ, USA, 2000. [Google Scholar]
  59. Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; Shah, R. Signature Verification using a ‘Siamese’ Time Delay Neural Network. In Proceedings of the 6th International Conference on Neural Information Processing Systems, San Francisco, CA, USA, 29 November–2 December 1993; NIPS’93. pp. 737–744. [Google Scholar]
  60. Hosseinia, M.; Mukherjee, A. A Parallel Hierarchical Attention Network for Style Change Detection. In CLEF 2018 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Avignon, France, 2018; Volume 2125, p. 7. [Google Scholar]
  61. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; p. 16. [Google Scholar]
  62. Lao, Q.; Ma, L.; Yang, W.; Yang, Z.; Yuan, D.; Tan, Z.; Liang, L. Style Change Detection Based On Bert And Conv1d. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 6. [Google Scholar]
  63. Zi, J.; Zhou, L. Style Change Detection Based On Bi-LSTM And Bert. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 5. [Google Scholar]
  64. Liu, G.; Yan, Z.; Wang, T.; Zhan, K. ARTW: A Model of Author Recognition based on Writing Style by Recognizing Style Crack. In Proceedings of the ICISS 2022: 2022 the 5th International Conference on Information Science and Systems, Beijing, China, 26–28 August 2022; pp. 130–135. [Google Scholar]
  65. Safin, K.; Kuznetsova, R. Style Breach Detection with Neural Sentence Embeddings. In CLEF 2017 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Dublin, Ireland, 2017; Volume 1866, p. 7. [Google Scholar]
  66. Kiros, R.; Zhu, Y.; Salakhutdinov, R.; Zemel, R.S.; Torralba, A.; Urtasun, R.; Fidler, S. Skip-Thought Vectors. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Cambridge, MA, USA, 8–13 December 2015; Volume 1, p. 11. [Google Scholar]
  67. Rodríguez-Losada, C.A.; Castro-Castro, D. Three Style Similarity: Sentence-embedding, Auxiliary Words, Punctuation. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 11. [Google Scholar]
  68. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3982–3992. [Google Scholar]
  69. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 12. [Google Scholar] [CrossRef]
  70. He, P.; Liu, X.; Gao, J.; Chen, W. DeBERTa: Decoding-enhanced BERT with Disentangled Attention. In Proceedings of the International Conference on Learning Representations ICLR 2021, Vienna, Austria, 4–8 May 2021; p. 17. [Google Scholar]
  71. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  72. Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; p. 18. [Google Scholar]
  73. Muennighoff, N.; Wang, T.; Sutawika, L.; Roberts, A.; Biderman, S.; Le Scao, T.; Bari, M.S.; Shen, S.; Yong, Z.X.; Schoelkopf, H.; et al. Crosslingual Generalization through Multitask Finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 1, pp. 15991–16111. [Google Scholar]
  74. Iyer, A.; Vosoughi, S. Style Change Detection Using BERT. In CLEF 2020 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2020; Volume 2696, p. 9. [Google Scholar]
  75. Zhang, Z.; Han, Z.; Kong, L. Style Change Detection based on Prompt. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 4. [Google Scholar]
  76. Zhang, Z.; Han, Z.; Kong, L.; Miao, X.; Peng, Z.; Zeng, J.; Cao, H.; Zhang, J.; Xiao, Z.; Peng, X. Style Change Detection Based On Writing Style Similarity. In CLEF 2021 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, p. 4. [Google Scholar]
  77. Lin, T.M.; Chen, C.Y.; Tzeng, Y.W.; Lee, L.H. Ensemble Pre-trained Transformer Models for Writing Style Change Detection. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 9. [Google Scholar]
  78. Str, E. Multi-label Style Change Detection by Solving a Binary Classification Problem. In CLEF 2021 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bucharest, Romania, 2021; Volume 2936, p. 12. [Google Scholar]
  79. Jiang, X.; Qi, H.; Zhang, Z.; Huang, M. Style Change Detection: Method Based On Pre-trained Model And Similarity Recognition. In CLEF 2022 Labs and Workshops, Notebook Papers; CEUR Workshop Proceedings; CEUR-WS.org: Bologna, Italy, 2022; Volume 3180, p. 6. [Google Scholar]
  80. Kucukkaya, I.E.; Sahin, U.; Toraman, C. ARC-NLP at PAN 2023: Transition-Focused Natural Language Inference for Writing Style Detection. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR-WS.org: Thessaloniki, Greece, 2023; p. 10. [Google Scholar]
  81. Ye, Z.; Zhong, C.; Qi, H.; Han, Y. Supervised Contrastive Learning for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2023; p. 6. [Google Scholar]
  82. Hashemi, A.; Shi, W. Enhancing Writing Style Change Detection using Transformer-based Models and Data Augmentation. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2023; p. 9. [Google Scholar]
  83. Huang, M.; Huang, Z.; Kong, L. Encoded Classifier Using Knowledge Distillation for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2023—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Thessaloniki, Greece, 2023; p. 6. [Google Scholar]
  84. Peng, C.; Xia, F.; Naseriparsa, M.; Osborne, F. Knowledge Graphs: Opportunities and Challenges. Artif. Intell. Rev. 2023, 56, 13071–13102. [Google Scholar] [CrossRef]
  85. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907. [Google Scholar]
  86. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
  87. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep Learning-based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2022, 54, 1–40. [Google Scholar] [CrossRef]
  88. Liu, X.; Chen, H.; Lv, J. Team foshan-university-of-guangdong at PAN: Adaptive Entropy-Based Stability-Plasticity for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2750–2754. [Google Scholar]
  89. Mohan, T.M.; Sheela, T.V.S. BERT-Based Similarity Measures Oriented Approach for Style Change Detection. In Accelerating Discoveries in Data Science and Artificial Intelligence II; Springer: Cham, Switzerland, 2024; Volume 438, pp. 83–94. [Google Scholar]
  90. Lin, T.M.; Wu, Y.H.; Lee, L.H. Team NYCU-NLP at PAN 2024: Integrating Transformers with Similarity Adjustments for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2716–2721. [Google Scholar]
  91. Huang, Y.; Kong, L. Team Text Understanding and Analysis at PAN: Utilizing BERT Series Pre-training Model for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2653–2657. [Google Scholar]
  92. Wu, Q.; Kong, L.; Ye, Z. Team bingezzzleep at PAN: A Writing Style Change Analysis Model Based on RoBERTa Encoding and Contrastive Learning for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2963–2968. [Google Scholar]
  93. Chen, Z.; Han, Y.; Yi, Y. Team Chen at PAN: Integrating R-Drop and Pre-trained Language Model for Multi-author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2547–2553. [Google Scholar]
  94. Wu, B.; Han, Y.; Yan, K.; Qi, H. Team baker at PAN: Enhancing Writing Style Change Detection with Virtual Softmax. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2951–2955. [Google Scholar]
  95. Sheykhlan, M.K.; Abdoljabbar, S.K.; Mahmoudabad, M.N. Team karami-sh at PAN: Transformer-based Ensemble Learning for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2676–2681. [Google Scholar]
  96. Liu, C.; Han, Z.; Chen, H.; Hu, Q. Team Liuc0757 at PAN: A Writing Style Embedding Method Based on Contrastive Learning for Multi-Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2716–2721. [Google Scholar]
  97. Zamir, M.T.; Ayub, M.A.; Gul, A.; Ahmad, N.; Ahmad, K. Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection. arXiv 2024, arXiv:2401.06752. [Google Scholar]
  98. Liang, X.; Zeng, F.; Zhou, Y.; Liu, X.; Zhou, Y. Fine-Tuned Reasoning for Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2710–2715. [Google Scholar]
  99. Lv, J.; Yi, Y.; Qi, H. Team fosu-stu at PAN: Supervised Fine-Tuning of Large Language Models for Multi Author Writing Style Analysis. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2781–2786. [Google Scholar]
  100. Sanjesh, R. eam riyahsanjesh at PAN: Multi-feature with CNN and Bi-LSTM Neural Network Approach to Style Change Detection. In Working Notes of CLEF 2024—Conference and Labs of the Evaluation Forum; CEUR Workshop Proceedings; CEUR-WS.org: Grenoble, France, 2024; Volume 3740, pp. 2881–2885. [Google Scholar]
  101. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
  102. Gong, L.; Cheng, Q. Exploiting Edge Features for Graph Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9203–9211. [Google Scholar]
  103. Zhou, Y.; Huo, H.; Hou, Z.; Bu, L.; Mao, J.; Wang, Y.; Lv, X.; Bu, F. Co-embedding of edges and nodes with deep graph convolutional neural networks. Sci. Rep. 2023, 13, 26. [Google Scholar] [CrossRef] [PubMed]
  104. Wang, H.; Riddell, A.; Juola, P. Mode Effects’ Challenge to Authorship Attribution. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 19–23 April 2021; pp. 1146–1155. [Google Scholar]
  105. Ríos-Toledo, G.; Posadas-Durán, J.P.F.; Sidorov, G.; Castro-Sánchez, N.A. Detection of changes in literary writing style using N-grams as style markers and supervised machine learning. PLoS ONE 2022, 17, 25. [Google Scholar] [CrossRef] [PubMed]
  106. Amelin, K.; Granichin, O.; Kizhaeva, N.; Volkovich, Z. Patterning of writing style evolution by means of dynamic similarity. Pattern Recognit. 2018, 77, 45–64. [Google Scholar] [CrossRef]
  107. Gomez Adorno, H.M.; Rios, G.; Posadas Durán, J.P.; Sidorov, G.; Sierra, G. Stylometry-based Approach for Detecting Writing Style Changes in Literary Texts. Computación y Sistemas 2018, 22, 7. [Google Scholar] [CrossRef]
  108. Alsheddi, A.S.; Alhenaki, L.S. English and Arabic Chatbots: A Systematic Literature Review. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 662–675. [Google Scholar] [CrossRef]
  109. Şahin, E.; Arslan, N.N.; Özdemir, D. Unlocking the black box: An in-depth review on interpretability, explainability, and reliability in deep learning. Neural Comput. Appl. 2025, 37, 859–965. [Google Scholar]
  110. Huertas-Tato, J.; Martín, A.; Camacho, D. Understanding writing style in social media with a supervised contrastively pre-trained transformer. Knowl. Based Syst. 2024, 296, 12. [Google Scholar] [CrossRef]
Figure 1. General propagation framework, where the colors are utilized for visual separation.
Figure 2. Example of a graph design: (a) a path graph used in ECNN-ASCD; (b) a complete graph.
Figure 3. ECNN-ASCD architecture, where the colors are utilized for component separation.
Figure 4. F1 scores of ECNN-ASCD across epochs from 1 to 50 on different instances: (a) easy instances; (b) medium instances; and (c) hard instances.
Table 1. Example of the SCD task.
Author (Doc. 1) | Document 1 | Document 2 | Author (Doc. 2)
Author 1 | This paragraph was written by the first author. | This paragraph was written by the first author. | Author 1
Author 1 | This paragraph has two sentences. They were written by the first author. | Two sentences are in this paragraph. The second author wrote both sentences. | Author 2
Author 1 | This paragraph as well was written by the first author. | The second author wrote this paragraph too. | Author 2
Author 1 | This paragraph was also written by the first author. | This paragraph also was written by the first author. | Author 1
Para. level | [0, 0, 0] | [1, 0, 1]
Sent. level | [0, 0, 0, 0] | [1, 0, 0, 1]
Table 2. Arabic example of the SCD task.
Author (Doc. 1) | Document 1 | Document 2 | Author (Doc. 2)
Author 1 | هذه الفقرة كتبت من المؤلف الأول. | هذه الفقرة كتبت من المؤلف الأول. | Author 1
Author 1 | هذه الفقرة تتكون من جملتين. وكتبت كلاهما من المؤلف الأول. | يمكن صياغة الفقرة الثانية من خلال جملتين منفصلتين. حيث قام المؤلف الثاني بكتابتهما. | Author 2
Author 1 | هذه الفقرة كتبها المؤلف الأول. | هذه الفقرة كتبها المؤلف الأول. | Author 1
Para. level | [0, 0] | [1, 1]
Sent. level | [0, 0, 0] | [1, 0, 1]
Table 3. Symbol definitions.
Symbol | Definition
G(V, E) | G: input graph, V: node set, and E: edge set
|V| = n | Number of nodes
i ∈ V | Nodes in G
e_ij ∈ E | Edges in G, located between every two consecutive nodes i and j
j ∈ N(i) | Set of one-hop neighbors of node i in G
deg(i) | Degree of node i
W^k | Trainable weight matrix at layer k
X ∈ R^(n×f) | Node feature matrix for n nodes
x_i^k ∈ R^f | f-dimensional node embedding of node i at layer k
E ∈ R^((n−1)×f) | Edge feature matrix for the n − 1 edges
e_ij^k ∈ R^f | f-dimensional edge embedding of edge ij at layer k
Agg(·) | Aggregation function
σ(·) | Nonlinear activation function
Σ | Summation operation
Table 4. AraSCD statistics.
Level | # Doc. | Avg. Doc. Leng. | Avg. Para. Leng. | Avg. Style Change
Easy | 10,000 | 11.4221 para. | 150.2 words | 4.52 changes (43.33%)
Medium | 10,000 | 14.4912 para. | 125.0 words | 5.66 changes (41.96%)
Hard | 10,000 | 14.7409 para. | 124.4 words | 5.82 changes (42.32%)
Table 5. Predefined hyperparameters.
Hyperparameter | Seed | Optimizer | Dropout | Batch | Max Sequence Length | Warmup Rate
Value | 42 | Adam * | 0.5 | 1 | 256 | 0.1
Table 6. Hyperparameter tuning (highest scores and corresponding hyperparameter values in bold).
Tuned Hyperparameter | Layers | Encoder | LR | Epochs | Easy (F1) | Medium (F1) | Hard (F1)
Layer | 2 | AraBERT | 0.001 | 20 | 0.9868 | 0.9004 | 0.8332
 | 3 | AraBERT | 0.001 | 20 | 0.9911 | 0.9199 | 0.8536
 | 4 | AraBERT | 0.001 | 20 | 0.9918 | 0.9214 | 0.8684
 | 5 | AraBERT | 0.001 | 20 | 0.9915 | 0.9236 | 0.8663
Encoder | 4 | RoBERTa-Davlan | 0.001 | 20 | 0.9679 | 0.8693 | 0.7795
 | 4 | RoBERTa-jhu | 0.001 | 20 | 0.9883 | 0.8782 | 0.7726
 | 4 | mBERT | 0.001 | 20 | 0.9516 | 0.8354 | 0.4935
 | 4 | CamelBERT | 0.001 | 20 | 0.9903 | 0.9179 | 0.8578
 | 4 | AraBERT | 0.001 | 20 | 0.9918 | 0.9214 | 0.8684
LR | 4 | AraBERT | 0.00002 | 20 | 0.98260 | 0.90487 | 0.84113
 | 4 | AraBERT | 0.001 | 20 | 0.9918 | 0.9214 | 0.8684
Epoch | 4 | AraBERT | 0.001 | 20 | 0.9918 | 0.9214 | 0.8684
 | 4 | AraBERT | 0.001 | 50 | 0.9945 | 0.9381 | 0.9120
Table 7. Results of the ablation experiments.
Components | Easy (F1) | Medium (F1) | Hard (F1) | Decision
Basic GCN | 0.7968 | 0.6958 | 0.3640 | -
Basic EdgeConv | 0.9714 (+0.1745, S) | 0.8356 (+0.1398, S) | 0.3640 (+0.0000, NS) | Added
Warmup | 0.9835 (+0.0121, S) | 0.9081 (+0.0724, S) | 0.8180 (+0.4540, S) | Added
Prev-X | 0.9851 (+0.0016, S) | 0.9168 (+0.0087, S) | 0.8426 (+0.0246, S) | Added
X0 | 0.9918 (+0.0068, S) | 0.9214 (+0.0046, S) | 0.8684 (+0.0259, S) | Added
Attention | 0.9904 (−0.0015, NS) | 0.9247 (+0.0034, S) | 0.8453 (−0.0231, S) | Removed
X0-Convergent (ECNN-ASCD) | 0.9945 (+0.0026, S) | 0.9381 (+0.0167, S) | 0.9120 (+0.0436, S) | Added
Table 8. Performance comparison on AraSCD (highest scores highlighted in bold).
Solution | Easy: F1 / Precision / Recall | Medium: F1 / Precision / Recall | Hard: F1 / Precision / Recall
Baseline-Pr0 | 0.3632 / 0.285 / 0.500 | 0.3678 / 0.291 / 0.500 | 0.364 / 0.286 / 0.500
Baseline-Pr1 | 0.3005 / 0.215 / 0.500 | 0.2948 / 0.209 / 0.500 | 0.2996 / 0.214 / 0.500
BERT-MLP | 0.9653 / 0.9673 / 0.9635 | 0.8508 / 0.8758 / 0.8416 | 0.6761 / 0.7072 / 0.6752
GCNN-ASCD | 0.9465 / 0.9517 / 0.9429 | 0.8352 / 0.8469 / 0.8294 | 0.6914 / 0.6923 / 0.6907
ECNN-ASCD | 0.9945 / 0.9949 / 0.9941 | 0.9381 / 0.9402 / 0.9363 | 0.9120 / 0.9147 / 0.9099
Table 9. Paired t-test results on easy instances (the grey cells are redundant comparisons).
Compared solutions: ECNN-ASCD, GCNN-ASCD, BERT_MLP, Base-0, Base-1.
Table 10. Paired t-test results on medium instances (the grey cells are redundant comparisons).
Compared solutions: ECNN-ASCD, GCNN-ASCD, BERT_MLP, Base-0, Base-1.
Table 11. Paired t-test results on hard instances (the grey cells are redundant comparisons).
Compared solutions: ECNN-ASCD, GCNN-ASCD, BERT_MLP, Base-0, Base-1.
Table 12. Statistics of PAN datasets.
Dataset | # Doc. | Avg. Doc. Leng. | Avg. Para. Leng. | Avg. Style Change
PAN 2021 (Task 2) | 16,000 | 6.8836 para. | 44.2091 words | 3.1556 changes (53.63%)
PAN 2022 (Task 3) | 10,000 | 15.9652 para. | 19.9412 words | 8.1380 changes (54.38%)
Table 13. SOTA results on PAN 2021 and PAN 2022 datasets.
PAN 2021-Task 2 (Solution | F1):
Zamir et al. [97] | 0.7750
Zhang et al. [76] | 0.7510
ECNN-ESCD | 0.7373
Str [78] | 0.7070
Huertas-Tato et al. [110] | 0.7043
Deibel and Löfflad [55] | 0.6690
Singh et al. [43] | 0.6570
Nath [52] | 0.6470
Baseline (random) | 0.470
PAN 2022-Task 3 (Solution | F1):
Lin et al. [77] | 0.7156
Jiang et al. [79] | 0.6720
Zhang et al. [75] | 0.6581
Zi and Zhou [63] | 0.6483
Lao et al. [62] | 0.6314
Huertas-Tato et al. [110] | 0.6312
ECNN-ESCD | 0.6215
Alvi et al. [40] | 0.5636
Rodríguez-Losada and Castro-Castro [67] | 0.5565
Alshamasi and Menai [10] | 0.4995
Baseline (random) | 0.4809
