Stylometric Analysis of Sustainable Central Bank Communications: Revealing Authorial Signatures in Monetary Policy Statements

Emekci, Hakan; Özkan, İbrahim

doi:10.3390/su17208979

Open Access Article

Hakan Emekci ¹,* and İbrahim Özkan ²

¹ Applied Data Science Department, TED University Ankara, 06420 Ankara, Türkiye

² Faculty of Economics and Administrative Sciences, Çankaya University, 06815 Ankara, Türkiye

* Author to whom correspondence should be addressed.

† This article is derived from the doctoral dissertation titled “Computational analysis of CBRT’s policy statements and quantifying the effects on financial markets” (Hakan Emekci, 2017).

Sustainability 2025, 17(20), 8979; https://doi.org/10.3390/su17208979

Submission received: 9 September 2025 / Revised: 21 September 2025 / Accepted: 30 September 2025 / Published: 10 October 2025

(This article belongs to the Special Issue Public Policy and Economic Analysis in Sustainability Transitions)


Abstract

Sustainable economic development requires transparent and consistent institutional communication from monetary authorities to maintain long-term financial stability and public trust. This study investigates the latent authorial structure and stylistic heterogeneity of central bank communications by applying stylometric analysis and unsupervised machine learning to official announcements of the Central Bank of the Republic of Turkey (CBRT). Using a dataset of 557 press releases from 2006 to 2017, we extract a range of linguistic features at both sentence and document levels—including sentence length, punctuation density, word length, and type–token ratios. These features are reduced using Principal Component Analysis (PCA) and clustered via Hierarchical Clustering on Principal Components (HCPC), revealing three distinct authorial groups within the CBRT’s communications. The robustness of these clusters is validated using multidimensional scaling (MDS) on character-level and word-level n-gram distances. The analysis finds consistent stylistic differences between clusters, with implications for authorship attribution, tone variation, and communication strategy. Notably, sentiment analysis indicates that one authorial cluster tends to exhibit more negative tonal features, suggesting potential bias or divergence in internal communication style. These findings challenge the conventional assumption of institutional homogeneity and highlight the presence of distinct communicative voices within the central bank. Furthermore, the results suggest that stylistic variation—though often subtle—may convey unintended policy signals to markets, especially in contexts where linguistic shifts are closely scrutinized. This research contributes to the emerging intersection of natural language processing, monetary economics, and institutional transparency. 
It demonstrates the efficacy of stylometric techniques in revealing the hidden structure of policy discourse and suggests that linguistic analytics can offer valuable insights into the internal dynamics, credibility, and effectiveness of monetary authorities. These findings contribute to sustainable financial governance by demonstrating how AI-driven analysis can enhance institutional transparency, promote consistent policy communication, and support long-term economic stability—key pillars of sustainable development.

Keywords:

natural language processing; machine learning; stylometric analysis; clustering; AI for sustainability

1. Introduction

Central bank communication has become a cornerstone of the monetary policy toolkit, particularly at a time when forward guidance is a significant driver of market expectations and maintaining financial stability is essential. Over the past twenty years, a large body of research has emphasized the paramount role of clear, coherent, and truthful communication from monetary authorities in shaping the actions of economic agents [1,2]. However, whereas considerable research has examined what is said, as well as the style and timing of policy announcements, relatively little is known about the stylistic cues inherent in such communications, and specifically about their authors and the institutional context from which they emerge.

From a sustainability perspective, consistent and transparent central bank communication constitutes a fundamental pillar of sustainable financial governance. The United Nations Sustainable Development Goal 16 emphasizes the importance of building effective, accountable, and transparent institutions, while sustainable economic development requires predictable policy frameworks that enable long-term planning and investment decisions. AI-driven analysis of institutional communication patterns offers a novel approach to evaluating and enhancing transparency, thereby contributing to more sustainable financial governance practices.

Research on central bank transparency argues that more communication reduces information asymmetry between policy bodies and society and hence supports more effective policy action [3]. Conventional analyses have explored the effects of changing sentiment and tone in central bank pronouncements on asset prices [4] and macroeconomic metrics [2]. However, which institutional actors (policy committees, technical officials, or specialized communications teams) are behind these communications remains largely unknown. Authorship evidence in this context may reveal new aspects of institutional transparency, bring policy signaling changes to light, and expose implicit biases or strategic framing of policy decisions within central banks.

Recent computational advances have enabled more sophisticated analysis of central bank communications, moving beyond traditional sentiment analysis to examine stylistic patterns and authorship signatures. However, the potential for intra-institutional heterogeneity in communication styles remains largely unexplored.

The Central Bank of the Republic of Turkey (CBRT) is a prominent case study in this context. While CBRT press releases form a fundamental channel of monetary policy communication, they reveal little about their authors. This institutional opacity raises significant questions of accountability and communication consistency. In addition, Turkey’s macroeconomic environment, with frequent leadership turnover at the CBRT, provides a distinctive empirical setting in which to investigate whether policy communication styles remain constant or change in reaction to institutional instability.

This paper applies stylometric methods to a large corpus of CBRT press releases spanning different leadership eras. Using clustering algorithms and natural language processing feature extraction methods, we aim to identify distinct authorial patterns and assess their stability over time. These findings aim to enrich the literature on central bank transparency by introducing authorship attribution as a novel dimension of communication analysis within monetary policy frameworks.

2. Literature Review

2.1. Central Bank Communication and Market Impact

Central bank communication has become a cornerstone of policy practice in the post-crisis monetary environment. Blinder et al. [1] argue that clearer communication boosts the predictability of policy, reduces financial market volatility, and strengthens the monetary policy transmission mechanism. Empirical evidence by Gurkaynak et al. [5] further confirms that financial markets react not only to policy rate changes but also to the subtle tone and substance of the central bank’s spoken and written public communications.

Recent studies continue to emphasize the communicative power of tone and textual shifts. Ehrmann and Talmi [6], for instance, show that when the European Central Bank (ECB) issues press releases that are linguistically similar to previous ones, market volatility remains low. However, deviations in phrasing after a period of consistency result in significantly higher volatility—suggesting that markets interpret stylistic discontinuity as a signal of potential policy change.

Hansen and McMahon [2] use topic modeling on Federal Open Market Committee (FOMC) releases and show that theme evolutions within policy speech shape critically important financial indicators and inflationary expectations. Their work highlights the latent semantic topography of monetary policy documents and the need for disentangling both explicit signals and tacit linguistic patterns.

Furthermore, Gardner et al. [7] develop a sentiment index based on FOMC statement texts and demonstrate that it not only affects market responses but also serves as a strong predictor of future interest rate changes. These findings suggest that financial markets are increasingly responsive to both explicit messages and the underlying tone embedded in official discourse.

2.2. Computational Approaches to Central Bank Text Analysis

Recent computational advances have expanded the analytical capabilities for examining central bank communications beyond traditional sentiment analysis. Transformer-based models like those developed by Pfeifer and Marohl [8] and Gambacorta et al. [9] now capture domain-specific terminology and reveal latent semantic structures in monetary policy discourse.

Gardner et al. [7] demonstrate that sentiment indices derived from FOMC statements serve as predictive indicators of future interest rate changes, while Gómez-Cram and Grotteria [10] show that asset prices react most strongly when Federal Reserve communications deviate from established linguistic patterns. Similarly, Ehrmann and Talmi [6] find that departures from consistent phrasing in ECB releases generate significant market volatility, even absent explicit policy changes.

These studies establish that financial markets respond to subtle linguistic variations in central bank communications. However, existing computational approaches treat institutional communications as monolithic outputs, overlooking potential intra-institutional heterogeneity in authorship or drafting processes. While transformer models excel at capturing semantic content, they provide limited insight into the systematic stylistic variations that constitute authorial signatures—precisely the domain addressed by stylometric analysis.

2.3. Stylometry and Authorship Attribution

Stylometry, the quantitative analysis of the stylistic properties of texts, is a methodological tool capable of revealing concealed structures in written language. Developed within literary research, early stylometric work aimed to resolve cases of disputed authorship using measures such as the frequency distribution of individual terms and sentence length variance [11].

The discipline has since evolved due to machine learning technology, incorporating features such as character n-grams, syntactic parsing, and usage of function words to improve attribution accuracy [12,13]. Recent applications include identifying sentiment shifts in Fed minutes [14], detecting stylistic variation across speakers within central banks [15], and classifying rhetorical stance in ECB speeches using domain-specific transformer models [8].

Of particular note, recent efforts have introduced domain-tuned language models (e.g., CentralBankRoBERTa) trained exclusively on central bank communications to capture institutional idioms and context-specific phrasing [9]. These models outperform general-purpose LLMs on specialized tasks such as stance detection, masked-word prediction, and speaker classification, illustrating how NLP is being adapted to the monetary policy domain.

2.4. Gaps and Research Opportunities

While previous research successfully describes the dynamics of sentiment and changes in thematic content [16,17], it largely ignores the composition of authorship within institutional communications. Identifying whether variations in rhetorical tone or thematic bias correspond with changes in the underlying cohorts of authors could deepen our knowledge of monetary policy dynamics.

Additionally, studies such as Bertsch et al. [18] use semantic similarity metrics to compare speech tone and topic prevalence across more than 50 central banks, highlighting how linguistic alignment or divergence can produce international spillovers. Their findings suggest that shifts in communication tone by one major central bank (e.g., the Fed or ECB) can subtly influence the language and policy posture of others—offering new paths for comparative stylometric analysis.

This research situates itself at the intersection of computational stylometry and monetary policy analysis. By applying machine learning and NLP-driven authorship attribution techniques to CBRT press releases, it contributes to emerging efforts to unpack the latent structures of institutional communication and enrich our understanding of how textual signals shape policy credibility and market outcomes. However, despite this extensive focus on content and thematic analysis, the question of who writes these communications—and whether different authorial voices contribute distinct stylistic signatures—remains unexplored. Current research largely constructs these communications as institutionally unified and thus neglects the possibility of intra-institutional heterogeneity due to heterogeneous authorship or committee dynamics.

3. Data and Methodology

3.1. Dataset and Description

The study uses a dataset of 557 official announcements from the Central Bank of the Republic of Turkey (CBRT) issued between 1 January 2006 and 1 October 2017. These announcements were programmatically downloaded from the public archive of the CBRT using web scraping scripts written in R with packages such as tm and dplyr. The data cover a period of intense global monetary policy change and offer a rich corpus for stylometric analysis. The announcements vary in length and subject matter and include monetary policy decisions, committee reports, and market interventions. Importantly, the CBRT never reveals the individual authorship of these communiqués, making it a well-suited case for unsupervised authorship attribution.

In their original state, the raw text data contain considerable noise in the form of stopwords, punctuation marks and other embellishments, and inconsistencies in spacing and formatting. To prepare the corpus for analysis, the following preprocessing pipeline was run:

Cleaning: All announcements were stripped of HTML tags, extraneous whitespace, and special characters. Standard Turkish stopwords were filtered out using the Zemberek 2.1.1 NLP toolkit.
Normalisation: All text was lowercased and common abbreviations were expanded to their full forms.
Tokenisation: Text was tokenised with the Natural Language Toolkit (NLTK), adapted to suit the structure of the Turkish language.
Feature Extraction: Features were extracted at both the sentence and text levels. These include:
❖ Sentence Length: Average word count and character count per sentence.
❖ Punctuation Density: Average number of commas, colons, and semicolons per sentence.
❖ Type-Token Ratio (TTR): Measure of lexical variety, calculated as the ratio of unique tokens to total tokens.
❖ Word Length: Average length of words in characters.
❖ Punctuation-to-Character Ratio: Ratio of punctuation frequency to total characters.

The resulting feature matrix of engineered attributes served as the basis of further clustering analysis.
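The feature definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline (which the text describes as using R, Zemberek, and NLTK): the function names, the naive regex sentence splitter, and the regex tokenizer are all hypothetical choices made here for brevity.

```python
import re

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(text):
    """Lowercased word tokens. Note: str.lower() is not Turkish-locale aware (I/ı, İ/i)."""
    return re.findall(r"\w+", text.lower())

def stylometric_features(text):
    """Return sentence- and document-level style features for one announcement."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one word")
    n_sent = len(sentences)
    punct_count = sum(text.count(mark) for mark in ",:;")
    return {
        "avg_sentence_len_words": len(tokens) / n_sent,
        "avg_sentence_len_chars": sum(len(s) for s in sentences) / n_sent,
        "commas_per_sentence": text.count(",") / n_sent,
        "colons_per_sentence": text.count(":") / n_sent,
        "semicolons_per_sentence": text.count(";") / n_sent,
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens),
        "punct_to_char_ratio": punct_count / len(text),
    }

features = stylometric_features(
    "Rates were held, as expected. Inflation rose; risks remain."
)
```

Applying `stylometric_features` to every announcement and stacking the resulting dictionaries row by row would give a feature matrix of the kind described above. Because raw TTR falls as texts get longer, comparing it across announcements of very different lengths would need some care.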

This study focuses specifically on stylometric features (sentence structure, punctuation patterns, lexical diversity) rather than semantic embeddings (BERT, RoBERTa) to maintain focus on authorship attribution rather than thematic content analysis. While transformer-based embeddings excel at capturing semantic relationships and policy content, stylometric features provide distinct advantages for detecting authorial signatures that persist across different topics and policy contexts. The choice of structural over semantic features ensures that identified clusters reflect writing style differences rather than topic-driven variations in monetary policy discourse.

3.2. Dimensionality Reduction and Clustering

PCA was selected over non-linear dimensionality reduction techniques (t-SNE, UMAP) because relationships among stylometric features are typically linear and because interpretable components are needed to understand stylistic differentiation. While non-linear methods excel at preserving local structures, PCA maintains global structure and provides a quantifiable account of explained variance, which is essential for authorship attribution. Hierarchical clustering was chosen over partitional or density-based methods (k-means, DBSCAN) because authorship detection is exploratory and the true number of authorial groups is unknown, and because dendrograms make it possible to examine clustering solutions at multiple resolutions.

Because stylometric attributes are multivariate and correlated, we performed dimensionality reduction before clustering in order to reduce the effects of multicollinearity and improve cluster separability. Principal Component Analysis (PCA) was conducted. The first two principal components explained 73% of the total variance in the set of attributes, as visualized in the factor map (see Figure 1). Hierarchical clustering was subsequently performed with the Hierarchical Clustering on Principal Components (HCPC) algorithm, implemented in R via the FactoMineR package, version 1.35 [19]. HCPC combines PCA for dimensionality reduction, agglomerative clustering with Ward’s method, and k-means consolidation for robustness. Ward’s criterion minimizes within-cluster inertia, which yields homogeneous clusters and maximizes dissimilarity between groups. In mathematical notation, the algorithm minimizes the objective function:

\arg\min_{C} \sum_{i=1}^{k} \sum_{x \in C_{i}} \lVert x - \mu_{i} \rVert^{2} \qquad (1)

where C_i is cluster i and µ_i is its centroid in principal component space. Inspection of the dendrogram and the inertia gain curve indicated that the optimal number of clusters was k = 3, pointing to three distinct authorial groups in the CBRT communications.
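To make the objective in Equation (1) concrete, the following minimal Python sketch computes the within-cluster inertia for a given partition and the cost Ward's method assigns to merging two clusters. The function names are hypothetical and the points are toy coordinates, not CBRT data. The sketch relies on a standard identity: merging clusters A and B raises the total inertia by |A||B| / (|A| + |B|) times the squared distance between their centroids.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_inertia(clusters):
    """Objective of Equation (1): squared distances to each cluster's centroid, summed."""
    total = 0.0
    for pts in clusters:
        mu = centroid(pts)
        total += sum(sq_dist(p, mu) for p in pts)
    return total

def ward_merge_cost(a, b):
    """Increase in within-cluster inertia caused by merging clusters a and b."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))

# Toy points in a two-dimensional principal component space (illustrative only).
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[10.0, 0.0], [12.0, 0.0]]
increase = within_inertia([a + b]) - within_inertia([a, b])
```

Here `increase` equals `ward_merge_cost(a, b)`, which is why an agglomerative procedure that always performs the cheapest merge keeps within-cluster inertia as low as possible at each step.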

3.3. Validation via Multidimensional Scaling (MDS)

Multidimensional Scaling (MDS) was performed to verify the structure of the resulting clustering. MDS maps high-dimensional similarities between feature vectors to a two-dimensional space while preserving pairwise distance relationships as closely as possible [20]. Two MDS models were built: one from cosine distances between profiles of the most frequent character 4-grams, and another from Euclidean distances between word unigram frequency vectors. Both MDS plots supported the HCPC clustering result, with announcements grouped into three clusters corresponding to the stylometric groups determined previously.
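The distance matrix that feeds such an MDS projection can be sketched as follows. This is an illustrative Python example under assumptions: the function names and the `top_k` cut-off are hypothetical, the paper does not state how many frequent n-grams were retained, and the MDS step itself is omitted.

```python
import math
from collections import Counter

def char_ngrams(text, n=4):
    """Frequency profile of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def distance_matrix(docs, n=4, top_k=100):
    """Pairwise cosine distances restricted to the top_k most frequent n-grams (assumed cut-off)."""
    profiles = [char_ngrams(d, n) for d in docs]
    corpus_counts = Counter()
    for profile in profiles:
        corpus_counts.update(profile)
    vocab = {g for g, _ in corpus_counts.most_common(top_k)}
    reduced = [Counter({g: c for g, c in p.items() if g in vocab}) for p in profiles]
    return [[cosine_distance(a, b) for b in reduced] for a in reduced]

dist = distance_matrix(["monetary policy", "monetary policy", "zzzzzzzz"])
```

Identical documents end up at distance close to zero and documents with no shared 4-grams at distance one. A matrix of this form could then be passed to any MDS routine to obtain two-dimensional coordinates.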

3.4. Sentiment and Tone Attribution

Having established the authorial clusters, sentiment analysis was then carried out to assess the dominant tone associated with each group. Sentiment analysis approaches can be categorized by technique, text perspective, and rating level. This work combines a lexicon-based method and a machine learning method to make the assessment more robust.

The lexicon-based approach relied on a curated sentiment lexicon comprising pre-compiled terms with established polarity values. Sentiment polarity was calculated by assessing the semantic orientation of words and sentences, thereby capturing the degree of subjectivity and evaluative opinion expressed within the text. All algorithms supporting this approach were implemented in CRAN-R, with custom scripts developed by the author to process and analyze the corpus.
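A lexicon-based polarity score of this kind, aggregated per cluster, might look like the Python sketch below. Everything in it is an assumption for illustration: the toy lexicon uses English words and invented polarity values (the paper's curated lexicon is not reproduced here), and the scoring rule, the mean polarity of matched terms, is only one of several plausible choices.

```python
import re

# Hypothetical toy lexicon with invented polarity values; not the paper's lexicon.
TOY_LEXICON = {
    "improve": 1.0, "recovery": 1.0, "stable": 0.5,
    "risk": -0.5, "volatility": -0.5, "deteriorate": -1.0,
}

def polarity(text, lexicon=TOY_LEXICON):
    """Mean polarity of lexicon terms found in the text; 0.0 if none are found."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def mean_polarity_by_cluster(texts, labels, lexicon=TOY_LEXICON):
    """Average document polarity within each cluster label."""
    grouped = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(polarity(text, lexicon))
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}

texts = ["growth will improve", "markets are stable", "risk of volatility"]
labels = [1, 1, 3]  # cluster assignments, e.g. from the HCPC step
tone = mean_polarity_by_cluster(texts, labels)
```

Comparing the per-cluster averages in `tone` is the kind of check that would show whether one stylometric cluster leans more negative than the others.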

To clarify the methodological approach: the stylometric clustering analysis employed purely unsupervised methods (PCA and hierarchical clustering) without any prior labeling. The sentiment analysis described here is a separate analytical step applied to the already-identified clusters. The machine learning approach used the pre-trained Google Cloud Natural Language API (without any fine-tuning or additional training on our dataset) to classify sentiment and assess whether the unsupervised stylometric clusters exhibit systematic differences in communicative tone. This API applies a supervised model that Google trained in advance on labeled examples; it was applied to the corpus as-is. By combining these two complementary approaches, the analysis maps stylistic authorship to tone of communication at a fine-grained level and supports an exploration of potential policy-signaling biases within clusters of authors.

While extensive research has already explored central bank communications from semantic and sentiment perspectives using domain-specific approaches, our contribution lies in applying stylometric techniques to reveal latent authorial structures within institutional discourse. Our primary objective is stylometric analysis for authorship attribution, not sentiment analysis itself. The sentiment analysis component serves as a supplementary validation tool to characterize the tonal properties of the stylometrically identified clusters rather than as the primary analytical focus. This methodological choice allows us to concentrate on the under-explored dimension of intra-institutional authorship heterogeneity, complementing rather than duplicating existing sentiment-focused research in central bank communication analysis.

4. Results

This section presents the empirical results derived from stylometric analysis of the CBRT press releases. Both sentence-level and document-level linguistic features were systematically extracted and measured using multivariate statistical methods. By applying principal component analysis (PCA) and hierarchical clustering on principal components (HCPC) to the textual dataset, clear patterns of authorship could be discerned from the textual corpus.

4.1. Principal Component Analysis (PCA)

PCA was applied to seven stylometric variables listed in Table 1 with a view to decreasing the dimensional complexity of the dataset while preserving the variance needed to distinguish stylistic signatures.

As shown in Figure 1, the first two principal components captured 73% of the total variance: 44% by Dim 1 and 29% by Dim 2.
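Percentages such as these come from the eigenvalues of the correlation matrix of the standardized features: each component's share is its eigenvalue divided by the sum of all eigenvalues. The Python sketch below illustrates that computation on toy data using only the standard library. It is a hypothetical illustration, not the FactoMineR routine used in the study, and the Jacobi eigenvalue solver is chosen here purely to keep the example self-contained.

```python
import math

def standardize(rows):
    """Centre each column and scale it to unit variance (columns must not be constant)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    k = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for r in range(k):  # rotate columns p and q
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(k):  # rotate rows p and q
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted((a[i][i] for i in range(k)), reverse=True)

def explained_variance_ratio(rows):
    """Share of total variance carried by each principal component."""
    eig = symmetric_eigenvalues(correlation_matrix(standardize(rows)))
    total = sum(eig)
    return [e / total for e in eig]

# Toy feature matrix (rows = documents, columns = features); not CBRT data.
ratios = explained_variance_ratio([[1, 2, 5], [2, 4, 3], [3, 6, 6], [4, 8, 1]])
```

In this toy matrix the second column is an exact multiple of the first, so the last component carries essentially no variance. Summing the first two entries of `ratios` gives the two-component share in the same way the 73% figure combines Dim 1 and Dim 2.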

Figure 1. Variables Factor Map.

Figure 1, the variable factor map, indicates that sentence structure features (average sentence length in words and letters) and punctuation measures (commas, colons, semicolons) loaded heavily on Dim 1, whereas measures of lexical richness like the type-token ratio (TTR) and average length of a word loaded heavily on Dim 2. This orthogonal distinction confirms that syntactic and lexical dimensions both contribute to stylistic differentiation of the texts collectively.

The individual factor map (Figure 2) locates each of the 557 CBRT announcements on the reduced two-dimensional PCA space. The visualization reveals noticeable clustering tendencies, which imply latent groupings of the documents preceding actual assignment of cluster membership.

4.2. Hierarchical Clustering Results

Hierarchical clustering was applied to the PCA-reduced features using Ward’s linkage criterion. The dendrogram produced (Figure 3) and associated cluster maps (Figure 4 and Figure 5) report a best solution that consisted of three discernible clusters.

Cluster 1 is characterized by brevity of sentence length (mean 16.42 words) and moderate punctuation frequency. Cluster 2 has intermediate sentence length (20.45 words) but is differentiated by the highest lexical richness, with a TTR of 0.72. Cluster 3 is highly divergent with much longer sentence length (41.32 words) and higher frequency of punctuation, at an average of 1.89 commas per sentence. Table 2 illustrates the prominent characteristics of each cluster. The noticeable divergence between sentence composition and lexical measures validates the occurrence of at least three stylistically distinct groups of authors within the CBRT communications.

The decomposition of total variance across the clustering showed that between-cluster variance dominated within-cluster variance, which is a requirement for a clustering solution that is both robust and informative.

The selection of hierarchical clustering with Ward’s linkage criterion was based on its suitability for exploratory authorship attribution where the true number of clusters is unknown. Ward’s method minimizes within-cluster variance while maximizing between-cluster separation, making it optimal for detecting distinct stylistic signatures in textual data. The choice of k = 3 clusters was determined through comprehensive analysis of the dendrogram structure and inertia gain patterns, which demonstrated optimal cluster separation at this level.

Multiple validation approaches confirmed the robustness of our clustering solution. The inertia gain curve showed clear inflection points supporting the three-cluster structure, while dendrogram inspection revealed distinct hierarchical separation between groups. Multidimensional scaling validation using both character-level 4-grams and word-level unigrams demonstrated the stability and distinctiveness of the three identified clusters across different feature representations. Additional validation through silhouette analysis and within-cluster sum of squares metrics confirmed cluster cohesion and separation, providing comprehensive evidence for the reliability of our HCPC clustering results.
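For readers unfamiliar with silhouette analysis, the Python sketch below shows how the coefficient is computed. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). The function names and toy coordinates are hypothetical, and the paper does not report the silhouette values it obtained.

```python
import math

def silhouette_scores(points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for every point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores

def mean_silhouette(points, labels):
    """Average silhouette over all points; values near 1 indicate well-separated clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)

# Toy coordinates in principal component space (illustrative only).
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(pts, [0, 0, 1, 1])
bad = mean_silhouette(pts, [0, 1, 0, 1])
```

A labeling that matches the geometry (`good`) scores close to one, while a labeling that cuts across it (`bad`) scores below zero, which is the contrast a silhouette check relies on.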

4.3. Multidimensional Scaling (MDS) Validation

To validate the clustering structure independently, two different measures of textual similarity were used for multidimensional scaling: character-level 4-gram frequency vectors (Figure 6a) and word-level unigram frequency vectors (Figure 6b).

Both MDS projections reinforced the HCPC-derived clustering, while the spatial cohesion of Cluster 3 on both maps highlights its stylistic distinctiveness, possibly reflective of a special group of authors or a distinct stage of CBRT’s communication strategy.

4.4. Stylometric Insights

The overall stylometric evidence establishes the heterogeneity of the various author styles incorporated within CBRT press releases. The identified clusters differ systematically along structural dimensions like sentence complexity and use of punctuation as well as lexical richness, as measured by TTR scores. Also, initial analysis of sentiments indicates that the stylistic groupings map to tonal differences. In particular, Cluster 3, with its long sentence forms and higher density of punctuation, is linked to a negative sentiment profile compared to the overall neutral or positive tone seen for the other two clusters, 1 and 2. These observations have important implications for the interpretive models applied to CBRT communications, as they imply that market participants may respond, either consciously or instinctively, not only to overt policy content but also to stylistic and tonal signals built into the text.

The systematic variation in stylometric features suggests that CBRT employs multiple communication approaches within its institutional framework. Cluster 1’s concise style may reflect urgent policy announcements or standardized operational communications, while Cluster 3’s complex sentence structure likely represents technical policy explanations requiring detailed justification. The intermediate characteristics of Cluster 2 suggest a deliberate balance between accessibility and technical precision. These stylistic patterns have direct implications for policy communication effectiveness, as shorter sentences typically enhance comprehension and immediate market response, while longer, complex sentences may signal deliberative depth but risk communication clarity. The consistent clustering across time periods indicates that these represent stable institutional communication strategies rather than random variation.

5. Discussion

The empirical insights yielded by this study contribute substantively to the burgeoning field of textual analysis and stylometry within the domain of central bank communication. The identification of three discrete authorial clusters in the CBRT’s monetary policy announcements offers robust confirmation of the view that such institutional communications are rarely monolithic in their narrative construction. While the extant literature has predominantly concentrated on sentiment analysis, readability metrics, and thematic structuring, the present study introduces a distinct stylometric lens, demonstrating that even ostensibly anonymous policy statements exhibit systematic, quantifiable stylistic signatures.

The identification of three distinct stylistic clusters directly addresses the research gap regarding intra-institutional heterogeneity in central bank communications. These findings provide empirical evidence that CBRT communications reflect systematic contributions from different authorial groups within the institution, challenging assumptions of institutional uniformity.

The persistent stylistic differences—particularly in sentence complexity (Cluster 3’s 41.32 words vs. Cluster 1’s 16.42 words) and lexical richness—suggest distinct authorial signatures that remain stable over time, indicating at least three separate drafting groups or individual authors within CBRT. The temporal robustness of these clusters across leadership changes reveals that communication style is embedded in organizational structures rather than reflecting individual preferences, suggesting institutionalized drafting practices that transcend personnel turnover. While CBRT maintains authorship anonymity, the existence of detectable stylistic patterns creates unintended transparency dimensions—market participants could potentially identify underlying communication sources, affecting how policy signals are interpreted and potentially undermining intended institutional uniformity.

The detected stylistic variation has direct implications for policy effectiveness and market interpretation. When different authorial groups produce communications with systematically different sentence complexity and tonal characteristics, market participants may unconsciously associate certain stylistic patterns with specific policy stances or degrees of commitment. For instance, Cluster 3’s longer, more complex sentences coupled with higher punctuation density may signal technical precision, while Cluster 1’s brevity might convey urgency or decisiveness, creating unintended policy signals independent of content. These stylistic inconsistencies may undermine institutional credibility by suggesting internal disagreement or ad hoc communication processes, and when sophisticated market participants detect predictable stylistic patterns, they may begin to anticipate policy directions based on writing style rather than explicit content, potentially reducing the central bank’s communication control and creating unwanted market volatility.

The extraction of these clusters through hierarchical clustering on principal components (HCPC) implies the operational presence of multiple internal committees or drafting teams contributing to CBRT’s policy texts. This is particularly noteworthy in light of the institutional opacity surrounding authorship at the CBRT. The stylometric markers—encompassing variations in syntactic structure, punctuation density, and lexical richness, as captured by type-token ratios and mean sentence length—emerge as reliable discriminators of distinct authorial styles, even within the formal and standardized register characteristic of central bank discourse.

These findings bear significant implications for the interpretive models employed by market participants in decoding monetary policy signals. Should these stylistic patterns remain stable over time, they could function as proxies for underlying shifts in committee consensus or policy orientation, thus aligning with the contention of Hansen et al. [21] that heterogeneity in central bank statements may reflect internal divergences in preference and decision-making power. Indeed, the persistent stylistic differentiation detected here suggests that market actors could, either explicitly or implicitly, respond to such textual cues in their assessments of policy direction.

Moreover, the temporal robustness of these clusters across successive changes in CBRT leadership highlights an institutional inertia in writing conventions that stands in contrast to documented cases in other monetary authorities. Previous research has demonstrated that leadership transitions in institutions like the Federal Reserve and European Central Bank correspond with observable shifts in tone, complexity, and communicative style [6]. CBRT’s resilience in maintaining stable stylistic regimes suggests an embedded structural consistency that may insulate its communication processes from leadership turnover.

From a policy communication perspective, these results raise critical considerations for transparency and efficacy. While stylistic differentiation may appear subtle, it has the potential to generate unintended signals regarding internal cohesion or strategic shifts, thus influencing market expectations. As Bholat et al. [22] argue, effective central bank communication hinges not merely on the clarity of content but also on the consistency of presentation. CBRT may, therefore, benefit from the further harmonization of its drafting protocols to mitigate the risk of market misinterpretation arising from such stylometric variation.

Additionally, the clustering outcomes presented here invite further investigation into the linkage between stylometric patterns and economic outcomes. Should future research establish systematic correlations between specific stylistic clusters and subsequent monetary policy actions, such as interest rate adjustments, stylometric profiling could emerge as an anticipatory tool for market analysis. Extending this framework to include correlations with asset price responses, including bond yield spreads and exchange rate movements, would provide valuable empirical validation of stylometric signals as early indicators of policy shifts.

Furthermore, this study offers a methodological contribution by demonstrating the efficacy of integrating sentence-level and document-level stylometric features with dimensionality reduction (PCA) and unsupervised clustering (HCPC). This multidimensional text-analytic approach moves beyond traditional sentiment analysis paradigms, uncovering latent structural regularities within policy texts that elude surface-level lexical assessments [23]. In sum, the findings underscore the value of stylometry as a complementary analytical framework for dissecting central bank communications, revealing that even within standardized institutional discourse, measurable stylistic variance persists and may carry informational salience for financial markets and policy observers alike.
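The clustering stage of this pipeline can be sketched in a few lines. The example below is a deliberately simplified stand-in: it standardizes the features and applies agglomerative clustering with Ward's criterion directly, omitting the PCA projection and any consolidation step that a full HCPC run would include. The choice of Ward's criterion, the helper names `standardize` and `ward_clusters`, and the six toy rows are all assumptions for illustration; the rows are hypothetical values and not CBRT data.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single feature dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy rows: (words per sentence, commas per sentence, TTR).
docs = [
    (16.0, 0.7, 0.36), (17.0, 0.8, 0.38),
    (20.0, 0.8, 0.71), (21.0, 0.8, 0.73),
    (40.0, 1.9, 0.74), (42.0, 1.9, 0.76),
]
print(ward_clusters(standardize(docs), k=3))
```

The pairwise search makes this sketch cubic in the number of documents, which is acceptable for a few hundred texts but not for large corpora; a library implementation would be the natural choice for real use.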

Finally, this study has several limitations that suggest directions for future research. The temporal scope ending in 2017 coincides with Turkey’s constitutional referendum, which fundamentally transformed the institutional structure from a parliamentary to a presidential system; this structural break makes pre-2017 data representative of a distinct institutional era. Future research should examine the post-2018 period to assess whether stylometric patterns persist under the transformed institutional framework and incorporate temporal modeling to examine how stylistic patterns correlate with macroeconomic events or institutional transitions. Additionally, while this study focuses exclusively on CBRT communications as an exemplary case for applying stylometric analysis to anonymous authorship detection in institutional contexts, comparative analysis with other emerging market central banks that maintain similar institutional anonymity would contextualize these findings and assess the broader applicability of the methodology. The clustering approach assumes authorship-driven variation, though contextual factors like policy complexity or audience shifts may also influence style. While PCA and MDS were effective, advanced machine learning techniques could better reveal non-linear stylistic patterns. This study establishes the essential methodological foundation for empirical market testing by first identifying distinct stylistic groups within CBRT communications. Meaningful market impact analysis requires normalizing communication patterns within each identified writer group, as different groups may have distinct baseline tones and stylistic tendencies. Moreover, while the deliberate focus on stylometric rather than semantic features enables authorship attribution, integration with semantic and thematic analyses could provide additional insights into the content dimensions of institutional communication patterns.
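The within-group normalization mentioned above can be illustrated with a short sketch that converts each document's tone score into a z-score relative to its own cluster. The function name `normalize_within_group`, the use of z-scores as the normalization, and the example scores are assumptions for illustration; the study does not specify a particular procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_group(scores, groups):
    """Return each score as a z-score relative to its own group's baseline."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    normalized = []
    for score, group in zip(scores, groups):
        mu, sd = stats[group]
        # A group with no spread carries no within-group signal.
        normalized.append((score - mu) / sd if sd > 0 else 0.0)
    return normalized


# Hypothetical tone scores and cluster labels, not CBRT data.
tone = [-0.4, -0.2, 0.1, 0.3]
cluster = [1, 1, 2, 2]
print(normalize_within_group(tone, cluster))
```

With this adjustment, a mildly negative statement from a habitually negative drafting group is not mistaken for a shift in policy tone, which is the concern raised in the paragraph above.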

6. Conclusions, Future Directions, and Policy Implications

This study provides novel empirical evidence on the internal heterogeneity embedded in the Central Bank of the Republic of Türkiye’s (CBRT) communications, revealed through the application of stylometric analysis to its monetary policy announcements. By employing sentence-level and document-level stylistic metrics, combined with principal component analysis (PCA) and hierarchical clustering on principal components (HCPC), the analysis uncovered three distinct stylistic clusters that challenge the conventional assumption of institutional uniformity. These findings demonstrate that CBRT communications exhibit systematic stylistic variation, likely reflecting the contributions of discrete authorial groups or drafting teams. Stylometric markers such as sentence length, punctuation density, and lexical diversity emerged as salient indicators of these latent authorial styles, thereby advancing the literature on central bank communication and illustrating the explanatory potential of computational text-analytic methods in institutional discourse.

The ramifications of these findings are wide-reaching. From a market perspective, the existence of discernible clusters of authorial style suggests that stylistic variation may inadvertently influence how financial agents interpret monetary policy statements, particularly when subtle textual nuances are closely monitored for policy guidance. Such unintended signals could affect market perceptions, exchange rate movements, and investor sentiment. From an institutional perspective, this underscores the value of harmonizing communication practices to promote stylistic uniformity and to strengthen the credibility and predictability of policy signaling.

Building on these observations, several directions for future research emerge. First, linking stylometric clusters to economic outcome variables such as exchange rate volatility, bond yield variation, or changes in inflation expectations would test whether stylistic differentiation has predictive value for market outcomes. Second, extending the time coverage of the study to include more recent communications would show whether leadership changes, differences in governance structures, or shifts in macroeconomic conditions have reshaped these stylistic profiles. Third, advanced natural language processing methods, such as topic modeling and pretrained transformer-based language models, may allow a richer understanding of the semantic and thematic underpinnings of stylistic variation. Lastly, cross-country comparative research, especially among emerging market economies, would strengthen the external validity of these findings and shed broader light on how institutional design and communication strategy interact in shaping monetary policy effectiveness.

From a sustainability standpoint, this research contributes to building more transparent and accountable financial institutions—a cornerstone of sustainable development. By applying AI techniques to reveal hidden communication patterns, this methodology supports evidence-based institutional reform and promotes the transparency necessary for sustainable economic governance. The ability to systematically evaluate communication consistency using machine learning tools represents a valuable contribution to sustainable finance practices and institutional accountability.

In policy terms, the identification of consistent stylistic heterogeneity calls for concrete institutional measures to improve the clarity, coherence, and credibility of CBRT communications. First, adopting standardized editorial guidelines and formal style guides would harmonize linguistic choices, reducing stylistic inconsistency that could convey ambiguity or internal conflict. Second, establishing a centralized review mechanism within the communications apparatus, responsible for reviewing and synthesizing contributions from the various drafting groups, would promote greater tonal and stylistic homogeneity. Third, as algorithmic trading platforms parse central bank communications in real time, maintaining stylistic homogeneity is essential for protecting predictable policy signaling and institutional credibility. Lastly, the CBRT might benefit from periodic training and capacity-development workshops for the officials charged with drafting policy announcements, ensuring greater sensitivity to the behavioral and market-sensitive consequences of language use. Together, these findings reflect the twin value of stylometric analysis: as a method of revealing hidden textual organization and as a practical resource for informing communication strategy. By revealing the interplay between subtle stylistic dynamics, institutional credibility, and market interpretation, the paper places computational linguistics at the center of contemporary monetary policy communication research and practice.

Author Contributions

Formal analysis and investigation, H.E.; methodology, H.E.; writing—original draft, H.E.; review and editing, İ.Ö.; resources, İ.Ö. and H.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Blinder, A.S.; Ehrmann, M.; Fratzscher, M.; De Haan, J.; Jansen, D.-J. Central Bank Communication and Monetary Policy: A Survey of Theory and Evidence. J. Econ. Lit. 2008, 46, 910–945.
2. Hansen, S.; McMahon, M. Shocking Language: Understanding the Macroeconomic Effects of Central Bank Communication. J. Int. Econ. 2016, 99, S114–S133.
3. Ehrmann, M.; Fratzscher, M. Explaining Monetary Policy in Press Conferences. SSRN Electron. J. 2007.
4. Lucca, D.; Trebbi, F. Measuring Central Bank Communication: An Automated Approach with Application to FOMC Statements; National Bureau of Economic Research: Cambridge, MA, USA, 2009; p. w15367.
5. Gurkaynak, R.S.; Sack, B.; Swanson, E. Do Actions Speak Louder Than Words? The Response of Asset Prices to Monetary Policy Actions and Statements. Int. J. Cent. Bank. 2005, 1, 55–93. Available online: https://www.ijcb.org/journal/ijcb05q2a2.htm (accessed on 17 May 2025).
6. Ehrmann, M.; Talmi, J. Starting from a Blank Page? Semantic Similarity in Central Bank Communication and Market Volatility. J. Monet. Econ. 2020, 111, 48–62.
7. Gardner, B.; Scotti, C.; Vega, C. Words Speak as Loudly as Actions: Central Bank Communication and the Response of Equity Prices to Macroeconomic Announcements. J. Econom. 2022, 231, 387–409.
8. Pfeifer, M.; Marohl, V.P. CentralBankRoBERTa: A Fine-Tuned Large Language Model for Central Bank Communications. J. Finance Data Sci. 2023, 9, 100114.
9. Gambacorta, L.; Kwon, B.; Park, T.; Patelli, P.; Zhu, S. CB-LMs: Language Models for Central Banking. Available online: https://www.bis.org/publ/work1215.htm (accessed on 20 August 2025).
10. Gómez-Cram, R.; Grotteria, M. Real-Time Price Discovery via Verbal Communication: Method and Application to Fedspeak. J. Financ. Econ. 2022, 143, 993–1025.
11. Holmes, D.I. Authorship Attribution. Comput. Humanit. 1994, 28, 87–106.
12. Koppel, M.; Schler, J.; Argamon, S. Computational Methods in Authorship Attribution. J. Am. Soc. Inf. Sci. Technol. 2009, 60, 9–26.
13. Stamatatos, E. A Survey of Modern Authorship Attribution Methods. J. Am. Soc. Inf. Sci. Technol. 2009, 60, 538–556.
14. Ahrens, M.; Erdemlioglu, D.; McMahon, M.; Neely, C.J.; Yang, X. Mind Your Language: Market Responses to Central Bank Speeches. J. Econom. 2025, 249, 105921.
15. Grieve, J. Quantitative Authorship Attribution: An Evaluation of Techniques. Lit. Linguist. Comput. 2007, 22, 251–270.
16. Apel, M.; Blix Grimaldi, M.; Hull, I. How Much Information Do Monetary Policy Committees Disclose? Evidence from the FOMC’s Minutes and Transcripts. J. Money Credit Bank. 2022, 54, 1459–1490.
17. Schmeling, M.; Wagner, C. Does Central Bank Tone Move Asset Prices? J. Financ. Quant. Anal. 2025, 60, 36–67.
18. Bertsch, C.; Hull, I.; Lumsdaine, R.L.; Zhang, X. Four Facts about International Central Bank Communication. SSRN Electron. J. 2024.
19. Lê, S.; Josse, J.; Husson, F. FactoMineR: An R Package for Multivariate Analysis. J. Stat. Softw. 2008, 25, 1–18.
20. Borg, I.; Groenen, P.J.F. Modern Multidimensional Scaling: Theory and Applications, 2nd ed.; Springer Science + Business Media: New York, NY, USA, 2005; pp. xxi, 614.
21. Hansen, S.; McMahon, M.; Prat, A. Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach. Q. J. Econ. 2018, 133, 801–870.
22. Bholat, D.; Hans, S.; Santos, P.; Schonhardt-Bailey, C. Text Mining for Central Banks; Centre for Central Banking Studies, Bank of England: London, UK, 2015; Available online: https://ideas.repec.org/b/ccb/hbooks/33.html (accessed on 20 October 2017).
23. Gentzkow, M.; Kelly, B.; Taddy, M. Text as Data. J. Econ. Lit. 2019, 57, 535–574.

Figure 2. Individuals Factor Map.

Figure 3. Hierarchical Tree.

Figure 4. Hierarchical Clustering on the Factor Map.

Figure 5. Three Clusters on Factor Map.

Figure 6. (a) MDS on Most Frequent Character 4-grams. (b) MDS on Most Frequent Word Unigrams.

Table 1. Used Attributes for Stylometric Analysis.

Sentence Level Measures	Sentence length	Avg. no. of Words in a sentence
		Avg. no. of characters in a sentence
		Avg. no. of commas in a sentence
		Avg. no. of semicolons and colons in a sentence
Text Level Measures	Punctuation	Type-Token ratio
		Avg. length of the word
		No. of Punctuation marks by characters in Text

Table 2. CBRT Announcement Cluster Features (Means).

Dimensions	Cluster 1	Cluster 2	Cluster 3
Words in Sentence	16.42	20.45	41.32
Characters in Sentence	82.33	106.97	205.59
Commas in Sentence	0.75	0.79	1.89
Colons-Semicolons in Sentence	0.34	0.11	0.67
Punctuation/Total Characters	0.01	0.01	0.01
Word Length	5.01	5.24	4.97
Type-Token Ratio (TTR)	0.37	0.72	0.75

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Emekci, H.; Özkan, İ. Stylometric Analysis of Sustainable Central Bank Communications: Revealing Authorial Signatures in Monetary Policy Statements. Sustainability 2025, 17, 8979. https://doi.org/10.3390/su17208979
