1. Introduction
Language serves as a fundamental tool for human communication, with its usage closely tied to the various functions it fulfills in social contexts. Online and offline social networks significantly influence how language is structured and used in different settings. The relationship between language and social networks has been a topic of interest in sociolinguistics and communication studies for decades [1,2,3]. One important aspect of language within social networks is how linguistic features signal different roles or positions within a network [4,5]. Numerous studies have explored how language features operate in social networks and contribute to the formation of communities. For instance, research on discourse and social networks has demonstrated that language can reveal information flow within these networks and the dynamics of power relations [6,7]. Similarly, language functions transcend mere information transmission and vary with social context [8].
In general terms, a register refers to a variety of language associated with a particular situation, including specific communicative purposes [9,10]. Registers are typically described by their characteristic uses of lexical and grammatical features, which vary according to the context in which they are employed. For example, academic language and entertainment language are distinct registers that serve different social functions: academic language prioritizes precision, objectivity, and intellectual rigor, whereas entertainment language is geared toward engaging, entertaining, and appealing to emotions. These contrasting language functions shape their respective uses of linguistic features, such as vocabulary, sentence patterns, and tone, aligning with each register’s broader communicative goals.
In both academic and entertainment contexts, language serves as a tool for navigating social situations, where specific linguistic choices can align with professional or casual identities, facilitating the construction and negotiation of social relationships.
This topic has been explored in studies on academic English, such as those by Biber [11], who investigated language variation in American higher education along two dimensions: speech versus writing and academic versus nonacademic language. Moreover, research by Nesi and Gardner [12] examined language variation in academic writing across grades, various subjects, and genres, offering key insights into the use of academic language. However, studies on language variation in non-English academic settings, particularly in languages such as Chinese, remain underdeveloped [13].
The development of linguistic analysis methods has made it possible to examine language across various dimensions, offering deeper insights into how its functions influence its features. A key methodology for exploring language variation is multi-dimensional analysis (MDA), a quantitative method developed by Biber [14,15,16] that enables researchers to analyze the complexity of language across various registers. MDA identifies patterns of linguistic features that differentiate registers or communication contexts by considering multiple dimensions simultaneously, such as syntactic complexity, lexical diversity, and sentence structure [11,17]. In recent years, MDA has been widely applied to academic English to examine register variation. Studies by Biber et al. [18], for example, have focused on grammatical complexity in English for Academic Purposes (EAP), while other research has investigated variations in academic writing across different sub-registers, including personal versus objective genres [19] and native versus non-native learners [20,21]. Other studies have explored professional academic writing [22,23,24,25] and even the interaction between different research reports and disciplines, as exemplified by Gray [26].
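To make the notion of a dimension concrete, the following minimal sketch shows how a dimension score is conventionally computed in MDA once the co-occurring features have been identified: each feature's frequency is standardized across the corpus, and a text's score is the sum of its standardized values for positively loading features minus the sum for negatively loading ones. The feature names, rates, and loadings here are hypothetical illustrations, not the features or results of the present study, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of per-text feature rates to mean 0 and SD 1."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD over the corpus
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def dimension_scores(rates, positive, negative):
    """Compute one dimension score per text.

    rates:    {feature_name: [normalized rate in text 0, text 1, ...]}
    positive: features with salient positive loadings on the dimension
    negative: features with salient negative loadings on the dimension
    """
    z = {feature: zscores(values) for feature, values in rates.items()}
    n_texts = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Hypothetical rates (per 1000 words) for two texts.
example_rates = {
    "first_person_pronouns": [10, 0],
    "nouns": [0, 10],
}
example_scores = dimension_scores(
    example_rates,
    positive=["first_person_pronouns"],
    negative=["nouns"],
)
print(example_scores)
```

In this toy case the first text scores positively and the second negatively, which is the kind of contrast a label such as ‘involved vs. informational’ is meant to summarize.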
These studies confirm the existence of key dimensions of language variation, such as ‘involved vs. informational’ and ‘narrative vs. non-narrative,’ which are present across various forms of academic writing [11]. Additionally, ‘stance’ and ‘persuasion’ have been identified as significant dimensions of language variation in academic discourse; however, the specific terms used in different studies are not always consistent. For instance, ‘stance’ has been referred to as ‘personal opinion’ [20], ‘expressions of opinions, attitudes, emotions, and mental processes’ [19,21,27], and ‘author-centered stance’ [24]. Similarly, ‘persuasion’ is described as ‘reflective/argumentative discourse’ [20], ‘production of possibility and argumentation’ [19,21,27], ‘overt expression of persuasion’ [16,22], and ‘explicit versus implicit argumentation’ [23].
In addition to academic discourse, MDA has been utilized to analyze language variation in various domains, including fiction [28], internet texts [29,30,31], and television programs [32]. These studies have revealed common dimensions, such as ‘involved vs. informational’, and unique dimensions specific to each domain, such as ‘exposition or discussion vs. simplified interaction’ in television programs [20]. However, despite the growing application of MDA across different registers, there is a need for further studies, particularly in non-English contexts, that investigate the comparative patterns of language variation between academic and non-academic registers, an area that holds promise for enhancing our understanding of the universal language dimensions proposed by Biber [17].
Additionally, entropy, an important quantitative measure used to assess the information content of texts, was first defined by Shannon [33] and has since been widely applied in numerous studies. Entropy allows for the calculation of the information content of a given text sample, and, in its simplest form, information can be parameterized by word or part-of-speech diversity [34]. For example, Bentz et al. [35] studied over 1000 languages and discovered that word entropies exhibit relatively narrow, unimodal distributions, aligning with information-theoretic communication models. Other research has found that entropies are sensitive to language and text categories [36,37]. These findings reveal patterns inherent in entropy values as a linguistic feature, which can indicate group membership and social identity, thereby influencing the creation and maintenance of network communities. Note, however, that entropy reflects unpredictability, not necessarily structural or functional complexity. For instance, a randomly generated text could have high entropy but no communicative value, and in that sense no meaningful complexity. The present study therefore uses real natural language as the corpus, excluding randomly generated meaningless content, and uses entropy calculations to discuss and supplement language functions such as “uncertainty.”
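For reference, Shannon entropy over a set of units (characters or words) with relative frequencies p_i is H = -Σ p_i log2 p_i, measured in bits. The short sketch below illustrates this calculation at the character and word levels. It is an illustration only, not the procedure used in this study (which is described in Section 2); in particular, it assumes that Chinese text has already been segmented into words by some external tool, and the sample strings are invented.

```python
import math
from collections import Counter


def shannon_entropy(units):
    """Return the Shannon entropy (in bits) of a sequence of units."""
    counts = Counter(units)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) is equivalent to -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Character-level entropy: every character is one unit.
sample_text = "学习语言学习文化"
char_entropy = shannon_entropy(sample_text)

# Word-level entropy: assumes the text is already segmented into words.
sample_words = ["学习", "语言", "学习", "文化"]
word_entropy = shannon_entropy(sample_words)

print(char_entropy, word_entropy)
```

A text that repeats a single unit has an entropy of 0 bits, and a text whose units are all equally frequent reaches the maximum, log2 of the number of distinct units.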
To summarize, this study will employ MDA and entropy analysis on a substantial corpus of both academic and entertainment Chinese texts to examine how language functions impact distinct linguistic features and influence broader social network communications in these contexts. The findings will enhance our understanding of how language functions across various registers, deepening our comprehension of the relationship between language variation and social contexts. Through this approach, the research aims to provide new theoretical insights into how language functions influence and are influenced by interactions within academic and entertainment registers.
The primary contributions of this study are outlined as follows:
This paper explores the influence of language functions on linguistic features. It identifies five key dimensions of linguistic variation: narrative versus rational discourse, modification, reference, uncertainty, and prudence.
By analyzing a comprehensive Chinese corpus, this study offers a theoretical perspective for understanding how language functions impact linguistic features and shape different registers.
For the language function of “uncertainty,” this paper provides an explanation and illustration through entropy value calculation, which confirms the findings of the multi-dimensional analysis and reflects the unpredictability of language use in specific registers.
The remainder of this article is organized as follows:
Section 2 presents the research questions, the corpus under analysis, and a thorough explanation of the MDA framework, which includes linguistic features as well as the instruments and procedures utilized, followed by the methods for calculating entropy.
Section 3 provides the outcomes of the multi-dimensional analysis encompassing five key dimensions, along with the entropy calculation results across character, word, and sentence lengths.
Section 4 discusses the results in both academic and entertainment registers.
Finally, Section 5 summarizes the findings and contributions of this study and proposes directions for future work.
4. Discussion
In Sections 3.1 and 3.2, a multi-dimensional analysis of 97 linguistic features revealed five dimensions that summarize language variation across the four registers. The dimensions emerging from this study are (i) narrative versus rational discourse, (ii) modification, (iii) reference, (iv) uncertainty, and (v) prudence. The first dimension, quite similar to the universal dimension ‘narrative versus non-narrative discourse’ proposed by Biber [17], provides preliminary evidence that such a universal dimension of language variation also applies to Chinese written texts, at least when examining the language uses for registers in academic and entertainment contexts. The study found that the two academic registers differed significantly from the two entertainment registers in four out of the five dimensions.
The language of research dissertations and journal papers is largely similar, except on the first and fifth dimensions. More specifically, language in journal papers generally appears to be more rational and prudent than in research dissertations. Given that the vast majority of journal papers are authored by academic scholars while research dissertations are completed by graduate students, this finding seems highly reasonable. Based on this finding, dissertation supervisors and instructors of thesis writing should communicate more effectively with graduate students about approaches to language use in order to raise the level of academese in dissertations. More carefully designed studies in this area are needed to determine whether this is a unique case in Chinese or a more prevalent issue in other languages as well.
Meanwhile, novellas and magazine articles differ in their language usage across each of the five dimensions, although both serve as written registers for entertainment purposes and are clearly distinct from academic registers. This indicates that MDA studies of written Chinese registers for non-academic purposes are very promising and warrant more attention and effort.
The above findings are also consistent with the results of the entropy value calculations, which can measure language uncertainty (a topic considered in multidisciplinary research [46]), especially as an indicator of Dim.4 “Uncertainty,” with its importance second only to Dim.1. In Section 3.3, we calculated three types of entropy values for four Chinese registers across 1000 texts. Our findings indicate that the average entropy values at both the character and word levels for entertainment texts, such as magazine articles and novellas, are significantly higher than those for academic texts, including journal papers and research dissertations. This is fully consistent with the distribution of mean scores of Dim.4 across the registers, showing that entertainment texts have higher uncertainty than academic texts in character and word use.
We were also surprised to find that the distribution of uncertainty shows certain similarities with the distributions of Dim.1 and Dim.2, indicating that entertainment texts place more emphasis on “narrative discourse” and “modification,” which are associated with higher uncertainty in language due to various adjectives, stative verbs, and other components. In contrast, academic texts focus more on “rational discourse” and a lack of “modification,” using more fixed and less varied vocabulary, which in turn reduces the uncertainty in linguistic expression. Additionally, we noted significant differences in entropy values between any two types of texts based on all three measures, with the exception of novellas and magazine articles, which do not show a significant difference in word-level entropy. This indicates that the entertainment register has powerful functions in communication, causing novellas and magazines to exhibit similarities in certain aspects.
The above findings resemble those of previous research. For instance, Chen et al. [47] also explored the entropies of various text types, identifying a similar distribution pattern of entropy for word forms and parts of speech (POS) in both Chinese and English. Their research found that news texts exhibited higher relative entropy of word forms, while academic texts demonstrated lower entropy values, suggesting significant differences in the syntactic structures of narrative versus expository text types. Our study similarly finds that entertainment texts show higher entropy values than academic texts, particularly at the character and word levels, which may reflect the richer linguistic features and expressions characteristic of entertainment texts.
However, concerning sentence length, the average entropy of entertainment texts is significantly lower than that of academic texts, which is similar to the distribution of mean scores of the texts in Dim.3 “reference.” In general, academic texts require a rigorous process of argumentation, often involving extensive citations to ensure the credibility and comprehensiveness of the arguments, which increases the uncertainty of sentence length. On the other hand, entertainment texts focus on capturing the reader’s interest and providing entertainment, using simple, direct language with an emphasis on emotional resonance. To reach a broad audience, entertainment texts typically avoid excessive theoretical citations and complex sentence structures.
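The idea of sentence-length entropy can be illustrated with a brief sketch: a text is split into sentences, each sentence is reduced to its length, and entropy is computed over the distribution of those lengths. The sketch rests on assumptions that are not taken from this study: sentences are delimited by the punctuation marks 。！？ (and their ASCII equivalents), length is counted in characters, and the two sample strings are invented.

```python
import math
import re
from collections import Counter


def sentence_lengths(text):
    """Split on sentence-final punctuation and return lengths in characters."""
    sentences = [s.strip() for s in re.split(r"[。！？!?]+", text)]
    return [len(s) for s in sentences if s]


def length_entropy(lengths):
    """Shannon entropy (in bits) of the distribution of sentence lengths."""
    counts = Counter(lengths)
    total = len(lengths)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Varied sentence lengths: 5, 4, and 3 characters.
varied = "今天天气好。我们去玩。你来吗？"
# Uniform sentence lengths: every sentence has 2 characters.
uniform = "你好。再见。谢谢。"

varied_entropy = length_entropy(sentence_lengths(varied))
uniform_entropy = length_entropy(sentence_lengths(uniform))
print(varied_entropy, uniform_entropy)
```

Under this measure, a text whose sentences all have the same length yields 0 bits, whereas a text that mixes short and long sentences yields a higher value.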
Yu and Jiang [48] focused on the colligational diversity of lexical and grammatical words in Chinese, using entropy values to explore how the collocational behavior of words changes through grammaticalization. In their research, an increase in entropy values indicates greater variation and semantic bleaching associated with grammaticalization. Although our study does not directly address the process of grammaticalization, the observed differences in entropy suggest that entertainment and academic texts diverge in their linguistic structures. Academic texts rely on more fixed and less varied vocabulary and exhibit lower entropy values at the character and word levels, whereas their sentence-length entropy is higher than that of entertainment texts. The lower lexical entropy may reflect the trends they describe, where grammaticalization leads to increased syntactic regularity and constraint, while entertainment texts display more lexical variation.
This study also shows that the three types of entropy values in Chinese texts can serve as linguistic features to distinguish between academic and entertainment registers, as well as their sub-registers. This supports previous studies suggesting that entropy values are influenced by both language and genre, demonstrating strong sensitivity to these distinctions [36,37].
Through MDA of a large-scale corpus, we empirically confirmed certain claims in previous literature regarding the differences between academic and non-academic Chinese. More specifically, about two-thirds of the 97 linguistic features mentioned in previous works as marked features of academic or formal Chinese were found to be meaningful in distinguishing between academic and entertainment Chinese texts. For instance, the 73 linguistic features finally proven to be meaningful as distinguishing features between academic and nonacademic texts in Chinese include ‘specific classical Chinese words’, ‘oral words’, and ‘er-words’, as proposed by Li [40] and Wu [49], as well as indicators of degree of formality, as argued by Feng et al. [39].
Over 30 linguistic features are not included in any of the five dimensions. One possible reason is that the academic texts analyzed were all from the fields of humanities and social sciences, excluding those from STEM fields. Therefore, features like ‘concrete terms of science and technology’ were not found to be meaningful in identifying language variations across the four registers. For most of the other features, the primary reason is simply the lack of sufficient evidence for their significant role in distinguishing between academic and entertainment registers. For example, the frequent use of compound sentences has long been recognized as a characteristic of academic Chinese. However, eight of the ten compound sentence features were actually used more frequently in novellas or magazine articles than in the two academic registers. Such findings highlight the importance of empirical investigation of language use in written Chinese texts, as this study demonstrates.
More specifically, this study revealed significant differences between academic dissertations and journal papers in terms of Dim.1 “Narrative vs. Rational”, Dim.5 “Prudence”, and the three kinds of entropy values. While both dissertations and journal papers exhibit a rational function in Dim.1, the former demonstrate a notably weaker degree of rationality and tend more toward a narrative style. Additionally, dissertations show a stronger prudence function compared to journal papers, and their entropy values at the character, word, and sentence-length levels are also higher.
From the perspective of network implications, this phenomenon could be interpreted as reflecting the influence of social networks on language functions. Register is, on that account, intra-individual functional linguistic variation in a specific social setting [50]. Research papers published in academic journals typically address cutting-edge topics in academic fields and are written by senior scholars who require a more rational style of expression. In contrast, dissertations are written by graduate students, who are less experienced in research and whose content tends to be less innovative, which may lead to a greater reliance on a narrative style. Furthermore, due to the limited duration of academic training, students may display greater prudence and uncertainty [51] in their use of academic vocabulary and sentence structures in writing.
This distinction underscores the authoritative role that senior scholars or teachers occupy within academic networks, where rational discourse serves as a fundamental norm. The proficient use of such discourse by scholars not only reinforces their position within the network but also contributes to the maintenance of hierarchical structures. This hierarchical dynamic can limit students’ roles, positioning them as passive receivers of information rather than active, equal participants in collaborative academic networks. Consequently, it becomes crucial to examine the social function of rational discourse, particularly its role in shaping students’ visibility and influence within these networks.
The ability to effectively engage in rational discourse may enable students to navigate and assert themselves within these networks, potentially enhancing their academic capital and collaborative opportunities. In certain contexts, such as interdisciplinary teams, students’ narrative skills may offer distinct advantages, allowing for more flexible interaction and knowledge exchange across disciplinary boundaries. Therefore, we advocate for the use of social network analysis tools (e.g., Gephi [52]) to map and visualize how discourse differences between teachers and students influence collaboration patterns and overall network dynamics. These approaches would provide a deeper understanding of how discourse shapes power relations and collaboration flows within academic settings, offering valuable insights into the complex social networks that underpin academic collaboration.
5. Conclusions
This study underscores the crucial role of language functions in shaping linguistic features within academic and entertainment registers. By employing multi-dimensional analysis (MDA) and entropy calculations on a large-scale Chinese corpus, we have identified five key language functions that account for a significant portion of the variation in language use across these registers. These functions demonstrate how specific linguistic features co-occur to create functional patterns that affect communication in various contexts.
The findings emphasize the intricate relationship between linguistic features and language functions, offering a novel theoretical framework for understanding how language adapts to different communicative environments. By investigating how these functions shape language in both academic and entertainment contexts, this study provides valuable insights into the evolution of social networks and the dynamics of communication across varied settings.
In conclusion, the integration of multi-dimensional analysis with entropy measures facilitates a deeper understanding of how language functions influence linguistic variation. This research contributes to broader theoretical discussions on language use and communication while offering practical implications for understanding the dynamics of social interaction in different registers.
The limitations of this study highlight several areas for future improvement. First, the study lacks a strong theoretical synthesis between functional linguistics and information theory, which would provide a more comprehensive framework for understanding language functions in social networks. Second, because the data violated the homogeneity-of-variance assumption, a generalized linear model may offer better results and more robust insights in data analysis. Third, this paper lacks actual modeling or data from real-world networks, which limits the ability to directly test the theoretical concepts discussed.
In terms of future work, we aim to refine the analytical framework to enhance its applicability across diverse communicative contexts, thereby expanding its potential to examine the complexities of language functions within social networks. While this study focused primarily on linguistic features, future research could explore actual co-authorship networks to examine whether teachers’ use of rational discourse correlates with higher brokerage centrality or if students’ narrative styles predict bridging roles in less formalized networks. Additionally, it will be crucial to investigate a theoretical synthesis between functional linguistics and information theory to create a more integrated approach to analyzing language in social structures.
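As a rough indication of what such a network analysis might involve, the sketch below computes betweenness centrality for a small undirected graph using Brandes' algorithm. Betweenness is used here as one possible operationalization of brokerage centrality, which is an assumption rather than a measure specified in this study, and the toy co-authorship network and its node names are entirely hypothetical.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    graph: {node: [neighbor, ...]} with every edge listed in both directions.
    """
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in graph}
        n_paths = {v: 0 for v in graph}
        distance = {v: -1 for v in graph}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair is counted twice in an undirected graph.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical co-authorship network: one supervisor linked to three students.
toy_network = {
    "supervisor": ["student_a", "student_b", "student_c"],
    "student_a": ["supervisor"],
    "student_b": ["supervisor"],
    "student_c": ["supervisor"],
}
toy_scores = betweenness_centrality(toy_network)
print(toy_scores)
```

In this toy network every shortest path between two students passes through the supervisor, so the supervisor alone holds a brokerage position. In real co-authorship data, such scores could then be set against the dimension scores of each author's texts.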