1. Introduction
Organizations increasingly communicate through many digital channels, yet official corporate websites remain a distinctive space: they are organization-authored, comparatively stable, and perceived by stakeholders as an authoritative reference point for “who the company is” and “what it stands for” (Argenti, 2017; Oestreicher, 2009). In particular, “About us” and company-description pages condense key identity claims (e.g., mission, vision, values, purpose) and frame them in language that can shape perceived legitimacy, credibility, and expectations among customers, partners, employees, and investors (Falkheimer & Heide, 2023; Hallahan et al., 2007; Morsing & Schultz, 2006). Because these narratives are produced in a controlled setting, they offer a useful lens for studying corporate self-representation as part of strategic communication.
Strategic communication is commonly defined as the purposeful use of communication to achieve organizational goals (Falkheimer & Heide, 2023; Hallahan et al., 2007). In practice, this involves aligning messages across audiences and channels while maintaining coherence between declared values and actions, especially under increasing stakeholder demands for accountability and transparency (Argenti, 2017; Morsing & Schultz, 2006). Prior research shows that corporate websites play a continuing role in corporate communication architectures by hosting corporate narratives and providing access to reporting and stakeholder-oriented content (e.g., CSR, sustainability, and governance materials) (Cerioni, 2021; Chibudike et al., 2021; García-Sánchez et al., 2022). However, much of the empirical work on corporate websites has focused on content disclosure, usability, or thematic content analysis, often examining specific topics or sectors (Baleanu et al., 2011; Jose & Lee, 2007; Ong & Djajadikerta, 2018).
In parallel, sentiment research in business contexts has largely concentrated on external sources such as social media, news media, or reviews, where sentiment is expressed by audiences rather than by organizations themselves (Bhattasali, 2021; Ingole et al., 2024; Tetlock et al., 2008; Uhl, 2014). Kostić and Šarenac (2020) suggest that the sentiments identified in corporate communications are part of a larger digital communication strategy that organizations should carefully construct and manage.
Buechel et al. (2016) adopt an anthropomorphic perspective, viewing organizations as social agents with human-like characteristics. They found that sustainability reports carried predominantly positive emotional tones, whereas business reports were more neutral. This implies that the emotional content of reports depends on report type (voluntary sustainability reporting versus mandated corporate reporting). It also suggests that organizations convey a particular emotional profile through their sustainability reports, one that remains fairly stable over time and thus contributes to their distinctive organizational identity. We propose that the same logic extends to the descriptions on their official websites. This leaves a gap at the intersection of strategic communication and sentiment analysis: we know comparatively little about the affective framing that organizations embed in their own official website narratives, particularly in short “About us” descriptions that stakeholders often encounter early in their information search. Yet corporate narratives can be written in ways that emphasize optimism, trustworthiness, and anticipation of future success or, conversely, acknowledge challenges and risks, which may matter for how stakeholders interpret corporate identity claims and commitments (Argenti, 2017; Kent & Taylor, 2016; Matten & Moon, 2008).
Prior research has also examined corporate “About us” pages as arenas of strategic self-presentation. For example, Park et al. (2016) show that Fortune 500 corporations construct symbolic realities through structured rhetorical elements that convey visions of economic superiority and organizational identity. Their findings suggest that corporate self-descriptions are not neutral informational texts but carefully designed narrative performances that vary by business type.
Sentiment analysis offers a systematic approach for quantifying such affective cues in text, typically by estimating polarity and discrete emotions using validated lexicons and aggregation procedures (Kogan et al., 2009; Liu, 2020; Mohammad & Turney, 2013; Nassirtoussi et al., 2014; Pang & Lee, 2008). While sentiment methods have been widely applied to public discourse and customer feedback, their application to organization-authored website self-presentations remains relatively underexplored, especially using multi-emotion profiles rather than only positive/negative polarity.
Accordingly, this study examines how leading companies describe themselves on their official websites and what sentiment and emotion patterns are embedded in those descriptions. We focus on the top Croatian enterprises according to the 2022 Lider list (1000 Najvećih, 2023) and collect company descriptions from their official websites (October 2023). Croatia is a theoretically and empirically relevant context for two reasons. First, it is a smaller EU market that has received little attention in computational corporate communication research, where evidence is often dominated by large English-language economies. Second, it is a post-transition EU member state operating within consolidated European regulatory and transparency frameworks (including growing sustainability reporting pressures), yet characterized by a smaller market structure and evolving corporate governance practices. This hybrid institutional position allows us to examine whether the affective framing and strategic signifiers identified in larger, highly institutionalized markets emerge similarly in a smaller, structurally different corporate environment. Moreover, Croatian firms communicate with stakeholders in a distinct language and institutional environment, so the setting also tests whether these signifiers appear similarly in a non-English corporate context. By analyzing a Croatian-language corpus, the study extends computational corporate communication research beyond dominant Anglo-American contexts and contributes to the cross-contextual validation of strategic communication patterns.
We use the term strategic communication here in an operational sense, as organization-authored self-representation in a controlled digital setting (the official website). We treat company-description pages as narrative artifacts that typically contain strategic signifiers (mission/vision/values/purpose language and related identity claims) and affective framing (sentiment polarity and discrete emotions embedded in the wording). This understanding aligns with recent conceptualizations of visual strategic communication, which frame corporate self-representation (through symbols, images, and visual identity elements) as deliberate identity performances aimed at reinforcing authenticity and coherence in stakeholder perceptions (Johansen & Gregersen, 2024; Laba, 2024). While our focus is textual rather than visual, website self-descriptions operate within the same strategic logic of curated identity signaling in controlled digital environments. Corporate websites have also been conceptualized as strategic instruments of impression management, where structure, design, and content collectively shape organizational image and online reputation (Briciu et al., 2020). In this view, “About us” sections function not merely as informational pages but as curated identity performances aimed at consolidating legitimacy and stakeholder trust. Importantly, our empirical measures capture textual patterns; interpretations of intent are therefore made cautiously and considered alongside alternative explanations such as standardized corporate language, sector-specific terminology, and compliance-related phrasing.
The study addresses three research questions:
RQ1: What sentiment polarity and discrete emotions dominate the self-descriptions of top Croatian companies on their official websites?
RQ2: Do companies form distinct and recurring sentiment–emotion profiles when clustered on emotion vectors derived from the NRC lexicon?
RQ3: How do these profiles differ in the presence of common strategic signifiers (e.g., mission, vision, values, goals, purpose) in company descriptions?
Our contribution is threefold. First, we provide empirical evidence on organization-authored website self-presentations in a Croatian corporate setting, expanding the geographical and linguistic coverage of computational research on corporate communication. Second, we combine polarity and discrete-emotion sentiment measures (AFINN, Bing, NRC) with a clustering approach to map sentiment profiles that summarize recurring narrative patterns. Third, we propose these profiles as a practical basis for benchmarking and auditing corporate self-descriptions (e.g., identifying optimistic consumer-oriented narratives, issue-addressing transparency, and low-affect technical descriptions), motivating future work that links website tone to stakeholder outcomes and firm characteristics.
The remainder of the paper is organized as follows. The next section describes the data collection process, sample characteristics, and sentiment analysis methodology, including the lexicons and clustering procedures employed. The Results section reports descriptive sentiment patterns and the identified sentiment–emotion profiles. It is followed by a section that discusses the findings in light of strategic communication theory, considers alternative explanations and limitations, and outlines implications for research and practice. The last section concludes the paper and highlights directions for future research.
2. Methods
We began with Lider’s list of the top 1000 companies (1000 Najvećih, 2023) and, in October 2023, visited each company’s official website to extract a usable self-description. The “About us” section was the primary source of the corporate descriptions. Where such a section was unavailable, we used the description on the home page. If no description was available, we moved on to the next company on the list.
The companies in the sample are headquartered in 14 of the 21 Croatian counties and operate across 32 distinct sectors (Lider taxonomy), reflecting substantial heterogeneity in firm characteristics. In practical terms, the largest representation comes from energy and utilities, retail/wholesale and FMCG, banking/insurance and financial services, manufacturing/industrial and logistics, and transport/tourism-related services, alongside smaller numbers of firms in specialized B2B niches (e.g., technology, infrastructure, professional and technical services).
In terms of market orientation, the sample contains a mix of B2C, B2B, and mixed-model firms. B2C-dominant firms are primarily in retail, telecommunications, consumer goods, and tourism/hospitality, while B2B-dominant firms are more common in industrial manufacturing, energy trading/supply chain, infrastructure operators, and specialized distribution. Several firms operate in hybrid models (e.g., utilities, large distributors, and transport/logistics providers) serving both end consumers and organizational clients; therefore, B2B/B2C orientation should be treated as a coarse contextual descriptor rather than a strict classification.
Regarding ownership structure, the sample includes a heterogeneous mix of state-owned/public service and infrastructure entities (e.g., utilities and transport/infrastructure operators), privately owned domestic firms, and subsidiaries of international corporate groups (notably in banking, retail, energy, and fast-moving consumer goods). Because the Lider dataset does not consistently report ownership in a standardized field, ownership is used only descriptively to contextualize narrative patterns.
As shown in Table 1, firms vary widely in economic scale and organizational size. Total income in 2022 is right-skewed: the mean (EUR 77.07 million) exceeds the median (EUR 73.19 million), reflecting the presence of a small number of very large firms. A similar pattern holds for exports and imports, where the medians (EUR 9.49 million and EUR 3.7 million, respectively) are substantially lower than the means, and many firms report zero or very low international trade volumes.
Employment levels also display considerable dispersion, with a median of 254 employees (range: 3–1564), while both the median and the mean net salary exceed the average monthly earnings in Croatia in 2022 (about EUR 1015 at the time). Financial change indicators show mixed dynamics: while the median income change is positive (0.15), both income change and revenue measures exhibit substantial variability, including negative values, reflecting differences in firms’ recent performance trajectories. Overall, the descriptive statistics confirm that the sample captures a highly heterogeneous corporate landscape, characterized by strong skewness in financial variables.
Although some companies have English-language websites, we collected all descriptions in Croatian to ensure uniformity, since English versions may be adapted to different markets and may not fully reflect the Croatian descriptions. We then machine-translated the descriptions into English to obtain direct translations and minimize potential translation bias, and the translations were manually reviewed for accuracy. Because the unit of sentiment analysis in this study is the individual word, the word- and phrase-level accuracy of machine translation was particularly advantageous for preserving meaning at the word level. The data and the analysis are available in the Supplementary files.
We conducted the analysis in the R programming language (v. 4.3.2) and RStudio (2026.01.0+392), using the following R packages to assess the sentiments in company descriptions: tidytext (Silge & Robinson, 2016), tidyverse (Wickham et al., 2019), dplyr (Yarberry & Yarberry, 2021), stringr (Wickham, 2019), tm (Feinerer, 2013), SnowballC (Bouchet-Valat & Bouchet-Valat, 2020), RColorBrewer (Neuwirth & Neuwirth, 2014), syuzhet (Jockers, 2017), ggplot2 (Wickham et al., 2023), wordcloud (Fellows et al., 2018), factoextra (Kassambara & Mundt, 2017), mclust (Scrucca et al., 2023), cluster (Maechler, 2019), clusterCrit (Desgraupes & Desgraupes, 2018), psych (Revelle & Revelle, 2015), and rstatix (Kassambara, 2019). The packages from tidytext through wordcloud support text processing, sentiment scoring, and visualization, while the remaining packages (factoextra through rstatix) support the cluster analysis of the quantitative data derived from the sentiment analysis, together with the associated descriptive statistics and tests, which reveal patterns in communication strategies.
The sentiment analysis begins by extracting sentiment scores from tokenized text using three lexicons: AFINN, Bing, and NRC. The AFINN lexicon assigns each word an integer score from −5 to +5 that indicates the strength and polarity of its sentiment (Nielsen, 2011). The Bing lexicon classifies words as positive or negative. The NRC lexicon (Mohammad & Turney, 2013) supports a richer analysis by associating words with eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) in addition to positive and negative sentiment.
Each word in the company descriptions is scored or categorized based on the selected lexicons. This approach produces sentiment scores (AFINN), binary classifications (Bing), and emotion categorizations (NRC) for each textual unit. Then, sentiment scores and categories are aggregated at the company level to derive an overall sentiment orientation and emotional profile for each company description.
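The word-level scoring and company-level aggregation described above can be sketched as follows. This is a minimal Python illustration of the R workflow, not the study's code: the function names (`tokenize`, `company_profile`), the two tiny demo lexicons, and the sample descriptions are hypothetical, and the demo entries are not copied from the real AFINN or NRC lexicons. Bing scoring works analogously, by counting membership in positive and negative word lists.

```python
import re
from collections import Counter

# Hypothetical miniature lexicons for illustration only; the study applies the
# full AFINN and NRC lexicons in R, not these entries.
AFINN_DEMO = {"success": 2, "good": 3, "risk": -2, "waste": -1}
NRC_DEMO = {
    "success": {"anticipation", "joy", "trust", "positive"},
    "good": {"anticipation", "joy", "surprise", "trust", "positive"},
    "risk": {"anticipation", "fear", "negative"},
    "waste": {"disgust", "negative"},
}


def tokenize(text):
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def company_profile(text):
    """Score each word, then aggregate the hits into one company-level profile."""
    tokens = tokenize(text)
    # AFINN-style: sum of the signed scores of all matched words.
    afinn_total = sum(AFINN_DEMO.get(t, 0) for t in tokens)
    # NRC-style: count how many tokens fall into each emotion/sentiment category.
    nrc_counts = Counter()
    for t in tokens:
        nrc_counts.update(NRC_DEMO.get(t, set()))
    return {"afinn": afinn_total, "nrc": dict(nrc_counts)}


descriptions = {
    "Company A": "Our success rests on good management and good people.",
    "Company B": "We reduce waste and manage environmental risk.",
}
profiles = {name: company_profile(text) for name, text in descriptions.items()}
for name, profile in profiles.items():
    print(name, profile)
```

Words absent from a lexicon simply contribute nothing, which is why lexicon coverage of domain vocabulary matters for the interpretation of the results.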
After plotting companies’ positions on the positive and negative sentiment axes, the analysis proceeds to clustering. The cluster analysis uses the NRC-derived sentiment scores of the company descriptions. To ensure that all sentiment categories contributed equally, these scores were first normalized. The elbow method was used to determine the number of clusters; three clusters were chosen and then identified with the k-means algorithm. Dimensionality reduction was applied to visualize the 10-dimensional sentiment data (the eight NRC emotions plus positive and negative sentiment) in a two-dimensional plot.
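The normalization, elbow, and k-means steps can be sketched in plain Python. This is a hypothetical illustration under stated assumptions: the study does not specify its normalization, so min-max scaling is assumed here, and the six companies, their four emotion columns, the seed, and the number of random starts are invented. Repeating k-means from several random starts and keeping the lowest within-cluster sum of squares (WCSS) mirrors the multiple initializations mentioned in the robustness checks.

```python
import random


def min_max_normalize(rows):
    """Rescale every column to [0, 1] so that no emotion dominates the distances."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans(points, k, n_init=25, max_iter=100, seed=0):
    """Lloyd's k-means with several random starts; returns (wcss, labels) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            labels = assign(points, centroids)
            updated = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Keep the previous centroid if a cluster ends up empty.
                updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
            if updated == centroids:
                break
            centroids = updated
        labels = assign(points, centroids)
        wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[0]:
            best = (wcss, labels)
    return best


# Hypothetical NRC counts (trust, anticipation, fear, sadness) for six companies.
raw = [
    [12, 10, 1, 0], [11, 9, 0, 1],  # affect-rich, optimistic
    [6, 5, 6, 5], [7, 4, 5, 6],     # issue-addressing
    [1, 1, 0, 0], [2, 1, 1, 0],     # low-affect, technical
]
scaled = min_max_normalize(raw)
# Elbow method: inspect how the WCSS falls as k grows and look for the bend.
elbow = {k: round(kmeans(scaled, k)[0], 3) for k in range(1, 5)}
print(elbow)
wcss, labels = kmeans(scaled, 3)
print(labels)
```

On real data the bend is rarely this sharp, which is why the validity indices reported later in the Results are a useful complement to the elbow plot.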
Results are visualized using plots and word clouds that highlight the prevailing sentiments and emotions, the distribution of sentiment polarity, and the most frequent sentiment-bearing words in the company descriptions. These analyses aim to establish how prevalent sentiment is in corporate descriptions and how organizations strategically position themselves through the sentiments in their narratives. The most prominent sentiment-laden words indicate the core themes and values that organizations highlight, while the cluster analysis reveals the overarching types of sentiment used in organizations’ communication strategies by grouping companies with similar sentiment values.
3. Results
The corpus contains 33,343 words in 1442 sentences. Terms such as “Mission,” “Vision,” “Goals,” “Values,” and “Purpose” are standard elements of corporate language and are commonly used in company descriptions to communicate strategic focus and corporate identity (Table 2). Certain terms are used consistently in both the Croatian and the English descriptions. For instance, “Purpose” appears seven times in both versions, indicating that companies’ primary intent is underscored similarly in both languages. The Croatian and English descriptions differ in their emphasis on particular terms: “Goals” appears more frequently in the Croatian descriptions than in the English ones. This difference likely arises because English spreads the concept across several more nuanced terms (goals, objectives, targets, aims), whereas Croatian uses a single word.
Both “Mission” and “Vision” are frequently referenced, with a somewhat higher frequency in the English descriptions than in the Croatian versions, while conveying the same meaning. The consistent use of these terms indicates that many organizations value communicating their strategic aims and identity, and these terms are common focal points in company descriptions.
The term “Values” has a high frequency in both versions, pointing to a strong emphasis on corporate values as part of company identity. However, the data suggest general trends rather than absolute rules: not every company places the same emphasis on these terms, but collectively these terms are important.
The slight increase in the frequency of “Mission,” “Vision,” and “Values” in English descriptions could suggest that the machine translation process might add slight variations or that the terms in English may capture a broader range of related concepts than the Croatian equivalents (Table 2). The data suggest that the strategic concepts these terms represent are preserved across languages, which is important for companies operating in multilingual environments or with a diverse stakeholder base.
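Counts of strategic signifiers such as those in Table 2 can be approximated with whole-word matching. The sketch below is hypothetical: the study does not report its exact matching rules, the pattern and the function name `count_signifiers` are illustrative, and it runs on an invented English sentence. Croatian originals would additionally require handling of inflected word forms.

```python
import re
from collections import Counter

# Whole-word match for each signifier, optionally plural ("values" -> "value").
SIGNIFIERS = re.compile(r"\b(mission|vision|goal|value|purpose)s?\b", re.IGNORECASE)


def count_signifiers(text):
    """Count strategic signifiers in a (translated) company description."""
    return Counter(m.lower() for m in SIGNIFIERS.findall(text))


sample = ("Our mission and vision rest on shared values. "
          "Our goals serve one purpose: value for customers.")
counts = count_signifiers(sample)
print(counts)
```

Note that “value for customers” is counted alongside the strategic use of “values”: plain term counts cannot separate identity claims from generic usage, which is one reason the frequencies indicate trends rather than rules.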
In Table 3, anticipation appears prominently, led by “production” with 130 mentions, suggesting that the texts may be future-oriented or focused on growth and development themes. Some words relate to more than one emotion; for example, “providing” is associated with anticipation, joy, and trust. A reader may therefore perceive any one of these emotions or a combination of them, which highlights the role of context: the same word can carry different connotations depending on its use.
Disgust and fear have relatively low word frequencies, with “waste” and “highest” being the most common words, respectively. This could indicate that negative sentiments are less prominent in the analyzed texts or that the lexicon for these emotions is not heavily triggered.
Joy is represented by positive words such as “providing,” “success,” and “good,” which appear fairly frequently. This suggests a positive tone in which companies may be highlighting their achievements or the benefits they offer.
The high frequency of trust-related terms such as “leading,” “management,” and “system” suggests that organizations strive to establish perceptions of credibility and reliability in their descriptions. Words linked with surprise and sadness are less frequent than those related to trust and anticipation, suggesting that these emotions may be less clearly portrayed or even avoided in corporate descriptions. Surprise and anger are associated with the smallest variety of words, which implies that the descriptions place less weight on these emotions or use them in a narrower range of contexts.
“Leading” and “management,” the most frequent trust-associated words, could also reflect a strategic emphasis on leadership qualities and managerial expertise in company narratives. This pattern is consistent with a strategic use of language in corporate descriptions, aimed at eliciting specific emotional responses from readers or at reflecting corporate values and themes such as innovation, sustainability, and leadership.
The emotion “trust” has the highest relative frequency among all emotions, which implies that the language of the corporate descriptions is mainly aimed at generating or reflecting trust (Figure 1). The second most frequently expressed emotion is “anticipation,” suggesting that the descriptions place substantial emphasis on the future and on forward-looking features. This may indicate a strategic emphasis on growth, expectations, or forthcoming initiatives. It may also reflect an effort to build anticipation that encourages customer engagement and participation in organizations’ initiatives.
The presence of words such as “waste,” which the lexicon links to disgust, could indicate environment-related content, possibly reflecting areas of concern or focus for the companies involved. Similarly, “change” may refer to internal or external changes that companies face, yet the lexicon associates it with fear.
“Surprise,” an emotion that can carry either positive or negative valence, has a lower frequency, which may mean that companies prefer to maintain a tone of stability and predictability in their communications. Nevertheless, the NRC lexicon associates words such as “good,” “present,” “highest,” “organization,” “dynamic,” and “unique” with surprise. However, the NRC categorization is based on general language use and may not align perfectly with corporate communication contexts. In corporate texts, these words are so common that they have lost their “surprising” quality and have become standard vocabulary for projecting competence and positivity; companies use them to convey positive attributes and competitiveness rather than to provoke surprise. This may be interpreted as an effort to balance showcasing original, innovative activities with presenting a sense of stability. Consequently, despite their classification as “surprising” in general English usage, adjectives such as “unique” and “dynamic” serve this purpose.
The results indicate a diversity of emotions, with a distinct inclination toward positive ones such as “trust” and “anticipation.” These emotions are frequently linked to reliability and forward momentum, both desirable qualities in corporate representation. The dominance of positive emotions is strategically significant because it may shape stakeholder perception, signaling that companies seek to be viewed favorably by readers (Figure 1 and Figure 2). Furthermore, the emphasis on “trust” may help establish relationships with stakeholders, as trust is a critical component of commercial partnerships and interactions.
The word cloud (Figure 3) shows that the company descriptions are rich and varied in emotional content, containing a wide range of words linked with a spectrum of emotions. Companies’ use of words such as “excellence,” “success,” “good,” and “improvement” indicates the construction of a positive image, self-promotion, and positive sentiment. Terms such as “vision,” “production,” and “organization,” associated with “anticipation,” indicate the strategic significance of companies’ future objectives and direction.
Words such as “risk,” “waste,” and “cutting,” associated with “disgust” and “fear,” might reflect acknowledgment of challenges, risks, or negative aspects that companies face or address. However, terms such as “sewage,” “military,” and “operation” may reflect sector-specific language that does not convey fear or disgust when read in context.
Although terms linked to negative emotions such as disgust and sadness are present, they appear less frequently than terms linked to positive emotions. This suggests that organizations may be making an effort to achieve balance in their public image.
The bar chart in Figure 4 shows the 25 organizations with the highest net sentiment, computed with the Bing lexicon as the number of positive words minus the number of negative words. These companies have the greatest net positive sentiment scores, indicating that their descriptions contain more positive than negative language.
Although these businesses operate across various industries, Figure 4 may hint at sector-specific patterns in positivity. For example, the hospitality industry is frequently associated with more positive and enthusiastic language, as illustrated by Arena Hospitality Group. Henkel Hrvatska’s language deliberately combines aspirational and achievement-oriented vocabulary that underscores the company’s long history, global presence, and future-focused mindset. Themes of sustainability, digitalization, and corporate responsibility are integrated into its communication, which balances high-level corporate principles with concrete consumer-facing information.
Furthermore, this chart can serve as a reference for analysts and firms to gauge a company’s sentiment expression relative to others in its industry or the market. It might also prompt companies, particularly those with lower sentiment scores, to review and adjust their messaging so that it is consistent with targeted sentiment levels.
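The net score behind Figure 4 is simple arithmetic, and ranking companies by it reproduces the logic of the bar chart. The hypothetical sketch below uses invented word subsets in place of the real Bing positive/negative lists and three invented descriptions; the figure itself keeps the top 25 companies rather than the top 2 used here.

```python
import re

# Hypothetical subsets standing in for the Bing positive/negative word lists.
BING_POSITIVE_DEMO = {"excellence", "success", "good", "improvement"}
BING_NEGATIVE_DEMO = {"risk", "waste"}


def net_sentiment(text):
    """Bing-style net score: count of positive words minus count of negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive = sum(t in BING_POSITIVE_DEMO for t in tokens)
    negative = sum(t in BING_NEGATIVE_DEMO for t in tokens)
    return positive - negative


def top_companies(descriptions, n):
    """Rank companies by net sentiment, highest first, and keep the top n."""
    scores = {name: net_sentiment(text) for name, text in descriptions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


descriptions = {
    "A": "Excellence and success through good improvement.",
    "B": "We manage risk and reduce waste on the road to success.",
    "C": "Good service.",
}
ranking = top_companies(descriptions, 2)
print(ranking)
```

Because the score is an unnormalized difference, longer descriptions can reach higher values simply by containing more words; dividing by the token count is one possible variant if length effects are a concern.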
The highest value on the positive sentiment axis is 66, while the highest value on the negative sentiment axis is 11, further confirming the prevalence of positive sentiment in the sample (Figure 5). Companies are most densely concentrated in the low range of both positive and negative sentiment, with fewer companies at the extremes. This could indicate a tendency for most companies to maintain a balanced approach in their communications. The few outliers correspond to the top companies in Figure 4 and represent distinctive communication strategies with many expressed emotions, primarily positive ones.
Although no linear relationship between positive and negative sentiment is apparent, the Spearman rank correlation (p = 0.026) points to a weak negative monotonic association: companies that use more positive sentiment in their descriptions tend to use somewhat less negative sentiment.
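Spearman’s coefficient is Pearson’s correlation computed on ranks, with tied values sharing their average rank, which is why it detects monotonic rather than strictly linear association. The sketch below is a hypothetical pure-Python illustration on invented counts; the p-value reported above requires a separate significance test (in R, for example, `cor.test(x, y, method = "spearman")`), which is not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


positive = [40, 25, 18, 9, 4]  # hypothetical positive-word counts per company
negative = [1, 3, 2, 5, 7]     # hypothetical negative-word counts per company
print(round(spearman(positive, negative), 3))
```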
The dashed red lines divide the plot into quadrants that categorize companies by sentiment profile (e.g., high positive/low negative, high positive/high negative), which could provide a simple classification scheme for communication strategies. Cluster analysis, however, offers a more refined way to group them. It is applied next to group companies with similar sentiment profiles, allowing us to identify distinct patterns in how businesses express emotions and sentiments. This segmentation helps identify similar communication strategies across diverse industries and corporate types.
To assess the robustness of the clustering solution, we conducted a set of sensitivity checks (reported in the Supplementary files). First, the k-means algorithm was repeated with multiple random initializations, yielding highly consistent cluster assignments across runs. Second, alternative values of k were explored (k = 2–4), with the three-cluster solution providing the most stable and interpretable structure.
As a robustness check, we repeated the clustering using only discrete emotion dimensions (excluding polarity-based sentiment measures). The elbow criterion again suggested a three-cluster solution. To assess the consistency of firm assignments independently of label identity and cluster geometry, we computed the Adjusted Rand Index (ARI) between the full-emotion and reduced-emotion solutions. The resulting ARI of 0.53 indicates that a substantial proportion of firm pairs are grouped together in both solutions and the three-profile structure is largely preserved when using discrete emotions alone, whereas clustering based solely on positive or negative emotions yields substantially lower agreement.
Finally, clustering was repeated using restricted polarity-based subsets (positive-only and negative-only dimensions). In both cases, the solution collapsed to a two-cluster structure, indicating that polarity-related measures alone are insufficient to recover the full three-profile segmentation; however, the resulting partitions still separated broadly more affect-rich/optimistic narratives from more restrained, low-affect descriptions, with boundary cases shifting across specifications. Together, these checks suggest that the identified sentiment–emotion profiles are not driven by a single modeling choice.
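The Adjusted Rand Index used in these checks compares two partitions through pair counts in their contingency table, so it ignores which label each cluster happens to carry: 1 means identical groupings, values near 0 mean chance-level agreement. The sketch below is a hypothetical pure-Python version on toy labels; the study does not state which R implementation it used (the mclust package, for instance, provides `adjustedRandIndex`).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions of the same firms (label names are irrelevant)."""
    n = len(labels_a)
    pair_sum = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    a_sum = sum(comb(c, 2) for c in Counter(labels_a).values())
    b_sum = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = a_sum * b_sum / comb(n, 2)
    maximum = (a_sum + b_sum) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions all singletons
        return 1.0
    return (pair_sum - expected) / (maximum - expected)


full = [0, 0, 1, 1, 2, 2]
relabelled = [2, 2, 0, 0, 1, 1]  # same grouping, different label names
crossed = [0, 1, 0, 1]           # splits every original pair
print(adjusted_rand_index(full, relabelled))
print(adjusted_rand_index([0, 0, 1, 1], crossed))
```

The second call shows that ARI can fall below zero when agreement is worse than chance, so a value such as 0.53 sits well above the chance baseline without indicating identical solutions.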
Cluster validity was assessed using average silhouette width, the Calinski–Harabasz index, and the Davies–Bouldin index for k = 2–6. None of the indices indicated a single, sharply dominant solution. While k = 4 yielded slightly higher silhouette and lower Davies–Bouldin values, the improvement over k = 3 was marginal, and the Calinski–Harabasz index favored a coarser partition (k = 2). We therefore selected k = 3 as a balanced solution that provides interpretable and theoretically meaningful sentiment–emotion profiles while remaining stable across robustness checks. This choice reflects a trade-off between geometric separation and substantive interpretability, which is common in clustering analyses of high-dimensional textual and affective data.
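Of the three validity indices, average silhouette width is the simplest to state: for each firm, s = (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The sketch below is a hypothetical pure-Python illustration assuming Euclidean distance and toy points; in R, `cluster::silhouette` computes the same quantity. It requires at least two clusters.

```python
import math


def average_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0 by convention."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(lab, []).append(math.dist(p, q))
        if own not in by_cluster:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette(pts, [0, 0, 1, 1])  # compact, well-separated groups
bad = average_silhouette(pts, [0, 1, 0, 1])   # groups that mix distant points
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate well-separated clusters and negative values indicate misassigned points; moderate values, as are typical for affective text data, help explain why no index singled out one k.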
The two dimensions obtained through dimensionality reduction account for 33.2 percent and 24.3 percent of the variance, respectively. The clusters are represented in this two-dimensional space, with each point denoting a company colored according to its cluster assignment (Figure 6).
Dimension 1 (33.2% of the variance) appears to capture the expression of positive versus negative sentiment: companies on the right side of the axis exhibit more positive sentiment, whereas those further to the left exhibit more neutral or negative sentiment. Dimension 2 (24.3% of the variance) represents overall emotional intensity, with companies closer to the top of the plot showing higher emotion scores across both positive and negative sentiments.
Kruskal–Wallis tests revealed no statistically significant differences across clusters in total income, revenue, export/import activity, employment, or average wages (all p > 0.10); the complete analysis is available in the Supplementary files. A marginal effect was observed for income change (χ² = 5.62, p = 0.06), which did not reach conventional significance thresholds. The absence of systematic financial or structural differences reinforces the interpretation of the clusters as narrative–affective profiles rather than reflections of firm size or performance.
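The Kruskal–Wallis statistic compares mean ranks across groups: H = 12/(N(N + 1)) · Σ R_j²/n_j − 3(N + 1), where R_j is the rank sum of group j, usually divided by a tie correction. With three clusters, H is referred to a chi-square distribution with k − 1 = 2 degrees of freedom, whose upper-tail probability has the closed form exp(−H/2). The minimal sketch below (hypothetical function names, not the authors' code) implements both; plugging in the reported χ² = 5.62 gives p ≈ 0.060, consistent with the reported p = 0.06.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal–Wallis H with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts: dict[float, int] = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # all values tied: no evidence of difference


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)
```

The closed form holds only for two degrees of freedom; with a different number of clusters, a general chi-square survival function such as `scipy.stats.chi2.sf` would be needed.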
A Monte Carlo chi-square test detected no statistically significant association between sector and cluster membership (p = 0.87). However, the interpretation of sectoral effects is constrained by sparse distributions and limited statistical power. The observed Cramér’s V of 0.526 suggests that sectoral differences may still be substantively meaningful, but they cannot be reliably tested within the present sample. Sentiment–emotion profiles therefore do not appear to be strictly sector-determined, although sectoral communication norms may still play a role that sample sparsity prevents us from assessing conclusively.
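The Monte Carlo test and Cramér’s V can be sketched as follows. Cramér’s V rescales the Pearson statistic as V = sqrt(χ² / (n · (min(r, c) − 1))) for an r × c table, and it is known to be biased upward in small samples with many cells, which is one reason a sizeable V can coexist with a non-significant p. The Monte Carlo p-value below is a permutation version: shuffling the cluster labels keeps both table margins fixed under independence, and p = (hits + 1)/(simulations + 1). The text does not state which software or number of simulations was used, so the defaults here (2,000 draws, fixed seed) and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def contingency_table(xs: list, ys: list) -> list[list[int]]:
    """Cross-tabulate two categorical label lists (rows from xs, columns from ys)."""
    rows = sorted(set(xs), key=str)
    cols = sorted(set(ys), key=str)
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic against independence."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(xs: list, ys: list) -> float:
    table = contingency_table(xs, ys)
    k = min(len(table), len(table[0]))  # needs at least 2 categories on both sides
    return math.sqrt(chi_square(table) / (len(xs) * (k - 1)))


def monte_carlo_p(xs: list, ys: list, n_sim: int = 2000, seed: int = 0) -> float:
    """Permutation p-value; shuffling ys preserves both margins of the table."""
    observed = chi_square(contingency_table(xs, ys))
    rng = random.Random(seed)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if chi_square(contingency_table(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

Here `xs` would hold each company's sector and `ys` its cluster label; the small tolerance guards against floating-point ties between simulated and observed statistics.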
The blue cluster is distinguished by moderate to high scores across most emotions, with higher values for anger, disgust, fear, and sadness than in the other clusters. This cluster includes both B2C and B2B companies from various sectors (such as healthcare, insurance, utilities, manufacturing, automotive, and technology). Many of these companies operate in highly regulated industries and have operations that involve complex processes or potentially sensitive issues.
However, the main characteristic shared by the companies in this cluster is not the industry itself but rather how they communicate about the problematic aspects of their operations or the issues they aim to solve. This suggests a communication strategy that prioritizes transparency about environmental and social challenges, even though it may produce a less favorable sentiment profile than that of companies that refrain from addressing such issues. Their communication style involves acknowledging concerns while emphasizing their obligation to provide solutions or vital services.
Due to the nature of their business, some of these companies must discuss environmental concerns, waste management, or potentially risky products. Many of them also need to communicate with diverse stakeholder groups, from regulators to consumers, which may explain their more nuanced communication approach. Their communications therefore inevitably include terms and notions that sentiment analysis classifies as negative (e.g., waste, pollution, pain, damage). This cluster underscores a shortcoming of basic sentiment analysis: firms that actively work to resolve significant environmental, social, or public health issues may receive higher negative sentiment scores precisely because they discuss these challenges openly.
The result is a more emotionally charged and ambivalent sentiment style, which indicates the importance of maintaining public trust while addressing potentially negative topics. Additionally, most organizations in this cluster operate in strictly regulated industries, which may further explain the need for careful, balanced communication that addresses both positive aspects and potential risks. To acknowledge challenges while maintaining a positive and trustworthy public image, these companies must manage their communications methodically.
The green cluster is mostly composed of firms in food production, retail, and tourism. These sectors lend themselves to enthusiastic communication that emphasizes pleasure, quality, and enjoyment. These enterprises focus on delivering excellent customer experiences and improving lifestyles, which is consistent with their high scores for positive emotions such as anticipation, trust, and joy. Their descriptions also show the highest levels of surprise of all clusters. However, the absolute intensity of these emotions varies considerably among cluster members: some companies adopt a mildly positive tone, while others use a highly enthusiastic or optimistic one. The companies in this cluster have the lowest negative sentiment scores, which implies that they prioritize optimism in their communication strategy. This cluster likely reflects organizations with a generally positive public image that use language intended to engage and inspire their audience.
The presence of a few outliers from industries that might be expected in other clusters (such as pharmacy, insurance, automotive, energy, and furniture) suggests the deliberate adoption of communication tactics more typical of sectors that prioritize customer satisfaction and experience. These companies may be differentiating themselves by emphasizing positive results and customer benefits rather than framing their offerings as solutions to problems. This tactic may reflect recent rebranding initiatives, a focus on the well-being or lifestyle aspects of their products, or a strategy to stand out in competitive marketplaces. Such positioning indicates that while industry norms partly shape communication styles, individual company strategies can lead to unexpected sentiment profiles, possibly reflecting a broader trend toward more positive, consumer-friendly messaging across diverse sectors.
The red cluster comprises descriptions with low emotional intensity and neutral to negative sentiment, predominantly from companies operating in technical, industrial, and B2B sectors. The cluster shows low values across all emotions, with particularly low levels of anticipation, joy, and trust compared with the other clusters. These patterns suggest that the companies favor a communication approach that emphasizes factual information, technical expertise, and operational specifics while limiting emotional appeals and brand storytelling. This strategy is likely a consequence of the products and services they offer, which target specialized industries or other organizations that value professional competence and technical requirements over emotional connection. Nevertheless, this cluster also includes retail enterprises, which implies that even consumer-facing corporations may use a more subdued tone in their corporate communications.
These companies tend toward a neutral, fact-based communication approach, which may also be driven by factors such as regulatory constraints, the importance of professional relationships, or a long corporate history. Although this approach is consistent with the nature of their business and their target audiences, it also leaves considerable room for these organizations to integrate more engaging elements into their communications. By strategically incorporating more positive language, these businesses could distinguish themselves in their respective industries without losing their professional tone.
The cluster analysis of companies’ sentiment profiles identified three distinct groupings, each with its own features and communication tactics. Although industry type is often assumed to determine communication style, company-specific objectives can lead to unexpected sentiment profiles. This suggests that while the regulatory environment, target audience, and public perception shape organizational communication approaches, the resulting profile ultimately reflects each company’s strategic choices.
The data also reveal a substantial limitation of basic sentiment analysis: companies that publicly address problems, such as environmental issues, tend to score higher on negative sentiment. These findings illustrate the importance of context when interpreting sentiment scores in corporate communications, since the scores reflect the intricate interplay between the emotional tone of corporate messages, industry traditions, and company-specific objectives.
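Why solution-oriented texts score as negative can be seen with a toy word-counting scorer. The lexicon below is hypothetical and built only for this illustration; its entries are not copied from NRC, Bing, or AFINN. As with any bag-of-words lexicon method, every matching word is counted regardless of whether the text frames it as a problem being solved.

```python
import re
from collections import Counter

# HYPOTHETICAL toy lexicon for illustration only; not entries from NRC, Bing, or AFINN.
TOY_LEXICON = {
    "waste": "negative", "pollution": "negative", "damage": "negative", "pain": "negative",
    "safe": "positive", "trust": "positive", "quality": "positive", "enjoy": "positive",
}


def lexicon_polarity(text: str, lexicon: dict[str, str] = TOY_LEXICON) -> dict[str, int]:
    """Bag-of-words polarity counts: every lexicon hit counts, regardless of framing or negation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter(lexicon[t] for t in tokens if t in lexicon)
    return {"positive": hits["positive"], "negative": hits["negative"]}


# A solution-oriented sentence still scores net negative under simple word counting.
print(lexicon_polarity("We recycle waste and cut pollution to keep communities safe."))
```

The sentence describes remediation, yet it yields two negative hits and one positive hit, which mirrors the pattern observed for the blue cluster.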
4. Discussion
These findings contribute to the ongoing discourse on website content and strategic communication in the digital era. By combining sentiment analysis of company-authored descriptions with classical strategic communication frameworks, as outlined by Hallahan et al. (2007) and Morsing and Schultz (2006), the research supports the importance of authentic business narratives in a digitally connected society (Kemp et al., 2023). The notion of authenticity invoked here should not be understood in a strictly essentialist sense. Recent work on organizational-level visual identity authenticity argues that dominant approaches often assume a stable organizational essence that can be transmitted through coherent identity elements, whereas alternative perspectives emphasize authenticity as dynamic, aspirational, and co-constructed in interaction with stakeholders (Johansen & Gregersen, 2024). Although that work focuses on visual identity, the underlying debate is equally relevant to textual self-representations on corporate websites. In this view, the affective patterns identified in our analysis may reflect not a direct transmission of an internal essence but rather a strategic and institutionalized performance of identity shaped by stakeholder expectations and communication norms.
Buechel et al. (2016) observed a greater propensity for organizations to convey an emotional profile through their communications, indicative of a broader strategic purpose. Our results are consistent with this observation, particularly for customer-oriented firms.
The study’s emphasis on the emotions expressed in corporate descriptions is consistent with Mohammad and Turney’s (2013) work on the broad spectrum of emotions, and this breadth is particularly evident in the communication of organizations that address sensitive issues. The results showed that “Mission,” “Vision,” “Goals,” “Values,” and “Purpose” are frequent elements of corporate language used to communicate strategic emphasis and corporate identity. This suggests that corporate website content is now more strategically aligned with corporate objectives than early websites were (Pablo & Hardy, 2009), pointing to a shift from using website content merely to disseminate information toward using it to reinforce corporate narratives and engage stakeholders more deeply. This corresponds with the increased emphasis on purpose-driven enterprises, where communicating a clear purpose beyond profit can boost stakeholder involvement and benefit society (Rey et al., 2019). Moreover, a value-driven approach may help improve corporate reputation or influence stakeholders’ expectations about corporate behavior (Gomez-Vasquez, 2023).
The prevalence of these strategic signifiers suggests that a strategic approach plausibly drives the corporate narratives, including the sentiments they express. Sentiment analysis is valuable in corporate communication because it distills corporate narratives into quantifiable data that can inform strategic communication practice. The strategic positioning of top Croatian enterprises, as reflected in the prevalence of the positive sentiments of trust and anticipation in their descriptions, is consistent with recent findings in corporate communication (Argenti, 2017; Smith et al., 2022) and integrated communications management (Varona Silva et al., 2024). As organizations navigate the intricacies of digital transparency, they underscore the importance of trust in their stakeholder relationships. Forward-looking statements, in turn, are associated with anticipation, which is essential for future-oriented company activities and investor relations.
The prevalence of positive sentiment also bears on the broader strategic communication landscape. It suggests that firms are embracing narratives consistent with progressive and optimistic corporate ideals, potentially in response to the increased demand for corporate social responsibility and sustainability (Greenwood & Van Buren, 2017) and to transparency requirements under Directive (EU) 2022/2464, the Corporate Sustainability Reporting Directive (Macuda & Zieniuk, 2024). This observation is consistent with the movement toward more engaging and empathic communication promoted by Kent and Taylor (2016), who underscore the need for firms to establish a more personal connection with their stakeholders.
However, the language of corporate self-descriptions can also arise from institutionalized corporate templates, social desirability pressures, sector-specific terminology, and compliance-oriented disclosure practices. A further implication concerns the growing role of AI in both the consumption and the production of corporate website narratives. First, AI-mediated search and browsing (e.g., conversational agents and summarization features embedded in search engines or browsers) increasingly act as interpretive gatekeepers that condense “About us” content into short answers, key points, or comparative snippets. In such contexts, the strategic and affective cues we measure (mission/vision/value signifiers and the dominance of trust and anticipation language) may become even more influential, because they are likely to be prioritized, paraphrased, and re-presented to stakeholders at scale. Second, generative AI is increasingly used to draft or optimize corporate copy; by 2023, parts of corporate website text may already have been shaped by AI-assisted drafting or editing workflows. This may lead to stylistic homogenization and inflated positivity, potentially weakening authenticity and increasing the risk of “formulaic” reputation signaling. This dual AI effect suggests that sentiment profiles may not only reflect deliberate strategic communication but may also, in the future, emerge as artifacts of AI-assisted content workflows and AI-filtered stakeholder exposure (Kalaivani et al., 2025). Future research could therefore compare human-authored versus AI-assisted corporate descriptions, test whether AI summaries preserve or distort sentiment signals, and examine whether AI-generated optimism is associated with stakeholder trust outcomes or skepticism over time.
The results are also consistent with the anthropomorphic paradigm of Buechel et al. (2016), which views firms as agents capable of emotive expression. This approach is particularly relevant when firms strive to humanize their brand in the digital domain by creating narratives consistent with the values and emotions of their stakeholders (Brown & Fiorella, 2013). Sentiment analysis of corporate descriptions can therefore serve both as a strategic instrument for assessing how well communication tactics meet stakeholder expectations and preferences and as a litmus test for the emotional tone a firm has built.
Furthermore, the findings suggest that firms with higher sentiment scores, which convey a more positive tone, may gain a competitive edge in stakeholder appeal and corporate image. This is in line with the reputation management paradigm, which holds that the sentiment conveyed through corporate communication shapes stakeholder impressions (Seiffert-Brockmann et al., 2021).
The descriptions of Croatian enterprises that convey positive sentiments of trust and anticipation could be interpreted as content strategies designed to engage and convert audiences (Kostić & Šarenac, 2020). The same authors discuss the associated ethical concerns, underscoring the dangers of disinformation and the responsibility of communication professionals to maintain ethical standards. Our results suggest that authentic, positive sentiment may function as a signal of ethical positioning, particularly in firms whose narratives explicitly address challenges and risks, as observed in the blue cluster.
Moreover, the companies in the blue and red clusters may be using their websites to keep stakeholders informed, fostering a relationship built on transparency and dialog (Cerioni, 2021), which may explain their more careful use of positive sentiment and the presence of negative sentiment in their descriptions. This transparency allows for open communication about the issues a company aims to solve or the problematic aspects of its operations. Since these companies are among Croatia’s 100 most successful, it is plausible that their stakeholders recognize the value conveyed through such strategic communication.
Thus, applying sentiment analysis to corporate descriptions enriches scholarly discourse and provides practical insights for strategic communication practitioners. It can help organizations align their narratives with stakeholders’ emotional expectations, thereby strengthening business reputation and promoting stakeholder engagement in a dynamic digital context.
Importantly, labeling these narratives as strategic does not imply uniform intentional design of emotional cues but reflects the fact that mission-, vision-, and value-oriented texts are institutionally recognized instruments of strategic communication within which affective framing routinely occurs.
Regarding future research, this study provides a baseline for comparisons with other companies and for further exploration of how tangible outcomes, such as customer loyalty, investor confidence, and market performance, relate to the sentiments expressed in corporate descriptions. Such empirical evidence would help substantiate the strategic application of sentiment analysis in corporate communication and strategic planning (Chen et al., 2023). Furthermore, future studies should conduct cross-cultural and cross-linguistic analyses to better understand how sentiments in corporate narratives shift across markets and stakeholder groups.
To translate the analytical contribution of this study into a form that is actionable for both researchers and practitioners, we outline a repeatable analytical pipeline that operationalizes sentiment–emotion profiling for auditing and benchmarking corporate website narratives:
Scoping and inventory: define target pages/sections (About, Mission/Vision/Values, ESG/IR pages), languages, and cadence; capture a pre-change baseline.
Data capture and versioning: crawl target URLs, store HTML + rendered text with timestamps and hashes; archive diffs to track copy changes.
Preprocessing: detect language, normalize encoding, remove boilerplate, sentence-segment, and, when needed, machine-translate to a single analysis language while retaining originals for back-checks.
Analytics: combine lexicon sentiment (AFINN/Bing/NRC) with transformer-based classifiers; extract topics/keywords (e.g., ESG, governance, innovation), readability, and “jargon density”; compute Trust, Anticipation, Transparency indices (normalized emotion scores), a Consistency Score across languages/channels, and Volatility (week-over-week sentiment change).
Profiling and benchmarking: cluster firms (k-means) into narrative profiles, position each firm against sector/region percentiles, and flag outliers (e.g., high positivity + low transparency).
Governance linkage: tag copy against Directive (EU) 2022/2464 or ESG themes; surface “issue-addressing” passages to support risk communication and reduce greenwashing risk.
Action loop: deliver a dashboard with alerts and editorial guidance; re-run after edits to quantify lift.
Validation and ethics: human review of samples, inter-rater checks, event-style backtests for listed firms; log data provenance for reproducibility.
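The data capture and versioning step (timestamps, hashes, archived diffs) can be sketched with Python’s standard library alone. The sketch assumes the rendered text has already been fetched and stripped of boilerplate (crawling is not shown). The `Snapshot` record and the function names are hypothetical, and SHA-256 is our choice of hash rather than one specified in the pipeline.

```python
import difflib
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One archived capture of a page's rendered text (hypothetical record type)."""
    url: str
    captured_at: str  # ISO-8601 timestamp, UTC
    text: str
    sha256: str


def take_snapshot(url: str, text: str, when: Optional[datetime] = None) -> Snapshot:
    if when is None:
        when = datetime.now(timezone.utc)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return Snapshot(url, when.isoformat(), text, digest)


def has_changed(prev: Snapshot, curr: Snapshot) -> bool:
    """Cheap change detection: compare content hashes instead of full texts."""
    return prev.sha256 != curr.sha256


def copy_diff(prev: Snapshot, curr: Snapshot) -> list[str]:
    """Line-level unified diff between two captures; empty if the text is unchanged."""
    if not has_changed(prev, curr):
        return []
    return list(difflib.unified_diff(
        prev.text.splitlines(), curr.text.splitlines(),
        fromfile=f"{prev.url} @ {prev.captured_at}",
        tofile=f"{curr.url} @ {curr.captured_at}",
        lineterm="",
    ))
```

Storing only the hash per crawl keeps routine checks cheap; the full diff is computed only when a hash changes, which is also the point at which sentiment scores would be re-run.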
Methodologically, the proposed pipeline translates our profiling framework into a reproducible workflow suitable for longitudinal and cross-national research designs, allowing practitioners and scholars to trace the evolution of companies’ own narratives under varying regulatory and market conditions. Conceptually, it specifies measurable constructs (e.g., Trust, Anticipation, Transparency, Consistency, Volatility) that future studies can validate against independent indicators of governance quality, stakeholder outcomes, and firm performance, thereby motivating the concluding implications.
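The pipeline names these constructs but does not give formulas for them. One possible operationalization, offered purely as an assumption, treats an emotion index (e.g., Trust or Anticipation) as the share of tokens associated with that emotion, Volatility as the mean absolute week-over-week change in a sentiment score, and a Consistency Score as one minus the mean absolute gap between two index profiles (e.g., two language versions of the same page). Transparency is omitted because the text does not indicate how it would be scored; all function names are hypothetical.

```python
from statistics import mean


def emotion_index(counts: dict[str, int], n_tokens: int, emotion: str) -> float:
    """Hypothetical Trust/Anticipation-style index: share of tokens tagged with `emotion`."""
    return counts.get(emotion, 0) / n_tokens if n_tokens else 0.0


def volatility(weekly_scores: list[float]) -> float:
    """Mean absolute week-over-week change in a sentiment score."""
    changes = [abs(b - a) for a, b in zip(weekly_scores, weekly_scores[1:])]
    return mean(changes) if changes else 0.0


def consistency(profile_a: dict[str, float], profile_b: dict[str, float]) -> float:
    """1 minus the mean absolute gap between two index profiles.

    Equals 1.0 for identical profiles and stays within [0, 1] when all indices lie in [0, 1].
    """
    keys = sorted(set(profile_a) | set(profile_b))
    if not keys:
        return 1.0
    return 1.0 - mean(abs(profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) for k in keys)
```

Normalizing by token count keeps the emotion indices comparable between long and short descriptions, which matters when benchmarking firms whose “About us” pages differ greatly in length.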