Article

Mapping the Digital Media Landscape in Bulgaria: Analysis of Web Publications

by
Plamen Hristov Milev
Department of Information Technologies and Communications, University of National and World Economy, 1700 Sofia, Bulgaria
Digital 2025, 5(2), 15; https://doi.org/10.3390/digital5020015
Submission received: 3 April 2025 / Revised: 11 May 2025 / Accepted: 13 May 2025 / Published: 15 May 2025
(This article belongs to the Special Issue Advances in Semantic Multimedia and Personalized Digital Content)

Abstract

This study explores the thematic structure and editorial focus of the digital media landscape in Bulgaria by analyzing one year of online news publications from eight major media outlets. The data were collected through a custom-built web scraping application developed in Java, which enabled the automated extraction and processing of full-text articles from publicly accessible news websites. The structured dataset, generated during the scraping process, records word-level occurrences in both article titles and bodies, along with publication dates and URLs. By applying lexical frequency analysis and temporal tracking, this study identifies the most frequently used words and platform-specific usage patterns. The findings reveal clear distinctions in editorial focus between public broadcasters, private national media, and international outlets. Additionally, the analysis highlights how title construction and word prominence vary depending on platform type and media strategy. This study demonstrates the potential of web scraping and computational text analysis as scalable tools for investigating media systems in small and transitional democracies.

1. Introduction

The exponential growth of digital content and the transformation of traditional media into dynamic, multi-platform ecosystems have fundamentally reshaped the way societies produce, distribute, and consume information. In this rapidly evolving context, understanding the thematic structure and editorial dynamics of digital media has become crucial for researchers, policymakers, and media professionals alike. How media platforms construct thematic focus and set editorial priorities has long been a central concern in communication research. Foundational theories such as agenda-setting and framing have highlighted the influence of media in shaping public perception and political discourse [1]. These concepts remain relevant in the digital era, where editorial choices are increasingly mediated by algorithms, audience metrics, and content dynamics. The comparative model of media systems developed by Hallin and Mancini provides a useful lens for comparing editorial strategies, particularly in countries like Bulgaria that exhibit features of the polarized pluralist model [2]. In the Bulgarian context, the digital media landscape has undergone complex transformations in recent years, particularly regarding pluralism and access to information [3]. As news increasingly migrates to online environments, questions emerge about how media platforms prioritize topics, frame narratives, and balance domestic with international coverage. Over the past decade, scholars have proposed various approaches to studying digital news production, ranging from qualitative discourse analysis to large-scale computational techniques. Recent advancements in computational methods for text classification and sentiment analysis, particularly Bayesian- and Gaussian-based approaches, have further enhanced the analytical capabilities available to digital media research. 
Bayesian classifiers, including Naive Bayes models, have been widely adopted due to their robustness, simplicity, and effectiveness in managing uncertainty within textual data, significantly improving accuracy in various classification tasks [4]. Moreover, Gaussian techniques offer promising alternatives for term weighting and sentiment analysis, providing novel ways to represent textual data more precisely compared to traditional frequency-based metrics [5,6]. Recent advances in automated narrative framing analysis, such as the identification of conflicts, heroes, and villains in news stories, further demonstrate the potential of computational methods to uncover deeper editorial structures [7]. While qualitative methods offer depth, they often struggle with scalability and temporal coverage. In contrast, data-driven approaches such as natural language processing (NLP) and web scraping enable the automated analysis of large volumes of unstructured media data, offering new opportunities for real-time and longitudinal media research [8,9,10]. Recent studies emphasize the growing relevance of AI-assisted web scraping for digital media analysis, particularly when combined with natural language inference and sentiment classification frameworks [8,11]. Scientific repositories, enriched through web scraping, can substantially improve literature searches and bibliometric analyses [12,13]. However, despite the increasing availability of computational tools, much of the existing research has focused on dominant international outlets or English-language media, often overlooking smaller national media systems or emerging democracies. Moreover, questions remain about how different types of media—public, private, international—diverge in their editorial agendas and how they respond to political, social, and geopolitical developments. 
Some studies suggest convergence toward a common global news agenda, while others highlight persistent fragmentation and national editorial priorities [14,15]. This study seeks to contribute to this ongoing debate by analyzing the content of eight major Bulgarian media outlets—encompassing public broadcasters, private national platforms, and internationally affiliated sources—using a combination of web scraping and NLP techniques. The aim is to uncover thematic trends, identify differences in editorial focus, and better understand how digital media in a smaller European country navigates both local and global information flows. The research highlights clear distinctions in how different platforms address topics such as national governance, elections, international conflicts, and geopolitical alliances. By focusing on word frequency and title emphasis, this study offers a granular look into the structure of digital media content over the course of one year (April 2024–March 2025). Based on the described objectives, this study is guided by the following research questions:
  • What are the most frequently used words in online news publications across Bulgaria’s leading digital media platforms?
  • How do patterns in lexical frequency reflect editorial focus across public, private, and international media?
  • What do these patterns suggest about framing strategies, media pluralism, and editorial convergence or divergence within Bulgaria’s digital media ecosystem?
These findings underscore the importance of media intelligence approaches in studying news ecosystems and offer empirical insights that are relevant not only to media studies but also to political science, communication, and digital sociology.

2. Materials and Methods

To collect and process the dataset used in this study, a custom web scraping Java desktop application was developed. Although alternative approaches such as Gaussian-based kernels [6] and Bayesian text classifiers [4,16] are available and have demonstrated strong performance in classification tasks, the current study employs a Java-based custom web scraping and lexical frequency analysis approach due to its effectiveness for large-scale structured data extraction and its alignment with the specific research objectives of this study, namely capturing comprehensive lexical dynamics across multiple news platforms. Our methodological approach aligns with recent computational frameworks that utilize narrative mapping techniques to systematically identify and analyze media framing patterns [17]. The system combines JavaFX for user interface management and embedded browser functionality with jsoup for robust HTML parsing and content extraction. Web scraping has proven highly effective in various fields, including analyzing consumer sentiment in tourism [15], gathering structured data for film rating analytics [18], decentralizing and verifying news sources using blockchain technology [19], and extracting key phrases from web content [20]. The application targets the following eight major Bulgarian media websites: Bulgaria ON AIR, Bloomberg TV Bulgaria, BNR, BNT, bTV, DW (Bulgarian service), Eurocom, and Nova TV, which were selected for their broad audience reach and continuous digital publishing activity. The application employs the JavaFX components WebView and WebEngine to simulate a lightweight browser capable of loading both static and dynamically generated web content [21]. The internal logic and interaction between the system’s components are illustrated in Figure 1.
On launch, the application creates a stage and displays a scene containing a WebView, which loads the specified media URL. The associated WebEngine renders the page and, where needed, runs JavaScript to ensure that dynamically generated elements are fully accessible. Once the webpage is fully loaded, its HTML content is extracted. This raw HTML is then passed to the jsoup parser, which systematically analyzes and extracts key article components, such as the title, publication date, main body text, and page metadata [22]. A key characteristic of the system is that it performs a full-text lexical analysis of every article at runtime. Rather than relying on a predefined set of keywords, the application extracts all words found in the article title and body. For each word encountered, the system records whether it appears in the title, in the text, or in both. In addition, the structured output includes the publication date and the URL of each article. No full-text articles were stored after processing. Once the analytical extraction was complete, only the structured dataset, containing word occurrences and associated metadata, was retained. This design ensures efficient data handling and minimizes long-term storage requirements, while enabling the generation of statistically sound lexical datasets. This data model also ensures compliance with web usage guidelines. Duplicate detection was handled using unique text hashes, and only canonical articles were retained. In cases where parsing failed due to layout inconsistencies, error logs were maintained and the problematic entries were excluded from analysis. The final dataset includes only verified entries with valid URLs, ensuring consistent quality. Ethical compliance was ensured by scraping only publicly available content without bypassing paywalls or authentication. All targeted platforms were monitored to ensure adherence to their terms of service. 
Additionally, full article texts were not retained after analysis, limiting potential copyright or privacy concerns. The resulting structured dataset served as the foundation for determining the most frequently occurring words over the one-year period (April 2024–March 2025) and supported further analyses on temporal trends and editorial focus across media platforms. This approach aligns with contemporary practices in computational content analysis, where full-text data are used to extract structured insights at scale [23]. The selection of JavaFX and jsoup for developing the custom web scraping application was guided by several key considerations. JavaFX provides a robust graphical user interface and efficient integration with Java-based backend components, allowing the seamless creation of desktop applications that support user-friendly operation and real-time monitoring of the scraping process. jsoup, in turn, was chosen for its ease of use, effective parsing of HTML content, and reliable extraction, including from dynamically loaded content once the WebEngine has rendered it. Alternative tools such as the Python-based Scrapy and Beautiful Soup were initially considered [24,25]. However, Python solutions, while widely adopted and powerful, generally require additional components to manage desktop integration and user interfaces effectively. Furthermore, Scrapy, despite its powerful automation features, often requires more complex initial configuration and is less straightforward to embed in a desktop application. Consequently, the chosen Java-based approach provided an optimal balance between usability, integration, maintainability, and performance tailored to the specific research needs of systematically analyzing multiple online media platforms. To align with recent methodological best practices, we also considered the potential for temporal bias introduced by scraping cycles [26]. 
While this study did not aim to perform sentiment analysis, future extensions could incorporate frameworks in which scraping and classification are integrated [27].
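The word-level recording and duplicate detection described above can be illustrated with a minimal Java sketch. It is not the application's actual code: the class and method names are hypothetical, the tokenization rule (splitting on anything that is not a letter or digit) is an assumption, and SHA-256 is assumed as the hash function, since the text only mentions "unique text hashes". The title and body strings are assumed to have already been extracted from the rendered HTML.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: per-word title/body flags plus a text hash for duplicate detection. */
final class LexicalRecordSketch {

    /** Where a word was seen within one article. */
    enum Location { TITLE, BODY, BOTH }

    /** Splits text into distinct lowercase tokens of letters and digits (any script, including Cyrillic). */
    static Set<String> tokenize(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Records, for every word of one article, whether it occurs in the title, the body, or both. */
    static Map<String, Location> wordLocations(String title, String body) {
        Set<String> titleWords = tokenize(title);
        Set<String> bodyWords = tokenize(body);
        Map<String, Location> result = new TreeMap<>();
        for (String word : titleWords) {
            result.put(word, bodyWords.contains(word) ? Location.BOTH : Location.TITLE);
        }
        for (String word : bodyWords) {
            result.putIfAbsent(word, Location.BODY);
        }
        return result;
    }

    /** SHA-256 hex digest of the article text (assumed hash choice), usable as a duplicate key. */
    static String textHash(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String title = "Elections in Bulgaria";
        String body = "Bulgaria holds elections for parliament.";
        System.out.println(wordLocations(title, body));

        Set<String> seenHashes = new HashSet<>();
        System.out.println("new article: " + seenHashes.add(textHash(body)));
        System.out.println("new article: " + seenHashes.add(textHash(body))); // same text again
    }
}
```

Keeping only the word-to-location map, the publication date, and the URL for each article would match the storage model described in the text, in which the full article text is discarded after processing.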
The selected media platforms reflect a diversity of ownership structures, editorial profiles, and audience orientations. Bulgaria ON AIR is a national media outlet combining television broadcasting with an active online presence, covering both domestic affairs and international developments [28]. Bloomberg TV Bulgaria is a business-focused platform affiliated with the global Bloomberg network, providing financial news, economic commentary, and global market insight [29]. BNR (Bulgarian National Radio) is Bulgaria’s public radio broadcaster, offering comprehensive coverage of political, economic, and cultural developments, with an emphasis on public service content [30]. BNT (Bulgarian National Television) serves as the country’s national public television channel, delivering wide-ranging domestic news and institutional reporting [31]. bTV is a leading private television network in Bulgaria, offering mainstream news coverage with a blend of local and international focus [32]. DW (Deutsche Welle, Bulgarian edition) is the Bulgarian-language service of Germany’s international broadcaster, with a strong emphasis on foreign policy, European affairs, and global analysis [33]. Eurocom is a private Bulgarian television channel with an independent editorial stance, known for its political commentary and attention to domestic issues [34]. Nova TV is a major commercial broadcaster in Bulgaria, offering dynamic coverage of current events, with a mix of national reporting and coverage of international developments [35]. These sources were selected to ensure a representative overview of Bulgaria’s digital media environment, encompassing both public and private, national and international perspectives.
One notable methodological limitation of this study stems from the decision not to store the full-text articles after the initial processing. While this choice effectively addresses potential privacy and copyright concerns, it restricts deeper semantic or sentiment analyses, which would have required access to the complete original texts. To mitigate this constraint, the scraping tool was designed to store extensive metadata, including word-level occurrences in titles and texts, timestamps, and article URLs, thus providing a rich structured dataset. This approach ensures sufficient detail for robust lexical frequency analysis and temporal tracking while respecting ethical and legal considerations. However, future research aiming for more nuanced textual or sentiment analyses would benefit from storing complete or partially anonymized text data, provided that appropriate ethical and copyright clearances can be ensured. Another limitation of the analysis is the exclusion of multimedia content such as videos, images, or embedded social media posts, which increasingly constitute a substantial part of digital journalism and editorial framing. Although the sample covers eight prominent media platforms, it may not represent the entire Bulgarian digital media landscape, potentially omitting smaller, niche, or partisan outlets. Future work incorporating topic modeling, discourse analysis, and multimodal content could provide a more comprehensive picture.

3. Results

The selection of media sources included in this analysis was based on their significant influence, audience reach, and active digital presence within the Bulgarian media landscape. Specifically, the selected platforms are prominent media outlets with television and/or radio coverage, representing a mixture of public and private media, as well as local and international news providers. This approach ensures a diverse and comprehensive overview of online media content in Bulgaria. The analyzed dataset covers a period of one year, specifically from April 2024 to March 2025. This timeframe was chosen to provide an up-to-date and representative snapshot of contemporary media coverage, capturing seasonal trends, significant national and international events, and editorial dynamics across the examined platforms. The data collected from these sources are presented in the following tables and are graphically illustrated in corresponding figures to facilitate clear and insightful analysis.
Table 1 presents the total number of publications for each of the analyzed media platforms within the period from April 2024 to March 2025.
The data clearly illustrate considerable variations in the quantity of published content across the examined media platforms. The Bulgarian National Radio (BNR) recorded the highest total number of publications, amounting to 52,692, followed by Nova (44,867 publications), and the Bulgarian National Television (BNT) with 41,520 publications. These three media outlets evidently play leading roles in terms of news coverage and information dissemination within the country. Platforms such as Bulgaria ON AIR (34,213 publications) and Eurocom (32,358 publications) also demonstrate considerable activity, highlighting their significance within the Bulgarian digital media landscape. Other platforms like bTV (25,911 publications) and Bloomberg (13,842 publications) show moderate content activity. In stark contrast, Deutsche Welle (DW) published significantly fewer articles, only 2719. This disparity is likely due to DW’s specific editorial strategy focusing on selected international and socially significant topics, as opposed to local Bulgarian platforms, which tend to provide more frequent daily news coverage. These initial findings offer an essential starting point for subsequent analyses, helping to identify the dominant information sources in Bulgaria’s digital media ecosystem and enabling the formulation of hypotheses regarding editorial policies and target audiences of the various media platforms.
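To put these counts in proportion, the following Java sketch sums the eight reported totals and derives each outlet's share of the overall corpus. The counts are those quoted in the text; the overall total and the percentage shares are derived here for illustration and are not figures reported in the study. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch: each outlet's share of all publications reported in the text. */
final class PublicationShareSketch {

    /** Publication counts as quoted in the text (April 2024 to March 2025). */
    static Map<String, Integer> reportedCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("BNR", 52_692);
        counts.put("Nova", 44_867);
        counts.put("BNT", 41_520);
        counts.put("Bulgaria ON AIR", 34_213);
        counts.put("Eurocom", 32_358);
        counts.put("bTV", 25_911);
        counts.put("Bloomberg", 13_842);
        counts.put("DW", 2_719);
        return counts;
    }

    /** Sum of all publication counts. */
    static int sum(Map<String, Integer> counts) {
        int total = 0;
        for (int value : counts.values()) {
            total += value;
        }
        return total;
    }

    /** Share of one outlet in the whole corpus, in percent. */
    static double sharePercent(int count, int total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = reportedCounts();
        int total = sum(counts);
        System.out.println("Total publications: " + total);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.printf(Locale.ROOT, "%-16s %6d  %5.1f%%%n",
                    entry.getKey(), entry.getValue(), sharePercent(entry.getValue(), total));
        }
    }
}
```

Under these figures the eight outlets together account for 248,122 publications, so the later keyword counts are best read against very different output volumes per outlet.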
Table 2 presents the 20 most frequently occurring words in publication titles, along with their occurrences in the main body of the publication texts, during the analyzed period (April 2024–March 2025).
Figure 2 provides a graphical representation of the data shown in Table 2 in the context of the top 20 most frequent words in publication titles.
The analysis of the most frequently used words in publication titles highlights key thematic focuses of the studied media platforms. The prominence of geographic references such as “Bulgaria” (9008 occurrences) and “Sofia” (7776 occurrences) underlines a strong emphasis on domestic news and local events. Frequent mentions of countries and geopolitical terms including “Ukraine”, “USA”, “Russia”, “Israel”, “Germany”, “Greece”, “China”, “Turkey”, “France”, “Iran”, and “Romania” reflect significant editorial attention to international affairs, geopolitical tensions, and diplomatic relations. The presence of “Trump” (6374 occurrences) and thematic words such as “Elections” and “Government” indicates an ongoing focus on politics and governance issues. The prominence of “Paris” (1662 occurrences) is likely driven by heightened media attention around the 2024 Summer Olympic Games hosted in the French capital.
Figure 3 provides a graphical representation of the data shown in Table 2 in the context of the occurrences of the top 20 title words in publication texts.
The frequency of these title words in publication texts further validates their thematic significance within media content. Words such as “Bulgaria” (55,816 occurrences) and “Sofia” (30,777 occurrences) maintain their prominence, underscoring the continuous coverage of domestic topics throughout the articles. Terms like “Government” (26,030), “Elections” (24,628), and “USA” (29,274) highlight in-depth discussions related to political and governance issues, both domestically and internationally. The extensive mentions of geopolitical entities such as “Ukraine”, “Russia”, and “Europe” demonstrate consistent editorial interest in global affairs. Lower but significant frequencies of terms such as “Paris” (7104 occurrences) further reflect sustained interest related to internationally relevant events, notably the Paris 2024 Olympics, generating deeper textual coverage beyond headlines. It is noteworthy that the ranking of words based on their frequency in publication titles does not completely align with their frequency within the publication texts. Such discrepancies suggest differences in editorial strategies between selecting attention-grabbing headlines and delivering detailed content within the articles. For instance, certain terms like “Trump”, “NATO”, and “Israel” have relatively high prominence in titles compared to their frequency in article texts, likely due to their newsworthiness and capacity to attract readers’ attention. Conversely, words such as “Government”, “Europe”, and “Bulgaria”, which have higher occurrences within texts, indicate deeper thematic coverage beyond mere headlines. This observation underscores the distinct roles played by titles (aiming to capture reader interest quickly) and article bodies (intended to provide comprehensive and detailed information).
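The gap between title and text prominence can be made concrete with a simple ratio. The Java sketch below uses only the three words for which the text quotes both a title count and a text count ("Bulgaria", "Sofia", "Paris") and ranks them by text occurrences per title occurrence. The ratio is an illustrative measure assumed here, not one defined in the study, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch: text-to-title occurrence ratio for words quoted in the text. */
final class TitleTextRatioSketch {

    /** One word with its occurrence counts in titles and in article texts. */
    static final class WordCount {
        final String word;
        final int inTitles;
        final int inTexts;

        WordCount(String word, int inTitles, int inTexts) {
            this.word = word;
            this.inTitles = inTitles;
            this.inTexts = inTexts;
        }

        /** How many text occurrences there are per title occurrence. */
        double textToTitleRatio() {
            return (double) inTexts / inTitles;
        }
    }

    /** Counts quoted in the text for words with both figures available. */
    static List<WordCount> reported() {
        List<WordCount> words = new ArrayList<>();
        words.add(new WordCount("Bulgaria", 9008, 55_816));
        words.add(new WordCount("Sofia", 7776, 30_777));
        words.add(new WordCount("Paris", 1662, 7104));
        return words;
    }

    /** Returns a copy sorted from the most text-heavy word to the most title-heavy one. */
    static List<WordCount> sortedByRatio(List<WordCount> words) {
        List<WordCount> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingDouble(WordCount::textToTitleRatio).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        for (WordCount w : sortedByRatio(reported())) {
            System.out.printf(Locale.ROOT, "%-10s titles=%5d texts=%6d ratio=%.2f%n",
                    w.word, w.inTitles, w.inTexts, w.textToTitleRatio());
        }
    }
}
```

A higher ratio marks a word that is used more in article bodies relative to headlines, while a lower ratio marks a word that is comparatively headline-driven.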
To achieve a deeper understanding of editorial priorities and thematic preferences among individual media platforms, we conducted a detailed analysis of how frequently the top 20 identified keywords appear in each platform’s publications. This analysis reveals distinct editorial strategies, varying focuses on domestic and international topics, and potential audience targeting differences across media outlets.
Table 3 presents the distribution of the top 20 words in publications of Bulgaria ON AIR from April 2024 to March 2025.
The keyword distribution in the content of Bulgaria ON AIR indicates a strong editorial interest in both domestic and international themes, with a clear balance between local relevance and global reporting. The dominant presence of “Bulgaria” (7866) and “Sofia” (4026) confirms consistent attention to national and urban issues, likely reflecting daily political, social, and infrastructural developments. Additionally, “Government” (2760) and “Elections” (2790) highlight the channel’s sustained focus on political processes and institutional coverage. Among international topics, terms like “USA” (2896), “Ukraine” (2666), “Russia” (2290), and “Europe” (2089) appear frequently, reflecting global political tensions and regional diplomacy, likely driven by ongoing conflicts and EU affairs. This suggests a relatively even distribution of coverage between Western and Eastern geopolitical interests. Notably, words associated with Middle Eastern conflicts—such as “Israel”, “Gaza”, “Iran”, and “Turkey”—also maintain a significant presence, pointing to a comprehensive international news agenda. Coverage of “Paris” (670) and “France” (851) may be related to high-profile global events like the 2024 Olympics or EU developments. In summary, Bulgaria ON AIR demonstrates a content strategy that intertwines domestic political reporting with timely international developments, positioning itself as a news source with both local grounding and global awareness.
Table 4 presents the distribution of the top 20 words in publications of Bloomberg from April 2024 to March 2025.
The data for Bloomberg TV Bulgaria reaffirm the outlet’s international and policy-oriented focus, with minimal emphasis on local content. Terms like “USA” (7320), “Gaza” (3891), “Trump” (3742), “Europe” (3813), and “Government” (3313) are among the most prominent, reflecting deep editorial engagement with global political developments, international conflict, and leadership dynamics. Although “Bulgaria” (1341) and “Sofia” (247) are present, their relatively low frequencies suggest limited coverage of national or local topics, aligning with Bloomberg’s global positioning and target audience. This contrast is especially noticeable when compared to public broadcasters such as BNR or BNT. The strong presence of “Russia”, “Ukraine”, “China”, “Germany”, and “France” reflects focused attention on major global actors and regions relevant to economic and geopolitical analysis. The especially high number of mentions of “Gaza” (even surpassing “Trump”) indicates extensive coverage of Middle Eastern conflicts, likely viewed through the lenses of global stability, humanitarian impact, and international policy response. Topics related to elections (2368) and governance (3313) further emphasize the outlet’s interest in institutional processes, democratic systems, and global leadership transitions, which are core themes for Bloomberg’s economic–political reporting model. In summary, Bloomberg TV Bulgaria stands out for its strongly international, policy-centric editorial strategy, with limited domestic coverage but deep engagement with global developments that affect markets, security, and international governance.
Table 5 presents the distribution of the top 20 words in publications of BNR from April 2024 to March 2025.
The data for BNR confirm its profile as a public service broadcaster with a strong focus on national and institutional topics. The most frequently occurring words—“Bulgaria” (13,176), “Sofia” (8149), “Elections” (5400), and “Government” (5389)—reflect the outlet’s consistent commitment to informing the public about domestic affairs, governance, and political processes. The frequency of “Ukraine” (4625), “Russia” (3798), “USA” (5071), and “Europe” (4632) indicates substantial attention to global developments, especially those with geopolitical significance for Bulgaria and the European Union. This supports BNR’s role not only as a national broadcaster, but also as a gateway to international information for the Bulgarian audience. Mentions of “NATO” (1356), “Israel” (1941), and “Gaza” (1573) reflect coverage of international security, military alliances, and conflicts, which are themes that are both timely and relevant in the current global context. At the same time, regional countries such as Greece, Romania, and Turkey appear with relatively high frequency, confirming a regional orientation in addition to global and national reporting. The coverage of “Trump” (2467), “France” (2315), and “Germany” (2751) points to consistent attention to the political dynamics in key Western countries, likely in relation to diplomacy, European affairs, or global leadership shifts. In conclusion, BNR maintains a well-balanced editorial strategy, with a strong core of national reporting complemented by robust international coverage. The data highlight BNR’s role as a comprehensive and reliable source of both domestic and global news for Bulgarian audiences.
Table 6 presents the distribution of the top 20 words in publications of BNT from April 2024 to March 2025.
The lexical distribution for BNT, Bulgaria’s national public television broadcaster, demonstrates a balanced editorial approach, combining extensive national coverage with regular attention to international developments. The high frequency of domestic terms such as “Bulgaria” (10,294) and “Sofia” (5628) reflects the outlet’s public service mission to inform citizens about national and municipal issues. Similarly, the substantial use of “Elections” (3174) and “Government” (2992) confirms BNT’s role in reporting on institutional processes, democratic governance, and policy matters. While the international terms appear in lower absolute numbers compared to some other platforms, they still indicate solid global engagement. The presence of “USA” (3109), “Europe” (2884), “Ukraine” (2325), and “Russia” (1927) shows that BNT maintains consistent coverage of global geopolitical developments, particularly those affecting the region and Bulgaria’s international position. Mentions of “Germany” (2111), “France” (2128), and especially “Paris” (2500) suggest coverage of European Union affairs and major global events, likely including the 2024 Summer Olympics in Paris, which could explain the notable prominence of the city name. Terms such as “Israel”, “Gaza”, and “Iran” reflect coverage of Middle Eastern conflicts, albeit at a more moderate level than in international or business-focused outlets. The regional perspective is reinforced through frequent references to Greece, Turkey, and Romania. Overall, BNT’s content profile aligns with its role as a national broadcaster, informing the public on local governance and national politics, while maintaining a clear, though proportionally smaller, window to the international stage.
Table 7 presents the distribution of the top 20 words in publications of bTV from April 2024 to March 2025.
The content distribution for bTV, one of Bulgaria’s largest private national broadcasters, reveals a balanced editorial profile with a slightly stronger domestic orientation complemented by solid international reporting. High frequencies of “Bulgaria” (6542) and “Sofia” (4069) confirm consistent coverage of national and local events. The prominence of “Government” (2869) and “Elections” (2839) indicates a strong focus on political processes, public institutions, and electoral dynamics, which are key themes in national reporting. Although the frequencies for international terms are not as high as in internationally oriented platforms like Bloomberg or DW, bTV maintains a well-rounded foreign affairs section. Terms such as “USA” (2761), “Ukraine” (2093), “Russia” (1836), and “Europe” (1947) demonstrate attention to major global developments, particularly those affecting Bulgaria’s geopolitical context. Coverage of Middle Eastern issues is present but relatively modest, with “Israel”, “Gaza”, “Iran”, and “NATO” each appearing in under 1000 publications. This suggests that international conflict is covered selectively, likely based on event salience or relevance to domestic discourse. Regional focus is visible through mentions of Greece (764), Turkey (878), and Romania (654), suggesting attention to Bulgaria’s neighboring countries and EU partners. In conclusion, bTV exhibits a dual editorial focus, combining consistent reporting on national politics and institutions with contextual international coverage, without showing the extreme thematic skew typical of either public broadcasters or international news agencies.
Table 8 presents the distribution of the top 20 words in publications of DW from April 2024 to March 2025.
The data for DW’s Bulgarian service reflect a clearly global editorial focus, consistent with Deutsche Welle’s mission as an international broadcaster to present international perspectives to regional audiences. Domestic keywords such as “Bulgaria” (600) and “Sofia” (187) appear with relatively low frequency, confirming that DW does not prioritize local news reporting. Instead, the editorial emphasis is on broader political, global, and democratic developments. Terms such as “Government” (978), “USA” (946), “Ukraine” (827), “Russia” (835), and “Germany” (815) are among the most frequent, suggesting sustained coverage of international political relations, war and conflict, governance, and EU affairs. Unsurprisingly, Germany’s presence is notable and is likely influenced by DW’s country of origin and its lens on EU leadership and diplomacy. “Europe” (722) and “Elections” (674) reinforce DW’s attention to institutional and political processes across the continent. NATO, Israel, Gaza, and China also appear regularly, indicating coverage of geopolitical flashpoints and international security. Compared to other platforms, DW places less emphasis on regional neighbors like Greece, Romania, and Turkey, likely due to its global orientation rather than a Balkan-centric focus. In conclusion, DW’s Bulgarian service demonstrates a predominantly international content strategy, reporting on major geopolitical and governance topics while providing minimal coverage of Bulgarian domestic affairs, fully in line with its role as a global public broadcaster.
Table 9 presents the distribution of the top 20 words in publications of Eurocom from April 2024 to March 2025.
The lexical profile of Eurocom, a private Bulgarian media outlet with a distinct editorial voice, shows a strong engagement with national political content, paired with considerable attention to international developments. The high number of references to “Bulgaria” (6251), “Sofia” (3417), “Government” (3865), and “Elections” (3482) clearly reflects a dominant focus on domestic politics, governance, and civic affairs. This positions Eurocom as an outlet heavily invested in political commentary and national discourse. However, Eurocom also dedicates substantial coverage to international affairs, with terms like “USA” (3736), “Ukraine” (3528), “Russia” (2990), and “Trump” (2446) appearing at high frequencies. This mix suggests an editorial strategy that combines internal political analysis with global context, particularly the international dimensions of security, diplomacy, and foreign influence. The presence of “NATO” (1027) and “Europe” (2206) reinforces Eurocom’s attention to Bulgaria’s place within regional and alliance structures, while Middle Eastern topics like “Israel”, “Gaza”, and “Iran” are present but not dominant. Mentions of “China” (1041) and “Germany” (1013) reflect interest in key global powers, while references to neighboring countries like “Turkey”, “Greece”, and “Romania” support a well-rounded editorial scope that includes regional relevance. In summary, Eurocom demonstrates a dual editorial profile that is deeply focused on domestic political life, while simultaneously maintaining a meaningful presence in international and geopolitical reporting. This combination positions the outlet as both nationally engaged and globally aware.
Table 10 presents the distribution of the top 20 words in publications of Nova from April 2024 to March 2025.
The lexical profile of Nova TV, one of Bulgaria’s major commercial broadcasters, reflects a content strategy that strongly integrates national political focus with timely international coverage. Terms like “Bulgaria” (9746), “Sofia” (5054), “Government” (3864), and “Elections” (3901) show that Nova maintains high engagement with domestic politics and governance, reflecting the editorial weight placed on informing viewers about internal developments and institutional processes. Internationally, Nova provides moderate but consistent coverage of global topics. Frequent mentions of “USA” (3435), “Ukraine” (2544), “Russia” (2015), and “Europe” (2587) suggest a broad awareness of key geopolitical developments, especially those with regional and security implications. Nova also demonstrates engagement with event-driven topics, such as “Trump” (1925) and “Paris” (906), with the latter likely driven by the 2024 Summer Olympics. Similarly, “France” (1113) and “Germany” (1319) show up with relatively high frequency, reflecting the importance of EU-centric coverage. Mentions of “Israel”, “Gaza”, “Iran”, and “NATO” confirm that Nova addresses international conflicts and alliances, though not with the intensity seen in more globally focused outlets like Bloomberg or DW. The presence of regional neighbors—“Greece”, “Turkey”, and “Romania”—further supports Nova’s commitment to reporting with both national and regional relevance. In conclusion, Nova TV offers a broad editorial mix, balancing domestic political reporting with the responsive coverage of global and regional developments. Its content reflects a mainstream media profile that is tailored to a general audience with interest in both local and international affairs.
Detailed visualizations of keyword frequency across individual media platforms are provided in Appendix A (Figures A1–A8). These figures complement the tabular data and illustrate the lexical focus of each outlet.
The results obtained through the quantitative and lexical analysis of web publications from eight major Bulgarian media platforms over the period April 2024–March 2025 allow for several key conclusions that highlight both overarching patterns and distinctive editorial strategies. The analysis of publication volume showed significant variation among platforms, with BNR, Nova, and BNT emerging as the most prolific sources of digital content. The examination of the most frequently used words in titles and full texts revealed a consistent editorial focus on both domestic and international topics. High-frequency terms such as “Bulgaria”, “Sofia”, and “Government” underscore the media’s engagement with national affairs, while terms like “USA”, “Ukraine”, “Russia”, and “Europe” reflect the salience of global geopolitical developments in Bulgarian media discourse. Comparative analysis between word usage in titles and in article bodies revealed editorial strategies designed to capture audience attention. Certain words, including “Trump” and “NATO”, were proportionally more prominent in titles than in article bodies: although every top-20 word occurs more often in text than in titles in absolute terms (Table 2), these words have comparatively high title-to-text ratios. They were likely chosen for their attention-grabbing quality, a pattern consistent with digital-era editorial practices aimed at maximizing clicks and shares. Additionally, the frequent appearance of the terms “Trump”, “USA”, and “NATO” reflects not only global political relevance but also editorial decisions about what constitutes news value across platforms. Their presence suggests a strong interest in international power dynamics and security alliances, particularly in the context of war, elections, and diplomatic shifts. This may reflect editorial alignment with geopolitical concerns or reliance on international news agencies. For instance, the elevated frequency of “Trump” corresponds to intensified media attention surrounding the United States presidential primaries, illustrating how global political events shape the editorial agenda even in national media.
The disparity between term frequency in titles and in article bodies highlights potential framing strategies. Whereas some terms are concentrated in headlines, words such as “Government” and “Bulgaria” were more dominant in the texts themselves, suggesting that these topics receive more substantive treatment within the body of the articles. Other terms, such as “Paris”, spiked in connection with specific international occasions like the 2024 Summer Olympics. Finally, the distribution of keywords across individual media platforms demonstrated clear editorial distinctions. Public service broadcasters such as BNR and BNT exhibited a strong focus on domestic issues and political coverage. In contrast, internationally oriented media such as Bloomberg, DW, and Bulgaria ON AIR emphasized global political and economic developments. Private national outlets including bTV, Nova, and Eurocom adopted a more balanced approach, combining consistent local reporting with selective but substantial international coverage.
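The title-versus-body comparison can be made concrete by dividing each word's title occurrences by its text occurrences. The following is a minimal sketch under stated assumptions and is not the study's own code: the class and method names (`TitleProminence`, `titleToTextRatio`, `sampleCounts`) are hypothetical, and the counts are four rows copied from Table 2.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: names are hypothetical and the counts are
// four rows copied from Table 2 (occurrences in titles, occurrences in text).
final class TitleProminence {

    // Ratio of title occurrences to text occurrences for a single word.
    static double titleToTextRatio(long inTitles, long inText) {
        if (inText <= 0) {
            throw new IllegalArgumentException("inText must be positive");
        }
        return (double) inTitles / inText;
    }

    // Word -> {occurrences in titles, occurrences in text}.
    static Map<String, long[]> sampleCounts() {
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("Trump", new long[] {6374, 15106});
        counts.put("NATO", new long[] {3153, 6248});
        counts.put("Bulgaria", new long[] {9008, 55816});
        counts.put("Government", new long[] {2629, 26030});
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per word: the word and its title-to-text ratio.
        for (Map.Entry<String, long[]> entry : sampleCounts().entrySet()) {
            long[] c = entry.getValue();
            System.out.printf(Locale.ROOT, "%-10s %.2f%n",
                    entry.getKey(), titleToTextRatio(c[0], c[1]));
        }
    }
}
```

On these rows the ratio is roughly 0.42 for “Trump” and 0.50 for “NATO”, against about 0.16 for “Bulgaria” and 0.10 for “Government”, which is the sense in which the former are more prominent in titles.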
To better understand the editorial dynamics across different types of media, a comparative overview of editorial tendencies among the eight analyzed platforms is presented in Table 11.
The comparative analysis across the eight media platforms reveals distinct editorial patterns shaped by ownership structure, audience targeting, and institutional role. Public broadcasters such as BNR and BNT consistently emphasize national governance topics, reinforcing their traditional mandate for public service journalism. In contrast, private commercial outlets like Nova and bTV exhibit a more balanced mix of national and international topics, often reflecting market-driven editorial choices. International platforms such as DW and Bloomberg focus predominantly on foreign affairs, particularly economic and political developments in Europe and the United States. Hybrid platforms like Eurocom demonstrate higher levels of political commentary and domestically oriented narratives. These variations illustrate the complex interplay between media ownership, editorial policy, and the evolving digital news ecosystem in Bulgaria.
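Because the platforms differ widely in publication volume (Table 1), raw keyword counts are easier to compare once they are expressed as a share of each platform's total output. The sketch below illustrates one way to do this and is not part of the study's application: the class and method names (`KeywordShare`, `share`, `usaCounts`) are hypothetical, and the figures are the “USA” rows of Tables 3–6 paired with the totals in Table 1.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: names are hypothetical. Keyword counts come from
// the "USA" rows of Tables 3-6 and totals come from Table 1.
final class KeywordShare {

    // Fraction of a platform's publications that mention the keyword.
    static double share(long publicationsWithWord, long totalPublications) {
        if (totalPublications <= 0) {
            throw new IllegalArgumentException("totalPublications must be positive");
        }
        return (double) publicationsWithWord / totalPublications;
    }

    // Platform -> {publications mentioning "USA", total publications}.
    static Map<String, long[]> usaCounts() {
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("Bulgaria ON AIR", new long[] {2896, 34213});
        counts.put("Bloomberg", new long[] {7320, 13842});
        counts.put("BNR", new long[] {5071, 52692});
        counts.put("BNT", new long[] {3109, 41520});
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per platform: the name and the share as a percentage.
        for (Map.Entry<String, long[]> entry : usaCounts().entrySet()) {
            long[] c = entry.getValue();
            System.out.printf(Locale.ROOT, "%-16s %.1f%%%n",
                    entry.getKey(), 100.0 * share(c[0], c[1]));
        }
    }
}
```

With these figures, “USA” appears in roughly 53% of Bloomberg publications, compared with about 7–10% for Bulgaria ON AIR, BNR, and BNT, a contrast that is less visible in the raw counts because BNR and BNT publish far more items overall.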

4. Discussion

The results of this study offer valuable insights into the thematic structure, editorial focus, and content distribution patterns of Bulgarian digital media. Through the application of media intelligence methodologies (including web scraping, NLP-based lexical analysis, and longitudinal observation), the research illustrates how digital news platforms respond to both national developments and global events. The findings support the initial working hypothesis that editorial priorities across media outlets are shaped by a combination of institutional role (public vs. private), audience orientation, and geopolitical context. Our findings also suggest promising directions for further research involving automated aspect extraction, as demonstrated in recent studies using GPT-based pipelines [11,27]. The observed variation in publication volume between media platforms aligns with prior studies highlighting structural differences between public service broadcasters and commercially driven outlets [8,9]. BNR and BNT, for instance, prioritize the coverage of domestic issues, which is reflected in the frequency of terms such as “Bulgaria”, “Sofia”, and “Government”. These patterns confirm their public service mandate and reinforce previous findings on the content strategies of national broadcasters. In contrast, Bloomberg and DW exhibit a stronger international orientation, placing editorial emphasis on global actors and institutions, such as the “USA”, “China”, “Russia”, and “Europe”, which corresponds with their positioning as international or financially focused media. The divergence in word frequency between titles and body texts supports another hypothesis regarding editorial framing. The elevated presence of attention-grabbing words like “Trump” and “NATO” in headlines suggests a strategic approach aimed at increasing visibility and engagement, consistent with findings from recent studies on headline construction and audience targeting [10,12].
On the other hand, more contextually rich and policy-relevant terms, such as “Government”, dominate the article content, pointing to substantive coverage beyond click-oriented framing. The temporal analysis of the top keywords further reinforces the event-driven nature of news production. Fluctuations in terms such as “Elections”, “Ukraine”, and “USA” clearly correspond with political cycles, international crises, and major diplomatic developments. This observation is consistent with earlier work on media responsiveness to political agendas and global events [15,20]. Beyond confirming existing models of media behavior, this study also reveals important nuances in the Bulgarian context. Despite the global trends of digital acceleration and content homogenization, local editorial differences remain strong, shaped by institutional, political, and cultural factors. The findings support classical agenda-setting theory by highlighting how certain topics—like international politics, domestic governance, or conflict—are consistently prioritized in headlines and textual content across outlets. For instance, the frequent use of terms such as “Trump”, “USA”, and “Russia” indicates a global political framing even in national platforms, confirming the influence of transnational agenda-setting. Moreover, the variations in lexical emphasis across public vs. private media align with Hallin and Mancini’s media systems model, where public broadcasters tend to emphasize institutional discourse, while private media reflects more market-oriented framing. The presence of convergence in word usage across diverse media types also raises questions related to media pluralism and editorial homogenization. Despite differing ownership and institutional affiliations, some lexical trends appear consistent, suggesting possible pressures from digital platform logic, news agency feeds, or shared journalistic norms, which are elements central to convergence theory. 
Although the current research focuses explicitly on the digital media landscape in Bulgaria, its methodology and findings have broader implications and applicability for other small or transitional democracies, particularly in Central and Eastern Europe. Countries with similar media system characteristics—such as mixed ownership patterns, political polarization, and rapid digital transformation—may exhibit comparable editorial behaviors and lexical patterns in their digital media content. For instance, analyses conducted in regions such as Romania, Slovakia, or Hungary could potentially yield parallel insights regarding the interplay between media ownership and editorial priorities. Conversely, variations in political systems, media regulations, and societal contexts might lead to different patterns, providing valuable comparative insights. Future research could extend this analytical framework to multiple countries, enabling a deeper comparative understanding of digital media dynamics across different political and media-system contexts. One emerging theme from the analysis concerns the editorial tensions between global homogenization and local media identity. While many platforms demonstrate convergence in keyword usage—particularly around global figures like “Trump” or geopolitical topics like “Russia” and “NATO”—significant variation also persists in the prioritization of domestic terms and local framing. This duality suggests that, while Bulgarian digital media platforms are influenced by global news cycles and international content flows, they still retain localized editorial strategies that reflect institutional mandates, audience expectations, and national context. These dynamics illustrate the ongoing negotiation between global media logic and local informational needs, a phenomenon particularly visible in transitional democracies with hybrid media systems. 
Additionally, recent research highlights the ongoing relevance of media capture theory when interpreting editorial trends and media autonomy in post-socialist media landscapes, which closely parallels the observed dynamics in the Bulgarian context [36]. Compared to other post-socialist countries, Bulgaria demonstrates both commonalities and distinctions. The pronounced editorial role of public broadcasters, for instance, appears more prominent in the Bulgarian case, potentially due to historical institutional continuity and relatively lower levels of media regulatory reform. These contrasts further underscore the importance of national context when interpreting digital media patterns in the region and suggest that, even in a digitally networked environment, national media systems retain distinct identities and strategic orientations. Although this research does not focus on sentiment classification, it shares common ground with computational approaches that extract meaning from unstructured text data. Similar frameworks have been used in commercial contexts to evaluate large-scale customer feedback [37], demonstrating the generalizability of full-text analysis across domains. In a broader context, these findings contribute to the understanding of how digital media ecosystems operate in small states within the European Union. They also highlight the relevance of media intelligence tools for systematically capturing and interpreting large volumes of unstructured news data. Future research could expand this analysis by incorporating multimodal content (video transcripts, social media engagement metrics), applying sentiment analysis and clustering models at the paragraph or sentence level, or comparing results with media landscapes in other countries. Moreover, investigating the role of algorithms and audience data in shaping editorial decisions would provide further depth to the study of media dynamics in the digital age.
The methodological framework demonstrated in this research could be further enriched by integrating probabilistic modeling approaches, such as Bayesian methods or Gaussian processes, which have shown promise in enhancing sentiment analysis accuracy and providing deeper insights into textual data structures [38,39]. Future studies could explore how combining lexical frequency analysis with these sophisticated probabilistic models might offer additional layers of interpretation, which are especially valuable in analyzing subtler nuances of media content and editorial framing.

5. Conclusions

This study applied a media intelligence framework to analyze the thematic structure and editorial dynamics of Bulgarian digital media through data-driven methods, including web scraping, quantitative content analysis, and natural language processing. The findings demonstrate the complex interplay between domestic and international news coverage, editorial positioning, and media typology. The analysis revealed clear distinctions between public service broadcasters, international outlets, and private national media, each exhibiting unique content patterns and thematic priorities. While public media emphasized national politics and governance, international platforms focused more heavily on geopolitical events and global actors. Private media maintained a hybrid strategy, balancing domestic relevance with responsiveness to major international developments. The results also highlight the significance of title construction in shaping reader attention and reflect the event-driven nature of editorial focus, particularly in the context of political and geopolitical cycles. These insights contribute to a deeper understanding of content strategies in the digital news ecosystem and demonstrate the utility of media intelligence approaches for systematic media monitoring and analysis. From a methodological perspective, the study illustrates the power of web scraping as a scalable and efficient technique for collecting large volumes of real-time media content directly from public news websites. This approach enables timely and flexible data acquisition, bypassing limitations of closed or incomplete archival systems, and offers strong potential for replicability and cross-country comparisons. The findings have broader implications for media professionals and researchers interested in news production and digital communication trends. 
By capturing variations in editorial focus and content dynamics across platforms, this study offers a valuable foundation for future research on agenda-setting, framing strategies, and the intersection between media and political environments, particularly within smaller EU member states and post-transition democracies. This research underscores the critical role of data-driven approaches in understanding how digital media systems function and evolve, while also offering practical tools and insights for navigating the increasingly complex information environment. The innovation of this study lies primarily in its integrated methodological approach, combining custom-built Java-based web scraping and structured lexical frequency analysis across multiple digital news platforms. Unlike traditional single-source analyses or manual content assessments, this method enables large-scale, automated comparative analyses with high granularity. By systematically distinguishing word occurrences in article titles and bodies, this research provides novel insights into editorial framing strategies and thematic emphases in digital media. Such a comprehensive, automated data-driven framework is not only scalable and replicable but is also adaptable to diverse media environments, making it particularly valuable for exploring media dynamics in transitional democracies and digital transformation contexts. The findings of this study have practical implications for policymakers, journalism educators, and media practitioners. Insights into editorial patterns and framing tendencies can inform discussions on media pluralism, content diversity, and editorial independence. Understanding the dynamics of lexical focus and international versus domestic topic prioritization could aid regulatory bodies in developing guidelines that encourage balanced reporting. 
Additionally, raising public awareness through digital literacy initiatives could empower audiences to better navigate and critically assess media content in an increasingly complex digital information environment. Future research could expand upon this framework by integrating sentiment analysis, topic modeling, or network analysis to explore relational structures among topics, actors, and media outlets. Audience engagement analysis, based on social media metrics or comment data, could offer additional insights into the public reception of editorial strategies. Comparative studies across multiple small or transitional democracies would also allow for a more systematic understanding of how media systems evolve under different socio-political conditions. Future research may also include the application of inferential statistical techniques, such as clustering or significance testing, to enrich the analytical framework and validate observed editorial patterns.

Funding

This work was financially supported by the UNWE Research Programme (Research Grant No. 22/2024/A).

Data Availability Statement

The datasets analyzed during this study are available upon request from the author.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
NLP: Natural language processing
BNR: Bulgarian National Radio
BNT: Bulgarian National Television
DW: Deutsche Welle (Bulgarian edition)

Appendix A. Keyword Distributions by Media Platform

Figure A1. Graphical representation of the distribution of top 20 words in publications of Bulgaria ON AIR (April 2024–March 2025).
Figure A2. Graphical representation of the distribution of top 20 words in publications of Bloomberg (April 2024–March 2025).
Figure A3. Graphical representation of the distribution of top 20 words in publications of BNR (April 2024–March 2025).
Figure A4. Graphical representation of the distribution of top 20 words in publications of BNT (April 2024–March 2025).
Figure A5. Graphical representation of the distribution of top 20 words in publications of bTV (April 2024–March 2025).
Figure A6. Graphical representation of the distribution of top 20 words in publications of DW (April 2024–March 2025).
Figure A7. Graphical representation of the distribution of top 20 words in publications of Eurocom (April 2024–March 2025).
Figure A8. Graphical representation of the distribution of top 20 words in publications of Nova (April 2024–March 2025).

References

  1. McCombs, M.; Shaw, D.L. The Agenda-Setting Function of Mass Media. Public Opin. Q. 1972, 36, 176–187.
  2. Hallin, D.C.; Mancini, P. Comparing Media Systems: Three Models of Media and Politics; Cambridge University Press: Cambridge, UK, 2004.
  3. Open Society Institute. Mapping Digital Media: Bulgaria. Open Society Foundations. 2013. Available online: https://www.opensocietyfoundations.org/publications/mapping-digital-media-bulgaria (accessed on 1 April 2025).
  4. Xu, S. Bayesian Naïve Bayes Classifiers to Text Classification. J. Inf. Sci. 2016, 44, 48–59.
  5. Vichianchai, V.; Kasemvilas, S. A New Term Frequency with Gaussian Technique for Text Classification and Sentiment Analysis. J. ICT Res. Appl. 2021, 15, 152–168.
  6. Roman, I.; Mendiburu, A.; Santana, R.; Lozano, J.A. Sentiment Analysis with Genetically Evolved Gaussian Kernels. arXiv 2019, arXiv:1904.00977.
  7. Frermann, L.; Li, J.; Khanehzar, S.; Mikolajczak, G. Conflicts, Villains, Resolutions: Towards Models of Narrative Media Framing. arXiv 2023, arXiv:2306.02052. Available online: https://arxiv.org/abs/2306.02052 (accessed on 1 April 2025).
  8. Cordeiro, D.; Lopezosa, C.; Guallar, J. A Methodological Framework for AI-Driven Textual Data Analysis in Digital Media. Future Internet 2025, 17, 59.
  9. Tanasescu, L.G.; Vines, A.; Bologa, A.R.; Vaida, C.A. Big Data ETL Process and Its Impact on Text Mining Analysis for Employees’ Reviews. Appl. Sci. 2022, 12, 7509.
  10. Kim, E.-G.; Chun, S.-H. Analyzing Online Car Reviews Using Text Mining. Sustainability 2019, 11, 1611.
  11. Shah, A.; Shah, H.; Bafna, V.; Khandor, C.; Nair, S. Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference. Eng. Appl. Artif. Intell. 2025, 147, 110284.
  12. Hassanien, H.E.-D. Web Scraping Scientific Repositories for Augmented Relevant Literature Search Using CRISP-DM. Appl. Syst. Innov. 2019, 2, 37.
  13. Naing, I.; Aung, S.T.; Wai, K.H.; Funabiki, N. A Reference Paper Collection System Using Web Scraping. Electronics 2024, 13, 2700.
  14. Zherlitsyn, D.; Kolarov, K.; Rekova, N. Digital Transformation in the EU: Bibliometric Analysis and Digital Economy Trends Highlights. Digital 2025, 5, 1.
  15. Barbera, G.; Araujo, L.; Fernandes, S. The Value of Web Data Scraping: An Application to TripAdvisor. Big Data Cogn. Comput. 2023, 7, 121.
  16. Jurafsky, D.; Martin, J.H. Naive Bayes, Text Classification, and Sentiment. In Speech and Language Processing, 3rd ed.; 2025; Chapter 4; Available online: https://web.stanford.edu/~jurafsky/slp3/4.pdf (accessed on 1 April 2025).
  17. Concha Macías, S.; Norambuena, B.K. Evaluating the Ability of Computationally Extracted Narrative Maps to Encode Media Framing. arXiv 2024, arXiv:2405.02677. Available online: https://arxiv.org/abs/2405.02677 (accessed on 1 April 2025).
  18. Sarker, K.U.; Saqib, M.; Hasan, R.; Mahmood, S.; Hussain, S.; Abbas, A.; Deraman, A. A Ranking Learning Model by K-Means Clustering Technique for Web Scraped Movie Data. Computers 2022, 11, 158.
  19. Alexandrescu, A.; Butincu, C.N. Decentralized News-Retrieval Architecture Using Blockchain Technology. Mathematics 2023, 11, 4542.
  20. Haarman, T.; Zijlema, B.; Wiering, M. Unsupervised Keyphrase Extraction for Web Pages. Multimodal Technol. Interact. 2019, 3, 58.
  21. JavaFX. Available online: https://openjfx.io/ (accessed on 1 April 2025).
  22. Jsoup: Java HTML Parser. Available online: https://jsoup.org/ (accessed on 1 April 2025).
  23. Trilling, D.; Jonkman, J.G.F. Scaling Up Content Analysis: How to Manually Annotate a Small Subsample for Supervised Machine Learning. Soc. Sci. Comput. Rev. 2018, 36, 598–614.
  24. Scrapy: A Fast and Powerful Scraping and Web Crawling Framework. Available online: https://scrapy.org/ (accessed on 1 April 2025).
  25. Beautiful Soup. Available online: https://pypi.org/project/beautifulsoup4/ (accessed on 1 April 2025).
  26. Ulloa, R.; Mangold, F.; Schmidt, F.; Gilsbach, J.; Stier, S. Beyond Time Delays: How Web Scraping Distorts Measures of Online News Consumption. Commun. Methods Meas. 2025, 1–22.
  27. Arboretti, R.; Barusco, M.; Barzizza, E.; Biasetton, N.; Ceccato, R. An Integrated Framework for Automated Web Scraping and Sentiment Analysis of Product Reviews. J. Mach. Intell. Data Sci. 2024, 5, 46–53.
  28. Bulgaria ON AIR. Available online: https://bgonair.bg/ (accessed on 1 April 2025).
  29. Bloomberg. Available online: https://bloombergtv.bg/ (accessed on 1 April 2025).
  30. BNR News. Available online: https://bnr.bg/ (accessed on 1 April 2025).
  31. BNT News. Available online: https://bntnews.bg/ (accessed on 1 April 2025).
  32. bTV News. Available online: https://btvnovinite.bg/ (accessed on 1 April 2025).
  33. DW. Available online: https://www.dw.com/bg/ (accessed on 1 April 2025).
  34. Eurocom. Available online: https://eurocom.bg/ (accessed on 1 April 2025).
  35. Nova News. Available online: https://nova.bg/news/ (accessed on 1 April 2025).
  36. Bajomi-Lázár, P. Media Capture Theory: A Paradigm Shift? Cent. Eur. J. Commun. 2024, 17, 238–244.
  37. Ashbaugh, L.; Zhang, Y. A Comparative Study of Sentiment Analysis on Customer Reviews Using Machine Learning and Deep Learning. Computers 2024, 13, 340.
  38. Ilic, E.; Garcia Martinez, M.; Souto Pastor, M. A Review of Text Classification Models from Bayesian to Transformers. In CEUR Workshop Proceedings; 2022; Volume 3361, Paper 3; Available online: https://ceur-ws.org/Vol-3361/paper3.pdf (accessed on 1 April 2025).
  39. Cohn, T.; Preotiuc-Pietro, D.; Lawrence, N. Gaussian Processes for Natural Language Processing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: Tutorials, Baltimore, MD, USA, 22 June 2014.
Figure 1. Sequence diagram showing the interaction between JavaFX components during the initialization and web content processing workflow.
Figure 2. Graphical representation of the top 20 most frequent words in publication titles (April 2024–March 2025).
Figure 3. Graphical representation of the occurrences of the top 20 title words in publication texts (April 2024–March 2025).
Table 1. Total number of publications by media platform (April 2024–March 2025).
Media | Website | Publications
Bulgaria ON AIR | https://bgonair.bg (accessed on 1 April 2025) | 34,213
Bloomberg | https://bloombergtv.bg (accessed on 1 April 2025) | 13,842
BNR | https://bnr.bg/ (accessed on 1 April 2025) | 52,692
BNT | https://bntnews.bg (accessed on 1 April 2025) | 41,520
bTV | https://btvnovinite.bg (accessed on 1 April 2025) | 25,911
DW | https://www.dw.com/bg/ (accessed on 1 April 2025) | 2719
Eurocom | https://eurocom.bg (accessed on 1 April 2025) | 32,358
Nova | https://nova.bg/news (accessed on 1 April 2025) | 44,867
Table 2. Top 20 most frequent words in publication titles and corresponding text occurrences (April 2024–March 2025).
Word | Occurrences in Titles | Occurrences in Text
Bulgaria | 9008 | 55,816
Sofia | 7776 | 30,777
Ukraine | 6818 | 21,113
Trump | 6374 | 15,106
USA | 6204 | 29,274
Elections | 4771 | 24,628
Russia | 4603 | 18,176
Israel | 3339 | 9480
NATO | 3153 | 6248
Europe | 3083 | 20,880
Government | 2629 | 26,030
Germany | 2198 | 11,616
Greece | 2100 | 7174
Gaza | 1998 | 6373
China | 1983 | 9304
Turkey | 1801 | 7179
Paris | 1662 | 7104
France | 1480 | 9472
Iran | 1294 | 4744
Romania | 1283 | 5855
Table 3. Distribution of top 20 words in publications of Bulgaria ON AIR (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 7866 |
| Sofia | 4026 |
| Ukraine | 2666 |
| Trump | 1362 |
| USA | 2896 |
| Elections | 2790 |
| Russia | 2290 |
| Israel | 1184 |
| NATO | 635 |
| Europe | 2089 |
| Government | 2760 |
| Germany | 959 |
| Greece | 1260 |
| Gaza | 733 |
| China | 828 |
| Turkey | 844 |
| Paris | 670 |
| France | 851 |
| Iran | 509 |
| Romania | 571 |
Table 4. Distribution of top 20 words in publications of Bloomberg (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 1341 |
| Sofia | 247 |
| Ukraine | 2505 |
| Trump | 3742 |
| USA | 7320 |
| Elections | 2368 |
| Russia | 2485 |
| Israel | 1248 |
| NATO | 740 |
| Europe | 3813 |
| Government | 3313 |
| Germany | 1635 |
| Greece | 269 |
| Gaza | 3891 |
| China | 921 |
| Turkey | 526 |
| Paris | 399 |
| France | 1207 |
| Iran | 980 |
| Romania | 265 |
Table 5. Distribution of top 20 words in publications of BNR (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 13,176 |
| Sofia | 8149 |
| Ukraine | 4625 |
| Trump | 2467 |
| USA | 5071 |
| Elections | 5400 |
| Russia | 3798 |
| Israel | 1941 |
| NATO | 1356 |
| Europe | 4632 |
| Government | 5389 |
| Germany | 2751 |
| Greece | 2024 |
| Gaza | 1573 |
| China | 1368 |
| Turkey | 1560 |
| Paris | 1622 |
| France | 2315 |
| Iran | 821 |
| Romania | 1564 |
Table 6. Distribution of top 20 words in publications of BNT (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 10,294 |
| Sofia | 5628 |
| Ukraine | 2325 |
| Trump | 1188 |
| USA | 3109 |
| Elections | 3174 |
| Russia | 1927 |
| Israel | 1347 |
| NATO | 732 |
| Europe | 2884 |
| Government | 2992 |
| Germany | 2111 |
| Greece | 1294 |
| Gaza | 832 |
| China | 725 |
| Turkey | 1326 |
| Paris | 2500 |
| France | 2128 |
| Iran | 590 |
| Romania | 1245 |
Table 7. Distribution of top 20 words in publications of bTV (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 6542 |
| Sofia | 4069 |
| Ukraine | 2093 |
| Trump | 1465 |
| USA | 2761 |
| Elections | 2839 |
| Russia | 1836 |
| Israel | 993 |
| NATO | 642 |
| Europe | 1947 |
| Government | 2869 |
| Germany | 1013 |
| Greece | 764 |
| Gaza | 606 |
| China | 608 |
| Turkey | 878 |
| Paris | 423 |
| France | 768 |
| Iran | 509 |
| Romania | 654 |
Table 8. Distribution of top 20 words in publications of DW (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 600 |
| Sofia | 187 |
| Ukraine | 827 |
| Trump | 511 |
| USA | 946 |
| Elections | 674 |
| Russia | 835 |
| Israel | 263 |
| NATO | 301 |
| Europe | 722 |
| Government | 978 |
| Germany | 815 |
| Greece | 114 |
| Gaza | 271 |
| China | 196 |
| Turkey | 223 |
| Paris | 89 |
| France | 246 |
| Iran | 173 |
| Romania | 128 |
Table 9. Distribution of top 20 words in publications of Eurocom (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 6251 |
| Sofia | 3417 |
| Ukraine | 3528 |
| Trump | 2446 |
| USA | 3736 |
| Elections | 3482 |
| Russia | 2990 |
| Israel | 1430 |
| NATO | 1027 |
| Europe | 2206 |
| Government | 3865 |
| Germany | 1013 |
| Greece | 570 |
| Gaza | 647 |
| China | 1041 |
| Turkey | 922 |
| Paris | 495 |
| France | 844 |
| Iran | 609 |
| Romania | 605 |
Table 10. Distribution of top 20 words in publications of Nova (April 2024–March 2025).
| Word | Publications |
| --- | --- |
| Bulgaria | 9746 |
| Sofia | 5054 |
| Ukraine | 2544 |
| Trump | 1925 |
| USA | 3435 |
| Elections | 3901 |
| Russia | 2015 |
| Israel | 1074 |
| NATO | 815 |
| Europe | 2587 |
| Government | 3864 |
| Germany | 1319 |
| Greece | 879 |
| Gaza | 751 |
| China | 686 |
| Turkey | 900 |
| Paris | 906 |
| France | 1113 |
| Iran | 553 |
| Romania | 824 |
Table 11. Comparative overview of editorial tendencies across Bulgarian digital media platforms.
| Media Outlet | Focus on National Topics | Focus on International Topics | Political Terms Prominence | Geopolitical Framing | Distinctive Features |
| --- | --- | --- | --- | --- | --- |
| Bulgaria ON AIR | Moderate | High | Moderate | High | Hybrid focus |
| Bloomberg | Low | Very high | Moderate | Focused on finance | Business-centric |
| BNR | Very high | Moderate | Strong | Balanced | Institutional tone |
| BNT | High | Moderate | Strong | Focused on Europe | Public broadcaster role |
| bTV | Balanced | Balanced | Moderate | Moderate | Generalist reporting |
| DW | Low | Very high | Moderate | Strong | International scope |
| Eurocom | Very high | High | High | Variable | Politically opinionated |
| Nova | Balanced | Moderate | High | High | Private, mass appeal |
Note: Categorization based on lexical frequency analysis and observed editorial topic emphases.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Milev, P.H. Mapping the Digital Media Landscape in Bulgaria: Analysis of Web Publications. Digital 2025, 5, 15. https://doi.org/10.3390/digital5020015