1. Introduction
Technological developments and the integration of social media have transformed the media ecosystem, reshaping the processes of collection, production, and dissemination of news. While these changes offer new opportunities, they have also introduced challenges, such as the absence of specific standards for assessing the quality of online news. Journalistic principles such as impartiality and integrity are often compromised, particularly in smaller news organizations, where sales and advertising pressures, political agendas, and powerful media ownership interfere with editorial independence [
1,
2]. News editors frequently overlook ethical standards, prioritizing market-driven goals or clickbait techniques, such as sensational headlines, to increase page views and audience engagement metrics, thereby undermining media credibility over time.
Given these challenges, researchers and media professionals are exploring how AI can support high-quality journalism in the digital era. This research presents an innovative approach by integrating expert-based quality indicators, explainable machine learning and iterative design, specifically tailored to the Greek media ecosystem. Unlike other tools that rely on general linguistic or grammatical correction, IQJournalism provides interpretable recommendations aligned with editorial needs.
More specifically, this study introduces IQJournalism, an AI-powered intelligent advisor designed to guide journalists in improving the overall quality of articles and predicting social media success. IQJournalism adopts a mixed-methods research approach [
3], combining semi-structured in-depth interviews with big data analysis to identify factors that improve the quality and engagement of journalistic content. Oriented by definitions of journalism as a medium for informing and empowering citizens [
4,
5], the project emphasizes key quality parameters, including subject matter, linguistic attributes, writing style, and audience reception, to support journalists in creating credible and impactful news. By also leveraging Natural Language Processing (NLP) and user-centered design principles, the system provides real-time feedback on various dimensions, including language quality, subjectivity level, emotionality, and entertainment. By fostering readability and engagement, IQJournalism improves the dissemination of news articles, discourages clickbait practices, and offers applications in various fields.
This paper makes several theoretical and practical contributions. Theoretically, it proposes a framework for defining journalistic quality based on expert knowledge and validated through machine learning. It contributes to the fields of journalism studies, AI, and human–computer interaction by demonstrating how explainable AI can be aligned with professional principles in content production. Practically, the study offers a system that helps editors by supporting content improvement in both journalism and other content areas. In particular, IQJournalism is designed to support various editorial roles throughout the content production workflow. Journalists can take advantage of the tool to improve language clarity and assess levels of subjectivity and emotionality. Audience editors and social media specialists can also benefit by understanding how their content influences engagement on digital platforms. Beyond journalism, the platform can assist writers in fields such as marketing and political communication who aim to produce credible and engaging content. The adaptability of the platform to different editorial contexts and its user-friendly interface further underscore its practical utility.
The study addresses a clear research gap: while AI tools to support writing are proliferating, few are based on quality definitions specific to the media field. This research fills this gap by offering a system adapted to the Greek journalistic context, with the potential for broader adaptation to other contexts. The main objectives of this work are to identify journalistic quality indicators defined by experts and transform them into measurable characteristics, to develop and evaluate machine learning models that predict article quality and social media engagement, to design and test a user-centered interface for editorial support, and to demonstrate how AI tools can improve the quality and credibility of online news.
The expected findings include the validation of a theoretically grounded quality framework using high-performance machine learning models, the identification of key predictive features of both quality and engagement, as well as strong usability results from user testing. Overall, these results demonstrate that IQJournalism can provide meaningful feedback to journalists and editors, improving both content quality and audience trust in the digital age.
Background Work
Advances in AI have profoundly transformed journalism, improving multiple stages of the news production and distribution process. As noted by Simon [
6], common applications of AI systems in news organizations include supporting journalists in accessing and observing information through tools for discovery, audience analysis, story detection, and idea generation. In selection and filtering, AI tools help with fact-checking, content categorization, automated data collection, transcription, and translation, improving accuracy and efficiency. In editing and processing, AI assists in brainstorming, drafting, editing, content reformatting, and SEO optimization. Finally, in publishing and distribution, AI enhances content personalization and paywall management, thus promoting greater audience engagement.
According to Verma [
7], these automations not only improve efficiency but also free journalists to focus on more complex aspects of their work. The integration of AI is seen as a strategic development in journalistic practices allowing for deeper research by enhancing human characteristics such as insight, empathy, and investigative rigor. This shift marks a key development in the media sector, where AI technologies, rather than replacing editorial judgment, ultimately enhance the quality of journalism.
In recent years, intelligent authoring tools and natural language generators have incorporated sophisticated methods, such as the integration of Large Language Models (LLMs), that enable authors to generate content based on given instructions. Significant advancements in LLMs, such as ChatGPT (
https://chatgpt.com/ (accessed on 29 April 2025)), Gemini (
https://gemini.google.com/app (accessed on 29 April 2025)), and Claude (
https://claude.ai/ (accessed on 29 April 2025)), along with their widespread incorporation into everyday products, highlight their potential as powerful authoring assistants [
8].
Based on these advancements, we conducted a thorough examination of various online editors and writing assistants to identify the specific needs and challenges journalists encounter when composing, editing, and revising text guided by system recommendations, as well as when organizing text files and folders. More specifically, we studied the functionality of the widely used application “Grammarly” (
https://www.grammarly.com/ (accessed on 29 April 2025)). This award-winning online writing assistant, often used by speakers of English as a foreign language (EFL), detects and corrects linguistic errors, including grammatical and spelling mistakes, irregular verb conjugations, inappropriate noun usage, and incorrect word choices, and also detects instances of plagiarism [
9]. The system operates through an interconnected network powered by AI techniques, such as machine learning, deep learning, and NLP. It is important to note that Fitria’s [
9] descriptive qualitative research demonstrated a significant improvement in students’ writing ability, with test performance scores rising from 34 to 77 out of 100 after using “Grammarly.”
Additionally, “Wordcraft” is an LLM-supported writing tool that assists in rewriting, summarizing, and stylistic editing. The features of this application are powered by LaMDA, a neural language model trained on Google data, such as public Web pages, forum discussions, and Wikipedia content. Refinement on conversational data produced a chatbot-style interface. The user evaluation of the application showed that “Wordcraft” enhanced engagement, proved effective, shortened the writing process, enabled the development of long-form content, promoted the incorporation of AI-generated inputs, and improved overall user satisfaction [
10].
“Wordtune” (
https://www.wordtune.com/ (accessed on 29 April 2025)) goes beyond basic text editing, exploiting AI to interpret the writer’s intent. It provides rewording options in either informal or formal tones and adjusts the length of the text by shortening or expanding it. It can also change sentence structures and replace words with synonyms while maintaining the original context [
11]. Furthermore, “WINGS,” a Chinese data input method designed to enhance the writing experience through real-time suggestions, was presented by Dai et al. [12]. The system provides context-appropriate suggestions, including syntactic and semantic options, derived from text corpus mining and NLP techniques such as word vector representations and the Latent Dirichlet Allocation (LDA) topic model. The study demonstrated the effectiveness of “WINGS” in supporting creative writing.
Alongside these developments, Stefnisson and Thue [
13] developed “Mimisbrunnur”, an interactive authoring tool that integrates AI, NLP, and a mixed-initiative framework. Specifically, the system comprises three main components: a state editor for defining the foundational facts of the narratives, an editor that controls the potential actions, and a goal editor that enables writers to define the outcomes for the story. An alternative research approach explored the use of conversational agents to support the development of fictional characters [
14]. This tool, named “CharacterChat,” assists writers in defining character traits through chatbot prompts and enables editors to refine these features through dialogue. Findings from two user studies demonstrated the effectiveness of “CharacterChat,” especially when inventing new fictional characters. Lastly, Osone and Ochiai [
15] introduced “BunCho,” an online writing environment, tailored for Japanese novelists, designed to stimulate creativity by exploiting GPT-2. User feedback reflected high satisfaction, with 69% of writers reporting enjoyment while using the platform to enrich their storytelling. Additional analysis also revealed that 69% of summaries created with the tool showed enhanced creativity.
This paper examines the development process of the IQJournalism system following a combination of qualitative and quantitative methods. For each phase of the system’s development, research questions and hypotheses were formulated to guide the process and confirm the outcomes, based on previous findings from journalism studies, AI-assisted writing, and user-centered design. The paper begins by presenting key insights from 20 semi-structured interviews with journalism experts, identifying crucial indicators of journalistic quality. Previous research [1,16] has shown that accurate and informative headlines have a positive impact on readers’ perception of the credibility and quality of content, leading to H1: Experts will argue that the decreased accuracy of the headline of a story undermines the overall quality and attractiveness of the journalistic product. Research findings [16,17] indicating that emotionality undermines news quality formed the following hypothesis, H2: Experts of our research are likely to assert that the intense use of emotionally charged discourse reduces the overall quality of the journalistic product. Furthermore, the inclusion of relevant audiovisual material, such as images and videos, has been associated with increased credibility in journalism [18], supporting H3: Experts of this research are expected to contend that the use of audiovisual material enhances the overall quality of the journalistic product.
Following this, the paper delves into the stages of knowledge extraction, including data collection, preprocessing, transformation, data mining, and interpretation/evaluation, to extract meaningful patterns and features for model training. The extracted features served as input variables for training the machine learning models, with a focus on supervised learning approaches applied to classify article quality. Evaluation metrics of model performance are presented, along with a detailed description of prediction accuracy and the final selection of the optimal model. From a computational perspective, studies support the use of linguistic and contextual features in predicting both article quality and social media engagement. This guided H4: Lower levels of subjectivity and entertainment will predict higher quality, whereas higher levels of emotional content and poor language use will predict lower quality.
The final section presents the user-centered methodology, analyzing the iterative design process of prototyping, testing, and refining the system. Drawing on the user-centered design literature [19,20,21], the study evaluates the usability and user experience of the platform through a series of targeted hypotheses: H5: The interaction with the prototype results in a very positive user experience among the participants; H6: The perceived usability score of all participants is higher than the standard average SUS score of 68; H7: Participants’ NPS score is above 30, indicating a strong tendency to recommend the IQJournalism system; H8: The distribution of participants’ scores regarding their overall experience and satisfaction with the prototype shows a central tendency toward the highest values on the Likert scales, indicating strong positive attitudes; and H9: Most participants completed the prototype’s tasks more quickly and efficiently (i.e., SWA < 1).
2. Materials and Methods
Although there is a growing focus on developing solutions for educational use, writing enhancement, and even science fiction character creation [
9,
10,
11,
12,
13,
14,
15], tools that specifically support the production of quality journalistic content remain limited. IQJournalism fills this gap, aiming to highlight the qualitative characteristics of news stories and their capacity to increase engagement on social media. In more detail, the objectives of this approach were as follows:
a. To detect and evaluate the impact of specific text features, such as language quality parameters, headline accuracy, emotional discourse, and audiovisual material, on the overall quality and attractiveness of news pieces.
b. To build and evaluate a machine learning model capable of predicting the quality and social media engagement of online news content, focusing on the dimensions of language quality, subjectivity, emotionality, and entertainment.
c. To implement an iterative design process that includes prototyping, testing, and redesigning to effectively address specific user needs and requirements.
Bearing in mind the aforementioned objectives, the study was carried out through three distinct methodological phases (see
Table 1). The first phase (A) involved a qualitative approach, using semi-structured, in-depth interviews with experts to identify key characteristics that influence journalistic quality and engagement. For the purpose of this qualitative study, the following research questions were formulated: What are the main reliability characteristics of a journalistic product/report? What are the main language quality parameters/characteristics of a journalistic text? Which factors determine the impartiality of a journalist?
The second phase (B) followed a quantitative and computational approach, where insights from the qualitative study were translated into measurable features and incorporated into machine learning models created to predict the quality and social media engagement of online news. In this phase, the following research questions were established: How can AI contribute to understanding quality standards in digital journalism? Can the model dimensions, namely Language Quality, Subjectivity, Emotionality, and Entertainment predict quality in online news? How do the individual quality criteria specifically affect the accuracy of quality prediction? Is it possible for a machine learning model to reliably predict the engagement of news content on social media?
The third phase (C) focused on iterative design, where prototyping, testing, and redesigning processes were employed to improve the system, ensuring alignment with the requirements and expectations of users [
22]. Based on this innovative framework, the IQJournalism system aims to become an essential tool customized to editorial needs, assisting editors in crafting high-quality, engaging content.
2.1. Phase A: Qualitative Research
Our initial step toward achieving our main objective was to conduct, for the first time in Greece, semi-structured in-depth interviews [
23] (p. 183), [
24], [
25] (p. 756), [
26] (p. 156), [
27]. The interviews were conducted between 13 May 2022 and 13 July 2022, with 20 participants in total: 16 journalists with significant experience in the media field, focusing on structuring journalistic discourse within a framework of reliability, quality, and impartiality, and 4 Greek academic researchers specializing in communication and journalism. The interviewees were selected through purposive sampling to ensure a diverse representation of professional backgrounds and media roles. Interviews were conducted via videoconference, and consent was obtained from all participants after they were informed of the study’s purpose and confidentiality protocols. To minimize potential bias, the interview protocol included open-ended, neutrally worded questions. Thematic analysis was employed to analyze the data [
28,
29], [
30] (p. 40) and was cross-checked by multiple researchers. Response saturation occurred at the 18th interview, and 2 further interviews were conducted for final confirmation.
The research hypotheses (Hs) proposed for Phase A of the study were as follows:
H1. Experts will argue that the decreased accuracy of the headline of a story undermines the overall quality and attractiveness of the journalistic product.
H2. Experts of our research are likely to assert that the intense use of emotionally charged discourse reduces the overall quality of the journalistic product.
H3. Experts of this research are expected to contend that the use of audiovisual material enhances the overall quality of the journalistic product.
Thematic analysis of interviews with journalism experts highlighted key aspects of quality in journalism. The experts emphasized that credibility relies on core journalistic principles, including effective questioning techniques and the use of reliable, diverse sources. They also highlighted the importance of investigative journalism, which faces challenges in the online media landscape [
31]. Quality also involves correct language use, effective source management, and informative lead paragraphs. Experts largely disapprove of capital letters in news articles, finding them distracting, but support bullet points for clarity.
Moreover, according to the experts, the ideal article length varies based on news type and coverage depth, with some preferring concise texts to retain reader attention. Regarding impartiality, although considered a challenge due to inherent biases, experts agree that journalists should present all perspectives, even when expressing personal opinions. Based on experts’ responses, headlines should accurately reflect the article’s content, supporting the hypothesis that accuracy outweighs emotional appeal [
16,
17]. Experts also affirmed the value of audiovisual material in digital journalism, provided it is relevant to the text, aligning with the third hypothesis (H3) on its integral role in quality news content.
The insights provided by the experts on structuring journalistic discourse within a framework of qualitative and engaging journalistic writing were substantial and significant. However, a deeper analysis of the findings from this qualitative research method would be beyond the scope of this paper [
32].
The preliminary qualitative research findings, which emerged from the experts’ interviews, described several key issues: credibility, diversity of opinions and sources, language quality, text characteristics such as punctuation and article length, impartiality, importance of the headline, emotionality, and the role of accompanying audiovisual material. These main themes formed the basis for developing perceived quality indicators. In other words, experts’ insights on crafting high-quality, engaging journalistic articles were used as input data to train the machine learning model in Phase B of the study.
2.2. Phase B: Computational Model Development and Training
As mentioned in the previous research phase, the findings of the qualitative research, which were derived from the interviews with the experts, identified key topics and characteristics that formed the foundation for the development of the quality indicators. These indicators were then used as input features for training the machine learning model in phase B of the study. This phase, therefore, examines the various steps involved in the computational approach, with the aim of building machine learning models able to predict the quality and engagement of online social media news.
2.2.1. Data Collection, Preprocessing, and Feature Extraction
Initially, in the second phase of the study, a text analysis was performed on a dataset of over 10 million Greek news articles published on news websites from 2021 to 2023. Specifically, the content was sourced from 7,359 Greek news websites, with articles selected from the following categories: Economy, Politics, International, Sports, Technology, Culture, Society, News, Health, Tourism, and Lifestyle. After tagging a representative sample of publications and analyzing all sites, approximately 2.5 million articles were selected. Before using the corpus, Python 3 (
https://www.python.org/ (accessed on 29 April 2025)) techniques were used to preprocess the dataset by removing stopwords, symbols, nonstandard words, NaN values, and HTML code from the texts. After the data cleaning stage, 902,133 unique article texts from high-quality news websites and 607,704 unique texts from tabloid websites remained.
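As a rough illustration (not the project’s actual code), the cleaning stage described above can be sketched in Python; the stopword sample, regular expressions, and deduplication logic below are simplified stand-ins:

```python
import re
import unicodedata

# Illustrative Greek stopword sample; a real pipeline would use a full list.
STOPWORDS = {"και", "να", "το", "η", "ο", "με", "σε", "για"}

def clean_text(raw: str) -> str:
    """Strip HTML tags, symbols, and stopwords from one article body."""
    text = re.sub(r"<[^>]+>", " ", raw)        # drop HTML markup
    text = unicodedata.normalize("NFC", text)  # normalize accented characters
    text = re.sub(r"[^\w\s]", " ", text)       # drop symbols and punctuation
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)

def clean_corpus(articles) -> list:
    """Skip missing (NaN-like) entries, clean the rest, keep unique texts."""
    seen, cleaned = set(), []
    for raw in articles:
        if not isinstance(raw, str) or not raw.strip():
            continue                            # NaN / empty rows
        text = clean_text(raw)
        if text and text not in seen:           # deduplicate
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

A call such as `clean_corpus(["<p>Hello world!</p>", None])` returns `["hello world"]`; at corpus scale, the same steps would run over the full article dump.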
To generate advanced features aligned with the theoretical framework, a range of text analysis methods, NLP libraries, and Python packages were applied. Techniques such as tokenization, stemming, lemmatization, and part-of-speech tagging were employed along with multiple lexicons which required the original form of words. In certain cases, the raw text was used instead of the preprocessed text to identify adjectives or evaluate the level of readability.
For sentiment analysis, we chose the dictionary method and used translated versions of emotion and subjectivity lexicons. Specifically, to assess subjectivity, we used the Multi-perspective Question Answering (MPQA) subjectivity lexicon by Wilson, Wiebe, and Hoffmann [
33], freely available for research purposes. To measure emotionality, we utilized three established dictionaries: the NRC Word-Emotion Association Lexicon (EmoLex), with 14,182 words associated with 8 basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) at 2 levels (1 = associated, 0 = not associated) and 2 polarities (positive and negative); the National Research Council Canada Valence, Arousal, and Dominance Lexicon (NRC-VAD) [34], containing 20,000 words labeled with scores for valence, arousal, and dominance; and the National Research Council Canada Affect Intensity Lexicon (NRC-AIL) [35], with 5,815 terms rated by intensity for 4 basic emotions (anger, fear, joy, and sadness) [
36,
37].
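The dictionary method itself is straightforward: score each article by the share of its tokens that a lexicon associates with each emotion. A minimal sketch follows (the toy word list is invented and merely stands in for the translated NRC lexicons):

```python
# Invented toy lexicon; the real EmoLex maps 14,182 words to 8 emotions.
TOY_LEXICON = {
    "attack": {"anger", "fear"},
    "win": {"joy", "trust"},
    "loss": {"sadness"},
}

def emotion_scores(text: str) -> dict:
    """Fraction of tokens associated with each emotion category."""
    tokens = text.lower().split()
    counts = {}
    for tok in tokens:
        for emo in TOY_LEXICON.get(tok, ()):
            counts[emo] = counts.get(emo, 0) + 1
    n = max(len(tokens), 1)
    return {emo: c / n for emo, c in counts.items()}
```

For example, `emotion_scores("win win loss")` assigns joy a score of 2/3 and sadness 1/3; intensity-rated lexicons such as NRC-AIL would sum per-word intensities instead of raw counts.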
The dependent categorical variable was determined by annotators: articles receiving at least two positive responses to the quality question were assigned a score of 1, while the rest were assigned a score of 0. Independent variables for the model were selected and operationalized based on the theoretical framework dimensions, which, according to the literature, define journalistic quality (refer to
Table 2 for measurement details). To examine general attributes across a large corpus, we employed a computer-assisted text analysis methodology, using concept measurement.
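The annotation rule above translates into a one-line labeling function; a minimal sketch follows (`build_dataset` is an illustrative helper, not part of the described pipeline):

```python
def quality_label(votes) -> int:
    """1 if at least two annotators answered the quality question
    positively, else 0."""
    return int(sum(bool(v) for v in votes) >= 2)

def build_dataset(annotations, features):
    """Pair each article's feature dict with its annotator-derived label."""
    X = [list(f.values()) for f in features]
    y = [quality_label(v) for v in annotations]
    return X, y
```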
The primary objective of the research was to explore, verify, and evaluate the theoretical framework of quality in journalistic discourse, derived from the previous phases of the project and from the study of the literature, rather than simply constructing an efficient classifier, which could be achieved by simpler methods.
2.2.2. Machine Learning Model Development
To develop the machine learning models, we implemented multiple classification algorithms using the Python library scikit-learn (
https://scikit-learn.org/stable/ (accessed on 29 April 2025)). Specifically, we employed seven distinct models, i.e., Naive Bayes, K-Nearest Neighbors, Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and the XGBoost classifier, which leverages gradient boosting to refine predictions by adding new trees to the previous model. These models were selected to cover a diverse set of classification algorithms, ranging from simple methods to more complex ones, balancing model performance, interpretability, and computational efficiency, and allowing us to evaluate their suitability for predicting news quality and engagement. For model training and evaluation, the dataset was divided, with 80% of the articles used for training and the remaining 20% for testing. To evaluate the different classification methods, the weighted average F-measure (F1) was applied, i.e., the per-class harmonic mean of precision and recall, weighted by class support.
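For reference, the weighted-average F1 used here can be sketched in plain Python; it mirrors what scikit-learn’s `f1_score` computes with `average="weighted"`:

```python
def weighted_f1(y_true, y_pred) -> float:
    """Per-class F1 (harmonic mean of precision and recall),
    weighted by each class's support in y_true."""
    score, total = 0.0, len(y_true)
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        support = sum(1 for t in y_true if t == c)
        score += f1 * support / total
    return score
```

With `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 0, 0]`, class 1 scores F1 = 2/3 and class 0 scores F1 = 0.8, giving a weighted average of about 0.733.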
The process of text mining refers to the application of methods for extracting and analyzing patterns and insights from unstructured textual content. Initially, three techniques were used for the baseline models: Bag of Words (bigrams and trigrams), TF-IDF (Term Frequency-Inverse Document Frequency), and the BERT language model. More specifically, the Bag of Words model was employed to transform each text into a numerical representation, where the frequency of each word was encoded into a vector. These numerical vectors were then used as input features for the machine learning algorithms.
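The bag-of-n-grams encoding can be sketched without any library (a simplified stand-in for the CountVectorizer-style transformation typically used for such baselines):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(texts, n=2):
    """Encode each text as a frequency vector over a shared n-gram vocabulary."""
    token_lists = [t.lower().split() for t in texts]
    vocab = sorted({g for toks in token_lists for g in ngrams(toks, n)})
    vectors = []
    for toks in token_lists:
        counts = Counter(ngrams(toks, n))
        vectors.append([counts.get(g, 0) for g in vocab])
    return vocab, vectors
```

Here `bag_of_ngrams(["good news today", "good news"])` yields the vocabulary `["good news", "news today"]` and the vectors `[1, 1]` and `[1, 0]`; TF-IDF would additionally down-weight n-grams that are common across documents.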
To investigate our basic hypothesis that specific features of a news item are related to quality, the corpus of news articles was used as input for machine learning models trained on the proposed features to predict article quality. After the initial modeling, a series of analyses followed to interpret the machine learning models, in order to expose the hidden decision-making mechanisms and assess the importance of each dimension in more detail. In the model explanation stage (Explainable AI), all available tools in the Python language were used, i.e., SHAP (
https://shap.readthedocs.io/en/latest/generated/shap.Explanation.html (accessed on 29 April 2025)), LIME (
https://c3.ai/glossary/data-science/lime-local-interpretable-model-agnostic-explanations/#:~:text=LIME%2C%20the%20acronym%20for%20local,to%20explain%20each%20indivi (accessed on 29 April 2025)), TreeInterpreter (
https://github.com/andosa/treeinterpreter (accessed on 29 April 2025)), Eli5 (
https://eli5.readthedocs.io/en/latest/overview.html (accessed on 29 April 2025)), and DTreeViz (
https://github.com/parrt/dtreeviz (accessed on 29 April 2025)).
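The core idea behind the permutation-importance analyses these tools support can be sketched model-agnostically: shuffle one feature column, re-score, and record the accuracy drop. This is a simplified stand-in for the ELI5/SHAP computations; `predict`, the data, and the repeat count are illustrative:

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(X, col)]
            drops.append(base - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets importance exactly 0, since shuffling it never changes a prediction; features the model relies on show a positive mean drop.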
The dataset of online news stories was also used as input for the machine learning algorithms in order to test our hypothesis that specific inherent features of a news item can influence its engagement on social media. To create the dependent variable distinguishing successful from unsuccessful articles on Facebook, we separated the data into two groups: the low group consisted of articles at or below the 5th percentile of engagement (assigned a value of 0), and the high group of articles at or above the 95th percentile (assigned a value of 1). Our reasoning was that, to discover the hidden patterns behind engaging news content, we should contrast the most popular articles with those that attracted virtually no engagement. Furthermore, we created four models, each one predicting a different engagement metric, namely Likes, Shares, Comments, and the number of Total Interactions, which included reactions such as Love, Care, Haha, and Wow. The independent variables (see
Table 3) represented various engagement and quality indicators drawn from the literature, selected for their potential to reflect audience engagement with news content on Facebook. From the 7,359 websites, after tagging a representative sample of posts and analyzing all websites in relation to user engagement with Facebook and Twitter posts, over 5 million articles were selected.
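The percentile split that defines the engagement label can be sketched as follows (nearest-rank percentile; the actual pipeline would operate on the interaction counts of the full article set):

```python
def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (0 < p <= 100)."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, int(round(p / 100 * len(s))) - 1))
    return s[k]

def engagement_labels(interactions):
    """Keep only the extremes: articles at or below the 5th percentile
    get label 0, those at or above the 95th get label 1; the middle
    mass is excluded from the training data."""
    low = percentile(interactions, 5)
    high = percentile(interactions, 95)
    return [(x, 0) for x in interactions if x <= low] + \
           [(x, 1) for x in interactions if x >= high]
```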
The engagement analysis relied on tree-based models, as previous studies [
53,
54] have demonstrated that such classifiers offer in-depth interpretations of the model predictions. Accordingly, a binary classification task was conducted using three tree-based models: Decision Tree and Random Forest from the scikit-learn Python library, and the XGBoost classifier. The dataset was split into 80% for training and the remaining 20% for testing. The F-measure (F1) was selected as the primary evaluation metric. Finally, several explainability techniques were also employed to extract rules associated with audience engagement.
At this stage of the research, the following research hypothesis was proposed:
H4. Lower levels of subjectivity and entertainment will predict higher quality, whereas higher levels of emotional content and poor language use will predict lower quality.

The accuracy of the quality classification models is presented in Table 4; the best-performing models are the XGBoost classifier and Random Forest. The preferred classifier correctly classified 85% of the news articles into high- or low-quality categories based on the identified attributes, while the other models achieved F1 scores ranging from 80% to 85%. To find the most important features, we used the ELI5 Python library for “Inspecting Black-Box Estimators” to compute permutation importance (Figure 1). The accuracy of the models therefore indicates that the theoretical quality framework is effective for Greek articles. Specifically, as shown in
Figure 1, readability is the primary predictor for the model. The concept of readability refers to the overall sum of all the features of a text and their interactions that affect whether a reader will read it successfully. The degree to which a text is successfully read lies in the extent to which readers understand it, retain the information taken in from it, read it at a satisfactory speed, and find it interesting [
55].
Additionally, in the ranking, it appears that emotions play a particularly important role in prediction, with Anger intensity, Joy intensity, Sadness intensity, and Trust ranking high. The number of adjectives and the presence of capitals in the body of the article are also important. Finally, the number of celebrities appearing in the article seems to play a large role in prediction.
To improve the interpretability of the predictions, we analyzed the graphical representation of a Decision Tree from the XGBoost classifier to propose rules related to journalistic quality. However, extracting all rules from a Decision Tree can become increasingly challenging as the tree expands, leading to greater complexity and reduced interpretability. Furthermore, since constraints along a single path are conjunctive and different paths may give conflicting information, additional post-processing is often required to improve the original rule set. To address this, we incorporated measures of confidence and opted for the “most important” rules. Therefore, we selected the leaf nodes with a high probability of belonging to class 0 (low quality) or class 1 (high quality), containing a large number of samples (20% of the total training samples) at the tree construction phase, while at the same time the misclassification error during the test phase for those leaf nodes remains very low (<0.1). Using Python’s DTreeViz library [
56], we visualized the algorithm’s decision-making process at the leaf nodes, and based on the above preconditions for node selection, this approach supports confident generalization of the extracted rules.
The results identify six key rules for news discrimination based on nine important subdimensions of the framework. According to these rules (see
Table 5), a high-quality article tends to be difficult to read, suitable for people with at least a third-grade education or age 17 or older, and has words of fewer than six characters on average. In addition, such articles occasionally convey positive emotions, with the number of adjectives varying according to other characteristics of the text, such as length, references to famous people, or references to crimes, accidents, or conflicts. For journalistic articles that are more readable—understandable by people without a high school education or under 17 years old—the prediction model considers them to be of high quality if less than 21% of the words in the articles express joy. In addition, in shorter texts (less than 243 words) adjectives should make up more than 15% of the total words, while longer texts should contain words that convey emotions such as anger or confidence.
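The leaf-selection logic described above (keeping only leaves that are both class-pure enough and supported by a sufficient share of training samples) can be sketched as follows. The tree, feature names, and thresholds below are hypothetical illustrations, not the actual XGBoost trees:

```python
# Each internal node tests "feature <= threshold"; leaves carry the majority
# class, its probability, and the number of training samples that reached them.
def extract_rules(node, total_samples, min_prob=0.9, min_frac=0.20, path=()):
    """Collect (conditions, class) rules from leaves that are both pure
    enough (min_prob) and well supported (min_frac of training samples)."""
    if "leaf_class" in node:
        if (node["prob"] >= min_prob
                and node["samples"] >= min_frac * total_samples):
            return [(list(path), node["leaf_class"])]
        return []
    f, t = node["feature"], node["threshold"]
    rules = []
    rules += extract_rules(node["left"], total_samples, min_prob, min_frac,
                           path + (f"{f} <= {t}",))
    rules += extract_rules(node["right"], total_samples, min_prob, min_frac,
                           path + (f"{f} > {t}",))
    return rules

# Hypothetical tree: a readability split, then a joy-word ratio split
# on the easier-to-read branch.
tree = {
    "feature": "readability", "threshold": 60,
    "left": {"leaf_class": "high", "prob": 0.95, "samples": 300},
    "right": {
        "feature": "joy_ratio", "threshold": 0.21,
        "left": {"leaf_class": "high", "prob": 0.92, "samples": 250},
        "right": {"leaf_class": "low", "prob": 0.70, "samples": 50},  # too impure
    },
}
rules = extract_rules(tree, total_samples=1000)
# Only the two confident, well-supported leaves yield rules.
```
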
For the engagement prediction classification models, we conducted three experiments, each targeting a different variable: Likes, Shares, and Comments. Using three classifiers, the XGBoost model demonstrated the highest accuracy, achieving an F1-score of 91% for predicting Likes (see
Table 6). The Information Quality dimension emerged as the most important, with the number of words in the headline consistently ranking as the strongest predictor across all three engagement metrics. Other key attributes, such as readability, article length, and diversity, also ranked high in the permutation importance table. Furthermore, all four dimensions of the framework contributed meaningfully to the model’s performance. Emotionality indicators like anticipation, dominance, arousal, and valence proved influential; Subjectivity within the article body and the Facebook headline played a role in engagement prediction; and finally, from the Entertainment dimension, the presence of famous people was important. We observe that the significance of some features changes depending on the target variable, which is to be expected according to the literature [
57]. For instance, the number of celebrities mentioned in the article is the strongest predictor for Comments and Likes but has no significant effect on Shares. Additionally, the emotion of dominance is crucial for Likes, fear is relevant only for Likes, and positivity is relevant for Comments.
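For reference, the F1 score used to evaluate the engagement models is the harmonic mean of precision and recall for the positive class; a minimal sketch with hypothetical labels for a high/low-engagement split:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground truth and predictions (1 = high engagement):
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)  # precision = recall = 0.8, so F1 = 0.8
```
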
An examination of the five most significant decision paths revealed a set of rules which, if applied by professional journalists prior to publishing, could enhance engagement levels on Facebook. In general, the basic rule is this: a story with a high likelihood of engagement tends to have an average word length of up to five characters, is easy to understand, includes positive language, and is made up of less than 11% adjectives. In addition, the story should express trust and include references to crimes, conflicts, or accidents. If the article refers to famous people, evoking emotions such as fear or excitement is likely to improve engagement.
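The basic engagement rule above can be expressed as a simple pre-publication checklist. The function below is an illustrative sketch only: the field names are hypothetical, and the thresholds are taken from the extracted decision paths described above, not from the IQJournalism implementation itself:

```python
def likely_engaging(article):
    """Check an article dict against the basic engagement rule:
    short words, easy to read, positive language, < 11% adjectives,
    expresses trust, and references crimes/conflicts/accidents."""
    words = article["text"].split()
    avg_len = sum(len(w) for w in words) / len(words)
    adjective_ratio = article["n_adjectives"] / len(words)
    return (avg_len <= 5
            and article["easy_to_read"]
            and article["positive_language"]
            and adjective_ratio < 0.11
            and article["expresses_trust"]
            and article["mentions_crime_conflict_or_accident"])

# Hypothetical article metadata for illustration:
article = {
    "text": "short clear words keep the story easy to read and share",
    "n_adjectives": 1,
    "easy_to_read": True,
    "positive_language": True,
    "expresses_trust": True,
    "mentions_crime_conflict_or_accident": True,
}
result = likely_engaging(article)  # all conditions hold, so True
```
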
2.3. Phase C: User-Centered Design
The development of the intelligent advisor IQJournalism followed a design thinking approach, initially focusing on a deep understanding of the user’s specific requirements. This user-centered methodology then incorporated an iterative design process involving prototyping, testing, and redesigning [
58]. The design and development steps followed in creating the IQJournalism system are depicted in
Figure 2 below.
The process began with user research, in our case with journalists, to identify the needs and challenges they face during the writing process. These needs mainly concern the composition, editing, and revision of the text, based on system-generated suggestions, along with organizing documents and managing files. Given that journalists often rely on similar writing tools, our research also included an analysis of popular online writing assistants to understand common functionalities and user expectations. Most of the authoring tools and AI assistants we evaluated shared common features, such as a minimalist design to support focused and uninterrupted writing, AI feedback displayed to the right of the text, and color-coded suggestions.
The above research resulted in the prototype development of the intelligent text editor. More specifically, during the ideation stage, we developed various design alternatives for the tool’s Homepage and User Dashboard using Drawio (
https://www.drawio.com/ (accessed on 29 April 2025)). As shown in
Figure 3, the text editing interface is centered, with a main menu on the left sidebar offering essential functions like saving and printing. The right sidebar provides text quality improvement suggestions in terms of “Language quality”, “Subjectivity”, “Emotionality”, and “Entertainment”, displayed through color-coded progress bars for quick visual assessment. Users can access more details by clicking the “See details” hypertext beneath each category. In the foreground, the “Text Preferences” pop-up window appears, where users initially set article parameters such as the “Genre”, “Medium”, “Length of the text”, and the multimedia inclusion of “Photographs” and “Videos”. This window is accessible anytime via the “Edit Preferences” button.
This preliminary concept was tested with a focus group, gathering valuable user insights that helped refine the design prototype for greater usability [
59]. The focus group method facilitated group interaction, allowing us to adapt questions and gather important observations on the design of the system. This session included 10 Media Studies postgraduate students (8 female, 2 male, aged 23–40), all of whom had prior experience with professional text editing tools. After reviewing the designs, participants expressed preferences and suggested improvements for features such as a login button, navigation menu, color palette, IQ mode button, and performance score display.
The next step involved refining the initial designs based on focus group feedback and creating an interactive functional prototype using Figma (
https://www.figma.com/ (accessed on 29 April 2025)). This prototype allowed users to engage with the system’s computational layer, which assesses an article’s perceived quality. The prototype included several key interfaces: a Homepage featuring a “Start Writing” prompt, top menu, and user sign-up options; a User Dashboard for file management and document creation; and a Document Page, where users can adjust preferences, review system feedback through the IQ Mode button, and make necessary changes. The following diagram (
Figure 4) depicts a visual representation of the system’s entities. It provides a clear overview of the components, along with the interactions and actions available to the user.
Indicatively,
Figure 5 illustrates the main interface of the IQJournalism platform, which includes two sidebars that host tools and information. These sidebars can be displayed or hidden by pressing the IQJournalism button. The left sidebar (Menu) allows users to perform a variety of actions, such as creating new files, uploading or downloading text, printing documents, editing their profile, and accessing informational and help material related to the application. The right sidebar (IQJournalism) displays detailed analysis results provided by the platform. These results include assessments of the text’s language quality, subjectivity, emotionality, and entertainment value. At the top-right corner of the screen, the platform presents an overall performance score for the article, along with a button labeled “Edit Preferences” for defining or updating the article’s attributes.
To evaluate the IQJournalism prototype’s user experience, we conducted a moderated desirability (light) usability study over two weeks in May 2023 at the Department of Communication and Media Studies. For the purpose of the usability study, five research hypotheses were formulated:
H5. The interaction with the prototype results in a very positive user experience among the participants [19].
H6. The perceived usability score of all participants is higher than the standard average SUS score of 68 [20].
H7. Participants’ NPS score is above 30, indicating a strong tendency to recommend the IQJournalism system [21].
H8. The distribution of participants’ scores regarding their overall experience and satisfaction with the prototype shows a central tendency toward the highest values on the Likert scales, indicating strong positive attitudes.
H9. Most participants complete the prototype’s tasks quickly and efficiently (i.e., SWA < 1).
The study involved 20 postgraduate students (18 female, 2 male) aged 20–35, experienced in journalism and familiar with online editing tools. Each participant, guided by one or two moderators, performed situation-specific tasks designed to provide both implicit and explicit feedback. Participants completed three tasks: uploading a text file to assess its predicted performance, modifying article preferences to observe changes in performance scores, and improving the language quality score before downloading the file. Task performance was recorded, including timing and assistance levels. To capture usability and desirability, we used several assessment approaches: the User Experience Questionnaire (UEQ), System Usability Scale (SUS), Product Reaction Cards, perceived satisfaction items, Net Promoter Score (NPS), and open-ended questions. These tools allowed us to gather insights into user satisfaction and acceptance, combining qualitative feedback and quantitative measures like NPS, which indicated participants’ likelihood of recommending IQJournalism.
Participants also completed a Google Forms survey that documented demographic information, experience with web-based editors, and expertise in editing and authoring tasks. This valuable feedback highlighted the prototype’s strengths in visual design, ease of use, and usefulness, alongside areas for further refinement. The final stage in the user-centered iterative process involved refining IQJournalism’s design and functionality, driven by user insights and testing. Iterative improvements enhanced the tool’s usability, performance, and overall user experience, leading to a fully functional version. During this stage, a fully operational software system was built and incorporated into the text editor, integrating the machine learning models and additional functionalities described in Phase B of the methodology.
3. Results
The evaluation of IQJournalism focused on user experience, usability, satisfaction, and user performance when interacting with the IQJournalism prototype.
3.1. User Experience
Using the UEQ, which covers Pragmatic quality, measuring task-related aspects such as efficiency and ease of use, and Hedonic quality, which captures the system’s appeal, enjoyment, and originality, participants rated IQJournalism highly on both dimensions, with Cronbach’s alpha reliability scores above 0.7 (α = 0.72 for the Pragmatic and α = 0.80 for the Hedonic quality, respectively). According to the UEQ benchmarking framework [
19], the results were as follows: Pragmatic quality scored 1.81 (“Excellent”), Hedonic quality scored 1.12 (“Above Average”), and the overall experience scored 1.47 (“Good”). Additionally, using Product Reaction Cards, 80% and 70% of users described the system with positive attributes like “Easy to use” and “Friendly”, respectively, 55% described it as “Clean” and 50% as “Efficient”, while few selected negative terms such as “Sterile” (5%) or “Inconsistent” (5%). All these scores indicated strong user acceptance, simultaneously confirming H
5.
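For reference, the Cronbach’s alpha reliability coefficient reported above can be computed as α = k/(k−1) · (1 − Σσ²_item / σ²_total), where k is the number of items; a minimal sketch with hypothetical item responses (not the actual UEQ data):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a k-item scale.
    item_scores: one list of responses per item (respondents in the same order)."""
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Hypothetical responses from five participants to three correlated items:
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(items)  # about 0.886, i.e., above the 0.7 threshold
```
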
3.2. SUS and NPS
The SUS baseline score of 82.5 indicated high perceived usability, with a standard deviation of 7.8, well above the threshold of 68 (above average [
60]). These results showed that participants’ interactions with the prototype tasks positively influenced their perceptions of usability, thus supporting the research hypothesis related to the usability of the system (H
6). The NPS score of 40 (Mdn = 8.5, IQR = 7.75) showed a high likelihood of recommendation, thereby confirming the seventh hypothesis. Based on the scale categories outlined by Reichheld [
61], among the 20 participants, 5 (25%) rated the system a 10 and another 5 (25%) rated it a 9 (categorized as Promoters); conversely, 2 (10%) participants fell into the category of Detractors (scoring 5 and 6). Among the users categorized as Passives, 8 participants (40%) provided scores of 7 and 8. These responses showed a generally positive attitude towards IQJournalism, indicating a potential for favorable word-of-mouth among peers and future users of the proposed system [
62]. Given the established strong positive correlation between SUS and NPS (r = 0.61) [
62], the high perceived usability score seems to align with the observed NPS. However, a higher NPS score closer to 70 [
63] might have been expected. This difference can be attributed to the limitations of the prototype used in the evaluation. Compared to a fully developed version, the first version of the interactive prototype may have lacked certain refinements, such as faster response times and clearer task guidance, which could have influenced users’ perceptions during the evaluation.
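The NPS of 40 follows directly from the rating distribution reported above: 50% Promoters (scores 9–10) minus 10% Detractors (scores 0–6). A minimal sketch (the exact split of the eight Passives between 7s and 8s is assumed here, though it does not affect the score):

```python
def net_promoter_score(ratings):
    """NPS = %Promoters (9-10) minus %Detractors (0-6) on a 0-10 scale;
    Passives (7-8) are counted in the total but cancel out."""
    n = len(ratings)
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / n

# The distribution reported above: 5 tens, 5 nines, 8 passives, 2 detractors.
ratings = [10] * 5 + [9] * 5 + [8] * 4 + [7] * 4 + [6, 5]
nps = net_promoter_score(ratings)  # 50% - 10% = 40
```
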
3.3. Satisfaction
Regarding users’ perceived satisfaction and performance, the findings revealed a generally positive consensus. Participants’ responses were concentrated toward the higher end of the scale (i.e., 5, 6, and 7), indicating strong positive attitudes. In response to the question, “How easy is it to deal with IQJournalism for doing your job?”, the participants provided positive feedback (M = 6, SD = 1.02), with 20% selecting 5 on the scale, 60% selecting 6, and 35% selecting 7. For the statement “Using the data from the various system’s views (e.g., articles’ preferences, predicted performance), I can self-reflect and get a good understanding of my performance”, 85% of the respondents expressed general agreement (M = 5.3, SD = 1.08). Additionally, 90% of the participants provided a positive response to the question “How would you rate your overall satisfaction with the IQJournalism prototype?” (M = 5.65, SD = 1.03). Regarding the negatively worded item “When I interact with the various views of IQJournalism (e.g., data entry, dashboard, preferences) for accomplishing my tasks, I usually feel uncomfortable and emotionally loaded (i.e., stressed-out/overwhelmed)”, 85% of participants disagreed, as anticipated (M = 2.05, SD = 1.35). The combination of the above findings supports the acceptance of hypothesis H8. Participants found the system to be engaging and easy to use, demonstrating an awareness of their performance while interacting with the various features of the prototype. Moreover, users reported not feeling stressed or overwhelmed during their interactions with the system and expressed overall satisfaction with the experience.
3.4. Task Performance
During the study, users were asked to perform three primary tasks, capturing SWA, completion time, and open-ended feedback. To test H
9, which states that most participants would perform faster and more efficiently (i.e., SWA < 1) when interacting with the prototype, participants were split into two groups: Group A included those who completed all three tasks with minimal assistance (i.e., SWA < 1) and Group B included participants who required more support during the tasks (i.e., SWA ≥ 1). Overall, a positive correlation was observed between assists and task completion time. Specifically, strong positive correlations were noted for tasks 1 and 2, r = 0.71 and r = 0.65, respectively, and a weak positive correlation for task 3, r = 0.15. This indicates that Group A (the majority of participants) completed the tasks more quickly than those in Group B. Task-specific results are presented in
Figure 6, including the Coefficient of Variation (CV) for each group to better illustrate the relative variability in performance.
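The Coefficient of Variation used in Figure 6 is simply the standard deviation expressed as a percentage of the mean, which makes the variability of groups with different average completion times directly comparable; a minimal sketch with hypothetical completion times:

```python
def coefficient_of_variation(seconds):
    """CV = (population standard deviation / mean) * 100, in percent."""
    m = sum(seconds) / len(seconds)
    var = sum((x - m) ** 2 for x in seconds) / len(seconds)
    return 100 * var ** 0.5 / m

# Hypothetical task-completion times (in seconds) for one group of users:
times = [55, 70, 60, 95, 70]
cv = coefficient_of_variation(times)  # roughly 19.7%
```
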
For task 1, study participants were asked to upload a text file on IQJournalism and review its predicted performance. Prior to this, they read the article to gain an overview of its genre and content. Group A users (65%, CV = 34.8%) completed the task with an average time of M = 01:10, SD = 00:24, and had a mean SWA score M = 0, SD = 0, indicating no significant assistance from the moderators. In contrast, for Group B users (35%, CV = 39.6%), the mean time of completion was M = 02:30, SD = 00:59 and the average SWA score was M = 1, SD = 0, indicating they needed more support. Therefore, users who did not receive assistance completed this task significantly faster. Based on the corresponding qualitative feedback, 11 out of 20 participants described the interaction as easy and user-friendly, noting that the system’s output was clear and understandable. However, 4 participants reported difficulty locating the IQ Mode button quickly.
For Task 2, users were asked to edit the article’s preferences and observe the resulting changes in the system’s output. Group A users (65%, CV = 59.3%) completed this task within an average time of M = 00:44, SD = 00:26, and average SWA score of M = 0, SD = 0. On the contrary, for Group B (35%, CV = 87.9%), the average completion time was M = 01:29, SD = 01:18 and the average SWA score was M = 1, SD = 0.4. Also, users with low SWA scores were able to perform faster during the execution of task 2. In their open-ended responses, users stated that the task was easy to complete and that the changes in performance scores were clear. However, 6 participants (N = 20) mentioned that they had difficulty in locating the “Edit Preferences” button.
For task 3, study participants were asked to edit the article in order to improve the language quality score, download the article, and sign out. In detail, Group A (55%, CV = 61.8%) completed task 3 within an average time of M = 02:07, SD = 01:19. The average SWA score for Group A was M = 0.4, SD = 0.2. For Group B (45%, CV = 24.9%), the average completion time was M = 02:08, SD = 00:32 and the average SWA score was M = 1.1, SD = 0.17. In contrast to tasks 1 and 2, task 3 showed a weak correlation between completion time and SWA (r = 0.15), and users took longer to complete the task irrespective of the level of support received. According to the qualitative feedback, participants faced some challenges: many users expected more guidance from the system, failed to notice the suggestions for improving the language quality score, and had difficulty locating the download button. Despite these challenges, users in Group A completed the task nearly as quickly as those in Group B, even without significant assistance. After implementing iterative adjustments and improvements based on the findings from the usability study, the fully functional version of the system was developed. The IQJournalism system integrates a comprehensive software solution, incorporating machine learning algorithms alongside complementary features into the text editor.
4. Theoretical and Practical Implications
IQJournalism provides a robust framework and a proof-of-concept tool that can effectively support newsrooms in navigating the challenges of digital journalism. By leveraging AI, it offers practical assistance for producing high-quality, engaging content and contributes to the evolving landscape of media technology.
This study contributes to the fields of journalism studies, AI, and human–computer interaction by demonstrating a novel approach to predicting and enhancing journalistic quality and audience engagement in the digital age. By integrating qualitative insights from journalism experts with quantitative methods including machine learning and user-centered design, IQJournalism offers significant theoretical and practical implications. More specifically, the methodology, which began with semi-structured interviews with 20 experts (16 journalists and 4 academic researchers) in Greece, provides a theoretically grounded framework for defining perceived quality in online news content within the Greek context. Key indicators identified by experts, such as credibility, relying on source diversity and investigative journalism, correct language use, effective source management, informative lead paragraphs, appropriate article length, and the importance of headline accuracy over emotional appeal, were validated. The experts also affirmed the value of relevant audiovisual material.
By transforming these expert insights into measurable features, the study empirically tested and validated this theoretical framework using machine learning models. The high F1 scores achieved by models like XGBoost and Random Forest (up to 85% for quality classification) indicate that the identified features are effective predictors of perceived journalistic quality. This confirms that machine learning, guided by expert knowledge, can indeed contribute to understanding and operationalizing quality standards in digital journalism. The analysis also highlighted the distinct contributions of various linguistic and contextual factors to quality prediction, such as readability, emotional intensity (anger, joy, trust), the number of adjectives, the presence of celebrities, and the use of capitals.
Furthermore, the study provides theoretical insights into the factors driving social media engagement with news content. The machine learning models, particularly XGBoost, successfully predicted engagement metrics like Likes, Shares, and Comments with high accuracy (e.g., 91% F1 score for Likes). The identification of features like title length, emotionality (anticipation, dominance, arousal, valence, positivity, fear), subjectivity levels (in the article body and Facebook headline), readability, article length, diversity, and the presence of celebrities as significant predictors contribute to the understanding of how content attributes influence audience interaction on platforms like Facebook. The finding that the importance of features varies depending on the specific engagement metric (Likes, Shares, Comments) aligns with existing literature and provides nuanced insights into audience behavior.
The application of a user-centered, iterative design process (prototyping, testing, redesigning) in developing the IQJournalism platform also offers theoretical contributions to the field of human–computer interaction within the context of AI tools for creative and professional tasks. Evaluating the prototype using metrics like UEQ, SUS, and NPS and task performance analysis provided empirical evidence supporting the effectiveness and user acceptance of this design approach for developing intelligent authoring tools.
Additionally, the developed IQJournalism platform serves as a practical intelligent advisor for journalists and editors. By providing real-time editing recommendations based on dimensions such as language quality, subjectivity, emotionality, entertainment, and social media engagement, the tool empowers media professionals to enhance the overall quality and impact of their writing. It can support various roles within a newsroom, including journalists refining their text, audience editors, and social media specialists optimizing content for digital platforms. This support for readability and engagement can help discourage clickbait practices and contribute to more reliable and engaging online news content.
The system’s design, featuring a user-friendly interface with sidebars displaying analysis results through color progress bars and overall performance scores, provides clear visual feedback. The ability to define and adjust article preferences allows users to tailor the AI analysis to different genres, mediums, and publishing contexts (e.g., adapting content for Facebook or Twitter). Beyond traditional journalism, the platform has practical applications for optimizing content quality and impact in other fields, like marketing, political, and strategic communication. This highlights the broader utility of AI-powered writing assistance across various content creation industries.
To maximize the benefit of this methodology, the findings from the user evaluation during Phase C suggest key areas for practical improvement and future development. Users expressed a strong need for more detailed feedback and clear identification of the specific text passages needing improvement. Incorporating features for tracking changes and offering auto-generation capabilities, such as keyword and synonym suggestions, as well as recommendations for enriching articles with media (photos, videos), are practical steps to enhance the tool’s utility. The request for always-on spelling, grammar, and syntax checks, as well as a text reading time indicator, points to incorporating standard editing features alongside the AI quality analysis. Finally, exploring integration options, such as developing the tool as an add-on for widely used platforms like Word or WordPress, could significantly increase its accessibility and adoption by professionals.
5. Limitations and Future Research
IQJournalism, while offering a novel approach to enhancing journalistic quality through AI, has certain limitations that also highlight key areas for future research and development.
One of the most important limitations is that the machine learning models were developed and trained exclusively based on a dataset of Greek news articles. Consequently, the system’s linguistic adaptability is restricted to the Greek language and context. The intelligent advisor’s recommendations are tailored to the specific nuances of Greek journalistic practices and are not directly applicable to other languages. This limitation points to a clear and necessary direction for future work: to extend the capabilities of the system to other languages. This would involve creating new datasets, adapting NLP techniques, and potentially refining the quality indicators based on linguistic and cultural differences in journalism across various regions.
Regarding the methodology and samples, the user-centered design phase involved evaluating an interactive prototype rather than a fully functional system. While this approach was valuable for gathering initial user feedback and refining the design, the evaluation results, particularly the NPS score, suggested that the prototype may have lacked certain improvements present in a potential final version, such as faster response times or clearer task guidance.
This indicates that a future study evaluating the fully functional system with a broader range of participants, potentially including seasoned professional journalists beyond postgraduate students, would be valuable to fully assess its usability and impact in a real-world newsroom setting. The current usability study used postgraduate students with journalism/editing experience, which provided useful insights but may not capture the full spectrum of needs and workflows of experienced professionals.
Furthermore, user feedback during the prototype evaluation highlighted several areas for practical improvement and future feature development. To maximize the benefit of the IQJournalism methodology and platform, future research and development should focus on: (a) providing more detailed feedback and clearer identification of specific text passages requiring improvement; (b) implementing auto-generation features, such as keyword and synonym suggestions; (c) offering recommendations for enriching articles with media (photos, videos, etc.); (d) incorporating always-on spelling, grammar, and syntax checks; (e) adding a text reading time indicator; and (f) exploring integration options, such as developing the tool as an add-on for widely used platforms like Word or WordPress.
These suggested features, grounded on the user needs and the evaluation of the prototype, represent practical steps for enhancing the tool’s utility and facilitating its adoption in newsrooms and other content creation environments. Implementing these features in the fully operational system constitutes a significant portion of the future research and development agenda, aimed at creating a truly comprehensive intelligent assistant for journalists.
6. Conclusions
IQJournalism showcases the potential of integrating journalism principles with machine learning techniques and user-centered design to enhance the quality and engagement of journalistic content. The qualitative phase (Phase A) validated the fundamental hypothesis that journalistic quality is multifaceted. Experts agreed on the importance of correct language use, the inclusion of essential information (what, where, who, when, and why), the presence of sources within the article, and the representation of diverse points of view or information (pluralism), reflecting findings from previous studies on media trust [
16,
17,
18]. The confirmation of hypotheses H
1, H
2, and H
3 related to the accuracy of the headline [
1,
16], the moderation of emotions [
16,
17], and the importance of audiovisual material [
18] suggests that traditional values continue to guide the perceptions of quality by experts.
In Phase B, the development and testing of machine learning models confirmed that expert-defined features were strong predictors of both perceived quality and social media engagement. The performance of models such as XGBoost and Random Forest (F1 score up to 85%) not only validates the effectiveness of the proposed framework but also demonstrates the possibility of predicting quality from linguistic and contextual factors. Addressing the research question of whether machine learning models could accurately predict social media engagement, the prediction experiments focused on specific metrics, i.e., Likes, Shares, and Comments, with XGBoost achieving an F1 score of 91% for Likes. Title length, emotionality, and references to celebrities emerged as significant predictors. For instance, positivity enhanced Comments, while fear and dominance were linked to Likes. These findings indicate the importance of tailoring content to audience preferences while maintaining journalistic integrity.
The final phase of the methodology (Phase C) focused on iterative design and usability testing, which produced strong user satisfaction metrics. Overall, the usability evaluation of the IQJournalism platform verified its effectiveness and user satisfaction, aligning with the proposed hypotheses (H
5 [
19], H
6 [
20], H
7 [
21], H
8, and H
9). Importantly, qualitative user feedback revealed both enthusiasm and reservations. While the interface and visual scoring elements of the tool were appreciated, participants requested deeper feedback, better change tracking, and broader editing capabilities, suggesting user trust depends not only on technical accuracy but also on how clearly the tool demonstrates its usefulness and how well it integrates into workflows. It is worth noting that users’ requests for features such as real-time grammar checks, keyword suggestions, and integration with platforms such as Word or WordPress reveal user expectations shaped by mainstream writing tools.
The progress of the interactive prototype was driven by iterative adjustments based on the feedback from the usability study, ensuring improvements in usability and user experience. The user-centered design process proved essential to the refinement of the platform, allowing for the incorporation of analysis features. This iterative approach transformed the prototype into a fully functional tool that effectively meets its objectives. Specifically, the final implementation of IQJournalism combines machine learning insights with user-friendly design, offering journalists a tool to enhance content quality and audience engagement. The ability to customize features for different publishing contexts allows for flexibility and adaptability, making the platform a useful service for newsrooms facing the challenges of digital journalism. The project not only contributes to the state of journalism technology but also provides a framework for future research and development in the media sector.