Article

Prioritizing Generative Artificial Intelligence Co-Writing Tools in Newsrooms: A Hybrid MCDM Framework for Transparency, Stability, and Editorial Integrity

by Fenglan Chen 1,*, Bella Akhmedovna Bulgarova 1 and Raman Kumar 2,3
1 Department of Mass Communications, Peoples’ Friendship University of Russia Named After Patrice Lumumba (RUDN University), Miklukho-Maklaya Str. 6, 117198 Moscow, Russia
2 Department of Mechanical and Production Engineering, Guru Nanak Dev Engineering College, Ludhiana 141006, India
3 Jadara Research Center, Jadara University, Irbid 21110, Jordan
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(23), 3791; https://doi.org/10.3390/math13233791
Submission received: 25 October 2025 / Revised: 20 November 2025 / Accepted: 24 November 2025 / Published: 26 November 2025
(This article belongs to the Section E: Applied Mathematics)

Abstract

The rapid integration of generative artificial intelligence (AI) into newsroom workflows has transformed journalistic writing. Still, selecting reliable co-writing tools remains a multi-criteria challenge as it involves technical, ethical, and economic trade-offs. This study develops a hybrid multi-criteria decision-making (MCDM) framework that integrates the Measurement of Alternatives and Ranking according to the Compromise Solution (MARCOS) model with Entropy, CRITIC, MEREC, CILOS, and Standard Deviation objective weighting methods fused through the Bonferroni operator to reduce subjectivity and enhance robustness. Nine generative AI tools, including ChatGPT, Claude, Gemini, and Copilot, were evaluated against sixteen benefit- and cost-type criteria encompassing accuracy, usability, transparency, risk, and scalability. The decision matrix was normalized and benchmarked against ideal and anti-ideal profiles. The MCDM model was validated through correlation and sensitivity analyses using Spearman’s and Kendall’s coefficients. The results indicate that Gemini and Claude achieved the highest overall performance due to superior factual accuracy, transparency, and workflow integration, while ChatGPT demonstrated high linguistic versatility. The hybrid model achieved a stability index above 0.9 across perturbation scenarios, confirming its consistency and reliability. Overall, the proposed MARCOS–objective weight framework provides a mathematically transparent and reproducible decision protocol for newsroom technology evaluation, supporting evidence-based selection of generative AI co-writing systems.

1. Introduction

Artificial Intelligence (AI) co-writing represents a hybrid mechanism of complex text production that is increasingly transformative to contemporary news production, blending machine-generated outputs with human editorial discretion. In its initial phase of adoption, commonly referred to as “robot journalism,” automated text-generating systems produced journalistic content such as financial and sports reports largely from predesigned templates, with minimal human intervention [1]. The emergence of large language models (LLMs) in recent years [2], including GPT, Claude, and Gemini, has enabled more sophisticated forms of co-writing. These include long-form news article summarization, multilingual translation, and stylistic alignment with newsroom editorial guidelines [3]. Such technological integration accelerates routine reporting tasks, thereby allowing journalists to allocate greater effort to investigative and interpretive forms of journalism [4].
Efficiency gains are particularly evident in the pursuit of faster turnaround times for breaking news and scalability across digital platforms [5]. At the same time, AI-augmented co-writing is reshaping the professional role of journalists, who are increasingly positioned as overseers of AI-generated outputs rather than as sole content producers [6]. This transformation, however, introduces critical challenges related to attribution, accountability, and algorithmic bias, highlighting the need for hybrid editorial models that preserve meaningful human oversight. Recent advances in large language models have demonstrated significant improvements in reasoning, multilingual processing, and generative fluency. Chain-of-thought guided understanding has shown enhanced factual consistency in structured scientific contexts [7], while word–phrase fusion techniques further boost text generation coherence [8]. Simultaneously, multilingual and cross-script processing has become crucial in real-world media applications, as highlighted by Eli et al. [9]. Importantly, the increasing complexity and deployment scale of LLMs have raised concerns about safety, transparency, and responsible use [10]. These advancements emphasize the need to consider comprehensive performance, usability, multilingual, and ethical criteria when evaluating generative AI co-writing tools.
This study employs multi-criteria decision-making (MCDM) methods to evaluate generative AI co-writing tools. MCDM techniques such as Measurement of Alternatives and Ranking according to Compromise Solution (MARCOS), Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR, Multicriteria Optimization and Compromise Solution), Preference Ranking Organization Method for Enrichment Evaluations with complete ranking (PROMETHEE-II), Weighted Aggregated Sum Product Assessment (WASPAS), and Evaluation Based on Distance from Average Solution (EDAS) are mathematical frameworks used to compare alternatives across multiple conflicting criteria. These methods are particularly suited to newsroom technology evaluation, where decisions must balance accuracy, usability, transparency, ethical risk, and economic factors. By applying a hybrid MCDM approach, this study develops an analytically rigorous and reproducible protocol for selecting AI co-writing tools. This methodological foundation supports the paper’s aims and contributions, which include (i) constructing a structured decision matrix of leading generative AI tools, (ii) integrating objective weighting techniques fused through the Bonferroni operator, and (iii) implementing the MARCOS method as the primary ranking mechanism supported by comparative and consensus-based MCDM analyses.

Related Work

The incorporation of AI into newsroom environments has fundamentally transformed the delivery of journalistic practices worldwide. Empirical evidence indicates that AI applications reduce response times to breaking news events and enhance scalability, enabling journalists to allocate more time to investigative and analytical journalism [11].
Pathways to adoption vary considerably across regions. A comparative study of newsrooms in Africa and the Middle East highlighted that formal institutional adoption of AI tools remains limited. Usage is often confined to informal or routine tasks such as transcription, idea generation, or information retrieval [12]. Constraints on broader integration are frequently attributed to factors including cost, insufficient language coverage, cultural resistance, and entrenched professional practices. From the perspective of organizational studies, this evolution reflects a reconfiguration of newsroom roles from direct content production to supervisory oversight, necessitating new skill sets and targeted training [13]. Importantly, rather than functioning as a substitute, AI is increasingly regarded as a partner, supporting hybrid editorial models that combine human editorial judgment with automated article generation.
Generative AI tools represent the first large-scale implementation of genuinely collaborative systems for writing and content ideation, marking a shift from automation to co-writing. LLMs such as GPT, Claude, Copilot, and Gemini have been deployed for brainstorming, summarization, and alignment with editorial style guidelines [14]. These tools have been found to enhance readability, audience engagement, and multilingual accessibility [15].
Comparative studies reveal significant differences across tools in terms of accuracy, transparency, and integration with editorial systems. For example, Gemini has been shown to outperform ChatGPT-3 in certain tasks. However, ChatGPT and Microsoft 365 Copilot still provide effective external structures to support writing in both academic and journalistic domains, even though they cannot independently verify factual accuracy; responsibility for fact-checking therefore continues to rest with the user [16]. Additional determinants of newsroom adoption include licensing models, vendor support, and ecosystem integration [17].
Newsroom efficiency is being reshaped by co-writing tools. They are transforming journalistic practices from primarily individual authorship toward supervisory oversight of AI-assisted text production [18]. While this transition enhances efficiency, it simultaneously raises tensions concerning creativity, authorship, and editorial legitimacy [19]. Cross-regional studies reveal that journalists are particularly concerned with issues of misinformation, copyright infringement, and inadequate regulation [20]. Ethical frameworks, such as those advanced by UNESCO, emphasize transparency, human oversight, and editorial responsibility as essential safeguards [3]. Moreover, audience research indicates that although readers often cannot discern between AI-generated and human-authored articles, they generally prefer explicit disclosure of AI involvement [21]. Beyond functional accuracy, generative models carry discursive power. ChatGPT’s coverage of immigration narratives reveals subtle shifts in authority and representation, highlighting the cultural stakes of newsroom adoption [22].
The ethical and editorial implications of AI adoption extend well beyond considerations of cost and accuracy. In recent years, Multi-Criteria Decision Making (MCDM) methods have been increasingly applied to guide technology adoption in newsrooms [23]. To mitigate bias in evaluative processes, scholars advocate the use of objective weighting approaches [24]. Case studies demonstrate the value of these methods in assessing generative AI tools across dimensions such as editorial performance, governance risks, and scalability [25,26]. Furthermore, sensitivity analysis enhances robustness by examining how variations in weighting influence rankings [27]. By integrating efficiency with ethical imperatives, MCDM frameworks promote consensus-oriented, transparent decision-making, thereby anchoring technological innovation in journalistic values. Recent work in Digital Journalism also demonstrated how transparency cues shape audience perceptions. Reference [28] found that attributing stories to AI lowered perceptions of hostile media bias by activating a ‘machine heuristic’ that associates machines with neutrality.
Generative AI co-writing tools present a double-edged sword for newsroom efficiency, scalability, and editorial integrity. Unlike earlier waves of “robot journalism,” which were primarily defined by template-driven automation, contemporary tools such as ChatGPT, Claude, Gemini, and Copilot enable collaborative text production and alignment with editorial standards. News organizations face conflicting objectives, including maintaining content quality, ensuring user accessibility, preserving structural consistency, and adhering to ethical standards. Existing evaluation frameworks predominantly rely on single-factor efficiency metrics, which fail to capture the multidimensional trade-offs associated with cost, transparency, bias mitigation, and integration into newsroom workflows. This gap necessitates the development of a systematic, quantitative framework that evaluates both performance and ethical and editorial considerations. MCDM provides a structured approach that converts heterogeneous criteria into concrete rankings grounded in evidence.
Although the literature on AI adoption in journalism is extensive, most studies offer descriptive accounts of use cases (e.g., automation, summarization, personalization) or ethical dilemmas (e.g., bias, hallucination, attribution), without providing reproducible evaluation frameworks. Empirical, multi-criteria comparisons of generative AI tools that simultaneously address editorial requirements, ethical risks, and economic factors remain scarce. Furthermore, the application of MCDM to media technology selection is limited, and studies rarely incorporate objective weighting methods, such as Entropy and CRiteria Importance Through Intercriteria Correlation (CRITIC), in combination assessments.
Although numerous fuzzy MCDM extensions, such as Fuzzy TOPSIS, Fuzzy VIKOR, and Fuzzy AHP, have been widely applied in decision analysis, they were not adopted in this study for two methodological reasons. First, fuzzy approaches require subjective linguistic assessments to construct membership functions, reintroducing expert-driven uncertainty into a framework intentionally designed to rely exclusively on objective data. Because one of the goals of this study is to minimize evaluator subjectivity and ensure reproducibility across diverse newsroom contexts, purely objective weighting schemes and crisp values were deemed more appropriate. Second, the dataset used in this evaluation consists of structured, numerical performance indicators without linguistic or ambiguous terms that typically necessitate fuzzy modeling.
Recent developments, such as hybrid fuzzy MCDM frameworks, have further expanded the capabilities of classical models, demonstrating the field’s rapid progression. For example, Abdulaal et al. [29] proposed a fuzzy MEREC–G–TOPSIS model for prioritizing strategic objectives in higher education, highlighting the usefulness of fuzzy extensions in settings with ambiguous expert judgments. Similarly, Akhtar et al. [30] and Akram et al. [31] demonstrated that fuzzy TOPSIS, fuzzy MARCOS, and fuzzy ELECTRE variants effectively handle linguistic uncertainty and subjective assessments in healthcare and water resource management. Cheng [32] further advanced fuzzy TOPSIS by integrating a circular T-spherical fuzzy Bonferroni mean operator for complex linguistic rating aggregation. In parallel, new geometric ranking methods, such as RADAR and fuzzy RADAR, have emerged, enabling the visual interpretation of multi-criteria performance and providing enhanced robustness in handling uncertainty [33,34]. Additional contributions [35,36,37,38] continue to show the versatility of fuzzy MCDM across domains, including healthcare prioritization, autonomous robotics, and industrial decision-making. These studies demonstrate the broad evolution of fuzzy and hybrid MCDM techniques, while also highlighting that their reliance on linguistic and subjective inputs differs from the objective, data-driven design adopted in the present paper.
This study aims to develop a mathematically transparent and robust framework for prioritizing generative AI co-writing tools for newsroom adoption using a hybrid MCDM approach. Since newsroom decisions involve multiple trade-offs (technical, ethical, operational, and economic), MCDM provides a structured way to compare tools under heterogeneous criteria. This study, therefore, aims to:
  • Construct a decision matrix evaluating AI co-writing tools across criteria of content quality, usability, ethics, and economics.
  • Apply hybrid objective weighting methods (Entropy, CRITIC, MEthod based on the Removal Effects of Criteria (MEREC), Criteria Importance through the Level of Supply (CILOS), and Standard Deviation) integrated through the Bonferroni operator to generate robust criteria weights.
  • Employ MARCOS as the principal ordinal ranking method and validate results comparatively against TOPSIS, VIKOR, PROMETHEE-II, WASPAS, and EDAS.
  • Enhance confidence in the rankings through correlation, sensitivity, and stability analyses.
  • Derive a consensus ranking using Borda and Copeland aggregation, thereby providing newsroom decision-makers with a reliable meta-decision tool.
This research contributes in four distinct ways:
  • Theoretical contribution: It advances a hybrid MCDM framework that integrates technical efficiency with ethical and editorial imperatives, bridging computational decision science and journalism studies.
  • Methodological contribution: The integration of Entropy–CRITIC weighting with MARCOS ranking and multiple MCDM models ensures methodological rigor and robustness.
  • Practical contribution: The decision matrix and consensus ranking provide actionable insights for newsroom managers, enabling them to select AI co-writing tools aligned with their organizational values and constraints.
  • Ethical contribution: The explicit inclusion of bias mitigation, transparency, and attribution criteria operationalizes ethical frameworks (e.g., UNESCO’s Trustworthy AI principles) into quantifiable measures, promoting the adoption of responsible AI.
Although newsroom automation has been studied from organizational and ethical perspectives, relatively little focus has been given to the mathematical optimization of tool selection under conflicting criteria. Conventional MCDM methods such as TOPSIS and VIKOR are sensitive to data scaling and criterion correlations, which can distort rankings when input data are heterogeneous. Prior research has shown that distance-based methods, such as TOPSIS, are affected by the relative scale of criteria and may produce inconsistent rankings when criteria exhibit strong correlations [39,40]. Similarly, VIKOR’s compromise ranking can shift considerably under different normalization schemes or variations in criterion relationships [41,42]. These limitations highlight the need for more stable benchmarking approaches when dealing with heterogeneous data. To overcome these issues, this study uses the MARCOS method combined with objective weighting schemes merged through the Bonferroni operator. This approach guarantees normalization invariance, extracts balanced information from both dispersion and correlation structures, and maintains numerical stability in ranking generative AI tools. This methodological improvement adds to the growing class of hybrid MCDM models that have recently exhibited enhanced discriminatory power and robustness [43,44].

2. Materials and Methods

The mathematical significance of this work lies in four aspects: (1) development of a hybrid MARCOS–objective weights–Bonferroni framework for unbiased and scale-independent multi-criteria ranking; (2) formulation of a Stability Index (SI) to measure how sensitive rankings are to changes in weights; (3) validation of consistency between methods using Spearman’s and Kendall’s correlation measures, establishing reliable rank preservation; and (4) applying consensus-based rank aggregation, where the Borda and Copeland methods combine the rankings produced by all MCDM models to generate a single unified decision vector that reflects the overall collective preference.
The methodological improvements in this study blend well-known ideas from the MCDM literature with new contributions. The use of Entropy, CRITIC, and MARCOS individually is well documented in prior research. However, the novelty of this work lies in combining them into a MARCOS–objective weights–Bonferroni framework that guarantees scale independence and minimizes subjectivity. Furthermore, the development of a Stability Index, along with the utilization of Borda–Copeland rank aggregation for consensus building across multiple MCDM methods, represents a methodological extension that enhances transparency and robustness in decision-making processes. These methodological advances distinguish this study from previous MCDM applications.

Hybrid Objective-Weighted MCDM Framework for Evaluating AI Co-Writing Tools

This process is operationalized through the hybrid MCDM workflow, as illustrated in Figure 1. The flowchart operationalizes a hybrid, objectively weighted MCDM pipeline tailored to newsroom objectives across performance, ethics, and cost. The importance of each alternative under the objective weighting schemes, Entropy, Standard Deviation, CRITIC, MEREC, and CILOS, is calculated, as these schemes capture distinct informational aspects such as dispersion, contrast/correlation, and marginal contribution [45].
Only objective weighting schemes were used in this study because newsroom evaluations of AI tools are subject to considerable variability in expert judgments, organizational preferences, and editorial cultures. Subjective weighting techniques such as AHP or fuzzy AHP rely heavily on expert pairwise comparisons, which can introduce inconsistency or bias when evaluators come from diverse backgrounds [46]. Objective weighting methods, such as Entropy, CRITIC, Standard Deviation, MEREC, and CILOS, derive weights directly from the intrinsic properties of the data, ensuring transparency, reproducibility, and independence from subjective preference inputs. This is particularly important for newsroom technology assessments, where the criteria involve heterogeneous quantitative and qualitative elements, and where an unbiased, data-driven weighting process is necessary for a credible comparison of tools.
A Bonferroni fusion is subsequently employed to combine these signals, ensuring that no single scheme is disproportionately represented and yielding a well-balanced signal vector for downstream ranking [47]. MARCOS was selected as the primary ranking method because it evaluates alternatives against explicit ideal and anti-ideal reference profiles, producing utilities that are directly interpretable. This aligns with newsroom “close-to-best” reasoning and accommodates heterogeneous criteria [48]. To mitigate dependence on a single method, rankings were further cross-compared against five other approaches (TOPSIS [49], VIKOR [50], PROMETHEE-II, WASPAS [51], and EDAS [52]), thereby assessing whether different preference structures converge toward consistent rankings.
Step 1. Decision Matrix Formation.
The first step involves constructing the decision framework for newsroom AI tool evaluation. This framework is expressed in the form of a decision matrix, Equation (1).
$M = \left[ m_{ij} \right]_{p \times q}$ (1)
Here, $m_{ij}$ denotes the performance score of the $i$-th alternative $A_i$ with respect to the $j$-th criterion $C_j$. The parameter $p$ represents the number of alternatives (AI tools), while $q$ denotes the number of evaluation criteria. This formulation enables the comparison of all alternatives within a unified structure.
Step 2. Classification of Criteria.
Each criterion must be categorized as either cost-oriented (where lower values are preferable) or benefit-oriented (where higher values are preferable). This is represented as Equation (2).
$\theta_j = \begin{cases} +1 \ (\mathrm{B}), & \text{if } C_j \text{ is a benefit-type criterion} \\ -1 \ (\mathrm{C}), & \text{if } C_j \text{ is a cost-type criterion} \end{cases}$ (2)
This classification is crucial as it determines the normalization approach adopted in the following step.
Step 3. Normalization of the Decision Matrix.
Since criteria are often measured on heterogeneous scales, normalization is required to transform all values into a uniform scale within the range [0, 1].
For benefit-type criteria, Equation (3) applies, and for cost-type criteria, Equation (4). Here, $z_{ij}$ is the normalized value of alternative $A_i$ for criterion $C_j$, $\max_i m_{ij}$ is the maximum observed value across alternatives (benefit case), and $\min_i m_{ij}$ is the minimum observed value across alternatives (cost case). These transformations map all criteria to a 0–1 comparable scale.
$z_{ij} = \dfrac{m_{ij}}{\max_i m_{ij}}$ (3)
$z_{ij} = \dfrac{\min_i m_{ij}}{m_{ij}}$ (4)
This transformation ensures comparability while preserving the orientation of each criterion.
Although all criteria are scored on a consistent 1–9 scale, normalization is still necessary because they include both benefit- and cost-type measures. Normalization ensures that cost-type criteria are aligned with benefit-type criteria in the correct direction and that all values are converted to a standard utility scale. This prevents confusion during the MARCOS benchmarking step, where the normalized values are compared to ideal and anti-ideal reference profiles [53,54].
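To make Steps 1–3 concrete, the short Python sketch below normalizes a small placeholder matrix with Equations (3) and (4); the scores, the 3 × 4 shape, and the criterion types are illustrative assumptions rather than the study’s actual data (Table 1).

```python
import numpy as np

# Hypothetical 1-9 scores for 3 alternatives (rows) and 4 criteria (columns);
# the real study uses 9 alternatives and 16 criteria (Table 1)
M = np.array([[7.0, 6.0, 3.0, 8.0],
              [5.0, 8.0, 2.0, 6.0],
              [8.0, 5.0, 4.0, 7.0]])
is_benefit = np.array([True, True, False, True])   # False marks a cost criterion such as C11

Z = np.empty_like(M)
# Equation (3): benefit criteria are divided by the column maximum
Z[:, is_benefit] = M[:, is_benefit] / M[:, is_benefit].max(axis=0)
# Equation (4): cost criteria take the column minimum divided by each value
Z[:, ~is_benefit] = M[:, ~is_benefit].min(axis=0) / M[:, ~is_benefit]

print(np.round(Z, 3))   # all values lie in (0, 1]; higher is now better for every criterion
```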
Step 4. Determining the Best and Worst Profiles.
To benchmark performance, two reference profiles, the ideal and anti-ideal solutions, are appended to the normalized matrix, Equation (5). $Z^+$ is the ideal alternative (best achievable performance), and $Z^-$ is the anti-ideal alternative (worst achievable performance). This step benchmarks each alternative relative to best/worst reference points.
$Z^+ = \max_i z_{ij}, \quad Z^- = \min_i z_{ij}$ (5)
The ideal solution represents the best performance across all criteria, while the anti-ideal captures the worst observed performance.
Step 5. Determination of Objective Weights.
To minimize subjectivity and ensure reproducibility, five objective weighting schemes were applied: Entropy, Standard Deviation, CRITIC, MEREC, and CILOS. Each method captures distinct informational properties of the criteria, and their combined use provides a more balanced and robust representation of their importance. Each method captures a distinct dimension of variability: Entropy measures the distribution of values, CRITIC incorporates contrast and inter-criterion correlation, while MEREC and CILOS refine weights through marginal contribution and optimization principles, respectively.
In this study, the objective weighting methods were applied to the initial decision matrix rather than to the standardized values obtained in Step 3. This choice aligns with the theoretical basis of objective weighting techniques such as Entropy, Standard Deviation, CRITIC, MEREC, and CILOS, which are intended to capture the inherent informational properties of each criterion, specifically their natural variability, contrast strength, and ability to differentiate among alternatives. Using these methods on normalized or standardized data would change the original distribution of criterion values, reduce variability, and potentially distort the perceived importance derived from the raw data. Therefore, the original performance scores from Step 1 were used to ensure that the weights truly reflect each criterion’s actual informational contribution.
  • Entropy Method
The Entropy method measures the degree of disorder or uncertainty in each criterion. Criteria with higher variability (lower entropy) provide more discriminatory information and therefore receive higher weights [55]. This method emphasizes how uniformly or unevenly alternatives perform on each criterion [45]. $p_{ij}$ is the proportion of alternative $A_i$ under criterion $C_j$, $H_j$ is the entropy value of criterion $C_j$, and $1 - H_j$ is the degree of divergence/contrast for criterion $C_j$. Entropy highlights criteria with greater information variability.
$p_{ij} = \dfrac{m_{ij}}{\sum_{i=1}^{p} m_{ij}}$ (6)
$H_j = -\lambda \sum_{i=1}^{p} p_{ij} \ln p_{ij}$ (7)
$w_j^{(E)} = \dfrac{1 - H_j}{\sum_{j=1}^{q} \left( 1 - H_j \right)}$ (8)
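A minimal Python sketch of Equations (6)–(8) is given below; the matrix values are placeholders, and the constant $\lambda = 1/\ln p$ is an assumption adopted so that $H_j$ falls in $[0, 1]$, since the text does not define $\lambda$ explicitly.

```python
import numpy as np

M = np.array([[7.0, 6.0, 3.0, 8.0],
              [5.0, 8.0, 2.0, 6.0],
              [8.0, 5.0, 4.0, 7.0]])
p = M.shape[0]

P = M / M.sum(axis=0)                      # Equation (6): column-wise proportions
lam = 1.0 / np.log(p)                      # assumed constant so that H_j lies in [0, 1]
H = -lam * (P * np.log(P)).sum(axis=0)     # Equation (7): entropy of each criterion
w_entropy = (1 - H) / (1 - H).sum()        # Equation (8): divergence normalized to weights
print(np.round(w_entropy, 4))
```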
  • Standard Deviation Method
The Standard Deviation method assigns higher weights to criteria with greater dispersion. Unlike entropy, which considers proportional information, the SD method focuses purely on numerical spread [56]. Criteria with wider score distributions are assumed to have greater influence on distinguishing alternatives.
$\sigma_j = \sqrt{\dfrac{1}{p} \sum_{i=1}^{p} \left( m_{ij} - \bar{m}_j \right)^2}$ (9)
In Equation (9), $\bar{m}_j$ denotes the mean (average) value of criterion $C_j$ across all alternatives and is used to quantify the deviation of each option from the central tendency of that criterion, and $\sigma_j$ is the standard deviation of criterion $C_j$. Higher dispersion implies a higher weight.
$w_j^{(SD)} = \dfrac{\sigma_j}{\sum_{j=1}^{q} \sigma_j}$ (10)
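Under the same placeholder matrix as above, Equations (9) and (10) reduce to a two-line computation:

```python
import numpy as np

M = np.array([[7.0, 6.0, 3.0, 8.0],
              [5.0, 8.0, 2.0, 6.0],
              [8.0, 5.0, 4.0, 7.0]])

sigma = M.std(axis=0)          # Equation (9): population standard deviation of each criterion
w_sd = sigma / sigma.sum()     # Equation (10): dispersion normalized to weights
print(np.round(w_sd, 4))
```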
  • CRITIC Method
CRITIC (CRiteria Importance Through Intercriteria Correlation) incorporates both contrast intensity (variation) and conflict (correlation between criteria). A criterion receives a higher weight when it exhibits significant variability and is weakly correlated with other criteria. Thus, CRITIC accounts for redundancy, giving more importance to independent and highly informative criteria [57].
In Equation (11), $\operatorname{corr}(C_j, C_k)$ is the correlation between criteria $C_j$ and $C_k$. It represents the correlation coefficient between pairs of criteria, capturing the degree of redundancy and assigning lower importance to highly correlated criteria under the CRITIC method. $C_j$ is the information content of criterion $C_j$ (combining contrast and conflict), and $w_j$ is the final CRITIC weight for criterion $C_j$. CRITIC reduces the weight of redundant (highly correlated) criteria.
$\rho_{jk} = \operatorname{corr}\left( C_j, C_k \right)$ (11)
$C_j = \sigma_j \sum_{k=1}^{q} \left( 1 - \rho_{jk} \right)$ (12)
$w_j^{(C)} = \dfrac{C_j}{\sum_{j=1}^{q} C_j}$ (13)
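The following sketch illustrates Equations (11)–(13), assuming Pearson correlation computed on the raw criterion columns of the same placeholder matrix:

```python
import numpy as np

M = np.array([[7.0, 6.0, 3.0, 8.0],
              [5.0, 8.0, 2.0, 6.0],
              [8.0, 5.0, 4.0, 7.0]])

sigma = M.std(axis=0)                    # contrast intensity of each criterion
rho = np.corrcoef(M, rowvar=False)       # Equation (11): inter-criteria correlation matrix
C = sigma * (1 - rho).sum(axis=0)        # Equation (12): information content
w_critic = C / C.sum()                   # Equation (13)
print(np.round(w_critic, 4))
```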
  • MEREC Method
The MEREC (MEthod based on the Removal Effects of Criteria) method evaluates how the overall performance ranking changes when each criterion is removed. A criterion is considered more important if its removal causes a significant distortion in the aggregated results. This dependency-based mechanism highlights criteria with strong structural influence in the decision model [58].
In the MEREC weighting procedure, $z_{ij}$ denotes the normalized performance of alternative $A_i$ under criterion $C_j$, while $w_j$ represents the initial weight assigned to criterion $C_j$. Using these values, $S_i$ refers to the aggregated performance score of alternative $A_i$ when all criteria $C_1, \ldots, C_q$ are included, and $S_i^{(j)}$ is the aggregated score of the same alternative after removing criterion $C_j$. The resulting difference contributes to $E_j$, which captures the total performance loss associated with removing criterion $C_j$ and is computed as the sum of absolute deviations $\left| S_i - S_i^{(j)} \right|$ across all alternatives. The final MEREC weight $w_j^{(M)}$ is obtained by normalizing $E_j$ relative to the sum of all $E_j$ values. $p$ denotes the number of alternatives, and $q$ denotes the number of criteria.
$S_i = \sum_{j=1}^{q} w_j z_{ij}$ (14)
$E_j = \sum_{i=1}^{p} \left| S_i - S_i^{(j)} \right|$ (15)
$w_j^{(M)} = \dfrac{E_j}{\sum_{j=1}^{q} E_j}$ (16)
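A compact sketch of the removal-effect computation in Equations (14)–(16) follows; the normalized matrix and the equal initial weights are illustrative assumptions, since the text does not specify the initial weight vector.

```python
import numpy as np

# Normalized matrix from Step 3 (placeholder values) and equal initial weights,
# both of which are assumptions used only for illustration
Z = np.array([[0.875, 0.750, 0.667, 1.000],
              [0.625, 1.000, 1.000, 0.750],
              [1.000, 0.625, 0.500, 0.875]])
q = Z.shape[1]
w0 = np.full(q, 1.0 / q)

S = Z @ w0                                   # Equation (14): scores with all criteria included
E = np.empty(q)
for j in range(q):
    S_without_j = S - w0[j] * Z[:, j]        # score of each alternative after removing criterion j
    E[j] = np.abs(S - S_without_j).sum()     # Equation (15): total removal effect of criterion j
w_merec = E / E.sum()                        # Equation (16)
print(np.round(w_merec, 4))
```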
  • CILOS Method
CILOS (Criteria Importance through the Level of Supply) determines weights by analyzing the supply and deficiency levels of each criterion relative to an ideal benchmark. It captures how much each criterion contributes to closing or widening the gap between actual and ideal performance, emphasizing goal-oriented significance [59].
In the CILOS weighting method, $m_{ij}$ represents the performance value of alternative $A_i$ under criterion $C_j$ and is used to compute $R_j$, which denotes the cumulative reciprocal supply level of criterion $C_j$ across all alternatives, calculated as in Equation (17). A larger value of $R_j$ indicates that a criterion contributes more significantly to narrowing the gap between observed and ideal performance levels. The final CILOS weight for criterion $C_j$ is denoted $w_j^{(CI)}$. $p$ represents the number of alternatives and $q$ represents the number of evaluation criteria.
$R_j = \sum_{i=1}^{p} \dfrac{1}{m_{ij}}$ (17)
$w_j^{(CI)} = \dfrac{R_j}{\sum_{j=1}^{q} R_j}$ (18)
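Equations (17) and (18) can be illustrated as follows, again using placeholder scores:

```python
import numpy as np

M = np.array([[7.0, 6.0, 3.0, 8.0],
              [5.0, 8.0, 2.0, 6.0],
              [8.0, 5.0, 4.0, 7.0]])

R = (1.0 / M).sum(axis=0)     # Equation (17): cumulative reciprocal supply level per criterion
w_cilos = R / R.sum()         # Equation (18)
print(np.round(w_cilos, 4))
```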
Although each technique is objective, they assess importance from different angles: entropy emphasizes information utility; SD focuses on dispersion; CRITIC on independence and contrast; MEREC on structural impact; and CILOS on proximity to ideal performance. Using all five methods and combining them with the Bonferroni operator yields a composite, less biased weight vector that captures multiple aspects of criteria importance. This multi-perspective approach enhances the reliability and robustness of the final rankings.
  • Bonferroni Operator Fusion
The formulation of the Bonferroni operator is based on established work on generalized Bonferroni means in multi-criteria aggregation. Yager [60] introduced generalized Bonferroni operators that explicitly model interactions among criteria via multiplicative terms, while Beliakov et al. [61] further expanded the family of Bonferroni means and demonstrated their suitability for capturing both conjunctive and disjunctive relationships among information sources. The multiplicative interaction component of the Bonferroni operator captures the interdependence among weighting schemes, as discussed in Yager [60] and Beliakov et al. [61], thereby allowing the fused vector to incorporate consensus and reduce sensitivity to outlier weights.
The fused weight vector combines outputs from multiple schemes, Equation (19). In Equation (19), the index $l$ denotes the second weighting scheme included in the Bonferroni fusion operator. The interaction between weights $w^{(k)}$ and $w^{(l)}$ allows the operator to aggregate information from multiple weighting schemes while capturing their mutual agreement.
$w_j = \dfrac{1}{K} \sum_{k=1}^{K} \left( w_j^{(k)} + \dfrac{w_j^{(k)} w_j^{(l)}}{2} \right)$ (19)
where K represents the number of weighting methods.
The Bonferroni operator was selected to fuse the objective weights. Unlike the simple arithmetic mean, it accounts for interaction effects between weighting schemes. The arithmetic mean treats all weight vectors as independent and averages them linearly, which may dilute the influence of highly informative criteria or amplify noise when weighting schemes diverge.
In contrast, the Bonferroni operator introduces a multiplicative interaction term, $\frac{w^{(k)} w^{(l)}}{2}$, which strengthens consensus among weighting methods. When two schemes assign similarly high weights to a criterion, the interaction term raises its fused weight above what a simple average would produce. Conversely, when the schemes disagree, the multiplicative component reduces the fused value, preventing instability caused by a single outlying method.
Thus, the Bonferroni fusion provides a more robust aggregation mechanism by jointly capturing individual weight magnitudes and their mutual agreement. This enhances stability and reduces sensitivity to divergence across objective weighting techniques, resulting in a composite weight vector that better reflects a multi-perspective consensus.
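The sketch below illustrates one possible reading of Equation (19); because the range of the index $l$ is not fully specified in the text, the interaction term here pairs each scheme with the average of the remaining schemes, and the fused vector is re-normalized to sum to one. Both choices are assumptions made only for illustration.

```python
import numpy as np

def bonferroni_fuse(weight_vectors):
    """Fuse K objective weight vectors following Equation (19).

    Assumptions: the index l is read as the average of the remaining schemes,
    and the fused vector is re-normalized so that the weights sum to one.
    """
    W = np.asarray(weight_vectors, dtype=float)        # shape (K, q)
    K = W.shape[0]
    fused = np.zeros(W.shape[1])
    for k in range(K):
        others = np.delete(W, k, axis=0).mean(axis=0)  # stand-in for w^(l)
        fused += W[k] + (W[k] * others) / 2.0          # additive term + interaction term
    fused /= K
    return fused / fused.sum()

# Example with three dummy weight vectors over four criteria
w_fused = bonferroni_fuse([
    [0.30, 0.25, 0.20, 0.25],
    [0.28, 0.22, 0.26, 0.24],
    [0.35, 0.20, 0.20, 0.25],
])
print(np.round(w_fused, 4))
```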
Step 6. MARCOS Ranking Procedure.
The MARCOS (Measurement of Alternatives and Ranking according to Compromise Solution) approach is applied to derive the ranking [62,63].
The weighted normalized value, Equation (20), aggregate score of each alternative, Equation (21), reference points, Equation (22), relative indicators, Equation (23), and utility function, Equation (24), are defined as follows. In the MARCOS formulation, $v_{ij}$ represents the weighted normalized performance of alternative $A_i$ under criterion $C_j$, while $S_i$ represents the aggregated weighted score of alternative $A_i$ across all criteria. The values $S^+$ and $S^-$ represent the maximum and minimum aggregated scores among all alternatives, corresponding to the ideal and anti-ideal performance levels. Based on these benchmarks, $K_i^+$ represents the relative closeness of alternative $A_i$ to the ideal solution, and $K_i^-$ represents its relative closeness to the anti-ideal solution. Finally, $U_i$ represents the overall utility score of alternative $A_i$, integrating both ideal and anti-ideal comparisons to determine its final ranking.
$v_{ij} = z_{ij} \times w_j$ (20)
$S_i = \sum_{j=1}^{q} v_{ij}$ (21)
$S^+ = \max_i S_i, \quad S^- = \min_i S_i$ (22)
$K_i^+ = \dfrac{S_i}{S^+}, \quad K_i^- = \dfrac{S_i}{S^-}$ (23)
$U_i = K_i^+ + K_i^-$ (24)
Alternatives are ranked in descending order based on $U_i$.
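A minimal end-to-end MARCOS pass, following Equations (20)–(24) as reconstructed above, could look like the sketch below; the normalized matrix, the weights, and the use of the sum $K_i^+ + K_i^-$ as the utility are illustrative assumptions rather than the exact computation behind Table 3.

```python
import numpy as np

def marcos_utilities(Z, w):
    """Simplified MARCOS pass over an already-normalized matrix Z with fused weights w,
    following Equations (20)-(24) as reconstructed above; taking the utility as the
    sum of the two relative indicators is an assumption."""
    V = Z * w                                    # Equation (20): weighted normalized matrix
    S = V.sum(axis=1)                            # Equation (21): aggregate scores
    S_plus, S_minus = S.max(), S.min()           # Equation (22): ideal / anti-ideal references
    K_plus, K_minus = S / S_plus, S / S_minus    # Equation (23): relative indicators
    return K_plus + K_minus                      # Equation (24): utility scores

# Placeholder normalized matrix and fused weights (not the study's data)
Z = np.array([[0.875, 0.750, 0.667, 1.000],
              [0.625, 1.000, 1.000, 0.750],
              [1.000, 0.625, 0.500, 0.875]])
w = np.array([0.30, 0.25, 0.20, 0.25])

U = marcos_utilities(Z, w)
order = np.argsort(-U)          # descending utility gives the final ranking
print(np.round(U, 4), order)
```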
Step 7. Benchmarking with Other MCDM Methods.
For robustness, comparative analysis is performed using TOPSIS, VIKOR, PROMETHEE-II, WASPAS, and EDAS, applying the same fused weights. This ensures the rankings are not method-dependent. The six MCDM methods used in this study differ in how they treat distances, preferences, and compromise solutions:
MARCOS evaluates each alternative relative to both ideal and anti-ideal profiles, deriving directly interpretable utility scores. It explicitly incorporates benchmark alternatives, enabling proportional comparison to the best and worst possible cases [53,63].
TOPSIS ranks alternatives based on their distance from the ideal and anti-ideal points. It assumes monotonicity and compensatory behavior but does not explicitly quantify the utility level of each alternative [64,65].
VIKOR focuses on identifying a compromise solution using group utility and individual regret measures. It emphasizes trade-offs but requires the decision-maker to select a parameter, which introduces subjectivity [41,66,67].
PROMETHEE-II is an outranking method that uses pairwise comparisons and preference functions to generate a complete ranking. However, its results depend on the choice of preference function and thresholds [68].
WASPAS combines weighted-sum and weighted-product models to enhance ranking stability. While computationally efficient, it lacks explicit benchmarking against ideal profiles [69].
EDAS evaluates alternatives based on their distances from the average solution rather than the ideal point. It provides good discrimination but may underrepresent extreme-performing alternatives [52].
  • Rationale for Using MARCOS
MARCOS was selected as the primary ranking method because it offers explicit benchmarking through ideal/anti-ideal reference alternatives, direct interpretability of utility scores, compatibility with heterogeneous criterion scales, and strong stability under weight perturbations. Unlike distance-only (TOPSIS, EDAS), compromise-based (VIKOR), or outranking (PROMETHEE-II) methods, MARCOS preserves proportionality of alternatives to benchmark profiles, making it especially suitable for evaluating AI tools where close-to-best performance is meaningful. Comparative methods were employed for validation. However, MARCOS offers the most precise mathematical representation of newsroom decision logic.
Step 8. Correlation Analysis of MCDM Rankings.
Correlation measures are employed to assess consistency between rankings generated by different methods.
  • Spearman’s Rank Correlation
$\rho = 1 - \dfrac{6 \sum_{i=1}^{p} \left( R_i^{(k)} - R_i^{(l)} \right)^2}{p \left( p^2 - 1 \right)}$ (25)
  • Kendall’s Tau
$\tau = \dfrac{C - D}{\frac{1}{2} p (p - 1)}$ (26)
where C and D are the counts of concordant and discordant pairs, respectively.
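Both coefficients are readily available in standard libraries; a small sketch using SciPy with hypothetical rank vectors is shown below.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

# Hypothetical rank vectors produced by two MCDM methods for nine alternatives
ranks_marcos = np.array([1, 3, 2, 5, 4, 6, 7, 9, 8])
ranks_topsis = np.array([2, 3, 1, 4, 5, 6, 8, 9, 7])

rho, _ = spearmanr(ranks_marcos, ranks_topsis)    # Equation (25)
tau, _ = kendalltau(ranks_marcos, ranks_topsis)   # Equation (26)
print(round(rho, 3), round(tau, 3))
```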
Step 9. Sensitivity Analysis.
Finally, sensitivity analysis is conducted to assess stability under perturbations of the weight vector. For a selected criterion $C_h$, the perturbed weights are given by Equation (27), the updated weighted matrix by Equation (28), and the revised utility functions by Equation (29).
$w_h' = w_h + \delta, \quad w_j' = w_j - \delta \dfrac{w_j}{\sum_{j \neq h} w_j}, \quad \sum_{j=1}^{q} w_j' = 1$ (27)
$v_{ij}' = z_{ij} \times w_j', \quad S_i' = \sum_{j=1}^{q} v_{ij}'$ (28)
$K_i^{+\prime} = \dfrac{S_i'}{S^{+\prime}}, \quad K_i^{-\prime} = \dfrac{S_i'}{S^{-\prime}}, \quad U_i' = K_i^{+\prime} + K_i^{-\prime}$ (29)
Re-ranking based on $U_i'$ provides insights into how minor perturbations influence the order of alternatives, thereby evaluating robustness.
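A sketch of the perturbation rule in Equation (27) is shown below; the weight vector, the perturbed criterion, and $\delta$ are placeholders for the ScW scenarios in Table 5. The MARCOS pass from Step 6 would then be re-run with the perturbed vector.

```python
import numpy as np

def perturb_weights(w, h, delta):
    """Equation (27): raise criterion h by delta and proportionally rescale the
    remaining weights so that they still sum to one."""
    w = np.asarray(w, dtype=float)
    w_new = w.copy()
    w_new[h] = w[h] + delta
    others = np.arange(len(w)) != h
    w_new[others] = w[others] - delta * w[others] / w[others].sum()
    return w_new

w = np.array([0.30, 0.25, 0.20, 0.25])        # placeholder fused weights
w_sc = perturb_weights(w, h=0, delta=0.10)    # one hypothetical ScW scenario
print(np.round(w_sc, 4), round(w_sc.sum(), 6))
```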
Step 10. Rank Aggregation for Consensus Decision-Making.
To mitigate dependence on any single MCDM method, a consensus ranking was constructed by integrating the results of MARCOS (used as the primary method), TOPSIS, VIKOR, PROMETHEE-II, WASPAS, and EDAS. This approach provides a meta-decision framework that captures concordance across multiple methodologies. Two rank aggregation strategies were employed as follows:
  • Borda Count Method
Each method generates a rank vector, as shown in Equation (30).
$R^{(k)} = \left( r_1^{(k)}, r_2^{(k)}, \ldots, r_p^{(k)} \right)$ (30)
For each alternative $A_i$, the Borda score is obtained by summing the ranks across all methods, as shown in Equation (31).
$B_i = \sum_{k=1}^{m} r_i^{(k)}$ (31)
where $m$ denotes the total number of MCDM methods considered. Alternatives are ordered based on the ascending values of $B_i$, with the consensus solution corresponding to the alternative with the lowest cumulative rank.
  • Copeland’s Method
This approach employs pairwise comparisons of alternatives across all methods. For each pair $(A_i, A_j)$, a “win” is assigned if $A_i$ is ranked higher than $A_j$ by the majority of methods.
The Copeland score of an alternative is defined as the difference between its number of wins and losses, Equation (32):
$C_i = W_i - L_i$ (32)
A higher Copeland score indicates a stronger degree of consensus preference.
  • Final Consensus Ranking
The combined use of Borda Count (valued for its simplicity and transparency) and Copeland’s Method (recognized for its robustness to outliers and emphasis on majority preference) yields the final consensus ranking.
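The sketch below illustrates both aggregation rules on a dummy rank matrix, assuming each method returns a complete ranking without ties:

```python
import numpy as np
from itertools import combinations

# Rank matrix: rows = MCDM methods, columns = alternatives (1 = best); dummy values
R = np.array([[1, 2, 3, 4],
              [2, 1, 3, 4],
              [1, 3, 2, 4]])
n_methods, n_alt = R.shape

borda = R.sum(axis=0)                       # Equation (31): lower cumulative rank is better

copeland = np.zeros(n_alt, dtype=int)       # Equation (32): pairwise wins minus losses
for i, j in combinations(range(n_alt), 2):
    wins_i = int((R[:, i] < R[:, j]).sum()) # methods ranking alternative i above j
    wins_j = n_methods - wins_i             # assumes complete rankings without ties
    if wins_i > wins_j:
        copeland[i] += 1
        copeland[j] -= 1
    elif wins_j > wins_i:
        copeland[j] += 1
        copeland[i] -= 1

print(borda, copeland)  # consensus: smallest Borda score / largest Copeland score
```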
Step 11. Stability Index of Alternatives.
The Stability Index (SI) quantifies how consistently an alternative appears among the top candidates across multiple MCDM methods and scenarios. The SI provides a human-interpretable measure for assessing the persistence of rankings.
Definition 1. 
For each alternative $A_i$, the SI is defined in Equation (33).
$SI_i = \dfrac{\left| \{ \text{occurrences of } A_i \in \text{Top-}k \} \right|}{\text{total runs}}$ (33)
where
Top-$k$ denotes the set of best-performing candidates (with $k = 3$ in this case, i.e., the top-3 rankings).
The numerator represents the number of times alternative $A_i$ appears in the Top-$k$ across all methods.
The denominator corresponds to the total number of evaluations (baseline case, benchmarking methods, and perturbation scenarios).
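As a brief illustration of Equation (33), the following sketch counts Top-3 appearances over a set of hypothetical runs:

```python
import numpy as np

def stability_index(rank_runs, alt, k=3):
    """Equation (33): share of evaluation runs in which alternative `alt`
    appears in the Top-k (k = 3 in this study)."""
    ranks = np.asarray(rank_runs)      # rows = runs, columns = alternatives
    return float((ranks[:, alt] <= k).mean())

# Dummy ranks of 4 alternatives over 5 runs (baseline, benchmarks, perturbations)
runs = [[1, 2, 3, 4],
        [1, 3, 2, 4],
        [2, 1, 3, 4],
        [1, 2, 4, 3],
        [1, 4, 2, 3]]
print(stability_index(runs, alt=0))    # 1.0: always within the Top-3
```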

3. Generative AI Co-Writing Tools for Newsrooms: Study Design, Data, and Decision Matrix Construction

In this context, the study proposes a structured MCDM framework for evaluating generative AI co-writing solutions considered for adoption by newsrooms. Nine alternatives were assessed: ChatGPT (CPT), Claude (CLD), Gemini (GEM), Copilot (CPL), Perplexity AI (PPX), Jasper AI (JSP), Writesonic (WRT), QuillBot/GrammarlyGO (QBG), and a custom open-source model (COS). This selection reflects the diversity of technological options available to news organizations, ranging from proprietary LLMs to open-source deployments that can be tailored to organizational needs.
The evaluated alternatives are leading generative AI co-writing solutions that vary significantly in design, capabilities, and intended newsroom applications. ChatGPT, Claude, and Gemini are large language model–based assistants optimized for broad content creation with high linguistic fluency and reasoning skills. Copilot integrates closely with Microsoft workflows, making it suitable for editorial teams using office environments. Perplexity focuses on fact-linked answering with citation retrieval, while Jasper and Writesonic target marketing-style copywriting with extensive template libraries. QuillBot/Grammarly offers paraphrasing and editing support rather than whole document generation. The custom open-source model is a deployable, locally controlled LLM that allows newsroom-specific fine-tuning but generally has more limited multilingual and reasoning capabilities. These differences justify comparing them based on heterogeneous performance, transparency, usability, and risk factors.
The taxonomy of evaluation criteria encompassed sixteen dimensions, organized into four overarching categories:
  • Content quality and accuracy (C1–C3);
  • Performance and usability (C4–C6);
  • Ethics, trust, and risk (C7–C10);
  • Economics and scalability (C11–C16).
The selection of the sixteen criteria (C1–C16) followed a structured, multi-source process combining academic evidence, newsroom practices, and AI governance recommendations. First, a targeted review of peer-reviewed studies on AI-assisted journalism and LLM-based writing tools identified recurring concerns about factual accuracy, bias, accountability, transparency, and editorial oversight. Second, reports by major news organizations and industry bodies (workflow integration notes, AI-use guidelines, and editorial quality standards) highlighted operational needs, including speed, usability, multilingual support, and tool reliability. Third, platform documentation from leading AI providers indicated performance variations that required the inclusion of criteria addressing scalability, cost, and long-term sustainability.
By synthesizing these inputs, the final set of 16 criteria was selected to capture both technical capabilities and the editorial, ethical, and organizational considerations that influence real-world newsroom adoption. This ensures that the evaluation framework reflects the practical decision-making environment faced by journalists, editors, and technology managers.
In accordance with Equation (2), each criterion was classified as either a benefit (B: higher values are preferable) or a cost (C: lower values are preferable). For instance, factual accuracy (C1) and readability (C2) are examples of benefit criteria, whereas cost per 1000 tokens (C11) represents a cost criterion. Sixteen metrics were used to define the evaluation framework.
  • C1 (B): Factual accuracy, the first criterion, assesses the correctness of generated information.
  • C2 (B): Rewriting quality, the second criterion, is measured through readability and stylistic similarity, since generated content should ideally conform to a journalistic tone.
  • C3 (B): Language coverage evaluates the number of languages supported.
  • C4 (B): Speed of generation measures the efficiency with which drafts are produced.
  • C5 (B): Integration with newsroom workflows assesses compatibility with platforms such as CMS, Word, Docs, or Slack.
  • C6 (B): Ease of use and user-friendliness evaluate the intuitiveness of the system.
  • C7 (B): Bias and fairness mitigation measures the extent to which the AI tool reduces or avoids harmful bias in generated content. Higher values indicate better performance in producing impartial, fair, and non-discriminatory outputs.
  • C8 (B): Attribution transparency assesses the tool’s ability to clearly disclose the origin of generated content, including whether text was produced by AI and which sources or references informed the output. Higher values indicate stronger transparency and traceability.
  • C9 (B): Plagiarism and originality checks.
  • C10 (B): Privacy and data security, particularly with respect to newsroom confidentiality and legal compliance.
  • C11 (C): Cost per 1000 tokens, defined as a cost criterion. It reflects the operational and licensing costs of each tool and is the only cost-related criterion in the framework. For C11, lower values are better because they indicate more affordable or cost-efficient solutions. Therefore, C11 was treated as a cost criterion during normalization (i.e., transformed so that lower cost results in higher normalized utility). This guarantees a consistent preference direction across all criteria.
  • C12 (B): Scalability, reflecting the ability to accommodate increased workloads.
  • C13 (B): Customizability, referring to flexibility and the capacity to align with newsroom-specific styles or datasets.
  • C14 (B): Uptime and stability, reflecting system reliability.
  • C15 (B): Vendor support and updates, concerning responsiveness, service quality, and product improvement.
  • C16 (B): Ecosystem and community, which covers third-party tools, documentation, and peer knowledge, thereby ensuring long-term adoption and sustainability.
For consistency and robustness, data collection was triangulated across multiple sources. First, objective values such as pricing, supported languages, and integration features were obtained from official technical documentation and product specifications. Second, empirical measures of performance, including factuality, hallucination rates, and detection reliability, were drawn from independent benchmark reports and peer-reviewed studies. Third, findings from comparative technology reviews and professional assessments by journalists were synthesized to evaluate usability, workflow integration, and stylistic alignment. Information gaps were addressed through structured interviews with five experts in journalism and AI, who rated qualitative aspects such as transparency, bias mitigation, and editorial trustworthiness. This triangulated approach ensured that both quantitative and qualitative requirements were captured in a balanced manner.
Each criterion was evaluated using a 1–9 scale (very poor to excellent), grounded in available evidence (Equation (1)). The raw scores for all alternatives were compiled into a consolidated decision matrix (Table 1). To ensure reliability, scores were cross-validated among multiple raters, and inter-rater consistency was assessed before finalization. In cases of conflicting evidence (e.g., mixed evaluations of factual reliability), median or consensus scores were adopted. When data were missing or ambiguous, conservative estimates were applied, with priority given to peer-reviewed or vendor-verified sources, thereby reducing subjective variability.
To ensure comparability across heterogeneous scales, the decision matrix was normalized (Equations (3) and (4)), and structured rankings were derived by benchmarking the normalized profiles against ideal and anti-ideal reference profiles (Equation (5)). The use of a decision matrix serves two principal purposes: first, it consolidates diverse and complex data points into a single coherent framework; second, it enables newsrooms to evaluate and compare AI tools transparently and reproducibly with respect to clearly defined criteria. This approach not only minimizes ad hoc decision-making in prioritization but also establishes a repeatable evaluation protocol. As new AI technologies emerge, this framework can be reapplied to support ongoing strategic alignment with newsroom priorities.

4. Results and Discussion

4.1. Objective Weights Computation

Five objective methods, Entropy (Equations (6)–(8)), Standard Deviation (Equations (9) and (10)), CRITIC (Equations (11)–(13)), MEREC (Equations (14)–(16)), and CILOS (Equations (17) and (18)), were employed to compute criterion weights in a non-subjective, evidence-based manner. The normalized weights were subsequently aggregated using the Bonferroni operator (Equation (19)), which moderates extremes and balances between conservatism and compensation, as reported in Table 2.
The results identify transparency and attribution (C8) and customizability (C13) as the most influential factors, with cost (C11), plagiarism detection (C9), and workflow integration (C5) maintaining moderate influence. Figure 2 illustrates that C8 and C13 exert a disproportionate effect on rankings, as evidenced by their steep Pareto contributions. The prominence of C8 underscores the need for verifiable transparency to preserve journalistic credibility, while the weight of C13 underscores the importance of tailoring AI tools to align with editorial standards. Collectively, these findings demonstrate that trust and adaptability, rather than speed or raw accuracy, constitute the decisive drivers of sustainable generative AI adoption in newsroom environments.

4.2. Prioritizing Benchmark by MARCOS—Generative AI Co-Writing for Newsrooms

The MARCOS method was applied to rank the generative AI co-writing solutions under consideration for newsroom adoption. Following normalization of the decision matrix (Equations (3) and (4)), the extended normalized matrix was constructed by appending both ideal and anti-ideal profiles (Equation (5)), thereby ensuring that all alternatives were evaluated with respect to upper and lower reference bounds. The normalized values were subsequently multiplied by the Bonferroni-fused weights (Equation (19)), reported in Table 2 and plotted in Figure 2, resulting in the weighted normalized matrix (Equation (20)), which is presented in Table 3.
For each alternative, aggregate scores were derived (Equation (21)), and utility function calculations were performed based on the ideal and anti-ideal solutions (Equations (22)–(24)). The overall performance scores ($S_i$), relative indicators ($K_i$), and utilities ($U_i$), which reflect the relative distance of each alternative from the compromise solution, are shown on the right-hand side of Table 3.
The results indicate that Writesonic (WRT) achieves the highest utility value ($U_i = 0.7762$) and ranks first overall. Perplexity AI (PPX) and Gemini (GEM) occupy the second and third positions, respectively. Jasper AI (JSP) and Copilot (CPL) achieve mid-tier rankings, whereas ChatGPT (CPT) and Claude (CLD) are positioned lower due to relatively weaker performance on transparency and attribution criteria. QuillBot/GrammarlyGO (QBG) and the custom open-source model (COS) receive the lowest scores, reflecting limitations in attribution, consistency, and ease of use.
These findings demonstrate the robustness of the MARCOS method in handling a multi-dimensional evaluation landscape. Additionally, they highlight that while LLMs perform well on accuracy and usability metrics, specialized tools optimized for transparency and workflow integration may offer greater strategic value for newsrooms. The robustness of the MARCOS method in this study is shown through three empirical observations. First, MARCOS produced rankings that closely matched those generated by five comparison MCDM methods (TOPSIS, VIKOR, PROMETHEE-II, WASPAS, EDAS), as confirmed by high Spearman and Kendall correlation coefficients. Second, in weight-perturbation scenarios, the top-performing alternatives under MARCOS remained stable, indicating low sensitivity to changes in criterion importance. Third, the Stability Index showed consistent top-three placements for leading tools across all scenarios and methods. Together, these results demonstrate that MARCOS maintains ranking consistency across different evaluation conditions, confirming its robustness.

4.3. Comparative Rankings and Correlation Analysis

The ranking analysis enables comparison of the performance of generative AI co-writing tools from multiple perspectives by applying six MCDM methods (MARCOS, TOPSIS, VIKOR, PROMETHEE-II, WASPAS, and EDAS). Figure 3 presents the ranking outcomes across these methods. As shown, certain tools, such as Writesonic (WRT), consistently attain the highest rank under most approaches (MARCOS, TOPSIS, VIKOR, WASPAS, and EDAS), whereas others, such as the custom open-source model (COS) and QuillBot/GrammarlyGO (QBG), remain in the lowest ranks regardless of the method employed. This stability emphasizes a strong consensus concerning the relative superiority of some alternatives and the relative inadequacy of others. Mid-performing options, such as Gemini (GEM) and Copilot (CPL), exhibit variability: they achieve comparatively higher ranks under PROMETHEE-II but lower ranks under other methods, thereby reflecting a degree of methodological sensitivity.
These observations are corroborated by correlation analysis. Table 4 reports the Spearman correlation coefficients among MARCOS, VIKOR, WASPAS, and EDAS ($\rho > 0.98$), indicating strong concordance and highly similar rankings across these methods (Equation (25)). Similarly, Kendall’s Tau coefficients (Equation (26)) demonstrate strong agreement among MARCOS, VIKOR, WASPAS, and EDAS ($\tau \geq 0.94$), further supporting the consistency of the rankings across these techniques. By contrast, PROMETHEE-II exhibits weak to negligible positive correlations with the other methods ($\rho = 0.0167$ with MARCOS; $\tau = 0.0556$), reflecting divergence in its intrinsic preference structure.
PROMETHEE-II produced rankings that differed from those of other MCDM methods mainly because it is an outranking approach that relies on pairwise preference functions rather than distance- or utility-based evaluation. Unlike MARCOS, TOPSIS, VIKOR, WASPAS, and EDAS, which generate rankings by comparing each alternative to ideal, average, or compromise reference points, PROMETHEE-II assesses alternatives using asymmetric preference flows defined by predefined preference functions and thresholds. As a result, even slight differences in criterion scores can cause strong preference dominance or reversal effects. Additionally, PROMETHEE-II does not use explicit ideal or anti-ideal benchmarks, which makes its rankings more sensitive to local pairwise relationships rather than the overall performance structure. These methodological differences explain why PROMETHEE-II results diverged from those of the other methods in this study.
In summary, the joint use of comparative rankings and correlation matrices proves effective in identifying consistently strong performers (e.g., WRT, PPX) while also revealing potential discrepancies across methods. This dual analysis balances methodological robustness with interpretive nuance, thereby providing newsroom decision-makers with greater confidence that prioritization results are not artifacts of a single technique.

4.4. Sensitivity Analysis

To assess the robustness of the decision results, a systematic sensitivity analysis was conducted in which single-criterion weights were perturbed relative to the normalization condition. Specifically, a one-at-a-time (OAT) approach was adopted, with perturbations of $\delta = \pm 0.05, \pm 0.10, \pm 0.15$ applied to selected criteria. The remaining weights were proportionally adjusted to maintain normalization such that $\sum_{j=1}^{q} w_j' = 1$, in accordance with Equation (27). Three attributes were selected for perturbation: factual accuracy ($C_1$), readability/style alignment ($C_4$), and bias/fairness mitigation ($C_7$). These correspond broadly to dimensions of content integrity (e.g., accuracy and quality), linguistic integrity (e.g., fluency and coherence), and ethical integrity (e.g., non-bias, toxicity, and trustworthiness). As a result, 18 scenario weights (ScW-1 to ScW-18) were generated, summarized in Table 5.
The MARCOS method was reapplied under these perturbed conditions (Equations (28) and (29)), producing updated utility values and final rankings. Figure 4 presents the consolidated scenario-wise rankings. Results indicate that WRT consistently achieved first place across all perturbations, corresponding to a stability index of 1.0. This outcome confirms that the selection of WRT as the most favorable newsroom AI tool is robust and not merely a byproduct of weighting assumptions. GEM and CPL demonstrated moderate stability, whereas CLD and the custom open-source model (COS) exhibited greater variability under weight perturbations.
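A compact sketch of the scoring step applied in each scenario is given below. It reproduces the ratio normalization against the ideal profile and the weighted-sum aggregation reported in Table 3 (with C11 treated as the only cost criterion and utility expressed relative to the ideal profile), so with the baseline weights it recovers the Table 3 utilities and ranks; for each sensitivity scenario, the perturbed weight vector from Table 5 would simply be substituted for w. It omits the earlier weighting stages and the full MARCOS utility-function formalism, so it should be read as an illustration rather than the authors' complete implementation.

```python
import numpy as np

def marcos_utilities(X, w, cost_cols=()):
    """MARCOS-style scoring: ratio-normalize against the ideal profile,
    aggregate with weights, and express utility relative to the ideal (S* = sum(w))."""
    X = np.asarray(X, dtype=float)
    N = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        if j in cost_cols:
            N[:, j] = col.min() / col            # cost criterion: lower is better
        else:
            N[:, j] = col / col.max()            # benefit criterion: higher is better
    S = N @ w                                    # aggregate weighted scores Si
    return S / w.sum()                           # utility Ui relative to the ideal profile

# Decision matrix from Table 1 (rows: CPT, CLD, GEM, CPL, PPX, JSP, WRT, QBG, COS); C11 (index 10) is cost-type.
X = [[8,8,9,7,6,9,8,2,5,7,5,9,7,8,8,9],
     [9,9,7,8,6,8,9,2,6,8,6,8,5,7,7,7],
     [8,9,9,8,9,9,8,3,6,8,6,9,8,9,8,8],
     [8,9,8,7,9,8,8,3,5,9,5,9,3,9,9,8],
     [8,7,6,7,5,8,7,9,7,6,8,7,3,7,6,5],
     [7,8,9,7,7,7,7,2,8,7,4,8,7,8,8,8],
     [6,7,7,8,7,7,6,8,8,6,7,7,6,8,6,6],
     [7,8,6,9,9,9,8,1,9,8,9,9,4,9,8,8],
     [6,7,6,5,5,4,4,2,4,9,9,5,9,6,5,9]]
w = np.array([0.0298,0.0245,0.0398,0.0330,0.0542,0.0501,0.0461,0.2894,
              0.0662,0.0371,0.0684,0.0393,0.1112,0.0294,0.0368,0.0446])
U = marcos_utilities(X, w, cost_cols={10})
print(np.round(U, 4))                            # approx. 0.655, 0.611, 0.725, 0.659, 0.748, 0.668, 0.776, 0.602, 0.536
print((-U).argsort().argsort() + 1)              # 1 = best; reproduces the ranking in Table 3
```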
Correlation analysis further supports these findings. Spearman’s rank correlation coefficients (Equation (25)) exceeded 0.90 in most perturbation cases (with no instance falling below 0.86), demonstrating strong consistency across the 18 scenarios. Kendall’s τ (Equation (26)) also confirmed high rank-order agreement between baseline and perturbed results. These findings substantiate the robustness of the MARCOS-based decision framework and ensure that policy recommendations are not unduly sensitive to moderate variations in subjective weight assignments.

4.5. Rank Aggregation and Stability Index of Alternatives for Consensus Decision-Making

To mitigate the risk of excessive reliance on a single MCDM method in determining final decision outcomes, rank aggregation and stability analysis were employed to provide a more comprehensive evaluation. The Borda Count (Equation (30)) aggregates ranks across methods, while Copeland’s Method (Equation (31)) incorporates pairwise wins and losses to capture majority preferences. In addition, the Stability Index (Equation (32)) measures the frequency with which an alternative appears in the Top-3 across all methods and sensitivity scenarios. Complementary indicators, including average, best, and worst ranks, as well as the number of wins (#Wins) reflecting how often each alternative achieved the first position, further enhance the robustness of the analysis.
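These aggregation indicators can be computed directly from the rank matrix (methods and scenarios in rows, alternatives in columns), as sketched below. The Borda and Copeland scores follow their standard definitions, and the stability index is implemented here as the share of rankings in which an alternative appears in the Top-3, matching the description above; the small rank matrix is an illustrative placeholder, not the full set of rankings from Figures 3 and 4.

```python
import numpy as np

def consensus_indicators(rank_matrix):
    """rank_matrix: (k, m) array of ranks (1 = best) from k methods/scenarios over m alternatives."""
    R = np.asarray(rank_matrix, dtype=float)
    k, m = R.shape
    borda = (m - R).sum(axis=0)                      # Borda Count: points per ranking, summed over rankings
    copeland = np.zeros(m)
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            wins = (R[:, i] < R[:, j]).sum()         # pairwise wins of i over j across rankings
            losses = (R[:, i] > R[:, j]).sum()
            copeland[i] += np.sign(wins - losses)    # +1 for a majority win, -1 for a loss, 0 for a tie
    stability = (R <= 3).mean(axis=0)                # share of rankings with a Top-3 placement
    n_wins = (R == 1).sum(axis=0)                    # number of first places (#Wins)
    return borda, copeland, stability, n_wins

# Illustrative rank matrix for three alternatives evaluated by four methods/scenarios.
ranks = [[1, 2, 3],
         [1, 3, 2],
         [2, 1, 3],
         [1, 2, 3]]
print(consensus_indicators(ranks))
```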
The consensus evaluation (Table 6) is unequivocal: WRT attains first place under both Borda and Copeland rankings, achieves a high Stability Index (0.9), and secures the maximum number of wins (9). These findings indicate that WRT consistently outperforms competing tools across diverse methodological and weighting scenarios. Gemini and Perplexity AI also exhibit notable stability, with Stability Index values of 1.0 and 0.9, respectively, while maintaining relatively low average ranks (≤2.7). Such results underscore the consistent performance of GEM and PPX across multiple evaluation environments.
By contrast, CLD, QBG, and COS demonstrate low stability (SI = 0.0) and zero wins. This outcome highlights their limited competitiveness in newsroom contexts, where consistent performance across methods and scenarios is essential. The combined use of rank aggregation methods (Equations (30) and (31)) together with the stability framework (Equation (32)) reinforces the dependability of the consensus decision, positioning the top-ranked alternatives as both robust and practically implementable choices.
The hybrid objective-weighted MCDM framework developed in this study is not limited to evaluating generative AI co-writing tools. Its design, combining multiple objective weighting schemes, MARCOS-based benchmarking, and consensus-oriented fusion, makes it adaptable to a wide range of decision-making contexts involving heterogeneous criteria and requiring robust, reproducible rankings. Applications such as technology selection, software assessment, supplier evaluation, and policy or resource prioritization can also benefit from this structured, data-driven approach, demonstrating the broader applicability of the proposed method.

5. Implications

5.1. Practical Implications for Newsroom Workflows

For newsrooms where efficiency, fact-checking, and scalability are central priorities, this study provides direct implications. MARCOS, supported by consensus-based decision-making with rank aggregation and stability analysis, identified WRT as the most reliable co-writing tool, followed by Gemini and Perplexity. These results suggest that generative AI can be effectively integrated for tasks such as breaking news coverage, generating templated articles, and standardizing multilingual content.
WRT consistently achieved top rankings across all sensitivity scenarios, underscoring its resilience to editorial variations related to style, fairness, or factuality. GEM and PPX, while less dominant, exhibited flexibility across a broader range of evaluative criteria, making them suitable for newsrooms with diverse editorial requirements. Integrating such tools can shorten turnaround times, facilitate multi-platform publishing, and allow journalists to devote greater attention to investigative and interpretive reporting. Furthermore, the use of consensus and stability indices provides editors with an evidence-based foundation for informed adoption decisions, thereby reducing reliance on anecdotal judgments and aligning with newsroom priorities of transparency, accountability, and technology governance.

5.2. Ethical and Risk Issues (Bias, Hallucination, Attribution)

Although generative AI enhances productivity, its widespread implementation introduces significant ethical and risk-related challenges. A primary concern is bias, as training data may reinforce stereotypes or distort coverage. While leading tools such as WRT, GEM, and PPX partially address this issue, editorial quality control remains essential.
Another critical challenge is hallucination, the generation of fabricated information presented as fact. Due to the probabilistic nature of AI, this risk persists even when accuracy (C1) and transparency (C8) are emphasized as evaluative priorities. Human verification processes are therefore indispensable, particularly in sensitive areas such as politics, health, and finance.
Attribution and liability further complicate AI integration, as questions of accountability for errors, plagiarism, or bias remain unresolved. Transparent attribution policies that clearly distinguish human and AI contributions are essential to ensure compliance with copyright, ethical standards, and professional norms.
Governance of generative AI in journalism, as [83] contend, is also a matter of labor: unions have increasingly begun to frame collective bargaining strategies that establish limits on AI to safeguard journalistic authority and working conditions.
In conclusion, while AI co-writing tools offer substantial organizational benefits, their responsible adoption requires governance frameworks that mitigate risks of bias, hallucination, and attribution. Addressing these concerns is essential for safeguarding journalistic credibility and maintaining public trust during this critical period of technological transition. Further insights can be drawn from the studies by [22,28,83]. Taken together, these strands of research demonstrate that newsroom decisions regarding the types of AI that can and should be adopted are not solely questions of technical efficiency. Instead, such decisions necessarily engage with the broader ecology of audience trust, the evolving contours of narrative authority, and the governance of labor.

6. Conclusions

This study proposes a systematic MCDM framework for evaluating and ranking generative AI co-writing tools for newsroom adoption. Criterion weights were not assigned subjectively; instead, a hybrid scheme combined the Entropy, CRITIC, MEREC, CILOS, and Standard Deviation weighting methods through the Bonferroni operator. MARCOS served as the reference ranking method and was compared against TOPSIS, VIKOR, PROMETHEE-II, WASPAS, and EDAS. The robustness and consensus of the findings were validated through correlation analysis (Spearman, Kendall), rank aggregation (Borda, Copeland), and stability indices. The key findings are as follows.
  • Transparency and Attribution (C8) and Customizability (C13) emerged as the most influential criteria, indicating that credibility and versatility took priority over rapid output.
  • Writesonic consistently occupied the top rank and showed a high stability index, demonstrating strong resilience to both methodological and weighting variations.
  • Gemini and Perplexity followed, maintaining stable orderings across sensitivity scenarios and consistently strong performance.
  • Copilot and Claude achieved selective strengths but weaker resilience, indicating a narrower applicability.
  • Custom open-source models ranked lowest, limited by usability and ecosystem support, although they offer considerable long-term flexibility.
The findings highlight that newsroom adoption should prioritize governance, ethics, and integration with workflows, rather than focusing on speed or cost. Theoretically, this study advances the conceptualization of ethical risks (bias, hallucination, plagiarism, and accountability) as constructs that can be explicitly defined and therefore measured. In practice, it aligns the needs of news organizations with the ethical dilemmas of AI in journalism through a consensus-based, criterion-driven approach to AI adoption.
From a mathematical perspective, the proposed MARCOS–objective weights–Bonferroni framework demonstrates strong robustness, convergence, and generalizability across heterogeneous decision environments. The integration of dispersion and correlation-based weighting ensures normalization invariance and reduces rank volatility. At the same time, the Bonferroni fusion mechanism guarantees convergence of the composite weight vector under varying data distributions. The stability index and correlation analyses confirm consistent performance across comparative MCDM methods, validating the model’s rank-preserving properties. Beyond newsroom applications, the formulation provides a scalable and reproducible decision-analytic architecture suitable for complex, multi-domain evaluations involving conflicting quantitative and qualitative criteria.

Limitations and Future Work

Despite its robustness, this study has several limitations. AI tools evolve dynamically, meaning that benchmark data and expert ratings may only partially reflect real-world performance. Furthermore, while objective weighting methods enhance impartiality, they cannot fully capture stakeholders’ value judgments. Generalizability is constrained by the study’s scope, which was limited to nine generative AI tools and sixteen evaluation criteria. Expanding the framework to incorporate additional alternatives and dimensions such as environmental sustainability, long-term ownership costs, and regional language adaptability would strengthen its applicability.
Future research should pursue participatory evaluations with journalists, editors, and audiences to situate assessment within the lived practices of newsrooms. A longitudinal approach could capture how tool rankings evolve, particularly in response to technological advancements and regulatory developments, such as the European Union’s proposed AI regulation. Additionally, combining MCDM with predictive modeling (e.g., machine learning-based forecasting of adoption trends) could provide insights into future uncertainties and assist newsroom managers in strategic planning. Finally, cross-regional comparative studies are necessary to investigate how cultural, linguistic, and policy contexts shape AI adoption, thereby enhancing the framework’s global relevance.

Author Contributions

F.C. (Funding acquisition; Writing—original draft; Validation; Data curation; Software), B.A.B. (Investigation; Visualization; Resources; Software; Methodology; Formal analysis; Writing—review & editing), R.K. (Writing—review & editing; Data curation; Resources; Supervision; Visualization). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

References

  1. Xu, M. Interaction between students and artificial intelligence in the context of creative potential development. Interact. Learn. Environ. 2025, 33, 4460–4475. [Google Scholar] [CrossRef]
  2. Neshaei, S.P.; Mejia-Domenzain, P.; Davis, R.L.; Käser, T. Metacognition meets AI: Empowering reflective writing with large language models. Br. J. Educ. Technol. 2025, 56, 1864–1896. [Google Scholar] [CrossRef]
  3. Banafi, W. A review of the role of artificial intelligence in Journalism. Edelweiss Appl. Sci. Technol. 2024, 8, 3951–3961. [Google Scholar] [CrossRef]
  4. Hermida, A.; Simon, F.M. AI in the Newsroom: Lessons from the Adoption of The Globe and Mail’s Sophi. J. Pract. 2025, 19, 2323–2340. [Google Scholar] [CrossRef]
  5. Bisht, S.; Parihar, T.S. Digital news platforms and mediatization of religion: Understanding the religious coverage in different ‘News Frames’. Comun. Midia E Consumo 2023, 20, 445–461. [Google Scholar] [CrossRef]
  6. Baptista, J.P.; Rivas-de-Roca, R.; Gradim, A.; Pérez-Curiel, C. Human-made news vs. AI-generated news: A comparison of Portuguese and Spanish journalism students’ evaluations. Humanit. Soc. Sci. Commun. 2025, 12, 567. [Google Scholar] [CrossRef]
  7. Yin, Z.; Wang, S. Enhancing scientific table understanding with type-guided chain-of-thought. Inf. Process. Manag. 2025, 62, 18. [Google Scholar] [CrossRef]
  8. Lv, S.; Lu, S.; Wang, R.; Yin, L.; Yin, Z.; AlQahtani, S.A.; Tian, J.; Zheng, W. Enhancing Chinese Dialogue Generation with Word–Phrase Fusion Embedding and Sparse SoftMax Optimization. Systems 2024, 12, 516. [Google Scholar] [CrossRef]
  9. Eli, E.; Wang, D.; Xu, W.; Mamat, H.; Aysa, A.; Ubul, K. A comprehensive review of non-Latin natural scene text detection and recognition techniques. Eng. Appl. Artif. Intell. 2025, 156, 111107. [Google Scholar] [CrossRef]
  10. Liu, S.; Li, C.; Qiu, J.; Zhang, X.; Huang, F.; Zhang, L.; Hei, Y.; Yu, P. The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs. arXiv 2025. [Google Scholar] [CrossRef]
  11. Jamil, S. Artificial Intelligence and Journalistic Practice: The Crossroads of Obstacles and Opportunities for the Pakistani Journalists. J. Pract. 2020, 15, 1400–1422. [Google Scholar] [CrossRef]
  12. Al-Zoubi, O.A.; Ahmad, N.; Abdul Hamid, N.A. Artificial Intelligence in Newsrooms: Case Study on Al-Mamlaka TV. J. Komun. Malays. J. Commun. 2025, 41, 35–51. [Google Scholar] [CrossRef]
  13. Cools, H.; Diakopoulos, N. Uses of Generative AI in the Newsroom: Mapping Journalists’ Perceptions of Perils and Possibilities. J. Pract. 2024, 1–19. [Google Scholar] [CrossRef]
  14. Abuyaman, O. Strengths and Weaknesses of ChatGPT Models for Scientific Writing About Medical Vitamin B12: Mixed Methods Study. JMIR Form. Res. 2023, 7, e49459. [Google Scholar] [CrossRef]
  15. Canavilhas, J.; Ioscote, F.; Gonçalves, A. Artificial Intelligence as an Opportunity for Journalism: Insights from the Brazilian and Portuguese Media. Soc. Sci. 2024, 13, 590. [Google Scholar] [CrossRef]
  16. AlSagri, H.S.; Farhat, F.; Sohail, S.S.; Saudagar, A.K.J. ChatGPT or Gemini: Who Makes the Better Scientific Writing Assistant? J. Acad. Ethics 2025, 23, 1121–1135. [Google Scholar] [CrossRef]
  17. Arets, D.; Brugman, M.; de Cooker, J. AI-Powered Editorial Systems and Organizational Changes. SMPTE Motion Imaging J. 2024, 133, 58–65. [Google Scholar] [CrossRef]
  18. Adjin-Tettey, T.D.; Muringa, T.; Danso, S.; Zondi, S. The Role of Artificial Intelligence in Contemporary Journalism Practice in Two African Countries. J. Media 2024, 5, 846–860. [Google Scholar] [CrossRef]
  19. Amigo, L.; Porlezza, C. “Journalism Will Always Need Journalists.” The Perceived Impact of AI on Journalism Authority in Switzerland. J. Pract. 2025, 19, 2266–2284. [Google Scholar] [CrossRef]
  20. Al-Zoubi, O.; Ahmad, N.; Abdul Hamid, N.A. Artificial Intelligence in Newsrooms: Ethical Challenges Facing Journalists. Stud. Media Commun. 2023, 12, 401–409. [Google Scholar] [CrossRef]
  21. Bien-Aimé, S.; Wu, M.; Appelman, A.; Jia, H. Who Wrote It? News Readers’ Sensemaking of AI/Human Bylines. Commun. Rep. 2025, 38, 46–58. [Google Scholar] [CrossRef]
  22. Breazu, P.; Katsos, N. ChatGPT-4 as a journalist: Whose perspectives is it reproducing? Discourse Soc. 2024, 35, 687–707. [Google Scholar] [CrossRef]
  23. Dong, Y.; Yu, X.; Alharbi, A.; Ahmad, S. AI-based production and application of English multimode online reading using multi-criteria decision support system. Soft Comput. 2022, 26, 10927–10937. [Google Scholar] [CrossRef]
  24. Peifer, J.T.; Myrick, J.G. Risky satire: Examining how a traditional news outlet’s use of satire can affect audience perceptions and future engagement with the news source. Journalism 2021, 22, 1629–1646. [Google Scholar] [CrossRef]
  25. Lang, G.; Triantoro, T.; Sharp, J.H. Large Language Models as AI-Powered Educational Assistants: Comparing GPT-4 and Gemini for Writing Teaching Cases. J. Inf. Syst. Educ. 2024, 35, 390–407. [Google Scholar] [CrossRef]
  26. Lookadoo, K.; Moore, S.; Wright, C.; Hemby, V.; McCool, L.B. AI-Based Writing Assistants in Business Education: A Cross-Institutional Study on Student Perspectives. Bus. Prof. Commun. Q. 2025, 23294906241310415. [Google Scholar] [CrossRef]
  27. Helgeson, S.A.; Johnson, P.W.; Gopikrishnan, N.; Koirala, T.; Moreno-Franco, P.; Carter, R.E.; Quicksall, Z.S.; Burger, C.D. Human Reviewers’ Ability to Differentiate Human-Authored or Artificial Intelligence–Generated Medical Manuscripts: A Randomized Survey Study. Mayo Clin. Proc. 2025, 100, 622–633. [Google Scholar] [CrossRef]
  28. Cloudy, J.; Banks, J.; Bowman, N.D. The Str(AI)ght Scoop: Artificial Intelligence Cues Reduce Perceptions of Hostile Media Bias. Digit. J. 2023, 11, 1577–1596. [Google Scholar] [CrossRef]
  29. Abdulaal, R.M.S.; Makki, A.A.; Al-Madi, E.M.; Qhadi, A.M. Prioritizing Strategic Objectives and Projects in Higher Education Institutions: A New Hybrid Fuzzy MEREC-G-TOPSIS Approach. IEEE Access 2024, 12, 89735–89753. [Google Scholar] [CrossRef]
  30. Akhtar, M.N.; Haleem, A.; Javaid, M.; Vasif, M. Understanding Medical 4.0 implementation through enablers: An integrated multi-criteria decision-making approach. Inform. Health 2024, 1, 29–39. [Google Scholar] [CrossRef]
  31. Akram, M.; Zahid, K.; Deveci, M. Multi-criteria group decision-making for optimal management of water supply with fuzzy ELECTRE-based outranking method. Appl. Soft Comput. 2023, 143, 110403. [Google Scholar] [CrossRef]
  32. Cheng, H. Advancing Theater and Vocal Research in Chinese Opera for Role-Centric Acoustic and Speech Studies through Fuzzy Topsis Evaluation. IEEE Access 2025. [Google Scholar] [CrossRef]
  33. Karjust, K.; Mehrparvar, M.; Kaganski, S.; Raamets, T. Development of a Sustainability-Oriented KPI Selection Model for Manufacturing Processes. Sustainability 2025, 17, 6374. [Google Scholar] [CrossRef]
  34. Yalcin, G.C. Development of a Fuzzy-Based Decision Support System for Sustainable Tractor Selection in Green Ports. Facta Univ. Ser. Mech. Eng. 2025, 23, 579–604. [Google Scholar] [CrossRef]
  35. Nguyen, P.H.; Tran, T.H.; Thi Nguyen, L.A.; Pham, H.A.; Pham, M.A. Streamlining apartment provider evaluation: A spherical fuzzy multi-criteria decision-making model. Heliyon 2023, 9, e22353. [Google Scholar] [CrossRef]
  36. Rana, H.S.; Umer, M.; Hassan, U.; Asgher, U.; Silva-Aravena, F.; Ehsan, N. Application of Fuzzy Topsis for Prioritization of Patients on Elective Surgeries Waiting List—A Novel Multi-Criteria Decision-Making Approach. Decis. Mak. Appl. Manag. Eng. 2023, 6, 603–630. [Google Scholar] [CrossRef]
  37. Tarafdar, A.; Shaikh, A.; Ali, M.N.; Haldar, A. An integrated fuzzy decision-making framework for autonomous mobile robot selection: Balancing subjective and objective measures with fuzzy TOPSIS and picture fuzzy CoCoSo approach. J. Oper. Res. Soc. 2025, 1–27. [Google Scholar] [CrossRef]
  38. Tran, N.T.; Trinh, V.L.; Chung, C.K. An Integrated Approach of Fuzzy AHP-TOPSIS for Multi-Criteria Decision-Making in Industrial Robot Selection. Processes 2024, 12, 1723. [Google Scholar] [CrossRef]
  39. Tzeng, G.-H.; Huang, J.-J. Multiple Attribute Decision Making: Methods and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2011. [Google Scholar]
  40. Jahanshahloo, G.R.; Lotfi, F.H.; Izadikhah, M. An algorithmic method to extend TOPSIS for decision-making problems with interval data. Appl. Math. Comput. 2006, 175, 1375–1384. [Google Scholar] [CrossRef]
  41. Opricovic, S.; Tzeng, G.H. Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS. Eur. J. Oper. Res. 2004, 156, 445–455. [Google Scholar] [CrossRef]
  42. Mardani, A.; Zavadskas, E.K.; Khalifah, Z.; Zakuan, N.; Jusoh, A.; Nor, K.M.; Khoshnoudi, M. A review of multi-criteria decision-making applications to solve energy management problems: Two decades from 1995 to 2015. Renew. Sustain. Energy Rev. 2017, 71, 216–256. [Google Scholar] [CrossRef]
  43. Antunes, J.; Tan, Y.; Wanke, P. Analyzing Chinese banking performance with a trigonometric envelopment analysis for ideal solutions model. IMA J. Manag. Math. 2024, 35, 379–401. [Google Scholar] [CrossRef]
  44. Arslan, A.E.; Arslan, O.; Yerel, S.Y. AHP–TOPSIS hybrid decision-making analysis: Simav integrated system case study. J. Therm. Anal. Calorim. 2021, 145, 1191–1202. [Google Scholar] [CrossRef]
  45. Kumar, R.; Singh, S.; Bilga, P.S.; Jatin; Singh, J.; Singh, S.; Scutaru, M.L.; Pruncu, C.I. Revealing the benefits of entropy weights method for multi-objective optimization in machining operations: A critical review. J. Mater. Res. Technol. 2021, 10, 1471–1492. [Google Scholar] [CrossRef]
  46. Rao, R.V. Evaluation of environmentally conscious manufacturing programs using multiple attribute decision-making methods. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2008, 222, 441–451. [Google Scholar] [CrossRef]
  47. Radovanović, M.; Božanić, D.; Tešić, D.; Puška, A.; Hezam Al-Mishnanah, I.; Jana, C. Application of Hybrid DIBR-FUCOM-LMAW-Bonferroni-grey-EDAS model in multicriteria decision-making. Facta Univ. Ser. Mech. Eng. 2023, 21, 387–403. [Google Scholar] [CrossRef]
  48. Stević, Ž.; Tanackov, I.; Subotić, M. Evaluation of Road Sections in Order Assessment of Traffic Risk: Integrated FUCOM–MARCOS Model. In Proceedings of the 1st International Conference on Challenges and New Solutions in Industrial Engineering and Management and Accounting, Sari, Iran, 16 July 2020. [Google Scholar]
  49. Chodha, V.; Dubey, R.; Kumar, R.; Singh, S.; Kaur, S. Selection of industrial arc welding robot with TOPSIS and Entropy MCDM techniques. Mater. Today Proc. 2022, 50, 709–715. [Google Scholar] [CrossRef]
  50. Jahan, A.; Mustapha, F.; Ismail, M.Y.; Sapuan, S.M.; Bahraminasab, M. A comprehensive VIKOR method for material selection. Materials & Design 2011, 32, 1215–1221. [Google Scholar] [CrossRef]
  51. Zavadskas, E.K.; Antucheviciene, J.; Saparauskas, J.; Turskis, Z. MCDM methods WASPAS and MULTIMOORA: Verification of robustness of methods when assessing alternative solutions. Econ. Comput. Econ. Cybern. Stud. Res. 2013, 47, 5. [Google Scholar]
  52. Ghorabaee, M.K.; Zavadskas, E.; Olfat, L.; Turskis, Z.J.I. Multi-Criteria Inventory Classification Using a New Method of Evaluation Based on Distance from Average Solution (EDAS). Informatica 2015, 26, 435–451. [Google Scholar] [CrossRef]
  53. Stevic, Ž.; Brković, N. A Novel Integrated FUCOM-MARCOS Model for Evaluation of Human Resources in a Transport Company. Logistics 2020, 4, 4. [Google Scholar] [CrossRef]
  54. Kumar, R.; Goel, P.; Zavadskas, E.K.; Stevic, Ž.; Vujovic, V. A New Joint Strategy for Multi-Criteria Decision-Making: A Case Study for Prioritizing Solid-State Drive. Int. J. Comput. Commun. Control 2022, 17, 5010. [Google Scholar] [CrossRef]
  55. Chen, X.H.; Xu, X.H.; Zeng, J.H. Method of multi-attribute large group decision making based on entropy weight. Syst. Eng. Electron. 2007, 29, 1086–1089. [Google Scholar]
  56. Chakraborty, S.; Chakraborty, S. A Scoping Review on the Applications of MCDM Techniques for Parametric Optimization of Machining Processes. Arch. Comput. Methods Eng. 2022, 29, 4165–4186. [Google Scholar] [CrossRef]
  57. Diakoulaki, D.; Mavrotas, G.; Papayannakis, L. Determining objective weights in multiple criteria problems: The critic method. Comput. Oper. Res. 1995, 22, 763–770. [Google Scholar] [CrossRef]
  58. Keshavarz-Ghorabaee, M.; Amiri, M.; Zavadskas, E.K.; Turskis, Z.; Antucheviciene, J. Determination of objective weights using a new method based on the removal effects of criteria (Merec). Symmetry 2021, 13, 525. [Google Scholar] [CrossRef]
  59. Kaur, S.; Kumar, R.; Singh, K. Sustainable Component-Level Prioritization of PV Panels, Batteries, and Converters for Solar Technologies in Hybrid Renewable Energy Systems Using Objective-Weighted MCDM Models. Energies 2025, 18, 5410. [Google Scholar] [CrossRef]
  60. Yager, R.R. On generalized Bonferroni mean operators for multi-criteria aggregation. Int. J. Approx. Reason. 2009, 50, 1279–1286. [Google Scholar] [CrossRef]
  61. Beliakov, G.; James, S.; Mordelová, J.; Rückschlossová, T.; Yager, R.R. Generalized Bonferroni mean operators in multi-criteria aggregation. Fuzzy Sets Syst. 2010, 161, 2227–2242. [Google Scholar] [CrossRef]
  62. Stević, Ž.; Pamučar, D.; Puška, A.; Chatterjee, P. Sustainable supplier selection in healthcare industries using a new MCDM method: Measurement of alternatives and ranking according to COmpromise solution (MARCOS). Comput. Ind. Eng. 2020, 140, 106231. [Google Scholar] [CrossRef]
  63. Stanković, M.; Stevic, Z.; Das, D.K.; Subotić, M.; Pamucar, D. A new fuzzy marcos method for road traffic risk analysis. Mathematics 2020, 8, 457. [Google Scholar] [CrossRef]
  64. Chen, C.T. Extensions of the TOPSIS for group decision-making under fuzzy environment. Fuzzy Sets Syst. 2000, 114, 1–9. [Google Scholar] [CrossRef]
  65. Yurdakul, M.; Çoǧun, C. Development of a multi-attribute selection procedure for non-traditional machining processes. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2003, 217, 993–1009. [Google Scholar] [CrossRef]
  66. Opricovic, S. Programski paket VIKOR za višekriterijumsko kompromisno rangiranje. In Proceedings of the 17th International Symposium on Operational Research SYM-OP-IS, Kupari, Yugoslavia, 9–12 October 1990. [Google Scholar]
  67. Huang, J.-J.; Tzeng, G.-H.; Liu, H.-H. A Revised VIKOR Model for Multiple Criteria Decision Making—The Perspective of Regret Theory. In Proceedings of the Cutting-Edge Research Topics on Multiple Criteria Decision Making, Chengdu, China, 21–26 June 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 761–768. [Google Scholar]
  68. Tavares, R.O.; Maciel, J.H.R.D.; De Vasconcelos, J.A. The a posteriori decision in multiobjective optimization problems with smarts, promethee II, and a fuzzy algorithm. IEEE Trans. Magn. 2006, 42, 1139–1142. [Google Scholar] [CrossRef]
  69. Zavadskas, E.K.; Turskis, Z.; Antucheviciene, J.; Zakarevicius, A. Optimization of weighted aggregated sum product assessment. Elektron. Elektrotech. 2012, 122, 3–6. [Google Scholar] [CrossRef]
  70. Aloa. Best Long-form AI Content Writers 2025: ChatGPT vs. Claude vs. Jasper|Features, Pricing, Enterprise Solutions. 2025. Available online: https://aloa.co/ai/comparisons/ai-writing-comparison/best-long-form-ai-content-writers/ (accessed on 21 September 2025).
  71. Matt, O.B. Chatbots Sometimes Make Things Up. Is AI’s Hallucination Problem Fixable? The Associated Press: New York, NY, USA, 2023; Available online: https://apnews.com/article/artificial-intelligence-hallucination-chatbots-chatgpt-falsehoods-ac4672c5b06e6f91050aa46ee731bcf4 (accessed on 21 September 2025).
  72. Rezolve.ai. Claude vs. GPT-4: A Comprehensive Comparison of AI Language Models. 2025. Available online: https://www.rezolve.ai/blog/claude-vs-gpt-4 (accessed on 22 September 2025).
  73. Koray, K. Gemini 2.5: Our Most intelligent AI Model. 2025. Available online: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/ (accessed on 24 September 2025).
  74. Google. Gemini. 2025. Available online: https://deepmind.google/models/gemini/ (accessed on 20 September 2025).
  75. Microsoft Learn. Customize Microsoft 365 Copilot with Copilot Tuning. 2025. Available online: https://learn.microsoft.com/en-us/copilot/microsoft-365/copilot-tuning-process (accessed on 25 September 2025).
  76. Reddit. Are Citations in Copilot Chat Now a Paid Feature? 2025. Available online: https://www.reddit.com/r/bing/comments/1gwlbrc/are_citations_in_copilot_chat_now_a_paid_feature/ (accessed on 26 September 2025).
  77. Jess, S. Chatsonic by Writesonic—Company and Statistical Insights. 2025. Available online: https://originality.ai/blog/chatsonic-statistics (accessed on 20 September 2025).
  78. Arsh, T. The Truth About AI Writers in 2025: I Spent $138 and 40 Hours Testing Jasper, Writesonic & Copy.ai So You Don’t Have To; Volture Luxe (Medium): San Francisco, CA, USA, 2025; Available online: https://medium.com/@arshthakkar127/the-truth-about-ai-writers-in-2025-i-spent-138-and-40-hours-testing-jasper-writesonic-copy-ai-5c836f8e7db5 (accessed on 22 September 2025).
  79. Jasper Help Center. Brand Voice. 2025. Available online: https://help.jasper.ai/hc/en-us/articles/18618693085339-Brand-Voice (accessed on 21 September 2025).
  80. Fahimai. Writesonic vs. Grammarly: Head-to-Head Comparison in 2025. 2025. Available online: https://www.fahimai.com/writesonic-vs-grammarly (accessed on 27 September 2025).
  81. 10Web. GrammarlyGO Review: Features, Pros, and Cons. 2025. Available online: https://10web.io/ai-tools/grammarlygo/ (accessed on 27 September 2025).
  82. Quantum IT Innovation. Top ChatGPT Alternatives in 2025: Smarter AI Tools for Every Need. 2025. Available online: https://quantumitinnovation.com/blog/top-chatgpt-alternatives (accessed on 21 September 2025).
  83. Ananny, M.; Karr, J. How Media Unions Stabilize Technological Hype Tracing Organized Journalism’s Discursive Constructions of Generative Artificial Intelligence. Digit. J. 2025, 1–21. [Google Scholar] [CrossRef]
Figure 1. Methodology flowchart for Prioritizing Generative AI Co-Writing Tools for Newsrooms: MARCOS and comparative MCDM approaches.
Figure 2. Pareto chart of fused weights highlighting dominant newsroom criteria (C8, C13).
Figure 3. Comparative rankings of generative AI co-writing tools for newsrooms using multiple MCDM techniques.
Figure 4. MARCOS sensitivity ranks across 18 scenarios.
Table 1. Decision Matrix: Generative AI co-writing solutions considered for newsrooms.

Tool   C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14  C15  C16  Refs.
CPT    8   8   9   7   6   9   8   2   5   7    5    9    7    8    8    9    [70,71]
CLD    9   9   7   8   6   8   9   2   6   8    6    8    5    7    7    7    [72]
GEM    8   9   9   8   9   9   8   3   6   8    6    9    8    9    8    8    [73,74]
CPL    8   9   8   7   9   8   8   3   5   9    5    9    3    9    9    8    [75,76]
PPX    8   7   6   7   5   8   7   9   7   6    8    7    3    7    6    5    [77]
JSP    7   8   9   7   7   7   7   2   8   7    4    8    7    8    8    8    [78,79]
WRT    6   7   7   8   7   7   6   8   8   6    7    7    6    8    6    6    [80]
QBG    7   8   6   9   9   9   8   1   9   8    9    9    4    9    8    8    [81]
COS    6   7   6   5   5   4   4   2   4   9    9    5    9    6    5    9    [82]
Type   B   B   B   B   B   B   B   B   B   B    C    B    B    B    B    B

B = Benefit-type criterion (higher values are preferred); C = Cost-type criterion (lower values are preferred). Criterion C11 represents operational/licensing costs and is the only cost-type criterion in the evaluation.
Table 2. Objective and fused weights: Results for AI co-writing tools.

Criterion  Entropy  Std. Dev.  CRITIC   MEREC    CILOS    Bonferroni-Fused Weights
C1         0.0153   0.0427     0.0396   0.0371   0.0523   0.0298
C2         0.0095   0.0365     0.0265   0.0501   0.0483   0.0245
C3         0.0261   0.0561     0.0465   0.0408   0.0529   0.0398
C4         0.0198   0.0471     0.0410   0.0358   0.0535   0.0330
C5         0.0456   0.0698     0.0539   0.0519   0.0575   0.0542
C6         0.0391   0.0666     0.0500   0.0549   0.0529   0.0501
C7         0.0377   0.0624     0.0456   0.0412   0.0558   0.0461
C8         0.4545   0.1211     0.1933   0.2134   0.1659   0.2894
C9         0.0549   0.0702     0.0824   0.0745   0.0632   0.0662
C10        0.0184   0.0476     0.0572   0.0479   0.0517   0.0371
C11        0.0652   0.0762     0.0649   0.0734   0.0626   0.0684
C12        0.0261   0.0574     0.0369   0.0515   0.0502   0.0393
C13        0.1188   0.0912     0.1310   0.0999   0.0767   0.1112
C14        0.0149   0.0444     0.0317   0.0446   0.0493   0.0294
C15        0.0273   0.0548     0.0360   0.0312   0.0547   0.0368
C16        0.0268   0.0561     0.0633   0.0519   0.0524   0.0446
Table 3. MARCOS-based prioritization of generative AI co-writing tools for newsrooms.

Extended Normalized Matrix
Criterion  CPT     CLD     GEM     CPL     PPX     JSP     WRT     QBG     COS     S* (Ideal)  S- (Anti-Ideal)
C1         0.8889  1.0000  0.8889  0.8889  0.8889  0.7778  0.6667  0.7778  0.6667  1.0000      0.6667
C2         0.8889  1.0000  1.0000  1.0000  0.7778  0.8889  0.7778  0.8889  0.7778  1.0000      0.7778
C3         1.0000  0.7778  1.0000  0.8889  0.6667  1.0000  0.7778  0.6667  0.6667  1.0000      0.6667
C4         0.7778  0.8889  0.8889  0.7778  0.7778  0.7778  0.8889  1.0000  0.5556  1.0000      0.5556
C5         0.6667  0.6667  1.0000  1.0000  0.5556  0.7778  0.7778  1.0000  0.5556  1.0000      0.5556
C6         1.0000  0.8889  1.0000  0.8889  0.8889  0.7778  0.7778  1.0000  0.4444  1.0000      0.4444
C7         0.8889  1.0000  0.8889  0.8889  0.7778  0.7778  0.6667  0.8889  0.4444  1.0000      0.4444
C8         0.2222  0.2222  0.3333  0.3333  1.0000  0.2222  0.8889  0.1111  0.2222  1.0000      0.1111
C9         0.5556  0.6667  0.6667  0.5556  0.7778  0.8889  0.8889  1.0000  0.4444  1.0000      0.4444
C10        0.7778  0.8889  0.8889  1.0000  0.6667  0.7778  0.6667  0.8889  1.0000  1.0000      0.6667
C11        0.8000  0.6667  0.6667  0.8000  0.5000  1.0000  0.5714  0.4444  0.4444  1.0000      0.4444
C12        1.0000  0.8889  1.0000  1.0000  0.7778  0.8889  0.7778  1.0000  0.5556  1.0000      0.5556
C13        0.7778  0.5556  0.8889  0.3333  0.3333  0.7778  0.6667  0.4444  1.0000  1.0000      0.3333
C14        0.8889  0.7778  1.0000  1.0000  0.7778  0.8889  0.8889  1.0000  0.6667  1.0000      0.6667
C15        0.8889  0.7778  0.8889  1.0000  0.6667  0.8889  0.6667  0.8889  0.5556  1.0000      0.5556
C16        1.0000  0.7778  0.8889  0.8889  0.5556  0.8889  0.6667  0.8889  1.0000  1.0000      0.5556

Weighted Normalized Matrix
Criterion  CPT     CLD     GEM     CPL     PPX     JSP     WRT     QBG     COS     S* (Ideal)  S- (Anti-Ideal)
C1         0.0265  0.0298  0.0265  0.0265  0.0265  0.0232  0.0199  0.0232  0.0199  0.0298      0.0199
C2         0.0218  0.0245  0.0245  0.0245  0.0191  0.0218  0.0191  0.0218  0.0191  0.0245      0.0191
C3         0.0398  0.0310  0.0398  0.0354  0.0265  0.0398  0.0310  0.0265  0.0265  0.0398      0.0265
C4         0.0257  0.0293  0.0293  0.0257  0.0257  0.0257  0.0293  0.0330  0.0183  0.0330      0.0183
C5         0.0361  0.0361  0.0542  0.0542  0.0301  0.0422  0.0422  0.0542  0.0301  0.0542      0.0301
C6         0.0501  0.0445  0.0501  0.0445  0.0445  0.0390  0.0390  0.0501  0.0223  0.0501      0.0223
C7         0.0410  0.0461  0.0410  0.0410  0.0359  0.0359  0.0307  0.0410  0.0205  0.0461      0.0205
C8         0.0643  0.0643  0.0965  0.0965  0.2894  0.0643  0.2573  0.0322  0.0643  0.2894      0.0322
C9         0.0368  0.0441  0.0441  0.0368  0.0515  0.0589  0.0589  0.0662  0.0294  0.0662      0.0294
C10        0.0289  0.0330  0.0330  0.0371  0.0247  0.0289  0.0247  0.0330  0.0371  0.0371      0.0247
C11        0.0547  0.0456  0.0456  0.0547  0.0342  0.0684  0.0391  0.0304  0.0304  0.0684      0.0304
C12        0.0393  0.0349  0.0393  0.0393  0.0306  0.0349  0.0306  0.0393  0.0218  0.0393      0.0218
C13        0.0865  0.0618  0.0989  0.0371  0.0371  0.0865  0.0741  0.0494  0.1112  0.1112      0.0371
C14        0.0261  0.0229  0.0294  0.0294  0.0229  0.0261  0.0261  0.0294  0.0196  0.0294      0.0196
C15        0.0327  0.0286  0.0327  0.0368  0.0245  0.0327  0.0245  0.0327  0.0204  0.0368      0.0204
C16        0.0446  0.0347  0.0396  0.0396  0.0248  0.0396  0.0297  0.0396  0.0446  0.0446      0.0248

Aggregate Weighted Scores (Si) and Utility Scores (Ui)
Si         0.6549  0.6113  0.7246  0.6591  0.7480  0.6678  0.7762  0.6020  0.5356  1.0000      0.3971
Ui         0.6549  0.6113  0.7246  0.6591  0.7480  0.6678  0.7762  0.6020  0.5356
Rank       6       7       3       5       2       4       1       8       9
Table 4. Spearman and Kendall correlation matrices: Ranks by multiple MCDM techniques.

Spearman Correlation Matrix
              MARCOS   TOPSIS   VIKOR    PROMETHEE-II  WASPAS   EDAS
MARCOS        1        0.8833   0.9833   0.0167        0.9833   0.9833
TOPSIS                 1        0.9000   −0.1667       0.9000   0.9000
VIKOR                           1        0.0833        1.0000   1
PROMETHEE-II                             1             0.0833   0.0833
WASPAS                                                 1        1
EDAS                                                            1

Kendall Tau Correlation Matrix
              MARCOS   TOPSIS   VIKOR    PROMETHEE-II  WASPAS   EDAS
MARCOS        1
TOPSIS        0.7778   1
VIKOR         0.9444   0.8333   1
PROMETHEE-II  0.0556   −0.0556  0.1111   1
WASPAS        0.9444   0.8333   1.0000   0.1111        1
EDAS          0.9444   0.8333   1.0000   0.1111        1        1
Table 5. Sensitivity analysis under criterion-weight perturbations using the MARCOS method.

Scenario  Perturbation      C1      C2     C3     C4      C5     C6     C7      C8     C9     C10    C11    C12    C13    C14    C15    C16
ScW-1     S-4 (C4) − 0.15   0.034   0.028  0.046  −0.117  0.063  0.058  0.053   0.334  0.077  0.043  0.079  0.045  0.129  0.034  0.043  0.052
ScW-2     S-4 (C4) − 0.10   0.033   0.027  0.044  −0.067  0.060  0.055  0.051   0.319  0.073  0.041  0.076  0.043  0.123  0.032  0.041  0.049
ScW-3     S-4 (C4) − 0.05   0.031   0.026  0.042  −0.017  0.057  0.053  0.049   0.304  0.070  0.039  0.072  0.041  0.117  0.031  0.039  0.047
ScW-4     S-4 (C4) + 0.05   0.028   0.023  0.038  0.083   0.051  0.048  0.044   0.275  0.063  0.035  0.065  0.037  0.106  0.028  0.035  0.042
ScW-5     S-4 (C4) + 0.10   0.027   0.022  0.036  0.133   0.049  0.045  0.041   0.260  0.059  0.033  0.061  0.035  0.100  0.026  0.033  0.040
ScW-6     S-4 (C4) + 0.15   0.025   0.021  0.034  0.183   0.046  0.042  0.039   0.245  0.056  0.031  0.058  0.033  0.094  0.025  0.031  0.038
ScW-7     S-1 (C1) − 0.15   −0.120  0.028  0.046  0.038   0.063  0.058  0.053   0.334  0.076  0.043  0.079  0.045  0.128  0.034  0.043  0.052
ScW-8     S-1 (C1) − 0.10   −0.070  0.027  0.044  0.036   0.060  0.055  0.051   0.319  0.073  0.041  0.076  0.043  0.123  0.032  0.041  0.049
ScW-9     S-1 (C1) − 0.05   −0.020  0.026  0.042  0.035   0.057  0.053  0.049   0.304  0.070  0.039  0.072  0.041  0.117  0.031  0.039  0.047
ScW-10    S-1 (C1) + 0.05   0.080   0.023  0.038  0.031   0.051  0.048  0.044   0.275  0.063  0.035  0.065  0.037  0.106  0.028  0.035  0.042
ScW-11    S-1 (C1) + 0.10   0.130   0.022  0.036  0.030   0.049  0.045  0.041   0.260  0.059  0.033  0.061  0.035  0.100  0.026  0.033  0.040
ScW-12    S-1 (C1) + 0.15   0.180   0.021  0.034  0.028   0.046  0.042  0.039   0.245  0.056  0.031  0.058  0.033  0.094  0.025  0.031  0.038
ScW-13    S-7 (C7) − 0.15   0.035   0.028  0.046  0.038   0.063  0.058  −0.104  0.335  0.077  0.043  0.079  0.046  0.129  0.034  0.043  0.052
ScW-14    S-7 (C7) − 0.10   0.033   0.027  0.044  0.037   0.060  0.055  −0.054  0.320  0.073  0.041  0.076  0.043  0.123  0.033  0.041  0.049
ScW-15    S-7 (C7) − 0.05   0.031   0.026  0.042  0.035   0.057  0.053  −0.004  0.305  0.070  0.039  0.072  0.041  0.117  0.031  0.039  0.047
ScW-16    S-7 (C7) + 0.05   0.028   0.023  0.038  0.031   0.051  0.048  0.096   0.274  0.063  0.035  0.065  0.037  0.105  0.028  0.035  0.042
ScW-17    S-7 (C7) + 0.10   0.027   0.022  0.036  0.030   0.049  0.045  0.146   0.259  0.059  0.033  0.061  0.035  0.100  0.026  0.033  0.040
ScW-18    S-7 (C7) + 0.15   0.025   0.021  0.034  0.028   0.046  0.042  0.196   0.244  0.056  0.031  0.058  0.033  0.094  0.025  0.031  0.038
Table 6. Consensus ranking of alternatives using Borda Count, Copeland’s Method, and Stability Index with robustness indicators.

Alternative  Borda Rank  Copeland Rank  Stability Index  Avg. Rank  Best Rank  Worst Rank  #Wins
WRT          1           1              0.9              1.6        1          7           9
GEM          2           3              1.0              2.7        1          3           1
PPX          3           2              0.9              2.7        2          8           0
CPL          4           4              0.1              3.9        2          5           0
JSP          5           5              0.0              5.0        4          6           0
CPT          6           6              0.1              5.6        3          7           0
CLD          7           7              0.0              6.8        5          8           0
QBG          8           8              0.0              7.6        4          9           0
COS          9           9              0.0              8.7        6          9           0