Supporting Newsrooms with Journalistic Knowledge Graph Platforms: Current State and Future Directions †

Abstract: Increasing competition and loss of revenues force newsrooms to explore new digital solutions. The new solutions employ artificial intelligence and big data techniques such as machine learning and knowledge graphs to manage and support the knowledge work needed in all stages of news production. The result is an emerging type of intelligent information system we have called the Journalistic Knowledge Platform (JKP). In this paper, we analyse for the first time knowledge graph-based JKPs in research and practice. We focus on their current state, challenges, opportunities and future directions. Our analysis is based on 14 platforms reported in research carried out in collaboration with news organisations and industry partners and our experiences with developing knowledge graph-based JKPs along with an industry partner. We found that: (a) the most central contribution of JKPs so far is to automate metadata annotation and monitoring tasks; (b) they also increasingly contribute to improving background information and content analysis, speeding up newsroom workflows and providing newsworthy insights; (c) future JKPs need better mechanisms to extract information from textual and multimedia news items; (d) JKPs can provide a digitalisation path towards reduced production costs and improved information quality while adapting the current workflows of newsrooms to new forms of journalism and readers’ demands.


Introduction
News agencies and news organisations are under pressure from the loss of advertisement and revenues [1,2], and face an audience that is less willing to pay for digital content [3,4]. Despite an increase in digital consumption, information is no longer consumed from a limited number of TV stations and news outlets. Instead, readers can access and contrast fresh, first-hand information from freely available sources on the internet and social media at any time. As a consequence of their freedom of choice, readers demand high-quality journalism [5] and trusted sources [4,6,7].
In response, news agencies and news organisations are constantly adapting their business models to digital media innovations in order to improve information quality, competitiveness and growth [8]. Innovation and digitalisation of newsrooms are needed to increase the quality and lower the cost of news production, changing how journalists and readers interact with news content and background information [9]. Newsrooms are therefore embracing big data and artificial intelligence (AI) techniques such as knowledge graphs and machine learning (ML) for journalistic purposes [10,11] such as identifying and contextualising newsworthy events in investigative journalism; facilitating data visualisation in digital journalism; analysing information in data journalism; automating news writing in robot journalism; providing real-time fact-checking tools for political journalism. The result is an emerging type of intelligent information system that we call the Journalistic Knowledge Platform (JKP) which is currently gaining interest in research and practice. In this paper, we define JKPs as platforms that apply AI and big data to journalism in order to manage and support the knowledge work needed in all stages of news production.
JKPs can be described from a functional, an organisational and a technical perspective. From a functional point of view, JKPs automate the process of annotating metadata and support daily workflows like news production [12,13], archiving [14,15], management [16,17] and distribution [18–21]. JKPs harvest and analyse news and social media information over the net in real time [22], leverage encyclopaedic sources [23], and provide journalists with both meaningful background knowledge [24] and newsworthy information [25]. From an organisational viewpoint, JKPs are deployed in newsrooms to manage the knowledge needed to support journalists with creativity and discovery tasks. They are tailored to the particular digital strategies and editorial lines of each newsroom to improve news broadcast. JKPs also follow media standards to facilitate communication with customers and providers, and are subject to legal regulations such as data privacy. From a technical perspective, JKPs implement state-of-the-art AI technologies such as machine learning, natural language processing (NLP) and knowledge representation and reasoning. News-relevant information is represented in knowledge bases, which are exploited with data analysis, reasoning and information retrieval techniques to help journalists and readers dive more deeply into information, events and storylines. Today, knowledge graphs [26] are a topical technique for knowledge representation that continues to grow in importance. We therefore centre our analysis on JKPs that build on knowledge graphs.
According to Hogan et al. [26], knowledge graphs capture and abstract knowledge using graph-based data models, wherein entities of interest are represented as nodes and the relations between them as edges of the graph. They are particularly relevant for scenarios that integrate and extract value from diverse and dynamic data. Ontologies and rules are used to define the semantics and terms of the graph and reason about it, but also to ease data integration from, for example, Linked Open Data (LOD) [27] and existing large-scale knowledge graphs like Wikidata and DBpedia. Compared to relational and NoSQL models, knowledge graphs facilitate semantic integration, flexible data and schema evolution, and graph query languages with mechanisms to explore complex relations through arbitrary-length paths.
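As a minimal sketch of the graph model described above, the following Python fragment stores a few triples and explores relations through paths of arbitrary length. The entities, the relation names and the `find_path` helper are hypothetical and serve only as illustration; a deployed JKP would more likely rely on an RDF store and a graph query language.

```python
from collections import deque

# Hypothetical toy graph: (subject, relation, object) triples.
TRIPLES = [
    ("article_1", "mentions", "person_a"),
    ("person_a", "member_of", "organisation_x"),
    ("organisation_x", "located_in", "city_y"),
    ("article_2", "mentions", "city_y"),
]


def neighbours(node):
    """Yield (relation, target) pairs for the outgoing edges of a node."""
    for subject, relation, obj in TRIPLES:
        if subject == node:
            yield relation, obj


def find_path(start, goal):
    """Breadth-first search; returns the shortest list of edges, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in neighbours(node):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


# How is article_1 connected to city_y?
path = find_path("article_1", "city_y")
```

Here the path is not fixed to a given number of hops, which is the kind of exploration that is awkward to express over a fixed relational schema.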
In this article, we explore the current state and suggest future research directions for knowledge graph-based JKPs. We ask: "What challenges and opportunities for newsrooms have motivated the knowledge graph-based JKPs?" (RQ1), "How does the research on knowledge graph-based JKPs address these challenges and opportunities?" (RQ2) and "What are the most important open areas for research on knowledge graph-based JKPs?" (RQ3). To answer these three questions, we have performed a detailed analysis of 14 JKPs reported in the literature that apply AI and big data to journalism in order to manage and support the knowledge work needed in all stages of news production. A broader literature on related technologies exists. Our analysis does not ignore other solutions applying artificial intelligence to journalism, but our focus is on providing a comprehensive analysis of the main concepts of those JKPs that build on knowledge graphs rather than specific techniques, optimisations, tools and systems. The JKPs were selected in the context of a broader systematic literature review on how knowledge graphs can support news in a wide sense [28]. In contrast to the present study, Opdahl et al. [28] was not restricted to JKPs and did not analyse the challenges and opportunities nor the current and future directions of JKPs. We conducted a qualitative meta-analysis (see Appendix A for a detailed description of the meta-analysis method), and we examined the existing JKPs in light of our experiences with developing JKPs along with an industry partner for the international newsroom market [29].
The present article extends Gallofré Ocaña and Opdahl [30], which analyses challenges and opportunities for developing JKPs along six axes: stakeholders, information, functionalities, techniques, components and concerns. This article extends the analysis by considering more JKPs. It investigates how well the challenges and opportunities are covered in the research literature and suggests future research directions. The rest of the paper is organised as follows: we summarise the identified JKPs in Section 2; analyse the current challenges and opportunities for newsrooms that motivated JKPs in Section 3; present the state of research on JKPs in terms of their stakeholders, information, functionalities, techniques, components and concerns in Section 4; discuss the future directions for research on JKPs in Section 5.

Analysed Platforms
We identified 14 platforms that fit under our definition of JKPs, which we list in Table 1. The identified JKPs are described in a total of 28 papers by distinct research groups located in 11 different countries and working in collaboration with a variety of news agencies, news organisations and industry partners. The JKPs from 2000 to early 2010 implemented the Semantic Web idea [45] in newsrooms. These JKPs used semantic web technologies [46] to automate the metadata annotation process [16], combine different knowledge bases [24], and formalise media standards [14]. They used ontologies in NLP pipelines together with Linked Open Data (LOD) [27] resources from external knowledge bases (e.g., Wikipedia, DBpedia) to automatically annotate news archives and feeds with metadata about topics, keywords, categories and other relevant information (e.g., persons, places, organisations, sentiments and relations) [14,19]. The annotated information was stored in knowledge bases, facilitating the interlinking of news across different archives, online catalogues and external LOD repositories [16,24]. For instance, Neptuno was the first project to publish a journalistic ontology and adapt the IPTC topics [47] as RDF [14,48], and Troncy [49] converted the IPTC NewsCodes [50] into a SKOS [51] thesaurus and defined an OWL [52] ontology for the IPTC News Architecture [53]. The resulting systems provided services for supporting news creation [14], personalising news retrieval [18,20], facilitating semantic search [14,18,19,24,33], visualising ontologies [14], managing content [16], aggregating information [24,34] and recommending news [16].
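To illustrate what expressing a subject code list as a SKOS thesaurus can look like, the sketch below serialises a small hierarchy of codes as Turtle. It is not the conversion performed in the cited work: the codes, labels, base URI and the `to_skos` helper are hypothetical, and real IPTC NewsCodes are deliberately not reproduced. Only the SKOS terms `skos:Concept`, `skos:prefLabel` and `skos:broader` are taken from the SKOS vocabulary.

```python
# Hypothetical subject codes standing in for a media code list.
CODES = [
    {"code": "topic01", "label": "sport", "parent": None},
    {"code": "topic02", "label": "football", "parent": "topic01"},
]


def to_skos(codes, base="http://example.org/codes/"):
    """Serialise a code list as a SKOS concept hierarchy in Turtle syntax."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for entry in codes:
        lines.append(f"<{base}{entry['code']}> a skos:Concept ;")
        # The label closes the statement unless a broader concept follows.
        terminator = " ;" if entry["parent"] else " ."
        lines.append(f'    skos:prefLabel "{entry["label"]}"@en{terminator}')
        if entry["parent"]:
            lines.append(f"    skos:broader <{base}{entry['parent']}> .")
    return "\n".join(lines)


turtle = to_skos(CODES)
```

Once the codes are concepts with explicit broader links, annotations that use a narrow code can also be retrieved through its broader code.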
The JKPs from the early 2010s until today focused on identifying and analysing events and advancing AI/ML for supporting journalism. In addition, some of them focused on scaling over large volumes of live streams of multimedia news [36], social media [39] and TV/radio broadcasts [17]. Similar to the previous JKPs, news items were annotated using media standards [35], LOD resources [37] or both [23], and stored in knowledge bases to facilitate cross-lingual information retrieval services through semantic technologies and ontologies [21,23]. These JKPs continuously monitored and curated the annotated items using AI/ML and LOD to provide relevant insights for journalists and identify current, past and future events. For example, the annotated news items were used to identify networks of actors [15], suggest news angles [13,54], automate news creation [22] and facilitate fact-checking [17], and the events were analysed using different AI/ML techniques for grouping events and news items [21,23], reasoning over events, and reconstructing the evolution of the events over time [15].

Challenges and Opportunities Facing Newsrooms
In current newsroom workflows, metadata annotation like tagging and categorisation is often performed manually by journalists. This is a time-consuming process that is error-prone, imprecise and restricts future usability [12]. The added metadata is reduced to a few general categories that are limited to authorship, dates, content language and news management information. This metadata is used to address newsworthiness and filter events according to news customers' and audiences' interests. However, due to the lack of fine-grained annotations, newsrooms have difficulties implementing high-quality information retrieval and filtering services [14,16,20]. As a result, these services return irrelevant, incomplete and even biased results to customers [21].
Journalists spend a lot of their time monitoring and filtering large volumes of news feeds like TV broadcasts, radio shows, social media and published news to keep up to date, time that otherwise would have been invested in producing news [42]. Today's worldwide daily news volumes exceed 100,000 articles, making it infeasible for journalists to manually handle tasks like fact-checking and searching for related articles. Germann et al. [41] (p. 1) claim that "each of [BBC] ca. 300 monitoring journalists usually keeps track up to 4 live sources in parallel (typically TV channels received via satellite), plus a number of other sources of information such as social media feeds". This is an undesirable situation for a business sector where time is a critical factor, because delays can lower the value of information and imply economic losses [35].
This massive volume of textual and multimedia data is often organised in different catalogues or databases and managed by external services [24,35]. Because these catalogues are not integrated, do not share a common schema and lack fine-grained annotations, they limit the possibilities for newsrooms to extract valuable insights and knowledge. Structuring the information and integrating the data from a variety of sources give newsrooms better ways to exploit data and facilitate the adoption of AI. For example, they can ease the implementation of information retrieval services and recommender systems, the automation of news creation processes, and the detection of fake news and newsworthy events.
To help with these processes, newsrooms currently use a mix of proprietary systems, external services, tools and in-house taxonomies or categorisation schemas that are challenging to integrate and operate together [35,55]. The result is a complex ecosystem of applications that hinders the expansion and evolution of digitally integrated newsrooms. It makes it difficult for managers to get an overview of what is happening in newsrooms [41], and it limits the interaction with customers [35]. Additionally, it can lead to vendor binding or dependence because of the difficulties of maintaining multiple and diverse proprietary solutions. Taken together with the urge to reduce costs, increase high-quality journalism and adapt current newsrooms to digital advances, these problems explain why journalists and newsrooms are becoming interested in the services that JKPs can offer [9].

State of Research on JKPs
We describe the state of research on JKPs by investigating the stakeholders, information, functionalities, techniques, components and concerns dealt with in the identified JKPs. These six analysis axes are based on a qualitative analysis reported in an earlier paper [30].

Stakeholders
JKPs provide services to and interact with a large variety of stakeholders. Figure 1 shows the identified stakeholders and their three top-level categories: general user, organisation and technical agent. The general users can be divided into the internal users that belong to newsrooms and the external ones. The internal users are news professionals like journalists who use JKPs for creating stories [35,39]; fact-checkers who conduct an essential task in combating fake news and misinformation [17]; archivists who keep the schemas and news archives up to date [14]; ICT professionals and knowledge engineers who develop and maintain JKPs [12]. The external users are the audience [21]; the customers to whom news agencies offer services; and researchers who investigate JKPs or use JKPs to analyse data, as in the SUMMA project where "[political scientists want] to perform data analyses based on large amounts of news reports" [42].
JKPs support organisations in different ways. The most direct is in the news agencies and news organisations where JKPs are deployed and adapted to particular digital strategies and purposes, but JKPs also support other news organisations that consume services from external JKPs. Moreover, JKPs provide services to both private and public organisations like governmental agencies that interact with or consume services from newsrooms, for example, the SUMMA project "provides media monitoring and analysis services to [. . . ] the British government" [42] (p. 1). JKPs also interact indirectly with the organisations responsible for controlling news media standards, vocabulary and ontologies (e.g., the IPTC organisation). This impacts how JKPs are designed because the work of many news agencies depends on those standards, and JKPs often need to build on and comply with them.
However, the media standards may not cover or fit the use cases of newsrooms, as in the NEWS project where "most of the NewsCodes defined by IPTC do not have alternative versions in different languages, only in English" [35] (p. 9). Hence, JKPs need to adapt or expand the media standards according to their needs.
Last but not least, the technical agent represents the JKPs and any system or technical infrastructure in newsrooms that support or interact with JKPs. A sub-type of the technical agent is the external system that communicates with newsroom services, like the customers' information systems [35].

Information
JKPs cover the whole news production pipeline from gathering information and news creation to knowledge exploitation and distribution. Table 2 lists the identified categories of information. JKPs deal with textual and multimedia news content produced by news agencies, news organisations and external sources that is managed and distributed to customers and audiences [12,14,15]. As textual data, we consider the raw text from any source like news articles, social media feeds, web pages, blogs, PDF files, biographies, reports, historical data and geopolitical data. As multimedia, we consider live broadcasts, photographs, audio files and video files. Moreover, news agencies produce and distribute content in different formats like plain text, Information Interchange Model (IIM), News Industry Text Format (NITF), NewsML and RDF [16,35].
News content is annotated and enriched with metadata using LOD, semantic vocabularies and ontologies, for example, the ASRAEL project "leverage[s] the Wikidata knowledge base to produce semantic annotations of news articles" [23] (p. 1). Metadata can describe different types of basic information like the authorship, language, creation time, ownership, media type, priority, status, version, keywords and categories; as well as inferred information like provenance, tone and sentiment, and the relevant persons, stories, locations, organisations and events [14,34,37].
Journalists and customers of newsrooms are highly interested in current events and their related information [12]. JKPs are also designed to support further information needs. General users want access to details about the stories (i.e., who, what, why, where and when), and want to identify networks of actors and implications, search the events based on their type or place, obtain facts, and retrieve evidence [15,16,24]. News professionals need access to news archives and knowledge bases for documentation purposes, finding connections to past events, following stories and identifying emerging topics [14,35,36,42]. Additionally, customers have different information needs depending on their business or interests, for example, "the press cabinet of a company is usually interested in news items talking about the company or its rivals, whereas a sports TV channel is interested mostly in news items describing sports events" [35] (p. 1).

Functionalities
JKPs provide different functionalities to their users. Table 3 lists the identified main functionalities.

Functionality | Explanation
News creation | The process to create a news story.
Verification | The process of checking the facts and claims.
Source selection | The ability to select the information sources of interest.
Monitoring | The ability to continuously distil information from sources.
Knowledge discovery | Functionalities for exploring relevant information.
Trends | The current newsworthy developments.
Alert | A notification.
Summarisation | Extracting and representing the key information from a larger text or group of texts.
Clustering | Grouping similar stories or events.
Business support | Functionalities to support management workflows.
Content management | Functionalities oriented to store, organise and distribute information.
Personalisation | Providing information according to the user's interests.

News professionals use JKPs for news creation. This creative process involves different tasks such as discovering, collecting, organising, contextualising and publishing [56,57]. JKPs guide news professionals in writing up their stories [29], support them with contextual background knowledge [12,13,29], provide the means for comparing current events with other events [23] and facilitate access to previous work for creating similar content for a different audience, region or language [42]. JKPs also support news professionals with verification [58] tasks like fact-checking [19,59], provenance [15], rights and authorship management [35]. These are typically time-consuming tasks for journalists and fact-checkers that JKPs automate [17].
Source selection and monitoring functionalities are common across the studied JKPs that harvest and store content from internal and external sources and monitor them in real-time [19,21,36,42]. These functionalities allow journalists to automatically follow and distil news and social media of interest and relieve them from these time-consuming tasks.
Knowledge discovery [60] is one of the most attractive functionalities of JKPs. It allows users to obtain news insights, analysis and relevant information. For instance, in NewsReader it "increases the user understanding of the domain, facilitates the reconstruction of news story lines, and enables users to perform exploratory investigation of news hidden facts" [15] (p. 1). Other interesting functionalities among the studied JKPs are trend identification, used to discover emerging topics, long-term developments and changes in events over time [21,37]; alerts to keep users up to date with the latest incoming items [19,31,41]; summarisation [61] of news stories and events to provide additional insights [21]; and clustering of story lines and events [23,42].
JKPs can be used as business support systems to manage and monitor internal newsroom production, news coverage and broadcast decisions [31,42]. This helps managers and editors in allocating resources, avoiding duplicate work and detecting news that can be relevant to different audiences. JKPs are also used for content management, which allows newsrooms to store, organise and distribute the daily produced content and metadata [14,16,35].
Most of these functionalities should be personalised and tailored to the stakeholders' needs. Hence, JKPs allow the personalisation of their functionalities according to users' preferences and profiles [12,18,33].
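One simple way to picture personalisation against a user profile is to rank news items by how many of their annotated topics match the user's interests. The sketch below assumes that items already carry topic annotations; the item structure, the topic names and the `rank_for_user` helper are hypothetical and do not describe how any of the studied JKPs implement personalisation.

```python
def rank_for_user(items, interests):
    """Order items by the number of topics shared with the user's interests."""
    scored = [(len(set(item["topics"]) & interests), item) for item in items]
    # Drop items without any overlap, then sort by descending overlap.
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in relevant]


# Hypothetical annotated items and user profile.
ITEMS = [
    {"id": 1, "topics": ["sports", "football"]},
    {"id": 2, "topics": ["politics"]},
    {"id": 3, "topics": ["sports"]},
]
ranked = rank_for_user(ITEMS, interests={"sports", "football"})
```

The quality of such a ranking depends directly on how fine-grained the underlying annotations are.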

Techniques
JKPs implement and combine different IT techniques to fulfil their functionalities. Table 4 lists the IT techniques that we have identified.

Technique | Explanation
Semantic technologies | Set of technologies designed to work with LOD and semantic data [46].
Fact extraction | The techniques used to identify factual claims.
Conceptual model | A representation of the world or a part of it.
Reasoning | The techniques used to infer knowledge.
Network analysis | The techniques used to analyse networks of things.
Event analysis | The techniques used to analyse events.
Natural Language Processing (NLP) | A set of techniques intended to work with and process language.
AI training | The process of creating and tuning an AI model to perform on a given dataset or scenario.

Semantic technologies [46] and similar semantic representation techniques are widely utilised in all the studied JKPs. They use semantic technologies for automating annotation, disambiguating, enriching and leveraging news items with information from external knowledge bases [12,14,19,37]. The semantic representations provide neutral language, explicit relations and facilitate structural matching and lingual independence. They are used for clustering news items and events [23] and detecting trends and story lines [15]. These semantic representations together with fact extraction techniques are used to obtain factual claims from news items and link them to their sources and facts in external knowledge bases (e.g., Wikidata, Wikipedia) [15,19,42].
Conceptual models provide vocabularies, schemas and ontologies. These are often implemented using semantic technologies and represent news stories, events and related information. In addition, conceptual models can define users' interests and preferences [18,20,35], and provide shared resources and formats to facilitate content management and semantic interoperability [14,16,24,37].
Conceptual models and semantic technologies are also used for reasoning, network analysis and event analysis. Reasoning techniques abstract and infer new knowledge from news items, events and temporal aspects [37]. Network analysis is used to find networks of actors, organisations and their implications [15]. Event analysis is applied to detect, identify, cluster and annotate the events described in the news [21,23,35].
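A very reduced form of the network analysis mentioned above is to connect the entities that are mentioned in the same news item and to count how often each connection occurs. The sketch assumes that entity annotations are already available; the item identifiers, the entity names and the `co_occurrence_network` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotated news items: each maps to the entities it mentions.
ANNOTATIONS = {
    "item_1": {"person_a", "organisation_x"},
    "item_2": {"person_a", "organisation_x", "person_b"},
    "item_3": {"person_b", "organisation_y"},
}


def co_occurrence_network(annotations):
    """Count how often each pair of entities is mentioned in the same item."""
    edges = Counter()
    for entities in annotations.values():
        # Sorting gives every pair a single canonical orientation.
        for pair in combinations(sorted(entities), 2):
            edges[pair] += 1
    return edges


network = co_occurrence_network(ANNOTATIONS)
strongest = network.most_common(1)[0]
```

The weighted edges can then be handed to a graph library or stored in the knowledge graph for further analysis.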
The aforementioned techniques are supported by NLP tasks such as named entity recognition, relation extraction and temporal expression normalisation [19–21,37,40]. These NLP tasks, among others, are used in many of the components and functionalities of JKPs. In order to obtain optimal results from the NLP tasks, near-continuous training on extensive news corpora [23] is needed to keep the machine learning models up to date.

Components
JKPs rely on different components to fulfil their functionalities and support users. We split these components into four groups: processing, storage, interaction and distribution (see Figure 2). The processing components deal with harvesting data from different sources and processing them. The storage components store and manage data. The interaction components allow users to interact with the information from the system and the distribution components distribute information to users.
The processing components cover tasks from data gathering to transforming input sources into knowledge representations. The textual and multimedia sources are continuously harvested. However, not all contents receive the same interest from news professionals, like in SUMMA where "entertainment programming such as movies and sitcoms, commercial breaks, and repetitions of content (e.g., on 24/7 news channels) [. . . ] [are] of limited interest to monitoring operations" [42] (p. 1). Thus, the harvested content is translated [42] and filtered according to the different stakeholders' interests and needs. In the studied JKPs, spoken content is transcribed [42] and images are textually described [12] so that they can be processed further.
The harvested content is automatically annotated with metadata (e.g., authorship, categories and topics) to support functionalities like business support, content management and personalisation [14,31,33,35]. The annotated content is often processed by an NLP pipeline using state-of-the-art NLP and natural language understanding modules to perform linguistic tasks such as co-reference resolution, named entity recognition, relation extraction and sentiment analysis [15,19,62]. Both the results of the NLP pipeline and the annotated content are represented semantically following a predefined schema or ontology. These representations link the annotations to a knowledge base (e.g., an RDF-based knowledge graph) [20,37] and enrich the news items with facts from external knowledge bases (e.g., the LOD cloud, DBpedia and Wikidata) [15,23].
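The step from annotated text to a semantic representation can be sketched as follows. A dictionary lookup stands in for the named entity recognition and linking modules, which in the studied JKPs are considerably more elaborate; the gazetteer, the `ex:` identifiers, the `ex:mentions` property and the `annotate` helper are all hypothetical.

```python
import re

# Hypothetical gazetteer mapping surface forms to knowledge-base identifiers.
GAZETTEER = {
    "bergen": "ex:Bergen",
    "norway": "ex:Norway",
}


def annotate(item_id, text):
    """Return (subject, predicate, object) triples describing a news item."""
    triples = [(item_id, "rdf:type", "ex:NewsItem")]
    for token in re.findall(r"\w+", text.lower()):
        entity = GAZETTEER.get(token)
        triple = (item_id, "ex:mentions", entity)
        # Skip unknown tokens and avoid repeating a mention.
        if entity and triple not in triples:
            triples.append(triple)
    return triples


triples = annotate(
    "ex:item_42",
    "Bergen is a city in Norway. Bergen hosts a newsroom.",
)
```

Because the objects of the triples are identifiers rather than strings, the same entity can be joined with facts about it that are held in external knowledge bases.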
The storage infrastructure of a JKP can be composed of an archive, an ontology and a knowledge base. The archive can store millions of historical news articles, biographies, reports [14,37] and other relevant textual and multimedia items. The knowledge base is where the annotated semantic representations of news items are stored and enriched with external information [14,15,24]. The ontology is used to represent the structure of the news items, leveraged information, metadata and vocabulary [14,24,31,35]. Most recent JKPs also include dedicated storage for real-time news-related feeds [42].
Stakeholders interact with the previous components and have access to the functionalities of JKPs mainly by using three types of interaction components: front-ends that implement specific functionalities, for example, news editors with automatic annotation for creating news articles, statistical and visual analysis features for generating reports [19,21] and enhanced insights for discovering new stories [18]; tools that provide useful resources for creating news like currency converters and dictionaries [35]; query engines that can be accessed through APIs and user interfaces. These allow journalists and customers to query, explore, analyse and visualise the archives and knowledge bases [16,19,20,31,42].
News agencies and news organisations use push and pull components for delivering and distributing content to their users. Push components offer interfaces where information consumers can select and subscribe to feeds of news [12,16,19,31,41], whereas pull components are used to access and browse the repositories of JKPs [14,16,21,31,35].
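The difference between the two distribution modes can be sketched with a toy repository that offers both. The `NewsRepository` class, its method names and the item structure are hypothetical and are not taken from any of the studied platforms.

```python
class NewsRepository:
    """Toy repository with a push interface (subscribe) and a pull interface (search)."""

    def __init__(self):
        self.items = []
        self.subscribers = []  # (topic, callback) pairs

    def subscribe(self, topic, callback):
        """Push: register a callback that receives new items on a topic."""
        self.subscribers.append((topic, callback))

    def publish(self, item):
        """Store an item and push it to the matching subscribers."""
        self.items.append(item)
        for topic, callback in self.subscribers:
            if topic in item["topics"]:
                callback(item)

    def search(self, topic):
        """Pull: let a consumer browse the stored items on demand."""
        return [item for item in self.items if topic in item["topics"]]


repo = NewsRepository()
inbox = []  # stands in for a customer's information system
repo.subscribe("sports", inbox.append)
repo.publish({"id": 1, "topics": ["sports"]})
repo.publish({"id": 2, "topics": ["politics"]})
politics = repo.search("politics")
```

With push, the consumer states its interest once and receives matching items as they arrive; with pull, the consumer decides when and what to query.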

Concerns
Stakeholders, information, functionalities, techniques and components are influenced or affected by additional concerns of various types. Table 5 lists the identified concerns.

Concern | Explanation
Timeliness | The temporal aspect of news, when they are published and when the stories happen.
Human factors | Human-related aspects that affect newsrooms and JKPs.
Quality | The information and data quality.
Big data | Aspects related to the large volume of data, variety of data and velocity in which data is produced.
Performance | The ability to provide results with the expected quality and on time.
Legacy | Old systems or repositories.
Software architecture | The structure and components of a software system [63].
Maintenance | The ability to reuse, fix and update existing systems.

The customers of JKPs are heterogeneous. They cover diverse sectors and industries, from other newsrooms to companies and institutions, and use different systems to interact with JKPs [35,42]. To improve the interoperability between news agencies and stakeholders, JKPs utilise standards like the IPTC news codes, media topics, semantic vocabularies and RDF [14,35], and keep track of information related to ownership, such as authorship, copyrights, privacy and sources [12,64]. JKPs can also use the ownership information to control the information provenance and reliability [15] by, for example, tracking back the information to its original source and identifying trustworthy providers.
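Tracking information back to its original source can be pictured as following derivation links between items. The sketch below assumes that every derived item records the single item it was derived from; the record structure, the identifiers and the `trace_to_source` helper are hypothetical.

```python
# Hypothetical provenance records: each item points to the item it was derived from.
DERIVED_FROM = {
    "summary_9": "article_7",
    "article_7": "wire_report_3",
}


def trace_to_source(item_id, derived_from):
    """Follow derivation links until an item without a recorded parent is reached."""
    chain = [item_id]
    while chain[-1] in derived_from:
        parent = derived_from[chain[-1]]
        if parent in chain:  # guard against cyclic records
            break
        chain.append(parent)
    return chain


chain = trace_to_source("summary_9", DERIVED_FROM)
original_source = chain[-1]
```

The last element of the chain is the earliest known source, which is where a judgement about the trustworthiness of the provider would apply.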
Customers and audiences prefer different languages [21,23,35,37,42]. Hence, JKPs deal with and produce multilingual news items (e.g., Norwegian, Italian, Spanish, English) that are translated, transcribed and delivered in the preferred languages. In addition, these news items have an intrinsic timeliness aspect that defines their value either as a fresh event or as part of a past or present storyline or historic development that can be reconstructed [12,15,20,42].
JKPs attempt to address different human factors in newsrooms. JKPs automate error-prone and time-consuming processes that were performed manually like news tagging, source monitoring, information filtering, verification, fact-checking and finding related articles and relevant information [14,17,19,21,35]. Hence, JKPs free journalists from these tedious tasks and improve their results. As a result, JKPs facilitate high-quality information to meet the standards of their stakeholders [12].
On the technical side, JKPs deal with big data requirements like volume, velocity and variety. ASRAEL estimates that "the number of collected articles ranges between 100,000 and 200,000 articles per day [. . . ] from around 75,000 news sources" [21] (p. 1). NewsReader uses an archive that "contains billions of articles, biographies, and reports" [37] (p. 1). The SUMMA platform "[is] able to ingest 400 TV streams simultaneously" [42] (p. 6). Hence, the components of JKPs are designed with performance in mind to minimise the processing and distribution times [12,15]. JKPs also integrate legacy components and facilitate interoperability with other systems and external services [16,24,35,37,41]. All these factors make the software architecture of JKPs complex and difficult to maintain without guidance.

Stakeholders
Studies on how journalists embrace digital tools can aid in better adapting JKPs to the way journalists work. Such studies should consider journalists' perceptions of using intelligent systems for creating news, how journalists process and use background information, and journalists' experiences of working with AI. Along these lines, related studies have examined, among other topics, journalists' usage of social media for gathering and verifying information [65,66] and the relation between journalism practices and AI [67,68]. Similar user-oriented studies should be conducted on readers and on younger and future generations of news consumers to identify which new forms of interaction and consumption are more appealing to them. These studies could consider, for example, readers' perceptions of automated journalism [69,70] and young people's engagement with news recommendations [71].

Information
To date, knowledge extraction and entity recognition from images and videos remain limited. As a result, JKPs are not able to capture enough information from multimedia news. Promising directions for extracting knowledge from multimedia sources are multimodal machine learning approaches [72] that combine different types of data such as visual and text representations [73,74], and spoken language understanding tasks that analyse and detect speech in audio [75]. Another limitation for knowledge extraction is dark entities (i.e., entities that do not yet exist in the knowledge base) [76,77]. Fresh stories about newer facts are the most attractive news, yet the chances of finding entity representations for those newer facts in knowledge bases are low. Therefore, research on knowledge extraction from multimedia news and on dark entities can improve news representation in JKPs.

Functionalities
Non-technical users find it difficult to perform complex searches in knowledge bases, archives and background information due to their lack of expertise. The usage of chatbots can aid user interaction using natural language [42,78]. Additional solutions that can support journalists' interaction with knowledge and information, and automate news production are text summarisation [61], automated reporting or story generation [79,80] and automatic data visualisation [81]. Augmented reality may also bring new possibilities for assisting the exploration of information using knowledge representations and LOD [82].

Techniques
Due to the increase in misinformation and propaganda, it is crucial for journalists and readers to detect and distinguish trustworthy information from fake and biased news. Hence, research on JKPs should include automating the detection of fake news, political bias and rumours across social media platforms and news sources [58,83]. Techniques for such purposes can benefit from research on automating fact-checking [17,59], detecting derived or copied works [21], and media and audio forensics to identify manipulated or tampered multimedia files [84,85]. In addition, identifying misinformation items before they are stored in the knowledge base can improve the data quality of JKPs. Another promising direction is the inclusion of neural-symbolic AI [86] techniques as part of the different components of JKPs. Neural-symbolic AI combines neural networks with reasoning and logic. This can facilitate inference and deductive reasoning over the data in JKPs and reduce the computational cost of reasoning over knowledge graphs [87].

Components
In addition to automatic techniques for verification and fact-checking, promising collaborative tools for news and social media verification that involve journalists and readers [88] should be considered, for example, the tools developed in the ReVeaL (https://revealproject.eu accessed on 15 May 2022), InVID (https://www.invid-project.eu accessed on 15 May 2022) [89] and WeVerify (https://weverify.eu accessed on 15 May 2022) [90] projects. Some of these tools, such as WeVerify, employ blockchain and knowledge graph services for recording debunked claims and news. These collaborative repositories could be considered as additional information sources from which JKPs can obtain checked claims and provenance information, and to which they can contribute verified information. Apart from this, current JKPs are focused on in-house platforms that are typically accessed through a computer and oriented to print journalism. However, there is limited research on components that can facilitate access to the services offered by JKPs for mobile journalism [91] (i.e., journalism edited and published through smartphones and oriented towards audio-visual storytelling).

Concerns
There are no gold standards or methodologies to evaluate JKPs. Accordingly, research needs to include the design and study of evaluation methods for JKPs. Moreover, readers and journalists may perceive results from JKPs as less transparent and difficult to understand [92] as they are driven by AI. To improve their perception of trustworthiness and transparency, research on JKPs should consider explainable AI methods [93].

Stakeholders
To date, there have not been any studies on the implementation of JKPs in newsrooms. Such studies should evaluate the effectiveness, adoption and demand of JKPs. The experiences in implementing JKPs can help to draw a digitalisation path for newsrooms by providing best practices and identifying the main obstacles and solutions. This can support newsrooms with the definition of their roadmaps towards the adoption of JKPs, as it facilitates the identification of the most relevant aspects of JKPs and particular needs according to their current stage. Related studies have considered and provided guidelines for the utilisation of AI in news creation processes in a broader sense [55].

Information
The literature is unclear on how JKPs should best represent events and there is no general agreement on what constitutes an event [21]. Events can range from fine-grained actions like a shot, injury or a handshake between two actors [15] to bigger and broader events like the Spanish Civil War and the COVID-19 pandemic [23] or events in between like a trial process. Therefore, research on JKPs needs to define and discuss how different types of events at different granularity can co-exist in a JKP and what conceptualisations of the event are useful for specific use cases.

Functionalities
A better understanding of how to represent events and news items can bring new possibilities for JKPs, for example, in data analysis tasks such as measuring the popularity of people and companies [15], finding cause and effect relations [21], and identifying newsworthy events for specific audiences and particular users' interests [18,33,94].

Techniques
One of the main limitations of the studied JKPs is the extraction of sufficient and precise information from text and multimedia to represent news stories in high detail [19,31]. For the knowledge graph-based JKPs we have considered in this paper, this means representing the content of text and multimedia as knowledge graphs. JKPs use relation extraction models to extract the textual relations between the entities in news text [15,62]. However, these models are at an early research stage and the extracted relations are basic and limited for representing news [95]. Therefore, the functionalities that are based on these models must be considered for the longer term.

Components
Current open-source large triple-stores are not scalable and their reasoning services are time-consuming and use too many computing resources. This limits the possibilities for JKPs to exploit reasoning capabilities and analyse large knowledge graphs. Hence, scalable triple-stores and mechanisms for better reasoning over large knowledge graphs can ease the incorporation of such solutions and bring new possibilities for JKPs. A promising approach is the inclusion of entity spaces [96]. These are vector spaces that represent the different entities of a knowledge graph and also capture their semantic information. They can be used to speed up processes that require complex graph explorations like inferring and disambiguating knowledge for unseen entities. Another promising approach for integrating and managing information from different types of databases is the usage of virtual knowledge graphs [97]. Virtual knowledge graphs represent the schemas of the different databases and provide mechanisms for querying the databases using SPARQL. Hence, they integrate databases at the schema level and reduce data replication.
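To illustrate how an entity space can replace a graph exploration with a vector comparison, the following minimal sketch ranks known entities by cosine similarity to the vector of an unseen entity. It is not the method of [96]: the three-dimensional vectors, the `ex:` identifiers and the function names are hypothetical and chosen only for illustration, and real entity spaces are learned from the knowledge graph and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_entities(query, entity_space, k=2):
    """Return the k entity identifiers whose vectors are most similar to `query`."""
    ranked = sorted(
        entity_space,
        key=lambda name: cosine(query, entity_space[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy entity space: identifiers and vectors are assumptions.
entity_space = {
    "ex:Oslo": [0.9, 0.1, 0.0],
    "ex:Bergen": [0.8, 0.2, 0.1],
    "ex:Equinor": [0.1, 0.9, 0.2],
}

# Vector of an unseen entity, assumed to come from the same embedding model.
unseen = [1.0, 0.0, 0.0]
candidates = nearest_entities(unseen, entity_space, k=2)
```

In this toy setting the two city-like entities rank above the company, so a disambiguation step could restrict itself to those candidates rather than traversing the graph.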

Concerns
Only the most recent projects proposed systems to deal with big data [37,39,42]. Their architectures must also keep the machine learning models up to date and allow them to be replaced by future best-of-breed models, facilitate the schema evolution of knowledge bases, and ease the expansion, distribution and independence of services [44]. Research on software reference architectures [98] for JKPs can assist in better designing and implementing them, as well as in establishing a vocabulary and a framework to compare JKPs.

Limitations
This study only covers the English-language literature and is based on JKPs developed in Europe, Canada and the USA. We have not identified any relevant JKPs in other geographical regions, but such JKPs may of course have been reported in languages other than English. The study is also influenced by the authors' involvement in the development of News Hunter. To reduce bias, we have not included our JKP during the meta-analysis process and we limited the News Hunter contribution to supporting and extending the findings. Additionally, the purpose of our analysis is to review the current state and future directions of the field, and not to evaluate the quality of the proposals.

Conclusions
This study has addressed which challenges and opportunities have motivated knowledge graph-based JKPs (RQ1), how knowledge graph-based JKPs are addressing these challenges and opportunities (RQ2), and the future directions of research on knowledge graph-based JKPs (RQ3). To our knowledge, no previous studies have identified and analysed JKPs as an emerging type of intelligent information system in this way. Although there are examples of such systems in the literature, to date, ours is the first clear definition and broad analysis of JKPs and their context.
In current newsroom workflows, metadata annotation is a manual, time-consuming and error-prone process. Newsrooms face difficulties in implementing high-quality information systems. Journalists spend a lot of their time monitoring and filtering vast volumes of news, time that could otherwise be invested in creative tasks. These vast volumes of data often lack fine-grained annotation and are split into different repositories with different schemas. This limits the capacity of newsrooms to analyse and exploit their information resources, and to share data with news consumers. To help with these processes, newsrooms use a large variety of services that are challenging to integrate and operate together, hindering their evolution towards digitally-integrated newsrooms. JKPs are a new type of intelligent information system that offers many opportunities for high-quality journalism in newsrooms by combining AI, knowledge bases, LOD, NLP, ML and deep learning techniques. JKPs automate metadata annotation and content enrichment with background information from external sources; monitor internal and world-wide news media output; facilitate event detection; and support news creation and verification. They also facilitate the ingestion of vast amounts of data, and its storage, organisation and distribution. JKPs can provide newsrooms with a digitalisation path to reduce production costs and improve information quality while adapting the current workflows of newsrooms to new forms of journalism and readers' demands. We expect the next generation of JKPs to focus on enhancing journalism and providing unexpected news insights for journalists.
Many JKPs are big-data-oriented systems [15,21,22,35,42,44] that need a significant investment effort from newsrooms, making the adoption of JKPs challenging for small or local newsrooms. The adoption of JKPs can yield many benefits, but newsrooms may perceive JKPs as an investment risk and look for alternative services. Thus, the formalisation of JKPs and the usage of open-source and out-of-the-box solutions, together with the popularisation of knowledge graphs, will lower the adoption risk and increase the benefits. For small and local newsrooms, sharing JKPs can reduce the entrance barriers, a practice that is becoming more popular among digital-born news organisations and freelancers.
Section 5 has already proposed several paths for further research on JKPs. As an immediate continuation of this study, we are designing the software reference architecture for JKPs and developing tools to further study and enhance JKPs [44]. Through this work, we plan to define a reference model for JKPs that will allow their comparison and validation.

Conflicts of Interest:
The study is influenced by the authors' involvement in the development of the News Hunter platform. To reduce bias, we have not included our JKP during the meta-analysis process and we limited the News Hunter contribution to supporting and extending the findings. Additionally, the purpose of our analysis is to review the current state and future directions of the field, and not to evaluate the quality of the proposals. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Analysis Method
To synthesise data from the literature on the platforms that fit under our definition of JKP, we have used a qualitative meta-analysis approach [99,100]. We have searched the research literature and identified 28 papers describing 14 JKPs developed by distinct research groups located in different countries and in collaboration with a variety of news organisations and industry partners (Table 1 presents an overview of the selected JKPs and papers). According to Maxwell [101], our sample represents an adequate variation in the phenomenon of interest. During the meta-analysis process, we focused on the last 10 years of advances and excluded our JKP from the process of extracting and coding data. After we synthesised the first conclusions, the excluded JKPs were added and analysed to support and expand our findings. This decision was taken to focus on the most recent advances and to minimise the bias that our point of view could induce in the meta-analysis process.
From the selected literature we manually extracted 322 claims about the JKPs, i.e., statements that described the current state or expressed potential challenges or opportunities. Two independent expert coders (viz., the authors) conducted a purposive sampling [102,103] using the extracted claims, which were marked up with 406 codes. We cleaned the generated codes with the support of NLP and natural language understanding techniques, i.e., Damerau-Levenshtein distance [107], word2vec [108] and WordNet [109], implemented in Python with the support of Scikit-learn [104], NLTK [105], spaCy [106] and other libraries. After cleaning and tidying up the initial codes, we iteratively classified the resulting codes into six top-level categories and 64 sub-categories (Figures 1 and 2 and Tables 2-5 show the final top-level and sub-categories).
We used the top-level and sub-categories to re-code the 322 claims and we measured the final agreement using Gwet's AC1 [110] inter-rater reliability coefficient with nominal ratings. The AC1 coefficient for each category was 0.77 for Stakeholders, 0.65 for Components, 0.71 for Techniques, 0.71 for Aspects, 0.72 for Information types and 0.57 for Functionalities, with an average AC1 of 0.69 and a standard deviation of 0.063. Following the recommendations of Gwet [110] about the Landis-Koch and Altman benchmark scales, our AC1 values express an acceptable agreement among coders. Finally, we agreed on the final codes for each claim and initiated an abductive process to understand and derive the current state and future directions of the research on JKPs.
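As an illustration of how an edit distance can support this kind of code cleaning, the sketch below merges codes that differ by a single typo. It is not the authors' pipeline. It implements the optimal string alignment variant of the Damerau-Levenshtein distance (each substring is edited at most once), and the function names, the lower-casing step and the threshold of one edit are assumptions made for the example.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def merge_near_duplicates(codes, max_distance=1):
    """Map each code to the first previously seen code within max_distance edits."""
    canonical = []
    mapping = {}
    for code in codes:
        key = code.strip().lower()  # assumed normalisation step
        match = next((c for c in canonical if osa_distance(key, c) <= max_distance), None)
        if match is None:
            canonical.append(key)
            match = key
        mapping[code] = match
    return mapping


# Hypothetical codes; the second one contains a transposition typo.
mapping = merge_near_duplicates(["Fact-checking", "fact-chekcing", "tagging"])
```

A low threshold keeps distinct but similar codes apart; semantically similar codes with different spellings would need the embedding or lexical resources mentioned above rather than an edit distance.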
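For readers unfamiliar with the coefficient, the following sketch computes Gwet's AC1 for two raters and nominal categories, using the standard two-rater formulation: observed agreement corrected by a chance-agreement term based on the average category prevalence. The function name, the optional `categories` parameter and the toy ratings are assumptions for illustration and not the authors' code. The last lines recompute the summary statistics from the six per-category values reported above; the reported standard deviation of 0.063 matches the population standard deviation of those values.

```python
from collections import Counter
from statistics import mean, pstdev


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty list of items")
    n = len(rater_a)
    # Assumption: if no category list is given, use the categories actually observed.
    cats = list(categories) if categories is not None else sorted(set(rater_a) | set(rater_b))
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    # Observed agreement: share of items with identical ratings.
    pa = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the average prevalence pi_k of each category.
    prevalences = [(counts_a[k] + counts_b[k]) / (2 * n) for k in cats]
    pe = sum(p * (1 - p) for p in prevalences) / (q - 1)
    return (pa - pe) / (1 - pe)


# Toy ratings (hypothetical): the raters disagree on the last item only.
toy_ac1 = gwet_ac1(["x", "x", "y", "y"], ["x", "x", "y", "x"])

# Per-category AC1 values reported in the text.
reported = [0.77, 0.65, 0.71, 0.71, 0.72, 0.57]
average = round(mean(reported), 2)
spread = round(pstdev(reported), 3)
```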