Assessment of Online Deliberative Quality: New Indicators Using Network Analysis and Time-Series Analysis

Abstract: Online deliberation research has recently developed automated indicators to assess the deliberative quality of large volumes of user-generated online data. While most previous studies have developed indicators based on content analysis and network analysis, time-series data and associated methods have been studied less thoroughly. This article contributes to the literature by proposing indicators based on a combination of network analysis and time-series analysis, arguing that this combination helps monitor how online deliberation evolves. Based on Habermasian deliberative criteria, we develop six throughput indicators and demonstrate their applications in the OmaStadi participatory budgeting project in Helsinki, Finland. The study results show that these indicators consist of intuitive figures and visualizations that will facilitate collective intelligence on ongoing processes and ways to solve problems promptly.


Introduction
This article proposes deliberative quality indicators that will help to monitor online governance processes. It is widely acknowledged that governments can no longer address social problems alone, and that there is an increasing need for collaboration with the market and civil society in making collective decisions and sharing responsibility [1][2][3][4][5][6]. Governance has become a popular term, referring to the emerging forms of the governing system, and featuring "interactive processes through which society and the economy are steered towards collectively negotiated objectives" [7] (p. 4). The notion of governance entails a change from top-down to bottom-up policy-making that promotes public participation to address social issues and social sustainability together [8][9][10]. Despite the positive connotation, however, governance brings new challenges [11][12][13][14]. Unlike governments under a bureaucratic hierarchy and markets coordinated with contracts, governance is based on networks of actors with fragmented interests, resources, and jurisdictions [15]. Moreover, citizens have more opportunities to directly engage in multi-channel and multi-voice processes as partners rather than customers [1,16]. In these cross-boundary settings, deliberation is an indispensable way to build consensus and foster voluntary collaboration among actors, but it is prone to ineffective outcomes in the absence of appropriate arrangements [1,7,17-20].
In this regard, deliberation is a key element of governance, and assessing its quality is crucial to the development of better governance practices. As a social system, governance faces both internal and external impulses under changing environments. 'Resilience' refers to the collective capacity to cope with these shocks and quickly bounce back to core functions after encountering disturbances [14,21-23]. Face-to-face meetings and public relations activities provide some assistance, but they are limited for this purpose. Scholars have recently focused on digital technology and online deliberation as complementary tools. This article therefore asks three research questions:
1. How can the quality of online deliberation be monitored on government-run platforms?
2. What new indicators can support such monitoring by applying network analysis and time-series analysis?
3. How can the new monitoring indicators help to develop more resilient governance practices?
This article is organized as follows. The second section reviews theoretical discussions of deliberation and existing automated indicators, followed by proposing new indicators. The third section introduces an empirical case of the OmaStadi project in Helsinki, Finland. The fourth and fifth sections then set formal definitions of the proposed indicators and report empirical results. In the final section, we discuss the findings, limitations, and implications for future study.

Concept of Online Deliberation and Past Measurement Efforts
Online deliberation research is an emerging strand of deliberation literature that focuses on three aspects of internet-enabled deliberation [27,33,34]: input, throughput, and output. The input aspect sheds light on the preconditions of deliberation. Institutional arrangements (e.g., participatory budgeting), platforms (e.g., government-run platforms), and socio-political elements (e.g., internet access rate and social strata) are examples of such preconditions. A second aspect is related to outcomes resulting from online deliberation, whether internal (e.g., knowledge gains and digital citizenship) or external effects (e.g., policy changes and side effects). A third aspect concerns processes through which multiple stakeholders participate and build consensus democratically. This article aims to contribute to the third aspect, the process of online deliberation, by proposing automated quality indicators using social network analysis and time-series analysis.
This section reviews current automated computational methods for assessing online deliberative quality. Theories of deliberation and deliberative democracy have long been influenced by Habermas's notion of communicative rationality and the public sphere [49][50][51][52]. The central presumption is that social problems are increasingly "wicked," that is, subjective and contextual; thus, instrumental rationality, which uses impersonal tools designed to attain measurable objectives, no longer captures the essence of social problems [53]. Since there is no single optimal solution to the problems faced by multiple stakeholders, they need to engage in communicative processes through which problems will be identified, viewpoints exchanged, and collective action promoted [54][55][56]. Therefore, social problems need to be solved through inter-subjective communication rather than objective calculation.
Nevertheless, deliberation literature has suffered from a lack of a standard definition of deliberation [33,57,58], leading to fragmented quality indicators [28,41,59-62]. For instance, Dahlberg [62] suggested six criteria: reasoned critique, reflexivity, ideal role-taking, sincerity, inclusion and discursive equality, and autonomy from state and economic power. Fishkin [60] proposed five criteria: information, substantive balance, diversity, conscientiousness, and equal consideration, based on three democratic values: deliberation, equality, and participation. Gastil and Black [61] developed the following criteria: create an information base, prioritize key values, identify solutions, weigh solutions, make the best decision, speaking opportunities, mutual comprehension, consideration, and respect. Steenbergen et al. [41] suggested the following criteria: open participation, level of justification, content of justification, respect, and constructive politics. Although many of the listed indicators share similar traits, this fragmented landscape shows that deliberative quality indicators remain poorly standardized, even within the same theoretical root [28].
Traditionally, empirical studies select some of the theory-based criteria using coding schemes and then hire trained coders to assess deliberative quality [35-41,58,63,64]. For instance, Esau et al. [35] developed eight quality measures, and five coders assessed the quality of textual contents by reading a sample of user comments on several online platforms. This method is considered a "gold standard," since human experts can extract sophisticated meanings from text [43]. However, Beauchamp [42] has pointed out that it requires intensive work and that there may be biases as to what counts as deliberative criteria. We agree only in part, because content analysis has well-established measures (e.g., Krippendorff's alpha) to handle inter-coder reliability. Nevertheless, manual coding can still be problematic when there is significant disagreement among coders or a large amount of online discussion data, which is the case in this article.
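For illustration, inter-coder reliability for nominal labels can be computed without specialized software. The following is a minimal sketch of Krippendorff's alpha for complete (no missing values) nominal data; the coder judgments shown are fictitious, not drawn from the studies cited above:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data with no missing values.

    `units` is a list of lists; each inner list holds the labels that
    the coders assigned to one unit (e.g., one online comment).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        # each ordered pair of codings within a unit contributes 1/(m - 1)
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    n = sum(coincidences.values())
    marginals = Counter()
    for (a, _), weight in coincidences.items():
        marginals[a] += weight
    # observed vs. chance-expected disagreement
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(marginals[a] * marginals[b]
                   for a in marginals for b in marginals if a != b) / (n * (n - 1))
    return 1.0 - observed / expected

# Two coders labelling four comments as reply ("R") or initiate ("I")
ratings = [["R", "R"], ["I", "I"], ["R", "I"], ["R", "R"]]
print(round(krippendorff_alpha_nominal(ratings), 3))  # → 0.533
```

A value near 1 indicates near-perfect agreement; values near 0 indicate chance-level agreement.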
Alternatively, there have recently been attempts to develop automated deliberative quality indicators. By automation, we mean measuring deliberative quality by computational methods rather than by human judgment. According to Beauchamp's review [42], automated methods are still rare but growing, centering around natural language processing and social network analysis. Social network analysis is useful in studying intricate interaction patterns in deliberation processes [44-46,65]. For instance, Gonzalez-Bailon et al. [46] developed two dimensions of network topology, representativeness and argumentation, and compared the quality of deliberation across different topics in an online discussion community. Another automated method is machine learning-based natural language processing [48,66,67]. For instance, Fournier-Tombs and Marzo Serugendo [48] adopted Steenbergen et al.'s [41] well-known discourse quality index and manually coded online comments in a training dataset, then applied the random forests algorithm (supervised learning) to automatically label deliberative quality in the testing set while computing training errors. Several studies combined network analysis and content analysis to identify major discussion topics and support flows in the discussion network [44,45].
While these automated methods have opened up promising future research avenues, they are in their infancy, and are far from complete. First, the most significant limitation is that human experts can better interpret nuanced political debates and contexts than machines at the current level of technical development. Beauchamp [42] found that studies employing automated methods tend "to focus on superficial, easily measured markers of argument and deliberation" (p. 324), "many of which measures are also disappointingly superficial when examined in detail" (p. 345). Second, he pointed out dozens of heterogeneous indicators in the field (p. 336). This is perhaps because of unique online deliberative systems. For instance, online deliberation takes place on small online forums [38], online communities such as Twitter and Facebook [46,63,68], government-run platforms [65], parliament websites [69], and online newspapers [70,71]. These various digital platforms have unique deliberative systems, influencing data collection and analysis. For instance, while Campos-Domínguez et al. [70] used Twitter's hashtags, Black et al. [38] used Wikipedia's collaborative systems, and Gonzalez-Bailon et al. [46] used Slashdot's moderation system to identify divergent topics under discussion and assess their quality quantitatively. Steenbergen et al. [41] saw that the lack of standard definitions and measures could lead to problems with validity. If one attempts to develop a comprehensive set of indicators that captures the universal elements of online deliberation, this could achieve external validity (generalizability) at the risk of reducing internal validity (trustworthiness). Third, automated methods have focused relatively less on a crucial dimension of deliberation: time. Without the time dimension, online deliberation data consist of a chunk of texts and interactions captured by a single snapshot.
Since deliberation is a communicative process, it is crucial to assess how its quality changes over time.
Overall, we note that automated methods for assessing online deliberative quality pose potential validity issues. This article addresses internal validity by developing new deliberative quality indicators using network data and time-series data collected from an online deliberative platform of interest, arguing that the combination will provide information on how online deliberation evolves. In terms of external validity, we argue that the two types of data are commonly observed on many online platforms, creating comparable datasets.

New Online Deliberative Quality Indicators
Against this backdrop, we select criteria from the pool of existing indicators, or create new ones, that can be measured and analyzed using the two types of data. For this aim, this article defines deliberation as "the process by which individuals sincerely weigh the merits of competing arguments in discussions together," following James Fishkin [60] (p. 33), who builds on Habermas [72], and develops indicators based on his framework with three democratic dimensions: participation, deliberation, and equality [60]. These dimensions provide an analytical lens to examine online deliberation processes from various angles. We set participation and deliberation as the two dimensions of the proposed indicators, with equality as an overarching value.
First, participation means "behavior on the part of members of the mass public directed at influencing, directly or indirectly, the formulation, adoption, or implementation of governmental or policy choices" [60] (p. 45). When residents intend to influence local politics, they might engage in a wide range of activities, for instance, joining an association, visiting petition websites, and attending offline meetings. Fishkin [60] regards these activities as forms of mass political participation, arguing that such activities should be spread throughout the population, and people should reinforce their participatory activities over time: "Mass participation is a cornerstone of democracy" (p. 46). The first criterion is the participation rate: what percentage of the population participates in online deliberation? It measures how representative participation in online deliberation is. Next, activeness measures the number of active commentators and comments and their longitudinal trends. To what extent do residents actively participate in online deliberation? Does online deliberation show an upward, downward, or constant trend? Lastly, continuity measures whether there is consistency in deliberation engagement, without significant gaps, especially during the operational periods. Overall, we consider participation rate, activeness, and continuity as essential indicators of the extent to which residents participate in online deliberation over time.
The second dimension is deliberation. While the participation dimension focuses on the volume of deliberation, in this article, the deliberation dimension focuses on interactions in deliberation. Fishkin [60] suggested five criteria of deliberation: information (accessibility of crucial information), substantive balance (reciprocal communications), diversity (multiple topics by multiple actors), conscientiousness (reasoned arguments), and equal consideration (equal opportunities to weigh up values offered by all actors). We found that network analysis and time-series analysis were useful in examining substantive balance and equal consideration, from which three indicators were developed. Responsiveness measures the degree to which online comments generate back-and-forth conversations like a real discussion that can help participants identify others' viewpoints and clarify their preferences [73]. Janssen and Kies [28] noted that previous studies mostly categorized comments into "initiate" (a message that initiates a new debate), "reply" (a message that replies to a previous message), and "monologue" (a message that is not part of a debate). This categorization requires qualitative interpretation of each comment, which does not conform to this article's aim. Therefore, we applied an initiate-reply categorization and measured the proportion of replies. It is a simple yet useful indicator that shows the extent to which others respond to messages. Inter-linkedness examines structural patterns of who communicates with whom and how proposals are related to each other. Although online deliberative platforms provide free and accessible public space for the mass public, actors still select partners and topics they perceive as beneficial [46]. This intentionality in creating and maintaining interactions might form polarized subgroups when conflictual issues emerge, which can be analyzed through social network analysis.
Lastly, commitment measures the variability of engagement in online deliberation. Many empirical studies have observed that a handful of people and topics often dominate deliberation processes while others remain silent [61,62,74]. We consider this political inactivity an essential issue of political equality [60], and examine the degree of engagement across actors. Overall, we consider responsiveness, inter-linkedness, and commitment as essential indicators of interactions during deliberation processes.

Empirical Case: OmaStadi Participatory Budgeting Project
In this article, we focus on the case of OmaStadi to demonstrate how the proposed indicators can be used in practice. OmaStadi is a recently launched participatory budgeting project led by the City of Helsinki, Finland, in which residents can distribute a city budget of 4.4 million euros (0.1% of the total city budget). The project's basic idea is to provide a platform for residents to initiate proposals for local planning, develop them in collaboration with city experts, and allocate public budgets through a popular vote. The project's main feature is thus that residents can play active roles as initiators, developers, and decision-makers. OmaStadi has a biennial cycle: the first year involves decision-making for budget allocation, and the second year involves implementation. We studied OmaStadi 2018-2020, when the project was piloted. The project is now in its second term (2020-2022) with a doubled budget (8.8 million euros). The city government employs an open-source digital platform developed by Metadecidim, called decidim (https://decidim.org), to make it possible for residents to initiate, discuss, and vote in one place. Dozens of municipalities in European countries have employed it for local participatory programs [75]. OmaStadi has six participatory budgeting stages [76]:
• Proposal: Residents initiate proposals.
• Screening: City experts screen all proposals and mark them either as impossible (ei mahdollinen) or possible (mahdollinen). Once a proposal is labeled "impossible," it proceeds no further.
• Co-creation: Several "possible" proposals (ehdotukset) are combined into plans (suunnitelmat) based on traits and relevance, in collaboration with residents and experts.
• Cost estimates: City experts estimate the budget for each plan. Plans are prepared for a popular vote.
• Voting: Citizens vote on desirable plans online or offline.
• Implementation: Voted plans are implemented in the following year.
Another feature of OmaStadi is that online and offline participatory activities are combined within a digital platform. For instance, any registered resident can initiate a proposal(s) online or offline. It is then displayed on a dedicated web page where other residents can develop ideas through comments (Figure 1). This user-generated discussion system is similar to Reddit, while being distinct from the actor-oriented system of Facebook, for example. The city government reports that around 53,000 residents engaged in online/offline deliberation and voting processes for 1273 proposals through 3188 comments and 107 offline meetings. However, 1273 proposals were too many for a popular vote and too underdeveloped to serve as feasible plans, as there were no cost estimates, job assignments, or area surveys (see Figure 1). Therefore, the goal of the deliberation process was to reduce the number of proposals and develop them into formal plans: 1273 proposals were combined into 336 plans for a vote. As each proposal and plan had a webpage like Figure 1, there were 1609 separate spaces where residents could discuss. However, during the one-month voting stage, residents who entered the online/offline voting system read a list of the 336 plans, not the 1273 proposals, on which they were to vote.
Figure 1. Note: This figure shows an actual web page of a resident-initiated proposal. A resident proposed the idea of making a walking path from Kirsikka park to Roihuvuori water tower. This idea was marked as a possible proposal (Mahdollinen, marked with a green label) and received 11 comments during the deliberative process. Each comment has a timestamp that has been used to construct time-series data.
The city government played a crucial role as a moderator and facilitator in this process of "turning ideas into proposals" [77]. The city government hosted face-to-face meetings and workshops at various locations in eight areas (east, west, north, south, southeast, northeast, and central areas and the entire area). Seven borough liaisons hired by the city facilitated offline and online deliberation processes. The city government intervened directly in the screening and co-creation stages, during which initial proposals were filtered and developed.

Data Collection
We collected data from the online deliberative platform of OmaStadi (https://omastadi.hel.fi). The first dataset contains information on offline meetings, including specific meeting dates and their frequency. The second dataset contains observational data on proposals, plans, and online comments collected by parsing the web pages of all proposals (n = 1273) and plans (n = 336) with Python in May 2020. Although the official deliberation process started in October 2018, the first comment was made on 15 November 2018, which became the first date of the investigation period (15 November 2018 to 31 October 2019). The parsed data contain the proposal ID, proposal title, proposal area (n = 8), proposal status (impossible or possible), type of post (proposal/plan/initial comment/reply), author ID, and date of publication. The author IDs are registered as nicknames on the digital platform. The Finnish National Board on Research Integrity defines personal data as "any information relating to an identified or identifiable natural person," such as names, telephone numbers, and age, which are strictly regulated [78] (p. 55). Since nicknames could become identifiable when users register with their real names, we anonymized nicknames by numbering them. We used the R statistical program to conduct empirical analyses.
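The anonymization step can be sketched as follows. This is an illustrative example rather than the actual collection code, and the field names (author_id, proposal) are hypothetical:

```python
def anonymize(records, key="author_id"):
    """Replace nicknames with sequential numbers in order of first appearance."""
    mapping = {}
    anonymized = []
    for record in records:
        nickname = record[key]
        if nickname not in mapping:
            mapping[nickname] = len(mapping) + 1
        # copy the record, overwriting the nickname with its numeric ID
        anonymized.append({**record, key: mapping[nickname]})
    return anonymized

# Fictitious comment records
comments = [{"author_id": "kirsikka_fan", "proposal": 42},
            {"author_id": "stadilainen", "proposal": 42},
            {"author_id": "kirsikka_fan", "proposal": 7}]
print([c["author_id"] for c in anonymize(comments)])  # → [1, 2, 1]
```

Numbering in order of first appearance keeps repeated commentators linkable across proposals without retaining any identifiable nickname.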

Methods: Network Analysis and Time-Series Analysis
This article proposes new online deliberative quality indicators using network analysis [79,80] and time-series analysis [81,82]. Although the two analyses have distinct theories and applications, we stress that the analyses can be used to shed light on different sides of the same coin, and more specifically, the same data [83]. Recall Figure 1, which shows the engagement of several residents in a proposal through comments. Contrary to Facebook, for instance, in which actors directly connect with other actors through being Facebook Friends, there is no direct connection between users in Figure 1. That is, commentators only create indirect relationships with others, mediated through proposals or plans. In social network analysis, a network that consists of two node sets (actors and proposals) is called a two-mode (bipartite or affiliation) network [79].
A two-mode network can be represented as a graph B = {U, V, E}, consisting of disjoint sets of commentators U, proposals (plans) V, and edges E = {(u, v) : u ∈ U, v ∈ V} that map connections into pairs of the two sets [84]. If there are n commentators and m proposals, the edge set E can be represented as an n × m matrix with elements x_uv. We designate k_u as the degree of node u ∈ U and d_v as the degree of node v ∈ V, where the degree refers to the number of edges connected to each node [85]. If we consider a time dimension, the network B contains additional time-varying functions x_uv(t). The total number of edges of the network B(t) at a given time t ∈ T is then as follows:

|E|(t) = Σ_{u∈U} Σ_{v∈V} x_uv(t)    (1)

This simple equation will bridge the network and time-series data. Figure 2 illustrates a fictitious example of how these two types of data are interconnected. Figure 2a presents a two-mode network composed of actors (1, 2, 3) and proposals (A, B, C) at time t and t + 1. At time t, Actor 1 commented on Proposal A, and Actors 2 and 3 commented on Proposal B. The total number of edges at time t is, thus, 3. At time t + 1, Actor 1 commented again on Proposal A, whereas Actors 2 and 3 made comments on Proposal C. Despite the change in interactions, the total number of edges at time t + 1 is still 3. We can use this network metric |E|(t) to construct a continuous time series Y over T, where Y = {y(t) | t = {1, . . . , p}, y(t) ∈ R}, as shown in Figure 2b [81,83]. While the former is useful in analyzing a system's properties (e.g., patterns of interactions), the latter is useful in analyzing the system's evolution (e.g., trend and forecasting).
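For illustration, the fictitious example of Figure 2 can be reproduced in a few lines of Python; the event-list layout (time, actor, proposal) is an assumption made for this sketch:

```python
from collections import Counter

# Fictitious comment events (time, actor, proposal), as in Figure 2
events = [(0, 1, "A"), (0, 2, "B"), (0, 3, "B"),
          (1, 1, "A"), (1, 2, "C"), (1, 3, "C")]

def edge_count_series(events):
    """Build the time series y(t) = |E|(t): number of comment edges per step."""
    counts = Counter(t for t, _, _ in events)
    return [counts[t] for t in sorted(counts)]

print(edge_count_series(events))  # → [3, 3]
```

As in the figure, the interaction pattern changes between the two time steps, but the edge count, and hence the time series, stays at 3.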
As Figure 2 shows, the two types of data have both advantages and disadvantages. On the one hand, network data (Figure 2a) contain snapshots of relational information among nodes at discrete time points. This relational information allows us to study the structural properties of interactions but becomes cumbersome as new nodes or time variables are added. On the other hand, time-series data (Figure 2b) store the volume of interactions at discrete points of time. Compared to network data, time-series data are efficient in analyzing trends, seasonal variations, and forecasting without relational information. Based on these discussions, we now present measurements for each indicator (Table 1).

Table 1. Proposed indicators, descriptions, and measurements.

Participation dimension (volume of deliberation):
- Participation rate: the proportion of residents who registered with an online deliberative system. Measurement: # total IDs / population.
- Activeness: a longitudinal change in active commentators, proposals, and comments. Measurement: # active IDs / # total IDs; (two-sided) moving average.
- Continuity: the extent of consistency in participation. Measurement: # active days / # entire days.

Deliberation dimension (interaction in deliberation):
- Responsiveness: the proportion of replies in online comments. Measurement: # replies / # comments.
- Inter-linkedness: interactive patterns among actors and proposals. Measurement: network properties.
- Commitment: variability of the degree of engagement. Measurement: degree distribution.

Participation Dimension
The participation dimension examines the volume of public engagement in deliberation and its longitudinal change within a given online deliberative system. First, the participation rate measures the proportion of residents who registered with the online system, calculated by the number of registered IDs divided by the registered population. The registered IDs of interest cover residents who initiated proposals and those who commented, as analyzed in this article.
Second, activeness measures the volume of active commentators, proposals, and comments over time. While the participation rate shows the total number of available participants and proposals, activeness shows the degree of actual engagement in deliberation. We count active commentators and comments based on a condition of x_uv ≥ 1 in a discussion network of the whole investigation period. This means that we apply a substantially low threshold for defining "activeness": residents count even if they commented only once, and proposals count even if they received only one comment within the whole process. Next, we use a (two-sided) moving average (Equation (2)), a fundamental function of time-series analysis, to capture a smoothed trend of online comments [86]:

ŷ_t = (1/q) Σ_{j=-q1}^{q2} y_{t+j}, where q = q1 + q2 + 1    (2)

A moving average (MA) is an (arithmetic) average of the values of y_t, denoted as ŷ_t, obtained by summing y_t over the "moving" period q and dividing by q. A period q is "moving" because it continuously rolls along with time t, with a fixed window length defined by past q1 and future q2 observations.
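A two-sided moving average of this kind can be sketched as follows; the daily comment counts are fictitious:

```python
def moving_average(y, q1, q2):
    """Two-sided moving average: mean of y[t - q1 .. t + q2]."""
    q = q1 + q2 + 1
    smoothed = []
    for t in range(len(y)):
        if t - q1 < 0 or t + q2 >= len(y):
            smoothed.append(None)  # window incomplete at the series edges
        else:
            smoothed.append(sum(y[t - q1:t + q2 + 1]) / q)
    return smoothed

# Fictitious daily comment counts
daily_comments = [4, 0, 8, 2, 6, 0, 10]
print(moving_average(daily_comments, q1=1, q2=1))
```

With q1 = q2 = 1 the window covers three days centered on t, which irons out day-to-day spikes while preserving the overall trend.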
Third, continuity measures the extent of consistency in participation. There is a similar concept in time-series analysis, called a "stationary" process, with three conditions [81]: (1) the mean of y_t is constant, (2) the variance of y_t is constant (height of fluctuation), and (3) the correlation structure of y_t and its lags is constant (width of fluctuation). As will be discussed later, the data collected show a substantial level of inactivity, specifically y_t = 0, during the investigation period; thus, we created a binary variable C_t (1 if y_t > 0; 0 otherwise) that counts the existence of daily activities. Using this variable, continuity is obtained by the number of active days divided by the length of the investigation period, p.
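The continuity measure can be computed directly from a daily series; the data shown are fictitious:

```python
def continuity(y):
    """Share of active days: sum of C_t = 1[y_t > 0], divided by p = len(y)."""
    c = [1 if value > 0 else 0 for value in y]  # binary variable C_t
    return sum(c) / len(y)

# Fictitious daily comment counts with inactive days
daily_comments = [4, 0, 8, 0, 0, 6, 10]
print(round(continuity(daily_comments), 3))  # → 0.571
```

Here four of seven days saw at least one comment, so continuity is 4/7; a value of 1 would mean activity on every day of the investigation period.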

Deliberation Dimension
The deliberation dimension examines interactions in deliberation and its longitudinal change. First, responsiveness indicates the proportion of replies to all comments. We consider replies to be the simplest yet explicit evidence of reciprocated communication. Responsiveness is calculated by the number of replies divided by the number of comments.
Second, inter-linkedness refers to the interactive patterns among actors and proposals, analyzed using social network analysis. This article focuses on the networks in the southeast and central areas due to specific controversial events that will be discussed later. In terms of the network graph, we will demonstrate how the two networks evolve over three stages (proposal stage, co-creation stage, and vote stage) to detect hidden patterns of interactions. We will then calculate descriptive statistics, including the mean number of comments per actor and the mean number of comments per proposal.
Third, commitment measures how the number of connections is distributed across the entire network. In many cases, a handful of actors and proposals accounts for most of the activity, while the majority remain inactive. This article calculates commitment by counting k_u(t) and d_v(t) separately and then drawing the degree distributions defined as follows:

Actors: P_u(k) = fraction of nodes in U with degree k
Proposals: P_v(d) = fraction of nodes in V with degree d
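A minimal sketch of the two degree distributions, treating the comment log as an edge list of a bipartite network (the actor and proposal names are hypothetical):

```python
from collections import Counter

def degree_distributions(edges):
    """edges: (actor, proposal) pairs, one per comment.
    Returns P_u(k) and P_v(d): the fraction of actors (proposals)
    with exactly k comments made (d comments received)."""
    k_u = Counter(a for a, _ in edges)   # comments made per actor
    d_v = Counter(p for _, p in edges)   # comments received per proposal

    def distribution(degree_by_node):
        n = len(degree_by_node)
        counts = Counter(degree_by_node.values())
        return {deg: c / n for deg, c in sorted(counts.items())}

    return distribution(k_u), distribution(d_v)

edges = [("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u3", "p1")]
P_u, P_v = degree_distributions(edges)
print(P_u, P_v)
```

In this toy network, two of three actors have degree 1 and one has degree 2, while one proposal has degree 3 and the other degree 1.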

Results
This section reports the empirical results of participation rate, activeness, and continuity in the participation dimension; and responsiveness, inter-linkedness, and commitment in the deliberation dimension based on the online data of OmaStadi 2018-2020.

Participation Rate
The participation rate measures the proportion of residents who registered in the online system. To register in the online system of OmaStadi, a resident had to have a Finnish bank account or a mobile certificate (linked to the Finnish local registration system) to verify that their home address was in Helsinki (there were other registration options for the youth population). We considered online IDs as identifiers based on this registration system. The number of unique IDs who participated in the proposal stage or made comments at least once during the investigation period was 2281. As the registered population of Helsinki was 648,042 in 2019, according to Helsinki Region Infoshare (https://hri.fi/data/en_GB/dataset/helsinki-vaesto), the participation rate was found to be 0.0035, that is 0.4% of the population.
Note that multiple participation channels existed in OmaStadi, such as offline meetings and workshops [77]. The City of Helsinki counted offline participants using the same registration system and estimated the total number as 52,938 (https://omastadi.hel.fi/processes/osbu-2019?locale=en). If we include these participants, the total participation rate in deliberation processes was up to 8.2% of the population. Moreover, Rask et al. [76] found that the voter turnout was 8.6% (49,705 residents) in OmaStadi 2018-2020. These two figures show a moderately high participation rate given that this was a pilot project with a small proportion of the city budget (0.1% of the total budget). In this article, we take 0.4% as the participation rate of interest because the focus is on online deliberation rather than overall participation in OmaStadi.

Activeness
The participation rate measures the pool of registered participants available for online deliberation. In practice, however, only a portion of them will be active. Therefore, activeness measures active participants and proposals (Figure 3). Figure 3a shows that the number of residents who commented on any proposal or plan at least once during the investigation period was 1385, or 60.7% of the total number of IDs identified earlier (n = 2281). Figure 3b shows that the number of proposals or plans that received any comment during the same period was 1040, or 64.6% of all proposals and plans (n = 1609). This means that 569 proposals and plans received zero comments during the deliberation process.

Next, we examined the longitudinal change of comments to see how online engagement evolved. Before this, we briefly illustrate what percentage of resident-initiated proposals were finally selected in OmaStadi 2018-2020. In the proposal stage, residents proposed 1273 ideas, among which 838 proposals were labeled as "possible" by city experts in the following screening stage (January 2019). This means that 65.8% of proposals survived the filtering process. These "possible" proposals were then combined into 336 plans in the co-creation stage (February-April 2019), meaning that 2.5 possible proposals were combined into one plan on average. Among the 336 plans, 44 plans, consisting of 83 proposals, were selected by popular vote in October 2019. Therefore, 6.5% of the 1273 initial proposals were finally selected, a substantially low acceptance rate.
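The selection-funnel arithmetic in this paragraph can be reproduced directly from the reported counts:

```python
proposals = 1273   # resident-initiated ideas in the proposal stage
possible = 838     # labeled "possible" in the screening stage
plans = 336        # plans formed in the co-creation stage
selected = 83      # proposals contained in the 44 winning plans

print(round(possible / proposals * 100, 1))   # survival rate (%)
print(round(possible / plans, 1))             # possible proposals per plan
print(round(selected / proposals * 100, 1))   # acceptance rate (%)
```

These three ratios reproduce the 65.8%, 2.5, and 6.5% figures quoted above.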
This competitive process might have influenced activeness. As Figure 4 shows, the volume of online comments fluctuated greatly across the different stages of deliberation. In the proposal stage, during which residents put forward their ideas, there was a strong signal of online deliberation (31% of all comments). In the following screening stage, when city experts decided whether proposals were possible or impossible, residents became relatively silent (3.7%). In the co-creation stage, during which possible proposals were prepared for a popular vote, residents showed the most active engagement (49%). However, during the following six months up to the voting stage (3.3%), online deliberation became almost entirely inactive (1.9%). From this result, we can conclude that the proposal stage and the co-creation stage attracted the majority of online participation (80% of all comments). We also marked the dates of offline meetings (red dots in Figure 4) to visually examine the tendency for offline meetings and online deliberation to co-occur, which was not explicit.
Overall, these results indicate that the degree of public participation in online deliberation fluctuated according to the six stages of OmaStadi. Time-series analysis provides tools, such as an ARIMA (autoregressive integrated moving average) model, for analyzing such systematic patterns (seasonality) [81]. If OmaStadi conducts multiple rounds and accumulates multi-year data in the future, these models might become useful. The moving average (a blue dotted line in Figure 4) shows that it is hard to detect a clear trend in participation. Nevertheless, the result of a linear regression model of the time-series data (online comments = constant + trend component + error term) shows that the trend component coefficient was −0.053 (SE: 0.01 ***), meaning that the degree of engagement was decreasing slightly (adjusted R squared: 0.08). Figure 4 shows volatile patterns of public participation over time, which raises the need for the resilience of deliberation under varying situations.
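The trend model reported above (online comments = constant + trend × t + error) can be fitted with ordinary least squares. The daily series below is synthetic stand-in data, not the OmaStadi counts:

```python
import numpy as np

# Synthetic daily comment counts over a 351-day period: a slow
# linear decline plus a periodic fluctuation (illustrative only).
t = np.arange(351)
y = 30 - 0.05 * t + 5 * np.sin(t / 7)

# Fit comments = constant + trend * t; np.polyfit returns the
# highest-order coefficient first.
trend, constant = np.polyfit(t, y, deg=1)
print(round(trend, 3))
```

With real data, the standard error and adjusted R squared reported in the article would come from a full regression routine (e.g., statsmodels) rather than np.polyfit.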
Since there was a substantially low degree of participation during the period between the co-creation stage and the vote stage, we created the indicator continuity to quantify the daily activeness of online deliberation. Continuity records whether online deliberation occurred each day (1 = happened, 0 = not happened), where white spaces in Figure 5 show the proportion of inactive dates. Despite the low threshold, we identified that 32.5% (114 days) of the investigation period (p = 351 days) showed no activity at all (0 comments).

Responsiveness
We examined the proportion of replies (Figure 6). The number of replies was 435 during the investigation period, 13.7% of all comments (n = 3188). This means that a majority of online comments did not develop into back-and-forth discussions. However, there were noticeable differences in the proportion of replies across stages: the highest responsiveness was recorded during the cost estimate stage (22%), partially due to city experts' responsibility for explaining the budgets, followed by the co-creation stage (19.2%). In contrast, the proposal stage (6.8%), the screening stage (3.4%), and the voting stage (10.4%) showed low responsiveness. These results indicate that although residents actively engaged in deliberation during both the proposal stage (31% of all comments) and the co-creation stage (49%), the former was characterized by unilateral communications (6.8% replied) while the latter showed a higher level of mutual communications (19.2%). Does this indicate that the deliberative quality during the co-creation stage was higher than that of the proposal stage? Rather than answering yes or no, we highlight that this deliberative system serves participatory budgeting, which comprises different stages and activities. During the proposal stage, residents may simply express their opinions regarding the proposals. Later, residents attend offline meetings and gradually deliberate to develop proposals into plans through reciprocal discussions. Another remarkable feature is the promptness of responses. As Figure 6 shows, the correlation between initiating comments and replies on the same day (lag 0 in time-series analysis) was 0.64, indicating that residents rarely responded to others' comments, but did so quickly when they did. Overall, these results underline that online deliberation on government-run platforms substantially reflects formal governance processes, so the proposed indicators should be interpreted alongside qualitative investigations.
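The same-day (lag-0) correlation between initiating comments and replies can be computed as follows; the daily counts here are toy numbers chosen so that replies cluster on active days, not the OmaStadi data:

```python
import numpy as np

# Daily counts of initiating comments and replies (illustrative).
initiates = np.array([12, 0, 3, 25, 1, 0, 8, 30, 2, 0])
replies   = np.array([ 4, 0, 1,  7, 0, 0, 2,  9, 1, 0])

# Pearson correlation at lag 0: compare the two series day by day.
lag0_corr = np.corrcoef(initiates, replies)[0, 1]
print(round(lag0_corr, 2))
```

Lagged versions of the same computation (e.g., replies shifted by one day) would show whether responses also spill over to later days; the article reports a lag-0 correlation of 0.64 for the real series.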

Inter-Linkedness
This article focuses on networks of two selected inner-city areas, the southeast and central areas, to investigate commentators' inter-linkedness. In their final evaluation report on OmaStadi 2018-2020, Rask et al. [76] found fierce competition between supporters of proposals in these two areas. In the southeastern area, there were competing demands to renovate the Aino Ackté villa (a historical villa that commemorates soprano singer Aino Ackté) and to install a new artificial turf in the Herttoniemi sports park. The renovation for the villa finally received 2727 votes, while the artificial turf received 2710. In the central area, a similar competition was found between a proposal for artificial turf in Arabianranta and the regeneration of a historical Vallila workshop area, in which the former received 2870 votes, and the latter received 2784 votes.
In the voting stage, the city government used a vote visualization system that displayed real-time information on which proposals were leading. Unlike those voting in libraries and other public spaces, residents who voted through their own electronic devices could change their votes again during the one-month voting stage. The combination of the competing proposals and voting system sparked wait-and-see voting behaviors until the last minute. As a result, voter turnouts in the two areas were significantly higher than in other areas. Turnout in the southeastern area was three times higher than that in the eastern area [76].
Based on this context, we investigated discussion networks related to these two areas using social network analysis. Table 2 shows that the discussion networks of the southeast and central areas took up 30.6% of total active commentators, 21.6% of all proposals and plans, and 22.7% of the total number of comments. This result indicates that these two areas were relatively more vibrant than the other six areas. Moreover, the mean number of comments per commentator in these two areas was slightly lower than the average, indicating that residents participated more equally in online discussions. In contrast, the mean number of comments per proposal in the two areas was higher than the average, indicating that proposals received more comments in these areas. Overall, the two areas' discussion networks were characterized by the active and broad involvement of residents.
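Descriptive statistics of this kind follow directly from a comment log viewed as a bipartite edge list; the entries below are hypothetical:

```python
# One (commentator, proposal) pair per comment (toy data).
comments = [("a", "p1"), ("a", "p1"), ("b", "p1"), ("c", "p2"), ("b", "p2")]

active_commentators = {u for u, _ in comments}
commented_proposals = {p for _, p in comments}

# Mean comments per commentator and per proposal, as in Table 2.
mean_per_commentator = len(comments) / len(active_commentators)
mean_per_proposal = len(comments) / len(commented_proposals)
print(round(mean_per_commentator, 2), round(mean_per_proposal, 2))
```

In this toy log, three commentators produce five comments on two proposals, so the two means are 1.67 and 2.5.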
Next, we further investigated these two areas' networks using network visualization, as shown in Figure 7. Unlike the qualitative investigation by Rask et al. [76], we could not identify clear patterns of network evolution. This result implies that residents might not have considered the OmaStadi platform a preferred place for discussion compared to private platforms such as Facebook or Twitter. However, Figure 7 still shows a hidden pattern of interactions to be noted. In Figure 7, circles denote actors, and squares denote proposals or plans (red: "impossible" proposals; green: "possible" proposals; yellow: plans). Recall that residents who entered the voting system were allowed to read and vote among the 336 plans (yellow squares).

Figure 7. Network visualization of two selected areas. Note: grey circles denote commentators; red squares denote "impossible" proposals; green squares denote "possible" proposals; yellow squares denote plans.

In the proposal stage, residents tended to engage with both "impossible" and "possible" proposals; then, they started to focus on discussions of "possible" proposals and plans in the co-creation stage. This trend of a transition from "impossible" proposals (red squares) to plans (yellow squares) implies that residents understood the mechanism of OmaStadi and strategically chose which proposals were worth focusing on. Moreover, residents who participated in proposals for the southeast formed two polarized subgroups, which requires further investigation of the context. Logically, one would expect residents to continue to engage with plans during the voting stage. However, we observed two abnormal patterns. First, residents moved back to a few proposals and continued discussions there, which are not displayed in the voting system. Second, compared to the clustered interactions shown in the co-creation stage, interactions between the proposals were hardly observed in the voting stage, indicating that their supporters discussed in separate forums. Overall, this result shows that interactive patterns of deliberation tend to be fragmented rather than inter-linked across groups in competitive situations, which raises a question about the efficacy of the decentralized system of OmaStadi: does a deliberative system consisting of 1609 separated spaces (1273 proposals plus 336 plans) provide practical ways for residents to discuss local matters collectively? Moreover, when conflictual issues occur, does the system provide an integrated space to exchange reasonable arguments and resolve conflicts?

Commitment
Another crucial deliberative quality indicator is commitment, which measures the degree to which commentators and proposals are evenly distributed. In network theory, degree refers to the number of links through which a given node has connected with other nodes. In our case, the term degree denotes the number of online comments. Most social networks show a highly right-skewed degree distribution, where a majority of nodes exhibit a small number of degrees with a handful of highly active nodes (e.g., social influencers) [85]. Our case did not deviate from this tendency. Figure 8a shows that most commentators made fewer than two comments in an entire process, 1.96 comments on average (sd: 5.69) with 16.9 skewness. The highest degree was 127 made by one individual, indicating an extremely active participant. Similarly, Figure 8b shows that proposals received 2.61 comments on average (sd: 3.42) with 8.4 skewness. The highest degree was 68, also indicating the existence of a few popular proposals. We do not consider these extreme cases of commentators and proposals to be inherently problematic. Instead, we argue for the importance of promoting inactive actors' participation to make their voices heard and reflected in deliberation processes.
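The skewness figures above can be reproduced with a moment-based estimator; the degree sequence below is a toy right-skewed example (the article does not state which skewness estimator it used, so this is an assumption):

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Many low-degree commentators plus one extremely active outlier,
# mimicking the shape of Figure 8a.
degrees = [1] * 90 + [2] * 8 + [10, 127]
print(round(skewness(degrees), 1))
```

A single outlier dominates the third moment, which is why one participant with 127 comments is enough to drive the skewness far above zero.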

Discussion
As online deliberation results in a large amount of user-generated discussion data, traditional human coding for assessing deliberative quality is impracticable. A growing body of research has attempted to overcome this limitation by employing automated computational methods, including natural language processing and social network analysis. While this has opened up a promising research agenda, time-series data and associated analysis have been less studied. Time is a crucial dimension of deliberation because deliberation is a communicative process, the quality of which might change over time.
To fill this gap, we proposed throughput indicators for assessing online deliberative quality using network analysis and time-series analysis, arguing that the combination will help actors monitor how online deliberation evolves. Throughput indicators that focus on assessing the deliberation process could be communicative and intuitive to facilitate deliberation among actors and build the capacity to cope with various governance challenges. Based on Fishkin's framework [60], we developed the six indicators of participation rate, activeness, continuity, responsiveness, inter-linkedness, and commitment, and then demonstrated their application with the empirical case of OmaStadi participatory budgeting in Helsinki. This was a pilot project in which residents could initiate proposals, develop them together, and vote for desirable proposals on a digital platform.

Table 3 summarizes the description and the usefulness of each indicator. By analyzing the online data of OmaStadi, we first found that 0.4% of Helsinki residents participated in online deliberation, based on the participation rate; this is useful in assessing the representativeness of online deliberation. Second, among those participants, 60.7% of residents made comments at least once, and 64.6% of proposals received comments at least once during the entire process, with a −0.053 linear decreasing trend, based on activeness; this is useful in assessing the degree to which residents actively engaged in online deliberation processes and its longitudinal trend. Third, we found that 32.5% of the investigation period recorded zero participation in online deliberation, based on continuity; this is useful in assessing the extent to which residents consistently participate. Fourth, we found that 13.7% of comments were replies, indicating the prevalence of messages receiving no reply, based on responsiveness; this can be used to assess how many online communications were reciprocated.
Fifth, we focused on discussion networks in two areas where intense competition occurred and found intense and fragmented discussion patterns based on inter-linkedness; this is useful in examining discussion networks' structural properties, especially communications between and within different subgroups. Sixth, we found that residents made 1.96 comments and proposals received 2.6 comments on average, while there were a few highly active cases, based on commitment; this is useful to monitor unequal involvement in online deliberation.

Table 3. Description and usefulness of each indicator.

Indicator           Description                                                                    Usefulness
Participation rate  The proportion of residents who registered with an online deliberative system  Representativeness
Activeness          A longitudinal change in active commentators, proposals, and comments          Activeness
Continuity          The extent of consistency in participation                                     Consistency
Responsiveness      The proportion of replies in online comments                                   Reciprocity
Inter-linkedness    Interactive patterns among actors and proposals                                Structural property
Commitment          Variability of the degree of engagement                                        Equal involvement

These results answer the three research questions. The first question was about how the quality of online deliberation could be monitored on government-run platforms. We proposed automated throughput indicators that produce replicable results as an alternative to traditional manual coding schemes. In particular, we shed light on the importance of the time dimension and provide a novel approach for combining social network analysis and time-series analysis. The results demonstrated that online deliberation reflects formal governance processes, particularly on a government-run platform.
The second question was about what new indicators could support such monitoring by applying network analysis and time-series analysis. We proposed six throughput indicators, as summarized above, which revealed substantial evidence regarding what transpires online.
The third question was about how the proposed indicators could help to develop more resilient governance practices. We summarize two main points. First, by considering the time dimension, the indicators could be used as a monitoring tool for keeping track of dynamic deliberation processes, which promote resilient governance capacity. Since the automated indicators can be produced rapidly, they can be used during ongoing deliberation processes. Second, the indicators help detect possible conflictual groups and facilitate discussion between them by focusing on the interaction dimension. Online deliberation does not automatically facilitate harmonious and integrated decision-making; instead, it could exacerbate polarization if manipulated by deepfakes and disinformation [87]. The time and interaction dimensions are crucial to monitoring the possible defects of online deliberation.
Based on these findings, we close this article by suggesting directions for future research. One urgent agenda is to develop a comprehensive framework that coherently connects multiple indicators to zoom in and out of multi-layered governance [88]. Under such a framework, the next issue is to develop automated indicators that combine natural language processing, network analysis, time-series analysis, and other methods. Although these methods were developed in mathematics, physics, and computer science using distinct data collection strategies, online environments today often generate all of these data types at once. As an applied science, online deliberation research should combine multiple methods to investigate different aspects of the same empirical phenomenon. Lastly, future research should develop automated indicators that complement qualitative investigation. It is important to note that automated indicators are not the end of democratic assessment but the start of collective learning. The proposed indicators generate quantified results but do not answer how and why; without context, automated quality indicators are mere numbers. We therefore suggest that future research treat automated indicators as a tool for generating in-depth questions that facilitate deliberation on improving a deliberative system, rather than as a source of final answers. This is not unilateral communication: researchers can provide timely indicators, but the readers of these results often know the local context better and can interpret the results more accurately. Public managers (e.g., the borough liaisons in this case) can also pinpoint where to concentrate public resources to facilitate public deliberation. This co-learning, combined with trial and error, will strengthen the resilience of governance as a collective capacity.

Institutional Review Board Statement:
Not applicable for studies not involving humans or animals.

Informed Consent Statement:
Not applicable for studies not involving humans or animals.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy concerns but will be shared under the data management plan of the COLDIGIT project (https://blogs.helsinki.fi/collectiveintelligence-through-digital-tools-coldigit).