Cohesive Subgraph Identification in Weighted Bipartite Graphs

Abstract: Cohesive subgraph identification is a fundamental problem in bipartite graph analysis. In real applications, edges are usually associated with weights or frequencies that better represent the co-relationship between entities, yet most existing research neglects them. To fill the gap, we propose a new cohesive subgraph model, the (k, ω)-core, which considers both subgraph cohesiveness and frequency for weighted bipartite graphs. Specifically, the (k, ω)-core requires each node on the left layer to have at least k neighbors (cohesiveness) and each node on the right layer to have a weight of at least ω (frequency). In real scenarios, different users may have different parameter requirements. To handle massive graphs and queries, index-based strategies are developed. In addition, effective optimization techniques are proposed to improve the index construction phase. Extensive experiments on six datasets validate the superiority of the proposed methods over the baseline.


Introduction
Bipartite graphs are widely used in many real-world applications to model the complex relationships across different types of entities, such as customer-product networks and author-paper collaboration networks [1][2][3][4]. A bipartite graph G = (L, R, E) consists of two disjoint sets of nodes, L and R. Only nodes from different sets can be connected. For example, Figure 1 shows a customer-product bipartite network, where edges represent purchase relationships. The left layer L is a set of customers and the right layer R is a set of purchased products. There is no edge between two customers in L (resp. two products in R).
As a fundamental problem in graph analysis, cohesive subgraph identification is widely studied in the literature (e.g., [5][6][7][8]). For bipartite graphs, a variety of cohesive subgraph models have been proposed to identify important structures, such as the (α, β)-core [3], bitruss [9] and biclique [10]. Biclique is the most cohesive model, which requires the nodes inside to be fully connected. However, its computation is NP-hard, which makes it hard to apply in time-critical applications. Bitruss [11] adopts the butterfly motif (i.e., a (2, 2)-biclique) to investigate the cohesiveness of bipartite graphs. The (α, β)-core of bipartite graphs, which can be computed in linear time, has attracted great attention recently [3,12,13]. However, the model still has a drawback.
Motivations: Given a bipartite graph G = (L, R, E), the (α, β)-core is the maximal subgraph in which each node in L has at least α neighbors in R and each node in R has at least β neighbors in L. It can be computed in linear time by iteratively deleting nodes with a degree less than α or β. For instance, in Figure 1, the subgraph consisting of nodes {u_2, u_3, v_2, v_3, v_4} is a (2,1)-core. The (α, β)-core only emphasizes the engagement of each node, i.e., each node has a sufficient number of neighbors in the subgraph, and it treats every edge equally. However, in real applications, edges usually tend to have quite different weights. For example, in the customer-product network (e.g., Figure 1), each edge is assigned a weight that reflects the frequency between a customer and a product, i.e., the number of times the customer has bought the product.
To make use of the weight information, we propose a novel model, the (k, ω)-core, to detect densely frequent communities. It ensures that the nodes in the left layer have a sufficient number of neighbors and the nodes in the right layer have enough weight. Given a bipartite graph, the (k, ω)-core is the maximal subgraph where each node in L (resp. R) has at least k neighbors (resp. a weight of at least ω). The weight of a node is the sum of the weights of its adjacent edges. For instance, reconsidering the graph in Figure 1, the weights of products {v_1, v_2, v_3, v_4, v_5} are {1, 5, 5, 3, 1}, and the subgraph consisting of {u_1, u_2, u_3, v_2, v_3, v_4} is a (2,2)-core. The nodes {u_4, v_1, v_5} are excluded from the (2,2)-core, since customer u_4 has not bought a sufficient number of distinct products while products v_1 and v_5 have not been purchased enough times.

Figure 1.
A weighted bipartite graph of the customer-product network (the weight on the edge denotes the number of times that the customer has bought the product).
Applications: The proposed (k, ω)-core model can be used in many real-world applications, such as product recommendation and fraud detection.

• Product recommendation: In a product-customer network (e.g., Figure 1), a (k, ω)-core means a group of users with sufficient common tastes. Then, we can use the group information for product recommendation. For example, in Figure 1, {u_1, u_2, u_3, v_2, v_3, v_4} is the (2,2)-core. Then, we can recommend product v_2 to user u_3, since u_3 shares many common interests with u_1 and u_2.
• Fraud detection: For an online shopping website, fraudsters use a large number of accounts to frequently purchase some selected products in order to boost the ranking of these products. This behavior can be modeled with a (k, ω)-core. By carefully selecting the parameters, we can use the detected (k, ω)-core to narrow down the search space of fraudster accounts.
In real-life applications, the value of k (resp. ω) is determined by users based on their own requirements. The two parameters provide more flexibility when adjusting the resulting communities. As observed, the (α, β)-core is a special case of the (k, ω)-core when all the weights in the graph equal 1. Naively, we can extend the solution for computing the (α, β)-core by iteratively deleting the nodes that violate the constraints. The time complexity is linear in the size of the input graph. However, in real applications, the graph is usually large, which means that even algorithms linear in the input graph size are not affordable [14]. In addition, different users may have different requirements for the input parameters k and ω, which can lead to a large number of queries. Therefore, more efficient methods are needed to handle massive graphs and queries.
In this paper, we resort to index-based approaches. A straightforward solution is to compute all possible (k, ω)-cores and maintain all the results. However, this visits the same subgraph multiple times and incurs a huge computational cost. Thus, the time cost of computing all (k, ω)-cores becomes unaffordable on large graphs. To reduce the cost, we propose different index construction strategies that balance building space-efficient indexes against supporting efficient and scalable query processing. Our major contributions are summarized as follows:
• We propose a new cohesive subgraph model, the (k, ω)-core, on weighted bipartite graphs by considering both density and frequency of the subgraph.
• To efficiently handle massive graphs and queries, we develop three advanced index construction strategies, i.e., RowIndex, OptionIndex and UnionIndex, to reduce index construction cost. In addition, the corresponding query algorithms for the three index structures are provided.
• We validate the advantages of the proposed algorithms through extensive experiments on real-world datasets. The results show that the index-based algorithms outperform the baselines significantly. Moreover, users can make a trade-off between the time and space cost when selecting from the three strategies.
Roadmap: The rest of the paper is organized as follows. In Section 2, we introduce the (k, ω)-core model and formulate our problem. Section 3 introduces the naive online algorithm. Section 4 presents the index-based algorithms and advanced index structures. We report our experimental results in Section 5 and review the related work in Section 6. Finally, we present the conclusion and future work in Section 7.

Preliminaries
We use G = (L, R, E, W) to denote a weighted bipartite graph, where the nodes of G are partitioned into two disjoint sets L and R, and each edge in E ⊆ L × R connects a node from L with a node from R. We use n = |L| + |R| and m = |E| to denote the number of nodes and edges, respectively. N(u) is the set of adjacent nodes of u in G, which is also called the neighbor set of u in G. The degree of a node u ∈ L, denoted by d(u), is the number of neighbors of u in G. Each edge e(u, v) is assigned a positive weight w(u, v) ∈ W, defined as the frequency of e(u, v). The weight of a node v ∈ R, denoted by w_t(v) = ∑_{u ∈ N(v)} w(u, v), is the sum of the weights of its adjacent edges. We use k_max = max{d(u) | u ∈ L} and ω_max = max{w_t(v) | v ∈ R} to denote the maximum degree and the maximum weight of nodes in G, respectively. For a bipartite graph G and two node sets L' ⊆ L and R' ⊆ R, the bipartite subgraph induced by L' and R' is the subgraph G' of G such that E' = E ∩ (L' × R'). To evaluate the cohesiveness and frequency of communities in weighted bipartite subgraphs, we resort to the minimum degree for node set L and the minimum weight for node set R. In detail, for an induced subgraph, we require that nodes in L' have a degree of at least k and nodes in R' have a weight of at least ω.

Definition 1 ((k, ω)-core). Given a weighted bipartite graph G = (L, R, E, W) and two query parameters k and ω, the induced subgraph S = (L', R', E', W') is the (k, ω)-core of G, denoted by C_{k,ω}, if S satisfies:
• Degree constraint. Each node u ∈ L' has a degree of at least k, i.e., d(u, S) ≥ k;
• Weight constraint. Each node v ∈ R' has a weight of at least ω, i.e., w_t(v, S) ≥ ω;
• Maximal. Any supergraph S' ⊃ S is not a (k, ω)-core.

Example 1. Figure 1 is a toy weighted bipartite graph modeling customer-product affiliations. It consists of two layers of nodes, i.e., the four nodes in the left layer denote the customers and the five nodes in the right layer denote the products. The edges between nodes represent the purchase relationships and the weight of an edge reflects the purchase frequency. Given the query parameters k = 2 and ω = 4, we obtain C_{2,4}, consisting of nodes {u_1, u_2, v_2, v_3}.
For simplicity, we refer to a weighted bipartite graph as a graph, and omit G and S in the notations if the context is self-evident. In the following lemma, we show that the (k, ω)-core has the nested property. The correctness of the lemma is easy to verify from the definition, so we omit the proof here.

Lemma 1. Given a weighted bipartite graph G, the (k', ω')-core is nested in the (k, ω)-core, i.e., C_{k',ω'} ⊆ C_{k,ω}, if k' ≥ k and ω' ≥ ω.

Example 2. As shown in Example 1, C_{2,4} consists of nodes {u_1, u_2, v_2, v_3}. Suppose k = 2 and ω = 2. We can find that C_{2,2} contains C_{2,4}, i.e., C_{2,4} ⊆ C_{2,2}.

Problem 1. Given a weighted bipartite graph G and two query parameters k and ω, we aim to design algorithms to compute the (k, ω)-core correctly and efficiently.

Online Solution
Before introducing the detailed algorithms, Figure 2 shows the general framework of the proposed techniques in this paper. To identify the (k, ω)-core, an online solution is first developed in Section 3. To efficiently handle large networks and different input parameters, an index-based solution is further proposed in Section 4. The index-based solution consists of two phases: an index construction phase and query phase. In addition, different optimization techniques are proposed to ensure a balance between the index construction time and index space.
For the online solution, we introduce a baseline algorithm, named GCORE, by extending the solution for (α, β)-core computation. The main idea of GCORE is to iteratively remove nodes in L with a degree less than k and nodes in R with a weight less than ω. GCORE terminates when the size of G stays unchanged, i.e., no node violates the constraints. Then, we output the remaining graph as the (k, ω)-core. The details are shown in Algorithm 1. In Lines 2-5, we check the degree constraint for nodes in L. For each node u ∈ L with d(u) < k, we remove it together with its adjacent edges. Then, we update the weight of each node v in N(u), i.e., we subtract the weight of the removed edge e(u, v) from the total weight w_t(v). In Lines 6-9, we examine the weight constraint for nodes in R. For each node v with w_t(v) < ω, we remove it together with its incident edges. Accordingly, we decrease the degree of u by 1 for each u in N(v), which may cause u to violate the degree constraint. The algorithm terminates when both constraints are satisfied and finally returns the (k, ω)-core of G.

Algorithm 1: GENERATE (k, ω)-CORE
Input: Bipartite graph: G = (L, R, E, W), degree constraint: k, weight constraint: ω
10 return G

Discussion: The time complexity of Algorithm 1 is linear in the size of the graph. However, as discussed in the introduction, the method is still not affordable, especially for massive graphs and queries.
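The peeling procedure described for GCORE can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `gcore`, the edge-dictionary input format and the toy graph `TOY_EDGES` are assumptions introduced here, and weights are assumed to be positive integers.

```python
def gcore(edges, k, omega):
    """Return (left_nodes, right_nodes) of the (k, omega)-core.

    `edges` maps (u, v) -> positive weight, with u in L and v in R.
    The input format is an assumption made for this sketch.
    """
    adj_l, adj_r = {}, {}
    for (u, v), w in edges.items():
        adj_l.setdefault(u, {})[v] = w
        adj_r.setdefault(v, {})[u] = w
    weight = {v: sum(nbrs.values()) for v, nbrs in adj_r.items()}

    # Seed the work list with every node that violates its constraint.
    pending = [("L", u) for u in adj_l if len(adj_l[u]) < k]
    pending += [("R", v) for v in adj_r if weight[v] < omega]
    queued = set(pending)

    while pending:
        side, x = pending.pop()
        if side == "L":
            # Removing a left node lowers the weight of its right neighbors.
            for v, w in adj_l.pop(x).items():
                del adj_r[v][x]
                weight[v] -= w
                if weight[v] < omega and ("R", v) not in queued:
                    queued.add(("R", v))
                    pending.append(("R", v))
        else:
            # Removing a right node lowers the degree of its left neighbors.
            for u in adj_r.pop(x):
                del adj_l[u][x]
                if len(adj_l[u]) < k and ("L", u) not in queued:
                    queued.add(("L", u))
                    pending.append(("L", u))
    return set(adj_l), set(adj_r)


# Hypothetical toy graph (not Figure 1): degrees are all 2,
# node weights are v1 = 5, v2 = 4, v3 = 1.
TOY_EDGES = {
    ("u1", "v1"): 2, ("u1", "v2"): 1,
    ("u2", "v1"): 3, ("u2", "v2"): 2,
    ("u3", "v2"): 1, ("u3", "v3"): 1,
}
left, right = gcore(TOY_EDGES, 2, 2)
```

Each edge is deleted at most once, so the sketch runs in time linear in the graph size, matching the complexity stated in the discussion.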
Figure 2. General framework of the proposed techniques. Panel labels: Input: bipartite graph G; constraint k and ω; delete u or v; Baseline; Time-improved; OptionIndex; UnionIndex; Output: constructed index.

Index-Based Solution
For each input parameter, Algorithm 1 has to compute the (k, ω)-core from scratch, which is time-consuming and cannot support a large number of queries. To tackle the challenges, in this section, index-based algorithms are developed. The main idea is that we effectively organize all the (k, ω)-cores in the index, so that a query could be efficiently answered. Firstly, a baseline solution is presented. To speed up the processing of the baseline, we devise a time-improved solution. Then, several novel index structures are developed to shrink the storage space.

Baseline Solution
Intuitively, the naive index-based algorithm computes all the (k, ω)-cores by repeatedly invoking the GCORE algorithm and stores all of them in the index. As a result, we can quickly return the (k, ω)-core for any given query parameters. In detail, we organize all the (k, ω)-cores in a two-dimensional index. That is, the nodes in the (k, ω)-core are all stored in the (k, ω)-cell, where the (k, ω)-cell is in the k-th row and ω-th column (0 ≤ k ≤ k_max, 0 ≤ ω ≤ ω_max) of the index. The procedure terminates when all the possible (k, ω)-cores are found. We can then immediately obtain the (k, ω)-core for any given pair of parameters k and ω, according to the two-dimensional locations of cells. Table 1 shows the index for the graph in Figure 1. For example, the set of nodes in the (1, 1)-core, i.e., {u_1, u_2, u_3, u_4, v_1, v_2, v_3, v_4, v_5}, are all stored in the (1, 1)-cell. To query the (1, 1)-core, we only need to visit the (1, 1)-cell. Hence, the query Q_{1,1} can be answered in optimal time, with O(1) time complexity.
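The baseline two-dimensional index can be sketched as below. The names `core_nodes`, `build_baseline_index` and `TOY_EDGES` are hypothetical, the toy graph is not Figure 1, and the sketch assumes positive integer weights and starts rows and columns at 1 rather than 0 to keep isolated nodes out of the picture.

```python
def core_nodes(edges, k, omega):
    """Node set of the (k, omega)-core, for k >= 1 and omega >= 1.

    Simple fixpoint peeling: recompute degrees and weights, drop every
    edge that touches a violating node, repeat until nothing changes.
    """
    live = dict(edges)
    while True:
        deg, wt = {}, {}
        for (u, v), w in live.items():
            deg[u] = deg.get(u, 0) + 1
            wt[v] = wt.get(v, 0) + w
        kept = {(u, v): w for (u, v), w in live.items()
                if deg[u] >= k and wt[v] >= omega}
        if len(kept) == len(live):
            return frozenset(deg) | frozenset(wt)
        live = kept


def build_baseline_index(edges):
    """Compute every (k, omega)-core from scratch and store it in a cell."""
    deg, wt = {}, {}
    for (u, v), w in edges.items():
        deg[u] = deg.get(u, 0) + 1
        wt[v] = wt.get(v, 0) + w
    k_max, w_max = max(deg.values()), max(wt.values())
    return {(k, w): core_nodes(edges, k, w)
            for k in range(1, k_max + 1)
            for w in range(1, w_max + 1)}


# Hypothetical toy graph: degrees are all 2, weights v1 = 5, v2 = 4, v3 = 1.
TOY_EDGES = {
    ("u1", "v1"): 2, ("u1", "v2"): 1,
    ("u2", "v1"): 3, ("u2", "v2"): 2,
    ("u3", "v2"): 1, ("u3", "v3"): 1,
}
index = build_baseline_index(TOY_EDGES)
answer = index.get((2, 3), frozenset())  # a query is a single cell lookup
```

The lookup is constant time, but every cell is computed independently, which is the redundancy the time-improved method targets.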

Time-Improved Method
The baseline index method is time-consuming, since we need to compute all the possible (k, ω)-cores one by one. Due to the nested property of the (k, ω)-core, many subgraphs are computed multiple times. To reduce the time consumption, we resort to a time-improved solution that skips the unnecessary (k, ω)-core computations. Before going into the detailed method, we first introduce the concept of ω_{max,k}(u) to help present the algorithm.

Definition 2 (ω_{max,k}(u)). Given a weighted bipartite graph G = (U, E, W), where U = R ∪ L, and a specific value k, for each node u ∈ U, ω_{max,k}(u) is the maximum value of ω for which there exists a (k, ω)-core that contains u.

For a node u ∈ U and a specific value k, we know that the (k, ω_{max,k}(u))-core contains u by Definition 2. According to the nested property of the (k, ω)-core in Lemma 1, we can infer that the (k, ω_{max,k}(u))-core is also contained in every (k, ω_i)-core of G, where ω_i is no larger than ω_{max,k}(u). Thus, there are many redundant computations in the process of constructing the index structure. To address this concern, we devise an improved index-based algorithm. Given a graph G and an integer k, we first compute ω_{max,k}(u) for each node u ∈ U and then store u in the (k, ω)-cells where 0 ≤ ω ≤ ω_{max,k}(u). Note that, for a specific input k, we store all nodes in row. The details are shown in Algorithm 2.

Algorithm 2: COMPUTEROW(k, G)
Input: Bipartite graph: G, degree constraint: k

In Algorithm 2, we first initialize row as empty and ω as 1 (Line 1). Then, we generate the (k, 0)-core as the candidate subgraph G' by using the GCORE algorithm. In Lines 5-10, if a node u ∈ L violates the degree constraint, we remove it together with its adjacent edges and update the weight of each node v ∈ R in the neighbor set of u. After obtaining ω_{max,k}(u), we put node u into row[i], where 0 ≤ i ≤ ω_{max,k}(u) − 1. Similarly, we check the weight constraint. In Lines 11-16, if a node v ∈ R violates the weight constraint, we decrease the degree of each node u in the neighbor set of v by 1. Then, we obtain ω_{max,k}(v) and put node v into row[i], where 0 ≤ i ≤ ω_{max,k}(v) − 1. We continue the iteration until all the nodes are removed from G'. Finally, we return row as the resulting index for the given k. Note that we can obtain the index structure for the whole graph by repeatedly invoking Algorithm 2 with different input values of k.
Discussion: Although the time-improved method can speed up the processing, it is prohibitive for large graphs due to the large index storage cost. This is because a node can be stored in multiple cells due to the nested property. For instance, given a fixed k = 1, the nodes in the (1, 3)-cell are also stored in every (1, ω)-cell with ω < 3, e.g., the (1, 1)-cell and the (1, 2)-cell. Similarly, for a specific ω, the same problem exists when computing the column index.
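The idea behind computing ω_{max,k} for one row can be sketched as follows. This is a simplified illustration rather than Algorithm 2 itself: it raises ω one step at a time and continues peeling the already-shrunk graph, relying on the nested property. The names `peel`, `compute_row` and `TOY_EDGES` are hypothetical, and positive integer weights with k ≥ 1 are assumed.

```python
def peel(edges, k, omega):
    """Edges of the (k, omega)-core of the graph given by `edges`."""
    live = dict(edges)
    while True:
        deg, wt = {}, {}
        for (u, v), w in live.items():
            deg[u] = deg.get(u, 0) + 1
            wt[v] = wt.get(v, 0) + w
        kept = {(u, v): w for (u, v), w in live.items()
                if deg[u] >= k and wt[v] >= omega}
        if len(kept) == len(live):
            return live
        live = kept


def node_set(edges):
    return {u for u, _ in edges} | {v for _, v in edges}


def compute_row(edges, k):
    """Map each node of the (k, 1)-core to its omega_max,k value."""
    omega_max = {}
    live = peel(edges, k, 1)
    omega = 1
    while live:
        # By the nested property, the (k, omega + 1)-core lies inside the
        # current (k, omega)-core, so peeling continues from `live`.
        nxt = peel(live, k, omega + 1)
        for x in node_set(live) - node_set(nxt):
            omega_max[x] = omega  # x survives at omega but not at omega + 1
        live = nxt
        omega += 1
    return omega_max


# Hypothetical toy graph: degrees are all 2, weights v1 = 5, v2 = 4, v3 = 1.
TOY_EDGES = {
    ("u1", "v1"): 2, ("u1", "v2"): 1,
    ("u2", "v1"): 3, ("u2", "v2"): 2,
    ("u3", "v2"): 1, ("u3", "v3"): 1,
}
row_1 = compute_row(TOY_EDGES, 1)
row_2 = compute_row(TOY_EDGES, 2)
```

A node x then belongs to every (k, ω)-cell with ω ≤ omega_max[x], which is how the row of the time-improved index is filled without recomputing each core from scratch.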

Advanced Index Structures
As discussed, the baseline index method suffers from storage issues. To shrink the index space without sacrificing much efficiency, we introduce three novel index structures, i.e., (1) RowIndex: by utilizing the nested property of (k, ω)-core, we compress each row of the index; (2) OptionIndex: by comparing the shrink size of compression in row and column, we select the better compression direction; (3) UnionIndex: by considering both row and column compression, we conduct the union operations on cells of the index. In addition, the corresponding query algorithms are presented.

RowIndex
According to the nested property in Lemma 1, we know that C_{k,ω} is always a subset of C_{k,ω−1}. Thus, we resort to RowIndex, which compresses each row of the index so that a single node is not stored many times. Given a specific k, we say that all the (k, *)-cells are in the k-th row, where the symbol "*" represents any possible value of ω. The main difference between RowIndex and the index structure proposed above is that we only store each node u ∈ U in the (k, ω_{max,k}(u))-cell, instead of putting it into every (k, ω_i)-cell where 0 ≤ ω_i ≤ ω_{max,k}(u). Thus, each node is stored at most once in each row of the index, which saves the space taken by redundant copies of nodes. Meanwhile, we also record the shrink direction (i.e., "→") in shrink, which is a direction table. As the procedure of RowIndex is easy to understand, we omit its pseudo-code.

RowIndex Query Algorithm: Given query parameters k and ω, we first locate the (k, ω)-cell. Then, we collect all the nodes contained in the (k, ω_i)-cells where ω ≤ ω_i ≤ ω_max, and output them together as the resulting (k, ω)-core.

For example, in Table 1, for k = 1, the (1, 1)-cell containing nodes {u_1, u_2, u_3, u_4, v_1, v_2, v_3, v_4, v_5} can be compressed to the (1, 3)-cell and the (1, 5)-cell. That is, nodes u_4 and v_4 only need to be saved in the (1, 3)-cell and nodes u_1, u_2, u_3, v_2, v_3 only need to be stored in the (1, 5)-cell. Thus, only the remaining nodes v_1 and v_5 are stored in the (1, 1)-cell. In this row, RowIndex stores each of the nine nodes exactly once. When querying the (2, 3)-core, we first locate the (2, 3)-cell and output the nodes in the (2, 3)-cell and the (2, 4)-cell together. Thus, we have C_{2,3} = {u_1, u_2, v_2, v_3}.
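A RowIndex row and its query can be sketched as below, using the ω_{max,1} values that the k = 1 example above implies for Figure 1 (v_1, v_5 in the (1, 1)-cell; u_4, v_4 in the (1, 3)-cell; the rest in the (1, 5)-cell). The function names and the dictionary layout are assumptions made for illustration.

```python
def build_row(omega_max_k):
    """Store each node once, in the cell of its omega_max,k value."""
    row = {}
    for node, w in omega_max_k.items():
        row.setdefault(w, set()).add(node)
    return row


def query_row(row, omega):
    """Collect the cells omega, omega + 1, ..., omega_max of one row."""
    result = set()
    for w, nodes in row.items():
        if w >= omega:
            result |= nodes
    return result


# omega_max,1 values taken from the k = 1 example in the text.
OMEGA_MAX_1 = {
    "v1": 1, "v5": 1,
    "u4": 3, "v4": 3,
    "u1": 5, "u2": 5, "u3": 5, "v2": 5, "v3": 5,
}
row = build_row(OMEGA_MAX_1)
core_1_4 = query_row(row, 4)
```

Compared with the uncompressed row, the query trades a single cell lookup for a scan over the cells to the right of the queried one.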

OptionIndex
As discussed above, RowIndex utilizes the nested property to reduce the redundant storage of each node in each row of the index. Similarly, we can construct ColumnIndex to compress each column of the index in the same manner, which enjoys the same space cost. Naturally, certain cells may be compressed more by ColumnIndex than by RowIndex. That is, column compression may contribute more to space saving for some cells. Motivated by this, we devise the OptionIndex structure, which is constructed by traversing all cells one by one. Specifically, when visiting a cell, we first compare the compression sizes of the two compression directions, i.e., RowIndex and ColumnIndex, and then select the one that saves more space. For example, in Table 1, the compression size is 7 if we use RowIndex to shrink the (1, 1)-cell to the (1, 2)-cell with shrink direction "→". In contrast, the compression size is 8 if we use ColumnIndex to shrink the (1, 1)-cell to the (2, 1)-cell with shrink direction "↓". Since ColumnIndex shrinks more than RowIndex for the (1, 1)-cell, we choose ColumnIndex and shrink the (1, 1)-cell to the (2, 1)-cell. Similarly, we choose RowIndex for the (1, 4)-cell, as RowIndex saves a space of five nodes while ColumnIndex saves four. The details of the construction procedure for OptionIndex are shown in Algorithm 3.

Algorithm 3 (excerpt):
    for ω = 0 to ω_max do
        rs ← +∞, cs ← +∞;

In Algorithm 3, we first initialize index and shrink as empty (Line 1). In Line 2, the algorithm computes the (0, ω)-cores as the initialization of the current processing row cRow and then deals with each row in the main loop (Lines 4-17). We set the row next to cRow as nRow at Line 4. Then, we compress the storage space for all possible (k, ω)-cores in cRow (Lines 5-16). In each inner iteration, we first initialize the resulting sizes of the (k, ω)-cell after row shrink (rs) and after column shrink (cs) to positive infinity. In Lines 7-8, we use RowIndex to shrink the (k, ω)-cell to the (k, ω + 1)-cell and keep the resulting size of the (k, ω)-cell in rs. Meanwhile, in Lines 9-10, we use ColumnIndex to shrink the (k, ω)-cell to the (k + 1, ω)-cell and keep the resulting size of the (k, ω)-cell in cs. The smaller the resulting size of the (k, ω)-cell is, the better the compression. Hence, in Lines 11-16, for a specific (k, ω)-cell, if rs is no larger than cs, we choose RowIndex and put the nodes contained in the (k, ω)-core but not in the (k, ω + 1)-core into the (k, ω)-cell, with the corresponding direction "→" recorded in shrink. Otherwise, we select ColumnIndex and put the nodes contained in the (k, ω)-core but not in the (k + 1, ω)-core into the (k, ω)-cell, with the corresponding direction "↓" recorded in shrink. We deal with each cell in the same way, one by one. Finally, we shrink the last row of the index in Line 18 by using Algorithm 2 and then return the resulting OptionIndex with its corresponding direction table shrink in Line 19.
OptionIndex Query Algorithm: Based on the pre-computed OptionIndex, we devise an efficient option query algorithm, and the details are shown in Algorithm 4. In Line 1, we first initialize the (k, ω)-core Q as empty. In Lines 2-3, for the given k and ω, we locate index[k][ω] and add the nodes contained in the (k, ω)-cell to Q. At the same time, we obtain the shrink direction d from shrink[k][ω] (Line 4). In Lines 6-9, if the direction is "→", the current (k, ω)-cell adopts row compression. Then, we add the nodes contained in the (k, ω + 1)-cell to Q and turn to the (k, ω + 1)-cell. In Lines 10-13, if the direction is "↓", the shrink direction is down the column. Accordingly, the nodes stored in the (k + 1, ω)-cell are added into Q, and then we turn to the (k + 1, ω)-cell for the next iteration. The procedure terminates when the shrink direction is null, and finally we return Q as the resulting (k, ω)-core.
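The OptionIndex construction and query described above can be sketched together as follows. This is an illustrative simplification, not Algorithms 3 and 4 verbatim: it builds the index from precomputed cores instead of processing rows incrementally, starts rows and columns at 1, and ends a query when it steps outside the table rather than on a null direction. All function names and the toy graph are hypothetical.

```python
def core_nodes(edges, k, omega):
    """Node set of the (k, omega)-core by simple fixpoint peeling."""
    live = dict(edges)
    while True:
        deg, wt = {}, {}
        for (u, v), w in live.items():
            deg[u] = deg.get(u, 0) + 1
            wt[v] = wt.get(v, 0) + w
        kept = {(u, v): w for (u, v), w in live.items()
                if deg[u] >= k and wt[v] >= omega}
        if len(kept) == len(live):
            return frozenset(deg) | frozenset(wt)
        live = kept


def build_option_index(cores, k_max, w_max):
    """Per cell, keep whichever of row or column shrink leaves fewer nodes."""
    index, shrink, empty = {}, {}, frozenset()
    for k in range(1, k_max + 1):
        for w in range(1, w_max + 1):
            core = cores[(k, w)]
            rest_row = core - cores.get((k, w + 1), empty)  # after "→"
            rest_col = core - cores.get((k + 1, w), empty)  # after "↓"
            if len(rest_row) <= len(rest_col):   # rs <= cs: choose RowIndex
                index[(k, w)], shrink[(k, w)] = rest_row, "→"
            else:                                # otherwise ColumnIndex
                index[(k, w)], shrink[(k, w)] = rest_col, "↓"
    return index, shrink


def query_option(index, shrink, k, w):
    """Follow the recorded directions and union the visited cells."""
    result = set()
    while (k, w) in index:
        result |= index[(k, w)]
        if shrink[(k, w)] == "→":
            w += 1
        else:
            k += 1
    return result


# Hypothetical toy graph: k_max = 2, omega_max = 5.
TOY_EDGES = {
    ("u1", "v1"): 2, ("u1", "v2"): 1,
    ("u2", "v1"): 3, ("u2", "v2"): 2,
    ("u3", "v2"): 1, ("u3", "v3"): 1,
}
K_MAX, W_MAX = 2, 5
cores = {(k, w): core_nodes(TOY_EDGES, k, w)
         for k in range(1, K_MAX + 1) for w in range(1, W_MAX + 1)}
index, shrink = build_option_index(cores, K_MAX, W_MAX)
```

Because each cell stores exactly the nodes missing from the neighbor it points to, the union along the path of directions reproduces the full core.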

UnionIndex
To further reduce the index cost, we propose UnionIndex. The main difference between UnionIndex and OptionIndex is that we compress certain cells in both the row and column directions at the same time to save more space. For example, recall that in Table 1, the (1, 1)-cell can be shrunk to the (1, 2)-cell with compression size 7, or to the (2, 1)-cell with compression size 8. However, the compression size can be up to 9 (i.e., all nodes in the graph) if we shrink the (1, 1)-cell to both the (1, 2)-cell and the (2, 1)-cell simultaneously with shrink directions "→" and "↓". Thus, we choose both directions to shrink the storage space. In detail, we store the nodes contained in the (1, 1)-core but not in the union of the (1, 2)-core and the (2, 1)-core in the (1, 1)-cell, with shrink directions "→" and "↓" recorded simultaneously in the direction table.
The pseudo-code to construct UnionIndex is presented in Algorithm 5. Since the UnionIndex structure is similar to the OptionIndex structure, we only describe the differences from Algorithm 3 for simplicity. In Lines 6-8, for a specific k, if the (k + 1, ω)-core is contained in the (k, ω + 1)-core, the compression size of RowIndex is larger than that of ColumnIndex. Thus, we put the nodes contained in the (k, ω)-core but not in the (k, ω + 1)-core into the (k, ω)-cell with shrink direction "→". On the contrary, if the (k + 1, ω)-core contains the (k, ω + 1)-core, we store the nodes included in the (k, ω)-core but not in the (k + 1, ω)-core in the (k, ω)-cell with shrink direction "↓" in Lines 9-11. Otherwise, in Lines 12-14, we compress the (k, ω)-cell to the (k + 1, ω)-cell and the (k, ω + 1)-cell at the same time, with shrink directions "→" and "↓" recorded in the direction table, which avoids the redundant storage of nodes in the union of the (k + 1, ω)-core and the (k, ω + 1)-core. Finally, the algorithm returns UnionIndex with its corresponding direction table shrink in Line 17.

UnionIndex Query Algorithm: The procedure for querying UnionIndex is simple and works as follows. Given two query parameters k and ω, we first locate the (k, ω)-cell and collect the nodes stored inside it. Then, we obtain the corresponding shrink direction from the direction table shrink, which is produced by Algorithm 5. If the direction is only "→" (resp. "↓"), we locate the (k, ω + 1)-cell (resp. the (k + 1, ω)-cell) and collect the nodes contained inside it. In particular, if both shrink directions "→" and "↓" are recorded in shrink, we visit the (k + 1, ω)-cell and the (k, ω + 1)-cell at the same time, collecting their nodes together without duplication. We do the same for all visited cells until the current shrink direction is null, and finally we output all the collected nodes as the resulting (k, ω)-core.
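The UnionIndex construction and query can be sketched in the same style. As before, this is a simplified illustration under stated assumptions and not Algorithm 5 itself: cores are precomputed, rows and columns start at 1, a traversal branch ends when it leaves the table, and all names and the toy graph are hypothetical. The toy graph is chosen so that the (1, 2)-core and the (2, 1)-core are incomparable, which triggers the two-direction case.

```python
def core_nodes(edges, k, omega):
    """Node set of the (k, omega)-core by simple fixpoint peeling."""
    live = dict(edges)
    while True:
        deg, wt = {}, {}
        for (u, v), w in live.items():
            deg[u] = deg.get(u, 0) + 1
            wt[v] = wt.get(v, 0) + w
        kept = {(u, v): w for (u, v), w in live.items()
                if deg[u] >= k and wt[v] >= omega}
        if len(kept) == len(live):
            return frozenset(deg) | frozenset(wt)
        live = kept


def build_union_index(cores, k_max, w_max):
    index, shrink, empty = {}, {}, frozenset()
    for k in range(1, k_max + 1):
        for w in range(1, w_max + 1):
            right = cores.get((k, w + 1), empty)
            down = cores.get((k + 1, w), empty)
            if down <= right:        # row shrink already covers the column
                shrink[(k, w)] = "→"
            elif right <= down:      # column shrink already covers the row
                shrink[(k, w)] = "↓"
            else:                    # incomparable: shrink in both directions
                shrink[(k, w)] = "→↓"
            index[(k, w)] = cores[(k, w)] - (right | down)
    return index, shrink


def query_union(index, shrink, k, w):
    """Visit every cell reachable through the recorded directions once."""
    result, seen, stack = set(), set(), [(k, w)]
    while stack:
        cell = stack.pop()
        if cell in seen or cell not in index:
            continue
        seen.add(cell)
        result |= index[cell]
        ck, cw = cell
        if "→" in shrink[cell]:
            stack.append((ck, cw + 1))
        if "↓" in shrink[cell]:
            stack.append((ck + 1, cw))
    return result


# Hypothetical toy graph: a has degree 1 on a heavy product x,
# b has degree 2 on two light products y and z.
TOY_EDGES = {("a", "x"): 5, ("b", "y"): 1, ("b", "z"): 1}
K_MAX, W_MAX = 2, 5
cores = {(k, w): core_nodes(TOY_EDGES, k, w)
         for k in range(1, K_MAX + 1) for w in range(1, W_MAX + 1)}
index, shrink = build_union_index(cores, K_MAX, W_MAX)
```

The `seen` set is what keeps nodes from being collected twice when the row and column paths from a two-direction cell later meet in a shared cell.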

Experiments
In this section, we detail experiments over six real-life networks to verify the performance of the proposed methods.

Experiment Setup
Algorithms: In the experiments, we implemented and evaluated the algorithms as follows.
Datasets: We employed six real-life networks, i.e., Pedia, Movielens, News, Quote, Books and Citeulike, which have been widely used in previous studies (e.g., [2,3,15]) and are publicly available at http://konect.cc/networks/ (2021/05/20). Table 2 provides the statistical details of the datasets.
For a query with a given pair of parameters k and ω, we ran the algorithms over each dataset 200 times and reported the average value. All the programs were implemented in C++ and the experiments were performed on a PC with an Intel Xeon 3.2 GHz CPU and 32 GB RAM.
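The averaging protocol can be illustrated with a small timing harness. The sketch below is in Python for brevity, whereas the reported programs are in C++; the function name average_response_time and the stand-in query routine are hypothetical, and the paper's actual measurement code may differ.

```python
import time


def average_response_time(query_fn, k, w, runs=200):
    """Run `query_fn(k, w)` `runs` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        query_fn(k, w)
    return (time.perf_counter() - start) / runs


# Hypothetical stand-in for a real query routine; it only records its calls.
calls = []


def dummy_query(k, w):
    calls.append((k, w))
    return set()


mean_seconds = average_response_time(dummy_query, k=2, w=3)
```

Averaging over many runs reduces the influence of timer resolution and transient system noise, which matters when a single index lookup is very fast.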

Performance Evaluation
To evaluate the efficiency, we compare the response time and space storage of the algorithms on the datasets as follows.
Efficiency of the time-improved algorithm: We first compare the baseline solution (BL) with the time-improved algorithm (TI) for index construction over all the datasets. The results are shown in Figure 3. TI runs much faster than BL, since BL needs to compute each subgraph from scratch. In particular, TI significantly outperforms BL on large graphs. For instance, on the Books dataset, TI is up to 42× faster.
Evaluation of index-construction time: To evaluate the performance of the different strategies, in Figure 4, we report the index construction time of TI, TI+Row, TI+Option and TI+Union. We vary the percentage of nodes selected as the input graph. As expected, more nodes lead to a higher index construction time. TI+Union is slower than the other methods, since it is the most complex method for index construction. The rank of the time costs for the three index construction methods is: RowIndex < OptionIndex < UnionIndex. Note that the largest gap in time cost is on the order of seconds, which is tolerable for many applications.
Evaluation of the index space: In this experiment, we compare the space costs of the four index-construction algorithms, i.e., TI, TI+Row, TI+Option and TI+Union. The space cost is measured by the number of nodes stored in the index. As before, we vary the percentage of nodes in each dataset. The results are shown in Figure 5. As observed, all the algorithms require more index space as the number of nodes increases. The space cost of TI+Union is much lower than that of TI, with up to 7× less space on the News dataset. As expected, TI+Union also greatly outperforms TI+Row and TI+Option, because it omits the largest number of redundant copies of nodes. The rank of the space cost for these methods is: UnionIndex < OptionIndex < RowIndex.
Effect of k in (k, ω)-core queries: To evaluate the querying performance of the proposed techniques, we report the response time of the GCore, TI+Row, TI+Option and TI+Union algorithms on the three largest datasets by varying k. The results are shown in Figure 6. As shown, the response time of each algorithm decreases as k increases. This is because the returned densely frequent community becomes smaller when the degree constraint k becomes tighter. Moreover, all the index-based query algorithms run much faster than GCore for all k values. The main reason is that the index-based algorithms pre-compute the (k, ω)-core information, so that the related nodes can be obtained quickly when querying any (k, ω)-core.
Effect of ω in (k, ω)-core queries: In Figure 7, we report the response time of GCore, TI+Row, TI+Option and TI+Union by varying ω. As ω increases, the response time decreases for all the algorithms, since the size of the detected cohesive subgraph decreases accordingly. The index-based solutions are much faster than the online solution, i.e., GCore. As shown, more complex index structures, such as TI+Union, lead to a higher computation cost. Therefore, users can make a trade-off between the querying time and the space cost when selecting an index strategy.
Discussion: According to the experimental results, the index construction time grows with the dataset size. However, the construction time of OptionIndex and UnionIndex grows much faster than that of RowIndex. This is because larger datasets usually have larger k_max and ω_max. As k_max and ω_max increase, OptionIndex and UnionIndex need more time to decide the best shrink direction for each cell in order to reduce the index space. When selecting an appropriate solution, users should therefore consider not only the index space but also the k_max and ω_max of the networks used.

Related Work
Graphs are widely used to model the complex relationships between entities [16]. Many real-life systems are modeled as bipartite graphs, a special type of graph; examples include author-paper networks [17], customer-product networks [18] and gene co-expression networks [19]. Bipartite graph analysis is of great importance and has attracted considerable attention in the literature. Guillaume et al. show that all complex networks can be viewed as bipartite structures sharing some important statistics, such as degree distributions [20]. In [21], Kannan et al. use simple Markov chains to generate labeled bipartite graphs with a given degree sequence. Borgatti et al. present and discuss ways of applying and interpreting traditional network analysis techniques to two-mode data [22].
Cohesive subgraph identification is a fundamental problem in graph analysis, and different models are proposed, such as k-core [23], k-truss [24] and clique [25]. Due to the unique properties of bipartite graphs, many studies are conducted to design and investigate the cohesive subgraph models for bipartite graphs, such as (α, β)-core, bitruss and biclique. Ahmed et al. [26] are the first to formally propose and investigate the (α, β)-core model. The authors of [3] further extend the linear k-core mining algorithm to compute the (α, β)-core. In [4], the authors combine the influence property with (α, β)-core for community detection. Considering the structure properties, Zou et al. [9] propose the bitruss model, where each edge in the community is contained in at least k butterflies. To further study the clustering ability in bipartite graphs, Flajolet et al. [27] use the ratio of the number of butterflies to the number of three paths for modeling the cohesiveness of the graph. In [28], Robins et al. resort to the (2, 2)-biclique to model the cohesion. In [10], a progressive method is proposed to speed up the computation of biclique. As we can see, the previous studies do not consider the weight factor for cohesive subgraph identification. Thus, in this paper, we propose (k, ω)-core to capture the weight property for bipartite network analysis. Even though we can extend the computation procedure of (α, β)-core for (k, ω)-core identification (i.e., online solution), it cannot handle large graphs and different parameters efficiently. Therefore, in this paper, we propose index-based solutions with different optimization strategies to deal with this issue.

Conclusions and Future Work
In this paper, we introduce a novel cohesive subgraph model, the (k, ω)-core, for weighted bipartite graph analysis. A baseline online solution is first presented by extending the method for (α, β)-core computation. To handle massive graphs and queries, index-based strategies are developed by using the nested property. To balance the query performance and space cost, three advanced index structures are further introduced. Finally, we conduct extensive experiments on real-world datasets to evaluate the performance of the proposed techniques. In future work, we will consider external algorithms or distributed solutions for (α, β)-core identification in order to support larger networks.