Online Social Space Identification. A Computational Tool for Optimizing Social Recommendations

Abstract: Conscious and functional use of online social spaces can support the elderly with mild cognitive impairment (MCI) in their daily routine, not only for systematic monitoring, but also to achieve effective targeted engagement. In this sense, although social involvement can be obtained when elders' experiences, interests, and goals are shared and accepted by the community, an important subsistence for aging depends on the compelling information, users' co-operation


Introduction
Mild cognitive impairment (MCI) is rapidly becoming one of the most common clinical manifestations affecting the elderly. It is characterized by a deterioration of memory and cognitive function beyond what is expected for age and educational level. MCI does not interfere significantly with individuals' daily activities, but it can act as a transitional stage towards dementia, with a conversion rate of 10%-15% per year [1]. Therefore, it is crucial to protect older people against MCI.
The SystEm of Nudge theory-based ICT applications for OldeR citizens (SENIOR) project (a project supported by the Cariplo Foundation according to well-being and social cohesion planned intervention strategies) will create an advanced nudge-based [2,3] social platform, collecting and integrating significant physiological and behavioral data to interact with elderly people and to provide personalized suggestions about preventive measures, social participation, and overall wellness. The challenge of SENIOR will be to move many elderly people with MCI, at a first stage, towards a conscious and functional use of new technologies by exploiting the advantages of being connected, not only to be clinically monitored, but also to remain in contact with an expert audience of users.
Proper handling of procedures and data is therefore fundamental to convert available information into a useful formulation. We address this issue from a theoretical perspective: we propose a computational problem that aims to leverage user-user and user-item relationships for identifying heterogeneous communities of users and items (we also use the term "resource" as a synonym for "item") where "expert users" (or "experts") can engage and support the elderly by suggesting available services and facilities for their conditions and social well-being (hospitals, care providers, leisure services, cultural activities). In our model, "experts" will be a sort of intermediary ("facilitators") who, thanks to their experience and ability to meet the elderly's needs (acquaintances, patients with similar history, care providers, parents), will be able to encourage social participation under therapeutic plans and objectives. In other terms, solving the proposed computational problem will allow SENIOR's engine to offer heterogeneous communities where the elderly can find optimized target "spaces of interest" for their wellness. Importantly, this "social space" will be exploited by SENIOR's future technological component: a recommender system able to strengthen and influence elderly engagement and participation in the identified communities.
Our intent in this paper is to introduce the "social space identification" problem, a computational optimization problem for target community identification, and to provide heuristic-based solutions for the considered formulation. The paper is organized as follows. In Section 2, we review the related literature. We summarize the theoretical concepts in Section 3. In Section 4, we discuss the GA-based approach to seek approximate solutions for the proposed problem. We extend the applied framework to distributed environments in Section 5. In Section 6, we discuss numerical results using synthetic data, and in Section 7, we conclude the paper by discussing future developments of our research.

Related work
Online education plays a fundamental role in the prevention and conscious participation of patients in the care process. In particular, some online social networks have been emphasized as a useful tool that acts as social support for the elderly, capable of shielding, in some circumstances, from psychological stress and encouraging participation and engagement [4-9]. Some of these platforms, such as PatientsLikeMe (http://www.patientslikeme.com), CureTogether (http://www.curetogether.com), or MedHelp (http://www.medhelp.org), have provided users with self-management of personal activities for their own diseases. For example, in [10], an auto-tracking system called "Empower" was designed to allow the self-management of patients and their treating physicians. In this case, specialists can provide recommendations and monitor patients' daily progress. Similarly, HealthNet (HN) [11] allows users to access suggestions on both doctors and health facilities that best fit the patient's clinical profile. The main component of HN is a recommender system that is able to suggest similarly profiled patients and indicate health services for a specific condition.
Moreover, understanding (or even inducing) the formation of social ties within online health networks, their structure, and functions, as well as the associated mechanisms linking these to health services, can be extremely relevant for several reasons. For example,
In this regard, one of the most convincing ideas for promoting the creation of suitable health social spaces has been the application of recommender systems (RS). Unfortunately, these mechanisms have not yet been widely used in the medical context (and in health informatics). Above all, in these fields, they still lack the fulfillment of the above issues simultaneously. To some extent, current solutions are limited to conferring direct benefits either to health professionals or to patients, rather than inducing both optimized and reliable information content spaces for targeted allocation of health services. Wiesner and Pfeifer [21] distinguished two different scenarios. On the one hand, healthcare professionals are the beneficiaries of the recommendations to find, for example, additional information for specific case studies, guidelines, or research articles. On the other hand, the beneficiaries are the patients. In the latter case, RS focus on delivering high-quality, evidence-based, health-related content to end-user patients, for example suggesting clinical examinations [22], lifestyle changes [23,24], or improving patient safety [25]. Similarly, they are also used to give patients a better understanding of their personal health status, to retrieve semantically related content, or to suggest web sites concerning specific diseases [26,27].
To the best of our knowledge, no framework exists that specifies how health-related recommender systems can guide the network creation process and, in particular, can generate networks that comply with the requirements argued previously. It is our opinion that promising new research in this direction can be conducted by applying computational methods to the study of heterogeneous information networks more widely [18,28-32]. In this context, graph theory and optimization techniques could provide future recommendation systems with the ability to influence and effectively support user communication in online health spaces. For example, some algorithms designed for this purpose, i.e., to identify specific subgraph communities such as cliques or k-clubs, can be found in [33-36]. It is worth noting that the optimization of targeting social spaces for potential recommender systems, and therefore the theoretical identification of these communities, may lead to combinatorial optimization issues, which, in turn, lead to intrinsically complex (intractable) problems [37]. Nevertheless, even in these circumstances, common approaches exist for applying specific heuristics or designing particular (machine) learning techniques that offer approximate, but efficient, solutions [38-42].

The "Social Space Identification" Problem
In some circumstances, friendship is not the only factor that allows us to build reliable suggestions for a recommendation system. Instead, for users, sharing information about available facilities and services, being involved in common goals, or even sharing their own history and experience with other users should be the key issues to consider when providing adequate recommendations. With this perspective, SENIOR's social engine will be formulated as a computational problem aimed at optimizing heterogeneous groups of people and resources by exploiting the involvement and the skills of "expert users", i.e., care providers, specialized personnel, acquaintances, and patients with similar experiences.
Without loss of generality, we can consider the case of a "clinic-oriented engagement". Suppose users are recently diagnosed (RD) patients who want to be involved in managing their diseases through the experience of other users (e.g., users being followed up for similar diseases), say "expert patients" (EX). In this situation, new users may be interested in acquiring information about, e.g., follow-up treatments, specialized hospitals, and doctors; in general terms, they should be interested in knowing as much as possible about the caregivers (here, we will also use the term health service provider or its acronym HSP) and the services they offer. In this case, the platform should motivate new patients to socialize with experts who have already been followed up, for example, in the same health centers or by other specific caregivers. It is more likely that, under these circumstances, EX patients can share reliable and convincing suggestions.
Importantly, identifying potential users and resources would provide the recommendation system with effective information, which, in turn, would allow it to induce targeted communication within the chosen group.

Problem Formulation
Modeling the situation described in the previous section naturally leads to the problem of identifying (large and dense) communities of users and items. In our case, the role of expert users is fundamental to apply the method properly. We could represent EX patients through "entities", which act as "bridges" between new patients and HSPs: their role is to connect RD patients and HSPs indirectly. In such a situation, the interest becomes that of identifying a large (and dense) community in which RD users can be at a distance of at most two from HSPs, being connected through the "experts" who can offer their support.
The problem can, therefore, be formulated as follows. Within a given network, identify the largest community in which users are connected (through paths of length two) to items through available specific users. This model leads to the theoretical concept of two-clubs (i.e., graphs with particular properties). For the sake of clarity, before defining the problem computationally, we recall some important definitions in the context of graph theory.

Main Notation and Definitions
Graphs are abstract models of the complex relationships commonly established in current networks. Formally, a graph $G = (V, E)$ is a collection of the network's entities (i.e., vertices, $V$) and the interactions among them (edges, $E$). Given a graph $G$, we use the corresponding adjacency matrix $A$, which indicates whether two vertices $v_i, v_j$ of $G$ are connected by an edge, i.e., $(A)_{i,j} = 1$ if $\{v_i, v_j\} \in E$. We also use the notation $A[i, k : j, m]$ to denote the sub-matrix of $A$ indexed by the rows $i, \dots, k$ and columns $j, \dots, m$. Moreover, we denote by $G[W]$ the subgraph of $G$ induced by a set of vertices $W \subseteq V$.
A path in $G$ is a finite (or infinite) sequence of edges that joins a sequence of distinct vertices. The distance, $d_G(u, v)$, between two distinct vertices $u$ and $v$ is the number of edges in the shortest path connecting them. The diameter of $G = (V, E)$ is defined as $\max_{u,v \in V} d_G(u, v)$, i.e., the maximum distance between any two vertices in $V$. Finally, we denote by $N(v) = \{u : \{v, u\} \in E\}$ the neighborhood of a vertex $v$. The following definitions are fundamental for our objectives.
Definition 1. Two-clubs are subgraphs where all the vertices are at a distance of at most two. More precisely, given a graph $G = (V, E)$, a two-club in $G$ is a subgraph $G[W]$, with $W \subseteq V$, whose diameter is at most two.
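As an illustration of Definition 1, the following minimal Python sketch tests whether a set of vertices induces a two-club. It is not taken from the paper's implementation: the function names, the adjacency-dictionary representation, and the toy graph are assumptions made for the example. Distances are computed by breadth-first search inside the induced subgraph only, which is what the diameter condition requires.

```python
from collections import deque

def bfs_distances(adj, source, allowed):
    """Shortest-path distances from `source`, walking only through `allowed` vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_two_club(adj, vertices):
    """True if the subgraph induced by `vertices` is connected with diameter at most two."""
    allowed = set(vertices)
    for v in allowed:
        dist = bfs_distances(adj, v, allowed)
        # A missing vertex means the induced subgraph is disconnected.
        if len(dist) < len(allowed) or max(dist.values()) > 2:
            return False
    return True

# Hypothetical toy graph: the path a-b-c-d plus the edge a-c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(is_two_club(adj, {"a", "b", "c", "d"}))
print(is_two_club(adj, {"a", "b", "d"}))
```

In the toy graph, the vertices a and d are at distance two in the whole graph, but the set {a, b, d} is not a two-club because the connecting vertex c is left out of the induced subgraph.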
Let $G$ be a graph, $V = T \cup X \cup R$ its set of vertices, and $T$, $X$, and $R$ the sets of RD patients, EX patients, and HSP centers, respectively. Consider a labeling function $h : V \to S$ that assigns to each vertex in $V$ a (type of) service in $S$. Notice that a service $h(v) \in S$ can be required by new users $v \in T$ (e.g., $v$ requires information about $h(v)$), already prescribed for some expert $v \in X$ (e.g., $v$ was followed up with $h(v)$), or available in some center $v \in R$ (i.e., health service $h(v)$ is available in $v$).
Definition 2. A vertex $x \in X$ for which there exists, in $N(x)$, at least one pair $(t, r) \in T \times R$ with $h(t) = h(x) = h(r)$ will be named a "feasible vertex". Moreover, we call the set of "feasible vertices", $C$, a "feasible set" and the pair $(t, r)$ a feasible pair.
In our example, feasible vertices are experts "equipped" with pairs $(t, r)$ consisting of a new patient $t$ and an available caregiver $r$. In other terms, when a new (target) patient $t$ requires information, our interest will be to supply $t$ with his/her "own" heterogeneous community of users/items $G[T' \cup X' \cup R']$, with $T' \subseteq T$, $X' \subseteq X$, $R' \subseteq R$, consisting of the largest number of feasible pairs. More formally, we seek a maximum-size two-club for $t$ (i.e., $t$ included), which has the further property of providing the largest number of (feasible) pairs $(t, r)$ whose vertices $t$ and $r$ are connected by at least one simple path passing through some $x \in X'$ such that $h(t) = h(x) = h(r)$, i.e., the information about service $h(t)$, requested by $t$, is available with (the caregiver) $r$. This information can be obtained by consulting $x$.
For any "target", the identification of the largest two-club (with the largest number of feasible pairs) offers multiple sources of alternative information (each node of a two-club community being reachable through paths of length at most two). In other words, "targets" can acquire greater awareness in reaching available resources by exploiting different paths and different "experts". For example, in our case, recent patients (i.e., target users) can collect information on alternative providers (HSPs) from the same expert, as well as information on the same center from different available experts. With the above concepts, we can formulate the optimization problem in the following general terms.
Problem 1.
Input: (1) A graph $G = (V, E)$, with $V = T \cup X \cup R$, where $T$ is the set of RD patients, $X$ is the set of EX patients, and $R$ is the set of HSP centers. (2) A set of services $S$ and a labeling $h : T \cup X \cup R \to S$, i.e., $h$ labels expert patients, recent patients, and HSP centers. (3) The target patient $t \in T$.
Output: A set $V' \subseteq V$ such that $t \in V'$ and $G[V']$ is a two-club having both maximum size and the largest number of feasible pairs.
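The notions of feasible vertex and feasible pair can be made concrete with a short Python sketch. The function names, the dictionary-based labeling `h`, and the toy data (service names included) are hypothetical and serve only to illustrate the definition: an expert is feasible when its neighborhood contains at least one recent patient and one provider carrying the expert's own service label.

```python
def feasible_pairs(adj, h, T, X, R, x):
    """Pairs (t, r) in N(x) with t in T, r in R and h(t) == h(x) == h(r)."""
    if x not in X:
        return set()
    targets = [t for t in adj[x] if t in T and h[t] == h[x]]
    providers = [r for r in adj[x] if r in R and h[r] == h[x]]
    return {(t, r) for t in targets for r in providers}

def feasible_set(adj, h, T, X, R):
    """Experts that own at least one feasible pair."""
    return {x for x in X if feasible_pairs(adj, h, T, X, R, x)}

# Hypothetical toy instance: two recent patients, two experts, one provider.
T, X, R = {"t1", "t2"}, {"x1", "x2"}, {"r1"}
h = {"t1": "cardio", "t2": "physio", "x1": "cardio", "x2": "physio", "r1": "cardio"}
adj = {
    "t1": {"x1"},
    "t2": {"x2"},
    "x1": {"t1", "r1"},
    "x2": {"t2", "r1"},
    "r1": {"x1", "x2"},
}
print(feasible_pairs(adj, h, T, X, R, "x1"))
print(feasible_set(adj, h, T, X, R))
```

Here the expert x2 is adjacent to a patient and a provider, but the provider offers a different service, so x2 is not feasible.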

Approximate Solutions
Problem 1 is a computational variant of the max s-club problem, which is NP-hard for $s \ge 1$ [39]. This hardness result also applies to Problem 1.
In order to find approximate solutions at a reasonable cost (time), in this paper, we designed a genetic-based heuristic by defining specific operators that allow obtaining fast solutions.

Chromosome Representation
Let $G = (V, E)$ be an input graph and $G[V']$, with $V' \subseteq V$, a given feasible solution for Problem 1. Consider a binary vector $c$ such that $c[i] = 1$ if $v_i \in V'$ and $c[i] = 0$ otherwise. In this way, we assume that $c$ is used to represent a (chromosome) solution obtained through the evolution of randomly initialized binary vectors (starting population) of dimension $|V|$. In other terms, we represent with binary vectors a population of chromosomes that state the vertices to be included within the corresponding identified community (two-club).
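The encoding can be sketched in a few lines of Python. The helper names below are hypothetical, and the uniform random initialization is an assumption about how the starting population might be drawn; the text only states that the initial binary vectors are random and have dimension |V|.

```python
import random

def decode(chromosome, vertices):
    """Vertices selected by a binary chromosome (bit i refers to vertices[i])."""
    return [v for v, bit in zip(vertices, chromosome) if bit == 1]

def random_population(n_vertices, size, rng):
    """Starting population: `size` random binary vectors of length `n_vertices`."""
    return [[rng.randint(0, 1) for _ in range(n_vertices)] for _ in range(size)]

vertices = ["a", "b", "c", "d"]
print(decode([1, 0, 1, 1], vertices))  # ['a', 'c', 'd']

population = random_population(len(vertices), size=6, rng=random.Random(0))
print(len(population), len(population[0]))  # 6 4
```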

Fitness Function
Genetic algorithms (GAs) promote system adaptation in such a way that candidate chromosomes (in our case, being able to represent graphs with a diameter value no larger than two) "evolve" through standard elitism. The evaluation of a chromosome is then obtained by applying a fitness function [43].
Here, we give a general formulation for the fitness we apply in the following paragraphs.
where the value of $\alpha$ will be functionally related either to (1) the health service and vertex-type distributions around the neighborhoods of vertices in $V[c]$ or (2) the number of feasible pairs in the neighborhoods of vertices in $V[c]$. Therefore, the objective of Equation (1) is to promote, differently depending on the role of $\alpha$ (i.e., either Case 1 or Case 2), large-sized sub-networks with correct diameter values (i.e., by weighting with $\alpha$ the number of vertices, $|V[c]|$). In more detail, $\alpha$ will be evaluated as follows.

Case 1
In this case, $\alpha$ is functionally related to the health service and vertex-type distributions around the neighborhoods of vertices in $V[c]$. In particular:
1. On the one hand, Equation (1) promotes chromosomes that identify, around feasible vertices, different (types of) services operated by HSPs and different (types of) services required by targets, i.e., we promote uniform distributions of the supply and demand of services.
2. On the other hand, we promote chromosomes that identify, around feasible vertices, the participation of different types of entities (here, we focus on two types of entities only: HSPs and target users), i.e., we promote uniform user-/resource-type distributions around feasible vertices.
Notice that the above objectives are reflected, in turn, by the entropy of the corresponding distributions, i.e., a "distribution of entities", say $f_Y$, obtained by counting the different types of entities in $N(v)$ for each $v \in V[c]$, and a "distribution of services" (supplied or demanded by entities), say $f_S$, obtained by counting the different types of services in $N(v)$ for each $v \in V[c]$. (These estimations are clearly biased due to the "relationships" between neighborhoods. In fact, here, we are assuming independence between the distributions of vertices in different neighborhoods.)
As the entropy $H(f_S)$ (respectively, $H(f_Y)$) reaches its maximum when all the outcomes are equally likely, in this case the fitness in Equation (1) should promote those chromosomes that represent uniform service-/entity-type distributions around EX patients. Hence, following the above arguments, we set

Case 2
In this case, $\alpha$ is functionally related to the number of feasible pairs around the neighborhoods of vertices in $V[c]$. Given a chromosome $c$ and the set of induced vertices $V[c]$, we sample, for each feasible vertex $v \in V[c]$, a set (of observations) $obs \subseteq N(v)$. In this way, we obtain a fast computational estimation, $n_p$, by counting the number of all feasible pairs within the observed samples. Finally, for this case, we set $\alpha = n_p$.
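A possible reading of this sampling estimate is sketched below in Python. Several details are assumptions rather than statements from the text: the sample size is a free parameter, neighbors are restricted to the vertices selected by the chromosome, and sampling is uniform without replacement. All names and the toy data are hypothetical.

```python
import random

def estimate_feasible_pairs(adj, h, T, X, R, selected, sample_size, rng):
    """Estimate n_p by counting feasible pairs inside sampled neighborhoods."""
    selected = set(selected)
    n_p = 0
    for x in sorted(selected & X):
        # Assumption: only neighbors that belong to the chromosome are observed.
        neighbours = sorted(adj[x] & selected)
        obs = rng.sample(neighbours, min(sample_size, len(neighbours)))
        targets = sum(1 for u in obs if u in T and h[u] == h[x])
        providers = sum(1 for u in obs if u in R and h[u] == h[x])
        n_p += targets * providers  # every matching (t, r) combination is a pair
    return n_p

# Hypothetical toy instance: one expert with two matching patients and one matching provider.
T, X, R = {"t1", "t2"}, {"x1"}, {"r1", "r2"}
h = {"t1": "cardio", "t2": "cardio", "x1": "cardio", "r1": "cardio", "r2": "physio"}
adj = {
    "x1": {"t1", "t2", "r1", "r2"},
    "t1": {"x1"}, "t2": {"x1"}, "r1": {"x1"}, "r2": {"x1"},
}
everything = set(adj)
exact = estimate_feasible_pairs(adj, h, T, X, R, everything, 10, random.Random(0))
rough = estimate_feasible_pairs(adj, h, T, X, R, everything, 2, random.Random(0))
print(exact, rough)
```

When the sample covers the whole neighborhood, the estimate coincides with the exact count; smaller samples trade accuracy for speed and can only undercount.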

Mutation
The objective of mutation is to encourage system adaptation.The following three genetic operators are applied (with equal probability).

• Mutation Operator 1: Let $c$ be a chromosome observed during some step of the evolution process. Assume that, at such a step, $G[c]$, the current hypothesis conjectured by $c$, is not feasible for Problem 1. Moreover, let $G[V^+]$ be the subgraph induced by $V^+ = \{v_i : c[i] = 1\}$. In order to steer the system towards feasible two-clubs while removing vertices sparingly, we randomly sampled the vertices $v_i \in V^+$, and for each pair $\{v_i, v\}$, $v \in V^+ \setminus \{v_i\}$, we checked whether the distance between $v_i$ and $v$ was $\le 2$. If this test was negative, then $c[i]$ was flipped to zero, thus orienting the system towards feasibility.
• Mutation Operator 2: Through this operator, we aimed to increase the size of a feasible solution. Let $V^- = \{v_j : c[j] = 0\}$ and $G[V^+]$ be defined as above. We randomly sampled $v_j$ from $V^-$ and then checked whether the shortest distance of $v_j$ from the vertices of $V^+$ was $\le 2$. If this test was positive, then $c[j]$ was flipped to one.
• Mutation Operator 3: In this case, a standard mutation procedure was applied: bits were randomly switched either "on" or "off".
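The three operators can be sketched in Python as follows. This is an illustrative reading, not the authors' code. The function names are hypothetical; distances are measured inside the subgraph induced by the selected vertices; Operator 1 switches off at most one violating vertex per call; and Operator 2 is read as switching a vertex on only when the distance test succeeds, since that is the case in which a feasible two-club stays feasible.

```python
import random

def within_two(adj, allowed, u):
    """Vertices of `allowed` reachable from u in at most two steps inside `allowed`."""
    first = adj[u] & allowed
    reach = {u} | first
    for w in first:
        reach |= adj[w] & allowed
    return reach

def repair_mutation(c, vertices, adj, rng):
    """Operator 1 (sketch): switch off one sampled vertex that fails the distance-two test."""
    selected = {v for v, bit in zip(vertices, c) if bit}
    order = [i for i, bit in enumerate(c) if bit]
    rng.shuffle(order)
    child = list(c)
    for i in order:
        if not selected <= within_two(adj, selected, vertices[i]):
            child[i] = 0
            break
    return child

def grow_mutation(c, vertices, adj, rng):
    """Operator 2 (sketch): switch on a sampled vertex if it is within distance two of V+."""
    selected = {v for v, bit in zip(vertices, c) if bit}
    off = [i for i, bit in enumerate(c) if not bit]
    child = list(c)
    if off:
        j = rng.choice(off)
        candidate = selected | {vertices[j]}
        if selected <= within_two(adj, candidate, vertices[j]):
            child[j] = 1
    return child

def flip_mutation(c, rate, rng):
    """Operator 3 (sketch): flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in c]

# Hypothetical toy graph: the path a-b-c-d.
vertices = ["a", "b", "c", "d"]
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
rng = random.Random(0)
print(repair_mutation([1, 1, 1, 1], vertices, adj, rng))
print(grow_mutation([0, 1, 1, 0], vertices, adj, rng))
print(flip_mutation([1, 0, 1, 0], 1.0, rng))
```

On the path a-b-c-d, the full vertex set has diameter three, so the repair operator drops one of the two end vertices, while the growth operator can safely extend {b, c} by either end.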

Cross-Over
The following cross-over operators were designed.

• Logical AND/OR cross-over: New offspring were generated by applying AND/OR logic operations to the parent chromosomes.
• Standard cross-over: This operation is typical and is often used in standard applications of genetic algorithms. Segments of the parent chromosomes are copied and mixed in the new descendants.
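Both operators can be sketched briefly in Python. The function names are hypothetical, and the standard operator is shown in its one-point form, which is an assumption: the text does not say which standard variant was used.

```python
import random

def and_or_crossover(p1, p2):
    """Two children: the bitwise AND and the bitwise OR of the parents."""
    child_and = [a & b for a, b in zip(p1, p2)]
    child_or = [a | b for a, b in zip(p1, p2)]
    return child_and, child_or

def one_point_crossover(p1, p2, rng):
    """One-point cross-over (assumed variant); parents need at least two genes."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

p1, p2 = [1, 1, 0, 0], [1, 0, 1, 0]
print(and_or_crossover(p1, p2))  # ([1, 0, 0, 0], [1, 1, 1, 0])
print(one_point_crossover(p1, p2, random.Random(0)))
```

In terms of communities, the AND child keeps only the vertices on which both parents agree, whereas the OR child is the union of the two candidate communities.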

Distributed Learning
The large amount of data observed in online social interactions and the intrinsic complexity of the proposed problem make it difficult to solve the optimization target computationally. An effective distribution of the learning task is therefore desirable.
The idea of "distributed learning" in GAs has been explored with many motivations [44]. Most of these approaches share the concept of evolving independent (genetic) populations in parallel. Similarly, our goal in this paper is task-motivated: we framed the optimization Problem 1 into independent, smaller local optimization tasks, while returning (local) solutions that could be associated with specific subgraphs.

A Genetic Cascade Model
Distributed learning was conceived following the process in Figure 1. Large-scale optimization was divided into smaller problems, where different populations evolved over different (computational) "sites" independently. Each "site" was locally "trained" with the GA described in Section 4, using only a part of the input instances. Using this load distribution, each (first-level) site processes its own solution for the graph adjacency (sub)matrix $A[i, i+k : i, i+k]$, whose rows and columns are associated, respectively, with the corresponding sequence of $k$ (input data) elements $(B_j)_{j \in \{i, \dots, i+k\}}$.
Given two (subsequent) first-level sites $s_{1,i}$ and $s_{1,i+1}$, their input workloads $\{b_1, b_2, \dots, b_k\}$ and $\{b_{k+1}, b_{k+2}, \dots, b_{2k}\}$, and the respective GA solutions obtained in $s_{1,i}$ and $s_{1,i+1}$, we can insert the best chromosomes (from $s_{1,i}$ and $s_{1,i+1}$, respectively) within the population of a new processing unit (say, site $s_{2,k}$, of Level 2). In this way, the new chromosome population in $s_{2,k}$ is properly initialized to return the local solution for the corresponding input subgraph. Indeed, by extending the chromosome representation reported in Section 4.1, we can train the GA employed in $s_{2,k}$ on the set of elements $\{b_1, b_2, \dots, b_{2k}\}$, thus providing feasible solutions for the extended set of input instances, which, in turn, represent a larger local solution whose graph adjacency matrix is $A[1, 2k : 1, 2k]$. Finally, by extending the process to the lower-level site, we complete the evolution of all input data. The best solution provided by the last site identifies, in this way, the community (i.e., the two-club) on the whole input network.
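The cascade can be outlined with the following Python sketch. It only illustrates the flow of information between levels and makes several assumptions that are not stated in the text: the best chromosome of each parent site is extended with zeros over the other parent's block before seeding the child site, an unpaired site is carried over to the next level, and `toy_evolve` is a hypothetical stand-in for a real GA run. The first-level calls are executed sequentially here, although they are independent and could be mapped onto separate processes.

```python
def cascade(blocks, evolve):
    """Run the cascade over a non-empty list of vertex blocks.

    evolve(vertices, seeds) must return the best chromosome (list of 0/1)
    found for `vertices`, given a list of seed chromosomes.
    """
    level = [(block, evolve(block, [])) for block in blocks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            (v1, c1), (v2, c2) = level[i], level[i + 1]
            merged = v1 + v2
            # Assumption: each parent's best chromosome is padded with zeros.
            seeds = [c1 + [0] * len(v2), [0] * len(v1) + c2]
            next_level.append((merged, evolve(merged, seeds)))
        if len(level) % 2:  # an unpaired site moves down unchanged
            next_level.append(level[-1])
        level = next_level
    return level[0]

def toy_evolve(vertices, seeds):
    """Hypothetical stand-in for a GA run: keeps whatever the seeds select."""
    if not seeds:
        return [1] + [0] * (len(vertices) - 1)
    best = [0] * len(vertices)
    for seed in seeds:
        best = [a | b for a, b in zip(best, seed)]
    return best

blocks = [["b1", "b2"], ["b3", "b4"], ["b5", "b6"], ["b7", "b8"]]
final_vertices, final_chromosome = cascade(blocks, toy_evolve)
print(final_vertices)
print(final_chromosome)
```

With four first-level sites, the sketch builds two second-level sites and one final site, mirroring the pairwise merging described above.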

Numerical Experiments
The numerical experiments mainly aimed at evaluating the ability of the GA-based approach to obtain correct solutions in a reasonable time. In order to promote large-sized sub-networks, we applied the two cases of fitness introduced in Section 4.2. In particular, we applied Case 1 (distribution-based) for a standard GA evolution (reported as "centralized" in the following discussion) and Case 2 (feasible pairs estimation) for the distributed learning.

Centralized Learning
As we could not compare the solution of the GAs with the problem's optimal solution (NP-hardness), we evaluated the possibility of identifying communities where the services operated by health centers and the services required by users were best distributed around feasible vertices.
As discussed in Section 4.2, we encouraged, on the one hand, the offer of a large number of services versus a differentiated request for services and, on the other, the participation of different types of entities around feasible users (uniform distributions).

• We considered the average value of the entropy for the health service and vertex-type distributions around expert patients. It is worth emphasizing that the system returned solutions whose entropy could not be associated with distributions having the whole probability mass centered on any specific value. Notice that, since only RD patients and HSPs could be found around EX users, we had $0 \le H(f_Y) \le 1$. Similarly, as service labels took values in $\{1, \dots, 5\}$, i.e., five types of services were considered in our experiments, $H(f_S)$ was such that $0 \le H(f_S) \le \log_2(5)$.
• Moreover, notice that at least 12% of the output nodes were identified as feasible. Larger values are reported for large input networks, e.g., ER(300, 0.15) or ER(150, 0.3). This was a compelling property, for example, in the case of large communities.
• System time seemed reasonable ($T2 \le 7.2$ s) for the applied instances.
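The entropy bounds quoted above follow from the number of possible outcomes: two entity types give at most one bit, and five service types give at most $\log_2(5)$ bits. A minimal Python sketch (the function name and the toy label lists are hypothetical) makes this concrete:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Entity types around an expert: only recent patients (T) and providers (R).
print(entropy(["T", "R", "T", "R"]))   # 1.0, the maximum for two types
print(entropy(["T", "T", "T", "T"]))   # zero: all mass on one type

# Five equally frequent service types reach the upper bound log2(5).
print(entropy([1, 2, 3, 4, 5]), math.log2(5))
```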

Distributed Learning
In this case, fitness promoted the identification of large networks by weighting the chromosomes with the estimation of the number of feasible pairs (Case 2 fitness) observed in the corresponding chromosome representation. We coded the procedures with the "Multiprocessing" Python package, working on local concurrency with four cores (i.e., the cascade model used four starting sites). The R and Python interfaces were managed by the rpy2 (https://rpy2.readthedocs.io/en/version_2.8.x) utility. Experiments were executed on Apple OSX 10.12.6, system type MacBook Pro Retina, processor Intel(R) Core(TM) i5-6360U 2.00 GHz, 3.100 GHz, 2 core(s), 4 logical processors; installed physical memory (RAM) 8.00 GB. Results are given in Tables 2 and 3. In particular, we compared the results with "centralized" (not distributed) executions. The following attributes were used.

• Ratio between the number of input vertices and the number of vertices within the two-club represented by the chromosome solution.
• Average CPU user time: for a single processing unit, i.e., standard GA evolution, this quantity corresponded to the (GA) execution time; when applying distributed learning, the whole execution time was averaged over the number of (framework) levels.
• Early stopping: the number of consecutive generations without improvement of the fitness value; the GA execution was stopped after the "early stopping".
• Max number of generations: the maximum number of iterations before the GA search was halted.
• Final generation number: the iteration number associated with the final solution. Notice that, in our case (i.e., distributed evolution), this number corresponded to the iterations of the lowest site.
The following observations can be given.
• All models, except ER(500, 0.1), returned correct two-clubs.This was an interesting result when large input graphs (e.g., more than 1000 vertices in our table) were considered.
• Similarly to the previous experiments, as we could not compare against the problem's optimal solution, we evaluated the solution proposed by the genetic algorithm qualitatively by reporting the ratio between the number of vertices of the input and the output graphs. Notice that the problem is also hard to approximate (it is not approximable within a factor $|V|^{1/2-\varepsilon}$, for each $\varepsilon > 0$ [47]).
Considering our results, in particular the ratio in Table 3, we concluded that the most interesting solutions were those for which the larger the input graph (number of vertices), the lower the difference between the ratio and the unit.

• Although the size of the identified communities did not seem to differ, the (average) number of iterations of the last-level site (distributed execution) was much lower than the iterations reported by the standard, centralized evolution. This was also evident from the decrease in the average execution time per level, reported by the distributed process. Since the GAs use the same parameters for the standard and the distributed evolution, this behavior could be traced back to the initialization that each lower-level site received from the higher levels.
• The computational cost seemed to depend on both the number of edges (i.e., high expected connectivity of the random models ER(n, p)) and the number of input vertices. As the computational complexity of the fitness was related to the diameter computation, this assertion could be justified considering the computational cost of the diameter, which is known to be bounded by $O(|V|^3)$.

Conclusions
The challenge set by the SENIOR project is to involve the elderly with mild cognitive impairment in target communities where the shared information and the available resources can be exploited as well as possible by users and caregivers. The SENIOR social engine will be designed as a specific recommender system able to leverage online information to influence elderly participation and to promote resources for their well-being. As recognized above, to the best of our knowledge, no framework exists that specifies how health-related recommender systems can promote this "network creation process". In this regard, the expected value of the proposed ideas is to promote the creation of online "health spaces" by enhancing trust-based communication between patients and "experts" for suitable resource allocation. In this model, the "experts" not only play an essential role for patients in the real world, but also represent mediation nodes for suggesting "items" (i.e., resources) accurately. The recommender system should, therefore, influence the development of the social network with appropriate suggestions.
In conclusion, we can characterize the contributions of this article as follows.

• We formulated a computational optimization problem for the future suggestions of the SENIOR recommender system.
• Algorithmic solutions were proposed as well, based on evolutionary heuristics, both for centralized and parallel processing environments.
Further extension of our research will be addressed as follows.

• As reported in recent studies, using additional user and item relationships could improve the recommendation quality [48,49]. In this perspective, the optimization objective formulated in Equation (1) will be further constrained by taking into account relationships between target users. These relationships could be interpretable as either "friendships between target users" or "targets sharing similar experiences". The feasible solutions (i.e., two-clubs) optimized with such a type of connection would allow the future recommendation system to induce even more compelling communication among users, perhaps looking for similar reliable resources. Furthermore, edges among "experts" (within the identified two-clubs) could be used as an endorsement, thus allowing target users to evaluate new different experiences or the consistency of the information of other expert users. Such links could also induce more effective communication between experts to facilitate final patient support. Similarly, a constrained optimization accounting for "resource to resource" relationships, within the identified community, could offer the user equivalent available resources.
• In Section 4.2, we presented a general fitness formulation for two different evaluations of a free parameter ($\alpha$). A more context-based formulation able to manage real needs will certainly provide more effective results in this sense. As an example, we can consider the first reported case (Case 1), where $\alpha$ is related to the health service and vertex-type distributions. In this situation, a linear combination of entropies is used to optimize (and balance) an induced two-club community. Direct comparisons (e.g., using cross-entropy evaluations) of user requirement distribution vs. service distribution could be similarly evaluated for future numerical experiments.
• Finally, we assumed so far that each provider could supply one service only. In fact, this was "coded" by the labeling function $h$ discussed in Section 3.2. It is straightforward to extend this labeling to multiple services for a more realistic implementation.
With a slight abuse of notation, we indicate with G[c] the subgraph of G induced by c and with V[c] and E[c] the corresponding sets of vertices and edges, respectively, i.e., G[c] = (V[c], E[c]).

Figure 1. Distributed process. Different populations evolve over different (computational) "sites" independently. Each "site" is locally trained with the GA, and the best chromosomes from the pair of "ascendents" initialize the lower-level site population.
Without loss of generality, we can assume that the sets of expert patients $X$, recent patients $T$, and health providers $R$ form an indexed collection of elements $(B_i)_{i \in [n]}$, $B = T \cup X \cup R$, whose indices $[n]$ (to simplify the notation, here we use $[n]$ as a shortcut for $\{1, 2, \dots, n\}$) also index the rows and columns of the graph adjacency matrix $A[1, n : 1, n]$ of $G[B]$. Moreover, let us assume that the computational load is distributed on a first sequence of different units (say, sites of Level 1) in such a way that each unit of this sequence (say, site $i$ of Level 1, $s_{1,i}$) executes the GA only on a part of the available user and item data in $B$. In particular, we distribute to $s_{1,i}$ only a sub-sequence of $k$ input elements $\{b_1, b_2, \dots, b_k\}$, where $b_i$ is either a user or an item in $B$. For example, site A1 = $s_{1,1}$ in Figure 2 works on the sequence consisting of the first five elements $\{b_1, b_2, \dots, b_5\}$. Similarly, the second site A2 = $s_{1,2}$ employs the successive elements $\{b_6, b_7, \dots, b_{10}\}$.

Figure 2.
Figure 2 displays the situation when B1's population (initialized by the best chromosomes from its ascendents A1 and A2) is trained to provide solutions for $A[1, 10 : 1, 10]$.

Table 3. Models (Erdős-Rényi), best fitness (Fit), iteration (Iter), average CPU time (CPU Av. time), and the ratio between input and output vertices. The standard deviation is shown in brackets.