SoftRec: Multi-Relationship Fused Software Developer Recommendation

Abstract: Collaboration efficiency is of primary importance in software development. It is widely recognized that choosing suitable developers is an efficient and effective practice for improving the efficiency of software development and collaboration. Recommending suitable developers is complex and time-consuming due to the difficulty of learning developers’ expertise and willingness. Existing works focus on learning developers’ expertise and interactions from their explicit historical information and matching them to specific tasks. However, such procedures may suffer from low accuracy because they ignore implicit information, such as (1) developer–developer collaboration relationships, (2) developer–task implicit interaction relationships, and (3) task–task association relationships. To that end, this paper proposes a multi-relationship fused approach for software developer recommendation (termed SoftRec). First, in addition to explicit developer–task interactions, it considers multivariate implicit relationships, including the three types mentioned above. Second, it integrates these relationships based on joint matrix factorization and generates forecast results using a deep neural network architecture. Furthermore, we propose a fast update method to address the cold start issue by making online recommendations for new developers and new tasks. Extensive experiments are conducted on two real-world datasets, and a user study is conducted in a well-known software company. The results demonstrate that SoftRec outperforms four state-of-the-art works.

They usually adopt an open-collaboration mode [1,2] with a voluntary or competitive mechanism to facilitate self-organizing collaborative software development, in which productivity and quality greatly depend on the efficiency of developers' collaboration.
For example, in the process of modern code review (MCR) [3], a developer usually submits a code change to a code review system (e.g., Gerrit (https://www.gerritcodereview.com/), Rietveld (https://en.wikipedia.org/wiki/Rietveld_(software)), Crucible (https://www.atlassian.com/software/crucible)) and recommends a set of suitable developers to review the change. The reviewers then check the change and give suggestions. Next, the developer refines the change according to these suggestions and commits it to the main branch of the version control system once the reviewers approve it. For some large open source projects, hundreds of contributors may attempt to change the code, adding new features or fixing bugs every day. These changes are submitted as pull-requests [4,5] subject to review by the whole community. However, such a mechanism makes it very difficult to find suitable developers as reviewers. For example, Thongtanunam et al. found that 4%-30% of the reviews suffer from a code-reviewer assignment problem and that these reviews require an average of 12 extra days to complete [6]; in addition, more than 15% of the complaints were filed about delays in completing pull requests [7]. Similar problems are encountered in StackOverflow, one of the most successful question and answering (Q&A) open-collaboration platforms. It has over 9 million users who have posted more than 16 million questions and answers on programming. However, only about 70% of those questions are answered and closed in time, and millions of questions have yet to be answered properly [1]. As a result, an automatic developer recommendation approach is urgently needed for identifying suitable developers to improve developers' collaboration efficiency.
In recent years, many efforts have been made on developer recommendation. Typical approaches include expertise-based developer recommendation and collaborative filtering (CF)-based developer recommendation. Usually, the expertise-based approaches [2,8] recommend developers based on the ratings of their explicit expertise (e.g., skills, contributions, activeness, workload). The CF-based approaches [4,9,10] make recommendations based on the assumption that developers with similar historic behaviors or social relationships would behave similarly on future tasks. For example, some studies [11,12] integrate explicit social information into the recommender system and recommend developers based on their friendships.
However, due to the difficulty of matching developers' expertise and willingness, together with the sparsity of developer-task explicit interactions, previous approaches have not achieved ideal results. In practice, there are many implicit relationships between developers and tasks that can be leveraged to help find suitable developers. Take GitHub's code review as an example, as shown in Figure 1a-c, where $u_i$ denotes a developer, $p_i$ denotes a pull-request, dots with blank cells denote the developers' explicit reviews (or comments) on pull-requests, and dots with grey cells denote the developers' implicit interactions (e.g., star, browse). Since $u_2$ has an implicit interaction with $p_6$, we can infer that $u_2$ has a potential preference for $p_6$ (denoted as $u_2 \to p_6$). Since $u_1$ has a close collaboration relationship with $u_2$ ($u_1 \leftrightarrow u_2$), we can infer the potential relationship $u_1 \to p_6$ (red circles in Figure 1a). Similarly, since $u_1$ has a preference for $p_1$ ($u_1 \to p_1$), and $p_1$ has an association relationship with $p_4$ ($p_1 \to p_4$), we can infer the potential relationship $u_1 \to p_4$. By analyzing GitHub's code review recommendation, we find that developers are more likely to accept a recommended pull-request if (1) they have reviewed code or solved issues of the contributors of the same pull-request (i.e., the collaboration relationships between developers can keep continuity); (2) they have browsed or starred (besides explicitly reviewed) the pull-request (i.e., implicit interaction relationships help improve recommendation accuracy); or (3) they have reviewed a pull-request similar to the recommended one. Such information can be exploited to improve the accuracy of developer recommendation.
To leverage the above mentioned relationships between developers and tasks, we develop a novel multi-relationship fused approach for software developer recommendation (SoftRec). The major contributions of SoftRec are:

• We formally define the multivariate relationships between developers and tasks, including three types: (1) developer-developer collaboration relationships, (2) developer-task interaction relationships, and (3) task-task association relationships.
• We propose a multi-relationship fused approach to recommend developers based on joint matrix factorization and generate forecast results using a deep neural network architecture. To the best of our knowledge, this is the first attempt to integrate these three implicit relationships into developer recommendation.
• We propose a fast approach to update the changes of the model efficiently and improve the recommendation efficiency, which can address the cold start issue.
• To evaluate the effectiveness of SoftRec, we conduct experiments on two real-world datasets: one from GitHub with 2517 developers and 9329 tasks, and the other from a well-known company's GitLab with 590 developers and 15,632 tasks. We also conduct a user study in this company. Comparisons with four state-of-the-art works demonstrate the advantages of SoftRec.
The remainder of this paper is organized as follows. We review the related work in Section 2. In Section 3, we present the proposed multi-relationship fused software developer recommendation framework. In Section 4, we present and discuss the result of experiments on real-world datasets and conduct a user study. In Section 5, we discuss the threats to validity of our approach. Concluding remarks with a discussion of some future work are given in Section 6.

Related Work
Among previous approaches, collaborative filtering (CF) [13-17] is the most typical approach and has achieved great success. CF-based developer recommendations share a similar process: they first calculate the similarity between the given task and other resolved tasks based on the explicit interaction relationships. Then, the similarity ratings between the given task and each resolved task are assigned to the developers of the corresponding resolved tasks. This way, each developer has a rating that indicates their expertise with respect to the given task. However, CF-based approaches usually suffer from serious sparsity of the developer-task explicit interactions and from the cold start issue. For example, in our analysis of GitHub datasets, the density of the developer-task explicit interaction matrix is as low as 0.1351%, which greatly limits the effectiveness of the recommendations.
To address these limitations, previous work incorporated various side information into CF [3,4,18-28]. For example, Zheng et al. [4] proposed a CF-based approach (PR-CF) that generates the latent factor models based on the explicit interaction matrix, and then combines the latent factor models with the tasks' neighborhoods. Jiang et al. [19] constructed an explicit social relationship network on GitHub, and then proposed a CF-based recommendation approach based on Co-cluster for developers and tasks. Ma et al. [18] proposed a factor analysis approach called SoRec, which is based on joint matrix factorization. It integrates the social information into the rating matrix and shares the users' latent vectors in both the recommender system and the social network. Yu et al. [20] proposed an approach named IR+CN that mines social relationships from historical comments to recommend reviewers. Xia et al. [3] proposed a hybrid approach (TIE) which utilizes text mining and file location to find similar tasks and recommends developers based on conditional probabilities. Bosu et al. [22,23] analyzed the characteristics of different kinds of social interaction networks between developers and their influence on the impression of developers. They found that the interactions between developers and tasks can help form an accurate perception of expertise.

For SoRec [18], the log-joint posterior probability distribution is given in Equation (1), where C and R denote the social relationship and rating matrices, and Q, U, and V denote the latent social, user, and item feature matrices, respectively. C, R, U, V, and Q follow Gaussian distributions with means of 0 and variances of $\sigma_C^2$, $\sigma_R^2$, $\sigma_U^2$, $\sigma_V^2$ and $\sigma_Q^2$, respectively. $I^R$ and $I^C$ denote the indicator matrices, $\odot$ denotes the Hadamard product, $g(x)$ is the logistic function, and Reg is the regularization term. Parameters U, V, and Q can be learned by maximizing Equation (1).
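For reference, one way to write a SoRec-style log-joint posterior that is consistent with the symbols listed above is sketched below. The exact layout of Equation (1) is not recoverable from the text, so the matrix form, the factorization $C \approx g(Q^{T}U)$, and the grouping of the norm penalties into Reg are assumptions made for illustration.

```latex
% Illustrative SoRec-style log-joint posterior (assumed layout, not the authors' exact Equation (1))
\ln p\left(U, V, Q \mid C, R, \sigma_C^2, \sigma_R^2, \sigma_U^2, \sigma_V^2, \sigma_Q^2\right)
  = -\frac{1}{2\sigma_R^2} \left\| I^R \odot \left(R - g(U^{T} V)\right) \right\|_F^2
    -\frac{1}{2\sigma_C^2} \left\| I^C \odot \left(C - g(Q^{T} U)\right) \right\|_F^2
    - \mathrm{Reg} + \mathrm{const},
\qquad
\mathrm{Reg} = \frac{\|U\|_F^2}{2\sigma_U^2} + \frac{\|V\|_F^2}{2\sigma_V^2} + \frac{\|Q\|_F^2}{2\sigma_Q^2}
```

Maximizing this expression over U, V and Q is equivalent to minimizing the two masked squared-error terms plus the regularizer.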
The common limitation of CF-based developer recommendation approaches is that they usually suffer from low accuracy caused by the sparsity of explicit developer-task interactions.
In recent years, deep learning approaches have been widely used to build recommender systems in many fields [13,17,29-32]. For example, He et al. [13] proposed a neural network architecture to model the latent features of users and items and devised a general framework (NeuCF) for collaborative filtering. Wang et al. [17] proposed a graph collaborative filtering approach based on graph neural networks, which explicitly encodes the user-item interaction relationships in the form of high-order connectivities with embedding propagation. Xue et al. [29] proposed a deep matrix factorization model (DMF) with a neural network that maps the users and items into a common low-dimensional space with non-linear projections. He et al. [31] proposed a deep learning model (NFM) that unifies the strengths of factorization machines and deep neural networks for sparse rating modelling.
Despite their prevalence and effectiveness, we argue that previous approaches with side information are insufficient, since they only consider the explicit collaborative similarity (e.g., explicit interactions), which lacks rich semantics (e.g., various implicit relationships). In real-world applications, there are more implicit relationships between developers and tasks (e.g., those illustrated in Figure 1), which particularly help in understanding developers' behaviors and preferences. In this paper, we integrate both explicit and implicit relationships among developers and tasks to make accurate developer recommendations.

The Multi-Relationship Fused Software Developer Recommendation
In this section, we propose SoftRec, a novel multi-relationship fused approach for developer recommendation. Its overall recommendation process is shown in Figure 2 and consists of three parts: (1) the input is the multi-relationships, for which we define the collaboration relationship matrix C, the interaction relationship matrix R and the association relationship matrix S in steps 1, 2 and 3, respectively; (2) the fusion of multi-relationships in step 4, in which we share the common developer latent vector $U_i$ in both C and R as well as the common task latent vector $V_j$ in both R and S based on joint matrix factorization; and (3) the output is the developer prediction in step 5, in which we project the vectors $U_i$ and $V_j$ into a latent structured space using a deep neural network architecture. Furthermore, we propose a fast model update approach for SoftRec.

Definition of Multi-Relationships
In this section, we first formally define the developer-developer collaboration relationship, developer-task interaction relationship and task-task association relationship illustrated in Figure 1.

Developer-Developer Collaboration Relationship
Recent research [33] at Google shows that code review usually depends on close working relationships between authors and reviewers. Through a study of Neusoft Corporation's R&D teams (Neusoft Corporation (https://www.neusoft.com/) is the largest software service provider in China, with more than 18,000 developers and tens of thousands of commercial customers all over the world), we also found that developers usually prefer reviewers that have close collaboration relationships with them. Therefore, collaboration relationships can be leveraged to improve developer recommendation accuracy. In this paper, we categorize the collaboration relationship into explicit collaboration and implicit collaboration. The former refers to direct interaction between developers and the latter refers to indirect collaboration between developers.
Explicit Collaboration. Suppose $O$ denotes a set of interaction objects between $u$ and $u'$, and $A_o$ denotes a set of actions performed on object $o \in O$. The explicit collaboration relationship $c_{exp}(u, u')$ is formalized in Equation (2), where $s$ denotes the number of interactions for interactive object $o$ and action $a$, and $\omega \in [0, 1]$ is the decay factor used to weaken the influence of multiple interactions between the same developers. $\pi_1(u, u', o, a, i) = \frac{interactTime(u, u', o, a, i) - beginTime}{endTime - beginTime}$ is used to reflect the influence of time locality: it supposes that a recent interaction is more important than one that occurred a long time ago. $interactTime(u, u', o, a, i)$ denotes the $i$-th occurrence time of the action $a$ between $u$ and $u'$ on object $o$, where $i = 1$ denotes the first interaction; beginTime and endTime denote the earliest and latest occurrence times of all interactions in the datasets.

Example 1. Suppose $a_1 = a_2 = comment$, and the numbers of times $a_1$ and $a_2$ are performed are $s_{a_1} = 1$ and $s_{a_2} = 2$. Suppose the occurrence times of $a_1$ and $a_2$ are $t_{a_{11}}$ and $t_{a_{21}}$, and beginTime and endTime are $t_b$ and $t_e$, respectively. We can then calculate $c_{exp}(u, u')$ from Equation (2).

Implicit Collaboration. Suppose $O$ denotes a set of co-occurrence objects (e.g., organization, project, team) between $u$ and $u'$. The implicit collaboration relationship is formalized as:

$$c_{imp}(u, u') = \frac{\sum_{o \in O} I(u, u', o)}{\sum_{o \in O} I(u, u', o) + \psi} \qquad (3)$$

where $I(\cdot)$ is the indicator function: if $u$ and $u'$ co-occur in $o$, its value is 1; otherwise, its value is 0. $\psi$ is a constant used to regularize the results; similar to its use in [4], its default value is 100.

Example 2. Suppose $u$ and $u'$ participated in the same project $p$ and, in addition, belong to the same technical organization $org$. We get $O = \{p, org\}$ and $c_{imp}(u, u') = \frac{1+1}{1+1+100}$.

Based on Equations (2) and (3), we formalize the final collaboration relationship between $u$ and $u'$ in Equation (4), where $\alpha \in [0, 1]$ is a weight used to balance the explicit and implicit collaboration. Based on Equations (2) to (4), we can obtain the explicit collaboration matrix, the implicit collaboration matrix and the final collaboration matrix, denoted as $C_{exp}$, $C_{imp}$ and $C$, respectively. To our knowledge, this is the first time that the strength of the collaboration relationship is measured based on the developers' interactive behaviors. From the viewpoint of quantitative analysis, it considers not only the influence of the time window for the interactions but also the multiple interactions between the same developers. From the viewpoint of qualitative analysis, the collaboration relationship may be more specific and targeted in reflecting the intimacy between developers in their development process than traditional generalized social relationships (as shown in Section 4.2.1).
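The collaboration strength described above can be sketched in a few lines of Python. Only the time-locality weight and the implicit score (which matches Example 2) follow directly from the text; the geometric decay `omega ** (i - 1)` over repeated interactions and the convex combination with `alpha` are assumptions, since Equations (2) and (4) are not reproduced here. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def time_weight(t: float, begin: float, end: float) -> float:
    """Time-locality weight in [0, 1]; more recent interactions score higher."""
    return (t - begin) / (end - begin)


def explicit_collaboration(
    events: Dict[Tuple[str, str], List[float]],
    begin: float,
    end: float,
    omega: float = 0.5,
) -> float:
    """ASSUMED form: for each (object, action) pair, the i-th interaction
    (in time order) contributes omega**(i-1) * time_weight(t_i)."""
    total = 0.0
    for times in events.values():
        for i, t in enumerate(sorted(times)):
            total += (omega ** i) * time_weight(t, begin, end)
    return total


def implicit_collaboration(n_shared: int, psi: float = 100.0) -> float:
    """Count of shared co-occurrence objects, damped by the constant psi (Example 2)."""
    return n_shared / (n_shared + psi)


def collaboration(c_exp: float, c_imp: float, alpha: float = 0.5) -> float:
    """ASSUMED convex combination; alpha balances explicit and implicit strength."""
    return alpha * c_exp + (1.0 - alpha) * c_imp
```

For two developers who share a project and an organization, `implicit_collaboration(2)` reproduces the value 2/102 from Example 2.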

Developer-Task Interaction Relationship
The strength of the interaction relationship reflects the degree of developers' preferences for tasks. That is, the more closely developer $u$ interacts with task $v$, the more likely $u$ is to be suitable for $v$ (as evidenced in recent investigations [10]). Next, we formally define the interaction relationship between developer and task.
Interaction Relationship. Similar to Equation (2), suppose $A_v$ denotes a set of actions (e.g., browse, star, explicit comment) performed on task $v$ by developer $u$. The interaction relationship $r(u, v)$ is formalized in Equation (5), where $\pi_2(u, v, a', i) = \frac{interactTime(u, v, a', i) - beginTime}{endTime - beginTime}$, $interactTime(u, v, a', i)$ is the $i$-th occurrence time of the action $a'$ performed on task $v$ by developer $u$, and beginTime, endTime and $\omega$ are defined as in Equation (2). Based on Equation (5), we can calculate the interaction relationship matrix R.

Example 3. Suppose $u$ has the interactive actions $A_v = \{a_1, a_2\}$ on task $v$. Suppose $a_1 = comment$ and $a_2 = star$ (e.g., thumbs-up), the numbers of times $a_1$ and $a_2$ are performed are $s_{a_1} = 2$ and $s_{a_2} = 1$, the occurrence times of $a_1$ and $a_2$ are $t_{a_{11}}$, $t_{a_{12}}$ and $t_{a_{21}}$, and beginTime and endTime are $t_b$ and $t_e$, respectively. We can then calculate $r(u, v)$ from Equation (5).

Unlike existing methods [17,20], which usually set $r(u, v) = 1$ by default whenever an interaction exists, we quantitatively calculate $r(u, v)$ based on the developers' interactive behaviors and consider that the strength of the interaction is affected by time locality and the number of interactions. In addition, there may be many interactive objects between developers in Equation (2), but in Equation (5) the interactive object defaults to task $v$.
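The calculation in Example 3 can be written out under an assumed form of Equation (5). The geometric decay $\omega^{i-1}$ over repeated occurrences of the same action is an assumption made for illustration and is not confirmed by the text; only the time-locality weight $\pi_2$ is taken from the definition above.

```latex
% Assumed form of the interaction relationship (illustrative only)
r(u, v) = \sum_{a' \in A_v} \sum_{i=1}^{s_{a'}} \omega^{\,i-1} \, \pi_2(u, v, a', i)

% Example 3 under this assumption: a_1 = comment (performed twice), a_2 = star (performed once)
r(u, v) = \frac{t_{a_{11}} - t_b}{t_e - t_b}
        + \omega \, \frac{t_{a_{12}} - t_b}{t_e - t_b}
        + \frac{t_{a_{21}} - t_b}{t_e - t_b}
```

Under this reading, a second comment on the same task still adds weight, but less than the first one, and every term grows as the interaction gets closer to endTime.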

Task-Task Association Relationship
Intuitively, developers are more willing to accept tasks that are similar to those they have done before. Therefore, how to measure the similarity relationship between tasks is an important issue. Existing item-based collaborative filtering methods [14,16,34] only consider the collaborative similarity relationship (i.e., the item similarity evidenced by user interactions such as ratings and purchases), which lacks concrete semantics. In real-world applications, there typically exist multiple relationships between tasks that have concrete semantics, and they are particularly helpful for understanding developer behaviors. For example, some tasks may have similar titles, file paths or related source code, while others, although they have no such explicitly similar features, share the same contributors or reviewers. In the context of this research, we define two types of similarity: (1) co-occurrence developer similarity between tasks, and (2) text similarity between tasks. The association relationship is formalized in Equation (6), where $|N_v \cup N_{v'}|$ denotes the co-occurrence relationship of developers in $v$ and $v'$, $N_v$ and $N_{v'}$ denote the developer sets associated with $v$ and $v'$, respectively, and $\beta$ is a weight used to balance the text similarity and the developer co-occurrence relationship.

The text-similarity term in Equation (6) is the cosine similarity of $v$ and $v'$, $\cos(e_t, e_{t'}) = \frac{e_t \cdot e_{t'}}{\|e_t\| \, \|e_{t'}\|}$, where $e_t$ and $e_{t'}$ denote the text vectors of $v$ and $v'$. We learn $e_t$ and $e_{t'}$ with Doc2Vec [35], a popular text vector representation model that can learn vector representations for variable-length texts such as sentences and documents. We implement Doc2Vec with the gensim PV-DBOW model (https://radimrehurek.com/gensim/models/doc2vec.html) due to its simplicity and ease of implementation. The calculation proceeds as follows: each task has several text descriptions (e.g., title, tags, abstract, content and code file paths), so we first concatenate each task's text descriptions and generate the text vectors of the concatenated texts (i.e., $e_t$, $e_{t'}$). Then we calculate the cosine similarity of those text vectors. Based on Equation (6), we can calculate the similarity relationship matrix S.
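A minimal sketch of the task-task association score follows. It assumes that the co-occurrence term is the Jaccard overlap of the two developer sets and that `beta` weights the co-occurrence term while `1 - beta` weights the text term; neither choice is confirmed by the text. The text vectors are taken as given (for instance, precomputed Doc2Vec vectors), and all names are hypothetical.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """ASSUMED co-occurrence measure: shared developers over all developers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of two text vectors; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return sum(p * q for p, q in zip(x, y)) / norm if norm > 0 else 0.0


def association(
    devs_v: Set[str],
    devs_w: Set[str],
    vec_v: Sequence[float],
    vec_w: Sequence[float],
    beta: float = 0.5,
) -> float:
    """ASSUMED weighting: beta on developer co-occurrence, (1 - beta) on text similarity."""
    return beta * jaccard(devs_v, devs_w) + (1.0 - beta) * cosine(vec_v, vec_w)
```

Two tasks with identical descriptions but only one shared developer out of three would score `0.5 * 1/3 + 0.5 * 1.0` with the default weight.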

Fusion of Multi-Relationships
In this section, we integrate the collaboration relationship and association relationship with the interaction relationship based on joint matrix factorization [18]. Following [18,36], we assume that C, R, S, Q, U and V follow Gaussian distributions with means of 0 and variances of $\sigma_C^2$, $\sigma_R^2$, $\sigma_S^2$, $\sigma_Q^2$, $\sigma_u^2$ and $\sigma_v^2$, respectively. Here, Q, U and V denote the latent features of the collaboration relationships, the developers and the tasks, respectively, and $d$ is the vector size.
As shown in the middle subfigure of Figure 2, for each developer $u_i$, the latent collaboration feature vector is denoted by $Q_t \in \mathbb{R}^{d \times 1}$, and the latent developer feature vector is denoted by $U_i \in \mathbb{R}^{d \times 1}$. They are calculated by factorizing $C_{n \times n}$. Similarly, for each task $v_j$, the latent task feature vector is denoted by $V_j \in \mathbb{R}^{d \times 1}$, and the latent association feature vector is denoted by $V_l \in \mathbb{R}^{d \times 1}$. They are calculated by factorizing $S_{m \times m}$. Here, $U_i$ is the shared latent developer feature vector, calculated by factorizing $C_{ti}$ and $R_{ij}$, and $V_j$ is the shared latent task feature vector, calculated by factorizing $R_{ij}$ and $S_{jl}$.
To learn U, V and Q, we design a log-joint posterior probability distribution function as shown in Equation (7), where $I^R$, $I^C$ and $I^S$ denote the indicator matrices and $\odot$ denotes the Hadamard product of two matrices. $g(x) = 1/(1 + e^{-x})$ is a logistic function [18] used to limit the predicted value of $U_i^T V_j$ to the interval $[0, 1]$. The model parameters U, V and Q can be learned by maximizing the log-joint posterior in Equation (7), which is equivalent to minimizing the objective function $L_1$ in Equation (8). Based on Equation (8), we calculate the partial derivatives $\frac{\partial L_1}{\partial U_i}$, $\frac{\partial L_1}{\partial V_j}$ and $\frac{\partial L_1}{\partial Q_t}$ as shown in Equation (9).

To calculate the feature vectors $U_i$, $V_j$ and $Q_t$, we utilize the momentum-based stochastic gradient descent (MSGD) method [37] to update the parameters of SoftRec and accelerate its convergence, as shown in Equation (10), where $\nu \in [0, 1]$ is the momentum parameter and $\eta > 0$ is the learning rate. According to previous work [37], the default values of $\nu$ and $\eta$ are usually set to 0.8 and 0.05, respectively. The momentum terms associated with $U_i$, $V_j$ and $Q_t$ are temporary variables used to update the momentum. When the parameters $U_i$, $V_j$ and $Q_t$ converge or the number of iterations reaches its maximum, we obtain the feature vectors $U_i$, $V_j$ and $Q_t$.
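To make the fusion step concrete, the sketch below minimizes an assumed joint objective with momentum updates. The objective (squared errors on $R \approx g(U^T V)$, $C \approx g(Q^T U)$ and $S \approx g(V^T V)$, weighted by hypothetical `lam_c` and `lam_s`, plus an L2 penalty `lam`) is an illustration consistent with the description above, not the exact Equations (7) to (10). For brevity it uses full-batch gradients rather than stochastic ones, and sparse dictionaries stand in for the indicator matrices.

```python
import math
import random


def g(x: float) -> float:
    """Logistic function mapping a score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def joint_loss(U, V, Q, R, C, S, lam_c, lam_s, lam) -> float:
    """R, C, S map index pairs to observed values (indicator matrices are implicit)."""
    loss = sum((g(dot(U[i], V[j])) - r) ** 2 for (i, j), r in R.items())
    loss += lam_c * sum((g(dot(Q[t], U[i])) - c) ** 2 for (t, i), c in C.items())
    loss += lam_s * sum((g(dot(V[j], V[l])) - s) ** 2 for (j, l), s in S.items())
    loss += lam * sum(dot(row, row) for M in (U, V, Q) for row in M)
    return 0.5 * loss


def gradients(U, V, Q, R, C, S, lam_c, lam_s, lam):
    """Partial derivatives of joint_loss with respect to every row of U, V, Q."""
    gU, gV, gQ = ([[lam * x for x in row] for row in M] for M in (U, V, Q))

    def add(target, vec, scale):
        for k, x in enumerate(vec):
            target[k] += scale * x

    for (i, j), r in R.items():
        p = g(dot(U[i], V[j]))
        e = (p - r) * p * (1.0 - p)
        add(gU[i], V[j], e)
        add(gV[j], U[i], e)
    for (t, i), c in C.items():
        p = g(dot(Q[t], U[i]))
        e = lam_c * (p - c) * p * (1.0 - p)
        add(gU[i], Q[t], e)
        add(gQ[t], U[i], e)
    for (j, l), s in S.items():
        p = g(dot(V[j], V[l]))
        e = lam_s * (p - s) * p * (1.0 - p)
        add(gV[j], V[l], e)
        add(gV[l], V[j], e)
    return gU, gV, gQ


def train(R, C, S, n, m, d=4, iters=200, lam_c=0.5, lam_s=0.5, lam=0.01,
          lr=0.05, nu=0.8, seed=0):
    """Momentum gradient descent; returns the factors and the loss history."""
    rng = random.Random(seed)
    U, V, Q = ([[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(rows)]
               for rows in (n, m, n))
    vel = [[[0.0] * d for _ in M] for M in (U, V, Q)]
    history = [joint_loss(U, V, Q, R, C, S, lam_c, lam_s, lam)]
    for _ in range(iters):
        grads = gradients(U, V, Q, R, C, S, lam_c, lam_s, lam)
        for M, G, W in zip((U, V, Q), grads, vel):
            for row, grow, wrow in zip(M, G, W):
                for k in range(d):
                    wrow[k] = nu * wrow[k] - lr * grow[k]  # momentum accumulation
                    row[k] += wrow[k]
        history.append(joint_loss(U, V, Q, R, C, S, lam_c, lam_s, lam))
    return U, V, Q, history
```

Because each developer vector appears in both the R and C terms, and each task vector in both the R and S terms, a gradient step on one relationship also moves the representation used by the others, which is the point of sharing latent vectors.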

Developer Prediction
Next, we present the developer prediction based on the feature vectors $U_i$ and $V_j$ calculated in Equation (10). Because the linear inner product may limit the expressiveness of MF [13,31], we project the feature vectors $U_i$ and $V_j$ into a non-linear space using a deep neural network architecture [38].
As shown in the right subfigure of Figure 2, for developer $u_i$ and task $v_j$, the input feature vectors are $U_i$ and $V_j$, respectively. Suppose $W_U^{(t)}$ and $W_V^{(t)}$ are the $t$-th ($t = 1, 2, \ldots, N$) layer weighting matrices for $U_i$ and $V_j$, respectively. The developer $u_i$ and task $v_j$ are finally mapped to a low-dimensional vector space as in Equation (11), where $f^{(t)}(\cdot)$ denotes the activation function. We implement $f^{(t)}(\cdot)$ with ReLU, which is formalized as $f^{(t)}(x) = \max(0, x)$. In this paper, we use two multi-layer networks to transform the representations of $u$ and $v$, respectively. Based on the output vectors of $U_i$ and $V_j$ in Equation (11), we calculate the predicted value $\hat{y}_{ij}$ as in Equation (12), where $\Theta = \{W_U^{(l)}, W_V^{(l)}\}_{l=1}^{N}$ are the parameters.

To learn $\Theta$, we opt for the pairwise BPR loss [39], which has been widely used in training recommender systems [40]. It assumes that the observed interactions, which are more indicative of user preferences, should be assigned higher prediction values than unobserved ones. The objective function $L_2$ is given in Equation (13), where $R^+$ denotes the set of observed interaction relationships $(i, j)$, $R^-$ denotes the set of unobserved interactions $(i, j')$, and $\sigma(\cdot)$ is the sigmoid function. $\lambda \|\Theta\|_2^2$ is the regularizer used to prevent overfitting. Similar to Equations (9) and (10), we can calculate the partial derivatives of $L_2$ with respect to the model parameters in Equation (13).
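The prediction step can be illustrated with a small two-tower forward pass and a pairwise BPR loss. The layer rule (ReLU applied to a linear map, without bias) and the BPR form follow the description above; scoring by the inner product of the two tower outputs is an assumption, since Equation (12) is not reproduced here. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def tower(x: Sequence[float], layers: List[Matrix]) -> List[float]:
    """Apply x <- ReLU(W x) for every layer weight matrix W."""
    out = list(x)
    for W in layers:
        out = [max(0.0, dot(row, out)) for row in W]
    return out


def predict(u: Sequence[float], v: Sequence[float],
            W_U: List[Matrix], W_V: List[Matrix]) -> float:
    """ASSUMED scoring rule: inner product of the two tower outputs."""
    return dot(tower(u, W_U), tower(v, W_V))


def bpr_loss(triples: List[Tuple[int, int, int]], U: Matrix, V: Matrix,
             W_U: List[Matrix], W_V: List[Matrix], lam: float = 0.01) -> float:
    """Pairwise BPR loss over (developer, observed task, unobserved task) triples."""
    loss = 0.0
    for i, j_pos, j_neg in triples:
        diff = predict(U[i], V[j_pos], W_U, W_V) - predict(U[i], V[j_neg], W_U, W_V)
        # -ln(sigmoid(diff)), written in a numerically stable form
        loss += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    reg = sum(w * w for layers in (W_U, W_V) for W in layers for row in W for w in row)
    return loss + lam * reg
```

The loss is small when the observed task is scored above the unobserved one and grows roughly linearly when the order is reversed, which is what pushes observed interactions toward higher predicted values.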

Fast Model Update
How to update the model is a critical problem. Previous update approaches based on large matrix factorization [4,29,41] usually update the whole system offline and periodically. This approach is referred to as fullUpdate. However, on developer collaboration platforms, a lot of explicit and implicit relationships are produced on a daily basis. Frequent full updates are expensive, especially in recommendation scenarios that involve large-scale matrices and multiple relationships. They limit the update frequency and, consequently, lower the timeliness of the recommendation results.
However, updating in time may help address the cold start issue (as will be evaluated and demonstrated in Section 4.2.4). On the one hand, the latent feature vectors of new developers and new tasks can be updated online based on Equations (9) and (10). On the other hand, for old developers and old tasks, because their historical information plays the dominant role during the recommendation process, minor updates hardly affect the recommendation results within a short period of time. Therefore, SoftRec updates the latent feature vectors $U_i$, $V_j$ and $Q_t$ of old developers and old tasks with offline fullUpdate only.
Based on the above principles, we propose a novel fast update approach for new developers and new tasks, called fastUpdate. Taking the update of feature vector $U_i$ as an example, its pseudo code is presented in Algorithm 1. For developer $u_i$, let $\varphi(u_i) = \Theta(R_{i*})$ denote the known interaction set of $u_i$ in $R_{i*}$, where $\Theta(\cdot)$ denotes the set of known entities. The probability of performing the operation $R \leftarrow R \cup R_{ij}$ is defined in Equation (14), where the parameter $\phi \in [0, 1]$ denotes the decay factor. The probability of an update gradually decreases as $|\varphi(u_i)|$ increases. Then, based on Equations (9) and (10), the update of $U_i$ is described in Algorithm 1.

Algorithm 1: fastUpdate for $U_i$
    for s from 1 to Iter do
        for j from 1 to d do
            ...

Here, $\theta$ is the empirical constant and Iter is the number of iterations before an early stop is applied; its value defaults to 100.
Similar to the update of $U_i$ in Algorithm 1, $Q_t$ and $V_j$ can be updated as well. The time complexity of the fastUpdate approach is $O((|\Theta(C_{*i})| + |\Theta(S_{j*})| + |\Theta(R_{i*})|) \cdot d \cdot Iter)$, while the time complexity of the traditional fullUpdate approach that updates the entire model is $O((|\Theta(C)| + |\Theta(S)| + |\Theta(R)|) \cdot d \cdot Iter)$. In this way, the proposed fastUpdate approach, as a supplement to fullUpdate, can address the cold start issue by making online recommendations for new developers and new tasks (as shown in Section 4.2.4). After updating $U_i$ and $V_j$, we can calculate the value of $\hat{y}_{ij}$ based on Equations (11) and (12).
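The idea behind fastUpdate can be sketched as follows: only the vector of one new developer is refined, using that developer's own known entries while all other factors stay fixed, so each iteration costs on the order of (number of known entries) times d. This is a sketch under stated assumptions, not the authors' Algorithm 1: the geometric acceptance probability, the use of `theta` as a convergence threshold, and the squared-error gradient are all assumed, and every name is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def g(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def accept_probability(n_known: int, phi: float = 0.9) -> float:
    """ASSUMED decay: the chance of absorbing a new interaction shrinks
    geometrically with the number of interactions already known."""
    return phi ** n_known


def fast_update_U(u_i: Sequence[float],
                  R_i: Dict[int, float],
                  C_i: Dict[int, float],
                  V: List[List[float]],
                  Q: List[List[float]],
                  lam_c: float = 0.5, lam: float = 0.01,
                  lr: float = 0.05, nu: float = 0.8,
                  iters: int = 100, theta: float = 1e-6) -> List[float]:
    """Refine one developer vector. R_i maps task j to r_ij, C_i maps
    developer t to c_ti; V and Q are held fixed."""
    u = list(u_i)
    vel = [0.0] * len(u)
    for _ in range(iters):
        grad = [lam * x for x in u]
        for j, r in R_i.items():
            p = g(dot(u, V[j]))
            e = (p - r) * p * (1.0 - p)
            grad = [gk + e * vk for gk, vk in zip(grad, V[j])]
        for t, c in C_i.items():
            p = g(dot(Q[t], u))
            e = lam_c * (p - c) * p * (1.0 - p)
            grad = [gk + e * qk for gk, qk in zip(grad, Q[t])]
        vel = [nu * w - lr * gk for w, gk in zip(vel, grad)]
        u = [x + w for x, w in zip(u, vel)]
        if max(abs(w) for w in vel) < theta:  # ASSUMED early-stop rule
            break
    return u
```

A caller would first draw a random number against `accept_probability(len(R_i))` to decide whether a newly observed interaction is added, then call `fast_update_U` for that developer only.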

Experiment and Discussion
We evaluate the proposed approach on two real-world datasets: one collected from GitHub projects and the other collected from Neusoft Corporation. We aim to answer the following research questions:
RQ1: How does SoftRec perform compared with state-of-the-art CF-based recommendation methods?
RQ2: How does SoftRec perform when tackling different data sparsity?
RQ3: How do developer-developer collaboration relationships and task-task association relationships (i.e., $\lambda_C$ and $\lambda_S$) affect SoftRec?
RQ4: How does SoftRec perform when tackling model update, and does it help solve the cold start issue?
RQ5: Does SoftRec have practical value, and is it recognized by enterprise users in practice?

Data Collection and Preprocessing
To collect data for our experiments, we crawled five popular GitHub projects with the GitHub API (https://developer.github.com/v3/), including symfony/symfony, akka/akka, elasticsearch, netty/netty and ipython/ipython; the collected data spans March 2015 to November 2018. These five projects are highly popular and have 2760 watches, 39,340 stars and 16,860 forks on average on GitHub. In addition, we selected five popular commercial projects from the Neusoft Corporation's GitLab platform, including Workflow, DI, DataViz, ACAP and APM; the collected data spans January 2017 to June 2019. In the GitHub projects, we recommend code reviewers for the given pull-requests, and in the GitLab projects we recommend assignees for the given issues.
Next, following previous work [4], we filter out the tasks (i.e., pull-requests or issues) with fewer than two different reviewers or assignees. The statistics of the preprocessed data are shown in Table 1, and the initial data densities of the interaction matrices are $\rho_1 = 3.57\%$ and $\rho_2 = 6.01\%$, respectively.

Relationship Mapping
We calculate the three types of relationships based on the definitions in Section 3.1. Table 2 shows the mappings of interactive objects and interactive actions in the collected datasets. Take the GitHub dataset as an example. The collaboration relationship defined in Equation (2) is calculated based on the interactive object set O = {issue_comments, commit_comments, pull_request_comments} in each GitHub project. The developers' interactive action set A performed on O is {comment, commit, reaction}, where the interactive action reaction represents a series of emojis (https://developer.github.com/v3/reactions/).

Approaches for Comparison
In this experiment, we compare SoftRec with four state-of-the-art approaches:
• PR-CF [4]: a typical CF-based hybrid approach that generates the latent factor models based on the developer-task explicit interaction matrix, and then combines the latent factor models with the tasks' neighborhoods to capture the similarity between developers and tasks.
• IR+CN [20]: an approach that recommends developers based on their social relationships. By mining historical comments, it constructs a weighted graph called a comment network (CN) to model developers' social relationships.
• DMF [38]: a typical matrix factorization model with a neural network architecture that learns a common low-dimensional space for the representations of users and items.
• NFM [31]: a typical deep learning model that unifies the strengths of factorization machines and deep neural networks for sparse rating modelling.

Scenario Description
To test the performance of SoftRec in tackling the interaction sparsity (i.e., RQ2) and model update (i.e., RQ4), we design two test scenarios: (1) an interaction sparse scenario and (2) a new developer cold start scenario. To simulate the interaction sparse scenario, we produce different ratios of data density by removing known elements from the developer-task explicit interaction matrix $R_{exp}$ according to their time slice. To simulate cold-start scenarios with new developers, (1) we first turn existing developers into new developers by removing their related information from the initial database tables; (2) we then recover the removed information into the corresponding database tables and recalculate the various relationships according to their time slice (i.e., recovering one day's information at a time). In this process, fastUpdate is performed online, whereas fullUpdate is performed periodically (i.e., after every seven days of recovered data).

Performance Evaluation
We adopt three representative performance metrics, precision@k, recall@k [4] and ndcg@k [42], for performance evaluation, as shown in Equations (15)-(17). Precision@k and recall@k are widely used metrics that do not consider the ranking position, whereas ndcg@k is a popular ranking-based metric that does: a hit at a higher position receives a higher score. By default, we set k = 5 and randomly divide the datasets into 10 groups, with 80% used as the training set and 20% as the testing set; the evaluation is conducted based on cross-validation. We use each task's actual developers as the ground truth, and the reported performance is averaged over 20 repetitions.
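As an illustration of the three metrics, the following Python sketch computes them for a single task under the assumption of binary relevance (a recommended developer is either among the task's actual developers or not). The function names are hypothetical, and the exact normalisation used in Equations (15)-(17) may differ from this common formulation; here precision divides by k and ndcg uses a log2 position discount.

```python
import math


def precision_at_k(ranked, relevant, k=5):
    """Share of the top-k recommended developers that are actual developers."""
    hits = sum(1 for dev in ranked[:k] if dev in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=5):
    """Share of the actual developers that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for dev in ranked[:k] if dev in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=5):
    """Discounted gain of the top-k list divided by that of a perfect ranking."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, dev in enumerate(ranked[:k], start=1)
        if dev in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / ideal if ideal > 0 else 0.0
```

Unlike the two set-based metrics, ndcg_at_k changes when the same hits are moved to different positions, which is why it is reported alongside them.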
In the definition of ndcg@k, Z_k is the normalizer that ensures the perfect ranking has a value of 1, and r_i is the relevance of the developer u at position i: if u exists in the test set, we set r_i = 1; otherwise, r_i = 0.

Table 3 compares the overall performance of the five approaches. We have the following observations.

First, IR+CN achieves poor performance on both datasets. This indicates that considering only the explicit social relationships does not suffice to capture the potential interactions between developers and tasks. Compared with IR+CN, PR-CF achieves better performance in most cases. This indicates the importance of considering the latent factor similarity between tasks. We chose PR-CF and IR+CN as baselines because both are typical developer recommendation approaches that are similar to ours: PR-CF utilizes the implicit similarity relationships between tasks to improve the recommendation accuracy, whereas IR+CN utilizes the social relationships between developers.

Second, compared with DMF, NFM achieves better accuracy in most cases. The reason might be that NFM combines the linearity of a factorization machine in modelling second-order feature interactions with the non-linearity of a neural network in modelling higher-order feature interactions, which is more expressive than DMF.

Third, among all compared approaches, SoftRec achieves the highest accuracy in all cases. For example, for the GitHub dataset, the precision, recall and ndcg obtained by SoftRec are 0.4673, 0.7633 and 0.4727, respectively. It outperforms the other state-of-the-art approaches by an average of 23.26%, 6.69% and 34.56% in precision, recall and ndcg, respectively. For the GitLab dataset, SoftRec's precision, recall and ndcg are 0.4929, 0.7701 and 0.5109, respectively, outperforming the other state-of-the-art approaches by an average of 29.75%, 5.33% and 23.56%, respectively.
The reasons might be that (1) SoftRec fully explores the explicit and implicit multi-relationships, which helps improve the recommendation accuracy, unlike NFM and DMF, which employ the explicit interactions only; and (2) SoftRec projects the feature vectors U_i and V_j into a non-linear space through a deep neural network architecture, which helps capture the non-linear and complex features of real-world data, unlike PR-CF and IR+CN, which model feature interactions in a linear space only.

We test SoftRec in scenario (1), with different sparsity levels of developer-task interactions (described in Section 4.1.5, Scenario Description). Figure 3a,b shows the precision, recall and ndcg results for the GitHub and GitLab datasets, where ρ_1 and ρ_2 denote the initial data density of the interaction matrices in the GitHub and GitLab datasets, respectively. We have the following observations.

Overall Performance Comparison (RQ1)
When the data density decreases from ρ_1 and ρ_2 toward 0, the precision, recall and ndcg values of all approaches decrease quickly. However, SoftRec significantly and consistently outperforms the other approaches. For example, for the GitHub dataset, when the data density is 0.6 * ρ_1, SoftRec's precision, recall and ndcg are 43.55%, 47.17% and 28.71% higher than those of the compared approaches on average. For the GitLab dataset, when the data density is 0.6 * ρ_2, SoftRec's precision, recall and ndcg are 25.29%, 22.66% and 24.78% higher than those of the compared approaches on average. Besides, as the data density gradually decreases, the performance of SoftRec decreases more slowly than that of the compared methods. This indicates that SoftRec handles data sparsity better.

A further question is why SoftRec's advantage over the compared approaches is larger on the GitHub dataset than on the GitLab dataset. The reason might be that the GitHub dataset is sparser than the GitLab dataset (in the GitHub dataset ρ_1 = 3.75%, in the GitLab dataset ρ_2 = 6.01%), and the compared approaches do not fully mine the various implicit relationships, so their performance worsens as the interactions become sparser. SoftRec, in contrast, takes advantage of the various implicit relationships, which effectively alleviates the interaction sparsity and improves the recommendation accuracy.

Effects of the Multi-Relationship (RQ3)
In SoftRec, the parameters λ_C and λ_S determine the effects of the developer-developer collaboration relationships and the task-task association relationships on the recommendation results. Now let us discuss how these relationships affect SoftRec and how to determine the parameter values. From Equations (8)-(10) we can see that SoftRec uses λ_C and λ_S to balance the collaboration relationships and the association relationships against the developer-task interaction relationships. When λ_C = 0, the influence of the collaboration relationships is completely ignored. As λ_C increases, the collaboration relationships are given a higher priority in making recommendations. Our experiments confirm this: as shown in Figure 4a,b, as λ_C increases from 0.01 to 5 and λ_S increases from 0.01 to 10, SoftRec's precision, recall and ndcg values increase, whereas when λ_C and λ_S exceed 5 and 10, respectively, these values decrease gradually.

Figure 3. Comparison of the recommendation accuracy (precision, recall and ndcg) with different densities of the developer-task interaction matrix (e.g., the density ranges from 1 * ρ_1 to 0 * ρ_1 in subfigure (a)) in the GitHub and GitLab datasets.

Furthermore, to compare the performance of SoftRec with and without the multi-relationship, we set λ_C = λ_S = 0. Table 4 shows the results, where SoftRec' denotes SoftRec without the multi-relationship. For example, for the GitHub dataset, the precision, recall and ndcg of SoftRec with the multi-relationship are 50.11%, 32.93% and 59.10% higher than those of SoftRec without it.
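To make the role of λ_C and λ_S concrete, the following NumPy sketch evaluates a generic joint matrix factorization objective in which the developer factors U are shared between the interaction matrix R and a collaboration matrix C, and the task factors V are shared between R and an association matrix S. This is an assumed textbook-style form, not a reproduction of Equations (8)-(10); in particular, approximating C by U U^T and S by V V^T, and the names joint_mf_loss, lam_c, lam_s and reg, are assumptions made for illustration.

```python
import numpy as np


def joint_mf_loss(R, C, S, U, V, lam_c, lam_s, reg=0.0):
    """Weighted sum of three squared reconstruction errors.

    R: (m, n) developer-task interactions
    C: (m, m) developer-developer collaboration strengths
    S: (n, n) task-task association strengths
    U: (m, d) developer latent factors; V: (n, d) task latent factors
    """
    interaction = np.sum((R - U @ V.T) ** 2)
    collaboration = np.sum((C - U @ U.T) ** 2)  # assumed symmetric factorization
    association = np.sum((S - V @ V.T) ** 2)    # assumed symmetric factorization
    penalty = reg * (np.sum(U ** 2) + np.sum(V ** 2))
    return interaction + lam_c * collaboration + lam_s * association + penalty
```

With lam_c = lam_s = 0 the objective reduces to plain matrix factorization on R alone, which mirrors the SoftRec' setting; larger weights push the factors to also explain C and S, at the cost of fitting R less closely.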
The results presented in Figure 4 and Table 4 show that leveraging the developer-developer collaboration relationships and the task-task association relationships effectively improves the recommendation accuracy. Because the optimal values for these parameters are domain-specific, we set them through trial and error in the experiments.

Now let us discuss whether SoftRec can alleviate the cold start issue and how it performs in tackling model update. The advantage of SoftRec is that its fastUpdate approach, as a supplement to fullUpdate, can accommodate new developers and new tasks promptly, thus effectively alleviating the cold start issue. To demonstrate the performance of fastUpdate, we take the new-developer cold start scenario (described in Section 4.1.5, Scenario Description) as an example. As shown in Table 5, with the proposed fastUpdate, SoftRec's average precision, recall and ndcg values increase by about 4.25%, 4.1% and 2.88%, respectively, for the GitHub dataset, and by about 4.17%, 7.14% and 3.70%, respectively, for the GitLab dataset. This shows that SoftRec's fastUpdate can properly address the cold start issue by making online recommendations for new developers.
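This part of the text does not spell out how fastUpdate is computed, so the following sketch should not be read as the paper's method. It only illustrates one common way to serve a new developer online in a matrix factorization model, often called fold-in: the task factors V are kept fixed and the new developer's latent vector is obtained by ridge-regularized least squares from the few interactions observed so far. The function name fold_in_developer and the parameter reg are hypothetical.

```python
import numpy as np


def fold_in_developer(r_new, V, reg=0.1):
    """Estimate a latent vector for a new developer with V held fixed.

    r_new: (n,) observed interaction strengths of the new developer with n tasks
    V:     (n, d) task latent factors from the last full training run
    Solves min_u ||r_new - V u||^2 + reg * ||u||^2 in closed form.
    """
    d = V.shape[1]
    gram = V.T @ V + reg * np.eye(d)
    rhs = V.T @ r_new
    return np.linalg.solve(gram, rhs)
```

Because only a small d-by-d system is solved, such an update is cheap enough to run whenever new interactions arrive, while a full retraining can be deferred to a periodic job.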

User Study and System Design (RQ5)
To further demonstrate the effectiveness of SoftRec, we conduct a user study at Neusoft Corporation. The scenario is that in Neusoft's project and process management, hundreds of issues (e.g., defects or new features) are created by developers (e.g., project managers, software testers or tech supports) every day. To improve the cooperation efficiency, these issues need to be assigned to the appropriate developers as soon as possible. Current manual assignment is usually time-consuming and inefficient, so an automatic assignment mechanism is urgently required.
In this study, we choose five typical projects in Neusoft Corporation: Workflow, DI, DataViz, ACAP and APM. For each project, we select 100 open issues, recommend developers for them based on the SoftRec framework, and evaluate the results by means of online questionnaires.

Effectiveness of Our Approach
The results of the questionnaires are shown in Figure 5. We can see that for all surveyed projects, SoftRec's recommendation results are significantly superior to those of the other approaches. For SoftRec, the average values of precision, recall and ndcg over the studied projects (Workflow, DI, DataViz, ACAP and APM) are 0.4721, 0.7868 and 0.5102, respectively. Compared with the other four approaches, SoftRec's precision, recall and ndcg are 26.33%, 12.64% and 15.63% higher on average, respectively.
Because we recommend developers within the scope of each project, the developer-developer collaboration relationships, developer-task interaction relationships, and task-task association relationships are very close. This leads to higher recommendation accuracies compared with those achieved on the sparse datasets.

User Interview
We interviewed 25 developers about their opinions of the recommendations made by SoftRec versus those made by the conventional expertise-based recommendation approaches tested in our experiments. The participants consisted of 20 male developers and 5 female developers, with an average work experience of 5 years. After inspecting the recommended tasks, 16 of them felt highly confident that they would accept the recommended tasks, 7 felt confident about the recommended tasks, and the other 2 may or may not have accepted the recommended tasks. The interview results indicate that 92% of the interviewees (23 of 25) are satisfied with the recommendations. These results empirically suggest that SoftRec can improve their collaboration efficiency by making accurate developer recommendations.

System Design
To facilitate the practice of SoftRec in a real environment, we have designed a feasible technical framework at Neusoft Corporation. As shown in Figure 6, its functions include real-time data collection, distributed data preprocessing, distributed data indexing, distributed data storage, multi-relationship computing, model training and prediction, etc.
In this technical framework, we exploit open source tool chains as much as possible. For example, we collect data through open source technologies such as Crawler (https://www.npmjs.com/package/crawler), GitHub API (https://developer.github.com/), Logstash and Beats (https://www.elastic.co/cn/downloads/beats), etc. To optimize the data transmission efficiency, we integrate a message queue middleware (Apache Kafka (http://kafka.apache.org/)) into the framework. Data preprocessing (e.g., data extraction, data transformation, data cleaning) is implemented based on Logstash. Data indexing and storage are implemented based on ElasticSearch (https://github.com/elastic/elasticsearch). The multi-relationship computing and model training are based on Spark (http://spark.apache.org/) and Tensorflow (https://www.tensorflow.org/). The advantages of our technical framework are as follows:
• From the viewpoint of software development, we design the framework based on a series of open source tool chains as much as possible, which follows the idea of open source software, shields the underlying complexity and improves the development efficiency.
• From the viewpoint of system availability, the framework supports distributed data storage and parallel computing for developer recommendation, which gives it better performance in big data environments and provides a valuable technical reference for system practice.
As far as we know, this technical framework has been adopted by Neusoft Corporation and will be integrated into their commercial product (https://platform.neusoft.com/allproducts/acap) as part of their DevOps tool chains. ...

Threats to Validity
First, our experiments are performed on five popular projects in the GitHub dataset and five large commercial projects in Neusoft Corporation. We cannot claim that the same results would be achieved with other projects or in other periods of time. Moreover, the results of applying SoftRec to other platforms, e.g., Bitbucket, StackOverflow or TopCoder, might not be exactly the same. As future work, we plan to extend our evaluation to a broader range of open-source and industrial projects.
Second, SoftRec is a relationship-aware developer recommendation approach aiming to recommend suitable developers based on the idea of collaborative filtering. We use the actual developers of tasks as the ground truth and do not consider their expertise, reputation and workloads, etc. Thus, there is a risk that the recommended developers might not be the best ones of all. To mitigate this threat, we plan to extend our framework by measuring the developers' abilities (e.g., skills, expertise, reputation, contribution and workload).

Conclusions and Future Work
In this paper, we proposed SoftRec, a novel multi-relationship fused approach for developer recommendation. In SoftRec, three types of implicit relationships are utilized: the collaboration relationships between developers, the interaction relationships between developers and tasks, and the association relationships between tasks. Furthermore, a novel fast model update approach was proposed to address the cold start issue. To the best of our knowledge, this is the first attempt to systematically integrate the developers' collaboration relationships and the tasks' association relationships into developer recommendation.
From the viewpoint of theoretical innovation, we propose to utilize joint matrix factorization to project developers' and tasks' features and their relationships into a low-dimensional latent vector space. It leverages not only the common developer latent vectors in both the interaction matrix and the developer collaboration matrix, but also the common task latent vectors in both the interaction matrix and the task association matrix, and we refine the latent vectors with a deep neural network. This alleviates the issues of interaction sparsity and cold start in traditional collaborative filtering. From the viewpoint of practice, we conduct extensive experiments on two real-world datasets and a user study in a well-known software company. The results demonstrate the high performance of SoftRec. Furthermore, we design a feasible technical framework that exploits open source tool chains as much as possible, which helps to facilitate the practice of SoftRec in a real environment.
In the future, we will extend SoftRec in three ways. The first is to employ deep learning to fuse the multiple implicit relationships for developer recommendation. The second is to further characterize the boundary of the fast update for SoftRec through theoretical or experimental analysis. The third is to further investigate the usefulness of SoftRec and consider integrating the developers' abilities (e.g., skills, expertise, reputation, contribution and workload) into SoftRec. Moreover, we plan to provide a set of developer recommendation tools that can be used in real environments. We hope to provide free plugins or service APIs for websites such as GitHub, StackOverflow and Topcoder.
Author Contributions: Conceptualization, X.X.; Data curation, X.X.; Investigation, X.X.; Writing-original draft, X.X.; Writing-review and editing, B.W. and X.Y.; Resources, X.Y. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.